---
title: "Build workflow interactively" 
fontsize: 14pt
editor_options: 
  chunk_output_type: console
type: docs
weight: 1
---

<!--
- Compile from command-line
Rscript -e "rmarkdown::render('systemPipeR.Rmd', c('BiocStyle::html_document'), clean=F); knitr::knit('systemPipeR.Rmd', tangle=TRUE)"; Rscript ../md2jekyll.R systemPipeR.knit.md 2; Rscript -e "rmarkdown::render('systemPipeR.Rmd', c('BiocStyle::pdf_document'))"
-->

To start, we use the same workflow instance like the last [section](..).
 

```{r setup, echo=FALSE, message=FALSE, warning=FALSE}
suppressPackageStartupMessages({
    library(systemPipeR)
})
```

```{r SPRproject_logs, eval=FALSE}
sal <- SPRproject(logs.dir= ".SPRproject", sys.file=".SPRproject/SYSargsList.yml") 
```

```{r SPRproject1, eval=TRUE, echo=FALSE}
sal <- SPRproject(projPath = getwd(), overwrite = TRUE) 
```


```{r}
sal
```

### Adding the first step 

The first step is R code based, and we are splitting the `iris` dataset by `Species`
and for each `Species` will be saved on file. Please note that this code will
not be executed now; it is just store in the container for further execution. 

This constructor function requires the `step_name` and the R-based code under 
the `code` argument. 
The R code should be enclosed by braces (`{}`) and separated by a new line. 

```{r, firstStep_R, eval=TRUE}
appendStep(sal) <- LineWise(code = {
                              mapply(function(x, y) write.csv(x, y),
                                     split(iris, factor(iris$Species)),
                                     file.path("results", paste0(names(split(iris, factor(iris$Species))), ".csv"))
                                     ) 
                            },
                            step_name = "export_iris")
```

For a brief overview of the workflow, we can check the object as follows:

```{r show, eval=TRUE}
sal
```

Also, for printing and double-check the R code in the step, we can use the 
`codeLine` method:

```{r codeLine, eval=TRUE}
codeLine(sal)
```

### Adding more steps

Next, an example of how to compress the exported files using 
[`gzip`](https://www.gnu.org/software/gzip/) command-line. 

The constructor function creates an `SYSargsList` S4 class object using data from
three input files:

    - CWL command-line specification file (`wf_file` argument);
    - Input variables (`input_file` argument);
    - Targets file (`targets` argument).

In CWL, files with the extension `.cwl` define the parameters of a chosen
command-line step or workflow, while files with the extension `.yml` define the
input variables of command-line steps. 

The `targets` file is optional for workflow steps lacking `input` files. The connection 
between `input` variables and the `targets` file is defined under the `inputvars` 
argument. It is required a `named vector`, where each element name needs to match
with column names in the `targets` file, and the value must match the names of 
the `input` variables defined in the `*.yml` files (see Figure \@ref(fig:sprandCWL)). 

A detailed description of the dynamic between `input` variables and `targets` 
files can be found [here](#cwl_targets). 
In addition, the CWL syntax overview can be found [here](#cwl). 

Besides all the data form `targets`, `wf_file`, `input_file` and `dir_path` arguments,
`SYSargsList` constructor function options include: 

  - `step_name`: a unique *name* for the step. This is not mandatory; however, 
    it is highly recommended. If no name is provided, a default `step_x`, where
    `x` reflects the step index, will be added. 
  - `dir`: this option allows creating an exclusive subdirectory for the step 
    in the workflow. All the outfiles and log files for this particular step will 
    be generated in the respective folders.
  - `dependency`: after the first step, all the additional steps appended to 
    the workflow require the information of the dependency tree. 

The `appendStep<-` method is used to append a new step in the workflow.

```{r gzip_secondStep, eval=TRUE}
targetspath <- system.file("extdata/cwl/gunzip", "targets_gunzip.txt", package = "systemPipeR")
appendStep(sal) <- SYSargsList(step_name = "gzip", 
                      targets = targetspath, dir = TRUE,
                      wf_file = "gunzip/workflow_gzip.cwl", input_file = "gunzip/gzip.yml",
                      dir_path = system.file("extdata/cwl", package = "systemPipeR"),
                      inputvars = c(FileName = "_FILE_PATH_", SampleName = "_SampleName_"), 
                      dependency = "export_iris")
```

Note: This will not work if the `gzip` is not available on your system 
(installed and exported to PATH) and may only work on Windows systems using PowerShell. 

For a overview of the workflow, we can check the object as follows:

```{r}
sal
```

Note that we have two steps, and it is expected three files from the second step.
Also, the workflow status is *Pending*, which means the workflow object is 
rendered in R; however, we did not execute the workflow yet. 
In addition to this summary, it can be observed this step has three command lines. 

For more details about the command-line rendered for each target file, it can be 
checked as follows: 

```{r}
cmdlist(sal, step = "gzip")
```

#### Using the `outfiles` for the next step

For building this step, all the previous procedures are being used to append the 
next step. However, here, we can observe power features that build the 
connectivity between steps in the workflow.

In this example, we would like to use the outfiles from *gzip* Step, as
input from the next step, which is the *gunzip*. In this case, let's look at the 
outfiles from the first step:

```{r}
outfiles(sal)
```

The column we want to use is "gzip_file". For the argument `targets` in the 
`SYSargsList` function, it should provide the name of the correspondent step in
the Workflow and which `outfiles` you would like to be incorporated in the next 
step. 
The argument `inputvars` allows the connectivity between `outfiles` and the 
new `targets` file. Here, the name of the previous `outfiles` should be provided 
it. Please note that all `outfiles` column names must be unique.

It is possible to keep all the original columns from the `targets` files or remove
some columns for a clean `targets` file.
The argument `rm_targets_col` provides this flexibility, where it is possible to
specify the names of the columns that should be removed. If no names are passing
here, the new columns will be appended. 

```{r gunzip, eval=TRUE}
appendStep(sal) <- SYSargsList(step_name = "gunzip", 
                      targets = "gzip", dir = TRUE,
                      wf_file = "gunzip/workflow_gunzip.cwl", input_file = "gunzip/gunzip.yml",
                      dir_path = system.file("extdata/cwl", package = "systemPipeR"),
                      inputvars = c(gzip_file = "_FILE_PATH_", SampleName = "_SampleName_"), 
                      rm_targets_col = "FileName", 
                      dependency = "gzip")
```

We can check the targets automatically create for this step, 
based on the previous `outfiles`:

```{r targetsWF_3, eval=TRUE}
targetsWF(sal[3])
```

We can also check all the expected `outfiles` for this particular step, as follows:

```{r outfiles_2, eval=TRUE}
outfiles(sal[3])
```

Now, we can observe that the third step has been added and contains one substep.

```{r}
sal
```

In addition, we can access all the command lines for each one of the substeps. 

```{r, eval=TRUE}
cmdlist(sal["gzip"], targets = 1)
```

#### Getting data from a workflow instance 

The final step in this simple workflow is an R code step. For that, we are using
the `LineWise` constructor function as demonstrated above. 

One interesting feature showed here is the `getColumn` method that allows 
extracting the information for a workflow instance. Those files can be used in
an R code, as demonstrated below. 

```{r getColumn, eval=TRUE}
getColumn(sal, step = "gunzip", 'outfiles')
```

```{r, iris_stats, eval=TRUE}
appendStep(sal) <- LineWise(code = {
                    df <- lapply(getColumn(sal, step = "gunzip", 'outfiles'), function(x) read.delim(x, sep = ",")[-1])
                    df <- do.call(rbind, df)
                    stats <- data.frame(cbind(mean = apply(df[,1:4], 2, mean), sd = apply(df[,1:4], 2, sd)))
                    stats$species <- rownames(stats)
                    
                    plot <- ggplot2::ggplot(stats, ggplot2::aes(x = species, y = mean, fill = species)) + 
                      ggplot2::geom_bar(stat = "identity", color = "black", position = ggplot2::position_dodge()) +
                      ggplot2::geom_errorbar(ggplot2::aes(ymin = mean-sd, ymax = mean+sd), width = .2, position = ggplot2::position_dodge(.9)) 
                    },
                    step_name = "iris_stats", 
                    dependency = "gzip")
```


# Session 
```{r}
sessionInfo()
```