--- title: "Import workflow from template" fontsize: 14pt editor_options: chunk_output_type: console markdown: wrap: 80 type: docs weight: 2 --- ## Build workflow from a template The precisely [same workflow](../step_interactive) can be created by importing the steps from a template file. In SPR, we use R Markdown files as templates. As demonstrated above, it is required to initialize the project with `SPRproject` function. ```{r setup, echo=TRUE, message=FALSE, warning=FALSE} suppressPackageStartupMessages({ library(systemPipeR) }) ``` ## `importWF` `importWF` function will scan and import all the R chunk from the R Markdown file and build all the workflow instances. Then, each R chuck in the file will be converted in a workflow step. We have prepared the template for you already. Let's first see what is in the template or can be read [here](https://raw.githubusercontent.com/systemPipeR/systemPipeR.github.io/main/static/en/sp/spr/sp_run/spr_simple_wf.md): ```{r comment=""} cat(readLines(system.file("extdata", "spr_simple_wf.Rmd", package = "systemPipeR")), sep = "\n") ``` ### SPR chunks The SPR templates has no difference than a normal R markdown file, except one little thing -- the **SPR chunks**. To make a normal R chunk also a SPR chunk, in the chunk header `spr=TRUE` or `spr=T` option needs to be appended. For example: > *\`\`\`{r step_1, eval=TRUE, spr=TRUE}* > *\`\`\`{r step_2, eval=FALSE, spr=TRUE}* Note here the `eval=FALSE`, by default steps with this option will still be imported, but you can use `ignore_eval` flag to change it in `importWF`. #### Preprocess code Inside SPR chunks, before the actual step definition, there is some special space called preprocess code. Why do need preprocess code? When we import/create the workflow steps, these steps are not really executed when the time of creation, no matter it is a `sysArgs` step or a `Linewise` step. However, in many cases, we need to connect different previous steps' outputs to the inputs of the next. This is easy to handle between `sysArgs` steps via the `targets` argument connection. If it is a `Linewise` step to a `sysArgs` step. Things become tricky. Since `Linewise` code is not run at the time of step definition, no output paths are generated, so the next `sysArgs` step cannot find the inputs. To overcome this problem, preprocess code feature is introduced. Defining preprocess code is very easy. Write any lines of R code below the SPR chunk header line. Right before the step is defined, insert one line of comment of `###pre-end` to indicate the completion of preprocess code. For example: ```{r eval=FALSE} targetspath <- system.file("extdata", "cwl", "example", "targets_example.txt", package = "systemPipeR") ###pre-end appendStep(sal) <- SYSargsList( step_name = "Example", targets = targetspath, wf_file = "example/example.cwl", input_file = "example/example.yml", dir_path = system.file("extdata/cwl", package = "systemPipeR"), inputvars = c(Message = "_STRING_", SampleName = "_SAMPLE_") ) ``` In the example above the targets path is not directly loaded but given through an intermediate variable `targetspath`. This is a simple example, other useful actions like path concatenation, checking file integrity before piping to expensive (slow) functions can also be done in preprocess. Another good example will be the [ChIPseq](https://systempipe.org/SPchipseq/articles/SPchipseq.html) workflow. Watch closely how the output of `LineWise` step [merge_bams](https://systempipe.org/SPchipseq/articles/SPchipseq.html#merge-bam-files-of-replicates-prior-to-peak-calling) is predicted and writing to an intermediate targets file on-the-fly in the preprocess code of [call_peaks_macs_withref](https://systempipe.org/SPchipseq/articles/SPchipseq.html#peak-calling-with-inputreference-sample) so it can be used in the `call_peaks_macs_withref` step creation as input targets. Actually, if the SPR chunk has R code before the step definition but `###pre-end` delimiter is not added, these code will **still** be evaluated at the time of import. However, these lines of code will not be store in the `SYSargsList`, so later when you render the report (`renderReport`) or export the workflow as a new template (`sal2rmd`), these lines will **not** be included. That means, these lines are **one-shot** reprocess code and not reproducible. #### Other rules - For SPR chunks, the last object assigned must to be the `SYSargsList`, for example a `sysArgs2`(commandline) steps: ```{r fromFile_example_rules_cmd, eval=FALSE} targetspath <- system.file("extdata/cwl/example/targets_example.txt", package = "systemPipeR") appendStep(sal) <- SYSargsList(step_name = "Example", targets = targetspath, wf_file = "example/example.cwl", input_file = "example/example.yml", dir_path = system.file("extdata/cwl", package = "systemPipeR"), inputvars = c(Message = "_STRING_", SampleName = "_SAMPLE_")) ``` OR a `Linewise` (R) step: ```{r fromFile_example_rules_r, eval=FALSE} appendStep(sal) <- LineWise(code = { library(systemPipeR) }, step_name = "load_lib") ``` - Also, note that all the required files or objects to generate one particular step must be defined in an imported R code. The motivation for this is that when R Markdown files are imported, the `spr = TRUE` flag will be evaluated, append, and stored in the workflow control class as the `SYSargsList` object. - The workflow object name used in the R Markdown (e.g. `appendStep(sal)`) needs to be the same when the `importWF` function is used. Usually we use the name `sal` ( short abbreviation for `sysargslist`). It is important to keep consistency. If different object names are used, when running the workflow, you can see a error, like *Error:* object not found.. - Special in `importWF`: SPR chunk names will be used as step names, and it has higher priority than the `stepname` argument. For example, the chunk header is `{r step_1, eval=TRUE, spr=TRUE}`, and the inside `SYSargsList` option is `SYSargsList(step_name = "step_99", ...)`. After the import, step name will be overwritten to `"step_1"` instead of `"step_99"`. ### start to import ```{r cleaning3, eval=TRUE, include=FALSE} if (file.exists(".SPRproject_rmd")) unlink(".SPRproject_rmd", recursive = TRUE) ``` ```{r importWF_rmd, eval=TRUE, collapse=TRUE} sal_rmd <- SPRproject(logs.dir = ".SPRproject_rmd") sal_rmd <- importWF(sal_rmd, file_path = system.file("extdata", "spr_simple_wf.Rmd", package = "systemPipeR")) sal_rmd ``` We can see 5 steps are appended to our `sal` object. #### Simple exploration After import, we can explore the workflow to check the steps: ```{r importWF_details} # list individual steps stepsWF(sal_rmd) # list step dependency dependency(sal_rmd) # list R step code codeLine(sal_rmd) # list step targets targetsWF(sal_rmd) ``` ## Update workflow Maybe you have noticed some lines in the importing ```{r eval=FALSE} Template for renderReport is stored at xxxx/.SPRproject_rmd/workflow_template.Rmd Edit this file manually is not recommended ``` It means current import is successful and a copy of your workflow template is copied to this position, and it will be used for `renderReport` as the skeleton. In real data analysis, the workflow template does not always stays the same, e.g. adding some text, new steps to the template. One way we could add new steps is the [interactive method](../step_interactive). The problem is this way does not contain any text description in the final report. `renderReport` has a smart way to insert these new steps that do not exist in the template to the right order but it cannot create text descrption for you. Another way to import new steps or update text in the template is to use `importWF(..., update = TRUE)`. ### Example 1 Let's add a step and some text to [`spr_simple_wf.Rmd`](https://raw.githubusercontent.com/systemPipeR/systemPipeR.github.io/main/static/en/sp/spr/sp_run/spr_simple_wf.md) and try to update. - `update = TRUE` is highly interactive. It uses a Q&A style to ask users things like whether to update preprocess code of certain steps, whether to import certain new steps. In this mode, you can always say "no" to the choice, so you can choose to partially update the template. - Rendering the webpage document is **not** interactive, so here we use `importWF(..., update = TRUE, confirm = TRUE)`, which means confirm all the choices, say "yes" to all. Then, partially update is no longer the option here. For the updated template, you can download [here](https://raw.githubusercontent.com/systemPipeR/systemPipeR.github.io/main/static/en/sp/spr/sp_run/spr_simple_wf_new.md) One step, preprocess code and some description has been added to the end: ````markdown ## A new step This is a new step with some simple code to demonstrate the update of `importWF` `r ''````{r session_info, eval=TRUE, spr=TRUE} cat("some fake preprocess code\n") ###pre-end appendStep(sal) <- LineWise(code={ sessionInfo() }, step_name = "sessionInfo", dependency = "stats") `r ''```` ```` ```{r collapse=TRUE} # the file is with `.md` extension, but `importWF` needs `.Rmd`. # we need to first download and change extension tmp_file <- tempfile(fileext = ".rmd") download.file( "https://raw.githubusercontent.com/systemPipeR/systemPipeR.github.io/main/static/en/sp/spr/sp_run/spr_simple_wf_new.md", tmp_file ) sal_rmd <- importWF(sal_rmd, file_path = tmp_file, update = TRUE, confirm = TRUE) ``` We can see under update mode, `importWF` compare the old template and the new template and find the difference. List all differences to users. It includes: - List all new steps - Compare step orders, update if needed - update line number records of steps from old template to new template. - update preprocess code - finally import new steps A new step has been successfully imported from the new template. ```{r} sal_rmd ``` Under interactive mode, users would have a lot more options. For example, when adding a new step, `importWF` has a back-tracking algorithm that automatically detects the right order where this step should be appended. However, things can go wrong and it does not work 100%. Under interactive mode, the program first lists the previous step where this new step would be appended after, and then users have the option to choose whether this is the correct step. If not, a new prompt would pop up to let the users to manually choose the right order to append the new step. See the gif below. > This does not mean you could append a step to any place. It also has to meet > the dependency requirement. For example, this new step is depend on step 5 but > you manually choose to append it after step 1. Then, the import would fail. ![](../importwf_interactive.gif) ### Example 2 Let's see another [example](https://raw.githubusercontent.com/systemPipeR/systemPipeR.github.io/main/static/en/sp/spr/sp_run/spr_simple_wf_new_precode_changed.md) how `importWF` update preprocess code and line numbers ```{r collapse=TRUE} tmp_file2 <- tempfile(fileext = ".rmd") download.file( "https://raw.githubusercontent.com/systemPipeR/systemPipeR.github.io/main/static/en/sp/spr/sp_run/spr_simple_wf_new_precode_changed.md", tmp_file2 ) sal_rmd <- importWF(sal_rmd, file_path = tmp_file2, update = TRUE, confirm = TRUE) ``` > Note for existing steps, and their preprocess code, they are not re-imported or > re-evaluated. ![](../importwf_interactive_preprocess.gif) ### Colors Rendering the web document is not interactive, so colors are also removed. It is only gray color in code chunks above, but in the actual interactive mode, multiple colors are used to indicate the status as you have seen in the gifs. ## Advanced templates There are quite a few pre-configed templates that is provided by the [systemPipeRdata](/sp/sprdata/) package. You can also take a look at them individual [here](/spr_wf/) ## Session ```{r} sessionInfo() ``` ```{r eval=TRUE, include=FALSE} # cleaning try(unlink(".SPRproject_rmd", recursive = TRUE), TRUE) try(unlink("data", recursive = TRUE), TRUE) try(unlink("results", recursive = TRUE), TRUE) try(unlink("param", recursive = TRUE), TRUE) ```