---
title: "Visualize workflows"
date: "Last update: `r format(Sys.time(), '%d %B, %Y')`"
vignette: |
  %\VignetteEncoding{UTF-8}
  %\VignetteIndexEntry{systemPipeR: Workflow design and reporting generation environment}
  %\VignetteEngine{knitr::rmarkdown}
fontsize: 14pt
editor_options:
  chunk_output_type: console
type: docs
weight: 4
---

```{r setup, echo=TRUE, message=FALSE, warning=FALSE}
suppressPackageStartupMessages({
    library(systemPipeR)
})
```

In the last section, we learned how to [run/manage workflows](../step_run). In this section, we will cover advanced options for visualizing workflows.

First, let's set up the workflow using the example workflow template. For real production purposes, we recommend checking out the more complex templates [here](/spr_wf/).

```{r eval=TRUE, include=FALSE}
# cleaning
try(unlink(".SPRproject", recursive = TRUE), TRUE)
try(unlink("data", recursive = TRUE), TRUE)
try(unlink("results", recursive = TRUE), TRUE)
try(unlink("param", recursive = TRUE), TRUE)
try(unlink("varseq", recursive = TRUE), TRUE)
```

## Dependency graph

The workflow plot is also called the dependency graph. It shows how each step depends on the others. This is very important in SPR: a step will not run unless all of its dependencies have been executed successfully.

To understand a workflow, we can simply print the `sal` object to the console, like so:

```{r eval=FALSE}
sal
## Instance of 'SYSargsList':
##     WF Steps:
##        1. load_library --> Status: Success
##        2. export_iris --> Status: Success
##        3. gzip --> Status: Success
##            Total Files: 3 | Existing: 3 | Missing: 0
##          3.1. gzip
##              cmdlist: 3 | Success: 3
##        4. gunzip --> Status: Success
##            Total Files: 3 | Existing: 3 | Missing: 0
##          4.1. gunzip
##              cmdlist: 3 | Success: 3
##        5. stats --> Status: Success
```

However, when the workflow becomes long and complex, the relationships between steps are hard to see from the console. The workflow plot is a useful tool for understanding the workflow.
For example, the [VARseq workflow](https://systempipe.org/SPvarseq/articles/SPvarseq.html) is complex. We can load and print it with:

```{r eval=FALSE}
systemPipeRdata::genWorkenvir("varseq")
setwd("varseq")
sal <- SPRproject()
sal <- importWF(sal, file_path = "systemPipeVARseq.Rmd")
sal
```

```{r echo=FALSE, collapse=TRUE}
sal <- readRDS("varseq.rds")
sal
```

Printing the `sal` object directly, as above, does not show the dependencies between steps, so it is hard to see the full picture. Here, we can use `plotWF` to visualize the full workflow.

```{r}
plotWF(sal)
```

## Advanced use

The VARseq workflow is too large and complex for demonstration purposes, so for the rest of this section we use the [simple workflow](https://raw.githubusercontent.com/systemPipeR/systemPipeR.github.io/main/static/en/sp/spr/sp_run/spr_simple_wf.md) instead.

```{r}
sal <- SPRproject()
sal <- importWF(sal,
    file_path = system.file("extdata", "spr_simple_wf.Rmd", package = "systemPipeR"),
    verbose = FALSE)
plotWF(sal, rstudio = TRUE)
```

### Rstudio

By default, the plot is opened in a new browser tab, because workflows can be large and long and viewing them in the small RStudio viewer pane is not ideal. This is controlled by the `rstudio` argument (default `rstudio = FALSE`), which determines whether the plot opens in a new browser tab or is displayed inside the current tool, for example an IDE such as RStudio. If you prefer the built-in viewer, or if you are **rendering the R markdown** **from an interactive session** and do not want a new browser tab to open, add `rstudio = TRUE`.

### Height and width

The height and width of the workflow plot are adjustable with `height` and `width`. Both accept any valid [CSS unit](https://www.w3schools.com/cssref/css_units.asp). By default, the plot takes 100% of the [parent element](https://www.w3schools.com/html/html_elements.asp) width and calculates the height automatically as needed. Sometimes these fraction-based or automatically generated sizes are not suitable.
We can set them manually:

```{r}
plotWF(sal, width = "50%", rstudio = TRUE)
```

```{r}
plotWF(sal, height = "300px")
```

### Color and text

On the plot, different colors and numbers indicate different statuses. This information can also be found in the plot legend.

**Shapes:**

- circular steps: pure R code steps
- rounded square steps: `sysargs` steps, which invoke command-line calls
- blue colored steps and arrows: main branch (see the [main branch](#main-branch) section below)

**Step colors**

- Black: pending steps
- Green: successful steps, all samples passed
- Orange: successful steps, but some samples have warnings
- Red: failed steps, at least one sample failed

**Number and colors**
There are 4 numbers in the second row of each step, separated by `/`:

- First No.: number of passed samples
- Second No.: number of warning samples
- Third No.: number of error samples
- Fourth No.: number of total samples

**Duration**
The duration is shown after the sample information and reports how long the step took to run. Units are seconds (**s**), minutes (**m**), or hours (**h**).

For example, let's append a warning step and an error step to `sal`:

```{r}
appendStep(sal) <- LineWise(step_name = "warning_step",
    {warning("this creates a warning")}, dependency = "stats")
appendStep(sal) <- LineWise(step_name = "error_step",
    {stop("this creates an error")}, dependency = "stats")
sal
```

```{r eval=FALSE}
sal <- runWF(sal)
```

```{r error=TRUE, collapse=TRUE, include=FALSE}
sal <- runWF(sal)
```

Then let's plot it:

```{r}
plotWF(sal, width = "80%")
```

Do you see the color difference?

### On hover

By default, `plotWF` uses SVG to make the plot, so it is interactive. When the mouse hovers over a step, detailed information is displayed, such as sample information, processing time, duration, _etc_.

![](../plotwf_hover.png)

### Embedding

In addition to SVG embedding, PNG embedding is supported. The plot is then no longer interactive, but this is a good option for browsers without good SVG support.

```{r}
plotWF(sal, plot_method = "png", width = "80%")
```

Right-clicking on the SVG and PNG plots shows that SVGs cannot be saved directly, but PNGs can. However, PNGs are not vector graphics, so they become blurry when zoomed in.

### Responsiveness

This term is often used in web development. It describes whether the plot resizes itself when the user resizes the document window. By default, `plotWF` is responsive: it fits the current container size and adjusts once the window size changes. To always display the full-sized plot, use `responsive = FALSE`. This is useful for embedding the plot in full-screen mode.

```{r}
plotWF(sal, responsive = FALSE, width = "80%")
```

Now resize your window width and compare the plot above with the other plots.

### Pan-zoom

The pan-zoom option lets users drag the plot instead of scrolling and use the mouse wheel to zoom in and out.
If you do not like the scroll bars that come with `responsive = FALSE`, try this option. Note that it cannot be combined with `responsive = TRUE`. If both are `TRUE`, `responsive` is automatically set to `FALSE`. Enabling this feature requires an internet connection, because the JavaScript libraries are downloaded on the fly.

```{r}
plotWF(sal, pan_zoom = TRUE)
```

### Layout

There are a few different layouts to choose from. There is no single best layout; it all depends on the structure of your workflow. The default is `compact`, but we recommend trying different layouts to find the best fit.

- `compact`: tries to plot steps as close together as possible.
- `vertical`: the main branch is placed vertically, side branches are placed on the same horizontal level, and sub-steps of side branches are placed vertically.
- `horizontal`: the main branch is placed horizontally, and side branches and sub-steps are placed vertically.
- `execution`: a linear plot that shows the execution order of all steps.

Here we refer to the concept of the **main branch**, which is a way to decide the center of the plot. It is discussed in more detail below.

**vertical**

```{r}
plotWF(sal, layout = "vertical", height = "600px")
```

If the plot is very long, use `height` to make it smaller.

**horizontal**

```{r}
plotWF(sal, layout = "horizontal")
```

**execution**

```{r}
plotWF(sal, layout = "execution", height = "600px", responsive = FALSE)
```

If the plot is too long, we can use `height` to limit it and/or use `responsive` to make it scrollable.

### Main branch

From the examples above, you can see that many steps do not point to any downstream step. These dead ends are called ending steps. If we connect the first step, the steps in between, and one of these ending steps, we get a branch. Imagine the workflow as a top-down tree whose root is the first step. There are therefore many possible ways to connect the workflow.
For the convenience of plotting, we introduce the concept of the _"main branch"_: one of the possible connecting strategies, which is placed at the center of the plot. Steps that are not in this main branch are arranged around it. The _"main branch"_ does not affect how a workflow is run; it is only used by the algorithm to **compute the best visualization**, so it only affects how the workflow is plotted. It has little impact on the `compact` layout but a large effect on the `horizontal` and `vertical` layouts.

By default, the algorithm in `plotWF` automatically chooses a branch for you. In simple terms, it favors branches that (a) connect the first and last steps and (b) are as long as possible.

You can also choose a branch yourself with `branch_method = "choose"`. This first lists all possible branches and then prompts you for your preferred branch. Because no prompt is possible while rendering the R Markdown, we combine it here with a second argument, `branch_no = x`, to choose a branch directly and skip the prompt. The `verbose = TRUE` option can be used to imitate the branch listing in the console. In a real case, you only need `branch_method = "choose"`.

To have the main branch marked, add `mark_main_branch = TRUE` (default `FALSE`).

Watch closely how the plot changes when different branches are chosen. Here we use the `vertical` layout for the demo. Remember, the main branch is marked in blue.

Choose branch 1

```{r collapse=TRUE}
plotWF(sal, mark_main_branch = TRUE, layout = "vertical",
    branch_method = "choose", branch_no = 1, verbose = FALSE, height = "450px")
```

Choose branch 2

```{r collapse=TRUE}
plotWF(sal, mark_main_branch = TRUE, layout = "vertical",
    branch_method = "choose", branch_no = 2, verbose = FALSE, height = "450px")
```

Do you see the difference?
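
To make the idea of branches more concrete, here is a minimal base R sketch that enumerates every branch of a small workflow and picks one with the two preferences described above. It is not the code used inside `plotWF`. The `deps` table is a hypothetical stand-in that assumes the simple workflow is a linear chain up to `stats`, followed by the two appended steps, and the helper `find_branches` is invented for this illustration.

```r
# Hypothetical dependency table (an assumption for illustration):
# each step lists the steps it depends on.
deps <- list(
  load_library = character(0),
  export_iris  = "load_library",
  gzip         = "export_iris",
  gunzip       = "gzip",
  stats        = "gunzip",
  warning_step = "stats",
  error_step   = "stats"
)

# Invert the table: for each step, which steps directly follow it?
children <- lapply(names(deps), function(step) {
  names(deps)[vapply(deps, function(d) step %in% d, logical(1))]
})
names(children) <- names(deps)

# Depth-first search collecting every path from `step` to an ending step
# (a step with no downstream steps).
find_branches <- function(step, path = character(0)) {
  path <- c(path, step)
  nxt <- children[[step]]
  if (length(nxt) == 0) return(list(path))
  do.call(c, lapply(nxt, find_branches, path = path))
}

# Assumes the first entry of `deps` is the root of the workflow.
branches <- find_branches(names(deps)[1])

# Rank branches: prefer those that reach the last step, then longer ones.
last_step    <- names(deps)[length(deps)]
reaches_last <- vapply(branches, function(b) tail(b, 1) == last_step, logical(1))
branch_len   <- vapply(branches, length, integer(1))
main_branch  <- branches[[order(!reaches_last, -branch_len)[1]]]

print(branches)
print(main_branch)
```

Under these assumptions there are two branches, one ending in `warning_step` and one ending in `error_step`. This matches the two choices shown above, although the numbering that `plotWF` assigns to branches may differ from the order in this sketch.
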

### Legends

The legend can be removed with `show_legend = FALSE`.

```{r}
plotWF(sal, show_legend = FALSE, height = "500px")
```

### Output formats

There are currently three output formats: `"html"`, `"dot"`, and `"dot_print"`. If one of the first two is chosen, you also need to provide a path, `out_path`, to save the file.

- html: a single HTML file that contains the plot.
- dot: a DOT script file with the code to reproduce the plot in a [Graphviz](https://graphviz.org/) DOT engine.
- dot_print: prints the DOT script directly to the console.

#### HTML

The HTML format is very useful if you want to view the plot later or share it with other people. It is also helpful when you are working on a remote computer cluster. To view the workflow plot, a browser device (viewer) must be available, which is often not the case on computer clusters. When you plot a workflow and see the message `"Couldn't get a file descriptor referring to the console"`, it means your computer (cluster) does not have a browser device. In that case, saving to HTML format is the best option.

```{r}
plotWF(sal, out_format = "html", out_path = "example_out.html")
file.exists("example_out.html")
```

#### DOT

Saving the workflow plot in `dot` format allows one to reproduce the plot with the Graphviz language.

```{r}
plotWF(sal, out_format = "dot", out_path = "example_out.dot")
file.exists("example_out.dot")
```

#### DOT print

Instead of saving the Graphviz plotting code to a file, this option prints the code directly to the console. If you have a Graphviz plotting engine at hand, simply copy and paste the code into that engine to reproduce the plot. For example, use our [Workflow Plot Editor](https://systempipe.org/sp/spr/viz_editor).

```{r eval=FALSE}
plotWF(sal, out_format = "dot_print")
```

### Saving static image file

Some users may want to save the plot as a static image, such as a `.png` file. This requires some extra work.
The reason we cannot save directly to a PNG file is that the plot is generated in real time by a browser JavaScript engine. A JavaScript engine, such as the one in Chrome, MS Edge, or the RStudio viewer, must render the plot before we can see it, regardless of whether you use SVG or PNG embedding.

#### Interactive

- With the `plot_ctr = TRUE` (default) option, a plot control panel is displayed in the top-left corner. You can choose from formats such as _png_, _jpg_, _svg_, or _pdf_ and download the plot from the webpage. These buttons require an internet connection, because the underlying JavaScript libraries are downloaded on the fly. There are known conflicts between the libraries that create these file formats and the R Markdown web libraries, so some of the buttons may not work inside R Markdown documents such as this vignette. However, they should work properly if the workflow plot is saved to a stand-alone HTML file.
- If you are working in RStudio, you can also use the `export` button in the viewer to save an image file.

> Note: due to conflicts between the web libraries of this website and the libraries used
> in `plotWF`, some buttons may not work when you click them here. They will work when you
> make workflow plots interactively and view them in a stand-alone browser tab.

#### Non-interactive

If you cannot use an interactive session, for example when submitting a job to a cluster, but still want the PNG, we recommend using the {[webshot2](https://github.com/rstudio/webshot2)} package to take a screenshot of the plot. It runs headless Chrome in the back end, which provides a JavaScript engine.
Install the package:

```{r eval=FALSE}
remotes::install_github("rstudio/webshot2")
```

Save the plot to HTML first:

```{r eval=FALSE}
plotWF(sal, out_format = "html", out_path = "example_out.html")
file.exists("example_out.html")
```

Use `webshot2` to save the image:

```{r eval=FALSE}
webshot2::webshot("example_out.html", "example_out.png")
```

### In logs

The workflow steps also become clickable if `in_log = TRUE`. This creates a link for each step that navigates to the corresponding section of the SPR [workflow log file](https://systempipe.org/sp/spr/sp_run/step_reports/). Normally, this option is handled by the SPR log-generating function `renderLogs`, which places the plot at the top of the log file, so that clicking a step navigates to its detailed section further down the page. Visit [this page](../logs_example.html) to see a real example. Try clicking on a step in the workflow plot and watch what happens.

# Session

```{r}
sessionInfo()
```

```{r eval=TRUE, include=FALSE}
# cleaning
try(unlink(".SPRproject", recursive = TRUE), TRUE)
try(unlink("data", recursive = TRUE), TRUE)
try(unlink("results", recursive = TRUE), TRUE)
try(unlink("param", recursive = TRUE), TRUE)
try(unlink("varseq", recursive = TRUE), TRUE)
try(unlink("example_out.dot", recursive = TRUE), TRUE)
try(unlink("example_out.html", recursive = TRUE), TRUE)
```