---
title: "Visualize workflows"
date: "Last update: `r format(Sys.time(), '%d %B, %Y')`"
vignette: |
  %\VignetteEncoding{UTF-8}
  %\VignetteIndexEntry{systemPipeR: Workflow design and reporting generation environment}
  %\VignetteEngine{knitr::rmarkdown}
fontsize: 14pt
editor_options:
  chunk_output_type: console
type: docs
weight: 4
---
```{r setup, echo=TRUE, message=FALSE, warning=FALSE}
suppressPackageStartupMessages({
library(systemPipeR)
})
```
In the last section, we learned how to [run/manage workflows](../step_run).
In this section, we will learn advanced options for visualizing workflows.
First, let's set up the workflow using the example workflow template. For real
production use, we recommend checking out the more complex [workflow templates](/spr_wf/).
```{r eval=TRUE, include=FALSE}
# cleaning
try(unlink(".SPRproject", recursive = TRUE), TRUE)
try(unlink("data", recursive = TRUE), TRUE)
try(unlink("results", recursive = TRUE), TRUE)
try(unlink("param", recursive = TRUE), TRUE)
try(unlink("varseq", recursive = TRUE), TRUE)
```
## Dependency graph
The workflow plot is also called the dependency graph. It shows how each
step depends on other steps. Dependencies are central to SPR: a step will not
be run unless all of its dependencies have been executed successfully.
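The rule above can be illustrated with a small, self-contained base R sketch. It is not how systemPipeR resolves dependencies internally. It assumes a hypothetical list in which each step names the steps it depends on (the edges are assumed from the step order of the example workflow), derives a valid execution order with a simple variant of Kahn's topological sort, and lists the steps that would be blocked if one step failed.

```{r eval=FALSE}
# Hypothetical dependency list: each step maps to the steps it depends on.
# Step names follow the example workflow printed below; the edges are assumed.
deps <- list(
  load_library = character(0),
  export_iris  = "load_library",
  gzip         = "export_iris",
  gunzip       = "gzip",
  stats        = "gunzip"
)

# Kahn-style ordering: a step becomes ready once all of its dependencies are done
execution_order <- function(deps) {
  done <- character(0)
  remaining <- names(deps)
  while (length(remaining) > 0) {
    ready <- remaining[vapply(remaining, function(s) all(deps[[s]] %in% done), logical(1))]
    if (length(ready) == 0) {
      stop("circular dependency among: ", paste(remaining, collapse = ", "))
    }
    done <- c(done, ready)
    remaining <- setdiff(remaining, ready)
  }
  done
}

# Steps that cannot run because a step they (indirectly) depend on failed
blocked_by <- function(deps, failed) {
  blocked <- failed
  repeat {
    hit <- vapply(deps, function(d) any(d %in% blocked), logical(1))
    new <- setdiff(names(deps)[hit], blocked)
    if (length(new) == 0) break
    blocked <- c(blocked, new)
  }
  setdiff(blocked, failed)
}

execution_order(deps)     # "load_library" "export_iris" "gzip" "gunzip" "stats"
blocked_by(deps, "gzip")  # "gunzip" "stats"
```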
To get an overview of a workflow, we can simply print the `sal` object in the console,
like so:
```{r eval=FALSE}
sal
## Instance of 'SYSargsList':
## WF Steps:
## 1. load_library --> Status: Success
## 2. export_iris --> Status: Success
## 3. gzip --> Status: Success
## Total Files: 3 | Existing: 3 | Missing: 0
## 3.1. gzip
## cmdlist: 3 | Success: 3
## 4. gunzip --> Status: Success
## Total Files: 3 | Existing: 3 | Missing: 0
## 4.1. gunzip
## cmdlist: 3 | Success: 3
## 5. stats --> Status: Success
```
However, when a workflow becomes long and complex, the relationships between steps are
hard to see from the console output. The workflow plot is a more useful tool for understanding the workflow.
For example, the [VARseq workflow](https://systempipe.org/SPvarseq/articles/SPvarseq.html)
is complex. We can load it with:
```{r eval=FALSE}
systemPipeRdata::genWorkenvir("varseq")
setwd("varseq")
sal <- SPRproject()
sal <- importWF(sal, file_path = "systemPipeVARseq.Rmd")
sal
```
```{r echo=FALSE, collapse=TRUE}
sal <- readRDS("varseq.rds")
sal
```
Printing the `sal` object directly, as above, does not show the dependencies between
steps, so it is hard to see the full picture. Instead, we can use `plotWF` to
visualize the full workflow.
```{r}
plotWF(sal)
```
## Advanced use
The VARseq workflow is too large and complex for a demonstration, so from here on we use the [simple workflow](https://raw.githubusercontent.com/systemPipeR/systemPipeR.github.io/main/static/en/sp/spr/sp_run/spr_simple_wf.md).
```{r}
sal <- SPRproject()
sal <- importWF(sal, file_path = system.file("extdata", "spr_simple_wf.Rmd", package = "systemPipeR"), verbose = FALSE)
plotWF(sal, rstudio = TRUE)
```
### RStudio
By default, the plot opens in a new browser tab, because workflows
can be very large and long, and the small RStudio viewer pane is not ideal for them.
This is controlled by the `rstudio` argument (default `rstudio = FALSE`), which decides
whether the plot opens in a new browser tab or inside the current tool,
for example the viewer of an IDE like RStudio.
If you prefer the built-in viewer, or you are **rendering R Markdown
from an interactive session** and do not want a new browser tab
to open, add `rstudio = TRUE`.
### Height and width
The workflow plot height and width are adjustable with `height` and `width`. They
accept any valid [CSS unit](https://www.w3schools.com/cssref/css_units.asp). By
default, the plot takes 100% of the [parent element](https://www.w3schools.com/html/html_elements.asp)
width, and its height is calculated automatically as needed. When these
relative or automatically computed sizes do not fit,
we can set them manually:
```{r}
plotWF(sal, width = "50%", rstudio = TRUE)
```
```{r}
plotWF(sal, height = "300px")
```
### Color and text
On the plot, shapes, colors and numbers indicate the type and status of each step. This information
is also summarized in the plot legend.
**Shapes:**
- Circles: pure R code steps
- Rounded squares: `sysargs` steps, i.e. steps that invoke command-line calls
- Blue steps and arrows: the main branch, when `mark_main_branch = TRUE` (see the [main branch](#main-branch) section below)
**Step colors:**
- Black: pending steps
- Green: successful steps; all samples passed
- Orange: successful steps, but some samples have warnings
- Red: failed steps; at least one sample failed
**Numbers and colors:**
The second row of each step shows four numbers separated by `/`:
- First: number of passed samples
- Second: number of samples with warnings
- Third: number of samples with errors
- Fourth: total number of samples
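As a rough illustration of how the four numbers relate to the step colors, the hypothetical helpers below (not part of systemPipeR) parse a `passed/warning/error/total` label and apply the color rules listed above. The rule order (errors before warnings) and treating any step that has not fully passed as pending are assumptions of this sketch, not `plotWF` internals.

```{r eval=FALSE}
# Hypothetical helper: split a "passed/warning/error/total" label into named counts
parse_step_counts <- function(label) {
  n <- as.integer(strsplit(label, "/", fixed = TRUE)[[1]])
  if (length(n) != 4) stop("expected four numbers separated by '/'")
  setNames(n, c("passed", "warning", "error", "total"))
}

# Hypothetical helper: map counts to the legend colors (assumed rule order)
step_color <- function(counts) {
  if (counts[["error"]] > 0) return("red")      # at least one sample failed
  if (counts[["warning"]] > 0) return("orange") # ran, but with warnings
  if (counts[["total"]] > 0 && counts[["passed"]] == counts[["total"]]) return("green")
  "black"                                       # otherwise treat as pending
}

step_color(parse_step_counts("3/0/0/3"))  # "green"
step_color(parse_step_counts("2/1/0/3"))  # "orange"
step_color(parse_step_counts("0/0/1/1"))  # "red"
step_color(parse_step_counts("0/0/0/3"))  # "black"
```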
**Duration:**
Shown after the sample information, this is how long the step took to run.
The unit is seconds (**s**), minutes (**m**), or hours (**h**), depending on the run time.
For example, let's append a warning step and an error step to `sal`:
```{r}
appendStep(sal) <- LineWise(step_name = "warning_step", {warning("this creates a warning")}, dependency = "stats")
appendStep(sal) <- LineWise(step_name = "error_step", {stop("this creates an error")}, dependency = "stats")
sal
```
```{r eval=FALSE}
sal <- runWF(sal)
```
```{r error=TRUE, collapse=TRUE, include=FALSE}
sal <- runWF(sal)
```
Then let's plot it
```{r}
plotWF(sal, width = "80%")
```
Do you see the color difference?
### On hover
By default, `plotWF` renders the plot as SVG, which makes it interactive.
When the mouse hovers over a step, detailed information is displayed,
such as sample information, processing time, and duration.
![](../plotwf_hover.png)
### Embedding
In addition to SVG embedding, PNG embedding is supported. The plot is then
no longer interactive, but this option is good for browsers without proper SVG support.
```{r}
plotWF(sal, plot_method = "png", width = "80%")
```
If you right-click the SVG and PNG plots, you will see that SVGs cannot be saved directly,
but PNGs can. However, PNGs are not vector graphics, so they become blurry when
zoomed in.
### Responsiveness
This term is often used in web development: does the plot resize itself when the user
resizes the document window? By default, `plotWF` is responsive, meaning it
fits the size of the current container and adjusts its size whenever the window size
changes. To always display the full-sized plot, use `responsive = FALSE`. This is useful
for embedding the plot in full-screen mode.
```{r}
plotWF(sal, responsive = FALSE, width = "80%")
```
Now resize your window and compare the plot above with the other plots.
### Pan-zoom
The pan-zoom option lets users drag the plot instead of scrolling, and
use the mouse wheel to zoom in/out. If you do not like the scroll bars with `responsive = FALSE`,
try this option. Note that it cannot be combined with `responsive = TRUE`:
if both are `TRUE`, `responsive` is automatically set to `FALSE`. This feature
requires an internet connection, because the JavaScript libraries are downloaded on the fly.
```{r}
plotWF(sal, pan_zoom = TRUE)
```
### Layout
There are a few different layouts to choose from. No layout is best in every case; it depends
on the structure of your workflow. The default is `compact`, but we recommend
trying different layouts to find the best fit.
- `compact`: places steps as close together as possible.
- `vertical`: the main branch is placed vertically, side branches are placed
  on the same horizontal level, and sub-steps of side branches are placed
  vertically.
- `horizontal`: the main branch is placed horizontally, and side branches and sub-steps
  are placed vertically.
- `execution`: a linear plot that shows the execution order of all workflow steps.
These descriptions rely on the concept of the **main branch**, which decides
what is placed at the center of the plot. It is discussed in more detail below.
**vertical**
```{r}
plotWF(sal, layout = "vertical", height = "600px")
```
If the plot is very long, use `height` to make it smaller.
**horizontal**
```{r}
plotWF(sal, layout = "horizontal")
```
**execution**
```{r}
plotWF(sal, layout = "execution", height = "600px", responsive = FALSE)
```
If the plot is too long, we can use `height` to limit it and/or set `responsive = FALSE`
to make it scrollable.
### Main branch
From the examples above, you can see that many steps do not point to any
downstream steps. These dead ends are called ending steps. Connecting the first step,
the steps in between, and one of these ending steps forms a branch. Imagine the workflow as
a top-to-bottom tree whose root is the first step: there are
many possible branches. For the convenience of plotting, we
introduce the concept of a _"main branch"_: one of these possible branches,
which is placed at the center of the plot. Steps that are not
on the main branch are arranged around it. The _"main branch"_ does not
affect how a workflow is run; it is only used to **compute the best visualization**,
but it does affect how the workflow is plotted.
The main branch has little impact on the `compact` layout but a large
effect on the `horizontal` and `vertical` layouts.
By default, `plotWF` automatically chooses the best branch for
you. In simple terms, it favors branches that (a) connect the first and last steps
and (b) are as long as possible.
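The heuristic above can be sketched in base R. This is a simplified illustration, not the algorithm `plotWF` actually uses. It assumes a hypothetical dependency list for the example workflow (including the `warning_step` and `error_step` appended after `stats`), listed in workflow order so that the first entry is the root and the last entry is the last step. It enumerates every branch from the root to an ending step with a depth-first search, then ranks branches first by whether they end at the last step and then by length.

```{r eval=FALSE}
# Hypothetical dependency list in workflow order (edges are assumed)
deps <- list(
  load_library = character(0),
  export_iris  = "load_library",
  gzip         = "export_iris",
  gunzip       = "gzip",
  stats        = "gunzip",
  warning_step = "stats",
  error_step   = "stats"
)

# Invert the dependency list: which steps depend directly on `step`?
children_of <- function(deps, step) {
  names(deps)[vapply(deps, function(d) step %in% d, logical(1))]
}

# Depth-first search from `step` to every ending step (a step with no children)
all_branches <- function(deps, step = names(deps)[1]) {
  kids <- children_of(deps, step)
  if (length(kids) == 0) return(list(step))
  branches <- list()
  for (k in kids) {
    for (b in all_branches(deps, k)) branches <- c(branches, list(c(step, b)))
  }
  branches
}

# Rank: (a) branches ending at the last step first, (b) then longer branches
pick_main_branch <- function(deps) {
  branches <- all_branches(deps)
  last <- names(deps)[length(deps)]
  ends_last <- vapply(branches, function(b) tail(b, 1) == last, logical(1))
  branches[[order(-ends_last, -lengths(branches))[1]]]
}

length(all_branches(deps))      # 2
tail(pick_main_branch(deps), 1) # "error_step"
```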
You can also pick a branch yourself with `branch_method = "choose"`. It first
lists all possible branches and then prompts you to choose one.
Here, because a prompt is not possible while rendering R Markdown, we combine it with a second argument,
`branch_no = x`, to choose a branch directly and skip the prompt. Setting `verbose = TRUE`
imitates the branch listing in the console (the chunks below keep `verbose = FALSE`). In real use,
you only need `branch_method = "choose"`.
To mark the main branch, add `mark_main_branch = TRUE` (default `FALSE`).
Watch closely how the plot changes when different branches are chosen. Here we use the `vertical`
layout for the demo. Remember, the main branch is marked in blue.
Choose branch 1
```{r collapse=TRUE}
plotWF(sal, mark_main_branch = TRUE, layout = "vertical", branch_method = "choose", branch_no = 1, verbose = FALSE, height = "450px")
```
Choose branch 2
```{r collapse=TRUE}
plotWF(sal, mark_main_branch = TRUE, layout = "vertical", branch_method = "choose", branch_no = 2, verbose = FALSE, height = "450px")
```
Do you see the difference?
### Legends
The legend can be removed with `show_legend = FALSE`:
```{r}
plotWF(sal, show_legend = FALSE, height = "500px")
```
### Output formats
There are currently three output formats: `"html"`, `"dot"`, and `"dot_print"`. If one of the first
two is chosen, you also need to provide a path, `out_path`, to save the file.
- html: a single HTML file containing the plot.
- dot: a DOT script file with the code to reproduce the plot with a [Graphviz](https://graphviz.org/)
  DOT engine.
- dot_print: prints (`cat`s) the DOT script directly to the console.
#### HTML
The HTML format is very useful if you want to view the plot later or share it with other
people. It is also helpful when you are working on a remote computer
cluster. To view the workflow plot, a browser device (viewer) must be available,
but this is often not the case on computer clusters. If you plot a workflow
and see the message `"Couldn't get a file descriptor referring to the console"`,
your computer (cluster) does not have a browser device, and saving to
HTML format is the best option.
```{r}
plotWF(sal, out_format = "html", out_path = "example_out.html")
file.exists("example_out.html")
```
#### DOT
Saving the workflow plot in `dot` format allows you to reproduce the plot with
Graphviz, using the DOT language.
```{r}
plotWF(sal, out_format = "dot", out_path = "example_out.dot")
file.exists("example_out.dot")
```
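Outside of the browser, a saved DOT file can be rendered with the Graphviz `dot` command-line tool (`dot -Tsvg input.dot -o output.svg`). The chunk below is a minimal sketch that assumes Graphviz is installed and on the `PATH`. To stay self-contained, it writes a tiny hypothetical stand-in graph to a temporary file; replace `dot_file` with `"example_out.dot"` to render the workflow plot saved above. Depending on which DOT features `plotWF` emits, the Graphviz output may not look identical to the browser version.

```{r eval=FALSE}
# Tiny stand-in DOT graph (hypothetical) so the chunk runs on its own
dot_file <- tempfile(fileext = ".dot")
writeLines(c(
  "digraph {",
  "  load_library -> export_iris -> gzip -> gunzip -> stats",
  "}"
), dot_file)

svg_file <- sub("\\.dot$", ".svg", dot_file)
dot_bin <- unname(Sys.which("dot"))  # "" if Graphviz is not on the PATH
if (nzchar(dot_bin)) {
  # Graphviz CLI: dot -T<format> <input> -o <output>
  system2(dot_bin, c("-Tsvg", shQuote(dot_file), "-o", shQuote(svg_file)))
}
```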
#### DOT print
Instead of saving the Graphviz plotting code to a file, this option
prints the code directly to the console. If you have a Graphviz rendering engine at hand,
simply copy and paste the code into it to reproduce the plot, for example
into our [Workflow Plot Editor](https://systempipe.org/sp/spr/viz_editor).
```{r eval=FALSE}
plotWF(sal, out_format = "dot_print")
```
### Saving a static image file
Some users may want to save the plot as a static image, such as a `.png` file. This
takes some extra work. The reason we cannot directly save it to
a PNG file is that the plot is generated in real time by a browser JavaScript engine. A
JavaScript engine, such as Chrome, Microsoft Edge, or the RStudio viewer, is required
to render the plot before we can see it, regardless of whether SVG or PNG embedding is used.
#### Interactive
- With the `plot_ctr = TRUE` option (default), a plot control panel is displayed
  in the top-left corner. From there, the plot can be downloaded from the webpage in different
  formats, such as _png_, _jpg_, _svg_, or _pdf_. These buttons require an internet connection,
  because the underlying JavaScript libraries are downloaded on the fly. There are known conflicts
  between these format-export libraries and the R Markdown web libraries, so some of these buttons
  may not work inside R Markdown documents such as this vignette. However, they
  should work properly if the workflow plot is saved to a stand-alone HTML file.
- If you are working in RStudio, you can also use the `export` button in the viewer to save
  an image file.
> Note: due to conflicts between the web libraries of this website and the libraries used
> in `plotWF`, some buttons may not work when you click them here. They will work when you
> make workflow plots interactively and view them in a stand-alone browser tab.
#### Non-interactive
If you cannot run an interactive session, for example when submitting a job to a cluster,
but still want a PNG, we recommend the {[webshot2](https://github.com/rstudio/webshot2)}
package to take a screenshot of the plot. It runs headless Chrome (which has a JavaScript engine) in the back end.
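Install the package from CRAN (or the development version from GitHub):
```{r eval=FALSE}
install.packages("webshot2")
# development version:
# remotes::install_github("rstudio/webshot2")
```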
Save to HTML first:
```{r eval=FALSE}
plotWF(sal, out_format = "html", out_path = "example_out.html")
file.exists("example_out.html")
```
Use `webshot2` to save the image
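```{r eval=FALSE}
# Screenshot the saved HTML file with headless Chrome
webshot2::webshot("example_out.html", "example_out.png")
# If the plot is captured before it finishes rendering, a longer `delay` (seconds)
# and a higher `zoom` may help, e.g.:
# webshot2::webshot("example_out.html", "example_out.png", delay = 2, zoom = 2)
```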
### In logs
The workflow steps also become clickable if `in_log = TRUE`. This creates links
for each step that navigate to the corresponding section in the SPR
[workflow log file](https://systempipe.org/sp/spr/sp_run/step_reports/). Normally, this option
is handled by the SPR log-generating function `renderLogs`, which places this plot at the top of the log file,
so that clicking a step navigates to its detailed section further down the page.
Visit [this page](../logs_example.html) to see a real example. Try clicking
a step in the workflow plot and watch what happens.
# Session
```{r}
sessionInfo()
```
```{r eval=TRUE, include=FALSE}
# cleaning
try(unlink(".SPRproject", recursive = TRUE), TRUE)
try(unlink("data", recursive = TRUE), TRUE)
try(unlink("results", recursive = TRUE), TRUE)
try(unlink("param", recursive = TRUE), TRUE)
try(unlink("varseq", recursive = TRUE), TRUE)
try(unlink("example_out.dot", recursive = TRUE), TRUE)
try(unlink("example_out.html", recursive = TRUE), TRUE)
```