---
title: "Visualize workflows" 
date: "Last update: `r format(Sys.time(), '%d %B, %Y')`" 
vignette: |
  %\VignetteEncoding{UTF-8}
  %\VignetteIndexEntry{systemPipeR: Workflow design and reporting generation environment}
  %\VignetteEngine{knitr::rmarkdown}
fontsize: 14pt
editor_options: 
  chunk_output_type: console
type: docs
weight: 4
---


```{r setup, echo=TRUE, message=FALSE, warning=FALSE}
suppressPackageStartupMessages({
    library(systemPipeR)
})
```

In the last section, we have learned how to [run/manage workflows](../step_run). 
In this section, we will learn advanced options how to visualize workflows. 


First let's set up the workflow using the example workflow template. For real 
production purposes, we recommend you to check out the complex templates over [here](/spr_wf/).

```{r eval=TRUE, include=FALSE}
# cleaning
try(unlink(".SPRproject", recursive = TRUE), TRUE)
try(unlink("data", recursive = TRUE), TRUE)
try(unlink("results", recursive = TRUE), TRUE)
try(unlink("param", recursive = TRUE), TRUE)
try(unlink("varseq", recursive = TRUE), TRUE)
```

## dependency graph
The workflow plot is also called the dependency graph. It shows users how 
one step is depend on another. This is very important in SPR. A step will not 
be run unless all dependencies has been executed successfully. 

To understand a workflow, we can simply call the sal object to print on console
like so

```{r eval=FALSE}
sal
## Instance of 'SYSargsList': 
##     WF Steps:
##        1. load_library --> Status: Success
##        2. export_iris --> Status: Success
##        3. gzip --> Status: Success
##            Total Files: 3 | Existing: 3 | Missing: 0
##          3.1. gzip
##              cmdlist: 3 | Success: 3
##        4. gunzip --> Status: Success
##            Total Files: 3 | Existing: 3 | Missing: 0
##          4.1. gunzip
##              cmdlist: 3 | Success: 3
##        5. stats --> Status: Success
```

However, when the workflow becomes very long and complex, the relation between steps are 
hard to see from console. Workflow plot is the useful tool to understand the workflow.

For example, the [VARseq workflow](https://systempipe.org/SPvarseq/articles/SPvarseq.html) 
is complex, we can show it by: 

```{r eval=FALSE}
systemPipeRdata::genWorkenvir("varseq")
setwd("varseq")
sal <- SPRproject()
sal <- importWF(sal, file_path = "systemPipeVARseq.Rmd")
sal
```


```{r echo=FALSE, collapse=TRUE}
sal <- readRDS("varseq.rds")
sal
```


Directly printing the `sal` object as above does not give us the dependencies between 
steps and it is hard to see the full picture. Here, we can use `plotWF` to 
visualize the full workflow. 
```{r}
plotWF(sal)
```


## Advanced use 
The VARseq workflow is too large and too complex. Here, for demonstration purposes, we still use the [simple workflow](https://raw.githubusercontent.com/systemPipeR/systemPipeR.github.io/main/static/en/sp/spr/sp_run/spr_simple_wf.md).
```{r}
sal <- SPRproject()
sal <- importWF(sal, file_path = system.file("extdata", "spr_simple_wf.Rmd", package = "systemPipeR"), verbose = FALSE)
plotWF(sal, rstudio = TRUE)
```


### Rstudio

By default, the plot is opened in a new browser tab, because workflow 
can be very large and long. Viewing in the small Rstudio window is not ideal. 
This is controlled by the `rstudio` argument, and it is default `rstudio = FALSE`. It means 
whether to open the plot in a new browser or view it inside the current tool, 
for example many people use IDEs like Rstudio. 
If you insist to view it in the built-in viewer, or sometimes **rendering the R markdown**
**from an interactive session**, where we do not want 
to open in a new browser tab, `rstudio = TRUE` must be added. 

### Height and width 
Workflow plot height and width are adjustable by `height` and `width`. They can 
take any valid [CSS unit](https://www.w3schools.com/cssref/css_units.asp). By 
default, it take 100% of the [parent element](https://www.w3schools.com/html/html_elements.asp)
width, and automatically calculate the height based on need. Sometimes these 
fraction based or automatically generated units are not right. 

We can manually set them 
```{r}
plotWF(sal, width = "50%", rstudio = TRUE)
```


```{r}
plotWF(sal, height = "300px")
```

### Color and text

On the plot, different colors and numbers indicate different status. This information 
can be found also in the plot legends.

**Shapes:**

- circular steps: pure R code steps
- rounded squares steps: `sysargs` steps, steps that will invoke command-line calls
- blue colored steps and arrows: main branch (see [main branch](#main-branch) section below)

**Step colors**

- black: pending steps
- <span style="color: green">Green</span>: successful steps, all pass
- <span style="color: orange">Orange</span>: successful steps, but some samples have warning
- <span class="text-danger">Red</span>: failed steps, at least one sample failed

**Number and colors**

<br>There are 4 numbers in the second row of each step, separated by `/`

- <span style="color: green">First No.</span>: number of passed samples
- <span style="color: orange">Second No.</span>: number of warning samples
- <span class="text-danger">Third No.</span>: number of error samples
- <span style="color:blue;">Fourth No.</span>: number of total samples

**Duration**

<br>This is shown after the sample information, as how long it took to run this step. 
Units are a few seconds (**s**), some minutes (**m**), or some hours (**h**). 

For example, let's append a warning step and an error step to the `sal`
```{r}
appendStep(sal) <- LineWise(step_name = "warning_step", {warning("this creates a warning")}, dependency = "stats")
appendStep(sal) <- LineWise(step_name = "error_step", {stop("this creates an error")}, dependency = "stats")
sal
```

```{r eval=FALSE}
sal <- runWF(sal)
```

```{r error=TRUE, collapse=TRUE, include=FALSE}
sal <- runWF(sal)
```


Then let's plot it 
```{r}
plotWF(sal, width = "80%")
```

Do you see the color difference?

### On hover

By default `plotWF` uses SVG to make the plot so it is interactive. 
When the mouse is hovering on each step, detailed information will be displayed,
like sample information, processing time, duration, _etc_.

![](../plotwf_hover.png)

### Embedding

In additional to SVG embedding, PNG embedding is supported, but the plot will 
no longer be interactively, good for browsers without optimal SVG support. 

```{r}
plotWF(sal, plot_method = "png", width = "80%")
```

Right click on the plot of SVG and PNG, we can see, SVGs are not directly savable,
but PNGs are. However, PNGs are not vectorized, so it means it becomes blurry when 
we zoom in. 

### Responsiveness
This is a term often used in web development. It means will the plot resize itself if the user 
resize the document window? By default, `plotWF` will be responsive, meaning it 
will fit current window container size and adjust the size once the window size has 
changed. To always display the full sized plot, use `responsive = FALSE`. It is useful 
for embedding the plot in a full-screen mode.

```{r}
plotWF(sal, responsive = FALSE, width = "80%")
```

Now resize your window width and watch plot above _vs_. other plots. 

### Pan-zoom
The Pan-zoom option enables users to drag the plot instead of scrolling, and to 
use mouse wheel to zoom in/out. If you do not like the scroll bars in `responsive = FALSE`, 
try this option. Note it cannot be used with `responsive = TRUE` together. 
If both `TRUE`, `responsive` will be automatically set to `FALSE`. To enable this function 
internet connection is required to download Javascript libraries on-the-fly. 

```{r}
plotWF(sal, pan_zoom = TRUE)
```


### Layout
There a few different layout you can choose. There is no best layout. It all depends 
on the workflow structure you have. The default is `compact` but we recommend you 
to try different layouts to find the best fitting one. 

- `compact`: try to plot steps as close as possible.
- `vertical`: main branch will be placed vertically and side branches will be placed
  on the same horizontal level and sub steps of side branches will be placed
  vertically.
- `horizontal`: main branch is placed horizontally and side branches and sub
  steps will be placed vertically.
- `execution`: a linear plot to show the workflow execution order of all steps.

Here we are talking about the concept of **main branch**. It is a way to decide 
the plot center. We will discuss more below. 

**vertical**

```{r}
plotWF(sal, layout = "vertical", height = "600px")
```
If the plot is very long, use `height` to make it smaller.

**horizontal**

```{r}
plotWF(sal, layout = "horizontal")
```

**execution**

```{r}
plotWF(sal, layout = "execution", height = "600px", responsive = FALSE)
```

If the plot is too long, we can use `height` to limit it and/or use `responsive` 
to make it scrollable. 

### Main branch 

From the examples above, you can see that there are many steps which do not point to any 
other steps in downstream. These dead-ends are called ending steps. If we connect the first step,
steps in between and these ending step, this will become a branch. Imagine the workflow is 
a top-bottom tree structure and the root is the first step. Therefore, there are 
many possible ways to connect the workflow. For the convenience of plotting, we 
introduce a concept of _"main branch"_, meaning one of the possible connecting 
strategies that will be placed at the center of the plot. Other steps that are not 
in this major branch will surround this major space. This  _"main branch"_ will not 
affect how a workflow is run, but just an algorithm to **compute the best visualization**. 
It will have impact on how we plot the workflow. 

This main branch will not impact the `compact` layout so much but will have a huge 
effect on `horizontal` and `vertical` layouts. 

The algorithm in `plotWF` will automatically choose a best branch for 
you by default. In simple words, it favors: (a). branches that connect first and last step;
(b). as long as possible. 

You can also choose a branch you want by `branch_method = "choose"`. It will first 
list all possible branches, and then give you a prompt to ask for your favorite branch. 
Here, for rendering the Rmarkdown, we cannot have a prompt, so we use a second argument 
in combination, `branch_no = x` to directly choose a branch and skip the prompt. Also, 
we use the `verbose = TRUE` to imitate the branch listing in console. In a real case, 
you only need `branch_method = "choose"`.

To have the main branch marked, `mark_main_branch = TRUE` must be added (default `FALSE`). 
Watch closely how the plot change by choosing different branches. Here we use `vertical`
layout to demo. Remember, the main branch is marked in blue.

Choose branch 1
```{r collapse=TRUE}
plotWF(sal,  mark_main_branch = TRUE, layout = "vertical", branch_method = "choose", branch_no = 1, verbose = FALSE, height = "450px")
```
Choose branch 2
```{r collapse=TRUE}
plotWF(sal,  mark_main_branch = TRUE, layout = "vertical", branch_method = "choose", branch_no = 2, verbose = FALSE, height = "450px")
```

Do you see the difference?

### Legends
The legend can also be removed by show_legend = FALSE
```{r}
plotWF(sal, show_legend = FALSE, height = "500px")
```

### Output formats 

There are current three output formats: `"html"` and `"dot"`, `"dot_print"`. If first 
two were chosen, you also need provide a path `out_path` to save the file. 

- html: a single html file contains the plot.
- dot: a DOT script file with the code to reproduce the plot in a [graphiz](https://graphviz.org/)
  DOT engine. 
- dot_print: directly cat the dot script to console. 

#### HTML
HTML format is very useful if you want to view the plot later or share it to other 
people. This format is also helpful when you are working on a remote computer 
cluster. To view the workflow plot, a browser device (viewer) must be available, 
but often time this is not the case for computer clusters. When you plot a workflow 
and see the message `"Couldn't get a file descriptor referring to the console"`,
it means your computer (cluster) does not have a browser device. Saving to 
HTML format is the best option. 

```{r}
plotWF(sal, out_format = "html", out_path = "example_out.html")
file.exists("example_out.html")
```

#### DOT
Saving workflow plot to `dot` format allows one to reproduce the plot with 
the Graphviz language. 
```{r}
plotWF(sal, out_format = "dot", out_path = "example_out.dot")
file.exists("example_out.dot")
```

#### DOT print
Instead of saving the Graphviz plotting code to a file, this option directly 
prints out the code on console. If you have a Graphviz plotting device in hand,
simply copy and paste the code to that engine to reproduce the plot. For example, 
use our [Workflow Plot Editor](https://systempipe.org/sp/spr/viz_editor).

```{r eval=FALSE}
plotWF(sal, out_format = "dot_print")
```

### Saving Static image file

Some users may want to save the plot to a static image, like `.png` format. We will 
need do some extra work to save the file. The reason we cannot directly save it to 
a png file is the plot is generated in real-time by a browser javascript engine. It 
requires one type of javascript engine, like Chrome, MS Edge, Viewer in Rstudio,
to render the plot before we can see it, no matter you use SVG or PNG embedding. 

#### Interactive 

- With the `plot_ctr = TRUE` (default) option, a plot control panel is displayed 
  on the top-left corner. One can choose from different formats like _png_, _jpg_, 
  _svg_ or _pdf_ to download them from the webpage. To enable these buttons, internet connection is required.
  The underlying Javascript libraries are download on-the-fly. Please make sure 
  internet is connected. There are known conflicts of underlying web format
  creating libraries with R markdown web libraries, so some of these buttons may not work
  inside R markdown as you are seeing in this vignette right now. However, they 
  should work properly if the workflow plot is saved to a stand-alone HTML file.
- If you are working in Rstudio, you can also use the `export` button in the viewer to save 
  an image file. 

> Note: due to the web libraries conflicts of this website and the libraries used 
> in `plotWF`. Some buttons may not work when you click, but it will work when you 
> open make workflow plots interactively and view it in a stand-alone browser tab. 

#### Non-interactive

If you cannot have an interactive session, like submitting a job to a cluster, 
but still want the png, we recommend to use the {[webshot2](https://github.com/rstudio/webshot2)}
package to screenshot the plot. It runs headless Chrome in the back-end (which has a javascript engine).

Install the package 

```{r eval=FALSE}
remotes::install_github("rstudio/webshot2")
```

Save to html first

```{r eval=FALSE}
plotWF(sal, out_format = "html", out_path = "example_out.html")
file.exists("example_out.html")
```

Use `webshot2` to save the image

```{r eval=FALSE}
webshot2::webshot("example_out.html", "example_out.png")
```

### In logs
The workflow steps will also become clickable if `in_log = TRUE`. This will create links 
for each step that navigate to corresponding log section in the SPR 
[workflow log file](https://systempipe.org/sp/spr/sp_run/step_reports/). Normally this option 
is handled by SPR log file generating function `renderLogs` to create this plot on top of the log file,
so when a certain step is click, it will navigate to the detailed section down the page. 

Visit [this page](../logs_example.html) to see a real example. Try to click on 
the step in the workflow plot and watch what happens.


# Session 
```{r}
sessionInfo()
```

```{r eval=TRUE, include=FALSE}
# cleaning
try(unlink(".SPRproject", recursive = TRUE), TRUE)
try(unlink("data", recursive = TRUE), TRUE)
try(unlink("results", recursive = TRUE), TRUE)
try(unlink("param", recursive = TRUE), TRUE)
try(unlink("varseq", recursive = TRUE), TRUE)
try(unlink("example_out.dot", recursive = TRUE), TRUE)
try(unlink("example_out.html", recursive = TRUE), TRUE)
```