---
title: "Workflow Templates"
author: "Author: Daniela Cassol (danielac@ucr.edu) and Thomas Girke (thomas.girke@ucr.edu)"
date: "Last update: 22 February, 2021"
output:
BiocStyle::html_document:
toc_float: true
code_folding: show
BiocStyle::pdf_document: default
package: systemPipeR
vignette: |
%\VignetteEncoding{UTF-8}
%\VignetteIndexEntry{systemPipeR: Workflow design and reporting generation environment}
%\VignetteEngine{knitr::rmarkdown}
fontsize: 14pt
bibliography: bibtex.bib
editor_options:
chunk_output_type: console
type: docs
weight: 5
---
# Workflow templates
The intended way of running *`systemPipeR`* workflows is via *`*.Rmd`* files, which
can be executed either line-wise in interactive mode or with a single command from
R or the command-line. This way comprehensive and reproducible analysis reports
can be generated in PDF or HTML format in a fully automated manner by making use
of the highly functional reporting utilities available for R.
The following shows how to execute a workflow (*e.g.*, systemPipeRNAseq.Rmd)
from the command-line.
``` bash
Rscript -e "rmarkdown::render('systemPipeRNAseq.Rmd')"
```
Templates for setting up custom project reports are provided as *`*.Rmd`* files by the helper package *`systemPipeRdata`* and in the vignettes subdirectory of *`systemPipeR`*. The corresponding HTML of these report templates are available here: [*`systemPipeRNAseq`*](http://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeRNAseq.html), [*`systemPipeRIBOseq`*](http://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeRIBOseq.html), [*`systemPipeChIPseq`*](http://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeChIPseq.html) and [*`systemPipeVARseq`*](http://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeVARseq.html). To work with *`*.Rnw`* or *`*.Rmd`* files efficiently, basic knowledge of [*`Sweave`*](https://www.stat.uni-muenchen.de/~leisch/Sweave/) or [*`knitr`*](http://yihui.name/knitr/) and [*`Latex`*](http://www.latex-project.org/) or [*`R Markdown v2`*](http://rmarkdown.rstudio.com/) is required.
## RNA-Seq sample
Load the RNA-Seq sample workflow into your current working directory.
``` r
library(systemPipeRdata)
genWorkenvir(workflow = "rnaseq")
setwd("rnaseq")
```
### Run workflow
Next, run the chosen sample workflow *`systemPipeRNAseq`* ([PDF](https://github.com/tgirke/systemPipeRdata/blob/master/inst/extdata/workflows/rnaseq/systemPipeRNAseq.pdf?raw=true), [Rmd](https://github.com/tgirke/systemPipeRdata/blob/master/inst/extdata/workflows/rnaseq/systemPipeRNAseq.Rmd)) by executing from the command-line *`make -B`* within the *`rnaseq`* directory. Alternatively, one can run the code from the provided *`*.Rmd`* template file from within R interactively.
The workflow includes following steps:
1. Read preprocessing
- Quality filtering (trimming)
- FASTQ quality report
2. Alignments: *`Tophat2`* (or any other RNA-Seq aligner)
3. Alignment stats
4. Read counting
5. Sample-wise correlation analysis
6. Analysis of differentially expressed genes (DEGs)
7. GO term enrichment analysis
8. Gene-wise clustering
## ChIP-Seq sample
Load the ChIP-Seq sample workflow into your current working directory.
``` r
library(systemPipeRdata)
genWorkenvir(workflow = "chipseq")
setwd("chipseq")
```
### Run workflow
Next, run the chosen sample workflow *`systemPipeChIPseq_single`* ([PDF](https://github.com/tgirke/systemPipeRdata/blob/master/inst/extdata/workflows/chipseq/systemPipeChIPseq.pdf?raw=true), [Rmd](https://github.com/tgirke/systemPipeRdata/blob/master/inst/extdata/workflows/chipseq/systemPipeChIPseq.Rmd)) by executing from the command-line *`make -B`* within the *`chipseq`* directory. Alternatively, one can run the code from the provided *`*.Rmd`* template file from within R interactively.
The workflow includes the following steps:
1. Read preprocessing
- Quality filtering (trimming)
- FASTQ quality report
2. Alignments: *`Bowtie2`* or *`rsubread`*
3. Alignment stats
4. Peak calling: *`MACS2`*, *`BayesPeak`*
5. Peak annotation with genomic context
6. Differential binding analysis
7. GO term enrichment analysis
8. Motif analysis
## VAR-Seq sample
### VAR-Seq workflow for the single machine
Load the VAR-Seq sample workflow into your current working directory.
``` r
library(systemPipeRdata)
genWorkenvir(workflow = "varseq")
setwd("varseq")
```
### Run workflow
Next, run the chosen sample workflow *`systemPipeVARseq_single`* ([PDF](https://github.com/tgirke/systemPipeRdata/blob/master/inst/extdata/workflows/varseq/systemPipeVARseq_single.pdf?raw=true), [Rmd](https://github.com/tgirke/systemPipeRdata/blob/master/inst/extdata/workflows/varseq/systemPipeVARseq_single.Rmd)) by executing from the command-line *`make -B`* within the *`varseq`* directory. Alternatively, one can run the code from the provided *`*.Rmd`* template file from within R interactively.
The workflow includes following steps:
1. Read preprocessing
- Quality filtering (trimming)
- FASTQ quality report
2. Alignments: *`gsnap`*, *`bwa`*
3. Variant calling: *`VariantTools`*, *`GATK`*, *`BCFtools`*
4. Variant filtering: *`VariantTools`* and *`VariantAnnotation`*
5. Variant annotation: *`VariantAnnotation`*
6. Combine results from many samples
7. Summary statistics of samples
### VAR-Seq workflow for computer cluster
The workflow template provided for this step is called *`systemPipeVARseq.Rmd`* ([PDF](https://github.com/tgirke/systemPipeRdata/blob/master/inst/extdata/workflows/varseq/systemPipeVARseq.pdf?raw=true), [Rmd](https://github.com/tgirke/systemPipeRdata/blob/master/inst/extdata/workflows/varseq/systemPipeVARseq.Rmd)).
It runs the above VAR-Seq workflow in parallel on multiple compute nodes of an HPC system using Slurm as the scheduler.
## Ribo-Seq sample
Load the Ribo-Seq sample workflow into your current working directory.
``` r
library(systemPipeRdata)
genWorkenvir(workflow = "riboseq")
setwd("riboseq")
```
### Run workflow
Next, run the chosen sample workflow *`systemPipeRIBOseq`* ([PDF](https://github.com/tgirke/systemPipeRdata/blob/master/inst/extdata/workflows/riboseq/systemPipeRIBOseq.pdf?raw=true), [Rmd](https://github.com/tgirke/systemPipeRdata/blob/master/inst/extdata/workflows/ribseq/systemPipeRIBOseq.Rmd)) by executing from the command-line *`make -B`* within the *`ribseq`* directory. Alternatively, one can run the code from the provided *`*.Rmd`* template file from within R interactively.
The workflow includes following steps:
1. Read preprocessing
- Adaptor trimming and quality filtering
- FASTQ quality report
2. Alignments: *`Tophat2`* (or any other RNA-Seq aligner)
3. Alignment stats
4. Compute read distribution across genomic features
5. Adding custom features to the workflow (e.g. uORFs)
6. Genomic read coverage along with transcripts
7. Read counting
8. Sample-wise correlation analysis
9. Analysis of differentially expressed genes (DEGs)
10. GO term enrichment analysis
11. Gene-wise clustering
12. Differential ribosome binding (translational efficiency)
# Version information
**Note:** the most recent version of this tutorial can be found here.
``` r
sessionInfo()
```
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /home/dcassol/src/R-4.0.3/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] magrittr_2.0.1 batchtools_0.9.15
## [3] ape_5.4-1 ggplot2_3.3.3
## [5] systemPipeR_1.25.6 ShortRead_1.48.0
## [7] GenomicAlignments_1.26.0 SummarizedExperiment_1.20.0
## [9] Biobase_2.50.0 MatrixGenerics_1.2.1
## [11] matrixStats_0.58.0 BiocParallel_1.24.1
## [13] Rsamtools_2.6.0 Biostrings_2.58.0
## [15] XVector_0.30.0 GenomicRanges_1.42.0
## [17] GenomeInfoDb_1.26.2 IRanges_2.24.1
## [19] S4Vectors_0.28.1 BiocGenerics_0.36.0
## [21] BiocStyle_2.18.1
##
## loaded via a namespace (and not attached):
## [1] colorspace_2.0-0 rjson_0.2.20 hwriter_1.3.2
## [4] ellipsis_0.3.1 bit64_4.0.5 AnnotationDbi_1.52.0
## [7] xml2_1.3.2 codetools_0.2-18 splines_4.0.3
## [10] cachem_1.0.3 knitr_1.31 jsonlite_1.7.2
## [13] annotate_1.68.0 GO.db_3.12.1 dbplyr_2.1.0
## [16] png_0.1-7 pheatmap_1.0.12 graph_1.68.0
## [19] BiocManager_1.30.10 compiler_4.0.3 httr_1.4.2
## [22] GOstats_2.56.0 backports_1.2.1 assertthat_0.2.1
## [25] Matrix_1.3-2 fastmap_1.1.0 limma_3.46.0
## [28] formatR_1.7 htmltools_0.5.1.1 prettyunits_1.1.1
## [31] tools_4.0.3 gtable_0.3.0 glue_1.4.2
## [34] GenomeInfoDbData_1.2.4 Category_2.56.0 dplyr_1.0.4
## [37] rsvg_2.1 rappdirs_0.3.3 V8_3.4.0
## [40] Rcpp_1.0.6 vctrs_0.3.6 nlme_3.1-152
## [43] blogdown_1.1.7 rtracklayer_1.50.0 xfun_0.21
## [46] stringr_1.4.0 lifecycle_1.0.0.9000 XML_3.99-0.5
## [49] edgeR_3.32.1 zlibbioc_1.36.0 scales_1.1.1
## [52] BSgenome_1.58.0 VariantAnnotation_1.36.0 hms_1.0.0
## [55] RBGL_1.66.0 RColorBrewer_1.1-2 yaml_2.2.1
## [58] curl_4.3 memoise_2.0.0 biomaRt_2.46.3
## [61] latticeExtra_0.6-29 stringi_1.5.3 RSQLite_2.2.3
## [64] genefilter_1.72.1 checkmate_2.0.0 GenomicFeatures_1.42.1
## [67] DOT_0.1 rlang_0.4.10 pkgconfig_2.0.3
## [70] bitops_1.0-6 evaluate_0.14 lattice_0.20-41
## [73] purrr_0.3.4 bit_4.0.4 tidyselect_1.1.0
## [76] GSEABase_1.52.1 AnnotationForge_1.32.0 bookdown_0.21
## [79] R6_2.5.0 generics_0.1.0 base64url_1.4
## [82] DelayedArray_0.16.1 DBI_1.1.1 withr_2.4.1
## [85] pillar_1.4.7 survival_3.2-7 RCurl_1.98-1.2
## [88] tibble_3.0.6 crayon_1.4.1 BiocFileCache_1.14.0
## [91] rmarkdown_2.6 jpeg_0.1-8.1 progress_1.2.2
## [94] locfit_1.5-9.4 grid_4.0.3 data.table_1.13.6
## [97] blob_1.2.1 Rgraphviz_2.34.0 digest_0.6.27
## [100] xtable_1.8-4 brew_1.0-6 openssl_1.4.3
## [103] munsell_0.5.0 askpass_1.1
# References