---
title: "Introduction"
fontsize: 14pt
bibliography: bibtex.bib
editor_options:
  chunk_output_type: console
type: docs
weight: 1
---

> **Note:** if you use *`systemPipeR`* in published research, please cite:
>
> Backman, T.W.H and Girke, T. (2016). *systemPipeR*: NGS Workflow and Report Generation Environment. *BMC Bioinformatics*, 17: 388. [10.1186/s12859-016-1241-0](https://doi.org/10.1186/s12859-016-1241-0).

[*`systemPipeR`*(SPR)](http://www.bioconductor.org/packages/devel/bioc/html/systemPipeR.html) provides flexible utilities for designing, building and running automated end-to-end analysis workflows for a wide range of research applications, including next-generation sequencing (NGS) experiments such as RNA-Seq, ChIP-Seq, VAR-Seq and Ribo-Seq (Backman and Girke 2016). Important features include a uniform workflow interface across different data analysis applications, automated report generation, and support for running both R and command-line software, such as NGS aligners or peak/variant callers, on local computers or compute clusters (Figure 1). On clusters, both interactive job submissions and batch submissions to queuing systems are supported.

*`systemPipeR`* has been designed to improve the reproducibility of large-scale data analysis projects while substantially reducing the time it takes to analyze complex omics data sets. Its workflow management system allows users to run selected workflow steps, customize existing workflows, and design entirely new ones. Additionally, the package takes advantage of core community S4 classes from the Bioconductor ecosystem and extends them with support for command-line software.

The main motivations for and advantages of using *`systemPipeR`* for complex data analysis tasks are:

1. Design of complex workflows involving multiple R/Bioconductor packages
2. Common workflow interface for different applications
3. User-friendly access to widely used Bioconductor utilities
4. Support for command-line software from within R
5. Reduced complexity of using compute clusters from R
6. Accelerated runtime of workflows via parallelization on computer systems with multiple CPU cores and/or multiple nodes
7. Improved reproducibility by automating the generation of analysis reports