miND - miRNA NGS Data pipeline by TAmiRNA GmbH, developed by: Andreas B. Diendorfer, PhD
This miND report provides a condensed overview of the mapping and (if requested) differential expression analysis of the provided samples. Details on the analysis parameters are listed below:
Comment:
Samples from ENA project PRJEB27261
Analysis parameters:
Tabular data can be filtered or sorted using the fields and options at the top of each table. To export the data for further processing, select the desired format (Excel or CSV) using the buttons at the top of the table.
The first part of this report gives a general overview of the sequencing data. Please be aware that any downstream analysis depends on certain assumptions about the distribution and quality of the data. It is important to manually evaluate the data using the plots and tables provided in this section. Samples that do not pass these evaluations should be excluded from statistical analysis, as they can distort the results.
To evaluate the quality and check the data for common sequencing problems, all processed files are also analysed with the “FastQC” tool. The results of all samples are then combined into one report, together with statistics about the adapter trimming step.
The MultiQC report is provided alongside this file (multiqc_report.html).
Any files that do not pass the manual evaluation of this step should be excluded from further processing and analysis, as they could distort the results.
Read classification gives insight into the type and origin (i.e. composition) of all sequences obtained for each sample. After processing of the reads (adapter trimming, quality filtering, size filtering), all reads are mapped against various databases to categorize them. This is done in a hierarchical process, where reads are first mapped against the genome. Genome-mapped reads are then mapped against known miRNA sequences, and only those not identified as miRNAs are mapped against other databases for further classification.
“Unclassified genomic” indicates reads that were mapped against the genome but were not found in any of the RNA specific databases, while “unmapped” are reads that could not be found in the given reference genome.
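The hierarchy described above can be sketched in a few lines of Python. This is only an illustration of the decision order (genome first, then miRNAs, then other databases), not the pipeline's actual implementation. The function names, the use of read ID sets, and the example category "tRNA" are assumptions made for the sketch.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def classify_read(read_id: str, genome_hits: Set[str], mirna_hits: Set[str],
                  other_dbs: Dict[str, Set[str]]) -> str:
    """Assign exactly one category to a read, following the hierarchy in the text."""
    # Step 1: reads not found in the reference genome are "unmapped".
    if read_id not in genome_hits:
        return "unmapped"
    # Step 2: genome-mapped reads are checked against known miRNAs first.
    if read_id in mirna_hits:
        return "miRNA"
    # Step 3: only non-miRNA reads are checked against the other databases,
    # in the (hypothetical) priority order given by the dict.
    for category, hits in other_dbs.items():
        if read_id in hits:
            return category
    # Genome-mapped, but absent from every RNA-specific database.
    return "unclassified genomic"


def classify_sample(read_ids: Iterable[str], genome_hits: Set[str],
                    mirna_hits: Set[str],
                    other_dbs: Dict[str, Set[str]]) -> Counter:
    """Count reads per category for one sample."""
    return Counter(
        classify_read(r, genome_hits, mirna_hits, other_dbs) for r in read_ids
    )
```

Because each read leaves the function at the first matching step, a read that matches both the miRNA database and another database is counted only as a miRNA.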
The “Relative reads” tab shows the same data scaled to 100% to indicate the relative abundance of each read classification in a given sample.
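As a minimal illustration of the scaling behind the “Relative reads” tab, the Python sketch below converts absolute counts per category into percentages of the sample total. The function name and the example categories are hypothetical.

```python
from typing import Dict


def to_relative(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale absolute read counts of one sample so that they sum to 100%."""
    total = sum(counts.values())
    if total == 0:
        # An empty sample has no meaningful composition; report zeros.
        return {category: 0.0 for category in counts}
    return {category: 100.0 * n / total for category, n in counts.items()}


# Example with made-up numbers: 50 miRNA reads out of 200 reads in total.
print(to_relative({"miRNA": 50, "unmapped": 150}))
```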
Hint: You can double-click any of the RNA categories in the legend to hide all others and show only that category.
The following histograms show the number (y-axis) of genome-mapped reads against their length (x-axis) for each sample. The stacked bar charts visualize the proportions of unmapped and mapped reads and can be used to evaluate the read quality. Most mature microRNAs are about 22 nucleotides long.
The data in this table are equivalent to the data shown in the reads classification graph above (absolute reads). These are raw read counts (without any normalization).
This table contains all identified miRNAs in each sample. Read counts are normalized to reads per million (RPM), i.e. scaled to 1 million mapped miRNA reads per sample.
Please use the download link provided underneath the table to save the miRNA mappings data. The buttons provided at the top of the table can also be used, but won’t include detailed group information of the samples.
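The normalization to 1 million mapped miRNA reads used in this table can be illustrated with a short Python sketch. It assumes that the scaling factor is the sum of all miRNA read counts in the sample. The function name and the example counts are hypothetical.

```python
from typing import Dict


def rpm_normalize(raw_counts: Dict[str, int]) -> Dict[str, float]:
    """Scale the raw miRNA counts of one sample to reads per million (RPM)."""
    total = sum(raw_counts.values())
    if total == 0:
        raise ValueError("sample contains no mapped miRNA reads")
    return {mirna: count * 1_000_000 / total for mirna, count in raw_counts.items()}


# Made-up example: the sample has 4 miRNA reads in total.
print(rpm_normalize({"miR-a": 1, "miR-b": 3}))
```

After this step the values of each sample sum to 1,000,000, which makes samples with different sequencing depths comparable.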
This table contains all identified miRNAs in each sample. These are raw read counts (without any normalization).
Please use the download link provided underneath the table to save the miRNA mappings data. The buttons provided at the top of the table can also be used, but won’t include detailed group information of the samples.
This graph shows the number of distinct mature miRNAs identified in each sample.
This overview plots the abundance of a miRNA (collapsed read count) on the x-axis and the number of other miRNAs in this range on the y-axis. It illustrates the distribution of miRNAs in the sample.
Data are based on RPM-normalized reads and scaled using the unit variance method for visualization in heatmaps. Clustering is done with the average linkage method of pheatmap, using correlation-based distances.
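The two ingredients named above, unit variance scaling and a correlation-based distance, can be sketched in Python as follows. This is an illustration only, since the report itself is generated in R with pheatmap. The sketch assumes that scaling is applied per miRNA across samples, that values are centered before dividing by the sample standard deviation, and that the distance is 1 minus the Pearson correlation. The function names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence


def unit_variance_scale(values: Sequence[float]) -> List[float]:
    """Center one miRNA's values and divide by their sample standard deviation."""
    m = mean(values)
    sd = stdev(values)
    if sd == 0:
        # A constant row carries no variation; map it to zeros.
        return [0.0 for _ in values]
    return [(v - m) / sd for v in values]


def correlation_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Distance defined as 1 - Pearson correlation (0 = identical profile)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1.0 - cov / (sx * sy)
```

With this distance, two miRNAs with the same expression pattern at different absolute levels are treated as close, which is usually the intent when clustering expression profiles.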
This heatmap shows only the top 50 miRNAs, ranked by coefficient of variation (CV%). An additional filter was introduced to increase robustness: a miRNA is kept only if it shows an RPM value above 5 in at least a fraction of 1 / n(groups) of the samples (e.g. with 4 groups, the miRNA has to have an RPM value above 5 in at least 25% of the samples). This removes miRNAs that have a high CV but are expressed in too few samples to bear any statistical significance or biological relevance.
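The selection described above can be sketched in Python as follows. It is an illustration under stated assumptions, not the pipeline's code: CV% is taken as 100 × sample standard deviation / mean, and the function names, argument names, and defaults are hypothetical, apart from the RPM threshold of 5 and the top 50 cut-off quoted in the text.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence


def cv_percent(values: Sequence[float]) -> float:
    """Coefficient of variation in percent (sample SD relative to the mean)."""
    m = mean(values)
    if m == 0:
        return 0.0
    return 100.0 * stdev(values) / m


def top_variable_mirnas(rpm: Dict[str, Sequence[float]], n_groups: int,
                        top_n: int = 50, min_rpm: float = 5.0) -> List[str]:
    """Return the top_n miRNAs by CV% that pass the expression filter.

    rpm maps each miRNA name to its RPM values, one value per sample.
    """
    min_fraction = 1.0 / n_groups  # e.g. 4 groups -> 25% of the samples
    cv_by_mirna = {}
    for name, values in rpm.items():
        expressed = sum(1 for v in values if v > min_rpm)
        if expressed / len(values) >= min_fraction:
            cv_by_mirna[name] = cv_percent(values)
    # Highest CV% first, then cut to the requested number of miRNAs.
    return sorted(cv_by_mirna, key=cv_by_mirna.get, reverse=True)[:top_n]
```

Applying the expression filter before ranking means a miRNA seen in a single sample can still rank highly when there are many groups, but is dropped as soon as the required fraction of samples exceeds its coverage.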
The following heatmap shows 434 miRNAs, selected with the same filters described for the top 50 miRNAs.
Principal component analysis (PCA) uses RPM-normalized miRNA reads and reduces the data dimensions down to two, so that the samples can be plotted in a graph. A quick introduction to PCA plots and the underlying principle can be found here.
Samples are colored either by their first group or by the cluster they were assigned to. Clustering is done using the Ward (ward.D2) algorithm of hclust (split at a Euclidean cluster height of 40).
t-SNE is a nonlinear dimensionality reduction technique well-suited for embedding high-dimensional data for visualization in a low-dimensional space (like 2 dimensions here). It models each high-dimensional object by a two- or three-dimensional point in such a way that similar objects are modeled by nearby points and dissimilar objects are modeled by distant points with high probability. More details can be found in the author’s publication (Maaten and Hinton 2008).
devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## setting value
## version R version 4.0.5 (2021-03-31)
## os Amazon Linux 2
## system x86_64, linux-gnu
## ui X11
## language (EN)
## collate en_US.UTF-8
## ctype en_US.UTF-8
## tz Etc/UCT
## date 2021-12-09
## pandoc 2.16.2 @ /home/ec2-user/mind/envs/tmp/79ceb962e5bc2aa81550cf556ab60c33/bin/ (via rmarkdown)
##
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## package * version date (UTC) lib source
## annotate 1.68.0 2020-10-27 [1] Bioconductor
## AnnotationDbi 1.52.0 2020-10-27 [1] Bioconductor
## Biobase * 2.50.0 2020-10-27 [1] Bioconductor
## BiocGenerics * 0.36.0 2020-10-27 [1] Bioconductor
## bit 4.0.4 2020-08-04 [1] CRAN (R 4.0.3)
## bit64 4.0.5 2020-08-30 [1] CRAN (R 4.0.3)
## blob 1.2.2 2021-07-23 [1] CRAN (R 4.0.5)
## cachem 1.0.6 2021-08-19 [1] CRAN (R 4.0.5)
## callr 3.7.0 2021-04-20 [1] CRAN (R 4.0.3)
## cellranger 1.1.0 2016-07-27 [1] CRAN (R 4.0.5)
## cli 3.1.0 2021-10-27 [1] CRAN (R 4.0.5)
## colorspace 2.0-2 2021-06-24 [1] CRAN (R 4.0.5)
## crayon 1.4.2 2021-10-29 [1] CRAN (R 4.0.5)
## crosstalk 1.2.0 2021-11-04 [1] CRAN (R 4.0.5)
## data.table 1.14.2 2021-09-27 [1] CRAN (R 4.0.5)
## DBI 1.1.1 2021-01-15 [1] CRAN (R 4.0.3)
## desc 1.4.0 2021-09-28 [1] CRAN (R 4.0.5)
## devtools 2.4.3 2021-11-30 [1] CRAN (R 4.0.5)
## digest 0.6.29 2021-12-01 [1] CRAN (R 4.0.5)
## dplyr * 1.0.7 2021-06-18 [1] CRAN (R 4.0.5)
## DT * 0.17 2021-01-06 [1] CRAN (R 4.0.3)
## edgeR * 3.32.1 2021-01-14 [1] Bioconductor
## ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.0.3)
## evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.5)
## fansi 0.5.0 2021-05-25 [1] CRAN (R 4.0.5)
## farver 2.1.0 2021-02-28 [1] CRAN (R 4.0.3)
## fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.0.3)
## fs 1.5.2 2021-12-08 [1] CRAN (R 4.0.5)
## genefilter * 1.72.1 2021-01-21 [1] Bioconductor
## generics 0.1.1 2021-10-25 [1] CRAN (R 4.0.5)
## ggfortify * 0.4.13 2021-10-25 [1] CRAN (R 4.0.5)
## ggplot2 * 3.3.5 2021-06-25 [1] CRAN (R 4.0.5)
## ggrepel * 0.8.2 2020-03-08 [1] CRAN (R 4.0.0)
## glue 1.5.1 2021-11-30 [1] CRAN (R 4.0.5)
## gridExtra * 2.3 2017-09-09 [1] CRAN (R 4.0.5)
## gtable 0.3.0 2019-03-25 [1] CRAN (R 4.0.5)
## highr 0.9 2021-04-16 [1] CRAN (R 4.0.3)
## hms 1.1.1 2021-09-26 [1] CRAN (R 4.0.5)
## htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.0.5)
## htmlwidgets 1.5.4 2021-09-08 [1] CRAN (R 4.0.5)
## httr 1.4.2 2020-07-20 [1] CRAN (R 4.0.5)
## IRanges 2.24.1 2020-12-12 [1] Bioconductor
## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.0.3)
## jsonlite 1.7.2 2020-12-09 [1] CRAN (R 4.0.3)
## kableExtra * 1.3.4 2021-02-20 [1] CRAN (R 4.0.3)
## knitr 1.36 2021-09-29 [1] CRAN (R 4.0.5)
## labeling 0.4.2 2020-10-20 [1] CRAN (R 4.0.5)
## lattice 0.20-45 2021-09-22 [1] CRAN (R 4.0.5)
## lazyeval 0.2.2 2019-03-15 [1] CRAN (R 4.0.5)
## lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.0.5)
## limma * 3.46.0 2020-10-27 [1] Bioconductor
## locfit 1.5-9.4 2020-03-25 [1] CRAN (R 4.0.5)
## magrittr * 2.0.1 2020-11-17 [1] CRAN (R 4.0.3)
## Matrix 1.3-4 2021-06-01 [1] CRAN (R 4.0.5)
## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.0.5)
## mime 0.12 2021-09-28 [1] CRAN (R 4.0.5)
## munsell 0.5.0 2018-06-12 [1] CRAN (R 4.0.5)
## pcaMethods * 1.82.0 2020-10-27 [1] Bioconductor
## pheatmap * 1.0.12 2019-01-04 [1] CRAN (R 4.0.5)
## pillar 1.6.4 2021-10-18 [1] CRAN (R 4.0.5)
## pkgbuild 1.2.1 2021-11-30 [1] CRAN (R 4.0.5)
## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.5)
## pkgload 1.2.4 2021-11-30 [1] CRAN (R 4.0.5)
## plotly * 4.9.4.1 2021-06-18 [1] CRAN (R 4.0.5)
## prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.5)
## processx 3.5.2 2021-04-30 [1] CRAN (R 4.0.3)
## ps 1.6.0 2021-02-28 [1] CRAN (R 4.0.3)
## purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.3)
## R6 2.5.1 2021-08-19 [1] CRAN (R 4.0.5)
## RColorBrewer * 1.1-2 2014-12-07 [1] CRAN (R 4.0.5)
## Rcpp 1.0.7 2021-07-07 [1] CRAN (R 4.0.5)
## readr * 1.4.0 2020-10-05 [1] CRAN (R 4.0.5)
## readxl * 1.3.1 2019-03-13 [1] CRAN (R 4.0.5)
## remotes 2.4.2 2021-11-30 [1] CRAN (R 4.0.5)
## rlang 0.4.12 2021-10-18 [1] CRAN (R 4.0.5)
## rmarkdown 2.11 2021-09-14 [1] CRAN (R 4.0.5)
## rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.0.3)
## RSQLite 2.2.8 2021-08-21 [1] CRAN (R 4.0.5)
## rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.0.3)
## Rtsne * 0.15 2018-11-10 [1] CRAN (R 4.0.5)
## rvest 1.0.2 2021-10-16 [1] CRAN (R 4.0.5)
## S4Vectors 0.28.1 2020-12-09 [1] Bioconductor
## scales 1.1.1 2020-05-11 [1] CRAN (R 4.0.5)
## sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.0.5)
## stringi 1.7.6 2021-11-29 [1] CRAN (R 4.0.5)
## stringr * 1.4.0 2019-02-10 [1] CRAN (R 4.0.5)
## survival 3.2-13 2021-08-24 [1] CRAN (R 4.0.5)
## svglite 2.0.0 2021-02-20 [1] CRAN (R 4.0.3)
## systemfonts 1.0.3 2021-10-13 [1] CRAN (R 4.0.5)
## testthat 3.1.1 2021-12-03 [1] CRAN (R 4.0.5)
## tibble * 3.1.6 2021-11-07 [1] CRAN (R 4.0.5)
## tidyr * 1.1.4 2021-09-27 [1] CRAN (R 4.0.5)
## tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.0.3)
## usethis 2.1.3 2021-10-27 [1] CRAN (R 4.0.5)
## utf8 1.2.2 2021-07-24 [1] CRAN (R 4.0.5)
## vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.0.5)
## viridisLite 0.4.0 2021-04-13 [1] CRAN (R 4.0.3)
## webshot 0.5.2 2019-11-22 [1] CRAN (R 4.0.5)
## withr 2.4.3 2021-11-30 [1] CRAN (R 4.0.5)
## WriteXLS * 6.3.0 2021-04-01 [1] CRAN (R 4.0.3)
## xfun 0.22 2021-03-11 [1] CRAN (R 4.0.3)
## XML 3.99-0.8 2021-09-17 [1] CRAN (R 4.0.5)
## xml2 1.3.3 2021-11-30 [1] CRAN (R 4.0.5)
## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.0.5)
## yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.3)
##
## [1] /home/ec2-user/mind/envs/tmp/79ceb962e5bc2aa81550cf556ab60c33/lib/R/library
##
## ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
The following references are provided for tools that affect the scientific and statistical outcome of this analysis. Many other tools, most of them available as open source, helped in the preparation of this report. Please contact us for a full list of references.