--- title: Graphics and Data Visualization in R author: "First/last name (first.last@ucr.edu)" date: "Last update: `r format(Sys.time(), '%d %B, %Y')`" output: html_document: toc: true toc_float: collapsed: true smooth_scroll: true toc_depth: 3 fig_caption: yes code_folding: show number_sections: true fontsize: 14pt bibliography: bibtex.bib weight: 15 type: docs --- ```{r style, echo = FALSE, results = 'asis'} BiocStyle::markdown() options(width=100, max.print=1000) knitr::opts_chunk$set( message=FALSE, eval=as.logical(Sys.getenv("KNITR_EVAL", "TRUE")), cache=as.logical(Sys.getenv("KNITR_CACHE", "TRUE"))) ```
Source code downloads:     [ [.Rmd](https://raw.githubusercontent.com/tgirke/GEN242//main/content/en/tutorials/rgraphics/rgraphics.Rmd) ]     [ [.R](https://raw.githubusercontent.com/tgirke/GEN242//main/content/en/tutorials/rgraphics/rgraphics.R) ]
## Overview ### Graphics in R - Powerful environment for visualizing scientific data - Integrated graphics and statistics infrastructure - Publication quality graphics - Fully programmable - Highly reproducible - Full [LaTeX](http://www.latex-project.org/), [Sweave](http://www.stat.auckland.ac.nz/~dscott/782/Sweave-manual-20060104.pdf), [knitr](http://yihui.name/knitr/) and [R Markdown](http://rmarkdown.rstudio.com/) support. - Vast number of R packages with graphics utilities ### Documentation on Graphics in R - General - [Graphics Task Page](https://cran.r-project.org/web/views/) - [R Graph Gallery](http://www.r-graph-gallery.com/) - [R Graphical Manual](https://www.imsbio.co.jp/RGM/R_image_list?page=575&init=true) - [Paul Murrell’s book R (Grid) Graphics](http://www.stat.auckland.ac.nz/~paul/RGraphics/rgraphics.html) - ggplot2: [Book](https://ggplot2-book.org/), [R for Data Science](https://r4ds.had.co.nz/data-visualisation.html) and [ggplot2 reference](https://ggplot2.tidyverse.org/reference/) - Interactive graphics - [rggobi (GGobi)](http://www.ggobi.org/) - [iplots](http://www.iplots.org/) - [Open GL (`rgl`)](http://rgl.neoscientists.org/gallery.shtml) - [plotly](https://plotly.com/r/) ### Graphics Environments - Viewing and savings graphics in R - On-screen graphics - postscript, pdf, svg - jpeg/png/wmf/tiff/... - Four major graphics environments - Low-level infrastructure - R Base Graphics (low- and high-level) - `grid`: [Manual](http://www.stat.auckland.ac.nz/~paul/grid/grid.html), [Book](http://www.stat.auckland.ac.nz/~paul/RGraphics/rgraphics.html) - High-level infrastructure - `lattice`: [Manual](http://lmdvr.r-forge.r-project.org), [Intro](https://lattice.r-forge.r-project.org/Vignettes/src/lattice-intro/lattice-intro.pdf), [Book](http://www.amazon.com/Lattice-Multivariate-Data-Visualization-Use/dp/0387759689) - `ggplot2`: [Manual](https://r4ds.hadley.nz/data-visualize.html), [Intro](http://www.ling.upenn.edu/~joseff/rstudy/summer2010_ggplot2_intro.html), [Book](https://ggplot2-book.org/) ## Base Graphics ### Overview - Important high-level plotting functions - `plot`: generic x-y plotting - `barplot`: bar plots - `boxplot`: box-and-whisker plot - `hist`: histograms - `pie`: pie charts - `dotchart`: cleveland dot plots - `image, heatmap, contour, persp`: functions to generate image-like plots - `qqnorm, qqline, qqplot`: distribution comparison plots - `pairs, coplot`: display of multivariant data - Help on these functions - `?myfct` - `?plot` - `?par` ### Preferred Input Data Objects - Matrices and data frames - Vectors - Named vectors ### Scatter Plots #### Basic scatter plots Sample data set for subsequent plots ```{r scatter_plot_basic, eval=TRUE} set.seed(1410) y <- matrix(runif(30), ncol=3, dimnames=list(letters[1:10], LETTERS[1:3])) y plot(y[,1], y[,2]) ``` #### All pairs ```{r scatter_plot_allpairs, eval=TRUE} pairs(y) ``` #### Plot labels ```{r scatter_plot_labels, eval=TRUE} plot(y[,1], y[,2], pch=20, col="red", main="Symbols and Labels") text(y[,1]+0.03, y[,2], rownames(y)) ``` #### More examples Print instead of symbols the row names ```{r scatter_plot_row_names, eval=FALSE} plot(y[,1], y[,2], type="n", main="Plot of Labels") text(y[,1], y[,2], rownames(y)) ``` Usage of important plotting parameters ```{r plot_parameters, eval=FALSE} grid(5, 5, lwd = 2) op <- par(mar=c(8,8,8,8), bg="lightblue") plot(y[,1], y[,2], type="p", col="red", cex.lab=1.2, cex.axis=1.2, cex.main=1.2, cex.sub=1, lwd=4, pch=20, xlab="x label", ylab="y label", main="My Main", sub="My Sub") par(op) ``` Important arguments} - `mar`: specifies the margin sizes around the plotting area in order: `c(bottom, left, top, right)` - `col`: color of symbols - `pch`: type of symbols, samples: `example(points)` - `lwd`: size of symbols - `cex.*`: control font sizes - For details see `?par` Add a regression line to a plot ```{r scatter_plot_regress, eval=TRUE} plot(y[,1], y[,2]) myline <- lm(y[,2]~y[,1]); abline(myline, lwd=2) summary(myline) ``` Same plot as above, but on log scale ```{r scatter_plot_log_scale, eval=TRUE} plot(y[,1], y[,2], log="xy") ``` Add a mathematical expression to a plot ```{r scatter_plot_math, eval=TRUE} plot(y[,1], y[,2]); text(y[1,1], y[1,2], expression(sum(frac(1,sqrt(x^2*pi)))), cex=1.3) ``` #### Exercise 1 - __Task 1__: Generate scatter plot for first two columns in `iris` data frame and color dots by its `Species` column. - __Task 2__: Use the `xlim/ylim` arguments to set limits on the x- and y-axes so that all data points are restricted to the left bottom quadrant of the plot. Structure of iris data set: ```{r iris_structure, eval=TRUE} class(iris) iris[1:4,] table(iris$Species) ``` ```{r exercise1_solution, eval=FALSE, echo=FALSE} plot(iris[,1], iris[,2], col=iris$Species, lwd=2, pch=19) plot(iris[,1], iris[,2], col=iris$Species, lwd=2, pch=19, xlim=c(4,16), ylim=c(2,8)) ``` ### Line Plots #### Single Data Set ```{r line_plot_single, eval=TRUE} plot(y[,1], type="l", lwd=2, col="blue") ``` Additional lines can be added to an existing plot with the `lines()` function. ```{r line_plot_multiple, eval=TRUE} plot(y[,1], type="l", lwd=2, col="blue") lines(y[,2], lwd=2, lty=1, col="red") legend(8.3, 0.95, legend=c("Line 1", "Line 2"), col=c("blue", "red"), lty=1) ``` #### Many Data Sets Alterntively, one can plot a line graph for all columns in data frame `y` with the following approach. The `split.screen` function is used in this example in a `for` loop to overlay several line graphs in the same plot. ```{r line_plot_many, eval=TRUE} split.screen(c(1,1)) plot(y[,1], ylim=c(0,1), xlab="Measurement", ylab="Intensity", type="l", lwd=2, col=1) for(i in 2:length(y[1,])) { screen(1, new=FALSE) plot(y[,i], ylim=c(0,1), type="l", lwd=2, col=i, xaxt="n", yaxt="n", ylab="", xlab="", main="", bty="n") } close.screen(all=TRUE) ``` ### Bar Plots #### Basics ```{r bar_plot_basic, eval=TRUE} barplot(y[1:4,], ylim=c(0, max(y[1:4,])+0.3), beside=TRUE, legend=letters[1:4]) text(labels=round(as.vector(as.matrix(y[1:4,])),2), x=seq(1.5, 13, by=1) +sort(rep(c(0,1,2), 4)), y=as.vector(as.matrix(y[1:4,]))+0.04) ``` #### Error bars ```{r bar_plot_error_bar, eval=TRUE} bar <- barplot(m <- rowMeans(y) * 10, ylim=c(0, 10)) stdev <- sd(t(y)) arrows(bar, m, bar, m + stdev, length=0.15, angle = 90) ``` #### Mirrored bar plot ```{r bar_plot_mirrored, eval=TRUE} df <- data.frame(group = rep(c("Above", "Below"), each=10), x = rep(1:10, 2), y = c(runif(10, 0, 1), runif(10, -1, 0))) plot(c(0,12), range(df$y), type = "n") barplot(height = df$y[df$group == "Above"], add = TRUE, axes = FALSE) barplot(height = df$y[df$group == "Below"], add = TRUE, axes = FALSE) ``` #### Bar plot of loan payments and amortization tables The following imports a `mortgage` payment function (from [here](https://raw.githubusercontent.com/tgirke/GEN242/main/content/en/tutorials/rgraphics/scripts/mortgage.R)) that calculates monthly and annual mortgage/loan payments, generates amortization tables and plots the results in form of a bar plot. A Shiny App using this function has been created by Antoine Soetewey [here](https://antoinesoetewey.shinyapps.io/mortgage-calculator/). ```{r bar_plot_mortgage, eval=TRUE} source("https://raw.githubusercontent.com/tgirke/GEN242/main/content/en/tutorials/rgraphics/scripts/mortgage.R") m <- mortgage(P=250000, I=6, L=15, plotData=TRUE) ``` ### Histograms ```{r hist_plot, eval=TRUE} hist(y, freq=TRUE, breaks=10) ``` ### Density Plots} ```{r density_plot, eval=TRUE} plot(density(y), col="red") ``` ### Pie Charts ```{r pie_chart, eval=TRUE} pie(y[,1], col=rainbow(length(y[,1]), start=0.1, end=0.8), clockwise=TRUE) legend("topright", legend=row.names(y), cex=1.3, bty="n", pch=15, pt.cex=1.8, col=rainbow(length(y[,1]), start=0.1, end=0.8), ncol=1) ``` ### Color Selection Utilities Default color palette and how to change it ```{r color_palette, eval=TRUE} palette() palette(rainbow(5, start=0.1, end=0.2)) palette() palette("default") ``` The `gray` function allows to select any type of gray shades by providing values from 0 to 1 ```{r gray_palette, eval=TRUE} gray(seq(0.1, 1, by= 0.2)) ``` Color gradients with `colorpanel` function from `gplots` library ```{r color_panel, eval=FALSE} library(gplots) colorpanel(5, "darkblue", "yellow", "white") ``` Much more on colors in R see Earl Glynn's [color chart](http://earlglynn.github.io/color/ColorInR) ### Arranging Several Plots on Single Page With `par(mfrow=c(nrow, ncol))` one can define how several plots are arranged next to each other. ```{r par_mfrow, eval=TRUE} par(mfrow=c(2,3)) for(i in 1:6) plot(1:10) ``` #### Arranging Plots with Variable Width The `layout` function allows to divide the plotting device into variable numbers of rows and columns with the column-widths and the row-heights specified in the respective arguments. ```{r layout_plot, eval=TRUE} nf <- layout(matrix(c(1,2,3,3), 2, 2, byrow=TRUE), c(3,7), c(5,5), respect=TRUE) # layout.show(nf) for(i in 1:3) barplot(1:10) ``` ### Saving Graphics to Files After the `pdf()` command all graphs are redirected to file `test.pdf`. Works for all common formats similarly: jpeg, png, ps, tiff, ... ```{r save_pdf, eval=FALSE} pdf("test.pdf"); plot(1:10, 1:10); dev.off() ``` Generates Scalable Vector Graphics (SVG) files that can be edited in vector graphics programs, such as InkScape. ```{r save_svg, eval=FALSE} svg("test.svg"); plot(1:10, 1:10); dev.off() ``` #### Exercise 2 Bar plots - __Task 1__: Calculate the mean values for the `Species` components of the first four columns in the `iris` data set. Organize the results in a matrix where the row names are the unique values from the `iris Species` column and the column names are the same as in the first four `iris` columns. - __Task 2__: Generate two bar plots: one with stacked bars and one with horizontally arranged bars. Structure of iris data set: ```{r structure_iris2, eval=TRUE} class(iris) iris[1:4,] table(iris$Species) ``` ```{r exercise2_solution, eval=FALSE, echo=FALSE} mMA <- sapply(colnames(iris[,1:4]), function(x) tapply(iris[,x], iris[,5], mean)) barplot(mMA, beside=T, legend=rownames(mMA)) ``` ## Grid Graphics - What is `grid`? - Low-level graphics system - Highly flexible and controllable system - Does not provide high-level functions - Intended as development environment for custom plotting functions - Pre-installed on new R distributions - Documentation and Help - [Manual](http://www.stat.auckland.ac.nz/~paul/grid/grid.html) - [Book](http://www.stat.auckland.ac.nz/~paul/RGraphics/rgraphics.html) ## lattice Graphics - What is `lattice`? - High-level graphics system - Developed by Deepayan Sarkar - Implements Trellis graphics system from S-Plus - Simplifies high-level plotting tasks: arranging complex graphical features - Syntax similar to R's base graphics - Documentation and Help - [Manual](http://lmdvr.r-forge.r-project.org) - [Intro](https://www.statmethods.net/advgraphs/trellis.html) - [Book](http://www.amazon.com/Lattice-Multivariate-Data-Visualization-Use/dp/0387759689) Open a list of all functions available in the lattice package ```{r help_lattice, eval=FALSE} library(lattice) library(help=lattice) ``` Accessing and changing global parameters: ```{r param_lattice, eval=FALSE} ?lattice.options ?trellis.device ``` ### Scatter Plot Sample ```{r scatter_plot_lattice, eval=TRUE} library(lattice) p1 <- xyplot(1:8 ~ 1:8 | rep(LETTERS[1:4], each=2), as.table=TRUE) plot(p1) ``` ### Line Plot Sample ```{r line_plot_lattice, eval=TRUE} library(lattice) p2 <- parallelplot(~iris[1:4] | Species, iris, horizontal.axis = FALSE, layout = c(1, 3, 1)) plot(p2) ``` ## ggplot2 Graphics - What is `ggplot2`? - High-level graphics system developed by Hadley Wickham - Implements grammar of graphics from [Leland Wilkinson](http://www.amazon.com/Grammar-Graphics-Leland-Wilkinson/dp/0387987746) - Streamlines many graphics workflows for complex plots - Syntax centered around main `ggplot` function - Simpler `qplot` function provides many shortcuts - Documentation and Help - [Book](https://ggplot2-book.org/) - [R for Data Science](https://r4ds.had.co.nz/data-visualisation.html) - [ggplot2 reference](https://ggplot2.tidyverse.org/reference/) - [Cookbook for R](http://www.cookbook-r.com/Graphs/) ### Design Concept of `ggplot2` Plotting formalized and implemented by the grammar of graphics by Leland Wilkinson and Hadley Wickham [@Wickham2010-nu; @Wilkinson2012-wv; @Wickham2009-aq]. The plotting process in `ggplot2` is devided into layers including: 1. Data: the actual data to be plotted 2. Aesthetics: visual property of the objects in a plot (_e.g._ size, shape or color ) 3. Geometries: shapes used to represent data (_e.g._ bar or scatter plot) 4. Facets: row and column layout of sub-plots 5. Statistics: data models and summaries 6. Coordinates: the plotting space 7. Theme: styles to be used, such as fonts, backgrounds, etc.
### `ggplot2` Usage - `ggplot` function accepts two main arguments - Data set to be plotted - Aesthetic mappings provided by `aes` function - Additional parameters such as geometric objects (_e.g._ points, lines, bars) are passed on by appending them with `+` as separator. - List of available `geom_*` functions see [here](https://ggplot2.tidyverse.org/reference/) - Settings of plotting theme can be accessed with the command `theme_get()` and its settings can be changed with `theme()`. - Preferred input data object - `qplot`: `data.frame` or `tibble` (support for `vector`, `matrix`, `...`) - `ggplot`: `data.frame` or `tibble` - Packages with convenience utilities to create expected inputs - `dplyr` (`plyr`) - `tidyr` and `reshape2` ### `qplot` Function The syntax of `qplot` is similar as R's basic `plot` function - Arguments - `x`: x-coordinates (_e.g._ `col1`) - `y`: y-coordinates (_e.g._ `col2`) - `data`: `data.frame` or `tibble` with corresponding column names - `xlim, ylim`: _e.g._ `xlim=c(0,10)` - `log`: _e.g._ `log="x"` or `log="xy"` - `main`: main title; see `?plotmath` for mathematical formula - `xlab, ylab`: labels for the x- and y-axes - `color`, `shape`, `size` - `...`: many arguments accepted by `plot` function ### `qplot`: scatter plot basics Create sample data, here 3 vectors: `x`, `y` and `cat` ```{r qplot_scatter_plot_sample, eval=TRUE} library(ggplot2) x <- sample(1:10, 10); y <- sample(1:10, 10); cat <- rep(c("A", "B"), 5) ``` Simple scatter plot ```{r qplot_scatter_plot, eval=TRUE} qplot(x, y, geom="point") ``` Prints dots with different sizes and colors ```{r qplot_scatter_plot_dot_param, eval=TRUE} qplot(x, y, geom="point", size=x, color=cat, main="Dot Size and Color Relative to Some Values") ``` Drops legend ```{r qplot_scatter_plot_no_legend, eval=TRUE} qplot(x, y, geom="point", size=x, color=cat) + theme(legend.position = "none") ``` Plot different shapes ```{r qplot_scatter_plot_shapes, eval=TRUE} qplot(x, y, geom="point", size=5, shape=cat) ``` #### Colored groups ```{r qplot_scatter_plot_colored_groups, eval=TRUE} p <- qplot(x, y, geom="point", size=x, color=cat, main="Dot Size and Color Relative to Some Values") + theme(legend.position = "none") print(p) ``` #### Regression line ```{r qplot_scatter_regression_line, eval=TRUE} set.seed(1410) dsmall <- diamonds[sample(nrow(diamonds), 1000), ] p <- qplot(carat, price, data = dsmall) + geom_smooth(method="lm") print(p) ``` #### Local regression curve (loess) ```{r qplot_scatter_regression_loess, eval=TRUE} p <- qplot(carat, price, data=dsmall, geom=c("point", "smooth")) print(p) # Setting se=FALSE removes error shade ``` ### `ggplot` Function - More important than `qplot` to access full functionality of `ggplot2` - Main arguments - data set, usually a `data.frame` or `tibble` - aesthetic mappings provided by `aes` function - General `ggplot` syntax - __`ggplot(data, aes(...)) + geom() + ... + stat() + ...`__ - Layer specifications - `geom(mapping, data, ..., geom, position)` - `stat(mapping, data, ..., stat, position)` - Additional components - `scales` - `coordinates` - `facet` - `aes()` mappings can be passed on to all components (`ggplot, geom`, etc.). Effects are global when passed on to `ggplot()` and local for other components. - `x, y` - `color`: grouping vector (factor) - `group`: grouping vector (factor) #### Changing Plotting Themes in `ggplot` - Theme settings can be accessed with `theme_get()` - Their settings can be changed with `theme()` Example how to change background color to white ```{r ggplot_background_color, eval=FALSE} ... + theme(panel.background=element_rect(fill = "white", colour = "black")) ``` #### Storing `ggplot` Specifications Plots and layers can be stored in variables ```{r ggplot_store_plot, eval=FALSE} p <- ggplot(dsmall, aes(carat, price)) + geom_point() p # or print(p) ``` Returns information about data and aesthetic mappings followed by each layer ```{r ggplot_summary, eval=FALSE} summary(p) ``` Print dots with different sizes and colors ```{r ggplot_dot_sizes, eval=FALSE} bestfit <- geom_smooth(method = "lm", se = F, color = alpha("steelblue", 0.5), size = 2) p + bestfit # Plot with custom regression line ``` Syntax to pass on other data sets ```{r ggplot_pass_data, eval=FALSE} p %+% diamonds[sample(nrow(diamonds), 100),] ``` Saves plot stored in variable `p` to file ```{r ggplot_save_plot, eval=FALSE} ggsave(p, file="myplot.pdf") ``` Standard R export functons for graphics work as well (see [here](https://girke.bioinformatics.ucr.edu/GEN242/tutorials/rgraphics/rgraphics/#saving-graphics-to-files)). ### `ggplot`: scatter plots #### Basic example ```{r ggplot_scatter_plot1, eval=TRUE} set.seed(1410) dsmall <- as.data.frame(diamonds[sample(nrow(diamonds), 1000), ]) p <- ggplot(dsmall, aes(carat, price, color=color)) + geom_point(size=4) print(p) ``` Interactive version of above plot can be generated with the `ggplotly` function from the `plotly` package. ```{r ggplot_scatter_plot_interactive1, eval=TRUE} library(plotly) ggplotly(p) ``` #### Regression line ```{r ggplot_regression_line, eval=TRUE} p <- ggplot(dsmall, aes(carat, price)) + geom_point() + geom_smooth(method="lm", se=FALSE) + theme(panel.background=element_rect(fill = "white", colour = "black")) print(p) ``` #### Several regression lines ```{r ggplot_many_regression_lines, eval=TRUE} p <- ggplot(dsmall, aes(carat, price, group=color)) + geom_point(aes(color=color), size=2) + geom_smooth(aes(color=color), method = "lm", se=FALSE) print(p) ``` #### Local regression curve (loess) ```{r ggplot_loess_regression, eval=TRUE} p <- ggplot(dsmall, aes(carat, price)) + geom_point() + geom_smooth() print(p) # Setting se=FALSE removes error shade ``` ### `ggplot`: line plot ```{r ggplot_line_plot, eval=TRUE} p <- ggplot(iris, aes(Petal.Length, Petal.Width, group=Species, color=Species)) + geom_line() print(p) ``` ### Faceting ```{r ggplot_line_plot_faceting, eval=TRUE} p <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) + geom_line(aes(color=Species), size=1) + facet_wrap(~Species, ncol=1) print(p) ``` #### Exercise 3 Scatter plots with `ggplot2` - __Task 1__: Generate scatter plot for first two columns in `iris` data frame and color dots by its `Species` column. - __Task 2__: Use the `xlim` and `ylim` arguments to set limits on the x- and y-axes so that all data points are restricted to the left bottom quadrant of the plot. - __Task 3__: Generate corresponding line plot with faceting presenting the individual data sets in saparate plots. Structure of `iris` data set ```{r structure_iris_data3, eval=TRUE} class(iris) iris[1:4,] table(iris$Species) ``` ```{r exercise3_solution, eval=FALSE, echo=FALSE} ggplot(iris, aes(Sepal.Length, Sepal.Width)) + geom_point(aes(color=Species), size=3) + xlim(4,15) + ylim(2,12) ggplot(iris, aes(Sepal.Length, Sepal.Width)) + geom_line(aes(color=Species), size=3) + ggtitle("Line Plot") + facet_wrap(~Species, ncol=1) ``` ### Bar Plots Sample Set: the following transforms the `iris` data set into a ggplot2-friendly format. Calculate mean values for aggregates given by `Species` column in `iris` data set ```{r iris_mean, eval=TRUE} iris_mean <- aggregate(iris[,1:4], by=list(Species=iris$Species), FUN=mean) ``` Calculate standard deviations for aggregates given by `Species` column in `iris` data set ```{r iris_sd, eval=TRUE} iris_sd <- aggregate(iris[,1:4], by=list(Species=iris$Species), FUN=sd) ``` Reformat `iris_mean` with `melt` from wide to long form as expected by `ggplot2`. Newer alternatives for restructuring `data.frames` and `tibbles` from wide into long form use the `gather` and `pivot_longer` functions defined by the `tidyr` package. Their usage is shown below as well. The functions `pivot_longer` and `pivot_wider` are expected to provide the most flexible long-term solution, but may not work in older R versions. ```{r iris_mean_melt, eval=TRUE} library(reshape2) # Defines melt function df_mean <- melt(iris_mean, id.vars=c("Species"), variable.name = "Samples", value.name="Values") df_mean2 <- tidyr::gather(iris_mean, !Species, key = "Samples", value = "Values") df_mean3 <- tidyr::pivot_longer(iris_mean, !Species, names_to="Samples", values_to="Values") ``` Reformat `iris_sd` with `melt` ```{r iris_sd_melt, eval=TRUE} df_sd <- melt(iris_sd, id.vars=c("Species"), variable.name = "Samples", value.name="Values") ``` Define standard deviation limits ```{r iris_limits, eval=TRUE} limits <- aes(ymax = df_mean[,"Values"] + df_sd[,"Values"], ymin=df_mean[,"Values"] - df_sd[,"Values"]) ``` #### Verical orientation ```{r iris_mean_bar_plot, eval=TRUE} p <- ggplot(df_mean, aes(Samples, Values, fill = Species)) + geom_bar(position="dodge", stat="identity") print(p) ``` To enforce that the bars are plotted in the order specified in the input data, one can instruct `ggplot` to do so by turning the corresponding column (here `Species`) into an ordered factor as follows. ```{r iris_factor_order, eval=FALSE} df_mean$Species <- factor(df_mean$Species, levels=unique(df_mean$Species), ordered=TRUE) ``` In the above example this is not necessary since `ggplot` uses this order already. #### Horizontal orientation ```{r iris_mean_bar_plot_sideways, eval=TRUE} p <- ggplot(df_mean, aes(Samples, Values, fill = Species)) + geom_bar(position="dodge", stat="identity") + coord_flip() + theme(axis.text.y=element_text(angle=0, hjust=1)) print(p) ``` #### Faceting ```{r iris_mean_bar_plot_facetting, eval=FALSE} p <- ggplot(df_mean, aes(Samples, Values)) + geom_bar(aes(fill = Species), stat="identity") + facet_wrap(~Species, ncol=1) print(p) ``` #### Error bars ```{r iris_mean_bar_plot_error, eval=TRUE} p <- ggplot(df_mean, aes(Samples, Values, fill = Species)) + geom_bar(position="dodge", stat="identity") + geom_errorbar(limits, position="dodge") print(p) ``` #### Mirrored ```{r ggplot2_mirrored_barplot, eval=TRUE} df <- data.frame(group = rep(c("Above", "Below"), each=10), x = rep(1:10, 2), y = c(runif(10, 0, 1), runif(10, -1, 0))) p <- ggplot(df, aes(x=x, y=y, fill=group)) + geom_col() print(p) ``` ### Changing Color Settings ```{r ggplot2_color1, eval=TRUE} library(RColorBrewer) # display.brewer.all() p <- ggplot(df_mean, aes(Samples, Values, fill=Species, color=Species)) + geom_bar(position="dodge", stat="identity") + geom_errorbar(limits, position="dodge") + scale_fill_brewer(palette="Blues") + scale_color_brewer(palette = "Greys") print(p) ``` #### Using standard R color theme ```{r ggplot2_color2, eval=TRUE} p <- ggplot(df_mean, aes(Samples, Values, fill=Species, color=Species)) + geom_bar(position="dodge", stat="identity") + geom_errorbar(limits, position="dodge") + scale_fill_manual(values=c("red", "green3", "blue")) + scale_color_manual(values=c("red", "green3", "blue")) print(p) ``` #### Exercise 4 Bar plots - __Task 1__: Calculate the mean values for the `Species` components of the first four columns in the `iris` data set. Use the `melt` function from the `reshape2` package to bring the data into the expected format for `ggplot`. - __Task 2__: Generate two bar plots: one with stacked bars and one with horizontally arranged bars. Structure of iris data set ```{r iris_structure4, eval=TRUE} class(iris) iris[1:4,] table(iris$Species) ``` ```{r exercise4_solution, eval=FALSE, echo=FALSE} iris_mean <- aggregate(iris[,1:4], by=list(Species=iris$Species), FUN=mean) library(reshape2) df_mean <- melt(iris_mean, id.vars=c("Species"), variable.name = "Samples", value.name="Values") ggplot(df_mean, aes(Samples, Values, fill=Species)) + geom_bar(position="stack", stat="identity") ggplot(df_mean, aes(Samples, Values, fill=Species)) + geom_bar(position="dodge", stat="identity") ``` ### Data reformatting example Here for line plot ```{r ggplot_melt_data, eval=TRUE} y <- matrix(rnorm(500), 100, 5, dimnames=list(paste("g", 1:100, sep=""), paste("Sample", 1:5, sep=""))) y <- data.frame(Position=1:length(y[,1]), y) y[1:4, ] # First rows of input format expected by melt() df <- melt(y, id.vars=c("Position"), variable.name = "Samples", value.name="Values") p <- ggplot(df, aes(Position, Values)) + geom_line(aes(color=Samples)) + facet_wrap(~Samples, ncol=1) print(p) ``` Same data can be represented in box plot as follows ```{r ggplot_melt_box_plot, eval=FALSE} ggplot(df, aes(Samples, Values, fill=Samples)) + geom_boxplot() + geom_jitter(color="darkgrey") ``` ### Jitter Plots ```{r ggplot_jitter_plot, eval=TRUE} p <- ggplot(dsmall, aes(color, price/carat)) + geom_jitter(alpha = I(1 / 2), aes(color=color)) print(p) ``` ### Box plots ```{r ggplot_box_plot2, eval=TRUE} p <- ggplot(dsmall, aes(color, price/carat, fill=color)) + geom_boxplot() print(p) ``` ### Violin plots ```{r ggplot_box_violin, eval=TRUE} p <- ggplot(dsmall, aes(color, price/carat, fill=color)) + geom_violin() print(p) ``` Same violin plot as interactive plot generated with `ggplotly`, where the actual data points are shown as well by including `geom_jitter()`. ```{r ggplot_box_plot2_jitter, eval=TRUE} p <- ggplot(dsmall, aes(color, price/carat, fill=color)) + geom_violin() + geom_jitter(aes(color=color)) ggplotly(p) ``` ### Density plots #### Line coloring ```{r ggplot_density_plot_linecol, eval=TRUE} p <- ggplot(dsmall, aes(carat)) + geom_density(aes(color = color)) print(p) ``` #### Area coloring ```{r ggplot_density_plot_areacol, eval=TRUE} p <- ggplot(dsmall, aes(carat)) + geom_density(aes(fill = color)) print(p) ``` ### Histograms ```{r ggplot_histogram, eval=TRUE} p <- ggplot(iris, aes(x=Sepal.Width)) + geom_histogram(aes(y = ..density.., fill = ..count..), binwidth=0.2) + geom_density() print(p) ``` ### Pie Chart ```{r ggplot_pie_chart, eval=TRUE} df <- data.frame(variable=rep(c("cat", "mouse", "dog", "bird", "fly")), value=c(1,3,3,4,2)) p <- ggplot(df, aes(x = "", y = value, fill = variable)) + geom_bar(width = 1, stat="identity") + coord_polar("y", start=pi / 3) + ggtitle("Pie Chart") print(p) ``` ### Wind Rose Pie Chart ```{r ggplot_windrose_pie_chart, eval=TRUE} p <- ggplot(df, aes(x = variable, y = value, fill = variable)) + geom_bar(width = 1, stat="identity") + coord_polar("y", start=pi / 3) + ggtitle("Pie Chart") print(p) ``` ### Arranging Graphics on Page Using `grid` package ```{r ggplot_arrange_graphics, eval=TRUE} library(grid) a <- ggplot(dsmall, aes(color, price/carat)) + geom_jitter(size=4, alpha = I(1 / 1.5), aes(color=color)) b <- ggplot(dsmall, aes(color, price/carat, color=color)) + geom_boxplot() c <- ggplot(dsmall, aes(color, price/carat, fill=color)) + geom_boxplot() + theme(legend.position = "none") grid.newpage() # Open a new page on grid device pushViewport(viewport(layout = grid.layout(2, 2))) # Assign to device viewport with 2 by 2 grid layout print(a, vp = viewport(layout.pos.row = 1, layout.pos.col = 1:2)) print(b, vp = viewport(layout.pos.row = 2, layout.pos.col = 1)) print(c, vp = viewport(layout.pos.row = 2, layout.pos.col = 2, width=0.3, height=0.3, x=0.8, y=0.8)) ``` Using `gridExtra` package ```{r ggplot_arrange_graphics2, eval=TRUE, warning=FALSE, message=FALSE} library(gridExtra) grid.arrange(a, b, c, nrow = 2, ncol=2) ``` Also see `patchwork` in ggplot2 book [here](https://ggplot2-book.org/arranging-plots.html). ### Inserting Graphics into Plots ```{r ggplot_insert_graphics, eval=TRUE} library(grid) print(a) print(b, vp=viewport(width=0.3, height=0.3, x=0.8, y=0.8)) ``` ## Specialty Graphics ### Spatial Heatmap Diagrams Spatial expression data can be visualized with the [`spatialHeatmap`](https://bioconductor.org/packages/release/bioc/vignettes/spatialHeatmap/inst/doc/spatialHeatmap.html) package.
### Venn Diagrams ```{r specgraph_venn, eval=TRUE, message=FALSE, fig.dim=c(5,5), fig.align="center"} library(systemPipeR) setlist5 <- list(A=sample(letters, 18), B=sample(letters, 16), C=sample(letters, 20), D=sample(letters, 22), E=sample(letters, 18)) OLlist5 <- overLapper(setlist=setlist5, sep="_", type="vennsets") vennPlot(OLlist5, mymain="", mysub="default", colmode=2, ccol=c("blue", "red")) ``` ### UpSet Plots ```{r specgraph_upset, eval=TRUE, message=FALSE, fig.dim=c(5,5), fig.align="center"} library(ComplexHeatmap) setlist <- list(A=sample(letters, 18), B=sample(letters, 16), C=sample(letters, 20)) setma <- make_comb_mat(setlist) UpSet(setma) ``` ### Compound Structures Plots depictions of small molecules with `ChemmineR` package ```{r specgraph_structure, eval=TRUE} library(ChemmineR) data(sdfsample) plot(sdfsample[1], print=FALSE) ``` ### Heatmaps There are many packages for plotting heatmaps in R. The following uses `pheatmap`. ```{r pheatmap, eval=TRUE, warning=FALSE, message=FALSE, fig.dim=c(6,6), fig.align="center"} library(pheatmap); library("RColorBrewer") y <- matrix(rnorm(50), 10, 5, dimnames=list(paste("g", 1:10, sep=""), paste("t", 1:5, sep=""))) pheatmap(y, color=brewer.pal(9,"Blues")) ``` ### ROC Plots A variety of libraries are available for plotting receiver operating characteristic (ROC) curves in R: + [ROCR](http://rocr.bioinf.mpi-sb.mpg.de/) + [ROC](http://bioconductor.org/packages/release/bioc/html/ROC.html) + [pROC](http://web.expasy.org/pROC/) (includes ggplot2) + [plotROC](https://cran.r-project.org/web/packages/plotROC/) (uses ggplot2) + [ggplot2](http://largedata.blogspot.com/2011/07/plotting-roc-curves-in-ggplot2.html) (custom code example) #### Example with ROCR Most commonly, in an ROC we plot the true positive rate (y-axis) against the false positive rate (x-axis) at decreasing thresholds. An illustrative example is provided in the `ROCR` package where one wants to inspect the content of the `ROCR.simple` object defining the structure of the input data in two vectors. ```{r ROCR_example, eval=TRUE, warning=FALSE, message=FALSE} # install.packages("ROCR") # Install if necessary library(ROCR) data(ROCR.simple) ROCR.simple pred <- prediction(ROCR.simple$predictions, ROCR.simple$labels) perf <- performance( pred, "tpr", "fpr" ) plot(perf) ``` Obtain area under the curve (AUC) ```{r ROCR_example2, eval=TRUE, warning=FALSE, message=FALSE} auc <- performance( pred, "tpr", "fpr", measure = "auc") auc@y.values[[1]] ``` #### Example with plotROC The [`plotROC`](https://cran.r-project.org/web/packages/plotROC/) package plots ROCs with `ggplot2`. The following generates a sample data set for several performance results, here three. For convenience the data are first arranged in a `data.frame` in wide format. Next, the `melt_roc` is used to convert the `data.frame` into the long format as required by `ggplot`. ```{r plotROC_example, eval=TRUE, warning=FALSE, message=FALSE} library(plotROC) set.seed(2529) D.ex <- rbinom(200, size = 1, prob = .5); M1 <- rnorm(200, mean = D.ex, sd = .65); M2 <- rnorm(200, mean = D.ex, sd = 1.1); M3 <- rnorm(200, mean = D.ex, sd = 1.5) perfDF <- data.frame(D = D.ex, D.str = c("TRUE", "Ill")[D.ex + 1], M1 = M1, M2 = M2, M3=M3, stringsAsFactors = FALSE) perfDF[1:4,] # wide format long_perfDF <- melt_roc(perfDF, "D", c("M1", "M2", "M3")) # transformed into long format for ggplot long_perfDF[1:4,] # long format ``` After converting the sample data into the long format the results can be plotted with `geom_roc`, where several ROCs are combined in a single plot and the corresponding AUC values are shown in the legend. ```{r plotROC_example2, eval=TRUE, warning=FALSE, message=FALSE} multi_roc <- ggplot(long_perfDF, aes(d = D, m = M, color = name)) + geom_roc(n.cuts=0) auc_df <- calc_auc(multi_roc) # calculate AUC values auc_str <- paste0(auc_df$name, ": ", round(auc_df$AUC, 2)) multi_roc + scale_color_manual(name="AUC:", labels=auc_str, values=seq_along(auc_str)) ``` ### Trees The [`ape`](http://ape-package.ird.fr/ape_screenshots.html) package provides many useful utilities for phylogenetic analysis and tree plotting. Another useful package for plotting trees is [`ggtree`](http://bioconductor.org/packages/release/bioc/html/ggtree.html). The following example plots two trees face to face with links to identical leaf labels. ```{r trees_ape1, eval=TRUE} library(ape) tree1 <- rtree(40) tree2 <- rtree(20) association <- cbind(tree2$tip.label, tree2$tip.label) cophyloplot(tree1, tree2, assoc = association, length.line = 4, space = 28, gap = 3) ``` ## Genome Graphics ### `ggbio` - What is `ggbio`? - A programmable genome browser environment - Genome broswer concepts - A genome browser is a visulalization tool for plotting different types of genomic data in separate tracks along chromosomes. - The `ggbio` package [@Yin2012-jj] facilitates plotting of complex genome data objects, such as read alignments (SAM/BAM), genomic context/annotation information (gff/txdb), variant calls (VCF/BCF), and more. To easily compare these data sets, it extends the faceting facility of `ggplot2` to genome browser-like tracks. - Most of the core object types for handling genomic data with R/Bioconductor are supported: `GRanges`, `GAlignments`, `VCF`, etc. For more details, see Table 1.1 of the `ggbio` vignette [here](http://www.bioconductor.org/packages/release/bioc/vignettes/ggbio/inst/doc/ggbio.pdf). - `ggbio`'s convenience plotting function is `autoplot`. For more customizable plots, one can use the generic `ggplot` function. - Apart from the standard `ggplot2` plotting components, `ggbio` defines serval new components useful for genomic data visualization. A detailed list is given in Table 1.2 of the vignette [here](http://www.bioconductor.org/packages/release/bioc/vignettes/ggbio/inst/doc/ggbio.pdf). - Useful web sites: - [ggbio Bioc vignette](http://bioconductor.org/packages/release/bioc/html/ggbio.html) - [ggbio online manual](https://lawremi.github.io/ggbio/) - [autoplot demo](https://lawremi.github.io/ggbio/docs/man/autoplot-method.html) #### Tracks: aligning plots along chromosomes ```{r ggbio_align, eval=TRUE, warning=FALSE, message=FALSE} library(ggbio) df1 <- data.frame(time = 1:100, score = sin((1:100)/20)*10) p1 <- qplot(data = df1, x = time, y = score, geom = "line") df2 <- data.frame(time = 30:120, score = sin((30:120)/20)*10, value = rnorm(120-30 +1)) p2 <- ggplot(data = df2, aes(x = time, y = score)) + geom_line() + geom_point(size = 2, aes(color = value)) tracks(time1 = p1, time2 = p2) + xlim(1, 40) + theme_tracks_sunset() ``` #### Plotting genomic ranges `GRanges` objects are essential for storing alignment or annotation ranges in R/Bioconductor. The following creates a sample `GRanges` object and plots its content. ```{r ggbio_granges, eval=TRUE} library(GenomicRanges) set.seed(1); N <- 100; gr <- GRanges(seqnames = sample(c("chr1", "chr2", "chr3"), size = N, replace = TRUE), IRanges(start = sample(1:300, size = N, replace = TRUE), width = sample(70:75, size = N,replace = TRUE)), strand = sample(c("+", "-"), size = N, replace = TRUE), value = rnorm(N, 10, 3), score = rnorm(N, 100, 30), sample = sample(c("Normal", "Tumor"), size = N, replace = TRUE), pair = sample(letters, size = N, replace = TRUE)) autoplot(gr, aes(color = strand, fill = strand), facets = strand ~ seqnames) ``` #### Plotting coverage ```{r ggbio_coverage, eval=TRUE} autoplot(gr, aes(color = strand, fill = strand), facets = strand ~ seqnames, stat = "coverage") ``` #### Mirrored coverage ```{r ggbio_mirrored_coverage, eval=TRUE} pos <- sapply(coverage(gr[strand(gr)=="+"]), as.numeric) pos <- data.frame(Chr=rep(names(pos), sapply(pos, length)), Strand=rep("+", length(unlist(pos))), Position=unlist(sapply(pos, function(x) 1:length(x))), Coverage=as.numeric(unlist(pos))) neg <- sapply(coverage(gr[strand(gr)=="-"]), as.numeric) neg <- data.frame(Chr=rep(names(neg), sapply(neg, length)), Strand=rep("-", length(unlist(neg))), Position=unlist(sapply(neg, function(x) 1:length(x))), Coverage=-as.numeric(unlist(neg))) covdf <- rbind(pos, neg) p <- ggplot(covdf, aes(Position, Coverage, fill=Strand)) + geom_col() + facet_wrap(~Chr) p ``` ### Circular genome plots ```{r ggbio_circular1, eval=TRUE} ggplot(gr) + layout_circle(aes(fill = seqnames), geom = "rect") ``` More complex circular example ```{r ggbio_circular2, eval=FALSE} seqlengths(gr) <- c(400, 500, 700) values(gr)$to.gr <- gr[sample(1:length(gr), size = length(gr))] idx <- sample(1:length(gr), size = 50) gr <- gr[idx] ggplot() + layout_circle(gr, geom = "ideo", fill = "gray70", radius = 7, trackWidth = 3) + layout_circle(gr, geom = "bar", radius = 10, trackWidth = 4, aes(fill = score, y = score)) + layout_circle(gr, geom = "point", color = "red", radius = 14, trackWidth = 3, grid = TRUE, aes(y = score)) + layout_circle(gr, geom = "link", linked.to = "to.gr", radius = 6, trackWidth = 1) ```
#### Alignments and variants To make the following example work, please download and unpack this [data archive](https://cluster.hpcc.ucr.edu/~tgirke/HTML_Presentations/Manuals/Workshop_Dec_12_16_2013/Rgraphics/data.zip) containing GFF, BAM and VCF sample files. ```{r ggbio_align_variants, eval=TRUE, warning=FALSE, message=FALSE} library(rtracklayer); library(GenomicFeatures); library(Rsamtools); library(GenomicAlignments); library(VariantAnnotation) ga <- readGAlignments("./data/SRR064167.fastq.bam", use.names=TRUE, param=ScanBamParam(which=GRanges("Chr5", IRanges(4000, 8000)))) p1 <- autoplot(ga, geom = "rect") p2 <- autoplot(ga, geom = "line", stat = "coverage") vcf <- readVcf(file="data/varianttools_gnsap.vcf", genome="ATH1") p3 <- autoplot(vcf[seqnames(vcf)=="Chr5"], type = "fixed") + xlim(4000, 8000) + theme(legend.position = "none", axis.text.y = element_blank(), axis.ticks.y=element_blank()) txdb <- makeTxDbFromGFF(file="./data/TAIR10_GFF3_trunc.gff", format="gff3") p4 <- autoplot(txdb, which=GRanges("Chr5", IRanges(4000, 8000)), names.expr = "gene_id") tracks(Reads=p1, Coverage=p2, Variant=p3, Transcripts=p4, heights = c(0.3, 0.2, 0.1, 0.35)) + ylab("") ``` ### Additional examples See `autoplot` demo [here](https://lawremi.github.io/ggbio/docs/man/autoplot-method.html) ### Additional genome graphics - [`Gviz`](http://www.bioconductor.org/packages/devel/bioc/html/Gviz.html) - [`RCircos`](http://cran.us.r-project.org/web/packages/RCircos/index.html) [@Zhang2013-zn] - [`genoPlotR`](http://genoplotr.r-forge.r-project.org/) ### Genome Browser: IGV View genome data in IGV - Download and open [IGV](http://software.broadinstitute.org/software/igv/download) - Select in menu in top left corner _A. thaliana_ (TAIR10) - Upload the following indexed/sorted Bam files with `File -> Load from URL...` ``` https://cluster.hpcc.ucr.edu/~tgirke/Teaching/GEN242/bam_samples/SRR064154.fastq.bam https://cluster.hpcc.ucr.edu/~tgirke/Teaching/GEN242/bam_samples/SRR064155.fastq.bam https://cluster.hpcc.ucr.edu/~tgirke/Teaching/GEN242/bam_samples/SRR064166.fastq.bam https://cluster.hpcc.ucr.edu/~tgirke/Teaching/GEN242/bam_samples/SRR064167.fastq.bam ``` - To view area of interest, enter its coordinates `Chr1:49,457-51,457` in position menu on top.
#### Create symbolic links For viewing BAM files in IGV as part of `systemPipeR` workflows. - `systemPipeR`: utilities for building NGS analysis pipelines. ```{r systempiper_symbolic_links, eval=FALSE, message=FALSE} library("systemPipeR") symLink2bam(sysargs=args, htmldir=c("~/.html/", "somedir/"), urlbase="http://myserver.edu/~username/", urlfile="IGVurl.txt") ``` #### Controlling IGV from R Open IGV before running the following routine. Alternatively, open IGV from within R with `startIGV("lm")`. Note, the latter may not work on all systems. ```{r control_igv, eval=FALSE, warning=FALSE, message=FALSE} library(SRAdb) myurls <- readLines("http://cluster.hpcc.ucr.edu/~tgirke/Documents/R_BioCond/Samples/bam_urls.txt") # startIGV("lm") # opens IGV sock <- IGVsocket() IGVload(sock, myurls) IGVgoto(sock, 'Chr1:45296-47019') ``` ## Session Info ```{r sessionInfo} sessionInfo() ``` ## References