---
title: Graphics and Data Visualization in R
author: "First/last name (first.last@ucr.edu)"
date: "Last update: `r format(Sys.time(), '%d %B, %Y')`"
output:
html_document:
toc: true
toc_float:
collapsed: true
smooth_scroll: true
toc_depth: 3
fig_caption: yes
code_folding: show
number_sections: true
fontsize: 14pt
bibliography: bibtex.bib
weight: 15
type: docs
---
```{r style, echo = FALSE, results = 'asis'}
BiocStyle::markdown()
options(width=100, max.print=1000)
knitr::opts_chunk$set(
message=FALSE,
eval=as.logical(Sys.getenv("KNITR_EVAL", "TRUE")),
cache=as.logical(Sys.getenv("KNITR_CACHE", "TRUE")))
```
Source code downloads:
[ [.Rmd](https://raw.githubusercontent.com/tgirke/GEN242//main/content/en/tutorials/rgraphics/rgraphics.Rmd) ]
[ [.R](https://raw.githubusercontent.com/tgirke/GEN242//main/content/en/tutorials/rgraphics/rgraphics.R) ]
## Overview
### Graphics in R
- Powerful environment for visualizing scientific data
- Integrated graphics and statistics infrastructure
- Publication quality graphics
- Fully programmable
- Highly reproducible
- Full [LaTeX](http://www.latex-project.org/), [Sweave](http://www.stat.auckland.ac.nz/~dscott/782/Sweave-manual-20060104.pdf), [knitr](http://yihui.name/knitr/) and [R Markdown](http://rmarkdown.rstudio.com/) support.
- Vast number of R packages with graphics utilities
### Documentation on Graphics in R
- General
- [Graphics Task Page](https://cran.r-project.org/web/views/)
- [R Graph Gallery](http://www.r-graph-gallery.com/)
- [R Graphical Manual](https://www.imsbio.co.jp/RGM/R_image_list?page=575&init=true)
- [Paul Murrell’s book R (Grid) Graphics](http://www.stat.auckland.ac.nz/~paul/RGraphics/rgraphics.html)
- ggplot2: [Book](https://ggplot2-book.org/), [R for Data Science](https://r4ds.had.co.nz/data-visualisation.html) and [ggplot2 reference](https://ggplot2.tidyverse.org/reference/)
- Interactive graphics
- [rggobi (GGobi)](http://www.ggobi.org/)
- [iplots](http://www.iplots.org/)
- [Open GL (`rgl`)](http://rgl.neoscientists.org/gallery.shtml)
- [plotly](https://plotly.com/r/)
### Graphics Environments
- Viewing and savings graphics in R
- On-screen graphics
- postscript, pdf, svg
- jpeg/png/wmf/tiff/...
- Four major graphics environments
- Low-level infrastructure
- R Base Graphics (low- and high-level)
- `grid`: [Manual](http://www.stat.auckland.ac.nz/~paul/grid/grid.html), [Book](http://www.stat.auckland.ac.nz/~paul/RGraphics/rgraphics.html)
- High-level infrastructure
- `lattice`: [Manual](http://lmdvr.r-forge.r-project.org), [Intro](https://lattice.r-forge.r-project.org/Vignettes/src/lattice-intro/lattice-intro.pdf), [Book](http://www.amazon.com/Lattice-Multivariate-Data-Visualization-Use/dp/0387759689)
- `ggplot2`: [Manual](https://r4ds.hadley.nz/data-visualize.html), [Intro](http://www.ling.upenn.edu/~joseff/rstudy/summer2010_ggplot2_intro.html), [Book](https://ggplot2-book.org/)
## Base Graphics
### Overview
- Important high-level plotting functions
- `plot`: generic x-y plotting
- `barplot`: bar plots
- `boxplot`: box-and-whisker plot
- `hist`: histograms
- `pie`: pie charts
- `dotchart`: cleveland dot plots
- `image, heatmap, contour, persp`: functions to generate image-like plots
- `qqnorm, qqline, qqplot`: distribution comparison plots
- `pairs, coplot`: display of multivariant data
- Help on these functions
- `?myfct`
- `?plot`
- `?par`
### Preferred Input Data Objects
- Matrices and data frames
- Vectors
- Named vectors
### Scatter Plots
#### Basic scatter plots
Sample data set for subsequent plots
```{r scatter_plot_basic, eval=TRUE}
set.seed(1410)
y <- matrix(runif(30), ncol=3, dimnames=list(letters[1:10], LETTERS[1:3]))
y
plot(y[,1], y[,2])
```
#### All pairs
```{r scatter_plot_allpairs, eval=TRUE}
pairs(y)
```
#### Plot labels
```{r scatter_plot_labels, eval=TRUE}
plot(y[,1], y[,2], pch=20, col="red", main="Symbols and Labels")
text(y[,1]+0.03, y[,2], rownames(y))
```
#### More examples
Print instead of symbols the row names
```{r scatter_plot_row_names, eval=FALSE}
plot(y[,1], y[,2], type="n", main="Plot of Labels")
text(y[,1], y[,2], rownames(y))
```
Usage of important plotting parameters
```{r plot_parameters, eval=FALSE}
grid(5, 5, lwd = 2)
op <- par(mar=c(8,8,8,8), bg="lightblue")
plot(y[,1], y[,2], type="p", col="red", cex.lab=1.2, cex.axis=1.2,
cex.main=1.2, cex.sub=1, lwd=4, pch=20, xlab="x label",
ylab="y label", main="My Main", sub="My Sub")
par(op)
```
Important arguments}
- `mar`: specifies the margin sizes around the plotting area in order: `c(bottom, left, top, right)`
- `col`: color of symbols
- `pch`: type of symbols, samples: `example(points)`
- `lwd`: size of symbols
- `cex.*`: control font sizes
- For details see `?par`
Add a regression line to a plot
```{r scatter_plot_regress, eval=TRUE}
plot(y[,1], y[,2])
myline <- lm(y[,2]~y[,1]); abline(myline, lwd=2)
summary(myline)
```
Same plot as above, but on log scale
```{r scatter_plot_log_scale, eval=TRUE}
plot(y[,1], y[,2], log="xy")
```
Add a mathematical expression to a plot
```{r scatter_plot_math, eval=TRUE}
plot(y[,1], y[,2]); text(y[1,1], y[1,2],
expression(sum(frac(1,sqrt(x^2*pi)))), cex=1.3)
```
#### Exercise 1
- __Task 1__: Generate scatter plot for first two columns in `iris` data frame and color dots by its `Species` column.
- __Task 2__: Use the `xlim/ylim` arguments to set limits on the x- and y-axes so that all data points are restricted to the left bottom quadrant of the plot.
Structure of iris data set:
```{r iris_structure, eval=TRUE}
class(iris)
iris[1:4,]
table(iris$Species)
```
```{r exercise1_solution, eval=FALSE, echo=FALSE}
plot(iris[,1], iris[,2], col=iris$Species, lwd=2, pch=19)
plot(iris[,1], iris[,2], col=iris$Species, lwd=2, pch=19, xlim=c(4,16), ylim=c(2,8))
```
### Line Plots
#### Single Data Set
```{r line_plot_single, eval=TRUE}
plot(y[,1], type="l", lwd=2, col="blue")
```
Additional lines can be added to an existing plot with the `lines()` function.
```{r line_plot_multiple, eval=TRUE}
plot(y[,1], type="l", lwd=2, col="blue")
lines(y[,2], lwd=2, lty=1, col="red")
legend(8.3, 0.95, legend=c("Line 1", "Line 2"), col=c("blue", "red"), lty=1)
```
#### Many Data Sets
Alterntively, one can plot a line graph for all columns in data frame `y` with the following approach. The `split.screen`
function is used in this example in a `for` loop to overlay several line graphs in the same plot.
```{r line_plot_many, eval=TRUE}
split.screen(c(1,1))
plot(y[,1], ylim=c(0,1), xlab="Measurement", ylab="Intensity", type="l", lwd=2, col=1)
for(i in 2:length(y[1,])) {
screen(1, new=FALSE)
plot(y[,i], ylim=c(0,1), type="l", lwd=2, col=i, xaxt="n", yaxt="n", ylab="",
xlab="", main="", bty="n")
}
close.screen(all=TRUE)
```
### Bar Plots
#### Basics
```{r bar_plot_basic, eval=TRUE}
barplot(y[1:4,], ylim=c(0, max(y[1:4,])+0.3), beside=TRUE,
legend=letters[1:4])
text(labels=round(as.vector(as.matrix(y[1:4,])),2), x=seq(1.5, 13, by=1)
+sort(rep(c(0,1,2), 4)), y=as.vector(as.matrix(y[1:4,]))+0.04)
```
#### Error bars
```{r bar_plot_error_bar, eval=TRUE}
bar <- barplot(m <- rowMeans(y) * 10, ylim=c(0, 10))
stdev <- sd(t(y))
arrows(bar, m, bar, m + stdev, length=0.15, angle = 90)
```
#### Mirrored bar plot
```{r bar_plot_mirrored, eval=TRUE}
df <- data.frame(group = rep(c("Above", "Below"), each=10), x = rep(1:10, 2), y = c(runif(10, 0, 1), runif(10, -1, 0)))
plot(c(0,12), range(df$y), type = "n")
barplot(height = df$y[df$group == "Above"], add = TRUE, axes = FALSE)
barplot(height = df$y[df$group == "Below"], add = TRUE, axes = FALSE)
```
#### Bar plot of loan payments and amortization tables
The following imports a `mortgage` payment function (from [here](https://raw.githubusercontent.com/tgirke/GEN242/main/content/en/tutorials/rgraphics/scripts/mortgage.R)) that calculates
monthly and annual mortgage/loan payments, generates amortization tables and
plots the results in form of a bar plot. A Shiny App using this function has been created
by Antoine Soetewey [here](https://antoinesoetewey.shinyapps.io/mortgage-calculator/).
```{r bar_plot_mortgage, eval=TRUE}
source("https://raw.githubusercontent.com/tgirke/GEN242/main/content/en/tutorials/rgraphics/scripts/mortgage.R")
m <- mortgage(P=250000, I=6, L=15, plotData=TRUE)
```
### Histograms
```{r hist_plot, eval=TRUE}
hist(y, freq=TRUE, breaks=10)
```
### Density Plots}
```{r density_plot, eval=TRUE}
plot(density(y), col="red")
```
### Pie Charts
```{r pie_chart, eval=TRUE}
pie(y[,1], col=rainbow(length(y[,1]), start=0.1, end=0.8), clockwise=TRUE)
legend("topright", legend=row.names(y), cex=1.3, bty="n", pch=15, pt.cex=1.8,
col=rainbow(length(y[,1]), start=0.1, end=0.8), ncol=1)
```
### Color Selection Utilities
Default color palette and how to change it
```{r color_palette, eval=TRUE}
palette()
palette(rainbow(5, start=0.1, end=0.2))
palette()
palette("default")
```
The `gray` function allows to select any type of gray shades by providing values from 0 to 1
```{r gray_palette, eval=TRUE}
gray(seq(0.1, 1, by= 0.2))
```
Color gradients with `colorpanel` function from `gplots` library
```{r color_panel, eval=FALSE}
library(gplots)
colorpanel(5, "darkblue", "yellow", "white")
```
Much more on colors in R see Earl Glynn's [color chart](http://earlglynn.github.io/color/ColorInR)
### Arranging Several Plots on Single Page
With `par(mfrow=c(nrow, ncol))` one can define how several plots are arranged next to each other.
```{r par_mfrow, eval=TRUE}
par(mfrow=c(2,3))
for(i in 1:6) plot(1:10)
```
#### Arranging Plots with Variable Width
The `layout` function allows to divide the plotting device into variable numbers of rows and columns with the column-widths and the row-heights specified in the respective arguments.
```{r layout_plot, eval=TRUE}
nf <- layout(matrix(c(1,2,3,3), 2, 2, byrow=TRUE), c(3,7), c(5,5),
respect=TRUE)
# layout.show(nf)
for(i in 1:3) barplot(1:10)
```
### Saving Graphics to Files
After the `pdf()` command all graphs are redirected to file `test.pdf`. Works for all common formats similarly: jpeg, png, ps, tiff, ...
```{r save_pdf, eval=FALSE}
pdf("test.pdf"); plot(1:10, 1:10); dev.off()
```
Generates Scalable Vector Graphics (SVG) files that can be edited in vector graphics programs, such as InkScape.
```{r save_svg, eval=FALSE}
svg("test.svg"); plot(1:10, 1:10); dev.off()
```
#### Exercise 2
Bar plots
- __Task 1__: Calculate the mean values for the `Species` components of the first four columns in the `iris` data set. Organize the results in a matrix where the row names are the unique values from the `iris Species` column and the column names are the same as in the first four `iris` columns.
- __Task 2__: Generate two bar plots: one with stacked bars and one with horizontally arranged bars.
Structure of iris data set:
```{r structure_iris2, eval=TRUE}
class(iris)
iris[1:4,]
table(iris$Species)
```
```{r exercise2_solution, eval=FALSE, echo=FALSE}
mMA <- sapply(colnames(iris[,1:4]), function(x) tapply(iris[,x], iris[,5], mean))
barplot(mMA, beside=T, legend=rownames(mMA))
```
## Grid Graphics
- What is `grid`?
- Low-level graphics system
- Highly flexible and controllable system
- Does not provide high-level functions
- Intended as development environment for custom plotting functions
- Pre-installed on new R distributions
- Documentation and Help
- [Manual](http://www.stat.auckland.ac.nz/~paul/grid/grid.html)
- [Book](http://www.stat.auckland.ac.nz/~paul/RGraphics/rgraphics.html)
## lattice Graphics
- What is `lattice`?
- High-level graphics system
- Developed by Deepayan Sarkar
- Implements Trellis graphics system from S-Plus
- Simplifies high-level plotting tasks: arranging complex graphical features
- Syntax similar to R's base graphics
- Documentation and Help
- [Manual](http://lmdvr.r-forge.r-project.org)
- [Intro](https://www.statmethods.net/advgraphs/trellis.html)
- [Book](http://www.amazon.com/Lattice-Multivariate-Data-Visualization-Use/dp/0387759689)
Open a list of all functions available in the lattice package
```{r help_lattice, eval=FALSE}
library(lattice)
library(help=lattice)
```
Accessing and changing global parameters:
```{r param_lattice, eval=FALSE}
?lattice.options
?trellis.device
```
### Scatter Plot Sample
```{r scatter_plot_lattice, eval=TRUE}
library(lattice)
p1 <- xyplot(1:8 ~ 1:8 | rep(LETTERS[1:4], each=2), as.table=TRUE)
plot(p1)
```
### Line Plot Sample
```{r line_plot_lattice, eval=TRUE}
library(lattice)
p2 <- parallelplot(~iris[1:4] | Species, iris, horizontal.axis = FALSE,
layout = c(1, 3, 1))
plot(p2)
```
## ggplot2 Graphics
- What is `ggplot2`?
- High-level graphics system developed by Hadley Wickham
- Implements grammar of graphics from [Leland Wilkinson](http://www.amazon.com/Grammar-Graphics-Leland-Wilkinson/dp/0387987746)
- Streamlines many graphics workflows for complex plots
- Syntax centered around main `ggplot` function
- Simpler `qplot` function provides many shortcuts
- Documentation and Help
- [Book](https://ggplot2-book.org/)
- [R for Data Science](https://r4ds.had.co.nz/data-visualisation.html)
- [ggplot2 reference](https://ggplot2.tidyverse.org/reference/)
- [Cookbook for R](http://www.cookbook-r.com/Graphs/)
### Design Concept of `ggplot2`
Plotting formalized and implemented by the grammar of graphics by Leland Wilkinson and Hadley Wickham [@Wickham2010-nu; @Wilkinson2012-wv; @Wickham2009-aq]. The plotting process
in `ggplot2` is devided into layers including:
1. Data: the actual data to be plotted
2. Aesthetics: visual property of the objects in a plot (_e.g._ size, shape or color )
3. Geometries: shapes used to represent data (_e.g._ bar or scatter plot)
4. Facets: row and column layout of sub-plots
5. Statistics: data models and summaries
6. Coordinates: the plotting space
7. Theme: styles to be used, such as fonts, backgrounds, etc.
### `ggplot2` Usage
- `ggplot` function accepts two main arguments
- Data set to be plotted
- Aesthetic mappings provided by `aes` function
- Additional parameters such as geometric objects (_e.g._ points, lines, bars) are passed on by appending them with `+` as separator.
- List of available `geom_*` functions see [here](https://ggplot2.tidyverse.org/reference/)
- Settings of plotting theme can be accessed with the command `theme_get()` and its settings can be changed with `theme()`.
- Preferred input data object
- `qplot`: `data.frame` or `tibble` (support for `vector`, `matrix`, `...`)
- `ggplot`: `data.frame` or `tibble`
- Packages with convenience utilities to create expected inputs
- `dplyr` (`plyr`)
- `tidyr` and `reshape2`
### `qplot` Function
The syntax of `qplot` is similar as R's basic `plot` function
- Arguments
- `x`: x-coordinates (_e.g._ `col1`)
- `y`: y-coordinates (_e.g._ `col2`)
- `data`: `data.frame` or `tibble` with corresponding column names
- `xlim, ylim`: _e.g._ `xlim=c(0,10)`
- `log`: _e.g._ `log="x"` or `log="xy"`
- `main`: main title; see `?plotmath` for mathematical formula
- `xlab, ylab`: labels for the x- and y-axes
- `color`, `shape`, `size`
- `...`: many arguments accepted by `plot` function
### `qplot`: scatter plot basics
Create sample data, here 3 vectors: `x`, `y` and `cat`
```{r qplot_scatter_plot_sample, eval=TRUE}
library(ggplot2)
x <- sample(1:10, 10); y <- sample(1:10, 10); cat <- rep(c("A", "B"), 5)
```
Simple scatter plot
```{r qplot_scatter_plot, eval=TRUE}
qplot(x, y, geom="point")
```
Prints dots with different sizes and colors
```{r qplot_scatter_plot_dot_param, eval=TRUE}
qplot(x, y, geom="point", size=x, color=cat,
main="Dot Size and Color Relative to Some Values")
```
Drops legend
```{r qplot_scatter_plot_no_legend, eval=TRUE}
qplot(x, y, geom="point", size=x, color=cat) +
theme(legend.position = "none")
```
Plot different shapes
```{r qplot_scatter_plot_shapes, eval=TRUE}
qplot(x, y, geom="point", size=5, shape=cat)
```
#### Colored groups
```{r qplot_scatter_plot_colored_groups, eval=TRUE}
p <- qplot(x, y, geom="point", size=x, color=cat,
main="Dot Size and Color Relative to Some Values") +
theme(legend.position = "none")
print(p)
```
#### Regression line
```{r qplot_scatter_regression_line, eval=TRUE}
set.seed(1410)
dsmall <- diamonds[sample(nrow(diamonds), 1000), ]
p <- qplot(carat, price, data = dsmall) +
geom_smooth(method="lm")
print(p)
```
#### Local regression curve (loess)
```{r qplot_scatter_regression_loess, eval=TRUE}
p <- qplot(carat, price, data=dsmall, geom=c("point", "smooth"))
print(p) # Setting se=FALSE removes error shade
```
### `ggplot` Function
- More important than `qplot` to access full functionality of `ggplot2`
- Main arguments
- data set, usually a `data.frame` or `tibble`
- aesthetic mappings provided by `aes` function
- General `ggplot` syntax
- __`ggplot(data, aes(...)) + geom() + ... + stat() + ...`__
- Layer specifications
- `geom(mapping, data, ..., geom, position)`
- `stat(mapping, data, ..., stat, position)`
- Additional components
- `scales`
- `coordinates`
- `facet`
- `aes()` mappings can be passed on to all components (`ggplot, geom`, etc.). Effects are global when passed on to `ggplot()` and local for other components.
- `x, y`
- `color`: grouping vector (factor)
- `group`: grouping vector (factor)
#### Changing Plotting Themes in `ggplot`
- Theme settings can be accessed with `theme_get()`
- Their settings can be changed with `theme()`
Example how to change background color to white
```{r ggplot_background_color, eval=FALSE}
... + theme(panel.background=element_rect(fill = "white", colour = "black"))
```
#### Storing `ggplot` Specifications
Plots and layers can be stored in variables
```{r ggplot_store_plot, eval=FALSE}
p <- ggplot(dsmall, aes(carat, price)) + geom_point()
p # or print(p)
```
Returns information about data and aesthetic mappings followed by each layer
```{r ggplot_summary, eval=FALSE}
summary(p)
```
Print dots with different sizes and colors
```{r ggplot_dot_sizes, eval=FALSE}
bestfit <- geom_smooth(method = "lm", se = F, color = alpha("steelblue", 0.5), size = 2)
p + bestfit # Plot with custom regression line
```
Syntax to pass on other data sets
```{r ggplot_pass_data, eval=FALSE}
p %+% diamonds[sample(nrow(diamonds), 100),]
```
Saves plot stored in variable `p` to file
```{r ggplot_save_plot, eval=FALSE}
ggsave(p, file="myplot.pdf")
```
Standard R export functons for graphics work as well (see [here](https://girke.bioinformatics.ucr.edu/GEN242/tutorials/rgraphics/rgraphics/#saving-graphics-to-files)).
### `ggplot`: scatter plots
#### Basic example
```{r ggplot_scatter_plot1, eval=TRUE}
set.seed(1410)
dsmall <- as.data.frame(diamonds[sample(nrow(diamonds), 1000), ])
p <- ggplot(dsmall, aes(carat, price, color=color)) +
geom_point(size=4)
print(p)
```
Interactive version of above plot can be generated with the `ggplotly` function from
the `plotly` package.
```{r ggplot_scatter_plot_interactive1, eval=TRUE}
library(plotly)
ggplotly(p)
```
#### Regression line
```{r ggplot_regression_line, eval=TRUE}
p <- ggplot(dsmall, aes(carat, price)) + geom_point() +
geom_smooth(method="lm", se=FALSE) +
theme(panel.background=element_rect(fill = "white", colour = "black"))
print(p)
```
#### Several regression lines
```{r ggplot_many_regression_lines, eval=TRUE}
p <- ggplot(dsmall, aes(carat, price, group=color)) +
geom_point(aes(color=color), size=2) +
geom_smooth(aes(color=color), method = "lm", se=FALSE)
print(p)
```
#### Local regression curve (loess)
```{r ggplot_loess_regression, eval=TRUE}
p <- ggplot(dsmall, aes(carat, price)) + geom_point() + geom_smooth()
print(p) # Setting se=FALSE removes error shade
```
### `ggplot`: line plot
```{r ggplot_line_plot, eval=TRUE}
p <- ggplot(iris, aes(Petal.Length, Petal.Width, group=Species,
color=Species)) + geom_line()
print(p)
```
### Faceting
```{r ggplot_line_plot_faceting, eval=TRUE}
p <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
geom_line(aes(color=Species), size=1) +
facet_wrap(~Species, ncol=1)
print(p)
```
#### Exercise 3
Scatter plots with `ggplot2`
- __Task 1__: Generate scatter plot for first two columns in `iris` data frame and color dots by its `Species` column.
- __Task 2__: Use the `xlim` and `ylim` arguments to set limits on the x- and y-axes so that all data points are restricted to the left bottom quadrant of the plot.
- __Task 3__: Generate corresponding line plot with faceting presenting the individual data sets in saparate plots.
Structure of `iris` data set
```{r structure_iris_data3, eval=TRUE}
class(iris)
iris[1:4,]
table(iris$Species)
```
```{r exercise3_solution, eval=FALSE, echo=FALSE}
ggplot(iris, aes(Sepal.Length, Sepal.Width)) + geom_point(aes(color=Species), size=3) + xlim(4,15) + ylim(2,12)
ggplot(iris, aes(Sepal.Length, Sepal.Width)) + geom_line(aes(color=Species), size=3) + ggtitle("Line Plot") + facet_wrap(~Species, ncol=1)
```
### Bar Plots
Sample Set: the following transforms the `iris` data set into a ggplot2-friendly format.
Calculate mean values for aggregates given by `Species` column in `iris` data set
```{r iris_mean, eval=TRUE}
iris_mean <- aggregate(iris[,1:4], by=list(Species=iris$Species), FUN=mean)
```
Calculate standard deviations for aggregates given by `Species` column in `iris` data set
```{r iris_sd, eval=TRUE}
iris_sd <- aggregate(iris[,1:4], by=list(Species=iris$Species), FUN=sd)
```
Reformat `iris_mean` with `melt` from wide to long form as expected by `ggplot2`. Newer
alternatives for restructuring `data.frames` and `tibbles` from wide into long form use the
`gather` and `pivot_longer` functions defined by the `tidyr` package. Their usage is shown
below as well. The functions `pivot_longer` and `pivot_wider` are expected to provide the most
flexible long-term solution, but may not work in older R versions.
```{r iris_mean_melt, eval=TRUE}
library(reshape2) # Defines melt function
df_mean <- melt(iris_mean, id.vars=c("Species"), variable.name = "Samples", value.name="Values")
df_mean2 <- tidyr::gather(iris_mean, !Species, key = "Samples", value = "Values")
df_mean3 <- tidyr::pivot_longer(iris_mean, !Species, names_to="Samples", values_to="Values")
```
Reformat `iris_sd` with `melt`
```{r iris_sd_melt, eval=TRUE}
df_sd <- melt(iris_sd, id.vars=c("Species"), variable.name = "Samples", value.name="Values")
```
Define standard deviation limits
```{r iris_limits, eval=TRUE}
limits <- aes(ymax = df_mean[,"Values"] + df_sd[,"Values"], ymin=df_mean[,"Values"] - df_sd[,"Values"])
```
#### Verical orientation
```{r iris_mean_bar_plot, eval=TRUE}
p <- ggplot(df_mean, aes(Samples, Values, fill = Species)) +
geom_bar(position="dodge", stat="identity")
print(p)
```
To enforce that the bars are plotted in the order specified in the input data, one can instruct `ggplot`
to do so by turning the corresponding column (here `Species`) into an ordered factor as follows.
```{r iris_factor_order, eval=FALSE}
df_mean$Species <- factor(df_mean$Species, levels=unique(df_mean$Species), ordered=TRUE)
```
In the above example this is not necessary since `ggplot` uses this order already.
#### Horizontal orientation
```{r iris_mean_bar_plot_sideways, eval=TRUE}
p <- ggplot(df_mean, aes(Samples, Values, fill = Species)) +
geom_bar(position="dodge", stat="identity") + coord_flip() +
theme(axis.text.y=element_text(angle=0, hjust=1))
print(p)
```
#### Faceting
```{r iris_mean_bar_plot_facetting, eval=FALSE}
p <- ggplot(df_mean, aes(Samples, Values)) + geom_bar(aes(fill = Species), stat="identity") +
facet_wrap(~Species, ncol=1)
print(p)
```
#### Error bars
```{r iris_mean_bar_plot_error, eval=TRUE}
p <- ggplot(df_mean, aes(Samples, Values, fill = Species)) +
geom_bar(position="dodge", stat="identity") +
geom_errorbar(limits, position="dodge")
print(p)
```
#### Mirrored
```{r ggplot2_mirrored_barplot, eval=TRUE}
df <- data.frame(group = rep(c("Above", "Below"), each=10), x = rep(1:10, 2), y = c(runif(10, 0, 1), runif(10, -1, 0)))
p <- ggplot(df, aes(x=x, y=y, fill=group)) +
geom_col()
print(p)
```
### Changing Color Settings
```{r ggplot2_color1, eval=TRUE}
library(RColorBrewer)
# display.brewer.all()
p <- ggplot(df_mean, aes(Samples, Values, fill=Species, color=Species)) +
geom_bar(position="dodge", stat="identity") + geom_errorbar(limits, position="dodge") +
scale_fill_brewer(palette="Blues") + scale_color_brewer(palette = "Greys")
print(p)
```
#### Using standard R color theme
```{r ggplot2_color2, eval=TRUE}
p <- ggplot(df_mean, aes(Samples, Values, fill=Species, color=Species)) +
geom_bar(position="dodge", stat="identity") + geom_errorbar(limits, position="dodge") +
scale_fill_manual(values=c("red", "green3", "blue")) +
scale_color_manual(values=c("red", "green3", "blue"))
print(p)
```
#### Exercise 4
Bar plots
- __Task 1__: Calculate the mean values for the `Species` components of the first four columns in the `iris` data set. Use the `melt` function from the `reshape2` package to bring the data into the expected format for `ggplot`.
- __Task 2__: Generate two bar plots: one with stacked bars and one with horizontally arranged bars.
Structure of iris data set
```{r iris_structure4, eval=TRUE}
class(iris)
iris[1:4,]
table(iris$Species)
```
```{r exercise4_solution, eval=FALSE, echo=FALSE}
iris_mean <- aggregate(iris[,1:4], by=list(Species=iris$Species), FUN=mean)
library(reshape2)
df_mean <- melt(iris_mean, id.vars=c("Species"), variable.name = "Samples", value.name="Values")
ggplot(df_mean, aes(Samples, Values, fill=Species)) + geom_bar(position="stack", stat="identity")
ggplot(df_mean, aes(Samples, Values, fill=Species)) + geom_bar(position="dodge", stat="identity")
```
### Data reformatting example
Here for line plot
```{r ggplot_melt_data, eval=TRUE}
y <- matrix(rnorm(500), 100, 5, dimnames=list(paste("g", 1:100, sep=""), paste("Sample", 1:5, sep="")))
y <- data.frame(Position=1:length(y[,1]), y)
y[1:4, ] # First rows of input format expected by melt()
df <- melt(y, id.vars=c("Position"), variable.name = "Samples", value.name="Values")
p <- ggplot(df, aes(Position, Values)) + geom_line(aes(color=Samples)) + facet_wrap(~Samples, ncol=1)
print(p)
```
Same data can be represented in box plot as follows
```{r ggplot_melt_box_plot, eval=FALSE}
ggplot(df, aes(Samples, Values, fill=Samples)) + geom_boxplot() + geom_jitter(color="darkgrey")
```
### Jitter Plots
```{r ggplot_jitter_plot, eval=TRUE}
p <- ggplot(dsmall, aes(color, price/carat)) +
geom_jitter(alpha = I(1 / 2), aes(color=color))
print(p)
```
### Box plots
```{r ggplot_box_plot2, eval=TRUE}
p <- ggplot(dsmall, aes(color, price/carat, fill=color)) + geom_boxplot()
print(p)
```
### Violin plots
```{r ggplot_box_violin, eval=TRUE}
p <- ggplot(dsmall, aes(color, price/carat, fill=color)) + geom_violin()
print(p)
```
Same violin plot as interactive plot generated with `ggplotly`, where the actual data points
are shown as well by including `geom_jitter()`.
```{r ggplot_box_plot2_jitter, eval=TRUE}
p <- ggplot(dsmall, aes(color, price/carat, fill=color)) + geom_violin() + geom_jitter(aes(color=color))
ggplotly(p)
```
### Density plots
#### Line coloring
```{r ggplot_density_plot_linecol, eval=TRUE}
p <- ggplot(dsmall, aes(carat)) + geom_density(aes(color = color))
print(p)
```
#### Area coloring
```{r ggplot_density_plot_areacol, eval=TRUE}
p <- ggplot(dsmall, aes(carat)) + geom_density(aes(fill = color))
print(p)
```
### Histograms
```{r ggplot_histogram, eval=TRUE}
p <- ggplot(iris, aes(x=Sepal.Width)) +
geom_histogram(aes(y = ..density.., fill = ..count..), binwidth=0.2) +
geom_density()
print(p)
```
### Pie Chart
```{r ggplot_pie_chart, eval=TRUE}
df <- data.frame(variable=rep(c("cat", "mouse", "dog", "bird", "fly")),
value=c(1,3,3,4,2))
p <- ggplot(df, aes(x = "", y = value, fill = variable)) +
geom_bar(width = 1, stat="identity") +
coord_polar("y", start=pi / 3) + ggtitle("Pie Chart")
print(p)
```
### Wind Rose Pie Chart
```{r ggplot_windrose_pie_chart, eval=TRUE}
p <- ggplot(df, aes(x = variable, y = value, fill = variable)) +
geom_bar(width = 1, stat="identity") +
coord_polar("y", start=pi / 3) +
ggtitle("Pie Chart")
print(p)
```
### Arranging Graphics on Page
Using `grid` package
```{r ggplot_arrange_graphics, eval=TRUE}
library(grid)
a <- ggplot(dsmall, aes(color, price/carat)) + geom_jitter(size=4, alpha = I(1 / 1.5), aes(color=color))
b <- ggplot(dsmall, aes(color, price/carat, color=color)) + geom_boxplot()
c <- ggplot(dsmall, aes(color, price/carat, fill=color)) + geom_boxplot() + theme(legend.position = "none")
grid.newpage() # Open a new page on grid device
pushViewport(viewport(layout = grid.layout(2, 2))) # Assign to device viewport with 2 by 2 grid layout
print(a, vp = viewport(layout.pos.row = 1, layout.pos.col = 1:2))
print(b, vp = viewport(layout.pos.row = 2, layout.pos.col = 1))
print(c, vp = viewport(layout.pos.row = 2, layout.pos.col = 2, width=0.3, height=0.3, x=0.8, y=0.8))
```
Using `gridExtra` package
```{r ggplot_arrange_graphics2, eval=TRUE, warning=FALSE, message=FALSE}
library(gridExtra)
grid.arrange(a, b, c, nrow = 2, ncol=2)
```
Also see `patchwork` in ggplot2 book [here](https://ggplot2-book.org/arranging-plots.html).
### Inserting Graphics into Plots
```{r ggplot_insert_graphics, eval=TRUE}
library(grid)
print(a)
print(b, vp=viewport(width=0.3, height=0.3, x=0.8, y=0.8))
```
## Specialty Graphics
### Spatial Heatmap Diagrams
Spatial expression data can be visualized with the [`spatialHeatmap`](https://bioconductor.org/packages/release/bioc/vignettes/spatialHeatmap/inst/doc/spatialHeatmap.html) package.
### Venn Diagrams
```{r specgraph_venn, eval=TRUE, message=FALSE, fig.dim=c(5,5), fig.align="center"}
library(systemPipeR)
setlist5 <- list(A=sample(letters, 18), B=sample(letters, 16), C=sample(letters, 20), D=sample(letters, 22), E=sample(letters, 18))
OLlist5 <- overLapper(setlist=setlist5, sep="_", type="vennsets")
vennPlot(OLlist5, mymain="", mysub="default", colmode=2, ccol=c("blue", "red"))
```
### UpSet Plots
```{r specgraph_upset, eval=TRUE, message=FALSE, fig.dim=c(5,5), fig.align="center"}
library(ComplexHeatmap)
setlist <- list(A=sample(letters, 18), B=sample(letters, 16), C=sample(letters, 20))
setma <- make_comb_mat(setlist)
UpSet(setma)
```
### Compound Structures
Plots depictions of small molecules with `ChemmineR` package
```{r specgraph_structure, eval=TRUE}
library(ChemmineR)
data(sdfsample)
plot(sdfsample[1], print=FALSE)
```
### Heatmaps
There are many packages for plotting heatmaps in R. The following uses `pheatmap`.
```{r pheatmap, eval=TRUE, warning=FALSE, message=FALSE, fig.dim=c(6,6), fig.align="center"}
library(pheatmap); library("RColorBrewer")
y <- matrix(rnorm(50), 10, 5, dimnames=list(paste("g", 1:10, sep=""), paste("t", 1:5, sep="")))
pheatmap(y, color=brewer.pal(9,"Blues"))
```
### ROC Plots
A variety of libraries are available for plotting receiver operating characteristic (ROC) curves in R:
+ [ROCR](http://rocr.bioinf.mpi-sb.mpg.de/)
+ [ROC](http://bioconductor.org/packages/release/bioc/html/ROC.html)
+ [pROC](http://web.expasy.org/pROC/) (includes ggplot2)
+ [plotROC](https://cran.r-project.org/web/packages/plotROC/) (uses ggplot2)
+ [ggplot2](http://largedata.blogspot.com/2011/07/plotting-roc-curves-in-ggplot2.html) (custom code example)
#### Example with ROCR
Most commonly, in an ROC we plot the true positive rate (y-axis) against the false positive rate (x-axis) at decreasing thresholds.
An illustrative example is provided in the `ROCR` package where one wants to inspect the content of the `ROCR.simple` object
defining the structure of the input data in two vectors.
```{r ROCR_example, eval=TRUE, warning=FALSE, message=FALSE}
# install.packages("ROCR") # Install if necessary
library(ROCR)
data(ROCR.simple)
ROCR.simple
pred <- prediction(ROCR.simple$predictions, ROCR.simple$labels)
perf <- performance( pred, "tpr", "fpr" )
plot(perf)
```
Obtain area under the curve (AUC)
```{r ROCR_example2, eval=TRUE, warning=FALSE, message=FALSE}
auc <- performance( pred, "tpr", "fpr", measure = "auc")
auc@y.values[[1]]
```
#### Example with plotROC
The [`plotROC`](https://cran.r-project.org/web/packages/plotROC/) package plots ROCs with `ggplot2`. The following generates a sample data set for several performance
results, here three. For convenience the data are first arranged in a `data.frame` in wide format. Next, the `melt_roc`
is used to convert the `data.frame` into the long format as required by `ggplot`.
```{r plotROC_example, eval=TRUE, warning=FALSE, message=FALSE}
library(plotROC)
set.seed(2529)
D.ex <- rbinom(200, size = 1, prob = .5); M1 <- rnorm(200, mean = D.ex, sd = .65); M2 <- rnorm(200, mean = D.ex, sd = 1.1); M3 <- rnorm(200, mean = D.ex, sd = 1.5)
perfDF <- data.frame(D = D.ex, D.str = c("TRUE", "Ill")[D.ex + 1], M1 = M1, M2 = M2, M3=M3, stringsAsFactors = FALSE)
perfDF[1:4,] # wide format
long_perfDF <- melt_roc(perfDF, "D", c("M1", "M2", "M3")) # transformed into long format for ggplot
long_perfDF[1:4,] # long format
```
After converting the sample data into the long format the results can be plotted with `geom_roc`, where several
ROCs are combined in a single plot and the corresponding AUC values are shown in the legend.
```{r plotROC_example2, eval=TRUE, warning=FALSE, message=FALSE}
multi_roc <- ggplot(long_perfDF, aes(d = D, m = M, color = name)) + geom_roc(n.cuts=0)
auc_df <- calc_auc(multi_roc) # calculate AUC values
auc_str <- paste0(auc_df$name, ": ", round(auc_df$AUC, 2))
multi_roc + scale_color_manual(name="AUC:", labels=auc_str, values=seq_along(auc_str))
```
### Trees
The [`ape`](http://ape-package.ird.fr/ape_screenshots.html) package provides many useful utilities for phylogenetic analysis and tree plotting. Another useful package for
plotting trees is [`ggtree`](http://bioconductor.org/packages/release/bioc/html/ggtree.html). The following example plots two trees face to face with links to identical
leaf labels.
```{r trees_ape1, eval=TRUE}
library(ape)
tree1 <- rtree(40)
tree2 <- rtree(20)
association <- cbind(tree2$tip.label, tree2$tip.label)
cophyloplot(tree1, tree2, assoc = association,
length.line = 4, space = 28, gap = 3)
```
## Genome Graphics
### `ggbio`
- What is `ggbio`?
- A programmable genome browser environment
- Genome broswer concepts
- A genome browser is a visulalization tool for plotting different types of genomic data in separate tracks along chromosomes.
- The `ggbio` package [@Yin2012-jj] facilitates plotting of complex genome data objects, such as read alignments (SAM/BAM), genomic context/annotation information (gff/txdb), variant calls (VCF/BCF), and more. To easily compare these data sets, it extends the faceting facility of `ggplot2` to genome browser-like tracks.
- Most of the core object types for handling genomic data with R/Bioconductor are supported: `GRanges`, `GAlignments`, `VCF`, etc. For more details, see Table 1.1 of the `ggbio` vignette [here](http://www.bioconductor.org/packages/release/bioc/vignettes/ggbio/inst/doc/ggbio.pdf).
- `ggbio`'s convenience plotting function is `autoplot`. For more customizable plots, one can use the generic `ggplot` function.
- Apart from the standard `ggplot2` plotting components, `ggbio` defines serval new components useful for genomic data visualization. A detailed list is given in Table 1.2 of the vignette [here](http://www.bioconductor.org/packages/release/bioc/vignettes/ggbio/inst/doc/ggbio.pdf).
- Useful web sites:
- [ggbio Bioc vignette](http://bioconductor.org/packages/release/bioc/html/ggbio.html)
- [ggbio online manual](https://lawremi.github.io/ggbio/)
- [autoplot demo](https://lawremi.github.io/ggbio/docs/man/autoplot-method.html)
#### Tracks: aligning plots along chromosomes
```{r ggbio_align, eval=TRUE, warning=FALSE, message=FALSE}
library(ggbio)
df1 <- data.frame(time = 1:100, score = sin((1:100)/20)*10)
p1 <- qplot(data = df1, x = time, y = score, geom = "line")
df2 <- data.frame(time = 30:120, score = sin((30:120)/20)*10, value = rnorm(120-30 +1))
p2 <- ggplot(data = df2, aes(x = time, y = score)) + geom_line() + geom_point(size = 2, aes(color = value))
tracks(time1 = p1, time2 = p2) + xlim(1, 40) + theme_tracks_sunset()
```
#### Plotting genomic ranges
`GRanges` objects are essential for storing alignment or annotation ranges in R/Bioconductor. The following creates a sample `GRanges` object and plots its content.
```{r ggbio_granges, eval=TRUE}
library(GenomicRanges)
set.seed(1); N <- 100; gr <- GRanges(seqnames = sample(c("chr1", "chr2", "chr3"), size = N, replace = TRUE), IRanges(start = sample(1:300, size = N, replace = TRUE), width = sample(70:75, size = N,replace = TRUE)), strand = sample(c("+", "-"), size = N, replace = TRUE), value = rnorm(N, 10, 3), score = rnorm(N, 100, 30), sample = sample(c("Normal", "Tumor"), size = N, replace = TRUE), pair = sample(letters, size = N, replace = TRUE))
autoplot(gr, aes(color = strand, fill = strand), facets = strand ~ seqnames)
```
#### Plotting coverage
```{r ggbio_coverage, eval=TRUE}
autoplot(gr, aes(color = strand, fill = strand), facets = strand ~ seqnames, stat = "coverage")
```
#### Mirrored coverage
```{r ggbio_mirrored_coverage, eval=TRUE}
pos <- sapply(coverage(gr[strand(gr)=="+"]), as.numeric)
pos <- data.frame(Chr=rep(names(pos), sapply(pos, length)), Strand=rep("+", length(unlist(pos))), Position=unlist(sapply(pos, function(x) 1:length(x))), Coverage=as.numeric(unlist(pos)))
neg <- sapply(coverage(gr[strand(gr)=="-"]), as.numeric)
neg <- data.frame(Chr=rep(names(neg), sapply(neg, length)), Strand=rep("-", length(unlist(neg))), Position=unlist(sapply(neg, function(x) 1:length(x))), Coverage=-as.numeric(unlist(neg)))
covdf <- rbind(pos, neg)
p <- ggplot(covdf, aes(Position, Coverage, fill=Strand)) +
geom_col() +
facet_wrap(~Chr)
p
```
### Circular genome plots
```{r ggbio_circular1, eval=TRUE}
ggplot(gr) +
layout_circle(aes(fill = seqnames), geom = "rect")
```
More complex circular example
```{r ggbio_circular2, eval=FALSE}
seqlengths(gr) <- c(400, 500, 700)
values(gr)$to.gr <- gr[sample(1:length(gr), size = length(gr))]
idx <- sample(1:length(gr), size = 50)
gr <- gr[idx]
ggplot() +
layout_circle(gr, geom = "ideo", fill = "gray70", radius = 7, trackWidth = 3) +
layout_circle(gr, geom = "bar", radius = 10, trackWidth = 4, aes(fill = score, y = score)) +
layout_circle(gr, geom = "point", color = "red", radius = 14, trackWidth = 3, grid = TRUE, aes(y = score)) +
layout_circle(gr, geom = "link", linked.to = "to.gr", radius = 6, trackWidth = 1)
```
#### Alignments and variants
To make the following example work, please download and unpack this [data archive](https://cluster.hpcc.ucr.edu/~tgirke/HTML_Presentations/Manuals/Workshop_Dec_12_16_2013/Rgraphics/data.zip) containing GFF, BAM and VCF sample files.
```{r ggbio_align_variants, eval=TRUE, warning=FALSE, message=FALSE}
library(rtracklayer); library(GenomicFeatures); library(Rsamtools); library(GenomicAlignments); library(VariantAnnotation)
ga <- readGAlignments("./data/SRR064167.fastq.bam", use.names=TRUE, param=ScanBamParam(which=GRanges("Chr5", IRanges(4000, 8000))))
p1 <- autoplot(ga, geom = "rect")
p2 <- autoplot(ga, geom = "line", stat = "coverage")
vcf <- readVcf(file="data/varianttools_gnsap.vcf", genome="ATH1")
p3 <- autoplot(vcf[seqnames(vcf)=="Chr5"], type = "fixed") + xlim(4000, 8000) + theme(legend.position = "none", axis.text.y = element_blank(), axis.ticks.y=element_blank())
txdb <- makeTxDbFromGFF(file="./data/TAIR10_GFF3_trunc.gff", format="gff3")
p4 <- autoplot(txdb, which=GRanges("Chr5", IRanges(4000, 8000)), names.expr = "gene_id")
tracks(Reads=p1, Coverage=p2, Variant=p3, Transcripts=p4, heights = c(0.3, 0.2, 0.1, 0.35)) + ylab("")
```
### Additional examples
See `autoplot` demo [here](https://lawremi.github.io/ggbio/docs/man/autoplot-method.html)
### Additional genome graphics
- [`Gviz`](http://www.bioconductor.org/packages/devel/bioc/html/Gviz.html)
- [`RCircos`](http://cran.us.r-project.org/web/packages/RCircos/index.html) [@Zhang2013-zn]
- [`genoPlotR`](http://genoplotr.r-forge.r-project.org/)
### Genome Browser: IGV
View genome data in IGV
- Download and open [IGV](http://software.broadinstitute.org/software/igv/download)
- Select in menu in top left corner _A. thaliana_ (TAIR10)
- Upload the following indexed/sorted Bam files with `File -> Load from URL...`
```
https://cluster.hpcc.ucr.edu/~tgirke/Teaching/GEN242/bam_samples/SRR064154.fastq.bam
https://cluster.hpcc.ucr.edu/~tgirke/Teaching/GEN242/bam_samples/SRR064155.fastq.bam
https://cluster.hpcc.ucr.edu/~tgirke/Teaching/GEN242/bam_samples/SRR064166.fastq.bam
https://cluster.hpcc.ucr.edu/~tgirke/Teaching/GEN242/bam_samples/SRR064167.fastq.bam
```
- To view area of interest, enter its coordinates `Chr1:49,457-51,457` in position menu on top.
#### Create symbolic links
For viewing BAM files in IGV as part of `systemPipeR` workflows.
- `systemPipeR`: utilities for building NGS analysis pipelines.
```{r systempiper_symbolic_links, eval=FALSE, message=FALSE}
library("systemPipeR")
symLink2bam(sysargs=args, htmldir=c("~/.html/", "somedir/"),
urlbase="http://myserver.edu/~username/",
urlfile="IGVurl.txt")
```
#### Controlling IGV from R
Open IGV before running the following routine. Alternatively, open IGV from within R with `startIGV("lm")`.
Note, the latter may not work on all systems.
```{r control_igv, eval=FALSE, warning=FALSE, message=FALSE}
library(SRAdb)
myurls <- readLines("http://cluster.hpcc.ucr.edu/~tgirke/Documents/R_BioCond/Samples/bam_urls.txt")
# startIGV("lm") # opens IGV
sock <- IGVsocket()
IGVload(sock, myurls)
IGVgoto(sock, 'Chr1:45296-47019')
```
## Session Info
```{r sessionInfo}
sessionInfo()
```
## References