---
execute:
  echo: true
  message: false
  warning: false
  fig-format: "svg"
format:
  revealjs:
    highlight-style: a11y-dark
    reference-location: margin
    theme: lecture_styles.scss
    controls: true
    slide-number: true
    code-link: true
    chalkboard: true
    incremental: false
    smaller: true
    preview-links: true
    code-line-numbers: true
    history: false
    progress: true
    link-external-icon: true
    code-annotations: hover
    pointer:
      color: "#b18eb1"
revealjs-plugins:
  - pointer
---
```{r}
#| echo: false
require(downlit)
require(xml2)
```
## {#title-slide data-menu-title="Visualizing Data" background="#1e4655" background-image="../../images/csss-logo.png" background-position="center top 5%" background-size="50%"}
[Visualizing Data]{.custom-title}
[CS&SS 508 • Lecture 2]{.custom-subtitle}
[{{< var lectures.two >}}]{.custom-subtitle2}
[Victoria Sass]{.custom-subtitle3}
# Roadmap{.section-title background-color="#99a486"}
---
:::: {.columns}
::: {.column width="50%"}
### Last time, we learned:
* R and RStudio
* Quarto headers, syntax, and chunks
* Basics of functions, objects, and vectors
* Base `R` and packages
:::
::: {.column width="50%"}
::: {.fragment}
### Today, we will cover:
* Introducing the `tidyverse`!
* Basics of `ggplot2`
* Advanced features of `ggplot2`
* `ggplot2` extensions
:::
:::
::::
## File Types
We mainly work with three types of files in this class:
:::{.incremental}
* `.qmd`^[Quarto builds on a decade of developments with R Markdown documents. .Rmd files operate **very** similarly to Quarto documents but there are minor differences that you can read more about [here](https://quarto.org/docs/computations/r.html#:~:text=Another%20difference%20between%20R%20Markdown,than%20relying%20on%20external%20packages).]: These are **markdown** *syntax* files, where you write code and plain or formatted text to *make documents*.
* `.R`: These are **R** *syntax* files, where you write code to process and analyze data *without making an output document*^[You can use the `source()` function to run a `.R` script file inside a `.qmd` or `.R` file. Using this you can break a large project up into multiple files but still run it all at once!].
* `.html` (or `.pdf`, `.docx`, etc.): These are the output documents created when you *Render* a Quarto document.
:::
. . .
Make sure you understand the difference between the uses of these file types! Please ask for clarification if needed!
# Introducing the `tidyverse` {.section-title background-color="#99a486"}
## Packages
Last week we discussed Base `R` and the fact that what makes `R` extremely powerful and flexible is the large number of diverse user-created packages.
. . .
::: {.callout-note icon=false}
## {{< fa hand >}} What are packages again?
Recall that packages are simply **collections of functions and tools** that others have already created to make your life easier!
:::
. . .
::: {.callout-caution icon=false}
## {{< fa triangle-exclamation >}} The package 2-step
Remember that to **install** a new package you use `install.packages("package_name")` in the **console**. You only need to do this once per machine (unless you want to update to a newer version of a package).
To **load** a package into your current session of `R` you use `library(package_name)`, preferably at the beginning of your `R` script or Quarto document. Every time you open RStudio it's a new session and you'll have to call `library()` on the packages you want to use.
:::
## Packages
The `Packages` tab in the bottom-right pane of RStudio lists your installed packages.
![](images/package_tab.png){fig-align="center"}
## The `tidyverse`
The `tidyverse` refers to two things:
:::{.incremental}
1. a specific package in `R` that loads several core packages within the `tidyverse`.
2. a specific design philosophy, grammar, and focus on "tidy" data structures developed by Hadley Wickham^[You can read the official manifesto [here](https://tidyverse.tidyverse.org/articles/manifesto.html).] and his team at RStudio (now named Posit).
:::
## The `tidyverse` package
:::: {.columns}
::: {.column width="50%"}
The core packages within the `tidyverse` include:
:::{.incremental}
* `ggplot2` (visualizations)
* `dplyr` (data manipulation)
* `tidyr` (data reshaping)
* `readr` (data import/export)
* `purrr` (iteration)
* `tibble` (modern dataframe)
* `stringr` (text data)
* `forcats` (factors)
:::
:::
::: {.column width="50%"}
![](images/tidyverse.png)
:::
::::
## The `tidyverse` philosophy
:::: {.columns}
::: {.column width="50%"}
The principles underlying the tidyverse are:
:::{.incremental}
1. Reuse existing data structures.
2. Compose simple functions with the pipe.
3. Embrace functional programming.
4. Design for humans.
:::
:::
::: {.column width="50%"}
![](images/extended_tidyverse.jpeg)
:::
::::
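## Composing functions with the pipe

As a minimal sketch of principle 2, the two computations below are equivalent. The second uses R's native pipe `|>` (available in R 4.1 and later) to pass each result on as the first argument of the next function. The vector `squares` is a made-up example, not course data.

```{r}
squares <- c(1, 4, 9, 16)  # hypothetical example values

# Nested calls are read from the inside out
nested <- round(mean(sqrt(squares)), digits = 1)

# Piped calls are read top to bottom, one step at a time
piped <- squares |>
  sqrt() |>
  mean() |>
  round(digits = 1)

identical(nested, piped)
```

Both versions do the same work; the piped form tends to be easier to read as the number of steps grows.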
## {data-menu-title="Research process" background-image="images/research_process.png" background-position="center" background-size="90%"}
## {data-menu-title="Research process - visualization" background-image="images/research_process_visualize.png" background-position="center" background-size="90%"}
## {data-menu-title="Research process - communication" background-image="images/research_process_communicate.png" background-position="center" background-size="90%"}
## Gapminder Data
We'll be working with data from Hans Rosling's [Gapminder](http://www.gapminder.org) project. An excerpt of these data can be accessed through an R package called `gapminder`^[Cleaned and assembled by Jenny Bryan at UBC.]. Check the `Packages` tab to see if `gapminder` appears (unchecked) in your computer's list of installed packages.
. . .
If it doesn't, run `install.packages("gapminder")` in the console.
. . .
Now, load the `gapminder` package as well as the `tidyverse` package:
```{r}
#| message: true
library(gapminder)
library(tidyverse) # <1>
```
1. Every time you load `tidyverse` with `library()`, it reports which individual packages it is attaching, as well as any function-name conflicts with other packages loaded in the current session. This is useful information, but you can hide it by adding the `message: false` option to your code chunk.
## Check Out Gapminder {{< fa scroll >}} {.scrollable}
The data frame we will work with is called `gapminder`, available once you have loaded the package. Let's see its structure:
```{r}
str(gapminder)
```
. . .
#### What's Notable Here?
::: {.incremental}
* **Factor** variables `country` and `continent`
  * Factors are categorical data with an underlying numeric representation
  * We'll spend a lot of time on factors later!
* Many observations: $n=`r nrow(gapminder)`$ rows
* For each observation, a few variables: $p=`r ncol(gapminder)`$ columns
* A nested/hierarchical structure: `year` in `country` in `continent`
  * These are panel data!
:::
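## A Closer Look at Factors and Panel Structure

The sketch below, which assumes only that the `gapminder` package is installed, shows what "an underlying numeric representation" looks like in practice and one way to check the panel structure. The object names `n_countries` and `n_years` are illustrative.

```{r}
library(gapminder)

levels(gapminder$continent)            # the category labels
as.integer(head(gapminder$continent))  # the integer codes stored underneath

n_countries <- nlevels(gapminder$country)
n_years <- length(unique(gapminder$year))

# In a balanced panel, every country is observed in every year,
# so countries times years should equal the number of rows
n_countries * n_years == nrow(gapminder)
```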
## Base `R` plot
:::: {.columns}
::: {.column width="50%"}
```{r}
#| eval: false
China <- gapminder |>
filter(country == "China")
plot(lifeExp ~ year,
data = China,
xlab = "Year",
ylab = "Life expectancy",
main = "Life expectancy in China",
col = "red",
pch = 16)
```
This plot is made with *one function* and *many arguments.*
:::
::: {.column width="50%"}
```{r}
#| echo: false
#| fig-width: 6
#| fig-height: 6
China <- subset(gapminder,gapminder$country == "China")
plot(lifeExp ~ year, data = China, xlab = "Year", ylab = "Life expectancy",
main = "Life expectancy in China", col = "red", pch = 16)
```
:::
::::
::: aside
Note: Don't worry about the code used to create the object `China`. We'll explore data manipulation in a couple of weeks!
:::
## Fancier: `ggplot`
:::: {.columns}
::: {.column width="50%"}
```{r}
#| eval: false
ggplot(data = China,
mapping = aes(x = year, y = lifeExp)) +
geom_point(color = "red", size = 3) +
labs(title = "Life expectancy in China",
x = "Year",
y = "Life expectancy") +
theme_minimal(base_size = 18)
```
This `ggplot` is made with *many functions* and *fewer arguments* in each.
:::
::: {.column width="50%"}
```{r}
#| warning: false
#| message: false
#| echo: false
#| fig-width: 6
#| fig-height: 6
ggplot(data = China,
mapping = aes(x = year, y = lifeExp)) +
geom_point(color = "red", size = 3) +
labs(title = "Life expectancy in China",
x = "Year",
y = "Life expectancy") +
theme_minimal(base_size = 18)
```
:::
::::
# {data-menu-title="`ggplot`" background-image="images/ggplot2.png" background-size="contain" background-position="center" .section-title background-color="#1e4655"}
## `ggplot2`
The `ggplot2` package provides an alternative toolbox for plotting.
. . .
The core idea underlying this package is the [**layered grammar of graphics**](https://vita.had.co.nz/papers/layered-grammar.pdf): i.e. that we can break up elements of a plot into pieces and combine them.
. . .
`ggplot`s take a *bit* more work to create than Base `R` plots, but are usually:
* prettier
* more professional
* **much** more customizable
## Layered grammar of graphics {{< fa scroll >}} {.scrollable}
![](images/gglayers.png){fig-align="center"}
::: aside
This is based on Leland Wilkinson's book [*The Grammar of Graphics*](https://orbiscascade-washington.primo.exlibrisgroup.com/permalink/01ALLIANCE_UW/1juclfo/alma99145336560001452) and extended by Hadley Wickham in his paper ["A layered grammar of graphics"](https://vita.had.co.nz/papers/layered-grammar.html).
:::
## Structure of a ggplot
`ggplot` graphics objects consist of two primary components:
. . .
1. **Layers**, the components of a graph.
    * We *add* layers to a `ggplot` object using `+`.
    * This includes adding lines, shapes, and text to a plot.

. . .

2. **Aesthetics**, which determine how the layers appear.
    * We *set* aesthetics using *arguments* (e.g. `color = "red"`) inside layer functions.
    * This includes modifying locations, colors, and sizes of the layers.
::: {.callout-tip icon=false}
## {{< fa hand-point-up >}} Aesthetic Vignette
Learn more about all possible aesthetic mappings [here](https://ggplot2.tidyverse.org/articles/ggplot2-specs.html).
:::
## Layers
**Layers** are the components of the graph, such as:
::: {.incremental}
* `ggplot()`: initializes basic plotting object, specifies input data
* `geom_point()`: layer of scatterplot points
* `geom_line()`: layer of lines
* `geom_histogram()`: layer of a histogram
* `labs` (or to specify individually: `ggtitle()`, `xlab()`, `ylab()`): layers of labels
* `facet_wrap()`: layer creating multiple plot panels
* `theme_bw()`: layer replacing default gray background with black-and-white
:::
. . .
Layers are separated by a `+` sign. For clarity, I usually put each layer on a new line.
::: {.callout-important icon=false}
## {{< fa circle-exclamation >}} Syntax Warning
Be sure to **end** each line with the `+`. The code will not run if a new line *begins* with a `+`.
:::
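## Where the `+` Goes

A minimal sketch of the syntax warning, using the `gapminder` data. In the first version each line ends with `+`, so R knows the expression continues. The second version is left commented out: R treats its first line as a complete expression and then fails on the line that starts with `+`. The name `plus_demo` is illustrative.

```{r}
#| fig-align: center
library(ggplot2)
library(gapminder)

# Works: every line that continues the plot ends with +
plus_demo <- ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp)) +
  geom_point() +
  theme_bw()

plus_demo

# Does not work: the second line begins with +
# ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp))
#   + geom_point()
```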
## Aesthetics
**Aesthetics** control the appearance of the layers:
* `x`, `y`: $x$ and $y$ coordinate values to use
* `color`: set color of elements based on some data value
* `group`: describe which points are conceptually grouped together for the plot (often used with lines)
* `size`: set size of points based on some data value (greater than 0); for lines, ggplot2 3.4.0 and later use `linewidth` instead
* `alpha`: set transparency based on some data value (between 0 and 1)
::: {.callout-warning icon=false}
## {{< fa triangle-exclamation >}} Mapping data inside `aes()` vs. creating plot-wise settings outside `aes()`
Inside `aes()`, an aesthetic argument names a variable in the data, so the values of that variable are **mapped** to the aesthetic (e.g. one color per continent). Outside `aes()`, the same argument is only a **setting**: it takes one fixed value (e.g. `color = "red"`) and does not display any dimension of the data.
:::
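## Mapping vs. Setting: A Quick Comparison

To make the distinction concrete, here is a small sketch using `gapminder`; the object names `p_base`, `p_mapped`, `p_set`, and `p_oops` are illustrative only. Inside `aes()`, `color = continent` maps a variable to color and produces a legend. Outside `aes()`, `color = "red"` sets every point to one color. Putting a constant inside `aes()` is a common slip: ggplot2 treats `"red"` as a one-category variable and picks its own default color for it.

```{r}
#| fig-align: center
library(ggplot2)
library(gapminder)

p_base <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))

# Mapping: color varies with the data
p_mapped <- p_base + geom_point(mapping = aes(color = continent))

# Setting: one fixed color for every point
p_set <- p_base + geom_point(color = "red")

# Constant inside aes(): treated as data, so the points are not drawn in "red"
p_oops <- p_base + geom_point(mapping = aes(color = "red"))

p_mapped
```

Printing `p_set` and `p_oops` in the console is a quick way to see the difference for yourself.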
## `ggplot` Templates
. . .
#### All `ggplot`s have:
an initializing `ggplot()` call and at least one `geom` function.
::: {.panel-tabset}
### same data & aesthetics
```{r}
#| eval: false
ggplot(data = [dataset],
mapping = aes(x = [x-variable], y = [y-variable])) +
geom_xxx() +
other options
```
### same data, diff aesthetics
```{r}
#| eval: false
ggplot(data = [dataset],
mapping = aes(x = [x-variable], y = [y-variable])) +
geom_xxx() +
geom_yyy(mapping = aes(x = [x-variable], y = [y-variable])) +
other options
```
### diff data & aesthetics
```{r}
#| eval: false
ggplot() +
geom_xxx(data = [dataset1],
mapping = aes(x = [x-variable], y = [y-variable])) +
geom_yyy(data = [dataset2],
mapping = aes(x = [x-variable], y = [y-variable])) +
other options
```
:::
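## A Filled-In Template

As one possible instance of the "diff data & aesthetics" template, the sketch below draws every country's trajectory from `gapminder` in grey and then overlays China from a second data frame. The names `china_only` and `template_demo` and the color choices are assumptions made for illustration.

```{r}
#| fig-align: center
library(ggplot2)
library(gapminder)

china_only <- subset(gapminder, country == "China")  # second dataset

template_demo <- ggplot() +
  geom_line(data = gapminder,                        # layer 1: all countries
            mapping = aes(x = year, y = lifeExp, group = country),
            color = "grey70") +
  geom_point(data = china_only,                      # layer 2: China only
             mapping = aes(x = year, y = lifeExp),
             color = "red", size = 3) +
  labs(x = "Year", y = "Life expectancy")

template_demo
```

Because `ggplot()` is called with no arguments here, each `geom` has to supply its own `data` and `mapping`.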
# Example: Basic Jargon in Action!{.section-title background-color="#99a486"}
## Axis Labels, Points, No Background{auto-animate="true"}
### Base `ggplot`
:::: {.columns}
:::{.column width="50%"}
```{r}
#| eval: FALSE
#| code-line-numbers: "1-2"
ggplot(data = China,
aes(x = year, y = lifeExp))
```
:::
:::{.column width="50%"}
```{r}
#| echo: false
#| fig-height: 6
#| fig-width: 6
ggplot(data = China,
aes(x = year, y = lifeExp))
```
:::
::::
::: aside
Initialize the plot with `ggplot()` and `x` and `y` aesthetics **mapped** to variables. These aesthetics will be accessible to any future layers since they're in the primary layer.
:::
## Axis Labels, Points, No Background{auto-animate="true"}
### Scatterplot
:::: {.columns}
:::{.column width="50%"}
```{r}
#| eval: false
#| code-line-numbers: "3"
ggplot(data = China,
aes(x = year, y = lifeExp)) +
geom_point()
```
:::
:::{.column width="50%"}
```{r}
#| echo: false
#| fig-height: 6
#| fig-width: 6
ggplot(data = China,
aes(x = year, y = lifeExp)) +
geom_point()
```
:::
::::
::: aside
Add a scatterplot **layer**.
:::
## Axis Labels, Points, No Background{auto-animate="true"}
### Point Color and Size
:::: {.columns}
:::{.column width="50%"}
```{r}
#| eval: false
#| code-line-numbers: "3"
ggplot(data = China,
aes(x = year, y = lifeExp)) +
geom_point(color = "red", size = 3)
```
:::
:::{.column width="50%"}
```{r}
#| echo: false
#| fig-height: 6
#| fig-width: 6
ggplot(data = China,
aes(x = year, y = lifeExp)) +
geom_point(color = "red", size = 3)
```
:::
::::
::: aside
**Set** aesthetics to make the points larger and red. Notice that these "aesthetics" are not inside the `aes` call the way `x` and `y` are on line 2. They are therefore fixed settings that apply to every point in this layer, rather than mappings to the data.
:::
## Axis Labels, Points, No Background{auto-animate="true"}
### X-Axis Label
:::: {.columns}
:::{.column width="50%"}
```{r}
#| eval: false
#| code-line-numbers: "4"
ggplot(data = China,
aes(x = year, y = lifeExp)) +
geom_point(color = "red", size = 3) +
labs(x = "Year")
```
:::
:::{.column width="50%"}
```{r}
#| echo: false
#| fig-height: 6
#| fig-width: 6
ggplot(data = China,
aes(x = year, y = lifeExp)) +
geom_point(color = "red", size = 3) +
labs(x = "Year")
```
:::
::::
::: aside
Add a layer to capitalize the x-axis label.
:::
## Axis Labels, Points, No Background{auto-animate="true"}
### Y-Axis Label
:::: {.columns}
:::{.column width="50%"}
```{r}
#| eval: false
#| code-line-numbers: "5"
ggplot(data = China,
aes(x = year, y = lifeExp)) +
geom_point(color = "red", size = 3) +
labs(x = "Year",
y = "Life expectancy")
```
:::
:::{.column width="50%"}
```{r}
#| echo: false
#| fig-height: 6
#| fig-width: 6
ggplot(data = China,
aes(x = year, y = lifeExp)) +
geom_point(color = "red", size = 3) +
labs(x = "Year",
y = "Life expectancy")
```
:::
::::
::: aside
Add a layer to clean up the y-axis label.
:::
## Axis Labels, Points, No Background{auto-animate="true"}
### Title
:::: {.columns}
:::{.column width="50%"}
```{r}
#| eval: false
#| code-line-numbers: "6"
ggplot(data = China,
aes(x = year, y = lifeExp)) +
geom_point(color = "red", size = 3) +
labs(x = "Year",
y = "Life expectancy",
title = "Life expectancy in China")
```
:::
:::{.column width="50%"}
```{r}
#| echo: false
#| fig-height: 6
#| fig-width: 6
ggplot(data = China,
aes(x = year, y = lifeExp)) +
geom_point(color = "red", size = 3) +
labs(x = "Year",
y = "Life expectancy",
title = "Life expectancy in China")
```
:::
::::
::: aside
Add a title layer.
:::
## Axis Labels, Points, No Background{auto-animate="true"}
### Theme
:::: {.columns}
:::{.column width="50%"}
```{r}
#| eval: false
#| code-line-numbers: "7"
ggplot(data = China,
aes(x = year, y = lifeExp)) +
geom_point(color = "red", size = 3) +
labs(x = "Year",
y = "Life expectancy",
title = "Life expectancy in China") +
theme_minimal()
```
:::
:::{.column width="50%"}
```{r}
#| echo: false
#| fig-height: 6
#| fig-width: 6
ggplot(data = China,
aes(x = year, y = lifeExp)) +
geom_point(color = "red", size = 3) +
labs(x = "Year",
y = "Life expectancy",
title = "Life expectancy in China") +
theme_minimal()
```
:::
::::
::: aside
Pick a nicer theme with a new layer.
:::
## Axis Labels, Points, No Background{auto-animate="true"}
### Text Size
:::: {.columns}
:::{.column width="50%"}
```{r}
#| eval: false
#| code-line-numbers: "7"
ggplot(data = China,
aes(x = year, y = lifeExp)) +
geom_point(color = "red", size = 3) +
labs(x = "Year",
y = "Life expectancy",
title = "Life expectancy in China") +
theme_minimal(base_size = 18)
```
:::
:::{.column width="50%"}
```{r}
#| echo: false
#| fig-height: 6
#| fig-width: 6
ggplot(data = China,
aes(x = year, y = lifeExp)) +
geom_point(color = "red", size = 3) +
labs(x = "Year",
y = "Life expectancy",
title = "Life expectancy in China") +
theme_minimal(base_size = 18)
```
:::
::::
::: aside
Increase the base text size.
:::
## Plotting All Countries
We have a plot we like for China...
. . .
... but what if we want *all the countries*?
## Plotting All Countries{auto-animate="true"}
### A Mess!
:::: {.columns}
:::{.column width="50%"}
```{r}
#| eval: false
#| code-line-numbers: "|1|3"
ggplot(data = gapminder,
aes(x = year, y = lifeExp)) +
geom_point(color = "red", size = 3) +
labs(x = "Year",
y = "Life expectancy",
title = "Life expectancy over time") +
theme_minimal(base_size = 18)
```
:::
:::{.column width="50%"}
```{r}
#| echo: false
#| fig-height: 6
#| fig-width: 6
ggplot(data = gapminder,
aes(x = year, y = lifeExp)) +
geom_point(color = "red", size = 3) +
labs(x = "Year",
y = "Life expectancy",
title = "Life expectancy over time") +
theme_minimal(base_size = 18)
```
:::
::::
::: aside
We can't tell countries apart! Maybe we could follow *lines*?
:::
## Plotting All Countries{auto-animate="true"}
### Lines
:::: {.columns}
:::{.column width="50%"}
```{r}
#| eval: false
#| code-line-numbers: "3"
ggplot(data = gapminder,
aes(x = year, y = lifeExp)) +
  geom_line(color = "red", linewidth = 3) +
labs(x = "Year",
y = "Life expectancy",
title = "Life expectancy over time") +
theme_minimal(base_size = 18)
```
:::
:::{.column width="50%"}
```{r}
#| echo: false
#| fig-height: 6
#| fig-width: 6
ggplot(data = gapminder,
aes(x = year, y = lifeExp)) +
  geom_line(color = "red", linewidth = 3) +
labs(x = "Year",
y = "Life expectancy",
title = "Life expectancy over time") +
theme_minimal(base_size = 18)
```
:::
::::
::: aside
`ggplot2` doesn't know how to connect the lines!
:::
## Plotting All Countries{auto-animate="true"}
### Grouping
:::: {.columns}
:::{.column width="50%"}
```{r}
#| eval: false
#| code-line-numbers: "3"
ggplot(data = gapminder,
aes(x = year, y = lifeExp,
group = country)) +
  geom_line(color = "red", linewidth = 3) +
labs(x = "Year",
y = "Life expectancy",
title = "Life expectancy over time") +
theme_minimal(base_size = 18)
```
:::
:::{.column width="50%"}
```{r}
#| echo: false
#| fig-height: 6
#| fig-width: 6
ggplot(data = gapminder,
aes(x = year, y = lifeExp,
group = country)) +
  geom_line(color = "red", linewidth = 3) +
labs(x = "Year",
y = "Life expectancy",
title = "Life expectancy over time") +
theme_minimal(base_size = 18)
```
:::
::::
::: aside
That looks more reasonable... but the lines are too thick!
:::
## Plotting All Countries{auto-animate="true"}
### Size
:::: {.columns}
:::{.column width="50%"}
```{r}
#| eval: false
#| code-line-numbers: "4"
ggplot(data = gapminder,
aes(x = year, y = lifeExp,
group = country)) +
geom_line(color = "red") +
labs(x = "Year",
y = "Life expectancy",
title = "Life expectancy over time") +
theme_minimal(base_size = 18)
```
:::
:::{.column width="50%"}
```{r}
#| echo: false
#| fig-height: 6
#| fig-width: 6
ggplot(data = gapminder,
aes(x = year, y = lifeExp,
group = country)) +
geom_line(color = "red") +
labs(x = "Year",
y = "Life expectancy",
title = "Life expectancy over time") +
theme_minimal(base_size = 18)
```
:::
::::
::: aside
Much better... but what if we highlight regional differences?
:::
## Plotting All Countries{auto-animate="true"}
### Color
:::: {.columns}
:::{.column width="50%"}
```{r}
#| eval: false
#| code-line-numbers: "4-5"
ggplot(data = gapminder,
aes(x = year, y = lifeExp,
group = country,
color = continent)) +
geom_line() +
labs(x = "Year",
y = "Life expectancy",
title = "Life expectancy over time") +
theme_minimal(base_size = 18)
```
:::
:::{.column width="50%"}
```{r}
#| echo: false
#| fig-height: 6
#| fig-width: 6
ggplot(data = gapminder,
aes(x = year, y = lifeExp,
group = country,
color = continent)) +
geom_line() +
labs(x = "Year",
y = "Life expectancy",
title = "Life expectancy over time") +
theme_minimal(base_size = 18)
```
:::
::::
::: aside
Patterns are obvious... but it might be even more impactful if we separate continents completely.
:::
## Plotting All Countries{auto-animate="true"}
### Facets
:::: {.columns}
:::{.column width="50%"}
```{r}
#| eval: false
#| code-line-numbers: "10"
ggplot(data = gapminder,
aes(x = year, y = lifeExp,
group = country,
color = continent)) +
geom_line() +
labs(x = "Year",
y = "Life expectancy",
title = "Life expectancy over time") +
theme_minimal(base_size = 18) +
facet_wrap(vars(continent))
```
:::
:::{.column width="50%"}
```{r}
#| echo: false
#| fig-height: 6
#| fig-width: 6
ggplot(data = gapminder,
aes(x = year, y = lifeExp,
group = country,
color = continent)) +
geom_line() +
labs(x = "Year",
y = "Life expectancy",
title = "Life expectancy over time") +
theme_minimal(base_size = 18) +
facet_wrap(vars(continent))
```
:::
::::
::: aside
Now the text is too big!
:::
## Plotting All Countries{auto-animate="true"}
### Text Size
:::: {.columns}
:::{.column width="50%"}
```{r}
#| eval: false
#| code-line-numbers: "9"
ggplot(data = gapminder,
aes(x = year, y = lifeExp,
group = country,
color = continent)) +
geom_line() +
labs(x = "Year",
y = "Life expectancy",
title = "Life expectancy over time") +
theme_minimal() +
facet_wrap(vars(continent))
```
:::
:::{.column width="50%"}
```{r}
#| echo: false
#| fig-height: 6
#| fig-width: 6
ggplot(data = gapminder,
aes(x = year, y = lifeExp,
group = country,
color = continent)) +
geom_line() +
labs(x = "Year",
y = "Life expectancy",
title = "Life expectancy over time") +
theme_minimal() +
facet_wrap(vars(continent))
```
:::
::::
::: aside
Better. Do we even need the legend anymore?
:::
## Plotting All Countries{auto-animate="true"}
### No Legend
:::: {.columns}
:::{.column width="50%"}
```{r}
#| eval: false
#| code-line-numbers: "11"
ggplot(data = gapminder,
aes(x = year, y = lifeExp,
group = country,
color = continent)) +
geom_line() +
labs(x = "Year",
y = "Life expectancy",
title = "Life expectancy over time") +
theme_minimal() +
facet_wrap(vars(continent)) +
theme(legend.position = "none")
```
:::
:::{.column width="50%"}
```{r}
#| echo: false
#| fig-height: 6
#| fig-width: 6
ggplot(data = gapminder,
aes(x = year, y = lifeExp,
group = country,
color = continent)) +
geom_line() +
labs(x = "Year",
y = "Life expectancy",
title = "Life expectancy over time") +
theme_minimal() +
facet_wrap(vars(continent)) +
theme(legend.position = "none")
```
:::
::::
::: aside
Looking pretty good!
:::
# Lab 2 {.section-title background-color="#99a486"}
## Make a histogram
In pairs, create a histogram of life expectancy observations in the complete Gapminder dataset.
1. Set the base layer by specifying the data as `gapminder` and the x variable as `lifeExp`
2. Add a second layer to create a histogram using the function `geom_histogram()`
3. Customize your plot with nice axis labels and a title.
4. Fill all of the bars with the color "salmon" (hint: use the `fill` argument, not `color`).
5. Change this fill setting to an aesthetic and map continent onto it.
6. Change the `geom` to `geom_freqpoly`. What happened and how might you fix it?
7. Add facets for `continent` (create only 1 column).
8. Add one of the [built-in themes](https://ggplot2.tidyverse.org/reference/ggtheme.html) from `ggplot2`.
9. Remove the legend from the plot.
## Solution: 1. Set Base Layer {auto-animate="true"}
```{r}
#| eval: true
#| fig-align: center
#| fig-width: 10
#| fig-height: 5
#| code-line-numbers: "1"
ggplot(gapminder, aes(x = lifeExp))
```
## Solution: 2. Add Histogram Layer {auto-animate="true"}
```{r}
#| eval: true
#| fig-align: center
#| fig-width: 10
#| fig-height: 5
#| code-line-numbers: "2"
ggplot(gapminder, aes(x = lifeExp)) +
geom_histogram(bins = 30)
```
::: aside
Setting the `bins` argument tells ggplot how many bins to divide the range of `lifeExp` into (more bins give a finer-grained histogram; fewer bins give a coarser one).
:::
## Solution: 3. Add Label Layers {auto-animate="true"}
```{r}
#| eval: true
#| fig-align: center
#| fig-width: 10
#| fig-height: 5
#| code-line-numbers: "3-5"
ggplot(gapminder, aes(x = lifeExp)) +
geom_histogram(bins = 30) +
xlab("Life Expectancy") +
ylab("Count") +
ggtitle("Histogram of Life Expectancy in Gapminder Data")
```
## Solution: 4. Add fill setting {auto-animate="true"}
```{r}
#| eval: true
#| fig-align: center
#| fig-width: 10
#| fig-height: 5
#| code-line-numbers: "2"
ggplot(gapminder, aes(x = lifeExp)) +
geom_histogram(bins = 30, fill = "salmon") +
xlab("Life Expectancy") +
ylab("Count") +
ggtitle("Histogram of Life Expectancy in Gapminder Data")
```
## Solution: 5. Add fill aesthetic {auto-animate="true"}
```{r}
#| eval: true
#| fig-align: center
#| fig-width: 10
#| fig-height: 5
#| code-line-numbers: "1"
ggplot(gapminder, aes(x = lifeExp, fill = continent)) +
geom_histogram(bins = 30) +
xlab("Life Expectancy") +
ylab("Count") +
ggtitle("Histogram of Life Expectancy in Gapminder Data")
```
## Solution: 6. Change geometry {auto-animate="true"}
```{r}
#| eval: true
#| fig-align: center
#| fig-width: 10
#| fig-height: 5
#| code-line-numbers: "2"
ggplot(gapminder, aes(x = lifeExp, fill = continent)) +
geom_freqpoly(bins = 30) +
xlab("Life Expectancy") +
ylab("Count") +
ggtitle("Histogram of Life Expectancy in Gapminder Data")
```
## Solution: 6. Change geometry {auto-animate="true"}
```{r}
#| eval: true
#| fig-align: center
#| fig-width: 10
#| fig-height: 5
#| code-line-numbers: "2|1"
ggplot(gapminder, aes(x = lifeExp, color = continent)) +
geom_freqpoly(bins = 30) +
xlab("Life Expectancy") +
ylab("Count") +
ggtitle("Histogram of Life Expectancy in Gapminder Data")
```
## Solution: 7. Add facets {auto-animate="true"}
```{r}
#| eval: true
#| fig-align: center
#| fig-width: 10
#| fig-height: 5
#| code-line-numbers: "3"
ggplot(gapminder, aes(x = lifeExp, color = continent)) +
geom_freqpoly(bins = 30) +
facet_wrap(vars(continent), ncol = 1) +
xlab("Life Expectancy") +
ylab("Count") +
ggtitle("Histogram of Life Expectancy in Gapminder Data")
```
## Solution: 8. Add nicer theme {auto-animate="true"}
```{r}
#| eval: true
#| fig-align: center
#| fig-width: 10
#| fig-height: 5
#| code-line-numbers: "7"
ggplot(gapminder, aes(x = lifeExp, color = continent)) +
geom_freqpoly(bins = 30) +
facet_wrap(vars(continent), ncol = 1) +
xlab("Life Expectancy") +
ylab("Count") +
ggtitle("Histogram of Life Expectancy in Gapminder Data") +
theme_minimal()
```
## Solution: 9. Remove legend {auto-animate="true"}
```{r}
#| eval: true
#| fig-align: center
#| fig-width: 10
#| fig-height: 5
#| code-line-numbers: "8"
ggplot(gapminder, aes(x = lifeExp, color = continent)) +
geom_freqpoly(bins = 30) +
facet_wrap(vars(continent), ncol = 1) +
xlab("Life Expectancy") +
ylab("Count") +
ggtitle("Histogram of Life Expectancy in Gapminder Data") +
theme_minimal() +
theme(legend.position = "none")
```
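## Choosing Bins

Returning to the `bins` argument from step 2: the sketch below contrasts a coarse and a fine histogram and shows `binwidth` as an alternative way to control the same thing. The specific values (10 bins, 60 bins, and a width of 5 years) and the object names are arbitrary choices for illustration.

```{r}
#| fig-align: center
#| fig-width: 10
#| fig-height: 5
library(ggplot2)
library(gapminder)

hist_base <- ggplot(gapminder, aes(x = lifeExp))

hist_coarse <- hist_base + geom_histogram(bins = 10)    # few, wide bins
hist_fine <- hist_base + geom_histogram(bins = 60)      # many, narrow bins
hist_width <- hist_base + geom_histogram(binwidth = 5)  # each bin spans 5 years of life expectancy

hist_fine
```

Try printing `hist_coarse` and `hist_width` as well; no single choice is correct, so it is worth comparing a few.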
# Break!{.section-title background-color="#1e4655"}
# Advanced ggplot tools{.section-title background-color="#99a486"}
## Further customization
Next, we'll discuss:
* Storing, modifying, and saving ggplots
* Advanced axis changes (scales, text, ticks)
* Legend changes (scales, colors, locations)
* Using multiple `geoms`
* Adding annotation for emphasis
## Storing Plots
We can assign a `ggplot` object to a name:
```{r}
lifeExp_by_year <-
ggplot(data = gapminder,
aes(x = year, y = lifeExp,
group = country,
color = continent)) +
geom_line() +
labs(x = "Year",
y = "Life expectancy",
title = "Life expectancy over time") +
theme_minimal() +
facet_wrap(vars(continent)) +
theme(legend.position = "none")
```
Afterwards, you can display or modify `ggplot`s...
## Showing a Stored Graph
```{r}
#| fig-width: 10
#| fig-height: 6
lifeExp_by_year
```
## Overriding previous specifications
```{r}
#| fig-width: 12
#| fig-height: 3
#| code-line-numbers: "2"
lifeExp_by_year +
facet_grid(cols = vars(continent))
```
## Adding More Layers
```{r}
#| fig-height: 3.5
#| fig-width: 12
#| code-line-numbers: "3"
lifeExp_by_year +
facet_grid(cols = vars(continent)) +
theme(legend.position = "bottom")
```
## Saving `ggplot` Plots
If you want to save a ggplot, use `ggsave()`:
```{r}
#| eval: false
ggsave(filename = "I_saved_a_file.pdf",
plot = lifeExp_by_year,
height = 3, width = 5, units = "in")
```
If you didn't manually set font sizes, these will usually come out at a reasonable size given the dimensions of your output file.
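## Saving `ggplot` Plots: A Runnable Sketch

The sketch below saves a small plot to a temporary file so it can be run without cluttering your project folder. The object names `demo_plot` and `out_file` are illustrative; in practice you would give `filename` a path inside your project. `ggsave()` chooses the file type from the file extension.

```{r}
library(ggplot2)
library(gapminder)

demo_plot <- ggplot(gapminder, aes(x = year, y = lifeExp, group = country)) +
  geom_line()

out_file <- tempfile(fileext = ".pdf")  # temporary path for this R session

ggsave(filename = out_file,
       plot = demo_plot,
       height = 3, width = 5, units = "in")

file.exists(out_file)  # check that the file was written
```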
## Changing the Axes
We can modify the axes in a variety of ways, such as:
* Change the $x$ or $y$ range using `xlim()` or `ylim()` layers
* Change to a logarithmic or square-root scale on either axis: `scale_x_log10()`, `scale_y_sqrt()`
* Change where the major/minor breaks are: `scale_x_continuous(breaks =, minor_breaks = )`
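
## Changing the Axes: A Sketch

A short sketch of the last two options, using `gapminder`: major breaks every 10 years and minor breaks every 5 on the x-axis, plus a square-root scale on the y-axis. The name `breaks_demo` and the break values are illustrative choices.

```{r}
#| fig-align: center
library(ggplot2)
library(gapminder)

breaks_demo <- ggplot(gapminder, aes(x = year, y = lifeExp, group = country)) +
  geom_line(alpha = 0.3) +
  scale_x_continuous(breaks = seq(1950, 2010, by = 10),       # labelled gridlines every 10 years
                     minor_breaks = seq(1950, 2010, by = 5)) +  # unlabelled gridlines every 5 years
  scale_y_sqrt()                                              # square-root scale on the y-axis

breaks_demo
```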
## Axis Changes
```{r}
#| fig-height: 6
#| fig-width: 10
#| fig-align: center
#| code-line-numbers: "3"
ggplot(data = China, aes(x = year, y = gdpPercap)) +
geom_line() +
scale_y_log10(breaks = c(1000, 2000, 3000, 4000, 5000)) +
xlim(1940, 2010) + ggtitle("Chinese GDP per capita")
```
## Precise Legend Position
```{r}
#| fig-height: 6
#| fig-width: 10
#| fig-align: center
lifeExp_by_year +
theme(legend.position = c(0.8, 0.2))
```
Instead of coordinates, you could also use "top", "bottom", "left", or "right".
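
## Precise Legend Position in Newer ggplot2

Depending on your installed version, the numeric form may produce a deprecation warning: to my knowledge, ggplot2 3.5.0 and later prefer `legend.position = "inside"` together with `legend.position.inside`. The sketch below checks the version first so that it runs either way; `legend_demo` is an illustrative name.

```{r}
#| fig-align: center
library(ggplot2)
library(gapminder)

legend_demo <- ggplot(gapminder,
                      aes(x = year, y = lifeExp,
                          group = country, color = continent)) +
  geom_line()

if (packageVersion("ggplot2") >= "3.5.0") {
  # Newer syntax: say "inside", then give the coordinates separately
  legend_demo <- legend_demo +
    theme(legend.position = "inside",
          legend.position.inside = c(0.8, 0.2))
} else {
  # Older syntax: coordinates go directly in legend.position
  legend_demo <- legend_demo +
    theme(legend.position = c(0.8, 0.2))
}

legend_demo
```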
## Scales for Color, Shape, etc.
**Scales** are layers that control how the mapped aesthetics appear.
You can modify these with a `scale_[aesthetic]_[option]()` layer:
:::{.incremental}
* `[aesthetic]` is `x`, `y`, `color`, `shape`, `linetype`, `alpha`, `size`, `fill`, etc.
* `[option]` is something like `manual`, `continuous`, `binned` or `discrete` (depending on nature of the variable).
:::
. . .
**Examples:**
* `scale_alpha_ordinal()`: scales alpha transparency for an ordinal categorical variable
* `scale_x_log10()`: maps a log10 transformation of the x-axis variable
* `scale_color_manual()`: allows manual specification of color aesthetic
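
## Reading a Scale Name

As a sketch of the `scale_[aesthetic]_[option]()` pattern, the plot below applies three scales to the 2007 slice of `gapminder`. The names `gap_2007` and `scales_demo`, the legend titles, and the size range are illustrative choices.

```{r}
#| fig-align: center
library(ggplot2)
library(gapminder)

gap_2007 <- subset(gapminder, year == 2007)

scales_demo <- ggplot(gap_2007,
                      aes(x = gdpPercap, y = lifeExp,
                          size = pop, color = continent)) +
  geom_point(alpha = 0.6) +
  # aesthetic: x, option: log10
  scale_x_log10() +
  # aesthetic: size, option: continuous
  scale_size_continuous(name = "Population", range = c(1, 8)) +
  # aesthetic: color, option: viridis_d (a discrete palette)
  scale_color_viridis_d(name = "Continent")

scales_demo
```

Each scale only affects an aesthetic that has been mapped in `aes()`; here those are `x`, `size`, and `color`.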
## Legend Name and Manual Colors
```{r}
#| fig-height: 6
#| fig-width: 10
#| fig-align: center
#| code-line-numbers: "3-6"
#| eval: false
lifeExp_by_year +
theme(legend.position = c(0.8, 0.2)) +
scale_color_manual(
name = "Which continent are\nwe looking at?", # \n adds a line break
values = c("Africa" = "#4e79a7", "Americas" = "#f28e2c",
"Asia" = "#e15759", "Europe" = "#76b7b2", "Oceania" = "#59a14f"))
```
:::: {.columns}
::: {.column width="70%"}
```{r}
#| fig-height: 6
#| fig-width: 10
#| fig-align: center
#| eval: true
#| echo: false
lifeExp_by_year +
theme(legend.position = c(0.8, 0.2)) +
scale_color_manual(
name = "Which continent are\nwe looking at?", # \n adds a line break
values = c("Africa" = "#4e79a7", "Americas" = "#f28e2c",
"Asia" = "#e15759", "Europe" = "#76b7b2", "Oceania" = "#59a14f"))
```
:::
::: {.column width="30%"}
::: {.callout-note icon=false}
## {{< fa circle-info >}} Note
This scale function knows to "map" onto `continent` because `continent` is specified as the color aesthetic in our original ggplot object.
:::
:::
::::
## Fixed versus Free Scales {{< fa scroll >}} {.scrollable}
::: {.panel-tabset}
### Untransformed
```{r}
#| fig-align: center
#| code-fold: true
#| fig-width: 16
#| fig-height: 8
gapminder_sub <- gapminder |>
filter(year %in% c(1952, 1982, 2002)) # create subset with only 3 years
scales_plot <- ggplot(data = gapminder_sub,
aes(x = lifeExp, y = gdpPercap, fill = continent)) +
geom_jitter(alpha = 0.5, # alpha of points halfway transparent
pch = 21, # shape is a circle with fill
size = 3, # increase size
color = "black") + # outline of circle is black
scale_fill_viridis_d(option = "D") + # circle is filled by colors perceptible for various forms of color-blindness
facet_grid(rows = vars(year), # facet by years in the row
cols = vars(continent)) + # facet by continent in the columns
ggthemes::theme_tufte(base_size = 20) # increase base text size
scales_plot
```
### Fixed
```{r}
#| fig-align: center
#| code-fold: true
#| fig-width: 16
#| fig-height: 8
scales_plot + scale_y_log10(breaks = c(250, 1000, 10000, 50000, 115000)) # put the y axis on a log10 scale so low and high values are both readable
```
### Free x
```{r}
#| code-fold: true
#| fig-align: center
#| fig-width: 16
#| fig-height: 8
scales_plot + scale_y_log10(breaks = c(250, 1000, 10000, 50000, 115000)) +
facet_grid(rows = vars(year),
cols = vars(continent),
scales = "free_x") # make the x axis vary by data
```
### Free y
```{r}
#| code-fold: true
#| fig-align: center
#| fig-width: 16
#| fig-height: 8
scales_plot + scale_y_log10(breaks = c(250, 1000, 10000, 50000, 115000)) +
facet_grid(rows = vars(year),
cols = vars(continent),
scales = "free_y") # make the y axis vary by data
```
### Free x & y
```{r}
#| code-fold: true
#| fig-align: center
#| fig-width: 16
#| fig-height: 8
scales_plot + scale_y_log10(breaks = c(250, 1000, 10000, 50000, 115000)) +
facet_grid(rows = vars(year),
cols = vars(continent),
scales = "free") # make both axes vary by data
```
:::
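
## Sketch: Free Scales in `facet_wrap()`

One detail worth a quick sketch: in `facet_grid()` a freed y axis is still shared by every panel in a row, and a freed x axis by every panel in a column. With `facet_wrap()`, each panel can get its own axes. The example below is an illustrative assumption that rebuilds a similar subset rather than reusing `scales_plot`.

```r
library(ggplot2)
library(gapminder)

gap_sub <- subset(gapminder, year %in% c(1952, 1982, 2002))

p_wrap <- ggplot(gap_sub, aes(x = lifeExp, y = gdpPercap)) +
  geom_point(alpha = 0.5) +
  scale_y_log10() +
  facet_wrap(vars(continent), scales = "free")  # every panel gets its own x and y axes
```

Free scales make within-panel patterns easier to see, at the cost of making comparisons across panels harder.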
## Using multiple `geoms` {auto-animate="true"}
```{r}
#| fig-align: center
#| code-line-numbers: "|2"
ggplot(gapminder, aes(x = continent, y = lifeExp)) +
geom_boxplot(outlier.colour = "maroon")
```
## Using multiple `geoms` {auto-animate="true"}
```{r}
#| fig-align: center
#| code-line-numbers: "3"
ggplot(gapminder, aes(x = continent, y = lifeExp)) +
geom_boxplot(outlier.colour = "maroon") +
geom_point(alpha = 0.25)
```
## Using multiple `geoms` {auto-animate="true"}
```{r}
#| fig-align: center
#| code-line-numbers: "3"
ggplot(gapminder, aes(x = continent, y = lifeExp)) +
geom_boxplot(outlier.colour = "maroon") +
geom_jitter(alpha = 0.25)
```
## Using multiple `geoms` {auto-animate="true"}
```{r}
#| fig-align: center
#| code-line-numbers: "3"
ggplot(gapminder, aes(x = continent, y = lifeExp)) +
geom_boxplot(outlier.colour = "maroon") +
geom_jitter(position = position_jitter(width = 0.1, height = 0), alpha = 0.25)
```
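
## Sketch: Reproducible Jitter

Jittering adds random noise, so the points land somewhere slightly different on every render. As a hedged sketch, `position_jitter()` also accepts a `seed` argument that fixes the noise; the seed value `508` is an arbitrary choice for illustration.

```r
library(ggplot2)
library(gapminder)

p <- ggplot(gapminder, aes(x = continent, y = lifeExp)) +
  geom_boxplot(outlier.colour = "maroon") +
  geom_jitter(position = position_jitter(width = 0.1, height = 0, seed = 508),
              alpha = 0.25)

# Building the plot twice gives the same jittered x positions
x_first  <- ggplot_build(p)$data[[2]]$x
x_second <- ggplot_build(p)$data[[2]]$x
identical(x_first, x_second)
```

Because `height = 0`, only the horizontal position is jittered and the life expectancy values stay exactly where the data put them.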
## Annotating specific datapoints for emphasis
```{r}
#| echo: false
outliers <- gapminder |>
group_by(continent) |>
mutate(outlier = case_when(quantile(lifeExp, probs = 0.25) - (IQR(lifeExp) * 1.5) > lifeExp ~ "outlier", # anything lower than the 1st quartile - 1.5*IQR
quantile(lifeExp, probs = 0.75) + (IQR(lifeExp) * 1.5) < lifeExp ~ "outlier", # anything higher than the 3rd quartile + 1.5*IQR
.default = NA)) |>
filter(!is.na(outlier)) |> # remove non-outliers
ungroup() |> group_by(country) |> # regroup by country
filter(lifeExp == min(lifeExp)) # filter just the min for each country
no_outliers <- gapminder |>
group_by(continent) |>
mutate(outlier = case_when(quantile(lifeExp, probs = 0.25) - (IQR(lifeExp) * 1.5) > lifeExp ~ "outlier",
quantile(lifeExp, probs = 0.75) + (IQR(lifeExp) * 1.5) < lifeExp ~ "outlier",
.default = NA)) |>
filter(is.na(outlier)) # remove outliers
```
::: {.panel-tabset}
### Basic annotation
```{r}
#| warning: false
#| fig-align: center
#| code-line-numbers: "|4"
#| code-fold: true
#| fig-width: 16
#| fig-height: 8
ggplot(gapminder, aes(x = continent, y = lifeExp)) +
geom_boxplot(outlier.colour = "maroon", outlier.size = 3) +
geom_jitter(data = no_outliers, position = position_jitter(width = 0.1, height = 0), alpha = 0.25, size = 3) +
geom_text(data = outliers, aes(label = country), color = "maroon", size = 8) +
theme_minimal(base_size = 18)
```
### Offset annotation
```{r}
#| warning: false
#| fig-align: center
#| code-line-numbers: "|1,5"
#| code-fold: true
#| fig-width: 16
#| fig-height: 8
library(ggrepel)
ggplot(gapminder, aes(x = continent, y = lifeExp)) +
geom_boxplot(outlier.colour = "maroon", outlier.size = 3) +
geom_jitter(data = no_outliers, position = position_jitter(width = 0.1, height = 0), alpha = 0.25, size = 3) +
geom_label_repel(data = outliers, aes(label = country), color = "maroon", alpha = 0.7, size = 8, max.overlaps = 13) +
theme_minimal(base_size = 18)
```
### Code: outliers
```{r}
#| code-fold: true
outliers <- gapminder |>
group_by(continent) |>
mutate(outlier = case_when(quantile(lifeExp, probs = 0.25) - (IQR(lifeExp) * 1.5) > lifeExp ~ "outlier", # anything lower than the 1st quartile - 1.5*IQR
quantile(lifeExp, probs = 0.75) + (IQR(lifeExp) * 1.5) < lifeExp ~ "outlier", # anything higher than the 3rd quartile + 1.5*IQR
.default = NA)) |>
filter(!is.na(outlier)) |> # remove non-outliers
ungroup() |> group_by(country) |> # regroup by country
filter(lifeExp == min(lifeExp)) # filter just the min for each country
outliers
```
### Code: no outliers
```{r}
#| code-fold: true
no_outliers <- gapminder |>
group_by(continent) |>
mutate(outlier = case_when(quantile(lifeExp, probs = 0.25) - (IQR(lifeExp) * 1.5) > lifeExp ~ "outlier",
quantile(lifeExp, probs = 0.75) + (IQR(lifeExp) * 1.5) < lifeExp ~ "outlier",
.default = NA)) |>
filter(is.na(outlier)) # remove outliers
no_outliers
```
:::
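
## Sketch: An Outlier Helper Function

The 1.5 × IQR rule is written out twice in the code above. One possible way to avoid the repetition is a small helper; `is_outlier()`, `flagged`, `outlier_rows`, and `no_outlier_rows` are hypothetical names introduced only for this sketch.

```r
library(dplyr)
library(gapminder)

# Hypothetical helper: TRUE where a value falls outside the 1.5 * IQR fences
is_outlier <- function(x) {
  lower <- quantile(x, probs = 0.25) - 1.5 * IQR(x)
  upper <- quantile(x, probs = 0.75) + 1.5 * IQR(x)
  unname(x < lower | x > upper)
}

# Flag outliers within each continent, as the boxplot does
flagged <- gapminder |>
  group_by(continent) |>
  mutate(outlier = is_outlier(lifeExp)) |>
  ungroup()

outlier_rows    <- filter(flagged, outlier)   # all flagged country-years
no_outlier_rows <- filter(flagged, !outlier)  # everything else
```

Note that `outlier_rows` keeps every flagged country-year, whereas the `outliers` object on the slides goes one step further and keeps only the lowest value per country for labeling.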
# Bonus: Advanced Example!{.section-title background-color="#99a486"}
## End Result
We're going to *slowly* build up a *really detailed plot* now!
```{r}
#| echo: false
#| fig-height: 6
#| fig-width: 10
ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country)) +
geom_line(alpha = 0.5, aes(color = "Country", size = "Country")) +
geom_line(stat = "smooth", method = "loess",
aes(group = continent, color = "Continent", size = "Continent"), alpha = 0.5) +
facet_wrap(~ continent, nrow = 2) +
scale_color_manual(name = "Life Exp. for:", values = c("Country" = "black", "Continent" = "blue")) +
scale_size_manual(name = "Life Exp. for:", values = c("Country" = 0.25, "Continent" = 3)) +
theme_minimal(base_size = 14) + ylab("Years") + xlab("") +
ggtitle("Life Expectancy, 1952-2007", subtitle = "By continent and country") +
theme(legend.position=c(0.82, 0.15), axis.text.x = element_text(angle = 45))
```
## Base `ggplot`
::: {.panel-tabset}
### Code
```{r}
#| eval: false
ggplot(data = gapminder,
aes(x = year, y = lifeExp, group = country))
```
### Plot
```{r}
#| echo: false
#| eval: true
#| fig-height: 6
#| fig-width: 10
#| fig-align: center
ggplot(data = gapminder,
aes(x = year, y = lifeExp, group = country))
```
:::
::: aside
What might be a good `geom` layer for this data?
:::
## Lines
::: {.panel-tabset}
### Code
```{r}
#| eval: false
#| code-line-numbers: "3"
ggplot(data = gapminder,
aes(x = year, y = lifeExp, group = country)) +
geom_line()
```
### Plot
```{r}
#| echo: false
#| eval: true
#| fig-height: 6
#| fig-width: 10
#| fig-align: center
ggplot(data = gapminder,
aes(x = year, y = lifeExp, group = country)) +
geom_line()
```
::: aside
Let's also add a continent-specific average so we can visualize country-deviations from the regional average.
:::
:::
## Continent Average
::: {.panel-tabset}
### Code
```{r}
#| eval: false
#| code-line-numbers: "4-6"
ggplot(data = gapminder,
aes(x = year, y = lifeExp, group = country)) +
geom_line() +
geom_line(stat = "smooth",
method = "loess",
aes(group = continent))
```
### Plot
```{r}
#| echo: false
#| eval: true
#| fig-height: 6
#| fig-width: 10
#| fig-align: center
ggplot(data = gapminder,
aes(x = year, y = lifeExp, group = country)) +
geom_line() +
geom_line(stat = "smooth",
method = "loess",
aes(group = continent))
```
:::
::: aside
A [*loess* curve](https://en.wikipedia.org/wiki/Local_regression) is something like a moving average.
:::
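
## Sketch: What a Loess Fit Does

To unpack the "moving average" intuition a little: loess fits a weighted regression within a sliding neighborhood of each x value. The sketch below fits one directly with base R's `loess()` for a single continent. It is meant as an illustration of the idea, not a claim about exactly what `ggplot2` computes internally.

```r
library(gapminder)

asia <- subset(gapminder, continent == "Asia")

# Local regression of life expectancy on year (loess() defaults to span = 0.75)
fit <- loess(lifeExp ~ year, data = asia)

# Smoothed life expectancy at the first and last survey years
smoothed <- predict(fit, newdata = data.frame(year = c(1952, 2007)))
smoothed
```

A smaller `span` uses a narrower neighborhood and gives a wigglier curve; a larger one gives a smoother curve.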
## Facets
::: {.panel-tabset}
### Code
```{r}
#| eval: false
#| code-line-numbers: "7-8"
ggplot(data = gapminder,
aes(x = year, y = lifeExp, group = country)) +
geom_line() +
geom_line(stat = "smooth",
method = "loess",
aes(group = continent)) +
facet_wrap(~ continent,
nrow = 2)
```
### Plot
```{r}
#| echo: false
#| eval: true
#| fig-height: 6
#| fig-width: 10
#| fig-align: center
ggplot(data = gapminder,
aes(x = year, y = lifeExp, group = country)) +
geom_line() +
geom_line(stat = "smooth",
method = "loess",
aes(group = continent)) +
facet_wrap(~ continent, nrow = 2)
```
:::
::: aside
Facets allow us to gain a clearer understanding of the regional patterns. We want to differentiate the continent-average line from the country-specific lines though, so let's change its color.
:::
## Color Scale
::: {.panel-tabset}
### Code
```{r}
#| eval: false
#| code-line-numbers: "9-10"
ggplot(data = gapminder,
aes(x = year, y = lifeExp, group = country)) +
geom_line() +
geom_line(stat = "smooth",
method = "loess",
aes(group = continent)) +
facet_wrap(~ continent,
nrow = 2) +
scale_color_manual(name = "Life Exp. for:",
values = c("Country" = "black", "Continent" = "blue"))
```
### Plot
```{r}
#| echo: false
#| eval: true
#| fig-height: 6
#| fig-width: 10
#| fig-align: center
ggplot(data = gapminder,
aes(x = year, y = lifeExp, group = country)) +
geom_line() +
geom_line(stat = "smooth",
method = "loess",
aes(group = continent)) +
facet_wrap(~ continent, nrow = 2) +
scale_color_manual(name = "Life Exp. for:",
values = c("Country" = "black", "Continent" = "blue"))
```
::: aside
Hmm, can't quite see the blue line yet. Let's make it bigger?
:::
:::
## Size Scale
::: {.panel-tabset}
### Code
```{r}
#| eval: false
#| code-line-numbers: "11-12"
ggplot(data = gapminder,
aes(x = year, y = lifeExp, group = country)) +
geom_line() +
geom_line(stat = "smooth",
method = "loess",
aes(group = continent)) +
facet_wrap(~ continent,
nrow = 2) +
scale_color_manual(name = "Life Exp. for:",
values = c("Country" = "black", "Continent" = "blue")) +
scale_size_manual(name = "Life Exp. for:",
values = c("Country" = 0.25, "Continent" = 3))
```
### Plot
```{r}
#| echo: false
#| eval: true
#| fig-height: 6
#| fig-width: 10
#| fig-align: center
ggplot(data = gapminder,
aes(x = year, y = lifeExp, group = country)) +
geom_line() +
geom_line(stat = "smooth",
method = "loess",
aes(group = continent)) +
facet_wrap(~ continent,
nrow = 2) +
scale_color_manual(name = "Life Exp. for:",
values = c("Country" = "black", "Continent" = "blue")) +
scale_size_manual(name = "Life Exp. for:",
values = c("Country" = 0.25, "Continent" = 3))
```
:::
::: aside
It doesn't look like our color and size scales are actually mapping onto our variables. Why is that?
:::
## Mapping Color & Size
::: {.panel-tabset}
### Code
```{r}
#| eval: false
#| code-line-numbers: "3,5"
ggplot(data = gapminder,
aes(x = year, y = lifeExp, group = country)) +
geom_line(aes(color = "Country", size = "Country")) +
geom_line(stat = "smooth", method = "loess",
aes(group = continent, color = "Continent", size = "Continent")) +
facet_wrap(~ continent,
nrow = 2) +
scale_color_manual(name = "Life Exp. for:",
values = c("Country" = "black", "Continent" = "blue")) +
scale_size_manual(name = "Life Exp. for:",
values = c("Country" = 0.25, "Continent" = 3))
```
### Plot
```{r}
#| echo: false
#| eval: true
#| fig-height: 6
#| fig-width: 10
#| fig-align: center
ggplot(data = gapminder,
aes(x = year, y = lifeExp, group = country)) +
geom_line(aes(color = "Country", size = "Country")) +
geom_line(stat = "smooth", method = "loess",
aes(group = continent, color = "Continent", size = "Continent")) +
facet_wrap(~ continent,
nrow = 2) +
scale_color_manual(name = "Life Exp. for:",
values = c("Country" = "black", "Continent" = "blue")) +
scale_size_manual(name = "Life Exp. for:",
values = c("Country" = 0.25, "Continent" = 3))
```
:::
::: aside
Huzzah! Let's change the transparency on these lines a touch so we can see all our data more easily.
:::
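
## Sketch: Setting versus Mapping a Constant

Why did the scales only start working once the labels went inside `aes()`? A stripped-down sketch may help. The data frame `toy` and the label `"Trend"` are made up for illustration.

```r
library(ggplot2)

# Tiny made-up data set, for illustration only
toy <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 6))

# Constant OUTSIDE aes(): the line is simply drawn blue (no scale, no legend)
p_set <- ggplot(toy, aes(x = x, y = y)) +
  geom_line(color = "blue")

# Constant string INSIDE aes(): "Trend" is treated like a one-level variable,
# so scale_color_manual() can give it a color and a legend entry
p_mapped <- ggplot(toy, aes(x = x, y = y)) +
  geom_line(aes(color = "Trend")) +
  scale_color_manual(name = "Line:", values = c("Trend" = "blue"))
```

Scales only act on mapped aesthetics, which is why `scale_color_manual()` and `scale_size_manual()` had nothing to work with until `"Country"` and `"Continent"` were mapped.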
## Alpha (Transparency)
::: {.panel-tabset}
### Code
```{r}
#| eval: false
#| code-line-numbers: "3,7"
ggplot(data = gapminder,
aes(x = year, y = lifeExp, group = country)) +
geom_line(alpha = 0.5,
aes(color = "Country", size = "Country")) +
geom_line(stat = "smooth", method = "loess",
aes(group = continent, color = "Continent", size = "Continent"),
alpha = 0.5) +
facet_wrap(~ continent,
nrow = 2) +
scale_color_manual(name = "Life Exp. for:",
values = c("Country" = "black", "Continent" = "blue")) +
scale_size_manual(name = "Life Exp. for:",
values = c("Country" = 0.25, "Continent" = 3))
```
### Plot
```{r}
#| echo: false
#| eval: true
#| fig-height: 6
#| fig-width: 10
#| fig-align: center
ggplot(data = gapminder,
aes(x = year, y = lifeExp, group = country)) +
geom_line(alpha = 0.5,
aes(color = "Country", size = "Country")) +
geom_line(stat = "smooth",
method = "loess",
aes(group = continent, color = "Continent", size = "Continent"),
alpha = 0.5) +
facet_wrap(~ continent,
nrow = 2) +
scale_color_manual(name = "Life Exp. for:",
values = c("Country" = "black", "Continent" = "blue")) +
scale_size_manual(name = "Life Exp. for:",
values = c("Country" = 0.25, "Continent" = 3))
```
:::
::: aside
Now we're getting somewhere! We can also add useful labels and clean up the theme.
:::
## Theme and Labels
::: {.panel-tabset}
### Code
```{r}
#| eval: false
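#| code-line-numbers: "14-16"
ggplot(data = gapminder,
aes(x = year, y = lifeExp, group = country)) +
geom_line(alpha = 0.5,
aes(color = "Country", size = "Country")) +
geom_line(stat = "smooth", method = "loess",
aes(group = continent, color = "Continent", size = "Continent"),
alpha = 0.5) +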
facet_wrap(~ continent,
nrow = 2) +
scale_color_manual(name = "Life Exp. for:",
values = c("Country" = "black", "Continent" = "blue")) +
scale_size_manual(name = "Life Exp. for:",
values = c("Country" = 0.25, "Continent" = 3)) +
theme_minimal(base_size = 14) +
labs(y = "Years",
x = "")
```
### Plot
```{r}
#| echo: false
#| eval: true
#| fig-height: 6
#| fig-width: 10
#| fig-align: center
ggplot(data = gapminder,
aes(x = year, y = lifeExp, group = country)) +
geom_line(alpha = 0.5,
aes(color = "Country", size = "Country")) +
geom_line(stat = "smooth",
method = "loess",
aes(group = continent, color = "Continent", size = "Continent"),
alpha = 0.5) +
facet_wrap(~ continent,
nrow = 2) +
scale_color_manual(name = "Life Exp. for:",
values = c("Country" = "black", "Continent" = "blue")) +
scale_size_manual(name = "Life Exp. for:",
values = c("Country" = 0.25, "Continent" = 3)) +
theme_minimal(base_size = 14) +
labs(y = "Years",
x = "")
```
:::
::: aside
What's our plot showing? We should be explicit about that.
:::
## Title and Subtitle
::: {.panel-tabset}
### Code
```{r}
#| eval: false
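#| code-line-numbers: "17-18"
ggplot(data = gapminder,
aes(x = year, y = lifeExp, group = country)) +
geom_line(alpha = 0.5,
aes(color = "Country", size = "Country")) +
geom_line(stat = "smooth", method = "loess",
aes(group = continent, color = "Continent", size = "Continent"),
alpha = 0.5) +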
facet_wrap(~ continent,
nrow = 2) +
scale_color_manual(name = "Life Exp. for:",
values = c("Country" = "black", "Continent" = "blue")) +
scale_size_manual(name = "Life Exp. for:",
values = c("Country" = 0.25, "Continent" = 3)) +
theme_minimal(base_size = 14) +
labs(y = "Years",
x = "",
title = "Life Expectancy, 1952-2007",
subtitle = "By continent and country")
```
### Plot
```{r}
#| echo: false
#| eval: true
#| fig-height: 6
#| fig-width: 10
#| fig-align: center
ggplot(data = gapminder,
aes(x = year, y = lifeExp, group = country)) +
geom_line(alpha = 0.5,
aes(color = "Country", size = "Country")) +
geom_line(stat = "smooth",
method = "loess",
aes(group = continent, color = "Continent", size = "Continent"),
alpha = 0.5) +
facet_wrap(~ continent,
nrow = 2) +
scale_color_manual(name = "Life Exp. for:",
values = c("Country" = "black", "Continent" = "blue")) +
scale_size_manual(name = "Life Exp. for:",
values = c("Country" = 0.25, "Continent" = 3)) +
theme_minimal(base_size = 14) +
labs(y = "Years",
x = "",
title = "Life Expectancy, 1952-2007",
subtitle = "By continent and country")
```
:::
::: aside
The x-axis feels a little busy right now...
:::
## Angled Tick Values
::: {.panel-tabset}
### Code
```{r}
#| eval: false
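#| code-line-numbers: "19"
ggplot(data = gapminder,
aes(x = year, y = lifeExp, group = country)) +
geom_line(alpha = 0.5,
aes(color = "Country", size = "Country")) +
geom_line(stat = "smooth", method = "loess",
aes(group = continent, color = "Continent", size = "Continent"),
alpha = 0.5) +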
facet_wrap(~ continent,
nrow = 2) +
scale_color_manual(name = "Life Exp. for:",
values = c("Country" = "black", "Continent" = "blue")) +
scale_size_manual(name = "Life Exp. for:",
values = c("Country" = 0.25, "Continent" = 3)) +
theme_minimal(base_size = 14) +
labs(y = "Years",
x = "",
title = "Life Expectancy, 1952-2007",
subtitle = "By continent and country") +
theme(axis.text.x = element_text(angle = 45))
```
### Plot
```{r}
#| echo: false
#| eval: true
#| fig-height: 6
#| fig-width: 10
#| fig-align: center
ggplot(data = gapminder,
aes(x = year, y = lifeExp, group = country)) +
geom_line(alpha = 0.5,
aes(color = "Country", size = "Country")) +
geom_line(stat = "smooth",
method = "loess",
aes(group = continent, color = "Continent", size = "Continent"),
alpha = 0.5) +
facet_wrap(~ continent,
nrow = 2) +
scale_color_manual(name = "Life Exp. for:",
values = c("Country" = "black", "Continent" = "blue")) +
scale_size_manual(name = "Life Exp. for:",
values = c("Country" = 0.25, "Continent" = 3)) +
theme_minimal(base_size = 14) +
labs(y = "Years",
x = "",
title = "Life Expectancy, 1952-2007",
subtitle = "By continent and country") +
theme(axis.text.x = element_text(angle = 45))
```
:::
. . .
:::aside
Note: Fewer values might be better than angled labels! Finally, let's move our legend so it isn't wasting space.
:::
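
## Sketch: Fewer Tick Values

As a sketch of the "fewer values" alternative, you could set the x-axis breaks yourself with `scale_x_continuous()`. The three years chosen below are an illustrative assumption, and the plot is a simplified version of the one we have been building.

```r
library(ggplot2)
library(gapminder)

p <- ggplot(gapminder, aes(x = year, y = lifeExp, group = country)) +
  geom_line(alpha = 0.5) +
  facet_wrap(~ continent, nrow = 2) +
  scale_x_continuous(breaks = c(1960, 1980, 2000))  # illustrative choice of tick years
```

With only three labels per panel there is usually enough room to keep the text horizontal.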
## Legend Position
::: {.panel-tabset}
### Code
```{r}
#| eval: false
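#| code-line-numbers: "19"
ggplot(data = gapminder,
aes(x = year, y = lifeExp, group = country)) +
geom_line(alpha = 0.5,
aes(color = "Country", size = "Country")) +
geom_line(stat = "smooth", method = "loess",
aes(group = continent, color = "Continent", size = "Continent"),
alpha = 0.5) +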
facet_wrap(~ continent,
nrow = 2) +
scale_color_manual(name = "Life Exp. for:",
values = c("Country" = "black", "Continent" = "blue")) +
scale_size_manual(name = "Life Exp. for:",
values = c("Country" = 0.25, "Continent" = 3)) +
theme_minimal(base_size = 14) +
labs(y = "Years",
x = "",
title = "Life Expectancy, 1952-2007",
subtitle = "By continent and country") +
theme(legend.position = c(0.82, 0.15),
axis.text.x = element_text(angle = 45))
```
### Plot
```{r}
#| echo: false
#| eval: true
#| fig-height: 6
#| fig-width: 10
#| fig-align: center
ggplot(data = gapminder,
aes(x = year, y = lifeExp, group = country)) +
geom_line(alpha = 0.5,
aes(color = "Country", size = "Country")) +
geom_line(stat = "smooth",
method = "loess",
aes(group = continent, color = "Continent", size = "Continent"),
alpha = 0.5) +
facet_wrap(~ continent,
nrow = 2) +
scale_color_manual(name = "Life Exp. for:",
values = c("Country" = "black", "Continent" = "blue")) +
scale_size_manual(name = "Life Exp. for:",
values = c("Country" = 0.25, "Continent" = 3)) +
theme_minimal(base_size = 14) +
labs(y = "Years",
x = "",
title = "Life Expectancy, 1952-2007",
subtitle = "By continent and country") +
theme(legend.position=c(0.82, 0.15),
axis.text.x = element_text(angle = 45))
```
:::
::: aside
Voilà!
:::
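
## Sketch: Inside Legends in Newer `ggplot2`

Depending on your installed version, passing coordinates straight to `legend.position` may produce a deprecation warning. To the best of my knowledge, `ggplot2` 3.5.0 and later prefer the two-step form sketched below. The plot `p` is a simplified stand-in, and the version requirement is an assumption you should check against your own installation.

```r
library(ggplot2)
library(gapminder)

p <- ggplot(gapminder, aes(x = year, y = lifeExp, color = continent)) +
  geom_point(alpha = 0.3)

# Assumes ggplot2 >= 3.5.0: ask for an inside legend, then give its coordinates
p_inside <- p +
  theme(legend.position = "inside",
        legend.position.inside = c(0.82, 0.15))
```

The coordinates work the same way as before: both run from 0 to 1, measured from the bottom-left corner of the plotting area.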
# `ggplot` Extensions!{.section-title background-color="#99a486"}
## `tidyverse` extended universe
`ggplot2` can obviously do a lot on its own. But because `R` allows for anyone and everyone to expand the functionality of what already exists, numerous extensions^[The full list can be found [here](https://exts.ggplot2.tidyverse.org/gallery/).] to `ggplot2` have been created.
. . .
We've already seen one example with `ggrepel`. But let's look at a few others...
## `geomtextpath` {{< fa scroll >}} {.scrollable}
If you want your labels to follow along the path of your plot (and maintain proper angles and spacing) try using [`geomtextpath`](https://allancameron.github.io/geomtextpath/index.html).
```{r}
#| code-fold: true
#| fig-align: center
# install.packages("geomtextpath") <- run in console first
library(geomtextpath)
gapminder |>
filter(country %in% c("Cuba", "Haiti", "Dominican Republic")) |> # restricting data to 3 regionally-specific countries
ggplot(aes(x = year,
y = lifeExp,
color = country,
label = country)) + # specify label with text to appear
geom_textpath() + # adding textpath geom to put labels within lines
theme(legend.position = "none") # removing legend
```
## `ggridges` {{< fa scroll >}} {.scrollable}
We can visualize the differing distributions of a continuous variable by levels of a categorical variable with [`ggridges`](https://wilkelab.org/ggridges/)!
```{r}
#| code-fold: true
#| fig-align: center
# install.packages("ggridges") <- run in console first
library(ggridges)
ggplot(gapminder,
aes(x = lifeExp,
y = continent,
fill = continent,
color = continent)) +
geom_density_ridges(alpha = 0.5,
show.legend = FALSE) # add ridges, make all a bit transparent, remove legend
```
## Correlation Matrices {{< fa scroll >}} {.scrollable}
Make visually appealing & informative correlation plots in [`GGally`]() or [`ggcorrplot`]().
::: {.panel-tabset}
### `GGally`
```{r}
#| code-fold: true
#| fig-align: center
#| fig-width: 6
#| fig-height: 6
# install.packages("GGally") <- run in console first
library(GGally)
ggcorr(swiss,
geom = "circle",
min_size = 25, # specify minimum size of shape
max_size = 25, # specify maximum size of shape
label = TRUE, # label circles with correlation coefficient
label_alpha = TRUE, # less strong correlations have lower alpha
label_round = 2, # round correlation coefficients to 2 decimal places
legend.position = c(0.15, 0.6),
legend.size = 12)
```
### `ggcorrplot`
```{r}
#| code-fold: true
#| fig-align: center
#| fig-width: 6
#| fig-height: 6
# install.packages("ggcorrplot") <- run in console first
library(ggcorrplot)
# compute correlation matrix
corr <- round(cor(swiss), 1)
# compute matrix of correlation p-values
p_mat <- cor_pmat(swiss)
ggcorrplot(corr,
hc.order = TRUE, # use hierarchical clustering to group like-correlations together
type = "lower", # only show lower half of correlation matrix
p.mat = p_mat, # give corresponding p-values for correlation matrix
insig = "pch", # add default shape (an X) to correlations that are insignificant
outline.color = "black", # outline cells in black
ggtheme = ggthemes::theme_tufte(), # using a specific theme I like from ggthemes package
colors = c("#4e79a7", "white", "#e15759")) + # specify custom colors
theme(legend.position = c(0.15, 0.67))
```
### Bonus: `ggpairs()` from `GGally`
```{r}
#| code-fold: true
#| fig-align: center
ggpairs(swiss,
lower = list(continuous = wrap("smooth", # specify a smoothing line added to scatterplots
alpha = 0.5,
size=0.2))) +
ggthemes::theme_tufte() # add nice theme from ggthemes
```
:::
## `patchwork` {{< fa scroll >}} {.scrollable}
Combine separate plots into the same graphic using [`patchwork`]().
```{r}
#| code-fold: true
#| fig-align: center
# install.packages("patchwork") <- run in console first
library(patchwork)
# Create first plot object
plot_lifeExp <- ggplot(gapminder,
aes(x = lifeExp, y = continent, fill = continent, color = continent)) +
geom_density_ridges(alpha = 0.5, show.legend = FALSE)
# Create second plot object
plot_boxplot <- ggplot(gapminder,
aes(x = continent, y = lifeExp, color = continent)) +
geom_boxplot(outlier.colour = "black", varwidth = TRUE) + # change outlier color and make width of boxes relative to N
coord_flip() + # flip the coordinates (x & y) to align with first plot
geom_jitter(position = position_jitter(width = 0.1, height = 0), # add datapoints to boxplot
alpha = 0.25) +
geom_label_repel(data = outliers, # mapping new dataset with the outliers
aes(label = country),
color = "black",
alpha = 0.7,
max.overlaps = 13) +
theme(axis.text.y = element_blank(), # remove y axis text
axis.ticks.y = element_blank(), # remove y axis ticks
axis.title.y = element_blank(), # remove y axis title
legend.position = "none")
plot_lifeExp + plot_boxplot # simply add two objects together to place side by side
```
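
## Sketch: Other `patchwork` Layouts

Besides `+`, `patchwork` provides `|` for side-by-side and `/` for stacked layouts, plus `plot_annotation()` for a shared title. The two small plots below are stand-ins created for this sketch, not the plots from the previous slide.

```r
library(ggplot2)
library(patchwork)
library(gapminder)

p_hist <- ggplot(gapminder, aes(x = lifeExp)) +
  geom_histogram(bins = 30)
p_box <- ggplot(gapminder, aes(x = continent, y = lifeExp)) +
  geom_boxplot()

side_by_side <- p_hist | p_box  # `|` places plots next to each other
stacked      <- p_hist / p_box  # `/` stacks plots vertically

# Add a shared title to the combined graphic
stacked_titled <- stacked +
  plot_annotation(title = "Life expectancy in gapminder")
```

The operators can be combined with parentheses, for example `(p_hist | p_box) / p_hist`, to build more complex grids.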
## themes in `ggplot2` {{< fa scroll >}} {.scrollable}
There are several built-in themes within `ggplot2`.
::: {.panel-tabset}
### bw
```{r}
#| code-fold: true
#| fig-align: center
plot_lifeExp + theme_bw() # reusing plot_lifeExp from previous slide and changing theme
```
### light
```{r}
#| code-fold: true
#| fig-align: center
plot_lifeExp + theme_light()
```
### classic
```{r}
#| code-fold: true
#| fig-align: center
plot_lifeExp + theme_classic()
```
### linedraw
```{r}
#| code-fold: true
#| fig-align: center
plot_lifeExp + theme_linedraw()
```
### dark
```{r}
#| code-fold: true
#| fig-align: center
plot_lifeExp + theme_dark()
```
### minimal
```{r}
#| code-fold: true
#| fig-align: center
plot_lifeExp + theme_minimal()
```
### gray
```{r}
#| code-fold: true
#| fig-align: center
plot_lifeExp + theme_gray()
```
### void
```{r}
#| code-fold: true
#| fig-align: center
plot_lifeExp + theme_void()
```
:::
## [`ggthemes`](https://jrnold.github.io/ggthemes/) {{< fa scroll >}} {.scrollable}
::: {.panel-tabset}
### excel
```{r}
#| code-fold: true
#| fig-align: center
library(ggthemes)
plot_lifeExp + theme_excel()
```
### economist
```{r}
#| code-fold: true
#| fig-align: center
plot_lifeExp + theme_economist()
```
### few
```{r}
#| code-fold: true
#| fig-align: center
plot_lifeExp + theme_few()
```
### fivethirtyeight
```{r}
#| code-fold: true
#| fig-align: center
plot_lifeExp + theme_fivethirtyeight()
```
### gdocs
```{r}
#| code-fold: true
#| fig-align: center
plot_lifeExp + theme_gdocs()
```
### stata
```{r}
#| code-fold: true
#| fig-align: center
plot_lifeExp + theme_stata()
```
### tufte
```{r}
#| code-fold: true
#| fig-align: center
plot_lifeExp + theme_tufte()
```
### wsj
```{r}
#| code-fold: true
#| fig-align: center
plot_lifeExp + theme_wsj()
```
:::
## Other theme packages and making your own!
These are just a handful of all the ready-made theme options available out there. Some other packages that might be useful/fun to check out:
:::{.incremental}
* [`hrbrthemes`](https://hrbrmstr.github.io/hrbrthemes/index.html) - *provides typography-centric themes and theme components for ggplot2*
* [`urbnthemes`](https://urbaninstitute.github.io/urbnthemes/index.html) - *a set of tools for creating Urban Institute-themed plots and maps in R*
* [`bbplot`](https://github.com/bbc/bbplot/) - *provides helpful functions for creating and exporting graphics made in ggplot in the style used by the BBC News data team*
* [`ggpomological`](https://www.garrickadenbuie.com/project/ggpomological/) - *A ggplot2 theme based on the USDA Pomological Watercolor Collection*
:::
. . .
You are also able to design your own theme using the `theme()` function and really [getting into the weeds](https://ggplot2.tidyverse.org/reference/theme.html) with how to specify all the non-data ink in your plot. Once you come up with a theme you like, you can save it as an object (e.g., `my_theme`) and add it to any `ggplot` you create to maintain your own unique and consistent style.
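
## Sketch: Saving Your Own Theme

Here is a minimal sketch of what a saved theme object could look like. The particular settings in `my_theme` are arbitrary choices for illustration, and the plot is a simplified stand-in.

```r
library(ggplot2)
library(gapminder)

# Hypothetical personal theme: start from a built-in theme, then override pieces
my_theme <- theme_minimal(base_size = 14) +
  theme(legend.position = "bottom",
        panel.grid.minor = element_blank(),
        plot.title = element_text(face = "bold"))

# Add it to a plot like any other theme
p <- ggplot(gapminder, aes(x = year, y = lifeExp, color = continent)) +
  geom_point(alpha = 0.3) +
  labs(title = "Life Expectancy, 1952-2007") +
  my_theme
```

If you want every plot in a session to use it, `theme_set(my_theme)` makes it the default.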
# Summary{.section-title background-color="#99a486"}
## Summary
`ggplot2` can do a LOT! I don't expect you to memorize all these tools, and neither should you! With time and practice, you'll start to remember the key tools.
::::{.columns}
:::{.column width="55%"}
* When in doubt, Google it! (e.g., "*R ggplot 'whatever issue you need help with'*")
* There are lots of great resources out there:
+ The [ggplot2 reference page](https://ggplot2.tidyverse.org/reference/index.html)
+ The [Cookbook for R website](http://www.cookbook-r.com/)
+ The [RStudio ggplot Cheatsheets](https://rstudio.github.io/cheatsheets/data-visualization.pdf).
+ Kieran Healy's book [Data Visualization: A Practical Introduction](https://socviz.co/) (right) is targeted at social scientists without technical backgrounds and uses the same tools we'll be learning in this class.
:::
:::{.column width="45%"}
![](images/dataviz_kieranhealy.jpeg)
:::
::::
# Homework{.section-title background-color="#1e4655"}
## {data-menu-title="Homework 2" background-iframe="https://vsass.github.io/CSSS508/Homework/HW2/homework2.html" background-interactive=TRUE}