ggplot2
library(tidyverse)
Given your preparation for today’s class, now let’s practice generating layered graphics in R using data from Gapminder World, which compiles country-level data on quality-of-life measures.
gapminder
datasetIf you have not already installed the gapminder
package and you try to load it using the following code, you will get an error:
library(gapminder)
Error in library(gapminder) : there is no package called ‘gapminder’
If this happens, install the gapminder package by running install.packages("gapminder")
in your console.
Once you’ve done this, run the following code to load the gapminder dataset, the ggplot2
library, and a helper library for printing the contents of gapminder
:
library(gapminder)
library(ggplot2)
library(tibble)
glimpse(gapminder)
## Observations: 1,704
## Variables: 6
## $ country <fct> Afghanistan, Afghanistan, Afghanistan, Afghanistan, ...
## $ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia...
## $ year <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992...
## $ lifeExp <dbl> 28.8, 30.3, 32.0, 34.0, 36.1, 38.4, 39.9, 40.8, 41.7...
## $ pop <int> 8425333, 9240934, 10267083, 11537966, 13079460, 1488...
## $ gdpPercap <dbl> 779, 821, 853, 836, 740, 786, 978, 852, 649, 635, 72...
Run
?gapminder
in the console to open the help file for the data and definitions for each of the columns.
Using the grammar of graphics and your knowledge of the ggplot2
library, generate a series of graphs that explore the relationships between specific variables.
ggplot(data = gapminder, mapping = aes(x = lifeExp)) +
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Hint: think about how to split your plots to show different subsets of data.
ggplot(data = gapminder, mapping = aes(x = lifeExp)) +
geom_histogram() +
facet_wrap(~ continent)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(data = gapminder, mapping = aes(x = continent, y = lifeExp)) +
geom_boxplot()
ggplot(data = gapminder, mapping = aes(x = continent, y = lifeExp)) +
geom_violin()
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
geom_point()
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
geom_point() +
geom_smooth()
## `geom_smooth()` using method = 'gam'
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp, color = continent)) +
geom_point() +
geom_smooth()
## `geom_smooth()` using method = 'loess'
# using facet_wrap()
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp, color = continent)) +
geom_point() +
geom_smooth() +
facet_wrap(~ continent)
## `geom_smooth()` using method = 'loess'
# using facet_grid()
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp, color = continent)) +
geom_point() +
geom_smooth() +
facet_grid(. ~ continent)
## `geom_smooth()` using method = 'loess'
Why use facet_grid()
here instead of facet_wrap()
? Good question! Let’s reframe it and instead ask, what is the difference between facet_grid()
and facet_wrap()
?1
The answer below refers to the case when you have 2 arguments in facet_grid()
or facet_wrap()
. facet_grid(x ~ y)
will display \(x \times y\) plots even if some plots are empty. For example:
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
facet_grid(cyl ~ class)
There are 4 distinct cyl
and 7 distinct class
values. This plot displays \(4 \times 7 = 28\) plots, even if some are empty (because some classes do not have corresponding cylinder values, like rows with class = "midsize"
doesn’t have any corresponding cyl = 5
value ).
facet_wrap(x ~ y)
displays only the plots having actual values.
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
facet_wrap(~ cyl + class)
cyl
and class
. So for this exercise, I would use facet_wrap()
because we are faceting on a single variable. If we faceted on multiple variables, facet_grid()
may be more appropriate.
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp, label = country)) +
geom_smooth() +
geom_text()
## `geom_smooth()` using method = 'gam'
devtools::session_info()
## Session info -------------------------------------------------------------
## setting value
## version R version 3.4.3 (2017-11-30)
## system x86_64, darwin15.6.0
## ui X11
## language (EN)
## collate en_US.UTF-8
## tz America/Chicago
## date 2018-03-30
## Packages -----------------------------------------------------------------
## package * version date source
## assertthat 0.2.0 2017-04-11 CRAN (R 3.4.0)
## backports 1.1.2 2017-12-13 CRAN (R 3.4.3)
## base * 3.4.3 2017-12-07 local
## bindr 0.1.1 2018-03-13 CRAN (R 3.4.3)
## bindrcpp 0.2 2017-06-17 CRAN (R 3.4.0)
## broom 0.4.3 2017-11-20 CRAN (R 3.4.1)
## cellranger 1.1.0 2016-07-27 CRAN (R 3.4.0)
## cli 1.0.0 2017-11-05 CRAN (R 3.4.2)
## colorspace 1.3-2 2016-12-14 CRAN (R 3.4.0)
## compiler 3.4.3 2017-12-07 local
## crayon 1.3.4 2017-10-03 Github (gaborcsardi/crayon@b5221ab)
## datasets * 3.4.3 2017-12-07 local
## devtools 1.13.5 2018-02-18 CRAN (R 3.4.3)
## digest 0.6.15 2018-01-28 CRAN (R 3.4.3)
## dplyr * 0.7.4.9000 2017-10-03 Github (tidyverse/dplyr@1a0730a)
## evaluate 0.10.1 2017-06-24 CRAN (R 3.4.1)
## forcats * 0.3.0 2018-02-19 CRAN (R 3.4.3)
## foreign 0.8-69 2017-06-22 CRAN (R 3.4.3)
## ggplot2 * 2.2.1 2016-12-30 CRAN (R 3.4.0)
## glue 1.2.0 2017-10-29 CRAN (R 3.4.2)
## graphics * 3.4.3 2017-12-07 local
## grDevices * 3.4.3 2017-12-07 local
## grid 3.4.3 2017-12-07 local
## gtable 0.2.0 2016-02-26 CRAN (R 3.4.0)
## haven 1.1.1 2018-01-18 CRAN (R 3.4.3)
## hms 0.4.2 2018-03-10 CRAN (R 3.4.3)
## htmltools 0.3.6 2017-04-28 CRAN (R 3.4.0)
## httr 1.3.1 2017-08-20 CRAN (R 3.4.1)
## jsonlite 1.5 2017-06-01 CRAN (R 3.4.0)
## knitr 1.20 2018-02-20 CRAN (R 3.4.3)
## lattice 0.20-35 2017-03-25 CRAN (R 3.4.3)
## lazyeval 0.2.1 2017-10-29 CRAN (R 3.4.2)
## lubridate 1.7.3 2018-02-27 CRAN (R 3.4.3)
## magrittr 1.5 2014-11-22 CRAN (R 3.4.0)
## memoise 1.1.0 2017-04-21 CRAN (R 3.4.0)
## methods * 3.4.3 2017-12-07 local
## mnormt 1.5-5 2016-10-15 CRAN (R 3.4.0)
## modelr 0.1.1 2017-08-10 local
## munsell 0.4.3 2016-02-13 CRAN (R 3.4.0)
## nlme 3.1-131.1 2018-02-16 CRAN (R 3.4.3)
## parallel 3.4.3 2017-12-07 local
## pillar 1.2.1 2018-02-27 CRAN (R 3.4.3)
## pkgconfig 2.0.1 2017-03-21 CRAN (R 3.4.0)
## plyr 1.8.4 2016-06-08 CRAN (R 3.4.0)
## psych 1.7.8 2017-09-09 CRAN (R 3.4.1)
## purrr * 0.2.4 2017-10-18 CRAN (R 3.4.2)
## R6 2.2.2 2017-06-17 CRAN (R 3.4.0)
## Rcpp 0.12.15 2018-01-20 CRAN (R 3.4.3)
## readr * 1.1.1 2017-05-16 CRAN (R 3.4.0)
## readxl 1.0.0 2017-04-18 CRAN (R 3.4.0)
## reshape2 1.4.3 2017-12-11 CRAN (R 3.4.3)
## rlang 0.2.0 2018-02-20 cran (@0.2.0)
## rmarkdown 1.9 2018-03-01 CRAN (R 3.4.3)
## rprojroot 1.3-2 2018-01-03 CRAN (R 3.4.3)
## rstudioapi 0.7 2017-09-07 CRAN (R 3.4.1)
## rvest 0.3.2 2016-06-17 CRAN (R 3.4.0)
## scales 0.5.0 2017-08-24 cran (@0.5.0)
## stats * 3.4.3 2017-12-07 local
## stringi 1.1.7 2018-03-12 CRAN (R 3.4.3)
## stringr * 1.3.0 2018-02-19 CRAN (R 3.4.3)
## tibble * 1.4.2 2018-01-22 CRAN (R 3.4.3)
## tidyr * 0.8.0 2018-01-29 CRAN (R 3.4.3)
## tidyverse * 1.2.1 2017-11-14 CRAN (R 3.4.2)
## tools 3.4.3 2017-12-07 local
## utils * 3.4.3 2017-12-07 local
## withr 2.1.1 2017-12-19 CRAN (R 3.4.3)
## xml2 1.2.0 2018-01-24 CRAN (R 3.4.3)
## yaml 2.1.18 2018-03-08 CRAN (R 3.4.4)
Example drawn from this StackOverflow thread.↩
This work is licensed under the CC BY-NC 4.0 Creative Commons License.