library(tidyverse)

Given your preparation for today’s class, now let’s practice generating layered graphics in R using data from Gapminder World, which compiles country-level data on quality-of-life measures.

Load the gapminder dataset

If you have not already installed the gapminder package and you try to load it using the following code, you will get an error:

library(gapminder)
Error in library(gapminder) : there is no package called ‘gapminder’

If this happens, install the gapminder package by running install.packages("gapminder") in your console.

Once you’ve done this, run the following code to load the gapminder dataset, the ggplot2 library, and a helper library for printing the contents of gapminder:

library(gapminder)
library(ggplot2)
library(tibble)

glimpse(gapminder)
## Observations: 1,704
## Variables: 6
## $ country   <fct> Afghanistan, Afghanistan, Afghanistan, Afghanistan, ...
## $ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia...
## $ year      <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992...
## $ lifeExp   <dbl> 28.8, 30.3, 32.0, 34.0, 36.1, 38.4, 39.9, 40.8, 41.7...
## $ pop       <int> 8425333, 9240934, 10267083, 11537966, 13079460, 1488...
## $ gdpPercap <dbl> 779, 821, 853, 836, 740, 786, 978, 852, 649, 635, 72...

Run ?gapminder in the console to open the help file for the data and definitions for each of the columns.

Using the grammar of graphics and your knowledge of the ggplot2 library, generate a series of graphs that explore the relationships between specific variables.

Exercises

Generate a histogram of life expectancy

Click for the solution

ggplot(data = gapminder, mapping = aes(x = lifeExp)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Generate separate histograms of life expectancy for each continent

Hint: think about how to split your plots to show different subsets of data.

Click for the solution

ggplot(data = gapminder, mapping = aes(x = lifeExp)) +
  geom_histogram() +
  facet_wrap(~ continent)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Compare the distribution of life expectancy, by continent by generating a boxplot

Click for the solution

ggplot(data = gapminder, mapping = aes(x = continent, y = lifeExp)) +
  geom_boxplot()

Redraw the plot, but this time use a violin plot

Click for the solution

ggplot(data = gapminder, mapping = aes(x = continent, y = lifeExp)) +
  geom_violin()

Generate a scatterplot of the relationship between per capita GDP and life expectancy

Click for the solution

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point()

Add a smoothing line to the scatterplot

Click for the solution

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'gam'

Identify whether this relationship differs by continent

Use the color aesthetic to identify differences

Click for the solution

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'loess'

Use faceting to identify differences

Click for the solution

# using facet_wrap()
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point() +
  geom_smooth() +
  facet_wrap(~ continent)
## `geom_smooth()` using method = 'loess'

# using facet_grid()
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point() +
  geom_smooth() +
  facet_grid(. ~ continent)
## `geom_smooth()` using method = 'loess'

Why use facet_grid() here instead of facet_wrap()? Good question! Let’s reframe it and instead ask, what is the difference between facet_grid() and facet_wrap()?1

The answer below refers to the case when you have 2 arguments in facet_grid() or facet_wrap(). facet_grid(x ~ y) will display \(x \times y\) plots even if some plots are empty. For example:

ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  facet_grid(cyl ~ class)

There are 4 distinct cyl and 7 distinct class values. This plot displays \(4 \times 7 = 28\) plots, even if some are empty (because some classes do not have corresponding cylinder values, like rows with class = "midsize" doesn’t have any corresponding cyl = 5 value ).

facet_wrap(x ~ y) displays only the plots having actual values.

ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  facet_wrap(~ cyl + class)

There are 19 plots displayed now, one for every combination of cyl and class. So for this exercise, I would use facet_wrap() because we are faceting on a single variable. If we faceted on multiple variables, facet_grid() may be more appropriate.

Bonus: Identify the outlying countries on the right-side of the graph by labeling each observation with the country name

Click for the solution

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp, label = country)) +
  geom_smooth() +
  geom_text()
## `geom_smooth()` using method = 'gam'

Session Info

devtools::session_info()
## Session info -------------------------------------------------------------
##  setting  value                       
##  version  R version 3.4.3 (2017-11-30)
##  system   x86_64, darwin15.6.0        
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  tz       America/Chicago             
##  date     2018-03-30
## Packages -----------------------------------------------------------------
##  package    * version    date       source                             
##  assertthat   0.2.0      2017-04-11 CRAN (R 3.4.0)                     
##  backports    1.1.2      2017-12-13 CRAN (R 3.4.3)                     
##  base       * 3.4.3      2017-12-07 local                              
##  bindr        0.1.1      2018-03-13 CRAN (R 3.4.3)                     
##  bindrcpp     0.2        2017-06-17 CRAN (R 3.4.0)                     
##  broom        0.4.3      2017-11-20 CRAN (R 3.4.1)                     
##  cellranger   1.1.0      2016-07-27 CRAN (R 3.4.0)                     
##  cli          1.0.0      2017-11-05 CRAN (R 3.4.2)                     
##  colorspace   1.3-2      2016-12-14 CRAN (R 3.4.0)                     
##  compiler     3.4.3      2017-12-07 local                              
##  crayon       1.3.4      2017-10-03 Github (gaborcsardi/crayon@b5221ab)
##  datasets   * 3.4.3      2017-12-07 local                              
##  devtools     1.13.5     2018-02-18 CRAN (R 3.4.3)                     
##  digest       0.6.15     2018-01-28 CRAN (R 3.4.3)                     
##  dplyr      * 0.7.4.9000 2017-10-03 Github (tidyverse/dplyr@1a0730a)   
##  evaluate     0.10.1     2017-06-24 CRAN (R 3.4.1)                     
##  forcats    * 0.3.0      2018-02-19 CRAN (R 3.4.3)                     
##  foreign      0.8-69     2017-06-22 CRAN (R 3.4.3)                     
##  ggplot2    * 2.2.1      2016-12-30 CRAN (R 3.4.0)                     
##  glue         1.2.0      2017-10-29 CRAN (R 3.4.2)                     
##  graphics   * 3.4.3      2017-12-07 local                              
##  grDevices  * 3.4.3      2017-12-07 local                              
##  grid         3.4.3      2017-12-07 local                              
##  gtable       0.2.0      2016-02-26 CRAN (R 3.4.0)                     
##  haven        1.1.1      2018-01-18 CRAN (R 3.4.3)                     
##  hms          0.4.2      2018-03-10 CRAN (R 3.4.3)                     
##  htmltools    0.3.6      2017-04-28 CRAN (R 3.4.0)                     
##  httr         1.3.1      2017-08-20 CRAN (R 3.4.1)                     
##  jsonlite     1.5        2017-06-01 CRAN (R 3.4.0)                     
##  knitr        1.20       2018-02-20 CRAN (R 3.4.3)                     
##  lattice      0.20-35    2017-03-25 CRAN (R 3.4.3)                     
##  lazyeval     0.2.1      2017-10-29 CRAN (R 3.4.2)                     
##  lubridate    1.7.3      2018-02-27 CRAN (R 3.4.3)                     
##  magrittr     1.5        2014-11-22 CRAN (R 3.4.0)                     
##  memoise      1.1.0      2017-04-21 CRAN (R 3.4.0)                     
##  methods    * 3.4.3      2017-12-07 local                              
##  mnormt       1.5-5      2016-10-15 CRAN (R 3.4.0)                     
##  modelr       0.1.1      2017-08-10 local                              
##  munsell      0.4.3      2016-02-13 CRAN (R 3.4.0)                     
##  nlme         3.1-131.1  2018-02-16 CRAN (R 3.4.3)                     
##  parallel     3.4.3      2017-12-07 local                              
##  pillar       1.2.1      2018-02-27 CRAN (R 3.4.3)                     
##  pkgconfig    2.0.1      2017-03-21 CRAN (R 3.4.0)                     
##  plyr         1.8.4      2016-06-08 CRAN (R 3.4.0)                     
##  psych        1.7.8      2017-09-09 CRAN (R 3.4.1)                     
##  purrr      * 0.2.4      2017-10-18 CRAN (R 3.4.2)                     
##  R6           2.2.2      2017-06-17 CRAN (R 3.4.0)                     
##  Rcpp         0.12.15    2018-01-20 CRAN (R 3.4.3)                     
##  readr      * 1.1.1      2017-05-16 CRAN (R 3.4.0)                     
##  readxl       1.0.0      2017-04-18 CRAN (R 3.4.0)                     
##  reshape2     1.4.3      2017-12-11 CRAN (R 3.4.3)                     
##  rlang        0.2.0      2018-02-20 cran (@0.2.0)                      
##  rmarkdown    1.9        2018-03-01 CRAN (R 3.4.3)                     
##  rprojroot    1.3-2      2018-01-03 CRAN (R 3.4.3)                     
##  rstudioapi   0.7        2017-09-07 CRAN (R 3.4.1)                     
##  rvest        0.3.2      2016-06-17 CRAN (R 3.4.0)                     
##  scales       0.5.0      2017-08-24 cran (@0.5.0)                      
##  stats      * 3.4.3      2017-12-07 local                              
##  stringi      1.1.7      2018-03-12 CRAN (R 3.4.3)                     
##  stringr    * 1.3.0      2018-02-19 CRAN (R 3.4.3)                     
##  tibble     * 1.4.2      2018-01-22 CRAN (R 3.4.3)                     
##  tidyr      * 0.8.0      2018-01-29 CRAN (R 3.4.3)                     
##  tidyverse  * 1.2.1      2017-11-14 CRAN (R 3.4.2)                     
##  tools        3.4.3      2017-12-07 local                              
##  utils      * 3.4.3      2017-12-07 local                              
##  withr        2.1.1      2017-12-19 CRAN (R 3.4.3)                     
##  xml2         1.2.0      2018-01-24 CRAN (R 3.4.3)                     
##  yaml         2.1.18     2018-03-08 CRAN (R 3.4.4)

  1. Example drawn from this StackOverflow thread.

This work is licensed under the CC BY-NC 4.0 Creative Commons License.