---
title: "cm010 Exercises: Tibble Joins"
output:
  html_document:
    keep_md: true
    theme: paper
---

## Requirements

You will need Joey's `singer` R package for this exercise. To install it, you'll first need to install `devtools`. Running this code in your console should do the trick:

```
install.packages("devtools")
devtools::install_github("JoeyBernhardt/singer")
```

If you can't install the `singer` package, we've put the singer data on the `STAT545-UBC/Classroom` repo, and you can load it instead by running the following lines of code:

```
songs <- read_csv("https://raw.githubusercontent.com/STAT545-UBC/Classroom/master/data/singer/songs.csv")
locations <- read_csv("https://raw.githubusercontent.com/STAT545-UBC/Classroom/master/data/singer/loc.csv")
```

Load required packages:

```{r, echo = FALSE, warning = FALSE, message = FALSE}
library(tidyverse)
library(singer)
knitr::opts_chunk$set(fig.width=4, fig.height=3, warning = FALSE, fig.align = "center")
```

```{r allow-errors, echo = FALSE}
knitr::opts_chunk$set(error = TRUE)
```

## Exercise 1: `singer`

The package `singer` comes with two smallish data frames about songs. Let's take a look at them (after minor modifications by renaming and shuffling):

```{r}
(time <- as_tibble(songs) %>% 
   rename(song = title))
```

```{r}
(album <- as_tibble(locations) %>% 
   select(title, everything()) %>% 
   rename(album = release,
          song  = title))
```

1. We really care about the songs in `time`. But for which of those songs do we know the corresponding album?

```{r}
time %>% 
  FILL_THIS_IN(album, by = FILL_THIS_IN)
```

2. Go ahead and add the corresponding albums to the `time` tibble, being sure to preserve rows even if album info is not readily available.

```{r}
time %>% 
  FILL_THIS_IN(album, by = FILL_THIS_IN)
```

3. For which songs do we have "year", but no album info?

```{r}
time %>% 
  FILL_THIS_IN(album, by = "song")
```

4. Which artists are in `time`, but not in `album`?

```{r}
time %>% 
  anti_join(album, by = "FILL_THIS_IN")
```

5. You've come across these two tibbles, and just wish all the info was available in one tibble. What would you do?

```{r}
FILL_THIS_IN %>% 
  FILL_THIS_IN(FILL_THIS_IN, by = "song")
```

## Exercise 2: LOTR

Load in the three Lord of the Rings tibbles that we saw last time:

```{r}
fell <- read_csv("https://raw.githubusercontent.com/jennybc/lotr-tidy/master/data/The_Fellowship_Of_The_Ring.csv")
ttow <- read_csv("https://raw.githubusercontent.com/jennybc/lotr-tidy/master/data/The_Two_Towers.csv")
retk <- read_csv("https://raw.githubusercontent.com/jennybc/lotr-tidy/master/data/The_Return_Of_The_King.csv")
```

1. Combine these into a single tibble.

```{r}
FILL_THIS_IN(fell, FILL_THIS_IN)
```

2. Which races are present in "The Fellowship of the Ring" (`fell`), but not in either of the other two?

```{r}
fell %>% 
  anti_join(ttow, by = "Race") %>% 
  FILL_THIS_IN(FILL_THIS_IN, by = "Race")
```

## Exercise 3: Set Operations

Let's use three set functions: `intersect`, `union` and `setdiff`. We'll work with two toy tibbles named `y` and `z`, similar to those in the Data Wrangling Cheatsheet:

```{r}
(y <- tibble(x1 = LETTERS[1:3], x2 = 1:3))
```

```{r}
(z <- tibble(x1 = c("B", "C", "D"), x2 = 2:4))
```

1. Find the rows that appear in both `y` and `z`.

```{r}
FILL_THIS_IN(y, z)
```

2. You collected the data in `y` on Day 1, and `z` on Day 2. Make a data set to reflect that.

```{r}
FILL_THIS_IN(
  mutate(y, day = "Day 1"),
  mutate(z, day = "Day 2")
)
```

3. The rows contained in `z` are bad! Remove those rows from `y`.

```{r}
FILL_THIS_IN(FILL_THIS_IN, FILL_THIS_IN)
```
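
As a quick reference for the join verbs used in Exercises 1 and 2, here is a minimal sketch on two made-up tibbles. The tibbles `members` and `instrument` and their columns are hypothetical and are not part of the `singer` or LOTR data; the sketch only shows how filtering joins (`semi_join`, `anti_join`) differ from mutating joins (`left_join`, `full_join`).

```r
library(dplyr)
library(tibble)

# Hypothetical toy data, invented for illustration only
members <- tibble(
  name = c("Mick", "John", "Paul"),
  band = c("Stones", "Beatles", "Beatles")
)
instrument <- tibble(
  name  = c("John", "Paul", "Keith"),
  plays = c("guitar", "bass", "guitar")
)

# Filtering joins keep or drop rows of the left tibble; no columns are added
in_both   <- semi_join(members, instrument, by = "name")  # rows of members with a match
only_left <- anti_join(members, instrument, by = "name")  # rows of members without a match

# Mutating joins add the columns of the right tibble
keep_left <- left_join(members, instrument, by = "name")  # all rows of members, NA where unmatched
keep_all  <- full_join(members, instrument, by = "name")  # all rows of either tibble
```

Printing each result and comparing row and column counts is a reasonable way to decide which verb a given exercise question calls for.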