{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "lines_to_next_cell": 0,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "e1e0ee9ade87bd4efeb54e592fda2042",
     "grade": false,
     "grade_id": "cell-a52fdbcf333bb582",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "# Worksheet A-5: Working With Factors & Tibble Joins\n",
    "\n",
    "## Getting Started\n",
    "\n",
    "Load the requirements for this worksheet:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "e999f78c3475485bda71c1cd4a38219c",
     "grade": false,
     "grade_id": "cell-c48a21406e8bb917",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "suppressPackageStartupMessages(library(tidyverse))\n",
    "suppressPackageStartupMessages(library(tsibble))\n",
    "suppressPackageStartupMessages(library(gapminder))\n",
    "suppressPackageStartupMessages(library(testthat))\n",
    "suppressPackageStartupMessages(library(digest))\n",
    "suppressMessages({\n",
    "  time <- read_csv(\"https://raw.githubusercontent.com/STAT545-UBC/Classroom/master/data/singer/songs.csv\") %>% \n",
    "    rename(song = title)\n",
    "  album <- read_csv(\"https://raw.githubusercontent.com/STAT545-UBC/Classroom/master/data/singer/loc.csv\") %>% \n",
    "    select(title, everything()) %>% \n",
    "    rename(song = title, album = release)\n",
    "})\n",
    "suppressMessages({\n",
    "  fell <- read_csv(\"https://raw.githubusercontent.com/jennybc/lotr-tidy/master/data/The_Fellowship_Of_The_Ring.csv\")\n",
    "  ttow <- read_csv(\"https://raw.githubusercontent.com/jennybc/lotr-tidy/master/data/The_Two_Towers.csv\")\n",
    "  retk <- read_csv(\"https://raw.githubusercontent.com/jennybc/lotr-tidy/master/data/The_Return_Of_The_King.csv\")\n",
    "})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "lines_to_next_cell": 0,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "1a9cd5eb0778d5d434c5cefe6e3f4c55",
     "grade": false,
     "grade_id": "cell-c67ecef016e98418",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "The following code chunk has been unlocked, to give you the flexibility to start this document with some of your own code. Remember, it's bad manners to keep a call to `install.packages()` in your source code, so don't forget to delete these lines if you ever need to run them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# An unlocked code cell."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "1428d8bd85deb83393ba2b31daf743ae",
     "grade": false,
     "grade_id": "cell-dbacfd6c24590e38",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "# Part 0: Dates and Tsibble \n",
    "\n",
    "We'll convert dates into a year-month object with the tsibble package (loaded at the start of the worksheet).\n",
    "\n",
    "## Question 0.1\n",
    "\n",
    "Consider the built-in presidential dataset that looks at the start and ending terms of US presidents:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "9efcf5c512faf2887050ff3eeae06dc9",
     "grade": false,
     "grade_id": "cell-a37c8810b6e3de5d",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "head(presidential)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "1b27f3c8b69bb0af1782fe185acd6571",
     "grade": false,
     "grade_id": "cell-3ec12f8dddc1b136",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "Use `tsibble::yearmonth()` to convert the existing start and end column dates into only year and month. Name this tibble `president_ym`.\n",
    "\n",
    "```\n",
    "president_ym <- presidential %>%\n",
    "   mutate(start = FILL_THIS_IN, \n",
    "          end = FILL_THIS_IN)\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "9d733bf26d4c42d2dbb724c591c5d93c",
     "grade": false,
     "grade_id": "cell-1bb9d4544eb25006",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# your code here\n",
    "fail() # No Answer - remove if you provide an answer\n",
    "head(president_ym)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "b589c7aaea69c8329428287aabf4751e",
     "grade": true,
     "grade_id": "cell-7d4a3a9d3ba31dd4",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "test_that(\"Question 0.1\", expect_known_hash(president_ym[1,], \"8b9ac24bc52a692ab7d1bd83f9e0a19c\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "98b4a1b5a49a2cbc8f7252206d92ce39",
     "grade": false,
     "grade_id": "cell-ff4d90bac29bce18",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "# Part 1: Creating Factors\n",
    "\n",
    "For the best experience working with factors in R, we will use the forcats package, which is part of the tidyverse metapackage."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "5960c95ea2598691226427a13ba11124",
     "grade": false,
     "grade_id": "cell-92d93e5de5944b49",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "## Question 1.1\n",
    "\n",
    "Using the gapminder dataset from the gapminder package, create a new data set for the year 1997, adding a new column `life_level` containing 5 new levels according to the following table.\n",
    "\n",
    "| Criteria          |`life_level`   | \n",
    "|-------------------|-------------|\n",
    "| less than 23      | very low    |\n",
    "| between 23 and 48 | low         |\n",
    "| between 48 and 59 | moderate    |\n",
    "| between 59 and 70 | high        |\n",
    "| more than 70      | very high   |\n",
    "\n",
    "Store this new data frame in variable `gapminder_1997`.\n",
    "\n",
    "**Hint**: We are using `case_when()`, a tidier way to vectorise multiple `if_else()` statements.\n",
    "You can read more about this function [in the tidyverse reference](https://dplyr.tidyverse.org/reference/case_when.html).\n",
    "\n",
    "```\n",
    "gapminder_1997 <- gapminder %>% \n",
    "   FILL_THIS_IN(year == FILL_THIS_IN) %>% \n",
    "   FILL_THIS_IN(life_level = case_when(FILL_THIS_IN < FILL_THIS_IN ~ \"very low\",\n",
    "                                 FILL_THIS_IN < FILL_THIS_IN ~ \"low\",\n",
    "                                 FILL_THIS_IN < FILL_THIS_IN ~ \"moderate\",\n",
    "                                 FILL_THIS_IN < FILL_THIS_IN ~ \"high\",\n",
    "                                 TRUE ~ \"very high\"))\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "lines_to_next_cell": 0,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "ef0f70c7ff836ae444c855a3f6806b99",
     "grade": false,
     "grade_id": "cell-dc036d146cef2025",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# your code here\n",
    "fail() # No Answer - remove if you provide an answer\n",
    "head(gapminder_1997)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "a3d45fc2cc08a9548cd6985ef734e9dc",
     "grade": true,
     "grade_id": "cell-f8b2ff75e5097542",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "test_that(\"Question 1.1\", expect_known_hash(table(gapminder_1997$life_level), \"3d2e691667d4706e66ce5784bb1d7042\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "27362b06ff6270477cb4b383f7c307d0",
     "grade": false,
     "grade_id": "cell-5c2f86da93e2a963",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "FYI: We can now plot boxplots for the GDP per capita per level of life expectancy.\n",
    "Run the following code to see the boxplots."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "483f75964e96e4e9b9c37a729a55195f",
     "grade": false,
     "grade_id": "cell-bb53dd9bcb9f2f82",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "ggplot(gapminder_1997) + geom_boxplot(aes(x = life_level, y = gdpPercap)) +\n",
    "  labs(y = \"GDP per capita ($)\", x = \"Life expectancy level (years)\") +\n",
    "  ggtitle(\"GDP per capita per Level of Life Expectancy\") +\n",
    "  theme_bw() "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "4064ee6fb1f5dcdc5934c65245a694c3",
     "grade": false,
     "grade_id": "cell-95e05c5acbdd7a28",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "## Question 1.2\n",
    "\n",
    "Notice a few oddities in the above plot:\n",
    "\n",
    "- It seems that none of the countries had a \"very low\" life-expectancy in 1997. \n",
    "- However, since it was an option in our analysis it should be included in our plot. Right?\n",
    "- Notice also how levels on x-axis are placed in the \"wrong\" order. (in alphabetical order)\n",
    "\n",
    "You can correct these issues by explicitly making `life_level` a factor and setting the levels parameter.\n",
    "Create a new data frame as in Question 1.1, but make the column `life_level` a factor with levels ordered from *very low* to *very high*.\n",
    "Store this new data frame in variable `gapminder_1997_fct`.\n",
    "\n",
    "```\n",
    "gapminder_1997_fct <- gapminder %>% \n",
    "   FILL_THIS_IN(year == 1997) %>% \n",
    "   FILL_THIS_IN(life_level = FILL_THIS_IN(case_when(FILL_THIS_IN < FILL_THIS_IN ~ \"very low\",\n",
    "                                        FILL_THIS_IN < FILL_THIS_IN ~ \"low\",\n",
    "                                        FILL_THIS_IN < FILL_THIS_IN ~ \"moderate\",\n",
    "                                        FILL_THIS_IN < FILL_THIS_IN ~ \"high\",\n",
    "                                        TRUE ~ \"very high\"),\n",
    "                              levels = c('FILL_THIS_IN', 'FILL_THIS_IN', 'FILL_THIS_IN', 'FILL_THIS_IN', 'FILL_THIS_IN')))\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "lines_to_next_cell": 0,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "a6b7375914651c80d4f8e35697d45ccc",
     "grade": false,
     "grade_id": "cell-0f65d0f778ec6287",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# your code here\n",
    "fail() # No Answer - remove if you provide an answer\n",
    "head(gapminder_1997_fct)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "3d73158d441e43c924e1d65686888a12",
     "grade": true,
     "grade_id": "cell-8448390b94500381",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "test_that(\"Question 1.2\", expect_known_hash(table(gapminder_1997_fct$life_level), \"8e62f09fbd0756d7e69d1bc95715d333\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "79159e2f1e076a32d34245013baeb8c8",
     "grade": false,
     "grade_id": "cell-ab66c1758a1353d7",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "Run the following code to see the boxplots from the new data frame with life expectancy level as factor."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "lines_to_next_cell": 2,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "49428ed049346010159818eab9a71a9d",
     "grade": false,
     "grade_id": "cell-a660106ec5a2bf52",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "ggplot(gapminder_1997_fct) + geom_boxplot(aes(x = life_level, y = gdpPercap)) +\n",
    "  labs(y = \"GDP per capita ($)\", x= \"Life expectancy level (years)\") +\n",
    "  scale_x_discrete(drop = FALSE) + # Don't drop the very low factor\n",
    "  ggtitle(\"GDP per capita per level of Life Expectancy\") +\n",
    "  theme_bw() "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "479aa33b770ae072df04c6eb749d2f9d",
     "grade": false,
     "grade_id": "cell-d2c47ea01eefe9d8",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "# Part 2: Inspecting Factors\n",
    "\n",
    "In Part 1, you created your own factors, so now let's explore what categorical variables are in the `gapminder` dataset.\n",
    "\n",
    "## Question 2.1\n",
    "\n",
    "What levels does the column `continent` have?\n",
    "Assign the levels to variable `continent_levels`, using the `levels()` function. (To mix things up a bit, the template code we're giving you extracts a column using the Base R way of extracting columns -- with a dollar sign.)\n",
    "\n",
    "```\n",
    "continent_levels <- FILL_THIS_IN(gapminder$FILL_THIS_IN)\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "lines_to_next_cell": 0,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "76a3507ffca700c224121447d12fb9f0",
     "grade": false,
     "grade_id": "cell-537277f01997b17c",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# your code here\n",
    "fail() # No Answer - remove if you provide an answer\n",
    "print(continent_levels)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "7a8f3dcc16c9df12faae8dd82bbbc920",
     "grade": true,
     "grade_id": "cell-5aacbda4d51ef339",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "test_that(\"Question 2.1\", expect_known_hash(continent_levels, \"6926255b7f073fb8e7d89773802102a6\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "811d56a160efacca9d5b30dc131e88b7",
     "grade": false,
     "grade_id": "cell-49df2e07dc37ef51",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "## Question 2.2\n",
    "\n",
    "How many levels does the column `country` have?\n",
    "Assign the number of levels to variable `gap_nr_countries`. Hint: there's a function called `nlevels()`. \n",
    "\n",
    "```\n",
    "gap_nr_countries <- FILL_THIS_IN(gapminder$FILL_THIS_IN)\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "lines_to_next_cell": 0,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "993a22e5837b922d522afb6b1edabffa",
     "grade": false,
     "grade_id": "cell-b79dc64b081f578c",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# your code here\n",
    "fail() # No Answer - remove if you provide an answer\n",
    "print(gap_nr_countries)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "90b93fb9a1650573c403ca53b361b28e",
     "grade": true,
     "grade_id": "cell-e34eff454c51eedd",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "test_that(\"Question 2.2\", expect_known_hash(as.integer(gap_nr_countries), \"3b6d002135d8d45a3c5f4a9fb857c323\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "fca7a6d0d7888f86dcadcf286cdaf56d",
     "grade": false,
     "grade_id": "cell-6fe136fd3a220c57",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "## Question 2.3\n",
    "\n",
    "Consider we are only interested in the following 5 countries: Egypt, Haiti, Romania, Thailand, and Venezuela.\n",
    "Create a new data frame with only these 5 countries and store it in variable `gap_5`. _Hint_: nothing new here -- use your dplyr knowledge!\n",
    "\n",
    "```\n",
    "gap_5 <- gapminder %>%\n",
    "   FILL_THIS_IN(FILL_THIS_IN %in% c(\"FILL_THIS_IN\", \"FILL_THIS_IN\", \"FILL_THIS_IN\", \"FILL_THIS_IN\", \"FILL_THIS_IN\"))\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "lines_to_next_cell": 0,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "bc2f4d1b202663df76a5a79622e9c121",
     "grade": false,
     "grade_id": "cell-46503448ea2a9cb9",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# your code here\n",
    "fail() # No Answer - remove if you provide an answer\n",
    "head(gap_5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "lines_to_next_cell": 2,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "b64d97ece2323c943482bdcf2c5f1695",
     "grade": true,
     "grade_id": "cell-caed098411987014",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "test_that(\"Question 2.3\", {\n",
    "  expect_known_hash(dim(gap_5), \"6c0f8c2a8d488051f33fc89b2c327dcd\")\n",
    "  expect_known_hash(table(gap_5$country), \"05b8ca3033e94f96b9ec5422a69c1207\")\n",
    "})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "7ed381661b1550f7be814d56cc95de47",
     "grade": false,
     "grade_id": "cell-4ad94aa7ee66ed58",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "## Question 2.4\n",
    "\n",
    "However, subsetting the data set does not affect the levels of the factors.\n",
    "The column `country` in tibble `gap_5` still has the same number of levels as in the original data frame.\n",
    "\n",
    "Your task: create a new tibble from `gap_5`, where all unused levels from column `country` are dropped. _Hint_: use the `droplevels()` function. Store new new tibble in variable `gap_5_dropped`.\n",
    "\n",
    "By way of demonstration, check the number of levels in the \"country\" column before and after the change -- we've included the code for this for you.\n",
    "\n",
    "```\n",
    "nlevels(gap_5$country)\n",
    "gap_5_dropped <- FILL_THIS_IN(FILL_THIS_IN)\n",
    "nlevels(gap_5_dropped$country)\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "lines_to_next_cell": 0,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "1ca53dfb9b1eb9799849178862b07658",
     "grade": false,
     "grade_id": "cell-7e52beeb587753f4",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# your code here\n",
    "fail() # No Answer - remove if you provide an answer\n",
    "head(gap_5_dropped)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "1d9ba781006b505bfe882f4271047b99",
     "grade": true,
     "grade_id": "cell-806b19b02e2333e2",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "test_that(\"Question 2.4\", expect_known_hash(sort(levels(gap_5_dropped$country)), \"ac97b9af845a59395697b028c5121503\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "31641405c70f672717e4e25f0b294e00",
     "grade": false,
     "grade_id": "cell-1d6499c6b3e1bea1",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "## Question 2.5\n",
    "\n",
    "The factor levels of column `continent` in data frame `gapminder` are ordered alphabetically.\n",
    "Create a new data frame, with the levels of column `continent` in *increasing* order according to their frequency (i.e., the number of rows for each continent).\n",
    "Store the new data frame in variable `gap_continent_freq`. *Hint*: Use `fct_infreq()` and `fct_rev()`.\n",
    "\n",
    "```\n",
    "gap_continent_freq <- gapminder %>%\n",
    "   mutate(continent = FILL_THIS_IN(FILL_THIS_IN(continent)))\n",
    "```\n",
    "\n",
    "**Hint**: The first `FILL_THIS_IN` corresponds to a `fct_*` function that reverses the levels of the factors. The second `FILL_THIS_IN` correspond to a `fct_*` function that orders the levels by *decreasing* frequency."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "lines_to_next_cell": 0,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "60c7da198c7e04c9b493fa3bb0cbefd3",
     "grade": false,
     "grade_id": "cell-041e1b9fdf167cc9",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# your code here\n",
    "fail() # No Answer - remove if you provide an answer\n",
    "head(gap_continent_freq)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "c3828180276afc0102a4978f9731e8a0",
     "grade": true,
     "grade_id": "cell-b38804e6a06d9de3",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "test_that(\"Question 2.5\", expect_known_hash(table(gap_continent_freq$continent), \"0bb23ea87ce71deb5452eaae8cdbf7cf\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "20b5f11b6f24b9fed13a6135737a95b0",
     "grade": false,
     "grade_id": "cell-ff00d58b5fb34ad7",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "FYI: You can't \"see\" any difference in the tibble, but there are _attributes_ behind the hood keeping track of the order of the \"continent\" entries. You _can_ see the difference, however, in a plot, as below. Notice how the x-axis is no longer ordered alphabetically."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "c9da62f05387398682067d5af42f4019",
     "grade": false,
     "grade_id": "cell-1317f4d18c821807",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "ggplot(gap_continent_freq, aes(continent)) + geom_bar()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "9ed29c3ed2da088569d8acbec4466bf3",
     "grade": false,
     "grade_id": "cell-b8379204e1f83944",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "## Question 2.6\n",
    "\n",
    "Again based on the `gapminder` data set, create another data frame, with the levels of column `continent` in *increasing* order of their average life expectancy (from column `lifeExp`).\n",
    "Store the new data frame in variable `gap_continent_life`. _Hint_: use `fct_reorder()`.\n",
    "\n",
    "```\n",
    "gap_continent_life <- gapminder %>%\n",
    "   mutate(continent = FILL_THIS_IN(FILL_THIS_IN, FILL_THIS_IN, FILL_THIS_IN))\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "lines_to_next_cell": 0,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "7cb6057e8abd5a4083f9c08f43411355",
     "grade": false,
     "grade_id": "cell-d9568dc0d8c17add",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# your code here\n",
    "fail() # No Answer - remove if you provide an answer\n",
    "head(gap_continent_life)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "b4c54847aff6c135e62d54b8a27d82ed",
     "grade": true,
     "grade_id": "cell-7afaeb0beeff31e0",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "test_that(\"Question 2.6\", expect_known_hash(table(gap_continent_life$continent), \"7688676a0807063f1bfa5b4cc721c2d9\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "9a928e67b020017df4a9d01eaa41596e",
     "grade": false,
     "grade_id": "cell-c3992342f9ce1001",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "Again, you can't \"see\" any difference in the tibble. But here's a plot that makes the difference clearer. Notice the ordering of the x-axis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "6d289297f5bed91afe065f8082874b65",
     "grade": false,
     "grade_id": "cell-50e1fa7db691ef53",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "ggplot(gap_continent_life, aes(continent, lifeExp)) + geom_boxplot()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "aa5ff73ce33259884777c352474405b2",
     "grade": false,
     "grade_id": "cell-99aa45fbba8d8199",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "## Question 2.7\n",
    "\n",
    "Consider now you want to make comparisons between countries, relative to Canada.\n",
    "Create a new data frame, with the levels of column `country` rearranged to have Canada as the first one.\n",
    "Store the new data frame in variable `gap_canada_base`.\n",
    "\n",
    "```\n",
    "(gap_canada_base <- gapminder %>%\n",
    "   mutate(country = FILL_THIS_IN(FILL_THIS_IN, \"FILL_THIS_IN\")))\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "lines_to_next_cell": 0,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "8cd80b87102d4581e588fd2a73ced4f5",
     "grade": false,
     "grade_id": "cell-636b191a444f10bf",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# your code here\n",
    "fail() # No Answer - remove if you provide an answer\n",
    "head(gap_canada_base)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "9e04c74ef9f16087badabdd4d35b0a5e",
     "grade": true,
     "grade_id": "cell-de2504ab380c7da4",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "test_that(\"Question 2.7\", expect_known_hash(table(gap_canada_base$country), \"72d75ce05a16d8965f7bd0ae3fb986d3\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "9d58e265ab80ca22275c8b12936eaebd",
     "grade": false,
     "grade_id": "cell-ab318aa41d0d5f13",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "Take a look at the levels of the \"country\" factor, and you'll now see Canada first:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "efc00230990576f114feeda6ffd242c5",
     "grade": false,
     "grade_id": "cell-81562f5b59805bcf",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "gap_canada_base %>% \n",
    "   pull(country) %>% \n",
    "   levels()"
   ]
  },
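  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Reordering levels changes only the order in which R lists and plots the categories, not the values stored in the column. A minimal sketch, assuming forcats' `fct_relevel()` and a made-up toy factor `sizes` (hypothetical, not part of the worksheet data):\n",
    "\n",
    "```r\n",
    "library(forcats)\n",
    "\n",
    "# Hypothetical toy factor; levels default to alphabetical order\n",
    "sizes <- factor(c(\"small\", \"large\", \"medium\", \"small\"))\n",
    "levels(sizes)\n",
    "\n",
    "# Move \"small\" to the front; the other levels keep their relative order\n",
    "sizes_small_first <- fct_relevel(sizes, \"small\")\n",
    "levels(sizes_small_first)\n",
    "\n",
    "# The underlying values are unchanged; only the level order differs\n",
    "identical(as.character(sizes), as.character(sizes_small_first))\n",
    "```\n",
    "\n",
    "The first level matters mostly downstream: it is typically drawn first in plots and treated as the baseline category by modelling functions such as `lm()`."
   ]
  },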
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "c9df0f3698f933b15811ea39f8968d4e",
     "grade": false,
     "grade_id": "cell-4474329a11f182dd",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "## Question 2.8\n",
    "\n",
    "Sometimes you want to manually change a few factor levels, e.g., if the level is too wide for plotting.\n",
    "Based on the `gapminder` data set, create a new data frame with the Central African Republic renamed to *Central African Rep.* and Bosnia and Herzegovina renamed to *Bosnia & Herzegovina*.\n",
    "Store the new data frame in variable `gap_car`. _Hint_: use `fct_recode()`.\n",
    "\n",
    "```\n",
    "gap_car <- gapminder %>%\n",
    "   mutate(country = FILL_THIS_IN(FILL_THIS_IN, \"Central African Rep.\" = \"FILL_THIS_IN\",\n",
    "                               \"Bosnia & Herzegovina\" = \"FILL_THIS_IN\"))\n",
    "```"
   ]
  },
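  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The argument order of `fct_recode()` is easy to get backwards: each pair is written `\"new name\" = \"old name\"`. A small sketch on a hypothetical toy factor `fruit` (made up for illustration, not part of the worksheet data):\n",
    "\n",
    "```r\n",
    "library(forcats)\n",
    "\n",
    "# Hypothetical toy factor with three levels\n",
    "fruit <- factor(c(\"apple\", \"banana\", \"cherry\", \"banana\"))\n",
    "\n",
    "# Rename one level: new name on the left, existing name on the right\n",
    "fruit_short <- fct_recode(fruit, \"ban.\" = \"banana\")\n",
    "\n",
    "levels(fruit_short)  # levels not mentioned are left as they were\n",
    "table(fruit_short)   # counts carry over to the renamed level\n",
    "```\n",
    "\n",
    "Several levels can be renamed in one call by adding further `\"new\" = \"old\"` pairs."
   ]
  },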
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "lines_to_next_cell": 0,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "ce084968f1e3785c1637a76d3df88371",
     "grade": false,
     "grade_id": "cell-57746edd3f5caa7d",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# your code here\n",
    "fail() # No Answer - remove if you provide an answer\n",
    "head(gap_car)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "lines_to_next_cell": 2,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "8238a1b5e4f9338ffcd0652717b9b4cf",
     "grade": true,
     "grade_id": "cell-339f3fc46993a445",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "test_that(\"Question 2.8\", expect_known_hash(table(gap_car$country), \"9cc15f09cb70b5596bbf3feaa73ee471\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "lines_to_next_cell": 0,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "debd146fdf7bd7f964caf53e5664791c",
     "grade": false,
     "grade_id": "cell-ebce2cf9fcfbd426",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "# Part 3: Tibble Joins\n",
    "\n",
    "At the start of this worksheet, we loaded a couple datasets from the [singer](https://github.com/JoeyBernhardt/singer) package, and called them `time` and `album`. These two data sets contain information about a few popular songs and albums.\n",
    "\n",
    "We'll practice various joins using these two datasets. You'll need to find out which join is appropriate for each case!\n",
    "\n",
    "Run the following R codes to look at the two data sets:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "lines_to_next_cell": 0,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "75a59fdaeb80fc4672eb7cfe95760f4a",
     "grade": false,
     "grade_id": "cell-e42385546e19be2e",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "head(time)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "4f30e20a803c7da4a45fa2ce0b832722",
     "grade": false,
     "grade_id": "cell-43a2ee99557fd52e",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "head(album)"
   ]
  },
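  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before picking a join for each question, it may help to recall how the dplyr join verbs differ. The sketch below uses two hypothetical toy tibbles, `members` and `instruments` (made up for illustration, not part of the worksheet data), joined on a single column `name`:\n",
    "\n",
    "```r\n",
    "library(dplyr)\n",
    "library(tibble)\n",
    "\n",
    "# Hypothetical toy tibbles sharing the key column `name`\n",
    "members <- tibble(name = c(\"Mick\", \"John\", \"Paul\"),\n",
    "                  band = c(\"Stones\", \"Beatles\", \"Beatles\"))\n",
    "instruments <- tibble(name = c(\"John\", \"Paul\", \"Keith\"),\n",
    "                      plays = c(\"guitar\", \"bass\", \"guitar\"))\n",
    "\n",
    "# Mutating joins: add columns from the second tibble\n",
    "j_inner <- inner_join(members, instruments, by = \"name\")  # only rows matched in both\n",
    "j_left  <- left_join(members, instruments, by = \"name\")   # every row of members, NA where unmatched\n",
    "j_full  <- full_join(members, instruments, by = \"name\")   # every row of either tibble\n",
    "\n",
    "# Filtering joins: keep or drop rows of the first tibble, adding no columns\n",
    "j_semi <- semi_join(members, instruments, by = \"name\")    # rows of members that have a match\n",
    "j_anti <- anti_join(members, instruments, by = \"name\")    # rows of members that have no match\n",
    "\n",
    "# Compare the row counts of the five results\n",
    "sapply(list(inner = j_inner, left = j_left, full = j_full,\n",
    "            semi = j_semi, anti = j_anti), nrow)\n",
    "```\n",
    "\n",
    "When a single column does not identify a row uniquely, `by` can take a character vector of several column names, and rows then match only if all of those columns agree."
   ]
  },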
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "35472bba0705175e11fe420529003810",
     "grade": false,
     "grade_id": "cell-81082dfacbaa989f",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "## Question 3.1\n",
    "\n",
    "Create a new data frame containing all songs from `time` that have a corresponding album in the `album` dataset, while also adding the album information. Store the joined data set in variable `songs_with_album`.\n",
    "\n",
    "```\n",
    "songs_with_album <- time %>% \n",
    "  FILL_THIS_IN(FILL_THIS_IN, by = c(\"FILL_THIS_IN\", \"FILL_THIS_IN\"))\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "lines_to_next_cell": 0,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "5cc779ee29fc8f53bff600d1517061ca",
     "grade": false,
     "grade_id": "cell-7d3c34b0dd2dff5c",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# your code here\n",
    "fail() # No Answer - remove if you provide an answer\n",
    "head(songs_with_album)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "3fe9eb3b0a9cb9e972977e5976a1a7a8",
     "grade": true,
     "grade_id": "cell-e851d23c3d11b3bd",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "test_that(\"Question 3.1\", {\n",
    "  expect_known_hash(sort(songs_with_album$song), \"36c9ffd98cbd94a3abaa4f89cbc00db2\")\n",
    "  expect_known_hash(table(songs_with_album$artist_name), \"51f7daeec65e839e5ae6c84ac5a1cb70\")\n",
    "})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "0b3ef8ada4ced8c99e905200c5f4e101",
     "grade": false,
     "grade_id": "cell-bfbeb7e5c7c5da68",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "## Question 3.2\n",
    "\n",
    "Go ahead and add the corresponding albums to the `time` tibble, being sure to preserve rows even if album info is not readily available.\n",
    "Store the joined data set in variable `all_songs`.\n",
    "\n",
    "```\n",
    "all_songs <- time %>% \n",
    "  FILL_THIS_IN(FILL_THIS_IN, by = c(\"FILL_THIS_IN\", \"FILL_THIS_IN\"))\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "lines_to_next_cell": 0,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "b69ab410a37f5b67909c472e8b372518",
     "grade": false,
     "grade_id": "cell-05f5f08439831d83",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# your code here\n",
    "fail() # No Answer - remove if you provide an answer\n",
    "head(all_songs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "a0c1ee61efe4d5299d9a3dfd6164940b",
     "grade": true,
     "grade_id": "cell-3add15a64f96339a",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "test_that(\"Question 3.2\", {\n",
    "  expect_known_hash(sort(all_songs$song), \"fc3a73c2979ed9bbd751d99325baa3e5\")\n",
    "  expect_known_hash(all_songs$album[order(all_songs$song)], \"727d92da1c4266521bef6a4fbd0f5e28\")\n",
    "})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "4b4e8a80b05c8cdf79e203f51f10d3d7",
     "grade": false,
     "grade_id": "cell-eff281d8aeda3161",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "## Question 3.3: Joining Rows by Columns\n",
    "\n",
    "Create a new tibble with songs from `time` for which there is no album info.\n",
    "Store the new data set in variable `songs_without_album`.\n",
    "\n",
    "```\n",
    "songs_without_album <- time %>% \n",
    "  FILL_THIS_IN(FILL_THIS_IN, by = c(\"FILL_THIS_IN\", \"FILL_THIS_IN\"))\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "lines_to_next_cell": 0,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "93ae01956cda7b91c33d53d66815ef03",
     "grade": false,
     "grade_id": "cell-bd6ca35fe501ec50",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# your code here\n",
    "fail() # No Answer - remove if you provide an answer\n",
    "head(songs_without_album)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "9d489d87cc1e4da2ab854637d4e17f2b",
     "grade": true,
     "grade_id": "cell-0c5e0fec11dfa949",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "test_that(\"Question 3.3\", expect_known_hash(sort(songs_without_album$song), \"3e6a210ad915fb07eb7e894a7ca0e856\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 3.4\n",
    "\n",
    "Create a new tibble with all songs from artists for whom there is no album information.\n",
    "Store the new data set in variable `songs_artists_no_album`.\n",
    "\n",
    "```\n",
    "songs_artists_no_album <- time %>% \n",
    "  FILL_THIS_IN(FILL_THIS_IN, by = \"FILL_THIS_IN\")\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "lines_to_next_cell": 0,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "6341db6003ef5b98261f5bf21bcaa4ef",
     "grade": false,
     "grade_id": "cell-129b24ab36ddde03",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# your code here\n",
    "fail() # No Answer - remove if you provide an answer\n",
    "head(songs_artists_no_album)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "18489049f84f807a1e708ae07a1f0e2c",
     "grade": true,
     "grade_id": "cell-3452c71d7eb8391d",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "test_that(\"Question 3.4\", expect_known_hash(table(songs_artists_no_album$artist_name), \n",
    "                                            \"244510c51477c31e6e795cbc0ca0b0d7\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "96b61edf9845504319d9cdb65bb84990",
     "grade": false,
     "grade_id": "cell-7fec11ba9e292ff8",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "## Question 3.5\n",
    "Create a new tibble with all the information from both tibbles, regardless of no corresponding information being present in the other tibble.\n",
    "Store the new data set in variable `all_songs_and_albums`.\n",
    "\n",
    "```\n",
    "all_songs_and_albums <- time %>% \n",
    "  FILL_THIS_IN(FILL_THIS_IN, by = c(\"FILL_THIS_IN\", \"FILL_THIS_IN\"))\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "lines_to_next_cell": 0,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "c3d7b242129fe696c878dff7ca1d3387",
     "grade": false,
     "grade_id": "cell-17f8124265baf8e0",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# your code here\n",
    "fail() # No Answer - remove if you provide an answer\n",
    "head(all_songs_and_albums)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "3b9cb673c387c3bb7d25c51f4d427f06",
     "grade": true,
     "grade_id": "cell-0bae4a61eb231bdc",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "test_that(\"Question 3.5\", {\n",
    "  expect_known_hash(sort(all_songs_and_albums$song), \"98346204cbe6e679c49f393f1aa9cbe5\")\n",
    "  expect_known_hash(with(all_songs_and_albums, album[order(song)]), \"1ac76cf61f956a09c22402f9335eb35f\")\n",
    "  expect_known_hash(with(all_songs_and_albums, year[order(song)]), \"8b5f0b0292fc41e59b265f84d1399336\")\n",
    "})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "a80eaac62f4f6cc387b81e6cc71ff526",
     "grade": false,
     "grade_id": "cell-3bca1c6621e1410f",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "## Part 4: Concatenating Rows\n",
    "\n",
    "At the start of the worksheet, we loaded three Lord of the Rings datasets (one for each of the three movies). Run the following R codes to take a look at the 3 tibbles:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "lines_to_next_cell": 0,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "cf3c8b40799f4e0901e1ba86eed4b994",
     "grade": false,
     "grade_id": "cell-83ad6b3db4ab34cd",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "fell"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "lines_to_next_cell": 0,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "735e681bd18461f3016d948b43c8787a",
     "grade": false,
     "grade_id": "cell-d6e0b8ca24c1e571",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "ttow"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "5829d1044a023667ebcb840e4e1b02c2",
     "grade": false,
     "grade_id": "cell-35c54be74e77d971",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "retk"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "6df4393256ea2618a5bda092ee43a195",
     "grade": false,
     "grade_id": "cell-dccbf8195a0dae5c",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "## Question 4.1\n",
    "\n",
    "Combine the three data sets into a single tibble, storing the new tibble in variable `lotr`.\n",
    "\n",
    "```\n",
    "lotr <- FILL_THIS_IN(fell, ttow, retk)\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "lines_to_next_cell": 0,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "c124fce98a9b8d0005c5af1541b6b970",
     "grade": false,
     "grade_id": "cell-ef7784a633e9d4cb",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# your code here\n",
    "fail() # No Answer - remove if you provide an answer\n",
    "print(lotr)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "0fc85ffc26aa13c84a427df1c7b6f11e",
     "grade": true,
     "grade_id": "cell-f900e47761160711",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "test_that(\"Question 4.1\", expect_known_hash(table(lotr$Film), \"41c29122f6c217d447e85a9069f5a92f\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "lines_to_next_cell": 0,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "26509d11eadc1f644d5e98d3f9e1d9d5",
     "grade": false,
     "grade_id": "cell-d1bd923a8a3c9a16",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "# Part 5: Set Operations\n",
    "\n",
    "Let's use three set functions: `intersect()`, `union()` and `setdiff()`.\n",
    "They work for data frames with the same column names.\n",
    "\n",
    "We'll work with two toy tibbles named `y` and `z`, similar to the Data Wrangling Cheatsheet.\n",
    "\n",
    "Run the following R codes to create the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "70773086505cfcc2bf778def8428e9e8",
     "grade": false,
     "grade_id": "cell-baae74bab6546360",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "(y <-  tibble(x1 = LETTERS[1:3], x2 = 1:3))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "ef560d32a3416dbb9632c4d77c2af662",
     "grade": false,
     "grade_id": "cell-2b7351a73648122a",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "(z <- tibble(x1 = c(\"B\", \"C\", \"D\"), x2 = 2:4))"
   ]
  },
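  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With dplyr loaded, the set functions treat each row as a single element, so two rows count as the same only if every column agrees. A minimal sketch on two hypothetical toy tibbles, `a` and `b` (made up for illustration, separate from `y` and `z`):\n",
    "\n",
    "```r\n",
    "library(dplyr)\n",
    "library(tibble)\n",
    "\n",
    "# Hypothetical toy tibbles: ids 2 and 3 appear in both, but id 3 has a different `grp`\n",
    "a <- tibble(id = c(1, 2, 3), grp = c(\"u\", \"v\", \"w\"))\n",
    "b <- tibble(id = c(2, 3, 4), grp = c(\"v\", \"z\", \"x\"))\n",
    "\n",
    "rows_in_both   <- intersect(a, b)  # only the row that is identical in both tibbles\n",
    "rows_in_either <- union(a, b)      # all distinct rows across the two tibbles\n",
    "rows_only_in_a <- setdiff(a, b)    # rows of a with no identical row in b\n",
    "\n",
    "rows_in_both\n",
    "rows_in_either\n",
    "rows_only_in_a\n",
    "```\n",
    "\n",
    "Note that `setdiff()` is not symmetric: swapping the arguments returns the rows found only in `b`."
   ]
  },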
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "282f5a3dd799ea922bbf17d055d8ac2c",
     "grade": false,
     "grade_id": "cell-5e730724ecd74680",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "## Question 5.1\n",
    "\n",
    "Use one of the three methods mentioned above to create a new data set which contains all rows that appear in both `y` and `z`.\n",
    "Store the new data frame in variable `in_both`\n",
    "\n",
    "```\n",
    "in_both <- FILL_THIS_IN(y, z)\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "lines_to_next_cell": 0,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "f34d6e1fa9f1bb1cd7d083f3d51d5f8e",
     "grade": false,
     "grade_id": "cell-e267783b3599e511",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# your code here\n",
    "fail() # No Answer - remove if you provide an answer\n",
    "in_both"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "033a343ffb57c19e288b52e0f3a24448",
     "grade": true,
     "grade_id": "cell-52d241c8bca32dad",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "test_that(\"Question 5.1\", expect_known_hash(in_both$x1, \"745ec49ab3231655a04484be44a15f98\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "fedddaca18f2be0fd68c2f848c31ce32",
     "grade": false,
     "grade_id": "cell-26dd7f2f3531aef8",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "## Question 5.2\n",
    "Assume that rows in `y` are from *Day 1* and rows in `z` are from *Day 2*.\n",
    "Create a new data set with all rows from `y` and `z`, as well as an additional column `day` which is *Day 1* for rows from `y` and *Day 2* for rows from `z`.\n",
    "Store the new data set in variable `both_days`.\n",
    "\n",
    "```\n",
    "both_days <- FILL_THIS_IN(\n",
    "  mutate(y, day = \"Day 1\"),\n",
    "  mutate(z, day = \"Day 2\")\n",
    ")\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "lines_to_next_cell": 0,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "4845d5d3090b34c53e1e9139ef317f7c",
     "grade": false,
     "grade_id": "cell-cb58ae9785411629",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# your code here\n",
    "fail() # No Answer - remove if you provide an answer\n",
    "both_days"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "490b32df7c2550866c72f06573441b2e",
     "grade": true,
     "grade_id": "cell-50e235b2f53237e1",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "test_that(\"Question 5.2\", expect_known_hash(with(both_days, x1[order(x2, day)]), \"66b9eefd39c2f0b5d130453c139a2051\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "8587d824ff4b560782dd97fd5dd84d4f",
     "grade": false,
     "grade_id": "cell-199323d324c8580e",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "## Question 5.3\n",
    "\n",
    "The rows contained in `z` are bad.\n",
    "Use one of the three methods mentioned above to create a new data set which contains only the rows from `y` which are not in `z`.\n",
    "Store the new data frame in variable `only_y`\n",
    "\n",
    "```\n",
    "only_y <- FILL_THIS_IN(y, z)\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "lines_to_next_cell": 0,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "02a4edeff864160a7265c3968b7fc732",
     "grade": false,
     "grade_id": "cell-f23a4f7c749f9a42",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# your code here\n",
    "fail() # No Answer - remove if you provide an answer\n",
    "only_y"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "8ef3b98e9a5b5ad79b39ff547de4e1bd",
     "grade": true,
     "grade_id": "cell-99b0c66bc2260c3a",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "test_that(\"Question 5.3\", expect_known_hash(only_y$x1, \"75f1160e72554f4270c809f041c7a776\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Attribution\n",
    "\n",
    "Assembled by Almas Khan and Vincenzo Coia, reviewed by Diana Lin, and assisted by David Kepplinger."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "jupytext": {
   "cell_metadata_filter": "-all"
  },
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "4.4.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}