{ "cells": [ { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "lines_to_next_cell": 0, "nbgrader": { "cell_type": "markdown", "checksum": "e1e0ee9ade87bd4efeb54e592fda2042", "grade": false, "grade_id": "cell-a52fdbcf333bb582", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "# Worksheet A-5: Working With Factors & Tibble Joins\n", "\n", "## Getting Started\n", "\n", "Load the requirements for this worksheet:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "e999f78c3475485bda71c1cd4a38219c", "grade": false, "grade_id": "cell-c48a21406e8bb917", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "suppressPackageStartupMessages(library(tidyverse))\n", "suppressPackageStartupMessages(library(tsibble))\n", "suppressPackageStartupMessages(library(gapminder))\n", "suppressPackageStartupMessages(library(testthat))\n", "suppressPackageStartupMessages(library(digest))\n", "suppressMessages({\n", " time <- read_csv(\"https://raw.githubusercontent.com/STAT545-UBC/Classroom/master/data/singer/songs.csv\") %>% \n", " rename(song = title)\n", " album <- read_csv(\"https://raw.githubusercontent.com/STAT545-UBC/Classroom/master/data/singer/loc.csv\") %>% \n", " select(title, everything()) %>% \n", " rename(song = title, album = release)\n", "})\n", "suppressMessages({\n", " fell <- read_csv(\"https://raw.githubusercontent.com/jennybc/lotr-tidy/master/data/The_Fellowship_Of_The_Ring.csv\")\n", " ttow <- read_csv(\"https://raw.githubusercontent.com/jennybc/lotr-tidy/master/data/The_Two_Towers.csv\")\n", " retk <- read_csv(\"https://raw.githubusercontent.com/jennybc/lotr-tidy/master/data/The_Return_Of_The_King.csv\")\n", "})" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "lines_to_next_cell": 0, "nbgrader": { "cell_type": 
"markdown", "checksum": "1a9cd5eb0778d5d434c5cefe6e3f4c55", "grade": false, "grade_id": "cell-c67ecef016e98418", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "The following code chunk has been unlocked, to give you the flexibility to start this document with some of your own code. Remember, it's bad manners to keep a call to `install.packages()` in your source code, so don't forget to delete these lines if you ever need to run them." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# An unlocked code cell." ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "1428d8bd85deb83393ba2b31daf743ae", "grade": false, "grade_id": "cell-dbacfd6c24590e38", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "# Part 0: Dates and Tsibble \n", "\n", "We'll convert dates into a year-month object with the tsibble package (loaded at the start of the worksheet).\n", "\n", "## Question 0.1\n", "\n", "Consider the built-in presidential dataset that looks at the start and ending terms of US presidents:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "9efcf5c512faf2887050ff3eeae06dc9", "grade": false, "grade_id": "cell-a37c8810b6e3de5d", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "head(presidential)" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "1b27f3c8b69bb0af1782fe185acd6571", "grade": false, "grade_id": "cell-3ec12f8dddc1b136", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "Use `tsibble::yearmonth()` to convert the existing start and end column dates into only year and month. 
Name this tibble `president_ym`.\n", "\n", "```\n", "president_ym <- presidential %>%\n", " mutate(start = FILL_THIS_IN, \n", " end = FILL_THIS_IN)\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "9d733bf26d4c42d2dbb724c591c5d93c", "grade": false, "grade_id": "cell-1bb9d4544eb25006", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# your code here\n", "fail() # No Answer - remove if you provide an answer\n", "head(president_ym)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "b589c7aaea69c8329428287aabf4751e", "grade": true, "grade_id": "cell-7d4a3a9d3ba31dd4", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "test_that(\"Question 0.1\", expect_known_hash(president_ym[1,], \"8b9ac24bc52a692ab7d1bd83f9e0a19c\"))" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "98b4a1b5a49a2cbc8f7252206d92ce39", "grade": false, "grade_id": "cell-ff4d90bac29bce18", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "# Part 1: Creating Factors\n", "\n", "For the best experience working with factors in R, we will use the forcats package, which is part of the tidyverse metapackage." 
] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "5960c95ea2598691226427a13ba11124", "grade": false, "grade_id": "cell-92d93e5de5944b49", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "## Question 1.1\n", "\n", "Using the gapminder dataset from the gapminder package, create a new data set for the year 1997, adding a new column `life_level` containing 5 new levels according to the following table.\n", "\n", "| Criteria |`life_level` | \n", "|-------------------|-------------|\n", "| less than 23 | very low |\n", "| between 23 and 48 | low |\n", "| between 48 and 59 | moderate |\n", "| between 59 and 70 | high |\n", "| more than 70 | very high |\n", "\n", "Store this new data frame in variable `gapminder_1997`.\n", "\n", "**Hint**: We are using `case_when()`, a tidier way to vectorise multiple `if_else()` statements.\n", "You can read more about this function [in the tidyverse reference](https://dplyr.tidyverse.org/reference/case_when.html).\n", "\n", "```\n", "gapminder_1997 <- gapminder %>% \n", " FILL_THIS_IN(year == FILL_THIS_IN) %>% \n", " FILL_THIS_IN(life_level = case_when(FILL_THIS_IN < FILL_THIS_IN ~ \"very low\",\n", " FILL_THIS_IN < FILL_THIS_IN ~ \"low\",\n", " FILL_THIS_IN < FILL_THIS_IN ~ \"moderate\",\n", " FILL_THIS_IN < FILL_THIS_IN ~ \"high\",\n", " TRUE ~ \"very high\"))\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "lines_to_next_cell": 0, "nbgrader": { "cell_type": "code", "checksum": "ef0f70c7ff836ae444c855a3f6806b99", "grade": false, "grade_id": "cell-dc036d146cef2025", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# your code here\n", "fail() # No Answer - remove if you provide an answer\n", "head(gapminder_1997)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": 
false, "nbgrader": { "cell_type": "code", "checksum": "a3d45fc2cc08a9548cd6985ef734e9dc", "grade": true, "grade_id": "cell-f8b2ff75e5097542", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "test_that(\"Question 1.1\", expect_known_hash(table(gapminder_1997$life_level), \"3d2e691667d4706e66ce5784bb1d7042\"))" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "27362b06ff6270477cb4b383f7c307d0", "grade": false, "grade_id": "cell-5c2f86da93e2a963", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "FYI: We can now plot boxplots for the GDP per capita per level of life expectancy.\n", "Run the following code to see the boxplots." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "483f75964e96e4e9b9c37a729a55195f", "grade": false, "grade_id": "cell-bb53dd9bcb9f2f82", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "ggplot(gapminder_1997) + geom_boxplot(aes(x = life_level, y = gdpPercap)) +\n", " labs(y = \"GDP per capita ($)\", x = \"Life expectancy level (years)\") +\n", " ggtitle(\"GDP per capita per Level of Life Expectancy\") +\n", " theme_bw() " ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "4064ee6fb1f5dcdc5934c65245a694c3", "grade": false, "grade_id": "cell-95e05c5acbdd7a28", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "## Question 1.2\n", "\n", "Notice a few oddities in the above plot:\n", "\n", "- It seems that none of the countries had a \"very low\" life-expectancy in 1997. \n", "- However, since it was an option in our analysis it should be included in our plot. 
Right?\n", "- Notice also how the levels on the x-axis are placed in the \"wrong\" order (alphabetical order).\n", "\n", "You can correct these issues by explicitly making `life_level` a factor and setting the `levels` argument.\n", "Create a new data frame as in Question 1.1, but make the column `life_level` a factor with levels ordered from *very low* to *very high*.\n", "Store this new data frame in variable `gapminder_1997_fct`.\n", "\n", "```\n", "gapminder_1997_fct <- gapminder %>% \n", " FILL_THIS_IN(year == 1997) %>% \n", " FILL_THIS_IN(life_level = FILL_THIS_IN(case_when(FILL_THIS_IN < FILL_THIS_IN ~ \"very low\",\n", " FILL_THIS_IN < FILL_THIS_IN ~ \"low\",\n", " FILL_THIS_IN < FILL_THIS_IN ~ \"moderate\",\n", " FILL_THIS_IN < FILL_THIS_IN ~ \"high\",\n", " TRUE ~ \"very high\"),\n", " levels = c('FILL_THIS_IN', 'FILL_THIS_IN', 'FILL_THIS_IN', 'FILL_THIS_IN', 'FILL_THIS_IN')))\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "lines_to_next_cell": 0, "nbgrader": { "cell_type": "code", "checksum": "a6b7375914651c80d4f8e35697d45ccc", "grade": false, "grade_id": "cell-0f65d0f778ec6287", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# your code here\n", "fail() # No Answer - remove if you provide an answer\n", "head(gapminder_1997_fct)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "3d73158d441e43c924e1d65686888a12", "grade": true, "grade_id": "cell-8448390b94500381", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "test_that(\"Question 1.2\", expect_known_hash(table(gapminder_1997_fct$life_level), \"8e62f09fbd0756d7e69d1bc95715d333\"))" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": 
"79159e2f1e076a32d34245013baeb8c8", "grade": false, "grade_id": "cell-ab66c1758a1353d7", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "Run the following code to see the boxplots from the new data frame with life expectancy level as factor." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "lines_to_next_cell": 2, "nbgrader": { "cell_type": "code", "checksum": "49428ed049346010159818eab9a71a9d", "grade": false, "grade_id": "cell-a660106ec5a2bf52", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "ggplot(gapminder_1997_fct) + geom_boxplot(aes(x = life_level, y = gdpPercap)) +\n", " labs(y = \"GDP per capita ($)\", x= \"Life expectancy level (years)\") +\n", " scale_x_discrete(drop = FALSE) + # Don't drop the very low factor\n", " ggtitle(\"GDP per capita per level of Life Expectancy\") +\n", " theme_bw() " ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "479aa33b770ae072df04c6eb749d2f9d", "grade": false, "grade_id": "cell-d2c47ea01eefe9d8", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "# Part 2: Inspecting Factors\n", "\n", "In Part 1, you created your own factors, so now let's explore what categorical variables are in the `gapminder` dataset.\n", "\n", "## Question 2.1\n", "\n", "What levels does the column `continent` have?\n", "Assign the levels to variable `continent_levels`, using the `levels()` function. 
(To mix things up a bit, the template code we're giving you extracts a column using the Base R way of extracting columns -- with a dollar sign.)\n", "\n", "```\n", "continent_levels <- FILL_THIS_IN(gapminder$FILL_THIS_IN)\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "lines_to_next_cell": 0, "nbgrader": { "cell_type": "code", "checksum": "76a3507ffca700c224121447d12fb9f0", "grade": false, "grade_id": "cell-537277f01997b17c", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# your code here\n", "fail() # No Answer - remove if you provide an answer\n", "print(continent_levels)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "7a8f3dcc16c9df12faae8dd82bbbc920", "grade": true, "grade_id": "cell-5aacbda4d51ef339", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "test_that(\"Question 2.1\", expect_known_hash(continent_levels, \"6926255b7f073fb8e7d89773802102a6\"))" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "811d56a160efacca9d5b30dc131e88b7", "grade": false, "grade_id": "cell-49df2e07dc37ef51", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "## Question 2.2\n", "\n", "How many levels does the column `country` have?\n", "Assign the number of levels to variable `gap_nr_countries`. Hint: there's a function called `nlevels()`. 
\n", "\n", "```\n", "gap_nr_countries <- FILL_THIS_IN(gapminder$FILL_THIS_IN)\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "lines_to_next_cell": 0, "nbgrader": { "cell_type": "code", "checksum": "993a22e5837b922d522afb6b1edabffa", "grade": false, "grade_id": "cell-b79dc64b081f578c", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# your code here\n", "fail() # No Answer - remove if you provide an answer\n", "print(gap_nr_countries)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "90b93fb9a1650573c403ca53b361b28e", "grade": true, "grade_id": "cell-e34eff454c51eedd", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "test_that(\"Question 2.2\", expect_known_hash(as.integer(gap_nr_countries), \"3b6d002135d8d45a3c5f4a9fb857c323\"))" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "fca7a6d0d7888f86dcadcf286cdaf56d", "grade": false, "grade_id": "cell-6fe136fd3a220c57", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "## Question 2.3\n", "\n", "Consider we are only interested in the following 5 countries: Egypt, Haiti, Romania, Thailand, and Venezuela.\n", "Create a new data frame with only these 5 countries and store it in variable `gap_5`. 
_Hint_: nothing new here -- use your dplyr knowledge!\n", "\n", "```\n", "gap_5 <- gapminder %>%\n", " FILL_THIS_IN(FILL_THIS_IN %in% c(\"FILL_THIS_IN\", \"FILL_THIS_IN\", \"FILL_THIS_IN\", \"FILL_THIS_IN\", \"FILL_THIS_IN\"))\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "lines_to_next_cell": 0, "nbgrader": { "cell_type": "code", "checksum": "bc2f4d1b202663df76a5a79622e9c121", "grade": false, "grade_id": "cell-46503448ea2a9cb9", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# your code here\n", "fail() # No Answer - remove if you provide an answer\n", "head(gap_5)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "lines_to_next_cell": 2, "nbgrader": { "cell_type": "code", "checksum": "b64d97ece2323c943482bdcf2c5f1695", "grade": true, "grade_id": "cell-caed098411987014", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "test_that(\"Question 2.3\", {\n", " expect_known_hash(dim(gap_5), \"6c0f8c2a8d488051f33fc89b2c327dcd\")\n", " expect_known_hash(table(gap_5$country), \"05b8ca3033e94f96b9ec5422a69c1207\")\n", "})" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "7ed381661b1550f7be814d56cc95de47", "grade": false, "grade_id": "cell-4ad94aa7ee66ed58", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "## Question 2.4\n", "\n", "However, subsetting the data set does not affect the levels of the factors.\n", "The column `country` in tibble `gap_5` still has the same number of levels as in the original data frame.\n", "\n", "Your task: create a new tibble from `gap_5`, where all unused levels from column `country` are dropped. _Hint_: use the `droplevels()` function. 
Store the new tibble in variable `gap_5_dropped`.\n", "\n", "By way of demonstration, check the number of levels in the \"country\" column before and after the change -- we've included the code for this for you.\n", "\n", "```\n", "nlevels(gap_5$country)\n", "gap_5_dropped <- FILL_THIS_IN(FILL_THIS_IN)\n", "nlevels(gap_5_dropped$country)\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "lines_to_next_cell": 0, "nbgrader": { "cell_type": "code", "checksum": "1ca53dfb9b1eb9799849178862b07658", "grade": false, "grade_id": "cell-7e52beeb587753f4", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# your code here\n", "fail() # No Answer - remove if you provide an answer\n", "head(gap_5_dropped)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "1d9ba781006b505bfe882f4271047b99", "grade": true, "grade_id": "cell-806b19b02e2333e2", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "test_that(\"Question 2.4\", expect_known_hash(sort(levels(gap_5_dropped$country)), \"ac97b9af845a59395697b028c5121503\"))" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "31641405c70f672717e4e25f0b294e00", "grade": false, "grade_id": "cell-1d6499c6b3e1bea1", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "## Question 2.5\n", "\n", "The factor levels of column `continent` in data frame `gapminder` are ordered alphabetically.\n", "Create a new data frame, with the levels of column `continent` in *increasing* order according to their frequency (i.e., the number of rows for each continent).\n", "Store the new data frame in variable `gap_continent_freq`. 
*Hint*: Use `fct_infreq()` and `fct_rev()`.\n", "\n", "```\n", "gap_continent_freq <- gapminder %>%\n", " mutate(continent = FILL_THIS_IN(FILL_THIS_IN(continent)))\n", "```\n", "\n", "**Hint**: The first `FILL_THIS_IN` corresponds to a `fct_*` function that reverses the levels of the factor. The second `FILL_THIS_IN` corresponds to a `fct_*` function that orders the levels by *decreasing* frequency." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "lines_to_next_cell": 0, "nbgrader": { "cell_type": "code", "checksum": "60c7da198c7e04c9b493fa3bb0cbefd3", "grade": false, "grade_id": "cell-041e1b9fdf167cc9", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# your code here\n", "fail() # No Answer - remove if you provide an answer\n", "head(gap_continent_freq)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "c3828180276afc0102a4978f9731e8a0", "grade": true, "grade_id": "cell-b38804e6a06d9de3", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "test_that(\"Question 2.5\", expect_known_hash(table(gap_continent_freq$continent), \"0bb23ea87ce71deb5452eaae8cdbf7cf\"))" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "20b5f11b6f24b9fed13a6135737a95b0", "grade": false, "grade_id": "cell-ff00d58b5fb34ad7", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "FYI: You can't \"see\" any difference in the tibble, but there are _attributes_ under the hood keeping track of the order of the \"continent\" levels. You _can_ see the difference, however, in a plot, as below. Notice how the x-axis is no longer ordered alphabetically." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "c9da62f05387398682067d5af42f4019", "grade": false, "grade_id": "cell-1317f4d18c821807", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "ggplot(gap_continent_freq, aes(continent)) + geom_bar()" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "9ed29c3ed2da088569d8acbec4466bf3", "grade": false, "grade_id": "cell-b8379204e1f83944", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "## Question 2.6\n", "\n", "Again based on the `gapminder` data set, create another data frame, with the levels of column `continent` in *increasing* order of their average life expectancy (from column `lifeExp`).\n", "Store the new data frame in variable `gap_continent_life`. _Hint_: use `fct_reorder()`.\n", "\n", "```\n", "gap_continent_life <- gapminder %>%\n", " mutate(continent = FILL_THIS_IN(FILL_THIS_IN, FILL_THIS_IN, FILL_THIS_IN))\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "lines_to_next_cell": 0, "nbgrader": { "cell_type": "code", "checksum": "7cb6057e8abd5a4083f9c08f43411355", "grade": false, "grade_id": "cell-d9568dc0d8c17add", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# your code here\n", "fail() # No Answer - remove if you provide an answer\n", "head(gap_continent_life)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "b4c54847aff6c135e62d54b8a27d82ed", "grade": true, "grade_id": "cell-7afaeb0beeff31e0", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ 
"test_that(\"Question 2.6\", expect_known_hash(table(gap_continent_life$continent), \"7688676a0807063f1bfa5b4cc721c2d9\"))" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "9a928e67b020017df4a9d01eaa41596e", "grade": false, "grade_id": "cell-c3992342f9ce1001", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "Again, you can't \"see\" any difference in the tibble. But here's a plot that makes the difference clearer. Notice the ordering of the x-axis." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "6d289297f5bed91afe065f8082874b65", "grade": false, "grade_id": "cell-50e1fa7db691ef53", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "ggplot(gap_continent_life, aes(continent, lifeExp)) + geom_boxplot()" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "aa5ff73ce33259884777c352474405b2", "grade": false, "grade_id": "cell-99aa45fbba8d8199", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "## Question 2.7\n", "\n", "Consider now you want to make comparisons between countries, relative to Canada.\n", "Create a new data frame, with the levels of column `country` rearranged to have Canada as the first one.\n", "Store the new data frame in variable `gap_canada_base`.\n", "\n", "```\n", "(gap_canada_base <- gapminder %>%\n", " mutate(country = FILL_THIS_IN(FILL_THIS_IN, \"FILL_THIS_IN\")))\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "lines_to_next_cell": 0, "nbgrader": { "cell_type": "code", "checksum": "8cd80b87102d4581e588fd2a73ced4f5", "grade": false, "grade_id": "cell-636b191a444f10bf", "locked": false, 
"schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# your code here\n", "fail() # No Answer - remove if you provide an answer\n", "head(gap_canada_base)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "9e04c74ef9f16087badabdd4d35b0a5e", "grade": true, "grade_id": "cell-de2504ab380c7da4", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "test_that(\"Question 2.7\", expect_known_hash(table(gap_canada_base$country), \"72d75ce05a16d8965f7bd0ae3fb986d3\"))" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "9d58e265ab80ca22275c8b12936eaebd", "grade": false, "grade_id": "cell-ab318aa41d0d5f13", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "Take a look at the levels of the \"country\" factor, and you'll now see Canada first:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "efc00230990576f114feeda6ffd242c5", "grade": false, "grade_id": "cell-81562f5b59805bcf", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "gap_canada_base %>% \n", " pull(country) %>% \n", " levels()" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "c9df0f3698f933b15811ea39f8968d4e", "grade": false, "grade_id": "cell-4474329a11f182dd", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "## Question 2.8\n", "\n", "Sometimes you want to manually change a few factor levels, e.g., if the level is too wide for plotting.\n", "Based on the `gapminder` data set, create a new data frame with the Central 
African Republic renamed to *Central African Rep.* and Bosnia and Herzegovina renamed to *Bosnia & Herzegovina*.\n", "Store the new data frame in variable `gap_car`. _Hint_: use `fct_recode()`.\n", "\n", "```\n", "gap_car <- gapminder %>%\n", " mutate(country = FILL_THIS_IN(FILL_THIS_IN, \"Central African Rep.\" = \"FILL_THIS_IN\",\n", " \"Bosnia & Herzegovina\" = \"FILL_THIS_IN\"))\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "lines_to_next_cell": 0, "nbgrader": { "cell_type": "code", "checksum": "ce084968f1e3785c1637a76d3df88371", "grade": false, "grade_id": "cell-57746edd3f5caa7d", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# your code here\n", "fail() # No Answer - remove if you provide an answer\n", "head(gap_car)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "lines_to_next_cell": 2, "nbgrader": { "cell_type": "code", "checksum": "8238a1b5e4f9338ffcd0652717b9b4cf", "grade": true, "grade_id": "cell-339f3fc46993a445", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "test_that(\"Question 2.8\", expect_known_hash(table(gap_car$country), \"9cc15f09cb70b5596bbf3feaa73ee471\"))" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "lines_to_next_cell": 0, "nbgrader": { "cell_type": "markdown", "checksum": "debd146fdf7bd7f964caf53e5664791c", "grade": false, "grade_id": "cell-ebce2cf9fcfbd426", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "# Part 3: Tibble Joins\n", "\n", "At the start of this worksheet, we loaded a couple datasets from the [singer](https://github.com/JoeyBernhardt/singer) package, and called them `time` and `album`. 
These two data sets contain information about a few popular songs and albums.\n", "\n", "We'll practice various joins using these two datasets. You'll need to find out which join is appropriate for each case!\n", "\n", "Run the following R code to look at the two data sets:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "lines_to_next_cell": 0, "nbgrader": { "cell_type": "code", "checksum": "75a59fdaeb80fc4672eb7cfe95760f4a", "grade": false, "grade_id": "cell-e42385546e19be2e", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "head(time)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "4f30e20a803c7da4a45fa2ce0b832722", "grade": false, "grade_id": "cell-43a2ee99557fd52e", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "head(album)" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "35472bba0705175e11fe420529003810", "grade": false, "grade_id": "cell-81082dfacbaa989f", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "## Question 3.1\n", "\n", "Create a new data frame containing all songs from `time` that have a corresponding album in the `album` dataset, while also adding the album information. 
Store the joined data set in variable `songs_with_album`.\n", "\n", "```\n", "songs_with_album <- time %>% \n", " FILL_THIS_IN(FILL_THIS_IN, by = c(\"FILL_THIS_IN\", \"FILL_THIS_IN\"))\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "lines_to_next_cell": 0, "nbgrader": { "cell_type": "code", "checksum": "5cc779ee29fc8f53bff600d1517061ca", "grade": false, "grade_id": "cell-7d3c34b0dd2dff5c", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# your code here\n", "fail() # No Answer - remove if you provide an answer\n", "head(songs_with_album)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "666f98d3da6a8105b0434ab63a65b016", "grade": true, "grade_id": "cell-e851d23c3d11b3bd", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "test_that(\"Question 3.1\", {\n", " expect_known_hash(sort(songs_with_album$song), \"146ff293a74ccc1ad24505a6bc0b6682\")\n", " expect_known_hash(table(songs_with_album$artist_name), \"51f7daeec65e839e5ae6c84ac5a1cb70\")\n", "})" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "0b3ef8ada4ced8c99e905200c5f4e101", "grade": false, "grade_id": "cell-bfbeb7e5c7c5da68", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "## Question 3.2\n", "\n", "Go ahead and add the corresponding albums to the `time` tibble, being sure to preserve rows even if album info is not readily available.\n", "Store the joined data set in variable `all_songs`.\n", "\n", "```\n", "all_songs <- time %>% \n", " FILL_THIS_IN(FILL_THIS_IN, by = c(\"FILL_THIS_IN\", \"FILL_THIS_IN\"))\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, 
"lines_to_next_cell": 0, "nbgrader": { "cell_type": "code", "checksum": "b69ab410a37f5b67909c472e8b372518", "grade": false, "grade_id": "cell-05f5f08439831d83", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# your code here\n", "fail() # No Answer - remove if you provide an answer\n", "head(all_songs)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "5b3a04b4745c60a770421763b1891c87", "grade": true, "grade_id": "cell-3add15a64f96339a", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "test_that(\"Question 3.2\", {\n", " expect_known_hash(sort(all_songs$song), \"dd1c0b2e14a879cb1a6f07077ed38e97\")\n", " expect_known_hash(all_songs$album[order(all_songs$song)], \"2baea3c1a23797fdac5a9e0dc119073e\")\n", "})" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "4b4e8a80b05c8cdf79e203f51f10d3d7", "grade": false, "grade_id": "cell-eff281d8aeda3161", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "## Question 3.3: Joining Rows by Columns\n", "\n", "Create a new tibble with songs from `time` for which there is no album info.\n", "Store the new data set in variable `songs_without_album`.\n", "\n", "```\n", "songs_without_album <- time %>% \n", " FILL_THIS_IN(FILL_THIS_IN, by = c(\"FILL_THIS_IN\", \"FILL_THIS_IN\"))\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "lines_to_next_cell": 0, "nbgrader": { "cell_type": "code", "checksum": "93ae01956cda7b91c33d53d66815ef03", "grade": false, "grade_id": "cell-bd6ca35fe501ec50", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# your code here\n", "fail() # No Answer - remove if you 
provide an answer\n", "head(songs_without_album)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "9d489d87cc1e4da2ab854637d4e17f2b", "grade": true, "grade_id": "cell-0c5e0fec11dfa949", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "test_that(\"Question 3.3\", expect_known_hash(sort(songs_without_album$song), \"3e6a210ad915fb07eb7e894a7ca0e856\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 3.4\n", "\n", "Create a new tibble with all songs from artists for whom there is no album information.\n", "Store the new data set in variable `songs_artists_no_album`.\n", "\n", "```\n", "songs_artists_no_album <- time %>% \n", " FILL_THIS_IN(FILL_THIS_IN, by = \"FILL_THIS_IN\")\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "lines_to_next_cell": 0, "nbgrader": { "cell_type": "code", "checksum": "6341db6003ef5b98261f5bf21bcaa4ef", "grade": false, "grade_id": "cell-129b24ab36ddde03", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# your code here\n", "fail() # No Answer - remove if you provide an answer\n", "head(songs_artists_no_album)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "18489049f84f807a1e708ae07a1f0e2c", "grade": true, "grade_id": "cell-3452c71d7eb8391d", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "test_that(\"Question 3.4\", expect_known_hash(table(songs_artists_no_album$artist_name), \n", " \"244510c51477c31e6e795cbc0ca0b0d7\"))" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": 
"96b61edf9845504319d9cdb65bb84990", "grade": false, "grade_id": "cell-7fec11ba9e292ff8", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "## Question 3.5\n", "Create a new tibble with all the information from both tibbles, keeping every row even when it has no corresponding information in the other tibble.\n", "Store the new data set in variable `all_songs_and_albums`.\n", "\n", "```\n", "all_songs_and_albums <- time %>% \n", " FILL_THIS_IN(FILL_THIS_IN, by = c(\"FILL_THIS_IN\", \"FILL_THIS_IN\"))\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "lines_to_next_cell": 0, "nbgrader": { "cell_type": "code", "checksum": "c3d7b242129fe696c878dff7ca1d3387", "grade": false, "grade_id": "cell-17f8124265baf8e0", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# your code here\n", "fail() # No Answer - remove if you provide an answer\n", "head(all_songs_and_albums)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "119604ad12e607f8ceae8ac1fc3d5e43", "grade": true, "grade_id": "cell-0bae4a61eb231bdc", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "test_that(\"Question 3.5\", {\n", " expect_known_hash(sort(all_songs_and_albums$song), \"ba2ba3507e50c56d21028893404259a5\")\n", " expect_known_hash(with(all_songs_and_albums, album[order(song)]), \"dbc70af8d3078ea830be9cfb0dee6b9d\")\n", " expect_known_hash(with(all_songs_and_albums, year[order(song)]), \"10669b0750ab4d53b54f0e509430e2d1\")\n", "})" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "a80eaac62f4f6cc387b81e6cc71ff526", "grade": false, "grade_id": "cell-3bca1c6621e1410f", "locked": true, "schema_version": 3, "solution": false, 
"task": false } }, "source": [ "# Part 4: Concatenating Rows\n", "\n", "At the start of the worksheet, we loaded three Lord of the Rings datasets (one for each of the three movies). Run the following R code to take a look at the three tibbles:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "lines_to_next_cell": 0, "nbgrader": { "cell_type": "code", "checksum": "cf3c8b40799f4e0901e1ba86eed4b994", "grade": false, "grade_id": "cell-83ad6b3db4ab34cd", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "fell" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "lines_to_next_cell": 0, "nbgrader": { "cell_type": "code", "checksum": "735e681bd18461f3016d948b43c8787a", "grade": false, "grade_id": "cell-d6e0b8ca24c1e571", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "ttow" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "5829d1044a023667ebcb840e4e1b02c2", "grade": false, "grade_id": "cell-35c54be74e77d971", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "retk" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "6df4393256ea2618a5bda092ee43a195", "grade": false, "grade_id": "cell-dccbf8195a0dae5c", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "## Question 4.1\n", "\n", "Combine the three data sets into a single tibble, storing the new tibble in variable `lotr`.\n", "\n", "```\n", "lotr <- FILL_THIS_IN(fell, ttow, retk)\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "lines_to_next_cell": 0, "nbgrader": { "cell_type": "code", 
"checksum": "c124fce98a9b8d0005c5af1541b6b970", "grade": false, "grade_id": "cell-ef7784a633e9d4cb", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# your code here\n", "fail() # No Answer - remove if you provide an answer\n", "print(lotr)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "0fc85ffc26aa13c84a427df1c7b6f11e", "grade": true, "grade_id": "cell-f900e47761160711", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "test_that(\"Question 4.1\", expect_known_hash(table(lotr$Film), \"41c29122f6c217d447e85a9069f5a92f\"))" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "lines_to_next_cell": 0, "nbgrader": { "cell_type": "markdown", "checksum": "26509d11eadc1f644d5e98d3f9e1d9d5", "grade": false, "grade_id": "cell-d1bd923a8a3c9a16", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "# Part 5: Set Operations\n", "\n", "Let's use three set functions: `intersect()`, `union()` and `setdiff()`.\n", "They work for data frames with the same column names.\n", "\n", "We'll work with two toy tibbles named `y` and `z`, similar to the Data Wrangling Cheatsheet.\n", "\n", "Run the following R code to create the data." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "70773086505cfcc2bf778def8428e9e8", "grade": false, "grade_id": "cell-baae74bab6546360", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "(y <- tibble(x1 = LETTERS[1:3], x2 = 1:3))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "ef560d32a3416dbb9632c4d77c2af662", "grade": false, "grade_id": "cell-2b7351a73648122a", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "(z <- tibble(x1 = c(\"B\", \"C\", \"D\"), x2 = 2:4))" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "282f5a3dd799ea922bbf17d055d8ac2c", "grade": false, "grade_id": "cell-5e730724ecd74680", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "## Question 5.1\n", "\n", "Use one of the three functions mentioned above to create a new data set that contains all rows that appear in both `y` and `z`.\n", "Store the new data frame in variable `in_both`.\n", "\n", "```\n", "in_both <- FILL_THIS_IN(y, z)\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "lines_to_next_cell": 0, "nbgrader": { "cell_type": "code", "checksum": "f34d6e1fa9f1bb1cd7d083f3d51d5f8e", "grade": false, "grade_id": "cell-e267783b3599e511", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# your code here\n", "fail() # No Answer - remove if you provide an answer\n", "in_both" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "033a343ffb57c19e288b52e0f3a24448", 
"grade": true, "grade_id": "cell-52d241c8bca32dad", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "test_that(\"Question 5.1\", expect_known_hash(in_both$x1, \"745ec49ab3231655a04484be44a15f98\"))" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "fedddaca18f2be0fd68c2f848c31ce32", "grade": false, "grade_id": "cell-26dd7f2f3531aef8", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "## Question 5.2\n", "Assume that rows in `y` are from *Day 1* and rows in `z` are from *Day 2*.\n", "Create a new data set with all rows from `y` and `z`, as well as an additional column `day` which is *Day 1* for rows from `y` and *Day 2* for rows from `z`.\n", "Store the new data set in variable `both_days`.\n", "\n", "```\n", "both_days <- FILL_THIS_IN(\n", " mutate(y, day = \"Day 1\"),\n", " mutate(z, day = \"Day 2\")\n", ")\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "lines_to_next_cell": 0, "nbgrader": { "cell_type": "code", "checksum": "4845d5d3090b34c53e1e9139ef317f7c", "grade": false, "grade_id": "cell-cb58ae9785411629", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# your code here\n", "fail() # No Answer - remove if you provide an answer\n", "both_days" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "490b32df7c2550866c72f06573441b2e", "grade": true, "grade_id": "cell-50e235b2f53237e1", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "test_that(\"Question 5.2\", expect_known_hash(with(both_days, x1[order(x2, day)]), \"66b9eefd39c2f0b5d130453c139a2051\"))" ] }, { "cell_type": "markdown", "metadata": { 
"deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "8587d824ff4b560782dd97fd5dd84d4f", "grade": false, "grade_id": "cell-199323d324c8580e", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "## Question 5.3\n", "\n", "The rows contained in `z` are bad.\n", "Use one of the three functions mentioned above to create a new data set that contains only the rows from `y` that are not in `z`.\n", "Store the new data frame in variable `only_y`.\n", "\n", "```\n", "only_y <- FILL_THIS_IN(y, z)\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "lines_to_next_cell": 0, "nbgrader": { "cell_type": "code", "checksum": "02a4edeff864160a7265c3968b7fc732", "grade": false, "grade_id": "cell-f23a4f7c749f9a42", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# your code here\n", "fail() # No Answer - remove if you provide an answer\n", "only_y" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "8ef3b98e9a5b5ad79b39ff547de4e1bd", "grade": true, "grade_id": "cell-99b0c66bc2260c3a", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "test_that(\"Question 5.3\", expect_known_hash(only_y$x1, \"75f1160e72554f4270c809f041c7a776\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Attribution\n", "\n", "Assembled by Almas Khan and Vincenzo Coia, reviewed by Diana Lin, and assisted by David Kepplinger." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "jupytext": { "cell_metadata_filter": "-all" }, "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "4.1.1" } }, "nbformat": 4, "nbformat_minor": 4 }