{ "cells": [ { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-8b26929bab50eea3", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "# Worksheet 01A: Intro to R, R Markdown, and Reproducibility\n", "**Version 2.0**\n", "\n", "\n", "## Welcome to STAT545!\n", "\n", "We hope that you are excited to become an R pro during the next few weeks! These in-class worksheets have been designed to help you navigate this R journey. We'll start easy, with some examples of R commands, and evolve to the more complex - and arguably cooler - syntax and structures of the R language.\n", "\n", "## An important note\n", "\n", "**Submission of this worksheet is optional**. Future worksheets **must** be submitted for participation marks. \n", "\n", "Lectures will mostly involve going through these worksheets and giving you the answers (yes, before the deadline). We suggest going through the worksheets before coming to class so that you can find out where you get stuck.\n", "\n", "## Attributions\n", "\n", "This document was primarily put together by Icíar Fernández Boyano. \n", "\n", "The following resources were used as inspiration in the creation of this worksheet:\n", "\n", "+ [Swirl R Programming Tutorial](https://swirlstats.com/scn/rprog.html)\n", "+ [A (very) short introduction to R](https://github.com/ClaudiaBrauer/A-very-short-introduction-to-R/blob/master/documents/A%20(very)%20short%20introduction%20to%20R.pdf)\n", "+ [Happy Git and GitHub for the useR](https://happygitwithr.com/)\n", "+ [2019 STAT545 Guidebook](https://stat545guidebook.netlify.app/index.html)\n", "+ [Jenny Bryan's STAT545 Guidebook](https://stat545.com/)\n", "\n", "## Getting Started\n", "\n", "Load the required add-on packages for this assignment by running the following code chunk (or _cell_). 
In Jupyter, you can load the chunk by clicking on the chunk, and clicking the \"Run\" button (keyboard shortcut: Command + Enter on a Mac, or Control + Enter on Windows). _If this fails, read on..._" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-e7cc0f456605c076", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "library(testthat)\n", "library(digest)" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-e4d5c23517e65a5a", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "Did that fail for you? It could be that you don't have those packages installed. The following code chunk has been unlocked for you (did you notice that you weren't able to edit the above cells?), so you can use it to install these packages, or to generally just give you the flexibility to start this worksheet with some of your own code. To install the \"testthat\" package, execute the command `install.packages(\"testthat\")`; what would you need to type to install the \"digest\" package?\n", "\n", "**Please be sure to remove any `install.packages` commands after you've run them**: once you've successfully _installed_ a package with `install.packages`, the package is permanently installed on your computer. \n", "\n", "To _load_ the packages for use in this R session, try executing the above code chunk again." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# An unlocked code chunk." ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-0da508f15814e7c2", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "In Episode 01A of the [STAT 545 video series](https://www.youtube.com/channel/UCrB-uourf2vxGeBnGjQrA0w), RStudio was mentioned as being an IDE for R. 
You're probably viewing this worksheet in another IDE called **Jupyter**. We're using Jupyter for the STAT 545 worksheets because it works well with an autograder called nbgrader." ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-d11d3973ab3460a1", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "## Getting Familiar with R\n", "\n", "### 1.1 Calculator\n", "\n", "In its simplest form, R can be used as an interactive calculator. " ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "name": "example-1-0", "nbgrader": { "grade": false, "grade_id": "cell-02e613b3a71a2045", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "data": { "text/html": [ "14" ], "text/latex": [ "14" ], "text/markdown": [ "14" ], "text/plain": [ "[1] 14" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "6" ], "text/latex": [ "6" ], "text/markdown": [ "6" ], "text/plain": [ "[1] 6" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "2" ], "text/latex": [ "2" ], "text/markdown": [ "2" ], "text/plain": [ "[1] 2" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "10" ], "text/latex": [ "10" ], "text/markdown": [ "10" ], "text/plain": [ "[1] 10" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "81" ], "text/latex": [ "81" ], "text/markdown": [ "81" ], "text/plain": [ "[1] 81" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "10 + 4 # you can add \n", "10 - 4 # subtract\n", "4 / 2 # divide\n", "2 * 5 # multiply\n", "3 ^ 4 # and exponentiate" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-ab5c3cec51c4ac26", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "Now, what if you need to compute a longer expression? 
Let's say that I want to find out the percentage of students in the STAT department that are taking STAT545A (note: these numbers are fictional!). I could compute this in several steps, or use a more complex expression. \n", "\n", "**Using multiple steps...**\n", "\n", "+ To calculate the number of students in the STAT department, I add 375 new students that have enrolled this year, to the 2000 that were already enrolled." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "name": "example-1-1a", "nbgrader": { "grade": false, "grade_id": "cell-3a342455e968efc2", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "data": { "text/html": [ "2375" ], "text/latex": [ "2375" ], "text/markdown": [ "2375" ], "text/plain": [ "[1] 2375" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "2000 + 375" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-ccd48654544fd353", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "+ There are 82 students taking STAT545A this year. Last year, there was the same number of students, but 3 dropped the course after the first two weeks. 
Let's hypothesise that only 1 will drop the course this year - although I hope the real number is 0 :)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "name": "example-1-1b", "nbgrader": { "grade": false, "grade_id": "cell-23dd9f564b57ea4f", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "data": { "text/html": [ "81" ], "text/latex": [ "81" ], "text/markdown": [ "81" ], "text/plain": [ "[1] 81" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "82 - 1" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-345a371ec8621190", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "+ With the number of students taking STAT545 this year (hypothetically), and the number of students currently in the STAT department, I can now calculate what percentage of students in the STAT department are taking this class." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "name": "example-1-1c", "nbgrader": { "grade": false, "grade_id": "cell-f85728ee7af8d6bf", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "data": { "text/html": [ "0.0341052631578947" ], "text/latex": [ "0.0341052631578947" ], "text/markdown": [ "0.0341052631578947" ], "text/plain": [ "[1] 0.03410526" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "81 / 2375" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "name": "example 1-1d", "nbgrader": { "grade": false, "grade_id": "cell-a9bfe516af8ca200", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "data": { "text/html": [ "3.410526" ], "text/latex": [ "3.410526" ], "text/markdown": [ "3.410526" ], "text/plain": [ "[1] 3.410526" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "0.03410526 * 100" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, 
"grade_id": "cell-42ec2ad158bd9526", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**What if we use a single expression?**\n", "\n", "It seems that around 3% of students in the STAT department are taking STAT545A... but that took *a lot* of steps to calculate. We could also write it like this to save some time:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "name": "example-1-2", "nbgrader": { "grade": false, "grade_id": "cell-1998144f9345aaf9", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "data": { "text/html": [ "3.41052631578947" ], "text/latex": [ "3.41052631578947" ], "text/markdown": [ "3.41052631578947" ], "text/plain": [ "[1] 3.410526" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "(82 - 1) / (2000 + 375) * 100 " ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-8aa79a95beade98c", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "As you can see, *taking care of precedence rules* (i.e. using brackets appropriately), we can save some time by writing a single expression.\n", "\n", "Your turn! Can you calculate the percentage of your life that you have spent in university? \n", "\n", "Compute the difference between 2020 and the year that you started university, and divide this by the difference between 2020 and the year that you were born. Multiply this with 100 to get the percentage of your life that you have spent in university. Your *challenge* here is to use a single expression." 
] }, { "cell_type": "code", "execution_count": 14, "metadata": { "name": "activity-1-2" }, "outputs": [], "source": [ "# your code here" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-1d3d407b175d3580", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "### 1.2 Variables\n", " \n", "Alright, R as a calculator works just fine... but you don't learn a programming language *only* to compute arithmetic expressions. What if you want to use your result from above in a second calculation? Instead of retyping your expression every time that you need it, or copying and pasting the result, you can simply create a new variable that stores it. \n", "\n", "Earlier, I figured out that I had spent 18% of my life at university. I want to assign this value to a variable called `life_university`, a name that will help me remember what the value means. You assign a value to a variable in R with the assignment operator, which is just a \"less than\" symbol followed by a minus sign. It looks like this:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "name": "example-1-3a", "nbgrader": { "grade": false, "grade_id": "cell-0f70f29b98cfc79a", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "life_university <- 18" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-b8bde8ad059c03e2", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "Now the variable `life_university` stores the value 18, which is the percentage of my life that I had spent at university. But before saving this into a variable, I had to calculate the value separately. What if I directly assigned the arithmetic expression that I used to compute my value to the variable?"
] }, { "cell_type": "code", "execution_count": 16, "metadata": { "name": "example-1-3b", "nbgrader": { "grade": false, "grade_id": "cell-e943d9cf804fdf52", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "life_university <- (2020 - 2016) / (2020 - 1998) * 100" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-9eeb2f90831c3118", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "Notice that R did not print the result of my expression this time. When you use the assignment operator, R assumes that you don't want to see the result immediately, but rather that you intend to use it for something else later on. \n", "\n", "To view the contents of the variable, you simply have to type the name of the variable (in this case, `life_university`) and press Enter. Try it below!" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "name": "activity-1-3" }, "outputs": [], "source": [ "# your code here" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-dbf010f638a0b404", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**QUESTION 1.0**\n", "\n", "Now it's your turn to store the percentage of time that **you** have spent at university in a variable. Try typing the arithmetic expression that you used to compute that value, rather than the value itself! Name this variable `my_life_university` in the first cell below, and check whether the answer is acceptable by running the second cell below. 
If the test cell gives you an error, try a different answer!\n", "\n", "```\n", "my_life_university <- FILL_THIS_IN / FILL_THIS_IN * 100\n", "```" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "name": "question-1-0", "nbgrader": { "grade": false, "grade_id": "cell-04e950ccf3313117", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "### BEGIN SOLUTION\n", "my_life_university <- 12 / 33 * 100 # Any number between 0 and 100.\n", "### END SOLUTION" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "name": "test-1-0", "nbgrader": { "grade": true, "grade_id": "cell-cebd6f6df5bdb3cc", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"success!\"\n" ] } ], "source": [ "test_that(\"Question 1.0\", {\n", " expect_gte(my_life_university, 0)\n", " expect_lte(my_life_university, 100)\n", "})\n", "print(\"success!\")" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-93af8d6b89c2ed33", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "### 1.3 Data structures\n", "\n", "Any object that contains data is called a data structure.\n", "\n", "### 1.3.1 Vectors\n", "\n", "#### Numeric vectors\n", "\n", "So far, you've learned how to use R as a calculator, and how to use variables to store numeric values. But in reality, a \"variable\" in R is just a way to name your data so that R can recall it later. Think of it as a label that you put on a box, so that you remember what is inside it. \n", "\n", "The variable that you created above, `my_life_university`, stores the most basic data structure in the R programming language: a vector. Even a single number is considered a vector of length one, which is the case with the vector that was assigned to `my_life_university`. 
Let's have a look again:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "name": "example-1-3", "nbgrader": { "grade": false, "grade_id": "cell-df8e5f769d89a75b", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "data": { "text/html": [ "36.3636363636364" ], "text/latex": [ "36.3636363636364" ], "text/markdown": [ "36.3636363636364" ], "text/plain": [ "[1] 36.36364" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "my_life_university" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-f825e25671f73f81", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "In this way, you can think of the vector as the data structure, and the variable as a label. But what if you want a vector that's greater than length one, or in other words, that stores more than a single numeric value? The easiest way to create a vector is using `c()`, which stands for \"concatenate\", or \"combine\". \n", "\n", "**QUESTION 1.1**\n", "\n", "Let's give it a try. To create a vector containing the numbers 3.14, 2.71, and 6.28, type `c(3.14, 2.71, 6.28)`. Store the result in a variable called `x`. 
\n", "\n", "```\n", "x <- FILL_THIS_IN(FILL_THIS_IN, FILL_THIS_IN, FILL_THIS_IN)\n", "```" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "name": "answer-1-1", "nbgrader": { "grade": false, "grade_id": "cell-0d87e7f0d19a4cda", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "### BEGIN SOLUTION\n", "x <- c(3.14, 2.71, 6.28)\n", "### END SOLUTION" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "lines_to_next_cell": 2, "name": "test-1-1", "nbgrader": { "grade": true, "grade_id": "cell-76c5ca05ef0dfe64", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"success!\"\n" ] } ], "source": [ "test_that(\"Question 1.1\", {\n", " expect_equal(digest(x), \"d696b13d28ab63409f1f528a2d37bb0e\")\n", "})\n", "print(\"success!\")" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-f9bab0fdef44f45e", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "Now, type `x` and press Enter to view its contents. Notice that there are no commas separating the values in the output!" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "name": "activity-1-4" }, "outputs": [], "source": [ "# your code here" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-5200389128901c83", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "You can combine several vectors to make a new vector. And here is where things get fun! For the sake of seeing the result immediately, we won't store this combined vector in a new variable for now." 
] }, { "cell_type": "code", "execution_count": 24, "metadata": { "name": "example-1-6", "nbgrader": { "grade": false, "grade_id": "cell-e71c214be4ee55a2", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "data": { "text/html": [ "\n", "
  1. 10
  2. 50
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 10\n", "\\item 50\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 10\n", "2. 50\n", "\n", "\n" ], "text/plain": [ "[1] 10 50" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "c(10, 50)" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-ac73a3512f6cace0", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "And what's more: you can combine any numeric vectors together, regardless of whether they have already been assigned to a variable or not!" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "name": "example-1-7", "nbgrader": { "grade": false, "grade_id": "cell-130366e4f3577c91", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "data": { "text/html": [ "\n", "
  1. 3.14
  2. 2.71
  3. 6.28
  4. 50
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 3.14\n", "\\item 2.71\n", "\\item 6.28\n", "\\item 50\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 3.14\n", "2. 2.71\n", "3. 6.28\n", "4. 50\n", "\n", "\n" ], "text/plain": [ "[1] 3.14 2.71 6.28 50.00" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "c(x, 50)" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-15649fdc5348c402", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**QUESTION 1.2**\n", "\n", "Your turn to give it a try. Create a new vector that contains `life_university`, `my_life_university`, and `25`. Store your result in a variable named `answer1.2`.\n", "\n", "```\n", "answer1.2 <- FILL_THIS_IN(FILL_THIS_IN, FILL_THIS_IN, FILL_THIS_IN)\n", "```" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-61d5339d601052e3", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"success!\"\n" ] } ], "source": [ "### BEGIN SOLUTION\n", "answer1.2 <- c(life_university, my_life_university, 25) \n", "### END SOLUTION" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "lines_to_next_cell": 2, "nbgrader": { "grade": true, "grade_id": "cell-57e9c5dc76061b99", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "test_that(\"Question 1.2\", {\n", " expect_identical(answer1.2[1L], life_university)\n", " expect_identical(answer1.2[2L], my_life_university)\n", " expect_equal(answer1.2[3L], 25)\n", "})\n", "print(\"success!\")" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-d735b7d8bb863e07", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "One more cool thing before we go on: numeric vectors can be 
used in arithmetic expressions. Remember the vector that we created earlier and assigned to the variable `x`? Let's have a look at it again." ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "name": "example-1-8", "nbgrader": { "grade": false, "grade_id": "cell-fc6dd20fb255df19", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "data": { "text/html": [ "\n", "
  1. 3.14
  2. 2.71
  3. 6.28
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 3.14\n", "\\item 2.71\n", "\\item 6.28\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 3.14\n", "2. 2.71\n", "3. 6.28\n", "\n", "\n" ], "text/plain": [ "[1] 3.14 2.71 6.28" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "x" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-0fd8e0ec44a7e215", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**QUESTION 1.3**\n", "\n", "Here's a fun fact: those three numbers are approximations of pi, Euler's number, and tau. But that's a story for another course! Type the following to see what happens: `x * 2 + 100`... Actually, **wait!** What do **you** think will be the result of doing that?\n", "\n", "1. a vector of length three\n", "2. a single number (a vector of length 1)\n", "3. a vector of length 0 (i.e., an empty vector)\n", "\n", "Assign your answer (`1`, `2`, or `3`) to a variable named `answer1.3`."
] }, { "cell_type": "code", "execution_count": 29, "metadata": { "name": "question-1-2", "nbgrader": { "grade": false, "grade_id": "cell-4ea46a88f826bae6", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "#answer1.3 <- youranswer\n", "### BEGIN SOLUTION\n", "answer1.3 <- 1\n", "### END SOLUTION" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "lines_to_next_cell": 2, "name": "answer-1-2", "nbgrader": { "grade": true, "grade_id": "cell-9c92d0ea4e3f081a", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"success!\"\n" ] } ], "source": [ "test_that(\"Answer check\", {\n", " expect_identical(\n", " digest(as.integer(answer1.3)), \n", " \"4b5630ee914e848e8d07221556b0a2fb\"\n", " )\n", "})\n", "print(\"success!\")" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-d3aa19d258f6f70c", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "Let's see what actually happens. Type `x * 2 + 100` and press Enter." ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "name": "activity-1-5" }, "outputs": [], "source": [ "# your code here" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-e1c1527b81a91bf6", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "First, R multiplied each of the three elements in `x` by 2. Then, it added 100 to each element to get the result that you see.\n", "\n", "#### Logical vectors\n", "\n", "So far we have only dealt with **numeric** vectors. But there are other types of vectors in the R universe. Let's have a look.\n", "\n", "**QUESTION 1.4**\n", "\n", "Enough of university, let's talk about vacation! 
A group of friends are discussing the places that they visited in 2019, and trying to figure out how much total vacation time each of them took. Pablo says he took 54 days off to travel locally, Dana was on vacation for only 14 days, and Marianne went to the Caribbean for 30 days.\n", "\n", "Create a vector that contains the values of Pablo, Dana, and Marianne's vacation days, respectively. Assign it to a variable named `vacation_time`.\n", "\n", "```\n", "vacation_time <- FILL_THIS_IN(FILL_THIS_IN, FILL_THIS_IN, FILL_THIS_IN)\n", "```" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "name": "question-1-3", "nbgrader": { "grade": false, "grade_id": "cell-bf017d7e63aa3be5", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "### BEGIN SOLUTION\n", "vacation_time <- c(54, 14, 30)\n", "### END SOLUTION" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "lines_to_next_cell": 2, "name": "test 1-3", "nbgrader": { "grade": true, "grade_id": "cell-103585f25bdefe8c", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"success!\"\n" ] } ], "source": [ "test_that(\"Answer check\", {\n", " expect_identical(\n", " digest(as.integer(vacation_time)), \n", " \"8336872ae5cc234b1c1574e27d863ebb\"\n", " )\n", "})\n", "print(\"success!\")" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-f9ff62ffcfa6cfdd", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**QUESTION 1.5**\n", "\n", "Which person was on vacation for more than 21600 minutes? First, create a numeric vector that multiplies the `vacation_time` vector by 1440 (the number of minutes in a day), to find out what each person's vacation time is *in minutes*. 
Assign this to a variable named `vacation_time_minutes`.\n", "\n", "```\n", "vacation_time_minutes <- FILL_THIS_IN * FILL_THIS_IN\n", "```" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "name": "question-1-4", "nbgrader": { "grade": false, "grade_id": "cell-eff123da37be623e", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "### BEGIN SOLUTION\n", "vacation_time_minutes <- vacation_time * 1440\n", "### END SOLUTION" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "name": "test-1-4", "nbgrader": { "grade": true, "grade_id": "cell-aa364a693337a118", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"success!\"\n" ] } ], "source": [ "test_that(\"Answer check\", {\n", " expect_identical(\n", " digest(as.numeric(vacation_time_minutes)),\n", " \"ce79c61a9b5bd2b5bf4b4def95455438\"\n", " )\n", "})\n", "print(\"success!\")" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-c564977e2f838536", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**QUESTION 1.6**\n", "\n", "Now, create a variable called `under_21600` that gets the result of `vacation_time_minutes > 21600`, which is read as 'vacation_time_minutes is more than 21600'. Note that, despite its name, `under_21600` will be `TRUE` for each person whose vacation was *longer* than 21600 minutes.\n", "\n", "```\n", "under_21600 <- FILL_THIS_IN > FILL_THIS_IN\n", "```" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "name": "question-1-5", "nbgrader": { "grade": false, "grade_id": "cell-1dc4a1742d34d0d7", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "### BEGIN SOLUTION\n", "under_21600 <- vacation_time_minutes > 21600\n", "### END SOLUTION" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "lines_to_next_cell": 2, "name": "test-1-5", "nbgrader": { "grade": true, "grade_id": 
"cell-f62760b95bea63e2", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"success!\"\n" ] } ], "source": [ "test_that(\"Answer check\", {\n", " expect_identical(\n", " digest(under_21600),\n", " \"4f00878a54c541bdbf07c006a9d412dc\"\n", " )\n", "})\n", "print(\"success!\")" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-abc0b44d2845a5af", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "Have a look at the output of `under_21600` by typing the name and pressing Enter." ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "name": "activity-1-6" }, "outputs": [], "source": [ "# your code here" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-4d1d9ba986efdbdd", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "Congratulations! You've created your first **logical vector**. Logical vectors can contain the values `TRUE`, `FALSE`, and `NA` (for 'not available' - this happens when you have missing data!). These values are generated as the result of logical 'conditions'. We have seen the logical operator \"greater than\" in this activity, but there are [many more](https://www.statmethods.net/management/operators.html), such as \"less than\", \"exactly equal to\", or \"not equal to\". Don't worry, there will be plenty of time to use those in the future!\n", "\n", "#### ...and more\n", "\n", "There are other types of vectors out there in the R universe, such as character vectors. We won't get into the nitty gritty of these - logical and numeric are the most basic R vectors, the ones that you absolutely need to know & that we will use most often. However, we didn't want to leave you in the dark about these other types of vectors! 
If you really, really want to know more, you can read [more about vectors](https://r4ds.had.co.nz/vectors.html).\n", "\n", "Anyway, here is a handy tip! If you ever come across a vector and you're not sure what it is, you can inspect its two key properties: type and length. Here is an example of how you would do it. *\"Double\" is just a type of numeric vector.*" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "name": "example-1-9", "nbgrader": { "grade": false, "grade_id": "cell-ed6eca0c2a79760f", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "data": { "text/html": [ "'double'" ], "text/latex": [ "'double'" ], "text/markdown": [ "'double'" ], "text/plain": [ "[1] \"double\"" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "3" ], "text/latex": [ "3" ], "text/markdown": [ "3" ], "text/plain": [ "[1] 3" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "typeof(x)\n", "length(x)" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-08f89be93b548b8e", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "### 1.3.2 Data frames\n", "\n", "Living in a vector-only world would be nice if all data analyses involved one variable. When we have more than one variable, data frames come to the rescue. Basically, a data frame holds data in tabular format. R has some data frames \"built in\". For example, data on motor cars is stored under the variable name `mtcars`.\n", "\n", "Print `mtcars` to screen. In case it hasn't been mentioned before, \"print\" means to type the name of the object and press Enter, which is the same as surrounding the object with the `print()` function. Notice the tabular format."
] }, { "cell_type": "code", "execution_count": 40, "metadata": { "name": "activity-1-7", "nbgrader": { "grade": false, "grade_id": "cell-d9ce48db712f1b07", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
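<table>
<caption>A data.frame: 32 × 11</caption>
<thead>
<tr><th></th><th>mpg</th><th>cyl</th><th>disp</th><th>hp</th><th>drat</th><th>wt</th><th>qsec</th><th>vs</th><th>am</th><th>gear</th><th>carb</th></tr>
<tr><th></th><th>&lt;dbl&gt;</th><th>&lt;dbl&gt;</th><th>&lt;dbl&gt;</th><th>&lt;dbl&gt;</th><th>&lt;dbl&gt;</th><th>&lt;dbl&gt;</th><th>&lt;dbl&gt;</th><th>&lt;dbl&gt;</th><th>&lt;dbl&gt;</th><th>&lt;dbl&gt;</th><th>&lt;dbl&gt;</th></tr>
</thead>
<tbody>
<tr><th>Mazda RX4</th><td>21.0</td><td>6</td><td>160.0</td><td>110</td><td>3.90</td><td>2.620</td><td>16.46</td><td>0</td><td>1</td><td>4</td><td>4</td></tr>
<tr><th>Mazda RX4 Wag</th><td>21.0</td><td>6</td><td>160.0</td><td>110</td><td>3.90</td><td>2.875</td><td>17.02</td><td>0</td><td>1</td><td>4</td><td>4</td></tr>
<tr><th>Datsun 710</th><td>22.8</td><td>4</td><td>108.0</td><td>93</td><td>3.85</td><td>2.320</td><td>18.61</td><td>1</td><td>1</td><td>4</td><td>1</td></tr>
<tr><th>Hornet 4 Drive</th><td>21.4</td><td>6</td><td>258.0</td><td>110</td><td>3.08</td><td>3.215</td><td>19.44</td><td>1</td><td>0</td><td>3</td><td>1</td></tr>
<tr><th>Hornet Sportabout</th><td>18.7</td><td>8</td><td>360.0</td><td>175</td><td>3.15</td><td>3.440</td><td>17.02</td><td>0</td><td>0</td><td>3</td><td>2</td></tr>
<tr><th>Valiant</th><td>18.1</td><td>6</td><td>225.0</td><td>105</td><td>2.76</td><td>3.460</td><td>20.22</td><td>1</td><td>0</td><td>3</td><td>1</td></tr>
<tr><th>Duster 360</th><td>14.3</td><td>8</td><td>360.0</td><td>245</td><td>3.21</td><td>3.570</td><td>15.84</td><td>0</td><td>0</td><td>3</td><td>4</td></tr>
<tr><th>Merc 240D</th><td>24.4</td><td>4</td><td>146.7</td><td>62</td><td>3.69</td><td>3.190</td><td>20.00</td><td>1</td><td>0</td><td>4</td><td>2</td></tr>
<tr><th>Merc 230</th><td>22.8</td><td>4</td><td>140.8</td><td>95</td><td>3.92</td><td>3.150</td><td>22.90</td><td>1</td><td>0</td><td>4</td><td>2</td></tr>
<tr><th>Merc 280</th><td>19.2</td><td>6</td><td>167.6</td><td>123</td><td>3.92</td><td>3.440</td><td>18.30</td><td>1</td><td>0</td><td>4</td><td>4</td></tr>
<tr><th>Merc 280C</th><td>17.8</td><td>6</td><td>167.6</td><td>123</td><td>3.92</td><td>3.440</td><td>18.90</td><td>1</td><td>0</td><td>4</td><td>4</td></tr>
<tr><th>Merc 450SE</th><td>16.4</td><td>8</td><td>275.8</td><td>180</td><td>3.07</td><td>4.070</td><td>17.40</td><td>0</td><td>0</td><td>3</td><td>3</td></tr>
<tr><th>Merc 450SL</th><td>17.3</td><td>8</td><td>275.8</td><td>180</td><td>3.07</td><td>3.730</td><td>17.60</td><td>0</td><td>0</td><td>3</td><td>3</td></tr>
<tr><th>Merc 450SLC</th><td>15.2</td><td>8</td><td>275.8</td><td>180</td><td>3.07</td><td>3.780</td><td>18.00</td><td>0</td><td>0</td><td>3</td><td>3</td></tr>
<tr><th>Cadillac Fleetwood</th><td>10.4</td><td>8</td><td>472.0</td><td>205</td><td>2.93</td><td>5.250</td><td>17.98</td><td>0</td><td>0</td><td>3</td><td>4</td></tr>
<tr><th>Lincoln Continental</th><td>10.4</td><td>8</td><td>460.0</td><td>215</td><td>3.00</td><td>5.424</td><td>17.82</td><td>0</td><td>0</td><td>3</td><td>4</td></tr>
<tr><th>Chrysler Imperial</th><td>14.7</td><td>8</td><td>440.0</td><td>230</td><td>3.23</td><td>5.345</td><td>17.42</td><td>0</td><td>0</td><td>3</td><td>4</td></tr>
<tr><th>Fiat 128</th><td>32.4</td><td>4</td><td>78.7</td><td>66</td><td>4.08</td><td>2.200</td><td>19.47</td><td>1</td><td>1</td><td>4</td><td>1</td></tr>
<tr><th>Honda Civic</th><td>30.4</td><td>4</td><td>75.7</td><td>52</td><td>4.93</td><td>1.615</td><td>18.52</td><td>1</td><td>1</td><td>4</td><td>2</td></tr>
<tr><th>Toyota Corolla</th><td>33.9</td><td>4</td><td>71.1</td><td>65</td><td>4.22</td><td>1.835</td><td>19.90</td><td>1</td><td>1</td><td>4</td><td>1</td></tr>
<tr><th>Toyota Corona</th><td>21.5</td><td>4</td><td>120.1</td><td>97</td><td>3.70</td><td>2.465</td><td>20.01</td><td>1</td><td>0</td><td>3</td><td>1</td></tr>
<tr><th>Dodge Challenger</th><td>15.5</td><td>8</td><td>318.0</td><td>150</td><td>2.76</td><td>3.520</td><td>16.87</td><td>0</td><td>0</td><td>3</td><td>2</td></tr>
<tr><th>AMC Javelin</th><td>15.2</td><td>8</td><td>304.0</td><td>150</td><td>3.15</td><td>3.435</td><td>17.30</td><td>0</td><td>0</td><td>3</td><td>2</td></tr>
<tr><th>Camaro Z28</th><td>13.3</td><td>8</td><td>350.0</td><td>245</td><td>3.73</td><td>3.840</td><td>15.41</td><td>0</td><td>0</td><td>3</td><td>4</td></tr>
<tr><th>Pontiac Firebird</th><td>19.2</td><td>8</td><td>400.0</td><td>175</td><td>3.08</td><td>3.845</td><td>17.05</td><td>0</td><td>0</td><td>3</td><td>2</td></tr>
<tr><th>Fiat X1-9</th><td>27.3</td><td>4</td><td>79.0</td><td>66</td><td>4.08</td><td>1.935</td><td>18.90</td><td>1</td><td>1</td><td>4</td><td>1</td></tr>
<tr><th>Porsche 914-2</th><td>26.0</td><td>4</td><td>120.3</td><td>91</td><td>4.43</td><td>2.140</td><td>16.70</td><td>0</td><td>1</td><td>5</td><td>2</td></tr>
<tr><th>Lotus Europa</th><td>30.4</td><td>4</td><td>95.1</td><td>113</td><td>3.77</td><td>1.513</td><td>16.90</td><td>1</td><td>1</td><td>5</td><td>2</td></tr>
<tr><th>Ford Pantera L</th><td>15.8</td><td>8</td><td>351.0</td><td>264</td><td>4.22</td><td>3.170</td><td>14.50</td><td>0</td><td>1</td><td>5</td><td>4</td></tr>
<tr><th>Ferrari Dino</th><td>19.7</td><td>6</td><td>145.0</td><td>175</td><td>3.62</td><td>2.770</td><td>15.50</td><td>0</td><td>1</td><td>5</td><td>6</td></tr>
<tr><th>Maserati Bora</th><td>15.0</td><td>8</td><td>301.0</td><td>335</td><td>3.54</td><td>3.570</td><td>14.60</td><td>0</td><td>1</td><td>5</td><td>8</td></tr>
<tr><th>Volvo 142E</th><td>21.4</td><td>4</td><td>121.0</td><td>109</td><td>4.11</td><td>2.780</td><td>18.60</td><td>1</td><td>1</td><td>4</td><td>2</td></tr>
</tbody>
</table>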
\n" ], "text/latex": [ "A data.frame: 32 × 11\n", "\\begin{tabular}{r|lllllllllll}\n", " & mpg & cyl & disp & hp & drat & wt & qsec & vs & am & gear & carb\\\\\n", " & & & & & & & & & & & \\\\\n", "\\hline\n", "\tMazda RX4 & 21.0 & 6 & 160.0 & 110 & 3.90 & 2.620 & 16.46 & 0 & 1 & 4 & 4\\\\\n", "\tMazda RX4 Wag & 21.0 & 6 & 160.0 & 110 & 3.90 & 2.875 & 17.02 & 0 & 1 & 4 & 4\\\\\n", "\tDatsun 710 & 22.8 & 4 & 108.0 & 93 & 3.85 & 2.320 & 18.61 & 1 & 1 & 4 & 1\\\\\n", "\tHornet 4 Drive & 21.4 & 6 & 258.0 & 110 & 3.08 & 3.215 & 19.44 & 1 & 0 & 3 & 1\\\\\n", "\tHornet Sportabout & 18.7 & 8 & 360.0 & 175 & 3.15 & 3.440 & 17.02 & 0 & 0 & 3 & 2\\\\\n", "\tValiant & 18.1 & 6 & 225.0 & 105 & 2.76 & 3.460 & 20.22 & 1 & 0 & 3 & 1\\\\\n", "\tDuster 360 & 14.3 & 8 & 360.0 & 245 & 3.21 & 3.570 & 15.84 & 0 & 0 & 3 & 4\\\\\n", "\tMerc 240D & 24.4 & 4 & 146.7 & 62 & 3.69 & 3.190 & 20.00 & 1 & 0 & 4 & 2\\\\\n", "\tMerc 230 & 22.8 & 4 & 140.8 & 95 & 3.92 & 3.150 & 22.90 & 1 & 0 & 4 & 2\\\\\n", "\tMerc 280 & 19.2 & 6 & 167.6 & 123 & 3.92 & 3.440 & 18.30 & 1 & 0 & 4 & 4\\\\\n", "\tMerc 280C & 17.8 & 6 & 167.6 & 123 & 3.92 & 3.440 & 18.90 & 1 & 0 & 4 & 4\\\\\n", "\tMerc 450SE & 16.4 & 8 & 275.8 & 180 & 3.07 & 4.070 & 17.40 & 0 & 0 & 3 & 3\\\\\n", "\tMerc 450SL & 17.3 & 8 & 275.8 & 180 & 3.07 & 3.730 & 17.60 & 0 & 0 & 3 & 3\\\\\n", "\tMerc 450SLC & 15.2 & 8 & 275.8 & 180 & 3.07 & 3.780 & 18.00 & 0 & 0 & 3 & 3\\\\\n", "\tCadillac Fleetwood & 10.4 & 8 & 472.0 & 205 & 2.93 & 5.250 & 17.98 & 0 & 0 & 3 & 4\\\\\n", "\tLincoln Continental & 10.4 & 8 & 460.0 & 215 & 3.00 & 5.424 & 17.82 & 0 & 0 & 3 & 4\\\\\n", "\tChrysler Imperial & 14.7 & 8 & 440.0 & 230 & 3.23 & 5.345 & 17.42 & 0 & 0 & 3 & 4\\\\\n", "\tFiat 128 & 32.4 & 4 & 78.7 & 66 & 4.08 & 2.200 & 19.47 & 1 & 1 & 4 & 1\\\\\n", "\tHonda Civic & 30.4 & 4 & 75.7 & 52 & 4.93 & 1.615 & 18.52 & 1 & 1 & 4 & 2\\\\\n", "\tToyota Corolla & 33.9 & 4 & 71.1 & 65 & 4.22 & 1.835 & 19.90 & 1 & 1 & 4 & 1\\\\\n", "\tToyota Corona & 21.5 & 4 & 120.1 & 97 & 
3.70 & 2.465 & 20.01 & 1 & 0 & 3 & 1\\\\\n", "\tDodge Challenger & 15.5 & 8 & 318.0 & 150 & 2.76 & 3.520 & 16.87 & 0 & 0 & 3 & 2\\\\\n", "\tAMC Javelin & 15.2 & 8 & 304.0 & 150 & 3.15 & 3.435 & 17.30 & 0 & 0 & 3 & 2\\\\\n", "\tCamaro Z28 & 13.3 & 8 & 350.0 & 245 & 3.73 & 3.840 & 15.41 & 0 & 0 & 3 & 4\\\\\n", "\tPontiac Firebird & 19.2 & 8 & 400.0 & 175 & 3.08 & 3.845 & 17.05 & 0 & 0 & 3 & 2\\\\\n", "\tFiat X1-9 & 27.3 & 4 & 79.0 & 66 & 4.08 & 1.935 & 18.90 & 1 & 1 & 4 & 1\\\\\n", "\tPorsche 914-2 & 26.0 & 4 & 120.3 & 91 & 4.43 & 2.140 & 16.70 & 0 & 1 & 5 & 2\\\\\n", "\tLotus Europa & 30.4 & 4 & 95.1 & 113 & 3.77 & 1.513 & 16.90 & 1 & 1 & 5 & 2\\\\\n", "\tFord Pantera L & 15.8 & 8 & 351.0 & 264 & 4.22 & 3.170 & 14.50 & 0 & 1 & 5 & 4\\\\\n", "\tFerrari Dino & 19.7 & 6 & 145.0 & 175 & 3.62 & 2.770 & 15.50 & 0 & 1 & 5 & 6\\\\\n", "\tMaserati Bora & 15.0 & 8 & 301.0 & 335 & 3.54 & 3.570 & 14.60 & 0 & 1 & 5 & 8\\\\\n", "\tVolvo 142E & 21.4 & 4 & 121.0 & 109 & 4.11 & 2.780 & 18.60 & 1 & 1 & 4 & 2\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 32 × 11\n", "\n", "| | mpg <dbl> | cyl <dbl> | disp <dbl> | hp <dbl> | drat <dbl> | wt <dbl> | qsec <dbl> | vs <dbl> | am <dbl> | gear <dbl> | carb <dbl> |\n", "|---|---|---|---|---|---|---|---|---|---|---|---|\n", "| Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |\n", "| Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |\n", "| Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |\n", "| Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |\n", "| Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |\n", "| Valiant | 18.1 | 6 | 225.0 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 |\n", "| Duster 360 | 14.3 | 8 | 360.0 | 245 | 3.21 | 3.570 | 15.84 | 0 | 0 | 3 | 4 |\n", "| Merc 240D | 24.4 | 4 | 146.7 | 62 | 3.69 | 3.190 | 20.00 | 1 | 0 | 4 | 2 |\n", "| Merc 230 | 22.8 | 4 | 140.8 
| 95 | 3.92 | 3.150 | 22.90 | 1 | 0 | 4 | 2 |\n", "| Merc 280 | 19.2 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.30 | 1 | 0 | 4 | 4 |\n", "| Merc 280C | 17.8 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.90 | 1 | 0 | 4 | 4 |\n", "| Merc 450SE | 16.4 | 8 | 275.8 | 180 | 3.07 | 4.070 | 17.40 | 0 | 0 | 3 | 3 |\n", "| Merc 450SL | 17.3 | 8 | 275.8 | 180 | 3.07 | 3.730 | 17.60 | 0 | 0 | 3 | 3 |\n", "| Merc 450SLC | 15.2 | 8 | 275.8 | 180 | 3.07 | 3.780 | 18.00 | 0 | 0 | 3 | 3 |\n", "| Cadillac Fleetwood | 10.4 | 8 | 472.0 | 205 | 2.93 | 5.250 | 17.98 | 0 | 0 | 3 | 4 |\n", "| Lincoln Continental | 10.4 | 8 | 460.0 | 215 | 3.00 | 5.424 | 17.82 | 0 | 0 | 3 | 4 |\n", "| Chrysler Imperial | 14.7 | 8 | 440.0 | 230 | 3.23 | 5.345 | 17.42 | 0 | 0 | 3 | 4 |\n", "| Fiat 128 | 32.4 | 4 | 78.7 | 66 | 4.08 | 2.200 | 19.47 | 1 | 1 | 4 | 1 |\n", "| Honda Civic | 30.4 | 4 | 75.7 | 52 | 4.93 | 1.615 | 18.52 | 1 | 1 | 4 | 2 |\n", "| Toyota Corolla | 33.9 | 4 | 71.1 | 65 | 4.22 | 1.835 | 19.90 | 1 | 1 | 4 | 1 |\n", "| Toyota Corona | 21.5 | 4 | 120.1 | 97 | 3.70 | 2.465 | 20.01 | 1 | 0 | 3 | 1 |\n", "| Dodge Challenger | 15.5 | 8 | 318.0 | 150 | 2.76 | 3.520 | 16.87 | 0 | 0 | 3 | 2 |\n", "| AMC Javelin | 15.2 | 8 | 304.0 | 150 | 3.15 | 3.435 | 17.30 | 0 | 0 | 3 | 2 |\n", "| Camaro Z28 | 13.3 | 8 | 350.0 | 245 | 3.73 | 3.840 | 15.41 | 0 | 0 | 3 | 4 |\n", "| Pontiac Firebird | 19.2 | 8 | 400.0 | 175 | 3.08 | 3.845 | 17.05 | 0 | 0 | 3 | 2 |\n", "| Fiat X1-9 | 27.3 | 4 | 79.0 | 66 | 4.08 | 1.935 | 18.90 | 1 | 1 | 4 | 1 |\n", "| Porsche 914-2 | 26.0 | 4 | 120.3 | 91 | 4.43 | 2.140 | 16.70 | 0 | 1 | 5 | 2 |\n", "| Lotus Europa | 30.4 | 4 | 95.1 | 113 | 3.77 | 1.513 | 16.90 | 1 | 1 | 5 | 2 |\n", "| Ford Pantera L | 15.8 | 8 | 351.0 | 264 | 4.22 | 3.170 | 14.50 | 0 | 1 | 5 | 4 |\n", "| Ferrari Dino | 19.7 | 6 | 145.0 | 175 | 3.62 | 2.770 | 15.50 | 0 | 1 | 5 | 6 |\n", "| Maserati Bora | 15.0 | 8 | 301.0 | 335 | 3.54 | 3.570 | 14.60 | 0 | 1 | 5 | 8 |\n", "| Volvo 142E | 21.4 | 4 | 121.0 | 109 | 4.11 | 2.780 | 
18.60 | 1 | 1 | 4 | 2 |\n", "\n" ], "text/plain": [ " mpg cyl disp hp drat wt qsec vs am gear carb\n", "Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 \n", "Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 \n", "Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 \n", "Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 \n", "Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 \n", "Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 \n", "Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 \n", "Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 \n", "Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 \n", "Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 \n", "Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 \n", "Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 \n", "Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 \n", "Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 \n", "Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 \n", "Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 \n", "Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 \n", "Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 \n", "Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 \n", "Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 \n", "Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 \n", "Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 \n", "AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 \n", "Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 \n", "Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 \n", "Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 \n", "Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 \n", "Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 \n", "Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 \n", "Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 \n", "Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 \n", "Volvo 142E 21.4 4 
121.0 109 4.11 2.780 18.60 1 1 4 2 " ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ " mpg cyl disp hp drat wt qsec vs am gear carb\n", "Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4\n", "Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4\n", "Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1\n", "Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1\n", "Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2\n", "Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1\n", "Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4\n", "Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2\n", "Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2\n", "Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4\n", "Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4\n", "Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3\n", "Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3\n", "Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3\n", "Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4\n", "Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4\n", "Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4\n", "Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1\n", "Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2\n", "Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1\n", "Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1\n", "Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2\n", "AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2\n", "Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4\n", "Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2\n", "Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1\n", "Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2\n", "Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2\n", "Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4\n", "Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6\n", "Maserati Bora 
15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8\n", "Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2\n" ] } ], "source": [ "mtcars\n", "print(mtcars)" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-61de1f2d0df5bca4", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "We will talk more about dataframes in just a bit, but for now, just keep in mind that they are one of the most used data structures in R - albeit more complex than vectors.\n", "\n", "### 1.4 Subsetting\n", "\n", "Often, when you're working with a large dataset (such as `mtcars`), you will only be interested in a small portion of it. Even when working with a simpler data structure, such as a vector, you may want to extract a particular value that you are interested in. R has several ways of doing this, in a process that it calls \"subsetting\". Subsetting dataframes is definitely a more complex task - we will start small, with vectors.\n", "\n", "A student from a previous STAT545 cohort tracked his commute times for two weeks (10 days), and saved them in a vector that he stored in the variable `times`. Here is the `times` variable. " ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "name": "times", "nbgrader": { "grade": false, "grade_id": "cell-78252235e1d5148b", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "times <- c(18, 22, 43, 26, 75, 31, 32, 17, 16, 51)" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-f9666453c58f185c", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "We use `[]` to subset a vector, such as the vector of times. Although we had a look at this in class, here are a couple of examples, using the vector `x` from earlier, to refresh your memory. 
To extract the first entry of a vector:" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "name": "example-1-10", "nbgrader": { "grade": false, "grade_id": "cell-bd8f9c32442ea258", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "data": { "text/html": [ "3.14" ], "text/latex": [ "3.14" ], "text/markdown": [ "3.14" ], "text/plain": [ "[1] 3.14" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "x[1]" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-d22c02234f3ab50f", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "And if I want to extract everything *but* the first entry:" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "name": "example-1-11", "nbgrader": { "grade": false, "grade_id": "cell-2fead6186e5faa2f", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "data": { "text/html": [ "\n", "
<ol><li>2.71</li><li>6.28</li></ol>
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 2.71\n", "\\item 6.28\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 2.71\n", "2. 6.28\n", "\n", "\n" ], "text/plain": [ "[1] 2.71 6.28" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "x[-1]" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-a9afbe0547d200af", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "You're doing a great job! Now, it's your turn to use `[]` to subset the vector of times. Keep it up!\n", "\n", "**QUESTION 1.7**\n", "\n", "Extract the third entry of the `times` vector, and store the result in a variable named `answer1.7`.\n", "\n", "```\n", "answer1.7 <- FILL_THIS_IN[FILL_THIS_IN]\n", "```" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-9c43bdaa066cf9be", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "### BEGIN SOLUTION\n", "answer1.7 <- times[3]\n", "### END SOLUTION" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "lines_to_next_cell": 2, "nbgrader": { "grade": true, "grade_id": "cell-fc4318f2c768353e", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"success!\"\n" ] } ], "source": [ "test_that(\"Answer check\", {\n", " expect_identical(\n", " digest(answer1.7),\n", " \"e3aac2c171de0322895102f09101ba98\"\n", " )\n", "})\n", "print(\"success!\")" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-6d72f91cfe89bbbe", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**QUESTION 1.8**\n", "\n", "Extract everything in `times` except the third entry. 
Store the result in a variable named `answer1.8`.\n", "\n", "```\n", "answer1.8 <- FILL_THIS_IN[-FILL_THIS_IN]\n", "```" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "name": "question-1-6", "nbgrader": { "grade": false, "grade_id": "cell-5059b20d3a5e2257", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "### BEGIN SOLUTION\n", "answer1.8 <- times[-3]\n", "### END SOLUTION" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "lines_to_next_cell": 2, "name": "test-1-6", "nbgrader": { "grade": true, "grade_id": "cell-529aaa0beede8288", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"success!\"\n" ] } ], "source": [ "test_that(\"Answer check\", {\n", " expect_identical(\n", " digest(answer1.8),\n", " \"600c1ff6db302a52139f9ac39dd41d0c\"\n", " )\n", "})\n", "print(\"success!\")" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-291d05e19e05b8ff", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**QUESTION 1.9**\n", "\n", "Extract the second and fourth entry of `times`, and store it in a variable called `answer1.9a`. Extract the fourth and second entry of `times`, and store it in a variable called `answer1.9b`. 
*Hint: remember `c()`?*\n", "\n", "```\n", "answer1.9a <- FILL_THIS_IN[FILL_THIS_IN(FILL_THIS_IN, FILL_THIS_IN)]\n", "answer1.9b <- FILL_THIS_IN[FILL_THIS_IN(FILL_THIS_IN, FILL_THIS_IN)]\n", "```" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "name": "question-1-7", "nbgrader": { "grade": false, "grade_id": "cell-122807aab027ca58", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "### BEGIN SOLUTION\n", "answer1.9a <- times[c(2, 4)]\n", "answer1.9b <- times[c(4, 2)]\n", "### END SOLUTION" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "lines_to_next_cell": 0, "name": "test-1-7", "nbgrader": { "grade": true, "grade_id": "cell-5ce1faeb13ff62de", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"success!\"\n" ] } ], "source": [ "test_that(\"Answer check\", {\n", " expect_identical(\n", " digest(answer1.9a),\n", " \"24887cb43232d541bb551ce34f852e69\"\n", " )\n", " expect_identical(\n", " digest(answer1.9b),\n", " \"94001bedd89d74d064e93afdf1b57986\"\n", " )\n", "})\n", "print(\"success!\")" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-285513d8d0a7b29b", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**QUESTION 1.10**\n", "\n", "Extract the second through fifth entry of `times` – make use of `:` to construct sequential vectors. 
Store the result in a variable named `answer1.10`.\n", "\n", "```\n", "answer1.10 <- FILL_THIS_IN[FILL_THIS_IN:FILL_THIS_IN]\n", "```" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "name": "question-1-8", "nbgrader": { "grade": false, "grade_id": "cell-b43afbf7653655cd", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "### BEGIN SOLUTION\n", "answer1.10 <- times[2:5]\n", "### END SOLUTION" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "lines_to_next_cell": 2, "name": "test-1-8", "nbgrader": { "grade": true, "grade_id": "cell-aa55397aaf797937", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"success!\"\n" ] } ], "source": [ "test_that(\"Answer check\", {\n", " expect_identical(\n", " digest(answer1.10),\n", " \"3dff0beb6577b621859c9a3579b8d379\"\n", " )\n", "})\n", "print(\"success!\")" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-19dcdf530107701a", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**QUESTION 1.11**\n", "\n", "Extract all entries of `times` that are less than 30 minutes, and place the result in a variable named `answer1.11`. Why does this work? 
Logical subsetting!\n", "\n", "```\n", "answer1.11 <- FILL_THIS_IN[FILL_THIS_IN < FILL_THIS_IN]\n", "```" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "name": "question-1-9", "nbgrader": { "grade": false, "grade_id": "cell-2e0e3bcdd6873bb5", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "### BEGIN SOLUTION\n", "answer1.11 <- times[times < 30]\n", "### END SOLUTION" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "lines_to_next_cell": 2, "name": "test-1-9", "nbgrader": { "grade": true, "grade_id": "cell-84c765a59f939e4a", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"success!\"\n" ] } ], "source": [ "test_that(\"Answer check\", {\n", " expect_identical(\n", " digest(answer1.11),\n", " \"547b9ded5983c354d5684dbfa0909ceb\"\n", " )\n", "})\n", "print(\"success!\")" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-000a9cf18a089909", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**QUESTION 1.12**\n", "\n", "After all of that, did the `times` object change at all?\n", "\n", "Multiple Choice:\n", "\n", "A) yes\n", "\n", "B) no\n", "\n", "C) not sure\n", "\n", "Store your answer (e.g. 
the letter corresponding to the correct option) in an object called `answer1.12`.\n", "\n", "```\n", "answer1.12 <- FILL_THIS_IN\n", "```" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "name": "question-1-10", "nbgrader": { "grade": false, "grade_id": "cell-7ab202b71ea51f36", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "#answer1.12 <- ...\n", "### BEGIN SOLUTION\n", "answer1.12 <- \"B\"\n", "### END SOLUTION" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "nbgrader": { "grade": true, "grade_id": "cell-b58cf9a17dd5dae2", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"success!\"\n" ] } ], "source": [ "test_that(\"Answer check\", {\n", " expect_identical(\n", " digest(toupper(as.character(answer1.12))),\n", " \"3a5505c06543876fe45598b5e5e5195d\"\n", " )\n", "})\n", "print(\"success!\")" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-bd15376b48e4811b", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**QUESTION 1.13**\n", "\n", "This is a bit of challenge, but I bet you can do it. 
Try using `[]` in conjunction with `<-` to change the `times` objects by replacing the 2nd and 3rd entries with 2 new travel times of your choosing.\n", "\n", "(Before you do that, allow us to store the original `times` object for autograding!)" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-cd7661342aaca022", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "times_old <- times" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-9394243633df3245", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "Now, answer away!\n", "\n", "```\n", "FILL_THIS_IN[FILL_THIS_IN(FILL_THIS_IN, FILL_THIS_IN)] <- FILL_THIS_IN(FILL_THIS_IN, FILL_THIS_IN)\n", "```" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "name": "question-1-11", "nbgrader": { "grade": false, "grade_id": "cell-2e19fbd54689c8f5", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "### BEGIN SOLUTION\n", "times[c(2, 3)] <- c(10, 12) # as an example, your answer could be different!\n", "### END SOLUTION" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "lines_to_next_cell": 2, "name": "test-1-11", "nbgrader": { "grade": true, "grade_id": "cell-6db10ac9c23cc3bd", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"success!\"\n" ] } ], "source": [ "# Test that `times` still has length 10, and that the entries are the\n", "# same except for the 2nd and 3rd.\n", "test_that(\"Answer check\", {\n", " expect_identical(length(times), 10L)\n", " expect_identical(times_old[-c(2, 3)], times[-c(2, 3)])\n", " expect_true(times_old[2] != times[2])\n", " expect_true(times_old[3] != times[3])\n", "})\n", "print(\"success!\")" ] }, { "cell_type": 
"markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-ff59e02a034653c1", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "### 1.6 Functions\n", "\n", "Functions are one of the fundamental building blocks of the R language. They are small pieces of reusable code that can be treated like any other R object. Functions are easily recognizable because they are usually characterized by their name followed by parenthesis. For example, if there was a function that could make bread, it would look like this: bread().\n", "\n", "**QUESTION 1.14**\n", "\n", "You have actually already used 3 functions in this worksheet before being formally introduced to what a function is. Can you recall if any of these functions have been used in this worksheet already?\n", "\n", "A) `c()`\n", "\n", "B) `mean()`\n", "\n", "C) `typeof()`\n", "\n", "D) `length()`\n", "\n", "**Hint:** More than 1 answer may be correct -- make a vector of all of the correct ones!\n", "\n", "```\n", "answer1.11 <- c(\"FILL_THIS_IN\", \"FILL_THIS_IN\", ...)\n", "```" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "name": "answer-1-11", "nbgrader": { "grade": false, "grade_id": "cell-9fcc0142a463e216", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# answer1.11 <- youranswer\n", "### BEGIN SOLUTION\n", "answer1.11 <- c(\"A\", \"C\", \"D\")\n", "### END SOLUTION" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "nbgrader": { "grade": true, "grade_id": "cell-e6eed02e488ad154", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"success!\"\n" ] } ], "source": [ "test_that(\"Answer check\", {\n", " expect_identical(\n", " digest(sort(toupper(as.character(answer1.11)))),\n", " \"82baea6f032c2c9fa74e85f8b379f021\"\n", " )\n", "})\n", "print(\"success!\")" ] }, { 
"cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-ee34dc4a43ba3467", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "There are tens of thousands of functions that one can use in R, which seems a bit large for this worksheet. Let's explore a few basic functions just for fun. Type `Sys.Date()` below to see what happens!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "name": "activity-1-9" }, "outputs": [], "source": [ "# your code here" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-97dbbd0ff341ffff", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "Remember that there are different types of vectors, besides numeric and logical? Well, the output of `Sys.Date()` is actually an example of another vector type, known in R language as a \"string\". A \"string\" is just a character (any value written within a pair of single or double quotes in R) variable that contains one or more characters! \n", "\n", "The value that `Sys.Date()` computes is based on your computer's environment, but functions in R can also manipulate input data in order to compute a return value. At the start of this worksheet, you were introduced to the simplest form of R - as a calculator. Actually, R functions allow us to compute certain things that could be done manually as a calculator, but much faster.\n", "\n", "Recall the `times` vector earlier. What's the average travel time? Instead of computing this manually, we can use a function called `mean`. 
" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "name": "example-1-12", "nbgrader": { "grade": false, "grade_id": "cell-49301c151f447382", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "data": { "text/html": [ "28.8" ], "text/latex": [ "28.8" ], "text/markdown": [ "28.8" ], "text/plain": [ "[1] 28.8" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "mean(times)" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-9d07b29f76367267", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "Notice the syntax of using a function: starting by the left with the *function name*, and the *input* goes inside brackets. We *input* times, and we got an *output*. Did this function change the *input*? Check:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# your code here" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-ade6a4aff0947e9f", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**QUESTION 1.15**\n", "\n", "Aside from bizarre functions, this is always the case. But functions don't always return a single value. 
Try the `range()` function (assigning the result to `answer1.15a`), and the `sqrt()` function (assigning the result to `answer1.15b`), using the `times` vector as an argument for both.\n", "\n", "```\n", "answer1.15a <- FILL_THIS_IN(FILL_THIS_IN)\n", "answer1.15b <- FILL_THIS_IN(FILL_THIS_IN)\n", "```" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "lines_to_next_cell": 2, "name": "question-1-12", "nbgrader": { "grade": false, "grade_id": "cell-b47579d0c904248a", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "### BEGIN SOLUTION\n", "answer1.15a <- range(times)\n", "answer1.15b <- sqrt(times)\n", "### END SOLUTION" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "name": "answer-1-12", "nbgrader": { "grade": true, "grade_id": "cell-4c85f43d6bfd6e06", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"success!\"\n" ] } ], "source": [ "test_that(\"Answer check\", {\n", " expect_identical(answer1.15a, range(times))\n", " expect_identical(answer1.15b, sqrt(times)) \n", "})\n", "print(\"success!\")" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-d1c9d42b92296b01", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "Functions can also take more than one argument as input, separated by commas. You can find out what these arguments are by accessing the function’s documentation, which you can do by executing `?\"function name\"`. Try accessing the documentation of the `mean()` function by executing `?mean`." 
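, "\n", "\n", "As a rough illustration (not part of the original worksheet) of passing more than one argument, the sketch below calls `mean()` with a second argument, `trim`, first by name and then by position. The vector `v` and the names `by_name` and `by_position` are made up for the example; `formals()` is a base R helper that lists a function's arguments without opening the help page.\n", "\n", "```r\n", "# Hypothetical example vector with one extreme value\n", "v <- c(1, 2, 3, 4, 100)\n", "\n", "# Second argument given by name\n", "by_name <- mean(v, trim = 0.2)\n", "\n", "# Same call, with the second argument matched by position\n", "by_position <- mean(v, 0.2)\n", "\n", "# Both calls give the same result\n", "identical(by_name, by_position)\n", "\n", "# List the argument names of the default mean method\n", "names(formals(mean.default))\n", "```\n", "\n", "Naming the argument makes the call easier to read, since a bare `0.2` says little about what it controls."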
] }, { "cell_type": "code", "execution_count": null, "metadata": { "name": "activity-1-10" }, "outputs": [], "source": [ "# your code here" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-e94a18f9bb776edb", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "The default method has four arguments: `x`, `trim`, `na.rm`, and `...`. All the arguments have names, except for the `...` argument (more on `...` -- the ellipsis -- later). This is always the case.\n", "\n", "Under \"Usage\", some of the arguments are of the form `name = value`. Here, `value` is the default that is used if you don't specify that argument. This is a sure sign that these arguments are optional.\n", "\n", "`x` is \"on its own\". This typically means that it has no default, and often (but not always) means that the argument is required. We can specify an argument in one of two ways:\n", "\n", "+ specifying `name = value` in the function parentheses; or\n", "+ matching the order of the inputs to the order of the arguments.\n", "\n", "For readability, matching by order is not recommended beyond the first or sometimes second argument!\n", "\n", "**QUESTION 1.16**\n", "\n", "Try executing `mean()` again with `times` as an argument, but this time, set the `na.rm` argument to `TRUE`. 
Store the result in a variable named `answer1.16`.\n", "\n", "```\n", "answer1.16 <- FILL_THIS_IN(FILL_THIS_IN, FILL_THIS_IN = FILL_THIS_IN)\n", "```" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "name": "question-1-13", "nbgrader": { "grade": false, "grade_id": "cell-de91816819964158", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "### BEGIN SOLUTION\n", "answer1.16 <- mean(times, na.rm = TRUE)\n", "### END SOLUTION" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "name": "answer-1-13", "nbgrader": { "grade": true, "grade_id": "cell-ae3bba6dd9e77802", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"success!\"\n" ] } ], "source": [ "test_that(\"Answer check\", {\n", " expect_identical(answer1.16, mean(times, na.rm = TRUE))\n", "})\n", "print(\"success!\")" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-3a3b2d082a935f90", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**QUESTION 1.17**\n", "\n", "The mean is the same, because there are no `NA` values in the vector `times`. 
Put your subsetting knowledge into practice by replacing the third entry of the `times` vector by a missing value (`NA`).\n", "\n", "**Hint**: Your solution starts with `times`.\n", "\n", "```\n", "FILL_THIS_IN[FILL_THIS_IN] <- NA\n", "```" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "name": "question-1-14", "nbgrader": { "grade": false, "grade_id": "cell-342edc09f72e7a8c", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [ { "ename": "ERROR", "evalue": "Error in times[3] <- NA: object 'times' not found\n", "output_type": "error", "traceback": [ "Error in times[3] <- NA: object 'times' not found\nTraceback:\n" ] } ], "source": [ "### BEGIN SOLUTION\n", "times[3] <- NA\n", "### END SOLUTION" ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "name": "answer-1-14", "nbgrader": { "grade": true, "grade_id": "cell-6cde747b897599a4", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"success!\"\n" ] } ], "source": [ "test_that(\"Answer check\", {\n", " expect_identical(which(is.na(times)), 3L)\n", "})\n", "print(\"success!\")" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-d43cdab8bc89c4b3", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**QUESTION 1.18**\n", "\n", "Now, try executing `mean()` specifying `na.rm` as `TRUE` again (with `times` as an input). 
Store the output in a variable named `answer1.18`.\n", "\n", "```\n", "answer1.18 <- FILL_THIS_IN(FILL_THIS_IN, FILL_THIS_IN = FILL_THIS_IN)\n", "```" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "name": "question-1-15", "nbgrader": { "grade": false, "grade_id": "cell-796c60eb34c7f758", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "### BEGIN SOLUTION\n", "answer1.18 <- mean(times, na.rm = TRUE)\n", "### END SOLUTION" ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "name": "answer-1-15", "nbgrader": { "grade": true, "grade_id": "cell-ad7598274cbe0104", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"success!\"\n" ] } ], "source": [ "test_that(\"Answer check\", {\n", " expect_identical(answer1.18, mean(times, na.rm = TRUE))\n", "})\n", "print(\"success!\")" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-12b47a9a552cf299", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "Notice how the output changes. What if you try setting `na.rm` as `FALSE` instead?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "name": "activity-1-11" }, "outputs": [], "source": [ "# your code here" ] } ], "metadata": { "celltoolbar": "Create Assignment", "jupytext": { "cell_metadata_filter": "warning,name,message,-all", "notebook_metadata_filter": "-all", "text_representation": { "extension": ".Rmd", "format_name": "rmarkdown" } }, "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "4.0.2" } }, "nbformat": 4, "nbformat_minor": 4 }