{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "4758928a",
   "metadata": {},
   "source": [
    "In this tutorial we are going to learn the library **dplyr**, which is really useful for transforming the data (e.g. create new variables, reorder observations, etc)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2bacfb97",
   "metadata": {},
   "source": [
    "# Loading dplyr"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f3203cc6",
   "metadata": {},
   "source": [
    "Like **ggplot2**, dplyr is also part of **tidyverse**. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "2feeed5b",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "── \u001b[1mAttaching packages\u001b[22m ─────────────────────────────────────── tidyverse 1.3.2 ──\n",
      "\u001b[32m✔\u001b[39m \u001b[34mggplot2\u001b[39m 3.4.0      \u001b[32m✔\u001b[39m \u001b[34mpurrr  \u001b[39m 1.0.1 \n",
      "\u001b[32m✔\u001b[39m \u001b[34mtibble \u001b[39m 3.1.8      \u001b[32m✔\u001b[39m \u001b[34mdplyr  \u001b[39m 1.0.10\n",
      "\u001b[32m✔\u001b[39m \u001b[34mtidyr  \u001b[39m 1.2.1      \u001b[32m✔\u001b[39m \u001b[34mstringr\u001b[39m 1.5.0 \n",
      "\u001b[32m✔\u001b[39m \u001b[34mreadr  \u001b[39m 2.1.3      \u001b[32m✔\u001b[39m \u001b[34mforcats\u001b[39m 0.5.2 \n",
      "── \u001b[1mConflicts\u001b[22m ────────────────────────────────────────── tidyverse_conflicts() ──\n",
      "\u001b[31m✖\u001b[39m \u001b[34mdplyr\u001b[39m::\u001b[32mfilter()\u001b[39m masks \u001b[34mstats\u001b[39m::filter()\n",
      "\u001b[31m✖\u001b[39m \u001b[34mdplyr\u001b[39m::\u001b[32mlag()\u001b[39m    masks \u001b[34mstats\u001b[39m::lag()\n"
     ]
    }
   ],
   "source": [
    "library(tidyverse)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "3c97df2a",
   "metadata": {},
   "outputs": [],
   "source": [
    "#sat.dat<-read.csv(\"https://vincentarelbundock.github.io/Rdatasets/csv/psych/sat.act.csv\")\n",
    "sat.dat<-read.csv(\"../../datasets/sat.act.csv\")\n",
    "sat.dat$gender<- as.factor( sat.dat$gender )\n",
    "sat.dat$education<- as.factor( sat.dat$education )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "85722a96",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style>\n",
       ".list-inline {list-style: none; margin:0; padding: 0}\n",
       ".list-inline>li {display: inline-block}\n",
       ".list-inline>li:not(:last-child)::after {content: \"\\00b7\"; padding: 0 .5ex}\n",
       "</style>\n",
       "<ol class=list-inline><li>700</li><li>7</li></ol>\n"
      ],
      "text/latex": [
       "\\begin{enumerate*}\n",
       "\\item 700\n",
       "\\item 7\n",
       "\\end{enumerate*}\n"
      ],
      "text/markdown": [
       "1. 700\n",
       "2. 7\n",
       "\n",
       "\n"
      ],
      "text/plain": [
       "[1] 700   7"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Like we learned, this gives the number of rows and columns of the data frame \n",
    "dim(sat.dat)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "e4b61558",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "       X         gender  education      age             ACT       \n",
       " Min.   :29442   1:247   0: 57     Min.   :13.00   Min.   : 3.00  \n",
       " 1st Qu.:32117   2:453   1: 45     1st Qu.:19.00   1st Qu.:25.00  \n",
       " Median :34896           2: 44     Median :22.00   Median :29.00  \n",
       " Mean   :34731           3:275     Mean   :25.59   Mean   :28.55  \n",
       " 3rd Qu.:37250           4:138     3rd Qu.:29.00   3rd Qu.:32.00  \n",
       " Max.   :39985           5:141     Max.   :65.00   Max.   :36.00  \n",
       "                                                                  \n",
       "      SATV            SATQ      \n",
       " Min.   :200.0   Min.   :200.0  \n",
       " 1st Qu.:550.0   1st Qu.:530.0  \n",
       " Median :620.0   Median :620.0  \n",
       " Mean   :612.2   Mean   :610.2  \n",
       " 3rd Qu.:700.0   3rd Qu.:700.0  \n",
       " Max.   :800.0   Max.   :800.0  \n",
       "                 NA's   :13     "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# And this, a summary of our data frame\n",
    "summary(sat.dat)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "76f452da",
   "metadata": {},
   "source": [
    "# `filter`: Pick observations by their values\n",
    "\n",
    "This function allows you to subset observations based on specific conditions. The first argument is the name of the data frame. The second and subsequent arguments are the expressions that filter the data frame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "d708d9db",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style>\n",
       ".list-inline {list-style: none; margin:0; padding: 0}\n",
       ".list-inline>li {display: inline-block}\n",
       ".list-inline>li:not(:last-child)::after {content: \"\\00b7\"; padding: 0 .5ex}\n",
       "</style>\n",
       "<ol class=list-inline><li>247</li><li>7</li></ol>\n"
      ],
      "text/latex": [
       "\\begin{enumerate*}\n",
       "\\item 247\n",
       "\\item 7\n",
       "\\end{enumerate*}\n"
      ],
      "text/markdown": [
       "1. 247\n",
       "2. 7\n",
       "\n",
       "\n"
      ],
      "text/plain": [
       "[1] 247   7"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Filter data for gender = 1\n",
    "sat.dat.filter<-filter(sat.dat, gender == 1)\n",
    "\n",
    "dim(sat.dat.filter)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "5d72b712",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style>\n",
       ".list-inline {list-style: none; margin:0; padding: 0}\n",
       ".list-inline>li {display: inline-block}\n",
       ".list-inline>li:not(:last-child)::after {content: \"\\00b7\"; padding: 0 .5ex}\n",
       "</style>\n",
       "<ol class=list-inline><li>146</li><li>7</li></ol>\n"
      ],
      "text/latex": [
       "\\begin{enumerate*}\n",
       "\\item 146\n",
       "\\item 7\n",
       "\\end{enumerate*}\n"
      ],
      "text/markdown": [
       "1. 146\n",
       "2. 7\n",
       "\n",
       "\n"
      ],
      "text/plain": [
       "[1] 146   7"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Filter data for gender = 1 and age less than 25\n",
    "sat.dat.filter<-filter(sat.dat, gender == 1, age<25)\n",
    "dim(sat.dat.filter)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "9dd91a01",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "       X         gender  education      age             ACT       \n",
       " Min.   :29503   1: 53   0:16      Min.   :15.00   Min.   :26.00  \n",
       " 1st Qu.:31413   2:134   1:12      1st Qu.:19.00   1st Qu.:27.00  \n",
       " Median :34356           2:12      Median :23.00   Median :28.00  \n",
       " Mean   :34323           3:67      Mean   :25.11   Mean   :27.55  \n",
       " 3rd Qu.:36680           4:40      3rd Qu.:28.00   3rd Qu.:28.00  \n",
       " Max.   :39937           5:40      Max.   :58.00   Max.   :29.00  \n",
       "                                                                  \n",
       "      SATV            SATQ      \n",
       " Min.   :400.0   Min.   :430.0  \n",
       " 1st Qu.:550.0   1st Qu.:532.5  \n",
       " Median :600.0   Median :600.0  \n",
       " Mean   :608.1   Mean   :599.8  \n",
       " 3rd Qu.:670.0   3rd Qu.:660.0  \n",
       " Max.   :800.0   Max.   :800.0  \n",
       "                 NA's   :5      "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Filter observations within a range of ACT. Check that filtering worked by getting a summary of the filtered data frame. \n",
    "sat.dat.filter<-filter(sat.dat, (ACT > 25 & ACT < 30))\n",
    "summary(sat.dat.filter)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ce88ce1e",
   "metadata": {},
   "source": [
    "# `arrange`: Reorder the rows\n",
    "\n",
    "This function takes a data frame, and a set of column names (or more complicated expressions) to order by (in ascending order by default). If more than one column name is provided, each additional column is used to break ties in the values of preceding columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "5c5ec278",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"dataframe\">\n",
       "<caption>A data.frame: 6 × 7</caption>\n",
       "<thead>\n",
       "\t<tr><th></th><th scope=col>X</th><th scope=col>gender</th><th scope=col>education</th><th scope=col>age</th><th scope=col>ACT</th><th scope=col>SATV</th><th scope=col>SATQ</th></tr>\n",
       "\t<tr><th></th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;fct&gt;</th><th scope=col>&lt;fct&gt;</th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;int&gt;</th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "\t<tr><th scope=row>1</th><td>36093</td><td>1</td><td>2</td><td>22</td><td> 3</td><td>200</td><td>400</td></tr>\n",
       "\t<tr><th scope=row>2</th><td>30591</td><td>2</td><td>2</td><td>27</td><td>15</td><td>375</td><td>375</td></tr>\n",
       "\t<tr><th scope=row>3</th><td>36906</td><td>2</td><td>0</td><td>18</td><td>15</td><td>470</td><td>500</td></tr>\n",
       "\t<tr><th scope=row>4</th><td>37398</td><td>1</td><td>1</td><td>17</td><td>15</td><td>600</td><td>596</td></tr>\n",
       "\t<tr><th scope=row>5</th><td>36763</td><td>1</td><td>1</td><td>23</td><td>15</td><td>600</td><td>600</td></tr>\n",
       "\t<tr><th scope=row>6</th><td>34578</td><td>2</td><td>3</td><td>18</td><td>16</td><td>430</td><td>350</td></tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/latex": [
       "A data.frame: 6 × 7\n",
       "\\begin{tabular}{r|lllllll}\n",
       "  & X & gender & education & age & ACT & SATV & SATQ\\\\\n",
       "  & <int> & <fct> & <fct> & <int> & <int> & <int> & <int>\\\\\n",
       "\\hline\n",
       "\t1 & 36093 & 1 & 2 & 22 &  3 & 200 & 400\\\\\n",
       "\t2 & 30591 & 2 & 2 & 27 & 15 & 375 & 375\\\\\n",
       "\t3 & 36906 & 2 & 0 & 18 & 15 & 470 & 500\\\\\n",
       "\t4 & 37398 & 1 & 1 & 17 & 15 & 600 & 596\\\\\n",
       "\t5 & 36763 & 1 & 1 & 23 & 15 & 600 & 600\\\\\n",
       "\t6 & 34578 & 2 & 3 & 18 & 16 & 430 & 350\\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "\n",
       "A data.frame: 6 × 7\n",
       "\n",
       "| <!--/--> | X &lt;int&gt; | gender &lt;fct&gt; | education &lt;fct&gt; | age &lt;int&gt; | ACT &lt;int&gt; | SATV &lt;int&gt; | SATQ &lt;int&gt; |\n",
       "|---|---|---|---|---|---|---|---|\n",
       "| 1 | 36093 | 1 | 2 | 22 |  3 | 200 | 400 |\n",
       "| 2 | 30591 | 2 | 2 | 27 | 15 | 375 | 375 |\n",
       "| 3 | 36906 | 2 | 0 | 18 | 15 | 470 | 500 |\n",
       "| 4 | 37398 | 1 | 1 | 17 | 15 | 600 | 596 |\n",
       "| 5 | 36763 | 1 | 1 | 23 | 15 | 600 | 600 |\n",
       "| 6 | 34578 | 2 | 3 | 18 | 16 | 430 | 350 |\n",
       "\n"
      ],
      "text/plain": [
       "  X     gender education age ACT SATV SATQ\n",
       "1 36093 1      2         22   3  200  400 \n",
       "2 30591 2      2         27  15  375  375 \n",
       "3 36906 2      0         18  15  470  500 \n",
       "4 37398 1      1         17  15  600  596 \n",
       "5 36763 1      1         23  15  600  600 \n",
       "6 34578 2      3         18  16  430  350 "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# This order first based in ACT, then SATQ and finally SATV\n",
    "sat.dat.arrange<-arrange(sat.dat, ACT, SATQ, SATV)\n",
    "head(sat.dat.arrange)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "31fce8b6",
   "metadata": {},
   "source": [
    "If we want to reorder in descending order, we can use the function `desc`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "78bb4fdb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"dataframe\">\n",
       "<caption>A data.frame: 6 × 7</caption>\n",
       "<thead>\n",
       "\t<tr><th></th><th scope=col>X</th><th scope=col>gender</th><th scope=col>education</th><th scope=col>age</th><th scope=col>ACT</th><th scope=col>SATV</th><th scope=col>SATQ</th></tr>\n",
       "\t<tr><th></th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;fct&gt;</th><th scope=col>&lt;fct&gt;</th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;int&gt;</th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "\t<tr><th scope=row>1</th><td>31080</td><td>2</td><td>4</td><td>33</td><td>36</td><td>500</td><td>500</td></tr>\n",
       "\t<tr><th scope=row>2</th><td>29527</td><td>2</td><td>5</td><td>30</td><td>36</td><td>610</td><td>500</td></tr>\n",
       "\t<tr><th scope=row>3</th><td>33539</td><td>2</td><td>3</td><td>31</td><td>36</td><td>700</td><td>500</td></tr>\n",
       "\t<tr><th scope=row>4</th><td>32697</td><td>2</td><td>2</td><td>26</td><td>36</td><td>550</td><td>555</td></tr>\n",
       "\t<tr><th scope=row>5</th><td>30129</td><td>2</td><td>5</td><td>30</td><td>36</td><td>660</td><td>610</td></tr>\n",
       "\t<tr><th scope=row>6</th><td>39651</td><td>2</td><td>4</td><td>28</td><td>36</td><td>590</td><td>620</td></tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/latex": [
       "A data.frame: 6 × 7\n",
       "\\begin{tabular}{r|lllllll}\n",
       "  & X & gender & education & age & ACT & SATV & SATQ\\\\\n",
       "  & <int> & <fct> & <fct> & <int> & <int> & <int> & <int>\\\\\n",
       "\\hline\n",
       "\t1 & 31080 & 2 & 4 & 33 & 36 & 500 & 500\\\\\n",
       "\t2 & 29527 & 2 & 5 & 30 & 36 & 610 & 500\\\\\n",
       "\t3 & 33539 & 2 & 3 & 31 & 36 & 700 & 500\\\\\n",
       "\t4 & 32697 & 2 & 2 & 26 & 36 & 550 & 555\\\\\n",
       "\t5 & 30129 & 2 & 5 & 30 & 36 & 660 & 610\\\\\n",
       "\t6 & 39651 & 2 & 4 & 28 & 36 & 590 & 620\\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "\n",
       "A data.frame: 6 × 7\n",
       "\n",
       "| <!--/--> | X &lt;int&gt; | gender &lt;fct&gt; | education &lt;fct&gt; | age &lt;int&gt; | ACT &lt;int&gt; | SATV &lt;int&gt; | SATQ &lt;int&gt; |\n",
       "|---|---|---|---|---|---|---|---|\n",
       "| 1 | 31080 | 2 | 4 | 33 | 36 | 500 | 500 |\n",
       "| 2 | 29527 | 2 | 5 | 30 | 36 | 610 | 500 |\n",
       "| 3 | 33539 | 2 | 3 | 31 | 36 | 700 | 500 |\n",
       "| 4 | 32697 | 2 | 2 | 26 | 36 | 550 | 555 |\n",
       "| 5 | 30129 | 2 | 5 | 30 | 36 | 660 | 610 |\n",
       "| 6 | 39651 | 2 | 4 | 28 | 36 | 590 | 620 |\n",
       "\n"
      ],
      "text/plain": [
       "  X     gender education age ACT SATV SATQ\n",
       "1 31080 2      4         33  36  500  500 \n",
       "2 29527 2      5         30  36  610  500 \n",
       "3 33539 2      3         31  36  700  500 \n",
       "4 32697 2      2         26  36  550  555 \n",
       "5 30129 2      5         30  36  660  610 \n",
       "6 39651 2      4         28  36  590  620 "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "sat.dat.arrange<-arrange(sat.dat, desc(ACT), SATQ, SATV)\n",
    "head(sat.dat.arrange)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2fd0549c",
   "metadata": {},
   "source": [
    "# `select`: Pick variables by their names\n",
    "\n",
    "It allows you to pick a subset of variables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "111402d5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"dataframe\">\n",
       "<caption>A data.frame: 6 × 3</caption>\n",
       "<thead>\n",
       "\t<tr><th></th><th scope=col>ACT</th><th scope=col>SATQ</th><th scope=col>SATV</th></tr>\n",
       "\t<tr><th></th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;int&gt;</th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "\t<tr><th scope=row>1</th><td>24</td><td>500</td><td>500</td></tr>\n",
       "\t<tr><th scope=row>2</th><td>35</td><td>500</td><td>600</td></tr>\n",
       "\t<tr><th scope=row>3</th><td>21</td><td>470</td><td>480</td></tr>\n",
       "\t<tr><th scope=row>4</th><td>26</td><td>520</td><td>550</td></tr>\n",
       "\t<tr><th scope=row>5</th><td>31</td><td>550</td><td>600</td></tr>\n",
       "\t<tr><th scope=row>6</th><td>28</td><td>640</td><td>640</td></tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/latex": [
       "A data.frame: 6 × 3\n",
       "\\begin{tabular}{r|lll}\n",
       "  & ACT & SATQ & SATV\\\\\n",
       "  & <int> & <int> & <int>\\\\\n",
       "\\hline\n",
       "\t1 & 24 & 500 & 500\\\\\n",
       "\t2 & 35 & 500 & 600\\\\\n",
       "\t3 & 21 & 470 & 480\\\\\n",
       "\t4 & 26 & 520 & 550\\\\\n",
       "\t5 & 31 & 550 & 600\\\\\n",
       "\t6 & 28 & 640 & 640\\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "\n",
       "A data.frame: 6 × 3\n",
       "\n",
       "| <!--/--> | ACT &lt;int&gt; | SATQ &lt;int&gt; | SATV &lt;int&gt; |\n",
       "|---|---|---|---|\n",
       "| 1 | 24 | 500 | 500 |\n",
       "| 2 | 35 | 500 | 600 |\n",
       "| 3 | 21 | 470 | 480 |\n",
       "| 4 | 26 | 520 | 550 |\n",
       "| 5 | 31 | 550 | 600 |\n",
       "| 6 | 28 | 640 | 640 |\n",
       "\n"
      ],
      "text/plain": [
       "  ACT SATQ SATV\n",
       "1 24  500  500 \n",
       "2 35  500  600 \n",
       "3 21  470  480 \n",
       "4 26  520  550 \n",
       "5 31  550  600 \n",
       "6 28  640  640 "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# This selects columns by the given names of the columns. Here we just select ACT, SATQ and SATV\n",
    "sat.dat.select<-select(sat.dat, ACT, SATQ, SATV)\n",
    "\n",
    "head(sat.dat.select)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e2e460f0",
   "metadata": {},
   "source": [
    "If you want to select consecutive columns, we can always use the operator `:`, like with vectors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "047832aa",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"dataframe\">\n",
       "<caption>A data.frame: 6 × 4</caption>\n",
       "<thead>\n",
       "\t<tr><th></th><th scope=col>gender</th><th scope=col>education</th><th scope=col>age</th><th scope=col>ACT</th></tr>\n",
       "\t<tr><th></th><th scope=col>&lt;fct&gt;</th><th scope=col>&lt;fct&gt;</th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;int&gt;</th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "\t<tr><th scope=row>1</th><td>2</td><td>3</td><td>19</td><td>24</td></tr>\n",
       "\t<tr><th scope=row>2</th><td>2</td><td>3</td><td>23</td><td>35</td></tr>\n",
       "\t<tr><th scope=row>3</th><td>2</td><td>3</td><td>20</td><td>21</td></tr>\n",
       "\t<tr><th scope=row>4</th><td>1</td><td>4</td><td>27</td><td>26</td></tr>\n",
       "\t<tr><th scope=row>5</th><td>1</td><td>2</td><td>33</td><td>31</td></tr>\n",
       "\t<tr><th scope=row>6</th><td>1</td><td>5</td><td>26</td><td>28</td></tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/latex": [
       "A data.frame: 6 × 4\n",
       "\\begin{tabular}{r|llll}\n",
       "  & gender & education & age & ACT\\\\\n",
       "  & <fct> & <fct> & <int> & <int>\\\\\n",
       "\\hline\n",
       "\t1 & 2 & 3 & 19 & 24\\\\\n",
       "\t2 & 2 & 3 & 23 & 35\\\\\n",
       "\t3 & 2 & 3 & 20 & 21\\\\\n",
       "\t4 & 1 & 4 & 27 & 26\\\\\n",
       "\t5 & 1 & 2 & 33 & 31\\\\\n",
       "\t6 & 1 & 5 & 26 & 28\\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "\n",
       "A data.frame: 6 × 4\n",
       "\n",
       "| <!--/--> | gender &lt;fct&gt; | education &lt;fct&gt; | age &lt;int&gt; | ACT &lt;int&gt; |\n",
       "|---|---|---|---|---|\n",
       "| 1 | 2 | 3 | 19 | 24 |\n",
       "| 2 | 2 | 3 | 23 | 35 |\n",
       "| 3 | 2 | 3 | 20 | 21 |\n",
       "| 4 | 1 | 4 | 27 | 26 |\n",
       "| 5 | 1 | 2 | 33 | 31 |\n",
       "| 6 | 1 | 5 | 26 | 28 |\n",
       "\n"
      ],
      "text/plain": [
       "  gender education age ACT\n",
       "1 2      3         19  24 \n",
       "2 2      3         23  35 \n",
       "3 2      3         20  21 \n",
       "4 1      4         27  26 \n",
       "5 1      2         33  31 \n",
       "6 1      5         26  28 "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# This selects all columns between gender and ACT (inclusive)\n",
    "sat.dat.select<-select(sat.dat, gender:ACT)\n",
    "\n",
    "head(sat.dat.select)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "285a30ea",
   "metadata": {},
   "source": [
    "Instead of selecting columns, you can always use a minus (-) sign before the name of the columns to filter them out."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "4f454cc7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"dataframe\">\n",
       "<caption>A data.frame: 6 × 3</caption>\n",
       "<thead>\n",
       "\t<tr><th></th><th scope=col>X</th><th scope=col>SATV</th><th scope=col>SATQ</th></tr>\n",
       "\t<tr><th></th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;int&gt;</th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "\t<tr><th scope=row>1</th><td>29442</td><td>500</td><td>500</td></tr>\n",
       "\t<tr><th scope=row>2</th><td>29457</td><td>600</td><td>500</td></tr>\n",
       "\t<tr><th scope=row>3</th><td>29498</td><td>480</td><td>470</td></tr>\n",
       "\t<tr><th scope=row>4</th><td>29503</td><td>550</td><td>520</td></tr>\n",
       "\t<tr><th scope=row>5</th><td>29504</td><td>600</td><td>550</td></tr>\n",
       "\t<tr><th scope=row>6</th><td>29518</td><td>640</td><td>640</td></tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/latex": [
       "A data.frame: 6 × 3\n",
       "\\begin{tabular}{r|lll}\n",
       "  & X & SATV & SATQ\\\\\n",
       "  & <int> & <int> & <int>\\\\\n",
       "\\hline\n",
       "\t1 & 29442 & 500 & 500\\\\\n",
       "\t2 & 29457 & 600 & 500\\\\\n",
       "\t3 & 29498 & 480 & 470\\\\\n",
       "\t4 & 29503 & 550 & 520\\\\\n",
       "\t5 & 29504 & 600 & 550\\\\\n",
       "\t6 & 29518 & 640 & 640\\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "\n",
       "A data.frame: 6 × 3\n",
       "\n",
       "| <!--/--> | X &lt;int&gt; | SATV &lt;int&gt; | SATQ &lt;int&gt; |\n",
       "|---|---|---|---|\n",
       "| 1 | 29442 | 500 | 500 |\n",
       "| 2 | 29457 | 600 | 500 |\n",
       "| 3 | 29498 | 480 | 470 |\n",
       "| 4 | 29503 | 550 | 520 |\n",
       "| 5 | 29504 | 600 | 550 |\n",
       "| 6 | 29518 | 640 | 640 |\n",
       "\n"
      ],
      "text/plain": [
       "  X     SATV SATQ\n",
       "1 29442 500  500 \n",
       "2 29457 600  500 \n",
       "3 29498 480  470 \n",
       "4 29503 550  520 \n",
       "5 29504 600  550 \n",
       "6 29518 640  640 "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# This selects all columns byt those betweena gender and ACT (inclusive)\n",
    "sat.dat.select<-select(sat.dat, -(gender:ACT))\n",
    "\n",
    "head(sat.dat.select)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9f5d8319",
   "metadata": {},
   "source": [
    "# `mutate`: Create new variables\n",
    "\n",
    "This function allows you to add new columns to data frames. These new columns can be set to any fixed value, a passed vector or, more interestingly and useful, from the existing columns in the data frame. New columns are always added at the end of the data frame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "23f841fa",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"dataframe\">\n",
       "<caption>A data.frame: 6 × 8</caption>\n",
       "<thead>\n",
       "\t<tr><th></th><th scope=col>X</th><th scope=col>gender</th><th scope=col>education</th><th scope=col>age</th><th scope=col>ACT</th><th scope=col>SATV</th><th scope=col>SATQ</th><th scope=col>qvdiff</th></tr>\n",
       "\t<tr><th></th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;fct&gt;</th><th scope=col>&lt;fct&gt;</th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;int&gt;</th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "\t<tr><th scope=row>1</th><td>29442</td><td>2</td><td>3</td><td>19</td><td>24</td><td>500</td><td>500</td><td>   0</td></tr>\n",
       "\t<tr><th scope=row>2</th><td>29457</td><td>2</td><td>3</td><td>23</td><td>35</td><td>600</td><td>500</td><td>-100</td></tr>\n",
       "\t<tr><th scope=row>3</th><td>29498</td><td>2</td><td>3</td><td>20</td><td>21</td><td>480</td><td>470</td><td> -10</td></tr>\n",
       "\t<tr><th scope=row>4</th><td>29503</td><td>1</td><td>4</td><td>27</td><td>26</td><td>550</td><td>520</td><td> -30</td></tr>\n",
       "\t<tr><th scope=row>5</th><td>29504</td><td>1</td><td>2</td><td>33</td><td>31</td><td>600</td><td>550</td><td> -50</td></tr>\n",
       "\t<tr><th scope=row>6</th><td>29518</td><td>1</td><td>5</td><td>26</td><td>28</td><td>640</td><td>640</td><td>   0</td></tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/latex": [
       "A data.frame: 6 × 8\n",
       "\\begin{tabular}{r|llllllll}\n",
       "  & X & gender & education & age & ACT & SATV & SATQ & qvdiff\\\\\n",
       "  & <int> & <fct> & <fct> & <int> & <int> & <int> & <int> & <int>\\\\\n",
       "\\hline\n",
       "\t1 & 29442 & 2 & 3 & 19 & 24 & 500 & 500 &    0\\\\\n",
       "\t2 & 29457 & 2 & 3 & 23 & 35 & 600 & 500 & -100\\\\\n",
       "\t3 & 29498 & 2 & 3 & 20 & 21 & 480 & 470 &  -10\\\\\n",
       "\t4 & 29503 & 1 & 4 & 27 & 26 & 550 & 520 &  -30\\\\\n",
       "\t5 & 29504 & 1 & 2 & 33 & 31 & 600 & 550 &  -50\\\\\n",
       "\t6 & 29518 & 1 & 5 & 26 & 28 & 640 & 640 &    0\\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "\n",
       "A data.frame: 6 × 8\n",
       "\n",
       "| <!--/--> | X &lt;int&gt; | gender &lt;fct&gt; | education &lt;fct&gt; | age &lt;int&gt; | ACT &lt;int&gt; | SATV &lt;int&gt; | SATQ &lt;int&gt; | qvdiff &lt;int&gt; |\n",
       "|---|---|---|---|---|---|---|---|---|\n",
       "| 1 | 29442 | 2 | 3 | 19 | 24 | 500 | 500 |    0 |\n",
       "| 2 | 29457 | 2 | 3 | 23 | 35 | 600 | 500 | -100 |\n",
       "| 3 | 29498 | 2 | 3 | 20 | 21 | 480 | 470 |  -10 |\n",
       "| 4 | 29503 | 1 | 4 | 27 | 26 | 550 | 520 |  -30 |\n",
       "| 5 | 29504 | 1 | 2 | 33 | 31 | 600 | 550 |  -50 |\n",
       "| 6 | 29518 | 1 | 5 | 26 | 28 | 640 | 640 |    0 |\n",
       "\n"
      ],
      "text/plain": [
       "  X     gender education age ACT SATV SATQ qvdiff\n",
       "1 29442 2      3         19  24  500  500     0  \n",
       "2 29457 2      3         23  35  600  500  -100  \n",
       "3 29498 2      3         20  21  480  470   -10  \n",
       "4 29503 1      4         27  26  550  520   -30  \n",
       "5 29504 1      2         33  31  600  550   -50  \n",
       "6 29518 1      5         26  28  640  640     0  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# This creates a new column, which we called qvdiff, with the difference between SATQ and SATV\n",
    "sat.dat.mutate<-mutate(sat.dat, qvdiff=SATQ-SATV)\n",
    "head(sat.dat.mutate)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2a108362",
   "metadata": {},
   "source": [
    "We can always use a preceding created column to define a new column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "02c20b0b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"dataframe\">\n",
       "<caption>A data.frame: 6 × 9</caption>\n",
       "<thead>\n",
       "\t<tr><th></th><th scope=col>X</th><th scope=col>gender</th><th scope=col>education</th><th scope=col>age</th><th scope=col>ACT</th><th scope=col>SATV</th><th scope=col>SATQ</th><th scope=col>qvdiff</th><th scope=col>absqvdiff</th></tr>\n",
       "\t<tr><th></th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;fct&gt;</th><th scope=col>&lt;fct&gt;</th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;int&gt;</th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "\t<tr><th scope=row>1</th><td>29442</td><td>2</td><td>3</td><td>19</td><td>24</td><td>500</td><td>500</td><td>   0</td><td>  0</td></tr>\n",
       "\t<tr><th scope=row>2</th><td>29457</td><td>2</td><td>3</td><td>23</td><td>35</td><td>600</td><td>500</td><td>-100</td><td>100</td></tr>\n",
       "\t<tr><th scope=row>3</th><td>29498</td><td>2</td><td>3</td><td>20</td><td>21</td><td>480</td><td>470</td><td> -10</td><td> 10</td></tr>\n",
       "\t<tr><th scope=row>4</th><td>29503</td><td>1</td><td>4</td><td>27</td><td>26</td><td>550</td><td>520</td><td> -30</td><td> 30</td></tr>\n",
       "\t<tr><th scope=row>5</th><td>29504</td><td>1</td><td>2</td><td>33</td><td>31</td><td>600</td><td>550</td><td> -50</td><td> 50</td></tr>\n",
       "\t<tr><th scope=row>6</th><td>29518</td><td>1</td><td>5</td><td>26</td><td>28</td><td>640</td><td>640</td><td>   0</td><td>  0</td></tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/latex": [
       "A data.frame: 6 × 9\n",
       "\\begin{tabular}{r|lllllllll}\n",
       "  & X & gender & education & age & ACT & SATV & SATQ & qvdiff & absqvdiff\\\\\n",
       "  & <int> & <fct> & <fct> & <int> & <int> & <int> & <int> & <int> & <int>\\\\\n",
       "\\hline\n",
       "\t1 & 29442 & 2 & 3 & 19 & 24 & 500 & 500 &    0 &   0\\\\\n",
       "\t2 & 29457 & 2 & 3 & 23 & 35 & 600 & 500 & -100 & 100\\\\\n",
       "\t3 & 29498 & 2 & 3 & 20 & 21 & 480 & 470 &  -10 &  10\\\\\n",
       "\t4 & 29503 & 1 & 4 & 27 & 26 & 550 & 520 &  -30 &  30\\\\\n",
       "\t5 & 29504 & 1 & 2 & 33 & 31 & 600 & 550 &  -50 &  50\\\\\n",
       "\t6 & 29518 & 1 & 5 & 26 & 28 & 640 & 640 &    0 &   0\\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "\n",
       "A data.frame: 6 × 9\n",
       "\n",
       "| <!--/--> | X &lt;int&gt; | gender &lt;fct&gt; | education &lt;fct&gt; | age &lt;int&gt; | ACT &lt;int&gt; | SATV &lt;int&gt; | SATQ &lt;int&gt; | qvdiff &lt;int&gt; | absqvdiff &lt;int&gt; |\n",
       "|---|---|---|---|---|---|---|---|---|---|\n",
       "| 1 | 29442 | 2 | 3 | 19 | 24 | 500 | 500 |    0 |   0 |\n",
       "| 2 | 29457 | 2 | 3 | 23 | 35 | 600 | 500 | -100 | 100 |\n",
       "| 3 | 29498 | 2 | 3 | 20 | 21 | 480 | 470 |  -10 |  10 |\n",
       "| 4 | 29503 | 1 | 4 | 27 | 26 | 550 | 520 |  -30 |  30 |\n",
       "| 5 | 29504 | 1 | 2 | 33 | 31 | 600 | 550 |  -50 |  50 |\n",
       "| 6 | 29518 | 1 | 5 | 26 | 28 | 640 | 640 |    0 |   0 |\n",
       "\n"
      ],
      "text/plain": [
       "  X     gender education age ACT SATV SATQ qvdiff absqvdiff\n",
       "1 29442 2      3         19  24  500  500     0     0      \n",
       "2 29457 2      3         23  35  600  500  -100   100      \n",
       "3 29498 2      3         20  21  480  470   -10    10      \n",
       "4 29503 1      4         27  26  550  520   -30    30      \n",
       "5 29504 1      2         33  31  600  550   -50    50      \n",
       "6 29518 1      5         26  28  640  640     0     0      "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# This defines a new column, which is just the absolute of a new column just created qvdiff\n",
    "sat.dat.mutate<-mutate(sat.dat, qvdiff=SATQ-SATV, absqvdiff=abs(qvdiff))\n",
    "head(sat.dat.mutate)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "9a293c3d",
   "metadata": {},
   "outputs": [
    {
     "ename": "ERROR",
     "evalue": "\u001b[1m\u001b[33mError\u001b[39m in `mutate()`:\u001b[22m\n\u001b[1m\u001b[22m\u001b[33m!\u001b[39m Problem while computing `absqvdiff = abs(qvdiff)`.\n\u001b[1mCaused by error in `mask$eval_all_mutate()`:\u001b[22m\n\u001b[33m!\u001b[39m object 'qvdiff' not found\n",
     "output_type": "error",
     "traceback": [
      "\u001b[1m\u001b[33mError\u001b[39m in `mutate()`:\u001b[22m\n\u001b[1m\u001b[22m\u001b[33m!\u001b[39m Problem while computing `absqvdiff = abs(qvdiff)`.\n\u001b[1mCaused by error in `mask$eval_all_mutate()`:\u001b[22m\n\u001b[33m!\u001b[39m object 'qvdiff' not found\nTraceback:\n",
      "1. mutate(sat.dat, absqvdiff = abs(qvdiff), qvdiff = SATQ - SATV)",
      "2. mutate.data.frame(sat.dat, absqvdiff = abs(qvdiff), qvdiff = SATQ - \n .     SATV)",
      "3. mutate_cols(.data, dplyr_quosures(...), caller_env = caller_env())",
      "4. withCallingHandlers({\n .     for (i in seq_along(dots)) {\n .         context_poke(\"column\", old_current_column)\n .         quosures <- expand_across(dots[[i]])\n .         quosures_results <- vector(mode = \"list\", length = length(quosures))\n .         for (k in seq_along(quosures)) {\n .             quo <- quosures[[k]]\n .             quo_data <- attr(quo, \"dplyr:::data\")\n .             if (!is.null(quo_data$column)) {\n .                 context_poke(\"column\", quo_data$column)\n .             }\n .             chunks <- NULL\n .             result <- NULL\n .             if (quo_is_symbol(quo)) {\n .                 name <- as_string(quo_get_expr(quo))\n .                 if (name %in% names(new_columns)) {\n .                   result <- new_columns[[name]]\n .                   chunks <- mask$resolve(name)\n .                 }\n .                 else if (name %in% names(.data)) {\n .                   result <- .data[[name]]\n .                   chunks <- mask$resolve(name)\n .                 }\n .                 if (inherits(.data, \"rowwise_df\") && vec_is_list(result)) {\n .                   sizes <- list_sizes(result)\n .                   wrong <- which(sizes != 1)\n .                   if (length(wrong)) {\n .                     group <- wrong[1L]\n .                     mask$set_current_group(group)\n .                     abort(class = c(\"dplyr:::mutate_incompatible_size\", \n .                       \"dplyr:::internal_error\"), dplyr_error_data = list(result_size = sizes[group], \n .                       expected_size = 1))\n .                   }\n .                 }\n .             }\n .             else if (!quo_is_symbolic(quo) && !is.null(quo_get_expr(quo))) {\n .                 result <- quo_get_expr(quo)\n .                 result <- withCallingHandlers(vec_recycle(result, \n .                   vec_size(.data)), error = function(cnd) {\n .                   abort(class = c(\"dplyr:::mutate_constant_recycle_error\", \n .                     \"dplyr:::internal_error\"), constant_size = vec_size(result), \n .                     data_size = vec_size(.data))\n .                 })\n .                 chunks <- vec_chop(result, rows)\n .             }\n .             if (is.null(chunks)) {\n .                 if (is.null(quo_data$column)) {\n .                   chunks <- mask$eval_all_mutate(quo)\n .                 }\n .                 else {\n .                   chunks <- withCallingHandlers(mask$eval_all_mutate(quo), \n .                     error = function(cnd) {\n .                       msg <- glue(\"Problem while computing column `{quo_data$name_auto}`.\")\n .                       abort(msg, call = call(\"across\"), parent = cnd)\n .                     })\n .                 }\n .             }\n .             if (is.null(chunks)) {\n .                 next\n .             }\n .             if (is.null(result)) {\n .                 if (length(rows) == 1) {\n .                   result <- chunks[[1]]\n .                 }\n .                 else {\n .                   chunks <- dplyr_vec_cast_common(chunks, quo_data$name_auto)\n .                   result <- vec_unchop(chunks, rows)\n .                 }\n .             }\n .             quosures_results[[k]] <- list(result = result, chunks = chunks)\n .         }\n .         for (k in seq_along(quosures)) {\n .             quo <- quosures[[k]]\n .             quo_data <- attr(quo, \"dplyr:::data\")\n .             quo_result <- quosures_results[[k]]\n .             if (is.null(quo_result)) {\n .                 if (quo_data$is_named) {\n .                   name <- quo_data$name_given\n .                   new_columns[[name]] <- zap()\n .                   mask$remove(name)\n .                 }\n .                 next\n .             }\n .             result <- quo_result$result\n .             chunks <- quo_result$chunks\n .             if (!quo_data$is_named && is.data.frame(result)) {\n .                 types <- vec_ptype(result)\n .                 types_names <- names(types)\n .                 chunks_extracted <- .Call(dplyr_extract_chunks, \n .                   chunks, types)\n .                 for (j in seq_along(types)) {\n .                   mask$add_one(types_names[j], chunks_extracted[[j]], \n .                     result = result[[j]])\n .                 }\n .                 new_columns[types_names] <- result\n .             }\n .             else {\n .                 name <- quo_data$name_auto\n .                 mask$add_one(name = name, chunks = chunks, result = result)\n .                 new_columns[[name]] <- result\n .             }\n .         }\n .     }\n . }, error = function(e) {\n .     local_error_context(dots = dots, .index = i, mask = mask)\n .     bullets <- c(cnd_bullet_header(\"computing\"), mutate_bullets(e))\n .     abort(bullets, class = \"dplyr:::mutate_error\", parent = skip_internal_condition(e), \n .         bullets = bullets, call = error_call)\n . }, warning = function(w) {\n .     if (check_muffled_warning(w)) {\n .         maybe_restart(\"muffleWarning\")\n .     }\n .     local_error_context(dots = dots, .index = i, mask = mask)\n .     warn(c(cnd_bullet_header(\"computing\"), i = cnd_header(w), \n .         i = cnd_bullet_cur_group_label(what = \"warning\")))\n .     maybe_restart(\"muffleWarning\")\n . })",
      "5. mask$eval_all_mutate(quo)",
      "6. .handleSimpleError(function (e) \n . {\n .     local_error_context(dots = dots, .index = i, mask = mask)\n .     bullets <- c(cnd_bullet_header(\"computing\"), mutate_bullets(e))\n .     abort(bullets, class = \"dplyr:::mutate_error\", parent = skip_internal_condition(e), \n .         bullets = bullets, call = error_call)\n . }, \"object 'qvdiff' not found\", base::quote(mask$eval_all_mutate(quo)))",
      "7. h(simpleError(msg, call))",
      "8. abort(bullets, class = \"dplyr:::mutate_error\", parent = skip_internal_condition(e), \n .     bullets = bullets, call = error_call)",
      "9. signal_abort(cnd, .file)"
     ]
    }
   ],
   "source": [
    "# Obviously, this is going to giv an error because we are trying to use qvdiff before it was created\n",
    "sat.dat.mutate<-mutate(sat.dat,absqvdiff=abs(qvdiff),  qvdiff=SATQ-SATV)\n",
    "head(sat.dat.mutate)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0509424a",
   "metadata": {},
   "source": [
    "If we want to define a data frame with just the new columns, we can use `transmute` function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "ad128811",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"dataframe\">\n",
       "<caption>A data.frame: 6 × 2</caption>\n",
       "<thead>\n",
       "\t<tr><th></th><th scope=col>qvdiff</th><th scope=col>absqvdiff</th></tr>\n",
       "\t<tr><th></th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;int&gt;</th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "\t<tr><th scope=row>1</th><td>   0</td><td>  0</td></tr>\n",
       "\t<tr><th scope=row>2</th><td>-100</td><td>100</td></tr>\n",
       "\t<tr><th scope=row>3</th><td> -10</td><td> 10</td></tr>\n",
       "\t<tr><th scope=row>4</th><td> -30</td><td> 30</td></tr>\n",
       "\t<tr><th scope=row>5</th><td> -50</td><td> 50</td></tr>\n",
       "\t<tr><th scope=row>6</th><td>   0</td><td>  0</td></tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/latex": [
       "A data.frame: 6 × 2\n",
       "\\begin{tabular}{r|ll}\n",
       "  & qvdiff & absqvdiff\\\\\n",
       "  & <int> & <int>\\\\\n",
       "\\hline\n",
       "\t1 &    0 &   0\\\\\n",
       "\t2 & -100 & 100\\\\\n",
       "\t3 &  -10 &  10\\\\\n",
       "\t4 &  -30 &  30\\\\\n",
       "\t5 &  -50 &  50\\\\\n",
       "\t6 &    0 &   0\\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "\n",
       "A data.frame: 6 × 2\n",
       "\n",
       "| <!--/--> | qvdiff &lt;int&gt; | absqvdiff &lt;int&gt; |\n",
       "|---|---|---|\n",
       "| 1 |    0 |   0 |\n",
       "| 2 | -100 | 100 |\n",
       "| 3 |  -10 |  10 |\n",
       "| 4 |  -30 |  30 |\n",
       "| 5 |  -50 |  50 |\n",
       "| 6 |    0 |   0 |\n",
       "\n"
      ],
      "text/plain": [
       "  qvdiff absqvdiff\n",
       "1    0     0      \n",
       "2 -100   100      \n",
       "3  -10    10      \n",
       "4  -30    30      \n",
       "5  -50    50      \n",
       "6    0     0      "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# This just keeps the new created columns\n",
    "sat.dat.transmute<-transmute(sat.dat, qvdiff=SATQ-SATV, absqvdiff=abs(qvdiff))\n",
    "head(sat.dat.transmute)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5f9099de",
   "metadata": {},
   "source": [
    "# `summarize`: Collapse many values down to a single summary\n",
    "\n",
    "This functions collapses a data frame to a single row"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "b43a2b4f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"dataframe\">\n",
       "<caption>A data.frame: 1 × 1</caption>\n",
       "<thead>\n",
       "\t<tr><th scope=col>mean.act</th></tr>\n",
       "\t<tr><th scope=col>&lt;dbl&gt;</th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "\t<tr><td>28.54714</td></tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/latex": [
       "A data.frame: 1 × 1\n",
       "\\begin{tabular}{l}\n",
       " mean.act\\\\\n",
       " <dbl>\\\\\n",
       "\\hline\n",
       "\t 28.54714\\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "\n",
       "A data.frame: 1 × 1\n",
       "\n",
       "| mean.act &lt;dbl&gt; |\n",
       "|---|\n",
       "| 28.54714 |\n",
       "\n"
      ],
      "text/plain": [
       "  mean.act\n",
       "1 28.54714"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "summarize(sat.dat, mean.act = mean(ACT))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e04ce731",
   "metadata": {},
   "source": [
    "Ok, so this is not very useful, right? It gets so though if we combine it with `group_by` from **tidyverse** as well."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "99bbf3d5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"dataframe\">\n",
       "<caption>A tibble: 2 × 2</caption>\n",
       "<thead>\n",
       "\t<tr><th scope=col>gender</th><th scope=col>mean.act</th></tr>\n",
       "\t<tr><th scope=col>&lt;fct&gt;</th><th scope=col>&lt;dbl&gt;</th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "\t<tr><td>1</td><td>28.78543</td></tr>\n",
       "\t<tr><td>2</td><td>28.41722</td></tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/latex": [
       "A tibble: 2 × 2\n",
       "\\begin{tabular}{ll}\n",
       " gender & mean.act\\\\\n",
       " <fct> & <dbl>\\\\\n",
       "\\hline\n",
       "\t 1 & 28.78543\\\\\n",
       "\t 2 & 28.41722\\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "\n",
       "A tibble: 2 × 2\n",
       "\n",
       "| gender &lt;fct&gt; | mean.act &lt;dbl&gt; |\n",
       "|---|---|\n",
       "| 1 | 28.78543 |\n",
       "| 2 | 28.41722 |\n",
       "\n"
      ],
      "text/plain": [
       "  gender mean.act\n",
       "1 1      28.78543\n",
       "2 2      28.41722"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# This first groups by gender and then calculates the mean ACT within each category\n",
    "by_gender <- group_by(sat.dat, gender)\n",
    "summarize(by_gender, mean.act = mean(ACT))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "ff8ef275",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[1m\u001b[22m`summarise()` has grouped output by 'gender'. You can override using the\n",
      "`.groups` argument.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"dataframe\">\n",
       "<caption>A grouped_df: 12 × 3</caption>\n",
       "<thead>\n",
       "\t<tr><th scope=col>gender</th><th scope=col>education</th><th scope=col>mean.act</th></tr>\n",
       "\t<tr><th scope=col>&lt;fct&gt;</th><th scope=col>&lt;fct&gt;</th><th scope=col>&lt;dbl&gt;</th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "\t<tr><td>1</td><td>0</td><td>29.03704</td></tr>\n",
       "\t<tr><td>1</td><td>1</td><td>26.70000</td></tr>\n",
       "\t<tr><td>1</td><td>2</td><td>26.65217</td></tr>\n",
       "\t<tr><td>1</td><td>3</td><td>28.56250</td></tr>\n",
       "\t<tr><td>1</td><td>4</td><td>28.94118</td></tr>\n",
       "\t<tr><td>1</td><td>5</td><td>30.82609</td></tr>\n",
       "\t<tr><td>2</td><td>0</td><td>26.06667</td></tr>\n",
       "\t<tr><td>2</td><td>1</td><td>28.12000</td></tr>\n",
       "\t<tr><td>2</td><td>2</td><td>27.33333</td></tr>\n",
       "\t<tr><td>2</td><td>3</td><td>28.18462</td></tr>\n",
       "\t<tr><td>2</td><td>4</td><td>29.44828</td></tr>\n",
       "\t<tr><td>2</td><td>5</td><td>29.01053</td></tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/latex": [
       "A grouped\\_df: 12 × 3\n",
       "\\begin{tabular}{lll}\n",
       " gender & education & mean.act\\\\\n",
       " <fct> & <fct> & <dbl>\\\\\n",
       "\\hline\n",
       "\t 1 & 0 & 29.03704\\\\\n",
       "\t 1 & 1 & 26.70000\\\\\n",
       "\t 1 & 2 & 26.65217\\\\\n",
       "\t 1 & 3 & 28.56250\\\\\n",
       "\t 1 & 4 & 28.94118\\\\\n",
       "\t 1 & 5 & 30.82609\\\\\n",
       "\t 2 & 0 & 26.06667\\\\\n",
       "\t 2 & 1 & 28.12000\\\\\n",
       "\t 2 & 2 & 27.33333\\\\\n",
       "\t 2 & 3 & 28.18462\\\\\n",
       "\t 2 & 4 & 29.44828\\\\\n",
       "\t 2 & 5 & 29.01053\\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "\n",
       "A grouped_df: 12 × 3\n",
       "\n",
       "| gender &lt;fct&gt; | education &lt;fct&gt; | mean.act &lt;dbl&gt; |\n",
       "|---|---|---|\n",
       "| 1 | 0 | 29.03704 |\n",
       "| 1 | 1 | 26.70000 |\n",
       "| 1 | 2 | 26.65217 |\n",
       "| 1 | 3 | 28.56250 |\n",
       "| 1 | 4 | 28.94118 |\n",
       "| 1 | 5 | 30.82609 |\n",
       "| 2 | 0 | 26.06667 |\n",
       "| 2 | 1 | 28.12000 |\n",
       "| 2 | 2 | 27.33333 |\n",
       "| 2 | 3 | 28.18462 |\n",
       "| 2 | 4 | 29.44828 |\n",
       "| 2 | 5 | 29.01053 |\n",
       "\n"
      ],
      "text/plain": [
       "   gender education mean.act\n",
       "1  1      0         29.03704\n",
       "2  1      1         26.70000\n",
       "3  1      2         26.65217\n",
       "4  1      3         28.56250\n",
       "5  1      4         28.94118\n",
       "6  1      5         30.82609\n",
       "7  2      0         26.06667\n",
       "8  2      1         28.12000\n",
       "9  2      2         27.33333\n",
       "10 2      3         28.18462\n",
       "11 2      4         29.44828\n",
       "12 2      5         29.01053"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# This first groups by gender and education and then calculates the mean ACT within each combined category\n",
    "by_gender <- group_by(sat.dat, gender, education)\n",
    "summarize(by_gender, mean.act = mean(ACT))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "adf63282",
   "metadata": {},
   "source": [
    "# The pipe (%>%) operator \n",
    "\n",
    "Ok, so what is the fuss about all this if, in the end, most of what we've just seen can be done without having to use any of the above functions. For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "dbc7bc9b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"dataframe\">\n",
       "<caption>A data.frame: 6 × 7</caption>\n",
       "<thead>\n",
       "\t<tr><th></th><th scope=col>X</th><th scope=col>gender</th><th scope=col>education</th><th scope=col>age</th><th scope=col>ACT</th><th scope=col>SATV</th><th scope=col>SATQ</th></tr>\n",
       "\t<tr><th></th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;fct&gt;</th><th scope=col>&lt;fct&gt;</th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;int&gt;</th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "\t<tr><th scope=row>4</th><td>29503</td><td>1</td><td>4</td><td>27</td><td>26</td><td>550</td><td>520</td></tr>\n",
       "\t<tr><th scope=row>5</th><td>29504</td><td>1</td><td>2</td><td>33</td><td>31</td><td>600</td><td>550</td></tr>\n",
       "\t<tr><th scope=row>6</th><td>29518</td><td>1</td><td>5</td><td>26</td><td>28</td><td>640</td><td>640</td></tr>\n",
       "\t<tr><th scope=row>8</th><td>29529</td><td>1</td><td>3</td><td>19</td><td>22</td><td>520</td><td>560</td></tr>\n",
       "\t<tr><th scope=row>11</th><td>29578</td><td>1</td><td>3</td><td>23</td><td>32</td><td>760</td><td>710</td></tr>\n",
       "\t<tr><th scope=row>13</th><td>29617</td><td>1</td><td>4</td><td>32</td><td>21</td><td>600</td><td>600</td></tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/latex": [
       "A data.frame: 6 × 7\n",
       "\\begin{tabular}{r|lllllll}\n",
       "  & X & gender & education & age & ACT & SATV & SATQ\\\\\n",
       "  & <int> & <fct> & <fct> & <int> & <int> & <int> & <int>\\\\\n",
       "\\hline\n",
       "\t4 & 29503 & 1 & 4 & 27 & 26 & 550 & 520\\\\\n",
       "\t5 & 29504 & 1 & 2 & 33 & 31 & 600 & 550\\\\\n",
       "\t6 & 29518 & 1 & 5 & 26 & 28 & 640 & 640\\\\\n",
       "\t8 & 29529 & 1 & 3 & 19 & 22 & 520 & 560\\\\\n",
       "\t11 & 29578 & 1 & 3 & 23 & 32 & 760 & 710\\\\\n",
       "\t13 & 29617 & 1 & 4 & 32 & 21 & 600 & 600\\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "\n",
       "A data.frame: 6 × 7\n",
       "\n",
       "| <!--/--> | X &lt;int&gt; | gender &lt;fct&gt; | education &lt;fct&gt; | age &lt;int&gt; | ACT &lt;int&gt; | SATV &lt;int&gt; | SATQ &lt;int&gt; |\n",
       "|---|---|---|---|---|---|---|---|\n",
       "| 4 | 29503 | 1 | 4 | 27 | 26 | 550 | 520 |\n",
       "| 5 | 29504 | 1 | 2 | 33 | 31 | 600 | 550 |\n",
       "| 6 | 29518 | 1 | 5 | 26 | 28 | 640 | 640 |\n",
       "| 8 | 29529 | 1 | 3 | 19 | 22 | 520 | 560 |\n",
       "| 11 | 29578 | 1 | 3 | 23 | 32 | 760 | 710 |\n",
       "| 13 | 29617 | 1 | 4 | 32 | 21 | 600 | 600 |\n",
       "\n"
      ],
      "text/plain": [
       "   X     gender education age ACT SATV SATQ\n",
       "4  29503 1      4         27  26  550  520 \n",
       "5  29504 1      2         33  31  600  550 \n",
       "6  29518 1      5         26  28  640  640 \n",
       "8  29529 1      3         19  22  520  560 \n",
       "11 29578 1      3         23  32  760  710 \n",
       "13 29617 1      4         32  21  600  600 "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Here filtering observations of one gender\n",
    "head(sat.dat[sat.dat$gender==1,])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "28d7c46d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"dataframe\">\n",
       "<caption>A data.frame: 6 × 8</caption>\n",
       "<thead>\n",
       "\t<tr><th></th><th scope=col>X</th><th scope=col>gender</th><th scope=col>education</th><th scope=col>age</th><th scope=col>ACT</th><th scope=col>SATV</th><th scope=col>SATQ</th><th scope=col>qvdiff</th></tr>\n",
       "\t<tr><th></th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;fct&gt;</th><th scope=col>&lt;fct&gt;</th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;int&gt;</th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "\t<tr><th scope=row>1</th><td>29442</td><td>2</td><td>3</td><td>19</td><td>24</td><td>500</td><td>500</td><td>   0</td></tr>\n",
       "\t<tr><th scope=row>2</th><td>29457</td><td>2</td><td>3</td><td>23</td><td>35</td><td>600</td><td>500</td><td>-100</td></tr>\n",
       "\t<tr><th scope=row>3</th><td>29498</td><td>2</td><td>3</td><td>20</td><td>21</td><td>480</td><td>470</td><td> -10</td></tr>\n",
       "\t<tr><th scope=row>4</th><td>29503</td><td>1</td><td>4</td><td>27</td><td>26</td><td>550</td><td>520</td><td> -30</td></tr>\n",
       "\t<tr><th scope=row>5</th><td>29504</td><td>1</td><td>2</td><td>33</td><td>31</td><td>600</td><td>550</td><td> -50</td></tr>\n",
       "\t<tr><th scope=row>6</th><td>29518</td><td>1</td><td>5</td><td>26</td><td>28</td><td>640</td><td>640</td><td>   0</td></tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/latex": [
       "A data.frame: 6 × 8\n",
       "\\begin{tabular}{r|llllllll}\n",
       "  & X & gender & education & age & ACT & SATV & SATQ & qvdiff\\\\\n",
       "  & <int> & <fct> & <fct> & <int> & <int> & <int> & <int> & <int>\\\\\n",
       "\\hline\n",
       "\t1 & 29442 & 2 & 3 & 19 & 24 & 500 & 500 &    0\\\\\n",
       "\t2 & 29457 & 2 & 3 & 23 & 35 & 600 & 500 & -100\\\\\n",
       "\t3 & 29498 & 2 & 3 & 20 & 21 & 480 & 470 &  -10\\\\\n",
       "\t4 & 29503 & 1 & 4 & 27 & 26 & 550 & 520 &  -30\\\\\n",
       "\t5 & 29504 & 1 & 2 & 33 & 31 & 600 & 550 &  -50\\\\\n",
       "\t6 & 29518 & 1 & 5 & 26 & 28 & 640 & 640 &    0\\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "\n",
       "A data.frame: 6 × 8\n",
       "\n",
       "| <!--/--> | X &lt;int&gt; | gender &lt;fct&gt; | education &lt;fct&gt; | age &lt;int&gt; | ACT &lt;int&gt; | SATV &lt;int&gt; | SATQ &lt;int&gt; | qvdiff &lt;int&gt; |\n",
       "|---|---|---|---|---|---|---|---|---|\n",
       "| 1 | 29442 | 2 | 3 | 19 | 24 | 500 | 500 |    0 |\n",
       "| 2 | 29457 | 2 | 3 | 23 | 35 | 600 | 500 | -100 |\n",
       "| 3 | 29498 | 2 | 3 | 20 | 21 | 480 | 470 |  -10 |\n",
       "| 4 | 29503 | 1 | 4 | 27 | 26 | 550 | 520 |  -30 |\n",
       "| 5 | 29504 | 1 | 2 | 33 | 31 | 600 | 550 |  -50 |\n",
       "| 6 | 29518 | 1 | 5 | 26 | 28 | 640 | 640 |    0 |\n",
       "\n"
      ],
      "text/plain": [
       "  X     gender education age ACT SATV SATQ qvdiff\n",
       "1 29442 2      3         19  24  500  500     0  \n",
       "2 29457 2      3         23  35  600  500  -100  \n",
       "3 29498 2      3         20  21  480  470   -10  \n",
       "4 29503 1      4         27  26  550  520   -30  \n",
       "5 29504 1      2         33  31  600  550   -50  \n",
       "6 29518 1      5         26  28  640  640     0  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Here creating a new score as the difference between SATQ and SATV scores\n",
    "new.sat.dat<-sat.dat\n",
    "new.sat.dat$qvdiff<-new.sat.dat$SATQ - new.sat.dat$SATV\n",
    "\n",
    "head(new.sat.dat)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a5f1179c",
   "metadata": {},
   "source": [
    "Apart from the user-friendly sintax of the **dplyr** functions, imagine that you would want to apply many transformation steps to your data frame. One should save intermediate objects or nest many functions within the same line. For example:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ce513a95",
   "metadata": {},
   "source": [
    "Nesting:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "3809c1db",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"dataframe\">\n",
       "<caption>A data.frame: 1 × 1</caption>\n",
       "<thead>\n",
       "\t<tr><th scope=col>mean.act</th></tr>\n",
       "\t<tr><th scope=col>&lt;dbl&gt;</th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "\t<tr><td>20.51429</td></tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/latex": [
       "A data.frame: 1 × 1\n",
       "\\begin{tabular}{l}\n",
       " mean.act\\\\\n",
       " <dbl>\\\\\n",
       "\\hline\n",
       "\t 20.51429\\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "\n",
       "A data.frame: 1 × 1\n",
       "\n",
       "| mean.act &lt;dbl&gt; |\n",
       "|---|\n",
       "| 20.51429 |\n",
       "\n"
      ],
      "text/plain": [
       "  mean.act\n",
       "1 20.51429"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# This selects observations for gender 1, then creates a new variable and finally computes its mean  \n",
    "summarize(mutate(filter(sat.dat, sat.dat$gender==1), qvdiff=SATQ-SATV), mean.act = mean(qvdiff, na.rm = TRUE))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f9ed7fd5",
   "metadata": {},
   "source": [
    "Saving intermediate results:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "23e9057f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"dataframe\">\n",
       "<caption>A data.frame: 6 × 8</caption>\n",
       "<thead>\n",
       "\t<tr><th></th><th scope=col>X</th><th scope=col>gender</th><th scope=col>education</th><th scope=col>age</th><th scope=col>ACT</th><th scope=col>SATV</th><th scope=col>SATQ</th><th scope=col>qvdiff</th></tr>\n",
       "\t<tr><th></th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;fct&gt;</th><th scope=col>&lt;fct&gt;</th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;int&gt;</th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "\t<tr><th scope=row>4</th><td>29503</td><td>1</td><td>4</td><td>27</td><td>26</td><td>550</td><td>520</td><td>-30</td></tr>\n",
       "\t<tr><th scope=row>5</th><td>29504</td><td>1</td><td>2</td><td>33</td><td>31</td><td>600</td><td>550</td><td>-50</td></tr>\n",
       "\t<tr><th scope=row>6</th><td>29518</td><td>1</td><td>5</td><td>26</td><td>28</td><td>640</td><td>640</td><td>  0</td></tr>\n",
       "\t<tr><th scope=row>8</th><td>29529</td><td>1</td><td>3</td><td>19</td><td>22</td><td>520</td><td>560</td><td> 40</td></tr>\n",
       "\t<tr><th scope=row>11</th><td>29578</td><td>1</td><td>3</td><td>23</td><td>32</td><td>760</td><td>710</td><td>-50</td></tr>\n",
       "\t<tr><th scope=row>13</th><td>29617</td><td>1</td><td>4</td><td>32</td><td>21</td><td>600</td><td>600</td><td>  0</td></tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/latex": [
       "A data.frame: 6 × 8\n",
       "\\begin{tabular}{r|llllllll}\n",
       "  & X & gender & education & age & ACT & SATV & SATQ & qvdiff\\\\\n",
       "  & <int> & <fct> & <fct> & <int> & <int> & <int> & <int> & <int>\\\\\n",
       "\\hline\n",
       "\t4 & 29503 & 1 & 4 & 27 & 26 & 550 & 520 & -30\\\\\n",
       "\t5 & 29504 & 1 & 2 & 33 & 31 & 600 & 550 & -50\\\\\n",
       "\t6 & 29518 & 1 & 5 & 26 & 28 & 640 & 640 &   0\\\\\n",
       "\t8 & 29529 & 1 & 3 & 19 & 22 & 520 & 560 &  40\\\\\n",
       "\t11 & 29578 & 1 & 3 & 23 & 32 & 760 & 710 & -50\\\\\n",
       "\t13 & 29617 & 1 & 4 & 32 & 21 & 600 & 600 &   0\\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "\n",
       "A data.frame: 6 × 8\n",
       "\n",
       "| <!--/--> | X &lt;int&gt; | gender &lt;fct&gt; | education &lt;fct&gt; | age &lt;int&gt; | ACT &lt;int&gt; | SATV &lt;int&gt; | SATQ &lt;int&gt; | qvdiff &lt;int&gt; |\n",
       "|---|---|---|---|---|---|---|---|---|\n",
       "| 4 | 29503 | 1 | 4 | 27 | 26 | 550 | 520 | -30 |\n",
       "| 5 | 29504 | 1 | 2 | 33 | 31 | 600 | 550 | -50 |\n",
       "| 6 | 29518 | 1 | 5 | 26 | 28 | 640 | 640 |   0 |\n",
       "| 8 | 29529 | 1 | 3 | 19 | 22 | 520 | 560 |  40 |\n",
       "| 11 | 29578 | 1 | 3 | 23 | 32 | 760 | 710 | -50 |\n",
       "| 13 | 29617 | 1 | 4 | 32 | 21 | 600 | 600 |   0 |\n",
       "\n"
      ],
      "text/plain": [
       "   X     gender education age ACT SATV SATQ qvdiff\n",
       "4  29503 1      4         27  26  550  520  -30   \n",
       "5  29504 1      2         33  31  600  550  -50   \n",
       "6  29518 1      5         26  28  640  640    0   \n",
       "8  29529 1      3         19  22  520  560   40   \n",
       "11 29578 1      3         23  32  760  710  -50   \n",
       "13 29617 1      4         32  21  600  600    0   "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "20.5142857142857"
      ],
      "text/latex": [
       "20.5142857142857"
      ],
      "text/markdown": [
       "20.5142857142857"
      ],
      "text/plain": [
       "[1] 20.51429"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# This selects observations for gender 1, then create a new variable and finally computes its mean\n",
    "sat.dat.subset<-sat.dat[sat.dat$gender==1,]\n",
    "sat.dat.subset$qvdiff<-sat.dat.subset$SATQ - sat.dat.subset$SATV\n",
    "\n",
    "head(sat.dat.subset)\n",
    "mean(sat.dat.subset$qvdiff, na.rm = TRUE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6b9a2bce",
   "metadata": {},
   "source": [
    "Now is when the  %>% operator from **tidyverse** brings it magic, since it allows you to apply multiple operations from left-to-right, top-to-bottom (i.e. it is read as “then”)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "d0d94d1d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"dataframe\">\n",
       "<caption>A data.frame: 1 × 1</caption>\n",
       "<thead>\n",
       "\t<tr><th scope=col>mean.act</th></tr>\n",
       "\t<tr><th scope=col>&lt;dbl&gt;</th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "\t<tr><td>20.51429</td></tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/latex": [
       "A data.frame: 1 × 1\n",
       "\\begin{tabular}{l}\n",
       " mean.act\\\\\n",
       " <dbl>\\\\\n",
       "\\hline\n",
       "\t 20.51429\\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "\n",
       "A data.frame: 1 × 1\n",
       "\n",
       "| mean.act &lt;dbl&gt; |\n",
       "|---|\n",
       "| 20.51429 |\n",
       "\n"
      ],
      "text/plain": [
       "  mean.act\n",
       "1 20.51429"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "sat.dat %>% \n",
    "filter(gender==1) %>%\n",
    "mutate(qvdiff=SATQ-SATV) %>%\n",
    "summarize(mean.act = mean(qvdiff, na.rm = TRUE))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e90eef5d",
   "metadata": {},
   "source": [
    "This considerably improves the readability of code. It's pretty cool, right?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "7c14abd0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"dataframe\">\n",
       "<caption>A data.frame: 1 × 8</caption>\n",
       "<thead>\n",
       "\t<tr><th scope=col>mu</th><th scope=col>pop_med</th><th scope=col>sigma</th><th scope=col>pop_iqr</th><th scope=col>pop_min</th><th scope=col>pop_max</th><th scope=col>pop_q1</th><th scope=col>pop_q3</th></tr>\n",
       "\t<tr><th scope=col>&lt;dbl&gt;</th><th scope=col>&lt;dbl&gt;</th><th scope=col>&lt;dbl&gt;</th><th scope=col>&lt;dbl&gt;</th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;dbl&gt;</th><th scope=col>&lt;dbl&gt;</th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "\t<tr><td>28.54714</td><td>29</td><td>4.82356</td><td>7</td><td>3</td><td>36</td><td>25</td><td>32</td></tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/latex": [
       "A data.frame: 1 × 8\n",
       "\\begin{tabular}{llllllll}\n",
       " mu & pop\\_med & sigma & pop\\_iqr & pop\\_min & pop\\_max & pop\\_q1 & pop\\_q3\\\\\n",
       " <dbl> & <dbl> & <dbl> & <dbl> & <int> & <int> & <dbl> & <dbl>\\\\\n",
       "\\hline\n",
       "\t 28.54714 & 29 & 4.82356 & 7 & 3 & 36 & 25 & 32\\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "\n",
       "A data.frame: 1 × 8\n",
       "\n",
       "| mu &lt;dbl&gt; | pop_med &lt;dbl&gt; | sigma &lt;dbl&gt; | pop_iqr &lt;dbl&gt; | pop_min &lt;int&gt; | pop_max &lt;int&gt; | pop_q1 &lt;dbl&gt; | pop_q3 &lt;dbl&gt; |\n",
       "|---|---|---|---|---|---|---|---|\n",
       "| 28.54714 | 29 | 4.82356 | 7 | 3 | 36 | 25 | 32 |\n",
       "\n"
      ],
      "text/plain": [
       "  mu       pop_med sigma   pop_iqr pop_min pop_max pop_q1 pop_q3\n",
       "1 28.54714 29      4.82356 7       3       36      25     32    "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# This was the example of the previous tutorial\n",
    "sat.dat %>% \n",
    "        summarize(mu = mean(ACT), \n",
    "        pop_med = median(ACT), \n",
    "        sigma = sd(ACT), \n",
    "        pop_iqr = IQR(ACT),\n",
    "        pop_min = min(ACT), \n",
    "        pop_max = max(ACT),\n",
    "        pop_q1 = quantile(ACT, 0.25),  # first quartile, 25th percentile\n",
    "        pop_q3 = quantile(ACT, 0.75))  # third quartile, 75th percentile)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "80a5671c",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\"> <b>Practice</b>:  Using the operator %>%, perform the following operations:\n",
    "    <ol>\n",
    "<li>Filter observations such that we keep only those with an age within 18 and 45 years. </li>\n",
    "<li>Create a new variable, \"SATV.avg\", as the average of SATQ and SATV.</li>\n",
    "<li>Select the columns \"gender\", \"education\", and \"SATV.avg\".</li>\n",
    "<li>Group the data frame by gender and education</li>\n",
    "<li>Summarize the data frame by computing the mean and standard deviation of \"SATV.avg\".</li>\n",
    "<li>With the resulting data frame, create a barplot using geom_bar, such that the x position corresponds to each level of education and the height of each bar (i.e. the y position) is the mean of \"SATV.avg\". In order to have a different bar per gender, pass this variable to the fill argument in aesthetics (set position=\"dodge\" to have these bars next to each other instead of stacked). Finally, use geom_errorbar to add the standard deviation as an error bar (set position=\"dodge\" again here). Visit <a href=\"https://r-graph-gallery.com/4-barplot-with-error-bar.html\"> https://r-graph-gallery.com/4-barplot-with-error-bar.html</a> to see an example of how to generate this kind of plots </li></ol> </div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "d959323e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Your response here"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "4.2.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}