---
title: "R Operators & Pipes"
author:
date:
urlcolor: blue
output:
  html_document:
    toc: true
    toc_depth: 2
    toc_float: true # toc_float option to float the table of contents to the left of the main document content. floating table of contents will always be visible even when the document is scrolled
      #collapsed: false # collapsed (defaults to TRUE) controls whether the TOC appears with only the top-level (e.g., H2) headers. If collapsed initially, the TOC is automatically expanded inline when necessary
      #smooth_scroll: true # smooth_scroll (defaults to TRUE) controls whether page scrolls are animated when TOC items are navigated to via mouse clicks
    number_sections: true
    fig_caption: true # ? this option doesn't seem to be working for figure inserted below outside of r code chunk
    highlight: tango # Supported styles include "default", "tango", "pygments", "kate", "monochrome", "espresso", "zenburn", and "haddock" (specify null to prevent syntax
    theme: default # theme specifies the Bootstrap theme to use for the page. Valid themes include default, cerulean, journal, flatly, readable, spacelab, united, cosmo, lumen, paper, sandstone, simplex, and yeti.
    df_print: tibble #options: default, tibble, paged
    keep_md: true # may be helpful for storing on github
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Introduction

Load packages:

```{r, message=FALSE}
#install.packages('tidyverse')
library(tidyverse)
```

Load data:

```{r}
#data we used in the last problem set
load(url('https://github.com/anyone-can-cook/rclass1/raw/master/data/recruiting/recruit_event_somevars.RData'))

#data we are using in this lecture
load(url("https://github.com/ozanj/rclass/raw/master/data/prospect_list/wwlist_merged.RData"))
```

Resources used to create this document:

- https://www.datamentor.io/r-programming/precedence-associativity/
- https://stat.ethz.ch/R-manual/R-devel/library/base/html/Syntax.html
- https://www.w3adda.com/r-tutorial/r-operator-precedence

# Operator Precedence and Associativity

It is important to understand operator precedence and associativity when working with data in general, but particularly when we use logical operators like `&` or `|` to filter data.

**Operator Precedence**

Operator precedence refers to the order in which operators are evaluated. Operators with higher precedence are evaluated first and operators with the lowest precedence are evaluated last.

If you recall [PEMDAS](https://study.com/academy/lesson/what-is-pemdas-definition-rule-examples.html#:~:text=PEMDAS%20is%20an%20acronym%20for,until%20the%20calculation%20is%20complete.), multiplication `*` takes precedence over addition `+`, so the expression below is evaluated as `1 + (2 * 6)`, which is 13.

```{r}
1 + 2 * 6
```

If we add parentheses to the expression, the answer changes to 18 because the addition is now evaluated first.

```{r}
(1 + 2) * 6
```

**Operator Associativity**

When an expression contains several operators with the same precedence, the order of evaluation is determined by associativity.

> Operators with same precedence follows operator associativity defined for its operator group. In R, operators can either follow left-associative, right-associative or have no associativity. Operators with left-associative are evaluated from left to right, operators with right-associative are evaluated from right to left and operators with no associativity, does not follow any predefined order.

*Credit: [R Operator Precedence](https://www.w3adda.com/r-tutorial/r-operator-precedence)*

Division is left-associative, so `8 / 4 / 2` is evaluated as `(8 / 4) / 2`, which is 1.

```{r}
8 / 4 / 2
```

Adding parentheses around `4 / 2` forces that division to happen first, and the result changes to 4.

```{r}
8 / (4 / 2)
```

*Credit: [R Operator Precedence and Associativity](https://www.w3adda.com/r-tutorial/r-operator-precedence), [Operator Syntax and Precedence](https://stat.ethz.ch/R-manual/R-devel/library/base/html/Syntax.html)*
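
As a further illustration of associativity, here is a minimal sketch using R's exponentiation operator `^`, which the R syntax documentation lists as evaluated from right to left. The numbers are arbitrary and chosen only to make the difference visible.

```{r}
# `^` is right-associative: this is evaluated as 2 ^ (3 ^ 2), i.e. 2 ^ 9
right_assoc <- 2 ^ 3 ^ 2
right_assoc

# Parentheses force left-to-right evaluation instead: (2 ^ 3) ^ 2, i.e. 8 ^ 2
forced_left <- (2 ^ 3) ^ 2
forced_left
```

The two results differ, so for right-associative operators it is worth adding parentheses whenever the intended grouping might be unclear to a reader.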

**R logical operator precedence**

According to the R documentation on [Operator Syntax and Precedence](https://stat.ethz.ch/R-manual/R-devel/library/base/html/Syntax.html), `&` takes precedence over `|`. Both logical operators are left-associative, meaning that a chain of the same operator is evaluated from left to right.

*Credit: [R Operator Precedence](https://www.datamentor.io/r-programming/precedence-associativity/)*
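
Before turning to real data, the rule can be checked on a small made-up example. The vectors `state`, `type`, and `size` below are hypothetical and exist only for this sketch; they are not variables from the course data.

```{r}
# Hypothetical toy vectors, used only for illustration
state <- c("CA", "NY", "NY")
type  <- c("public hs", "public hs", "private hs")
size  <- c(500, 800, 1500)

# Without parentheses, `&` is evaluated first:
# (state %in% "CA" & type == "public hs") | size > 1000
no_parens <- state %in% "CA" & type == "public hs" | size > 1000
no_parens

# With parentheses, `|` is evaluated first:
# state %in% "CA" & (type == "public hs" | size > 1000)
with_parens <- state %in% "CA" & (type == "public hs" | size > 1000)
with_parens
```

The third element is the one that differs: it satisfies `size > 1000` but is not in `"CA"`, so it passes only when the state condition is not applied to it.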

**Now let's see an example using the data frame from problem set 3**

Why do we get different results?

```{r}
nrow(subset(df_event, event_state %in% c("CA", "FL", "MA") & event_type == "public hs" | total_students_pri > 1000))

nrow(subset(df_event, event_state %in% c("CA", "FL", "MA") & (event_type == "public hs" | total_students_pri > 1000)))
```

Let's try to put our code into words.

The first line of code reads: return observations where the recruiting visit took place at a public high school in the state of California, Florida, or Massachusetts **OR** at a private high school where the total number of students is greater than 1000. Because `&` is evaluated before `|`, the state condition applies only to the public high school condition, so large private high schools in any state are returned.

```{r}
tail(subset(df_event, event_state %in% c("CA", "FL", "MA") & event_type == "public hs" | total_students_pri > 1000, select = c(event_state, event_type, total_students_pri)))
```

The second line of code reads: return observations where the recruiting visit took place in the state of California, Florida, or Massachusetts at either a public high school **OR** a private high school where the total number of students is greater than 1000. Here the parentheses force `|` to be evaluated first, so the state condition applies to both kinds of school.

```{r}
tail(subset(df_event, event_state %in% c("CA", "FL", "MA") & (event_type == "public hs" | total_students_pri > 1000), select = c(event_state, event_type, total_students_pri)))
```

# Practice with pipes

**wwlist** data frame

- De-identified list of prospective students purchased by Western Washington University from College Board
- We collected these data using public records requests

```{r}
dim(wwlist)
names(wwlist)
#glimpse(wwlist)
#str(wwlist)
```

Let's use `select()`, `filter()`, and `arrange()` to do the following without pipes, first by nesting the function calls and then by storing intermediate objects:

- Sort `wwlist` descending by `total_students`
- Select the following variables: `hs_state`, `hs_city`, `hs_name`, `school_type`, `total_students`
- Filter for private schools `(school_type == "private")`
- Print the first 10 observations

```{r}
head(select(arrange(filter(wwlist, school_type == "private"), desc(total_students)), hs_state, hs_city, hs_name, school_type, total_students), n = 10)
```

```{r}
df_temp <- filter(wwlist, school_type == "private")
df_temp2 <- arrange(df_temp, desc(total_students))
head(select(df_temp2, hs_state, hs_city, hs_name, school_type, total_students), n = 10)
rm(df_temp, df_temp2)
```

Now let's use pipes `%>%`

```{r}
wwlist %>%
  filter(school_type == "private") %>%
  arrange(desc(total_students)) %>%
  select(hs_state, hs_city, hs_name, school_type, total_students) %>%
  head(n = 10)
```
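
To see that the nested and piped styles describe the same computation, here is a minimal sketch on a small made-up data frame. The object `df_toy` and its values are hypothetical and chosen only for illustration; they are not part of `wwlist`.

```{r}
library(dplyr)

# Hypothetical toy data frame, used only for illustration
df_toy <- tibble(
  school = c("A", "B", "C", "D"),
  school_type = c("private", "public", "private", "private"),
  total_students = c(300, 1200, 900, 150)
)

# Nested calls: read from the inside out
nested <- head(arrange(filter(df_toy, school_type == "private"), desc(total_students)), n = 2)

# Pipes: read from top to bottom; the result on the left of %>%
# becomes the first argument of the function on the right
piped <- df_toy %>%
  filter(school_type == "private") %>%
  arrange(desc(total_students)) %>%
  head(n = 2)

# Check that both approaches return the same rows in the same order
all.equal(nested, piped)
```

The pipe does not change what is computed; it only changes the order in which the steps appear on the page, which usually makes longer chains easier to read.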

**Your turn**

Use `select()`, `filter()`, and `arrange()` to do the following, both without pipes and with pipes:

- Sort `wwlist` descending by `med_inc_zip`
- Select the following variables: `hs_state`, `hs_city`, `hs_name`, `school_type`, `med_inc_zip`, `ethn_code`, `med_inc_state`
- Filter for public schools `(school_type == "public")` in the state of New York `(hs_state == "NY")`
- Print the first 10 observations

**Without pipes**

```{r}

```

**With pipes**

```{r}

```

Now let's use `select()`, `filter()`, and `arrange()` to do the following, both without pipes and with pipes:

- Sort `wwlist` descending by `med_inc_zip`
- Select the following variables: `hs_state`, `hs_city`, `hs_name`, `school_type`, `med_inc_zip`, `ethn_code`, `med_inc_state`
- Filter for public schools `(school_type == "public")` where `med_inc_zip` is less than `med_inc_state`
- Print the first 10 observations

**Without pipes**

```{r}

```

**With pipes**

```{r}

```

**Bonus** question

- Write down a question you have about the data.
- Using any or all of the following functions `select()`, `filter()`, `arrange()`, how would you go about subsetting and sorting the data to answer your question?

Write down your question below:

Now try to work through your question.

```{r}

```