--- title: "Lecture 2 Problem Set" author: "INSERT YOUR NAME HERE" date: "10/4/2019" output: html_document --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` ```{r, echo=FALSE} knitr::opts_chunk$set(collapse = TRUE, comment = "#>", highlight = TRUE) #comment = "#>" makes it so results from a code chunk start with "#>"; default is "##" ``` In this problem set, you will investigate objects and data patterns. # Step 1: Make changes to YAML header Recommend reading R Markdown: The Definative Guide section 3.1 ([LINK HERE](https://bookdown.org/yihui/rmarkdown/html-document.html)) before answering these questions 1. Add a table of contents to YAML header 1. table of contents should have "depth" of 2 1. table of contents should "float" 1. add section numbering to headers Load tidyverse package [code already provided] ```{r} #install.packages("tidyverse") #install if you do not have tidyverse installed library(tidyverse) ``` Load data frame object and use `count()` to tabulate the total number of visits each school received [code already provided] ```{r} rm(list = ls()) # remove all objects getwd() load(url("https://github.com/ozanj/rclass/raw/master/data/recruiting/recruit_school_allvars.RData")) #glimpse(df_school_all) df_school_all %>% count(total_visits) ``` # Step 2: Investigating objects 1. Answer the following questions about the object `df_school_all` by running the appropriate R commands within the code chunk and writing any substantive response required next to the question. The code and substantive response for the first question will be answered for you as an example. + What "type" of object is `df_school_all`? + __ANSWER [FREEBIE]: the object df_school_all is a list type__ + What is the "length" of the object `df_school_all`? What does this specific value of length refer to? + __YOUR ANSWER HERE__ + how many "rows" are in the object `df_school_all`? what does each row represent? + __YOUR ANSWER HERE__ ```{r} #type of df_school_all #length of df_school_all #num of rows in df_school_all ``` \newline 2. In the below code chunk, use the `str()` function to describe the contents of `df_school_all` and then answer the following questions in text above the code chunk. + What does each element of the object `df_school_all` represent? (hint Lecture 1.2) + Are the individual elements within `df_school_all` lists or vectors? + Are the individual elements within `df_school_all` named or un-named? what do these element names refer to? (hint use `names()`) ```{r} ``` \newline 3. These questions refer to the variable `school_type` within the object `df_school_all`. Run the appropriate R commands in the chunk below and write substantive responses below each question. - What is the data "type" of `school_type`? - __ANSWER__: - What is the "length" of `school_type`? What does this specific value of length refer to? - __ANSWER__: ```{r} ``` 4. In these questions, apply the `table()` function to the variable `school_type` within the object `df_school_all`. Run the appropriate R code within the chunk below and write substantive responses below each question. + In your own words, what does the `table()` function do? + What does the `useNA` argument of the `table()` function control? + What is the default value of the `useNA` argument and what does this default value mean? + What happens when you assign the value `"ifany"` to the `useNA` argument? + What happens when you assign the value `"always"` to the `useNA` argument? + In the below R code chunk, use the `table()` function to count the number of observations for each value `school_type` three different ways: + First, without specifying any value for `useNA` + Second, by assigning the value `"ifany"` to the `useNA` argument + Third, by assigning the value `"always"` to the `useNA` argument ```{r} ``` # Step 3: Filter, select, arrange questions The data frame `df_school_all` has one observation for each high school (public and private). - The variables that begin with `visits_by_...` identify how many off-campus recruiting visits the high school received from a particular public university. For example, UC Berkeley has the ID `110635` so the variable `visits_by_110635` identifies how many visits the high school received from UC Berkeley. - The variable `total_visits` identifies the number of visits the high school received from all (16) public research universities in this data collection sample. For the questions below, imagine that you have been asked by a major news outlet to identify which high schools receive the most total number of off-campus recruiting visits from public universities. - For all questions below, you can answer using one line of code or you can answer in several steps (e.g., first create new data frame, then print selected variables) - For questions that ask you to print the "top 10" observations, you can simply print the object and rely on the fact that the default option [for "tibble" data frames] is to print 10 observations OR you can wrap the command in the `head()` function and explicitly tell R to print 10 observations - Before conducting analyses, we'll rename the variable `avgmedian_inc_2564` to give it a shorter name. 1. Rename the variable `avgmedian_inc_2564` to `med_inc` and assign new variable name to the existing object `df_school_all` ```{r} ``` 2. The news outlet is interested in comparing the in-state and out-of-state high school visits for [The University of Alabama](https://nces.ed.gov/globallocator/col_info_popup.asp?ID=100751) (IPEDS ID = 100751) variable `visits_by_100751`. Count the number of in-state public high schools that received at least one vist from The University of Alabama. - Note: You will need to use `filter` and the `count` function - Use commas to separate varibles for this questions (e.g. filter(dataframe, variable == something, variable == something)) - You can do this in one step by wrapping the `count` function around the `filter` function ; or you can do this in two steps by creating a new data frame first ```{r} ``` 3. How many public in-state high schools visited by The University of Alabama enroll at least 50% Latinx students **or** 50% Black students? - hint: use the variables `pct_hispanic` and `pct_black`. ```{r} ``` 4. Now count the out-of-state public high schools that received at least one visit by the university of Alabama without using commas to seprate conditions in your filter. - hint: `&` or `%in%` ```{r} ``` 5. How many public out-of-state high schools visited by The University of Alabama enroll at least 50% Latinx students **or** 50% Black students? ```{r} ``` Once finished, knit to (HTML) and upload both .Rmd and HTML files to ps_submit_2 on Piazza *Remember to use this naming convention "lastname_firstname_ps2"*