---
title: "Data Wrangling III"
author: "YOUR NAME HERE"
date: "today"
format:
  html:
    self-contained: true
    toc: true
    toc_float: true
    number_section: false
    highlight: "tango"
    theme: "cosmo"
editor: visual
editor_options:
  chunk_output_type: console
---

```{r setup, message = FALSE}
library(tidyverse)
```

# Part 1

To start, you will work with the following data frame, called `scores`:

```{r}
scores <- tribble(
  ~student_id, ~math, ~english,
  "S1",        90,    92,
  "S2",        85,    80,
  "S3",        88,    85,
  "S4",        95,    74
)
scores
```

Note: The `tribble()` function above is helpful for creating small data frames (tibbles) with an easier-to-read row-by-row layout. The `scores` data frame has three variables (`student_id`, `math`, and `english`) and four rows (one for each student).

## Exercise 1

Before writing any code, answer the following questions:

Suppose you want to reshape the data frame so that there is one row per student per subject. What function would you use to do this?

> YOUR ANSWER HERE

If you reshaped the data to have one row per student per subject, how many rows would the resulting data frame have?

> YOUR ANSWER HERE

How many columns would the resulting data frame have and what would the column names be?

> YOUR ANSWER HERE

Now, write the code to reshape the data frame as described above.

```{r}

```

## Exercise 2

```{r}
patients <- tribble(
  ~patient_id, ~measurement_time_, ~systolic_bp,
  "P1",        "Morning",          120,
  "P1",        "Noon",             115,
  "P1",        "Evening",          123,
  "P2",        "Morning",          118,
  "P2",        "Evening",          121
)
patients
```

The `patients` data frame has three variables (`patient_id`, `measurement_time_`, and `systolic_bp` -- short for systolic blood pressure) and five rows (one per patient per measurement time).

Before writing any code, answer the following questions:

+ Suppose you want to reshape the data frame so that there is one row per patient and measurements at different times of the day are recorded in different columns. What function would you use to do this?

> YOUR ANSWER HERE

+ If you reshaped the data to have one row per patient,
  - how many rows would the resulting data frame have?
  - how many columns would the resulting data frame have and what would the column names be?

> YOUR ANSWER HERE

Now, write the code to reshape the data frame as described above. What does the `NA` value mean in the resulting data frame?

```{r}

```

> YOUR ANSWER HERE

# Part 2

In this part you'll investigate the relationship between age and opinion on the impact of the many changes (transfer portal, athlete name, image and likeness (NIL) compensation, conference realignments) taking place in Division I college athletics.

YouGov, in collaboration with Elon University Poll and the Knight Commission on Intercollegiate Athletics, polled 1,500 US adults (aged 18 and older) from July 7 to 11, 2025.[^2] These 1,500 adults were asked the following question:

> Overall, how would you describe the impact of the many changes (transfer portal, athlete name, image and likeness (NIL) compensation, conference realignments[^3]) taking place in Division I college athletics?

[^2]: Full survey results can be found at .

[^3]: The transfer portal is an online database for college student-athletes who wish to transfer to a different school. Name, image, and likeness (NIL) compensation allows college athletes to earn money from third-party companies for using their "name, image, and likeness" through activities like endorsements, social media promotions, and public appearances. Conference realignments refer to the shifting of colleges and universities between athletic conferences, which can affect competition levels, revenue distribution, and media exposure.

Responses were broken down into the following categories:

| Variable | Levels                                                                              |
|:---------|:------------------------------------------------------------------------------------|
| Age      | 18-44; 45+                                                                          |
| Opinion  | Very positive; Somewhat positive; Neutral; Somewhat negative; Very negative; Unsure |

The counts for each combination of age level and opinion are given in the dataset `survey_counts` below.

```{r}
survey_counts <- tribble(
  ~age,    ~opinion,            ~n,
  "18-44", "Very positive",     78,
  "18-44", "Somewhat positive", 176,
  "18-44", "Neutral",           162,
  "18-44", "Somewhat negative", 50,
  "18-44", "Very negative",     36,
  "18-44", "Unsure",            197,
  "45+",   "Very positive",     41,
  "45+",   "Somewhat positive", 121,
  "45+",   "Neutral",           186,
  "45+",   "Somewhat negative", 146,
  "45+",   "Very negative",     97,
  "45+",   "Unsure",            210
) |>
  mutate(
    opinion = factor(opinion, levels = c("Very positive", "Somewhat positive", "Neutral", "Somewhat negative", "Very negative", "Unsure"))
  )
```

For each exercise below, use a single pipeline starting with `survey_counts`, calculate the desired proportions, and make sure the result is an **ungrouped** data frame with a column for relevant counts, a column for relevant proportions, and a column for the groups you're interested in.

## Exercise 3

Marginal proportions of age: Calculate the proportions of individuals in this sample who are 18-44 years old and who are 45+ years old.

```{r}

```

## Exercise 4

Marginal proportions of opinion: Calculate the proportions of individuals who are Very positive, Somewhat positive, Neutral, Somewhat negative, Very negative, and Unsure.

```{r}

```

## Exercise 5

Conditional proportions of opinion based on age: Calculate the proportions of individuals who are Very positive, Somewhat positive, Neutral, Somewhat negative, Very negative, and Unsure

- among those who are 18-44 years old and
- among those who are 45+ years old.

```{r}

```

## Exercise 6

Adapt your code from Exercise 5 to instead display your proportions as a 6x3 data frame where the first column lists the opinion category, and the 2nd and 3rd columns give the proportions for the `18-44` and `45+` age categories, respectively.

```{r}

```