--- title: 'FEV Case Study: Graphing' author: "Lucy" date: "2023-08-28" output: html_document --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` ## Review We'll continue exploring the FEV data set from last week. Let's start by loading the data and required packages. ```{r, message = FALSE} library(rigr) library(dplyr) library(ggplot2) fev_tbl <- as_tibble(fev) %>% mutate(across(sex:smoke, ~ as.factor(.x))) ``` Last week, we found that the mean FEV in the smoking group was substantially higher than the average FEV in the non-smoking group. That is, the smokers appear to have *better* lung function than the non-smokers. We also had a theory as to why: the association between FEV and smoking status may be confounded, eg. by age/size. We'll investigate further this week with our newly acquired plotting skills! Here are some helpful resources for making plots, now that we are moving towards less "guided" exercises: - [ggplot2 package webpage](https://ggplot2.tidyverse.org/index.html) - [ggplot2 Book](https://ggplot2-book.org/) - [R Graphics Cookbook](https://r-graphics.org/) ## Smoking and FEV (unadjusted) ### Exercise 1 Now that we have plotting tools, let's make a plot that compares the FEV of smokers to the FEV of non-smokers. ```{r} # FILL IN HERE ``` This broadly tells the same story as the summaries we calculated last week, but conveys more information and is more visually engaging. ## Searching for potential confounders Let's do some digging to see if we can root out some potential confounders. ### Exercise 2 Last week, we found that the youngest patient who smokes is 9, suggesting that there is a difference in the age distribution among smokers and non-smokers. Make a plot that compares these two distributions. ```{r} # FILL IN HERE ``` Again, we see that the smokers are overall older than the non-smokers. ### Exercise 3 We think that age should be related to height, which in turn should be related to FEV. Let's investigate that more systematically now with plotting. How would you like to investigate that with plotting? Here's one suggestion, though you don't have to take it: make a plot with two panels, one that has a scatterplot of height versus age for the female patients, and one that has a scatterplot of height versus age for the male patients. ```{r} # FILL IN HERE ``` We see that the within-sex trend is similar: height is linear-ish at younger ages, and flat-ish at older ages. Boys wind up taller at older ages. We will then make a scatterplot of FEV versus height. ```{r} ggplot(fev_tbl, aes(x = height, y = fev)) + geom_jitter(width=0.2, alpha = 0.75) + ggtitle("FEV versus height") + ylab("FEV (l/s)") + xlab("Height (inches)") + theme_bw() ``` We see that taller participants generally have higher FEV. ## Smoking and lung function: a more nuanced look Now that we know that the smokers are older and bigger and have higher FEV, let's look at the relationship between FEV and smoking status *adjusted* for height. ### Exercise 4 Make a scatterplot of FEV versus height, with points marked by smoking status. ```{r} # FILL IN HERE ``` Based on this plot, it seems like the FEV of smokers and non-smokers *of the same height* is pretty similar.