--- title: "Plotting Tutorial" subtitle: "Applied Longitudinal Data Analysis" author: "Emorie D Beck" date: "`r Sys.setlocale('LC_TIME', 'C'); format(Sys.time(), '%d\\\\. %B %Y')`" output: html_document: theme: united highlight: tango df_print: paged code_folding: show toc: true toc_float: true editor_options: chunk_output_type: console --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE, message = F, warning = F, error = F) ``` Download .Rmd (won't work in Safari or IE) See GitHub Repository For a much more exhaustive tutorial, see this. # Workspace ## Packages First let's load in our packages. We're going to load in a couple of extras (`brms` and `tidybayes`) that we haven't used before. These will let us make some pretty plots later. ```{r packages} library(psych) library(knitr) library(kableExtra) library(lme4) library(broom.mixed) library(brms) library(tidybayes) library(plyr) library(tidyverse) data_path <- "https://github.com/emoriebeck/R-tutorials/raw/master" ``` ## Read in Data The National Longitudinal Study of Youths 1979 Child and Young Adult Sample (NLSYCYA) is a longitudinal study conducted by the National Bureau of Labor Statistics. The sample includes the children of the original 1979 sample, of which we will use a small subset. Here, we are going to use a subset of the more than 11,000 variables available that include the following: Item Name | Description | Time-Varying? ----------- | ----------------------------- | ------------- PROC_CID | Participant ID | No Dem_DOB | Year of Date of Birth | No groups | Jail, Community Service, None | No DemPWeight | Weight Percentile at age 10 | No age | Age of participant | Yes Year | Year of Survey | Yes age0 | Age of participant (centered) | Yes SensSeek | Sensation-Seeking Composite | Yes CESD | CESD Depression Composite | Yes ```{r read data ex6} load(url(sprintf("%s/ALDA/week_4_plotting/data/sample.RData", data_path))) sample_dat ``` ## Restructure Data These data are already largely cleaned as that is not our focus today. We just need to do a little bit of restructuring. To run our models using `purrr`, we need to restrucutre the data to long. While we're at it, we need to create a time variable centered at zero, so we can interpret our moderators later. ```{r} (df1_long <- sample_dat %>% gather(key = trait, value = value, CESD, SensSeek, na.rm = T) %>% mutate(wave = year - 1996, age0 = age-16)) ``` # The Basic Growth Model We'll start with the basic growth model (just looking at change in a single variable over time). we will use list columns to do it. We'll start by using the `group_by()` and `nest()` functions from `dplyr` and `tidyr` to put the data for each trait into a cell of our data frame: ```{r nest ex6} (df1_nested <- df1_long %>% group_by(trait) %>% nest()) ``` Now, our data frame is 2 x 2, with the elements in the second column each containing the data frame that corresponds to that trait. This makes it really easy to run our models using the `map()` family of unctions from `purrr`. Before we fit the full growth model, we will first fit the unconditional model. Below, we will add a new column to our data frame that will contain the unconditional model for each trait. In this case the model will be in the form of:

Level 1: $Y_{ij} = \beta_{0j} + \varepsilon{ij}$
Level 2:

$\beta_{0j} = \gamma_{00} + U_{0j}$

```{r fit0 ex6} (df1_nested <- df1_nested %>% mutate(fit0 = map(data, ~lmer(value ~ 1 + (1 | PROC_CID), data = .)))) ``` Now we can see we have a new list column in our data frame called fit0 that contains an S4 class lmerMod, which simply means your growth model. To understand model, I personally find it easiest to visualize it. What this model is telling us is the mean across all observations as well as the between-person variability in that estimate. Now, moving on to the growth model: In this case the model will be in the form of:

Level 1: $Y_{ij} = \beta_{0j} + \beta_{1j}*time_{ij} + \varepsilon{ij}$
Level 2:

$\beta_{0j} = \gamma_{00} + U_{0j}$
$\beta_{1j} = \gamma_{10} + U_{1j}$

```{r fit12 ex6} (df1_nested <- df1_nested %>% mutate(fit2 = map(data, ~lmer(value ~ 1 + age0 + (age0 | PROC_CID), data = .)))) ``` Now that we've run our model, we are ready to plot the results. To do so, we'll write a short function that will get the predicted values both for group level effects, as well as for individual-level estimates. There's also a hidden function for getting standard errors of our pointwise estimates that we'll use to create confidence bands in our plots. Don't worry about that. ```{r, echo = F} pv_fun <- function(frame, m){ nvar <- nrow(vcov(m)) # # of variables varnames <- str_replace_all(colnames(vcov(m)), "[()]", "") # names for vcov mat vcov_mat <- matrix(vcov(summary(m))@x, nvar) # chnage vcov mat class colnames(vcov_mat) <- varnames; rownames(vcov_mat) <- varnames # rename vcov mat frame_mat <- frame %>% as.matrix pv <- diag(frame_mat %*% vcov_mat %*% t(frame_mat)) # compute variance SE <- sqrt(pv) return(SE) } ``` ## Predicted Values Now we'll use the predict function to get predicted values and the function I created to get pointwise standard errors for the fixed effects. ```{r} # function to get fixed effects fixed_pred_fun <- function(m){ frame <- tibble(Intercept = 1, age0 = seq(0, 8, .1)) %>% mutate(fixed_pred = predict(m, newdata = ., re.form = NA), age = age0 + 16) frame$SE <- pv_fun(frame %>% select(Intercept, age0), m) frame %>% select(-Intercept) } # function to get random effects predictions ran_pred_fun <- function(m){ crossing(age0 = seq(0, 8, 1), PROC_CID = m@frame$PROC_CID) %>% mutate(ran_pred = predict(m, newdata = .), age = age0 + 16) } ran_pred_fun <- function(m){ augment(m) } # get fixed and random efects, and also combine them. (df1_nested <- df1_nested %>% mutate(fixed_pred = map(fit2, fixed_pred_fun), ran_pred = map(fit2, ran_pred_fun), combined_pred = map2(fixed_pred, ran_pred, full_join))) ``` ```{r} df1_nested <- df1_nested %>% mutate(ran_pred = map(fit2, augment)) df1_nested %>% unnest(ran_pred) %>% mutate(age = age0 + 16) %>% ggplot(aes(x = age, y = .fitted, group = PROC_CID)) + geom_line(size = .25, alpha = .5, color = "darkgreen") + facet_grid(~trait) + theme_classic() ``` ## Plot Now let's ```{r} df1_nested %>% select(trait, combined_pred) %>% unnest(combined_pred) %>% ggplot(aes(x = age)) + geom_line(aes(y = ran_pred, color = trait, group = PROC_CID), size = .25, alpha = .5) + geom_line(aes(y = fixed_pred), size = 2) + lims(y = c(0,4)) + labs(x = "Age", y = "Composite Rating", title = "Simple Growth Models") + facet_grid(~trait) + theme_classic() + theme(axis.text = element_text(face = "bold", size = rel(1.2)), axis.title = element_text(face = "bold", size = rel(1.2)), legend.title = element_text(face = "bold", size = rel(1.2)), legend.position = "none", plot.title = element_text(face = "bold", size = rel(1.2), hjust = .5)) ``` And with confidence bands: ```{r} df1_nested %>% select(trait, fixed_pred) %>% unnest(fixed_pred) %>% ggplot(aes(x = age, y = fixed_pred, fill = trait)) + geom_ribbon(aes(ymin=fixed_pred-1.96*SE, ymax=fixed_pred+1.96*SE),alpha=0.2) + geom_line(size = 2) + labs(x = "Age", y = "Predicted Values") + ylim(0,4) + facet_grid(~trait) + theme_classic() + theme(axis.text = element_text(face = "bold", size = rel(1.2)), axis.title = element_text(face = "bold", size = rel(1.2)), legend.title = element_text(face = "bold", size = rel(1.2)), legend.position = "none", plot.title = element_text(face = "bold", size = rel(1.2), hjust = .5)) ``` ```{r} df1_nested %>% mutate(df = map_dbl(fit2, df.residual)) %>% select(trait, fixed_pred, df) %>% unnest(fixed_pred) %>% ggplot(aes(x = age, y = fixed_pred, fill = trait)) + stat_dist_lineribbon( aes(dist = "student_t", arg1 = unique(df), arg2 = fixed_pred, arg3 = SE), alpha = 1/4 ) + facet_grid(~trait) + theme_classic() + theme(axis.text = element_text(face = "bold", size = rel(1.2)), axis.title = element_text(face = "bold", size = rel(1.2)), legend.title = element_text(face = "bold", size = rel(1.2)), legend.position = "none", plot.title = element_text(face = "bold", size = rel(1.2), hjust = .5)) ``` # Two-Level Categorical Moderator Let's start with the basic syntax:

Level 1: $Y_{ij} = \beta_{0j} + \beta_{1j}*time_{1j} + \varepsilon{ij}$
Level 2:

$\beta_{0j} = \gamma_{00} + \gamma_{01}*X_{2j} + U_{0j}$
$\beta_{1j} = \gamma_{10} + \gamma_{11}*X_{2j} + U_{1j}$

Now let's swap that out for a 2 group sample from the present data:

Level 1: $Y_{ij} = \beta_{0j} + \beta_{1j}*age0_{ij} + \varepsilon{ij}$
Level 2:

$\beta_{0j} = \gamma_{00} + \gamma_{01}*groupsNone + U_{0j}$
$\beta_{1j} = \gamma_{10} + \gamma_{11}*groupsNone + U_{1j}$

Variable | D1 --------- | --- Jail | 0 None | 1 ```{r} df1_nested <- df1_nested %>% mutate(data_2g = map(data, function(x) x %>% filter(groups != "CommServ")), fit3 = map(data_2g, ~lmer(value ~ 1 + age0*groups + (age0 | PROC_CID), data = .))) ``` When we plot these, we are plotting the simple slopes. Subbing 1 and 0 into the equations above we end up with the following for the groups. None: $Y_{ij} = \gamma_{00} + \gamma_{10}*age$ Jail: $Y_{ij} = \gamma_{00} + \gamma_{01} + (\gamma_{10} + \gamma_{11})*age$ ```{r} fixed_pred_fun <- function(m){ frame <- crossing(Intercept = 1, age0 = seq(0, 8, .1), groups = c("Jail", "None")) %>% mutate(fixed_pred = predict(m, newdata = ., re.form = NA), age = age0 + 16, groupsn = as.numeric(mapvalues(groups, unique(groups), c(1,0))), Int = age0*groupsn) frame$SE <- pv_fun(frame %>% select(Intercept, age0, groupsn, Int), m) frame %>% select(-Intercept, -groupsn, -Int) } ran_pred_fun <- function(m){ crossing(age0 = seq(0, 8, 1), PROC_CID = m@frame$PROC_CID) %>% left_join(m@frame %>% tbl_df %>% select(PROC_CID, groups)) %>% distinct() %>% mutate(ran_pred = predict(m, newdata = .), age = age0 + 16) } (df1_nested <- df1_nested %>% mutate(fixed_pred3 = map(fit3, fixed_pred_fun), ran_pred3 = map(fit3, ran_pred_fun), combined_pred3 = map2(fixed_pred3, ran_pred3, full_join))) ``` ```{r} df1_nested %>% select(trait, combined_pred3) %>% unnest(combined_pred3) %>% ggplot(aes(x = age)) + scale_color_manual(values = c("blue", "red")) + geom_line(aes(y = ran_pred, color = groups, group = PROC_CID), size = .25, alpha = .1) + geom_line(aes(y = fixed_pred, color = groups), size = 2) + lims(y = c(0,4)) + labs(x = "Age", y = "Composite Rating", title = "Simple Growth Models") + facet_grid(~trait) + theme_classic() + theme(axis.text = element_text(face = "bold", size = rel(1.2)), axis.title = element_text(face = "bold", size = rel(1.2)), legend.title = element_text(face = "bold", size = rel(1.2)), legend.position = "none", plot.title = element_text(face = "bold", size = rel(1.2), hjust = .5)) ``` ```{r} df1_nested %>% unnest(fixed_pred3) %>% ggplot(aes(x = age, y = fixed_pred, group = groups)) + geom_ribbon(aes(ymin = fixed_pred - SE * 1.96, ymax = fixed_pred + SE * 1.96, fill = groups), alpha = .5) + geom_line(size = 1) + facet_grid(~trait) + theme_classic() ``` ## Three-Level Categorical Moderator

Level 1: $Y_{ij} = \beta_{0j} + \beta_{1j}*age0_{ij} + \varepsilon{ij}$
Level 2:

$\beta_{0j} = \gamma_{00} + \gamma_{01}*D1 + \gamma_{02}*D2 + U_{0j}$
$\beta_{1j} = \gamma_{10} + \gamma_{11}*D1 + \gamma_{12}*D2 + U_{1j}$

Variable | D1 | D2 --------- | --- | --- Jail | 0 | 0 None | 1 | 0 CommServ | 0 | 1 ```{r} df1_nested <- df1_nested %>% mutate(fit4 = map(data, ~lmer(value ~ 1 + age0*groups + (age0 | PROC_CID), data = .))) ``` When we plot these, we are plotting the simple slopes. Subbing 1 and 0 into the equations above we end up with the following for the groups. None: $Y_{ij} = \gamma_{00} + \gamma_{10}*age$ Jail: $Y_{ij} = \gamma_{00} + \gamma_{01} + (\gamma_{10} + \gamma_{11})*age$ Community Service: $Y_{ij} = \gamma_{00} + \gamma_{02} + (\gamma_{10} + \gamma_{12})*age$ ```{r} fixed_pred_fun <- function(m){ crossing(age0 = seq(0, 8, 1), groups = c("Jail", "None", "CommServ")) %>% mutate(fixed_pred = predict(m, newdata = ., re.form = NA), age = age0 + 16) } ran_pred_fun <- function(m){ crossing(age0 = seq(0, 8, 1), PROC_CID = m@frame$PROC_CID) %>% left_join(m@frame %>% tbl_df %>% select(PROC_CID, groups)) %>% distinct() %>% mutate(ran_pred = predict(m, newdata = .), age = age0 + 16) } (df1_nested <- df1_nested %>% mutate(fixed_pred4 = map(fit4, fixed_pred_fun), ran_pred4 = map(fit4, ran_pred_fun), combined_pred4 = map2(fixed_pred4, ran_pred4, full_join))) ``` ```{r} df1_nested %>% select(trait, combined_pred4) %>% unnest(combined_pred4) %>% ggplot(aes(x = age)) + scale_color_manual(values = c("blue", "red", "black")) + geom_line(aes(y = ran_pred, color = groups, group = PROC_CID), size = .25, alpha = .1) + geom_line(aes(y = fixed_pred, color = groups), size = 2) + lims(y = c(0,4)) + labs(x = "Age", y = "Composite Rating", title = "Simple Growth Models") + facet_grid(~trait) + theme_classic() + theme(axis.text = element_text(face = "bold", size = rel(1.2)), axis.title = element_text(face = "bold", size = rel(1.2)), legend.title = element_text(face = "bold", size = rel(1.2)), legend.position = "none", plot.title = element_text(face = "bold", size = rel(1.2), hjust = .5)) ``` ## Continuous Time-Invariant Moderator ```{r} df1_nested <- df1_nested %>% mutate(fit5 = map(data, ~lmer(value ~ 1 + age0*DemPweight + (age0 | PROC_CID), data = .))) ``` ```{r} fixed_pred_fun <- function(m){ desc <- Rmisc::summarySE(m@frame, measurevar = "DemPweight") crossing(age0 = seq(0, 8, 1), DemPweight = c(desc$DemPweight - desc$sd, desc$DemPweight, desc$DemPweight + desc$sd), gender = ) %>% mutate(fixed_pred = predict(m, newdata = ., re.form = NA), age = age0 + 16, Weight = factor(DemPweight, levels = unique(DemPweight), labels = c("-1SD", "0SD", "1SD"))) } ran_pred_fun <- function(m){ crossing(age0 = seq(0, 8, 1), PROC_CID = m@frame$PROC_CID) %>% left_join(m@frame %>% tbl_df %>% select(PROC_CID, DemPweight)) %>% distinct() %>% mutate(ran_pred = predict(m, newdata = .), age = age0 + 16) } (df1_nested <- df1_nested %>% mutate(fixed_pred5 = map(fit5, fixed_pred_fun), ran_pred5 = map(fit5, ran_pred_fun), combined_pred5 = map2(fixed_pred5, ran_pred5, full_join))) ``` ```{r} df1_nested %>% select(trait, combined_pred5) %>% unnest(combined_pred5) %>% ggplot(aes(x = age)) + scale_color_manual(values = c("blue", "red", "black")) + # geom_line(aes(y = ran_pred, group = PROC_CID), size = .25, alpha = .1) + geom_line(aes(y = fixed_pred, color = Weight), size = 1) + lims(y = c(0,4)) + labs(x = "Age", y = "Composite Rating", title = "Simple Growth Models") + facet_wrap(~trait, scales = "free_y") + theme_classic() + theme(axis.text = element_text(face = "bold", size = rel(1.2)), axis.title = element_text(face = "bold", size = rel(1.2)), legend.title = element_text(face = "bold", size = rel(1.2)), legend.position = "none", plot.title = element_text(face = "bold", size = rel(1.2), hjust = .5)) ``` ```{r} df1_nested %>% unnest(fixed_pred5) %>% ggplot(aes(x = age, y = fixed_pred, group = Weight)) + geom_line(aes(color = Weight), size = 1) + facet_wrap(~trait, scales = "free_y") + theme_classic() ```