---
title: "Longitudinal Analyses with IBIS Data 1:\nUsing Mixed Models"
author: "Kevin Donovan"
date: "`r format(Sys.time(), '%B %d, %Y')`"
output: slidy_presentation
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo=FALSE, message = FALSE, warning = FALSE, fig.width = 8, fig.height = 4)

library(tidyverse)
library(readr)
library(shiny)
library(rmarkdown)
library(dagitty)
library(ggdag)
library(broom)
library(flextable)
library(ggpubr)
library(nlme)
library(lme4)
library(mice)
library(naniar)
```

# Introduction 

**Previous Session**: discussed association analyses with correlation, linear regression

**Now**: focus on longitudinal analyses and linear regression with *clustered data*

# Clustered Data

**Independent Data**: 

- $n$ independent observations
- Ex. outcome $Y$ and covariates $X$ (as vector) have $\{(X_1,Y_1), \ldots, (X_n,Y_n)\}$
- Ex. cross-sectional data with one observation per subject

**Clustered Data**

- $n$ total observations, can be grouped into sets of "dependent" observations or *clusters*
- Ex. suppose we have $K$ clusters, with cluster $k$ having sample size $n_k$

$$
\begin{align}
&\text{Cluster 1}: \{(X_{1,1},Y_{1,1}), \ldots, (X_{1,n_1},Y_{1,n_1})\} \\
&\text{Cluster 2}: \{(X_{2,1},Y_{2,1}), \ldots, (X_{2,n_2},Y_{2,n_2})\} \\
& \ldots \\
&\text{Cluster K}: \{(X_{K,1},Y_{K,1}), \ldots, (X_{K,n_K},Y_{K,n_K})\} \\
\end{align}
$$

# Clustered Data

**Examples**:

![](images/network_data.jpg){height=40%, width=40%} ![](images/long_data.png){height=45%, width=45%}

# Analysis Goals

1. Estimating **population** parameters (mean for example)
  - How to account for dependent observations in estimation and testing?
  - Do we even have to account for this?  Why?

2. Estimating **cluster-specific** parameters (within-cluster means for example)

3. Can we incorporate **both in a single model/analysis framework**?

# Longitudinal Data Exploration

For this session, we focus on **longitudinal data** 

$\leftrightarrow$ "cluster" = single participant

1. Common "first" visualization = *spaghetti plot*

```{r}
# Read in data and convert from wide to long
full_data <- read_csv("../Data/Cross-sec_full.csv", na=c(".", "", " "))

names(full_data) <- gsub(" |,",".",names(full_data))

mixed_model_data <- full_data %>%
  select(Identifiers, GROUP, Gender, Study_Site,
         V36.mullen.composite_standard_score:V12.mullen.composite_standard_score)

vars_to_convert <- names(mixed_model_data)[-1:-4]

mixed_model_data <- mixed_model_data %>%
  gather(variable, var_value, vars_to_convert) %>%
  separate(variable,c("Visit","Variable"),sep=3) %>% 
  spread(key=Variable, value=var_value) %>%
  plyr::rename(c(".mullen.composite_standard_score"="Mullen_Composite_Score")) %>%
  mutate(ASD_Diag = factor(ifelse(grepl("ASD", GROUP), "ASD_Pos", "ASD_Neg")),
         Visit=factor(Visit),
         Visit_Num=as.numeric(Visit)-1) %>%
  arrange(Identifiers, Visit) %>%
  select(Identifiers, Visit, Visit_Num, Gender, Study_Site, ASD_Diag, everything())

# Spaghetti plot
ggplot(data=mixed_model_data,
       mapping=aes(x=Visit_Num, y=Mullen_Composite_Score, color=GROUP))+
  geom_point()+
  geom_line(aes(group=Identifiers), alpha=0.25)+
  geom_smooth(method="lm", se=FALSE)+
  theme_classic()
```

# Longitudinal Data Exploration

2. Visualize patterns of missing data

```{r fig.width = 15, fig.height = 5}
# View missingness patterns
plot_1 <- gg_miss_fct(mixed_model_data %>% select(-Identifiers, -Visit_Num),
            fct=Visit)

plot_2 <- ggplot(mixed_model_data,
       mapping=aes(x=Visit_Num, y=Mullen_Composite_Score))+
  geom_miss_point()+
  theme_classic()

plot_3 <- gg_miss_var(mixed_model_data %>% select(-Identifiers, -Visit_Num),
            facet=Visit)

plot_4 <- gg_miss_var(mixed_model_data %>% select(-Identifiers, -Visit_Num),
            facet=Visit, show_pct = TRUE)

ggarrange(plotlist = list(plot_1, plot_2), nrow=1)
```

# Data analysis

**Goal**: Suppose we want to analyze the associations between variables $X$ and $Y$

* Longitudinal dependence in data maybe **a) nuisance** or **b) of interest**

* Why not just compute Pearson or Spearman correlation between $X$ and $Y$ in whole data?
  * Dependent observations $\rightarrow$ standard error formulas incorrect
  * Correlations estimates **will be correct** but **standard errors too low**
  * $\rightarrow$ p-values **too low** and confidence intervals **don't reach desired rate**
  * Difficult to correct
  
* Okay, what about linear modeling?
  * Method exist which easily adjust for within-subject dependence
  * Two methods: 1) mixed models and 2) GEE
  
# Mixed models

For simplicity, suppose we are interested in the association between $Y$ and covariate $X$

Suppose we also observe time variable $T$, observe each subject $K$ times

**Recall**: Linear regression model with $X$ and $T$ is

$$
\begin{align}
&Y_{i,j}=\beta_0+\beta_1X_{i,j}+\beta_2T_{i,j}+\epsilon_{i,j} \\
&\\
&\text{where E}(\epsilon_{i,j})=0 \text{; Var}(\epsilon_{i,j})=\sigma^2 \\
&\epsilon_{i,j} \perp \epsilon_{k,l} \text{ for }i\neq k|j\neq l
\end{align}
$$

* Dependent observations within subject $\rightarrow$ $\perp$ assumption violated
* **Result**: Estimates $\hat{\beta}$ will be correct but standard errors too low

# Mixed models

**Idea**: Let's tie together observation in the same cluster/subject using *random effects*

**Example**: Suppose want to tie observations in subject together based on *starting point*

**Model**:

$$
\begin{align}
&Y_{i,j}=\beta_0+\beta_1X_{i,j}+\beta_2T_{i,j}+\phi_i+\epsilon_{i,j} \\
&\\
&\text{where E}(\epsilon_{i,j})=0 \text{; Var}(\epsilon_{i,j})=\sigma^2 \\
&\text{where E}(\phi_{i})=0 \text{; Var}(\phi_{i})=\sigma_{\phi}^2; \text{Cor}(\epsilon_{i,j}, \epsilon_{i,l})=\rho_{j,l} \\
& \phi_{i} \perp \phi_{j} \text{ for }i\neq j \\
&\epsilon_{i,j} \perp \epsilon_{k,l} \text{ for }i\neq k
\end{align}
$$
  
# Mixed models

How is this modeling dependence?

1. Between two different subjects, at two different time points:

$$
\begin{align}
&Y_{1,1} = \beta_0+\beta_1X_{1,1}+\beta_2T_{1,1}+\phi_1+\epsilon_{1,1} \\
&Y_{2,2} = \beta_0+\beta_1X_{2,2}+\beta_2T_{2,2}+\phi_2+\epsilon_{2,2}
\end{align}
$$

**Between subjects** all pieces **independent from one another** $\rightarrow$

Variables are independent

2. Within single subject, at two different time points:

$$
\begin{align}
&Y_{1,1} = \beta_0+\beta_1X_{1,1}+\beta_2T_{1,1}+\phi_1+\epsilon_{1,1} \\
&Y_{1,2} = \beta_0+\beta_1X_{1,2}+\beta_2T_{1,2}+\phi_1+\epsilon_{1,2}
\end{align}
$$

* **Within subjects**, dependence comes from $\phi_1$ in both points
* Additional dependence comes from $\text{Cor}(\epsilon_{1,1}, \epsilon_{1,2})$

# Mixed models

- $\phi_i$ an example of a **random effect**
- Random effect = term representing aspect of **subject-specific mean**
  - i.e. randomly assigned piece to each subject which carries through all their observations
  - represents idea of longitudinal data = population data + subject-level data
  
- $\phi_i$ denoted as **random intercept**
  - Represents subject-level deviation from pop-level mean starting point
  
- $\epsilon_{i,j}$ = random fluctuation from subject-specific mean across observations
  - Also induces within-subject correlation
- $\beta_0,\beta_1, \beta_2$ denoted **fixed effects**
  - Represent **population-level** mean relationships
  
# Mixed models

Can represent these different means using our equation

1. Population level mean

$$
\text{E}(Y_{i,j}|X_{i,j}, T_{i,j}) = \beta_0+\beta_1X_{i,j}+\beta_2T_{i,j}
$$

- Averaging across subjects in population
- Since both $\phi_i$ and $\epsilon_{i,j}$ have mean 0, these are moved
- Represents "pop-level trend line" or mean

2. Subject level mean

$$
\text{E}(Y_{i,j}|X_{i,j}, T_{i,j}, \phi_i) = \beta_0+\beta_1X_{i,j}+\beta_2T_{i,j}+\phi_i
$$

- Conditional on $\phi_i$ $\rightarrow$ subject-specific effect stays
- $\epsilon_{i,j}$ has mean 0, is removed
- Represents "subject-level" trend line or mean, for each subject $i$

# Correlation structure

- Means can be decomposed as done before, but so can correlations/covariances

1. Within-subject variance 

- Only random intercept $\rightarrow$ within-subject correlation structure determined by $\text{Cor}(\epsilon_{i,j}, \epsilon_{i,l})$
- Examples of structures
  - **Unstructured**: no assumptions placed on $\text{Cor}(\epsilon_{i,j}, \epsilon_{i,l})$
  - **Independence**: assumes independent residuals
    - $\text{Cor}(\epsilon_{i,j}, \epsilon_{i,l})=0$
    - $\rightarrow$ within-subject correlation represented solely by $\phi_i$
    - Commonly used in analysis for simplicity

# Correlation structure

2. Between-subject variance 

- Subjects assumed independent
- All variation between subjects comes from the following
  1. Covariate values
  2. Variance in $\phi_i$ between subjects
  3. Variance in $\epsilon_{i,j}$
  
![](images/between_within_subject_covarance.png)

# Hierarchical Models

Thus, mixed models often referred to as **hierarchical models**

1. Have subject-level and pop-level mean structure

2. Have within-subject and between-subject variance/covariance

3. Slopes and intercepts also have levels

$$
\begin{align}
&Y_{i,j}=\beta_0+\beta_1X_{i,j}+\beta_2T_{i,j}+\phi_i+\epsilon_{i,j} \\
&Y_{i,j}=[\beta_0+\phi_i]+\beta_1X_{i,j}+\beta_2T_{i,j}+\epsilon_{i,j}\\
&\\
&Y_{i,j}=\beta_{0,i}+\beta_1X_{i,j}+\beta_2T_{i,j}+\epsilon_{i,j}\\
\end{align}
$$

- That is, subject-level slope = fixed effect + random slope

# Hierarchical Models

- We can also add in a **random slope** to reflect subject-specific relationships with a covariate of interest
- Common example: reflect subject-specific mean trend over time

$$
\begin{align}
&Y_{i,j}=\beta_0+\beta_1X_{i,j}+\beta_2T_{i,j}+\phi_{0,i}+\phi_{1,i}T_{i,j}+\epsilon_{i,j} \\
&\\
&\text{where E}(\epsilon_{i,j})=0 \text{; Var}(\epsilon_{i,j})=\sigma^2 \\
&\text{where E}(\phi_{0,i})=\text{where E}(\phi_{1,i})=0 \text{; Var}(\phi_{0,i})=\sigma_{\phi_0}^2 \text{; Var}(\phi_{1,i})=\sigma_{\phi_1}^2\\
& \text{Cor}(\phi_{0,i}, \phi_{1,i})=\rho_{\phi_{0,1}} \\
& \text{Cor}(\epsilon_{i,j}, \epsilon_{i,l})=\rho_{\epsilon_{j,l}} \\
& \phi_{i,l} \perp \phi_{j,m} \text{ for }i\neq j \\
&\epsilon_{i,j} \perp \epsilon_{k,l} \text{ for }i\neq k
\end{align}
$$

- Now have two random effects (intercept and slope for time) $\rightarrow$ can specify their correlation
  - i.e., a subjects own mean starting point may be correlated with their own time-based trajectory
  - Adds additional piece to within-subject correlation
  - Like with $\epsilon$, can specify covariance structure for random effects
  - Often **unstructured** is used
  
# Hierarchical Models

- Can now write the different model levels from this overall model:

1. Population level mean

$$
\text{E}(Y_{i,j}|X_{i,j}, T_{i,j}) = \beta_0+\beta_1X_{i,j}+\beta_2T_{i,j}
$$

- Since $\phi_{0,i}, \phi_{1,i}$ and $\epsilon_{i,j}$ have mean 0, these are removed

2. Subject level mean

$$
\text{E}(Y_{i,j}|X_{i,j}, T_{i,j}, \phi_{0,i}, \phi_{1,i}) = \beta_0+\beta_1X_{i,j}+\beta_2T_{i,j}+\phi_{0,i}+\phi_{1,i}T_{i,j}
$$ 

3. Hierarchical intercepts and slopes

$$
\begin{align}
&Y_{i,j}=\beta_0+\beta_1X_{i,j}+\beta_2T_{i,j}+\phi_{0,i}+\phi_{1,i}T_{i,j}+\epsilon_{i,j} \\
&Y_{i,j}=[\beta_0+\phi_{0,i}]+\beta_1X_{i,j}+[\beta_2+\phi_{1,i}]T_{i,j}+\epsilon_{i,j}\\
& \\
&Y_{i,j}=\beta_{0,i}+\beta_1X_{i,j}+\beta_{2,i}T_{i,j}+\epsilon_{i,j}\\
\end{align}
$$

# Mixed model examples:

See code RMD file

- What about GEE
  - Will discuss next time due to time (briefly!)