---
title: "Hierarchical Models"
output:
  pdf_document: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = F)
library(tidyverse) 
library(arm)
library(knitr)
library(kableExtra)
library(lme4)
library(rstan)
library(rstanarm)
options(mc.cores=parallel::detectCores())
```


### Motivating Dataset

Recall the housing dataset from King County, WA that contains sales prices of homes across the Seattle area. 

\vfill

```{r, message = F}
seattle <- read_csv("http://math.montana.edu/ahoegh/teaching/stat408/datasets/SeattleHousing.csv")

seattle <- seattle %>% mutate(zipcode = factor(zipcode),
                              sqft_living_sq = sqft_living ^2,
                              sqft1000 = sqft_living / 1000,
                              price100000 = price / 100000,
                              scale_sqft = scale(sqft_living))
```

```{r, echo = F}
seattle %>% 
  ggplot(aes(y = price100000, x = sqft1000, color = zipcode)) +
  geom_point() + geom_smooth(method = 'lm', formula = 'y~x') + theme_bw() +
  facet_wrap(.~zipcode) + theme(legend.position = "none") +
  ggtitle('Housing price vs. Living Square Feet in King County, WA') + 
  ylab('Sales Price ($100,000)') + xlab('Living Space (1000 sqft)') + 
  theme(axis.text.x = element_text(angle = 270, hjust = 1))
```

\vfill

### Multilevel models

While we will initially just look at a model with the varying intercepts, 

\vfill

There are several different, but equivalent specifications in GH 12.5, but here is one way to look at the model.


### lmer

One common approach for hierarchical models is to use the `lmer` function in the `lme4` package. Note that the hierarchical structure we have detailed can also be applied to GLMs using `glmer`. Note that most of this code (and the textbook) is "pre-rstanarm", so it might be more intuitive to use `stan_glmer`, which we will also look at a Bayesian version in a little bit using `stan_glmer`.

\vfill


```{r, echo = T}
lmer1 <- lmer(price ~ (1 | zipcode) , data = seattle)
display(lmer1)
coef(lmer1)
```

Note the coefficients for a specific group are defined as the fixed effect + the random effect.

```{r, echo = T}
fixef(lmer1)
```

The fixed effect here corresponds to $\mu_\alpha$. The standard component associated with the random effect can also be extracted.

```{r, echo = T}
sigma.hat(lmer1)$sigma$zipcode
```


\newpage


```{r, echo = T}
ranef(lmer1)
```

#### Summarizing the model

```{r, echo = T}
fixed_ci <- round(fixef(lmer1)['(Intercept)'] + c(-2,2) * se.fixef(lmer1)['(Intercept)'])
```

The 95% interval for the fixed effects intercept is (`r prettyNum(fixed_ci[1],big.mark=",",scientific=FALSE)`, `r prettyNum(fixed_ci[2],big.mark=",",scientific=FALSE)`). This can be interpreted as the overall mean price of a house. Formally, this is more the mean of the group means.
\vfill

The 95% intervals for the group effects (or deviations from the mean price) are:
```{r}
tibble(zipcode = rownames(ranef(lmer1)$zipcode),
       lower = round(ranef(lmer1)$zipcode + -2 * se.ranef(lmer1)$zipcode), 
       upper = round(ranef(lmer1)$zipcode + 2 * se.ranef(lmer1)$zipcode)) %>% 
  kable(format.args = list(big.mark = ",")) 
```

\vfill

A more useful way to summarize the data would be to create 95%  intervals for the overall intercept 

\vfill

```{r, echo = T}
samples <- arm::sim(lmer1, n.sims = 1000)
overall <- fixef(samples)
group <- matrix(ranef(samples)$zipcode[,,1], nrow = 1000, ncol = ngrps(lmer1), byrow = F)
group_totals <- group + matrix(overall, nrow = 1000, ncol = ngrps(lmer1))
```

\newpage
```{r}
group_int <- apply(group_totals, 2, quantile, probs = c(.025,.975) )
tibble(zipcode = rownames(ranef(lmer1)$zipcode), lower =group_int[1,], upper = group_int[2,]) %>% 
  ggplot(aes(x = lower, xend = upper, y = zipcode, yend = zipcode, color = zipcode)) + 
  theme_bw()  + 
  ggtitle('Mean housing price from multilevel model') +
  xlab('Closing Price (USD)') + scale_x_continuous(breaks = c(500000, 1000000, 2000000),  
      label = c("500k", "1 million", "2 million"), limits = c(0, 3000000)) + geom_segment() +
  annotate('text', 2180000, 4.5, label ="Medina, WA") +
  geom_point(inherit.aes = F, aes(y = zipcode, x = price, color = zipcode), data = seattle, size = .2) + geom_segment(color = 'black') + labs(caption = "note: black bars represent confidence interval for mean price \n dots represent individual houses, where those more expensive than $3 million are excluded ") + 
  theme(legend.position = "none")
```

### Prediction

Note the previous figure contains uncertainty for the mean price within a particular zipcode. Similar to before you could also make predictions for a new home in an existing dataset. 

\vfill


```{r, echo = T}
sigma_alpha <- sigma.hat(lmer1)$sigma$zipcode
mu_alpha <- fixef(lmer1)["(Intercept)"]
rnorm(10, mu_alpha, sigma_alpha)
alpha_samples <- rnorm(1000, mu_alpha, sigma_alpha)
```

\vfill

```{r, echo = T}
sigma_y <- sigma.hat(lmer1)$sigma$data

new_zip <- rnorm(1000, mean = alpha_samples, sd = sigma_y)
```

\newpage
```{r}
tibble(price = new_zip) %>% ggplot(aes(x = price))  + geom_histogram(bins = 50) + theme_bw() + 
  ggtitle("Estimated price distribution for a new zipcode in King County, WA")
```

#### Adding Coefficients
The model we have just outlined does not include any additional covariates. 

\vfill

```{r, echo = T}
lmer2 <- lmer(price ~ scale_sqft + (1 |zipcode), data = seattle)

display(lmer2)
```

\newpage

\vfill


\vfill

Note: you may have to adjust the REML and optimizer options to achieve convergence
```{r, echo = T, warning = T}
lmer_nonconverge <- lmer(price ~ scale_sqft + (1 + scale_sqft|zipcode), data = seattle)
```


```{r, echo = T}
lmer3 <- lmer(price ~ scale_sqft + (1 + scale_sqft|zipcode), data = seattle,
      REML = FALSE)
display(lmer3)
```
\vfill

The fixed-effects or means of the group-level effects can be extracted.
```{r, echo = T}
fixef(lmer3)
```

Similarly, the variance of those group-level effects can also be obtained from the model.
```{r, echo = T}
sigma.hat(lmer3)$sigma
```

\newpage

## stan_glmer

Similar to how we have used `stan_glm()`, we can also use `stan_glmer()` to fit these models.

```{r, echo = T}
stan_lmer1 <- stan_glmer(price ~ (1 | zipcode) , data = seattle)
```

\vfill

```{r, echo = T}
print(stan_lmer1)
display(lmer1)
```

\newpage

```{r, echo = T}
coef(stan_lmer1)
coef(lmer1)
```


\newpage

```{r}
summary(stan_lmer1, probs = c(0.025, 0.975))
```

\newpage

We can also directly extract the simulations from the stan object.

```{r}
sims <- as.matrix(stan_lmer1)

head(sims)
```

This can be used for generating predictions and credible intervals.

\newpage

##### Final Connections

__Group-level covariates__: 

\vfill

__Interactions:__ 

\vfill

__Shrinkage:__ 

\vfill

$$\hat{\alpha}_j \approx \frac{\frac{n_j}{\sigma_y^2}\bar{y}_j + \frac{1}{\sigma^2_\alpha }\bar{y}_{all}}{\frac{n_j}{\sigma_y^2} + \frac{1}{\sigma^2_\alpha }}$$
where $\sigma^2_y$ is the variance of the data and $\sigma^2_\alpha$ is the variance of the group-level averages.

\vfill

__Selection of Random Effects:__ these varying effect models necessarily impose additional complexity on our modeling framework; however, GH suggest embracing the complexity (as it often helps directly answer research questions), moreover, they don't recommend using evidence statements to select specific random effects.