---
title: "Week 13: Activity Key"
format: pdf
editor: source
editor_options: 
  chunk_output_type: console
---


### Last Week

- Spatial EDA
- GP models to spatial data
- Spatial Prediction / Model Choice
- Anisotropic Spatial Models

### This Week

- GLM models
- Spatial GLMs

---


```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
knitr::opts_chunk$set(message = FALSE)
library(tidyverse)
library(ggmap)
library(knitr)
library(gtools)
#library(spBayes)
library(mgcv)
library(mnormt)
library(arm)
library(rstanarm)
library(rstan)
```

## Generalized Linear Model Notation
There are three components to a generalized linear model:

1. Sampling Distribution: such as Poisson or Binomial
\vfill

2. Linear combination of predictors: $\eta = X\beta$
\vfill

3. A link function to map the linear combination of predictors to the support of the sampling distribution.

\vfill

## Binary Regression Overview
Write out the complete model specification for binary regression.
\vfill

- Assume $Y_i$ is the binary response for the $i^{th}$ observation,
\begin{eqnarray*}
Y_i &\sim& Bernoulli(\pi_i)\\
logit(\pi_i) &=& X_i \beta, \\
\text{or } \Phi ^{-1}(\pi_i) &=& X_i \beta
\end{eqnarray*}

\vfill

- where $logit(\pi_i) = log \left(\frac{\pi_i}{1-\pi_i}\right)$
\vfill

- where $\Phi() =$ is the CDF of a standard normal distribution

\newpage
Latent interpretation of probit model:

Let $z_i > 0$ if $y_i = 1$. Otherwise, let $z_i <0$. Then $z_i \sim N(X \beta, 1)$ is a latent continuous variable that is mapped to zero or 1.

\vfill

For a set of predictors, $X^{'}$, then $z^{'} \sim N(X^{'} \beta, 1)$ and the probability of a 1 or zero can be obtained by integrating the latent distribution.

\vfill


Consider air quality data from Colorado as a motivating example.

```{r, echo = T, echo = F}
CO <- read_csv("https://raw.githubusercontent.com/stat456/Activities/refs/heads/main/CO_air.csv")

state <- map_data("state")

colorado <- subset(state, region=="colorado")

co_map <- ggplot(data=colorado, mapping=aes(x=long, y=lat, group=group)) + 
  coord_fixed(1.3) + 
  geom_polygon(color="black", fill="white") + 
  geom_polygon(data=colorado, fill=NA, color="white") + 
  geom_polygon(color="black", fill=NA) + theme_bw()


#co_map + geom_point(aes(x = Longitude, y = Latitude, color= Exceedance_Count), data = CO, inherit.aes = F) + ggtitle("Ozone Measurements")

co_map + geom_point(aes(x = Longitude, y = Latitude, color= Exceedance), data = CO, inherit.aes = F) + ggtitle("Ozone Measurements")
```

\newpage


Interpret the output.

```{r, echo = T}
CO <- CO %>% mutate(north = as.numeric(Latitude > 38 ))
glm(Exceedance~north, family=binomial(link = 'probit'),data=CO) %>% display()
```

\vfill

```{r, echo = T}
glm(Exceedance~north, family=binomial(link = 'logit'),data=CO) %>% display()
```


\vfill

## Spatial Binary Regression
Assume $Y(\boldsymbol{s_i})$ is the binary response for $\boldsymbol{s_i}$,
\begin{eqnarray*}
Y(\boldsymbol{s_i})|\beta, w(\boldsymbol{s_i}) &\sim& Bernoulli(\pi(\boldsymbol{s_i}))\\
\Phi^{-1}(\pi(\boldsymbol{s_i})) &=& X(\boldsymbol{s_i}) \beta + w(\boldsymbol{s_i}), \\
\end{eqnarray*}
where $\boldsymbol{W} \sim N(\boldsymbol{0},\sigma^2 H(\phi))$

\vfill


\newpage

## Simulating spatial random effects for binary data


```{r, eval = T, echo = T}
N.sim <- 100
Lat.sim <- runif(N.sim,37,40)
Long.sim <- runif(N.sim,-109,-104)
phi.sim <- 1
sigmasq.sim <- 1
beta.sim <- c(-1,1)
north.sim <-  as.numeric(Lat.sim > 38)


d <- dist(cbind(Lat.sim,Long.sim), upper = T, diag = T) %>% as.matrix
H.sim <- sigmasq.sim * exp(- d / phi.sim)
w.sim <- rmnorm(1,0,H.sim)
xb.sim <- beta.sim[1] + beta.sim[2] * north.sim
y.sim <- rbinom(N.sim,1,pnorm(xb.sim + w.sim))
```

\newpage

```{r}
tibble(y = Lat.sim, x = Long.sim, response = factor(y.sim)) %>%
  ggplot(aes(y=y, x=x, color = response)) + geom_point() +
  theme_bw() + ggtitle('Binary Response') + scale_color_manual(values=c("#023FA5", "#8E063B"))

tibble(y = Lat.sim, x = Long.sim, `random \neffect` = w.sim) %>%
  ggplot(aes(y=y, x=x, color = `random \neffect`)) + geom_point() +
  theme_bw() + ggtitle('Spatial Random Effect') + 
  scale_color_gradientn(colours = colorspace::diverge_hcl(7))
```

\newpage

#### STAN: probit regression

```{r}
writeLines(readLines('probit_regression.stan'))
```


# Binary Regression
```{r warning = F, message = F, results = 'hide', echo = T}
probit_stan <- stan(file = 'probit_regression.stan', data = list(N = N.sim, y = y.sim, x = north.sim))
```

```{r warning = F, message = F, echo = T}
print(probit_stan, pars = c('beta0', 'beta1'))
glm(y.sim ~ north.sim, family = binomial(link = 'probit'))
tibble(y.sim = y.sim, north.sim = north.sim) %>% stan_glm(y.sim ~ north.sim, family = binomial(link = 'probit'), refresh = 0, data = .)
```


\newpage

# Spatial Poisson Regression

## Motivation

```{r}

co_map + geom_point(aes(x = Longitude, y = Latitude, color= Exceedance_Count), data = CO, inherit.aes = F) + ggtitle("Ozone Measurements")

```


## Poisson Regression Overview
Write out the complete model specification for Poisson regression.

\vfill

Assume $Y_i$ is the count response for the $i^{th}$ observation,
\begin{eqnarray*}
Y_i &\sim& Poisson(\lambda_i)\\
\log(\lambda_i) &=& X_i \beta,
\end{eqnarray*}

thus $\exp(X_i \beta) \geq 0$

\vfill

Next write out a Poisson regression model with spatial random effects

\begin{eqnarray*}
Y(\boldsymbol{s_i}) &\sim& Poisson(\lambda(\boldsymbol{s_i}))\\
\log(\lambda(\boldsymbol{s_i})) &=& X(\boldsymbol{s_i}) \beta + w(\boldsymbol{s_i}),
\end{eqnarray*}

\vfill

where $\boldsymbol{W} \sim N(\boldsymbol{0},\sigma^2 H(\phi))$

\newpage


## 1. Simulate and visualize spatial random effects for binary data: No Covariates

```{r, eval = T, echo = T}
N <- 100
x1 <- runif(N)
x2 <- runif(N)
phi_true <- .2
sigmasq_true <- 1

d <- dist(cbind(x1,x2), upper = T, diag = T) %>% as.matrix
H <- sigmasq_true * exp(- d / phi_true)
w_true <- rmnorm(1,0,H)
p_true <- pnorm(w_true)
y_sim <- rbinom(N,1,p_true)

sim1_dat <- tibble(x1 = x1, x2 = x2, w_true = w_true, p_true = p_true, y_sim = as.factor(y_sim))

sim1_dat %>% ggplot(aes(y = x1, x = x2, color = w_true)) + 
  geom_point() + theme_bw() + 
  scale_color_gradientn(colours = colorspace::diverge_hcl(7)) +
  ggtitle('spatial random effect')

sim1_dat %>% ggplot(aes(y = x1, x = x2, color = y_sim)) + 
  geom_point() + theme_bw() + 
  ggtitle('Binary Response') + scale_color_manual(values=c("#023FA5", "#8E063B"))
```


## 2. Fit a model for this setting

```{r}
writeLines(readLines('act13_demo.stan'))
```


```{r, results = 'hide', message = F, warning = F}
probit_stan <- stan(file = 'act13_demo.stan', data = list(N = N, y = y_sim, d = d), 
                    chains = 2, iter = 10000)
```

```{r}
print(probit_stan, pars = c('phi','sigmasq'))
```