--- title: "Week 13: Activity Key" format: pdf editor: source editor_options: chunk_output_type: console --- ### Last Week - Spatial EDA - GP models to spatial data - Spatial Prediction / Model Choice - Anisotropic Spatial Models ### This Week - GLM models - Spatial GLMs --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = FALSE) knitr::opts_chunk$set(message = FALSE) library(tidyverse) library(ggmap) library(knitr) library(gtools) #library(spBayes) library(mgcv) library(mnormt) library(arm) library(rstanarm) library(rstan) ``` ## Generalized Linear Model Notation There are three components to a generalized linear model: 1. Sampling Distribution: such as Poisson or Binomial \vfill 2. Linear combination of predictors: $\eta = X\beta$ \vfill 3. A link function to map the linear combination of predictors to the support of the sampling distribution. \vfill ## Binary Regression Overview Write out the complete model specification for binary regression. \vfill - Assume $Y_i$ is the binary response for the $i^{th}$ observation, \begin{eqnarray*} Y_i &\sim& Bernoulli(\pi_i)\\ logit(\pi_i) &=& X_i \beta, \\ \text{or } \Phi ^{-1}(\pi_i) &=& X_i \beta \end{eqnarray*} \vfill - where $logit(\pi_i) = log \left(\frac{\pi_i}{1-\pi_i}\right)$ \vfill - where $\Phi() =$ is the CDF of a standard normal distribution \newpage Latent interpretation of probit model: Let $z_i > 0$ if $y_i = 1$. Otherwise, let $z_i <0$. Then $z_i \sim N(X \beta, 1)$ is a latent continuous variable that is mapped to zero or 1. \vfill For a set of predictors, $X^{'}$, then $z^{'} \sim N(X^{'} \beta, 1)$ and the probability of a 1 or zero can be obtained by integrating the latent distribution. \vfill Consider air quality data from Colorado as a motivating example. ```{r, echo = T, echo = F} CO <- read_csv("https://raw.githubusercontent.com/stat456/Activities/refs/heads/main/CO_air.csv") state <- map_data("state") colorado <- subset(state, region=="colorado") co_map <- ggplot(data=colorado, mapping=aes(x=long, y=lat, group=group)) + coord_fixed(1.3) + geom_polygon(color="black", fill="white") + geom_polygon(data=colorado, fill=NA, color="white") + geom_polygon(color="black", fill=NA) + theme_bw() #co_map + geom_point(aes(x = Longitude, y = Latitude, color= Exceedance_Count), data = CO, inherit.aes = F) + ggtitle("Ozone Measurements") co_map + geom_point(aes(x = Longitude, y = Latitude, color= Exceedance), data = CO, inherit.aes = F) + ggtitle("Ozone Measurements") ``` \newpage Interpret the output. ```{r, echo = T} CO <- CO %>% mutate(north = as.numeric(Latitude > 38 )) glm(Exceedance~north, family=binomial(link = 'probit'),data=CO) %>% display() ``` \vfill ```{r, echo = T} glm(Exceedance~north, family=binomial(link = 'logit'),data=CO) %>% display() ``` \vfill ## Spatial Binary Regression Assume $Y(\boldsymbol{s_i})$ is the binary response for $\boldsymbol{s_i}$, \begin{eqnarray*} Y(\boldsymbol{s_i})|\beta, w(\boldsymbol{s_i}) &\sim& Bernoulli(\pi(\boldsymbol{s_i}))\\ \Phi^{-1}(\pi(\boldsymbol{s_i})) &=& X(\boldsymbol{s_i}) \beta + w(\boldsymbol{s_i}), \\ \end{eqnarray*} where $\boldsymbol{W} \sim N(\boldsymbol{0},\sigma^2 H(\phi))$ \vfill \newpage ## Simulating spatial random effects for binary data ```{r, eval = T, echo = T} N.sim <- 100 Lat.sim <- runif(N.sim,37,40) Long.sim <- runif(N.sim,-109,-104) phi.sim <- 1 sigmasq.sim <- 1 beta.sim <- c(-1,1) north.sim <- as.numeric(Lat.sim > 38) d <- dist(cbind(Lat.sim,Long.sim), upper = T, diag = T) %>% as.matrix H.sim <- sigmasq.sim * exp(- d / phi.sim) w.sim <- rmnorm(1,0,H.sim) xb.sim <- beta.sim[1] + beta.sim[2] * north.sim y.sim <- rbinom(N.sim,1,pnorm(xb.sim + w.sim)) ``` \newpage ```{r} tibble(y = Lat.sim, x = Long.sim, response = factor(y.sim)) %>% ggplot(aes(y=y, x=x, color = response)) + geom_point() + theme_bw() + ggtitle('Binary Response') + scale_color_manual(values=c("#023FA5", "#8E063B")) tibble(y = Lat.sim, x = Long.sim, `random \neffect` = w.sim) %>% ggplot(aes(y=y, x=x, color = `random \neffect`)) + geom_point() + theme_bw() + ggtitle('Spatial Random Effect') + scale_color_gradientn(colours = colorspace::diverge_hcl(7)) ``` \newpage #### STAN: probit regression ```{r} writeLines(readLines('probit_regression.stan')) ``` # Binary Regression ```{r warning = F, message = F, results = 'hide', echo = T} probit_stan <- stan(file = 'probit_regression.stan', data = list(N = N.sim, y = y.sim, x = north.sim)) ``` ```{r warning = F, message = F, echo = T} print(probit_stan, pars = c('beta0', 'beta1')) glm(y.sim ~ north.sim, family = binomial(link = 'probit')) tibble(y.sim = y.sim, north.sim = north.sim) %>% stan_glm(y.sim ~ north.sim, family = binomial(link = 'probit'), refresh = 0, data = .) ``` \newpage # Spatial Poisson Regression ## Motivation ```{r} co_map + geom_point(aes(x = Longitude, y = Latitude, color= Exceedance_Count), data = CO, inherit.aes = F) + ggtitle("Ozone Measurements") ``` ## Poisson Regression Overview Write out the complete model specification for Poisson regression. \vfill Assume $Y_i$ is the count response for the $i^{th}$ observation, \begin{eqnarray*} Y_i &\sim& Poisson(\lambda_i)\\ \log(\lambda_i) &=& X_i \beta, \end{eqnarray*} thus $\exp(X_i \beta) \geq 0$ \vfill Next write out a Poisson regression model with spatial random effects \begin{eqnarray*} Y(\boldsymbol{s_i}) &\sim& Poisson(\lambda(\boldsymbol{s_i}))\\ \log(\lambda(\boldsymbol{s_i})) &=& X(\boldsymbol{s_i}) \beta + w(\boldsymbol{s_i}), \end{eqnarray*} \vfill where $\boldsymbol{W} \sim N(\boldsymbol{0},\sigma^2 H(\phi))$ \newpage ## 1. Simulate and visualize spatial random effects for binary data: No Covariates ```{r, eval = T, echo = T} N <- 100 x1 <- runif(N) x2 <- runif(N) phi_true <- .2 sigmasq_true <- 1 d <- dist(cbind(x1,x2), upper = T, diag = T) %>% as.matrix H <- sigmasq_true * exp(- d / phi_true) w_true <- rmnorm(1,0,H) p_true <- pnorm(w_true) y_sim <- rbinom(N,1,p_true) sim1_dat <- tibble(x1 = x1, x2 = x2, w_true = w_true, p_true = p_true, y_sim = as.factor(y_sim)) sim1_dat %>% ggplot(aes(y = x1, x = x2, color = w_true)) + geom_point() + theme_bw() + scale_color_gradientn(colours = colorspace::diverge_hcl(7)) + ggtitle('spatial random effect') sim1_dat %>% ggplot(aes(y = x1, x = x2, color = y_sim)) + geom_point() + theme_bw() + ggtitle('Binary Response') + scale_color_manual(values=c("#023FA5", "#8E063B")) ``` ## 2. Fit a model for this setting ```{r} writeLines(readLines('act13_demo.stan')) ``` ```{r, results = 'hide', message = F, warning = F} probit_stan <- stan(file = 'act13_demo.stan', data = list(N = N, y = y_sim, d = d), chains = 2, iter = 10000) ``` ```{r} print(probit_stan, pars = c('phi','sigmasq')) ```