--- title: "Week 15: Activity" format: gfm editor: source editor_options: chunk_output_type: console --- ### Last Time - Intro to Areal Data - Areal Data Visualization - Assessing Spatial Structure in Areal Data ### This Time - Assessing Spatial Structure in Areal Data - Spatial Smoothing with Areal Data --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = FALSE) knitr::opts_chunk$set(message = FALSE) library(tidyverse) library(ggmap) library(knitr) library(gtools) library(mgcv) library(mnormt) library(arm) library(rstanarm) library(rstan) library(viridis) library(spdep) library(urbnmapr) ``` ## Spatial Association There are two common statistics used for assessing spatial association: Moran's I and Geary's C. \vfill Moran's I $I =n \sum_i \sum_j w_{ij} (Y_i - \bar{Y})(Y_j -\bar{Y}) / (\sum_{i\neq j \;w_{ij}})\sum_i(Y_i - \bar{Y})^2$ \vfill Moran's I is analogous to correlation, where values close to 1 exhibit spatial clustering and values near -1 show spatial regularity (checkerboard effect). \vfill Geary's C $C=(n-1)\sum_i \sum_j w_{ij}(Y_i-Y_j)^2 / 2(\sum_{i \neq j \; w_{ij}})\sum_i (Y_i - \bar{Y})^2$ \vfill Geary's C is more similar to a variogram (has a connection to Durbin-Watson in 1-D). The statistics ranges from 0 to 2; values close to 2 exhibit clustering and values close to 0 show repulsion. A Geary's C near 1 would be expected with no spatial structure. \vfill ## Spatial Association Exercise Consider the following scenarios and use the following 4-by-4 grid ```{r, echo = F} d4 <- tibble(xmin = rep(c(3.5, 2.5, 1.5, 0.5), each = 4), x = rep(4:1, each =4), xmax = rep(c(3.5, 2.5, 1.5, 0.5), each = 4) +1, ymin = rep(c(3.5, 2.5, 1.5, 0.5), 4), y = rep(4:1, 4), ymax = rep(c(3.5, 2.5, 1.5, 0.5), 4) +1, rpos = rep(4:1, 4), cpos = rep(1:4, each = 4), id=16:1) ggplot() + scale_x_continuous(name="column") + scale_y_continuous(name="row",breaks = 1:4,labels = c('4','3','2', '1') ) + geom_rect(data=d4, mapping=aes(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax), color="black", alpha=0.05) + geom_text(data=d4, aes(x=xmin+(xmax-xmin)/2, y=ymin+(ymax-ymin)/2, label=id), size=4) + theme_minimal() ``` and proximity matrix ```{r} W <- matrix(0, 16, 16) for (i in 1:16){ W[i,] <- as.numeric((d4$rpos[i] == d4$rpos & (abs(d4$cpos[i] - d4$cpos) == 1)) | (d4$cpos[i] == d4$cpos & (abs(d4$rpos[i] - d4$rpos) == 1))) } head(W) ``` \vfill for each scenario plot the grid, calculate I and G, along with permutation-based p-values. 1. Simulate data where the responses are i.i.d. N(0,1). ```{r, echo = T} #| warning: false d4$z <- rnorm(16) ggplot() + geom_tile(data=d4, mapping=aes(x = x, y = y, fill = z)) + geom_text(data=d4, aes(x=xmin+(xmax-xmin)/2, y=ymin+(ymax-ymin)/2, label=id), size=4) + theme_minimal() + scale_fill_gradient2(midpoint=0, low="blue", mid="white", high="red", space ="Lab" ) moran.test(d4$z, mat2listw(W), alternative = 'two.sided') #moran.plot(d4$z, mat2listw(W)) geary.test(d4$z, mat2listw(W), alternative = 'two.sided') ``` \vfill 2. Simulate data and calculate I and G for a 4-by-4 grid with a chess board approach, where "black squares" $\sim N(-2,1)$ and "white squares" $\sim N(2,1)$. \vfill 3. Simulate multivariate normal response on a 4-by-4 grid where $y \sim N(0, (I- \rho W)^{-1})$, where $\rho = .3$ is a correlation parameter and $W$ is a proximity matrix. ## Spatial Smoothing Spatial smoothing results in a "smoother" spatial surface, by sharing information from across the neighborhood structure. \vfill This smoothing is akin to fitted values (expected values) in a traditional modeling framework. \vfill One option is replacing $Y_i$ with $$\hat{Y}_i = \sum_j w_{ij} Y_j / w_{i+}$$ where $w_{i+} = \sum_j w_{ij}$. \vfill What are some pros and cons of this smoother? \vfill ### "Exponential" smoother Another option would be to use: $$\hat{Y}_i^* = (1 - \alpha) Y_i + \hat{Y}_i$$ \vfill Compare $\hat{Y}_i^*$ with $\hat{Y}_i$. \vfill What is the impact of $\alpha?$ _This is essentially the exponential smoother from time series._ ## Areal Data Example Recall the motivating image, ```{r, out.width = "95%", echo = F, fig.align = 'center', fig.cap='source: https://www.politico.com/election-results/2018/montana/'} knitr::include_graphics("MT_Map.png") ``` using the following dataset ```{r} Tester <- read_csv('Tester_Results.csv') Tester <- Tester %>% mutate(Tester_Prop = TESTER / (TESTER + ROSENDALE + BRECKENRIDGE), county_fips = as.character(FIPS)) ``` recreate the image. Recall `urbnmapr::counties` has a shape file with county level boundaries. recreate the image. ## Adjacency Matrix Follow the code below to create an adjacency matrix for Montana. ```{r, eval = T, echo = T} MT_sf <- get_urbn_map("counties", sf = TRUE) |> filter(state_abbv == 'MT') |> arrange(county_fips) mt_W <- nb2mat(poly2nb(MT_sf), style = 'B') MT_list <- nb2listw(poly2nb(MT_sf), style = 'B') ``` ## Moran's I / Geary's C Using the Tester - Rosendale election results and the adjacency matrix compute and interpret Moran's I and Geary's C with the proportion voting for Tester. ## Now consider some covariates to explain the response Consider a linear model with county population ```{r, echo = T} library(usmap) usmap::countypop ``` Extract the residuals create a choropleth and test for spatial structure.