---
title: "Week 15: Activity"
format: gfm
editor: source
editor_options: 
  chunk_output_type: console
---

### This Week

- Intro to Areal Data
- Areal Data Visualization
- Assessing Spatial Structure in Areal Data

---


```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
knitr::opts_chunk$set(message = FALSE)
library(tidyverse)
library(ggmap)
library(knitr)
library(gtools)
library(mgcv)
library(mnormt)
library(arm)
library(rstanarm)
library(rstan)
library(viridis)
library(spdep)
```


```{r, out.width = "90%", echo = F, fig.align = 'center', fig.cap='source: https://www.politico.com/election-results/2018/montana/'}
knitr::include_graphics("MT_Map.png") 
```

Do you think this figure shows spatial structure?


If so, how can spatial information be incorporated with this data type?


## Areal Data Intro

Defining features: random observations are measured over well-defined subsets of a region, such as cities or states. Data, typically averages or totals, are recorded for geographic units or blocks.

\vfill
One way to characterize the transition from geostatistical, or point-referenced, data to areal data is that of going from a continuous spatial process to a discrete spatial process.

\vfill
Another way to characterize the transition from point pattern data to areal data is to think of areal data as the count of point pattern observations in each areal unit.
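
As a rough illustration of that last idea, the sketch below simulates a hypothetical point pattern on the unit square and counts the points falling in each cell of a 4-by-4 grid. The number of points, the grid size, and the object names are all assumptions made for this example.

```{r}
#| echo: true
set.seed(15)  # arbitrary seed so the sketch is reproducible

# Hypothetical point pattern: 200 uniform points on the unit square
pts <- data.frame(x = runif(200), y = runif(200))

# Assign each point to a cell of a 4-by-4 grid of areal units
breaks <- seq(0, 1, by = 0.25)
pts$col <- cut(pts$x, breaks, include.lowest = TRUE, labels = FALSE)
pts$row <- cut(pts$y, breaks, include.lowest = TRUE, labels = FALSE)

# Areal data: one count per grid cell
counts <- table(row = factor(pts$row, levels = 1:4),
                col = factor(pts$col, levels = 1:4))
counts
```

The individual point locations are discarded; only the 16 cell totals remain, which is the form areal data typically takes.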

\vfill

Spatial correlation is incorporated with 

\vfill

Autoregressive models on

\vfill

Model based approaches will incorporate covariates and introduce spatial structure with random effects.

\vfill

#### Areal Data Inferential Questions

Is there a spatial pattern?

\vfill

In presenting a map of expected responses, should the raw values or a smoothed response be presented?

\vfill

What values would be expected for a new set of areal units?


## Areal Data Visualization

#### Choropleth Tutorial
```{r}
#| echo: true
#devtools::install_github("UrbanInstitute/urbnmapr")

library(urbnmapr)

```


What are the objects `urbnmapr::states` and `urbnmapr::counties`?
\vfill


```{r}
urbnmapr::states
```

\vfill

```{r}
urbnmapr::counties
```
\vfill


```{r}
ggplot() +
  geom_polygon(data = urbnmapr::states,
               mapping = aes(x = long, y = lat, group = group), fill = "white", color = "grey") +
  coord_map(projection = "mercator") +
  theme_minimal()
```

```{r}
ggplot() +
  geom_polygon(data = urbnmapr::counties,
               mapping = aes(x = long, y = lat, group = group), fill = "white", color = "grey") +
  coord_map(projection = "mercator") +
  theme_minimal()
```


What is `urbnmapr::countydata`? Create a choropleth using this dataset to visualize median household income both nationally and in Montana.

\vfill
```{r}

urbnmapr::countydata
```
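
One possible starting point for the choropleth is sketched below. It assumes, following the `urbnmapr` README, that `countydata` contains `county_fips` and `medhhincome`, and that `counties` contains `county_fips` and `state_name`; check the printed objects above before relying on these names. The object name `income_map` is made up for this example.

```{r}
#| echo: true
library(tidyverse)
library(urbnmapr)

# Assumed column names (from the urbnmapr README): county_fips, medhhincome,
# state_name. Join the county-level data to the polygon outlines.
income_map <- left_join(urbnmapr::countydata, urbnmapr::counties,
                        by = "county_fips")

# National choropleth of median household income
ggplot(income_map,
       aes(x = long, y = lat, group = group, fill = medhhincome)) +
  geom_polygon(color = NA) +
  coord_map(projection = "mercator") +
  scale_fill_viridis_c(name = "Median household income") +
  theme_minimal()

# Montana only: same plot, with county borders drawn in grey
income_map %>%
  filter(state_name == "Montana") %>%
  ggplot(aes(x = long, y = lat, group = group, fill = medhhincome)) +
  geom_polygon(color = "grey") +
  coord_map(projection = "mercator") +
  scale_fill_viridis_c(name = "Median household income") +
  theme_minimal()
```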
\vfill


\vfill


## Assessing Spatial Structure in Areal Data

#### Proximity Matrix
Similar to the distance matrix with point-referenced data, a proximity matrix $W$ is used to model areal data.

\vfill

Given measurements $Y_1, \dots, Y_n$ associated with areal units $1, \dots, n$, the elements of $W$, $w_{ij}$, connect units $i$ and $j$.

\vfill

Common values for $w_{ij}$ are
$w_{ij} = 1$ if $i$ and $j$ are adjacent and $w_{ij}=0$ otherwise.

\vfill

#### Grid Example

```{r, echo = F}
d=data.frame(xmin=c(0.5,0.5,0.5,-.5,-.5,-.5,-1.5,-1.5,-1.5),
             xmax=c(1.5,1.5,1.5,.5,.5,.5,-.5,-.5,-.5),
             ymin=rep(c(.5,-.5,-1.5), 3),
             ymax=rep(c(1.5,.5,-.5), 3),
             id=c(1,2,3,4,5,6,7,8,9))
ggplot() +
  scale_x_continuous(name="") +
  scale_y_continuous(name="") +
  geom_rect(data=d, mapping=aes(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax), color="black", alpha=0.05) +
  geom_text(data=d, aes(x=xmin+(xmax-xmin)/2, y=ymin+(ymax-ymin)/2, label=id), size=4) +
  theme_minimal() +
  theme(axis.text=element_blank(),
        axis.ticks = element_blank())

```

Create an adjacency matrix with diagonal neighbors
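
A minimal sketch of one way to do this for the 3-by-3 grid above. The `row` and `col` indices are assumptions read off the plotted layout (ids 1 to 3 run down the right-hand column, 7 to 9 down the left), and `grid9` and `W_queen` are names invented here. Two cells are treated as neighbors when they differ by at most one row and one column.

```{r}
#| echo: true
# Row and column index of each cell id, read off the plotted grid
grid9 <- data.frame(id = 1:9,
                    row = rep(1:3, times = 3),
                    col = rep(3:1, each = 3))

d_row <- abs(outer(grid9$row, grid9$row, "-"))
d_col <- abs(outer(grid9$col, grid9$col, "-"))

# With diagonals: the larger of the row and column offsets equals 1
W_queen <- (pmax(d_row, d_col) == 1) * 1
dimnames(W_queen) <- list(grid9$id, grid9$id)
W_queen
```

Under this definition the center cell (id 5) has eight neighbors and each corner cell has three.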

\vfill

Create an adjacency matrix without diagonal neighbors
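
A minimal sketch for the 3-by-3 grid, again assuming `row` and `col` indices read off the plotted layout (the names `grid9_rook` and `W_rook` are invented here). Without diagonals, two cells are neighbors only when they share an edge, so the row and column offsets must sum to exactly 1.

```{r}
#| echo: true
# Row and column index of each cell id, read off the plotted grid
grid9_rook <- data.frame(id = 1:9,
                         row = rep(1:3, times = 3),
                         col = rep(3:1, each = 3))

# Without diagonals: cells must share an edge
W_rook <- (abs(outer(grid9_rook$row, grid9_rook$row, "-")) +
             abs(outer(grid9_rook$col, grid9_rook$col, "-")) == 1) * 1
dimnames(W_rook) <- list(grid9_rook$id, grid9_rook$id)
W_rook
```

Here the center cell has four neighbors and each corner cell has two.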

## Spatial Association

There are two common statistics used for assessing spatial association:
Moran's I and Geary's C.

\vfill

Moran's I
$I = \frac{n \sum_i \sum_j w_{ij} (Y_i - \bar{Y})(Y_j -\bar{Y})}{\left(\sum_{i\neq j} w_{ij}\right)\sum_i(Y_i - \bar{Y})^2}$

\vfill

Moran's I is analogous to correlation: values close to 1 indicate spatial clustering, values near -1 indicate spatial regularity (checkerboard effect), and values near 0 (more precisely, near $-1/(n-1)$) indicate no spatial association.
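
To make the formula concrete, here is a small sketch that computes Moran's I directly in base R. The function name `moran_i` and the four-unit toy example are assumptions for illustration; the function expects a symmetric 0/1 proximity matrix with a zero diagonal.

```{r}
#| echo: true
# Moran's I from the formula; y is a numeric vector, W a symmetric 0/1
# proximity matrix with zeros on the diagonal
moran_i <- function(y, W) {
  n <- length(y)
  dev <- y - mean(y)
  num <- sum(W * outer(dev, dev))  # sum_i sum_j w_ij (Y_i - Ybar)(Y_j - Ybar)
  n * num / (sum(W) * sum(dev^2))
}

# Toy example (hypothetical): four units on a line, neighbors share an edge
W_line <- (abs(outer(1:4, 1:4, "-")) == 1) * 1

moran_i(c(1, 2, 3, 4), W_line)    # smooth trend: 1/3
moran_i(c(1, -1, 1, -1), W_line)  # alternating values: -1
```

The smooth trend gives a positive value, while the alternating pattern reaches -1.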

\vfill

Geary's C
$C = \frac{(n-1)\sum_i \sum_j w_{ij}(Y_i-Y_j)^2}{2\left(\sum_{i \neq j} w_{ij}\right)\sum_i (Y_i - \bar{Y})^2}$

\vfill

Geary's C is more similar to a variogram (it has a connection to the Durbin-Watson statistic in 1-D). The statistic typically ranges from 0 to 2: values close to 0 indicate clustering, values near 1 indicate no spatial association, and values close to 2 indicate regularity.
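
A matching base R sketch for Geary's C is shown below. The function name `geary_c` and the four-unit toy example are assumptions for illustration; as with the formula, the statistic is built from squared differences between neighboring values rather than cross-products of deviations.

```{r}
#| echo: true
# Geary's C from the formula; y is a numeric vector, W a symmetric 0/1
# proximity matrix with zeros on the diagonal
geary_c <- function(y, W) {
  n <- length(y)
  sq_diff <- outer(y, y, "-")^2  # (Y_i - Y_j)^2 for every pair
  (n - 1) * sum(W * sq_diff) / (2 * sum(W) * sum((y - mean(y))^2))
}

# Toy example (hypothetical): four units on a line, neighbors share an edge
W_chain <- (abs(outer(1:4, 1:4, "-")) == 1) * 1

geary_c(c(1, 2, 3, 4), W_chain)    # smooth trend: 0.3
geary_c(c(1, -1, 1, -1), W_chain)  # alternating values: 1.5
```

The smooth trend gives a value well below 1, and the alternating pattern gives a value above 1.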

\vfill

## Spatial Association Exercise

Consider the following scenarios and use the following 4-by-4 grid

```{r, echo = F}
d4 <- tibble(xmin = rep(c(3.5, 2.5, 1.5, 0.5), each = 4),
             x = rep(4:1, each =4),      
             xmax = rep(c(3.5, 2.5, 1.5, 0.5), each = 4) +1,
             ymin = rep(c(3.5, 2.5, 1.5, 0.5), 4), 
             y = rep(4:1, 4),
             ymax = rep(c(3.5, 2.5, 1.5, 0.5), 4) +1,
             rpos = rep(4:1, 4),
             cpos = rep(1:4, each = 4),
             id=16:1)
ggplot() + 
  scale_x_continuous(name="column") + 
  scale_y_continuous(name="row",breaks = 1:4,labels = c('4','3','2', '1') ) +
  geom_rect(data=d4, mapping=aes(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax), color="black", alpha=0.05) +
  geom_text(data=d4, aes(x=xmin+(xmax-xmin)/2, y=ymin+(ymax-ymin)/2, label=id), size=4) +
  theme_minimal()
```

and proximity matrix 

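```{r}
# Rook adjacency: two cells are neighbors when they share a row and sit in
# adjacent columns, or share a column and sit in adjacent rows
W <- matrix(0, 16, 16)
for (i in 1:16){
  W[i,] <- as.numeric((d4$rpos[i] == d4$rpos & (abs(d4$cpos[i] - d4$cpos) == 1)) | 
                        (d4$cpos[i] == d4$cpos & (abs(d4$rpos[i] - d4$rpos) == 1)))
}
head(W)
```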
\vfill

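For each scenario, plot the grid and calculate Moran's I and Geary's C, along with permutation-based p-values. Note that `moran.test()` and `geary.test()` from `spdep` give analytic tests, while `moran.mc()` and `geary.mc()` give permutation-based p-values.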

\newpage

1. Simulate data where the responses are i.i.d. N(0,1). 


2. Simulate data and calculate I and C for a 4-by-4 grid with a chess board approach, where "black squares" $\sim N(-2,1)$ and "white squares" $\sim N(2,1)$.

3. Simulate a multivariate normal response on a 4-by-4 grid where $y \sim N(0, (I- \rho W)^{-1})$, where $\rho = .3$ is a correlation parameter, $W$ is a proximity matrix, and $I$ here denotes the identity matrix.
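
The three scenarios could be approached along the following lines. This is a sketch under stated assumptions, not a reference solution: the seed, the number of permutations, the cell ordering from `expand.grid()`, and the names `W4`, `lw`, and `summarize_scenario` are all choices made here. It rebuilds the rook-adjacency matrix so the chunk stands alone, then uses `spdep::mat2listw()`, `spdep::moran.mc()`, and `spdep::geary.mc()`. Plotting each simulated grid is left out.

```{r}
#| echo: true
library(spdep)

set.seed(15)  # arbitrary seed so the sketch is reproducible

# 4-by-4 rook-adjacency matrix (cells that share an edge are neighbors)
cells <- expand.grid(row = 1:4, col = 1:4)
W4 <- (abs(outer(cells$row, cells$row, "-")) +
         abs(outer(cells$col, cells$col, "-")) == 1) * 1
lw <- mat2listw(W4, style = "B")  # binary weights, matching the 0/1 matrix

# Scenario 1: i.i.d. N(0, 1) responses
y_iid <- rnorm(16)

# Scenario 2: chess board, "black" ~ N(-2, 1) and "white" ~ N(2, 1)
black <- (cells$row + cells$col) %% 2 == 0
y_chess <- rnorm(16, mean = ifelse(black, -2, 2), sd = 1)

# Scenario 3: y ~ N(0, (I - rho W)^{-1}) with rho = 0.3,
# simulated through the Cholesky factor of the covariance matrix
rho <- 0.3
Sigma <- solve(diag(16) - rho * W4)
y_mvn <- as.numeric(t(chol(Sigma)) %*% rnorm(16))

# Moran's I and Geary's C with permutation-based p-values
summarize_scenario <- function(y, listw, nsim = 999) {
  m <- moran.mc(y, listw, nsim = nsim)
  g <- geary.mc(y, listw, nsim = nsim)
  c(I = unname(m$statistic), I_p = m$p.value,
    C = unname(g$statistic), C_p = g$p.value)
}

results <- rbind(iid   = summarize_scenario(y_iid, lw),
                 chess = summarize_scenario(y_chess, lw),
                 mvn   = summarize_scenario(y_mvn, lw))
round(results, 3)
```

As far as I understand the `spdep` defaults, both permutation tests use `alternative = "greater"`, which looks for positive spatial association; to test for regularity in the chess board scenario, pass `alternative = "less"` instead.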