---
title: "Lab 4 Tutorial | PAF 516"
subtitle: "Spatial Autocorrelation & Hot Spot Analysis"
format:
  html:
    theme: readable
    highlight-style: tango
    toc: true
    toc-title: "Steps"
    self-contained: true
    number-sections: false
execute:
  echo: true
  warning: false
  message: false
  fig-width: 10
---

::: {.callout-tip}
## How to use this file
**First time:** Click **Render** (Cmd+Shift+K / Ctrl+Shift+K) to see all results. Then run chunk by chunk to understand each step.

This file is for learning — do not edit it. When ready for the assignment, go to:
[https://antjam-howell.github.io/paf-516-labs/lab4-assignment.html](https://antjam-howell.github.io/paf-516-labs/lab4-assignment.html)
:::

```{r}
#| label: setup
#| include: false
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE, fig.width = 10)
options(tigris_use_cache = TRUE, tigris_progress_bar = FALSE)
```

```{r}
#| label: packages
# ── One-time course environment setup ─────────────────────────────────────
if (!file.exists("~/.paf516_ready")) {
  if (!requireNamespace("renv", quietly = TRUE)) install.packages("renv")
  tmp <- tempfile(fileext = ".lock")
  download.file(
    url      = "https://raw.githubusercontent.com/AntJam-Howell/paf-516-labs/main/renv.lock",
    destfile = tmp, mode = "wb"
  )
  renv::restore(lockfile = tmp, prompt = FALSE)
  file.create("~/.paf516_ready")
}

library(tidyverse)
library(tidycensus)
library(sf)
library(spdep)        # Spatial dependence: weights, Moran's I, LISA
library(viridis)
library(patchwork)
```

```{r}
#| label: api-key
# Get a free Census API key at: https://api.census.gov/data/key_signup.html
# Run ONCE in RStudio console: census_api_key("YOUR_KEY_HERE", install = TRUE)
# Then restart R.
```

```{r}
#| label: fallback-data
# Option A (local): cbg_raw <- readRDS("../data/cbg_hardship_maricopa_2023.rds")
# Option B (GitHub):
# cbg_raw <- readRDS(url("https://raw.githubusercontent.com/AntJam-Howell/paf-516-labs/main/Lab4/data/cbg_hardship_maricopa_2023.rds"))
# If using fallback, skip to Step 2.
```


# Step 1: Pull Maricopa County Block Group Data

By now you have done this four times. We pull the same five ACS hardship
variables used in Lab 1, but this time at the **Census Block Group (CBG)**
level for Maricopa County. CBGs are smaller than tracts (roughly 600--3,000
people), giving us finer spatial resolution to detect local clustering.

```{r}
#| label: step.01-vars

hardship_vars <- c(
  pov_below50    = "C17002_002",  pov_50to99  = "C17002_003",
  pov_den        = "C17002_001",  median_income = "B19013_001",
  unemp_num      = "B23025_005",  unemp_den   = "B23025_002",
  no_hs_num      = "B15003_002",  no_hs_den   = "B15003_001",
  no_ins_num     = "B27010_017",  no_ins_den  = "B27010_001"
)
```

```{r}
#| label: step.01-pull
#| results: hide

cbg_raw <- get_acs(
  geography = "block group", variables = hardship_vars,
  state = "AZ", county = "Maricopa",
  year = 2023, survey = "acs5", geometry = TRUE, progress_bar = FALSE
)
```

```{r}
#| label: step.01-pull-offline
#| eval: false
# ── Offline fallback — run this block instead if the Census API is down ──
# Replace the get_acs() chunk above with these readRDS() calls.
# All datasets are pre-built and stored in data/ for offline use.
#
# cbg_raw <- readRDS("../data/cbg_index_lab4_complete.rds")
```


# Step 2: Build the 5-Variable Economic Hardship Index

```{r}
#| label: step.02-wrangle

cbg_wide <- cbg_raw %>%
  select(-moe) %>%
  pivot_wider(names_from = variable, values_from = estimate) %>%
  mutate(
    poverty_rate = (pov_below50 + pov_50to99) / pov_den,
    unemp_rate   = unemp_num / unemp_den,
    no_hs_rate   = no_hs_num / no_hs_den,
    no_ins_rate  = no_ins_num / no_ins_den
  )

cbg_index <- cbg_wide %>%
  filter(!is.na(poverty_rate),
         !is.na(unemp_rate),
         !is.na(no_hs_rate),
         !is.na(no_ins_rate),
         !is.na(median_income)) %>%
  mutate(
    z_poverty = as.numeric(scale(poverty_rate)),
    z_unemp   = as.numeric(scale(unemp_rate)),
    z_no_hs   = as.numeric(scale(no_hs_rate)),
    z_no_ins  = as.numeric(scale(no_ins_rate)),
    z_income  = -1 * as.numeric(scale(median_income))
  ) %>%
  mutate(
    hardship_index = (z_poverty + z_unemp + z_no_hs + z_no_ins + z_income) / 5
  )

cat("CBGs with valid index:", nrow(cbg_index), "\n")
```

```{r}
#| label: step.02-map
#| fig-height: 8

ggplot(cbg_index) +
  geom_sf(aes(fill = hardship_index), color = NA) +
  scale_fill_viridis(option = "inferno", name = "Hardship\nIndex") +
  labs(
    title    = "Economic Hardship Index — Maricopa County CBGs",
    subtitle = "Higher values = greater hardship (z-score composite)",
    caption  = "Source: ACS 2019-2023 5-Year Estimates"
  ) +
  theme_void()
```

The map shows a clear visual pattern: high-hardship CBGs appear to cluster
together in certain parts of the county, particularly in south Phoenix and
parts of the west valley. But is this clustering statistically significant,
or could it be random? That is the question spatial autocorrelation answers.


# Step 3: Build Spatial Weights

Before we can test for spatial clustering, we need to define what "neighbor"
means. We use **queen contiguity**: two CBGs are neighbors if they share any
boundary point (including corners), similar to how a queen moves in chess.

```{r}
#| label: step.03-weights

# Fix any geometry errors before building neighbor lists
cbg_index <- st_make_valid(cbg_index)

# poly2nb(): identify which CBGs share a boundary
nb <- poly2nb(cbg_index, queen = TRUE)

# nb2listw(): convert to row-standardized spatial weights
# style="W" means each neighbor gets weight 1/n (n = number of neighbors)
# zero.policy=TRUE handles any island CBGs with no neighbors
lw <- nb2listw(nb, style = "W", zero.policy = TRUE)
```

::: {.callout-tip}
## Row-standardization explained
If CBG A has 3 neighbors, each gets weight 1/3. If CBG B has 6 neighbors,
each gets weight 1/6. This way, the **spatial lag** (weighted average of
neighbors) is a true average regardless of how many neighbors a polygon has.
:::


# Step 4: Global Moran's I

Global Moran's I is a single summary statistic for the **entire study area**.
It answers: "Overall, is the economic hardship index spatially clustered,
randomly distributed, or dispersed?"

- **I near +1** = strong positive spatial autocorrelation (clustering)
- **I near 0** = spatial randomness
- **I near -1** = spatial dispersion (checkerboard)

```{r}
#| label: step.04-morans

moran_global <- moran.test(
  cbg_index$hardship_index,
  lw,
  zero.policy = TRUE
)

moran_global
```

```{r}
#| label: step.04-interpret

cat("Moran's I statistic:", round(moran_global$estimate["Moran I statistic"], 4), "\n")
cat("Expected I under randomness:", round(moran_global$estimate["Expectation"], 4), "\n")
cat("p-value:", format.pval(moran_global$p.value, digits = 3), "\n")
```

**Interpretation:** The Global Moran's I is positive and the p-value is
extremely small (well below 0.05), meaning we reject the null hypothesis of
spatial randomness. Hardship values are **significantly clustered** in
Maricopa County --- CBGs with high hardship tend to be surrounded by other
high-hardship CBGs, and vice versa.


# Step 5: Moran Scatter Plot

The Moran scatter plot visualizes the relationship between each CBG's
hardship value (x-axis) and the average hardship of its neighbors --- the
**spatial lag** (y-axis). The slope of the fitted line equals Global Moran's I.

The four quadrants define the type of spatial association:

- **HH (upper-right):** High value surrounded by high neighbors (hot spot)
- **LL (lower-left):** Low value surrounded by low neighbors (cold spot)
- **HL (lower-right):** High value surrounded by low neighbors (spatial outlier)
- **LH (upper-left):** Low value surrounded by high neighbors (spatial outlier)

```{r}
#| label: step.05-scatter
#| fig-height: 7

moran.plot(
  cbg_index$hardship_index,
  lw,
  zero.policy = TRUE,
  xlab = "Economic Hardship Index (standardized)",
  ylab = "Spatial Lag of Economic Hardship Index",
  main = "Moran Scatter Plot — Economic Hardship",
  pch  = 20,
  col  = "steelblue"
)
```

Most points fall in the HH and LL quadrants, confirming positive spatial
autocorrelation. The relatively few points in the HL and LH quadrants are
spatial outliers --- CBGs that differ markedly from their neighbors.


# Step 6: Local Moran's I (LISA)

Global Moran's I tells us clustering exists *overall*, but not *where*.
**Local Indicators of Spatial Association (LISA)** decompose the global
statistic into a value for **each CBG**, telling us:

1. Whether that specific CBG is part of a significant cluster
2. What *type* of cluster it belongs to (HH, LL, HL, or LH)

```{r}
#| label: step.06-lisa

lisa <- localmoran(
  cbg_index$hardship_index,
  lw,
  zero.policy = TRUE
)

# Attach LISA results to the spatial data frame
cbg_index$lisa_i     <- lisa[, "Ii"]
cbg_index$lisa_z     <- lisa[, "Z.Ii"]
cbg_index$lisa_p     <- lisa[, "Pr(z != E(Ii))"]

# Compute the spatial lag for quadrant classification
cbg_index$lag_hardship <- lag.listw(lw, cbg_index$hardship_index, zero.policy = TRUE)
```

```{r}
#| label: step.06-classify

cbg_index <- cbg_index %>%
  mutate(
    val_centered = hardship_index - mean(hardship_index, na.rm = TRUE),
    lag_centered = lag_hardship   - mean(lag_hardship, na.rm = TRUE),

    quadrant = case_when(
      val_centered > 0 & lag_centered > 0 ~ "HH",
      val_centered < 0 & lag_centered < 0 ~ "LL",
      val_centered > 0 & lag_centered < 0 ~ "HL",
      val_centered < 0 & lag_centered > 0 ~ "LH",
      TRUE ~ "NS"
    ),

    lisa_cluster = if_else(lisa_p <= 0.05, quadrant, "NS")
  )

table(cbg_index$lisa_cluster)
```


# Step 7: LISA Cluster Map

This is the payoff map. Each CBG is colored by its LISA classification:

- **Red (HH):** Hot spots --- high-hardship CBGs surrounded by high-hardship neighbors
- **Blue (LL):** Cold spots --- low-hardship CBGs surrounded by low-hardship neighbors
- **Light coral (HL):** High-hardship outliers in low-hardship areas
- **Light blue (LH):** Low-hardship outliers in high-hardship areas
- **Grey (NS):** Not statistically significant at p <= 0.05

```{r}
#| label: step.07-lisa-map
#| fig-height: 9

lisa_colors <- c(
  "HH" = "red",
  "LL" = "blue",
  "HL" = "#F08080",
  "LH" = "#ADD8E6",
  "NS" = "grey90"
)

cbg_index$lisa_cluster <- factor(
  cbg_index$lisa_cluster,
  levels = c("HH", "LL", "HL", "LH", "NS")
)

ggplot(cbg_index) +
  geom_sf(aes(fill = lisa_cluster), color = "white", linewidth = 0.05) +
  scale_fill_manual(
    values = lisa_colors,
    labels = c("HH (Hot Spot)",
               "LL (Cold Spot)",
               "HL (High-Low Outlier)",
               "LH (Low-High Outlier)",
               "NS (Not Significant)"),
    name   = "LISA Cluster"
  ) +
  labs(
    title    = "LISA Cluster Map — Economic Hardship Index",
    subtitle = "Maricopa County CBGs | Queen contiguity, p <= 0.05",
    caption  = "Source: ACS 2019-2023 5-Year Estimates"
  ) +
  theme_void() +
  theme(legend.position = "right")
```

```{r}
#| label: step.07-side-by-side
#| fig-height: 8

p1 <- ggplot(cbg_index) +
  geom_sf(aes(fill = hardship_index), color = NA) +
  scale_fill_viridis(option = "inferno", name = "Hardship\nIndex") +
  labs(title = "Economic Hardship Index (Raw Values)") +
  theme_void()

p2 <- ggplot(cbg_index) +
  geom_sf(aes(fill = lisa_cluster), color = "white", linewidth = 0.05) +
  scale_fill_manual(values = lisa_colors, name = "LISA\nCluster") +
  labs(title = "LISA Clusters (Significant Only)") +
  theme_void()

p1 + p2
```

The side-by-side comparison shows the power of LISA analysis: the left map
shows the raw pattern (which we could see visually), while the right map
tells us which parts of that pattern are **statistically significant**. Many
areas that *look* like they might be clusters turn out to be not significant
at the 0.05 level.

---

::: {.callout-note}
## Ready for the assignment?
Head to the [Lab 4 Assignment page](https://antjam-howell.github.io/paf-516-labs/lab4-assignment.html) to download `Lab4_Assignment.qmd`. The assignment asks you to run LISA on a single component variable (poverty rate) and compare the hot spot pattern to the composite index.
:::