---
title: "Lab 4 Assignment | PAF 516"
subtitle: "Spatial Autocorrelation & Hot Spot Analysis"
format:
  html:
    theme: readable
    highlight-style: tango
    toc: true
    toc-title: "Steps & Questions"
    self-contained: true
    number-sections: false
execute:
  echo: true
  warning: false
  message: false
  fig-width: 10
---

::: {.callout-important}
## Before you begin
1. **This file contains the same pre-filled code as the tutorial.** Steps 1–7 below produce the same output you saw in the tutorial. If this file is in the same folder as your tutorial QMD, you can knit immediately (Cmd+Shift+K / Ctrl+Shift+K) and see the same results. If you placed it in a different location, run the `packages` chunk first (Cmd+Return / Ctrl+Enter), then knit.
3. **The assignment questions are at the top of this document.** Here is the workflow:
   - **Q1** asks you to interpret the Global Moran's I result and identify hot spot areas from the LISA map. Knit first, then review Step 4 (Moran's I output) and Step 7 (LISA cluster map), then write your answers.
   - **Q2** asks you to re-run the full LISA analysis on **poverty rate alone** instead of the composite index. Fill in the scaffold chunks labeled `q2-...` at the end of this document, then re-knit.
   - **Q3** asks you to compare the poverty-rate hot spots to the composite index hot spots and discuss what this tells you about measurement.
4. Submit both `Lab4_Assignment_YourLastName.qmd` and `Lab4_Assignment_YourLastName.html` to Canvas.
:::

# Assignment Questions

## Q1 — Interpret the Global Moran's I and LISA Results {.question}

> **Text answer — no code required.** Knit the file first, then review the Step 4 output and Step 7 maps before writing your answers.

- **Global Moran's I:** What is the Moran's I value and is it statistically significant? What does this tell you about the spatial distribution of economic hardship in Maricopa County?

[YOUR ANSWER HERE]

- **Hot spot identification:** From the LISA cluster map, identify 2--3 specific hot spot (HH) areas. Where in the county are they concentrated? Do they correspond to areas you would expect based on your knowledge of the Phoenix metro area?

[YOUR ANSWER HERE]

- **Cold spots and outliers:** Where are the cold spots (LL)? Are there any spatial outliers (HL or LH), and what might explain them?

[YOUR ANSWER HERE]


## Q2 — LISA Analysis on Poverty Rate Alone {.question}

> **Code + output required.** Rerun the entire LISA analysis on a **single component variable** --- poverty rate alone --- instead of the composite economic hardship index. Fill in the scaffold chunks at the end of this document to produce: (1) Global Moran's I for poverty rate, (2) LISA classification, (3) a LISA cluster map, and (4) a side-by-side comparison of poverty vs. composite hot spots.

::: {.callout-tip}
**Hint:** The spatial weights (`lw`) are already built in Step 3. You only need to rerun `moran.test()`, `localmoran()`, and the classification/mapping code — substituting `poverty_rate` for `hardship_index`.
:::


## Q3 — Compare Poverty Hot Spots vs. Composite Index Hot Spots {.question}

> **Text answer — no code required.**

- **Pattern comparison:** Do the hot spots shift when you use poverty rate alone vs. the 5-variable composite index? Are some areas identified as hot spots by one measure but not the other?

[YOUR ANSWER HERE]

- **Measurement implications:** What does this comparison tell you about using a single variable vs. a composite index for spatial analysis? Refer back to the Module 1 discussion of constructs vs. indicators.

[YOUR ANSWER HERE]

- **Policy relevance:** If you were advising a city agency on where to target neighborhood investment, would you use the poverty-rate hot spots or the composite-index hot spots? Why?

[YOUR ANSWER HERE]


---


```{r}
#| label: packages
# ── One-time course environment setup ─────────────────────────────────────
if (!file.exists("~/.paf516_ready")) {
  if (!requireNamespace("renv", quietly = TRUE)) install.packages("renv")
  tmp <- tempfile(fileext = ".lock")
  download.file(
    url      = "https://raw.githubusercontent.com/AntJam-Howell/paf-516-labs/main/renv.lock",
    destfile = tmp, mode = "wb"
  )
  renv::restore(lockfile = tmp, prompt = FALSE)
  file.create("~/.paf516_ready")
}

library(tidyverse)
library(tidycensus)
library(sf)
library(spdep)
library(viridis)
library(patchwork)
```

```{r}
#| label: api-key
# Get a free Census API key at: https://api.census.gov/data/key_signup.html
# Run ONCE in RStudio console: census_api_key("YOUR_KEY_HERE", install = TRUE)
# Then restart R.
```


# Step 1: Pull Maricopa County Block Group Data

```{r}
#| label: step.01-vars

hardship_vars <- c(
  pov_below50    = "C17002_002",  pov_50to99  = "C17002_003",
  pov_den        = "C17002_001",  median_income = "B19013_001",
  unemp_num      = "B23025_005",  unemp_den   = "B23025_002",
  no_hs_num      = "B15003_002",  no_hs_den   = "B15003_001",
  no_ins_num     = "B27010_017",  no_ins_den  = "B27010_001"
)
```

```{r}
#| label: step.01-pull
#| results: hide

cbg_raw <- get_acs(
  geography = "block group", variables = hardship_vars,
  state = "AZ", county = "Maricopa",
  year = 2023, survey = "acs5", geometry = TRUE, progress_bar = FALSE
)
```

```{r}
#| label: step.01-pull-offline
#| eval: false
# ── Offline fallback — run this block instead if the Census API is down ──
# Replace the get_acs() chunk above with these readRDS() calls.
# All datasets are pre-built and stored in data/ for offline use.
#
# cbg_raw <- readRDS("../data/cbg_index_lab4_complete.rds")
```


# Step 2: Build the 5-Variable Economic Hardship Index

```{r}
#| label: step.02-wrangle

cbg_wide <- cbg_raw %>%
  select(-moe) %>%
  pivot_wider(names_from = variable, values_from = estimate) %>%
  mutate(
    poverty_rate = (pov_below50 + pov_50to99) / pov_den,
    unemp_rate   = unemp_num / unemp_den,
    no_hs_rate   = no_hs_num / no_hs_den,
    no_ins_rate  = no_ins_num / no_ins_den
  )

cbg_index <- cbg_wide %>%
  filter(!is.na(poverty_rate),
         !is.na(unemp_rate),
         !is.na(no_hs_rate),
         !is.na(no_ins_rate),
         !is.na(median_income)) %>%
  mutate(
    z_poverty = as.numeric(scale(poverty_rate)),
    z_unemp   = as.numeric(scale(unemp_rate)),
    z_no_hs   = as.numeric(scale(no_hs_rate)),
    z_no_ins  = as.numeric(scale(no_ins_rate)),
    z_income  = -1 * as.numeric(scale(median_income))
  ) %>%
  mutate(
    hardship_index = (z_poverty + z_unemp + z_no_hs + z_no_ins + z_income) / 5
  )

cat("CBGs with valid index:", nrow(cbg_index), "\n")
```

```{r}
#| label: step.02-map
#| fig-height: 8

ggplot(cbg_index) +
  geom_sf(aes(fill = hardship_index), color = NA) +
  scale_fill_viridis(option = "inferno", name = "Hardship\nIndex") +
  labs(
    title    = "Economic Hardship Index — Maricopa County CBGs",
    subtitle = "Higher values = greater hardship (z-score composite)",
    caption  = "Source: ACS 2019-2023 5-Year Estimates"
  ) +
  theme_void()
```


# Step 3: Build Spatial Weights

```{r}
#| label: step.03-weights

cbg_index <- st_make_valid(cbg_index)
nb <- poly2nb(cbg_index, queen = TRUE)
lw <- nb2listw(nb, style = "W", zero.policy = TRUE)
```

::: {.callout-tip}
## Row-standardization explained
If CBG A has 3 neighbors, each gets weight 1/3. If CBG B has 6 neighbors,
each gets weight 1/6. This way, the **spatial lag** (weighted average of
neighbors) is a true average regardless of how many neighbors a polygon has.
:::


# Step 4: Global Moran's I

```{r}
#| label: step.04-morans

moran_global <- moran.test(
  cbg_index$hardship_index,
  lw,
  zero.policy = TRUE
)

moran_global
```

```{r}
#| label: step.04-interpret

cat("Moran's I statistic:", round(moran_global$estimate["Moran I statistic"], 4), "\n")
cat("Expected I under randomness:", round(moran_global$estimate["Expectation"], 4), "\n")
cat("p-value:", format.pval(moran_global$p.value, digits = 3), "\n")
```


# Step 5: Moran Scatter Plot

```{r}
#| label: step.05-scatter
#| fig-height: 7

moran.plot(
  cbg_index$hardship_index,
  lw,
  zero.policy = TRUE,
  xlab = "Economic Hardship Index (standardized)",
  ylab = "Spatial Lag of Economic Hardship Index",
  main = "Moran Scatter Plot — Economic Hardship",
  pch  = 20,
  col  = "steelblue"
)
```


# Step 6: Local Moran's I (LISA)

```{r}
#| label: step.06-lisa

lisa <- localmoran(
  cbg_index$hardship_index,
  lw,
  zero.policy = TRUE
)

cbg_index$lisa_i     <- lisa[, "Ii"]
cbg_index$lisa_z     <- lisa[, "Z.Ii"]
cbg_index$lisa_p     <- lisa[, "Pr(z != E(Ii))"]
cbg_index$lag_hardship <- lag.listw(lw, cbg_index$hardship_index, zero.policy = TRUE)
```

```{r}
#| label: step.06-classify

cbg_index <- cbg_index %>%
  mutate(
    val_centered = hardship_index - mean(hardship_index, na.rm = TRUE),
    lag_centered = lag_hardship   - mean(lag_hardship, na.rm = TRUE),

    quadrant = case_when(
      val_centered > 0 & lag_centered > 0 ~ "HH",
      val_centered < 0 & lag_centered < 0 ~ "LL",
      val_centered > 0 & lag_centered < 0 ~ "HL",
      val_centered < 0 & lag_centered > 0 ~ "LH",
      TRUE ~ "NS"
    ),

    lisa_cluster = if_else(lisa_p <= 0.05, quadrant, "NS")
  )

table(cbg_index$lisa_cluster)
```


# Step 7: LISA Cluster Map

```{r}
#| label: step.07-lisa-map
#| fig-height: 9

lisa_colors <- c(
  "HH" = "red",
  "LL" = "blue",
  "HL" = "#F08080",
  "LH" = "#ADD8E6",
  "NS" = "grey90"
)

cbg_index$lisa_cluster <- factor(
  cbg_index$lisa_cluster,
  levels = c("HH", "LL", "HL", "LH", "NS")
)

ggplot(cbg_index) +
  geom_sf(aes(fill = lisa_cluster), color = "white", linewidth = 0.05) +
  scale_fill_manual(
    values = lisa_colors,
    labels = c("HH (Hot Spot)",
               "LL (Cold Spot)",
               "HL (High-Low Outlier)",
               "LH (Low-High Outlier)",
               "NS (Not Significant)"),
    name   = "LISA Cluster"
  ) +
  labs(
    title    = "LISA Cluster Map — Economic Hardship Index",
    subtitle = "Maricopa County CBGs | Queen contiguity, p <= 0.05",
    caption  = "Source: ACS 2019-2023 5-Year Estimates"
  ) +
  theme_void() +
  theme(legend.position = "right")
```

```{r}
#| label: step.07-side-by-side
#| fig-height: 8

p1 <- ggplot(cbg_index) +
  geom_sf(aes(fill = hardship_index), color = NA) +
  scale_fill_viridis(option = "inferno", name = "Hardship\nIndex") +
  labs(title = "Economic Hardship Index (Raw Values)") +
  theme_void()

p2 <- ggplot(cbg_index) +
  geom_sf(aes(fill = lisa_cluster), color = "white", linewidth = 0.05) +
  scale_fill_manual(values = lisa_colors, name = "LISA\nCluster") +
  labs(title = "LISA Clusters (Significant Only)") +
  theme_void()

p1 + p2
```


---

# Question 2: Student Modify Code

```{r}
#| label: q2-poverty-morans

# YOUR CODE HERE
# Hint: use moran.test() with cbg_index$poverty_rate
# and the same lw weights object from Step 3
```

```{r}
#| label: q2-poverty-lisa

# YOUR CODE HERE
# Hint: use localmoran() with cbg_index$poverty_rate,
# then classify into HH/LL/HL/LH/NS using the same
# centering + quadrant logic from Step 6
```

```{r}
#| label: q2-poverty-lisa-map
#| fig-height: 9

# YOUR CODE HERE
# Hint: create a LISA cluster map for poverty_rate
# using the same color scheme as Step 7
```

```{r}
#| label: q2-comparison
#| fig-height: 8

# YOUR CODE HERE
# Hint: put the composite LISA map and poverty LISA map
# side by side using patchwork (p1 + p2)
```