--- title: "Lab 4 Assignment | PAF 516" subtitle: "Spatial Autocorrelation & Hot Spot Analysis" format: html: theme: readable highlight-style: tango toc: true toc-title: "Steps & Questions" self-contained: true number-sections: false execute: echo: true warning: false message: false fig-width: 10 --- ::: {.callout-important} ## Before you begin 1. **This file contains the same pre-filled code as the tutorial.** Steps 1–7 below produce the same output you saw in the tutorial. If this file is in the same folder as your tutorial QMD, you can knit immediately (Cmd+Shift+K / Ctrl+Shift+K) and see the same results. If you placed it in a different location, run the `packages` chunk first (Cmd+Return / Ctrl+Enter), then knit. 3. **The assignment questions are at the top of this document.** Here is the workflow: - **Q1** asks you to interpret the Global Moran's I result and identify hot spot areas from the LISA map. Knit first, then review Step 4 (Moran's I output) and Step 7 (LISA cluster map), then write your answers. - **Q2** asks you to re-run the full LISA analysis on **poverty rate alone** instead of the composite index. Fill in the scaffold chunks labeled `q2-...` at the end of this document, then re-knit. - **Q3** asks you to compare the poverty-rate hot spots to the composite index hot spots and discuss what this tells you about measurement. 4. Submit both `Lab4_Assignment_YourLastName.qmd` and `Lab4_Assignment_YourLastName.html` to Canvas. ::: # Assignment Questions ## Q1 — Interpret the Global Moran's I and LISA Results {.question} > **Text answer — no code required.** Knit the file first, then review the Step 4 output and Step 7 maps before writing your answers. - **Global Moran's I:** What is the Moran's I value and is it statistically significant? What does this tell you about the spatial distribution of economic hardship in Maricopa County? [YOUR ANSWER HERE] - **Hot spot identification:** From the LISA cluster map, identify 2--3 specific hot spot (HH) areas. Where in the county are they concentrated? Do they correspond to areas you would expect based on your knowledge of the Phoenix metro area? [YOUR ANSWER HERE] - **Cold spots and outliers:** Where are the cold spots (LL)? Are there any spatial outliers (HL or LH), and what might explain them? [YOUR ANSWER HERE] ## Q2 — LISA Analysis on Poverty Rate Alone {.question} > **Code + output required.** Rerun the entire LISA analysis on a **single component variable** --- poverty rate alone --- instead of the composite economic hardship index. Fill in the scaffold chunks at the end of this document to produce: (1) Global Moran's I for poverty rate, (2) LISA classification, (3) a LISA cluster map, and (4) a side-by-side comparison of poverty vs. composite hot spots. ::: {.callout-tip} **Hint:** The spatial weights (`lw`) are already built in Step 3. You only need to rerun `moran.test()`, `localmoran()`, and the classification/mapping code — substituting `poverty_rate` for `hardship_index`. ::: ## Q3 — Compare Poverty Hot Spots vs. Composite Index Hot Spots {.question} > **Text answer — no code required.** - **Pattern comparison:** Do the hot spots shift when you use poverty rate alone vs. the 5-variable composite index? Are some areas identified as hot spots by one measure but not the other? [YOUR ANSWER HERE] - **Measurement implications:** What does this comparison tell you about using a single variable vs. a composite index for spatial analysis? Refer back to the Module 1 discussion of constructs vs. indicators. [YOUR ANSWER HERE] - **Policy relevance:** If you were advising a city agency on where to target neighborhood investment, would you use the poverty-rate hot spots or the composite-index hot spots? Why? [YOUR ANSWER HERE] --- ```{r} #| label: packages # ── One-time course environment setup ───────────────────────────────────── if (!file.exists("~/.paf516_ready")) { if (!requireNamespace("renv", quietly = TRUE)) install.packages("renv") tmp <- tempfile(fileext = ".lock") download.file( url = "https://raw.githubusercontent.com/AntJam-Howell/paf-516-labs/main/renv.lock", destfile = tmp, mode = "wb" ) renv::restore(lockfile = tmp, prompt = FALSE) file.create("~/.paf516_ready") } library(tidyverse) library(tidycensus) library(sf) library(spdep) library(viridis) library(patchwork) ``` ```{r} #| label: api-key # Get a free Census API key at: https://api.census.gov/data/key_signup.html # Run ONCE in RStudio console: census_api_key("YOUR_KEY_HERE", install = TRUE) # Then restart R. ``` # Step 1: Pull Maricopa County Block Group Data ```{r} #| label: step.01-vars hardship_vars <- c( pov_below50 = "C17002_002", pov_50to99 = "C17002_003", pov_den = "C17002_001", median_income = "B19013_001", unemp_num = "B23025_005", unemp_den = "B23025_002", no_hs_num = "B15003_002", no_hs_den = "B15003_001", no_ins_num = "B27010_017", no_ins_den = "B27010_001" ) ``` ```{r} #| label: step.01-pull #| results: hide cbg_raw <- get_acs( geography = "block group", variables = hardship_vars, state = "AZ", county = "Maricopa", year = 2023, survey = "acs5", geometry = TRUE, progress_bar = FALSE ) ``` ```{r} #| label: step.01-pull-offline #| eval: false # ── Offline fallback — run this block instead if the Census API is down ── # Replace the get_acs() chunk above with these readRDS() calls. # All datasets are pre-built and stored in data/ for offline use. # # cbg_raw <- readRDS("../data/cbg_index_lab4_complete.rds") ``` # Step 2: Build the 5-Variable Economic Hardship Index ```{r} #| label: step.02-wrangle cbg_wide <- cbg_raw %>% select(-moe) %>% pivot_wider(names_from = variable, values_from = estimate) %>% mutate( poverty_rate = (pov_below50 + pov_50to99) / pov_den, unemp_rate = unemp_num / unemp_den, no_hs_rate = no_hs_num / no_hs_den, no_ins_rate = no_ins_num / no_ins_den ) cbg_index <- cbg_wide %>% filter(!is.na(poverty_rate), !is.na(unemp_rate), !is.na(no_hs_rate), !is.na(no_ins_rate), !is.na(median_income)) %>% mutate( z_poverty = as.numeric(scale(poverty_rate)), z_unemp = as.numeric(scale(unemp_rate)), z_no_hs = as.numeric(scale(no_hs_rate)), z_no_ins = as.numeric(scale(no_ins_rate)), z_income = -1 * as.numeric(scale(median_income)) ) %>% mutate( hardship_index = (z_poverty + z_unemp + z_no_hs + z_no_ins + z_income) / 5 ) cat("CBGs with valid index:", nrow(cbg_index), "\n") ``` ```{r} #| label: step.02-map #| fig-height: 8 ggplot(cbg_index) + geom_sf(aes(fill = hardship_index), color = NA) + scale_fill_viridis(option = "inferno", name = "Hardship\nIndex") + labs( title = "Economic Hardship Index — Maricopa County CBGs", subtitle = "Higher values = greater hardship (z-score composite)", caption = "Source: ACS 2019-2023 5-Year Estimates" ) + theme_void() ``` # Step 3: Build Spatial Weights ```{r} #| label: step.03-weights cbg_index <- st_make_valid(cbg_index) nb <- poly2nb(cbg_index, queen = TRUE) lw <- nb2listw(nb, style = "W", zero.policy = TRUE) ``` ::: {.callout-tip} ## Row-standardization explained If CBG A has 3 neighbors, each gets weight 1/3. If CBG B has 6 neighbors, each gets weight 1/6. This way, the **spatial lag** (weighted average of neighbors) is a true average regardless of how many neighbors a polygon has. ::: # Step 4: Global Moran's I ```{r} #| label: step.04-morans moran_global <- moran.test( cbg_index$hardship_index, lw, zero.policy = TRUE ) moran_global ``` ```{r} #| label: step.04-interpret cat("Moran's I statistic:", round(moran_global$estimate["Moran I statistic"], 4), "\n") cat("Expected I under randomness:", round(moran_global$estimate["Expectation"], 4), "\n") cat("p-value:", format.pval(moran_global$p.value, digits = 3), "\n") ``` # Step 5: Moran Scatter Plot ```{r} #| label: step.05-scatter #| fig-height: 7 moran.plot( cbg_index$hardship_index, lw, zero.policy = TRUE, xlab = "Economic Hardship Index (standardized)", ylab = "Spatial Lag of Economic Hardship Index", main = "Moran Scatter Plot — Economic Hardship", pch = 20, col = "steelblue" ) ``` # Step 6: Local Moran's I (LISA) ```{r} #| label: step.06-lisa lisa <- localmoran( cbg_index$hardship_index, lw, zero.policy = TRUE ) cbg_index$lisa_i <- lisa[, "Ii"] cbg_index$lisa_z <- lisa[, "Z.Ii"] cbg_index$lisa_p <- lisa[, "Pr(z != E(Ii))"] cbg_index$lag_hardship <- lag.listw(lw, cbg_index$hardship_index, zero.policy = TRUE) ``` ```{r} #| label: step.06-classify cbg_index <- cbg_index %>% mutate( val_centered = hardship_index - mean(hardship_index, na.rm = TRUE), lag_centered = lag_hardship - mean(lag_hardship, na.rm = TRUE), quadrant = case_when( val_centered > 0 & lag_centered > 0 ~ "HH", val_centered < 0 & lag_centered < 0 ~ "LL", val_centered > 0 & lag_centered < 0 ~ "HL", val_centered < 0 & lag_centered > 0 ~ "LH", TRUE ~ "NS" ), lisa_cluster = if_else(lisa_p <= 0.05, quadrant, "NS") ) table(cbg_index$lisa_cluster) ``` # Step 7: LISA Cluster Map ```{r} #| label: step.07-lisa-map #| fig-height: 9 lisa_colors <- c( "HH" = "red", "LL" = "blue", "HL" = "#F08080", "LH" = "#ADD8E6", "NS" = "grey90" ) cbg_index$lisa_cluster <- factor( cbg_index$lisa_cluster, levels = c("HH", "LL", "HL", "LH", "NS") ) ggplot(cbg_index) + geom_sf(aes(fill = lisa_cluster), color = "white", linewidth = 0.05) + scale_fill_manual( values = lisa_colors, labels = c("HH (Hot Spot)", "LL (Cold Spot)", "HL (High-Low Outlier)", "LH (Low-High Outlier)", "NS (Not Significant)"), name = "LISA Cluster" ) + labs( title = "LISA Cluster Map — Economic Hardship Index", subtitle = "Maricopa County CBGs | Queen contiguity, p <= 0.05", caption = "Source: ACS 2019-2023 5-Year Estimates" ) + theme_void() + theme(legend.position = "right") ``` ```{r} #| label: step.07-side-by-side #| fig-height: 8 p1 <- ggplot(cbg_index) + geom_sf(aes(fill = hardship_index), color = NA) + scale_fill_viridis(option = "inferno", name = "Hardship\nIndex") + labs(title = "Economic Hardship Index (Raw Values)") + theme_void() p2 <- ggplot(cbg_index) + geom_sf(aes(fill = lisa_cluster), color = "white", linewidth = 0.05) + scale_fill_manual(values = lisa_colors, name = "LISA\nCluster") + labs(title = "LISA Clusters (Significant Only)") + theme_void() p1 + p2 ``` --- # Question 2: Student Modify Code ```{r} #| label: q2-poverty-morans # YOUR CODE HERE # Hint: use moran.test() with cbg_index$poverty_rate # and the same lw weights object from Step 3 ``` ```{r} #| label: q2-poverty-lisa # YOUR CODE HERE # Hint: use localmoran() with cbg_index$poverty_rate, # then classify into HH/LL/HL/LH/NS using the same # centering + quadrant logic from Step 6 ``` ```{r} #| label: q2-poverty-lisa-map #| fig-height: 9 # YOUR CODE HERE # Hint: create a LISA cluster map for poverty_rate # using the same color scheme as Step 7 ``` ```{r} #| label: q2-comparison #| fig-height: 8 # YOUR CODE HERE # Hint: put the composite LISA map and poverty LISA map # side by side using patchwork (p1 + p2) ```