---
title: "Lab 5 Tutorial | PAF 516"
subtitle: "Temporal Change Analysis — 2013 → 2019"
format:
  html:
    theme: readable
    highlight-style: tango
    toc: true
    toc-title: "Steps"
    self-contained: true
    number-sections: false
execute:
  echo: true
  warning: false
  message: false
  fig-width: 10
---

::: {.callout-tip}
## How to use this file

**First time:** Click **Render** (Cmd+Shift+K / Ctrl+Shift+K) to see all results. Then run chunk by chunk to understand each step.

This file is for learning — do not edit it. When ready for the assignment, go to: [Lab 5 Assignment](https://antjam-howell.github.io/paf-516-labs/lab5-assignment.html)
:::

```{r}
#| label: setup
#| include: false
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE, fig.width = 10)
options(tigris_use_cache = TRUE)
options(tigris_progress_bar = FALSE)
```

```{r}
#| label: packages
# ── One-time course environment setup ─────────────────────────────────────
# Sentinel lives in user home (~/.paf516_ready) — persists across all lab
# folders. renv restores ONCE for the entire course, not once per lab.
# To force re-install: delete ~/.paf516_ready and re-render.
if (!file.exists("~/.paf516_ready")) {
  if (!requireNamespace("renv", quietly = TRUE)) install.packages("renv")
  tmp <- tempfile(fileext = ".lock")
  download.file(
    url      = "https://raw.githubusercontent.com/AntJam-Howell/paf-516-labs/main/renv.lock",
    destfile = tmp,
    mode     = "wb"
  )
  renv::restore(lockfile = tmp, prompt = FALSE)
  file.create("~/.paf516_ready")
}

library(tidyverse)
library(tidycensus)
library(sf)
library(viridis)
library(patchwork)
library(spdep)
```

```{r}
#| label: api-key
# Get a free key at: https://api.census.gov/data/key_signup.html
# Run ONCE in the console: census_api_key("YOUR_KEY", install = TRUE)
# Then restart R.
```

# Overview

In Labs 1–4 we built, mapped, and spatially analyzed our economic hardship index at a single point in time.
This lab adds the **time dimension**: we pull the same hardship variables for two ACS vintages bracketing the post-recession decade, compute change scores on a common pooled scale, test whether changes exceed sampling noise, and compare findings to the Lab 4 LISA clusters.

**Time periods — both on 2010 TIGER/Line boundaries:**

- **Period 1:** 2009–2013 ACS 5-year estimates (`year = 2013`) — Maricopa County mid-recovery, unemployment still ~7%
- **Period 2:** 2015–2019 ACS 5-year estimates (`year = 2019`) — recovery peak, pre-COVID

Using years that share the same decennial boundary vintage means every census tract GEOID is identical across both pulls. No missing tracts, no white gaps.

**Index: 3 variables**

We use three ACS variables with excellent tract-level coverage in both 2013 and 2019:

| Variable | ACS table cells | Direction |
|---|---|---|
| Poverty rate | (C17002_002 + C17002_003) / C17002_001 | ↑ = more hardship |
| Unemployment rate | B23025_005 / B23025_002 | ↑ = more hardship |
| Median household income | B19013_001 | ↑ = LESS hardship (reverse-coded) |

Education variables (B15003) are excluded: `B15003_002` captures only "no schooling completed" — a single narrow cell that dramatically undercounts the "less than HS diploma" population and produces unstable estimates at small geographies. The three variables above have been consistently available since the 2009 ACS with near-complete tract-level coverage.
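Written out, the composite index assembled in Step 3 is the mean of the three pooled z-scores, with median income entering negatively because higher income means less hardship:

$$H_{i,t} = \frac{1}{3}\left(z^{\text{pov}}_{i,t} + z^{\text{unemp}}_{i,t} - z^{\text{inc}}_{i,t}\right), \qquad t \in \{2013, 2019\}$$

where $i$ indexes tracts and each $z$ is standardized with the pooled 2013 + 2019 mean and SD described below.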
# Step 1: Define Variables and Pull Data

```{r}
#| label: step.01-vars
# ── Variables available in both the 2013 and 2019 ACS ────────────────────
hardship_vars <- c(
  pov_below50   = "C17002_002", # Income-to-poverty ratio < 0.50
  pov_50to99    = "C17002_003", # Income-to-poverty ratio 0.50–0.99
  pov_den       = "C17002_001", # Total pop (poverty universe)
  unemp_num     = "B23025_005", # Unemployed
  unemp_den     = "B23025_002", # In labor force
  median_income = "B19013_001"  # Median household income
)
```

```{r}
#| label: step.01-pull-2013
#| results: hide
# ── Load from shared data cache or pull from Census API ───────────────────
# Pre-built RDS files in /data/ are shared across Labs 5 and 6 — no duplicate
# API calls. If the cache doesn't exist, pulls live and saves for next time.
load_or_pull <- function(rds_path, pull_fn) {
  if (file.exists(rds_path)) {
    readRDS(rds_path)
  } else {
    result <- pull_fn()
    dir.create(dirname(rds_path), showWarnings = FALSE, recursive = TRUE)
    saveRDS(result, rds_path)
    result
  }
}

# 2009–2013 ACS | year = 2013 | 2010 TIGER/Line boundaries
tract_2013_raw <- load_or_pull(
  "../data/maricopa_tracts_2013.rds",
  function() get_acs(
    geography    = "tract",
    variables    = hardship_vars,
    state        = "AZ",
    county       = "Maricopa",
    year         = 2013,
    survey       = "acs5",
    geometry     = TRUE,
    progress_bar = FALSE
  )
)

cat("2013 rows:", nrow(tract_2013_raw), "\n")
```

```{r}
#| label: step.01-pull-2013-offline
#| eval: false
# ── Offline fallback — run this block instead if the Census API is down ──
# Replace the get_acs() chunk above with this readRDS() call.
# All datasets are pre-built and stored in data/ for offline use.
#
# tract_2013_raw <- readRDS("../data/maricopa_tracts_2013.rds")
```

```{r}
#| label: step.01-pull-2019
#| results: hide
# 2015–2019 ACS | year = 2019 | also 2010 TIGER/Line boundaries
tract_2019_raw <- load_or_pull(
  "../data/maricopa_tracts_2019.rds",
  function() get_acs(
    geography    = "tract",
    variables    = hardship_vars,
    state        = "AZ",
    county       = "Maricopa",
    year         = 2019,
    survey       = "acs5",
    geometry     = TRUE,
    progress_bar = FALSE
  )
)

cat("2019 rows:", nrow(tract_2019_raw), "\n")
```

```{r}
#| label: step.01-pull-2019-offline
#| eval: false
# ── Offline fallback — run this block instead if the Census API is down ──
# Replace the get_acs() chunk above with this readRDS() call.
# All datasets are pre-built and stored in data/ for offline use.
#
# tract_2019_raw <- readRDS("../data/maricopa_tracts_2019.rds")
```

# Step 2: Reshape and Compute Rates

```{r}
#| label: step.02-widen
# ── Pivot long → wide; compute poverty rate and unemployment rate ─────────
widen_and_rate <- function(df) {
  df %>%
    select(-moe) %>%
    pivot_wider(names_from = variable, values_from = estimate) %>%
    mutate(
      poverty_rate = (pov_below50 + pov_50to99) / pov_den,
      unemp_rate   = unemp_num / unemp_den
    ) %>%
    filter(
      !is.na(poverty_rate),
      !is.na(unemp_rate),
      !is.na(median_income)
    )
}

tract_2013_wide <- widen_and_rate(tract_2013_raw)
tract_2019_wide <- widen_and_rate(tract_2019_raw)

cat("2013 tracts with complete data:", nrow(tract_2013_wide), "\n")
cat("2019 tracts with complete data:", nrow(tract_2019_wide), "\n")
```

# Step 3: Pooled Standardization

Before computing change scores, both years must be on the **same scale**. If we standardized each year separately, a z-score of 1.0 in 2013 and 1.0 in 2019 would both mean "one SD above *that year's* mean" — hiding the fact that 2013 was a harder year on average. Pooled standardization computes the mean and SD across *both years combined*, so z-scores are directly comparable across time periods.
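A toy example (made-up numbers, not Census data) makes the difference concrete: standardizing each year separately erases a uniform improvement, while pooled standardization preserves it.

```{r}
#| label: step.03-pooled-toy
# Hypothetical hardship values: year A is uniformly worse than year B
a <- c(10, 12, 14)
b <- c( 4,  6,  8)

# Per-year z-scores: each year is centered on its own mean, so both
# year means are exactly zero and the improvement disappears
c(mean(scale(a)), mean(scale(b)))

# Pooled z-scores: one mean/SD from both years stacked together,
# so year A's mean sits above zero and year B's below it
pooled <- c(a, b)
c(mean((a - mean(pooled)) / sd(pooled)),
  mean((b - mean(pooled)) / sd(pooled)))
```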
```{r}
#| label: step.03-pooled
# ── Drop geometry; select only the variables entering the index ───────────
r2013 <- tract_2013_wide %>%
  st_drop_geometry() %>%
  select(GEOID, poverty_rate, unemp_rate, median_income)

r2019 <- tract_2019_wide %>%
  st_drop_geometry() %>%
  select(GEOID, poverty_rate, unemp_rate, median_income)

# ── Pooled mean and SD: computed from all observations in both years ──────
vars_to_pool <- c("poverty_rate", "unemp_rate", "median_income")

pooled_stats <- map_dfr(vars_to_pool, function(v) {
  vals <- c(r2013[[v]], r2019[[v]])
  tibble(variable = v,
         mean     = mean(vals, na.rm = TRUE),
         sd       = sd(vals, na.rm = TRUE))
})

cat("── Pooled statistics (2013 + 2019 combined) ──\n")
print(pooled_stats)
```

```{r}
#| label: step.03-apply
# ── Standardize both years using the pooled mean and SD ──────────────────
apply_pooled_z <- function(df, stats) {
  for (v in stats$variable) {
    pm <- stats$mean[stats$variable == v]
    ps <- stats$sd[stats$variable == v]
    df[[paste0("z_", v)]] <- (df[[v]] - pm) / ps
  }
  # Reverse-code income: higher income = LOWER hardship
  df$z_median_income <- -1 * df$z_median_income
  # Composite index: row mean of the 3 standardized variables
  df$hardship_index <- rowMeans(
    df[, c("z_poverty_rate", "z_unemp_rate", "z_median_income")],
    na.rm = TRUE
  )
  return(df)
}

r2013_z <- apply_pooled_z(r2013, pooled_stats)
r2019_z <- apply_pooled_z(r2019, pooled_stats)

# ── Sanity check ──────────────────────────────────────────────────────────
# Because both years are standardized on the same pooled scale,
# the 2013 mean will be slightly positive if that was a harder year overall.
cat("Mean economic hardship index 2013:", round(mean(r2013_z$hardship_index), 3), "\n")
cat("Mean economic hardship index 2019:", round(mean(r2019_z$hardship_index), 3), "\n")
```

# Step 4: Join and Compute Change Scores

```{r}
#| label: step.04-join
# ── Attach 2019 geometry; join 2013 index on GEOID ───────────────────────
p2019_sf <- tract_2019_wide %>%
  select(GEOID, NAME) %>%
  left_join(r2019_z %>% select(GEOID, hardship_2019 = hardship_index),
            by = "GEOID")

p2013_tbl <- r2013_z %>%
  select(GEOID, hardship_2013 = hardship_index)

# inner_join keeps only tracts present in BOTH periods with complete data.
# Because 2013 and 2019 share 2010 TIGER/Line boundaries, nearly all tracts match.
change_df <- inner_join(p2019_sf, p2013_tbl, by = "GEOID")

cat("Tracts matched across 2013 and 2019:", nrow(change_df), "\n")
```

```{r}
#| label: step.04-change
# ── Change score: positive = worsened, negative = improved ────────────────
change_df <- change_df %>%
  mutate(hardship_change = hardship_2019 - hardship_2013)

cat("── Hardship change summary (2013 → 2019) ──\n")
cat("Mean change:  ", round(mean(  change_df$hardship_change, na.rm = TRUE), 3), "\n")
cat("Median change:", round(median(change_df$hardship_change, na.rm = TRUE), 3), "\n")
cat("SD of change: ", round(sd(    change_df$hardship_change, na.rm = TRUE), 3), "\n")
cat("\n")

n_worse  <- sum(change_df$hardship_change >  0.05, na.rm = TRUE)
n_better <- sum(change_df$hardship_change < -0.05, na.rm = TRUE)
n_stable <- nrow(change_df) - n_worse - n_better

cat("Tracts where hardship worsened (Δ > +0.05):", n_worse,
    sprintf("(%.1f%%)\n", 100 * n_worse / nrow(change_df)))
cat("Tracts where hardship improved (Δ < -0.05):", n_better,
    sprintf("(%.1f%%)\n", 100 * n_better / nrow(change_df)))
cat("Tracts essentially stable (|Δ| ≤ 0.05):", n_stable,
    sprintf("(%.1f%%)\n", 100 * n_stable / nrow(change_df)))
```

# Step 5: MOE Significance Test

ACS estimates carry margins of error that can be substantial.
A change in poverty rate from 15% to 18% looks meaningful but may fall within sampling noise. The Census Bureau recommends a z-test for the difference between two ACS estimates:

$$z = \frac{\hat{p}_{2019} - \hat{p}_{2013}}{\sqrt{SE_{2013}^2 + SE_{2019}^2}}, \quad SE = \frac{MOE}{1.645}$$

If |z| > 1.645, the change is significant at the 90% confidence level.

```{r}
#| label: step.05-moe
# ── Extract poverty MOE from the raw 2013 pull ────────────────────────────
moe_2013 <- tract_2013_raw %>%
  filter(variable %in% c("pov_below50", "pov_50to99", "pov_den")) %>%
  st_drop_geometry() %>%
  select(GEOID, variable, estimate, moe) %>%
  pivot_wider(names_from = variable, values_from = c(estimate, moe))

# ── Extract poverty MOE from the raw 2019 pull ────────────────────────────
moe_2019 <- tract_2019_raw %>%
  filter(variable %in% c("pov_below50", "pov_50to99", "pov_den")) %>%
  st_drop_geometry() %>%
  select(GEOID, variable, estimate, moe) %>%
  pivot_wider(names_from = variable, values_from = c(estimate, moe))

# ── Compute poverty rate and SE for each year ─────────────────────────────
# Propagated MOE for a ratio: sqrt(MOE_num² + rate² × MOE_den²) / den,
# where MOE_num combines the two poverty cells in quadrature
compute_pov_se <- function(df) {
  df %>%
    mutate(
      pov_est = (estimate_pov_below50 + estimate_pov_50to99) / estimate_pov_den,
      pov_moe = sqrt(moe_pov_below50^2 + moe_pov_50to99^2 +
                       pov_est^2 * moe_pov_den^2) / estimate_pov_den,
      pov_se  = pov_moe / 1.645
    ) %>%
    select(GEOID, pov_est, pov_se)
}

pov_2013 <- compute_pov_se(moe_2013) # SE from the 2013 pull
pov_2019 <- compute_pov_se(moe_2019) # SE from the 2019 pull

# ── Z-test: is the 2013→2019 change in poverty rate statistically real? ───
pov_sig <- inner_join(pov_2013, pov_2019, by = "GEOID",
                      suffix = c("_2013", "_2019")) %>%
  mutate(
    pov_change = pov_est_2019 - pov_est_2013,
    pov_z      = pov_change / sqrt(pov_se_2013^2 + pov_se_2019^2),
    sig_change = abs(pov_z) > 1.645
  )

cat("Tracts with statistically significant poverty rate change (90% CI):\n")
cat("  Significant:    ", sum( pov_sig$sig_change, na.rm = TRUE),
    sprintf("(%.1f%%)\n", 100 * mean(pov_sig$sig_change, na.rm = TRUE)))
cat("  Not significant:", sum(!pov_sig$sig_change, na.rm = TRUE), "\n")

# ── Attach significance flag to change_df ────────────────────────────────
change_df <- change_df %>%
  left_join(pov_sig %>% select(GEOID, pov_change, pov_z, sig_change),
            by = "GEOID")
```

# Step 6: Change Map with Diverging Palette

```{r}
#| label: step.06-change-map
#| fig-height: 8
# ── Winsorize at 2nd/98th percentile so outliers don't compress the scale ──
lo <- quantile(change_df$hardship_change, 0.02, na.rm = TRUE)
hi <- quantile(change_df$hardship_change, 0.98, na.rm = TRUE)
max_abs <- max(abs(lo), abs(hi))

change_df <- change_df %>%
  mutate(change_plot = pmax(pmin(hardship_change, max_abs), -max_abs))

# ── Blue = improved 2013→2019 | White = no change | Red = worsened ────────
ggplot(change_df) +
  geom_sf(aes(fill = change_plot), color = NA) +
  scale_fill_gradient2(
    low = "#2C7BB6", mid = "white", high = "#D7191C",
    midpoint = 0, limits = c(-max_abs, max_abs),
    name = "Economic\nHardship\nChange"
  ) +
  theme_void() +
  labs(
    title    = "Change in Economic Hardship Index (2013 → 2019)",
    subtitle = "Maricopa County Census Tracts | Pooled z-score standardization",
    caption  = "Blue = hardship decreased (improved) | Red = hardship increased (worsened) | Winsorized at 2nd/98th pct"
  )
```

```{r}
#| label: step.06-sig-map
#| fig-height: 8
# ── Significance overlay: color only tracts where change exceeds MOE ──────
ggplot(change_df) +
  geom_sf(fill = "#CCCCCC", color = NA) +
  geom_sf(data = change_df %>% filter(sig_change == TRUE),
          aes(fill = change_plot), color = NA) +
  scale_fill_gradient2(
    low = "#2C7BB6", mid = "white", high = "#D7191C",
    midpoint = 0, limits = c(-max_abs, max_abs),
    name = "Economic\nHardship\nChange", na.value = "#CCCCCC"
  ) +
  theme_void() +
  labs(
    title    = "Statistically Significant Hardship Change (2013 → 2019)",
    subtitle = "Only tracts where poverty rate change exceeds margin of error",
    caption  = "Gray = change not statistically significant at 90% confidence level"
  )
```

# Step 7: Side-by-Side — Both Periods and Change

```{r}
#| label: step.07-sidebyside
#| fig-width: 16
#| fig-height: 6
p1 <- ggplot(change_df) +
  geom_sf(aes(fill = hardship_2013), color = NA) +
  scale_fill_viridis_c(option = "inferno", direction = -1,
                       name = "Economic\nHardship\nIndex") +
  theme_void() +
  labs(title = "Economic Hardship Index 2013",
       subtitle = "2009–2013 ACS | mid-recovery")

p2 <- ggplot(change_df) +
  geom_sf(aes(fill = hardship_2019), color = NA) +
  scale_fill_viridis_c(option = "inferno", direction = -1,
                       name = "Economic\nHardship\nIndex") +
  theme_void() +
  labs(title = "Economic Hardship Index 2019",
       subtitle = "2015–2019 ACS | recovery peak")

p3 <- ggplot(change_df) +
  geom_sf(aes(fill = change_plot), color = NA) +
  scale_fill_gradient2(
    low = "#2C7BB6", mid = "white", high = "#D7191C",
    midpoint = 0, limits = c(-max_abs, max_abs),
    name = "Change"
  ) +
  theme_void() +
  labs(title = "Change (2019 − 2013)")

p1 + p2 + p3 + plot_annotation(
  title    = "Economic Hardship: Post-Recession Decade (2013–2019)",
  subtitle = "Maricopa County Census Tracts | ACS 5-Year Estimates",
  caption  = "Left/Center: viridis palette (higher = more hardship) | Right: diverging palette (blue = improved, red = worsened)"
)
```

# Step 8: Overlay Lab 4 LISA Clusters

Are the hot spots identified in Lab 4 persisting, worsening, or dissolving over the 2013–2019 period?
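As a refresher before the chunk below, the local Moran statistic that `spdep::localmoran()` computes for each tract $i$ can be written (in Anselin's form) as:

$$I_i = \frac{x_i - \bar{x}}{m_2}\sum_{j} w_{ij}\,(x_j - \bar{x}), \qquad m_2 = \frac{1}{n}\sum_{k}(x_k - \bar{x})^2$$

A tract is labeled an HH hot spot when its own standardized value and the weighted average of its neighbors are both above zero and the p-value is below 0.05, which is the `case_when()` logic in the next chunk.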
```{r}
#| label: step.08-lisa
# ── Build LISA clusters from the 2019 economic hardship index ─────────────
nb <- poly2nb(change_df, queen = TRUE)
w  <- nb2listw(nb, style = "W", zero.policy = TRUE)

lisa <- localmoran(change_df$hardship_2019, w, zero.policy = TRUE)

z_vals  <- scale(change_df$hardship_2019)[, 1]
wz_vals <- lag.listw(w, z_vals, zero.policy = TRUE)
p_vals  <- lisa[, "Pr(z != E(Ii))"]

change_df <- change_df %>%
  mutate(
    lisa_quad = case_when(
      z_vals > 0 & wz_vals > 0 & p_vals < 0.05 ~ "HH (Hot Spot)",
      z_vals < 0 & wz_vals < 0 & p_vals < 0.05 ~ "LL (Cold Spot)",
      z_vals > 0 & wz_vals < 0 & p_vals < 0.05 ~ "HL (Outlier)",
      z_vals < 0 & wz_vals > 0 & p_vals < 0.05 ~ "LH (Outlier)",
      TRUE ~ "Not Significant"
    )
  )

# ── Were 2019 hot spots already hot in 2013, or did they worsen? ──────────
cat("── Change direction for 2019 hot spots (vs. 2013 baseline) ──\n")
change_df %>%
  st_drop_geometry() %>%
  filter(lisa_quad == "HH (Hot Spot)") %>%
  mutate(direction = case_when(
    hardship_change >  0.10 ~ "Worsened 2013→2019",
    hardship_change < -0.10 ~ "Improved 2013→2019",
    TRUE                    ~ "Stable 2013→2019"
  )) %>%
  count(direction) %>%
  mutate(pct = round(100 * n / sum(n), 1)) %>%
  print()
```

```{r}
#| label: step.08-overlay
#| fig-height: 8
ggplot(change_df) +
  geom_sf(aes(fill = change_plot), color = NA) +
  scale_fill_gradient2(
    low = "#2C7BB6", mid = "white", high = "#D7191C",
    midpoint = 0, limits = c(-max_abs, max_abs),
    name = "Economic\nHardship\nChange"
  ) +
  geom_sf(data = change_df %>% filter(lisa_quad == "HH (Hot Spot)"),
          fill = NA, color = "black", linewidth = 0.5) +
  theme_void() +
  labs(
    title    = "Economic Hardship Change (2013 → 2019) with Hot Spots Outlined",
    subtitle = "Black outline = LISA HH hot spot (2019, p < 0.05)",
    caption  = "Blue = improved | Red = worsened | Black outline = hot spot boundary"
  )
```

# Step 9: Distribution of Change Scores

```{r}
#| label: step.09-histogram
#| fig-height: 5
ggplot(change_df, aes(x = hardship_change)) +
  geom_histogram(bins = 50, fill = "#8C1D40", color = "white", alpha = 0.85) +
  geom_vline(xintercept = 0, linetype = "dashed",
             color = "#FFC627", linewidth = 1) +
  geom_vline(xintercept = mean(change_df$hardship_change, na.rm = TRUE),
             linetype = "solid", color = "#2C7BB6", linewidth = 0.8) +
  theme_minimal(base_size = 12) +
  labs(
    title    = "Distribution of Economic Hardship Change Scores (2013 → 2019)",
    subtitle = "Maricopa County Census Tracts",
    x        = "Change in Economic Hardship Index (pooled z-score units)",
    y        = "Number of Tracts",
    caption  = "Dashed gold = zero change | Solid blue = mean change"
  )
```

---

::: {.callout-note}
## Ready for the Assignment

Head to the [Lab 5 Assignment](https://antjam-howell.github.io/paf-516-labs/lab5-assignment.html) to download `Lab5_Assignment.qmd`. The assignment asks you to interpret the change patterns and compare scales.
:::