--- title: "Lab 1 Tutorial | PAF 516" subtitle: "Measurement, Indexing & Multi-Scale Mapping" format: html: theme: readable highlight-style: tango toc: true toc-title: "Steps" self-contained: true number-sections: false execute: echo: true warning: false message: false fig-width: 10 --- ::: {.callout-tip} ## How to use this file **First time:** Click the **Render** button (or Cmd+Shift+K / Ctrl+Shift+K) to produce a complete HTML with all results. Then come back and run chunk by chunk to understand each step. **Chunk by chunk:** Place cursor inside any chunk → Cmd+Return (Mac) / Ctrl+Enter (PC). This file is for learning — do not edit it. When you are ready for the assignment, download `Lab1_Assignment.qmd` from the [Lab 1 Assignment page](https://antjam-howell.github.io/paf-516-labs/lab1-assignment.html). ::: ```{r} #| label: setup #| include: false knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE, fig.width = 10) options(tigris_use_cache = TRUE) options(tigris_progress_bar = FALSE) ``` ```{r} #| label: packages # ── One-time course environment setup ───────────────────────────────────── if (!file.exists("~/.paf516_ready")) { if (!requireNamespace("renv", quietly = TRUE)) install.packages("renv") tmp <- tempfile(fileext = ".lock") download.file( url = "https://raw.githubusercontent.com/AntJam-Howell/paf-516-labs/main/renv.lock", destfile = tmp, mode = "wb" ) renv::restore(lockfile = tmp, prompt = FALSE) file.create("~/.paf516_ready") } library(tidyverse) library(tidycensus) library(sf) library(tigris) library(psych) library(corrplot) library(viridis) library(ggplot2) library(patchwork) ``` ```{r} #| label: build-tools #| eval: false # ── One-time build tools setup ───────────────────────────────────────────── # R needs a system-level compiler to install certain packages in this course. # This is a one-time setup — you will not need to repeat it in future labs. # # Windows: run these two lines in your RStudio Console: # install.packages("installr") # installr::install.Rtools() # # Mac: run this line in your Terminal (Applications → Utilities → Terminal): # xcode-select --install # # Once complete, restart RStudio, then run the packages chunk above. ``` ```{r} #| label: api-key # Get a free Census API key at: https://api.census.gov/data/key_signup.html # After receiving your key by email, run this line ONCE in the RStudio console: # census_api_key("YOUR_KEY_HERE", install = TRUE) # Then restart R: Session → Restart R # You only need to do this once — the key is saved to your .Renviron file. ``` ```{r} #| label: fallback-data # ── FALLBACK: Load pre-built datasets if Census API is unavailable ───── # Uncomment Option B lines below and skip Steps 1–4. # # Option A — from local file (save .rds files to PAF516/Lab1/data/ first): # county_complete <- readRDS("../data/hardship_counties_us.rds") # CenDF_complete <- readRDS("../data/hardship_tracts_az.rds") # cbg_complete <- readRDS("../data/hardship_cbg_maricopa.rds") # # Option B — directly from GitHub: # county_complete <- readRDS(url("https://raw.githubusercontent.com/AntJam-Howell/paf-516-labs/main/Lab1/data/hardship_counties_us.rds")) # CenDF_complete <- readRDS(url("https://raw.githubusercontent.com/AntJam-Howell/paf-516-labs/main/Lab1/data/hardship_tracts_az.rds")) # cbg_complete <- readRDS(url("https://raw.githubusercontent.com/AntJam-Howell/paf-516-labs/main/Lab1/data/hardship_cbg_maricopa.rds")) # # If using fallback data, skip to Step 5. ``` # Step 1: Retrieve Census Variables We build an **Economic Hardship Index** from 5 computed rates at the Arizona tract level. Each rate requires a numerator/denominator pair from the ACS, so the variable list contains 10 census codes — but they collapse into 5 measures: poverty rate, unemployment rate, no high school diploma rate, no health insurance rate, and median income. ```{r} #| label: step.01 #| results: hide hardship_vars <- c( pov_below50 = "C17002_002", # Income-to-poverty ratio under .50 pov_50to99 = "C17002_003", # Income-to-poverty ratio .50 to .99 pov_den = "C17002_001", # Total population (poverty universe) median_income = "B19013_001", # Median household income pct_unemp_num = "B23025_005", # Unemployed population pct_unemp_den = "B23025_002", # In labor force pct_no_hs_num = "B15003_002", # Population with less than high school pct_no_hs_den = "B15003_001", # Total population 25+ pct_no_ins_num = "B27010_017", # Without health insurance (18–34) pct_no_ins_den = "B27010_001" # Total population (insurance universe) ) CenDF <- get_acs( geography = "tract", variables = hardship_vars, state = "AZ", year = 2023, survey = "acs5", geometry = TRUE, progress_bar = FALSE ) ``` ```{r} #| label: step.01-offline #| eval: false # ── Offline fallback — run this block instead if the Census API is down ── # Replace the get_acs() chunk above with these readRDS() calls. # All datasets are pre-built and stored in data/ for offline use. # # CenDF_complete <- readRDS("../data/hardship_tracts_az.rds") # CenDF_complete <- readRDS("../data/CenDF_complete_5var.rds") ``` **Variable justification:** | Variable | Census Code | Why Included | |----------|------------|--------------| | Poverty rate | C17002_002 + _003 / _001 | Direct measure of economic deprivation | | Unemployment rate | B23025_005 / B23025_002 | Labor market hardship | | No high school diploma | B15003_002 / B15003_001 | Education predicts economic outcomes | | No health insurance | B27010_017 / B27010_001 | Reflects economic vulnerability | | Median household income | B19013_001 | Core income measure (reverse coded) | # Step 2: Compute Rates and Standardize Reshape from long to wide, compute percentage variables, and standardize to z-scores. ```{r} #| label: step.02 CenDF_wide <- CenDF %>% select(-moe) %>% pivot_wider(names_from = variable, values_from = estimate) CenDF_wide <- CenDF_wide %>% mutate( poverty_rate = (pov_below50 + pov_50to99) / pov_den, unemp_rate = pct_unemp_num / pct_unemp_den, no_hs_rate = pct_no_hs_num / pct_no_hs_den, no_ins_rate = pct_no_ins_num / pct_no_ins_den ) # Select indicator variables and standardize index_vars <- CenDF_wide %>% st_drop_geometry() %>% select(poverty_rate, unemp_rate, no_hs_rate, no_ins_rate, median_income) %>% drop_na() index_z <- index_vars %>% mutate(across(everything(), ~ as.numeric(scale(.)))) %>% mutate(median_income = -1 * median_income) # reverse-code: high income = low hardship ``` # Step 3: Cronbach's Alpha Assess whether the 5 items reliably measure the same underlying construct. ```{r} #| label: step.03 alpha_result <- psych::alpha(index_z) print(alpha_result) ``` ```{r} #| label: alpha-summary alpha_value <- alpha_result$total$raw_alpha cat("Cronbach's alpha:", round(alpha_value, 3), "\n") ``` **Interpreting the output:** The `raw_alpha` is our Cronbach's alpha. The "Reliability if an item is dropped" section shows what alpha would be without each variable — if dropping an item increases alpha, it may not fit the construct well. # Step 4: Build the Index and Visualize ```{r} #| label: step.04a cor_matrix <- cor(index_z, use = "complete.obs") corrplot( cor_matrix, method = "color", type = "upper", addCoef.col = "black", tl.col = "black", tl.srt = 45, title = "Inter-Item Correlation Matrix: Economic Hardship Index", mar = c(0, 0, 2, 0) ) ``` ```{r} #| label: step.04b index_z$hardship_index <- rowMeans(index_z, na.rm = TRUE) ggplot(index_z, aes(x = hardship_index)) + geom_histogram(bins = 40, fill = "#8c1d40", color = "white", alpha = 0.8) + geom_vline(xintercept = mean(index_z$hardship_index, na.rm = TRUE), linetype = "dashed", color = "#ffc627", linewidth = 1) + theme_minimal() + labs( title = "Distribution of Economic Hardship Index", subtitle = "Arizona Census Tracts (ACS 2023)", x = "Economic Hardship Index (z-score scale)", y = "Number of Tracts", caption = "Dashed line = mean" ) ``` ```{r} #| label: step.04c CenDF_complete <- CenDF_wide %>% drop_na(poverty_rate, unemp_rate, no_hs_rate, no_ins_rate, median_income) CenDF_complete$hardship_index <- index_z$hardship_index ``` # Step 5: National County-Level Map Pull the same 5 variables at the **county level** for the entire US and map national hardship patterns. ```{r} #| label: step.05-pull #| results: hide county_df <- get_acs( geography = "county", variables = hardship_vars, year = 2023, survey = "acs5", geometry = TRUE, progress_bar = FALSE ) ``` ```{r} #| label: step.05-pull-offline #| eval: false # ── Offline fallback — run this block instead if the Census API is down ── # Replace the get_acs() chunk above with these readRDS() calls. # All datasets are pre-built and stored in data/ for offline use. # # county_complete <- readRDS("../data/hardship_counties_us.rds") # county_complete <- readRDS("../data/county_complete.rds") ``` ```{r} #| label: step.05-index county_wide <- county_df %>% select(-moe) %>% pivot_wider(names_from = variable, values_from = estimate) %>% mutate( poverty_rate = (pov_below50 + pov_50to99) / pov_den, unemp_rate = pct_unemp_num / pct_unemp_den, no_hs_rate = pct_no_hs_num / pct_no_hs_den, no_ins_rate = pct_no_ins_num / pct_no_ins_den ) county_index <- county_wide %>% st_drop_geometry() %>% select(poverty_rate, unemp_rate, no_hs_rate, no_ins_rate, median_income) %>% drop_na() %>% mutate(across(everything(), ~ as.numeric(scale(.)))) %>% mutate(median_income = -1 * median_income) county_index$hardship_index <- rowMeans(county_index, na.rm = TRUE) county_complete <- county_wide %>% drop_na(poverty_rate, unemp_rate, no_hs_rate, no_ins_rate, median_income) county_complete$hardship_index <- county_index$hardship_index ``` ```{r} #| label: step.05-map county_shifted <- shift_geometry(county_complete) ggplot(county_shifted) + geom_sf(aes(fill = hardship_index), color = NA, linewidth = 0) + scale_fill_viridis_c(option = "inferno", name = "Hardship\nIndex", na.value = "grey80") + theme_void() + labs( title = "Economic Hardship Across U.S. Counties", subtitle = "ACS 5-Year (2023) | Higher values = greater hardship", caption = "Alaska and Hawaii shown as insets" ) ``` # Step 6: Arizona Census Tract Map ```{r} #| label: step.06-map ggplot(CenDF_complete) + geom_sf(aes(fill = hardship_index), color = NA) + scale_fill_viridis(option = "inferno", direction = -1, name = "Hardship\nIndex") + theme_void() + labs( title = "Economic Hardship Index by Census Tract", subtitle = "Arizona — ACS 5-Year (2023)", caption = "Higher values = greater economic hardship" ) ``` # Step 7: Maricopa County Block Group Map Block groups are the smallest ACS geography, revealing fine-grained variation within tracts. ```{r} #| label: step.07-pull #| results: hide cbg_df <- get_acs( geography = "block group", variables = hardship_vars, state = "AZ", county = "Maricopa", year = 2023, survey = "acs5", geometry = TRUE, progress_bar = FALSE ) ``` ```{r} #| label: step.07-pull-offline #| eval: false # ── Offline fallback — run this block instead if the Census API is down ── # Replace the get_acs() chunk above with these readRDS() calls. # All datasets are pre-built and stored in data/ for offline use. # # cbg_complete <- readRDS("../data/hardship_cbg_maricopa.rds") # cbg_complete <- readRDS("../data/cbg_complete.rds") ``` ```{r} #| label: step.07-index cbg_wide <- cbg_df %>% select(-moe) %>% pivot_wider(names_from = variable, values_from = estimate) %>% mutate( poverty_rate = (pov_below50 + pov_50to99) / pov_den, unemp_rate = pct_unemp_num / pct_unemp_den, no_hs_rate = pct_no_hs_num / pct_no_hs_den, no_ins_rate = pct_no_ins_num / pct_no_ins_den ) cbg_index <- cbg_wide %>% st_drop_geometry() %>% select(poverty_rate, unemp_rate, no_hs_rate, no_ins_rate, median_income) %>% drop_na() %>% mutate(across(everything(), ~ as.numeric(scale(.)))) %>% mutate(median_income = -1 * median_income) cbg_index$hardship_index <- rowMeans(cbg_index, na.rm = TRUE) cbg_complete <- cbg_wide %>% drop_na(poverty_rate, unemp_rate, no_hs_rate, no_ins_rate, median_income) cbg_complete$hardship_index <- cbg_index$hardship_index ``` ```{r} #| label: step.07-map ggplot(cbg_complete) + geom_sf(aes(fill = hardship_index), color = NA) + scale_fill_viridis(option = "inferno", direction = -1, name = "Hardship\nIndex") + theme_void() + labs( title = "Economic Hardship Index by Block Group", subtitle = "Maricopa County, AZ — ACS 5-Year (2023)", caption = "Higher values = greater economic hardship" ) ``` # Step 8: Enriched Index — Does Measurement Choice Change the Map? The 5-variable baseline captures five well-established dimensions of economic hardship. But hardship is multi-dimensional. A broader index might also reflect: | New Variable | Dimension | Census Code | |---|---|---| | Renter-occupied rate | Housing instability / wealth exclusion | B25003_003 / B25003_001 | | No vehicle access | Transportation barrier | B08201_002 / B08201_001 | | Single-parent household rate | Family structure stress | B11012_010 / B11001_001 | | Public assistance income rate | Direct economic need | B19057_002 / B19057_001 | | No internet access rate | Digital divide / economic barrier | B28002_013 / B28002_001 | Adding these five produces a **10-variable enriched index**. The key question: does this broader measurement change which communities we identify as hardship hotspots — and how much? ```{r} #| label: step.08-pull #| results: hide hardship_vars_10 <- c( # ── Original 5 variable pairs ────────────────────────────────────── pov_below50 = "C17002_002", pov_50to99 = "C17002_003", pov_den = "C17002_001", median_income = "B19013_001", pct_unemp_num = "B23025_005", pct_unemp_den = "B23025_002", pct_no_hs_num = "B15003_002", pct_no_hs_den = "B15003_001", pct_no_ins_num = "B27010_017", pct_no_ins_den = "B27010_001", # ── 5 new dimensions ─────────────────────────────────────────────── renter_num = "B25003_003", renter_den = "B25003_001", noveh_num = "B08201_002", noveh_den = "B08201_001", sinpar_num = "B11012_010", sinpar_den = "B11001_001", pubast_num = "B19057_002", pubast_den = "B19057_001", noinet_num = "B28002_013", noinet_den = "B28002_001" ) CenDF_10 <- get_acs( geography = "tract", variables = hardship_vars_10, state = "AZ", year = 2023, survey = "acs5", geometry = TRUE, progress_bar = FALSE ) ``` ```{r} #| label: step.08-pull-offline #| eval: false # ── Offline fallback — run this block instead if the Census API is down ── # Replace the get_acs() chunk above with these readRDS() calls. # All datasets are pre-built and stored in data/ for offline use. # # CenDF_complete_10 <- readRDS("../data/CenDF_complete_10var.rds") ``` ```{r} #| label: step.08-build10 CenDF_wide_10 <- CenDF_10 %>% select(-moe) %>% pivot_wider(names_from = variable, values_from = estimate) %>% mutate( # Original 5 poverty_rate = (pov_below50 + pov_50to99) / pov_den, unemp_rate = pct_unemp_num / pct_unemp_den, no_hs_rate = pct_no_hs_num / pct_no_hs_den, no_ins_rate = pct_no_ins_num / pct_no_ins_den, # New 5 renter_rate = renter_num / renter_den, noveh_rate = noveh_num / noveh_den, sinpar_rate = sinpar_num / sinpar_den, pubast_rate = pubast_num / pubast_den, noinet_rate = noinet_num / noinet_den ) CenDF_complete_10 <- CenDF_wide_10 %>% drop_na(poverty_rate, unemp_rate, no_hs_rate, no_ins_rate, median_income, renter_rate, noveh_rate, sinpar_rate, pubast_rate, noinet_rate) index_z_10 <- CenDF_complete_10 %>% st_drop_geometry() %>% select(poverty_rate, unemp_rate, no_hs_rate, no_ins_rate, median_income, renter_rate, noveh_rate, sinpar_rate, pubast_rate, noinet_rate) %>% mutate(across(everything(), ~ as.numeric(scale(.)))) %>% mutate(median_income = -1 * median_income) # reverse-code # Compare alpha: 5-variable vs 10-variable alpha_5 <- psych::alpha(index_z |> select(-hardship_index)) alpha_10 <- psych::alpha(index_z_10) cat("5-variable alpha: ", round(alpha_5$total$raw_alpha, 3), "\n") cat("10-variable alpha:", round(alpha_10$total$raw_alpha, 3), "\n") cat("Change: ", round(alpha_10$total$raw_alpha - alpha_5$total$raw_alpha, 3), "\n") # Build enriched index index_z_10$hardship_index_10 <- rowMeans(index_z_10, na.rm = TRUE) CenDF_complete_10$hardship_index_10 <- index_z_10$hardship_index_10 ``` ```{r} #| label: step.08-compare-maps #| fig-width: 16 #| fig-height: 6 p_5var <- ggplot(CenDF_complete) + geom_sf(aes(fill = hardship_index), color = NA) + scale_fill_viridis_c(option = "inferno", name = "Hardship", na.value = "grey80") + theme_void() + labs(title = "5-Variable Index", subtitle = "Poverty · Unemployment · Education\nInsurance · Income") p_10var <- ggplot(CenDF_complete_10) + geom_sf(aes(fill = hardship_index_10), color = NA) + scale_fill_viridis_c(option = "inferno", name = "Hardship", na.value = "grey80") + theme_void() + labs(title = "10-Variable Enriched Index", subtitle = "Adds renter · no vehicle · single-parent\npublic assistance · no internet") p_5var + p_10var + plot_annotation( title = "Original vs. Enriched Economic Hardship Index", subtitle = "Does expanding measurement change the spatial story?", caption = "ACS 5-Year (2023) | Arizona Census Tracts" ) ``` ```{r} #| label: step.08-difference #| fig-width: 9 #| fig-height: 6 compare_df <- CenDF_complete_10 %>% select(GEOID, geometry, hardship_index_10) %>% left_join( CenDF_complete %>% st_drop_geometry() %>% select(GEOID, hardship_index), by = "GEOID" ) %>% mutate(diff_10_minus_5 = hardship_index_10 - hardship_index) ggplot(compare_df) + geom_sf(aes(fill = diff_10_minus_5), color = NA) + scale_fill_gradient2( low = "#4575b4", mid = "#f7f7f7", high = "#d73027", midpoint = 0, name = "10-var\nminus\n5-var" ) + theme_void() + labs( title = "Where Does Measurement Choice Matter Most?", subtitle = "Red = tracts scoring higher with enriched index\nBlue = tracts scoring lower", caption = "ACS 5-Year (2023) | Difference = 10-variable minus 5-variable index" ) ``` **Interpreting the difference map:** Red tracts are communities where the enriched index reveals *more hardship* than the baseline captured — likely places with high renter rates, low vehicle access, or many single-parent households that census income and poverty measures undercount. Blue tracts are places where those new dimensions don't add to the hardship signal — often wealthier suburbs where some of those variables don't apply. --- ::: {.callout-note} ## Ready for the assignment? Head to the [Lab 1 Assignment page](https://antjam-howell.github.io/paf-516-labs/lab1-assignment.html) to download `Lab1_Assignment.qmd`. The assignment asks you to add **SNAP receipt** as an 11th variable and show whether one more dimension still meaningfully shifts the difference map. :::