--- title: "Lab 1 Assignment | PAF 516" subtitle: "Measurement, Indexing & Multi-Scale Mapping" format: html: theme: readable highlight-style: tango toc: true toc-title: "Steps & Questions" self-contained: true number-sections: false execute: echo: true warning: false message: false fig-width: 10 --- ::: {.callout-important} ## Before you begin 1. **This file contains the same pre-filled code as the tutorial.** Steps 1–8 below produce the same output you saw in the tutorial. If this file is in the same folder as your tutorial QMD, you can knit immediately (Cmd+Shift+K / Ctrl+Shift+K) and see the same results. If you placed it in a different location, run the `packages` chunk first (Cmd+Return / Ctrl+Enter), then knit. 3. **The assignment questions are at the top of this document.** Here is the workflow: - **Q1** asks you to interpret results from the pre-filled steps. Knit first, then review the Cronbach's alpha output and maps in Steps 3–8, then write your answers. - **Q2** asks you to extend the enriched index by adding SNAP benefits as an 11th variable by adapting the existing code. Fill in the scaffold chunks labeled `q2-...` at the end of this document, then re-knit. - **Q3** asks you to evaluate diminishing returns — does adding one more variable still shift the map meaningfully? Compare the 5→10 and 10→11 changes. 4. Submit both `Lab1_Assignment_YourLastName.qmd` and `Lab1_Assignment_YourLastName.html` to Canvas. ::: # Assignment Questions ## Q1 — Interpret the Baseline and Enriched Results {.question} > **Text answer — no code required.** Knit the file first, then review the outputs from Steps 3–8 and answer below. - **Cronbach's alpha:** What is the 5-variable alpha? What is the 10-variable alpha? Did expanding the index improve internal consistency? What does this tell you about whether the new dimensions belong in the hardship construct? [YOUR ANSWER HERE] - **Map comparison (Step 8):** Looking at the 5-variable vs. 10-variable maps, where do they disagree most? Identify a specific area of Arizona where expanding measurement changes the apparent level of hardship. [YOUR ANSWER HERE] - **Difference map:** What does the red/blue pattern reveal? What types of communities — urban core, suburbs, rural, tribal areas — tend to score *higher* in the enriched index relative to the baseline? Why might that be? [YOUR ANSWER HERE] ## Q2 — Add SNAP as an 11th Variable {.question} > **Code + output required.** > > The 10-variable enriched index does not yet include SNAP (food assistance) receipt. Add **percent of households receiving SNAP benefits** as an 11th variable: > - Numerator: `B19058_002` (households receiving SNAP) > - Denominator: `B19058_001` (total households) > > Fill in the three scaffold chunks at the end of this document to: > 1. Re-pull tract data with SNAP added > 2. Recompute Cronbach's alpha with 11 variables and build the new index > 3. Produce a side-by-side map (10-var vs. 11-var) **and** a difference map (11-var minus 10-var) ## Q3 — Evaluate Diminishing Returns {.question} > **Text answer — no code required.** - **Marginal alpha change:** How much did alpha change going from 5→10 variables? How much from 10→11 (adding SNAP)? Is there a pattern of diminishing returns? [YOUR ANSWER HERE] - **Marginal map change:** Compare the 5→10 difference map (Step 8) to your 10→11 difference map (Q2). Did adding SNAP produce as dramatic a spatial shift as expanding from 5 to 10 variables? Where, if anywhere, did adding SNAP still matter? [YOUR ANSWER HERE] - **Recommendation:** At what point should you stop adding variables to a composite index? Using evidence from your alpha comparisons and difference maps, make a recommendation for how many variables this economic hardship index should include — and justify it. [YOUR ANSWER HERE] --- ```{r} #| label: packages # ── One-time course environment setup ───────────────────────────────────── if (!file.exists("~/.paf516_ready")) { if (!requireNamespace("renv", quietly = TRUE)) install.packages("renv") tmp <- tempfile(fileext = ".lock") download.file( url = "https://raw.githubusercontent.com/AntJam-Howell/paf-516-labs/main/renv.lock", destfile = tmp, mode = "wb" ) renv::restore(lockfile = tmp, prompt = FALSE) file.create("~/.paf516_ready") } library(tidyverse) library(tidycensus) library(sf) library(tigris) library(psych) library(corrplot) library(viridis) library(ggplot2) library(patchwork) ``` ```{r} #| label: api-key # Get a free Census API key at: https://api.census.gov/data/key_signup.html # After receiving your key by email, run this line ONCE in the RStudio console: # census_api_key("YOUR_KEY_HERE", install = TRUE) # Then restart R: Session → Restart R ``` # Step 1: Retrieve Census Variables We build an **Economic Hardship Index** from 5 computed rates at the Arizona tract level. Each rate requires a numerator/denominator pair from the ACS, so the variable list contains 10 census codes — but they collapse into 5 measures: poverty rate, unemployment rate, no high school diploma rate, no health insurance rate, and median income. ```{r} #| label: step.01 #| results: hide hardship_vars <- c( pov_below50 = "C17002_002", # Income-to-poverty ratio under .50 pov_50to99 = "C17002_003", # Income-to-poverty ratio .50 to .99 pov_den = "C17002_001", # Total population (poverty universe) median_income = "B19013_001", # Median household income pct_unemp_num = "B23025_005", # Unemployed population pct_unemp_den = "B23025_002", # In labor force pct_no_hs_num = "B15003_002", # Population with less than high school pct_no_hs_den = "B15003_001", # Total population 25+ pct_no_ins_num = "B27010_017", # Without health insurance (18–34) pct_no_ins_den = "B27010_001" # Total population (insurance universe) ) CenDF <- get_acs( geography = "tract", variables = hardship_vars, state = "AZ", year = 2023, survey = "acs5", geometry = TRUE, progress_bar = FALSE ) ``` ```{r} #| label: step.01-offline #| eval: false # ── Offline fallback — run this block instead if the Census API is down ── # Replace the get_acs() chunk above with these readRDS() calls. # All datasets are pre-built and stored in data/ for offline use. # # CenDF_complete <- readRDS("../data/hardship_tracts_az.rds") # CenDF_complete <- readRDS("../data/CenDF_complete_5var.rds") ``` **Variable justification:** | Variable | Census Code | Why Included | |----------|------------|--------------| | Poverty rate | C17002_002 + _003 / _001 | Direct measure of economic deprivation | | Unemployment rate | B23025_005 / B23025_002 | Labor market hardship | | No high school diploma | B15003_002 / B15003_001 | Education predicts economic outcomes | | No health insurance | B27010_017 / B27010_001 | Reflects economic vulnerability | | Median household income | B19013_001 | Core income measure (reverse coded) | # Step 2: Compute Rates and Standardize Reshape from long to wide, compute percentage variables, and standardize to z-scores. ```{r} #| label: step.02 CenDF_wide <- CenDF %>% select(-moe) %>% pivot_wider(names_from = variable, values_from = estimate) CenDF_wide <- CenDF_wide %>% mutate( poverty_rate = (pov_below50 + pov_50to99) / pov_den, unemp_rate = pct_unemp_num / pct_unemp_den, no_hs_rate = pct_no_hs_num / pct_no_hs_den, no_ins_rate = pct_no_ins_num / pct_no_ins_den ) # Select indicator variables and standardize index_vars <- CenDF_wide %>% st_drop_geometry() %>% select(poverty_rate, unemp_rate, no_hs_rate, no_ins_rate, median_income) %>% drop_na() index_z <- index_vars %>% mutate(across(everything(), ~ as.numeric(scale(.)))) %>% mutate(median_income = -1 * median_income) # reverse-code: high income = low hardship ``` # Step 3: Cronbach's Alpha Assess whether the 5 items reliably measure the same underlying construct. ```{r} #| label: step.03 alpha_result <- psych::alpha(index_z) print(alpha_result) ``` ```{r} #| label: alpha-summary alpha_value <- alpha_result$total$raw_alpha cat("Cronbach's alpha:", round(alpha_value, 3), "\n") ``` **Interpreting the output:** The `raw_alpha` is our Cronbach's alpha. The "Reliability if an item is dropped" section shows what alpha would be without each variable — if dropping an item increases alpha, it may not fit the construct well. # Step 4: Build the Index and Visualize ```{r} #| label: step.04a cor_matrix <- cor(index_z, use = "complete.obs") corrplot( cor_matrix, method = "color", type = "upper", addCoef.col = "black", tl.col = "black", tl.srt = 45, title = "Inter-Item Correlation Matrix: Economic Hardship Index", mar = c(0, 0, 2, 0) ) ``` ```{r} #| label: step.04b index_z$hardship_index <- rowMeans(index_z, na.rm = TRUE) ggplot(index_z, aes(x = hardship_index)) + geom_histogram(bins = 40, fill = "#8c1d40", color = "white", alpha = 0.8) + geom_vline(xintercept = mean(index_z$hardship_index, na.rm = TRUE), linetype = "dashed", color = "#ffc627", linewidth = 1) + theme_minimal() + labs( title = "Distribution of Economic Hardship Index", subtitle = "Arizona Census Tracts (ACS 2023)", x = "Economic Hardship Index (z-score scale)", y = "Number of Tracts", caption = "Dashed line = mean" ) ``` ```{r} #| label: step.04c CenDF_complete <- CenDF_wide %>% drop_na(poverty_rate, unemp_rate, no_hs_rate, no_ins_rate, median_income) CenDF_complete$hardship_index <- index_z$hardship_index ``` # Step 5: National County-Level Map Pull the same 5 variables at the **county level** for the entire US and map national hardship patterns. ```{r} #| label: step.05-pull #| results: hide county_df <- get_acs( geography = "county", variables = hardship_vars, year = 2023, survey = "acs5", geometry = TRUE, progress_bar = FALSE ) ``` ```{r} #| label: step.05-pull-offline #| eval: false # ── Offline fallback — run this block instead if the Census API is down ── # Replace the get_acs() chunk above with these readRDS() calls. # All datasets are pre-built and stored in data/ for offline use. # # county_complete <- readRDS("../data/hardship_counties_us.rds") # county_complete <- readRDS("../data/county_complete.rds") ``` ```{r} #| label: step.05-index county_wide <- county_df %>% select(-moe) %>% pivot_wider(names_from = variable, values_from = estimate) %>% mutate( poverty_rate = (pov_below50 + pov_50to99) / pov_den, unemp_rate = pct_unemp_num / pct_unemp_den, no_hs_rate = pct_no_hs_num / pct_no_hs_den, no_ins_rate = pct_no_ins_num / pct_no_ins_den ) county_index <- county_wide %>% st_drop_geometry() %>% select(poverty_rate, unemp_rate, no_hs_rate, no_ins_rate, median_income) %>% drop_na() %>% mutate(across(everything(), ~ as.numeric(scale(.)))) %>% mutate(median_income = -1 * median_income) county_index$hardship_index <- rowMeans(county_index, na.rm = TRUE) county_complete <- county_wide %>% drop_na(poverty_rate, unemp_rate, no_hs_rate, no_ins_rate, median_income) county_complete$hardship_index <- county_index$hardship_index ``` ```{r} #| label: step.05-map county_shifted <- shift_geometry(county_complete) ggplot(county_shifted) + geom_sf(aes(fill = hardship_index), color = NA, linewidth = 0) + scale_fill_viridis_c(option = "inferno", name = "Hardship\nIndex", na.value = "grey80") + theme_void() + labs( title = "Economic Hardship Across U.S. Counties", subtitle = "ACS 5-Year (2023) | Higher values = greater hardship", caption = "Alaska and Hawaii shown as insets" ) ``` # Step 6: Arizona Census Tract Map ```{r} #| label: step.06-map ggplot(CenDF_complete) + geom_sf(aes(fill = hardship_index), color = NA) + scale_fill_viridis(option = "inferno", direction = -1, name = "Hardship\nIndex") + theme_void() + labs( title = "Economic Hardship Index by Census Tract", subtitle = "Arizona — ACS 5-Year (2023)", caption = "Higher values = greater economic hardship" ) ``` # Step 7: Maricopa County Block Group Map Block groups are the smallest ACS geography, revealing fine-grained variation within tracts. ```{r} #| label: step.07-pull #| results: hide cbg_df <- get_acs( geography = "block group", variables = hardship_vars, state = "AZ", county = "Maricopa", year = 2023, survey = "acs5", geometry = TRUE, progress_bar = FALSE ) ``` ```{r} #| label: step.07-pull-offline #| eval: false # ── Offline fallback — run this block instead if the Census API is down ── # Replace the get_acs() chunk above with these readRDS() calls. # All datasets are pre-built and stored in data/ for offline use. # # cbg_complete <- readRDS("../data/hardship_cbg_maricopa.rds") # cbg_complete <- readRDS("../data/cbg_complete.rds") ``` ```{r} #| label: step.07-index cbg_wide <- cbg_df %>% select(-moe) %>% pivot_wider(names_from = variable, values_from = estimate) %>% mutate( poverty_rate = (pov_below50 + pov_50to99) / pov_den, unemp_rate = pct_unemp_num / pct_unemp_den, no_hs_rate = pct_no_hs_num / pct_no_hs_den, no_ins_rate = pct_no_ins_num / pct_no_ins_den ) cbg_index <- cbg_wide %>% st_drop_geometry() %>% select(poverty_rate, unemp_rate, no_hs_rate, no_ins_rate, median_income) %>% drop_na() %>% mutate(across(everything(), ~ as.numeric(scale(.)))) %>% mutate(median_income = -1 * median_income) cbg_index$hardship_index <- rowMeans(cbg_index, na.rm = TRUE) cbg_complete <- cbg_wide %>% drop_na(poverty_rate, unemp_rate, no_hs_rate, no_ins_rate, median_income) cbg_complete$hardship_index <- cbg_index$hardship_index ``` ```{r} #| label: step.07-map ggplot(cbg_complete) + geom_sf(aes(fill = hardship_index), color = NA) + scale_fill_viridis(option = "inferno", direction = -1, name = "Hardship\nIndex") + theme_void() + labs( title = "Economic Hardship Index by Block Group", subtitle = "Maricopa County, AZ — ACS 5-Year (2023)", caption = "Higher values = greater economic hardship" ) ``` # Step 8: Enriched Index — Does Measurement Choice Change the Map? The 5-variable baseline captures five well-established dimensions of economic hardship. But hardship is multi-dimensional. A broader index might also reflect: | New Variable | Dimension | Census Code | |---|---|---| | Renter-occupied rate | Housing instability / wealth exclusion | B25003_003 / B25003_001 | | No vehicle access | Transportation barrier | B08201_002 / B08201_001 | | Single-parent household rate | Family structure stress | B11012_010 / B11001_001 | | Public assistance income rate | Direct economic need | B19057_002 / B19057_001 | | No internet access rate | Digital divide / economic barrier | B28002_013 / B28002_001 | Adding these five produces a **10-variable enriched index**. The key question: does this broader measurement change which communities we identify as hardship hotspots — and how much? ```{r} #| label: step.08-pull #| results: hide hardship_vars_10 <- c( # ── Original 5 variable pairs ────────────────────────────────────── pov_below50 = "C17002_002", pov_50to99 = "C17002_003", pov_den = "C17002_001", median_income = "B19013_001", pct_unemp_num = "B23025_005", pct_unemp_den = "B23025_002", pct_no_hs_num = "B15003_002", pct_no_hs_den = "B15003_001", pct_no_ins_num = "B27010_017", pct_no_ins_den = "B27010_001", # ── 5 new dimensions ─────────────────────────────────────────────── renter_num = "B25003_003", renter_den = "B25003_001", noveh_num = "B08201_002", noveh_den = "B08201_001", sinpar_num = "B11012_010", sinpar_den = "B11001_001", pubast_num = "B19057_002", pubast_den = "B19057_001", noinet_num = "B28002_013", noinet_den = "B28002_001" ) CenDF_10 <- get_acs( geography = "tract", variables = hardship_vars_10, state = "AZ", year = 2023, survey = "acs5", geometry = TRUE, progress_bar = FALSE ) ``` ```{r} #| label: step.08-pull-offline #| eval: false # ── Offline fallback — run this block instead if the Census API is down ── # Replace the get_acs() chunk above with these readRDS() calls. # All datasets are pre-built and stored in data/ for offline use. # # CenDF_complete_10 <- readRDS("../data/CenDF_complete_10var.rds") ``` ```{r} #| label: step.08-build10 CenDF_wide_10 <- CenDF_10 %>% select(-moe) %>% pivot_wider(names_from = variable, values_from = estimate) %>% mutate( # Original 5 poverty_rate = (pov_below50 + pov_50to99) / pov_den, unemp_rate = pct_unemp_num / pct_unemp_den, no_hs_rate = pct_no_hs_num / pct_no_hs_den, no_ins_rate = pct_no_ins_num / pct_no_ins_den, # New 5 renter_rate = renter_num / renter_den, noveh_rate = noveh_num / noveh_den, sinpar_rate = sinpar_num / sinpar_den, pubast_rate = pubast_num / pubast_den, noinet_rate = noinet_num / noinet_den ) CenDF_complete_10 <- CenDF_wide_10 %>% drop_na(poverty_rate, unemp_rate, no_hs_rate, no_ins_rate, median_income, renter_rate, noveh_rate, sinpar_rate, pubast_rate, noinet_rate) index_z_10 <- CenDF_complete_10 %>% st_drop_geometry() %>% select(poverty_rate, unemp_rate, no_hs_rate, no_ins_rate, median_income, renter_rate, noveh_rate, sinpar_rate, pubast_rate, noinet_rate) %>% mutate(across(everything(), ~ as.numeric(scale(.)))) %>% mutate(median_income = -1 * median_income) # reverse-code # Compare alpha: 5-variable vs 10-variable alpha_5 <- psych::alpha(index_z |> select(-hardship_index)) alpha_10 <- psych::alpha(index_z_10) cat("5-variable alpha: ", round(alpha_5$total$raw_alpha, 3), "\n") cat("10-variable alpha:", round(alpha_10$total$raw_alpha, 3), "\n") cat("Change: ", round(alpha_10$total$raw_alpha - alpha_5$total$raw_alpha, 3), "\n") # Build enriched index index_z_10$hardship_index_10 <- rowMeans(index_z_10, na.rm = TRUE) CenDF_complete_10$hardship_index_10 <- index_z_10$hardship_index_10 ``` ```{r} #| label: step.08-compare-maps #| fig-width: 16 #| fig-height: 6 p_5var <- ggplot(CenDF_complete) + geom_sf(aes(fill = hardship_index), color = NA) + scale_fill_viridis_c(option = "inferno", name = "Hardship", na.value = "grey80") + theme_void() + labs(title = "5-Variable Index", subtitle = "Poverty · Unemployment · Education\nInsurance · Income") p_10var <- ggplot(CenDF_complete_10) + geom_sf(aes(fill = hardship_index_10), color = NA) + scale_fill_viridis_c(option = "inferno", name = "Hardship", na.value = "grey80") + theme_void() + labs(title = "10-Variable Enriched Index", subtitle = "Adds renter · no vehicle · single-parent\npublic assistance · no internet") p_5var + p_10var + plot_annotation( title = "Original vs. Enriched Economic Hardship Index", subtitle = "Does expanding measurement change the spatial story?", caption = "ACS 5-Year (2023) | Arizona Census Tracts" ) ``` ```{r} #| label: step.08-difference #| fig-width: 9 #| fig-height: 6 compare_df <- CenDF_complete_10 %>% select(GEOID, geometry, hardship_index_10) %>% left_join( CenDF_complete %>% st_drop_geometry() %>% select(GEOID, hardship_index), by = "GEOID" ) %>% mutate(diff_10_minus_5 = hardship_index_10 - hardship_index) ggplot(compare_df) + geom_sf(aes(fill = diff_10_minus_5), color = NA) + scale_fill_gradient2( low = "#4575b4", mid = "#f7f7f7", high = "#d73027", midpoint = 0, name = "10-var\nminus\n5-var" ) + theme_void() + labs( title = "Where Does Measurement Choice Matter Most?", subtitle = "Red = tracts scoring higher with enriched index\nBlue = tracts scoring lower", caption = "ACS 5-Year (2023) | Difference = 10-variable minus 5-variable index" ) ``` **Interpreting the difference map:** Red tracts are communities where the enriched index reveals *more hardship* than the baseline captured — likely places with high renter rates, low vehicle access, or many single-parent households that census income and poverty measures undercount. Blue tracts are places where those new dimensions don't add to the hardship signal — often wealthier suburbs where some of those variables don't apply. --- # Question 2: Student Modify Code ```{r} #| label: q2-pull #| results: hide # YOUR CODE HERE # Hint: start from hardship_vars_10 and add: # snap_num = "B19058_002" (households receiving SNAP) # snap_den = "B19058_001" (total households) # Then re-run get_acs() for AZ tracts ``` ```{r} #| label: q2-build11 # YOUR CODE HERE # Compute snap_rate = snap_num / snap_den # Add to index variables, z-score, rowMeans -> hardship_index_11 # Run psych::alpha() and print the 10-var vs 11-var comparison ``` ```{r} #| label: q2-maps #| fig-width: 16 #| fig-height: 6 # YOUR CODE HERE — side-by-side: 10-variable vs 11-variable ``` ```{r} #| label: q2-difference #| fig-width: 9 #| fig-height: 6 # YOUR CODE HERE — difference map: 11-var minus 10-var ```