* stata_codebook.do - attach long-form notes to the .dta files (run once in Stata).
* Generated by build_data_dictionary.py - do not edit by hand.

* ---- proposition99.dta ----
use "proposition99.dta", clear
label data "Source 39-state annual panel (raw, with covariate NAs)"
note _dta: Mirror of the workshop's proposition99.rds. Balanced 39-state x 31-year panel (1,209 rows). cigsale and retprice fully observed; lnincome, beer, age15to24 carry missing values (filled in data_imputed.csv).
note state: Name of the U.S. state (treated unit is California; the other 38 are donor states). Construction: From the source Abadie-Diamond-Hainmueller panel; data_california.csv is filtered to California only. Units: string. Source: Abadie, Diamond & Hainmueller (2010)
note year: Annual time index; 1989 is the Proposition 99 policy date. Construction: Source panel runs 1970-2000; the last full pre-period year is 1988. Units: year. Source: Abadie, Diamond & Hainmueller (2010)
note cigsale: Annual per-capita sales of cigarette packs (the policy outcome). Construction: Observed; fully populated in all three files (no imputation needed). Units: packs per capita. Source: Abadie, Diamond & Hainmueller (2010)
note lnincome: Natural log of per-capita personal income (covariate). Construction: Source covariate; missing in 16.1% of source rows, filled in data_imputed.csv by random-forest imputation. Units: log US$. Source: Abadie, Diamond & Hainmueller (2010); imputed (mice rf) in data_imputed.csv
note beer: Per-capita beer consumption (covariate proxy for tobacco-related behaviour). Construction: Source covariate; missing in 54.8% of source rows, filled in data_imputed.csv by random-forest imputation. Units: gallons per capita. Source: Abadie, Diamond & Hainmueller (2010); imputed (mice rf) in data_imputed.csv
note age15to24: Fraction of the state population aged 15 to 24 (covariate). Construction: Source covariate; missing in 32.3% of source rows, filled in data_imputed.csv by random-forest imputation. Units: 0-1 (share). Source: Abadie, Diamond & Hainmueller (2010); imputed (mice rf) in data_imputed.csv
note retprice: Average retail price of cigarettes (covariate). Construction: Observed; fully populated in all three files (no imputation needed). Units: US cents per pack. Source: Abadie, Diamond & Hainmueller (2010)
save "proposition99.dta", replace

* ---- data_california.dta ----
use "data_california.dta", clear
label data "California-only series with Pre/Post factor"
note _dta: California rows from the source panel with an added prepost factor (Pre = year <= 1988, Post = year >= 1989). Same seven columns as the source plus prepost. Covariate NAs are retained (these methods use only cigsale).
note state: Name of the U.S. state (treated unit is California; the other 38 are donor states). Construction: From the source Abadie-Diamond-Hainmueller panel; data_california.csv is filtered to California only. Units: string. Source: Abadie, Diamond & Hainmueller (2010)
note year: Annual time index; 1989 is the Proposition 99 policy date. Construction: Source panel runs 1970-2000; the last full pre-period year is 1988. Units: year. Source: Abadie, Diamond & Hainmueller (2010)
note cigsale: Annual per-capita sales of cigarette packs (the policy outcome). Construction: Observed; fully populated in all three files (no imputation needed). Units: packs per capita. Source: Abadie, Diamond & Hainmueller (2010)
note lnincome: Natural log of per-capita personal income (covariate). Construction: Source covariate; missing in 16.1% of source rows, filled in data_imputed.csv by random-forest imputation. Units: log US$. Source: Abadie, Diamond & Hainmueller (2010); imputed (mice rf) in data_imputed.csv
note beer: Per-capita beer consumption (covariate proxy for tobacco-related behaviour). Construction: Source covariate; missing in 54.8% of source rows, filled in data_imputed.csv by random-forest imputation. Units: gallons per capita. Source: Abadie, Diamond & Hainmueller (2010); imputed (mice rf) in data_imputed.csv
note age15to24: Fraction of the state population aged 15 to 24 (covariate). Construction: Source covariate; missing in 32.3% of source rows, filled in data_imputed.csv by random-forest imputation. Units: 0-1 (share). Source: Abadie, Diamond & Hainmueller (2010); imputed (mice rf) in data_imputed.csv
note retprice: Average retail price of cigarettes (covariate). Construction: Observed; fully populated in all three files (no imputation needed). Units: US cents per pack. Source: Abadie, Diamond & Hainmueller (2010)
note prepost: Factor marking the Proposition 99 period: Pre = up to 1988, Post = 1989 onward. Construction: factor(year > 1988, labels = c('Pre','Post')); present only in data_california.csv. Units: Pre/Post. Source: Derived (this study)
save "data_california.dta", replace

* ---- data_imputed.dta ----
use "data_imputed.dta", clear
label data "Full panel after random-forest imputation (no NAs)"
note _dta: Source panel after one round of mice(m = 1, method = 'rf') random-forest imputation under set.seed(42). Same 1,209 rows x 7 columns as the source, but lnincome, beer and age15to24 have every gap filled; cigsale and retprice are unchanged.
note state: Name of the U.S. state (treated unit is California; the other 38 are donor states). Construction: From the source Abadie-Diamond-Hainmueller panel; data_california.csv is filtered to California only. Units: string. Source: Abadie, Diamond & Hainmueller (2010)
note year: Annual time index; 1989 is the Proposition 99 policy date. Construction: Source panel runs 1970-2000; the last full pre-period year is 1988. Units: year. Source: Abadie, Diamond & Hainmueller (2010)
note cigsale: Annual per-capita sales of cigarette packs (the policy outcome). Construction: Observed; fully populated in all three files (no imputation needed). Units: packs per capita. Source: Abadie, Diamond & Hainmueller (2010)
note lnincome: Natural log of per-capita personal income (covariate). Construction: Source covariate; missing in 16.1% of source rows, filled in data_imputed.csv by random-forest imputation. Units: log US$. Source: Abadie, Diamond & Hainmueller (2010); imputed (mice rf) in data_imputed.csv
note beer: Per-capita beer consumption (covariate proxy for tobacco-related behaviour). Construction: Source covariate; missing in 54.8% of source rows, filled in data_imputed.csv by random-forest imputation. Units: gallons per capita. Source: Abadie, Diamond & Hainmueller (2010); imputed (mice rf) in data_imputed.csv
note age15to24: Fraction of the state population aged 15 to 24 (covariate). Construction: Source covariate; missing in 32.3% of source rows, filled in data_imputed.csv by random-forest imputation. Units: 0-1 (share). Source: Abadie, Diamond & Hainmueller (2010); imputed (mice rf) in data_imputed.csv
note retprice: Average retail price of cigarettes (covariate). Construction: Observed; fully populated in all three files (no imputation needed). Units: US cents per pack. Source: Abadie, Diamond & Hainmueller (2010)
save "data_imputed.dta", replace
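
* ---- optional check (illustrative sketch; not emitted by build_data_dictionary.py) ----
* Reloads each saved file and prints its data label and attached notes, so the
* codebook text can be inspected after the run. Assumes the three .dta files
* sit in the current working directory, as the use/save calls above imply.
* The loop variable f is a name chosen here for illustration.
foreach f in proposition99 data_california data_imputed {
    use "`f'.dta", clear
    * one-line summary: observations, variables, and the data label
    describe, short
    * with no arguments, notes lists every note attached to the dataset and its variables
    notes
}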