* stata_codebook.do - attach long-form notes to the .dta files (run once in Stata).
* Generated by build_data_dictionary.py - do not edit by hand.

* ---- sim_country_panel.dta ----
use "sim_country_panel.dta", clear
label data "Annual country panel: inequality indices + covariates"
note _dta: Annual country panel; 56 synthetic countries, 1980-2009 (unbalanced). WCV and covariates per country-year; the income polynomial terms support the Kuznets-curve regressions.
note country: Synthetic country identifier (real country names used as labels). Construction: From the hard-coded 56-country scaffold (region counts/areas from Lessmann 2014). Units: string. Source: Scaffold (Lessmann 2014 appendix)
note region_grp: Geographic grouping of the country. Construction: Assigned per country: EAP, ECA, LAC, MENA, NA, SA, SSA. Units: code. Source: Assigned
note year: Annual time index. Construction: Country-specific start/end years (unbalanced coverage). Units: year. Source: Simulation
note lnGDP: Natural log of country GDP per capita. Construction: Simulated: 2009 base value minus growth path plus AR-type residual. Units: log US$. Source: Simulation
note wcv: Population-weighted spread of regional GDP per capita / country mean (headline index). Construction: WCV = (1/ybar) * [sum_j p_j (ybar - y_j)^2]^(1/2) over regions, per country-year. Units: 0-1. Source: Computed (this study)
note cv: Unweighted spread of regional GDP per capita. Construction: SD / mean of regional gdp_pc. Units: >=0. Source: Computed (this study)
note gini_reg: Population-weighted Gini of the regional income distribution. Construction: Gini over sorted regional income with population shares. Units: 0-1. Source: Computed (this study)
note wcv_nocap: WCV recomputed after dropping the largest (capital) region. Construction: Exclude region 1, re-normalize population shares, recompute WCV. Units: 0-1. Source: Computed (this study)
note trade_gdp: Exports + imports as a share of GDP. Construction: Country base mean + AR(1) noise, clamped to [15, 171]. Units: % GDP. Source: Simulation
note urbanization: Share of population in urban areas. Construction: Country base mean + random-walk drift, clamped to [20, 99]. Units: %. Source: Simulation
note nonag: Share of gross value added outside agriculture (structural change). Construction: Logistic transform of lnGDP. Units: % (0-100). Source: Simulation
note ethnic: Degree of ethnic/linguistic heterogeneity (time-invariant per country). Construction: Beta(1.0, 1.7) draw rescaled to [0, 0.75]. Units: 0-1. Source: Simulation
note federal: 1 if the country has a federal constitution, else 0. Construction: 1 for a fixed set of federal states (US, Canada, Brazil, India, Germany, ...). Units: 0/1. Source: Assigned
note lnunits: Natural log of the count of territorial units. Construction: log(n_reg); region counts from Lessmann (2014) appendix. Units: log count. Source: Scaffold (Lessmann 2014)
note lnarea: Natural log of country land area. Construction: log(area); areas from Lessmann (2014) appendix. Units: log km^2. Source: Scaffold (Lessmann 2014)
note area_units: Land area relative to the number of regions. Construction: lnarea / lnunits. Units: log scale. Source: Derived
note period5: Categorical 5-year period (1=1980-84 ... 6=2005-09). Construction: Year binned into six 5-year periods. Units: 1-6. Source: Derived
note lnGDP2: Quadratic income term for the Kuznets polynomial. Construction: lnGDP^2. Units: log^2. Source: Derived
note lnGDP3: Cubic income term (potential N-shape) for the Kuznets polynomial. Construction: lnGDP^3. Units: log^3. Source: Derived
save "sim_country_panel.dta", replace

* ---- sim_regional_gdp.dta ----
use "sim_regional_gdp.dta", clear
label data "Regional micro cross-section (base year per country)"
note _dta: Regional micro-data at each country's first observation year; region 1 is typically the capital. The inequality indices in the country panel are built from these regions.
note country: Synthetic country identifier (real country names used as labels). Construction: From the hard-coded 56-country scaffold (region counts/areas from Lessmann 2014). Units: string. Source: Scaffold (Lessmann 2014 appendix)
note year: Annual time index. Construction: Country-specific start/end years (unbalanced coverage). Units: year. Source: Simulation
note region: Sequential, persistent region identifier (region 1 is typically the capital). Construction: 1..n_reg per country. Units: integer. Source: Simulation
note pop_share: Fraction of country population in the region (sums to 1 per country). Construction: Gamma(0.85) draws sorted descending, adjusted for a capital weight. Units: 0-1. Source: Simulation
note gdp_pc: Simulated regional GDP per capita. Construction: exp(lnGDP) * exp(sigma*z - 0.5*sigma^2): country mean scaled by a persistent regional position. Units: US$. Source: Simulation
save "sim_regional_gdp.dta", replace
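
* ---- Optional check: WCV from the regional file (illustrative sketch) ----
* Not produced by build_data_dictionary.py. A minimal sketch of the weighted
* coefficient of variation, WCV = (1/ybar) * [sum_j p_j (ybar - y_j)^2]^(1/2),
* using pop_share as p_j and gdp_pc as y_j.
* Assumptions: ybar is taken as the population-weighted mean of regional
* gdp_pc, which may differ from how the generator defines the country mean;
* ybar, wvar, wcv_check and first_row are hypothetical names, not variables
* in the data. Nothing is saved, so the .dta files are left unchanged.
use "sim_regional_gdp.dta", clear
* Population-weighted mean of regional GDP per capita, per country-year
bysort country year: egen double ybar = total(pop_share * gdp_pc)
* Population-weighted squared deviations from that mean
bysort country year: egen double wvar = total(pop_share * (ybar - gdp_pc)^2)
generate double wcv_check = sqrt(wvar) / ybar
* Flag one row per country-year so each country is summarized only once
egen byte first_row = tag(country year)
summarize wcv_check if first_row
* Because the regional file holds only each country's first observation year,
* wcv_check is comparable to wcv in the country panel for that year only.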