* ====================================================================
* stata_codebook.do - attach long-form documentation to the .dta files
* --------------------------------------------------------------------
* pyreadstat writes variable labels and value labels but cannot write
* Stata `notes`. Run this once in Stata to attach a dataset note and a
* per-variable note (definition / construction / units / source) to each
* .dta, then re-save. Generated by make_stata.py - do not edit by hand.
* ====================================================================

* ---- Prediction_Data.dta ----
use "Prediction_Data.dta", clear
label data "Kuznets/DMSP region-year training panel (light->GDP, Table 1)"
note _dta: Region-year training panel; 1,504 regions in 81 countries; 1992-2010; 5,258 rows. Training sample for the nighttime-light -> GDP prediction model (Table 1): regions with both observed GDP and DMSP-OLS lights, plus geographic, World Bank region, and satellite-era controls. Keyed by code_Coutry_Region x year (country = Country_ISO).
note Region_NAME: Name of the first-level admin unit (state/province/canton). Construction: GADM admin-1 name. Units: string. Source: GADM (Global Administrative Areas).
note Country_NAME: Country name (English). Construction: From GADM country attributes. Units: string. Source: GADM (Global Administrative Areas).
note Country_ISO: Three-letter country identifier. Construction: Assigned per country. Units: string. Source: GADM (Global Administrative Areas).
note id_t_j: Concatenated year and ISO code. Construction: year concatenated with Country_ISO. Units: string. Source: Authors' replication archive.
note code_Coutry_Region: Numeric identifier for a region (unique within country). Construction: Region identifier carried verbatim from the authors' archive. Units: integer. Source: Authors' replication archive.
note year: Year of observation. Construction: -. Units: year. Source: -.
note Pop_Region: Total population of the region. Construction: Population density x region area, rounded up (min 1); 5-yr waves interpolated to annual. Units: persons. Source: GPW v3 (CIESIN).
note Pop_Country: Total population of the country. Construction: Sum of regional populations. Units: persons. Source: GPW v3 (CIESIN).
note GDP_pc_Region: Observed regional GDP per capita (training target). Construction: Regional accounts, constant 2005 PPP US$. Units: US$ (2005 PPP). Source: Gennaioli et al. (2014).
note log_GDP_pc_Region: Natural log of GDP_pc_Region. Construction: ln(GDP_pc_Region). Units: log US$. Source: Gennaioli et al. (2014).
note log_Light_ppix_Region: Natural log of the region mean DMSP-OLS stable-lights digital number. Construction: ln(mean DN); mean set to 0.01 when 0 so the log is defined; DN ranges 0-63. Units: log DN. Source: NOAA/NGDC DMSP-OLS stable lights.
note log_GDP_pc_Country: Natural log of national GDP per capita. Construction: ln(national GDP per capita). Units: log US$. Source: World Bank WDI.
note log_N_pix_top_cod_1_ppix: Log number of saturated (top-coded) pixels in the region. Construction: ln(count of DN=63 pixels) per region; controls for sensor saturation. Units: log count. Source: NOAA/NGDC DMSP-OLS stable lights.
note log_N_pix_low_cod_1_ppix: Log number of dark (low-coded) pixels in the region. Construction: ln(count of DN=0 pixels) per region; controls for sparse/rural area. Units: log count. Source: NOAA/NGDC DMSP-OLS stable lights.
note log_area: Natural log of the region polygon area. Construction: ln(region area in km^2). Units: log km^2. Source: GADM (Global Administrative Areas).
note log_region: Log count of first-level regions per country. Construction: ln(number of regions in the country). Units: log count. Source: GADM (Global Administrative Areas).
note log_region_X_log_area: Product of log_region and log_area. Construction: log_region * log_area. Units: -. Source: This study (derived).
note eap: 1 if the country is in East Asia & Pacific (North America = reference). Construction: World Bank regional grouping indicator. Units: 0/1. Source: World Bank WDI.
note eca: 1 if the country is in Europe & Central Asia (North America = reference). Construction: World Bank regional grouping indicator. Units: 0/1. Source: World Bank WDI.
note lac: 1 if the country is in Latin America & Caribbean (North America = reference). Construction: World Bank regional grouping indicator. Units: 0/1. Source: World Bank WDI.
note mena: 1 if the country is in Middle East & North Africa (North America = reference). Construction: World Bank regional grouping indicator. Units: 0/1. Source: World Bank WDI.
note sa: 1 if the country is in South Asia (North America = reference). Construction: World Bank regional grouping indicator. Units: 0/1. Source: World Bank WDI.
note ssa: 1 if the country is in Sub-Saharan Africa (North America = reference). Construction: World Bank regional grouping indicator. Units: 0/1. Source: World Bank WDI.
note satyear_1: 1 for DMSP satellite/sensor configuration era 1. Construction: Sensor-era indicator; DMSP sensors change and age over 1992-2010. Units: 0/1. Source: NOAA/NGDC DMSP-OLS stable lights.
note satyear_2: 1 for DMSP satellite/sensor configuration era 2. Construction: Sensor-era indicator; DMSP sensors change and age over 1992-2010. Units: 0/1. Source: NOAA/NGDC DMSP-OLS stable lights.
note satyear_3: 1 for DMSP satellite/sensor configuration era 3. Construction: Sensor-era indicator; DMSP sensors change and age over 1992-2010. Units: 0/1. Source: NOAA/NGDC DMSP-OLS stable lights.
note satyear_4: 1 for DMSP satellite/sensor configuration era 4. Construction: Sensor-era indicator; DMSP sensors change and age over 1992-2010. Units: 0/1. Source: NOAA/NGDC DMSP-OLS stable lights.
note satyear_5: 1 for DMSP satellite/sensor configuration era 5. Construction: Sensor-era indicator; DMSP sensors change and age over 1992-2010. Units: 0/1. Source: NOAA/NGDC DMSP-OLS stable lights.
note satyear_6: 1 for DMSP satellite/sensor configuration era 6. Construction: Sensor-era indicator; DMSP sensors change and age over 1992-2010. Units: 0/1. Source: NOAA/NGDC DMSP-OLS stable lights.
note satyear_7: 1 for DMSP satellite/sensor configuration era 7. Construction: Sensor-era indicator; DMSP sensors change and age over 1992-2010. Units: 0/1. Source: NOAA/NGDC DMSP-OLS stable lights.
save "Prediction_Data.dta", replace

* ---- Table_2_data.dta ----
use "Table_2_data.dta", clear
label data "Kuznets/DMSP region-year inputs for inequality-index validation"
note _dta: Region-year frame (same 1,504-region training sample); 1992-2010; 5,258 rows. Pairs predicted and observed regional income with regional/country lights and population to validate the five inequality indices (Table 2). Has no explicit region-id column; rows are the training frame at region-year.
note Country_ISO: Three-letter country identifier. Construction: Assigned per country. Units: string. Source: GADM (Global Administrative Areas).
note year: Year of observation. Construction: -. Units: year. Source: -.
note pred_GDP_pc_Region: Model-predicted regional GDP per capita. Construction: Back-transformed fitted values of the eq.-1 random-effects model. Units: US$ (2005 PPP). Source: This study (derived).
note GDP_pc_Region: Observed regional GDP per capita (training target). Construction: Regional accounts, constant 2005 PPP US$. Units: US$ (2005 PPP). Source: Gennaioli et al. (2014).
note Light_Region: Sum of pixel digital numbers over the region. Construction: Sum of DMSP-OLS stable-lights DN over the region's pixels. Units: summed DN. Source: NOAA/NGDC DMSP-OLS stable lights.
note Light_Country: Sum of pixel digital numbers over the whole country. Construction: Sum of DMSP-OLS stable-lights DN over all country pixels. Units: summed DN. Source: NOAA/NGDC DMSP-OLS stable lights.
note Pop_Region: Total population of the region. Construction: Population density x region area, rounded up (min 1); 5-yr waves interpolated to annual. Units: persons. Source: GPW v3 (CIESIN).
note Pop_Country: Total population of the country. Construction: Sum of regional populations. Units: persons. Source: GPW v3 (CIESIN).
save "Table_2_data.dta", replace

* ---- Table_3_data.dta ----
use "Table_3_data.dta", clear
label data "Kuznets/DMSP country-year panel: GDP + 5 inequality indices (Table 3)"
note _dta: Country-year panel; 180 countries; 1992-2012; 3,675 rows. Core Kuznets dataset: national GDP per capita plus five population-weighted regional inequality indices built from predicted regional incomes. Used to estimate the spatial Kuznets curve (Table 3). Keyed by Country_ISO x year.
note Country_NAME: Country name (English). Construction: From GADM country attributes. Units: string. Source: GADM (Global Administrative Areas).
note Country_ISO: Three-letter country identifier. Construction: Assigned per country. Units: string. Source: GADM (Global Administrative Areas).
note year: Year of observation. Construction: -. Units: year. Source: -.
note GDP_pc_Country: National GDP per capita. Construction: World Bank WDI, constant 2005 PPP US$. Units: US$ (2005 PPP). Source: World Bank WDI.
note GINIW_pred_GDP_pc: Population-weighted Gini of predicted regional income within a country-year. Construction: Gini of pred_GDP_pc_Region across regions, weighted by Pop_Region, per country-year. Units: 0-1. Source: This study (derived).
note COVW_pred_GDP_pc: Population-weighted coefficient of variation of predicted regional income. Construction: pop-weighted SD / pop-weighted mean of pred_GDP_pc_Region, per country-year. Units: >=0. Source: This study (derived).
note GE_1W_pred_GDP_pc: Population-weighted Theil index of predicted regional income. Construction: Generalized entropy GE(alpha=1) of pred_GDP_pc_Region, pop-weighted, per country-year. Units: >=0. Source: This study (derived).
note GE_0W_pred_GDP_pc: Population-weighted mean log deviation of predicted regional income. Construction: Generalized entropy GE(alpha=0) of pred_GDP_pc_Region, pop-weighted, per country-year. Units: >=0. Source: This study (derived).
note GE_m1W_pred_GDP_pc: Population-weighted GE(-1) of predicted regional income. Construction: Generalized entropy GE(alpha=-1) of pred_GDP_pc_Region, pop-weighted, per country-year. Units: >=0. Source: This study (derived).
save "Table_3_data.dta", replace

* ---- Table_4_data.dta ----
use "Table_4_data.dta", clear
label data "Kuznets/DMSP country-year panel: inequality determinants (Table 4)"
note _dta: Country-year panel; 180 countries; 1992-2012; 3,675 rows. Regional Gini plus structural correlates of regional inequality (resource rents, arable land, trade, FDI, gasoline price, aid, schooling, ethnic inequality, democracy, federalism). Determinant coverage varies by country-year (see per-variable N). Keyed by Country_ISO x year.
note Country_NAME: Country name (English). Construction: From GADM country attributes. Units: string. Source: GADM (Global Administrative Areas).
note Country_ISO: Three-letter country identifier. Construction: Assigned per country. Units: string. Source: GADM (Global Administrative Areas).
note year: Year of observation. Construction: -. Units: year. Source: -.
note GINIW_pred_GDP_pc: Population-weighted Gini of predicted regional income within a country-year. Construction: Gini of pred_GDP_pc_Region across regions, weighted by Pop_Region, per country-year. Units: 0-1. Source: This study (derived).
note GDP_pc_Country: National GDP per capita. Construction: World Bank WDI, constant 2005 PPP US$. Units: US$ (2005 PPP). Source: World Bank WDI.
note Pop_Country: Total population of the country. Construction: Sum of regional populations. Units: persons. Source: GPW v3 (CIESIN).
note Resources_rents_share_of_GDP: Total natural-resource rents as a share of GDP. Construction: Oil + gas + coal + mineral + forest rents, % of GDP. Units: % GDP. Source: World Bank WDI.
note Arable_land: Arable land as a share of land area (FAO definition). Construction: Arable land / total land area. Units: share. Source: World Bank WDI.
note Trade_GDP_share: Trade as a share of GDP. Construction: (Exports + imports) / GDP. Units: ratio. Source: World Bank WDI.
note FDI_share_of_GDP: Net foreign direct investment inflows as a share of GDP. Construction: Net FDI inflows / GDP. Units: ratio. Source: World Bank WDI.
note area: Total land area excluding inland water. Construction: World Bank WDI land area. Units: km^2. Source: World Bank WDI.
note price_gasoline: Pump price for gasoline. Construction: Pump price, PPP constant 2005 US$/litre; paper's transport cost = area x price_gasoline. Units: US$/litre. Source: World Bank WDI.
note Aid: Net official development assistance received. Construction: Net ODA received, constant 2011 US$. Units: US$ (2011). Source: World Bank WDI.
note School_enrollment_secondary: Gross secondary-school enrolment ratio (>100% with over-age pupils). Construction: Secondary enrolment / age-eligible population. Units: % gross. Source: World Bank WDI.
note GINIW_Eth_light: Population-weighted light-Gini computed across ethnic homelands. Construction: Light Gini across ethnic homelands (method of Alesina et al. 2016). Units: 0-1. Source: GREG (Weidmann et al. 2010) + NOAA/NGDC.
note Polity2: Rescaled Polity IV combined democracy-autocracy score. Construction: Polity IV combined score rescaled -1 (autocracy) to +1 (democracy). Units: -1..+1. Source: Polity IV (Center for Systemic Peace).
note fedelupd2: 1 if the country is federally organised. Construction: Federalism indicator from the authors' archive. Units: 0/1. Source: Authors' replication archive.
save "Table_4_data.dta", replace

* ---- Table_B4_data.dta ----
use "Table_B4_data.dta", clear
label data "Kuznets/DMSP region-year panel: Conley spatial-HAC inputs + lat/lon"
note _dta: Region-year frame; 1,504 regions in 81 countries; 1992-2010; 5,258 rows. Adds region-centroid coordinates so the light elasticity can be re-estimated with Conley spatial-HAC standard errors (spatial-correlation robustness check). Keyed by code_Coutry_Region x year.
note code_Coutry_Region: Numeric identifier for a region (unique within country). Construction: Region identifier carried verbatim from the authors' archive. Units: integer. Source: Authors' replication archive.
note Country_ISO: Three-letter country identifier. Construction: Assigned per country. Units: string. Source: GADM (Global Administrative Areas).
note year: Year of observation. Construction: -. Units: year. Source: -.
note Latitude: Latitude of the region polygon centroid. Construction: GADM polygon centroid. Units: degrees. Source: GADM (Global Administrative Areas).
note Longitude: Longitude of the region polygon centroid. Construction: GADM polygon centroid. Units: degrees. Source: GADM (Global Administrative Areas).
note log_GDP_pc_Region: Natural log of GDP_pc_Region. Construction: ln(GDP_pc_Region). Units: log US$. Source: Gennaioli et al. (2014).
note log_Light_ppix_Region: Natural log of the region mean DMSP-OLS stable-lights digital number. Construction: ln(mean DN); mean set to 0.01 when 0 so the log is defined; DN ranges 0-63. Units: log DN. Source: NOAA/NGDC DMSP-OLS stable lights.
note satyear_1: 1 for DMSP satellite/sensor configuration era 1. Construction: Sensor-era indicator; DMSP sensors change and age over 1992-2010. Units: 0/1. Source: NOAA/NGDC DMSP-OLS stable lights.
note satyear_2: 1 for DMSP satellite/sensor configuration era 2. Construction: Sensor-era indicator; DMSP sensors change and age over 1992-2010. Units: 0/1. Source: NOAA/NGDC DMSP-OLS stable lights.
note satyear_3: 1 for DMSP satellite/sensor configuration era 3. Construction: Sensor-era indicator; DMSP sensors change and age over 1992-2010. Units: 0/1. Source: NOAA/NGDC DMSP-OLS stable lights.
note satyear_4: 1 for DMSP satellite/sensor configuration era 4. Construction: Sensor-era indicator; DMSP sensors change and age over 1992-2010. Units: 0/1. Source: NOAA/NGDC DMSP-OLS stable lights.
note satyear_5: 1 for DMSP satellite/sensor configuration era 5. Construction: Sensor-era indicator; DMSP sensors change and age over 1992-2010. Units: 0/1. Source: NOAA/NGDC DMSP-OLS stable lights.
note satyear_6: 1 for DMSP satellite/sensor configuration era 6. Construction: Sensor-era indicator; DMSP sensors change and age over 1992-2010. Units: 0/1. Source: NOAA/NGDC DMSP-OLS stable lights.
note satyear_7: 1 for DMSP satellite/sensor configuration era 7. Construction: Sensor-era indicator; DMSP sensors change and age over 1992-2010. Units: 0/1. Source: NOAA/NGDC DMSP-OLS stable lights.
save "Table_B4_data.dta", replace

* ---- Figure_5_data.dta ----
use "Figure_5_data.dta", clear
label data "Kuznets/DMSP country-year: regional vs interpersonal Gini (Figure 5)"
note _dta: Country-year panel; 180 countries; 1992-2012; 3,675 rows. Pairs the population-weighted regional Gini with the national household-survey interpersonal income Gini (sparsely observed) to compare regional vs personal inequality (Figure 5). Keyed by Country_ISO x year.
note Country_ISO: Three-letter country identifier. Construction: Assigned per country. Units: string. Source: GADM (Global Administrative Areas).
note Country_NAME: Country name (English). Construction: From GADM country attributes. Units: string. Source: GADM (Global Administrative Areas).
note year: Year of observation. Construction: -. Units: year. Source: -.
note GINIW_pred_GDP_pc: Population-weighted Gini of predicted regional income within a country-year. Construction: Gini of pred_GDP_pc_Region across regions, weighted by Pop_Region, per country-year. Units: 0-1. Source: This study (derived).
note Giniall: Household-survey interpersonal income Gini. Construction: Reported household income Gini on a 0-100 scale (note: regional indices are 0-1). Units: 0-100. Source: Lessmann & Seidel (2017).
save "Figure_5_data.dta", replace
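
* ---- Optional check (hand-added sketch, not emitted by make_stata.py) ----
* A minimal way to confirm that the notes were attached, assuming the .dta
* files saved above sit in the current working directory. Prediction_Data.dta
* and Pop_Region are used here only as examples; any file or variable
* documented above could be substituted.
use "Prediction_Data.dta", clear
* List the dataset-level note.
notes _dta
* List the note attached to one variable.
notes Pop_Region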