{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# Installation\n", "- Follow the directions at the [PySAL-ArcGIS-Toolbox Git repository](https://github.com/Esri/PySAL-ArcGIS-Toolbox).\n", "- Please make note of the section on **Adding a Git Project to your ArcGIS Installation Python Path**.\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import arcpy as ARCPY\n", "import arcgisscripting as ARC\n", "import SSDataObject as SSDO\n", "import SSUtilities as UTILS\n", "import WeightsUtilities as WU\n", "import numpy as NUM\n", "import scipy as SCIPY\n", "import pysal as PYSAL\n", "import os as OS\n", "import pandas as PANDAS" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Example: Testing the Income Convergence Hypothesis in California Counties (1969 - 2010)\n", "\n", "- Use the Auto-Model Spatial Econometric Tool to identify the appropriate model.\n", "\n", "- Regress the growth rate of incomes on the log of starting incomes.\n", "\n", " - A significant negative coefficient indicates convergence.\n", "\n", "- The percentage of the population without a high school education and the population itself are the other exogenous factors." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Importing Your Data into a PANDAS DataFrame" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " GROWTH LOGPCR69 PERCNOHS POP1969\n", "158 0.011426 0.176233 37.0 1060099\n", "159 -0.137376 0.214186 38.3 398\n", "160 -0.188417 0.067722 41.4 11240\n", "161 -0.085070 -0.118248 42.9 101057\n", "162 -0.049022 -0.081377 48.1 13328\n" ] } ], "source": [ "inputFC = r'../data/CA_Polygons.shp'\n", "fullFC = OS.path.abspath(inputFC)\n", "fullPath, fcName = OS.path.split(fullFC)\n", "ssdo = SSDO.SSDataObject(inputFC)\n", "uniqueIDField = \"MYID\"\n", "fieldNames = ['GROWTH', 'LOGPCR69', 'PERCNOHS', 'POP1969']\n", "ssdo.obtainData(uniqueIDField, fieldNames)\n", "df = ssdo.getDataFrame()\n", "print(df.head())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Use the PySAL-ArcGIS Utilities to Read in Spatial Weights Files" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import pysal2ArcUtils as PYSAL_UTILS\n", "swmFile = OS.path.join(fullPath, \"queen.swm\")\n", "W = PYSAL_UTILS.PAT_W(ssdo, swmFile)\n", "w = W.w\n", "\n", "kernelSWMFile = OS.path.join(fullPath, \"knn8.swm\")\n", "KW = PYSAL_UTILS.PAT_W(ssdo, kernelSWMFile)\n", "kw = KW.w" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Run the Auto Model Class and Export Your Data to an Output Feature Class" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import AutoModel as AUTO\n", "auto = AUTO.AutoSpace_PySAL(ssdo, \"GROWTH\", ['LOGPCR69', 'PERCNOHS', 'POP1969'],\n", " W, KW, pValue = 0.1, useCombo = True)\n", "ARCPY.env.overwriteOutput = True\n", "outputFC = r'../data/pysal_automodel.shp'\n", "auto.createOutput(outputFC)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compare OLS and Spatial Lag Results" ] }, { "cell_type": 
"code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "REGRESSION\n", "----------\n", "SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES\n", "-----------------------------------------\n", "Data set :../data/CA_Polygons.shp\n", "Weights matrix : queen.swm\n", "Dependent Variable : GROWTH Number of Observations: 58\n", "Mean dependent var : -0.1152 Number of Variables : 4\n", "S.D. dependent var : 0.1641 Degrees of Freedom : 54\n", "R-squared : 0.5537\n", "Adjusted R-squared : 0.5290\n", "Sum squared residual: 0.685 F-statistic : 22.3358\n", "Sigma-square : 0.013 Prob(F-statistic) : 1.551e-09\n", "S.E. of regression : 0.113 Log likelihood : 46.429\n", "Sigma-square ML : 0.012 Akaike info criterion : -84.858\n", "S.E of regression ML: 0.1087 Schwarz criterion : -76.616\n", "\n", "------------------------------------------------------------------------------------\n", " Variable Coefficient Std.Error t-Statistic Probability\n", "------------------------------------------------------------------------------------\n", " CONSTANT 0.5972912 0.1097673 5.4414326 0.0000013\n", " LOGPCR69 -0.0390200 0.1358352 -0.2872601 0.7750127\n", " PERCNOHS -0.0170809 0.0025175 -6.7848679 0.0000000\n", " POP1969 -0.0000000 0.0000000 -0.7126791 0.4791127\n", "------------------------------------------------------------------------------------\n", "\n", "REGRESSION DIAGNOSTICS\n", "MULTICOLLINEARITY CONDITION NUMBER 15.894\n", "\n", "TEST ON NORMALITY OF ERRORS\n", "TEST DF VALUE PROB\n", "Jarque-Bera 2 1.181 0.5541\n", "\n", "DIAGNOSTICS FOR HETEROSKEDASTICITY\n", "RANDOM COEFFICIENTS\n", "TEST DF VALUE PROB\n", "Breusch-Pagan test 3 1.329 0.7222\n", "Koenker-Bassett test 3 1.999 0.5725\n", "\n", "DIAGNOSTICS FOR SPATIAL DEPENDENCE\n", "TEST MI/DF VALUE PROB\n", "Lagrange Multiplier (lag) 1 5.330 0.0210\n", "Robust LM (lag) 1 5.977 0.0145\n", "Lagrange Multiplier (error) 1 1.336 0.2477\n", "Robust LM (error) 1 1.983 0.1591\n", "Lagrange 
Multiplier (SARMA) 2 7.313 0.0258\n", "\n", "================================ END OF REPORT =====================================\n" ] } ], "source": [ "print(auto.olsModel.summary)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "REGRESSION\n", "----------\n", "SUMMARY OF OUTPUT: SPATIAL TWO STAGE LEAST SQUARES\n", "--------------------------------------------------\n", "Data set :../data/CA_Polygons.shp\n", "Weights matrix : queen.swm\n", "Dependent Variable : GROWTH Number of Observations: 58\n", "Mean dependent var : -0.1152 Number of Variables : 5\n", "S.D. dependent var : 0.1641 Degrees of Freedom : 53\n", "Pseudo R-squared : 0.6169\n", "Spatial Pseudo R-squared: 0.5131\n", "\n", "------------------------------------------------------------------------------------\n", " Variable Coefficient Std.Error z-Statistic Probability\n", "------------------------------------------------------------------------------------\n", " CONSTANT 0.6611717 0.1005943 6.5726567 0.0000000\n", " LOGPCR69 -0.2400177 0.1394294 -1.7214277 0.0851732\n", " PERCNOHS -0.0161070 0.0022769 -7.0739777 0.0000000\n", " POP1969 -0.0000000 0.0000000 -0.4311762 0.6663403\n", " W_GROWTH 0.7523195 0.2556755 2.9424783 0.0032560\n", "------------------------------------------------------------------------------------\n", "Instrumented: W_GROWTH\n", "Instruments: W_LOGPCR69, W_PERCNOHS, W_POP1969\n", "================================ END OF REPORT =====================================\n" ] } ], "source": [ "print(auto.finalModel.summary)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Interpreting the Results\n", "- The coefficient for the log of starting incomes (LOGPCR69, -.039) was negative in the OLS model but not statistically significant [p-value = .775].\n", "- The negative coefficient for LOGPCR69 (-.240) was statistically significant in the Spatial Lag Model at the 90% Confidence 
Level [p-value = .085]. This supports the regional convergence hypothesis for California counties from 1969 to 2010.\n", "- The population level in 1969 (POP1969) did not appear to contribute to the growth rate of regional incomes in California over the period, as its coefficient was insignificant in both models [p-values = .479, .666].\n", "- The percentage of the population with no high school education (PERCNOHS) appears to be a strong indicator of regional income growth. Its coefficient is negative and statistically significant in both models [p-values < .0001], so counties where a larger share of the population lacked a high school education tended to have lower growth rates.\n", "- The positive and statistically significant coefficient for the spatial lag variable (W_GROWTH, .752, p-value = .003) indicates that there are considerable spillover effects among growth rates in the counties. Counties with higher growth rates tend to be near other counties with higher growth rates, and counties with lower growth rates tend to be near others with lower growth rates." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 1 }