{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Using the Spatial Statistics Data Object (SSDataObject) Makes Feature IO Simple\n", "- SSDataObject does the read/write and accounting of feature/attribute and NumPy Array order\n", "- Write/Utilize methods that take NumPy Arrays" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using NumPy as the common denominator\n", "\n", "- Could use the ArcPy Data Access Module directly, but there are a host of issues one must take into account:\n", " * How to deal with projections and other environment settings?\n", " * How do Cursors affect the accounting of features?\n", " * How to deal with bad records/bad data and error handling?\n", " * How to honor/account for full field object control?\n", " * How do I create output features that correspond to my inputs?\n", " - Points are easy, what about Polygons and Polylines?\n", "- Spatial Statistics Data Object (SSDataObject)\n", " * Almost 30 Spatial Statistics Tools written in Python that ${\\bf{must}}$ behave like traditional GP Tools\n", " * Use SSDataObject and your code should behave the same way" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Data Analysis Python Modules\n", "\n", "- [PANDAS (Python Data Analysis Library)](http://pandas.pydata.org/)\n", " \n", "- [SciPy (Scientific Python)](http://www.scipy.org/)\n", "\n", "- [PySAL (Python Spatial Analysis Library)](https://geodacenter.asu.edu/pysal)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basic Imports" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import arcpy as ARCPY\n", "import numpy as NUM\n", "import SSDataObject as SSDO" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Initialize and Load Fields into Spatial Statistics Data Object\n", "- The Unique ID Field (\"MYID\" in this example) will keep track of the order of your features\n", " * You can use 
```ssdo.oidName``` as your Unique ID Field\n", " * You have no control over Object ID Fields. Using one is quick and assures \"uniqueness\", but you can't assume the IDs will not get \"scrambled\" during copies.\n", " * To assure full control I advocate the \"Add Field (LONG)\" --> \"Calculate Field (From Object ID)\" workflow." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " GROWTH LOGPCR69 PERCNOHS POP1969\n", "158 0.011426 0.176233 37.0 1060099\n", "159 -0.137376 0.214186 38.3 398\n", "160 -0.188417 0.067722 41.4 11240\n", "161 -0.085070 -0.118248 42.9 101057\n", "162 -0.049022 -0.081377 48.1 13328\n" ] } ], "source": [ "inputFC = r'../data/CA_Polygons.shp'\n", "ssdo = SSDO.SSDataObject(inputFC)\n", "ssdo.obtainData(\"MYID\", ['GROWTH', 'LOGPCR69', 'PERCNOHS', 'POP1969'])\n", "df = ssdo.getDataFrame()\n", "print(df.head())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## You can get your data using the core NumPy Arrays \n", "- Use ```.data``` to get the native data type\n", "- Use the ```returnDouble()``` function to cast explicitly to float\n", "\n" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 1.06009900e+06 3.98000000e+02 1.12400000e+04 1.01057000e+05\n", " 1.33280000e+04]\n" ] } ], "source": [ "pop69 = ssdo.fields['POP1969']\n", "nativePop69 = pop69.data\n", "floatPop69 = pop69.returnDouble()\n", "print(floatPop69[0:5])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## You can get your data in a PANDAS Data Frame\n", "- Note the Unique ID Field is used as the Index" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " GROWTH LOGPCR69 PERCNOHS POP1969\n", "158 0.011426 0.176233 37.0 1060099\n", "159 -0.137376 0.214186 38.3 398\n", "160 -0.188417 0.067722 41.4 11240\n", "161 -0.085070 
-0.118248 42.9 101057\n", "162 -0.049022 -0.081377 48.1 13328\n" ] } ], "source": [ "df = ssdo.getDataFrame()\n", "print(df.head())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## By default the SSDataObject only stores the centroids of the features " ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " GROWTH LOGPCR69 PERCNOHS POP1969 XCoords YCoords\n", "158 0.011426 0.176233 37.0 1060099 -1.356736e+07 4.503012e+06\n", "159 -0.137376 0.214186 38.3 398 -1.333797e+07 4.637142e+06\n", "160 -0.188417 0.067722 41.4 11240 -1.343007e+07 4.615529e+06\n", "161 -0.085070 -0.118248 42.9 101057 -1.353566e+07 4.789809e+06\n", "162 -0.049022 -0.081377 48.1 13328 -1.341895e+07 4.581597e+06\n" ] } ], "source": [ "df['XCoords'] = ssdo.xyCoords[:,0]\n", "df['YCoords'] = ssdo.xyCoords[:,1]\n", "print(df.head())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## You can get the core ArcPy Geometries if desired\n", "- Set ```requireGeometry = True``` " ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " GROWTH LOGPCR69 PERCNOHS POP1969 \\\n", "158 0.011426 0.176233 37.0 1060099 \n", "159 -0.137376 0.214186 38.3 398 \n", "160 -0.188417 0.067722 41.4 11240 \n", "161 -0.085070 -0.118248 42.9 101057 \n", "162 -0.049022 -0.081377 48.1 13328 \n", "\n", " shapes \n", "158 (