{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Using the Spatial Statistics Data Object (SSDataObject) Makes Feature IO Simple\n",
    "- SSDataObject does the read/write and accounting of feature/attribute and NumPy Array order\n",
    "- Write/Utilize methods that take NumPy Arrays"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using NumPy as the common denominator\n",
    "\n",
    "- Could use the ArcPy Data Access Module directly, but there are host of issues/information one must take into account:\n",
    "    * How to deal with projections and other environment settings?\n",
    "    * How Cursors affect the accounting of features?\n",
    "    * How to deal with bad records/bad data and error handling?\n",
    "    * How to honor/account for full field object control?\n",
    "    * How do I create output features that correspond to my inputs?\n",
    "        - Points are easy, what about Polygons and Polylines?\n",
    "- Spatial Statistics Data Object (SSDataObject)\n",
    "    * Almost 30 Spatial Statistics Tools written in Python that ${\\bf{must}}$  behave like traditional GP Tools\n",
    "    * Use SSDataObject and your code should adhere"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The Data Analysis Python Modules\n",
    "\n",
    "- [PANDAS (Python Data Analysis Library)](http://pandas.pydata.org/)\n",
    "    \n",
    "- [SciPy (Scientific Python)](http://www.scipy.org/)\n",
    "\n",
    "- [PySAL (Python Spatial Analysis Library)](https://geodacenter.asu.edu/pysal)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Basic Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import arcpy as ARCPY\n",
    "import numpy as NUM\n",
    "import SSDataObject as SSDO"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Initialize and Load Fields into Spatial Statsitics Data Object\n",
    "- The Unique ID Field (\"MYID\" in this example) will keep track of the order of your features\n",
    "    * You can use ```ssdo.oidName``` as your Unique ID Field\n",
    "    * You have no control over Object ID Fields.  It is quick, assures \"uniqueness\", but can't assume they will not get \"scrambled\" during copies.\n",
    "    * To assure full control I advocate the \"Add Field (LONG)\" --> \"Calculate Field (From Object ID)\" workflow."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "       GROWTH  LOGPCR69  PERCNOHS  POP1969\n",
      "158  0.011426  0.176233      37.0  1060099\n",
      "159 -0.137376  0.214186      38.3      398\n",
      "160 -0.188417  0.067722      41.4    11240\n",
      "161 -0.085070 -0.118248      42.9   101057\n",
      "162 -0.049022 -0.081377      48.1    13328\n"
     ]
    }
   ],
   "source": [
    "inputFC = r'../data/CA_Polygons.shp'\n",
    "ssdo = SSDO.SSDataObject(inputFC)\n",
    "ssdo.obtainData(\"MYID\", ['GROWTH', 'LOGPCR69', 'PERCNOHS', 'POP1969'])\n",
    "df = ssdo.getDataFrame()\n",
    "print(df.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## You can get your data using the core NumPy Arrays \n",
    "- Use ```.data``` to get the native data type\n",
    "- Use the ```returnDouble()``` function to cast explicitly to float\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[  1.06009900e+06   3.98000000e+02   1.12400000e+04   1.01057000e+05\n",
      "   1.33280000e+04]\n"
     ]
    }
   ],
   "source": [
    "pop69 = ssdo.fields['POP1969']\n",
    "nativePop69 = pop69.data\n",
    "floatPop69 = pop69.returnDouble()\n",
    "print(floatPop69[0:5])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## You can get your data in a PANDAS Data Frame\n",
    "- Note the Unique ID Field is used as the Index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "       GROWTH  LOGPCR69  PERCNOHS  POP1969\n",
      "158  0.011426  0.176233      37.0  1060099\n",
      "159 -0.137376  0.214186      38.3      398\n",
      "160 -0.188417  0.067722      41.4    11240\n",
      "161 -0.085070 -0.118248      42.9   101057\n",
      "162 -0.049022 -0.081377      48.1    13328\n"
     ]
    }
   ],
   "source": [
    "df = ssdo.getDataFrame()\n",
    "print(df.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## By default the SSDataObject only stores the centroids of the features "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "       GROWTH  LOGPCR69  PERCNOHS  POP1969       XCoords       YCoords\n",
      "158  0.011426  0.176233      37.0  1060099 -1.356736e+07  4.503012e+06\n",
      "159 -0.137376  0.214186      38.3      398 -1.333797e+07  4.637142e+06\n",
      "160 -0.188417  0.067722      41.4    11240 -1.343007e+07  4.615529e+06\n",
      "161 -0.085070 -0.118248      42.9   101057 -1.353566e+07  4.789809e+06\n",
      "162 -0.049022 -0.081377      48.1    13328 -1.341895e+07  4.581597e+06\n"
     ]
    }
   ],
   "source": [
    "df['XCoords'] = ssdo.xyCoords[:,0]\n",
    "df['YCoords'] = ssdo.xyCoords[:,1]\n",
    "print(df.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## You can get the core ArcPy Geometries if desired\n",
    "- Set ```requireGeometry = True``` "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "       GROWTH  LOGPCR69  PERCNOHS  POP1969  \\\n",
      "158  0.011426  0.176233      37.0  1060099   \n",
      "159 -0.137376  0.214186      38.3      398   \n",
      "160 -0.188417  0.067722      41.4    11240   \n",
      "161 -0.085070 -0.118248      42.9   101057   \n",
      "162 -0.049022 -0.081377      48.1    13328   \n",
      "\n",
      "                                                shapes  \n",
      "158  (<geoprocessing array object object at 0x00000...  \n",
      "159  (<geoprocessing array object object at 0x00000...  \n",
      "160  (<geoprocessing array object object at 0x00000...  \n",
      "161  (<geoprocessing array object object at 0x00000...  \n",
      "162  (<geoprocessing array object object at 0x00000...  \n"
     ]
    }
   ],
   "source": [
    "ssdo = SSDO.SSDataObject(inputFC)\n",
    "ssdo.obtainData(\"MYID\", ['GROWTH', 'LOGPCR69', 'PERCNOHS', 'POP1969'],\n",
    "               requireGeometry = True)\n",
    "df = ssdo.getDataFrame()\n",
    "shapes = NUM.array(ssdo.shapes, dtype = object)\n",
    "df['shapes'] = shapes\n",
    "print(df.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Coming Soon... ArcPy Geometry Data Frame Integration \n",
    "- In conjunction with the ArcGIS Python SDK \n",
    "- Spatial operators on ArcGIS Data Frames: selection, clip, intersection etc."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Creating Output Feature Classes \n",
    "- Simple Example: Adding a field of random standard normal values to your input/output\n",
    "- ```appendFields``` can be used to copy over any fields from the input whether you read them into the SSDataObject or not.\n",
    "- E.g. 'NEW_NAME' was never read into Python but it will be copied to the output.  This can save you a lot of memory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy.random as RAND\n",
    "import os as OS\n",
    "ARCPY.env.overwriteOutput = True\n",
    "outArray = RAND.normal(0,1, (ssdo.numObs,))\n",
    "outDict = {}\n",
    "outField = SSDO.CandidateField('STDNORM', 'DOUBLE', outArray, alias = 'Standard Normal')\n",
    "outDict[outField.name] = outField\n",
    "outputFC = OS.path.abspath(r'../data/testMyOutput.shp')\n",
    "ssdo.output2NewFC(outputFC, outDict, appendFields = ['GROWTH', 'PERCNOHS', 'NEW_NAME'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.3"
  },
  "widgets": {
   "state": {},
   "version": "1.1.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}