{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Import USGS use data & Create usage table\n", "Here we download the raw usage data for years 2000, 2005, and 2010 from the USGS usage site and synthesize all data into a tidy table listing: year, FIPS code, and total annual withdrawals by sector in MGal/day tagged by sector, source type (surface or groundwater) and water type (fresh or saline). This table is formatted as follows:\n", "\n", "|STATE | FIPS | YEAR | UseClass | SourceClass | SourceType | Amount |\n", "| :--: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | \n", "| AL | 01001 | 2000 | PublicSupply | Surface | Fresh |0.00 |\n", "| AL | 01001 | 2000 | Industrial | Ground | Saline |0.00 |\n", "\n", "----\n", "\n", "#### Workflow:\n", "* Pull the raw data file for a given year, in tab-delimted format, from the USGS server into a pandas dataframe.\n", " * Care must be taken that the FIPS codes are preserved as text, not numbers\n", "* Fix some inconsistencies in nomeclature across years in the USGS data. This includes:\n", " * For year 2000, remapping 'IT', 'LA', 'LS', and 'PE' fields to 'IR', 'AQ', 'LI' and 'PC', respectively([reference]( https://water.usgs.gov/watuse/data/2000/datadict.html)).\n", " * IT -> IR (Irrigated cropland)\n", " * LA -> AQ (Aquaculture)\n", " * LS -> LI (Livestock)\n", " * PE -> PC (Thermoelectric power closed-loop)\n", " * For year 2005, remapping 'LA' and 'LS' fields to 'AQ' and 'LI', respectively([reference]( https://water.usgs.gov/watuse/data/2005/datadict.html)). \n", " * LA -> AQ (Aquaculture)\n", " * LS -> LI (Livestock)\n", " * Adding year columns to 2000 and 2005 datasets.\n", " * Fixing an inconsistency in the total withdrawal fields (Wtotl to WTotal).\n", " * Removing extranneous fields \n", "* Re-arrange data into a tidy format to facilitate additional analyses.\n", "* Append all tables into a single dataframe, with records tagged by the year of the dataset.\n", "* Remove rows with null values\n", "* Save the table to a file for later analysis. \n", "\n", "This table can then easily be summarized and joined, by YEAR and FIPS code, to other accounting data tables and summarized by state. 
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Import modules required for analysis\n", "import os\n", "import pandas as pd\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Create prefix remapping dictionaries to standardize column names for all years to 2010\n", "remapDict = {'IT-':'IR-', #Irrigated (total)\n", " 'LA-':'AQ-', #Aquaculture\n", " 'LS-':'LI-', #Livestock\n", " 'PE-':'PC-', #Closed-loop thermo electric\n", " 'Wtotl':'WTotl' #Capitalization mismatch\n", " }" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Create a function for importing a table and standardizing all field names\n", "def importTable(year, remapDict):\n", " \n", " #Set the download URL using with the year\n", " theURL = 'http://water.usgs.gov/watuse/data/{0}/usco{0}.txt'.format(year) \n", " \n", " #Download the dataset to a data frame (keeping the FIPS attribute as a string)\n", " df = pd.read_table(theURL,dtype={'FIPS':str})\n", " \n", " #Ensure that the FIPS column maintains 5 characters\n", " df['FIPS'] = df['FIPS'].apply(lambda x: str(x).zfill(5))\n", " \n", " #Remove the STATEFIPS and COUNTYFIPS columns (as they are not needed)\n", " df.drop(\"STATEFIPS\",axis=1,inplace=True)\n", " df.drop(\"COUNTYFIPS\",axis=1,inplace=True)\n", " \n", " #Use the remap dictionary to rename columns\n", " \n", " #Get the current column names as a list\n", " colNames = df.columns.values.tolist()\n", " \n", " for inFld,outFld in remapDict.items():\n", " #This loops through each item in colNames and replaces it with a revised one\n", " colNames_update = [x.replace(inFld,outFld) for x in colNames]\n", " colNames = colNames_update\n", "\n", " #Update the column names in the data frame\n", " df.columns = colNames\n", " \n", " #Add year field, if not present\n", " if \"YEAR\" not in df.columns.values: \n", " df.insert(1,\"YEAR\",year)\n", " \n", " #Remove unnamed columns, which oddly appear in 2005 and 2010 datasets\n", " if \"Unnamed\" in df.columns.values[-1]:\n", " df.drop(df.columns.values[-1],axis=1,inplace=True)\n", " \n", " #Status\n", " print \"{} records and {} attributes returned for year {}\".format(df.shape[0],df.shape[1],year)\n", " \n", " #Return the data frame\n", " return df" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3222 records and 69 attributes returned for year 2000\n", "3224 records and 106 attributes returned for year 2005\n", "3224 records and 115 attributes returned for year 2010\n" ] } ], "source": [ "#Get the tables\n", "df2000 = importTable(2000,remapDict)\n", "df2005 = importTable(2005,remapDict)\n", "df2010 = importTable(2010,remapDict)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
STATEYEARFIPSTP-TotPopPS-TOPopPS-WGWFrPS-WSWFrPS-WFrToDO-SSPopDO-WGWFr...PC-WTotlTO-WGWFrTO-WGWSaTO-WGWToTO-WSWFrTO-WSWSaTO-WSWToTO-WFrToTO-WSaToTO-WTotl
0AL20000100143.6739.305.700.005.704.372.95...0.010.650.010.6526.600.026.6037.250.037.25
1AL200001003140.42107.6119.700.0019.7032.812.46...0.027.040.027.047.590.07.5934.630.034.63
2AL20000100529.0422.274.850.004.856.770.51...0.06.360.06.361.060.01.067.420.07.42
3AL20000100720.8317.933.780.003.782.900.22...0.04.000.04.000.000.00.004.000.04.00
4AL20000100951.0245.902.5529.2631.815.120.38...0.02.930.02.9329.330.029.3332.260.032.26
\n", "

5 rows × 69 columns

\n", "
" ], "text/plain": [ " STATE YEAR FIPS TP-TotPop PS-TOPop PS-WGWFr PS-WSWFr PS-WFrTo \\\n", "0 AL 2000 01001 43.67 39.30 5.70 0.00 5.70 \n", "1 AL 2000 01003 140.42 107.61 19.70 0.00 19.70 \n", "2 AL 2000 01005 29.04 22.27 4.85 0.00 4.85 \n", "3 AL 2000 01007 20.83 17.93 3.78 0.00 3.78 \n", "4 AL 2000 01009 51.02 45.90 2.55 29.26 31.81 \n", "\n", " DO-SSPop DO-WGWFr ... PC-WTotl TO-WGWFr TO-WGWSa TO-WGWTo \\\n", "0 4.37 2.95 ... 0.0 10.65 0.0 10.65 \n", "1 32.81 2.46 ... 0.0 27.04 0.0 27.04 \n", "2 6.77 0.51 ... 0.0 6.36 0.0 6.36 \n", "3 2.90 0.22 ... 0.0 4.00 0.0 4.00 \n", "4 5.12 0.38 ... 0.0 2.93 0.0 2.93 \n", "\n", " TO-WSWFr TO-WSWSa TO-WSWTo TO-WFrTo TO-WSaTo TO-WTotl \n", "0 26.60 0.0 26.60 37.25 0.0 37.25 \n", "1 7.59 0.0 7.59 34.63 0.0 34.63 \n", "2 1.06 0.0 1.06 7.42 0.0 7.42 \n", "3 0.00 0.0 0.00 4.00 0.0 4.00 \n", "4 29.33 0.0 29.33 32.26 0.0 32.26 \n", "\n", "[5 rows x 69 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#CHECK: example output with a column for each supply/source/type grouping.\n", "df2000.head()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#\"Melt\" data so each use/class/type becomes a new row\n", "df2000m = pd.melt(df2000,id_vars=['STATE','FIPS','YEAR'],var_name='Class',value_name='Amount')\n", "df2005m = pd.melt(df2005,id_vars=['STATE','FIPS','YEAR'],var_name='Class',value_name='Amount')\n", "df2010m = pd.melt(df2010,id_vars=['STATE','FIPS','YEAR'],var_name='Class',value_name='Amount')" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
STATEFIPSYEARClassAmount
0AL010012000TP-TotPop43.67
1AL010032000TP-TotPop140.42
2AL010052000TP-TotPop29.04
3AL010072000TP-TotPop20.83
4AL010092000TP-TotPop51.02
\n", "
" ], "text/plain": [ " STATE FIPS YEAR Class Amount\n", "0 AL 01001 2000 TP-TotPop 43.67\n", "1 AL 01003 2000 TP-TotPop 140.42\n", "2 AL 01005 2000 TP-TotPop 29.04\n", "3 AL 01007 2000 TP-TotPop 20.83\n", "4 AL 01009 2000 TP-TotPop 51.02" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#CHECK: Observe how the data has been 'melted'\n", "df2000m.head()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Merge all records into a single table\n", "dfUse = pd.concat([df2000m, df2005m, df2010m],ignore_index=True)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(905812, 5)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#CHECK: the dfUse frame should have as many rows as the 2000, 2005, and 2010 data combined\n", "dfUse.shape[0]" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Create remap dictionaries. These enable extraction of specific sub-attriubutes\n", "# contained in the Class field. For example, PS-WGWFr refers to Fresh groundwater\n", "# withdrawn for Public Supply.\n", "useClassMap = {'TP':'Population',\n", " 'PS':'PublicSupply',\n", " 'DO':'Domestic',\n", " 'IN':'Industrial',\n", " 'IR':'Irrigation',\n", " 'IC':'Irrigation_Crop',\n", " 'IG':'Irrigation_Golf',\n", " 'LI':'Livestock',\n", " 'AQ':'Aquaculture',\n", " 'MI':'Mining',\n", " 'PT':'ThermoElec',\n", " 'PO':'ThermoElec_OnceThru',\n", " 'PC':'ThermoElec_Recirc',\n", " 'TO':'Total'\n", " }\n", "\n", "srcClassMap = {'-WGW':'Groundwater',\n", " '-WSW':'Surface'\n", " }\n", "\n", "srcTypeMap = {'Fr':'Fresh',\n", " 'Sa':'Saline'\n", " }" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Set the use classes (PS, DO, LI, etc.)\n", "for code,value in useClassMap.items():\n", " dfUse.loc[dfUse.Class.str.startswith(code), 'UseClass'] = value" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Set the source classes (surface or groundwater)\n", "for code,value in srcClassMap.items():\n", " dfUse.loc[dfUse.Class.str.contains(code), 'SrcClass'] = value" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Set the source types (fresh or saline)\n", "for code,value in srcTypeMap.items():\n", " dfUse.loc[dfUse.Class.str.endswith(code), 'SrcType'] = value" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['COUNTY'], dtype=object)" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#CHECK: List classes with null srcClass remaps\n", "dfUse[pd.isnull(dfUse['UseClass'])].Class.unique()" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['TP-TotPop', 'PS-TOPop', 'PS-WFrTo', 'DO-SSPop', 'DO-WFrTo',\n", " 'IN-WFrTo', 'IN-WSaTo', 'IN-WTotl', 'IR-IrSpr', 'IR-IrMic',\n", " 'IR-IrSur', 'IR-IrTot', 'IR-WFrTo', 'AQ-WFrTo', 'LI-WFrTo',\n", " 'MI-WFrTo', 'MI-WSaTo', 'MI-WTotl', 'PT-WFrTo', 'PT-WSaTo',\n", " 'PT-WTotl', 'PC-WFrTo', 'PC-WSaTo', 'PC-WTotl', 'TO-WFrTo',\n", " 'TO-WSaTo', 'TO-WTotl', 'PS-GWPop', 'PS-SWPop', 'PS-WSaTo',\n", " 'PS-WTotl', 'DO-PSDel', 'DO-TOTAL ', 'IC-WFrTo', 'IC-IrSpr',\n", " 'IC-IrMic', 
'IC-IrSur', 'IC-IrTot', 'IG-WFrTo', 'IG-IrSpr',\n", " 'IG-IrMic', 'IG-IrSur', 'IG-IrTot', 'PT-Power', 'PO-WFrTo',\n", " 'PO-WSaTo', 'PO-WTotl', 'PO-Power', 'PC-Power', 'COUNTY',\n", " 'DO-SSPCp', 'DO-PSPCp', 'AQ-WSaTo', 'AQ-WTotl'], dtype=object)" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#CHECK: List classes with null SrcClass remaps\n", "dfUse[pd.isnull(dfUse['SrcClass'])].Class.unique()" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['TP-TotPop', 'PS-TOPop', 'PS-WFrTo', 'DO-SSPop', 'DO-WFrTo',\n", " 'IN-WGWTo', 'IN-WSWTo', 'IN-WFrTo', 'IN-WSaTo', 'IN-WTotl',\n", " 'IR-IrSpr', 'IR-IrMic', 'IR-IrSur', 'IR-IrTot', 'IR-WFrTo',\n", " 'AQ-WFrTo', 'LI-WFrTo', 'MI-WGWTo', 'MI-WSWTo', 'MI-WFrTo',\n", " 'MI-WSaTo', 'MI-WTotl', 'PT-WSWTo', 'PT-WFrTo', 'PT-WSaTo',\n", " 'PT-WTotl', 'PO-WSWTo', 'PC-WSWTo', 'PC-WFrTo', 'PC-WSaTo',\n", " 'PC-WTotl', 'TO-WGWTo', 'TO-WSWTo', 'TO-WFrTo', 'TO-WSaTo',\n", " 'TO-WTotl', 'PS-GWPop', 'PS-SWPop', 'PS-WGWTo', 'PS-WSWTo',\n", " 'PS-WSaTo', 'PS-WTotl', 'DO-PSDel', 'DO-TOTAL ', 'IC-WFrTo',\n", " 'IC-IrSpr', 'IC-IrMic', 'IC-IrSur', 'IC-IrTot', 'IG-WFrTo',\n", " 'IG-IrSpr', 'IG-IrMic', 'IG-IrSur', 'IG-IrTot', 'PT-WGWTo',\n", " 'PT-Power', 'PO-WGWTo', 'PO-WFrTo', 'PO-WSaTo', 'PO-WTotl',\n", " 'PO-Power', 'PC-WGWTo', 'PC-Power', 'COUNTY', 'DO-SSPCp',\n", " 'DO-PSPCp', 'AQ-WGWTo', 'AQ-WSWTo', 'AQ-WSaTo', 'AQ-WTotl'], dtype=object)" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#CHECK: List classes with null SrcType remaps\n", "dfUse[pd.isnull(dfUse['SrcType'])].Class.unique()" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Remove rows (axis=0) that have a null value in any column (how='any')\n", "dfOutput = dfUse.dropna(axis=0,how='any')" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
STATEFIPSYEARClassAmountUseClassSrcClassSrcType
6444AL010012000PS-WGWFr5.7PublicSupplyGroundwaterFresh
6445AL010032000PS-WGWFr19.7PublicSupplyGroundwaterFresh
6446AL010052000PS-WGWFr4.85PublicSupplyGroundwaterFresh
6447AL010072000PS-WGWFr3.78PublicSupplyGroundwaterFresh
6448AL010092000PS-WGWFr2.55PublicSupplyGroundwaterFresh
\n", "
" ], "text/plain": [ " STATE FIPS YEAR Class Amount UseClass SrcClass SrcType\n", "6444 AL 01001 2000 PS-WGWFr 5.7 PublicSupply Groundwater Fresh\n", "6445 AL 01003 2000 PS-WGWFr 19.7 PublicSupply Groundwater Fresh\n", "6446 AL 01005 2000 PS-WGWFr 4.85 PublicSupply Groundwater Fresh\n", "6447 AL 01007 2000 PS-WGWFr 3.78 PublicSupply Groundwater Fresh\n", "6448 AL 01009 2000 PS-WGWFr 2.55 PublicSupply Groundwater Fresh" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#CHECK: see that new columns were added to decode the Class\n", "dfOutput.head()" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Set the output location and filename\n", "dataDir = '../../Data'\n", "outFN = 'UsageDataTidy.csv'\n", "dfOutput.to_csv(dataDir + os.sep + outFN,index=False)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.13" } }, "nbformat": 4, "nbformat_minor": 2 }