{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Retrieve State Usage Data\n",
    "\n",
    "This notebook downloads water use data for a user specified state and year from the National Water Information System server, and translates these data into physical supply and use table (PSUT), using a preformatted PSUT template and value mapping table. \n",
    "\n",
    "Original data are from the USGS state water programs. Below is a sample URL for Louisiana:\n",
    "`https://waterdata.usgs.gov/la/nwis/water_use?format=rdb&rdb_compression=value&wu_area=County&wu_year=2000%2C2005%2C2010&wu_county=ALL&wu_category=ALL&wu_county_nms=--ALL%2BCounties--&wu_category_nms=--ALL%2BCategories--`\n",
    "\n",
    "##### Workflow:\n",
    "* Get the state and year from the user\n",
    "* Get the filenames for the PSUT template and value mapping table\n",
    "* Construct the url and retrieve the data into a pandas data frame\n",
    "* Melt/gather the usage columns into row values under the column name 'Group'\n",
    "* Remove rows with no usage data (identified by not having \"Mgal\" in the 'Group' name)\n",
    "\n",
    "#### Required 3rd party modules\n",
    "The following may require installation before running this notebook. These can each be installed using `pip`\n",
    "* Pandas`  ------>pip install pandas`\n",
    "* OpenPyXl`  ---->pip install openpyxl`\n",
    "---\n",
    "`Created September 2017`<br>`John.Fay@duke.edu`\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Begin by specifying the 2-character state code and the year to process"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#Specify the state and year to process\n",
    "state = 'la' #Louisiana\n",
    "year = 2010"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Next, specify the path to the template PSUT and the value mapping filenames"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#Get the input filenames\n",
    "templateFilename = '../Data/Templates/StatePSUT_Template.xlsx'\n",
    "remapFilename = '../Data/RemapTables/StatePSUTLookup.csv'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The rest is automated..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### First, we need to import required packages, installing 3rd party packages if required"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#Import built-in modules\n",
    "import sys, os, urllib\n",
    "from shutil import copyfile"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#Import pandas, install if necessary...\n",
    "try:\n",
    "    import pandas as pd\n",
    "except:\n",
    "    import pip\n",
    "    pip.main(['install','pandas'])\n",
    "    import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#Import openpyxl, install if necessary...\n",
    "try:\n",
    "    from openpyxl import load_workbook\n",
    "except:\n",
    "    import pip\n",
    "    pip.main(['install','openpyxl'])\n",
    "    from openpyxl import load_workbook"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### Output data will be named with the state code and year (e.g. `la_2010.csv`) and will be saved in the StateData subfolder of the Data directory.\n",
    "This folder will be created, if it does not exist already. And then the PSUT template will be copied to this location using the output filename. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#Create the StateData folder, if not already present\n",
    "outFolder = '..\\\\Data\\\\StateData'\n",
    "if os.path.exists(outFolder) == False:\n",
    "    os.mkdir(outFolder)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Output will be written to ..\\Data\\StateData\\la_2010.xlsx\n"
     ]
    }
   ],
   "source": [
    "#Set the output filename, putting it in the output data folder\n",
    "outFilename = outFolder + os.sep + '{0}_{1}.xlsx'.format(state,year)\n",
    "print(\"Output will be written to \" + outFilename)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#Copy the template to the output filename\n",
    "copyfile(src=templateFilename,dst=outFilename)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### Load in the value mapping table as a data frame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#Get the remap table and load as a dataframe\n",
    "dfRemap = pd.read_csv(remapFilename,dtype='str',index_col=\"Index\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### Construct the URL for data retreival and then pull the on-line data to an in-memory data frame named `dfRaw`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#Set the data URL path and parameters and construct the url\n",
    "path = 'https://waterdata.usgs.gov/{}/nwis/water_use?'.format(state)\n",
    "values = {'format':'rdb',\n",
    "         'rdb_compression':'value',\n",
    "         'wu_area':'County',\n",
    "         'wu_year': year,\n",
    "         'wu_county':'ALL',\n",
    "         'wu_county_nms':'--ALL+Counties--',\n",
    "         'wu_category_nms':'--ALL+Categories--'\n",
    "        }\n",
    "url = path + urllib.urlencode(values)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#Pull data in using the URL and remove the 2nd row of headers\n",
    "dfRaw = pd.read_table(url,comment='#',header=[0,1],na_values='-')\n",
    "dfRaw.columns = dfRaw.columns.droplevel(level=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>state_cd</th>\n",
       "      <th>state_name</th>\n",
       "      <th>county_cd</th>\n",
       "      <th>county_nm</th>\n",
       "      <th>year</th>\n",
       "      <th>Total Population total population of area, in thousands</th>\n",
       "      <th>Public Supply population served by groundwater, in thousands</th>\n",
       "      <th>Public Supply population served by surface water, in thousands</th>\n",
       "      <th>Public Supply total population served, in thousands</th>\n",
       "      <th>Public Supply self-supplied groundwater withdrawals, fresh, in Mgal/d</th>\n",
       "      <th>...</th>\n",
       "      <th>Hydroelectric Power total offstream surface-water withdrawals in Mgal/d</th>\n",
       "      <th>Hydroelectric Power power generated by instream use, in gigawatt-hours</th>\n",
       "      <th>Hydroelectric Power power generated by offstream use, in gigawatt-hours</th>\n",
       "      <th>Hydroelectric Power total power generated, in gigawatt-hours</th>\n",
       "      <th>Hydroelectric Power number of instream facilities</th>\n",
       "      <th>Hydroelectric Power number of offstream facilities</th>\n",
       "      <th>Hydroelectric Power total number of facilities</th>\n",
       "      <th>Wastewater Treatment returns by public wastewater facilities, in Mgal/d</th>\n",
       "      <th>Wastewater Treatment number of public wastewater facilities</th>\n",
       "      <th>Wastewater Treatment reclaimed wastewater released by wastewater facilities, in Mgal/d</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>22</td>\n",
       "      <td>Louisiana</td>\n",
       "      <td>1</td>\n",
       "      <td>Acadia Parish</td>\n",
       "      <td>2010</td>\n",
       "      <td>61.773</td>\n",
       "      <td>44.666</td>\n",
       "      <td>0.000</td>\n",
       "      <td>44.666</td>\n",
       "      <td>5.82</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>22</td>\n",
       "      <td>Louisiana</td>\n",
       "      <td>3</td>\n",
       "      <td>Allen Parish</td>\n",
       "      <td>2010</td>\n",
       "      <td>25.764</td>\n",
       "      <td>22.573</td>\n",
       "      <td>0.000</td>\n",
       "      <td>22.573</td>\n",
       "      <td>4.27</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>22</td>\n",
       "      <td>Louisiana</td>\n",
       "      <td>5</td>\n",
       "      <td>Ascension Parish</td>\n",
       "      <td>2010</td>\n",
       "      <td>107.215</td>\n",
       "      <td>29.510</td>\n",
       "      <td>27.094</td>\n",
       "      <td>56.604</td>\n",
       "      <td>3.02</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>22</td>\n",
       "      <td>Louisiana</td>\n",
       "      <td>7</td>\n",
       "      <td>Assumption Parish</td>\n",
       "      <td>2010</td>\n",
       "      <td>23.421</td>\n",
       "      <td>0.000</td>\n",
       "      <td>23.041</td>\n",
       "      <td>23.041</td>\n",
       "      <td>0.00</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>22</td>\n",
       "      <td>Louisiana</td>\n",
       "      <td>9</td>\n",
       "      <td>Avoyelles Parish</td>\n",
       "      <td>2010</td>\n",
       "      <td>42.073</td>\n",
       "      <td>39.756</td>\n",
       "      <td>0.000</td>\n",
       "      <td>39.756</td>\n",
       "      <td>3.85</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 281 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   state_cd state_name  county_cd          county_nm  year  \\\n",
       "0        22  Louisiana          1      Acadia Parish  2010   \n",
       "1        22  Louisiana          3       Allen Parish  2010   \n",
       "2        22  Louisiana          5   Ascension Parish  2010   \n",
       "3        22  Louisiana          7  Assumption Parish  2010   \n",
       "4        22  Louisiana          9   Avoyelles Parish  2010   \n",
       "\n",
       "   Total Population total population of area, in thousands  \\\n",
       "0                                             61.773         \n",
       "1                                             25.764         \n",
       "2                                            107.215         \n",
       "3                                             23.421         \n",
       "4                                             42.073         \n",
       "\n",
       "   Public Supply population served by groundwater, in thousands  \\\n",
       "0                                             44.666              \n",
       "1                                             22.573              \n",
       "2                                             29.510              \n",
       "3                                              0.000              \n",
       "4                                             39.756              \n",
       "\n",
       "   Public Supply population served by surface water, in thousands  \\\n",
       "0                                              0.000                \n",
       "1                                              0.000                \n",
       "2                                             27.094                \n",
       "3                                             23.041                \n",
       "4                                              0.000                \n",
       "\n",
       "   Public Supply total population served, in thousands  \\\n",
       "0                                             44.666     \n",
       "1                                             22.573     \n",
       "2                                             56.604     \n",
       "3                                             23.041     \n",
       "4                                             39.756     \n",
       "\n",
       "   Public Supply self-supplied groundwater withdrawals, fresh, in Mgal/d  \\\n",
       "0                                               5.82                       \n",
       "1                                               4.27                       \n",
       "2                                               3.02                       \n",
       "3                                               0.00                       \n",
       "4                                               3.85                       \n",
       "\n",
       "                                            ...                                            \\\n",
       "0                                           ...                                             \n",
       "1                                           ...                                             \n",
       "2                                           ...                                             \n",
       "3                                           ...                                             \n",
       "4                                           ...                                             \n",
       "\n",
       "   Hydroelectric Power total offstream surface-water withdrawals in Mgal/d  \\\n",
       "0                                                NaN                         \n",
       "1                                                NaN                         \n",
       "2                                                NaN                         \n",
       "3                                                NaN                         \n",
       "4                                                NaN                         \n",
       "\n",
       "   Hydroelectric Power power generated by instream use, in gigawatt-hours  \\\n",
       "0                                                NaN                        \n",
       "1                                                NaN                        \n",
       "2                                                NaN                        \n",
       "3                                                NaN                        \n",
       "4                                                NaN                        \n",
       "\n",
       "   Hydroelectric Power power generated by offstream use, in gigawatt-hours  \\\n",
       "0                                                NaN                         \n",
       "1                                                NaN                         \n",
       "2                                                NaN                         \n",
       "3                                                NaN                         \n",
       "4                                                NaN                         \n",
       "\n",
       "   Hydroelectric Power total power generated, in gigawatt-hours  \\\n",
       "0                                                NaN              \n",
       "1                                                NaN              \n",
       "2                                                NaN              \n",
       "3                                                NaN              \n",
       "4                                                NaN              \n",
       "\n",
       "   Hydroelectric Power number of instream facilities  \\\n",
       "0                                                NaN   \n",
       "1                                                NaN   \n",
       "2                                                NaN   \n",
       "3                                                NaN   \n",
       "4                                                NaN   \n",
       "\n",
       "   Hydroelectric Power number of offstream facilities  \\\n",
       "0                                                NaN    \n",
       "1                                                NaN    \n",
       "2                                                NaN    \n",
       "3                                                NaN    \n",
       "4                                                NaN    \n",
       "\n",
       "   Hydroelectric Power total number of facilities  \\\n",
       "0                                             NaN   \n",
       "1                                             NaN   \n",
       "2                                             NaN   \n",
       "3                                             NaN   \n",
       "4                                             NaN   \n",
       "\n",
       "   Wastewater Treatment returns by public wastewater facilities, in Mgal/d  \\\n",
       "0                                                NaN                         \n",
       "1                                                NaN                         \n",
       "2                                                NaN                         \n",
       "3                                                NaN                         \n",
       "4                                                NaN                         \n",
       "\n",
       "   Wastewater Treatment number of public wastewater facilities  \\\n",
       "0                                                NaN             \n",
       "1                                                NaN             \n",
       "2                                                NaN             \n",
       "3                                                NaN             \n",
       "4                                                NaN             \n",
       "\n",
       "   Wastewater Treatment reclaimed wastewater released by wastewater facilities, in Mgal/d  \n",
       "0                                                NaN                                       \n",
       "1                                                NaN                                       \n",
       "2                                                NaN                                       \n",
       "3                                                NaN                                       \n",
       "4                                                NaN                                       \n",
       "\n",
       "[5 rows x 281 columns]"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#CHECK: Display a sample of the retrieved data\n",
    "dfRaw.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### With the data now held locally, we reformat the table so that each of the usage (and other) columns are \"[melted](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.melt.html)\" into a single column such that the original column name is stored in a new field and the value is stored in another. This facilitates subsequent analysis, which includes:\n",
    " * Removing rows (formerly columns) that don't report volume, e.g. population served values, and\n",
    " * Ensuring that volume data is a floating point number."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Data transformed from (64, 281) rows/columns to (17664, 7) rows/columns\n"
     ]
    }
   ],
   "source": [
    "#Tidy the data: transform so data in each usage column become row values with a new column listing the usage type\n",
    "rowHeadings = ['county_cd', 'county_nm', 'state_cd', 'state_name', 'year']\n",
    "dfTidy = pd.melt(dfRaw,id_vars=rowHeadings,value_name='MGal',var_name='Group')\n",
    "print(\"Data transformed from {0} rows/columns to {1} rows/columns\".format(dfRaw.shape, dfTidy.shape))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(15040, 7)"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#Remove rows that don't have volume data (i.e. keep only columns with 'Mgal' in the name)\n",
    "dfTidy = dfTidy[dfTidy['Group'].str.contains('Mgal')].copy(deep=True)\n",
    "dfTidy.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#Change the type of the MGal column to float \n",
    "dfTidy['MGal'] = dfTidy.MGal.astype(\"float\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>county_cd</th>\n",
       "      <th>county_nm</th>\n",
       "      <th>state_cd</th>\n",
       "      <th>state_name</th>\n",
       "      <th>year</th>\n",
       "      <th>Group</th>\n",
       "      <th>MGal</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>256</th>\n",
       "      <td>1</td>\n",
       "      <td>Acadia Parish</td>\n",
       "      <td>22</td>\n",
       "      <td>Louisiana</td>\n",
       "      <td>2010</td>\n",
       "      <td>Public Supply self-supplied groundwater withdr...</td>\n",
       "      <td>5.82</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>257</th>\n",
       "      <td>3</td>\n",
       "      <td>Allen Parish</td>\n",
       "      <td>22</td>\n",
       "      <td>Louisiana</td>\n",
       "      <td>2010</td>\n",
       "      <td>Public Supply self-supplied groundwater withdr...</td>\n",
       "      <td>4.27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>258</th>\n",
       "      <td>5</td>\n",
       "      <td>Ascension Parish</td>\n",
       "      <td>22</td>\n",
       "      <td>Louisiana</td>\n",
       "      <td>2010</td>\n",
       "      <td>Public Supply self-supplied groundwater withdr...</td>\n",
       "      <td>3.02</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>259</th>\n",
       "      <td>7</td>\n",
       "      <td>Assumption Parish</td>\n",
       "      <td>22</td>\n",
       "      <td>Louisiana</td>\n",
       "      <td>2010</td>\n",
       "      <td>Public Supply self-supplied groundwater withdr...</td>\n",
       "      <td>0.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>260</th>\n",
       "      <td>9</td>\n",
       "      <td>Avoyelles Parish</td>\n",
       "      <td>22</td>\n",
       "      <td>Louisiana</td>\n",
       "      <td>2010</td>\n",
       "      <td>Public Supply self-supplied groundwater withdr...</td>\n",
       "      <td>3.85</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     county_cd          county_nm  state_cd state_name  year  \\\n",
       "256          1      Acadia Parish        22  Louisiana  2010   \n",
       "257          3       Allen Parish        22  Louisiana  2010   \n",
       "258          5   Ascension Parish        22  Louisiana  2010   \n",
       "259          7  Assumption Parish        22  Louisiana  2010   \n",
       "260          9   Avoyelles Parish        22  Louisiana  2010   \n",
       "\n",
       "                                                 Group  MGal  \n",
       "256  Public Supply self-supplied groundwater withdr...  5.82  \n",
       "257  Public Supply self-supplied groundwater withdr...  4.27  \n",
       "258  Public Supply self-supplied groundwater withdr...  3.02  \n",
       "259  Public Supply self-supplied groundwater withdr...  0.00  \n",
       "260  Public Supply self-supplied groundwater withdr...  3.85  "
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#CHECK: Show the structure of the 'tidied' data frame\n",
    "dfTidy.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### Next, we join the value mapping table to the tidied data frame, thus linking the values (in MGal/year) with the address of the PSUT cell in which they should be ported. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>county_cd</th>\n",
       "      <th>county_nm</th>\n",
       "      <th>state_cd</th>\n",
       "      <th>state_name</th>\n",
       "      <th>year</th>\n",
       "      <th>Group</th>\n",
       "      <th>MGal</th>\n",
       "      <th>Column1</th>\n",
       "      <th>Row1</th>\n",
       "      <th>Column2</th>\n",
       "      <th>Row2</th>\n",
       "      <th>Column3</th>\n",
       "      <th>Row3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>Acadia Parish</td>\n",
       "      <td>22</td>\n",
       "      <td>Louisiana</td>\n",
       "      <td>2010</td>\n",
       "      <td>Public Supply self-supplied groundwater withdr...</td>\n",
       "      <td>5.82</td>\n",
       "      <td>N</td>\n",
       "      <td>31</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>3</td>\n",
       "      <td>Allen Parish</td>\n",
       "      <td>22</td>\n",
       "      <td>Louisiana</td>\n",
       "      <td>2010</td>\n",
       "      <td>Public Supply self-supplied groundwater withdr...</td>\n",
       "      <td>4.27</td>\n",
       "      <td>N</td>\n",
       "      <td>31</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>5</td>\n",
       "      <td>Ascension Parish</td>\n",
       "      <td>22</td>\n",
       "      <td>Louisiana</td>\n",
       "      <td>2010</td>\n",
       "      <td>Public Supply self-supplied groundwater withdr...</td>\n",
       "      <td>3.02</td>\n",
       "      <td>N</td>\n",
       "      <td>31</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>7</td>\n",
       "      <td>Assumption Parish</td>\n",
       "      <td>22</td>\n",
       "      <td>Louisiana</td>\n",
       "      <td>2010</td>\n",
       "      <td>Public Supply self-supplied groundwater withdr...</td>\n",
       "      <td>0.00</td>\n",
       "      <td>N</td>\n",
       "      <td>31</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>9</td>\n",
       "      <td>Avoyelles Parish</td>\n",
       "      <td>22</td>\n",
       "      <td>Louisiana</td>\n",
       "      <td>2010</td>\n",
       "      <td>Public Supply self-supplied groundwater withdr...</td>\n",
       "      <td>3.85</td>\n",
       "      <td>N</td>\n",
       "      <td>31</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   county_cd          county_nm  state_cd state_name  year  \\\n",
       "0          1      Acadia Parish        22  Louisiana  2010   \n",
       "1          3       Allen Parish        22  Louisiana  2010   \n",
       "2          5   Ascension Parish        22  Louisiana  2010   \n",
       "3          7  Assumption Parish        22  Louisiana  2010   \n",
       "4          9   Avoyelles Parish        22  Louisiana  2010   \n",
       "\n",
       "                                               Group  MGal Column1 Row1  \\\n",
       "0  Public Supply self-supplied groundwater withdr...  5.82       N   31   \n",
       "1  Public Supply self-supplied groundwater withdr...  4.27       N   31   \n",
       "2  Public Supply self-supplied groundwater withdr...  3.02       N   31   \n",
       "3  Public Supply self-supplied groundwater withdr...  0.00       N   31   \n",
       "4  Public Supply self-supplied groundwater withdr...  3.85       N   31   \n",
       "\n",
       "  Column2 Row2 Column3 Row3  \n",
       "0     NaN  NaN     NaN  NaN  \n",
       "1     NaN  NaN     NaN  NaN  \n",
       "2     NaN  NaN     NaN  NaN  \n",
       "3     NaN  NaN     NaN  NaN  \n",
       "4     NaN  NaN     NaN  NaN  "
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#Join the remap table\n",
    "dfAll = pd.merge(dfTidy,dfRemap,how='inner',left_on=\"Group\",right_on=\"Group\")\n",
    "dfAll.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### With the tables joined, we can open the output worksheet and start porting over values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "os.path.exists(outFilename)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Open the spreadsheet template\n",
    "wb = load_workbook(filename=outFilename)\n",
    "ws = wb['template']\n",
    "ws.title = str(year)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### Notes on mapping values in the dataframe to PSUT cells\n",
    "* As some water usage categories are inserted into more than one cell - some in as many as three - in the PSUT, the value mapping file contains three sets of cell coordinates (i.e. column/row pairs). Thus, we repeat the process of reading the dataframe and mapping values to the PSUT three times. \n",
    "\n",
    "* In cases where two or more usage categories map to the same PSUT cell, the values are summed. This is done in the `dfAll.groupby` statement. \n",
    "\n",
    "* Where no data exist for a particular usage category, the value `n/a` is inserted. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#Loop through the first set of row/columns and insert values into the Excel spreadsheet\n",
    "dfRound1 = dfAll.groupby(['Column1','Row1'])['MGal'].sum()\n",
    "dfRound1.fillna(value=\"n/a\",inplace=True)\n",
    "for (row,column), value in dfRound1.iteritems():\n",
    "    #Set the value in the workbook\n",
    "    rv = str(row)+ str(column)\n",
    "    ws[rv] = value\n",
    "#Save the workbook\n",
    "wb.save(outFilename)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#Loop through the second set of row/columns and insert values into the Excel spreadsheet\n",
    "dfRound2 = dfAll.groupby(['Column2','Row2'])['MGal'].sum()\n",
    "dfRound2.fillna(value=\"n/a\",inplace=True)\n",
    "for (row,column), value in dfRound2.iteritems():\n",
    "    #Set the value in the workbook\n",
    "    rv = str(row)+ str(column)\n",
    "    ws[rv] = value\n",
    "#Save the workbook\n",
    "wb.save(outFilename)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#Loop through the third set of row/columns and insert values into the Excel spreadsheet\n",
    "dfRound3 = dfAll.groupby(['Column3','Row3'])['MGal'].sum()\n",
    "dfRound3.fillna(value=\"n/a\",inplace=True)\n",
    "for (row,column), value in dfRound3.iteritems():\n",
    "    #print row, column, value\n",
    "    #Set the value in the workbook\n",
    "    rv = str(row)+ str(column)\n",
    "    ws[rv] = value\n",
    "#Save the workbook\n",
    "wb.save(outFilename)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### Open the notebook"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "os.startfile(outFilename)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}