{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Import water supply data & create supply table\n",
    "Here we download the raw supply data for years 2000, 2005, and 2010 from from downscaled CMIP5 hydrology projections ([link](http://gdo-dcp.ucllnl.org/downscaled_cmip_projections/techmemo/BCSD5HydrologyMemo.pdf)).\n",
    "\n",
    "These data include monthly estimates of runoff, precipitation, evapotranspiration, and soil moisture content at a 1/8th degree spatial resolution across the US for the period of 1950 to 2099. Estimates are provided for 21 different climate projection ensembles applied to the Variable Infiltration Capacity (VIC) Macroscale Hydrologic Model ([link](http://vic.readthedocs.io/en/master/)); see the PDF document for a complete list. For demonstration purposes, this project uses the National Center for Atmospheric Research CCSM4 2.6 projection ensembles as the base data for water supply figures. \n",
    " \n",
    "The steps involved include:\n",
    "\n",
    "* Download monthly runoff (total_runoff), precipitation (pr), evapotranspiration (et), and soil moisture content (smc) data, in NetCDF format, from a central data repository ([link](ftp://gdo-dcp.ucllnl.org/pub/dcp/archive/cmip5/hydro/BCSD_mon_VIC_nc/ccsm4_rcp26_r1i1p1/)) for a given sample year (2000, 2005, and 2010).\n",
    "\n",
    "* For each year and parameter combination:\n",
    "\n",
    "    * Extract the monthly data from the downloaded NetCDF files into 4-dimensional NumPy arrays (time, parameter value, latitude, longitude).\n",
    "\n",
    "    * Collapse the time dimension (months) into annual sums, resulting in a 3-dimensional array for each parameter, i.e. a single annual value for each 1/8th degree coordinate pair: rows = latitudes, columns = longitudes.\n",
    "\n",
    "    * Re-lable columns as longitude values and insert a column of latitude values. Then melt the table into a listing of lat, long, and value. \n",
    "\n",
    "    * Combines these 3-dimensional arrays, one for each parameter, into a single data frame listing parameter value, latitude, and longitude. \n",
    "\n",
    "    * Spatially join state FIPS codes to the data frame, using a county shapefile stored in the data folder. \n",
    "\n",
    "* Summarize supply values on FIPS to create a table that can be joined to other county level data:\n",
    "\n",
    "| YEAR | FIPS | Precip | ET | Runoff | SoilMoisture | TotalSupply | \n",
    "| :---: | :---: | :---: | :---: | :---: | :---: | :---: |\n",
    "| 2000 | 01001 | 0 | 0 | 0 | 0 | 0 |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#Import libraries\n",
    "import sys, os, glob, time, datetime, urllib\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import netCDF4"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#Set filename locations\n",
    "countyFN = '../../Data/cb_2016_us_county_5m.shp'\n",
    "tidyFN = '../../Data/SupplyTableTidy.csv'\n",
    "outputFN = '../../Data/SupplyTable.csv'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#Function for geocoding FIPS using geopandas\n",
    "def addFIPS(df,countyFN):\n",
    "    '''This function uses geopandas to convert lat/lon columns into a point object and then\n",
    "       spatially join this point object with a shapefile of counties to extract the FIPS code\n",
    "       for each location in the dataframe.\n",
    "    '''\n",
    "    #Import modules; NOTE this function requires geopandas and shapely\n",
    "    import geopandas as gpd\n",
    "    from geopandas import GeoDataFrame, read_file\n",
    "    from geopandas.tools import sjoin\n",
    "    from shapely.geometry import Point, mapping, shape\n",
    "    \n",
    "    #Add a geometry field to the data frame, setting the value as a new Shapely point object\n",
    "    df['geometry'] = df.apply(lambda z: Point(z.LON, z.LAT), axis=1)\n",
    "    \n",
    "    #Create a geopandas dataframe from the dataframe created above\n",
    "    gdfPoints = gpd.GeoDataFrame(df)\n",
    "    \n",
    "    #Create a geopandas dataframe from the counties file\n",
    "    gdfPolygons = gpd.GeoDataFrame.from_file(countyFN)\n",
    "    \n",
    "    #Set the coordinate system of the points equal to the polygons\n",
    "    gdfPoints.crs = gdfPolygons.crs\n",
    "    \n",
    "    #Execute the spatial join of polygon attributes to the point objects\n",
    "    dfMerged=sjoin(gdfPoints, gdfPolygons, how='left', op='within')\n",
    "    \n",
    "    #Compute total area from the area of land and water\n",
    "    dfMerged['Area'] = dfMerged.ALAND + dfMerged.AWATER\n",
    "    \n",
    "    #Drop unneeded columns\n",
    "    dfMerged.drop(['geometry','index_right','ALAND','AWATER','AFFGEOID','COUNTYFP','COUNTYNS','LSAD'],axis=1,inplace=True)\n",
    "    \n",
    "    #Drop rows with no data (usually falling outside the US)\n",
    "    dfMerged.dropna(inplace=True)\n",
    "    \n",
    "    #Return the merged dataframe\n",
    "    return dfMerged\n",
    "    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#Function for pulling in netCDF4 format and converting to a dataframe\n",
    "def getSupplyData(year):\n",
    "    '''This function pulls the runoff, precipitation, evaptotranpiration, and \n",
    "       soil moisture content data from the CMIP5 data ftp server as individual \n",
    "       NetCDF4 (nc) files. Each nc file stores monthly values (n=12) across a\n",
    "       1/8th degree geographic grid (463 x 222). Monthly values are summed to \n",
    "       create a data frame ('dfParam') where columns represent longitude, rows \n",
    "       represent latitude, and the value is the parameter (runoff, precip, etc) \n",
    "       in mm/year. This data frame, in turn, is melted to generate a new data \n",
    "       frame ('df') listing lat/long pairs and the parameter value associated \n",
    "       at that location. \n",
    "       \n",
    "       After each parameter data frame is created, they are joined together\n",
    "       to create a listing of lat/long pairs, followed by mm/year of runoff, \n",
    "       precipitation, evapotranspiration, and soil moisutre content, respectively\n",
    "       for the year submitted when calling the function.\n",
    "    '''\n",
    "    \n",
    "    #Get URLs for NCAR 2.6 scenario ensembles: runoff(ro), precipitation(pr), evapotranspiration(et), soil moisture (sm)\n",
    "    baseURL = 'ftp://gdo-dcp.ucllnl.org/pub/dcp/archive/cmip5/hydro/BCSD_mon_VIC_nc/ccsm4_rcp26_r1i1p1/'\n",
    "    baseURL2 = 'ftp://gdo-dcp.ucllnl.org/pub/dcp/archive/cmip5/hydro/BCSD_mon_forc_nc/ccsm4_rcp26_r1i1p1/' #(for precip)\n",
    "    roURL = baseURL + \"conus_c5.ccsm4_rcp26_r1i1p1.monthly.total_runoff.{}.nc\".format(year)\n",
    "    prURL = baseURL2 + \"conus_c5.ccsm4_rcp26_r1i1p1.monthly.pr.{}.nc\".format(year)\n",
    "    etURL = baseURL + \"conus_c5.ccsm4_rcp26_r1i1p1.monthly.et.{}.nc\".format(year)\n",
    "    smURL = baseURL + \"conus_c5.ccsm4_rcp26_r1i1p1.monthly.smc.{}.nc\".format(year)\n",
    "    \n",
    "    #Save as a dictionary (for clearer scripting)\n",
    "    paramDict = {'runoff': roURL, 'precip': prURL, 'et':etURL, 'soil':smURL}\n",
    "\n",
    "    #These lines fix an issue with slow network connections\n",
    "    import socket\n",
    "    socket.setdefaulttimeout(30)\n",
    "\n",
    "    #Loop through each file and create an annual sum array; add it to a dictionary\n",
    "    for param, url in paramDict.items():\n",
    "        print \"->Downloading {} data for year {}\".format(param, year),\n",
    "        \n",
    "        #Retrieve the data file from the ftp server\n",
    "        urllib.urlretrieve(url,\"tmpData.nc\")\n",
    "        \n",
    "        #Convert to netCDF object\n",
    "        nc = netCDF4.Dataset(\"tmpData.nc\",mode='r')\n",
    "        \n",
    "        #Get the parameter name and its values\n",
    "        param_name = nc.variables.keys()[-1]\n",
    "        param_vals = nc.variables.values()[-1]\n",
    "        \n",
    "        #Collapse the monthly values into a single 3d data frame\n",
    "        dfParam = pd.DataFrame(param_vals[:,:,:].sum(axis=0))\n",
    "        \n",
    "        #Create latitude and longitude arrays (for the first element only)\n",
    "        if url == roURL:\n",
    "            dfLats = pd.DataFrame(nc.variables['latitude'][:])\n",
    "            dfLons = pd.DataFrame(nc.variables['longitude'][:])\n",
    "            \n",
    "        #Close the nc object\n",
    "        nc.close()\n",
    "        \n",
    "        #Delete the nc file\n",
    "        os.remove(\"tmpData.nc\")\n",
    "        \n",
    "        #Update\n",
    "        urllib.urlcleanup()\n",
    "        print \"  ...**complete!**\"\n",
    "\n",
    "        #Melt the data into a 3 column, 2d data frame\n",
    "        '''At this point, the dfParam data frame contains columns for each 1/8d longitude\n",
    "           and rows for each 1/8d of latitude. The 'melt' procedure below collapses this into\n",
    "           a three column table of lat,lon,value for the current parameter (e.g. runoff)\n",
    "        '''\n",
    "        dfParam.columns = dfLons[0].values.tolist() #Set column names to longitudes\n",
    "        dfParam['LAT'] = dfLats[0].values.tolist()  #Add column of longitudes\n",
    "        df = pd.melt(dfParam,id_vars=['LAT'],var_name='LON',value_name=param_name)\n",
    "\n",
    "        #Append to dataframe, if not the first element\n",
    "        '''Each attribute will become its own column in the final dataframe. So after assembling\n",
    "           the first dataframe (runoff), we can just append the others to it. We also join the FIPS\n",
    "           codes (created in a previous script) to the runoff dataframe so we can summarize by states\n",
    "           or counties later. \n",
    "        '''\n",
    "        if param == 'runoff': #If its the first in the series of runoff, precip, et, soil moisture...\n",
    "            #Copy to the year dataframe\n",
    "            dfYear = df.copy(deep=True)\n",
    "        else:\n",
    "            #Add column to the year data frame\n",
    "            dfYear[param_name] = df[param_name]\n",
    "            \n",
    "    #Add year to the master data frame\n",
    "    dfYear.insert(1,'YEAR',year)\n",
    "    \n",
    "    #Return the dataframe\n",
    "    return dfYear"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "->Downloading runoff data for year 2000   ...**complete!**\n",
      "->Downloading et data for year 2000   ...**complete!**\n",
      "->Downloading precip data for year 2000   ...**complete!**\n",
      "->Downloading soil data for year 2000   ...**complete!**\n",
      "->Downloading runoff data for year 2005   ...**complete!**\n",
      "->Downloading et data for year 2005   ...**complete!**\n",
      "->Downloading precip data for year 2005   ...**complete!**\n",
      "->Downloading soil data for year 2005   ...**complete!**\n",
      "->Downloading runoff data for year 2010   ...**complete!**\n",
      "->Downloading et data for year 2010   ...**complete!**\n",
      "->Downloading precip data for year 2010   ...**complete!**\n",
      "->Downloading soil data for year 2010   ...**complete!**\n"
     ]
    }
   ],
   "source": [
    "#Retrieve data frames for each sample year, using the function above\n",
    "df2000 = getSupplyData(2000)\n",
    "df2005 = getSupplyData(2005)\n",
    "df2010 = getSupplyData(2010)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#Use the addFIPS function to add FIPS values to the table\n",
    "df2000 = addFIPS(df2000,countyFN)\n",
    "df2005 = addFIPS(df2005,countyFN) \n",
    "df2010 = addFIPS(df2010,countyFN)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#Concatenate the tables\n",
    "dfAllYears = pd.concat([df2000,df2005,df2010],ignore_index=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#Rename FIPS columns\n",
    "dfAllYears.rename(columns={'GEOID':'FIPS','STATEFP':'STATEFIPS'},inplace=True)\n",
    "dfAllYears.to_csv(tidyFN,index=False,encoding='utf8')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#Compute mean values for each county\n",
    "groupDict = {'total_runoff':['count','sum']}\n",
    "dfCounty = dfAllYears.groupby(('YEAR','STATEFIPS','FIPS','Area'))['total_runoff','pr','et','smc'].mean()\n",
    "\n",
    "#Convert indexes back to columns\n",
    "dfCounty.reset_index(inplace=True)\n",
    "\n",
    "#Convert mm/Year * county area (m2) into MGal/day - to match use\n",
    "'''m = [mm] / 1000; m * [m2] = m3; m3 / 3785 = MGal'''\n",
    "for param in ('total_runoff','pr','et','smc'):\n",
    "    dfCounty[param] = (dfCounty[param] / 1000.0) * dfCounty.Area / 3785.0 / 365.0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#Compute supply as precip - evapotranspiration\n",
    "dfCounty['Supply'] = dfCounty.pr - dfCounty.et"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#Remove Area field\n",
    "dfCounty.drop(['Area'],axis=1,inplace=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#Write the table to the file\n",
    "dfCounty.to_csv(outputFN,encoding='utf8',index=False)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}