{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Summarize tidy tables\n", "This script summarizes the water use and water suppy tidy tables generated by the *CreateUsageTable* and *CreateSupplyTable* scripts, respectively. Each table is then merged into a single dataframe to create a table listing water use and supply for each year/state combination. \n", "\n", "-----\n", "#### Workflow\n", "* Import and summarize use table on state, listing usage amounts by use class and source class (surface/groundwater)\n", "* Import and summarize supply table, first on county so that amounts can be converted from mm to MGal/year.\n", "* Summarize the county supply table to the state level, listing the total MGal/year of supply in each state" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Import libraries\n", "import sys, os\n", "import pandas as pd\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Get file names; these files are created by the CreateUsageTable.py and CreateSupplyTable.py respectively\n", "dataDir = '../../Data'\n", "tidyuseFN = dataDir + os.sep + \"UsageDataTidy.csv\"\n", "tidysupplyFN = dataDir + os.sep + \"SupplyTableTidy.csv\"\n", "outCountyFN = dataDir + os.sep + \"WaterByCounty.csv\"\n", "outStateFN = dataDir + os.sep + \"WaterByState.csv\"\n", "outNationFN = dataDir + os.sep + \"WaterBalanceData.csv\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Summarize USE table by county\n", "Computes water usage for each county broken into each sector and source category. " ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Read in the usage table from the csv file\n", "dfUse = pd.read_csv(tidyuseFN,dtype={'FIPS':np.str})" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Remove rows with irrigation and thermoelectric sub-classes\n", "#dropValues = ['Irrigation_Crop', 'Irrigation_Golf','ThermoElec_OnceThru', 'ThermoElec_Recirc']\n", "dropValues = ['Irrigation','ThermoElec']\n", "dfUse = dfUse[~dfUse['UseClass'].isin(dropValues)]" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Convert amounts from MGal/day to MGal/year\n", "dfUse['Amount'] = dfUse['Amount'] * 365" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Add STATEFIPS column to dfUse (as left most 2 characters of FIPS values)\n", "dfUse['STATEFIPS'] = dfUse['FIPS'].str[:2]" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Pivot on YEAR and FIPS listing usage in sector/source categories\n", "dfUseFIPS = dfUse.pivot_table(index=['YEAR','STATE','FIPS'],\n", " values='Amount',\n", " aggfunc='sum',\n", " columns=['UseClass','SrcClass'])" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Flatten hierarchical column names\n", "dfUseFIPS.columns = ['_'.join(col).strip() for col in dfUseFIPS.columns.values]" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Remove indices so values are available as columns\n", "dfUseFIPS.reset_index(inplace=True)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YEARSTATEFIPSAquaculture_GroundwaterAquaculture_SurfaceDomestic_GroundwaterDomestic_SurfaceIndustrial_GroundwaterIndustrial_SurfaceIrrigation_Crop_Groundwater...Mining_GroundwaterMining_SurfacePublicSupply_GroundwaterPublicSupply_SurfaceThermoElec_OnceThru_GroundwaterThermoElec_OnceThru_SurfaceThermoElec_Recirc_GroundwaterThermoElec_Recirc_SurfaceTotal_GroundwaterTotal_Surface
02000AK02013NaNNaN3.650.00.0821.25NaN...0.00.07.30357.7NaN0.00.00.010.951178.95
12000AK02016NaNNaN0.000.00.0876.00NaN...0.00.0127.75781.1NaN0.00.00.0127.751657.10
\n", "

2 rows × 25 columns

\n", "
" ], "text/plain": [ " YEAR STATE FIPS Aquaculture_Groundwater Aquaculture_Surface \\\n", "0 2000 AK 02013 NaN NaN \n", "1 2000 AK 02016 NaN NaN \n", "\n", " Domestic_Groundwater Domestic_Surface Industrial_Groundwater \\\n", "0 3.65 0.0 0.0 \n", "1 0.00 0.0 0.0 \n", "\n", " Industrial_Surface Irrigation_Crop_Groundwater ... \\\n", "0 821.25 NaN ... \n", "1 876.00 NaN ... \n", "\n", " Mining_Groundwater Mining_Surface PublicSupply_Groundwater \\\n", "0 0.0 0.0 7.30 \n", "1 0.0 0.0 127.75 \n", "\n", " PublicSupply_Surface ThermoElec_OnceThru_Groundwater \\\n", "0 357.7 NaN \n", "1 781.1 NaN \n", "\n", " ThermoElec_OnceThru_Surface ThermoElec_Recirc_Groundwater \\\n", "0 0.0 0.0 \n", "1 0.0 0.0 \n", "\n", " ThermoElec_Recirc_Surface Total_Groundwater Total_Surface \n", "0 0.0 10.95 1178.95 \n", "1 0.0 127.75 1657.10 \n", "\n", "[2 rows x 25 columns]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfUseFIPS.head(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Import and summarize supply table by county" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Read in the supply table from the csv file\n", "dfSupply = pd.read_csv(tidysupplyFN,dtype={'FIPS':np.str,'STATEFIPS':np.str}) " ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Compute supply as precipitation - evapotranspiration \n", "#(See https://www.fs.fed.us/rm/value/docs/spatial_distribution_water_supply.pdf)\n", "# * Could also use total_runoff\n", "# * Values are in mm/year and need to be adjusted to MGal/year by mulitplying by weighted area\n", "dfSupply['Supply'] = dfSupply['pr'] - dfSupply['et']" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Summarize supply on YEAR and FIPS\n", "'''We take the mean mm/year across points in a county and then \n", " mulitply by county area to get volume (mm * m3). These values\n", " then need to by converted to MGal to give MGal/year\n", "'''\n", "\n", "#Compute mean runoff and supply on year and county\n", "dfSupplyFIPS = dfSupply.groupby(('YEAR','STATEFIPS','FIPS','Area'))['total_runoff','Supply'].mean()\n", "\n", "#Reset the index so Year, StateFIPS, FIPS, and AREA become columns again\n", "dfSupplyFIPS.reset_index(inplace=True)\n", "\n", "#Convert mm/Year * county area (m2) into MGal/year - to match use values\n", "''' m = [mm] / 1000; \n", " m * [m2] = m3;\n", " [m3] / 3785.41178 = 1 MGal'''\n", "for param in ('total_runoff','Supply'):\n", " dfSupplyFIPS[param] = (dfSupplyFIPS[param] / 1000.0) * dfSupplyFIPS.Area / 3785.41178" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YEARSTATEFIPSFIPSAreatotal_runoffSupply
0200001010011.565359e+09164183.119550136727.091260
1200001010035.250715e+09747212.294697697763.569535
\n", "
" ], "text/plain": [ " YEAR STATEFIPS FIPS Area total_runoff Supply\n", "0 2000 01 01001 1.565359e+09 164183.119550 136727.091260\n", "1 2000 01 01003 5.250715e+09 747212.294697 697763.569535" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfSupplyFIPS.head(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Join Use and Supply Tables on Year and FIPS" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['YEAR', 'STATEFIPS', 'FIPS', 'Area', 'total_runoff', 'Supply'], dtype=object)" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfSupplyFIPS.columns.values" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Merge the two tables on YEAR and FIPS columns\n", "dfAll = pd.merge(dfUseFIPS,dfSupplyFIPS, how='outer',on=['YEAR','FIPS'],left_index=True,right_index=True)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YEARSTATEFIPSAquaculture_GroundwaterAquaculture_SurfaceDomestic_GroundwaterDomestic_SurfaceIndustrial_GroundwaterIndustrial_SurfaceIrrigation_Crop_Groundwater...ThermoElec_OnceThru_GroundwaterThermoElec_OnceThru_SurfaceThermoElec_Recirc_GroundwaterThermoElec_Recirc_SurfaceTotal_GroundwaterTotal_SurfaceSTATEFIPSAreatotal_runoffSupply
02000AK02013NaNNaN3.650.00.0821.25NaN...NaN0.00.00.010.951178.95011.565359e+09164183.119550136727.091260
12000AK02016NaNNaN0.000.00.0876.00NaN...NaN0.00.00.0127.751657.10015.250715e+09747212.294697697763.569535
\n", "

2 rows × 29 columns

\n", "
" ], "text/plain": [ " YEAR STATE FIPS Aquaculture_Groundwater Aquaculture_Surface \\\n", "0 2000 AK 02013 NaN NaN \n", "1 2000 AK 02016 NaN NaN \n", "\n", " Domestic_Groundwater Domestic_Surface Industrial_Groundwater \\\n", "0 3.65 0.0 0.0 \n", "1 0.00 0.0 0.0 \n", "\n", " Industrial_Surface Irrigation_Crop_Groundwater ... \\\n", "0 821.25 NaN ... \n", "1 876.00 NaN ... \n", "\n", " ThermoElec_OnceThru_Groundwater ThermoElec_OnceThru_Surface \\\n", "0 NaN 0.0 \n", "1 NaN 0.0 \n", "\n", " ThermoElec_Recirc_Groundwater ThermoElec_Recirc_Surface \\\n", "0 0.0 0.0 \n", "1 0.0 0.0 \n", "\n", " Total_Groundwater Total_Surface STATEFIPS Area total_runoff \\\n", "0 10.95 1178.95 01 1.565359e+09 164183.119550 \n", "1 127.75 1657.10 01 5.250715e+09 747212.294697 \n", "\n", " Supply \n", "0 136727.091260 \n", "1 697763.569535 \n", "\n", "[2 rows x 29 columns]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfAll.head(2)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Export to csv\n", "dfAll.to_csv(outCountyFN, index=False, encoding='utf8')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Summarize for entire US" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Aquaculture_GroundwaterAquaculture_SurfaceDomestic_GroundwaterDomestic_SurfaceIndustrial_GroundwaterIndustrial_SurfaceIrrigation_Crop_GroundwaterIrrigation_Crop_SurfaceIrrigation_Golf_GroundwaterIrrigation_Golf_Surface...PublicSupply_SurfaceThermoElec_OnceThru_GroundwaterThermoElec_OnceThru_SurfaceThermoElec_Recirc_GroundwaterThermoElec_Recirc_SurfaceTotal_GroundwaterTotal_SurfaceAreatotal_runoffSupply
YEAR
2000385301.30964625.651094419.6512136.251307024.855900746.95NaNNaNNaNNaN...9955243.60NaN64593608.35149412.756525090.4026287117.5097545549.28.617206e+124.970406e+084.837853e+08
2005695671.752509250.901365508.8032017.801133857.905491231.554485871.9010189058.95146657.0168275.95...10815975.65561574.4066812209.75154570.205795367.8030165092.85119654562.88.811690e+125.600854e+085.907308e+08
2010662522.452777405.451292045.2523330.801076239.004744346.6512366590.5518290529.60207261.6195475.75...9610727.4076909.1554995433.30186212.053461802.3528941576.35100469498.56.804011e+125.688181e+086.134356e+08
\n", "

3 rows × 25 columns

\n", "
" ], "text/plain": [ " Aquaculture_Groundwater Aquaculture_Surface Domestic_Groundwater \\\n", "YEAR \n", "2000 385301.30 964625.65 1094419.65 \n", "2005 695671.75 2509250.90 1365508.80 \n", "2010 662522.45 2777405.45 1292045.25 \n", "\n", " Domestic_Surface Industrial_Groundwater Industrial_Surface \\\n", "YEAR \n", "2000 12136.25 1307024.85 5900746.95 \n", "2005 32017.80 1133857.90 5491231.55 \n", "2010 23330.80 1076239.00 4744346.65 \n", "\n", " Irrigation_Crop_Groundwater Irrigation_Crop_Surface \\\n", "YEAR \n", "2000 NaN NaN \n", "2005 4485871.90 10189058.95 \n", "2010 12366590.55 18290529.60 \n", "\n", " Irrigation_Golf_Groundwater Irrigation_Golf_Surface ... \\\n", "YEAR ... \n", "2000 NaN NaN ... \n", "2005 146657.0 168275.95 ... \n", "2010 207261.6 195475.75 ... \n", "\n", " PublicSupply_Surface ThermoElec_OnceThru_Groundwater \\\n", "YEAR \n", "2000 9955243.60 NaN \n", "2005 10815975.65 561574.40 \n", "2010 9610727.40 76909.15 \n", "\n", " ThermoElec_OnceThru_Surface ThermoElec_Recirc_Groundwater \\\n", "YEAR \n", "2000 64593608.35 149412.75 \n", "2005 66812209.75 154570.20 \n", "2010 54995433.30 186212.05 \n", "\n", " ThermoElec_Recirc_Surface Total_Groundwater Total_Surface \\\n", "YEAR \n", "2000 6525090.40 26287117.50 97545549.2 \n", "2005 5795367.80 30165092.85 119654562.8 \n", "2010 3461802.35 28941576.35 100469498.5 \n", "\n", " Area total_runoff Supply \n", "YEAR \n", "2000 8.617206e+12 4.970406e+08 4.837853e+08 \n", "2005 8.811690e+12 5.600854e+08 5.907308e+08 \n", "2010 6.804011e+12 5.688181e+08 6.134356e+08 \n", "\n", "[3 rows x 25 columns]" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Group by YEAR\n", "dfUS = dfAll.groupby('YEAR').sum()\n", "dfUS.head()" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": true }, "outputs": [], "source": [ "dfUS.reset_index(inplace=True)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": true }, "outputs": [], "source": [ "dfUSm = pd.melt(dfUS,id_vars='YEAR',var_name='Group',value_name='MGal')" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": true }, "outputs": [], "source": [ "dfUSm.to_csv(outNationFN,index=False)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.13" } }, "nbformat": 4, "nbformat_minor": 2 }