{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Import USGS use data & Create usage table\n", "Here we download the raw usage data for years 2000, 2005, and 2010 from the USGS usage site and synthesize all data into a tidy table listing: year, FIPS code, and total annual withdrawals by sector in MGal/day tagged by sector, source type (surface or groundwater) and water type (fresh or saline). This table is formatted as follows:\n", "\n", "|STATE | FIPS | YEAR | UseClass | SourceClass | SourceType | Amount |\n", "| :--: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | \n", "| AL | 01001 | 2000 | PublicSupply | Surface | Fresh |0.00 |\n", "| AL | 01001 | 2000 | Industrial | Ground | Saline |0.00 |\n", "\n", "----\n", "\n", "#### Workflow:\n", "* Pull the raw data file for a given year, in tab-delimted format, from the USGS server into a pandas dataframe.\n", " * Care must be taken that the FIPS codes are preserved as text, not numbers\n", "* Fix some inconsistencies in nomeclature across years in the USGS data. This includes:\n", " * For year 2000, remapping 'IT', 'LA', 'LS', and 'PE' fields to 'IR', 'AQ', 'LI' and 'PC', respectively([reference]( https://water.usgs.gov/watuse/data/2000/datadict.html)).\n", " * IT -> IR (Irrigated cropland)\n", " * LA -> AQ (Aquaculture)\n", " * LS -> LI (Livestock)\n", " * PE -> PC (Thermoelectric power closed-loop)\n", " * For year 2005, remapping 'LA' and 'LS' fields to 'AQ' and 'LI', respectively([reference]( https://water.usgs.gov/watuse/data/2005/datadict.html)). \n", " * LA -> AQ (Aquaculture)\n", " * LS -> LI (Livestock)\n", " * Adding year columns to 2000 and 2005 datasets.\n", " * Fixing an inconsistency in the total withdrawal fields (Wtotl to WTotal).\n", " * Removing extranneous fields \n", "* Re-arrange data into a tidy format to facilitate additional analyses.\n", "* Append all tables into a single dataframe, with records tagged by the year of the dataset.\n", "* Remove rows with null values\n", "* Save the table to a file for later analysis. \n", "\n", "This table can then easily be summarized and joined, by YEAR and FIPS code, to other accounting data tables and summarized by state. 
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Import modules required for analysis\n", "import os\n", "import pandas as pd\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Create prefix remapping dictionaries to standardize column names for all years to 2010\n", "remapDict = {'IT-':'IR-', #Irrigated (total)\n", " 'LA-':'AQ-', #Aquaculture\n", " 'LS-':'LI-', #Livestock\n", " 'PE-':'PC-', #Closed-loop thermo electric\n", " 'Wtotl':'WTotl' #Capitalization mismatch\n", " }" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Create a function for importing a table and standardizing all field names\n", "def importTable(year, remapDict):\n", " \n", " #Set the download URL using with the year\n", " theURL = 'http://water.usgs.gov/watuse/data/{0}/usco{0}.txt'.format(year) \n", " \n", " #Download the dataset to a data frame (keeping the FIPS attribute as a string)\n", " df = pd.read_table(theURL,dtype={'FIPS':str})\n", " \n", " #Ensure that the FIPS column maintains 5 characters\n", " df['FIPS'] = df['FIPS'].apply(lambda x: str(x).zfill(5))\n", " \n", " #Remove the STATEFIPS and COUNTYFIPS columns (as they are not needed)\n", " df.drop(\"STATEFIPS\",axis=1,inplace=True)\n", " df.drop(\"COUNTYFIPS\",axis=1,inplace=True)\n", " \n", " #Use the remap dictionary to rename columns\n", " \n", " #Get the current column names as a list\n", " colNames = df.columns.values.tolist()\n", " \n", " for inFld,outFld in remapDict.items():\n", " #This loops through each item in colNames and replaces it with a revised one\n", " colNames_update = [x.replace(inFld,outFld) for x in colNames]\n", " colNames = colNames_update\n", "\n", " #Update the column names in the data frame\n", " df.columns = colNames\n", " \n", " #Add year field, if not present\n", " if \"YEAR\" not in df.columns.values: \n", " df.insert(1,\"YEAR\",year)\n", " \n", " #Remove unnamed columns, which oddly appear in 2005 and 2010 datasets\n", " if \"Unnamed\" in df.columns.values[-1]:\n", " df.drop(df.columns.values[-1],axis=1,inplace=True)\n", " \n", " #Status\n", " print \"{} records and {} attributes returned for year {}\".format(df.shape[0],df.shape[1],year)\n", " \n", " #Return the data frame\n", " return df" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3222 records and 69 attributes returned for year 2000\n", "3224 records and 106 attributes returned for year 2005\n", "3224 records and 115 attributes returned for year 2010\n" ] } ], "source": [ "#Get the tables\n", "df2000 = importTable(2000,remapDict)\n", "df2005 = importTable(2005,remapDict)\n", "df2010 = importTable(2010,remapDict)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
STATEYEARFIPSTP-TotPopPS-TOPopPS-WGWFrPS-WSWFrPS-WFrToDO-SSPopDO-WGWFr...PC-WTotlTO-WGWFrTO-WGWSaTO-WGWToTO-WSWFrTO-WSWSaTO-WSWToTO-WFrToTO-WSaToTO-WTotl
0AL20000100143.6739.305.700.005.704.372.95...0.010.650.010.6526.600.026.6037.250.037.25
1AL200001003140.42107.6119.700.0019.7032.812.46...0.027.040.027.047.590.07.5934.630.034.63
2AL20000100529.0422.274.850.004.856.770.51...0.06.360.06.361.060.01.067.420.07.42
3AL20000100720.8317.933.780.003.782.900.22...0.04.000.04.000.000.00.004.000.04.00
4AL20000100951.0245.902.5529.2631.815.120.38...0.02.930.02.9329.330.029.3332.260.032.26
\n", "

5 rows × 69 columns

\n", "
" ], "text/plain": [ " STATE YEAR FIPS TP-TotPop PS-TOPop PS-WGWFr PS-WSWFr PS-WFrTo \\\n", "0 AL 2000 01001 43.67 39.30 5.70 0.00 5.70 \n", "1 AL 2000 01003 140.42 107.61 19.70 0.00 19.70 \n", "2 AL 2000 01005 29.04 22.27 4.85 0.00 4.85 \n", "3 AL 2000 01007 20.83 17.93 3.78 0.00 3.78 \n", "4 AL 2000 01009 51.02 45.90 2.55 29.26 31.81 \n", "\n", " DO-SSPop DO-WGWFr ... PC-WTotl TO-WGWFr TO-WGWSa TO-WGWTo \\\n", "0 4.37 2.95 ... 0.0 10.65 0.0 10.65 \n", "1 32.81 2.46 ... 0.0 27.04 0.0 27.04 \n", "2 6.77 0.51 ... 0.0 6.36 0.0 6.36 \n", "3 2.90 0.22 ... 0.0 4.00 0.0 4.00 \n", "4 5.12 0.38 ... 0.0 2.93 0.0 2.93 \n", "\n", " TO-WSWFr TO-WSWSa TO-WSWTo TO-WFrTo TO-WSaTo TO-WTotl \n", "0 26.60 0.0 26.60 37.25 0.0 37.25 \n", "1 7.59 0.0 7.59 34.63 0.0 34.63 \n", "2 1.06 0.0 1.06 7.42 0.0 7.42 \n", "3 0.00 0.0 0.00 4.00 0.0 4.00 \n", "4 29.33 0.0 29.33 32.26 0.0 32.26 \n", "\n", "[5 rows x 69 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#CHECK: example output with a column for each supply/source/type grouping.\n", "df2000.head()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#\"Melt\" data so each use/class/type becomes a new row\n", "df2000m = pd.melt(df2000,id_vars=['STATE','FIPS','YEAR'],var_name='Class',value_name='Amount')\n", "df2005m = pd.melt(df2005,id_vars=['STATE','FIPS','YEAR'],var_name='Class',value_name='Amount')\n", "df2010m = pd.melt(df2010,id_vars=['STATE','FIPS','YEAR'],var_name='Class',value_name='Amount')" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
STATEFIPSYEARClassAmount
0AL010012000TP-TotPop43.67
1AL010032000TP-TotPop140.42
2AL010052000TP-TotPop29.04
3AL010072000TP-TotPop20.83
4AL010092000TP-TotPop51.02
\n", "
" ], "text/plain": [ " STATE FIPS YEAR Class Amount\n", "0 AL 01001 2000 TP-TotPop 43.67\n", "1 AL 01003 2000 TP-TotPop 140.42\n", "2 AL 01005 2000 TP-TotPop 29.04\n", "3 AL 01007 2000 TP-TotPop 20.83\n", "4 AL 01009 2000 TP-TotPop 51.02" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#CHECK: Observe how the data has been 'melted'\n", "df2000m.head()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Merge all records into a single table\n", "dfUse = pd.concat([df2000m, df2005m, df2010m],ignore_index=True)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(905812, 5)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#CHECK: the dfUse frame should have as many rows as the 2000, 2005, and 2010 data combined\n", "dfUse.shape[0]" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Create remap dictionaries. These enable extraction of specific sub-attriubutes\n", "# contained in the Class field. For example, PS-WGWFr refers to Fresh groundwater\n", "# withdrawn for Public Supply.\n", "useClassMap = {'TP':'Population',\n", " 'PS':'PublicSupply',\n", " 'DO':'Domestic',\n", " 'IN':'Industrial',\n", " 'IR':'Irrigation',\n", " 'IC':'Irrigation_Crop',\n", " 'IG':'Irrigation_Golf',\n", " 'LI':'Livestock',\n", " 'AQ':'Aquaculture',\n", " 'MI':'Mining',\n", " 'PT':'ThermoElec',\n", " 'PO':'ThermoElec_OnceThru',\n", " 'PC':'ThermoElec_Recirc',\n", " 'TO':'Total'\n", " }\n", "\n", "srcClassMap = {'-WGW':'Groundwater',\n", " '-WSW':'Surface'\n", " }\n", "\n", "srcTypeMap = {'Fr':'Fresh',\n", " 'Sa':'Saline'\n", " }" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Set the use classes (PS, DO, LI, etc.)\n", "for code,value in useClassMap.items():\n", " dfUse.loc[dfUse.Class.str.startswith(code), 'UseClass'] = value" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Set the source classes (surface or groundwater)\n", "for code,value in srcClassMap.items():\n", " dfUse.loc[dfUse.Class.str.contains(code), 'SrcClass'] = value" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Set the source types (fresh or saline)\n", "for code,value in srcTypeMap.items():\n", " dfUse.loc[dfUse.Class.str.endswith(code), 'SrcType'] = value" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['COUNTY'], dtype=object)" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#CHECK: List classes with null srcClass remaps\n", "dfUse[pd.isnull(dfUse['UseClass'])].Class.unique()" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['TP-TotPop', 'PS-TOPop', 'PS-WFrTo', 'DO-SSPop', 'DO-WFrTo',\n", " 'IN-WFrTo', 'IN-WSaTo', 'IN-WTotl', 'IR-IrSpr', 'IR-IrMic',\n", " 'IR-IrSur', 'IR-IrTot', 'IR-WFrTo', 'AQ-WFrTo', 'LI-WFrTo',\n", " 'MI-WFrTo', 'MI-WSaTo', 'MI-WTotl', 'PT-WFrTo', 'PT-WSaTo',\n", " 'PT-WTotl', 'PC-WFrTo', 'PC-WSaTo', 'PC-WTotl', 'TO-WFrTo',\n", " 'TO-WSaTo', 'TO-WTotl', 'PS-GWPop', 'PS-SWPop', 'PS-WSaTo',\n", " 'PS-WTotl', 'DO-PSDel', 'DO-TOTAL ', 'IC-WFrTo', 'IC-IrSpr',\n", " 'IC-IrMic', 
'IC-IrSur', 'IC-IrTot', 'IG-WFrTo', 'IG-IrSpr',\n", " 'IG-IrMic', 'IG-IrSur', 'IG-IrTot', 'PT-Power', 'PO-WFrTo',\n", " 'PO-WSaTo', 'PO-WTotl', 'PO-Power', 'PC-Power', 'COUNTY',\n", " 'DO-SSPCp', 'DO-PSPCp', 'AQ-WSaTo', 'AQ-WTotl'], dtype=object)" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#CHECK: List classes with null SrcClass remaps\n", "dfUse[pd.isnull(dfUse['SrcClass'])].Class.unique()" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['TP-TotPop', 'PS-TOPop', 'PS-WFrTo', 'DO-SSPop', 'DO-WFrTo',\n", " 'IN-WGWTo', 'IN-WSWTo', 'IN-WFrTo', 'IN-WSaTo', 'IN-WTotl',\n", " 'IR-IrSpr', 'IR-IrMic', 'IR-IrSur', 'IR-IrTot', 'IR-WFrTo',\n", " 'AQ-WFrTo', 'LI-WFrTo', 'MI-WGWTo', 'MI-WSWTo', 'MI-WFrTo',\n", " 'MI-WSaTo', 'MI-WTotl', 'PT-WSWTo', 'PT-WFrTo', 'PT-WSaTo',\n", " 'PT-WTotl', 'PO-WSWTo', 'PC-WSWTo', 'PC-WFrTo', 'PC-WSaTo',\n", " 'PC-WTotl', 'TO-WGWTo', 'TO-WSWTo', 'TO-WFrTo', 'TO-WSaTo',\n", " 'TO-WTotl', 'PS-GWPop', 'PS-SWPop', 'PS-WGWTo', 'PS-WSWTo',\n", " 'PS-WSaTo', 'PS-WTotl', 'DO-PSDel', 'DO-TOTAL ', 'IC-WFrTo',\n", " 'IC-IrSpr', 'IC-IrMic', 'IC-IrSur', 'IC-IrTot', 'IG-WFrTo',\n", " 'IG-IrSpr', 'IG-IrMic', 'IG-IrSur', 'IG-IrTot', 'PT-WGWTo',\n", " 'PT-Power', 'PO-WGWTo', 'PO-WFrTo', 'PO-WSaTo', 'PO-WTotl',\n", " 'PO-Power', 'PC-WGWTo', 'PC-Power', 'COUNTY', 'DO-SSPCp',\n", " 'DO-PSPCp', 'AQ-WGWTo', 'AQ-WSWTo', 'AQ-WSaTo', 'AQ-WTotl'], dtype=object)" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#CHECK: List classes with null SrcType remaps\n", "dfUse[pd.isnull(dfUse['SrcType'])].Class.unique()" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Remove rows (axis=0) that have a null value in any column (how='any')\n", "dfOutput = dfUse.dropna(axis=0,how='any')" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
STATEFIPSYEARClassAmountUseClassSrcClassSrcType
6444AL010012000PS-WGWFr5.7PublicSupplyGroundwaterFresh
6445AL010032000PS-WGWFr19.7PublicSupplyGroundwaterFresh
6446AL010052000PS-WGWFr4.85PublicSupplyGroundwaterFresh
6447AL010072000PS-WGWFr3.78PublicSupplyGroundwaterFresh
6448AL010092000PS-WGWFr2.55PublicSupplyGroundwaterFresh
\n", "
" ], "text/plain": [ " STATE FIPS YEAR Class Amount UseClass SrcClass SrcType\n", "6444 AL 01001 2000 PS-WGWFr 5.7 PublicSupply Groundwater Fresh\n", "6445 AL 01003 2000 PS-WGWFr 19.7 PublicSupply Groundwater Fresh\n", "6446 AL 01005 2000 PS-WGWFr 4.85 PublicSupply Groundwater Fresh\n", "6447 AL 01007 2000 PS-WGWFr 3.78 PublicSupply Groundwater Fresh\n", "6448 AL 01009 2000 PS-WGWFr 2.55 PublicSupply Groundwater Fresh" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#CHECK: see that new columns were added to decode the Class\n", "dfOutput.head()" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Set the output location and filename\n", "dataDir = '../../Data'\n", "outFN = 'UsageDataTidy.csv'\n", "dfOutput.to_csv(dataDir + os.sep + outFN,index=False)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.13" } }, "nbformat": 4, "nbformat_minor": 2 }