{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Convert raw Louisiana SWUDS output to a formatted balance sheet\n", "The SWUDS output is a spreadsheet of individual industries tagged with its use category, water usage (in MGal/Day), and source type (SIC and Description). This script summarizes the statewide dataset to list the total usage by each SIC for a given year and reports the findings in a formatted MS Excel balance sheet. \n", "\n", "The resulting balance sheet is created dynamically. It will contain a column for each SIC found in the input database, listing usage totals in appropriate locations. It will also contain a column of total supply and usage values. \n", "\n", "This resulting balance sheet is limited by the data available: supply data are not provided, and usage data for Louisiana is solely from wells (no surface water intakes). \n", "\n", "August 2017. \n", "John.Fay@duke.edu" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "#Import modules\n", "import sys, os, openpyxl, xlrd\n", "from openpyxl.utils import get_column_letter\n", "from copy import copy\n", "import pandas as pd\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Get the filenames\n", "dataDir = '../../Data'\n", "dataFN = dataDir + os.sep + 'LouisianaWaterUse_Distribute.xlsx' #Raw output from SWUDS database query\n", "templateFN = dataDir + os.sep + 'LA_SWUDS_BalanceSheet.xlsx' #Template of water balance sheet\n", "outFN = dataDir + os.sep + 'LA_SWUDS_BalanceSheet2.xlsx' #Output file containing all data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Extracting the SWUDS data\n", "This section imports and summarizes the raw SWUDS Excel data, producing a dataframe listing the total usage by each SIC for the selected years (2000, 2005, 2010). From this, the script generates a list of each unique SIC found in the dataset, which will be used to create individual columns in the output balance sheets. This dataframe is also used to identify the total usage by each category, the value of which is entered in the appropriate cell in the output balance sheet. " ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
County CodeSIC1DescriptionWater Use TypeMP IDMP TypeMP NameSource Water TypeAquifer CodeAquifer Name2Aquifer Group IDAquifer Group NameHUC-8HUC-12YearAnnual Amt. (MGD)
01112Riceri1173WellAc-RI-112CHCT-08080201gw112CHCTChicot Aquifer22Coastal lowlands aquifer system S100CSLLWD - 2...8080201.0NaN1979171.8598
11112Riceri1173WellAc-RI-112CHCT-08080201gw112CHCTChicot Aquifer22Coastal lowlands aquifer system S100CSLLWD - 2...8080201.0NaN198589.0000
21112Riceri1173WellAc-RI-112CHCT-08080201gw112CHCTChicot Aquifer22Coastal lowlands aquifer system S100CSLLWD - 2...8080201.0NaN198930.4600
31112Riceri1173WellAc-RI-112CHCT-08080201gw112CHCTChicot Aquifer22Coastal lowlands aquifer system S100CSLLWD - 2...8080201.0NaN199432.3040
41112Riceri1173WellAc-RI-112CHCT-08080201gw112CHCTChicot Aquifer22Coastal lowlands aquifer system S100CSLLWD - 2...8080201.0NaN199941.3039
\n", "
" ], "text/plain": [ " County Code SIC1 Description Water Use Type MP ID MP Type \\\n", "0 1 112 Rice ri 1173 Well \n", "1 1 112 Rice ri 1173 Well \n", "2 1 112 Rice ri 1173 Well \n", "3 1 112 Rice ri 1173 Well \n", "4 1 112 Rice ri 1173 Well \n", "\n", " MP Name Source Water Type Aquifer Code Aquifer Name2 \\\n", "0 Ac-RI-112CHCT-08080201 gw 112CHCT Chicot Aquifer \n", "1 Ac-RI-112CHCT-08080201 gw 112CHCT Chicot Aquifer \n", "2 Ac-RI-112CHCT-08080201 gw 112CHCT Chicot Aquifer \n", "3 Ac-RI-112CHCT-08080201 gw 112CHCT Chicot Aquifer \n", "4 Ac-RI-112CHCT-08080201 gw 112CHCT Chicot Aquifer \n", "\n", " Aquifer Group ID Aquifer Group Name \\\n", "0 22 Coastal lowlands aquifer system S100CSLLWD - 2... \n", "1 22 Coastal lowlands aquifer system S100CSLLWD - 2... \n", "2 22 Coastal lowlands aquifer system S100CSLLWD - 2... \n", "3 22 Coastal lowlands aquifer system S100CSLLWD - 2... \n", "4 22 Coastal lowlands aquifer system S100CSLLWD - 2... \n", "\n", " HUC-8 HUC-12 Year Annual Amt. (MGD) \n", "0 8080201.0 NaN 1979 171.8598 \n", "1 8080201.0 NaN 1985 89.0000 \n", "2 8080201.0 NaN 1989 30.4600 \n", "3 8080201.0 NaN 1994 32.3040 \n", "4 8080201.0 NaN 1999 41.3039 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Read the data xlsx file in as a dataframe\n", "dfRaw = pd.read_excel(dataFN,sheetname='data1')\n", "dfRaw.head()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Select only 2000, 2005, and 2010 records and list the number of data for each column\n", "dfSelect = dfRaw[dfRaw.Year.isin([2000,2005,2010])]" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[u'Alkalies & Chlorine',\n", " u'Animal, Marine Fats & Oils',\n", " u'Carbon Black',\n", " u'Chemical Preparations',\n", " u'Civil, Social, and Fraternal Associations',\n", " u'Corretional Institutions',\n", " u'Cyclic-Crudes, Intermediates, Dyes & Org Pigments',\n", " u'Electric and Other Services Combined',\n", " u'Electrical Services',\n", " u'Engineering Services',\n", " u'Hotels and Motels',\n", " u'Industrial Inorganic Chemicals',\n", " u'Industrial Organic Chemicals',\n", " u'Marine Cargo Handling',\n", " u'Minerals & Earths: Ground Or Treated',\n", " u'Natural Gas Transmission',\n", " u'Natural Gas Transmission and Distribution',\n", " u'Nitrogenous Fertilizers',\n", " u'Noncommercial Research Organizations',\n", " u'Paper Mills',\n", " u'Paperboard Mills',\n", " u'Pesticides & Agricultural Chemicals',\n", " u'Petroleum Refining',\n", " u'Plastic Products',\n", " u'Plastics, Materials & Nonvulcanizable Elastomers',\n", " u'Primary Production of Aluminum',\n", " u'Products Of Petroleum & Coal',\n", " u'Psychiatric Hospitals',\n", " u'Pulp Mills',\n", " u'Refuse Systems',\n", " u'Secondary Smelting & Refining Of Nonferrous Metals',\n", " u'Shipbuilding & Repairing',\n", " u'Soft Drinks',\n", " u'Sugar, Cane',\n", " u'Sugar, Cane Refining',\n", " u'Synthetic Rubber (Vulcanizable Elastomers)',\n", " u'Water Supply',\n", " u'Water Transportation of Freight',\n", " 'TOTAL']" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Create a list of all the unique SIC descriptions; these will define the columns in the resulting sheet. \n", "sectorNames = dfSelect.Description.unique().tolist()\n", "sectorNames.sort() #Sort the list alphabetically\n", "sectorNames.append(\"TOTAL\") #Add a total entry\n", "#Display the names\n", "sectorNames" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YearSIC1Description200020052010
02061Sugar, Cane18.335212.418313.6363
12062Sugar, Cane Refining9.74052.58520.1233
22077Animal, Marine Fats & OilsNaN0.0758NaN
32086Soft Drinks0.33900.2410NaN
42611Pulp Mills40.944838.6510NaN
\n", "
" ], "text/plain": [ "Year SIC1 Description 2000 2005 2010\n", "0 2061 Sugar, Cane 18.3352 12.4183 13.6363\n", "1 2062 Sugar, Cane Refining 9.7405 2.5852 0.1233\n", "2 2077 Animal, Marine Fats & Oils NaN 0.0758 NaN\n", "3 2086 Soft Drinks 0.3390 0.2410 NaN\n", "4 2611 Pulp Mills 40.9448 38.6510 NaN" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Pivot on year, listing usage (Annual Amt. MGD), by SIC1/Description\n", "dfPivot = dfSelect.pivot_table(columns='Year', #Creates a usage column for each year\n", " index=('SIC1','Description'), #Summarizes values by SIC \n", " values='Annual Amt. (MGD)', #Value reported in the resulting table\n", " aggfunc='sum') #Computes the sum of all entries for each SIC\n", "#Reset the index to add index values back as columns\n", "dfPivot.reset_index(inplace=True)\n", "#Show the table\n", "dfPivot.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Open the balance sheet template workbook\n", "wb = openpyxl.load_workbook(templateFN)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Functions\n", "Here, two functions are created. The *updateSICColumn* function add appropriate formula for an individual SIC entry in the balance sheet. It also names the column using the supplied sector name. The *updateTotalColumn* function similarly populates cell formulae in the appropriate locations, but these formula compute totals from all preceding SIC columns. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def updateSICColumn(worksheet,col,name,useGW=0):\n", " '''Updates the specified column in the balance sheet.\n", " col = the letter of the column to be updated\n", " name = the name of the sector\n", " useGW = the use from groundwater\n", " '''\n", " #Sector names\n", " worksheet['{0}2'.format(col)].value = name\n", " worksheet['{0}19'.format(col)].value = name\n", "\n", " ##Supply calculations\n", " worksheet['{0}12'.format(col)].value = '=SUM({0}9:{0}11)'.format(col)\n", " worksheet['{0}16'.format(col)].value = '=SUM({0}14:{0}15)'.format(col)\n", " worksheet['{0}17'.format(col)].value = '=SUM({0}12,{0}16)'.format(col)\n", "\n", " ##Use calculations\n", " worksheet['{0}24'.format(col)].value = '=SUM({0}21:{0}23)'.format(col)\n", " worksheet['{0}29'.format(col)].value = '=SUM({0}26:{0}28)'.format(col)\n", " worksheet['{0}34'.format(col)].value = '=SUM({0}24,{0}29)'.format(col)\n", "\n", " ##Totals\n", " worksheet['{0}36'.format(col)].value = '=({0}34-{0}17)'.format(col)\n", " \n", " ##Values\n", " worksheet['{0}22'.format(col)].value = useGW\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Function for updating formulate for the totals column. \n", "def updateTotalColumn(worksheet, colNum):\n", " totalCol = get_column_letter(colNum)\n", " prevCol = get_column_letter(colNum - 1)\n", "\n", " #Add sum formula in the following rows\n", " for row in [4,5,6,7,9,10,11,12,14,15,16,17,20,21,22,23,24,26,27,28,29,31,32,32,34]:\n", " worksheet['{0}{1}'.format(totalCol,row)].value = '=SUM(B{0}:{1}{0})'.format(row,prevCol)\n", "\n", " #Add the grand total (supply - use)\n", " worksheet['{0}36'.format(totalCol)].value = '=({0}34-{0}17)'.format(totalCol)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Main section\n", "This is the main section of the script. Here, it loops through each year and for each year it:\n", "* Gets the appropriate worksheet from the template\n", "* Updates the first column (which is preformatted in the template) with data for the first SIC\n", " * This include setting the column name and inserting the appropriate usage values\n", "* Loops through the remaining SIC entries, and: \n", " * Adds a new column to the worksheet, \n", " * Sets the style of rows in that column to match the initial column\n", " * And, as above, inserts appropriate formulae and values within the column\n", "* When the TOTAL column is reached:\n", " * A final column is added (again with matching style)\n", " * Formulate are updated using the updatTotalColumn function" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Loop through each year\n", "for year in (2000,2005,2010):\n", " \n", " #Get the sheet for the given year\n", " ws = wb.get_sheet_by_name(str(year))\n", " \n", " #Update the cells of the first sector column\n", " col = 'B'\n", " name = sectorNames[0]\n", " usage = dfPivot.loc[dfPivot.Description == sectorNames[0], year].iloc[0]\n", " updateSICColumn(ws, col, name, usage)\n", " \n", " #Loop through each sector, by index\n", " for idx in range(1,len(sectorNames)):\n", " \n", " #get the sector name at the index\n", " sectorName = sectorNames[idx]\n", " \n", " #get the letter of the column to create\n", " colLetter = get_column_letter(idx+1)\n", "\n", " #Add a new column and copy the style\n", " for src, dst in zip(ws['B:B'], ws['{0}:{0}'.format(colLetter)]):\n", " if src.value: dst.value = str(src.value).replace('B',colLetter)\n", " if src.has_style:\n", " dst.font = copy(src.font)\n", " dst.fill = copy(src.fill)\n", " dst.number_format = copy(src.number_format)\n", " dst.alignment = copy(src.alignment)\n", " dst.border = copy(src.border)\n", "\n", " #Update the value for the current sector, unless it's the TOTAL column\n", " if sectorName != 'TOTAL':\n", " usage = dfPivot.loc[dfPivot.Description == sectorName, year].iloc[0]\n", " updateSICColumn(ws, colLetter, sectorName, usage)\n", "\n", " #Otherwise, fill the column with the formulae for the Totals\n", " else: \n", " updateTotalColumn(ws, idx+1)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Save\n", "wb.save(outFN)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.13" } }, "nbformat": 4, "nbformat_minor": 2 }