{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "![panda](img/panda.png)\n", "# Data Munging with Python and Pandas\n", "\n", "Becky Sweger, 18F Engineering Team" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## What is Pandas?\n", "\n", "From the [docs](http://pandas.pydata.org/pandas-docs/stable/):\n", "\n", "> pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "Demo getting a dataframe:\n", "\n", "`eiti = pd.read_excel('https://useiti.doi.gov/downloads/federal_revenue_onshore_acct-year_FY04-14_2015-11-20.xlsx')`\n", "\n", "Slice:\n", "\n", "`eiti[eiti['Total']>10000]`\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Pandas\n", "\n", "* Series: 1-dimensional\n", " * think column in a spreadsheet\n", "* Dataframe: 2-dimensional:\n", " * think spreadsheet\n", " * analagous to an R data.frame" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "![Pandas ecosystem](img/pandas-ecosystem.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Pandas is good for:\n", "\n", "* fast, repeatable data wrangling\n", "* structured-ish data\n", "* integration with Python data stack\n", "* glue" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Array Programming\n", "\n", "Pandas is built on top of [Numpy](http://www.numpy.org/), a Python library that enables array programming\n", "\n", "* fast, efficient operations on arrays of homogeneous data\n", "* you operate on aggregates of data; no explicit loops\n", "* many operations performed via compiled code" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Data Munging\n", "\n", "Step 0: if you're new to Python, skip the yak shaving and [download the Anaconda Python distribution](https://www.continuum.io/downloads \"download Anaconda\").\n", "\n", "1. Get the data\n", "2. Explore\n", "3. Clean\n", "4. Augment & Prep\n", "5. Output" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Step 1: Get The Data" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "slideshow": { "slide_type": "slide" } }, "source": [ "Pandas makes it easy to create a DataFrame from existing data.\n", "\n", "The [U.S. Federal Budget](https://www.whitehouse.gov/omb/budget \"The President's Budget\")\n", "* [Outlays](https://www.whitehouse.gov/omb/budget/Supplemental \"supplemental materials\") (scroll down to _Public Budget Database_)\n", "* [\"Deflators\"](https://www.whitehouse.gov/omb/budget/Historicals \"historical tables\") (scroll down to _Table 10.1_)" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "collapsed": true, "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "# import things\n", "import pandas as pd \n", "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "import matplotlib\n", "matplotlib.style.use('ggplot')\n", "%matplotlib inline\n", "pd.set_option('display.float_format', lambda x: '%.3f' % x)" ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Agency CodeAgency NameBureau CodeBureau NameAccount CodeAccount NameTreasury Agency CodeSubfunction CodeSubfunction TitleBEA Category...2012201320142015201620172018201920202021
01Legislative Branch0Legislative BranchnanReceipts, Central fiscal operationsnan803Central fiscal operationsMandatory...0000000000
11Legislative Branch0Legislative BranchnanReceipts, Central fiscal operationsnan908Other interestNet interest...0000000000
21Legislative Branch0Legislative Branch241400.000Charges for services to trust fundsnan803Central fiscal operationsMandatory...0000000000
31Legislative Branch5Senate0.000Senate0.000801Legislative functionsDiscretionary...0000000000
41Legislative Branch5Senate100.000Compensation of Members, Senate0.000801Legislative functionsMandatory...23,00023,00023,00023,00027,00024,00024,00024,00024,00024,000
\n", "

5 rows × 73 columns

\n", "
" ], "text/plain": [ " Agency Code Agency Name Bureau Code Bureau Name \\\n", "0 1 Legislative Branch 0 Legislative Branch \n", "1 1 Legislative Branch 0 Legislative Branch \n", "2 1 Legislative Branch 0 Legislative Branch \n", "3 1 Legislative Branch 5 Senate \n", "4 1 Legislative Branch 5 Senate \n", "\n", " Account Code Account Name Treasury Agency Code \\\n", "0 nan Receipts, Central fiscal operations nan \n", "1 nan Receipts, Central fiscal operations nan \n", "2 241400.000 Charges for services to trust funds nan \n", "3 0.000 Senate 0.000 \n", "4 100.000 Compensation of Members, Senate 0.000 \n", "\n", " Subfunction Code Subfunction Title BEA Category ... 2012 \\\n", "0 803 Central fiscal operations Mandatory ... 0 \n", "1 908 Other interest Net interest ... 0 \n", "2 803 Central fiscal operations Mandatory ... 0 \n", "3 801 Legislative functions Discretionary ... 0 \n", "4 801 Legislative functions Mandatory ... 23,000 \n", "\n", " 2013 2014 2015 2016 2017 2018 2019 2020 2021 \n", "0 0 0 0 0 0 0 0 0 0 \n", "1 0 0 0 0 0 0 0 0 0 \n", "2 0 0 0 0 0 0 0 0 0 \n", "3 0 0 0 0 0 0 0 0 0 \n", "4 23,000 23,000 23,000 27,000 24,000 24,000 24,000 24,000 24,000 \n", "\n", "[5 rows x 73 columns]" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create a DataFrame from the outlays .csv\n", "outlays = pd.read_csv('data/outlays.csv')\n", "outlays.head() " ] }, { "cell_type": "code", "execution_count": 59, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "pandas.core.frame.DataFrame" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create a DataFrame from the \"deflators\" .xls (requires xlrd library)\n", "deflators = pd.read_excel('https://obamawhitehouse.archives.gov/sites/default/files/omb/budget/fy2017//assets/hist10z1.xls')\n", "type(deflators)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Other things that can create a DataFrame\n", "\n", "* Python-specific (_e.g._, tuples, lists, dictionaries, pickle)\n", "* Relational database\n", "* JSON\n", "* HTML\n", "* Stata/SAS\n", "* HDF5\n", "* MessagePack (experimental)\n", "* Google's BigQuery (experimental)\n", "\n", "More information: [IO Tools](http://pandas.pydata.org/pandas-docs/stable/io.html \"Pandas IO Tools\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Take a peek...\n", "\n", "Some useful functions for a first look at your DataFrame:\n", "\n", "* [head](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.head.html \"DataFrame.head\")\n", "* [tail](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.tail.html \"DataFrame.tail\")\n", "* [iloc](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iloc.html \"pandas.DataFrame.iloc\")\n", "* [unique](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.unique.html \"pandas.Series.unique\")\n", "\n", "See also:\n", "\n", "* [Pandas docs: Indexing and Selecting Data](http://pandas.pydata.org/pandas-docs/stable/indexing.html \"Pandas: indexing and selecting data\")\n", "* [Chris Albon's indexing and selecting examples](http://chrisalbon.com/python/pandas_indexing_selecting.html \"Pandas indexing and selecting examples\")" ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Agency CodeAgency NameBureau CodeBureau NameAccount CodeAccount NameTreasury Agency CodeSubfunction CodeSubfunction TitleBEA Category...2012201320142015201620172018201920202021
5081902Undistributed Offsetting Receipts0Undistributed Offsetting Receipts977120.000Interest, Special Worker's Compensation Expenses16.000902Interest received by on-budget trust fundsNet interest...0000000000
5082902Undistributed Offsetting Receipts0Undistributed Offsetting Receipts977910.000Employing agency contributions, Miscellaneous ...20.000951Employer share, employee retirement (on-budget)Mandatory...0000000000
5083902Undistributed Offsetting Receipts0Undistributed Offsetting Receipts977920.000Interest, Miscellaneous Trust Funds, Governmen...20.000902Interest received by on-budget trust fundsNet interest...0000000000
5084902Undistributed Offsetting Receipts0Undistributed Offsetting Receipts997120.000Interest, Other DOD Trust Funds17.000902Interest received by on-budget trust fundsNet interest...0000-1,000-1,000-1,000-1,000-1,000-1,000
5085930Miscellaneous Receipts Below the Reporting Thr...0Miscellaneous Receipts Below the Reporting Thr...901000.000Miscellaneous Unconverted Offsetting Receipts99.000809Deductions for offsetting receiptsMandatory...-7,0000-8,000-15,000000000
\n", "

5 rows × 73 columns

\n", "
" ], "text/plain": [ " Agency Code Agency Name \\\n", "5081 902 Undistributed Offsetting Receipts \n", "5082 902 Undistributed Offsetting Receipts \n", "5083 902 Undistributed Offsetting Receipts \n", "5084 902 Undistributed Offsetting Receipts \n", "5085 930 Miscellaneous Receipts Below the Reporting Thr... \n", "\n", " Bureau Code Bureau Name \\\n", "5081 0 Undistributed Offsetting Receipts \n", "5082 0 Undistributed Offsetting Receipts \n", "5083 0 Undistributed Offsetting Receipts \n", "5084 0 Undistributed Offsetting Receipts \n", "5085 0 Miscellaneous Receipts Below the Reporting Thr... \n", "\n", " Account Code Account Name \\\n", "5081 977120.000 Interest, Special Worker's Compensation Expenses \n", "5082 977910.000 Employing agency contributions, Miscellaneous ... \n", "5083 977920.000 Interest, Miscellaneous Trust Funds, Governmen... \n", "5084 997120.000 Interest, Other DOD Trust Funds \n", "5085 901000.000 Miscellaneous Unconverted Offsetting Receipts \n", "\n", " Treasury Agency Code Subfunction Code \\\n", "5081 16.000 902 \n", "5082 20.000 951 \n", "5083 20.000 902 \n", "5084 17.000 902 \n", "5085 99.000 809 \n", "\n", " Subfunction Title BEA Category ... \\\n", "5081 Interest received by on-budget trust funds Net interest ... \n", "5082 Employer share, employee retirement (on-budget) Mandatory ... \n", "5083 Interest received by on-budget trust funds Net interest ... \n", "5084 Interest received by on-budget trust funds Net interest ... \n", "5085 Deductions for offsetting receipts Mandatory ... \n", "\n", " 2012 2013 2014 2015 2016 2017 2018 2019 2020 \\\n", "5081 0 0 0 0 0 0 0 0 0 \n", "5082 0 0 0 0 0 0 0 0 0 \n", "5083 0 0 0 0 0 0 0 0 0 \n", "5084 0 0 0 0 -1,000 -1,000 -1,000 -1,000 -1,000 \n", "5085 -7,000 0 -8,000 -15,000 0 0 0 0 0 \n", "\n", " 2021 \n", "5081 0 \n", "5082 0 \n", "5083 0 \n", "5084 -1,000 \n", "5085 0 \n", "\n", "[5 rows x 73 columns]" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "outlays.tail()" ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Table 10.1—GROSS DOMESTIC PRODUCT AND DEFLATORS USED IN THE HISTORICAL TABLES: 1940–2021Unnamed: 1Unnamed: 2Unnamed: 3Unnamed: 4Unnamed: 5Unnamed: 6Unnamed: 7Unnamed: 8Unnamed: 9Unnamed: 10Unnamed: 11Unnamed: 12Unnamed: 13Unnamed: 14Unnamed: 15
0(Fiscal Year 2009 = 1.000)NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
1Fiscal YearGDP (in billions of dollars)GDP (Chained) Price IndexComposite Outlay DeflatorsNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
2NaNNaNNaNTotalTotal DefenseTotal Non- defensePayment for IndividualsNaNNaNOther GrantsNet InterestUndis- tributed Offsetting ReceiptsAll OtherAddendum: Direct CapitalNaNNaN
862021 estimate22875.2001.2291.2551.2361.2581.2581.2581.2591.3611.2291.2831.2591.1981.1931.214
87Note: Constant dollar research and development...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
\n", "
" ], "text/plain": [ " Table 10.1—GROSS DOMESTIC PRODUCT AND DEFLATORS USED IN THE HISTORICAL TABLES: 1940–2021 \\\n", "0 (Fiscal Year 2009 = 1.000) \n", "1 Fiscal Year \n", "2 NaN \n", "86 2021 estimate \n", "87 Note: Constant dollar research and development... \n", "\n", " Unnamed: 1 Unnamed: 2 \\\n", "0 NaN NaN \n", "1 GDP (in billions of dollars) GDP (Chained) Price Index \n", "2 NaN NaN \n", "86 22875.200 1.229 \n", "87 NaN NaN \n", "\n", " Unnamed: 3 Unnamed: 4 Unnamed: 5 \\\n", "0 NaN NaN NaN \n", "1 Composite Outlay Deflators NaN NaN \n", "2 Total Total Defense Total Non- defense \n", "86 1.255 1.236 1.258 \n", "87 NaN NaN NaN \n", "\n", " Unnamed: 6 Unnamed: 7 Unnamed: 8 Unnamed: 9 Unnamed: 10 \\\n", "0 NaN NaN NaN NaN NaN \n", "1 NaN NaN NaN NaN NaN \n", "2 Payment for Individuals NaN NaN Other Grants Net Interest \n", "86 1.258 1.258 1.259 1.361 1.229 \n", "87 NaN NaN NaN NaN NaN \n", "\n", " Unnamed: 11 Unnamed: 12 Unnamed: 13 \\\n", "0 NaN NaN NaN \n", "1 NaN NaN NaN \n", "2 Undis- tributed Offsetting Receipts All Other Addendum: Direct Capital \n", "86 1.283 1.259 1.198 \n", "87 NaN NaN NaN \n", "\n", " Unnamed: 14 Unnamed: 15 \n", "0 NaN NaN \n", "1 NaN NaN \n", "2 NaN NaN \n", "86 1.193 1.214 \n", "87 NaN NaN " ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# grab first 3 and the last 2 rows of the deflators dataframe\n", "\n", "deflators.iloc[[0,1,2,-2,-1]]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Step 2: Explore\n", "\n", "What kind of pain are you in for?" ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "Index(['Agency Code', 'Agency Name', 'Bureau Code', 'Bureau Name',\n", " 'Account Code', 'Account Name', 'Treasury Agency Code',\n", " 'Subfunction Code', 'Subfunction Title', 'BEA Category',\n", " 'Grant/non-grant split', 'On- or Off- Budget', '1962', '1963', '1964',\n", " '1965', '1966', '1967', '1968', '1969', '1970', '1971', '1972', '1973',\n", " '1974', '1975', '1976', 'TQ', '1977', '1978', '1979', '1980', '1981',\n", " '1982', '1983', '1984', '1985', '1986', '1987', '1988', '1989', '1990',\n", " '1991', '1992', '1993', '1994', '1995', '1996', '1997', '1998', '1999',\n", " '2000', '2001', '2002', '2003', '2004', '2005', '2006', '2007', '2008',\n", " '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017',\n", " '2018', '2019', '2020', '2021'],\n", " dtype='object')" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# see the column names\n", "outlays.columns" ] }, { "cell_type": "code", "execution_count": 63, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 5086 entries, 0 to 5085\n", "Data columns (total 73 columns):\n", "Agency Code 5086 non-null int64\n", "Agency Name 5086 non-null object\n", "Bureau Code 5086 non-null int64\n", "Bureau Name 5086 non-null object\n", "Account Code 5048 non-null float64\n", "Account Name 5086 non-null object\n", "Treasury Agency Code 4944 non-null float64\n", "Subfunction Code 5086 non-null int64\n", "Subfunction Title 5086 non-null object\n", "BEA Category 5086 non-null object\n", "Grant/non-grant split 5086 non-null object\n", "On- or Off- Budget 5086 non-null object\n", "1962 5086 non-null object\n", "1963 5086 non-null object\n", "1964 5086 non-null object\n", "1965 5086 non-null object\n", "1966 5086 non-null object\n", "1967 5086 non-null object\n", "1968 5086 non-null object\n", "1969 5086 non-null object\n", "1970 5086 non-null object\n", "1971 5086 non-null object\n", "1972 5086 non-null object\n", "1973 5086 non-null object\n", "1974 5086 non-null object\n", "1975 5086 non-null object\n", "1976 5086 non-null object\n", "TQ 5086 non-null object\n", "1977 5086 non-null object\n", "1978 5086 non-null object\n", "1979 5086 non-null object\n", "1980 5086 non-null object\n", "1981 5086 non-null object\n", "1982 5086 non-null object\n", "1983 5086 non-null object\n", "1984 5086 non-null object\n", "1985 5086 non-null object\n", "1986 5086 non-null object\n", "1987 5086 non-null object\n", "1988 5086 non-null object\n", "1989 5086 non-null object\n", "1990 5086 non-null object\n", "1991 5086 non-null object\n", "1992 5086 non-null object\n", "1993 5086 non-null object\n", "1994 5086 non-null object\n", "1995 5086 non-null object\n", "1996 5086 non-null object\n", "1997 5086 non-null object\n", "1998 5086 non-null object\n", "1999 5086 non-null object\n", "2000 5086 non-null object\n", "2001 5086 non-null object\n", "2002 5086 non-null object\n", "2003 5086 non-null object\n", "2004 5086 non-null object\n", "2005 5086 non-null object\n", "2006 5086 non-null object\n", "2007 5086 non-null object\n", "2008 5086 non-null object\n", "2009 5086 non-null object\n", "2010 5086 non-null object\n", "2011 5086 non-null object\n", "2012 5086 non-null object\n", "2013 5086 non-null object\n", "2014 5086 non-null object\n", "2015 5086 non-null object\n", "2016 5086 non-null object\n", "2017 5086 non-null object\n", "2018 5086 non-null object\n", "2019 5086 non-null object\n", "2020 5086 non-null object\n", "2021 5086 non-null object\n", "dtypes: float64(2), int64(3), object(68)\n", "memory usage: 2.8+ MB\n" ] } ], "source": [ "# see datatypes and high-level view of missing data\n", "# watch points:\n", "# - is the number of rows in the ballpark?\n", "# - are there codes coming in as numeric datatypes?\n", "# - are there columns with unexpected null values?\n", "# - are any of these column names going to be annoying?\n", "# - NumPy loves floats! Are you planning to do any math?\n", "\n", "outlays.info()" ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Agency CodeBureau CodeAccount CodeTreasury Agency CodeSubfunction Code
count5086.0005086.0005048.0004944.0005086.000
mean106.27719.283166440.52944.419484.108
std207.34225.214285555.50532.740249.309
min1.0000.0000.0000.00051.000
25%9.0000.000566.00014.000302.000
50%15.0009.0004082.50028.000452.000
75%27.00030.000271715.00075.000751.000
max930.00099.000997200.00099.000959.000
\n", "
" ], "text/plain": [ " Agency Code Bureau Code Account Code Treasury Agency Code \\\n", "count 5086.000 5086.000 5048.000 4944.000 \n", "mean 106.277 19.283 166440.529 44.419 \n", "std 207.342 25.214 285555.505 32.740 \n", "min 1.000 0.000 0.000 0.000 \n", "25% 9.000 0.000 566.000 14.000 \n", "50% 15.000 9.000 4082.500 28.000 \n", "75% 27.000 30.000 271715.000 75.000 \n", "max 930.000 99.000 997200.000 99.000 \n", "\n", " Subfunction Code \n", "count 5086.000 \n", "mean 484.108 \n", "std 249.309 \n", "min 51.000 \n", "25% 302.000 \n", "50% 452.000 \n", "75% 751.000 \n", "max 959.000 " ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# math! note that our dollar amts aren't being treated as numeric values\n", "outlays.describe()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Fixin' to do\n", "\n", "* Override data types\n", "* Retain leading zeroes\n", "* Skip header and footer rows\n", "* Convert whitespaces to null/NaN\n", "* Use your own column names\n", "* Exclude unwanted columns\n", "* Ignore blank rows (or not!)\n", "* [And more](http://pandas.pydata.org/pandas-docs/stable/io.html \"Pandas IO Tools\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Step 2A: Get the Data Again\n", "\n", "Yeah, you'll probably have to do that." ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "count 5086.000\n", "mean 48310.506\n", "std 924126.826\n", "min -8294670.000\n", "25% 0.000\n", "50% 0.000\n", "75% 0.000\n", "max 42955698.000\n", "Name: 1973, dtype: float64" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# some better data types\n", "# - force selected columns to string/object\n", "# - thousands parameter ensures incoming numbers w/ commas are not interpreted as string\n", "\n", "outlays = pd.read_csv(\n", " 'data/outlays.csv',\n", " thousands=',',\n", " dtype={'Agency Code': str,\n", " 'Bureau Code': str,\n", " 'Account Code': str,\n", " 'Treasury Agency Code': str,\n", " 'Subfunction Code': str}\n", " )\n", "# dollar amounts now treated as numeric\n", "outlays['1973'].describe()" ] }, { "cell_type": "code", "execution_count": 66, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
fiscal_yeargdp_billionsdeflator
0194098.2000.081
11941116.2000.084
21942147.7000.090
31943184.6000.096
41944213.8000.100
\n", "
" ], "text/plain": [ " fiscal_year gdp_billions deflator\n", "0 1940 98.200 0.081\n", "1 1941 116.200 0.084\n", "2 1942 147.700 0.090\n", "3 1943 184.600 0.096\n", "4 1944 213.800 0.100" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# take 2: deflators\n", "# - only take the 1st 3 columns\n", "# - specify better header names\n", "# - ignore header rows in spreadsheet\n", "# - ignore the footer row\n", "\n", "deflators = pd.read_excel('data/hist10z1.xls', \n", " parse_cols=2,\n", " names=['fiscal_year', 'gdp_billions', 'deflator'],\n", " header=None,\n", " skiprows=5,\n", " skip_footer=1\n", " )\n", "deflators.head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Step 3: Clean\n", "\n", "The annoying stuff." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## For example\n", "\n", "* [Renaming](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rename.html \"pandas renaming\")\n", "* [Handling missing values](http://pandas.pydata.org/pandas-docs/stable/missing_data.html \"pandas: working with missing data\")\n", "* Delete unecessary data\n", "* [Clean up strings](http://pandas.pydata.org/pandas-docs/stable/text.html \"working with strings\")\n", "* [Tidy data](https://www.jstatsoft.org/article/view/v059i10 \"Tidy Data by Hadley Wickham\")\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Clean up outlays" ] }, { "cell_type": "code", "execution_count": 67, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "Index(['agency_code', 'agency_name', 'bureau_code', 'bureau_name',\n", " 'account_code', 'account_name', 'treasury_agency_code',\n", " 'subfunction_code', 'subfunction_name', 'bea_category', 'grant_split',\n", " 'on_off_budget', '1962', '1963', '1964', '1965', '1966', '1967', '1968',\n", " '1969', '1970', '1971', '1972', '1973', '1974', '1975', '1976', 'TQ',\n", " '1977', '1978', '1979', '1980', '1981', '1982', '1983', '1984', '1985',\n", " '1986', '1987', '1988', '1989', '1990', '1991', '1992', '1993', '1994',\n", " '1995', '1996', '1997', '1998', '1999', '2000', '2001', '2002', '2003',\n", " '2004', '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012',\n", " '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021'],\n", " dtype='object')" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Rename some columns\n", "\n", "outlays = outlays.rename(columns = {\n", " 'Agency Code': 'agency_code',\n", " 'Agency Name': 'agency_name',\n", " 'Bureau Code': 'bureau_code',\n", " 'Bureau Name': 'bureau_name',\n", " 'Account Code': 'account_code',\n", " 'Account Name': 'account_name',\n", " 'Treasury Agency Code': 'treasury_agency_code',\n", " 'Subfunction Code': 'subfunction_code',\n", " 'Subfunction Title': 'subfunction_name',\n", " 'BEA Category': 'bea_category',\n", " 'Grant/non-grant split': 'grant_split',\n", " 'On- or Off- Budget': 'on_off_budget'\n", " })\n", "outlays.columns" ] }, { "cell_type": "code", "execution_count": 68, "metadata": { "collapsed": true, "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "# Extraneous information\n", "\n", "del outlays['TQ']" ] }, { "cell_type": "code", "execution_count": 69, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " account_code treasury_agency_code 2012\n", "0 NaN NaN 0\n", "1 NaN NaN 0\n", " account_code treasury_agency_code 2012\n", "0 missing missing 0\n", "1 missing missing 0\n" ] } ], "source": [ "# Missing values\n", "\n", "print(outlays[['account_code', 'treasury_agency_code', '2012']].head(2))\n", "outlays = outlays.fillna({\n", " 'account_code': 'missing',\n", " 'treasury_agency_code': 'missing',\n", " '1962': 0 # not needed here, but included as an example\n", " })\n", "print(outlays[['account_code', 'treasury_agency_code', '2012']].head(2))" ] }, { "cell_type": "code", "execution_count": 70, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Receipts, Central fiscal operations'\n", " 'Charges for services to trust funds' 'Senate' ...,\n", " 'Interest, Miscellaneous Trust Funds, Government-wide'\n", " 'Interest, Other DOD Trust Funds'\n", " 'Miscellaneous Unconverted Offsetting Receipts']\n", "['receipts, central fiscal operations'\n", " 'charges for services to trust funds' 'senate' ...,\n", " 'interest, miscellaneous trust funds, government-wide'\n", " 'interest, other dod trust funds'\n", " 'miscellaneous unconverted offsetting receipts']\n", "['receipts, central fiscal operations'\n", " 'charges for services to trust funds' 'senate' ...,\n", " 'interest, miscellaneous trust funds, government-wide'\n", " 'interest, other DOD trust funds'\n", " 'miscellaneous unconverted offsetting receipts']\n" ] } ], "source": [ "# string clean-up\n", "print(pd.unique(outlays.account_name.ravel()))\n", "outlays.account_name = outlays.account_name.str.strip().str.lower()\n", "print(pd.unique(outlays.account_name.ravel()))\n", "\n", "# oops capitalize DOD again\n", "outlays.account_name = outlays.account_name.str.replace(' dod ', ' DOD ') # regex works too\n", "print(pd.unique(outlays.account_name.ravel()))\n", "\n", "outlays.subfunction_title = outlays.account_name.str.strip().str.lower()\n", "outlays.bea_category = outlays.bea_category.str.lower()" ] }, { "cell_type": "code", "execution_count": 71, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "2423" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# get discretionary spending only\n", "\n", "disc = outlays[outlays['bea_category'] == 'discretionary']\n", "# gut check number of rows\n", "len(disc.index)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Clean up deflators" ] }, { "cell_type": "code", "execution_count": 72, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['1940' '1941' '1942' '1943' '1944' '1945' '1946' '1947' '1948' '1949'\n", " '1950' '1951' '1952' '1953' '1954' '1955' '1956' '1957' '1958' '1959'\n", " '1960' '1961' '1962' '1963' '1964' '1965' '1966' '1967' '1968' '1969'\n", " '1970' '1971' '1972' '1973' '1974' '1975' '1976' 'TQ' '1977' '1978' '1979'\n", " '1980' '1981' '1982' '1983' '1984' '1985' '1986' '1987' '1988' '1989'\n", " '1990' '1991' '1992' '1993' '1994' '1995' '1996' '1997' '1998' '1999'\n", " '2000' '2001' '2002' '2003' '2004' '2005' '2006' '2007' '2008' '2009'\n", " '2010' '2011' '2012' '2013' '2014' '2015' '2016 estimate' '2017 estimate'\n", " '2018 estimate' '2019 estimate' '2020 estimate' '2021 estimate']\n", "\n", "['1940' '1941' '1942' '1943' '1944' '1945' '1946' '1947' '1948' '1949'\n", " '1950' '1951' '1952' '1953' '1954' '1955' '1956' '1957' '1958' '1959'\n", " '1960' '1961' '1962' '1963' '1964' '1965' '1966' '1967' '1968' '1969'\n", " '1970' '1971' '1972' '1973' '1974' '1975' '1976' 'TQ' '1977' '1978' '1979'\n", " '1980' '1981' '1982' '1983' '1984' '1985' '1986' '1987' '1988' '1989'\n", " '1990' '1991' '1992' '1993' '1994' '1995' '1996' '1997' '1998' '1999'\n", " '2000' '2001' '2002' '2003' '2004' '2005' '2006' '2007' '2008' '2009'\n", " '2010' '2011' '2012' '2013' '2014' '2015' '2016' '2017' '2018' '2019'\n", " '2020' '2021']\n" ] } ], "source": [ "# clean up year column\n", "# - get rid of the \"estimate\" text by applying a function to the fiscal_year Series\n", "\n", "print(deflators.fiscal_year.unique())\n", "deflators.fiscal_year = deflators.fiscal_year.map(lambda x: x[:4])\n", "print('\\n{}'.format(deflators.fiscal_year.unique()))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Step 3A: Tidy Data \n", "\n", "> \n", "> **_Datasets are often constructed in bizarre ways_**\n", "> \n", "\n", "\\- Hadley Wickham, in his 2014 paper _Tidy Data_ ([Journal of Statistical Software, August 2014](https://www.jstatsoft.org/article/view/v059i10))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Principles of Tidy Data\n", "\n", "* each variable is a column\n", "* each observation is a row\n", "* each type of observational unit is a table\n", "* easy to model, manipulate, and visualize" ] }, { "cell_type": "code", "execution_count": 73, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
agency_codeagency_namebureau_codebureau_nameaccount_codeaccount_nametreasury_agency_codesubfunction_codesubfunction_namebea_categorygrant_spliton_off_budgetfiscal_yearamount
0001Legislative Branch00Legislative Branchmissingreceipts, central fiscal operationsmissing803Central fiscal operationsmandatoryNongrantOn-budget1962-628
1001Legislative Branch00Legislative Branchmissingreceipts, central fiscal operationsmissing908Other interestnet interestNongrantOn-budget19620
2001Legislative Branch00Legislative Branch241400charges for services to trust fundsmissing803Central fiscal operationsmandatoryNongrantOn-budget19620
3001Legislative Branch05Senate0000senate00801Legislative functionsdiscretionaryNongrantOn-budget196226946
4001Legislative Branch05Senate0100compensation of members, senate00801Legislative functionsmandatoryNongrantOn-budget19620
\n", "
" ], "text/plain": [ " agency_code agency_name bureau_code bureau_name \\\n", "0 001 Legislative Branch 00 Legislative Branch \n", "1 001 Legislative Branch 00 Legislative Branch \n", "2 001 Legislative Branch 00 Legislative Branch \n", "3 001 Legislative Branch 05 Senate \n", "4 001 Legislative Branch 05 Senate \n", "\n", " account_code account_name treasury_agency_code \\\n", "0 missing receipts, central fiscal operations missing \n", "1 missing receipts, central fiscal operations missing \n", "2 241400 charges for services to trust funds missing \n", "3 0000 senate 00 \n", "4 0100 compensation of members, senate 00 \n", "\n", " subfunction_code subfunction_name bea_category grant_split \\\n", "0 803 Central fiscal operations mandatory Nongrant \n", "1 908 Other interest net interest Nongrant \n", "2 803 Central fiscal operations mandatory Nongrant \n", "3 801 Legislative functions discretionary Nongrant \n", "4 801 Legislative functions mandatory Nongrant \n", "\n", " on_off_budget fiscal_year amount \n", "0 On-budget 1962 -628 \n", "1 On-budget 1962 0 \n", "2 On-budget 1962 0 \n", "3 On-budget 1962 26946 \n", "4 On-budget 1962 0 " ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# tidy the outlays (aka spending) dataset\n", "# - variables = agency, bureau, account, year, subfunction, etc.\n", "# - observation = dollar amount\n", "\n", "# list of dataframe columns to use as identifier variables\n", "# (all columns not listed will be unpivoted and added an additional identifier columm)\n", "variables = ['agency_code', 'agency_name', 'bureau_code', 'bureau_name', 'account_code', 'account_name', 'treasury_agency_code', 'subfunction_code', 'subfunction_name', 'bea_category', 'grant_split', 'on_off_budget'] \n", "outlays = pd.melt(outlays,\n", " id_vars=variables,\n", " # name of the new, unpivoted identifier column\n", " var_name='fiscal_year',\n", " # name of the value column for the unpivoted data\n", " value_name='amount'\n", " )\n", "\n", "outlays.head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Useful Pandas Tidy-ing Functions\n", "\n", "* [melt](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.melt.html \"pandas.melt\")\n", "* [stack](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.stack.html \"pandas.DataFrame.stack\")\n", "* [unstack](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.unstack.html \"pandas.DataFrame.unstack\")\n", "* [pivot_table](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.pivot_table.html \"pandas.pivot_table\")\n", "\n", "See also:\n", "* Pandas docs: [Reshaping and pivot tables](http://pandas.pydata.org/pandas-docs/stable/reshaping.html \"Pandas: reshaping and pivot tables\")\n", "* [Tidy Data in Python](https://www.ibm.com/developerworks/community/blogs/jfp/entry/Tidy_Data_In_Python?lang=en \"Tidy Data in Python\")\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Step 4: Fun Stuff" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Calculating New Columns\n", "\n", "We need inflation-adjusted amounts to look at spending trends over time." ] }, { "cell_type": "code", "execution_count": 74, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
agency_codeagency_namebureau_codebureau_nameaccount_codeaccount_nametreasury_agency_codesubfunction_codesubfunction_namebea_categorygrant_spliton_off_budgetfiscal_yearamountgdp_billionsdeflator
0001Legislative Branch00Legislative Branchmissingreceipts, central fiscal operationsmissing803Central fiscal operationsmandatoryNongrantOn-budget1962-628586.9000.178
1001Legislative Branch00Legislative Branchmissingreceipts, central fiscal operationsmissing908Other interestnet interestNongrantOn-budget19620586.9000.178
2001Legislative Branch00Legislative Branch241400charges for services to trust fundsmissing803Central fiscal operationsmandatoryNongrantOn-budget19620586.9000.178
3001Legislative Branch05Senate0000senate00801Legislative functionsdiscretionaryNongrantOn-budget196226946586.9000.178
4001Legislative Branch05Senate0100compensation of members, senate00801Legislative functionsmandatoryNongrantOn-budget19620586.9000.178
\n", "
" ], "text/plain": [ " agency_code agency_name bureau_code bureau_name \\\n", "0 001 Legislative Branch 00 Legislative Branch \n", "1 001 Legislative Branch 00 Legislative Branch \n", "2 001 Legislative Branch 00 Legislative Branch \n", "3 001 Legislative Branch 05 Senate \n", "4 001 Legislative Branch 05 Senate \n", "\n", " account_code account_name treasury_agency_code \\\n", "0 missing receipts, central fiscal operations missing \n", "1 missing receipts, central fiscal operations missing \n", "2 241400 charges for services to trust funds missing \n", "3 0000 senate 00 \n", "4 0100 compensation of members, senate 00 \n", "\n", " subfunction_code subfunction_name bea_category grant_split \\\n", "0 803 Central fiscal operations mandatory Nongrant \n", "1 908 Other interest net interest Nongrant \n", "2 803 Central fiscal operations mandatory Nongrant \n", "3 801 Legislative functions discretionary Nongrant \n", "4 801 Legislative functions mandatory Nongrant \n", "\n", " on_off_budget fiscal_year amount gdp_billions deflator \n", "0 On-budget 1962 -628 586.900 0.178 \n", "1 On-budget 1962 0 586.900 0.178 \n", "2 On-budget 1962 0 586.900 0.178 \n", "3 On-budget 1962 26946 586.900 0.178 \n", "4 On-budget 1962 0 586.900 0.178 " ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# merge discretionary outlays and deflators dataframes\n", "# - by default Pandas will use like column names as the merge key\n", "# - this is very intuitive for SQL nerds\n", "\n", "outlays = outlays.merge(deflators, how='left')\n", "outlays.head()" ] }, { "cell_type": "code", "execution_count": 75, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
fiscal_yearamountamount_adj
01962-628-3522.154
1196200.000
2196200.000
3196226946151127.314
4196200.000
\n", "
" ], "text/plain": [ " fiscal_year amount amount_adj\n", "0 1962 -628 -3522.154\n", "1 1962 0 0.000\n", "2 1962 0 0.000\n", "3 1962 26946 151127.314\n", "4 1962 0 0.000" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Add calculated column to represent spending amount in 2009 dollars\n", "\n", "outlays['amount_adj'] = outlays.amount / outlays.deflator\n", "outlays[['fiscal_year', 'amount', 'amount_adj']].head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Grouping\n", "\n", "Summarize spending by category." ] }, { "cell_type": "code", "execution_count": 76, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
budget_function_desc
budget_function_code
010national defense
150international affairs
250general science, space, and technology
270energy
300natural resources and environment
\n", "
" ], "text/plain": [ " budget_function_desc\n", "budget_function_code \n", "010 national defense\n", "150 international affairs\n", "250 general science, space, and technology\n", "270 energy\n", "300 natural resources and environment" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# get a list of budget function codes from sqlite & tweak to use as a lookup\n", "\n", "from sqlalchemy import create_engine\n", "db_engine = create_engine('sqlite:///data/pandas_data_munging.db')\n", "functions = pd.read_sql_query(\n", " 'SELECT budget_function_code, budget_function_desc FROM budget_function', db_engine\n", ")\n", "\n", "# index/sort the dataframe by budget_function_code\n", "functions = functions.set_index(keys='budget_function_code').sort_index()\n", "functions.head()" ] }, { "cell_type": "code", "execution_count": 77, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
subfunction_namefunction_name
305155Interest received by on-budget trust fundsnet interest
305156Employer share, employee retirement (on-budget)undistributed offsetting receipts
305157Interest received by on-budget trust fundsnet interest
305158Interest received by on-budget trust fundsnet interest
305159Deductions for offsetting receiptsgeneral government
\n", "
" ], "text/plain": [ " subfunction_name \\\n", "305155 Interest received by on-budget trust funds \n", "305156 Employer share, employee retirement (on-budget) \n", "305157 Interest received by on-budget trust funds \n", "305158 Interest received by on-budget trust funds \n", "305159 Deductions for offsetting receipts \n", "\n", " function_name \n", "305155 net interest \n", "305156 undistributed offsetting receipts \n", "305157 net interest \n", "305158 net interest \n", "305159 general government " ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# use the budget function information from sqllite to add this\n", "# higher-level category to our discretionary spending dataframe\n", "# (might not be the best approach but demos a few Pandas things)\n", "\n", "def get_budget_function(sf):\n", " i = functions.index.get_loc(sf, method='ffill')\n", " return functions.iloc[i].budget_function_desc\n", "\n", "subfunctions = pd.DataFrame(\n", " outlays.subfunction_code.unique(), \n", " columns=['subfunction_code']\n", ")\n", "subfunctions['function_name'] = subfunctions.subfunction_code.map(\n", " get_budget_function)\n", "outlays = outlays.merge(subfunctions, how='left')\n", "outlays[['subfunction_name', 'function_name']].tail()" ] }, { "cell_type": "code", "execution_count": 78, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
amount_adj
fiscal_yearfunction_name
2021income security482216528.388
international affairs42545143.973
medicare610920774.361
national defense498161704.897
natural resources and environment37465430.291
net interest467040832.927
social security999925166.748
transportation105579144.298
undistributed offsetting receipts-84646982.268
veterans benefits and services169685212.299
\n", "
" ], "text/plain": [ " amount_adj\n", "fiscal_year function_name \n", "2021 income security 482216528.388\n", " international affairs 42545143.973\n", " medicare 610920774.361\n", " national defense 498161704.897\n", " natural resources and environment 37465430.291\n", " net interest 467040832.927\n", " social security 999925166.748\n", " transportation 105579144.298\n", " undistributed offsetting receipts -84646982.268\n", " veterans benefits and services 169685212.299" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# group spending by year and category\n", "grouped = outlays.groupby(['fiscal_year', 'function_name'])\n", " \n", "grouped.agg({'amount_adj': 'sum'}).tail(10)" ] }, { "cell_type": "code", "execution_count": 79, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAfkAAAFhCAYAAABzg9PKAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzs3Xl4VOX5PvB7lkyWmewZkpCQBUhYwhICCgiIgoisgqzR\nAhWwaou2orYu3yqtlqXgVhRUbKul/RUkbgQVKYtg2QkkIYEECEkgkH2fPZPz/v6IHUUgzEAmk8zc\nn+vyCjNnzswzjwN3zpn3vK9MCCFAREREbkfu6gKIiIjIORjyREREboohT0RE5KYY8kRERG6KIU9E\nROSmGPJERERuym1CPisrC/PmzWv1MStWrMDMmTMxe/ZsZGRktFNlRERErqF0dQFtYcOGDdi6dSt8\nfX2v+5i8vDycOHECW7ZsQXFxMZYuXYpPP/20HaskIiJqX25xJB8TE4O1a9fabufn52PevHmYN28e\nnnjiCTQ2NqJLly7w8fGBxWKBTqeDUukWv98QERFdl1uE/Pjx468I7d///vd4+eWXsXHjRtx55534\n4IMPoFQqIZfLMWHCBDz88MNYuHChCysmIiJyPrc8nC0oKMAf/vAHAEBTUxPi4uLw+eefIywsDH/9\n61+h1+vx4IMPIjk5GRERES6uloiIyDncMuTj4+OxatUqdO3aFRkZGaisrITJZIKfnx8UCgXUajVU\nKhUMBoOrSyUiInIatwz5ZcuW4Xe/+x2sVitkMhn+9Kc/ISYmBsePH8fcuXPR3NyMKVOmoHv37q4u\nlYiIyGlkXIWOiIjIPbnFwDsiIiK6GkOeiIjITXX67+QrKxtdXUK7Cg72Q20tBwzai/1yDPtlP/bK\nMeyXY1rrl1brb/fz8Ei+k1EqFa4uoVNhvxzDftmPvXIM++WYtuoXQ56IiMhNMeSJiIjcFEOeiIjI\nTTHkiYiI3BRDnoiIyE0x5ImIiNwUQ56IiMhNMeSJiIjcFEOeiIjITTHkiYiI3BRDnoiIqJOoaTA5\n9HiGPBERUSfQaLBg5b+OO7QPQ56IiKiDszZLeOezHFTV80ieiIjIbQghsPGbfJy5WIchvbQO7cuQ\nJyIi6sD+c/QivssuRWy4PxZN7uvQvgx5IiKiDiq7oBqb95xDoEaFJ2b0h7eXY+vMM+SJiIg6oEtV\nery3NQdKhRxPPDAAIQE+Dj8HQ56IiKiDadBbsDYtG0ZzMx6e2Bvduwbc1PMo27guIiIiugXWZglv\nfHQUFXVGTL4jDsP6Rtz0c/FInoiIqIMQQuBf/zmDkwVVGNxLi2mj4m/p+RjyREREHcTOYyXYm3kZ\n3aMCsXhSX8hlslt6Pqedrm9qasILL7yAS5cuwWKx4PHHH8fYsWNt23fv3o133nkHSqUSM2bMwOzZ\nsyFJEpYtW4b8/HyoVCq8+uqriI2NdVaJREREHUZ2QRU27T6LQLUK//fwUMBqveXndFrIb926FUFB\nQVi9ejXq6uowbdo0W8g3NTVhxYoVSEtLg6+vL1JTUzFmzBgcP34cFosFmzdvRmZmJlauXIn169c7\nq0QiIqIOoaRCh3e/yIVSIceTMwdAG+yLysrGW35ep4X8fffdh/HjxwNo+Y5Bofjh2r6CggLExMQg\nMDAQADB48GAcPXoUmZmZGDVqFAAgOTkZOTk5ziqPiIioQ2jQW/BWWjZMlmY8dn8S4iNvbiT9tTgt\n5NVqNQBAp9PhySefxG9+8xvbNp1OB39//yseq9PpoNPpoNFobPcrFApYrVYoldcvMzjYD0qlY5MD\ndHZarf+NH0Q27Jdj2C/7sVeOYb+uZmlqxp//fQLVDSY8dF9vTLqzp21bW/TLqZfQlZaW4le/+hUe\nfPBBTJkyxXa/RqOBXq+33dbr9fD397/qfkmSWg14AKitNbR94R2YVuvfJqdwPAX75Rj2y37slWPY\nr6sJIbBh2ynkFddiWN9wjBkYaetRa/1yJPydNrq+qqoKCxcuxLPPPouZM2desa1Hjx4oLi5GXV0d\nLBYLjh07hkGDBiElJQX79u0DAGRmZiIxMdFZ5REREbnUtgNFOJRbjh5RAXh4Ym/IbnEk/bU47Uj+\n3XffRUNDA9atW4d169YBAGbNmgWj0Yg5c+bgueeew6JFiyCEwIwZMxAeHo5x48Zh//79mDt3LoQQ\nWL58ubPKIyIicpmjeRX47LtChAZ4Y8kDA+DlpK+dZUII4ZRnbieedvqHp7wcw345hv2yH3vlGPbr\nB4WlDVj5r+OQy2V48WeDEd1Fc9VjOvzpeiIiIrqStVnC+1tzYW2W8NjUpGsGfFtiyBMREbWTXRkl\nKK81YsygaAzsGeb012PIExERtYMGgwVb9xdB7aPE/bc4J729GPJERETt4PPvCmE0WzF1ZDw0vl7t\n8poMeSIiIie7WKHD3sxLiAz1w92DotrtdRnyRERETiSEwKZdZyEEMHdsApSK9otehjwREZETZZ6t\nwuniWvTvHor+3UPb9bUZ8kRERE7SZJWwefc5yGUyzB3b88Y7tDGGPBERkZPsyihBRZ0RY1KiEBmq\nbvfXZ8gTERE5QYPegvQDhVD7KDF1ZPtcMvdTDHkiIiIn+HTfeRjNzZg2qnu7XTL3Uwx5IiKiNnah\nvBHfZV1G1zA17hrU1WV1MOSJiIjakO2SOQBzx/aEQu66qGXIExERtaGdx0qQd6EOA3uEol98+14y\n91MMeSIiojZyMKcM/951FoFqFR4al+jqchjyREREbSHrXBX++uVp+HkrsXROMsKCfF1dEkOeiIjo\nVp25WId1n+dAqZDh17MGoJuT14m3F0OeiIjoFlwob8RbadmQJIFfTu+HhOggV5dkw5AnIiK6SRW1\nBrz+cRZMZisWTeqDAT3CXF3SFRjyREREN6G20Yw1mzLRoLfgwXGJGJYU4eqSrsKQJyIicpDe1ITX\nP85EVb0J94+Mx9jB0a4u6ZoY8kRERA4wNzXjrS3ZuFSpx9iUaEwdEefqkq6LIU9ERGSnZknCe1/k\n4tylegztG47UcQmQyWSuLuu6GPJERER2EELgH9vzkXmuCklxwVg0qQ/kHTjgAYY8ERGRXT77rhDf\nZZciNsIfv5zeH0pFx4/Qjl8hERGRi+0+XoJtB4rQJcgXT80aCF9vpatLsgtDnoiIqBXH8irwrx1n\nEODnhaVzBiJArXJ1SXZjyBMREV1HXnEt3k/PhUqlwFOzk9El2M/VJTmEIU9ERHQNF8obsfbTbAgB\nLHmgP2Ij/F1dksMY8kRERD9RXW/CG1uyYDQ3Y/HkvkiKC3F1STfFqSMHsrKysGbNGmzcuNF2X2Vl\nJZYuXWq7ffr0aTz99NNITU3F9OnTodG0rNwTHR2NFStWOLM8IiKia9q8+yzqdRbMHdMTQ/uGu7qc\nm+a0kN+wYQO2bt0KX98r19PVarW20D9x4gTeeOMNzJ49G2azGUKIK34hICKijq+wtAEb0k9hxuju\nGNyri6vLuWUFl+txLL8SPboGYNxt3Vxdzi1x2un6mJgYrF279rrbhRB45ZVXsGzZMigUCuTl5cFo\nNGLhwoWYP38+MjMznVUaERG1oYz8SpTVGLDu8xx8m3nJoX1rG81I+7YA5TUGJ1XnGCEEtuwpAADM\nurtnh57Nzh5OO5IfP348SkpKrrt99+7dSEhIQPfu3QEAPj4+WLRoEWbNmoWioiI88sgj2L59O5TK\n1ksMDvaDUqlo09o7Oq228w3+cCX2yzHsl/3YqxaXvw9oja8K/9ieDwkyzL4n8aqA/Gm/jp0uxxv/\nPo4GvQUnzlbh9adGQ+Pr1W51X8vRU2U4c7EOt/eNwIgU1x7Ft8Xny2VX82/duhXz58+33Y6Pj0ds\nbCxkMhni4+MRFBSEyspKREZGtvo8tbUd47e/9qLV+qOystHVZXQa7Jdj2C/7sVcthBA4d7EO2iAf\nPDU7Ga9tysQ/t+ehtFKH1HsSbNO+/rhf1mYJn+47j+2HL0CpkKFffAhyCmuw4u+H8eTMAS6bKlaS\nBP66NQcyGTBleIxL//+29vlyJPxdNro+JycHKSkptttpaWlYuXIlAKC8vBw6nQ5ardZV5RERkR2q\nG0zQGZsQGxGAiBA/vDBvMKK0auzKKMGG9FOwNktXPL6qzohV/zqO7YcvoEuwL16cNwS/mTUQSfEh\nyC6oRvr+Ite8EQD7c0pxqVKPEf0jEaXVuKyOttRuIZ+eno7NmzcDAGpqaqDRaK44lTNz5kw0NjYi\nNTUVTz31FJYvX37DU/VERORaxWUtR5tx319DHuzvjeceSkHP6EAcPlWOt9KyYbJYAQAZ+RVY9vej\nKLjcgGF9w/Hyz29DbIQ/5HIZHp2ahLBAH3zx30Jknqtq9/dhaWrG598Vwkspx7SR8e3++s4iE0II\nVxdxKzztdBlPETqG/XIM+2U/9qrFJ3sL8OXBYjw9JxlJ8T9cS25uasb6z3OQXVCN+MgA9I4PwdcH\niqBSyvHQuESMHBB51Xf2xWWNWP7PDCgVcrz08yEIb8fZ5b4+VIwt3xZg4rBYzLyrR7u97vV0+tP1\nRETU+f3vSP6ns8F5eymw5IH+uKNfBApLG/D1gSJEhanx+wVDMGpg12uOWo+N8MeC+3rBaLbi7U9P\n2s4AOJvO2IQvDxZD7aPExGEx7fKa7YXnw4mI6KYIIVBU1oiwQJ9rjopXKuRYOKkPIkP9IGRyjBsc\nBW+v1q+GuqNfJAovN2LX8RJ8+HUeHp2a5PTL2L48WASD2Yo5Y3rCz8e1o/vbGkOeiIhuSm2jGTpj\nE3rFBF33MXKZDJOGxzn09cacsT1RXNGII6crEB8ZgPG3O+/ouqreiF0ZJQgN8MGYlGinvY6r8HQ9\nERHdlKKfDLprK0qFHL+c1g+BahW27CnA6eLaNn3+H/v8u0JYmwUeuLM7vJTuF4nu946IiKhdFF3n\n+/i2EKTxxi+n94NMBrz7RQ5qG81t/hoXyhtxMKcM3bpoMDSp885P3xqGPBER3RTboLtw58z8lxAd\nhNljeqLR0IQN6bmQpLa7GEwIgbRvCyAAzLqrh8sm4HE2hjwRETlMCIHisgaEBnjD30/ltNe5Z3A0\nBiWEIe9CHdIPFLXZ835z5CJyCmuQFBd8xaV/7oYhT0REDqttNKPB0DLTnTPJZDIsnNQHoQE+2Lq/\nEHlt8P18zvlqbPn2HII0Kiya3LfTL0LTGoY8ERE5rLjced/H/5TaxwuP3p8EGWR4Lz0XDQbLTT9X\neY0B736RC4VcjiUPDECQxrsNK+14GPJEROSwn05n62w9owIxY3R31Oss+GDbKUg3MVmr0WzFXz7J\nhsFsxYL7eqF7V+eehegIGPJEROQwZ46sv57xQ2NaVqw7X4NvjlxwaF9JCGxIP4XSagPuva0bRvRv\nfYVTd8GQJyIihxWXNSIkwBsBThx091NymQyLJ/dFoEaFT/eeR8Glerv3/eK7loVv+sYFY9bdrp+b\nvr0w5ImIyCG1jWbU6y1Ou3SuNQFqFX4xJQmSJPDuF7nQm5puuM+xvAqkHyiCNsgHj93fDwq550Sf\n57xTIiJqE+056O5a+sQGY8qIOFQ3mPD3r/LQ2mKqJRU6/PXL0/D2UuCJGQOuOce+O+Pc9URE5JD2\nHnR3LVNHxOPMxTocP1OJP350DEFqFfx8vKD2VULj4wW1rxf8fJT4bN95mJua8avp/RGt1bisXldh\nyBMRkUN+WF7WdaPT5XIZHpmShLfSsnCxXIfiVo7mp46Iw+Be2nasruNgyBMRkUOKyhoQ7O+NQHX7\nDbq7lmB/byx7+HYIIWCyNENvbILeZIXO1ASDyQq9sQlqXy+PDXiAIU9ERA6o15lRp7MguWeYq0ux\nkclk8PVWwtdbiY5TVcfAgXdERGQ3V1wfTzePIU9ERHZz9ch6cgxDnoiI7NYRRtaT/RjyRERkt6Ky\nRgRqVG6/sIu7YMgTEZFdGvQW1DaaEeeCme7o5jDkiYjILhx01/kw5ImIyC4cdNf5MOSJiMguPwy6\nc/912N0FQ56IiOxSXNaAALUKQRrXznRH9mPIExHRDTUaLKhuMCMuwh8ymczV5ZCdGPJERHRDtkVp\nOLK+U2HIExHRDRVxEpxOyakhn5WVhXnz5l11/4cffohJkyZh3rx5mDdvHs6fPw9JkvDSSy9hzpw5\nmDdvHoqLi51ZGhEROYAj6zsnp61Ct2HDBmzduhW+vr5XbcvJycGqVavQr18/2307duyAxWLB5s2b\nkZmZiZUrV2L9+vXOKo+IiOwkhEBRaQP8/bwQ7M+Z7joTpx3Jx8TEYO3atdfclpubi/fffx+pqal4\n7733AAAZGRkYNWoUACA5ORk5OTnOKo2IiBxw8nw1qhvM6BsXwkF3nYzTjuTHjx+PkpKSa26bNGkS\nHnzwQWg0GixZsgR79uyBTqeDRqOxPUahUMBqtUKpbL3E4GA/KJWKNq29o9NqebrMEeyXY9gv+3lC\nr4QQ+Or/HQcA/Gxi31t6z57Qr7bUFv1yWshfjxACCxYsgL9/S/GjR4/GqVOnoNFooNfrbY+TJOmG\nAQ8AtbUGp9XaEWm1/qisbHR1GZ0G++UY9st+ntKr7IJqnLlQh8G9tFArZTf9nj2lX22ltX45Ev7t\nPrpep9Nh8uTJ0Ov1EELg8OHD6NevH1JSUrBv3z4AQGZmJhITE9u7NCIi+hEhBLbuLwQATB0R7+Jq\n6Ga025F8eno6DAYD5syZg6eeegrz58+HSqXC8OHDMXr0aEiShP3792Pu3LkQQmD58uXtVRoREV1D\nTmENzl9uwOBELbp10dx4B+pwZEII4eoiboWnnf7hKS/HsF+OYb/s5+69EkJg+cYMFFxuwLKHb0PM\nLU6C4+79amud9nQ9ERF1fLmFNSi43ICURO0tBzy5DkOeiIiuIITAF7bv4uNcWwzdEoY8ERFdIbeo\nBgWXGjAoIYxH8Z0cQ56IiGyEENj63yIAHFHvDhjyRERkc6qoFucu1WNQQhjnqXcDDHkiIgLw0+/i\neRTvDhjyREQEADhVXItzJfVI7smjeHfBkCciou+/i285ir9/JI/i3UW7z11PRETtTwiBAzllKKsx\nQJIEmiXR8lO0/DSarTjLo3i3w5AnIvIAn+w9j68OFbf6GIVcxqN4N8OQJyJyczuOXMBXh4oRHuyL\nBff1hpeXHAq5DHKZrOWnvOWnn48XNL5eri6X2hBDnojIjR3MKcOm3ecQqFHh6TnJCAvydXVJ1I44\n8I6IyE1lF1Tjb1+dhp+3Ek/PZsB7IoY8EZEbKrhUj3Wfn4RcLsOTMwcgmkvFeiSGPBGRm7lcpceb\nW7JgtQo8fn8/JHYLcnVJ5CIMeSIiN1LTYMJrmzOhN1nx8wm9kZwQ5uqSyIUY8kREbkJnbMJrmzNR\n22jGrLt6YOSASFeXRC7W6uj66dOn47PPPkPv3r0hk8kghLhiu6+vL2bPno3nn3/eqUUSEVHrmqzN\n+Msn2SitNmD87d1w39AYV5dEHUCrIf/ZZ58BAPLy8q65vbGxERMmTGDIExG5kBACf/86D+dK6nF7\nny6YdXdPyGQyV5dFHUCrIf/222+3uvOSJUvw8ccft2lBRETkmPQDRTiUW44eUQFYNKkP5Ax4+p5d\n38lnZ2djx44dkMvlUKlU2Lt3L86dOwcA6Nq1q1MLJCKi6ztyuhyff1eIsEAfPPHAAHgpFa4uiTqQ\nVo/klyxZAgCYO3cuNm/eDF/flokUFixYgPnz5zu/OiIiuq6CS/X4YNtp+KgUeHLmAASoVa4uiToY\nu47ka2trr/h+p6mpCXV1dU4rioiIWldVZ8TaT7LRLEl4fFo/RGs52Q1dza6562fNmoUZM2bgzjvv\nhCRJ+Pbbb3kkT0TkIkazFW99ko0GQxMeGpeI/t1DXV0SdVB2hfzixYsxbNgwHDlyBDKZDG+99RZ6\n9+7t7NqIiOgnmiUJ736Ri0uVeowdHI2xg6NdXRJ1YHadrrdYLCgtLUVISAiCg4ORm5uLt956y9m1\nERHRT2zadQ4nz1ejf/dQzB3b09XlUAdn15H8kiVLYDQaceHCBQwZMgRHjx5FcnKys2sjIqLvSUJg\ny55z2JVRgiitGo/dnwSFnJOWUuvs+oQUFhbiH//4B8aNG4fFixdjy5YtqKiocHZtREQEwNos4YNt\np/DNkYuIDPXDU7MGwtfbrmM08nB2hXxoaChkMhni4+ORn5+P8PBwWCwWZ9dGROTxTBYr3krLbpns\npmsAnv/ZYIQE+Li6LOok7PpVMCEhAa+88gpSU1PxzDPPoKKiAk1NTc6ujYjIozXoLXhzSxaKyhox\nsEcoHpvWD95enOyG7GdXyC9btgwnTpxAz5498cQTT+DgwYN47bXXAACVlZXQarXX3C8rKwtr1qzB\nxo0br7h/27Zt+Oijj6BQKJCYmIhly5ZBLpdj+vTp0GharvWMjo7GihUrbuW9ERF1WpV1Rry2ORMV\ntUaM7B+JBRN68Tt4cphdIa9QKDBkyBAAwNixYzF27Fjbtl/84he2hWx+bMOGDdi6dattlrz/MZlM\nePPNN5Geng5fX18sXboUe/bswciRIyGEuOoXAiIiT3OhvBFvfJyFer0Fk4bH4oE7u3PBGbopt/xr\n4U+Xn/2fmJgYrF279qr7VSoVNm3aZAt/q9UKb29v5OXlwWg0YuHChZg/fz4yMzNvtTQiok4nr7gW\nK/91HA16C1LvScCM0T0Y8HTTbnl45vU+fOPHj0dJSclV98vlcoSFhQEANm7cCIPBgBEjRuDMmTNY\ntGgRZs2ahaKiIjzyyCPYvn07lMrWSwwO9oPSwxZk0Gr9XV1Cp8J+OYb9sl9b9+pypQ5vf3YS1mYJ\nz/5sCEYNimrT53c1frYc0xb9csk1GJIkYfXq1SgsLMTatWttI/djY2Ntfw4KCkJlZSUiIyNbfa7a\nWkM7Vd0xaLX+qKxsdHUZnQb75Rj2y35t3SuzpRl/2ngMBpMViyf3Qe/oALf6f8HPlmNa65cj4e+S\nURwvvfQSzGYz1q1bZzttn5aWhpUrVwIAysvLodPprjugj4jInQgh8I9v8lBSqcfdg6JwR7/WD26I\n7HXLR/LX+07+p9LT02EwGNCvXz+kpaVhyJAhWLBgAQBg/vz5mDlzJp5//nmkpqZCJpNh+fLlNzxV\nT0TkDr49cQkHc8sRHxmAuWMTXF0OuRGZsCOl33vvPTz66KNX3Pf6669j6dKlOHbsmG3kvSt42ukf\nnvJyDPvlGPbLfm3Vq4LL9Vj5z+Pw9Vbi5Z/fhtBA95zohp8tx7TV6fpWD5XXrFmD6upq7N69G0VF\nRbb7rVYrsrOzsXTpUpcGPBFRZ9ZgsGDdZzmQJIFHpya5bcCT67Qa8vfeey8KCgpw6NAh3H777bb7\nFQoFfvWrXzm9OCIidyVJAu9vzUVtoxnT7+yOpPgQV5dEbqjVkB8wYAAGDBiAe+65B/7+vPSBiKit\nfP7f8zhVVIuBPUIxaXisq8shN2XXyLadO3di5cqVaGhoANAy2E4mk+H06dNOLY6IyB1lnq3CtgPF\n0Ab5YPGUvpBzshtyErtC/u2338bGjRuRmJjo7HqIiNxaRZ0RG7adgpdSjl9N7w+1j5erSyI3Ztd1\n8uHh4Qx4IqJbZG2WsGFrLoxmK352byJiwvk1KDmXXUfySUlJePLJJzFixAh4e3vb7p82bZrTCiMi\ncjfp+4tQcLkBQ/uGY2R/TnhDzmdXyOt0OqjV6qsWjWHIExHZ58zFOmw7WITQAB/Mu7cXF52hdmFX\nyHNddyKim2cwNWFDei4A4BdT+8LPh7N5Uvuw65M2ZsyYa/7WuWvXrjYviIjInQghsHHHGVQ3mDF1\nRBwSooNcXRJ5ELtCfuPGjbY/W61W/Oc//4HFYnFaUURE7uJgbhkOnypHj64BmDIiztXlkIexa3R9\nVFSU7b/Y2FgsXrwYO3fudHZtRESdWkWdEf/ccQY+KgUemZoEhdwlC3+SB7PrSP7o0aO2PwshcPbs\nWZjNZqcVRUTU2TVLLZfLmSzNWDy5D7oE+bq6JPJAdoX8X/7yF9ufZTIZgoODbWu/ExHR1X58udzw\npAhXl0MeyqHv5HU6HSRJQkBAgFOLIiLqzM5crEP6AV4uR65nV8hfvHgRTz31FC5evAghBLp27Yo3\n3ngD8fHxzq6PiKhTkITA2Yt1+O/JUhzLqwTAy+XI9ez69L300ktYvHgx7rvvPgDAV199hZdeeumK\nUfdERJ6oss6IAzll2H+yFFX1JgBAWKAP7h+ZyMvlyOXsCvna2lpbwAPAxIkTsX79eqcVRUTUkTVZ\nJew6egHbDxQi70IdAMDbS4ER/SMwsn8kEroFcWU56hDsCnmVSoXc3FwkJSUBAHJycuDry5GiROR5\nzlysw0fb81BabQAA9I4Jwoj+kRjcSwsfFU/NU8di1yfyhRdewBNPPIGgoCAIIVBfX4/XX3/d2bUR\nEXUYBpMVaXsL8O2JS5ABmDQiHnf2j4CWl8ZRB2ZXyCcnJ+Obb75BUVERJElCfHw8VCqVs2sjIuoQ\njp+pxD935KNOZ0FUmBo/n9Abw5KjUVnZ6OrSiFplV8ifP38eH3/8Merr66+4nwvXEJE7q2004//9\n5wwyzlRCqZBh+qh4TBgWC6WCM9dR52BXyC9ZsgQTJ05Er169nF0PEZHLGc1WHMgpw6f7zsNotiIh\nOhA/n9AbkaFqV5dG5BC7Qj4gIABLlixxdi1ERC5jbZaQc74GB3PLkHmuCk1WCb7eCswf3wt3Jnfl\naHnqlOwK+enTp+ONN97AsGHDoFT+sMttt93mtMKIiJxNCIGCSw04mFuGo3kV0BmbAAARIX4YnhSO\nUQO7Ikjj7eIqiW6eXSF/5MgRnDx5EsePH7fdJ5PJ8I9//MNphREROUttoxnfZV/G/pOlqKxrmcAm\nQK3CuCHdMLxfOGLD/TkVLbkFu0I+JycHO3bscHYtREROIwmB3MIafHviErLOVUMSAt5eCgxPisDw\npHD0iQvmUrDkduwK+cTEROTl5aF3797OroeIqE3V6y34b/Zl7M28bJt2NiZcg7sGRWFon3D4enMC\nG3Jfdi9Q88ADDyAsLAxeXl62+3ft2uW0woiIHCVJApV1Rlyq0uNSpQ5FZY3ILqhGsySg8pJj1IBI\n3DUoCnEWn4Y8AAAgAElEQVQRPB1PnsGukH/ttdewb98+6PV6REVFobm5GYcPH3Z2bURE16U3NaHw\ncgOKyxtxuUqPS1V6lFYb0GSVrnhctFaN0clRGJ4UwRXhyOPY9YlftWoVjEYjLly4gCFDhuDo0aNI\nTk6+4X5ZWVlYs2bNVavV7d69G++88w6USiVmzJiB2bNnQ5IkLFu2DPn5+VCpVHj11VcRGxt7c++K\niNxKsyShpEKP85frcf5yAwouN6CsxnDFY7yUcnQNVaNrmBpR2paf0WFqhAb68KidPJZdIV9YWIgd\nO3bgT3/6E2bMmIHf/va3+PWvf93qPhs2bMDWrVuvWsimqakJK1asQFpaGnx9fZGamooxY8bg+PHj\nsFgs2Lx5MzIzM7Fy5UqudEfkoYQQKK02IPNcFbILqlFU2gDLj47QfVQK9IkNRo+oAMRFBCBKq4Y2\n0BdyOcOc6MfsCvnQ0FDIZDLEx8cjPz8f06ZNg8ViaXWfmJgYrF27Fr/97W+vuL+goAAxMTEIDAwE\nAAwePBhHjx5FZmYmRo0aBaBlrvycnJybeT9E1ElJksC5S/XIPFuFE2crUV5rBADIZEBUmAbduwag\nR9cAdO8agMhQNQOdyA52hXxCQgJeeeUVpKam4plnnkFFRQWamppa3Wf8+PEoKSm56n6dTgd/f3/b\nbbVaDZ1OB51OB41GY7tfoVDAarVeMfkOEbkXk8WKU0W1yDxbhayCKjQaWv5d8fZSYHCiFskJYRjY\nMwwaX68bPBMRXYtdCbps2TKcOHECPXv2xBNPPIGDBw/itddeu6kX1Gg00Ov1ttt6vR7+/v5X3S9J\nkl0BHxzsB6VScVO1dFZarf+NH0Q27JdjnN2vsmo9jp0ux9FT5cg+VwVrc8tp+GB/b4wfFouhSREY\nmKCFyqvj/73mZ8sx7Jdj2qJfdoW8QqHAkCFDAABjx47F2LFjb/oFe/TogeLiYtTV1cHPzw/Hjh3D\nokWLIJPJsGfPHkycOBGZmZlITEy06/lqaw03fpAb0Wr9ubylA9gvxzjar2ZJwuniWhw9XQGTpRlq\nXy+ofZRQ+3z/8/vbkiSQU1iDrIJqXK764Zf5mC4aDOgZioE9wxAfGWCbH76+ruP/veZnyzHsl2Na\n65cj4d9u58LT09NhMBgwZ84cPPfcc1i0aBGEEJgxYwbCw8Mxbtw47N+/H3PnzoUQAsuXL2+v0ojI\nAUIIFJU14mBuGY6crkCDvvXxOT+mUsoxsEdLqA/oEYqQAB8nVkpEMiGEcHURt8LTfjPkb8OOYb8c\n01q/ymsNOJRbjkOnylH+/eVrGl8v3Na7C4YlhaNLkC90Jiv0xiboTU0wfP9nnckKa7OE3jFB6B0T\n3ClOw9uDny3HsF+O6XRH8kTU+ZibmnEsrwL7si7jbEk9gJbr0W/v0wXDkiLQLz4ESsUP870HcsU2\nog6FIU9EVykua8S+7Ms4lFsOo9kKAOgTG4w7+kUgJVHL+d6JOgn+TSUiAIDRbMXXB4vw5X/Po7is\n5TRhkEaFsYNjMWpAV2iDfFt/AiLqcBjyRISCS/V44+MsGMxWyGUyJPcMw50Du6J/jxAuv0rUiTHk\niTxcUVkDXv84C2ZLMx68txcGJ4Qh2J/frRO5A4Y8kQe7WKHDa5syYTJb8cjUvpgyOoEjoIncCM/D\nEXmoy1V6rNl0AnqTFQ9P7INhfSNcXRIRtTGGPJEHKq8xYPWmE2g0NGH++F4YOSDS1SURkRMw5Ik8\nTGWdEX/+9wnU6yxIHZuAuwZFubokInIShjyRB6lpMGH1v0+gttGMWXf1wLjburm6JCJyIoY8kYeo\n05mx+t8nUFVvwrSR8ZgwLNbVJRGRk3F0PZGbkySBfdmX8ene89AZmzBpeCymjIhzdVlE1A4Y8kRu\nLP9CLf698ywuVOjgrVJgzpieuPe2bpB9v6QrEbk3hjyRG6quN+HjPedwNK8CAHBHvwjMGN2Dk9wQ\neRiGPJEbMTc1Y/vhC/j6UDEsVgnxkQF4cFwCenQNdHVpROQCDHkiN2C2NOO/J0vx9eFi1DSYEahW\nYd74HhjeLwJynpon8lgMeaJOrEFvwa6MEuw+XgK9yQqlQo4Jw2IweXgcl4MlIoY8UUdSWq3H6eJa\nBKq9ER7sC22wL7y9FFc9rqzGgB1HLmB/ThmarBLUPkpMuSMOYwdHI0CtckHlRNQRMeSJXMxsacaR\nvHJ8l12KcyX1V20P9m8J/C7BvugS7IeCS/XIPFsFAUAb5IN7b4vByP6R8FZd/csAEXk2hjyRCwgh\ncL60Ad9lleLI6XKYLM2QAUiKC8ZtfcJhMFlRUWtAea0RFbUG5F+oQ96FOtv+8ZH+uG9oLAYnaiGX\n8zt3Iro2hjxRO/tvdim+OXIBl6r0AICQAG/ce1s3jOwfibAg32vu02RtRkWdCRU1BvirVejRNYDX\nuhPRDTHkidrRvqzL+PDrPCjkMgzppcWogV2RFBdyw6NxL6UCUWFqRIWp26lSInIHDHmidnKqqAYb\nv8mHxtcLL8wbjIgQP1eXRERujgvUELWDy1V6vPNZDmQyYMkD/RnwRNQuGPJETtagt+DNLVkwmq14\neEIfJHYLcnVJROQhGPJETtRkbcbaT7NRVW/C1BFxGN4vwtUlEZEHYcgTOYkkBP765WkUXGrAsL7h\nuH9kvKtLIiIPw5AncpLPvyvEkdMV6BkdiIcn9uYlb0TU7hjyRE6w/2Qpth0oQpcgXzzxQH94KTkb\nHRG1P15CR9RGJCFQXNaInPPV2Lq/CH7eSvx61gD4+3EueSJyDaeFvCRJWLZsGfLz86FSqfDqq68i\nNjYWAFBZWYmlS5faHnv69Gk8/fTTSE1NxfTp06HRaAAA0dHRWLFihbNKJLplNQ0m5BTW4FRRDU4V\n1UJnbAIAqJRy/OqB/ogM5eQ1ROQ6Tgv5nTt3wmKxYPPmzcjMzMTKlSuxfv16AIBWq8XGjRsBACdO\nnMAbb7yB2bNnw2w2Qwhh20bUETXoLdh+5AKyzlWhtNpguz/Y3xsjB0SiX3wI+saFQOPr5cIqiYic\nGPIZGRkYNWoUACA5ORk5OTlXPUYIgVdeeQVr1qyBQqFATk4OjEYjFi5cCKvViqVLlyI5OdlZJRI5\npMnajP8cK8G2A0UwWZrh7aXAgB6hSIoPQb/4EESE+HFwHRF1KE4LeZ1OZzvtDgAKhQJWqxVK5Q8v\nuXv3biQkJKB79+4AAB8fHyxatAizZs1CUVERHnnkEWzfvv2KfYjamxACR/MqkPZtAarqTdD4euGh\ncT1w58Cu8FJy7CoRdVxOS0+NRgO9Xm+7LUnSVWG9detWzJ8/33Y7Pj4esbGxkMlkiI+PR1BQECor\nKxEZGXnd1wkO9oPSw0Yua7X+ri6hU7mVfp25UIsPvsjB6aIaKBUyTBvdA3PuSYTGjQfT8fNlP/bK\nMeyXY9qiX04L+ZSUFOzZswcTJ05EZmYmEhMTr3pMTk4OUlJSbLfT0tJw5swZLFu2DOXl5dDpdNBq\nta2+Tm2todXt7kar9UdlZaOry+g0brZfFXVGfP7deRzKLQcApCRqMevuHggP9oNRb4ZRb27rUjsE\nfr7sx145hv1yTGv9ciT8nRby48aNw/79+zF37lwIIbB8+XKkp6fDYDBgzpw5qKmpgUajueI7zJkz\nZ+L5559HamoqZDIZli9fzlP11G4kSSD7fDW+PXEJJwuqIQDEhGswd0wCescGu7o8IiKHyYQQwtVF\n3ApP+82Qvw07xp5+Negt+C77Mr49cRnVDSYAQI+oAIxNicbtfcMh96DBdPx82Y+9cgz75ZgOfyRP\n1NGdK6nH7uMlOJpXgWZJQOUlx+jkrrh7UBRiwvndIRF1fgx58ji1jWb8e+cZHMuvBABEhvphTEo0\nhidFwM+HfyWIyH3wXzTyGJIksOt4CT7bdx4mSzN6RgXigTu7o1dMEK9vJyK3xJAnj1BY2oB/fJOP\n4rJGqH2U+PmE3hg5INKjvm8nIs/DkCe3pjc24V//OYPdx0sgBDA8KQJzxvREgNp9r3MnIvofhjy5\nraxzVdi4Ix81DWaEh/hh/r2J6BMX4uqyiIjaDUOe3NLhU+V4Pz0XSoUc00bGY8KwWE5BS0QehyFP\nbud/Ae+jUuCVR+9AiB9XgyMiz8RDG3Irh06V2QJ+6Zxk9Irl6Xki8lw8kie3cSi3DBu2nYKPSomn\n5ySje9cAV5dERORSDHlyCwdzy/DB9wH/zNxkxEcy4ImIGPLU6R3MKcMHXzLgiYh+iiFPndr/At5X\npcTTDHgioisw5KnTkIRAWbUBhaUNtv+KShvh663EM6nJiItgwBMR/RhDnjq0/Au1OHm+piXQyxpg\nNDfbtinkMnSPCsDPxvVCbARXjSMi+imGPHVI1fUmbNp1FhlnKm33RYb6YVBCAOIjW/7r1kXDCW6I\niFrBkKcOpckqYfuRC/jyQBEsVgkJ0YGYckccuncNgJ8PJ7UhInIEQ546jJPnq/Gv/5xBRa0RAWoV\n5t/XA8OTIrgMLBHRTWLIk8tV1Rnx711nceJsFeQyGe4ZEo1pI7vDz4cfTyKiW8F/RcklzJZm5BRW\n48TZKhzLq7Cdmv/Zvb3QrYvG1eUREbkFhjy1m3q9BVnnqnD8TCVOFdXC2iwBAEICvDHjzh4YlhTO\nU/NERG2IIU9OZbJYsS/zMo7mV+D8pQaI7++P0qoxKCEMgxK0iI3wh5zhTkTU5hjy5BTmpmbsOX4J\nXx8uRqOhCTIZkNAt6PtgD0OXYD9Xl0hE5PYY8tSmmqzN+DbzMr46WIx6vQW+3gpMHRGHMYOjEeCn\ncnV5REQehSFPbcLaLOG7rMvYdrAYtY1meKsUmHxHLO69LQYaX17fTkTkCgx5umU5hdX46Ot8VDeY\noFLKMWFoDO4bGgN/HrkTEbkUQ55uSWFpA97+5CQkITBuSDdMHB6LQDXDnYioI2DI002rrjfhL2nZ\naGqW8MQDA5CcEObqkoiI6Ee4ugfdFKPZijfTslCvt2Du2AQGPBFRB8SQJ4c1SxLWf56DS5V6jE2J\nxrgh3VxdEhERXQNDnhwihMC//nMWOYU1GNAjFHPv6enqkoiI6Dqc9p28JElYtmwZ8vPzoVKp8Oqr\nryI2Nta2/cMPP8SWLVsQEhICAPjDH/6AuLi4Vvch19tx9CK+PXEJ3bpo8OjUJCjk/D2RiKijclrI\n79y5ExaLBZs3b0ZmZiZWrlyJ9evX27bn5ORg1apV6Nevn+2+HTt2tLoPudbxM5X4ePc5BGlU+PXM\nAfD15rhNIqKOzGn/SmdkZGDUqFEAgOTkZOTk5FyxPTc3F++//z4qKytx11134dFHH73hPuQ6haUN\neH9rLlReCvx65kCEBPi4uiQiIroBp4W8TqeDRvPDkqEKhQJWqxVKZctLTpo0CQ8++CA0Gg2WLFmC\nPXv23HCfawkO9oNSqXDW2+iQtFr/dnmdiloDThfW4HRRDb7LvARrs4QXHx6KIUkR7fL6baW9+uUu\n2C/7sVeOYb8c0xb9clrIazQa6PV6221JkmxhLYTAggUL4O/f8gZGjx6NU6dOtbrP9dTWGpxQfcel\n1fqjsrKxzZ+3WZJQUqHH2ZI6nLtUj7Ml9ahtNNu2q5Ry/OzeXojvonbK6zuLs/rlrtgv+7FXjmG/\nHNNavxwJf6eFfEpKCvbs2YOJEyciMzMTiYmJtm06nQ6TJ0/GV199BT8/Pxw+fBgzZsyAyWS67j7U\ndpqsEi5V6VBc1oji8pafJZU6NFkl22MC/LyQkqhFz6hAJHQLRGy4P5QKDrIjIupMnBby48aNw/79\n+zF37lwIIbB8+XKkp6fDYDBgzpw5eOqppzB//nyoVCoMHz4co0ePhiRJV+1Dt04IgbMl9TiQU4ai\nsgZcqtSjWRK27Qq5DFFhasRF+qNnVBASugWiS5AvZFzjnYioU5MJIcSNH9ZxedrpH0dOeTVZJRw5\nXY6dx0pQXN6yj5dSjm5dNIgN90dshD9iw/3RNUwNL6V7HqXzFKFj2C/7sVeOYb8c0+FP15Pr1OvM\n2HPiEr49cQkNhibIZMDgRC3GDo5GQrdAXttOROQhGPJuwmCyoqisAftPluLI6Qo0SwJ+3krcd3sM\nxqREISzI19UlEhFRO2PIu4C1WbqlQWx6U1PLoLmyRhSXN6KorBEVtUbb9shQP9wzpBvuSIqAt8qz\nLi8kIqIfMOTbkc7YhL99eRqZ56qg8fWCNsgHoYG+0Ab6ICzQB2FBvggL9EGzJFCvt6BeZ/7+p8V2\nu1ZnQXnNlZcNqn2U6BMbjLgIf/SNC0HfuGAOmiMiIoZ8eym4XI93P89BdYMZUWFqWCWBixU6FJY6\nNhAlSOONpLhgxEYEIC6iZfBcWKAPQ52IiK7CkHcyIQR2ZpTg493nIEkC00bFY/LwOMjlMkhCoF5n\nQVW9EVX1JlTVtfxUKuQIVKsQoFEhSO2NQI2q5bZahciIQI5QJSIiuzDknchgsuLDr0/jWH4lAvy8\n8IupSegbF2LbLpfJEOzvjWB/byREu7BQIiJySwx5J7lQ3oh1n+egotaIxG5BeHRqEoL9vV1dFhER\neRCGvBPsy7qMf+44A2uzhInDYjH9znhem05ERO2OId+GhBD44r+F2Lq/CGofJX45vR+Se4a5uiwi\nIvJQDPk2IoRA2t4CfH3oArRBPnh67iB04QQ0RETkQgz5NiCEwL93ncXOYyUID/HDs3OTERLg4+qy\niIjIwzHkb5EkBP75TT6+zbyMqDA1npmbjEANB9gREZHrMeRvgSQJ/P3r09h/sgwxXTR4em4y/P1U\nri6LiIgIAEP+pjVLEj7YdhqHT5UjPtIfT81OhsbXy9VlERER2TDkb4K1WcJ7X+Qi40wlekYF4jez\nBsLPh60kIqKOhcnkoDqdGRvST+F0cS16xwThyZkD4KNiG4mIqONhOjkgu6Aaf/3yFBoNTRiUEIZf\nTE2CtxeXciUioo6JIW+HJquET/YWYMfRi1AqZEi9JwH3DI7mym9ERNShMeRvoLzGgHe/yEVxeSPC\nQ/zw2NQkxEb4u7osIiKiG/LIkDdZrMgtrIHeZEVooA+0gT4ICfCBUnHl/PIHckqxcccZmC3NGNk/\nEg+OS+D370RE1Gl4TGIZTE3IPFeFjPxK5BTWoMkqXbFdJgNC/L0RFuiLsCAfGExWnDhbBR+VAr+Y\n2hfD+ka4qHIiIqKb49Yh36C34MTZSmTkV+J0cS2aJQEA6BqmxuBELcKCfFBdb0JlnQlV9UZU1Ztw\n5mId8i+27B8f6Y9HpyahS7CfC98FERHRzXGLkJckgYo6I0oqdCip1KGkUo+SSh0qa40Q3z8mNtwf\ng3tpMbiXFpGh6us+V5NVQk2DCTpjE2Ij/K86hU9ERNRZdPqQ/+OHR3G5Sg/LT06/a3y90Ds2GAN6\nhCIlUQutnSvCeSnlCA/xQ7gziiUiImpHnT7kSyp16BqqRnQXDaK1GkRr1YjSahCkUfESNyIi8mid\nPuTXPz0aCjlPqRMREf1Up09HBjwREdG1MSGJiIjcFEOeiIjITTntO3lJkrBs2TLk5+dDpVLh1Vdf\nRWxsrG37tm3b8NFHH0GhUCAxMRHLli2DXC7H9OnTodFoAADR0dFYsWKFs0okIiJya04L+Z07d8Ji\nsWDz5s3IzMzEypUrsX79egCAyWTCm2++ifT0dPj6+mLp0qXYs2cPRo4cCSEENm7c6KyyiIiIPIbT\nTtdnZGRg1KhRAIDk5GTk5OTYtqlUKmzatAm+vi3XrlutVnh7eyMvLw9GoxELFy7E/PnzkZmZ6azy\niIiI3J7TjuR1Op3ttDsAKBQKWK1WKJVKyOVyhIWFAQA2btwIg8GAESNG4MyZM1i0aBFmzZqFoqIi\nPPLII9i+fTuUyuuXGRzsB6XSs9Z012q5Cp4j2C/HsF/2Y68cw345pi365bSQ12g00Ov1ttuSJF0R\n1pIkYfXq1SgsLMTatWshk8kQHx+P2NhY25+DgoJQWVmJyMjI675Oba3BWW+hQ9Jq/VFZ2ejqMjoN\n9ssx7Jf92CvHsF+Oaa1fjoS/007Xp6SkYN++fQCAzMxMJCYmXrH9pZdegtlsxrp162yn7dPS0rBy\n5UoAQHl5OXQ6HbRarbNKJCIicmtOO5IfN24c9u/fj7lz50IIgeXLlyM9PR0GgwH9+vVDWloahgwZ\nggULFgAA5s+fj5kzZ+L5559HamoqZDIZli9f3uqpeiIiIro+mRBC3PhhHZennf7hKS/HsF+OYb/s\nx145hv1yTIc/XU9ERESu1emP5ImIiOjaeCRPRETkphjyREREboohT0RE5KYY8kRERG6KIU9EROSm\nGPJERERuitPJdSBZWVlYs2YNNm7ciNzcXLz88stQqVTo06cPXnzxReTn52P58uW2x2dmZuKdd97B\n7bffjmeffRbV1dVQq9VYtWoVQkJCXPhO2seN+iWXy/G3v/0N27Ztg0wmw2OPPYZx48bBZDKxX9fp\n1/vvv48vv/wSGo0Gixcvxt133+1R/WpqasILL7yAS5cuwWKx4PHHH0fPnj3x3HPPQSaTISEhAS+/\n/DLkcjk+/vhjbNq0CUqlEo8//rjH9QpwrF8AUFNTg9TUVGzduhXe3t7sVyv9+vDDD/Hll18CAEaP\nHo0lS5bcXL8EdQjvv/++mDx5spg1a5YQQojp06eLjIwMIYQQr7/+uvj888+vePxXX30lli5dKoQQ\n4m9/+5v4y1/+IoQQYtu2beKVV15px8pdw55+1dfXi9GjRwuz2Szq6urEXXfdJYRgv4S4dr/y8vLE\nlClThMlkEiaTSUybNk0YDAaP6ldaWpp49dVXhRBC1NbWitGjR4tHH31UHDp0SAghxO9//3uxY8cO\nUVFRISZPnizMZrNoaGiw/dmTeiWE/f0SQoh9+/aJ+++/XwwaNEiYTCYhhOf9XbS3XxcuXBDTp08X\nVqtVSJIk5syZI06fPn1T/eLp+g4iJiYGa9eutd0uLy9HSkoKgJbFfjIyMmzbDAYD1q5dixdffBEA\nkJGRgVGjRgEA7rzzThw8eLAdK3cNe/rl6+uLrl27wmg0wmg0QiaTAWC/gGv3q6CgALfffju8vb3h\n7e2N2NhY5Ofne1S/7rvvPvz6178GAAghoFAokJubi9tvvx1Ay/s/cOAAsrOzMWjQIKhUKvj7+yMm\nJgZ5eXke1SvA/n4BgFwux9///ncEBQXZ9me/rt2viIgIfPDBB1AoFJDJZLBarfD29r6pfjHkO4jx\n48dfsRhPt27dcOTIEQDAnj17YDQabdvS0tJw33332U7T6HQ6+Pu3zGWsVqvR2Oj+80Pb26/IyEhM\nmjQJ06dPx/z58wGwX8C1+9WrVy8cO3YMOp0OtbW1OHHiBIxGo0f1S61WQ6PRQKfT4cknn8RvfvMb\nCCFsvyD+7/3/uCf/u1+n03lUrwD7+wUAI0aMQHBw8BX7s1/X7peXlxdCQkIghMCqVavQt29fxMfH\n31S/GPId1PLly/Hee+9hwYIFCA0NveIvR3p6OmbNmmW7rdFooNfrAQB6vR4BAQHtXq+rXatf+/bt\nQ0VFBXbt2oVvv/0WO3fuRHZ2NvuFa/erR48eeOihh7B48WK88sorGDhwIIKDgz2uX6WlpZg/fz7u\nv/9+TJkyxfZ9MvDD+/9xT/53v7+/v8f1CrCvX9fDfl2/X2azGc888wz0ej1efvllADfXL4Z8B7V3\n716sWbMGH330Eerq6jBixAgAQGNjIywWCyIjI22PTUlJwd69ewEA+/btw+DBg11Ssytdq1+BgYHw\n8fGBSqWCt7c3/P390dDQwH7h2v2qqamBXq/Hpk2b8Ic//AGlpaVISEjwqH5VVVVh4cKFePbZZzFz\n5kwAQN++fXH48GEALe9/yJAhGDBgADIyMmA2m9HY2IiCggIkJiZ6VK8A+/t1PezXtfslhMAvf/lL\n9OrVC3/84x+hUCgA3Fy/OLq+g4qNjcXPf/5z+Pr6YujQoRg9ejQAoLCwEFFRUVc8NjU1Fb/73e+Q\nmpoKLy8vvPbaa64o2aWu168DBw5g9uzZkMvlSElJwYgRIzB48GD26xr9EkLg/PnzmDFjBry8vPDb\n3/4WCoXCoz5f7777LhoaGrBu3TqsW7cOAPDiiy/i1Vdfxeuvv47u3btj/PjxUCgUmDdvHh588EEI\nIfDUU0/B29vbo3oF2N+v62G/rt2vnTt34siRI7BYLPjuu+8AAEuXLr2pfnEVOiIiIjfF0/VERERu\niiFPRETkphjyREREboohT0RE5KYY8kRERG6KIU/khp5//nmMHz8evXr1arPnfO655/Dpp5+22fMR\nkfPxOnkiN/TZZ58hOzsbKpXK1aUQkQsx5InczGOPPQYhBO644w40NTUhKysL6enptgUvoqOjsXr1\naqhUKqxZswY7d+6EQqHAnDlzsGDBAhw5cgRvvPEGTCYT6uvr8eyzz2LChAk3fN0tW7bg0KFDtgk6\n3n77bahUKjz00EP44x//iLNnz6K5uRmPPPIIJk+eDJ1OhxdeeAHl5eWoqKjAkCFD8Oc//xlHjhzB\n6tWrIUkSEhISsGrVKme3jMh9tek6ekTUISQmJoqLFy+Ku+++WwghxJgxY0RVVZUQomVp2VOnTomv\nvvpKzJ07V5jNZqHT6cTUqVNFRUWFeOKJJ8S5c+eEEEIcOHBATJ48WQghxO9+9zvxySefXPc1dTqd\nGD58uNDpdEKSJHHvvfeKsrIysXr1avHRRx8JIYRobGwUkyZNEhcuXBDp6eli3bp1QgghzGazuOee\ne8TJkyfFoUOHxODBg0VDQ4PT+kPkKXgkT+QB7r77bqSmpmLs2LEYP348+vTpgy1btmDChAlQqVRQ\nqVT44osvAACrV6/Gnj17sH37dmRlZV2xEEtr1Go1Ro8ejR07dqBbt27o1q0bwsPDceDAAZhMJnzy\nyScAWpZKPnv2LCZPnozs7Gx8+OGHOH/+POrq6mAwGAAA8fHxV6zyRkQ3hyFP5AH+7//+D3l5edi7\ndyDTv5YAAAHlSURBVC+effZZLFmy5IqlZwGgpKQEISEhmDdvHoYOHYqhQ4di+PDheOaZZ+x+nRkz\nZmD9+vWIjo7GAw88AACQJAmrV69GUlISgJZFOgIDA7Fx40Z88803mD17Nu644w6cOXMG4vtZtn18\nfNronRN5No6uJ3JzVqsV9957L4KDg/Hoo4/+//bukFVhMArj+F8wCIrVto8gCgo2y5oyQRhrOsEo\nmkyWJcEvYfVDmBfsggaDQVBsJgdj8zbTDYYrXl6fX33DC6c8nMOBQ6fTYbfbUa/XWa/XxHHM/X5n\nOBxyOBw4Ho9MJhOazSZhGJIkyct/1Wo1LpcLm80G27YBaDQarFYrAK7XK47jcD6fCcMQz/NwHIdM\nJsN+vydN07fUQORbqZMXMVw2m2U8HjMYDMjlchSLRRaLBaVSie12S7fbJU1Ter0e5XIZ13VptVoU\nCgUqlQpRFD3H6K+wbZvb7fbc7B+NRgRBQLvdJkkSptMplmXR7/cJgoDlckk+n6darXI6nbAs612l\nEPk6ukInIn/i8XgQxzG+7zObzZ7jeRH5HHXyIvKyKIrwPO/XN9/3mc/nuK6rgBf5J9TJi4iIGEqL\ndyIiIoZSyIuIiBhKIS8iImIohbyIiIihFPIiIiKGUsiLiIgY6gdStt0mJAJxtgAAAABJRU5ErkJg\ngg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# use Seaborn library to plot a Pandas DataFrame\n", "plot_df = outlays[['fiscal_year', 'function_name', 'amount_adj']]\n", "plot_group = plot_df.groupby(['fiscal_year', 'function_name']).sum().reset_index()\n", "sns.set(style=\"darkgrid\")\n", "ax = sns.tsplot(data=plot_group, time='fiscal_year', unit='function_name', value='amount_adj', err_style=None)" ] }, { "cell_type": "code", "execution_count": 80, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "fiscal_year\n", "1962 18.201\n", "1963 17.974\n", "1964 17.880\n", "1965 16.635\n", "1966 17.206\n", "1967 18.786\n", "1968 19.808\n", "1969 18.695\n", "1970 18.649\n", "1971 18.777\n", "1972 18.916\n", "1973 18.120\n", "1974 18.124\n", "1975 20.634\n", "1976 20.767\n", "1977 20.174\n", "1978 20.136\n", "1979 19.612\n", "1980 21.129\n", "1981 21.611\n", "1982 22.503\n", "1983 22.828\n", "1984 21.549\n", "1985 22.161\n", "1986 21.833\n", "1987 20.996\n", "1988 20.648\n", "1989 20.534\n", "1990 21.185\n", "1991 21.673\n", "1992 21.470\n", "1993 20.742\n", "1994 20.308\n", "1995 19.988\n", "1996 19.559\n", "1997 18.874\n", "1998 18.453\n", "1999 17.894\n", "2000 17.628\n", "2001 17.633\n", "2002 18.488\n", "2003 19.060\n", "2004 18.967\n", "2005 19.179\n", "2006 19.402\n", "2007 19.051\n", "2008 20.217\n", "2009 24.404\n", "2010 23.361\n", "2011 23.428\n", "2012 22.068\n", "2013 20.940\n", "2014 20.404\n", "2015 20.717\n", "2016 21.391\n", "2017 21.485\n", "2018 21.621\n", "2019 22.102\n", "2020 22.261\n", "2021 22.401\n", "dtype: float64" ] }, "execution_count": 80, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# group by years and calculate spending as % of GDP\n", "\n", "by_year = outlays.groupby('fiscal_year')\n", "def percent_gdp(group):\n", " spending = group['amount'] * 1000\n", " gdp = group['gdp_billions'] * 1000000000\n", " return spending.sum() / gdp.max() * 100\n", "\n", "by_year.apply(percent_gdp)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Useful tools:\n", "\n", "* [Applying operations over pandas DataFrames](http://chrisalbon.com/python/pandas_apply_operations_to_dataframes.html \"Applying operations over Pandas DataFrames\")\n", "* Pandas docs: [apply-split-combine](http://pandas.pydata.org/pandas-docs/stable/groupby.html \"Pandas: apply-split-combine\")\n", "* Pandas docs: [computational tools](http://pandas.pydata.org/pandas-docs/stable/computation.html \"Pandas: computational tools\") (includes info on percent change, correlation, window functions)\n", "* Pandas docs: [time series/date functionality](http://pandas.pydata.org/pandas-docs/stable/timeseries.html \"Pands: time series\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Step 5: Output" ] }, { "cell_type": "code", "execution_count": 81, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "# filter out the columns we don't want to write out\n", "# (ran out of time to find a way to get rid of both columns in one line)\n", "outlays_tidier = outlays.loc[:, outlays.columns != 'deflator']\n", "outlays_tidier = outlays.loc[:, outlays.columns != 'gdp_billions']\n", "\n", "# write to a .csv\n", "outlays_tidier.to_csv('data/outlays_done.csv', index=False)" ] }, { "cell_type": "code", "execution_count": 82, "metadata": { "collapsed": true, "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "# dump to SQL db\n", "\n", "db_engine = create_engine('sqlite:///data/pandas_data_munging.db')\n", "outlays_tidier.to_sql(\n", " 'outlays',\n", " db_engine,\n", " if_exists='replace',\n", " index=False\n", ")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Resources\n", "\n", "* Python for Data Analysis (by Wes McKinney): http://shop.oreilly.com/product/0636920023784.do\n", "* Common Excel Tasks Demonstrated in Pandas: http://pbpython.com/excel-pandas-comp.html\n", "* Pandas Cookbook: http://pandas.pydata.org/pandas-docs/stable/cookbook.html\n", "* Useful Pandas Snippets: https://gist.github.com/bsweger/e5817488d161f37dcbd2\n", "* [Chris Albon's](http://chrisalbon.com/) series of tutorials (scroll down to _Data Wrangling_)\n", "* Pandas docs: [how to do SQL operations in Pandas](http://pandas.pydata.org/pandas-docs/stable/comparison_with_sql.html)\n", "* Giant panda image courtesy: http://www.clipartlord.com/2015/11/23/free-giant-panda-clip-art-3/" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" } }, "nbformat": 4, "nbformat_minor": 1 }