{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"![panda](img/panda.png)\n",
"# Data Munging with Python and Pandas\n",
"\n",
"Becky Sweger, 18F Engineering Team"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## What is Pandas?\n",
"\n",
"From the [docs](http://pandas.pydata.org/pandas-docs/stable/):\n",
"\n",
"> pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"Demo getting a dataframe:\n",
"\n",
"`eiti = pd.read_excel('https://useiti.doi.gov/downloads/federal_revenue_onshore_acct-year_FY04-14_2015-11-20.xlsx')`\n",
"\n",
"Slice:\n",
"\n",
"`eiti[eiti['Total']>10000]`\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Pandas\n",
"\n",
"* Series: 1-dimensional\n",
" * think column in a spreadsheet\n",
"* Dataframe: 2-dimensional:\n",
" * think spreadsheet\n",
" * analagous to an R data.frame"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"![Pandas ecosystem](img/pandas-ecosystem.png)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Pandas is good for:\n",
"\n",
"* fast, repeatable data wrangling\n",
"* structured-ish data\n",
"* integration with Python data stack\n",
"* glue"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Array Programming\n",
"\n",
"Pandas is built on top of [Numpy](http://www.numpy.org/), a Python library that enables array programming\n",
"\n",
"* fast, efficient operations on arrays of homogeneous data\n",
"* you operate on aggregates of data; no explicit loops\n",
"* many operations performed via compiled code"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Data Munging\n",
"\n",
"Step 0: if you're new to Python, skip the yak shaving and [download the Anaconda Python distribution](https://www.continuum.io/downloads \"download Anaconda\").\n",
"\n",
"1. Get the data\n",
"2. Explore\n",
"3. Clean\n",
"4. Augment & Prep\n",
"5. Output"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Step 1: Get The Data"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Pandas makes it easy to create a DataFrame from existing data.\n",
"\n",
"The [U.S. Federal Budget](https://www.whitehouse.gov/omb/budget \"The President's Budget\")\n",
"* [Outlays](https://www.whitehouse.gov/omb/budget/Supplemental \"supplemental materials\") (scroll down to _Public Budget Database_)\n",
"* [\"Deflators\"](https://www.whitehouse.gov/omb/budget/Historicals \"historical tables\") (scroll down to _Table 10.1_)"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"# import things\n",
"import pandas as pd \n",
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"import matplotlib\n",
"matplotlib.style.use('ggplot')\n",
"%matplotlib inline\n",
"pd.set_option('display.float_format', lambda x: '%.3f' % x)"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Agency Code | \n",
" Agency Name | \n",
" Bureau Code | \n",
" Bureau Name | \n",
" Account Code | \n",
" Account Name | \n",
" Treasury Agency Code | \n",
" Subfunction Code | \n",
" Subfunction Title | \n",
" BEA Category | \n",
" ... | \n",
" 2012 | \n",
" 2013 | \n",
" 2014 | \n",
" 2015 | \n",
" 2016 | \n",
" 2017 | \n",
" 2018 | \n",
" 2019 | \n",
" 2020 | \n",
" 2021 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" Legislative Branch | \n",
" 0 | \n",
" Legislative Branch | \n",
" nan | \n",
" Receipts, Central fiscal operations | \n",
" nan | \n",
" 803 | \n",
" Central fiscal operations | \n",
" Mandatory | \n",
" ... | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" Legislative Branch | \n",
" 0 | \n",
" Legislative Branch | \n",
" nan | \n",
" Receipts, Central fiscal operations | \n",
" nan | \n",
" 908 | \n",
" Other interest | \n",
" Net interest | \n",
" ... | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" 2 | \n",
" 1 | \n",
" Legislative Branch | \n",
" 0 | \n",
" Legislative Branch | \n",
" 241400.000 | \n",
" Charges for services to trust funds | \n",
" nan | \n",
" 803 | \n",
" Central fiscal operations | \n",
" Mandatory | \n",
" ... | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" 3 | \n",
" 1 | \n",
" Legislative Branch | \n",
" 5 | \n",
" Senate | \n",
" 0.000 | \n",
" Senate | \n",
" 0.000 | \n",
" 801 | \n",
" Legislative functions | \n",
" Discretionary | \n",
" ... | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" 4 | \n",
" 1 | \n",
" Legislative Branch | \n",
" 5 | \n",
" Senate | \n",
" 100.000 | \n",
" Compensation of Members, Senate | \n",
" 0.000 | \n",
" 801 | \n",
" Legislative functions | \n",
" Mandatory | \n",
" ... | \n",
" 23,000 | \n",
" 23,000 | \n",
" 23,000 | \n",
" 23,000 | \n",
" 27,000 | \n",
" 24,000 | \n",
" 24,000 | \n",
" 24,000 | \n",
" 24,000 | \n",
" 24,000 | \n",
"
\n",
" \n",
"
\n",
"
5 rows × 73 columns
\n",
"
"
],
"text/plain": [
" Agency Code Agency Name Bureau Code Bureau Name \\\n",
"0 1 Legislative Branch 0 Legislative Branch \n",
"1 1 Legislative Branch 0 Legislative Branch \n",
"2 1 Legislative Branch 0 Legislative Branch \n",
"3 1 Legislative Branch 5 Senate \n",
"4 1 Legislative Branch 5 Senate \n",
"\n",
" Account Code Account Name Treasury Agency Code \\\n",
"0 nan Receipts, Central fiscal operations nan \n",
"1 nan Receipts, Central fiscal operations nan \n",
"2 241400.000 Charges for services to trust funds nan \n",
"3 0.000 Senate 0.000 \n",
"4 100.000 Compensation of Members, Senate 0.000 \n",
"\n",
" Subfunction Code Subfunction Title BEA Category ... 2012 \\\n",
"0 803 Central fiscal operations Mandatory ... 0 \n",
"1 908 Other interest Net interest ... 0 \n",
"2 803 Central fiscal operations Mandatory ... 0 \n",
"3 801 Legislative functions Discretionary ... 0 \n",
"4 801 Legislative functions Mandatory ... 23,000 \n",
"\n",
" 2013 2014 2015 2016 2017 2018 2019 2020 2021 \n",
"0 0 0 0 0 0 0 0 0 0 \n",
"1 0 0 0 0 0 0 0 0 0 \n",
"2 0 0 0 0 0 0 0 0 0 \n",
"3 0 0 0 0 0 0 0 0 0 \n",
"4 23,000 23,000 23,000 27,000 24,000 24,000 24,000 24,000 24,000 \n",
"\n",
"[5 rows x 73 columns]"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# create a DataFrame from the outlays .csv\n",
"outlays = pd.read_csv('data/outlays.csv')\n",
"outlays.head() "
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"pandas.core.frame.DataFrame"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# create a DataFrame from the \"deflators\" .xls (requires xlrd library)\n",
"deflators = pd.read_excel('https://obamawhitehouse.archives.gov/sites/default/files/omb/budget/fy2017//assets/hist10z1.xls')\n",
"type(deflators)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Other things that can create a DataFrame\n",
"\n",
"* Python-specific (_e.g._, tuples, lists, dictionaries, pickle)\n",
"* Relational database\n",
"* JSON\n",
"* HTML\n",
"* Stata/SAS\n",
"* HDF5\n",
"* MessagePack (experimental)\n",
"* Google's BigQuery (experimental)\n",
"\n",
"More information: [IO Tools](http://pandas.pydata.org/pandas-docs/stable/io.html \"Pandas IO Tools\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Take a peek...\n",
"\n",
"Some useful functions for a first look at your DataFrame:\n",
"\n",
"* [head](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.head.html \"DataFrame.head\")\n",
"* [tail](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.tail.html \"DataFrame.tail\")\n",
"* [iloc](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iloc.html \"pandas.DataFrame.iloc\")\n",
"* [unique](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.unique.html \"pandas.Series.unique\")\n",
"\n",
"See also:\n",
"\n",
"* [Pandas docs: Indexing and Selecting Data](http://pandas.pydata.org/pandas-docs/stable/indexing.html \"Pandas: indexing and selecting data\")\n",
"* [Chris Albon's indexing and selecting examples](http://chrisalbon.com/python/pandas_indexing_selecting.html \"Pandas indexing and selecting examples\")"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Agency Code | \n",
" Agency Name | \n",
" Bureau Code | \n",
" Bureau Name | \n",
" Account Code | \n",
" Account Name | \n",
" Treasury Agency Code | \n",
" Subfunction Code | \n",
" Subfunction Title | \n",
" BEA Category | \n",
" ... | \n",
" 2012 | \n",
" 2013 | \n",
" 2014 | \n",
" 2015 | \n",
" 2016 | \n",
" 2017 | \n",
" 2018 | \n",
" 2019 | \n",
" 2020 | \n",
" 2021 | \n",
"
\n",
" \n",
" \n",
" \n",
" 5081 | \n",
" 902 | \n",
" Undistributed Offsetting Receipts | \n",
" 0 | \n",
" Undistributed Offsetting Receipts | \n",
" 977120.000 | \n",
" Interest, Special Worker's Compensation Expenses | \n",
" 16.000 | \n",
" 902 | \n",
" Interest received by on-budget trust funds | \n",
" Net interest | \n",
" ... | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" 5082 | \n",
" 902 | \n",
" Undistributed Offsetting Receipts | \n",
" 0 | \n",
" Undistributed Offsetting Receipts | \n",
" 977910.000 | \n",
" Employing agency contributions, Miscellaneous ... | \n",
" 20.000 | \n",
" 951 | \n",
" Employer share, employee retirement (on-budget) | \n",
" Mandatory | \n",
" ... | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" 5083 | \n",
" 902 | \n",
" Undistributed Offsetting Receipts | \n",
" 0 | \n",
" Undistributed Offsetting Receipts | \n",
" 977920.000 | \n",
" Interest, Miscellaneous Trust Funds, Governmen... | \n",
" 20.000 | \n",
" 902 | \n",
" Interest received by on-budget trust funds | \n",
" Net interest | \n",
" ... | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" 5084 | \n",
" 902 | \n",
" Undistributed Offsetting Receipts | \n",
" 0 | \n",
" Undistributed Offsetting Receipts | \n",
" 997120.000 | \n",
" Interest, Other DOD Trust Funds | \n",
" 17.000 | \n",
" 902 | \n",
" Interest received by on-budget trust funds | \n",
" Net interest | \n",
" ... | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" -1,000 | \n",
" -1,000 | \n",
" -1,000 | \n",
" -1,000 | \n",
" -1,000 | \n",
" -1,000 | \n",
"
\n",
" \n",
" 5085 | \n",
" 930 | \n",
" Miscellaneous Receipts Below the Reporting Thr... | \n",
" 0 | \n",
" Miscellaneous Receipts Below the Reporting Thr... | \n",
" 901000.000 | \n",
" Miscellaneous Unconverted Offsetting Receipts | \n",
" 99.000 | \n",
" 809 | \n",
" Deductions for offsetting receipts | \n",
" Mandatory | \n",
" ... | \n",
" -7,000 | \n",
" 0 | \n",
" -8,000 | \n",
" -15,000 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
"
\n",
"
5 rows × 73 columns
\n",
"
"
],
"text/plain": [
" Agency Code Agency Name \\\n",
"5081 902 Undistributed Offsetting Receipts \n",
"5082 902 Undistributed Offsetting Receipts \n",
"5083 902 Undistributed Offsetting Receipts \n",
"5084 902 Undistributed Offsetting Receipts \n",
"5085 930 Miscellaneous Receipts Below the Reporting Thr... \n",
"\n",
" Bureau Code Bureau Name \\\n",
"5081 0 Undistributed Offsetting Receipts \n",
"5082 0 Undistributed Offsetting Receipts \n",
"5083 0 Undistributed Offsetting Receipts \n",
"5084 0 Undistributed Offsetting Receipts \n",
"5085 0 Miscellaneous Receipts Below the Reporting Thr... \n",
"\n",
" Account Code Account Name \\\n",
"5081 977120.000 Interest, Special Worker's Compensation Expenses \n",
"5082 977910.000 Employing agency contributions, Miscellaneous ... \n",
"5083 977920.000 Interest, Miscellaneous Trust Funds, Governmen... \n",
"5084 997120.000 Interest, Other DOD Trust Funds \n",
"5085 901000.000 Miscellaneous Unconverted Offsetting Receipts \n",
"\n",
" Treasury Agency Code Subfunction Code \\\n",
"5081 16.000 902 \n",
"5082 20.000 951 \n",
"5083 20.000 902 \n",
"5084 17.000 902 \n",
"5085 99.000 809 \n",
"\n",
" Subfunction Title BEA Category ... \\\n",
"5081 Interest received by on-budget trust funds Net interest ... \n",
"5082 Employer share, employee retirement (on-budget) Mandatory ... \n",
"5083 Interest received by on-budget trust funds Net interest ... \n",
"5084 Interest received by on-budget trust funds Net interest ... \n",
"5085 Deductions for offsetting receipts Mandatory ... \n",
"\n",
" 2012 2013 2014 2015 2016 2017 2018 2019 2020 \\\n",
"5081 0 0 0 0 0 0 0 0 0 \n",
"5082 0 0 0 0 0 0 0 0 0 \n",
"5083 0 0 0 0 0 0 0 0 0 \n",
"5084 0 0 0 0 -1,000 -1,000 -1,000 -1,000 -1,000 \n",
"5085 -7,000 0 -8,000 -15,000 0 0 0 0 0 \n",
"\n",
" 2021 \n",
"5081 0 \n",
"5082 0 \n",
"5083 0 \n",
"5084 -1,000 \n",
"5085 0 \n",
"\n",
"[5 rows x 73 columns]"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"outlays.tail()"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Table 10.1—GROSS DOMESTIC PRODUCT AND DEFLATORS USED IN THE HISTORICAL TABLES: 1940–2021 | \n",
" Unnamed: 1 | \n",
" Unnamed: 2 | \n",
" Unnamed: 3 | \n",
" Unnamed: 4 | \n",
" Unnamed: 5 | \n",
" Unnamed: 6 | \n",
" Unnamed: 7 | \n",
" Unnamed: 8 | \n",
" Unnamed: 9 | \n",
" Unnamed: 10 | \n",
" Unnamed: 11 | \n",
" Unnamed: 12 | \n",
" Unnamed: 13 | \n",
" Unnamed: 14 | \n",
" Unnamed: 15 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" (Fiscal Year 2009 = 1.000) | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 1 | \n",
" Fiscal Year | \n",
" GDP (in billions of dollars) | \n",
" GDP (Chained) Price Index | \n",
" Composite Outlay Deflators | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 2 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" Total | \n",
" Total Defense | \n",
" Total Non- defense | \n",
" Payment for Individuals | \n",
" NaN | \n",
" NaN | \n",
" Other Grants | \n",
" Net Interest | \n",
" Undis- tributed Offsetting Receipts | \n",
" All Other | \n",
" Addendum: Direct Capital | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 86 | \n",
" 2021 estimate | \n",
" 22875.200 | \n",
" 1.229 | \n",
" 1.255 | \n",
" 1.236 | \n",
" 1.258 | \n",
" 1.258 | \n",
" 1.258 | \n",
" 1.259 | \n",
" 1.361 | \n",
" 1.229 | \n",
" 1.283 | \n",
" 1.259 | \n",
" 1.198 | \n",
" 1.193 | \n",
" 1.214 | \n",
"
\n",
" \n",
" 87 | \n",
" Note: Constant dollar research and development... | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Table 10.1—GROSS DOMESTIC PRODUCT AND DEFLATORS USED IN THE HISTORICAL TABLES: 1940–2021 \\\n",
"0 (Fiscal Year 2009 = 1.000) \n",
"1 Fiscal Year \n",
"2 NaN \n",
"86 2021 estimate \n",
"87 Note: Constant dollar research and development... \n",
"\n",
" Unnamed: 1 Unnamed: 2 \\\n",
"0 NaN NaN \n",
"1 GDP (in billions of dollars) GDP (Chained) Price Index \n",
"2 NaN NaN \n",
"86 22875.200 1.229 \n",
"87 NaN NaN \n",
"\n",
" Unnamed: 3 Unnamed: 4 Unnamed: 5 \\\n",
"0 NaN NaN NaN \n",
"1 Composite Outlay Deflators NaN NaN \n",
"2 Total Total Defense Total Non- defense \n",
"86 1.255 1.236 1.258 \n",
"87 NaN NaN NaN \n",
"\n",
" Unnamed: 6 Unnamed: 7 Unnamed: 8 Unnamed: 9 Unnamed: 10 \\\n",
"0 NaN NaN NaN NaN NaN \n",
"1 NaN NaN NaN NaN NaN \n",
"2 Payment for Individuals NaN NaN Other Grants Net Interest \n",
"86 1.258 1.258 1.259 1.361 1.229 \n",
"87 NaN NaN NaN NaN NaN \n",
"\n",
" Unnamed: 11 Unnamed: 12 Unnamed: 13 \\\n",
"0 NaN NaN NaN \n",
"1 NaN NaN NaN \n",
"2 Undis- tributed Offsetting Receipts All Other Addendum: Direct Capital \n",
"86 1.283 1.259 1.198 \n",
"87 NaN NaN NaN \n",
"\n",
" Unnamed: 14 Unnamed: 15 \n",
"0 NaN NaN \n",
"1 NaN NaN \n",
"2 NaN NaN \n",
"86 1.193 1.214 \n",
"87 NaN NaN "
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# grab first 3 and the last 2 rows of the deflators dataframe\n",
"\n",
"deflators.iloc[[0,1,2,-2,-1]]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Step 2: Explore\n",
"\n",
"What kind of pain are you in for?"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"Index(['Agency Code', 'Agency Name', 'Bureau Code', 'Bureau Name',\n",
" 'Account Code', 'Account Name', 'Treasury Agency Code',\n",
" 'Subfunction Code', 'Subfunction Title', 'BEA Category',\n",
" 'Grant/non-grant split', 'On- or Off- Budget', '1962', '1963', '1964',\n",
" '1965', '1966', '1967', '1968', '1969', '1970', '1971', '1972', '1973',\n",
" '1974', '1975', '1976', 'TQ', '1977', '1978', '1979', '1980', '1981',\n",
" '1982', '1983', '1984', '1985', '1986', '1987', '1988', '1989', '1990',\n",
" '1991', '1992', '1993', '1994', '1995', '1996', '1997', '1998', '1999',\n",
" '2000', '2001', '2002', '2003', '2004', '2005', '2006', '2007', '2008',\n",
" '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017',\n",
" '2018', '2019', '2020', '2021'],\n",
" dtype='object')"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# see the column names\n",
"outlays.columns"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"RangeIndex: 5086 entries, 0 to 5085\n",
"Data columns (total 73 columns):\n",
"Agency Code 5086 non-null int64\n",
"Agency Name 5086 non-null object\n",
"Bureau Code 5086 non-null int64\n",
"Bureau Name 5086 non-null object\n",
"Account Code 5048 non-null float64\n",
"Account Name 5086 non-null object\n",
"Treasury Agency Code 4944 non-null float64\n",
"Subfunction Code 5086 non-null int64\n",
"Subfunction Title 5086 non-null object\n",
"BEA Category 5086 non-null object\n",
"Grant/non-grant split 5086 non-null object\n",
"On- or Off- Budget 5086 non-null object\n",
"1962 5086 non-null object\n",
"1963 5086 non-null object\n",
"1964 5086 non-null object\n",
"1965 5086 non-null object\n",
"1966 5086 non-null object\n",
"1967 5086 non-null object\n",
"1968 5086 non-null object\n",
"1969 5086 non-null object\n",
"1970 5086 non-null object\n",
"1971 5086 non-null object\n",
"1972 5086 non-null object\n",
"1973 5086 non-null object\n",
"1974 5086 non-null object\n",
"1975 5086 non-null object\n",
"1976 5086 non-null object\n",
"TQ 5086 non-null object\n",
"1977 5086 non-null object\n",
"1978 5086 non-null object\n",
"1979 5086 non-null object\n",
"1980 5086 non-null object\n",
"1981 5086 non-null object\n",
"1982 5086 non-null object\n",
"1983 5086 non-null object\n",
"1984 5086 non-null object\n",
"1985 5086 non-null object\n",
"1986 5086 non-null object\n",
"1987 5086 non-null object\n",
"1988 5086 non-null object\n",
"1989 5086 non-null object\n",
"1990 5086 non-null object\n",
"1991 5086 non-null object\n",
"1992 5086 non-null object\n",
"1993 5086 non-null object\n",
"1994 5086 non-null object\n",
"1995 5086 non-null object\n",
"1996 5086 non-null object\n",
"1997 5086 non-null object\n",
"1998 5086 non-null object\n",
"1999 5086 non-null object\n",
"2000 5086 non-null object\n",
"2001 5086 non-null object\n",
"2002 5086 non-null object\n",
"2003 5086 non-null object\n",
"2004 5086 non-null object\n",
"2005 5086 non-null object\n",
"2006 5086 non-null object\n",
"2007 5086 non-null object\n",
"2008 5086 non-null object\n",
"2009 5086 non-null object\n",
"2010 5086 non-null object\n",
"2011 5086 non-null object\n",
"2012 5086 non-null object\n",
"2013 5086 non-null object\n",
"2014 5086 non-null object\n",
"2015 5086 non-null object\n",
"2016 5086 non-null object\n",
"2017 5086 non-null object\n",
"2018 5086 non-null object\n",
"2019 5086 non-null object\n",
"2020 5086 non-null object\n",
"2021 5086 non-null object\n",
"dtypes: float64(2), int64(3), object(68)\n",
"memory usage: 2.8+ MB\n"
]
}
],
"source": [
"# see datatypes and high-level view of missing data\n",
"# watch points:\n",
"# - is the number of rows in the ballpark?\n",
"# - are there codes coming in as numeric datatypes?\n",
"# - are there columns with unexpected null values?\n",
"# - are any of these column names going to be annoying?\n",
"# - NumPy loves floats! Are you planning to do any math?\n",
"\n",
"outlays.info()"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Agency Code | \n",
" Bureau Code | \n",
" Account Code | \n",
" Treasury Agency Code | \n",
" Subfunction Code | \n",
"
\n",
" \n",
" \n",
" \n",
" count | \n",
" 5086.000 | \n",
" 5086.000 | \n",
" 5048.000 | \n",
" 4944.000 | \n",
" 5086.000 | \n",
"
\n",
" \n",
" mean | \n",
" 106.277 | \n",
" 19.283 | \n",
" 166440.529 | \n",
" 44.419 | \n",
" 484.108 | \n",
"
\n",
" \n",
" std | \n",
" 207.342 | \n",
" 25.214 | \n",
" 285555.505 | \n",
" 32.740 | \n",
" 249.309 | \n",
"
\n",
" \n",
" min | \n",
" 1.000 | \n",
" 0.000 | \n",
" 0.000 | \n",
" 0.000 | \n",
" 51.000 | \n",
"
\n",
" \n",
" 25% | \n",
" 9.000 | \n",
" 0.000 | \n",
" 566.000 | \n",
" 14.000 | \n",
" 302.000 | \n",
"
\n",
" \n",
" 50% | \n",
" 15.000 | \n",
" 9.000 | \n",
" 4082.500 | \n",
" 28.000 | \n",
" 452.000 | \n",
"
\n",
" \n",
" 75% | \n",
" 27.000 | \n",
" 30.000 | \n",
" 271715.000 | \n",
" 75.000 | \n",
" 751.000 | \n",
"
\n",
" \n",
" max | \n",
" 930.000 | \n",
" 99.000 | \n",
" 997200.000 | \n",
" 99.000 | \n",
" 959.000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Agency Code Bureau Code Account Code Treasury Agency Code \\\n",
"count 5086.000 5086.000 5048.000 4944.000 \n",
"mean 106.277 19.283 166440.529 44.419 \n",
"std 207.342 25.214 285555.505 32.740 \n",
"min 1.000 0.000 0.000 0.000 \n",
"25% 9.000 0.000 566.000 14.000 \n",
"50% 15.000 9.000 4082.500 28.000 \n",
"75% 27.000 30.000 271715.000 75.000 \n",
"max 930.000 99.000 997200.000 99.000 \n",
"\n",
" Subfunction Code \n",
"count 5086.000 \n",
"mean 484.108 \n",
"std 249.309 \n",
"min 51.000 \n",
"25% 302.000 \n",
"50% 452.000 \n",
"75% 751.000 \n",
"max 959.000 "
]
},
"execution_count": 64,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# math! note that our dollar amts aren't being treated as numeric values\n",
"outlays.describe()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Fixin' to do\n",
"\n",
"* Override data types\n",
"* Retain leading zeroes\n",
"* Skip header and footer rows\n",
"* Convert whitespaces to null/NaN\n",
"* Use your own column names\n",
"* Exclude unwanted columns\n",
"* Ignore blank rows (or not!)\n",
"* [And more](http://pandas.pydata.org/pandas-docs/stable/io.html \"Pandas IO Tools\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Step 2A: Get the Data Again\n",
"\n",
"Yeah, you'll probably have to do that."
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"count 5086.000\n",
"mean 48310.506\n",
"std 924126.826\n",
"min -8294670.000\n",
"25% 0.000\n",
"50% 0.000\n",
"75% 0.000\n",
"max 42955698.000\n",
"Name: 1973, dtype: float64"
]
},
"execution_count": 65,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# some better data types\n",
"# - force selected columns to string/object\n",
"# - thousands parameter ensures incoming numbers w/ commas are not interpreted as string\n",
"\n",
"outlays = pd.read_csv(\n",
" 'data/outlays.csv',\n",
" thousands=',',\n",
" dtype={'Agency Code': str,\n",
" 'Bureau Code': str,\n",
" 'Account Code': str,\n",
" 'Treasury Agency Code': str,\n",
" 'Subfunction Code': str}\n",
" )\n",
"# dollar amounts now treated as numeric\n",
"outlays['1973'].describe()"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" fiscal_year | \n",
" gdp_billions | \n",
" deflator | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1940 | \n",
" 98.200 | \n",
" 0.081 | \n",
"
\n",
" \n",
" 1 | \n",
" 1941 | \n",
" 116.200 | \n",
" 0.084 | \n",
"
\n",
" \n",
" 2 | \n",
" 1942 | \n",
" 147.700 | \n",
" 0.090 | \n",
"
\n",
" \n",
" 3 | \n",
" 1943 | \n",
" 184.600 | \n",
" 0.096 | \n",
"
\n",
" \n",
" 4 | \n",
" 1944 | \n",
" 213.800 | \n",
" 0.100 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" fiscal_year gdp_billions deflator\n",
"0 1940 98.200 0.081\n",
"1 1941 116.200 0.084\n",
"2 1942 147.700 0.090\n",
"3 1943 184.600 0.096\n",
"4 1944 213.800 0.100"
]
},
"execution_count": 66,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# take 2: deflators\n",
"# - only take the 1st 3 columns\n",
"# - specify better header names\n",
"# - ignore header rows in spreadsheet\n",
"# - ignore the footer row\n",
"\n",
"deflators = pd.read_excel('data/hist10z1.xls', \n",
" parse_cols=2,\n",
" names=['fiscal_year', 'gdp_billions', 'deflator'],\n",
" header=None,\n",
" skiprows=5,\n",
" skip_footer=1\n",
" )\n",
"deflators.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Step 3: Clean\n",
"\n",
"The annoying stuff."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## For example\n",
"\n",
"* [Renaming](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rename.html \"pandas renaming\")\n",
"* [Handling missing values](http://pandas.pydata.org/pandas-docs/stable/missing_data.html \"pandas: working with missing data\")\n",
"* Delete unecessary data\n",
"* [Clean up strings](http://pandas.pydata.org/pandas-docs/stable/text.html \"working with strings\")\n",
"* [Tidy data](https://www.jstatsoft.org/article/view/v059i10 \"Tidy Data by Hadley Wickham\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Clean up outlays"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"Index(['agency_code', 'agency_name', 'bureau_code', 'bureau_name',\n",
" 'account_code', 'account_name', 'treasury_agency_code',\n",
" 'subfunction_code', 'subfunction_name', 'bea_category', 'grant_split',\n",
" 'on_off_budget', '1962', '1963', '1964', '1965', '1966', '1967', '1968',\n",
" '1969', '1970', '1971', '1972', '1973', '1974', '1975', '1976', 'TQ',\n",
" '1977', '1978', '1979', '1980', '1981', '1982', '1983', '1984', '1985',\n",
" '1986', '1987', '1988', '1989', '1990', '1991', '1992', '1993', '1994',\n",
" '1995', '1996', '1997', '1998', '1999', '2000', '2001', '2002', '2003',\n",
" '2004', '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012',\n",
" '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021'],\n",
" dtype='object')"
]
},
"execution_count": 67,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Rename some columns\n",
"\n",
"outlays = outlays.rename(columns = {\n",
" 'Agency Code': 'agency_code',\n",
" 'Agency Name': 'agency_name',\n",
" 'Bureau Code': 'bureau_code',\n",
" 'Bureau Name': 'bureau_name',\n",
" 'Account Code': 'account_code',\n",
" 'Account Name': 'account_name',\n",
" 'Treasury Agency Code': 'treasury_agency_code',\n",
" 'Subfunction Code': 'subfunction_code',\n",
" 'Subfunction Title': 'subfunction_name',\n",
" 'BEA Category': 'bea_category',\n",
" 'Grant/non-grant split': 'grant_split',\n",
" 'On- or Off- Budget': 'on_off_budget'\n",
" })\n",
"outlays.columns"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"# Extraneous information\n",
"\n",
"del outlays['TQ']"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" account_code treasury_agency_code 2012\n",
"0 NaN NaN 0\n",
"1 NaN NaN 0\n",
" account_code treasury_agency_code 2012\n",
"0 missing missing 0\n",
"1 missing missing 0\n"
]
}
],
"source": [
"# Missing values\n",
"\n",
"print(outlays[['account_code', 'treasury_agency_code', '2012']].head(2))\n",
"outlays = outlays.fillna({\n",
" 'account_code': 'missing',\n",
" 'treasury_agency_code': 'missing',\n",
" '1962': 0 # not needed here, but included as an example\n",
" })\n",
"print(outlays[['account_code', 'treasury_agency_code', '2012']].head(2))"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['Receipts, Central fiscal operations'\n",
" 'Charges for services to trust funds' 'Senate' ...,\n",
" 'Interest, Miscellaneous Trust Funds, Government-wide'\n",
" 'Interest, Other DOD Trust Funds'\n",
" 'Miscellaneous Unconverted Offsetting Receipts']\n",
"['receipts, central fiscal operations'\n",
" 'charges for services to trust funds' 'senate' ...,\n",
" 'interest, miscellaneous trust funds, government-wide'\n",
" 'interest, other dod trust funds'\n",
" 'miscellaneous unconverted offsetting receipts']\n",
"['receipts, central fiscal operations'\n",
" 'charges for services to trust funds' 'senate' ...,\n",
" 'interest, miscellaneous trust funds, government-wide'\n",
" 'interest, other DOD trust funds'\n",
" 'miscellaneous unconverted offsetting receipts']\n"
]
}
],
"source": [
"# string clean-up\n",
"print(pd.unique(outlays.account_name.ravel()))\n",
"outlays.account_name = outlays.account_name.str.strip().str.lower()\n",
"print(pd.unique(outlays.account_name.ravel()))\n",
"\n",
"# oops capitalize DOD again\n",
"outlays.account_name = outlays.account_name.str.replace(' dod ', ' DOD ') # regex works too\n",
"print(pd.unique(outlays.account_name.ravel()))\n",
"\n",
"outlays.subfunction_title = outlays.account_name.str.strip().str.lower()\n",
"outlays.bea_category = outlays.bea_category.str.lower()"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"2423"
]
},
"execution_count": 71,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# get discretionary spending only\n",
"\n",
"disc = outlays[outlays['bea_category'] == 'discretionary']\n",
"# gut check number of rows\n",
"len(disc.index)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Clean up deflators"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['1940' '1941' '1942' '1943' '1944' '1945' '1946' '1947' '1948' '1949'\n",
" '1950' '1951' '1952' '1953' '1954' '1955' '1956' '1957' '1958' '1959'\n",
" '1960' '1961' '1962' '1963' '1964' '1965' '1966' '1967' '1968' '1969'\n",
" '1970' '1971' '1972' '1973' '1974' '1975' '1976' 'TQ' '1977' '1978' '1979'\n",
" '1980' '1981' '1982' '1983' '1984' '1985' '1986' '1987' '1988' '1989'\n",
" '1990' '1991' '1992' '1993' '1994' '1995' '1996' '1997' '1998' '1999'\n",
" '2000' '2001' '2002' '2003' '2004' '2005' '2006' '2007' '2008' '2009'\n",
" '2010' '2011' '2012' '2013' '2014' '2015' '2016 estimate' '2017 estimate'\n",
" '2018 estimate' '2019 estimate' '2020 estimate' '2021 estimate']\n",
"\n",
"['1940' '1941' '1942' '1943' '1944' '1945' '1946' '1947' '1948' '1949'\n",
" '1950' '1951' '1952' '1953' '1954' '1955' '1956' '1957' '1958' '1959'\n",
" '1960' '1961' '1962' '1963' '1964' '1965' '1966' '1967' '1968' '1969'\n",
" '1970' '1971' '1972' '1973' '1974' '1975' '1976' 'TQ' '1977' '1978' '1979'\n",
" '1980' '1981' '1982' '1983' '1984' '1985' '1986' '1987' '1988' '1989'\n",
" '1990' '1991' '1992' '1993' '1994' '1995' '1996' '1997' '1998' '1999'\n",
" '2000' '2001' '2002' '2003' '2004' '2005' '2006' '2007' '2008' '2009'\n",
" '2010' '2011' '2012' '2013' '2014' '2015' '2016' '2017' '2018' '2019'\n",
" '2020' '2021']\n"
]
}
],
"source": [
"# clean up year column\n",
"# - get rid of the \"estimate\" text by applying a function to the fiscal_year Series\n",
"\n",
"print(deflators.fiscal_year.unique())\n",
"deflators.fiscal_year = deflators.fiscal_year.map(lambda x: x[:4])\n",
"print('\\n{}'.format(deflators.fiscal_year.unique()))"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Step 3A: Tidy Data \n",
"\n",
"> \n",
"> **_Datasets are often constructed in bizarre ways_**\n",
"> \n",
"\n",
"\\- Hadley Wickham, in his 2014 paper _Tidy Data_ ([Journal of Statistical Software, August 2014](https://www.jstatsoft.org/article/view/v059i10))"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Principles of Tidy Data\n",
"\n",
"* each variable is a column\n",
"* each observation is a row\n",
"* each type of observational unit is a table\n",
"* easy to model, manipulate, and visualize"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" agency_code | \n",
" agency_name | \n",
" bureau_code | \n",
" bureau_name | \n",
" account_code | \n",
" account_name | \n",
" treasury_agency_code | \n",
" subfunction_code | \n",
" subfunction_name | \n",
" bea_category | \n",
" grant_split | \n",
" on_off_budget | \n",
" fiscal_year | \n",
" amount | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 001 | \n",
" Legislative Branch | \n",
" 00 | \n",
" Legislative Branch | \n",
" missing | \n",
" receipts, central fiscal operations | \n",
" missing | \n",
" 803 | \n",
" Central fiscal operations | \n",
" mandatory | \n",
" Nongrant | \n",
" On-budget | \n",
" 1962 | \n",
" -628 | \n",
"
\n",
" \n",
" 1 | \n",
" 001 | \n",
" Legislative Branch | \n",
" 00 | \n",
" Legislative Branch | \n",
" missing | \n",
" receipts, central fiscal operations | \n",
" missing | \n",
" 908 | \n",
" Other interest | \n",
" net interest | \n",
" Nongrant | \n",
" On-budget | \n",
" 1962 | \n",
" 0 | \n",
"
\n",
" \n",
" 2 | \n",
" 001 | \n",
" Legislative Branch | \n",
" 00 | \n",
" Legislative Branch | \n",
" 241400 | \n",
" charges for services to trust funds | \n",
" missing | \n",
" 803 | \n",
" Central fiscal operations | \n",
" mandatory | \n",
" Nongrant | \n",
" On-budget | \n",
" 1962 | \n",
" 0 | \n",
"
\n",
" \n",
" 3 | \n",
" 001 | \n",
" Legislative Branch | \n",
" 05 | \n",
" Senate | \n",
" 0000 | \n",
" senate | \n",
" 00 | \n",
" 801 | \n",
" Legislative functions | \n",
" discretionary | \n",
" Nongrant | \n",
" On-budget | \n",
" 1962 | \n",
" 26946 | \n",
"
\n",
" \n",
" 4 | \n",
" 001 | \n",
" Legislative Branch | \n",
" 05 | \n",
" Senate | \n",
" 0100 | \n",
" compensation of members, senate | \n",
" 00 | \n",
" 801 | \n",
" Legislative functions | \n",
" mandatory | \n",
" Nongrant | \n",
" On-budget | \n",
" 1962 | \n",
" 0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" agency_code agency_name bureau_code bureau_name \\\n",
"0 001 Legislative Branch 00 Legislative Branch \n",
"1 001 Legislative Branch 00 Legislative Branch \n",
"2 001 Legislative Branch 00 Legislative Branch \n",
"3 001 Legislative Branch 05 Senate \n",
"4 001 Legislative Branch 05 Senate \n",
"\n",
" account_code account_name treasury_agency_code \\\n",
"0 missing receipts, central fiscal operations missing \n",
"1 missing receipts, central fiscal operations missing \n",
"2 241400 charges for services to trust funds missing \n",
"3 0000 senate 00 \n",
"4 0100 compensation of members, senate 00 \n",
"\n",
" subfunction_code subfunction_name bea_category grant_split \\\n",
"0 803 Central fiscal operations mandatory Nongrant \n",
"1 908 Other interest net interest Nongrant \n",
"2 803 Central fiscal operations mandatory Nongrant \n",
"3 801 Legislative functions discretionary Nongrant \n",
"4 801 Legislative functions mandatory Nongrant \n",
"\n",
" on_off_budget fiscal_year amount \n",
"0 On-budget 1962 -628 \n",
"1 On-budget 1962 0 \n",
"2 On-budget 1962 0 \n",
"3 On-budget 1962 26946 \n",
"4 On-budget 1962 0 "
]
},
"execution_count": 73,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# tidy the outlays (aka spending) dataset\n",
"# - variables = agency, bureau, account, year, subfunction, etc.\n",
"# - observation = dollar amount\n",
"\n",
"# list of dataframe columns to use as identifier variables\n",
"# (all columns not listed will be unpivoted and added an additional identifier columm)\n",
"variables = ['agency_code', 'agency_name', 'bureau_code', 'bureau_name', 'account_code', 'account_name', 'treasury_agency_code', 'subfunction_code', 'subfunction_name', 'bea_category', 'grant_split', 'on_off_budget'] \n",
"outlays = pd.melt(outlays,\n",
" id_vars=variables,\n",
" # name of the new, unpivoted identifier column\n",
" var_name='fiscal_year',\n",
" # name of the value column for the unpivoted data\n",
" value_name='amount'\n",
" )\n",
"\n",
"outlays.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Useful Pandas Tidy-ing Functions\n",
"\n",
"* [melt](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.melt.html \"pandas.melt\")\n",
"* [stack](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.stack.html \"pandas.DataFrame.stack\")\n",
"* [unstack](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.unstack.html \"pandas.DataFrame.unstack\")\n",
"* [pivot_table](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.pivot_table.html \"pandas.pivot_table\")\n",
"\n",
"See also:\n",
"* Pandas docs: [Reshaping and pivot tables](http://pandas.pydata.org/pandas-docs/stable/reshaping.html \"Pandas: reshaping and pivot tables\")\n",
"* [Tidy Data in Python](https://www.ibm.com/developerworks/community/blogs/jfp/entry/Tidy_Data_In_Python?lang=en \"Tidy Data in Python\")\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Step 4: Fun Stuff"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Calculating New Columns\n",
"\n",
"We need inflation-adjusted amounts to look at spending trends over time."
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" agency_code | \n",
" agency_name | \n",
" bureau_code | \n",
" bureau_name | \n",
" account_code | \n",
" account_name | \n",
" treasury_agency_code | \n",
" subfunction_code | \n",
" subfunction_name | \n",
" bea_category | \n",
" grant_split | \n",
" on_off_budget | \n",
" fiscal_year | \n",
" amount | \n",
" gdp_billions | \n",
" deflator | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 001 | \n",
" Legislative Branch | \n",
" 00 | \n",
" Legislative Branch | \n",
" missing | \n",
" receipts, central fiscal operations | \n",
" missing | \n",
" 803 | \n",
" Central fiscal operations | \n",
" mandatory | \n",
" Nongrant | \n",
" On-budget | \n",
" 1962 | \n",
" -628 | \n",
" 586.900 | \n",
" 0.178 | \n",
"
\n",
" \n",
" 1 | \n",
" 001 | \n",
" Legislative Branch | \n",
" 00 | \n",
" Legislative Branch | \n",
" missing | \n",
" receipts, central fiscal operations | \n",
" missing | \n",
" 908 | \n",
" Other interest | \n",
" net interest | \n",
" Nongrant | \n",
" On-budget | \n",
" 1962 | \n",
" 0 | \n",
" 586.900 | \n",
" 0.178 | \n",
"
\n",
" \n",
" 2 | \n",
" 001 | \n",
" Legislative Branch | \n",
" 00 | \n",
" Legislative Branch | \n",
" 241400 | \n",
" charges for services to trust funds | \n",
" missing | \n",
" 803 | \n",
" Central fiscal operations | \n",
" mandatory | \n",
" Nongrant | \n",
" On-budget | \n",
" 1962 | \n",
" 0 | \n",
" 586.900 | \n",
" 0.178 | \n",
"
\n",
" \n",
" 3 | \n",
" 001 | \n",
" Legislative Branch | \n",
" 05 | \n",
" Senate | \n",
" 0000 | \n",
" senate | \n",
" 00 | \n",
" 801 | \n",
" Legislative functions | \n",
" discretionary | \n",
" Nongrant | \n",
" On-budget | \n",
" 1962 | \n",
" 26946 | \n",
" 586.900 | \n",
" 0.178 | \n",
"
\n",
" \n",
" 4 | \n",
" 001 | \n",
" Legislative Branch | \n",
" 05 | \n",
" Senate | \n",
" 0100 | \n",
" compensation of members, senate | \n",
" 00 | \n",
" 801 | \n",
" Legislative functions | \n",
" mandatory | \n",
" Nongrant | \n",
" On-budget | \n",
" 1962 | \n",
" 0 | \n",
" 586.900 | \n",
" 0.178 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" agency_code agency_name bureau_code bureau_name \\\n",
"0 001 Legislative Branch 00 Legislative Branch \n",
"1 001 Legislative Branch 00 Legislative Branch \n",
"2 001 Legislative Branch 00 Legislative Branch \n",
"3 001 Legislative Branch 05 Senate \n",
"4 001 Legislative Branch 05 Senate \n",
"\n",
" account_code account_name treasury_agency_code \\\n",
"0 missing receipts, central fiscal operations missing \n",
"1 missing receipts, central fiscal operations missing \n",
"2 241400 charges for services to trust funds missing \n",
"3 0000 senate 00 \n",
"4 0100 compensation of members, senate 00 \n",
"\n",
" subfunction_code subfunction_name bea_category grant_split \\\n",
"0 803 Central fiscal operations mandatory Nongrant \n",
"1 908 Other interest net interest Nongrant \n",
"2 803 Central fiscal operations mandatory Nongrant \n",
"3 801 Legislative functions discretionary Nongrant \n",
"4 801 Legislative functions mandatory Nongrant \n",
"\n",
" on_off_budget fiscal_year amount gdp_billions deflator \n",
"0 On-budget 1962 -628 586.900 0.178 \n",
"1 On-budget 1962 0 586.900 0.178 \n",
"2 On-budget 1962 0 586.900 0.178 \n",
"3 On-budget 1962 26946 586.900 0.178 \n",
"4 On-budget 1962 0 586.900 0.178 "
]
},
"execution_count": 74,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# merge discretionary outlays and deflators dataframes\n",
"# - by default Pandas will use like column names as the merge key\n",
"# - this is very intuitive for SQL nerds\n",
"\n",
"outlays = outlays.merge(deflators, how='left')\n",
"outlays.head()"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" fiscal_year | \n",
" amount | \n",
" amount_adj | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1962 | \n",
" -628 | \n",
" -3522.154 | \n",
"
\n",
" \n",
" 1 | \n",
" 1962 | \n",
" 0 | \n",
" 0.000 | \n",
"
\n",
" \n",
" 2 | \n",
" 1962 | \n",
" 0 | \n",
" 0.000 | \n",
"
\n",
" \n",
" 3 | \n",
" 1962 | \n",
" 26946 | \n",
" 151127.314 | \n",
"
\n",
" \n",
" 4 | \n",
" 1962 | \n",
" 0 | \n",
" 0.000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" fiscal_year amount amount_adj\n",
"0 1962 -628 -3522.154\n",
"1 1962 0 0.000\n",
"2 1962 0 0.000\n",
"3 1962 26946 151127.314\n",
"4 1962 0 0.000"
]
},
"execution_count": 75,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Add calculated column to represent spending amount in 2009 dollars\n",
"\n",
"outlays['amount_adj'] = outlays.amount / outlays.deflator\n",
"outlays[['fiscal_year', 'amount', 'amount_adj']].head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Grouping\n",
"\n",
"Summarize spending by category."
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" budget_function_desc | \n",
"
\n",
" \n",
" budget_function_code | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 010 | \n",
" national defense | \n",
"
\n",
" \n",
" 150 | \n",
" international affairs | \n",
"
\n",
" \n",
" 250 | \n",
" general science, space, and technology | \n",
"
\n",
" \n",
" 270 | \n",
" energy | \n",
"
\n",
" \n",
" 300 | \n",
" natural resources and environment | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" budget_function_desc\n",
"budget_function_code \n",
"010 national defense\n",
"150 international affairs\n",
"250 general science, space, and technology\n",
"270 energy\n",
"300 natural resources and environment"
]
},
"execution_count": 76,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# get a list of budget function codes from sqlite & tweak to use as a lookup\n",
"\n",
"from sqlalchemy import create_engine\n",
"db_engine = create_engine('sqlite:///data/pandas_data_munging.db')\n",
"functions = pd.read_sql_query(\n",
" 'SELECT budget_function_code, budget_function_desc FROM budget_function', db_engine\n",
")\n",
"\n",
"# index/sort the dataframe by budget_function_code\n",
"functions = functions.set_index(keys='budget_function_code').sort_index()\n",
"functions.head()"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" subfunction_name | \n",
" function_name | \n",
"
\n",
" \n",
" \n",
" \n",
" 305155 | \n",
" Interest received by on-budget trust funds | \n",
" net interest | \n",
"
\n",
" \n",
" 305156 | \n",
" Employer share, employee retirement (on-budget) | \n",
" undistributed offsetting receipts | \n",
"
\n",
" \n",
" 305157 | \n",
" Interest received by on-budget trust funds | \n",
" net interest | \n",
"
\n",
" \n",
" 305158 | \n",
" Interest received by on-budget trust funds | \n",
" net interest | \n",
"
\n",
" \n",
" 305159 | \n",
" Deductions for offsetting receipts | \n",
" general government | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" subfunction_name \\\n",
"305155 Interest received by on-budget trust funds \n",
"305156 Employer share, employee retirement (on-budget) \n",
"305157 Interest received by on-budget trust funds \n",
"305158 Interest received by on-budget trust funds \n",
"305159 Deductions for offsetting receipts \n",
"\n",
" function_name \n",
"305155 net interest \n",
"305156 undistributed offsetting receipts \n",
"305157 net interest \n",
"305158 net interest \n",
"305159 general government "
]
},
"execution_count": 77,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# use the budget function information from sqllite to add this\n",
"# higher-level category to our discretionary spending dataframe\n",
"# (might not be the best approach but demos a few Pandas things)\n",
"\n",
"def get_budget_function(sf):\n",
" i = functions.index.get_loc(sf, method='ffill')\n",
" return functions.iloc[i].budget_function_desc\n",
"\n",
"subfunctions = pd.DataFrame(\n",
" outlays.subfunction_code.unique(), \n",
" columns=['subfunction_code']\n",
")\n",
"subfunctions['function_name'] = subfunctions.subfunction_code.map(\n",
" get_budget_function)\n",
"outlays = outlays.merge(subfunctions, how='left')\n",
"outlays[['subfunction_name', 'function_name']].tail()"
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" | \n",
" amount_adj | \n",
"
\n",
" \n",
" fiscal_year | \n",
" function_name | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 2021 | \n",
" income security | \n",
" 482216528.388 | \n",
"
\n",
" \n",
" international affairs | \n",
" 42545143.973 | \n",
"
\n",
" \n",
" medicare | \n",
" 610920774.361 | \n",
"
\n",
" \n",
" national defense | \n",
" 498161704.897 | \n",
"
\n",
" \n",
" natural resources and environment | \n",
" 37465430.291 | \n",
"
\n",
" \n",
" net interest | \n",
" 467040832.927 | \n",
"
\n",
" \n",
" social security | \n",
" 999925166.748 | \n",
"
\n",
" \n",
" transportation | \n",
" 105579144.298 | \n",
"
\n",
" \n",
" undistributed offsetting receipts | \n",
" -84646982.268 | \n",
"
\n",
" \n",
" veterans benefits and services | \n",
" 169685212.299 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" amount_adj\n",
"fiscal_year function_name \n",
"2021 income security 482216528.388\n",
" international affairs 42545143.973\n",
" medicare 610920774.361\n",
" national defense 498161704.897\n",
" natural resources and environment 37465430.291\n",
" net interest 467040832.927\n",
" social security 999925166.748\n",
" transportation 105579144.298\n",
" undistributed offsetting receipts -84646982.268\n",
" veterans benefits and services 169685212.299"
]
},
"execution_count": 78,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# group spending by year and category\n",
"grouped = outlays.groupby(['fiscal_year', 'function_name'])\n",
" \n",
"grouped.agg({'amount_adj': 'sum'}).tail(10)"
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAfkAAAFhCAYAAABzg9PKAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzs3Xl4VOX5PvB7lkyWmewZkpCQBUhYwhICCgiIgoisgqzR\nAhWwaou2orYu3yqtlqXgVhRUbKul/RUkbgQVKYtg2QkkIYEECEkgkH2fPZPz/v6IHUUgzEAmk8zc\nn+vyCjNnzswzjwN3zpn3vK9MCCFAREREbkfu6gKIiIjIORjyREREboohT0RE5KYY8kRERG6KIU9E\nROSmGPJERERuym1CPisrC/PmzWv1MStWrMDMmTMxe/ZsZGRktFNlRERErqF0dQFtYcOGDdi6dSt8\nfX2v+5i8vDycOHECW7ZsQXFxMZYuXYpPP/20HaskIiJqX25xJB8TE4O1a9fabufn52PevHmYN28e\nnnjiCTQ2NqJLly7w8fGBxWKBTqeDUukWv98QERFdl1uE/Pjx468I7d///vd4+eWXsXHjRtx55534\n4IMPoFQqIZfLMWHCBDz88MNYuHChCysmIiJyPrc8nC0oKMAf/vAHAEBTUxPi4uLw+eefIywsDH/9\n61+h1+vx4IMPIjk5GRERES6uloiIyDncMuTj4+OxatUqdO3aFRkZGaisrITJZIKfnx8UCgXUajVU\nKhUMBoOrSyUiInIatwz5ZcuW4Xe/+x2sVitkMhn+9Kc/ISYmBsePH8fcuXPR3NyMKVOmoHv37q4u\nlYiIyGlkXIWOiIjIPbnFwDsiIiK6GkOeiIjITXX67+QrKxtdXUK7Cg72Q20tBwzai/1yDPtlP/bK\nMeyXY1rrl1brb/fz8Ei+k1EqFa4uoVNhvxzDftmPvXIM++WYtuoXQ56IiMhNMeSJiIjcFEOeiIjI\nTTHkiYiI3BRDnoiIyE0x5ImIiNwUQ56IiMhNMeSJiIjcFEOeiIjITTHkiYiI3BRDnoiIqJOoaTA5\n9HiGPBERUSfQaLBg5b+OO7QPQ56IiKiDszZLeOezHFTV80ieiIjIbQghsPGbfJy5WIchvbQO7cuQ\nJyIi6sD+c/QivssuRWy4PxZN7uvQvgx5IiKiDiq7oBqb95xDoEaFJ2b0h7eXY+vMM+SJiIg6oEtV\nery3NQdKhRxPPDAAIQE+Dj8HQ56IiKiDadBbsDYtG0ZzMx6e2Bvduwbc1PMo27guIiIiugXWZglv\nfHQUFXVGTL4jDsP6Rtz0c/FInoiIqIMQQuBf/zmDkwVVGNxLi2mj4m/p+RjyREREHcTOYyXYm3kZ\n3aMCsXhSX8hlslt6Pqedrm9qasILL7yAS5cuwWKx4PHHH8fYsWNt23fv3o133nkHSqUSM2bMwOzZ\nsyFJEpYtW4b8/HyoVCq8+uqriI2NdVaJREREHUZ2QRU27T6LQLUK//fwUMBqveXndFrIb926FUFB\nQVi9ejXq6uowbdo0W8g3NTVhxYoVSEtLg6+vL1JTUzFmzBgcP34cFosFmzdvRmZmJlauXIn169c7\nq0QiIqIOoaRCh3e/yIVSIceTMwdAG+yLysrGW35ep4X8fffdh/HjxwNo+Y5Bofjh2r6CggLExMQg\nMDAQADB48GAcPXoUmZmZGDVqFAAgOTkZOTk5ziqPiIioQ2jQW/BWWjZMlmY8dn8S4iNvbiT9tTgt\n5NVqNQBAp9PhySefxG9+8xvbNp1OB39//yseq9PpoNPpoNFobPcrFApYrVYoldcvMzjYD0qlY5MD\ndHZarf+NH0Q27Jdj2C/7sVeOYb+uZmlqxp//fQLVDSY8dF9vTLqzp21bW/TLqZfQlZaW4le/+hUe\nfPBBTJkyxXa/RqOBXq+33dbr9fD397/qfkmSWg14AKitNbR94R2YVuvfJqdwPAX75Rj2y37slWPY\nr6sJIbBh2ynkFddiWN9wjBkYaetRa/1yJPydNrq+qqoKCxcuxLPPPouZM2desa1Hjx4oLi5GXV0d\nLBYLjh07hkGDBiElJQX79u0DAGRmZiIxMdFZ5REREbnUtgNFOJRbjh5RAXh4Ym/IbnEk/bU47Uj+\n3XffRUNDA9atW4d169YBAGbNmgWj0Yg5c+bgueeew6JFiyCEwIwZMxAeHo5x48Zh//79mDt3LoQQ\nWL58ubPKIyIicpmjeRX47LtChAZ4Y8kDA+DlpK+dZUII4ZRnbieedvqHp7wcw345hv2yH3vlGPbr\nB4WlDVj5r+OQy2V48WeDEd1Fc9VjOvzpeiIiIrqStVnC+1tzYW2W8NjUpGsGfFtiyBMREbWTXRkl\nKK81YsygaAzsGeb012PIExERtYMGgwVb9xdB7aPE/bc4J729GPJERETt4PPvCmE0WzF1ZDw0vl7t\n8poMeSIiIie7WKHD3sxLiAz1w92DotrtdRnyRERETiSEwKZdZyEEMHdsApSK9otehjwREZETZZ6t\nwuniWvTvHor+3UPb9bUZ8kRERE7SZJWwefc5yGUyzB3b88Y7tDGGPBERkZPsyihBRZ0RY1KiEBmq\nbvfXZ8gTERE5QYPegvQDhVD7KDF1ZPtcMvdTDHkiIiIn+HTfeRjNzZg2qnu7XTL3Uwx5IiKiNnah\nvBHfZV1G1zA17hrU1WV1MOSJiIjakO2SOQBzx/aEQu66qGXIExERtaGdx0qQd6EOA3uEol98+14y\n91MMeSIiojZyMKcM/951FoFqFR4al+jqchjyREREbSHrXBX++uVp+HkrsXROMsKCfF1dEkOeiIjo\nVp25WId1n+dAqZDh17MGoJuT14m3F0OeiIjoFlwob8RbadmQJIFfTu+HhOggV5dkw5AnIiK6SRW1\nBrz+cRZMZisWTeqDAT3CXF3SFRjyREREN6G20Yw1mzLRoLfgwXGJGJYU4eqSrsKQJyIicpDe1ITX\nP85EVb0J94+Mx9jB0a4u6ZoY8kRERA4wNzXjrS3ZuFSpx9iUaEwdEefqkq6LIU9ERGSnZknCe1/k\n4tylegztG47UcQmQyWSuLuu6GPJERER2EELgH9vzkXmuCklxwVg0qQ/kHTjgAYY8ERGRXT77rhDf\nZZciNsIfv5zeH0pFx4/Qjl8hERGRi+0+XoJtB4rQJcgXT80aCF9vpatLsgtDnoiIqBXH8irwrx1n\nEODnhaVzBiJArXJ1SXZjyBMREV1HXnEt3k/PhUqlwFOzk9El2M/VJTmEIU9ERHQNF8obsfbTbAgB\nLHmgP2Ij/F1dksMY8kRERD9RXW/CG1uyYDQ3Y/HkvkiKC3F1STfFqSMHsrKysGbNGmzcuNF2X2Vl\nJZYuXWq7ffr0aTz99NNITU3F9OnTodG0rNwTHR2NFStWOLM8IiKia9q8+yzqdRbMHdMTQ/uGu7qc\nm+a0kN+wYQO2bt0KX98r19PVarW20D9x4gTeeOMNzJ49G2azGUKIK34hICKijq+wtAEb0k9hxuju\nGNyri6vLuWUFl+txLL8SPboGYNxt3Vxdzi1x2un6mJgYrF279rrbhRB45ZVXsGzZMigUCuTl5cFo\nNGLhwoWYP38+MjMznVUaERG1oYz8SpTVGLDu8xx8m3nJoX1rG81I+7YA5TUGJ1XnGCEEtuwpAADM\nurtnh57Nzh5OO5IfP348SkpKrrt99+7dSEhIQPfu3QEAPj4+WLRoEWbNmoWioiI88sgj2L59O5TK\n1ksMDvaDUqlo09o7Oq228w3+cCX2yzHsl/3YqxaXvw9oja8K/9ieDwkyzL4n8aqA/Gm/jp0uxxv/\nPo4GvQUnzlbh9adGQ+Pr1W51X8vRU2U4c7EOt/eNwIgU1x7Ft8Xny2VX82/duhXz58+33Y6Pj0ds\nbCxkMhni4+MRFBSEyspKREZGtvo8tbUd47e/9qLV+qOystHVZXQa7Jdj2C/7sVcthBA4d7EO2iAf\nPDU7Ga9tysQ/t+ehtFKH1HsSbNO+/rhf1mYJn+47j+2HL0CpkKFffAhyCmuw4u+H8eTMAS6bKlaS\nBP66NQcyGTBleIxL//+29vlyJPxdNro+JycHKSkptttpaWlYuXIlAKC8vBw6nQ5ardZV5RERkR2q\nG0zQGZsQGxGAiBA/vDBvMKK0auzKKMGG9FOwNktXPL6qzohV/zqO7YcvoEuwL16cNwS/mTUQSfEh\nyC6oRvr+Ite8EQD7c0pxqVKPEf0jEaXVuKyOttRuIZ+eno7NmzcDAGpqaqDRaK44lTNz5kw0NjYi\nNTUVTz31FJYvX37DU/VERORaxWUtR5tx319DHuzvjeceSkHP6EAcPlWOt9KyYbJYAQAZ+RVY9vej\nKLjcgGF9w/Hyz29DbIQ/5HIZHp2ahLBAH3zx30Jknqtq9/dhaWrG598Vwkspx7SR8e3++s4iE0II\nVxdxKzztdBlPETqG/XIM+2U/9qrFJ3sL8OXBYjw9JxlJ8T9cS25uasb6z3OQXVCN+MgA9I4PwdcH\niqBSyvHQuESMHBB51Xf2xWWNWP7PDCgVcrz08yEIb8fZ5b4+VIwt3xZg4rBYzLyrR7u97vV0+tP1\nRETU+f3vSP6ns8F5eymw5IH+uKNfBApLG/D1gSJEhanx+wVDMGpg12uOWo+N8MeC+3rBaLbi7U9P\n2s4AOJvO2IQvDxZD7aPExGEx7fKa7YXnw4mI6KYIIVBU1oiwQJ9rjopXKuRYOKkPIkP9IGRyjBsc\nBW+v1q+GuqNfJAovN2LX8RJ8+HUeHp2a5PTL2L48WASD2Yo5Y3rCz8e1o/vbGkOeiIhuSm2jGTpj\nE3rFBF33MXKZDJOGxzn09cacsT1RXNGII6crEB8ZgPG3O+/ouqreiF0ZJQgN8MGYlGinvY6r8HQ9\nERHdlKKfDLprK0qFHL+c1g+BahW27CnA6eLaNn3+H/v8u0JYmwUeuLM7vJTuF4nu946IiKhdFF3n\n+/i2EKTxxi+n94NMBrz7RQ5qG81t/hoXyhtxMKcM3bpoMDSp885P3xqGPBER3RTboLtw58z8lxAd\nhNljeqLR0IQN6bmQpLa7GEwIgbRvCyAAzLqrh8sm4HE2hjwRETlMCIHisgaEBnjD30/ltNe5Z3A0\nBiWEIe9CHdIPFLXZ835z5CJyCmuQFBd8xaV/7oYhT0REDqttNKPB0DLTnTPJZDIsnNQHoQE+2Lq/\nEHlt8P18zvlqbPn2HII0Kiya3LfTL0LTGoY8ERE5rLjced/H/5TaxwuP3p8EGWR4Lz0XDQbLTT9X\neY0B736RC4VcjiUPDECQxrsNK+14GPJEROSwn05n62w9owIxY3R31Oss+GDbKUg3MVmr0WzFXz7J\nhsFsxYL7eqF7V+eehegIGPJEROQwZ46sv57xQ2NaVqw7X4NvjlxwaF9JCGxIP4XSagPuva0bRvRv\nfYVTd8GQJyIihxWXNSIkwBsBThx091NymQyLJ/dFoEaFT/eeR8Glerv3/eK7loVv+sYFY9bdrp+b\nvr0w5ImIyCG1jWbU6y1Ou3SuNQFqFX4xJQmSJPDuF7nQm5puuM+xvAqkHyiCNsgHj93fDwq550Sf\n57xTIiJqE+056O5a+sQGY8qIOFQ3mPD3r/LQ2mKqJRU6/PXL0/D2UuCJGQOuOce+O+Pc9URE5JD2\nHnR3LVNHxOPMxTocP1OJP350DEFqFfx8vKD2VULj4wW1rxf8fJT4bN95mJua8avp/RGt1bisXldh\nyBMRkUN+WF7WdaPT5XIZHpmShLfSsnCxXIfiVo7mp46Iw+Be2nasruNgyBMRkUOKyhoQ7O+NQHX7\nDbq7lmB/byx7+HYIIWCyNENvbILeZIXO1ASDyQq9sQlqXy+PDXiAIU9ERA6o15lRp7MguWeYq0ux\nkclk8PVWwtdbiY5TVcfAgXdERGQ3V1wfTzePIU9ERHZz9ch6cgxDnoiI7NYRRtaT/RjyRERkt6Ky\nRgRqVG6/sIu7YMgTEZFdGvQW1DaaEeeCme7o5jDkiYjILhx01/kw5ImIyC4cdNf5MOSJiMguPwy6\nc/912N0FQ56IiOxSXNaAALUKQRrXznRH9mPIExHRDTUaLKhuMCMuwh8ymczV5ZCdGPJERHRDtkVp\nOLK+U2HIExHRDRVxEpxOyakhn5WVhXnz5l11/4cffohJkyZh3rx5mDdvHs6fPw9JkvDSSy9hzpw5\nmDdvHoqLi51ZGhEROYAj6zsnp61Ct2HDBmzduhW+vr5XbcvJycGqVavQr18/2307duyAxWLB5s2b\nkZmZiZUrV2L9+vXOKo+IiOwkhEBRaQP8/bwQ7M+Z7joTpx3Jx8TEYO3atdfclpubi/fffx+pqal4\n7733AAAZGRkYNWoUACA5ORk5OTnOKo2IiBxw8nw1qhvM6BsXwkF3nYzTjuTHjx+PkpKSa26bNGkS\nHnzwQWg0GixZsgR79uyBTqeDRqOxPUahUMBqtUKpbL3E4GA/KJWKNq29o9NqebrMEeyXY9gv+3lC\nr4QQ+Or/HQcA/Gxi31t6z57Qr7bUFv1yWshfjxACCxYsgL9/S/GjR4/GqVOnoNFooNfrbY+TJOmG\nAQ8AtbUGp9XaEWm1/qisbHR1GZ0G++UY9st+ntKr7IJqnLlQh8G9tFArZTf9nj2lX22ltX45Ev7t\nPrpep9Nh8uTJ0Ov1EELg8OHD6NevH1JSUrBv3z4AQGZmJhITE9u7NCIi+hEhBLbuLwQATB0R7+Jq\n6Ga025F8eno6DAYD5syZg6eeegrz58+HSqXC8OHDMXr0aEiShP3792Pu3LkQQmD58uXtVRoREV1D\nTmENzl9uwOBELbp10dx4B+pwZEII4eoiboWnnf7hKS/HsF+OYb/s5+69EkJg+cYMFFxuwLKHb0PM\nLU6C4+79amud9nQ9ERF1fLmFNSi43ICURO0tBzy5DkOeiIiuIITAF7bv4uNcWwzdEoY8ERFdIbeo\nBgWXGjAoIYxH8Z0cQ56IiGyEENj63yIAHFHvDhjyRERkc6qoFucu1WNQQhjnqXcDDHkiIgLw0+/i\neRTvDhjyREQEADhVXItzJfVI7smjeHfBkCciou+/i285ir9/JI/i3UW7z11PRETtTwiBAzllKKsx\nQJIEmiXR8lO0/DSarTjLo3i3w5AnIvIAn+w9j68OFbf6GIVcxqN4N8OQJyJyczuOXMBXh4oRHuyL\nBff1hpeXHAq5DHKZrOWnvOWnn48XNL5eri6X2hBDnojIjR3MKcOm3ecQqFHh6TnJCAvydXVJ1I44\n8I6IyE1lF1Tjb1+dhp+3Ek/PZsB7IoY8EZEbKrhUj3Wfn4RcLsOTMwcgmkvFeiSGPBGRm7lcpceb\nW7JgtQo8fn8/JHYLcnVJ5CIMeSIiN1LTYMJrmzOhN1nx8wm9kZwQ5uqSyIUY8kREbkJnbMJrmzNR\n22jGrLt6YOSASFeXRC7W6uj66dOn47PPPkPv3r0hk8kghLhiu6+vL2bPno3nn3/eqUUSEVHrmqzN\n+Msn2SitNmD87d1w39AYV5dEHUCrIf/ZZ58BAPLy8q65vbGxERMmTGDIExG5kBACf/86D+dK6nF7\nny6YdXdPyGQyV5dFHUCrIf/222+3uvOSJUvw8ccft2lBRETkmPQDRTiUW44eUQFYNKkP5Ax4+p5d\n38lnZ2djx44dkMvlUKlU2Lt3L86dOwcA6Nq1q1MLJCKi6ztyuhyff1eIsEAfPPHAAHgpFa4uiTqQ\nVo/klyxZAgCYO3cuNm/eDF/flokUFixYgPnz5zu/OiIiuq6CS/X4YNtp+KgUeHLmAASoVa4uiToY\nu47ka2trr/h+p6mpCXV1dU4rioiIWldVZ8TaT7LRLEl4fFo/RGs52Q1dza6562fNmoUZM2bgzjvv\nhCRJ+Pbbb3kkT0TkIkazFW99ko0GQxMeGpeI/t1DXV0SdVB2hfzixYsxbNgwHDlyBDKZDG+99RZ6\n9+7t7NqIiOgnmiUJ736Ri0uVeowdHI2xg6NdXRJ1YHadrrdYLCgtLUVISAiCg4ORm5uLt956y9m1\nERHRT2zadQ4nz1ejf/dQzB3b09XlUAdn15H8kiVLYDQaceHCBQwZMgRHjx5FcnKys2sjIqLvSUJg\ny55z2JVRgiitGo/dnwSFnJOWUuvs+oQUFhbiH//4B8aNG4fFixdjy5YtqKiocHZtREQEwNos4YNt\np/DNkYuIDPXDU7MGwtfbrmM08nB2hXxoaChkMhni4+ORn5+P8PBwWCwWZ9dGROTxTBYr3krLbpns\npmsAnv/ZYIQE+Li6LOok7PpVMCEhAa+88gpSU1PxzDPPoKKiAk1NTc6ujYjIozXoLXhzSxaKyhox\nsEcoHpvWD95enOyG7GdXyC9btgwnTpxAz5498cQTT+DgwYN47bXXAACVlZXQarXX3C8rKwtr1qzB\nxo0br7h/27Zt+Oijj6BQKJCYmIhly5ZBLpdj+vTp0GharvWMjo7GihUrbuW9ERF1WpV1Rry2ORMV\ntUaM7B+JBRN68Tt4cphdIa9QKDBkyBAAwNixYzF27Fjbtl/84he2hWx+bMOGDdi6dattlrz/MZlM\nePPNN5Geng5fX18sXboUe/bswciRIyGEuOoXAiIiT3OhvBFvfJyFer0Fk4bH4oE7u3PBGbopt/xr\n4U+Xn/2fmJgYrF279qr7VSoVNm3aZAt/q9UKb29v5OXlwWg0YuHChZg/fz4yMzNvtTQiok4nr7gW\nK/91HA16C1LvScCM0T0Y8HTTbnl45vU+fOPHj0dJSclV98vlcoSFhQEANm7cCIPBgBEjRuDMmTNY\ntGgRZs2ahaKiIjzyyCPYvn07lMrWSwwO9oPSwxZk0Gr9XV1Cp8J+OYb9sl9b9+pypQ5vf3YS1mYJ\nz/5sCEYNimrT53c1frYc0xb9csk1GJIkYfXq1SgsLMTatWttI/djY2Ntfw4KCkJlZSUiIyNbfa7a\nWkM7Vd0xaLX+qKxsdHUZnQb75Rj2y35t3SuzpRl/2ngMBpMViyf3Qe/oALf6f8HPlmNa65cj4e+S\nURwvvfQSzGYz1q1bZzttn5aWhpUrVwIAysvLodPprjugj4jInQgh8I9v8lBSqcfdg6JwR7/WD26I\n7HXLR/LX+07+p9LT02EwGNCvXz+kpaVhyJAhWLBgAQBg/vz5mDlzJp5//nmkpqZCJpNh+fLlNzxV\nT0TkDr49cQkHc8sRHxmAuWMTXF0OuRGZsCOl33vvPTz66KNX3Pf6669j6dKlOHbsmG3kvSt42ukf\nnvJyDPvlGPbLfm3Vq4LL9Vj5z+Pw9Vbi5Z/fhtBA95zohp8tx7TV6fpWD5XXrFmD6upq7N69G0VF\nRbb7rVYrsrOzsXTpUpcGPBFRZ9ZgsGDdZzmQJIFHpya5bcCT67Qa8vfeey8KCgpw6NAh3H777bb7\nFQoFfvWrXzm9OCIidyVJAu9vzUVtoxnT7+yOpPgQV5dEbqjVkB8wYAAGDBiAe+65B/7+vPSBiKit\nfP7f8zhVVIuBPUIxaXisq8shN2XXyLadO3di5cqVaGhoANAy2E4mk+H06dNOLY6IyB1lnq3CtgPF\n0Ab5YPGUvpBzshtyErtC/u2338bGjRuRmJjo7HqIiNxaRZ0RG7adgpdSjl9N7w+1j5erSyI3Ztd1\n8uHh4Qx4IqJbZG2WsGFrLoxmK352byJiwvk1KDmXXUfySUlJePLJJzFixAh4e3vb7p82bZrTCiMi\ncjfp+4tQcLkBQ/uGY2R/TnhDzmdXyOt0OqjV6qsWjWHIExHZ58zFOmw7WITQAB/Mu7cXF52hdmFX\nyHNddyKim2cwNWFDei4A4BdT+8LPh7N5Uvuw65M2ZsyYa/7WuWvXrjYviIjInQghsHHHGVQ3mDF1\nRBwSooNcXRJ5ELtCfuPGjbY/W61W/Oc//4HFYnFaUURE7uJgbhkOnypHj64BmDIiztXlkIexa3R9\nVFSU7b/Y2FgsXrwYO3fudHZtRESdWkWdEf/ccQY+KgUemZoEhdwlC3+SB7PrSP7o0aO2PwshcPbs\nWZjNZqcVRUTU2TVLLZfLmSzNWDy5D7oE+bq6JPJAdoX8X/7yF9ufZTIZgoODbWu/ExHR1X58udzw\npAhXl0MeyqHv5HU6HSRJQkBAgFOLIiLqzM5crEP6AV4uR65nV8hfvHgRTz31FC5evAghBLp27Yo3\n3ngD8fHxzq6PiKhTkITA2Yt1+O/JUhzLqwTAy+XI9ez69L300ktYvHgx7rvvPgDAV199hZdeeumK\nUfdERJ6oss6IAzll2H+yFFX1JgBAWKAP7h+ZyMvlyOXsCvna2lpbwAPAxIkTsX79eqcVRUTUkTVZ\nJew6egHbDxQi70IdAMDbS4ER/SMwsn8kEroFcWU56hDsCnmVSoXc3FwkJSUBAHJycuDry5GiROR5\nzlysw0fb81BabQAA9I4Jwoj+kRjcSwsfFU/NU8di1yfyhRdewBNPPIGgoCAIIVBfX4/XX3/d2bUR\nEXUYBpMVaXsL8O2JS5ABmDQiHnf2j4CWl8ZRB2ZXyCcnJ+Obb75BUVERJElCfHw8VCqVs2sjIuoQ\njp+pxD935KNOZ0FUmBo/n9Abw5KjUVnZ6OrSiFplV8ifP38eH3/8Merr66+4nwvXEJE7q2004//9\n5wwyzlRCqZBh+qh4TBgWC6WCM9dR52BXyC9ZsgQTJ05Er169nF0PEZHLGc1WHMgpw6f7zsNotiIh\nOhA/n9AbkaFqV5dG5BC7Qj4gIABLlixxdi1ERC5jbZaQc74GB3PLkHmuCk1WCb7eCswf3wt3Jnfl\naHnqlOwK+enTp+ONN97AsGHDoFT+sMttt93mtMKIiJxNCIGCSw04mFuGo3kV0BmbAAARIX4YnhSO\nUQO7Ikjj7eIqiW6eXSF/5MgRnDx5EsePH7fdJ5PJ8I9//MNphREROUttoxnfZV/G/pOlqKxrmcAm\nQK3CuCHdMLxfOGLD/TkVLbkFu0I+JycHO3bscHYtREROIwmB3MIafHviErLOVUMSAt5eCgxPisDw\npHD0iQvmUrDkduwK+cTEROTl5aF3797OroeIqE3V6y34b/Zl7M28bJt2NiZcg7sGRWFon3D4enMC\nG3Jfdi9Q88ADDyAsLAxeXl62+3ft2uW0woiIHCVJApV1Rlyq0uNSpQ5FZY3ILqhGsySg8pJj1IBI\n3DUoCnEWn4Y8AAAgAElEQVQRPB1PnsGukH/ttdewb98+6PV6REVFobm5GYcPH3Z2bURE16U3NaHw\ncgOKyxtxuUqPS1V6lFYb0GSVrnhctFaN0clRGJ4UwRXhyOPY9YlftWoVjEYjLly4gCFDhuDo0aNI\nTk6+4X5ZWVlYs2bNVavV7d69G++88w6USiVmzJiB2bNnQ5IkLFu2DPn5+VCpVHj11VcRGxt7c++K\niNxKsyShpEKP85frcf5yAwouN6CsxnDFY7yUcnQNVaNrmBpR2paf0WFqhAb68KidPJZdIV9YWIgd\nO3bgT3/6E2bMmIHf/va3+PWvf93qPhs2bMDWrVuvWsimqakJK1asQFpaGnx9fZGamooxY8bg+PHj\nsFgs2Lx5MzIzM7Fy5UqudEfkoYQQKK02IPNcFbILqlFU2gDLj47QfVQK9IkNRo+oAMRFBCBKq4Y2\n0BdyOcOc6MfsCvnQ0FDIZDLEx8cjPz8f06ZNg8ViaXWfmJgYrF27Fr/97W+vuL+goAAxMTEIDAwE\nAAwePBhHjx5FZmYmRo0aBaBlrvycnJybeT9E1ElJksC5S/XIPFuFE2crUV5rBADIZEBUmAbduwag\nR9cAdO8agMhQNQOdyA52hXxCQgJeeeUVpKam4plnnkFFRQWamppa3Wf8+PEoKSm56n6dTgd/f3/b\nbbVaDZ1OB51OB41GY7tfoVDAarVeMfkOEbkXk8WKU0W1yDxbhayCKjQaWv5d8fZSYHCiFskJYRjY\nMwwaX68bPBMRXYtdCbps2TKcOHECPXv2xBNPPIGDBw/itddeu6kX1Gg00Ov1ttt6vR7+/v5X3S9J\nkl0BHxzsB6VScVO1dFZarf+NH0Q27JdjnN2vsmo9jp0ux9FT5cg+VwVrc8tp+GB/b4wfFouhSREY\nmKCFyqvj/73mZ8sx7Jdj2qJfdoW8QqHAkCFDAABjx47F2LFjb/oFe/TogeLiYtTV1cHPzw/Hjh3D\nokWLIJPJsGfPHkycOBGZmZlITEy06/lqaw03fpAb0Wr9ubylA9gvxzjar2ZJwuniWhw9XQGTpRlq\nXy+ofZRQ+3z/8/vbkiSQU1iDrIJqXK764Zf5mC4aDOgZioE9wxAfGWCbH76+ruP/veZnyzHsl2Na\n65cj4d9u58LT09NhMBgwZ84cPPfcc1i0aBGEEJgxYwbCw8Mxbtw47N+/H3PnzoUQAsuXL2+v0ojI\nAUIIFJU14mBuGY6crkCDvvXxOT+mUsoxsEdLqA/oEYqQAB8nVkpEMiGEcHURt8LTfjPkb8OOYb8c\n01q/ymsNOJRbjkOnylH+/eVrGl8v3Na7C4YlhaNLkC90Jiv0xiboTU0wfP9nnckKa7OE3jFB6B0T\n3ClOw9uDny3HsF+O6XRH8kTU+ZibmnEsrwL7si7jbEk9gJbr0W/v0wXDkiLQLz4ESsUP870HcsU2\nog6FIU9EVykua8S+7Ms4lFsOo9kKAOgTG4w7+kUgJVHL+d6JOgn+TSUiAIDRbMXXB4vw5X/Po7is\n5TRhkEaFsYNjMWpAV2iDfFt/AiLqcBjyRISCS/V44+MsGMxWyGUyJPcMw50Du6J/jxAuv0rUiTHk\niTxcUVkDXv84C2ZLMx68txcGJ4Qh2J/frRO5A4Y8kQe7WKHDa5syYTJb8cjUvpgyOoEjoIncCM/D\nEXmoy1V6rNl0AnqTFQ9P7INhfSNcXRIRtTGGPJEHKq8xYPWmE2g0NGH++F4YOSDS1SURkRMw5Ik8\nTGWdEX/+9wnU6yxIHZuAuwZFubokInIShjyRB6lpMGH1v0+gttGMWXf1wLjburm6JCJyIoY8kYeo\n05mx+t8nUFVvwrSR8ZgwLNbVJRGRk3F0PZGbkySBfdmX8ene89AZmzBpeCymjIhzdVlE1A4Y8kRu\nLP9CLf698ywuVOjgrVJgzpieuPe2bpB9v6QrEbk3hjyRG6quN+HjPedwNK8CAHBHvwjMGN2Dk9wQ\neRiGPJEbMTc1Y/vhC/j6UDEsVgnxkQF4cFwCenQNdHVpROQCDHkiN2C2NOO/J0vx9eFi1DSYEahW\nYd74HhjeLwJynpon8lgMeaJOrEFvwa6MEuw+XgK9yQqlQo4Jw2IweXgcl4MlIoY8UUdSWq3H6eJa\nBKq9ER7sC22wL7y9FFc9rqzGgB1HLmB/ThmarBLUPkpMuSMOYwdHI0CtckHlRNQRMeSJXMxsacaR\nvHJ8l12KcyX1V20P9m8J/C7BvugS7IeCS/XIPFsFAUAb5IN7b4vByP6R8FZd/csAEXk2hjyRCwgh\ncL60Ad9lleLI6XKYLM2QAUiKC8ZtfcJhMFlRUWtAea0RFbUG5F+oQ96FOtv+8ZH+uG9oLAYnaiGX\n8zt3Iro2hjxRO/tvdim+OXIBl6r0AICQAG/ce1s3jOwfibAg32vu02RtRkWdCRU1BvirVejRNYDX\nuhPRDTHkidrRvqzL+PDrPCjkMgzppcWogV2RFBdyw6NxL6UCUWFqRIWp26lSInIHDHmidnKqqAYb\nv8mHxtcLL8wbjIgQP1eXRERujgvUELWDy1V6vPNZDmQyYMkD/RnwRNQuGPJETtagt+DNLVkwmq14\neEIfJHYLcnVJROQhGPJETtRkbcbaT7NRVW/C1BFxGN4vwtUlEZEHYcgTOYkkBP765WkUXGrAsL7h\nuH9kvKtLIiIPw5AncpLPvyvEkdMV6BkdiIcn9uYlb0TU7hjyRE6w/2Qpth0oQpcgXzzxQH94KTkb\nHRG1P15CR9RGJCFQXNaInPPV2Lq/CH7eSvx61gD4+3EueSJyDaeFvCRJWLZsGfLz86FSqfDqq68i\nNjYWAFBZWYmlS5faHnv69Gk8/fTTSE1NxfTp06HRaAAA0dHRWLFihbNKJLplNQ0m5BTW4FRRDU4V\n1UJnbAIAqJRy/OqB/ogM5eQ1ROQ6Tgv5nTt3wmKxYPPmzcjMzMTKlSuxfv16AIBWq8XGjRsBACdO\nnMAbb7yB2bNnw2w2Qwhh20bUETXoLdh+5AKyzlWhtNpguz/Y3xsjB0SiX3wI+saFQOPr5cIqiYic\nGPIZGRkYNWoUACA5ORk5OTlXPUYIgVdeeQVr1qyBQqFATk4OjEYjFi5cCKvViqVLlyI5OdlZJRI5\npMnajP8cK8G2A0UwWZrh7aXAgB6hSIoPQb/4EESE+HFwHRF1KE4LeZ1OZzvtDgAKhQJWqxVK5Q8v\nuXv3biQkJKB79+4AAB8fHyxatAizZs1CUVERHnnkEWzfvv2KfYjamxACR/MqkPZtAarqTdD4euGh\ncT1w58Cu8FJy7CoRdVxOS0+NRgO9Xm+7LUnSVWG9detWzJ8/33Y7Pj4esbGxkMlkiI+PR1BQECor\nKxEZGXnd1wkO9oPSw0Yua7X+ri6hU7mVfp25UIsPvsjB6aIaKBUyTBvdA3PuSYTGjQfT8fNlP/bK\nMeyXY9qiX04L+ZSUFOzZswcTJ05EZmYmEhMTr3pMTk4OUlJSbLfT0tJw5swZLFu2DOXl5dDpdNBq\nta2+Tm2todXt7kar9UdlZaOry+g0brZfFXVGfP7deRzKLQcApCRqMevuHggP9oNRb4ZRb27rUjsE\nfr7sx145hv1yTGv9ciT8nRby48aNw/79+zF37lwIIbB8+XKkp6fDYDBgzpw5qKmpgUajueI7zJkz\nZ+L5559HamoqZDIZli9fzlP11G4kSSD7fDW+PXEJJwuqIQDEhGswd0wCescGu7o8IiKHyYQQwtVF\n3ApP+82Qvw07xp5+Negt+C77Mr49cRnVDSYAQI+oAIxNicbtfcMh96DBdPx82Y+9cgz75ZgOfyRP\n1NGdK6nH7uMlOJpXgWZJQOUlx+jkrrh7UBRiwvndIRF1fgx58ji1jWb8e+cZHMuvBABEhvphTEo0\nhidFwM+HfyWIyH3wXzTyGJIksOt4CT7bdx4mSzN6RgXigTu7o1dMEK9vJyK3xJAnj1BY2oB/fJOP\n4rJGqH2U+PmE3hg5INKjvm8nIs/DkCe3pjc24V//OYPdx0sgBDA8KQJzxvREgNp9r3MnIvofhjy5\nraxzVdi4Ix81DWaEh/hh/r2J6BMX4uqyiIjaDUOe3NLhU+V4Pz0XSoUc00bGY8KwWE5BS0QehyFP\nbud/Ae+jUuCVR+9AiB9XgyMiz8RDG3Irh06V2QJ+6Zxk9Irl6Xki8lw8kie3cSi3DBu2nYKPSomn\n5ySje9cAV5dERORSDHlyCwdzy/DB9wH/zNxkxEcy4ImIGPLU6R3MKcMHXzLgiYh+iiFPndr/At5X\npcTTDHgioisw5KnTkIRAWbUBhaUNtv+KShvh663EM6nJiItgwBMR/RhDnjq0/Au1OHm+piXQyxpg\nNDfbtinkMnSPCsDPxvVCbARXjSMi+imGPHVI1fUmbNp1FhlnKm33RYb6YVBCAOIjW/7r1kXDCW6I\niFrBkKcOpckqYfuRC/jyQBEsVgkJ0YGYckccuncNgJ8PJ7UhInIEQ546jJPnq/Gv/5xBRa0RAWoV\n5t/XA8OTIrgMLBHRTWLIk8tV1Rnx711nceJsFeQyGe4ZEo1pI7vDz4cfTyKiW8F/RcklzJZm5BRW\n48TZKhzLq7Cdmv/Zvb3QrYvG1eUREbkFhjy1m3q9BVnnqnD8TCVOFdXC2iwBAEICvDHjzh4YlhTO\nU/NERG2IIU9OZbJYsS/zMo7mV+D8pQaI7++P0qoxKCEMgxK0iI3wh5zhTkTU5hjy5BTmpmbsOX4J\nXx8uRqOhCTIZkNAt6PtgD0OXYD9Xl0hE5PYY8tSmmqzN+DbzMr46WIx6vQW+3gpMHRGHMYOjEeCn\ncnV5REQehSFPbcLaLOG7rMvYdrAYtY1meKsUmHxHLO69LQYaX17fTkTkCgx5umU5hdX46Ot8VDeY\noFLKMWFoDO4bGgN/HrkTEbkUQ55uSWFpA97+5CQkITBuSDdMHB6LQDXDnYioI2DI002rrjfhL2nZ\naGqW8MQDA5CcEObqkoiI6Ee4ugfdFKPZijfTslCvt2Du2AQGPBFRB8SQJ4c1SxLWf56DS5V6jE2J\nxrgh3VxdEhERXQNDnhwihMC//nMWOYU1GNAjFHPv6enqkoiI6Dqc9p28JElYtmwZ8vPzoVKp8Oqr\nryI2Nta2/cMPP8SWLVsQEhICAPjDH/6AuLi4Vvch19tx9CK+PXEJ3bpo8OjUJCjk/D2RiKijclrI\n79y5ExaLBZs3b0ZmZiZWrlyJ9evX27bn5ORg1apV6Nevn+2+HTt2tLoPudbxM5X4ePc5BGlU+PXM\nAfD15rhNIqKOzGn/SmdkZGDUqFEAgOTkZOTk5FyxPTc3F++//z4qKytx11134dFHH73hPuQ6haUN\neH9rLlReCvx65kCEBPi4uiQiIroBp4W8TqeDRvPDkqEKhQJWqxVKZctLTpo0CQ8++CA0Gg2WLFmC\nPXv23HCfawkO9oNSqXDW2+iQtFr/dnmdiloDThfW4HRRDb7LvARrs4QXHx6KIUkR7fL6baW9+uUu\n2C/7sVeOYb8c0xb9clrIazQa6PV6221JkmxhLYTAggUL4O/f8gZGjx6NU6dOtbrP9dTWGpxQfcel\n1fqjsrKxzZ+3WZJQUqHH2ZI6nLtUj7Ml9ahtNNu2q5Ry/OzeXojvonbK6zuLs/rlrtgv+7FXjmG/\nHNNavxwJf6eFfEpKCvbs2YOJEyciMzMTiYmJtm06nQ6TJ0/GV199BT8/Pxw+fBgzZsyAyWS67j7U\ndpqsEi5V6VBc1oji8pafJZU6NFkl22MC/LyQkqhFz6hAJHQLRGy4P5QKDrIjIupMnBby48aNw/79\n+zF37lwIIbB8+XKkp6fDYDBgzpw5eOqppzB//nyoVCoMHz4co0ePhiRJV+1Dt04IgbMl9TiQU4ai\nsgZcqtSjWRK27Qq5DFFhasRF+qNnVBASugWiS5AvZFzjnYioU5MJIcSNH9ZxedrpH0dOeTVZJRw5\nXY6dx0pQXN6yj5dSjm5dNIgN90dshD9iw/3RNUwNL6V7HqXzFKFj2C/7sVeOYb8c0+FP15Pr1OvM\n2HPiEr49cQkNhibIZMDgRC3GDo5GQrdAXttOROQhGPJuwmCyoqisAftPluLI6Qo0SwJ+3krcd3sM\nxqREISzI19UlEhFRO2PIu4C1WbqlQWx6U1PLoLmyRhSXN6KorBEVtUbb9shQP9wzpBvuSIqAt8qz\nLi8kIqIfMOTbkc7YhL99eRqZ56qg8fWCNsgHoYG+0Ab6ICzQB2FBvggL9EGzJFCvt6BeZ/7+p8V2\nu1ZnQXnNlZcNqn2U6BMbjLgIf/SNC0HfuGAOmiMiIoZ8eym4XI93P89BdYMZUWFqWCWBixU6FJY6\nNhAlSOONpLhgxEYEIC6iZfBcWKAPQ52IiK7CkHcyIQR2ZpTg493nIEkC00bFY/LwOMjlMkhCoF5n\nQVW9EVX1JlTVtfxUKuQIVKsQoFEhSO2NQI2q5bZahciIQI5QJSIiuzDknchgsuLDr0/jWH4lAvy8\n8IupSegbF2LbLpfJEOzvjWB/byREu7BQIiJySwx5J7lQ3oh1n+egotaIxG5BeHRqEoL9vV1dFhER\neRCGvBPsy7qMf+44A2uzhInDYjH9znhem05ERO2OId+GhBD44r+F2Lq/CGofJX45vR+Se4a5uiwi\nIvJQDPk2IoRA2t4CfH3oArRBPnh67iB04QQ0RETkQgz5NiCEwL93ncXOYyUID/HDs3OTERLg4+qy\niIjIwzHkb5EkBP75TT6+zbyMqDA1npmbjEANB9gREZHrMeRvgSQJ/P3r09h/sgwxXTR4em4y/P1U\nri6LiIgIAEP+pjVLEj7YdhqHT5UjPtIfT81OhsbXy9VlERER2TDkb4K1WcJ7X+Qi40wlekYF4jez\nBsLPh60kIqKOhcnkoDqdGRvST+F0cS16xwThyZkD4KNiG4mIqONhOjkgu6Aaf/3yFBoNTRiUEIZf\nTE2CtxeXciUioo6JIW+HJquET/YWYMfRi1AqZEi9JwH3DI7mym9ERNShMeRvoLzGgHe/yEVxeSPC\nQ/zw2NQkxEb4u7osIiKiG/LIkDdZrMgtrIHeZEVooA+0gT4ICfCBUnHl/PIHckqxcccZmC3NGNk/\nEg+OS+D370RE1Gl4TGIZTE3IPFeFjPxK5BTWoMkqXbFdJgNC/L0RFuiLsCAfGExWnDhbBR+VAr+Y\n2hfD+ka4qHIiIqKb49Yh36C34MTZSmTkV+J0cS2aJQEA6BqmxuBELcKCfFBdb0JlnQlV9UZU1Ztw\n5mId8i+27B8f6Y9HpyahS7CfC98FERHRzXGLkJckgYo6I0oqdCip1KGkUo+SSh0qa40Q3z8mNtwf\ng3tpMbiXFpGh6us+V5NVQk2DCTpjE2Ij/K86hU9ERNRZdPqQ/+OHR3G5Sg/LT06/a3y90Ds2GAN6\nhCIlUQutnSvCeSnlCA/xQ7gziiUiImpHnT7kSyp16BqqRnQXDaK1GkRr1YjSahCkUfESNyIi8mid\nPuTXPz0aCjlPqRMREf1Up09HBjwREdG1MSGJiIjcFEOeiIjITTntO3lJkrBs2TLk5+dDpVLh1Vdf\nRWxsrG37tm3b8NFHH0GhUCAxMRHLli2DXC7H9OnTodFoAADR0dFYsWKFs0okIiJya04L+Z07d8Ji\nsWDz5s3IzMzEypUrsX79egCAyWTCm2++ifT0dPj6+mLp0qXYs2cPRo4cCSEENm7c6KyyiIiIPIbT\nTtdnZGRg1KhRAIDk5GTk5OTYtqlUKmzatAm+vi3XrlutVnh7eyMvLw9GoxELFy7E/PnzkZmZ6azy\niIiI3J7TjuR1Op3ttDsAKBQKWK1WKJVKyOVyhIWFAQA2btwIg8GAESNG4MyZM1i0aBFmzZqFoqIi\nPPLII9i+fTuUyuuXGRzsB6XSs9Z012q5Cp4j2C/HsF/2Y68cw345pi365bSQ12g00Ov1ttuSJF0R\n1pIkYfXq1SgsLMTatWshk8kQHx+P2NhY25+DgoJQWVmJyMjI675Oba3BWW+hQ9Jq/VFZ2ejqMjoN\n9ssx7Jf92CvHsF+Oaa1fjoS/007Xp6SkYN++fQCAzMxMJCYmXrH9pZdegtlsxrp162yn7dPS0rBy\n5UoAQHl5OXQ6HbRarbNKJCIicmtOO5IfN24c9u/fj7lz50IIgeXLlyM9PR0GgwH9+vVDWloahgwZ\nggULFgAA5s+fj5kzZ+L5559HamoqZDIZli9f3uqpeiIiIro+mRBC3PhhHZennf7hKS/HsF+OYb/s\nx145hv1yTIc/XU9ERESu1emP5ImIiOjaeCRPRETkphjyREREboohT0RE5KYY8kRERG6KIU9EROSm\nGPJERERuitPJdSBZWVlYs2YNNm7ciNzcXLz88stQqVTo06cPXnzxReTn52P58uW2x2dmZuKdd97B\n7bffjmeffRbV1dVQq9VYtWoVQkJCXPhO2seN+iWXy/G3v/0N27Ztg0wmw2OPPYZx48bBZDKxX9fp\n1/vvv48vv/wSGo0Gixcvxt133+1R/WpqasILL7yAS5cuwWKx4PHHH0fPnj3x3HPPQSaTISEhAS+/\n/DLkcjk+/vhjbNq0CUqlEo8//rjH9QpwrF8AUFNTg9TUVGzduhXe3t7sVyv9+vDDD/Hll18CAEaP\nHo0lS5bcXL8EdQjvv/++mDx5spg1a5YQQojp06eLjIwMIYQQr7/+uvj888+vePxXX30lli5dKoQQ\n4m9/+5v4y1/+IoQQYtu2beKVV15px8pdw55+1dfXi9GjRwuz2Szq6urEXXfdJYRgv4S4dr/y8vLE\nlClThMlkEiaTSUybNk0YDAaP6ldaWpp49dVXhRBC1NbWitGjR4tHH31UHDp0SAghxO9//3uxY8cO\nUVFRISZPnizMZrNoaGiw/dmTeiWE/f0SQoh9+/aJ+++/XwwaNEiYTCYhhOf9XbS3XxcuXBDTp08X\nVqtVSJIk5syZI06fPn1T/eLp+g4iJiYGa9eutd0uLy9HSkoKgJbFfjIyMmzbDAYD1q5dixdffBEA\nkJGRgVGjRgEA7rzzThw8eLAdK3cNe/rl6+uLrl27wmg0wmg0QiaTAWC/gGv3q6CgALfffju8vb3h\n7e2N2NhY5Ofne1S/7rvvPvz6178GAAghoFAokJubi9tvvx1Ay/s/cOAAsrOzMWjQIKhUKvj7+yMm\nJgZ5eXke1SvA/n4BgFwux9///ncEBQXZ9me/rt2viIgIfPDBB1AoFJDJZLBarfD29r6pfjHkO4jx\n48dfsRhPt27dcOTIEQDAnj17YDQabdvS0tJw33332U7T6HQ6+Pu3zGWsVqvR2Oj+80Pb26/IyEhM\nmjQJ06dPx/z58wGwX8C1+9WrVy8cO3YMOp0OtbW1OHHiBIxGo0f1S61WQ6PRQKfT4cknn8RvfvMb\nCCFsvyD+7/3/uCf/u1+n03lUrwD7+wUAI0aMQHBw8BX7s1/X7peXlxdCQkIghMCqVavQt29fxMfH\n31S/GPId1PLly/Hee+9hwYIFCA0NveIvR3p6OmbNmmW7rdFooNfrAQB6vR4BAQHtXq+rXatf+/bt\nQ0VFBXbt2oVvv/0WO3fuRHZ2NvuFa/erR48eeOihh7B48WK88sorGDhwIIKDgz2uX6WlpZg/fz7u\nv/9+TJkyxfZ9MvDD+/9xT/53v7+/v8f1CrCvX9fDfl2/X2azGc888wz0ej1efvllADfXL4Z8B7V3\n716sWbMGH330Eerq6jBixAgAQGNjIywWCyIjI22PTUlJwd69ewEA+/btw+DBg11Ssytdq1+BgYHw\n8fGBSqWCt7c3/P390dDQwH7h2v2qqamBXq/Hpk2b8Ic//AGlpaVISEjwqH5VVVVh4cKFePbZZzFz\n5kwAQN++fXH48GEALe9/yJAhGDBgADIyMmA2m9HY2IiCggIkJiZ6VK8A+/t1PezXtfslhMAvf/lL\n9OrVC3/84x+hUCgA3Fy/OLq+g4qNjcXPf/5z+Pr6YujQoRg9ejQAoLCwEFFRUVc8NjU1Fb/73e+Q\nmpoKLy8vvPbaa64o2aWu168DBw5g9uzZkMvlSElJwYgRIzB48GD26xr9EkLg/PnzmDFjBry8vPDb\n3/4WCoXCoz5f7777LhoaGrBu3TqsW7cOAPDiiy/i1Vdfxeuvv47u3btj/PjxUCgUmDdvHh588EEI\nIfDUU0/B29vbo3oF2N+v62G/rt2vnTt34siRI7BYLPjuu+8AAEuXLr2pfnEVOiIiIjfF0/VERERu\niiFPRETkphjyREREboohT0RE5KYY8kRERG6KIU/khp5//nmMHz8evXr1arPnfO655/Dpp5+22fMR\nkfPxOnkiN/TZZ58hOzsbKpXK1aUQkQsx5InczGOPPQYhBO644w40NTUhKysL6enptgUvoqOjsXr1\naqhUKqxZswY7d+6EQqHAnDlzsGDBAhw5cgRvvPEGTCYT6uvr8eyzz2LChAk3fN0tW7bg0KFDtgk6\n3n77bahUKjz00EP44x//iLNnz6K5uRmPPPIIJk+eDJ1OhxdeeAHl5eWoqKjAkCFD8Oc//xlHjhzB\n6tWrIUkSEhISsGrVKme3jMh9tek6ekTUISQmJoqLFy+Ku+++WwghxJgxY0RVVZUQomVp2VOnTomv\nvvpKzJ07V5jNZqHT6cTUqVNFRUWFeOKJJ8S5c+eEEEIcOHBATJ48WQghxO9+9zvxySefXPc1dTqd\nGD58uNDpdEKSJHHvvfeKsrIysXr1avHRRx8JIYRobGwUkyZNEhcuXBDp6eli3bp1QgghzGazuOee\ne8TJkyfFoUOHxODBg0VDQ4PT+kPkKXgkT+QB7r77bqSmpmLs2LEYP348+vTpgy1btmDChAlQqVRQ\nqVT44osvAACrV6/Gnj17sH37dmRlZV2xEEtr1Go1Ro8ejR07dqBbt27o1q0bwsPDceDAAZhMJnzy\nyScAWpZKPnv2LCZPnozs7Gx8+OGHOH/+POrq6mAwGAAA8fHxV6zyRkQ3hyFP5AH+7//+D3l5edi7\ndyDTv5YAAAHlSURBVC+effZZLFmy5IqlZwGgpKQEISEhmDdvHoYOHYqhQ4di+PDheOaZZ+x+nRkz\nZmD9+vWIjo7GAw88AACQJAmrV69GUlISgJZFOgIDA7Fx40Z88803mD17Nu644w6cOXMG4vtZtn18\nfNronRN5No6uJ3JzVqsV9957L4KDg/Hoo4/+//bukFVhMArj+F8wCIrVto8gCgo2y5oyQRhrOsEo\nmkyWJcEvYfVDmBfsggaDQVBsJgdj8zbTDYYrXl6fX33DC6c8nMOBQ6fTYbfbUa/XWa/XxHHM/X5n\nOBxyOBw4Ho9MJhOazSZhGJIkyct/1Wo1LpcLm80G27YBaDQarFYrAK7XK47jcD6fCcMQz/NwHIdM\nJsN+vydN07fUQORbqZMXMVw2m2U8HjMYDMjlchSLRRaLBaVSie12S7fbJU1Ter0e5XIZ13VptVoU\nCgUqlQpRFD3H6K+wbZvb7fbc7B+NRgRBQLvdJkkSptMplmXR7/cJgoDlckk+n6darXI6nbAs612l\nEPk6ukInIn/i8XgQxzG+7zObzZ7jeRH5HHXyIvKyKIrwPO/XN9/3mc/nuK6rgBf5J9TJi4iIGEqL\ndyIiIoZSyIuIiBhKIS8iImIohbyIiIihFPIiIiKGUsiLiIgY6gdStt0mJAJxtgAAAABJRU5ErkJg\ngg==\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# use Seaborn library to plot a Pandas DataFrame\n",
"plot_df = outlays[['fiscal_year', 'function_name', 'amount_adj']]\n",
"plot_group = plot_df.groupby(['fiscal_year', 'function_name']).sum().reset_index()\n",
"sns.set(style=\"darkgrid\")\n",
"ax = sns.tsplot(data=plot_group, time='fiscal_year', unit='function_name', value='amount_adj', err_style=None)"
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"fiscal_year\n",
"1962 18.201\n",
"1963 17.974\n",
"1964 17.880\n",
"1965 16.635\n",
"1966 17.206\n",
"1967 18.786\n",
"1968 19.808\n",
"1969 18.695\n",
"1970 18.649\n",
"1971 18.777\n",
"1972 18.916\n",
"1973 18.120\n",
"1974 18.124\n",
"1975 20.634\n",
"1976 20.767\n",
"1977 20.174\n",
"1978 20.136\n",
"1979 19.612\n",
"1980 21.129\n",
"1981 21.611\n",
"1982 22.503\n",
"1983 22.828\n",
"1984 21.549\n",
"1985 22.161\n",
"1986 21.833\n",
"1987 20.996\n",
"1988 20.648\n",
"1989 20.534\n",
"1990 21.185\n",
"1991 21.673\n",
"1992 21.470\n",
"1993 20.742\n",
"1994 20.308\n",
"1995 19.988\n",
"1996 19.559\n",
"1997 18.874\n",
"1998 18.453\n",
"1999 17.894\n",
"2000 17.628\n",
"2001 17.633\n",
"2002 18.488\n",
"2003 19.060\n",
"2004 18.967\n",
"2005 19.179\n",
"2006 19.402\n",
"2007 19.051\n",
"2008 20.217\n",
"2009 24.404\n",
"2010 23.361\n",
"2011 23.428\n",
"2012 22.068\n",
"2013 20.940\n",
"2014 20.404\n",
"2015 20.717\n",
"2016 21.391\n",
"2017 21.485\n",
"2018 21.621\n",
"2019 22.102\n",
"2020 22.261\n",
"2021 22.401\n",
"dtype: float64"
]
},
"execution_count": 80,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# group by years and calculate spending as % of GDP\n",
"\n",
"by_year = outlays.groupby('fiscal_year')\n",
"def percent_gdp(group):\n",
" spending = group['amount'] * 1000\n",
" gdp = group['gdp_billions'] * 1000000000\n",
" return spending.sum() / gdp.max() * 100\n",
"\n",
"by_year.apply(percent_gdp)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Useful tools:\n",
"\n",
"* [Applying operations over pandas DataFrames](http://chrisalbon.com/python/pandas_apply_operations_to_dataframes.html \"Applying operations over Pandas DataFrames\")\n",
"* Pandas docs: [apply-split-combine](http://pandas.pydata.org/pandas-docs/stable/groupby.html \"Pandas: apply-split-combine\")\n",
"* Pandas docs: [computational tools](http://pandas.pydata.org/pandas-docs/stable/computation.html \"Pandas: computational tools\") (includes info on percent change, correlation, window functions)\n",
"* Pandas docs: [time series/date functionality](http://pandas.pydata.org/pandas-docs/stable/timeseries.html \"Pands: time series\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Step 5: Output"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"# filter out the columns we don't want to write out\n",
"# (ran out of time to find a way to get rid of both columns in one line)\n",
"outlays_tidier = outlays.loc[:, outlays.columns != 'deflator']\n",
"outlays_tidier = outlays.loc[:, outlays.columns != 'gdp_billions']\n",
"\n",
"# write to a .csv\n",
"outlays_tidier.to_csv('data/outlays_done.csv', index=False)"
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"# dump to SQL db\n",
"\n",
"db_engine = create_engine('sqlite:///data/pandas_data_munging.db')\n",
"outlays_tidier.to_sql(\n",
" 'outlays',\n",
" db_engine,\n",
" if_exists='replace',\n",
" index=False\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Resources\n",
"\n",
"* Python for Data Analysis (by Wes McKinney): http://shop.oreilly.com/product/0636920023784.do\n",
"* Common Excel Tasks Demonstrated in Pandas: http://pbpython.com/excel-pandas-comp.html\n",
"* Pandas Cookbook: http://pandas.pydata.org/pandas-docs/stable/cookbook.html\n",
"* Useful Pandas Snippets: https://gist.github.com/bsweger/e5817488d161f37dcbd2\n",
"* [Chris Albon's](http://chrisalbon.com/) series of tutorials (scroll down to _Data Wrangling_)\n",
"* Pandas docs: [how to do SQL operations in Pandas](http://pandas.pydata.org/pandas-docs/stable/comparison_with_sql.html)\n",
"* Giant panda image courtesy: http://www.clipartlord.com/2015/11/23/free-giant-panda-clip-art-3/"
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.1"
}
},
"nbformat": 4,
"nbformat_minor": 1
}