{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

<a id='top'></a>
# Python Pandas Cheat Sheet

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a data analyst, below are common tasks I perform using Pandas. However, best practice rule of thumb is to use as much SQL as possible for transformations or summarizations before pulling data into pandas as pandas is limited to using your local machine's resources. Another reason why SQL should be used as much as possible, there will likely be more people that are familiar with SQL in your organization than Python/pandas. So to reduce \"technical debt\", Python or pandas use should be limited if data manipulations can be done in SQL instead." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- [Pandas display output options](#options)\n", "- [Quick summary statistics, # of rows, columns, etc](#quick_summary)\n", "- [Sorting More Than One Column](#sorting)\n", "- [Removing duplicates](#removing_duplicates)\n", "- [Finding duplicates](#find_duplicates)\n", "- [Creating a column based on values from another column](#map)\n", "- [Removing a column](#remove_column)\n", "- [Replacing values in a series](#replacing_values)\n", "- [Dropping or Filling NaN/NA values](#drop_fill_na)\n", "- [Using ```converters``` parameter to convert data during data ingestion](#converters)\n", "- [Reading CSV or Excel files](http://pandas.pydata.org/pandas-docs/stable/io.html)\n", "- [Renaming axis](#renaming_axis)\n", "- [Renaming indexes or columns](#renaming_indexes_columns)\n", "- [Binning data into intervals](#binning)\n", "- [Sub-selecting or Slicing a Data Frame](#slicing)\n", "- [Boolean indexing](#boolean)\n", "- [Accessors](#accessors)\n", "- [Obtaining columns with partial column labels](#partial_column_labels)\n", "- [Getting value counts](#value_counts)\n", "- [Getting cumulative sum](#cum_sum)\n", "- [Grouping data](#group)\n", "- [Pivot table](#pivot_table)\n", "- [Named aggregations (version 0.25)](#named_aggregation)\n", "- [Percent of Rows](#perc_rows)\n", "- [Percent of Columns](#perc_columns)\n", "- [Transpose a data frame](#transpose)\n", "- [Converting index to a column](#index2column)\n", "- [Converting column to index](#column2index)\n", "- [How to add or fill in missing dates](#missing_dates)\n", "- [How to connect to an ODBC data source (Windows)](#database)\n", "- [How to convert data in wide format to long format using melt()](#melt)\n", "- [How to convert data in long format data to wide format using pivot()](#pivot)\n", "- [Using category data type to control sort order](#category)\n", "- [Merging 2 data frames using merge()](#merge)\n", "- [Finding rows containing data with missing values](#missing)\n", "- [Converting a data type of a column in a data frame](#convert_type)\n", "- [Plotting data frames using MATPLOTLIB ver 1.5+](#matplotlib)\n", "- [Method chaining](#chaining)\n", "- [pipe() example](#pipe)\n", "- [Debugging groupby objects with this introspection method](https://realpython.com/python-pandas-tricks/#6-introspect-groupby-objects-via-iteration)\n", "- [Mapping trick for memebership binning](https://realpython.com/python-pandas-tricks/#7-use-this-mapping-trick-for-membership-binning)\n", "- [BONUS #1: A HUGE list of python and pandas snippets by Chris Albon](http://chrisalbon.com/)\n", "- [BONUS #2: More goodies from a major pandas contributor, Tom Augspurger](http://tomaugspurger.github.io/)\n", "- [BONUS #3: Data School YouTube Series on Pandas](https://www.youtube.com/watch?v=yzIMircGU5I&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y) and it's accompanying 
[notebooks](https://github.com/justmarkham/pandas-videos) on Github!\n", "- [BONUS #4: Article on Dask: pandas on steroids](https://towardsdatascience.com/why-every-data-scientist-should-use-dask-81b2b850e15b) [Basic Video Overview](https://www.youtube.com/watch?v=ods97a5Pzw0) [More Detailed Video Overview](https://www.youtube.com/watch?v=mjQ7tCQxYFQ)\n", "- [BONUS #5: 25 Pandas Tips and Tricks by Kevin Markham](https://nbviewer.jupyter.org/github/justmarkham/pandas-videos/blob/master/top_25_pandas_tricks.ipynb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<a id='options'></a>" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setting display output [options](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.set_option.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default, a large pandas dataframe (many columns or many rows) won't be rendered completely in your browser. You can override the defaults, but don't be surprised if your browser crashes when it tries to render several hundred thousand rows. Here's the [link](https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html) to the pandas display options documentation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top](#top)]" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [] }, "outputs": [], "source": [ "import pandas as pd\n", "pd.set_option(\"display.max_rows\", 1000)   # or pd.options.display.max_rows=1000\n", "pd.set_option(\"display.max_columns\", 20)  # or pd.options.display.max_columns=20\n", "pd.set_option(\"display.precision\", 7)     # the bare 'precision' alias was removed in pandas 2.0\n", "pd.set_option(\"display.large_repr\", \"truncate\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<a id='quick_summary'></a>" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Quick summary statistics using df.describe() and data types using df.info(). Also check out [pandas-summary](https://github.com/mouradmourafiq/pandas-summary), an extension of the data frame's describe() method, and [pandas profiling](https://github.com/JosPolfliet/pandas-profiling)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top](#top)]" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
groupounces
0a4.0
1a3.0
2a12.0
3b6.0
4b7.5
5b8.0
6c3.0
7c5.0
8c6.0
\n", "
" ], "text/plain": [ " group ounces\n", "0 a 4.0\n", "1 a 3.0\n", "2 a 12.0\n", "3 b 6.0\n", "4 b 7.5\n", "5 b 8.0\n", "6 c 3.0\n", "7 c 5.0\n", "8 c 6.0" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "data = pd.DataFrame({'group': ['a', 'a', 'a', 'b','b', 'b', 'c', 'c','c'],\n", " 'ounces': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})\n", "data" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ounces
count9.0000000
mean6.0555556
std2.8553069
min3.0000000
25%4.0000000
50%6.0000000
75%7.5000000
max12.0000000
\n", "
" ], "text/plain": [ " ounces\n", "count 9.0000000\n", "mean 6.0555556\n", "std 2.8553069\n", "min 3.0000000\n", "25% 4.0000000\n", "50% 6.0000000\n", "75% 7.5000000\n", "max 12.0000000" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### df.info() shows data types, number of rows and columns, and memory usage of your data frame" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 9 entries, 0 to 8\n", "Data columns (total 2 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 group 9 non-null object \n", " 1 ounces 9 non-null float64\n", "dtypes: float64(1), object(1)\n", "memory usage: 272.0+ bytes\n" ] } ], "source": [ "data.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sorting More Than One Column" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top]](#top)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Just pass a list of columns you want to sort and pass corresponding list of True/False to the ascending parameter**" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
groupounces
6c3.0
7c5.0
8c6.0
3b6.0
4b7.5
5b8.0
1a3.0
0a4.0
2a12.0
\n", "
" ], "text/plain": [ " group ounces\n", "6 c 3.0\n", "7 c 5.0\n", "8 c 6.0\n", "3 b 6.0\n", "4 b 7.5\n", "5 b 8.0\n", "1 a 3.0\n", "0 a 4.0\n", "2 a 12.0" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.sort_values(by=['group','ounces'], ascending=[False, True], inplace=True)\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Removing duplicates" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top](#top)]" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "data = pd.DataFrame({'k1': ['one'] * 3 + ['two'] * 4, 'k2': [3, 2, 1, 3, 3, 4, 4]})" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
k1k2
0one3
1one2
2one1
3two3
4two3
5two4
6two4
\n", "
" ], "text/plain": [ " k1 k2\n", "0 one 3\n", "1 one 2\n", "2 one 1\n", "3 two 3\n", "4 two 3\n", "5 two 4\n", "6 two 4" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.sort_values(by='k2')\n", "data" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
k1k2
0one3
1one2
2one1
3two3
5two4
\n", "
" ], "text/plain": [ " k1 k2\n", "0 one 3\n", "1 one 2\n", "2 one 1\n", "3 two 3\n", "5 two 4" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.drop_duplicates() # by default, a \"duplicate\" is defined by all columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Define duplicates by column name(s):" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
k1k2
0one3
3two3
\n", "
" ], "text/plain": [ " k1 k2\n", "0 one 3\n", "3 two 3" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.drop_duplicates(subset='k1') # duplicate in column k1 only" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Finding duplicate rows" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top](#top)]" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
k1k2
0one3
1one2
2one1
3two3
4two3
5two4
6two4
\n", "
" ], "text/plain": [ " k1 k2\n", "0 one 3\n", "1 one 2\n", "2 one 1\n", "3 two 3\n", "4 two 3\n", "5 two 4\n", "6 two 4" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on method duplicated in module pandas.core.frame:\n", "\n", "duplicated(subset: Union[Hashable, Sequence[Hashable], NoneType] = None, keep: Union[str, bool] = 'first') -> 'Series' method of pandas.core.frame.DataFrame instance\n", " Return boolean Series denoting duplicate rows.\n", " \n", " Considering certain columns is optional.\n", " \n", " Parameters\n", " ----------\n", " subset : column label or sequence of labels, optional\n", " Only consider certain columns for identifying duplicates, by\n", " default use all of the columns.\n", " keep : {'first', 'last', False}, default 'first'\n", " Determines which duplicates (if any) to mark.\n", " \n", " - ``first`` : Mark duplicates as ``True`` except for the first occurrence.\n", " - ``last`` : Mark duplicates as ``True`` except for the last occurrence.\n", " - False : Mark all duplicates as ``True``.\n", " \n", " Returns\n", " -------\n", " Series\n", " Boolean series for each duplicated rows.\n", " \n", " See Also\n", " --------\n", " Index.duplicated : Equivalent method on index.\n", " Series.duplicated : Equivalent method on Series.\n", " Series.drop_duplicates : Remove duplicate values from Series.\n", " DataFrame.drop_duplicates : Remove duplicate values from DataFrame.\n", " \n", " Examples\n", " --------\n", " Consider dataset containing ramen rating.\n", " \n", " >>> df = pd.DataFrame({\n", " ... 'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],\n", " ... 'style': ['cup', 'cup', 'cup', 'pack', 'pack'],\n", " ... 'rating': [4, 4, 3.5, 15, 5]\n", " ... })\n", " >>> df\n", " brand style rating\n", " 0 Yum Yum cup 4.0\n", " 1 Yum Yum cup 4.0\n", " 2 Indomie cup 3.5\n", " 3 Indomie pack 15.0\n", " 4 Indomie pack 5.0\n", " \n", " By default, for each set of duplicated values, the first occurrence\n", " is set on False and all others on True.\n", " \n", " >>> df.duplicated()\n", " 0 False\n", " 1 True\n", " 2 False\n", " 3 False\n", " 4 False\n", " dtype: bool\n", " \n", " By using 'last', the last occurrence of each set of duplicated values\n", " is set on False and all others on True.\n", " \n", " >>> df.duplicated(keep='last')\n", " 0 True\n", " 1 False\n", " 2 False\n", " 3 False\n", " 4 False\n", " dtype: bool\n", " \n", " By setting ``keep`` on False, all duplicates are True.\n", " \n", " >>> df.duplicated(keep=False)\n", " 0 True\n", " 1 True\n", " 2 False\n", " 3 False\n", " 4 False\n", " dtype: bool\n", " \n", " To find duplicates on specific column(s), use ``subset``.\n", " \n", " >>> df.duplicated(subset=['brand'])\n", " 0 False\n", " 1 True\n", " 2 False\n", " 3 True\n", " 4 True\n", " dtype: bool\n", "\n" ] } ], "source": [ "help(data.duplicated)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
k1k2
4two3
6two4
\n", "
" ], "text/plain": [ " k1 k2\n", "4 two 3\n", "6 two 4" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "duplicate_rows = data[data.duplicated()]\n", "duplicate_rows" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating a new column based on values from one or more columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you are familiar with Excel's IF/ELSE function or SQL's CASE WHEN statements, then this section is for you. For smallish data sets and not too complex IF/ELSE logic, `map()` and `apply()` are fine. But for larger data sets, you will notice a significant slowdown. For large data sets, you have a few options:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- use pandas `.loc`\n", "- use `np.where`\n", "- use `np.select`\n", "- use [swifter](https://github.com/jmcarpenter2/swifter) library\n", "- Dask's dataframe [apply](https://docs.dask.org/en/latest/dataframe-api.html#dask.dataframe.DataFrame.apply) which swifter does for you under the hood" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This [article](https://towardsdatascience.com/efficient-implementation-of-conditional-logic-on-pandas-dataframes-4afa61eb7fce) summarizes the first 3 options above and explains the reasons for slowdown. You may have to open the link using your browser's private/incognito mode." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If the data was originally sourced from a database table, then apply CASE WHEN logic prior to bringing the data into pandas. Best practice rule of thumb is to use as much SQL as possible for transformations or summarizations before pulling data into pandas as pandas is limited to using your local machine's resources. Another reason why SQL should be used as much as possible, there will likely be more people that are familiar with SQL in your organization than Python/pandas. So to reduce \"technical debt\", Python or pandas use should be limited if data manipulations can be done in SQL instead." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top](#top)]" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
foodounces
0bacon4.0
1pulled pork3.0
2bacon12.0
3pastrami6.0
4corned beef7.5
5bacon8.0
6pastrami3.0
7honey ham5.0
8nova lox6.0
\n", "
" ], "text/plain": [ " food ounces\n", "0 bacon 4.0\n", "1 pulled pork 3.0\n", "2 bacon 12.0\n", "3 pastrami 6.0\n", "4 corned beef 7.5\n", "5 bacon 8.0\n", "6 pastrami 3.0\n", "7 honey ham 5.0\n", "8 nova lox 6.0" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "data = pd.DataFrame({'food': ['bacon', 'pulled pork', 'bacon', 'pastrami','corned beef', 'bacon', 'pastrami', 'honey ham','nova lox'],\n", " 'ounces': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Suppose you wanted to add a column indicating the type of animal that each food came from. Let’s write down a mapping of each distinct meat type to the kind of animal using a dictionary and also a function:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "meat_to_animal = {\n", "'bacon': 'pig',\n", "'pulled pork': 'pig',\n", "'pastrami': 'cow',\n", "'corned beef': 'cow',\n", "'honey ham': 'pig',\n", "'nova lox': 'salmon'\n", "}\n", "\n", "def meat2animal(column):\n", " if column == 'bacon':\n", " return 'pig'\n", " elif column == 'pulled pork':\n", " return 'pig'\n", " elif column == 'pastrami':\n", " return 'cow'\n", " elif column == 'corned beef':\n", " return 'cow'\n", " elif column == 'honey ham':\n", " return 'pig'\n", " else:\n", " return 'salmon'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the Python dictionary from above:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
foodouncesanimal
0bacon4.0pig
1pulled pork3.0pig
2bacon12.0pig
3pastrami6.0cow
4corned beef7.5cow
5bacon8.0pig
6pastrami3.0cow
7honey ham5.0pig
8nova lox6.0salmon
\n", "
" ], "text/plain": [ " food ounces animal\n", "0 bacon 4.0 pig\n", "1 pulled pork 3.0 pig\n", "2 bacon 12.0 pig\n", "3 pastrami 6.0 cow\n", "4 corned beef 7.5 cow\n", "5 bacon 8.0 pig\n", "6 pastrami 3.0 cow\n", "7 honey ham 5.0 pig\n", "8 nova lox 6.0 salmon" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data['animal'] = data['food'].map(meat_to_animal)\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "or using the function ```meat2animal()```:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
foodouncesanimal
0bacon4.0pig
1pulled pork3.0pig
2bacon12.0pig
3pastrami6.0cow
4corned beef7.5cow
5bacon8.0pig
6pastrami3.0cow
7honey ham5.0pig
8nova lox6.0salmon
\n", "
" ], "text/plain": [ " food ounces animal\n", "0 bacon 4.0 pig\n", "1 pulled pork 3.0 pig\n", "2 bacon 12.0 pig\n", "3 pastrami 6.0 cow\n", "4 corned beef 7.5 cow\n", "5 bacon 8.0 pig\n", "6 pastrami 3.0 cow\n", "7 honey ham 5.0 pig\n", "8 nova lox 6.0 salmon" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data['animal'] = data['food'].map(meat2animal)\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### We could also modify the function slightly to accept a dataframe row and perform logic on more than one column" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "def use2columns(row):\n", " if row['animal'] == 'pig' and row['ounces'] > 4:\n", " return 'Big Pig'\n", " elif row['animal'] == 'pig' and row['ounces'] <= 4:\n", " return 'Little Pig'\n", " else:\n", " return 'Other Animal'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**NOTE:** With large dataframes, applying a function across an entire row will impose a huge performance penalty." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
foodouncesanimalanimal2
0bacon4.0pigLittle Pig
1pulled pork3.0pigLittle Pig
2bacon12.0pigBig Pig
3pastrami6.0cowOther Animal
4corned beef7.5cowOther Animal
5bacon8.0pigBig Pig
6pastrami3.0cowOther Animal
7honey ham5.0pigBig Pig
8nova lox6.0salmonOther Animal
\n", "
" ], "text/plain": [ " food ounces animal animal2\n", "0 bacon 4.0 pig Little Pig\n", "1 pulled pork 3.0 pig Little Pig\n", "2 bacon 12.0 pig Big Pig\n", "3 pastrami 6.0 cow Other Animal\n", "4 corned beef 7.5 cow Other Animal\n", "5 bacon 8.0 pig Big Pig\n", "6 pastrami 3.0 cow Other Animal\n", "7 honey ham 5.0 pig Big Pig\n", "8 nova lox 6.0 salmon Other Animal" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data['animal2'] = data.apply(use2columns, axis='columns')\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Use apply() along with lambda function to create new columns:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
valuedate
0-0.98932202013-01-01
10.15263212013-01-02
21.48331272013-01-03
31.43103362013-01-04
4-2.31510072013-01-05
50.49999762013-01-06
60.81936812013-01-07
7-1.03780932013-01-08
8-0.66734182013-01-09
90.31813292013-01-10
\n", "
" ], "text/plain": [ " value date\n", "0 -0.9893220 2013-01-01\n", "1 0.1526321 2013-01-02\n", "2 1.4833127 2013-01-03\n", "3 1.4310336 2013-01-04\n", "4 -2.3151007 2013-01-05\n", "5 0.4999976 2013-01-06\n", "6 0.8193681 2013-01-07\n", "7 -1.0378093 2013-01-08\n", "8 -0.6673418 2013-01-09\n", "9 0.3181329 2013-01-10" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "import pandas as pd\n", "\n", "dates = pd.date_range('20130101',periods=10)\n", "data = {'value': [value[0] for value in np.random.randn(10,1).tolist()], 'date': dates }\n", "df = pd.DataFrame.from_dict(data)\n", "df" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "value float64\n", "date datetime64[ns]\n", "dtype: object" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### OBJECTIVE: Let's say you want to create a month and year column" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Since ```date``` column is an actual ```datetime``` object, we can leverage its strftime() function:**" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
valuedatemonthyear
0-0.98932202013-01-012013-012013
10.15263212013-01-022013-012013
21.48331272013-01-032013-012013
31.43103362013-01-042013-012013
4-2.31510072013-01-052013-012013
50.49999762013-01-062013-012013
60.81936812013-01-072013-012013
7-1.03780932013-01-082013-012013
8-0.66734182013-01-092013-012013
90.31813292013-01-102013-012013
\n", "
" ], "text/plain": [ " value date month year\n", "0 -0.9893220 2013-01-01 2013-01 2013\n", "1 0.1526321 2013-01-02 2013-01 2013\n", "2 1.4833127 2013-01-03 2013-01 2013\n", "3 1.4310336 2013-01-04 2013-01 2013\n", "4 -2.3151007 2013-01-05 2013-01 2013\n", "5 0.4999976 2013-01-06 2013-01 2013\n", "6 0.8193681 2013-01-07 2013-01 2013\n", "7 -1.0378093 2013-01-08 2013-01 2013\n", "8 -0.6673418 2013-01-09 2013-01 2013\n", "9 0.3181329 2013-01-10 2013-01 2013" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['month'] = df['date'].apply(lambda x: x.strftime('%Y-%m'))\n", "df['year'] = df['date'].apply(lambda x: x.strftime('%Y'))\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating a new column based on values in a separate dataframe using a combination of map, query and lambda function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A better alternative is to use the [merge()](#merge) function." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
groupounces
0a4.0
1a3.0
2a12.0
3b6.0
4b7.5
5b8.0
6c3.0
7c5.0
8c6.0
\n", "
" ], "text/plain": [ " group ounces\n", "0 a 4.0\n", "1 a 3.0\n", "2 a 12.0\n", "3 b 6.0\n", "4 b 7.5\n", "5 b 8.0\n", "6 c 3.0\n", "7 c 5.0\n", "8 c 6.0" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "left = pd.DataFrame({'group': ['a', 'a', 'a', 'b','b', 'b', 'c', 'c','c'],\n", " 'ounces': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})\n", "left" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
labelvalue
0aalpha
1bbeta
2ccharlie
\n", "
" ], "text/plain": [ " label value\n", "0 a alpha\n", "1 b beta\n", "2 c charlie" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "right = pd.DataFrame({'label': ['a','b','c'],\n", " 'value': ['alpha','beta','charlie']})\n", "right" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
groupouncesLabel
0a4.0alpha
1a3.0alpha
2a12.0alpha
3b6.0beta
4b7.5beta
5b8.0beta
6c3.0charlie
7c5.0charlie
8c6.0charlie
\n", "
" ], "text/plain": [ " group ounces Label\n", "0 a 4.0 alpha\n", "1 a 3.0 alpha\n", "2 a 12.0 alpha\n", "3 b 6.0 beta\n", "4 b 7.5 beta\n", "5 b 8.0 beta\n", "6 c 3.0 charlie\n", "7 c 5.0 charlie\n", "8 c 6.0 charlie" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "left['Label'] = left['group'].map(\n", " lambda group: right.query(\"label == @group\")['value'].iloc[0]\n", ")\n", "left" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### An equivalent to SQL's ROW_NUMBER / PARTITION BY" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### OBJECTIVE:\n", " - To identify duplicate records with row numbering\n", " - In the example below, a duplicate record is based on CLAIM_NUM and PART_NUM (both sorted by ascending order)\n", " - With records having a row number other than 1, we want to set cost amounts to zero" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This example mimics SQL's [row_number()](https://docs.microsoft.com/en-us/sql/t-sql/functions/row-number-transact-sql) function." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "df = pd.read_clipboard()" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CLAIM_NUMPART_NUMPART_COST_USDLABOR_COST_USDHANDLING_COST_USDTOTAL_COST_USD
01062315LH645.3360.3446.30751.97
11062345LH323.5567.2520.56751.97
21062015LH303.1380.4535.34751.97
32062315LH613.4560.3446.30720.09
42062015LH300.2580.4535.34720.09
53062345LH333.1067.2520.56420.91
64062345LH300.2580.4546.30427.00
\n", "
" ], "text/plain": [ " CLAIM_NUM PART_NUM PART_COST_USD LABOR_COST_USD HANDLING_COST_USD \\\n", "0 1 062315LH 645.33 60.34 46.30 \n", "1 1 062345LH 323.55 67.25 20.56 \n", "2 1 062015LH 303.13 80.45 35.34 \n", "3 2 062315LH 613.45 60.34 46.30 \n", "4 2 062015LH 300.25 80.45 35.34 \n", "5 3 062345LH 333.10 67.25 20.56 \n", "6 4 062345LH 300.25 80.45 46.30 \n", "\n", " TOTAL_COST_USD \n", "0 751.97 \n", "1 751.97 \n", "2 751.97 \n", "3 720.09 \n", "4 720.09 \n", "5 420.91 \n", "6 427.00 " ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use combination of sort_values(), groupby(), and cumcount() function to create ```ROW_NUM``` column:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CLAIM_NUMPART_NUMPART_COST_USDLABOR_COST_USDHANDLING_COST_USDTOTAL_COST_USDROW_NUM
21062015LH303.1380.4535.34751.971
01062315LH645.3360.3446.30751.972
11062345LH323.5567.2520.56751.973
42062015LH300.2580.4535.34720.091
32062315LH613.4560.3446.30720.092
53062345LH333.1067.2520.56420.911
64062345LH300.2580.4546.30427.001
\n", "
" ], "text/plain": [ " CLAIM_NUM PART_NUM PART_COST_USD LABOR_COST_USD HANDLING_COST_USD \\\n", "2 1 062015LH 303.13 80.45 35.34 \n", "0 1 062315LH 645.33 60.34 46.30 \n", "1 1 062345LH 323.55 67.25 20.56 \n", "4 2 062015LH 300.25 80.45 35.34 \n", "3 2 062315LH 613.45 60.34 46.30 \n", "5 3 062345LH 333.10 67.25 20.56 \n", "6 4 062345LH 300.25 80.45 46.30 \n", "\n", " TOTAL_COST_USD ROW_NUM \n", "2 751.97 1 \n", "0 751.97 2 \n", "1 751.97 3 \n", "4 720.09 1 \n", "3 720.09 2 \n", "5 420.91 1 \n", "6 427.00 1 " ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['ROW_NUM'] = df.sort_values(by=['PART_NUM']).groupby(['CLAIM_NUM']).cumcount() + 1\n", "df.sort_values(by=['CLAIM_NUM', 'ROW_NUM'], inplace=True)\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then use np.where() to \"zero out\" the cost columns whose row numbers are not 1:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "df['PART_COST_USD'] = np.where(df['ROW_NUM'] == 1, df['PART_COST_USD'], 0)\n", "df['LABOR_COST_USD'] = np.where(df['ROW_NUM'] == 1, df['LABOR_COST_USD'], 0)\n", "df['HANDLING_COST_USD'] = np.where(df['ROW_NUM'] == 1, df['HANDLING_COST_USD'], 0)\n", "df['TOTAL_COST_USD'] = np.where(df['ROW_NUM'] == 1, df['TOTAL_COST_USD'], 0)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CLAIM_NUMPART_NUMPART_COST_USDLABOR_COST_USDHANDLING_COST_USDTOTAL_COST_USDROW_NUM
21062015LH303.1380.4535.34751.971
01062315LH0.000.000.000.002
11062345LH0.000.000.000.003
42062015LH300.2580.4535.34720.091
32062315LH0.000.000.000.002
53062345LH333.1067.2520.56420.911
64062345LH300.2580.4546.30427.001
\n", "
" ], "text/plain": [ " CLAIM_NUM PART_NUM PART_COST_USD LABOR_COST_USD HANDLING_COST_USD \\\n", "2 1 062015LH 303.13 80.45 35.34 \n", "0 1 062315LH 0.00 0.00 0.00 \n", "1 1 062345LH 0.00 0.00 0.00 \n", "4 2 062015LH 300.25 80.45 35.34 \n", "3 2 062315LH 0.00 0.00 0.00 \n", "5 3 062345LH 333.10 67.25 20.56 \n", "6 4 062345LH 300.25 80.45 46.30 \n", "\n", " TOTAL_COST_USD ROW_NUM \n", "2 751.97 1 \n", "0 0.00 2 \n", "1 0.00 3 \n", "4 720.09 1 \n", "3 0.00 2 \n", "5 420.91 1 \n", "6 427.00 1 " ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### We can also use [dataframe.assign()](http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dsintro-chained-assignment) function to create a new column which was added in pandas version 0.16" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
data1data2ratio
0-0.1957977-1.30061770.1505420
11.50436931.85486450.8110400
2-2.1327051-0.92183252.3135495
30.4580521-1.6185442-0.2830025
4-1.42603600.2567198-5.5548338
\n", "
" ], "text/plain": [ " data1 data2 ratio\n", "0 -0.1957977 -1.3006177 0.1505420\n", "1 1.5043693 1.8548645 0.8110400\n", "2 -2.1327051 -0.9218325 2.3135495\n", "3 0.4580521 -1.6185442 -0.2830025\n", "4 -1.4260360 0.2567198 -5.5548338" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "import numpy as np\n", "\n", "df = pd.DataFrame({'data1' : np.random.randn(5),\n", " 'data2' : np.random.randn(5)})\n", "\n", "df.assign(ratio = df['data1'] / df['data2'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Text to Columns Example" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namenumber
0John Doe12
1Jane Doe234
2Someone Else235
\n", "
" ], "text/plain": [ " name number\n", "0 John Doe 12\n", "1 Jane Doe 234\n", "2 Someone Else 235" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "df = pd.DataFrame({'name': ['John Doe','Jane Doe','Someone Else'], \n", " 'number': [12, 234, 235]\n", " })\n", "df" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "df[['first_name', 'last_name']] = df['name'].str.split(' ', expand=True)" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namenumberfirst_namelast_name
0John Doe12JohnDoe
1Jane Doe234JaneDoe
2Someone Else235SomeoneElse
\n", "
" ], "text/plain": [ " name number first_name last_name\n", "0 John Doe 12 John Doe\n", "1 Jane Doe 234 Jane Doe\n", "2 Someone Else 235 Someone Else" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top](#top)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Removing or dropping a column" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back top top](#top)]" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
foodounces
0bacon4.0
1pulled pork3.0
2bacon12.0
3pastrami6.0
4corned beef7.5
5bacon8.0
6pastrami3.0
7honey ham5.0
8nova lox6.0
\n", "
" ], "text/plain": [ " food ounces\n", "0 bacon 4.0\n", "1 pulled pork 3.0\n", "2 bacon 12.0\n", "3 pastrami 6.0\n", "4 corned beef 7.5\n", "5 bacon 8.0\n", "6 pastrami 3.0\n", "7 honey ham 5.0\n", "8 nova lox 6.0" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = pd.DataFrame({'food': ['bacon', 'pulled pork', 'bacon', 'pastrami','corned beef', 'bacon', 'pastrami', 'honey ham','nova lox'],\n", " 'ounces': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})\n", "data" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ounces
04.0
13.0
212.0
36.0
47.5
58.0
63.0
75.0
86.0
\n", "
" ], "text/plain": [ " ounces\n", "0 4.0\n", "1 3.0\n", "2 12.0\n", "3 6.0\n", "4 7.5\n", "5 8.0\n", "6 3.0\n", "7 5.0\n", "8 6.0" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.drop(columns=['food'], inplace=True)\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Replacing Values in a Series" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top](#top)]" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1.0\n", "1 -999.0\n", "2 2.0\n", "3 -999.0\n", "4 -1000.0\n", "5 3.0\n", "dtype: float64" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "import numpy as np\n", "\n", "data = pd.Series([1., -999., 2., -999., -1000., 3.])\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### If you want to replace -999 with NaN:" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1.0\n", "1 NaN\n", "2 2.0\n", "3 NaN\n", "4 -1000.0\n", "5 3.0\n", "dtype: float64" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.replace(-999, np.nan, inplace=True)\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### If you want to replace multiple values at once, you instead pass a list then the substitute value:" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1.0\n", "1 -999.0\n", "2 2.0\n", "3 -999.0\n", "4 -1000.0\n", "5 3.0\n", "dtype: float64" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "data = pd.Series([1., -999., 2., -999., -1000., 3.])\n", "data" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1.0\n", "1 NaN\n", "2 2.0\n", "3 NaN\n", "4 NaN\n", "5 3.0\n", "dtype: float64" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.replace([-999, -1000], np.nan, inplace=True)\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Filling or Dropping NaN (\"Not a Number\") Values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top](#top)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Dropping rows that have NaN:**" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1.0\n", "2 2.0\n", "5 3.0\n", "dtype: float64" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.dropna()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Filling/replacing NaN values with something else (replace NaN with 0/zero):**" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1.0\n", "1 0.0\n", "2 2.0\n", "3 0.0\n", "4 0.0\n", "5 3.0\n", "dtype: float64" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.fillna(value=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using ```converters``` parameter to transform data during data ingestion" ] 
}, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top](#top)]" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "from datetime import datetime" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateGBP_to_USD
0Wednesday 15 November 20171 GBP = 1.317 USD
1Tuesday 14 November 20171 GBP = 1.3158 USD
2Monday 13 November 20171 GBP = 1.312 USD
3Sunday 12 November 20171 GBP = 1.32 USD
4Saturday 11 November 20171 GBP = 1.32 USD
\n", "
" ], "text/plain": [ " Date GBP_to_USD\n", "0 Wednesday 15 November 2017 1 GBP = 1.317 USD\n", "1 Tuesday 14 November 2017 1 GBP = 1.3158 USD\n", "2 Monday 13 November 2017 1 GBP = 1.312 USD\n", "3 Sunday 12 November 2017 1 GBP = 1.32 USD\n", "4 Saturday 11 November 2017 1 GBP = 1.32 USD" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_clipboard()\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### OBJECTIVE: You want to make the dates actual dates and you want to extract the USD amounts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Need to convert the string dates into actual Python dates and extract just the numeric values from the ```GBP_to_USD``` columns" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateGBP_to_USD
02017-11-151.317
12017-11-141.3158
22017-11-131.312
32017-11-121.32
42017-11-111.32
\n", "
" ], "text/plain": [ " Date GBP_to_USD\n", "0 2017-11-15 1.317\n", "1 2017-11-14 1.3158\n", "2 2017-11-13 1.312\n", "3 2017-11-12 1.32\n", "4 2017-11-11 1.32" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_clipboard(converters={'Date': lambda x: datetime.strptime(x, \"%A %d %B %Y\"),\n", " 'GBP_to_USD': lambda x: x.split(\" \")[3]\n", " })\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Renaming Index or Columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**NOTE:** In pandas land, index is equivalent to your row labels or row numbers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top](#top)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### rename() can be used in conjunction with a dict-like object providing new values for a subset of the axis labels:" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
onetwothreefour
Ohio0123
Colorado4567
New York891011
\n", "
" ], "text/plain": [ " one two three four\n", "Ohio 0 1 2 3\n", "Colorado 4 5 6 7\n", "New York 8 9 10 11" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "data = pd.DataFrame(np.arange(12).reshape((3, 4)),index=['Ohio', 'Colorado', 'New York'],columns=['one', 'two', 'three', 'four'])\n", "data" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
onetwopeekaboofour
INDIANA0123
Colorado4567
New York891011
\n", "
" ], "text/plain": [ " one two peekaboo four\n", "INDIANA 0 1 2 3\n", "Colorado 4 5 6 7\n", "New York 8 9 10 11" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.rename(index={'Ohio': 'INDIANA'},columns={'three': 'peekaboo'},inplace=True)\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also apply str functions to modify the index or column labels" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ONETWOPEEKABOOFOUR
Indiana0123
Colorado4567
New York891011
\n", "
" ], "text/plain": [ " ONE TWO PEEKABOO FOUR\n", "Indiana 0 1 2 3\n", "Colorado 4 5 6 7\n", "New York 8 9 10 11" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.rename(index=str.title, columns=str.upper, inplace=True) # str.title means to make the 1st letter capitalized only\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Binning Data Into Intervals" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top](#top)]" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [], "source": [ "ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Let’s divide these into bins of 18 to 25, 26 to 35, 35 to 60, and finally 60 and older. To do so, you have to use `cut()` function:" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [], "source": [ "bins = [18, 25, 35, 60, 100]" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[(18, 25], (18, 25], (18, 25], (25, 35], (18, 25], ..., (25, 35], (60, 100], (35, 60], (35, 60], (25, 35]]\n", "Length: 12\n", "Categories (4, interval[int64]): [(18, 25] < (25, 35] < (35, 60] < (60, 100]]" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cats = pd.cut(ages, bins)\n", "cats" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(18, 25] 5\n", "(35, 60] 3\n", "(25, 35] 3\n", "(60, 100] 1\n", "dtype: int64" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.value_counts(cats)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### If you want the right value to be exclusive in the intervals:" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[[18, 26), [18, 26), [18, 26), [26, 36), [18, 26), ..., [26, 36), [61, 100), [36, 61), [36, 61), [26, 36)]\n", "Length: 12\n", "Categories (4, interval[int64]): [[18, 26) < [26, 36) < [36, 61) < [61, 100)]" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.cut(ages, [18, 26, 36, 61, 100], right=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### You can also pass your own bin names by passing a list or array to the labels option:" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Youth 5\n", "MiddleAged 3\n", "YoungAdult 3\n", "Senior 1\n", "dtype: int64" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "group_names = ['Youth', 'YoungAdult', 'MiddleAged', 'Senior']\n", "pd.cut(ages, bins, labels=group_names)\n", "pd.value_counts(pd.cut(ages, bins, labels=group_names))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top](#top)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sub-selecting or Slicing a Data Frame" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top](#top)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Filtering by label name: [[loc](#loc)]
\n", "Filtering by index row and/or column: [[iloc](#iloc)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
2013-01-01-0.6096394-1.4504565-0.4148328-0.4758915
2013-01-021.6101875-0.7751663-0.29171611.1975733
2013-01-03-2.7636373-0.12625120.2035659-1.0004354
2013-01-041.73856110.00707010.04794910.8737716
2013-01-050.13616470.8550785-0.34414741.5101716
2013-01-061.33216680.5266501-0.35357081.8317850
\n", "
" ], "text/plain": [ " A B C D\n", "2013-01-01 -0.6096394 -1.4504565 -0.4148328 -0.4758915\n", "2013-01-02 1.6101875 -0.7751663 -0.2917161 1.1975733\n", "2013-01-03 -2.7636373 -0.1262512 0.2035659 -1.0004354\n", "2013-01-04 1.7385611 0.0070701 0.0479491 0.8737716\n", "2013-01-05 0.1361647 0.8550785 -0.3441474 1.5101716\n", "2013-01-06 1.3321668 0.5266501 -0.3535708 1.8317850" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "import pandas as pd\n", "\n", "dates = pd.date_range('20130101',periods=6)\n", "df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list('ABCD'))\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Getting first n rows of data frame using index slicing syntax" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
2013-01-01-0.6096394-1.4504565-0.4148328-0.4758915
2013-01-021.6101875-0.7751663-0.29171611.1975733
2013-01-03-2.7636373-0.12625120.2035659-1.0004354
\n", "
" ], "text/plain": [ " A B C D\n", "2013-01-01 -0.6096394 -1.4504565 -0.4148328 -0.4758915\n", "2013-01-02 1.6101875 -0.7751663 -0.2917161 1.1975733\n", "2013-01-03 -2.7636373 -0.1262512 0.2035659 -1.0004354" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[0:3] # get first 3 rows of the data frame" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Slicing based on data frame's index range" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
2013-01-021.6101875-0.7751663-0.29171611.1975733
2013-01-03-2.7636373-0.12625120.2035659-1.0004354
2013-01-041.73856110.00707010.04794910.8737716
\n", "
" ], "text/plain": [ " A B C D\n", "2013-01-02 1.6101875 -0.7751663 -0.2917161 1.1975733\n", "2013-01-03 -2.7636373 -0.1262512 0.2035659 -1.0004354\n", "2013-01-04 1.7385611 0.0070701 0.0479491 0.8737716" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['20130102':'20130104'] # get rows by index range" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Slicing based on column labels/names using loc" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[view df](#slicing)]" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AB
2013-01-01-0.6096394-1.4504565
2013-01-021.6101875-0.7751663
2013-01-03-2.7636373-0.1262512
2013-01-041.73856110.0070701
2013-01-050.13616470.8550785
2013-01-061.33216680.5266501
\n", "
" ], "text/plain": [ " A B\n", "2013-01-01 -0.6096394 -1.4504565\n", "2013-01-02 1.6101875 -0.7751663\n", "2013-01-03 -2.7636373 -0.1262512\n", "2013-01-04 1.7385611 0.0070701\n", "2013-01-05 0.1361647 0.8550785\n", "2013-01-06 1.3321668 0.5266501" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[:,['A','B']] # syntax is: df.loc[rows_index, cols_index]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Slicing based on row index label and column label combined using loc" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AB
2013-01-021.6101875-0.7751663
2013-01-03-2.7636373-0.1262512
2013-01-041.73856110.0070701
\n", "
" ], "text/plain": [ " A B\n", "2013-01-02 1.6101875 -0.7751663\n", "2013-01-03 -2.7636373 -0.1262512\n", "2013-01-04 1.7385611 0.0070701" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc['20130102':'20130104',['A','B']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Resampling Time Series Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For complete list of resampling time periods - [link](http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases)" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
A
2013-01-011.0478521
2013-01-02-0.8181170
2013-01-03-0.6467086
2013-01-040.6722718
2013-01-050.6782096
\n", "
" ], "text/plain": [ " A\n", "2013-01-01 1.0478521\n", "2013-01-02 -0.8181170\n", "2013-01-03 -0.6467086\n", "2013-01-04 0.6722718\n", "2013-01-05 0.6782096" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "import pandas as pd\n", "\n", "dates = pd.date_range('20130101',periods=180)\n", "df = pd.DataFrame(np.random.randn(180,1),index=dates,columns=list('A'))\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Partial date filtering using loc, get just January's data:**" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
A
2013-01-011.0478521
2013-01-02-0.8181170
2013-01-03-0.6467086
2013-01-040.6722718
2013-01-050.6782096
2013-01-06-0.8074374
2013-01-071.5643882
2013-01-080.3723588
2013-01-090.4197943
2013-01-101.5196560
2013-01-11-0.4932106
2013-01-120.4736382
2013-01-130.2886090
2013-01-140.8174301
2013-01-150.0395030
2013-01-161.5106605
2013-01-17-2.0664942
2013-01-180.8735501
2013-01-19-0.8793671
2013-01-201.0978811
2013-01-21-0.5022753
2013-01-22-1.1285848
2013-01-23-1.2105503
2013-01-24-0.3503544
2013-01-250.0748389
2013-01-260.8998185
2013-01-27-0.0481116
2013-01-28-1.7208055
2013-01-29-0.1856816
2013-01-30-0.0957440
2013-01-310.0110788
\n", "
" ], "text/plain": [ " A\n", "2013-01-01 1.0478521\n", "2013-01-02 -0.8181170\n", "2013-01-03 -0.6467086\n", "2013-01-04 0.6722718\n", "2013-01-05 0.6782096\n", "2013-01-06 -0.8074374\n", "2013-01-07 1.5643882\n", "2013-01-08 0.3723588\n", "2013-01-09 0.4197943\n", "2013-01-10 1.5196560\n", "2013-01-11 -0.4932106\n", "2013-01-12 0.4736382\n", "2013-01-13 0.2886090\n", "2013-01-14 0.8174301\n", "2013-01-15 0.0395030\n", "2013-01-16 1.5106605\n", "2013-01-17 -2.0664942\n", "2013-01-18 0.8735501\n", "2013-01-19 -0.8793671\n", "2013-01-20 1.0978811\n", "2013-01-21 -0.5022753\n", "2013-01-22 -1.1285848\n", "2013-01-23 -1.2105503\n", "2013-01-24 -0.3503544\n", "2013-01-25 0.0748389\n", "2013-01-26 0.8998185\n", "2013-01-27 -0.0481116\n", "2013-01-28 -1.7208055\n", "2013-01-29 -0.1856816\n", "2013-01-30 -0.0957440\n", "2013-01-31 0.0110788" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc['2013-01']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**UPDATE:** You no longer need to use the .loc() function if your index is already of type ```DateTime```:" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
A
2013-01-011.0478521
2013-01-02-0.8181170
2013-01-03-0.6467086
2013-01-040.6722718
2013-01-050.6782096
2013-01-06-0.8074374
2013-01-071.5643882
2013-01-080.3723588
2013-01-090.4197943
2013-01-101.5196560
2013-01-11-0.4932106
2013-01-120.4736382
2013-01-130.2886090
2013-01-140.8174301
2013-01-150.0395030
2013-01-161.5106605
2013-01-17-2.0664942
2013-01-180.8735501
2013-01-19-0.8793671
2013-01-201.0978811
2013-01-21-0.5022753
2013-01-22-1.1285848
2013-01-23-1.2105503
2013-01-24-0.3503544
2013-01-250.0748389
2013-01-260.8998185
2013-01-27-0.0481116
2013-01-28-1.7208055
2013-01-29-0.1856816
2013-01-30-0.0957440
2013-01-310.0110788
\n", "
" ], "text/plain": [ " A\n", "2013-01-01 1.0478521\n", "2013-01-02 -0.8181170\n", "2013-01-03 -0.6467086\n", "2013-01-04 0.6722718\n", "2013-01-05 0.6782096\n", "2013-01-06 -0.8074374\n", "2013-01-07 1.5643882\n", "2013-01-08 0.3723588\n", "2013-01-09 0.4197943\n", "2013-01-10 1.5196560\n", "2013-01-11 -0.4932106\n", "2013-01-12 0.4736382\n", "2013-01-13 0.2886090\n", "2013-01-14 0.8174301\n", "2013-01-15 0.0395030\n", "2013-01-16 1.5106605\n", "2013-01-17 -2.0664942\n", "2013-01-18 0.8735501\n", "2013-01-19 -0.8793671\n", "2013-01-20 1.0978811\n", "2013-01-21 -0.5022753\n", "2013-01-22 -1.1285848\n", "2013-01-23 -1.2105503\n", "2013-01-24 -0.3503544\n", "2013-01-25 0.0748389\n", "2013-01-26 0.8998185\n", "2013-01-27 -0.0481116\n", "2013-01-28 -1.7208055\n", "2013-01-29 -0.1856816\n", "2013-01-30 -0.0957440\n", "2013-01-31 0.0110788" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['2013-01']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Resample daily data to monthly data**" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
A
2013-01-311.4080967
2013-02-28-3.1467609
2013-03-31-4.7020674
2013-04-30-10.9058725
2013-05-312.5918750
2013-06-3011.2540062
\n", "
" ], "text/plain": [ " A\n", "2013-01-31 1.4080967\n", "2013-02-28 -3.1467609\n", "2013-03-31 -4.7020674\n", "2013-04-30 -10.9058725\n", "2013-05-31 2.5918750\n", "2013-06-30 11.2540062" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "monthly = df.resample('M').sum()\n", "monthly.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**But now, I prefer to see YYYY-MM format - for complete list of date formats - [link](http://strftime.org/)**" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
A
2013-011.4080967
2013-02-3.1467609
2013-03-4.7020674
2013-04-10.9058725
2013-052.5918750
2013-0611.2540062
\n", "
" ], "text/plain": [ " A\n", "2013-01 1.4080967\n", "2013-02 -3.1467609\n", "2013-03 -4.7020674\n", "2013-04 -10.9058725\n", "2013-05 2.5918750\n", "2013-06 11.2540062" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "monthly.index = monthly.index.strftime('%Y-%m')\n", "monthly.head(20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Slicing based on index / numerical position of the row or column using iloc" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[view df](#slicing)]" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
2013-01-01-0.75944731.19399951.08311080.3719371
2013-01-020.9382271-0.73387310.53631711.4589336
2013-01-03-0.2829617-0.9869213-0.7907918-3.3821940
2013-01-040.12835170.10851600.57319050.3059759
2013-01-05-0.8193987-0.5878911-0.73041312.0117699
2013-01-06-0.1047669-1.6513622-0.75486270.1238924
\n", "
" ], "text/plain": [ " A B C D\n", "2013-01-01 -0.7594473 1.1939995 1.0831108 0.3719371\n", "2013-01-02 0.9382271 -0.7338731 0.5363171 1.4589336\n", "2013-01-03 -0.2829617 -0.9869213 -0.7907918 -3.3821940\n", "2013-01-04 0.1283517 0.1085160 0.5731905 0.3059759\n", "2013-01-05 -0.8193987 -0.5878911 -0.7304131 2.0117699\n", "2013-01-06 -0.1047669 -1.6513622 -0.7548627 0.1238924" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dates = pd.date_range('20130101',periods=6)\n", "df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list('ABCD'))\n", "df" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "A 0.1283517\n", "B 0.1085160\n", "C 0.5731905\n", "D 0.3059759\n", "Name: 2013-01-04 00:00:00, dtype: float64" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.iloc[3] # returns 4th row (index=3) of the data frame" ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AB
2013-01-040.12835170.1085160
2013-01-05-0.8193987-0.5878911
\n", "
" ], "text/plain": [ " A B\n", "2013-01-04 0.1283517 0.1085160\n", "2013-01-05 -0.8193987 -0.5878911" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.iloc[3:5,0:2] # returns specific range of rows and columns of the data frame" ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
2013-01-020.9382271-0.73387310.53631711.4589336
2013-01-03-0.2829617-0.9869213-0.7907918-3.3821940
\n", "
" ], "text/plain": [ " A B C D\n", "2013-01-02 0.9382271 -0.7338731 0.5363171 1.4589336\n", "2013-01-03 -0.2829617 -0.9869213 -0.7907918 -3.3821940" ] }, "execution_count": 80, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.iloc[1:3,:] # returning specific rows and returning all columns" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
BC
2013-01-011.19399951.0831108
2013-01-02-0.73387310.5363171
2013-01-03-0.9869213-0.7907918
2013-01-040.10851600.5731905
2013-01-05-0.5878911-0.7304131
2013-01-06-1.6513622-0.7548627
\n", "
" ], "text/plain": [ " B C\n", "2013-01-01 1.1939995 1.0831108\n", "2013-01-02 -0.7338731 0.5363171\n", "2013-01-03 -0.9869213 -0.7907918\n", "2013-01-04 0.1085160 0.5731905\n", "2013-01-05 -0.5878911 -0.7304131\n", "2013-01-06 -1.6513622 -0.7548627" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.iloc[:,1:3] # returning all rows and specific columns" ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-0.7338730678633689" ] }, "execution_count": 82, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.iloc[1,1] # getting secific scalar/single value" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## [Boolean Indexing](http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[view df](#df)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The boolean operators are: **|** for or, & for and, and **~** for not. These **must** be grouped by using parentheses." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**By the way, there are 2 ways to access a dataframe column. Using bracket syntax (df[\"column_name\"]) or dot notation syntax (df.column_name). It is recommended that you use bracket syntax since you could in theory create a column name that happens to be an attribute of the dataframe. Another reason to use bracket syntax is if your column name has a space in it. Then in this case, the dot notation syntax would not work.**" ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
2013-01-020.9382271-0.73387310.53631711.4589336
2013-01-03-0.2829617-0.9869213-0.7907918-3.3821940
2013-01-040.12835170.10851600.57319050.3059759
2013-01-06-0.1047669-1.6513622-0.75486270.1238924
\n", "
" ], "text/plain": [ " A B C D\n", "2013-01-02 0.9382271 -0.7338731 0.5363171 1.4589336\n", "2013-01-03 -0.2829617 -0.9869213 -0.7907918 -3.3821940\n", "2013-01-04 0.1283517 0.1085160 0.5731905 0.3059759\n", "2013-01-06 -0.1047669 -1.6513622 -0.7548627 0.1238924" ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df.A > -0.5] # or df[df[\"A\"] > -0.5], this syntax works when there is a space in the column name" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "or I've seen some people do boolean filtering by passing \"criteria\" variable to the data frame like so:" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
2013-01-020.9382271-0.73387310.53631711.4589336
2013-01-03-0.2829617-0.9869213-0.7907918-3.3821940
2013-01-040.12835170.10851600.57319050.3059759
2013-01-06-0.1047669-1.6513622-0.75486270.1238924
\n", "
" ], "text/plain": [ " A B C D\n", "2013-01-02 0.9382271 -0.7338731 0.5363171 1.4589336\n", "2013-01-03 -0.2829617 -0.9869213 -0.7907918 -3.3821940\n", "2013-01-04 0.1283517 0.1085160 0.5731905 0.3059759\n", "2013-01-06 -0.1047669 -1.6513622 -0.7548627 0.1238924" ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "criteria = df['A'] > -0.5\n", "df[criteria]" ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCDE
2013-01-01-0.75944731.19399951.08311080.3719371one
2013-01-020.9382271-0.73387310.53631711.4589336one
2013-01-03-0.2829617-0.9869213-0.7907918-3.3821940two
2013-01-040.12835170.10851600.57319050.3059759three
2013-01-05-0.8193987-0.5878911-0.73041312.0117699four
2013-01-06-0.1047669-1.6513622-0.75486270.1238924three
\n", "
" ], "text/plain": [ " A B C D E\n", "2013-01-01 -0.7594473 1.1939995 1.0831108 0.3719371 one\n", "2013-01-02 0.9382271 -0.7338731 0.5363171 1.4589336 one\n", "2013-01-03 -0.2829617 -0.9869213 -0.7907918 -3.3821940 two\n", "2013-01-04 0.1283517 0.1085160 0.5731905 0.3059759 three\n", "2013-01-05 -0.8193987 -0.5878911 -0.7304131 2.0117699 four\n", "2013-01-06 -0.1047669 -1.6513622 -0.7548627 0.1238924 three" ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2 = df.copy()\n", "df2['E']=['one', 'one','two','three','four','three']\n", "df2" ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCDE
2013-01-03-0.2829617-0.9869213-0.7907918-3.3821940two
2013-01-05-0.8193987-0.5878911-0.73041312.0117699four
\n", "
" ], "text/plain": [ " A B C D E\n", "2013-01-03 -0.2829617 -0.9869213 -0.7907918 -3.3821940 two\n", "2013-01-05 -0.8193987 -0.5878911 -0.7304131 2.0117699 four" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2[df2['E'].isin(['two','four'])] # read as \"return rows where column E contains two or four\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### We can also do partial string matching. So let's say you don't know the exact spelling a word is you want to match, you can do this:" ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCDE
2013-01-03-0.2829617-0.9869213-0.7907918-3.3821940two
2013-01-05-0.8193987-0.5878911-0.73041312.0117699four
\n", "
" ], "text/plain": [ " A B C D E\n", "2013-01-03 -0.2829617 -0.9869213 -0.7907918 -3.3821940 two\n", "2013-01-05 -0.8193987 -0.5878911 -0.7304131 2.0117699 four" ] }, "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2[df2.E.str.contains(\"tw|ou\")]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Using ~ to perform a \"NOT\" filtering" ] }, { "cell_type": "code", "execution_count": 88, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCDE
2013-01-01-0.75944731.19399951.08311080.3719371one
2013-01-020.9382271-0.73387310.53631711.4589336one
2013-01-040.12835170.10851600.57319050.3059759three
2013-01-06-0.1047669-1.6513622-0.75486270.1238924three
\n", "
" ], "text/plain": [ " A B C D E\n", "2013-01-01 -0.7594473 1.1939995 1.0831108 0.3719371 one\n", "2013-01-02 0.9382271 -0.7338731 0.5363171 1.4589336 one\n", "2013-01-04 0.1283517 0.1085160 0.5731905 0.3059759 three\n", "2013-01-06 -0.1047669 -1.6513622 -0.7548627 0.1238924 three" ] }, "execution_count": 88, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2[~df2['E'].isin(['two','four'])] # column E containing values not in two or four" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Filtering using ```query()``` method. With version 0.25, we can use ```query()``` / ```eval()``` with column names with spaces using back ticks." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is equivalent to using SQL's `WHERE` clause" ] }, { "cell_type": "code", "execution_count": 89, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
2013-01-01-0.02382120.1475556-1.02458121.6811354
2013-01-020.0652982-0.02319360.5774753-0.6419250
2013-01-03-0.95311441.0518699-0.0242550-0.3607456
2013-01-041.6691680-1.1965801-0.25353122.0905688
2013-01-050.65645642.34301371.80940840.6435664
2013-01-06-0.45178360.7971626-0.68576411.6924577
\n", "
" ], "text/plain": [ " A B C D\n", "2013-01-01 -0.0238212 0.1475556 -1.0245812 1.6811354\n", "2013-01-02 0.0652982 -0.0231936 0.5774753 -0.6419250\n", "2013-01-03 -0.9531144 1.0518699 -0.0242550 -0.3607456\n", "2013-01-04 1.6691680 -1.1965801 -0.2535312 2.0905688\n", "2013-01-05 0.6564564 2.3430137 1.8094084 0.6435664\n", "2013-01-06 -0.4517836 0.7971626 -0.6857641 1.6924577" ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "import pandas as pd\n", "\n", "dates = pd.date_range('20130101',periods=6)\n", "df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list('ABCD'))\n", "df" ] }, { "cell_type": "code", "execution_count": 90, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
2013-01-01-0.02382120.1475556-1.02458121.6811354
2013-01-041.6691680-1.1965801-0.25353122.0905688
2013-01-06-0.45178360.7971626-0.68576411.6924577
\n", "
" ], "text/plain": [ " A B C D\n", "2013-01-01 -0.0238212 0.1475556 -1.0245812 1.6811354\n", "2013-01-04 1.6691680 -1.1965801 -0.2535312 2.0905688\n", "2013-01-06 -0.4517836 0.7971626 -0.6857641 1.6924577" ] }, "execution_count": 90, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.query(\"A > C\")" ] }, { "cell_type": "code", "execution_count": 91, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
2013-01-020.0652982-0.02319360.5774753-0.6419250
2013-01-041.6691680-1.1965801-0.25353122.0905688
2013-01-050.65645642.34301371.80940840.6435664
\n", "
" ], "text/plain": [ " A B C D\n", "2013-01-02 0.0652982 -0.0231936 0.5774753 -0.6419250\n", "2013-01-04 1.6691680 -1.1965801 -0.2535312 2.0905688\n", "2013-01-05 0.6564564 2.3430137 1.8094084 0.6435664" ] }, "execution_count": 91, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.query(\"A > 0\")" ] }, { "cell_type": "code", "execution_count": 92, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
2013-01-020.0652982-0.02319360.5774753-0.6419250
2013-01-050.65645642.34301371.80940840.6435664
\n", "
" ], "text/plain": [ " A B C D\n", "2013-01-02 0.0652982 -0.0231936 0.5774753 -0.6419250\n", "2013-01-05 0.6564564 2.3430137 1.8094084 0.6435664" ] }, "execution_count": 92, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.query(\"A > 0 & A < 1\")" ] }, { "cell_type": "code", "execution_count": 93, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
2013-01-01-0.02382120.1475556-1.02458121.6811354
2013-01-020.0652982-0.02319360.5774753-0.6419250
2013-01-041.6691680-1.1965801-0.25353122.0905688
2013-01-06-0.45178360.7971626-0.68576411.6924577
\n", "
" ], "text/plain": [ " A B C D\n", "2013-01-01 -0.0238212 0.1475556 -1.0245812 1.6811354\n", "2013-01-02 0.0652982 -0.0231936 0.5774753 -0.6419250\n", "2013-01-04 1.6691680 -1.1965801 -0.2535312 2.0905688\n", "2013-01-06 -0.4517836 0.7971626 -0.6857641 1.6924577" ] }, "execution_count": 93, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.query(\"A > B | A > C\") # where A is greater than B or A is greater than C" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Accessors and their methods" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top]](#top)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You've already seen an example usage of the ```.str.``` accessor above. There are 2 other accessors that help you perform date, and categorical functions." ] }, { "cell_type": "code", "execution_count": 94, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 95, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'cat', 'dt', 'sparse', 'str'}" ] }, "execution_count": 95, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.Series._accessors" ] }, { "cell_type": "code", "execution_count": 96, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['__annotations__',\n", " '__class__',\n", " '__delattr__',\n", " '__dict__',\n", " '__dir__',\n", " '__doc__',\n", " '__eq__',\n", " '__format__',\n", " '__ge__',\n", " '__getattribute__',\n", " '__gt__',\n", " '__hash__',\n", " '__init__',\n", " '__init_subclass__',\n", " '__le__',\n", " '__lt__',\n", " '__module__',\n", " '__ne__',\n", " '__new__',\n", " '__reduce__',\n", " '__reduce_ex__',\n", " '__repr__',\n", " '__setattr__',\n", " '__sizeof__',\n", " '__str__',\n", " '__subclasshook__',\n", " '__weakref__',\n", " '_accessors',\n", " '_add_delegate_accessors',\n", " '_constructor',\n", " '_delegate_method',\n", " '_delegate_property_get',\n", " '_delegate_property_set',\n", " '_deprecations',\n", " '_dir_additions',\n", " '_dir_deletions',\n", " '_freeze',\n", " '_get_values',\n", " '_reset_cache',\n", " 'asfreq',\n", " 'ceil',\n", " 'components',\n", " 'date',\n", " 'day',\n", " 'day_name',\n", " 'dayofweek',\n", " 'dayofyear',\n", " 'days',\n", " 'days_in_month',\n", " 'daysinmonth',\n", " 'end_time',\n", " 'floor',\n", " 'freq',\n", " 'hour',\n", " 'is_leap_year',\n", " 'is_month_end',\n", " 'is_month_start',\n", " 'is_quarter_end',\n", " 'is_quarter_start',\n", " 'is_year_end',\n", " 'is_year_start',\n", " 'isocalendar',\n", " 'microsecond',\n", " 'microseconds',\n", " 'minute',\n", " 'month',\n", " 'month_name',\n", " 'nanosecond',\n", " 'nanoseconds',\n", " 'normalize',\n", " 'quarter',\n", " 'qyear',\n", " 'round',\n", " 'second',\n", " 'seconds',\n", " 'start_time',\n", " 'strftime',\n", " 'time',\n", " 'timetz',\n", " 'to_period',\n", " 'to_pydatetime',\n", " 'to_pytimedelta',\n", " 'to_timestamp',\n", " 'total_seconds',\n", " 'tz',\n", " 'tz_convert',\n", " 'tz_localize',\n", " 'week',\n", " 'weekday',\n", " 'weekofyear',\n", " 'year']" ] }, "execution_count": 96, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dir(pd.Series.dt)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From the `dir()` output above, we see that there are some useful date-related attributes that we can leverage" ] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [], "source": [ "daterng = 
pd.Series(pd.date_range('2017', periods=9, freq='Q'))" ] }, { "cell_type": "code", "execution_count": 98, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 2017-03-31\n", "1 2017-06-30\n", "2 2017-09-30\n", "3 2017-12-31\n", "4 2018-03-31\n", "5 2018-06-30\n", "6 2018-09-30\n", "7 2018-12-31\n", "8 2019-03-31\n", "dtype: datetime64[ns]" ] }, "execution_count": 98, "metadata": {}, "output_type": "execute_result" } ], "source": [ "daterng" ] }, { "cell_type": "code", "execution_count": 99, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 Friday\n", "1 Friday\n", "2 Saturday\n", "3 Sunday\n", "4 Saturday\n", "5 Saturday\n", "6 Sunday\n", "7 Monday\n", "8 Sunday\n", "dtype: object" ] }, "execution_count": 99, "metadata": {}, "output_type": "execute_result" } ], "source": [ "daterng.dt.day_name()" ] }, { "cell_type": "code", "execution_count": 100, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3 2017-12-31\n", "7 2018-12-31\n", "dtype: datetime64[ns]" ] }, "execution_count": 100, "metadata": {}, "output_type": "execute_result" } ], "source": [ "daterng[daterng.dt.is_year_end]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Obtaining columns with partial column labels" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top](#top)]" ] }, { "cell_type": "code", "execution_count": 101, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
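{ "cell_type": "markdown", "metadata": {}, "source": [ "Before the next example, one more accessor from the section above: a minimal ```.cat``` sketch (the series ```s``` is invented for illustration):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s = pd.Series(['a', 'b', 'a'], dtype='category')\n", "s.cat.categories # the distinct categories\n", "s.cat.codes # the integer codes backing each value" ] },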
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
key1key2data1data2
0aone-0.5009150-1.1067497
1atwo-0.45328280.3262502
2bone-0.24582760.2946161
3btwo-0.3467547-0.6925296
4aone0.1528616-0.0112956
\n", "
" ], "text/plain": [ " key1 key2 data1 data2\n", "0 a one -0.5009150 -1.1067497\n", "1 a two -0.4532828 0.3262502\n", "2 b one -0.2458276 0.2946161\n", "3 b two -0.3467547 -0.6925296\n", "4 a one 0.1528616 -0.0112956" ] }, "execution_count": 101, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "df = pd.DataFrame({'key1' : ['a', 'a', 'b', 'b', 'a'],\n", " 'key2' : ['one', 'two', 'one', 'two', 'one'],\n", " 'data1' : np.random.randn(5),\n", " 'data2' : np.random.randn(5)})\n", "df" ] }, { "cell_type": "code", "execution_count": 102, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
data1data2
0-0.5009150-1.1067497
1-0.45328280.3262502
2-0.24582760.2946161
3-0.3467547-0.6925296
40.1528616-0.0112956
\n", "
" ], "text/plain": [ " data1 data2\n", "0 -0.5009150 -1.1067497\n", "1 -0.4532828 0.3262502\n", "2 -0.2458276 0.2946161\n", "3 -0.3467547 -0.6925296\n", "4 0.1528616 -0.0112956" ] }, "execution_count": 102, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.filter(like='data')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting Value Counts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is so tempting to use the groupby() function or pivot_table, but most of the time, value_counts() function is all we need." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top](#top)]" ] }, { "cell_type": "code", "execution_count": 103, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
groupounces
0a4.0
1a3.0
2a12.0
3b6.0
4b7.5
5b8.0
6c3.0
7c5.0
8c6.0
\n", "
" ], "text/plain": [ " group ounces\n", "0 a 4.0\n", "1 a 3.0\n", "2 a 12.0\n", "3 b 6.0\n", "4 b 7.5\n", "5 b 8.0\n", "6 c 3.0\n", "7 c 5.0\n", "8 c 6.0" ] }, "execution_count": 103, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "data = pd.DataFrame({'group': ['a', 'a', 'a', 'b','b', 'b', 'c', 'c','c'],\n", " 'ounces': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})\n", "data" ] }, { "cell_type": "code", "execution_count": 104, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a 3\n", "b 3\n", "c 3\n", "Name: group, dtype: int64" ] }, "execution_count": 104, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data['group'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**There is also a handy ```normalize``` option:**" ] }, { "cell_type": "code", "execution_count": 105, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a 0.3333333\n", "b 0.3333333\n", "c 0.3333333\n", "Name: group, dtype: float64" ] }, "execution_count": 105, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data['group'].value_counts(normalize=True) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting Cumulative Sum" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top](#top)]" ] }, { "cell_type": "code", "execution_count": 106, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a 3\n", "b 6\n", "c 9\n", "Name: group, dtype: int64" ] }, "execution_count": 106, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data['group'].value_counts().cumsum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Grouping Data ( see also [value_count()](#value_counts) and [pivot_table()](#pivot_table) )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top](#top)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### New in version 0.20 - agg() function. Now aggregating is less verbose!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Now for some simple aggregations:" ] }, { "cell_type": "code", "execution_count": 107, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
2013-01-010.95961461.99727820.60171561.4154062
2013-01-02-1.4137538-1.3745631-0.32153560.7472984
2013-01-030.76324991.67429771.30213260.1888954
2013-01-040.8005111-0.14627940.34365990.1310278
2013-01-050.90663430.5098032-0.1090932-0.6649942
2013-01-06-0.5543259-0.75123751.7489706-1.6296998
2013-01-070.29703821.26248041.01159710.2765671
2013-01-08-0.98453970.85089580.54002080.6634031
2013-01-09-0.11880550.4146147-0.85129620.1770277
2013-01-10-1.36457580.0008273-0.81914290.3820062
\n", "
" ], "text/plain": [ " A B C D\n", "2013-01-01 0.9596146 1.9972782 0.6017156 1.4154062\n", "2013-01-02 -1.4137538 -1.3745631 -0.3215356 0.7472984\n", "2013-01-03 0.7632499 1.6742977 1.3021326 0.1888954\n", "2013-01-04 0.8005111 -0.1462794 0.3436599 0.1310278\n", "2013-01-05 0.9066343 0.5098032 -0.1090932 -0.6649942\n", "2013-01-06 -0.5543259 -0.7512375 1.7489706 -1.6296998\n", "2013-01-07 0.2970382 1.2624804 1.0115971 0.2765671\n", "2013-01-08 -0.9845397 0.8508958 0.5400208 0.6634031\n", "2013-01-09 -0.1188055 0.4146147 -0.8512962 0.1770277\n", "2013-01-10 -1.3645758 0.0008273 -0.8191429 0.3820062" ] }, "execution_count": 107, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "import pandas as pd\n", "\n", "dates = pd.date_range('20130101',periods=10)\n", "df = pd.DataFrame(np.random.randn(10,4),index=dates,columns=list('ABCD'))\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Sum for each column:**" ] }, { "cell_type": "code", "execution_count": 108, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "A -0.7089525\n", "B 4.4381173\n", "C 3.4470286\n", "D 1.6869378\n", "dtype: float64" ] }, "execution_count": 108, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.agg('sum')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Sum and min of each column:**" ] }, { "cell_type": "code", "execution_count": 109, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
sum-0.70895254.43811733.44702861.6869378
min-1.4137538-1.3745631-0.8512962-1.6296998
\n", "
" ], "text/plain": [ " A B C D\n", "sum -0.7089525 4.4381173 3.4470286 1.6869378\n", "min -1.4137538 -1.3745631 -0.8512962 -1.6296998" ] }, "execution_count": 109, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.agg(['sum', 'min'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Sum of just one or more columns:**" ] }, { "cell_type": "code", "execution_count": 110, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-0.7089524679704087" ] }, "execution_count": 110, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['A'].agg('sum')" ] }, { "cell_type": "code", "execution_count": 111, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "A -0.7089525\n", "C 3.4470286\n", "dtype: float64" ] }, "execution_count": 111, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[['A','C']].agg('sum')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Now for some groupby() examples:" ] }, { "cell_type": "code", "execution_count": 112, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
key1key2data1data2
0aone0.1564853-0.6441757
1atwo1.34460950.2890500
2bone0.4096645-1.2109906
3btwo-0.18759621.3129774
4aone-0.77949650.4034388
\n", "
" ], "text/plain": [ " key1 key2 data1 data2\n", "0 a one 0.1564853 -0.6441757\n", "1 a two 1.3446095 0.2890500\n", "2 b one 0.4096645 -1.2109906\n", "3 b two -0.1875962 1.3129774\n", "4 a one -0.7794965 0.4034388" ] }, "execution_count": 112, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame({'key1' : ['a', 'a', 'b', 'b', 'a'],\n", " 'key2' : ['one', 'two', 'one', 'two', 'one'],\n", " 'data1' : np.random.randn(5),\n", " 'data2' : np.random.randn(5)})\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using a combination of ```groupby()``` and ```agg()```:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In a lot of cases, instead of ```pivot_table()```, we can use the ```groupby()``` \\ ```agg()``` combination." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "General preferred syntax: **```df.groupby('grouping_column').agg({'aggregating_column': 'aggregating_functions'})```**" ] }, { "cell_type": "code", "execution_count": 114, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
key1key2data1data2
0aone0.1564853-0.6441757
1atwo1.34460950.2890500
2bone0.4096645-1.2109906
3btwo-0.18759621.3129774
4aone-0.77949650.4034388
\n", "
" ], "text/plain": [ " key1 key2 data1 data2\n", "0 a one 0.1564853 -0.6441757\n", "1 a two 1.3446095 0.2890500\n", "2 b one 0.4096645 -1.2109906\n", "3 b two -0.1875962 1.3129774\n", "4 a one -0.7794965 0.4034388" ] }, "execution_count": 114, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### NOTE: You can pass a list of aggregating functions:" ] }, { "cell_type": "code", "execution_count": 113, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
data1
summax
key1
a0.72159831.3446095
b0.22206830.4096645
\n", "
" ], "text/plain": [ " data1 \n", " sum max\n", "key1 \n", "a 0.7215983 1.3446095\n", "b 0.2220683 0.4096645" ] }, "execution_count": 113, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby('key1').agg({'data1': ['sum','max']})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pivot table example" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "See also this excellent [article](http://pbpython.com/pandas-pivot-table-explained.html) on pivot tables by Chris Moffitt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top](#top)]" ] }, { "cell_type": "code", "execution_count": 115, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
groupounces
0a4.0
1a3.0
2a12.0
3b6.0
4b7.5
5b8.0
6c3.0
7c5.0
8c6.0
\n", "
" ], "text/plain": [ " group ounces\n", "0 a 4.0\n", "1 a 3.0\n", "2 a 12.0\n", "3 b 6.0\n", "4 b 7.5\n", "5 b 8.0\n", "6 c 3.0\n", "7 c 5.0\n", "8 c 6.0" ] }, "execution_count": 115, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "import numpy as np\n", "\n", "data = pd.DataFrame({'group': ['a', 'a', 'a', 'b','b', 'b', 'c', 'c','c'],\n", " 'ounces': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Calculating the means of each group" ] }, { "cell_type": "code", "execution_count": 116, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ounces
group
a6.3333333
b7.1666667
c4.6666667
\n", "
" ], "text/plain": [ " ounces\n", "group \n", "a 6.3333333\n", "b 7.1666667\n", "c 4.6666667" ] }, "execution_count": 116, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.pivot_table(values='ounces',index='group',aggfunc=np.mean)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Getting counts by group" ] }, { "cell_type": "code", "execution_count": 117, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ounces
group
a3
b3
c3
\n", "
" ], "text/plain": [ " ounces\n", "group \n", "a 3\n", "b 3\n", "c 3" ] }, "execution_count": 117, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.pivot_table(values='ounces',index='group',aggfunc='count')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Getting cumulative sum or running total of the group counts" ] }, { "cell_type": "code", "execution_count": 118, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ounces
group
a3
b6
c9
\n", "
" ], "text/plain": [ " ounces\n", "group \n", "a 3\n", "b 6\n", "c 9" ] }, "execution_count": 118, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.pivot_table(values='ounces',index='group',aggfunc='count').cumsum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**You can pass a dictionary to the ```aggfunc=``` parameter to specify specific columns to have specific aggregate function applied to them** " ] }, { "cell_type": "code", "execution_count": 119, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
groupounces
group
a319.0
b321.5
c314.0
\n", "
" ], "text/plain": [ " group ounces\n", "group \n", "a 3 19.0\n", "b 3 21.5\n", "c 3 14.0" ] }, "execution_count": 119, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.pivot_table(values='ounces', index='group', aggfunc={'group': len, 'ounces': np.sum})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Named aggregations (v0.25)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top](#top)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With named aggregations, we can create new columns based on aggregations on other columns." ] }, { "cell_type": "code", "execution_count": 120, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
kindheightweight
0cat9.17.9
1dog6.07.5
2cat9.59.9
3dog34.0198.0
\n", "
" ], "text/plain": [ " kind height weight\n", "0 cat 9.1 7.9\n", "1 dog 6.0 7.5\n", "2 cat 9.5 9.9\n", "3 dog 34.0 198.0" ] }, "execution_count": 120, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "animals = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],\n", " 'height': [9.1, 6.0, 9.5, 34.0],\n", " 'weight': [7.9, 7.5, 9.9, 198.0]})\n", "animals" ] }, { "cell_type": "code", "execution_count": 122, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
min_heightmax_heightaverage_weight
kind
cat9.19.58.90
dog6.034.0102.75
\n", "
" ], "text/plain": [ " min_height max_height average_weight\n", "kind \n", "cat 9.1 9.5 8.90\n", "dog 6.0 34.0 102.75" ] }, "execution_count": 122, "metadata": {}, "output_type": "execute_result" } ], "source": [ "animals.groupby(\"kind\").agg(\n", " min_height=pd.NamedAgg(column='height', aggfunc='min'),\n", " max_height=pd.NamedAgg(column='height', aggfunc='max'),\n", " average_weight=pd.NamedAgg(column='weight', aggfunc=np.mean)\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is also a less verbose syntax using tuples instead of ```NamedAgg()``` function:" ] }, { "cell_type": "code", "execution_count": 123, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
min_heightmax_heightaverage_weight
kind
cat9.19.58.90
dog6.034.0102.75
\n", "
" ], "text/plain": [ " min_height max_height average_weight\n", "kind \n", "cat 9.1 9.5 8.90\n", "dog 6.0 34.0 102.75" ] }, "execution_count": 123, "metadata": {}, "output_type": "execute_result" } ], "source": [ "animals.groupby(\"kind\").agg(\n", " min_height=('height', 'min'),\n", " max_height=('height', 'max'),\n", " average_weight=('weight', np.mean)\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Percent of rows" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top](#top)]" ] }, { "cell_type": "code", "execution_count": 124, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
groupouncessize
08020100
17030100
27525100
37525100
\n", "
" ], "text/plain": [ " group ounces size\n", "0 80 20 100\n", "1 70 30 100\n", "2 75 25 100\n", "3 75 25 100" ] }, "execution_count": 124, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "data = pd.DataFrame({'group': [80, 70, 75, 75],\n", " 'ounces': [20, 30, 25, 25],\n", " 'size': [100, 100, 100, 100]}\n", " )\n", "data" ] }, { "cell_type": "code", "execution_count": 125, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
groupouncessize
040.010.050.0
135.015.050.0
237.512.550.0
337.512.550.0
\n", "
" ], "text/plain": [ " group ounces size\n", "0 40.0 10.0 50.0\n", "1 35.0 15.0 50.0\n", "2 37.5 12.5 50.0\n", "3 37.5 12.5 50.0" ] }, "execution_count": 125, "metadata": {}, "output_type": "execute_result" } ], "source": [ "perc_of_rows = data.apply(lambda x : x / x.sum() * 100, axis='columns') # or axis=1\n", "perc_of_rows" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Percent of columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top](#top)]" ] }, { "cell_type": "code", "execution_count": 126, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
groupouncessize
026.666666720.025.0
123.333333330.025.0
225.000000025.025.0
325.000000025.025.0
\n", "
" ], "text/plain": [ " group ounces size\n", "0 26.6666667 20.0 25.0\n", "1 23.3333333 30.0 25.0\n", "2 25.0000000 25.0 25.0\n", "3 25.0000000 25.0 25.0" ] }, "execution_count": 126, "metadata": {}, "output_type": "execute_result" } ], "source": [ "perc_of_columns = data.apply(lambda x : x / x.sum() * 100, axis='index') # or axis=0\n", "perc_of_columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**or percent of a specific column:**" ] }, { "cell_type": "code", "execution_count": 127, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
groupouncessizePerc_of_Group_Column
0802010026.6666667
1703010023.3333333
2752510025.0000000
3752510025.0000000
\n", "
" ], "text/plain": [ " group ounces size Perc_of_Group_Column\n", "0 80 20 100 26.6666667\n", "1 70 30 100 23.3333333\n", "2 75 25 100 25.0000000\n", "3 75 25 100 25.0000000" ] }, "execution_count": 127, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data['Perc_of_Group_Column'] = data['group'] / data['group'].sum() * 100\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Transpose a data frame" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top](#top)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's say you have a data frame with several columns and don't want to horizontally scroll to see the columns, but want to vertically scroll instead. Then that case, transpose the data frame." ] }, { "cell_type": "code", "execution_count": 128, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
key1key2data1data2
0aone1.25707381.2159738
1atwo0.6660120-1.4213658
2bone0.6255760-0.0309261
3btwo0.13502551.4271483
4aone1.4709446-0.3034248
\n", "
" ], "text/plain": [ " key1 key2 data1 data2\n", "0 a one 1.2570738 1.2159738\n", "1 a two 0.6660120 -1.4213658\n", "2 b one 0.6255760 -0.0309261\n", "3 b two 0.1350255 1.4271483\n", "4 a one 1.4709446 -0.3034248" ] }, "execution_count": 128, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "df = pd.DataFrame({'key1' : ['a', 'a', 'b', 'b', 'a'],\n", " 'key2' : ['one', 'two', 'one', 'two', 'one'],\n", " 'data1' : np.random.randn(5),\n", " 'data2' : np.random.randn(5)})\n", "df" ] }, { "cell_type": "code", "execution_count": 129, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
0
key1a
key2one
data11.257074
data21.215974
\n", "
" ], "text/plain": [ " 0\n", "key1 a\n", "key2 one\n", "data1 1.257074\n", "data2 1.215974" ] }, "execution_count": 129, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head(1).transpose()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Converting a data frame index to a column" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top](#top)]" ] }, { "cell_type": "code", "execution_count": 131, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
groupounces
0a4.0
1a3.0
2a12.0
3b6.0
4b7.5
5b8.0
6c3.0
7c5.0
8c6.0
\n", "
" ], "text/plain": [ " group ounces\n", "0 a 4.0\n", "1 a 3.0\n", "2 a 12.0\n", "3 b 6.0\n", "4 b 7.5\n", "5 b 8.0\n", "6 c 3.0\n", "7 c 5.0\n", "8 c 6.0" ] }, "execution_count": 131, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "data = pd.DataFrame({'group': ['a', 'a', 'a', 'b','b', 'b', 'c', 'c','c'],\n", " 'ounces': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})\n", "data" ] }, { "cell_type": "code", "execution_count": 132, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
indexgroupounces
00a4.0
11a3.0
22a12.0
33b6.0
44b7.5
55b8.0
66c3.0
77c5.0
88c6.0
\n", "
" ], "text/plain": [ " index group ounces\n", "0 0 a 4.0\n", "1 1 a 3.0\n", "2 2 a 12.0\n", "3 3 b 6.0\n", "4 4 b 7.5\n", "5 5 b 8.0\n", "6 6 c 3.0\n", "7 7 c 5.0\n", "8 8 c 6.0" ] }, "execution_count": 132, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.reset_index(level=0, inplace=True)\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How to set one of the dataframe's columns to be the index" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top]](#top)" ] }, { "cell_type": "code", "execution_count": 133, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
abc
0123
1101112
2202122
\n", "
" ], "text/plain": [ " a b c\n", "0 1 2 3\n", "1 10 11 12\n", "2 20 21 22" ] }, "execution_count": 133, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "col = ['a','b','c']\n", "data = pd.DataFrame([[1,2,3],[10,11,12],[20,21,22]],columns=col)\n", "data" ] }, { "cell_type": "code", "execution_count": 134, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
bc
a
123
101112
202122
\n", "
" ], "text/plain": [ " b c\n", "a \n", "1 2 3\n", "10 11 12\n", "20 21 22" ] }, "execution_count": 134, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = data.set_index('a')\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To remove the index name (\"a\"), do:" ] }, { "cell_type": "code", "execution_count": 135, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
bc
123
101112
202122
\n", "
" ], "text/plain": [ " b c\n", "1 2 3\n", "10 11 12\n", "20 21 22" ] }, "execution_count": 135, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.index.name = None\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How to add or fill in missing dates" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top]](#top)" ] }, { "cell_type": "code", "execution_count": 136, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "09-02-2013 2\n", "09-03-2013 10\n", "09-06-2013 5\n", "09-07-2013 1\n", "dtype: int64" ] }, "execution_count": 136, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "idx = pd.date_range('09-01-2013', '09-30-2013')\n", "\n", "s = pd.Series({'09-02-2013': 2,\n", " '09-03-2013': 10,\n", " '09-06-2013': 5,\n", " '09-07-2013': 1})\n", "s" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### From above, we have holes in our data. Let's fill in those missing holes using the idx date_range we specified above using `reindex` function." ] }, { "cell_type": "code", "execution_count": 137, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2013-09-01 0\n", "2013-09-02 2\n", "2013-09-03 10\n", "2013-09-04 0\n", "2013-09-05 0\n", "2013-09-06 5\n", "2013-09-07 1\n", "2013-09-08 0\n", "2013-09-09 0\n", "2013-09-10 0\n", "2013-09-11 0\n", "2013-09-12 0\n", "2013-09-13 0\n", "2013-09-14 0\n", "2013-09-15 0\n", "2013-09-16 0\n", "2013-09-17 0\n", "2013-09-18 0\n", "2013-09-19 0\n", "2013-09-20 0\n", "2013-09-21 0\n", "2013-09-22 0\n", "2013-09-23 0\n", "2013-09-24 0\n", "2013-09-25 0\n", "2013-09-26 0\n", "2013-09-27 0\n", "2013-09-28 0\n", "2013-09-29 0\n", "2013-09-30 0\n", "Freq: D, dtype: int64" ] }, "execution_count": 137, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# First need to convert the dataframe index type as type DateTime\n", "s.index = pd.DatetimeIndex(s.index)\n", "\n", "s = s.reindex(idx, fill_value=0)\n", "s" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How to connect and query against a database registered as an ODBC data source (Windows)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top]](#top)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pyodbc\n", "import pandas as pd\n", "from getpass import getpass # Module that will create a text input widget AND mask your password\n", "\n", "userid = 'your_userid'\n", "pw = getpass(prompt='Enter your password: ')\n", "\n", "cnxn_string = 'DSN=your_dsn;UID=' + userid + ';PWD=' + pw\n", "\n", "sql = \"\"\"\n", "SELECT * from your_table...\n", "\"\"\"\n", "\n", "with pyodbc.connect(cnxn_string) as conn:\n", " df = pd.read_sql(sql, conn, index_col=None, parse_dates={'some_column':\"%Y-%m-%d\"})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How to convert data in wide format to long format using pd.melt()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top](#top)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Certain charting libaries like expect the data to be in long format. Let's get some data in wide format." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "from pandas_datareader import data as pdr\n", "from datetime import datetime\n", "import pandas as pd\n", "\n", "stocks = {'tech':['GOOGL', 'MSFT', 'LNKD', 'YHOO', 'FB','HPQ','AMZN'],\n", " 'auto':['TM','F','GM','HMC','NSANY','HYMTF'],\n", " 'housing':['HD','WMT','LOW']\n", " }\n", "\n", "start_date = datetime(2016,1,1)\n", "end_date = datetime(2016, 4, 21)\n", "\n", "def get_px(stock, start, end):\n", " return web.get_data_yahoo(stock, start, end)['Adj Close']\n", "\n", "df = pd.DataFrame({n: get_px(n, start_date, end_date) for n in stocks['tech']})" ] }, { "cell_type": "code", "execution_count": 144, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateAMZNFBGOOGLHPQLNKDMSFTYHOO
01/4/2016636.989990102.220001759.44000211.471342225.55000354.40934631.400000
11/5/2016633.789978102.730003761.53002911.580122229.00000054.65756332.200001
21/6/2016632.650024102.970001759.33001711.164780226.08000253.66469232.160000
31/7/2016607.94000297.919998741.00000010.650548217.00000051.79809330.160000
41/8/2016607.04998897.330002730.90997310.462655215.89999451.95695630.629999
51/11/2016617.73999097.510002733.07000710.531878210.92999351.92716730.170000
61/12/2016617.89001599.370003745.34002710.699993210.42999352.40374530.690001
71/13/2016581.80999895.440002719.57000710.462655203.13000551.27187329.440001
81/14/2016593.00000098.370003731.39001510.502211203.21000752.73139530.320000
91/15/2016570.17999394.970001710.4899909.997867196.32000750.62650829.139999
\n", "
" ], "text/plain": [ " Date AMZN FB GOOGL HPQ LNKD \\\n", "0 1/4/2016 636.989990 102.220001 759.440002 11.471342 225.550003 \n", "1 1/5/2016 633.789978 102.730003 761.530029 11.580122 229.000000 \n", "2 1/6/2016 632.650024 102.970001 759.330017 11.164780 226.080002 \n", "3 1/7/2016 607.940002 97.919998 741.000000 10.650548 217.000000 \n", "4 1/8/2016 607.049988 97.330002 730.909973 10.462655 215.899994 \n", "5 1/11/2016 617.739990 97.510002 733.070007 10.531878 210.929993 \n", "6 1/12/2016 617.890015 99.370003 745.340027 10.699993 210.429993 \n", "7 1/13/2016 581.809998 95.440002 719.570007 10.462655 203.130005 \n", "8 1/14/2016 593.000000 98.370003 731.390015 10.502211 203.210007 \n", "9 1/15/2016 570.179993 94.970001 710.489990 9.997867 196.320007 \n", "\n", " MSFT YHOO \n", "0 54.409346 31.400000 \n", "1 54.657563 32.200001 \n", "2 53.664692 32.160000 \n", "3 51.798093 30.160000 \n", "4 51.956956 30.629999 \n", "5 51.927167 30.170000 \n", "6 52.403745 30.690001 \n", "7 51.271873 29.440001 \n", "8 52.731395 30.320000 \n", "9 50.626508 29.139999 " ] }, "execution_count": 144, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_clipboard()\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Below is what data looks like in 'wide' format:" ] }, { "cell_type": "code", "execution_count": 146, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateAMZNFBGOOGLHPQLNKDMSFTYHOO
01/4/2016636.989990102.220001759.44000211.471342225.55000354.40934631.400000
11/5/2016633.789978102.730003761.53002911.580122229.00000054.65756332.200001
21/6/2016632.650024102.970001759.33001711.164780226.08000253.66469232.160000
31/7/2016607.94000297.919998741.00000010.650548217.00000051.79809330.160000
41/8/2016607.04998897.330002730.90997310.462655215.89999451.95695630.629999
51/11/2016617.73999097.510002733.07000710.531878210.92999351.92716730.170000
61/12/2016617.89001599.370003745.34002710.699993210.42999352.40374530.690001
71/13/2016581.80999895.440002719.57000710.462655203.13000551.27187329.440001
81/14/2016593.00000098.370003731.39001510.502211203.21000752.73139530.320000
91/15/2016570.17999394.970001710.4899909.997867196.32000750.62650829.139999
\n", "
" ], "text/plain": [ " Date AMZN FB GOOGL HPQ LNKD \\\n", "0 1/4/2016 636.989990 102.220001 759.440002 11.471342 225.550003 \n", "1 1/5/2016 633.789978 102.730003 761.530029 11.580122 229.000000 \n", "2 1/6/2016 632.650024 102.970001 759.330017 11.164780 226.080002 \n", "3 1/7/2016 607.940002 97.919998 741.000000 10.650548 217.000000 \n", "4 1/8/2016 607.049988 97.330002 730.909973 10.462655 215.899994 \n", "5 1/11/2016 617.739990 97.510002 733.070007 10.531878 210.929993 \n", "6 1/12/2016 617.890015 99.370003 745.340027 10.699993 210.429993 \n", "7 1/13/2016 581.809998 95.440002 719.570007 10.462655 203.130005 \n", "8 1/14/2016 593.000000 98.370003 731.390015 10.502211 203.210007 \n", "9 1/15/2016 570.179993 94.970001 710.489990 9.997867 196.320007 \n", "\n", " MSFT YHOO \n", "0 54.409346 31.400000 \n", "1 54.657563 32.200001 \n", "2 53.664692 32.160000 \n", "3 51.798093 30.160000 \n", "4 51.956956 30.629999 \n", "5 51.927167 30.170000 \n", "6 52.403745 30.690001 \n", "7 51.271873 29.440001 \n", "8 52.731395 30.320000 \n", "9 50.626508 29.139999 " ] }, "execution_count": 146, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_clipboard()\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### This is what data looks like in 'long' format:" ] }, { "cell_type": "code", "execution_count": 147, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Datevariablevalue
01/4/2016AMZN636.989990
11/5/2016AMZN633.789978
21/6/2016AMZN632.650024
31/7/2016AMZN607.940002
41/8/2016AMZN607.049988
51/11/2016AMZN617.739990
61/12/2016AMZN617.890015
71/13/2016AMZN581.809998
81/14/2016AMZN593.000000
91/15/2016AMZN570.179993
101/4/2016FB102.220001
111/5/2016FB102.730003
121/6/2016FB102.970001
131/7/2016FB97.919998
141/8/2016FB97.330002
151/11/2016FB97.510002
161/12/2016FB99.370003
171/13/2016FB95.440002
181/14/2016FB98.370003
191/15/2016FB94.970001
201/4/2016GOOGL759.440002
211/5/2016GOOGL761.530029
221/6/2016GOOGL759.330017
231/7/2016GOOGL741.000000
241/8/2016GOOGL730.909973
251/11/2016GOOGL733.070007
261/12/2016GOOGL745.340027
271/13/2016GOOGL719.570007
281/14/2016GOOGL731.390015
291/15/2016GOOGL710.489990
301/4/2016HPQ11.471342
311/5/2016HPQ11.580122
321/6/2016HPQ11.164780
331/7/2016HPQ10.650548
341/8/2016HPQ10.462655
351/11/2016HPQ10.531878
361/12/2016HPQ10.699993
371/13/2016HPQ10.462655
381/14/2016HPQ10.502211
391/15/2016HPQ9.997867
401/4/2016LNKD225.550003
411/5/2016LNKD229.000000
421/6/2016LNKD226.080002
431/7/2016LNKD217.000000
441/8/2016LNKD215.899994
451/11/2016LNKD210.929993
461/12/2016LNKD210.429993
471/13/2016LNKD203.130005
481/14/2016LNKD203.210007
491/15/2016LNKD196.320007
501/4/2016MSFT54.409346
511/5/2016MSFT54.657563
521/6/2016MSFT53.664692
531/7/2016MSFT51.798093
541/8/2016MSFT51.956956
551/11/2016MSFT51.927167
561/12/2016MSFT52.403745
571/13/2016MSFT51.271873
581/14/2016MSFT52.731395
591/15/2016MSFT50.626508
601/4/2016YHOO31.400000
611/5/2016YHOO32.200001
621/6/2016YHOO32.160000
631/7/2016YHOO30.160000
641/8/2016YHOO30.629999
651/11/2016YHOO30.170000
661/12/2016YHOO30.690001
671/13/2016YHOO29.440001
681/14/2016YHOO30.320000
691/15/2016YHOO29.139999
\n", "
" ], "text/plain": [ " Date variable value\n", "0 1/4/2016 AMZN 636.989990\n", "1 1/5/2016 AMZN 633.789978\n", "2 1/6/2016 AMZN 632.650024\n", "3 1/7/2016 AMZN 607.940002\n", "4 1/8/2016 AMZN 607.049988\n", "5 1/11/2016 AMZN 617.739990\n", "6 1/12/2016 AMZN 617.890015\n", "7 1/13/2016 AMZN 581.809998\n", "8 1/14/2016 AMZN 593.000000\n", "9 1/15/2016 AMZN 570.179993\n", "10 1/4/2016 FB 102.220001\n", "11 1/5/2016 FB 102.730003\n", "12 1/6/2016 FB 102.970001\n", "13 1/7/2016 FB 97.919998\n", "14 1/8/2016 FB 97.330002\n", "15 1/11/2016 FB 97.510002\n", "16 1/12/2016 FB 99.370003\n", "17 1/13/2016 FB 95.440002\n", "18 1/14/2016 FB 98.370003\n", "19 1/15/2016 FB 94.970001\n", "20 1/4/2016 GOOGL 759.440002\n", "21 1/5/2016 GOOGL 761.530029\n", "22 1/6/2016 GOOGL 759.330017\n", "23 1/7/2016 GOOGL 741.000000\n", "24 1/8/2016 GOOGL 730.909973\n", "25 1/11/2016 GOOGL 733.070007\n", "26 1/12/2016 GOOGL 745.340027\n", "27 1/13/2016 GOOGL 719.570007\n", "28 1/14/2016 GOOGL 731.390015\n", "29 1/15/2016 GOOGL 710.489990\n", "30 1/4/2016 HPQ 11.471342\n", "31 1/5/2016 HPQ 11.580122\n", "32 1/6/2016 HPQ 11.164780\n", "33 1/7/2016 HPQ 10.650548\n", "34 1/8/2016 HPQ 10.462655\n", "35 1/11/2016 HPQ 10.531878\n", "36 1/12/2016 HPQ 10.699993\n", "37 1/13/2016 HPQ 10.462655\n", "38 1/14/2016 HPQ 10.502211\n", "39 1/15/2016 HPQ 9.997867\n", "40 1/4/2016 LNKD 225.550003\n", "41 1/5/2016 LNKD 229.000000\n", "42 1/6/2016 LNKD 226.080002\n", "43 1/7/2016 LNKD 217.000000\n", "44 1/8/2016 LNKD 215.899994\n", "45 1/11/2016 LNKD 210.929993\n", "46 1/12/2016 LNKD 210.429993\n", "47 1/13/2016 LNKD 203.130005\n", "48 1/14/2016 LNKD 203.210007\n", "49 1/15/2016 LNKD 196.320007\n", "50 1/4/2016 MSFT 54.409346\n", "51 1/5/2016 MSFT 54.657563\n", "52 1/6/2016 MSFT 53.664692\n", "53 1/7/2016 MSFT 51.798093\n", "54 1/8/2016 MSFT 51.956956\n", "55 1/11/2016 MSFT 51.927167\n", "56 1/12/2016 MSFT 52.403745\n", "57 1/13/2016 MSFT 51.271873\n", "58 1/14/2016 MSFT 52.731395\n", "59 1/15/2016 MSFT 50.626508\n", "60 1/4/2016 YHOO 31.400000\n", "61 1/5/2016 YHOO 32.200001\n", "62 1/6/2016 YHOO 32.160000\n", "63 1/7/2016 YHOO 30.160000\n", "64 1/8/2016 YHOO 30.629999\n", "65 1/11/2016 YHOO 30.170000\n", "66 1/12/2016 YHOO 30.690001\n", "67 1/13/2016 YHOO 29.440001\n", "68 1/14/2016 YHOO 30.320000\n", "69 1/15/2016 YHOO 29.139999" ] }, "execution_count": 147, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_long = pd.melt(df, id_vars=['Date']).dropna()\n", "df_long" ] }, { "cell_type": "code", "execution_count": 148, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateStockPrice
01/4/2016AMZN636.989990
11/5/2016AMZN633.789978
21/6/2016AMZN632.650024
31/7/2016AMZN607.940002
41/8/2016AMZN607.049988
51/11/2016AMZN617.739990
61/12/2016AMZN617.890015
71/13/2016AMZN581.809998
81/14/2016AMZN593.000000
91/15/2016AMZN570.179993
101/4/2016FB102.220001
111/5/2016FB102.730003
121/6/2016FB102.970001
131/7/2016FB97.919998
141/8/2016FB97.330002
151/11/2016FB97.510002
161/12/2016FB99.370003
171/13/2016FB95.440002
181/14/2016FB98.370003
191/15/2016FB94.970001
201/4/2016GOOGL759.440002
211/5/2016GOOGL761.530029
221/6/2016GOOGL759.330017
231/7/2016GOOGL741.000000
241/8/2016GOOGL730.909973
251/11/2016GOOGL733.070007
261/12/2016GOOGL745.340027
271/13/2016GOOGL719.570007
281/14/2016GOOGL731.390015
291/15/2016GOOGL710.489990
301/4/2016HPQ11.471342
311/5/2016HPQ11.580122
321/6/2016HPQ11.164780
331/7/2016HPQ10.650548
341/8/2016HPQ10.462655
351/11/2016HPQ10.531878
361/12/2016HPQ10.699993
371/13/2016HPQ10.462655
381/14/2016HPQ10.502211
391/15/2016HPQ9.997867
401/4/2016LNKD225.550003
411/5/2016LNKD229.000000
421/6/2016LNKD226.080002
431/7/2016LNKD217.000000
441/8/2016LNKD215.899994
451/11/2016LNKD210.929993
461/12/2016LNKD210.429993
471/13/2016LNKD203.130005
481/14/2016LNKD203.210007
491/15/2016LNKD196.320007
501/4/2016MSFT54.409346
511/5/2016MSFT54.657563
521/6/2016MSFT53.664692
531/7/2016MSFT51.798093
541/8/2016MSFT51.956956
551/11/2016MSFT51.927167
561/12/2016MSFT52.403745
571/13/2016MSFT51.271873
581/14/2016MSFT52.731395
591/15/2016MSFT50.626508
601/4/2016YHOO31.400000
611/5/2016YHOO32.200001
621/6/2016YHOO32.160000
631/7/2016YHOO30.160000
641/8/2016YHOO30.629999
651/11/2016YHOO30.170000
661/12/2016YHOO30.690001
671/13/2016YHOO29.440001
681/14/2016YHOO30.320000
691/15/2016YHOO29.139999
\n", "
" ], "text/plain": [ " Date Stock Price\n", "0 1/4/2016 AMZN 636.989990\n", "1 1/5/2016 AMZN 633.789978\n", "2 1/6/2016 AMZN 632.650024\n", "3 1/7/2016 AMZN 607.940002\n", "4 1/8/2016 AMZN 607.049988\n", "5 1/11/2016 AMZN 617.739990\n", "6 1/12/2016 AMZN 617.890015\n", "7 1/13/2016 AMZN 581.809998\n", "8 1/14/2016 AMZN 593.000000\n", "9 1/15/2016 AMZN 570.179993\n", "10 1/4/2016 FB 102.220001\n", "11 1/5/2016 FB 102.730003\n", "12 1/6/2016 FB 102.970001\n", "13 1/7/2016 FB 97.919998\n", "14 1/8/2016 FB 97.330002\n", "15 1/11/2016 FB 97.510002\n", "16 1/12/2016 FB 99.370003\n", "17 1/13/2016 FB 95.440002\n", "18 1/14/2016 FB 98.370003\n", "19 1/15/2016 FB 94.970001\n", "20 1/4/2016 GOOGL 759.440002\n", "21 1/5/2016 GOOGL 761.530029\n", "22 1/6/2016 GOOGL 759.330017\n", "23 1/7/2016 GOOGL 741.000000\n", "24 1/8/2016 GOOGL 730.909973\n", "25 1/11/2016 GOOGL 733.070007\n", "26 1/12/2016 GOOGL 745.340027\n", "27 1/13/2016 GOOGL 719.570007\n", "28 1/14/2016 GOOGL 731.390015\n", "29 1/15/2016 GOOGL 710.489990\n", "30 1/4/2016 HPQ 11.471342\n", "31 1/5/2016 HPQ 11.580122\n", "32 1/6/2016 HPQ 11.164780\n", "33 1/7/2016 HPQ 10.650548\n", "34 1/8/2016 HPQ 10.462655\n", "35 1/11/2016 HPQ 10.531878\n", "36 1/12/2016 HPQ 10.699993\n", "37 1/13/2016 HPQ 10.462655\n", "38 1/14/2016 HPQ 10.502211\n", "39 1/15/2016 HPQ 9.997867\n", "40 1/4/2016 LNKD 225.550003\n", "41 1/5/2016 LNKD 229.000000\n", "42 1/6/2016 LNKD 226.080002\n", "43 1/7/2016 LNKD 217.000000\n", "44 1/8/2016 LNKD 215.899994\n", "45 1/11/2016 LNKD 210.929993\n", "46 1/12/2016 LNKD 210.429993\n", "47 1/13/2016 LNKD 203.130005\n", "48 1/14/2016 LNKD 203.210007\n", "49 1/15/2016 LNKD 196.320007\n", "50 1/4/2016 MSFT 54.409346\n", "51 1/5/2016 MSFT 54.657563\n", "52 1/6/2016 MSFT 53.664692\n", "53 1/7/2016 MSFT 51.798093\n", "54 1/8/2016 MSFT 51.956956\n", "55 1/11/2016 MSFT 51.927167\n", "56 1/12/2016 MSFT 52.403745\n", "57 1/13/2016 MSFT 51.271873\n", "58 1/14/2016 MSFT 52.731395\n", "59 1/15/2016 MSFT 50.626508\n", "60 1/4/2016 YHOO 31.400000\n", "61 1/5/2016 YHOO 32.200001\n", "62 1/6/2016 YHOO 32.160000\n", "63 1/7/2016 YHOO 30.160000\n", "64 1/8/2016 YHOO 30.629999\n", "65 1/11/2016 YHOO 30.170000\n", "66 1/12/2016 YHOO 30.690001\n", "67 1/13/2016 YHOO 29.440001\n", "68 1/14/2016 YHOO 30.320000\n", "69 1/15/2016 YHOO 29.139999" ] }, "execution_count": 148, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_long.rename(columns={'variable': 'Stock', 'value':'Price'},inplace=True)\n", "df_long" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How to convert data in long format to wide format using dataframe.pivot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top](#top)]" ] }, { "cell_type": "code", "execution_count": 149, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
StockAMZNFBGOOGLHPQLNKDMSFTYHOO
Date
1/11/2016617.73999097.510002733.07000710.531878210.92999351.92716730.170000
1/12/2016617.89001599.370003745.34002710.699993210.42999352.40374530.690001
1/13/2016581.80999895.440002719.57000710.462655203.13000551.27187329.440001
1/14/2016593.00000098.370003731.39001510.502211203.21000752.73139530.320000
1/15/2016570.17999394.970001710.4899909.997867196.32000750.62650829.139999
1/4/2016636.989990102.220001759.44000211.471342225.55000354.40934631.400000
1/5/2016633.789978102.730003761.53002911.580122229.00000054.65756332.200001
1/6/2016632.650024102.970001759.33001711.164780226.08000253.66469232.160000
1/7/2016607.94000297.919998741.00000010.650548217.00000051.79809330.160000
1/8/2016607.04998897.330002730.90997310.462655215.89999451.95695630.629999
\n", "
" ], "text/plain": [ "Stock AMZN FB GOOGL HPQ LNKD \\\n", "Date \n", "1/11/2016 617.739990 97.510002 733.070007 10.531878 210.929993 \n", "1/12/2016 617.890015 99.370003 745.340027 10.699993 210.429993 \n", "1/13/2016 581.809998 95.440002 719.570007 10.462655 203.130005 \n", "1/14/2016 593.000000 98.370003 731.390015 10.502211 203.210007 \n", "1/15/2016 570.179993 94.970001 710.489990 9.997867 196.320007 \n", "1/4/2016 636.989990 102.220001 759.440002 11.471342 225.550003 \n", "1/5/2016 633.789978 102.730003 761.530029 11.580122 229.000000 \n", "1/6/2016 632.650024 102.970001 759.330017 11.164780 226.080002 \n", "1/7/2016 607.940002 97.919998 741.000000 10.650548 217.000000 \n", "1/8/2016 607.049988 97.330002 730.909973 10.462655 215.899994 \n", "\n", "Stock MSFT YHOO \n", "Date \n", "1/11/2016 51.927167 30.170000 \n", "1/12/2016 52.403745 30.690001 \n", "1/13/2016 51.271873 29.440001 \n", "1/14/2016 52.731395 30.320000 \n", "1/15/2016 50.626508 29.139999 \n", "1/4/2016 54.409346 31.400000 \n", "1/5/2016 54.657563 32.200001 \n", "1/6/2016 53.664692 32.160000 \n", "1/7/2016 51.798093 30.160000 \n", "1/8/2016 51.956956 30.629999 " ] }, "execution_count": 149, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pivoted = df_long.pivot(index='Date', columns='Stock', values='Price')\n", "pivoted" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using category data type to control sort order" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top](#top)]" ] }, { "cell_type": "code", "execution_count": 150, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
medalounces
0bronze4.0
1silver3.0
2silver12.0
3gold6.0
4bronze7.5
5bronze8.0
6gold3.0
7gold5.0
8gold6.0
\n", "
" ], "text/plain": [ " medal ounces\n", "0 bronze 4.0\n", "1 silver 3.0\n", "2 silver 12.0\n", "3 gold 6.0\n", "4 bronze 7.5\n", "5 bronze 8.0\n", "6 gold 3.0\n", "7 gold 5.0\n", "8 gold 6.0" ] }, "execution_count": 150, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "data = pd.DataFrame({'medal': ['bronze', 'silver', 'silver', 'gold','bronze', 'bronze', 'gold', 'gold','gold'],\n", " 'ounces': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})\n", "data" ] }, { "cell_type": "code", "execution_count": 151, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "medal object\n", "ounces float64\n", "dtype: object" ] }, "execution_count": 151, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.dtypes" ] }, { "cell_type": "code", "execution_count": 152, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "medal category\n", "ounces float64\n", "dtype: object" ] }, "execution_count": 152, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data[\"medal\"] = data[\"medal\"].astype(\"category\")\n", "data.dtypes" ] }, { "cell_type": "code", "execution_count": 153, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
medalounces
0bronze4.0
4bronze7.5
5bronze8.0
3gold6.0
6gold3.0
7gold5.0
8gold6.0
1silver3.0
2silver12.0
\n", "
" ], "text/plain": [ " medal ounces\n", "0 bronze 4.0\n", "4 bronze 7.5\n", "5 bronze 8.0\n", "3 gold 6.0\n", "6 gold 3.0\n", "7 gold 5.0\n", "8 gold 6.0\n", "1 silver 3.0\n", "2 silver 12.0" ] }, "execution_count": 153, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.sort_values(by=[\"medal\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Now define the order of the catgorical data using set_categories()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top](#top)] [[back to section](#category)]" ] }, { "cell_type": "code", "execution_count": 154, "metadata": {}, "outputs": [], "source": [ "data[\"medal\"].cat.set_categories([ \"gold\",\"silver\",\"bronze\"], inplace=True)" ] }, { "cell_type": "code", "execution_count": 155, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
medalounces
3gold6.0
6gold3.0
7gold5.0
8gold6.0
1silver3.0
2silver12.0
0bronze4.0
4bronze7.5
5bronze8.0
\n", "
" ], "text/plain": [ " medal ounces\n", "3 gold 6.0\n", "6 gold 3.0\n", "7 gold 5.0\n", "8 gold 6.0\n", "1 silver 3.0\n", "2 silver 12.0\n", "0 bronze 4.0\n", "4 bronze 7.5\n", "5 bronze 8.0" ] }, "execution_count": 155, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.sort_values(by='medal')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating a column of type categorical that is also ordered" ] }, { "cell_type": "code", "execution_count": 156, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IDquality
0100good
1101very good
2102good
3103excellent
\n", "
" ], "text/plain": [ " ID quality\n", "0 100 good\n", "1 101 very good\n", "2 102 good\n", "3 103 excellent" ] }, "execution_count": 156, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame({'ID': [100, 101, 102, 103],\n", " 'quality': ['good', 'very good', 'good', 'excellent']})\n", "df" ] }, { "cell_type": "code", "execution_count": 159, "metadata": {}, "outputs": [], "source": [ "cat_dtype = pd.api.types.CategoricalDtype(categories=['good', 'very good', 'excellent'], ordered=True)" ] }, { "cell_type": "code", "execution_count": 161, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IDquality
0100good
2102good
1101very good
3103excellent
\n", "
" ], "text/plain": [ " ID quality\n", "0 100 good\n", "2 102 good\n", "1 101 very good\n", "3 103 excellent" ] }, "execution_count": 161, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['quality'] = df.quality.astype(cat_dtype)\n", "df.sort_values(by='quality')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Merging 2 data frames using merge()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top](#top)]" ] }, { "cell_type": "code", "execution_count": 162, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
groupounces
0a4.0
1a3.0
2a12.0
3b6.0
4b7.5
5b8.0
6c3.0
7c5.0
8c6.0
\n", "
" ], "text/plain": [ " group ounces\n", "0 a 4.0\n", "1 a 3.0\n", "2 a 12.0\n", "3 b 6.0\n", "4 b 7.5\n", "5 b 8.0\n", "6 c 3.0\n", "7 c 5.0\n", "8 c 6.0" ] }, "execution_count": 162, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "left = pd.DataFrame({'group': ['a', 'a', 'a', 'b','b', 'b', 'c', 'c','c'],\n", " 'ounces': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})\n", "left" ] }, { "cell_type": "code", "execution_count": 163, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
labelvalue
0aalpha
1bbeta
2ccharlie
\n", "
" ], "text/plain": [ " label value\n", "0 a alpha\n", "1 b beta\n", "2 c charlie" ] }, "execution_count": 163, "metadata": {}, "output_type": "execute_result" } ], "source": [ "right = pd.DataFrame({'label': ['a','b','c'],\n", " 'value': ['alpha','beta','charlie']})\n", "right" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### By default, merge() does an inner join, but you can specify 'left', 'right', or 'outer' joins." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These types of joins are familiar if you have SQL background. Here's a good SQL joins [visualizer](http://sql-joins.leopard.in.ua/)." ] }, { "cell_type": "code", "execution_count": 164, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
groupounceslabelvalue
0a4.0aalpha
1a3.0aalpha
2a12.0aalpha
3b6.0bbeta
4b7.5bbeta
5b8.0bbeta
6c3.0ccharlie
7c5.0ccharlie
8c6.0ccharlie
\n", "
" ], "text/plain": [ " group ounces label value\n", "0 a 4.0 a alpha\n", "1 a 3.0 a alpha\n", "2 a 12.0 a alpha\n", "3 b 6.0 b beta\n", "4 b 7.5 b beta\n", "5 b 8.0 b beta\n", "6 c 3.0 c charlie\n", "7 c 5.0 c charlie\n", "8 c 6.0 c charlie" ] }, "execution_count": 164, "metadata": {}, "output_type": "execute_result" } ], "source": [ "inner_joined = pd.merge(left, right, how='inner', left_on='group', right_on='label')\n", "inner_joined" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**NOTE:** To merge 2 Series, you have to use [pd.concat()](http://stackoverflow.com/questions/18062135/combining-two-series-into-a-dataframe-in-pandas) instead" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### For more on merge, join, and concatenate, see the official [docs](http://pandas.pydata.org/pandas-docs/stable/merging.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### YouTube [video](https://www.youtube.com/watch?v=8K8Bs7z6d2M&feature=youtu.be) on joining data sets using pandas" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Finding rows of a data frame containing missing data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top](#top)]" ] }, { "cell_type": "code", "execution_count": 165, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABC
001.02.0
10NaN1.0
211.0NaN
301.02.0
401.02.0
\n", "
" ], "text/plain": [ " A B C\n", "0 0 1.0 2.0\n", "1 0 NaN 1.0\n", "2 1 1.0 NaN\n", "3 0 1.0 2.0\n", "4 0 1.0 2.0" ] }, "execution_count": 165, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "import pandas as pd\n", "df = pd.DataFrame([range(3), [0, np.NaN, 1], [1, 1, np.NaN], range(3), range(3)])\n", "df.columns = ['A','B','C']\n", "df" ] }, { "cell_type": "code", "execution_count": 166, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 False\n", "1 True\n", "2 True\n", "3 False\n", "4 False\n", "dtype: bool" ] }, "execution_count": 166, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.isna().any(axis=1)" ] }, { "cell_type": "code", "execution_count": 167, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABC
10NaN1.0
211.0NaN
\n", "
" ], "text/plain": [ " A B C\n", "1 0 NaN 1.0\n", "2 1 1.0 NaN" ] }, "execution_count": 167, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df.isna().any(axis=1)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### To filter a column for na/null values and also filter another column:" ] }, { "cell_type": "code", "execution_count": 168, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABC
10NaN1.0
\n", "
" ], "text/plain": [ " A B C\n", "1 0 NaN 1.0" ] }, "execution_count": 168, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[(df['C']==1) & (df['B'].isna())]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Converting a data type of a column in a data frame" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top](#top)]" ] }, { "cell_type": "code", "execution_count": 169, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
onetwothree
0a1.24.2
1b700.03
2x50
\n", "
" ], "text/plain": [ " one two three\n", "0 a 1.2 4.2\n", "1 b 70 0.03\n", "2 x 5 0" ] }, "execution_count": 169, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]\n", "df = pd.DataFrame(a, columns=['one', 'two', 'three'])\n", "df" ] }, { "cell_type": "code", "execution_count": 170, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "one object\n", "two object\n", "three object\n", "dtype: object" ] }, "execution_count": 170, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dtypes" ] }, { "cell_type": "code", "execution_count": 171, "metadata": {}, "outputs": [], "source": [ "df[['two', 'three']] = df[['two', 'three']].astype(float)" ] }, { "cell_type": "code", "execution_count": 172, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "one object\n", "two float64\n", "three float64\n", "dtype: object" ] }, "execution_count": 172, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plotting data frames using MATPLOTLIB version 1.5 and up" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top](#top)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Starting with version 1.5, you can now pass a \"data\" parameter inside MATPLOTLIB's plot() function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Prior to version 1.5:**" ] }, { "cell_type": "code", "execution_count": 180, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWoAAAD4CAYAAADFAawfAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAeZklEQVR4nO3deXhV9aH18e8vJAESSJgChCEkEMaQIBBAwBEHVBxA6rXWqpVabG9t9W2vEHBCRQW1VmudcLZqvZYEUBAHigJOKCiczBBCIGFKGDKQkPH83j/I29dalADnZJ9hfZ6Hh0AOJ+swrGdns/eKsdYiIiK+K8TpACIi8uNU1CIiPk5FLSLi41TUIiI+TkUtIuLjQr3xpN26dbPx8fHeeGoRkYC0cePG/dbamGO9zytFHR8fz4YNG7zx1CIiAckYs+OH3qdTHyIiPk5FLSLi41TUIiI+TkUtIuLjVNQiIj6uRUVtjOlkjFlsjMkzxuQaY8Z7O5iIiBzV0svzngDet9b+xBgTDkR4MZOIiHzHcY+ojTHRwFnAiwDW2nprbbmXc4mI+JWviw7y7JptXnnulpz6SADKgJeNMd8aY14wxkR+/0HGmJnGmA3GmA1lZWUeDyoi4osO1zVy97Isrnr2C95cv5Oa+kaPf4yWFHUoMAp4xlo7EqgG0r7/IGvtImttqrU2NSbmmHdBiogElDVbypj857X87csd3DgxnpW3nklEuOdv+G7JM5YAJdba9c0/XswxilpEJFgcqq7n/hU5ZHyzi8TuHVj86wmM7tfZax/vuEVtrd1rjCk2xgy21uYD5wE5XkskIuKjrLWszNrL3cuyKK9p4HeTErllUiJtQ9t49eO29Bj9d8AbzVd8FAI3ei+SiIjvKa2s5a5lWXyQvY/k3tG8NmMcw3pFtcrHblFRW2s3AanejSIi4nustfxjYwnzl+dQ1+gm7eIh3HRGAqFtWu9+Qa/MnIqIBILigzXMycjk04L9jE3owoIrk+kf06HVc6ioRUS+p8ltefXzIh75IJ82IYb5U4fzs7FxhIQYR/KoqEVEvmPrvipmp7v4Zmc55wyO4cFpyfTq1N7RTCpqERGgocnNs59s48nVBUS2bcPjV5/GFaf1whhnjqK/S0UtIkEvs6SC2xdvJm9vFZemxDLv8iS6dWjrdKx/UVGLSNCqbWjiz6u28PzaQmI6tmXRdaO5MKmn07H+g4paRILSl4UHSEt3UXSghmvG9iXt4qFEtw9zOtYxqahFJKhU1TawYGUeb6zfSVyXCN68aRwTErs5HetHqahFJGh8nFfK3CWZ7Kus5aYzEvjDhYO8MqLkab6fUETkFB2srue+d7NZumk3A7t34OnfTGBknPdGlDxNRS0iActay3LXHua9k01lbQO3njeQ/z53gNdHlDxNRS0iAWlvRS13Ls1iVe4+RvSJZuFPxjGkZ+uMKHmailpEAoq1lre+LubBFbk0uN3ccclQZpyRQBuHbv/2BBW1iASMHQeqSUvP5IvCA5zevwsLrkwhvtt/fOVAv6OiFhG/1+S2vPzZdh79MJ+wkBAeujKZq1P7Ojai5GkqahHxa/l7q5iV7mJzcTnnD+3O/KnJ9Ixu53Qsj1JRi4hfqm908/QnBTz1cQEd24Xxl2tGcllKrE+MKHmailpE/M6m4nJmL3aRv6+KK07rxT2XJdElMtzpWF6johYRv3GkvonHPsrnxU+3071jO168IZXzhvZwOpbXqahFxC98vm0/aemZ7DxYw7Xj4ph98RCi2vnmiJKnqahFxKdV1jbw0Ht5/P2rncR3jeCtmadzev+uTsdqVSpqEfFZq3L2ccfSTMqq6rj5rP7cdv4g2of71+3fnqCiFhGfc+BwHfPezeHdzbsZ0rMjz1+fSkqfTk7HcoyKWkR8hrWWdzbvZt472Ryua+QPFwzi12cPIDw0xOlojlJRi4hP2F1+hDuXZrE6r5SRcZ1YOD2FQT06Oh3LJ6ioRcRRbrflza92smBlHk1uy92XDuOGCfF+PaLkaSpqEXHM9v3VpKW7WL/9IBMTu/LQtBTiukY4HcvnqKhFpNU1Nrl58dPtPPbRFsJDQ3h4egpXpfYJyNu/PUFFLSKtKndPJbPTXbhKKrhgWA/mTx1Oj6jAGlHyNBW1iLSKusYmnlpdwNOfbKNTRBhP/WwUlyT31FF0C6ioRcTrNu44xOx0FwWlh7lyVG/umjKMzgE8ouRpLSpqY0wRUAU0AY3W2lRvhhKRwFBT38gjH+TzyudFxEa14+Ubx3Du4O5Ox/I7J3JEfa61dr/XkohIQPl0637SMlyUHDrC9eP7MeuiIXRoq0/iT4Z+10TEoyqONPDgilz+d0MxCd0iefvm8YxN6OJ0LL/W0qK2wIfGGAs8Z61d9P0HGGNmAjMB4uLiPJdQRPzGB9l7uWtpFgeq6/n12QO47fyBtAsLvhElT2tpUZ9hrd1ljOkOfGSMybPWrv3uA5rLexFAamqq9XBOEfFhZVV1zHsnmxWZexgWG8VLvxjD8N7RTscKGC0qamvtrubvS40xS4CxwNof/1UiEuistSz5dhf3Lc+hpq6J2ycPZuZZ/QlrE9wjSp523KI2xkQCIdbaqua3LwTu83oyEfFpu8qPMDcjkzVbyhjdrzMLp6eQ2L2D07ECUkuOqHsAS5ovSg8F3rTWvu/VVCLis9xuy+vrd7BwZR4WmHfZMK4fH0+IRpS85rhFba0tBEa0QhYR8XHbyg6Tlu7i66JDnDmwGw9OS6ZvF40oeZsuzxOR42pscrNoXSGPr9pKu9AQHvlJCj8ZrRGl1qKiFpEflb27gtnpLrJ2VXLx8J7ce0US3TtqRKk1qahF5JhqG5p4cvVWnl1TSOeIcJ65dhQXJ8c6HSsoqahF5D9sKDrIrHQXhWXV/GR0H+6cMpROERpRcoqKWkT+pbru6IjSq18U0Su6Pa/NGMtZg2KcjhX0VNQiAsDaLWXMychkd8URbhgfz+2TBxOp
ESWfoD8FkSBXXlPP/BW5LN5YQv+YSP5x83hS4zWi5EtU1CJBbGXmHu5als2hmnpuOTeRWyYlakTJB6moRYJQaVUt9yzLZmXWXpJ6RfHqjDEk9dKIkq9SUYsEEWstizeWMH9FLkcamph90RB+dWYCoRpR8mkqapEgUXywhrlLMlm3dT9j4juzYHoKA2I0ouQPVNQiAc7ttrz2RREPf5CPAe6/Iolrx/XTiJIfUVGLBLCC0ipmp2eyccchzh4UwwPThtOns0aU/I2KWiQANTS5WbS2kCdWbSWibRse+68RTBvZWyNKfkpFLRJgsnZVMGuxi5w9lUxJiWXeZUnEdGzrdCw5BSpqkQBR29DEE//cyqK1hXSJDOe560YzOamn07HEA1TUIgHgq+0HSUt3Ubi/mqtT+zL3kqFER4Q5HUs8REUt4scO1zWycGUef/tyB306t+f1X47jjIHdnI4lHqaiFvFTH+eXckdGJnsqa5kxMYH/mTyIiHD9kw5E+lMV8TOHquu5f3kOGd/uIrF7Bxb/egKj+3V2OpZ4kYpaxE9Ya3kvcy/3vJNFeU0Dv5+UyG8nJdI2VCNKgU5FLeIH9lXWctfSLD7M2Udy72j+9stxDI2NcjqWtBIVtYgPs9by9oZi5q/Ipb7RzZyLh/DLMzSiFGxU1CI+aueBGuYscfFZwQHGJnRh4fQUErpFOh1LHKCiFvExTW7LK58X8egH+bQJMcyfOpyfjY3TiFIQU1GL+JCt+6qYle7i253lnDs4hgemJdOrU3unY4nDVNQiPqC+0c2za7bx19UFRLZtwxM/PY3LR/TSiJIAKmoRx7lKypm12EXe3iouG9GLeZcNo2sHjSjJ/6eiFnHIkfomHl+1hefXFRLTsS3PX5/KBcN6OB1LfJCKWsQBXxYeIC3dRdGBGq4Z25c5lwwlqp1GlOTYWlzUxpg2wAZgl7X2Uu9FEglcVbUNLFiZxxvrdxLXJYI3bxrHhESNKMmPO5Ej6luBXEC3Q4mchNV5+7hjSRb7Kmu56YwE/njhYNqH6/ZvOb4WFbUxpg8wBXgA+INXE4kEmIPV9dz3bjZLN+1mUI8OPH3tBEbGaURJWq6lR9SPA7OAjj/0AGPMTGAmQFxc3CkHE/F31lrede1h3jvZVNU2cOt5A/ntuYmEh+r2bzkxxy1qY8ylQKm1dqMx5pwfepy1dhGwCCA1NdV6KqCIP9pbUcudS7NYlbuPEX078fD0FAb3/MHjHJEf1ZIj6onA5caYS4B2QJQx5nVr7c+9G03E/1hreevrYh5ckUuD282dU4Zy48QE2uj2bzkFxy1qa+0cYA5A8xH1/6ikRf7TjgPVpKVn8kXhAcb378qC6cn066oRJTl1uo5a5BQ1uS0vf7adRz/MJywkhIeuTOanY/rq9m/xmBMqamvtJ8AnXkki4ofy9x4dUdpcXM75Q7szf2oyPaPbOR1LAoyOqEVOQn2jm6c/KeCpjwuIahfGk9eM5NKUWB1Fi1eoqEVO0KbicmYvdpG/r4qpp/Xi7suS6BIZ7nQsCWAqapEWOlLfxJ8+zOelz7bTI6odL/0ilUlDNKIk3qeiFmmBz7ftJy09k50Ha7h2XBxpFw+ho0aUpJWoqEV+RGVtAw+9l8vfvyomvmsEb808ndP7d3U6lgQZFbXID1iVs487lmZSVlXHzWf157bzB2lESRyhohb5nv2H67j33Rze3bybIT078vz1qaT06eR0LAliKmqRZtZalm3azb3vZlNd18QfLxjEzWcP0IiSOE5FLQLsLj/CnUuzWJ1Xysi4oyNKA3toREl8g4pagprbbXnzq50sWJlHk9ty96XDuGFCvEaUxKeoqCVobd9fTVq6i/XbDzIxsSsPTUshrmuE07FE/oOKWoJOY5ObFz/dzmMfbSE8NISHp6dwVWof3f4tPktFLUElZ3cls9NdZO6q4MJhPbh/6nB6RGlESXybilqCQl1jE39dXcAzn2yjU0QYT/1sFJck99RRtPgFFbUEvI07DjE73UVB6WGuHNWbu6YMo7NGlMSPqKglYNXUN/LIB/m88nkRsVHtePnGMZw7uLvTsUROmIpaAtKnW/eTluGi5NARrh/fj1kXDaFDW/11F/+kv7kSUCpqGnjgvRze3lBC/26RvH3zeMYmdHE6lsgpUVFLwHg/ay93LcviYHU9vzlnALeeN5B2YRpREv+noha/V1ZVx7x3slmRuYdhsVG8/IsxDO8d7XQsEY9RUYvfstaS8c0u7luew5H6Jm6fPJiZZ/UnrI1GlCSwqKjFL+0qP8LcjEzWbCljdL/OLJyeQmL3Dk7HEvEKFbX4Fbfb8vr6HSxcmYcF7r08ietO70eIRpQkgKmoxW9sKztMWrqLr4sOcebAbjw4LZm+XTSiJIFPRS0+r6HJzfPrCnl81Vbah7Xh0atGMH1Ub93+LUFDRS0+LWtXBbPTXWTvruTi4T2594okunfUiJIEFxW1+KTahiaeXL2VZ9cU0jkinGeuHcXFybFOxxJxhIpafM6GooPMSndRWFbNVaP7cMeUoXSK0IiSBC8VtfiM6rqjI0qvflFEr+j2vDZjLGcNinE6lojjVNTiE9ZsKWNuRia7K45ww/h4bp88mEiNKIkAKmpxWHlNPfcvzyX9mxIGxETyj5vHkxqvESWR7zpuURtj2gFrgbbNj19srb3H28Ek8K3M3MNdy7I5VFPPLecmcsukRI0oiRxDS46o64BJ1trDxpgw4FNjzEpr7ZdeziYBqrSylruXZfN+9l6SekXx6owxJPXSiJLIDzluUVtrLXC4+Ydhzd+sN0NJYLLWsnhjCfcvz6G20c3si4bwqzMTCNWIksiPatE5amNMG2AjkAg8Za1df4zHzARmAsTFxXkyowSA4oM1zF2Sybqt+xkTf3REqX+MRpREWqJFRW2tbQJOM8Z0ApYYY4Zba7O+95hFwCKA1NRUHXELAE1uy2tfFPHIB/kY4P4rkrh2nEaURE7ECV31Ya0tN8Z8DFwEZB3v8RLcCkqrmJ2eycYdhzh7UAwPXplM707tnY4l4ndactVHDNDQXNLtgQuAhV5PJn6rocnNc2u28Zd/FhDRtg2P/dcIpo3UiJLIyWrJEXUs8GrzeeoQ4G1r7XLvxhJ/lbWrgtsXu8jdU8mUlFjmXZZETMe2TscS8WstuerDBYxshSzix2obmnh81VaeX1dI18hwnrtuNJOTejodSyQg6M5EOWVfbT9IWrqLwv3VXJ3al7lThhLdPszpWCIBQ0UtJ62qtoGH38/nb1/uoG+X9rxx0zgmJnZzOpZIwFFRy0n5OL+UOzIy2VNZy4yJCfzP5EFEhOuvk4g36F+WnJBD1fXcvzyHjG93MbB7B9J/M4FRcZ2djiUS0FTU0iLWWlZk7uGeZdlUHGng95MS+e2kRNqGakRJxNtU1HJc+ypruWtpFh/m7COlTzSv3zSOobFRTscSCRoqavlB1lre3lDM/BW51De6mXvJEGZM1IiSSGtTUcsx7TxQw5wlLj4rOMC4hC4snJ5CfLdIp2OJBCUVtfybJrfllc+LePSDfNqEGB6
YNpxrxsRpREnEQSpq+Zct+6qYtdjFpuJyJg3pzgPThhMbrRElEaepqIX6RjfPrtnGk6u30qFtKE/89DQuH9FLI0oiPkJFHeQ2F5czO91F3t4qLhvRi3mXDaNrB40oifgSFXWQOlLfxOOrtvD8ukJiOrbl+etTuWBYD6djicgxqKiD0BfbDjAnw0XRgRquGRvHnEuGENVOI0oivkpFHUQqaxtYsDKPN9fvpF/XCN781TgmDNCIkoivU1EHidV5+5ibkUVpVS2/OjOBP1wwmPbhuv1bxB+oqAPcgcN13Lc8h2WbdjO4R0eevW40p/Xt5HQsETkBKuoAZa3lXdce5r2TTVVtA7edP5D/PieR8FDd/i3ib1TUAWhvRS13Ls1kVW4pI/p24uHpKQzu2dHpWCJyklTUAcRay1tfF/Pgilwa3G7unDKUGycm0Ea3f4v4NRV1gCjaX82cjEy+KDzA+P5dWTA9mX5dNaIkEghU1H6uyW156dPt/OmjfMJCQlhwZTJXj+mr279FAoiK2o/l761i1uLNbC6p4Pyh3Zk/NZme0e2cjiUiHqai9kP1jW6e+riApz8pIKpdGE9eM5JLU2J1FC0SoFTUfmZTcTmzFm9my77DTD2tF3dflkSXyHCnY4mIF6mo/cSR+ib+9GE+L322nR5R7XjpF6lMGqIRJZFgoKL2A58X7CctI5OdB2v4+elxzL5oCB01oiQSNFTUPqziSAMPvZfLW18XE981grdmns7p/bs6HUtEWpmK2kd9lLOPO5dmUlZVx81n9+f/nD+IdmEaURIJRipqH7P/cB3z3slmuWsPQ3p25PnrU0np08npWCLioOMWtTGmL/Aa0AOwwCJr7RPeDhZsrLUs27Sbe9/NprquiT9eMIibzx6gESURadERdSPwR2vtN8aYjsBGY8xH1tocL2cLGrvLj3Dn0ixW55UyMu7oiNLAHhpREpGjjlvU1to9wJ7mt6uMMblAb0BFfYrcbsubX+1kwco8mtyWuy8dxg0T4jWiJCL/5oTOURtj4oGRwPpjvG8mMBMgLi7OE9kC2vb91cxOd/HV9oOckdiNh65Mpm+XCKdjiYgPanFRG2M6AOnAbdbayu+/31q7CFgEkJqaaj2WMMA0Nrl54dPt/PmjLYSHhvDw9BSuSu2j279F5Ae1qKiNMWEcLek3rLUZ3o0UuHJ2VzI73UXmrgouHNaD+6cOp0eURpRE5Me15KoPA7wI5FprH/N+pMBT19jEX1cX8Mwn2+gUEcZTPxvFJck9dRQtIi3SkiPqicB1QKYxZlPzz8211r7ntVQBZOOOQ8xOd1FQepgrR/XmrinD6KwRJRE5AS256uNTQId+J6i6rpFHP8znlc+L6BXdnlduHMM5g7s7HUtE/JDuTPSCdVvLmJORScmhI1w/vh+zLhpCh7b6rRaRk6P28KCKmgYeeC+HtzeU0L9bJG/fPJ6xCV2cjiUifk5F7SHvZ+3lrmVZHKyu5zfnDODW8wZqRElEPEJFfYrKqo6OKK3I3MOw2Che/sUYhveOdjqWiAQQFfVJstaS8c0u7luew5GGJm6fPJiZZ/UnrI1GlETEs1TUJ6HkUA1zl2SxdksZo/t1ZuH0FBK7d3A6logEKBX1CXC7La+v38HClXlY4N7Lk7ju9H6EaERJRLxIRd1C28oOk5bu4uuiQ5w5sBsPTtOIkoi0DhX1cTQ0uXl+XSGPr9pK+7A2PHrVCKaP6q3bv0Wk1aiof0TWrgpmp7vI3l3JJck9mXd5Et07akRJRFqXivoYahua+Ms/t/Lc2kI6R4Tz7M9HcdHwWKdjiUiQUlF/z4aig8xKd1FYVs1Vo/tw55RhREeEOR1LRIKYirrZ4bpGHnk/j9e+3EGv6Pa8NmMsZw2KcTqWiIiKGmDNljLmZmSyu+IIN4yP5/bJg4nUiJKI+IigbqPymnruW55Dxje7GBATyT9uHk9qvEaURMS3BG1Rv5e5h7uXZVFe08At5yZyy6REjSiJiE8KuqIurazl7mXZvJ+9l+G9o3h1xliSemlESUR8V9AUtbWWf2wsYf7yHGob3cy+aAi/OjOBUI0oiYiPC4qiLj5Yw9wlmazbup+x8V1YMD2Z/jEaURIR/xDQRd3ktrz2RREPv59PiIH7r0ji2nEaURIR/xKwRV1QWsWsxS6+2VnO2YNiePDKZHp3au90LBGRExZwRd3Q5Oa5Ndv4yz8LiGjbhj9fPYKpp2lESUT8V0AVdWZJBbcv3kze3iqmpMRy7+VJdOvQ1ulYIiKnJCCKurahiT+v2sIL67bTNTKc564bzeSknk7HEhHxCL8v6vWFB0jLyGT7/mquTu3L3ClDiW6vESURCRx+W9RVtQ0sfD+P17/cSd8u7XnjpnFMTOzmdCwREY/zy6L+OK+UO5Zksqeyll+ekcAfLxxERLhfvhQRkePyq3Y7WF3P/ctzWPLtLgZ270D6byYwKq6z07FERLzKL4raWsty1x7mvZNNxZEGfn/eQH577gDahmpESUQCn88X9b7KWu5YksWq3H2k9Inm9ZvGMTQ2yulYIiKtxmeL2lrL/35dzAPv5VLf6GbuJUOYMVEjSiISfI5b1MaYl4BLgVJr7XDvR4KdB2pIy3Dx+bYDjEvowsLpKcR3i2yNDy0i4nNackT9CvBX4DXvRjk6ovTyZ9t59MN8QkNCeGDacK4ZE6cRJREJasctamvtWmNMvLeDVNQ0cMPLX7GpuJxJQ7rzwLThxEZrRElExGPnqI0xM4GZAHFxcSf866Pah9KvawQ3Tozn8hG9NKIkItLMWGuP/6CjR9TLW3qOOjU11W7YsOEUo4mIBA9jzEZrbeqx3qdLKEREfJyKWkTExx23qI0xfwe+AAYbY0qMMb/0fiwREfl/WnLVxzWtEURERI5Npz5ERHycilpExMepqEVEfJyKWkTEx7XohpcTflJjyoAdJ/nLuwH7PRjHH+g1B75ge72g13yi+llrY471Dq8U9akwxmz4obtzApVec+ALttcLes2epFMfIiI+TkUtIuLjfLGoFzkdwAF6zYEv2F4v6DV7jM+doxYRkX/ni0fUIiLyHSpqEREf5zNFbYx5yRhTaozJcjpLazDG9DXGfGyMyTHGZBtjbnU6k7cZY9oZY74yxmxufs33Op2ptRhj2hhjvjXGLHc6S2swxhQZYzKNMZuMMUHxVUSMMZ2MMYuNMXnGmFxjzHiPPbevnKM2xpwFHAZea62vdu4kY0wsEGut/cYY0xHYCEy11uY4HM1rzNGvrxZprT1sjAkDPgVutdZ+6XA0rzPG/AFIBaKstZc6ncfbjDFFQKq1NmhueDHGvAqss9a+YIwJByKsteWeeG6fOaK21q4FDjqdo7VYa/dYa79pfrsKyAV6O5vKu+xRh5t/GNb8zTeOFLzIGNMHmAK84HQW8Q5jTDRwFvAigLW23lMlDT5U1MGs+WtSjgTWOxzF65pPAWwCSoGPrLUB/5qBx4FZgNvhHK3JAh8aYzY2f+HrQJcAlAEvN5/iesEYE+mpJ1dRO8wY0wFIB26z1lY6ncfbrLVN1trTgD7AWG
NMQJ/mMsZcCpRaazc6naWVnWGtHQVcDPy2+dRmIAsFRgHPWGtHAtVAmqeeXEXtoObztOnAG9baDKfztKbmTws/Bi5yOIq3TQQubz5n+xYwyRjzurORvM9au6v5+1JgCTDW2UReVwKUfOczxMUcLW6PUFE7pPk/1l4Ecq21jzmdpzUYY2KMMZ2a324PXADkORrKy6y1c6y1fay18cBPgdXW2p87HMurjDGRzf9BTvOn/xcCAX01l7V2L1BsjBnc/FPnAR67MOC4XzOxtTR/Ed1zgG7GmBLgHmvti86m8qqJwHVAZvM5W4C51tr3nIvkdbHAq8aYNhw9SHjbWhsUl6sFmR7AkqPHIoQCb1pr33c2Uqv4HfBG8xUfhcCNnnpin7k8T0REjk2nPkREfJyKWkTEx6moRUR8nIpaRMTHqahFRHycilpExMepqEVEfNz/BcHtvKh0gTY8AAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "\n", "df = pd.DataFrame({\"var1\":[1,2,3,4,5,6], \"var2\":[1,2,3,4,5,6]})\n", "plt.plot(df[\"var1\"], df[\"var2\"])\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**With version 1.5+**" ] }, { "cell_type": "code", "execution_count": 174, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWoAAAD4CAYAAADFAawfAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAeZklEQVR4nO3deXhV9aH18e8vJAESSJgChCEkEMaQIBBAwBEHVBxA6rXWqpVabG9t9W2vEHBCRQW1VmudcLZqvZYEUBAHigJOKCiczBBCIGFKGDKQkPH83j/I29dalADnZJ9hfZ6Hh0AOJ+swrGdns/eKsdYiIiK+K8TpACIi8uNU1CIiPk5FLSLi41TUIiI+TkUtIuLjQr3xpN26dbPx8fHeeGoRkYC0cePG/dbamGO9zytFHR8fz4YNG7zx1CIiAckYs+OH3qdTHyIiPk5FLSLi41TUIiI+TkUtIuLjVNQiIj6uRUVtjOlkjFlsjMkzxuQaY8Z7O5iIiBzV0svzngDet9b+xBgTDkR4MZOIiHzHcY+ojTHRwFnAiwDW2nprbbmXc4mI+JWviw7y7JptXnnulpz6SADKgJeNMd8aY14wxkR+/0HGmJnGmA3GmA1lZWUeDyoi4osO1zVy97Isrnr2C95cv5Oa+kaPf4yWFHUoMAp4xlo7EqgG0r7/IGvtImttqrU2NSbmmHdBiogElDVbypj857X87csd3DgxnpW3nklEuOdv+G7JM5YAJdba9c0/XswxilpEJFgcqq7n/hU5ZHyzi8TuHVj86wmM7tfZax/vuEVtrd1rjCk2xgy21uYD5wE5XkskIuKjrLWszNrL3cuyKK9p4HeTErllUiJtQ9t49eO29Bj9d8AbzVd8FAI3ei+SiIjvKa2s5a5lWXyQvY/k3tG8NmMcw3pFtcrHblFRW2s3AanejSIi4nustfxjYwnzl+dQ1+gm7eIh3HRGAqFtWu9+Qa/MnIqIBILigzXMycjk04L9jE3owoIrk+kf06HVc6ioRUS+p8ltefXzIh75IJ82IYb5U4fzs7FxhIQYR/KoqEVEvmPrvipmp7v4Zmc55wyO4cFpyfTq1N7RTCpqERGgocnNs59s48nVBUS2bcPjV5/GFaf1whhnjqK/S0UtIkEvs6SC2xdvJm9vFZemxDLv8iS6dWjrdKx/UVGLSNCqbWjiz6u28PzaQmI6tmXRdaO5MKmn07H+g4paRILSl4UHSEt3UXSghmvG9iXt4qFEtw9zOtYxqahFJKhU1TawYGUeb6zfSVyXCN68aRwTErs5HetHqahFJGh8nFfK3CWZ7Kus5aYzEvjDhYO8MqLkab6fUETkFB2srue+d7NZumk3A7t34OnfTGBknPdGlDxNRS0iActay3LXHua9k01lbQO3njeQ/z53gNdHlDxNRS0iAWlvRS13Ls1iVe4+RvSJZuFPxjGkZ+uMKHmailpEAoq1lre+LubBFbk0uN3ccclQZpyRQBuHbv/2BBW1iASMHQeqSUvP5IvCA5zevwsLrkwhvtt/fOVAv6OiFhG/1+S2vPzZdh79MJ+wkBAeujKZq1P7Ojai5GkqahHxa/l7q5iV7mJzcTnnD+3O/KnJ9Ixu53Qsj1JRi4hfqm908/QnBTz1cQEd24Xxl2tGcllKrE+MKHmailpE/M6m4nJmL3aRv6+KK07rxT2XJdElMtzpWF6johYRv3GkvonHPsrnxU+3071jO168IZXzhvZwOpbXqahFxC98vm0/aemZ7DxYw7Xj4ph98RCi2vnmiJKnqahFxKdV1jbw0Ht5/P2rncR3jeCtmadzev+uTsdqVSpqEfFZq3L2ccfSTMqq6rj5rP7cdv4g2of71+3fnqCiFhGfc+BwHfPezeHdzbsZ0rMjz1+fSkqfTk7HcoyKWkR8hrWWdzbvZt472Ryua+QPFwzi12cPIDw0xOlojlJRi4hP2F1+hDuXZrE6r5SRcZ1YOD2FQT06Oh3LJ6ioRcRRbrflza92smBlHk1uy92XDuOGCfF+PaLkaSpqEXHM9v3VpKW7WL/9IBMTu/LQtBTiukY4HcvnqKhFpNU1Nrl58dPtPPbRFsJDQ3h4egpXpfYJyNu/PUFFLSKtKndPJbPTXbhKKrhgWA/mTx1Oj6jAGlHyNBW1iLSKusYmnlpdwNOfbKNTRBhP/WwUlyT31FF0C6ioRcTrNu44xOx0FwWlh7lyVG/umjKMzgE8ouRpLSpqY0wRUAU0AY3W2lRvhhKRwFBT38gjH+TzyudFxEa14+Ubx3Du4O5Ox/I7J3JEfa61dr/XkohIQPl0637SMlyUHDrC9eP7MeuiIXRoq0/iT4Z+10TEoyqONPDgilz+d0MxCd0iefvm8YxN6OJ0LL/W0qK2wIfGGAs8Z61d9P0HGGNmAjMB4uLiPJdQRPzGB9l7uWtpFgeq6/n12QO47fyBtAsLvhElT2tpUZ9hrd1ljOkOfGSMybPWrv3uA5rLexFAamqq9XBOEfFhZVV1zHsnmxWZexgWG8VLvxjD8N7RTscKGC0qamvtrubvS40xS4CxwNof/1UiEuistSz5dhf3Lc+hpq6J2ycPZuZZ/QlrE9wjSp523KI2xkQCIdbaqua3LwTu83oyEfFpu8qPMDcjkzVbyhjdrzMLp6eQ2L2D07ECUkuOqHsAS5ovSg8F3rTWvu/VVCLis9xuy+vrd7BwZR4WmHfZMK4fH0+IRpS85rhFba0tBEa0QhYR8XHbyg6Tlu7i66JDnDmwGw9OS6ZvF40oeZsuzxOR42pscrNoXSGPr9pKu9AQHvlJCj8ZrRGl1qKiFpEflb27gtnpLrJ2VXLx8J7ce0US3TtqRKk1qahF5JhqG5p4cvVWnl1TSOeIcJ65dhQXJ8c6HSsoqahF5D9sKDrIrHQXhWXV/GR0H+6cMpROERpRcoqKWkT+pbru6IjSq18U0Su6Pa/NGMtZg2KcjhX0VNQiAsDaLWXMychkd8URbhgfz+2TBxOpESWfoD8FkSBXXlPP/BW5LN5YQv+YSP5x83hS4zWi5EtU1CJBbGXmHu5als2hmnpuOTeRWyYlakTJB6moR
YJQaVUt9yzLZmXWXpJ6RfHqjDEk9dKIkq9SUYsEEWstizeWMH9FLkcamph90RB+dWYCoRpR8mkqapEgUXywhrlLMlm3dT9j4juzYHoKA2I0ouQPVNQiAc7ttrz2RREPf5CPAe6/Iolrx/XTiJIfUVGLBLCC0ipmp2eyccchzh4UwwPThtOns0aU/I2KWiQANTS5WbS2kCdWbSWibRse+68RTBvZWyNKfkpFLRJgsnZVMGuxi5w9lUxJiWXeZUnEdGzrdCw5BSpqkQBR29DEE//cyqK1hXSJDOe560YzOamn07HEA1TUIgHgq+0HSUt3Ubi/mqtT+zL3kqFER4Q5HUs8REUt4scO1zWycGUef/tyB306t+f1X47jjIHdnI4lHqaiFvFTH+eXckdGJnsqa5kxMYH/mTyIiHD9kw5E+lMV8TOHquu5f3kOGd/uIrF7Bxb/egKj+3V2OpZ4kYpaxE9Ya3kvcy/3vJNFeU0Dv5+UyG8nJdI2VCNKgU5FLeIH9lXWctfSLD7M2Udy72j+9stxDI2NcjqWtBIVtYgPs9by9oZi5q/Ipb7RzZyLh/DLMzSiFGxU1CI+aueBGuYscfFZwQHGJnRh4fQUErpFOh1LHKCiFvExTW7LK58X8egH+bQJMcyfOpyfjY3TiFIQU1GL+JCt+6qYle7i253lnDs4hgemJdOrU3unY4nDVNQiPqC+0c2za7bx19UFRLZtwxM/PY3LR/TSiJIAKmoRx7lKypm12EXe3iouG9GLeZcNo2sHjSjJ/6eiFnHIkfomHl+1hefXFRLTsS3PX5/KBcN6OB1LfJCKWsQBXxYeIC3dRdGBGq4Z25c5lwwlqp1GlOTYWlzUxpg2wAZgl7X2Uu9FEglcVbUNLFiZxxvrdxLXJYI3bxrHhESNKMmPO5Ej6luBXEC3Q4mchNV5+7hjSRb7Kmu56YwE/njhYNqH6/ZvOb4WFbUxpg8wBXgA+INXE4kEmIPV9dz3bjZLN+1mUI8OPH3tBEbGaURJWq6lR9SPA7OAjj/0AGPMTGAmQFxc3CkHE/F31lrede1h3jvZVNU2cOt5A/ntuYmEh+r2bzkxxy1qY8ylQKm1dqMx5pwfepy1dhGwCCA1NdV6KqCIP9pbUcudS7NYlbuPEX078fD0FAb3/MHjHJEf1ZIj6onA5caYS4B2QJQx5nVr7c+9G03E/1hreevrYh5ckUuD282dU4Zy48QE2uj2bzkFxy1qa+0cYA5A8xH1/6ikRf7TjgPVpKVn8kXhAcb378qC6cn066oRJTl1uo5a5BQ1uS0vf7adRz/MJywkhIeuTOanY/rq9m/xmBMqamvtJ8AnXkki4ofy9x4dUdpcXM75Q7szf2oyPaPbOR1LAoyOqEVOQn2jm6c/KeCpjwuIahfGk9eM5NKUWB1Fi1eoqEVO0KbicmYvdpG/r4qpp/Xi7suS6BIZ7nQsCWAqapEWOlLfxJ8+zOelz7bTI6odL/0ilUlDNKIk3qeiFmmBz7ftJy09k50Ha7h2XBxpFw+ho0aUpJWoqEV+RGVtAw+9l8vfvyomvmsEb808ndP7d3U6lgQZFbXID1iVs487lmZSVlXHzWf157bzB2lESRyhohb5nv2H67j33Rze3bybIT078vz1qaT06eR0LAliKmqRZtZalm3azb3vZlNd18QfLxjEzWcP0IiSOE5FLQLsLj/CnUuzWJ1Xysi4oyNKA3toREl8g4pagprbbXnzq50sWJlHk9ty96XDuGFCvEaUxKeoqCVobd9fTVq6i/XbDzIxsSsPTUshrmuE07FE/oOKWoJOY5ObFz/dzmMfbSE8NISHp6dwVWof3f4tPktFLUElZ3cls9NdZO6q4MJhPbh/6nB6RGlESXybilqCQl1jE39dXcAzn2yjU0QYT/1sFJck99RRtPgFFbUEvI07DjE73UVB6WGuHNWbu6YMo7NGlMSPqKglYNXUN/LIB/m88nkRsVHtePnGMZw7uLvTsUROmIpaAtKnW/eTluGi5NARrh/fj1kXDaFDW/11F/+kv7kSUCpqGnjgvRze3lBC/26RvH3zeMYmdHE6lsgpUVFLwHg/ay93LcviYHU9vzlnALeeN5B2YRpREv+noha/V1ZVx7x3slmRuYdhsVG8/IsxDO8d7XQsEY9RUYvfstaS8c0u7luew5H6Jm6fPJiZZ/UnrI1GlCSwqKjFL+0qP8LcjEzWbCljdL/OLJyeQmL3Dk7HEvEKFbX4Fbfb8vr6HSxcmYcF7r08ietO70eIRpQkgKmoxW9sKztMWrqLr4sOcebAbjw4LZm+XTSiJIFPRS0+r6HJzfPrCnl81Vbah7Xh0atGMH1Ub93+LUFDRS0+LWtXBbPTXWTvruTi4T2594okunfUiJIEFxW1+KTahiaeXL2VZ9cU0jkinGeuHcXFybFOxxJxhIpafM6GooPMSndRWFbNVaP7cMeUoXSK0IiSBC8VtfiM6rqjI0qvflFEr+j2vDZjLGcNinE6lojjVNTiE9ZsKWNuRia7K45ww/h4bp88mEiNKIkAKmpxWHlNPfcvzyX9mxIGxETyj5vHkxqvESWR7zpuURtj2gFrgbbNj19srb3H28Ek8K3M3MNdy7I5VFPPLecmcsukRI0oiRxDS46o64BJ1trDxpgw4FNjzEpr7ZdeziYBqrSylruXZfN+9l6SekXx6owxJPXSiJLIDzluUVtrLXC4+Ydhzd+sN0NJYLLWsnhjCfcvz6G20c3si4bwqzMTCNWIksiPatE5amNMG2AjkAg8Za1df4zHzARmAsTFxXkyowSA4oM1zF2Sybqt+xkTf3REqX+MRpREWqJFRW2tbQJOM8Z0ApYYY4Zba7O+95hFwCKA1NRUHXELAE1uy2tfFPHIB/kY4P4rkrh2nEaURE7ECV31Ya0tN8Z8DFwEZB3v8RLcCkqrmJ2eycYdhzh7UAwPXplM707tnY4l4ndactVHDNDQXNLtgQuAhV5PJn6rocnNc2u28Zd/FhDRtg2P/dcIpo3UiJLIyWrJEXUs8GrzeeoQ4G1r7XLvxhJ/lbWrgtsXu8jdU8mUlFjmXZZETMe2TscS8WstuerDBYxshSzix2obmnh81VaeX1dI18hwnrtuNJOTejodSyQg6M5EOWVfbT9IWrqLwv3VXJ3al7lThhLdPszpWCIBQ0UtJ62qtoGH38/nb1/uoG+X9rxx0zgmJnZzOpZIwFFRy0n5OL+UOzIy2VNZy4yJCfzP5EFEhOuvk4g36F+WnJBD1fXcvzyHjG93MbB7B9J/M4FRcZ2djiUS0FTU0iLWWlZk7uGeZdlUHGng95MS+e2kRNqGakRJxNtU1HJc+ypruWtpFh/m7COlTzSv3zSOobFRTscSCRoqavlB1lre3lDM/BW51De6mXvJEGZM1IiSSGtTUcsx7TxQw5wlLj4rOMC4hC4snJ5CfLdIp2OJBCUVtfybJrfllc+LePSDfNqEGB6YNpxrxsRpREnEQSpq+Zct+6qYtdjFpuJyJg3pzgPThhMbrRElEaepqIX6RjfPrtnGk6u30qFtKE/89DQu
H9FLI0oiPkJFHeQ2F5czO91F3t4qLhvRi3mXDaNrB40oifgSFXWQOlLfxOOrtvD8ukJiOrbl+etTuWBYD6djicgxqKiD0BfbDjAnw0XRgRquGRvHnEuGENVOI0oivkpFHUQqaxtYsDKPN9fvpF/XCN781TgmDNCIkoivU1EHidV5+5ibkUVpVS2/OjOBP1wwmPbhuv1bxB+oqAPcgcN13Lc8h2WbdjO4R0eevW40p/Xt5HQsETkBKuoAZa3lXdce5r2TTVVtA7edP5D/PieR8FDd/i3ib1TUAWhvRS13Ls1kVW4pI/p24uHpKQzu2dHpWCJyklTUAcRay1tfF/Pgilwa3G7unDKUGycm0Ea3f4v4NRV1gCjaX82cjEy+KDzA+P5dWTA9mX5dNaIkEghU1H6uyW156dPt/OmjfMJCQlhwZTJXj+mr279FAoiK2o/l761i1uLNbC6p4Pyh3Zk/NZme0e2cjiUiHqai9kP1jW6e+riApz8pIKpdGE9eM5JLU2J1FC0SoFTUfmZTcTmzFm9my77DTD2tF3dflkSXyHCnY4mIF6mo/cSR+ib+9GE+L322nR5R7XjpF6lMGqIRJZFgoKL2A58X7CctI5OdB2v4+elxzL5oCB01oiQSNFTUPqziSAMPvZfLW18XE981grdmns7p/bs6HUtEWpmK2kd9lLOPO5dmUlZVx81n9+f/nD+IdmEaURIJRipqH7P/cB3z3slmuWsPQ3p25PnrU0np08npWCLioOMWtTGmL/Aa0AOwwCJr7RPeDhZsrLUs27Sbe9/NprquiT9eMIibzx6gESURadERdSPwR2vtN8aYjsBGY8xH1tocL2cLGrvLj3Dn0ixW55UyMu7oiNLAHhpREpGjjlvU1to9wJ7mt6uMMblAb0BFfYrcbsubX+1kwco8mtyWuy8dxg0T4jWiJCL/5oTOURtj4oGRwPpjvG8mMBMgLi7OE9kC2vb91cxOd/HV9oOckdiNh65Mpm+XCKdjiYgPanFRG2M6AOnAbdbayu+/31q7CFgEkJqaaj2WMMA0Nrl54dPt/PmjLYSHhvDw9BSuSu2j279F5Ae1qKiNMWEcLek3rLUZ3o0UuHJ2VzI73UXmrgouHNaD+6cOp0eURpRE5Me15KoPA7wI5FprH/N+pMBT19jEX1cX8Mwn2+gUEcZTPxvFJck9dRQtIi3SkiPqicB1QKYxZlPzz8211r7ntVQBZOOOQ8xOd1FQepgrR/XmrinD6KwRJRE5AS256uNTQId+J6i6rpFHP8znlc+L6BXdnlduHMM5g7s7HUtE/JDuTPSCdVvLmJORScmhI1w/vh+zLhpCh7b6rRaRk6P28KCKmgYeeC+HtzeU0L9bJG/fPJ6xCV2cjiUifk5F7SHvZ+3lrmVZHKyu5zfnDODW8wZqRElEPEJFfYrKqo6OKK3I3MOw2Che/sUYhveOdjqWiAQQFfVJstaS8c0u7luew5GGJm6fPJiZZ/UnrI1GlETEs1TUJ6HkUA1zl2SxdksZo/t1ZuH0FBK7d3A6logEKBX1CXC7La+v38HClXlY4N7Lk7ju9H6EaERJRLxIRd1C28oOk5bu4uuiQ5w5sBsPTtOIkoi0DhX1cTQ0uXl+XSGPr9pK+7A2PHrVCKaP6q3bv0Wk1aiof0TWrgpmp7vI3l3JJck9mXd5Et07akRJRFqXivoYahua+Ms/t/Lc2kI6R4Tz7M9HcdHwWKdjiUiQUlF/z4aig8xKd1FYVs1Vo/tw55RhREeEOR1LRIKYirrZ4bpGHnk/j9e+3EGv6Pa8NmMsZw2KcTqWiIiKGmDNljLmZmSyu+IIN4yP5/bJg4nUiJKI+IigbqPymnruW55Dxje7GBATyT9uHk9qvEaURMS3BG1Rv5e5h7uXZVFe08At5yZyy6REjSiJiE8KuqIurazl7mXZvJ+9l+G9o3h1xliSemlESUR8V9AUtbWWf2wsYf7yHGob3cy+aAi/OjOBUI0oiYiPC4qiLj5Yw9wlmazbup+x8V1YMD2Z/jEaURIR/xDQRd3ktrz2RREPv59PiIH7r0ji2nEaURIR/xKwRV1QWsWsxS6+2VnO2YNiePDKZHp3au90LBGRExZwRd3Q5Oa5Ndv4yz8LiGjbhj9fPYKpp2lESUT8V0AVdWZJBbcv3kze3iqmpMRy7+VJdOvQ1ulYIiKnJCCKurahiT+v2sIL67bTNTKc564bzeSknk7HEhHxCL8v6vWFB0jLyGT7/mquTu3L3ClDiW6vESURCRx+W9RVtQ0sfD+P17/cSd8u7XnjpnFMTOzmdCwREY/zy6L+OK+UO5Zksqeyll+ekcAfLxxERLhfvhQRkePyq3Y7WF3P/ctzWPLtLgZ270D6byYwKq6z07FERLzKL4raWsty1x7mvZNNxZEGfn/eQH577gDahmpESUQCn88X9b7KWu5YksWq3H2k9Inm9ZvGMTQ2yulYIiKtxmeL2lrL/35dzAPv5VLf6GbuJUOYMVEjSiISfI5b1MaYl4BLgVJr7XDvR4KdB2pIy3Dx+bYDjEvowsLpKcR3i2yNDy0i4nNackT9CvBX4DXvRjk6ovTyZ9t59MN8QkNCeGDacK4ZE6cRJREJasctamvtWmNMvLeDVNQ0cMPLX7GpuJxJQ7rzwLThxEZrRElExGPnqI0xM4GZAHFxcSf866Pah9KvawQ3Tozn8hG9NKIkItLMWGuP/6CjR9TLW3qOOjU11W7YsOEUo4mIBA9jzEZrbeqx3qdLKEREfJyKWkTExx23qI0xfwe+AAYbY0qMMb/0fiwREfl/WnLVxzWtEURERI5Npz5ERHycilpExMepqEVEfJyKWkTEx7XohpcTflJjyoAdJ/nLuwH7PRjHH+g1B75ge72g13yi+llrY471Dq8U9akwxmz4obtzApVec+ALttcLes2epFMfIiI+TkUtIuLjfLGoFzkdwAF6zYEv2F4v6DV7jM+doxYRkX/ni0fUIiLyHSpqEREf5zNFbYx5yRhTaozJcjpLazDG9DXGfGyMyTHGZBtjbnU6k7cZY9oZY74yxmxufs33Op2ptRhj2hhjvjXGLHc6S2swxhQZYzKNMZuMMUHxVUSMMZ2MMYuNMXnGmFxjzHiPPbevnKM2xpwFHAZea62vdu4kY0wsEGut/cYY0xHYCEy11uY4HM1rzNGvrxZprT1sjAkDPgVutdZ+6XA0rzPG/AFIBaKstZc6ncfbjDFFQKq1NmhueDHGvAqss9a+YIwJByKsteWeeG6fOaK21q4FDjqdo7VYa/dYa79pfrsKyAV6O5vKu+xRh5t/GNb8zTeOFLzIGNMHmAK84HQW8Q5jTDRwFvAigLW23lMlDT5U1MGs+WtSjgTWOxzF65pPAWwCSoGPrLUB/5qBx4FZgNvhHK3JAh8aYzY2f+HrQJcAlAEvN5/iesEYE+mpJ1dRO8wY0wFIB26z1lY6ncfbrLVN1trTgD7AWGNMQJ/mMsZcCpRaazc6naWVnWGtHQVcDPy2+dRmIAsFRgHPWGtHAtVAmqeeXEXtoObztOnAG9baDKfztKb
mTws/Bi5yOIq3TQQubz5n+xYwyRjzurORvM9au6v5+1JgCTDW2UReVwKUfOczxMUcLW6PUFE7pPk/1l4Ecq21jzmdpzUYY2KMMZ2a324PXADkORrKy6y1c6y1fay18cBPgdXW2p87HMurjDGRzf9BTvOn/xcCAX01l7V2L1BsjBnc/FPnAR67MOC4XzOxtTR/Ed1zgG7GmBLgHmvti86m8qqJwHVAZvM5W4C51tr3nIvkdbHAq8aYNhw9SHjbWhsUl6sFmR7AkqPHIoQCb1pr33c2Uqv4HfBG8xUfhcCNnnpin7k8T0REjk2nPkREfJyKWkTEx6moRUR8nIpaRMTHqahFRHycilpExMepqEVEfNz/BcHtvKh0gTY8AAAAAElFTkSuQmCC\n", "text/plain": [ "
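{ "cell_type": "markdown", "metadata": {}, "source": [ "The `data` keyword also works with matplotlib's object-oriented interface. A minimal sketch, assuming the same `df` as above:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# same plot via the object-oriented API; column names are looked up in df\n", "fig, ax = plt.subplots()\n", "ax.plot(\"var1\", \"var2\", data=df)\n", "ax.set_xlabel(\"var1\")\n", "ax.set_ylabel(\"var2\")\n", "plt.show()" ] },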
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(\"var1\", \"var2\", data=df)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Method Chaining" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top](#top)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With method chaining, it eliminates the need for making intermediary variables. You can process your data frame in a series of method calls when you enclose your data frame with parenthesis. Here's a very contrived example where you are creating these intermediary variables between transformation steps:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv('data.csv')\n", "df = df.fillna(...)\n", "df = df.query('some_condition')\n", "df['new_column'] = df.cut(...)\n", "df = df.pivot_table(...)\n", "df = df.rename(...)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Instead of the above, we can chain the methods as long as you surround the code with parenthesis:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(pd.read_csv('data.csv')\n", " .fillna(...)\n", " .query('some_condition')\n", " .assign(new_column = df.cut(...))\n", " .pivot_table(...)\n", " .rename(...)\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is recommended that you don't use method chaining if it will significantly impact readability." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Also check out [pipe() method](http://pandas.pydata.org/pandas-docs/stable/basics.html#tablewise-function-application) as it can also be used to better organize data transformations in discrete functions.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### pipe() example" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A short series of video [tutorials](https://calmcode.io/pandas-pipe/introduction.html) on using `pipe()`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top](#top)]" ] }, { "cell_type": "code", "execution_count": 175, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "df = pd.DataFrame({'group': ['a', 'a', 'a', 'b','b', 'b', 'c', 'c','c'],\n", " 'ounces': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})" ] }, { "cell_type": "code", "execution_count": 176, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
groupounces
0a4.0
1a3.0
2a12.0
3b6.0
4b7.5
5b8.0
6c3.0
7c5.0
8c6.0
\n", "
" ], "text/plain": [ " group ounces\n", "0 a 4.0\n", "1 a 3.0\n", "2 a 12.0\n", "3 b 6.0\n", "4 b 7.5\n", "5 b 8.0\n", "6 c 3.0\n", "7 c 5.0\n", "8 c 6.0" ] }, "execution_count": 176, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": 177, "metadata": {}, "outputs": [], "source": [ "def mean_age_by_group(df, col):\n", " return df.groupby(col).mean()" ] }, { "cell_type": "code", "execution_count": 178, "metadata": {}, "outputs": [], "source": [ "def uppercase_column_names(df):\n", " df.columns = df.columns.str.upper()\n", " return df" ] }, { "cell_type": "code", "execution_count": 179, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
OUNCES
group
a6.3333333
b7.1666667
c4.6666667
\n", "
" ], "text/plain": [ " OUNCES\n", "group \n", "a 6.3333333\n", "b 7.1666667\n", "c 4.6666667" ] }, "execution_count": 179, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(df.pipe(mean_age_by_group, col='group')\n", " .pipe(uppercase_column_names)\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top](#top)]" ] } ], "metadata": { "hide_input": false, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "toc": { "toc_cell": false, "toc_number_sections": true, "toc_threshold": 6, "toc_window_display": false }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }