{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Hierarchical Indexing"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Up to this point we've been focused primarily on one-dimensional and two-dimensional data, stored in Pandas `Series` and `DataFrame` objects, respectively.\n",
    "Often it is useful to go beyond this and store higher-dimensional data—that is, data indexed by more than one or two keys.\n",
    "Early Pandas versions provided `Panel` and `Panel4D` objects that could be thought of as 3D or 4D analogs to the 2D `DataFrame`, but they were somewhat clunky to use in practice. A far more common pattern for handling higher-dimensional data is to make use of *hierarchical indexing* (also known as *multi-indexing*) to incorporate multiple index *levels* within a single index.\n",
    "In this way, higher-dimensional data can be compactly represented within the familiar one-dimensional `Series` and two-dimensional `DataFrame` objects.\n",
    "(If you're interested in true *N*-dimensional arrays with Pandas-style flexible indices, you can look into the excellent [Xarray package](https://xarray.pydata.org/).)\n",
    "\n",
    "In this chapter, we'll explore the direct creation of `MultiIndex` objects; considerations when indexing, slicing, and computing statistics across multiply indexed data; and useful routines for converting between simple and hierarchically indexed representations of data.\n",
    "\n",
    "We begin with the standard imports:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "deletable": true,
    "editable": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "## A Multiply Indexed Series\n",
    "\n",
    "Let's start by considering how we might represent two-dimensional data within a one-dimensional `Series`.\n",
    "For concreteness, we will consider a series of data where each point has a character and numerical key."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "### The Bad Way\n",
    "\n",
    "Suppose you would like to track data about states from two different years.\n",
    "Using the Pandas tools we've already covered, you might be tempted to simply use Python tuples as keys:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(California, 2010)    37253956\n",
       "(California, 2020)    39538223\n",
       "(New York, 2010)      19378102\n",
       "(New York, 2020)      20201249\n",
       "(Texas, 2010)         25145561\n",
       "(Texas, 2020)         29145505\n",
       "dtype: int64"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "index = [('California', 2010), ('California', 2020),\n",
    "         ('New York', 2010), ('New York', 2020),\n",
    "         ('Texas', 2010), ('Texas', 2020)]\n",
    "populations = [37253956, 39538223,\n",
    "               19378102, 20201249,\n",
    "               25145561, 29145505]\n",
    "pop = pd.Series(populations, index=index)\n",
    "pop"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "With this indexing scheme, you can straightforwardly index or slice the series based on this tuple index:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(California, 2020)    39538223\n",
       "(New York, 2010)      19378102\n",
       "(New York, 2020)      20201249\n",
       "(Texas, 2010)         25145561\n",
       "dtype: int64"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pop[('California', 2020):('Texas', 2010)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "But the convenience ends there. For example, if you need to select all values from 2010, you'll need to do some messy (and potentially slow) munging to make it happen:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(California, 2010)    37253956\n",
       "(New York, 2010)      19378102\n",
       "(Texas, 2010)         25145561\n",
       "dtype: int64"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pop[[i for i in pop.index if i[1] == 2010]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "This produces the desired result, but is not as clean (or as efficient for large datasets) as the slicing syntax we've grown to love in Pandas."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "### The Better Way: The Pandas MultiIndex\n",
    "Fortunately, Pandas provides a better way.\n",
    "Our tuple-based indexing is essentially a rudimentary multi-index, and the Pandas `MultiIndex` type gives us the types of operations we wish to have.\n",
    "We can create a multi-index from the tuples as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "index = pd.MultiIndex.from_tuples(index)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "The `MultiIndex` represents multiple *levels* of indexing—in this case, the state names and the years—as well as multiple *labels* for each data point which encode these levels.\n",
    "\n",
    "If we reindex our series with this `MultiIndex`, we see the hierarchical representation of the data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "California  2010    37253956\n",
       "            2020    39538223\n",
       "New York    2010    19378102\n",
       "            2020    20201249\n",
       "Texas       2010    25145561\n",
       "            2020    29145505\n",
       "dtype: int64"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pop = pop.reindex(index)\n",
    "pop"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Here the first two columns of the Series representation show the multiple index values, while the third column shows the data.\n",
    "Notice that some entries are missing in the first column: in this multi-index representation, any blank entry indicates the same value as the line above it."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Now to access all data for which the second index is 2020, we can use the Pandas slicing notation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "California    39538223\n",
       "New York      20201249\n",
       "Texas         29145505\n",
       "dtype: int64"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pop[:, 2020]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "The result is a singly indexed Series with just the keys we're interested in.\n",
    "This syntax is much more convenient (and the operation is much more efficient!) than the home-spun tuple-based multi-indexing solution that we started with.\n",
    "We'll now further discuss this sort of indexing operation on hierarchically indexed data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "### MultiIndex as Extra Dimension\n",
    "\n",
    "You might notice something else here: we could easily have stored the same data using a simple `DataFrame` with index and column labels.\n",
    "In fact, Pandas is built with this equivalence in mind. The `unstack` method will quickly convert a multiply indexed `Series` into a conventionally indexed `DataFrame`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>2010</th>\n",
       "      <th>2020</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>California</th>\n",
       "      <td>37253956</td>\n",
       "      <td>39538223</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>New York</th>\n",
       "      <td>19378102</td>\n",
       "      <td>20201249</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Texas</th>\n",
       "      <td>25145561</td>\n",
       "      <td>29145505</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                2010      2020\n",
       "California  37253956  39538223\n",
       "New York    19378102  20201249\n",
       "Texas       25145561  29145505"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pop_df = pop.unstack()\n",
    "pop_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Naturally, the ``stack`` method provides the opposite operation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "California  2010    37253956\n",
       "            2020    39538223\n",
       "New York    2010    19378102\n",
       "            2020    20201249\n",
       "Texas       2010    25145561\n",
       "            2020    29145505\n",
       "dtype: int64"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pop_df.stack()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Seeing this, you might wonder why would we would bother with hierarchical indexing at all.\n",
    "The reason is simple: just as we were able to use multi-indexing to manipulate two-dimensional data within a one-dimensional `Series`, we can also use it to manipulate data of three or more dimensions in a `Series` or `DataFrame`.\n",
    "Each extra level in a multi-index represents an extra dimension of data; taking advantage of this property gives us much more flexibility in the types of data we can represent. Concretely, we might want to add another column of demographic data for each state at each year (say, population under 18); with a `MultiIndex` this is as easy as adding another column to the ``DataFrame``:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>total</th>\n",
       "      <th>under18</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">California</th>\n",
       "      <th>2010</th>\n",
       "      <td>37253956</td>\n",
       "      <td>9284094</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020</th>\n",
       "      <td>39538223</td>\n",
       "      <td>8898092</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">New York</th>\n",
       "      <th>2010</th>\n",
       "      <td>19378102</td>\n",
       "      <td>4318033</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020</th>\n",
       "      <td>20201249</td>\n",
       "      <td>4181528</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Texas</th>\n",
       "      <th>2010</th>\n",
       "      <td>25145561</td>\n",
       "      <td>6879014</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020</th>\n",
       "      <td>29145505</td>\n",
       "      <td>7432474</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                    total  under18\n",
       "California 2010  37253956  9284094\n",
       "           2020  39538223  8898092\n",
       "New York   2010  19378102  4318033\n",
       "           2020  20201249  4181528\n",
       "Texas      2010  25145561  6879014\n",
       "           2020  29145505  7432474"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pop_df = pd.DataFrame({'total': pop,\n",
    "                       'under18': [9284094, 8898092,\n",
    "                                   4318033, 4181528,\n",
    "                                   6879014, 7432474]})\n",
    "pop_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "In addition, all the ufuncs and other functionality discussed in [Operating on Data in Pandas](03.03-Operations-in-Pandas.ipynb) work with hierarchical indices as well.\n",
    "Here we compute the fraction of people under 18 by year, given the above data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>2010</th>\n",
       "      <th>2020</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>California</th>\n",
       "      <td>0.249211</td>\n",
       "      <td>0.225050</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>New York</th>\n",
       "      <td>0.222831</td>\n",
       "      <td>0.206994</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Texas</th>\n",
       "      <td>0.273568</td>\n",
       "      <td>0.255013</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                2010      2020\n",
       "California  0.249211  0.225050\n",
       "New York    0.222831  0.206994\n",
       "Texas       0.273568  0.255013"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "f_u18 = pop_df['under18'] / pop_df['total']\n",
    "f_u18.unstack()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "This allows us to easily and quickly manipulate and explore even high-dimensional data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "## Methods of MultiIndex Creation\n",
    "\n",
    "The most straightforward way to construct a multiply indexed `Series` or `DataFrame` is to simply pass a list of two or more index arrays to the constructor. For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>data1</th>\n",
       "      <th>data2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">a</th>\n",
       "      <th>1</th>\n",
       "      <td>0.748464</td>\n",
       "      <td>0.561409</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.379199</td>\n",
       "      <td>0.622461</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">b</th>\n",
       "      <th>1</th>\n",
       "      <td>0.701679</td>\n",
       "      <td>0.687932</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.436200</td>\n",
       "      <td>0.950664</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        data1     data2\n",
       "a 1  0.748464  0.561409\n",
       "  2  0.379199  0.622461\n",
       "b 1  0.701679  0.687932\n",
       "  2  0.436200  0.950664"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.DataFrame(np.random.rand(4, 2),\n",
    "                  index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],\n",
    "                  columns=['data1', 'data2'])\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "The work of creating the ``MultiIndex`` is done in the background.\n",
    "\n",
    "Similarly, if you pass a dictionary with appropriate tuples as keys, Pandas will automatically recognize this and use a ``MultiIndex`` by default:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "California  2010    37253956\n",
       "            2020    39538223\n",
       "New York    2010    19378102\n",
       "            2020    20201249\n",
       "Texas       2010    25145561\n",
       "            2020    29145505\n",
       "dtype: int64"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = {('California', 2010): 37253956,\n",
    "        ('California', 2020): 39538223,\n",
    "        ('New York', 2010): 19378102,\n",
    "        ('New York', 2020): 20201249,\n",
    "        ('Texas', 2010): 25145561,\n",
    "        ('Texas', 2020): 29145505}\n",
    "pd.Series(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Nevertheless, it is sometimes useful to explicitly create a `MultiIndex`; we'll look at a couple of methods for doing this next."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "### Explicit MultiIndex Constructors\n",
    "\n",
    "For more flexibility in how the index is constructed, you can instead use the constructor methods available in the `pd.MultiIndex` class.\n",
    "For example, as we did before, you can construct a `MultiIndex` from a simple list of arrays giving the index values within each level:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "MultiIndex([('a', 1),\n",
       "            ('a', 2),\n",
       "            ('b', 1),\n",
       "            ('b', 2)],\n",
       "           )"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.MultiIndex.from_arrays([['a', 'a', 'b', 'b'], [1, 2, 1, 2]])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Or you can construct it from a list of tuples giving the multiple index values of each point:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "MultiIndex([('a', 1),\n",
       "            ('a', 2),\n",
       "            ('b', 1),\n",
       "            ('b', 2)],\n",
       "           )"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1), ('b', 2)])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "You can even construct it from a Cartesian product of single indices:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "MultiIndex([('a', 1),\n",
       "            ('a', 2),\n",
       "            ('b', 1),\n",
       "            ('b', 2)],\n",
       "           )"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.MultiIndex.from_product([['a', 'b'], [1, 2]])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Similarly, you can construct a `MultiIndex` directly using its internal encoding by passing `levels` (a list of lists containing available index values for each level) and `codes` (a list of lists that reference these labels):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "MultiIndex([('a', 1),\n",
       "            ('a', 2),\n",
       "            ('b', 1),\n",
       "            ('b', 2)],\n",
       "           )"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.MultiIndex(levels=[['a', 'b'], [1, 2]],\n",
    "              codes=[[0, 0, 1, 1], [0, 1, 0, 1]])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Any of these objects can be passed as the `index` argument when creating a `Series` or `DataFrame`, or be passed to the `reindex` method of an existing `Series` or `DataFrame`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "### MultiIndex Level Names\n",
    "\n",
    "Sometimes it is convenient to name the levels of the `MultiIndex`.\n",
    "This can be accomplished by passing the `names` argument to any of the previously discussed `MultiIndex` constructors, or by setting the `names` attribute of the index after the fact:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "state       year\n",
       "California  2010    37253956\n",
       "            2020    39538223\n",
       "New York    2010    19378102\n",
       "            2020    20201249\n",
       "Texas       2010    25145561\n",
       "            2020    29145505\n",
       "dtype: int64"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pop.index.names = ['state', 'year']\n",
    "pop"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "With more involved datasets, this can be a useful way to keep track of the meaning of various index values."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "### MultiIndex for Columns\n",
    "\n",
    "In a `DataFrame`, the rows and columns are completely symmetric, and just as the rows can have multiple levels of indices, the columns can have multiple levels as well.\n",
    "Consider the following, which is a mock-up of some (somewhat realistic) medical data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>subject</th>\n",
       "      <th colspan=\"2\" halign=\"left\">Bob</th>\n",
       "      <th colspan=\"2\" halign=\"left\">Guido</th>\n",
       "      <th colspan=\"2\" halign=\"left\">Sue</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>type</th>\n",
       "      <th>HR</th>\n",
       "      <th>Temp</th>\n",
       "      <th>HR</th>\n",
       "      <th>Temp</th>\n",
       "      <th>HR</th>\n",
       "      <th>Temp</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>year</th>\n",
       "      <th>visit</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">2013</th>\n",
       "      <th>1</th>\n",
       "      <td>30.0</td>\n",
       "      <td>38.0</td>\n",
       "      <td>56.0</td>\n",
       "      <td>38.3</td>\n",
       "      <td>45.0</td>\n",
       "      <td>35.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>47.0</td>\n",
       "      <td>37.1</td>\n",
       "      <td>27.0</td>\n",
       "      <td>36.0</td>\n",
       "      <td>37.0</td>\n",
       "      <td>36.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">2014</th>\n",
       "      <th>1</th>\n",
       "      <td>51.0</td>\n",
       "      <td>35.9</td>\n",
       "      <td>24.0</td>\n",
       "      <td>36.7</td>\n",
       "      <td>32.0</td>\n",
       "      <td>36.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>49.0</td>\n",
       "      <td>36.3</td>\n",
       "      <td>48.0</td>\n",
       "      <td>39.2</td>\n",
       "      <td>31.0</td>\n",
       "      <td>35.7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "subject      Bob       Guido         Sue      \n",
       "type          HR  Temp    HR  Temp    HR  Temp\n",
       "year visit                                    \n",
       "2013 1      30.0  38.0  56.0  38.3  45.0  35.8\n",
       "     2      47.0  37.1  27.0  36.0  37.0  36.4\n",
       "2014 1      51.0  35.9  24.0  36.7  32.0  36.2\n",
       "     2      49.0  36.3  48.0  39.2  31.0  35.7"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# hierarchical indices and columns\n",
    "index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]],\n",
    "                                   names=['year', 'visit'])\n",
    "columns = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],\n",
    "                                     names=['subject', 'type'])\n",
    "\n",
    "# mock some data\n",
    "data = np.round(np.random.randn(4, 6), 1)\n",
    "data[:, ::2] *= 10\n",
    "data += 37\n",
    "\n",
    "# create the DataFrame\n",
    "health_data = pd.DataFrame(data, index=index, columns=columns)\n",
    "health_data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "This is fundamentally four-dimensional data, where the dimensions are the subject, the measurement type, the year, and the visit number.\n",
    "With this in place we can, for example, index the top-level column by the person's name and get a full `DataFrame` containing just that person's information:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>type</th>\n",
       "      <th>HR</th>\n",
       "      <th>Temp</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>year</th>\n",
       "      <th>visit</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">2013</th>\n",
       "      <th>1</th>\n",
       "      <td>56.0</td>\n",
       "      <td>38.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>27.0</td>\n",
       "      <td>36.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">2014</th>\n",
       "      <th>1</th>\n",
       "      <td>24.0</td>\n",
       "      <td>36.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>48.0</td>\n",
       "      <td>39.2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "type          HR  Temp\n",
       "year visit            \n",
       "2013 1      56.0  38.3\n",
       "     2      27.0  36.0\n",
       "2014 1      24.0  36.7\n",
       "     2      48.0  39.2"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "health_data['Guido']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "## Indexing and Slicing a MultiIndex\n",
    "\n",
    "Indexing and slicing on a `MultiIndex` is designed to be intuitive, and it helps if you think about the indices as added dimensions.\n",
    "We'll first look at indexing multiply indexed `Series`, and then multiply indexed `DataFrame` objects."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "### Multiply Indexed Series\n",
    "\n",
    "Consider the multiply indexed `Series` of state populations we saw earlier:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "state       year\n",
       "California  2010    37253956\n",
       "            2020    39538223\n",
       "New York    2010    19378102\n",
       "            2020    20201249\n",
       "Texas       2010    25145561\n",
       "            2020    29145505\n",
       "dtype: int64"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pop"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "We can access single elements by indexing with multiple terms:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "37253956"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pop['California', 2010]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "The `MultiIndex` also supports *partial indexing*, or indexing just one of the levels in the index.\n",
    "The result is another `Series`, with the lower-level indices maintained:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "year\n",
       "2010    37253956\n",
       "2020    39538223\n",
       "dtype: int64"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pop['California']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Partial slicing is available as well, as long as the `MultiIndex` is sorted (see the discussion in [Sorted and Unsorted Indices](#Sorted-and-unsorted-indices)):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "state       year\n",
       "California  2010    37253956\n",
       "            2020    39538223\n",
       "New York    2010    19378102\n",
       "            2020    20201249\n",
       "dtype: int64"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pop.loc['California':'New York']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "With sorted indices, partial indexing can be performed on lower levels by passing an empty slice in the first index:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "state\n",
       "California    37253956\n",
       "New York      19378102\n",
       "Texas         25145561\n",
       "dtype: int64"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pop[:, 2010]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Other types of indexing and selection (discussed in [Data Indexing and Selection](03.02-Data-Indexing-and-Selection.ipynb)) work as well; for example, selection based on Boolean masks:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "state       year\n",
       "California  2010    37253956\n",
       "            2020    39538223\n",
       "Texas       2010    25145561\n",
       "            2020    29145505\n",
       "dtype: int64"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pop[pop > 22000000]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Selection based on fancy indexing also works:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "state       year\n",
       "California  2010    37253956\n",
       "            2020    39538223\n",
       "Texas       2010    25145561\n",
       "            2020    29145505\n",
       "dtype: int64"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pop[['California', 'Texas']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "### Multiply Indexed DataFrames\n",
    "\n",
    "A multiply indexed `DataFrame` behaves in a similar manner.\n",
    "Consider our toy medical `DataFrame` from before:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>subject</th>\n",
       "      <th colspan=\"2\" halign=\"left\">Bob</th>\n",
       "      <th colspan=\"2\" halign=\"left\">Guido</th>\n",
       "      <th colspan=\"2\" halign=\"left\">Sue</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>type</th>\n",
       "      <th>HR</th>\n",
       "      <th>Temp</th>\n",
       "      <th>HR</th>\n",
       "      <th>Temp</th>\n",
       "      <th>HR</th>\n",
       "      <th>Temp</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>year</th>\n",
       "      <th>visit</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">2013</th>\n",
       "      <th>1</th>\n",
       "      <td>30.0</td>\n",
       "      <td>38.0</td>\n",
       "      <td>56.0</td>\n",
       "      <td>38.3</td>\n",
       "      <td>45.0</td>\n",
       "      <td>35.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>47.0</td>\n",
       "      <td>37.1</td>\n",
       "      <td>27.0</td>\n",
       "      <td>36.0</td>\n",
       "      <td>37.0</td>\n",
       "      <td>36.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">2014</th>\n",
       "      <th>1</th>\n",
       "      <td>51.0</td>\n",
       "      <td>35.9</td>\n",
       "      <td>24.0</td>\n",
       "      <td>36.7</td>\n",
       "      <td>32.0</td>\n",
       "      <td>36.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>49.0</td>\n",
       "      <td>36.3</td>\n",
       "      <td>48.0</td>\n",
       "      <td>39.2</td>\n",
       "      <td>31.0</td>\n",
       "      <td>35.7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "subject      Bob       Guido         Sue      \n",
       "type          HR  Temp    HR  Temp    HR  Temp\n",
       "year visit                                    \n",
       "2013 1      30.0  38.0  56.0  38.3  45.0  35.8\n",
       "     2      47.0  37.1  27.0  36.0  37.0  36.4\n",
       "2014 1      51.0  35.9  24.0  36.7  32.0  36.2\n",
       "     2      49.0  36.3  48.0  39.2  31.0  35.7"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "health_data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Remember that columns are primary in a `DataFrame`, and the syntax used for multiply indexed `Series` applies to the columns.\n",
    "For example, we can recover Guido's heart rate data with a simple operation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "year  visit\n",
       "2013  1        56.0\n",
       "      2        27.0\n",
       "2014  1        24.0\n",
       "      2        48.0\n",
       "Name: (Guido, HR), dtype: float64"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "health_data['Guido', 'HR']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Also, as with the single-index case, we can use the `loc`, `iloc`, and `ix` indexers introduced in [Data Indexing and Selection](03.02-Data-Indexing-and-Selection.ipynb). For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>subject</th>\n",
       "      <th colspan=\"2\" halign=\"left\">Bob</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>type</th>\n",
       "      <th>HR</th>\n",
       "      <th>Temp</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>year</th>\n",
       "      <th>visit</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">2013</th>\n",
       "      <th>1</th>\n",
       "      <td>30.0</td>\n",
       "      <td>38.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>47.0</td>\n",
       "      <td>37.1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "subject      Bob      \n",
       "type          HR  Temp\n",
       "year visit            \n",
       "2013 1      30.0  38.0\n",
       "     2      47.0  37.1"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "health_data.iloc[:2, :2]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "These indexers provide an array-like view of the underlying two-dimensional data, but each individual index in `loc` or `iloc` can be passed a tuple of multiple indices. For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "year  visit\n",
       "2013  1        30.0\n",
       "      2        47.0\n",
       "2014  1        51.0\n",
       "      2        49.0\n",
       "Name: (Bob, HR), dtype: float64"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "health_data.loc[:, ('Bob', 'HR')]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Working with slices within these index tuples is not especially convenient; trying to create a slice within a tuple will lead to a syntax error:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "ename": "SyntaxError",
     "evalue": "invalid syntax (3311942670.py, line 1)",
     "output_type": "error",
     "traceback": [
      "\u001b[0;36m  File \u001b[0;32m\"/var/folders/xc/sptt9bk14s34rgxt7453p03r0000gp/T/ipykernel_86488/3311942670.py\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m    health_data.loc[(:, 1), (:, 'HR')]\u001b[0m\n\u001b[0m                     ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n"
     ]
    }
   ],
   "source": [
    "health_data.loc[(:, 1), (:, 'HR')]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "You could get around this by building the desired slice explicitly using Python's built-in `slice` function, but a better way in this context is to use an `IndexSlice` object, which Pandas provides for precisely this situation.\n",
    "For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>subject</th>\n",
       "      <th>Bob</th>\n",
       "      <th>Guido</th>\n",
       "      <th>Sue</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>type</th>\n",
       "      <th>HR</th>\n",
       "      <th>HR</th>\n",
       "      <th>HR</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>year</th>\n",
       "      <th>visit</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2013</th>\n",
       "      <th>1</th>\n",
       "      <td>30.0</td>\n",
       "      <td>56.0</td>\n",
       "      <td>45.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2014</th>\n",
       "      <th>1</th>\n",
       "      <td>51.0</td>\n",
       "      <td>24.0</td>\n",
       "      <td>32.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "subject      Bob Guido   Sue\n",
       "type          HR    HR    HR\n",
       "year visit                  \n",
       "2013 1      30.0  56.0  45.0\n",
       "2014 1      51.0  24.0  32.0"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "idx = pd.IndexSlice\n",
    "health_data.loc[idx[:, 1], idx[:, 'HR']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "As you can see, there are many ways to interact with data in multiply indexed `Series` and ``DataFrame``s, and as with many tools in this book the best way to become familiar with them is to try them out!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "## Rearranging Multi-Indexes\n",
    "\n",
    "One of the keys to working with multiply indexed data is knowing how to effectively transform the data.\n",
    "There are a number of operations that will preserve all the information in the dataset, but rearrange it for the purposes of various computations.\n",
    "We saw a brief example of this in the `stack` and `unstack` methods, but there are many more ways to finely control the rearrangement of data between hierarchical indices and columns, and we'll explore them here."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "### Sorted and Unsorted Indices\n",
    "\n",
    "Earlier I briefly mentioned a caveat, but I should emphasize it more here.\n",
    "*Many of the `MultiIndex` slicing operations will fail if the index is not sorted.*\n",
    "Let's take a closer look.\n",
    "\n",
    "We'll start by creating some simple multiply indexed data where the indices are *not lexographically sorted*:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "char  int\n",
       "a     1      0.280341\n",
       "      2      0.097290\n",
       "c     1      0.206217\n",
       "      2      0.431771\n",
       "b     1      0.100183\n",
       "      2      0.015851\n",
       "dtype: float64"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "index = pd.MultiIndex.from_product([['a', 'c', 'b'], [1, 2]])\n",
    "data = pd.Series(np.random.rand(6), index=index)\n",
    "data.index.names = ['char', 'int']\n",
    "data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "If we try to take a partial slice of this index, it will result in an error:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "KeyError 'Key length (1) was greater than MultiIndex lexsort depth (0)'\n"
     ]
    }
   ],
   "source": [
    "try:\n",
    "    data['a':'b']\n",
    "except KeyError as e:\n",
    "    print(\"KeyError\", e)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Although it is not entirely clear from the error message, this is the result of the `MultiIndex` not being sorted.\n",
    "For various reasons, partial slices and other similar operations require the levels in the `MultiIndex` to be in sorted (i.e., lexographical) order.\n",
    "Pandas provides a number of convenience routines to perform this type of sorting, such as the `sort_index` and `sortlevel` methods of the `DataFrame`.\n",
    "We'll use the simplest, `sort_index`, here:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "char  int\n",
       "a     1      0.280341\n",
       "      2      0.097290\n",
       "b     1      0.100183\n",
       "      2      0.015851\n",
       "c     1      0.206217\n",
       "      2      0.431771\n",
       "dtype: float64"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = data.sort_index()\n",
    "data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "With the index sorted in this way, partial slicing will work as expected:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "char  int\n",
       "a     1      0.280341\n",
       "      2      0.097290\n",
       "b     1      0.100183\n",
       "      2      0.015851\n",
       "dtype: float64"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data['a':'b']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "### Stacking and Unstacking Indices\n",
    "\n",
    "As we saw briefly before, it is possible to convert a dataset from a stacked multi-index to a simple two-dimensional representation, optionally specifying the level to use:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>state</th>\n",
       "      <th>California</th>\n",
       "      <th>New York</th>\n",
       "      <th>Texas</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>year</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2010</th>\n",
       "      <td>37253956</td>\n",
       "      <td>19378102</td>\n",
       "      <td>25145561</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020</th>\n",
       "      <td>39538223</td>\n",
       "      <td>20201249</td>\n",
       "      <td>29145505</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "state  California  New York     Texas\n",
       "year                                 \n",
       "2010     37253956  19378102  25145561\n",
       "2020     39538223  20201249  29145505"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pop.unstack(level=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>year</th>\n",
       "      <th>2010</th>\n",
       "      <th>2020</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>state</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>California</th>\n",
       "      <td>37253956</td>\n",
       "      <td>39538223</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>New York</th>\n",
       "      <td>19378102</td>\n",
       "      <td>20201249</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Texas</th>\n",
       "      <td>25145561</td>\n",
       "      <td>29145505</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "year            2010      2020\n",
       "state                         \n",
       "California  37253956  39538223\n",
       "New York    19378102  20201249\n",
       "Texas       25145561  29145505"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pop.unstack(level=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "The opposite of `unstack` is `stack`, which here can be used to recover the original series:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "state       year\n",
       "California  2010    37253956\n",
       "            2020    39538223\n",
       "New York    2010    19378102\n",
       "            2020    20201249\n",
       "Texas       2010    25145561\n",
       "            2020    29145505\n",
       "dtype: int64"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pop.unstack().stack()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "### Index Setting and Resetting\n",
    "\n",
    "Another way to rearrange hierarchical data is to turn the index labels into columns; this can be accomplished with the `reset_index` method.\n",
    "Calling this on the population dictionary will result in a `DataFrame` with `state` and `year` columns holding the information that was formerly in the index.\n",
    "For clarity, we can optionally specify the name of the data for the column representation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>state</th>\n",
       "      <th>year</th>\n",
       "      <th>population</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>California</td>\n",
       "      <td>2010</td>\n",
       "      <td>37253956</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>California</td>\n",
       "      <td>2020</td>\n",
       "      <td>39538223</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>New York</td>\n",
       "      <td>2010</td>\n",
       "      <td>19378102</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>New York</td>\n",
       "      <td>2020</td>\n",
       "      <td>20201249</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Texas</td>\n",
       "      <td>2010</td>\n",
       "      <td>25145561</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Texas</td>\n",
       "      <td>2020</td>\n",
       "      <td>29145505</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        state  year  population\n",
       "0  California  2010    37253956\n",
       "1  California  2020    39538223\n",
       "2    New York  2010    19378102\n",
       "3    New York  2020    20201249\n",
       "4       Texas  2010    25145561\n",
       "5       Texas  2020    29145505"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pop_flat = pop.reset_index(name='population')\n",
    "pop_flat"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "A common pattern is to build a `MultiIndex` from the column values.\n",
    "This can be done with the `set_index` method of the `DataFrame`, which returns a multiply indexed `DataFrame`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>population</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>state</th>\n",
       "      <th>year</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">California</th>\n",
       "      <th>2010</th>\n",
       "      <td>37253956</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020</th>\n",
       "      <td>39538223</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">New York</th>\n",
       "      <th>2010</th>\n",
       "      <td>19378102</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020</th>\n",
       "      <td>20201249</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Texas</th>\n",
       "      <th>2010</th>\n",
       "      <td>25145561</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020</th>\n",
       "      <td>29145505</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 population\n",
       "state      year            \n",
       "California 2010    37253956\n",
       "           2020    39538223\n",
       "New York   2010    19378102\n",
       "           2020    20201249\n",
       "Texas      2010    25145561\n",
       "           2020    29145505"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pop_flat.set_index(['state', 'year'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "In practice, this type of reindexing is one of the more useful patterns when exploring real-world datasets."
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "jupytext": {
   "formats": "ipynb,md"
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}