# Advanced indexing

In [None]:
%matplotlib inline

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
try:
    import seaborn
except ImportError:
    pass

pd.options.display.max_rows = 10

This dataset is borrowed from the [PyCon tutorial of Brandon Rhodes](https://github.com/brandon-rhodes/pycon-pandas-tutorial/) (so all credit to him!). You can download the data here: [`titles.csv`](https://drive.google.com/file/d/0B3G70MlBnCgKa0U4WFdWdGdVOFU/view?usp=sharing) and [`cast.csv`](https://drive.google.com/file/d/0B3G70MlBnCgKRzRmTWdQTUdjNnM/view?usp=sharing). Put both files in the `data/` folder (the relative path that the cells below read from).

In [None]:
cast = pd.read_csv('data/cast.csv')
cast.head()

In [None]:
titles = pd.read_csv('data/titles.csv')
titles.head()

## Setting columns as the index

Why is it useful to have an index?

- Giving meaningful labels to your data -> easier to remember which data are where
- Unlocking some powerful methods, e.g. with a DatetimeIndex for time series
- Easier and faster selection of data

It is this last one we are going to explore here!
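
Before turning to selection, here is a brief sketch of the second point in the list above. It is not part of the movie dataset: the series `ts` and its values are made up purely for illustration. It shows two things a DatetimeIndex gives you, partial string indexing and resampling.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one value per day for the first 90 days of 2015
# (1 January through 31 March).
dates = pd.date_range('2015-01-01', periods=90, freq='D')
ts = pd.Series(np.arange(90), index=dates)

# Partial string indexing: select all of January with a single label.
january = ts.loc['2015-01']

# Resampling: aggregate the daily values to one mean per calendar month.
monthly_mean = ts.resample('MS').mean()
```

Neither operation would be available if the dates were stored in an ordinary column.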

Setting the `title` column as the index:

In [None]:
c = cast.set_index('title')

In [None]:
c.head()

Instead of doing:

In [None]:
%%time
cast[cast['title'] == 'Hamlet']

we can now do:

In [None]:
%%time
c.loc['Hamlet']
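
If you do not have the data files at hand, the same comparison can be sketched on a small made-up frame. The `toy` DataFrame and its actor names below are hypothetical stand-ins for `cast`, not real rows from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for cast.csv with a few made-up rows.
toy = pd.DataFrame({
    'title': ['Hamlet', 'Macbeth', 'Hamlet', 'Othello'],
    'year': [1996, 2015, 2000, 1995],
    'name': ['Actor A', 'Actor B', 'Actor C', 'Actor D'],
})

# Approach 1: boolean mask, comparing every row's title to the target.
by_mask = toy[toy['title'] == 'Hamlet']

# Approach 2: label lookup on an index built from the same column.
toy_indexed = toy.set_index('title')
by_label = toy_indexed.loc['Hamlet']

# Both approaches select the same rows; only the layout differs,
# because 'title' has moved from a column into the index.
same = by_mask.set_index('title').equals(by_label)
```

Note that `.loc` returns a DataFrame here because the label `'Hamlet'` occurs more than once; for a label that occurs exactly once it returns a single row as a Series.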

You can also set multiple columns as the index, which gives a **multi-index** (or **hierarchical index**):

In [None]:
c = cast.set_index(['title', 'year'])

In [None]:
c.head()

In [None]:
%%time
c.loc[('Hamlet', 2000),:]

Lookups on a multi-index are typically faster when the index is sorted, so we sort it and time the same selection again:

In [None]:
c2 = c.sort_index()

In [None]:
%%time
c2.loc[('Hamlet', 2000),:]
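
The effect of `sort_index` on a multi-index can be checked on a small made-up frame. This is a minimal sketch: the `toy` data is hypothetical, and it only verifies that the index is sorted and that the tuple lookup works. It does not measure timings.

```python
import pandas as pd

# Hypothetical stand-in for cast.csv, deliberately in unsorted order.
toy = pd.DataFrame({
    'title': ['Macbeth', 'Hamlet', 'Othello', 'Hamlet', 'Hamlet'],
    'year': [2015, 2000, 1995, 1996, 2000],
    'name': ['Actor A', 'Actor B', 'Actor C', 'Actor D', 'Actor E'],
})

unsorted_idx = toy.set_index(['title', 'year'])
sorted_idx = unsorted_idx.sort_index()

# Sorting orders the rows by title first, then by year within each title.
was_sorted = unsorted_idx.index.is_monotonic_increasing
is_sorted = sorted_idx.index.is_monotonic_increasing

# Select on both levels at once by passing a (title, year) tuple.
hamlet_2000 = sorted_idx.loc[('Hamlet', 2000), :]

# Select on the first level only: all years of 'Hamlet'.
hamlet_all = sorted_idx.loc['Hamlet']
```

Here `was_sorted` is `False` and `is_sorted` is `True`. Checking `is_monotonic_increasing` is a cheap way to confirm that an index is sorted before running many lookups on it.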