{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Python pandas Q&A video series by [Data School](http://www.dataschool.io/)\n", "\n", "### [YouTube playlist](https://www.youtube.com/playlist?list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y) and [GitHub repository](https://github.com/justmarkham/pandas-videos)\n", "\n", "## Table of contents\n", "\n", "1. What is pandas?\n", "2. How do I read a tabular data file into pandas?\n", "3. How do I select a pandas Series from a DataFrame?\n", "4. Why do some pandas commands end with parentheses (and others don't)?\n", "5. How do I rename columns in a pandas DataFrame?\n", "6. How do I remove columns from a pandas DataFrame?\n", "7. How do I sort a pandas DataFrame or a Series?\n", "8. How do I filter rows of a pandas DataFrame by column value?\n", "9. How do I apply multiple filter criteria to a pandas DataFrame?\n", "10. Your pandas questions answered!\n", "11. How do I use the \"axis\" parameter in pandas?\n", "12. How do I use string methods in pandas?\n", "13. How do I change the data type of a pandas Series?\n", "14. When should I use a \"groupby\" in pandas?\n", "15. How do I explore a pandas Series?\n", "16. How do I handle missing values in pandas?\n", "17. What do I need to know about the pandas index? (Part 1)\n", "18. What do I need to know about the pandas index? (Part 2)\n", "19. How do I select multiple rows and columns from a pandas DataFrame?\n", "20. When should I use the \"inplace\" parameter in pandas?\n", "21. How do I make my pandas DataFrame smaller and faster?\n", "22. How do I use pandas with scikit-learn to create Kaggle submissions?\n", "23. More of your pandas questions answered!\n", "24. How do I create dummy variables in pandas?\n", "25. How do I work with dates and times in pandas?\n", "26. How do I find and remove duplicate rows in pandas?\n", "27. How do I avoid a SettingWithCopyWarning in pandas?\n", "28. How do I change display options in pandas?\n", "29. How do I create a pandas DataFrame from another object?\n", "30. How do I apply a function to a pandas Series or DataFrame?" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# conventional way to import pandas\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. What is pandas? ([video](https://www.youtube.com/watch?v=yzIMircGU5I&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=1))\n", "\n", "- [pandas main page](http://pandas.pydata.org/)\n", "- [pandas installation instructions](http://pandas.pydata.org/pandas-docs/stable/install.html)\n", "- [Anaconda distribution of Python](https://www.continuum.io/downloads) (includes pandas)\n", "- [How to use the IPython/Jupyter notebook](https://youtu.be/IsXXlYVBt1M?t=5m17s) (video)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. How do I read a tabular data file into pandas? ([video](https://www.youtube.com/watch?v=5_QXMwezPJE&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=2))" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# read a dataset of Chipotle orders directly from a URL and store the results in a DataFrame\n", "orders = pd.read_table('http://bit.ly/chiporders')" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " order_id quantity item_name \\\n", "0 1 1 Chips and Fresh Tomato Salsa \n", "1 1 1 Izze \n", "2 1 1 Nantucket Nectar \n", "3 1 1 Chips and Tomatillo-Green Chili Salsa \n", "4 2 2 Chicken Bowl \n", "\n", " choice_description item_price \n", "0 NaN $2.39 \n", "1 [Clementine] $3.39 \n", "2 [Apple] $3.39 \n", "3 NaN $2.39 \n", "4 [Tomatillo-Red Chili Salsa (Hot), [Black Beans... $16.98 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# examine the first 5 rows\n", "orders.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`read_table`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_table.html)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# read a dataset of movie reviewers (modifying the default parameter values for read_table)\n", "user_cols = ['user_id', 'age', 'gender', 'occupation', 'zip_code']\n", "users = pd.read_table('http://bit.ly/movieusers', sep='|', header=None, names=user_cols)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " user_id age gender occupation zip_code\n", "0 1 24 M technician 85711\n", "1 2 53 F other 94043\n", "2 3 23 M writer 32067\n", "3 4 24 M technician 43537\n", "4 5 33 F other 15213" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# examine the first 5 rows\n", "users.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Back to top]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. How do I select a pandas Series from a DataFrame? ([video](https://www.youtube.com/watch?v=zxqjeyKP2Tk&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=3))" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# read a dataset of UFO reports into a DataFrame\n", "ufo = pd.read_table('http://bit.ly/uforeports', sep=',')" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# read_csv is equivalent to read_table, except it assumes a comma separator\n", "ufo = pd.read_csv('http://bit.ly/uforeports')" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " City Colors Reported Shape Reported State Time\n", "0 Ithaca NaN TRIANGLE NY 6/1/1930 22:00\n", "1 Willingboro NaN OTHER NJ 6/30/1930 20:00\n", "2 Holyoke NaN OVAL CO 2/15/1931 14:00\n", "3 Abilene NaN DISK KS 6/1/1931 13:00\n", "4 New York Worlds Fair NaN LIGHT NY 4/18/1933 19:00" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# examine the first 5 rows\n", "ufo.head()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 Ithaca\n", "1 Willingboro\n", "2 Holyoke\n", "3 Abilene\n", "4 New York Worlds Fair\n", "5 Valley City\n", "6 Crater Lake\n", "7 Alma\n", "8 Eklutna\n", "9 Hubbard\n", "10 Fontana\n", "11 Waterloo\n", "12 Belton\n", "13 Keokuk\n", "14 Ludington\n", "15 Forest Home\n", "16 Los Angeles\n", "17 Hapeville\n", "18 Oneida\n", "19 Bering Sea\n", "20 Nebraska\n", "21 NaN\n", "22 NaN\n", "23 Owensboro\n", "24 Wilderness\n", "25 San Diego\n", "26 Wilderness\n", "27 Clovis\n", "28 Los Alamos\n", "29 Ft. Duschene\n", " ... \n", "18211 Holyoke\n", "18212 Carson\n", "18213 Pasadena\n", "18214 Austin\n", "18215 El Campo\n", "18216 Garden Grove\n", "18217 Berthoud Pass\n", "18218 Sisterdale\n", "18219 Garden Grove\n", "18220 Shasta Lake\n", "18221 Franklin\n", "18222 Albrightsville\n", "18223 Greenville\n", "18224 Eufaula\n", "18225 Simi Valley\n", "18226 San Francisco\n", "18227 San Francisco\n", "18228 Kingsville\n", "18229 Chicago\n", "18230 Pismo Beach\n", "18231 Pismo Beach\n", "18232 Lodi\n", "18233 Anchorage\n", "18234 Capitola\n", "18235 Fountain Hills\n", "18236 Grant Park\n", "18237 Spirit Lake\n", "18238 Eagle River\n", "18239 Eagle River\n", "18240 Ybor\n", "Name: City, dtype: object" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# select the 'City' Series using bracket notation\n", "ufo['City']\n", "\n", "# or equivalently, use dot notation\n", "ufo.City" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Bracket notation** will always work, whereas **dot notation** has limitations:\n", "\n", "- Dot notation doesn't work if there are **spaces** in the Series name\n", "- Dot notation doesn't work if the Series has the same name as a **DataFrame method or attribute** (like 'head' or 'shape')\n", "- Dot notation can't be used to define the name of a **new Series** (see below)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " City Colors Reported Shape Reported State Time \\\n", "0 Ithaca NaN TRIANGLE NY 6/1/1930 22:00 \n", "1 Willingboro NaN OTHER NJ 6/30/1930 20:00 \n", "2 Holyoke NaN OVAL CO 2/15/1931 14:00 \n", "3 Abilene NaN DISK KS 6/1/1931 13:00 \n", "4 New York Worlds Fair NaN LIGHT NY 4/18/1933 19:00 \n", "\n", " Location \n", "0 Ithaca, NY \n", "1 Willingboro, NJ \n", "2 Holyoke, CO \n", "3 Abilene, KS \n", "4 New York Worlds Fair, NY " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create a new 'Location' Series (must use bracket notation to define the Series name)\n", "ufo['Location'] = ufo.City + ', ' + ufo.State\n", "ufo.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Back to top]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Why do some pandas commands end with parentheses (and others don't)? ([video](https://www.youtube.com/watch?v=hSrDViyKWVk&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=4))" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# read a dataset of top-rated IMDb movies into a DataFrame\n", "movies = pd.read_csv('http://bit.ly/imdbratings')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Methods** end with parentheses, while **attributes** don't:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " star_rating title content_rating genre duration \\\n", "0 9.3 The Shawshank Redemption R Crime 142 \n", "1 9.2 The Godfather R Crime 175 \n", "2 9.1 The Godfather: Part II R Crime 200 \n", "3 9.0 The Dark Knight PG-13 Action 152 \n", "4 8.9 Pulp Fiction R Crime 154 \n", "\n", " actors_list \n", "0 [u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt... \n", "1 [u'Marlon Brando', u'Al Pacino', u'James Caan'] \n", "2 [u'Al Pacino', u'Robert De Niro', u'Robert Duv... \n", "3 [u'Christian Bale', u'Heath Ledger', u'Aaron E... \n", "4 [u'John Travolta', u'Uma Thurman', u'Samuel L.... " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# example method: show the first 5 rows\n", "movies.head()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " star_rating duration\n", "count 979.000000 979.000000\n", "mean 7.889785 120.979571\n", "std 0.336069 26.218010\n", "min 7.400000 64.000000\n", "25% 7.600000 102.000000\n", "50% 7.800000 117.000000\n", "75% 8.100000 134.000000\n", "max 9.300000 242.000000" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# example method: calculate summary statistics\n", "movies.describe()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(979, 6)" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# example attribute: number of rows and columns\n", "movies.shape" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "star_rating float64\n", "title object\n", "content_rating object\n", "genre object\n", "duration int64\n", "actors_list object\n", "dtype: object" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# example attribute: data type of each column\n", "movies.dtypes" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " title content_rating genre \\\n", "count 979 976 979 \n", "unique 975 12 16 \n", "top The Girl with the Dragon Tattoo R Drama \n", "freq 2 460 278 \n", "\n", " actors_list \n", "count 979 \n", "unique 969 \n", "top [u'Daniel Radcliffe', u'Emma Watson', u'Rupert... \n", "freq 6 " ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# use an optional parameter to the describe method to summarize only 'object' columns\n", "movies.describe(include=['object'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`describe`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.describe.html)\n", "\n", "[Back to top]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. How do I rename columns in a pandas DataFrame? ([video](https://www.youtube.com/watch?v=0uBirYFhizE&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=5))" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# read a dataset of UFO reports into a DataFrame\n", "ufo = pd.read_csv('http://bit.ly/uforeports')" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Index([u'City', u'Colors Reported', u'Shape Reported', u'State', u'Time'], dtype='object')" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# examine the column names\n", "ufo.columns" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Index([u'City', u'Colors_Reported', u'Shape_Reported', u'State', u'Time'], dtype='object')" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# rename two of the columns by using the 'rename' method\n", "ufo.rename(columns={'Colors Reported':'Colors_Reported', 'Shape Reported':'Shape_Reported'}, inplace=True)\n", "ufo.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`rename`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rename.html)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Index([u'city', u'colors reported', u'shape reported', u'state', u'time'], dtype='object')" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# replace all of the column names by overwriting the 'columns' attribute\n", "ufo_cols = ['city', 'colors reported', 'shape reported', 'state', 'time']\n", "ufo.columns = ufo_cols\n", "ufo.columns" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Index([u'city', u'colors reported', u'shape reported', u'state', u'time'], dtype='object')" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# replace the column names during the file reading process by using the 'names' parameter\n", "ufo = pd.read_csv('http://bit.ly/uforeports', header=0, names=ufo_cols)\n", "ufo.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`read_csv`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Index([u'city', 
u'colors_reported', u'shape_reported', u'state', u'time'], dtype='object')" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# replace all spaces with underscores in the column names by using the 'str.replace' method\n", "ufo.columns = ufo.columns.str.replace(' ', '_')\n", "ufo.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`str.replace`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.replace.html)\n", "\n", "[Back to top]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. How do I remove columns from a pandas DataFrame? ([video](https://www.youtube.com/watch?v=gnUKkS964WQ&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=6))" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " City Colors Reported Shape Reported State Time\n", "0 Ithaca NaN TRIANGLE NY 6/1/1930 22:00\n", "1 Willingboro NaN OTHER NJ 6/30/1930 20:00\n", "2 Holyoke NaN OVAL CO 2/15/1931 14:00\n", "3 Abilene NaN DISK KS 6/1/1931 13:00\n", "4 New York Worlds Fair NaN LIGHT NY 4/18/1933 19:00" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read a dataset of UFO reports into a DataFrame\n", "ufo = pd.read_csv('http://bit.ly/uforeports')\n", "ufo.head()" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " City Shape Reported State Time\n", "0 Ithaca TRIANGLE NY 6/1/1930 22:00\n", "1 Willingboro OTHER NJ 6/30/1930 20:00\n", "2 Holyoke OVAL CO 2/15/1931 14:00\n", "3 Abilene DISK KS 6/1/1931 13:00\n", "4 New York Worlds Fair LIGHT NY 4/18/1933 19:00" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# remove a single column (axis=1 refers to columns)\n", "ufo.drop('Colors Reported', axis=1, inplace=True)\n", "ufo.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`drop`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop.html)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Shape Reported Time\n", "0 TRIANGLE 6/1/1930 22:00\n", "1 OTHER 6/30/1930 20:00\n", "2 OVAL 2/15/1931 14:00\n", "3 DISK 6/1/1931 13:00\n", "4 LIGHT 4/18/1933 19:00" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# remove multiple columns at once\n", "ufo.drop(['City', 'State'], axis=1, inplace=True)\n", "ufo.head()" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Shape Reported Time\n", "2 OVAL 2/15/1931 14:00\n", "3 DISK 6/1/1931 13:00\n", "4 LIGHT 4/18/1933 19:00\n", "5 DISK 9/15/1934 15:30\n", "6 CIRCLE 6/15/1935 0:00" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# remove multiple rows at once (axis=0 refers to rows)\n", "ufo.drop([0, 1], axis=0, inplace=True)\n", "ufo.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Back to top]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 7. How do I sort a pandas DataFrame or a Series? ([video](https://www.youtube.com/watch?v=zY4doF6xSxY&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=7))" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " star_rating title content_rating genre duration \\\n", "0 9.3 The Shawshank Redemption R Crime 142 \n", "1 9.2 The Godfather R Crime 175 \n", "2 9.1 The Godfather: Part II R Crime 200 \n", "3 9.0 The Dark Knight PG-13 Action 152 \n", "4 8.9 Pulp Fiction R Crime 154 \n", "\n", " actors_list \n", "0 [u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt... \n", "1 [u'Marlon Brando', u'Al Pacino', u'James Caan'] \n", "2 [u'Al Pacino', u'Robert De Niro', u'Robert Duv... \n", "3 [u'Christian Bale', u'Heath Ledger', u'Aaron E... \n", "4 [u'John Travolta', u'Uma Thurman', u'Samuel L.... " ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read a dataset of top-rated IMDb movies into a DataFrame\n", "movies = pd.read_csv('http://bit.ly/imdbratings')\n", "movies.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note:** None of the sorting methods below affect the underlying data. (In other words, the sorting is temporary)." ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "542 (500) Days of Summer\n", "5 12 Angry Men\n", "201 12 Years a Slave\n", "698 127 Hours\n", "110 2001: A Space Odyssey\n", "Name: title, dtype: object" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# sort the 'title' Series in ascending order (returns a Series)\n", "movies.title.sort_values().head()" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "864 [Rec]\n", "526 Zulu\n", "615 Zombieland\n", "677 Zodiac\n", "955 Zero Dark Thirty\n", "Name: title, dtype: object" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# sort in descending order instead\n", "movies.title.sort_values(ascending=False).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`sort_values`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.sort_values.html) for a **Series**. (Prior to version 0.17, use [**`order`**](http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.Series.order.html) instead.)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " star_rating title content_rating genre duration \\\n", "542 7.8 (500) Days of Summer PG-13 Comedy 95 \n", "5 8.9 12 Angry Men NOT RATED Drama 96 \n", "201 8.1 12 Years a Slave R Biography 134 \n", "698 7.6 127 Hours R Adventure 94 \n", "110 8.3 2001: A Space Odyssey G Mystery 160 \n", "\n", " actors_list \n", "542 [u'Zooey Deschanel', u'Joseph Gordon-Levitt', ... \n", "5 [u'Henry Fonda', u'Lee J. Cobb', u'Martin Bals... \n", "201 [u'Chiwetel Ejiofor', u'Michael Kenneth Willia... \n", "698 [u'James Franco', u'Amber Tamblyn', u'Kate Mara'] \n", "110 [u'Keir Dullea', u'Gary Lockwood', u'William S... " ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# sort the entire DataFrame by the 'title' Series (returns a DataFrame)\n", "movies.sort_values('title').head()" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " star_rating title content_rating genre duration \\\n", "864 7.5 [Rec] R Horror 78 \n", "526 7.8 Zulu UNRATED Drama 138 \n", "615 7.7 Zombieland R Comedy 88 \n", "677 7.7 Zodiac R Crime 157 \n", "955 7.4 Zero Dark Thirty R Drama 157 \n", "\n", " actors_list \n", "864 [u'Manuela Velasco', u'Ferran Terraza', u'Jorg... \n", "526 [u'Stanley Baker', u'Jack Hawkins', u'Ulla Jac... \n", "615 [u'Jesse Eisenberg', u'Emma Stone', u'Woody Ha... \n", "677 [u'Jake Gyllenhaal', u'Robert Downey Jr.', u'M... \n", "955 [u'Jessica Chastain', u'Joel Edgerton', u'Chri... " ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# sort in descending order instead\n", "movies.sort_values('title', ascending=False).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`sort_values`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_values.html) for a **DataFrame**. (Prior to version 0.17, use [**`sort`**](http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.DataFrame.sort.html) instead.)" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " star_rating title content_rating genre \\\n", "713 7.6 The Jungle Book APPROVED Animation \n", "513 7.8 Invasion of the Body Snatchers APPROVED Horror \n", "272 8.1 The Killing APPROVED Crime \n", "703 7.6 Dracula APPROVED Horror \n", "612 7.7 A Hard Day's Night APPROVED Comedy \n", "\n", " duration actors_list \n", "713 78 [u'Phil Harris', u'Sebastian Cabot', u'Louis P... \n", "513 80 [u'Kevin McCarthy', u'Dana Wynter', u'Larry Ga... \n", "272 85 [u'Sterling Hayden', u'Coleen Gray', u'Vince E... \n", "703 85 [u'Bela Lugosi', u'Helen Chandler', u'David Ma... \n", "612 87 [u'John Lennon', u'Paul McCartney', u'George H... " ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# sort the DataFrame first by 'content_rating', then by 'duration'\n", "movies.sort_values(['content_rating', 'duration']).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Summary of changes to the sorting API](http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#changes-to-sorting-api) in pandas 0.17\n", "\n", "[Back to top]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 8. How do I filter rows of a pandas DataFrame by column value? ([video](https://www.youtube.com/watch?v=2AFGPdNn4FM&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=8))" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " star_rating title content_rating genre duration \\\n", "0 9.3 The Shawshank Redemption R Crime 142 \n", "1 9.2 The Godfather R Crime 175 \n", "2 9.1 The Godfather: Part II R Crime 200 \n", "3 9.0 The Dark Knight PG-13 Action 152 \n", "4 8.9 Pulp Fiction R Crime 154 \n", "\n", " actors_list \n", "0 [u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt... \n", "1 [u'Marlon Brando', u'Al Pacino', u'James Caan'] \n", "2 [u'Al Pacino', u'Robert De Niro', u'Robert Duv... \n", "3 [u'Christian Bale', u'Heath Ledger', u'Aaron E... \n", "4 [u'John Travolta', u'Uma Thurman', u'Samuel L.... " ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read a dataset of top-rated IMDb movies into a DataFrame\n", "movies = pd.read_csv('http://bit.ly/imdbratings')\n", "movies.head()" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(979, 6)" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# examine the number of rows and columns\n", "movies.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Goal:** Filter the DataFrame rows to only show movies with a 'duration' of at least 200 minutes." ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# create a list in which each element refers to a DataFrame row: True if the row satisfies the condition, False otherwise\n", "booleans = []\n", "for length in movies.duration:\n", " if length >= 200:\n", " booleans.append(True)\n", " else:\n", " booleans.append(False)" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "979" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# confirm that the list has the same length as the DataFrame\n", "len(booleans)" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[False, False, True, False, False]" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# examine the first five list elements\n", "booleans[0:5]" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 False\n", "1 False\n", "2 True\n", "3 False\n", "4 False\n", "dtype: bool" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# convert the list to a Series\n", "is_long = pd.Series(booleans)\n", "is_long.head()" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " star_rating title \\\n", "2 9.1 The Godfather: Part II \n", "7 8.9 The Lord of the Rings: The Return of the King \n", "17 8.7 Seven Samurai \n", "78 8.4 Once Upon a Time in America \n", "85 8.4 Lawrence of Arabia \n", "142 8.3 Lagaan: Once Upon a Time in India \n", "157 8.2 Gone with the Wind \n", "204 8.1 Ben-Hur \n", "445 7.9 The Ten Commandments \n", "476 7.8 Hamlet \n", "630 7.7 Malcolm X \n", "767 7.6 It's a Mad, Mad, Mad, Mad World \n", "\n", " content_rating genre duration \\\n", "2 R Crime 200 \n", "7 PG-13 Adventure 201 \n", "17 UNRATED Drama 207 \n", "78 R Crime 229 \n", "85 PG Adventure 216 \n", "142 PG Adventure 224 \n", "157 G Drama 238 \n", "204 G Adventure 212 \n", "445 APPROVED Adventure 220 \n", "476 PG-13 Drama 242 \n", "630 PG-13 Biography 202 \n", "767 APPROVED Action 205 \n", "\n", " actors_list \n", "2 [u'Al Pacino', u'Robert De Niro', u'Robert Duv... \n", "7 [u'Elijah Wood', u'Viggo Mortensen', u'Ian McK... \n", "17 [u'Toshir\\xf4 Mifune', u'Takashi Shimura', u'K... \n", "78 [u'Robert De Niro', u'James Woods', u'Elizabet... \n", "85 [u\"Peter O'Toole\", u'Alec Guinness', u'Anthony... \n", "142 [u'Aamir Khan', u'Gracy Singh', u'Rachel Shell... \n", "157 [u'Clark Gable', u'Vivien Leigh', u'Thomas Mit... \n", "204 [u'Charlton Heston', u'Jack Hawkins', u'Stephe... \n", "445 [u'Charlton Heston', u'Yul Brynner', u'Anne Ba... \n", "476 [u'Kenneth Branagh', u'Julie Christie', u'Dere... \n", "630 [u'Denzel Washington', u'Angela Bassett', u'De... \n", "767 [u'Spencer Tracy', u'Milton Berle', u'Ethel Me... " ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# use bracket notation with the boolean Series to tell the DataFrame which rows to display\n", "movies[is_long]" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " star_rating title \\\n", "2 9.1 The Godfather: Part II \n", "7 8.9 The Lord of the Rings: The Return of the King \n", "17 8.7 Seven Samurai \n", "78 8.4 Once Upon a Time in America \n", "85 8.4 Lawrence of Arabia \n", "142 8.3 Lagaan: Once Upon a Time in India \n", "157 8.2 Gone with the Wind \n", "204 8.1 Ben-Hur \n", "445 7.9 The Ten Commandments \n", "476 7.8 Hamlet \n", "630 7.7 Malcolm X \n", "767 7.6 It's a Mad, Mad, Mad, Mad World \n", "\n", " content_rating genre duration \\\n", "2 R Crime 200 \n", "7 PG-13 Adventure 201 \n", "17 UNRATED Drama 207 \n", "78 R Crime 229 \n", "85 PG Adventure 216 \n", "142 PG Adventure 224 \n", "157 G Drama 238 \n", "204 G Adventure 212 \n", "445 APPROVED Adventure 220 \n", "476 PG-13 Drama 242 \n", "630 PG-13 Biography 202 \n", "767 APPROVED Action 205 \n", "\n", " actors_list \n", "2 [u'Al Pacino', u'Robert De Niro', u'Robert Duv... \n", "7 [u'Elijah Wood', u'Viggo Mortensen', u'Ian McK... \n", "17 [u'Toshir\\xf4 Mifune', u'Takashi Shimura', u'K... \n", "78 [u'Robert De Niro', u'James Woods', u'Elizabet... \n", "85 [u\"Peter O'Toole\", u'Alec Guinness', u'Anthony... \n", "142 [u'Aamir Khan', u'Gracy Singh', u'Rachel Shell... \n", "157 [u'Clark Gable', u'Vivien Leigh', u'Thomas Mit... \n", "204 [u'Charlton Heston', u'Jack Hawkins', u'Stephe... \n", "445 [u'Charlton Heston', u'Yul Brynner', u'Anne Ba... \n", "476 [u'Kenneth Branagh', u'Julie Christie', u'Dere... \n", "630 [u'Denzel Washington', u'Angela Bassett', u'De... \n", "767 [u'Spencer Tracy', u'Milton Berle', u'Ethel Me... " ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# simplify the steps above: no need to write a for loop to create 'is_long' since pandas will broadcast the comparison\n", "is_long = movies.duration >= 200\n", "movies[is_long]\n", "\n", "# or equivalently, write it in one line (no need to create the 'is_long' object)\n", "movies[movies.duration >= 200]" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2 Crime\n", "7 Adventure\n", "17 Drama\n", "78 Crime\n", "85 Adventure\n", "142 Adventure\n", "157 Drama\n", "204 Adventure\n", "445 Adventure\n", "476 Drama\n", "630 Biography\n", "767 Action\n", "Name: genre, dtype: object" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# select the 'genre' Series from the filtered DataFrame\n", "movies[movies.duration >= 200].genre\n", "\n", "# or equivalently, use the 'loc' method\n", "movies.loc[movies.duration >= 200, 'genre']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`loc`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.loc.html)\n", "\n", "[Back to top]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 9. How do I apply multiple filter criteria to a pandas DataFrame? ([video](https://www.youtube.com/watch?v=YPItfQ87qjM&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=9))" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " star_rating title content_rating genre duration \\\n", "0 9.3 The Shawshank Redemption R Crime 142 \n", "1 9.2 The Godfather R Crime 175 \n", "2 9.1 The Godfather: Part II R Crime 200 \n", "3 9.0 The Dark Knight PG-13 Action 152 \n", "4 8.9 Pulp Fiction R Crime 154 \n", "\n", " actors_list \n", "0 [u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt... \n", "1 [u'Marlon Brando', u'Al Pacino', u'James Caan'] \n", "2 [u'Al Pacino', u'Robert De Niro', u'Robert Duv... \n", "3 [u'Christian Bale', u'Heath Ledger', u'Aaron E... \n", "4 [u'John Travolta', u'Uma Thurman', u'Samuel L.... " ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read a dataset of top-rated IMDb movies into a DataFrame\n", "movies = pd.read_csv('http://bit.ly/imdbratings')\n", "movies.head()" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " star_rating title \\\n", "2 9.1 The Godfather: Part II \n", "7 8.9 The Lord of the Rings: The Return of the King \n", "17 8.7 Seven Samurai \n", "78 8.4 Once Upon a Time in America \n", "85 8.4 Lawrence of Arabia \n", "142 8.3 Lagaan: Once Upon a Time in India \n", "157 8.2 Gone with the Wind \n", "204 8.1 Ben-Hur \n", "445 7.9 The Ten Commandments \n", "476 7.8 Hamlet \n", "630 7.7 Malcolm X \n", "767 7.6 It's a Mad, Mad, Mad, Mad World \n", "\n", " content_rating genre duration \\\n", "2 R Crime 200 \n", "7 PG-13 Adventure 201 \n", "17 UNRATED Drama 207 \n", "78 R Crime 229 \n", "85 PG Adventure 216 \n", "142 PG Adventure 224 \n", "157 G Drama 238 \n", "204 G Adventure 212 \n", "445 APPROVED Adventure 220 \n", "476 PG-13 Drama 242 \n", "630 PG-13 Biography 202 \n", "767 APPROVED Action 205 \n", "\n", " actors_list \n", "2 [u'Al Pacino', u'Robert De Niro', u'Robert Duv... \n", "7 [u'Elijah Wood', u'Viggo Mortensen', u'Ian McK... \n", "17 [u'Toshir\\xf4 Mifune', u'Takashi Shimura', u'K... \n", "78 [u'Robert De Niro', u'James Woods', u'Elizabet... \n", "85 [u\"Peter O'Toole\", u'Alec Guinness', u'Anthony... \n", "142 [u'Aamir Khan', u'Gracy Singh', u'Rachel Shell... \n", "157 [u'Clark Gable', u'Vivien Leigh', u'Thomas Mit... \n", "204 [u'Charlton Heston', u'Jack Hawkins', u'Stephe... \n", "445 [u'Charlton Heston', u'Yul Brynner', u'Anne Ba... \n", "476 [u'Kenneth Branagh', u'Julie Christie', u'Dere... \n", "630 [u'Denzel Washington', u'Angela Bassett', u'De... \n", "767 [u'Spencer Tracy', u'Milton Berle', u'Ethel Me... " ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# filter the DataFrame to only show movies with a 'duration' of at least 200 minutes\n", "movies[movies.duration >= 200]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Understanding **logical operators:**\n", "\n", "- **`and`**: True only if **both sides** of the operator are True\n", "- **`or`**: True if **either side** of the operator is True" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n", "False\n", "False\n" ] } ], "source": [ "# demonstration of the 'and' operator\n", "print(True and True)\n", "print(True and False)\n", "print(False and False)" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n", "True\n", "False\n" ] } ], "source": [ "# demonstration of the 'or' operator\n", "print(True or True)\n", "print(True or False)\n", "print(False or False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Rules for specifying **multiple filter criteria** in pandas:\n", "\n", "- use **`&`** instead of **`and`**\n", "- use **`|`** instead of **`or`**\n", "- add **parentheses** around each condition to specify evaluation order" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Goal:** Further filter the DataFrame of long movies (duration >= 200) to only show movies which also have a 'genre' of 'Drama'" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " star_rating title content_rating genre duration \\\n", "17 8.7 Seven Samurai UNRATED Drama 207 \n", "157 8.2 Gone with the Wind G Drama 238 \n", "476 7.8 Hamlet PG-13 Drama 242 \n", "\n", " actors_list \n", "17 [u'Toshir\\xf4 Mifune', u'Takashi Shimura', u'K... \n", "157 [u'Clark Gable', u'Vivien Leigh', u'Thomas Mit... \n", "476 [u'Kenneth Branagh', u'Julie Christie', u'Dere... " ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# CORRECT: use the '&' operator to specify that both conditions are required\n", "movies[(movies.duration >=200) & (movies.genre == 'Drama')]" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " star_rating title content_rating \\\n", "2 9.1 The Godfather: Part II R \n", "5 8.9 12 Angry Men NOT RATED \n", "7 8.9 The Lord of the Rings: The Return of the King PG-13 \n", "9 8.9 Fight Club R \n", "13 8.8 Forrest Gump PG-13 \n", "\n", " genre duration actors_list \n", "2 Crime 200 [u'Al Pacino', u'Robert De Niro', u'Robert Duv... \n", "5 Drama 96 [u'Henry Fonda', u'Lee J. Cobb', u'Martin Bals... \n", "7 Adventure 201 [u'Elijah Wood', u'Viggo Mortensen', u'Ian McK... \n", "9 Drama 139 [u'Brad Pitt', u'Edward Norton', u'Helena Bonh... \n", "13 Drama 142 [u'Tom Hanks', u'Robin Wright', u'Gary Sinise'] " ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# INCORRECT: using the '|' operator would have shown movies that are either long or dramas (or both)\n", "movies[(movies.duration >=200) | (movies.genre == 'Drama')].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Goal:** Filter the original DataFrame to show movies with a 'genre' of 'Crime' or 'Drama' or 'Action'" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
star_ratingtitlecontent_ratinggenredurationactors_list
09.3The Shawshank RedemptionRCrime142[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt...
19.2The GodfatherRCrime175[u'Marlon Brando', u'Al Pacino', u'James Caan']
29.1The Godfather: Part IIRCrime200[u'Al Pacino', u'Robert De Niro', u'Robert Duv...
39.0The Dark KnightPG-13Action152[u'Christian Bale', u'Heath Ledger', u'Aaron E...
48.9Pulp FictionRCrime154[u'John Travolta', u'Uma Thurman', u'Samuel L....
58.912 Angry MenNOT RATEDDrama96[u'Henry Fonda', u'Lee J. Cobb', u'Martin Bals...
98.9Fight ClubRDrama139[u'Brad Pitt', u'Edward Norton', u'Helena Bonh...
118.8InceptionPG-13Action148[u'Leonardo DiCaprio', u'Joseph Gordon-Levitt'...
128.8Star Wars: Episode V - The Empire Strikes BackPGAction124[u'Mark Hamill', u'Harrison Ford', u'Carrie Fi...
138.8Forrest GumpPG-13Drama142[u'Tom Hanks', u'Robin Wright', u'Gary Sinise']
\n", "
" ], "text/plain": [ " star_rating title \\\n", "0 9.3 The Shawshank Redemption \n", "1 9.2 The Godfather \n", "2 9.1 The Godfather: Part II \n", "3 9.0 The Dark Knight \n", "4 8.9 Pulp Fiction \n", "5 8.9 12 Angry Men \n", "9 8.9 Fight Club \n", "11 8.8 Inception \n", "12 8.8 Star Wars: Episode V - The Empire Strikes Back \n", "13 8.8 Forrest Gump \n", "\n", " content_rating genre duration \\\n", "0 R Crime 142 \n", "1 R Crime 175 \n", "2 R Crime 200 \n", "3 PG-13 Action 152 \n", "4 R Crime 154 \n", "5 NOT RATED Drama 96 \n", "9 R Drama 139 \n", "11 PG-13 Action 148 \n", "12 PG Action 124 \n", "13 PG-13 Drama 142 \n", "\n", " actors_list \n", "0 [u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt... \n", "1 [u'Marlon Brando', u'Al Pacino', u'James Caan'] \n", "2 [u'Al Pacino', u'Robert De Niro', u'Robert Duv... \n", "3 [u'Christian Bale', u'Heath Ledger', u'Aaron E... \n", "4 [u'John Travolta', u'Uma Thurman', u'Samuel L.... \n", "5 [u'Henry Fonda', u'Lee J. Cobb', u'Martin Bals... \n", "9 [u'Brad Pitt', u'Edward Norton', u'Helena Bonh... \n", "11 [u'Leonardo DiCaprio', u'Joseph Gordon-Levitt'... \n", "12 [u'Mark Hamill', u'Harrison Ford', u'Carrie Fi... \n", "13 [u'Tom Hanks', u'Robin Wright', u'Gary Sinise'] " ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# use the '|' operator to specify that a row can match any of the three criteria\n", "movies[(movies.genre == 'Crime') | (movies.genre == 'Drama') | (movies.genre == 'Action')].head(10)\n", "\n", "# or equivalently, use the 'isin' method\n", "movies[movies.genre.isin(['Crime', 'Drama', 'Action'])].head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`isin`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.isin.html)\n", "\n", "[Back to top]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 10. Your pandas questions answered! ([video](https://www.youtube.com/watch?v=B-r9VuK80dk&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=10))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question:** When reading from a file, how do I read in only a subset of the columns?" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Index([u'City', u'Colors Reported', u'Shape Reported', u'State', u'Time'], dtype='object')" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read a dataset of UFO reports into a DataFrame, and check the columns\n", "ufo = pd.read_csv('http://bit.ly/uforeports')\n", "ufo.columns" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Index([u'City', u'Time'], dtype='object')" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# specify which columns to include by name\n", "ufo = pd.read_csv('http://bit.ly/uforeports', usecols=['City', 'State'])\n", "\n", "# or equivalently, specify columns by position\n", "ufo = pd.read_csv('http://bit.ly/uforeports', usecols=[0, 4])\n", "ufo.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question:** When reading from a file, how do I read in only a subset of the rows?" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CityColors ReportedShape ReportedStateTime
0IthacaNaNTRIANGLENY6/1/1930 22:00
1WillingboroNaNOTHERNJ6/30/1930 20:00
2HolyokeNaNOVALCO2/15/1931 14:00
\n", "
" ], "text/plain": [ " City Colors Reported Shape Reported State Time\n", "0 Ithaca NaN TRIANGLE NY 6/1/1930 22:00\n", "1 Willingboro NaN OTHER NJ 6/30/1930 20:00\n", "2 Holyoke NaN OVAL CO 2/15/1931 14:00" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# specify how many rows to read\n", "ufo = pd.read_csv('http://bit.ly/uforeports', nrows=3)\n", "ufo" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`read_csv`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question:** How do I iterate through a Series?" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Ithaca\n", "Willingboro\n", "Holyoke\n" ] } ], "source": [ "# Series are directly iterable (like a list)\n", "for c in ufo.City:\n", " print(c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question:** How do I iterate through a DataFrame?" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(0, 'Ithaca', 'NY')\n", "(1, 'Willingboro', 'NJ')\n", "(2, 'Holyoke', 'CO')\n" ] } ], "source": [ "# various methods are available to iterate through a DataFrame\n", "for index, row in ufo.iterrows():\n", " print(index, row.City, row.State)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`iterrows`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iterrows.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question:** How do I drop all non-numeric columns from a DataFrame?" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "country object\n", "beer_servings int64\n", "spirit_servings int64\n", "wine_servings int64\n", "total_litres_of_pure_alcohol float64\n", "continent object\n", "dtype: object" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read a dataset of alcohol consumption into a DataFrame, and check the data types\n", "drinks = pd.read_csv('http://bit.ly/drinksbycountry')\n", "drinks.dtypes" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "beer_servings int64\n", "spirit_servings int64\n", "wine_servings int64\n", "total_litres_of_pure_alcohol float64\n", "dtype: object" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# only include numeric columns in the DataFrame\n", "import numpy as np\n", "drinks.select_dtypes(include=[np.number]).dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`select_dtypes`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.select_dtypes.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question:** How do I know whether I should pass an argument as a string or a list?" ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
beer_servingsspirit_servingswine_servingstotal_litres_of_pure_alcohol
count193.000000193.000000193.000000193.000000
mean106.16062280.99481949.4507774.717098
std101.14310388.28431279.6975983.773298
min0.0000000.0000000.0000000.000000
25%20.0000004.0000001.0000001.300000
50%76.00000056.0000008.0000004.200000
75%188.000000128.00000059.0000007.200000
max376.000000438.000000370.00000014.400000
\n", "
" ], "text/plain": [ " beer_servings spirit_servings wine_servings \\\n", "count 193.000000 193.000000 193.000000 \n", "mean 106.160622 80.994819 49.450777 \n", "std 101.143103 88.284312 79.697598 \n", "min 0.000000 0.000000 0.000000 \n", "25% 20.000000 4.000000 1.000000 \n", "50% 76.000000 56.000000 8.000000 \n", "75% 188.000000 128.000000 59.000000 \n", "max 376.000000 438.000000 370.000000 \n", "\n", " total_litres_of_pure_alcohol \n", "count 193.000000 \n", "mean 4.717098 \n", "std 3.773298 \n", "min 0.000000 \n", "25% 1.300000 \n", "50% 4.200000 \n", "75% 7.200000 \n", "max 14.400000 " ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# describe all of the numeric columns\n", "drinks.describe()" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countrybeer_servingsspirit_servingswine_servingstotal_litres_of_pure_alcoholcontinent
count193193.000000193.000000193.000000193.000000193
unique193NaNNaNNaNNaN6
topLesothoNaNNaNNaNNaNAfrica
freq1NaNNaNNaNNaN53
meanNaN106.16062280.99481949.4507774.717098NaN
stdNaN101.14310388.28431279.6975983.773298NaN
minNaN0.0000000.0000000.0000000.000000NaN
25%NaN20.0000004.0000001.0000001.300000NaN
50%NaN76.00000056.0000008.0000004.200000NaN
75%NaN188.000000128.00000059.0000007.200000NaN
maxNaN376.000000438.000000370.00000014.400000NaN
\n", "
" ], "text/plain": [ " country beer_servings spirit_servings wine_servings \\\n", "count 193 193.000000 193.000000 193.000000 \n", "unique 193 NaN NaN NaN \n", "top Lesotho NaN NaN NaN \n", "freq 1 NaN NaN NaN \n", "mean NaN 106.160622 80.994819 49.450777 \n", "std NaN 101.143103 88.284312 79.697598 \n", "min NaN 0.000000 0.000000 0.000000 \n", "25% NaN 20.000000 4.000000 1.000000 \n", "50% NaN 76.000000 56.000000 8.000000 \n", "75% NaN 188.000000 128.000000 59.000000 \n", "max NaN 376.000000 438.000000 370.000000 \n", "\n", " total_litres_of_pure_alcohol continent \n", "count 193.000000 193 \n", "unique NaN 6 \n", "top NaN Africa \n", "freq NaN 53 \n", "mean 4.717098 NaN \n", "std 3.773298 NaN \n", "min 0.000000 NaN \n", "25% 1.300000 NaN \n", "50% 4.200000 NaN \n", "75% 7.200000 NaN \n", "max 14.400000 NaN " ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# pass the string 'all' to describe all columns\n", "drinks.describe(include='all')" ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countrytotal_litres_of_pure_alcoholcontinent
count193193.000000193
unique193NaN6
topLesothoNaNAfrica
freq1NaN53
meanNaN4.717098NaN
stdNaN3.773298NaN
minNaN0.000000NaN
25%NaN1.300000NaN
50%NaN4.200000NaN
75%NaN7.200000NaN
maxNaN14.400000NaN
\n", "
" ], "text/plain": [ " country total_litres_of_pure_alcohol continent\n", "count 193 193.000000 193\n", "unique 193 NaN 6\n", "top Lesotho NaN Africa\n", "freq 1 NaN 53\n", "mean NaN 4.717098 NaN\n", "std NaN 3.773298 NaN\n", "min NaN 0.000000 NaN\n", "25% NaN 1.300000 NaN\n", "50% NaN 4.200000 NaN\n", "75% NaN 7.200000 NaN\n", "max NaN 14.400000 NaN" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# pass a list of data types to only describe certain types\n", "drinks.describe(include=['object', 'float64'])" ] }, { "cell_type": "code", "execution_count": 59, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countrycontinent
count193193
unique1936
topLesothoAfrica
freq153
\n", "
" ], "text/plain": [ " country continent\n", "count 193 193\n", "unique 193 6\n", "top Lesotho Africa\n", "freq 1 53" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# pass a list even if you only want to describe a single data type\n", "drinks.describe(include=['object'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`describe`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.describe.html)\n", "\n", "[Back to top]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 11. How do I use the \"axis\" parameter in pandas? ([video](https://www.youtube.com/watch?v=PtO3t6ynH-8&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=11))" ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countrybeer_servingsspirit_servingswine_servingstotal_litres_of_pure_alcoholcontinent
0Afghanistan0000.0Asia
1Albania89132544.9Europe
2Algeria250140.7Africa
3Andorra24513831212.4Europe
4Angola21757455.9Africa
\n", "
" ], "text/plain": [ " country beer_servings spirit_servings wine_servings \\\n", "0 Afghanistan 0 0 0 \n", "1 Albania 89 132 54 \n", "2 Algeria 25 0 14 \n", "3 Andorra 245 138 312 \n", "4 Angola 217 57 45 \n", "\n", " total_litres_of_pure_alcohol continent \n", "0 0.0 Asia \n", "1 4.9 Europe \n", "2 0.7 Africa \n", "3 12.4 Europe \n", "4 5.9 Africa " ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read a dataset of alcohol consumption into a DataFrame\n", "drinks = pd.read_csv('http://bit.ly/drinksbycountry')\n", "drinks.head()" ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countrybeer_servingsspirit_servingswine_servingstotal_litres_of_pure_alcohol
0Afghanistan0000.0
1Albania89132544.9
2Algeria250140.7
3Andorra24513831212.4
4Angola21757455.9
\n", "
" ], "text/plain": [ " country beer_servings spirit_servings wine_servings \\\n", "0 Afghanistan 0 0 0 \n", "1 Albania 89 132 54 \n", "2 Algeria 25 0 14 \n", "3 Andorra 245 138 312 \n", "4 Angola 217 57 45 \n", "\n", " total_litres_of_pure_alcohol \n", "0 0.0 \n", "1 4.9 \n", "2 0.7 \n", "3 12.4 \n", "4 5.9 " ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# drop a column (temporarily)\n", "drinks.drop('continent', axis=1).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`drop`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop.html)" ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countrybeer_servingsspirit_servingswine_servingstotal_litres_of_pure_alcoholcontinent
0Afghanistan0000.0Asia
1Albania89132544.9Europe
3Andorra24513831212.4Europe
4Angola21757455.9Africa
5Antigua & Barbuda102128454.9North America
\n", "
" ], "text/plain": [ " country beer_servings spirit_servings wine_servings \\\n", "0 Afghanistan 0 0 0 \n", "1 Albania 89 132 54 \n", "3 Andorra 245 138 312 \n", "4 Angola 217 57 45 \n", "5 Antigua & Barbuda 102 128 45 \n", "\n", " total_litres_of_pure_alcohol continent \n", "0 0.0 Asia \n", "1 4.9 Europe \n", "3 12.4 Europe \n", "4 5.9 Africa \n", "5 4.9 North America " ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# drop a row (temporarily)\n", "drinks.drop(2, axis=0).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When **referring to rows or columns** with the axis parameter:\n", "\n", "- **axis 0** refers to rows\n", "- **axis 1** refers to columns" ] }, { "cell_type": "code", "execution_count": 63, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "beer_servings 106.160622\n", "spirit_servings 80.994819\n", "wine_servings 49.450777\n", "total_litres_of_pure_alcohol 4.717098\n", "dtype: float64" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# calculate the mean of each numeric column\n", "drinks.mean()\n", "\n", "# or equivalently, specify the axis explicitly\n", "drinks.mean(axis=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`mean`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.mean.html)" ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 0.000\n", "1 69.975\n", "2 9.925\n", "3 176.850\n", "4 81.225\n", "dtype: float64" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# calculate the mean of each row\n", "drinks.mean(axis=1).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When performing a **mathematical operation** with the axis parameter:\n", "\n", "- **axis 0** means the operation should \"move down\" the row axis\n", "- **axis 1** means the operation should \"move across\" the column axis" ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "beer_servings 106.160622\n", "spirit_servings 80.994819\n", "wine_servings 49.450777\n", "total_litres_of_pure_alcohol 4.717098\n", "dtype: float64" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 'index' is an alias for axis 0\n", "drinks.mean(axis='index')" ] }, { "cell_type": "code", "execution_count": 66, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 0.000\n", "1 69.975\n", "2 9.925\n", "3 176.850\n", "4 81.225\n", "dtype: float64" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 'columns' is an alias for axis 1\n", "drinks.mean(axis='columns').head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Back to top]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 12. How do I use string methods in pandas? ([video](https://www.youtube.com/watch?v=bofaC0IckHo&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=12))" ] }, { "cell_type": "code", "execution_count": 67, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
order_idquantityitem_namechoice_descriptionitem_price
011Chips and Fresh Tomato SalsaNaN$2.39
111Izze[Clementine]$3.39
211Nantucket Nectar[Apple]$3.39
311Chips and Tomatillo-Green Chili SalsaNaN$2.39
422Chicken Bowl[Tomatillo-Red Chili Salsa (Hot), [Black Beans...$16.98
\n", "
" ], "text/plain": [ " order_id quantity item_name \\\n", "0 1 1 Chips and Fresh Tomato Salsa \n", "1 1 1 Izze \n", "2 1 1 Nantucket Nectar \n", "3 1 1 Chips and Tomatillo-Green Chili Salsa \n", "4 2 2 Chicken Bowl \n", "\n", " choice_description item_price \n", "0 NaN $2.39 \n", "1 [Clementine] $3.39 \n", "2 [Apple] $3.39 \n", "3 NaN $2.39 \n", "4 [Tomatillo-Red Chili Salsa (Hot), [Black Beans... $16.98 " ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read a dataset of Chipotle orders into a DataFrame\n", "orders = pd.read_table('http://bit.ly/chiporders')\n", "orders.head()" ] }, { "cell_type": "code", "execution_count": 68, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'HELLO'" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# normal way to access string methods in Python\n", "'hello'.upper()" ] }, { "cell_type": "code", "execution_count": 69, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 CHIPS AND FRESH TOMATO SALSA\n", "1 IZZE\n", "2 NANTUCKET NECTAR\n", "3 CHIPS AND TOMATILLO-GREEN CHILI SALSA\n", "4 CHICKEN BOWL\n", "Name: item_name, dtype: object" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# string methods for pandas Series are accessed via 'str'\n", "orders.item_name.str.upper().head()" ] }, { "cell_type": "code", "execution_count": 70, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 False\n", "1 False\n", "2 False\n", "3 False\n", "4 True\n", "Name: item_name, dtype: bool" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# string method 'contains' checks for a substring and returns a boolean Series\n", "orders.item_name.str.contains('Chicken').head()" ] }, { "cell_type": "code", "execution_count": 71, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
order_idquantityitem_namechoice_descriptionitem_price
422Chicken Bowl[Tomatillo-Red Chili Salsa (Hot), [Black Beans...$16.98
531Chicken Bowl[Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou...$10.98
1161Chicken Crispy Tacos[Roasted Chili Corn Salsa, [Fajita Vegetables,...$8.75
1261Chicken Soft Tacos[Roasted Chili Corn Salsa, [Rice, Black Beans,...$8.75
1371Chicken Bowl[Fresh Tomato Salsa, [Fajita Vegetables, Rice,...$11.25
\n", "
" ], "text/plain": [ " order_id quantity item_name \\\n", "4 2 2 Chicken Bowl \n", "5 3 1 Chicken Bowl \n", "11 6 1 Chicken Crispy Tacos \n", "12 6 1 Chicken Soft Tacos \n", "13 7 1 Chicken Bowl \n", "\n", " choice_description item_price \n", "4 [Tomatillo-Red Chili Salsa (Hot), [Black Beans... $16.98 \n", "5 [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou... $10.98 \n", "11 [Roasted Chili Corn Salsa, [Fajita Vegetables,... $8.75 \n", "12 [Roasted Chili Corn Salsa, [Rice, Black Beans,... $8.75 \n", "13 [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... $11.25 " ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# use the boolean Series to filter the DataFrame\n", "orders[orders.item_name.str.contains('Chicken')].head()" ] }, { "cell_type": "code", "execution_count": 72, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 NaN\n", "1 Clementine\n", "2 Apple\n", "3 NaN\n", "4 Tomatillo-Red Chili Salsa (Hot), Black Beans, ...\n", "Name: choice_description, dtype: object" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# string methods can be chained together\n", "orders.choice_description.str.replace('[', '').str.replace(']', '').head()" ] }, { "cell_type": "code", "execution_count": 73, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 NaN\n", "1 Clementine\n", "2 Apple\n", "3 NaN\n", "4 Tomatillo-Red Chili Salsa (Hot), Black Beans, ...\n", "Name: choice_description, dtype: object" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# many pandas string methods support regular expressions (regex)\n", "orders.choice_description.str.replace('[\\[\\]]', '').head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[String handling section](http://pandas.pydata.org/pandas-docs/stable/api.html#string-handling) of the pandas API reference\n", "\n", "[Back to top]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 13. How do I change the data type of a pandas Series? ([video](https://www.youtube.com/watch?v=V0AWyzVMf54&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=13))" ] }, { "cell_type": "code", "execution_count": 74, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countrybeer_servingsspirit_servingswine_servingstotal_litres_of_pure_alcoholcontinent
0Afghanistan0000.0Asia
1Albania89132544.9Europe
2Algeria250140.7Africa
3Andorra24513831212.4Europe
4Angola21757455.9Africa
\n", "
" ], "text/plain": [ " country beer_servings spirit_servings wine_servings \\\n", "0 Afghanistan 0 0 0 \n", "1 Albania 89 132 54 \n", "2 Algeria 25 0 14 \n", "3 Andorra 245 138 312 \n", "4 Angola 217 57 45 \n", "\n", " total_litres_of_pure_alcohol continent \n", "0 0.0 Asia \n", "1 4.9 Europe \n", "2 0.7 Africa \n", "3 12.4 Europe \n", "4 5.9 Africa " ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read a dataset of alcohol consumption into a DataFrame\n", "drinks = pd.read_csv('http://bit.ly/drinksbycountry')\n", "drinks.head()" ] }, { "cell_type": "code", "execution_count": 75, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "country object\n", "beer_servings int64\n", "spirit_servings int64\n", "wine_servings int64\n", "total_litres_of_pure_alcohol float64\n", "continent object\n", "dtype: object" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# examine the data type of each Series\n", "drinks.dtypes" ] }, { "cell_type": "code", "execution_count": 76, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "country object\n", "beer_servings float64\n", "spirit_servings int64\n", "wine_servings int64\n", "total_litres_of_pure_alcohol float64\n", "continent object\n", "dtype: object" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# change the data type of an existing Series\n", "drinks['beer_servings'] = drinks.beer_servings.astype(float)\n", "drinks.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`astype`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.astype.html)" ] }, { "cell_type": "code", "execution_count": 77, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "country object\n", "beer_servings float64\n", "spirit_servings int64\n", "wine_servings int64\n", "total_litres_of_pure_alcohol float64\n", "continent object\n", "dtype: object" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# alternatively, change the data type of a Series while reading in a file\n", "drinks = pd.read_csv('http://bit.ly/drinksbycountry', dtype={'beer_servings':float})\n", "drinks.dtypes" ] }, { "cell_type": "code", "execution_count": 78, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
order_idquantityitem_namechoice_descriptionitem_price
011Chips and Fresh Tomato SalsaNaN$2.39
111Izze[Clementine]$3.39
211Nantucket Nectar[Apple]$3.39
311Chips and Tomatillo-Green Chili SalsaNaN$2.39
422Chicken Bowl[Tomatillo-Red Chili Salsa (Hot), [Black Beans...$16.98
\n", "
" ], "text/plain": [ " order_id quantity item_name \\\n", "0 1 1 Chips and Fresh Tomato Salsa \n", "1 1 1 Izze \n", "2 1 1 Nantucket Nectar \n", "3 1 1 Chips and Tomatillo-Green Chili Salsa \n", "4 2 2 Chicken Bowl \n", "\n", " choice_description item_price \n", "0 NaN $2.39 \n", "1 [Clementine] $3.39 \n", "2 [Apple] $3.39 \n", "3 NaN $2.39 \n", "4 [Tomatillo-Red Chili Salsa (Hot), [Black Beans... $16.98 " ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read a dataset of Chipotle orders into a DataFrame\n", "orders = pd.read_table('http://bit.ly/chiporders')\n", "orders.head()" ] }, { "cell_type": "code", "execution_count": 79, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "order_id int64\n", "quantity int64\n", "item_name object\n", "choice_description object\n", "item_price object\n", "dtype: object" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# examine the data type of each Series\n", "orders.dtypes" ] }, { "cell_type": "code", "execution_count": 80, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "7.464335785374397" ] }, "execution_count": 80, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# convert a string to a number in order to do math\n", "orders.item_price.str.replace('$', '').astype(float).mean()" ] }, { "cell_type": "code", "execution_count": 81, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 False\n", "1 False\n", "2 False\n", "3 False\n", "4 True\n", "Name: item_name, dtype: bool" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# string method 'contains' checks for a substring and returns a boolean Series\n", "orders.item_name.str.contains('Chicken').head()" ] }, { "cell_type": "code", "execution_count": 82, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 0\n", "1 0\n", "2 0\n", "3 0\n", "4 1\n", "Name: item_name, dtype: int32" ] }, "execution_count": 82, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# convert a boolean Series to an integer (False = 0, True = 1)\n", "orders.item_name.str.contains('Chicken').astype(int).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Back to top]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 14. When should I use a \"groupby\" in pandas? ([video](https://www.youtube.com/watch?v=qy0fDqoMJx8&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=14))" ] }, { "cell_type": "code", "execution_count": 83, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countrybeer_servingsspirit_servingswine_servingstotal_litres_of_pure_alcoholcontinent
0Afghanistan0000.0Asia
1Albania89132544.9Europe
2Algeria250140.7Africa
3Andorra24513831212.4Europe
4Angola21757455.9Africa
\n", "
" ], "text/plain": [ " country beer_servings spirit_servings wine_servings \\\n", "0 Afghanistan 0 0 0 \n", "1 Albania 89 132 54 \n", "2 Algeria 25 0 14 \n", "3 Andorra 245 138 312 \n", "4 Angola 217 57 45 \n", "\n", " total_litres_of_pure_alcohol continent \n", "0 0.0 Asia \n", "1 4.9 Europe \n", "2 0.7 Africa \n", "3 12.4 Europe \n", "4 5.9 Africa " ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read a dataset of alcohol consumption into a DataFrame\n", "drinks = pd.read_csv('http://bit.ly/drinksbycountry')\n", "drinks.head()" ] }, { "cell_type": "code", "execution_count": 84, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "106.16062176165804" ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# calculate the mean beer servings across the entire dataset\n", "drinks.beer_servings.mean()" ] }, { "cell_type": "code", "execution_count": 85, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "61.471698113207545" ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# calculate the mean beer servings just for countries in Africa\n", "drinks[drinks.continent=='Africa'].beer_servings.mean()" ] }, { "cell_type": "code", "execution_count": 86, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "continent\n", "Africa 61.471698\n", "Asia 37.045455\n", "Europe 193.777778\n", "North America 145.434783\n", "Oceania 89.687500\n", "South America 175.083333\n", "Name: beer_servings, dtype: float64" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# calculate the mean beer servings for each continent\n", "drinks.groupby('continent').beer_servings.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`groupby`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html)" ] }, { "cell_type": "code", "execution_count": 87, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "continent\n", "Africa 376\n", "Asia 247\n", "Europe 361\n", "North America 285\n", "Oceania 306\n", "South America 333\n", "Name: beer_servings, dtype: int64" ] }, "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# other aggregation functions (such as 'max') can also be used with groupby\n", "drinks.groupby('continent').beer_servings.max()" ] }, { "cell_type": "code", "execution_count": 88, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countmeanminmax
continent
Africa5361.4716980376
Asia4437.0454550247
Europe45193.7777780361
North America23145.4347831285
Oceania1689.6875000306
South America12175.08333393333
\n", "
" ], "text/plain": [ " count mean min max\n", "continent \n", "Africa 53 61.471698 0 376\n", "Asia 44 37.045455 0 247\n", "Europe 45 193.777778 0 361\n", "North America 23 145.434783 1 285\n", "Oceania 16 89.687500 0 306\n", "South America 12 175.083333 93 333" ] }, "execution_count": 88, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# multiple aggregation functions can be applied simultaneously\n", "drinks.groupby('continent').beer_servings.agg(['count', 'mean', 'min', 'max'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`agg`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.DataFrameGroupBy.agg.html)" ] }, { "cell_type": "code", "execution_count": 89, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
beer_servingsspirit_servingswine_servingstotal_litres_of_pure_alcohol
continent
Africa61.47169816.33962316.2641513.007547
Asia37.04545560.8409099.0681822.170455
Europe193.777778132.555556142.2222228.617778
North America145.434783165.73913024.5217395.995652
Oceania89.68750058.43750035.6250003.381250
South America175.083333114.75000062.4166676.308333
\n", "
" ], "text/plain": [ " beer_servings spirit_servings wine_servings \\\n", "continent \n", "Africa 61.471698 16.339623 16.264151 \n", "Asia 37.045455 60.840909 9.068182 \n", "Europe 193.777778 132.555556 142.222222 \n", "North America 145.434783 165.739130 24.521739 \n", "Oceania 89.687500 58.437500 35.625000 \n", "South America 175.083333 114.750000 62.416667 \n", "\n", " total_litres_of_pure_alcohol \n", "continent \n", "Africa 3.007547 \n", "Asia 2.170455 \n", "Europe 8.617778 \n", "North America 5.995652 \n", "Oceania 3.381250 \n", "South America 6.308333 " ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# specifying a column to which the aggregation function should be applied is not required\n", "drinks.groupby('continent').mean()" ] }, { "cell_type": "code", "execution_count": 90, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# allow plots to appear in the notebook\n", "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": 91, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 91, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAFOCAYAAACWguaYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xl8VNX5x/HPE2RRIZAASQiEkOKO/iiCuKFGrVoRRa3s\ni6C1dcECVqsWkdC6r63+Wi1KFVQEbdWC8kNaFbWoRVS0iqCCBAgSZU2gIkue3x8zmSaYkIRMcieX\n7/v1mhczd33uvcyTM+eee465OyIi0vAlBR2AiIjEhxK6iEhIKKGLiISEErqISEgooYuIhIQSuohI\nSFSZ0M2sg5m9amafmNm/zewX0ekpZjbXzJaa2ctm1rLMOjea2edm9qmZnVmXByAiIhFWVTt0M8sA\nMtx9kZk1B94D+gIjgfXufpeZXQ+kuPsNZnYE8BRwDNAB+AdwsKvBu4hInaqyhO7ua919UfT9FuBT\nIom6LzAlutgU4Pzo+/OA6e6+091XAJ8DPeMct4iI7KZGdehm1gn4IfAOkO7uhRBJ+kBadLH2wKoy\nqxVEp4mISB3ar7oLRqtb/gKMdvctZrZ7FUqNqlQqWF9ERKrB3a2i6dUqoZvZfkSS+RPu/rfo5EIz\nS4/OzwC+jk4vALLKrN4hOq2ioOrtNWHChHrdX32/dHwN+xXm4wvzsQVxfHtS3SqXPwOL3f33ZabN\nBEZE318M/K3M9IFm1sTMcoCDgAXV3I+IiOylKqtczOxEYAjwbzP7gEjVyq+BO4FnzOwSIB/oD+Du\ni83sGWAxsAO40qv6syIiIrVWZUJ39/lAo0pm/6iSdW4Hbq9FXHGXm5sbdAh1SsfXsIX5+MJ8bJBY\nx1dlO/Q627GZCu4iIjVkZnglN0Wr3cqlvnTq1In8/PygwxBpcLKzs1mxYkXQYUiAEq6EHv3rE0BE\nIg2bvjv7hj2V0NU5l4hISCihi4iEhBK6iEhIKKGLiISEEnoN5eTk8OqrrwYdRp274ooruPXWW4MO\nQ0RqIOGaLVYkI6MThYV115QxPT2btWtX1Nn2G6KHHnoo6BBEpIYaREKPJPO6a45VWFhhC6DA7Nq1\ni0aNKns4Nz7cHbPEOm4RqR1VueyFBQsW0KVLF1q3bs2ll17K9u3bAXjxxRfp1q0bKSkp9OrVi3//\n+9+xdb766isuuugi0tLS6Ny5Mw8++GBs3sSJE+nXrx/Dhg2jVatWTJky5Xv7LPXuu+9yzDHH0LJl\nS9q1a8e1114bm/fOO+9w4oknkpKSQrdu3Xj99ddj80499VRuuukmevXqxYEHHsjdd9/NMcccU27b\n999/P+efHxmnZOTIkdx8880AvP7662RlZXHfffeRnp5O+/btefzxx2PrbdiwgXPPPZeWLVty7LHH\nMn78eE466aTY/LFjx5Kenk7Lli3p2rUrixcvrsnpFpHqCqrLyciuv6+i6YCD1+Gr4lgq0qlTJz/q\nqKO8oKDAN27c6CeeeKKPHz/eP/jgA09LS/N3333XS0pKfOrUqd6pUyffvn27l5SUePfu3f2WW27x\nnTt3+pdffumdO3f2uXPnurt7Xl6eN2nSxGfOnOnu7tu2bat0/8cff7w/+eST7u6+detW/9e//uXu\n7gUFBd66dWufM2eOu7v/4x//8NatW/u6devc3T03N9ezs7P9008/9V27dvnmzZs9OTnZv/jii9i2\njznmGH/mmWfc3X3EiBE+fvx4d3efN2+e77fffp6Xl+c7d+702bNn+wEHHOCbNm1yd/cBAwb4oEGD\nfNu2bb548WLPysryk046yd3dX375Ze/Ro4cXFRW5u/uSJUt87dq11T7fUn01+X8sDVf0OleYV1VC\n3wtXX301mZmZtGrVinHjxjFt2jQmTZrE5ZdfTo8ePTAzhg0bRtOmTXnnnXd49913WbduHePGjaNR\no0Z06tSJn/70p0yfPj22zeOPP55zzz0XgKZNm1a67yZNmvDFF1+wfv16DjjgAHr2jIzu9+STT3LO\nOedw1llnAXD66afTo0cPZs+eHVt3xIgRHHbYYSQlJZGcnEzfvn15+umnAfj8889ZunRpLIaK9jt+\n/HgaNWrE2WefTfPmzVm6dCklJSU899xz/OY3v6Fp06YcfvjhXHzxxbH1GjduTHFxMYsXL8bdOfTQ\nQ0lPT9/LMy8ie6KEvhc6dOgQe5+dnc2aNWtYuXIl99xzD6mpqaSmppKSksLq1atZs2YN+fn5FBQU\nlJt3++238/XXX8e2k5WVVdGuvmfy5MksXbqUww47jGOPPZaXXnoJgPz8fJ555ply+5g/fz5r166t\ndB+DBg2K
JfRp06Zx/vnn06xZswr327p1a5KS/vvf5YADDmDLli1888037Nq1q9w5KbufU089lVGj\nRnHVVVeRnp7O5ZdfzpYtW6p1rCJSM0roe2HVqv8Ombpy5Urat29PVlYWN910Exs2bGDDhg1s3LiR\nLVu2MGDAALKysvjBD35Qbt7mzZuZNWtWbDvVvUHZuXNnpk2bxjfffMOvfvUrLrroIr799luysrIY\nPnx4uX0UFxdz3XXXVbqPM844g2+++YYPP/yQ6dOnM3jw4Bqfi7Zt27LffvuxevXqCs8PwKhRo1i4\ncCGLFy9m6dKl3H333TXej4hUTQl9L/zhD3+goKCADRs2cOuttzJw4EB++tOf8tBDD7FgQWRwpq1b\ntzJ79my2bt1Kz549adGiBXfddRfbtm1j165dfPLJJyxcuLDG+37qqadYt24dAC1btsTMSEpKYujQ\nocyaNYu5c+dSUlLCtm3beP3111mzZk2l29pvv/3o168f1113HRs3buSMM86ocTxJSUlceOGF5OXl\n8e2337JkyRKmTp0am79w4UIWLFjAzp072X///WnWrFm5kr6IxE+D+Galp2cDVmevyParx8wYPHgw\nZ555JgcddBAHH3ww48aNo3v37jz66KOMGjWK1NRUDjnkkFhrlaSkJF588UUWLVpETk4OaWlpXHbZ\nZRQVFdX4XMyZM4cuXbqQnJzM2LFjmTFjBk2bNqVDhw787W9/47bbbqNt27ZkZ2dzzz33UFJSEou7\nIoMGDeKVV16hf//+NUq0Zbf34IMPsmnTJtq1a8fFF1/M4MGDY/cBioqKuOyyy0hNTSUnJ4c2bdqU\n+9UgIvGj7nMl7m644QYKCwt57LHHgg5ln6Lvzr5B3edKnVq6dGmszf2CBQuYPHkyF154YcBRiex7\nlNATUO/evWnRogXJyckkJyfH3t9xxx1Bh1ah4uJiLrzwQpo3b86gQYO47rrrKm3+KCJ1R1UuIiGh\n786+QVUuIiL7ACV0EZGQUEIXESHSTbeZVfjKyOgUdHjVojp0kZDQd6d2Is9WVHb+Eufcqg5dRGQf\noIReD4488kjeeOONSufffvvt/OxnP6vHiGom0eMTkYgGUeWS0SGDwoLCOoslvX06a1evrXrBepCf\nn09OTg47d+5UnydSI6pyqZ0wVLk0jCHoCgohrw63n1d3fyyqsvtwcx4dGq4+//OUlJToj4dICOhb\nXEN33nknHTp0IDk5mcMPP5zXXnstNoTcwIEDSU5OpkePHnz00UexdXJycnj11VeBioebmzhxIsOH\nDwfglFNOAaBVq1YkJyfzr3/9q9JYli1bRm5uLq1atSItLY1BgwbF5i1ZsoQzzzyT1q1bc/jhh/Ps\ns8/G5o0cOZIrr7ySc845hxYtWnDPPffQrl27cn9Enn/+eX74wx/GYh42bBgQ+QWRlJTE1KlTyc7O\nJi0tjdtuuy223rZt27j44otJTU2lS5cu3H333eX6R6/o/IlIfCih18Bnn33GH/7wB9577z2Kiop4\n+eWX6dSpEwAzZ85kwIABbNy4kUGDBnH++eeza9euCrczc+ZM+vfvz6ZNm77XB3lpXXtRURFFRUUc\ne+yxlcYzfvx4zjrrLDZt2sTq1au5+uqrAfjPf/7DmWeeydChQ1m3bh3Tp0/nyiuvZMmSJbF1n376\nacaPH09xcTGjR4+mefPmsT86pfOHDBkS+7x7b43z58/n888/5x//+Ae/+c1vWLp0KQB5eXmsXLmS\nFStW8Pe//50nn3wytu6ezp+I1J4Seg00atSI7du38/HHH7Nz5046duxITk4OAN27d+eCCy6gUaNG\nXHPNNWzbto133nmnwu2UHW6ushGCqlPl0rhx49hoSE2aNOGEE04AIoNV5+TkMHz4cMyMrl278pOf\n/KRcKb1v374cd9xxQGTIu4EDBzJt2jQg0jfL7Nmzy5X4yzIz8vLyaNKkCf/zP/9D165d+fDDDwF4\n9tlnGTduHMnJyWRmZvKLX/yiWudPRGpPCb0GOnfuzO9+9zvy8vJIS0tj8ODBfPXVV0D5YdfMjA4d\nOlQ6uER1h5uryt13301JSQk9e/bkqKOOinVXm5+fzzvvvFNuOLpp06ZRWPjfewW7xzB48GCef/55\nduzYwXPPPUf37t3LDSu3u7LjgpYORwewZs2aSoejK3v+0tPTy50/Eak9JfQaGjhwIG+++SYrV64E\n4PrrrwfKD7vm7qxevZr27dtXuI09DTdX3aHoANLS0pg0aRIFBQU8/PDDXHnllSxfvpysrCxyc3PL\nDUdXVFTE//7v/1a6n8MPP5zs7Gxmz57N008/vVfD0QG0a9eu3HB0peepVOn5y8/PByJ9p4tIfCih\n18Bnn33Ga6+9xvbt22nSpAn7779/rIXKe++9xwsvvMCuXbu4//77adas2R7rvyvTtm1bkpKSWLZs\nWZXL/uUvf6GgoACI3ERNSkoiKSmJPn368Nlnn/Hkk0+yc+dOduzYwcKFC2P13JUZPHgwv//973nz\nzTfp169fpcvtqTqof//+3H777WzatImCggL+8Ic/xOZVdP7UukYkfhpEs8X09ul12rQwvX161QsB\n3333HTfccANLliyhcePGnHDCCUyaNIk//elP9O3blxkzZjB8+HAOPvhgnnvuuViyr0mpe//992fc\nuHGceOKJ7Ny5kzlz5tCzZ88Kl3333XcZM2YMRUVFpKen88ADD8RuMs6dO5exY8dyzTXX4O507dqV\n++67b4/7HjhwIDfeeCO9e/cmNTW10uV2P56yn2+++WYuv/xycnJyyMzMZMiQIbGqoMrOn4jER4N4\nsCjRTZw4kWXLlpUbHFkiHn74YWbMmKHmifWgIX53EkkYHizS712Jq7Vr1/LWW2/h7ixdupR7771X\nw9GJ1BMl9AR3xRVXVDgc3ZVXXhl0aBXavn07P//5z0lOTuZHP/oRF1xwAVdccUXQYYnsE1TlIhIS\n+u7UjqpcREQkYSihi4iEhBK6iEhIKKGLiISEErqISEgoodeBFi1asGLFiqDDqLXevXvzxBNPBB2G\niFRTg2i22Ckjg/zCunv0Pzs9nRVrE2MIOpG9pWaLtbNPNFs0s8lmVmhmH5WZNsHMVpvZ+9HXj8vM\nu9HMPjezT83szHgcQH5hIQ519qrLPxaJqrLBN0Sk4apOlctjwFkVTL/P3Y+OvuYAmNnhQH/gcOBs\n4I9Wk56pEtzjjz/OeeedF/t88MEHM2DAgNjnjh078uGHH5KUlMTy5cuByHBvo0aNok+fPiQnJ3P8\n8cfz5ZdfxtbZ01BxlZk9ezZdunQhOTmZrKyscp1uvfjii3Tr1o2UlBR69erFv//979i8nJwc7rrr\nLrp27Urz5s256667vter4ujRoxkzZgwAp556Kn/+858BmDJlCieddBLXXXcdqampdO7cmTlz5sTW\nW7FiBaeccgotW7bkzDPPZNSoUbFh67777juGDRtGmzZtSElJ4dhjj
+Wbb76p+oSLSM24e5UvIBv4\nqMznCcAvK1juBuD6Mp//Dzi2km16RSqaDrjX4auyWHa3fPlyT0lJcXf3NWvWeHZ2tmdlZbm7+7Jl\nyzw1NdXd3c3Mly1b5u7uI0aM8DZt2vjChQt9165dPmTIEB80aJC7u2/dutWzsrJ8ypQpXlJS4osW\nLfK2bdv6p59+usc42rVr5/Pnz3d3902bNvkHH3zg7u7vv/++p6Wl+bvvvuslJSU+depU79Spk2/f\nvt3d3Tt16uTdunXzgoIC37Ztm+fn5/uBBx7oW7ZscXf3Xbt2ebt27XzBggXu7p6bm+uTJ092d/fH\nH3/cmzRp4pMnT/aSkhJ/6KGHPDMzMxbT8ccf77/61a98x44d/s9//tOTk5N92LBh7u7+pz/9yc87\n7zzftm2bl5SU+Pvvv+/FxcXVOudSfdX9fywVA/aQJhLn3EZjqTBX1+am6CgzW2Rmj5pZy+i09sCq\nMssURKeFQk5ODi1atGDRokW88cYbnHXWWWRmZvLZZ5/xxhtvcNJJJ1W43gUXXED37t1JSkpiyJAh\nLFq0CKh4qLgLL7ywylJ6kyZN+OSTTyguLqZly5axwZwfeeQRLr/8cnr06IGZMWzYMJo2bVpuKLzR\no0eTmZlJ06ZN6dixI0cffTTPP/88AK+88goHHnggxxxzTIX7zc7O5pJLLsHMuPjii/nqq6/4+uuv\nWbVqFQsXLmTixInst99+nHjiieV+yTRu3Jj169fz2WefYWZ069aN5s2bV//Ei0i17G1/6H8EfuPu\nbma3APcCP63pRvLy8mLvc3Nzyc3N3ctw6s8pp5zCa6+9xhdffEFubi4pKSnMmzePt99+m1NOOaXC\ndTIyMmLvyw7XVnaoOIj8Wtq1a1esqqIyf/3rX/ntb3/L9ddfT9euXbn99ts57rjjyM/PZ+rUqTz4\n4IOx7e3YsaPcUHi7Dys3aNAgnn76aYYOHVrlSEVlj2P//fcHYMuWLXzzzTekpqaWGx81KysrNnLR\nsGHDWL16NQMHDmTz5s0MHTqUW2+9NdZfvIhUbt68ecybN69ay+5VQnf3shWgjwCzou8LgLKDVXaI\nTqtQ2YTeUJx88snMmjWLFStWMG7cOFq2bMlTTz3FO++8U25A5OooHSru5ZdfrtF63bt3j42O9OCD\nD9K/f39WrlxJVlYW48aN48Ybb6x03d1vafTr149rr72WgoICnn/++UoHtt6Tdu3asWHDBrZt2xZL\n6qtWrYrta7/99mP8+PGMHz+elStXcvbZZ3PooYcycuTIGu9LZF+ze2F34sSJlS5b3SoXi74iH8wy\nysy7EPg4+n4mMNDMmphZDnAQsKCa+2gQSkvo3377LZmZmZx00knMmTOH9evXx6o+qquyoeKWLFlS\n6To7duxg2rRpFBUV0ahRI1q0aBEr6V522WU8/PDDLFgQOeVbt25l9uzZbN26tdLttWnThlNOOYWR\nI0fygx/8gEMPPbRGxwCRm8E9evQgLy+PHTt28PbbbzNr1qzY/Hnz5vHxxx9TUlJC8+bNady4sYae\nE6kD1Wm2OA14CzjEzFaa2UjgLjP7yMwWAacAYwHcfTHwDLAYmA1cGa3Er5Xs9PTYX5S6eGWnV28I\nOoi0bGnRogUnn3wyEHmIqHPnzvTq1StWIq1uw57mzZszd+5cpk+fTmZmJpmZmdxwww1s3759j+s9\n8cQT5OTk0KpVKyZNmsS0adOASMn9kUceYdSoUaSmpnLIIYcwZcqU2HqVxTV48GBeeeUVhgwZUm56\nVcdRdv5TTz3FW2+9RZs2bbj55psZOHAgTZs2BSKDXlx00UW0bNmSLl26cOqpp1ZZrSQiNdcgHiyS\nhmfgwIEcfvjhTJgwIehQ9hn67tTOPvFgkUh1LFy4kOXLl+PuzJkzh5kzZ3L++ecHHZbIPkUJPUEd\neeSRsWHnyg499/TTTwcdWoXWrl1Lbm4uLVq0YMyYMTz88MN07do16LBE9imqchEJCX13akdVLiIi\nkjCU0EVEQkIJXUQkJJTQRURCQgldRCQklNATyMiRI7n55pv3uMzrr79OVtZ/u8s58sgjeeONN+o6\ntL329ddfc/LJJ9OyZUuuu+66oMOpldI+4Wtj9+tX3/uXcGsQCT2jY0fMrM5eGR07VjuWnJwcXn31\n1bgvWxNlH7n/+OOPY90QTJw4keHDh8d9f7UxadIk0tLS2Lx5M3fffXfQ4dRaPMZrqc02QjRejNSB\nve0+t14VrloFr71Wd9s/9dQ623aicfd6TQr5+fkcccQR9bY/iAyvp655ZV/UIEroiWL48OGsXLmS\nc889l+TkZO655x5mzZrFkUceSWpqKqeddhpLly6tdFmA/v37065dO1JSUsjNzWXx4sW1iqn0V8DL\nL7/MbbfdxowZM2jRogXdunUDIsPI3XTTTfTq1YsDDzyQL7/8kqKiIi699FIyMzPJyspi/PjxsYcm\nli1bRm5uLq1atSItLY1BgwZVGcNbb71Fz549Y8PLvf3220CkCmnKlCnceeedJCcn7/HXysSJE+nX\nrx8DBw4kOTmZHj168NFHsWFsyw3rV7rt0uqp0mqMu+66i3bt2nHJJZcAex6OrzJ33nknBx10EMnJ\nyRx55JG88MILlS77ySefxIYPbNeuHXfccQcA27dvZ8yYMbRv354OHTowduxYduzYEVvP3bnvvvtI\nT0+nffv2PP7447F5RUVFDB8+nLS0NHJycrj11lurjFmklBJ6DUydOpWOHTvy4osvUlRURN++fRk0\naBAPPPAA33zzDWeffTZ9+vRh586d31v22muvBaB3794sW7aMr7/+mqOPPvp7PRzurbPOOotf//rX\nDBgwgOLiYj744IPYvCeffJJHH32U4uJiOnbsyMUXX0zTpk1Zvnw5H3zwAX//+9959NFHARg/fjxn\nnXUWmzZtYvXq1Vx99dV73O/GjRvp06cPY8aMYf369YwdO5ZzzjmHjRs38thjjzFkyBCuv/56ioqK\nOO200/a4rZkzZzJgwAA2btzIoEGDOP/882ODWVf1q2Lt2rVs2rSJlStXMmnSJD744AMuvfRSHnnk\nETZs2MDPf/5zzjvvvHKJtSIHHXQQ8+fPp6ioiAkTJjB06FAKKxhEfMuWLZxxxhn07t2br776ii++\n+ILTTz8dgFtuuYUFCxbw0Ucf8eGHH7JgwQJuueWWcrEWFxezZs0aHn30Ua666io2b94MwKhRoygu\nLmbFihXMmzePqVOn8thjj+0xZpFSSuh7obQ0O2PGDPr06cNpp51Go0aNuPbaa/n222956623vrds\nqREjRnDAAQfQuHFjbr75Zj788EOKi4vrNN4RI0Zw2GGHkZSUxIYNG/i///s/7r//fpo1a0abNm0Y\nM2YM06dPByLDxeXn51NQUECTJk044YQT9rjtl156iUMOOYTBgweTlJTEwIEDOeyww8r1h15d3bt3\n54ILLqBRo0Zcc801bNu2
LTbgRlWPXTdq1IiJEyfSuHFjmjZtWq3h+Cryk5/8hPRod8r9+vXj4IMP\njvUvX9aLL75Iu3btGDNmDE2aNCk3dN+0adOYMGECrVu3pnXr1kyYMIEnnngitm6TJk0YP348jRo1\n4uyzz6Z58+YsXbqUkpISZsyYwR133MEBBxxAdnY2v/zlL8utK7InSui1sGbNGrKzs2OfzYysrCwK\nCioepKmkpIQbbriBgw46iFatWpGTk4OZsW7dujqNs2yrivz8fHbs2EG7du1ITU0lJSWFyy+/nG++\niQxCdffdd1NSUkLPnj056qijqiwd7n4OIDL2aGXnoLpxmhkdOnQoN3zenrRt25bGjRvHPufn53Pv\nvfeSmpoaO87Vq1dXub2pU6fGqmlSUlL45JNPKrw+q1atonPnzhVuY82aNXQsc6M9Ozu73H5bt25d\nboCP0mEJ161bx86dO7+37t6cS9k3KaHXUNmf/pmZmeTn55ebv2rVqti4nbtXE0ybNo1Zs2bx6quv\nsmnTJlasWBEbrTvesVU2PSsri2bNmrF+/Xo2bNjAxo0b2bRpU6y+Oi0tjUmTJlFQUMDDDz/MlVde\nWa7ueneZmZmsWLGi3LSVK1fSvn3NxwZfteq/44u7O6tXr45t54ADDuA///lPbP7atWsrPcbS4xw3\nbhwbNmyIHeeWLVsYMGBApftfuXIlP/vZz/jjH//Ixo0b2bhxI126dKnw+mRlZbFs2bIKt9O+ffty\n/y/y8/PJzMzcw5FHtGnTJvYLqey6e3MuZd+khF5D6enpsQTXv39/XnrpJV577TV27tzJPffcQ7Nm\nzTj++OOByKDKZZNhcXExTZs2JSUlha1bt3LjjTfGtcVJenp67I9EZTIyMjjzzDMZO3YsxcXFuDvL\nly+PtWX/y1/+EisRtmrViqSkpD0OF9e7d28+//xzpk+fzq5du5gxYwaffvopffr0qXH87733Xmys\n1NIqoWOPPRaAbt26MW3aNEpKSpgzZw6vv/76Hre1N8Pxbd26laSkJNq0aUNJSQmPPfYYH3/8cYXL\n9unTh7Vr1/LAAw+wfft2tmzZEtvXwIEDueWWW1i3bh3r1q3jt7/9bbVGaEpKSqJfv36MGzeOLVu2\nkJ+fz/3336/RnaT6SkuI9f2K7Pr7KpqenpXlRPq1rJNXelZWhbFU5G9/+5t37NjRU1JS/N577/UX\nXnjBjzjiCG/VqpXn5ub64sWLK11269at3rdvX2/RooV36tTJn3jiCU9KSvJly5a5u/uIESN8/Pjx\ne9z/vHnzPKtMvDk5Of7KK6+4u/v69eu9V69enpKS4t27d3d391NPPdUnT55cbhtFRUV+xRVXeIcO\nHbxVq1Z+9NFH+4wZM9zd/Ve/+pW3b9/eW7Ro4QcddJA/+uijVZ6T+fPne/fu3b1Vq1beo0cPf+ut\nt2LzRo4cWeUxubvn5eV5v379fODAgd6iRQs/+uijfdGiRbH5Cxcu9C5dunhycrIPHz7cBw8eHNvu\n7uek1Msvv+zHHHOMp6SkeGZmpvfv39+3bNmyxzhuuukmT01N9bZt2/ovf/lLz83NjZ2/xx9/3E86\n6aTYsp988omffvrpnpKS4u3atfM777zT3d23bdvmo0eP9nbt2nlmZqaPGTPGv/vuu0pjLXsNN27c\n6EOHDvW2bdt6x44d/ZZbboktt/v+d1fZd0qqJ5IPvJJX4pzbaCwV5lX1hy4JYeLEiSxbtoypU6cG\nHUqDpe9O7ag/dBERSRhK6Ano9ttvjw05V/Z1zjnnBBLPP//5z+/FU/q5Jnr37l1uO6Xv77jjjnp7\nenXVqlV/p8bXAAAWIUlEQVSVHsvq1avrJQaRuqIqF5GQ0HendlTlIiIiCUMJXUQkJJTQRURCIuG6\nz83OzlafzyJ7YfcuGGTfk3A3RSXcqrrxRF4ls/L2uFZC3bDa6+NLkGPYV+mmqIiIJAwldBGRkFBC\nFxEJCSV0EZGQUEIXEQkJJXQRkZBQQhcRCQkldBGRqjSKtP+u6JXRISPo6GIS7klREZGEs4tKHwor\nzCusz0j2SCV0EZGQUEIXEQkJJXQRkZBQQhcRCQkldBGRkFBCFxEJCSV0EZGQUEIXEQkJJXQRkZBQ\nQhcRCQkldBGRkKgyoZvZZDMrNLOPykxLMbO5ZrbUzF42s5Zl5t1oZp+b2admdmZdBS4iIuVVp4T+\nGHDWbtNuAP7h7ocCrwI3ApjZEUB/4HDgbOCPFhlKW0RE6liVCd3d/wls3G1yX2BK9P0U4Pzo+/OA\n6e6+091XAJ8DPeMTqoiI7Mne1qGnuXshgLuvBdKi09sDq8osVxCdJiIidSxe/aH73qyUl5cXe5+b\nm0tubm6cwhERCYd58+Yxb968ai27twm90MzS3b3QzDKAr6PTC4CsMst1iE6rUNmELiIi37d7YXfi\nxImVLlvdKheLvkrNBEZE318M/K3M9IFm1sTMcoCDgAXV3IeIiNRClSV0M5sG5AKtzWwlMAG4A3jW\nzC4B8om0bMHdF5vZM8BiYAdwpbvvVXWMiIjUTJUJ3d0HVzLrR5Usfztwe22CEhGRmtOToiIiIaGE\nLiISEkroIiIhoYQuIhISSugiIiGhhC4iEhJK6CJSLRkZnTCzCl8ZGZ2CDk+IX18uIhJyhYX5VNZt\nU2GheslOBCqhi4iEhBK6iEhIKKGLiISEErqISEgooYuIhIQSuohISCihi4iEhBK6iEhIKKGLiISE\nErqISEgooYuIhIQSuohISCihi4iEhBK6iEhIKKGLiISEErqISEgooYuIhIQSuohISCihi4iEhBK6\niEhIKKGLiISEErqISEgooYuIhIQSuohISCihi4iEhBK6iEhIKKGLiISEErqISEgooYuIhIQSuohI\nLTQFzKzCV6eMjHqNpcEl9IyMTpWevIyMTkGHJyL7mO8Ar+SVX1hYr7E0uIReWJhPZacvMi+8Mjpk\nVP7HrEP9lgREJPHsF3QAUn2FBYWQV8m8vPotCYhI4mlwJXQRSUCNKq9H1q/H+qMSuojU3i706zEB\nqIQuIhISSugiIiGhhC4iEhJK6CIiIVGrm6JmtgLYDJQAO9y9p5mlADOAbGAF0N/dN9cyThERqUJt\nS+glQK67d3P3ntFpNwD/cPdDgVeBG2u5DxERqYbaJnSrYBt9gSnR91OA82u5DxERqYbaJnQH/m5m\n75rZT6PT0t29EMDd1wJptdyHiIhUQ20fLDrR3b8ys7bAXDNbSiTJl7X755i8vLzY+9zcXHJzc2sZ\njohIuMybN4958+ZVa9laJXR3/yr67zdm9gLQEyg0s3R3LzSzDODrytYvm9BFROT7di/sTpw4sdJl\n97rKxcwOMLPm0fcHAmcC/wZmAiOii10M/G1v9yEiItVXmxJ6OvC8mXl0O0+5+1wzWwg8Y2aXAPlA\n/zjEKSIiVdjrhO7uXwI/rGD6BuBHtQlKRERqTk+KioiEhBK6iEhIKKGLiISEErqIS
EgooYuIhIQS\nuohISCihi4iEhBK6iEhIKKGLiISEEnqCycjohJlV+BIR2ZPadp8rcVZYmE/lPQ4rqYtI5VRCF5E6\n1RQq/dXZKSMj6PBCRSV0EalT37GH35yFhfUZSuiphC4iEhJK6CIiIaGELiISEkroIiIhoYQuIhIS\nSugiIiGhhC4iEhJK6CIiIaGELiISEkroIiIhoYQukgDU34nEg/pyEUkA6u9E4kEldBGRkFBCFxEJ\nCSV0EZGQUEIXEQkJJXQRkZBQQhcRCQkldBGRkAhXQm9U+cMZBzZqpAc3RCTUwvVg0S4gr+JZ/8kr\n0YMbIhJq4Sqhi4jsw5TQRURCQgldRCQklNBFREJCCV1EJCSU0EVEQkIJPSQ0QIKIKKGHROkACRW9\n8sPezr5x40r/mGV07Bh0dCL1JlwPFsm+accOeO21CmcVnnpqPQcjEhyV0EVEQkIJXSTRqUqp4arn\na6cqF5FEpyqlhquer12dldDN7MdmtsTMPjOz6+tqPyIiElEnCd3MkoD/Bc4CugCDzOywuthXXOgn\nrUgw9N2Lq7qqcukJfO7u+QBmNh3oCyypo/3VTth/0ka/NBVJz8pi7cqV9RyQSFTYv3v1rK4Sentg\nVZnPq4kkeQmCvjQi+wS1chERCQlzr2wcn1ps1Ow4IM/dfxz9fAPg7n5nmWXiv2MRkX2Au1dYh1pX\nCb0RsBQ4HfgKWAAMcvdP474zEREB6qgO3d13mdkoYC6Rap3JSuYiInWrTkroIiJS/3RTVEQkJJTQ\nRURCQn25iATEzI4EjgCalU5z96nBRSTVlajXLtR16GaWAhxM+ZP+RnARxdc+cHwGDAF+4O6/MbOO\nQIa7Lwg4tFozswlALpGkMBs4G/inu18UZFzxEm26/CBwONAEaARsdffkQAOLg0S+dqFN6Gb2U2A0\n0AFYBBwHvO3upwUaWJyE/fgAzOwhoAQ4zd0Pj/4Bm+vuxwQcWq2Z2b+BrsAH7t7VzNKBJ939jIBD\niwszWwgMBJ4FegDDgUPc/cZAA4uDRL52Ya5DHw0cA+S7+6lAN2BTsCHFVdiPD+BYd78K2Abg7huJ\nlPbC4Ft3LwF2mlky8DWQFXBMceXuXwCN3H2Xuz8G/DjomOIkYa9dmOvQt7n7tmjPbU3dfYmZHRp0\nUHEU9uMD2BF9SM0BzKwtkRJ7GCw0s1bAI8B7wBbg7WBDiqv/mFkTYJGZ3UXkAcOwFCAT9tqFucrl\neWAkMAY4DdgINHb33oEGFidhPz4AMxsCDAC6A48DFwE3ufuzQcYVb2bWCUh2948CDiVuzCybSMm1\nMTAWaAn8MVpqD41Eu3ahTehlmdkpRP5DzXH37UHHE29hPr5oP/qnRz++GpYnjs3sAiLHszn6uRWQ\n6+4vBBuZVCWRr11oE3r0Lvsn7l4c/ZwMHO7u/wo2stoxs2R3LzKz1Irmu/uG+o6pLpnZ0UAvItUu\n8939/YBDigszW+TuP9xt2gfu3i2omOLBzJ5x9/7RG4ffSy7u/j8BhBVXiXztwlyH/hBwdJnPWyqY\n1hBNA/oQqbtzoGyvaw78IIig6oKZ3Qz0A/5K5DgfM7Nn3f2WYCOLi4rqk8PwfRwd/bdPoFHUrYS9\ndmEuoVf0V/SjMJQQ9hVmthTo6u7bop/3Bxa5e4O/+WtmfybSKukP0UlXAanuPiKwoKRaEvnaheWu\nc0WWm9kvzKxx9DUaWB50UPFiZiea2YHR90PN7L7ogzdhsoYyD00BTYGCgGKJt6uB7cCM6Os7Iokh\nFMzsQjP73Mw2m1mRmRWbWVHQccVJwl67MJfQ04AHiLQAceAVYIy7fx1oYHFiZh8Rebjhf4i0AHkU\n6O/upwQZVzyZ2QtE2tr/ncg1PINI3/qrAdz9F8FFJ3tiZl8A54blJnZDEdqEHnZm9r67Hx2tZy5w\n98ml04KOLV7M7OI9zXf3KfUVS7yY2e/cfYyZzaLim4bnBRBW3JnZfHc/Meg44qkhXLuEqMiPJzP7\nlbvfZWYPUvFJD0uprtjMbgSGASeZWRIhu57uPiX6cMoh0UlL3X1HkDHFwRPRf+8JNIq6t9DMZgAv\nEKmSAMDdnwsupFpL+GsXqgQQVfoTb2GgUdS9AcBgYKS7rzWzk4EDA44prswsF5gCrCDSyiXLzC5u\nyB2Quft70adff+buQ4KOpw4lA/8BziwzzYEGm9AbwrULXUJ391nRk36Uu18bdDx1JZrEXwMGm9mT\nwJfA7wIOK97uBc5096UAZnYI8DSRJ0cbrOgQjdlm1iRsD4KVcveRQcdQFxL92oUuoUPspIeq/q5U\nNKkNir7WEbnLbtEOusKmcWkyB3D3z8yscZABxdFyYL6ZzQS2lk509/uCCyl+zKwZcCnQhfLdO18S\nWFDxk7DXLpQJPWpR9IQ/S/mT3mB/8kUtAd4E+pT2i2FmY4MNqc4sNLNHgSejn4cQnqq0ZdFXEtAi\n4FjqwhNE/q+eBfyGyLULS4uXhL12oW3lYmaPVTDZG3oJwczOJ9LP9InAHGA68Ki75wQaWB0ws6ZE\n2vf2ik56k0gHT99VvlbDYmYHuPt/go4j3kofhS99mC/6y+pNdz8u6NjiJRGvXehK6GZ2p7tfD8wO\nW698ANEOgF6IPlTUl0hvi2nRwSCed/e5gQYYJ9H7IH+O3nwK/KdsvJnZ8cBkoDnQ0cy6Aj939yuD\njSxuSlsjbbLIcG1rgbQA44mbRL52YXxStLeZGdDgR0bZE3ff6u7T3P1cIqMWfQBcH3BYcePuu4Ds\naLPFMPodkeqI9QDu/iFwcqARxdcki4wwNR6YCSwG7go2pLhJ2GsXuhI6kWqIjUDz6KPGZTuvKnH3\nlsGEVXeiI/lMir7CJGFvPsWDu6+KlD1idgUVS7y5+6PRt68Tog7jSiXqtQtdCd3dr3P3VsBL7p7s\n7i3cvQXQG3gq4PCkZpYBL/Lfm0+lrzBYZWYnAB7ta+hawnPTEDNLN7PJZvZ/0c9HmNmlQccVJwl7\n7UJ7UxTAzLoRad7Xn0g77b+6+/8GG5UImFkb4PfAj4j8ipwLjHb39YEGFifRRP4YMM4jAynvR2RQ\n5aMCDq3WEvnahS6hV9JO+1p3zw40MKmx6INTFXXfcFoA4UgNmNm77n5M2YEfKurSWuIrjHXo+1I7\n7bAr+6RvM+AnwM6AYokrM8sh0g1rJ8p8DxOhg6c42WpmrfnvAN/HAZuDDSk+EvnahTGhX0iknfZr\nZlbaTtv2vIokInd/b7dJ881sQSDBxN8LRJq+zQJKAo6lLlxDpHVLZzObD7QlMsh3GCTstQtdlUup\nMu20BxHpE30qIWqnvS/YbdzUJCJ9uDwQkhGL/uXuxwYdR12K1psfSqRAFYaeMoHEvnahTehlRdvD\n9gMGuPvpVS0vicHMvuS/46buJHJj+zfu/s9AA4sDMxsMHEzkhlrZ7mXDMgj2VcBT7r4p+jkFGOTu\nfww2stpL5Gu3TyR0kURjZrcT6ct+Gf/9
2e5hueFb0Q3QsjdIG7JEvnZhrEOXBq50kJLo+35lu3Aw\ns9vc/dfBRRc3/YAfJGIXrHHSyMzMoyXGaFcOYXnqN2GvXegeLJJQGFjm/e5dOPy4PgOpQx8DrYIO\nog69DMwws9PN7HQijRPmBBxTvCTstVMJXRKRVfK+os8NVStgiZm9S/l62MCbvsXJeOAyoLTDqpeJ\ntAwJg4S9dkrokoi8kvcVfW6oJgQdQF2Itmy5DRgJrIpO7kikX54kEqTPk1pK2Gunm6KScMxsF5HO\nuAzYn8jYlEQ/N3P3sIxaFGNmvYi0Arkq6Fhqw8zuJ9Lfzlh3L45Oa0FkOMFv3X10kPHVhUS6dkro\nIgGJ9jU0mMhNtlD0NWRmnwOH+G6JJXpTdIm7HxxMZPGVqNdOVS4i9WgfGBPWd0/m0Ym7zKxBlx4b\nwrVTKxeR+rWEyJPLfdy9l7s/SDjqlUstNrPhu080s6FEjr0hS/hrpxK6SP0Ke19DVwHPmdklQGlf\nPD2I3Au5ILCo4iPhr53q0EUCEPa+hszsNKBL9ONid38lyHjiKZGvnRK6SMDU11DDlWjXTgldRCQk\ndFNURCQklNBFREJCCV1EJCSU0EUCYGYXmtnnZrbZzIrMrNjMioKOS6qWyNdON0VFAmBmXwDnuvun\nQcciNZPI104ldJFgFCZiQpBqSdhrpxK6SD0yswujb08BMoiMIF+2T+3ngohLqtYQrp0Sukg9MrPH\n9jDb3f2SegtGaqQhXDsldJEAmNmJ7j6/qmmSeBL52imhiwTAzN5396OrmiaJJ5GvnXpbFKlHZnY8\ncALQ1syuKTMrGWgUTFRSHQ3h2imhi9SvJkBzIt+9FmWmFwEXBRKRVFfCXztVuYjUs+hwbM+4+0+C\njkVqzsyy3T0/6DgqohK6SD2LDseWGXQcstcer2g4PXc/LYhgylJCFwnGIjObCTwLbC2dmAhtmaVK\n15Z53wz4CbAzoFjKUZWLSAAqadOcEG2ZpebMbIG79ww6DpXQRQLg7iODjkH2jpmllvmYBHQHWgYU\nTjlK6CIBMLMOwIPAidFJbwKj3X11cFFJNb0HOJEBoncCXwKXBhpRlKpcRAJgZn8HpgFPRCcNBYa4\n+xnBRSUNnRK6SADMbJG7/7CqaZJ4zKwxcAVwcnTSPOBP7r4jsKCi1H2uSDDWm9lQM2sUfQ0F1gcd\nlFTLQ0Tqzf8YfXWPTgucSugiATCzbCJ16McTqY99C/iFu68MNDCpkpl96O5dq5oWBN0UFQlA9EnD\n84KOQ/bKLjPr7O7LAMzsB8CugGMClNBF6pWZ3byH2e7uv623YGRvXQe8ZmbLibR0yQYSohmqqlxE\n6pGZ/bKCyQcSafbW2t2b13NIshfMrClwaPTjUnf/bk/L1xcldJGAmFkLYDSRZP4McK+7fx1sVFIZ\nMzsGWOXua6OfhxN57D8fyHP3DUHGB2rlIlLvzCzVzG4BPiJS7Xm0u1+vZJ7w/gRsBzCzk4E7gKnA\nZmBSgHHFqA5dpB6Z2d3AhUQSwFHuviXgkKT6GpUphQ8AJrn7X4G/mtmiAOOKUZWLSD0ysxIiI8Xv\nJNJcMTaLyE3R5EACkyqZ2cfAD919p5ktAX7m7m+UznP3I4ONUCV0kXrl7qrmbLieBl43s3XAt0T6\n38HMDiJS7RI4ldBFRKrJzI4D2gFz3X1rdNohQHN3fz/Q4FBCFxEJDf38ExEJCSV0EZGQUEIXEQkJ\nJXTZp5lZtpkNKvO5u5n9rg7209fMDov3dkXKUkKXfV0OMLj0g7u/5+5j6mA/5wNd6mC7IjFK6NKg\nmdlwM/vQzD4wsynREvcrZrbIzP4eHbsTM3vMzH5vZvPN7AszuzC6iduBXmb2vpmNNrNTzGxWdJ0J\nZjbZzF6LrnN1mf0OMbN/Rdd7yMwsOr3YzG6J7v8tM2trZscT6Sr3rujyOfV7lmRfoYQuDZaZHQH8\nGsh1927AGCKDRjwWHcptWvRzqQx3PxE4F7gzOu0G4E13P9rdfx+dVrYt76HAGcCxwITo6EKHEXn0\n+wR3PxooAYZElz8QeCu6/zeBy9z9bWAmcF10P1/G8TSIxOhJUWnITgOedfeNAO6+MVoaviA6/wn+\nm7gBXogu96mZpVVzHy+5+04iQ8YVAunA6cDRwLvRknkzYG10+e3uPjv6/j3gR3t3aCI1p4QuYbOn\nJ+XK9llt1dxe2XV2EfnOGDDF3cdVsPz2CpYXqReqcpGG7FWgn5mlQqRbWiJjc5a2WhlKtL+NCpQm\n9GKgRTX3V7rOK8BFZtY2ut8UM8vabZndFQPqeEvqlBK6NFjuvhi4lUiHSR8A9wBXAyOj3ZkOITKA\nBHy/5F76+SOgJHpTdTR75tH9fgrcBMw1sw+BuUT696hoP6WmA9eZ2Xu6KSp1RX25iIiEhEroIiIh\noYQuIhISSugiIiGhhC4iEhJK6CIiIaGELiISEkroIiIh8f/G5eTz8NI/TAAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# side-by-side bar plot of the DataFrame directly above\n", "drinks.groupby('continent').mean().plot(kind='bar')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`plot`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.html)\n", "\n", "[Back to top]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 15. How do I explore a pandas Series? ([video](https://www.youtube.com/watch?v=QTVTq8SPzxM&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=15))" ] }, { "cell_type": "code", "execution_count": 92, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
star_rating  title  content_rating  genre  duration  actors_list
09.3The Shawshank RedemptionRCrime142[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt...
19.2The GodfatherRCrime175[u'Marlon Brando', u'Al Pacino', u'James Caan']
29.1The Godfather: Part IIRCrime200[u'Al Pacino', u'Robert De Niro', u'Robert Duv...
39.0The Dark KnightPG-13Action152[u'Christian Bale', u'Heath Ledger', u'Aaron E...
48.9Pulp FictionRCrime154[u'John Travolta', u'Uma Thurman', u'Samuel L....
\n", "
" ], "text/plain": [ " star_rating title content_rating genre duration \\\n", "0 9.3 The Shawshank Redemption R Crime 142 \n", "1 9.2 The Godfather R Crime 175 \n", "2 9.1 The Godfather: Part II R Crime 200 \n", "3 9.0 The Dark Knight PG-13 Action 152 \n", "4 8.9 Pulp Fiction R Crime 154 \n", "\n", " actors_list \n", "0 [u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt... \n", "1 [u'Marlon Brando', u'Al Pacino', u'James Caan'] \n", "2 [u'Al Pacino', u'Robert De Niro', u'Robert Duv... \n", "3 [u'Christian Bale', u'Heath Ledger', u'Aaron E... \n", "4 [u'John Travolta', u'Uma Thurman', u'Samuel L.... " ] }, "execution_count": 92, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read a dataset of top-rated IMDb movies into a DataFrame\n", "movies = pd.read_csv('http://bit.ly/imdbratings')\n", "movies.head()" ] }, { "cell_type": "code", "execution_count": 93, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "star_rating float64\n", "title object\n", "content_rating object\n", "genre object\n", "duration int64\n", "actors_list object\n", "dtype: object" ] }, "execution_count": 93, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# examine the data type of each Series\n", "movies.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exploring a non-numeric Series:**" ] }, { "cell_type": "code", "execution_count": 94, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "count 979\n", "unique 16\n", "top Drama\n", "freq 278\n", "Name: genre, dtype: object" ] }, "execution_count": 94, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# count the non-null values, unique values, and frequency of the most common value\n", "movies.genre.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`describe`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.describe.html)" ] }, { "cell_type": "code", "execution_count": 95, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Drama 278\n", "Comedy 156\n", "Action 136\n", "Crime 124\n", "Biography 77\n", "Adventure 75\n", "Animation 62\n", "Horror 29\n", "Mystery 16\n", "Western 9\n", "Thriller 5\n", "Sci-Fi 5\n", "Film-Noir 3\n", "Family 2\n", "Fantasy 1\n", "History 1\n", "Name: genre, dtype: int64" ] }, "execution_count": 95, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# count how many times each value in the Series occurs\n", "movies.genre.value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`value_counts`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.value_counts.html)" ] }, { "cell_type": "code", "execution_count": 96, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Drama 0.283963\n", "Comedy 0.159346\n", "Action 0.138917\n", "Crime 0.126660\n", "Biography 0.078652\n", "Adventure 0.076609\n", "Animation 0.063330\n", "Horror 0.029622\n", "Mystery 0.016343\n", "Western 0.009193\n", "Thriller 0.005107\n", "Sci-Fi 0.005107\n", "Film-Noir 0.003064\n", "Family 0.002043\n", "Fantasy 0.001021\n", "History 0.001021\n", "Name: genre, dtype: float64" ] }, "execution_count": 96, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# display percentages instead of raw counts\n", "movies.genre.value_counts(normalize=True)" ] }, { "cell_type": "code", "execution_count": 97, "metadata": { "collapsed": false }, "outputs": [ { "data": { 
"text/plain": [ "pandas.core.series.Series" ] }, "execution_count": 97, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 'value_counts' (like many pandas methods) outputs a Series\n", "type(movies.genre.value_counts())" ] }, { "cell_type": "code", "execution_count": 98, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Drama 278\n", "Comedy 156\n", "Action 136\n", "Crime 124\n", "Biography 77\n", "Name: genre, dtype: int64" ] }, "execution_count": 98, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# thus, you can add another Series method on the end\n", "movies.genre.value_counts().head()" ] }, { "cell_type": "code", "execution_count": 99, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array(['Crime', 'Action', 'Drama', 'Western', 'Adventure', 'Biography',\n", " 'Comedy', 'Animation', 'Mystery', 'Horror', 'Film-Noir', 'Sci-Fi',\n", " 'History', 'Thriller', 'Family', 'Fantasy'], dtype=object)" ] }, "execution_count": 99, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# display the unique values in the Series\n", "movies.genre.unique()" ] }, { "cell_type": "code", "execution_count": 100, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "16" ] }, "execution_count": 100, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# count the number of unique values in the Series\n", "movies.genre.nunique()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`unique`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.unique.html) and [**`nunique`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.nunique.html)" ] }, { "cell_type": "code", "execution_count": 101, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
content_rating  APPROVED  G  GP  NC-17  NOT RATED  PASSED  PG  PG-13  R  TV-MA  UNRATED  X
genre
Action311041114467030
Adventure320051212317020
Animation32000302555010
Biography12101062936000
Comedy9211163232373041
Crime60017164870111
Drama123042412555143191
Family010000100000
Fantasy000000001000
Film-Noir100010000010
History000000000010
Horror2001101216051
Mystery410010126010
Sci-Fi100000013000
Thriller100000103000
Western100020213000
\n", "
" ], "text/plain": [ "content_rating APPROVED G GP NC-17 NOT RATED PASSED PG PG-13 R \\\n", "genre \n", "Action 3 1 1 0 4 1 11 44 67 \n", "Adventure 3 2 0 0 5 1 21 23 17 \n", "Animation 3 20 0 0 3 0 25 5 5 \n", "Biography 1 2 1 0 1 0 6 29 36 \n", "Comedy 9 2 1 1 16 3 23 23 73 \n", "Crime 6 0 0 1 7 1 6 4 87 \n", "Drama 12 3 0 4 24 1 25 55 143 \n", "Family 0 1 0 0 0 0 1 0 0 \n", "Fantasy 0 0 0 0 0 0 0 0 1 \n", "Film-Noir 1 0 0 0 1 0 0 0 0 \n", "History 0 0 0 0 0 0 0 0 0 \n", "Horror 2 0 0 1 1 0 1 2 16 \n", "Mystery 4 1 0 0 1 0 1 2 6 \n", "Sci-Fi 1 0 0 0 0 0 0 1 3 \n", "Thriller 1 0 0 0 0 0 1 0 3 \n", "Western 1 0 0 0 2 0 2 1 3 \n", "\n", "content_rating TV-MA UNRATED X \n", "genre \n", "Action 0 3 0 \n", "Adventure 0 2 0 \n", "Animation 0 1 0 \n", "Biography 0 0 0 \n", "Comedy 0 4 1 \n", "Crime 0 11 1 \n", "Drama 1 9 1 \n", "Family 0 0 0 \n", "Fantasy 0 0 0 \n", "Film-Noir 0 1 0 \n", "History 0 1 0 \n", "Horror 0 5 1 \n", "Mystery 0 1 0 \n", "Sci-Fi 0 0 0 \n", "Thriller 0 0 0 \n", "Western 0 0 0 " ] }, "execution_count": 101, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# compute a cross-tabulation of two Series\n", "pd.crosstab(movies.genre, movies.content_rating)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`crosstab`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.crosstab.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exploring a numeric Series:**" ] }, { "cell_type": "code", "execution_count": 102, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "count 979.000000\n", "mean 120.979571\n", "std 26.218010\n", "min 64.000000\n", "25% 102.000000\n", "50% 117.000000\n", "75% 134.000000\n", "max 242.000000\n", "Name: duration, dtype: float64" ] }, "execution_count": 102, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# calculate various summary statistics\n", "movies.duration.describe()" ] }, { "cell_type": "code", "execution_count": 103, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "120.97957099080695" ] }, "execution_count": 103, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# many statistics are implemented as Series methods\n", "movies.duration.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`mean`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.mean.html)" ] }, { "cell_type": "code", "execution_count": 104, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "112 23\n", "113 22\n", "102 20\n", "101 20\n", "129 19\n", "Name: duration, dtype: int64" ] }, "execution_count": 104, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 'value_counts' is primarily useful for categorical data, not numerical data\n", "movies.duration.value_counts().head()" ] }, { "cell_type": "code", "execution_count": 105, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# allow plots to appear in the notebook\n", "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": 106, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 106, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYwAAAEACAYAAACgS0HpAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFQFJREFUeJzt3X+sX/V93/HnCzv8Ci2iybArOzNkoNR0i5xkWNtYp2/W\nxSGriinTGG20QRhTJErIVE3DZtpsVd0SqiVdtgppKknrZEHEoUkw/cGvkW+7RApOA15M7DJPmwmw\n+K6qsiSELTHhvT++x3C53Gt/7vX31733+ZC+4nw/33PO53MP5/p1P5/POeebqkKSpFM5Y9INkCQt\nDwaGJKmJgSFJamJgSJKaGBiSpCYGhiSpyUgDI8lZSR5L8kSSg0l2deW7kjyb5PHudeWsbXYmOZLk\ncJJto2yfJKldRn0fRpJzq+qFJGuALwO3Au8BvldVH52z7mbgbuByYCPwCHBpebOIJE3cyIekquqF\nbvEsYC1w4h//zLP6duCeqnqxqo4CR4Cto26jJOnURh4YSc5I8gRwDHi4qr7afXRLkgNJ7kpyfle2\nAXhm1ubPdWWSpAkbRw/jpap6G4Mhpq1JLgPuBN5cVVsYBMlHRt0OSdLpWTuuiqrqu0n6wJVz5i5+\nC7i/W34OeNOszzZ2Za+SxDkNSVqCqppvOqDJqK+SeuOJ4aYk5wDvAv40yfpZq10DPNkt7wOuS3Jm\nkouBS4D98+27qnwN6bVr166Jt2ElvTyeHstpfZ2uUfcwfhLYk+QMBuH0mar6gySfTLIFeAk4Crwf\noKoOJdkLHAKOAzfXMH5KSdJpG2lgVNVB4O3zlP+jk2zzIeBDo2yXJGnxvNNb9Hq9STdhRfF4Do/H\ncrqM/Ma9UUjiSJUkLVISalonvSVJK4eBIUlqYmBIkpoYGJKkJgaGJKmJgSFJamJgSJKaGBiSpCYG\nhiSpiYEhSWpiYEiSmhgYkqQmBoYkqYmBIUlqYmBIkpoYGJKkJgaGJKmJgSFJamJgSJKaGBiSpCYG\nhiSpyUgDI8lZSR5L8kSSg0l2deUXJHkoyVNJHkxy/qxtdiY5kuRwkm2jbJ8kqV2qarQVJOdW1QtJ\n1gBfBm4F/h7w51X160luAy6oqh1JLgM+DVwObAQeAS6tOY1MMrdIknQKSaiqLHX7kQ9JVdUL3eJZ\nwFqggO3Anq58D3B1t3wVcE9VvVhVR4EjwNZRt1GSdGojD4wkZyR5AjgGPFxVXwXWVdUMQFUdAy7s\nVt8APDNr8+e6MknShK0ddQVV9RLwtiQ/Dnw+yU8z6GW8arXF7nf37t0vL/d6PXq93mm0cvVav/4i\nZmaeHnu969Zt4tixo2OvV1pN+v0+/X5/aPsb+RzGqypL/iXwAnAT0KuqmSTrgS9W1eYkO4Cqqju6\n9R8AdlXVY3P24xzGkCRhCXk9jJrx/6E0XlM9h5HkjSeugEpyDvAu4DCwD7ihW+164L5ueR9wXZIz\nk1wMXALsH2UbJUltRj0k9ZPAniRnMAinz1TVHyT5CrA3yY3A08C1AFV1KMle4BBwHLjZroQkTYex\nDkkNi0NSw+OQlLR6TPWQlCRp5TAwJElNDAxJUhMDQ5LUxMCQJDUxMCRJTQwMSVITA0OS1MTAkCQ1\nMTAkSU0MDElSEwNDktTEwJAkNTEwJElNDAxJUhMDQ5LUxMCQJDUxMCRJTQwMSVITA0OS1MTAkCQ1\nMTAkSU3WTroBWq3OIsnYa123bhPHjh0de73SSjDSHkaSjUkeTfKNJAeTfKAr35Xk2SSPd68rZ22z\nM8mRJIeTbBtl+zRJPwBq7K+ZmafH8tNJK1GqanQ7T9YD66vqQJLzgK8B24F/AHyvqj46Z/3NwN3A\n5cBG4BHg0prTyCRzi7REg7/yJ3EsJ1ev545WqyRU1ZK79iPtYVTVsao60C0/DxwGNnQfz9fo7cA9\nVfViVR0FjgBbR9lGSVKbsU16J7kI2AI81hXdkuRAkruSnN+VbQCembXZc7wSMJKkCRrLpHc3HHUv\n8MGqej7JncCvVlUl+TXgI8BNi9nn7t27X17u9Xr0er3hNViSVoB+v0+/3x/a/kY6hwGQZC3we8Af\nVtXH5vl8E3B/Vb01yQ6gquqO7rMHgF1V9dicbZzDGBLnMKTVY6rnMDqfAA7NDotuMvyEa4Anu+V9\nwHVJzkxyMXAJsH8MbZQkncJIh6SSXAG8FziY5AkGf1LeDvxSki3AS8BR4P0AVXUoyV7gEHAcuNmu\nhCRNh5EPSY2CQ1LD45CUtHoshyEpSdIKYGBIkpoYGJKkJgaGJKmJgSFJamJgSJKaGBiSpCYGhiSp\niYEhSWpiYEiSmhgYkqQmBoYkqYmBIUlqYmBIkpoYGJKkJgaGJKmJgSFJamJgSJKaGBiSpCZNgZHk\nr4y6IZKk6dbaw7gzyf4kNyc5f6QtkiRNpabAqKqfAd4LvAn4WpK7k7xrpC2TJE2VVFX7yska4Grg\n3wPfBQLcXlWfG03zFmxHLabdWlgSYBLHcnL1eu5otUpCVWWp27fOYbw1yW8Ah4G/Dfx8VW3uln/j\nJNttTPJokm8kOZjk1q78giQPJXkqyYOzh7mS7ExyJMnhJNuW+oNJkoarqYeR5I+Au4B7q+r/zvns\nH1bVpxbYbj2wvqoOJDkP+BqwHXgf8OdV9etJbgMuqKodSS4DPg1cDmwEHgEundudsIcxPPYwpNVj\nLD0M4OeAu0+ERZIzkpwLsFBYdJ8dq6oD3fLzDHooGxmExp5utT0MhrkArgLuqaoXq+oocATYuqif\nSJI0Eq2B8Qhwzqz353ZlzZJcBGwBvgKsq6oZGIQKcGG32gbgmVmbPdeVSZImbG3jemd3PQRg0Fs4\n0cNo0Q1H3Qt8sNt27pjAoscIdu/e/fJyr9ej1+stdheStKL1+336/f7Q9tc6h/Fl4ANV9Xj3/h3A\nb1bVX2/Ydi3we8AfVtXHurLDQK+qZrp5ji9W1eYkO4Cqqju69R4AdlXVY3P26RzGkDiHIa0e45rD\n+KfAZ5P8lyRfAj4D3NK47SeAQyfCorMPuKFbvh64b1b5dUnOTHIxcAmwv7EeSdIINd+HkeR1wFu6\nt09V1fGGba4A/hg4yODPyQJuZxACexncCPg0cG1V/Z9um53APwaOMxjCemie/drDGBJ7GNLqcbo9\njMUExt8ALmLWvEdVfXKpFZ8OA2N4DAxp9TjdwGia9E7yKeAvAQeAH3XFBUwkMCRJ49d6ldRfBS7z\nz3pJWr1aJ72fBNaPsiGSpOnW2sN4I3AoyX7gBycKq+qqkbRKkjR1WgNj9ygbIUmafou5SmoTgwcB\nPtLd5b2mqr430tYt3BanU4bEq6Sk1WNcjzf/Jwwe7fEfu6INwBeWWqkkaflpnfT+ZeAKBl+aRFUd\n4ZUHBkqSVoHWwPhBVf3wxJvu+VD26yVpFWkNjD9KcjtwTvdd3p8F7h9dsyRJ06b1abVnMHi+0zYG\ns5UPAndNaubZSe/hcdJbWj3G9iypaWJgDI+BIa0e43qW
1P9knt/uqnrzUiuWJC0vi3mW1AlnA38f\n+InhN0eSNK2WPCSV5GtV9Y4ht6e1boekhsQhKWn1GNeQ1NtnvT2DQY+jtXciSVoBWv/R/8is5ReB\no8C1Q2/NKrd+/UXMzDw96WZI0ry8SmqKTGZ4yCEpabUY15DUr5zs86r66FIbIElaHhZzldTlwL7u\n/c8D+4Ejo2iUJGn6tN7p/cfAz514nHmSHwN+v6r+1ojbt1B7HJIaXq0TqHOy9a7Ec0dqMZbHmwPr\ngB/Oev/DrkyStEq0Dkl9Etif5PPd+6uBPaNpkiRpGjX1MKrqXwPvA77dvd5XVf/mVNsl+XiSmSRf\nn1W2K8mzSR7vXlfO+mxnkiNJDifZtvgfR5I0Kq1DUgDnAt+tqo8Bzya5uGGb3wbePU/5R6vq7d3r\nAYAkmxnc27EZeA9wZwaD+pKkKdD6Fa27gNuAnV3R64D/dKrtqupLDHokr9nlPGXbgXuq6sWqOsrg\nCqytLe2TJI1eaw/jF4CrgO8DVNX/An7sNOq9JcmBJHclOb8r2wA8M2ud57oySdIUaJ30/mFVVZIC\nSPL606jzTuBXu/39GoPHjty02J3s3r375eVer0ev1zuNJknSytPv9+n3+0PbX+t9GP8MuBR4F/Ah\n4Ebg7qr6Dw3bbgLur6q3nuyzJDuAqqo7us8eAHZV1WPzbOd9GMOrdQJ1TrbelXjuSC3Gch9GVf1b\n4F7gd4G3AP+qJSw6YdacRZL1sz67BniyW94HXJfkzG5C/RIGd5NLkqbAKYekkqwBHqmqdwIPL2bn\nSe4GesAbknwT2AW8M8kW4CUGT719P0BVHUqyFzgEHAduXpHdCElaplqHpP4zcE1VfWf0TTo1h6SG\nWusE6pxsvSvx3JFajOVptcDzwMEkD9NdKQVQVbcutWJJ0vLSGhif616SpFXqpENSSf5iVX1zjO1p\n4pDUUGudQJ2TrXclnjtSi1FfJfWFWRX97lIrkSQtf6cKjNlJ9OZRNkSSNN1OFRi1wLIkaZU51RzG\njxhcFRXgHOCFEx8xuCv7x0fewvnb5RzG8GqdQJ2TrXclnjtSi5FeVltVa5a6Y0nSyrKY78OQJK1i\nBoYkqYmBIUlqYmBIkpoYGJKkJgaGJKmJgSFJamJgSJKaGBiSpCYGhiSpiYEhSWpiYEiSmrR+Rau0\nQpzVPRV4vNat28SxY0fHXq80TCd9vPm08vHmQ611AnWuznpX4jmr5WXUX9EqSRIw4sBI8vEkM0m+\nPqvsgiQPJXkqyYNJzp/12c4kR5IcTrJtlG2TJC3OqHsYvw28e07ZDuCRqnoL8CiwEyDJZcC1wGbg\nPcCdmcRgsyRpXiMNjKr6EvDtOcXbgT3d8h7g6m75KuCeqnqxqo4CR4Cto2yfJKndJOYwLqyqGYCq\nOgZc2JVvAJ6Ztd5zXZkkaQpMw2W1S7p0ZPfu3S8v93o9er3ekJojSStDv9+n3+8PbX8jv6w2ySbg\n/qp6a/f+MNCrqpkk64EvVtXmJDuAqqo7uvUeAHZV1WPz7NPLaodX6wTqXJ31rsRzVsvLcrisNt3r\nhH3ADd3y9cB9s8qvS3JmkouBS4D9Y2ifJKnBSIekktwN9IA3JPkmsAv4MPDZJDcCTzO4MoqqOpRk\nL3AIOA7cvCK7EZK0THmn9xRxSGpl17sSz1ktL8thSEqStAIYGJKkJgaGJKmJgSFJamJgSJKaGBiS\npCYGhiSpiYEhSWpiYEiSmhgYkqQmBoYkqYmBIUlqYmBIkpoYGJKkJgaGJKmJgSFJamJgSJKajPQr\nWiWdcFb3jYrjtW7dJo4dOzr2erUy+RWtU8SvaLXeUdS7En9XtDR+RaskaSwMDElSEwNDktTEwJAk\nNZnYVVJJjgLfAV4CjlfV1iQXAJ8BNgFHgWur6juTaqMk6RWT7GG8BPSq6m1VtbUr2wE8UlVvAR4F\ndk6sdZKkV5lkYGSe+rcDe7rlPcDVY22RJGlBkwyMAh5O8tUkN3Vl66pqBqCqjgEXTqx1kqRXmeSd\n3ldU1beS/AXgoSRP8do7mxa842j37t0vL/d6PXq93ijaKEnLVr/fp9/vD21/U3Gnd5JdwPPATQzm\nNWaSrAe+WFWb51nfO72HV+sE6rTecda7En9XtDTL8k7vJOcmOa9bfj2wDTgI7ANu6Fa7HrhvEu2T\nJL3WpIak1gGfT1JdGz5dVQ8l+RNgb5IbgaeBayfUPknSHFMxJLVYDkkNtdYJ1Gm946x3Jf6uaGmW\n5ZCUJGn5MTAkSU0MDElSEwNDktTEwJAkNTEwJElNDAxJUhMDQ5LUxMCQJDUxMCRJTSb5ePOp9Du/\nczef+9zvj73ejRvXj71OrQZndY+cGZ916zZx7NjRsdap8fBZUnO84x0/y+OPbwX+8kj2v5A1a27i\nRz/6f/gsKetd/vX6/KppdbrPkrKHMa+/A/zsWGtcs+bmLjAkaTo5hyFJamJgSJKaGBiSpCYGhiSp\niYEhSWriVVKShmz8936A93+Mg4Ehach+wCTuOZmZGX9IrTYOSUmSmhgYkqQmUxkYSa5M8qdJ/luS\n2ybdHknSFAZGkjOA3wTeDfw08ItJfmqyrVrp+pNuwArTn3QDVpD+pBugWaYuMICtwJGqerqqjgP3\nANsn3KYVrj/pBqww/Uk3YAXpL2LdwdVZ436tX3/RiH726TONV0ltAJ6Z9f5ZBiEiSScxqauzzl41\nlxFPY2BM1Nlnv45zz/0XrF3778Za7/e/75NqpeVp9VxGPHXfh5HkrwG7q+rK7v0OoKrqjlnrTFej\nJWmZOJ3vw5jGwFgDPMXgCym+BewHfrGqDk+0YZK0yk3dkFRV/SjJLcBDDCblP25YSNLkTV0PQ5I0\nnabxstrXSHI0yX9N8kSS/V3ZBUkeSvJUkgeTnD/pdk6rJB9PMpPk67PKFjx+SXYmOZLkcJJtk2n1\ndFrgWO5K8mySx7vXlbM+81ieRJKNSR5N8o0kB5Pc2pV7fi7SPMfyA1358M7Pqpr6F/A/gAvmlN0B\n/PNu+Tbgw5Nu57S+gL8JbAG+fqrjB1wGPMFguPIi4L/T9UR9LXgsdwG/Ms+6mz2Wpzye64Et3fJ5\nDOYvf8rzc6jHcmjn57LoYQDhtb2h7cCebnkPcPVYW7SMVNWXgG/PKV7o+F0F3FNVL1bVUeAI3gfz\nsgWOJQzO0bm247E8qao6VlUHuuXngcPARjw/F22BY7mh+3go5+dyCYwCHk7y1SQ3dWXrqmoGBgcK\nuHBirVueLlzg+M29cfI5XjnptLBbkhxIctes4ROP5SIkuYhB7+0rLPz77TFtMOtYPtYVDeX8XC6B\ncUVVvR34u8AvJ/kZXnunjLP3p8fjt3R3Am+uqi3AMeAjE27PspPkPOBe4IPdX8f+fi/RPMdyaOfn\nsgiMqvpW998/A77AoNs0k2QdQJL1wP+eXAuXpYWO33PAm2att7Er0wKq6s+qGxQGfotXuvUeywZJ\n1jL4B+5TVXV
fV+z5uQTzHcthnp9THxhJzu0SkySvB7YBB4F9wA3datcD9827A50QXj2OudDx2wdc\nl+TMJBcDlzC4eVKveNWx7P5BO+Ea4Mlu2WPZ5hPAoar62Kwyz8+lec2xHOb5OXU37s1jHfD57nEg\na4FPV9VDSf4E2JvkRuBp4NpJNnKaJbkb6AFvSPJNBldNfBj47NzjV1WHkuwFDgHHgZtn/XWy6i1w\nLN+ZZAvwEnAUeD94LFskuQJ4L3AwyRMMhp5uZ3CV1Gt+vz2mCzvJsfylYZ2f3rgnSWoy9UNSkqTp\nYGBIkpoYGJKkJgaGJKmJgSFJamJgSJKaGBiSpCYGhiSpyf8HHxttW4V88GMAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# histogram of the 'duration' Series (shows the distribution of a numerical variable)\n", "movies.duration.plot(kind='hist')" ] }, { "cell_type": "code", "execution_count": 107, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 107, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAEqCAYAAAAF56vUAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmYHGW5/vHvnQRkk4hsIwQICgJRFBQiCuqgggsquLHp\nEcTtHA6bHBdwS1DPUURUlJ8LshgUxKAiARVZR0Rk3yIBjEsQUYKgsrgG8vz+eN9mKp1OpruqmnRq\n7s919ZVOTc8zNTM9T731vJsiAjMzW/lNWNEnYGZm9XBCNzNrCCd0M7OGcEI3M2sIJ3Qzs4ZwQjcz\na4gxE7qkJ0i6WtKNkuZKmpGPryPpQkl3SPqxpMmFzzla0nxJt0navZ/fgJmZJepmHLqkNSLi75Im\nAj8DDgPeANwfEZ+W9AFgnYg4StI04AxgR2AKcDGwZXjAu5lZX3VVcomIv+enTwAmAQHsCczKx2cB\ne+XnrwXOiohHImIBMB+YXtcJm5lZZ10ldEkTJN0I3ANcFBHXAhtGxEKAiLgH2CC/fGPgrsKn352P\nmZlZH03q5kURsRjYXtLawDmSnkFqpS/xsl6+sCSXYMzMSogIdTre0yiXiHgQGAFeASyUtCGApCHg\n3vyyu4FNCp82JR/rFK+rx4wZM7p+rWOu/OfomI7pmMt+LE83o1zWa41gkbQ6sBtwGzAHODC/7ADg\n3Px8DrCvpFUlbQ5sAVwz1tcxM7Nquim5PAWYJWkC6QLw7Yj4oaSrgNmSDgLuBPYGiIh5kmYD84BF\nwMEx1mXFzMwqGzOhR8Rc4Dkdjv8ZeNkyPueTwCcrn102PDxcV6hxH3NlOEfHdEzHLKercej9IMkN\ndzOzHkki6ugUNTOzweWEbmbWEE7oZmYN4YRuZtYQTuhmZg3hhG5m1hBO6GZmDeGEbmbWEE7oZmYN\n4YRuZtYQTuhmZg0xcAl9aGgqksZ8DA1NXdGnamY2UAZucS5JdLf5kcZc7N3MrGm8OJeZ2TjghG5m\n1hBO6GZmDeGEbmbWEE7oZmYN4YRuZtYQTuhmZg3hhG5m1hBO6GZmDeGEbmbWEE7oZmYN4YRuZtYQ\nTuhmZg3hhG5m1hBjJnRJUyRdKulWSXMlHZqPz5D0e0k35McrCp9ztKT5km6TtHs/vwEzM0vGXA9d\n0hAwFBE3SVoLuB7YE9gHeCgiPtv2+m2AM4EdgSnAxcCW7Yufez10M7PeVVoPPSLuiYib8vOHgduA\njVuxO3zKnsBZEfFIRCwA5gPTy5y4mZl1r6cauqSpwHbA1fnQIZJuknSypMn52MbAXYVPu5vRC4CZ\nmfVJ1wk9l1u+AxyeW+pfAp4aEdsB9wDH9+cUzcysG5O6eZGkSaRk/o2IOBcgIv5UeMnXgPPy87uB\nTQofm5KPLWXmzJmPPR8eHmZ4eLjL0zYzGx9GRkYYGRnp6rVdbRIt6XTgvog4snBsKCLuyc/fA+wY\nEftLmgacATyPVGq5CHeKmpnVYnmdomO20CXtDLwZmCvpRlK2/SCwv6TtgMXAAuDdABExT9JsYB6w\nCDi4Y+Y2M7NaddVC78sXdgvdzKxnlYYtmpnZysEJ3cysIZzQzcwawgndzKwhnNDNzBrCCd3MrCGc\n0M3MGsIJ3cysIZzQzcwawgndzKwhnNDNzBrCCd3MrCGc0M3MGsIJ3cysIZzQzcwawgndzKwhnNDN\nzBrCCd3MrCGc0M3MGsIJ3cysIZzQzcwawgndzKwhnNDNzBrCCd3MrCGc0M3MGsIJ3cysIZzQzcwa\nwgndzKwhnNDNzBpizIQuaYqkSyXdKmmupMPy8XUkXSjpDkk/ljS58DlHS5ov6TZJu/fzGzAzs0QR\nsfwXSEPAUETcJGkt4HpgT+BtwP0R8WlJHwDWiYijJE0DzgB2BKYAFwNbRtsXktR+qHUcWP455Vcy\n1rmbmTWNJCJCnT42Zgs9Iu6JiJvy84eB20iJek9gVn7ZLGCv/Py1wFkR8UhELADmA9MrfQdmZjam\nnmrokqYC2wFXARtGxEJISR/YIL9sY+CuwqfdnY+ZmVkfTer2hbnc8h3g8Ih4WFJ7vaPn+sfMmTMf\nez48PMzw8HCvIczMGm1kZISRkZGuXjtmDR1A0iTgfOBHEXFCPnYbMBwRC3Od/bKI2EbSUUBExLH5\ndRcAMyLi6raYrqGbmfWoUg09OxWY10rm2RzgwPz8AODcwvF9Ja0qaXNgC+Cans/azMx60s0ol52B\ny4G5pKZzAB8kJenZwCbAncDeEfHX/DlHA28HFpFKNBd2iOsWuplZj5bXQu+q5NIPTuhmZr2ro+Ri\nZmYDzgndzKwhnNDNzBrCCd3MrCGc0M3MGsIJ3cysIZzQzcwawgndzKwhnNDNzBrCCd3MrCGc0M3M\nGsIJ3cysIZzQzcwawgndzKwhnNDNzBrCCd3MrCGc0M3MGsIJ3cysIZzQzcwawgndzKwhnNDNzBrC\nCd3MrCGc0M3MGsIJ3cysIZzQzcwawgndzKwhnNDNzBrCCd3MrCHGTOiSTpG0UNIthWMzJP1e0g35\n8YrCx46WNF/SbZJ279eJd2toaCqSunoMDU1d0adrZlaaImL5L5B2AR4GTo+IZ+VjM4CHIuKzba/d\nBjgT2BGYAlwMbBkdvoikToeRBCz/nPIrGevce4vXfUwzsxVFEhGhTh8bs4UeEVcAf+kUt
8OxPYGz\nIuKRiFgAzAem93CuZmZWUpUa+iGSbpJ0sqTJ+djGwF2F19ydj5mZWZ9NKvl5XwI+FhEh6RPA8cA7\neg0yc+bMx54PDw8zPDxc8nTMzJppZGSEkZGRrl47Zg0dQNJmwHmtGvqyPibpKCAi4tj8sQuAGRFx\ndYfPcw3dzKxHlWrorRgUauaShgofez3wi/x8DrCvpFUlbQ5sAVzT+ymbmVmvxiy5SDoTGAbWlfQ7\nYAawq6TtgMXAAuDdABExT9JsYB6wCDi4YzPczMxq11XJpS9f2CUXM7Oe1VFyMTOzAeeEbmbWEE7o\nZmYN4YRuZtYQTuhmZg3hhF6CV3A0s0HkYYsDEtPMrBsetmhmNg44oZuZNYQTuplZQzihm5k1hBO6\nmVlDOKGbmTWEE7qZWUM4oQ8IT1Yys6o8sajBMc2seTyxyMxsHHBCNzNrCCd0M7OGcEI3M2sIJ3Qz\ns4ZwQjczawgndDOzhnBCNzNrCCd0M7OGcEI3M2sIJ3Qzs4ZwQjcza4gxE7qkUyQtlHRL4dg6ki6U\ndIekH0uaXPjY0ZLmS7pN0u79OnEzM1tSNy3004CXtx07Crg4IrYCLgWOBpA0Ddgb2AZ4JfAlpWUE\nzcysz8ZM6BFxBfCXtsN7ArPy81nAXvn5a4GzIuKRiFgAzAem13OqZma2PGVr6BtExEKAiLgH2CAf\n3xi4q/C6u/MxMzPrs0k1xSm128LMmTMfez48PMzw8HBNp2Nm1gwjIyOMjIx09dqudiyStBlwXkQ8\nK///NmA4IhZKGgIui4htJB0FREQcm193ATAjIq7uENM7FvU5ppk1Tx07Fik/WuYAB+bnBwDnFo7v\nK2lVSZsDWwDX9HzGZmbWszFLLpLOBIaBdSX9DpgBfAo4W9JBwJ2kkS1ExDxJs4F5wCLg4I7NcDMz\nq503iW5wTDNrHm8SbWY2Djihm5k1hBO6mVlDOKGbmTWEE7qZWUM4oZuZNYQTuplZQzihm5k1hBO6\nmVlDOKGbmTWEE7qZWUM4oZuZNYQTeoMNDU1FUlePoaGpK/p0zawir7bomD3FNLMVy6stWm3c6jcb\nXG6hO+YKj2lm3XML3cxsHHBCNzNrCCd0M7OGcEI3M2sIJ3Qzs4ZwQjczawgndDOzhnBCNzNrCCd0\nW+E8+9SsHp4p6piNjGnWVJ4pamY2Djihm5k1xKQqnyxpAfAAsBhYFBHTJa0DfBvYDFgA7B0RD1Q8\nTzMzG0PVFvpiYDgito+I6fnYUcDFEbEVcClwdMWvYWZmXaia0NUhxp7ArPx8FrBXxa9hZmZdqJrQ\nA7hI0rWS3pGPbRgRCwEi4h5gg4pfw8zMulCphg7sHBF/lLQ+cKGkO1h6/Nkyx5jNnDnzsefDw8MM\nDw9XPB0zs2YZGRlhZGSkq9fWNg5d0gzgYeAdpLr6QklDwGURsU2H13scumP2LaZZU/VlHLqkNSSt\nlZ+vCewOzAXmAAfmlx0AnFv2a5iZWfeqlFw2BM6RFDnOGRFxoaTrgNmSDgLuBPau4TzNzGwMnvrv\nmI2MadZUnvpvZjYOOKGbmTWEE7qZWUM4oZuZNYQTuplZQzihm5k1hBO6mVlDOKGbmTWEE7qZWUM4\noZuZNYQTuplZQzihWyMNDU1F0piPoaGpK/pUzWrjxbkcc5zH9GJftnLx4lxmZuOAE7qZWUM4oZuZ\nNYQTuplZQzihm5k1hBO6mVlDOKGbmTWEE7qZWUM4oZt1ybNPbdB5pqhjjvOY3c8U9exTGwSeKWpm\nNg44oZuZNYQTuplZQzihm61A7mi1OrlT1DHHecwV2ynqjlbr1QrpFJX0Ckm3S/qlpA/06+uYmVnS\nl4QuaQJwIvBy4BnAfpK2Lh9xpJbzcsx+xHPMQYvZjzLO41UaGhkZqfT54z1mv1ro04H5EXFnRCwC\nzgL2LB9upJ6zcsw+xHPMQYu5cOGdpDJO8TFjqWPpdSsuZqeLxK677uqLRAX9SugbA3cV/v/7fMzM\nDHj8LhLHHHPMuOlk9igXM2uMleEisazyVR0Xnr6McpG0EzAzIl6R/38UEBFxbOE17rI3MythWaNc\n+pXQJwJ3AC8F/ghcA+wXEbfV/sXMzAyASf0IGhGPSjoEuJBU1jnFydzMrL9W2MQiMzOrlztFzcwa\nwgndrGEkTZC094o+D3v8jZuELulQSeus6PPolqQ1aoy1UnzvkiZK2kjSpq1HDfEuq+v8CnFfk2dD\nD6SIWAy8f0WfRzfyxecFK/o8upEHe9Qdc9064w3sm1LSlpK+I2mepN+0HhVCbghcK2l2Xmem47Cf\nHs9xfUkflHSSpFNbj4oxXyBpHnB7/v+zJX2p4qnW+r0reYukj+b/byppesWYhwILgYuAH+TH+VVi\nRsSjwGJJk6vE6WAfYL6kT1db0iKRtLOki/K6R7+R9NuK73WAiyW9V9Imkp7celQ4xyvyvw9JerDw\neEjSg2Xj5ovP/yv7+csi6XuS9qj5wjtf0nGSptUY8ypJZ0t6VS05aVA7RfMbaAbwOeA1wNuACRHx\n0QoxBeyeY+0AzCaNwPl1yXhXAj8FrgcebR2PiO9WOMergTcCcyJi+3zsFxHxzLIxc4zavndJXwYW\nAy+JiG1y6//CiNixwvn9CnheRNxfNsYy4p4LbE+6UPytdTwiDqsYd21gP9LPM4DTgG9FxEMlYt0O\nvIel30elfxaSftvhcETEU8vG7BdJnwF+Dnyv4xKs5WK+jPS72Qk4GzgtIu6oGPOJwL457gTgVOCs\niCh9Qct/ly8DDgJ2JP1dfj0iflkqYEQM5AO4Pv87t/1YxbjPBj5PagF/GbgR+HTJWDf14fu+Ov97\nY+HYzTXFruV7B26o+xyBy4BJffh5HtDpUVPsdYEjgAXAj4D5wKFlf+eD/ABeX3i+Ts2xHyI1EP4N\nPJj//2BNsScD/0laiuRKUjJepYa4LwbuJjUSZgFb1BBz1xzzr8BPgOf3GqMv49Br8q98uzQ/j2m/\nG1irbDBJhwNvBe4DTgbeFxGLWl+DcjXH8yW9KiJ+WPa8Orgr1xRD0irA4UClMfx9+N4X5Xpi5Pjr\nk/4gq/gNMCLpB8C/Wgcj4rNVgkbELEmrAk/Ph+6ItGBcaZL2BA4EtgBOB6ZHxL2532Me8MUeQ14m\n6Tjgeyz5vd9Q4RzXAI4ENo2Id0naEtgqIsqWsT6czw/gEuA5Zc+tXUQ8sa5YRbk+/RbgP0iNlzOA\nXUgX9eES8SYCe5AuClOB43PMFwI/ZPQ9VvYcFwKHAnOA7Uh3Fpv3Em+QE/rhwBrAYcDHgZeQfhFl\nPZnUylhiEYeIWCzp1RXO8YOS/g20kkRExNoVzvM/gRNIi5ndTZqc9d8V4kH93/sXgHOADST9L6lE\n9OGK5/i7/Fg1P2ohaZjUgloACNhE0gERcXmFsK8DPtceIyL+LuntJeI9L/+7QzEc6T1f1mmkEk6r\nw/FuUoIom9C1jOelSdo6Im6X1PHiUPGCdg6w
FfAN4DUR8cf8oW9Luq5k2PmkO8njIuLKwvHvSHpR\nyZg/z+e4V0T8vnD8Oklf6TXYwNbQ6zJWR1BE/PnxOpcVKbcuNqRwEY+I31WItzVpaQcBl0RNM4El\nrZXP7eGa4l0P7B+5firp6aRa93NLxpsIXBwRu9Z0fhOAN0bE7DriFeJeFxE7SLoxRvtibo6IZ5eM\ndzupz2AC8E1gfwqJvUzylXRSvnvoNBIpIqL0BU3SrhFR6wgnSWvV9b7M8SaSSp7/U1fMgW2hS9oB\n+BCwGUsmoWf1GOp6UmtHwKbAX/LzJ5FahD3d0nQ4z9cCravzSIVb2la8zUm3XVNZ8vt+bYWYhwAz\nSbd0rdJIAL3+LFtvwlsjYmvySJw6SHomqaXy5Pz/+4C3RsStFUOvEoXOsIj4ZS5llRJpWYvFkiZH\nxAMVz611l/R+UmdYnf4taXVGy2JPo1DOKeGPQKv8dU/hOZS8m4iId+V/a7k4Akh6fafnha/5vfZj\nPfiopE8A/wAuIP39vCcivlkmWH4v1Tpkc2ATOqk29T5gLhXqsxGxOYCkrwHntOrdkl4J7FXlBCV9\nitQzfUY+dLiknSPi6Aphvw+cApxH9bp0yxGk+mnlEST5TXiHpE2rtPA7OAk4stWqyqWSrzFaMijr\nOkknk1qVAG8Gyt5ytzwMzJVU18iZiyW9F/h2W7wqd48zSElnE0lnADuT6v6l1Jl02+UL7H9RaBgB\nXy3Z1/Ga5XwsGO0HKGP3iHi/pNeRSnivBy5n9L1Vxk2S5pDKYcXffanzHNiSi6QrImKXGuPNjYht\nxzrWY8xbgO0ijaVttV5vLHEXUYx5dUQ8b+xX9hTzMmC3iHikpniXk4YCXsOSb8IqdxFLlQOqlAgK\nMZ5A6oNovZd+CnwpIkq3ViV17MuJiFkl4/VliGHucNuJdEd6VUTcVyVeh/gntVrZFeOcDKxC6uuA\n1EH4aES8o2rsOkm6NSKekc/3OxFxQdX3qKTTOhyOiDioVLwBTugvJdXsLmHJnv9yVy7px6Q/5mJL\n7UUR8fIK53gLMNxqSeV6/UjFhL4/sCWpM7TSiAdJR+anzyB1ENUygkTSizsdj4iflImXY54D3EAq\nu0Dq+X9uRLyuQsyJwOkR8eayMZYTe3XSCJJKY5vrtqwOxpYqHY0dvtYNEVF5tEudF3NJb4mIbxbe\n+0uoMmoq35HvRSq5TCeVbc+vuwFWxSCXXN4GbE26chfrvmVvmfYj3Yaek+Ncno9V8UngxtwCFumW\n8aiKMbcltVBewpLfd5kOotZwsFpHkFRJ3MtxEHAMo7/fn+ZjpeXy0GaSVo2If1c9wRZJrwE+Q/pZ\nbi5pO+BjZe9Qah5iePxyPlZ15Ey7e2uK86ikp0We5CbpqRQmWPVozfxv7UMhI+IoSZ8GHsjvrb9R\naa9kkDSFNMx153zop8DhbSNeuo83wC30OyJiqz7EXTMi/jb2K7uO9xRSHR3gmoi4p2K8XwHT6kxA\nhdhrk27nep7N2BbnIXJnGymprQL8reJwzb6QdDqwDWlsb7E8VKWldj0pMY5EDbN5JX2b1Hn/1oh4\nZk7wV0bEdmXPcWWS78ZPI81FEGkgxNvqHqVSh9x5Pw1YrXUsIk6vEO8i4EyWvDN9c0TsVibeILfQ\nr5Q0LSLm1REs9yafTJqctKmkZwPvjoiDS8RqHz/buppuJGmjire1vyDdytXV+mmNGDqN3GqR9ABw\nUERcXyZeFCaCSBKplbJTyXP7fEQcIek8Ri8Sxa9Vui6f/To/JlBfq21RRDygJZfeqNKB/bSI2EfS\nfvDYePZSY707jewoqlCy7NvvKSIuad2V5EN3VOnjgL6NFptBmpA0jTSR6JXAFaTJZWWtHxHFOvrX\nJR1RNtggJ/SdSD3AvyXVfUVqXZatT38OeDmppUZE3KzykwGOBN5F59vbqre1TwJul3QtS9a7qyS2\nU4GDI+KnAJJ2ISX40rX+wnkF8P38Zi9Tbmq1TD5T9Vza5Rr6EyPivTWHvjX3dUzMiegw0rTysuoc\nYtivUR61/56W8/f3PElEtclf/Rgt9kbS8hk3RsTbJG1ItREuAPdLegvwrfz//YDSo9EGOaG/ou6A\nEXFXW8OnVJ2u0LP/yoj4Z/Fjklbr8Cm9mFHx8zt5tJXMASLiCkmlR7y0tQInkGY4/nMZL1+uwl3C\ndhFxQtvXOZy0pkUpuc6589iv7NmhpDkS/yLdLv+YNJu5rJksPcTwbWUCRUSpz+sibuv3dB3wj7aR\nXU8oGfZ9nb4UqaGxCVBludp/RsQXKnx+J//I8wYeyeXLe0nnWcVBpBr650jf+5VUGF46sAk98jR1\nSRtQqFdVUPsaKaQffnsvf6djXetTh+NPJH2V1AoI0vKvI62SUYkSUbEV+AhpTG6lziHSsg4ntB07\nsMOxXtU6zjfbIyI+RErqAEh6U/4aPYuIC3NdvjXE8PCyQwz7Ocoju4S0OmBrxuTqpBFZPc8XiIgl\n7ibyxffDpIlLh1Y7TU7Id42VR4sVXCfpSaT5EdeTfgY/r3SWMKX97jv/HO4qE2xgE7rSDMzjgY1I\nV8LNSAn4GSVD1rZGiqShHGd1SdszOgV6bdL6M2ViXhERu7R1OMJoqalKh2Nr+Fd76397ypWITo6I\nnxUP5Ddhz3X/XDfenzRaZE7hQ08E6liWYTXSLWzxe6w6weRolk7enY51RdIlEfFS0rDS9mO96tso\nj2y1KEx/j4iHVXEzltwp+hHS7+X/IuKiiucI9Y4WS5882t/2FUkXAGtHxC2VzjK1ztsbgJ2OdWVg\nEzrpFnYn0roZ20valdQDXEpu8dQ1HvnlpNbjFNJFp5XQHwQ+WCZg5ElU0YeV56L+WX51vgmvJE0r\nX48l+yQeAqr+sdRaglCaXfwqYGNJxdv5tUl3Kr3GW43UAFhPaU35YsNg4zLnGBFfzWWQByPic2Vi\njOFvkp7TaunmDvd/lAkkaQ/SXc4DwIcj4or6TpM3AU+tebjqYxfZiFjQfqzHWM8n3dWs33Y3tTYV\nSk2DnNAXRcT9SltUTYiIyyR9vmywOnu9I80InCXpDVFhM4sO51hcJ6VW+Y/nGSw53OpjPcao/U2Y\nS2t3As8v8/ljqXmc7x9INeTXkm65Wx4ibVDRq3eTlmXYKMcrNgxOLBEPeKzvYD9SXbZuRwBnS/pD\n/v9TSCW8Ms4jjRC7H3i/0po2j6k4EKC20WL9uPCShvuuRcpFxUbcg6TO11IGOaH/VWnlvcuBMyTd\nS6EGWkI/er2fm6/QfwXIv+z/iYhSS8lGn9ZJUVqGcw3SAvonk94w15QI1Zc3YT7HnUiJd5v8dSZS\nz9j200gdl2/K/39LPtbzON+IuBm4WdKZkdcZyb/zTSLiLyXinUCq9R4aEb2uoT6Wn0k6kaXXhylV\nQ5a0I3BXRFyrtNLmu0lrmVwAdFq6oBt9Wx+GekeL1X7hzX1lP5H09UJ/4QRgraiyA9IATyxak3Qr\nN4F
UKpkMnBElF5hSf9ZIeWxp0sKxStOh1Z91Um6JiGcV/l0L+FFEvLBkvM2ibW31qpTWqN6XVIfe\ngbQhx9Oj2kJnSLqpfYJOp2M9xhwhtdInkf7A7yVNBCrTSm91qF4QEQ9J+jCpdPWJKh14qnlJWkk3\nAC+LiD/n4YZnke54twO2iYhKF/TC13lOle+7EKcfy1PUfuGVdCapf+9R4FpSq/+EiDiuTLyBbKHn\n0sP5ufa7mNFFe6roR6/3RElPaE2CyGOJSw3hkrQFab3yj7R96IWkGnMVrSGFf5e0EekW9ykV4j1B\n0kksXb6qNK08In4laWKkzZ1Pk3QjqbOxilrH+WaTI+JBSe8grRUzQ2ldn7I+EhFnK80PeBlwHGmL\nwNINkD70m0yM0dUf9wFOyuXG70q6qcavczI17IbUj9FiEfHFPFJuKku+76tMLJqW30tvJm1jeBSp\nkdCchB41rzmd1d7rTVo29xKlFdNE6igte/H5PHB0RMwtHpT0Z+D/SOWiss7Lw62OIy2AFaShV2Wd\nDXyF9MdXds2Ndn9X2iruJqX1Mv5IujurqtM436odpZOUlnzYm8LQxQpaP8M9SInyB0rrbpemtMrk\nG1g6+fTUb1IwUdKkSCt2vpQ0sa6lzjxS125ItZfwJH0DeBpwE6O/s6DaTNFV8jDqvYATI20NWbps\nMpAJPat7zenae70j4lhJN5NaVUGaYLJZyXAbtifz/DXmSppa9hxzXa5V5/+upPNJQ8+qXCgfiYgv\nV/j8Tv6D9Ed3CKmDcRNSQqokl4aqLh/Q7mOk3/UVuab8VNL2ZGXdrTRPYDfg2JyMq17MziWNHrme\nahtbtHyLVPO9j1QKbc063iJ/nbocU1OcE+lQwqsYcwdSi7rOOvVXSfM4bgYul7QZqTZfyiDX0Ote\nc/r7wLsiorY1UnLc7UnjqN9E6hz6bkT03FEiaX5EbLmMj/0qIraocI5L1fqrkDSTVDc+hyXLVwOz\nnZ+kL9JhzZGWCg0DJK1bti9nGfHWIM2MnhsR83Prf9uIuLBCzNKLhS0n5k6kUt2FkRe4U9rSb62q\ndW9Jz2Lpu4nScwU0ugXfLZGXC6n6dyDpbOCwGN2ftC8Kd0I9G9gWeqTd2tfPz/9UQ8jaer3zm3i/\n/LiPNJJAFeuW10l6Z0QsUQrJddpSi2gVXCLpDcD3ampdtC62xanbAZTekEFps+qPM7rlYNUJVcVd\niY6h3iUVrsp149NIncuVfqaRFuO6l7QJx3zSmPYqLX5Ii9tt2+mur6yIuKrDsV9WjSvpVNJ0/1up\nZ6ls6E8Jbz1gnqRrqJ5DljujlyW3+Os+7qC10CWJ9Md3COkXINIb/IsV6n+19npLWky65Xx7RPwq\nH/tNVNhhRmmhn3OAfzOawHcg1f9eFxWW5VWafbom6ef4T+qZfVorpWWDX09qpdb6puzDHYpIZbaD\nSEsnzwaJDO4OAAAHnUlEQVS+Xja55c76HUhroD89d1yfHRE9r0Mj6RekpDiJtFHKb6hncbu+kTQv\nIqbVHHMz0h66q5JKeJNJO1X9qkLMOnPIuyNNAuvY0IiIUqWnQUzoR5KWpXxXRPw2H3sqqdf/gqgw\n+y0nzeLa5aXKL5L2ItXndiaNwz2LNB2+0obTOfauQOtW+daIuLRqzLqp3g0ZWjEvA14aedGnOlUd\nSjpG7F1JK+6tSaqDHhURPa3vkVv72wM3xOj66reUSb6S/kIaSthR3cNN6yDpFOD4qGGpbNW/1+1K\nZRAT+o2k/S/vazu+Pql2V6qlJWlv0iiPEVJr5YXA+yLiOxXOdU3SolT7kUbLnE7aiLp07bMf1Hlb\nsgeAO8vU6tSHDRmUJq58nLS6YuVt8tpi15rQlfbqfAupI3chaQTSHFIiPbvXC7ukayJieus88/vq\n5yUTet8uXv2SW75zSItyVbqbKH7/kr4bEZU71guxaxs5oyWXjlhK2T6eQayhr9KezCHV0fPwnrI+\nBOzYapXnC8TFQOmEnjuGzgTOVJox+CbgA6Sx7oPkS6Sxva166rakqdGTJf1XiQtQbRsyFPwvaWTT\natSwTZ6WXORsDUmtkQN1lJt+TloffK9YcgmB65Rm5fZqdh7l8iRJ7ySVcsoOK91gOXXZWi6QfXAK\n6eI4l+qzuIvvw0qbbHdQ58iZYr9YbX08g5jQlzessMqQwwltJZb7qWecMwCRpn6flB+D5g+kev+t\nAJKmkYbevZ/U8dRrQq9zQ4aWjeoclRF9WOSsYKtl1fkj4thugyjtTHMlaQ7CrqThalsBH43yKw5O\nJC3PUMt47sfJnyJiztgv60os43k9wWua/FYcrSfpiLKj99oNYkJ/dqE1VSSqrYt+gaQfMzpjcB/S\nNlLjwdNbyRwgIuYpbaP3m5IN65ksvSHDgRXP8YeSdh+0clWRCsv7dvq5lRjtMIWUzLcmtU5/Rkrw\nVUY1/bHK4IEV5EalKfDnsWS5rcwol1b+EGl56zrvzPo1+a22C8/A1dDrlic+bBgRP1PaaWeX/KG/\nktaG+fWKO7vHR655/5nUeQvpYrYe6Tb3iojYcVmfu5yY6zK6IcNVncpkPcZrjcT5F7CIwRyJ8yfS\nxgPfAq6mrRVcZrRDjrsq6Rb+BaRVJ58P/LXMyI+6R/Q8HpRmWreLiDjocT+Z5ejHyJkct7Z+j/GQ\n0M+n85T6bUmL6S9vD8ZGyOWRgxm9mP2MVFf/J7BGFDYs6DLeeaS+gzm5H2FcUFpjaDdSJ/izSBtS\nfKt491My7mRSEt85//sk0vDNnpcokPTkGKAJXk3Qj5Ez7X08wN9bH6JCQ2Y8JPRrl9UClTQ3IrZ9\nvM9pRcitwK1Ib6I7Ii//WjLWi0mt/D1IK8SdRVpMred9RXPp5/ZljMSpunha3yhNz9+PNHLqmCg3\nO/gk0hr1D5Fa/FeR7nZ6Xop3ZaQ+zuatUz9HztRtEGvodXvScj62+uN2FiuQpGHSomELSC2ATSQd\nECV3VY/RtZwnkoZrvhM4lbT0Z6+OJC30dHyHj1VdPK12OZHvQUrmU4EvkCaElbEpaXXO+aRtEX9P\nKgWOF9eN/ZKB0M+RM7UaDy30bwGXRucp9btFRNndVlYaShsQ7x8Rd+T/P51UKnhuhZirkzaL3oc0\nJPL8iKi6se9Ak3Q6adLXD4GzIuIXNcQUqZX+gvx4Jqm/4+cRUedyBVZSWwt9oMf5j4eE3rcp9SuL\nTrMOy85EzJ87G5hOGunybeAndczwVP1rTddKacmHVp9BrRt5K22VtzMpqb8aWDcilnd3udKT9PmI\nOCL3ySyViEqMGuoLSY+Sfu8i3dXXUu/uh8Yn9JaVYUp9vygtfrSYNEUd0g5QE8uOIpD0ctLm3XWt\nhY6Wsdb0oNRR+0HSYYy2zBeRhiy2HnPruEgOMknPjYjr1YfdhcarcZPQx7Nc9/1vRke5/JQ03Kqn\nyUB52OcylRw33Ip9G/WvNT3QJH2WPPY8+rwk6yAa7+uu9IMT
unWtMF54A1KrsnWXsyspKb26QuzH\nZa1pGxwr0+iRlcV4GOUybkmaHRF7S5pL5xplTzX01rhopV2kprWSr9KGDF+veLqd1pqOiNizYlwb\nXCvN6JGVhRN6sx2e/y3dcl6GKW0t6YWU33qvZWbheWs1zH0rxrTB1td1V8Yjl1zGGUnrAfdXqVVL\nOpG0eUJxXZz5VTswtfR2ft+LiC9WiWmDa2UaPbKycAu9wfL6zZ8ijWv+OGnJ1/WACZLeGhEXlIkb\nEYdIeh3wonzoSmCo5Dn2Yzs/WwlExMQVfQ5N44TebCcCHyQtInQp8MqIuErS1qTWdamEni0gdYw+\ntjl2yTi3k0bdvDpGt/N7T4XzMhu3nNCbbVJrOVpJH4u8yW9eO6XnYH1qTb+eVCu/TFJrO7+VaS1v\ns4FR2wYPNpCKE1P+0faxMjX020lrq7w6InbJ9e1Kk4si4vsRsS9pTfDLgCNIu+58WdLuVWKbjTfu\nFG2wMTqdVouInrb0Ux83x277Oq3t/PaJiJfWGdusyZzQrWdaSTbHNhtvnNCtEremzQaHE7qZWUO4\nU9TMrCGc0M3MGsIJ3cysIZzQzcwa4v8DeCLSWObwHAgAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# bar plot of the 'value_counts' for the 'genre' Series\n", "movies.genre.value_counts().plot(kind='bar')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`plot`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.plot.html)\n", "\n", "[Back to top]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 16. How do I handle missing values in pandas? ([video](https://www.youtube.com/watch?v=fCMrO_VzeL8&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=16))" ] }, { "cell_type": "code", "execution_count": 108, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
City  Colors Reported  Shape Reported  State  Time
18236Grant ParkNaNTRIANGLEIL12/31/2000 23:00
18237Spirit LakeNaNDISKIA12/31/2000 23:00
18238Eagle RiverNaNNaNWI12/31/2000 23:45
18239Eagle RiverREDLIGHTWI12/31/2000 23:45
18240YborNaNOVALFL12/31/2000 23:59
\n", "
" ], "text/plain": [ " City Colors Reported Shape Reported State Time\n", "18236 Grant Park NaN TRIANGLE IL 12/31/2000 23:00\n", "18237 Spirit Lake NaN DISK IA 12/31/2000 23:00\n", "18238 Eagle River NaN NaN WI 12/31/2000 23:45\n", "18239 Eagle River RED LIGHT WI 12/31/2000 23:45\n", "18240 Ybor NaN OVAL FL 12/31/2000 23:59" ] }, "execution_count": 108, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read a dataset of UFO reports into a DataFrame\n", "ufo = pd.read_csv('http://bit.ly/uforeports')\n", "ufo.tail()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**What does \"NaN\" mean?**\n", "\n", "- \"NaN\" is not a string, rather it's a special value: **`numpy.nan`**.\n", "- It stands for \"Not a Number\" and indicates a **missing value**.\n", "- **`read_csv`** detects missing values (by default) when reading the file, and replaces them with this special value.\n", "\n", "Documentation for [**`read_csv`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html)" ] }, { "cell_type": "code", "execution_count": 109, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
City  Colors Reported  Shape Reported  State  Time
18236FalseTrueFalseFalseFalse
18237FalseTrueFalseFalseFalse
18238FalseTrueTrueFalseFalse
18239FalseFalseFalseFalseFalse
18240FalseTrueFalseFalseFalse
\n", "
" ], "text/plain": [ " City Colors Reported Shape Reported State Time\n", "18236 False True False False False\n", "18237 False True False False False\n", "18238 False True True False False\n", "18239 False False False False False\n", "18240 False True False False False" ] }, "execution_count": 109, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 'isnull' returns a DataFrame of booleans (True if missing, False if not missing)\n", "ufo.isnull().tail()" ] }, { "cell_type": "code", "execution_count": 110, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
City  Colors Reported  Shape Reported  State  Time
18236TrueFalseTrueTrueTrue
18237TrueFalseTrueTrueTrue
18238TrueFalseFalseTrueTrue
18239TrueTrueTrueTrueTrue
18240TrueFalseTrueTrueTrue
\n", "
" ], "text/plain": [ " City Colors Reported Shape Reported State Time\n", "18236 True False True True True\n", "18237 True False True True True\n", "18238 True False False True True\n", "18239 True True True True True\n", "18240 True False True True True" ] }, "execution_count": 110, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 'nonnull' returns the opposite of 'isnull' (True if not missing, False if missing)\n", "ufo.notnull().tail()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`isnull`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.isnull.html) and [**`notnull`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.notnull.html)" ] }, { "cell_type": "code", "execution_count": 111, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "City 25\n", "Colors Reported 15359\n", "Shape Reported 2644\n", "State 0\n", "Time 0\n", "dtype: int64" ] }, "execution_count": 111, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# count the number of missing values in each Series\n", "ufo.isnull().sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This calculation works because:\n", "\n", "1. The **`sum`** method for a DataFrame operates on **`axis=0`** by default (and thus produces column sums).\n", "2. In order to add boolean values, pandas converts **`True`** to **1** and **`False`** to **0**." ] }, { "cell_type": "code", "execution_count": 112, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
City  Colors Reported  Shape Reported  State  Time
21NaNNaNNaNLA8/15/1943 0:00
22NaNNaNLIGHTLA8/15/1943 0:00
204NaNNaNDISKCA7/15/1952 12:30
241NaNBLUEDISKMT7/4/1953 14:00
613NaNNaNDISKNV7/1/1960 12:00
\n", "
" ], "text/plain": [ " City Colors Reported Shape Reported State Time\n", "21 NaN NaN NaN LA 8/15/1943 0:00\n", "22 NaN NaN LIGHT LA 8/15/1943 0:00\n", "204 NaN NaN DISK CA 7/15/1952 12:30\n", "241 NaN BLUE DISK MT 7/4/1953 14:00\n", "613 NaN NaN DISK NV 7/1/1960 12:00" ] }, "execution_count": 112, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# use the 'isnull' Series method to filter the DataFrame rows\n", "ufo[ufo.City.isnull()].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**How to handle missing values** depends on the dataset as well as the nature of your analysis. Here are some options:" ] }, { "cell_type": "code", "execution_count": 113, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(18241, 5)" ] }, "execution_count": 113, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# examine the number of rows and columns\n", "ufo.shape" ] }, { "cell_type": "code", "execution_count": 114, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(2486, 5)" ] }, "execution_count": 114, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# if 'any' values are missing in a row, then drop that row\n", "ufo.dropna(how='any').shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`dropna`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html)" ] }, { "cell_type": "code", "execution_count": 115, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(18241, 5)" ] }, "execution_count": 115, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 'inplace' parameter for 'dropna' is False by default, thus rows were only dropped temporarily\n", "ufo.shape" ] }, { "cell_type": "code", "execution_count": 116, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(18241, 5)" ] }, "execution_count": 116, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# if 'all' values are missing in a row, then drop that row (none are dropped in this case)\n", "ufo.dropna(how='all').shape" ] }, { "cell_type": "code", "execution_count": 117, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(15576, 5)" ] }, "execution_count": 117, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# if 'any' values are missing in a row (considering only 'City' and 'Shape Reported'), then drop that row\n", "ufo.dropna(subset=['City', 'Shape Reported'], how='any').shape" ] }, { "cell_type": "code", "execution_count": 118, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(18237, 5)" ] }, "execution_count": 118, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# if 'all' values are missing in a row (considering only 'City' and 'Shape Reported'), then drop that row\n", "ufo.dropna(subset=['City', 'Shape Reported'], how='all').shape" ] }, { "cell_type": "code", "execution_count": 119, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "LIGHT 2803\n", "DISK 2122\n", "TRIANGLE 1889\n", "OTHER 1402\n", "CIRCLE 1365\n", "Name: Shape Reported, dtype: int64" ] }, "execution_count": 119, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 'value_counts' does not include missing values by default\n", "ufo['Shape Reported'].value_counts().head()" ] }, { "cell_type": "code", "execution_count": 120, "metadata": { "collapsed": false }, "outputs": [ { "data": { 
"text/plain": [ "LIGHT 2803\n", "NaN 2644\n", "DISK 2122\n", "TRIANGLE 1889\n", "OTHER 1402\n", "Name: Shape Reported, dtype: int64" ] }, "execution_count": 120, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# explicitly include missing values\n", "ufo['Shape Reported'].value_counts(dropna=False).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`value_counts`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.value_counts.html)" ] }, { "cell_type": "code", "execution_count": 121, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# fill in missing values with a specified value\n", "ufo['Shape Reported'].fillna(value='VARIOUS', inplace=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`fillna`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html)" ] }, { "cell_type": "code", "execution_count": 122, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "VARIOUS 2977\n", "LIGHT 2803\n", "DISK 2122\n", "TRIANGLE 1889\n", "OTHER 1402\n", "Name: Shape Reported, dtype: int64" ] }, "execution_count": 122, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# confirm that the missing values were filled in\n", "ufo['Shape Reported'].value_counts().head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Working with missing data in pandas](http://pandas.pydata.org/pandas-docs/stable/missing_data.html)\n", "\n", "[Back to top]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 17. What do I need to know about the pandas index? (Part 1) ([video](https://www.youtube.com/watch?v=OYZNk7Z9s6I&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=17))" ] }, { "cell_type": "code", "execution_count": 123, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
country  beer_servings  spirit_servings  wine_servings  total_litres_of_pure_alcohol  continent
0Afghanistan0000.0Asia
1Albania89132544.9Europe
2Algeria250140.7Africa
3Andorra24513831212.4Europe
4Angola21757455.9Africa
\n", "
" ], "text/plain": [ " country beer_servings spirit_servings wine_servings \\\n", "0 Afghanistan 0 0 0 \n", "1 Albania 89 132 54 \n", "2 Algeria 25 0 14 \n", "3 Andorra 245 138 312 \n", "4 Angola 217 57 45 \n", "\n", " total_litres_of_pure_alcohol continent \n", "0 0.0 Asia \n", "1 4.9 Europe \n", "2 0.7 Africa \n", "3 12.4 Europe \n", "4 5.9 Africa " ] }, "execution_count": 123, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read a dataset of alcohol consumption into a DataFrame\n", "drinks = pd.read_csv('http://bit.ly/drinksbycountry')\n", "drinks.head()" ] }, { "cell_type": "code", "execution_count": 124, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "RangeIndex(start=0, stop=193, step=1)" ] }, "execution_count": 124, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# every DataFrame has an index (sometimes called the \"row labels\")\n", "drinks.index" ] }, { "cell_type": "code", "execution_count": 125, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Index([u'country', u'beer_servings', u'spirit_servings', u'wine_servings',\n", " u'total_litres_of_pure_alcohol', u'continent'],\n", " dtype='object')" ] }, "execution_count": 125, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# column names are also stored in a special \"index\" object\n", "drinks.columns" ] }, { "cell_type": "code", "execution_count": 126, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(193, 6)" ] }, "execution_count": 126, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# neither the index nor the columns are included in the shape\n", "drinks.shape" ] }, { "cell_type": "code", "execution_count": 127, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
0  1  2  3  4
0124Mtechnician85711
1253Fother94043
2323Mwriter32067
3424Mtechnician43537
4533Fother15213
\n", "
" ], "text/plain": [ " 0 1 2 3 4\n", "0 1 24 M technician 85711\n", "1 2 53 F other 94043\n", "2 3 23 M writer 32067\n", "3 4 24 M technician 43537\n", "4 5 33 F other 15213" ] }, "execution_count": 127, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# index and columns both default to integers if you don't define them\n", "pd.read_table('http://bit.ly/movieusers', header=None, sep='|').head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**What is the index used for?**\n", "\n", "1. identification\n", "2. selection\n", "3. alignment (covered in the next video)" ] }, { "cell_type": "code", "execution_count": 128, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
country  beer_servings  spirit_servings  wine_servings  total_litres_of_pure_alcohol  continent
6Argentina193252218.3South America
20Bolivia1674183.8South America
23Brazil245145167.2South America
35Chile1301241727.6South America
37Colombia1597634.2South America
52Ecuador1627434.2South America
72Guyana9330217.1South America
132Paraguay213117747.3South America
133Peru163160216.1South America
163Suriname12817875.6South America
185Uruguay115352206.6South America
188Venezuela33310037.7South America
\n", "
" ], "text/plain": [ " country beer_servings spirit_servings wine_servings \\\n", "6 Argentina 193 25 221 \n", "20 Bolivia 167 41 8 \n", "23 Brazil 245 145 16 \n", "35 Chile 130 124 172 \n", "37 Colombia 159 76 3 \n", "52 Ecuador 162 74 3 \n", "72 Guyana 93 302 1 \n", "132 Paraguay 213 117 74 \n", "133 Peru 163 160 21 \n", "163 Suriname 128 178 7 \n", "185 Uruguay 115 35 220 \n", "188 Venezuela 333 100 3 \n", "\n", " total_litres_of_pure_alcohol continent \n", "6 8.3 South America \n", "20 3.8 South America \n", "23 7.2 South America \n", "35 7.6 South America \n", "37 4.2 South America \n", "52 4.2 South America \n", "72 7.1 South America \n", "132 7.3 South America \n", "133 6.1 South America \n", "163 5.6 South America \n", "185 6.6 South America \n", "188 7.7 South America " ] }, "execution_count": 128, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# identification: index remains with each row when filtering the DataFrame\n", "drinks[drinks.continent=='South America']" ] }, { "cell_type": "code", "execution_count": 129, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "245" ] }, "execution_count": 129, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# selection: select a portion of the DataFrame using the index\n", "drinks.loc[23, 'beer_servings']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`loc`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.loc.html)" ] }, { "cell_type": "code", "execution_count": 130, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
beer_servings  spirit_servings  wine_servings  total_litres_of_pure_alcohol  continent
country
Afghanistan0000.0Asia
Albania89132544.9Europe
Algeria250140.7Africa
Andorra24513831212.4Europe
Angola21757455.9Africa
\n", "
" ], "text/plain": [ " beer_servings spirit_servings wine_servings \\\n", "country \n", "Afghanistan 0 0 0 \n", "Albania 89 132 54 \n", "Algeria 25 0 14 \n", "Andorra 245 138 312 \n", "Angola 217 57 45 \n", "\n", " total_litres_of_pure_alcohol continent \n", "country \n", "Afghanistan 0.0 Asia \n", "Albania 4.9 Europe \n", "Algeria 0.7 Africa \n", "Andorra 12.4 Europe \n", "Angola 5.9 Africa " ] }, "execution_count": 130, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# set an existing column as the index\n", "drinks.set_index('country', inplace=True)\n", "drinks.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`set_index`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.set_index.html)" ] }, { "cell_type": "code", "execution_count": 131, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Index([u'Afghanistan', u'Albania', u'Algeria', u'Andorra', u'Angola',\n", " u'Antigua & Barbuda', u'Argentina', u'Armenia', u'Australia',\n", " u'Austria',\n", " ...\n", " u'Tanzania', u'USA', u'Uruguay', u'Uzbekistan', u'Vanuatu',\n", " u'Venezuela', u'Vietnam', u'Yemen', u'Zambia', u'Zimbabwe'],\n", " dtype='object', name=u'country', length=193)" ] }, "execution_count": 131, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 'country' is now the index\n", "drinks.index" ] }, { "cell_type": "code", "execution_count": 132, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Index([u'beer_servings', u'spirit_servings', u'wine_servings',\n", " u'total_litres_of_pure_alcohol', u'continent'],\n", " dtype='object')" ] }, "execution_count": 132, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 'country' is no longer a column\n", "drinks.columns" ] }, { "cell_type": "code", "execution_count": 133, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(193, 5)" ] }, "execution_count": 133, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 'country' data is no longer part of the DataFrame contents\n", "drinks.shape" ] }, { "cell_type": "code", "execution_count": 134, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "245" ] }, "execution_count": 134, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# country name can now be used for selection\n", "drinks.loc['Brazil', 'beer_servings']" ] }, { "cell_type": "code", "execution_count": 135, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " beer_servings spirit_servings wine_servings \\\n", "Afghanistan 0 0 0 \n", "Albania 89 132 54 \n", "Algeria 25 0 14 \n", "Andorra 245 138 312 \n", "Angola 217 57 45 \n", "\n", " total_litres_of_pure_alcohol continent \n", "Afghanistan 0.0 Asia \n", "Albania 4.9 Europe \n", "Algeria 0.7 Africa \n", "Andorra 12.4 Europe \n", "Angola 5.9 Africa " ] }, "execution_count": 135, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# index name is optional\n", "drinks.index.name = None\n", "drinks.head()" ] }, { "cell_type": "code", "execution_count": 136, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " country beer_servings spirit_servings wine_servings \\\n", "0 Afghanistan 0 0 0 \n", "1 Albania 89 132 54 \n", "2 Algeria 25 0 14 \n", "3 Andorra 245 138 312 \n", "4 Angola 217 57 45 \n", "\n", " total_litres_of_pure_alcohol continent \n", "0 0.0 Asia \n", "1 4.9 Europe \n", "2 0.7 Africa \n", "3 12.4 Europe \n", "4 5.9 Africa " ] }, "execution_count": 136, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# restore the index name, and move the index back to a column\n", "drinks.index.name = 'country'\n", "drinks.reset_index(inplace=True)\n", "drinks.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`reset_index`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.reset_index.html)" ] }, { "cell_type": "code", "execution_count": 137, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " beer_servings spirit_servings wine_servings \\\n", "count 193.000000 193.000000 193.000000 \n", "mean 106.160622 80.994819 49.450777 \n", "std 101.143103 88.284312 79.697598 \n", "min 0.000000 0.000000 0.000000 \n", "25% 20.000000 4.000000 1.000000 \n", "50% 76.000000 56.000000 8.000000 \n", "75% 188.000000 128.000000 59.000000 \n", "max 376.000000 438.000000 370.000000 \n", "\n", " total_litres_of_pure_alcohol \n", "count 193.000000 \n", "mean 4.717098 \n", "std 3.773298 \n", "min 0.000000 \n", "25% 1.300000 \n", "50% 4.200000 \n", "75% 7.200000 \n", "max 14.400000 " ] }, "execution_count": 137, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# many DataFrame methods output a DataFrame\n", "drinks.describe()" ] }, { "cell_type": "code", "execution_count": 138, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "20.0" ] }, "execution_count": 138, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# you can interact with any DataFrame using its index and columns\n", "drinks.describe().loc['25%', 'beer_servings']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Indexing and selecting data](http://pandas.pydata.org/pandas-docs/stable/indexing.html)\n", "\n", "[Back to top]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 18. What do I need to know about the pandas index? (Part 2) ([video](https://www.youtube.com/watch?v=15q-is8P_H4&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=18))" ] }, { "cell_type": "code", "execution_count": 139, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " country beer_servings spirit_servings wine_servings \\\n", "0 Afghanistan 0 0 0 \n", "1 Albania 89 132 54 \n", "2 Algeria 25 0 14 \n", "3 Andorra 245 138 312 \n", "4 Angola 217 57 45 \n", "\n", " total_litres_of_pure_alcohol continent \n", "0 0.0 Asia \n", "1 4.9 Europe \n", "2 0.7 Africa \n", "3 12.4 Europe \n", "4 5.9 Africa " ] }, "execution_count": 139, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read a dataset of alcohol consumption into a DataFrame\n", "drinks = pd.read_csv('http://bit.ly/drinksbycountry')\n", "drinks.head()" ] }, { "cell_type": "code", "execution_count": 140, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "RangeIndex(start=0, stop=193, step=1)" ] }, "execution_count": 140, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# every DataFrame has an index\n", "drinks.index" ] }, { "cell_type": "code", "execution_count": 141, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 Asia\n", "1 Europe\n", "2 Africa\n", "3 Europe\n", "4 Africa\n", "Name: continent, dtype: object" ] }, "execution_count": 141, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# every Series also has an index (which carries over from the DataFrame)\n", "drinks.continent.head()" ] }, { "cell_type": "code", "execution_count": 142, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# set 'country' as the index\n", "drinks.set_index('country', inplace=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`set_index`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.set_index.html)" ] }, { "cell_type": "code", "execution_count": 143, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "country\n", "Afghanistan Asia\n", "Albania Europe\n", "Algeria Africa\n", "Andorra Europe\n", "Angola Africa\n", "Name: continent, dtype: object" ] }, "execution_count": 143, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Series index is on the left, values are on the right\n", "drinks.continent.head()" ] }, { "cell_type": "code", "execution_count": 144, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Africa 53\n", "Europe 45\n", "Asia 44\n", "North America 23\n", "Oceania 16\n", "South America 12\n", "Name: continent, dtype: int64" ] }, "execution_count": 144, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# another example of a Series (output from the 'value_counts' method)\n", "drinks.continent.value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`value_counts`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.value_counts.html)" ] }, { "cell_type": "code", "execution_count": 145, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Index([u'Africa', u'Europe', u'Asia', u'North America', u'Oceania',\n", " u'South America'],\n", " dtype='object')" ] }, "execution_count": 145, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# access the Series index\n", "drinks.continent.value_counts().index" ] }, { "cell_type": "code", "execution_count": 146, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([53, 45, 44, 23, 16, 12], dtype=int64)" ] }, "execution_count": 146, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# access the Series values\n", 
"drinks.continent.value_counts().values" ] }, { "cell_type": "code", "execution_count": 147, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "53" ] }, "execution_count": 147, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# elements in a Series can be selected by index (using bracket notation)\n", "drinks.continent.value_counts()['Africa']" ] }, { "cell_type": "code", "execution_count": 148, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "South America 12\n", "Oceania 16\n", "North America 23\n", "Asia 44\n", "Europe 45\n", "Africa 53\n", "Name: continent, dtype: int64" ] }, "execution_count": 148, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# any Series can be sorted by its values\n", "drinks.continent.value_counts().sort_values()" ] }, { "cell_type": "code", "execution_count": 149, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Africa 53\n", "Asia 44\n", "Europe 45\n", "North America 23\n", "Oceania 16\n", "South America 12\n", "Name: continent, dtype: int64" ] }, "execution_count": 149, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# any Series can also be sorted by its index\n", "drinks.continent.value_counts().sort_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`sort_values`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.sort_values.html) and [**`sort_index`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.sort_index.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**What is the index used for?**\n", "\n", "1. identification (covered in the previous video)\n", "2. selection (covered in the previous video)\n", "3. 
alignment" ] }, { "cell_type": "code", "execution_count": 150, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "country\n", "Afghanistan 0\n", "Albania 89\n", "Algeria 25\n", "Andorra 245\n", "Angola 217\n", "Name: beer_servings, dtype: int64" ] }, "execution_count": 150, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 'beer_servings' Series contains the average annual beer servings per person\n", "drinks.beer_servings.head()" ] }, { "cell_type": "code", "execution_count": 151, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Albania 3000000\n", "Andorra 85000\n", "Name: population, dtype: int64" ] }, "execution_count": 151, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create a Series containing the population of two countries\n", "people = pd.Series([3000000, 85000], index=['Albania', 'Andorra'], name='population')\n", "people" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`Series`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.html)" ] }, { "cell_type": "code", "execution_count": 152, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Afghanistan NaN\n", "Albania 267000000.0\n", "Algeria NaN\n", "Andorra 20825000.0\n", "Angola NaN\n", "dtype: float64" ] }, "execution_count": 152, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# calculate the total annual beer servings for each country\n", "(drinks.beer_servings * people).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- The two Series were **aligned** by their indexes.\n", "- If a value is missing in either Series, the result is marked as **NaN**.\n", "- Alignment enables us to easily work with **incomplete data**." ] }, { "cell_type": "code", "execution_count": 153, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
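{ "cell_type": "markdown", "metadata": {}, "source": [ "**Note:** If you would rather substitute a value for the missing entries instead of getting **NaN**, the arithmetic methods accept a `fill_value` argument. A minimal sketch (it assumes the `drinks` and `people` objects defined above):" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# same aligned multiplication, but treat a missing value as 0\n", "drinks.beer_servings.mul(people, fill_value=0).head()" ] },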
" ], "text/plain": [ " beer_servings spirit_servings wine_servings \\\n", "Afghanistan 0 0 0 \n", "Albania 89 132 54 \n", "Algeria 25 0 14 \n", "Andorra 245 138 312 \n", "Angola 217 57 45 \n", "\n", " total_litres_of_pure_alcohol continent population \n", "Afghanistan 0.0 Asia NaN \n", "Albania 4.9 Europe 3000000.0 \n", "Algeria 0.7 Africa NaN \n", "Andorra 12.4 Europe 85000.0 \n", "Angola 5.9 Africa NaN " ] }, "execution_count": 153, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# concatenate the 'drinks' DataFrame with the 'population' Series (aligns by the index)\n", "pd.concat([drinks, people], axis=1).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`concat`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html)\n", "\n", "[Indexing and selecting data](http://pandas.pydata.org/pandas-docs/stable/indexing.html)\n", "\n", "[Back to top]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 19. How do I select multiple rows and columns from a pandas DataFrame? ([video](https://www.youtube.com/watch?v=xvpNA7bC8cs&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=19))" ] }, { "cell_type": "code", "execution_count": 154, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " City Colors Reported Shape Reported State Time\n", "0 Ithaca NaN TRIANGLE NY 6/1/1930 22:00\n", "1 Willingboro NaN OTHER NJ 6/30/1930 20:00\n", "2 Holyoke NaN OVAL CO 2/15/1931 14:00" ] }, "execution_count": 154, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read a dataset of UFO reports into a DataFrame\n", "ufo = pd.read_csv('http://bit.ly/uforeports')\n", "ufo.head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [**`loc`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.loc.html) method is used to select rows and columns by **label**. You can pass it:\n", "\n", "- A single label\n", "- A list of labels\n", "- A slice of labels\n", "- A boolean Series\n", "- A colon (which indicates \"all labels\")" ] }, { "cell_type": "code", "execution_count": 155, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "City Ithaca\n", "Colors Reported NaN\n", "Shape Reported TRIANGLE\n", "State NY\n", "Time 6/1/1930 22:00\n", "Name: 0, dtype: object" ] }, "execution_count": 155, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# row 0, all columns\n", "ufo.loc[0, :]" ] }, { "cell_type": "code", "execution_count": 156, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
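{ "cell_type": "markdown", "metadata": {}, "source": [ "**Note:** Passing a single row label to `loc` returns a Series (as shown above), whereas passing a one-item list of labels returns a one-row DataFrame. A minimal sketch (assumes the `ufo` DataFrame from above):" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# list of one label: returns a DataFrame rather than a Series\n", "ufo.loc[[0], :]" ] },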
" ], "text/plain": [ " City Colors Reported Shape Reported State Time\n", "0 Ithaca NaN TRIANGLE NY 6/1/1930 22:00\n", "1 Willingboro NaN OTHER NJ 6/30/1930 20:00\n", "2 Holyoke NaN OVAL CO 2/15/1931 14:00" ] }, "execution_count": 156, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# rows 0 and 1 and 2, all columns\n", "ufo.loc[[0, 1, 2], :]" ] }, { "cell_type": "code", "execution_count": 157, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " City Colors Reported Shape Reported State Time\n", "0 Ithaca NaN TRIANGLE NY 6/1/1930 22:00\n", "1 Willingboro NaN OTHER NJ 6/30/1930 20:00\n", "2 Holyoke NaN OVAL CO 2/15/1931 14:00" ] }, "execution_count": 157, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# rows 0 through 2 (inclusive), all columns\n", "ufo.loc[0:2, :]" ] }, { "cell_type": "code", "execution_count": 158, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " City Colors Reported Shape Reported State Time\n", "0 Ithaca NaN TRIANGLE NY 6/1/1930 22:00\n", "1 Willingboro NaN OTHER NJ 6/30/1930 20:00\n", "2 Holyoke NaN OVAL CO 2/15/1931 14:00" ] }, "execution_count": 158, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# this implies \"all columns\", but explicitly stating \"all columns\" is better\n", "ufo.loc[0:2]" ] }, { "cell_type": "code", "execution_count": 159, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 Ithaca\n", "1 Willingboro\n", "2 Holyoke\n", "Name: City, dtype: object" ] }, "execution_count": 159, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# rows 0 through 2 (inclusive), column 'City'\n", "ufo.loc[0:2, 'City']" ] }, { "cell_type": "code", "execution_count": 160, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " City State\n", "0 Ithaca NY\n", "1 Willingboro NJ\n", "2 Holyoke CO" ] }, "execution_count": 160, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# rows 0 through 2 (inclusive), columns 'City' and 'State'\n", "ufo.loc[0:2, ['City', 'State']]" ] }, { "cell_type": "code", "execution_count": 161, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " City State\n", "0 Ithaca NY\n", "1 Willingboro NJ\n", "2 Holyoke CO" ] }, "execution_count": 161, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# accomplish the same thing using double brackets - but using 'loc' is preferred since it's more explicit\n", "ufo[['City', 'State']].head(3)" ] }, { "cell_type": "code", "execution_count": 162, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " City Colors Reported Shape Reported State\n", "0 Ithaca NaN TRIANGLE NY\n", "1 Willingboro NaN OTHER NJ\n", "2 Holyoke NaN OVAL CO" ] }, "execution_count": 162, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# rows 0 through 2 (inclusive), columns 'City' through 'State' (inclusive)\n", "ufo.loc[0:2, 'City':'State']" ] }, { "cell_type": "code", "execution_count": 163, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
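{ "cell_type": "markdown", "metadata": {}, "source": [ "**Note:** The row and column selections shown above can be mixed freely. Here is a minimal sketch that pairs a boolean Series (for the rows) with a list of labels (for the columns); it assumes the `ufo` DataFrame from above." ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# rows in which the 'State' is 'NY', columns 'City' and 'Time'\n", "ufo.loc[ufo.State=='NY', ['City', 'Time']].head()" ] },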
" ], "text/plain": [ " City Colors Reported Shape Reported State\n", "0 Ithaca NaN TRIANGLE NY\n", "1 Willingboro NaN OTHER NJ\n", "2 Holyoke NaN OVAL CO" ] }, "execution_count": 163, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# accomplish the same thing using 'head' and 'drop'\n", "ufo.head(3).drop('Time', axis=1)" ] }, { "cell_type": "code", "execution_count": 164, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1694 CA\n", "2144 CA\n", "4686 MD\n", "7293 CA\n", "8488 CA\n", "8768 CA\n", "10816 OR\n", "10948 CA\n", "11045 CA\n", "12322 CA\n", "12941 CA\n", "16803 MD\n", "17322 CA\n", "Name: State, dtype: object" ] }, "execution_count": 164, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# rows in which the 'City' is 'Oakland', column 'State'\n", "ufo.loc[ufo.City=='Oakland', 'State']" ] }, { "cell_type": "code", "execution_count": 165, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1694 CA\n", "2144 CA\n", "4686 MD\n", "7293 CA\n", "8488 CA\n", "8768 CA\n", "10816 OR\n", "10948 CA\n", "11045 CA\n", "12322 CA\n", "12941 CA\n", "16803 MD\n", "17322 CA\n", "Name: State, dtype: object" ] }, "execution_count": 165, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# accomplish the same thing using \"chained indexing\" - but using 'loc' is preferred since chained indexing can cause problems\n", "ufo[ufo.City=='Oakland'].State" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [**`iloc`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iloc.html) method is used to select rows and columns by **integer position**. You can pass it:\n", "\n", "- A single integer position\n", "- A list of integer positions\n", "- A slice of integer positions\n", "- A colon (which indicates \"all integer positions\")" ] }, { "cell_type": "code", "execution_count": 166, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
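{ "cell_type": "markdown", "metadata": {}, "source": [ "**Note:** Because `iloc` works with integer positions, negative positions count from the end, just like regular Python slicing. A minimal sketch (assumes the `ufo` DataFrame from above):" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# rows in the last 3 positions, all columns\n", "ufo.iloc[-3:, :]" ] },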
" ], "text/plain": [ " City State\n", "0 Ithaca NY\n", "1 Willingboro NJ" ] }, "execution_count": 166, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# rows in positions 0 and 1, columns in positions 0 and 3\n", "ufo.iloc[[0, 1], [0, 3]]" ] }, { "cell_type": "code", "execution_count": 167, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " City Colors Reported Shape Reported State\n", "0 Ithaca NaN TRIANGLE NY\n", "1 Willingboro NaN OTHER NJ" ] }, "execution_count": 167, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# rows in positions 0 through 2 (exclusive), columns in positions 0 through 4 (exclusive)\n", "ufo.iloc[0:2, 0:4]" ] }, { "cell_type": "code", "execution_count": 168, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " City Colors Reported Shape Reported State Time\n", "0 Ithaca NaN TRIANGLE NY 6/1/1930 22:00\n", "1 Willingboro NaN OTHER NJ 6/30/1930 20:00" ] }, "execution_count": 168, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# rows in positions 0 through 2 (exclusive), all columns\n", "ufo.iloc[0:2, :]" ] }, { "cell_type": "code", "execution_count": 169, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " City Colors Reported Shape Reported State Time\n", "0 Ithaca NaN TRIANGLE NY 6/1/1930 22:00\n", "1 Willingboro NaN OTHER NJ 6/30/1930 20:00" ] }, "execution_count": 169, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# accomplish the same thing - but using 'iloc' is preferred since it's more explicit\n", "ufo[0:2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [**`ix`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.ix.html) method is used to select rows and columns by **label or integer position**, and should only be used when you need to mix label-based and integer-based selection in the same call." ] }, { "cell_type": "code", "execution_count": 170, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " beer_servings spirit_servings wine_servings \\\n", "country \n", "Afghanistan 0 0 0 \n", "Albania 89 132 54 \n", "Algeria 25 0 14 \n", "Andorra 245 138 312 \n", "Angola 217 57 45 \n", "\n", " total_litres_of_pure_alcohol continent \n", "country \n", "Afghanistan 0.0 Asia \n", "Albania 4.9 Europe \n", "Algeria 0.7 Africa \n", "Andorra 12.4 Europe \n", "Angola 5.9 Africa " ] }, "execution_count": 170, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read a dataset of alcohol consumption into a DataFrame and set 'country' as the index\n", "drinks = pd.read_csv('http://bit.ly/drinksbycountry', index_col='country')\n", "drinks.head()" ] }, { "cell_type": "code", "execution_count": 171, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "89" ] }, "execution_count": 171, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# row with label 'Albania', column in position 0\n", "drinks.ix['Albania', 0]" ] }, { "cell_type": "code", "execution_count": 172, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "89" ] }, "execution_count": 172, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# row in position 1, column with label 'beer_servings'\n", "drinks.ix[1, 'beer_servings']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Rules for using numbers with `ix`:**\n", "\n", "- If the index is **strings**, numbers are treated as **integer positions**, and thus slices are **exclusive** on the right.\n", "- If the index is **integers**, numbers are treated as **labels**, and thus slices are **inclusive**." ] }, { "cell_type": "code", "execution_count": 173, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
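{ "cell_type": "markdown", "metadata": {}, "source": [ "**Note:** `ix` has since been deprecated (and removed in newer versions of pandas). The same mixed selections can be written with `loc` and `iloc` by translating between labels and positions first. A minimal sketch (assumes the `drinks` DataFrame indexed by country, as above):" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# label-based row, positional column: convert the position to a label\n", "drinks.loc['Albania', drinks.columns[0]]\n", "\n", "# positional row, label-based column: convert the label to a position\n", "drinks.iloc[1, drinks.columns.get_loc('beer_servings')]" ] },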
" ], "text/plain": [ " beer_servings spirit_servings\n", "country \n", "Albania 89 132\n", "Algeria 25 0\n", "Andorra 245 138" ] }, "execution_count": 173, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# rows 'Albania' through 'Andorra' (inclusive), columns in positions 0 through 2 (exclusive)\n", "drinks.ix['Albania':'Andorra', 0:2]" ] }, { "cell_type": "code", "execution_count": 174, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " City Colors Reported\n", "0 Ithaca NaN\n", "1 Willingboro NaN\n", "2 Holyoke NaN" ] }, "execution_count": 174, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# rows 0 through 2 (inclusive), columns in positions 0 through 2 (exclusive)\n", "ufo.ix[0:2, 0:2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Summary of the pandas API for selection](https://github.com/pydata/pandas/issues/9595)\n", "\n", "[Back to top]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 20. When should I use the \"inplace\" parameter in pandas? ([video](https://www.youtube.com/watch?v=XaCSdr7pPmY&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=20))" ] }, { "cell_type": "code", "execution_count": 175, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " City Colors Reported Shape Reported State Time\n", "0 Ithaca NaN TRIANGLE NY 6/1/1930 22:00\n", "1 Willingboro NaN OTHER NJ 6/30/1930 20:00\n", "2 Holyoke NaN OVAL CO 2/15/1931 14:00\n", "3 Abilene NaN DISK KS 6/1/1931 13:00\n", "4 New York Worlds Fair NaN LIGHT NY 4/18/1933 19:00" ] }, "execution_count": 175, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read a dataset of UFO reports into a DataFrame\n", "ufo = pd.read_csv('http://bit.ly/uforeports')\n", "ufo.head()" ] }, { "cell_type": "code", "execution_count": 176, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(18241, 5)" ] }, "execution_count": 176, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ufo.shape" ] }, { "cell_type": "code", "execution_count": 177, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Colors Reported Shape Reported State Time\n", "0 NaN TRIANGLE NY 6/1/1930 22:00\n", "1 NaN OTHER NJ 6/30/1930 20:00\n", "2 NaN OVAL CO 2/15/1931 14:00\n", "3 NaN DISK KS 6/1/1931 13:00\n", "4 NaN LIGHT NY 4/18/1933 19:00" ] }, "execution_count": 177, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# remove the 'City' column (doesn't affect the DataFrame since inplace=False)\n", "ufo.drop('City', axis=1).head()" ] }, { "cell_type": "code", "execution_count": 178, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " City Colors Reported Shape Reported State Time\n", "0 Ithaca NaN TRIANGLE NY 6/1/1930 22:00\n", "1 Willingboro NaN OTHER NJ 6/30/1930 20:00\n", "2 Holyoke NaN OVAL CO 2/15/1931 14:00\n", "3 Abilene NaN DISK KS 6/1/1931 13:00\n", "4 New York Worlds Fair NaN LIGHT NY 4/18/1933 19:00" ] }, "execution_count": 178, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# confirm that the 'City' column was not actually removed\n", "ufo.head()" ] }, { "cell_type": "code", "execution_count": 179, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# remove the 'City' column (does affect the DataFrame since inplace=True)\n", "ufo.drop('City', axis=1, inplace=True)" ] }, { "cell_type": "code", "execution_count": 180, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Colors Reported Shape Reported State Time\n", "0 NaN TRIANGLE NY 6/1/1930 22:00\n", "1 NaN OTHER NJ 6/30/1930 20:00\n", "2 NaN OVAL CO 2/15/1931 14:00\n", "3 NaN DISK KS 6/1/1931 13:00\n", "4 NaN LIGHT NY 4/18/1933 19:00" ] }, "execution_count": 180, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# confirm that the 'City' column was actually removed\n", "ufo.head()" ] }, { "cell_type": "code", "execution_count": 181, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(2490, 4)" ] }, "execution_count": 181, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# drop a row if any value is missing from that row (doesn't affect the DataFrame since inplace=False)\n", "ufo.dropna(how='any').shape" ] }, { "cell_type": "code", "execution_count": 182, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(18241, 4)" ] }, "execution_count": 182, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# confirm that no rows were actually removed\n", "ufo.shape" ] }, { "cell_type": "code", "execution_count": 183, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Colors Reported Shape Reported State\n", "Time \n", "12/31/2000 23:00 NaN TRIANGLE IL\n", "12/31/2000 23:00 NaN DISK IA\n", "12/31/2000 23:45 NaN NaN WI\n", "12/31/2000 23:45 RED LIGHT WI\n", "12/31/2000 23:59 NaN OVAL FL" ] }, "execution_count": 183, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# use an assignment statement instead of the 'inplace' parameter\n", "ufo = ufo.set_index('Time')\n", "ufo.tail()" ] }, { "cell_type": "code", "execution_count": 184, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
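{ "cell_type": "markdown", "metadata": {}, "source": [ "**Note:** A common gotcha is that a method called with `inplace=True` returns `None`, so its result should not be assigned back to a variable. A minimal sketch (it works on a copy so the `ufo` DataFrame above is left unchanged):" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# dropna with inplace=True modifies the copy and returns None\n", "ufo_copy = ufo.copy()\n", "print(ufo_copy.dropna(how='any', inplace=True))" ] },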
" ], "text/plain": [ " Colors Reported Shape Reported State\n", "Time \n", "12/31/2000 23:00 RED TRIANGLE IL\n", "12/31/2000 23:00 RED DISK IA\n", "12/31/2000 23:45 RED LIGHT WI\n", "12/31/2000 23:45 RED LIGHT WI\n", "12/31/2000 23:59 NaN OVAL FL" ] }, "execution_count": 184, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# fill missing values using \"backward fill\" strategy (doesn't affect the DataFrame since inplace=False)\n", "ufo.fillna(method='bfill').tail()" ] }, { "cell_type": "code", "execution_count": 185, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
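{ "cell_type": "markdown", "metadata": {}, "source": [ "**Note:** Newer versions of pandas also provide **`bfill`** and **`ffill`** methods as shortcuts for these fill strategies (and deprecate `fillna(method=...)`). A minimal sketch (assumes the `ufo` DataFrame from above):" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# equivalent to fillna(method='bfill') - doesn't affect the DataFrame\n", "ufo.bfill().tail()" ] },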
" ], "text/plain": [ " Colors Reported Shape Reported State\n", "Time \n", "12/31/2000 23:00 RED TRIANGLE IL\n", "12/31/2000 23:00 RED DISK IA\n", "12/31/2000 23:45 RED DISK WI\n", "12/31/2000 23:45 RED LIGHT WI\n", "12/31/2000 23:59 RED OVAL FL" ] }, "execution_count": 185, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# compare with \"forward fill\" strategy (doesn't affect the DataFrame since inplace=False)\n", "ufo.fillna(method='ffill').tail()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Back to top]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 21. How do I make my pandas DataFrame smaller and faster? ([video](https://www.youtube.com/watch?v=wDYDYGyN_cw&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=21))" ] }, { "cell_type": "code", "execution_count": 186, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " country beer_servings spirit_servings wine_servings \\\n", "0 Afghanistan 0 0 0 \n", "1 Albania 89 132 54 \n", "2 Algeria 25 0 14 \n", "3 Andorra 245 138 312 \n", "4 Angola 217 57 45 \n", "\n", " total_litres_of_pure_alcohol continent \n", "0 0.0 Asia \n", "1 4.9 Europe \n", "2 0.7 Africa \n", "3 12.4 Europe \n", "4 5.9 Africa " ] }, "execution_count": 186, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read a dataset of alcohol consumption into a DataFrame\n", "drinks = pd.read_csv('http://bit.ly/drinksbycountry')\n", "drinks.head()" ] }, { "cell_type": "code", "execution_count": 187, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 193 entries, 0 to 192\n", "Data columns (total 6 columns):\n", "country 193 non-null object\n", "beer_servings 193 non-null int64\n", "spirit_servings 193 non-null int64\n", "wine_servings 193 non-null int64\n", "total_litres_of_pure_alcohol 193 non-null float64\n", "continent 193 non-null object\n", "dtypes: float64(1), int64(3), object(2)\n", "memory usage: 9.1+ KB\n" ] } ], "source": [ "# exact memory usage is unknown because object columns are references elsewhere\n", "drinks.info()" ] }, { "cell_type": "code", "execution_count": 188, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 193 entries, 0 to 192\n", "Data columns (total 6 columns):\n", "country 193 non-null object\n", "beer_servings 193 non-null int64\n", "spirit_servings 193 non-null int64\n", "wine_servings 193 non-null int64\n", "total_litres_of_pure_alcohol 193 non-null float64\n", "continent 193 non-null object\n", "dtypes: float64(1), int64(3), object(2)\n", "memory usage: 24.4 KB\n" ] } ], "source": [ "# force pandas to calculate the true memory usage\n", "drinks.info(memory_usage='deep')" ] }, { "cell_type": "code", "execution_count": 189, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Index 72\n", "country 9500\n", "beer_servings 1544\n", "spirit_servings 1544\n", "wine_servings 1544\n", "total_litres_of_pure_alcohol 1544\n", "continent 9244\n", "dtype: int64" ] }, "execution_count": 189, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# calculate the memory usage for each Series (in bytes)\n", "drinks.memory_usage(deep=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`info`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.info.html) and [**`memory_usage`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.memory_usage.html)" ] }, { "cell_type": "code", "execution_count": 190, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "country object\n", "beer_servings int64\n", "spirit_servings int64\n", "wine_servings int64\n", "total_litres_of_pure_alcohol float64\n", "continent category\n", "dtype: object" ] }, "execution_count": 190, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# use the 'category' data type (new in pandas 0.15) to store the 'continent' strings as integers\n", "drinks['continent'] = drinks.continent.astype('category')\n", "drinks.dtypes" ] }, { "cell_type": "code", "execution_count": 191, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 Asia\n", "1 Europe\n", "2 Africa\n", "3 Europe\n", "4 Africa\n", "Name: continent, dtype: category\n", "Categories (6, object): [Africa, Asia, 
Europe, North America, Oceania, South America]" ] }, "execution_count": 191, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 'continent' Series appears to be unchanged\n", "drinks.continent.head()" ] }, { "cell_type": "code", "execution_count": 192, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 1\n", "1 2\n", "2 0\n", "3 2\n", "4 0\n", "dtype: int8" ] }, "execution_count": 192, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# strings are now encoded (0 means 'Africa', 1 means 'Asia', 2 means 'Europe', etc.)\n", "drinks.continent.cat.codes.head()" ] }, { "cell_type": "code", "execution_count": 193, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Index 72\n", "country 9500\n", "beer_servings 1544\n", "spirit_servings 1544\n", "wine_servings 1544\n", "total_litres_of_pure_alcohol 1544\n", "continent 488\n", "dtype: int64" ] }, "execution_count": 193, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# memory usage has been drastically reduced\n", "drinks.memory_usage(deep=True)" ] }, { "cell_type": "code", "execution_count": 194, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Index 72\n", "country 9886\n", "beer_servings 1544\n", "spirit_servings 1544\n", "wine_servings 1544\n", "total_litres_of_pure_alcohol 1544\n", "continent 488\n", "dtype: int64" ] }, "execution_count": 194, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# repeat this process for the 'country' Series\n", "drinks['country'] = drinks.country.astype('category')\n", "drinks.memory_usage(deep=True)" ] }, { "cell_type": "code", "execution_count": 195, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Index([u'Afghanistan', u'Albania', u'Algeria', u'Andorra', u'Angola',\n", " u'Antigua & Barbuda', u'Argentina', u'Armenia', u'Australia',\n", " u'Austria',\n", " ...\n", " u'United Arab Emirates', u'United Kingdom', u'Uruguay', u'Uzbekistan',\n", " u'Vanuatu', u'Venezuela', u'Vietnam', u'Yemen', u'Zambia', u'Zimbabwe'],\n", " dtype='object', length=193)" ] }, "execution_count": 195, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# memory usage increased because we created 193 categories\n", "drinks.country.cat.categories" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The **category** data type should only be used with a string Series that has a **small number of possible values**." ] }, { "cell_type": "code", "execution_count": 196, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
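{ "cell_type": "markdown", "metadata": {}, "source": [ "**Note:** A quick way to judge whether a string Series is a good candidate for the category type is to compare its number of unique values with its length. A minimal sketch (assumes the `drinks` DataFrame from above):" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# few unique values relative to the number of rows: a good candidate for 'category'\n", "drinks.continent.nunique(), drinks.continent.size" ] },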
" ], "text/plain": [ " ID quality\n", "0 100 good\n", "1 101 very good\n", "2 102 good\n", "3 103 excellent" ] }, "execution_count": 196, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create a small DataFrame from a dictionary\n", "df = pd.DataFrame({'ID':[100, 101, 102, 103], 'quality':['good', 'very good', 'good', 'excellent']})\n", "df" ] }, { "cell_type": "code", "execution_count": 197, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " ID quality\n", "3 103 excellent\n", "0 100 good\n", "2 102 good\n", "1 101 very good" ] }, "execution_count": 197, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# sort the DataFrame by the 'quality' Series (alphabetical order)\n", "df.sort_values('quality')" ] }, { "cell_type": "code", "execution_count": 198, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 good\n", "1 very good\n", "2 good\n", "3 excellent\n", "Name: quality, dtype: category\n", "Categories (3, object): [good < very good < excellent]" ] }, "execution_count": 198, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# define a logical ordering for the categories\n", "df['quality'] = df.quality.astype('category', categories=['good', 'very good', 'excellent'], ordered=True)\n", "df.quality" ] }, { "cell_type": "code", "execution_count": 199, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
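{ "cell_type": "markdown", "metadata": {}, "source": [ "**Note:** In newer versions of pandas, `astype` no longer accepts `categories` and `ordered` directly; instead, you build a `CategoricalDtype` and pass that to `astype`. A minimal sketch of the equivalent conversion (assumes the `df` DataFrame from above):" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# define the same logical ordering using a CategoricalDtype\n", "from pandas.api.types import CategoricalDtype\n", "quality_dtype = CategoricalDtype(categories=['good', 'very good', 'excellent'], ordered=True)\n", "df['quality'] = df.quality.astype(quality_dtype)\n", "df.quality" ] },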
" ], "text/plain": [ " ID quality\n", "0 100 good\n", "2 102 good\n", "1 101 very good\n", "3 103 excellent" ] }, "execution_count": 199, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# sort the DataFrame by the 'quality' Series (logical order)\n", "df.sort_values('quality')" ] }, { "cell_type": "code", "execution_count": 200, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " ID quality\n", "1 101 very good\n", "3 103 excellent" ] }, "execution_count": 200, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# comparison operators work with ordered categories\n", "df.loc[df.quality > 'good', :]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Overview of categorical data in pandas](http://pandas.pydata.org/pandas-docs/stable/categorical.html)\n", "\n", "[API reference for categorical methods](http://pandas.pydata.org/pandas-docs/stable/api.html#categorical)\n", "\n", "[Back to top]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 22. How do I use pandas with scikit-learn to create Kaggle submissions? ([video](https://www.youtube.com/watch?v=ylRlGCtAtiE&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=22))" ] }, { "cell_type": "code", "execution_count": 201, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "0 1 0 3 \n", "1 2 1 1 \n", "2 3 1 3 \n", "3 4 1 1 \n", "4 5 0 3 \n", "\n", " Name Sex Age SibSp \\\n", "0 Braund, Mr. Owen Harris male 22.0 1 \n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n", "2 Heikkinen, Miss. Laina female 26.0 0 \n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n", "4 Allen, Mr. William Henry male 35.0 0 \n", "\n", " Parch Ticket Fare Cabin Embarked \n", "0 0 A/5 21171 7.2500 NaN S \n", "1 0 PC 17599 71.2833 C85 C \n", "2 0 STON/O2. 3101282 7.9250 NaN S \n", "3 0 113803 53.1000 C123 S \n", "4 0 373450 8.0500 NaN S " ] }, "execution_count": 201, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read the training dataset from Kaggle's Titanic competition into a DataFrame\n", "train = pd.read_csv('http://bit.ly/kaggletrain')\n", "train.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Goal:** Predict passenger survival aboard the Titanic based on [passenger attributes](https://www.kaggle.com/c/titanic/data)\n", "\n", "**Video:** [What is machine learning, and how does it work?](https://www.youtube.com/watch?v=elojMnjn4kk&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=1)" ] }, { "cell_type": "code", "execution_count": 202, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(891, 2)" ] }, "execution_count": 202, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create a feature matrix 'X' by selecting two DataFrame columns\n", "feature_cols = ['Pclass', 'Parch']\n", "X = train.loc[:, feature_cols]\n", "X.shape" ] }, { "cell_type": "code", "execution_count": 203, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(891L,)" ] }, "execution_count": 203, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create a response vector 'y' by selecting a Series\n", "y = train.Survived\n", "y.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note:** There is no need to convert these pandas objects to NumPy arrays. scikit-learn will understand these objects as long as they are entirely numeric and the proper shapes." ] }, { "cell_type": "code", "execution_count": 204, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n", " intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,\n", " penalty='l2', random_state=None, solver='liblinear', tol=0.0001,\n", " verbose=0, warm_start=False)" ] }, "execution_count": 204, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# fit a classification model to the training data\n", "from sklearn.linear_model import LogisticRegression\n", "logreg = LogisticRegression()\n", "logreg.fit(X, y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Video series:** [Introduction to machine learning with scikit-learn](https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A)" ] }, { "cell_type": "code", "execution_count": 205, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " PassengerId Pclass Name Sex \\\n", "0 892 3 Kelly, Mr. James male \n", "1 893 3 Wilkes, Mrs. James (Ellen Needs) female \n", "2 894 2 Myles, Mr. Thomas Francis male \n", "3 895 3 Wirz, Mr. Albert male \n", "4 896 3 Hirvonen, Mrs. Alexander (Helga E Lindqvist) female \n", "\n", " Age SibSp Parch Ticket Fare Cabin Embarked \n", "0 34.5 0 0 330911 7.8292 NaN Q \n", "1 47.0 1 0 363272 7.0000 NaN S \n", "2 62.0 0 0 240276 9.6875 NaN Q \n", "3 27.0 0 0 315154 8.6625 NaN S \n", "4 22.0 1 1 3101298 12.2875 NaN S " ] }, "execution_count": 205, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read the testing dataset from Kaggle's Titanic competition into a DataFrame\n", "test = pd.read_csv('http://bit.ly/kaggletest')\n", "test.head()" ] }, { "cell_type": "code", "execution_count": 206, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(418, 2)" ] }, "execution_count": 206, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create a feature matrix from the testing data that matches the training data\n", "X_new = test.loc[:, feature_cols]\n", "X_new.shape" ] }, { "cell_type": "code", "execution_count": 207, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# use the fitted model to make predictions for the testing set observations\n", "new_pred_class = logreg.predict(X_new)" ] }, { "cell_type": "code", "execution_count": 208, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvived
08920
18930
28940
38950
48960
\n", "
" ], "text/plain": [ " PassengerId Survived\n", "0 892 0\n", "1 893 0\n", "2 894 0\n", "3 895 0\n", "4 896 0" ] }, "execution_count": 208, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create a DataFrame of passenger IDs and testing set predictions\n", "pd.DataFrame({'PassengerId':test.PassengerId, 'Survived':new_pred_class}).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for the [**`DataFrame`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html) constructor" ] }, { "cell_type": "code", "execution_count": 209, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Survived
PassengerId
8920
8930
8940
8950
8960
\n", "
" ], "text/plain": [ " Survived\n", "PassengerId \n", "892 0\n", "893 0\n", "894 0\n", "895 0\n", "896 0" ] }, "execution_count": 209, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# ensure that PassengerID is the first column by setting it as the index\n", "pd.DataFrame({'PassengerId':test.PassengerId, 'Survived':new_pred_class}).set_index('PassengerId').head()" ] }, { "cell_type": "code", "execution_count": 210, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# write the DataFrame to a CSV file that can be submitted to Kaggle\n", "pd.DataFrame({'PassengerId':test.PassengerId, 'Survived':new_pred_class}).set_index('PassengerId').to_csv('sub.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`to_csv`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html)" ] }, { "cell_type": "code", "execution_count": 211, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# save a DataFrame to disk (\"pickle it\")\n", "train.to_pickle('train.pkl')" ] }, { "cell_type": "code", "execution_count": 212, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS
\n", "
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "0 1 0 3 \n", "1 2 1 1 \n", "2 3 1 3 \n", "3 4 1 1 \n", "4 5 0 3 \n", "\n", " Name Sex Age SibSp \\\n", "0 Braund, Mr. Owen Harris male 22.0 1 \n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n", "2 Heikkinen, Miss. Laina female 26.0 0 \n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n", "4 Allen, Mr. William Henry male 35.0 0 \n", "\n", " Parch Ticket Fare Cabin Embarked \n", "0 0 A/5 21171 7.2500 NaN S \n", "1 0 PC 17599 71.2833 C85 C \n", "2 0 STON/O2. 3101282 7.9250 NaN S \n", "3 0 113803 53.1000 C123 S \n", "4 0 373450 8.0500 NaN S " ] }, "execution_count": 212, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read a pickled object from disk (\"unpickle it\")\n", "pd.read_pickle('train.pkl').head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`to_pickle`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_pickle.html) and [**`read_pickle`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_pickle.html)\n", "\n", "[Back to top]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 23. More of your pandas questions answered! ([video](https://www.youtube.com/watch?v=oH3wYKvwpJ8&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=23))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question:** Could you explain how to read the pandas documentation?\n", "\n", "[pandas API reference](http://pandas.pydata.org/pandas-docs/stable/api.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question:** What is the difference between **`ufo.isnull()`** and **`pd.isnull(ufo)`**?" ] }, { "cell_type": "code", "execution_count": 213, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CityColors ReportedShape ReportedStateTime
0IthacaNaNTRIANGLENY6/1/1930 22:00
1WillingboroNaNOTHERNJ6/30/1930 20:00
2HolyokeNaNOVALCO2/15/1931 14:00
3AbileneNaNDISKKS6/1/1931 13:00
4New York Worlds FairNaNLIGHTNY4/18/1933 19:00
\n", "
" ], "text/plain": [ " City Colors Reported Shape Reported State Time\n", "0 Ithaca NaN TRIANGLE NY 6/1/1930 22:00\n", "1 Willingboro NaN OTHER NJ 6/30/1930 20:00\n", "2 Holyoke NaN OVAL CO 2/15/1931 14:00\n", "3 Abilene NaN DISK KS 6/1/1931 13:00\n", "4 New York Worlds Fair NaN LIGHT NY 4/18/1933 19:00" ] }, "execution_count": 213, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read a dataset of UFO reports into a DataFrame\n", "ufo = pd.read_csv('http://bit.ly/uforeports')\n", "ufo.head()" ] }, { "cell_type": "code", "execution_count": 214, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CityColors ReportedShape ReportedStateTime
0FalseTrueFalseFalseFalse
1FalseTrueFalseFalseFalse
2FalseTrueFalseFalseFalse
3FalseTrueFalseFalseFalse
4FalseTrueFalseFalseFalse
\n", "
" ], "text/plain": [ " City Colors Reported Shape Reported State Time\n", "0 False True False False False\n", "1 False True False False False\n", "2 False True False False False\n", "3 False True False False False\n", "4 False True False False False" ] }, "execution_count": 214, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# use 'isnull' as a top-level function\n", "pd.isnull(ufo).head()" ] }, { "cell_type": "code", "execution_count": 215, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CityColors ReportedShape ReportedStateTime
0FalseTrueFalseFalseFalse
1FalseTrueFalseFalseFalse
2FalseTrueFalseFalseFalse
3FalseTrueFalseFalseFalse
4FalseTrueFalseFalseFalse
\n", "
" ], "text/plain": [ " City Colors Reported Shape Reported State Time\n", "0 False True False False False\n", "1 False True False False False\n", "2 False True False False False\n", "3 False True False False False\n", "4 False True False False False" ] }, "execution_count": 215, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# equivalent: use 'isnull' as a DataFrame method\n", "ufo.isnull().head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`isnull`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.isnull.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question:** Why are DataFrame slices inclusive when using **`.loc`**, but exclusive when using **`.iloc`**?" ] }, { "cell_type": "code", "execution_count": 216, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CityColors ReportedShape ReportedStateTime
0IthacaNaNTRIANGLENY6/1/1930 22:00
1WillingboroNaNOTHERNJ6/30/1930 20:00
2HolyokeNaNOVALCO2/15/1931 14:00
3AbileneNaNDISKKS6/1/1931 13:00
4New York Worlds FairNaNLIGHTNY4/18/1933 19:00
\n", "
" ], "text/plain": [ " City Colors Reported Shape Reported State Time\n", "0 Ithaca NaN TRIANGLE NY 6/1/1930 22:00\n", "1 Willingboro NaN OTHER NJ 6/30/1930 20:00\n", "2 Holyoke NaN OVAL CO 2/15/1931 14:00\n", "3 Abilene NaN DISK KS 6/1/1931 13:00\n", "4 New York Worlds Fair NaN LIGHT NY 4/18/1933 19:00" ] }, "execution_count": 216, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# label-based slicing is inclusive of the start and stop\n", "ufo.loc[0:4, :]" ] }, { "cell_type": "code", "execution_count": 217, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CityColors ReportedShape ReportedStateTime
0IthacaNaNTRIANGLENY6/1/1930 22:00
1WillingboroNaNOTHERNJ6/30/1930 20:00
2HolyokeNaNOVALCO2/15/1931 14:00
3AbileneNaNDISKKS6/1/1931 13:00
\n", "
" ], "text/plain": [ " City Colors Reported Shape Reported State Time\n", "0 Ithaca NaN TRIANGLE NY 6/1/1930 22:00\n", "1 Willingboro NaN OTHER NJ 6/30/1930 20:00\n", "2 Holyoke NaN OVAL CO 2/15/1931 14:00\n", "3 Abilene NaN DISK KS 6/1/1931 13:00" ] }, "execution_count": 217, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# position-based slicing is inclusive of the start and exclusive of the stop\n", "ufo.iloc[0:4, :]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`loc`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.loc.html) and [**`iloc`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iloc.html)" ] }, { "cell_type": "code", "execution_count": 218, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([['Ithaca', nan, 'TRIANGLE', 'NY', '6/1/1930 22:00'],\n", " ['Willingboro', nan, 'OTHER', 'NJ', '6/30/1930 20:00'],\n", " ['Holyoke', nan, 'OVAL', 'CO', '2/15/1931 14:00'],\n", " ['Abilene', nan, 'DISK', 'KS', '6/1/1931 13:00']], dtype=object)" ] }, "execution_count": 218, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 'iloc' is simply following NumPy's slicing convention...\n", "ufo.values[0:4, :]" ] }, { "cell_type": "code", "execution_count": 219, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'pyth'" ] }, "execution_count": 219, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# ...and NumPy is simply following Python's slicing convention\n", "'python'[0:4]" ] }, { "cell_type": "code", "execution_count": 220, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CityColors ReportedShape ReportedState
0IthacaNaNTRIANGLENY
1WillingboroNaNOTHERNJ
2HolyokeNaNOVALCO
3AbileneNaNDISKKS
4New York Worlds FairNaNLIGHTNY
\n", "
" ], "text/plain": [ " City Colors Reported Shape Reported State\n", "0 Ithaca NaN TRIANGLE NY\n", "1 Willingboro NaN OTHER NJ\n", "2 Holyoke NaN OVAL CO\n", "3 Abilene NaN DISK KS\n", "4 New York Worlds Fair NaN LIGHT NY" ] }, "execution_count": 220, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 'loc' is inclusive of the stopping label because you don't necessarily know what label will come after it\n", "ufo.loc[0:4, 'City':'State']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question:** How do I randomly sample rows from a DataFrame?" ] }, { "cell_type": "code", "execution_count": 221, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CityColors ReportedShape ReportedStateTime
12192WinstonGREENLIGHTOR9/23/1998 21:00
1775Lake WalesNaNDISKFL1/20/1969 19:00
3141Cannon AFBNaNDISKNM1/6/1976 1:00
\n", "
" ], "text/plain": [ " City Colors Reported Shape Reported State Time\n", "12192 Winston GREEN LIGHT OR 9/23/1998 21:00\n", "1775 Lake Wales NaN DISK FL 1/20/1969 19:00\n", "3141 Cannon AFB NaN DISK NM 1/6/1976 1:00" ] }, "execution_count": 221, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# sample 3 rows from the DataFrame without replacement (new in pandas 0.16.1)\n", "ufo.sample(n=3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`sample`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sample.html)" ] }, { "cell_type": "code", "execution_count": 222, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CityColors ReportedShape ReportedStateTime
217NorridgewockNaNDISKME9/15/1952 14:00
12282IpavaNaNTRIANGLEIL10/1/1998 21:15
17933EllinwoodNaNFIREBALLKS11/13/2000 22:00
\n", "
" ], "text/plain": [ " City Colors Reported Shape Reported State Time\n", "217 Norridgewock NaN DISK ME 9/15/1952 14:00\n", "12282 Ipava NaN TRIANGLE IL 10/1/1998 21:15\n", "17933 Ellinwood NaN FIREBALL KS 11/13/2000 22:00" ] }, "execution_count": 222, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# use the 'random_state' parameter for reproducibility\n", "ufo.sample(n=3, random_state=42)" ] }, { "cell_type": "code", "execution_count": 223, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# sample 75% of the DataFrame's rows without replacement\n", "train = ufo.sample(frac=0.75, random_state=99)" ] }, { "cell_type": "code", "execution_count": 224, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# store the remaining 25% of the rows in another DataFrame\n", "test = ufo.loc[~ufo.index.isin(train.index), :]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`isin`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Index.isin.html)\n", "\n", "[Back to top]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 24. How do I create dummy variables in pandas? ([video](https://www.youtube.com/watch?v=0s_1IsROgDc&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=24))" ] }, { "cell_type": "code", "execution_count": 225, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS
\n", "
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "0 1 0 3 \n", "1 2 1 1 \n", "2 3 1 3 \n", "3 4 1 1 \n", "4 5 0 3 \n", "\n", " Name Sex Age SibSp \\\n", "0 Braund, Mr. Owen Harris male 22.0 1 \n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n", "2 Heikkinen, Miss. Laina female 26.0 0 \n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n", "4 Allen, Mr. William Henry male 35.0 0 \n", "\n", " Parch Ticket Fare Cabin Embarked \n", "0 0 A/5 21171 7.2500 NaN S \n", "1 0 PC 17599 71.2833 C85 C \n", "2 0 STON/O2. 3101282 7.9250 NaN S \n", "3 0 113803 53.1000 C123 S \n", "4 0 373450 8.0500 NaN S " ] }, "execution_count": 225, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read the training dataset from Kaggle's Titanic competition\n", "train = pd.read_csv('http://bit.ly/kaggletrain')\n", "train.head()" ] }, { "cell_type": "code", "execution_count": 226, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarkedSex_male
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS1
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C0
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS0
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S0
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS1
\n", "
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "0 1 0 3 \n", "1 2 1 1 \n", "2 3 1 3 \n", "3 4 1 1 \n", "4 5 0 3 \n", "\n", " Name Sex Age SibSp \\\n", "0 Braund, Mr. Owen Harris male 22.0 1 \n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n", "2 Heikkinen, Miss. Laina female 26.0 0 \n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n", "4 Allen, Mr. William Henry male 35.0 0 \n", "\n", " Parch Ticket Fare Cabin Embarked Sex_male \n", "0 0 A/5 21171 7.2500 NaN S 1 \n", "1 0 PC 17599 71.2833 C85 C 0 \n", "2 0 STON/O2. 3101282 7.9250 NaN S 0 \n", "3 0 113803 53.1000 C123 S 0 \n", "4 0 373450 8.0500 NaN S 1 " ] }, "execution_count": 226, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create the 'Sex_male' dummy variable using the 'map' method\n", "train['Sex_male'] = train.Sex.map({'female':0, 'male':1})\n", "train.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`map`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.map.html)" ] }, { "cell_type": "code", "execution_count": 227, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
femalemale
00.01.0
11.00.0
21.00.0
31.00.0
40.01.0
\n", "
" ], "text/plain": [ " female male\n", "0 0.0 1.0\n", "1 1.0 0.0\n", "2 1.0 0.0\n", "3 1.0 0.0\n", "4 0.0 1.0" ] }, "execution_count": 227, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# alternative: use 'get_dummies' to create one column for every possible value\n", "pd.get_dummies(train.Sex).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Generally speaking:\n", "\n", "- If you have **\"K\" possible values** for a categorical feature, you only need **\"K-1\" dummy variables** to capture all of the information about that feature.\n", "- One convention is to **drop the first dummy variable**, which defines that level as the \"baseline\"." ] }, { "cell_type": "code", "execution_count": 228, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
male
01.0
10.0
20.0
30.0
41.0
\n", "
" ], "text/plain": [ " male\n", "0 1.0\n", "1 0.0\n", "2 0.0\n", "3 0.0\n", "4 1.0" ] }, "execution_count": 228, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# drop the first dummy variable ('female') using the 'iloc' method\n", "pd.get_dummies(train.Sex).iloc[:, 1:].head()" ] }, { "cell_type": "code", "execution_count": 229, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Sex_male
01.0
10.0
20.0
30.0
41.0
\n", "
" ], "text/plain": [ " Sex_male\n", "0 1.0\n", "1 0.0\n", "2 0.0\n", "3 0.0\n", "4 1.0" ] }, "execution_count": 229, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# add a prefix to identify the source of the dummy variables\n", "pd.get_dummies(train.Sex, prefix='Sex').iloc[:, 1:].head()" ] }, { "cell_type": "code", "execution_count": 230, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Embarked_CEmbarked_QEmbarked_S
00.00.01.0
11.00.00.0
20.00.01.0
30.00.01.0
40.00.01.0
50.01.00.0
60.00.01.0
70.00.01.0
80.00.01.0
91.00.00.0
\n", "
" ], "text/plain": [ " Embarked_C Embarked_Q Embarked_S\n", "0 0.0 0.0 1.0\n", "1 1.0 0.0 0.0\n", "2 0.0 0.0 1.0\n", "3 0.0 0.0 1.0\n", "4 0.0 0.0 1.0\n", "5 0.0 1.0 0.0\n", "6 0.0 0.0 1.0\n", "7 0.0 0.0 1.0\n", "8 0.0 0.0 1.0\n", "9 1.0 0.0 0.0" ] }, "execution_count": 230, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# use 'get_dummies' with a feature that has 3 possible values\n", "pd.get_dummies(train.Embarked, prefix='Embarked').head(10)" ] }, { "cell_type": "code", "execution_count": 231, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Embarked_QEmbarked_S
00.01.0
10.00.0
20.01.0
30.01.0
40.01.0
51.00.0
60.01.0
70.01.0
80.01.0
90.00.0
\n", "
" ], "text/plain": [ " Embarked_Q Embarked_S\n", "0 0.0 1.0\n", "1 0.0 0.0\n", "2 0.0 1.0\n", "3 0.0 1.0\n", "4 0.0 1.0\n", "5 1.0 0.0\n", "6 0.0 1.0\n", "7 0.0 1.0\n", "8 0.0 1.0\n", "9 0.0 0.0" ] }, "execution_count": 231, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# drop the first dummy variable ('C')\n", "pd.get_dummies(train.Embarked, prefix='Embarked').iloc[:, 1:].head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How to translate these values back to the original 'Embarked' value:\n", "\n", "- **0, 0** means **C**\n", "- **1, 0** means **Q**\n", "- **0, 1** means **S**" ] }, { "cell_type": "code", "execution_count": 232, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarkedSex_maleEmbarked_QEmbarked_S
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS10.01.0
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C00.00.0
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS00.01.0
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S00.01.0
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS10.01.0
\n", "
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "0 1 0 3 \n", "1 2 1 1 \n", "2 3 1 3 \n", "3 4 1 1 \n", "4 5 0 3 \n", "\n", " Name Sex Age SibSp \\\n", "0 Braund, Mr. Owen Harris male 22.0 1 \n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n", "2 Heikkinen, Miss. Laina female 26.0 0 \n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n", "4 Allen, Mr. William Henry male 35.0 0 \n", "\n", " Parch Ticket Fare Cabin Embarked Sex_male Embarked_Q \\\n", "0 0 A/5 21171 7.2500 NaN S 1 0.0 \n", "1 0 PC 17599 71.2833 C85 C 0 0.0 \n", "2 0 STON/O2. 3101282 7.9250 NaN S 0 0.0 \n", "3 0 113803 53.1000 C123 S 0 0.0 \n", "4 0 373450 8.0500 NaN S 1 0.0 \n", "\n", " Embarked_S \n", "0 1.0 \n", "1 0.0 \n", "2 1.0 \n", "3 1.0 \n", "4 1.0 " ] }, "execution_count": 232, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# save the DataFrame of dummy variables and concatenate them to the original DataFrame\n", "embarked_dummies = pd.get_dummies(train.Embarked, prefix='Embarked').iloc[:, 1:]\n", "train = pd.concat([train, embarked_dummies], axis=1)\n", "train.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`concat`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html)" ] }, { "cell_type": "code", "execution_count": 233, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS
\n", "
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "0 1 0 3 \n", "1 2 1 1 \n", "2 3 1 3 \n", "3 4 1 1 \n", "4 5 0 3 \n", "\n", " Name Sex Age SibSp \\\n", "0 Braund, Mr. Owen Harris male 22.0 1 \n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n", "2 Heikkinen, Miss. Laina female 26.0 0 \n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n", "4 Allen, Mr. William Henry male 35.0 0 \n", "\n", " Parch Ticket Fare Cabin Embarked \n", "0 0 A/5 21171 7.2500 NaN S \n", "1 0 PC 17599 71.2833 C85 C \n", "2 0 STON/O2. 3101282 7.9250 NaN S \n", "3 0 113803 53.1000 C123 S \n", "4 0 373450 8.0500 NaN S " ] }, "execution_count": 233, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# reset the DataFrame\n", "train = pd.read_csv('http://bit.ly/kaggletrain')\n", "train.head()" ] }, { "cell_type": "code", "execution_count": 234, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameAgeSibSpParchTicketFareCabinSex_femaleSex_maleEmbarked_CEmbarked_QEmbarked_S
0103Braund, Mr. Owen Harris22.010A/5 211717.2500NaN0.01.00.00.01.0
1211Cumings, Mrs. John Bradley (Florence Briggs Th...38.010PC 1759971.2833C851.00.01.00.00.0
2313Heikkinen, Miss. Laina26.000STON/O2. 31012827.9250NaN1.00.00.00.01.0
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)35.01011380353.1000C1231.00.00.00.01.0
4503Allen, Mr. William Henry35.0003734508.0500NaN0.01.00.00.01.0
\n", "
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "0 1 0 3 \n", "1 2 1 1 \n", "2 3 1 3 \n", "3 4 1 1 \n", "4 5 0 3 \n", "\n", " Name Age SibSp Parch \\\n", "0 Braund, Mr. Owen Harris 22.0 1 0 \n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... 38.0 1 0 \n", "2 Heikkinen, Miss. Laina 26.0 0 0 \n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) 35.0 1 0 \n", "4 Allen, Mr. William Henry 35.0 0 0 \n", "\n", " Ticket Fare Cabin Sex_female Sex_male Embarked_C \\\n", "0 A/5 21171 7.2500 NaN 0.0 1.0 0.0 \n", "1 PC 17599 71.2833 C85 1.0 0.0 1.0 \n", "2 STON/O2. 3101282 7.9250 NaN 1.0 0.0 0.0 \n", "3 113803 53.1000 C123 1.0 0.0 0.0 \n", "4 373450 8.0500 NaN 0.0 1.0 0.0 \n", "\n", " Embarked_Q Embarked_S \n", "0 0.0 1.0 \n", "1 0.0 0.0 \n", "2 0.0 1.0 \n", "3 0.0 1.0 \n", "4 0.0 1.0 " ] }, "execution_count": 234, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# pass the DataFrame to 'get_dummies' and specify which columns to dummy (it drops the original columns)\n", "pd.get_dummies(train, columns=['Sex', 'Embarked']).head()" ] }, { "cell_type": "code", "execution_count": 235, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameAgeSibSpParchTicketFareCabinSex_maleEmbarked_QEmbarked_S
0103Braund, Mr. Owen Harris22.010A/5 211717.2500NaN1.00.01.0
1211Cumings, Mrs. John Bradley (Florence Briggs Th...38.010PC 1759971.2833C850.00.00.0
2313Heikkinen, Miss. Laina26.000STON/O2. 31012827.9250NaN0.00.01.0
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)35.01011380353.1000C1230.00.01.0
4503Allen, Mr. William Henry35.0003734508.0500NaN1.00.01.0
\n", "
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "0 1 0 3 \n", "1 2 1 1 \n", "2 3 1 3 \n", "3 4 1 1 \n", "4 5 0 3 \n", "\n", " Name Age SibSp Parch \\\n", "0 Braund, Mr. Owen Harris 22.0 1 0 \n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... 38.0 1 0 \n", "2 Heikkinen, Miss. Laina 26.0 0 0 \n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) 35.0 1 0 \n", "4 Allen, Mr. William Henry 35.0 0 0 \n", "\n", " Ticket Fare Cabin Sex_male Embarked_Q Embarked_S \n", "0 A/5 21171 7.2500 NaN 1.0 0.0 1.0 \n", "1 PC 17599 71.2833 C85 0.0 0.0 0.0 \n", "2 STON/O2. 3101282 7.9250 NaN 0.0 0.0 1.0 \n", "3 113803 53.1000 C123 0.0 0.0 1.0 \n", "4 373450 8.0500 NaN 1.0 0.0 1.0 " ] }, "execution_count": 235, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# use the 'drop_first' parameter (new in pandas 0.18) to drop the first dummy variable for each feature\n", "pd.get_dummies(train, columns=['Sex', 'Embarked'], drop_first=True).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`get_dummies`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.get_dummies.html)\n", "\n", "[Back to top]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 25. How do I work with dates and times in pandas? ([video](https://www.youtube.com/watch?v=yCgJGsg0Xa4&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=25))" ] }, { "cell_type": "code", "execution_count": 236, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CityColors ReportedShape ReportedStateTime
0IthacaNaNTRIANGLENY6/1/1930 22:00
1WillingboroNaNOTHERNJ6/30/1930 20:00
2HolyokeNaNOVALCO2/15/1931 14:00
3AbileneNaNDISKKS6/1/1931 13:00
4New York Worlds FairNaNLIGHTNY4/18/1933 19:00
\n", "
" ], "text/plain": [ " City Colors Reported Shape Reported State Time\n", "0 Ithaca NaN TRIANGLE NY 6/1/1930 22:00\n", "1 Willingboro NaN OTHER NJ 6/30/1930 20:00\n", "2 Holyoke NaN OVAL CO 2/15/1931 14:00\n", "3 Abilene NaN DISK KS 6/1/1931 13:00\n", "4 New York Worlds Fair NaN LIGHT NY 4/18/1933 19:00" ] }, "execution_count": 236, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read a dataset of UFO reports into a DataFrame\n", "ufo = pd.read_csv('http://bit.ly/uforeports')\n", "ufo.head()" ] }, { "cell_type": "code", "execution_count": 237, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "City object\n", "Colors Reported object\n", "Shape Reported object\n", "State object\n", "Time object\n", "dtype: object" ] }, "execution_count": 237, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 'Time' is currently stored as a string\n", "ufo.dtypes" ] }, { "cell_type": "code", "execution_count": 238, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 22\n", "1 20\n", "2 14\n", "3 13\n", "4 19\n", "Name: Time, dtype: int32" ] }, "execution_count": 238, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# hour could be accessed using string slicing, but this approach breaks too easily\n", "ufo.Time.str.slice(-5, -3).astype(int).head()" ] }, { "cell_type": "code", "execution_count": 239, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CityColors ReportedShape ReportedStateTime
0IthacaNaNTRIANGLENY1930-06-01 22:00:00
1WillingboroNaNOTHERNJ1930-06-30 20:00:00
2HolyokeNaNOVALCO1931-02-15 14:00:00
3AbileneNaNDISKKS1931-06-01 13:00:00
4New York Worlds FairNaNLIGHTNY1933-04-18 19:00:00
\n", "
" ], "text/plain": [ " City Colors Reported Shape Reported State \\\n", "0 Ithaca NaN TRIANGLE NY \n", "1 Willingboro NaN OTHER NJ \n", "2 Holyoke NaN OVAL CO \n", "3 Abilene NaN DISK KS \n", "4 New York Worlds Fair NaN LIGHT NY \n", "\n", " Time \n", "0 1930-06-01 22:00:00 \n", "1 1930-06-30 20:00:00 \n", "2 1931-02-15 14:00:00 \n", "3 1931-06-01 13:00:00 \n", "4 1933-04-18 19:00:00 " ] }, "execution_count": 239, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# convert 'Time' to datetime format\n", "ufo['Time'] = pd.to_datetime(ufo.Time)\n", "ufo.head()" ] }, { "cell_type": "code", "execution_count": 240, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "City object\n", "Colors Reported object\n", "Shape Reported object\n", "State object\n", "Time datetime64[ns]\n", "dtype: object" ] }, "execution_count": 240, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ufo.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`to_datetime`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_datetime.html)" ] }, { "cell_type": "code", "execution_count": 241, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 22\n", "1 20\n", "2 14\n", "3 13\n", "4 19\n", "Name: Time, dtype: int64" ] }, "execution_count": 241, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# convenient Series attributes are now available\n", "ufo.Time.dt.hour.head()" ] }, { "cell_type": "code", "execution_count": 242, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 Sunday\n", "1 Monday\n", "2 Sunday\n", "3 Monday\n", "4 Tuesday\n", "Name: Time, dtype: object" ] }, "execution_count": 242, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ufo.Time.dt.weekday_name.head()" ] }, { "cell_type": "code", "execution_count": 243, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 152\n", "1 181\n", "2 46\n", "3 152\n", "4 108\n", "Name: Time, dtype: int64" ] }, "execution_count": 243, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ufo.Time.dt.dayofyear.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "API reference for [datetime properties and methods](http://pandas.pydata.org/pandas-docs/stable/api.html#datetimelike-properties)" ] }, { "cell_type": "code", "execution_count": 244, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Timestamp('1999-01-01 00:00:00')" ] }, "execution_count": 244, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# convert a single string to datetime format (outputs a timestamp object)\n", "ts = pd.to_datetime('1/1/1999')\n", "ts" ] }, { "cell_type": "code", "execution_count": 245, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CityColors ReportedShape ReportedStateTime
12832Loma RicaNaNLIGHTCA1999-01-01 02:30:00
12833BauxiteNaNNaNAR1999-01-01 03:00:00
12834FlorenceNaNCYLINDERSC1999-01-01 14:00:00
12835Lake HenshawNaNCIGARCA1999-01-01 15:00:00
12836Wilmington IslandNaNLIGHTGA1999-01-01 17:15:00
\n", "
" ], "text/plain": [ " City Colors Reported Shape Reported State \\\n", "12832 Loma Rica NaN LIGHT CA \n", "12833 Bauxite NaN NaN AR \n", "12834 Florence NaN CYLINDER SC \n", "12835 Lake Henshaw NaN CIGAR CA \n", "12836 Wilmington Island NaN LIGHT GA \n", "\n", " Time \n", "12832 1999-01-01 02:30:00 \n", "12833 1999-01-01 03:00:00 \n", "12834 1999-01-01 14:00:00 \n", "12835 1999-01-01 15:00:00 \n", "12836 1999-01-01 17:15:00 " ] }, "execution_count": 245, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# compare a datetime Series with a timestamp\n", "ufo.loc[ufo.Time >= ts, :].head()" ] }, { "cell_type": "code", "execution_count": 246, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Timedelta('25781 days 01:59:00')" ] }, "execution_count": 246, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# perform mathematical operations with timestamps (outputs a timedelta object)\n", "ufo.Time.max() - ufo.Time.min()" ] }, { "cell_type": "code", "execution_count": 247, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "25781L" ] }, "execution_count": 247, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# timedelta objects also have attributes you can access\n", "(ufo.Time.max() - ufo.Time.min()).days" ] }, { "cell_type": "code", "execution_count": 248, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# allow plots to appear in the notebook\n", "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": 249, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1930 2\n", "1931 2\n", "1933 1\n", "1934 1\n", "1935 1\n", "Name: Year, dtype: int64" ] }, "execution_count": 249, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# count the number of UFO reports per year\n", "ufo['Year'] = ufo.Time.dt.year\n", "ufo.Year.value_counts().sort_index().head()" ] }, { "cell_type": "code", "execution_count": 250, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 250, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYcAAAEACAYAAABYq7oeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xm8VXW9//HXGxEHUNRUSHCAiyA4k2Kl9+exzKFBKW84\nVFra4/b7qdlNs8T7K7DJ23BLu/dqj35hoqmIDU6hIOG5pVwTc0ABgRIUSEBjEgEFzuf3x3dt2Zx9\nJg57WPvwfj4e63HW+e619v6szWF91ndY36WIwMzMrFi3WgdgZmb54+RgZmYlnBzMzKyEk4OZmZVw\ncjAzsxJODmZmVqLd5CBpF0l/kvSMpOcljcnK95Y0RdJcSZMl9S7aZ7Sk+ZLmSDqtqHy4pJmS5km6\noTKHZGZm26vd5BARbwGnRMSxwDHAmZJGANcAUyNiCDANGA0gaRgwChgKnAncJEnZ290MXBIRg4HB\nkk4v9wGZmdn261CzUkSsy1Z3AboDAZwNjM/KxwMjs/WzgAkRsSkiFgLzgRGS+gJ7RMSMbLvbivYx\nM7Mc6VBykNRN0jPAUuCR7ATfJyKWAUTEUmD/bPN+wKKi3ZdkZf2AxUXli7MyMzPLmY7WHJqyZqX+\npFrA4aTaw1ablTs4MzOrje7bsnFErJHUCJwBLJPUJyKWZU1Gy7PNlgAHFu3WPytrrbyEJCcaM7NO\niAi1v1X7OjJaad/CSCRJuwEfAuYA9wOfzTa7CLgvW78fOE9SD0kDgEHAk1nT02pJI7IO6guL9ikR\nEXW7jBkzpuYx7IixO/7aL46/tks5daTm8G5gvKRupGRyd0RMkvQEMFHSxcDLpBFKRMRsSROB2cBG\n4NLYEvVlwK3ArsCkiHi4rEdjZmZl0W5yiIjngeEtlK8ATm1ln+uB61so/zNw5LaHaWZm1eQ7pCug\noaGh1iF0Wj3HDo6/1hx/16Fyt1OVg6TIY1xmZnkmiahWh7SZme14nBzMzKyEk4OZmZVwcjAzsxJO\nDmZmVsLJwczMSjg5mJlZCScHMzMr4eRgZmYlnBzMzOrMxo2V/wwnBzOzOnPqqfDgg5X9DM+tZGZW\nZwYPhv32g8cf37rccyuZme3AVq2C+fPhsccq9xlODmZmdSQiJYd//Vf4/vcr9zluVjIzqyPr1sG+\n+8Lf/w4DBsC0aTBsWHrNzUpmZjuoVatgr71gt93g8svhBz+ozOd05BnSZmaWE4XkAHDppTBoECxe\nDP37l/dzXHMwM6sjxclhn33goovgxhvL/zlODmZmdaQ4OQB8+ctwyy2pvJycHMzM6kjz5HDQQfCR\nj8BPf1rez3Gfg5lZHVm1Cnr33rrs6qvhtNPK+zmuOZiZ1ZHmNQeAI4+EX/+6vJ/j5GBmVkdaSg4A\n739/eT/HycHMrI60lhzKzcnBzKyOODmYmVmJ3CQHSf0lTZM0S9Lzkr6YlY+RtFjS09lyRtE+oyXN\nlzRH0mlF5cMlzZQ0T9INlTkkM7Ouq1rJoSNDWTcBV0bEs5J6AX+W9Ej22o8i4kfFG0saCowChgL9\ngamSDs1m0rsZuCQiZkiaJOn0iJhcvsMxM+vaclNziIilEfFstr4WmAP0y15uafa/s4EJEbEpIhYC\n84ERkvoCe0TEjGy724CR2xm/mdkOJTfJoZikQ4BjgD9lRZdLelbSzyUVbsvoBywq2m1JVtYPWFxU\nvpgtScbMzNpReJZD85vgKqHDySFrUvoV8KWsBnETMDAijgGWAv9emRDNzAxg/XrYaSfYddfKf1aH\nps+Q1J2UGG6PiPsAIuK1ok3+H/BAtr4EOLDotf5ZWWvlLRo7duw76w0NDTQ0NHQkVDOzLqt5k1Jj\nYyONjY0V+awOPQlO0m3A6xFxZVFZ34hYmq1/GTg+Ii6QNAy4AziB1Gz0CHBoRISkJ4ArgBnA74Cf\nRMTDLXyenwRnZtbM7NlwzjkwZ07Lr5fzSXDt1hwknQh8Cnhe0jNAANcCF0g6BmgCFgJfAIiI2ZIm\nArOBjcClRWf6y4BbgV2BSS0lBjMza9nq1dXpjAY/Q9rMrG489BD85CfpZ0v8DGkzsx1QtYaxgpOD\nmVndcHIwM7MSTg5mZlbCycHMzEo4OZiZWQknBzMzK+HkYGZmJZwczMyshJODmZmVcHIwM7OtVPNZ\nDuDkYGZWFzZsAKk6z3IAJwczs7pQzSYlcHIwM6sLTg5mZlbCycHMzEo4OZiZWQknBzMzK+HkYGZm\nJZwczMyshJODmZmVcHIwM7MSTg5mZlbCycHMzEqsXu3kYGZmzbjmYGZmJZwczMyshJODmZltZcOG\n9LNaz3IAJwczs9yrdq0BOpAcJPWXNE3SLEnPS7oiK99b0hRJcyVNltS7aJ/RkuZLmiPptKLy4ZJm\nSpon6YbKHJKZWdeSy+QAbAKujIjDgfcBl0k6DLgGmBoRQ4BpwGgAScOAUcBQ4EzgJknK3utm4JKI\nGAwMlnR6WY/GzKwLymVyiIilEfFstr4WmAP0B84GxmebjQdGZutnARMiYlNELATmAyMk9QX2iIgZ\n2Xa3Fe1jZmatyGVyKCbpEOAY4AmgT0Qsg5RAgP2zzfoBi4p2W5KV9QMWF5UvzsrMzKwNtUgO3Tu6\noaRewK+AL0XEWknRbJPmv2+XsWPHvrPe0NBAQ0NDOd/ezKxutJYcGhsbaWxsrMhndig5SOpOSgy3\nR8R9WfEySX0iYlnWZLQ8K18CHFi0e/+srLXyFhUnBzOzHVlryaH5hfN1111Xts/saLPSLcDsiLix\nqOx+4LPZ+kXAfUXl50nqIWkAMAh4Mmt6Wi1pRNZBfWHRPmZm1opVq6B37/a3K6d2aw6STgQ+BTwv\n6RlS89G1wPeAiZIuBl4mjVAiImZLmgjMBjYCl0ZEocnpMuBWYFdgUkQ8XN7DMTPrelatgoMPru5n\nast5Oz8kRR7jMjOrhfPOg7PPhvPPb3s7SUSE2t6qY3yHtJlZzuV+KKuZmVWfk4OZmZVwcjAzsxJO\nDmZmVsLJwczMtrJhA0RU91kO4ORgZpZrhVqDyjJAteOcHMzMcqwWTUrg5GBmlmtODmZmVsLJwczM\nSqxe7eRgZmbNuOZgZmYlnBzMzKyEk4OZmZVYuhT226/6n+vkYGaWYwsXwoAB1f9cJwczsxxbsAAO\nOaT6n+snwZmZ5dSmTdCzJ7zxBvTo0f72fhKcmdkOYNEi6NOnY4mh3JwczMxyqlb9DeDkYGaWW7Xq\nbwAnBzOz3HLNwczMSrjmYGZmJVxzMDOzErWsOfg+BzOzHHrrLdhzT1i3DnbaqWP7+D4HM7Mu7pVX\noH//jieGcnNyMDPLoVo2KUEHkoOkcZKWSZpZVDZG0mJJT2fLGUWvjZY0X9IcSacVlQ+XNFPSPEk3\nlP9QzMy6jlp2RkPHag6/AE5vofxHETE8Wx4GkDQUGAUMBc4EbpJUaP+6GbgkIgYDgyW19J5mZkYd\n1Bwi4jFgZQsvtdTpcTYwISI2RcRCYD4wQlJfYI+ImJFt
dxswsnMhm5l1ffVQc2jN5ZKelfRzSb2z\nsn7AoqJtlmRl/YDFReWLszIzM2tBrWsO3Tu5303ANyMiJH0b+Hfg8+ULC8aOHfvOekNDAw0NDeV8\nezOzXOtIzaGxsZHGxsaKfH6H7nOQdDDwQEQc1dZrkq4BIiK+l732MDAGeBl4NCKGZuXnASdHxP9p\n5fN8n4OZ7bDWrYN3vQvefBO6bUP7Ti3ucxBFfQxZH0LBJ4AXsvX7gfMk9ZA0ABgEPBkRS4HVkkZk\nHdQXAvdtd/RmZl3Qyy/DQQdtW2Iot3ablSTdCTQA75L0CqkmcIqkY4AmYCHwBYCImC1pIjAb2Ahc\nWlQFuAy4FdgVmFQY4WRmZlurdX8DePoMM7PcuekmmDkTfvrTbdvP02eYmXVhCxbUdhgrODmYmeXO\nwoW1b1ZycjAzyxnXHMzMrEQeOqSdHMzMcmTNGtiwAfbbr7ZxODmYmeVIob9BZRlz1HlODmZmVfCN\nb6QaQXtqPeFegZODmVkV/OAHMG9e+9vlob8BnBzMzCpu/fpUa1iwoP1tXXMwM9tBrFiRfr70Uvvb\nuuZgZraDWJk9Ls01BzMze0dHaw4R+bgBDpwczMwqbuXK1FTUXs1h1ar0c6+9Kh5Su5wczMwqbMUK\nGD48NRm1NeF0odZQ63scwMnBzKziVq5MD+/ZfXdYtqz17fLSGQ1ODmZmFbdiBey9Nwwc2HbT0osv\nwpAh1YurLU4OZmYVtnIl7LNPajJqq1N61iw4/PDqxdUWJwczswor1BwGDGi75jB7tpODmdkOo1Bz\naKtZafPmNL3GYYdVN7bWODmYmVVYcc2htWalv/4V+vaFnj2rG1trutc6ADOzrq5Qc9hvv9ZrDrNn\nw7Bh1Y2rLU4OZmYVtmJFSg69e8Orr8LGjbDzzltvk6fOaHCzkplZRTU1pTuf99orJYR3vxteeaV0\nuzx1RoOTg5lZRa1ZA716Qfesnaa1EUuzZuWrWcnJwcysggqd0QUDB5Z2ShdGKg0dWt3Y2uLkYGZW\nQYXO6IKWag4vvQR9+uRnpBI4OZiZVVTzmkNLySFvndHg5GBmVlHNaw4tNSvVZXKQNE7SMkkzi8r2\nljRF0lxJkyX1LnpttKT5kuZIOq2ofLikmZLmSbqh/IdiZpY/Hak55O0eB+hYzeEXwOnNyq4BpkbE\nEGAaMBpA0jBgFDAUOBO4SXpnZvKbgUsiYjAwWFLz9zQz63Ka1xz69IF16+CNN7aU1WXNISIeA1Y2\nKz4bGJ+tjwdGZutnARMiYlNELATmAyMk9QX2iIgZ2Xa3Fe1jZtZlFW6AK5C2fipc3uZUKuhsn8P+\nEbEMICKWAvtn5f2ARUXbLcnK+gGLi8oXZ2VmZl3aypVbNyvB1k1LhZFKvXpVP7a2lGv6jDYefNc5\nY8eOfWe9oaGBhoaGcn+EmVnFNa85wNad0tvTpNTY2EhjY+N2xdeaziaHZZL6RMSyrMloeVa+BDiw\naLv+WVlr5a0qTg5mZvWqvZrD9nRGN79wvu666zr3Ri3oaLOSsqXgfuCz2fpFwH1F5edJ6iFpADAI\neDJrelotaUTWQX1h0T5mZl1WSzWH4uSQx85o6NhQ1juB6aQRRq9I+hzwb8CHJM0FPpj9TkTMBiYC\ns4FJwKURUWhyugwYB8wD5kfEw+U+GDOzvGk+lBVKm5XyNowVQFvO3fkhKfIYl5nZturVK03Tvcce\nW8reeCN1Qq9ZA3vuCcuXl6dDWhIRofa3bJ/vkDYzq5C334a33io98e+xB+y+OzzxRD5HKoGTg5lZ\nxRQ6o9XCtfzAgfDgg/lsUgInBzOzimmpv6FgwICUHPLYGQ1ODmZmFdN86oxiAwbkd6QSODmYmVVM\nS8NYCwYOTD/drGRmtoNp6Qa4ggED0s88Pf2tmJODmVmFtFVzGDYM3v/+fI5UAicHM7OKaavm0K8f\nPP54dePZFk4OZmYV0lbNIe+cHMzMKqStoax55+RgZlYhbQ1lzTsnBzOzCnHNwczMSrjmYGZmJeq5\nQ9pTdpuZVUAE9OgBb76ZflaDp+w2M8u5tWthl12qlxjKzcnBzKwC2roBrh44OZiZVUA99zeAk4OZ\nWUW45mBmZiVcczAzsxKuOZiZWQnXHMzMrISTg5mZlXCzkpmZlXDNwczMSrjmYGZmJVxzMDOzEjt0\nzUHSQknPSXpG0pNZ2d6SpkiaK2mypN5F24+WNF/SHEmnbW/wZmZ5taPXHJqAhog4NiJGZGXXAFMj\nYggwDRgNIGkYMAoYCpwJ3CSpLFPLmpnlyaZNaaruPfesdSSdt73JQS28x9nA+Gx9PDAyWz8LmBAR\nmyJiITAfGIGZWRezahX07g3d6rjhfntDD+ARSTMkfT4r6xMRywAiYimwf1beD1hUtO+SrMzMrEup\n58eDFnTfzv1PjIhXJe0HTJE0l5QwivmRbma2Q1mxor47o2E7k0NEvJr9fE3SvaRmomWS+kTEMkl9\ngeXZ5kuAA4t275+VtWjs2LHvrDc0NNDQ0LA9oZqZVU21ag6NjY00NjZW5L07/QxpSbsD3SJiraSe\nwBTgOuCDwIqI+J6krwF7R8Q1WYf0HcAJpOakR4BDW3pYtJ8hbWb17M474YEH4K67qvu55XyG9PbU\nHPoAv5UU2fvcERFTJD0FTJR0MfAyaYQSETFb0kRgNrARuNQZwMy6onofxgrbkRwiYgFwTAvlK4BT\nW9nneuD6zn6mmVk9qPcb4MB3SJuZlV1XqDk4OZiZlZlrDmZmtpUImD8f9t+//W3zzMnBzKyMHn4Y\nXn8dTqvz2eOcHMzMymTTJrjqKvjhD2HnnWsdzfZxcjAzK5Of/Qze/W746EdrHcn26/RNcJXkm+DM\nrN6sWgVDhsCUKXD00bWJoZw3wTk5mJmVwdVXp1FKP/957WIoZ3Jws5KZtWn8ePjHf4T162sdSX79\n9a9wyy3wrW/VOpLycXIw2wGsXAmjRsHIkfCnP3V8v3vugdGjoVev1NFqLbvmGrjyytTf0FU4OZh1\ncc8/D8cfDwcckIZXjhoFH/oQ/Pd/pzH5rZk0CS6/HB56CCZMSD/vvbd6cdeLqVNTwr3yylpHUmYR\nkbslhWVm2+vuuyP23Tfi9tu3lL31VsS4cRGDBkWceGLEL38ZsXbt1vs1Nqb9pk/fUjZ9esT++0cs\nWlSd2POuqSnixz9O39PkybWOJsnOnWU5D7tD2qwL2rQpNQf9+tfwm9/AMSVTZMLmzem18ePhscfS\n8MtPfSo93nLkyDTd9Ac/uPU+3/42/P736Wp5p52qcyx59Npr8LnPpZ933QUDB9Y6osQd0mbWogj4\n7W/hqKPghRdgxoyWEwOkk/snPwkPPgjz5sF73wvf/CY0NMC4caWJAVLCaWqC732voodRcW+9lRJo\nZzz6KBx7LBx+OPzxj/lJDOXmmoNZF/H738O116YT3/XXwxlngDpxDfn229CjR+uvL1oExx2Xhm6+\n+SYsXpyWV1+
FL30pXVHnzeuvw/Tp8PjjqZb07LMpyQ0YAIcdtmU56aSWT/abNsHkySlpPvEE/OIX\ncPrp1T+O9vg+BzN7x9Kl8JnPwIIFqdln1CjoVuE2galT4Y47oH//tPTrB7vtBhdeCDfckGoktbZk\nCUycmDrTX3wRTjghnfxPOglGjEg1p/nz02tz58KsWamTftdd4dRT03LYYWnE1q23pmP8/Ofh3HNh\nzz1rfXQtc3IwqwMbN6aT6N13w8yZcPvtqSminJYtS81An/wkfP3rtZ/P57nn0oio8eNTzaVcmprg\nkUfSlf7gwa1v9/rrWxLCrFlw9tlw3nnwgQ9A9w482iwC5sxJ/25Tp6amubPOgksugSOPLN/xVIqT\ng1kNbNiQmiUWLtx62XnnLVfQ/funh7xMm5ba/gcNSienXXZJ7fmPPFK+BLF8OZxySrqS/cY3yvOe\n5TB9ejop33svnHji9r1XRPrOrr02NXctX56SwyWXwD/9E/TsmZLwpEkpIU2bBh/+MJx/fkpSu+xS\nnmOqF04OZjVw/vnpqvLYY+GQQ9Jy8MGpPbrQ7r5kSWrmed/7UvPOIYds2f/OO+ErX0lz7xxxxPbF\n8tpr6Wr4E5+A667bvveqhMmTUxNTYZ6hNWvSlfwLL6Qb8v75n2Gvvdp+jyeeSB3gf/tbai4755w0\nwurBB1Pb//Tp6Tv44x9TwrjoolSD6t27OseYR04OZlU2cWJqtnnmGdh9986/z113pZulpkzpfDPF\n66+nkUQf+1iarqEznc7VcM89cOml6ft6/XUYNiwlxc2b0/F/97vw2c9u3T8SAY2NacrrmTNhzJi0\nTUtNQkuWpBvzGhpSDc2cHMyq6tVX03DQBx5IHZnb6+674V/+BX73Oxg+fNv2ffHFVCP56EfhO9/J\nb2IomDUrdfAOGLB1Evjzn+GLX0yJ4j/+I9XG7rknJYV169JUHZ/5TNrXOs7JwaxKItIV+rHHlndS\ntcJVde/e6cq3oQFOPhkOPLDl7Zua4Cc/SQnhW9+CL3wh/4mhPU1N8MtfpnmJNm9ONYurrkp9BpUe\nbdVVOTmYVcm4cfBf/5Xav9sa+98ZTU0we3ZqRmlsTMMo3/WudHfyyJGpltKtW+r0/tznUsfr+PHw\nD/9Q3jhqbc2a1EQ0dGitI6l/Tg5mVbBwYZqw7tFHt78DuSOamlJzy733pmXFijTW/uGH0w1nV121\nY09ZYe1zcjCrsLVrU3PSmWfCV79amxjmz0+J4ZRTqpOcrP45OZiV2RtvpHsYCs07zz+fOn3vuMNX\n61Y/nBysy3r77TSaZ999001MlTgxr1+f7uR96qkty4IFqQmp0Dl8wglpOgizeuLkYF3O22+nycyu\nvx4OPRRWr05DSC++OHXGFt9MVtDUlCaBK8yN8+KL6X1OOSXdB9C375Zt169PY+Lvvjs11QwalCaP\nO+44eM97UrNNuTuczaqtrpODpDOAG0jThY+LiJLJf50c8mv9+jRiZv36dIV91FFbX91v3pxG9vzu\nd/Dkk+kO1nPPbX2Ezbp1ac6h7343jVYZMybdXQzp6n7cuHRn8RFHwB57wKpVW5a//x323nvrWTW7\ndUtTKDz6aHry2amnpo7dBx5I9xSce266q3jffSv+VZlVXd0mB0ndgHnAB4G/ATOA8yLixWbb1XVy\naGxspKGhoSafvXw5/OAH6ST9la+kE+S2aC32N96Am2+GH/84DbE84IDUPr90aXr4/Pvel4ZlPvRQ\nmr3yIx9JzTRTp8KvfgUHHZTmGBoxIrXnF5pz/vKXdKX/9a+n5wm0ZMOG9D4RacqFwrLPPmlunZbi\n37wZnn46TWPds2eah6cenu9by7+dcnD8tVXO5NCBeQrLagQwPyJeBpA0ATgbeLHNvepMLf7A/v73\ndHfpz36WnubVvXu62v70p+FrX0sn7GIRaS6g555Lc9sXfi5Y0MjgwQ0cdhgMGZKuxhcuhP/8z3QV\n3nzah6VL4Q9/gP/5nzTJ2ne+kxJBwcc/DjfemBLJhAmpWefoo1Ob/mWXpRjbmxxt111T53BHFL77\nnXZKyen44zu2X17U+8nJ8Xcd1U4O/YBFRb8vJiWMHVZEejjLhg2piWXt2nSVvnZtWjZuTBO7FZbN\nm9OJv3h5+ul0o9Y556QTfOEu2699LdUijjwSLrgA+vTZ0j4/dy706pVO1EcfnZpavvnN1MRz/vlb\ntnvkkXTl/fjjLU+V3Ldvms5h1KjWj7F79y3z45tZfah2cuiwIUO2/r1btzRdQLdurS9S+1MKRKQT\nbPMTbkvbNTW1v7QU05o1qV2+8DuUfubbb6eEUHjq1i67pJN1YenZMy277LJ1IujWrfS9Djggte83\nf4JVnz6pNnH11enKf/369PSqK65I329Ls2L26JGSST3MXW9mlVPtPof3AmMj4ozs92uAaN4pLal+\nOxzMzGqoXjukdwLmkjqkXwWeBM6PiDlVC8LMzNpV1WaliNgs6XJgCluGsjoxmJnlTC5vgjMzs9qq\nyqzpksZJWiZpZlHZUZKmS3pO0n2SemXlx0t6pmgZWbTPcEkzJc2TdEM1Yt/W+IteP0jSG5KurKf4\nJR0saZ2kp7PlpnqKv9lrL2Sv96hV/Nv43V+Q/c0/nf3cLOmo7LX35P27l9Rd0q1ZnLOyPsXCPrn/\n25G0s6RbsjifkXRyDuLvL2la9n0+L+mKrHxvSVMkzZU0WVLvon1GS5ovaY6k0zp9DBFR8QU4CTgG\nmFlU9iRwUrb+WeCb2fquQLdsvS+wrOj3PwHHZ+uTgNPzFn/R6/cAdwNXFpXlPn7g4OLtmr1PPcS/\nE/AccET2+95sqSFXPf7O/O1k5UeQ7gmqp+/+fODObH03YAFwUB3FfympqRtgP+CpHHz/fYFjsvVe\npD7bw4DvAV/Nyr8G/Fu2Pgx4htRlcAjwl87+/Vel5hARjwErmxUfmpUDTAXOybbdEBFNWfluQBOA\npL7AHhExI3vtNmAkVbAt8QNIOht4CZhVVFY38QMlox3qKP7TgOci4oVs35UREbWKvxPffcH5wASo\nq+8+gJ5KA092B94C1tRB/J/I1ocB07L9XgNWSTquxvEvjYhns/W1wBygP+nm4fHZZuOL4jkLmBAR\nmyJiITAfGNGZY6jlw/hmSTorWx9FOmAAJI2Q9ALpCvB/Z8miH+mmuYLFWVmttBh/VkX9KnAdW59k\n6yL+zCFZ08ajkk7Kyuol/sEAkh6W9JSkq7PyPMXf1ndfcC5wV7aep9ih9fh/BawjjURcCPwwIlaR\n//gLD2d9DjhL0k6SBgDvyV7LRfySDiHVgp4A+kTEMkgJBNg/26z5jcZLsrJtPoZaJoeLgcskzQB6\nAm8XXoiIJyPiCOB44NpCm3HOtBb/GODHEbGuZpF1TGvxv0pqChgOXAXcqWb9KTnRWvzdgRNJV97/\nCHxc0im1CbFVrf7tQ7o4At6MiNm1CK4DWov/BGATqSlkIPCV7ISWN63F
fwvpZDoD+BHwONDCLbLV\nl/0f/BXwpawG0XwkUdlHFtXsDumImAecDiDpUOAjLWwzV9JaUvvrErZkeEhXK0uqEGqL2oj/BOAc\nSd8ntXdvlrQB+A11EH9EvE32nyUinpb0V9LVeL18/4uBP0TEyuy1ScBw4A5yEn8H/vbPY0utAern\nuz8feDir6b8m6XHgOOAx6iD+iNgMFA8geZw0Uegqahi/pO6kxHB7RNyXFS+T1CcilmVNRsuz8tb+\nVrb5b6iaNQdR1Mwiab/sZzfg/wI/zX4/JGuzRNLBwBBgYVZ1Wp01OQm4ELiP6ulQ/BHxvyJiYEQM\nJE1N/t2IuKle4pe0b1aGpIHAIOCleokfmAwcKWnX7D/VycCsGsff0djJYhtF1t8A7zQb5Pm7vzl7\n6RXgA9lrPYH3AnPqIP7C3/5uknbP1j8EbIyIF3MQ/y3A7Ii4sajsflJnOsBFRfHcD5wnqUfWNDYI\neLJTx1ClHvc7SVN0v0X6A/occAWp5/1F0gm0sO2ngReAp4GngI8VvfYe4HlSJ8uN1Yh9W+Nvtt8Y\nth6tlPv4SZ1zxd//h+sp/mz7C7JjmAlcX8v4OxH7ycD0Ft4n9989qYlmYvbdv1CHf/sHZ2WzSDfq\nHpiD+E/PriEPAAAAW0lEQVQkNW09SxqF9DRwBrAPqTN9bhbrXkX7jCaNUpoDnNbZY/BNcGZmVqKW\nHdJmZpZTTg5mZlbCycHMzEo4OZiZWQknBzMzK+HkYGZmJZwczMyshJODmZmV+P/o527arFlzVQAA\nAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# plot the number of UFO reports per year (line plot is the default)\n", "ufo.Year.value_counts().sort_index().plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Back to top]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 26. How do I find and remove duplicate rows in pandas? ([video](https://www.youtube.com/watch?v=ht5buXUMqkQ&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=26))" ] }, { "cell_type": "code", "execution_count": 251, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
agegenderoccupationzip_code
user_id
124Mtechnician85711
253Fother94043
323Mwriter32067
424Mtechnician43537
533Fother15213
\n", "
" ], "text/plain": [ " age gender occupation zip_code\n", "user_id \n", "1 24 M technician 85711\n", "2 53 F other 94043\n", "3 23 M writer 32067\n", "4 24 M technician 43537\n", "5 33 F other 15213" ] }, "execution_count": 251, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read a dataset of movie reviewers into a DataFrame\n", "user_cols = ['user_id', 'age', 'gender', 'occupation', 'zip_code']\n", "users = pd.read_table('http://bit.ly/movieusers', sep='|', header=None, names=user_cols, index_col='user_id')\n", "users.head()" ] }, { "cell_type": "code", "execution_count": 252, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(943, 4)" ] }, "execution_count": 252, "metadata": {}, "output_type": "execute_result" } ], "source": [ "users.shape" ] }, { "cell_type": "code", "execution_count": 253, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "user_id\n", "939 False\n", "940 True\n", "941 False\n", "942 False\n", "943 False\n", "Name: zip_code, dtype: bool" ] }, "execution_count": 253, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# detect duplicate zip codes: True if an item is identical to a previous item\n", "users.zip_code.duplicated().tail()" ] }, { "cell_type": "code", "execution_count": 254, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "148" ] }, "execution_count": 254, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# count the duplicate items (True becomes 1, False becomes 0)\n", "users.zip_code.duplicated().sum()" ] }, { "cell_type": "code", "execution_count": 255, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "user_id\n", "939 False\n", "940 False\n", "941 False\n", "942 False\n", "943 False\n", "dtype: bool" ] }, "execution_count": 255, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# detect duplicate DataFrame rows: True if an entire row is identical to a previous row\n", "users.duplicated().tail()" ] }, { "cell_type": "code", "execution_count": 256, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "7" ] }, "execution_count": 256, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# count the duplicate rows\n", "users.duplicated().sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Logic for [**`duplicated`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.duplicated.html):\n", "\n", "- **`keep='first'`** (default): Mark duplicates as True except for the first occurrence.\n", "- **`keep='last'`**: Mark duplicates as True except for the last occurrence.\n", "- **`keep=False`**: Mark all duplicates as True." ] }, { "cell_type": "code", "execution_count": 257, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
agegenderoccupationzip_code
user_id
49621Fstudent55414
57251Meducator20003
62117Mstudent60402
68428Mstudent55414
73344Fother60630
80527Fother20009
89032Mstudent97301
\n", "
" ], "text/plain": [ " age gender occupation zip_code\n", "user_id \n", "496 21 F student 55414\n", "572 51 M educator 20003\n", "621 17 M student 60402\n", "684 28 M student 55414\n", "733 44 F other 60630\n", "805 27 F other 20009\n", "890 32 M student 97301" ] }, "execution_count": 257, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# examine the duplicate rows (ignoring the first occurrence)\n", "users.loc[users.duplicated(keep='first'), :]" ] }, { "cell_type": "code", "execution_count": 258, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
agegenderoccupationzip_code
user_id
6717Mstudent60402
8551Meducator20003
19821Fstudent55414
35032Mstudent97301
42828Mstudent55414
43727Fother20009
46044Fother60630
\n", "
" ], "text/plain": [ " age gender occupation zip_code\n", "user_id \n", "67 17 M student 60402\n", "85 51 M educator 20003\n", "198 21 F student 55414\n", "350 32 M student 97301\n", "428 28 M student 55414\n", "437 27 F other 20009\n", "460 44 F other 60630" ] }, "execution_count": 258, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# examine the duplicate rows (ignoring the last occurrence)\n", "users.loc[users.duplicated(keep='last'), :]" ] }, { "cell_type": "code", "execution_count": 259, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
agegenderoccupationzip_code
user_id
6717Mstudent60402
8551Meducator20003
19821Fstudent55414
35032Mstudent97301
42828Mstudent55414
43727Fother20009
46044Fother60630
49621Fstudent55414
57251Meducator20003
62117Mstudent60402
68428Mstudent55414
73344Fother60630
80527Fother20009
89032Mstudent97301
\n", "
" ], "text/plain": [ " age gender occupation zip_code\n", "user_id \n", "67 17 M student 60402\n", "85 51 M educator 20003\n", "198 21 F student 55414\n", "350 32 M student 97301\n", "428 28 M student 55414\n", "437 27 F other 20009\n", "460 44 F other 60630\n", "496 21 F student 55414\n", "572 51 M educator 20003\n", "621 17 M student 60402\n", "684 28 M student 55414\n", "733 44 F other 60630\n", "805 27 F other 20009\n", "890 32 M student 97301" ] }, "execution_count": 259, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# examine the duplicate rows (including all duplicates)\n", "users.loc[users.duplicated(keep=False), :]" ] }, { "cell_type": "code", "execution_count": 260, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(936, 4)" ] }, "execution_count": 260, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# drop the duplicate rows (inplace=False by default)\n", "users.drop_duplicates(keep='first').shape" ] }, { "cell_type": "code", "execution_count": 261, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(936, 4)" ] }, "execution_count": 261, "metadata": {}, "output_type": "execute_result" } ], "source": [ "users.drop_duplicates(keep='last').shape" ] }, { "cell_type": "code", "execution_count": 262, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(929, 4)" ] }, "execution_count": 262, "metadata": {}, "output_type": "execute_result" } ], "source": [ "users.drop_duplicates(keep=False).shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`drop_duplicates`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop_duplicates.html)" ] }, { "cell_type": "code", "execution_count": 263, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "16" ] }, "execution_count": 263, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# only consider a subset of columns when identifying duplicates\n", "users.duplicated(subset=['age', 'zip_code']).sum()" ] }, { "cell_type": "code", "execution_count": 264, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(927, 4)" ] }, "execution_count": 264, "metadata": {}, "output_type": "execute_result" } ], "source": [ "users.drop_duplicates(subset=['age', 'zip_code']).shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Back to top]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 27. How do I avoid a SettingWithCopyWarning in pandas? ([video](https://www.youtube.com/watch?v=4R4WsDJ-KVc&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=27))" ] }, { "cell_type": "code", "execution_count": 265, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
star_ratingtitlecontent_ratinggenredurationactors_list
09.3The Shawshank RedemptionRCrime142[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt...
19.2The GodfatherRCrime175[u'Marlon Brando', u'Al Pacino', u'James Caan']
29.1The Godfather: Part IIRCrime200[u'Al Pacino', u'Robert De Niro', u'Robert Duv...
39.0The Dark KnightPG-13Action152[u'Christian Bale', u'Heath Ledger', u'Aaron E...
48.9Pulp FictionRCrime154[u'John Travolta', u'Uma Thurman', u'Samuel L....
\n", "
" ], "text/plain": [ " star_rating title content_rating genre duration \\\n", "0 9.3 The Shawshank Redemption R Crime 142 \n", "1 9.2 The Godfather R Crime 175 \n", "2 9.1 The Godfather: Part II R Crime 200 \n", "3 9.0 The Dark Knight PG-13 Action 152 \n", "4 8.9 Pulp Fiction R Crime 154 \n", "\n", " actors_list \n", "0 [u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt... \n", "1 [u'Marlon Brando', u'Al Pacino', u'James Caan'] \n", "2 [u'Al Pacino', u'Robert De Niro', u'Robert Duv... \n", "3 [u'Christian Bale', u'Heath Ledger', u'Aaron E... \n", "4 [u'John Travolta', u'Uma Thurman', u'Samuel L.... " ] }, "execution_count": 265, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read a dataset of top-rated IMDb movies into a DataFrame\n", "movies = pd.read_csv('http://bit.ly/imdbratings')\n", "movies.head()" ] }, { "cell_type": "code", "execution_count": 266, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 266, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# count the missing values in the 'content_rating' Series\n", "movies.content_rating.isnull().sum()" ] }, { "cell_type": "code", "execution_count": 267, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
star_ratingtitlecontent_ratinggenredurationactors_list
1878.2Butch Cassidy and the Sundance KidNaNBiography110[u'Paul Newman', u'Robert Redford', u'Katharin...
6497.7Where Eagles DareNaNAction158[u'Richard Burton', u'Clint Eastwood', u'Mary ...
9367.4True GritNaNAdventure128[u'John Wayne', u'Kim Darby', u'Glen Campbell']
\n", "
" ], "text/plain": [ " star_rating title content_rating \\\n", "187 8.2 Butch Cassidy and the Sundance Kid NaN \n", "649 7.7 Where Eagles Dare NaN \n", "936 7.4 True Grit NaN \n", "\n", " genre duration actors_list \n", "187 Biography 110 [u'Paul Newman', u'Robert Redford', u'Katharin... \n", "649 Action 158 [u'Richard Burton', u'Clint Eastwood', u'Mary ... \n", "936 Adventure 128 [u'John Wayne', u'Kim Darby', u'Glen Campbell'] " ] }, "execution_count": 267, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# examine the DataFrame rows that contain those missing values\n", "movies[movies.content_rating.isnull()]" ] }, { "cell_type": "code", "execution_count": 268, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "R 460\n", "PG-13 189\n", "PG 123\n", "NOT RATED 65\n", "APPROVED 47\n", "UNRATED 38\n", "G 32\n", "PASSED 7\n", "NC-17 7\n", "X 4\n", "GP 3\n", "TV-MA 1\n", "Name: content_rating, dtype: int64" ] }, "execution_count": 268, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# examine the unique values in the 'content_rating' Series\n", "movies.content_rating.value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Goal:** Mark the 'NOT RATED' values as missing values, represented by 'NaN'." ] }, { "cell_type": "code", "execution_count": 269, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
star_ratingtitlecontent_ratinggenredurationactors_list
58.912 Angry MenNOT RATEDDrama96[u'Henry Fonda', u'Lee J. Cobb', u'Martin Bals...
68.9The Good, the Bad and the UglyNOT RATEDWestern161[u'Clint Eastwood', u'Eli Wallach', u'Lee Van ...
418.5Sunset Blvd.NOT RATEDDrama110[u'William Holden', u'Gloria Swanson', u'Erich...
638.4MNOT RATEDCrime99[u'Peter Lorre', u'Ellen Widmann', u'Inge Land...
668.4Munna Bhai M.B.B.S.NOT RATEDComedy156[u'Sunil Dutt', u'Sanjay Dutt', u'Arshad Warsi']
\n", "
" ], "text/plain": [ " star_rating title content_rating genre \\\n", "5 8.9 12 Angry Men NOT RATED Drama \n", "6 8.9 The Good, the Bad and the Ugly NOT RATED Western \n", "41 8.5 Sunset Blvd. NOT RATED Drama \n", "63 8.4 M NOT RATED Crime \n", "66 8.4 Munna Bhai M.B.B.S. NOT RATED Comedy \n", "\n", " duration actors_list \n", "5 96 [u'Henry Fonda', u'Lee J. Cobb', u'Martin Bals... \n", "6 161 [u'Clint Eastwood', u'Eli Wallach', u'Lee Van ... \n", "41 110 [u'William Holden', u'Gloria Swanson', u'Erich... \n", "63 99 [u'Peter Lorre', u'Ellen Widmann', u'Inge Land... \n", "66 156 [u'Sunil Dutt', u'Sanjay Dutt', u'Arshad Warsi'] " ] }, "execution_count": 269, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# first, locate the relevant rows\n", "movies[movies.content_rating=='NOT RATED'].head()" ] }, { "cell_type": "code", "execution_count": 270, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "5 NOT RATED\n", "6 NOT RATED\n", "41 NOT RATED\n", "63 NOT RATED\n", "66 NOT RATED\n", "Name: content_rating, dtype: object" ] }, "execution_count": 270, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# then, select the 'content_rating' Series from those rows\n", "movies[movies.content_rating=='NOT RATED'].content_rating.head()" ] }, { "cell_type": "code", "execution_count": 271, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "c:\\Users\\Kevin\\Anaconda\\lib\\site-packages\\pandas\\core\\generic.py:2701: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy\n", " self[name] = value\n" ] } ], "source": [ "# finally, replace the 'NOT RATED' values with 'NaN' (imported from NumPy)\n", "import numpy as np\n", "movies[movies.content_rating=='NOT RATED'].content_rating = np.nan" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Problem:** That statement involves two operations, a **`__getitem__`** and a **`__setitem__`**. pandas can't guarantee whether the **`__getitem__`** operation returns a view or a copy of the data.\n", "\n", "- If **`__getitem__`** returns a view of the data, **`__setitem__`** will affect the 'movies' DataFrame.\n", "- But if **`__getitem__`** returns a copy of the data, **`__setitem__`** will not affect the 'movies' DataFrame." ] }, { "cell_type": "code", "execution_count": 272, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 272, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# the 'content_rating' Series has not changed\n", "movies.content_rating.isnull().sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Solution:** Use the **`loc`** method, which replaces the 'NOT RATED' values in a single **`__setitem__`** operation." 
] }, { "cell_type": "code", "execution_count": 273, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# replace the 'NOT RATED' values with 'NaN' (does not cause a SettingWithCopyWarning)\n", "movies.loc[movies.content_rating=='NOT RATED', 'content_rating'] = np.nan" ] }, { "cell_type": "code", "execution_count": 274, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "68" ] }, "execution_count": 274, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# this time, the 'content_rating' Series has changed\n", "movies.content_rating.isnull().sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Summary:** Use the **`loc`** method any time you are selecting rows and columns in the same statement.\n", "\n", "**More information:** [Modern Pandas (Part 1)](http://tomaugspurger.github.io/modern-1.html)" ] }, { "cell_type": "code", "execution_count": 275, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
star_ratingtitlecontent_ratinggenredurationactors_list
09.3The Shawshank RedemptionRCrime142[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt...
19.2The GodfatherRCrime175[u'Marlon Brando', u'Al Pacino', u'James Caan']
29.1The Godfather: Part IIRCrime200[u'Al Pacino', u'Robert De Niro', u'Robert Duv...
39.0The Dark KnightPG-13Action152[u'Christian Bale', u'Heath Ledger', u'Aaron E...
\n", "
" ], "text/plain": [ " star_rating title content_rating genre duration \\\n", "0 9.3 The Shawshank Redemption R Crime 142 \n", "1 9.2 The Godfather R Crime 175 \n", "2 9.1 The Godfather: Part II R Crime 200 \n", "3 9.0 The Dark Knight PG-13 Action 152 \n", "\n", " actors_list \n", "0 [u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt... \n", "1 [u'Marlon Brando', u'Al Pacino', u'James Caan'] \n", "2 [u'Al Pacino', u'Robert De Niro', u'Robert Duv... \n", "3 [u'Christian Bale', u'Heath Ledger', u'Aaron E... " ] }, "execution_count": 275, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create a DataFrame only containing movies with a high 'star_rating'\n", "top_movies = movies.loc[movies.star_rating >= 9, :]\n", "top_movies" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Goal:** Fix the 'duration' for 'The Shawshank Redemption'." ] }, { "cell_type": "code", "execution_count": 276, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "c:\\Users\\Kevin\\Anaconda\\lib\\site-packages\\pandas\\core\\indexing.py:465: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy\n", " self.obj[item] = s\n" ] } ], "source": [ "# overwrite the relevant cell with the correct duration\n", "top_movies.loc[0, 'duration'] = 150" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Problem:** pandas isn't sure whether 'top_movies' is a view or a copy of 'movies'." ] }, { "cell_type": "code", "execution_count": 277, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
star_ratingtitlecontent_ratinggenredurationactors_list
09.3The Shawshank RedemptionRCrime150[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt...
19.2The GodfatherRCrime175[u'Marlon Brando', u'Al Pacino', u'James Caan']
29.1The Godfather: Part IIRCrime200[u'Al Pacino', u'Robert De Niro', u'Robert Duv...
39.0The Dark KnightPG-13Action152[u'Christian Bale', u'Heath Ledger', u'Aaron E...
\n", "
" ], "text/plain": [ " star_rating title content_rating genre duration \\\n", "0 9.3 The Shawshank Redemption R Crime 150 \n", "1 9.2 The Godfather R Crime 175 \n", "2 9.1 The Godfather: Part II R Crime 200 \n", "3 9.0 The Dark Knight PG-13 Action 152 \n", "\n", " actors_list \n", "0 [u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt... \n", "1 [u'Marlon Brando', u'Al Pacino', u'James Caan'] \n", "2 [u'Al Pacino', u'Robert De Niro', u'Robert Duv... \n", "3 [u'Christian Bale', u'Heath Ledger', u'Aaron E... " ] }, "execution_count": 277, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 'top_movies' DataFrame has been updated\n", "top_movies" ] }, { "cell_type": "code", "execution_count": 278, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
star_ratingtitlecontent_ratinggenredurationactors_list
09.3The Shawshank RedemptionRCrime142[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt...
\n", "
" ], "text/plain": [ " star_rating title content_rating genre duration \\\n", "0 9.3 The Shawshank Redemption R Crime 142 \n", "\n", " actors_list \n", "0 [u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt... " ] }, "execution_count": 278, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 'movies' DataFrame has not been updated\n", "movies.head(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Solution:** Any time you are attempting to create a DataFrame copy, use the [**`copy`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.copy.html) method." ] }, { "cell_type": "code", "execution_count": 279, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# explicitly create a copy of 'movies'\n", "top_movies = movies.loc[movies.star_rating >= 9, :].copy()" ] }, { "cell_type": "code", "execution_count": 280, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# pandas now knows that you are updating a copy instead of a view (does not cause a SettingWithCopyWarning)\n", "top_movies.loc[0, 'duration'] = 150" ] }, { "cell_type": "code", "execution_count": 281, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
star_ratingtitlecontent_ratinggenredurationactors_list
09.3The Shawshank RedemptionRCrime150[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt...
19.2The GodfatherRCrime175[u'Marlon Brando', u'Al Pacino', u'James Caan']
29.1The Godfather: Part IIRCrime200[u'Al Pacino', u'Robert De Niro', u'Robert Duv...
39.0The Dark KnightPG-13Action152[u'Christian Bale', u'Heath Ledger', u'Aaron E...
\n", "
" ], "text/plain": [ " star_rating title content_rating genre duration \\\n", "0 9.3 The Shawshank Redemption R Crime 150 \n", "1 9.2 The Godfather R Crime 175 \n", "2 9.1 The Godfather: Part II R Crime 200 \n", "3 9.0 The Dark Knight PG-13 Action 152 \n", "\n", " actors_list \n", "0 [u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt... \n", "1 [u'Marlon Brando', u'Al Pacino', u'James Caan'] \n", "2 [u'Al Pacino', u'Robert De Niro', u'Robert Duv... \n", "3 [u'Christian Bale', u'Heath Ledger', u'Aaron E... " ] }, "execution_count": 281, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 'top_movies' DataFrame has been updated\n", "top_movies" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation on indexing and selection: [Returning a view versus a copy](http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy)\n", "\n", "Stack Overflow: [What is the point of views in pandas if it is undefined whether an indexing operation returns a view or a copy?](http://stackoverflow.com/questions/34884536/what-is-the-point-of-views-in-pandas-if-it-is-undefined-whether-an-indexing-oper)\n", "\n", "[Back to top]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 28. How do I change display options in pandas? ([video](https://www.youtube.com/watch?v=yiO43TQ4xvc&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=28))" ] }, { "cell_type": "code", "execution_count": 282, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# read a dataset of alcohol consumption into a DataFrame\n", "drinks = pd.read_csv('http://bit.ly/drinksbycountry')" ] }, { "cell_type": "code", "execution_count": 283, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countrybeer_servingsspirit_servingswine_servingstotal_litres_of_pure_alcoholcontinent
0Afghanistan0000.0Asia
1Albania89132544.9Europe
2Algeria250140.7Africa
3Andorra24513831212.4Europe
4Angola21757455.9Africa
5Antigua & Barbuda102128454.9North America
6Argentina193252218.3South America
7Armenia21179113.8Europe
8Australia2617221210.4Oceania
9Austria279751919.7Europe
10Azerbaijan214651.3Europe
11Bahamas122176516.3North America
12Bahrain426372.0Asia
13Bangladesh0000.0Asia
14Barbados143173366.3North America
15Belarus1423734214.4Europe
16Belgium2958421210.5Europe
17Belize26311486.8North America
18Benin344131.1Africa
19Bhutan23000.4Asia
20Bolivia1674183.8South America
21Bosnia-Herzegovina7617384.6Europe
22Botswana17335355.4Africa
23Brazil245145167.2South America
24Brunei31210.6Asia
25Bulgaria2312529410.3Europe
26Burkina Faso25774.3Africa
27Burundi88006.3Africa
28Cote d'Ivoire37174.0Africa
29Cabo Verde14456164.0Africa
.....................
163Suriname12817875.6South America
164Swaziland90224.7Africa
165Sweden152601867.2Europe
166Switzerland18510028010.2Europe
167Syria535161.0Asia
168Tajikistan21500.3Asia
169Thailand9925816.4Asia
170Macedonia10627863.9Europe
171Timor-Leste1140.1Asia
172Togo362191.3Africa
173Tonga362151.1Oceania
174Trinidad & Tobago19715676.4North America
175Tunisia513201.3Africa
176Turkey512271.4Asia
177Turkmenistan1971322.2Asia
178Tuvalu64191.0Oceania
179Uganda45908.3Africa
180Ukraine206237458.9Europe
181United Arab Emirates1613552.8Asia
182United Kingdom21912619510.4Europe
183Tanzania36615.7Africa
184USA249158848.7North America
185Uruguay115352206.6South America
186Uzbekistan2510182.4Asia
187Vanuatu2118110.9Oceania
188Venezuela33310037.7South America
189Vietnam111212.0Asia
190Yemen6000.1Asia
191Zambia321942.5Africa
192Zimbabwe641844.7Africa
\n", "

193 rows × 6 columns

\n", "
" ], "text/plain": [ " country beer_servings spirit_servings wine_servings \\\n", "0 Afghanistan 0 0 0 \n", "1 Albania 89 132 54 \n", "2 Algeria 25 0 14 \n", "3 Andorra 245 138 312 \n", "4 Angola 217 57 45 \n", "5 Antigua & Barbuda 102 128 45 \n", "6 Argentina 193 25 221 \n", "7 Armenia 21 179 11 \n", "8 Australia 261 72 212 \n", "9 Austria 279 75 191 \n", "10 Azerbaijan 21 46 5 \n", "11 Bahamas 122 176 51 \n", "12 Bahrain 42 63 7 \n", "13 Bangladesh 0 0 0 \n", "14 Barbados 143 173 36 \n", "15 Belarus 142 373 42 \n", "16 Belgium 295 84 212 \n", "17 Belize 263 114 8 \n", "18 Benin 34 4 13 \n", "19 Bhutan 23 0 0 \n", "20 Bolivia 167 41 8 \n", "21 Bosnia-Herzegovina 76 173 8 \n", "22 Botswana 173 35 35 \n", "23 Brazil 245 145 16 \n", "24 Brunei 31 2 1 \n", "25 Bulgaria 231 252 94 \n", "26 Burkina Faso 25 7 7 \n", "27 Burundi 88 0 0 \n", "28 Cote d'Ivoire 37 1 7 \n", "29 Cabo Verde 144 56 16 \n", ".. ... ... ... ... \n", "163 Suriname 128 178 7 \n", "164 Swaziland 90 2 2 \n", "165 Sweden 152 60 186 \n", "166 Switzerland 185 100 280 \n", "167 Syria 5 35 16 \n", "168 Tajikistan 2 15 0 \n", "169 Thailand 99 258 1 \n", "170 Macedonia 106 27 86 \n", "171 Timor-Leste 1 1 4 \n", "172 Togo 36 2 19 \n", "173 Tonga 36 21 5 \n", "174 Trinidad & Tobago 197 156 7 \n", "175 Tunisia 51 3 20 \n", "176 Turkey 51 22 7 \n", "177 Turkmenistan 19 71 32 \n", "178 Tuvalu 6 41 9 \n", "179 Uganda 45 9 0 \n", "180 Ukraine 206 237 45 \n", "181 United Arab Emirates 16 135 5 \n", "182 United Kingdom 219 126 195 \n", "183 Tanzania 36 6 1 \n", "184 USA 249 158 84 \n", "185 Uruguay 115 35 220 \n", "186 Uzbekistan 25 101 8 \n", "187 Vanuatu 21 18 11 \n", "188 Venezuela 333 100 3 \n", "189 Vietnam 111 2 1 \n", "190 Yemen 6 0 0 \n", "191 Zambia 32 19 4 \n", "192 Zimbabwe 64 18 4 \n", "\n", " total_litres_of_pure_alcohol continent \n", "0 0.0 Asia \n", "1 4.9 Europe \n", "2 0.7 Africa \n", "3 12.4 Europe \n", "4 5.9 Africa \n", "5 4.9 North America \n", "6 8.3 South America \n", "7 3.8 Europe \n", "8 10.4 Oceania \n", "9 9.7 Europe \n", "10 1.3 Europe \n", "11 6.3 North America \n", "12 2.0 Asia \n", "13 0.0 Asia \n", "14 6.3 North America \n", "15 14.4 Europe \n", "16 10.5 Europe \n", "17 6.8 North America \n", "18 1.1 Africa \n", "19 0.4 Asia \n", "20 3.8 South America \n", "21 4.6 Europe \n", "22 5.4 Africa \n", "23 7.2 South America \n", "24 0.6 Asia \n", "25 10.3 Europe \n", "26 4.3 Africa \n", "27 6.3 Africa \n", "28 4.0 Africa \n", "29 4.0 Africa \n", ".. ... ... 
\n", "163 5.6 South America \n", "164 4.7 Africa \n", "165 7.2 Europe \n", "166 10.2 Europe \n", "167 1.0 Asia \n", "168 0.3 Asia \n", "169 6.4 Asia \n", "170 3.9 Europe \n", "171 0.1 Asia \n", "172 1.3 Africa \n", "173 1.1 Oceania \n", "174 6.4 North America \n", "175 1.3 Africa \n", "176 1.4 Asia \n", "177 2.2 Asia \n", "178 1.0 Oceania \n", "179 8.3 Africa \n", "180 8.9 Europe \n", "181 2.8 Asia \n", "182 10.4 Europe \n", "183 5.7 Africa \n", "184 8.7 North America \n", "185 6.6 South America \n", "186 2.4 Asia \n", "187 0.9 Oceania \n", "188 7.7 South America \n", "189 2.0 Asia \n", "190 0.1 Asia \n", "191 2.5 Africa \n", "192 4.7 Africa \n", "\n", "[193 rows x 6 columns]" ] }, "execution_count": 283, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# only 60 rows will be displayed when printing\n", "drinks" ] }, { "cell_type": "code", "execution_count": 284, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "60" ] }, "execution_count": 284, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# check the current setting for the 'max_rows' option\n", "pd.get_option('display.max_rows')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`get_option`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.get_option.html)" ] }, { "cell_type": "code", "execution_count": 285, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countrybeer_servingsspirit_servingswine_servingstotal_litres_of_pure_alcoholcontinent
0Afghanistan0000.0Asia
1Albania89132544.9Europe
2Algeria250140.7Africa
3Andorra24513831212.4Europe
4Angola21757455.9Africa
5Antigua & Barbuda102128454.9North America
6Argentina193252218.3South America
7Armenia21179113.8Europe
8Australia2617221210.4Oceania
9Austria279751919.7Europe
10Azerbaijan214651.3Europe
11Bahamas122176516.3North America
12Bahrain426372.0Asia
13Bangladesh0000.0Asia
14Barbados143173366.3North America
15Belarus1423734214.4Europe
16Belgium2958421210.5Europe
17Belize26311486.8North America
18Benin344131.1Africa
19Bhutan23000.4Asia
20Bolivia1674183.8South America
21Bosnia-Herzegovina7617384.6Europe
22Botswana17335355.4Africa
23Brazil245145167.2South America
24Brunei31210.6Asia
25Bulgaria2312529410.3Europe
26Burkina Faso25774.3Africa
27Burundi88006.3Africa
28Cote d'Ivoire37174.0Africa
29Cabo Verde14456164.0Africa
30Cambodia576512.2Asia
31Cameroon147145.8Africa
32Canada2401221008.2North America
33Central African Republic17211.8Africa
34Chad15110.4Africa
35Chile1301241727.6South America
36China7919285.0Asia
37Colombia1597634.2South America
38Comoros1310.1Africa
39Congo76191.7Africa
40Cook Islands0254745.9Oceania
41Costa Rica14987114.4North America
42Croatia2308725410.2Europe
43Cuba9313754.2North America
44Cyprus1921541138.2Europe
45Czech Republic36117013411.8Europe
46North Korea0000.0Asia
47DR Congo32312.3Africa
48Denmark2248127810.4Europe
49Djibouti154431.1Africa
50Dominica52286266.6North America
51Dominican Republic19314796.2North America
52Ecuador1627434.2South America
53Egypt6410.2Africa
54El Salvador526922.2North America
55Equatorial Guinea9202335.8Africa
56Eritrea18000.5Africa
57Estonia224194599.5Europe
58Ethiopia20300.7Africa
59Fiji773512.0Oceania
60Finland2631339710.0Europe
61France12715137011.8Europe
62Gabon34798598.9Africa
63Gambia8012.4Africa
64Georgia521001495.4Europe
65Germany34611717511.3Europe
66Ghana313101.8Africa
67Greece1331122188.3Europe
68Grenada1994382811.9North America
69Guatemala536922.2North America
70Guinea9020.2Africa
71Guinea-Bissau2831212.5Africa
72Guyana9330217.1South America
73Haiti132615.9North America
74Honduras699823.0North America
75Hungary23421518511.3Europe
76Iceland23361786.6Europe
77India911402.2Asia
78Indonesia5100.1Asia
79Iran0000.0Asia
80Iraq9300.2Asia
81Ireland31311816511.4Europe
82Israel636992.5Asia
83Italy85422376.5Europe
84Jamaica829793.4North America
85Japan77202167.0Asia
86Jordan62110.5Asia
87Kazakhstan124246126.8Asia
88Kenya582221.8Africa
89Kiribati213411.0Oceania
90Kuwait0000.0Asia
91Kyrgyzstan319762.4Asia
92Laos6201236.2Asia
93Latvia2812166210.5Europe
94Lebanon2055311.9Asia
95Lesotho822902.8Africa
96Liberia1915223.1Africa
97Libya0000.0Africa
98Lithuania3432445612.9Europe
99Luxembourg23613327111.4Europe
100Madagascar261540.8Africa
101Malawi81111.5Africa
102Malaysia13400.3Asia
103Maldives0000.0Asia
104Mali5110.6Africa
105Malta1491001206.6Europe
106Marshall Islands0000.0Oceania
107Mauritania0000.0Africa
108Mauritius9831182.6Africa
109Mexico2386855.5North America
110Micronesia6250182.3Oceania
111Monaco0000.0Europe
112Mongolia7718984.9Asia
113Montenegro311141284.9Europe
114Morocco126100.5Africa
115Mozambique471851.3Africa
116Myanmar5100.1Asia
117Namibia376316.8Africa
118Nauru49081.0Oceania
119Nepal5600.2Asia
120Netherlands251881909.4Europe
121New Zealand203791759.3Oceania
122Nicaragua7811813.5North America
123Niger3210.1Africa
124Nigeria42529.1Africa
125Niue18820077.0Oceania
126Norway169711296.7Europe
127Oman221610.7Asia
128Pakistan0000.0Asia
129Palau30663236.9Oceania
130Panama285104187.2North America
131Papua New Guinea443911.5Oceania
132Paraguay213117747.3South America
133Peru163160216.1South America
134Philippines7118614.6Asia
135Poland3432155610.9Europe
136Portugal1946733911.0Europe
137Qatar14270.9Asia
138South Korea1401699.8Asia
139Moldova109226186.3Europe
140Romania29712216710.4Europe
141Russian Federation2473267311.5Asia
142Rwanda43206.8Africa
143St. Kitts & Nevis194205327.7North America
144St. Lucia1713157110.1North America
145St. Vincent & the Grenadines120221116.3North America
146Samoa10518242.6Oceania
147San Marino0000.0Europe
148Sao Tome & Principe56381404.2Africa
149Saudi Arabia0500.1Asia
150Senegal9170.3Africa
151Serbia2831311279.6Europe
152Seychelles15725514.1Africa
153Sierra Leone25326.7Africa
154Singapore6012111.5Asia
155Slovakia19629311611.4Europe
156Slovenia2705127610.6Europe
157Solomon Islands561111.2Oceania
158Somalia0000.0Africa
159South Africa22576818.2Africa
160Spain28415711210.0Europe
161Sri Lanka1610402.2Asia
162Sudan81301.7Africa
163Suriname12817875.6South America
164Swaziland90224.7Africa
165Sweden152601867.2Europe
166Switzerland18510028010.2Europe
167Syria535161.0Asia
168Tajikistan21500.3Asia
169Thailand9925816.4Asia
170Macedonia10627863.9Europe
171Timor-Leste1140.1Asia
172Togo362191.3Africa
173Tonga362151.1Oceania
174Trinidad & Tobago19715676.4North America
175Tunisia513201.3Africa
176Turkey512271.4Asia
177Turkmenistan1971322.2Asia
178Tuvalu64191.0Oceania
179Uganda45908.3Africa
180Ukraine206237458.9Europe
181United Arab Emirates1613552.8Asia
182United Kingdom21912619510.4Europe
183Tanzania36615.7Africa
184USA249158848.7North America
185Uruguay115352206.6South America
186Uzbekistan2510182.4Asia
187Vanuatu2118110.9Oceania
188Venezuela33310037.7South America
189Vietnam111212.0Asia
190Yemen6000.1Asia
191Zambia321942.5Africa
192Zimbabwe641844.7Africa
\n", "
" ], "text/plain": [ " country beer_servings spirit_servings \\\n", "0 Afghanistan 0 0 \n", "1 Albania 89 132 \n", "2 Algeria 25 0 \n", "3 Andorra 245 138 \n", "4 Angola 217 57 \n", "5 Antigua & Barbuda 102 128 \n", "6 Argentina 193 25 \n", "7 Armenia 21 179 \n", "8 Australia 261 72 \n", "9 Austria 279 75 \n", "10 Azerbaijan 21 46 \n", "11 Bahamas 122 176 \n", "12 Bahrain 42 63 \n", "13 Bangladesh 0 0 \n", "14 Barbados 143 173 \n", "15 Belarus 142 373 \n", "16 Belgium 295 84 \n", "17 Belize 263 114 \n", "18 Benin 34 4 \n", "19 Bhutan 23 0 \n", "20 Bolivia 167 41 \n", "21 Bosnia-Herzegovina 76 173 \n", "22 Botswana 173 35 \n", "23 Brazil 245 145 \n", "24 Brunei 31 2 \n", "25 Bulgaria 231 252 \n", "26 Burkina Faso 25 7 \n", "27 Burundi 88 0 \n", "28 Cote d'Ivoire 37 1 \n", "29 Cabo Verde 144 56 \n", "30 Cambodia 57 65 \n", "31 Cameroon 147 1 \n", "32 Canada 240 122 \n", "33 Central African Republic 17 2 \n", "34 Chad 15 1 \n", "35 Chile 130 124 \n", "36 China 79 192 \n", "37 Colombia 159 76 \n", "38 Comoros 1 3 \n", "39 Congo 76 1 \n", "40 Cook Islands 0 254 \n", "41 Costa Rica 149 87 \n", "42 Croatia 230 87 \n", "43 Cuba 93 137 \n", "44 Cyprus 192 154 \n", "45 Czech Republic 361 170 \n", "46 North Korea 0 0 \n", "47 DR Congo 32 3 \n", "48 Denmark 224 81 \n", "49 Djibouti 15 44 \n", "50 Dominica 52 286 \n", "51 Dominican Republic 193 147 \n", "52 Ecuador 162 74 \n", "53 Egypt 6 4 \n", "54 El Salvador 52 69 \n", "55 Equatorial Guinea 92 0 \n", "56 Eritrea 18 0 \n", "57 Estonia 224 194 \n", "58 Ethiopia 20 3 \n", "59 Fiji 77 35 \n", "60 Finland 263 133 \n", "61 France 127 151 \n", "62 Gabon 347 98 \n", "63 Gambia 8 0 \n", "64 Georgia 52 100 \n", "65 Germany 346 117 \n", "66 Ghana 31 3 \n", "67 Greece 133 112 \n", "68 Grenada 199 438 \n", "69 Guatemala 53 69 \n", "70 Guinea 9 0 \n", "71 Guinea-Bissau 28 31 \n", "72 Guyana 93 302 \n", "73 Haiti 1 326 \n", "74 Honduras 69 98 \n", "75 Hungary 234 215 \n", "76 Iceland 233 61 \n", "77 India 9 114 \n", "78 Indonesia 5 1 \n", "79 Iran 0 0 \n", "80 Iraq 9 3 \n", "81 Ireland 313 118 \n", "82 Israel 63 69 \n", "83 Italy 85 42 \n", "84 Jamaica 82 97 \n", "85 Japan 77 202 \n", "86 Jordan 6 21 \n", "87 Kazakhstan 124 246 \n", "88 Kenya 58 22 \n", "89 Kiribati 21 34 \n", "90 Kuwait 0 0 \n", "91 Kyrgyzstan 31 97 \n", "92 Laos 62 0 \n", "93 Latvia 281 216 \n", "94 Lebanon 20 55 \n", "95 Lesotho 82 29 \n", "96 Liberia 19 152 \n", "97 Libya 0 0 \n", "98 Lithuania 343 244 \n", "99 Luxembourg 236 133 \n", "100 Madagascar 26 15 \n", "101 Malawi 8 11 \n", "102 Malaysia 13 4 \n", "103 Maldives 0 0 \n", "104 Mali 5 1 \n", "105 Malta 149 100 \n", "106 Marshall Islands 0 0 \n", "107 Mauritania 0 0 \n", "108 Mauritius 98 31 \n", "109 Mexico 238 68 \n", "110 Micronesia 62 50 \n", "111 Monaco 0 0 \n", "112 Mongolia 77 189 \n", "113 Montenegro 31 114 \n", "114 Morocco 12 6 \n", "115 Mozambique 47 18 \n", "116 Myanmar 5 1 \n", "117 Namibia 376 3 \n", "118 Nauru 49 0 \n", "119 Nepal 5 6 \n", "120 Netherlands 251 88 \n", "121 New Zealand 203 79 \n", "122 Nicaragua 78 118 \n", "123 Niger 3 2 \n", "124 Nigeria 42 5 \n", "125 Niue 188 200 \n", "126 Norway 169 71 \n", "127 Oman 22 16 \n", "128 Pakistan 0 0 \n", "129 Palau 306 63 \n", "130 Panama 285 104 \n", "131 Papua New Guinea 44 39 \n", "132 Paraguay 213 117 \n", "133 Peru 163 160 \n", "134 Philippines 71 186 \n", "135 Poland 343 215 \n", "136 Portugal 194 67 \n", "137 Qatar 1 42 \n", "138 South Korea 140 16 \n", "139 Moldova 109 226 \n", "140 Romania 297 122 \n", "141 Russian Federation 247 326 \n", "142 Rwanda 43 2 \n", "143 
St. Kitts & Nevis 194 205 \n", "144 St. Lucia 171 315 \n", "145 St. Vincent & the Grenadines 120 221 \n", "146 Samoa 105 18 \n", "147 San Marino 0 0 \n", "148 Sao Tome & Principe 56 38 \n", "149 Saudi Arabia 0 5 \n", "150 Senegal 9 1 \n", "151 Serbia 283 131 \n", "152 Seychelles 157 25 \n", "153 Sierra Leone 25 3 \n", "154 Singapore 60 12 \n", "155 Slovakia 196 293 \n", "156 Slovenia 270 51 \n", "157 Solomon Islands 56 11 \n", "158 Somalia 0 0 \n", "159 South Africa 225 76 \n", "160 Spain 284 157 \n", "161 Sri Lanka 16 104 \n", "162 Sudan 8 13 \n", "163 Suriname 128 178 \n", "164 Swaziland 90 2 \n", "165 Sweden 152 60 \n", "166 Switzerland 185 100 \n", "167 Syria 5 35 \n", "168 Tajikistan 2 15 \n", "169 Thailand 99 258 \n", "170 Macedonia 106 27 \n", "171 Timor-Leste 1 1 \n", "172 Togo 36 2 \n", "173 Tonga 36 21 \n", "174 Trinidad & Tobago 197 156 \n", "175 Tunisia 51 3 \n", "176 Turkey 51 22 \n", "177 Turkmenistan 19 71 \n", "178 Tuvalu 6 41 \n", "179 Uganda 45 9 \n", "180 Ukraine 206 237 \n", "181 United Arab Emirates 16 135 \n", "182 United Kingdom 219 126 \n", "183 Tanzania 36 6 \n", "184 USA 249 158 \n", "185 Uruguay 115 35 \n", "186 Uzbekistan 25 101 \n", "187 Vanuatu 21 18 \n", "188 Venezuela 333 100 \n", "189 Vietnam 111 2 \n", "190 Yemen 6 0 \n", "191 Zambia 32 19 \n", "192 Zimbabwe 64 18 \n", "\n", " wine_servings total_litres_of_pure_alcohol continent \n", "0 0 0.0 Asia \n", "1 54 4.9 Europe \n", "2 14 0.7 Africa \n", "3 312 12.4 Europe \n", "4 45 5.9 Africa \n", "5 45 4.9 North America \n", "6 221 8.3 South America \n", "7 11 3.8 Europe \n", "8 212 10.4 Oceania \n", "9 191 9.7 Europe \n", "10 5 1.3 Europe \n", "11 51 6.3 North America \n", "12 7 2.0 Asia \n", "13 0 0.0 Asia \n", "14 36 6.3 North America \n", "15 42 14.4 Europe \n", "16 212 10.5 Europe \n", "17 8 6.8 North America \n", "18 13 1.1 Africa \n", "19 0 0.4 Asia \n", "20 8 3.8 South America \n", "21 8 4.6 Europe \n", "22 35 5.4 Africa \n", "23 16 7.2 South America \n", "24 1 0.6 Asia \n", "25 94 10.3 Europe \n", "26 7 4.3 Africa \n", "27 0 6.3 Africa \n", "28 7 4.0 Africa \n", "29 16 4.0 Africa \n", "30 1 2.2 Asia \n", "31 4 5.8 Africa \n", "32 100 8.2 North America \n", "33 1 1.8 Africa \n", "34 1 0.4 Africa \n", "35 172 7.6 South America \n", "36 8 5.0 Asia \n", "37 3 4.2 South America \n", "38 1 0.1 Africa \n", "39 9 1.7 Africa \n", "40 74 5.9 Oceania \n", "41 11 4.4 North America \n", "42 254 10.2 Europe \n", "43 5 4.2 North America \n", "44 113 8.2 Europe \n", "45 134 11.8 Europe \n", "46 0 0.0 Asia \n", "47 1 2.3 Africa \n", "48 278 10.4 Europe \n", "49 3 1.1 Africa \n", "50 26 6.6 North America \n", "51 9 6.2 North America \n", "52 3 4.2 South America \n", "53 1 0.2 Africa \n", "54 2 2.2 North America \n", "55 233 5.8 Africa \n", "56 0 0.5 Africa \n", "57 59 9.5 Europe \n", "58 0 0.7 Africa \n", "59 1 2.0 Oceania \n", "60 97 10.0 Europe \n", "61 370 11.8 Europe \n", "62 59 8.9 Africa \n", "63 1 2.4 Africa \n", "64 149 5.4 Europe \n", "65 175 11.3 Europe \n", "66 10 1.8 Africa \n", "67 218 8.3 Europe \n", "68 28 11.9 North America \n", "69 2 2.2 North America \n", "70 2 0.2 Africa \n", "71 21 2.5 Africa \n", "72 1 7.1 South America \n", "73 1 5.9 North America \n", "74 2 3.0 North America \n", "75 185 11.3 Europe \n", "76 78 6.6 Europe \n", "77 0 2.2 Asia \n", "78 0 0.1 Asia \n", "79 0 0.0 Asia \n", "80 0 0.2 Asia \n", "81 165 11.4 Europe \n", "82 9 2.5 Asia \n", "83 237 6.5 Europe \n", "84 9 3.4 North America \n", "85 16 7.0 Asia \n", "86 1 0.5 Asia \n", "87 12 6.8 Asia \n", "88 2 1.8 Africa \n", "89 1 1.0 
Oceania \n", "90 0 0.0 Asia \n", "91 6 2.4 Asia \n", "92 123 6.2 Asia \n", "93 62 10.5 Europe \n", "94 31 1.9 Asia \n", "95 0 2.8 Africa \n", "96 2 3.1 Africa \n", "97 0 0.0 Africa \n", "98 56 12.9 Europe \n", "99 271 11.4 Europe \n", "100 4 0.8 Africa \n", "101 1 1.5 Africa \n", "102 0 0.3 Asia \n", "103 0 0.0 Asia \n", "104 1 0.6 Africa \n", "105 120 6.6 Europe \n", "106 0 0.0 Oceania \n", "107 0 0.0 Africa \n", "108 18 2.6 Africa \n", "109 5 5.5 North America \n", "110 18 2.3 Oceania \n", "111 0 0.0 Europe \n", "112 8 4.9 Asia \n", "113 128 4.9 Europe \n", "114 10 0.5 Africa \n", "115 5 1.3 Africa \n", "116 0 0.1 Asia \n", "117 1 6.8 Africa \n", "118 8 1.0 Oceania \n", "119 0 0.2 Asia \n", "120 190 9.4 Europe \n", "121 175 9.3 Oceania \n", "122 1 3.5 North America \n", "123 1 0.1 Africa \n", "124 2 9.1 Africa \n", "125 7 7.0 Oceania \n", "126 129 6.7 Europe \n", "127 1 0.7 Asia \n", "128 0 0.0 Asia \n", "129 23 6.9 Oceania \n", "130 18 7.2 North America \n", "131 1 1.5 Oceania \n", "132 74 7.3 South America \n", "133 21 6.1 South America \n", "134 1 4.6 Asia \n", "135 56 10.9 Europe \n", "136 339 11.0 Europe \n", "137 7 0.9 Asia \n", "138 9 9.8 Asia \n", "139 18 6.3 Europe \n", "140 167 10.4 Europe \n", "141 73 11.5 Asia \n", "142 0 6.8 Africa \n", "143 32 7.7 North America \n", "144 71 10.1 North America \n", "145 11 6.3 North America \n", "146 24 2.6 Oceania \n", "147 0 0.0 Europe \n", "148 140 4.2 Africa \n", "149 0 0.1 Asia \n", "150 7 0.3 Africa \n", "151 127 9.6 Europe \n", "152 51 4.1 Africa \n", "153 2 6.7 Africa \n", "154 11 1.5 Asia \n", "155 116 11.4 Europe \n", "156 276 10.6 Europe \n", "157 1 1.2 Oceania \n", "158 0 0.0 Africa \n", "159 81 8.2 Africa \n", "160 112 10.0 Europe \n", "161 0 2.2 Asia \n", "162 0 1.7 Africa \n", "163 7 5.6 South America \n", "164 2 4.7 Africa \n", "165 186 7.2 Europe \n", "166 280 10.2 Europe \n", "167 16 1.0 Asia \n", "168 0 0.3 Asia \n", "169 1 6.4 Asia \n", "170 86 3.9 Europe \n", "171 4 0.1 Asia \n", "172 19 1.3 Africa \n", "173 5 1.1 Oceania \n", "174 7 6.4 North America \n", "175 20 1.3 Africa \n", "176 7 1.4 Asia \n", "177 32 2.2 Asia \n", "178 9 1.0 Oceania \n", "179 0 8.3 Africa \n", "180 45 8.9 Europe \n", "181 5 2.8 Asia \n", "182 195 10.4 Europe \n", "183 1 5.7 Africa \n", "184 84 8.7 North America \n", "185 220 6.6 South America \n", "186 8 2.4 Asia \n", "187 11 0.9 Oceania \n", "188 3 7.7 South America \n", "189 1 2.0 Asia \n", "190 0 0.1 Asia \n", "191 4 2.5 Africa \n", "192 4 4.7 Africa " ] }, "execution_count": 285, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# overwrite the current setting so that all rows will be displayed\n", "pd.set_option('display.max_rows', None)\n", "drinks" ] }, { "cell_type": "code", "execution_count": 286, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# reset the 'max_rows' option to its default\n", "pd.reset_option('display.max_rows')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`set_option`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.set_option.html) and [**`reset_option`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.reset_option.html)" ] }, { "cell_type": "code", "execution_count": 287, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "20" ] }, "execution_count": 287, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# the 'max_columns' option is similar to 'max_rows'\n", "pd.get_option('display.max_columns')" ] }, { "cell_type": "code", 
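"execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# minimal sketch (added example, not executed above): 'max_columns' can be overwritten\n", "# and reset with the same set_option/reset_option calls used for 'max_rows'\n", "pd.set_option('display.max_columns', None)\n", "pd.reset_option('display.max_columns')" ] }, { "cell_type": "code",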
"execution_count": 288, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS
\n", "
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "0 1 0 3 \n", "1 2 1 1 \n", "2 3 1 3 \n", "3 4 1 1 \n", "4 5 0 3 \n", "\n", " Name Sex Age SibSp \\\n", "0 Braund, Mr. Owen Harris male 22.0 1 \n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n", "2 Heikkinen, Miss. Laina female 26.0 0 \n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n", "4 Allen, Mr. William Henry male 35.0 0 \n", "\n", " Parch Ticket Fare Cabin Embarked \n", "0 0 A/5 21171 7.2500 NaN S \n", "1 0 PC 17599 71.2833 C85 C \n", "2 0 STON/O2. 3101282 7.9250 NaN S \n", "3 0 113803 53.1000 C123 S \n", "4 0 373450 8.0500 NaN S " ] }, "execution_count": 288, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read the training dataset from Kaggle's Titanic competition into a DataFrame\n", "train = pd.read_csv('http://bit.ly/kaggletrain')\n", "train.head()" ] }, { "cell_type": "code", "execution_count": 289, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "50" ] }, "execution_count": 289, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# an ellipsis is displayed in the 'Name' cell of row 1 because of the 'max_colwidth' option\n", "pd.get_option('display.max_colwidth')" ] }, { "cell_type": "code", "execution_count": 290, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS
1211Cumings, Mrs. John Bradley (Florence Briggs Thayer)female38.010PC 1759971.2833C85C
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS
\n", "
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "0 1 0 3 \n", "1 2 1 1 \n", "2 3 1 3 \n", "3 4 1 1 \n", "4 5 0 3 \n", "\n", " Name Sex Age SibSp \\\n", "0 Braund, Mr. Owen Harris male 22.0 1 \n", "1 Cumings, Mrs. John Bradley (Florence Briggs Thayer) female 38.0 1 \n", "2 Heikkinen, Miss. Laina female 26.0 0 \n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n", "4 Allen, Mr. William Henry male 35.0 0 \n", "\n", " Parch Ticket Fare Cabin Embarked \n", "0 0 A/5 21171 7.2500 NaN S \n", "1 0 PC 17599 71.2833 C85 C \n", "2 0 STON/O2. 3101282 7.9250 NaN S \n", "3 0 113803 53.1000 C123 S \n", "4 0 373450 8.0500 NaN S " ] }, "execution_count": 290, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# overwrite the current setting so that more characters will be displayed\n", "pd.set_option('display.max_colwidth', 1000)\n", "train.head()" ] }, { "cell_type": "code", "execution_count": 291, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
0103Braund, Mr. Owen Harrismale22.010A/5 211717.25NaNS
1211Cumings, Mrs. John Bradley (Florence Briggs Thayer)female38.010PC 1759971.28C85C
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.92NaNS
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.10C123S
4503Allen, Mr. William Henrymale35.0003734508.05NaNS
\n", "
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "0 1 0 3 \n", "1 2 1 1 \n", "2 3 1 3 \n", "3 4 1 1 \n", "4 5 0 3 \n", "\n", " Name Sex Age SibSp \\\n", "0 Braund, Mr. Owen Harris male 22.0 1 \n", "1 Cumings, Mrs. John Bradley (Florence Briggs Thayer) female 38.0 1 \n", "2 Heikkinen, Miss. Laina female 26.0 0 \n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n", "4 Allen, Mr. William Henry male 35.0 0 \n", "\n", " Parch Ticket Fare Cabin Embarked \n", "0 0 A/5 21171 7.25 NaN S \n", "1 0 PC 17599 71.28 C85 C \n", "2 0 STON/O2. 3101282 7.92 NaN S \n", "3 0 113803 53.10 C123 S \n", "4 0 373450 8.05 NaN S " ] }, "execution_count": 291, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# overwrite the 'precision' setting to display 2 digits after the decimal point of 'Fare'\n", "pd.set_option('display.precision', 2)\n", "train.head()" ] }, { "cell_type": "code", "execution_count": 292, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countrybeer_servingsspirit_servingswine_servingstotal_litres_of_pure_alcoholcontinentxy
0Afghanistan0000.0Asia00.0
1Albania89132544.9Europe540004900.0
2Algeria250140.7Africa14000700.0
3Andorra24513831212.4Europe31200012400.0
4Angola21757455.9Africa450005900.0
\n", "
" ], "text/plain": [ " country beer_servings spirit_servings wine_servings \\\n", "0 Afghanistan 0 0 0 \n", "1 Albania 89 132 54 \n", "2 Algeria 25 0 14 \n", "3 Andorra 245 138 312 \n", "4 Angola 217 57 45 \n", "\n", " total_litres_of_pure_alcohol continent x y \n", "0 0.0 Asia 0 0.0 \n", "1 4.9 Europe 54000 4900.0 \n", "2 0.7 Africa 14000 700.0 \n", "3 12.4 Europe 312000 12400.0 \n", "4 5.9 Africa 45000 5900.0 " ] }, "execution_count": 292, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# add two meaningless columns to the drinks DataFrame\n", "drinks['x'] = drinks.wine_servings * 1000\n", "drinks['y'] = drinks.total_litres_of_pure_alcohol * 1000\n", "drinks.head()" ] }, { "cell_type": "code", "execution_count": 293, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countrybeer_servingsspirit_servingswine_servingstotal_litres_of_pure_alcoholcontinentxy
0Afghanistan0000.0Asia00.0
1Albania89132544.9Europe540004,900.0
2Algeria250140.7Africa14000700.0
3Andorra24513831212.4Europe31200012,400.0
4Angola21757455.9Africa450005,900.0
\n", "
" ], "text/plain": [ " country beer_servings spirit_servings wine_servings \\\n", "0 Afghanistan 0 0 0 \n", "1 Albania 89 132 54 \n", "2 Algeria 25 0 14 \n", "3 Andorra 245 138 312 \n", "4 Angola 217 57 45 \n", "\n", " total_litres_of_pure_alcohol continent x y \n", "0 0.0 Asia 0 0.0 \n", "1 4.9 Europe 54000 4,900.0 \n", "2 0.7 Africa 14000 700.0 \n", "3 12.4 Europe 312000 12,400.0 \n", "4 5.9 Africa 45000 5,900.0 " ] }, "execution_count": 293, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# use a Python format string to specify a comma as the thousands separator\n", "pd.set_option('display.float_format', '{:,}'.format)\n", "drinks.head()" ] }, { "cell_type": "code", "execution_count": 294, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "country object\n", "beer_servings int64\n", "spirit_servings int64\n", "wine_servings int64\n", "total_litres_of_pure_alcohol float64\n", "continent object\n", "x int64\n", "y float64\n", "dtype: object" ] }, "execution_count": 294, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 'y' was affected (but not 'x') because the 'float_format' option only affects floats (not ints)\n", "drinks.dtypes" ] }, { "cell_type": "code", "execution_count": 295, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "display.chop_threshold : float or None\n", " if set to a float value, all float values smaller then the given threshold\n", " will be displayed as exactly 0 by repr and friends.\n", " [default: None] [currently: None]\n", "\n", "display.colheader_justify : 'left'/'right'\n", " Controls the justification of column headers. used by DataFrameFormatter.\n", " [default: right] [currently: right]\n", "\n", "display.column_space No description available.\n", " [default: 12] [currently: 12]\n", "\n", "display.date_dayfirst : boolean\n", " When True, prints and parses dates with the day first, eg 20/01/2005\n", " [default: False] [currently: False]\n", "\n", "display.date_yearfirst : boolean\n", " When True, prints and parses dates with the year first, eg 2005/01/20\n", " [default: False] [currently: False]\n", "\n", "display.encoding : str/unicode\n", " Defaults to the detected encoding of the console.\n", " Specifies the encoding to be used for strings returned by to_string,\n", " these are generally strings meant to be displayed on the console.\n", " [default: UTF-8] [currently: UTF-8]\n", "\n", "display.expand_frame_repr : boolean\n", " Whether to print out the full DataFrame repr for wide DataFrames across\n", " multiple lines, `max_columns` is still respected, but the output will\n", " wrap-around across multiple \"pages\" if its width exceeds `display.width`.\n", " [default: True] [currently: True]\n", "\n", "display.float_format : callable\n", " The callable should accept a floating point number and return\n", " a string with the desired format of the number. 
This is used\n", " in some places like SeriesFormatter.\n", " See formats.format.EngFormatter for an example.\n", " [default: None] [currently: ]\n", "\n", "display.height : int\n", " Deprecated.\n", " [default: 60] [currently: 60]\n", " (Deprecated, use `display.max_rows` instead.)\n", "\n", "display.large_repr : 'truncate'/'info'\n", " For DataFrames exceeding max_rows/max_cols, the repr (and HTML repr) can\n", " show a truncated table (the default from 0.13), or switch to the view from\n", " df.info() (the behaviour in earlier versions of pandas).\n", " [default: truncate] [currently: truncate]\n", "\n", "display.latex.escape : bool\n", " This specifies if the to_latex method of a Dataframe uses escapes special\n", " characters.\n", " method. Valid values: False,True\n", " [default: True] [currently: True]\n", "\n", "display.latex.longtable :bool\n", " This specifies if the to_latex method of a Dataframe uses the longtable\n", " format.\n", " method. Valid values: False,True\n", " [default: False] [currently: False]\n", "\n", "display.latex.repr : boolean\n", " Whether to produce a latex DataFrame representation for jupyter\n", " environments that support it.\n", " (default: False)\n", " [default: False] [currently: False]\n", "\n", "display.line_width : int\n", " Deprecated.\n", " [default: 80] [currently: 80]\n", " (Deprecated, use `display.width` instead.)\n", "\n", "display.max_categories : int\n", " This sets the maximum number of categories pandas should output when\n", " printing out a `Categorical` or a Series of dtype \"category\".\n", " [default: 8] [currently: 8]\n", "\n", "display.max_columns : int\n", " If max_cols is exceeded, switch to truncate view. Depending on\n", " `large_repr`, objects are either centrally truncated or printed as\n", " a summary view. 'None' value means unlimited.\n", "\n", " In case python/IPython is running in a terminal and `large_repr`\n", " equals 'truncate' this can be set to 0 and pandas will auto-detect\n", " the width of the terminal and print a truncated object which fits\n", " the screen width. The IPython notebook, IPython qtconsole, or IDLE\n", " do not run in a terminal and hence it is not possible to do\n", " correct auto-detection.\n", " [default: 20] [currently: 20]\n", "\n", "display.max_colwidth : int\n", " The maximum width in characters of a column in the repr of\n", " a pandas data structure. When the column overflows, a \"...\"\n", " placeholder is embedded in the output.\n", " [default: 50] [currently: 1000]\n", "\n", "display.max_info_columns : int\n", " max_info_columns is used in DataFrame.info method to decide if\n", " per column information will be printed.\n", " [default: 100] [currently: 100]\n", "\n", "display.max_info_rows : int or None\n", " df.info() will usually show null-counts for each column.\n", " For large frames this can be quite slow. max_info_rows and max_info_cols\n", " limit this null check only to frames with smaller dimensions than\n", " specified.\n", " [default: 1690785] [currently: 1690785]\n", "\n", "display.max_rows : int\n", " If max_rows is exceeded, switch to truncate view. Depending on\n", " `large_repr`, objects are either centrally truncated or printed as\n", " a summary view. 'None' value means unlimited.\n", "\n", " In case python/IPython is running in a terminal and `large_repr`\n", " equals 'truncate' this can be set to 0 and pandas will auto-detect\n", " the height of the terminal and print a truncated object which fits\n", " the screen height. 
The IPython notebook, IPython qtconsole, or\n", " IDLE do not run in a terminal and hence it is not possible to do\n", " correct auto-detection.\n", " [default: 60] [currently: 60]\n", "\n", "display.max_seq_items : int or None\n", " when pretty-printing a long sequence, no more then `max_seq_items`\n", " will be printed. If items are omitted, they will be denoted by the\n", " addition of \"...\" to the resulting string.\n", "\n", " If set to None, the number of items to be printed is unlimited.\n", " [default: 100] [currently: 100]\n", "\n", "display.memory_usage : bool, string or None\n", " This specifies if the memory usage of a DataFrame should be displayed when\n", " df.info() is called. Valid values True,False,'deep'\n", " [default: True] [currently: True]\n", "\n", "display.mpl_style : bool\n", " Setting this to 'default' will modify the rcParams used by matplotlib\n", " to give plots a more pleasing visual style by default.\n", " Setting this to None/False restores the values to their initial value.\n", " [default: None] [currently: None]\n", "\n", "display.multi_sparse : boolean\n", " \"sparsify\" MultiIndex display (don't display repeated\n", " elements in outer levels within groups)\n", " [default: True] [currently: True]\n", "\n", "display.notebook_repr_html : boolean\n", " When True, IPython notebook will use html representation for\n", " pandas objects (if it is available).\n", " [default: True] [currently: True]\n", "\n", "display.pprint_nest_depth : int\n", " Controls the number of nested levels to process when pretty-printing\n", " [default: 3] [currently: 3]\n", "\n", "display.precision : int\n", " Floating point output precision (number of significant digits). This is\n", " only a suggestion\n", " [default: 6] [currently: 2]\n", "\n", "display.show_dimensions : boolean or 'truncate'\n", " Whether to print out dimensions at the end of DataFrame repr.\n", " If 'truncate' is specified, only print out the dimensions if the\n", " frame is truncated (e.g. not display all rows and/or columns)\n", " [default: truncate] [currently: truncate]\n", "\n", "display.unicode.ambiguous_as_wide : boolean\n", " Whether to use the Unicode East Asian Width to calculate the display text\n", " width.\n", " Enabling this may affect to the performance (default: False)\n", " [default: False] [currently: False]\n", "\n", "display.unicode.east_asian_width : boolean\n", " Whether to use the Unicode East Asian Width to calculate the display text\n", " width.\n", " Enabling this may affect to the performance (default: False)\n", " [default: False] [currently: False]\n", "\n", "display.width : int\n", " Width of the display in characters. In case python/IPython is running in\n", " a terminal this can be set to None and pandas will correctly auto-detect\n", " the width.\n", " Note that the IPython notebook, IPython qtconsole, or IDLE do not run in a\n", " terminal and hence it is not possible to correctly detect the width.\n", " [default: 80] [currently: 80]\n", "\n", "io.excel.xls.writer : string\n", " The default Excel writer engine for 'xls' files. Available options:\n", " 'xlwt' (the default).\n", " [default: xlwt] [currently: xlwt]\n", "\n", "io.excel.xlsm.writer : string\n", " The default Excel writer engine for 'xlsm' files. Available options:\n", " 'openpyxl' (the default).\n", " [default: openpyxl] [currently: openpyxl]\n", "\n", "io.excel.xlsx.writer : string\n", " The default Excel writer engine for 'xlsx' files. 
Available options:\n", " 'xlsxwriter' (the default), 'openpyxl'.\n", " [default: xlsxwriter] [currently: xlsxwriter]\n", "\n", "io.hdf.default_format : format\n", " default format writing format, if None, then\n", " put will default to 'fixed' and append will default to 'table'\n", " [default: None] [currently: None]\n", "\n", "io.hdf.dropna_table : boolean\n", " drop ALL nan rows when appending to a table\n", " [default: False] [currently: False]\n", "\n", "mode.chained_assignment : string\n", " Raise an exception, warn, or no action if trying to use chained assignment,\n", " The default is warn\n", " [default: warn] [currently: warn]\n", "\n", "mode.sim_interactive : boolean\n", " Whether to simulate interactive mode for purposes of testing\n", " [default: False] [currently: False]\n", "\n", "mode.use_inf_as_null : boolean\n", " True means treat None, NaN, INF, -INF as null (old way),\n", " False means None and NaN are null, but INF, -INF are not null\n", " (new way).\n", " [default: False] [currently: False]\n", "\n", "\n" ] } ], "source": [ "# view the option descriptions (including the default and current values)\n", "pd.describe_option()" ] }, { "cell_type": "code", "execution_count": 296, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "display.max_info_rows : int or None\n", " df.info() will usually show null-counts for each column.\n", " For large frames this can be quite slow. max_info_rows and max_info_cols\n", " limit this null check only to frames with smaller dimensions than\n", " specified.\n", " [default: 1690785] [currently: 1690785]\n", "\n", "display.max_rows : int\n", " If max_rows is exceeded, switch to truncate view. Depending on\n", " `large_repr`, objects are either centrally truncated or printed as\n", " a summary view. 'None' value means unlimited.\n", "\n", " In case python/IPython is running in a terminal and `large_repr`\n", " equals 'truncate' this can be set to 0 and pandas will auto-detect\n", " the height of the terminal and print a truncated object which fits\n", " the screen height. The IPython notebook, IPython qtconsole, or\n", " IDLE do not run in a terminal and hence it is not possible to do\n", " correct auto-detection.\n", " [default: 60] [currently: 60]\n", "\n", "\n" ] } ], "source": [ "# search for specific options by name\n", "pd.describe_option('rows')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`describe_option`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.describe_option.html)" ] }, { "cell_type": "code", "execution_count": 297, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "height has been deprecated.\n", "\n", "line_width has been deprecated, use display.width instead (currently both are\n", "identical)\n", "\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "c:\\Users\\Kevin\\Anaconda\\lib\\site-packages\\ipykernel\\__main__.py:2: FutureWarning: \n", "mpl_style had been deprecated and will be removed in a future version.\n", "Use `matplotlib.pyplot.style.use` instead.\n", "\n", " from ipykernel import kernelapp as app\n" ] } ], "source": [ "# reset all of the options to their default values\n", "pd.reset_option('all')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Options and Settings](http://pandas.pydata.org/pandas-docs/stable/options.html)\n", "\n", "[Back to top]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 29. 
How do I create a pandas DataFrame from another object? ([video](https://www.youtube.com/watch?v=-Ov1N1_FbP8&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=29))" ] }, { "cell_type": "code", "execution_count": 298, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
colorid
0red100
1blue101
2red102
\n", "
" ], "text/plain": [ " color id\n", "0 red 100\n", "1 blue 101\n", "2 red 102" ] }, "execution_count": 298, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create a DataFrame from a dictionary (keys become column names, values become data)\n", "pd.DataFrame({'id':[100, 101, 102], 'color':['red', 'blue', 'red']})" ] }, { "cell_type": "code", "execution_count": 299, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idcolor
a100red
b101blue
c102red
\n", "
" ], "text/plain": [ " id color\n", "a 100 red\n", "b 101 blue\n", "c 102 red" ] }, "execution_count": 299, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# optionally specify the order of columns and define the index\n", "df = pd.DataFrame({'id':[100, 101, 102], 'color':['red', 'blue', 'red']}, columns=['id', 'color'], index=['a', 'b', 'c'])\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`DataFrame`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html)" ] }, { "cell_type": "code", "execution_count": 300, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idcolor
0100red
1101blue
2102red
\n", "
" ], "text/plain": [ " id color\n", "0 100 red\n", "1 101 blue\n", "2 102 red" ] }, "execution_count": 300, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create a DataFrame from a list of lists (each inner list becomes a row)\n", "pd.DataFrame([[100, 'red'], [101, 'blue'], [102, 'red']], columns=['id', 'color'])" ] }, { "cell_type": "code", "execution_count": 301, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([[ 0.9325265 , 0.48261452],\n", " [ 0.03239681, 0.94908844],\n", " [ 0.17615564, 0.80045853],\n", " [ 0.36113859, 0.95982213]])" ] }, "execution_count": 301, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create a NumPy array (with shape 4 by 2) and fill it with random numbers between 0 and 1\n", "import numpy as np\n", "arr = np.random.rand(4, 2)\n", "arr" ] }, { "cell_type": "code", "execution_count": 302, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
onetwo
00.9325270.482615
10.0323970.949088
20.1761560.800459
30.3611390.959822
\n", "
" ], "text/plain": [ " one two\n", "0 0.932527 0.482615\n", "1 0.032397 0.949088\n", "2 0.176156 0.800459\n", "3 0.361139 0.959822" ] }, "execution_count": 302, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create a DataFrame from the NumPy array\n", "pd.DataFrame(arr, columns=['one', 'two'])" ] }, { "cell_type": "code", "execution_count": 303, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
studenttest
010078
110183
210279
310385
410473
510591
610681
710786
810888
910996
\n", "
" ], "text/plain": [ " student test\n", "0 100 78\n", "1 101 83\n", "2 102 79\n", "3 103 85\n", "4 104 73\n", "5 105 91\n", "6 106 81\n", "7 107 86\n", "8 108 88\n", "9 109 96" ] }, "execution_count": 303, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create a DataFrame of student IDs (100 through 109) and test scores (random integers between 60 and 100)\n", "pd.DataFrame({'student':np.arange(100, 110, 1), 'test':np.random.randint(60, 101, 10)})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`np.arange`**](http://docs.scipy.org/doc/numpy/reference/generated/numpy.arange.html) and [**`np.random`**](http://docs.scipy.org/doc/numpy/reference/routines.random.html)" ] }, { "cell_type": "code", "execution_count": 304, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
test
student
10078
10171
10290
10363
10483
10592
10697
10767
10871
10979
\n", "
" ], "text/plain": [ " test\n", "student \n", "100 78\n", "101 71\n", "102 90\n", "103 63\n", "104 83\n", "105 92\n", "106 97\n", "107 67\n", "108 71\n", "109 79" ] }, "execution_count": 304, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 'set_index' can be chained with the DataFrame constructor to select an index\n", "pd.DataFrame({'student':np.arange(100, 110, 1), 'test':np.random.randint(60, 101, 10)}).set_index('student')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`set_index`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.set_index.html)" ] }, { "cell_type": "code", "execution_count": 305, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "c round\n", "b square\n", "Name: shape, dtype: object" ] }, "execution_count": 305, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create a new Series using the Series constructor\n", "s = pd.Series(['round', 'square'], index=['c', 'b'], name='shape')\n", "s" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation for [**`Series`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.html)" ] }, { "cell_type": "code", "execution_count": 306, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idcolorshape
a100redNaN
b101bluesquare
c102redround
\n", "
" ], "text/plain": [ " id color shape\n", "a 100 red NaN\n", "b 101 blue square\n", "c 102 red round" ] }, "execution_count": 306, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# concatenate the DataFrame and the Series (use axis=1 to concatenate columns)\n", "pd.concat([df, s], axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Notes:**\n", "\n", "- The Series name became the column name in the DataFrame.\n", "- The Series data was aligned to the DataFrame by its index.\n", "- The 'shape' for row 'a' was marked as a missing value (NaN) because that index was not present in the Series.\n", "\n", "Documentation for [**`concat`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html)\n", "\n", "[Back to top]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 30. How do I apply a function to a pandas Series or DataFrame? ([video](https://www.youtube.com/watch?v=P_q0tkYqvSk&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=30))" ] }, { "cell_type": "code", "execution_count": 307, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS
\n", "
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "0 1 0 3 \n", "1 2 1 1 \n", "2 3 1 3 \n", "3 4 1 1 \n", "4 5 0 3 \n", "\n", " Name Sex Age SibSp \\\n", "0 Braund, Mr. Owen Harris male 22.0 1 \n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n", "2 Heikkinen, Miss. Laina female 26.0 0 \n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n", "4 Allen, Mr. William Henry male 35.0 0 \n", "\n", " Parch Ticket Fare Cabin Embarked \n", "0 0 A/5 21171 7.2500 NaN S \n", "1 0 PC 17599 71.2833 C85 C \n", "2 0 STON/O2. 3101282 7.9250 NaN S \n", "3 0 113803 53.1000 C123 S \n", "4 0 373450 8.0500 NaN S " ] }, "execution_count": 307, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read the training dataset from Kaggle's Titanic competition into a DataFrame\n", "train = pd.read_csv('http://bit.ly/kaggletrain')\n", "train.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Goal:** Map the existing values of a Series to a different set of values\n", "\n", "**Method:** [**`map`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.map.html) (Series method)" ] }, { "cell_type": "code", "execution_count": 308, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SexSex_num
0male1
1female0
2female0
3female0
4male1
\n", "
" ], "text/plain": [ " Sex Sex_num\n", "0 male 1\n", "1 female 0\n", "2 female 0\n", "3 female 0\n", "4 male 1" ] }, "execution_count": 308, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# map 'female' to 0 and 'male' to 1\n", "train['Sex_num'] = train.Sex.map({'female':0, 'male':1})\n", "train.loc[0:4, ['Sex', 'Sex_num']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Goal:** Apply a function to each element in a Series\n", "\n", "**Method:** [**`apply`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.apply.html) (Series method)\n", "\n", "**Note:** **`map`** can be substituted for **`apply`** in many cases, but **`apply`** is more flexible and thus is recommended" ] }, { "cell_type": "code", "execution_count": 309, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameName_length
0Braund, Mr. Owen Harris23
1Cumings, Mrs. John Bradley (Florence Briggs Th...51
2Heikkinen, Miss. Laina22
3Futrelle, Mrs. Jacques Heath (Lily May Peel)44
4Allen, Mr. William Henry24
\n", "
" ], "text/plain": [ " Name Name_length\n", "0 Braund, Mr. Owen Harris 23\n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... 51\n", "2 Heikkinen, Miss. Laina 22\n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) 44\n", "4 Allen, Mr. William Henry 24" ] }, "execution_count": 309, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# calculate the length of each string in the 'Name' Series\n", "train['Name_length'] = train.Name.apply(len)\n", "train.loc[0:4, ['Name', 'Name_length']]" ] }, { "cell_type": "code", "execution_count": 310, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
FareFare_ceil
07.25008.0
171.283372.0
27.92508.0
353.100054.0
48.05009.0
\n", "
" ], "text/plain": [ " Fare Fare_ceil\n", "0 7.2500 8.0\n", "1 71.2833 72.0\n", "2 7.9250 8.0\n", "3 53.1000 54.0\n", "4 8.0500 9.0" ] }, "execution_count": 310, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# round up each element in the 'Fare' Series to the next integer\n", "import numpy as np\n", "train['Fare_ceil'] = train.Fare.apply(np.ceil)\n", "train.loc[0:4, ['Fare', 'Fare_ceil']]" ] }, { "cell_type": "code", "execution_count": 311, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 Braund, Mr. Owen Harris\n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th...\n", "2 Heikkinen, Miss. Laina\n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel)\n", "4 Allen, Mr. William Henry\n", "Name: Name, dtype: object" ] }, "execution_count": 311, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# we want to extract the last name of each person\n", "train.Name.head()" ] }, { "cell_type": "code", "execution_count": 312, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 [Braund, Mr. Owen Harris]\n", "1 [Cumings, Mrs. John Bradley (Florence Briggs ...\n", "2 [Heikkinen, Miss. Laina]\n", "3 [Futrelle, Mrs. Jacques Heath (Lily May Peel)]\n", "4 [Allen, Mr. William Henry]\n", "Name: Name, dtype: object" ] }, "execution_count": 312, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# use a string method to split the 'Name' Series at commas (returns a Series of lists)\n", "train.Name.str.split(',').head()" ] }, { "cell_type": "code", "execution_count": 313, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# define a function that returns an element from a list based on position\n", "def get_element(my_list, position):\n", " return my_list[position]" ] }, { "cell_type": "code", "execution_count": 314, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 Braund\n", "1 Cumings\n", "2 Heikkinen\n", "3 Futrelle\n", "4 Allen\n", "Name: Name, dtype: object" ] }, "execution_count": 314, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# apply the 'get_element' function and pass 'position' as a keyword argument\n", "train.Name.str.split(',').apply(get_element, position=0).head()" ] }, { "cell_type": "code", "execution_count": 315, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 Braund\n", "1 Cumings\n", "2 Heikkinen\n", "3 Futrelle\n", "4 Allen\n", "Name: Name, dtype: object" ] }, "execution_count": 315, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# alternatively, use a lambda function\n", "train.Name.str.split(',').apply(lambda x: x[0]).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Goal:** Apply a function along either axis of a DataFrame\n", "\n", "**Method:** [**`apply`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html) (DataFrame method)" ] }, { "cell_type": "code", "execution_count": 316, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countrybeer_servingsspirit_servingswine_servingstotal_litres_of_pure_alcoholcontinent
0Afghanistan0000.0Asia
1Albania89132544.9Europe
2Algeria250140.7Africa
3Andorra24513831212.4Europe
4Angola21757455.9Africa
\n", "
" ], "text/plain": [ " country beer_servings spirit_servings wine_servings \\\n", "0 Afghanistan 0 0 0 \n", "1 Albania 89 132 54 \n", "2 Algeria 25 0 14 \n", "3 Andorra 245 138 312 \n", "4 Angola 217 57 45 \n", "\n", " total_litres_of_pure_alcohol continent \n", "0 0.0 Asia \n", "1 4.9 Europe \n", "2 0.7 Africa \n", "3 12.4 Europe \n", "4 5.9 Africa " ] }, "execution_count": 316, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read a dataset of alcohol consumption into a DataFrame\n", "drinks = pd.read_csv('http://bit.ly/drinksbycountry')\n", "drinks.head()" ] }, { "cell_type": "code", "execution_count": 317, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
beer_servingsspirit_servingswine_servings
0000
18913254
225014
3245138312
42175745
\n", "
" ], "text/plain": [ " beer_servings spirit_servings wine_servings\n", "0 0 0 0\n", "1 89 132 54\n", "2 25 0 14\n", "3 245 138 312\n", "4 217 57 45" ] }, "execution_count": 317, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# select a subset of the DataFrame to work with\n", "drinks.loc[:, 'beer_servings':'wine_servings'].head()" ] }, { "cell_type": "code", "execution_count": 318, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "beer_servings 376\n", "spirit_servings 438\n", "wine_servings 370\n", "dtype: int64" ] }, "execution_count": 318, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# apply the 'max' function along axis 0 to calculate the maximum value in each column\n", "drinks.loc[:, 'beer_servings':'wine_servings'].apply(max, axis=0)" ] }, { "cell_type": "code", "execution_count": 319, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 0\n", "1 132\n", "2 25\n", "3 312\n", "4 217\n", "dtype: int64" ] }, "execution_count": 319, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# apply the 'max' function along axis 1 to calculate the maximum value in each row\n", "drinks.loc[:, 'beer_servings':'wine_servings'].apply(max, axis=1).head()" ] }, { "cell_type": "code", "execution_count": 320, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 beer_servings\n", "1 spirit_servings\n", "2 beer_servings\n", "3 wine_servings\n", "4 beer_servings\n", "dtype: object" ] }, "execution_count": 320, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# use 'np.argmax' to calculate which column has the maximum value for each row\n", "drinks.loc[:, 'beer_servings':'wine_servings'].apply(np.argmax, axis=1).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Goal:** Apply a function to every element in a DataFrame\n", "\n", "**Method:** [**`applymap`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.applymap.html) (DataFrame method)" ] }, { "cell_type": "code", "execution_count": 321, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
beer_servingsspirit_servingswine_servings
00.00.00.0
189.0132.054.0
225.00.014.0
3245.0138.0312.0
4217.057.045.0
\n", "
" ], "text/plain": [ " beer_servings spirit_servings wine_servings\n", "0 0.0 0.0 0.0\n", "1 89.0 132.0 54.0\n", "2 25.0 0.0 14.0\n", "3 245.0 138.0 312.0\n", "4 217.0 57.0 45.0" ] }, "execution_count": 321, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# convert every DataFrame element into a float\n", "drinks.loc[:, 'beer_servings':'wine_servings'].applymap(float).head()" ] }, { "cell_type": "code", "execution_count": 322, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countrybeer_servingsspirit_servingswine_servingstotal_litres_of_pure_alcoholcontinent
0Afghanistan0.00.00.00.0Asia
1Albania89.0132.054.04.9Europe
2Algeria25.00.014.00.7Africa
3Andorra245.0138.0312.012.4Europe
4Angola217.057.045.05.9Africa
\n", "
" ], "text/plain": [ " country beer_servings spirit_servings wine_servings \\\n", "0 Afghanistan 0.0 0.0 0.0 \n", "1 Albania 89.0 132.0 54.0 \n", "2 Algeria 25.0 0.0 14.0 \n", "3 Andorra 245.0 138.0 312.0 \n", "4 Angola 217.0 57.0 45.0 \n", "\n", " total_litres_of_pure_alcohol continent \n", "0 0.0 Asia \n", "1 4.9 Europe \n", "2 0.7 Africa \n", "3 12.4 Europe \n", "4 5.9 Africa " ] }, "execution_count": 322, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# overwrite the existing DataFrame columns\n", "drinks.loc[:, 'beer_servings':'wine_servings'] = drinks.loc[:, 'beer_servings':'wine_servings'].applymap(float)\n", "drinks.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Back to top]" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.11" } }, "nbformat": 4, "nbformat_minor": 0 }