{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Packages, Modules, Methods, and Functions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "> The Python source distribution has long maintained the philosophy of \"batteries included\" -- having a rich and versatile standard library which is immediately available, without making the user download separate packages. This gives the Python language a head start in many projects.\n",
    ">\n",
    "> \\- PEP 206"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Applied Review"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Python and Jupyter Overview"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "- We're working with Python through Jupyter, the most common IDE for data science."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Fundamentals"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "- Python's common *atomic*, or basic, data types are:\n",
    "    - Integers\n",
    "    - Floats (decimals)\n",
    "    - Strings\n",
    "    - Booleans"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "- These simple types can be combined to form more complex types, including:\n",
    "    - Lists: Ordered collections\n",
    "    - Dictionaries: Key-value pairs\n",
    "    - DataFrames: Tabular datasets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Packages (aka *Modules*)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "So far we've seen several data types that Python offers out-of-the-box.\n",
    "However, to keep things organized, some Python functionality is stored in standalone *packages*, or libraries of code.\n",
    "The word \"module\" is generally synonymous with package; you will hear both in discussions of Python."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "For example, functionality related to the operating system -- such as creating files and folders -- is stored in a package called `os`.\n",
    "To use the tools in `os`, we *import* the package."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "import os"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Once we import it, we gain access to everything inside.\n",
    "With Jupyter's autocomplete, we can view what's available."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    },
    "tags": [
     "ci-skip"
    ]
   },
   "outputs": [],
   "source": [
    "# Move your cursor the end of the below line and press tab.\n",
    "os."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Some packages, like `os`, are bundled with every Python install; downloading Python guarantees you'll have these packages.\n",
    "Collectively, this group of packages is known as the *standard library*."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Other packages must be downloaded separately, either because\n",
    "- they aren't sufficiently popular to merit inclusion in the standard library\n",
    "- *or* they change too quickly for the maintainers of Python to keep up"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "The DataFrame type that we saw earlier is part of one such package called `pandas` (short for *Panel Data*).\n",
    "Since pandas is specific to data science and is still rapidly evolving, it is not part of the standard library."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "We can download packages like pandas from the internet using a website called PyPI, the *Python Package Index*.\n",
    "Fortunately, since we are using Binder today, that has been handled for us and pandas is already installed."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "It's possible to import packages under an *alias*, or a nickname.\n",
    "The community has adopted certain conventions for aliases for common packages;\n",
    "while following them isn't mandatory, it's highly recommended, as it makes your code easier for others to understand."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "pandas is conventionally imported under the alias `pd`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.frame.DataFrame"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Importing pandas has given us access to the DataFrame, accessible as pd.DataFrame\n",
    "pd.DataFrame"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "<div class=\"admonition tip alert alert-warning\">\n",
    "    <b><p class=\"first admonition-title\" style=\"font-weight: bold\">Question</p></b>\n",
    "    <p>What is the type of <code>pd</code>? Guess before you run the code below.</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "module"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(pd)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Third-party packages unlock a huge range of functionality that isn't available in native Python; much of Python's data science capabilities come from a handful of packages outside the standard library:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "- pandas\n",
    "- numpy (numerical computing)\n",
    "- scikit-learn (modeling)\n",
    "- scipy (scientific computing)\n",
    "- matplotlib (graphing)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "We won't have time to touch on most of these in this training, but if you're interested in one, google it!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Your Turn\n",
    "\n",
    "1. Import the `numpy` library, listed above. Give it the alias \"np\".\n",
    "2. Using autocomplete, determine what variable or function inside the numpy library starts with \"asco\". *Hint: remember you'll need to preface the variable name with the package alias, e.g. `np.asco`*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Dot Notation with Packages"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "We've seen it a few times already, but now it's time to discuss it explicitly:\n",
    "things inside packages can be accessed with *dot-notation*."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Dot notation looks like this:\n",
    "```python\n",
    "pd.Series\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "or\n",
    "```python\n",
    "import numpy as np\n",
    "np.array\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "You can read this is \"the `array` variable, within the Numpy library\"."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Packages can contain pretty much anything that's legal in Python;\n",
    "if it's code, it can be in a package.\n",
    "\n",
    "This flexibility is part of the reason that Python's package ecosystem is so expansive and powerful."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Functions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "As you may have noticed already, occasionally we run code using parentheses.\n",
    "The feature that permits this in Python is **functions** -- code snippets wrapped up into a single name."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "For example, take the `type` function we saw above.\n",
    "```python\n",
    "type(x)\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "`type` does some complex things under the hood -- it looks at the variable inside the parentheses, determines what type of thing it is, and then returns that type to the user."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "int"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = 7\n",
    "type(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "But the beauty of `type`, and of all functions, is that you (as the user) don't need to know all the complex code that's necessary to figure out that x is an `int` -- you just need to remember that there's a `type` function to do that for you."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Functions make you much more powerful, as they unlock lots of functionality within a simple interface."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "```python\n",
    "# Get the first few rows of the planes data.\n",
    "planes.head()\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "```python\n",
    "# Read in the planes.csv file.\n",
    "pd.read_csv('../data/planes.csv')\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "The variables within the parens are called function arguments, or simply **arguments**.\n",
    "\n",
    "Above, the string `'../data/planes.csv'` is the argument to the `pd.read_csv` function."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Functions are integral to using Python, because it's much more efficient to use pre-written code than to always write your own."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "If you ever do want to write your own function -- perhaps to share with others, or to make it easier to reuse your work -- it's fairly simple to do so, but beyond the scope of this training."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Objects and Dot Notation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Dot-notation, which we discussed in relation to packages, has another use -- accessing things inside of *objects*.\n",
    "\n",
    "What's an object? Basically, a variable that contains other data or functionality inside of it that is exposed to users."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "For example, DataFrames are objects."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "df = pd.DataFrame({'first_name': ['Ethan', 'Brad'], 'last_name': ['Swan', 'Boehmke']})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>first_name</th>\n",
       "      <th>last_name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Ethan</td>\n",
       "      <td>Swan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Brad</td>\n",
       "      <td>Boehmke</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  first_name last_name\n",
       "0      Ethan      Swan\n",
       "1       Brad   Boehmke"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(2, 2)"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>first_name</th>\n",
       "      <th>last_name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>unique</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>top</th>\n",
       "      <td>Ethan</td>\n",
       "      <td>Swan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>freq</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       first_name last_name\n",
       "count           2         2\n",
       "unique          2         2\n",
       "top         Ethan      Swan\n",
       "freq            1         1"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "You can see that DataFrames have a `shape` variable and a `describe` function inside of them, both accessible through dot notation.\n",
    "\n",
    "<div class=\"admonition note alert alert-info\">\n",
    "    <b><p class=\"first admonition-title\" style=\"font-weight: bold\">Note</p></b>\n",
    "    <p>Variables inside an object are often called <i>attributes</i> and functions inside objects are called <i>methods</i>.</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### On Consistency and Language Design"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "One of the great things about Python is that its creators really cared about internal consistency."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "What that means to us, as users, is that syntax is consistent and predictable -- even across different uses that would appear to be different at first."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Dot notation reveals something kind of cool about Python: packages are just like other objects, and the variables inside them are just attributes and methods!\n",
    "\n",
    "This standardization across packages and objects helps us remember a single, intuitive syntax that works for many different things."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Functions, Objects, and Methods in the Context of DataFrames"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "As we saw above, DataFrames are a type of Python object, so let's use them to explore the new Python features we've learned."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Using the `read_csv` function from the Pandas package to read in a DataFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "df = pd.read_csv('../data/airlines.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Using the `type` function to determine the type of `df`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.frame.DataFrame"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Using the `head` method of the DataFrame to view some of its rows"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>carrier</th>\n",
       "      <th>name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>9E</td>\n",
       "      <td>Endeavor Air Inc.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>AA</td>\n",
       "      <td>American Airlines Inc.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>AS</td>\n",
       "      <td>Alaska Airlines Inc.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>B6</td>\n",
       "      <td>JetBlue Airways</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>DL</td>\n",
       "      <td>Delta Air Lines Inc.</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  carrier                    name\n",
       "0      9E       Endeavor Air Inc.\n",
       "1      AA  American Airlines Inc.\n",
       "2      AS    Alaska Airlines Inc.\n",
       "3      B6         JetBlue Airways\n",
       "4      DL    Delta Air Lines Inc."
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Examining the `columns` attribute of the DataFrame to see the names of its columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['carrier', 'name'], dtype='object')"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Inspecting the `shape` attribute to find the *dimensions* (rows and columns) of the DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(16, 2)"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Calling the `describe` method to get a summary of the data in the DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>carrier</th>\n",
       "      <th>name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>16</td>\n",
       "      <td>16</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>unique</th>\n",
       "      <td>16</td>\n",
       "      <td>16</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>top</th>\n",
       "      <td>9E</td>\n",
       "      <td>Endeavor Air Inc.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>freq</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       carrier               name\n",
       "count       16                 16\n",
       "unique      16                 16\n",
       "top         9E  Endeavor Air Inc.\n",
       "freq         1                  1"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Now let's combine them: using the `type` function to determine what `df.describe` holds."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "method"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(df.describe)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "<div class=\"admonition tip alert alert-warning\">\n",
    "    <b><p class=\"first admonition-title\" style=\"font-weight: bold\">Question</p></b>\n",
    "    <p>Does this result make sense? What would happen if you added parens? i.e. <code>type(df.describe())</code></p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Your Turn\n",
    "\n",
    "Spend some time using autocomplete to explore the methods and attributes of the `df` object we used above.\n",
    "Remember from the Jupyter lesson that you can use a question mark to see the documentation for a function or method (e.g. `df.describe?`)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Deeper Dive on DataFrames"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Now that we understand objects and functions better, let's look more at DataFrames."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## What Are DataFrames Made of?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Accessing an individual column of a DataFrame can be done by passing the column name as a string, in brackets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     9E\n",
       "1     AA\n",
       "2     AS\n",
       "3     B6\n",
       "4     DL\n",
       "5     EV\n",
       "6     F9\n",
       "7     FL\n",
       "8     HA\n",
       "9     MQ\n",
       "10    OO\n",
       "11    UA\n",
       "12    US\n",
       "13    VX\n",
       "14    WN\n",
       "15    YV\n",
       "Name: carrier, dtype: object"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "carrier_column = df['carrier']\n",
    "carrier_column"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Individual columns are pandas `Series` objects."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.series.Series"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(carrier_column)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "How are Series different from DataFrames?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "- They're always 1-dimensional"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "- They have different attributes than DataFrames\n",
    "    - For example, Series have a `to_list` method -- which doesn't make sense to have on DataFrames"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "- They don't print in the pretty format of DataFrames, but in plain text (see above)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(16,)"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "carrier_column.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(16, 2)"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['9E',\n",
       " 'AA',\n",
       " 'AS',\n",
       " 'B6',\n",
       " 'DL',\n",
       " 'EV',\n",
       " 'F9',\n",
       " 'FL',\n",
       " 'HA',\n",
       " 'MQ',\n",
       " 'OO',\n",
       " 'UA',\n",
       " 'US',\n",
       " 'VX',\n",
       " 'WN',\n",
       " 'YV']"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "carrier_column.to_list()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": [
     "ci-skip"
    ]
   },
   "outputs": [
    {
     "ename": "AttributeError",
     "evalue": "'DataFrame' object has no attribute 'to_list'",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mAttributeError\u001b[0m                            Traceback (most recent call last)",
      "Cell \u001b[0;32mIn[23], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m df\u001b[39m.\u001b[39mto_list()\n",
      "File \u001b[0;32m/usr/local/anaconda3/envs/uc-python/lib/python3.11/site-packages/pandas/core/generic.py:5989\u001b[0m, in \u001b[0;36mNDFrame.__getattr__\u001b[0;34m(self, name)\u001b[0m\n\u001b[1;32m   5982\u001b[0m \u001b[39mif\u001b[39;00m (\n\u001b[1;32m   5983\u001b[0m     name \u001b[39mnot\u001b[39;00m \u001b[39min\u001b[39;00m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_internal_names_set\n\u001b[1;32m   5984\u001b[0m     \u001b[39mand\u001b[39;00m name \u001b[39mnot\u001b[39;00m \u001b[39min\u001b[39;00m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_metadata\n\u001b[1;32m   5985\u001b[0m     \u001b[39mand\u001b[39;00m name \u001b[39mnot\u001b[39;00m \u001b[39min\u001b[39;00m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_accessors\n\u001b[1;32m   5986\u001b[0m     \u001b[39mand\u001b[39;00m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_info_axis\u001b[39m.\u001b[39m_can_hold_identifiers_and_holds_name(name)\n\u001b[1;32m   5987\u001b[0m ):\n\u001b[1;32m   5988\u001b[0m     \u001b[39mreturn\u001b[39;00m \u001b[39mself\u001b[39m[name]\n\u001b[0;32m-> 5989\u001b[0m \u001b[39mreturn\u001b[39;00m \u001b[39mobject\u001b[39m\u001b[39m.\u001b[39m\u001b[39m__getattribute__\u001b[39m(\u001b[39mself\u001b[39m, name)\n",
      "\u001b[0;31mAttributeError\u001b[0m: 'DataFrame' object has no attribute 'to_list'"
     ]
    }
   ],
   "source": [
    "df.to_list()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "It's important to be familiar with Series because they are fundamentally the core of DataFrames.\n",
    "Not only are columns represented as Series, but so are rows!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "carrier                   9E\n",
       "name       Endeavor Air Inc.\n",
       "Name: 0, dtype: object"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Fetch the first row of the DataFrame\n",
    "first_row = df.loc[0]\n",
    "first_row"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.series.Series"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(first_row)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Whenever you select individual columns or rows, you'll get Series objects."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## What Can You Do with a Series?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "First, let's create our own Series object from scratch -- they don't need to come from a DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    10\n",
       "1    20\n",
       "2    30\n",
       "3    40\n",
       "4    50\n",
       "dtype: int64"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Pass a list in as an argument and it will be converted to a Series.\n",
    "s = pd.Series([10, 20, 30, 40, 50])\n",
    "s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    10\n",
       "1    20\n",
       "2    30\n",
       "3    40\n",
       "4    50\n",
       "dtype: int64"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Pass a list in as an argument and it will be converted to a Series.\n",
    "s = pd.Series([10, 20, 30, 40, 50])\n",
    "s"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "There are 3 things to notice about this Series:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "- The values (10, 20, 30...)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "- The *dtype*, short for data type."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "- The *index* (0, 1, 2...)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Values\n",
    "Values are fairly self-explanatory; we chose them in our input list."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### dtype\n",
    "Data types are also straightforward."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Series are always homogeneous, holding only integers, floats, or generic Python objects (called just `object`)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Because a Python object is general enough to contain any other type, any Series holding strings or other non-numeric data will typically default to be of type `object`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "For example, going back to our carriers DataFrame, note that the carrier column is of type `object`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     9E\n",
       "1     AA\n",
       "2     AS\n",
       "3     B6\n",
       "4     DL\n",
       "5     EV\n",
       "6     F9\n",
       "7     FL\n",
       "8     HA\n",
       "9     MQ\n",
       "10    OO\n",
       "11    UA\n",
       "12    US\n",
       "13    VX\n",
       "14    WN\n",
       "15    YV\n",
       "Name: carrier, dtype: object"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['carrier']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Index\n",
    "Indexes are more interesting.\n",
    "Every Series has an index, a way to reference each element.\n",
    "The index of a Series is a lot like the keys of a dictionary: each index element corresponds to a value in the Series, and can be used to look up that element."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "RangeIndex(start=0, stop=5, step=1)"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Our index is a range from 0 (inclusive) to 5 (exclusive).\n",
    "s.index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    10\n",
       "1    20\n",
       "2    30\n",
       "3    40\n",
       "4    50\n",
       "dtype: int64"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "40"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s[3]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "In our example, the index is just the integers 0-4, so right now it looks no different that referencing elements of a regular Python list.\n",
    "*But* indexes can be changed to something different -- like the letters a-e, for example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    10\n",
       "b    20\n",
       "c    30\n",
       "d    40\n",
       "e    50\n",
       "dtype: int64"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.index = ['a', 'b', 'c', 'd', 'e']\n",
    "s"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Now to look up the value 40, we reference `'d'`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "40"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s['d']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "We saw earlier that rows of a DataFrame are Series.\n",
    "In such cases, the flexibility of Series indexes comes in handy;\n",
    "the index is set to the DataFrame column names."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>carrier</th>\n",
       "      <th>name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>9E</td>\n",
       "      <td>Endeavor Air Inc.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>AA</td>\n",
       "      <td>American Airlines Inc.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>AS</td>\n",
       "      <td>Alaska Airlines Inc.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>B6</td>\n",
       "      <td>JetBlue Airways</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>DL</td>\n",
       "      <td>Delta Air Lines Inc.</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  carrier                    name\n",
       "0      9E       Endeavor Air Inc.\n",
       "1      AA  American Airlines Inc.\n",
       "2      AS    Alaska Airlines Inc.\n",
       "3      B6         JetBlue Airways\n",
       "4      DL    Delta Air Lines Inc."
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "carrier                   9E\n",
       "name       Endeavor Air Inc.\n",
       "Name: 0, dtype: object"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Note that the index is ['carrier', 'name']\n",
    "first_row = df.loc[0]\n",
    "first_row"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "This is particularly handy because it means you can extract individual elements based on a column name."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'9E'"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "first_row['carrier']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## DataFrame Indexes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "It's not just Series that have indexes!\n",
    "DataFrames have them too.\n",
    "Take a look at the carrier DataFrame again and note the bold numbers on the left."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>carrier</th>\n",
       "      <th>name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>9E</td>\n",
       "      <td>Endeavor Air Inc.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>AA</td>\n",
       "      <td>American Airlines Inc.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>AS</td>\n",
       "      <td>Alaska Airlines Inc.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>B6</td>\n",
       "      <td>JetBlue Airways</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>DL</td>\n",
       "      <td>Delta Air Lines Inc.</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  carrier                    name\n",
       "0      9E       Endeavor Air Inc.\n",
       "1      AA  American Airlines Inc.\n",
       "2      AS    Alaska Airlines Inc.\n",
       "3      B6         JetBlue Airways\n",
       "4      DL    Delta Air Lines Inc."
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "These numbers are an index, just like the one we saw on our example Series.\n",
    "And DataFrame indexes support similar functionality."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "RangeIndex(start=0, stop=16, step=1)"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Our index is a range from 0 (inclusive) to 16 (exclusive).\n",
    "df.index"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "When loading in a DataFrame, the default index will always be 0 to N-1, where N is the number of rows in your DataFrame.\n",
    "This is called a `RangeIndex`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Selecting individual rows by their index is done with the `.loc` accessor.\n",
    "An *accessor* is an attribute designed specifically to help users reference something else (like rows within a DataFrame)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "carrier                      DL\n",
       "name       Delta Air Lines Inc.\n",
       "Name: 4, dtype: object"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Get the row at index 4 (the fifth row).\n",
    "df.loc[4]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "As with Series, DataFrames support reassigning their index."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "However, with DataFrames it often makes sense to change one of your columns into the index."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "This is analogous to a primary key in relational databases: a way to rapidly look up rows within a table."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "In our case, maybe we will often use the carrier code (`carrier`) to look up the full name of the airline.\n",
    "In that case, it would make sense set the carrier column as our index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>carrier</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>9E</th>\n",
       "      <td>Endeavor Air Inc.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>AA</th>\n",
       "      <td>American Airlines Inc.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>AS</th>\n",
       "      <td>Alaska Airlines Inc.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>B6</th>\n",
       "      <td>JetBlue Airways</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>DL</th>\n",
       "      <td>Delta Air Lines Inc.</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                           name\n",
       "carrier                        \n",
       "9E            Endeavor Air Inc.\n",
       "AA       American Airlines Inc.\n",
       "AS         Alaska Airlines Inc.\n",
       "B6              JetBlue Airways\n",
       "DL         Delta Air Lines Inc."
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = df.set_index('carrier')\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Now the RangeIndex has been replaced with a more meaningful index, and it's possible to look up rows of the table by passing carrier code to the `.loc` accessor."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "name    United Air Lines Inc.\n",
       "Name: UA, dtype: object"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc['UA']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "<div class=\"admonition warning alert alert-danger\">\n",
    " <b><p class=\"first admonition-title\" style=\"font-weight: bold\">Caution!</p></b>\n",
    " <p>Pandas does not require that indexes have unique values (that is, no duplicates) although many relational databases do have that requirement of a primary key. This means that it is *possible* to create a non-unique index, but highly inadvisable. Having duplicate values in your index can cause unexpected results when you refer to rows by index -- but multiple rows have that index. Don't do it if you can help it!</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "When starting to work with a DataFrame, it's often a good idea to determine what column makes sense as your index and to set it immediately."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "This will make your code nicer -- by letting you directly look up values with the index -- and also make your selections and filters faster, because Pandas is optimized for operations by index."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "If you want to change the index of your DataFrame later, you can always `reset_index` (and then assign a new one)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>carrier</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>9E</th>\n",
       "      <td>Endeavor Air Inc.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>AA</th>\n",
       "      <td>American Airlines Inc.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>AS</th>\n",
       "      <td>Alaska Airlines Inc.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>B6</th>\n",
       "      <td>JetBlue Airways</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>DL</th>\n",
       "      <td>Delta Air Lines Inc.</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                           name\n",
       "carrier                        \n",
       "9E            Endeavor Air Inc.\n",
       "AA       American Airlines Inc.\n",
       "AS         Alaska Airlines Inc.\n",
       "B6              JetBlue Airways\n",
       "DL         Delta Air Lines Inc."
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>carrier</th>\n",
       "      <th>name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>9E</td>\n",
       "      <td>Endeavor Air Inc.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>AA</td>\n",
       "      <td>American Airlines Inc.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>AS</td>\n",
       "      <td>Alaska Airlines Inc.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>B6</td>\n",
       "      <td>JetBlue Airways</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>DL</td>\n",
       "      <td>Delta Air Lines Inc.</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  carrier                    name\n",
       "0      9E       Endeavor Air Inc.\n",
       "1      AA  American Airlines Inc.\n",
       "2      AS    Alaska Airlines Inc.\n",
       "3      B6         JetBlue Airways\n",
       "4      DL    Delta Air Lines Inc."
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = df.reset_index()\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Your Turn\n",
    "\n",
    "The below cell has code to load in the first 100 rows of the airports data as `airports`.\n",
    "The data contains the airport code, airport name, and some basic facts about the airport location."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>faa</th>\n",
       "      <th>name</th>\n",
       "      <th>lat</th>\n",
       "      <th>lon</th>\n",
       "      <th>alt</th>\n",
       "      <th>tz</th>\n",
       "      <th>dst</th>\n",
       "      <th>tzone</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>04G</td>\n",
       "      <td>Lansdowne Airport</td>\n",
       "      <td>41.130472</td>\n",
       "      <td>-80.619583</td>\n",
       "      <td>1044</td>\n",
       "      <td>-5</td>\n",
       "      <td>A</td>\n",
       "      <td>America/New_York</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>06A</td>\n",
       "      <td>Moton Field Municipal Airport</td>\n",
       "      <td>32.460572</td>\n",
       "      <td>-85.680028</td>\n",
       "      <td>264</td>\n",
       "      <td>-6</td>\n",
       "      <td>A</td>\n",
       "      <td>America/Chicago</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>06C</td>\n",
       "      <td>Schaumburg Regional</td>\n",
       "      <td>41.989341</td>\n",
       "      <td>-88.101243</td>\n",
       "      <td>801</td>\n",
       "      <td>-6</td>\n",
       "      <td>A</td>\n",
       "      <td>America/Chicago</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>06N</td>\n",
       "      <td>Randall Airport</td>\n",
       "      <td>41.431912</td>\n",
       "      <td>-74.391561</td>\n",
       "      <td>523</td>\n",
       "      <td>-5</td>\n",
       "      <td>A</td>\n",
       "      <td>America/New_York</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>09J</td>\n",
       "      <td>Jekyll Island Airport</td>\n",
       "      <td>31.074472</td>\n",
       "      <td>-81.427778</td>\n",
       "      <td>11</td>\n",
       "      <td>-5</td>\n",
       "      <td>A</td>\n",
       "      <td>America/New_York</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   faa                           name        lat        lon   alt  tz dst  \\\n",
       "0  04G              Lansdowne Airport  41.130472 -80.619583  1044  -5   A   \n",
       "1  06A  Moton Field Municipal Airport  32.460572 -85.680028   264  -6   A   \n",
       "2  06C            Schaumburg Regional  41.989341 -88.101243   801  -6   A   \n",
       "3  06N                Randall Airport  41.431912 -74.391561   523  -5   A   \n",
       "4  09J          Jekyll Island Airport  31.074472 -81.427778    11  -5   A   \n",
       "\n",
       "              tzone  \n",
       "0  America/New_York  \n",
       "1   America/Chicago  \n",
       "2   America/Chicago  \n",
       "3  America/New_York  \n",
       "4  America/New_York  "
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "airports = pd.read_csv('../data/airports.csv')\n",
    "airports = airports.loc[0:100]\n",
    "airports.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "1. What kind of index is the current index of `airports`? \n",
    "2. Is this a good choice for the DataFrame's index? If not, what column or columns would be a better candidate?\n",
    "3. If you chose a different column to be the index, make it your index using `airports.set_index()`.\n",
    "4. Using your new index, look up \"Pittsburgh-Monroeville Airport\", code 4G0. What is its altitude?\n",
    "5. Reset your index in case you want to make a different column your index in the future."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Questions\n",
    "\n",
    "Are there any questions before we move on?"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.4"
  },
  "rise": {
   "autolaunch": true,
   "transition": "none"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}