{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Packages, Modules, Methods, and Functions"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"> The Python source distribution has long maintained the philosophy of \"batteries included\" -- having a rich and versatile standard library which is immediately available, without making the user download separate packages. This gives the Python language a head start in many projects.\n",
">\n",
"> \\- PEP 206"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Applied Review"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Python and Jupyter Overview"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- We're working with Python through Jupyter, one of the most widely used environments for data science."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Fundamentals"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- Python's common *atomic*, or basic, data types are:\n",
" - Integers\n",
" - Floats (decimals)\n",
" - Strings\n",
" - Booleans"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- These simple types can be combined to form more complex types, including:\n",
" - Lists: Ordered collections\n",
" - Dictionaries: Key-value pairs\n",
" - DataFrames: Tabular datasets"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Packages (aka *Modules*)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"So far we've seen several data types that Python offers out-of-the-box.\n",
"However, to keep things organized, some Python functionality is stored in standalone *packages*, or libraries of code.\n",
"Strictly speaking, a *module* is a single file of Python code and a package is a collection of modules, but the two words are often used interchangeably; you will hear both in discussions of Python."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"For example, functionality related to the operating system -- such as creating files and folders -- is stored in a package called `os`.\n",
"To use the tools in `os`, we *import* the package."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import os"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Once we import it, we gain access to everything inside.\n",
"With Jupyter's autocomplete, we can view what's available."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
},
"tags": [
"ci-skip"
]
},
"outputs": [],
"source": [
"# Move your cursor to the end of the line below and press Tab.\n",
"os."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Some packages, like `os`, are bundled with every Python install; downloading Python guarantees you'll have these packages.\n",
"Collectively, this group of packages is known as the *standard library*."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Other packages must be downloaded separately, either because\n",
"- they aren't sufficiently popular to merit inclusion in the standard library\n",
"- *or* they change too quickly for the maintainers of Python to keep up"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"The DataFrame type that we saw earlier is part of one such package called `pandas` (short for *Panel Data*).\n",
"Since pandas is specific to data science and is still rapidly evolving, it is not part of the standard library."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"We can download packages like pandas from the internet using a website called PyPI, the *Python Package Index*.\n",
"Fortunately, since we are using Binder today, that has been handled for us and pandas is already installed."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"It's possible to import packages under an *alias*, or a nickname.\n",
"The community has adopted certain conventions for aliases for common packages;\n",
"while following them isn't mandatory, it's highly recommended, as it makes your code easier for others to understand."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"pandas is conventionally imported under the alias `pd`."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"pandas.core.frame.DataFrame"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Importing pandas has given us access to the DataFrame, accessible as pd.DataFrame\n",
"pd.DataFrame"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"**Question**\n",
"\n",
"What is the type of `pd`? Guess before you run the code below."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"module"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(pd)"
]
},
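{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"The import styles discussed so far can be sketched side by side. This is a minimal illustration using the standard library's `math` package; the alias `m` is made up for this example and is not a community convention.\n",
"\n",
"```python\n",
"# Three common ways to import from the standard-library math package.\n",
"import math              # plain import: names are reached as math.<name>\n",
"import math as m         # alias import: m is another name for the same package\n",
"from math import sqrt    # direct import: sqrt is usable without a prefix\n",
"\n",
"print(math.sqrt(16))     # 4.0\n",
"print(m.sqrt(16))        # 4.0\n",
"print(sqrt(16))          # 4.0\n",
"print(m is math)         # True: the alias points at the same object\n",
"```\n",
"\n",
"All three forms reach the same function; the alias form is the one most often seen with pandas."
]
},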
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Third-party packages unlock a huge range of functionality that isn't available in native Python; much of Python's data science capability comes from a handful of packages outside the standard library:"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- pandas\n",
"- numpy (numerical computing)\n",
"- scikit-learn (modeling)\n",
"- scipy (scientific computing)\n",
"- matplotlib (graphing)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"We won't have time to touch on most of these in this training, but if you're interested in one, google it!"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Your Turn\n",
"\n",
"1. Import the `numpy` library, listed above. Give it the alias \"np\".\n",
"2. Using autocomplete, determine what variable or function inside the numpy library starts with \"asco\". *Hint: remember you'll need to preface the variable name with the package alias, e.g. `np.asco`*"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Dot Notation with Packages"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"We've seen it a few times already, but now it's time to discuss it explicitly:\n",
"things inside packages can be accessed with *dot-notation*."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Dot notation looks like this:\n",
"```python\n",
"pd.Series\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"or\n",
"```python\n",
"import numpy as np\n",
"np.array\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"You can read this as \"the `array` variable, within the NumPy library\"."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Packages can contain pretty much anything that's legal in Python;\n",
"if it's code, it can be in a package.\n",
"\n",
"This flexibility is part of the reason that Python's package ecosystem is so expansive and powerful."
]
},
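{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"As a rough sketch of how varied the contents of a package can be, the standard library's `os` package holds plain values, functions, and even another module. The folder and file names below are placeholders chosen for illustration.\n",
"\n",
"```python\n",
"import os\n",
"\n",
"# os.sep is a plain value (a string) stored inside the package.\n",
"separator = os.sep\n",
"\n",
"# os.path is itself a module nested inside os, so dots can be chained.\n",
"joined = os.path.join('data', 'planes.csv')\n",
"\n",
"print(type(separator))        # <class 'str'>\n",
"print(joined)                 # data/planes.csv on macOS and Linux\n",
"print(callable(os.getcwd))    # True: os.getcwd is a function\n",
"```\n",
"\n",
"Each dot steps one level deeper: first into `os`, then into `path`, then to `join`."
]
},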
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Functions"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"As you may have noticed already, occasionally we run code using parentheses.\n",
"The feature that permits this in Python is **functions** -- code snippets wrapped up into a single name."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"For example, take the `type` function we saw above.\n",
"```python\n",
"type(x)\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"`type` does some complex things under the hood -- it looks at the variable inside the parentheses, determines what type of thing it is, and then returns that type to the user."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"int"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = 7\n",
"type(x)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"But the beauty of `type`, and of all functions, is that you (as the user) don't need to know all the complex code that's necessary to figure out that x is an `int` -- you just need to remember that there's a `type` function to do that for you."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Functions make you much more powerful, as they unlock lots of functionality within a simple interface."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"```python\n",
"# Get the first few rows of the planes data.\n",
"planes.head()\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"```python\n",
"# Read in the planes.csv file.\n",
"pd.read_csv('../data/planes.csv')\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"The values within the parentheses are called function arguments, or simply **arguments**.\n",
"\n",
"Above, the string `'../data/planes.csv'` is the argument to the `pd.read_csv` function."
]
},
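{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Functions can take more than one argument, and arguments can be passed by name. A small sketch using two built-in functions, `round` and `sorted`; the sample values are made up:\n",
"\n",
"```python\n",
"# One argument: the value to round.\n",
"print(round(3.14159))       # 3\n",
"\n",
"# Two arguments: the value, then the number of decimal places.\n",
"print(round(3.14159, 2))    # 3.14\n",
"\n",
"# A keyword argument names the option it sets.\n",
"names = sorted(['Swan', 'Boehmke'], reverse=True)\n",
"print(names)                # ['Swan', 'Boehmke']\n",
"```\n",
"\n",
"Arguments given by name, like `reverse=True`, are called keyword arguments."
]
},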
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Functions are integral to using Python, because it's much more efficient to use pre-written code than to always write your own."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"If you ever do want to write your own function -- perhaps to share with others, or to make it easier to reuse your work -- it's fairly simple to do so, but beyond the scope of this training."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Objects and Dot Notation"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Dot-notation, which we discussed in relation to packages, has another use -- accessing things inside of *objects*.\n",
"\n",
"What's an object? Basically, a variable that contains other data or functionality inside of it that is exposed to users."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"For example, DataFrames are objects."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"df = pd.DataFrame({'first_name': ['Ethan', 'Brad'], 'last_name': ['Swan', 'Boehmke']})"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
" first_name last_name\n",
"0 Ethan Swan\n",
"1 Brad Boehmke"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"(2, 2)"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.shape"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
" first_name last_name\n",
"count 2 2\n",
"unique 2 2\n",
"top Ethan Swan\n",
"freq 1 1"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.describe()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"You can see that DataFrames have a `shape` variable and a `describe` function inside of them, both accessible through dot notation.\n",
"\n",
"**Note:** Variables inside an object are often called *attributes*, and functions inside objects are called *methods*."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### On Consistency and Language Design"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"One of the great things about Python is that its creators really cared about internal consistency."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"What that means to us, as users, is that syntax is consistent and predictable -- even across different uses that would appear to be different at first."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Dot notation reveals something kind of cool about Python: packages are just like other objects, and the variables inside them are just attributes and methods!\n",
"\n",
"This standardization across packages and objects helps us remember a single, intuitive syntax that works for many different things."
]
},
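{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"One way to see this consistency, sketched with the standard library's `math` package and an ordinary string (the variable name `word` is just for illustration):\n",
"\n",
"```python\n",
"import math\n",
"\n",
"word = 'pandas'\n",
"\n",
"# A value inside a package and a method on an object use the same dot syntax.\n",
"print(math.pi)                 # 3.141592653589793\n",
"print(word.upper())            # PANDAS\n",
"\n",
"# Without parentheses we get the function itself rather than its result.\n",
"print(callable(word.upper))    # True\n",
"print(type(math))              # <class 'module'>\n",
"```\n",
"\n",
"In both cases the name after the dot is looked up inside the thing before the dot."
]
},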
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Functions, Objects, and Methods in the Context of DataFrames"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"As we saw above, DataFrames are a type of Python object, so let's use them to explore the new Python features we've learned."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Using the `read_csv` function from the pandas package to read in a DataFrame"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"df = pd.read_csv('../data/airlines.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Using the `type` function to determine the type of `df`"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"pandas.core.frame.DataFrame"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(df)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Using the `head` method of the DataFrame to view some of its rows"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
" carrier name\n",
"0 9E Endeavor Air Inc.\n",
"1 AA American Airlines Inc.\n",
"2 AS Alaska Airlines Inc.\n",
"3 B6 JetBlue Airways\n",
"4 DL Delta Air Lines Inc."
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Examining the `columns` attribute of the DataFrame to see the names of its columns."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"Index(['carrier', 'name'], dtype='object')"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.columns"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Inspecting the `shape` attribute to find the *dimensions* (rows and columns) of the DataFrame."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"(16, 2)"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.shape"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Calling the `describe` method to get a summary of the data in the DataFrame."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
" carrier name\n",
"count 16 16\n",
"unique 16 16\n",
"top 9E Endeavor Air Inc.\n",
"freq 1 1"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.describe()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Now let's combine them: using the `type` function to determine what `df.describe` holds."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"method"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(df.describe)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"**Question**\n",
"\n",
"Does this result make sense? What would happen if you added parentheses, i.e. `type(df.describe())`?"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Your Turn\n",
"\n",
"Spend some time using autocomplete to explore the methods and attributes of the `df` object we used above.\n",
"Remember from the Jupyter lesson that you can use a question mark to see the documentation for a function or method (e.g. `df.describe?`)."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Deeper Dive on DataFrames"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Now that we understand objects and functions better, let's look more at DataFrames."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## What Are DataFrames Made of?"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Accessing an individual column of a DataFrame can be done by passing the column name as a string, in brackets."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"0 9E\n",
"1 AA\n",
"2 AS\n",
"3 B6\n",
"4 DL\n",
"5 EV\n",
"6 F9\n",
"7 FL\n",
"8 HA\n",
"9 MQ\n",
"10 OO\n",
"11 UA\n",
"12 US\n",
"13 VX\n",
"14 WN\n",
"15 YV\n",
"Name: carrier, dtype: object"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"carrier_column = df['carrier']\n",
"carrier_column"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Individual columns are pandas `Series` objects."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"pandas.core.series.Series"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(carrier_column)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"How are Series different from DataFrames?"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- They're always 1-dimensional"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- They have different attributes than DataFrames\n",
" - For example, Series have a `to_list` method -- which doesn't make sense to have on DataFrames"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- They don't print in the pretty format of DataFrames, but in plain text (see above)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"(16,)"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"carrier_column.shape"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"(16, 2)"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.shape"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"['9E',\n",
" 'AA',\n",
" 'AS',\n",
" 'B6',\n",
" 'DL',\n",
" 'EV',\n",
" 'F9',\n",
" 'FL',\n",
" 'HA',\n",
" 'MQ',\n",
" 'OO',\n",
" 'UA',\n",
" 'US',\n",
" 'VX',\n",
" 'WN',\n",
" 'YV']"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"carrier_column.to_list()"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": [
"ci-skip"
]
},
"outputs": [
{
"ename": "AttributeError",
"evalue": "'DataFrame' object has no attribute 'to_list'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[23], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m df\u001b[39m.\u001b[39mto_list()\n",
"File \u001b[0;32m/usr/local/anaconda3/envs/uc-python/lib/python3.11/site-packages/pandas/core/generic.py:5989\u001b[0m, in \u001b[0;36mNDFrame.__getattr__\u001b[0;34m(self, name)\u001b[0m\n\u001b[1;32m 5982\u001b[0m \u001b[39mif\u001b[39;00m (\n\u001b[1;32m 5983\u001b[0m name \u001b[39mnot\u001b[39;00m \u001b[39min\u001b[39;00m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_internal_names_set\n\u001b[1;32m 5984\u001b[0m \u001b[39mand\u001b[39;00m name \u001b[39mnot\u001b[39;00m \u001b[39min\u001b[39;00m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_metadata\n\u001b[1;32m 5985\u001b[0m \u001b[39mand\u001b[39;00m name \u001b[39mnot\u001b[39;00m \u001b[39min\u001b[39;00m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_accessors\n\u001b[1;32m 5986\u001b[0m \u001b[39mand\u001b[39;00m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_info_axis\u001b[39m.\u001b[39m_can_hold_identifiers_and_holds_name(name)\n\u001b[1;32m 5987\u001b[0m ):\n\u001b[1;32m 5988\u001b[0m \u001b[39mreturn\u001b[39;00m \u001b[39mself\u001b[39m[name]\n\u001b[0;32m-> 5989\u001b[0m \u001b[39mreturn\u001b[39;00m \u001b[39mobject\u001b[39m\u001b[39m.\u001b[39m\u001b[39m__getattribute__\u001b[39m(\u001b[39mself\u001b[39m, name)\n",
"\u001b[0;31mAttributeError\u001b[0m: 'DataFrame' object has no attribute 'to_list'"
]
}
],
"source": [
"df.to_list()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"It's important to be familiar with Series because they are fundamentally the core of DataFrames.\n",
"Not only are columns represented as Series, but so are rows!"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"carrier 9E\n",
"name Endeavor Air Inc.\n",
"Name: 0, dtype: object"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Fetch the first row of the DataFrame\n",
"first_row = df.loc[0]\n",
"first_row"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"pandas.core.series.Series"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(first_row)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Whenever you select individual columns or rows, you'll get Series objects."
]
},
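{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"A short sketch of that rule, using a two-row stand-in for the airlines data (the name `small` is made up here). It also shows one contrast worth knowing: passing a *list* of column names returns a DataFrame rather than a Series.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# A tiny stand-in for the airlines data; only two rows are reproduced.\n",
"small = pd.DataFrame({'carrier': ['9E', 'AA'],\n",
"                      'name': ['Endeavor Air Inc.', 'American Airlines Inc.']})\n",
"\n",
"one_column = small['carrier']        # a single column name gives a Series\n",
"one_row = small.loc[0]               # a single row label gives a Series\n",
"column_frame = small[['carrier']]    # a list of names gives a DataFrame\n",
"\n",
"print(type(one_column).__name__)     # Series\n",
"print(type(one_row).__name__)        # Series\n",
"print(type(column_frame).__name__)   # DataFrame\n",
"print(column_frame.shape)            # (2, 1)\n",
"```"
]
},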
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## What Can You Do with a Series?"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"First, let's create our own Series object from scratch -- they don't need to come from a DataFrame."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"0 10\n",
"1 20\n",
"2 30\n",
"3 40\n",
"4 50\n",
"dtype: int64"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Pass a list in as an argument and it will be converted to a Series.\n",
"s = pd.Series([10, 20, 30, 40, 50])\n",
"s"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"0 10\n",
"1 20\n",
"2 30\n",
"3 40\n",
"4 50\n",
"dtype: int64"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Pass a list in as an argument and it will be converted to a Series.\n",
"s = pd.Series([10, 20, 30, 40, 50])\n",
"s"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"There are 3 things to notice about this Series:"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- The values (10, 20, 30...)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- The *dtype*, short for data type."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- The *index* (0, 1, 2...)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Values\n",
"Values are fairly self-explanatory; we chose them in our input list."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### dtype\n",
"Data types are also straightforward."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Series are always homogeneous: every element shares a single dtype, such as integers, floats, or generic Python objects (called just `object`)."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Because a Python object is general enough to contain any other type, any Series holding strings or other non-numeric data will typically default to be of type `object`."
]
},
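{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"To see how pandas settles on a single dtype, here is a small sketch with made-up values. The variable names are illustrative only.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"ints = pd.Series([1, 2, 3])\n",
"floats = pd.Series([1.5, 2.5])\n",
"promoted = pd.Series([1, 2.5])       # integers and floats mixed: stored as floats\n",
"mixed = pd.Series([1, 'a', 2.5])     # numbers and text mixed: stored as object\n",
"\n",
"print(ints.dtype)        # int64\n",
"print(floats.dtype)      # float64\n",
"print(promoted.dtype)    # float64\n",
"print(mixed.dtype)       # object\n",
"```\n",
"\n",
"When the values don't share a numeric type, pandas falls back to the most general option, `object`."
]
},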
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"For example, going back to our airlines DataFrame, note that the carrier column is of type `object`."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"0 9E\n",
"1 AA\n",
"2 AS\n",
"3 B6\n",
"4 DL\n",
"5 EV\n",
"6 F9\n",
"7 FL\n",
"8 HA\n",
"9 MQ\n",
"10 OO\n",
"11 UA\n",
"12 US\n",
"13 VX\n",
"14 WN\n",
"15 YV\n",
"Name: carrier, dtype: object"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['carrier']"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Index\n",
"Indexes are more interesting.\n",
"Every Series has an index, a way to reference each element.\n",
"The index of a Series is a lot like the keys of a dictionary: each index element corresponds to a value in the Series, and can be used to look up that element."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"RangeIndex(start=0, stop=5, step=1)"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Our index is a range from 0 (inclusive) to 5 (exclusive).\n",
"s.index"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"0 10\n",
"1 20\n",
"2 30\n",
"3 40\n",
"4 50\n",
"dtype: int64"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"40"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s[3]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"In our example, the index is just the integers 0-4, so right now it looks no different than referencing elements of a regular Python list.\n",
"*But* indexes can be changed to something different -- like the letters a-e, for example."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"a 10\n",
"b 20\n",
"c 30\n",
"d 40\n",
"e 50\n",
"dtype: int64"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s.index = ['a', 'b', 'c', 'd', 'e']\n",
"s"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Now to look up the value 40, we reference `'d'`."
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"40"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s['d']"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"We saw earlier that rows of a DataFrame are Series.\n",
"In such cases, the flexibility of Series indexes comes in handy;\n",
"the index is set to the DataFrame column names."
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" carrier | \n",
" name | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 9E | \n",
" Endeavor Air Inc. | \n",
"
\n",
" \n",
" 1 | \n",
" AA | \n",
" American Airlines Inc. | \n",
"
\n",
" \n",
" 2 | \n",
" AS | \n",
" Alaska Airlines Inc. | \n",
"
\n",
" \n",
" 3 | \n",
" B6 | \n",
" JetBlue Airways | \n",
"
\n",
" \n",
" 4 | \n",
" DL | \n",
" Delta Air Lines Inc. | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" carrier name\n",
"0 9E Endeavor Air Inc.\n",
"1 AA American Airlines Inc.\n",
"2 AS Alaska Airlines Inc.\n",
"3 B6 JetBlue Airways\n",
"4 DL Delta Air Lines Inc."
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"carrier 9E\n",
"name Endeavor Air Inc.\n",
"Name: 0, dtype: object"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Note that the index is ['carrier', 'name']\n",
"first_row = df.loc[0]\n",
"first_row"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"This is particularly handy because it means you can extract individual elements based on a column name."
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'9E'"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"first_row['carrier']"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## DataFrame Indexes"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"It's not just Series that have indexes!\n",
"DataFrames have them too.\n",
"Take a look at the carrier DataFrame again and note the bold numbers on the left."
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" carrier | \n",
" name | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 9E | \n",
" Endeavor Air Inc. | \n",
"
\n",
" \n",
" 1 | \n",
" AA | \n",
" American Airlines Inc. | \n",
"
\n",
" \n",
" 2 | \n",
" AS | \n",
" Alaska Airlines Inc. | \n",
"
\n",
" \n",
" 3 | \n",
" B6 | \n",
" JetBlue Airways | \n",
"
\n",
" \n",
" 4 | \n",
" DL | \n",
" Delta Air Lines Inc. | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" carrier name\n",
"0 9E Endeavor Air Inc.\n",
"1 AA American Airlines Inc.\n",
"2 AS Alaska Airlines Inc.\n",
"3 B6 JetBlue Airways\n",
"4 DL Delta Air Lines Inc."
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"These numbers are an index, just like the one we saw on our example Series.\n",
"And DataFrame indexes support similar functionality."
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"RangeIndex(start=0, stop=16, step=1)"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Our index is a range from 0 (inclusive) to 16 (exclusive).\n",
"df.index"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"When loading in a DataFrame, the default index will always be 0 to N-1, where N is the number of rows in your DataFrame.\n",
"This is called a `RangeIndex`."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Selecting individual rows by their index is done with the `.loc` accessor.\n",
"An *accessor* is an attribute designed specifically to help users reference something else (like rows within a DataFrame)."
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"carrier DL\n",
"name Delta Air Lines Inc.\n",
"Name: 4, dtype: object"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Get the row at index 4 (the fifth row).\n",
"df.loc[4]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"As with Series, DataFrames support reassigning their index."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"However, with DataFrames it often makes sense to change one of your columns into the index."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"This is analogous to a primary key in relational databases: a way to rapidly look up rows within a table."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"In our case, maybe we will often use the carrier code (`carrier`) to look up the full name of the airline.\n",
"In that case, it would make sense set the carrier column as our index."
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" name | \n",
"
\n",
" \n",
" carrier | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 9E | \n",
" Endeavor Air Inc. | \n",
"
\n",
" \n",
" AA | \n",
" American Airlines Inc. | \n",
"
\n",
" \n",
" AS | \n",
" Alaska Airlines Inc. | \n",
"
\n",
" \n",
" B6 | \n",
" JetBlue Airways | \n",
"
\n",
" \n",
" DL | \n",
" Delta Air Lines Inc. | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" name\n",
"carrier \n",
"9E Endeavor Air Inc.\n",
"AA American Airlines Inc.\n",
"AS Alaska Airlines Inc.\n",
"B6 JetBlue Airways\n",
"DL Delta Air Lines Inc."
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = df.set_index('carrier')\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Now the RangeIndex has been replaced with a more meaningful index, and it's possible to look up rows of the table by passing carrier code to the `.loc` accessor."
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"name United Air Lines Inc.\n",
"Name: UA, dtype: object"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc['UA']"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"\n",
"
Caution!
\n",
"
Pandas does not require that indexes have unique values (that is, no duplicates) although many relational databases do have that requirement of a primary key. This means that it is *possible* to create a non-unique index, but highly inadvisable. Having duplicate values in your index can cause unexpected results when you refer to rows by index -- but multiple rows have that index. Don't do it if you can help it!
\n",
"
"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"When starting to work with a DataFrame, it's often a good idea to determine what column makes sense as your index and to set it immediately."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"This will make your code nicer -- by letting you directly look up values with the index -- and also make your selections and filters faster, because Pandas is optimized for operations by index."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"If you want to change the index of your DataFrame later, you can always `reset_index` (and then assign a new one)."
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" name | \n",
"
\n",
" \n",
" carrier | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 9E | \n",
" Endeavor Air Inc. | \n",
"
\n",
" \n",
" AA | \n",
" American Airlines Inc. | \n",
"
\n",
" \n",
" AS | \n",
" Alaska Airlines Inc. | \n",
"
\n",
" \n",
" B6 | \n",
" JetBlue Airways | \n",
"
\n",
" \n",
" DL | \n",
" Delta Air Lines Inc. | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" name\n",
"carrier \n",
"9E Endeavor Air Inc.\n",
"AA American Airlines Inc.\n",
"AS Alaska Airlines Inc.\n",
"B6 JetBlue Airways\n",
"DL Delta Air Lines Inc."
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" carrier | \n",
" name | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 9E | \n",
" Endeavor Air Inc. | \n",
"
\n",
" \n",
" 1 | \n",
" AA | \n",
" American Airlines Inc. | \n",
"
\n",
" \n",
" 2 | \n",
" AS | \n",
" Alaska Airlines Inc. | \n",
"
\n",
" \n",
" 3 | \n",
" B6 | \n",
" JetBlue Airways | \n",
"
\n",
" \n",
" 4 | \n",
" DL | \n",
" Delta Air Lines Inc. | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" carrier name\n",
"0 9E Endeavor Air Inc.\n",
"1 AA American Airlines Inc.\n",
"2 AS Alaska Airlines Inc.\n",
"3 B6 JetBlue Airways\n",
"4 DL Delta Air Lines Inc."
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = df.reset_index()\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Your Turn\n",
"\n",
"The below cell has code to load in the first 100 rows of the airports data as `airports`.\n",
"The data contains the airport code, airport name, and some basic facts about the airport location."
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" faa | \n",
" name | \n",
" lat | \n",
" lon | \n",
" alt | \n",
" tz | \n",
" dst | \n",
" tzone | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 04G | \n",
" Lansdowne Airport | \n",
" 41.130472 | \n",
" -80.619583 | \n",
" 1044 | \n",
" -5 | \n",
" A | \n",
" America/New_York | \n",
"
\n",
" \n",
" 1 | \n",
" 06A | \n",
" Moton Field Municipal Airport | \n",
" 32.460572 | \n",
" -85.680028 | \n",
" 264 | \n",
" -6 | \n",
" A | \n",
" America/Chicago | \n",
"
\n",
" \n",
" 2 | \n",
" 06C | \n",
" Schaumburg Regional | \n",
" 41.989341 | \n",
" -88.101243 | \n",
" 801 | \n",
" -6 | \n",
" A | \n",
" America/Chicago | \n",
"
\n",
" \n",
" 3 | \n",
" 06N | \n",
" Randall Airport | \n",
" 41.431912 | \n",
" -74.391561 | \n",
" 523 | \n",
" -5 | \n",
" A | \n",
" America/New_York | \n",
"
\n",
" \n",
" 4 | \n",
" 09J | \n",
" Jekyll Island Airport | \n",
" 31.074472 | \n",
" -81.427778 | \n",
" 11 | \n",
" -5 | \n",
" A | \n",
" America/New_York | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" faa name lat lon alt tz dst \\\n",
"0 04G Lansdowne Airport 41.130472 -80.619583 1044 -5 A \n",
"1 06A Moton Field Municipal Airport 32.460572 -85.680028 264 -6 A \n",
"2 06C Schaumburg Regional 41.989341 -88.101243 801 -6 A \n",
"3 06N Randall Airport 41.431912 -74.391561 523 -5 A \n",
"4 09J Jekyll Island Airport 31.074472 -81.427778 11 -5 A \n",
"\n",
" tzone \n",
"0 America/New_York \n",
"1 America/Chicago \n",
"2 America/Chicago \n",
"3 America/New_York \n",
"4 America/New_York "
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"airports = pd.read_csv('../data/airports.csv')\n",
"airports = airports.loc[0:100]\n",
"airports.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"1. What kind of index is the current index of `airports`? \n",
"2. Is this a good choice for the DataFrame's index? If not, what column or columns would be a better candidate?\n",
"3. If you chose a different column to be the index, make it your index using `airports.set_index()`.\n",
"4. Using your new index, look up \"Pittsburgh-Monroeville Airport\", code 4G0. What is its altitude?\n",
"5. Reset your index in case you want to make a different column your index in the future."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Questions\n",
"\n",
"Are there any questions before we move on?"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
},
"rise": {
"autolaunch": true,
"transition": "none"
}
},
"nbformat": 4,
"nbformat_minor": 4
}