{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Accessing a DataFrame\n",
"There are many [different choices for indexing](https://pandas.pydata.org/pandas-docs/stable/indexing.html#different-choices-for-indexing) DataFrames available.\n",
"\n",
"Let's explore!"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# Setup\n",
"import pandas as pd\n",
"\n",
"test_user_data = {\n",
" 'first_name': ['Craig', 'Treasure', 'Ashley', 'Guil'],\n",
" 'last_name': ['Dennis', 'Porth', 'Boucher', 'Hernandez'],\n",
" 'balance': [42.42, 25.00, 2.02, 87.00]\n",
"}\n",
"test_user_names = ['craigsdennis', 'treasure', 'lindsay2000', 'guil']\n",
"\n",
"users = pd.DataFrame(test_user_data, index=test_user_names)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Retrieve a specific Series\n",
"\n",
"### By Column Name\n",
"Each column is actually a `Series`. The `DataFrame` provides access to each of these `Series` by a column name index.\n",
"\n",
"For instance, to get the **`balance`** `Series`, you could just use that for the index."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"craigsdennis 42.42\n",
"treasure 25.00\n",
"lindsay2000 2.02\n",
"guil 87.00\n",
"Name: balance, dtype: float64"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"balances = users['balance']\n",
"balances"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that the returned `Series` has a property of `name` set to the column label."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'balance'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"balances.name"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### By Label\n",
"You can retrieve a row from a `DataFrame` by using the `loc` property and supply the label. \n",
"\n",
"Note how the returned `Series` is labelled by the existing column labels of the **`users`** `DataFrame` and the `name` property is set to the label."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"first_name Guil\n",
"last_name Hernandez\n",
"balance 87\n",
"Name: guil, dtype: object"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"users.loc['guil']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### By Position\n",
"Normal list like indices are also available to get a specific row, by using the `iloc` ( `i`ndex `loc`ation) property and the appropriate index.\n",
"\n",
"For instance the second row can be accessed by using the index `1`."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"first_name Treasure\n",
"last_name Porth\n",
"balance 25\n",
"Name: treasure, dtype: object"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"users.iloc[1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Retrieve a Specific Value\n",
"\n",
"### By Chaining\n",
"You can definitely [chain lookups together](https://pandas.pydata.org/pandas-docs/stable/indexing.html#returning-a-view-versus-a-copy), but there's a better way to do it!\n",
"\n",
"Note that this is really running multiple separate lookups. You don't want to force this extra work to happen. However, I do want you to see that you can (and you will see this!)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Craig'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# CAREFUL: This first retrieves the column `Series`\n",
"# and then uses the label.\n",
"users['first_name']['craigsdennis']"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Craig'"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# CAREFUL: This retrieves the row `Series` \n",
"# and then does a lookup for first_name\n",
"users.loc['craigsdennis']['first_name']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### By Row and Column\n",
"The DataFrame allows access in a single operation to a specific value by specifying a row and a column. There are a few options here too.\n",
"\n",
"#### Using [`DataFrame.loc`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.loc.html)\n",
"\n",
"\n",
"\n",
"The `loc` property on `DataFrame` allows you to work on both axes, rows and columns, by indexing with a tuple (remember a comma creates a tuple). The first element of the tuple is the row access, second is the column.\n",
"\n",
"Using a single label and single column name will retrieve that specific scalar value."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Craig'"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"users.loc['craigsdennis', 'first_name']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Using [`DataFrame.at`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.at.html#pandas.DataFrame.at)\n",
"\n",
"To be even more explicit (💪🏾) that you are trying to just get at a single scalar value, you should use the `DataFrame.at` property."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Craig'"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"users.at['craigsdennis', 'first_name']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Retrieve a Specific DataFrame Through Slicing\n",
"\n",
"Using the `loc` and `iloc` properties you can slice an existing `DataFrame` into a new one."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" balance | \n",
" last_name | \n",
"
\n",
" \n",
" \n",
" \n",
" craigsdennis | \n",
" 42.42 | \n",
" Dennis | \n",
"
\n",
" \n",
" treasure | \n",
" 25.00 | \n",
" Porth | \n",
"
\n",
" \n",
" lindsay2000 | \n",
" 2.02 | \n",
" Boucher | \n",
"
\n",
" \n",
" guil | \n",
" 87.00 | \n",
" Hernandez | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" balance last_name\n",
"craigsdennis 42.42 Dennis\n",
"treasure 25.00 Porth\n",
"lindsay2000 2.02 Boucher\n",
"guil 87.00 Hernandez"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# All rows and the following ordered columns as a list (fancy, eh?)\n",
"users.loc[:, ['balance', 'last_name']]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When using a slice with `loc` the results are inclusive, meaning they include the right side."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" first_name | \n",
" last_name | \n",
" balance | \n",
"
\n",
" \n",
" \n",
" \n",
" treasure | \n",
" Treasure | \n",
" Porth | \n",
" 25.00 | \n",
"
\n",
" \n",
" lindsay2000 | \n",
" Ashley | \n",
" Boucher | \n",
" 2.02 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" first_name last_name balance\n",
"treasure Treasure Porth 25.00\n",
"lindsay2000 Ashley Boucher 2.02"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# All rows starting with 'treasure' up to and including 'lindsay2000'\n",
"# and all the columns\n",
"users.loc['treasure':'lindsay2000', :]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When using a slice with `iloc` the results are exclusive, just like standard Python list slices "
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" last_name | \n",
" balance | \n",
"
\n",
" \n",
" \n",
" \n",
" treasure | \n",
" Porth | \n",
" 25.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" last_name balance\n",
"treasure Porth 25.0"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# All rows starting with the 2nd up until but not including the 3rd\n",
"# and all columns starting at the second until the end\n",
"users.iloc[1:2, 1:]"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}