{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Accessing a DataFrame\n", "There are many [different choices for indexing](https://pandas.pydata.org/pandas-docs/stable/indexing.html#different-choices-for-indexing) DataFrames.\n", "\n", "Let's explore!" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Setup\n", "import pandas as pd\n", "\n", "test_user_data = {\n", " 'first_name': ['Craig', 'Treasure', 'Ashley', 'Guil'],\n", " 'last_name': ['Dennis', 'Porth', 'Boucher', 'Hernandez'],\n", " 'balance': [42.42, 25.00, 2.02, 87.00]\n", "}\n", "test_user_names = ['craigsdennis', 'treasure', 'lindsay2000', 'guil']\n", "\n", "users = pd.DataFrame(test_user_data, index=test_user_names)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Retrieve a specific Series\n", "\n", "### By Column Name\n", "Each column in a `DataFrame` is actually a `Series`. The `DataFrame` provides access to each of these `Series` by a column name index.\n", "\n", "For instance, to get the **`balance`** `Series`, you could just use that for the index." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "craigsdennis 42.42\n", "treasure 25.00\n", "lindsay2000 2.02\n", "guil 87.00\n", "Name: balance, dtype: float64" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "balances = users['balance']\n", "balances" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the returned `Series` has a property of `name` set to the column label." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'balance'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "balances.name" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### By Label\n", "You can retrieve a row from a `DataFrame` by using the `loc` property and supply the label. \n", "\n", "Note how the returned `Series` is labelled by the existing column labels of the **`users`** `DataFrame` and the `name` property is set to the label." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "first_name Guil\n", "last_name Hernandez\n", "balance 87\n", "Name: guil, dtype: object" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "users.loc['guil']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### By Position\n", "Normal list like indices are also available to get a specific row, by using the `iloc` ( `i`ndex `loc`ation) property and the appropriate index.\n", "\n", "For instance the second row can be accessed by using the index `1`." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "first_name Treasure\n", "last_name Porth\n", "balance 25\n", "Name: treasure, dtype: object" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "users.iloc[1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Retrieve a Specific Value\n", "\n", "### By Chaining\n", "You can definitely [chain lookups together](https://pandas.pydata.org/pandas-docs/stable/indexing.html#returning-a-view-versus-a-copy), but there's a better way to do it!\n", "\n", "Note that this is really running multiple separate lookups. You don't want to force this extra work to happen. However, I do want you to see that you can (and you will see this!)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Craig'" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# CAREFUL: This first retrieves the column `Series`\n", "# and then uses the label.\n", "users['first_name']['craigsdennis']" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Craig'" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# CAREFUL: This retrieves the row `Series` \n", "# and then does a lookup for first_name\n", "users.loc['craigsdennis']['first_name']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### By Row and Column\n", "The DataFrame allows access in a single operation to a specific value by specifying a row and a column. There are a few options here too.\n", "\n", "#### Using [`DataFrame.loc`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.loc.html)\n", "\n", "\n", "\n", "The `loc` property on `DataFrame` allows you to work on both axes, rows and columns, by indexing with a tuple (remember a comma creates a tuple). The first element of the tuple is the row access, second is the column.\n", "\n", "Using a single label and single column name will retrieve that specific scalar value." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Craig'" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "users.loc['craigsdennis', 'first_name']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Using [`DataFrame.at`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.at.html#pandas.DataFrame.at)\n", "\n", "To be even more explicit (💪🏾) that you are trying to just get at a single scalar value, you should use the `DataFrame.at` property." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Craig'" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "users.at['craigsdennis', 'first_name']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Retrieve a Specific DataFrame Through Slicing\n", "\n", "Using the `loc` and `iloc` properties you can slice an existing `DataFrame` into a new one.\n", "\n", "In the example below we use `:` in the rows axis to select all rows, and we specify which columns we want back using a list in the columns axis, ala NumPy Fancy Indexing." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
balancelast_name
craigsdennis42.42Dennis
treasure25.00Porth
lindsay20002.02Boucher
guil87.00Hernandez
\n", "
" ], "text/plain": [ " balance last_name\n", "craigsdennis 42.42 Dennis\n", "treasure 25.00 Porth\n", "lindsay2000 2.02 Boucher\n", "guil 87.00 Hernandez" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# All rows and the following ordered columns as a list (fancy, eh?)\n", "users.loc[:, ['balance', 'last_name']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When using a slice with `loc` the results are inclusive, meaning they include the right side." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
first_namelast_namebalance
treasureTreasurePorth25.00
lindsay2000AshleyBoucher2.02
\n", "
" ], "text/plain": [ " first_name last_name balance\n", "treasure Treasure Porth 25.00\n", "lindsay2000 Ashley Boucher 2.02" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# All rows starting with 'treasure' up to and including 'lindsay2000'\n", "# and all the columns\n", "users.loc['treasure':'lindsay2000', :]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When using a slice with `iloc` the results are exclusive, just like standard Python list slices " ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
last_namebalance
treasurePorth25.0
\n", "
" ], "text/plain": [ " last_name balance\n", "treasure Porth 25.0" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# All rows starting with the 2nd up until but not including the 3rd\n", "# and all columns starting at the second until the end\n", "users.iloc[1:2, 1:]" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0" } }, "nbformat": 4, "nbformat_minor": 2 }