{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Selecting Rows\n", "\n", "- [Download the lecture notes](https://philchodrow.github.io/PIC16A/content/pd/pd_2.ipynb). \n", "\n", "In the last lecture, we saw how to extract specific columns from a data frame. In many cases, we also need to extract specific rows. This operation is often called \"filtering\" -- we are filtering out the rows that we don't want, leaving the ones that we do. " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SpeciesRegionIslandCulmen Length (mm)
0Adelie Penguin (Pygoscelis adeliae)AnversTorgersen39.1
1Adelie Penguin (Pygoscelis adeliae)AnversTorgersen39.5
2Adelie Penguin (Pygoscelis adeliae)AnversTorgersen40.3
3Adelie Penguin (Pygoscelis adeliae)AnversTorgersenNaN
4Adelie Penguin (Pygoscelis adeliae)AnversTorgersen36.7
\n", "
" ], "text/plain": [ " Species Region Island Culmen Length (mm)\n", "0 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen 39.1\n", "1 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen 39.5\n", "2 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen 40.3\n", "3 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen NaN\n", "4 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen 36.7" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# you'll need to run the first block in pd_2.ipynb \n", "# to download the data if you have not already done so\n", "\n", "penguins = pd.read_csv(\"palmer_penguins.csv\")\n", "# just the first five rows and selected columns\n", "penguins = penguins[[\"Species\", \"Region\", \"Island\", \"Culmen Length (mm)\"]]\n", "penguins.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The simplest way to select rows of data is by explicitly naming the value(s) of the index for the rows you want. Remember that the index is the set of bold numbers at the far left. To do this, you should use the `df.loc` attribute of the data frame, like this: " ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SpeciesRegionIslandCulmen Length (mm)
1Adelie Penguin (Pygoscelis adeliae)AnversTorgersen39.5
2Adelie Penguin (Pygoscelis adeliae)AnversTorgersen40.3
3Adelie Penguin (Pygoscelis adeliae)AnversTorgersenNaN
\n", "
" ], "text/plain": [ " Species Region Island Culmen Length (mm)\n", "1 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen 39.5\n", "2 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen 40.3\n", "3 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen NaN" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "penguins.loc[1:3] # rows with index values 1 through 3" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SpeciesRegionIslandCulmen Length (mm)
1Adelie Penguin (Pygoscelis adeliae)AnversTorgersen39.5
4Adelie Penguin (Pygoscelis adeliae)AnversTorgersen36.7
0Adelie Penguin (Pygoscelis adeliae)AnversTorgersen39.1
\n", "
" ], "text/plain": [ " Species Region Island Culmen Length (mm)\n", "1 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen 39.5\n", "4 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen 36.7\n", "0 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen 39.1" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# passing an explicit list can change the order of the rows. \n", "s = penguins.loc[[1, 4, 0]]\n", "s" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Species Adelie Penguin (Pygoscelis adeliae)\n", "Region Anvers\n", "Island Torgersen\n", "Culmen Length (mm) 36.7\n", "Name: 4, dtype: object" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# note that this works, even though s does not have a 4th row, \n", "# because s does have an index with value 4\n", "s.loc[4]" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "ename": "KeyError", "evalue": "2", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py\u001b[0m in \u001b[0;36mget_loc\u001b[0;34m(self, key, method, tolerance)\u001b[0m\n\u001b[1;32m 2645\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2646\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_engine\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_loc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2647\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mKeyError\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32mpandas/_libs/index.pyx\u001b[0m in \u001b[0;36mpandas._libs.index.IndexEngine.get_loc\u001b[0;34m()\u001b[0m\n", "\u001b[0;32mpandas/_libs/index.pyx\u001b[0m in \u001b[0;36mpandas._libs.index.IndexEngine.get_loc\u001b[0;34m()\u001b[0m\n", "\u001b[0;32mpandas/_libs/hashtable_class_helper.pxi\u001b[0m in \u001b[0;36mpandas._libs.hashtable.Int64HashTable.get_item\u001b[0;34m()\u001b[0m\n", "\u001b[0;32mpandas/_libs/hashtable_class_helper.pxi\u001b[0m in \u001b[0;36mpandas._libs.hashtable.Int64HashTable.get_item\u001b[0;34m()\u001b[0m\n", "\u001b[0;31mKeyError\u001b[0m: 2", "\nDuring handling of the above exception, another exception occurred:\n", "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# on the other hand, this doesn't work\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0ms\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mloc\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexing.py\u001b[0m in \u001b[0;36m__getitem__\u001b[0;34m(self, key)\u001b[0m\n\u001b[1;32m 1766\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1767\u001b[0m \u001b[0mmaybe_callable\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcom\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mapply_if_callable\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mobj\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1768\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_getitem_axis\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmaybe_callable\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0maxis\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1769\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1770\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_is_scalar_access\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkey\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mTuple\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexing.py\u001b[0m in \u001b[0;36m_getitem_axis\u001b[0;34m(self, key, axis)\u001b[0m\n\u001b[1;32m 1963\u001b[0m \u001b[0;31m# fall thru to straight lookup\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1964\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_validate_key\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1965\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_get_label\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0maxis\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1966\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1967\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexing.py\u001b[0m in \u001b[0;36m_get_label\u001b[0;34m(self, label, axis)\u001b[0m\n\u001b[1;32m 623\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mIndexingError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"no slices here, handle elsewhere\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 624\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 625\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mobj\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_xs\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlabel\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0maxis\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 626\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 627\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_get_loc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkey\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mint\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mint\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/opt/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py\u001b[0m in \u001b[0;36mxs\u001b[0;34m(self, key, axis, level, drop_level)\u001b[0m\n\u001b[1;32m 3535\u001b[0m \u001b[0mloc\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnew_index\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mindex\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_loc_level\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdrop_level\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mdrop_level\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3536\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 3537\u001b[0;31m \u001b[0mloc\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mindex\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_loc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3538\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3539\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mloc\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mndarray\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py\u001b[0m in \u001b[0;36mget_loc\u001b[0;34m(self, key, method, tolerance)\u001b[0m\n\u001b[1;32m 2646\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_engine\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_loc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2647\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mKeyError\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2648\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_engine\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_loc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_maybe_cast_indexer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2649\u001b[0m \u001b[0mindexer\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_indexer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmethod\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmethod\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtolerance\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mtolerance\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2650\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mindexer\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mndim\u001b[0m \u001b[0;34m>\u001b[0m \u001b[0;36m1\u001b[0m \u001b[0;32mor\u001b[0m \u001b[0mindexer\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msize\u001b[0m \u001b[0;34m>\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32mpandas/_libs/index.pyx\u001b[0m in \u001b[0;36mpandas._libs.index.IndexEngine.get_loc\u001b[0;34m()\u001b[0m\n", "\u001b[0;32mpandas/_libs/index.pyx\u001b[0m in \u001b[0;36mpandas._libs.index.IndexEngine.get_loc\u001b[0;34m()\u001b[0m\n", "\u001b[0;32mpandas/_libs/hashtable_class_helper.pxi\u001b[0m in \u001b[0;36mpandas._libs.hashtable.Int64HashTable.get_item\u001b[0;34m()\u001b[0m\n", "\u001b[0;32mpandas/_libs/hashtable_class_helper.pxi\u001b[0m in \u001b[0;36mpandas._libs.hashtable.Int64HashTable.get_item\u001b[0;34m()\u001b[0m\n", "\u001b[0;31mKeyError\u001b[0m: 2" ] } ], "source": [ "# on the other hand, this doesn't work\n", "s.loc[2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Boolean Indexing\n", "\n", "While it's good to know how to refer to rows by index, this is not the most useful way to filter data frames. Boolean indexing instead allows us to filter the rows of a data set based on one or more conditions. Boolean indexing in data frames is very similar to Boolean indexing in `numpy` arrays. " ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SpeciesRegionIslandCulmen Length (mm)
0Adelie Penguin (Pygoscelis adeliae)AnversTorgersen39.1
1Adelie Penguin (Pygoscelis adeliae)AnversTorgersen39.5
2Adelie Penguin (Pygoscelis adeliae)AnversTorgersen40.3
3Adelie Penguin (Pygoscelis adeliae)AnversTorgersenNaN
4Adelie Penguin (Pygoscelis adeliae)AnversTorgersen36.7
...............
339Gentoo penguin (Pygoscelis papua)AnversBiscoeNaN
340Gentoo penguin (Pygoscelis papua)AnversBiscoe46.8
341Gentoo penguin (Pygoscelis papua)AnversBiscoe50.4
342Gentoo penguin (Pygoscelis papua)AnversBiscoe45.2
343Gentoo penguin (Pygoscelis papua)AnversBiscoe49.9
\n", "

344 rows × 4 columns

\n", "
" ], "text/plain": [ " Species Region Island \\\n", "0 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen \n", "1 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen \n", "2 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen \n", "3 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen \n", "4 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen \n", ".. ... ... ... \n", "339 Gentoo penguin (Pygoscelis papua) Anvers Biscoe \n", "340 Gentoo penguin (Pygoscelis papua) Anvers Biscoe \n", "341 Gentoo penguin (Pygoscelis papua) Anvers Biscoe \n", "342 Gentoo penguin (Pygoscelis papua) Anvers Biscoe \n", "343 Gentoo penguin (Pygoscelis papua) Anvers Biscoe \n", "\n", " Culmen Length (mm) \n", "0 39.1 \n", "1 39.5 \n", "2 40.3 \n", "3 NaN \n", "4 36.7 \n", ".. ... \n", "339 NaN \n", "340 46.8 \n", "341 50.4 \n", "342 45.2 \n", "343 49.9 \n", "\n", "[344 rows x 4 columns]" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "penguins" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 True\n", "1 True\n", "2 False\n", "3 False\n", "4 True\n", " ... \n", "339 False\n", "340 False\n", "341 False\n", "342 False\n", "343 False\n", "Name: Culmen Length (mm), Length: 344, dtype: bool" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "penguins['Culmen Length (mm)'] < 40" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SpeciesRegionIslandCulmen Length (mm)
0Adelie Penguin (Pygoscelis adeliae)AnversTorgersen39.1
1Adelie Penguin (Pygoscelis adeliae)AnversTorgersen39.5
4Adelie Penguin (Pygoscelis adeliae)AnversTorgersen36.7
5Adelie Penguin (Pygoscelis adeliae)AnversTorgersen39.3
6Adelie Penguin (Pygoscelis adeliae)AnversTorgersen38.9
...............
146Adelie Penguin (Pygoscelis adeliae)AnversDream39.2
147Adelie Penguin (Pygoscelis adeliae)AnversDream36.6
148Adelie Penguin (Pygoscelis adeliae)AnversDream36.0
149Adelie Penguin (Pygoscelis adeliae)AnversDream37.8
150Adelie Penguin (Pygoscelis adeliae)AnversDream36.0
\n", "

100 rows × 4 columns

\n", "
" ], "text/plain": [ " Species Region Island \\\n", "0 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen \n", "1 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen \n", "4 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen \n", "5 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen \n", "6 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen \n", ".. ... ... ... \n", "146 Adelie Penguin (Pygoscelis adeliae) Anvers Dream \n", "147 Adelie Penguin (Pygoscelis adeliae) Anvers Dream \n", "148 Adelie Penguin (Pygoscelis adeliae) Anvers Dream \n", "149 Adelie Penguin (Pygoscelis adeliae) Anvers Dream \n", "150 Adelie Penguin (Pygoscelis adeliae) Anvers Dream \n", "\n", " Culmen Length (mm) \n", "0 39.1 \n", "1 39.5 \n", "4 36.7 \n", "5 39.3 \n", "6 38.9 \n", ".. ... \n", "146 39.2 \n", "147 36.6 \n", "148 36.0 \n", "149 37.8 \n", "150 36.0 \n", "\n", "[100 rows x 4 columns]" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "penguins[penguins['Culmen Length (mm)'] < 40]" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SpeciesRegionIslandCulmen Length (mm)
0Adelie Penguin (Pygoscelis adeliae)AnversTorgersen39.1
1Adelie Penguin (Pygoscelis adeliae)AnversTorgersen39.5
2Adelie Penguin (Pygoscelis adeliae)AnversTorgersen40.3
3Adelie Penguin (Pygoscelis adeliae)AnversTorgersenNaN
4Adelie Penguin (Pygoscelis adeliae)AnversTorgersen36.7
\n", "
" ], "text/plain": [ " Species Region Island Culmen Length (mm)\n", "0 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen 39.1\n", "1 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen 39.5\n", "2 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen 40.3\n", "3 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen NaN\n", "4 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen 36.7" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# list of penguins encountered on Torgersen island\n", "torg = penguins['Island']== \"Torgersen\"\n", "penguins[torg].head()" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SpeciesRegionIslandCulmen Length (mm)
0Adelie Penguin (Pygoscelis adeliae)AnversTorgersen39.1
1Adelie Penguin (Pygoscelis adeliae)AnversTorgersen39.5
4Adelie Penguin (Pygoscelis adeliae)AnversTorgersen36.7
5Adelie Penguin (Pygoscelis adeliae)AnversTorgersen39.3
6Adelie Penguin (Pygoscelis adeliae)AnversTorgersen38.9
\n", "
" ], "text/plain": [ " Species Region Island Culmen Length (mm)\n", "0 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen 39.1\n", "1 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen 39.5\n", "4 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen 36.7\n", "5 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen 39.3\n", "6 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen 38.9" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# penguins encountered on Torgersen with culmen no longer than 40 mm\n", "# using bitwise and operator &\n", "culm = penguins['Culmen Length (mm)'] < 40\n", "penguins[torg & culm].head()" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SpeciesRegionIslandCulmen Length (mm)
0Adelie Penguin (Pygoscelis adeliae)AnversTorgersen39.1
1Adelie Penguin (Pygoscelis adeliae)AnversTorgersen39.5
2Adelie Penguin (Pygoscelis adeliae)AnversTorgersen40.3
3Adelie Penguin (Pygoscelis adeliae)AnversTorgersenNaN
4Adelie Penguin (Pygoscelis adeliae)AnversTorgersen36.7
\n", "
" ], "text/plain": [ " Species Region Island Culmen Length (mm)\n", "0 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen 39.1\n", "1 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen 39.5\n", "2 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen 40.3\n", "3 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen NaN\n", "4 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen 36.7" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# using bitwise or instead of and\n", "penguins[torg | culm].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An especially useful example of Boolean indexing is picking out `nan` values from the data. " ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SpeciesRegionIslandCulmen Length (mm)
3Adelie Penguin (Pygoscelis adeliae)AnversTorgersenNaN
339Gentoo penguin (Pygoscelis papua)AnversBiscoeNaN
\n", "
" ], "text/plain": [ " Species Region Island \\\n", "3 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen \n", "339 Gentoo penguin (Pygoscelis papua) Anvers Biscoe \n", "\n", " Culmen Length (mm) \n", "3 NaN \n", "339 NaN " ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nas = penguins[\"Culmen Length (mm)\"].isna()\n", "penguins[nas]" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SpeciesRegionIslandCulmen Length (mm)
0Adelie Penguin (Pygoscelis adeliae)AnversTorgersen39.1
1Adelie Penguin (Pygoscelis adeliae)AnversTorgersen39.5
2Adelie Penguin (Pygoscelis adeliae)AnversTorgersen40.3
4Adelie Penguin (Pygoscelis adeliae)AnversTorgersen36.7
5Adelie Penguin (Pygoscelis adeliae)AnversTorgersen39.3
...............
338Gentoo penguin (Pygoscelis papua)AnversBiscoe47.2
340Gentoo penguin (Pygoscelis papua)AnversBiscoe46.8
341Gentoo penguin (Pygoscelis papua)AnversBiscoe50.4
342Gentoo penguin (Pygoscelis papua)AnversBiscoe45.2
343Gentoo penguin (Pygoscelis papua)AnversBiscoe49.9
\n", "

342 rows × 4 columns

\n", "
" ], "text/plain": [ " Species Region Island \\\n", "0 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen \n", "1 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen \n", "2 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen \n", "4 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen \n", "5 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen \n", ".. ... ... ... \n", "338 Gentoo penguin (Pygoscelis papua) Anvers Biscoe \n", "340 Gentoo penguin (Pygoscelis papua) Anvers Biscoe \n", "341 Gentoo penguin (Pygoscelis papua) Anvers Biscoe \n", "342 Gentoo penguin (Pygoscelis papua) Anvers Biscoe \n", "343 Gentoo penguin (Pygoscelis papua) Anvers Biscoe \n", "\n", " Culmen Length (mm) \n", "0 39.1 \n", "1 39.5 \n", "2 40.3 \n", "4 36.7 \n", "5 39.3 \n", ".. ... \n", "338 47.2 \n", "340 46.8 \n", "341 50.4 \n", "342 45.2 \n", "343 49.9 \n", "\n", "[342 rows x 4 columns]" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# invert flips the entries of a boolean array\n", "penguins[np.invert(nas)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Boolean indexing is by far the most useful form of filtering, and should usually be preferred in most practical contexts. It is especially powerful when combined with functions that operate on columns, as we'll see shortly. " ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }