{ "cells": [ { "cell_type": "code", "execution_count": null, "id": "c8e7b835", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "# Import the libraries used throughout this section\n", "import os\n", "import sys\n", "import numpy as np\n", "import pandas as pd\n", "\n", "# Install the extra dependencies this notebook relies on\n", "!{sys.executable} -m pip install --quiet jupyterlab_myst ipython" ] }, { "cell_type": "markdown", "id": "76e49f3c", "metadata": { "tags": [ "remove-cell" ] }, "source": [ "---\n", "license:\n", " code: MIT\n", " content: CC-BY-4.0\n", "github: https://github.com/ocademy-ai/machine-learning\n", "venue: By Ocademy\n", "open_access: true\n", "bibliography:\n", " - https://raw.githubusercontent.com/ocademy-ai/machine-learning/main/open-machine-learning-jupyter-book/references.bib\n", "---" ] }, { "cell_type": "markdown", "id": "105bf8eb", "metadata": {}, "source": [ "\n", "# Introduction and Data Structures\n", " \n", "Pandas is a fast, powerful, flexible and easy-to-use open-source data analysis and manipulation tool, built on top of the Python programming language.\n", "\n", "## Introducing Pandas objects\n", "\n", "Over the next three sections, we’ll give a quick, non-comprehensive overview of the fundamental data structures in Pandas to get you started. The fundamental behavior regarding data types, indexing, axis labeling, and alignment applies across all of the objects." ] }, { "cell_type": "markdown", "id": "2818782f-4106-4c67-9491-5569ffdaaf19", "metadata": {}, "source": [ "## Overview\n", "\n", "In this section, we'll introduce the two core data structures of pandas, `Series` and `DataFrame`, and the basic concepts of data indexing and selection." ] }, { "cell_type": "markdown", "id": "bb9af208", "metadata": {}, "source": [ "### Series\n", "\n", "`Series` is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the **index**. 
The basic method to create a `Series` is to call:" ] }, { "cell_type": "markdown", "id": "2d6b11bf-93b8-439d-83ed-9ffae399bb1f", "metadata": { "attributes": { "classes": [ "py" ], "id": "" } }, "source": [ "`s = pd.Series(data, index=index)`" ] }, { "cell_type": "markdown", "id": "475acfee", "metadata": {}, "source": [ "Here, `data` can be many different things:\n", "\n", "- a Python dict\n", "- an ndarray\n", "- a scalar value (like 5)\n", "\n", "\n", "The passed **index** is a list of axis labels. The construction therefore separates into a few cases, depending on what `data` is:\n", "\n", "#### Create a Series\n", "\n", "##### From ndarray\n", "\n", "If `data` is an ndarray, **index** must be the same length as the **data**. If no index is passed, one will be created having values `[0, ..., len(data) - 1]`." ] }, { "cell_type": "code", "execution_count": 3, "id": "646c8580", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "s = pd.Series(np.random.randn(5), index=[\"a\", \"b\", \"c\", \"d\", \"e\"])" ] }, { "cell_type": "code", "execution_count": 4, "id": "2d2455c1", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "a -1.093235\n", "b -0.875870\n", "c 0.548668\n", "d -0.396314\n", "e -0.462231\n", "dtype: float64" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s" ] }, { "cell_type": "code", "execution_count": 5, "id": "20f33329", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "Index(['a', 'b', 'c', 'd', 'e'], dtype='object')" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.index" ] }, { "cell_type": "code", "execution_count": 6, "id": "5376f720", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "0 0.820731\n", "1 -1.040583\n", "2 
-1.494295\n", "3 0.214854\n", "4 0.969364\n", "dtype: float64" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.Series(np.random.randn(5))" ] }, { "cell_type": "markdown", "id": "2f4e73c7", "metadata": {}, "source": [ ":::{note}\n", "Pandas supports non-unique index values. If an operation that does not support duplicate index values is attempted, an exception will be raised at that time.\n", ":::\n", "\n", "##### From dict\n", "`Series` can be instantiated from dicts:" ] }, { "cell_type": "code", "execution_count": 7, "id": "e8095575", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "d = {\"b\": 1, \"a\": 0, \"c\": 2}" ] }, { "cell_type": "code", "execution_count": 8, "id": "ba462934", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "b 1\n", "a 0\n", "c 2\n", "dtype: int64" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.Series(d)" ] }, { "cell_type": "markdown", "id": "c4329868", "metadata": {}, "source": [ "If an index is passed, the values in data corresponding to the labels in the index will be pulled out." 
] }, { "cell_type": "code", "execution_count": 9, "id": "03488418", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "d = {\"a\": 0.0, \"b\": 1.0, \"c\": 2.0}" ] }, { "cell_type": "code", "execution_count": 10, "id": "c35e968c", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "a 0.0\n", "b 1.0\n", "c 2.0\n", "dtype: float64" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.Series(d)" ] }, { "cell_type": "code", "execution_count": 11, "id": "95eafc4d", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "b 1.0\n", "c 2.0\n", "d NaN\n", "a 0.0\n", "dtype: float64" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.Series(d, index=[\"b\", \"c\", \"d\", \"a\"])" ] }, { "cell_type": "markdown", "id": "1be5c72d", "metadata": {}, "source": [ ":::{note}\n", "NaN (not a number) is the standard missing data marker used in Pandas.\n", ":::\n", "\n", "##### From scalar value\n", "\n", "If `data` is a scalar value, pass an index: the value will be repeated to match the length of **index**. Without an index, the result is a single-element `Series`." ] }, { "cell_type": "code", "execution_count": 12, "id": "6f744115", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "a 5.0\n", "b 5.0\n", "c 5.0\n", "d 5.0\n", "e 5.0\n", "dtype: float64" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.Series(5.0, index=[\"a\", \"b\", \"c\", \"d\", \"e\"])" ] }, { "cell_type": "markdown", "id": "8060fb92", "metadata": {}, "source": [ "#### Series is ndarray-like\n", "\n", "`Series` acts very similarly to an `ndarray` and is a valid argument to most NumPy functions. However, operations such as slicing will also slice the index. For position-based access, use `Series.iloc`: in recent pandas versions, treating a single integer or a list of integers in plain `[]` as positions on a labeled `Series` is deprecated." 
] }, { "cell_type": "code", "execution_count": 13, "id": "2ca453e9", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "-1.0932348256866344" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.iloc[0]" ] }, { "cell_type": "code", "execution_count": 14, "id": "4cf8e176", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "a -1.093235\n", "b -0.875870\n", "c 0.548668\n", "dtype: float64" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.iloc[:3]" ] }, { "cell_type": "code", "execution_count": 15, "id": "1bab7730", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "c 0.548668\n", "d -0.396314\n", "dtype: float64" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[s > s.median()]" ] }, { "cell_type": "code", "execution_count": 16, "id": "b5e98d89", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "e -0.462231\n", "d -0.396314\n", "b -0.875870\n", "dtype: float64" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.iloc[[4, 3, 1]]" ] }, { "cell_type": "code", "execution_count": 17, "id": "c98a7190", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "a 0.335131\n", "b 0.416500\n", "c 1.730946\n", "d 0.672796\n", "e 0.629877\n", "dtype: float64" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.exp(s)" ] }, { "cell_type": "markdown", "id": "a49ee902", "metadata": {}, "source": [ "Like a NumPy array, a Pandas Series has a single `dtype`." 
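, "\n", "\n", "Because a `Series` has a single `dtype`, pandas typically picks one type that can represent every value. As a minimal sketch with invented values: mixing integers and floats upcasts to `float64`, while mixing numbers and strings falls back to the generic `object` dtype." ] }, { "cell_type": "code", "execution_count": null, "id": "sketch-single-dtype", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "import pandas as pd\n", "\n", "# Invented values: integers and floats share one float64 dtype\n", "mixed_numbers = pd.Series([1, 2.5, 3])\n", "\n", "# Numbers and strings together fall back to the object dtype\n", "mixed_types = pd.Series([1, \"two\", 3.0])\n", "\n", "mixed_numbers.dtype, mixed_types.dtype"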
] }, { "cell_type": "code", "execution_count": 18, "id": "b0298996", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "dtype('float64')" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.dtype" ] }, { "cell_type": "markdown", "id": "69857db8", "metadata": {}, "source": [ "If you need the actual array backing a `Series`, use `Series.array`." ] }, { "cell_type": "code", "execution_count": 19, "id": "1989c3a9", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "\n", "[ -1.0932348256866344, -0.8758697962853178, 0.5486679425929234,\n", " -0.39631364702254346, -0.46223111162737424]\n", "Length: 5, dtype: float64" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.array" ] }, { "cell_type": "markdown", "id": "7ed219b0", "metadata": {}, "source": [ "While `Series` is ndarray-like, if you need an actual ndarray, then use `Series.to_numpy()`." 
] }, { "cell_type": "code", "execution_count": 20, "id": "1cc04172", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "array([-1.09323483, -0.8758698 , 0.54866794, -0.39631365, -0.46223111])" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.to_numpy()" ] }, { "cell_type": "markdown", "id": "12f01f86", "metadata": {}, "source": [ "Even if the `Series` is backed by an `ExtensionArray`, `Series.to_numpy()` will return a NumPy ndarray.\n", "\n", "#### Series is dict-like\n", "\n", "A `Series` is also like a fixed-size dict in that you can get and set values by index label:" ] }, { "cell_type": "code", "execution_count": 21, "id": "bcfe90c9", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "-1.0932348256866344" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[\"a\"]" ] }, { "cell_type": "code", "execution_count": 22, "id": "00c68766", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "s[\"e\"] = 12.0" ] }, { "cell_type": "code", "execution_count": 23, "id": "74f58473", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "a -1.093235\n", "b -0.875870\n", "c 0.548668\n", "d -0.396314\n", "e 12.000000\n", "dtype: float64" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s" ] }, { "cell_type": "code", "execution_count": 24, "id": "2f822110", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"e\" in s" ] }, { "cell_type": "code", "execution_count": 25, "id": "164dcf61", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, 
"outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"f\" in s" ] }, { "cell_type": "markdown", "id": "ca979c84", "metadata": {}, "source": [ "If a label is not contained in the index, an exception is raised:" ] }, { "cell_type": "code", "execution_count": null, "id": "40a23c62-9c88-4a6e-9316-60317abe7859", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" }, "tags": [ "raises-exception" ] }, "outputs": [], "source": [ "s[\"f\"]" ] }, { "cell_type": "markdown", "id": "396df6e2", "metadata": {}, "source": [ "With the `Series.get()` method, a missing label returns `None` or a specified default instead of raising:" ] }, { "cell_type": "code", "execution_count": 27, "id": "ad2a67c6", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "s.get(\"f\")" ] }, { "cell_type": "code", "execution_count": 28, "id": "13c1c13b", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "nan" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.get(\"f\", np.nan)" ] }, { "cell_type": "markdown", "id": "1b19c44c", "metadata": {}, "source": [ "Labels that are valid Python identifiers can also be accessed as attributes. For example, `s.a` is equivalent to `s[\"a\"]`, provided the label does not clash with an existing `Series` method or attribute name.\n", "\n", "#### Vectorized operations and label alignment with Series\n", "\n", "When working with raw NumPy arrays, looping through value-by-value is usually not necessary. The same is true when working with `Series` in Pandas. `Series` can also be passed into most NumPy methods expecting an ndarray." 
] }, { "cell_type": "code", "execution_count": 29, "id": "35540134", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "a -2.186470\n", "b -1.751740\n", "c 1.097336\n", "d -0.792627\n", "e 24.000000\n", "dtype: float64" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s + s" ] }, { "cell_type": "code", "execution_count": 30, "id": "aea7c1dc", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "a -2.186470\n", "b -1.751740\n", "c 1.097336\n", "d -0.792627\n", "e 24.000000\n", "dtype: float64" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s * 2" ] }, { "cell_type": "code", "execution_count": 31, "id": "4dcdc8c4", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "a 0.335131\n", "b 0.416500\n", "c 1.730946\n", "d 0.672796\n", "e 162754.791419\n", "dtype: float64" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.exp(s)" ] }, { "cell_type": "markdown", "id": "f8ed10f3", "metadata": {}, "source": [ "A key difference between `Series` and ndarray is that operations between `Series` automatically align the data based on the label. Thus, you can write computations without giving consideration to whether the `Series` involved have the same labels." 
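, "\n", "\n", "As a minimal sketch with invented values: when two `Series` carry the same labels in a different order, their values are paired by label rather than by position." ] }, { "cell_type": "code", "execution_count": null, "id": "sketch-label-alignment", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "import pandas as pd\n", "\n", "# Same labels, different order (illustrative data)\n", "left = pd.Series([1, 2], index=[\"a\", \"b\"])\n", "right = pd.Series([10, 20], index=[\"b\", \"a\"])\n", "\n", "# Paired by label: a -> 1 + 20 = 21, b -> 2 + 10 = 12\n", "total = left + right\n", "total"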
] }, { "cell_type": "code", "execution_count": 32, "id": "563555a0", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "a NaN\n", "b -1.751740\n", "c 1.097336\n", "d -0.792627\n", "e NaN\n", "dtype: float64" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[1:] + s[:-1]" ] }, { "cell_type": "markdown", "id": "7e19643a", "metadata": {}, "source": [ "The result of an operation between unaligned `Series` will have the **union** of the indexes involved. If a label is not found in one `Series` or the other, the result will be marked as missing (`NaN`). Being able to write code without doing any explicit data alignment grants immense freedom and flexibility in interactive data analysis and research. The integrated data alignment features of the Pandas data structures set Pandas apart from the majority of related tools for working with labeled data.\n", "\n", ":::{note}\n", "By default, operations between differently indexed objects yield the **union** of the indexes, in order to avoid loss of information. An index label is typically important information in a computation, even when its data is missing. 
You can still drop labels with missing data using the `dropna` method.\n", ":::\n", "\n", "#### Name attribute\n", "\n", "`Series` also has a `name` attribute:" ] }, { "cell_type": "code", "execution_count": null, "id": "3b39834b", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "s = pd.Series(np.random.randn(5), name=\"something\")" ] }, { "cell_type": "code", "execution_count": null, "id": "18210d7f", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "0 -0.085125\n", "1 0.184918\n", "2 0.789339\n", "3 1.517379\n", "4 -0.136928\n", "Name: something, dtype: float64" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s" ] }, { "cell_type": "code", "execution_count": null, "id": "06f09ce2", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "'something'" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.name" ] }, { "cell_type": "markdown", "id": "b35b499b", "metadata": {}, "source": [ "The `Series` `name` is assigned automatically in many cases; in particular, when you select a single column from a `DataFrame`, the resulting `Series` is named after the column label.\n", "\n", "You can rename a `Series` with the `pandas.Series.rename()` method." 
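, "\n", "\n", "Before the renaming example, here is a minimal sketch of the automatic naming described above (the table and its values are invented for illustration): selecting one column of a `DataFrame` returns a `Series` named after that column." ] }, { "cell_type": "code", "execution_count": null, "id": "sketch-column-name", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "import pandas as pd\n", "\n", "# Hypothetical two-column table\n", "prices = pd.DataFrame({\"item\": [\"pen\", \"ink\"], \"price\": [1.5, 4.0]})\n", "\n", "# The selected column keeps its label as the Series name\n", "price_col = prices[\"price\"]\n", "price_col.name"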
] }, { "cell_type": "code", "execution_count": null, "id": "bd079c61", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "s2 = s.rename(\"different\")" ] }, { "cell_type": "code", "execution_count": null, "id": "a1767258", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "'different'" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s2.name" ] }, { "cell_type": "markdown", "id": "398a679d", "metadata": {}, "source": [ "Note that `s` and `s2` refer to different objects.\n", "\n", "### DataFrame\n", "\n", "`DataFrame` is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it as a spreadsheet or SQL table, or as a `dict` of `Series` objects. It is generally the most commonly used Pandas object. Like `Series`, `DataFrame` accepts many different kinds of input:\n", "\n", "- Dict of 1-D ndarrays, lists, dicts, or `Series`\n", "- 2-D `numpy.ndarray`\n", "- Structured or record ndarray\n", "- A `Series`\n", "- Another `DataFrame`\n", "\n", "Along with the data, you can optionally pass **index** (row labels) and **columns** (column labels) arguments. If you pass an index and/or columns, you are guaranteeing the index and/or columns of the resulting `DataFrame`. Thus, a `dict` of Series plus a specific index will discard all data whose labels do not match the passed index.\n", "\n", "If axis labels are not passed, they will be constructed from the input data based on common-sense rules.\n", "\n", "#### Create a DataFrame\n", "\n", "##### From dict of `Series` or dicts\n", "\n", "The resulting **index** will be the **union** of the indexes of the various Series. If there are any nested dicts, these will first be converted to Series. If no columns are passed, the columns will be the ordered list of `dict` keys." 
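, "\n", "\n", "A minimal sketch of the nested-dict case, with invented values: each inner dict becomes one column, and the row index is the union of the inner keys, with `NaN` wherever a column has no value for a label." ] }, { "cell_type": "code", "execution_count": null, "id": "sketch-nested-dicts", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "import pandas as pd\n", "\n", "# Each inner dict is converted to a Series, one column per outer key\n", "nested = {\"one\": {\"a\": 1.0, \"b\": 2.0}, \"two\": {\"b\": 3.0, \"c\": 4.0}}\n", "\n", "# Row labels are the union of the inner keys (a, b, c)\n", "nested_df = pd.DataFrame(nested)\n", "nested_df"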
] }, { "cell_type": "code", "execution_count": null, "id": "aa7ddc8a", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "d = {\n", " \"one\": pd.Series([1.0, 2.0, 3.0], index=[\"a\", \"b\", \"c\"]),\n", " \"two\": pd.Series([1.0, 2.0, 3.0, 4.0], index=[\"a\", \"b\", \"c\", \"d\"]),\n", "}" ] }, { "cell_type": "code", "execution_count": null, "id": "f526badc", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "df = pd.DataFrame(d)" ] }, { "cell_type": "code", "execution_count": null, "id": "69ddc66c", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " one two\n", "a 1.0 1.0\n", "b 2.0 2.0\n", "c 3.0 3.0\n", "d NaN 4.0" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": null, "id": "1f5e8ccb", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " one two\n", "d NaN 4.0\n", "b 2.0 2.0\n", "a 1.0 1.0" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(d, index=[\"d\", \"b\", \"a\"])" ] }, { "cell_type": "code", "execution_count": null, "id": "9940fb65", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " two three\n", "d 4.0 NaN\n", "b 2.0 NaN\n", "a 1.0 NaN" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(d, index=[\"d\", \"b\", \"a\"], columns=[\"two\", \"three\"])" ] }, { "cell_type": "markdown", "id": "93b5a50c", "metadata": {}, "source": [ ":::{note}\n", "When a particular set of columns is passed along with a dict of data, the passed columns override the keys in the dict: keys that are not listed are dropped, and listed columns without a matching key are filled with `NaN`.\n", ":::\n", "\n", "The row and column labels can be accessed through the **index** and **columns** attributes, respectively:" ] }, { "cell_type": "code", "execution_count": null, "id": "8a3ba6ae", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "Index(['a', 'b', 'c', 'd'], dtype='object')" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.index" ] }, { "cell_type": "code", "execution_count": null, "id": "13684125", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "Index(['one', 'two'], dtype='object')" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.columns" ] }, { "cell_type": "markdown", "id": "49c8bc9a", "metadata": {}, "source": [ "##### From dict of ndarrays / lists\n", "\n", "The ndarrays must all be the same length. If an index is passed, it must also be the same length as the arrays. If no index is passed, the index will be `range(n)`, where `n` is the array length." 
] }, { "cell_type": "code", "execution_count": null, "id": "c4789555", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "d = {\"one\": [1.0, 2.0, 3.0, 4.0], \"two\": [4.0, 3.0, 2.0, 1.0]}" ] }, { "cell_type": "code", "execution_count": null, "id": "29098be0", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " one two\n", "0 1.0 4.0\n", "1 2.0 3.0\n", "2 3.0 2.0\n", "3 4.0 1.0" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(d)" ] }, { "cell_type": "code", "execution_count": null, "id": "5600834a", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " one two\n", "a 1.0 4.0\n", "b 2.0 3.0\n", "c 3.0 2.0\n", "d 4.0 1.0" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(d, index=[\"a\", \"b\", \"c\", \"d\"])" ] }, { "cell_type": "markdown", "id": "506868de", "metadata": {}, "source": [ "##### From structured or record array\n", "\n", "This case is handled identically to a dict of arrays." ] }, { "cell_type": "code", "execution_count": null, "id": "0b3b5090", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "data = np.zeros((2,), dtype=[(\"A\", \"i4\"), (\"B\", \"f4\"), (\"C\", \"a10\")])" ] }, { "cell_type": "code", "execution_count": null, "id": "543153a7", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "data[:] = [(1, 2.0, \"Hello\"), (2, 3.0, \"World\")]" ] }, { "cell_type": "code", "execution_count": null, "id": "c5278e68", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B C\n", "0 1 2.0 b'Hello'\n", "1 2 3.0 b'World'" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(data)" ] }, { "cell_type": "code", "execution_count": null, "id": "fefbfc51", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B C\n", "first 1 2.0 b'Hello'\n", "second 2 3.0 b'World'" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(data, index=[\"first\", \"second\"])" ] }, { "cell_type": "code", "execution_count": null, "id": "f76d517a", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " C A B\n", "0 b'Hello' 1 2.0\n", "1 b'World' 2 3.0" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(data, columns=[\"C\", \"A\", \"B\"])" ] }, { "cell_type": "markdown", "id": "75f7c017", "metadata": {}, "source": [ ":::{note}\n", "DataFrame is not intended to work exactly like a 2-dimensional NumPy ndarray.\n", ":::\n", "\n", "\n", "##### From a list of dicts" ] }, { "cell_type": "code", "execution_count": null, "id": "a2aa6cb3", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "data2 = [{\"a\": 1, \"b\": 2}, {\"a\": 5, \"b\": 10, \"c\": 20}]" ] }, { "cell_type": "code", "execution_count": null, "id": "1e45ffbc", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " a b c\n", "0 1 2 NaN\n", "1 5 10 20.0" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(data2)" ] }, { "cell_type": "code", "execution_count": null, "id": "8d6db924", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " a b c\n", "first 1 2 NaN\n", "second 5 10 20.0" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(data2, index=[\"first\", \"second\"])" ] }, { "cell_type": "code", "execution_count": null, "id": "258fa418", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " a b\n", "0 1 2\n", "1 5 10" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(data2, columns=[\"a\", \"b\"])" ] }, { "cell_type": "markdown", "id": "dfb77761", "metadata": {}, "source": [ "##### From a dict of tuples\n", "\n", "You can automatically create a MultiIndexed frame by passing a dictionary with tuple keys: the outer tuple keys become `MultiIndex` column labels, and the tuple keys of the inner dicts become `MultiIndex` row labels." ] }, { "cell_type": "code", "execution_count": null, "id": "89af5166", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " a b \n", " b a c a b\n", "A B 1.0 4.0 5.0 8.0 10.0\n", " C 2.0 3.0 6.0 7.0 NaN\n", " D NaN NaN NaN NaN 9.0" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(\n", " {\n", " (\"a\", \"b\"): {(\"A\", \"B\"): 1, (\"A\", \"C\"): 2},\n", " (\"a\", \"a\"): {(\"A\", \"C\"): 3, (\"A\", \"B\"): 4},\n", " (\"a\", \"c\"): {(\"A\", \"B\"): 5, (\"A\", \"C\"): 6},\n", " (\"b\", \"a\"): {(\"A\", \"C\"): 7, (\"A\", \"B\"): 8},\n", " (\"b\", \"b\"): {(\"A\", \"D\"): 9, (\"A\", \"B\"): 10},\n", " }\n", ")" ] }, { "cell_type": "markdown", "id": "e02d86d6", "metadata": {}, "source": [ "##### From a Series\n", "\n", "The result will be a DataFrame with the same index as the input Series, and with one column whose name is the original name of the Series (only if no other column name is provided)." ] }, { "cell_type": "code", "execution_count": null, "id": "77ff8552", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "ser = pd.Series(range(3), index=list(\"abc\"), name=\"ser\")" ] }, { "cell_type": "code", "execution_count": null, "id": "a86d1926", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " ser\n", "a 0\n", "b 1\n", "c 2" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(ser)" ] }, { "cell_type": "markdown", "id": "2f824850", "metadata": {}, "source": [ "##### From a list of namedtuples\n", "\n", "The field names of the first `namedtuple` in the list determine the columns of the `DataFrame`. The remaining namedtuples (or plain tuples) are unpacked and their values are fed into the rows of the `DataFrame`. If any of those tuples is shorter than the first `namedtuple`, the later columns in the corresponding row are marked as missing values. If any is longer than the first `namedtuple`, a `ValueError` is raised." ] }, { "cell_type": "code", "execution_count": null, "id": "67fd765e", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "from collections import namedtuple" ] }, { "cell_type": "code", "execution_count": null, "id": "d4524af3", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "Point = namedtuple(\"Point\", \"x y\")" ] }, { "cell_type": "code", "execution_count": null, "id": "02f0937c", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " x y\n", "0 0 0\n", "1 0 3\n", "2 2 3" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame([Point(0, 0), Point(0, 3), (2, 3)])" ] }, { "cell_type": "code", "execution_count": null, "id": "4c81da05", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "Point3D = namedtuple(\"Point3D\", \"x y z\")" ] }, { "cell_type": "code", "execution_count": null, "id": "6731aad6", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " x y z\n", "0 0 0 0.0\n", "1 0 3 5.0\n", "2 2 3 NaN" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame([Point3D(0, 0, 0), Point3D(0, 3, 5), Point(2, 3)])" ] }, { "cell_type": "markdown", "id": "8ff0bca2", "metadata": {}, "source": [ "##### From a list of dataclasses\n", "\n", "Data classes, as introduced in PEP 557, can be passed into the DataFrame constructor. Passing a list of dataclasses is equivalent to passing a list of dictionaries.\n", "\n", "Be aware that all values in the list should be dataclasses; mixing types in the list would result in a `TypeError`." ] }, { "cell_type": "code", "execution_count": null, "id": "5fe92237", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "from dataclasses import make_dataclass" ] }, { "cell_type": "code", "execution_count": null, "id": "e13b27cf", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])" ] }, { "cell_type": "code", "execution_count": null, "id": "df6b2816", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " x y\n", "0 0 0\n", "1 0 3\n", "2 2 3" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])" ] }, { "cell_type": "markdown", "id": "8e826768", "metadata": {}, "source": [ "#### Column selection, addition, deletion\n", "\n", "You can treat a `DataFrame` semantically like a dict of like-indexed `Series` objects. Getting, setting, and deleting columns works with the same syntax as the analogous dict operations:" ] }, { "cell_type": "code", "execution_count": null, "id": "a52d0734", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " one two\n", "a 1.0 1.0\n", "b 2.0 2.0\n", "c 3.0 3.0\n", "d NaN 4.0" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": null, "id": "804405d6", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "a 1.0\n", "b 2.0\n", "c 3.0\n", "d NaN\n", "Name: one, dtype: float64" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[\"one\"]" ] }, { "cell_type": "code", "execution_count": null, "id": "dfa00c9b", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "df[\"three\"] = df[\"one\"] * df[\"two\"]" ] }, { "cell_type": "code", "execution_count": null, "id": "0f98ffa9", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "df[\"flag\"] = df[\"one\"] > 2" ] }, { "cell_type": "code", "execution_count": null, "id": "1ef5e1a3", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " one two three flag\n", "a 1.0 1.0 1.0 False\n", "b 2.0 2.0 4.0 False\n", "c 3.0 3.0 9.0 True\n", "d NaN 4.0 NaN False" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "markdown", "id": "f518cd88", "metadata": {}, "source": [ "Columns can be deleted or popped just as with a dict; `pop()` also returns the removed column as a `Series`:" ] }, { "cell_type": "code", "execution_count": null, "id": "b418f585", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "del df[\"two\"]" ] }, { "cell_type": "code", "execution_count": null, "id": "209ebb78", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "three = df.pop(\"three\")" ] }, { "cell_type": "code", "execution_count": null, "id": "9aee9b49", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " one flag\n", "a 1.0 False\n", "b 2.0 False\n", "c 3.0 True\n", "d NaN False" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "markdown", "id": "40b5a135", "metadata": {}, "source": [ "When inserting a scalar value, it will naturally be propagated to fill the column:" ] }, { "cell_type": "code", "execution_count": null, "id": "1bddfbc5", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "df[\"foo\"] = \"bar\"" ] }, { "cell_type": "code", "execution_count": null, "id": "e2613bd3", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " one flag foo\n", "a 1.0 False bar\n", "b 2.0 False bar\n", "c 3.0 True bar\n", "d NaN False bar" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "markdown", "id": "d93a6895", "metadata": {}, "source": [ "When inserting a `Series` that does not have the same index as the `DataFrame`, it will be conformed (reindexed) to the DataFrame's index; rows whose labels are missing from the `Series` are filled with `NaN`:" ] }, { "cell_type": "code", "execution_count": null, "id": "c20564a5", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "df[\"one_trunc\"] = df[\"one\"][:2]" ] }, { "cell_type": "code", "execution_count": null, "id": "877b972d-49b8-4225-855e-ec77bd876d8b", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "76026aba", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " one flag foo one_trunc\n", "a 1.0 False bar 1.0\n", "b 2.0 False bar 2.0\n", "c 3.0 True bar NaN\n", "d NaN False bar NaN" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "markdown", "id": "b7c3f5d9", "metadata": {}, "source": [ "You can insert raw ndarrays, but their length must match the length of the DataFrame's index.\n", "\n", "By default, new columns are appended at the end. `DataFrame.insert()` instead inserts a column at a given position; the call below places the values of column `one`, under the name `bar`, at position 1:" ] }, { "cell_type": "code", "execution_count": null, "id": "8dbfb773", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "df.insert(1, \"bar\", df[\"one\"])" ] }, { "cell_type": "code", "execution_count": null, "id": "27dea852", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " one bar flag foo one_trunc\n", "a 1.0 1.0 False bar 1.0\n", "b 2.0 2.0 False bar 2.0\n", "c 3.0 3.0 True bar NaN\n", "d NaN NaN False bar NaN" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "markdown", "id": "4786e42f", "metadata": {}, "source": [ "#### Assigning new columns in method chains\n", "\n", "DataFrame has an `assign()` method that allows you to easily create new columns that are potentially derived from existing columns." ] }, { "cell_type": "code", "execution_count": null, "id": "e9e4dead", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "iris = pd.read_csv(\"https://static-1300131294.cos.ap-shanghai.myqcloud.com/data/data-science/working-with-data/pandas/iris.csv\")" ] }, { "cell_type": "code", "execution_count": null, "id": "38eef1a4", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " SepalLength SepalWidth PetalLength PetalWidth Name\n", "0 5.1 3.5 1.4 0.2 Iris-setosa\n", "1 4.9 3.0 1.4 0.2 Iris-setosa\n", "2 4.7 3.2 1.3 0.2 Iris-setosa\n", "3 4.6 3.1 1.5 0.2 Iris-setosa\n", "4 5.0 3.6 1.4 0.2 Iris-setosa" ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [ "iris.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "ed27d63b", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " SepalLength SepalWidth PetalLength PetalWidth Name sepal_ratio\n", "0 5.1 3.5 1.4 0.2 Iris-setosa 0.686275\n", "1 4.9 3.0 1.4 0.2 Iris-setosa 0.612245\n", "2 4.7 3.2 1.3 0.2 Iris-setosa 0.680851\n", "3 4.6 3.1 1.5 0.2 Iris-setosa 0.673913\n", "4 5.0 3.6 1.4 0.2 Iris-setosa 0.720000" ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "iris.assign(sepal_ratio=iris[\"SepalWidth\"] / iris[\"SepalLength\"]).head()" ] }, { "cell_type": "markdown", "id": "c989dbf7", "metadata": {}, "source": [ "In the example above, we inserted a precomputed value. We can also pass in a function of one argument to be evaluated on the DataFrame being assigned to." ] }, { "cell_type": "code", "execution_count": null, "id": "4f39885a", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " SepalLength SepalWidth PetalLength PetalWidth Name sepal_ratio\n", "0 5.1 3.5 1.4 0.2 Iris-setosa 0.686275\n", "1 4.9 3.0 1.4 0.2 Iris-setosa 0.612245\n", "2 4.7 3.2 1.3 0.2 Iris-setosa 0.680851\n", "3 4.6 3.1 1.5 0.2 Iris-setosa 0.673913\n", "4 5.0 3.6 1.4 0.2 Iris-setosa 0.720000" ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" } ], "source": [ "iris.assign(sepal_ratio=lambda x: (x[\"SepalWidth\"] / x[\"SepalLength\"])).head()" ] }, { "cell_type": "markdown", "id": "abcd0aee", "metadata": {}, "source": [ "`assign()` **always** returns a copy of the data, leaving the original DataFrame untouched.\n", "\n", "Passing a callable, as opposed to an actual value to be inserted, is useful when you don't have a reference to the DataFrame at hand. This is common when using `assign()` in a chain of operations. For example, we can limit the DataFrame to just those observations with a Sepal Length greater than 5, calculate the ratio, and plot:" ] }, { "cell_type": "code", "execution_count": null, "id": "0508916b", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAkAAAAGwCAYAAABB4NqyAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8o6BhiAAAACXBIWXMAAA9hAAAPYQGoP6dpAABNwklEQVR4nO3de1xUdf4/8NcMchGRER2uLokIaq4XFITwfkGh+lZu7jf0W0Ksa7/tohnr17QMQm1R14pN3WyNvLSF7vZt23a/Rm0kthobBtJFjRVveAOBhBEoUOb8/vDL5MAMzP2cM+f1fDzm8XDOnHPmc+aA8+bzeX/eH5UgCAKIiIiIFEQtdgOIiIiIXI0BEBERESkOAyAiIiJSHAZAREREpDgMgIiIiEhxGAARERGR4jAAIiIiIsXpI3YDpEiv1+PSpUvo378/VCqV2M0hIiIiCwiCgGvXriEsLAxqdc99PAyATLh06RLCw8PFbgYRERHZ4Pz58/jJT37S4z4MgEzo378/gJsfoL+/v8itISIiIkvodDqEh4cbvsd7wgDIhM5hL39/fwZAREREMmNJ+gqToImIiEhxGAARERGR4jAAIiIiIsVhAERERESKwwCIiIiIFIcBEBERESkOAyAiIiJSHAZAREREpDgMgIiIiEhxGAARERGR4nApDCKZOV3XjHPftSJiUD8M1fYTuzlERLLEAIhIJhpb27GsoAKfnqwzbJsWHYgtC8dD4+spYsuIiOSHQ2BEMrGsoAKHq+qNth2uqsfSgqMitYiISL4YABHJwOm6Znx6sg4dgmC0vUMQ8OnJOpypbxGpZURE8sQAiEgGzn3X2uPrZxsYABERWYMBEJEMDBno2+PrEYOYDE1EZA0GQEQyEBnoh2nRgfBQqYy2e6hUmBYdyNlgRERWYgBEJBNbFo7H5Cit0bbJUVpsWThepBYREckXp8ETyYTG1xN7FsfjTH0Lzja0sA4QEZEdGAARycxQLQMfIiJ7cQiMiIiIFIcBEBERESkOAyAiIiJSHAZAREREpDgMgIiIiEhxGAARERGR4nAaPJGFTtc149x3ray/Q0TkBiTRA7Rt2zZERETAx8cHCQkJKC0ttei4vXv3QqVSYd68eUbbH374YahUKqNHSkqKE1pOStDY2o60/FLMevEgMnYewczNxUjLL0VT63Wxm0ZERDYSPQDat28fMjMzkZ2djfLycowbNw7Jycm4cuVKj8edPXsWK1aswNSpU02+npKSgsuXLxseBQUFzmg+KcCyggocrqo32na4qh5LC46K1CIiIrKX6AHQSy+9hCVLliAjIwOjRo3C9u3b4evrizfeeMPsMR0dHXjwwQeRk5ODyMhIk/t4e3sjJCTE8AgICHDWJZAbO13XjE9P1qFDEIy2dwgCPj1ZhzP1LSK1jIiI7CFqANTe3o6ysjIkJSUZtqnVaiQlJaGkpMTscWvXrkVQUBAWL15sdp/i4mIEBQVhxIgRePTRR9HQ0GB237a2Nuh0OqMHEQCc+661x9fPNjAAIiKSI1EDoPr6enR0dCA4ONhoe3BwMGpqakwec+jQIeTn52PHjh1mz5uSkoI9e/agqKgIGzduxMGDB3HnnXeio6PD5P65ubnQaDSGR3h4uO0XRW5lyEDfHl+PGMRkaCIiOZLVLLBr165h0aJF2LFjB7Rardn9FixYYPj3mDFjMHbsWAwbNgzFxcWYPXt2t/1Xr16NzMxMw3OdTscgiAAAkYF+mBYdiMNV9UbDYB4qFSZHaTkbjIhIpkQNgLRaLTw8PFBbW2u0vba2FiEhId32P3XqFM6ePYt77rnHsE2v1wMA+vTpg8rKSgwbNqzbcZGRkdBqtaiqqjIZAHl7e8Pb29veyyE3tWXheCzefQRfnLtq2DY5SostC8eL2CoiIrKHqENgXl5eiI2NRVFRkWGbXq9HUVEREhMTu+0/cuRIfP3116ioqDA87r33XsycORMVFRVme20uXLiAhoYGhIaGOu1ayD0
1trZjacFRo+BnYkQAtiwcD42vp4gtIyIie4g+BJaZmYn09HTExcUhPj4eeXl5aGlpQUZGBgAgLS0NgwcPRm5uLnx8fDB69Gij4wcMGAAAhu3Nzc3IycnB/PnzERISglOnTmHlypWIiopCcnKyS6+N5M/UFPjyc41YWnAUexbHi9QqIiKyl+gBUGpqKurq6pCVlYWamhrExMSgsLDQkBhdXV0NtdryjioPDw989dVX2L17NxobGxEWFoa5c+di3bp1HOYiq3ROge/q1inwzAEiIpInlSB0KXBC0Ol00Gg0aGpqgr+/v9jNIZEcqLyCjJ1HzL6+M2MiZo4IcmGLLMdlO4hIiaz5/ha9B4hIquQ4Bb6xtR3LCiqMeq6mRQcyZ4mIqAvRK0ETSVXnFHgPlcpou4dKhWnRgZLsWeGyHURElmEARNSDLQvHY3KUcc0pqU6Bt2bZjtN1zThQeYVLeRCRYnEIjKgHGl9P7FkcjzP1LTjb0CLpnBpLlu0I8PXkEBkREdgDRGSRodp+mDkiSLLBD2BZzhKHyIiIbmIAROQmestZEv5vKIwr2xMRMQAicis95SxxZXsioh8xB4jIyVxZk6ennCU5TetnHSMicjYGQEROImZNnqHa7oGDHFa2Zx0jInIVDoEROYkUE46lPq3f0s+M0/iJyF7sASJyAqmuIyblaf2WfGacxk9EjsIeICInkHrCsRSn9VvymUmxV42I5IkBEJETyCnhWCp6+8w8VCpO4ycih2EAROQEclxHTGy9fWZdA5+uxO5VIyJ5YQBE5CRSTziWop4+M/aqEZEjMQmayEmknHAsVT19ZhpfT8lP4yci+VAJQi/9ygqk0+mg0WjQ1NQEf39/sZtDRP+nqfU6lhYc5SwwIjLJmu9v9gARkcM5q5Ize9WIyFEYABGRw7iqkrOpStdERNZgEjQROQzr9BCRXDAAIiKH6KzkzDo9RCQHDICIyCGkXv2aiOhWzAEiUiBnJCmzTg8RyQkDICIFcWaScmclZ9bpISI54BAYkYI4O0mZ1a+JSC7YA0SkEJ1Jyl3dmqRsby8N6/QQkVwwACJSCEuSlB0VrLBODxFJHYfAiBSit1/2PmpVL3sQEbkP9gARuYCzloawhr6X12/ouSwgESkHAyAiJ3LV0hCW4DR1IqIfcQiMyImktDRE5zR1D5XxUJeHSoVp0YHM2SEiRZFEALRt2zZERETAx8cHCQkJKC0ttei4vXv3QqVSYd68eUbbBUFAVlYWQkND0bdvXyQlJeHkyZNOaDm5g9N1zThQecXhSzVIcWkITlMnIrpJ9CGwffv2ITMzE9u3b0dCQgLy8vKQnJyMyspKBAUFmT3u7NmzWLFiBaZOndrttU2bNuGVV17B7t27MXToUDz33HNITk7G8ePH4ePj48zLIRlx9vCUK2ddWYrT1ImIbhK9B+ill17CkiVLkJGRgVGjRmH79u3w9fXFG2+8YfaYjo4OPPjgg8jJyUFkZKTRa4IgIC8vD2vWrMF9992HsWPHYs+ePbh06RLee+89J18NyYmzh6eknHMzVNsPM0cEMfghIsUSNQBqb29HWVkZkpKSDNvUajWSkpJQUlJi9ri1a9ciKCgIixcv7vbamTNnUFNTY3ROjUaDhIQEs+dsa2uDTqczepB7c8XwFHNuiIikS9QAqL6+Hh0dHQgODjbaHhwcjJqaGpPHHDp0CPn5+dixY4fJ1zuPs+acubm50Gg0hkd4eLi1l0Iy46qVy5lzQ0QkTaLnAFnj2rVrWLRoEXbs2AGtVtv7ARZavXo1MjMzDc91Oh2DIDfnquEp5twQEUmTqAGQVquFh4cHamtrjbbX1tYiJCSk2/6nTp3C2bNncc899xi26fU3y7v16dMHlZWVhuNqa2sRGhpqdM6YmBiT7fD29oa3t7e9l0MyEhnoh7ghASg/d9WoQKCzVi539dIQUii8SEQkZaIGQF5eXoiNjUVRUZFhKrter0dRURGeeOKJbvuPHDkSX3/
9tdG2NWvW4Nq1a/jd736H8PBweHp6IiQkBEVFRYaAR6fT4fPPP8ejjz7q7EsiGeic/fXFuavdXpP78JSUCi8SEUmZ6ENgmZmZSE9PR1xcHOLj45GXl4eWlhZkZGQAANLS0jB48GDk5ubCx8cHo0ePNjp+wIABAGC0ffny5Vi/fj2io6MN0+DDwsK61QsiZTI1+0utAmKHBGDP4niXt8eRvTU9zWwT49qIiKRK9AAoNTUVdXV1yMrKQk1NDWJiYlBYWGhIYq6uroZabV2u9sqVK9HS0oJHHnkEjY2NmDJlCgoLC1kDiAyzv7rSC8CRs1dxpt58bR5HDys5urfG3LXdOrONw2FERDepBEHgCohd6HQ6aDQaNDU1wd/fX+zmkAMdqLyCjJ1HzL6+M2MiZo4wLsDprGGltPxSHK6qN5qK35mDZEtvjS3XRkTkTqz5/ha9ECKRK9ky+8sZBROdUYdIyoUXiYikhgEQKYq1xQntCVR6WmPMGXWIWHiRiMhyDIBIcawpTmhLoNLY2o60/FLMevEgMnYewczNxUjLL0VT63XDPs7qrWHhRSIiy4ieBE3katYUJ3T0kFlnbk9nb425HCBbe2tYeJGIyDLsASLF6rogqKkhK2cOmTmzt4aLnRIR9Yw9QOQ0cqlG3Nssry0Lx2NpwVGj1+0ZMuv8LNhbQ0QkHgZA5HByq0bc25CVs4fMXL1MBhERcQiMnMAZ08adxZohK0uGlSID/TBp2CCTr00aNoiBDhGRRDAAIrt0zZtxRn0bZ3LGdHRzpUVZcpSISDo4BEY2MTfMlRr3kx6PuzUHRmyn65pR0/R9j/uYm45uLr/pdF0zSk43mDym5HSDZJajkEt+llzaSUTywwCIbGJumKu1/UaPx0mhGrGp4K0rc9PRe8tvsiYJWgxyyc+SSzuJSL44BEZW62mY64tzVzExIkDS1YhNBW9dmZvl1Vt+U29J0Acr6/DPHgIvZ5NLfpZc2klE8sUeILJab70c6ZMi0NfzgkXTxl3N3IrpnXLvH4M7Ik0nK1uy2rq5Aoeddn12Frs+O4sAX0+8//gUhA/qOWByJLmsFi+XdhKRvDEAIqv11svx0zAN9iwOk2R9m96CtxCNj9m2Wjq8ZapuUFdXW6/j3m2HcDRrbu+NdhCpD891kks7iUjeOARGVrO0OrIUqxHbswaXpcd21g06sGIGMudEm93/aut1lw6HyWW1eLm0k4jkjQEQ2USui27as2K6tccO1fZDbzPfy6uvWtV+SzliWQ+xyKWdRCRvKkFgdZKudDodNBoNmpqa4O/vL3ZzJE2Kw1ydzE2hbmq93m2IytIZRtYee7DyCtJ3HjF7vjcXx2NqdKA1l9WjiuqrWPPXb/DNRZ3J9tlz7a4kl3YSkbRY8/3NAMgEBkDyZukUanuCN2uOHb/2I1xtvd5te4Cvp8NygHqb2p8YOQgFj9xheC7lwPVWcmknEUkDAyA7MQCSt7T80m6zsDrr+uxZHO/y9pxvaMW92w4ZBUGOngWWll+KQyfroO9hnwMrZjCIICK3Zs33N2eBkVuR4hTq8EG+OJo1F/88WYfy6quYcFuAQ4e9epva3+lfpxsYABER/R8GQORWpDyFemp0oEMDn069XXMnVe+7EBEpBgMgcitKnELd2zV3Sog0vUq9PbhWFxHJFQMgcivmKjGbW9vLHfRWfRoAJg0zXd3aVlyri4jkjnWASNZM1buRa40ie5i65k7TogPx6oOxDn0/U2t1Haqq41pdRCQbnAVmAmeBSZ8lPRBKnELdec191Crc0As2X3tPQ1un65ox68WDZo99//HJGBs+wOr3JCKyF2eBkdvrabXwzqnuQ7XKCXw62XvNlgSWvSVdP/OXr/H3ZVNtbgMRkStwCIxkp3Pad9d8l1unupNtegosO/WWdP3NJZ2s7oGpYVQicn/sASLZkfJUdzmztIZSZKAfRg/2N1puoys53AMmchMpG3uASHZ6+6Hto2bFG1tYElh2emH
e6B73lUO5AUt6u4jIfTEAItnpabkHALihZ16/LaypoTQuPADTogO7/QcilxXbOYxKRAyASHaUWOzQFSID/RBgZugnwNezW1CzZeF4TOlS2Vou5Qas6e0iIvckiQBo27ZtiIiIgI+PDxISElBaWmp233fffRdxcXEYMGAA+vXrh5iYGLz55ptG+zz88MNQqVRGj5SUFGdfBrlIZ+E/D5XxUJdceh+k6nRds8lV6wHgauv1br0iGl9P7FkcjwMrZmBnxkQcWDEDexbHyyJ/hkE0EYkeAO3btw+ZmZnIzs5GeXk5xo0bh+TkZFy5csXk/gMHDsSzzz6LkpISfPXVV8jIyEBGRgY+/PBDo/1SUlJw+fJlw6OgoMAVl0MuosRih85ma6/IUG0/zBwRJKvAk0E0EYleCDEhIQETJ07E1q1bAQB6vR7h4eFYunQpVq1aZdE5JkyYgLvvvhvr1q0DcLMHqLGxEe+9955Fx7e1taGtrc3wXKfTITw8nIUQZUCJxQ6dpbcChwdWzHCrz7ip9TqWFhzlLDAiNyKbQojt7e0oKyvD6tWrDdvUajWSkpJQUlLS6/GCIOCTTz5BZWUlNm7caPRacXExgoKCEBAQgFmzZmH9+vUYNMj0YpC5ubnIycmx72JIFEosdugsSltHrXMIj0E0kTKJOgRWX1+Pjo4OBAcHG20PDg5GTU2N2eOamprg5+cHLy8v3H333diyZQvmzJljeD0lJQV79uxBUVERNm7ciIMHD+LOO+9ER0eHyfOtXr0aTU1Nhsf58+cdc4FEMqPEoUU5DuERkf1kWQixf//+qKioQHNzM4qKipCZmYnIyEjMmDEDALBgwQLDvmPGjMHYsWMxbNgwFBcXY/bs2d3O5+3tDW9vb1c1n0iy2CtCREohagCk1Wrh4eGB2tpao+21tbUICQkxe5xarUZUVBQAICYmBidOnEBubq4hAOoqMjISWq0WVVVVJgMgIjLGoUUicneiDoF5eXkhNjYWRUVFhm16vR5FRUVITEy0+Dx6vd4oibmrCxcuoKGhAaGhoXa1l4gsw/W1iEjqRB8Cy8zMRHp6OuLi4hAfH4+8vDy0tLQgIyMDAJCWlobBgwcjNzcXwM2E5bi4OAwbNgxtbW3Yv38/3nzzTbz66qsAgObmZuTk5GD+/PkICQnBqVOnsHLlSkRFRSE5OVm06yR5O13XjHPftXJIqBdcX4uI5EL0ACg1NRV1dXXIyspCTU0NYmJiUFhYaEiMrq6uhlr9Y0dVS0sLHnvsMVy4cAF9+/bFyJEj8cc//hGpqakAAA8PD3z11VfYvXs3GhsbERYWhrlz52LdunXM8yGr8QvdOj2tr7VncbxIrSIi6k70OkBSZE0dAXJvafmlZqeF8wvdmNLqCBGR9Fjz/S16JWgiqeKCmdbh+lpEJCcMgIjM4Be6dbi+FhHJCQMgF3PE7BjOsHENqX6hS/VnyJ71tRzdHv6OEFFvRE+CVgpHJNMyIde1pLY0hBx+hrYsHN9tfa2eKkk7uj38HSEiSzEJ2gRnJEE7IpmWCbmuJ6UFM+X0M2RpJWlHt4e/I0TKJpvFUJWiM5m2q1uTaXvrTXDEOch6UlkaQm4/Q5ZUknZ0e/g7QkTWYA6QCzgimZYJueISe8FMW+5/1zwYqf0MObo9Urs+IpI29gC5gCOSaaWakEvmObJ6tDX331wezK/nRlt8Dldw9M80f0eIyBoMgFzAEcm0UkvIlRNXL2PhjETczvt/6GQd9F1eC/D1xEBfL8Nzc9WYO9shlZ8hR/9M83eEiKzBITAX2bJwPCZHaY229TQ7xlnnkDtrpjc3trYjLb8Us148iIydRzBzczHS8kvR1Hrdqe1ZsucLHKoyzkXpXA7CHuYCKN331w3n7q1444q5wyX1M+Ton2n+jhCRpTgLzARnLoXhiGRasRNyxWBLr4ozZwSZas+kYYPQ2taBiguNZo+zZzkIS5aaONvQgoydR8zuszNjImaOCJLcz5Cj2yO
16yMi1+AsMAmzZHaMK84hN9YusunsGUGm2vPZqYZejzvbYPv7WpLka2kejNR+hhzdHqldHxFJD4fASPJsWZPLmTOCzLXHEvYk4loS3NhTjZmISEkYAJHk2RLMOHNGUG/tMUWtgt0BiKXBDfNgiIh6xyEwkjxbghlnzgjqrT2m+Hh6YEXycJvfs5MlS01IpXgjEZGUMQnaBGcmQSuBM6ad25LQ7MxlLEy1xxKOen8GN0RE3Vnz/c0AyAQGQLZx5kKU9gQzzggWqhtacN+2w7h6y5T6fl4eaGnv6PXYScMG4e0ldzikHaa4uu6RI8ixzUQkPQyA7MQAyDauWIhSKj0fpq5VDXQrUmiOPdPhzZHjSuhybDMRSZc1399MgiaHsGWmli3EXpPrdF0zCkrPmbzWzuBHrep+XFefn+59yry1eioVIFVybDMRuQcmQZNDWDJTS85DG6Z6KswZFeaPby7qetzHlm7XnoaJ5LgSuhzbTETugwEQGbE1F8PdF6I01VNhzpaFE3D+uxakvWG+IvMdkYMsfm9LhonkGIDKsc1E5D4YACnYrcFOgK+nXbkY7rwQpbmeiq5uvdah2n5IjByEEhNDXYmRg6z6PCypgi3HAFSObSYi98EcIAUytUjozM3FOGznAp7uWoDP0sKHXa91+0OxmBYdaLTPtOhAbH8o1uL3tjS3yhkVoK1ZeNYWrFpNRGJiD5ACmepRuGpihXRrczHctQBfbz0VufePwR0menW6fh4eKhU6BAHftbZbPMOpt+Dr+MUmowrQvRVJtIQrZ2Y5qs1EcsByD9LCAEhhLB3OuZW1uRjuthBlb8N7C+Nv6/H4AF9PZP/1rE0BRW/B167PzuLucWEAHBeAWrvwrD3cNWgmuhXLPUgTh8AUxpZ1rGzNxXD2EIorbVk4HhOGDDDaZmlPhT1TvSMD/TAxIsDs60fOXe32+dpTKsBV5Qy6Eru8AZEzsdyDNLEHSGGsWcfK1gRmd/trp/N6jpy9atg2cUiARdfjiKne6ZMijN67K1M9dLZ2tXNmFpFjsdyDdLEHSGHMJZ6qcXOo5la25mK42187pq6nvLrRouuxZSX7rkaF9lzN9NYeOlMJ7mn5pWgykeNlCmdmETmWI/4PIOdgAKRApmZrTYkORPGKmTiwYgZ2ZkzEgRUzsGdxvNU9NmINoThLb9ezt7S6x2vqLaDoY0HZaGtmS9kbfHJmFpFj8Y8K6bJrCKysrAwnTpwAAIwaNQoTJkxwSKPIuXpKPNX4etr1JeduQyi9Xc+qd78GYH6Iz1wCdadF+aUWDQ9aMlvKUV3tnJlF5DjuXCNN7mzqAbpy5QpmzZqFiRMnYtmyZVi2bBni4uIwe/Zs1NVZN8MIALZt24aIiAj4+PggISEBpaWlZvd99913ERcXhwEDBqBfv36IiYnBm2++abSPIAjIyspCaGgo+vbti6SkJJw8edLqdrkLc8nIzkg8dZe/djo/Mw8L1vUCeu5lMdXjZumxnTqD1p566BzV1W7JexGR5dy1Rprc2dQDtHTpUly7dg3Hjh3D7bffDgA4fvw40tPTsWzZMhQUFFh8rn379iEzMxPbt29HQkIC8vLykJycjMrKSgQFBXXbf+DAgXj22WcxcuRIeHl54e9//zsyMjIQFBSE5ORkAMCmTZvwyiuvYPfu3Rg6dCiee+45JCcn4/jx4/Dx8bHlkmXndF0zjl/SYfdnZ3Hk3I8JtM5ORpb7XzumErj7qFW4oe959a6eelk6A4pP/12HtDe6B/fW9ND0VGLA0cGnu5UzIBILyz1Ik0oQTPTL90Kj0eDjjz/GxIkTjbaXlpZi7ty5aGxstPhcCQkJmDhxIrZu3QoA0Ov1CA8Px9KlS7Fq1SqLzjFhwgTcfffdWLduHQRBQFhYGH79619jxYoVAICmpiYEBwdj165dWLBgQa/n0+l00Gg0aGpqgr9/zwmoUtPbop2dgYij67ncqqn1erchFLnMAkvLLzU7XGWJp+ZE495
xg03+53ag8goydppfH2xnxkTMHBFkV7E0U+13xT0nIpICa76/beoB0uv18PTs/kXm6ekJvV5v8Xna29tRVlaG1atXG7ap1WokJSWhpKSk1+MFQcAnn3yCyspKbNy4EQBw5swZ1NTUICkpybCfRqNBQkICSkpKTAZAbW1taGtrMzzX6XpeyVvKelu00xVTL+X6144tRSK7evkfJ/HyP06aDPh666EZ6OuJtPxSuwJH5u8QEVnGphygWbNm4cknn8SlS5cM2y5evIinnnoKs2fPtvg89fX16OjoQHBwsNH24OBg1NTUmD2uqakJfn5+8PLywt13340tW7Zgzpw5AGA4zppz5ubmQqPRGB7h4eEWX4OUmJuxZIorpl7KrbidLUUizTGV19PbDKsXPzppd/kA5u8QEVnGpgBo69at0Ol0iIiIwLBhwzBs2DAMHToUOp0OW7ZscXQbu+nfvz8qKipw5MgRvPDCC8jMzERxcbHN51u9ejWampoMj/PnzzuusS5kzRe4XJKRXcmaIpG9MTft31wy5K/nRju0fIDcgk+SL3eq+E7KYtMQWHh4OMrLy/Hxxx/j22+/BQDcfvvtRsNOltBqtfDw8EBtba3R9traWoSEhJg9Tq1WIyoqCgAQExODEydOIDc3FzNmzDAcV1tbi9DQUKNzxsTEmDyft7c3vL29rWq7FFnyBS6XZGQxRAb6ITFyEEpON/S4X+dnmHPfT/H+lxfx8j/MzzDsOu3f3PDggcorPb6n3MoHkPtzt4rvpDw2F0JUqVSYM2cOli5diqVLl1od/ACAl5cXYmNjUVRUZNim1+tRVFSExMREi8+j1+sNOTxDhw5FSEiI0Tl1Oh0+//xzq84pR72tGwUAE4YMkEU+iFh/VaosmPbemVMzVNsP94wN63Ffcz1tXXto3KV8ACmHu1V8J+WxuAfolVdewSOPPAIfHx+88sorPe67bNkyixuQmZmJ9PR0xMXFIT4+Hnl5eWhpaUFGRgYAIC0tDYMHD0Zubi6Am/k6cXFxGDZsGNra2rB//368+eabePXVVwHcDMyWL1+O9evXIzo62jANPiwsDPPmzbO4XXLV27pRj82MkvRfZ2L+VXm6rhmfnTLf+5N7/xjcETnIqCfGUdP+5V4+gJSF61uRO7A4AHr55Zfx4IMPwsfHBy+//LLZ/VQqlVUBUGpqKurq6pCVlYWamhrExMSgsLDQkMRcXV0NtfrHjqqWlhY89thjuHDhAvr27YuRI0fij3/8I1JTUw37rFy5Ei0tLXjkkUfQ2NiIKVOmoLCwUBE1gKxZN0qKevqr0pHTuE1NNe8thypE42PyP3VHzbxy1HnsmUZPZAl3q/hOymRTHSB3J+c6QIB8a8GcrmvGrBcPmn39wIoZdv+n2lMPU0NLm13v76hp/7aehzkZ5Cqu+F0lsoU139825QCtXbsWra3d/wL4/vvvsXbtWltOSQ4k17Lrrlg1uaceJnsXAnXUzCtbz8OcDHIVLppL7sCmHiAPDw9cvny521IVDQ0NCAoKQkdHh8MaKAa59wB1ErsQobVDMc7+q9KS8w/09bKpirXYw05S/Ytc7M+FnEfOFd/JfTm9ErQgCFCZmC7z5ZdfYuDAgbackpxArLWcbB2KcXYisKV5C9ZUsZbKsJPUcjKk8rmQ88i14jtRJ6uGwAICAjBw4ECoVCoMHz4cAwcONDw0Gg3mzJmDBx54wFltJZmwZyjGmcN3vU0176P+Mai3dBhKKsNOUptGL5XPhZyPRTdJrqzqAcrLy4MgCPjFL36BnJwcaDQaw2teXl6IiIhw+1o71DN7p8c6869Kcz1MnRbll3brpehpCEdKU4GlNI1eSp8LEZE5VgVA6enpAG4WG5w0aZLJBVFJ2Rw1FOOs4TtTU81v1dlL8crCmF6HcKQ27CSVhVCl9rkQEZliUw7Q9OnTDf/+4Ycf0N7ebvS6nBOHyT5SG4rpqrOH6dN/X0HaG0e6vd7ZS7FkzxcoP9do9Fr
XekRSu1ap5GRI7XMhIjLFpmnwra2teOKJJxAUFIR+/fohICDA6EHKJZfpsR29zH08cvZqrwuTSvVaxc7JkOrnQkR0K5sCoP/+7//GJ598gldffRXe3t54/fXXkZOTg7CwMOzZs8fRbSSZkUMdIntWfr+1HpEcrlUM/FyISOpsqgN02223Yc+ePZgxYwb8/f1RXl6OqKgovPnmmygoKMD+/fud0VaXcZc6QGJz9FCMo2vKmKuYPf62AfjinPn11EzV1BF72Emq+LkQkSs5vQ7Qd999h8jISAA3832+++47AMCUKVPw6KOP2nJKckOOSmR2Vk2ZnpKGlxYctWpGlVg1l6SOnwsRSZVNAVBkZCTOnDmD2267DSNHjsSf/vQnxMfH429/+xsGDBjg4CaS0jlrgdSekoalMqOKiIicw6YAKCMjA19++SWmT5+OVatW4Z577sHWrVtx/fp1vPTSS45uIymYK2rKmOqlEGNGFZeNICJyHZsCoKeeesrw76SkJHz77bcoKytDVFQUxo4d67DGEYldU8YVQzhcNoKIyPVsmgXW1ZAhQ3D//fdj7NixeOeddxxxSiIAzqspc7quGQcqrximtIuJy0YQEbme1QHQjRs38M033+Df//630fa//vWvGDduHB588EGHNY7I0TVlGlvbkZZfilkvHkTGziOYubkYafmlaGq97shmW6xziK+3mkNERORYVgVA33zzDaKiojBu3DjcfvvtuP/++1FbW4vp06fjF7/4Be68806cOnXKWW0lhXJkTRmp9bZYMsRHRESOZ1UO0NNPP42oqChs3boVBQUFKCgowIkTJ7B48WIUFhaib9++zmonKZijEpKluEgnl40gMTHxXjxy/uzl3PZbWRUAHTlyBB999BFiYmIwdepUFBQU4JlnnsGiRYuc1T4iA3sTksVOqDZFSqu4k3Iw8V48cv7s5dx2U6waAquvr0dYWBgAQKPRoF+/frjjjjuc0jAiR5NqbwuXjSBXk9pQsJLI+bOXc9tNsaoHSKVS4dq1a/Dx8YEgCFCpVPj++++h0+mM9uPyESRFUu1tkcoq7qQMUhwKVgo5f/Zybrs5VgVAgiBg+PDhRs/Hjx9v9FylUqGjo8NxLSRyIClXeOayEeQKUhwKVgo5f/Zybrs5VgVABw4ccFY7iFyCvS2kdFIdClYCOX/2cm67OVYFQNOnT3dWO4hcir0tpFRSHQpWAjl/9nJuuzkWJ0HrdDqLH0REJF1MvBePnD97ObfdFJUgdClBa4ZarYaqSzXertwlB0in00Gj0aCpqYkJ3UTktjgULB45f/ZSbrs1398WD4Ex/4fIMu5SJIzcH4eCxSPnz17Obb+VxQEQ83+IeuZuRcKIiNyZVUnQXbW2tqK6uhrt7e1G28eOHWtXo4jkqKciYXsWx4vUKiIiMsWmAKiurg4ZGRn44IMPTL4u9xwgImu5Y5EwIiJ3ZtVSGJ2WL1+OxsZGfP755+jbty8KCwuxe/duREdH4/3333d0G8lOp+uacaDyCs7Uu//K4mJdK1d1JyKSF5sCoE8++QQvvfQS4uLioFarMWTIEDz00EPYtGkTcnNzrT7ftm3bEBERAR8fHyQkJKC0tNTsvjt27MDUqVMREBCAgIAAJCUlddv/4YcfhkqlMnqkpKRY3S65a2xtR1p+KWa9eBAZO49g5uZipOWXoqn1uthNczixr9Udi4QREbkzmwKglpYWBAUFAQACAgJQV3ez63/MmDEoLy+36lz79u1DZmYmsrOzUV5ejnHjxiE5ORlXrlwxuX9xcTEWLlyIAwcOoKSkBOHh4Zg7dy4uXrxotF9KSgouX75seBQUFNhwpfLmbgvX9UTsa+0sEubRpVSEh0qFadGBHP4iIpIYmwKgESNGoLKyEgAwbtw4vPbaa7h48SK2b9+O0NBQq8710ksvYcmSJcjIyMCoUaOwfft2+Pr64o033jC5/1tvvYXHHnsMMTExGDlyJF5//XXo9XoUFRUZ7ef
t7Y2QkBDDIyAgwGwb2tra3K6YY2dOSkeXMk+35qS4C6lcq7sVCSMicmc2JUE/+eSTuHz5MgAgOzsbKSkpeOutt+Dl5YVdu3ZZfJ729naUlZVh9erVhm1qtRpJSUkoKSmx6Bytra24fv06Bg4caLS9uLgYQUFBCAgIwKxZs7B+/XoMGjTI5Dlyc3ORk5NjcbvlwB0XrjNHKtfKdcaIiOTDpgDooYceMvw7NjYW586dw7fffovbbrsNWq22hyON1dfXo6OjA8HBwUbbg4OD8e2331p0jqeffhphYWFISkoybEtJScH999+PoUOH4tSpU3jmmWdw5513oqSkBB4eHt3OsXr1amRmZhqe63Q6hIeHW3wdUqSknBSpXau5ImEskEhEJB02BUBr167FihUr4Ot784vH19cXEyZMwPfff4+1a9ciKyvLoY00Z8OGDdi7dy+Ki4vh4+Nj2L5gwQLDv8eMGYOxY8di2LBhKC4uxuzZs7udx9vbG97e3i5ps6u448J15kj9WlkgkYhIemzKAcrJyUFzc3O37a2trVYNJWm1Wnh4eKC2ttZoe21tLUJCQno8dvPmzdiwYQM++uijXgsvRkZGQqvVoqqqyuK2uQMl5aRI+VrFTtAmIqLubOoB6lz0tKsvv/yyWy5OT7y8vBAbG4uioiLMmzcPAAwJzU888YTZ4zZt2oQXXngBH374IeLi4np9nwsXLqChocHqBG25U1JOilSvlQUSiYikyaoAKCAgwFBXZ/jw4UZBUEdHB5qbm/GrX/3KqgZkZmYiPT0dcXFxiI+PR15eHlpaWpCRkQEASEtLw+DBgw31hTZu3IisrCy8/fbbiIiIQE1NDQDAz88Pfn5+aG5uRk5ODubPn4+QkBCcOnUKK1euRFRUFJKTk61qm7twl4XrLCG1a5VKgjYRERmzKgDKy8uDIAj4xS9+gZycHGg0GsNrXl5eiIiIQGJiolUNSE1NRV1dHbKyslBTU4OYmBgUFhYaEqOrq6uhVv84Uvfqq6+ivb0dP//5z43Ok52djeeffx4eHh746quvsHv3bjQ2NiIsLAxz587FunXr3C7Ph6RPagnaRER0k0oQuhRPscDBgwcxefJk9Olj11qqkqXT6aDRaNDU1AR/f3+xm0Myl5ZfajZBm4ukEhE5jjXf3zYlQU+fPh3nzp3DmjVrsHDhQkPV5g8++ADHjh2z5ZREbkvKCdpEREplcw/QnXfeicmTJ+PTTz/FiRMnEBkZiQ0bNuCLL77AO++844y2ugx7gMgZpJagTUSuxVpgzmfN97dNY1irVq3C+vXrkZmZif79+xu2z5o1C1u3brXllERuT2oJ2kTkGqwFJk02DYF9/fXX+NnPftZte1BQEOrr600cQUREpEysBSZNNgVAAwYMMKwFdqujR49i8ODBdjeKiIjIHUhlsWbqzqYAaMGCBXj66adRU1MDlUoFvV6Pw4cPY8WKFUhLS3N0G4mIiGTJklpgJA6bAqDf/OY3uP3223HbbbehubkZo0aNwrRp0zBp0iSsWbPG0W0kIiKSJdYCky6rkqD1ej1++9vf4v3330d7ezsWLVqE+fPno7m5GePHj0d0dLSz2klERCQ7Ul+sWcms6gF64YUX8Mwzz8DPzw+DBw/G22+/jXfeeQcPPPAAgx8iIiITWAtMmqyqAxQdHY0VK1bg//2//wcA+Pjjj3H33Xfj+++/N1quQu5YB4iIiByNtcCcz5rvb6sCIG9vb1RVVSE8PNywzcfHB1VVVfjJT35ie4slhgEQERGR/DhtKYwbN27Ax8fHaJunpyeuX79ufSuJiIiIRGJVErQgCHj44YeNVlX/4Ycf8Ktf/Qr9+v3Ynffuu+86roVEREREDmZVAJSent5t20MPPeSwxhARERG5glUB0M6dO53VDiIiIiKXcZ+pW0REREQWYgBEREREisMAiIiIiBTHqhwgIiIiuThd14xz37Wy8CCZxACIiIjcSmNrO5YVVODTk3WGbdOiA7F
l4XhofD1FbBlJCYfAiIjIrSwrqMDhqnqjbYer6rG04KhILSIpYgBERERu43RdMz49WWe08joAdAgCPj1ZhzP1LSK1zLlO1zXjQOUVt70+Z+AQGBERuY1z37X2+PrZhha3ygficJ/t2ANERERuY8hA3x5fjxjkPsEPwOE+ezAAIiIitxEZ6Idp0YHwUKmMtnuoVJgWHehWvT9KHe5zFAZARETkVrYsHI/JUVqjbZOjtNiycLxILXIOS4b7yDzmABERkVvR+Hpiz+J4nKlvwdmGFretA6S04T5HYwBERERuaajWPQOfTp3DfYer6o2GwTxUKkyO0rr1tTsCh8CIiIhkSinDfc7AHiAiIiKZUspwnzNIogdo27ZtiIiIgI+PDxISElBaWmp23x07dmDq1KkICAhAQEAAkpKSuu0vCAKysrIQGhqKvn37IikpCSdPnnT2ZRAREYlC6DITjHonegC0b98+ZGZmIjs7G+Xl5Rg3bhySk5Nx5coVk/sXFxdj4cKFOHDgAEpKShAeHo65c+fi4sWLhn02bdqEV155Bdu3b8fnn3+Ofv36ITk5GT/88IOrLouIiMjpGlvbkZZfilkvHkTGziOYubkYafmlaGq9LnbTJE8liBw2JiQkYOLEidi6dSsAQK/XIzw8HEuXLsWqVat6Pb6jowMBAQHYunUr0tLSIAgCwsLC8Otf/xorVqwAADQ1NSE4OBi7du3CggULej2nTqeDRqNBU1MT/P397btAIiIiJ0nLLzWbBL1ncbyILROHNd/fovYAtbe3o6ysDElJSYZtarUaSUlJKCkpsegcra2tuH79OgYOHAgAOHPmDGpqaozOqdFokJCQYPacbW1t0Ol0Rg8iIiIpYyFE+4gaANXX16OjowPBwcFG24ODg1FTU2PROZ5++mmEhYUZAp7O46w5Z25uLjQajeERHh5u7aUQERG5FAsh2kf0HCB7bNiwAXv37sVf/vIX+Pj42Hye1atXo6mpyfA4f/68A1tJRETkeCyEaB9RAyCtVgsPDw/U1tYaba+trUVISEiPx27evBkbNmzARx99hLFjxxq2dx5nzTm9vb3h7+9v9CAiIpIyJa175gyiBkBeXl6IjY1FUVGRYZter0dRURESExPNHrdp0yasW7cOhYWFiIuLM3pt6NChCAkJMTqnTqfD559/3uM5iYiI5IaFEG0neiHEzMxMpKenIy4uDvHx8cjLy0NLSwsyMjIAAGlpaRg8eDByc3MBABs3bkRWVhbefvttREREGPJ6/Pz84OfnB5VKheXLl2P9+vWIjo7G0KFD8dxzzyEsLAzz5s0T6zKJiIgcjoUQbSd6AJSamoq6ujpkZWWhpqYGMTExKCwsNCQxV1dXQ63+saPq1VdfRXt7O37+858bnSc7OxvPP/88AGDlypVoaWnBI488gsbGRkyZMgWFhYV25QkRERFJlbuve+YMotcBkiLWASIiIpIf2dQBIiIiIhIDAyAiIiJSHAZAREREpDgMgIiIiEhxGAARERGR4jAAIiIiIsVhAERERESKwwCIiIiIFIcBEBERESmO6EthEBERdTpd14xz37VyTStyOgZAREQkusbWdiwrqMCnJ+sM26ZFB2LLwvHQ+HqK2DJyVxwCIyIi0S0rqMDhqnqjbYer6rG04KhILSJ3xwCIiIhEdbquGZ+erENHl7W5OwQBn56sw5n6FpFaRu6MARAREYnq3HetPb5+toEBEDkeAyAiIhLVkIG+Pb4eMYjJ0OR4DICIiEhUkYF+mBYdCA+Vymi7h0qFadGBnA1GTsEAiIiIRLdl4XhMjtIabZscpcWWheNFahG5O06DJyIi0Wl8PbFncTzO1LfgbEOLU+oAscYQ3YoBEBERScZQreODE9YYIlM4BEZERG6NNYbIFAZARETktlhjiMxhAERERG6LNYbIHAZARETktlhjiMxhAERERG6LNYbIHAZARETk1lhjiEzhNHgiInJrrqgxRPLDAIiIiBTBGTWGSL4
YABERkRFWTCYlYABEREQAWDGZlIVJ0EREBIAVk0lZGAARERErJpPiiB4Abdu2DREREfDx8UFCQgJKS0vN7nvs2DHMnz8fERERUKlUyMvL67bP888/D5VKZfQYOXKkE6+AiEj+WDGZlEbUAGjfvn3IzMxEdnY2ysvLMW7cOCQnJ+PKlSsm929tbUVkZCQ2bNiAkJAQs+f96U9/isuXLxsehw4dctYlEBG5BVZMJqURNQB66aWXsGTJEmRkZGDUqFHYvn07fH198cYbb5jcf+LEifjtb3+LBQsWwNvb2+x5+/Tpg5CQEMNDq9Wa3ZeIiFgxmZRHtACovb0dZWVlSEpK+rExajWSkpJQUlJi17lPnjyJsLAwREZG4sEHH0R1dXWP+7e1tUGn0xk9iIiUhhWTSUlEmwZfX1+Pjo4OBAcHG20PDg7Gt99+a/N5ExISsGvXLowYMQKXL19GTk4Opk6dim+++Qb9+/c3eUxubi5ycnJsfk8iInfAismkJG5XB+jOO+80/Hvs2LFISEjAkCFD8Kc//QmLFy82eczq1auRmZlpeK7T6RAeHu70thIRSRErJpMSiBYAabVaeHh4oLa21mh7bW1tjwnO1howYACGDx+Oqqoqs/t4e3v3mFNERETWE7uitNjvL1dK+dxEC4C8vLwQGxuLoqIizJs3DwCg1+tRVFSEJ554wmHv09zcjFOnTmHRokUOOycREZkndkVpsd9frpT2uYk6CywzMxM7duzA7t27ceLECTz66KNoaWlBRkYGACAtLQ2rV6827N/e3o6KigpUVFSgvb0dFy9eREVFhVHvzooVK3Dw4EGcPXsWn332GX72s5/Bw8MDCxcudPn1EREpkdgVpcV+f7lS2ucmag5Qamoq6urqkJWVhZqaGsTExKCwsNCQGF1dXQ21+scY7dKlSxg//sfZCJs3b8bmzZsxffp0FBcXAwAuXLiAhQsXoqGhAYGBgZgyZQr+9a9/ITAw0KXXRkSkRJ0Vpbu6taK0M4dVxH5/uVLi5yZ6EvQTTzxhdsirM6jpFBERAaFLmfau9u7d66imERGRlSypKO3ML1Kx31+ulPi5ib4UBhERuQ+xK0qL/f5ypcTPjQEQERE5jNgVpcV+f7lS4ufGAIiIiBzKnorSp+uacaDySrfV581tN/f+428bYPX7W/Me7khplcBVQm9JNQqk0+mg0WjQ1NQEf39/sZtDRCRL1lSUNjcFe/28n2LNe8csnppt6jwTIwLwetpEs1O5lTb9uzdyrgRuzfc3AyATGAAREblWWn4pDlfVo+OWryQPlQr+fftA9/2NbtsnR2mxZ3G8xecxt7+tx5A0WfP9zSEwIiISVecU7I4uf493CAKutl43ub1zaral5zG1v63HkHtgAERERKLqbQq2OWcbjIMTS6ZyW/vepo4h98AAiIiIRNXbFGxzuk7NtmUqtxKnf9NNDICIiEhUPU3BDvD1tHhqti1TuZU4/ZtuYgBERESiMzcF+/3Hp1g1NduWqdzuMP1b6VP4bcFZYCZwFhgRkTjMTcG2dmq2LVO55Tj9m1P4jXEavJ0YABERkRxwCr8xToMnIiJyc5zCbx8GQERERDLEKfz2YQBEREQkQ5zCbx8GQERERDLEKfz2YQBERKRQnDotf+4whV8sfcRuABERuRanTrsPja8n9iyOl+UUfrGxB4iISGGWFVTgcFW90bbDVfVYWnBUpBaRvYZq+2HmiCAGP1ZgAEREpCCcOk10EwMgIiIF4dRpopsYABERKQinThPdxACIiEhBOHWa6CYGQERECsOp00ScBk9EpDicOk3EAIiISLGGahn4kHJxCIyIiIgUhwEQERERKQ4DICIiIlIcBkBERESkOAyAiIiISHFED4C2bduGiIgI+Pj4ICEhAaWlpWb3PXbsGObPn4+IiAioVCrk5eXZfU4iIiJSHlEDoH379iEzMxPZ2dkoLy/HuHHjkJycjCtXrpjcv7W1FZGRkdiwYQNCQkIcck4
iIiJSHpUgdFkS2IUSEhIwceJEbN26FQCg1+sRHh6OpUuXYtWqVT0eGxERgeXLl2P58uV2n7OtrQ1tbW2G5zqdDuHh4WhqaoK/v78dV0hERESuotPpoNFoLPr+Fq0HqL29HWVlZUhKSvqxMWo1kpKSUFJS4tJz5ubmQqPRGB7h4eE2vT8RERHJg2gBUH19PTo6OhAcHGy0PTg4GDU1NS495+rVq9HU1GR4nD9/3qb3JyIiInngUhgAvL294e3tLXYziIiIyEVE6wHSarXw8PBAbW2t0fba2lqzCc5inJOIiIjcj2gBkJeXF2JjY1FUVGTYptfrUVRUhMTERMmck4iIiNyPqENgmZmZSE9PR1xcHOLj45GXl4eWlhZkZGQAANLS0jB48GDk5uYCuJnkfPz4ccO/L168iIqKCvj5+SEqKsqicxIRERGJGgClpqairq4OWVlZqKmpQUxMDAoLCw1JzNXV1VCrf+ykunTpEsaPH294vnnzZmzevBnTp09HcXGxReckIiIiErUOkFRZU0eAiIiIpEEWdYCIiIiIxMIAiIiIiBSHARAREREpDgMgIiIiUhwGQERERKQ4DICIiIhIcRgAERERkeIwACIiIiLFYQBEREREisMAiIiIiBSHARAREREpDgMgIiIiUhwGQERERKQ4DICIiIhIcRgAERERkeIwACIiIiLF6SN2A4iIiEg5Ttc149x3rYgY1A9Dtf1EawcDICIiInK6xtZ2LCuowKcn6wzbpkUHYsvC8dD4erq8PRwCIyIiIqdbVlCBw1X1RtsOV9VjacFRUdrDAIiIiIic6nRdMz49WYcOQTDa3iEI+PRkHc7Ut7i8TQyAiIiIyKnOfdfa4+tnGxgAERERkZsZMtC3x9cjBrk+GZoBEBERETlVZKAfpkUHwkOlMtruoVJhWnSgKLPBGAARERGR021ZOB6To7RG2yZHabFl4XhR2sNp8EREROR0Gl9P7FkcjzP1LTjb0MI6QERERKQcQ7XiBj6dOARGREREisMAiIiIiBSHARAREREpDgMgIiIiUhxJBEDbtm1DREQEfHx8kJCQgNLS0h73//Of/4yRI0fCx8cHY8aMwf79+41ef/jhh6FSqYweKSkpzrwEIiIikhHRA6B9+/YhMzMT2dnZKC8vx7hx45CcnIwrV66Y3P+zzz7DwoULsXjxYhw9ehTz5s3DvHnz8M033xjtl5KSgsuXLxseBQUFrrgcIiIikgGVIHRZmczFEhISMHHiRGzduhUAoNfrER4ejqVLl2LVqlXd9k9NTUVLSwv+/ve/G7bdcccdiImJwfbt2wHc7AFqbGzEe++9Z1ObdDodNBoNmpqa4O/vb9M5iIiIyLWs+f4WtQeovb0dZWVlSEpKMmxTq9VISkpCSUmJyWNKSkqM9geA5OTkbvsXFxcjKCgII0aMwKOPPoqGhgaz7Whra4NOpzN6EBERkfsSNQCqr69HR0cHgoODjbYHBwejpqbG5DE1NTW97p+SkoI9e/agqKgIGzduxMGDB3HnnXeio6PD5Dlzc3Oh0WgMj/DwcDuvjIiIiKTMLStBL1iwwPDvMWPGYOzYsRg2bBiKi4sxe/bsbvuvXr0amZmZhuc6nY5BEBERkRsTNQDSarXw8PBAbW2t0fba2lqEhISYPCYkJMSq/QEgMjISWq0WVVVVJgMgb29veHt7G553pkVxKIyIiEg+Or+3LUlvFjUA8vLyQmxsLIqKijBv3jwAN5Ogi4qK8MQTT5g8JjExEUVFRVi+fLlh2z/+8Q8kJiaafZ8LFy6goaEBoaGhFrXr2rVrAMBeICIiIhm6du0aNBpNj/uIPgSWmZmJ9PR0xMXFIT4+Hnl5eWhpaUFGRgYAIC0tDYMHD0Zubi4A4Mknn8T06dPx4osv4u6778bevXvxxRdf4A9/+AMAoLm5GTk5OZg/fz5CQkJw6tQprFy5ElFRUUhOTraoTWFhYTh//jz69+8PlUrlnAtXmM5hxfPnz3NmnQT
wfkgL74e08H5IizX3QxAEXLt2DWFhYb2eV/QAKDU1FXV1dcjKykJNTQ1iYmJQWFhoSHSurq6GWv1jrvakSZPw9ttvY82aNXjmmWcQHR2N9957D6NHjwYAeHh44KuvvsLu3bvR2NiIsLAwzJ07F+vWrTMa5uqJWq3GT37yE8dfLMHf35//oUgI74e08H5IC++HtFh6P3rr+ekkeh0gUgbWVpIW3g9p4f2QFt4PaXHW/RC9EjQRERGRqzEAIpfw9vZGdna2xcOQ5Fy8H9LC+yEtvB/S4qz7wSEwIiIiUhz2ABEREZHiMAAiIiIixWEARERERIrDAIiIiIgUhwEQOcy2bdsQEREBHx8fJCQkoLS01KLj9u7dC5VKZVgOhRzDmvuxa9cuqFQqo4ePj48LW+v+rP39aGxsxOOPP47Q0FB4e3tj+PDh2L9/v4ta6/6suR8zZszo9vuhUqlw9913u7DF7s3a34+8vDyMGDECffv2RXh4OJ566in88MMP1r2pQOQAe/fuFby8vIQ33nhDOHbsmLBkyRJhwIABQm1tbY/HnTlzRhg8eLAwdepU4b777nNNYxXA2vuxc+dOwd/fX7h8+bLhUVNT4+JWuy9r70dbW5sQFxcn3HXXXcKhQ4eEM2fOCMXFxUJFRYWLW+6erL0fDQ0NRr8b33zzjeDh4SHs3LnTtQ13U9bej7feekvw9vYW3nrrLeHMmTPChx9+KISGhgpPPfWUVe/LAIgcIj4+Xnj88ccNzzs6OoSwsDAhNzfX7DE3btwQJk2aJLz++utCeno6AyAHsvZ+7Ny5U9BoNC5qnfJYez9effVVITIyUmhvb3dVExXFlv+vbvXyyy8L/fv3F5qbm53VREWx9n48/vjjwqxZs4y2ZWZmCpMnT7bqfTkERnZrb29HWVkZkpKSDNvUajWSkpJQUlJi9ri1a9ciKCgIixcvdkUzFcPW+9Hc3IwhQ4YgPDwc9913H44dO+aK5ro9W+7H+++/j8TERDz++OMIDg7G6NGj8Zvf/AYdHR2uarbbsvX341b5+flYsGAB+vXr56xmKoYt92PSpEkoKyszDJOdPn0a+/fvx1133WXVe4u+GCrJX319PTo6OgwL2HYKDg7Gt99+a/KYQ4cOIT8/HxUVFS5oobLYcj9GjBiBN954A2PHjkVTUxM2b96MSZMm4dixY1wY2E623I/Tp0/jk08+wYMPPoj9+/ejqqoKjz32GK5fv47s7GxXNNtt2XI/blVaWopvvvkG+fn5zmqiothyP/7rv/4L9fX1mDJlCgRBwI0bN/CrX/0KzzzzjFXvzR4gcrlr165h0aJF2LFjB7RardjNIQCJiYlIS0tDTEwMpk+fjnfffReBgYF47bXXxG6aIun1egQFBeEPf/gDYmNjkZqaimeffRbbt28Xu2mKl5+fjzFjxiA+Pl7spihWcXExfvOb3+D3v/89ysvL8e677+J///d/sW7dOqvOwx4gsptWq4WHhwdqa2uNttfW1iIkJKTb/qdOncLZs2dxzz33GLbp9XoAQJ8+fVBZWYlhw4Y5t9FuzNr7YYqnpyfGjx+PqqoqZzRRUWy5H6GhofD09ISHh4dh2+23346amhq0t7fDy8vLqW12Z/b8frS0tGDv3r1Yu3atM5uoKLbcj+eeew6LFi3CL3/5SwDAmDFj0NLSgkceeQTPPvss1GrL+nbYA0R28/LyQmxsLIqKigzb9Ho9ioqKkJiY2G3/kSNH4uuvv0ZFRYXhce+992LmzJmoqKhAeHi4K5vvdqy9H6Z0dHTg66+/RmhoqLOaqRi23I/JkyejqqrK8IcBAPz73/9GaGgogx872fP78ec//xltbW146KGHnN1MxbDlfrS2tnYLcjr/WBCsWd7U2mxtIlP27t0reHt7C7t27RKOHz8uPPLII8KAAQMMU6kXLVokrFq1yuzxnAXmWNbej5ycHOHDDz8UTp06JZSVlQkLFiwQfHx8hGPHjol1CW7F2vtRXV0t9O/
fX3jiiSeEyspK4e9//7sQFBQkrF+/XqxLcCu2/n81ZcoUITU11dXNdXvW3o/s7Gyhf//+QkFBgXD69Gnho48+EoYNGyY88MADVr0vh8DIIVJTU1FXV4esrCzU1NQgJiYGhYWFhsS26upqi7slyX7W3o+rV69iyZIlqKmpQUBAAGJjY/HZZ59h1KhRYl2CW7H2foSHh+PDDz/EU089hbFjx2Lw4MF48skn8fTTT4t1CW7Flv+vKisrcejQIXz00UdiNNmtWXs/1qxZA5VKhTVr1uDixYsIDAzEPffcgxdeeMGq91UJgjX9RURERETyxz/JiYiISHEYABEREZHiMAAiIiIixWEARERERIrDAIiIiIgUhwEQERERKQ4DICIiIlIcBkBERESkOAyAiEhRVCoV3nvvPZe818MPP4x58+a55L2IyDoMgIjIJerq6vDoo4/itttug7e3N0JCQpCcnIzDhw+L2i6VSmV4+Pv7Y+LEifjrX/9q1TnOnj0LlUqFiooKo+2/+93vsGvXLsc1logchmuBEZFLzJ8/H+3t7di9ezciIyNRW1uLoqIiNDQ0iN007Ny5EykpKdDpdPj973+Pn//85ygvL8eYMWPsOq9Go3FQC4nI0dgDRERO19jYiH/+85/YuHEjZs6ciSFDhiA+Ph6rV6/Gvffea9jnl7/8JQIDA+Hv749Zs2bhyy+/NJzj+eefR0xMDF577TWEh4fD19cXDzzwAJqamgz7HDlyBHPmzIFWq4VGo8H06dNRXl7ea/sGDBiAkJAQDB8+HOvWrcONGzdw4MABw+uFhYWYMmUKBgwYgEGDBuE//uM/cOrUKcPrQ4cOBQCMHz8eKpUKM2bMANB9CKytrQ3Lli1DUFAQfHx8MGXKFBw5csSmz5SI7MMAiIiczs/PD35+fnjvvffQ1tZmcp///M//xJUrV/DBBx+grKwMEyZMwOzZs/Hdd98Z9qmqqsKf/vQn/O1vf0NhYSGOHj2Kxx57zPD6tWvXkJ6ejkOHDuFf//oXoqOjcdddd+HatWsWtfPGjRvIz88HAHh5eRm2t7S0IDMzE1988QWKioqgVqvxs5/9DHq9HgBQWloKAPj4449x+fJlvPvuuybPv3LlSvzP//wPdu/ejfLyckRFRSE5OdnoGonIRQQiIhd45513hICAAMHHx0eYNGmSsHr1auHLL78UBEEQ/vnPfwr+/v7CDz/8YHTMsGHDhNdee00QBEHIzs4WPDw8hAsXLhhe/+CDDwS1Wi1cvnzZ5Ht2dHQI/fv3F/72t78ZtgEQ/vKXvxg99/HxEfr16yeo1WoBgBARESE0NDSYvZa6ujoBgPD1118LgiAIZ86cEQAIR48eNdovPT1duO+++wRBEITm5mbB09NTeOuttwyvt7e3C2FhYcKmTZvMvhcROQd7gIjIJebPn49Lly7h/fffR0pKCoqLizFhwgTs2rULX375JZqbmzFo0CBDb5Gfnx/OnDljNNR02223YfDgwYbniYmJ0Ov1qKysBADU1tZiyZIliI6Ohkajgb+/P5qbm1FdXd1j215++WVUVFTggw8+wKhRo/D6669j4MCBhtdPnjyJhQsXIjIyEv7+/oiIiACAXs97q1OnTuH69euYPHmyYZunpyfi4+Nx4sQJi89DRI7BJGgichkfHx/MmTMHc+bMwXPPPYdf/vKXyM7OxmOPPYbQ0FAUFxd3O2bAgAEWnz89PR0NDQ343e9+hyFDhsDb2xuJiYlob2/v8biQkBBERUUhKioKO3fuxF133YXjx48jKCgIAHDPPfdgyJAh2LFjB8LCwqDX6zF69Ohez0tE0sUeICISzahRo9DS0oIJEyagpqYGffr0MQQinQ+tVmvYv7q6GpcuXTI8/9e//gW1Wo0RI0YAAA4fPoxly5bhrrvuwk9/+lN4e3ujvr7eqjbFx8cjNjYWL7zwAgCgoaEBlZWVWLNmDWbPno3bb78dV69eNTqmM1+oo6PD7HmHDRsGLy8vo2n
/169fx5EjRzBq1Cir2khE9mMARERO19DQgFmzZuGPf/wjvvrqK5w5cwZ//vOfsWnTJtx3331ISkpCYmIi5s2bh48++ghnz57FZ599hmeffRZffPGF4Tw+Pj5IT0/Hl19+iX/+859YtmwZHnjgAYSEhAAAoqOj8eabb+LEiRP4/PPP8eCDD6Jv375Wt3f58uV47bXXcPHiRQQEBGDQoEH4wx/+gKqqKnzyySfIzMw02j8oKAh9+/ZFYWEhamtrjWamderXrx8effRR/Pd//zcKCwtx/PhxLFmyBK2trVi8eLHVbSQi+zAAIiKn8/PzQ0JCAl5++WVMmzYNo0ePxnPPPYclS5Zg69atUKlU2L9/P6ZNm4aMjAwMHz4cCxYswLlz5xAcHGw4T1RUFO6//37cddddmDt3LsaOHYvf//73htfz8/Nx9epVTJgwAYsWLTJMObdWSkoKhg4dihdeeAFqtRp79+5FWVkZRo8ejaeeegq//e1vjfbv06cPXnnlFbz22msICwvDfffdZ/K8GzZswPz587Fo0SJMmDABVVVV+PDDDxEQEGB1G4nIPipBEASxG0FE1Jvnn38e7733Xrdqy0REtmAPEBERESkOAyAiIiJSHA6BERERkeKwB4iIiIgUhwEQERERKQ4DICIiIlIcBkBERESkOAyAiIiISHEYABEREZHiMAAiIiIixWEARERERIrz/wHNzCQB3MMxSwAAAABJRU5ErkJggg==", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "(\n", " iris.query(\"SepalLength > 5\")\n", " .assign(\n", " SepalRatio=lambda x: x.SepalWidth / x.SepalLength,\n", " PetalRatio=lambda x: x.PetalWidth / x.PetalLength,\n", " )\n", " .plot(kind=\"scatter\", x=\"SepalRatio\", y=\"PetalRatio\")\n", ")" ] }, { "cell_type": "markdown", "id": "7e1e3e3d", "metadata": {}, "source": [ "Since a function is passed in, it is evaluated on the DataFrame being assigned to. Importantly, this is the DataFrame that has already been filtered to the rows with sepal length greater than 5: the filtering happens first, and the ratio calculations second. This is useful here because no variable ever refers to the filtered DataFrame.\n", "\n", "The function signature for `assign()` is simply `**kwargs`. The keys are the column names for the new fields, and the values are either a value to be inserted (for example, a `Series` or NumPy array) or a function of one argument to be called on the `DataFrame`. A copy of the original `DataFrame` is returned with the new values inserted; the original is left unchanged.\n", "\n", "The order of `**kwargs` is preserved. This allows for dependent assignment, where an expression later in `**kwargs` can refer to a column created earlier in the same `assign()`." ] }, { "cell_type": "code", "execution_count": null, "id": "60b7e3c7", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "dfa = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})" ] }, { "cell_type": "code", "execution_count": null, "id": "4c821875", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " A B C D\n", "0 1 4 5 6\n", "1 2 5 7 9\n", "2 3 6 9 12" ] }, "execution_count": 88, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfa.assign(C=lambda x: x[\"A\"] + x[\"B\"], D=lambda x: x[\"A\"] + x[\"C\"])" ] }, { "cell_type": "markdown", "id": "822c6838", "metadata": {}, "source": [ "In the second expression, `x['C']` refers to the newly created column, which is equal to `dfa['A'] + dfa['B']`.\n", "\n", "#### Indexing / selection\n", "\n", "The basics of indexing are as follows:\n", "\n", "|Operation |Syntax |Result |\n", "|:------- |:----- |:----- |\n", "|Select column |`df[col]` |Series |\n", "|Select row by label |`df.loc[label]`|Series |\n", "|Select row by integer location|`df.iloc[loc]` |Series |\n", "|Slice rows |`df[5:10]` |DataFrame|\n", "|Select rows by boolean vector |`df[bool_vec]` |DataFrame|\n", "\n", "Row selection, for example, returns a `Series` whose index consists of the column labels of the `DataFrame`:" ] }, { "cell_type": "code", "execution_count": null, "id": "82154750", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "one 2.0\n", "bar 2.0\n", "flag False\n", "foo bar\n", "one_trunc 2.0\n", "Name: b, dtype: object" ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[\"b\"]" ] }, { "cell_type": "code", "execution_count": null, "id": "743d6893-bbf3-4fbf-a158-a3aaae040b39", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "
\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "2fae006c", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "one 3.0\n", "bar 3.0\n", "flag True\n", "foo bar\n", "one_trunc NaN\n", "Name: c, dtype: object" ] }, "execution_count": 91, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.iloc[2]" ] }, { "cell_type": "markdown", "id": "87fe370b", "metadata": {}, "source": [ "#### Data alignment and arithmetic\n", "\n", "Operations between `DataFrame` objects automatically align on **both the columns and the index (row labels)**. Again, the resulting object will have the union of the column and row labels; positions whose label is missing from either operand are filled with `NaN`." ] }, { "cell_type": "code", "execution_count": null, "id": "a3e29475", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "df = pd.DataFrame(np.random.randn(10, 4), columns=[\"A\", \"B\", \"C\", \"D\"])" ] }, { "cell_type": "code", "execution_count": null, "id": "c4634479", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "df2 = pd.DataFrame(np.random.randn(7, 3), columns=[\"A\", \"B\", \"C\"])" ] }, { "cell_type": "code", "execution_count": null, "id": "09eb77aa", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " A B C D\n", "0 -2.948481 -0.423878 -0.607636 NaN\n", "1 -1.920105 2.541998 1.177187 NaN\n", "2 0.810385 -1.458396 1.420052 NaN\n", "3 1.405297 -0.623525 -0.060302 NaN\n", "4 -0.360174 0.887550 -0.152828 NaN\n", "5 3.116284 -0.400383 -1.851961 NaN\n", "6 -0.010616 1.174856 -1.748294 NaN\n", "7 NaN NaN NaN NaN\n", "8 NaN NaN NaN NaN\n", "9 NaN NaN NaN NaN" ] }, "execution_count": 94, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df + df2" ] }, { "cell_type": "markdown", "id": "9062570a", "metadata": {}, "source": [ "When doing an operation between `DataFrame` and `Series`, the default behavior is to align the `Series` **index** on the `DataFrame` **columns**, thus broadcasting row-wise. For example:" ] }, { "cell_type": "code", "execution_count": null, "id": "c2a8adda", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " A B C D\n", "0 0.000000 0.000000 0.000000 0.000000\n", "1 -0.093122 2.367471 0.509443 0.995821\n", "2 2.251086 -1.823386 0.955111 -0.083409\n", "3 2.351556 0.336914 0.313888 0.777614\n", "4 1.492720 -0.146124 1.660733 2.685753\n", "5 3.595088 0.013952 -1.699945 -1.258807\n", "6 2.598365 0.038556 -1.147302 1.108468\n", "7 2.161150 -1.110538 1.713914 -0.157944\n", "8 3.347460 -0.744086 0.674301 -0.147782\n", "9 2.715835 2.454629 0.698974 1.240166" ] }, "execution_count": 95, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df - df.iloc[0]" ] }, { "cell_type": "markdown", "id": "cf0c0013", "metadata": {}, "source": [ "Arithmetic operations with scalars operate element-wise:" ] }, { "cell_type": "code", "execution_count": null, "id": "d4cc4904", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " A B C D\n", "0 -6.561849 1.475187 -1.293874 -3.191283\n", "1 -7.027457 13.312542 1.253340 1.787824\n", "2 4.693579 -7.641741 3.481679 -3.608329\n", "3 5.195929 3.159756 0.275566 0.696787\n", "4 0.901749 0.744567 7.009793 10.237485\n", "5 11.413589 1.544948 -9.793598 -9.485320\n", "6 6.429977 1.667968 -7.030385 2.351056\n", "7 4.243902 -4.077504 7.275695 -3.981001\n", "8 10.175453 -2.245243 2.077633 -3.930195\n", "9 7.017324 13.748331 2.200998 3.009549" ] }, "execution_count": 96, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df * 5 + 2" ] }, { "cell_type": "code", "execution_count": null, "id": "131ec689", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " A B C D\n", "0 -0.583986 -9.527211 -1.517969 -0.963153\n", "1 -0.553866 0.441987 -6.696488 -23.565357\n", "2 1.856267 -0.518579 3.374549 -0.891531\n", "3 1.564491 4.311251 -2.899503 -3.836671\n", "4 -4.552692 -3.982688 0.998045 0.606981\n", "5 0.531147 -10.987746 -0.423959 -0.435338\n", "6 1.128674 -15.058810 -0.553686 14.242736\n", "7 2.228261 -0.822706 0.947742 -0.835980\n", "8 0.611587 -1.177789 64.405384 -0.843143\n", "9 0.996547 0.425592 24.875922 4.952705" ] }, "execution_count": 97, "metadata": {}, "output_type": "execute_result" } ], "source": [ "1 / df" ] }, { "cell_type": "code", "execution_count": null, "id": "a2d50c6f", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " A B C D\n", "0 8.597859 0.000121 1.883423e-01 1.162034\n", "1 10.626291 26.203594 4.972921e-04 0.000003\n", "2 0.084225 13.827439 7.711465e-03 1.582902\n", "3 0.166920 0.002895 1.414835e-02 0.004615\n", "4 0.002328 0.003975 1.007858e+00 7.367132\n", "5 12.564376 0.000069 3.095318e+01 27.841482\n", "6 0.616206 0.000019 1.064009e+01 0.000024\n", "7 0.040564 2.182835 1.239478e+00 2.047461\n", "8 7.147716 0.519673 5.811809e-08 1.978773\n", "9 1.013931 30.480685 2.611459e-06 0.001662" ] }, "execution_count": 98, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df ** 4" ] }, { "cell_type": "markdown", "id": "ab0cc5cb", "metadata": {}, "source": [ "Boolean operators operate element-wise as well:" ] }, { "cell_type": "code", "execution_count": null, "id": "edbec52a", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "df1 = pd.DataFrame({\"a\": [1, 0, 1], \"b\": [0, 1, 1]}, dtype=bool)" ] }, { "cell_type": "code", "execution_count": null, "id": "727cd263", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "df2 = pd.DataFrame({\"a\": [0, 1, 1], \"b\": [1, 1, 0]}, dtype=bool)" ] }, { "cell_type": "code", "execution_count": null, "id": "523bbe29", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " a b\n", "0 False False\n", "1 False True\n", "2 True False" ] }, "execution_count": 101, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 & df2" ] }, { "cell_type": "code", "execution_count": null, "id": "b1a355fc", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " a b\n", "0 True True\n", "1 True True\n", "2 True True" ] }, "execution_count": 102, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 | df2" ] }, { "cell_type": "code", "execution_count": null, "id": "e89dc58b", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " a b\n", "0 True True\n", "1 True False\n", "2 False True" ] }, "execution_count": 103, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 ^ df2" ] }, { "cell_type": "code", "execution_count": null, "id": "9b438ef3", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " a b\n", "0 False True\n", "1 True False\n", "2 False False" ] }, "execution_count": 104, "metadata": {}, "output_type": "execute_result" } ], "source": [ "~df1" ] }, { "cell_type": "markdown", "id": "31d38eb7", "metadata": {}, "source": [ "#### Transposing\n", "\n", "To transpose, access the `T` attribute or call the `DataFrame.transpose()` method, similar to an ndarray:" ] }, { "cell_type": "code", "execution_count": null, "id": "84f274b9", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " 0 1 2 3 4\n", "A -1.712370 -1.805491 0.538716 0.639186 -0.219650\n", "B -0.104963 2.262508 -1.928348 0.231951 -0.251087\n", "C -0.658775 -0.149332 0.296336 -0.344887 1.001959\n", "D -1.038257 -0.042435 -1.121666 -0.260643 1.647497" ] }, "execution_count": 105, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[:5].T" ] }, { "cell_type": "markdown", "id": "20c81c1c", "metadata": {}, "source": [ "## Data indexing and selection\n", "\n", "The axis labeling information in Pandas objects serves many purposes:\n", "\n", "- Identifies data (i.e. provides metadata) using known indicators, important for analysis, visualization, and interactive console display.\n", "- Enables automatic and explicit data alignment.\n", "- Allows intuitive getting and setting of subsets of the data set.\n", "\n", "In this section, we will focus on the final point: namely, how to slice, dice, and generally get and set subsets of Pandas objects. The primary focus will be on Series and DataFrame, as they have received more development attention in this area.\n", "\n", ":::{note}\n", "The Python and NumPy indexing operators `[]` and attribute operator `.` provide quick and easy access to Pandas data structures across a wide range of use cases. This makes interactive work intuitive, as there's little new to learn if you already know how to deal with Python dictionaries and NumPy arrays. However, since the type of the data to be accessed isn't known in advance, directly using standard operators has some optimization limits. For production code, we recommend that you take advantage of the optimized Pandas data access methods described in this section.\n", ":::" ] }, { "cell_type": "markdown", "id": "5f5b68a0-0590-48bc-8129-c36c6faf57db", "metadata": { "attributes": { "classes": [ "warning" ], "id": "" } }, "source": [ "Whether a copy or a reference is returned for a setting operation may depend on the context. 
This is sometimes called `chained assignment` and should be avoided." ] }, { "cell_type": "markdown", "id": "cbdec733", "metadata": {}, "source": [ "### Different choices for indexing\n", "\n", "Object selection has had a number of user-requested additions in order to support more explicit location-based indexing. Pandas now supports three types of multi-axis indexing.\n", "\n", "- `.loc` is primarily label based, but may also be used with a boolean array. `.loc` will raise `KeyError` when the items are not found. Allowed inputs are:\n", "    - A single label, e.g. `5` or `'a'` (note that `5` is interpreted as a *label* of the index, not as an integer position along the index).\n", "    - A list or array of labels `['a', 'b', 'c']`.\n", "    - A slice object with labels `'a':'f'` (note that, contrary to usual Python slices, both the start and the stop are included when present in the index).\n", "    - A boolean array (any `NA` values will be treated as `False`).\n", "    - A `callable` function with one argument (the calling Series or DataFrame) that returns valid output for indexing (one of the above).\n", "\n", "- `.iloc` is primarily integer position based (from `0` to `length-1` of the axis), but may also be used with a boolean array. `.iloc` will raise `IndexError` if a requested indexer is out-of-bounds, except for slice indexers, which allow out-of-bounds indexing (this conforms with Python/NumPy slice semantics). Allowed inputs are:\n", "    - An integer, e.g. `5`.\n", "    - A list or array of integers `[4, 3, 0]`.\n", "    - A slice object with ints `1:7`.\n", "    - A boolean array (any `NA` values will be treated as `False`).\n", "    - A `callable` function with one argument (the calling Series or DataFrame) that returns valid output for indexing (one of the above).\n", "\n", "- `.loc`, `.iloc`, and also `[]` indexing can accept a `callable` as indexer.\n", "\n", "Getting values from an object with multi-axes selection uses the following notation (using `.loc` as an example, but the following applies to `.iloc` as well). Any of the axes accessors may be the null slice `:`. Axes left out of the specification are assumed to be `:`, e.g. `df.loc['a']` is equivalent to `df.loc['a', :]`.\n", "\n", "|**Object Type**|**Indexers** |\n", "|:-- |:- |\n", "|Series |`s.loc[indexer]` |\n", "|DataFrame |`df.loc[row_indexer, column_indexer]`|\n", "\n", "### Basics\n", "\n", "As mentioned when introducing the data structures in the last section, the primary function of indexing with `[]` (a.k.a. `__getitem__`, for those familiar with implementing class behavior in Python) is selecting out lower-dimensional slices. The following table shows the return type when indexing Pandas objects with `[]`:\n", "\n", "|**Object Type**|**Selection** |**Return Value Type** |\n", "|:- |:- |:- |\n", "|Series |`series[label]` |scalar value |\n", "|DataFrame |`frame[colname]`|`Series` corresponding to colname|\n", "\n", "Here we construct a simple time series data set to use for illustrating the indexing functionality:" ] }, { "cell_type": "code", "execution_count": null, "id": "12d39083", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\"><th></th><th>A</th><th>B</th><th>C</th><th>D</th></tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr><th>2000-01-01</th><td>-1.003075</td><td>-1.068915</td><td>1.270093</td><td>1.131469</td></tr>\n",
"    <tr><th>2000-01-02</th><td>0.508569</td><td>-0.324633</td><td>-2.092349</td><td>-0.550827</td></tr>\n",
"    <tr><th>2000-01-03</th><td>0.762823</td><td>-0.897289</td><td>-2.043889</td><td>-1.096294</td></tr>\n",
"    <tr><th>2000-01-04</th><td>0.522446</td><td>2.152613</td><td>1.639017</td><td>0.314416</td></tr>\n",
"    <tr><th>2000-01-05</th><td>-0.302838</td><td>0.915904</td><td>0.803904</td><td>-1.231580</td></tr>\n",
"    <tr><th>2000-01-06</th><td>-0.834977</td><td>-0.800550</td><td>0.390671</td><td>-0.679977</td></tr>\n",
"    <tr><th>2000-01-07</th><td>-1.556795</td><td>-0.502958</td><td>-1.239671</td><td>0.730893</td></tr>\n",
"    <tr><th>2000-01-08</th><td>-0.424052</td><td>1.055065</td><td>0.900078</td><td>3.551748</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " A B C D\n", "2000-01-01 -1.003075 -1.068915 1.270093 1.131469\n", "2000-01-02 0.508569 -0.324633 -2.092349 -0.550827\n", "2000-01-03 0.762823 -0.897289 -2.043889 -1.096294\n", "2000-01-04 0.522446 2.152613 1.639017 0.314416\n", "2000-01-05 -0.302838 0.915904 0.803904 -1.231580\n", "2000-01-06 -0.834977 -0.800550 0.390671 -0.679977\n", "2000-01-07 -1.556795 -0.502958 -1.239671 0.730893\n", "2000-01-08 -0.424052 1.055065 0.900078 3.551748" ] }, "execution_count": 106, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dates = pd.date_range('1/1/2000', periods=8)\n", "df = pd.DataFrame(np.random.randn(8, 4),\n", " index=dates, columns=['A', 'B', 'C', 'D'])\n", "df" ] },
{ "cell_type": "markdown", "id": "da294328", "metadata": {}, "source": [ ":::{note}\n", "None of the indexing functionality is time-series specific unless explicitly stated.\n", ":::\n", "\n", "With this data set, the most basic form of indexing uses `[]`:" ] },
{ "cell_type": "code", "execution_count": null, "id": "1eee749c", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "-0.8349774413616455" ] }, "execution_count": 107, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s = df['A']\n", "\n", "s[dates[5]]" ] },
{ "cell_type": "markdown", "id": "9c672552", "metadata": {}, "source": [ "You can pass a list of columns to `[]` to select columns in that order. If a column is not contained in the DataFrame, a `KeyError` is raised. Multiple columns can also be set in this manner:" ] },
{ "cell_type": "code", "execution_count": null, "id": "5a18bcbc", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\"><th></th><th>A</th><th>B</th><th>C</th><th>D</th></tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr><th>2000-01-01</th><td>-1.003075</td><td>-1.068915</td><td>1.270093</td><td>1.131469</td></tr>\n",
"    <tr><th>2000-01-02</th><td>0.508569</td><td>-0.324633</td><td>-2.092349</td><td>-0.550827</td></tr>\n",
"    <tr><th>2000-01-03</th><td>0.762823</td><td>-0.897289</td><td>-2.043889</td><td>-1.096294</td></tr>\n",
"    <tr><th>2000-01-04</th><td>0.522446</td><td>2.152613</td><td>1.639017</td><td>0.314416</td></tr>\n",
"    <tr><th>2000-01-05</th><td>-0.302838</td><td>0.915904</td><td>0.803904</td><td>-1.231580</td></tr>\n",
"    <tr><th>2000-01-06</th><td>-0.834977</td><td>-0.800550</td><td>0.390671</td><td>-0.679977</td></tr>\n",
"    <tr><th>2000-01-07</th><td>-1.556795</td><td>-0.502958</td><td>-1.239671</td><td>0.730893</td></tr>\n",
"    <tr><th>2000-01-08</th><td>-0.424052</td><td>1.055065</td><td>0.900078</td><td>3.551748</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " A B C D\n", "2000-01-01 -1.003075 -1.068915 1.270093 1.131469\n", "2000-01-02 0.508569 -0.324633 -2.092349 -0.550827\n", "2000-01-03 0.762823 -0.897289 -2.043889 -1.096294\n", "2000-01-04 0.522446 2.152613 1.639017 0.314416\n", "2000-01-05 -0.302838 0.915904 0.803904 -1.231580\n", "2000-01-06 -0.834977 -0.800550 0.390671 -0.679977\n", "2000-01-07 -1.556795 -0.502958 -1.239671 0.730893\n", "2000-01-08 -0.424052 1.055065 0.900078 3.551748" ] }, "execution_count": 108, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": null, "id": "be2e73fe", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\"><th></th><th>A</th><th>B</th><th>C</th><th>D</th></tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr><th>2000-01-01</th><td>-1.068915</td><td>-1.003075</td><td>1.270093</td><td>1.131469</td></tr>\n",
"    <tr><th>2000-01-02</th><td>-0.324633</td><td>0.508569</td><td>-2.092349</td><td>-0.550827</td></tr>\n",
"    <tr><th>2000-01-03</th><td>-0.897289</td><td>0.762823</td><td>-2.043889</td><td>-1.096294</td></tr>\n",
"    <tr><th>2000-01-04</th><td>2.152613</td><td>0.522446</td><td>1.639017</td><td>0.314416</td></tr>\n",
"    <tr><th>2000-01-05</th><td>0.915904</td><td>-0.302838</td><td>0.803904</td><td>-1.231580</td></tr>\n",
"    <tr><th>2000-01-06</th><td>-0.800550</td><td>-0.834977</td><td>0.390671</td><td>-0.679977</td></tr>\n",
"    <tr><th>2000-01-07</th><td>-0.502958</td><td>-1.556795</td><td>-1.239671</td><td>0.730893</td></tr>\n",
"    <tr><th>2000-01-08</th><td>1.055065</td><td>-0.424052</td><td>0.900078</td><td>3.551748</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " A B C D\n", "2000-01-01 -1.068915 -1.003075 1.270093 1.131469\n", "2000-01-02 -0.324633 0.508569 -2.092349 -0.550827\n", "2000-01-03 -0.897289 0.762823 -2.043889 -1.096294\n", "2000-01-04 2.152613 0.522446 1.639017 0.314416\n", "2000-01-05 0.915904 -0.302838 0.803904 -1.231580\n", "2000-01-06 -0.800550 -0.834977 0.390671 -0.679977\n", "2000-01-07 -0.502958 -1.556795 -1.239671 0.730893\n", "2000-01-08 1.055065 -0.424052 0.900078 3.551748" ] }, "execution_count": 109, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[['B', 'A']] = df[['A', 'B']]\n", "df" ] },
{ "cell_type": "markdown", "id": "6e6cd9c9", "metadata": {}, "source": [ "You may find this useful for applying a transform (in-place) to a subset of the columns." ] },
{ "cell_type": "markdown", "id": "b9d41a7f-5d30-40e2-8508-83b4d08e1ef1", "metadata": { "attributes": { "classes": [ "warning" ], "id": "" } }, "source": [ "Pandas aligns all axes when setting `Series` and `DataFrame` objects from `.loc` and `.iloc`.\n", "\n", "As a result, `df.loc[:, ['B', 'A']] = df[['A', 'B']]` does not swap the two columns: the right-hand side is aligned on its column labels before the values are assigned, so each column receives its own values back." ] },
{ "cell_type": "code", "execution_count": null, "id": "4e8a2ee9", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\"><th></th><th>A</th><th>B</th></tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr><th>2000-01-01</th><td>-1.068915</td><td>-1.003075</td></tr>\n",
"    <tr><th>2000-01-02</th><td>-0.324633</td><td>0.508569</td></tr>\n",
"    <tr><th>2000-01-03</th><td>-0.897289</td><td>0.762823</td></tr>\n",
"    <tr><th>2000-01-04</th><td>2.152613</td><td>0.522446</td></tr>\n",
"    <tr><th>2000-01-05</th><td>0.915904</td><td>-0.302838</td></tr>\n",
"    <tr><th>2000-01-06</th><td>-0.800550</td><td>-0.834977</td></tr>\n",
"    <tr><th>2000-01-07</th><td>-0.502958</td><td>-1.556795</td></tr>\n",
"    <tr><th>2000-01-08</th><td>1.055065</td><td>-0.424052</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " A B\n", "2000-01-01 -1.068915 -1.003075\n", "2000-01-02 -0.324633 0.508569\n", "2000-01-03 -0.897289 0.762823\n", "2000-01-04 2.152613 0.522446\n", "2000-01-05 0.915904 -0.302838\n", "2000-01-06 -0.800550 -0.834977\n", "2000-01-07 -0.502958 -1.556795\n", "2000-01-08 1.055065 -0.424052" ] }, "execution_count": 110, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[['A', 'B']]" ] }, { "cell_type": "code", "execution_count": null, "id": "cf8c39ef", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\"><th></th><th>A</th><th>B</th></tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr><th>2000-01-01</th><td>-1.068915</td><td>-1.003075</td></tr>\n",
"    <tr><th>2000-01-02</th><td>-0.324633</td><td>0.508569</td></tr>\n",
"    <tr><th>2000-01-03</th><td>-0.897289</td><td>0.762823</td></tr>\n",
"    <tr><th>2000-01-04</th><td>2.152613</td><td>0.522446</td></tr>\n",
"    <tr><th>2000-01-05</th><td>0.915904</td><td>-0.302838</td></tr>\n",
"    <tr><th>2000-01-06</th><td>-0.800550</td><td>-0.834977</td></tr>\n",
"    <tr><th>2000-01-07</th><td>-0.502958</td><td>-1.556795</td></tr>\n",
"    <tr><th>2000-01-08</th><td>1.055065</td><td>-0.424052</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " A B\n", "2000-01-01 -1.068915 -1.003075\n", "2000-01-02 -0.324633 0.508569\n", "2000-01-03 -0.897289 0.762823\n", "2000-01-04 2.152613 0.522446\n", "2000-01-05 0.915904 -0.302838\n", "2000-01-06 -0.800550 -0.834977\n", "2000-01-07 -0.502958 -1.556795\n", "2000-01-08 1.055065 -0.424052" ] }, "execution_count": 111, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[:, ['B', 'A']] = df[['A', 'B']]\n", "df[['A', 'B']]" ] },
{ "cell_type": "markdown", "id": "4ed60d11-3f81-43b5-8274-4d896238b734", "metadata": { "attributes": { "classes": [ "warning" ], "id": "" } }, "source": [ "To swap column values with `.loc`, assign the raw values instead (for example via `.to_numpy()`), so that no label alignment takes place:" ] },
{ "cell_type": "code", "execution_count": null, "id": "da9754c5", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\"><th></th><th>A</th><th>B</th></tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr><th>2000-01-01</th><td>-1.003075</td><td>-1.068915</td></tr>\n",
"    <tr><th>2000-01-02</th><td>0.508569</td><td>-0.324633</td></tr>\n",
"    <tr><th>2000-01-03</th><td>0.762823</td><td>-0.897289</td></tr>\n",
"    <tr><th>2000-01-04</th><td>0.522446</td><td>2.152613</td></tr>\n",
"    <tr><th>2000-01-05</th><td>-0.302838</td><td>0.915904</td></tr>\n",
"    <tr><th>2000-01-06</th><td>-0.834977</td><td>-0.800550</td></tr>\n",
"    <tr><th>2000-01-07</th><td>-1.556795</td><td>-0.502958</td></tr>\n",
"    <tr><th>2000-01-08</th><td>-0.424052</td><td>1.055065</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " A B\n", "2000-01-01 -1.003075 -1.068915\n", "2000-01-02 0.508569 -0.324633\n", "2000-01-03 0.762823 -0.897289\n", "2000-01-04 0.522446 2.152613\n", "2000-01-05 -0.302838 0.915904\n", "2000-01-06 -0.834977 -0.800550\n", "2000-01-07 -1.556795 -0.502958\n", "2000-01-08 -0.424052 1.055065" ] }, "execution_count": 112, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[:, ['B', 'A']] = df[['A', 'B']].to_numpy()\n", "df[['A', 'B']]" ] },
{ "cell_type": "markdown", "id": "beb7928a", "metadata": {}, "source": [ "### Attribute access\n", "\n", "You can access an element of a `Series` by its index label, or a column of a `DataFrame`, directly as an attribute:" ] },
{ "cell_type": "code", "execution_count": null, "id": "86dec0c0", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "sa = pd.Series([1, 2, 3], index=list('abc'))\n", "dfa = df.copy()" ] },
{ "cell_type": "code", "execution_count": null, "id": "69ea1e07", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 114, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sa.b" ] },
{ "cell_type": "code", "execution_count": null, "id": "ce9f7637", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "2000-01-01 -1.003075\n", "2000-01-02 0.508569\n", "2000-01-03 0.762823\n", "2000-01-04 0.522446\n", "2000-01-05 -0.302838\n", "2000-01-06 -0.834977\n", "2000-01-07 -1.556795\n", "2000-01-08 -0.424052\n", "Freq: D, Name: A, dtype: float64" ] }, "execution_count": 115, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfa.A" ] },
{ "cell_type": "code", "execution_count": null, "id": "10cead84", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "a 5\n", "b 2\n", "c 3\n", "dtype: int64" ] }, "execution_count": 116, "metadata":
{}, "output_type": "execute_result" } ], "source": [ "sa.a = 5\n", "sa" ] }, { "cell_type": "code", "execution_count": null, "id": "6db24b96", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\"><th></th><th>A</th><th>B</th><th>C</th><th>D</th></tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr><th>2000-01-01</th><td>0</td><td>-1.068915</td><td>1.270093</td><td>1.131469</td></tr>\n",
"    <tr><th>2000-01-02</th><td>1</td><td>-0.324633</td><td>-2.092349</td><td>-0.550827</td></tr>\n",
"    <tr><th>2000-01-03</th><td>2</td><td>-0.897289</td><td>-2.043889</td><td>-1.096294</td></tr>\n",
"    <tr><th>2000-01-04</th><td>3</td><td>2.152613</td><td>1.639017</td><td>0.314416</td></tr>\n",
"    <tr><th>2000-01-05</th><td>4</td><td>0.915904</td><td>0.803904</td><td>-1.231580</td></tr>\n",
"    <tr><th>2000-01-06</th><td>5</td><td>-0.800550</td><td>0.390671</td><td>-0.679977</td></tr>\n",
"    <tr><th>2000-01-07</th><td>6</td><td>-0.502958</td><td>-1.239671</td><td>0.730893</td></tr>\n",
"    <tr><th>2000-01-08</th><td>7</td><td>1.055065</td><td>0.900078</td><td>3.551748</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " A B C D\n", "2000-01-01 0 -1.068915 1.270093 1.131469\n", "2000-01-02 1 -0.324633 -2.092349 -0.550827\n", "2000-01-03 2 -0.897289 -2.043889 -1.096294\n", "2000-01-04 3 2.152613 1.639017 0.314416\n", "2000-01-05 4 0.915904 0.803904 -1.231580\n", "2000-01-06 5 -0.800550 0.390671 -0.679977\n", "2000-01-07 6 -0.502958 -1.239671 0.730893\n", "2000-01-08 7 1.055065 0.900078 3.551748" ] }, "execution_count": 117, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfa.A = list(range(len(dfa.index))) # ok if A already exists\n", "dfa" ] }, { "cell_type": "code", "execution_count": null, "id": "99790bfe", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\"><th></th><th>A</th><th>B</th><th>C</th><th>D</th></tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr><th>2000-01-01</th><td>0</td><td>-1.068915</td><td>1.270093</td><td>1.131469</td></tr>\n",
"    <tr><th>2000-01-02</th><td>1</td><td>-0.324633</td><td>-2.092349</td><td>-0.550827</td></tr>\n",
"    <tr><th>2000-01-03</th><td>2</td><td>-0.897289</td><td>-2.043889</td><td>-1.096294</td></tr>\n",
"    <tr><th>2000-01-04</th><td>3</td><td>2.152613</td><td>1.639017</td><td>0.314416</td></tr>\n",
"    <tr><th>2000-01-05</th><td>4</td><td>0.915904</td><td>0.803904</td><td>-1.231580</td></tr>\n",
"    <tr><th>2000-01-06</th><td>5</td><td>-0.800550</td><td>0.390671</td><td>-0.679977</td></tr>\n",
"    <tr><th>2000-01-07</th><td>6</td><td>-0.502958</td><td>-1.239671</td><td>0.730893</td></tr>\n",
"    <tr><th>2000-01-08</th><td>7</td><td>1.055065</td><td>0.900078</td><td>3.551748</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " A B C D\n", "2000-01-01 0 -1.068915 1.270093 1.131469\n", "2000-01-02 1 -0.324633 -2.092349 -0.550827\n", "2000-01-03 2 -0.897289 -2.043889 -1.096294\n", "2000-01-04 3 2.152613 1.639017 0.314416\n", "2000-01-05 4 0.915904 0.803904 -1.231580\n", "2000-01-06 5 -0.800550 0.390671 -0.679977\n", "2000-01-07 6 -0.502958 -1.239671 0.730893\n", "2000-01-08 7 1.055065 0.900078 3.551748" ] }, "execution_count": 118, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfa['A'] = list(range(len(dfa.index))) # use this form to create a new column\n", "dfa" ] },
{ "cell_type": "markdown", "id": "1dac3787-172d-4127-b483-d08921a0e060", "metadata": { "attributes": { "classes": [ "warning" ], "id": "" } }, "source": [ "- You can use this access only if the index element is a valid Python identifier; e.g. `s.1` is not allowed. See the Python language reference for an explanation of valid identifiers.\n", "\n", "- The attribute will not be available if it conflicts with an existing method name; e.g. `s.min` is not allowed, but `s['min']` is possible.\n", "\n", "- Similarly, the attribute will not be available if it conflicts with any of the following names: `index`, `major_axis`, `minor_axis`, `items`.\n", "\n", "- In any of these cases, standard indexing will still work; e.g. `s['1']`, `s['min']`, and `s['index']` will access the corresponding element or column." ] },
{ "cell_type": "markdown", "id": "ae10e002", "metadata": {}, "source": [ "If you are using the IPython environment, you may also use tab-completion to see these accessible attributes.\n", "\n", "You can also assign a `dict` to a row of a `DataFrame`:" ] },
{ "cell_type": "code", "execution_count": null, "id": "29d1e1b0", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\"><th></th><th>x</th><th>y</th></tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>1</td><td>3</td></tr>\n",
"    <tr><th>1</th><td>9</td><td>99</td></tr>\n",
"    <tr><th>2</th><td>3</td><td>5</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " x y\n", "0 1 3\n", "1 9 99\n", "2 3 5" ] }, "execution_count": 119, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = pd.DataFrame({'x': [1, 2, 3], 'y': [3, 4, 5]})\n", "x.iloc[1] = {'x': 9, 'y': 99}\n", "x" ] },
{ "cell_type": "markdown", "id": "9e1ee914", "metadata": {}, "source": [ "You can use attribute access to modify an existing element of a Series or column of a DataFrame, but be careful: if you try to use attribute access to create a new column, it creates a new attribute rather than a new column. In Pandas 0.21.0 and later, this issues a `UserWarning`:" ] },
{ "cell_type": "code", "execution_count": null, "id": "b55c8c4d", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\fuqiongying\\AppData\\Local\\Temp\\ipykernel_15064\\269534380.py:2: UserWarning: Pandas doesn't allow columns to be created via a new attribute name - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access\n", " df.two = [4, 5, 6]\n" ] } ], "source": [ "df = pd.DataFrame({'one': [1., 2., 3.]})\n", "df.two = [4, 5, 6]" ] },
{ "cell_type": "code", "execution_count": null, "id": "e0a12bf3", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\"><th></th><th>one</th></tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>1.0</td></tr>\n",
"    <tr><th>1</th><td>2.0</td></tr>\n",
"    <tr><th>2</th><td>3.0</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " one\n", "0 1.0\n", "1 2.0\n", "2 3.0" ] }, "execution_count": 121, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] },
{ "cell_type": "markdown", "id": "96013159", "metadata": {}, "source": [ "### Slicing ranges\n", "\n", "This section explains the semantics of slicing with the `[]` operator.\n", "\n", "With a Series, the syntax works exactly as with an ndarray, returning a slice of the values and the corresponding labels:" ] },
{ "cell_type": "code", "execution_count": null, "id": "ab285a63", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "2000-01-01 -1.003075\n", "2000-01-02 0.508569\n", "2000-01-03 0.762823\n", "2000-01-04 0.522446\n", "2000-01-05 -0.302838\n", "2000-01-06 -0.834977\n", "2000-01-07 -1.556795\n", "2000-01-08 -0.424052\n", "Freq: D, Name: A, dtype: float64" ] }, "execution_count": 122, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s" ] },
{ "cell_type": "code", "execution_count": null, "id": "73654be5", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "2000-01-01 -1.003075\n", "2000-01-02 0.508569\n", "2000-01-03 0.762823\n", "2000-01-04 0.522446\n", "2000-01-05 -0.302838\n", "Freq: D, Name: A, dtype: float64" ] }, "execution_count": 123, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[:5]" ] },
{ "cell_type": "code", "execution_count": null, "id": "bafda5a6", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "2000-01-01 -1.003075\n", "2000-01-03 0.762823\n", "2000-01-05 -0.302838\n", "2000-01-07 -1.556795\n", "Freq: 2D, Name: A, dtype: float64" ] }, "execution_count": 124, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[::2]" ] },
{ "cell_type": "code", "execution_count": null, "id": "e28c3dc5", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "2000-01-08 -0.424052\n", "2000-01-07 -1.556795\n", "2000-01-06 -0.834977\n", "2000-01-05 -0.302838\n", "2000-01-04 0.522446\n", "2000-01-03 0.762823\n", "2000-01-02 0.508569\n", "2000-01-01 -1.003075\n", "Freq: -1D, Name: A, dtype: float64" ] }, "execution_count": 125, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[::-1]" ] },
{ "cell_type": "markdown", "id": "13b86bff", "metadata": {}, "source": [ "Note that setting works as well:" ] },
{ "cell_type": "code", "execution_count": null, "id": "46dbb94c", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "2000-01-01 0.000000\n", "2000-01-02 0.000000\n", "2000-01-03 0.000000\n", "2000-01-04 0.000000\n", "2000-01-05 0.000000\n", "2000-01-06 -0.834977\n", "2000-01-07 -1.556795\n", "2000-01-08 -0.424052\n", "Freq: D, Name: A, dtype: float64" ] }, "execution_count": 126, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s2 = s.copy()\n", "s2[:5] = 0\n", "s2" ] },
{ "cell_type": "markdown", "id": "89fac206", "metadata": {}, "source": [ "With a DataFrame, slicing inside `[]` slices the rows. This is provided largely as a convenience, since it is such a common operation. Note that `df` now refers to the single-column frame created in the attribute-access example above." ] },
{ "cell_type": "code", "execution_count": null, "id": "d2c3c1af", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
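<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\"><th></th><th>one</th></tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>1.0</td></tr>\n",
"    <tr><th>1</th><td>2.0</td></tr>\n",
"    <tr><th>2</th><td>3.0</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>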
" ], "text/plain": [ " one\n", "0 1.0\n", "1 2.0\n", "2 3.0" ] }, "execution_count": 127, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[:3]" ] }, { "cell_type": "code", "execution_count": null, "id": "c46fa1e7", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\"><th></th><th>one</th></tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr><th>2</th><td>3.0</td></tr>\n",
"    <tr><th>1</th><td>2.0</td></tr>\n",
"    <tr><th>0</th><td>1.0</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " one\n", "2 3.0\n", "1 2.0\n", "0 1.0" ] }, "execution_count": 128, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[::-1]" ] },
{ "cell_type": "markdown", "id": "674b4c29-0e55-4243-a46c-7fbffb51a02a", "metadata": {}, "source": [ "## Acknowledgments\n", "\n", "Thanks to the [Pandas user guide](https://pandas.pydata.org/docs/user_guide/index.html), which contributes the majority of the content in this chapter." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.18" } }, "nbformat": 4, "nbformat_minor": 5 }