{
"cells": [
{
"cell_type": "markdown",
"id": "ed1f74e7",
"metadata": {},
"source": [
"# The DataFrame: A Two-Dimensional Data Structure"
]
},
{
"cell_type": "markdown",
"id": "717b29b8",
"metadata": {},
"source": [
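"A DataFrame is a two-dimensional labeled data structure whose columns may have different data types. It helps to think of a DataFrame as an Excel spreadsheet, or as a dictionary of Series objects, one per column."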
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "b0182569",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "79178b2b",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "698481f5",
"metadata": {},
"source": [
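"## Creating a DataFrame\n",
"\n",
"Like a Series, a DataFrame can be created from several kinds of data:\n",
"- A dictionary whose values are Series objects.\n",
"- A dictionary of one-dimensional arrays or lists.\n",
"- A list or array of dictionaries.\n",
"- A two-dimensional NumPy array."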
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "2e7cb22e",
"metadata": {},
"source": [
"### From a dictionary of Series objects"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "821beb00",
"metadata": {},
"source": [
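"A DataFrame can be created from a dictionary whose values are Series objects. The values may also be dictionaries, because pandas converts each inner dictionary to a Series.\n",
"\n",
"Suppose there is a dictionary `d` containing two Series objects:"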
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "98d20646",
"metadata": {},
"outputs": [],
"source": [
"s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "315ddecb",
"metadata": {},
"outputs": [],
"source": [
"s2 = pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "1e13fa34",
"metadata": {},
"outputs": [],
"source": [
"d = {\"one\": s1, \"two\": s2}"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "49f79ba0",
"metadata": {},
"source": [
"The dictionary `d` can be used to construct a DataFrame:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "79c735da",
"metadata": {},
"outputs": [],
"source": [
"df = pd.DataFrame(d)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "d65d98a8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" one two\n",
"a 1.0 1.0\n",
"b 2.0 2.0\n",
"c 3.0 3.0\n",
"d NaN 4.0"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "e1f02d1b",
"metadata": {},
"source": [
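"Unlike a Series, a DataFrame distinguishes rows from columns, so it has both row labels and column labels. By default, the column labels of `df` are the keys of the dictionary passed in; they can be viewed with the `.columns` attribute:"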
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "fa4a04fb",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['one', 'two'], dtype='object')"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.columns"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "ff9acb82",
"metadata": {},
"source": [
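"The row labels, available through the `.index` attribute, are the union of the labels of the two Series; pandas aligns the two Series on their labels automatically:"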
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "34b63bf8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['a', 'b', 'c', 'd'], dtype='object')"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.index"
]
},
{
"cell_type": "markdown",
"id": "741871ba",
"metadata": {},
"source": [
"When creating a DataFrame, the `index` and `columns` parameters can also be specified:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "227afaa7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" one two\n",
"d NaN 4.0\n",
"b 2.0 2.0\n",
"a 1.0 1.0"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.DataFrame(d, index=[\"d\", \"b\", \"a\"])"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "a50f5bc2",
"metadata": {},
"source": [
"Pandas looks up the requested labels in the given order; where the data has no value for a label, it fills in the missing-value marker `np.nan`:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "492f3b05",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" two three\n",
"d 4.0 NaN\n",
"b 2.0 NaN\n",
"a 1.0 NaN"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three'])"
]
},
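{
"cell_type": "markdown",
"id": "c3a9f101",
"metadata": {},
"source": [
"Earlier in this section it was noted that the dictionary values may themselves be dictionaries. A minimal sketch of that case follows; the name `nested` and its contents are invented for illustration. Each inner dictionary is treated like a Series indexed by its keys, so the result should resemble the `df` built from `s1` and `s2`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c3a9f102",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical data: the inner dictionaries stand in for the Series s1 and s2.\n",
"nested = {\n",
"    'one': {'a': 1, 'b': 2, 'c': 3},\n",
"    'two': {'a': 1.0, 'b': 2.0, 'c': 3.0, 'd': 4.0},\n",
"}\n",
"\n",
"df_nested = pd.DataFrame(nested)\n",
"\n",
"# The row labels are the union of the inner keys; 'one' has no value for 'd',\n",
"# so that cell is NaN and the column is stored as floating point.\n",
"print(df_nested)\n",
"print(df_nested.dtypes)"
]
},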
{
"attachments": {},
"cell_type": "markdown",
"id": "2cef3d5d",
"metadata": {},
"source": [
"### From a dictionary of one-dimensional arrays"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "eb0bb12a",
"metadata": {},
"source": [
"A DataFrame can also be created from a dictionary of one-dimensional arrays or lists, which must all have the same length:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "a097d435",
"metadata": {},
"outputs": [],
"source": [
"d = {'one' : [1., 2., 3., 4.],\n",
" 'two' : [4., 3., 2., 1.]}"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "8d58e2f8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" one two\n",
"0 1.0 4.0\n",
"1 2.0 3.0\n",
"2 3.0 2.0\n",
"3 4.0 1.0"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.DataFrame(d)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "0333d58f",
"metadata": {},
"source": [
"When the `index` parameter is passed, its length must also match the length of the lists:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "21661d1e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" one two\n",
"a 1.0 4.0\n",
"b 2.0 3.0\n",
"c 3.0 2.0\n",
"d 4.0 1.0"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.DataFrame(d, index=['a', 'b', 'c', 'd'])"
]
},
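{
"cell_type": "markdown",
"id": "c3a9f103",
"metadata": {},
"source": [
"A rough sketch of what happens when the equal-length requirement is violated. The data is invented for illustration. Both cases are expected to raise `ValueError`; the exact message may differ between pandas versions, so the code only prints whatever it receives:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c3a9f104",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Lists of different lengths cannot be combined into columns.\n",
"try:\n",
"    pd.DataFrame({'one': [1., 2., 3.], 'two': [4., 3.]})\n",
"except ValueError as err:\n",
"    print('unequal lists:', err)\n",
"\n",
"# An index whose length differs from the lists is rejected as well.\n",
"try:\n",
"    pd.DataFrame({'one': [1., 2.], 'two': [4., 3.]}, index=['a', 'b', 'c'])\n",
"except ValueError as err:\n",
"    print('index mismatch:', err)"
]
},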
{
"attachments": {},
"cell_type": "markdown",
"id": "9b2c149f",
"metadata": {},
"source": [
"### From a list of dictionaries"
]
},
{
"cell_type": "markdown",
"id": "3f5efc2a",
"metadata": {},
"source": [
"A DataFrame can also be built from an array or list of dictionaries:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "57d4e5b2",
"metadata": {},
"outputs": [],
"source": [
"data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "fd002d26",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" a b c\n",
"0 1 2 NaN\n",
"1 5 10 20.0"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.DataFrame(data)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "e6448551",
"metadata": {},
"source": [
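"Unlike with a Series, the dictionary keys here become column labels, and the row labels are determined by the length of the array or list. A key missing from one of the dictionaries yields `NaN` in that row."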
]
},
{
"cell_type": "markdown",
"id": "28b0a139",
"metadata": {},
"source": [
"### From a two-dimensional array"
]
},
{
"cell_type": "markdown",
"id": "e73d4783",
"metadata": {},
"source": [
"A DataFrame can also be created from a two-dimensional NumPy array:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "5e70aa1e",
"metadata": {},
"outputs": [],
"source": [
"a = np.array([[1,2,3], [4,5,6]])"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "a6fa5ce0",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" 0 1 2\n",
"0 1 2 3\n",
"1 4 5 6"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.DataFrame(a)"
]
},
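{
"cell_type": "markdown",
"id": "c3a9f105",
"metadata": {},
"source": [
"The `index` and `columns` parameters can be combined with a two-dimensional array as well. In the minimal sketch below the labels `r1`, `r2`, `x`, `y`, `z` are invented for illustration. With an array there are no existing labels to look up, so the given labels are attached by position and their lengths have to match the shape of the array:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c3a9f106",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"a = np.array([[1, 2, 3], [4, 5, 6]])\n",
"\n",
"# Hypothetical labels: 2 row labels and 3 column labels for a 2x3 array.\n",
"df_a = pd.DataFrame(a, index=['r1', 'r2'], columns=['x', 'y', 'z'])\n",
"print(df_a)\n",
"print(df_a.shape)"
]
},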
{
"attachments": {},
"cell_type": "markdown",
"id": "6aed9e8c",
"metadata": {},
"source": [
"## Working with a DataFrame\n",
"\n",
"A DataFrame is not a two-dimensional NumPy array, and the two are used quite differently:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "9f32f6c3",
"metadata": {},
"outputs": [],
"source": [
"s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "239d3b03",
"metadata": {},
"outputs": [],
"source": [
"s2 = pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "e8776dc6",
"metadata": {},
"outputs": [],
"source": [
"d = {\"one\": s1, \"two\": s2}"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "cdde7d53",
"metadata": {},
"outputs": [],
"source": [
"df = pd.DataFrame(d)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "5e17d0aa",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" one two\n",
"a 1.0 1.0\n",
"b 2.0 2.0\n",
"c 3.0 3.0\n",
"d NaN 4.0"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "23e62817",
"metadata": {},
"source": [
"### Column operations"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "3950a0b2",
"metadata": {},
"source": [
"A DataFrame can be treated as a dictionary of Series objects: the `.columns` attribute corresponds to the keys, and each column corresponds to a value:"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "a3c973ea",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"a 1.0\n",
"b 2.0\n",
"c 3.0\n",
"d NaN\n",
"Name: one, dtype: float64"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['one']"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "55ef0692",
"metadata": {},
"source": [
"New columns can be added just as with a dictionary:"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "0f2ef75c",
"metadata": {},
"outputs": [],
"source": [
"df[\"three\"] = df[\"one\"] * df[\"two\"]"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "05ac44bb",
"metadata": {},
"outputs": [],
"source": [
"df[\"flag\"] = df[\"one\"] > 2"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "7ecb4c9f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" one two three flag\n",
"a 1.0 1.0 1.0 False\n",
"b 2.0 2.0 4.0 False\n",
"c 3.0 3.0 9.0 True\n",
"d NaN 4.0 NaN False"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "1f62216a",
"metadata": {},
"source": [
"When a new column is assigned a single scalar value, pandas broadcasts that value to every row label:"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "e156535d",
"metadata": {},
"outputs": [],
"source": [
"df[\"four\"] = 4"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "e220dded",
"metadata": {},
"source": [
"Columns can be deleted with the `del` keyword or with the `.pop()` method; `.pop()` also returns the removed column as a Series:"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "8e744653",
"metadata": {},
"outputs": [],
"source": [
"del df[\"two\"]"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "4770ad42",
"metadata": {},
"outputs": [],
"source": [
"three = df.pop(\"three\")"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "d20663a2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"a 1.0\n",
"b 4.0\n",
"c 9.0\n",
"d NaN\n",
"Name: three, dtype: float64"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"three"
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "8713ea5e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" one flag four\n",
"a 1.0 False 4\n",
"b 2.0 False 4\n",
"c 3.0 True 4\n",
"d NaN False 4"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "18ec10aa",
"metadata": {},
"source": [
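"When a new column is added whose row labels do not fully match, pandas keeps only the values whose labels already exist in the DataFrame and fills the remaining rows with `NaN`, so the row labels of the original DataFrame stay unchanged:"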
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "07b61507",
"metadata": {},
"outputs": [],
"source": [
"df[\"foo\"] = pd.Series([1,2,3], index=[\"a\", \"d\", \"e\"])"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "e45141a6",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" one flag four foo\n",
"a 1.0 False 4 1.0\n",
"b 2.0 False 4 NaN\n",
"c 3.0 True 4 NaN\n",
"d NaN False 4 2.0"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "f53b8dd1",
"metadata": {},
"source": [
"By default, a new column is appended at the end of the DataFrame. The `.insert()` method places it at a specified position instead:"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "edad10b3",
"metadata": {},
"outputs": [],
"source": [
"df.insert(1, \"bar\", df[\"one\"])"
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "5b1967c2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" one bar flag four foo\n",
"a 1.0 1.0 False 4 1.0\n",
"b 2.0 2.0 False 4 NaN\n",
"c 3.0 3.0 True 4 NaN\n",
"d NaN NaN False 4 2.0"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
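{
"cell_type": "markdown",
"id": "c3a9f107",
"metadata": {},
"source": [
"A small sketch of two details of `.insert()`, using an invented frame `demo`: the first argument is a zero-based column position, and inserting a label that already exists is expected to raise `ValueError` unless `allow_duplicates=True` is passed:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c3a9f108",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical frame used only for this illustration.\n",
"demo = pd.DataFrame({'one': [1., 2.], 'flag': [False, True]})\n",
"\n",
"# Position 0 places the new column first; insert() modifies demo in place.\n",
"demo.insert(0, 'first', [10, 20])\n",
"print(list(demo.columns))\n",
"\n",
"# Re-inserting an existing label is rejected by default.\n",
"try:\n",
"    demo.insert(1, 'one', demo['one'])\n",
"except ValueError as err:\n",
"    print('insert failed:', err)"
]
},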
{
"attachments": {},
"cell_type": "markdown",
"id": "2441cfd5",
"metadata": {},
"source": [
"### Row operations\n",
"\n",
"There are two common ways to index the rows of a DataFrame. The `.loc` attribute indexes by row label and returns the row as a Series:"
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "737b173b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"one 2.0\n",
"bar 2.0\n",
"flag False\n",
"four 4\n",
"foo NaN\n",
"Name: b, dtype: object"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[\"b\"]"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "cb199e17",
"metadata": {},
"source": [
"The `.iloc` attribute indexes by integer position instead; position 1 is the second row:"
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "80f5c2ae",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"one 2.0\n",
"bar 2.0\n",
"flag False\n",
"four 4\n",
"foo NaN\n",
"Name: b, dtype: object"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.iloc[1]"
]
},
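{
"cell_type": "markdown",
"id": "c3a9f109",
"metadata": {},
"source": [
"The difference between the two indexers shows up most clearly with slices. The sketch below uses an invented frame `demo`: a label slice with `.loc` includes both endpoints, while a position slice with `.iloc` excludes the end position, as with Python lists:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c3a9f10a",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical frame used only for this illustration.\n",
"demo = pd.DataFrame({'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]},\n",
"                    index=['a', 'b', 'c', 'd'])\n",
"\n",
"# Label-based slice: rows 'a', 'b' and 'c' are all included.\n",
"print(demo.loc['a':'c'])\n",
"\n",
"# Position-based slice: rows at positions 0 and 1 only.\n",
"print(demo.iloc[0:2])\n",
"\n",
"# A row label and a column label together select a single value.\n",
"print(demo.loc['b', 'two'])"
]
},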
{
"attachments": {},
"cell_type": "markdown",
"id": "a41ed5e1",
"metadata": {},
"source": [
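"### Addition and subtraction\n",
"\n",
"DataFrames support addition and subtraction, and the computation aligns the operands on both row and column labels. A label present in only one operand yields `NaN` in the result:"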
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "9b6e727a",
"metadata": {},
"outputs": [],
"source": [
"df1 = pd.DataFrame(np.random.randn(10, 4), columns=['A', 'B', 'C', 'D'])"
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "f2b0db7d",
"metadata": {},
"outputs": [],
"source": [
"df2 = pd.DataFrame(np.random.randn(7, 3), columns=['A', 'B', 'C'])"
]
},
{
"cell_type": "code",
"execution_count": 41,
"id": "ed6e0274",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" A B C D\n",
"0 -1.906552 -2.428495 1.131278 NaN\n",
"1 -0.955872 -1.476556 -1.523796 NaN\n",
"2 0.766210 -0.162112 0.190370 NaN\n",
"3 -2.866838 0.866281 1.340097 NaN\n",
"4 -2.027247 0.972097 -0.807422 NaN\n",
"5 0.841079 0.101313 -1.701630 NaN\n",
"6 0.318099 -0.037061 -1.878293 NaN\n",
"7 NaN NaN NaN NaN\n",
"8 NaN NaN NaN NaN\n",
"9 NaN NaN NaN NaN"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df1 + df2"
]
},
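{
"cell_type": "markdown",
"id": "c3a9f10b",
"metadata": {},
"source": [
"If treating the missing side as zero is acceptable for the data at hand, the `.add()` method with `fill_value` is one way to avoid the `NaN` results. The frames `left` and `right` below are invented, small, and deterministic so that the alignment is easy to follow:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c3a9f10c",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical frames: right lacks column 'B' and the row labelled 2.\n",
"left = pd.DataFrame({'A': [1., 2., 3.], 'B': [10., 20., 30.]})\n",
"right = pd.DataFrame({'A': [100., 200.]})\n",
"\n",
"# Plain addition: cells missing from either side become NaN.\n",
"print(left + right)\n",
"\n",
"# With fill_value=0, a value missing on one side is treated as 0;\n",
"# a cell missing on both sides would still be NaN.\n",
"print(left.add(right, fill_value=0))"
]
},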
{
"attachments": {},
"cell_type": "markdown",
"id": "f0a10763",
"metadata": {},
"source": [
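"A DataFrame can also be added to or subtracted from a Series. Similar to broadcasting in NumPy, pandas first matches the Series labels against the DataFrame's column labels, then broadcasts the Series along the row labels:"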
]
},
{
"cell_type": "code",
"execution_count": 42,
"id": "898bd8b9",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" A B C D\n",
"0 0.034677 -1.447889 0.239673 0.897156\n",
"1 -0.216450 -0.052522 0.237849 0.806303\n",
"2 0.260522 0.590821 0.231546 -2.164184\n",
"3 -1.264539 0.947130 0.601591 -0.753204\n",
"4 -1.113126 0.063686 -0.379063 -0.275933\n",
"5 0.596109 -0.516650 -1.177866 0.075800\n",
"6 1.386725 -0.328219 -1.303265 -0.790358\n",
"7 1.225454 0.923503 0.715214 -0.144048\n",
"8 -0.982050 -0.026315 1.963732 0.638793\n",
"9 0.715773 -0.767911 -0.379927 -1.533615"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df1"
]
},
{
"cell_type": "code",
"execution_count": 43,
"id": "0726b1fc",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" A B C D\n",
"0 0.000000 0.000000 0.000000 0.000000\n",
"1 -0.251127 1.395367 -0.001824 -0.090853\n",
"2 0.225845 2.038710 -0.008127 -3.061340\n",
"3 -1.299216 2.395019 0.361919 -1.650360\n",
"4 -1.147802 1.511575 -0.618736 -1.173089\n",
"5 0.561432 0.931239 -1.417538 -0.821356\n",
"6 1.352048 1.119670 -1.542938 -1.687514\n",
"7 1.190778 2.371392 0.475542 -1.041204\n",
"8 -1.016727 1.421574 1.724059 -0.258363\n",
"9 0.681096 0.679978 -0.619600 -2.430771"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df1 - df1.iloc[0]"
]
},
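{
"cell_type": "markdown",
"id": "c3a9f10d",
"metadata": {},
"source": [
"The subtraction above matches the Series labels against the column labels. To match against the row labels instead, for example to subtract one column from every column, the arithmetic methods accept an `axis` argument. A minimal sketch with an invented frame `demo`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c3a9f10e",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical frame used only for this illustration.\n",
"demo = pd.DataFrame({'A': [1., 2., 3.], 'B': [10., 20., 30.]})\n",
"\n",
"# Default behaviour: the Series labels ('A', 'B') align with the columns,\n",
"# and the first row is subtracted from every row.\n",
"print(demo - demo.iloc[0])\n",
"\n",
"# axis=0 aligns the Series with the row labels instead,\n",
"# so column 'A' is subtracted from every column.\n",
"print(demo.sub(demo['A'], axis=0))"
]
},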
{
"cell_type": "code",
"execution_count": null,
"id": "0fe7dcf0",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}