{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 二维数据结构:DataFrame"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"`DataFrame` 是 `pandas` 中的二维数据结构,可以看成一个 `Excel` 中的工作表,或者一个 `SQL` 表,或者一个存储 `Series` 对象的字典。\n",
"\n",
"`DataFrame(data, index, columns)` 中的 `data` 可以接受很多数据类型:\n",
"\n",
"- 一个存储一维数组,字典,列表或者 `Series` 的字典\n",
"- 2-D 数组\n",
"- 结构或者记录数组\n",
"- 一个 `Series`\n",
"- 另一个 `DataFrame`\n",
"\n",
"`index` 用于指定行的 `label`,`columns` 用于指定列的 `label`,如果参数不传入,那么会按照传入的内容进行设定。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 从 Series 字典中构造"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"可以使用值为 `Series` 的字典进行构造:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']),\n",
" 'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"如果没有传入 `columns` 的值,那么 `columns` 的值默认为字典 `key`,`index` 默认为所有 `value` 中 `index` 的并集。"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"
\n",
" \n",
" \n",
" | \n",
" one | \n",
" two | \n",
"
\n",
" \n",
" \n",
" \n",
" a | \n",
" 1 | \n",
" 1 | \n",
"
\n",
" \n",
" b | \n",
" 2 | \n",
" 2 | \n",
"
\n",
" \n",
" c | \n",
" 3 | \n",
" 3 | \n",
"
\n",
" \n",
" d | \n",
" NaN | \n",
" 4 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" one two\n",
"a 1 1\n",
"b 2 2\n",
"c 3 3\n",
"d NaN 4"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.DataFrame(d)\n",
"\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"如果指定了 `index` 值,`index` 为指定的 `index` 值:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" one | \n",
" two | \n",
"
\n",
" \n",
" \n",
" \n",
" d | \n",
" NaN | \n",
" 4 | \n",
"
\n",
" \n",
" b | \n",
" 2 | \n",
" 2 | \n",
"
\n",
" \n",
" a | \n",
" 1 | \n",
" 1 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" one two\n",
"d NaN 4\n",
"b 2 2\n",
"a 1 1"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.DataFrame(d, index=['d', 'b', 'a'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"如果指定了 `columns` 值,会去字典中寻找,找不到的值为 `NaN`:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" two | \n",
" three | \n",
"
\n",
" \n",
" \n",
" \n",
" d | \n",
" 4 | \n",
" NaN | \n",
"
\n",
" \n",
" b | \n",
" 2 | \n",
" NaN | \n",
"
\n",
" \n",
" a | \n",
" 1 | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" two three\n",
"d 4 NaN\n",
"b 2 NaN\n",
"a 1 NaN"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"查看 `index` 和 `columns`:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Index([u'a', u'b', u'c', u'd'], dtype='object')"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.index"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Index([u'one', u'two'], dtype='object')"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 从 ndarray 或者 list 字典中构造"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"如果字典是 `ndarray` 或者 `list`,那么它们的长度要严格保持一致:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"d = {'one' : [1., 2., 3., 4.],\n",
" 'two' : [4., 3., 2., 1.]}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`index` 默认为 `range(n)`,其中 `n` 为数组长度: "
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" one | \n",
" two | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 4 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" 3 | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" 2 | \n",
"
\n",
" \n",
" 3 | \n",
" 4 | \n",
" 1 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" one two\n",
"0 1 4\n",
"1 2 3\n",
"2 3 2\n",
"3 4 1"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.DataFrame(d)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"如果传入 `index` 参数,那么它必须与数组等长:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" one | \n",
" two | \n",
"
\n",
" \n",
" \n",
" \n",
" a | \n",
" 1 | \n",
" 4 | \n",
"
\n",
" \n",
" b | \n",
" 2 | \n",
" 3 | \n",
"
\n",
" \n",
" c | \n",
" 3 | \n",
" 2 | \n",
"
\n",
" \n",
" d | \n",
" 4 | \n",
" 1 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" one two\n",
"a 1 4\n",
"b 2 3\n",
"c 3 2\n",
"d 4 1"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.DataFrame(d, index=['a', 'b', 'c', 'd'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 从结构数组中构造"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`numpy` 支持结构数组的构造:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([(1, 2.0, 'Hello'), (2, 3.0, 'World')], \n",
" dtype=[('A', '\n",
"\n",
" \n",
" \n",
" | \n",
" A | \n",
" B | \n",
" C | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 2 | \n",
" Hello | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" 3 | \n",
" World | \n",
"
\n",
" \n",
"
\n",
""
],
"text/plain": [
" A B C\n",
"0 1 2 Hello\n",
"1 2 3 World"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.DataFrame(data)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" A | \n",
" B | \n",
" C | \n",
"
\n",
" \n",
" \n",
" \n",
" first | \n",
" 1 | \n",
" 2 | \n",
" Hello | \n",
"
\n",
" \n",
" second | \n",
" 2 | \n",
" 3 | \n",
" World | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" A B C\n",
"first 1 2 Hello\n",
"second 2 3 World"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.DataFrame(data, index=['first', 'second'])"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" C | \n",
" A | \n",
" B | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Hello | \n",
" 1 | \n",
" 2 | \n",
"
\n",
" \n",
" 1 | \n",
" World | \n",
" 2 | \n",
" 3 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" C A B\n",
"0 Hello 1 2\n",
"1 World 2 3"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.DataFrame(data, columns=['C', 'A', 'B'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 从字典列表中构造"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"字典中同一个键的值会被合并到同一列:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" a | \n",
" b | \n",
" c | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 2 | \n",
" NaN | \n",
"
\n",
" \n",
" 1 | \n",
" 5 | \n",
" 10 | \n",
" 20 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" a b c\n",
"0 1 2 NaN\n",
"1 5 10 20"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data2 = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]\n",
"\n",
"pd.DataFrame(data2)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" a | \n",
" b | \n",
" c | \n",
"
\n",
" \n",
" \n",
" \n",
" first | \n",
" 1 | \n",
" 2 | \n",
" NaN | \n",
"
\n",
" \n",
" second | \n",
" 5 | \n",
" 10 | \n",
" 20 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" a b c\n",
"first 1 2 NaN\n",
"second 5 10 20"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.DataFrame(data2, index=['first', 'second'])"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" a | \n",
" b | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 2 | \n",
"
\n",
" \n",
" 1 | \n",
" 5 | \n",
" 10 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" a b\n",
"0 1 2\n",
"1 5 10"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.DataFrame(data2, columns=['a', 'b'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 从 Series 中构造"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"相当于将 Series 二维化。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 其他构造方法"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`DataFrame.from_dict` 从现有的一个字典中构造,`DataFrame.from_records` 从现有的一个记录数组中构造:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" A | \n",
" B | \n",
"
\n",
" \n",
" C | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" Hello | \n",
" 1 | \n",
" 2 | \n",
"
\n",
" \n",
" World | \n",
" 2 | \n",
" 3 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" A B\n",
"C \n",
"Hello 1 2\n",
"World 2 3"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.DataFrame.from_records(data, index='C')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`DataFrame.from_items` 从字典的 `item` 对构造:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" A | \n",
" B | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 4 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" 5 | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" 6 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" A B\n",
"0 1 4\n",
"1 2 5\n",
"2 3 6"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.DataFrame.from_items([('A', [1, 2, 3]), ('B', [4, 5, 6])])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 列操作"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`DataFrame` 可以类似于字典一样对列进行操作:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"a 1\n",
"b 2\n",
"c 3\n",
"d NaN\n",
"Name: one, dtype: float64"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[\"one\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"添加新列:"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" one | \n",
" two | \n",
" three | \n",
" flag | \n",
"
\n",
" \n",
" \n",
" \n",
" a | \n",
" 1 | \n",
" 1 | \n",
" 1 | \n",
" False | \n",
"
\n",
" \n",
" b | \n",
" 2 | \n",
" 2 | \n",
" 4 | \n",
" False | \n",
"
\n",
" \n",
" c | \n",
" 3 | \n",
" 3 | \n",
" 9 | \n",
" True | \n",
"
\n",
" \n",
" d | \n",
" NaN | \n",
" 4 | \n",
" NaN | \n",
" False | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" one two three flag\n",
"a 1 1 1 False\n",
"b 2 2 4 False\n",
"c 3 3 9 True\n",
"d NaN 4 NaN False"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['three'] = df['one'] * df['two']\n",
"\n",
"df['flag'] = df['one'] > 2\n",
"\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"可以像字典一样删除:"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" one | \n",
" flag | \n",
"
\n",
" \n",
" \n",
" \n",
" a | \n",
" 1 | \n",
" False | \n",
"
\n",
" \n",
" b | \n",
" 2 | \n",
" False | \n",
"
\n",
" \n",
" c | \n",
" 3 | \n",
" True | \n",
"
\n",
" \n",
" d | \n",
" NaN | \n",
" False | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" one flag\n",
"a 1 False\n",
"b 2 False\n",
"c 3 True\n",
"d NaN False"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"del df[\"two\"]\n",
"\n",
"three = df.pop(\"three\")\n",
"\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"给一行赋单一值:"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" one | \n",
" flag | \n",
" foo | \n",
"
\n",
" \n",
" \n",
" \n",
" a | \n",
" 1 | \n",
" False | \n",
" bar | \n",
"
\n",
" \n",
" b | \n",
" 2 | \n",
" False | \n",
" bar | \n",
"
\n",
" \n",
" c | \n",
" 3 | \n",
" True | \n",
" bar | \n",
"
\n",
" \n",
" d | \n",
" NaN | \n",
" False | \n",
" bar | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" one flag foo\n",
"a 1 False bar\n",
"b 2 False bar\n",
"c 3 True bar\n",
"d NaN False bar"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['foo'] = 'bar'\n",
"\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"如果 `index` 不一致,那么会只保留公共的部分:"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" one | \n",
" flag | \n",
" foo | \n",
" one_trunc | \n",
"
\n",
" \n",
" \n",
" \n",
" a | \n",
" 1 | \n",
" False | \n",
" bar | \n",
" 1 | \n",
"
\n",
" \n",
" b | \n",
" 2 | \n",
" False | \n",
" bar | \n",
" 2 | \n",
"
\n",
" \n",
" c | \n",
" 3 | \n",
" True | \n",
" bar | \n",
" NaN | \n",
"
\n",
" \n",
" d | \n",
" NaN | \n",
" False | \n",
" bar | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" one flag foo one_trunc\n",
"a 1 False bar 1\n",
"b 2 False bar 2\n",
"c 3 True bar NaN\n",
"d NaN False bar NaN"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['one_trunc'] = df['one'][:2]\n",
"\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"也可以直接插入一维数组,但是数组的长度必须与 `index` 一致。\n",
"\n",
"默认新列插入位置在最后,也可以指定位置插入:"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" one | \n",
" bar | \n",
" flag | \n",
" foo | \n",
" one_trunc | \n",
"
\n",
" \n",
" \n",
" \n",
" a | \n",
" 1 | \n",
" 1 | \n",
" False | \n",
" bar | \n",
" 1 | \n",
"
\n",
" \n",
" b | \n",
" 2 | \n",
" 2 | \n",
" False | \n",
" bar | \n",
" 2 | \n",
"
\n",
" \n",
" c | \n",
" 3 | \n",
" 3 | \n",
" True | \n",
" bar | \n",
" NaN | \n",
"
\n",
" \n",
" d | \n",
" NaN | \n",
" NaN | \n",
" False | \n",
" bar | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" one bar flag foo one_trunc\n",
"a 1 1 False bar 1\n",
"b 2 2 False bar 2\n",
"c 3 3 True bar NaN\n",
"d NaN NaN False bar NaN"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.insert(1, 'bar', df['one'])\n",
"\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"添加一个 `test` 新列:"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" one | \n",
" bar | \n",
" flag | \n",
" foo | \n",
" one_trunc | \n",
" test | \n",
"
\n",
" \n",
" \n",
" \n",
" a | \n",
" 1 | \n",
" 1 | \n",
" False | \n",
" bar | \n",
" 1 | \n",
" 2 | \n",
"
\n",
" \n",
" b | \n",
" 2 | \n",
" 2 | \n",
" False | \n",
" bar | \n",
" 2 | \n",
" 4 | \n",
"
\n",
" \n",
" c | \n",
" 3 | \n",
" 3 | \n",
" True | \n",
" bar | \n",
" NaN | \n",
" 6 | \n",
"
\n",
" \n",
" d | \n",
" NaN | \n",
" NaN | \n",
" False | \n",
" bar | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" one bar flag foo one_trunc test\n",
"a 1 1 False bar 1 2\n",
"b 2 2 False bar 2 4\n",
"c 3 3 True bar NaN 6\n",
"d NaN NaN False bar NaN NaN"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.assign(test=df[\"one\"] + df[\"bar\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 索引和选择"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"基本操作:\n",
"\n",
"| Operation\t| Syntax | Result |\n",
"| ---- | ----- | ---- |\n",
"| Select column\t| df[col] | Series |\n",
"| Select row by label | df.loc[label] | Series |\n",
"| Select row by integer location | df.iloc[loc] | Series |\n",
"| Slice rows | df[5:10]\t| DataFrame |\n",
"| Select rows by boolean vector\t| df[bool_vec] | DataFrame |"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
}