{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "67b5393b", "metadata": {}, "source": [ "# The One-Dimensional Data Structure: Series\n", "\n", "Pandas has two main data structures: the one-dimensional Series and the two-dimensional DataFrame. Between them they handle most common kinds of data, and DataFrame is the more frequently used of the two.\n", "\n", "A Series can store data of any type, including integers, floating-point numbers, strings, and arbitrary Python objects." ] }, { "attachments": {}, "cell_type": "markdown", "id": "8bca6859", "metadata": {}, "source": [ "## Creating a Series\n", "\n", "Pandas is commonly used together with NumPy. Import both modules:" ] }, { "cell_type": "code", "execution_count": 1, "id": "adbc11b1", "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": 2, "id": "8e4305be", "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "attachments": {}, "cell_type": "markdown", "id": "f46cf58d", "metadata": {}, "source": [ "A Series is constructed with:\n", "\n", "```python\n", "pd.Series(data=None, index=None, dtype=None, name=None)\n", "```\n", "\n", "The parameters are:\n", "- `data` can be a list, a tuple, a one-dimensional array, a dictionary, or a scalar value;\n", "- `index` is an array-like or Index holding the labels of the Series; when `data` is a list, tuple, or array, it must have the same length as `data`;\n", "- `dtype` sets the data type; as with a NumPy array, a Series has a single dtype, and when `dtype` is not given Pandas infers it from `data` (mixed values fall back to `object`);\n", "- `name` is an optional name for the Series." ] }, { "cell_type": "markdown", "id": "7de0edbf", "metadata": {}, "source": [ "### Creating from a list or array" ] }, { "cell_type": "markdown", "id": "379f3148", "metadata": {}, "source": [ "Create a Series from array-like data, here a list:" ] }, { "cell_type": "code", "execution_count": 3, "id": "16c2437a", "metadata": {}, "outputs": [], "source": [ "a = pd.Series([1, 2, 3, 4])" ] }, { "cell_type": "code", "execution_count": 4, "id": "ea31c105", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1\n", "1 2\n", "2 3\n", "3 4\n", "dtype: int64" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a" ] }, { "attachments": {}, "cell_type": "markdown", "id": "bde1d730", "metadata": {}, "source": [ "The left column shows the labels of the Series, which is what the `index` parameter specifies; the right column shows the corresponding values. When `index` is not given, the labels default to `RangeIndex(n)`, where `n` is the length of `data`. The labels can be inspected through the `.index` attribute:" ] }, { "cell_type": "code", "execution_count": 5, "id": "08151532", "metadata": 
{}, "outputs": [ { "data": { "text/plain": [ "RangeIndex(start=0, stop=4, step=1)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.index" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c782c3f9", "metadata": {}, "source": [ "A label can be used to look up the corresponding value:" ] }, { "cell_type": "code", "execution_count": 6, "id": "b063318a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a[0]" ] }, { "attachments": {}, "cell_type": "markdown", "id": "d6b177c3", "metadata": {}, "source": [ "Labels behave like dictionary keys, so unlike an array, a Series with the default integer labels does not support negative indexing: `a[-1]` is looked up as a label and raises a `KeyError`. Use `a.iloc[-1]` for positional access.\n", "\n", "The labels passed as `index` do not have to be integers:" ] }, { "cell_type": "code", "execution_count": 7, "id": "3d1e7912", "metadata": {}, "outputs": [], "source": [ "a = pd.Series([1, 2, 3, 4], index=[\"a\", \"b\", \"c\", \"d\"])" ] }, { "cell_type": "code", "execution_count": 8, "id": "54b15c7d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a 1\n", "b 2\n", "c 3\n", "d 4\n", "dtype: int64" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a" ] }, { "cell_type": "code", "execution_count": 9, "id": "6140bc65", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a['b']" ] }, { "cell_type": "code", "execution_count": 10, "id": "5fa2ea7d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['a', 'b', 'c', 'd'], dtype='object')" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.index" ] }, { "cell_type": "markdown", "id": "766c1e86", "metadata": {}, "source": [ "A Series with non-integer labels can still be accessed by integer position through `.iloc`:" ] }, { "cell_type": "code", "execution_count": 11, "id": "f03dd793", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.iloc[0]" ] }, { "cell_type": 
"markdown", "id": "489fe8ea", "metadata": {}, "source": [ "### Creating from a dictionary" ] }, { "cell_type": "markdown", "id": "4d3ccf6f", "metadata": {}, "source": [ "Create a Series from a dictionary; the keys become the labels:" ] }, { "cell_type": "code", "execution_count": 12, "id": "4ceb08b6", "metadata": {}, "outputs": [], "source": [ "d = {\"c\": 3, \"b\": 2, \"a\": 1}" ] }, { "cell_type": "code", "execution_count": 13, "id": "ced2103a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "c 3\n", "b 2\n", "a 1\n", "dtype: int64" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.Series(d)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "e61ee367", "metadata": {}, "source": [ "If `index` is given, Pandas reads the values from the dictionary in the order that `index` specifies and assigns `np.nan` to labels that are not keys of the dictionary. Dictionary keys that do not appear in `index` (here `a`) are dropped, and because `np.nan` is a float the result has dtype `float64`:" ] }, { "cell_type": "code", "execution_count": 14, "id": "22a8670d", "metadata": {}, "outputs": [], "source": [ "a = pd.Series(d, index=['c', 'd', 'b', 'e'])" ] }, { "cell_type": "code", "execution_count": 15, "id": "c8631195", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "c 3.0\n", "d NaN\n", "b 2.0\n", "e NaN\n", "dtype: float64" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a" ] }, { "attachments": {}, "cell_type": "markdown", "id": "3b3fde53", "metadata": {}, "source": [ "### Creating from a scalar\n", "\n", "A Series can also be created from a scalar. The `index` parameter determines the length, and every value equals the scalar:" ] }, { "cell_type": "code", "execution_count": 16, "id": "d3ac532d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 5\n", "1 5\n", "2 5\n", "dtype: int64" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.Series(5, index=range(3))" ] }, { "cell_type": "code", "execution_count": 17, "id": "6d629693", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a 5\n", "b 5\n", "c 5\n", "d 5\n", "dtype: int64" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.Series(5, index=[\"a\", \"b\", \"c\", \"d\"])" ] }, { 
"cell_type": "markdown", "id": "af3b4401", "metadata": {}, "source": [ "## Using a Series" ] }, { "attachments": {}, "cell_type": "markdown", "id": "a589103a", "metadata": {}, "source": [ "A Series can be used like an array or like a dictionary." ] }, { "attachments": {}, "cell_type": "markdown", "id": "d3400ec0", "metadata": {}, "source": [ "### Using it like an array\n", "\n", "A Series can be created from an array, and it supports many array operations:" ] }, { "cell_type": "code", "execution_count": 18, "id": "a790798b", "metadata": {}, "outputs": [], "source": [ "s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])" ] }, { "cell_type": "code", "execution_count": 19, "id": "bcf80864", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a 0.511909\n", "b 0.936458\n", "c 1.273640\n", "d 0.406360\n", "e 0.070895\n", "dtype: float64" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s" ] }, { "attachments": {}, "cell_type": "markdown", "id": "7d43aaef", "metadata": {}, "source": [ "Although the labels are not integers, the Series can still be indexed by position like an array, with `.iloc` for a single element and slice notation for a range:" ] }, { "cell_type": "code", "execution_count": 20, "id": "ac4a2754", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.5119086238174924" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.iloc[0]" ] }, { "cell_type": "code", "execution_count": 21, "id": "f537d078", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a 0.511909\n", "b 0.936458\n", "c 1.273640\n", "dtype: float64" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[:3]" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c8ac0b58", "metadata": {}, "source": [ "Boolean indexing also works:" ] }, { "cell_type": "code", "execution_count": 22, "id": "6645b00b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "b 0.936458\n", "c 1.273640\n", "dtype: float64" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[s > s.median()]" ] }, { "attachments": {}, "cell_type": "markdown", "id": "37e8a602", 
"metadata": {}, "source": [ "Like NumPy arrays, a Series also supports fancy indexing, which selects several elements at once by position:" ] }, { "cell_type": "code", "execution_count": 23, "id": "9f809d87", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "e 0.070895\n", "d 0.406360\n", "b 0.936458\n", "dtype: float64" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.iloc[[4, 3, 1]]" ] }, { "attachments": {}, "cell_type": "markdown", "id": "759e5aec", "metadata": {}, "source": [ "Many NumPy functions can be applied directly to a Series, and the result is again a Series:" ] }, { "cell_type": "code", "execution_count": 24, "id": "b08e3728", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a 1.668473\n", "b 2.550930\n", "c 3.573839\n", "d 1.501342\n", "e 1.073468\n", "dtype: float64" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.exp(s)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "96e4b631", "metadata": {}, "source": [ "### Using it like a dictionary" ] }, { "attachments": {}, "cell_type": "markdown", "id": "2ae10147", "metadata": {}, "source": [ "A Series can also be used like a dictionary. The labels act as the keys, so values can be looked up and assigned by label:" ] }, { "cell_type": "code", "execution_count": 25, "id": "984ede55", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.5119086238174924" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s['a']" ] }, { "cell_type": "code", "execution_count": 26, "id": "0f14b4bd", "metadata": {}, "outputs": [], "source": [ "s['e'] = 12" ] }, { "cell_type": "code", "execution_count": 27, "id": "e9a26f8e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a 0.511909\n", "b 0.936458\n", "c 1.273640\n", "d 0.406360\n", "e 12.000000\n", "dtype: float64" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s" ] }, { "attachments": {}, "cell_type": "markdown", "id": "6ddcb8db", "metadata": {}, "source": [ "The `in` keyword checks whether a label exists in the Series:" ] }, { "cell_type": "code", "execution_count": 28, "id": "53181c6e", "metadata": {}, 
"outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'e' in s" ] }, { "cell_type": "code", "execution_count": 29, "id": "a59d7ef4", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "0 in s" ] }, { "attachments": {}, "cell_type": "markdown", "id": "6a49b429", "metadata": {}, "source": [ "Like a dictionary, a Series has a `.get()` method that can be called with a label that does not exist. It returns `None` by default, or the supplied default value, instead of raising a `KeyError`:" ] }, { "cell_type": "code", "execution_count": 30, "id": "5682c5ac", "metadata": {}, "outputs": [], "source": [ "s.get('f')" ] }, { "cell_type": "code", "execution_count": 31, "id": "769f1cae", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "nan" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.get('f', np.nan)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "f64e019b", "metadata": {}, "source": [ "### Arithmetic and label alignment" ] }, { "cell_type": "markdown", "id": "87842941", "metadata": {}, "source": [ "Basic arithmetic operations are applied element by element:" ] }, { "cell_type": "code", "execution_count": 32, "id": "cfe3365b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a 1.023817\n", "b 1.872916\n", "c 2.547281\n", "d 0.812719\n", "e 24.000000\n", "dtype: float64" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s + s" ] }, { "cell_type": "code", "execution_count": 33, "id": "f7da1e41", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a 1.023817\n", "b 1.872916\n", "c 2.547281\n", "d 0.812719\n", "e 24.000000\n", "dtype: float64" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s * 2" ] }, { "cell_type": "code", "execution_count": 34, "id": "d4fc5d9d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a 1.668473\n", "b 2.550930\n", "c 3.573839\n", "d 1.501342\n", "e 162754.791419\n", "dtype: float64" ] }, "execution_count": 34, 
"metadata": {}, "output_type": "execute_result" } ], "source": [ "np.exp(s)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "4fed949d", "metadata": {}, "source": [ "There is, however, one essential difference between arrays and Series. An array has only positions and no labels, whereas a Series is labelled: when two Series are added, their values are first aligned by label.\n", "\n", "For example, `s[1:]` has labels `b` through `e` and `s[:-1]` has labels `a` through `d`. When they are added, each Series is first filled with `np.nan` at the labels found only in the other, and the aligned values are then added, which gives:" ] }, { "cell_type": "code", "execution_count": 35, "id": "21fdc61e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "b 0.936458\n", "c 1.273640\n", "d 0.406360\n", "e 12.000000\n", "dtype: float64" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[1:]" ] }, { "cell_type": "code", "execution_count": 36, "id": "48379986", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a 0.511909\n", "b 0.936458\n", "c 1.273640\n", "d 0.406360\n", "dtype: float64" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[:-1]" ] }, { "cell_type": "code", "execution_count": 37, "id": "4ab92ce1", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a NaN\n", "b 1.872916\n", "c 2.547281\n", "d 0.812719\n", "e NaN\n", "dtype: float64" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[1:] + s[:-1]" ] }, { "cell_type": "code", "execution_count": null, "id": "85a3b0c8", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.10" } }, "nbformat": 4, "nbformat_minor": 5 }