{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "67b5393b",
   "metadata": {},
   "source": [
    "# 一维数据结构Series对象\n",
    "\n",
    "Pandas模块中有两种主要的数据结构:一维数据结构Series和二维数据结构DataFrame,这两种数据结构能处理各种常见类型的数据。其中,又以二维数据结构DataFrame最为常用。\n",
    "\n",
    "在Pandas中,一维数据结构Series可以存储任意类型的数据,包括整数、浮点数、字符串、Python 对象等。"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "8bca6859",
   "metadata": {},
   "source": [
    "## Series对象的生成\n",
    "\n",
    "Pandas模块与NumPy模块需要配合使用。导入相关模块:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "adbc11b1",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "8e4305be",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "f46cf58d",
   "metadata": {},
   "source": [
    "Series对象的构造方法为:\n",
    "\n",
    "```python\n",
    "pd.Series(data=None, index=None, dtype=None, name=None)\n",
    "```\n",
    "\n",
    "其中,各参数的含义为:\n",
    "- data参数可以是列表、元组或者一维数组,也可以是字典,还可以是标量值;\n",
    "- index参数是一个与data大小相同的数组或索引,表示Series对象的标记;\n",
    "- 与NumPy数组一样,Series对象中的数据必须是同一类型的,不指定dtype参数时,Pandas会根据data中的数据进行推断。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7de0edbf",
   "metadata": {},
   "source": [
    "### 使用数组生成"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "379f3148",
   "metadata": {},
   "source": [
    "使用数组生成Series对象:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "16c2437a",
   "metadata": {},
   "outputs": [],
   "source": [
    "a = pd.Series([1, 2, 3, 4])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "ea31c105",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    1\n",
       "1    2\n",
       "2    3\n",
       "3    4\n",
       "dtype: int64"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "bde1d730",
   "metadata": {},
   "source": [
    "左栏是该Series对象的标记即index参数需要指定的内容,右边是对应的数据。在不指定index参数的情况下,标记默认是RangeIndex(n),其中n是data的长度。标记可以用`.index`属性查看:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "08151532",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "RangeIndex(start=0, stop=4, step=1)"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a.index"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "c782c3f9",
   "metadata": {},
   "source": [
    "可以用标记来索引对应位置的值:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "b063318a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a[0]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "d6b177c3",
   "metadata": {},
   "source": [
    "Series对象的标记类似于字典,因此与数组不同的,Series不支持负数索引。\n",
    "\n",
    "index参数可以不是整数:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "3d1e7912",
   "metadata": {},
   "outputs": [],
   "source": [
    "a = pd.Series([1, 2, 3, 4], index=[\"a\", \"b\", \"c\", \"d\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "54b15c7d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    1\n",
       "b    2\n",
       "c    3\n",
       "d    4\n",
       "dtype: int64"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "6140bc65",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a['b']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "5fa2ea7d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['a', 'b', 'c', 'd'], dtype='object')"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a.index"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "766c1e86",
   "metadata": {},
   "source": [
    "不过Series对象也可以用数字索引:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "f03dd793",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "489fe8ea",
   "metadata": {},
   "source": [
    "### 使用字典生成"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4d3ccf6f",
   "metadata": {},
   "source": [
    "使用字典生成Series对象:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "4ceb08b6",
   "metadata": {},
   "outputs": [],
   "source": [
    "d = {\"c\": 3, \"b\": 2, \"a\": 1}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "ced2103a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "c    3\n",
       "b    2\n",
       "a    1\n",
       "dtype: int64"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.Series(d)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "e61ee367",
   "metadata": {},
   "source": [
    "如果指定了index参数,Pandas会按照参数指定的顺序,从字典中依次读取相应的值,并让不存在的键对应`np.nan`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "22a8670d",
   "metadata": {},
   "outputs": [],
   "source": [
    "a = pd.Series(d, index=['c', 'd', 'b', 'e'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "c8631195",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "c    3.0\n",
       "d    NaN\n",
       "b    2.0\n",
       "e    NaN\n",
       "dtype: float64"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "3b3fde53",
   "metadata": {},
   "source": [
    "### 使用标量生成\n",
    "\n",
    "Series对象还可以通过标量生成,通过指定index参数,产生一个指定大小且值全为该标量的Series对象:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "d3ac532d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    5\n",
       "1    5\n",
       "2    5\n",
       "dtype: int64"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.Series(5, index=range(3))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "6d629693",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    5\n",
       "b    5\n",
       "c    5\n",
       "d    5\n",
       "dtype: int64"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.Series(5, index=[\"a\", \"b\", \"c\", \"d\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "af3b4401",
   "metadata": {},
   "source": [
    "## Series对象的使用"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "a589103a",
   "metadata": {},
   "source": [
    "Series对象可以像数组或字典一样使用。"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "d3400ec0",
   "metadata": {},
   "source": [
    "### 像数组一样使用\n",
    "\n",
    "Series对象可以从数组中生成,也支持一些数组的操作:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "a790798b",
   "metadata": {},
   "outputs": [],
   "source": [
    "s = pd.Series(np.random.randn(5),index=['a', 'b', 'c', 'd', 'e'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "bcf80864",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    0.511909\n",
       "b    0.936458\n",
       "c    1.273640\n",
       "d    0.406360\n",
       "e    0.070895\n",
       "dtype: float64"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "7d43aaef",
   "metadata": {},
   "source": [
    "虽然标记不是数字,仍然可以像数组一样按照位置顺序对它进行索引:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "ac4a2754",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.5119086238174924"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "f537d078",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    0.511909\n",
       "b    0.936458\n",
       "c    1.273640\n",
       "dtype: float64"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s[:3]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "c8ac0b58",
   "metadata": {},
   "source": [
    "也可以使用布尔值进行索引:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "6645b00b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "b    0.936458\n",
       "c    1.273640\n",
       "dtype: float64"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s[s > s.median()]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "37e8a602",
   "metadata": {},
   "source": [
    "Series对象还支持与NumPy数组类似的高级索引,同时索引多个元素:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "9f809d87",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "e    0.070895\n",
       "d    0.406360\n",
       "b    0.936458\n",
       "dtype: float64"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s[[4, 3, 1]]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "759e5aec",
   "metadata": {},
   "source": [
    "一些NumPy函数可以直接作用在Series对象上,返回的结果还是Series对象:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "b08e3728",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    1.668473\n",
       "b    2.550930\n",
       "c    3.573839\n",
       "d    1.501342\n",
       "e    1.073468\n",
       "dtype: float64"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.exp(s)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "96e4b631",
   "metadata": {},
   "source": [
    "### 像字典一样使用"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "2ae10147",
   "metadata": {},
   "source": [
    "Series对象也可以像字典一样的使用,标记就相当于字典的键,可以进行值的查询:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "984ede55",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.5119086238174924"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s['a']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "0f14b4bd",
   "metadata": {},
   "outputs": [],
   "source": [
    "s['e'] = 12"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "e9a26f8e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a     0.511909\n",
       "b     0.936458\n",
       "c     1.273640\n",
       "d     0.406360\n",
       "e    12.000000\n",
       "dtype: float64"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "6ddcb8db",
   "metadata": {},
   "source": [
    "可以用关键字in查看Series中是否存在某个标记:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "53181c6e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "'e' in s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "a59d7ef4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "0 in s"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "6a49b429",
   "metadata": {},
   "source": [
    "Series对象也支持用.get()方法索引不存在的标记:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "5682c5ac",
   "metadata": {},
   "outputs": [],
   "source": [
    "s.get('f')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "769f1cae",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "nan"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.get('f', np.nan)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "f64e019b",
   "metadata": {},
   "source": [
    "### 数学运算和标记对齐"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "87842941",
   "metadata": {},
   "source": [
    "基础的数学运算:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "cfe3365b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a     1.023817\n",
       "b     1.872916\n",
       "c     2.547281\n",
       "d     0.812719\n",
       "e    24.000000\n",
       "dtype: float64"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s + s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "f7da1e41",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a     1.023817\n",
       "b     1.872916\n",
       "c     2.547281\n",
       "d     0.812719\n",
       "e    24.000000\n",
       "dtype: float64"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s * 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "d4fc5d9d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a         1.668473\n",
       "b         2.550930\n",
       "c         3.573839\n",
       "d         1.501342\n",
       "e    162754.791419\n",
       "dtype: float64"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.exp(s)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "4fed949d",
   "metadata": {},
   "source": [
    "不过数组与Series对象有一个本质上的区别。数组只有顺序没有标记,而Series对象是有标记的,两个Series对象相加时,会根据标记的值进行对齐操作。\n",
    "\n",
    "例如,`s[1:]`的标记为b到e,而`s[:-1]`的标记为a到d,它们相加时,会首先对两个Series中各自独有的部分补上np.nan,然后再相加,从而得到:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "21fdc61e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "b     0.936458\n",
       "c     1.273640\n",
       "d     0.406360\n",
       "e    12.000000\n",
       "dtype: float64"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s[1:]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "id": "48379986",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    0.511909\n",
       "b    0.936458\n",
       "c    1.273640\n",
       "d    0.406360\n",
       "dtype: float64"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s[:-1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "4ab92ce1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a         NaN\n",
       "b    1.872916\n",
       "c    2.547281\n",
       "d    0.812719\n",
       "e         NaN\n",
       "dtype: float64"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s[1:] + s[:-1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "85a3b0c8",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}