{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# One-Dimensional Data Structure: Series" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Series` is a one-dimensional labeled array that can hold data of any type (integers, floats, strings, `Python` objects, and so on).\n", "\n", "Its axis labels are collectively called the `index`. The basic way to create one is\n", "\n", "    s = pd.Series(data, index=index)\n", "\n", "where `data` can be any of the following:\n", "\n", "- a dict\n", "- an `ndarray`\n", "- a scalar, such as `5`\n", "\n", "`index` is the list of labels for the single axis." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Constructing from an ndarray" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If `data` is an `ndarray`, `index` must have the same length as `data`:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "a -0.032806\n", "b 0.050207\n", "c -1.909697\n", "d -1.127865\n", "e -0.073793\n", "dtype: float64" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s = pd.Series(np.random.randn(5), index=[\"a\", \"b\", \"c\", \"d\", \"e\"])\n", "\n", "s" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Inspect the `index`:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Index([u'a', u'b', u'c', u'd', u'e'], dtype='object')" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.index" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If `index` is omitted, it defaults to `[0, ..., len(data) - 1]`:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 -0.376233\n", "1 -0.474349\n", "2 1.660590\n", "3 0.461434\n", "4 0.190965\n", "dtype: float64" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.Series(np.random.randn(5))" ] 
}, { "cell_type": "markdown", "metadata": {}, "source": [ "## Constructing from a dict" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If `data` is a `dict` and no `index` is given, the `index` is built from the `dict` keys. In the pandas version used here the keys are sorted; pandas 0.23 and later on Python 3.6+ keep the dict's insertion order instead:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "a 0\n", "b 1\n", "c 2\n", "dtype: float64" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = {'a' : 0., 'b' : 1., 'c' : 2.}\n", "\n", "pd.Series(d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If an `index` is given, each of its labels is used as a `key` to look up the corresponding `value` in the dict. A label with no matching `key` gets `NaN` (not a number, the default missing-value marker in `Pandas`):" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "b 1\n", "d NaN\n", "a 0\n", "dtype: float64" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.Series(d, index=['b', 'd', 'a'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Constructing from a scalar" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If `data` is a scalar, an `index` must be specified. The result is a `Series` of the same length as `index` in which every value equals `data`:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "a 5\n", "b 5\n", "c 5\n", "d 5\n", "e 5\n", "dtype: float64" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.Series(5., index=['a', 'b', 'c', 'd', 'e'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using a Series like an ndarray" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "a -0.032806\n", "b 0.050207\n", "c -1.909697\n", "d -1.127865\n", "e -0.073793\n", "dtype: float64" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "支持数字索引操作:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "-0.032806330572971713" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "切片:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "a -0.032806\n", "b 0.050207\n", "c -1.909697\n", "dtype: float64" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[:3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`mask` 索引:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "a -0.032806\n", "b 0.050207\n", "dtype: float64" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[s > s.median()]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "花式索引:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "e -0.073793\n", "d -1.127865\n", "b 0.050207\n", "dtype: float64" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[[4, 3, 1]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "支持 `numpy` 函数:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "a 0.967726\n", "b 1.051488\n", "c 0.148125\n", "d 0.323724\n", "e 0.928864\n", "dtype: float64" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.exp(s)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 像字典一样使用 Series" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "也可以像字典一样使用 `Series`:" ] }, { "cell_type": "code", 
"execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "-0.032806330572971713" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[\"a\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "修改数值:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "a -0.032806\n", "b 0.050207\n", "c -1.909697\n", "d -1.127865\n", "e 12.000000\n", "dtype: float64" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[\"e\"] = 12.\n", "\n", "s" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "查询 `key`:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"e\" in s" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"f\" in s" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "使用 `key` 索引时,如果不确定 `key` 在不在里面,可以用 `get` 方法,如果不存在返回 `None` 或者指定的默认值:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "nan" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.get(\"f\", np.nan)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 向量化操作" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "简单的向量操作与 `ndarray` 的表现一致:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "a -0.065613\n", "b 0.100413\n", "c -3.819395\n", "d -2.255729\n", "e 24.000000\n", "dtype: float64" ] }, "execution_count": 19, 
"metadata": {}, "output_type": "execute_result" } ], "source": [ "s + s" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "a -0.065613\n", "b 0.100413\n", "c -3.819395\n", "d -2.255729\n", "e 24.000000\n", "dtype: float64" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s * 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "但 `Series` 和 `ndarray` 不同的地方在于,`Series` 的操作默认是使用 `index` 的值进行对齐的,而不是相对位置:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "a NaN\n", "b 0.100413\n", "c -3.819395\n", "d -2.255729\n", "e NaN\n", "dtype: float64" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[1:] + s[:-1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "对于上面两个不能完全对齐的 `Series`,结果的 `index` 是两者 `index` 的并集,同时不能对齐的部分当作缺失值处理。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Name 属性" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "可以在定义时指定 `name` 属性:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'something'" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s = pd.Series(np.random.randn(5), name='something')\n", "s.name" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }