{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 11.2 Time Series Basics\n", "\n", "A basic kind of time series object in pandas is a Series indexed by timestamps. Outside of pandas, timestamps are usually represented as Python strings or datetime objects:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "from datetime import datetime" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),\n", "         datetime(2011, 1, 7), datetime(2011, 1, 8),\n", "         datetime(2011, 1, 10), datetime(2011, 1, 12)]" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2011-01-02    0.384868\n", "2011-01-05    0.669181\n", "2011-01-07    2.553288\n", "2011-01-08   -1.808783\n", "2011-01-10    1.180570\n", "2011-01-12   -0.928942\n", "dtype: float64" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ts = pd.Series(np.random.randn(6), index=dates)\n", "ts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Under the hood, these datetime objects have been put into a DatetimeIndex:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07', '2011-01-08',\n", "               '2011-01-10', '2011-01-12'],\n", "              dtype='datetime64[ns]', freq=None)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ts.index" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As with any other Series, arithmetic between differently indexed time series automatically aligns the values on their timestamps:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2011-01-02    0.384868\n", "2011-01-07    2.553288\n", "2011-01-10    1.180570\n", "dtype: float64" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ 
"ts[::2]" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2011-01-02 0.769735\n", "2011-01-05 NaN\n", "2011-01-07 5.106575\n", "2011-01-08 NaN\n", "2011-01-10 2.361140\n", "2011-01-12 NaN\n", "dtype: float64" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ts + ts[::2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "ts[::2]会在ts中,每隔两个元素选一个元素。\n", "\n", "pandas中的时间戳,是按numpy中的datetime64数据类型进行保存的,可以精确到纳秒的级别:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "dtype('\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ColoradoTexasNew YorkOhio
2001-05-02-0.4775170.7226850.337141-0.345072
2001-05-09-0.401860-0.4758210.685129-0.809288
2001-05-161.9005410.348590-0.805042-0.410077
2001-05-23-0.2208701.654666-0.846395-0.207802
2001-05-302.094319-0.9725881.276059-1.056146
\n", "" ], "text/plain": [ " Colorado Texas New York Ohio\n", "2001-05-02 -0.477517 0.722685 0.337141 -0.345072\n", "2001-05-09 -0.401860 -0.475821 0.685129 -0.809288\n", "2001-05-16 1.900541 0.348590 -0.805042 -0.410077\n", "2001-05-23 -0.220870 1.654666 -0.846395 -0.207802\n", "2001-05-30 2.094319 -0.972588 1.276059 -1.056146" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "long_df.loc['5-2001']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 2 Time Series with Duplicate Indices(重复索引的时间序列)\n", "\n", "在某些数据中,可能会遇到多个数据在同一时间戳下的情况:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": true }, "outputs": [], "source": [ "dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000', \n", " '1/2/2000', '1/3/2000'])" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2000-01-01 0\n", "2000-01-02 1\n", "2000-01-02 2\n", "2000-01-02 3\n", "2000-01-03 4\n", "dtype: int64" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dup_ts = pd.Series(np.arange(5), index=dates)\n", "dup_ts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "我们通过is_unique属性来查看index是否是唯一值:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dup_ts.index.is_unique" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "对这个时间序列取索引的的话, 要么得到标量,要么得到切片,这取决于时间戳是否是重复的:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "4" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dup_ts['1/3/2000'] # not duplicated" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false }, 
"outputs": [ { "data": { "text/plain": [ "2000-01-02 1\n", "2000-01-02 2\n", "2000-01-02 3\n", "dtype: int64" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dup_ts['1/2/2000'] # duplicated" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "假设我们想要聚合那些有重复时间戳的数据,一种方法是用groupby,设定level=0:" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2000-01-01 0\n", "2000-01-02 2\n", "2000-01-03 4\n", "dtype: int64" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grouped = dup_ts.groupby(level=0)\n", "grouped.mean()" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2000-01-01 1\n", "2000-01-02 3\n", "2000-01-03 1\n", "dtype: int64" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grouped.count()" ] } ], "metadata": { "kernelspec": { "display_name": "Python [py35]", "language": "python", "name": "Python [py35]" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 0 }