{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 11.6 Resampling and Frequency Conversion(重采样和频度转换)\n", "\n", "重采样(Resampling)指的是把时间序列的频度变为另一个频度的过程。把高频度的数据变为低频度叫做降采样(downsampling),把低频度变为高频度叫做增采样(upsampling)。并不是所有的重采样都会落入上面这几个类型,例如,把W-WED(weekly on Wednesday)变为W-FRI,既不属于降采样,也不属于增采样。\n", "\n", "pandas对象自带resampe方法,用于所有的频度变化。resample有一个和groupby类似的API;我们可以用resample来对数据进行分组,然后调用聚合函数(aggregation function):" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [], "source": [ "rng = pd.date_range('2000-01-01', periods=100, freq='D')" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2000-01-01 0.141136\n", "2000-01-02 0.955511\n", "2000-01-03 -0.334537\n", "2000-01-04 0.927611\n", "2000-01-05 0.522567\n", "2000-01-06 0.843023\n", "2000-01-07 0.108661\n", "2000-01-08 0.805668\n", "2000-01-09 -0.470524\n", "2000-01-10 1.162150\n", "2000-01-11 -0.754087\n", "2000-01-12 -1.846421\n", "2000-01-13 -0.322607\n", "2000-01-14 0.769992\n", "2000-01-15 -0.596838\n", "2000-01-16 0.865629\n", "2000-01-17 -0.394363\n", "2000-01-18 1.050334\n", "2000-01-19 0.203739\n", "2000-01-20 0.112178\n", "2000-01-21 -1.858528\n", "2000-01-22 0.921361\n", "2000-01-23 -1.034003\n", "2000-01-24 -0.319369\n", "2000-01-25 0.626385\n", "2000-01-26 2.319831\n", "2000-01-27 0.640064\n", "2000-01-28 0.762187\n", "2000-01-29 -0.053246\n", "2000-01-30 0.500993\n", " ... \n", "2000-03-11 -1.036658\n", "2000-03-12 0.569500\n", "2000-03-13 -0.279623\n", "2000-03-14 -1.593708\n", "2000-03-15 -1.552634\n", "2000-03-16 0.983931\n", "2000-03-17 0.269289\n", "2000-03-18 0.870814\n", "2000-03-19 1.642178\n", "2000-03-20 -0.109097\n", "2000-03-21 -1.891613\n", "2000-03-22 -1.867747\n", "2000-03-23 -0.173888\n", "2000-03-24 0.879418\n", "2000-03-25 0.814583\n", "2000-03-26 -1.683395\n", "2000-03-27 -0.141228\n", "2000-03-28 0.392206\n", "2000-03-29 -1.288983\n", "2000-03-30 1.052897\n", "2000-03-31 -0.297663\n", "2000-04-01 1.050265\n", "2000-04-02 -0.072390\n", "2000-04-03 1.482098\n", "2000-04-04 -0.276297\n", "2000-04-05 0.686525\n", "2000-04-06 1.368484\n", "2000-04-07 0.294756\n", "2000-04-08 1.237246\n", "2000-04-09 1.372567\n", "Freq: D, dtype: float64" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ts = pd.Series(np.random.randn(len(rng)), index=rng)\n", "ts" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2000-01-31 -0.207554\n", "2000-02-29 0.299003\n", "2000-03-31 -0.095402\n", "2000-04-30 -0.146846\n", "Freq: M, dtype: float64" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ts.resample('M').mean()" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2000-01 0.210165\n", "2000-02 -0.051811\n", "2000-03 -0.131131\n", "2000-04 0.793695\n", "Freq: M, dtype: float64" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ts.resample('M', kind='period').mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "resample是一个灵活且高效的方法,可以用于处理大量的时间序列。下面是一些相关的选项:\n", "\n", "![](http://oydgk2hgw.bkt.clouddn.com/pydata-book/nwc0s.png)\n", "\n", "# 1 Downsampling(降采样)\n", "\n", "把数据聚合为规律、低频度是一个很普通的时间序列任务。用于处理的数据不必是有固定频度的;我们想要设定的频度会定义箱界(bin edges),根据bin edges会把时间序列分割为多个片段,然后进行聚合。例如,转换为月度,比如'M'或'BM',我们需要把数据以月为间隔进行切割。每一个间隔都是半开放的(half-open);一个数据点只能属于一个间隔,所有间隔的合集,构成整个时间范围(time frame)。当使用resample去降采样数据的时候,有很多事情需要考虑:\n", "\n", "- 在每个间隔里,哪一边要闭合\n", "- 怎样对每一个聚合的bin贴标签,可以使用间隔的开始或结束\n", "\n", "为了演示一下,下面用一个一分钟的数据来举例:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": true }, "outputs": [], "source": [ "rng = pd.date_range('2000-01-01', periods=12, freq='T')" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2000-01-01 00:00:00 0\n", "2000-01-01 00:01:00 1\n", "2000-01-01 00:02:00 2\n", "2000-01-01 00:03:00 3\n", "2000-01-01 00:04:00 4\n", "2000-01-01 00:05:00 5\n", "2000-01-01 00:06:00 6\n", "2000-01-01 00:07:00 7\n", "2000-01-01 00:08:00 8\n", "2000-01-01 00:09:00 9\n", "2000-01-01 00:10:00 10\n", "2000-01-01 00:11:00 11\n", "Freq: T, dtype: int64" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ts = pd.Series(np.arange(12), index=rng)\n", "ts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "假设我们想要按5分钟一个数据块来进行聚合,然后对每一个组计算总和:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1999-12-31 23:55:00 0\n", "2000-01-01 00:00:00 15\n", "2000-01-01 00:05:00 40\n", "2000-01-01 00:10:00 11\n", "Freq: 5T, dtype: int64" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ts.resample('5min', closed='right').sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "我们传入的频度定义了每个bin的边界按5分钟递增。默认,bin的左边界是闭合的,所以`00:00`值是属于`00:00`到`00:05`间隔的。设定closed='right',会让间隔的右边闭合:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1999-12-31 23:55:00 0\n", "2000-01-01 00:00:00 15\n", "2000-01-01 00:05:00 40\n", "2000-01-01 00:10:00 11\n", "Freq: 5T, dtype: int64" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ts.resample('5min', closed='right').sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "默认,每一个bin的左边的时间戳,会被用来作为结果里时间序列的标签。通过设置label='right',我们可以使用bin右边的时间戳来作为标签:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2000-01-01 00:00:00 0\n", "2000-01-01 00:05:00 15\n", "2000-01-01 00:10:00 40\n", "2000-01-01 00:15:00 11\n", "Freq: 5T, dtype: int64" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ts.resample('5min', closed='right', label='right').sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "可以看下图方便理解:\n", "\n", "![](http://oydgk2hgw.bkt.clouddn.com/pydata-book/d770h.png)\n", "\n", "最后,我们可能想要对结果的索引进行位移,比如在右边界减少一秒。想要实现的话,传递一个字符串或日期偏移给loffset:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1999-12-31 23:59:59 0\n", "2000-01-01 00:04:59 15\n", "2000-01-01 00:09:59 40\n", "2000-01-01 00:14:59 11\n", "Freq: 5T, dtype: int64" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ts.resample('5min', closed='right', \n", " label='right', loffset='-1s').sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "我们也可以使用shift方法来实现上面loffset的效果。\n", "\n", "### Open-High-Low-Close (OHLC) resampling(股价图重取样)\n", "\n", "> Open-High-Low-Close: 开盘-盘高-盘低-收盘图;股票图;股价图\n", "\n", "在经济界,一个比较流行的用法,是对时间序列进行聚合,计算每一个桶(bucket)里的四个值:first(open),last(close),maximum(high),minimal(low),即开盘-收盘-盘高-盘低,四个值。使用ohlc聚合函数可以得到这四个聚合结果:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
openhighlowclose
2000-01-01 00:00:000404
2000-01-01 00:05:005959
2000-01-01 00:10:0010111011
\n", "
" ], "text/plain": [ " open high low close\n", "2000-01-01 00:00:00 0 4 0 4\n", "2000-01-01 00:05:00 5 9 5 9\n", "2000-01-01 00:10:00 10 11 10 11" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ts.resample('5min').ohlc()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 2 Upsampling and Interpolation(增采样和插值)\n", "\n", "把一个低频度转换为高频度,是不需要进行聚合的。下面是一个有周数据的DataFrame:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ColoradoTexasNew YorkOhio
2000-01-050.1383551.8815170.6553671.496932
2000-01-12-1.125212-0.8243370.803721-0.672660
\n", "
" ], "text/plain": [ " Colorado Texas New York Ohio\n", "2000-01-05 0.138355 1.881517 0.655367 1.496932\n", "2000-01-12 -1.125212 -0.824337 0.803721 -0.672660" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "frame = pd.DataFrame(np.random.randn(2, 4),\n", " index=pd.date_range('1/1/2000', periods=2,\n", " freq='W-WED'),\n", " columns=['Colorado', 'Texas', 'New York', 'Ohio'])\n", "frame" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "当我们对这个数据进行聚合的的时候,每个组只有一个值,以及gap(间隔)之间的缺失值。在不使用任何聚合函数的情况下,我们使用asfreq方法将其转换为高频度:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ColoradoTexasNew YorkOhio
2000-01-050.1383551.8815170.6553671.496932
2000-01-06NaNNaNNaNNaN
2000-01-07NaNNaNNaNNaN
2000-01-08NaNNaNNaNNaN
2000-01-09NaNNaNNaNNaN
2000-01-10NaNNaNNaNNaN
2000-01-11NaNNaNNaNNaN
2000-01-12-1.125212-0.8243370.803721-0.672660
\n", "
" ], "text/plain": [ " Colorado Texas New York Ohio\n", "2000-01-05 0.138355 1.881517 0.655367 1.496932\n", "2000-01-06 NaN NaN NaN NaN\n", "2000-01-07 NaN NaN NaN NaN\n", "2000-01-08 NaN NaN NaN NaN\n", "2000-01-09 NaN NaN NaN NaN\n", "2000-01-10 NaN NaN NaN NaN\n", "2000-01-11 NaN NaN NaN NaN\n", "2000-01-12 -1.125212 -0.824337 0.803721 -0.672660" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_daily = frame.resample('D').asfreq()\n", "df_daily" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "假设我们想要用每周的值来填写非周三的部分。这种方法叫做填充(filling)或插值(interpolation),可以使用fillna或reindex方法来实现重采样:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ColoradoTexasNew YorkOhio
2000-01-050.1383551.8815170.6553671.496932
2000-01-060.1383551.8815170.6553671.496932
2000-01-070.1383551.8815170.6553671.496932
2000-01-080.1383551.8815170.6553671.496932
2000-01-090.1383551.8815170.6553671.496932
2000-01-100.1383551.8815170.6553671.496932
2000-01-110.1383551.8815170.6553671.496932
2000-01-12-1.125212-0.8243370.803721-0.672660
\n", "
" ], "text/plain": [ " Colorado Texas New York Ohio\n", "2000-01-05 0.138355 1.881517 0.655367 1.496932\n", "2000-01-06 0.138355 1.881517 0.655367 1.496932\n", "2000-01-07 0.138355 1.881517 0.655367 1.496932\n", "2000-01-08 0.138355 1.881517 0.655367 1.496932\n", "2000-01-09 0.138355 1.881517 0.655367 1.496932\n", "2000-01-10 0.138355 1.881517 0.655367 1.496932\n", "2000-01-11 0.138355 1.881517 0.655367 1.496932\n", "2000-01-12 -1.125212 -0.824337 0.803721 -0.672660" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "frame.resample('D').ffill()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "我们可以选择只对一部分的周期进行填写:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ColoradoTexasNew YorkOhio
2000-01-050.1383551.8815170.6553671.496932
2000-01-060.1383551.8815170.6553671.496932
2000-01-070.1383551.8815170.6553671.496932
2000-01-08NaNNaNNaNNaN
2000-01-09NaNNaNNaNNaN
2000-01-10NaNNaNNaNNaN
2000-01-11NaNNaNNaNNaN
2000-01-12-1.125212-0.8243370.803721-0.672660
\n", "
" ], "text/plain": [ " Colorado Texas New York Ohio\n", "2000-01-05 0.138355 1.881517 0.655367 1.496932\n", "2000-01-06 0.138355 1.881517 0.655367 1.496932\n", "2000-01-07 0.138355 1.881517 0.655367 1.496932\n", "2000-01-08 NaN NaN NaN NaN\n", "2000-01-09 NaN NaN NaN NaN\n", "2000-01-10 NaN NaN NaN NaN\n", "2000-01-11 NaN NaN NaN NaN\n", "2000-01-12 -1.125212 -0.824337 0.803721 -0.672660" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "frame.resample('D').ffill(limit=2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "注意,新的日期索引不能与旧的有重叠:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ColoradoTexasNew YorkOhio
2000-01-060.1383551.8815170.6553671.496932
2000-01-13-1.125212-0.8243370.803721-0.672660
\n", "
" ], "text/plain": [ " Colorado Texas New York Ohio\n", "2000-01-06 0.138355 1.881517 0.655367 1.496932\n", "2000-01-13 -1.125212 -0.824337 0.803721 -0.672660" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "frame.resample('W-THU').ffill()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 3 Resampling with Periods(对周期进行重采样)\n", "\n", "对周期的索引进行重采样的过程,与之前时间戳的方法相似:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ColoradoTexasNew YorkOhio
2000-011.4510950.236027-1.1147851.245450
2000-021.720449-0.724853-1.8706761.089338
2000-030.411774-0.7859791.7490240.164739
2000-04-1.549051-0.0507220.002775-1.606657
2000-051.0119980.149377-1.6082620.992927
\n", "
" ], "text/plain": [ " Colorado Texas New York Ohio\n", "2000-01 1.451095 0.236027 -1.114785 1.245450\n", "2000-02 1.720449 -0.724853 -1.870676 1.089338\n", "2000-03 0.411774 -0.785979 1.749024 0.164739\n", "2000-04 -1.549051 -0.050722 0.002775 -1.606657\n", "2000-05 1.011998 0.149377 -1.608262 0.992927" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "frame = pd.DataFrame(np.random.randn(24, 4),\n", " index=pd.period_range('1-2000', '12-2001',\n", " freq='M'),\n", " columns=['Colorado', 'Texas', 'New York', 'Ohio'])\n", "frame[:5]" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ColoradoTexasNew YorkOhio
20000.208662-0.109971-0.2334640.138465
2001-0.4019460.368050-0.209196-0.155851
\n", "
" ], "text/plain": [ " Colorado Texas New York Ohio\n", "2000 0.208662 -0.109971 -0.233464 0.138465\n", "2001 -0.401946 0.368050 -0.209196 -0.155851" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "annual_frame = frame.resample('A-DEC').mean()\n", "annual_frame" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "增采样需要考虑的要多一些,比如在重采样前,选择哪一个时间跨度作为结束,就像asfreq方法那样。convertion参数默认是'start',但也能用'end':" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ColoradoTexasNew YorkOhio
2000Q10.208662-0.109971-0.2334640.138465
2000Q20.208662-0.109971-0.2334640.138465
2000Q30.208662-0.109971-0.2334640.138465
2000Q40.208662-0.109971-0.2334640.138465
2001Q1-0.4019460.368050-0.209196-0.155851
2001Q2-0.4019460.368050-0.209196-0.155851
2001Q3-0.4019460.368050-0.209196-0.155851
2001Q4-0.4019460.368050-0.209196-0.155851
\n", "
" ], "text/plain": [ " Colorado Texas New York Ohio\n", "2000Q1 0.208662 -0.109971 -0.233464 0.138465\n", "2000Q2 0.208662 -0.109971 -0.233464 0.138465\n", "2000Q3 0.208662 -0.109971 -0.233464 0.138465\n", "2000Q4 0.208662 -0.109971 -0.233464 0.138465\n", "2001Q1 -0.401946 0.368050 -0.209196 -0.155851\n", "2001Q2 -0.401946 0.368050 -0.209196 -0.155851\n", "2001Q3 -0.401946 0.368050 -0.209196 -0.155851\n", "2001Q4 -0.401946 0.368050 -0.209196 -0.155851" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Q-DEC: Quarterly, year ending in December\n", "annual_frame.resample('Q-DEC').ffill()" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ColoradoTexasNew YorkOhio
2000Q40.208662-0.109971-0.2334640.138465
2001Q10.208662-0.109971-0.2334640.138465
2001Q20.208662-0.109971-0.2334640.138465
2001Q30.208662-0.109971-0.2334640.138465
2001Q4-0.4019460.368050-0.209196-0.155851
\n", "
" ], "text/plain": [ " Colorado Texas New York Ohio\n", "2000Q4 0.208662 -0.109971 -0.233464 0.138465\n", "2001Q1 0.208662 -0.109971 -0.233464 0.138465\n", "2001Q2 0.208662 -0.109971 -0.233464 0.138465\n", "2001Q3 0.208662 -0.109971 -0.233464 0.138465\n", "2001Q4 -0.401946 0.368050 -0.209196 -0.155851" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "annual_frame.resample('Q-DEC', convention='end').ffill()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "增采样和降采样的规则更严格一些:\n", "\n", "- 降采样中,目标频度必须是原频度的子周期(subperiod)\n", "- 增采样中,目标频度必须是原频度的母周期(superperiod)\n", "\n", "如果不满足上面的规则,会报错。主要会影响到季度,年度,周度频度;例如,用Q-MAR定义的时间跨度只与A-MAR, A-JUN, A-SEP, A-DEC进行对齐(line up with):" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ColoradoTexasNew YorkOhio
2000Q40.208662-0.109971-0.2334640.138465
2001Q10.208662-0.109971-0.2334640.138465
2001Q20.208662-0.109971-0.2334640.138465
2001Q30.208662-0.109971-0.2334640.138465
2001Q4-0.4019460.368050-0.209196-0.155851
2002Q1-0.4019460.368050-0.209196-0.155851
2002Q2-0.4019460.368050-0.209196-0.155851
2002Q3-0.4019460.368050-0.209196-0.155851
\n", "
" ], "text/plain": [ " Colorado Texas New York Ohio\n", "2000Q4 0.208662 -0.109971 -0.233464 0.138465\n", "2001Q1 0.208662 -0.109971 -0.233464 0.138465\n", "2001Q2 0.208662 -0.109971 -0.233464 0.138465\n", "2001Q3 0.208662 -0.109971 -0.233464 0.138465\n", "2001Q4 -0.401946 0.368050 -0.209196 -0.155851\n", "2002Q1 -0.401946 0.368050 -0.209196 -0.155851\n", "2002Q2 -0.401946 0.368050 -0.209196 -0.155851\n", "2002Q3 -0.401946 0.368050 -0.209196 -0.155851" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "annual_frame.resample('Q-MAR').ffill()" ] } ], "metadata": { "kernelspec": { "display_name": "Python [py35]", "language": "python", "name": "Python [py35]" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 0 }