{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 11.4 Time Zone Handling(时区处理)\n", "\n", "> 格林威治标准时间GMT\n", "\n", "> 十七世纪,格林威治皇家天文台为了海上霸权的扩张计画而进行天体观测。1675年旧皇家观测所(Old Royal Observatory) 正式成立,到了1884年决定以通过格林威治的子午线作为划分地球东西两半球的经度零度。观测所门口墙上有一个标志24小时的时钟,显示当下的时间,对全球而言,这里所设定的时间是世界时间参考点,全球都以格林威治的时间作为标准来设定时间,这就是我们耳熟能详的「格林威治标准时间」(Greenwich Mean Time,简称G.M.T.)的由来,标示在手表上,则代表此表具有两地时间功能,也就是同时可以显示原居地和另一个国度的时间。\n", " \n", " \n", "> 世界协调时间UTC\n", "\n", "> 多数的两地时间表都以GMT来表示,但也有些两地时间表上看不到GMT字样,出现的反而是UTC这3个英文字母,究竟何谓UTC?事实上,UTC指的是Coordinated Universal Time- 世界协调时间(又称世界标准时间、世界统一时间),是经过平均太阳时(以格林威治时间GMT为准)、地轴运动修正后的新时标以及以「秒」为单位的国际原子时所综合精算而成的时间,计算过程相当严谨精密,因此若以「世界标准时间」的角度来说,UTC比GMT来得更加精准。其误差值必须保持在0.9秒以内,若大于0.9秒则由位于巴黎的国际地球自转事务中央局发布闰秒,使UTC与地球自转周期一致。所以基本上UTC的本质强调的是比GMT更为精确的世界时间标准,不过对于现行表款来说,GMT与UTC的功能与精确度是没有差别的。\n", "\n", "时区可以理解为UTC的偏移(offset),例如,在夏令时,纽约时间落后于UTC时间四个小时,而在一年的其他时间里,纽约时间落后于UTC时间五个小时。\n", "\n", "在python中,时区信息来自第三方的pytz库,这个库利用的是奥尔森数据库,这个数据库汇集了世界时区信息。这个信息对于历史数据很重要,因为夏令时(daylight saving time,DST)的交接日(transition date)取决于当地政府的心血来潮。在美国,自1900年后,夏令时的交接日已经被改了很多次。\n", "\n", "关于pytz库的更多信息,需要查看相关的文档。本书中pandas包含了一些pytz的功能,除了时区的名字,其他的API都不用去查。时区名字可以通过下面的方法获得:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pytz" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "['US/Eastern', 'US/Hawaii', 'US/Mountain', 'US/Pacific', 'UTC']" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pytz.common_timezones[-5:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "想要从pytz中得到一个时区对象(time zone object),使用pytz.timezone:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tz = pytz.timezone('America/New_York')\n", "tz" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 1 Time Zone Localization and Conversion(时区定位和转换)\n", "\n", "默认的,pandas中的时间序列是time zone naive(朴素时区)。例如,考虑下面的时间序列:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [], "source": [ "rng = pd.date_range('3/9/2012 9:30', periods=6, freq='D')" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2012-03-09 09:30:00 1.001642\n", "2012-03-10 09:30:00 -1.277818\n", "2012-03-11 09:30:00 0.481214\n", "2012-03-12 09:30:00 0.738525\n", "2012-03-13 09:30:00 0.396482\n", "2012-03-14 09:30:00 -0.269782\n", "Freq: D, dtype: float64" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ts = pd.Series(np.random.randn(len(rng)), index=rng)\n", "ts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "索引的tz部分是None:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "None\n" ] } ], "source": [ "print(ts.index.tz)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "日期范围也能通过时区集合(time zone set)来创建:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "DatetimeIndex(['2012-03-09 09:30:00+00:00', '2012-03-10 09:30:00+00:00',\n", " '2012-03-11 09:30:00+00:00', '2012-03-12 09:30:00+00:00',\n", " '2012-03-13 09:30:00+00:00', '2012-03-14 09:30:00+00:00',\n", " '2012-03-15 09:30:00+00:00', '2012-03-16 09:30:00+00:00',\n", " '2012-03-17 09:30:00+00:00', '2012-03-18 09:30:00+00:00'],\n", " dtype='datetime64[ns, UTC]', freq='D')" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.date_range('3/9/2012 9:30', periods=10, freq='D', tz='UTC')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "使用tz_localize方法,可以实现从朴素到本地化(naive to localized)的转变:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2012-03-09 09:30:00 1.001642\n", "2012-03-10 09:30:00 -1.277818\n", "2012-03-11 09:30:00 0.481214\n", "2012-03-12 09:30:00 0.738525\n", "2012-03-13 09:30:00 0.396482\n", "2012-03-14 09:30:00 -0.269782\n", "Freq: D, dtype: float64" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ts" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2012-03-09 09:30:00+00:00 1.001642\n", "2012-03-10 09:30:00+00:00 -1.277818\n", "2012-03-11 09:30:00+00:00 0.481214\n", "2012-03-12 09:30:00+00:00 0.738525\n", "2012-03-13 09:30:00+00:00 0.396482\n", "2012-03-14 09:30:00+00:00 -0.269782\n", "Freq: D, dtype: float64" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ts_utc = ts.tz_localize('UTC')\n", "ts_utc" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "DatetimeIndex(['2012-03-09 09:30:00+00:00', '2012-03-10 09:30:00+00:00',\n", " '2012-03-11 09:30:00+00:00', '2012-03-12 09:30:00+00:00',\n", " '2012-03-13 09:30:00+00:00', '2012-03-14 09:30:00+00:00'],\n", " dtype='datetime64[ns, UTC]', freq='D')" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ts_utc.index" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "一旦时间序列被定位到某个时区,那么它就可以被转换为任何其他时区,使用tz_convert:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2012-03-09 04:30:00-05:00 1.001642\n", "2012-03-10 04:30:00-05:00 -1.277818\n", "2012-03-11 05:30:00-04:00 0.481214\n", "2012-03-12 05:30:00-04:00 0.738525\n", "2012-03-13 05:30:00-04:00 0.396482\n", "2012-03-14 05:30:00-04:00 -0.269782\n", "Freq: D, dtype: float64" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ts_utc.tz_convert('America/New_York')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "在处理时间序列的时候,我们可以先把时间定位到纽约时间,然后转换到柏林时间:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": true }, "outputs": [], "source": [ "ts_eastern = ts.tz_localize('America/New_York')" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2012-03-09 14:30:00+00:00 1.001642\n", "2012-03-10 14:30:00+00:00 -1.277818\n", "2012-03-11 13:30:00+00:00 0.481214\n", "2012-03-12 13:30:00+00:00 0.738525\n", "2012-03-13 13:30:00+00:00 0.396482\n", "2012-03-14 13:30:00+00:00 -0.269782\n", "Freq: D, dtype: float64" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ts_eastern.tz_convert('UTC')" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2012-03-09 15:30:00+01:00 1.001642\n", "2012-03-10 15:30:00+01:00 -1.277818\n", "2012-03-11 14:30:00+01:00 0.481214\n", "2012-03-12 14:30:00+01:00 0.738525\n", "2012-03-13 14:30:00+01:00 0.396482\n", "2012-03-14 14:30:00+01:00 -0.269782\n", "Freq: D, dtype: float64" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ts_eastern.tz_convert('Europe/Berlin')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "tz_localize和tz_convert也是DatetimeIndex上的实例方法(instance methods):" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "DatetimeIndex(['2012-03-09 09:30:00+08:00', '2012-03-10 09:30:00+08:00',\n", " '2012-03-11 09:30:00+08:00', '2012-03-12 09:30:00+08:00',\n", " '2012-03-13 09:30:00+08:00', '2012-03-14 09:30:00+08:00'],\n", " dtype='datetime64[ns, Asia/Shanghai]', freq='D')" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ts.index.tz_localize('Asia/Shanghai')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "讲朴素的时间戳进行本地化,还会检查夏令时转换期附近是否有模糊的或不存在的时间。\n", "\n", "# 2 Operations with Time Zone−Aware Timestamp Objects(时区的操作-意识到时间戳对象)\n", "\n", "和时间序列或日期范围(date ranges)相似,单独的Timestamp object(时间戳对象)也能从朴素(即无时区)本地化为有时区的日期,然后就可以转换为其他时区了:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": true }, "outputs": [], "source": [ "stamp = pd.Timestamp('2011-03-12 04:00')" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": true }, "outputs": [], "source": [ "stamp_utc = stamp.tz_localize('utc')" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Timestamp('2011-03-11 23:00:00-0500', tz='America/New_York')" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stamp_utc.tz_convert('America/New_York')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "在创建Timestamp的时候,我们可以传递一个时区:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Timestamp('2011-03-12 04:00:00+0300', tz='Europe/Moscow')" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stamp_moscow = pd.Timestamp('2011-03-12 04:00', tz='Europe/Moscow')\n", "stamp_moscow" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "有时区的Timestamp对象内部存储了一个UTC时间戳,这个值是从Unix纪元(即1907年1月1日)到现在的纳秒;这个UTC值在即使换了不同的时区,也是不变的:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1299902400000000000" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stamp_utc.value" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1299902400000000000" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stamp_utc.tz_convert('America/New_York').value" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "在使用pandas的DateOffset对象进行算数运算的时候,如果夏令时存在,pandas也会考虑进去。这里我们构建一个时间戳,正好出现在夏令时转换前。首先,在变为夏令时的前30分钟:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from pandas.tseries.offsets import Hour" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Timestamp('2012-03-12 01:30:00-0400', tz='US/Eastern')" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stamp = pd.Timestamp('2012-03-12 01:30', tz='US/Eastern')\n", "stamp" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Timestamp('2012-03-12 02:30:00-0400', tz='US/Eastern')" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stamp + Hour()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "变为夏令时的90分钟前:" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Timestamp('2012-11-04 00:30:00-0400', tz='US/Eastern')" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stamp = pd.Timestamp('2012-11-04 00:30', tz='US/Eastern')\n", "stamp" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Timestamp('2012-11-04 01:30:00-0500', tz='US/Eastern')" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stamp + 2 * Hour()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 3 Operations Between Diferent Time Zones(不同时区间的运算)\n", "\n", "如果两个不同时区的时间序列被合并,那么结果为UTC。因为时间戳是以UTC为背后机制的,这种变化是直接的,不需要手动转换:" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": true }, "outputs": [], "source": [ "rng = pd.date_range('3/7/2012 9:30', periods=10, freq='B')" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2012-03-07 09:30:00 0.857427\n", "2012-03-08 09:30:00 -0.985773\n", "2012-03-09 09:30:00 -0.037836\n", "2012-03-12 09:30:00 -1.561366\n", "2012-03-13 09:30:00 0.195092\n", "2012-03-14 09:30:00 -0.182154\n", "2012-03-15 09:30:00 0.629671\n", "2012-03-16 09:30:00 -1.351815\n", "2012-03-19 09:30:00 -1.054486\n", "2012-03-20 09:30:00 0.072799\n", "Freq: B, dtype: float64" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ts = pd.Series(np.random.randn(len(rng)), index=rng)\n", "ts" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "collapsed": true }, "outputs": [], "source": [ "ts1 = ts[:7].tz_localize('Europe/London')\n", "ts2 = ts1[2:].tz_convert('Europe/Moscow')\n", "result = ts1 + ts2" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "DatetimeIndex(['2012-03-07 09:30:00+00:00', '2012-03-08 09:30:00+00:00',\n", " '2012-03-09 09:30:00+00:00', '2012-03-12 09:30:00+00:00',\n", " '2012-03-13 09:30:00+00:00', '2012-03-14 09:30:00+00:00',\n", " '2012-03-15 09:30:00+00:00'],\n", " dtype='datetime64[ns, UTC]', freq='B')" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result.index" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python [py35]", "language": "python", "name": "Python [py35]" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 0 }