{ "metadata": { "name": "", "signature": "sha256:ea53907a1fbd9c4a83751c7a4508f5ea83d7e88d76f5ad99f496cf7a7de5e4f0" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "**Task:**\n", "\n", " continue interactive analysis of time series (AO, NAO indexes)\n", "\n", "**Module:**\n", " \n", " pandas" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Notebook file](http://nbviewer.ipython.org/urls/raw.github.com/koldunovn/earthpy.org/master/content/notebooks/time_series_analysis_with_pandas_part_2.ipynb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the [previous part](http://earthpy.org/pandas-basics.html) we looked at very basic ways of working with pandas. Here I am going to introduce a couple of more advanced tricks. We will use the powerful pandas IO capabilities to create a time series directly from a text file, then create seasonal means with *resample* and multi-year monthly means with *groupby*. At the end I will show how new functionality from the upcoming IPython 2.0 can be used to explore your data more efficiently with a sort of simple GUI (the *interact* function). There might be easier or better ways to do some of the things discussed here, and I will be happy to hear about them in the comments :) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Import the usual suspects and change some output formatting:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pandas as pd\n", "import numpy as np\n", "%matplotlib inline\n", "pd.set_option('display.max_rows', 15) # limit the maximum number of rows displayed" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 76 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We are also going to download the necessary files. Their description can be found in the [first part](http://earthpy.org/pandas-basics.html)." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "!wget http://www.cpc.ncep.noaa.gov/products/precip/CWlink/daily_ao_index/monthly.ao.index.b50.current.ascii\n", "!wget http://www.cpc.ncep.noaa.gov/products/precip/CWlink/pna/norm.nao.monthly.b5001.current.ascii" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Pandas IO" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pandas is equipped with very rich IO functionality that allows direct conversion of essentially any text-table-based data format to a Series or DataFrame. There is [good, extensive documentation with a lot of examples](http://pandas.pydata.org/pandas-docs/stable/io.html). Here we are going to open the AO file in the same way we did in the [first part](http://earthpy.org/pandas-basics.html) and the NAO file with pandas IO. Then we are going to combine the two in one DataFrame." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load the AO data with numpy's loadtxt, create the dates, and then build a Series." ] }, { "cell_type": "code", "collapsed": false, "input": [ "ao = np.loadtxt('monthly.ao.index.b50.current.ascii')\n", "dates = pd.date_range('1950-01', '2014-03', freq='M')\n", "AO = pd.Series(ao[:,2], index=dates)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 55 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's open NAO. 
First, let's remind ourselves what the file looks like:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!tail norm.nao.monthly.b5001.current.ascii" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ " 2013 5 0.56906E+00\r\n", " 2013 6 0.52076E+00\r\n", " 2013 7 0.67216E+00\r\n", " 2013 8 0.97019E+00\r\n", " 2013 9 0.24060E+00\r\n", " 2013 10 -0.12801E+01\r\n", " 2013 11 0.90082E+00\r\n", " 2013 12 0.94566E+00\r\n", " 2014 1 0.29026E+00\r\n", " 2014 2 0.13352E+01\r\n" ] } ], "prompt_number": 56 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have 3 space-separated columns, with the first two columns containing years and months. Here is the expression that will create a time series out of this file:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "NAO = pd.read_table('norm.nao.monthly.b5001.current.ascii', sep='\\s+', \\\n", " parse_dates={'dates':[0, 1]}, header=None, index_col=0, squeeze=True)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 57 }, { "cell_type": "code", "collapsed": false, "input": [ "NAO" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 58, "text": [ "dates\n", "1950-01-15 0.92\n", "1950-02-15 0.40\n", "1950-03-15 -0.36\n", "1950-04-15 0.73\n", "1950-05-15 -0.59\n", "...\n", "2013-09-15 0.24060\n", "2013-10-15 -1.28010\n", "2013-11-15 0.90082\n", "2013-12-15 0.94566\n", "2014-01-15 0.29026\n", "2014-02-15 1.33520\n", "Name: 2, Length: 770" ] } ], "prompt_number": 58 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some explanations:\n", "\n", "* the first argument is the file name\n", "* '\\s+' - a regular expression that describes the separator (one or more whitespace characters)\n", "* parse_dates - combine columns 0 and 1, convert the resulting column to dates and give it the name *\"dates\"*\n", "* header - don't use row 0 as a header\n", "* index_col - use column 0 as the index (this will already be the result of the 
*parse_dates* parsing)\n", "* squeeze - return a Series instead of a DataFrame." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we would like to combine the AO and NAO Series. But there is a little problem: the dates in our two Series are different. The pandas date parser returns time stamps, so it fills in the present day number (15 in my case) and interprets the index values in NAO as points in time. A similar thing happened with the AO series: its index has monthly frequency, but every value is interpreted as a point in time associated with the last day of the month. As a consequence, the simple approach will not work:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "aonao = pd.DataFrame({'AO':AO, 'NAO':NAO})" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 59 }, { "cell_type": "code", "collapsed": false, "input": [ "aonao.head(10)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
<table>\n",
"<tr><th></th><th>AO</th><th>NAO</th></tr>\n",
"<tr><th>1950-01-15</th><td>NaN</td><td>0.92</td></tr>\n",
"<tr><th>1950-01-31</th><td>-0.060310</td><td>NaN</td></tr>\n",
"<tr><th>1950-02-15</th><td>NaN</td><td>0.40</td></tr>\n",
"<tr><th>1950-02-28</th><td>0.626810</td><td>NaN</td></tr>\n",
"<tr><th>1950-03-15</th><td>NaN</td><td>-0.36</td></tr>\n",
"<tr><th>1950-03-31</th><td>-0.008127</td><td>NaN</td></tr>\n",
"<tr><th>1950-04-15</th><td>NaN</td><td>0.73</td></tr>\n",
"<tr><th>1950-04-30</th><td>0.555100</td><td>NaN</td></tr>\n",
"<tr><th>1950-05-15</th><td>NaN</td><td>-0.59</td></tr>\n",
"<tr><th>1950-05-31</th><td>0.071577</td><td>NaN</td></tr>\n",
"</table>\n",
"<p>10 rows \u00d7 2 columns</p>
\n",
"<table>\n",
"<tr><th></th><th>AO</th><th>NAO</th></tr>\n",
"<tr><th>1950-01</th><td>-0.060310</td><td>0.92</td></tr>\n",
"<tr><th>1950-02</th><td>0.626810</td><td>0.40</td></tr>\n",
"<tr><th>1950-03</th><td>-0.008127</td><td>-0.36</td></tr>\n",
"<tr><th>1950-04</th><td>0.555100</td><td>0.73</td></tr>\n",
"<tr><th>1950-05</th><td>0.071577</td><td>-0.59</td></tr>\n",
"<tr><th>1950-06</th><td>0.538570</td><td>-0.06</td></tr>\n",
"<tr><th>1950-07</th><td>-0.802480</td><td>-1.26</td></tr>\n",
"<tr><th>1950-08</th><td>-0.851010</td><td>-0.05</td></tr>\n",
"<tr><th>1950-09</th><td>0.357970</td><td>0.25</td></tr>\n",
"<tr><th>1950-10</th><td>-0.378900</td><td>0.85</td></tr>\n",
"</table>\n",
"<p>10 rows \u00d7 2 columns</p>
\n",
"<table>\n",
"<tr><th></th><th>AO</th><th>NAO</th></tr>\n",
"<tr><th>1950Q1</th><td>0.283250</td><td>0.660000</td></tr>\n",
"<tr><th>1950Q2</th><td>0.206183</td><td>-0.073333</td></tr>\n",
"<tr><th>1950Q3</th><td>-0.371640</td><td>-0.456667</td></tr>\n",
"<tr><th>1950Q4</th><td>-0.178680</td><td>-0.053333</td></tr>\n",
"<tr><th>1951Q1</th><td>-0.804333</td><td>-0.080000</td></tr>\n",
"</table>\n",
"<p>5 rows \u00d7 2 columns</p>
\n",
"<table>\n",
"<tr><th></th><th>AO</th><th>NAO</th></tr>\n",
"<tr><th>1950-05-31</th><td>0.206183</td><td>-0.073333</td></tr>\n",
"<tr><th>1950-08-31</th><td>-0.371640</td><td>-0.456667</td></tr>\n",
"<tr><th>1950-11-30</th><td>-0.178680</td><td>-0.053333</td></tr>\n",
"<tr><th>1951-02-28</th><td>-0.804333</td><td>-0.080000</td></tr>\n",
"<tr><th>1951-05-31</th><td>-1.191120</td><td>-0.610000</td></tr>\n",
"</table>\n",
"<p>5 rows \u00d7 2 columns</p>
\n",
"<table>\n",
"<tr><th></th><th>AO</th><th>NAO</th><th>mon</th></tr>\n",
"<tr><th>1950-01</th><td>-0.060310</td><td>0.92</td><td>1</td></tr>\n",
"<tr><th>1950-02</th><td>0.626810</td><td>0.40</td><td>2</td></tr>\n",
"<tr><th>1950-03</th><td>-0.008127</td><td>-0.36</td><td>3</td></tr>\n",
"<tr><th>1950-04</th><td>0.555100</td><td>0.73</td><td>4</td></tr>\n",
"<tr><th>1950-05</th><td>0.071577</td><td>-0.59</td><td>5</td></tr>\n",
"<tr><th>1950-06</th><td>0.538570</td><td>-0.06</td><td>6</td></tr>\n",
"<tr><th>1950-07</th><td>-0.802480</td><td>-1.26</td><td>7</td></tr>\n",
"<tr><th>1950-08</th><td>-0.851010</td><td>-0.05</td><td>8</td></tr>\n",
"<tr><th>1950-09</th><td>0.357970</td><td>0.25</td><td>9</td></tr>\n",
"<tr><th>1950-10</th><td>-0.378900</td><td>0.85</td><td>10</td></tr>\n",
"<tr><th>1950-11</th><td>-0.515110</td><td>-1.26</td><td>11</td></tr>\n",
"<tr><th>1950-12</th><td>-1.928100</td><td>-1.02</td><td>12</td></tr>\n",
"<tr><th>1951-01</th><td>-0.084969</td><td>0.08</td><td>1</td></tr>\n",
"<tr><th>1951-02</th><td>-0.399930</td><td>0.70</td><td>2</td></tr>\n",
"<tr><th>1951-03</th><td>-1.934100</td><td>-1.02</td><td>3</td></tr>\n",
"<tr><th></th><td>...</td><td>...</td><td>...</td></tr>\n",
"</table>\n",
"<p>770 rows \u00d7 3 columns</p>
\n", "