{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Reshaping data with `stack` and `unstack`" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "%matplotlib inline\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "try:\n", "    import seaborn\n", "except ImportError:\n", "    pass\n", "\n", "pd.options.display.max_rows = 8" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Case study: air quality data of European monitoring stations (AirBase)\n", "\n", "This notebook continues the time series case study [05 - Time series data](05 - Time series data.ipynb) on the AirBase (The European Air quality dataBase) data. The data actually downloaded from the AirBase website did not look like a nice csv file (`data/airbase_data.csv`)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "One of the raw data files actually downloaded from AirBase is included in the repo:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "!head -1 ./data/BETR8010000800100hour.1-1-1990.31-12-2012" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Simply reading the tab-delimited data as-is:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "data = pd.read_csv(\"data/BETR8010000800100hour.1-1-1990.31-12-2012\", sep='\\t')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The above data is clearly not ready to be used! 
Each row contains the 24 hourly measurements of one day, together with flags (0/1) indicating the quality of the data." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "
\n",
"| | BETR801 |\n",
"|---|---|\n",
"| 1990-01-02 09:00:00 | 48.0 |\n",
"| 1990-01-02 12:00:00 | 48.0 |\n",
"| 1990-01-02 13:00:00 | 50.0 |\n",
"| 1990-01-02 14:00:00 | 55.0 |\n",
"| ... | ... |\n",
"| 2012-12-31 20:00:00 | 16.5 |\n",
"| 2012-12-31 21:00:00 | 14.5 |\n",
"| 2012-12-31 22:00:00 | 16.5 |\n",
"| 2012-12-31 23:00:00 | 15.0 |\n",
"\n",
"170794 rows × 1 columns\n",
"