{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "
\n", "\n", "
\n", "\"Unidata\n", "
\n", "\n", "

Basic Time Series Plotting

\n", "

Unidata Python Workshop

\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\"METAR\"
\n", "\n", "\n", "## Overview:\n", "\n", "* **Teaching:** 45 minutes\n", "* **Exercises:** 30 minutes\n", "\n", "### Questions\n", "1. How can we read data with Pandas?\n", "1. How are plots created in Python?\n", "1. What features does Matplotlib have for improving our time series plots?\n", "1. How can multiple y-axes be used in a single plot?\n", "\n", "### Objectives\n", "1. Reading in data\n", "1. Basic timeseries plotting\n", "1. Multiple y-axes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Reading in Data\n", "To learn about time series analysis, we first need to find some data and get it into Python. In this case we're going to use a file that was downloaded from the [National Data Buoy Center](http://www.ndbc.noaa.gov). Specially we're going to look at [buoy 41056](http://www.ndbc.noaa.gov/station_page.php?station=41056) as hurricane Irma passed over it.\n", "\n", "We'll use the [pandas](http://pandas.pydata.org) library for our data reading and modification as it provides a convenient way to subset and manipulate data. The data does not come in an easily usable format from the NDBC, so it's a good chance to get our hands dirty with real world data manipulation and time series plotting.\n", "\n", "First, let's start out by reading the text file into a pandas dataframe. If we look at the file we can see it's in a \"fixed-width\" format - i.e. each column has the same number of characters always.\n", "\n", "```\n", "#YY MM DD hh mm WDIR WSPD GST WVHT DPD APD MWD PRES ATMP WTMP DEWP VIS PTDY TIDE\n", "#yr mo dy hr mn degT m/s m/s m sec sec degT hPa degC degC degC nmi hPa ft\n", "2017 09 21 19 00 140 8.0 11.0 1.1 6 MM 93 1009.0 28.5 MM MM MM -1.0 MM\n", "2017 09 21 18 00 140 8.0 10.0 1.1 6 MM 90 1009.5 28.6 MM MM MM -1.3 MM\n", "2017 09 21 17 00 150 8.0 11.0 1.2 7 MM 90 1010.1 28.6 MM MM MM -0.4 MM\n", "2017 09 21 16 00 130 8.0 11.0 1.1 6 MM 89 1010.0 28.5 MM MM MM -0.4 MM\n", "2017 09 21 15 00 140 9.0 11.0 1.1 6 MM 109 1010.8 28.8 MM MM MM +1.0 MM\n", "```\n", "\n", "The data columns are year, month, day, hour, minute, wind direction, wind speed, wind gust, wave height, dominant wave period, domininant wave direction, pressure, air temperature, water temperature, dewpoint, visibility, pressure tendency, and tide. As you can see, this buoy does not have all of those sensors, so some columns are filled with `MM`, representing missing data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fname = '41056.txt'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "df = pd.read_fwf(fname)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "Getting the data read was pretty easy, but we immediatly see that we've got some cleanup to do. The header row contains column names that are less than ideal. The first data row is actually a row of units as well. We also notice that the date is broken up between multiple columns. It would be nice to have that as one timestamp that is a Python datetime object. Finally, we need to replace `MM` with `NaN`. Luckily these tasks are not too onerous with pandas." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Much better column names, remember to be descriptive and use tab completion when using these!\n", "col_names = ['year', 'month', 'day', 'hour', 'minute', 'wind_direction', 'wind_speed',\n", " 'wind_gust', 'wave_height', 'dominant_wave_period', 'average_wave_period',\n", " 'dominant_wave_direction', 'pressure', 'temperature', 'water_temperature', 'dewpoint',\n", " 'visibility', '3hr_pressure_tendency', 'water_level_above_mean']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = pd.read_fwf(fname, skiprows=2, na_values='MM', names=col_names)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While we're manupulating the data frame, let's get rid of the columns with all missing data. We could use the `drop` method and manually name all of the columns, but that would require us to know which are all `NaN` and that sounds like manual labor - something that programmers hate. Pandas has the `dropna` method that allows us to drop rows or columns where any or all values are `NaN`. In this case, let's drop all columns with all `NaN` values." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = df.dropna(axis='columns', how='all')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, let's get the time stamps fixed up nicely. We need to combine the columns `year` `month` `day` `hour` and `minute` into a single column called `time`. We could cast all of these columns as strings, build the date time stamp string, then parse that, but that's a lot of steps! Looking in the documentation, we see that `parse_dates` can do all that for us. Here's an example of combining the `year` and `month` columns." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df['time'] = pd.to_datetime(df[['year', 'month', 'day']])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our table now includes a 'time' column of timestamps. But the timestamps only include the date, and we have observations every hour - let's read the data in again, this time using all of the time stamp columns." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df['time'] = pd.to_datetime(df[['year', 'month', 'day', 'hour', 'minute']])\n", "\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " EXERCISE:\n", " \n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Your code goes here\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " TIP:\n", " Many of the pandas functions have the inplace keyword argument. This allows us to modify the dataframe without continually needing to reassign it. df = df.command(...) becomes df.command(..., inplace=True).\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "
\n", "
\n",
    "\n",
    "# Using inplace means the return is None and the dataframe is simply modified.\n",
    "df.drop(['year', 'month', 'day', 'hour', 'minute'], axis='columns', inplace=True)\n",
    "\n",
    "# BONUS: add a column for intervals between observations\n",
    "df['duration'] = df['time'] - df['time'].shift(-1)\n",
    "\n",
    "df.head()\n",
    "
\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we need to trim down the data. The file contains 45 days worth of observations. We don't want to trim it too tightly and miss interesting things surrounding the hurricane's landfall, but having all 45 days is a bit overkill. Let's trim the data to dates between (and including) 9/06-9/08." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from datetime import datetime\n", "idx = (df.time >= datetime(2017, 9, 6)) & (df.time <= datetime(2017, 9, 8))\n", "df = df[idx]\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We're almost ready, but now the index column is not that meaningful. It starts at row 2114, which is fine with our initial file, but let's re-zero the index so we have a nice clean data frame to start with." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.reset_index(drop=True, inplace=True)\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Top\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Basic Timeseries Plotting\n", "\n", "Matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms. We're going to learn the basics of creating timeseries plots with matplotlib by plotting buoy wind, gust, temperature, and pressure data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Convention for import of the pyplot interface\n", "import matplotlib.pyplot as plt\n", "\n", "# Set-up to have matplotlib use its support for notebook inline plots\n", "%matplotlib inline\n", "\n", "# Register pandas converters with matplotlib\n", "from pandas import plotting\n", "plotting.register_matplotlib_converters()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll start by plotting the windspeeds during and surrounding Hurricane Irma." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.rc('font', size=12)\n", "fig, ax = plt.subplots(figsize=(10, 6))\n", "\n", "# Specify how our lines should look\n", "ax.plot(df.time, df.wind_speed, color='tab:orange', label='Windspeed')\n", "\n", "# Same as above\n", "ax.set_xlabel('Time')\n", "ax.set_ylabel('Speed (m/s)')\n", "ax.set_title('Buoy 41056 Wind Data')\n", "ax.grid(True)\n", "ax.legend(loc='upper left');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our x axis labels look a little crowded - let's try only labeling each day in our time series." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Helpers to format and locate ticks for dates\n", "from matplotlib.dates import DateFormatter, DayLocator\n", "\n", "# Set the x-axis to do major ticks on the days and label them like '07/20'\n", "ax.xaxis.set_major_locator(DayLocator())\n", "ax.xaxis.set_major_formatter(DateFormatter('%m/%d'))\n", "\n", "fig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can add wind gust speeds to the same plot as a dashed yellow line." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Use linestyle keyword to style our plot\n", "ax.plot(df.time, df.wind_gust, color='tab:olive', linestyle='--',\n", " label='Wind Gust')\n", "# Redisplay the legend to show our new wind gust line\n", "ax.legend(loc='upper left')\n", "\n", "fig\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " EXERCISE:\n", " \n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Your code goes here\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " Tip:\n", " If your figure goes sideways as you try multiple things, try running the notebook up to this point again\n", " by using the Cell -> Run All Above option in the menu bar.\n", "
" ] }, { "cell_type": "markdown", "metadata": { "scrolled": true }, "source": [ "\n", "
\n", "
\n",
    "myfig, myax = plt.subplots(figsize=(10, 6))\n",
    "\n",
    "# Plot temperature\n",
    "myax.plot(df.time, df.temperature, color='tab:blue', linestyle='-.', label='Temperature')\n",
    "\n",
    "myax.set_xlabel('Time')\n",
    "myax.set_ylabel('Temperature (degC)')\n",
    "myax.set_title('Buoy 41056 Data')\n",
    "myax.grid(True)\n",
    "\n",
    "# format x axis labels\n",
    "myax.xaxis.set_major_locator(DayLocator())\n",
    "myax.xaxis.set_major_formatter(DateFormatter('%b %d'))\n",
    "\n",
    "myax.legend(loc='upper left');\n",
    "fig\n",
    "
\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Top\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Multiple y-axes\n", "What if we wanted to plot another variable in vastly different units on our plot?
\n", "Let's return to our wind data plot and add pressure." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# plot pressure data on same figure\n", "ax.plot(df.time, df.pressure, color='black', label='Pressure')\n", "ax.set_ylabel('Pressure')\n", "\n", "ax.legend(loc='upper left')\n", "\n", "fig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That is less than ideal. We can't see detail in the data profiles! We can create a twin of the x-axis and have a secondary y-axis on the right side of the plot. We'll create a totally new figure here." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=(10, 6))\n", "axb = ax.twinx()\n", "\n", "# Same as above\n", "ax.set_xlabel('Time')\n", "ax.set_ylabel('Speed (m/s)')\n", "ax.set_title('Buoy 41056 Wind Data')\n", "ax.grid(True)\n", "\n", "# Plotting on the first y-axis\n", "ax.plot(df.time, df.wind_speed, color='tab:orange', label='Windspeed')\n", "ax.plot(df.time, df.wind_gust, color='tab:olive', linestyle='--', label='Wind Gust')\n", "ax.legend(loc='upper left');\n", "\n", "# Plotting on the second y-axis\n", "axb.set_ylabel('Pressure (hPa)')\n", "axb.plot(df.time, df.pressure, color='black', label='pressure')\n", "\n", "ax.xaxis.set_major_locator(DayLocator())\n", "ax.xaxis.set_major_formatter(DateFormatter('%b %d'))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We're closer, but the data are plotting over the legend and not included in the legend. That's because the legend is associated with our primary y-axis. We need to append that data from the second y-axis." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=(10, 6))\n", "axb = ax.twinx()\n", "\n", "# Same as above\n", "ax.set_xlabel('Time')\n", "ax.set_ylabel('Speed (m/s)')\n", "ax.set_title('Buoy 41056 Wind Data')\n", "ax.grid(True)\n", "\n", "# Plotting on the first y-axis\n", "ax.plot(df.time, df.wind_speed, color='tab:orange', label='Windspeed')\n", "ax.plot(df.time, df.wind_gust, color='tab:olive', linestyle='--', label='Wind Gust')\n", "\n", "# Plotting on the second y-axis\n", "axb.set_ylabel('Pressure (hPa)')\n", "axb.plot(df.time, df.pressure, color='black', label='pressure')\n", "\n", "ax.xaxis.set_major_locator(DayLocator())\n", "ax.xaxis.set_major_formatter(DateFormatter('%b %d'))\n", "\n", "# Handling of getting lines and labels from all axes for a single legend\n", "lines, labels = ax.get_legend_handles_labels()\n", "lines2, labels2 = axb.get_legend_handles_labels()\n", "axb.legend(lines + lines2, labels + labels2, loc='upper left');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " EXERCISE:\n", " Create your own plot that has the following elements:\n", " \n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Your code goes here\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "
\n", "
\n",
    "myfig, myax = plt.subplots(figsize=(10, 6))\n",
    "myaxb = myax.twinx()\n",
    "\n",
    "# Same as above\n",
    "myax.set_xlabel('Time')\n",
    "myax.set_ylabel('Wave Height (m)')\n",
    "myax.set_title('Buoy 41056 Data')\n",
    "myax.grid(True)\n",
    "\n",
    "# Plotting on the first y-axis\n",
    "myax.plot(df.time, df.wave_height, color='tab:blue', label='Waveheight (m)',\n",
    "        linestyle='None', marker='o')\n",
    "\n",
    "# Plotting on the second y-axis\n",
    "myaxb.set_ylabel('Windspeed (m/s)')\n",
    "myaxb.plot(df.time, df.wind_speed, color='tab:green', label='Windspeed (m/s)')\n",
    "\n",
    "myax.xaxis.set_major_locator(DayLocator())\n",
    "myax.xaxis.set_major_formatter(DateFormatter('%b %d'))\n",
    "\n",
    "# Handling of getting lines and labels from all axes for a single legend\n",
    "mylines, labels = myax.get_legend_handles_labels()\n",
    "mylines2, labels2 = myaxb.get_legend_handles_labels()\n",
    "myax.legend(lines + lines2, labels + labels2, loc='upper left');\n",
    "
\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Top\n", "
" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 1 }