{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a name=\"top\"></a>\n",
    "<div style=\"width:1000 px\">\n",
    "\n",
    "<div style=\"float:right; width:98 px; height:98px;\">\n",
    "<img src=\"https://raw.githubusercontent.com/Unidata/MetPy/master/metpy/plots/_static/unidata_150x150.png\" alt=\"Unidata Logo\" style=\"height: 98px;\">\n",
    "</div>\n",
    "\n",
    "<h1>Basic Time Series Plotting</h1>\n",
    "<h3>Unidata Python Workshop</h3>\n",
    "\n",
    "<div style=\"clear:both\"></div>\n",
    "</div>\n",
    "\n",
    "<hr style=\"height:2px;\">\n",
    "\n",
    "<div style=\"float:right; width:250 px\"><img src=\"http://matplotlib.org/_images/date_demo.png\" alt=\"METAR\" style=\"height: 300px;\"></div>\n",
    "\n",
    "\n",
    "## Overview:\n",
    "\n",
    "* **Teaching:** 45 minutes\n",
    "* **Exercises:** 30 minutes\n",
    "\n",
    "### Questions\n",
    "1. How can we read data with Pandas?\n",
    "1. How are plots created in Python?\n",
    "1. What features does Matplotlib have for improving our time series plots?\n",
    "1. How can multiple y-axes be used in a single plot?\n",
    "\n",
    "### Objectives\n",
    "1. <a href=\"#loaddata\">Reading in data</a>\n",
    "1. <a href=\"#basictimeseries\">Basic timeseries plotting</a>\n",
    "1. <a href=\"#multiy\">Multiple y-axes</a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a name=\"loaddata\"></a>\n",
    "## Reading in Data\n",
    "To learn about time series analysis, we first need to find some data and get it into Python. In this case we're going to use a file that was downloaded from the [National Data Buoy Center](http://www.ndbc.noaa.gov). Specially we're going to look at [buoy 41056](http://www.ndbc.noaa.gov/station_page.php?station=41056) as hurricane Irma passed over it.\n",
    "\n",
    "We'll use the [pandas](http://pandas.pydata.org) library for our data reading and modification as it provides a convenient way to subset and manipulate data. The data does not come in an easily usable format from the NDBC, so it's a good chance to get our hands dirty with real world data manipulation and time series plotting.\n",
    "\n",
    "First, let's start out by reading the text file into a pandas dataframe. If we look at the file we can see it's in a \"fixed-width\" format - i.e. each column has the same number of characters always.\n",
    "\n",
    "```\n",
    "#YY  MM DD hh mm WDIR WSPD GST  WVHT   DPD   APD MWD   PRES  ATMP  WTMP  DEWP  VIS PTDY  TIDE\n",
    "#yr  mo dy hr mn degT m/s  m/s     m   sec   sec degT   hPa  degC  degC  degC  nmi  hPa    ft\n",
    "2017 09 21 19 00 140  8.0 11.0   1.1     6    MM  93 1009.0  28.5    MM    MM   MM -1.0    MM\n",
    "2017 09 21 18 00 140  8.0 10.0   1.1     6    MM  90 1009.5  28.6    MM    MM   MM -1.3    MM\n",
    "2017 09 21 17 00 150  8.0 11.0   1.2     7    MM  90 1010.1  28.6    MM    MM   MM -0.4    MM\n",
    "2017 09 21 16 00 130  8.0 11.0   1.1     6    MM  89 1010.0  28.5    MM    MM   MM -0.4    MM\n",
    "2017 09 21 15 00 140  9.0 11.0   1.1     6    MM 109 1010.8  28.8    MM    MM   MM +1.0    MM\n",
    "```\n",
    "\n",
    "The data columns are year, month, day, hour, minute, wind direction, wind speed, wind gust, wave height, dominant wave period, domininant wave direction, pressure, air temperature, water temperature, dewpoint, visibility, pressure tendency, and tide. As you can see, this buoy does not have all of those sensors, so some columns are filled with `MM`, representing missing data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fname = '41056.txt'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "df = pd.read_fwf(fname)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "Getting the data read was pretty easy, but we immediatly see that we've got some cleanup to do. The header row contains column names that are less than ideal. The first data row is actually a row of units as well. We also notice that the date is broken up between multiple columns. It would be nice to have that as one timestamp that is a Python datetime object. Finally, we need to replace `MM` with `NaN`. Luckily these tasks are not too onerous with pandas."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Much better column names, remember to be descriptive and use tab completion when using these!\n",
    "col_names = ['year', 'month', 'day', 'hour', 'minute', 'wind_direction', 'wind_speed',\n",
    "             'wind_gust', 'wave_height', 'dominant_wave_period', 'average_wave_period',\n",
    "             'dominant_wave_direction', 'pressure', 'temperature', 'water_temperature', 'dewpoint',\n",
    "             'visibility', '3hr_pressure_tendency', 'water_level_above_mean']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_fwf(fname, skiprows=2, na_values='MM', names=col_names)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "While we're manupulating the data frame, let's get rid of the columns with all missing data. We could use the `drop` method and manually name all of the columns, but that would require us to know which are all `NaN` and that sounds like manual labor - something that programmers hate. Pandas has the `dropna` method that allows us to drop rows or columns where any or all values are `NaN`. In this case, let's drop all columns with all `NaN` values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = df.dropna(axis='columns', how='all')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, let's get the time stamps fixed up nicely. We need to combine the columns `year` `month` `day` `hour` and `minute` into a single column called `time`. We could cast all of these columns as strings, build the date time stamp string, then parse that, but that's a lot of steps! Looking in the documentation, we see that `parse_dates` can do all that for us. Here's an example of combining the `year` and `month` columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df['time'] = pd.to_datetime(df[['year', 'month', 'day']])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Our table now includes a 'time' column of timestamps. But the timestamps only include the date, and we have observations every hour - let's read the data in again, this time using all of the time stamp columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df['time'] = pd.to_datetime(df[['year', 'month', 'day', 'hour', 'minute']])\n",
    "\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-success\">\n",
    "    <b>EXERCISE</b>:\n",
    "     <ul>\n",
    "      <li>Now that we have timestamps, use the <code>drop</code> method to remove the now unused columns for year,\n",
    "          month, day, hour, and minute (<b>HINT</b>: Look at the <code>axis</code> keyword\n",
    "          argument in the documentation).</li>\n",
    "    <li><b>BONUS:</b> Add a column, 'duration', which contains the time in seconds between subsequent observations (<b>HINT:</b> the statement <code>df[COLUMN_NAME].shift(OFFSET)</code> returns the values of COLUMN_NAME shifted by OFFSET). </li>\n",
    "    </ul>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Your code goes here\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "    <b>TIP</b>:\n",
    "    Many of the pandas functions have the <code>inplace</code> keyword argument. This allows us to modify the dataframe without continually needing to reassign it. <code>df = df.command(...)</code> becomes <code>df.command(..., inplace=True)</code>.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<button data-toggle=\"collapse\" data-target=\"#sol1\" class='btn btn-primary'>Solution</button>\n",
    "\n",
    "<div id=\"sol1\" class=\"collapse\">\n",
    "<code><pre>\n",
    "\n",
    "# Using inplace means the return is None and the dataframe is simply modified.\n",
    "df.drop(['year', 'month', 'day', 'hour', 'minute'], axis='columns', inplace=True)\n",
    "\n",
    "# BONUS: add a column for intervals between observations\n",
    "df['duration'] = df['time'] - df['time'].shift(-1)\n",
    "\n",
    "df.head()\n",
    "</pre></code>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we need to trim down the data. The file contains 45 days worth of observations. We don't want to trim it too tightly and miss interesting things surrounding the hurricane's landfall, but having all 45 days is a bit overkill. Let's trim the data to dates between (and including) 9/06-9/08."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from datetime import datetime\n",
    "idx = (df.time >= datetime(2017, 9, 6)) & (df.time <= datetime(2017, 9, 8))\n",
    "df = df[idx]\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We're almost ready, but now the index column is not that meaningful. It starts at row 2114, which is fine with our initial file, but let's re-zero the index so we have a nice clean data frame to start with."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.reset_index(drop=True, inplace=True)\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"#top\">Top</a>\n",
    "<hr style=\"height:2px;\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a name=\"basictimeseries\"></a>\n",
    "## Basic Timeseries Plotting\n",
    "\n",
    "Matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms. We're going to learn the basics of creating timeseries plots with matplotlib by plotting buoy wind, gust, temperature, and pressure data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Convention for import of the pyplot interface\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Set-up to have matplotlib use its support for notebook inline plots\n",
    "%matplotlib inline\n",
    "\n",
    "# Register pandas converters with matplotlib\n",
    "from pandas import plotting\n",
    "plotting.register_matplotlib_converters()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We'll start by plotting the windspeeds during and surrounding Hurricane Irma."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.rc('font', size=12)\n",
    "fig, ax = plt.subplots(figsize=(10, 6))\n",
    "\n",
    "# Specify how our lines should look\n",
    "ax.plot(df.time, df.wind_speed, color='tab:orange', label='Windspeed')\n",
    "\n",
    "# Same as above\n",
    "ax.set_xlabel('Time')\n",
    "ax.set_ylabel('Speed (m/s)')\n",
    "ax.set_title('Buoy 41056 Wind Data')\n",
    "ax.grid(True)\n",
    "ax.legend(loc='upper left');"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Our x axis labels look a little crowded - let's try only labeling each day in our time series."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Helpers to format and locate ticks for dates\n",
    "from matplotlib.dates import DateFormatter, DayLocator\n",
    "\n",
    "# Set the x-axis to do major ticks on the days and label them like '07/20'\n",
    "ax.xaxis.set_major_locator(DayLocator())\n",
    "ax.xaxis.set_major_formatter(DateFormatter('%m/%d'))\n",
    "\n",
    "fig"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can add wind gust speeds to the same plot as a dashed yellow line."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use linestyle keyword to style our plot\n",
    "ax.plot(df.time, df.wind_gust, color='tab:olive', linestyle='--',\n",
    "        label='Wind Gust')\n",
    "# Redisplay the legend to show our new wind gust line\n",
    "ax.legend(loc='upper left')\n",
    "\n",
    "fig\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-success\">\n",
    "    <b>EXERCISE</b>:\n",
    "     <ul>\n",
    "    <li>Create your own figure and axes (<code>myfig, myax = plt.subplots(figsize=(10, 6))</code>) which plots temperature.</li>\n",
    "    <li>Change the x-axis major tick labels to read 'Sep DD' where DD is the day number. Look at the\n",
    "        <a href=\"https://docs.python.org/3.6/library/datetime.html#strftime-and-strptime-behavior\">\n",
    "            table of formatters</a> for help.\n",
    "    <li>Make sure you include a legend and labels!</li>\n",
    "    <li><b>BONUS:</b> try changing the <code>linestyle</code>, e.g., a blue dashed line.</li>\n",
    "    </ul>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Your code goes here\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "    <b>Tip</b>:\n",
    "     If your figure goes sideways as you try multiple things, try running the notebook up to this point again\n",
    "     by using the Cell -> Run All Above option in the menu bar.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "scrolled": true
   },
   "source": [
    "<button data-toggle=\"collapse\" data-target=\"#sol2\" class='btn btn-primary'>View Solution</button>\n",
    "<div id=\"sol2\" class=\"collapse\">\n",
    "<code><pre>\n",
    "myfig, myax = plt.subplots(figsize=(10, 6))\n",
    "\n",
    "# Plot temperature\n",
    "myax.plot(df.time, df.temperature, color='tab:blue', linestyle='-.', label='Temperature')\n",
    "\n",
    "myax.set_xlabel('Time')\n",
    "myax.set_ylabel('Temperature (degC)')\n",
    "myax.set_title('Buoy 41056 Data')\n",
    "myax.grid(True)\n",
    "\n",
    "# format x axis labels\n",
    "myax.xaxis.set_major_locator(DayLocator())\n",
    "myax.xaxis.set_major_formatter(DateFormatter('%b %d'))\n",
    "\n",
    "myax.legend(loc='upper left');\n",
    "fig\n",
    "</pre></code>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"#top\">Top</a>\n",
    "<hr style=\"height:2px;\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a name=\"multiy\"></a>\n",
    "## Multiple y-axes\n",
    "What if we wanted to plot another variable in vastly different units on our plot? <br/>\n",
    "Let's return to our wind data plot and add pressure."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# plot pressure data on same figure\n",
    "ax.plot(df.time, df.pressure, color='black', label='Pressure')\n",
    "ax.set_ylabel('Pressure')\n",
    "\n",
    "ax.legend(loc='upper left')\n",
    "\n",
    "fig"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "That is less than ideal. We can't see detail in the data profiles! We can create a twin of the x-axis and have a secondary y-axis on the right side of the plot. We'll create a totally new figure here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots(figsize=(10, 6))\n",
    "axb = ax.twinx()\n",
    "\n",
    "# Same as above\n",
    "ax.set_xlabel('Time')\n",
    "ax.set_ylabel('Speed (m/s)')\n",
    "ax.set_title('Buoy 41056 Wind Data')\n",
    "ax.grid(True)\n",
    "\n",
    "# Plotting on the first y-axis\n",
    "ax.plot(df.time, df.wind_speed, color='tab:orange', label='Windspeed')\n",
    "ax.plot(df.time, df.wind_gust, color='tab:olive', linestyle='--', label='Wind Gust')\n",
    "ax.legend(loc='upper left');\n",
    "\n",
    "# Plotting on the second y-axis\n",
    "axb.set_ylabel('Pressure (hPa)')\n",
    "axb.plot(df.time, df.pressure, color='black', label='pressure')\n",
    "\n",
    "ax.xaxis.set_major_locator(DayLocator())\n",
    "ax.xaxis.set_major_formatter(DateFormatter('%b %d'))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We're closer, but the data are plotting over the legend and not included in the legend. That's because the legend is associated with our primary y-axis. We need to append that data from the second y-axis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots(figsize=(10, 6))\n",
    "axb = ax.twinx()\n",
    "\n",
    "# Same as above\n",
    "ax.set_xlabel('Time')\n",
    "ax.set_ylabel('Speed (m/s)')\n",
    "ax.set_title('Buoy 41056 Wind Data')\n",
    "ax.grid(True)\n",
    "\n",
    "# Plotting on the first y-axis\n",
    "ax.plot(df.time, df.wind_speed, color='tab:orange', label='Windspeed')\n",
    "ax.plot(df.time, df.wind_gust, color='tab:olive', linestyle='--', label='Wind Gust')\n",
    "\n",
    "# Plotting on the second y-axis\n",
    "axb.set_ylabel('Pressure (hPa)')\n",
    "axb.plot(df.time, df.pressure, color='black', label='pressure')\n",
    "\n",
    "ax.xaxis.set_major_locator(DayLocator())\n",
    "ax.xaxis.set_major_formatter(DateFormatter('%b %d'))\n",
    "\n",
    "# Handling of getting lines and labels from all axes for a single legend\n",
    "lines, labels = ax.get_legend_handles_labels()\n",
    "lines2, labels2 = axb.get_legend_handles_labels()\n",
    "axb.legend(lines + lines2, labels + labels2, loc='upper left');"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-success\">\n",
    "    <b>EXERCISE</b>:\n",
    "    Create your own plot that has the following elements:\n",
    "     <ul>\n",
    "    <li>A blue line representing the wave height measurements.</li>\n",
    "    <li>A green line representing wind speed on a secondary y-axis</li>\n",
    "    <li>Proper labels/title.</li>\n",
    "    <li>**Bonus**: Make the wave height data plot as points only with no line. Look at the documentation for the linestyle and marker arguments.</li>\n",
    "    </ul>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Your code goes here\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<button data-toggle=\"collapse\" data-target=\"#sol3\" class='btn btn-primary'>View Solution</button>\n",
    "<div id=\"sol3\" class=\"collapse\">\n",
    "<code><pre>\n",
    "myfig, myax = plt.subplots(figsize=(10, 6))\n",
    "myaxb = myax.twinx()\n",
    "\n",
    "# Same as above\n",
    "myax.set_xlabel('Time')\n",
    "myax.set_ylabel('Wave Height (m)')\n",
    "myax.set_title('Buoy 41056 Data')\n",
    "myax.grid(True)\n",
    "\n",
    "# Plotting on the first y-axis\n",
    "myax.plot(df.time, df.wave_height, color='tab:blue', label='Waveheight (m)',\n",
    "        linestyle='None', marker='o')\n",
    "\n",
    "# Plotting on the second y-axis\n",
    "myaxb.set_ylabel('Windspeed (m/s)')\n",
    "myaxb.plot(df.time, df.wind_speed, color='tab:green', label='Windspeed (m/s)')\n",
    "\n",
    "myax.xaxis.set_major_locator(DayLocator())\n",
    "myax.xaxis.set_major_formatter(DateFormatter('%b %d'))\n",
    "\n",
    "# Handling of getting lines and labels from all axes for a single legend\n",
    "mylines, labels = myax.get_legend_handles_labels()\n",
    "mylines2, labels2 = myaxb.get_legend_handles_labels()\n",
    "myax.legend(lines + lines2, labels + labels2, loc='upper left');\n",
    "</pre></code>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"#top\">Top</a>\n",
    "<hr style=\"height:2px;\">"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}