{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Climate\n", "### total XP possible: 8,750\n", "\n", "\n", "Over Fall break I flew back home to New Mexico to hike in the mountains. On the flight I read the book *The Weather Makers: How Man Is Changing the Climate and What It Means for Life on Earth* by Tim Flannery. Curious about New Mexico I went to the NOAA Climate website and saw this scary plot:\n", "\n", "\n", "\n", "\n", "The zigzagging gray line represents the actual yearly average temperatures and the blue one smoothes out the data points. There has been a pretty dramatic increase in the temperature since the mid 1970s. As you recall from our Weather Python Notebook the rise of atmospheric CO2 has been increasing steadily. Prior to the mid-1970s we also emitted a lot of particulate matter into the atmosphere. This had the effect of reflecting sunlight so it mitigated the effects of atmospheric CO2. In the 70s many countries regulated particulate emissions because they caused things like acid rain. Without those particulates the atomospheric CO2 raised the temperatures.\n", "\n", "I would like us to duplicate, as closely as possible, this NOAA plot.\n", "\n", "I downloaded the data of all 50 states and converted into a form that can be read by pandas. That file is at\n", "\n", "\n", "[https://raw.githubusercontent.com/zacharski/data101/master/climate.csv](https://raw.githubusercontent.com/zacharski/data101/master/climate.csv)\n", " \n", "The format looks like:\n", "\n", "state or region | year-month | year | TMAX | TMIN | TAVG\n", ":---: | :---: | :---: | :---: | :---: | :---: \n", "Alabama|1895-01|1895|52.700000|33.400000|43.100000\n", "Alabama|1895-02|1895|48.100000|26.800000|37.400000\n", "Alabama|1895-03|1895|66.500000|42.400000|54.500000\n", "Alabama|1895-04|1895|75.700000|51.200000|63.400000\n", "Alabama|1895-05|1895|80.600000|58.400000|69.500000\n", "Alabama|1895-06|1895|88.400000|66.500000|77.500000\n", "\n", "\n", "So for each state (and region) we have the monthy maximum, minimum and average temperatures for each month starting from 1895 to the present.\n", "\n", "## Part A: 3500xp\n", "\n", "The first thing we need to do is read the file and set the index to the date.\n", "\n", "## 1. Read the file" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# read the file and set the index" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Getting info for a particular state\n", "\n", "To make our code flexible, let's store the state name in a variable we call `state`" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "state = 'New Mexico'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and now let's get the info for that state (you can call the variable anything you want, but suppose we call it `stateData` (be sure to use the variable `state` and not the string `New Mexico`)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# get data for our state" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Get the annual averages\n", "\n", "Next, let's create a new DataFrame with the average yearly minimum, average, and maximum temperatures" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# your work here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Plot the average yearly maximum temperatures\n", "It should look pretty zig zaggy\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# again your work" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Rolling 7 year window\n", "We are going to try to smooth those zig zags out by computing the mean of a rolling 7 year window. Here is what I mean by a rolling window. When I take my weight every morning it goes up and down. Here is the chart for a week:\n", "\n", "Date | Weight\n", ":---: | :---:\n", "10/2/2017\t| 176.80\n", "10/3/2017\t| 176.20\n", "10/4/2017\t| 176.00\n", "10/5/2017\t| 174.80\n", "10/6/2017\t| 173.40\n", "10/7/2017\t| 173.80\n", "10/8/2017 | 174.00\n", "\n", "As you can see it goes up and down. To smooth things out, I will take the average of my weight on 3 days (on a particular day plus the two previous ones). So for 10/4/2017 I average 176.8, 176.20, and 176.0.\n", "\n", "Date | Weight | Smoothed\n", ":---: | :---: | :---:\n", "10/2/2017\t| 176.80 | NaN\n", "10/3/2017\t| 176.20 | NaN\n", "10/4/2017\t| 176.00 | 176.33\n", "10/5/2017\t| 174.80 | 175.67\n", "10/6/2017\t| 173.40 | 174.75\n", "10/7/2017\t| 173.80 | 174.00\n", "10/8/2017 | 174.00 | 173.74\n", "\n", "In the original data the weight seemed to be going up for the last three days, but in the smoothed weight, the weight was consistently going down. As you can also see, if I have a three day window, there will be NaN (not a number) for the first 2 entries.\n", "\n", "So, again, create smoothed data by using a 7 year window. That dataframe should also not include the first 6 entries.\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Create a dataframe with smoothed data - remove the first 6 entries\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Graph the results\n", "\n", "I would like you to produce three plots, one right after the other, that look something like\n", "\n", "\n", "\n", "Do not hard-code 'New Mexico' in the title, use the `state` variable" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true }, "outputs": [], "source": [ "\n", "# put code here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part B: Putting it All Together - 1000xp\n", "\n", "This should be a copy and paste of the code you did above. We are going to put the code in a function called `displayClimate`:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "def displayClimate(state):\n", " \n", " # get info for that particular state\n", " \n", " # get the annual averages\n", " \n", " # Plot the average yearly maximum temperatures\n", " \n", " # Rolling 7 year window\n", " \n", " # Graph the results\n", " print('remove this print statement about %s when you add your code' % state)\n", " \n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's test the function we wrote:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "remove this print statement about Virginia when you add your code\n", "remove this print statement about Arizona when you add your code\n", "remove this print statement about Maine when you add your code\n" ] } ], "source": [ "displayClimate('Virginia')\n", "displayClimate('Arizona')\n", "displayClimate('Maine')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part C: Hacker Edition extra xp\n", "\n", "### 1. Seasonal 750-1500 xp\n", "\n", "If you go to the [NOAA State Trend Charts](https://www.ncdc.noaa.gov/temp-and-precip/state-temps/) you will notice the small charts below the main one. So you can click on 'Summer' and see the summer statistics. Can you create a new `displayClimate` function that starts:\n", "\n", " displayClimate(state, season):\n", " \n", " \n", "so if you call\n", "\n", " displayClimate('New Mexico', 'Annual')\n", " \n", "it will display the annual charts and if you type:\n", "\n", " displayClimate('New Mexico', 'Summer')\n", " \n", "it will display those for the summer. \n", "\n", "If you have those two options (annual and summer) you will get 750xp, if you do all 5 you will get 1,500.\n", "\n", "### 2. LOESS - 1500xp\n", "If you read the text above the NOAA chart you will see that to get the blue line they use a statistic called Loess. Seaborn has a plot called Lowess that is a variant of Loess. Here is an example of its use:\n", "\n", "\n", " import seaborn as sns\n", " # nmY is my resampled yearly New Mexico data (but not rolling window)\n", " nmY\n", " # set the y part of the chart to be between 64 and 72\n", " plt.ylim(64, 72)\n", " # label the chart\n", " plt.title('New Mexico Annual High Temperature')\n", " # generate the Lowess regression plot\n", " sns.regplot(data = nmY, x = 'Year', y = 'TMAX', lowess=True)\n", " # show the plot\n", " plt.show()\n", " \n", "That creates a plot that looks like:\n", "\n", "\n", " \n", "Can you create a new function that will display the Lowess charts for minimum, average, and maximum for a state instead of the plot of the rolling window? So, for example,\n", "\n", "So, for example, \n", "\n", " displayLowess('Maine')\n", " \n", "will display the three Lowess plots for Maine.\n", "\n", "### Departure from 20th Century Average - 1000xp\n", "If you scroll down toward the bottom of the NOAA webpage, you will see a map of the US displaying how the decade average departs from the 20th century average. The paragraph above the map describes how they compute these numbers. I would like you to **write a function** that takes a state name and displays how much the decades depart from the 20th Century Average. Here is the algorithm I would like you to use:\n", "\n", "1. use the non-windowed data (data that is the average yearly temps, max, min, avg).\n", "2. the data should run from 1901 through the end of 2016\n", "3. compute the average (max, min, avg) temps for each decade \n", "4. subtract the 20th century average from the decade averages\n", "5. print the results\n", "6. extra credit: plot the results 250xp\n", "\n", "My results look like this for New Mexico:\n", "\n", "\n", "Date | TMIN\t| TAVG\t| TMAX\n", ":---: | :---: | :---: | :---: \t\t\t\n", "1901-12-31\t| 0.542170\t| 0.336710 |\t0.158764\n", "1911-12-31\t| 0.210503\t| -0.379957\t| -0.966236\n", "1921-12-31\t| -0.511164\t| -0.956624\t| -1.391236\n", "1931-12-31\t| -0.171997\t| -0.465790\t| -0.752069\n", "1941-12-31\t| 0.025503\t| -0.097457\t| -0.225402\n", "1951-12-31\t| -0.746997\t| -0.210790\t| 0.330431\n", "1961-12-31\t| -0.362830\t| -0.020790\t| 0.312098\n", "1971-12-31\t| -0.646164\t| -0.456624\t| -0.267069\n", "1981-12-31\t| -0.543664\t| -0.364124\t| -0.185402\n", "1991-12-31\t| -0.017830\t| -0.267457\t| -0.513736\n", "2001-12-31\t| 0.787170\t| 0.945043\t| 1.095431\n", "2011-12-31\t| 1.043003\t| 1.290876\t| 1.529598\n", "2021-12-31\t| 1.760503\t| 1.900043\t| 2.035431\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" } }, "nbformat": 4, "nbformat_minor": 2 }