{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Weather and Climate\n", "This notebook is intended to reinforce the skills you learned in the Pandas course.\n", "\n", "## Fredericksburg Virginia and Las Cruces New Mexico\n", "\n", "I downloaded a dataset containing weather information about these two cities from [the National Oceanic and Atmospheric Administration (NOAA) website](https://www.ncdc.noaa.gov/cdo-web/datasets). If you would rather explore the data from other cities, feel free to do so.\n", "\n", "\n", "The dataset is [nmva2018.csv](http://zacharski.org/files/courses/data101/nmva2018.csv)\n", "\n", "It contains weather information for Fredericksburg Virginia and Las Cruces New Mexico from January 1, 2000 to the present. The dataset columns are:\n", "\n", "Column | Description\n", ":---: | :--- \n", "STATION | The NOAA weather station identifier\n", "NAME | The name of the station - I changed these to be Las Cruces and Fredericksburg. (They were originally 'STATE UNIVERSITY' and 'FREDERICKSBURG SEWAGE'\n", "DATE | The date\n", "DAPR | Number of days included in the multiday precipitation total (MDPR)\n", "MDPR | Multiday precipitation total\n", "MDWM | Multiday wind movement (miles or km as per user preference)\n", "PRCP | Precipitation total (in tenths of mm)\n", "SNOW | Snowfall (mm)\n", "SNWD | Snow depth (mm)\n", "TMAX | Maximum Temperature\n", "TMIN | Minimum Temperature\n", "TOBS | Temperature at time of observation\n", "WDMV | 24-hour wind movement (miles)\n", "WT01 | Fog, ice fog, or freezing fog (may include heavy fog)\n", "WT03 | Thunder\n", "WT06 | Glaze or rime\n", "WT11 | Blowing Spray\n", "\n", "Let's load in the dataset:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# TBD" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, let's examine only the data from 2016:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { 
"collapsed": true }, "outputs": [], "source": [ "#TBD" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We would like to display a series of plots comparing Fredericksburg to Las Cruces. First, let's plot the number of days that have reached a temperature of 90 or above:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#TBD" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And we would like to see a similar plot for the number of days that reached a temperature of 32 or below (meaning that at some point of the day the temperature was 32 or lower):\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#TBD" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What do you consider the ideal outdoor temperature range? Whatever you decide, we would like to plot the number of days that were within that range. " ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#TBD" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "

## Hacker Challenges

\n", "\n", "The following requires some mental calisthenics. \n", "\n", "### Part 1. \n", "We would like to see a yearly plot of the number of days 90 or over for Fredericksburg. So the x axis would be the years 2010 to 2017. Which year had the most days over 90?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Part 2. even more challenging.\n", "We would like to see a plot similar to that in Part 1, but showing data for both Fredericksburg and Las Cruces. That is, for each year we see the days 90 or over for Las Cruces, and the days 90 or over for Fredericksburg.\n", "\n", "\n", "Here is a bit of a hint. (This was my approach - you might have a different, better one). I had two Pandas Series. One, `cc` was the number of days 90 or higher for Las Cruces. It looked like this:\n", "\n", " cc.head()\n", " \n", " DATE\n", " 2000-12-31 142\n", " 2001-12-31 121\n", " 2002-12-31 117\n", " 2003-12-31 117\n", " 2004-12-31 106\n", "\n", "And had a similar one, `ff` for Fredericksburg. 
Then I combined them into one DataFrame by:\n", "\n", "    combined = pd.DataFrame({'Fredericksburg': ff, 'Las Cruces': cc})\n", "    \n", "    combined.head()\n", "    \n", "                Fredericksburg  Las Cruces\n", "    DATE\n", "    2000-12-31              24         142\n", "    2001-12-31              25         121\n", "    2002-12-31              57         117\n", "    2003-12-31              26         117\n", "    2004-12-31              24         106\n", "    \n", "    \n", "After that, the plotting was easy.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#TBD" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# The average max weekly temperatures of Fredericksburg in 2016\n", "\n", "What we mean:\n", "\n", " | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday\n", " ---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: \n", " Max Temp | 82 | 84 | 82 | 75 | 77 | 87 | 89\n", " \n", " The average max weekly temperature for that week is the mean of the seven daily maximums:\n", " \n", " $$avgMaxWeekly = \\frac{82 + 84 + 82 + 75 + 77 + 87 + 89}{7} = \\frac{576}{7} \\approx 82.2857$$\n", " \n", " We would like to see a plot for the whole year:\n", " " ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "#TBD\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

## Hacker Challenge

\n", "\n", "Can you do the same (the average max weekly temperature plot) for both Fredericksburg and Las Cruces in one plot?" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#TBD\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The total annual precipitation amounts for both Fredericksburg and Las Cruces\n", "A plot showing the amounts from 2010 through 2017. (a plot showing 2010, 2011, 2012, etc)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A non-plot question.\n", "What is the average yearly precipitation amounts for Fredericksburg and Las Cruces?" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# TBD" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Climate Change: Atmospheric Carbon Dioxide\n", "Before the industrial revolution atmospheric carbon dioxide was about 280 ppm (parts per million). When we first started measuring its concentration at Mauna Loa Hawaii in 1958 the concentration was 315.\n", "\n", "The data from this location is in the CSV file:\n", "\n", "[co2_mm_mlo.csv](https://raw.githubusercontent.com/zacharski/data101/master/co2_mm_mlo.csv)\n", " \n", "\n", "The following information from the original dataset is important:\n", "\n", "> Data from March 1958 through April 1974 have been obtained by C. David Keeling\n", "> of the Scripps Institution of Oceanography (SIO) and were obtained from the\n", "> Scripps website (scrippsco2.ucsd.edu).\n", ">\n", "> The \"average\" column contains the monthly mean CO2 mole fraction determined\n", "> from daily averages. The mole fraction of CO2, expressed as parts per million\n", "> (ppm) is the number of molecules of CO2 in every one million molecules of dried\n", "> air (water vapor removed). 
If there are missing days concentrated either early\n", "> or late in the month, the monthly mean is corrected to the middle of the month\n", "> using the average seasonal cycle. Missing months are denoted by -99.99.\n", "> The \"interpolated\" column includes average values from the preceding column\n", "> and interpolated values where data are missing. Interpolated values are\n", "> computed in two steps. First, we compute for each month the average seasonal\n", "> cycle in a 7-year window around each monthly value. In this way the seasonal\n", "> cycle is allowed to change slowly over time. We then determine the \"trend\"\n", "> value for each month by removing the seasonal cycle; this result is shown in\n", "> the \"trend\" column. Trend values are linearly interpolated for missing months.\n", "> The interpolated monthly mean is then the sum of the average seasonal cycle\n", "> value and the trend value for the missing month.\n", ">\n", "> NOTE: In general, the data presented for the last year are subject to change, \n", "> depending on recalibration of the reference gas mixtures used, and other quality\n", "> control procedures. Occasionally, earlier years may also be changed for the same\n", "> reasons. 
Usually these changes are minor.\n", ">\n", "> CO2 expressed as a mole fraction in dry air, micromol/mol, abbreviated as ppm\n", ">\n", "> (-99.99 missing data; -1 no data for >daily means in month)\n", "\n", "**Please give a monthly plot of the atmospheric carbon (extra xp for making a pretty plot™).**\n", "\n", "### Hint:\n", "\n", "The date is split into a year column and a month column:\n", "\n", "\n", "year | month | decimal_date | average | interpolated | trend | days\n", ":---: | :---: | :---: | :---: | :---: | :---: | :---: \n", "1958 | 3 | 1958.208 | 315.71 | 315.71 | 314.62 | -1\n", "1958 | 4 | 1958.292 | 317.45 | 317.45 | 315.29 | -1\n", "1958 | 5 | 1958.375 | 317.50 | 317.50 | 314.71 | -1\n", "\n", "Let's say you wanted to combine the year and month to create a Pandas Series with entries like '1958-03' and so on. \n", "\n", "If our original Pandas DataFrame is called `carbon`, we can create a Series called `date_string` by executing:\n", "\n", "\n", "    date_string = carbon['year'].astype(str) + '-' + carbon['month'].apply(lambda x:\"%02i\" % x)\n", "    \n", "For more of a hint, see the DataCamp page *Cleaning and tidying datetime data*\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Now we would like to see a plot of the average daily atmospheric carbon for every 5 years.**" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One of those plots looked saw-toothed, leading us to wonder if some months of the year had lower atmospheric carbon than others. For example, maybe it is low during winter months. Can you come up with a plot that will help us answer this question?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Before We Start\n", "\n", "Suppose we have the small DataFrame" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Namefinalmidterm
0Ann8987
1Ben8175
2Clara9997
3Dora9581
4Enric6065
5Fred9391
6Ginny8785
7Hannah9996
\n", "
" ], "text/plain": [ " Name final midterm\n", "0 Ann 89 87\n", "1 Ben 81 75\n", "2 Clara 99 97\n", "3 Dora 95 81\n", "4 Enric 60 65\n", "5 Fred 93 91\n", "6 Ginny 87 85\n", "7 Hannah 99 96" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "names = ['Ann', 'Ben', 'Clara', \"Dora\", 'Enric', 'Fred', 'Ginny', 'Hannah']\n", "midtermGrades = [87, 75, 97, 81, 65, 91, 85, 96]\n", "finalGrades = [89, 81, 99, 95, 60, 93, 87, 99]\n", "grades = pd.DataFrame({'Name': names, 'midterm': midtermGrades, 'final': finalGrades})\n", "grades\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can sort the data by the values in the final column by:\n" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Namefinalmidterm
2Clara9997
7Hannah9996
3Dora9581
5Fred9391
0Ann8987
6Ginny8785
1Ben8175
4Enric6065
\n", "
" ], "text/plain": [ " Name final midterm\n", "2 Clara 99 97\n", "7 Hannah 99 96\n", "3 Dora 95 81\n", "5 Fred 93 91\n", "0 Ann 89 87\n", "6 Ginny 87 85\n", "1 Ben 81 75\n", "4 Enric 60 65" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gradesSorted = grades.sort_values('final', ascending=False)\n", "gradesSorted" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And, if we want, we can make a new dataframe of the top 3 students:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Namefinalmidterm
2Clara9997
7Hannah9996
3Dora9581
5Fred9391
\n", "
" ], "text/plain": [ " Name final midterm\n", "2 Clara 99 97\n", "7 Hannah 99 96\n", "3 Dora 95 81\n", "5 Fred 93 91" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "topStudents = gradesSorted[:4]\n", "topStudents" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", "# Salaries, Colleges, and Degrees\n", "\n", "We have two data files that were created by the Wall Street Journal. \n", "\n", "One is called salariesByCollege and looks like:\n", "\n", "School Name | Unnamed: 0 | School Type | Starting Median Salary | Mid-Career Median Salary | Mid-Career 10th Percentile Salary | Mid-Career 25th Percentile Salary | Mid-Career 75th Percentile Salary | Mid-Career 90th Percentile Salary | region\n", ":---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: \n", "Massachusetts Institute of Technology (MIT) | 0 | Engineering | 72200.0 | 126000.0 | 76800.0 | 99200.0 | 168000.0 | 220000.0 | Northeastern\n", "California Institute of Technology (CIT) | 1 | Engineering | 75500.0 | 123000.0 | | 104000.0 | 161000.0 | | California\n", "Harvey Mudd College | 2 | Engineering | 71800.0 | 122000.0 | | 96000.0 | 180000.0 | | California\n", "Polytechnic University of New York Brooklyn | 3 | Engineering | 62400.0 | 114000.0 | 66800.0 | 94300.0 | 143000.0 | 190000.0 | Northeastern\n", "\n", "The other is called degreesThatPayBack:\n", "\n", "Unamed 0 | Undergraduate Major | Starting Median Salary | Mid-Career Median Salary | Percent change from Starting to Mid-Career Salary | Mid-Career 10th Percentile Salary | Mid-Career 25th Percentile Salary | Mid-Career 75th Percentile Salary | Mid-Career 90th Percentile Salary\n", ":---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: \n", "0 | Accounting | 46000.00 | 77100.00 | 67.6 | 42200.00 | 56100.00 | 108000.00 | 152000.00\n", "1 | Aerospace Engineering | 57700.00 | 101000.00 | 75.0 | 64300.00 | 82100.00 | 127000.00 | 161000.00\n", 
"2 | Agriculture | 42600.00 | 71900.00 | 68.8 | 36300.00 | 52100.00 | 96300.00 | 150000.00\n", "3 | Anthropology | 36800.00 | 61500.00 | 67.1 | 33800.00 | 45500.00 | 89300.00 | 138000.00\n", "4 | Architecture | 41600.00 | 76800.00 | 84.6 | 50600.00 | 62200.00 | 97000.00 | 136000.00\n", "5 | Art History | 35800.00 | 64900.00 | 81.3 | 28800.00 | 42200.00 | 87400.00 | 125000.00\n", "\n", "The files are in a zipped compressed folder at [collegeSalaries.zip](http://zacharski.org/files/courses/data101/collegeSalaries.zip). You will need to download the file to your laptop, unzip the file, and then load the files into Pandas. This is good practice for when someone emails you a file, or you create your own datafile.\n", "\n", "I will give you a moment to load this data" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A few basic questions\n", "\n", "### We would like to see a list of universities sorted by those with the highest starting salary first." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Can you do the same sort but this time with majors? (A list of the highest paying majors)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Can you plot the salaries of the top 5 majors?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Can you plot the salaries of the top 5 schools?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Creative\n", "Now is your chance to do something creative with the data. What is an interesting question you have that can be answered with a few plots and/or summary statistics?\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }