{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 2.19 Statistics for Geoscientists class test - 14:00-17:00 29th May 2015" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Test instructions\n", "\n", "* This test contains **5** questions, each of which should be answered.\n", "* Write your program in a Python cell just under each question.\n", "* You should write an explanation of your solution as comments in your code.\n", "* In each case your solution program must fulfil all of the instructions - please read them carefully and double check that your program does so.\n", "* Save your work regularly.\n", "* At the end of the test you should email your IPython notebook document (i.e. this document) to [Gerard J. Gorman](http://www.imperial.ac.uk/people/g.gorman) at g.gorman@imperial.ac.uk" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Question preamble\n", "\n", "You hypothesise that there may be a link between temperature and the level of greenhouse gases in the atmosphere. To establish whether there is a correlation, you analyse ice core data taken from [Vostok Station](http://en.wikipedia.org/wiki/Vostok_Station) in Antarctica. The data file that you will be using, [VostokStation.csv](https://raw.githubusercontent.com/ggorman/Introduction-to-stats-for-geoscientists/gh-pages/data/VostokStation.csv), contains reconstructed temperature, CO2 gas concentrations and CH4 gas concentrations stretching back 160,000 years."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 1\n", "\n", "* Read in the data from the file [VostokStation.csv](https://raw.githubusercontent.com/ggorman/Introduction-to-stats-for-geoscientists/gh-pages/data/VostokStation.csv) into NumPy arrays called *year*, *temperature*, *co2* and *ch4*.\n", "* Print out the *mean*, *minimum* and *maximum* temperature, CO2 concentration and CH4 concentration from this data." ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 2\n", "\n", "* Plot line graphs of temperature, CO2 concentration and CH4 concentration against year.\n", "* Remember to label all your axes." ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 3\n", "\n", "* Create scatter plots for both: temperature vs CO2 concentration; and temperature vs CH4 concentration.\n", "* Using linear regression, fit a straight line to each set of data and plot this line on the same graph.\n", "* What does the p-value from linear regression tell you?\n", "\n", "**Remember** to label the x-axis and y-axis." ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 4\n", "\n", "* Plot a histogram of the difference between the CO2 concentration values and the line fitted in the scatter plot above.\n", "* Use *scipy.stats.normaltest* to determine whether these differences are normally distributed.\n", "\n", "**HINT:** Take, for example, CO2 vs temperature. Linear regression fits a straight line $y = mx+c$, where $m$ is the slope and $c$ is the intercept. 
The variation in CO2 around the straight line model is therefore:\n", "\n", "    co2_variation = co2 - (m*temperature+c)\n", "\n", "where *co2_variation*, *co2* and *temperature* are all NumPy arrays." ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 5\n", "\n", "Using the appropriate correlation test in each case, determine whether there is a correlation between temperature and CO2 concentration, and between temperature and CH4 concentration. Explain both your choice of correlation statistic and your conclusion.\n", "\n", "**Hint.** Use *scipy.stats.normaltest* to check whether the samples are normally distributed." ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "collapsed": false }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.11" } }, "nbformat": 4, "nbformat_minor": 0 }