{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 2. Introduction to Python" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The aim of this chapter is to introduce you to some of the fundamental concepts in Python: mainly the basic data types (`int`, `float`, `str`, `bool` etc.) and ways to group them (`tuple`, `list` and `dict`).\n", "\n", "We then learn how to loop over groups of things, which lets us repeat (iterate) some process.\n", "\n", "We also spend a little time on strings, as you are likely to do quite a bit of string processing in scientific computing (e.g. reading/writing data from/to ASCII text files).\n", "\n", "Although some of the examples we use are deliberately simple, to explain a concept, the more developed ones should be directly applicable to the sort of programming you are likely to need to do.\n", "\n", "A set of exercises is developed throughout the chapter, with worked answers available to you once you have had a go yourself.\n", "\n", "In addition, a more advanced section of the chapter goes into more detail and some of the complications. This too has a set of exercises with worked examples.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.1 Python" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Python](http://www.python.org/) is a high-level programming language that is freely available, relatively easy to learn and portable across different computing systems. In Python, you can rapidly develop solutions for the sorts of problems you might need to solve in your MSc courses and in the world beyond. Code written in Python is also easy to maintain, is (or should be) self-documenting, and can easily be linked to code written in other languages.\n", "\n", "Relevant features include:\n", "\n", "- it is automatically compiled and executed (there is no separate compilation step)\n", "- code is portable, provided you have the appropriate Python modules installed.
\n", "- for compute-intensive tasks, you can easily call functions written in (faster) lower-level languages such as C or FORTRAN\n", "- there is an active user and developer community, so new capabilities appear over time and many existing extensions and enhancements are readily available to you.\n", "\n", "For further background on Python, look over the material on [Advanced Scientific Programming in Python](https://python.g-node.org/wiki/schedule) and/or the [software-carpentry.org](http://software-carpentry.org/v3/py01.html) and [python.org](http://www.python.org/) web sites." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this session, you will be introduced to some of the basic concepts in Python." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.2 Running Python Programs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.2.1 Requirements\n", "\n", "For this course, we suggest you use the [anaconda](https://store.continuum.io/cshop/anaconda/) Python distribution (this is what is installed on the unix lab computers), though you are free to use whichever Python distribution you like on your own computers.\n", "\n", "If you intend to use these notes on your own computer, you will need a relatively comprehensive installation of Python (such as that from [anaconda](https://store.continuum.io/cshop/anaconda/)), and will also need [GDAL](http://www.gdal.org/) installed for some of the work.
You may also find it useful to have [git](http://git-scm.com/) installed.\n", "\n", "We assume that you are new to computing, but that you are familiar with the basic unix material covered in the previous lecture.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.2.2 Running Python\n", "\n", "We will generally use the `ipython` interpreter for running Python interactively.\n", "\n", "You will probably want to run each session, and store your scripts, in your `Data` (or `DATA`) directory.\n", "\n", "If you are taking this course at UCL, the notes should already have been downloaded to your `DATA` directory.\n", "\n", "If so, then:\n", "\n", "```\n", "berlin% cd ~/DATA/geogg122\n", "berlin% git reset --hard HEAD\n", "berlin% git pull\n", "```\n", "\n", "will update the notes (for any changes I make over the sessions). Note that `git reset --hard HEAD` discards any changes you have made to the course files themselves, so keep your own work in separate files.\n", "\n", "If you need to download the notes yourself (e.g. to run the sessions directly in the notebook), clone the course material from [github](https://github.com/profLewis/geogg122) with:\n", "\n", "```\n", "berlin% cd ~/DATA\n", "berlin% git clone https://github.com/profLewis/geogg122.git\n", "```\n", "\n", "You should next check that you are using the version of Python that we intend:\n", "\n", "```\n", "berlin% which ipython\n", "/opt/anaconda/bin/ipython\n", "```\n", "\n", "If this isn't the version of Python that you are picking up (note the use of the unix command `which` here), then you can either just type the full path name:\n", "\n", "```\n", "berlin% /opt/anaconda/bin/ipython\n", "```\n", "\n", "in place of `ipython` wherever it appears in these notes, or modify your shell initialisation file (`~/.bashrc` if you are using `bash` or `~/.cshrc` for `tcsh` or `csh`) to include `/opt/anaconda/bin` early on in your `PATH`.
For running notebooks, we use `jupyter` rather than `ipython`.\n", "\n", "To go to the directory for this session and start the notebook:\n", "\n", "`berlin% cd ~/DATA/geogg122/Chapter2_Python_intro` \n", "`berlin% jupyter notebook python101.ipynb` \n", "\n", "(The older `--pylab=inline` option is no longer accepted by `jupyter notebook`; if you want plots to appear inside the notebook, type `%matplotlib inline` in a notebook cell instead.)\n", "\n", "You quit a `jupyter` notebook session with `^C` (`Control C`) in the terminal you started it from.\n", "\n", "To execute ('run') a block of Python code in the notebook, use `SHIFT-RETURN` (the `SHIFT` and `RETURN` keys together).\n", "\n", "Alternatively, just run `ipython`:\n", "```\n", "berlin% cd ~/DATA/geogg122/Chapter2_Python_intro\n", "berlin% ipython\n", "```\n", "\n", "and type your own commands at the prompt, following the class or the material on the webpages.\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.3 Getting Started" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.3.1 Variables, Values and Data types" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The idea of **variables** is fundamental to all programming. You can think of a variable as the *name* of *something*: a way of referring to some object in the language.\n", "\n", "What the variable *is* set to is called its **value**.\n", "\n", "So let's start with a variable we will call (*declare to be*) `x`.\n", "\n", "We will give the *value* `1` to this variable:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "x = 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In a computing language, the *sort of thing* a variable can be set to is called its **data type**.\n", "\n", "In the example above, the datatype is an **integer** number (e.g. 
`1, 2, 3, 4`).\n", "\n", "In 'natural language', we might read the example above as 'x is one'.\n", "\n", "This is different to:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "x = 'one'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "because here we have set the value of the variable `x` to a **string** (i.e. some text).\n", "\n", "A string is enclosed in quotes, e.g. `\"one\"` or `'one'`; one kind of quote can also appear inside the other, as in `\"'one'\"` or `'\"one\"'`.\n", "\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "one\n", "one\n", "'one'\n", "\"one\"\n" ] } ], "source": [ "print \"one\"\n", "print 'one'\n", "print \"'one'\"\n", "print '\"one\"'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is different to:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "x = 1.0" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "because here we have set the value of the variable `x` to a **floating point** number (these are stored and handled differently from integers in computing).\n", "\n", "This is different to:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "x = True" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "where `True` is a **logical** or **boolean** value (something is either `True` or `False`).\n", "\n", "We have so far seen four datatypes:\n", "\n", "- integer (`int`): typically 32 or 64 bits long, depending on the machine (Python switches automatically to a larger integer type when a value needs more bits)\n", "- (double-precision) floating point (`float`): 64 bits long\n", "- Boolean (`bool`)\n", "- string (`str`)\n", "\n", "but we will come across more (and even create our own!) as we go through the course.
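
If you want to check these sizes on your own machine, the short sketch below (an addition to the notes, assuming only the standard `sys` module) prints the precision of a `float`, shows that integer arithmetic stays exact well beyond 64 bits, and shows that a `bool` behaves like the integer 1 or 0. Each `print()` takes a single value, so it runs under Python 2 or Python 3.

```python
import sys

# A float is a 64-bit (double-precision) number:
# sys.float_info describes its limits on this machine
print(sys.float_info.mant_dig)  # bits of precision in the mantissa
print(sys.float_info.dig)       # decimal digits that are reliably preserved

# Integer arithmetic stays exact beyond 32 or 64 bits, because
# Python switches to a larger integer representation when needed
big = 2 ** 100
print(big)

# A bool behaves like the integer 1 (True) or 0 (False) in arithmetic
print(True + True)
```

On a typical machine with IEEE 754 double-precision floats, the first two lines print `53` and `15`.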
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### type" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In each of these cases above, we have used the variable `x` to contain these different data types. If you want to know what the data type of a variable is, you can use the method `type()`" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\n", "\n" ] } ], "source": [ "print type(1);\n", "print type(1.0);\n", "print type('one');\n", "print type(True);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can explicitly convert between data types, e.g.:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "int(1.1) = 1\n", "float(1) = 1.0\n", "str(1) = 1\n", "bool(1) = True\n" ] } ], "source": [ "print 'int(1.1) = ',int(1.1)\n", "print 'float(1) = ',float(1)\n", "print 'str(1) = ',str(1)\n", "print 'bool(1) = ',bool(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "but only when it makes sense:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "converting the string '1' to an integer makes sense: 1\n" ] } ], "source": [ "print \"converting the string '1' to an integer makes sense:\",int('1')" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "converting the string 'one' to an integer doesn't:" ] }, { "ename": "ValueError", "evalue": "invalid literal for int() with base 10: 'one'", "output_type": "error", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[1;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[1;32m----> 
1\u001b[1;33m \u001b[1;32mprint\u001b[0m \u001b[1;34m\"converting the string 'one' to an integer doesn't:\"\u001b[0m\u001b[1;33m,\u001b[0m\u001b[0mint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'one'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[1;31mValueError\u001b[0m: invalid literal for int() with base 10: 'one'" ] } ], "source": [ "print \"converting the string 'one' to an integer doesn't:\",int('one')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When you get an error (such as above), you will need to learn to *read* the error message to work out what you did wrong.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### del" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can delete a variable with `del`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = 100.\n", "print x" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = 100.\n", "del x\n", "\n", "# so now if we try to do anything with the variable\n", "# x, it should fail as x is no longer defined\n", "print x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.3.2 Arithmetic" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Often we will want to do some [arithmetic](http://www.tutorialspoint.com/python/python_basic_operators.htm) with numbers in a program, and we use the 'normal' (derived from C) operators for this.\n", "\n", "Note the way this works for integers and floating point representations." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "'''\n", " Some examples of arithmetic operations in Python\n", "\n", " Note how, if we mix float and int, the result is raised to float\n", " (as the more general form)\n", "'''\n", "\n", "print 10 + 100 # int addition\n", "print 10. - 100 # float subtraction\n", "print 1./2. 
# float division\n", "print 1/2 # int division\n", "print 10.*20. # float multiplication\n", "print 2 ** 3. # float exponent\n", "print 8%2 # int remainder\n", "\n", "print '========'\n", "# demonstration of floor (//) and remainder (%)\n", "number = 9.5\n", "base = 2.0\n", "remainder = number%base # float remainder\n", "floor = number//base # 'floor' operation\n", "print number,'is',floor,'times',base,'plus',remainder" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise\n", "\n", "Change the numbers in the examples above to make sure you understand these basic operations.\n", "\n", "Try combining operations, and use brackets () to check that the order of operations works as you expect." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.3.3 Assignment Operators" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "'''\n", " Assignment operators\n", "\n", " x = 3 assigns the value 3 to the variable x\n", " x += 2 adds 2 onto the value of x\n", " so is the same as x = x + 2\n", " similarly /=, *=, -=\n", " x %= 2 is the same as x = x % 2\n", " x **= 2 is the same as x = x ** 2\n", " x //= 2 is the same as x = x // 2\n", "\n", " A 'magic' trick\n", " ===============\n", "\n", " http://www.wikihow.com/Read-Someone\\\n", " %27s-Mind-With-Math-%28Math-Trick%29\n", "\n", " whatever (non-zero) number you put as myNumber, the answer is 3\n", "\n", " Try this with integers or floating point numbers ...\n", "'''\n", "\n", "# pick a number \n", "myNumber = 34.67\n", "\n", "# assign this to the variable x\n", "x = myNumber\n", "\n", "# multiply it by 2\n", "x *= 2\n", "\n", "# multiply this by 5\n", "x *= 5\n", "\n", "# divide by the original number\n", "x /= myNumber\n", "\n", "# subtract 7\n", "x -= 7\n", "\n", "# The answer will always be 3\n", "print x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.3.4 Logical Operators" ] }, { "cell_type": "code", 
"execution_count": null, "metadata": {}, "outputs": [], 
"source": [ "'''\n", " Logical operators\n", "'''\n", "alive = True\n", "dead = not alive\n", "print 'dead or alive is',dead or alive\n", "print 'dead and alive is',dead and alive" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.3.5 Comparison Operators" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Comparison operators give a logical (i.e. `bool`) result.\n", "\n", "Most of this should be obvious, but consider carefully how it works for strings: they are compared character by character, by character code, so all upper case letters come before lower case ones, and `'100' < '2'` is `True` because `'1'` comes before `'2'`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "'''\n", " Comparison operators:\n", " \n", " == : equal to\n", " != : not equal to\n", " > : greater than\n", " >= : greater than or equal to\n", " < : less than\n", " <= : less than or equal to \n", "'''\n", "\n", "print \"is one plus one equal to two?\"\n", "print 1 + 1 == 2\n", "\n", "print \"is one less than or equal to 0.999?\"\n", "print 1 <= 0.999\n", "\n", "print \"is one plus one not equal to two?\"\n", "print 1 + 1 != 2\n", "\n", "# note the use of double quotes inside a single quoted string here\n", "print 'is \"Hello\" not the same as \"hello\"?'\n", "print 'Hello' != 'hello'\n", "\n", "# note the use of single quotes inside a double quoted string here\n", "print \"is 'more' greater than 'less'?\"\n", "print \"more\" > \"less\"\n", "\n", "print \"is '100' less than '2'?\"\n", "print '100' < '2'\n", "\n", "print \"is 100 less than 2?\"\n", "print 100 < 2\n", "\n", "# a boolean example just to see what happens\n", "print \"is True greater than False?\"\n", "print True > False" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can combine such logical statements, bracketing the terms as required:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print (1 < 2) and (True or False)" ] }, { 
"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": 
[ "#!/usr/bin/env python\n", "\n", "\"\"\"Exercise in logical statements\n", " \n", " P. Lewis p.lewis@ucl.ac.uk\n", "\n", " Tue 8 Oct 2013 10:11:03 BST\n", "\"\"\"\n", "\n", "# hunger threshold in hours\n", "hungerThreshold = 3.0\n", "# sleep threshold in hours\n", "sleepThreshold = 8.0\n", "\n", "# time since fed, in hours\n", "timeSinceFed = 4.0\n", "# time since sleep, in hours\n", "timeSinceSleep = 3.0\n", "\n", "# Note use of \\ as line continuation here\n", "# It is poor style to have code lines > 79 characters\n", "#\n", "# see http://www.python.org/dev/peps/pep-0008/#maximum-line-length\n", "#\n", "print \"Tired and hungry?\",(timeSinceSleep >= sleepThreshold) and \\\n", " (timeSinceFed >= hungerThreshold)\n", "print \"Just tired?\",(timeSinceSleep >= sleepThreshold) and \\\n", " (not (timeSinceFed >= hungerThreshold))\n", "print \"Just hungry?\",(not (timeSinceSleep >= sleepThreshold)) and \\\n", " (timeSinceFed >= hungerThreshold)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 2.1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The code above works fine, but the large blocks of logical tests are not very clear or readable, and contain repeated items.\n", "\n", "Type the code into a file (or [download it](files/python/hungry.py)).\n", "\n", "To run the code *either* type at the unix prompt:\n", "\n", "`berlin% python hungry.py`\n", "\n", "OR, within ipython:\n", "\n", "`In [17]: run hungry.py`\n", "\n", "Modify this block of code to be clearer by assigning the individual logical tests to variables, \n", "\n", "e.g.\n", "\n", "`tired = timeSinceSleep >= sleepThreshold`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.4 Groups of things" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.4.1 tuples, lists and dictionaries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Very often, we will want to group items together. 
\n", "\n", "There are three main mechanisms for doing this in Python, known as:\n", "\n", "- tuple, e.g. `(1, 2, 3)`\n", "- list, e.g. `[1, 2, 3]`\n", "- dict, e.g. `{1:'one', 2:'two', 3:'three'}`\n", "\n", "You will notice that each of these grouping structures uses a different form of bracket." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### tuple" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A `tuple` is a group of items separated by commas.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "t = 1, 2, 'three', False\n", "print t" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that when you declare a tuple, you don't need to put parentheses (round brackets) around the items, as these are implied.\n", "\n", "It is often clearer to include them, though." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "t = (1, 2, 'three', False)\n", "print t" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If there is only one element in a tuple, you must put a comma `,` at the end, otherwise it is *not* interpreted as a tuple:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "t = (1)\n", "print t,type(t)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "t = (1,)\n", "print t,type(t)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can have an *empty* tuple though:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "t = ()\n", "print t,type(t)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that the tuple can contain data of different types.\n", "\n", "It can also be nested (i.e. 
a tuple can contain a tuple):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "t = ('one', 2), 3, ((4,5),6)\n", "print t" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some operations we can perform on tuples include:\n", "\n", "- length : `len()`\n", "- selection ('slice') : `[]`\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### len" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# set up a simple tuple\n", "t = (1,2,3)\n", "print \"The length of the tuple (1,2,3) is\",len(t)\n", "\n", "# a nested example: the inner tuple counts as one item\n", "t = (1,('2a','2b'),3)\n", "print \"The length of the tuple (1,('2a','2b'),3) is\",len(t)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### slice" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# select an item with []\n", "# note the first item is 0, i.e. we start counting at 0\n", "# Python uses a 0-based indexing system\n", "\n", "t = ('it','is','a','truth','universally','acknowledged')\n", "\n", "print 'item 0',t[0]\n", "print 'item 4',t[4]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# use negative indices to count from the end\n", "# so -1 is the last item, -2 the second to last etc\n", "\n", "t = ('it','is','a','truth','universally','acknowledged')\n", "\n", "print 'item -1',t[-1]\n", "print 'item -3',t[-3]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# select a range (a 'slice') of items with [start:stop:step]\n", "# where the item at stop itself is not included\n", "\n", "t = ('it','is','a','truth','universally','acknowledged')\n", "\n", "print 'items 0:1\\t',t[0:1] # 0 to 1, so only item 0\n", "print 'items 0:2\\t',t[0:2] # 0 to 2, so items 0 and 1\n", "print 'items :4:2\\t',t[:4:2] # :4:2 so items 0 (implicit) up to 4, in steps of 2\n", " # so items 0 and 2\n", "print 'items ::-1\\t',t[::-1] # whole tuple, in 
steps of -1, so reverse order\n", "print 'items 1:-1:2\\t',t[1:-1:2] # 1 to -1 in steps of 2 so items 1 and 3\n", "\n", "# Note the use of \\t in the strings above e.g. 'items 0:1\\t'\n", "# where \\t is a tab character (for prettier formatting)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In effect, when we set up a tuple, we are *packing* some group of items together:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "t = ('the', 'past', ('is', 'a', 'foreign', 'country'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And we can similarly *unpack*:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a, b, c = t\n", "\n", "print 'a:',a\n", "print 'b:',b\n", "print 'c:',c" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "t = ('As', 'Gregor', 'Samsa', 'awoke', 'one', 'morning')\n", "\n", "a,b = t[1:4:2]\n", "print 'a:',a\n", "print 'b:',b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The set of operations we can perform on tuples includes:\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
| Name | Example | Result | Meaning |\n", "| --- | --- | --- | --- |\n", "| `len()` | `len((1,2,(3,4)))` | `3` | Length |\n", "| `+` | `(1,2) + (3,4,5)` | `(1,2,3,4,5)` | Concatenate (join) |\n", "| `*` | `(1,2) * 3` | `(1,2,1,2,1,2)` | Repetition |\n", "| `in` | `2 in (1,2,3,4)` | `True` | Membership |\n", "| `index` | `('a','b','c','d').index('c')` | `2` | Index: `T.index(value, [start, [stop]])` returns the first index of `value`; raises `ValueError` if `value` is not present |\n", "| `count` | `('a','b','c','c','d','c').count('c')` | `3` | Count: `T.count(value)` returns the number of occurrences of `value` |\n", "| `min()`, `max()` | `min(('a','b','c'))`; `max((3,4,1))` | `'a'`; `4` | Minimum, maximum |\n", "| `tuple()` | `tuple([1,2,3])`; `tuple('hello')`; `tuple({1:'one',2:'two'})` | `(1,2,3)`; `('h','e','l','l','o')`; `(1,2)` | Convert to tuple |\n
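
To check your reading of the table, the sketch below evaluates each example in turn (the names `t`, `joined` and `repeated` are just illustrative), including the `ValueError` that `index` raises for a missing value. Each `print()` takes a single value, so it runs under Python 2 or Python 3.

```python
# length: the nested tuple (3, 4) counts as a single item
t = (1, 2, (3, 4))
print(len(t))

# concatenation and repetition both build new tuples
joined = (1, 2) + (3, 4, 5)
repeated = (1, 2) * 3
print(joined)
print(repeated)

# membership, index and count
print(2 in (1, 2, 3, 4))
print(('a', 'b', 'c', 'd').index('c'))
print(('a', 'b', 'c', 'c', 'd', 'c').count('c'))

# index raises a ValueError when the value is missing
try:
    ('a', 'b').index('z')
except ValueError:
    print('z is not in the tuple')

# minimum, maximum and conversion to a tuple
print(min(('a', 'b', 'c')))
print(max((3, 4, 1)))
print(tuple('hello'))
```

Try changing the values and predicting each result before you run it.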
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You will find that these basic operators work with any of the group types, so you should make sure you are aware of them." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# some examples\n", "\n", "print \"\\ntuple('hello world'):\\n\\t\",tuple('hello world')\n", "print \"\\ntuple('hello world').count('o'):\\n\\t\",tuple('hello world').count('o')\n", "print \"\\n('1',)*5 + (1,)*5:\\n\\t\",('1',)*5 + (1,)*5\n", "\n", "# Note use of \\t in string for tab and \\n for newline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You **cannot** directly replace an element in a tuple:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "t = (1,2,3,4)\n", "t[2] = 'three'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "so you would need to find another way to do this, e.g." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "t = (1,2,3,4)\n", "\n", "t = t[:2] + ('three',) + t[3:]\n", "print t" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Neither can you delete an item in a tuple:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "t = (1,2,3,4)\n", "\n", "del t[2]\n", "print t" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# again, find another way around\n", "\n", "t = (1,2,3,4)\n", "\n", "t = t[:2] + t[3:]\n", "print t" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### string as a group" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You might have noticed that a string data type `str` acts as a collection of individual characters, e.g.:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "word = 'hello world'\n", "\n", "print \"word =\\t\",word,\"\\n\"\n", "\n", 
"print \"tuple(word) =\\t\",tuple(word)\n", "# slice\n", "print \"word[2:5] =\\t\",word[2:5]\n", "# len\n", "print \"len(word) =\\t\",len(word)\n", "# max (similarly min)\n", "print \"max(word) =\\t\",max(word)\n", "# in (membership)\n", "print \"'w' in word =\\t\",'w' in word\n", "# count\n", "print \"word.count('l')=\\t\",word.count('l')\n", "# index\n", "print \"word.index('l')=\\t\",word.index('l')\n", "# + (concatenation)\n", "print \"word + ' again'=\\t\",word + ' again'\n", "# * (repetition)\n", "print \"word * 2=\\t\",word*2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This sort of consistency or operation is one of the things that makes Python a good language to program in.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### list" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*lists* or *sequences* are contained within square brackets `[]`:\n", "\n", "The operators available for tuple work in much the same way as for lists (more formally, sequences):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "t = ['It', 'was', 'the', 'best', 'of', 'times']\n", "print t\n", "\n", "\"\"\"slicing is the same as for tuples\"\"\"\n", "t = ['It', 'was', 'the', 'best', 'of', 'times']\n", "\n", "print \"\\n----slice----\"\n", "print 't[:4:2]:\\n\\t',t[:4:2] # 0th item to 4th in steps of 2, so 0,2\n", "print 't[::-1]:\\n\\t',t[::-1] # reversal\n", "\n", "\n", "\"\"\"index\"\"\"\n", "t = ['It', 'was', 'the', 'best', 'of', 'times']\n", "\n", "print \"\\n----index----\"\n", "print 't.index(\"best\"):\\n\\t',t.index('best')\n", "\n", "\n", "\"\"\"plus\"\"\"\n", "t = ['It', 'was'] + ['the', 'best', 'of', 'times']\n", "\n", "print \"\\n----plus----\"\n", "print \"t = ['It', 'was'] + ['the', 'best', 'of', 'times']\\n\\t\",t\n", "\n", "\n", "\"\"\"multiply\"\"\"\n", "t = ['It', 'was', 'the'] + ['best'] * 3 + ['of', 'times']\n", "\n", "print \"\\n----multiply----\"\n", "print \"t = 
['It', 'was', 'the'] + ['best'] * 3 + ['of', 'times']\\n\\t\",t\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But there are many other things one can do with a list, e.g.:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\"\"\"replace\"\"\"\n", "t = ['It', 'was', 'the', 'best', 'of', 'times']\n", "\n", "print \"\\n----replace----\"\n", "t[3:5] = ['New','York']\n", "print \"t[3:5] = 'New York':\\n\\t\",t\n", "\n", "\n", "\"\"\"index and replace\"\"\"\n", "t = ['It', 'was', 'the', 'best', 'of', 'times']\n", "\n", "print \"\\n----index and replace----\"\n", "t[t.index('best')] = 'worst'\n", "print 't[t.index(\"best\")] = \"worst\":\\n\\t',t\n", "\n", "\n", "\"\"\"can delete one or more items\"\"\"\n", "t = ['It', 'was', 'the', 'best', 'of', 'times']\n", "\n", "print \"\\n----del item----\"\n", "del t[2:4] # delete items 2 up to (not including) 4, i.e. items 2 and 3\n", " # i.e. 'the', 'best'\n", "print 'del t[2:4]:\\n\\t',t\n", "\n", "\n", "\"\"\"can sort\"\"\"\n", "t = ['It', 'was', 'the', 'best', 'of', 'times']\n", "\n", "print \"\\n----sort----\"\n", "t.sort() # sort in place (upper case sorts before lower case)\n", "\n", "print 't.sort():\\n\\t',t\n", "\n", "\n", "\"\"\"can insert\"\"\"\n", "t = ['It', 'was', 'the', 'best', 'of', 'times']\n", "\n", "print \"\\n----insert----\"\n", "t.insert(1,'really') # insert in place\n", "\n", "print \"t.insert(1,'really'):\\n\\t\",t\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### range" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Many functions that return multiple items will make use of a list to do so.\n", "\n", "An example of this that we will use below is [`range(start,stop,step)`](http://docs.python.org/2/library/functions.html#range) that returns a list of integers from `start` to (but not including) `stop` in steps of `step`."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print range(10)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print range(1,3)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print range(-10,10,2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# assign the value 3 to the variable x\n", "x = 3\n", "\n", "# range(10) produces the list\n", "# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]\n", "x in range(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.5 Loops and Conditional Statements: if, for, while" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### if" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So far, we have come across the ideas of variables, data types, and two ways of grouping objects together (tuples and lists).\n", "\n", "Another fundamental part of any programming language is the conditional statement. The simplest form of this is an `if ... else ...` statement:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pockets = ['phone','keys','wallet','frog'] \n", "\n", "this_item = 'nothing'\n", "\n", "# test if something is in the list \n", "if this_item in pockets:\n", "    print \"You do have\",this_item,'in your pocket'\n", "else:\n", "    print \"You don't have\",this_item,'in your pocket'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, `this_item in pockets` is a membership test, as we have seen above (it returns `True` or `False`).\n", "\n", "If it is `True`, the indented code block under the `if` line,\n", "\n", "`print \"You do have\",this_item,'in your pocket'`\n", "\n", "is executed. If it is `False`, the code block under `else:` is executed instead.\n", "\n", "\n", "Note the use of indentation here to show which statements belong to each part of the conditional (use spaces, conventionally four per level, and do not mix tabs and spaces).
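
A minimal sketch of the indentation rule, reusing the `pockets` list above (the `messages` list is an illustrative addition, not part of the original example): each indented line belongs to the `if` or `else` branch above it, while the final unindented line is back at the outer level and runs whichever branch was taken.

```python
pockets = ['phone', 'keys', 'wallet', 'frog']
this_item = 'keys'
messages = []

if this_item in pockets:
    # indented: runs only when the membership test is True
    messages.append('You do have ' + this_item)
else:
    # indented: runs only when the membership test is False
    messages.append('You do not have ' + this_item)

# not indented: back at the outer level, so this always runs
messages.append('Finished checking pockets')

print(messages)
```

Try setting `this_item = 'umbrella'`: the first message changes, but the final one is still added.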
 \n", "\n", "Note also the use of a colon (`:`) to mark the end of the conditional test." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "'''An if example\n", "\n", "   Threshold the value of x at zero\n", "'''\n", "\n", "x = 3\n", "\n", "# threshold at zero\n", "# and print some information about what we did\n", "\n", "if x < 0:\n", "    print 'x less than 0'\n", "    x = 0\n", "elif x == 0:\n", "    print 'x is zero'\n", "else:\n", "    print 'x is more than zero'\n", "\n", "print 'thresholded x = ',x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The syntax of this is:\n", "\n", "```python\n", "if condition1 is True:\n", "    ...\n", "elif condition2 is True:\n", "    ...\n", "else:\n", "    ...\n", "```\n", "where `condition1` and `condition2` are logical tests (e.g. `this_item in pockets`, `today == \"Wednesday\"`, `x > 10` etc.). You don't need to type the `is True` part: Python evaluates the test itself as `True` or `False`.\n", "\n", "The word `elif` means `else if`. The tests are considered in the order they are given, so that if `condition1` is `not True`, we examine `condition2`. 
If that is `not True`, we fall through to the final `else` block.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# nested conditional statements: If\n", "\n", "what_you_keep = 'your head'\n", "when_you_do_it = 'all about you are losing theirs'\n", "whom_you_trust_when_all_men_doubt_you = 'yourself'\n", "\n", "# first tests\n", "if ( what_you_keep == 'your head' ) and \\\n", "        (when_you_do_it == 'all about you are losing theirs'):\n", "\n", "    # second level tests\n", "    if whom_you_trust_when_all_men_doubt_you == 'yourself':\n", "        print \"Yours is the Earth and everything that’s in it ...\"\n", "    else:\n", "        print \"Nearly there ...\"\n", "\n", "else:\n", "    print \"Have a look at http://www.poetryfoundation.org/poem/175772\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 2.2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exercise 2.2 A." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A small piece of Python code that sets the variable `today` to a string containing today's day of the week is:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This imports a module that we can use to access dates\n", "from datetime import datetime\n", "\n", "# set up a list of days of the week, starting Monday\n", "# Note the line continuation here with \\\n", "week = ['Monday','Tuesday','Wednesday','Thursday',\\\n", "        'Friday','Saturday','Sunday']\n", "\n", "# This part gives the day of the week\n", "# as an integer, 0 -> Monday, 1 -> Tuesday etc.\n", "day_number = datetime.now().weekday()\n", "\n", "# item day_number in the list week\n", "today = week[day_number]\n", "\n", "print \"today is\",today" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Based on the example below, **set up a diary for yourself for the week that prints out what you should be doing today, using the conditional structure `if ... 
elif ... else`.**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if day_number == 2:\n", "    print \"Remember to wake up early to get to the Python class at UCL\"\n", "elif day_number == 4:\n", "    print \"Remember to wake up early to go to classes at Imperial College\"\n", "else:\n", "    print \"get some sleep\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exercise 2.2 B." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You could set up the basic calendar for the week in a list, with the first entry representing Monday, the second Tuesday etc." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_diary = ['Spend the day practicing Python',\\\n", "            'Do some reading in the library at UCL', \\\n", "            'Remember to wake up early to get to the Python class at UCL',\\\n", "            'Spend the day practicing Python',\\\n", "            'Remember to wake up early to go to classes at Imperial College',\\\n", "            'Work at Python exercises from home',\\\n", "            'Work at Python exercises from home']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using a list of this sort, **print the diary entry for today *without* using conditional statements.**\n", "\n", "Criticise the code you develop and make suggestions for improvement." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### for" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Very commonly, we need to iterate or 'loop' over some set of items.\n", "\n", "The basic structure for doing this (in Python, and many other languages) is `for ... 
in ...`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "count_list = range(1,4)\n", "\n", "# for loop\n", "for count in count_list:\n", "    '''print counter in loop'''\n", "    print count\n", "\n", "print 'blast off'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "which has the syntax:\n", "\n", "    for var in list:\n", "        ...\n", "\n", "\n", "where the variable `var` is set to each of the items in `list`, in the order in which they appear in `list`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### xrange" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When we have a loop, there must be something that defines what it is we loop over. In the example above, this was a list, `count_list`, which here is `[1, 2, 3]`.\n", "\n", "In Python, we can either use some explicit list, tuple etc. to define what we loop over, or we can use a [generator expression](http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html#generator-expressions-1), which produces its members one at a time.\n", "\n", "Normally then, instead of using `range` as above, which calculates and stores *all* elements of the list up front, we use [`xrange`](http://docs.python.org/2/library/functions.html#xrange), which returns an object that produces the elements 'on demand' as the loop needs them (and so uses less memory, though there is little real difference except for very large loops).\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# for loop\n", "for count in xrange(1,4):\n", "    '''print counter in loop'''\n", "    print count\n", "\n", "print 'blast off'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you need to force an [`iterable`](http://docs.python.org/2/glossary.html#term-iterable) to e.g. 
return a list, you can convert the data type to `list`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print xrange(1,4)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print list(xrange(1,4))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### enumerate" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Commonly, when iterating over a set of items, we also need access to a counter, telling us which item in the list we are currently on.\n", "\n", "This is done using the function [`enumerate()`](http://docs.python.org/2/library/functions.html#enumerate). At each step of the loop, this gives the `tuple` `(count,item)`, where `count` is the index of `item` in `list` (counting from 0)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "word_list = ['Call', 'me', 'Ishmael']\n", "\n", "for i,w in enumerate(word_list):\n", "    print 'The',i,'th','word is',w" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here the syntax is:\n", "\n", "    for count,var in enumerate(list):\n", "        ..."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.6 Strings and things" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have seen the data type `str` above and noted some of the operations we can use on strings.\n", "\n", "You can look over some more [detailed notes](http://docs.python.org/2/library/string.html) on strings at some point, but here we will go through some other typical operations you will use in scientific computing:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.6.1 Some basic string operations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a recap, with some slightly more complicated examples:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " word \n", "\t= hello world \n", "\n", "list(word) \n", "\t= ['h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd']\n", "word[::-1] \n", "\t= dlrow olleh\n", "len(word) \n", "\t= 11\n", "min(word) \n", "\t= \n", "\t... what was printed there?\n", "'n' in word \n", "\t= False\n", "word.count('p')\n", "\t= 0\n", "'hey!' + word[len('hello')::2]\n", "\t= hey! ol\n", "word[:6]* 3\n", "\t= hello hello hello \n" ] } ], "source": [ "word = 'hello world'\n", "\n", "print \"word \\n\\t=\",word,\"\\n\"\n", "\n", "print \"list(word) \\n\\t=\",list(word)\n", "\n", "# slice\n", "print \"word[::-1] \\n\\t=\",word[::-1]\n", "# len\n", "print \"len(word) \\n\\t=\",len(word)\n", "# min (similarly max)\n", "print \"min(word) \\n\\t=\",min(word),'\\n\\t... what was printed there?'\n", "# in (membership)\n", "print \"'n' in word \\n\\t=\",'n' in word\n", "# count\n", "print \"word.count('p')\\n\\t=\",word.count('p')\n", "# + (concatenation)\n", "print \"'hey!' + word[len('hello')::2]\\n\\t=\",'hey!' 
+ word[len('hello')::2]\n", "# * (repetition)\n", "print \"word[:6]* 3\\n\\t=\",word[:6]*3" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "word.index('x')=\t" ] }, { "ename": "ValueError", "evalue": "substring not found", "output_type": "error", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[1;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[1;31m# look what happens if we call index\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2\u001b[0m \u001b[1;31m# for something that doesn't exist in the string\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 3\u001b[1;33m \u001b[1;32mprint\u001b[0m \u001b[1;34m\"word.index('x')=\\t\"\u001b[0m\u001b[1;33m,\u001b[0m\u001b[0mword\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mindex\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'x'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[1;31mValueError\u001b[0m: substring not found" ] } ], "source": [ "# look what happens if we call index\n", "# for something that doesn't exist in the string\n", "print \"word.index('x')=\\t\",word.index('x')\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sometimes, we might wish to use \n", "# the string operator find instead\n", "print \"word.find('x')=\\t\",word.find('x')\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.6.2 split" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Suppose we have some data that are presented to us as a string with white space separating each data element, e.g.:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = \"1964 1220 1974 2470 1984 2706 
1994 4812 2004 2707\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These data are total fossil fuel emissions for Zimbabwe (thousand metric tons of C) for selected years (dataset [doi 10.3334/CDIAC/00001_V2011](ftp://cdiac.ornl.gov/pub/trends/emissions/zim.dat)).\n", "\n", "The elements at even positions (0, 2, 4, ...) are the years (`1964`, `1974` etc.) and those at odd positions (`1220`, `2470` etc.) are the data for the preceding year.\n", "\n", "We can use the string method `split()`, which splits on white space by default, to separate this into a list of strings:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = \"1964 1220 1974 2470 1984 2706 1994 4812 2004 2707\"\n", "sdata = data.split()\n", "print sdata" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We could then convert these to integers:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = \"1964 1220 1974 2470 1984 2706 1994 4812 2004 2707\"\n", "sdata = data.split()\n", "\n", "# how many items are there?\n", "# use len(): here there are 10, i.e. 5 (year, emission) pairs\n", "n_items = len(sdata)\n", "\n", "# create an empty list: years\n", "years = []\n", "\n", "# create an empty list: emissions\n", "emissions = []\n", "\n", "# loop over sdata in steps of 2\n", "# and append years and emissions\n", "# data as int\n", "\n", "# xrange(0,n_items,2) because\n", "# we want to step every 2 in this case\n", "for i in xrange(0,n_items,2):\n", "    years.append(int(sdata[i]))\n", "    emissions.append(int(sdata[i+1]))\n", "print years\n", "print emissions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.6.3 join" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The 'opposite' of `split` is `join`.\n", "\n", "It is used in the form `S.join(list)`, where `S` is the separator and `list` is a **list of strings**, and it returns a single string, e.g.:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "str1 = 'hello'\n", "str2 = 
'world'\n", "\n", "# join with space\n", "print ' '.join([str1,str2])\n", "# join with no space\n", "print ''.join([str1,str2])\n", "# join with tab\n", "print '\\t'.join([str1,str1,str2])\n", "# join with colon :\n", "# note what happens when we pass a\n", "# string, rather than a list\n", "print ':'.join(str1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# remember that it has to be a list\n", "# of strings: years here is a list\n", "# of integers, so this raises a TypeError\n", "print ' '.join(years)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = \"1964 1220 1974 2470 1984 2706 1994 4812 2004 2707\"\n", "sdata = data.split()\n", "\n", "for i in xrange(0,len(sdata),2):\n", "    print ' '.join(sdata[i:i+2])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.6.4 replace" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another useful string method is `replace`, e.g.:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# change white space separation to comma separation\n", "data = \"1964 1220 1974 2470 1984 2706 1994 4812 2004 2707\"\n", "print data.replace(' ',',')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.6.5 format" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The most common way you are likely to format strings is with expressions such as:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "how_many = 10\n", "how_much = \"hours\"\n", "\n", "print \"There are only %d things to learn.\\\n", " \\nBut\\tit will take you %s to do so.\"%(how_many,how_much)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using this style of string formatting, you put format codes into the string, e.g.:\n", "\n", "    \"%d: hello %s\"\n", "\n", "and put a tuple of variables after the string, separated from it by `%`. The variables are 
inserted into the string in the order in which they appear:\n", "\n", "    \"%d: hello %s\"%(10,'ten')\n", "\n", "Note that the values must match the format codes: e.g. passing a string to `%d` generates an error.\n", "\n", "The most common formatting codes are:\n", "\n", "- `%d` represents an integer\n", "- `%s` represents a string\n", "- `%f` represents a float\n", "- `%e` represents a float in exponential form\n", "\n", "e.g.:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print \"\\\n", " integer %d\\n\\\n", " float %f\\n\\\n", " string %s\\n\\\n", " exponential %e\"%(3,3.1415926536,\"pies are squared\",3.1415926536)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.6.6 strip" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A useful method is `strip()`, which removes any leading and trailing white space (spaces, tabs and newlines), e.g.:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "| hello world \t|\n", "|hello world|\n" ] } ], "source": [ "string = ' hello world \\t'\n", "print '|%s|'%string\n", "print '|%s|'%string.strip()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.6.7 Getting and splitting filenames" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### glob" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "file_list:\n", "\t['files/data/modis_files2a.txt', 'files/data/modis_files2b.txt', 'files/data/some_modis_files.txt', 'files/data/HadSEEP_monthly_qc.txt', 'files/data/heathrowdata.txt', 'files/data/modis_files.txt']\n" ] } ], "source": [ "# example, with directory names\n", "\n", "# glob unix style pattern matching for files and directories\n", "import glob\n", "\n", "# returns a list of the names that match the pattern given\n", "# (or [] if nothing matches)\n", "file_list = glob.glob(\"files/data/*.txt\")\n", "print 
\"file_list:\\n\\t\",file_list\n", "\n", "# e.g. the first string in the list\n", "this_file = file_list[0]" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "this_file.split('/'):\n", "\t['files', 'data', 'HadSEEP_monthly_qc.txt']\n", "\n", "this_file.split('/')[-1]:\n", "\tHadSEEP_monthly_qc.txt\n" ] } ], "source": [ "this_file = 'files/data/HadSEEP_monthly_qc.txt'\n", "\n", "# split the filename on the field '/'\n", "print \"\\nthis_file.split('/'):\\n\\t\",this_file.split('/')\n", "\n", "# so the filename is just the last element in this list\n", "print \"\\nthis_file.split('/')[-1]:\\n\\t\",this_file.split('/')[-1]" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "file_list:\n", "\t['data/modis_files2a.txt', 'data/modis_files2b.txt', 'data/some_modis_files.txt', 'data/HadSEEP_monthly_qc.txt', 'data/heathrowdata.txt', 'data/modis_files.txt']\n", "\n", "file names:\n", "\tmodis_files2a.txt\n", "\tmodis_files2b.txt\n", "\tsome_modis_files.txt\n", "\tHadSEEP_monthly_qc.txt\n", "\theathrowdata.txt\n", "\tmodis_files.txt\n" ] } ], "source": [ "# another example, with directory names\n", "\n", "# glob unix style pattern matching for files and directories\n", "import glob\n", "\n", "# returns a list\n", "file_list = glob.glob(\"data/*.txt\")\n", "print \"file_list:\\n\\t\",file_list\n", "\n", "print \"\\nfile names:\"\n", "# loop over the list of file namnes\n", "for this_file in file_list:\n", " # for each of these\n", " # split the filename on the field '/'\n", " print \"\\t\",this_file.split('/')[-1]\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 2.3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data below are fields of:\n", "\n", "0 year \n", "1 month \n", "2 tmax (degC) \n", "3 tmin (degC) \n", "4 air frost (days) \n", "5 rain (mm) 
\n", "6 sun (hours) \n", "\n", "for Lowestoft in the UK for the year 2012, taken from [Met Office data](http://www.metoffice.gov.uk/climate/uk/stationdata/lowestoftdata.txt)." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 2012 1 8.7 3.1 5 33.1 53.9\n", " 2012 2 7.1 1.6 13 13.8 86.6\n", " 2012 3 11.3 3.7 2 64.2 141.3\n", " 2012 4 10.9 4.3 3 108.9 151.1\n", " 2012 5 15.1 8.6 0 46.6 171.3\n", " 2012 6 17.9 10.9 0 74.4 189.0\n", " 2012 7 20.3 12.8 0 93.6 206.9\n", " 2012 8 22.0 14.0 0 59.6 217.3\n", " 2012 9 18.9 9.5 0 38.8 200.8\n", " 2012 10 13.6 7.9 0 92.7 94.7\n", " 2012 11 10.5 4.4 2 62.1 79.6 \n", " 2012 12 7.9 2.4 8 95.6 41.9 \n" ] } ], "source": [ "data = \"\"\" 2012 1 8.7 3.1 5 33.1 53.9\n", " 2012 2 7.1 1.6 13 13.8 86.6\n", " 2012 3 11.3 3.7 2 64.2 141.3\n", " 2012 4 10.9 4.3 3 108.9 151.1\n", " 2012 5 15.1 8.6 0 46.6 171.3\n", " 2012 6 17.9 10.9 0 74.4 189.0\n", " 2012 7 20.3 12.8 0 93.6 206.9\n", " 2012 8 22.0 14.0 0 59.6 217.3\n", " 2012 9 18.9 9.5 0 38.8 200.8\n", " 2012 10 13.6 7.9 0 92.7 94.7\n", " 2012 11 10.5 4.4 2 62.1 79.6 \n", " 2012 12 7.9 2.4 8 95.6 41.9 \"\"\"\n", "\n", "print data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can use the Python package `pylab` to simply plot data on a graph:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYMAAAEZCAYAAAB1mUk3AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHz9JREFUeJzt3XeYVPXZxvHvI4hS7A0pCjEYxRJFomDErIlEVASjMSIC\nxha7xhS7cTXGWDDqKxIjAiIKIqiIhRgsAwSVYgkSQLAhoCwoIKggsPu8f/wGWTe7MOzOzDkz5/5c\n117uTn32XDL3/rq5OyIikmxbRF2AiIhET2EgIiIKAxERURiIiAgKAxERQWEgIiIoDKTAmFnKzM7O\n03tdYGZlZrbCzHbIx3tWef9dzGyWmW2VwWP7mtn5+ahLipPCQGLHzD4ys6/NbKWZLTKzwWbWOH23\np7829RqtzKzCzGr1/7iZbQncCfzM3bd192W1eZ0M36umWq8CBrv7Nxm8TF/gmnTdIptNYSBx5EBX\nd98GaAe0B66r5WtZLZ/XFNgamFXL52fEzOpX/rHS7VsBfYBHMnkdd18EzAa6ZbVASQyFgcSau38C\n/BPYr+p9FlyXbkmUmdkQM9s2ffeE9H+Xp1sYh1Xz/K3M7G4zW5j+usvMGpjZ3mwIgeVm9mI1z93a\nzB4xs8/MbJmZTTGzXdP3tTaz8enupX+ZWT8zG5q+b30r4Cwzmwe8BIyvptbDgOXp3x8z29HM5ptZ\n1/TPTczsPTPrVamsFHB85ldXZAOFgcSVAZhZS+BY4K1qHnMmcAZQAnwPaAL0S9/XKf3f7dx9G3ef\nXM3zrwUOBX6Y/joUuM7d57AhfLZz96Oree4ZwLZAC2BH4DxgVfq+YcBUYCfgz4S/8Kt2bR0J7AP8\nPP191VoPAN5d/2B3XwqcBQwws12Au4A33b1yy2F2+vcQ2Wz1N/0QkbwzYLSZrQO+AJ4FbqnmcacD\nd7r7RwBmdjUww8x+TWbdQz2Bi939s/TzbwT+Afwpg+evIXzYt3H3d0iHlZntQejW+qm7rwUmmtkz\n1bxeqbuvSj+nuvfaHlhZ+QZ3H2dmI4GX0/cfWOU5K9O3i2w2tQwkjhzo7u47uHsrd7+4hkHU3YF5\nlX7+mPAHzm4Zvk+zap7fLMPnDgVeAB5LdzHdlu7/bwYsW/9BnzavmufP38TrLwW2qeb2AYRWy0PV\nDGpvAyzPqHqRKhQGUsg+AVpV+nkPYB1QRgYzjmp4/ieZvLG7r3P3m9x9P+BwoCuhO+gTYAcza1Tp\n4XtW9xI1fL/edGDvyjeYWT3gAeBh4CIz26vKc/YF3s6kfpGqFAZSyIYDl6cHZZsQupIec/cKYAlQ\nAVT9wKz6/OvMbGcz25nQPTQ0kzc2sxIzOyD9Ab0SWAuUu/vHwDTgRjPb0syOIATFxsKpulqnAtub\nWeWWyjVAOWGs5A7g4SrTUX8CjM2kfpGqFAZSyAYRPrwnAB8AXwOXALj718BfgEnp2T6HVvP8mwkf\n3NPTX9PSt623sQ/wpsBIwpjGTMJMnvVB0pMwG2gpIWAe5rtjBt953epqdfc1wENALwAzOwS4HOjj\n4RCS29Kvc2X6/t0JLYPRG6lZpEaWq8NtzGwQYZrbYnc/IH3bjsAIQrP5I+BX7q4+TilqZnYD8H13\n772Zz9sZmAgctKmFZ2bWF3jP3e+vfaWSZLlsGQwGulS57SpgnLvvTZhffVUO318kLmq18M3dP3P3\nfTNZgezuf1AQSF3kLAzcfSJQdbZDN2BI+vshwIm5en+RGMloCw2RKOV7ncFu7l6W/r6MzKcAihQs\nd78x6hpENiWyAeT0IJj+WhIRiYF8twzKzKypuy9Kz35YXN2DzEwhISJSC+5eqzGqfLcMxhD2dCH9\n3xqnwbm7vty54YYbIq8hLl+6FroWSb4WH37o3H670769s8suzvnnOy+/7Kxbt+ExdZGzloGZDScs\ngtnZzOYT5lvfCjyePpzkI+BXuXp/EZFCN38+jBwJjz8O778PJ
50Ef/0rlJRA/Sx/eucsDNz9tBru\nqm4HSBERAT79NATAiBEwezaceCLcdBMcdRRsmcOji7RracyVlJREXUJs6FpsoGuxQTFciyVL4Ikn\nQgC8/TaccAJcey0cfTQ0aJCfGnK2ArkuzMzjWJeISLYsXw6jR8Njj8Frr8Gxx0KPHtClC2y9de1e\n08zwWg4gKwxERPLkq6/gmWdg+HBIpeBnP4NTT4WuXaFx400+fZMUBiIiMfXNN/DCCyEAxo6Fjh1D\nC+DEE2G77bL7XgoDEZEYKS+HV14JATB6NOy/P5x2Gpx8MuyyS+7eV2EgIhIxd5gyBYYNC1NBmzcP\nAXDqqdCiRX5qqEsYaDaRiEgdzJwZAmD48DD187TTYPx42HvvTT83ThQGIiKbaf788OE/bFiYFtqj\nR1gbcPDBYLX6uzx66iYSEcnA0qUwahQ8+ijMmBFWA59+OnTqBPXqRV1doDEDEZEcWLUKnn02BMAr\nr8Axx0DPnmFNwFZbRV3d/1IYiIhkSXl56PN/5JEwE6hdO+jVK7QEtt026uo2TmEgIlJH06eHABg2\nDHbdNQRAjx7QrFnUlWVOs4lERGph4cLw4f/II2F7iNNPDwvE9tsv6sryTy0DEUmUlSvhySdh6FB4\n882wEKx3bzjiCNgisrMfs0PdRCIiG7FuHbz0UgiAZ5+FI48MAXDCCbXfFC6OFAYiItWYPh0efjh0\nBTVvDmecEVYE53JLiChpzEBEJK2sLHz4DxkS1gb07g0vvwz77BN1ZfGmloGIFLzVq2HMmBAAkyaF\nHUH79AnHQxb6OMDmUMtARBLHHSZPDgHw+ONw0EGhG2jECGjSJOrqCo/CQEQKysKFYSD4oYegoiIE\nwFtvwR57RF1ZYVMYiEjsrVoFTz8dAmDKFDjlFBg0KBwUU6gbw8WNwkBEYskdpk6FwYNDN9Ahh8CZ\nZ8JTT0HDhlFXV3wUBiISK2VloRto8GBYswZ+/Wt1A+WDwkBEIrd2LTz3XAiA8ePDpnD33x9WBasb\nKD8UBiISmZkzQ9//0KHQpg2cfXbYLlqzgfJPYSAiebVyZZj+OXAgzJsXZgNNnFh4x0QWGy06E5Gc\nc4dXXw0B8OSTcNRRoRXQpQvU15+kWaNFZyISS0uWhL2BHnwwrAk45xyYPRuaNo26MqlKYSAiWVVe\nDi++GAJg3Djo3h0GDIAf/1iDwXGmbiIRyYoFC8Jg8MCBsPPOcO65cNppsN12UVeWHOomEpFIrFsH\nzz8PDzwQxgR69AiLwtq1i7oy2VwKAxHZbPPmhRbAoEFhMdi554YZQo0bR12Z1JbCQEQysm5dWBj2\nwAPw+uvQsyeMHQsHHBB1ZZINCgMR2aiPPw6DwQMHQqtWcN55MHIkNGoUdWWSTQoDEfkf5eXhr/5/\n/COMBZx+OrzwAuy/f9SVSa4oDETkW59+GloAAwaEtQDnnRfGAtQKKH6RHAhnZleb2X/N7B0zG2Zm\nW0VRh4iE1cEvvRTOCGjbNnQLPfVUOEXsrLMUBEmR93UGZtYKeBnY192/MbMRwPPuPqTSY7TOQCTH\nli0LR0b+/e/QoAFccAH06gXbbht1ZVJbhbbOYAWwFmhkZuVAI2BhBHWIJNIbb0D//vDEE3D88aFb\nSKuDJe9h4O5LzexO4GNgFfCCu7+Y7zpEkmT16nBa2H33waJFcP75MGcO7Lpr1JVJXOQ9DMxsL+C3\nQCvgC2CkmZ3u7o9Wflxpaem335eUlFBSUpK/IkWKxEcfhW6gwYPDquDrroPjjoN69aKuTLIhlUqR\nSqWy8lpRjBmcCnR293PSP/cGOrj7RZUeozEDkVqqqAgbxfXrF6aF9ukTxgPatIm6Msm1QhszmA1c\nb2YNgdXA0cCUCOoQKSpffBEGhO+7LxwYf/HFMHy4toiQzEQxZvAfM3sYmAZUAG8CD+S7DpFiMWtW\naAUMHw6dO2tAWGpHW1iLF
KDy8rBb6L33wvTp8JvfhAVizZtHXZlEqdC6iUSklpYvD4PB/frBjjvC\nZZeFxWJbadmm1JHCQKQAvPtuaAU8+mg4N/iRR6BDB3UFSfYoDERiyj0cG3n33TBtWugKmjFDXUGS\nGwoDkZhZtQqGDoV77gnrAX7727BauGHDqCuTYqYwEImJTz4J00IHDAhdQPfeC0cdpa4gyY9Idi0V\nkQ3efBN69w5nBXzxBUyaBGPGwE9/qiCQ/FEYiESgogKeeQZKSqB793B05Pvvh1lCWiksUVA3kUge\nff11WCV8111hq+jf/S5MDd1yy6grk6RTGIjkQVlZGA+4/37o2DGcKdypk7qBJD7UTSSSQ7Nnhymh\n++wDixfDxInw9NNw5JEKAokXtQxEsswd/v1v6NsXXnsNLrwwnB2wyy5RVyZSM4WBSJaUl4dZQLff\nDkuWwO9/HzaP0xnCUggUBiJ1tHp1WCTWty9svz1ccQWceKIOkJHCojAQqaUvvggDwvfcAwcdBA88\noLEAKVwKA5HNtGhR2C9owAA49lj45z/hwAOjrkqkbjSbSCRDH3wQjo9s2xa++greeCPsHqogkGKg\nMBDZhOnToWdPOPRQ2GmnMF303nuhVauoKxPJHoWBSA1efx26dYNjjgljAh98ADffDLvuGnVlItmn\nMQORStzh5ZfhL38JH/5XXAEjRmj7aCl+CgMRQgg891wIgWXL4OqrQ9eQ9gySpFAYSKJVVMBTT4Xu\nn4oKuPZaOPlkrRGQ5FEYSCKVl8Pjj4eWQMOGcOONcMIJWiMgyaUwkERZty5sEXHzzWFmUN++YYBY\nISBJpzCQRFi3Dh59NITA7rtD//46SUykMoWBFLX1IfDnP0OLFmHVcElJ1FWJxI/CQIrSunUwbNiG\nEHjwQYWAyMYoDKSolJeHMYGbbgrdQWoJiGRGYSBFoaICRo6E0tIwMHz//XDUURoTEMmUwkAKmjuM\nHg033BCmiN5zD3TurBAQ2VwKAylI7mHr6OuvD11Dt9wCxx+vEBCpLYWBFJwJE+Caa2Dp0jBA/Itf\nwBbaclGkThQGUjDeeCNsFzFnTlgx3LOnto0QyRb9PSWxN3s2nHJK2E66e/fwc+/eCgKRbFIYSGzN\nnw9nnw2dOsGPfgRz54aTxho0iLoykeKjMJDYWboU/vjHcKDMbruFbqErroBGjaKuTKR4RRIGZra9\nmY0ys1lmNtPMOkRRh8TL11/DrbfCD34AK1fCO++EWUI77BB1ZSLFL6oB5HuA5939l2ZWH2gcUR0S\nA+XlMGQI/OlP0LEjTJoEe+8ddVUiyWLunt83NNsOeMvdv7eRx3i+65L8c4fnn4crr4Qdd4Tbb4cO\naiOK1JqZ4e61Wm0TRcugNbDEzAYDPwTeAC5z968jqEUi8sYbYVzg00/httt0sIxI1KIYM6gPtAP6\nu3s74CvgqgjqkAh8/HGYFtq1K5x6ahgX6NZNQSAStShaBguABe4+Nf3zKKoJg9LS0m+/LykpoURb\nTxa0FSvC4PA//gEXXhhmCG2zTdRViRS2VCpFKpXKymvlfcwAwMwmAOe4+xwzKwUauvuVle7XmEGR\nKC+HgQPDRnLHHBNOGmvRIuqqRIpToY0ZAFwCPGpmDYD3gTMjqkNy6KWX4PLLw9TQ556Ddu2irkhE\nahJJy2BT1DIobHPnwh/+ADNmwB13hI3kNCYgknt1aRloBbJkzYoVYaVwx45w+OEwcyacdJKCQKQQ\nKAykzioqYNCgsHL4889Di+DKK2GrraKuTEQypS2spU5efx0uvRTq14dnnoH27aOuSERqQy0DqZWy\nMjjzTDj5ZLjkEvj3vxUEIoVMYSCbZe1auPtu2H9/2GWXDWcL6KQxkcKmbiLJ2IQJcNFFYVvpCRNg\n332jrkhEskVhIJu0aFHYRyiVgrvuCl1DmiEkUlzUuJcalZdDv35wwAHQrBnMmgW//KWCQKQ
YqWUg\n1Zo2Dc4/Hxo3Di2C/faLuiIRySW1DOQ7VqwIs4O6dg1TRhUEIsmgMBAgHDQzahS0bQurV4fVw336\nqEtIJCnUTSR8/HGYJfT++zB8OHTqFHVFIpJvahkkWHk5/N//hd1EDz0U3n5bQSCSVGoZJNSMGXDO\nOdCgQVg9vM8+UVckIlFSyyBhvvkmHDRz1FFw1llhgFhBICJqGSTI5MkhAPbaK3QJNW8edUUiEheb\nbBmY2aVmtkM+ipHcWLUqrCDu3h2uvx6eflpBICLflUk30W7AVDN73My6mGmyYSGZNAkOOijMGHrn\nHejRQ9NFReR/ZXTspZltAfwc+DXQHngcGOju7+ekKB17WWerVoVWwLBhYUuJk06KuiIRybWcH3vp\n7hXAIqAMKAd2AEaZ2R21eVPJrcmTw3TRBQtg+nQFgYhs2iZbBmZ2GdAH+Bx4EHjK3demWwtz3X2v\nrBellkGtrFkDN90EDz4I994Lp5wSdUUikk91aRlkMptoR+Akd59X+UZ3rzCzE2rzppJ9M2aEQ2Za\ntgwzhZo2jboiESkkGY0Z5JtaBpmrqAhnDNx6K9x2WziKUgPEIsmU65aBxNT8+XDGGaF7aMoUaN06\n6opEpFBpBXKBGjECDjkEOneG8eMVBCJSN2oZFJiVK8M5A6++Cs8/D+3bR12RiBQDtQwKyNSpYcpo\n/frw5psKAhHJHrUMCkBFBdx5J9xxB/TvH84hFhHJJoVBzC1eHAaJV6wILYM994y6IhEpRuomirFX\nXoGDDw5dQ6mUgkBEckctgxgqL4dbboG//x2GDAkzhkREcklhEDOLF0OvXuEQmmnToFmzqCsSkSRQ\nN1GMvPpqWDvQvj289JKCQETyRy2DGHAPG8v95S8waBAcf3zUFYlI0igMIvbll+Fg+jlz4PXXtZJY\nRKKhbqIIvfcedOwIjRqFLiIFgYhEJbIwMLN6ZvaWmT0TVQ1RGjsWfvxjuPBCGDgQtt466opEJMmi\n7Ca6DJgJbBNhDXnnDn/9K9x3Hzz5ZAgEEZGoRdIyMLMWwHGEk9MSs/v+119Dz54wenTYclpBICJx\nEVU30V3AH4GKiN4/7xYsgCOPhHr1wpbTzZtHXZGIyAZ5DwMz6wosdve3SEirYOpU6NABfvUrGDoU\nGjaMuiIRke+KYszgcKCbmR0HbA1sa2YPu3ufyg8qLS399vuSkhJKSkryWWPWjBoFF1wQBom7dYu6\nGhEpJqlUilQqlZXXivQMZDP7CfAHdz+hyu0Ffwby+oHi+++Hp58OG86JiORSoZ+BXNif+tVYuxbO\nOw/+85+wkEzbSohI3EXaMqhJIbcMVqwIh880aACPPQZNmkRdkYgkRV1aBlqBnEULF4YZQ3vtFaaP\nKghEpFAoDLJk1iw4/HDo0SMcTVk/Dh1wIiIZ0kdWFkyeDN27w223hSMqRUQKjcKgjsaOhT594KGH\ntPW0iBQuhUEdPPYYXHYZjBkTdh8VESlUGjOopQED4Pe/hxdfVBCISOFTy6AW7rwznEyWSkGbNlFX\nIyJSdwqDzXTzzWF/oYkToWXLqKsREckOhUGG3KG0FEaODLuONm0adUUiItmjMMiAO1x3XRgoTqVg\n112jrkhEJLsUBpvgDtdcE6aQvvIK7Lxz1BWJiGSfwmATSkvhuedCEOy0U9TViIjkhsJgI265JYwR\npFIKAhEpbgqDGtx5Z1hVPH68xghEpPgpDKoxYAD06wcTJsDuu0ddjYhI7uk8gypGjYJLLw0tAi0o\nE5FCUugnncXGuHFw4YXwr38pCEQkWbQ3UdrUqXD66fDEE3DQQVFXIyKSXwoD4MMPw3kEAwZAp05R\nVyMikn+JD4OlS+HYY+Haa0MgiIgkUaIHkFevhs6d4bDDoG/fnL+diEhO1WUAObFh4B7GCNauhREj\nYIvEt5FEpNBpNlEt3HorzJ0b1hIoCEQk6RIZBmPGwH3
3hYPsGzaMuhoRkeglLgxmzICzz4Znn4Xm\nzaOuRkQkHhLVQbJsWZgx9Le/hUFjEREJEjOAXFEBJ54IrVvDPfdk9aVFRGJBA8gZ6NsXliwJew+J\niMh3JSIMxo8PXUNTpkCDBlFXIyISP0U/ZlBWBj17hrMJ9tgj6mpEROKpqMcM3OG44+CQQ+Dmm7NQ\nmIhIjNVlzKCoWwb9+8Pnn8MNN0RdiYhIvBVty2DWLDjySJg0CfbeO0uFiYjEmFoGVaxZA716ha4h\nBYGIyKYVZRjceCM0awa/+U3UlYiIFIaim1r69tvhkJrp08Fq1VgSEUmevLcMzKylmb1iZv81sxlm\ndmm2Xru8HM49N+xI2rRptl5VRKT4RdEyWAtc7u5vm1kT4A0zG+fus+r6wvfeC02awJln1r1IEZEk\nyXsYuPsiYFH6+y/NbBbQDKhTGMybFwaMX3tN3UMiIpsr0gFkM2sFHAxMrsvruMOFF8Lvfgdt2mSj\nMhGRZIlsADndRTQKuMzdv6x6f2lp6bffl5SUUFJSUuNrPfccfPghjB6d/TpFROIqlUqRSqWy8lqR\nLDozsy2BZ4Gx7n53NfdnvOhs7Vo44ICwEd1xx2W5UBGRAlJQi87MzICBwMzqgmBzPfAAtGwJxx5b\n99pERJIq7y0DMzsCmABMB9a/+dXu/s9Kj8moZbB8OfzgBzBuHBx4YE7KFREpGHVpGRT03kRXXAFL\nl8KDD+ahKBGRmEtkGHz4IbRvHw643333PBUmIhJjBTVmkC233AIXXaQgEBHJhoJsGSxcGGYQzZ0L\nO+2Ux8JERGIscS2Dv/0NzjhDQSAiki0F1zL4/POwynj6dGjRIs+FiYjEWKJaBvfdB7/4hYJARCSb\nCqpl8NVX0Lo1TJwY1heIiMgGiWkZDBgQzjVWEIiIZFfBtAzcw3nGQ4dChw4RFSYiEmOJaBm8/jrU\nqweHHRZ1JSIixadgwmDoUOjVSwfXiIjkQkF0E61ZA82awbRp0KpVdHWJiMRZ0XcTjR0LbdsqCERE\ncqUgwmDoUOjdO+oqRESKV+y7iZYvhz33hI8+gh12iLYuEZE4K+puopEjoXNnBYGISC7FPgzURSQi\nknux7ib67DP4/vdh8WJo0CDqqkRE4q1ou4lmzQqziBQEIiK5FeswmDMnbEEhIiK5FeswePddbUon\nIpIPsQ4DtQxERPJDYSAiIvGdTbR2rdOkCSxbBg0bRl2RiEj8FeVsonnzoGlTBYGISD7ENgw0eCwi\nkj+xDQONF4iI5I/CQERE4hsG6iYSEcmf2IaBWgYiIvkT26mlDRs6X34JW8Q2rkRE4qUop5butZeC\nQEQkX2L7casuIhGR/IltGGjwWEQkf2IbBmoZiIjkTyRhYGZdzGy2mc01syure4xaBiIi+ZP3MDCz\nekA/oAvQFjjNzPat+ji1DIJUKhV1CbGha7GBrsUGuhbZEUXL4FDgPXf/yN3XAo8B3as+aKed8l5X\nLOl/9A10LTbQtdhA1yI7ogiD5sD8Sj8vSN8mIiIRiSIM4rfKTUQk4fK+AtnMOgCl7t4l/fPVQIW7\n31bpMQoMEZFaqO0K5CjCoD7wLvAz4BNgCnCau8/KayEiIvKt+vl+Q3dfZ2YXAy8A9YCBCgIRkWjF\ncqM6ERHJr9itQM5kQVoxMrOWZvaKmf3XzGaY2aXp23c0s3FmNsfM/mVm20dda76YWT0ze8vMnkn/\nnMhrYWbbm9koM5tlZjPN7LAEX4ur0/9G3jGzYWa2VVKuhZkNMrMyM3un0m01/u7pazU3/Xn68029\nfqzCINMFaUVqLXC5u+8HdAAuSv/uVwHj3H1v4KX0z0lxGTCTDTPQknot7gGed/d9gQOB2STwWphZ\nK+BcoJ27H0DoZu5Bcq7FYMJnY2XV/u5m1hY4lfA52gXob2Yb/byPVRiQ4YK0YuTui9z97fT3XwKz\nCOsvugFD0g8bApw
YTYX5ZWYtgOOAB4H1syMSdy3MbDugk7sPgjDm5u5fkMBrAawg/NHUKD0RpRFh\nEkoiroW7TwSWVbm5pt+9OzDc3de6+0fAe4TP1xrFLQy0II1v/wI6GJgM7ObuZem7yoDdIior3+4C\n/ghUVLotideiNbDEzAab2ZtmNsDMGpPAa+HuS4E7gY8JIbDc3ceRwGtRSU2/ezPC5+d6m/wsjVsY\nJH4028yaAE8Al7n7ysr3eRjtL/prZGZdgcXu/hYbWgXfkZRrQZjx1w7o7+7tgK+o0g2SlGthZnsB\nvwVaET7smphZr8qPScq1qE4Gv/tGr0vcwmAh0LLSzy35broVNTPbkhAEQ919dPrmMjNrmr5/d2Bx\nVPXl0eFANzP7EBgO/NTMhpLMa7EAWODuU9M/jyKEw6IEXov2wKvu/rm7rwOeBDqSzGuxXk3/Jqp+\nlrZI31ajuIXBNKCNmbUyswaEAZAxEdeUF2ZmwEBgprvfXemuMcAZ6e/PAEZXfW6xcfdr3L2lu7cm\nDBC+7O69Sea1WATMN7P1+/geDfwXeIaEXQvCwHkHM2uY/vdyNGGCQRKvxXo1/ZsYA/QwswZm1hpo\nQ1jgWzN3j9UXcCxhhfJ7wNVR15PH3/sIQv/428Bb6a8uwI7Ai8Ac4F/A9lHXmufr8hNgTPr7RF4L\n4IfAVOA/hL+Gt0vwtbiCEIbvEAZMt0zKtSC0kj8B1hDGVs/c2O8OXJP+HJ0NHLOp19eiMxERiV03\nkYiIREBhICIiCgMREVEYiIgICgMREUFhICIiKAxERASFgYiIoDAQyYiZ/cjM/pM+TKVx+gCitlHX\nJZItWoEskiEz+zOwNdAQmO/ut0VckkjWKAxEMpTeVXYasAro6PrHI0VE3UQimdsZaAw0IbQORIqG\nWgYiGTKzMcAw4HvA7u5+ScQliWRN/agLECkEZtYH+MbdH0sfLP6qmZW4eyri0kSyQi0DERHRmIGI\niCgMREQEhYGIiKAwEBERFAYiIoLCQEREUBiIiAgKAxERAf4f5TpOI5HeUQIAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# import the pylab module\n", "import pylab as plt\n", "# show inline in the notebook\n", "%pylab inline\n", "\n", "# some e.g. x and y data\n", "x = range(100)\n", "y = [i**0.5 for i in x]\n", "\n", "plt.plot(x,y)\n", "plt.xlabel('x')\n", "plt.ylabel('y')\n", "plt.title('Plot of sqrt(x)') " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Produce a plot of the number of sunshine hours for Lowestoft for the year 2012** using the data given above.\n", "\n", "Hint: the data have newline chcracters `\\n` at the end of each line of data, and are separated by white space within each line." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.7 Files" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To open a file that already exists for *reading*, we use:\n", "\n", "```python\n", " fp = open(filename,'r')\n", "```\n", "\n", "where `filename` is the name of the file and the `'r'` argument specifies that we want to open it in 'read' mode.\n", "\n", "This returns a file object (`fp` here) that we can use to read data from the file.\n", "\n", "When we have finished with the file, we should close it:\n", "\n", "```python\n", " fp.close()\n", "```\n", "\n", "To read ASCII data from a file as a list of strings, one per line, use:\n", "\n", "```python\n", " fp.readlines()\n", "```\n", "\n", "To write ASCII text to a file (opened with `'w'` rather than `'r'`), use:\n", "\n", "```python\n", " fp.write(\"some text\")\n", "```\n", "\n", "or, to write a list of strings all at once:\n", "\n", "```python\n", " fp.writelines([\"some text\\n\",\"some more\\n\"])\n", "```\n", "\n", "As a first example, let's open the file [`files/data/elevation.dat`](files/data/elevation.dat) for *reading*. It contains a list of dates, times and solar elevation angles (degrees)." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2014/9/30 00:00:00 -41.1227180114\r\n", "2014/9/30 00:30:00 -40.4796625312\r\n", "2014/9/30 00:59:59 -39.0649015428\r\n", "2014/9/30 01:30:00 -36.9463783528\r\n", "2014/9/30 02:00:00 -34.2136162508\r\n", "2014/9/30 02:30:00 -30.9643006022\r\n", "2014/9/30 03:00:00 -27.2941667536\r\n", "2014/9/30 03:30:00 -23.2912319377\r\n", "2014/9/30 03:59:59 -19.0337171884\r\n", "2014/9/30 04:30:00 -14.5903017653\r\n" ] } ], "source": [ "!head -10 < data/elevation.dat " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What we want to do is write the time, in decimal hours, and the solar zenith angle (i.e. 
90 degrees minus the elevation) into a new file `files/data/zenith.dat`, but only when the Sun is above the horizon.\n", "\n", "Let's first concentrate on reading the data in:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "filename = 'data/elevation.dat'\n", "fp = open(filename,\"r\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we will use `readlines` to return a list of strings:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['2014/9/30 00:00:00 -41.1227180114\\n', '2014/9/30 00:30:00 -40.4796625312\\n', '2014/9/30 00:59:59 -39.0649015428\\n', '2014/9/30 01:30:00 -36.9463783528\\n', '2014/9/30 02:00:00 -34.2136162508\\n', '2014/9/30 02:30:00 -30.9643006022\\n', '2014/9/30 03:00:00 -27.2941667536\\n', '2014/9/30 03:30:00 -23.2912319377\\n', '2014/9/30 03:59:59 -19.0337171884\\n', '2014/9/30 04:30:00 -14.5903017653\\n', '2014/9/30 05:00:00 -10.0216590927\\n', '2014/9/30 05:30:00 -5.01932715546\\n', '2014/9/30 06:00:00 -0.134849371939\\n', '2014/9/30 06:30:00 4.09286322097\\n', '2014/9/30 06:59:59 8.55453445901\\n', '2014/9/30 07:30:00 12.938958392\\n', '2014/9/30 08:00:00 17.1514399965\\n', '2014/9/30 08:30:00 21.1181133381\\n', '2014/9/30 09:00:00 24.7672136192\\n', '2014/9/30 09:30:00 28.020436136\\n', '2014/9/30 09:59:59 30.7949255712\\n', '2014/9/30 10:30:00 33.0070257679\\n', '2014/9/30 11:00:00 34.579161145\\n', '2014/9/30 11:30:00 35.4491052641\\n', '2014/9/30 12:00:00 35.5794764999\\n', '2014/9/30 12:30:00 34.9643838142\\n', '2014/9/30 12:59:59 33.6305127561\\n', '2014/9/30 13:30:00 31.6322282163\\n', '2014/9/30 14:00:00 29.0427156717\\n', '2014/9/30 14:30:00 25.9443096378\\n', '2014/9/30 15:00:00 22.4206252902\\n', '2014/9/30 15:30:00 18.5516782953\\n', '2014/9/30 15:59:59 14.4111578562\\n', '2014/9/30 16:30:00 10.0714597095\\n', '2014/9/30 17:00:00 5.61372180666\\n', '2014/9/30 17:30:00 
1.2203105187\\n', '2014/9/30 18:00:00 -2.38322183516\\n', '2014/9/30 18:30:00 -8.47981048194\\n', '2014/9/30 18:59:59 -13.0951972617\\n', '2014/9/30 19:30:00 -17.6077341991\\n', '2014/9/30 20:00:00 -21.9594218892\\n', '2014/9/30 20:30:00 -26.0849005441\\n', '2014/9/30 21:00:00 -29.9096032812\\n', '2014/9/30 21:30:00 -33.3488186792\\n', '2014/9/30 21:59:59 -36.3086640806\\n', '2014/9/30 22:30:00 -38.6902793272\\n', '2014/9/30 23:00:00 -40.3983491291\\n', '2014/9/30 23:30:00 -41.3537969714\\n']\n" ] } ], "source": [ "sdata = fp.readlines()\n", "fp.close()\n", "print sdata" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that each line of the file contains three fields, e.g.:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2014/9/30 00:00:00 -41.1227180114\n", "\n" ] } ], "source": [ "print sdata[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first field is the date (year, month, day), the second is the time (hour, minute, second), and the third is the solar elevation angle (degrees) at UCL at that date and time.\n", "\n", "To decode each line, we can use `split()`, e.g.:" 
] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['2014/9/30', '00:00:00', '-41.1227180114']\n" ] } ], "source": [ "print sdata[0].split()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What we want are the time and elevation fields, so we will write a loop to extract them, but for now we `break` out of the loop after the first entry:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "00:00:00 -41.1227180114\n" ] } ], "source": [ "filename = 'data/elevation.dat'\n", "fp = open(filename,\"r\")\n", "\n", "for line in fp.readlines():\n", "    data = line.split()\n", "    time = data[1]\n", "    elevation = float(data[2])\n", "    print time,elevation\n", "    break\n", "fp.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We need to convert the time field to decimal hours. We can start by splitting the string on `:`:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['00', '00', '00']\n" ] } ], "source": [ "time = data[1].split(':')\n", "print time" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "then convert each part to `float` and combine them into decimal hours (hours + minutes/60 + seconds/3600):" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.0\n" ] } ], "source": [ "time = [float(t) for t in data[1].split(':')]\n", "hours = time[0] + time[1]/60. + time[2]/(60.*60)\n", "print hours" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can then easily convert elevation to zenith angle (90 degrees minus the elevation):" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "131.122718011\n" ] } ], "source": [ "zenith = 90. 
- elevation\n", "print zenith" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Putting this together, printing only when the zenith angle is less than or equal to 90 degrees (i.e. when the Sun is above the horizon):" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "6.5 85.907136779\n", "6.99972222222 81.445465541\n", "7.5 77.061041608\n", "8.0 72.8485600035\n", "8.5 68.8818866619\n", "9.0 65.2327863808\n", "9.5 61.979563864\n", "9.99972222222 59.2050744288\n", "10.5 56.9929742321\n", "11.0 55.420838855\n", "11.5 54.5508947359\n", "12.0 54.4205235001\n", "12.5 55.0356161858\n", "12.9997222222 56.3694872439\n", "13.5 58.3677717837\n", "14.0 60.9572843283\n", "14.5 64.0556903622\n", "15.0 67.5793747098\n", "15.5 71.4483217047\n", "15.9997222222 75.5888421438\n", "16.5 79.9285402905\n", "17.0 84.3862781933\n", "17.5 88.7796894813\n" ] } ], "source": [ "filename = 'data/elevation.dat'\n", "fp = open(filename,\"r\")\n", "\n", "for line in fp.readlines():\n", "    data = line.split()\n", "    time = [float(t) for t in data[1].split(':')]\n", "    hours = time[0] + time[1]/60. + time[2]/(60.*60)\n", "    zenith = 90. - float(data[2])\n", "    if zenith <= 90.:\n", "        print hours,zenith\n", "fp.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, writing the results to an output file `files/data/zenith.dat` (opened with `'w'`, i.e. in 'write' mode):" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "ifilename = 'data/elevation.dat'\n", "ofilename = 'data/zenith.dat'\n", "\n", "ifp = open(ifilename,\"r\")\n", "ofp = open(ofilename,\"w\")\n", "\n", "for line in ifp.readlines():\n", "    data = line.split()\n", "    time = [float(t) for t in data[1].split(':')]\n", "    hours = time[0] + time[1]/60. + time[2]/(60.*60)\n", "    zenith = 90. 
- float(data[2])\n", " if zenith <= 90.:\n", " ofp.write(\"%.1f %.3f\\n\"%(hours,zenith))\n", "ifp.close()\n", "ofp.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We could check the output file from unix:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "6.5 85.907\r\n", "7.0 81.445\r\n", "7.5 77.061\r\n", "8.0 72.849\r\n", "8.5 68.882\r\n", "9.0 65.233\r\n", "9.5 61.980\r\n", "10.0 59.205\r\n", "10.5 56.993\r\n", "11.0 55.421\r\n" ] } ], "source": [ "!head -10 < data/zenith.dat" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-rw-rw-r--. 1 plewis plewis 269 Oct 7 2014 data/zenith.dat\r\n" ] } ], "source": [ "!ls -l data/zenith.dat" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 2.4" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The text file [data/modis_files.txt](data/modis_files.txt) contains a listing of hdf format files that are in the directory `/data/geospatial_19/ucfajlg/fire/Angola/MOD09` on the UCL Geography system. 
The first 10 lines of the file look like this:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/data/geospatial_19/ucfajlg/fire/Angola/MOD09/MOD09GA.A2004001.h19v10.005.2008109063923.hdf\r\n", "/data/geospatial_19/ucfajlg/fire/Angola/MOD09/MOD09GA.A2004002.h19v10.005.2008108084250.hdf\r\n", "/data/geospatial_19/ucfajlg/fire/Angola/MOD09/MOD09GA.A2004003.h19v10.005.2008108054126.hdf\r\n", "/data/geospatial_19/ucfajlg/fire/Angola/MOD09/MOD09GA.A2004004.h19v10.005.2008108112322.hdf\r\n", "/data/geospatial_19/ucfajlg/fire/Angola/MOD09/MOD09GA.A2004005.h19v10.005.2008108173219.hdf\r\n", "/data/geospatial_19/ucfajlg/fire/Angola/MOD09/MOD09GA.A2004006.h19v10.005.2008108214033.hdf\r\n", "/data/geospatial_19/ucfajlg/fire/Angola/MOD09/MOD09GA.A2004007.h19v10.005.2008109081257.hdf\r\n", "/data/geospatial_19/ucfajlg/fire/Angola/MOD09/MOD09GA.A2004008.h19v10.005.2008109111447.hdf\r\n", "/data/geospatial_19/ucfajlg/fire/Angola/MOD09/MOD09GA.A2004009.h19v10.005.2008109211421.hdf\r\n", "/data/geospatial_19/ucfajlg/fire/Angola/MOD09/MOD09GA.A2004010.h19v10.005.2008110031925.hdf\r\n" ] } ], "source": [ "!head -10 < data/modis_files.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Your task is to create a new file [`data/some_modis_files.txt`](data/some_modis_files.txt) that contains *only* the file names for the month of August.\n", "\n", "You will notice that the file names have a field in them such as `A2004006`. This is the one you will need to concentrate on, as it specifies the year (`2004` here) and the day of year (`doy`, `006` in this example).\n", "\n", "There are various ways to find the day of year for a particular month and year, e.g. look it up on a [website](http://www.soils.wisc.edu/cgi-bin/asig/doyCal.rb)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.8 Doing Some Science" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Maximum Precipitation " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### The Problem " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "We want to calculate the **maximum** monthly precipitation for regions of the UK for each year of the 20th Century. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### The Data " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "We have access to monthly regional precipitation totals (mm) from the [UK Met Office](http://www.metoffice.gov.uk/hadobs/hadukp/data/download.html). \n", "\n", "These data are in ASCII format and are available over the internet. For South East England, for example, they are at\n", "\n", "http://www.metoffice.gov.uk/hadobs/hadukp/data/monthly/HadSEEP_monthly_qc.txt\n", "\n", "or locally as [`files/data/HadSEEP_monthly_qc.txt`](files/data/HadSEEP_monthly_qc.txt).\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Solving the Problem " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With just the Python skills you have learned so far, you should be able to solve a problem of this nature.\n", "\n", "Before diving into this, though, you need to think through **what steps** you need to go through to achieve your aim.\n", "\n", "At a 'high' level, this could be:\n", "\n", "1. Examine the data\n", "2. Read the data into the computer program\n", "3. Select which years you want\n", "4. Find the maximum value for each year over all months\n", "5. Print the results\n", "\n", "So now we need to think about how to implement these steps." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Examine the data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the first step, there are many ways you could do this.\n", "\n", "You will probably want to look at the data set [in a browser](files/data/HadSEEP_monthly_qc.txt).\n", "\n", "You should see that the data are 'white space' separated.\n", "\n", "The first 4 lines are 'header' text, giving contextual information on the data.\n", "\n", "Subsequent lines have the **year** in the first column, then 12 columns of monthly precipitation, then an annual total.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Read the data into the computer program" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "a. You could simply save the file using 'Save As ...' from the browser.\n", "\n", "b. You could, if you really wanted, just copy and paste the data into a file on the local system.\n", "\n", "c. You could use the unix command `wget`:\n", "\n", "`berlin% mkdir -p ~/DATA/geogg122/Chapter2_Python_intro/files/data` \n", "`berlin% cd ~/DATA/geogg122/Chapter2_Python_intro` \n", "```berlin% wget -O data/HadSEEP_monthly_qc.txt \\\n", " http://www.metoffice.gov.uk/hadobs/hadukp/data/monthly/HadSEEP_monthly_qc.txt``` \n", "\n", "d. You could download and read the file directly from a URL within Python" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us suppose that you had downloaded the file and saved it as `files/data/HadSEEP_monthly_qc.txt`.\n", "\n", "In this case, we would use:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Monthly Southeast England precipitation (mm). Daily automated values used after 1996.\\n', 'Wigley & Jones (J.Climatol.,1987), Gregory et al. (Int.J.Clim.,1991)\\n', 'Jones & Conway (Int.J.Climatol.,1997), Alexander & Jones (ASL,2001). 
Values may change after QC.\\n', 'YEAR JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC ANN\\n', ' 1873 87.1 50.4 52.9 19.9 41.1 63.6 53.2 56.4 62.0 86.0 59.4 15.7 647.7\\n', ' 1874 46.8 44.9 15.8 48.4 24.1 49.9 28.3 43.6 79.4 96.1 63.9 52.3 593.5\\n', ' 1875 96.9 39.7 22.9 37.0 39.1 76.1 125.1 40.8 54.7 137.7 106.4 27.1 803.5\\n', ' 1876 31.8 71.9 79.5 63.6 16.5 37.2 22.3 66.3 118.2 34.1 89.0 162.9 793.3\\n', ' 1877 146.0 47.7 56.2 66.4 62.3 24.9 78.5 82.4 38.4 58.1 144.5 54.2 859.6\\n', ' 1878 39.9 44.7 34.2 76.6 96.0 46.8 42.3 133.1 35.7 72.9 94.1 40.7 757.0\\n']\n" ] } ], "source": [ "filename = 'data/HadSEEP_monthly_qc.txt'\n", "\n", "fp = open(filename,'r')\n", "raw_data = fp.readlines()\n", "fp.close()\n", "\n", "print raw_data[:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So we have read the data in well enough, but it's not really in a convenient format.\n", "\n", "First, it has 4 header lines at the top that we don't want.\n", "\n", "Second, although the data are in a list with one element per line, each line is stored as a single string.\n", "\n", "Since we know about lists, it would be more useful to have each line as a list, with each 'white space' separated item as an element of that list.\n", "\n", "If we have a string such as:\n", "\n", "`' 1873 87.1 50.4 52.9 19.9 41.1 63.6 53.2 56.4 62.0 86.0 59.4 15.7 647.7\\n'`\n", "\n", "one way to achieve this would be to use `split()`:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['1873', '87.1', '50.4', '52.9', '19.9', '41.1', '63.6', '53.2', '56.4', '62.0', '86.0', '59.4', '15.7', '647.7']\n" ] } ], "source": [ "line_data = ' 1873 87.1 50.4 52.9 19.9 41.1 63.6 53.2 56.4 62.0 86.0 59.4 15.7 647.7\\n'\n", "year_data = line_data.split()\n", "print year_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's useful: we could loop over each line and do this to get a list per line, with the 
first element as the year, second as precipitation in January, etc.\n", "\n", "But each element is still a string, and we really want these as `float`.\n", "\n", "We can convert a `str` to a `float` using `float()`, but we have to do this for each string individually.\n", "\n", "We can do this in a loop:" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "format now <type 'float'>\n", "[1873.0, 87.1, 50.4, 52.9, 19.9, 41.1, 63.6, 53.2, 56.4, 62.0, 86.0, 59.4, 15.7, 647.7]\n" ] } ], "source": [ "line_data = ' 1873 87.1 50.4 52.9 19.9 41.1 63.6 53.2 56.4 62.0 86.0 59.4 15.7 647.7\\n'\n", "year_data = line_data.split()\n", "\n", "for column,this_element in enumerate(year_data):\n", "    year_data[column] = float(this_element)\n", "\n", "print 'format now',type(year_data[0])\n", "print year_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we know how to convert each line of data into a list of floating point numbers.\n", "\n", "We will see later in the course that there are simpler ways of doing this in practice (e.g. using [`numpy`](http://www.numpy.org/)).\n", "\n", "Before converting all of the lines, we should chop off the 4 header lines:" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 1873 87.1 50.4 52.9 19.9 41.1 63.6 53.2 56.4 62.0 86.0 59.4 15.7 647.7\n", "\n" ] } ], "source": [ "required_data = raw_data[4:]\n", "\n", "# let's check what the first line is now\n", "print required_data[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Putting all of this together:" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "# specify filename\n", "filename = 'data/HadSEEP_monthly_qc.txt'\n", "\n", "# read the data, chop off first 4 lines \n", "# and store in required_data\n", "fp = open(filename,'r')\n", "raw_data = fp.readlines()\n", "fp.close()\n", "required_data = 
raw_data[4:]\n", "\n", "# set up list to store data in\n", "data = []\n", "\n", "\n", "# loop over each line\n", "for line_data in required_data:\n", " # split on white space\n", " year_data = line_data.split()\n", " \n", " # convert data to float\n", " for column,this_element in enumerate(year_data):\n", " year_data[column] = float(this_element)\n", " data.append(year_data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we have the data read in, as floating point values, in the 2-D list called `data`:" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1873.0, 87.1, 50.4, 52.9, 19.9, 41.1, 63.6, 53.2, 56.4, 62.0, 86.0, 59.4, 15.7, 647.7]\n", "[1874.0, 46.8, 44.9, 15.8, 48.4, 24.1, 49.9, 28.3, 43.6, 79.4, 96.1, 63.9, 52.3, 593.5]\n" ] } ], "source": [ "print data[0]\n", "print data[1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and we can compare this with the [original data we saw on the web](http://www.metoffice.gov.uk/hadobs/hadukp/data/monthly/HadSEEP_monthly_qc.txt) or the [file we downloaded](files/data/HadSEEP_monthly_qc.txt) to check it's been read in correctly." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Select which years you want" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want years 1900 to 1999.\n", "\n", "The year is stored in the first column, e.g. `data[10][0]`, and we want columns `1` to `-1` (i.e. 
skip the first and last column):" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1883.0\n", "[60.2, 97.4, 23.6, 36.5, 48.1, 53.3, 69.7, 21.7, 100.6, 58.1, 97.8, 22.5]\n" ] } ], "source": [ "print data[10][0]\n", "print data[10][1:-1]" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [], "source": [ "c20_data = []\n", "for line in data:\n", " if (line[0] >= 1900) and (line[0] < 2000):\n", " c20_data.append(line[1:-1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Find the maximum value for each year over all months" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want to find the maximum value in each row of `c20_data` now.\n", "\n", "If we consider row 0, we can use e.g.:" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[92.8, 125.9, 23.3, 30.7, 30.0, 74.6, 31.0, 73.3, 22.3, 56.1, 72.1, 88.2]\n", "125.9\n" ] } ], "source": [ "print c20_data[0]\n", "print max(c20_data[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "i.e. in the year 1900 (row 0), for S.E. England, the maximum rainfall was in February." 
] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "In South East England\n", "In the year 1900 the rainiest month was Feb with 125.9 mm\n", "In the year 1910 the rainiest month was Dec with 119.4 mm\n", "In the year 1920 the rainiest month was Jul with 116.9 mm\n", "In the year 1930 the rainiest month was Nov with 121.0 mm\n", "In the year 1940 the rainiest month was Nov with 203.0 mm\n", "In the year 1950 the rainiest month was Nov with 137.9 mm\n", "In the year 1960 the rainiest month was Oct with 167.0 mm\n", "In the year 1970 the rainiest month was Nov with 186.4 mm\n", "In the year 1980 the rainiest month was Oct with 110.1 mm\n", "In the year 1990 the rainiest month was Feb with 122.3 mm\n" ] } ], "source": [ "# Aside: show which month that was\n", "month_names = [ \"Jan\", \"Feb\", \"Mar\", \"Apr\", \\\n", " \"May\", \"Jun\", \"Jul\", \"Aug\", \"Sep\", \"Oct\", \"Nov\", \"Dec\" ]\n", "\n", "print \"In South East England\"\n", "for row in xrange(0,100,10):\n", " year = 1900 + row\n", " max_precip = max(c20_data[row])\n", " month = c20_data[row].index(max_precip)\n", "\n", " print \"In the year\",year,\"the rainiest month was\",month_names[month],\"with\",max_precip,\"mm\"" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "# max precip for all years\n", "\n", "result = []\n", "for row in xrange(100):\n", " result.append(max(c20_data[row]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Print the results" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "In South East England, the maximum value for the years 1900 to 1999 is\n", "[125.9, 98.7, 97.2, 188.4, 96.0, 104.5, 127.3, 129.5, 83.8, 143.9, 119.4, 153.6, 151.6, 106.5, 190.2, 163.9, 110.7, 135.1, 151.1, 121.4, 116.9, 70.8, 92.2, 142.5, 110.2, 98.1, 143.8, 138.5, 137.0, 175.6, 
121.0, 101.3, 146.4, 74.1, 176.3, 148.2, 112.8, 124.9, 93.6, 154.0, 203.0, 117.3, 94.3, 130.2, 117.0, 103.8, 126.7, 150.7, 125.3, 189.3, 137.9, 161.9, 99.9, 87.5, 131.0, 110.5, 113.7, 97.7, 107.4, 143.4, 167.0, 102.0, 94.0, 141.0, 97.7, 126.9, 129.1, 126.6, 135.5, 108.4, 186.4, 134.4, 94.0, 76.6, 164.7, 122.8, 141.5, 132.8, 144.9, 122.6, 110.1, 131.2, 143.0, 98.5, 112.4, 103.7, 110.1, 198.7, 143.8, 143.8, 122.3, 102.0, 132.8, 133.9, 115.6, 143.1, 122.6, 100.6, 128.1, 109.5]\n", "The highest rainfall in a month was 203.0 mm\n", "It occurred in 1940\n" ] } ], "source": [ "print \"In South East England, the maximum value for the years 1900 to 1999 is\"\n", "print result\n", "print \"The highest rainfall in a month was\",max(result),\"mm\"\n", "print \"It occurred in\",result.index(max(result))+1900" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 2.5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That is quite an achievement, given the limited amount of programming you know so far.\n", "\n", "If you go through this though, you will (should) see that it is really not very efficient.\n", "\n", "For example:\n", "\n", "- we read all the data in and then filter out the years we want (what if the dataset were **huge**?)\n", "- we loop over the 100 years multiple times\n", "- we store intermediate results\n", "\n", "For this exercise, you should look through the code we developed and try to make it more efficient.\n", "\n", "Efficiency should not override clarity and understanding though, so make sure you can understand what is going on at each stage." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 2.6" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Average Temperature\n", "\n", "We want to calculate the long-term average temperature (tmax degC) using observational data at one or more meteorological stations. 
Such data are [relevant to understanding climate and its dynamics](http://www.carbonbrief.org/profiles/global-temperatures/).\n", "\n", "We choose the period 1960 to 1990 (a 30-year average, to smooth out natural variability).\n", "\n", "We can obtain monthly average data for a number of UK stations from the [UK Met. Office](http://www.metoffice.gov.uk/climate/uk/stationdata).\n", "\n", "Not all station records are complete enough for this calculation, so we select a station with a sufficiently complete record, for example [Heathrow](http://www.metoffice.gov.uk/climate/uk/stationdata/heathrowdata.txt).\n", "\n", "With just the Python skills you have learned so far, you should be able to solve a problem of this nature.\n", "\n", "Before diving into this, though, you need to think through **what steps** you need to go through to achieve your aim.\n", "\n", "At a 'high' level, this could be:\n", "\n", "1. Get hold of the data\n", "2. Read the data into the computer program\n", "3. Select which years you want\n", "4. Average the data for each month over all selected years\n", "5. Print the results\n", "\n", "So now we need to think about how to implement these steps.\n", "\n", "For the first step, there are many ways you could do this." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.9 Dictionaries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Dictionaries (type `dict`) are another way of grouping objects.\n", "\n", "They are defined within curly braces `{}`, and each item consists of a `key` and a `value`.\n", "\n", "For example:" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a:\n", "\t{'three': 3, 'two': 2, 'one': 1}\n", "a.keys():\n", "\t['three', 'two', 'one']\n", "a.values():\n", "\t[3, 2, 1]\n", "a.items():\n", "\t[('three', 3), ('two', 2), ('one', 1)]\n" ] } ], "source": [ "a = {'one': 1, 'two': 2, 'three': 3}\n", "\n", "# we then refer to the keys and values in the dict as:\n", "\n", "print 'a:\\n\\t',a\n", "print 'a.keys():\\n\\t',a.keys() # returns a list of the keys\n", "print 'a.values():\\n\\t',a.values() # returns a list of the values\n", "print 'a.items():\\n\\t',a.items() # returns a list of (key, value) tuples" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that the order in which the items appear is not necessarily the order in which we defined them.\n", "\n", "We refer to a specific item by its key, e.g.:" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1\n" ] } ], "source": [ "print a['one']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can loop over dictionaries in various interesting ways:" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3\n", "2\n", "1\n" ] } ], "source": [ "for k in a.keys():\n", "    print a[k]" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "three 3\n", "two 2\n", "one 1\n" ] } ], "source": [ "for k,v in a.items():\n", "    print k,v" ] }, { "cell_type": "markdown", "metadata": 
{}, "source": [ "If you need to process the items in a particular order, you need to sort the keys in some way:" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1\n", "3\n", "2\n" ] } ], "source": [ "for k in sorted(a.keys()):\n", "    print a[k]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "though here the keys are sorted alphabetically (`'one'`, `'three'`, `'two'`), which may still not be the order you want, so be careful.\n", "\n", "You can add items to a dictionary with `update`:" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'four': 4, 'three': 3, 'five': 5, 'two': 2, 'one': 1}\n" ] } ], "source": [ "a.update({'four':4,'five':5})\n", "print a" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "or add a single item:" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'six': 6, 'three': 3, 'two': 2, 'four': 4, 'five': 5, 'one': 1}\n" ] } ], "source": [ "a['six'] = 6\n", "print a" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "or delete items:" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'six': 6, 'two': 2, 'four': 4, 'five': 5, 'one': 1}\n" ] } ], "source": [ "del a['three']\n", "print a" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These trivial examples are useful for understanding some basic operations on dictionaries, but don't show their real power. \n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Configuration file" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A good example is to consider a configuration file.\n", "\n", "When we have to run complicated processing jobs (e.g. 
on stacks of satellite data), we will often control the processing of these jobs with some text that we may put in a file.\n", "\n", "This might describe, for example, the particular settings for one subset of the jobs.\n", "\n", "Following on from Exercise 2.4 above, we might create a configuration file with:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```ini\n", "[TERRA] \n", "dir = /data/geospatial_19/ucfajlg/fire/Angola/MOD09 \n", "name = MODIS TERRA data\n", "year = 2004\n", "doy_start = 214\n", "doy_end = 245\n", "file_list = files/data/modis_files2a.txt\n", "\n", "[AQUA] \n", "dir = /data/geospatial_19/ucfajlg/fire/Angola/MYD09 \n", "name = MODIS AQUA data\n", "year = 2004\n", "doy_start = 214\n", "doy_end = 245\n", "file_list = files/data/modis_files2b.txt\n", "\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This information is in the file [`files/data/modis.cfg`](files/data/modis.cfg).\n", "\n", "We could read and parse this file ourselves:" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'AQUA': {'doy_end': '245', 'doy_start': '214', 'name': 'MODIS AQUA data', 'year': '2004', 'file_list': 'files/data/modis_files2b.txt', 'dir': '/data/geospatial_19/ucfajlg/fire/Angola/MYD09'}, 'TERRA': {'doy_end': '245', 'doy_start': '214', 'name': 'MODIS TERRA data', 'year': '2004', 'file_list': 'files/data/modis_files2a.txt', 'dir': '/data/geospatial_19/ucfajlg/fire/Angola/MOD09'}}\n" ] } ], "source": [ "fp = open('data/modis.cfg')\n", "\n", "# empty dict\n", "modis = {}\n", "this_section = modis\n", "\n", "# loop over each line\n", "for line in fp.readlines():\n", "    # strip any extra white space\n", "    line = line.strip()\n", "    # check that there is some text\n", "    # and it starts with [ and ends with ]\n", "    if len(line) and line[0] == '[' and line[-1] == ']':\n", "        section = line[1:-1]\n", "        modis[section] = this_section = 
{}\n", " elif len(line) and line.find(\"=\") != -1:\n", " key,value = line.split(\"=\")\n", " this_section[key.strip()] = value.strip()\n", "\n", "fp.close()\n", "print modis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "though in practice, we might choose to use the [`ConfigParser`](http://docs.python.org/2/library/configparser.html) module:" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'AQUA': {'doy_end': '245', 'doy_start': '214', 'name': 'MODIS AQUA data', 'year': '2004', 'file_list': 'files/data/modis_files2b.txt', 'dir': '/data/geospatial_19/ucfajlg/fire/Angola/MYD09'}, 'TERRA': {'doy_end': '245', 'doy_start': '214', 'name': 'MODIS TERRA data', 'year': '2004', 'file_list': 'files/data/modis_files2a.txt', 'dir': '/data/geospatial_19/ucfajlg/fire/Angola/MOD09'}}\n" ] } ], "source": [ "import ConfigParser\n", "\n", "config = ConfigParser.ConfigParser()\n", "config.read('data/modis.cfg')\n", "\n", "# we can convert this to a normal dictionary\n", "modis = {}\n", "for k in config.sections():\n", " modis[k] = dict(config.items(k))\n", "print modis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The text file [data/modis_files2a.txt](data/modis_files2a.txt) contains a listing of hdf format files that are in the directory `/data/geospatial_19/ucfajlg/fire/Angola/MOD09` and [data/modis_files2b.txt](data/modis_files2b.txt) those in `/data/geospatial_19/ucfajlg/fire/Angola/MYD09` on the UCL Geography system. 
The first 10 lines of one of these files look like this:" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MYD09GA.A2004001.h19v10.005.2008035021539.hdf\r\n", "MYD09GA.A2004002.h19v10.005.2008035115941.hdf\r\n", "MYD09GA.A2004003.h19v10.005.2008035223215.hdf\r\n", "MYD09GA.A2004004.h19v10.005.2008036154947.hdf\r\n", "MYD09GA.A2004005.h19v10.005.2008036025835.hdf\r\n", "MYD09GA.A2004006.h19v10.005.2008037030304.hdf\r\n", "MYD09GA.A2004007.h19v10.005.2008037072048.hdf\r\n", "MYD09GA.A2004008.h19v10.005.2008037155636.hdf\r\n", "MYD09GA.A2004009.h19v10.005.2008037162301.hdf\r\n", "MYD09GA.A2004010.h19v10.005.2008038034819.hdf\r\n" ] } ], "source": [ "!head -10 < data/modis_files2b.txt " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's try to use the information in the configuration dictionary `modis` to generate a list of the files we want to process:" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "examining section AQUA\n", "{'doy_end': '245', 'doy_start': '214', 'name': 'MODIS AQUA data', 'year': '2004', 'file_list': 'files/data/modis_files2b.txt', 'dir': '/data/geospatial_19/ucfajlg/fire/Angola/MYD09'}\n", "\n", "examining section TERRA\n", "{'doy_end': '245', 'doy_start': '214', 'name': 'MODIS TERRA data', 'year': '2004', 'file_list': 'files/data/modis_files2a.txt', 'dir': '/data/geospatial_19/ucfajlg/fire/Angola/MOD09'}\n" ] } ], "source": [ "# first, work out how to loop over config sections\n", "# and get the sub-dictionary\n", "\n", "for k,v in modis.items():\n", "    print \"\\nexamining section\",k\n", "    sub_dict = v\n", "    print v" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, make sure we can read the `file_list`:" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": 
[ "reading files/data/modis_files2a.txt\n", "MOD09GA.A2004001.h19v10.005.2008109063923.hdf\n", "\n" ] } ], "source": [ "print 'reading',sub_dict['file_list']\n", "fp = open(sub_dict['file_list'],'r')\n", "file_data = fp.readlines()\n", "fp.close()\n", "\n", "# print the first one, just to see what it looks like\n", "count = 0\n", "print file_data[count]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then check that we can get the day of year (DOY) from the file name. The second `.`-separated field (e.g. `A2004001`) encodes the year and DOY as `AYYYYDDD`, so its last three characters are the DOY:" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['MOD09GA', 'A2004001', 'h19v10', '005', '2008109063923', 'hdf\\n']\n" ] } ], "source": [ "print file_data[count].split('.')" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "A2004001\n" ] } ], "source": [ "print file_data[count].split('.')[1]" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1\n" ] } ], "source": [ "doy = int(file_data[count].split('.')[1][-3:])\n", "print doy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What is the range of days we want? Note that `range(start, stop)` does not include `stop`, so `doy_end` (245) itself is excluded:" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244]\n" ] } ], "source": [ "doy_range = range(int(sub_dict['doy_start']),int(sub_dict['doy_end']))\n", "print doy_range" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Is `doy` in the range?" 
] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "False\n" ] } ], "source": [ "print doy in doy_range" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here `doy` is 1, which is outside the range, so the answer is `False`. We now have all of the parts we need to put the full code together:" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "I found 62 files to process\n", "/data/geospatial_19/ucfajlg/fire/Angola/MYD09/MYD09GA.A2004214.h19v10.005.2007299212915.hdf\n", "\n", "/data/geospatial_19/ucfajlg/fire/Angola/MYD09/MYD09GA.A2004215.h19v10.005.2007300042347.hdf\n", "\n", "/data/geospatial_19/ucfajlg/fire/Angola/MYD09/MYD09GA.A2004216.h19v10.005.2007300091257.hdf\n", "\n", "/data/geospatial_19/ucfajlg/fire/Angola/MYD09/MYD09GA.A2004217.h19v10.005.2007300153436.hdf\n", "\n", "/data/geospatial_19/ucfajlg/fire/Angola/MYD09/MYD09GA.A2004218.h19v10.005.2007300215826.hdf\n", "\n", "/data/geospatial_19/ucfajlg/fire/Angola/MYD09/MYD09GA.A2004219.h19v10.005.2007302194509.hdf\n", "\n", "/data/geospatial_19/ucfajlg/fire/Angola/MYD09/MYD09GA.A2004220.h19v10.005.2007302093547.hdf\n", "\n", "/data/geospatial_19/ucfajlg/fire/Angola/MYD09/MYD09GA.A2004221.h19v10.005.2007302222054.hdf\n", "\n", "/data/geospatial_19/ucfajlg/fire/Angola/MYD09/MYD09GA.A2004222.h19v10.005.2007303011606.hdf\n", "\n", "/data/geospatial_19/ucfajlg/fire/Angola/MYD09/MYD09GA.A2004223.h19v10.005.2007303073538.hdf\n", "\n" ] } ], "source": [ "# 1. Read the configuration file\n", "# into the dict modis\n", "import ConfigParser\n", "\n", "config = ConfigParser.ConfigParser()\n", "config.read('files/data/modis.cfg')\n", "\n", "# we can convert this to a normal dictionary\n", "modis = {}\n", "for k in config.sections():\n", "    modis[k] = dict(config.items(k))\n", "\n", "# 2. Now, loop over config sections\n", "# and get the sub-dictionary which we call sub_dict\n", "\n", "# 3. 
set up an empty list to contain the\n", "# files we want to process\n", "wanted_files = []\n", "\n", "for k,v in modis.items():\n", "\n", "    sub_dict = v\n", "\n", "    # 3a. Read the file list\n", "    fp = open(sub_dict['file_list'],'r')\n", "    file_data = fp.readlines()\n", "    fp.close()\n", "\n", "    # 3b. find the doy range\n", "    doy_range = range(int(sub_dict['doy_start']),\\\n", "                      int(sub_dict['doy_end']))\n", "\n", "    # 3c. loop over each file read from\n", "    # sub_dict['file_list']\n", "    for count in xrange(len(file_data)):\n", "        # 3d. extract doy from the file name\n", "        this_file = file_data[count]\n", "\n", "        doy = int(this_file.split('.')[1][-3:])\n", "\n", "        # 3e. check whether doy is in the range we want\n", "        if doy in doy_range:\n", "\n", "            # 3f. put the directory on the front\n", "            full_name = sub_dict['dir'] + \\\n", "                        '/' + this_file\n", "            wanted_files.append(full_name)\n", "\n", "print \"I found %d files to process\"%len(wanted_files)\n", "\n", "# I won't print the whole list as it's too long\n", "# just the first 10\n", "for f in wanted_files[:10]:\n", "    print f" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 2.7" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Modify the example above to make it simpler wherever you can (but don't make it more complicated!).\n", "\n", "Then modify it so that the list of files that we want to process is written to a file." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.10 Where we have reached" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That is plenty of Python for one day. We would not expect you to be able to remember all of this the first time you go through it: you will have to work at it and go through it multiple times. 
Make sure that the time you spend going through code and examples is profitable, though: it is all too easy to sit and stare at a section of code for hours without *learning* anything. You need to be engaged with the learning for this to happen, so take frequent breaks, write your own summary notes and do whatever you need to get to grips with the basics here. \n", "\n", "Although we have crammed a lot into this session, that is for timetabling reasons rather than for effectiveness. When you go through it in your own time, break it into sections and try to get to grips with each part. \n", "\n", "We would not expect you to be able to develop code of the sort we have developed above from scratch at first, but as you go through these notes and do the exercises, you should start to understand something of how we build a piece of code. \n", "\n", "Our experience is that you should be able to get to grips with these examples (or at least most of them). Don't start by looking at a finished piece of code; start by going through the simple examples to learn the commands and syntax, then try to put the pieces together.\n", "\n", "Once you have learned the basic tools, you will be in a much better position to think about *algorithms*, i.e. how to break a problem down into smaller parts that you can solve with the Python that you know." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.11 Answers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Answers to the exercises in this session are [available to you](main_answers.ipynb), though you should only consult them once you have finished (or if you are very stuck)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.12 Advanced" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once you have got to grips with the basics in this session, you might consider stretching yourself and going through the [advanced](advanced.ipynb) section.\n", "\n", "Consult the [answers](advanced_answers.ipynb) to the exercises once you have completed them, and come along to office hours if you want to go through anything." ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.13" }, "latex_envs": { "LaTeX_envs_menu_present": true, "autocomplete": true, "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 1, "hotkeys": { "equation": "Ctrl-E", "itemize": "Ctrl-I" }, "labels_anchors": false, "latex_user_defs": false, "report_style_numbering": false, "user_envs_cfg": false } }, "nbformat": 4, "nbformat_minor": 1 }