{ "metadata": { "name": "", "signature": "sha256:d69e0a34de096f0bdc712f85439dc3c254a3aa615151c14a7031a924e038cad4" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Geospatial Data in Python: Database, Desktop, and the Web\n", "## Tutorial (Part 0a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Getting started with Python" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once you have the required packages installed, you are ready to get started and try out some examples. However, to use the powerful range of tools, functions, commands, and spatial libraries available in Python, you first need to learn a little bit about the syntax and meaning of Python commands. Once you have learned this, operations become simple to perform." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Invoking an Operation\n", "\n", "Complex computations are built up from simpler computations. This may seem obvious, but it is a powerful idea. An **algorithm** is just a description of a computation in terms of other computations that you already know how to perform. To help distinguish between the computation as a whole and the simpler parts, it is helpful to introduce a new word: an **operator** performs a computation.\n", "\n", "It's helpful to think of the computation carried out by an operator as involving four parts:\n", "\n", "1. The name of the operator\n", "2. The input arguments\n", "3. The output value\n", "4. Side effects" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A typical operation takes one or more **input arguments** and uses the information in these to produce an **output value**. Along the way, the computer might take some action: display a graph, store a file, make a sound, etc. 
These actions are called **side effects**.\n", "\n", "Since Python is a general-purpose programming language, we usually need to `import` special packages for doing specific things (like working with spatial data). You can think of this as adding words to the language. For Scientific Python, the most important library that we need is `numpy` (Numerical Python), which can be loaded like this:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import numpy as np" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To tell the computer to perform a computation - call this **invoking an operation** or giving a **command** - you need to provide the name and the input arguments in a specific format. The computer then returns the output value. For example, the command `np.sqrt(25)` invokes the square root operator (named `sqrt` from the `numpy` library) on the argument `25`. The output from the computation will, of course, be `5`.\n", "\n", "The syntax of invoking an operation consists of the operator's name, followed by round parentheses. The input arguments go inside the parentheses.\n", "\n", "The software program that you use to invoke operators is called an **interpreter** (the interpreter is the program you are running when you start Python). You enter your commands as a 'dialog' between you and the interpreter (just like when converting between any two languages!). Commands can be entered as part of a script (a text file with a list of commands to perform) or directly at a 'command prompt':" ] }, { "cell_type": "code", "collapsed": false, "input": [ "np.sqrt(25)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the above situation, the 'prompt' is `In [2]:`, and the 'command' is `np.sqrt(25)`. When you press 'Enter', the interpreter reads your command and performs the computation. 
For commands such as the one above, the interpreter will print the output value from the computation:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "np.sqrt(25)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the above example, the 'output marker' is `Out[3]:`, and the output value is `5.0`. If we were working at the command line right now, the dialog would continue as the interpreter prints another prompt and waits for your next command. Here, however, we simply move on to the next code cell." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Often, operations involve more than one argument. The various arguments are separated by commas. For example, here is an operation named `arange` from the `numpy` library that produces a range of numbers (increasing values starting at 3 and stopping just before 10):" ] }, { "cell_type": "code", "collapsed": false, "input": [ "np.arange(3, 10)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first argument tells where to start the range and the second tells where to stop; the stop value itself is not included. The order of the arguments is important. For instance, *here* is the range produced when 10 is the first argument, 3 is the second, and the third is -1 (decreasing values from 10 down to, but not including, 3):" ] }, { "cell_type": "code", "collapsed": false, "input": [ "np.arange(10, 3, -1)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For some operators, particularly those that have many input arguments, some of the arguments can be referred to by name rather than position. This is particularly useful when the named argument has a sensible default value. For example, the `arange` operator from the `numpy` library can be told what type of output values to produce (integers, floats, etc.). 
This is accomplished using an argument named `dtype`:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "np.arange(10, 3, -1, dtype='float')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that all the values in the range now have decimal places. Depending on the circumstances, all four parts of an operation need not be present. For example, the `ctime` operation from the `time` library returns the current time and date; no input arguments are required:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import time\n", "time.ctime()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the above example, we first imported the `time` library, which provides a series of commands that help us work with dates and times. Next, even though there are no arguments, the parentheses are still used when calling the `ctime` command. Think of the pair of parentheses as meaning, '*do this*'." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Naming and Storing Values\n", "\n", "Often the value returned by an operation will be used later on. Values can be stored for later use with the **assignment operator**. This has a different syntax that reminds the user that a value is being stored. Here's an example of a simple assignment:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "x = 16" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The command has stored the value 16 under the name `x`. The syntax is always the same: an equal sign (=) with a name on the left side and a value on the right. \n", "Such stored values are called **objects**. Making an assignment to an object defines the object. Once an object has been defined, it can be referred to and used in later computations. 
Notice that an assignment operation does not return a value or display a value. Its sole purpose is to have the side effect of defining the object and thereby storing a value under the object's name.\n", "\n", "To refer to the value stored in the object, just use the object's name itself. For instance:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "x" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Doing a computation on the value stored in an object is much the same (and provides an extremely rich syntax for performing complex calculations):" ] }, { "cell_type": "code", "collapsed": false, "input": [ "np.sqrt(x)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can create as many objects as you like and give them names that remind you of their purpose. Some examples: `wilma`, `ages`, `temp`, `dog_houses`, `foo3`. There *are* some general rules for object names:\n", "\n", "* Use only letters, numbers, and underscores (`_`)\n", "* Do NOT use spaces anywhere in the name (Python won't let you)\n", "* A number cannot be the first character in the name\n", "* Capital letters are treated as distinct from lower-case letters (i.e., Python is *case-sensitive*)\n", "    * the objects named `wilma`, `Wilma`, and `WILMA` are all different\n", "* If possible, use an underscore between words (e.g., `my_object`)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the sake of readability, keep object names short. 
But if you really must have an object named something like `ages_of_children_from_the_clinical_trial`, feel free (it's just more typing for you later!).\n", "\n", "Objects can store all sorts of things, for example a range of numbers:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "x = np.arange(1, 7)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When you assign a new value to an existing object, as just done to `x` above, the former value of that object is erased from the computer's memory. The former value of `x` was 16, but after the new assignment above, it is:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "x" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The value of an object is changed only via the assignment operator. Using an object in a computation does not change the value. For example, suppose you invoke the square-root operator on `x`:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "np.sqrt(x)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The square roots have been returned as a value, but this doesn't change the value of `x`:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "x" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "An assignment command like `x = np.sqrt(x)` can be confusing to people who are used to algebraic notation. In algebra, the equal sign describes a relationship between the left and right sides. So, $x = \\sqrt{x}$ tells us how the quantity $x$ and the quantity $\\sqrt{x}$ are related. 
Students are usually trained to 'solve' such a relationship, going through a series of algebraic steps to find values for $x$ that are consistent with the mathematical statement (for $x = \\sqrt{x}$, the solutions are $x = 0$ and $x = 1$). In contrast, the assignment command `x = np.sqrt(x)` is a way of replacing the previous values stored in `x` with new values that are the square roots of the old ones.\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you want to change the value of `x`, you need to use the assignment operator:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "x = np.sqrt(x)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Connecting Computations\n", "\n", "The brilliant thing about organizing operators in terms of input arguments and output values is that the output of one operator can be used as an input to another. This lets complicated computations be built out of simpler ones.\n", "\n", "For example, suppose you have a list of 10000 voters in a precinct and you want to select a random sample of 20 of them for a survey. The `np.arange` operator can be used to generate a set of 10000 choices. The `np.random.choice` operator can then be used to select a subset of these values at random.\n", "\n", "One way to connect the computations is by using objects to store the intermediate outputs:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "choices = np.arange(1, 10001) # voter numbers 1 through 10000 (the stop value is excluded)\n", "np.random.choice(choices, 20, replace=False) # sample _without_ replacement" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also pass the output of an operator *directly* as an argument to another operator. 
Here's another way to accomplish the same thing as above (note that the values will differ because we are performing a *random* sample):" ] }, { "cell_type": "code", "collapsed": false, "input": [ "np.random.choice(np.arange(1, 10001), 20, replace=False)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Numbers and Arithmetic\n", "\n", "The `Python` language has a concise notation for arithmetic that looks very much like the traditional one:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "7. + 2." ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "3. * 4." ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "5. / 2." ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "3. - 8." ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "-3." ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "5.**2. # 5 to the power of 2 (in Python, ^ is not exponentiation)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Arithmetic operators, like any other operators, can be connected to form more complicated computations. For instance:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "8. + 4. / 2." ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To a human reader, the command `8+4/2` might seem ambiguous. Is it intended to be `(8+4)/2` or `8+(4/2)`? 
The computer uses unambiguous rules to interpret the expression (division is carried out before addition, so the result above is `10.0`), but it's a good idea for you to use parentheses so that you can make sure that what you *intend* is what the computer carries out:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "(8. + 4.) / 2." ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Traditional mathematical notation uses superscripts and radicals to indicate exponentials and roots, e.g. $3^2$ or $\\sqrt{3}$ or $\\sqrt[3]{8}$. This special typography doesn't work well with an ordinary keyboard, so `Python` and most other computer languages use a different notation:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "3.**2." ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "np.sqrt(3.) # or 3.**0.5" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "8.**(1./3.)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is a large set of mathematical functions: exponentials, logs, trigonometric and inverse trigonometric functions, etc. Some examples:\n", "\n", "
| Traditional | Python |
| --- | --- |
| $e^2$ | `np.exp(2)` |
| $\\log_{e}(100)$ | `np.log(100)` |
| $\\log_{10}(100)$ | `np.log10(100)` |
| $\\log_{2}(100)$ | `np.log2(100)` |
| $\\cos(\\frac{\\pi}{2})$ | `np.cos(np.pi/2)` |
| $\\sin(\\frac{\\pi}{2})$ | `np.sin(np.pi/2)` |
| $\\tan(\\frac{\\pi}{2})$ | `np.tan(np.pi/2)` |
| $\\cos^{-1}(-1)$ | `np.arccos(-1)` |
\n", "\n", "Numbers can be written in **scientific notation**. For example, the 'universal gravitational constant' that describes the gravitational attraction between masses is $6.67428 \\times 10^{-11}$ (with units meters-cubed per kilogram per second squared). In the computer notation, this would be written as `6.67428e-11`. The Avogadro constant, which gives the number of particles (atoms or molecules) in a mole, is $6.02214179 \\times 10^{23}$ per mole, or `6.02214179e+23`.\n", "\n", "The computer language does not directly support the recording of units. This is unfortunate, since in the real world numbers often have units and the units matter. For example, in 1999 the Mars Climate Orbiter was lost at Mars because the design engineers specified the engine's thrust in units of pounds, while the guidance engineers thought the units were newtons.\n", "\n", "Computer arithmetic is accurate and reliable, but it often involves very slight rounding of numbers. Ordinarily, this is not noticeable. However, it can become apparent in some calculations that produce results that are (near) zero. For example, mathematically, $\\sin(\\pi) = 0$; however, the computer does not duplicate the mathematical relationship exactly:\n", "\n", "\n", "\n", "[pint]: https://pint.readthedocs.org/en/latest/\n", "[quantities]: http://pythonhosted.org/quantities/\n", "[units]: https://pypi.python.org/pypi/units/\n", "[sympy.physics.units]: http://docs.sympy.org/latest/modules/physics/units.html\n", "[etc]: http://conference.scipy.org/scipy2013/presentation_detail.php?id=174" ] }, { "cell_type": "code", "collapsed": false, "input": [ "np.sin(np.pi)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Whether a number like this is properly interpreted as 'close to zero' depends on the context and, for quantities that have units, on the units themselves. For instance, the unit 'parsec' is used in astronomy in reporting distances between stars. 
The closest star to the Sun is Proxima Centauri, at a distance of 1.3 parsecs. A distance of $1.22 \\times 10^{-16}$ parsecs is tiny in astronomy but translates to about 3.8 meters, which is not so small on the human scale. In statistics, many calculations relate to probabilities, which are always in the range 0 to 1. On this scale, `1.22e-16` is very close to zero.\n", "\n", "There are several 'special' numbers in the `Python` world, two of which are `inf`, which stands for $\\infty$ (infinity), and `nan`, which stands for 'not a number' (`nan` results when a numerical operation isn't defined). For instance:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Mathematically oriented readers will wonder why Python should have any trouble with a computation like $\\sqrt{-9}$; the result is the imaginary number $3\\jmath$ (imaginary numbers may be represented by a $\\jmath$ or a $\\imath$, depending on the field). Python works with complex numbers, but you have to explicitly tell the system that this is what you want to do. To calculate $\\sqrt{-9}$, for example, simply use `np.sqrt(-9+0j)`.\n", "" ] }, { "cell_type": "code", "collapsed": false, "input": [ "np.float64(1.) / 0." ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "np.float64(0.) / 0." ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Types of Objects\n", "\n", "Most of the examples used so far have dealt with numbers. But computers work with other kinds of information as well: text, photographs, sounds, sets of data, and so on. The word **type** is used to refer to the *kind* of information. Modern computer languages support a great variety of types. It's important to know about the types of data because operators expect their input arguments to be of specific types. When you use the wrong type of input, the computer might not be able to process your command."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "In Python, data frames are not 'built in' as part of the basic language, but the excellent ['pandas'][pandas] library provides data frames and a whole slew of other functionality for researchers doing data analysis with Python. We will learn more about 'pandas' later on.\n", "\n", "\n", "[pandas]: http://pandas.pydata.org/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### A Note on Strings\n", "\n", "Whenever you refer to an object name, make sure that you don't use quotes. For example, in the following, we first assign the string `\"python\"` to the `name` object, and then return (and automatically print) the `name` object." ] }, { "cell_type": "code", "collapsed": false, "input": [ "name = \"python\"\n", "name" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you write a command with the object name in quotes, it won't be treated as referring to an object. Instead, it will merely mean the text itself:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "\"name\"" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similarly, if you omit the quotation marks from around the text, the computer will treat it as if it were an object name and will look for the object of that name. For instance, the following command directs the computer to look up the value contained in an object named `python` and store that value in the object `name`:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "name = python" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As it happens, there was no object named `python` because it had not been defined by any previous assignment command. So the computer raised an error (a `NameError`)." ] } ], "metadata": {} } ] }