{ "metadata": { "name": "", "signature": "sha256:03f1e7994625f9058fda4315098a2fa46023499079b9157cf4ffd6af7f926e61" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Version: Python 2.7" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import this" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Scientific Python for Engineers" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Coding" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook discusses the main features of [Numerical Python](http://www.numpy.org/) (`numpy`), [Scientific Python](http://www.scipy.org/) (`scipy`), and their relatives in an engineering context.\n", "\n", "Although any installation of [IPython](http://ipython.org/) will work with a version of this notebook, we recommend that you download and install the [Enthought Canopy Distribution](https://www.enthought.com/products/canopy/) of Python, which is free for academic users. To launch the notebook, open a command terminal, type `ipython notebook tutorial.ipynb`, and press Return.\n", "\n", "A few notes on this tutorial:\n", "\n", "- Check the top left corner of this page for the Python version this notebook is supposed to work with. Enthought Canopy uses Python 2.7.x, so if you are using that version, you will need the 2.7 version of this notebook. There is also a 3.3 version, which requires some small adjustments if you are familiar with Python 2.7. Python 3 represents the future of Python and is a better platform for those developing new codes (rather than developing existing scripts). Most major modules are available in both flavors now.\n", "\n", "- Early on, we will incidentally utilize features of packages not yet introduced (such as `matplotlib`). Rest assured that the major features of these modules will be explicitly discussed at some point if it is not clear now.\n", "\n", "- Code blocks starting with `$` are intended to be run on the command line, not executed as Python code." ] }, { "cell_type": "heading", "level": 4, "metadata": {}, "source": [ "Standard Header" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we will be utilizing a number of packages with reasonably long names, we will adopt the _de facto_ standard module abbreviations in the following header. We also ensure that our [division behavior is sensible](http://www.python.org/dev/peps/pep-0238/) by importing from `__future__`: _i.e._, promotion to `double` will occur from `int` or `long` data types involving division: `1/2 == 0.5`. Although this is the default in Python 3, it is a trivial way to help this notebook work in Python 2 if that's what you are using." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from __future__ import division\n", "\n", "import numpy as np\n", "import scipy as sp\n", "import matplotlib as mpl\n", "import matplotlib.pyplot as plt\n", "\n", "#IPython magic command for inline plotting\n", "%matplotlib inline\n", "#a better plot shape for IPython\n", "mpl.rcParams['figure.figsize']=[15,3]" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 4, "metadata": {}, "source": [ "Getting Help" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from scipy import optimize\n", "help(optimize.fmin)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Function Arguments" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You may be familiar already with the notion of keyword arguments from Python, but it is an excellent way to store default values for subsidiary calculations that you write." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def mandelbrot(minR=-2.25, maxR=0.75, minI=-1.5, maxI=1.5, samples=51, iters=25):\n", " \"\"\"\n", " Generate the Mandelbrot set within the boundaries of the limits\n", " maxR, minR, minI, maxI with samples and a maximum iteration of iters.\n", " \"\"\"\n", " # Generate a mesh for the set.\n", " setR = np.linspace(minR, maxR, samples)\n", " setI = np.linspace(minI, maxI, samples)\n", "\n", " # Calculate the values at each point of the mesh by the escape-time\n", " # fractal algorithm.\n", " pts = np.zeros([samples, samples])\n", " for ii in range(1, len(setR)):\n", " for jj in range(1, len(setI)):\n", " it = 0\n", " x = 0.0\n", " y = 0.0\n", "\n", " xx = setR[ii]\n", " yy = setI[jj]\n", "\n", " # Look for escape---i.e., does the value settle down in a few\n", " # iterations or does it keep going?\n", " while(x * x + y * y < 4 and it < iters):\n", " xtmp = x * x - y * y + xx\n", " y = 2 * x * y + yy\n", " x = xtmp\n", " it += 1\n", " pts[ii, jj] = it\n", " return setR, setI, pts\n", "\n", "x, y, z = mandelbrot() #note that I can call any or all of the arguments explicitly as well: mandelbrot(minR=-2.5)\n", "z = z.transpose()\n", "\n", "mpl.rcParams['figure.figsize']=[8,8]\n", "plt.imshow(z, interpolation='nearest')\n", "plt.show()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As an alternative to passing around possible values with a dictionary, you can pass arbitrary keyword arguments to a function that is prepared to accept them (_i.e._, has the `**kwargs` argument in its function definition line).\n", "\n", "[(More information)](http://zetcode.com/lang/python/functions/)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "#Source: http://www.saltycrane.com/blog/2008/01/how-to-use-args-and-kwargs-in-python/\n", "def test_var_kwargs(farg, **kwargs):\n", " print \"formal arg:\", farg\n", " for key in kwargs:\n", " print \"another keyword arg: %s: %s\" % (key, kwargs[key])\n", "\n", "test_var_kwargs(farg=1, myarg2=\"two\", myarg3=3)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Automating Numerical Experiments" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As Python can invoke external programs and load executable code at run-time from disk, it is an excellent wrapper for automating numerical experiments in which a program is repeatedly invoked with a few systematically varied parameters.\n", "\n", "We will examine three elements of automating this process: command-line option parsing; configuration file; and systematic invocation." ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Command-Line Option Parsing (The `getopt` Module)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the first place, a program can utilize command-line options and flags to indicate the preferred behavior of a program instance. You can pass in time step intervals, the location of data, or anything else configurable at run time (which in Python means everything).\n", "\n", "The `getopt` module provides a concise and convenient way of extracting both long and short options with their associated values. The system command-line input used to invoke the program is tokenized and stored in `sys.argv`. The short options are listed in a string (with `':'` indicating an argument to be given for that flag), followed by a list of long arguments. `getopt` returns a list of 2-tuples `(options, args)` for further processing. One way of processing the results is illustrated below.\n", "\n", "_N.B._: As IPython notebooks run inside an interface, the following code will only work as expected when used in a command-line script. See the example file `heat_eqn_cl.py`." ] }, { "cell_type": "code", "collapsed": false, "input": [ "usage = \"\"\"This program can be invoked with the following options, etc.\"\"\"\n", "import getopt,sys\n", "options, args = getopt.getopt(sys.argv[1:],'h',['help'])\n", "for option, value in options:\n", " if option in ('-h','--help'):\n", " print(usage); sys.exit(0)\n", " else:\n", " print(usage); sys.exit('Invalid command-line argument detected.')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Configuration File (Creating Code at Run Time)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For a number of reasons it may become undesirable to expect or allow end users to edit the primary program script. In these cases (and where command-line options are too numerous to type easily), it becomes advisable to use a configuration file containing the user-specified variables for program execution. In Python, the most straightforward implementation of this idea is to load the file into memory, convert it to a variable-definition-like syntax, and execute the resulting code as if it were regular Python. (The standard warnings about using `exec` and `eval` apply. _Caveat programator_.)\n", "\n", "See the example finite difference code in `numpy-scipy.ipynb`." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Load configuration data from file.\n", "conf_file = open('config.dat', 'r')\n", "for line in file:\n", " var, val = [word.strip() for word in line.split('=')]\n", " conf_code = var + '=' + val\n", " exec(conf_code)\n", "\n", "# Check for missing values and give defaults. You can check for acceptable ranges and types here as well.\n", "if not locals().has_key('max_T'): max_t = 1e6;" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Obviously you will have to both specify in your documentation which definitions are accepted and provide defaults in the cases where definitions are not given. It is also left as an exercise how to use this feature to include units in the defitions (_i.e._, `g = 9.8 N`)." ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Systematic Invocation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we can automatically run many invocations of a program (whether a Python script or otherwise), tweaking the parameters slightly via either a command-line argument, a configuration file, or an input file. We will here illustrate the use of a command-line invocation for a hypothetical program which accepts a command-line argument. How can you extend this to work for a more complex input file?" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import os\n", "# Loop through the variable values (stored in a list).\n", "for i in np.linspace(0,1,11):\n", " # Construct the command line string and evaluate it. To make the popen command nonblocking, add a '&' to the end of the cmd_i string.\n", " cmd_i = './my_code -t %s' % i\n", " os.popen(cmd_i)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Debugging" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Recollect the Zen of Python (`import this`): \"In the face of ambiguity, refuse the temptation to guess.\"\n", "\n", "You have certainly encountered errors previously in working with Python. There exist a number of approaches (collectively 'debugging') which serve to root out many of these errors and to make the others tractable." ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Errors & Exceptions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A few definitions are in order to discuss debugging in detail:\n", "\n", "- _Errors_\u2014exceptions which cause the program to be unrunnable (cannot be handled at run time)\n", "- _Exceptions_\u2014unusual behavior (although not necessarily unexpected behavior, particularly in Python)\n", "- _Tracebacks_\u2014listing of function calls on the stack at the time the exception arises" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you've been programming in Python for a while, the following errors will look familar to you." ] }, { "cell_type": "raw", "metadata": {}, "source": [ " SyntaxError # Check missing colons or parentheses\n", " NameError # Check for typos, function definitions\n", " TypeError # Check variable types (coerce if necessary)\n", " ValueError # Check function parameters\n", " IOError # Check that files exist\n", " IndexError # Don't reference nonexistent list elements\n", " KeyError # Similar to an IndexError, but for dictionaries\n", " ZeroDivisionError # Three guesses...\n", " Exception # Generic error category" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Write some snippets of code which throw the following exceptions: `SyntaxError`, `NameError`, `TypeError`, `ValueError`, `IOError`, `IndexError`, `KeyError`, `ZeroDivisionError`." ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Linting" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As I said before, you can debug by simply attempting to run your code. This, however, is very annoying. First off, the code will always stop at the first exception. This means that, if you have ten errors, you'll have to run the code ten times to find all of them. Now, imagine that this is long-running code. Imagine waiting five minutes for your code to run, only to discover it breaks because of a typo. Doesn't that sound exciting?\n", " \n", "Linting is the process of statically analyzing the code in order to catch multiple errors simultaneously (rather than dealing with them one at a time in the debugger). There are several available; I'll illustrate the use of pyflakes." ] }, { "cell_type": "raw", "metadata": {}, "source": [ "$ pyflakes my_code.py" ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Coding Standards" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In a written natural language, there are many ways to express the same idea. To make the consumption of information easier, people define style guides to enforce particularly effective ways of writing. This is even more true in coding; consistent style choices make scripts much easier to read. Just like version control, standards become absolutely essential as projects become large ($n>1$, where $n$ is the number of coders).\n", " \n", "Some programming languages have multiple competing standards, and it's easy to imagine how messy this can get. You can find strong opinions on what constitutes a tab, for instance. Luckily, Python doesn't have this issue. The official standard, PEP8, is used everywhere. Unless you plan on hiding all the code you write from the outside world, you should learn it.\n", "\n", "To help out coders, there are tools to test for compliance. The aptly named `pep8` library shows you where your code deviates from the PEP8 standard, and `autopep8` goes a step further by trying to fix all of your errors for you. These are both run from the shell. (They are not available in Canopy Basic, so you'll need to install them yourself or get the Academic license.)" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "$ pep8 my_code.py\n", "$ autopep8 my_code.py > my_new_code.py" ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Error Handling (`try`/`except`)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`try`/`catch` error handling should encapsulate the fewest statements possible. It can also reduce code readability, and so should probably be used only where things are likely to go wrong, in your judgment: the file system, or some obtuse calculation.\n", "\n", "Basically, `try` lets you execute an error-prone segment of code, while `except` lets you handle any or all of the errors that arise. (It is better to handle less, as a general maxim, so that you don't mask other errors lurking in an operation.) An optional `finally` clause will execute in any case.\n", "\n", "_**The Principle of Least Astonishment**_: The result of performing some operation should be obvious, consistent, and predictable, based upon the name of the operation and other clues." ] }, { "cell_type": "code", "collapsed": false, "input": [ "filename = 'spring.data'\n", "try:\n", " data = np.genfromtxt(filename)\n", "except:\n", " print 'Unable to read file \"%s\".'%filename" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "filename = 'spring.data'\n", "try:\n", " data = np.genfromtxt(filename)\n", " print data\n", "except IOError, err:\n", " print 'Unable to read file \"%s\"; %s.'%(filename,err)\n", " # why output err? what else can go wrong?\n", "finally:\n", " print 'Done with data loading code.'" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Code Performance" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Substantial scientific programs will generally perform iterative or other intensive processes which act as bottlenecks to performance. If you can unambiguously identify the dominant processes in your program, you can often improve performance, sometimes substantially.\n", "\n", "By _profiling_ your code, you can determine where the most time is spent processing. Code profiling involves polling the current location of execution every few milliseconds or so and forming a table of the functions called together with how much time is spent in each. This gives you the information you need to make educated decisions about optimization. For instance, once you know that a specific numerical routine is eating up 90% of your program time, you may change the algorithm or parallelize the process. This lets you spend your precious time optimizing the _right_ portion of the code, rather than functions or loops that are called only a few times." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_Code profiling_ refers to the process of analyzing your program at runtime to measure its performance across a variety of metrics. (We will discuss CPU time profiling only. You can also profile memory, input/output speed, etc.)\n", "\n", "There are three main ways you can profile your code in Python." ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Basic timing: `time`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, you can use the time package as a stopwatch on specific functions and make your own immediate comparisons. This is the equivalent of using `print` as a debugger\u2014it's not exactly profiling, but can be very useful in isolated instances." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import time\n", "start = time.time()\n", "myFunc()\n", "end = time.time()\n", "print(repr(end-start), \" s elapsed.\")" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Time trials: `timeit`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Secondly, you can use the `timeit` module either inside or outside of your code to accomplish something similar. `timeit` is highly configurable as to both the number of runs and the \n", "\n", "`timeit` requires that you create a `Timer` object given the statement to be run and any necessary environmental imports. The `timeit` function runs 1,000,000 trials and returns the total time (in seconds). The `repeat` functions offers the ability to choose both the number of separate trials and the number of function calls in each trial, returning a list of times." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import timeit\n", "t = timeit.Timer(\"jn(0,1.5+1j)\",\"from scipy.special import jn\")\n", "t.timeit()\n", "t.repeat(4,1000000)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`timeit` can also be conveniently invoked from the command line, where it runs three trials by default and gives you the best time of the three." ] }, { "cell_type": "raw", "metadata": {}, "source": [ "$ python -m timeit \"x=2**100;x**100\"\n", "10000 loops, best of 3: 20.3 usec per loop" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is also an IPython magic command, `%timeit`, available." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from scipy.special import jn\n", "%timeit jn(0,1.5+1j)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Full profiling: `cProfile`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, you can use a full profiler. A profiler runs your code inside of a framework which it can query hundreds or thousands of times per second in order to create a breakdown of where time is spent in the program. This allows you to identify bottlenecks to target in your optimization efforts.\n", "\n", "We will use the built-in module `cProfile`, which gives you an embarassment of riches on timing information." ] }, { "cell_type": "raw", "metadata": {}, "source": [ "$ python -m cProfile -s time my_code.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This type of profiler is inappropriate for benchmarking, as it introduces a relative overhead for Python programs. You should also be aware of the granularity of your system clock. Small values ($<10^{-6} s$) are possibly meaningless and so you should use large numbers of calls instead if you need to know small differences (this is what `timeit` does, of course)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another useful tool is `RunSnakeRun`, which renders the output of `cProfile` more legible (including showing nested system calls)." ] }, { "cell_type": "raw", "metadata": {}, "source": [ "$ python -m cProfile -s time my_code.py > my_code.profile\n", "$ runsnake my_code.profile" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For further information, consult Warren (2013)." ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Performance Tips" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Take a look at [these benchmarks versus C/C++, MATLAB, and various Python-based implementations](http://wiki.scipy.org/PerformancePython) or [these](http://julialang.org/benchmarks/). It is clear that to really get the best performance out of your Python program, you will need to dig into the specific performance of your code (profiling, above) and make systematic changes to the functions you call and the structures you use. Here are [a few straightforward tweaks (and some not-so-straightforward ones](http://wiki.scipy.org/PerformanceTips) which can help improve your performance.\n", "\n", "In certain cases, SciPy and NumPy handle the general case of an operation, which makes them relative slow. If you know more about your system, then you can do much better. Two examples: [improving convolution by _not_ using SciPy `convolve`](http://stackoverflow.com/questions/2196693/improving-numpy-performance); [improving performance by adjusting the NumPy buffer size manually](http://stackoverflow.com/questions/17128116/why-is-numpy-any-so-slow-over-large-arrays). In addition, through `scipy.linalg.blas`, you can directly access the standard Fortran [BLAS](http://www.netlib.org/blas/) routines when nothing else will cut it.\n", "\n", "In short: _avoid copying arrays_ (use views and slices), and _push time-critical code sections over to C/C++/Fortran_ if possible (via NumPy or by hand)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "References" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Langtangen, Hans Petter. _Python Scripting for Computational Science_, 3ed. Berlin\u2013Heidelberg: Springer\u2013Verlag, 2009.\n", "- Lugo, Michael. [On propagation of errors](http://gottwurfelt.com/2012/03/26/on-propagation-of-errors/). 26 March 2012.\n", "- Warren, Russell. [A Brief Intro to Profiling in Python](https://speakerdeck.com/rwarren/a-brief-intro-to-profiling-in-python). Ottawa Python Authors Group, 28 February 2013." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Neal Davis developed these materials for [Computational Science and Engineering](http://cse.illinois.edu/) at the University of Illinois at Urbana\u2013Champaign. This content is available under a Creative Commons Attribution 3.0 Unported License." ] } ], "metadata": {} } ] }