{ "metadata": { "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook was put together by [Jake Vanderplas](http://www.vanderplas.com) for UW's [Astro 599](http://www.astro.washington.edu/users/vanderplas/Astr599/) course. Source and license info is on [GitHub](https://github.com/jakevdp/2013_fall_ASTR599/)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# When Things Go Wrong:\n", "## Errors, Exceptions, and Debugging" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Today we'll cover perhaps one of the most important aspects of using Python: dealing with errors and bugs in code." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Three Classes of Errors\n", "\n", "Types of bugs/errors in code, from the easiest to the most difficult to diagnose:\n", "\n", "1. **Syntax Errors:** Errors where the code is not valid Python (generally easy to fix)\n", "2. **Runtime Errors:** Errors where syntactically valid code fails to execute (sometimes easy to fix)\n", "3. **Semantic Errors:** Errors in logic (often very difficult to fix)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Syntax Errors\n", "\n", "Syntax errors are when you write code which is not valid Python. 
For example:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "X = [1, 2, 3)" ], "language": "python", "metadata": {}, "outputs": [ { "ename": "SyntaxError", "evalue": "invalid syntax (, line 1)", "output_type": "pyerr", "traceback": [ "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m X = [1, 2, 3)\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n" ] } ], "prompt_number": 50 }, { "cell_type": "code", "collapsed": false, "input": [ "y = 4x + 3" ], "language": "python", "metadata": {}, "outputs": [ { "ename": "SyntaxError", "evalue": "invalid syntax (, line 1)", "output_type": "pyerr", "traceback": [ "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m y = 4x + 3\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n" ] } ], "prompt_number": 51 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that if your code contains even a *single* syntax error, none of it will run:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "a = 4\n", "something ==== is wrong" ], "language": "python", "metadata": {}, "outputs": [ { "ename": "SyntaxError", "evalue": "invalid syntax (, line 2)", "output_type": "pyerr", "traceback": [ "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m2\u001b[0m\n\u001b[0;31m something ==== is wrong\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n" ] } ], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "print a" ], "language": "python", "metadata": {}, "outputs": [ { "ename": "NameError", "evalue": "name 'a' is not defined", "output_type": "pyerr", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m 
in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;32mprint\u001b[0m \u001b[0ma\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mNameError\u001b[0m: name 'a' is not defined" ] } ], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Even though the syntax error appears below the (valid) variable definition, the valid code is not executed." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Runtime Errors\n", "Runtime errors occur when code that is **valid Python** fails within the context of the program's execution. For example:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print Q" ], "language": "python", "metadata": {}, "outputs": [ { "ename": "NameError", "evalue": "name 'Q' is not defined", "output_type": "pyerr", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;32mprint\u001b[0m \u001b[0mQ\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mNameError\u001b[0m: name 'Q' is not defined" ] } ], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "x = 1 + 'abc'" ], "language": "python", "metadata": {}, "outputs": [ { "ename": "TypeError", "evalue": "unsupported operand type(s) for +: 'int' and 'str'", "output_type": "pyerr", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mx\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m1\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0;34m'abc'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: 
unsupported operand type(s) for +: 'int' and 'str'" ] } ], "prompt_number": 4 }, { "cell_type": "code", "collapsed": false, "input": [ "X = 1 / 0" ], "language": "python", "metadata": {}, "outputs": [ { "ename": "ZeroDivisionError", "evalue": "integer division or modulo by zero", "output_type": "pyerr", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mZeroDivisionError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mX\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m1\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mZeroDivisionError\u001b[0m: integer division or modulo by zero" ] } ], "prompt_number": 5 }, { "cell_type": "code", "collapsed": false, "input": [ "import numpy as np\n", "np.add(1, 2, 3, 4)" ], "language": "python", "metadata": {}, "outputs": [ { "ename": "ValueError", "evalue": "invalid number of arguments", "output_type": "pyerr", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mnumpy\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0madd\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m3\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m4\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mValueError\u001b[0m: invalid number of arguments" ] } ], "prompt_number": 6 }, { "cell_type": "code", "collapsed": false, "input": [ "x = [1, 2, 3]\n", 
"print x[100]" ], "language": "python", "metadata": {}, "outputs": [ { "ename": "IndexError", "evalue": "list index out of range", "output_type": "pyerr", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mIndexError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mx\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m3\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0;32mprint\u001b[0m \u001b[0mx\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m100\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mIndexError\u001b[0m: list index out of range" ] } ], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Unlike Syntax errors, RunTime errors occur **during code execution**, which means that valid code occuring before the runtime error *will* execute:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "spam = \"my all-time favorite\"\n", "eggs = 1 / 0" ], "language": "python", "metadata": {}, "outputs": [ { "ename": "ZeroDivisionError", "evalue": "integer division or modulo by zero", "output_type": "pyerr", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mZeroDivisionError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mspam\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m\"my all-time favorite\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0meggs\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m1\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 
"\u001b[0;31mZeroDivisionError\u001b[0m: integer division or modulo by zero" ] } ], "prompt_number": 8 }, { "cell_type": "code", "collapsed": false, "input": [ "print spam" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "my all-time favorite\n" ] } ], "prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Semantic Errors\n", "Semantic errors are perhaps the most insidious errors, and are by far the ones that will take most of your time. Semantic errors occur when the code is **syntactically correct**, but produces the wrong result.\n", "\n", "By way of example, imagine you want to write a simple script to approximate the value of $\\pi$ according to the following formula:\n", "\n", "$$\n", "\\pi = \\sqrt{12} \\sum_{k = 0}^{\\infty} \\frac{(-3)^{-k}}{2k + 1}\n", "$$\n", "\n", "You might write a function something like this, using numpy's vectorized syntax:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from math import sqrt\n", "\n", "def approx_pi(nterms=100):\n", " k = np.arange(nterms)\n", " return sqrt(12) * np.sum((-3.0) ** (-k) / (2.0 * k + 1.0))" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 19 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looks OK, yes? Let's try it out:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print sqrt(12)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "3.46410161514\n" ] } ], "prompt_number": 21 }, { "cell_type": "code", "collapsed": false, "input": [ "print approx_pi(100)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "3.14159265359\n" ] } ], "prompt_number": 20 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Huh. That doesn't look like $\\pi$. Maybe we need more terms?" 
] }, { "cell_type": "code", "collapsed": false, "input": [ "print approx_pi(1000)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Nope... it looks like the algorithm simply gives the wrong result. This is a classic example of a semantic error.\n", "\n", "**Question: can you spot the problem?**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Runtime Errors and Exception Handling\n", "Now we'll talk about how to handle runtime errors (we skip syntax errors because they're pretty self-explanatory).\n", "\n", "Runtime errors can be handled through \"exception catching\" using ``try...except`` statements. Here's a basic example:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "try:\n", "    print \"this block gets executed first\"\n", "except:\n", "    print \"this block gets executed if there's an error\"" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "try:\n", "    print \"this block gets executed first\"\n", "    x = 1 / 0  # ZeroDivisionError\n", "    print \"we never get here\"\n", "except:\n", "    print \"this block gets executed if there's an error\"" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that the first block executes **up until the point** of the runtime error.\n", "Once the error is hit, the ``except`` block is executed.\n", "\n", "One important note: the above clause catches **any and all** exceptions. It is not\n", "generally a good idea to use a catch-all. It is better to name the precise exception you expect:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def safe_divide(a, b):\n", "    try:\n", "        return a / b\n", "    except:\n", "        print \"oops, dividing by zero. 
Returning None.\"\n", "        return None\n", "\n", "print safe_divide(15, 3)\n", "print safe_divide(1, 0)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But there's a problem here: this is a **catch-all** handler, and it will sometimes give us misleading information. For example:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print safe_divide(15, 'three')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our program tells us we're dividing by zero, but we aren't! This is one reason you should **almost never** use a catch-all ``try...except`` statement, but instead specify the errors you're trying to catch:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def better_safe_divide(a, b):\n", "    try:\n", "        return a / b\n", "    except ZeroDivisionError:\n", "        print \"oops, dividing by zero. Returning None.\"\n", "        return None\n", "\n", "print better_safe_divide(15, 0)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "print better_safe_divide(15, 'three')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This also allows you to specify different behaviors for different exceptions:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def even_better_safe_divide(a, b):\n", "    try:\n", "        return a / b\n", "    except ZeroDivisionError:\n", "        print \"oops, dividing by zero. Returning None.\"\n", "        return None\n", "    except TypeError:\n", "        print \"incompatible types. 
Returning None\"\n", "        return None\n", "\n", "print even_better_safe_divide(15, 3)\n", "print even_better_safe_divide(15, 0)\n", "print even_better_safe_divide(15, 'three')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Remember this lesson, and **always specify the exceptions in your except statements!** I once spent an entire day tracking down a bug in my code which amounted to this." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Raising Your Own Exceptions\n", "\n", "When you write your own code, it's good practice to use the ``raise`` keyword to create your own exceptions\n", "when the situation calls for it:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import os  # the \"os\" module has useful operating system stuff\n", "\n", "def read_file(filename):\n", "    if not os.path.exists(filename):\n", "        raise ValueError(\"'{0}' does not exist\".format(filename))\n", "    f = open(filename)\n", "    result = f.read()\n", "    f.close()\n", "    return result" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "%%file tmp.txt\n", "this is the contents of the file" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "read_file('tmp.txt')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "read_file('file.which.does.not.exist')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is sometimes useful to define your own custom exceptions, which you can do easily via class inheritance:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "class NonExistentFile(RuntimeError):\n", "    # you can customize exception behavior by defining class methods.\n", "    # we won't discuss that here.\n", "    pass\n", "\n", "\n", "def read_file(filename):\n", "    if not 
os.path.exists(filename):\n", "        raise NonExistentFile(filename)\n", "    f = open(filename)\n", "    result = f.read()\n", "    f.close()\n", "    return result" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "print read_file('tmp.txt')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "print read_file('file.which.does.not.exist')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Get used to throwing appropriate and meaningful exceptions in your code!** It makes reading and debugging your code much, much easier." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### More Advanced Exception Handling\n", "\n", "There is also the possibility of adding ``else`` and ``finally`` clauses to your try statements.\n", "The behavior looks like this:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "try:\n", "    print \"doing something\"\n", "except:\n", "    print \"this only happens if it fails\"\n", "else:\n", "    print \"this only happens if it succeeds\"" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "try:\n", "    raise ValueError()\n", "except:\n", "    print \"this only happens if it fails\"\n", "else:\n", "    print \"this only happens if it succeeds\"" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Why would you ever want to do this?\n", "Mainly, it prevents exceptions raised within the ``else`` block from being caught by the ``except`` clause.\n", "Accidentally catching an exception you don't mean to catch can lead to confusing results." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "try:\n", "    print \"do something\"\n", "except:\n", "    print \"this only happens if it fails\"\n", "else:\n", "    raise ValueError()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The last clause you might use is the ``finally`` clause, which looks like this:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "try:\n", "    print \"do something\"\n", "except:\n", "    print \"this only happens if it fails\"\n", "else:\n", "    print \"this only happens if it succeeds\"\n", "finally:\n", "    print \"this happens no matter what.\"" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "try:\n", "    raise ValueError()\n", "except:\n", "    print \"this only happens if it fails\"\n", "else:\n", "    print \"this only happens if it succeeds\"\n", "finally:\n", "    print \"this happens no matter what.\"" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "``finally`` is generally used for some sort of cleanup (closing a file, etc.). It might seem a bit redundant, though. Why not write the following?" 
] }, { "cell_type": "code", "collapsed": false, "input": [ "try:\n", "    print \"do something\"\n", "except:\n", "    print \"this only happens if it fails\"\n", "else:\n", "    print \"this only happens if it succeeds\"\n", "print \"this happens no matter what.\"" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The main difference is when the clause is used within a function:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def divide(x, y):\n", "    try:\n", "        result = x / y\n", "    except ZeroDivisionError:\n", "        print \"division by zero!\"\n", "        return None\n", "    else:\n", "        print \"result is\", result\n", "        return result\n", "    finally:\n", "        print \"some sort of cleanup\"" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "print divide(15, 3)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "print divide(15, 0)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the ``finally`` clause is executed *no matter what*, even if the ``return`` statement has already executed!\n", "This makes it useful for cleanup tasks, such as closing an open file, restoring a state, or something along those lines." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Quick Exercise**\n", "\n", "Here is a function which takes a filename, opens the file, reads the result, closes the file, and returns the contents. 
It should look something like this:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def read_all_from_file(filename):\n", "    f = open(filename)\n", "    contents = f.read()\n", "    f.close()\n", "    return contents" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use the above concepts to improve this function, so that it does the following:\n", "\n", "- raises an informative custom error if the file doesn't exist\n", "- uses ``try``, ``except``, ``else``, and/or ``finally`` to safely return the file's contents and close the file if necessary\n", "- accepts a keyword ``safe`` which defaults to False. If the function is called with ``safe=True``, it returns an empty string if the file doesn't exist." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def read_all_from_file(filename):\n", "    # your code here" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "read_all_from_file('tmp.txt')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "read_all_from_file('file.which.does.not.exist')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "read_all_from_file('file.which.does.not.exist', safe=True)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Handling Semantic Errors: Debugging\n", "\n", "Here is the most difficult piece of this lecture: handling semantic errors. This is the situation where your program *runs*, but doesn't produce the correct result. These errors are commonly known as **bugs**, and the process of correcting the bugs is **debugging**.\n", "\n", "There are three main methods commonly used for debugging Python code. In order of increasing sophistication, they are:\n", "\n", "1. Inserting ``print`` statements\n", "2. 
Injecting an IPython interpreter\n", "3. Using a line-by-line debugger like ``pdb``" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The easiest method: print statements\n", "\n", "Say we're trying to compute the **entropy** of a set of probabilities. The\n", "form of the equation is\n", "$$\n", "H = -\\sum_i p_i \\log(p_i)\n", "$$\n", "We can write the function like this:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def entropy(p):\n", "    p = np.asarray(p)  # convert p to array if necessary\n", "    items = p * np.log(p)\n", "    return -np.sum(items)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Say these are our probabilities:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "p = np.arange(5.)\n", "p /= p.sum()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "print entropy(p)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We get ``nan``, which stands for \"Not a Number\". What's going on here?\n", "\n", "Often the first thing to try is to simply print things and see what's going on.\n", "Within the function, you can add some print statements in key places:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def entropy(p):\n", "    p = np.asarray(p)  # convert p to array if necessary\n", "    print p\n", "    items = p * np.log(p)\n", "    print items\n", "    return -np.sum(items)\n", "\n", "entropy(p)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By printing some of the intermediate items, we see the problem: ``0 * np.log(0)`` is resulting in a ``NaN``. 
Though mathematically it's true that $\\lim_{x\\to 0} [x\\log(x)] = 0$, the fact that we're performing the computation numerically means that we don't obtain this result.\n", "\n", "Often, inserting a few print statements can be enough to figure out what's going on." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Embedding an IPython instance\n", "\n", "You can go a step further by actually embedding an IPython instance in your code.\n", "This doesn't work from within the notebook, so we'll create a file and run it from\n", "the command line." ] }, { "cell_type": "code", "collapsed": false, "input": [ "%%file test_script.py\n", "import numpy as np\n", "\n", "def entropy(p):\n", "    p = np.asarray(p)  # convert p to array if necessary\n", "    items = p * np.log(p)\n", "    import IPython; IPython.embed()\n", "    return -np.sum(items)\n", "\n", "p = np.arange(5.)\n", "p /= p.sum()\n", "entropy(p)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now open a terminal and run this. You'll see that an IPython interpreter opens, and from there you can print ``p``, print ``items``, and do any manipulation you feel like doing. This can also be a nice way to debug a script." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using a Debugger\n", "\n", "Python comes with a built-in debugger called [pdb](http://docs.python.org/2/library/pdb.html). It allows you to step line-by-line through a computation and examine what's happening at each step. Note that this should probably be your last resort in tracking down a bug. I've probably used it a dozen times or so in five years of coding. But it can be a useful tool to have in your toolbelt.\n", "\n", "Outside of IPython, you can use it by inserting the line\n", "``` python\n", "import pdb; pdb.set_trace()\n", "```\n", "within your script. 
Note that this method **won't work well within an IPython session,** but will work if you use it via the command-line. For this reason, we'll create a simple file and switch to the terminal.\n", "\n", "First let's create a simple (but not well-written) script which attempts to add a few string-typed numbers." ] }, { "cell_type": "code", "collapsed": false, "input": [ "a = '123'\n", "b = '456'\n", "c = '789'\n", "total = a + b + c\n", "print total" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can immediately see the problem here: the variables are strings, so the total is a concatenation rather than an addition. It's easy to see in this short example, but if the variables were, say, read from a file within a longer script, this might be harder to track down.\n", "\n", "Let's see how we can use Python's debugger, **pdb** to trace what's happening in the program. Again, note that this **will not work within IPython** (there are other Python processes going on that confuse the debugger), so we will save the code as a file and run it from the terminal:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%%file pdb_test.py\n", "import pdb\n", "a = '123'\n", "pdb.set_trace()\n", "b = '456'\n", "c = '789'\n", "total = a + b + c\n", "print total" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now run it from the terminal, and you'll see something like this:\n", "\n", "```\n", "> b = '456'\n", "(Pdb) \n", "```\n", "\n", "This is the pdb prompt, where you can enter one of several commands. 
If you type ``h`` for \"help\", it will list the possible commands:\n", "\n", "```\n", "(Pdb) h\n", "Documented commands (type help <topic>):\n", "========================================\n", "EOF    bt         cont      enable  jump  pp       run      unt\n", "a      c          continue  exit    l     q        s        until\n", "alias  cl         d         h       list  quit     step     up\n", "args   clear      debug     help    n     r        tbreak   w\n", "b      commands   disable   ignore  next  restart  u        whatis\n", "break  condition  down      j       p     return   unalias  where\n", "\n", "Miscellaneous help topics:\n", "==========================\n", "exec  pdb\n", "\n", "Undocumented commands:\n", "======================\n", "retval  rv\n", "```\n", "\n", "Type ``h`` followed by a command to see the documentation of that command:\n", "\n", "```\n", "(Pdb) h n\n", "n(ext)\n", "Continue execution until the next line in the current function\n", "is reached or it returns.\n", "```\n", "\n", "The most useful are probably the following:\n", "\n", "- ``q``(uit): quit the debugger and the program.\n", "- ``c``(ontinue): quit the debugger, continue in the program.\n", "- ``n``(ext): go to the next step of the program.\n", "- ``list``: show the current location in the file.\n", "- ``<enter>``: repeat the previous command.\n", "- ``p``(rint): print variables.\n", "- ``s``(tep into): step into a subroutine.\n", "- ``r``(eturn out): return out of a subroutine.\n", "- Arbitrary Python code: writing Python code at the ``(Pdb)`` prompt will execute it at that point in the program.\n", "\n", "Take a few moments to try these commands out in the following code, which reads some numbers from a file. 
Here, instead of going to the terminal, we'll use IPython's ``%run`` magic with the ``-d`` option to automatically start ``pdb`` from the beginning. Note that this will only work in IPython version 1.0 or greater; if you're using an older IPython version, you can run it from the command line with\n", "\n", "```\n", "python -m pdb pdb_test2.py\n", "```\n", "\n", "Using ``%run -d`` actually enters ``ipdb``, which is IPython's wrapper of the standard Python debugger." ] }, { "cell_type": "code", "collapsed": false, "input": [ "%%file numbers.dat\n", "123 456 789" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "%%file pdb_test2.py\n", "# File to experiment with Python debugger\n", "\n", "def add_lines(filename):\n", "    f = open(filename)\n", "    lines = f.read().split()\n", "    f.close()\n", "    result = lines[0]\n", "    for line in lines[1:]:\n", "        result += line\n", "    return result\n", "\n", "filename = 'numbers.dat'\n", "total = add_lines(filename)\n", "print total" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "%run -d pdb_test2.py" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Advanced debugging" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When you write more advanced code (especially if you dig into C or Fortran extensions of Python), you might run into more serious errors like segmentation faults, core dumps, and memory leaks. For these, you'll need more advanced tools like [gdb](https://wiki.python.org/moin/DebuggingWithGdb) and [valgrind](http://valgrind.org/). If you ever get to the point of needing these, there is a lot of specific info floating around on the web. Here I'll just leave off by letting you know that they exist." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Homework\n", "Here is a script taken from [scipy lectures](http://scipy-lectures.github.io/advanced/debugging/).\n", "\n", "It is meant to compare the performance of several root-finding algorithms within the ``scipy.optimize``\n", "package, but it breaks. Use one or more of the above tools to figure out what's going on and to fix\n", "the script.\n", "\n", "When you turn in this homework (via github pull request, of course), please **write a one to two paragraph summary**\n", "of the process you used to debug this, including any dead ends (it may help to take notes as you go)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "\"\"\"\n", "A script to compare different root-finding algorithms.\n", "\n", "This version of the script is buggy and does not execute. It is your task\n", "to find and fix these bugs.\n", "\n", "The output of the script should look like:\n", "\n", "    Benching 1D root-finder optimizers from scipy.optimize:\n", "                  brenth:   604678 total function calls\n", "                  brentq:   594454 total function calls\n", "                  ridder:   778394 total function calls\n", "                  bisect:  2148380 total function calls\n", "\"\"\"\n", "from itertools import product\n", "\n", "import numpy as np\n", "from scipy import optimize\n", "\n", "FUNCTIONS = (np.tan,  # Dilating map\n", "             np.tanh, # Contracting map\n", "             lambda x: x**3 + 1e-4*x, # Almost null gradient at the root\n", "             lambda x: x+np.sin(2*x), # Non-monotonic function\n", "             lambda x: 1.1*x+np.sin(4*x), # Function with several local maxima\n", "            )\n", "\n", "OPTIMIZERS = (optimize.brenth, optimize.brentq, optimize.ridder,\n", "              optimize.bisect)\n", "\n", "\n", "def apply_optimizer(optimizer, func, a, b):\n", "    \"\"\" Return the number of function calls given a root-finding optimizer,\n", "        a function and upper and lower bounds.\n", "    \"\"\"\n", "    return optimizer(func, a, b, full_output=True)[1].function_calls,\n", "\n", "\n", "def bench_optimizer(optimizer, 
param_grid):\n", "    \"\"\" Find roots for all the functions, and upper and lower bounds\n", "        given and return the total number of function calls.\n", "    \"\"\"\n", "    return sum(apply_optimizer(optimizer, func, a, b)\n", "               for func, a, b in param_grid)\n", "\n", "\n", "def compare_optimizers(optimizers):\n", "    \"\"\" Compare all the optimizers given on a grid of a few different\n", "        functions all admitting a single root in zero and an upper and\n", "        lower bounds.\n", "    \"\"\"\n", "    random_a = -1.3 + np.random.random(size=100)\n", "    random_b = .3 + np.random.random(size=100)\n", "    param_grid = product(FUNCTIONS, random_a, random_b)\n", "    print \"Benching 1D root-finder optimizers from scipy.optimize:\"\n", "    for optimizer in OPTIMIZERS:\n", "        print '% 20s: % 8i total function calls' % (\n", "            optimizer.__name__,\n", "            bench_optimizer(optimizer, param_grid)\n", "        )\n", "\n", "\n", "if __name__ == '__main__':\n", "    compare_optimizers(OPTIMIZERS)" ], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }