{ "nbformat_minor": 0, "cells": [ { "source": [ "# Introduction to Python 2" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "## Defensive Programming" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "
\n", "
\n", "

Learning Objectives

\n", "
\n", "
\n", "\n", "
\n", "
" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "Our previous lessons have introduced the basic tools of programming: variables and lists, file I/O, loops, conditionals, and functions. What they haven\u2019t done is show us how to tell whether a program is getting the right answer, and how to tell if it\u2019s still getting the right answer as we make changes to it.\n", "\n", "To achieve that, we need to:\n", "\n", "- Write programs that check their own operation.\n", "- Write and run tests for widely-used functions.\n", "- Make sure we know what \u201ccorrect\u201d actually means.\n", "\n", "The good news is, doing these things will speed up our programming, not slow it down. As in real carpentry \u2014 the kind done with lumber \u2014 the time saved by measuring carefully before cutting a piece of wood is much greater than the time that measuring takes." ], "cell_type": "markdown", "metadata": {} }, { "source": [ "### Assertions" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "The first step toward getting the right answers from our programs is to assume that mistakes _will_ happen and to guard against them. This is called [defensive programming](http://swcarpentry.github.io/python-novice-inflammation/reference.html#defensive-programming), and the most common way to do it is to add [assertions](http://swcarpentry.github.io/python-novice-inflammation/reference.html#assertion) to our code so that it checks itself as it runs. An assertion is simply a statement that something must be true at a certain point in a program. When Python sees one, it evaluates the assertion\u2019s condition. If it\u2019s true, Python does nothing, but if it\u2019s false, Python halts the program immediately and prints the error message if one is provided. For example, this piece of code halts as soon as the loop encounters a value that isn\u2019t positive:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Usual import first\n" ] }, { "source": [ "Programs like the Firefox browser are full of assertions: 10-20% of the code they contain are there to check that the other 80-90% are working correctly. Broadly speaking, assertions fall into three categories:\n", "\n", "- A [precondition](http://swcarpentry.github.io/python-novice-inflammation/reference.html#precondition) is something that must be true at the start of a function in order for it to work correctly.\n", "- A [postcondition](http://swcarpentry.github.io/python-novice-inflammation/reference.html#postcondition) is something that the function guarantees is true when it finishes.\n", "- An [invariant](http://swcarpentry.github.io/python-novice-inflammation/reference.html#invariant) is something that is always true at a particular point inside a piece of code.\n", "\n", "For example, suppose we are representing rectangles using a [tuple](http://swcarpentry.github.io/python-novice-inflammation/reference.html#tuple) of four coordinates `(x0, y0, x1, y1)`, representing the lower left and upper right corners of the rectangle. In order to do some calculations, we need to normalize the rectangle so that the lower left corner is at the origin and the longest side is 1.0 units long. This function does that, but checks that its input is correctly formatted and that its result makes sense:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "source": [ "The preconditions on lines 2, 4, and 5 catch invalid inputs:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "source": [ "The post-conditions help us catch bugs by telling us when our calculations cannot have been correct. For example, if we normalize a rectangle that is taller than it is wide everything seems OK:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "source": [ "but if we normalize one that\u2019s wider than it is tall, the assertion is triggered:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "source": [ "Re-reading our function, we realize that line 10 should divide `dy` by `dx` rather than `dx` by `dy`. (You can display line numbers by typing Ctrl-M, then L.) If we had left out the assertion at the end of the function, we would have created and returned something that had the right shape as a valid answer, but wasn\u2019t. Detecting and debugging that would almost certainly have taken more time in the long run than writing the assertion.\n", "\n", "But assertions aren\u2019t just about catching errors: they also help people understand programs. Each assertion gives the person reading the program a chance to check (consciously or otherwise) that their understanding matches what the code is doing.\n", "\n", "Most good programmers follow two rules when adding assertions to their code. The first is, _fail early, fail often_. The greater the distance between when and where an error occurs and when it\u2019s noticed, the harder the error will be to debug, so good code catches mistakes as early as possible.\n", "\n", "The second rule is, _turn bugs into assertions or tests_. Whenever you fix a bug, write an assertion that catches the mistake should you make it again. If you made a mistake in a piece of code, the odds are good that you have made other mistakes nearby, or will make the same mistake (or a related one) the next time you change it. Writing assertions to check that you haven\u2019t [regressed](http://swcarpentry.github.io/python-novice-inflammation/reference.html#regression) (i.e., haven\u2019t re-introduced an old problem) can save a lot of time in the long run, and helps to warn people who are reading the code (including your future self) that this bit is tricky." ], "cell_type": "markdown", "metadata": {} }, { "source": [ "### Test-Driven Development" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "An assertion checks that something is true at a particular point in the program. The next step is to check the overall behavior of a piece of code, i.e., to make sure that it produces the right output when it\u2019s given a particular input. For example, suppose we need to find where two or more time series overlap. The range of each time series is represented as a pair of numbers, which are the time the interval started and ended. The output is the largest range that they all include:\n", "\n", "![Overlapping Ranges](fig/python-overlapping-ranges.svg)\n", "\n", "Most novice programmers would solve this problem like this:\n", "\n", "1. Write a function `range_overlap`.\n", "2. Call it interactively on two or three different inputs.\n", "3. If it produces the wrong answer, fix the function and re-run that test.\n", "\n", "This clearly works \u2014 after all, thousands of scientists are doing it right now \u2014 but there\u2019s a better way:\n", "\n", "1. Write a short function for each test.\n", "2. Write a `range_overlap` function that should pass those tests.\n", "3. If `range_overlap` produces any wrong answers, fix it and re-run the test functions.\n", "\n", "Writing the tests before writing the function they exercise is called [test-driven development](http://swcarpentry.github.io/python-novice-inflammation/reference.html#test-driven-development) (TDD). Its advocates believe it produces better code faster because:\n", "\n", "1. If people write tests after writing the thing to be tested, they are subject to confirmation bias, i.e., they subconsciously write tests to show that their code is correct, rather than to find errors.\n", "2. Writing tests helps programmers figure out what the function is actually supposed to do.\n", "\n", "Here are three test functions for `range_overlap`:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "source": [ "The error is actually reassuring: we haven\u2019t written `range_overlap` yet, so if the tests passed, it would be a sign that someone else had and that we were accidentally using their function.\n", "\n", "And as a bonus of writing these tests, we\u2019ve implicitly defined what our input and output look like: we expect a list of pairs as input, and produce a single pair as output.\n", "\n", "Something important is missing, though. We don\u2019t have any tests for the case where the ranges don\u2019t overlap at all:" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "assert range_overlap([ (0.0, 1.0), (5.0, 6.0) ]) == ???" ], "cell_type": "raw", "metadata": {} }, { "source": [ "What should `range_overlap` do in this case: fail with an error message, produce a special value like `(0.0, 0.0)` to signal that there\u2019s no overlap, or something else? Any actual implementation of the function will do one of these things; writing the tests first helps us figure out which is best _before_ we\u2019re emotionally invested in whatever we happened to write before we realized there was an issue.\n", "\n", "And what about this case?" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "assert range_overlap([ (0.0, 1.0), (1.0, 2.0) ]) == ???" ], "cell_type": "raw", "metadata": {} }, { "source": [ "Do two segments that touch at their endpoints overlap or not? Mathematicians usually say \"yes\", but engineers usually say \"no\". The best answer is \"whatever is most useful in the rest of our program\", but again, any actual implementation of `range_overlap` is going to do _something_, and whatever it is ought to be consistent with what it does when there\u2019s no overlap at all.\n", "\n", "Since we\u2019re planning to use the range this function returns as the X axis in a time series chart, we decide that:\n", "\n", "1. every overlap has to have non-zero width, and\n", "2. we will return the special value None when there\u2019s no overlap.\n", "\n", "`None` is built into Python, and means \u201cnothing here\u201d. (Other languages often call the equivalent value `null` or `nil`). With that decision made, we can finish writing our last two tests:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "source": [ "Again, we get an error because we haven\u2019t written our function, but we\u2019re now ready to do so:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "source": [ "(Take a moment to think about why we use `max` to raise `lowest` and `min` to lower `highest`). We\u2019d now like to re-run our tests, but they\u2019re scattered across three different cells. To make running them easier, let\u2019s put them all in a function:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "source": [ "We can now test `range_overlap` with a single function call:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "source": [ "The first test that was supposed to produce `None` fails, so we know something is wrong with our function. We _don\u2019t_ know whether the other tests passed or failed because Python halted the program as soon as it spotted the first error. Still, some information is better than none, and if we trace the behavior of the function with that input, we realize that we\u2019re initializing `lowest` and `highest` to 0.0 and 1.0 respectively, regardless of the input values. This violates another important rule of programming: _always initialize from data_." ], "cell_type": "markdown", "metadata": {} }, { "source": [ "
\n", "
\n", "

Pre- and post-conditions

\n", "
\n", "
\n", "

Suppose you are writing a function called average that calculates the average of the numbers in a list. What pre-conditions and post-conditions would you write for it? Compare your answer to your neighbor\u2019s: can you think of a function that will pass your tests but not hers or vice versa?

\n", "
\n", "
" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "
\n", "
\n", "

Testing assertions

\n", "
\n", "
\n", "

Given a sequence of values, the function running returns a list containing the running totals at each index.

\n", "
running([1, 2, 3, 4])
\n", "
[1, 3, 6, 10]
\n", "
running('abc')
\n", "
['a', 'ab', 'abc']
\n", "

Explain in words what the assertions in this function check, and for each one, give an example of input that will make that assertion fail.

\n", "
def running(values):\n",
    "    assert len(values) > 0\n",
    "    result = [values[0]]\n",
    "    for v in values[1:]:\n",
    "        assert result[-1] >= 0\n",
    "        result.append(result[-1] + v)\n",
    "        assert result[-1] >= result[0]\n",
    "    return result
\n", "
\n", "
" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [ "# assert len(values) > 0 tests that you haven't passed an empty sequence in\n", "# assert result[-1] >= 0 checks that the last item in the results list is greater than or equal to 0\n", "# assert result[-1] >= result[0] checks that the last item in the results list is greater than or equal to the first item\n" ] }, { "source": [ "
\n", "
\n", "

Fixing and testing

\n", "
\n", "
\n", "

Fix range_overlap. Re-run test_range_overlap after each change you make.

\n", "
\n", "
" ], "cell_type": "markdown", "metadata": {} } ], "metadata": { "kernelspec": { "display_name": "Python 2", "name": "python2", "language": "python" }, "language_info": { "mimetype": "text/x-python", "nbconvert_exporter": "python", "name": "python", "file_extension": ".py", "version": "2.7.11", "pygments_lexer": "ipython2", "codemirror_mode": { "version": 2, "name": "ipython" } } }, "nbformat": 4 }