{ "nbformat_minor": 0, "cells": [ { "source": [ "# Introduction to Python 2" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "## Creating Functions" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "
\n", "
\n", "

Learning Objectives:

\n", "
\n", "\n", "- Define a function that takes parameters.\n", "- Return a value from a function.\n", "- Test and debug a function.\n", "- Set default values for function parameters.\n", "- Explain why we should divide programs into small, single-purpose functions." ], "cell_type": "markdown", "metadata": {} }, { "source": [ "At this point, we\u2019ve written code to draw some interesting features in our inflammation data, loop over all our data files to quickly draw these plots for each of them, and have Python make decisions based on what it sees in our data. But, our code is getting pretty long and complicated; what if we had thousands of datasets, and didn\u2019t want to generate a figure for every single one? Commenting out the figure-drawing code is a nuisance. Also, what if we want to use that code again, on a different dataset or at a different point in our program? Cutting and pasting it is going to make our code get very long and very repetitive, very quickly. We\u2019d like a way to package our code so that it is easier to reuse, and Python provides for this by letting us define things called \u2018functions\u2019 - a shorthand way of re-executing longer pieces of code.\n", "\n", "Let\u2019s start by defining a function `kelvin_to_celsius` that converts temperatures from Kelvin to Celsius:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Let's get our import statements out of the way first\n" ] }, { "source": [ "The function definition opens with the word `def`, which is followed by the name of the function and a parenthesized list of parameter names. The body of the function \u2014 the statements that are executed when it runs \u2014 is indented below the definition line, typically by four spaces.\n", "\n", "When we call the function, the values we pass to it are assigned to those variables so that we can use them inside the function. 
Inside the function, we use a [return statement](http://swcarpentry.github.io/python-novice-inflammation/reference.html#return-statement) to send a result back to whoever asked for it.\n", "\n", "Let\u2019s try running our function. Calling our own function is no different from calling any other function:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "source": [ "We\u2019ve successfully called the function that we defined, and we have access to the value that we returned." ], "cell_type": "markdown", "metadata": {} }, { "source": [ "### Integer division" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "We are using Python 3 division, which always returns a floating point number:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "source": [ "Unfortunately, this wasn\u2019t the case in Python 2:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "source": [ "If you are using Python 2 and want to keep the fractional part of division you need to convert one or the other number to floating point:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "source": [ "And if you want an integer result from division in Python 3, use a double-slash:" ], "cell_type": "markdown", 
"metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "source": [ "### Composing Functions" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "Now that we\u2019ve seen how to turn Kelvin into Celsius, let's try converting Celsius to Fahrenheit:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "source": [ "What about converting Kelvin to Fahrenheit? We could write out the formula, but we don\u2019t need to. Instead, we can compose the two functions we have already created:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "source": [ "This is our first taste of how larger programs are built: we define basic operations, then combine them in ever-larger chunks to get the effect we want. Real-life functions will usually be larger than the ones shown here \u2014 typically half a dozen to a few dozen lines \u2014 but they shouldn\u2019t ever be much longer than that, or the next person who reads them won\u2019t be able to understand what\u2019s going on." ], "cell_type": "markdown", "metadata": {} }, { "source": [ "### Tidying up" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "Now that we know how to wrap bits of code up in functions, we can make our inflammation analysis easier to read and easier to reuse. 
First, let\u2019s make an `analyse` function that generates our plots:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "source": [ "and another function called `detect_problems` that checks for those systematics we noticed:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "source": [ "Notice that rather than jumbling this code together in one giant `for` loop, we can now read and reuse both ideas separately. We can reproduce the previous analysis with a much simpler `for` loop:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [ "# First redefine our list of filenames from the last lesson\n" ] }, { "source": [ "By giving our functions human-readable names, we can more easily read and understand what is happening in the `for` loop. Even better, if at some later date we want to use either of those pieces of code again, we can do so in a single line." ], "cell_type": "markdown", "metadata": {} }, { "source": [ "### Testing and Documenting" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "Once we start putting things in functions so that we can re-use them, we need to start testing that those functions are working correctly. To see how to do this, let\u2019s write a function to center a dataset around a particular value:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "source": [ "We could test this on our actual data, but since we don\u2019t know what the values ought to be, it will be hard to tell if the result was correct. 
Instead, let\u2019s use NumPy to create a matrix of 0\u2019s and then center that around 3:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "source": [ "That looks right, so let\u2019s try `center` on our real data:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "source": [ "It\u2019s hard to tell from the default output whether the result is correct, but there are a few simple tests that will reassure us:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "source": [ "That seems almost right: the original mean was about 6.1, so the lower bound from zero is now about -6.1. The mean of the centered data isn\u2019t quite zero \u2014 we\u2019ll explore why not in the challenges \u2014 but it\u2019s pretty close. We can even go further and check that the standard deviation hasn\u2019t changed:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "source": [ "Those values look the same, but we probably wouldn\u2019t notice if they were different in the sixth decimal place. Let\u2019s do this instead:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "source": [ "Again, the difference is very small. It\u2019s still possible that our function is wrong, but it seems unlikely enough that we should probably get back to doing our analysis. 
We have one more task first, though: we should write some [documentation](http://swcarpentry.github.io/python-novice-inflammation/reference.html#documentation) for our function to remind ourselves later what it\u2019s for and how to use it.\n", "\n", "The usual way to put documentation in software is to add [comments](http://swcarpentry.github.io/python-novice-inflammation/reference.html#comment) like this:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": true }, "outputs": [], "source": [ "# center(data, desired): return a new array containing the original data centered around the desired value.\n" ] }, { "source": [ "There\u2019s a better way, though. If the first thing in a function is a string that isn\u2019t assigned to a variable, that string is attached to the function as its documentation:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "source": [ "This is better because we can now ask Python\u2019s built-in help system to show us the documentation for the function:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "source": [ "A string like this is called a [docstring](http://swcarpentry.github.io/python-novice-inflammation/reference.html#docstring). 
We don\u2019t need to use triple quotes when we write one, but if we do, we can break the string across multiple lines:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "source": [ "### Defining Defaults" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "We have passed parameters to functions in two ways: directly, as in `type(data)`, and by name, as in `numpy.loadtxt(fname='something.csv', delimiter=',')`. In fact, we can pass the filename to `loadtxt` without the `fname=`:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "source": [ "but we still need to say `delimiter=`:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "source": [ "To understand what\u2019s going on, and make our own functions easier to use, let\u2019s re-define our center function like this:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "source": [ "The key change is that the second parameter is now written `desired=0.0` instead of just `desired`. 
If we call the function with two arguments, it works as it did before:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "source": [ "But we can also now call it with just one parameter, in which case `desired` is automatically assigned the [default value](http://swcarpentry.github.io/python-novice-inflammation/reference.html#default-value) of 0.0:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "source": [ "This is handy: if we usually want a function to work one way, but occasionally need it to do something else, we can allow people to pass a parameter when they need to but provide a default to make the normal case easier. The example below shows how Python matches values to parameters:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "source": [ "As this example shows, parameters are matched up from left to right, and any that haven\u2019t been given a value explicitly get their default value. We can override this behavior by naming the value as we pass it in:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "source": [ "With that in hand, let\u2019s look at the help for numpy.loadtxt:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "source": [ "There\u2019s a lot of information here, but the most important part is the first couple of lines:" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "
loadtxt(fname, dtype=float, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None,\n",
    "        unpack=False, ndmin=0)
" ], "cell_type": "markdown", "metadata": { "collapsed": false } }, { "source": [ "This tells us that loadtxt has one parameter called fname that doesn\u2019t have a default value, and eight others that do. If we call the function like this:" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "source": [ "then the filename is assigned to `fname` (which is what we want), but the delimiter string `','` is assigned to `dtype` rather than `delimiter`, because `dtype` is the second parameter in the list. However ',' isn\u2019t a known `dtype` so our code produced an error message when we tried to run it. When we call `loadtxt` we don\u2019t have to provide `fname=` for the filename because it\u2019s the first item in the list, but if we want the ',' to be assigned to the variable `delimiter`, we _do_ have to provide `delimiter=` for the second parameter since `delimiter` is not the second parameter in the list." ], "cell_type": "markdown", "metadata": {} }, { "source": [ "
\n", "
\n", "

Combining strings

\n", "
\n", "\n", "
\n", "

\u201cAdding\u201d two strings produces their concatenation: 'a' + 'b' is 'ab'. Write a function called fence that takes two parameters called original and wrapper and returns a new string that has the wrapper character at the beginning and end of the original. A call to your function should look like this:

\n", "
print(fence('name', '*'))
\n", "
*name*
\n", "
\n", "
" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "source": [ "
\n", "
\n", "

Selecting characters from strings

\n", "
\n", "
\n", "

If the variable s refers to a string, then s[0] is the string\u2019s first character and s[-1] is its last. Write a function called outer that returns a string made up of just the first and last characters of its input. A call to your function should look like this:

\n", "
print(outer('helium'))
\n", "
hm
\n", "
\n", "
" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "
\n", "
\n", "

Rescaling an array

\n", "
\n", "\n", "Write a function `rescale` that takes an array as input and returns a corresponding array of values scaled to lie in the range 0.0 to 1.0. (Hint: If L and H are the lowest and highest values in the original array, then the replacement for a value v should be (v\u2005\u2212\u2005L)/(H\u2005\u2212\u2005L).)\n", "
" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "
\n", "
\n", "

Testing and documenting your function

\n", "
\n", "\n", "Run the commands `help(numpy.arange)` and `help(numpy.linspace)` to see how to use these functions to generate regularly-spaced values, then use those values to test your `rescale` function. Once you\u2019ve successfully tested your function, add a docstring that explains what it does.\n", "\n", "
" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "
\n", "
\n", "

Defining defaults

\n", "
\n", "\n", "Rewrite the `rescale` function so that it scales data to lie between 0.0 and 1.0 by default, but will allow the caller to specify lower and upper bounds if they want. Compare your implementation to your neighbor\u2019s: do the two functions always behave the same way?\n", "\n", "
" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "
\n", "
\n", "

Variables inside and outside functions

\n", "
\n", "
\n", "

What does the following piece of code display when run - and why?

\n", "
f = 0\n",
    "k = 0\n",
    "\n",
    "def f2k(f):\n",
    "  k = ((f-32)*(5.0/9.0)) + 273.15\n",
    "  return k\n",
    "\n",
    "f2k(8)\n",
    "f2k(41)\n",
    "f2k(32)\n",
    "\n",
    "print(k)
\n", "
\n", "
" ], "cell_type": "markdown", "metadata": {} } ], "metadata": { "kernelspec": { "display_name": "Python 2", "name": "python2", "language": "python" }, "language_info": { "mimetype": "text/x-python", "nbconvert_exporter": "python", "name": "python", "file_extension": ".py", "version": "2.7.11", "pygments_lexer": "ipython2", "codemirror_mode": { "version": 2, "name": "ipython" } } }, "nbformat": 4 }