{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# HIDDEN\n", "\n", "from datascience import *\n", "%matplotlib inline\n", "import matplotlib.pyplot as plots\n", "plots.style.use('fivethirtyeight')\n", "import numpy as np\n", "import math" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 0. Introduction\n", "In this note, we'll go over the structure of Python code in a bit more detail than we have before. When you've absorbed this material, you should be able to read Python code and decompose it into simple, understandable parts. This note should be particularly useful if you've seen a lot of Python code, but you have a hard time interpreting complicated-looking code like `table['foo'] = np.array([1,2,3]) + table['bar']`.\n", "\n", "Decomposing Python into small parts is kind of like diagramming an English sentence. While our brains are perfectly capable of generating and understanding English without explicitly identifying things like subjects and predicates, Python interprets code very literally according to its rules (its *syntax*). So if you want to understand Python code, it's more important to have a precise model of Python's rules in your head. On the flip side, Python's rules are much simpler than those of English (see, for example, [this amusingly complicated English sentence](https://en.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffalo_buffalo_buffalo_Buffalo_buffalo)). They just seem complicated because we're less familiar with them. That makes it possible to learn Python much faster than you learned English.\n", "\n", "Note: Everything in this note is also available, with even more pedantic precision, at the [official Python language reference](https://docs.python.org/3/reference/index.html). This note is focused on the material in chapters 6, 7, and 8 of the reference. We will omit some details and fudge some truths in the interest of pedagogy. 
Once you feel like an expert in this stuff, feel free to brave the official documentation.\n", "\n", "### How to read this document\n", "This note contains a bunch of code cells, in addition to text. The code cells typically illustrate points from the text. Please run the code cells as you go through the note, and pay attention to what their output is. Recall that the thing that's printed when you run a cell is the value of the last line." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Code is a sequence of statements\n", "Below is a cell containing various Python code that might look familiar by now." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "3 # Line 1.0\n", "z = 3 # Line 1.1\n", "4+3 # Line 1.2\n", "y = 4+3 # Line 1.3\n", "(2+3)+z # Line 1.4\n", "\"foo\"+\"bar\" # Line 1.5\n", "[1,2,3] # Line 1.6\n", "x = [1,2,3] # Line 1.7\n", "sum(x) # Line 1.8\n", "x[2] # Line 1.9\n", "x[2] = 4 # Line 1.10\n", "t = Table() # Line 1.11\n", "t['Things'] = np.array([\"foo\", \"bar\", \"baz\"]) # Line 1.12\n", "t.sort('Things') # Line 1.13\n", "u = t.sort('Things') # Line 1.14\n", "u.relabel('Things', 'Nonsense') # Line 1.15\n", "u # Line 1.16" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(The `# Line X` comments are just there for labeling; don't consider them part of the lines. Similarly, other instances of `# some text here` that you see in this note are just for explanation.) Each line in the cell is a *statement*. A statement is a (somewhat) self-contained piece of code. Python executes statements in the order in which they appear. There are many kinds of statements, and to execute a statement, Python first has to figure out what kind of statement it is." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Expressions\n", "The most basic kind of statement is the *expression*.\n", "\n", "Line 0 above is just an expression: `3`. 
Like many (but not all) expressions, it has a value, the integer 3. Like some (but not all) expressions, computing its value causes nothing to \"happen\" to the world. (We say it has no *side effects*.) When Python executes line 0, it computes that value. Since nothing is done with it, it just gets discarded. The same is true of lines 2, 4, 5, 6, 8, 9, 13, and 16 -- those are expression statements that cause values to be computed, but the computation has no side effects, and the value of the full expression is eventually discarded. Line 15 is an expression that does have side effects -- it causes the `'Things'` column in the table named `u` to be renamed to `'Nonsense'`. The other lines are statements but not expressions, but we will see that, like many statements, they *contain* expressions.\n", "\n", "Expressions are themselves usually made up of several smaller expressions joined together by some rules; we call these *compound expressions*, and we sometimes call the component expressions *subexpressions*. Line 2, for example, is a compound expression made up of the subexpressions `4` and `3` joined by `+`. Python knows what a `+` between two expressions means, and it puts them together so that the value of `4+3` is the value of `4` plus the value of `3`, or 7.\n", "\n", "Line 4 is another compound expression. We can think of it as the subexpressions `(2+3)` and `z`, again joined by `+`. But `(2+3)` is itself a compound expression, made up of `2` and `3` joined by `+`. Python first computes the value of `(2+3)`, which is 5, and then computes the value of `z`, which is 3 (`z` having been assigned previously), and then adds 5 and 3 to get 8. `(2+3)*(4*((5*6)+7))` is also a valid expression. 
It contains 10 subexpressions (not including itself):\n", "* `2`\n", "* `3`\n", "* `(2+3)`\n", "* `4`\n", "* `5`\n", "* `6`\n", "* `(5*6)`\n", "* `7`\n", "* `((5*6)+7)`\n", "* `(4*((5*6)+7))`\n", "\n", "Compound expressions can be arbitrarily complicated compositions of expressions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question.** How many subexpressions are contained in the expression `((1+2)+(3+4))+((5+6)+(7+8))`?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's *critical* to recognize that subexpressions are valid expressions that could be written by themselves or made part of other compound expressions. If you see a complicated expression like the one above (or even more exotic ones later), and you don't understand what it does, you can always break it down into smaller bits until you get to very basic expressions. There is a fairly small list of basic expression types (things that can't be broken down into subexpressions) to learn.\n", "\n", "This note will tell you the rules about most of the basic expressions in Python, but in order to understand and write real code (which very regularly involves large compound expressions) you'll need to develop the skill of breaking down compound expressions into subexpressions. You can try to do that mentally while you're reading code, but if that's too hard, you can *just type them into a Python code cell* and see what they do.\n", "\n", "**Question.** What's the value of each subexpression you found above? You can just type them into the empty code cell below if you like." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### A note on errors\n", "Line 5 (`\"foo\"+\"bar\"`) is a compound expression adding two strings, with `\"foo\"` and `\"bar\"` as subexpressions. This is okay, since the `+` operator knows how to handle two strings. 
It produces the string \"foobar\" as its value.\n", "\n", "When the following cell is executed, however, there is an error. (Run the cell to confirm that.)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "\"foo\"+5 # Error!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When you see an error, don't just give up. Often (though unfortunately not always) the error message will tell you what's wrong. The error message first tells us that the problem happened on line 1 of the cell (in this case, the only line), and the text of the error is something like \"TypeError: Can't convert 'int' object to str implicitly\" (the exact wording depends on your version of Python). Python evaluates `\"foo\"` and `5` just fine, but when the `+` operator tries to apply itself to \"foo\" and 5, it becomes unhappy. The error refers to the fact that the `+` operator tries to convert its arguments to something it can add. For example, adding an integer and a float, like `3+4.5`, works because `+` converts the integer `3` to a float. But `+` can't convert a number to text (or vice-versa), so it gives up.\n", "\n", "The important thing to realize about that cell, for our purposes, is exactly where the error happens. In the next cell, for example, some work is done before an error happens:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "(\"foo\"+\"bar\")+5 # Error!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Python actually evaluates the subexpression `(\"foo\"+\"bar\")` successfully, producing the string \"foobar\", before again failing to add \"foobar\" and 5. The error occurs only when trying to add a string and a number, and not before.\n", "\n", "Now, let's go over the kinds of expressions that Python has." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Strings, ints, and floats\n", "The most basic kinds of expressions, which we've seen repeatedly above, are string, int, and float expressions. These just look like this:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "\"foo\" # a string expression, whose value is the string \"foo\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "'foo' # a string expression, essentially identical to the one above" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "'5' # a string expression, which happens to contain a single character called 5" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "5 # an int expression, whose value is the integer number 5" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "5.1 # a float expression, whose value is the decimal number 5.1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's important to recognize that string, int, and float expressions produce values of different types. A string is not an int, nor is it a float. You can see the type of anything by calling `type(thing)` (or print it out with `print(type(thing))`), as in `type(2)`, `type('foo')`, or" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "i_am_a_string = \"blah\"\n", "type(i_am_a_string)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Confusingly but conveniently, many functions built into Python will try to convert values of one type to another. `3+4.5` was one example we just saw -- in order to add `3` and `4.5`, Python first converts the integer `3` to the float `3.0`. 
`print(3)` is another -- in order to print anything so you can see its value, the `print` function first converts it to a string. So sometimes you can forget about the types of values. Other times, as in the error we saw before, you have to think about types.\n", "\n", "You can do conversions between these three types yourself with the `str()`, `int()`, and `float()` functions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Triple-quoted strings\n", "Here is a more exotic kind of string expression:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "\"\"\"blah\n", "\n", "...\n", "# looks like a comment but isn't\n", "last line\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result is just a string like `\"foo\"` above, with a few differences. Triple double-quotation marks denote the beginning and end of this string, and it can take up multiple lines, unlike an ordinary string expression.\n", "\n", "Frankly, this is an arcane detail of Python, but we bring it up because triple-quoted strings are often used for writing long-form comments in code, instead of `#` comments. This works even though the string is just an expression, not a special device for long comments. That's because an expression doesn't *do* anything by itself, except that the last expression in a Jupyter notebook cell gets printed. 
So you can sprinkle string expressions (or other expressions that have no side-effects) throughout your code (on their own lines) and no harm will come of it.\n", "\n", "The following (oddly and excessively) documented code shows this:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "\"\"\"The code in this cell produces\n", "pi rounded to 5 decimal digits.\"\"\"\n", "\"First, let's give a name to pi.\"\n", "my_name_for_pi = math.pi\n", "# Now, we round it to 5 decimal\n", "# digits.\n", "pi_rounded = round(my_name_for_pi, 5)\n", "\"Now make that the last expression in this cell.\"\n", "pi_rounded" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Names\n", "Names, also called *variables*, are just expressions like `x` or `my_name_for_pi` that refer to some actual values. When Python sees a name expression, it basically just substitutes the current value of that name for the name. We'll later talk about what kinds of statements assign names to values." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Lists\n", "Line 6 above, `[1,2,3]`, is another kind of compound expression, the *list literal*. Python knows that when square brackets (`[]`) appear by themselves with a comma-separated list of expressions inside them, we are asking for a list consisting of those expressions' values.\n", "\n", "Again, each expression in the list can be a compound expression. So it's okay to write something like:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "[\"foo\"+\"bar\", sum([1,2,3]), [4, 5, 6]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question.** Describe the value of the above list expression in English." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Calls\n", "Line 8, `sum(x)`, is also a compound expression, a *function call*. 
Python evaluates the subexpression `sum`, producing a *function* that adds members of lists, and the subexpression `x`, which was previously set to a list of integers. Then the parentheses `()` direct Python to *call* the function on the left of the parenthesis (the one named `sum`) on the value of `x`, producing the value 6. Note that it's possible to write things like `5(3)` or `nonexistent_function(0)`. Python will just complain that `5` is not a function (specifically, that it is not \"callable\") or that `nonexistent_function` hasn't been defined, respectively.\n", "\n", "The following line is similar to line 8, but the subexpression inside the parentheses, `x + [4]`, is itself a compound expression:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sum(x + [4])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(Recall that adding two lists with `+` makes a new list consisting of the two lists smashed together. So `x + [4]` above has value equal to `[1,2,4,4]`. `x` is equal to `[1,2,4]`, not `[1,2,3]` as it was defined on line 7, because on line 10 we set its last element to `4`.)\n", "\n", "We haven't seen how to define new functions yet, but here is one example to see how the expression before the `(` is just an expression (whose value must be a function):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "my_name_for_sum = sum\n", "my_name_for_sum(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Indexing\n", "Line 9, `x[2]`, is yet another compound expression. Python evaluates the subexpression `x`, producing a list, and the subexpression `2`. The square brackets `[]`, appearing immediately after an expression and with another expression inside them, tell Python to *index* into the value of the first expression using the value of the second expression. 
For this list as it's defined on line 9, this produces the value 3.\n", "\n", "Notice that the code string `[2]` can have two different meanings, depending on the code immediately around it. If there is an expression to the left, for example `x[2]`, then Python will take it to mean an indexing expression. If not, Python will think you mean a list with a single element, 2.\n", "\n", "Like parentheses, the things on either side of the square brackets can be compound expressions:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "x[2-1]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "(x + [13])[2+1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question.** In the last cell, there are 7 subexpressions, not counting the whole expression `(x + [13])[2+1]`. Can you identify all of them?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, note that different kinds of values support different kinds of indexing. A Table, for example, supports indexing by strings, producing a column:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "t['Things']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question.** To put together list indexing and function calls, try to figure out what the following code is doing. (Note that an expression like `sum` has a value, like any other name expression, and that value is a function. 
We can put function values into lists, just like other values.)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "some_functions_on_lists = [sum, len]\n", "(some_functions_on_lists[0])(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Dots and attributes\n", "Objects (just another name for a value, like 1, \"four score\", or a Table) often have things called properties, attributes, fields, or (in the case when the things are functions) methods. Let's call them attributes. Though in this class we won't see how to create new kinds of objects, we will use attributes all the time.\n", "\n", "We access attributes using a `.`. For example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "t.rows" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Generically, the thing on the left of the `.` must be an expression whose value is an object with the attribute we want. As with calling and indexing, it can be an arbitrarily complicated compound expression. The thing on the right of the dot is the name of the attribute. Unlike the arguments of a function or the index in an indexing expression, it is *not* an expression. It must be the name of an attribute that the object on the left has.\n", "\n", "As we said, sometimes an attribute is a function, in which case we sometimes call it a *method* instead. 
The syntax is the same as other attribute accesses:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "t.sort" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "t.sort('Things')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The only difference between a method and a normal function is that the object itself (`t` in this case) is automatically passed as the first argument to the method. So the `sort` function technically has two arguments -- the first is the table that `sort` is being called on, and the second is the column name. This is how `sort` knows which table to sort! Normally this is a really technical detail that you don't need to worry about, but it can come up when you accidentally pass the wrong number of arguments to a method:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "t.sort('This', 'is', 'too', 'many', 'arguments') # Error!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The error complains that we gave 6 arguments to `sort`, even though it looks like we passed only 5. The extra first argument is the table `t`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### A weird thing about dot syntax\n", "You might notice at some point that dots are used in two ways in Python: accessing attributes, and in expressions for floating-point numbers. For example, `x.y` is accessing the attribute named `y` in the value named `x`, while `1.2` is just an expression for the number 1.2. This is one reason why you can't have numbers at the start of names. It also means that the expression on the left of a `.` can't just be a number. 
For example, we can't access the attribute `real` of an integer this way (for this example, you don't need to know what `real` is doing, other than that it should just return the same value as the integer):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "1.real" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's because Python can't tell whether we're trying to write an (invalid) decimal number `1.real` or access the `real` attribute of the value `1`. Surrounding the `1` in parentheses makes it clear to Python:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "(1).real" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "### Is that all the expressions?\n", "No. We might see more as the class goes on. But these are most of the important ones, and you've seen most of the difficult ideas." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercises to put it all together\n", "\n", "**Question.** Many people, when they first encounter tables and try to use them to manipulate data, assume that Python allows more syntactic flexibility than it really does. Below are some examples of things we might *hope* would work, but don't. For each one, describe what it actually does, what its author was probably trying to do, what went wrong, and how to fix it." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# No error here, just setup for the next cells. 
Run this cell to see the table we're working with.\n", "my_table = Table([[1, 2, 3, 4], [9, 2, 3, 1]], ['x', 'y'], )\n", "my_table" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "my_table['x + y']" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "my_table['x' + 'y']" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "my_table['x'] + ['y']" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "my_table.where('x' >= 3)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "my_table.where(['x'] >= 3)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "my_table.sort('y')\n", "row_with_smallest_y = my_table.rows[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Assignments" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we had only expressions, it would be difficult to put together many steps in our code. For example, which piece of code is more legible?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "Table([['Alice', 'Bob', 'Alice', 'Alice', 'Connie'], [119.99, 29.99, 10.00, 350.00, 5.29]], ['Customer', 'Bill']).group('Customer', np.sum).sort('Bill sum', descending=True)['Customer'][0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "transactions = Table() # Line 3.0\n", "transactions['Customer'] = ['Alice', 'Bob', 'Alice', 'Alice', 'Connie'] # Line 3.1\n", "transactions['Bill'] = [119.99, 29.99, 10.00, 350.00, 5.29] # Line 3.2\n", "total_bill_per_customer = transactions.group('Customer', np.sum) # Line 3.3\n", "customers_sorted_by_total_bill = total_bill_per_customer.sort('Bill sum', descending=True)['Customer'] # Line 3.4\n", "top_customer = customers_sorted_by_total_bill[0] # Line 3.5\n", "top_customer # Line 3.6" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Many programs do hundreds (or millions) of different things, and it would be cumbersome to do this only using expressions. In this example, we are doing only one thing, using several steps. The first cell is concise, but it's very hard to read. In the second cell, we use *assignment statements* to break down the steps into things that are (hopefully) understandable.\n", "\n", "An assignment statement is executed like other statements, but it always causes an effect on the world (recall that we called these *side effects*). That is, subsequent statements will see the changes made by the assignment." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Name assignments\n", "An assignment statement generally has two expressions separated by an equals sign. The expression on the right can be anything, but the expression on the left must be an \"assignable thing\". The simplest case is a name that has not been assigned to anything yet, like `total_bill_per_customer` on line 3 above. 
Before line 3 is executed, it would be an error to refer to `total_bill_per_customer`, but after line 3, that name can be used to refer to the table created by `transactions.group('Customer', np.sum)`.\n", "\n", "Assignment statements can also reassign existing names to something else:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "number = 3\n", "number = 4\n", "number = number + 2\n", "number" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a matter of code style, it is best to avoid this where possible, because it can make your code more confusing. (If everything is assigned only once, it's trivial to see what its value is when you read code. Otherwise you might need to hunt down all the assignments.) But occasionally it is useful, and sometimes it is necessary. We'll see examples of the latter when we cover iteration." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Indexing assignments\n", "Lines 1 and 2 above are assignments to parts of an *indexable* thing. In this case, they add new columns to the `transactions` Table associated with the strings \"Customer\" and \"Bill\", respectively. Generically, an indexing assignment looks like:\n", "\n", "    <expression>[<expression>] = <expression>\n", "\n", "The same pattern happens when we assign elements of a list or array:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "my_list = [4, 5, \"foo\"]\n", "my_list[0] = \"bar\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Different indexable things can have different behavior when you set something in them. For example, Tables use string indexing instead of number indexing, and they are okay with adding new columns using indexing assignments (as we saw in lines 1 and 2) or with replacing existing columns with something else. 
If we want to change the customer names (say because we made a mistake the first time), we could do that by changing the whole \"Customer\" column:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "transactions['Customer'] = ['Alice', 'Bob', 'Alice', 'Alice', 'Dora']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lists, however, don't let us add new elements this way. With an indexing assignment, we can only put new things into the slots a list already has:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "my_list[2] = \"baz\" # Okay." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "my_list[3] = \"garply\" # Error." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that it is possible to make an existing list longer using its `extend()` method, or to make a new, longer copy of the list with `+`. You just can't do it with index assignment.\n", "\n", "Why do lists have this restriction?\n", "\n", "Lists are supposed to contain contiguous ranges of things; they can't have \"holes\" that aren't indexable. If you could extend a list by assigning to it at whatever indices you wanted, you could assign elements, say, 0, 1, and 3, leaving 2 unassigned. Then what should `len` return for that list -- 3 or 4? And what should happen when you print it? Should it say `[0,1,,3]`? It's not clear. To make sure you don't have to worry about this when you use lists, Python doesn't let you do it." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Import statements\n", "A simple, standalone kind of statement is the *import statement*, as in `import numpy as np`. It has the side effect of making the `numpy` module available, giving it the name `np`. 
Notice that the import statement has its own special rules, and it doesn't include other expressions as subexpressions anywhere.\n", "\n", "Modules are actually values, just like strings or functions. Saying `import numpy` just loads the module named numpy from the computer's library of modules and assigns it the name `numpy`. `import numpy as np` assigns it the name `np` instead. We could imagine that `import numpy as np` does something like this:\n", "\n", " np = load_module('numpy') # BEWARE: NOT REAL PYTHON CODE.\n", "\n", "When you say something like `np.array([1,2,3])`, you're accessing the `array` attribute of the module named `np` and calling it on the list given by `[1,2,3]`. (Note that, unlike function attributes of some other values, function attributes of modules are not usually called methods, and they don't get the module value as an extra argument.)\n", "\n", "**Question.** How many subexpressions (not counting the whole expression) are there in the following expression?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "np.array([1,1+2,3])*4" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Function definitions\n", "Another important statement is the *function definition*:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def square(x):\n", " return x*x" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "square(5.5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After this line, the function `square` will be available for calling. Defining a function doesn't do anything else. In particular, it's not called unless you call it somewhere.\n", "\n", "The function definition is our first example of a statement that takes up multiple lines. 
In fact, a function definition is a *compound statement* that typically includes multiple *substatements*; its general form is:\n", "\n", "    def <function name>(<argument list>):\n", "        <statement>\n", "        <statement>\n", "        ...\n", "\n", "Notice the indentation of the statements inside the function. Indentation tells Python where your function definition ends. You can use as many spaces as you want (as long as you're consistent), but 4 is traditional." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When a function is executed (using the function call syntax we saw above), its substatements are executed sequentially, just like an ordinary sequence of statements in a cell. A substatement can be any statement you want, just like a subexpression can be any expression you want. You can even put function definitions as substatements inside function definitions. A special kind of substatement often seen in functions (and nowhere else) is the `return` statement, which is covered in detail in section 6. When a return statement is reached, execution of the function finishes (even if there are statements below) and the value of the expression after `return` becomes the value of the function call.\n", "\n", "Before the statements are executed, each name in the *argument list* is set to the corresponding value in the arguments passed to the function. For example, when we call `square(5.5)` above, Python starts executing the statements in the `square` function, but first sets `x` to 5.5. Arguments are how we pass information into functions; functions with no arguments can only behave one way.\n", "\n", "### Why functions?\n", "Functions are extremely useful for packaging small pieces of functionality into easily understandable units. Computer code is so powerful that organizing and maintaining it is often much more difficult than just getting the computer to do what we want. 
If you can wrap a complicated procedure into a single function, then you can focus once on getting that function written correctly, and then move on to something else, never worrying about its correctness again. In most moderate- or large-scale software, all code is organized this way -- that is, *all code* is just a bunch of (relatively short) functions that call each other.\n", "\n", "In your labs, and in coding you do outside and after this class, you'll often notice yourself repeating the same thing several times, with slight modifications. For example, you might analyze a dataset and then perform the same analysis with a different dataset for comparison. Or you might find yourself repeatedly doing the same mathematical operation, like \"square each element and add 5\". When that happens, you should rewrite your code so that the thing you're repeating happens inside a function with a memorable name." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Details about function execution\n", "\n", "Let's go back to our definition of `square` for a moment." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def square(x):\n", " return x*x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's important to know that the name `x` is assigned to a value *only* for the purposes of the statements inside the function. Outside the function call, argument names are not modified or visible. For example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "x = 5\n", "def cube(x):\n", " return x*x*x\n", "cube(3) # 27\n", "x # Still 5!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def square_root(does_not_appear_elsewhere):\n", " return does_not_appear_elsewhere**(1/2)\n", "square_root(4)\n", "does_not_appear_elsewhere # Causes an error. 
does_not_appear_elsewhere was only defined inside the function while it was being called." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similarly, any names defined inside a function are only defined inside the function while it's running. They don't even stick around across calls to the function; each time the function body finishes, the names defined inside it are wiped out, just like argument names." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def times_three(x):\n", "    multiplier = 3\n", "    return multiplier*x\n", "\n", "six = times_three(2)\n", "three = multiplier # Error!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Functions as values\n", "A function definition like\n", "\n", "    def my_func(x):\n", "        return 2*x\n", "\n", "is really just producing a *function value* and assigning the name `my_func` to that value. In this case, the function value is the function that multiplies its single argument by 2. You should imagine `def` as doing something similar to the following (non-functioning) code:\n", "\n", "    my_func = make_a_function(x): # BEWARE: NOT REAL PYTHON CODE.\n", "        return 2*x\n", "\n", "...where we're imagining for a moment that the special syntax `make_a_function(...): ...` returns a function. So function values are just like other values. Of course, function values, like other values, have special behaviors; they can be called using `()`, and they can't be added together like strings or numbers.\n", "\n", "*Names* assigned to functions are also just ordinary names. 
It is possible, for example, to reassign a name that was previously bound to a function by `def` (though this is so confusing that it is usually a bad idea):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def my_func(x):\n", "    return 2*x\n", "\n", "eight = my_func(4)\n", "my_func = 3 # Technically possible, but inadvisable!\n", "my_func + 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also put function values into a list, as we saw earlier:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def my_func_0(x):\n", "    return 0*x\n", "def my_func_1(x):\n", "    return 1*x\n", "funcs = [my_func_0, my_func_1]\n", "zero = funcs[0](3)\n", "zero" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Though Python prints function values in a slightly cryptic way, you can print them if you want:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "funcs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Return statements\n", "Inside a function definition, we very often see yet another kind of statement: the *return statement*. This has the form `return <expression>`. Any of the kinds of expressions we saw above can appear after `return`; the value of that expression is the value produced by calls to the function. For example, the value of `square(5)` is 25, since `square` will return `5*5` when it is called with the argument `5`.\n", "\n", "`return` stops execution of the function; subsequent statements are not reached. 
For example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def weird_but_technically_correct_square(x):\n", "    return x*x\n", "    return (x*x)+1 # Never reached.\n", "\n", "weird_but_technically_correct_square(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If a return statement is never reached, calling the function produces no useful value. The following code is wrong, for example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def wrong_circle_area(r):\n", "    math.pi*(r**2)\n", "\n", "some_name = wrong_circle_area(4)\n", "some_name" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Unfortunately, this is a mistake that Python will not complain about; it will just silently let `some_name` have no useful value. (Technically it is given a special value called `None`. If an expression whose value is `None` is the last line in a cell, Jupyter doesn't print anything, and that's what happens in the above cell. But you can see the value of `some_name` if you write, for example, `str(some_name)`.)\n", "\n", "To be clear, we just fix this by `return`ing whatever we want the function to return:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def correct_circle_area(r):\n", "    return math.pi*(r**2)\n", "\n", "circle_radius_four_area = correct_circle_area(4)\n", "circle_radius_four_area" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 7. 
Conditionals\n", "Conditionals are another important kind of multi-line statement:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "x = [1,2,3]\n", "if len(x) > 4:\n", "    message = \"x is a long list!\"\n", "else:\n", "    message = \"x is a short list!\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "message" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The general form of a conditional is:\n", "\n", "    if <expression>:\n", "        <statement>\n", "        ...\n", "    elif <expression>:\n", "        <statement>\n", "        ...\n", "    elif <expression>:\n", "        <statement>\n", "        ...\n", "    ...\n", "    else:\n", "        <statement>\n", "        ...\n", "\n", "If there is an `else` clause, then exactly one of the statement groups will be executed; otherwise, it's possible that none of them will happen (if none of the expressions next to `if` or `elif` is true).\n", "\n", "Conditionals are pretty simple, but like functions, they are very important for writing code that does interesting things.\n", "\n", "Something to watch out for is that Python will implicitly convert non-boolean values to boolean values, sometimes using surprising rules. Typically, the convention is that something that is \"zero-like\" or \"empty\" is False, while other things are True. It's best not to rely on this behavior, though; use an explicit comparison that produces a boolean value. 
See what happens in the following examples:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "if 0:\n", " x = True\n", "else:\n", " x = False\n", "x" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "if 1:\n", " x = True\n", "else:\n", " x = False\n", "x" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "if \"some string\":\n", " x = True\n", "else:\n", " x = False\n", "x" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "if \"\":\n", " x = True\n", "else:\n", " x = False\n", "x" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "if []: # (an empty list)\n", " x = True\n", "else:\n", " x = False\n", "x" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "if [3]:\n", " x = True\n", "else: \n", " x = False\n", "x" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "if np.array([]):\n", " x = True\n", "else: \n", " x = False\n", "x" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "if np.array([True]):\n", " x = True\n", "else: \n", " x = False\n", "x" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "if np.array([False]):\n", " x = True\n", "else: \n", " x = False\n", "x" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "if np.array([True, False]):\n", " x = True\n", "else: \n", " x = False\n", "x" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, 
"language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.3" } }, "nbformat": 4, "nbformat_minor": 0 }