{ "metadata": { "name": "Optimizing Python" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Taking some steps back. Say you want to do something as simple as iterating over a range of values. The standard way of doing this in python is" ] }, { "cell_type": "code", "collapsed": false, "input": [ "for i in range(3):\n", " print(i)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "0\n", "1\n", "2\n" ] } ], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This produces a list" ] }, { "cell_type": "code", "collapsed": false, "input": [ "l = range(3)\n", "l" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 2, "text": [ "[0, 1, 2]" ] } ], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Python `lists` have a method `__iter__`" ] }, { "cell_type": "code", "collapsed": false, "input": [ "help(l.__iter__)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Help on method-wrapper object:\n", "\n", "__iter__ = class method-wrapper(object)\n", " | Methods defined here:\n", " | \n", " | __call__(...)\n", " | x.__call__(...) <==> x(...)\n", " | \n", " | __cmp__(...)\n", " | x.__cmp__(y) <==> cmp(x,y)\n", " | \n", " | __getattribute__(...)\n", " | x.__getattribute__('name') <==> x.name\n", " | \n", " | __hash__(...)\n", " | x.__hash__() <==> hash(x)\n", " | \n", " | __repr__(...)\n", " | x.__repr__() <==> repr(x)\n", " | \n", " | ----------------------------------------------------------------------\n", " | Data descriptors defined here:\n", " | \n", " | __objclass__\n", " | \n", " | __self__\n", "\n" ] } ], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "When the `for` statement is used on a target `l`, it will first run" ] }, { "cell_type": "code", "collapsed": false, "input": [ "li = iter(l)\n", "li" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 4, "text": [ "" ] } ], "prompt_number": 4 }, { "cell_type": "code", "collapsed": false, "input": [ "dir(li)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 5, "text": [ "['__class__',\n", " '__delattr__',\n", " '__doc__',\n", " '__format__',\n", " '__getattribute__',\n", " '__hash__',\n", " '__init__',\n", " '__iter__',\n", " '__length_hint__',\n", " '__new__',\n", " '__reduce__',\n", " '__reduce_ex__',\n", " '__repr__',\n", " '__setattr__',\n", " '__sizeof__',\n", " '__str__',\n", " '__subclasshook__',\n", " 'next']" ] } ], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here `li` is an **iterator**, and any object that implements the possibility to run `iter()` in it is called an **iterable**.\n", "\n", "What defines an iterator is the existance of a `next()` method." ] }, { "cell_type": "code", "collapsed": false, "input": [ "li.next()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 6, "text": [ "0" ] } ], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "After an iterator for the `list` has been created in the `for` statement, `l.next()` is going to be called over and over until we get a `StopIteration` exception." ] }, { "cell_type": "code", "collapsed": false, "input": [ "li.next()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 7, "text": [ "1" ] } ], "prompt_number": 7 }, { "cell_type": "code", "collapsed": false, "input": [ "li.next()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 8, "text": [ "2" ] } ], "prompt_number": 8 }, { "cell_type": "code", "collapsed": false, "input": [ "li.next()" ], "language": "python", "metadata": {}, "outputs": [ { "ename": "StopIteration", "evalue": "", "output_type": "pyerr", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mStopIteration\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mli\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mnext\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mStopIteration\u001b[0m: " ] } ], "prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This works by the object keeping track of where it is in the count.\n", "\n", "Illustrative implementation of the `for` statement as a function:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from __future__ import print_function" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 10 }, { "cell_type": "code", "collapsed": false, "input": [ "def my_for(item, iterable, iter_func):\n", " it = iter(iterable)\n", " while True:\n", " try:\n", " item = it.next()\n", " # Iteration code goes here\n", " iter_func(item)\n", " except StopIteration:\n", " break" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 11 }, { "cell_type": "code", "collapsed": false, "input": [ "my_for(i, range(3), print)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "0\n", "1\n", "2\n" ] } ], "prompt_number": 12 }, { "cell_type": "markdown", "metadata": {}, "source": [ "So what happened was that we called `range`, which gave a `list`, which was converted to a `listiterator`, which we then called until it stopped us.\n", "\n", "Imagine we want to iterate over a VERY long range, say" ] }, { "cell_type": "code", "collapsed": false, "input": [ "l = range(1000000);" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 13 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then we are spending time building this `list`, followed by spending memory storing this `list`. Only to convert it to a `listiterator` when we iterate over it!\n", "\n", "But counting isn't that hard, let us instead _directly_ implement an iterator" ] }, { "cell_type": "code", "collapsed": false, "input": [ "class iter_range(object):\n", " def __init__(self, stop):\n", " self.current = 0\n", " self.stop = stop\n", " \n", " def next(self):\n", " self.current += 1\n", " if self.current >= self.stop:\n", " raise StopIteration\n", "\n", " else:\n", " return self.current\n", " \n", " def __iter__(self):\n", " return self" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 14 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This pattern of `__iter__` returning `self` and `next()` returning the next iterate is referred to in Python as the _iterator protocol_." ] }, { "cell_type": "code", "collapsed": false, "input": [ "L = [1,2,3,4]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 15 }, { "cell_type": "code", "collapsed": false, "input": [ "iter(L)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 16, "text": [ "" ] } ], "prompt_number": 16 }, { "cell_type": "code", "collapsed": false, "input": [ "range_4 = iter(iter_range(4))\n", "for i in range_4:\n", " print(i)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1\n", "2\n", "3\n" ] } ], "prompt_number": 17 }, { "cell_type": "code", "collapsed": false, "input": [ "range_4" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 18, "text": [ "<__main__.iter_range at 0x10396a310>" ] } ], "prompt_number": 18 }, { "cell_type": "code", "collapsed": false, "input": [ "dir(range_4)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 19, "text": [ "['__class__',\n", " '__delattr__',\n", " '__dict__',\n", " '__doc__',\n", " '__format__',\n", " '__getattribute__',\n", " '__hash__',\n", " '__init__',\n", " '__iter__',\n", " '__module__',\n", " '__new__',\n", " '__reduce__',\n", " '__reduce_ex__',\n", " '__repr__',\n", " '__setattr__',\n", " '__sizeof__',\n", " '__str__',\n", " '__subclasshook__',\n", " '__weakref__',\n", " 'current',\n", " 'next',\n", " 'stop']" ] } ], "prompt_number": 19 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we only need to keep track of two numbers rather than a million like above; all the intermediate numbers are calculated on the fly.\n", "\n", "As you might know, this functionality is already available in Python with the much more optimal iterator object `xrange`." ] }, { "cell_type": "code", "collapsed": false, "input": [ "li = xrange(4)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 20 }, { "cell_type": "code", "collapsed": false, "input": [ "iter(li)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 21, "text": [ "" ] } ], "prompt_number": 21 }, { "cell_type": "markdown", "metadata": {}, "source": [ "So what happend with your iterator once it's thrown the `StopIteration` exception?" ] }, { "cell_type": "code", "collapsed": false, "input": [ "range_4.next()" ], "language": "python", "metadata": {}, "outputs": [ { "ename": "StopIteration", "evalue": "", "output_type": "pyerr", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mStopIteration\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mrange_4\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mnext\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m\u001b[0m in \u001b[0;36mnext\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 7\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcurrent\u001b[0m \u001b[0;34m+=\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 8\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcurrent\u001b[0m \u001b[0;34m>=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstop\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 9\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mStopIteration\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 10\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 11\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mStopIteration\u001b[0m: " ] } ], "prompt_number": 22 }, { "cell_type": "markdown", "metadata": {}, "source": [ "In Python terminology we call the iterator _exhausted_." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generators\n", "\n", "This is another instance where the pattern of defining something is very common as soon as you have some sort of data which you only need to touch once. This being python, there's a quicker and more convenient way of implicitly satisfying the iterator protocol." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def iter_range(stop):\n", " current = 0\n", " while current < stop:\n", " yield current\n", " current += 1" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 23 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The **`yield`** keyword can foremost be thought of as a replacement to `return`." ] }, { "cell_type": "code", "collapsed": false, "input": [ "range_4 = iter_range(4)\n", "range_4" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 24, "text": [ "" ] } ], "prompt_number": 24 }, { "cell_type": "markdown", "metadata": {}, "source": [ "When a function containing a `yield` statement is called, it is not going to run the body of the function, but return a generator.\n", "\n", "The first time the `.next()` method of the generator is called it will execute up to, and including the `yield` statement upon the first call. There, it will wait until the generators `.next()` method is called again." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def iter_range(stop):\n", " current = 0\n", " print(\"Before loop\")\n", " while current < stop:\n", " if current == \"stop\":\n", " return\n", "\n", " print(\"Before yield\")\n", " yield current\n", " current += 1\n", " print(\"After yield\")\n", " \n", " print(\"After loop\")" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 25 }, { "cell_type": "code", "collapsed": false, "input": [ "range_4 = iter_range(4)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 26 }, { "cell_type": "code", "collapsed": false, "input": [ "range_4.next()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Before loop\n", "Before yield\n" ] }, { "output_type": "pyout", "prompt_number": 28, "text": [ "0" ] } ], "prompt_number": 28 }, { "cell_type": "code", "collapsed": false, "input": [ "range_4.next()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "After yield\n", "Before yield\n" ] }, { "output_type": "pyout", "prompt_number": 29, "text": [ "1" ] } ], "prompt_number": 29 }, { "cell_type": "markdown", "metadata": {}, "source": [ "When the body of the function runs to completion, in this case meaning we exit the `while` loop and execute the final `print` call, the generator will throw `StopIteration`." ] }, { "cell_type": "code", "collapsed": false, "input": [ "range_4.next()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "After yield\n", "Before yield\n" ] }, { "output_type": "pyout", "prompt_number": 30, "text": [ "2" ] } ], "prompt_number": 30 }, { "cell_type": "code", "collapsed": false, "input": [ "range_4.next()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "After yield\n", "Before yield\n" ] }, { "output_type": "pyout", "prompt_number": 31, "text": [ "3" ] } ], "prompt_number": 31 }, { "cell_type": "code", "collapsed": false, "input": [ "range_4.next()" ], "language": "python", "metadata": {}, "outputs": [ { "ename": "StopIteration", "evalue": "", "output_type": "pyerr", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mStopIteration\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mrange_4\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mnext\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mStopIteration\u001b[0m: " ] }, { "output_type": "stream", "stream": "stdout", "text": [ "After yield\n", "After loop\n" ] } ], "prompt_number": 32 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Of course, when the generators where introduced, they got very popular, but many generators are rather short and simple in turns of functionality, so much that even a simple function defenition might feel cumbersome to make. Therefore, in the later versions of pyhton there are **comprehensive generators**, just like there are comprehensive lists.\n", "\n", "These are great for chaining and filtering generators" ] }, { "cell_type": "code", "collapsed": false, "input": [ "range_5 = iter_range(5)\n", "range_5_even = (i for i in range_5 if i % 2 == 0)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 33 }, { "cell_type": "code", "collapsed": false, "input": [ "range_5_even" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 34, "text": [ " at 0x103968410>" ] } ], "prompt_number": 34 }, { "cell_type": "code", "collapsed": false, "input": [ "for i in range_5_even:\n", " print(i)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Before loop\n", "Before yield\n", "0\n", "After yield\n", "Before yield\n", "After yield\n", "Before yield\n", "2\n", "After yield\n", "Before yield\n", "After yield\n", "Before yield\n", "4\n", "After yield\n", "After loop\n" ] } ], "prompt_number": 35 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The _lazy_ nature of iterators makes it a good idea to chain them. You can make small logical building blocks of generators, and wait with executing all these blocks until you have a chain with all the processing you need.\n", "\n", " Large data input -> generator -> generator -> generator -> for x in s:\n", "\n", "The `for` statement will finally **pull** the data through all the generators.\n", "\n", "An in depth slideshow about this is available here: http://www.dabeaz.com/generators/Generators.pdf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `itertools`\n", "\n", "http://docs.python.org/2/library/itertools.html\n", "\n", "The built in module `itertools` contains many common iterators and iterator versions of built in list functions, as well as some combinatoric generators.\n", "\n", "Some of the more self explanatory are `chain()`, `count()`, `cycle()`, `ifilter()`, `islice()`, `izip()`, `takewhile()`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `tee(it, n)`\n", "\n", "Takes and iterator `it` and copies it in to `n`" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from itertools import tee" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 36 }, { "cell_type": "code", "collapsed": false, "input": [ "range_5 = iter_range(5)\n", "r1, r2 = tee(range_5, 2)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 37 }, { "cell_type": "code", "collapsed": false, "input": [ "r1.next()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Before loop\n", "Before yield\n" ] }, { "output_type": "pyout", "prompt_number": 38, "text": [ "0" ] } ], "prompt_number": 38 }, { "cell_type": "code", "collapsed": false, "input": [ "r2.next()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 39, "text": [ "0" ] } ], "prompt_number": 39 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This enables you to work around the limitation of only being able to use an iterator once. But it can be quite memory intensive, so you might consier using a `list` in stead if you feel like you need this.\n", "\n", "Example use: Say we want to get ordered pairs of an iterator" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from itertools import izip" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 40 }, { "cell_type": "code", "collapsed": false, "input": [ "def pairs(it):\n", " a, b = tee(it, 2)\n", " b.next() # Advance one step in the second copy\n", " return izip(a, b)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 41 }, { "cell_type": "code", "collapsed": false, "input": [ "range_4 = iter_range(4)\n", "pairs_4 = pairs(range_4)\n", "\n", "for i in pairs_4:\n", " print(i)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Before loop\n", "Before yield\n", "After yield\n", "Before yield\n", "(0, 1)\n", "After yield\n", "Before yield\n", "(1, 2)\n", "After yield\n", "Before yield\n", "(2, 3)\n", "After yield\n", "After loop\n" ] } ], "prompt_number": 42 }, { "cell_type": "code", "collapsed": false, "input": [ "(2*i for i in range(3))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 44, "text": [ " at 0x103968460>" ] } ], "prompt_number": 44 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Coroutines" ] }, { "cell_type": "code", "collapsed": false, "input": [ "range_4 = iter_range(4)\n", "dir(range_4)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 45, "text": [ "['__class__',\n", " '__delattr__',\n", " '__doc__',\n", " '__format__',\n", " '__getattribute__',\n", " '__hash__',\n", " '__init__',\n", " '__iter__',\n", " '__name__',\n", " '__new__',\n", " '__reduce__',\n", " '__reduce_ex__',\n", " '__repr__',\n", " '__setattr__',\n", " '__sizeof__',\n", " '__str__',\n", " '__subclasshook__',\n", " 'close',\n", " 'gi_code',\n", " 'gi_frame',\n", " 'gi_running',\n", " 'next',\n", " 'send',\n", " 'throw']" ] } ], "prompt_number": 45 }, { "cell_type": "markdown", "metadata": {}, "source": [ "One can _send_ values to a generator using the `.send()` method.\n", "\n", "This is a way of using a generator \"in reverse\".\n", "\n", "The syntax is a bit unintuitive, so let's illustrate with a simple example" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def cr():\n", " while True:\n", " n = (yield)\n", " yield 1" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 61 }, { "cell_type": "code", "collapsed": false, "input": [ "c = cr()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 62 }, { "cell_type": "code", "collapsed": false, "input": [ "c.next()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 63 }, { "cell_type": "code", "collapsed": false, "input": [ "c.next()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 64, "text": [ "1" ] } ], "prompt_number": 64 }, { "cell_type": "code", "collapsed": false, "input": [ "c.send(2)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 65 }, { "cell_type": "code", "collapsed": false, "input": [ "c.next()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 66, "text": [ "1" ] } ], "prompt_number": 66 }, { "cell_type": "code", "collapsed": false, "input": [ "c.send()" ], "language": "python", "metadata": {}, "outputs": [ { "ename": "TypeError", "evalue": "send() takes exactly one argument (0 given)", "output_type": "pyerr", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mc\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: send() takes exactly one argument (0 given)" ] } ], "prompt_number": 67 }, { "cell_type": "code", "collapsed": false, "input": [ "def doubler():\n", " while True:\n", " n = (yield)\n", " print(2*n)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 46 }, { "cell_type": "code", "collapsed": false, "input": [ "d = doubler()\n", "d.next()\n", "\n", "for i in range_4:\n", " d.send(i)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Before loop\n", "Before yield\n", "0\n", "After yield\n", "Before yield\n", "2\n", "After yield\n", "Before yield\n", "4\n", "After yield\n", "Before yield\n", "6\n", "After yield\n", "After loop\n" ] } ], "prompt_number": 47 }, { "cell_type": "code", "collapsed": false, "input": [ "d.send(7)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "14\n" ] } ], "prompt_number": 49 }, { "cell_type": "code", "collapsed": false, "input": [ "d.send(10)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "20\n" ] } ], "prompt_number": 50 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `doubler` function is a **coroutine**, in contrast to a generator, which is a _producer_, it can be seen as a _consumer_. It sits in memory, doing nothing, until a value is sent to it for consumption.\n", "\n", "Some notes about the above code:\n", "\n", "- We need to call `.next()`, since that is how we got to the point of the code that has the `yield` statement\n", "- When `.send()` is called, the argument will be inputted as the value of `yield`." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def doubler(target):\n", " while True:\n", " n = (yield)\n", " target.send(2 * n)\n", "\n", "\n", "def halfer():\n", " while True:\n", " n = (yield)\n", " print(n / 2)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 51 }, { "cell_type": "code", "collapsed": false, "input": [ "h = halfer()\n", "h.next()\n", "doubler_halfer = doubler(h)\n", "doubler_halfer.next()\n", "\n", "for i in range(3):\n", " doubler_halfer.send(i)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "0\n", "1\n", "2\n" ] } ], "prompt_number": 52 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because calling the `.next()`s get annoying after a while, let us quickly define a decorator for taking care of this" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def coroutine(func):\n", " def start(*args, **kwargs):\n", " cr = func(*args, **kwargs)\n", " cr.next()\n", " return cr\n", " \n", " return start" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 8 }, { "cell_type": "code", "collapsed": false, "input": [ "@coroutine\n", "def doubler(target):\n", " while True:\n", " n = (yield)\n", " target.send(2. * n)\n", " \n", "@coroutine\n", "def halfer(target):\n", " while True:\n", " n = (yield)\n", " target.send(n / 2.)\n", "\n", "@coroutine\n", "def printer():\n", " while True:\n", " n = (yield)\n", " print(str(n))" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 54 }, { "cell_type": "code", "collapsed": false, "input": [ "doubler_halfer_printer = doubler(halfer(printer()))" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 55 }, { "cell_type": "code", "collapsed": false, "input": [ "for i in range(4):\n", " doubler_halfer_printer.send(i)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "0.0\n", "1.0\n", "2.0\n", "3.0\n" ] } ], "prompt_number": 56 }, { "cell_type": "markdown", "metadata": {}, "source": [ "By combining coroutines we can make pipes which we are **pushing** data through:\n", "\n", " source -> coroutine -> coroutine -> coroutine -> sink\n", "\n", "The final coroutine in the chain, which doesn't pass data along to another coroutine, is referred to as a _sink_." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Conceptually this is very different from using generators, but we're getting the same result in the end (lazy dataprocessing), so why would anyone want to use coroutines?\n", "\n", "Unlike generators, coroutines can be **branched**" ] }, { "cell_type": "code", "collapsed": false, "input": [ "@coroutine\n", "def broadcaster(targets):\n", " while True:\n", " value = (yield)\n", " for target in targets:\n", " target.send(value)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 57 }, { "cell_type": "code", "collapsed": false, "input": [ "stuff = broadcaster([halfer(printer()), doubler(printer())])\n", "for i in range(5):\n", " stuff.send(i)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "0.0\n", "0.0\n", "0.5\n", "2.0\n", "1.0\n", "4.0\n", "1.5\n", "6.0\n", "2.0\n", "8.0\n" ] } ], "prompt_number": 58 }, { "cell_type": "markdown", "metadata": {}, "source": [ "To illustrate how the execution in a generator is suspended before it's told to move one, consider this simple generator:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def gen():\n", " yield 1\n", " yield 2" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "g = gen()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "code", "collapsed": false, "input": [ "g.next()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 3, "text": [ "1" ] } ], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "g.next()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 4, "text": [ "2" ] } ], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next time we run `.next()`, the function defining the generator will finish, and implicitly `return None`. This will cause the `StopIteration` exception to be thrown." ] }, { "cell_type": "code", "collapsed": false, "input": [ "g.next()" ], "language": "python", "metadata": {}, "outputs": [ { "ename": "StopIteration", "evalue": "", "output_type": "pyerr", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mStopIteration\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mg\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mnext\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mStopIteration\u001b[0m: " ] } ], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "And compare that to this coroutine (which technically also is a generator)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "@coroutine\n", "def cr():\n", " a = (yield)\n", " print(a)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 10 }, { "cell_type": "code", "collapsed": false, "input": [ "c = cr()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once the coroutine is initiated it will just wait until it's told to take control of the execution. When we call `.send()` of it, it will take control of execution, run everything passed the `yield`. This will again cause the coroutine (generator) to implicitly `return None`" ] }, { "cell_type": "code", "collapsed": false, "input": [ "c.send(34)" ], "language": "python", "metadata": {}, "outputs": [ { "ename": "StopIteration", "evalue": "", "output_type": "pyerr", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mStopIteration\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mc\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m34\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mStopIteration\u001b[0m: " ] }, { "output_type": "stream", "stream": "stdout", "text": [ "34\n" ] } ], "prompt_number": 12 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the type of `c` is a generator." ] }, { "cell_type": "code", "collapsed": false, "input": [ "type(c)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 14, "text": [ "generator" ] } ], "prompt_number": 14 }, { "cell_type": "code", "collapsed": false, "input": [ "from termcolor import colored" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 59 }, { "cell_type": "code", "collapsed": false, "input": [ "for i in range(1, 100):\n", " if not i % 7:\n", " print(colored(i, \"red\"), end=\" \")\n", " else:\n", " print(colored(i, \"cyan\"), end=\" \")" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\u001b[36m1\u001b[0m \u001b[36m2\u001b[0m \u001b[36m3\u001b[0m \u001b[36m4\u001b[0m \u001b[36m5\u001b[0m \u001b[36m6\u001b[0m \u001b[31m7\u001b[0m \u001b[36m8\u001b[0m \u001b[36m9\u001b[0m \u001b[36m10\u001b[0m \u001b[36m11\u001b[0m \u001b[36m12\u001b[0m \u001b[36m13\u001b[0m \u001b[31m14\u001b[0m \u001b[36m15\u001b[0m \u001b[36m16\u001b[0m \u001b[36m17\u001b[0m \u001b[36m18\u001b[0m \u001b[36m19\u001b[0m \u001b[36m20\u001b[0m \u001b[31m21\u001b[0m \u001b[36m22\u001b[0m \u001b[36m23\u001b[0m \u001b[36m24\u001b[0m \u001b[36m25\u001b[0m \u001b[36m26\u001b[0m \u001b[36m27\u001b[0m \u001b[31m28\u001b[0m \u001b[36m29\u001b[0m \u001b[36m30\u001b[0m \u001b[36m31\u001b[0m \u001b[36m32\u001b[0m \u001b[36m33\u001b[0m \u001b[36m34\u001b[0m \u001b[31m35\u001b[0m \u001b[36m36\u001b[0m \u001b[36m37\u001b[0m \u001b[36m38\u001b[0m \u001b[36m39\u001b[0m \u001b[36m40\u001b[0m \u001b[36m41\u001b[0m \u001b[31m42\u001b[0m \u001b[36m43\u001b[0m \u001b[36m44\u001b[0m \u001b[36m45\u001b[0m \u001b[36m46\u001b[0m \u001b[36m47\u001b[0m \u001b[36m48\u001b[0m \u001b[31m49\u001b[0m \u001b[36m50\u001b[0m \u001b[36m51\u001b[0m \u001b[36m52\u001b[0m \u001b[36m53\u001b[0m \u001b[36m54\u001b[0m \u001b[36m55\u001b[0m \u001b[31m56\u001b[0m \u001b[36m57\u001b[0m \u001b[36m58\u001b[0m \u001b[36m59\u001b[0m \u001b[36m60\u001b[0m \u001b[36m61\u001b[0m \u001b[36m62\u001b[0m \u001b[31m63\u001b[0m \u001b[36m64\u001b[0m \u001b[36m65\u001b[0m \u001b[36m66\u001b[0m \u001b[36m67\u001b[0m \u001b[36m68\u001b[0m \u001b[36m69\u001b[0m \u001b[31m70\u001b[0m \u001b[36m71\u001b[0m \u001b[36m72\u001b[0m \u001b[36m73\u001b[0m \u001b[36m74\u001b[0m \u001b[36m75\u001b[0m \u001b[36m76\u001b[0m \u001b[31m77\u001b[0m \u001b[36m78\u001b[0m \u001b[36m79\u001b[0m \u001b[36m80\u001b[0m \u001b[36m81\u001b[0m \u001b[36m82\u001b[0m \u001b[36m83\u001b[0m \u001b[31m84\u001b[0m \u001b[36m85\u001b[0m \u001b[36m86\u001b[0m \u001b[36m87\u001b[0m \u001b[36m88\u001b[0m \u001b[36m89\u001b[0m \u001b[36m90\u001b[0m \u001b[31m91\u001b[0m \u001b[36m92\u001b[0m \u001b[36m93\u001b[0m \u001b[36m94\u001b[0m \u001b[36m95\u001b[0m \u001b[36m96\u001b[0m \u001b[36m97\u001b[0m \u001b[31m98\u001b[0m \u001b[36m99\u001b[0m " ] } ], "prompt_number": 60 }, { "cell_type": "code", "collapsed": false, "input": [ "@coroutine\n", "def colorer(color, target):\n", " while True:\n", " text = str((yield))\n", " target.send(colored(text, color))" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 76 }, { "cell_type": "code", "collapsed": false, "input": [ "stuff = broadcaster([halfer(colorer(\"red\", printer())), \\\n", " doubler(colorer(\"cyan\", printer()))])\n", "for i in range(5):\n", " stuff.send(i)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\u001b[31m0.0\u001b[0m\n", "\u001b[36m0.0\u001b[0m\n", "\u001b[31m0.5\u001b[0m\n", "\u001b[36m2.0\u001b[0m\n", "\u001b[31m1.0\u001b[0m\n", "\u001b[36m4.0\u001b[0m\n", "\u001b[31m1.5\u001b[0m\n", "\u001b[36m6.0\u001b[0m\n", "\u001b[31m2.0\u001b[0m\n", "\u001b[36m8.0\u001b[0m\n" ] } ], "prompt_number": 77 }, { "cell_type": "code", "collapsed": false, "input": [ "stuff.close()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 78 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Of course, just like with generators, the internal state of the coroutine is preserved until it returns, meaning the input we send to them can be used to change the state.\n", "\n", "Coroutines have a method `.close()`, this will cause the coroutine to throw a `GeneratorExit` exception at the `yield` point." ] }, { "cell_type": "code", "collapsed": false, "input": [ "@coroutine\n", "def summer(target):\n", " s = 0\n", " try:\n", " while True:\n", " s += (yield)\n", "\n", " except GeneratorExit:\n", " target.send(s)\n", " return" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 79 }, { "cell_type": "code", "collapsed": false, "input": [ "a = summer(printer())" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 80 }, { "cell_type": "code", "collapsed": false, "input": [ "for i in range(4):\n", " a.send(i)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 81 }, { "cell_type": "code", "collapsed": false, "input": [ "a.close()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "6\n" ] } ], "prompt_number": 82 }, { "cell_type": "code", "collapsed": false, "input": [ "@coroutine\n", "def router(func, true_target, false_target):\n", " while True:\n", " v = (yield)\n", " if func(v):\n", " true_target.send(v)\n", " else:\n", " false_target.send(v)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 83 }, { "cell_type": "code", "collapsed": false, "input": [ "@coroutine\n", "def averager(target):\n", " count = 0\n", " s = 0.\n", " while True:\n", " s += (yield)\n", " count += 1\n", " target.send(s / count)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 84 }, { "cell_type": "code", "collapsed": false, "input": [ "p = printer()\n", "pipeline = router(lambda i: i % 2, colorer(\"red\", p), averager(colorer(\"cyan\", p)))\n", "\n", "for i in range(10):\n", " pipeline.send(i)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\u001b[36m0.0\u001b[0m\n", "\u001b[31m1\u001b[0m\n", "\u001b[36m1.0\u001b[0m\n", "\u001b[31m3\u001b[0m\n", "\u001b[36m2.0\u001b[0m\n", "\u001b[31m5\u001b[0m\n", "\u001b[36m3.0\u001b[0m\n", "\u001b[31m7\u001b[0m\n", "\u001b[36m4.0\u001b[0m\n", "\u001b[31m9\u001b[0m\n" ] } ], "prompt_number": 85 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Coroutine pipeline example\n", "\n", "As an example, let's assume we want to check some letter frequencies in four digit numerals in different languages\n", "\n", "We produce the data with this generator:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from random import randint\n", "def number_producer(n=4):\n", " fstr = '{0:0' + str(n) + 'd}'\n", " n_max = 10 ** (n) - 1\n", " while True:\n", " yield fstr.format(randint(0, n_max))" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 86 }, { "cell_type": "code", "collapsed": false, "input": [ "producer = number_producer(10)\n", "producer.next()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 87, "text": [ "'3070970711'" ] } ], "prompt_number": 87 }, { "cell_type": "markdown", "metadata": {}, "source": [ "And we map the digits to languages by these dictionaries" ] }, { "cell_type": "code", "collapsed": false, "input": [ "english = {0: \"zero\", 1: \"one\", 2: \"two\", 3: \"three\", 4: \"four\", \\\n", " 5: \"five\", 6: \"six\", 7: \"seven\", 8: \"eight\", 9: \"nine\"}\n", "\n", "svenska = {0: \"noll\", 1: \"ett\", 2: \"tva\", 3: \"tre\", 4: \"fyra\", \\\n", " 5: \"fem\", 6: \"sex\", 7: \"sju\", 8: \"atta\", 9: \"nio\"}" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 88 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The subtasks we wish to perform are these:\n", "\n", "- split\n", "- translate into language\n", "- count letters\n", "- count occurances\n", "- present result" ] }, { "cell_type": "code", "collapsed": false, "input": [ "@coroutine\n", "def digit_splitter(target):\n", " try:\n", " while True:\n", " number = (yield)\n", " for digit in number:\n", " target.send(int(digit))\n", " except GeneratorExit:\n", " target.close()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 89 }, { "cell_type": "code", "collapsed": false, "input": [ "@coroutine\n", "def digit_mapper(map, target):\n", " try:\n", " while True:\n", " digit = (yield)\n", " target.send(map[digit])\n", " except GeneratorExit:\n", " target.close()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 90 }, { "cell_type": "code", "collapsed": false, "input": [ "from collections import Counter" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 91 }, { "cell_type": "code", "collapsed": false, "input": [ "@coroutine\n", "def letter_counter(target):\n", " c = Counter()\n", " try:\n", " while True:\n", " c.update((yield))\n", "\n", " except GeneratorExit:\n", " target.send(c)\n", " target.close()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 96 }, { "cell_type": "code", "collapsed": false, "input": [ "@coroutine\n", "def count_formatter(target, n=5):\n", " while True:\n", " counts = (yield)\n", " for letter, count in counts.most_common(n):\n", " s = letter + \" - \" + \"|\" * count\n", " target.send(s)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 92 }, { "cell_type": "code", "collapsed": false, "input": [ "@coroutine\n", "def multiline_printer():\n", " s = \"\"\n", " try:\n", " while True:\n", " s += (yield) + \"\\n\"\n", "\n", " except GeneratorExit:\n", " print(s)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 93 }, { "cell_type": "code", "collapsed": false, "input": [ "@coroutine\n", "def broadcaster(targets):\n", " try:\n", " while True:\n", " value = (yield)\n", " for target in targets:\n", " target.send(value)\n", " except GeneratorExit:\n", " for target in targets:\n", " target.close()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 94 }, { "cell_type": "code", "collapsed": false, "input": [ "eng_result_interpreter = count_formatter(multiline_printer(), 10)\n", "swe_result_interpreter = count_formatter(colorer(\"magenta\", multiline_printer()))\n", "\n", "eng_counter = digit_mapper(english, letter_counter(eng_result_interpreter))\n", "swe_counter = digit_mapper(svenska, letter_counter(swe_result_interpreter))\n", "\n", "number_analyser = digit_splitter(broadcaster([eng_counter, swe_counter]))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\n", "\n" ] } ], "prompt_number": 98 }, { "cell_type": "code", "collapsed": false, "input": [ "P = number_producer(4)\n", "\n", "for i in xrange(10):\n", " number_analyser.send(P.next())\n", "\n", "number_analyser.close()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "e - |||||||||||||||||||||||||||||||||||||||\n", "r - |||||||||||||||||\n", "o - ||||||||||||||||\n", "i - |||||||||||||||\n", "t - ||||||||||||||\n", "n - |||||||||||||\n", "h - |||||||||||\n", "z - |||||||\n", "f - ||||||\n", "s - ||||||\n", "\n", "\u001b[35mt - ||||||||||||||||||||||||\u001b[0m\n", "\u001b[35me - |||||||||||||||||\u001b[0m\n", "\u001b[35ma - ||||||||||||||\u001b[0m\n", "\u001b[35ml - ||||||||||||||\u001b[0m\n", "\u001b[35mo - |||||||||||\u001b[0m\n", "\n" ] } ], "prompt_number": 99 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Profiling\n", "\n", "Built in cProfile" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%%file hanoi.py\n", "\"\"\"Solve 'Towers of Hanoi'\n", "\"\"\"\n", "import sys\n", "\n", "PRINT = False\n", "\n", "@profile\n", "def solve(g,n):\n", " X = [sum(g[0])]\n", " Y = [sum(g[1])]\n", " Z = [sum(g[2])]\n", "\n", " moved = 0\n", " for i in range(2**n - 1):\n", " tops = [a[0] for a in g]\n", " movable = False\n", " j = 0\n", " while not movable:\n", " max_legal = max([t for t in tops if g[j][0] % 2 != t % 2])\n", " if g[j][0] != moved and g[j][0] > 0:\n", " num_zeros = len([z for z in tops if z == 0])\n", " if g[j][0] < max_legal or num_zeros > 0:\n", " movable = True\n", " if not movable:\n", " j += 1\n", " \n", " moved = g[j][0]\n", " \n", " legal = False\n", " k = 2\n", " while not legal:\n", " if (tops[k] % 2 != g[j][0] % 2 and g[j][0] < tops[k]) or tops[k] == 0:\n", " legal = True\n", " if not legal:\n", " k -= 1\n", " \n", " g[k] = [g[j][0]] + g[k]\n", " g[j] = g[j][1::]\n", "\n", " if PRINT:\n", " print g\n", "\n", " S = [sum(s) for s in g]\n", " X += [S[0]]\n", " Y += [S[1]]\n", " Z += [S[2]]\n", " \n", " return (X,Y,Z)\n", "\n", "\n", "if __name__ == \"__main__\":\n", " n = int(sys.argv[-1])\n", " game = [range(1,n+1)+[0], [0], [0]]\n", "\n", " X, Y, Z = solve(game,n)\n" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Overwriting hanoi.py\n" ] } ], "prompt_number": 111 }, { "cell_type": "code", "collapsed": false, "input": [ "%%bash\n", "python -m cProfile hanoi.py 20" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ " 6756455 function calls in 14.319 seconds\n", "\n", " Ordered by: standard name\n", "\n", " ncalls tottime percall cumtime percall filename:lineno(function)\n", " 1 0.000 0.000 14.319 14.319 hanoi.py:2()\n", " 1 12.494 12.494 14.319 14.319 hanoi.py:8(solve)\n", " 1513582 0.203 0.000 0.203 0.000 {len}\n", " 2097140 0.575 0.000 0.575 0.000 {max}\n", " 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}\n", " 2 0.031 0.015 0.031 0.015 {range}\n", " 3145728 1.017 0.000 1.017 0.000 {sum}\n", "\n", "\n" ] } ], "prompt_number": 101 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### snakeviz\n", "\n", "When one has a lot of functions to compare, just reading a text table might be unclear.\n", "There are various parsers for these profiling tables. One which particularly nice and\n", "web browser based is _snakeviz_.\n", "\n", " pip install snakeviz\n", "\n", "We profile again, but specify an output file" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%%bash\n", "python -m cProfile -o profile_name hanoi.py 18" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 103 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then we just call snakeviz with that file as argument, this will start a web server which displays the results." ] }, { "cell_type": "code", "collapsed": false, "input": [ "%%script --bg bash\n", "snakeviz profile_name" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Starting job # 2 in a separate thread.\n" ] } ], "prompt_number": 105 }, { "cell_type": "markdown", "metadata": {}, "source": [ "(These cells with `%%bash` and `%%script` are just like typing in a terminal window)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Instrumenting Profilers vs Statistical Profilers\n", "\n", "For instrumenting profilers, every time an action happens, e.g. executing a function, time points are saved and timings are calculated.\n", "\n", "Statistical profilers samples the call stack at short intervals, and that data can be used to figure where most time is spent." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### plop\n", "\n", "(Python Low Overhead Profiler)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%%bash\n", "python -m plop.collector hanoi.py 21" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "profile output saved to /tmp/plop.out\n", "overhead was 1.6438374391e-05 per sample (0.0016438374391%)\n" ] } ], "prompt_number": 106 }, { "cell_type": "code", "collapsed": false, "input": [ "!cat /tmp/plop.out" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "{(('', 8, 'solve'), ('', 2, ''), ('/Users/valentinesvensson/.virtualenvs/devel/lib/python2.7/site-packages/plop/collector.py', 74, 'main'), ('/Users/valentinesvensson/.virtualenvs/devel/lib/python2.7/site-packages/plop/collector.py', 1, ''), ('/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py', 62, '_run_code'), ('/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py', 136, '_run_module_as_main')): 2154}" ] } ], "prompt_number": 107 }, { "cell_type": "code", "collapsed": false, "input": [ "%%script --bg bash\n", "python -m plop.viewer --port=9876 --datadir=/tmp" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Starting job # 3 in a separate thread.\n" ] } ], "prompt_number": 108 }, { "cell_type": "code", "collapsed": false, "input": [ "%killbgscripts" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "All background processes were killed.\n" ] } ], "prompt_number": 109 }, { "cell_type": "markdown", "metadata": {}, "source": [ "A (more well established) alternative to plop is **`statprof`**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### line_profiler\n", "\n", "Sometime one might want to now on a line per line basis what is taking up time.\n", "\n", " pip install line_profiler\n", "\n", "This requires you to put a `@profile` decorator on the function(s) you wish to profile" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%%bash\n", "kernprof.py --line-by-line hanoi.py 12" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Wrote profile results to hanoi.py.lprof\n" ] } ], "prompt_number": 112 }, { "cell_type": "code", "collapsed": false, "input": [ "%%bash\n", "python -m line_profiler hanoi.py.lprof" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Timer unit: 1e-06 s\n", "\n", "File: hanoi.py\n", "Function: solve at line 7\n", "Total time: 0.332164 s\n", "\n", "Line # Hits Time Per Hit % Time Line Contents\n", "==============================================================\n", " 7 @profile\n", " 8 def solve(g,n):\n", " 9 1 6 6.0 0.0 X = [sum(g[0])]\n", " 10 1 2 2.0 0.0 Y = [sum(g[1])]\n", " 11 1 2 2.0 0.0 Z = [sum(g[2])]\n", " 12 \n", " 13 1 1 1.0 0.0 moved = 0\n", " 14 4096 5217 1.3 1.6 for i in range(2**n - 1):\n", " 15 16380 23871 1.5 7.2 tops = [a[0] for a in g]\n", " 16 4095 5216 1.3 1.6 movable = False\n", " 17 4095 5005 1.2 1.5 j = 0\n", " 18 12279 15471 1.3 4.7 while not movable:\n", " 19 32736 58374 1.8 17.6 max_legal = max([t for t in tops if g[j][0] % 2 != t % 2])\n", " 20 8184 12901 1.6 3.9 if g[j][0] != moved and g[j][0] > 0:\n", " 21 23400 31866 1.4 9.6 num_zeros = len([z for z in tops if z == 0])\n", " 22 5850 8690 1.5 2.6 if g[j][0] < max_legal or num_zeros > 0:\n", " 23 4095 5343 1.3 1.6 movable = True\n", " 24 8184 10186 1.2 3.1 if not movable:\n", " 25 4089 5318 1.3 1.6 j += 1\n", " 26 \n", " 27 4095 5414 1.3 1.6 moved = g[j][0]\n", " 28 \n", " 29 4095 5204 1.3 1.6 legal = False\n", " 30 4095 5195 1.3 1.6 k = 2\n", " 31 12279 15767 1.3 4.7 while not legal:\n", " 32 8184 16257 2.0 4.9 if (tops[k] % 2 != g[j][0] % 2 and g[j][0] < tops[k]) or tops[k] == 0:\n", " 33 4095 5386 1.3 1.6 legal = True\n", " 34 8184 10304 1.3 3.1 if not legal:\n", " 35 4089 5568 1.4 1.7 k -= 1\n", " 36 \n", " 37 4095 9705 2.4 2.9 g[k] = [g[j][0]] + g[k]\n", " 38 4095 8777 2.1 2.6 g[j] = g[j][1::]\n", " 39 \n", " 40 4095 5705 1.4 1.7 if PRINT:\n", " 41 print g\n", " 42 \n", " 43 16380 28153 1.7 8.5 S = [sum(s) for s in g]\n", " 44 4095 8937 2.2 2.7 X += [S[0]]\n", " 45 4095 7506 1.8 2.3 Y += [S[1]]\n", " 46 4095 6815 1.7 2.1 Z += [S[2]]\n", " 47 \n", " 48 1 2 2.0 0.0 return (X,Y,Z)\n", "\n" ] } ], "prompt_number": 113 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### memory_profiler\n", "\n", "We spent quite a while talking about generators, and the primary gain from those are the memory usage. Commonly we would like to know where we can avoid some memory expansions.\n", "\n", " pip install memory_profiler\n", " pip install psutil\n", "\n", "Then one must decorate the functions one wish to memory profile with a `@profile` decorator, just like with the `line_profiler`." ] }, { "cell_type": "code", "collapsed": false, "input": [ "%%bash\n", "python -m memory_profiler hanoi.py 6" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Filename: hanoi.py\n", "\n", "Line # Mem usage Increment Line Contents\n", "================================================\n", " 7 @profile\n", " 8 8.145 MB 0.000 MB def solve(g,n):\n", " 9 8.145 MB 0.000 MB X = [sum(g[0])]\n", " 10 8.145 MB 0.000 MB Y = [sum(g[1])]\n", " 11 8.145 MB 0.000 MB Z = [sum(g[2])]\n", " 12 \n", " 13 8.145 MB 0.000 MB moved = 0\n", " 14 8.160 MB 0.016 MB for i in range(2**n - 1):\n", " 15 8.246 MB 0.086 MB tops = [a[0] for a in g]\n", " 16 8.246 MB 0.000 MB movable = False\n", " 17 8.168 MB -0.078 MB j = 0\n", " 18 8.172 MB 0.004 MB while not movable:\n", " 19 8.246 MB 0.074 MB max_legal = max([t for t in tops if g[j][0] % 2 != t % 2])\n", " 20 8.168 MB -0.078 MB if g[j][0] != moved and g[j][0] > 0:\n", " 21 8.246 MB 0.078 MB num_zeros = len([z for z in tops if z == 0])\n", " 22 8.246 MB 0.000 MB if g[j][0] < max_legal or num_zeros > 0:\n", " 23 8.188 MB -0.059 MB movable = True\n", " 24 8.246 MB 0.059 MB if not movable:\n", " 25 8.242 MB -0.004 MB j += 1\n", " 26 \n", " 27 8.246 MB 0.004 MB moved = g[j][0]\n", " 28 \n", " 29 8.246 MB 0.000 MB legal = False\n", " 30 8.168 MB -0.078 MB k = 2\n", " 31 8.246 MB 0.078 MB while not legal:\n", " 32 8.250 MB 0.004 MB if (tops[k] % 2 != g[j][0] % 2 and g[j][0] < tops[k]) or tops[k] == 0:\n", " 33 8.188 MB -0.062 MB legal = True\n", " 34 8.246 MB 0.059 MB if not legal:\n", " 35 8.242 MB -0.004 MB k -= 1\n", " 36 \n", " 37 8.250 MB 0.008 MB g[k] = [g[j][0]] + g[k]\n", " 38 8.250 MB 0.000 MB g[j] = g[j][1::]\n", " 39 \n", " 40 8.160 MB -0.090 MB if PRINT:\n", " 41 print g\n", " 42 \n", " 43 8.250 MB 0.090 MB S = [sum(s) for s in g]\n", " 44 8.250 MB 0.000 MB X += [S[0]]\n", " 45 8.250 MB 0.000 MB Y += [S[1]]\n", " 46 8.250 MB 0.000 MB Z += [S[2]]\n", " 47 \n", " 48 8.250 MB 0.000 MB return (X,Y,Z)\n", "\n", "\n" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "/Users/valentinesvensson/.virtualenvs/devel/lib/python2.7/site-packages/memory_profiler.py:34: UserWarning: psutil module not found. memory_profiler will be slow\n", " warnings.warn(\"psutil module not found. memory_profiler will be slow\")\n" ] } ], "prompt_number": 115 }, { "cell_type": "code", "collapsed": false, "input": [ "%%bash\n" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Assignment 6\n", "\n", "### 1.\n", "\n", "Do the `about_generators.py` koans in python_koans\n", "\n", "### 2.\n", "\n", "Run cProfile, plop, line_profiler and memory_profiler for the two scripts you created for the previous exercises.\n", "\n", "With \"the two scripts\", I mean `check_repo.py` and `getting_data.py`.\n", "\n", "Make a new directory in the root of the repository called \"profile_results\"\n", "\n", "In there, put the results of the profiling, in the format `{script name}.{profiler name}`. So the output from `line_profiler` of `getting_data.py` should be in a file called `getting_data.py.line_profiler`.\n", "\n", "Special note about `plop`: For `plop`, the ouput is hardcoded to end up in `/tmp/plop.out`. Therefore, when you have run `plop` for, e.g. `getting_data.py`, you should move `/tmp/plop.out` to a file named `getting_data.py.plop` in the specified directory.\n", "\n", "(It should be noted that storing profile results in a repository like you should do here is not a standard way of doing things, but it gives us a way to assess you for this assignment.)" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }