{ "metadata": { "name": "Iterables, Iterators, Generators" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Iterables, Iterators and Generators\n", "\n", "### Ian Ward\n", "\n", "### Ottawa Python Authors Group, 24 January 2013\n", "\n", "https://github.com/wardi/iterables-iterators-generators (or http://bit.ly/itergen )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A Gentle Introduction\n", "\n", "The first few examples are from Ned Bachelder's Python Iteration Talk http://bit.ly/pyiter\n", "\n", "Coming from another language you might find it natural to create a counter and increment it to iterate over a `list` in Python" ] }, { "cell_type": "code", "collapsed": false, "input": [ "my_list = [17, 23, 47, 51, 101, 173, 999, 1001]\n", "\n", "i = 0\n", "while i < len(my_list):\n", " v = my_list[i]\n", " print v\n", " i += 1" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "17\n", "23\n", "47\n", "51\n", "101\n", "173\n", "999\n", "1001" ] } ], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "You might also have been told about `range()` which almost lets you write a C-style for loop" ] }, { "cell_type": "code", "collapsed": false, "input": [ "for i in range(len(my_list)):\n", " v = my_list[i]\n", " print v" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "17\n", "23\n", "47\n", "51\n", "101\n", "173\n", "999\n", "1001" ] } ], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "But neither of the above are natural ways of iterating in python. We do this instead:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "for v in my_list:\n", " print v" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "17\n", "23\n", "47\n", "51\n", "101\n", "173\n", "999\n", "1001" ] } ], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Many types of objects may be iterated over this way. Iterating over strings produces single characters:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "for v in \"Hello\":\n", " print v" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "H\n", "e\n", "l\n", "l\n", "o" ] } ], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Iterating over a `dict` produces its keys (in no particular order):" ] }, { "cell_type": "code", "collapsed": false, "input": [ "d = {\n", " 'a': 1,\n", " 'b': 2,\n", " 'c': 3,\n", " }\n", "\n", "for v in d:\n", " print v\n", "# Note the strange order!" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "a\n", "c\n", "b" ] } ], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Iterating over a file object produces lines from that file, including the line termination:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "f = open(\"suzuki.txt\")\n", "for line in f:\n", " print \">\", line" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "> On education\n", "\n", "> \"Education has failed in a very serious way to convey the most important lesson science can teach: skepticism.\"\n", "\n", "> \"An educational system isn't worth a great deal if it teaches young people how to make a living but doesn't teach them how to make a life.\"" ] } ], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Objects that can be iterated over in python are called \"Iterables\", and a for loop isn't the only thing that accepts iterables.\n", "\n", "The list constructor takes any iterable. We can use this to make a list of the keys in a `dict`:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "list(d)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 7, "text": [ "['a', 'c', 'b']" ] } ], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or the characters in a string:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "list(\"Hello\")" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 8, "text": [ "['H', 'e', 'l', 'l', 'o']" ] } ], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "List comprehensions take iterables." ] }, { "cell_type": "code", "collapsed": false, "input": [ "ascii = [ord(x) for x in \"Hello\"]\n", "ascii" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 9, "text": [ "[72, 101, 108, 108, 111]" ] } ], "prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `sum()` function takes any iterable that produces numbers." ] }, { "cell_type": "code", "collapsed": false, "input": [ "sum(ascii)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 10, "text": [ "500" ] } ], "prompt_number": 10 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `str.join()` method takes any iterable that produces strings." ] }, { "cell_type": "code", "collapsed": false, "input": [ "\"-\".join(d)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 11, "text": [ "'a-c-b'" ] } ], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Iterables can produce any python object. The `re.finditer()` function returns an iterable that produces `re.match` objects." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import re\n", "suzuki = open(\"suzuki.txt\").read()\n", "for match in re.finditer(r'\\bs\\w+', suzuki):\n", " print match.group(0)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "serious\n", "science\n", "skepticism\n", "system" ] } ], "prompt_number": 12 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A \"classic\" iterable class\n", "\n", "Very old versions of Python supported iteration through the `__getitem__()` method, and this is still supported." ] }, { "cell_type": "code", "collapsed": true, "input": [ "class Lucky(object):\n", " def __getitem__(self, index):\n", " if index > 3:\n", " raise IndexError\n", " return 7" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 13 }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Lucky` is a class that will return `7` for any index less than or equal to `3`:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "lucky = Lucky()\n", "\n", "lucky[0]" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 14, "text": [ "7" ] } ], "prompt_number": 14 }, { "cell_type": "markdown", "metadata": {}, "source": [ "And raise an `IndexError` for larger indexes:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "lucky[6]" ], "language": "python", "metadata": {}, "outputs": [ { "ename": "IndexError", "evalue": "", "output_type": "pyerr", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mIndexError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m/home/ian/git/iterables-iterators-generators/\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mlucky\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m6\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m/home/ian/git/iterables-iterators-generators/\u001b[0m in \u001b[0;36m__getitem__\u001b[0;34m(self, index)\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m__getitem__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mindex\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mindex\u001b[0m \u001b[0;34m>\u001b[0m \u001b[0;36m3\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mIndexError\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 5\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0;36m7\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mIndexError\u001b[0m: " ] } ], "prompt_number": 15 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a perfectly well-behaved python iterable. We can loop over it:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "for number in lucky:\n", " print number" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "7\n", "7\n", "7\n", "7" ] } ], "prompt_number": 16 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or pass it to functions that take iterables:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "list(lucky)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 17, "text": [ "[7, 7, 7, 7]" ] } ], "prompt_number": 17 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Even the `in` operator works with it:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "7 in lucky" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 18, "text": [ "True" ] } ], "prompt_number": 18 }, { "cell_type": "markdown", "metadata": {}, "source": [ "But writing this sort of class it difficult.\n", "You need to be able to return a value for any index passed to `__getitem__()` but most\n", "of the time you really only want to produce items in order from first to last.\n", "\n", "Enter \"Iterators\"." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Iterators\n", "\n", "The naming is confusingly similar here, but it's important to understand the difference\n", "between *iterables* and *iterators*.\n", "\n", "Iterators are iterables with some kind of 'position' state and a `.next()` method.\n", "The `.next()` method may be called to produce the next item and update the internal state.\n", "\n", "Iterables are objects that produce an iterator when they are passed to the `iter()` builtin." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Iterators are Iterables with .next()](https://raw.github.com/wardi/iterables-iterators-generators/master/iterable_iterator.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Calling `iter()` on our \"classic\" iterable object produces a plain `iterator` instance" ] }, { "cell_type": "code", "collapsed": false, "input": [ "i = iter(lucky)\n", "i" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 19, "text": [ "" ] } ], "prompt_number": 19 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This plain `iterator` has a counter and the original object as its internal state.\n", "Calling `.next()` advances the counter and calls our \"classic\" iterable's `.__getitem__()` method." ] }, { "cell_type": "code", "collapsed": false, "input": [ "print i.next()\n", "print i.next()\n", "print i.next()\n", "print i.next()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "7\n", "7\n", "7\n", "7" ] } ], "prompt_number": 20 }, { "cell_type": "markdown", "metadata": {}, "source": [ "When we get to the end however, our `IndexError` exception is turned into a `StopIteration`\n", "exception. This is the iterator protocol: when `.next()` raises `StopIteration` there are\n", "no more items to be produced." ] }, { "cell_type": "code", "collapsed": false, "input": [ "print i.next() # raises StopIteration, *not* IndexError" ], "language": "python", "metadata": {}, "outputs": [ { "ename": "StopIteration", "evalue": "", "output_type": "pyerr", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mStopIteration\u001b[0m Traceback (most recent call last)", "\u001b[0;32m/home/ian/git/iterables-iterators-generators/\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;32mprint\u001b[0m \u001b[0mi\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mnext\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# raises StopIteration, *not* IndexError\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mStopIteration\u001b[0m: " ] } ], "prompt_number": 21 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Remember that an iterator *is* an iterable, so it can be passed to anything that takes\n", "an iterable.\n", "\n", "Be careful, though. Iterators may only be iterated over once, since they are updating\n", "their internal state as they go. If we try to iterate twice, the second time will produce\n", "no more items:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "i = iter(lucky)\n", "print list(i)\n", "print list(i)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[7, 7, 7, 7]\n", "[]" ] } ], "prompt_number": 22 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Calling on `iter()` on an iterable will produce a different iterator object each time." ] }, { "cell_type": "code", "collapsed": false, "input": [ "i is iter(lucky)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 24, "text": [ "False" ] } ], "prompt_number": 24 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Also, like other iterables, calling `iter()` on an iterator works, but it behaves differently!\n", "\n", "Calling `iter()` on an iterator typically returns the exact same iterator.\n", "If you think about it, that's all that can be done because\n", "you can't rewind or duplicate an iterator in the general case." ] }, { "cell_type": "code", "collapsed": false, "input": [ "i is iter(i)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 23, "text": [ "True" ] } ], "prompt_number": 23 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Iterators come in all shapes and sizes.\n", "\n", "`xrange()` has a `rangeiterator`:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "iter(xrange(20))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 25, "text": [ "" ] } ], "prompt_number": 25 }, { "cell_type": "markdown", "metadata": {}, "source": [ "`dict` has a `dictionary-keyiterator`:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "iter({'a': 1, 'b': 2})" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 26, "text": [ "" ] } ], "prompt_number": 26 }, { "cell_type": "markdown", "metadata": {}, "source": [ "`list` doesn't even use the plain `iterator` type, and instead uses its own more efficient `listiterator`:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "iter([4, 5, 6])" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 27, "text": [ "" ] } ], "prompt_number": 27 }, { "cell_type": "markdown", "metadata": {}, "source": [ "And some have names that provide no clue what they iterate over:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "re.finditer(r'\\bs\\w+', \"some text with swords\")" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 28, "text": [ "" ] } ], "prompt_number": 28 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A better iterable class\n", "\n", "You can choose the iterator that will be returned by `iter()` by defining your own\n", "`.__iter__()` method:" ] }, { "cell_type": "code", "collapsed": true, "input": [ "class Countdown(object):\n", " def __iter__(self): # must return an iterator!\n", " return iter([5, 4, 3, 2, 1, 'launch'])" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 29 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `for` loop and other places that take iterables internally use `iter()`, which calls our new\n", "`.__iter__()` method to create an iterator:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "for n in Countdown():\n", " print n" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "5\n", "4\n", "3\n", "2\n", "1\n", "launch" ] } ], "prompt_number": 30 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Iterators the hard way\n", "\n", "The example above is fine if we want to reuse an existing iterator\n", "(like the `listiterator` above), but what if we want to write a new iterator?\n", "\n", "We know the protocol, so one approach is to just implement it:" ] }, { "cell_type": "code", "collapsed": true, "input": [ "class CountdownIterator(object):\n", " def __init__(self):\n", " self._remaining = [5, 4, 3, 2, 1, 'launch']\n", " \n", " def __iter__(self):\n", " return self\n", " \n", " def next(self):\n", " if not self._remaining:\n", " raise StopIteration\n", " return self._remaining.pop(0)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 31 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our internal 'position' state is a list of items we pop\n", "one at a time when `.next()` is called.\n", "We implement `.__iter__()` in the normal way for an iterator: `\"return self\"`.\n", "We raise `StopIteration` when we have nothing left to produce.\n", "\n", "This works as expected, but it's rather a lot of code for a simple result." ] }, { "cell_type": "code", "collapsed": false, "input": [ "for n in CountdownIterator():\n", " print n" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "5\n", "4\n", "3\n", "2\n", "1\n", "launch" ] } ], "prompt_number": 32 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Generators\n", "\n", "A generator function is a simpler way to create an iterator.\n", "\n", "Generator functions let you use local\n", "variables and the position of the program counter as state for\n", "a generator object. A new generator object is created and returned\n", "each time you call a generator function. The generator object is an iterator.\n", "\n", "![Generators are Iterators created with a generator function or expression](https://raw.github.com/wardi/iterables-iterators-generators/master/iterable_iterator_generator.png)\n", "\n", "Here is a generator function that *only* uses the program counter for state:" ] }, { "cell_type": "code", "collapsed": true, "input": [ "def countdown_generator():\n", " yield 5\n", " yield 4\n", " yield 3\n", " yield 2\n", " yield 1\n", " yield 'launch'" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 33 }, { "cell_type": "markdown", "metadata": {}, "source": [ "When we call the generator function it does nothing except create a new generator object.\n", "None of the code in the generator function has been executed yet." ] }, { "cell_type": "code", "collapsed": false, "input": [ "countdown_generator()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 34, "text": [ "" ] } ], "prompt_number": 34 }, { "cell_type": "markdown", "metadata": {}, "source": [ "As the generator object is iterated over execution starts,\n", "following the generator function definition until the next `yield` statement.\n", "\n", "When it reaches the `yield` statement execution is paused (the program counter is stored)\n", "and the value on the right of the `yield` statement is produced as a value from the\n", "generator object. Execution is resumed from the stored program counter position when\n", "iteration continues.\n", "\n", "When the generator function reaches the end, the generator raises a `StopIteration`\n", "exception just like a normal iterator. And it behaves just like a normal iterator:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "for n in countdown_generator():\n", " print n" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "5\n", "4\n", "3\n", "2\n", "1\n", "launch" ] } ], "prompt_number": 35 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we have a much more concise way of defining our own iterator for an iterable class.\n", "The `.__iter__()` method of our class can be written as a generator function:" ] }, { "cell_type": "code", "collapsed": true, "input": [ "class Countdown(object):\n", " def __iter__(self):\n", " for n in [5, 4, 3, 2, 1, 'launch']:\n", " yield n" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 36 }, { "cell_type": "code", "collapsed": false, "input": [ "for n in Countdown():\n", " print n" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "5\n", "4\n", "3\n", "2\n", "1\n", "launch" ] } ], "prompt_number": 37 }, { "cell_type": "markdown", "metadata": {}, "source": [ "But, enough about classes. Let's dig further into how these generators work.\n", "\n", "Recall that execution of the code in a generator function does not proceed until\n", "the generator object returned is iterated over. That lets us put things in a generator\n", "that might be expensive, knowing that we will only have to pay that cost when we actually\n", "ask it to produce the next item.\n", "\n", "This generator causes a for loop to slow down between iterations. First waiting 5 seconds,\n", "then counting down from \"5\" to \"1\" with 1 seconds intervals in between:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import time\n", "\n", "def slow_generator():\n", " time.sleep(5)\n", " yield 5\n", " time.sleep(1)\n", " yield 4\n", " time.sleep(1)\n", " yield 3\n", " time.sleep(1)\n", " yield 2\n", " time.sleep(1)\n", " yield 1\n", " time.sleep(1)\n", "\n", "print \"starting\"\n", "for n in slow_generator():\n", " print n\n", "print \"done\"" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "starting\n", "5" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "4" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "3" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "2" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "done" ] }, { "output_type": "stream", "stream": "stdout", "text": [] } ], "prompt_number": 38 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another way of writing this code is to turn the generator inside-out.\n", "\n", "Instead of sleeping inside the generator we can `yield`\n", "the amount of time we want to sleep.\n", "And instead of `yield`-ing the countdown we can use a function passed in to\n", "display values to the user." ] }, { "cell_type": "code", "collapsed": true, "input": [ "##\n", "def countdown_generator(fn):\n", " yield 5\n", " fn(5)\n", " yield 1\n", " fn(4)\n", " yield 1\n", " fn(3)\n", " yield 1\n", " fn(2)\n", " yield 1\n", " fn(1)\n", " yield 1" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 39 }, { "cell_type": "markdown", "metadata": {}, "source": [ "A `show()` function takes the place of the print inside the loop, and the `time.sleep()`\n", "call is done by the code iterating over the generator. This puts the code driving the\n", "generator in charge of how (or if) it sleeps for the given time." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def show(n):\n", " print n\n", "\n", "print \"starting\"\n", "for s in countdown_generator(show):\n", " time.sleep(s)\n", "print \"done\"" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "starting\n", "5" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "4" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "3" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "2" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "done" ] }, { "output_type": "stream", "stream": "stdout", "text": [] } ], "prompt_number": 40 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Generators as coroutines\n", "\n", "While a generator object is an iterator, it can also be used for much more.\n", "\n", "When paused at a `yield` statement generator objects can receive data by\n", "using `.send()` instead of `.next()`.\n", "\n", "When we use `yield` as an expression or assign it to a variable, the value\n", "passed to `.send()` is available inside the generator." ] }, { "cell_type": "code", "collapsed": true, "input": [ "def knock_knock():\n", " name = yield \"Who's there?\"\n", " yield \"%s who?\" % name\n", " yield \"That's not funny at all\"" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 41 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have to switch to manually calling `.next()` on our generator object, because a\n", "`for` loop or function that takes an iterable won't be able to call `.send()` when\n", "we need to." ] }, { "cell_type": "code", "collapsed": false, "input": [ "k = knock_knock()\n", "k.next()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 42, "text": [ "\"Who's there?\"" ] } ], "prompt_number": 42 }, { "cell_type": "markdown", "metadata": {}, "source": [ "At this point execution is paused at the first `yield`. The assignment to the variable\n", "`name` hasn't happened yet. But when we `.send()` a value execution continues:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "k.send(\"David\")" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 43, "text": [ "'David who?'" ] } ], "prompt_number": 43 }, { "cell_type": "markdown", "metadata": {}, "source": [ "And in the generator object we are at the second `yield` with `\"David\"` assigned to `name`.\n", "\n", "If we send something to a `yield` that isn't being used as an expression, the value we send\n", "will be ignored:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "k.send(\"David the environmentalist\")" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 44, "text": [ "\"That's not funny at all\"" ] } ], "prompt_number": 44 }, { "cell_type": "markdown", "metadata": {}, "source": [ "But execution continues the same as if we called `.next()`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the end of part 1.\n", "\n", "In part 2 we will build a simple interactive network game with 90% of the code written as\n", "generators. I will show how breaking down asynchronous code into generators can make\n", "it easy to test, easy to reuse, and (with some practice) easy to understand.\n", "\n", "-----" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Generators in Depth\n", "\n", "Generators' ability to accept as well as produce data is useful for many purposes. The interface to generator objects is limited, unlike instances of classes we create, but generators enable a very powerful programming style.\n", "\n", "Here is a \"running average\" calculator implemented as a generator. As shown in the previous section the generator state (`total`, `count`) is stored in local variables.\n", "\n", "Notice the extra parenthesis around the first `yield` expression. This is required any time `yield` doesn't appear immediately to the right of an assignment operator." ] }, { "cell_type": "code", "collapsed": true, "input": [ "##\n", "def running_avg():\n", " \"coroutine that accepts numbers and yields their running average\"\n", " total = float((yield))\n", " count = 1\n", " while True:\n", " i = yield total / count\n", " count += 1\n", " total += i" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This type of generator is driven by input to `.send()`. However, the code inside a generator must execute up to the first yield expression before it can accept any value from the caller. That means that the caller always has to call `.next()` (or `.send(None)`) once after creating a generator object before sending it data.\n", "\n", "If you try to send a value first instead, Python has a helpful error for you." ] }, { "cell_type": "code", "collapsed": false, "input": [ "r = running_avg()\n", "r.send(10)" ], "language": "python", "metadata": {}, "outputs": [ { "ename": "TypeError", "evalue": "can't send non-None value to a just-started generator", "output_type": "pyerr", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[1;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[0mr\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mrunning_avg\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 2\u001b[1;33m \u001b[0mr\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0msend\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;36m10\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[1;31mTypeError\u001b[0m: can't send non-None value to a just-started generator" ] } ], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "So let's call `.next()` to advance the program counter to the first `yield` expression, then start passing values we would like averaged." ] }, { "cell_type": "code", "collapsed": false, "input": [ "r.next()\n", "r.send(10)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 3, "text": [ "10.0" ] } ], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "r.send(5)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 4, "text": [ "7.5" ] } ], "prompt_number": 4 }, { "cell_type": "code", "collapsed": false, "input": [ "r.send(0)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 5, "text": [ "5.0" ] } ], "prompt_number": 5 }, { "cell_type": "code", "collapsed": false, "input": [ "r.send(0)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 6, "text": [ "3.75" ] } ], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A convenient decorator\n", "\n", "The flow is always the same when working with generators.\n", "\n", "1. a generator object is created by the caller\n", "2. the caller starts the generator\n", "3. the generator passes data to the caller (or signals the end of the sequence)\n", "4. the caller passes data to the generator\n", "5. repeat from (3)\n", "\n", "For generators that are driven by input to `.send()` no data is transferred in the first 3 steps above.\n", "\n", "This is a decorator that arranges for `.next()` to be called once immediately after a generator is created. This will turn a generator function into a function that returns a generator immediately ready to receive data (step 4)." ] }, { "cell_type": "code", "collapsed": true, "input": [ "##\n", "def advance_generator_once(original_fn):\n", " \"decorator to advance a generator once immediately after it is created\"\n", " def actual_call(*args, **kwargs):\n", " gen = original_fn(*args, **kwargs)\n", " assert gen.next() is None\n", " return gen\n", " return actual_call" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 51 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We apply our decorator to `running_avg()`." ] }, { "cell_type": "code", "collapsed": false, "input": [ "##\n", "running_avg = advance_generator_once(running_avg)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And now we don't need to call `.next()` every time we create a new running average generator." ] }, { "cell_type": "code", "collapsed": false, "input": [ "r = running_avg()\n", "r.send(42)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 52, "text": [ "42.0" ] } ], "prompt_number": 52 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Coroutine development is protocol development\n", "\n", "As shown, one of the ways to pass a message to a generator is with `.send()`. This interface allows you to pass a single object to a generator. For this object we can pass tuples, dicts or anything else we choose.\n", "\n", "You decide the protocol for your generator by documenting the types and values of objects you will send from caller to generator and yield from generator to caller.\n", "\n", "Tuples are perfect for a generator that needs two objects each time, e.g. a player number and a key press.\n", "\n", "This is a Rock-Paper-Scissors game where each player's play is passed in separately, and once both players have played the result of the game is yielded. Players can change their mind choose a different play if the other player hasn't chosen yet. Games will continue indefinitately.\n", "\n", "This generator uses a common pattern of storing the `result` that will be yielded in a local variable so that there are fewer `yield` statements in the generator function. Having fewer `yield` statements makes it easier to understand where it is possible for execution to be paused within the generator function.\n", "\n", "The outer `while` loop runs once for each full game. The inner `while` loop collects input from the users until the game result can be decided." ] }, { "cell_type": "code", "collapsed": true, "input": [ "##\n", "@advance_generator_once\n", "def rock_paper_scissors():\n", " \"\"\"\n", " coroutine for playing rock-paper-scissors\n", "\n", " yields: 'invalid key': invalid input was sent\n", " ('win', player, choice0, choice1): when a player wins\n", " ('tie', None, choice0, choice1): when there is a tie\n", " None: when waiting for more input\n", "\n", " accepts to .send(): (player, key):\n", " player is 0 or 1, key is a character in 'rps'\n", " \"\"\"\n", " valid = 'rps'\n", " wins = 'rs', 'sp', 'pr'\n", " result = None\n", "\n", " while True:\n", " chosen = [None, None]\n", " while None in chosen:\n", " player, play = yield result\n", " result = None\n", " if play in valid:\n", " chosen[player] = play\n", " else:\n", " result = 'invalid key'\n", " \n", " if chosen[0] + chosen[1] in wins:\n", " result = ('win', 0) + tuple(chosen)\n", " elif chosen[1] + chosen[0] in wins:\n", " result = ('win', 1) + tuple(chosen)\n", " else:\n", " result = ('tie', None) + tuple(chosen)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 53 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We play this game by passing `(player number, play)` tuples to `.send()` " ] }, { "cell_type": "code", "collapsed": false, "input": [ "rps = rock_paper_scissors()\n", "rps.send((0, 'r'))\n", "rps.send((1, 'p'))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 54, "text": [ "('win', 1, 'r', 'p')" ] } ], "prompt_number": 54 }, { "cell_type": "code", "collapsed": false, "input": [ "rps.send((1, 's'))\n", "rps.send((1, 'p'))\n", "rps.send((1, 'z'))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 55, "text": [ "'invalid key'" ] } ], "prompt_number": 55 }, { "cell_type": "code", "collapsed": false, "input": [ "rps.send((0, 's'))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 56, "text": [ "('win', 0, 's', 'p')" ] } ], "prompt_number": 56 }, { "cell_type": "code", "collapsed": false, "input": [ "rps.send((0, 'r'))\n", "rps.send((1, 'r'))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 57, "text": [ "('tie', None, 'r', 'r')" ] } ], "prompt_number": 57 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Make the caller wait\n", "\n", "It makes sense to design your generator protocols so that any waiting has to be done by the caller. This includes waiting for user input, IO completion and timers. The rock, paper scissors example above and all the examples to follow are designed this way.\n", "\n", "The benefit of waiting in the caller is that other processing (and other generators) can be \"running\" at the same time without needing multiple processes or threads." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Generators with try: finally:\n", "\n", "`try:` `finally:` blocks in normal Python code are great for managing an external resources. `try:` `finally:` blocks work just fine in generators too, you can use them to ensure a resource aquired inside a generator is properly cleaned up no matter how the generator is used. The generator could run to completion, raise an exception or simply be garbage collected after the last reference to it is removed and the `finally:` block will always be executed.\n", "\n", "This generator sets the terminal attached to `fd` into \"cbreak\" mode when it is created and then restores the original terminal setting on exit. cbreak mode allows reading each key the user types as they are typed, instead of waiting for a full line of text.\n", "\n", "Note that the first `yield` statement is inside the `try:` block, guaranteeing that the `finally:` block will be executed eventually. The first `yield` statement yields `None` (which is what is expected by the `advance_generator_once` decorator). The `advance_generator_once` decorator makes sure `.next()` is immediately called on this generator after creation, so the terminal is always set to cbreak mode after the caller calls `cbreak_keys()`.\n", "\n", "The `finally:` block in this generator is useful in normal operation and for debugging. When the program exits the terminal will be restored to normal mode. If the terminal is left in cbreak mode tracebacks shown will be sprayed across the screen in a hard-to-read manner, and the user's shell prompt won't echo input normally." ] }, { "cell_type": "code", "collapsed": true, "input": [ "##\n", "import os\n", "import termios\n", "import tty\n", "\n", "@advance_generator_once\n", "def cbreak_keys(fd):\n", " \"enter cbreak mode and yield keys as they arrive\"\n", " termios_settings = termios.tcgetattr(fd)\n", " tty.setcbreak(fd)\n", " try:\n", " yield # initialize step\n", " while True:\n", " yield os.read(fd, 1)\n", " finally:\n", " termios.tcsetattr(fd, termios.TCSADRAIN, termios_settings)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 58 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Testing this generator involves creating a \"psuedo-tty\" for the generator to connect to, then simulating user input on that pseudo-tty. In this test we read just 5 characters because reading any more would block." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pty\n", "import itertools\n", "\n", "master, slave = pty.openpty()\n", "old_settings = termios.tcgetattr(master)\n", "os.write(slave, \"hello\")\n", "\n", "c = cbreak_keys(master)\n", "for i in range(5):\n", " print c.next()," ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "h e l l o" ] } ], "prompt_number": 59 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We still hold a reference to the generator, so our pseudo-tty is still in cbreak mode." ] }, { "cell_type": "code", "collapsed": false, "input": [ "old_settings == termios.tcgetattr(master)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 60, "text": [ "False" ] } ], "prompt_number": 60 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We could remove the reference, or we can be more explicit by calling the generator's `.close()` method. The `.close()` method causes a `GeneratorExit` exception to be raised *within the generator*.\n", "\n", "When the caller uses the `.close()` method it expects the generator to exit. If the generator yields a value instead, `.close()` raises a `RuntimeError`. Generators must exit normally or raise an exception when a `GeneratorExit` is raised." ] }, { "cell_type": "code", "collapsed": false, "input": [ "c.close()\n", "old_settings == termios.tcgetattr(master)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 61, "text": [ "True" ] } ], "prompt_number": 61 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sockets are another type of external resource that needs to be properly managed.\n", "\n", "When using sockets it's polite to shut down the socket so the other side knows you're done sending. It's also a very good idea to close the socket to free up the file descriptor allocated by the operating system.\n", "\n", "This generator reads individual characters from a socket and on exit politely tries to shutdown the socket, and next ensures that the socket is closed." ] }, { "cell_type": "code", "collapsed": true, "input": [ "##\n", "import socket\n", "\n", "@advance_generator_once\n", "def socket_read_and_close(sock):\n", " \"yields strings from sock and ensures sock.shutdown() is called\"\n", " try:\n", " b = None\n", " while b != '':\n", " yield b\n", " b = sock.recv(1)\n", " finally:\n", " try:\n", " sock.shutdown(socket.SHUT_RDWR)\n", " except socket.error:\n", " pass\n", " sock.close()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 62 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Testing socket code takes quite a few lines of boilerplate. We have to find a free port, set up a server socket and then accept a client connection as another socket object.\n", "\n", "We send some text and immediately disconnect the \"client\" to verify that our generator exits cleanly after reading all the data on the \"server\" side." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import socket\n", "server = socket.socket()\n", "server.bind(('', 0))\n", "server.listen(0)\n", "host, port = server.getsockname()\n", "\n", "client = socket.socket()\n", "client.connect(('localhost', port))\n", "s, remote = server.accept()\n", "\n", "client.send(\"world\")\n", "client.shutdown(socket.SHUT_RDWR)\n", "\n", "for c in socket_read_and_close(s):\n", " print c," ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "w o r l d" ] } ], "prompt_number": 63 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The socket object has been closed, so if we try to wrap another generator around it we expect to get an error." ] }, { "cell_type": "code", "collapsed": false, "input": [ "socket_read_and_close(s).next()" ], "language": "python", "metadata": {}, "outputs": [ { "ename": "error", "evalue": "[Errno 9] Bad file descriptor", "output_type": "pyerr", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31merror\u001b[0m Traceback (most recent call last)", "\u001b[0;32m/home/ian/git/iterables-iterators-generators/\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0msocket_read_and_close\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ms\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mnext\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m/home/ian/git/iterables-iterators-generators/\u001b[0m in \u001b[0;36msocket_read_and_close\u001b[0;34m(sock)\u001b[0m\n\u001b[1;32m 9\u001b[0m \u001b[0;32mwhile\u001b[0m \u001b[0mb\u001b[0m \u001b[0;34m!=\u001b[0m \u001b[0;34m''\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 10\u001b[0m \u001b[0;32myield\u001b[0m \u001b[0mb\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 11\u001b[0;31m \u001b[0mb\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msock\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrecv\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 12\u001b[0m \u001b[0;32mfinally\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 13\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/usr/lib/python2.7/socket.pyc\u001b[0m in \u001b[0;36m_dummy\u001b[0;34m(*args)\u001b[0m\n\u001b[1;32m 168\u001b[0m \u001b[0m__slots__\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 169\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_dummy\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 170\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0merror\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mEBADF\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'Bad file descriptor'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 171\u001b[0m \u001b[0;31m# All _delegate_methods must also be initialized here.\u001b[0m\n\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 172\u001b[0m \u001b[0msend\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mrecv\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mrecv_into\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msendto\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mrecvfrom\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mrecvfrom_into\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_dummy\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31merror\u001b[0m: [Errno 9] Bad file descriptor" ] } ], "prompt_number": 64 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### State machine generators\n", "\n", "Generators are great for representing state machines.\n", "\n", "Telnet is a protocol for allowing untrusted, insecure connections from remote hosts. It's one option for plain terminal connections, and almost every operating system comes with a client.\n", "\n", "For reading user input however, telnet has a bunch of control sequences that we need to filter out. Fortunately filtering out the control sequences with a little state machine is easy.\n", "\n", "Starting in telnet normal mode all bytes except 255 are user input. A 255 byte enters command mode, which is followed by either one command byte and one parameter byte, or a 250 command byte and any number of subnegotiation bytes ending with a 240. On the transitions we mark `True` for user data and `False` for part of a control sequence." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Telnet command filter](https://raw.github.com/wardi/iterables-iterators-generators/master/telnet_filter.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The generator that implements this state machine isn't much longer than the diagram that describes it. The states \"normal\", \"command\", \"parameter\" and \"subnegotiation\" are represented as `yield` statements. The transitions are normal python flow statements `while:` `if:` and `continue`.\n", "\n", "The pattern of storing the result in a local variable `actual_input` is used here for the \"normal\" state because some transitions to the normal state yield `True` and others yield `False`." ] }, { "cell_type": "code", "collapsed": true, "input": [ "##\n", "@advance_generator_once\n", "def telnet_filter():\n", " \"\"\"\n", " coroutine accepting characters and yielding True when the character\n", " passed is actual input or False when it is part of a telnet command.\n", " \"\"\"\n", " actual_input = None\n", " while True:\n", " key = yield actual_input # normal\n", " if key != chr(255):\n", " actual_input = True\n", " continue\n", " \n", " key = yield False # command\n", " if key == chr(255):\n", " actual_input = True\n", " continue\n", " \n", " actual_input = False\n", " if key == chr(250):\n", " while key != chr(240):\n", " key = yield False # subnegotiation\n", " else:\n", " yield False # parameter" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 65 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This generator yields `True` and `False` values for each byte sent to it. To test it we print the characters sent where the generator yielded `True`. This is a capture of a control sequences sent from a modern telnet client with the 6 characters \"signal\" mixed between the control sequences." ] }, { "cell_type": "code", "collapsed": false, "input": [ "keep = telnet_filter()\n", "chatter = ('\\xff\\xfd\\x03si\\xff\\xfb\"gn\\xff\\xfa\"\\x03\\x01\\x00\\x00\\x03b'\n", " '\\x03\\x04\\x02\\x0f\\x05\\x00\\x00\\x07b\\x1c\\x08\\x02\\x04\\tB\\x1a'\n", " '\\n\\x02\\x7f\\x0b\\x02\\x15\\x0f\\x02\\x11\\x10\\x02\\x13\\x11\\x02'\n", " '\\xff\\xff\\x12\\x02\\xff\\xff\\xff\\xf0al\\xff\\xfd\\x01')\n", "for c in chatter:\n", " if keep.send(c):\n", " print c," ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "s i g n a l" ] } ], "prompt_number": 66 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Combining generators\n", "\n", "It's easy to combine simple generators for more advanced functionality. We can combine our `socket_read_and_close()` generator with some new telnet negotiation and our `telnet_filter()` generator to make a new \"clean\" character-by-character telnet input generator. This generator will yield only the user input from a telnet connection.\n", "\n", "Be careful of when combining generators, though. If a \"child\" generator raises a `StopIteration` exception when you call its `.next()` method that exception will propagate normally in the \"parent\" generator, causing it to exit as well. This can lead to some hard to find bugs, so always think about what you want to happen when your child raises `StopIteration`. If you want to handle `StopIteration` you can catch it like any other exception, or you can use the default parameter of the `next()` builtin function when advancing the child generator.\n", "\n", "In this case we do want the new `telnet_keys()` parent generator to stop when the `socket_read_and_close()` child generator does, so we let its `StopIteration` exception carry through." ] }, { "cell_type": "code", "collapsed": true, "input": [ "##\n", "@advance_generator_once\n", "def telnet_keys(sock):\n", " \"yields next key or None if key was a filtered out telnet command\"\n", " # negotiate character-by-character, disable echo\n", " sock.send('\\xff\\xfd\\x22\\xff\\xfb\\x01')\n", " keep = telnet_filter()\n", " s = socket_read_and_close(sock)\n", " yield\n", " while True:\n", " # allow StopIteration to carry through:\n", " c = s.next()\n", " if keep.send(c):\n", " yield c\n", " else:\n", " yield None" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 67 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### High-level decision making with generators\n", "\n", "Rock, paper, scissors isn't much fun when played single player in the Python read-eval-print-loop.\n", "\n", "We should be able to play it with single keystrokes on the console. Opponents (guests) should be able to connect to play against us. Once they connect we should play a few games before we choose a winner. We need to handle a guest disconnecting before all the games have been played. Also, if the guest waits too long to make a move they should be disconnected so someone else can play. Finally, we should have reporting of how well we've done against all the guests that have connected.\n", "\n", "For the game itself we can lay out the states and transitions. There are only really two states: \"waiting\" for a guest to connect and \"play\" while we're playing. \"Moved\" and \"win\" below are just steps in deciding whether we end up in the \"play\" or \"waiting\" state.\n", "\n", "There are many transitions with different effects. When guest enters a play we should disable the timeout that will disconnect them. When a player has moved, play may continue because one player still needs to move or because the game was a tie. When a game ends with one player winning play will continue until a full round has been played when the guest will be disconnected." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![game_machine states](https://raw.github.com/wardi/iterables-iterators-generators/master/game_machine.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This generator is responsible for providing prompts and feedback to both players, including game results. It accepts and processes input from both players and notifies players when their input is invalid. It also asks the caller to disconnect the opponent once enough games have been played by yielding `'waiting'`. Finally this generator manages control of the opponent timeout by yielding `'reset timeout'` when the timer should be (re)started and `'disable timeout'` when the opponent has entered a valid play.\n", "\n", "The caller is responsible for receiving input from the users, sending output to the users and informing the generator when the opponent connects, disconnects or has timed out. The disconnect and timeout conditions are somewhat exceptional so they have been implemented with the use of the generator's `.throw()` method.\n", "\n", "Exceptions passed to a generator's `.throw()` method are raised *within the generator* from the current `yield` statement, just like how the `.close()` method raises `GeneratorExit`. Exceptions may then be handled by the generator with `try:` `except:` blocks. Here we handle the exceptions by printing a message and returning to the outer `while:` loop (the \"waiting\" state)." ] }, { "cell_type": "code", "collapsed": true, "input": [ "##\n", "class Timeout(Exception):\n", " pass\n", "\n", "class Disconnect(Exception):\n", " pass\n", "\n", "def game_machine(game_factory, output, best_of=9):\n", " \"\"\"\n", " coroutine that manages and provides comminication for two-player games,\n", " best of N\n", "\n", " :param game_factory: a function that returns a game generator object\n", " :param output: a function that sends output to one or both players\n", " :param best_of: max games to play per guest (an odd number)\n", "\n", " yields: 'waiting' : waiting for a guest (disconnect any existing guest)\n", " 'play': playing a game, accepting input\n", " 'disable timeout': disable the guest timout, accepting input\n", " 'reset timeout': reset and start guest timeout, accepting input\n", "\n", " accepts to .send():\n", " ('join', guest_name): Guest guest_name joined\n", " ('key', (player_num, key)): Input from player player_num (0 or 1)\n", "\n", " accepts to .throw(): Disconnect: the guest disconnected\n", " Timeout: the guest timout fired\n", " \"\"\"\n", " ravg = running_avg()\n", " while True:\n", " event, value = yield 'waiting'\n", " if event != 'join':\n", " continue\n", " game = game_factory()\n", " wins = [0, 0]\n", " output(\"Player connected: {0}\".format(value), player=0)\n", " output(\"Welcome to the game\", player=1)\n", "\n", " try:\n", " response = 'reset timeout'\n", " while True:\n", " event, value = yield response\n", " response = 'play'\n", " if event != 'key':\n", " continue\n", " \n", " player, key = value\n", " result = game.send((player, key))\n", " if result == 'invalid key':\n", " output(\"Invalid key\", player=player)\n", " continue\n", " elif player == 1:\n", " response = 'disable timeout'\n", " if not result:\n", " continue\n", " \n", " outcome, player, play0, play1 = result\n", " output(\"Player 0: {0}, Player 1: {1}\".format(play0, play1))\n", " if outcome == 'win':\n", " wins[player] += 1\n", " output(\"Player {0} wins!\".format(player))\n", " output(\"Wins: {0} - {1}\".format(*wins))\n", " output(\"Overall: {0:5.2f}%\".format(\n", " (1 - ravg.send(player)) * 100), player=0)\n", " \n", " if any(count > best_of / 2 for count in wins):\n", " output(\"Thank you for playing!\")\n", " break\n", " response = 'reset timeout'\n", " \n", " except Disconnect:\n", " output(\"Opponent disconnected.\", player=0)\n", "\n", " except Timeout:\n", " output(\"Timed out. Good-bye\")" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 68 }, { "cell_type": "markdown", "metadata": {}, "source": [ "To test this generator we need a simple output function with a default value for the `player` parameter." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def say(t, player='all'):\n", " print player, \">\", t\n", "\n", "g = game_machine(rock_paper_scissors, say)\n", "g.next()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 69, "text": [ "'waiting'" ] } ], "prompt_number": 69 }, { "cell_type": "code", "collapsed": false, "input": [ "g.send(('join', 'Ernie'))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "0 > Player connected: Ernie\n", "1 > Welcome to the game" ] }, { "output_type": "pyout", "prompt_number": 70, "text": [ "'reset timeout'" ] } ], "prompt_number": 70 }, { "cell_type": "code", "collapsed": false, "input": [ "g.send(('key', (0, 'r')))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 71, "text": [ "'play'" ] } ], "prompt_number": 71 }, { "cell_type": "code", "collapsed": false, "input": [ "g.send(('key', (1, 's')))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "all > Player 0: r, Player 1: s\n", "all > Player 0 wins!\n", "all > Wins: 1 - 0\n", "0 > Overall: 100.00%" ] }, { "output_type": "pyout", "prompt_number": 72, "text": [ "'reset timeout'" ] } ], "prompt_number": 72 }, { "cell_type": "code", "collapsed": false, "input": [ "g.throw(Disconnect())" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "0 > Opponent disconnected." ] }, { "output_type": "pyout", "prompt_number": 73, "text": [ "'waiting'" ] } ], "prompt_number": 73 }, { "cell_type": "code", "collapsed": false, "input": [ "g.send(('join', 'Bert'))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "0 > Player connected: Bert\n", "1 > Welcome to the game" ] }, { "output_type": "pyout", "prompt_number": 74, "text": [ "'reset timeout'" ] } ], "prompt_number": 74 }, { "cell_type": "code", "collapsed": false, "input": [ "g.send(('key', (1, 'x')))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1 > Invalid key" ] }, { "output_type": "pyout", "prompt_number": 75, "text": [ "'play'" ] } ], "prompt_number": 75 }, { "cell_type": "code", "collapsed": false, "input": [ "g.send(('key', (1, 'p')))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 76, "text": [ "'disable timeout'" ] } ], "prompt_number": 76 }, { "cell_type": "code", "collapsed": false, "input": [ "g.send(('key', (0, 'r')))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "all > Player 0: r, Player 1: p\n", "all > Player 1 wins!\n", "all > Wins: 0 - 1\n", "0 > Overall: 50.00%" ] }, { "output_type": "pyout", "prompt_number": 77, "text": [ "'reset timeout'" ] } ], "prompt_number": 77 }, { "cell_type": "code", "collapsed": false, "input": [ "for player, key in [(0, 's'), (1, 'p')] * 5:\n", " r = g.send(('key', (player, key)))\n", "r" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "all > Player 0: s, Player 1: p\n", "all > Player 0 wins!\n", "all > Wins: 1 - 1\n", "0 > Overall: 66.67%\n", "all > Player 0: s, Player 1: p\n", "all > Player 0 wins!\n", "all > Wins: 2 - 1\n", "0 > Overall: 75.00%\n", "all > Player 0: s, Player 1: p\n", "all > Player 0 wins!\n", "all > Wins: 3 - 1\n", "0 > Overall: 80.00%\n", "all > Player 0: s, Player 1: p\n", "all > Player 0 wins!\n", "all > Wins: 4 - 1\n", "0 > Overall: 83.33%\n", "all > Player 0: s, Player 1: p\n", "all > Player 0 wins!\n", "all > Wins: 5 - 1\n", "0 > Overall: 85.71%\n", "all > Thank you for playing!" ] }, { "output_type": "pyout", "prompt_number": 78, "text": [ "'waiting'" ] } ], "prompt_number": 78 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Driving an event loop with a generator\n", "\n", "Now we need to pull all the peices together in a way that a `select.select()` loop can run the whole thing.\n", "\n", "This generator accepts input on the console for player 0 with `cbreak_keys()`, and listens on a TCP port for player 1 (the guest) to connect. When player 1 connects it accepts input on that socket as a telnet connection with `telnet_keys()`. Input from both players is forwarded to `game_machine()`. When `game_machine()` asks for the timer to be reset this generator creates a `countdown_generator()` that will be used to give player 1 a countdown as their time is running out.\n", "\n", "The caller is expected to be running `select.select()` in a loop. `select.select()` expects a list of file descriptors that might have data waiting and optionally a timeout value. The file descriptors passed will either be the console and server socket when player 1 is not connected, or the console and the client socket when player 1 is connected. The timeout will be set if there is a countdown in progress. `select.select()` yields a list of the file descriptors that are readable, or an empty list if the timeout was reached." ] }, { "cell_type": "code", "collapsed": true, "input": [ "##\n", "import socket\n", "import sys\n", "import time\n", "\n", "def console_telnet_game_loop(game_factory, countdown_factory, best_of=9,\n", " stdin=None, port=12333, now=None):\n", " \"\"\"\n", " Coroutine that manages IO from console and incoming telnet connections\n", " (one client at a time), and tracks a timeout for the telnet clients.\n", " Console and telnet client act as player 0 and 1 of a game_machine.\n", "\n", " :param game_factory: passed to game_machine()\n", " :param coutdown_factory: function returning a countdown generator\n", " :param best_of: passed to game_machine()\n", " :param stdin: file object to use for player 0, default: sys.stdin\n", " :param port: telnet port to listen on for player 1\n", " :param now: function to use to get the current time in seconds,\n", " default: time.time\n", "\n", " yields args for select.select(*args)\n", "\n", " accepts to .send() the fd lists returned from select.select()\n", " \"\"\"\n", " if stdin is None:\n", " stdin = sys.stdin\n", " if now is None:\n", " now = time.time\n", " \n", " server = socket.socket()\n", " server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)\n", " server.bind(('', port))\n", " server.listen(0)\n", " print \"Listening for telnet connections on port\", port\n", " \n", " client = None\n", " client_reader = None\n", " timeout = None\n", " countdown = None\n", " local_fd = stdin.fileno()\n", " local_user = cbreak_keys(local_fd)\n", " \n", " def output(txt, player='all'):\n", " if player != 1:\n", " print txt\n", " if player != 0 and client:\n", " client.send(txt + '\\r\\n')\n", "\n", " g = game_machine(game_factory, output, best_of)\n", " state = g.next()\n", " \n", " while True:\n", " if state == 'waiting' and client:\n", " client = client_reader = timeout = None\n", " if state == 'reset timeout':\n", " countdown = countdown_factory(lambda n: output(str(n), player=1))\n", " timeout = time.time() + countdown.next()\n", " state = 'play'\n", " if state == 'disable timeout':\n", " countdown = timeout = None\n", " \n", " telnet_fd = client.fileno() if client else server.fileno()\n", " timeout_seconds = max(0, timeout - now()) if timeout else None\n", " \n", " readable, _, _ = yield [local_fd, telnet_fd], [], [], timeout_seconds\n", " \n", " if not readable: # no files to read == timeout, advance countdown\n", " try:\n", " timeout = now() + countdown.next()\n", " except StopIteration:\n", " state = g.throw(Timeout())\n", " timeout = None\n", " continue\n", " if local_fd in readable: # local user input\n", " state = g.send(('key', (0, local_user.next())))\n", " readable.remove(local_fd)\n", " continue\n", " if client: # client input\n", " try:\n", " key = client_reader.next()\n", " except StopIteration:\n", " state = g.throw(Disconnect())\n", " else:\n", " if key: # might be None if telnet commands were filtered\n", " state = g.send(('key', (1, key)))\n", " continue\n", " # accept a new client connection\n", " client, addr = server.accept()\n", " client_reader = telnet_keys(client)\n", " client_reader.next()\n", " state = g.send(('join', str(addr)))" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 79 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The big picture" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Overview diagram](https://raw.github.com/wardi/iterables-iterators-generators/master/everything.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The last peice we need is a `main()` funtion to drive our generators. All the waiting for IO and timers is done here, controlled from the generators above.\n", "\n", "This design lets us easily switch to different event loop or async library. We could also run many of these games or many different tasks on the same event loop." ] }, { "cell_type": "code", "collapsed": true, "input": [ "##\n", "import select\n", "\n", "def main():\n", " loop = console_telnet_game_loop(rock_paper_scissors, countdown_generator)\n", " fd_lists = None\n", " while True:\n", " select_args = loop.send(fd_lists)\n", " fd_lists = select.select(*select_args)\n", "\n", "if __name__ == \"__main__\":\n", " main()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We end up with about 300 lines of code, including docstrings and more than 90% of that code is generators.\n", "\n", "You may download it from https://raw.github.com/wardi/iterables-iterators-generators/master/rps_server.py\n", "\n", "This code is a complete interactive game server. It supports character-by-character input with one player on the console and one connected by telnet. It detects players disconnecting and has a timer and countdown for disconnecting idle players. It also collects and reports game statistics.\n", "\n", "This code contains no classes other than the simple `Timeout` and `Disconnect` exceptions. Message passing is accomplished with tuples, strings and integers.\n", "\n", "All of the code is easy to test. The limited nature of the generator interface encourages us to break up code into small, simple, testable parts.\n", "\n", "This code is 100% asynchronous with no threads, callback chains, queues, locking or monkey patching required." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Summary\n", "\n", "Iterables\n", "\n", "- may create an iterator with `iter(x)`\n", "\n", " - implement `.__getitem__()`, which raises `IndexError` after the last element\n", " - *or* implement `.__iter__()`\n", "\n", "Iterators\n", "\n", "- are iterables, where `iter(x)` typically returns `x`\n", "- maintain some kind of \"position\" state\n", "- implement `.next()`, which raises `StopIteration` after the last element\n", "\n", "Generators\n", "\n", "- are iterators\n", "- come with `.next()`, `.send(value)`, `.close()` and `.throw(exception)` methods for free!\n", "- \"position\" state includes local variables and program counter\n", "- [PEP 380](http://www.python.org/dev/peps/pep-0380/): delegation with \"yield from\" available in Python 3.3+\n", "- CAUTION: `StopIteration` exceptions will cause trouble: consider next(x, default) builtin\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### See also\n", "\n", "- Python's built-in [itertools](http://docs.python.org/2/library/itertools.html) module inspired by constructs from APL, Haskell, and SML\n", "- Ned Bachelder's Python Iteration Talk http://bit.ly/pyiter\n", "- David M. Beazley's Generator Tricks for Systems Programmers http://www.dabeaz.com/generators/\n", "- The future: [PEP 3156](http://www.python.org/dev/peps/pep-3156/):\n", " Unified Asynchronous IO Support (Twisted, Tornado, ZeroMQ)" ] } ], "metadata": {} } ] }