{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "*This notebook contains an excerpt from the [Whirlwind Tour of Python](http://www.oreilly.com/programming/free/a-whirlwind-tour-of-python.csp) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/WhirlwindTourOfPython).*\n", "\n", "*The text and code are released under the [CC0](https://github.com/jakevdp/WhirlwindTourOfPython/blob/master/LICENSE) license; see also the companion project, the [Python Data Science Handbook](https://github.com/jakevdp/PythonDataScienceHandbook).*\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "< [List Comprehensions](11-List-Comprehensions.ipynb) | [Contents](Index.ipynb) | [Modules and Packages](13-Modules-and-Packages.ipynb) >" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Generators" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we'll take a deeper dive into Python generators, including *generator expressions* and *generator functions*." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generator Expressions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The difference between list comprehensions and generator expressions is sometimes confusing; here we'll quickly outline the differences between them:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### List comprehensions use square brackets, while generator expressions use parentheses\n", "This is a representative list comprehension:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[n ** 2 for n in range(12)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While this is a representative generator expression:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ " at 0x104a60518>" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(n ** 2 for n in range(12))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that printing the generator expression does not print the contents; one way to print the contents of a generator expression is to pass it to the ``list`` constructor:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "G = (n ** 2 for n in range(12))\n", "list(G)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A list is a collection of values, while a generator is a recipe for producing values\n", "When you create a list, you are actually building a collection of values, and there is some memory cost associated with that.\n", "When you create a generator, you are not building a collection of values, but a recipe for producing those values.\n", "Both expose the same iterator interface, as we can see here:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 1 4 9 16 25 36 49 64 81 100 121 " ] } ], "source": [ "L = [n ** 2 for n in range(12)]\n", "for val in L:\n", " print(val, end=' ')" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 1 4 9 16 25 36 49 64 81 100 121 " ] } ], "source": [ "G = (n ** 2 for n in range(12))\n", "for val in G:\n", " print(val, end=' ')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The difference is that a generator expression does not actually compute the values until they are needed.\n", "This not only leads to memory efficiency, but to computational efficiency as well!\n", "This also means that while the size of a list is limited by available memory, the size of a generator expression is unlimited!\n", "\n", "An example of an infinite generator expression can be created using the ``count`` iterator defined in ``itertools``:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "count(0)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from itertools import count\n", "count()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 1 2 3 4 5 6 7 8 9 10 " ] } ], "source": [ "for i in count():\n", " print(i, end=' ')\n", " if i >= 10: break" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The ``count`` iterator will go on happily counting forever until you tell it to stop; this makes it convenient to create generators that will also go on forever:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 11 13 17 19 23 29 31 37 41 " ] } ], "source": [ "factors = [2, 3, 5, 7]\n", "G = (i for i in count() if all(i % n > 0 for n in factors))\n", "for val in G:\n", " print(val, end=' ')\n", " if val > 40: break" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You might see what we're getting at here: if we were to expand the list of factors appropriately, what we would have the beginnings of is a prime number generator, using the Sieve of Eratosthenes algorithm. We'll explore this more momentarily." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A list can be iterated multiple times; a generator expression is single-use\n", "This is one of those potential gotchas of generator expressions.\n", "With a list, we can straightforwardly do this:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 1 4 9 16 25 36 49 64 81 100 121 \n", "0 1 4 9 16 25 36 49 64 81 100 121 " ] } ], "source": [ "L = [n ** 2 for n in range(12)]\n", "for val in L:\n", " print(val, end=' ')\n", "print()\n", "\n", "for val in L:\n", " print(val, end=' ')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A generator expression, on the other hand, is used-up after one iteration:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "G = (n ** 2 for n in range(12))\n", "list(G)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(G)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This can be very useful because it means iteration can be stopped and started:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 1 4 9 16 25 36 \n", "doing something in between\n", "49 64 81 100 121 " ] } ], "source": [ "G = (n**2 for n in range(12))\n", "for n in G:\n", " print(n, end=' ')\n", " if n > 30: break\n", "\n", "print(\"\\ndoing something in between\")\n", "\n", "for n in G:\n", " print(n, end=' ')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One place I've found this useful is when working with collections of data files on disk; it means that you can quite easily analyze them in batches, letting the generator keep track of which ones you have yet to see." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generator Functions: Using ``yield``\n", "We saw in the previous section that list comprehensions are best used to create relatively simple lists, while using a normal ``for`` loop can be better in more complicated situations.\n", "The same is true of generator expressions: we can make more complicated generators using *generator functions*, which make use of the ``yield`` statement.\n", "\n", "Here we have two ways of constructing the same list:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]\n", "[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]\n" ] } ], "source": [ "L1 = [n ** 2 for n in range(12)]\n", "\n", "L2 = []\n", "for n in range(12):\n", " L2.append(n ** 2)\n", "\n", "print(L1)\n", "print(L2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similarly, here we have two ways of constructing equivalent generators:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 1 4 9 16 25 36 49 64 81 100 121\n", "0 1 4 9 16 25 36 49 64 81 100 121\n" ] } ], "source": [ "G1 = (n ** 2 for n in range(12))\n", "\n", "def gen():\n", " for n in range(12):\n", " yield n ** 2\n", "\n", "G2 = gen()\n", "print(*G1)\n", "print(*G2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A generator function is a function that, rather than using ``return`` to return a value once, uses ``yield`` to yield a (potentially infinite) sequence of values.\n", "Just as in generator expressions, the state of the generator is preserved between partial iterations, but if we want a fresh copy of the generator we can simply call the function again." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example: Prime Number Generator\n", "Here I'll show my favorite example of a generator function: a function to generate an unbounded series of prime numbers.\n", "A classic algorithm for this is the *Sieve of Eratosthenes*, which works something like this:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]\n" ] } ], "source": [ "# Generate a list of candidates\n", "L = [n for n in range(2, 40)]\n", "print(L)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[2, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39]\n" ] } ], "source": [ "# Remove all multiples of the first value\n", "L = [n for n in L if n == L[0] or n % L[0] > 0]\n", "print(L)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[2, 3, 5, 7, 11, 13, 17, 19, 23, 25, 29, 31, 35, 37]\n" ] } ], "source": [ "# Remove all multiples of the second value\n", "L = [n for n in L if n == L[1] or n % L[1] > 0]\n", "print(L)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]\n" ] } ], "source": [ "# Remove all multiples of the third value\n", "L = [n for n in L if n == L[2] or n % L[2] > 0]\n", "print(L)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we repeat this procedure enough times on a large enough list, we can generate as many primes as we wish.\n", "\n", "Let's encapsulate this logic in a generator function:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97\n" ] } ], "source": [ "def gen_primes(N):\n", " \"\"\"Generate primes up to N\"\"\"\n", " primes = set()\n", " for n in range(2, N):\n", " if all(n % p > 0 for p in primes):\n", " primes.add(n)\n", " yield n\n", "\n", "print(*gen_primes(100))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's all there is to it!\n", "While this is certainly not the most computationally efficient implementation of the Sieve of Eratosthenes, it illustrates how convenient the generator function syntax can be for building more complicated sequences." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "< [List Comprehensions](11-List-Comprehensions.ipynb) | [Contents](Index.ipynb) | [Modules and Packages](13-Modules-and-Packages.ipynb) >" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.1" } }, "nbformat": 4, "nbformat_minor": 0 }