{ "metadata": { "name": "9932_03_01" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Chapter 3, example 1\n", "====================\n", "\n", "We illustrate the notion of array computation with a simple example: finding, among a large list of locations, the one closest to a position of interest." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pure Python version\n", "\n", "The pure Python version loops through all positions, computes the squared distance from each one to the position of interest, and returns the index of the closest location. Comparing squared distances is sufficient because squaring preserves their ordering." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import random\n", "\n", "def closest(position, positions):\n", "    x0, y0 = position\n", "    # Smallest squared distance found so far, and its index.\n", "    dbest, ibest = None, None\n", "    for i, (x, y) in enumerate(positions):\n", "        # Squared Euclidean distance to the position of interest.\n", "        d = (x - x0) ** 2 + (y - y0) ** 2\n", "        if dbest is None or d < dbest:\n", "            dbest, ibest = d, i\n", "    return ibest" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We generate a list of random positions." ] }, { "cell_type": "code", "collapsed": false, "input": [ "positions = [(random.random(), random.random()) for _ in xrange(10000000)]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we measure the time required to find the position closest to `(0.5, 0.5)`." ] }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit closest((.5, .5), positions)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1 loops, best of 3: 12.8 s per loop\n" ] } ], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Vectorized version\n", "\n", "This version uses NumPy to compute all distances in a vectorized way, without using Python loops. It is far more efficient than the pure Python version." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we activate pylab mode so that NumPy is loaded and its objects are available in the current namespace." ] }, { "cell_type": "code", "collapsed": false, "input": [ "%pylab" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Welcome to pylab, a matplotlib-based Python environment [backend: module://IPython.zmq.pylab.backend_inline].\n", "For more information, type 'help(pylab)'.\n" ] } ], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Generating random positions with NumPy's `rand` function is much faster than building a list in a Python loop." ] }, { "cell_type": "code", "collapsed": false, "input": [ "positions = rand(10000000, 2)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "`positions` is a NumPy `ndarray` object." ] }, { "cell_type": "code", "collapsed": false, "input": [ "type(positions)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 6, "text": [ "numpy.ndarray" ] } ], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "It has two dimensions and a shape of `(10000000, 2)`." ] }, { "cell_type": "code", "collapsed": false, "input": [ "positions.ndim, positions.shape" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 7, "text": [ "(2, (10000000, 2))" ] } ], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can easily extract the columns of this array, which here contain the x and y coordinates of all positions."
] }, { "cell_type": "code", "collapsed": false, "input": [ "x, y = positions[:,0], positions[:,1]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we compute the squared distances in a vectorized fashion, which is much more efficient than with a Python loop." ] }, { "cell_type": "code", "collapsed": false, "input": [ "distances = (x - .5) ** 2 + (y - .5) ** 2" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 9 }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit exec In[9]" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1 loops, best of 3: 341 ms per loop\n" ] } ], "prompt_number": 10 }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit ibest = distances.argmin()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "10 loops, best of 3: 24.2 ms per loop\n" ] } ], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Computing the distances and finding the minimum takes about 365 ms in total with NumPy, versus 12.8 s in pure Python: NumPy is about 35 times faster here." ] } ], "metadata": {} } ] }