{ "metadata": { "name": "501-parallel-computing" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Chapter 5, example 1\n", "====================\n", "\n", "Here we illustrate the basic parallel computing capabilities of IPython." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, the IPython engines must be started. For example, the following command launches 2 engines (one per core):\n", "\n", "    ipcluster start -n 2" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from IPython.parallel import Client" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `Client` object lets us start jobs on the engines." ] }, { "cell_type": "code", "collapsed": false, "input": [ "rc = Client()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can obtain the engine identifiers through the client." ] }, { "cell_type": "code", "collapsed": false, "input": [ "rc.ids" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 3, "text": [ "[0, 1]" ] } ], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "**ERRATUM**: the original code did not contain `%px` before the `import os` statement. This magic command is necessary so that the import occurs on all engines." ] }, { "cell_type": "code", "collapsed": false, "input": [ "%px import os" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `%px` magic command executes a statement in parallel on the engines (all of them by default)."
] }, { "cell_type": "code", "collapsed": false, "input": [ "%px print(os.getpid())" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[stdout:0] 3256\n", "[stdout:1] 1056\n" ] } ], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "With `%pxconfig`, we can specify the identifiers of the engines on which subsequent commands should be executed (here, the second engine)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "%pxconfig --targets 1" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 6 }, { "cell_type": "code", "collapsed": false, "input": [ "%px print(os.getpid())" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1056\n" ] } ], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another possibility is to use the `%%px` cell magic to run an entire cell on the engines. The `--targets` option also accepts a slice (here, all engines except the last one)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "%%px --targets :-1\n", "print(os.getpid())" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[stdout:0] 3256\n" ] } ], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default, parallel calls are synchronous (blocking), but the `--noblock` option makes them asynchronous." ] }, { "cell_type": "code", "collapsed": false, "input": [ "%%px --noblock\n", "import time\n", "time.sleep(1)\n", "os.getpid()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 9, "text": [ "" ] } ], "prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "After an asynchronous (non-blocking) call, the results can be retrieved from the engines with `%pxresult`, which blocks until they are available."
] }, { "cell_type": "code", "collapsed": false, "input": [ "%pxresult" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "display_data", "text": [ "\u001b[0;31mOut[1:4]: \u001b[0m1056" ] } ], "prompt_number": 10 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another way to run tasks on the engines is to use `map`. First, we need a view on the engines, which represents a particular subset of the running engines (here, all of them)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "v = rc[:]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We import a module on each engine." ] }, { "cell_type": "code", "collapsed": false, "input": [ "with v.sync_imports():\n", "    import time" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "importing time on engine(s)\n" ] } ], "prompt_number": 12 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We define a simple function." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def f(x):\n", "    time.sleep(1)\n", "    return x * x" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 13 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we call `map_sync`, which is a synchronous and parallel version of Python's built-in `map` function. We execute `f` on all integers between 0 and 9 in parallel across all engines." ] }, { "cell_type": "code", "collapsed": false, "input": [ "v.map_sync(f, range(10))" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 14, "text": [ "[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]" ] } ], "prompt_number": 14 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We check how long Python's built-in `map` function takes."
] }, { "cell_type": "code", "collapsed": false, "input": [ "timeit -n 1 -r 1 map(f, range(10))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1 loops, best of 1: 10 s per loop\n" ] } ], "prompt_number": 15 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We compare this with the time taken by the parallel version." ] }, { "cell_type": "code", "collapsed": false, "input": [ "r = v.map(f, range(10))" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 16 }, { "cell_type": "code", "collapsed": false, "input": [ "r.ready(), r.elapsed" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 17, "text": [ "(False, 0.065)" ] } ], "prompt_number": 17 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We wait for the tasks to finish and get the results." ] }, { "cell_type": "code", "collapsed": false, "input": [ "r.get()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 18, "text": [ "[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]" ] } ], "prompt_number": 18 }, { "cell_type": "code", "collapsed": false, "input": [ "r.elapsed, r.serial_time" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 19, "text": [ "(5.009, 10.0)" ] } ], "prompt_number": 19 } ], "metadata": {} } ] }