{ "metadata": { "name": "", "signature": "sha256:c190aac3ea5e1c2a5d121202bace3628535c8c9c354cbef175cf4efc8c2690f7" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Overview of IPython.parallel" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "IPython is a tool for interactive and exploratory computing.\n", "We have seen that IPython's kernel provides a mechanism for interactive\n", "*remote* computation, and we have extended this same mechanism for\n", "interactive remote *parallel* computation, simply by having multiple\n", "kernels." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![wideview](../figs/wideView400.png)" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Architecture overview" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At a high level, there are three basic components to parallel IPython:\n", "\n", "* Engine(s) - the remote or distributed processes where your code runs.\n", "* Client - your interface to running code on Engines.\n", "* Controller - the collection of processes that coordinate Engines and Clients.\n", "\n", "These components live in the `IPython.parallel` package and are installed with IPython." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This leads to a usage model where you can create as many engines as you want per compute node, and then control them all from your clients via a central 'controller' object that encapsulates the hub and schedulers:\n", "\n", "![cluster](../figs/ipcluster-kernels.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### IPython engine\n", "\n", "The Engine is simply a remote Python namespace where your code is run,\n", "and is identical to the kernel used elsewhere in IPython.\n", "\n", "It can do all the magics, pylab integration, and everything else you can do in a regular IPython session." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### IPython controller\n", "\n", "The Controller is a collection of processes:\n", "\n", "* Schedulers relay messages between Engines and Clients.\n", "* The Hub monitors the cluster state.\n", "\n", "Together, these processes provide a single connection point for your clients and engines.\n", "Each Scheduler is a small GIL-less function in C provided by pyzmq (the Python load-balancing scheduler being an exception).\n", "\n", "The Hub can be viewed as an \u00fcber-logger,\n", "which monitors all communication between clients and engines,\n", "and can log to a database (e.g. SQLite or MongoDB) for later retrieval or resubmission.\n", "The Hub is not involved in execution in any way,\n", "and a slow Hub cannot slow down submission of tasks." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Schedulers\n", "\n", "All actions that can be performed on the engine go through a Scheduler.\n", "While the engines themselves block when user code is run,\n", "the schedulers hide that from the user to provide a fully asynchronous interface to a set of engines. " ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "IPython client and views" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is one primary object, the `Client`, for connecting to a cluster.\n", "For each execution model there is a corresponding `View`,\n", "and you determine how your work should be executed on the cluster by creating different views\n", "or manipulating attributes of views.\n", "\n", "The two basic views:\n", "\n", "- The `DirectView` class for explicitly running code on particular engine(s).\n", "- The `LoadBalancedView` class for destination-agnostic scheduling.\n", "\n", "You can use as many views of each kind as you like, all at the same time." ] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Getting Started" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Starting the IPython controller and engines" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The quickest way to get started is to visit the 'clusters' [tab](/#tab2),\n", "and start some engines with the 'default' profile." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To follow along with this tutorial, you will need to start the IPython\n", "controller and some IPython engines. The simplest way of doing this is\n", "visit the 'clusters' [tab](/#tab2),\n", "and start some engines with the 'default' profile,\n", "or to use the `ipcluster` command:\n", "\n", " $ ipcluster start -n 4\n", "\n", "There isn't time to go into it here, but ipcluster can be used to start engines\n", "and the controller with various batch systems including:\n", "\n", "* SGE\n", "* PBS\n", "* LSF\n", "* MPI\n", "* SSH\n", "* WinHPC\n", "\n", "More information on starting and configuring the IPython cluster in \n", "[the IPython.parallel docs](http://ipython.org/ipython-doc/dev/parallel/parallel_process.html).\n", "\n", "Once you have started the IPython controller and one or more engines,\n", "you are ready to use the engines to do something useful. \n", "\n", "To make sure everything is working correctly, let's do a very simple demo:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from IPython import parallel\n", "rc = parallel.Client()\n", "rc.block = True" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "rc.ids" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 2, "text": [ "[0, 1, 2, 3]" ] } ], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's define a simple function" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def mul(a,b):\n", " return a*b" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "mul(5,6)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 4, "text": [ "30" ] } ], "prompt_number": 4 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Create some Views" ] }, { "cell_type": "code", "collapsed": false, "input": [ "dview = rc[:]\n", "dview" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 5, "text": [ "" ] } ], "prompt_number": 5 }, { "cell_type": "code", "collapsed": false, "input": [ "e0 = rc[0]\n", "e0" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 6, "text": [ "" ] } ], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "What does it look like to call this function remotely?\n", "\n", "Just turn `f(*args, **kwargs)` into `view.apply(f, *args, **kwargs)`!" ] }, { "cell_type": "code", "collapsed": false, "input": [ "e0.apply(mul, 5, 6)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 7, "text": [ "30" ] } ], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "And the same thing in parallel?" ] }, { "cell_type": "code", "collapsed": false, "input": [ "dview.apply(mul, 5, 6)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 8, "text": [ "[30, 30, 30, 30]" ] } ], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Python has a builtin map for calling a function with a sequence of arguments" ] }, { "cell_type": "code", "collapsed": false, "input": [ "map(mul, range(1,10), range(2,11))" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 9, "text": [ "[2, 6, 12, 20, 30, 42, 56, 72, 90]" ] } ], "prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "So how do we do this in parallel?" ] }, { "cell_type": "code", "collapsed": false, "input": [ "dview.map(mul, range(1,10), range(2,11))" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 10, "text": [ "[2, 6, 12, 20, 30, 42, 56, 72, 90]" ] } ], "prompt_number": 10 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also run code in strings with `execute`" ] }, { "cell_type": "code", "collapsed": false, "input": [ "dview.execute(\"import os\")\n", "dview.execute(\"a = os.getpid()\")" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 11, "text": [ "" ] } ], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "And treat the view as a dict lets you access the remote namespace:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "dview['a']" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 12, "text": [ "[24377, 24380, 24381, 24384]" ] } ], "prompt_number": 12 }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "AsyncResults" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When you do async execution, the calls return an AsyncResult object immediately" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def wait_here(t):\n", " import time, os\n", " time.sleep(t)\n", " return os.getpid()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 13 }, { "cell_type": "code", "collapsed": false, "input": [ "ar = dview.apply_async(wait_here, 2)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 14 }, { "cell_type": "code", "collapsed": false, "input": [ "ar.get()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 15, "text": [ "[24377, 24380, 24381, 24384]" ] } ], "prompt_number": 15 }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }