{ "metadata": { "name": "", "signature": "sha256:aab834d6d66727b51b1bd5e3d9d09345d9a219de070e95d5de1ae88e0cad69c7" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "code", "collapsed": false, "input": [ "%%html\n", "" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "display_data", "text": [ "" ] } ], "prompt_number": 22 }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Overview and getting started\n", "\n", "\n", "## Introduction\n", "\n", "Let's start with an overview of IPython's architecture for parallel\n", "and distributed computing. This architecture abstracts out parallelism\n", "in a very general way, which enables IPython to support many different\n", "styles of parallelism including:\n", "\n", "- Single program, multiple data (SPMD) parallelism\n", "- Multiple program, multiple data (MPMD) parallelism\n", "- Message passing using MPI or \u00d8MQ\n", "- Task farming\n", "- Data parallel\n", "- Coordination of distributed processes\n", "- Combinations of these approaches\n", "- Custom user defined approaches\n", "\n", "Most importantly, IPython enables all types of parallel applications to\n", "be developed, executed, debugged and monitored *interactively*. Hence,\n", "the `I` in `IPython`. Some example use cases for\n", "`IPython.parallel`:\n", "\n", "- Quickly parallelize algorithms that are embarrassingly parallel\n", " using a number of simple approaches. Many simple things can be\n", " parallelized interactively in one or two lines of code.\n", "\n", "- Steer traditional MPI applications on a supercomputer from an\n", " IPython session on your laptop.\n", "\n", "- Analyze and visualize large datasets (that could be remote and/or\n", " distributed) interactively using IPython and tools like\n", " matplotlib/TVTK.\n", "\n", "- Develop, test and debug new parallel algorithms (that may use MPI or PyZMQ)\n", " interactively.\n", "\n", "- Tie together multiple MPI jobs running on different systems into one\n", " giant distributed and parallel system.\n", "\n", "- Start a parallel job on your cluster and then have a remote\n", " collaborator connect to it and pull back data into their local\n", " IPython session for plotting and analysis.\n", "\n", "- Run a set of tasks on a set of CPUs using dynamic load balancing." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Architecture overview\n", "\n", "\n", "\n", "The IPython architecture consists of four components:\n", "\n", "- The IPython engine\n", "- The IPython hub\n", "- The IPython schedulers\n", "- The cluster client\n", "\n", "These components live in the `IPython.parallel` package and are\n", "installed with IPython." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### IPython engine\n", "\n", "The IPython engine is a Python instance that accepts Python commands over\n", "a network connection. When multiple engines are started, parallel\n", "and distributed computing becomes possible. An important property of an\n", "IPython engine is that it blocks while user code is being executed. Read\n", "on for how the IPython controller solves this problem to expose a clean\n", "asynchronous API to the user." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### IPython controller\n", "\n", "\n", "The IPython controller processes provide an interface for working with a\n", "set of engines. At a general level, the controller is a collection of\n", "processes to which IPython engines and clients can connect. The\n", "controller is composed of a `Hub` and a collection of\n", "`Schedulers`, which may be in processes or threads.\n", "\n", "The controller provides a single point of contact for users who\n", "wish to utilize the engines in the cluster. There is a variety of\n", "different ways of working with a controller, but all of these\n", "models are implemented via the `View.apply` method, after\n", "constructing `View` objects to represent different collections engines.\n", "The two primary models for interacting with engines are:\n", "\n", "- A **Direct** interface, where engines are addressed explicitly.\n", "- A **LoadBalanced** interface, where the Scheduler is trusted with\n", " assigning work to appropriate engines.\n", "\n", "Advanced users can readily extend the View models to enable other styles\n", "of parallelism." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### The Hub\n", "\n", "The center of an IPython cluster is the Hub. The Hub can be viewed as an \u00fcber-logger, which keeps track of engine connections, schedulers, clients, as well as persist all\n", "task requests and results in a database for later use." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Schedulers\n", "\n", "All actions that can be performed on the engine go through a Scheduler.\n", "While the engines themselves block when user code is run, the schedulers\n", "hide that from the user to provide a fully asynchronous interface to a\n", "set of engines. Each Scheduler is a small GIL-less function in C provided\n", "by pyzmq (the Python load-balanced scheduler being an exception). " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## \u00d8MQ and PyZMQ\n", "\n", "All of this is implemented with the lovely \u00d8MQ messaging library,\n", "and pyzmq, the lightweight Python bindings, which allows very fast\n", "zero-copy communication of objects like numpy arrays." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## IPython client and views\n", "\n", "There is one primary object, the `Client`, for\n", "connecting to a cluster. For each execution model, there is a\n", "corresponding `View`. These views allow users to\n", "interact with a set of engines through the interface. Here are the two\n", "default views:\n", "\n", "- The `DirectView` class for explicit addressing.\n", "- The `LoadBalancedView` class for destination-agnostic\n", " scheduling." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting Started\n", "\n", "## Starting the IPython controller and engines\n", "\n", "To follow along with this tutorial, you will need to start the IPython\n", "controller and four IPython engines. The simplest way of doing this is\n", "with the [clusters tab](/#clusters),\n", "or you can use the `ipcluster` command in a terminal:\n", "\n", " $ ipcluster start -n 4\n", "\n", "There isn't time to go into it here, but ipcluster can be used to start engines\n", "and the controller with various batch systems including:\n", "\n", "* SGE\n", "* PBS\n", "* LSF\n", "* MPI\n", "* SSH\n", "* WinHPC\n", "\n", "More information on starting and configuring the IPython cluster in \n", "[the IPython.parallel docs](http://ipython.org/ipython-doc/stable/parallel/parallel_process.html).\n", "\n", "Once you have started the IPython controller and one or more engines,\n", "you are ready to use the engines to do something useful. \n", "\n", "To make sure everything is working correctly, let's do a very simple demo:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from IPython import parallel\n", "rc = parallel.Client()\n", "rc.block = True" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "rc.ids" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "def mul(a,b):\n", " return a*b" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "def summary():\n", " \"\"\"summarize some info about this process\"\"\"\n", " import os\n", " import socket\n", " import sys\n", " return {\n", " 'cwd': os.getcwd(),\n", " 'Python': sys.version,\n", " 'hostname': socket.gethostname(),\n", " 'pid': os.getpid(),\n", " }" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "mul(5,6)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "summary()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What does it look like to call this function remotely?\n", "\n", "Just turn `f(*args, **kwargs)` into `view.apply(f, *args, **kwargs)`!" ] }, { "cell_type": "code", "collapsed": false, "input": [ "rc[0].apply(mul, 5, 6)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "rc[0].apply(summary)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And the same thing in parallel?" ] }, { "cell_type": "code", "collapsed": false, "input": [ "rc[:].apply(mul, 5, 6)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "rc[:].apply(summary)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Python has a builtin map for calling a function with a variety of arguments" ] }, { "cell_type": "code", "collapsed": false, "input": [ "map(mul, range(1,10), range(2,11))" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So how do we do this in parallel?" ] }, { "cell_type": "code", "collapsed": false, "input": [ "view = rc.load_balanced_view()\n", "view.map(mul, range(1,10), range(2,11))" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And a preview of parallel magics:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%%px\n", "import os, socket\n", "print os.getpid()\n", "print socket.gethostname()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's get into some more detail about how to use IPython for [remote execution](tutorial/Remote Execution.ipynb)." ] } ], "metadata": {} } ] }