{ "metadata": { "name": "", "signature": "sha256:1f649b493f084f42e79d0165b931ab102e70c1990d5eb0276965e1505a3c8281" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "The basic interface for remote computation with IPython" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A Client is the low-level object which manages your connection to the various Schedulers and the Hub.\n", "Everything you do passes through one of these objects, either indirectly or directly.\n", "\n", "It has an `ids` property, which is always an up-to-date list of the integer engine IDs currently available." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import os,sys,time\n", "import numpy\n", "\n", "from IPython import parallel\n", "rc = parallel.Client()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "rc.ids" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The most basic function of the **Client** is to create the **View** objects,\n", "which are the interfaces for actual communication with the engines.\n", "\n", "There are two basic models for working with engines. Let's start with the simplest case for remote execution, a DirectView of one engine:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "e0 = rc[0] # index-access of a client gives us a DirectView\n", "e0.block = True # let's start synchronous\n", "e0" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's all about:\n", "\n", "```python\n", "view.apply(f, *args, **kwargs)\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want the interface for remote and parallel execution to be as natural as possible.\n", "And what's the most natural unit of execution? Code! Simply define a function,\n", "just as you would use locally, and instead of calling it, pass it to `view.apply()`,\n", "with the remaining arguments just as you would have passed them to the function." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def get_norms(A, levels=[2]):\n", " \"\"\"get all the requested norms for an array\"\"\"\n", " norms = {}\n", " \n", " for level in levels:\n", " norms[level] = numpy.linalg.norm(A, level)\n", " return norms\n", "\n", "A = numpy.random.random(1024)\n", "get_norms(A, levels=[1,2,3,numpy.inf])" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To call this remotely, simply replace '`get_norms(`' with '`e0.apply(get_norms,`'. This replacement is generally true for turning local execution into remote.\n", "\n", "Note that this will probably raise a `NameError` on numpy:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "e0.apply(get_norms, A, levels=[1,2,3,numpy.inf])" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The simplest way to import numpy is to do:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "e0.execute(\"import numpy\")" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But if you want to simultaneously import modules locally and globally, you can use `view.sync_imports()`:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "with e0.sync_imports():\n", " import numpy" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "e0.apply(get_norms, A, levels=[1,2,3,numpy.inf])" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Functions don\u2019t have to be interactively defined, you can use module functions as well:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "e0.apply(numpy.linalg.norm, A, 2)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "execute and run" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also run files or strings with `run` and `execute`\n", "respectively.\n", "\n", "For instance, I have a script `myscript.py` that defines a function\n", "`mysquare`:\n", "\n", "```python\n", "import math\n", "import numpy\n", "import sys\n", "\n", "a=5\n", "\n", "def mysquare(x):\n", " return x*x\n", "```\n", "\n", "I can run that remotely, just like I can locally with `%run`, and then I\n", "will have `mysquare()`, and any imports and globals from the script in the\n", "engine's namespace:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%pycat myscript.py" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "e0.run(\"myscript.py\")" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "e0.execute(\"b=mysquare(a)\")" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "e0['a']" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "e0['b']" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Working with the engine namespace\n", "\n", "The namespace on the engine is accessible to your functions as\n", "`globals`. So if you want to work with values that persist in the engine namespace, you just use\n", "global variables." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def inc_a(increment):\n", " global a\n", " a += increment\n", "\n", "print(\" %2i\" % e0['a'])\n", "e0.apply(inc_a, 5)\n", "print(\" + 5\")\n", "print(\" = %2i\" % e0['a'])" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And just like the rest of Python, you don\u2019t have to specify global variables if you aren\u2019t assigning to them:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def mul_by_a(b):\n", " return a*b\n", "\n", "e0.apply(mul_by_a, 10)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you want to do multiple actions on data, you obviously don\u2019t want to send it every time. For this, we have a `Reference` class. A Reference is just a wrapper for an identifier that gets unserialized by pulling the corresponding object out of the engine namespace." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def is_it_a(b):\n", " return a is b\n", "\n", "e0.apply(is_it_a, 5)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "e0.apply(is_it_a, parallel.Reference('a'))" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`parallel.Reference` is useful to avoid repeated data movement." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Moving data around\n", "\n", "In addition to calling functions and executing code on engines, you can\n", "transfer Python objects to and from your IPython session and the\n", "engines. In IPython, these operations are called `push` (sending an\n", "object to the engines) and `pull` (getting an object from the\n", "engines).\n", "\n", "push takes a dictionary, used to update the remote namespace:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "e0.push(dict(a=1.03234, b=3453))" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "pull takes one or more keys:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "e0.pull('a')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "e0.pull(('b','a'))" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Dictionary interface" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "treating a DirectView like a dictionary results in push/pull operations:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "e0['a'] = range(5)\n", "e0.execute('b = a[::-1]')\n", "e0['b']" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`get()` and `update()` work as well." ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Exercise: Remote matrix operations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Can you get the eigenvalues (`numpy.linalg.eigvals` and norms (`numpy.linalg.norm`) of an array that's already on e0:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "A = np.random.random((16,16))\n", "A = A.dot(A.T)\n", "e0['A'] = A" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "np.linalg.eigvals(A)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "np.linalg.norm(A, 2)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Asynchronous execution\n", "\n", "We have covered the basic methods for running code remotely, but we have been using `block=True`. We can also do non-blocking execution." ] }, { "cell_type": "code", "collapsed": false, "input": [ "e0.block = False" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In non-blocking mode, `apply` submits the command to be executed and\n", "then returns a `AsyncResult` object immediately. The `AsyncResult`\n", "object gives you a way of getting a result at a later time through its\n", "`get()` method.\n", "\n", "The AsyncResult object provides a superset of the interface in [`multiprocessing.pool.AsyncResult`](http://docs.python.org/library/multiprocessing#multiprocessing.pool.AsyncResult).\n", "See the official Python documentation for more." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def wait(t):\n", " import time\n", " tic = time.time()\n", " time.sleep(t)\n", " return time.time()-tic" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "ar = e0.apply(wait, 10)\n", "ar" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`ar.ready()` tells us if the result is ready" ] }, { "cell_type": "code", "collapsed": false, "input": [ "ar.ready()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`ar.get()` blocks until the result is ready, or a timeout is reached, if one is specified" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%time ar.get(1)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "%time ar.get()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For convenience, you can set block for a single call with the extra sync/async methods:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "e0.apply_sync(os.getpid)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "ar = e0.apply_async(os.getpid)\n", "ar" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "ar.get()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "ar.metadata" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have the basic interface covered, we can really get going [in Parallel](Multiplexing.ipynb)." ] } ], "metadata": {} } ] }