{ "metadata": { "name": "", "signature": "sha256:68d985ab383f6a47901c566e128b2baa5c65d48f434457de01e2e766b3d2a101" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# DirectView as multiplexer" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import os,sys,time\n", "import numpy as np\n", "\n", "from IPython.core.display import display\n", "from IPython import parallel\n", "rc = parallel.Client()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The DirectView can be readily understood as an Engine Multiplexer -\n", "it does the same thing on all of its engines.\n", "\n", "The only difference between running code on a single remote engine\n", "and running code in parallel is how many engines the DirectView is\n", "instructed to use.\n", "\n", "You can create DirectViews by index-access to the Client. This creates\n", "a DirectView using the engines after passing the same index (or slice)\n", "to the `ids` list." ] }, { "cell_type": "code", "collapsed": true, "input": [ "e0 = rc[0]\n", "engines = rc[:]\n", "even = rc[::2]\n", "odd = rc[1::2]\n", "\n", "# this is the one we are going to use:\n", "dview = engines\n", "dview.block = True" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, the only difference from single-engine remote execution is that the code we run happens on all of the engines of a given view:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "for view in (e0, engines, even, odd):\n", " print view, view.apply_sync(os.getpid)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The results of multiplexed execution is always a list of the length of the number of engines." ] }, { "cell_type": "code", "collapsed": false, "input": [ "engines['a'] = 5\n", "engines['a']" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Scatter and Gather\n", "\n", "Lots of parallel computations involve partitioning data onto processes. \n", "DirectViews have `scatter()` and `gather()` methods, to help with this.\n", "Pass any container or numpy array, and IPython will partition the object onto the engines wih `scatter`,\n", "or reconstruct the full object in the Client with `gather()`." ] }, { "cell_type": "code", "collapsed": false, "input": [ "dview.scatter('a',range(16))\n", "dview['a']" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "dview.gather('a')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "dview.execute(\"asum = sum(a)\")\n", "dview.gather('asum')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can pass a 'flatten' keyword,\n", "to instruct engines that will only get one item of the list to\n", "get the actual item, rather than a one-element sublist:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "dview.scatter('id',rc.ids)\n", "dview['id']" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "dview.scatter('id',rc.ids, flatten=True)\n", "dview['id']" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Scatter and gather also work with numpy arrays" ] }, { "cell_type": "code", "collapsed": false, "input": [ "A = np.random.randint(1,10,(16,4))\n", "B = np.random.randint(1,10,(4,16))\n", "display(A)\n", "display(B)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "dview.scatter('A', A)\n", "dview.scatter('B', B)\n", "display(e0['A'])\n", "display(e0['B'])" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Excercise: Parallel Matrix Multiply" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Can you compute the Matrix product `C=A.dot(B)` in parallel? (not looking for brilliant, just correct).\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%run ../hints\n", "mmhint()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": true, "input": [ "%load soln/matmul.py" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's run this, and validate the result against a local computation." ] }, { "cell_type": "code", "collapsed": false, "input": [ "C_ref = A.dot(B)\n", "C1 = pdot(dview, A, B)\n", "# validation:\n", "(C1==C_ref).all()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Map\n", "\n", "DirectViews have a map method, which behaves just like the builtin map,\n", "but computed in parallel." ] }, { "cell_type": "code", "collapsed": false, "input": [ "dview.block = True\n", "\n", "serial_result = map(lambda x:x**10, range(32))\n", "parallel_result = dview.map(lambda x:x**10, range(32))\n", "\n", "serial_result==parallel_result" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`DirectView.map` partitions the sequences onto each engine,\n", "and then calls `map` remotely. The result is always a single\n", "IPython task per engine." ] }, { "cell_type": "code", "collapsed": false, "input": [ "amr = dview.map_async(lambda x:x**10, range(32))\n", "amr.msg_ids" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "amr = dview.map_async(lambda x:x**10, range(3200))\n", "amr.msg_ids" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "The motivating example" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from IPython.display import display, Image\n", "%run ../images_common" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "pictures = get_pictures(os.path.join('..', 'images', 'castle'))" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "%px cd {os.getcwd()}" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "%%px\n", "import matplotlib\n", "matplotlib.use('Agg')\n", "import matplotlib.pyplot as plt\n", "\n", "from skimage.io import imread\n", "from skimage import measure" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "engines.push(dict(\n", " plot_contours=plot_contours,\n", " find_contours=find_contours,\n", "))" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "ar = e0.apply_async(get_contours_image, pictures[0])\n", "ar.wait_interactive()\n", "Image(data=ar.get())" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "amr = engines.map_async(get_contours_image, pictures[:len(engines)])\n", "amr.wait_interactive()\n", "for pngdata in amr:\n", " display(Image(data=pngdata))" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Exercises and Examples" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- [Remote Iteration](exercises/Remote%20Iteration.ipynb)\n", "- [Monte Carlo \u03c0](../exercises/Monte%20Carlo%20\u03c0.ipynb)" ] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Moving on" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "IPython.parallel can also be used for [load-balanced execution](Load-Balancing.ipynb),\n", "when you just want code to run, but don't care where." ] } ], "metadata": {} } ] }