{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "> This is one of the 100 recipes of the [IPython Cookbook](http://ipython-books.github.io/), the definitive guide to high-performance scientific computing and data science in Python.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 5.7. Writing massively parallel code for NVIDIA graphics cards (GPUs) with CUDA" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's import PyCUDA." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import pycuda.driver as cuda\n", "import pycuda.autoinit\n", "from pycuda.compiler import SourceModule\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we initialize the NumPy array that will contain the fractal." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "size = 200\n", "iterations = 100\n", "col = np.empty((size, size), dtype=np.int32)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We allocate memory for this array on the GPU." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "col_gpu = cuda.mem_alloc(col.nbytes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We write the CUDA kernel in a string. The mandelbrot function accepts the figure size, the number of iterations, and a pointer to the memory buffer as arguments. It updates the col buffer with the escape value in the fractal for each pixel." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "code = \"\"\"\n", "__global__ void mandelbrot(int size,\n", " int iterations,\n", " int *col)\n", "\n", "{\n", " // Get the row and column index of the current thread.\n", " int i = blockIdx.y * blockDim.y + threadIdx.y;\n", " int j = blockIdx.x * blockDim.x + threadIdx.x;\n", " int index = i * size + j;\n", " \n", " // Declare and initialize the variables.\n", " double cx, cy;\n", " double z0, z1, z0_tmp, z0_2, z1_2;\n", " cx = -2.0 + (double)j / size * 3;\n", " cy = -1.5 + (double)i / size * 3;\n", "\n", " // Main loop.\n", " z0 = z1 = 0.0;\n", " for (int n = 0; n < iterations; n++)\n", " {\n", " z0_2 = z0 * z0;\n", " z1_2 = z1 * z1;\n", " if (z0_2 + z1_2 <= 100)\n", " {\n", " // Need to update z0 and z1 in parallel.\n", " z0_tmp = z0_2 - z1_2 + cx;\n", " z1 = 2 * z0 * z1 + cy;\n", " z0 = z0_tmp;\n", " col[index] = n;\n", " }\n", " else break;\n", " }\n", "}\n", "\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we compile the CUDA program." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "prg = SourceModule(code)\n", "mandelbrot = prg.get_function(\"mandelbrot\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We define the block size and the grid size, specifying how the threads will be parallelized with respect to the data." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "block_size = 10\n", "block = (block_size, block_size, 1)\n", "grid = (size // block_size, size // block_size, 1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We call the compiled function." 
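, " The scalar arguments are wrapped in `np.int32(...)` so that their binary layout matches the kernel's `int` parameters.\n", "\n", "As a quick host-side check (not part of the recipe), we can reproduce the kernel's index arithmetic in Python and verify that this grid/block decomposition assigns exactly one thread per row (and, symmetrically, per column):\n", "\n", "```python\n", "block_size = 10\n", "size = 200\n", "# Row index computed by each CUDA thread: blockIdx.y * blockDim.y + threadIdx.y.\n", "rows = [by * block_size + ty\n", "        for by in range(size // block_size)\n", "        for ty in range(block_size)]\n", "assert sorted(rows) == list(range(size))  # every row covered exactly once\n", "```"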
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "mandelbrot(np.int32(size), np.int32(iterations), col_gpu,\n", " block=block, grid=grid)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once the function has completed, we copy the contents of the CUDA buffer back to the NumPy array col." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "cuda.memcpy_dtoh(col, col_gpu)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's display the fractal." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "plt.imshow(np.log(col), cmap=plt.cm.hot,);\n", "plt.xticks([]);\n", "plt.yticks([]);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's evaluate the time taken by this function." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%%timeit col_gpu = cuda.mem_alloc(col.nbytes); cuda.memcpy_htod(col_gpu, col)\n", "mandelbrot(np.int32(size), np.int32(iterations), col_gpu,\n", " block=block, grid=grid)\n", "cuda.memcpy_dtoh(col, col_gpu)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).\n", "\n", "> [IPython Cookbook](http://ipython-books.github.io/), by [Cyrille Rossant](http://cyrille.rossant.net), Packt Publishing, 2014 (500 pages)." 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.2" } }, "nbformat": 4, "nbformat_minor": 0 }