{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "> This is one of the 100 recipes of the [IPython Cookbook](http://ipython-books.github.io/), the definitive guide to high-performance scientific computing and data science in Python.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 5.7. Writing massively parallel code for NVIDIA graphics cards (GPUs) with CUDA" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's import PyCUDA." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import pycuda.driver as cuda\n", "import pycuda.autoinit\n", "from pycuda.compiler import SourceModule\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we initialize the NumPy array that will contain the fractal." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "size = 200\n", "iterations = 100\n", "col = np.empty((size, size), dtype=np.int32)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We allocate memory for this array on the GPU." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "col_gpu = cuda.mem_alloc(col.nbytes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We write the CUDA kernel in a string. The mandelbrot function accepts the figure size, the number of iterations, and a pointer to the memory buffer as arguments. It updates the col buffer with the escape value in the fractal for each pixel." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "code = \"\"\"\n", "__global__ void mandelbrot(int size,\n", " int iterations,\n", " int *col)\n", "\n", "{\n", " // Get the row and column index of the current thread.\n", " int i = blockIdx.y * blockDim.y + threadIdx.y;\n", " int j = blockIdx.x * blockDim.x + threadIdx.x;\n", " int index = i * size + j;\n", " \n", " // Declare and initialize the variables.\n", " double cx, cy;\n", " double z0, z1, z0_tmp, z0_2, z1_2;\n", " cx = -2.0 + (double)j / size * 3;\n", " cy = -1.5 + (double)i / size * 3;\n", "\n", " // Main loop.\n", " z0 = z1 = 0.0;\n", " for (int n = 0; n < iterations; n++)\n", " {\n", " z0_2 = z0 * z0;\n", " z1_2 = z1 * z1;\n", " if (z0_2 + z1_2 <= 100)\n", " {\n", " // Need to update z0 and z1 in parallel.\n", " z0_tmp = z0_2 - z1_2 + cx;\n", " z1 = 2 * z0 * z1 + cy;\n", " z0 = z0_tmp;\n", " col[index] = n;\n", " }\n", " else break;\n", " }\n", "}\n", "\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we compile the CUDA program." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "prg = SourceModule(code)\n", "mandelbrot = prg.get_function(\"mandelbrot\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We define the block size and the grid size, specifying how the threads will be parallelized with respect to the data." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "block_size = 10\n", "block = (block_size, block_size, 1)\n", "grid = (size // block_size, size // block_size, 1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We call the compiled function." 
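, " The scalar arguments are wrapped in `np.int32(...)` so that their binary layout matches the kernel's `int` parameters.\n", "\n", "As a quick host-side check (not part of the recipe), we can reproduce the kernel's index arithmetic in Python and verify that this grid/block decomposition assigns exactly one thread per row (and, symmetrically, per column):\n", "\n", "```python\n", "block_size = 10\n", "size = 200\n", "# Row index computed by each CUDA thread: blockIdx.y * blockDim.y + threadIdx.y.\n", "rows = [by * block_size + ty\n", "        for by in range(size // block_size)\n", "        for ty in range(block_size)]\n", "assert sorted(rows) == list(range(size))  # every row covered exactly once\n", "```"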
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "mandelbrot(np.int32(size), np.int32(iterations), col_gpu,\n", " block=block, grid=grid)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once the function has completed, we copy the contents of the CUDA buffer back to the NumPy array col." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "cuda.memcpy_dtoh(col, col_gpu)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's display the fractal." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "plt.imshow(np.log(col), cmap=plt.cm.hot,);\n", "plt.xticks([]);\n", "plt.yticks([]);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's evaluate the time taken by this function." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%%timeit col_gpu = cuda.mem_alloc(col.nbytes); cuda.memcpy_htod(col_gpu, col)\n", "mandelbrot(np.int32(size), np.int32(iterations), col_gpu,\n", " block=block, grid=grid)\n", "cuda.memcpy_dtoh(col, col_gpu)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).\n", "\n", "> [IPython Cookbook](http://ipython-books.github.io/), by [Cyrille Rossant](http://cyrille.rossant.net), Packt Publishing, 2014 (500 pages)." 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.2" } }, "nbformat": 4, "nbformat_minor": 0 }