{ "metadata": { "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "raw", "metadata": {}, "source": [ "Text provided under a Creative Commons Attribution license, CC-BY. All code is made available under the FSF-approved MIT license. (c) Lorena A. Barba, Gilbert F. Forsyth 2015. Thanks to NSF for support via CAREER award #1149784." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[@LorenaABarba](https://twitter.com/LorenaABarba)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook complements the [interactive CFD online](https://bitbucket.org/cfdpython/cfd-python-class/overview) module **12 steps to Navier-Stokes**, addressing the issue of high performance with Python." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Optimizing Loops with Numba\n", "----\n", "***" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You will recall from our exploration of [array operations with NumPy](http://nbviewer.ipython.org/urls/github.com/barbagroup/CFDPython/blob/master/lessons/06_Array_Operations_with_NumPy.ipynb) that there are large speed gains to be had from implementing our discretizations using NumPy-optimized array operations instead of many nested loops. \n", "\n", "[Numba](http://numba.pydata.org/) is a tool that offers another approach to optimizing our Python code. Numba is a Python library that uses the LLVM compiler to translate Python functions into fast, compiled machine code at run time. 
Depending on the original code and the size of the problem, Numba can provide a significant speedup over NumPy-optimized code.\n", "\n", "Let's revisit the 2D Laplace Equation:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from mpl_toolkits.mplot3d import Axes3D\n", "from matplotlib import cm\n", "from matplotlib import pyplot\n", "import numpy\n", "\n", "##variable declarations\n", "nx = 81\n", "ny = 81\n", "c = 1\n", "dx = 2.0/(nx-1)\n", "dy = 2.0/(ny-1)\n", "\n", "##initial conditions\n", "p = numpy.zeros((ny,nx)) ##create an ny x nx matrix of 0's\n", "\n", "##plotting aids\n", "x = numpy.linspace(0,2,nx)\n", "y = numpy.linspace(0,1,ny)\n", "\n", "##boundary conditions\n", "p[:,0] = 0\t\t##p = 0 @ x = 0\n", "p[:,-1] = y\t\t##p = y @ x = 2\n", "p[0,:] = p[1,:]\t\t##dp/dy = 0 @ y = 0\n", "p[-1,:] = p[-2,:]\t##dp/dy = 0 @ y = 1\n" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 27 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is the function for iterating over the Laplace Equation that we wrote in Step 9:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def laplace2d(p, y, dx, dy, l1norm_target):\n", " l1norm = 1\n", " pn = numpy.empty_like(p)\n", "\n", " while l1norm > l1norm_target:\n", " pn = p.copy()\n", " p[1:-1,1:-1] = (dy**2*(pn[2:,1:-1]+pn[0:-2,1:-1])+dx**2*(pn[1:-1,2:]+pn[1:-1,0:-2]))/(2*(dx**2+dy**2)) \n", " p[0,0] = (dy**2*(pn[1,0]+pn[-1,0])+dx**2*(pn[0,1]+pn[0,-1]))/(2*(dx**2+dy**2))\n", " p[-1,-1] = (dy**2*(pn[0,-1]+pn[-2,-1])+dx**2*(pn[-1,0]+pn[-1,-2]))/(2*(dx**2+dy**2)) \n", " \n", " p[:,0] = 0\t\t##p = 0 @ x = 0\n", " p[:,-1] = y\t\t##p = y @ x = 2\n", " p[0,:] = p[1,:]\t\t##dp/dy = 0 @ y = 0\n", " p[-1,:] = p[-2,:]\t##dp/dy = 0 @ y = 1\n", " l1norm = (numpy.sum(numpy.abs(p[:])-numpy.abs(pn[:])))/numpy.sum(numpy.abs(pn[:]))\n", " \n", " return p" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 17 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's use the 
`%%timeit` cell-magic to see how fast it runs:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%%timeit\n", "laplace2d(p, y, dx, dy, .00001)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1 loops, best of 3: 206 us per loop\n" ] } ], "prompt_number": 28 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok! Our function `laplace2d` takes around 206 *micro*-seconds to complete. That's pretty fast, and we have our array operations to thank for that. Let's take a look at how long a more 'vanilla' Python version takes. " ] }, { "cell_type": "code", "collapsed": false, "input": [ "def laplace2d_vanilla(p, y, dx, dy, l1norm_target):\n", " l1norm = 1\n", " pn = numpy.empty_like(p)\n", " ny, nx = p.shape\n", "\n", " while l1norm > l1norm_target:\n", " pn = p.copy()\n", " \n", " for i in range(1, ny-1):\n", " for j in range(1, nx-1):\n", " p[i,j] = (dy**2*(pn[i+1,j]+pn[i-1,j])+dx**2*(pn[i,j+1]+pn[i,j-1]))/(2*(dx**2+dy**2))\n", " \n", " p[0,0] = (dy**2*(pn[1,0]+pn[-1,0])+dx**2*(pn[0,1]+pn[0,-1]))/(2*(dx**2+dy**2))\n", " p[-1,-1] = (dy**2*(pn[0,-1]+pn[-2,-1])+dx**2*(pn[-1,0]+pn[-1,-2]))/(2*(dx**2+dy**2)) \n", " \n", " p[:,0] = 0\t\t##p = 0 @ x = 0\n", " p[:,-1] = y\t\t##p = y @ x = 2\n", " p[0,:] = p[1,:]\t\t##dp/dy = 0 @ y = 0\n", " p[-1,:] = p[-2,:]\t##dp/dy = 0 @ y = 1\n", " l1norm = (numpy.sum(numpy.abs(p[:])-numpy.abs(pn[:])))/numpy.sum(numpy.abs(pn[:]))\n", " \n", " return p" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 29 }, { "cell_type": "code", "collapsed": false, "input": [ "%%timeit\n", "laplace2d_vanilla(p, y, dx, dy, .00001)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "10 loops, best of 3: 32 ms per loop\n" ] } ], "prompt_number": 30 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The simple Python version takes 32 *milli*-seconds to complete. 
Let's calculate the speedup we gained by using array operations:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "32*1e-3/(206*1e-6)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 35, "text": [ "155.33980582524273" ] } ], "prompt_number": 35 }, { "cell_type": "markdown", "metadata": {}, "source": [ "So NumPy gives us a 155x speed increase over regular Python code. That said, sometimes implementing our discretizations in array operations can be a little tricky. \n", "\n", "Let's see what Numba can do. We'll start by importing the special function decorator `autojit` from the `numba` library:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from numba import autojit" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 36 }, { "cell_type": "markdown", "metadata": {}, "source": [ "To integrate Numba with our existing function, all we have to do is prepend the `@autojit` function decorator before our `def` statement: " ] }, { "cell_type": "code", "collapsed": false, "input": [ "@autojit\n", "def laplace2d_numba(p, y, dx, dy, l1norm_target):\n", " l1norm = 1\n", " pn = numpy.empty_like(p)\n", "\n", " while l1norm > l1norm_target:\n", " pn = p.copy()\n", " p[1:-1,1:-1] = (dy**2*(pn[2:,1:-1]+pn[0:-2,1:-1])+dx**2*(pn[1:-1,2:]+pn[1:-1,0:-2]))/(2*(dx**2+dy**2)) \n", " p[0,0] = (dy**2*(pn[1,0]+pn[-1,0])+dx**2*(pn[0,1]+pn[0,-1]))/(2*(dx**2+dy**2))\n", " p[-1,-1] = (dy**2*(pn[0,-1]+pn[-2,-1])+dx**2*(pn[-1,0]+pn[-1,-2]))/(2*(dx**2+dy**2)) \n", " \n", " p[:,0] = 0\t\t##p = 0 @ x = 0\n", " p[:,-1] = y\t\t##p = y @ x = 2\n", " p[0,:] = p[1,:]\t\t##dp/dy = 0 @ y = 0\n", " p[-1,:] = p[-2,:]\t##dp/dy = 0 @ y = 1\n", " l1norm = (numpy.sum(numpy.abs(p[:])-numpy.abs(pn[:])))/numpy.sum(numpy.abs(pn[:]))\n", " \n", " return p" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 38 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The only lines that have changed are the 
`@autojit` line and the function name, which we changed so we can compare performance. Now let's see what happens:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%%timeit\n", "laplace2d_numba(p, y, dx, dy, .00001)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1 loops, best of 3: 137 us per loop" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n" ] } ], "prompt_number": 39 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok! It's not the 155x speed increase we saw between vanilla Python and NumPy, but it is a non-trivial performance gain (about 1.5x over the plain NumPy version), especially given how easy it was to implement. Another cool feature of Numba is that you can use the `@autojit` decorator on functions written with plain loops rather than array operations, too. Let's try adding it onto our vanilla version:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "@autojit\n", "def laplace2d_vanilla_numba(p, y, dx, dy, l1norm_target):\n", " l1norm = 1\n", " pn = numpy.empty_like(p)\n", " ny, nx = p.shape\n", "\n", " while l1norm > l1norm_target:\n", " pn = p.copy()\n", " \n", " for i in range(1, ny-1):\n", " for j in range(1, nx-1):\n", " p[i,j] = (dy**2*(pn[i+1,j]+pn[i-1,j])+dx**2*(pn[i,j+1]+pn[i,j-1]))/(2*(dx**2+dy**2))\n", " \n", " p[0,0] = (dy**2*(pn[1,0]+pn[-1,0])+dx**2*(pn[0,1]+pn[0,-1]))/(2*(dx**2+dy**2))\n", " p[-1,-1] = (dy**2*(pn[0,-1]+pn[-2,-1])+dx**2*(pn[-1,0]+pn[-1,-2]))/(2*(dx**2+dy**2)) \n", " \n", " p[:,0] = 0\t\t##p = 0 @ x = 0\n", " p[:,-1] = y\t\t##p = y @ x = 2\n", " p[0,:] = p[1,:]\t\t##dp/dy = 0 @ y = 0\n", " p[-1,:] = p[-2,:]\t##dp/dy = 0 @ y = 1\n", " l1norm = (numpy.sum(numpy.abs(p[:])-numpy.abs(pn[:])))/numpy.sum(numpy.abs(pn[:]))\n", " \n", " return p" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 41 }, { "cell_type": "code", "collapsed": false, "input": [ "%%timeit\n", "laplace2d_vanilla_numba(p, y, dx, dy, .00001)" ], "language": "python", 
"metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1 loops, best of 3: 561 us per loop" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n" ] } ], "prompt_number": 42 }, { "cell_type": "markdown", "metadata": {}, "source": [ "561 *micro*-seconds. That's roughly a 57x speedup over the vanilla version; not the 155x increase we saw with NumPy, but all we did was add one line of code. \n", "\n", "So we have:\n", "\n", "Vanilla Python: 32 milliseconds \n", "\n", "NumPy Python: 206 microseconds \n", "\n", "Vanilla + Numba: 561 microseconds\n", "\n", "NumPy + Numba: 137 microseconds\n", "\n", "Clearly the NumPy + Numba combination is the fastest, but the ability to quickly optimize code with nested loops can also come in very handy in certain applications. \n", "\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from IPython.core.display import HTML\n", "def css_styling():\n", " styles = open(\"../styles/custom.css\", \"r\").read()\n", " return HTML(styles)\n", "css_styling()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\n", "\n", "\n", "\n", "\n" ], "metadata": {}, "output_type": "pyout", "prompt_number": 1, "text": [ "" ] } ], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "> (The cell above executes the style for this notebook. We modified a style we found on the GitHub of [CamDavidsonPilon](https://github.com/CamDavidsonPilon), [@Cmrn_DP](https://twitter.com/cmrn_dp).)" ] } ], "metadata": {} } ] }