{ "metadata": { "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "raw", "metadata": {}, "source": [ "Text provided under a Creative Commons Attribution license, CC-BY. All code is made available under the FSF-approved MIT license. (c) Lorena A. Barba, Gilbert F. Forsyth 2015. Thanks to NSF for support via CAREER award #1149784." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[@LorenaABarba](https://twitter.com/LorenaABarba)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook complements the [interactive CFD online](https://bitbucket.org/cfdpython/cfd-python-class/overview) module **12 steps to Navier-Stokes**, addressing the issue of high performance with Python." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Optimizing Loops with Numba\n", "----\n", "***" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You will recall from our exploration of [array operations with NumPy](http://nbviewer.ipython.org/urls/github.com/barbagroup/CFDPython/blob/master/lessons/06_Array_Operations_with_NumPy.ipynb) that there are large speed gains to be had from implementing our discretizations using NumPy-optimized array operations instead of many nested loops. \n", "\n", "[Numba](http://numba.pydata.org/) is a tool that offers another approach to optimizing our Python code. Numba is a Python library that uses the LLVM compiler to translate Python functions into fast, compiled machine code at run time. 
Depending on the original code and the size of the problem, Numba can provide a significant speedup over NumPy-optimized code.\n", "\n", "Let's revisit the 2D Laplace Equation:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from mpl_toolkits.mplot3d import Axes3D\n", "from matplotlib import cm\n", "from matplotlib import pyplot\n", "import numpy\n", "\n", "##variable declarations\n", "nx = 81\n", "ny = 81\n", "c = 1\n", "dx = 2.0/(nx-1)\n", "dy = 2.0/(ny-1)\n", "\n", "##initial conditions\n", "p = numpy.zeros((ny,nx)) ##create an ny x nx matrix of 0's\n", "\n", "##plotting aids\n", "x = numpy.linspace(0,2,nx)\n", "y = numpy.linspace(0,1,ny)\n", "\n", "##boundary conditions\n", "p[:,0] = 0\t\t##p = 0 @ x = 0\n", "p[:,-1] = y\t\t##p = y @ x = 2\n", "p[0,:] = p[1,:]\t\t##dp/dy = 0 @ y = 0\n", "p[-1,:] = p[-2,:]\t##dp/dy = 0 @ y = 1\n" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 27 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is the function for iterating over the Laplace Equation that we wrote in Step 9:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def laplace2d(p, y, dx, dy, l1norm_target):\n", " l1norm = 1\n", " pn = numpy.empty_like(p)\n", "\n", " while l1norm > l1norm_target:\n", " pn = p.copy()\n", " p[1:-1,1:-1] = (dy**2*(pn[2:,1:-1]+pn[0:-2,1:-1])+dx**2*(pn[1:-1,2:]+pn[1:-1,0:-2]))/(2*(dx**2+dy**2)) \n", " p[0,0] = (dy**2*(pn[1,0]+pn[-1,0])+dx**2*(pn[0,1]+pn[0,-1]))/(2*(dx**2+dy**2))\n", " p[-1,-1] = (dy**2*(pn[0,-1]+pn[-2,-1])+dx**2*(pn[-1,0]+pn[-1,-2]))/(2*(dx**2+dy**2)) \n", " \n", " p[:,0] = 0\t\t##p = 0 @ x = 0\n", " p[:,-1] = y\t\t##p = y @ x = 2\n", " p[0,:] = p[1,:]\t\t##dp/dy = 0 @ y = 0\n", " p[-1,:] = p[-2,:]\t##dp/dy = 0 @ y = 1\n", " l1norm = (numpy.sum(numpy.abs(p[:])-numpy.abs(pn[:])))/numpy.sum(numpy.abs(pn[:]))\n", " \n", " return p" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 17 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's use the 
`%%timeit` cell-magic to see how fast it runs:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%%timeit\n", "laplace2d(p, y, dx, dy, .00001)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1 loops, best of 3: 206 us per loop\n" ] } ], "prompt_number": 28 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok! Our function `laplace2d` takes around 206 *micro*-seconds to complete. That's pretty fast, and we have our array operations to thank for that. Let's take a look at how long a more 'vanilla' Python version takes. " ] }, { "cell_type": "code", "collapsed": false, "input": [ "def laplace2d_vanilla(p, y, dx, dy, l1norm_target):\n", " l1norm = 1\n", " pn = numpy.empty_like(p)\n", " ny, nx = p.shape\n", "\n", " while l1norm > l1norm_target:\n", " pn = p.copy()\n", " \n", " for i in range(1, ny-1):\n", " for j in range(1, nx-1):\n", " p[i,j] = (dy**2*(pn[i+1,j]+pn[i-1,j])+dx**2*(pn[i,j+1]+pn[i,j-1]))/(2*(dx**2+dy**2))\n", " \n", " p[0,0] = (dy**2*(pn[1,0]+pn[-1,0])+dx**2*(pn[0,1]+pn[0,-1]))/(2*(dx**2+dy**2))\n", " p[-1,-1] = (dy**2*(pn[0,-1]+pn[-2,-1])+dx**2*(pn[-1,0]+pn[-1,-2]))/(2*(dx**2+dy**2)) \n", " \n", " p[:,0] = 0\t\t##p = 0 @ x = 0\n", " p[:,-1] = y\t\t##p = y @ x = 2\n", " p[0,:] = p[1,:]\t\t##dp/dy = 0 @ y = 0\n", " p[-1,:] = p[-2,:]\t##dp/dy = 0 @ y = 1\n", " l1norm = (numpy.sum(numpy.abs(p[:])-numpy.abs(pn[:])))/numpy.sum(numpy.abs(pn[:]))\n", " \n", " return p" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 29 }, { "cell_type": "code", "collapsed": false, "input": [ "%%timeit\n", "laplace2d_vanilla(p, y, dx, dy, .00001)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "10 loops, best of 3: 32 ms per loop\n" ] } ], "prompt_number": 30 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The simple Python version takes 32 *milli*-seconds to complete. 
Let's calculate the speedup we gained by using array operations:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "32*1e-3/(206*1e-6)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 35, "text": [ "155.33980582524273" ] } ], "prompt_number": 35 }, { "cell_type": "markdown", "metadata": {}, "source": [ "So NumPy gives us a 155x speed increase over regular Python code. That said, sometimes implementing our discretizations in array operations can be a little tricky. \n", "\n", "Let's see what Numba can do. We'll start by importing the special function decorator `autojit` from the `numba` library:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from numba import autojit" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 36 }, { "cell_type": "markdown", "metadata": {}, "source": [ "To integrate Numba with our existing function, all we have to do is prepend the `@autojit` function decorator before our `def` statement: " ] }, { "cell_type": "code", "collapsed": false, "input": [ "@autojit\n", "def laplace2d_numba(p, y, dx, dy, l1norm_target):\n", " l1norm = 1\n", " pn = numpy.empty_like(p)\n", "\n", " while l1norm > l1norm_target:\n", " pn = p.copy()\n", " p[1:-1,1:-1] = (dy**2*(pn[2:,1:-1]+pn[0:-2,1:-1])+dx**2*(pn[1:-1,2:]+pn[1:-1,0:-2]))/(2*(dx**2+dy**2)) \n", " p[0,0] = (dy**2*(pn[1,0]+pn[-1,0])+dx**2*(pn[0,1]+pn[0,-1]))/(2*(dx**2+dy**2))\n", " p[-1,-1] = (dy**2*(pn[0,-1]+pn[-2,-1])+dx**2*(pn[-1,0]+pn[-1,-2]))/(2*(dx**2+dy**2)) \n", " \n", " p[:,0] = 0\t\t##p = 0 @ x = 0\n", " p[:,-1] = y\t\t##p = y @ x = 2\n", " p[0,:] = p[1,:]\t\t##dp/dy = 0 @ y = 0\n", " p[-1,:] = p[-2,:]\t##dp/dy = 0 @ y = 1\n", " l1norm = (numpy.sum(numpy.abs(p[:])-numpy.abs(pn[:])))/numpy.sum(numpy.abs(pn[:]))\n", " \n", " return p" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 38 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The only lines that have changed are the 
`@autojit` line and the function name, which we changed so we can compare performance. Now let's see what happens:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%%timeit\n", "laplace2d_numba(p, y, dx, dy, .00001)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1 loops, best of 3: 137 us per loop" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n" ] } ], "prompt_number": 39 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok! It's not the 155x speed increase we saw between vanilla Python and NumPy, but it is a non-trivial performance gain (about 1.5x over the plain NumPy version), especially given how easy it was to implement. Another cool feature of Numba is that you can use the `@autojit` decorator on functions written with plain loops rather than array operations, too. Let's try adding it onto our vanilla version:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "@autojit\n", "def laplace2d_vanilla_numba(p, y, dx, dy, l1norm_target):\n", " l1norm = 1\n", " pn = numpy.empty_like(p)\n", " ny, nx = p.shape\n", "\n", " while l1norm > l1norm_target:\n", " pn = p.copy()\n", " \n", " for i in range(1, ny-1):\n", " for j in range(1, nx-1):\n", " p[i,j] = (dy**2*(pn[i+1,j]+pn[i-1,j])+dx**2*(pn[i,j+1]+pn[i,j-1]))/(2*(dx**2+dy**2))\n", " \n", " p[0,0] = (dy**2*(pn[1,0]+pn[-1,0])+dx**2*(pn[0,1]+pn[0,-1]))/(2*(dx**2+dy**2))\n", " p[-1,-1] = (dy**2*(pn[0,-1]+pn[-2,-1])+dx**2*(pn[-1,0]+pn[-1,-2]))/(2*(dx**2+dy**2)) \n", " \n", " p[:,0] = 0\t\t##p = 0 @ x = 0\n", " p[:,-1] = y\t\t##p = y @ x = 2\n", " p[0,:] = p[1,:]\t\t##dp/dy = 0 @ y = 0\n", " p[-1,:] = p[-2,:]\t##dp/dy = 0 @ y = 1\n", " l1norm = (numpy.sum(numpy.abs(p[:])-numpy.abs(pn[:])))/numpy.sum(numpy.abs(pn[:]))\n", " \n", " return p" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 41 }, { "cell_type": "code", "collapsed": false, "input": [ "%%timeit\n", "laplace2d_vanilla_numba(p, y, dx, dy, .00001)" ], "language": "python", 
"metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1 loops, best of 3: 561 us per loop" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n" ] } ], "prompt_number": 42 }, { "cell_type": "markdown", "metadata": {}, "source": [ "561 *micro*-seconds. That's roughly a 57x speedup over the vanilla version; not the 155x increase we saw with NumPy, but all we did was add one line of code. \n", "\n", "So we have:\n", "\n", "Vanilla Python: 32 milliseconds \n", "\n", "NumPy Python: 206 microseconds \n", "\n", "Vanilla + Numba: 561 microseconds\n", "\n", "NumPy + Numba: 137 microseconds\n", "\n", "Clearly the NumPy + Numba combination is the fastest, but the ability to quickly optimize code with nested loops can also come in very handy in certain applications. \n", "\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from IPython.core.display import HTML\n", "def css_styling():\n", " styles = open(\"../styles/custom.css\", \"r\").read()\n", " return HTML(styles)\n", "css_styling()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\n", "\n", "\n", "\n", "\n" ], "metadata": {}, "output_type": "pyout", "prompt_number": 1, "text": [ "" ] } ], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "> (The cell above executes the style for this notebook. We modified a style we found on the GitHub of [CamDavidsonPilon](https://github.com/CamDavidsonPilon), [@Cmrn_DP](https://twitter.com/cmrn_dp).)" ] } ], "metadata": {} } ] }