{
"metadata": {
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "raw",
"metadata": {},
"source": [
"Text provided under a Creative Commons Attribution license, CC-BY. All code is made available under the FSF-approved MIT license. (c) Lorena A. Barba, Gilbert F. Forsyth 2015. Thanks to NSF for support via CAREER award #1149784."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[@LorenaABarba](https://twitter.com/LorenaABarba)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook complements the [interactive CFD online](https://bitbucket.org/cfdpython/cfd-python-class/overview) module **12 steps to Navier-Stokes**, addressing the issue of high performance with Python."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optimizing Loops with Numba\n",
"----\n",
"***"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You will recall from our exploration of [array operations with NumPy](http://nbviewer.ipython.org/urls/github.com/barbagroup/CFDPython/blob/master/lessons/06_Array_Operations_with_NumPy.ipynb) that there are large speed gains to be had from implementing our discretizations using NumPy-optimized array operations instead of many nested loops. \n",
"\n",
"[Numba](http://numba.pydata.org/) is a tool that offers another approach to optimizing our Python code. Numba is a library for Python which turns Python functions into C-style compiled functions using LLVM. Depending on the original code and the size of the problem, Numba can provide a significant speedup over NumPy optimized code.\n",
"\n",
"Let's revisit the 2D Laplace Equation:"
]
},
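{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a reminder, the equation we are solving is\n",
"\n",
"$$\\frac{\\partial^2 p}{\\partial x^2} + \\frac{\\partial^2 p}{\\partial y^2} = 0$$\n",
"\n",
"and discretizing it with second-order central differences, then solving for $p_{i,j}$, gives the update formula used in the `laplace2d` function below:\n",
"\n",
"$$p_{i,j} = \\frac{\\Delta y^2\\left(p_{i+1,j} + p_{i-1,j}\\right) + \\Delta x^2\\left(p_{i,j+1} + p_{i,j-1}\\right)}{2\\left(\\Delta x^2 + \\Delta y^2\\right)}$$"
]
},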
{
"cell_type": "code",
"collapsed": false,
"input": [
"from mpl_toolkits.mplot3d import Axes3D\n",
"from matplotlib import cm\n",
"from matplotlib import pyplot\n",
"import numpy\n",
"\n",
"##variable declarations\n",
"nx = 81\n",
"ny = 81\n",
"c = 1\n",
"dx = 2.0/(nx-1)\n",
"dy = 2.0/(ny-1)\n",
"\n",
"##initial conditions\n",
"p = numpy.zeros((ny,nx)) ##create a XxY vector of 0's\n",
"\n",
"##plotting aids\n",
"x = numpy.linspace(0,2,nx)\n",
"y = numpy.linspace(0,1,ny)\n",
"\n",
"##boundary conditions\n",
"p[:,0] = 0\t\t##p = 0 @ x = 0\n",
"p[:,-1] = y\t\t##p = y @ x = 2\n",
"p[0,:] = p[1,:]\t\t##dp/dy = 0 @ y = 0\n",
"p[-1,:] = p[-2,:]\t##dp/dy = 0 @ y = 1\n"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 27
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is the function for iterating over the Laplace Equation that we wrote in Step 9:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def laplace2d(p, y, dx, dy, l1norm_target):\n",
" l1norm = 1\n",
" pn = numpy.empty_like(p)\n",
"\n",
" while l1norm > l1norm_target:\n",
" pn = p.copy()\n",
" p[1:-1,1:-1] = (dy**2*(pn[2:,1:-1]+pn[0:-2,1:-1])+dx**2*(pn[1:-1,2:]+pn[1:-1,0:-2]))/(2*(dx**2+dy**2)) \n",
" p[0,0] = (dy**2*(pn[1,0]+pn[-1,0])+dx**2*(pn[0,1]+pn[0,-1]))/(2*(dx**2+dy**2))\n",
" p[-1,-1] = (dy**2*(pn[0,-1]+pn[-2,-1])+dx**2*(pn[-1,0]+pn[-1,-2]))/(2*(dx**2+dy**2)) \n",
" \n",
" p[:,0] = 0\t\t##p = 0 @ x = 0\n",
" p[:,-1] = y\t\t##p = y @ x = 2\n",
" p[0,:] = p[1,:]\t\t##dp/dy = 0 @ y = 0\n",
" p[-1,:] = p[-2,:]\t##dp/dy = 0 @ y = 1\n",
" l1norm = (numpy.sum(np.abs(p[:])-np.abs(pn[:])))/np.sum(np.abs(pn[:]))\n",
" \n",
" return p"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 17
},
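{
"cell_type": "markdown",
"metadata": {},
"source": [
"The loop runs until the relative change in the $L^1$ norm between two successive iterates falls below `l1norm_target`; on each sweep, the code above computes\n",
"\n",
"$$\\frac{\\sum \\left|p^{n+1}\\right| - \\sum \\left|p^{n}\\right|}{\\sum \\left|p^{n}\\right|}$$\n",
"\n",
"and compares it against the target."
]
},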
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's use the `%%timeit` cell-magic to see how fast it runs:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%timeit\n",
"laplace2d(p, y, dx, dy, .00001)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1 loops, best of 3: 206 us per loop\n"
]
}
],
"prompt_number": 28
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ok! Our function `laplace2d` takes around 206 *micro*-seconds to complete. That's pretty fast and we have our array operations to thank for that. Let's take a look at how long it takes using a more 'vanilla' Python version. "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def laplace2d_vanilla(p, y, dx, dy, l1norm_target):\n",
" l1norm = 1\n",
" pn = numpy.empty_like(p)\n",
" nx, ny = len(y), len(y)\n",
"\n",
" while l1norm > l1norm_target:\n",
" pn = p.copy()\n",
" \n",
" for i in range(1, nx-1):\n",
" for j in range(1, ny-1):\n",
" p[i,j] = (dy**2*(pn[i+1,j]+pn[i-1,j])+dx**2*(pn[i,j+1]-pn[i,j-1]))/(2*(dx**2+dy**2))\n",
" \n",
" p[0,0] = (dy**2*(pn[1,0]+pn[-1,0])+dx**2*(pn[0,1]+pn[0,-1]))/(2*(dx**2+dy**2))\n",
" p[-1,-1] = (dy**2*(pn[0,-1]+pn[-2,-1])+dx**2*(pn[-1,0]+pn[-1,-2]))/(2*(dx**2+dy**2)) \n",
" \n",
" p[:,0] = 0\t\t##p = 0 @ x = 0\n",
" p[:,-1] = y\t\t##p = y @ x = 2\n",
" p[0,:] = p[1,:]\t\t##dp/dy = 0 @ y = 0\n",
" p[-1,:] = p[-2,:]\t##dp/dy = 0 @ y = 1\n",
" l1norm = (numpy.sum(np.abs(p[:])-np.abs(pn[:])))/np.sum(np.abs(pn[:]))\n",
" \n",
" return p"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 29
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%timeit\n",
"laplace2d_vanilla(p, y, dx, dy, .00001)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"10 loops, best of 3: 32 ms per loop\n"
]
}
],
"prompt_number": 30
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The simple Python version takes 32 *milli*-seconds to complete. Let's calculate the speedup we gained in using array operations:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"32*1e-3/(206*1e-6)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 35,
"text": [
"155.33980582524273"
]
}
],
"prompt_number": 35
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So NumPy gives us a 155x speed increase over regular Python code. That said, sometimes implementing our discretizations in array operations can be a little bit tricky. \n",
"\n",
"Let's see what Numba can do. We'll start by importing the special function decorator `autojit` from the `numba` library:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from numba import autojit"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 36
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To integrate Numba with our existing function, all we have to do it is prepend the `@autojit` function decorator before our `def` statement: "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"@autojit\n",
"def laplace2d_numba(p, y, dx, dy, l1norm_target):\n",
" l1norm = 1\n",
" pn = numpy.empty_like(p)\n",
"\n",
" while l1norm > l1norm_target:\n",
" pn = p.copy()\n",
" p[1:-1,1:-1] = (dy**2*(pn[2:,1:-1]+pn[0:-2,1:-1])+dx**2*(pn[1:-1,2:]+pn[1:-1,0:-2]))/(2*(dx**2+dy**2)) \n",
" p[0,0] = (dy**2*(pn[1,0]+pn[-1,0])+dx**2*(pn[0,1]+pn[0,-1]))/(2*(dx**2+dy**2))\n",
" p[-1,-1] = (dy**2*(pn[0,-1]+pn[-2,-1])+dx**2*(pn[-1,0]+pn[-1,-2]))/(2*(dx**2+dy**2)) \n",
" \n",
" p[:,0] = 0\t\t##p = 0 @ x = 0\n",
" p[:,-1] = y\t\t##p = y @ x = 2\n",
" p[0,:] = p[1,:]\t\t##dp/dy = 0 @ y = 0\n",
" p[-1,:] = p[-2,:]\t##dp/dy = 0 @ y = 1\n",
" l1norm = (numpy.sum(np.abs(p[:])-np.abs(pn[:])))/np.sum(np.abs(pn[:]))\n",
" \n",
" return p"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 38
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The only lines that have changed are the `@autojit` line and also the function name, which has been changed so we can compare performance. Now let's see what happens:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%timeit\n",
"laplace2d_numba(p, y, dx, dy, .00001)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1 loops, best of 3: 137 us per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n"
]
}
],
"prompt_number": 39
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ok! So it's not a 155x speed increase like we saw between vanilla Python and NumPy, but it is a non-trivial gain in performance time, especially given how easy it was to implement. Another cool feature of Numba is that you can use the `@autojit` decorator on non-array operation functions, too. Let's try adding it onto our vanilla version:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"@autojit\n",
"def laplace2d_vanilla_numba(p, y, dx, dy, l1norm_target):\n",
" l1norm = 1\n",
" pn = numpy.empty_like(p)\n",
" nx, ny = len(y), len(y)\n",
"\n",
" while l1norm > l1norm_target:\n",
" pn = p.copy()\n",
" \n",
" for i in range(1, nx-1):\n",
" for j in range(1, ny-1):\n",
" p[i,j] = (dy**2*(pn[i+1,j]+pn[i-1,j])+dx**2*(pn[i,j+1]-pn[i,j-1]))/(2*(dx**2+dy**2))\n",
" \n",
" p[0,0] = (dy**2*(pn[1,0]+pn[-1,0])+dx**2*(pn[0,1]+pn[0,-1]))/(2*(dx**2+dy**2))\n",
" p[-1,-1] = (dy**2*(pn[0,-1]+pn[-2,-1])+dx**2*(pn[-1,0]+pn[-1,-2]))/(2*(dx**2+dy**2)) \n",
" \n",
" p[:,0] = 0\t\t##p = 0 @ x = 0\n",
" p[:,-1] = y\t\t##p = y @ x = 2\n",
" p[0,:] = p[1,:]\t\t##dp/dy = 0 @ y = 0\n",
" p[-1,:] = p[-2,:]\t##dp/dy = 0 @ y = 1\n",
" l1norm = (numpy.sum(np.abs(p[:])-np.abs(pn[:])))/np.sum(np.abs(pn[:]))\n",
" \n",
" return p"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 41
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%timeit\n",
"laplace2d_vanilla_numba(p, y, dx, dy, .00001)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1 loops, best of 3: 561 us per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n"
]
}
],
"prompt_number": 42
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"561 micro-seconds. That's not quite the 155x increase we saw with NumPy, but it's close. And all we did was add one line of code. \n",
"\n",
"So we have:\n",
"\n",
"Vanilla Python: 32 milliseconds \n",
"\n",
"NumPy Python: 206 microseconds \n",
"\n",
"Vanilla + Numba: 561 microseconds\n",
"\n",
"NumPy + Numba: 137 microseconds\n",
"\n",
"Clearly the NumPy + Numba combination is the fastest, but the ability to quickly optimize code with nested loops can also come in very handy in certain applications. \n",
"\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from IPython.core.display import HTML\n",
"def css_styling():\n",
" styles = open(\"../styles/custom.css\", \"r\").read()\n",
" return HTML(styles)\n",
"css_styling()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 1,
"text": [
""
]
}
],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> (The cell above executes the style for this notebook. We modified a style we found on the GitHub of [CamDavidsonPilon](https://github.com/CamDavidsonPilon), [@Cmrn_DP](https://twitter.com/cmrn_dp).)"
]
}
],
"metadata": {}
}
]
}