{ "metadata": { "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "raw", "metadata": {}, "source": [ "Texto y c\u00f3digo sujeto bajo Creative Commons Attribution license, CC-BY-SA. (c) Original por Lorena A. Barba y Gilbert Forsyth en 2013, traducido por F.J. Navarro-Brull para CAChemE.org" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[@LorenaABarba](https://twitter.com/LorenaABarba) - \n", "[@CAChemEorg](https://twitter.com/cachemeorg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Este notebook complementa el primer m\u00f3dulo interectivo online de [CFD con Python](https://bitbucket.org/cfdpython/cfd-python-class) y clases dadas por la profesora Lorena A. Barba, llamadas **12 Pasos para Navier-Stokes**. En particular, abordaremos la cuesti\u00f3n del trabajar con Python en alto rendimiento." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Optimizando bucles con Numba\n", "----\n", "***" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Recordar\u00e1s nuestra exploraci\u00f3n [Operaciones matriciales (arrays) con NumPy](http://nbviewer.ipython.org/urls/bitbucket.org/cfdpython/cfd-python-class/raw/master/lessons/06%2520-%2520Array%2520Operations%2520with%2520NumPy.ipynb) donde se mostr\u00f3 como existen grandes incrementos de velocidad en la aplicaci\u00f3n de nuestras discretizaciones mediante operaciones de arrays NumPy optimizadas (en lugar de muchos bucles anidados).\n", "\n", "[Numba](http://numba.pydata.org/) es una herramienta que ofrece otro enfoque para la optimizaci\u00f3n de nuestro c\u00f3digo Python. Numba es una librer\u00eda de Python que convierte las funciones de Python en funciones compiladas en estilo C usando LLVM. Dependiendo del c\u00f3digo original y el tama\u00f1o del problema, Numba puede proporcionar una aceleraci\u00f3n significativa sobre c\u00f3digo optimizado con NumPy.\n", "\n", "Vamos a revisar la Ecuaci\u00f3n de Laplace en 2D:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from mpl_toolkits.mplot3d import Axes3D\n", "from matplotlib import cm\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "\n", "## Declaraci\u00f3n de variables\n", "nx = 81\n", "ny = 81\n", "c = 1\n", "dx = 2.0/(nx-1)\n", "dy = 2.0/(ny-1)\n", "\n", "## Condiciones iniciales\n", "p = np.zeros((ny,nx)) ##create a XxY vector of 0's\n", "\n", "## Gu\u00edas de la figura (malla)\n", "x = np.linspace(0,2,nx)\n", "y = np.linspace(0,1,ny)\n", "\n", "## Condiciones de frontera\n", "p[:,0] = 0\t\t##p = 0 @ x = 0\n", "p[:,-1] = y\t\t##p = y @ x = 2\n", "p[0,:] = p[1,:]\t\t##dp/dy = 0 @ y = 0\n", "p[-1,:] = p[-2,:]\t##dp/dy = 0 @ y = 1\n" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Esta es la funci\u00f3n para iterar sobre la Ecuaci\u00f3n de Laplace que escribimos en el Paso 9:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def laplace2d(p, y, dx, dy, l1norm_target):\n", " l1norm = 1\n", " pn = np.empty_like(p)\n", "\n", " while l1norm > l1norm_target:\n", " pn[:] = p[:]\n", " p[1:-1,1:-1] = (dy**2*(pn[2:,1:-1]+pn[0:-2,1:-1])+dx**2*(pn[1:-1,2:]+pn[1:-1,0:-2]))/(2*(dx**2+dy**2)) \n", " p[0,0] = (dy**2*(pn[1,0]+pn[-1,0])+dx**2*(pn[0,1]+pn[0,-1]))/(2*(dx**2+dy**2))\n", " p[-1,-1] = (dy**2*(pn[0,-1]+pn[-2,-1])+dx**2*(pn[-1,0]+pn[-1,-2]))/(2*(dx**2+dy**2)) \n", " \n", " p[:,0] = 0\t\t##p = 0 @ x = 0\n", " p[:,-1] = y\t\t##p = y @ x = 2\n", " p[0,:] = p[1,:]\t\t##dp/dy = 0 @ y = 0\n", " p[-1,:] = p[-2,:]\t##dp/dy = 0 @ y = 1\n", " l1norm = (np.sum(np.abs(p[:])-np.abs(pn[:])))/np.sum(np.abs(pn[:]))\n", " \n", " return p" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Usemos el comando `%%timeit` (cell-magic) para ver lo r\u00e1pido que se ejecuta:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%%timeit\n", "laplace2d(p, y, dx, dy, .00001)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1 loops, best of 3: 206 us per loop\n" ] } ], "prompt_number": 28 }, { "cell_type": "markdown", "metadata": {}, "source": [ "\u00a1Vale! Nuestra funci\u00f3n `laplace2d` tarda alrededor de 206 *micro*-segundos en completarse. Eso es muy r\u00e1pido y tenemos que agradec\u00e9rselo a las operaciones con arrays. Vamos a echar un vistazo al tiempo que tarda el uso de una versi\u00f3n Python m\u00e1s 'pura'." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def laplace2d_vanilla(p, y, dx, dy, l1norm_target):\n", " l1norm = 1\n", " pn = np.empty_like(p)\n", " nx, ny = len(y), len(y)\n", "\n", " while l1norm > l1norm_target:\n", " pn[:] = p[:]\n", " \n", " for i in range(1, nx-1):\n", " for j in range(1, ny-1):\n", " p[i,j] = (dy**2*(pn[i+1,j]+pn[i-1,j])+dx**2*(pn[i,j+1]-pn[i,j-1]))/(2*(dx**2+dy**2))\n", " \n", " p[0,0] = (dy**2*(pn[1,0]+pn[-1,0])+dx**2*(pn[0,1]+pn[0,-1]))/(2*(dx**2+dy**2))\n", " p[-1,-1] = (dy**2*(pn[0,-1]+pn[-2,-1])+dx**2*(pn[-1,0]+pn[-1,-2]))/(2*(dx**2+dy**2)) \n", " \n", " p[:,0] = 0\t\t##p = 0 @ x = 0\n", " p[:,-1] = y\t\t##p = y @ x = 2\n", " p[0,:] = p[1,:]\t\t##dp/dy = 0 @ y = 0\n", " p[-1,:] = p[-2,:]\t##dp/dy = 0 @ y = 1\n", " l1norm = (np.sum(np.abs(p[:])-np.abs(pn[:])))/np.sum(np.abs(pn[:]))\n", " \n", " return p" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 29 }, { "cell_type": "code", "collapsed": false, "input": [ "%%timeit\n", "laplace2d_vanilla(p, y, dx, dy, .00001)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "10 loops, best of 3: 32 ms per loop\n" ] } ], "prompt_number": 30 }, { "cell_type": "markdown", "metadata": {}, "source": [ "La versi\u00f3n simple Python tarda 32 *mili*-segundos en completarse. Vamos a calcular el aumento de velocidad que ganamos en el uso de las operaciones array:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "32*1e-3/(206*1e-6)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 35, "text": [ "155.33980582524273" ] } ], "prompt_number": 35 }, { "cell_type": "markdown", "metadata": {}, "source": [ "As\u00ed NumPy nos da un aumento de velocidad 155x sobre el c\u00f3digo normal de Python. Dicho esto, a veces poniendo en pr\u00e1ctica nuestras discretizaciones en operaciones con arrays (vectorizadas) puede ser un poco complicado.\n", "\n", "Vamos a ver lo que puede hacer Numba. Empecemos con la importaci\u00f3n de la funci\u00f3n especial `autojit` de la biblioteca `numba`:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from numba import autojit" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Para integrar Numba con nuestra funci\u00f3n existente, todo lo que tenemos que hacer es anteponer el comando `@` autojit antes de nuestra declaraci\u00f3n de `def`:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "@autojit\n", "def laplace2d_numba(p, y, dx, dy, l1norm_target):\n", " l1norm = 1\n", " pn = np.empty_like(p)\n", "\n", " while l1norm > l1norm_target:\n", " pn[:] = p[:]\n", " p[1:-1,1:-1] = (dy**2*(pn[2:,1:-1]+pn[0:-2,1:-1])+dx**2*(pn[1:-1,2:]+pn[1:-1,0:-2]))/(2*(dx**2+dy**2)) \n", " p[0,0] = (dy**2*(pn[1,0]+pn[-1,0])+dx**2*(pn[0,1]+pn[0,-1]))/(2*(dx**2+dy**2))\n", " p[-1,-1] = (dy**2*(pn[0,-1]+pn[-2,-1])+dx**2*(pn[-1,0]+pn[-1,-2]))/(2*(dx**2+dy**2)) \n", " \n", " p[:,0] = 0\t\t##p = 0 @ x = 0\n", " p[:,-1] = y\t\t##p = y @ x = 2\n", " p[0,:] = p[1,:]\t\t##dp/dy = 0 @ y = 0\n", " p[-1,:] = p[-2,:]\t##dp/dy = 0 @ y = 1\n", " l1norm = (np.sum(np.abs(p[:])-np.abs(pn[:])))/np.sum(np.abs(pn[:]))\n", " \n", " return p" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 38 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Las \u00fanicas l\u00edneas que han cambiado son la l\u00ednea `@` autojit y tambi\u00e9n el nombre de la funci\u00f3n, que se ha cambiado para que podamos comparar el rendimiento. Ahora veamos lo que sucede:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%%timeit\n", "laplace2d_numba(p, y, dx, dy, .00001)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1 loops, best of 3: 137 us per loop" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n" ] } ], "prompt_number": 39 }, { "cell_type": "markdown", "metadata": {}, "source": [ "\u00a1Ok! As\u00ed que no es un aumento de velocidad 155x como vimos entre la versi\u00f3n pura de Python y NumPy, pero es una ganancia nada trivial de rendimiento, especialmente teniendo en cuenta lo f\u00e1cil que era ponerla en pr\u00e1ctica. Otra caracter\u00edstica interesante de Numba es que se puede usar el comando `@autojit` en funciones de operaci\u00f3n no-array, tambi\u00e9n. Vamos a tratar de agregarlo a nuestra versi\u00f3n de Python pura:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "@autojit\n", "def laplace2d_vanilla_numba(p, y, dx, dy, l1norm_target):\n", " l1norm = 1\n", " pn = np.empty_like(p)\n", " nx, ny = len(y), len(y)\n", "\n", " while l1norm > l1norm_target:\n", " pn[:] = p[:]\n", " \n", " for i in range(1, nx-1):\n", " for j in range(1, ny-1):\n", " p[i,j] = (dy**2*(pn[i+1,j]+pn[i-1,j])+dx**2*(pn[i,j+1]-pn[i,j-1]))/(2*(dx**2+dy**2))\n", " \n", " p[0,0] = (dy**2*(pn[1,0]+pn[-1,0])+dx**2*(pn[0,1]+pn[0,-1]))/(2*(dx**2+dy**2))\n", " p[-1,-1] = (dy**2*(pn[0,-1]+pn[-2,-1])+dx**2*(pn[-1,0]+pn[-1,-2]))/(2*(dx**2+dy**2)) \n", " \n", " p[:,0] = 0\t\t##p = 0 @ x = 0\n", " p[:,-1] = y\t\t##p = y @ x = 2\n", " p[0,:] = p[1,:]\t\t##dp/dy = 0 @ y = 0\n", " p[-1,:] = p[-2,:]\t##dp/dy = 0 @ y = 1\n", " l1norm = (np.sum(np.abs(p[:])-np.abs(pn[:])))/np.sum(np.abs(pn[:]))\n", " \n", " return p" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 41 }, { "cell_type": "code", "collapsed": false, "input": [ "%%timeit\n", "laplace2d_vanilla_numba(p, y, dx, dy, .00001)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1 loops, best of 3: 561 us per loop" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n" ] } ], "prompt_number": 42 }, { "cell_type": "markdown", "metadata": {}, "source": [ "561 micro-ssegundos. Eso no llega al 155x que vimos con NumPy, pero est\u00e1 cerca. Y todo lo que hicimos fue agregar una l\u00ednea de c\u00f3digo.\n", "\n", "As\u00ed, tenemos:\n", "\n", "Puro (Vanilla) Python: 32 milisegundos\n", "\n", "NumPy Python: 206 microsegundos\n", "\n", "Vanilla + Numba: 561 microsegundos\n", "\n", "NumPy + Numba: 137 microsegundos\n", "\n", "Es evidente que la combinaci\u00f3n Numba + NumPy es la m\u00e1s r\u00e1pida, pero la capacidad de optimizar r\u00e1pidamente c\u00f3digo con bucles anidados tambi\u00e9n puede ser muy \u00fatil en ciertas aplicaciones." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from IPython.core.display import HTML\n", "def css_styling():\n", " styles = open(\"../styles/custom.css\", \"r\").read()\n", " return HTML(styles)\n", "css_styling()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\n", "" ], "output_type": "pyout", "prompt_number": 37, "text": [ "" ] } ], "prompt_number": 37 }, { "cell_type": "markdown", "metadata": {}, "source": [ "> (La celda de arriba establece el formato de este notebook) Tomado originalmente de [CamDavidsonPilon](https://github.com/CamDavidsonPilon), [@Cmrn_DP](https://twitter.com/cmrn_dp).)" ] } ], "metadata": {} } ] }