{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "
\n", "
\n", "

Is it a bird?
Is it a plane?
Accelerating Python with numba

\n", "

Juan Luis Cano Rodríguez <hello@juanlu.space>
2020-04-09 @ PyAmsterdam #StayAtHome

\n", "
\n", "
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Outline\n", "\n", "1. \"Python is slow\"\n", "2. What is numba?\n", "3. Some examples\n", "4. Limitations and workarounds\n", "5. Conclusions" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Who is this guy?\n", "\n", "* **Aerospace Engineer** with a passion for orbits 🛰\n", "* Former chair of the **Python España** non profit and former co-organizer of **PyCon Spain** 🐍\n", "* **Mission Planning & Execution Engineer** at **Satellogic** 🌍\n", "* Free Software advocate and Python enthusiast 🕮\n", "* Hard Rock lover 🎸\n", "\n", "Follow me! https://github.com/astrojuanlu/\n", "\n", "![Me!](img/juanlu_esa.jpg)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# \"Python is slow\"\n", "\n", "## Data structures\n", "\n", "![array vs list](img/array_vs_list.png)\n", "\n", "_(From https://jakevdp.github.io/blog/2014/05/09/why-python-is-slow/_)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Dynamic and interpreted (rather than static and compiled)\n", "\n", "![Four nested loops](img/loops.png)\n", "\n", "_(From https://gist.github.com/Juanlu001/cf19b1c16caf618860fb_)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Introspection\n", "\n", "![\"Why is checking isinstance(something, Mapping) so slow?\"](img/isinstance.png)\n", "\n", "_(From https://stackoverflow.com/q/42378726/554319)_" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## What to do?\n", "\n", "### Vectorization\n", "\n", "* Rewriting some code leveraging high level NumPy functions can make it way faster\n", "* However, this works best for array manipulation - some other algorithms cannot easily be vectorized\n", "* And even if you 
can, vectorized code can be impossible to read\n", "\n", "![Too smart](img/too_smart.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Cython\n", "\n", "![Cython](img/Cython-logo.png)\n", "\n", "* Mature, widely used, effective, gradual - a great project!\n", "* Some personal problems with it:\n", " - I don't know any C, so it's more difficult for me\n", " - I wanted poliastro to be super easy to install by avoiding the \"two language\" problem (this includes Windows)\n", " - The native debugger is broken https://github.com/cython/cython/issues/1717\n", " - I really don't want to worry about some gory details\n", "\n", "I don't have much experience with it, so I don't have solid arguments against it." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### PyPy\n", "\n", "![PyPy](img/pypy-logo.png)\n", "\n", "* PyPy is a super interesting alternative Python implementation https://pypy.org/\n", "* I really really want to use it more, but there are some obstacles:\n", " - The documentation is a bit poor, even the changelogs\n", " - Lacks interest from the mainstream community (including snarky comments by Guido about \"nobody using it in production\")\n", " - Difficult to install libraries with binary extensions, although conda-forge is helping! https://conda-forge.org/status/\n", " - PyPy has several incompatibilities with manylinux1 wheels https://bitbucket.org/pypy/pypy/issues/2617/ although manylinux2010 is taking off!\n", " - At the moment, requires rewriting NumPy-based code to use plain Python lists... 
impractical" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### And many others...\n", "\n", "https://github.com/pfalcon/awesome-python-compilers/\n", "\n", "The number of projects keeps growing" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## All of a sudden, in 2012...\n", "\n", "![numba 0.1 is released](img/tweet-travis.png)\n", "\n", "https://twitter.com/teoliphant/status/235789560678858752" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# What is numba?\n", "\n", "![numba](img/numba.png)\n", "\n", "> Numba is an open source JIT compiler that translates a subset of Python and NumPy code into fast machine code.\n", "\n", "* Latest stable version at the time of writing: 0.48 (0.49.0rc1 was tagged 13 days ago)\n", "* Documentation https://numba.pydata.org/numba-doc/latest/index.html\n", "* BSD-2 License\n", "* Easy to install:\n", "\n", "```\n", "$ pip install numba\n", "$ conda install numba [--channel conda-forge]\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Example 1: Monte Carlo for $\\pi$\n", "\n", "![Monte Carlo pi](img/Pi_30K.gif)\n", "\n", "(From https://commons.wikimedia.org/wiki/File:Pi_30K.gif)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "import random\n", "\n", "def monte_carlo_pi(nsamples):\n", "    acc = 0\n", "    for i in range(nsamples):\n", "        x = random.random()\n", "        y = random.random()\n", "        if (x ** 2 + y ** 2) < 1.0:\n", "            acc += 1\n", "    return 4.0 * acc / nsamples" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "3.56" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ 
"monte_carlo_pi(100)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "3.14984" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "monte_carlo_pi(100_000)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5.38 s ± 708 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" ] } ], "source": [ "%timeit monte_carlo_pi(10_000_000) # Slow!" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "from numba import jit" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "monte_carlo_pi_fast = jit(monte_carlo_pi)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "272 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)\n" ] } ], "source": [ "%timeit -n1 -r1 monte_carlo_pi_fast(100)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "152 ms ± 8.73 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n" ] } ], "source": [ "%timeit monte_carlo_pi_fast(10_000_000) # 40x faster!" 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Example 2: Plate deflection\n", "\n", "$$\n", "w(x, y) = \\frac{4 P_c}{\\pi^4 D L_x L_y} \\sum_{m=1}^\\infty \\sum_{n=1}^\\infty \\frac{\\sin \\frac{m \\pi \\xi}{L_x} \\sin \\frac{n \\pi \\eta}{L_y}}{\\left(\\left(\\frac{m}{L_x}\\right)^2 + \\left(\\frac{n}{L_y}\\right)^2\\right)^2} \\sin \\frac{m \\pi x}{L_x} \\sin \\frac{n \\pi y}{L_y}\n", "$$\n", "\n", "(From http://www.efunda.com/formulae/solid_mechanics/plates/calculators/SSSS_PPoint.cfm)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import numpy as np\n", "from numpy import pi, sin" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "@jit\n", "def a_mn_point(P, a, b, xi, eta, mm, nn):\n", "    \"\"\"Navier series coefficient for concentrated load.\"\"\"\n", "    return 4 * P * sin(mm * pi * xi / a) * sin(nn * pi * eta / b) / (a * b)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "@jit\n", "def plate_displacement(xx, yy, ww, a, b, P, xi, eta, D, max_m, max_n):\n", "    max_i, max_j = ww.shape\n", "    for mm in range(1, max_m):\n", "        for nn in range(1, max_n):\n", "            for ii in range(max_i):\n", "                for jj in range(max_j):\n", "                    a_mn = a_mn_point(P, a, b, xi, eta, mm, nn)\n", "                    ww[ii, jj] += (\n", "                        a_mn\n", "                        / (mm ** 2 / a ** 2 + nn ** 2 / b ** 2) ** 2\n", "                        * sin(mm * pi * xx[ii, jj] / a)\n", "                        * sin(nn * pi * yy[ii, jj] / b)\n", "                        / (pi ** 4 * D)\n", "                    )" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "# Plate geometry\n", "a = 1.0 # m\n", "b = 1.0 # m\n", "h = 50e-3 # m\n", "\n", "# Material properties\n", "E = 69e9 # Pa\n", "nu = 0.35\n", "\n", "# Series terms\n", "max_m = 16\n", "max_n = 16\n", "\n", "# Computation 
points\n", "# NOTE: With an odd number of points the center of the plate is included in\n", "# the grid\n", "NUM_POINTS = 101\n", "\n", "# Load\n", "P = -10e3 # N\n", "xi = a / 2\n", "eta = b / 2\n", "\n", "# Flexural rigidity\n", "D = h**3 * E / (12 * (1 - nu**2))\n", "\n", "# Set up domain\n", "x = np.linspace(0, a, num=NUM_POINTS)\n", "y = np.linspace(0, b, num=NUM_POINTS)\n", "xx, yy = np.meshgrid(x, y)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "23.2 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)\n" ] } ], "source": [ "# Compute displacement field\n", "# NOTE: plate_displacement accumulates into ww, so repeated timing runs keep adding to it\n", "ww = np.zeros_like(xx)\n", "%timeit -n1 -r1 plate_displacement.py_func(xx, yy, ww, a, b, P, xi, eta, D, max_m, max_n)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "132 ms ± 3.11 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" ] } ], "source": [ "%timeit plate_displacement(xx, yy, ww, a, b, P, xi, eta, D, max_m, max_n) # ~175 times faster!" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "# https://github.com/plotly/plotly.py/issues/1664#issuecomment-511773518\n", "import plotly.graph_objects as go\n", "import plotly.io as pio\n", "\n", "# Set default renderer\n", "pio.renderers.default = \"notebook_connected\"\n", "\n", "# Set default template\n", "pio.templates[\"slides\"] = go.layout.Template(layout=dict(width=800, height=550))\n", "pio.templates.default = \"plotly+slides\"" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ " \n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", " \n", " \n", "
\n", " \n", "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fig = go.Figure()\n", "fig.add_surface(z=ww)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Example 3: A ~~shameless self-plug~~ personal achievement\n", "\n", "\"poliastro: An Astrodynamics library written in Python with Fortran performance\" at the 6th International Conference on Astrodynamics Tools and Techniques\n", "\n", "![poliastro benchmark](img/poliastro-benchmark.png)\n", "\n", "https://indico.esa.int/event/111/contributions/393/" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Example 4: C extensions with CFFI!\n", "\n", "We can call C functions exported using CFFI:\n", "\n", "https://web.archive.org/web/20160611082327/https://www.continuum.io/blog/developer-blog/calling-c-libraries-numba-using-cffi" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "```python\n", "from numba import njit, cffi_support\n", "\n", "# See https://www.pybonacci.org/2016/02/07/como-crear-extensiones-en-c-para-python-usando-cffi-y-numba/\n", "# (Spanish, sorry!)\n", "import _hyper\n", "cffi_support.register_module(_hyper)\n", "\n", "_hyp2f1 = _hyper.lib.hyp2f1x # See https://github.com/numba/numba/issues/1688\n", "\n", "\n", "@njit\n", "def hyp2f1(a, b, c, x):\n", " return _hyp2f1(a, b, c, x)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Caveats and limitations ☹️\n", "\n", "![python-sgp4 fail](img/python-sgp4-fail1.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "![python-sgp4 fail](img/python-sgp4-fail2.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## The \"nopython mode\" is the only way\n", "\n", "* Two modes: \"object mode\" and \"nopython 
mode\", only the latter is truly optimized\n", "* Functions JITted in nopython mode can only call other functions in nopython mode\n", "* _Avoid \"object mode\"!_ The automatic fall-back to object mode is deprecated and raises warnings since numba 0.44\n", "\n", "![It's nopython all the way down](img/nopython.jpg)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@jit\n", "def range10():\n", "    l = []\n", "    for x in range(10):\n", "        l.append(x)\n", "    return l\n", "\n", "range10()" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "scrolled": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ ":1: NumbaWarning:\n", "\n", "\n", "Compilation is falling back to object mode WITH looplifting enabled because Function \"reversed_range10\" failed type inference due to: Untyped global name 'reversed': cannot determine Numba type of \n", "\n", "File \"\", line 7:\n", "def reversed_range10():\n", " \n", "\n", " return reversed(l) # innocuous change, but no reversed support in nopython mode\n", " ^\n", "\n", "\n", ":1: NumbaWarning:\n", "\n", "\n", "Compilation is falling back to object mode WITHOUT looplifting enabled because Function \"reversed_range10\" failed type inference due to: cannot determine Numba type of \n", "\n", "File \"\", line 4:\n", "def reversed_range10():\n", " \n", " l = []\n", " for x in range(10):\n", " ^\n", "\n", "\n", "/home/juanlu/.pyenv/versions/3.8.0/envs/numba38/lib/python3.8/site-packages/numba/object_mode_passes.py:177: NumbaWarning:\n", "\n", "Function \"reversed_range10\" was compiled in object mode without forceobj=True, but has lifted loops.\n", "\n", "File \"\", line 3:\n", "def reversed_range10():\n", " l = []\n", " ^\n", "\n", "\n", 
"/home/juanlu/.pyenv/versions/3.8.0/envs/numba38/lib/python3.8/site-packages/numba/object_mode_passes.py:187: NumbaDeprecationWarning:\n", "\n", "\n", "Fall-back from the nopython compilation path to the object mode compilation path has been detected, this is deprecated behaviour.\n", "\n", "For more information visit http://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit\n", "\n", "File \"\", line 3:\n", "def reversed_range10():\n", " l = []\n", " ^\n", "\n", "\n", ":1: NumbaWarning:\n", "\n", "\n", "Compilation is falling back to object mode WITHOUT looplifting enabled because Function \"reversed_range10\" failed type inference due to: non-precise type pyobject\n", "[1] During: typing of argument at (4)\n", "\n", "File \"\", line 4:\n", "def reversed_range10():\n", " \n", " l = []\n", " for x in range(10):\n", " ^\n", "\n", "\n", "/home/juanlu/.pyenv/versions/3.8.0/envs/numba38/lib/python3.8/site-packages/numba/object_mode_passes.py:177: NumbaWarning:\n", "\n", "Function \"reversed_range10\" was compiled in object mode without forceobj=True.\n", "\n", "File \"\", line 4:\n", "def reversed_range10():\n", " \n", " l = []\n", " for x in range(10):\n", " ^\n", "\n", "\n", "/home/juanlu/.pyenv/versions/3.8.0/envs/numba38/lib/python3.8/site-packages/numba/object_mode_passes.py:187: NumbaDeprecationWarning:\n", "\n", "\n", "Fall-back from the nopython compilation path to the object mode compilation path has been detected, this is deprecated behaviour.\n", "\n", "For more information visit http://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit\n", "\n", "File \"\", line 4:\n", "def reversed_range10():\n", " \n", " l = []\n", " for x in range(10):\n", " ^\n", "\n", "\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@jit\n", "def 
reversed_range10():\n", "    l = []\n", "    for x in range(10):\n", "        l.append(x)\n", "\n", "    return reversed(l) # innocuous change, but no reversed support in nopython mode\n", "\n", "reversed_range10()" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@jit(nopython=True)\n", "def reversed_range10():\n", "    l = []\n", "    for x in range(10):\n", "        l.append(x)\n", "\n", "    return l[::-1] # slicing is supported in nopython mode, unlike reversed()\n", "\n", "reversed_range10()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Passing functions as arguments is _slow_\n", "\n", "* Since numba 0.38, JITted functions can be passed as arguments, but calling them this way is even slower than not JITting at all https://github.com/numba/numba/issues/2952\n", "* Arguably the most important blocker to writing reusable numba code" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from numba import njit" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "@njit\n", "def func(x):\n", "    return x**3 - 1\n", "\n", "@njit\n", "def fprime(x):\n", "    return 3 * x**2" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "@njit\n", "def njit_newton(func, x0, fprime):\n", "    for _ in range(50):\n", "        fder = fprime(x0)\n", "        fval = func(x0)\n", "        newton_step = fval / fder\n", "        x = x0 - newton_step\n", "        if abs(x - x0) < 1.48e-8:\n", "            return x\n", "        x0 = x" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "name": 
"stdout", "output_type": "stream", "text": [ "56.5 µs ± 19.5 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n", "4.93 µs ± 749 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n" ] } ], "source": [ "%timeit njit_newton(func, 1.5, fprime)\n", "%timeit njit_newton.py_func(func, 1.5, fprime=fprime)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "With a smart combination of closures and caching we can implement a workaround:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from functools import lru_cache" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "@lru_cache()\n", "def newton_generator(func, fprime):\n", " @njit\n", " def njit_newton_final(x0):\n", " for _ in range(50):\n", " fder = fprime(x0)\n", " fval = func(x0)\n", " newton_step = fval / fder\n", " x = x0 - newton_step\n", " if abs(x - x0) < 1.48e-8:\n", " return x\n", " x0 = x\n", "\n", " return njit_newton_final" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "528 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)\n" ] } ], "source": [ "%timeit -n1 -r1 newton_generator(func, fprime)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2.24 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)\n" ] } ], "source": [ "%timeit -n1 -r1 newton_generator(func, fprime)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "102 ms ± 0 ns per loop (mean ± std. dev. 
of 1 run, 1 loop each)\n" ] } ], "source": [ "newton_func = newton_generator(func, fprime)\n", "%timeit -n1 -r1 newton_func(1.5)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "307 ns ± 18.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n" ] } ], "source": [ "%timeit newton_func(1.5)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "211 ns ± 7.24 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n" ] } ], "source": [ "%timeit newton_generator(func, fprime)" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "584 ns ± 26 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n" ] } ], "source": [ "def newton(func, x0, fprime):\n", "    return newton_generator(func, fprime)(x0)\n", "\n", "%timeit newton(func, 1.5, fprime)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## NumPy arrays and nothing else\n", "\n", "![Two layers](img/two-layers.png)\n", "\n", "High level API:\n", "\n", "* Supports complex data structures (e.g. `astropy.units` or `pint`, NumPy extensions for physical units)\n", "* Converts the inputs to normalized, simple structures (numbers and arrays) that numba understands\n", "\n", "Dangerous™ algorithms:\n", "\n", "* Fast (easy to accelerate with `numba.njit`)\n", "* Only care about numbers and make assumptions about their inputs" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Tips and tricks\n", "\n", "* If anything fails, `export NUMBA_DISABLE_JIT=1` to disable compilation and debug in pure Python\n", "* Be careful with dtypes! 
https://github.com/numba/numba/issues/3993#issuecomment-485029668\n", "* Try to split the function into smaller chunks until you find out what triggers object (slow) mode\n", "* You might have to rewrite some code to make it less dynamic\n", "* Keep an eye on the lists of supported Python and NumPy features: https://numba.pydata.org/numba-doc/dev/reference/pysupported.html and https://numba.pydata.org/numba-doc/dev/reference/numpysupported.html" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Not covered in this talk\n", "\n", "* Ahead-of-time (AOT) compilation\n", "* Interface with GPUs (CUDA, ROCm)\n", "* Releasing the GIL\n", "\n", "Check out the documentation! https://numba.pydata.org/numba-doc/latest/" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Conclusions\n", "\n", "* Numba is ✨ _awesome_ ✨ when you make it work!\n", "* It still requires some rewriting, but the result is still _mostly pythonic_ Python\n", "* For non-numerical code, you will probably have to find something else" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Per Python ad astra! 
🚀\n", "\n", "* Slides https://github.com/astrojuanlu/talk-numba\n", "* My email <hello@juanlu.space>\n", "* My Twitter https://twitter.com/poliastro_py\n", "\n", "![Vega launch](img/vega.jpg)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Backup slides" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Comparison of solutions\n", "\n", "| Project | Pros | Cons |\n", "|--------|-----------------------------------------|-----------------------------------------------------------------------------------------------|\n", "| NumPy | Powerful, omnipresent | Vectorized code is sometimes difficult to read<sup>1</sup>; if you can't vectorize you are out of luck |\n", "| Cython | Gradual, effective, widely used, mature | Tricky if you don't know any C; couldn't make the native debugger work<sup>2</sup> |\n", "| PyPy | General purpose | C extensions still very slow, no wheels on PyPI<sup>3</sup> |\n", "| Numba | Simplest, very effective | Only numerical code, needs special care |\n", "\n", "And many others: Pythran, Nuitka, mypyc...\n", "\n", "<sup>1</sup>Check out \"Integration with the vernacular\", by James Powell https://pyvideo.org/pydata-london-2015/integration-with-the-vernacular.html\n", "\n", "<sup>2</sup>https://github.com/cython/cython/issues/2699\n", "\n", "<sup>3</sup>See https://github.com/antocuni/pypy-wheels for a half-baked effort. Perhaps the future will be brighter with the new manylinux2010 specification? https://bitbucket.org/pypy/pypy/issues/2617/pypy-binary-is-linked-to-too-much-stuff, https://github.com/pypa/manylinux/issues/179" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.0" } }, "nbformat": 4, "nbformat_minor": 2 }