{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# The need for speed without bothering too much: An introduction to `numba`\n", "\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Do you really need the speed?\n", "\n", "* Write your Python program.\n", "* Ensure it executes correctly and does what it is supposed to.\n", "* Is it fast enough?\n", "* If yes: Ignore the rest of the presentation.\n", "* If no:\n", "    1. Get it right.\n", "    2. Test it's right.\n", "    3. Profile if slow.\n", "    4. Optimise (C, C++, or Cython/`numba` and save yourself the pain).\n", "    5. Repeat from 2.\n", "\n", "> We *should forget* about small efficiencies, say about 97% of the time: **premature optimization is the root of all evil**.\n", "\n", "> Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only **after** that code has been identified.\n", "\n", "Donald Knuth" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# The need for speed\n", "\n", "* For many programs, the most important resource is **developer time**.\n", "* The best code is:\n", "    * Easy to understand.\n", "    * Easy to modify.\n", "* But sometimes execution speed matters. Then what do you do?\n", "* Go find a compiler!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# A Python compiler?\n", "\n", "* Takes advantage of a simple fact:\n", "    * Most functions in your program only use a small number of types.\n", "* → Generate machine code to manipulate only the types you use!\n", "* The LLVM (Low Level Virtual Machine) library already implements a compiler backend.\n", "    * It is used to construct, optimize and produce intermediate and/or binary machine code.\n", "    * A compiler framework, where you provide the \"front end\" (parser and lexer) and the \"back end\" (code that converts LLVM's representation to actual machine code).\n", "    * Multi-platform.\n", "    * LLVM optimizations (inlining, loop unrolling, SIMD vectorization, etc.)." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# How can `numba` help?\n", "\n", "* If you have big `numpy` arrays with your data (remember `pandas` uses `numpy` under the covers), `numba` makes it easy to write simple, fast functions that work with that data.\n", "* `numba` is an open source JIT (Just-In-Time) compiler for Python functions.\n", "* From the types of the function arguments, `numba` can often generate a specialized, fast, machine code implementation at runtime.\n", "* Designed to work best with numerical code and `numpy` arrays.\n", "* Uses the LLVM library as the compiler backend.\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# `numba` features\n", "\n", "* Numba generates optimized machine code from pure Python code using the LLVM compiler infrastructure. With a few simple annotations, array-oriented and math-heavy Python code can be JIT compiled to performance similar to that of C, C++ and Fortran, without having to switch languages or Python interpreters.\n", "* `numba` supports:\n", "    * Windows, OSX and Linux.\n", "    * 32- and 64-bit CPUs and NVIDIA GPUs (CUDA).\n", "    * `numpy`\n", "* Does *not* require any C/C++ compiler.\n", "* Does *not* replace the standard Python interpreter (all Python libraries are still available).\n", "* Easy to install (`conda` is no longer required): `pip install numba` (wheels for Windows/Linux/OSX are available, no need to compile anything)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# How `numba` works\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# `numba` modes of compilation\n", "\n", "* **object mode**: Compiled code operates on Python objects. Supports nearly all of Python, but generally cannot speed up code by a large factor. 
The only significant improvement is the compilation of loops that can be compiled in *nopython* mode.\n", "    * In object mode, Numba will attempt to perform *loop lifting*, i.e. extract loops and compile them in *nopython* mode.\n", "    * Works great for functions that are bookended by uncompilable code, but have a compilable core loop.\n", "    * All happens automatically.\n", "* **nopython mode**: Compiled code operates on native machine data. Supports a subset of Python, but runs close to C/C++/FORTRAN speed." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# **nopython** mode features\n", "\n", "* Standard control and looping structures: `if`, `else`, `while`, `for`, `range`.\n", "* `numpy` arrays, int, float, complex, booleans, and tuples.\n", "* Almost all arithmetic, logical, and bitwise operators as well as functions from the `math` and `numpy` modules.\n", "* Nearly all `numpy` dtypes: `int`, `float`, `complex`, `datetime64`, `timedelta64`.\n", "* Array element access (read and write).\n", "* Array reduction functions: `sum`, `prod`, `max`, `min`, etc.\n", "* Calling other `nopython` mode compiled functions.\n", "* Calling `ctypes` or `cffi` wrapped external functions." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Supported Python features\n", "\n", "* Built-in types: `int`, `bool`, `float`, `complex`, `bytes`.\n", "* Container types: generators, lists, tuples and sets (with some restrictions).\n", "* Built-in functions: most built-in functions are supported, with some restrictions.\n", "* Standard lib modules: `array` (limited support), `cmath`, `collections`, `ctypes`, `enum`, `math`, `operator`, `functools`, `random`.\n", "* Third-party modules: `cffi`. As with `ctypes`, `numba` is able to call into `cffi`-declared external functions." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Supported `numpy` features\n", "\n", "* `numba` integrates seamlessly with `numpy`, whose arrays provide efficient storage for homogeneous sets of data and whose *dtypes* provide type information useful when compiling.\n", "* `numba` understands calls to `numpy` *ufuncs* and is able to generate equivalent native code for many of them.\n", "* `numpy` arrays are directly supported in `numba`. Access to `numpy` arrays is very efficient, as indexing is lowered to direct memory accesses when possible.\n", "* `numba` is able to generate *ufuncs* and *gufuncs*. This means that it is possible to implement ufuncs and gufuncs within Python, getting speeds comparable to that of ufuncs/gufuncs implemented in C extension modules using the `numpy` C API." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# `numpy` arrays and dtype objects\n", "\n", "* `numpy`'s main object is the homogeneous multidimensional array. It is a table of elements, all of the same type, indexed by a tuple of non-negative integers. In `numpy` dimensions are called *axes*. The number of axes is the *rank*.\n", "* A data type object (an instance of the `numpy.dtype` class) describes how the bytes in the fixed-size block of memory corresponding to an array item should be interpreted. It describes the type of the data (integer, float, Python object, etc.), the size of the data (how many bytes are in, e.g., the integer), the byte order and some other parameters that further describe the data if it is e.g. 
a sub-array or an aggregate of other data types.\n", "\n", "```python\n", ">>> import numpy as np\n", ">>> a = np.arange(15).reshape(3, 5)\n", ">>> a\n", "array([[ 0,  1,  2,  3,  4],\n", "       [ 5,  6,  7,  8,  9],\n", "       [10, 11, 12, 13, 14]])\n", ">>> a.shape\n", "(3, 5)\n", ">>> a.ndim\n", "2\n", ">>> a.dtype.name\n", "'int64'\n", ">>> np.ones((2, 3, 4), dtype=np.int16)  # dtype can also be specified\n", "array([[[ 1, 1, 1, 1],\n", "        [ 1, 1, 1, 1],\n", "        [ 1, 1, 1, 1]],\n", "       [[ 1, 1, 1, 1],\n", "        [ 1, 1, 1, 1],\n", "        [ 1, 1, 1, 1]]], dtype=int16)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Basic usage\n", "\n", "* **Lazy compilation**: Compilation will be deferred until the first function execution. `numba` will infer the argument types at call time, and generate optimized code based on this information. `numba` will also be able to compile separate specializations depending on the input types.\n", "```python\n", "from numba import jit\n", "#\n", "@jit\n", "def f(x, y):\n", "    return x + y\n", "```\n", "* **Eager compilation**: `int32(int32, int32)` is the function’s signature. In this case, the corresponding specialization will be compiled by the `@jit` decorator, and no other specialization will be allowed. This is useful if you want fine-grained control over types chosen by the compiler (for example, to use single-precision floats).\n", "```python\n", "from numba import jit, int32\n", "#\n", "@jit(int32(int32, int32))\n", "def f(x, y):\n", "    return x + y\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Compilation options\n", "\n", "* **`nopython`**: `numba` has two compilation modes: `nopython` mode and `object` mode. The former produces much faster code, but has limitations that can force `numba` to fall back to the latter. 
To prevent `numba` from falling back, and raise an error instead, pass `nopython=True`.\n", "* **`nogil`**: Whenever `numba` optimizes Python code to native code that only works on native types and variables (rather than Python objects), it is no longer necessary to hold Python’s global interpreter lock (GIL). `numba` will release the GIL when entering such a compiled function if you pass `nogil=True`.\n", "* **`parallel`**: Enables an experimental feature that automatically parallelizes (and performs other optimizations for) those operations in the function known to have parallel semantics. This feature is enabled by passing `parallel=True` and must be used in conjunction with `nopython=True`.\n", "* **`cache`**: To avoid compilation times each time you invoke a Python program, you can instruct `numba` to write the result of function compilation into a file-based cache. This is done by passing `cache=True`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Creating `numpy` universal functions with `@vectorize`\n", "\n", "* `numba`'s `@vectorize` decorator allows Python functions taking scalar input arguments to be used as `numpy` ufuncs. Creating a traditional `numpy` ufunc is not the most straightforward process and involves writing some C code. `numba` makes this easy. Using `@vectorize()`, `numba` can compile a pure Python function into a ufunc that operates over `numpy` arrays as fast as traditional ufuncs written in C.\n", "* Using `@vectorize`, you write your function as operating over input scalars, rather than arrays. `numba` will generate the surrounding loop (or kernel) allowing efficient iteration over the actual inputs." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. 2. 4. 6. 8. 10.]\n", "[0. 0.4 0.8 1.2 1.6 2. 
]\n" ] } ], "source": [ "from numba import vectorize, float64\n", "import numpy as np\n", "\n", "@vectorize([float64(float64, float64)])\n", "def f(x, y):\n", "    return x + y\n", "\n", "a = np.arange(6)\n", "print(f(a,a))\n", "a = np.linspace(0, 1, 6)\n", "print(f(a,a))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Compiling Python classes with `@jitclass`\n", "\n", "`numba` supports code generation for classes via the `@jitclass` decorator. A class can be marked for optimization using this decorator along with a specification of the types of each field. We call the resulting class object a **jitclass**.\n", "* *All methods* of a jitclass are compiled into `nopython` functions. The data of a jitclass instance is allocated on the heap as a C-compatible structure so that any compiled functions can have direct access to the underlying data, bypassing the interpreter." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "import numpy as np\n", "from numba.experimental import jitclass  # import the decorator (numba >= 0.49; older releases: from numba import jitclass)\n", "from numba import int32, float32  # import the types\n", "\n", "spec = [\n", "    ('value', int32),  # a simple scalar field\n", "    ('array', float32[:]),  # an array field\n", "]\n", "\n", "@jitclass(spec)\n", "class Bag(object):\n", "    def __init__(self, value):\n", "        self.value = value\n", "        self.array = np.zeros(value, dtype=np.float32)\n", "\n", "    @property\n", "    def size(self):\n", "        return self.array.size\n", "\n", "    def increment(self, val):\n", "        for i in range(self.size):\n", "            self.array[i] += val\n", "        return self.array" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# More features\n", "\n", "* **Flexible specializations with `@generated_jit`**: Sometimes you want to write a function that has different implementations depending on its input types. 
The `@generated_jit` decorator allows the user to control the selection of a specialization at compile-time, while retaining runtime execution speed of a JIT function.\n", "* **Creating C callbacks with `@cfunc`**: Interfacing with some native libraries (for example written in C or C++) can necessitate writing native callbacks to provide business logic to the library. The `@cfunc` decorator creates a compiled function callable from foreign C code, using the signature of your choice.\n", "* **Automatic parallelization with `@jit`**: Setting the `parallel` option for `@jit` enables an experimental Numba feature that attempts to automatically parallelize and perform other optimizations on (part of) a function.\n", "* **CUDA support**: CUDA has an execution model unlike the traditional sequential model used for programming CPUs. In CUDA, the code you write will be executed by multiple threads at once (often hundreds or thousands). Your solution will be modeled by defining a thread hierarchy of grid, blocks and threads. `numba`'s CUDA support exposes facilities to declare and manage this hierarchy of threads. The facilities are largely similar to those exposed by NVIDIA’s CUDA C language." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Example 1: Summation" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "457 µs ± 19.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n", "46.4 ms ± 1.21 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n", "168 ns ± 5.99 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n", "165 ns ± 0.0791 ns per loop (mean ± std. dev. 
of 7 runs, 10000000 loops each)\n", "Speed-up factor: 2715.0x\n" ] } ], "source": [ "from numba import jit\n", "\n", "def psum(x):\n", "    res = 0\n", "    for i in range(x):\n", "        res += i ** 2 + 1 + i\n", "    return res\n", "\n", "nsum = jit(psum)\n", "\n", "t1 = %timeit -c -o psum(1000)\n", "%timeit -c psum(100000)\n", "t2 = %timeit -c -o nsum(1000)\n", "%timeit -c nsum(100000000)\n", "print(\"Speed-up factor: {:.1f}x\".format(int(t1.average / t2.average)))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Example 2: Fibonacci series" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "71.6 µs ± 1.85 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n", "918 ns ± 11.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n", "Speed-up factor: 78.0x\n" ] } ], "source": [ "from numba import jit\n", "\n", "def fib(n):\n", "    a, b = 0, 1\n", "    for i in range(n):\n", "        a, b = b, a + b\n", "\n", "    return a\n", "\n", "nfib = jit(fib, nopython=True)\n", "\n", "t1 = %timeit -c -o fib(1000)\n", "t2 = %timeit -c -o nfib(1000)\n", "print(\"Speed-up factor: {:.1f}x\".format(t1.average / t2.average))" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "%load_ext cython" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "%%cython\n", "\n", "def cfib(int n):\n", "    cdef int i, a, b\n", "    a, b = 0, 1\n", "    for i in range(n):\n", "        a, b = b, a + b\n", "    return a" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "645 ns ± 1.52 ns per loop (mean ± std. dev. 
of 7 runs, 1000000 loops each)\n", "Speed-up factor (cython vs numba): 1.4x\n" ] } ], "source": [ "t3 = %timeit -c -o cfib(1000)\n", "print(\"Speed-up factor (cython vs numba): {:.1f}x\".format(t2.average / t3.average))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Example 3: Mandelbrot fractal" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "from timeit import default_timer as timer\n", "from matplotlib.pylab import imshow, jet, show, ion\n", "import numpy as np\n", "\n", "from numba import jit\n", "\n", "def mandel(x, y, max_iters):\n", "    \"\"\"\n", "    Given the real and imaginary parts of a complex number,\n", "    determine if it is a candidate for membership in the Mandelbrot\n", "    set given a fixed number of iterations.\n", "    \"\"\"\n", "    i = 0\n", "    c = complex(x,y)\n", "    z = 0.0j\n", "    for i in range(max_iters):\n", "        z = z * z + c\n", "        if (z.real * z.real + z.imag * z.imag) >= 4:\n", "            return i\n", "\n", "    return 255\n", "\n", "def create_fractal(min_x, max_x, min_y, max_y, image, iters):\n", "    height = image.shape[0]\n", "    width = image.shape[1]\n", "\n", "    pixel_size_x = (max_x - min_x) / width\n", "    pixel_size_y = (max_y - min_y) / height\n", "    for x in range(width):\n", "        real = min_x + x * pixel_size_x\n", "        for y in range(height):\n", "            imag = min_y + y * pixel_size_y\n", "            color = mandel(real, imag, iters)\n", "            image[y, x] = color\n", "\n", "    return image" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5.82 s ± 88.8 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)\n", "47.9 ms ± 403 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n", "Speed-up factor: 121.5x\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "image = np.zeros((500 * 2, 750 * 2), dtype=np.uint8)\n", "t1 = %timeit -c -o -r 3 create_fractal(-2.0, 1.0, -1.0, 1.0, image, 20)  # without numba\n", "image = np.zeros((500 * 2, 750 * 2), dtype=np.uint8)\n", "create_fractal, mandel = jit(create_fractal), jit(mandel)\n", "t2 = %timeit -c -o create_fractal(-2.0, 1.0, -1.0, 1.0, image, 20)  # with numba\n", "print(\"Speed-up factor: {:.1f}x\".format(t1.average / t2.average))\n", "imshow(image)\n", "show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Example 4: Diffusion controlled reaction" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "7.37 s ± 109 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" ] } ], "source": [ "import numpy as np\n", "\n", "RHO_0 = 0.25  # initial particle density\n", "MCS = 10000  # Monte Carlo steps\n", "RUNS = 10\n", "DIM_0, DIM_1 = 81, 81\n", "\n", "def init_lattice(rho=RHO_0, dim0=DIM_0, dim1=DIM_1):\n", "    \"\"\"Place particles on lattice with density rho.\"\"\"\n", "    lat = np.zeros(shape=(dim0, dim1), dtype=np.int32)\n", "\n", "    for i in range(lat.shape[0]):\n", "        for j in range(lat.shape[1]):\n", "            if np.random.random() <= rho:\n", "                lat[i, j] = 1\n", "\n", "    return lat\n", "\n", "def execute_mcs(lat):\n", "    \"\"\"Move each particle on the lattice once in a random direction.\"\"\"\n", "    lat_t0 = np.copy(lat)  # lattice at time t0\n", "\n", "    for i in range(lat_t0.shape[0]):\n", "        for j in range(lat_t0.shape[1]):\n", "            if lat_t0[i, j] == 1:  # we found a particle\n", "                r = np.random.randint(4)\n", "                i1, j1 = i, j\n", "\n", "                # move particle up/down/left/right\n", "                if r == 0: i1 += 1\n", "                if r == 1: i1 -= 1\n", "                if r == 2: j1 += 1\n", "                if r == 3: j1 -= 1\n", "\n", "                # boundary conditions\n", "                if i1 < 0: i1 = 0\n", "                if i1 > 
lat.shape[0] - 1: i1 = lat.shape[0] - 1\n", "                if j1 < 0: j1 = 0\n", "                if j1 > lat.shape[1] - 1: j1 = lat.shape[1] - 1\n", "\n", "                # check trap\n", "                if i1 == lat.shape[0] // 2 and j1 == lat.shape[1] // 2:\n", "                    # we hit the center (trap), remove the particle\n", "                    lat[i, j] = 0\n", "                elif not lat[i1, j1]:\n", "                    # new position is empty, move particle\n", "                    lat[i, j] = 0\n", "                    lat[i1, j1] = 1\n", "\n", "def calc_rho(lat):\n", "    return np.sum(lat) / lat.size\n", "\n", "def execute_simulation(runs=RUNS, mcs=MCS, rho_0=RHO_0):\n", "    rho_t = np.zeros(mcs, dtype=np.float64)  # np.float was removed in NumPy 1.24\n", "\n", "    for r in range(runs):\n", "        lat = init_lattice()\n", "        for m in range(mcs):\n", "            rho_t[m] += calc_rho(lat)\n", "            execute_mcs(lat)\n", "\n", "    for i in range(rho_t.shape[0]):\n", "        rho_t[i] /= (runs * rho_0)\n", "\n", "    return rho_t\n", "\n", "t1 = %timeit -c -o execute_simulation(runs=1, mcs=1000)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "59.4 ms ± 1.43 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n", "Speed-up factor: 124.1x\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from numba import jit\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "RHO_0 = 0.25  # initial particle density\n", "MCS = 10000  # Monte Carlo steps\n", "RUNS = 10\n", "DIM_0, DIM_1 = 81, 81\n", "\n", "@jit(nopython=True)\n", "def init_lattice(rho=RHO_0, dim0=DIM_0, dim1=DIM_1):\n", "    \"\"\"Place particles on lattice with density rho.\"\"\"\n", "    lat = np.zeros(shape=(dim0, dim1), dtype=np.int32)\n", "\n", "    for i in range(lat.shape[0]):\n", "        for j in range(lat.shape[1]):\n", "            if np.random.random() <= rho:\n", "                lat[i, j] = 1\n", "\n", "    return lat\n", "\n", "@jit(nopython=True)\n", "def execute_mcs(lat):\n", "    \"\"\"Move each particle on the lattice once in a random direction.\"\"\"\n", "    lat_t0 = np.copy(lat)  # lattice at time t0\n", "\n", "    for i in range(lat_t0.shape[0]):\n", "        for j in range(lat_t0.shape[1]):\n", "            if lat_t0[i, j] == 1:  # we found a particle\n", "                r = np.random.randint(4)\n", "                i1, j1 = i, j\n", "\n", "                # move particle up/down/left/right\n", "                if r == 0: i1 += 1\n", "                if r == 1: i1 -= 1\n", "                if r == 2: j1 += 1\n", "                if r == 3: j1 -= 1\n", "\n", "                # boundary conditions\n", "                if i1 < 0: i1 = 0\n", "                if i1 > lat.shape[0] - 1: i1 = lat.shape[0] - 1\n", "                if j1 < 0: j1 = 0\n", "                if j1 > lat.shape[1] - 1: j1 = lat.shape[1] - 1\n", "\n", "                # check trap\n", "                if i1 == lat.shape[0] // 2 and j1 == lat.shape[1] // 2:\n", "                    # we hit the center (trap), remove the particle\n", "                    lat[i, j] = 0\n", "                elif not lat[i1, j1]:\n", "                    # new position is empty, move particle\n", "                    lat[i, j] = 0\n", "                    lat[i1, j1] = 1\n", "\n", "@jit(nopython=True)\n", "def calc_rho(lat):\n", "    return np.sum(lat) / lat.size\n", "\n", "@jit(nopython=True)\n", "def execute_simulation(runs=RUNS, mcs=MCS, rho_0=RHO_0):\n", "    rho_t = np.zeros(mcs, dtype=np.float32)\n", "\n", "    for r in range(runs):\n", "        lat = init_lattice()\n", "        for m in range(mcs):\n", "            rho_t[m] += calc_rho(lat)\n", "            
execute_mcs(lat)\n", "\n", "    for i in range(rho_t.shape[0]):\n", "        rho_t[i] /= (runs * rho_0)\n", "\n", "    return rho_t\n", "\n", "t2 = %timeit -c -o execute_simulation(runs=1, mcs=1000)\n", "print(\"Speed-up factor: {:.1f}x\".format(t1.average / t2.average))\n", "rho_t = execute_simulation()\n", "plt.plot(rho_t)\n", "plt.yscale('log')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# References\n", "\n", "1. [Stanley Seibert - Accelerating Python with the Numba JIT Compiler (SciPy 2015)](https://www.youtube.com/watch?v=eYIPEDnp5C4)\n", "2. Stanley Seibert - Numba: A JIT Compiler for Scientific Python (Continuum Analytics 2015)\n", "3. Travis E. Oliphant - Performance Python: Introduction to Numba (PyData 2015)\n", "4. [Numpy documentation](https://docs.scipy.org/doc/)\n", "5. [Numba documentation](http://numba.pydata.org)\n", "6. [Numba examples](https://github.com/numba/numba/blob/master/examples)" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }