{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "# Problem 1: The Mandelbrot Fractal" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "This is a fun problem, for several reasons:\n", "\n", "* It produces pretty pictures\n", "* There are lots of variations to play with\n", "* The algorithm can exit early, making it non-trivial to vectorize\n", "\n", "Let's import some libraries. Note, to automatically see plots, sometimes you may have to do:\n", "```python\n", "%matplotlib inline\n", "```\n", "\n", "(or `notebook`, `widget`) - for the recommended setup, you should be fine without these." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "\n", "# Extra performance libraries for later\n", "# import numexpr\n", "import numba\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can generate a Mandelbrot fractal by applying the transform:\n", "\n", "$$\n", "z_{n+1}=z_{n}^{2}+c\n", "$$\n", "\n", "repeatedly to a regular matrix of complex numbers $c$, and recording the iteration number where the value $|z|$ surpassed some bound, usually 2. 
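\n", "\n", "For instance (iterating by hand): $c = 1$ escapes almost immediately, $1 \\to 2 \\to 5 \\to 26 \\to \\cdots$, while $c = -1$ just cycles, $-1 \\to 0 \\to -1 \\to \\cdots$, and never passes the bound.\n", "\n", "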
You start at $z_0 = c$. (The bound 2 is not arbitrary: once $|z| > 2$, the orbit is guaranteed to diverge.)\n", "\n", "\n", "\n", "Let's set up some initial parameters and a helper matrix:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "1j + 1" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "maxiterations = 50\n", "\n", "# 300 x 400 matrix of complex numbers from [-1.5j, 1.5j] x [-2, 2]\n", "c = np.sum(np.broadcast_arrays(*np.ogrid[-1.5j:1.5j:300j, -2:2:400j]), axis=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> Note that in the interest of absolute brevity, I've taken advantage of the fact `ogrid` works with complex numbers; however, `mgrid` does not. `ogrid` is faster anyway.\n", "\n", "Let's make sure we have the correct `c`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, axs = plt.subplots(1, 2, figsize=(6, 3))\n", "axs[0].pcolormesh(c.real, c.imag, c.real)\n", "axs[1].pcolormesh(c.real, c.imag, c.imag);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Okay, let's make the fractal as simply as possible in NumPy:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fractal = np.zeros_like(c, dtype=np.int32)\n", "z = c.copy()\n", "\n", "for i in range(1, maxiterations + 1):\n", "    z = z**2 + c  # Compute z\n", "    diverge = abs(z) > 2  # Divergence criterion\n", "\n", "    z[diverge] = 2  # Keep number size small\n", "    fractal[~diverge] = i  # Fill in non-diverged iteration number" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=(4, 3))\n", "ax.pcolormesh(c.real, c.imag, fractal);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In a notebook, you often start with raw code (like above) for easy debugging, but once it works, you put it in a function, like the function below:" ] }, { "cell_type": "code", "execution_count": null, 
"metadata": {}, "outputs": [], "source": [ "def fractal_numpy(c, maxiterations):\n", "    f = np.zeros_like(c, dtype=np.int32)\n", "    z = c.copy()\n", "\n", "    for i in range(1, maxiterations + 1):\n", "        z = z * z + c  # Compute z\n", "        diverge = np.abs(z**2) > 2**2  # Divergence criterion\n", "\n", "        z[diverge] = 2  # Keep number size small\n", "        f[~diverge] = i  # Fill in non-diverged iteration number\n", "\n", "    return f" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%timeit\n", "fractal_numpy(c, maxiterations)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While we wouldn't really do this normally, expecting it to be *much* slower, here is the pure Python version:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def fractal_pure(c, maxiterations):\n", "    fractal = np.zeros_like(c, dtype=np.int32)\n", "\n", "    for yi in range(c.shape[0]):\n", "        for xi in range(c.shape[1]):\n", "            z = cxy = c[yi, xi]\n", "            for i in range(1, maxiterations + 1):\n", "                z = z**2 + cxy\n", "                if abs(z) > 2:\n", "                    break\n", "            fractal[yi, xi] = i\n", "    return fractal" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%timeit\n", "fractal_pure(c, maxiterations)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I don't know about you, but that was much faster than I would have naively expected. Why? What is different about the *algorithm*?\n", "\n", "\n", "\n", "For later use, and for better design, let's break up the above function into two pieces: the \"on each\" part and the part that applies it to each element (vectorization)."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def on_each_python(cxy, maxiterations):\n", "    z = cxy\n", "    for i in range(1, maxiterations + 1):\n", "        z = z * z + cxy\n", "        if abs(z) > 2:\n", "            return i\n", "    return i\n", "\n", "\n", "def fractal_python(c, maxiterations):\n", "    fractal = np.zeros_like(c, dtype=np.int32)\n", "\n", "    for yi in range(c.shape[0]):\n", "        for xi in range(c.shape[1]):\n", "            fractal[yi, xi] = on_each_python(c[yi, xi], maxiterations)\n", "\n", "    return fractal" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%timeit\n", "fractal_python(c, maxiterations)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While we'll look at something much more interesting soon, here is NumPy's vectorize. This is not supposed to do much except replace the outer function we had to manually define (though I've actually found it to be noticeably faster)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fractal_npvectorize = np.vectorize(on_each_python)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%timeit\n", "fractal_npvectorize(c, maxiterations)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can plot any of these to make sure they work:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fractal = fractal_npvectorize(c, maxiterations)\n", "fig, ax = plt.subplots(figsize=(4, 3))\n", "ax.pcolormesh(c.real, c.imag, fractal);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Profiling\n", "\n", "Never optimize until you have profiled! If code becomes uglier/harder to maintain, you *must* have a solid reason for doing so.\n", "\n", "Let's look at the `line_profiler` package, which has fairly nice IPython magic support. 
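\n", "\n", "If it isn't already installed, it is a quick install from PyPI:\n", "\n", "```bash\n", "pip install line_profiler\n", "```\n", "\n", "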
First let's load the magic:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%load_ext line_profiler" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we run the magic with `-f function_to_profile` and the command to profile. Only the lines of the function we specify will show up:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%lprun -f fractal_numpy fractal_numpy(c, maxiterations)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you don't have external packages available, the built-in `cProfile` is also usable, though not as pretty.\n", "\n", "> #### Note\n", ">\n", "> Most standard library modules with paired names like `something`/`cSomething` were merged in Python 3, with the faster compiled implementation being selected automatically. This one was not, since `cProfile` and `profile` are not quite identical. `profile` is much slower." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import cProfile" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cProfile.run(\"fractal_numpy(c, maxiterations)\", sort=\"cumulative\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(Note: Numba takes full advantage of the instruction set on your system, since it does not expect to be compiled and run on a different machine; thus often Numba will be faster than normally compiled code)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Numexpr\n", "\n", "How can we make NumPy faster? Expressions are slow in NumPy because they usually create lots of temporary arrays, and memory allocations are costly. To avoid this, you could manually reuse memory, but this would require lots of ugly rewriting, such as taking `a + b + c` and writing `t = a + b; t += c`. 
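\n", "\n", "As a quick sketch of that idea (hypothetical array names, not part of the fractal code), compare the naive expression with the in-place version:\n", "\n", "```python\n", "import numpy as np\n", "\n", "a, b, c = np.random.random_sample(size=(3, 100_000))\n", "\n", "naive = a + b + c  # builds a temporary for (a + b), then a second array\n", "\n", "t = a + b  # one allocation\n", "t += c     # in-place add: no new temporary\n", "\n", "assert np.allclose(naive, t)\n", "```\n", "\n", "The result is identical; only the number of intermediate allocations changes.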
\n", "\n", "> Starting with NumPy 1.13, some simple expressions, like the one above, will [avoid making memory copies](https://docs.scipy.org/doc/numpy/release.html#temporary-elision) (generally on Unix only).\n", "\n", "There's a second issue: even after avoiding unneeded temporaries, you still have to run multiple kernels (computation functions) - it would be nicer if you could just do the full calculation on each input and produce one output, with no in-between steps.\n", "\n", "One way to do this is with numexpr. This is an odd little library that can compile small expressions just-in-time (JIT). Here's what it looks like in practice:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numexpr\n", "\n", "a, b = np.random.random_sample(size=(2, 100_000))\n", "\n", "print(\"classic\", 2 * a + 3 * b)\n", "print(\"numexpr\", numexpr.evaluate(\"2 * a + 3 * b\"))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%timeit\n", "c = 2 * a + 3 * b  # Try 2 * a**2 + 3 * b**3" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%timeit\n", "c = numexpr.evaluate(\"2 * a + 3 * b\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, numexpr is *very* limited. It has a small set of data types, a small collection of supported operators and basic functions, and works one line at a time. You can make it less magical by passing the variables explicitly in a `local_dict` if you want." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Numba\n", "\n", "If that managed to whet your appetite, let's look at Numba - a Python to LLVM JIT compiler! 
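\n", "\n", "For instance, the plain `@numba.njit` decorator compiles an ordinary scalar function (a minimal sketch, with a made-up function name):\n", "\n", "```python\n", "import numba\n", "\n", "\n", "@numba.njit\n", "def add_scaled(a, b):\n", "    return 2 * a + 3 * b\n", "\n", "\n", "add_scaled(1.0, 2.0)  # compiled on the first call\n", "```\n", "\n", "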
We'll see it again, but for now, here's a little demo:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@numba.vectorize\n", "def f(a, b):\n", "    return 2 * a + 3 * b" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "f" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%time\n", "# The first call includes the JIT compilation time\n", "c = f(a, b)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%timeit\n", "c = f(a, b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It took the function we defined, pulled it apart, turned it into Low Level Virtual Machine (LLVM) code, and compiled it. No special strings or special syntax; it is just a (large) subset of Python and NumPy. And users and libraries can extend it too. It also supports:\n", "\n", "* Vectorized, general vectorized, or regular functions\n", "* Ahead-of-time compilation, JIT, or dynamic JIT\n", "* Parallelized targets\n", "* GPU targets via CUDA or ROCm\n", "* Nesting\n", "* Creating cfunction callbacks\n", "\n", "It is almost always as fast or faster than any other compiled solution (minus the JIT time). A couple of years ago it became much easier to install (via pip with llvmlite's lightweight and independent LLVM build)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Project: accelerate\n", "\n", "Try some of the following:\n", "\n", "* Use `numexpr` to replace parts of the above calculation. 
Why is this not very effective?\n", "* Try reducing the number of memory allocations in the NumPy version\n", "* Try accelerating using `@numba.njit`\n", "* Try accelerating using `@numba.vectorize`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Further reading:\n", "\n", "* [Christoph Deil's Numba talk](https://christophdeil.com/download/2019-07-11_Christoph_Deil_Numba.pdf)\n", "* [CompClass](https://github.com/henryiii/compclass): Several class sessions covered this, including week 12\n", "* Any of Jim's classes (see intro talk)\n", "* The distributed lesson will revisit fractals" ] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:performance-minicourse] *", "language": "python", "name": "conda-env-performance-minicourse-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" }, "vscode": { "interpreter": { "hash": "2d4e9b9c84dab3e1662173f95b81bd7f8a551068d04f5f3c42d164db7312a928" } } }, "nbformat": 4, "nbformat_minor": 4 }