{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Slow loops" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pure-Python loops are slow for numeric operations because every iteration goes through the interpreter." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's a function that computes the sum of the logs of all non-zero values." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def sum_log_nz(ary):\n", "    res = np.zeros(ary.shape[0])\n", "    for i in range(ary.shape[0]):\n", "        v = ary[i]\n", "        if v != 0:\n", "            res[i] = np.log(v)\n", "    return res.sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Test the function" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.random.random(5_000_000)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sum_log_nz(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Time the function" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%time\n", "sum_log_nz(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## SIMD Loops" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Numba can compile the inefficient pure-Python loop into a SIMD-vectorized native loop." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numba" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Try compiling the function with Numba.\n", "\n", "Try both `fastmath=True` and `fastmath=False` and notice the difference." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fast_sum_log_nz = numba.njit(fastmath=True)(sum_log_nz)\n", "fast_sum_log_nz" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fast_sum_log_nz(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice the improved performance" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%time\n", "\n", "fast_sum_log_nz(a)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "fast_sum_log_nz.inspect_cfg(fast_sum_log_nz.signatures[0]).display()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Parallel Loops" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Numba can auto-parallelize the function to leverage multiple threads." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "par_sum_log_nz = numba.njit(parallel=True)(sum_log_nz)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "par_sum_log_nz(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use `.parallel_diagnostics()` to inspect what the compiler has done to optimize the function.\n", "\n", "Note:\n", "* the manually written loop is not recognized." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "par_sum_log_nz.parallel_diagnostics()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use `numba.prange` to mark a loop for parallelization." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@numba.njit(parallel=True, fastmath=True)\n", "def par_sum_log_nz(ary):\n", "    res = np.zeros(ary.shape[0])\n", "    for i in numba.prange(ary.shape[0]):\n", "        v = ary[i]\n", "        if v != 0:\n", "            res[i] = np.log(v)\n", "    return res.sum()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "par_sum_log_nz(a)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%time\n", "par_sum_log_nz(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compare the output of `.parallel_diagnostics()` with that of the previous version.\n", "\n", "Note:\n", "* 3 loops are recognized.\n", "* the loops are fused because they iterate over the same domain." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "par_sum_log_nz.parallel_diagnostics()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" } }, "nbformat": 4, "nbformat_minor": 4 }