{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# A Year of Numba (the year 2018!)\n",
    "\n",
    "A lot changed in Numba throughout 2018, this notebook aims to show some of the highlights. A key driver of a large number of the changes made this year has been direct response to feedback provided by the Numba users community. The core development team are very grateful for such excellent feedback and would like to thank everyone who has contributed.\n",
    "\n",
    "For reference, online help over IRC is available on [Gitter](https://gitter.im/numba/numba) and issues for bugs/questions/feature requests are welcome on [Github](https://github.com/numba/numba/).\n",
    "\n",
    "## Updates to Numba's dependencies\n",
    "\n",
    "* Numba now supports Python 2.7 and Python 3.4-3.7. Typically one release cycle is  all that is\n",
    "  required to adopt a new 3.x Python version.\n",
    "* Numba (via llvmlite) is now backed by LLVM 7.0. Three LLVM releases have been used to back Numba this year, 5, 6 and 7. Again, it is typical for only a single release cycle to be required to adopt a new LLVM version.\n",
    "* CUDA 8.0 is now the minimum supported CUDA version.\n",
    "\n",
    "## New NumPy function support:\n",
    "These are not going to be demonstrated in this notebook, but it is worth mentioning that somewhere in the order of an additional 30 new NumPy functions/`ndarray` methods (or additional kwarg support) have been added to Numba this year. As always a full list of supported NumPy functionality can be found [in the documentation](http://numba.pydata.org/numba-doc/latest/reference/numpysupported.html). The new for 2018 features are:\n",
    "\n",
    "* `numpy.ndarray` methods:\n",
    "  * `numpy.ndarray.conj`\n",
    "  * `numpy.ndarray.conjugate`\n",
    "  * `numpy.ndarray.argsort`\n",
    "  * `numpy.ndarray.dot`\n",
    "  * `numpy.ndarray.transpose`\n",
    "\n",
    "* NumPy functions:\n",
    "  * `numpy.ascontiguousarray`\n",
    "  * `numpy.percentile`\n",
    "  * `numpy.convolve`\n",
    "  * `numpy.corrcoef`\n",
    "  * `numpy.correlate`\n",
    "  * `numpy.cov`\n",
    "  * `numpy.dtype`\n",
    "  * `numpy.ediff1d`\n",
    "  * `numpy.fill_diagonal`\n",
    "  * `numpy.nancumprod`\n",
    "  * `numpy.nancumsum`\n",
    "  * `numpy.nanpercentile`\n",
    "  * `numpy.partition`\n",
    "  * `numpy.reshape`\n",
    "  * `numpy.transpose`\n",
    "  * `numpy.tri`\n",
    "  * `numpy.tril`\n",
    "  * `numpy.triu`\n",
    "  * `numpy.unique`\n",
    "  * `numpy.vander`\n",
    "  * `numpy.random.randint`\n",
    "  * `numpy.random.permutation`\n",
    "\n",
    "* Updates to NumPy function kwarg:\n",
    "  * `np.searchsorted` (`side` kwarg available)\n",
    "  * `np.argsort` (`kind` kwarg with `quicksort` and `mergesort` available.)\n",
    "\n",
    "## Updates to CUDA:\n",
    "A large number of CUDA bugs have been fixed, and the following enhancements made:\n",
    "* Support for FMA and faster float64 atomics on suitable hardware. \n",
    "* Records in const memory\n",
    "* Improved datatime dtype support.\n",
    "* The addition of the `__cuda_array_interface__` member following the NumPy array interface specification to allow Numba to consume externally defined device arrays. This was undertaken in correspondence with CuPy to test out the concept and be able to use CuPy GPU arrays.\n",
    "* Support for the NumPY tranpose API\n",
    "* Permit direct assignment to to DeviceNDArrays\n",
    "* Support added for a larger selection of bit manipulation intrinsics and also SELP."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# The Demonstration\n",
    "\n",
    "### Prerequisites\n",
    "This demonstration needs at least the following:\n",
    "* Python 2.7 or >= 3.4\n",
    "* NumPy >= 1.11\n",
    "* llvmlite >= 0.27.0\n",
    "* Numba >= 0.42.0\n",
    "* SciPy >= 0.16\n",
    "* progressbar2\n",
    "\n",
    "All of which can be obtained via the Anaconda Python distribution via the `conda` package manager (pip also works!):\n",
    "```\n",
    "$ conda create -n demo_env python=3 numpy scipy numba progressbar2\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you can, try and download this notebook and run it locally. However, if you are running this on binder, performance results may be a bit dubious as understandably the hardware isn't state-of-the-art and there are only a couple of cores available. Further, the GPU examples won't work on binder as there's no GPUs available. Here's what Numba detects about the hardware on which this is running:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!numba -s"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, import the most commonly used modules/functions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import numba\n",
    "from numba import jit, njit, prange, config, generated_jit"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## New Numba/Python interaction\n",
    "A number of significant improvements were made in the interaction between Numba and Python.\n",
    "\n",
    "### Passing jitted functions as arguments\n",
    "Numba now allows the passing of jitted functions (and containers of jitted functions) as arguments to other jitted functions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "add_one = njit(lambda x: x + 1)\n",
    "div_two = njit(lambda x: x / 2)\n",
    "\n",
    "@njit\n",
    "def apply_funcs(a, f1, f2):   \n",
    "    return f1(a) + f2(a)\n",
    "\n",
    "A = np.arange(10)\n",
    "apply_funcs(A, add_one, div_two)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Lists can now contain NumPy arrays, lists etc\n",
    "Numba's list handling gained support for containing reference-counted types, like NumPy arrays and `list`. This permits a more natural programming style as it's often common for users to want to pass a list of array arguments to a function. Note: list still only support homogeneous data types, i.e. everything in a list must be the same type, like all items being NumPy arrays. Here's an example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "a_list = [(1, 2), (3, 4), (5, 6)]\n",
    "\n",
    "@njit\n",
    "def list_conv(alist):\n",
    "    new_list = []\n",
    "    for item in alist:\n",
    "        new_list.append(np.ones(item))\n",
    "    return new_list\n",
    "\n",
    "list_conv(a_list)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Running `object mode` blocks inside `nopython` mode\n",
    "Experimental support was added for executing an arbitrary block of code in `object mode` inside a `nopython` mode function. This is really useful if you want to e.g. occasionally update an image with results or show a progress bar from with in a loop.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import progressbar\n",
    "import time\n",
    "from numba import objmode\n",
    "\n",
    "n = 100\n",
    "with progressbar.ProgressBar(max_value=n) as thebar:\n",
    "    @njit\n",
    "    def foo(): # this is a nopython mode function\n",
    "        x = y = 0\n",
    "        for i in range(n):\n",
    "            x += np.sqrt(np.cos(n) ** 2 + np.sin(n) ** 2)\n",
    "            # but this block jumps into object mode j is defined in object mode,\n",
    "            # so we need to tell `nopython` mode its type so it can be used\n",
    "            # outside this block in nopython mode\n",
    "            with objmode(j='int64'): \n",
    "                thebar.update(i)\n",
    "                time.sleep(0.05)\n",
    "                j = i + 10 # j is defined in object mode\n",
    "            y += j\n",
    "        return x, y\n",
    "    print(foo())\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Unicode strings\n",
    "\n",
    "Initial support for Unicode strings has been implemented for Python versions >= 3.4. Support for fundamental string operations has been added as well as support for strings as arguments and return value. The next release of Numba will contain performance updates and additional features to further enhance string support."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if config.PYVERSION > (3, 4): # Only supported in Python >= 3.4\n",
    "    \n",
    "    @njit\n",
    "    def strings_demo(str1, str2, str3):\n",
    "        # strings, ---^  ---^   ---^\n",
    "        # as arguments are now supported!\n",
    "        \n",
    "        # defining strings in compiled code also works\n",
    "        def1 = 'numba is '\n",
    "        \n",
    "        # as do unicode strings\n",
    "        def2 = '🐍⚡'\n",
    "        \n",
    "        # also string concatenation \n",
    "        print(str1 + str2)\n",
    "        \n",
    "        # comparison operations\n",
    "        x = True\n",
    "        x |= (str1 == str2)\n",
    "        x |= (str1 < str2)\n",
    "        x |= (str1 <= str2)\n",
    "        x |= (str1 > str2)\n",
    "        x |= (str1 >= str2)\n",
    "        \n",
    "        # {starts,ends}with\n",
    "        x |= (str1.startswith(str3))\n",
    "        x |= (str2.endswith(str3))\n",
    "        \n",
    "        # len()\n",
    "        print(len(str1), len(def2), len(str3))\n",
    "        \n",
    "        # str.find()\n",
    "        print(str2.find(str3))\n",
    "        \n",
    "        # in\n",
    "        x |= (str3 in str2)\n",
    "        \n",
    "        # slicing\n",
    "        print(str2[1:], str1[:1])\n",
    "        \n",
    "        # and finally, strings can also be returned\n",
    "        return '\\nnum' + str1[1::-1] + def1[5:] + def2\n",
    "    \n",
    "    \n",
    "    # run the demo\n",
    "    print(strings_demo('abc', 'zba', 'a'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exceptions with tracebacks relating to Python source.\n",
    "\n",
    "Tracebacks from exceptions raised in jitted code now contain a synthesized stack frame containing the location where the exception was raised. The stack frame is based on the Python source from which is was compiled, it looks like a CPython traceback, but is coming from compiled code! This makes it easier to use exceptions in `nopython` mode as it is now possible to find out the location from which they were raised. Try commenting/uncommenting the `@njit` decorator and rerunning!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import traceback\n",
    "@njit\n",
    "def raise_exception(x):\n",
    "    if x == 0:\n",
    "        raise Exception('raised x==0. Also, exception arguments are correctly handled', 123, 4j)\n",
    "\n",
    "try:\n",
    "    raise_exception(0)\n",
    "except Exception:\n",
    "    traceback.print_exc()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Usability\n",
    "\n",
    "Numba now supports the use of a [per-project configuration file](http://numba.pydata.org/numba-doc/latest/reference/envvars.html#environment-variables) to permanently set behaviours typically set via `NUMBA_*` family environment variables (this requires the `pyyaml` package to be present). Much effort has gone into improving error reporting and the general usability of Numba. This includes highlighted error messages and performance tips documentation (requires the `colorama` package and a [color scheme to be set](http://numba.pydata.org/numba-doc/latest/reference/envvars.html#envvar-NUMBA_COLOR_SCHEME))."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In 2018 Numba's primary exception handling mechanism was rewritten based on user feedback to make tracebacks much shorter and more user friendly. Improvements include:\n",
    " * Better error messages, often including diagnostic output.\n",
    " * Identification of the offending source code, this is now printed out with line numbers and a caret pointing to the offending line.\n",
    " * Categorized help messages referring to documentation that may be useful.\n",
    " \n",
    "For example, this is an invalid use of the `add` built-in, note the argument type display, the declaration of signatures Numba knows, the offending source is printed and pointed to and a help message appears:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@njit\n",
    "def f(x, y):\n",
    "    return x + y\n",
    "\n",
    "try:\n",
    "    f(1, (2,)) # invalid\n",
    "except:\n",
    "    traceback.print_exc()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is an example of a type unification failure, a problem commonly encountered by users, note that the two locations that cause the problem are highlighted and the types are listed:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@njit\n",
    "def f(x):\n",
    "    # Numba cannot statically determine if the return type is tuple or int\n",
    "    if x > 10:\n",
    "        return (1,)\n",
    "    else:\n",
    "        return 1\n",
    "try:\n",
    "    f(0)\n",
    "except:\n",
    "    traceback.print_exc()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this example a list of an unknown type is used, note that Numba identifies this and has a specialized help message."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from numba import jit\n",
    "@jit(nopython=True)\n",
    "def f(x):\n",
    "    tmp = [] # defined empty\n",
    "    return (tmp, x) # ERROR: the type of `tmp` is unknown\n",
    "try:\n",
    "    f(1)\n",
    "except:\n",
    "    traceback.print_exc()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Extending Numba\n",
    "\n",
    "\n",
    "The Numba extension API has gained the ability operate more easily with functions from Cython modules through the use of `numba.extending.get_cython_function_address` to obtain function addresses for direct use in `ctypes.CFUNCTYPE`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import ctypes\n",
    "import scipy\n",
    "from numba.extending import get_cython_function_address\n",
    "\n",
    "addr = get_cython_function_address(\"scipy.special.cython_special\", \"j0\")\n",
    "functype = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double)\n",
    "bind_j0 = functype(addr)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The function `bind_j0` can now be used inside jitted functions, for example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@njit\n",
    "def double_j0(x):\n",
    "    return 2 * bind_j0(x)\n",
    "\n",
    "val = 0.5\n",
    "\n",
    "assert double_j0(val) == 2 * scipy.special.j0(0.5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Literal value support\n",
    "------------------------------\n",
    "Numba 0.41.0 has a significant change made to the typing system that aims to clean up the use of constants. This change takes the form of support for type specific literal values in the type inference mechanism. During typing two passes are now made, the first with anything which is a constant and can expressed as a literal set as such (integers, strings, slices and `make_function` are implemented as literals at present), the second with the standard types used for the constants. This, for example, permits value based dispatch as demonstrated below, but also opens up a lot of future possibilities surrounding typing which were inaccessible prior to this change."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@generated_jit\n",
    "def myoverload(arg):\n",
    "    literal_val = getattr(arg, 'literal_value', None)\n",
    "    if literal_val is not None:\n",
    "        if literal_val == 100:\n",
    "            def impl_1(arg):\n",
    "                return 'dispatched: impl_1(literal, value 100)'\n",
    "            return impl_1\n",
    "        else:\n",
    "            def impl_2(arg):\n",
    "                return 'dispatched: impl_2(literal, value not 100)'\n",
    "            return impl_2\n",
    "    else:\n",
    "        def impl_3(arg):\n",
    "                return 'dispatched: impl_3(non-literal type)'\n",
    "        return impl_3\n",
    "\n",
    "@njit\n",
    "def example(x):\n",
    "    print(myoverload(100))         # literal value 100, dispatches impl_1\n",
    "    print(myoverload(99))          # literal value 99, dispatches impl_2\n",
    "    a = 50 + 25 + 2 * 10 + 15 // 3 # `a` is const expr value 100\n",
    "    print(myoverload(a))           # `a` has literal value 100, dispatches impl_1\n",
    "    b = 50 * x                     # `b` non-literal, it's an intp type\n",
    "    print(myoverload(b))           # `b` non-literal intp, has no value, dispatches impl_3\n",
    "\n",
    "example(2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "ParallelAccelerator\n",
    "===============\n",
    "Following the introduction of ParallelAccelerator technology into Numba in mid-2017, steady improvments were made throughout 2018, including the following..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The thread pool implementation used by Numba for automatic multithreading is configurable to use Intel's Thread Building Blocks (TBB), OpenMP, or the old \"workqueue\" implementation. (TBB is likely to become the preferred default in a future release.) [documentation is here](http://numba.pydata.org/numba-doc/latest/user/threading-layer.html?highlight=threading#the-threading-layers). For the purposes of this example, TBB will be used:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "numba.config.THREADING_LAYER = 'tbb'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now define a demonstration kernel:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "@njit(parallel=True)\n",
    "def parallel_demo(x):\n",
    "    n = x.shape[0]\n",
    "    a = np.sin(x)\n",
    "    b = np.cos(a * a)\n",
    "    acc = 0\n",
    "    for i in prange(n - 2):\n",
    "        for j in prange(n - 1):\n",
    "            acc += b[i] + b[j + 1]\n",
    "    return acc\n",
    "\n",
    "parallel_demo(np.arange(10))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The threading layer used for execution can easily be inspected as follows..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "numba.threading_layer()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Documentation on ensuring thread and fork-safety whilst using Numba is [available here](http://numba.pydata.org/numba-doc/latest/user/threading-layer.html?highlight=threading#selecting-a-threading-layer-for-safe-parallel-execution). It's now both possible and safe to fork, spawn, use threads, and use Numba's compiler and threading backend simulataneously..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import multiprocessing as mp\n",
    "from threading import Thread, get_ident\n",
    "\n",
    "nthreads = 4\n",
    "\n",
    "# this function is going to be compiled\n",
    "def function(n):\n",
    "    x = np.dot( 3 + 5j * np.ones((n, n)), 2 + 1j * np.ones((n, n)))\n",
    "    return np.linalg.norm(x + np.arange(n) + n)\n",
    "\n",
    "# this work is done by each thread, concurrent compilation and execution\n",
    "def thread_work(results, n):\n",
    "    compiled_function = njit(parallel=True, nogil=True)(function)\n",
    "    # pointless extra work to stress the backend\n",
    "    [compiled_function(n) for _ in range(10)]\n",
    "    # return one result key'd by thread id\n",
    "    results[get_ident()] = compiled_function(n)\n",
    "\n",
    "# this work is done by each process\n",
    "def process_work(n):   \n",
    "    # spin up some threads to do the thread_work\n",
    "    results = dict()\n",
    "    tpool = [Thread(target=thread_work, args=(results, n)) for _ in range(nthreads)]\n",
    "    [t.start() for t in tpool]\n",
    "    [t.join() for t in tpool]\n",
    "    # reduce the per thread results\n",
    "    acc = 0\n",
    "    for v in results.values():\n",
    "        acc += v\n",
    "    return acc\n",
    "\n",
    "# This starts a process pool of 4 processes and maps the process work to it.\n",
    "# Each process then starts 4 threads.\n",
    "# Each thread compile a function for use in the parallel backend and then repeatedly runs it\n",
    "# before returning a single result.\n",
    "# The process then reduces the results all of its threads and returns that result.\n",
    "p = mp.Pool(4)\n",
    "print(p.map(process_work, [100, 200, 300, 400]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Parallel support for `np.arange` and `np.linspace`, also `np.mean`, `np.std`\n",
    "and `np.var` have been added, for example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@njit(parallel=True)\n",
    "def parallel_new_function_support(n):\n",
    "    a = np.arange(0, n, 1, np.float64)\n",
    "    b = np.linspace(0, 1, n)\n",
    "    m = np.mean(a)\n",
    "    s = np.std(b)\n",
    "    v = np.var(a + b)\n",
    "    return m + s + v\n",
    "\n",
    "parallel_new_function_support(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Parallel loops now allow arrays as reduction variables, for example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@njit(parallel=True)\n",
    "def parallel_array_reduction(n):\n",
    "    y = np.arange(n)\n",
    "    x = np.ones_like(y)\n",
    "    for i in prange(10):\n",
    "        y += x\n",
    "    return y\n",
    "\n",
    "parallel_array_reduction(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Having got ParallelAccelerator working well and achieving great performance, the question asked most often by users was **\"what did it do?!\"**. Numba 0.41.0 added Parallel Diagnostics functionality to address this, calling the `parallel_diagnostics` member function on a function compiled with `parallel=True` set shows the optimizations done by ParallelAccelerator. The documentation for this feature [is here](http://numba.pydata.org/numba-doc/latest/user/parallel.html#diagnostics) and includes a guide for interpreting the output given. For example, obtaining the Parallel Diagnostics with verbosity level `4` on the above function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "parallel_demo.parallel_diagnostics(level=4)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "More general Diagnostics\n",
    "-----------------------\n",
    "Support was added for profiling Numba-compiled functions in Intel VTune, simply set the [`NUMBA_ENABLE_PROFILING`](http://numba.pydata.org/numba-doc/latest/reference/envvars.html?highlight=profiling#envvar-NUMBA_ENABLE_PROFILING) environment variable and ask Vtune to profile the execution.\n",
    "\n",
    "Further, as a result of community feedback, the Numba dispatcher `inspect_types()` method now supports the kwarg `pretty` which if set to `True` will produce ANSI/HTML output, showing the annotated types, when invoked from ipython/jupyter-notebook respectively. Green highlighting shows compiled loops, and yellow interpreted code. For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class clazz(object):\n",
    "\n",
    "    def __init__(self, arr):\n",
    "        self._arr = arr\n",
    "\n",
    "    @property\n",
    "    def arr(self):\n",
    "        return self._arr\n",
    "\n",
    "    @arr.setter\n",
    "    def arr(self, value):\n",
    "        self._arr = value\n",
    "\n",
    "@jit # use of class, `c`, prevents `nopython` mode compilation.\n",
    "def foo(a, c):\n",
    "\n",
    "    c.arr += 12\n",
    "\n",
    "    for i in range(len(a)):\n",
    "        a[i] = np.sqrt(i) + 7\n",
    "        if i % 8 + 1 > 4:\n",
    "            a[i] -= np.pi\n",
    "\n",
    "\n",
    "    return a\n",
    "\n",
    "A = np.arange(100.)\n",
    "class_inst = clazz(11)\n",
    "foo(A, class_inst)\n",
    "foo.inspect_types(pretty=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## New hardware support\n",
    "\n",
    "\n",
    "Support for the `ppc64le`, `aarch64` (64bit ARMv8) and `armv7l` (ARMv7 little endian. Yes Numba now works on a Raspberry Pi!) architectures have been added.\n",
    "\n",
    "### Intel SVML\n",
    "Further considerable improvements in vectorization were made available as Numba now supports Intel's short vector math library (SVML). Try it out with `conda install -c numba icc_rt`. This primarily permits the vectorization of fundamental math functions and has variants with lesser precision for use in `fastmath` mode."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def demo_svml(n):\n",
    "    ret = np.arange(n)\n",
    "    return np.sqrt(np.cos(ret) ** 2 + np.cos(ret) ** 2)\n",
    "\n",
    "demo_svml_njit = njit(demo_svml)\n",
    "demo_svml_fastmath = njit(fastmath=True)(demo_svml)\n",
    "\n",
    "count = 10000\n",
    "%timeit -o -r 10 -n 1000 demo_svml(count)\n",
    "%timeit -o -r 10 -n 1000 demo_svml_njit(count)\n",
    "%timeit -o -r 10 -n 1000 demo_svml_fastmath(count);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### AMD ROCm\n",
    "\n",
    "2018 saw the addition of a new GPU backend, AMD's ROCm (Radeon Open Compute). Kernels for ROCm supported AMD GPUs can now be compiled using the ROCm driver on Linux. Documentation [is here](http://numba.pydata.org/numba-doc/latest/roc/index.html) with information about prerequistites [here](http://numba.pydata.org/numba-doc/latest/roc/overview.html#installation). The kernel launch syntax and programming model is very similar to that found in Numba's CUDA support, and ufuncs work exactly the same way. For example"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from numba import roc\n",
    "from numba import cuda\n",
    "from numba import vectorize, float64\n",
    "\n",
    "# this is to handle thread model indexing differences between CUDA and ROCm\n",
    "global_id_func = {roc: lambda : roc.get_global_id(0),\n",
    "                  cuda: lambda : cuda.threadIdx.x + cuda.blockIdx.x * cuda.blockDim.x}\n",
    "\n",
    "# Same code, different GPU backends !\n",
    "for gpu_target in ['roc', 'cuda']:\n",
    "    backend = getattr(numba, gpu_target)\n",
    "    if backend.is_available():\n",
    "\n",
    "            @backend.jit(device=True)\n",
    "            def inner(a, b):\n",
    "                return a + b\n",
    "            \n",
    "            index = backend.jit(device=True)(global_id_func[backend])\n",
    "\n",
    "            @backend.jit\n",
    "            def outer(A, B):\n",
    "                i = index()\n",
    "                if i < A.size:\n",
    "                    A[i] = inner(A[i], B[i])\n",
    "\n",
    "            A = np.arange(10)\n",
    "            Aorig = A.copy()\n",
    "            B = np.arange(10)\n",
    "\n",
    "            outer.forall(A.size)(A, B)\n",
    "            assert not np.all(Aorig == A)\n",
    "            np.testing.assert_equal(Aorig + B, A)\n",
    "            \n",
    "            sig = [float64(float64, float64)]\n",
    "            \n",
    "            @vectorize(sig, target=gpu_target)\n",
    "            def vector_add(a, b):\n",
    "                return a + b\n",
    "            \n",
    "            a = np.arange(100.)\n",
    "            b = np.arange(100.) * 3.14\n",
    "            c = vector_add(a, b)\n",
    "            np.testing.assert_equal(c, a + b)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}