{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Numba 0.49.0 Release Demo\n",
    "=======================\n",
    "\n",
    "This notebook contains a demonstration of new features present in the 0.49.0 release of Numba. Whilst release notes are produced as part of the [`CHANGE_LOG`](https://github.com/numba/numba/blob/664e5b3544ffb00ba89720a278dbfa9a150d345c/CHANGE_LOG), there's nothing like seeing code in action! This release contains some significant changes to Numba internals and, of course, some exciting new features!\n",
    "\n",
    "## Important updates/information about this release:\n",
    "* This release drops support for Python 2 both for users and in the code base itself. It also raises the minimum supported versions of related software as follows:\n",
    " * Python >= 3.6\n",
    " * NumPy >= 1.15\n",
    " * SciPy >=1.0.\n",
    " \n",
    " It's still possible to build with NumPy 1.11 but runtime support is for 1.15 or later.\n",
    "* A huge amount of refactoring happened in this release (mainly module movement) to try and clean up the Numba code base. This refactoring was done so as to make it easier for users to contribute to the project, for core developers to maintain the project, and to remove legacy code. The core developers had been waiting for an opportunity to do this for years and decided that coinciding with Python 2 retirement was best as it would lead to least disruption to users.\n",
    "* As a result of the above, projects that relied on Numba's internals may have to adjust their imports. There is however an import \"shim\" in place in 0.49 that tries to faithfully replicate the original import locations. If one of these shim locations is used, it will issue warnings about the refactoring and state the new import location. This is so that projects relying on Numba's internals have a couple of months to make changes.\n",
    "\n",
    "The core developers would like to offer their thanks to all users for their understanding and support with respect to these changes. If you need help migrating your code base to 0.49 due to this refactoring, try one of:\n",
    " * Asking for help on https://gitter.im/numba/numba\n",
    " * Opening an issue in the [issue tracker](https://github.com/numba/numba/issues).\n",
    "\n",
    "## New features:\n",
    "The new features are split into sections based on use case...\n",
    "\n",
    "#### For all users:\n",
    "* [Thread masking](#Thread-masking)\n",
    "* [First-class function types](#First-class-function-types)\n",
    "* [Typed List update](#Typed-list-update)\n",
    "* [Support for `ord` and `chr`](#Support-for-ord-and-chr)\n",
    "* [Checking if a function is JIT wrapped](#Checking-if-a-function-is-JIT-wrapped)\n",
    "* [Newly supported NumPy functions/features](#Newly-supported-NumPy-functions/features)\n",
    "* [Using tuples in parallel regions](#Using-tuples-in-parallel-regions)\n",
    "\n",
    "#### For CUDA target users:\n",
    "* [All kernels require launch configurations](#All-kernels-require-launch-configurations)\n",
    "\n",
    "#### For Numba extension writers/expert users:\n",
    "* [Static Single Assignment form](#Static-Single-Assignment-form)\n",
    "\n",
    "\n",
    "First, import the necessary from Numba and NumPy..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from numba import jit, njit, config, __version__, prange\n",
    "from numba.typed import List\n",
    "config.NUMBA_NUM_THREADS = 4 # for this demo, pretend there's 4 cores on the machine\n",
    "from numba.extending import overload\n",
    "import numba\n",
    "import numpy as np\n",
    "assert tuple(int(x) for x in __version__.split('.')[:2]) >= (0, 49)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# For all users...\n",
    "\n",
    "## Thread masking\n",
    "\n",
    "\n",
    "Numerous users have asked for the ability to dynamically control, at runtime, the number of threads Numba uses in parallel regions. Numba 0.49 brings this functionality, it is modelled after ``OpenMP`` as this is a model familiar to a lot of users. Documentation is [here](http://numba.pydata.org/numba-doc/latest/user/threading-layer.html#setting-the-number-of-threads).\n",
    "\n",
    "The API consists of two functions:\n",
    "\n",
    "* ``numba.get_num_threads()`` - returns the number of threads currently in use.\n",
    "* ``numba.set_num_threads(nthreads)`` - sets the number of threads to use to ``nthreads``.\n",
    "\n",
    "these functions themselves are thread and fork safe and are available to call from both Python and JIT compiled code!\n",
    "\n",
    "For those interested, the implementation details are [here](http://numba.pydata.org/numba-doc/latest/developer/threading_implementation.html#thread-masking), as a warning, they are somewhat gnarly!\n",
    "\n",
    "Now, a demonstration:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from numba import get_num_threads, set_num_threads\n",
    "\n",
    "# Discover thread mask from Python\n",
    "print(\"Number of threads: {}\".format(get_num_threads()))\n",
    "\n",
    "# Set thread mask from Python\n",
    "set_num_threads(2)\n",
    "\n",
    "# Check it was set\n",
    "print(\"Number of threads: {}\".format(get_num_threads()))\n",
    "\n",
    "@njit\n",
    "def get_mask():\n",
    "    print(\"JIT code, number of threads\", get_num_threads())\n",
    "\n",
    "# Discover thread mask from JIT code\n",
    "get_mask()\n",
    "\n",
    "@njit\n",
    "def set_mask(x):\n",
    "    set_num_threads(x)\n",
    "    print(\"JIT code, number of threads\", get_num_threads())\n",
    "\n",
    "# Set thread mask from JIT code\n",
    "set_mask(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Something more complicated, limiting threads in use:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@njit(parallel=True)\n",
    "def thread_limiting():\n",
    "    n = 5\n",
    "    mask1 = 3\n",
    "    mask2 = 2\n",
    "    \n",
    "    # np.zeros is parallelised, all threads are in use here\n",
    "    A = np.zeros((n, mask1))\n",
    "    \n",
    "    # only use mask1 threads in this parallel region\n",
    "    set_num_threads(mask1)\n",
    "    for i in prange(mask1):\n",
    "        A[:, i] = i\n",
    "\n",
    "    # only use mask2 threads in this parallel region\n",
    "    set_num_threads(mask2)\n",
    "    A[:, :] = np.sqrt(A)\n",
    "\n",
    "    return A\n",
    "\n",
    "print(thread_limiting())\n",
    "\n",
    "# Uncomment and run this to see the parallel diagnostics for the function above\n",
    "# thread_limiting.parallel_diagnostics(thread_limiting.signatures[0], level=3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It should be noted that once in a parallel region, setting the number of threads has no effect on the region that is executing, it does however impact subsequent parallel region launches. For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mask = config.NUMBA_NUM_THREADS - 1 # create a mask\n",
    "\n",
    "# some constants based on mask size\n",
    "N = config.NUMBA_NUM_THREADS\n",
    "M = 2 * config.NUMBA_NUM_THREADS\n",
    "\n",
    "@njit(parallel=True)\n",
    "def child_func(buf, fid):\n",
    "    M, N = buf.shape\n",
    "    for i in prange(N): # parallel write into the row slice\n",
    "        buf[fid, i] = get_num_threads()\n",
    "\n",
    "@njit(parallel=True)\n",
    "def parent_func(nthreads):\n",
    "    acc = 0\n",
    "    buf = np.zeros((M, N))\n",
    "    print(\"Parent: Setting mask to:\", nthreads)\n",
    "    set_num_threads(nthreads) # set threads to mask\n",
    "    print(\"Parent: Running parallel loop of size\", M)\n",
    "    for i in prange(M):\n",
    "        local_mask = 1 + i % mask\n",
    "        \n",
    "        # set threads in parent function\n",
    "        set_num_threads(local_mask)\n",
    "        \n",
    "        # only call child_func if your thread mask permits!\n",
    "        if local_mask < N:\n",
    "            child_func(buf, local_mask)\n",
    "\n",
    "        # add up all used threadmasks\n",
    "        print(\"prange index\", i, \". get_num_threads()\", get_num_threads())\n",
    "        acc += get_num_threads()\n",
    "    return acc, buf\n",
    "\n",
    "print(\"Calling with mask: {} and constants M = {}, N = {}\".format(mask, M, N))\n",
    "got_acc, got_buf = parent_func(mask)\n",
    "print(\"got acc = {}\".format(got_acc))\n",
    "# expect sum of local_masks in prange(M) loop\n",
    "print(\"expect acc = {}\".format(np.sum(1 + np.arange(M) % mask)))\n",
    "# Output `buf` should only be written to in rows with index < N as\n",
    "# the thread mask would forbid it, the contents of the rows is the thread mask\n",
    "print(got_buf)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## First-class function types\n",
    "\n",
    "For quite some time Numba has been able to pass around Numba JIT decorated functions as objects, these, however, have been seen by Numba as different types even if they have identical signatures. Numba 0.49.0 brings a new experimental feature that makes function objects first class types such that functions with the same signatures can be see has being \"of the same type\" for the purposes of type inference. Further ``cfunc``s, JIT functions and a new \"Wrapper address protocol\" based functions are all supported to some degree. Documentation is [here](http://numba.pydata.org/numba-doc/latest/reference/types.html#functions).\n",
    "\n",
    "An example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@njit(\"intp(intp)\")\n",
    "def foo(x):\n",
    "    return x + 1\n",
    "\n",
    "@njit(\"intp(intp)\")\n",
    "def bar(x):\n",
    "    return x + 2\n",
    "\n",
    "@njit(\"intp(intp)\")\n",
    "def baz(x):\n",
    "    return x + 3\n",
    "\n",
    "@njit\n",
    "def apply(arg, *functions):\n",
    "    for fn in functions: # to iterate over a container it must contain \"all the same types\"\n",
    "        arg = fn(arg)\n",
    "    return arg\n",
    "\n",
    "apply(10, foo, bar, baz)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Typed list update\n",
    "\n",
    "Numba's ``typed.List`` container has been enhanced with the ability to construct a new instance directly from an iterable, this saving a lot of boiler plate code. A quick demonstration:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from numba.typed import List\n",
    "\n",
    "print(List(range(10)))\n",
    "\n",
    "x = [4., 6., 2., 1.]\n",
    "print(List(x))\n",
    "\n",
    "# also works in JIT code\n",
    "@njit\n",
    "def list_ctor(x):\n",
    "    return List(x), List((1, 2, 3, 4))\n",
    "\n",
    "list_ctor(np.arange(10.))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Support for `ord` and `chr`\n",
    "\n",
    "For users wanting to encode/decode strings, particularly those of the ASCII variety, `ord` and `chr` are now supported:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@njit\n",
    "def demo_ord_chr():\n",
    "    alphabet = 'abcdefghijklmnopqrstuvwxyz'\n",
    "    lord = List()\n",
    "    lchr = List()\n",
    "    for idx, char in enumerate(alphabet, ord('A')):\n",
    "        lord.append(ord(char))\n",
    "        lchr.append(chr(idx))\n",
    "    return lord, lchr\n",
    "\n",
    "demo_ord_chr()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Checking if a function is JIT wrapped\n",
    "\n",
    "A common question from writers of extension library that can consume Numba functions is \"How do I know if a function my application receives as an argument is already Numba JIT wrapped?\". Numba 0.49 answers this with the `numba.extending.is_jitted` function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def some_func(x):\n",
    "    return x + 1\n",
    "\n",
    "def consumer(func, *args):\n",
    "    if not numba.extending.is_jitted(func):\n",
    "        print(\"Not JIT wrapped, will wrap and compile!\")\n",
    "        func = njit(func)\n",
    "    return func(*args)\n",
    "\n",
    "consumer(some_func, 10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Newly supported NumPy functions/features\n",
    "\n",
    "This release contains support for direct iteration over `np.ndarray`s and one newly supported NumPy function, `np.isnat`, all written by contributors from the Numba community:\n",
    "\n",
    " \n",
    "A quick demo of the above:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "NAT = np.datetime64('NaT')\n",
    "dt = np.dtype('<M8')\n",
    "\n",
    "@njit\n",
    "def demo_numpy():\n",
    "    a = np.empty((5, 3, 2), dt)\n",
    "    out = np.zeros_like(a, np.bool_)\n",
    "    # iterate with ndindex\n",
    "    for x in np.ndindex(a.shape):\n",
    "        if np.random.random() < 0.5:\n",
    "            a[x] = NAT\n",
    "            \n",
    "    count = 0\n",
    "    # now iterate directly\n",
    "    for twoDarr in a:\n",
    "        for oneDarr in twoDarr:\n",
    "            for item in oneDarr:\n",
    "                if np.isnat(item):\n",
    "                    count += 1\n",
    "    \n",
    "    # use ufunc\n",
    "    ufunc_count = np.isnat(a).sum()\n",
    "    \n",
    "    assert count == ufunc_count\n",
    "    \n",
    "\n",
    "demo_numpy()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using tuples in parallel regions\n",
    "\n",
    "Due to long standing issues in the internal implementation of parallel regions (that they are based on __Generalized Universal Functions__), functions with `parallel=True` have not supported `tuple` \"arguments\" to these regions. This is a bit of a technical detail, but is now fixed, so common things like expressing a loop nest iteration limits from an array shape works as expected."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@njit(parallel=True)\n",
    "def demo_tuple_in_prange(A):\n",
    "    for i in prange(A.shape[0]):\n",
    "        for j in range(A.shape[1]):\n",
    "            for k in range(A.shape[2]):\n",
    "                A[i, j, k] = i + j + k\n",
    "\n",
    "x = 4\n",
    "y = 3\n",
    "z = 2\n",
    "A = np.empty((x, y, z))\n",
    "demo_tuple_in_prange(A)\n",
    "print(A)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## For CUDA target users...\n",
    "\n",
    "Prior to Numba 0.49, if a user forgot to specify a launch configuration to a CUDA kernel a default configuration of one thread and one block was used. This lead to hard to explain behaviours for example, code that worked by virtue of running in this minimum configuration, or code that exhibited strange performance characteristics.\n",
    "\n",
    "### All kernels require launch configurations\n",
    "As a result, in Numba 0.49, it is now a requirement for all CUDA kernel launches to be explicitly configured in both the CUDA simulator and on real hardware. Example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "config.ENABLE_CUDASIM = 1\n",
    "from numba import cuda\n",
    "\n",
    "@cuda.jit\n",
    "def kernel(x):\n",
    "    print(\"In the kernel\", cuda.threadIdx)\n",
    "\n",
    "# bad launch, no configuration given\n",
    "try:\n",
    "    kernel(np.arange(10))\n",
    "except ValueError as e:\n",
    "    print(e)\n",
    "    \n",
    "# good launch, configuration specified\n",
    "kernel[2, 4](np.arange(10))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### External Memory Management (EMM) Plugin interface\n",
    "\n",
    "Whilst not possible to demonstrate this feature in the current notebook, Numba 0.49 gains an External Memory Management (EMM) Plugin interface. When multiple CUDA-aware libraries are used together, it may be preferable for Numba to defer to another library for memory management. The EMM Plugin interface facilitates this, by enabling Numba to use another CUDA-aware library for all allocations and deallocations. Documentation for this feature is [here](http://numba.pydata.org/numba-doc/latest/cuda/external-memory.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# For developers of Numba extensions...\n",
    "\n",
    "There's three changes that may be of interest to those working on Numba extensions or with Numba IR:\n",
    "\n",
    "1. Numba transforms it's IR to SSA.\n",
    "2. Debug dumps now have syntax highlighting.\n",
    "3. Disassembly CFGs are now available (not demonstrated here, see [documentation](http://numba.pydata.org/numba-doc/latest/reference/jit-compilation.html#Dispatcher.inspect_disasm_cfg)).\n",
    "\n",
    "\n",
    "## Static Single Assignment form\n",
    "Numba 0.49 contains the start of an important change to Numba's internal representation (IR). The change is essentially that the IR is now coerced into [static single assignment (SSA)](https://en.wikipedia.org/wiki/Static_single_assignment_form) form immediately prior to when type inference is performed. This fixes a number of bugs and makes it considerably easier to write more advanced optimisation passes. It's hoped that SSA form can be extended further up the compilation pipeline as time allows.\n",
    "\n",
    "A quick demonstration that shows SSA form and the new syntax highlighted dumps in action:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "config.COLOR_SCHEME = 'light_bg' # colour scheme highlighting for a light background\n",
    "config.HIGHLIGHT_DUMPS = '1' # request dump highlighting \n",
    "config.DEBUG_PRINT_WRAP = 'reconstruct_ssa' # print IR both sides of the SSA reconstruction pass\n",
    "\n",
    "@njit\n",
    "def demo_ssa(x):\n",
    "    if x > 2:\n",
    "        a = 12\n",
    "    elif x > 4:\n",
    "        a = 20\n",
    "    else:\n",
    "        a = 3\n",
    "    return a\n",
    "\n",
    "print(demo_ssa(5))\n",
    "\n",
    "# switch it off again!\n",
    "config.DEBUG_PRINT_WRAP = ''"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}