{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Numba 0.51.0 Release Demo\n", "=======================\n", "\n", "This notebook contains a demonstration of new features present in the 0.51.0 release of Numba. Whilst release notes are produced as part of the [`CHANGE_LOG`](https://github.com/numba/numba/blob/release0.51/CHANGE_LOG), there's nothing like seeing code in action!\n", "\n", "This release notebook contains new CPU target features, the [CUDA target](https://numba.pydata.org/numba-doc/latest/cuda/index.html) also gained a lot of new features in 0.51.0 and so has it's [own demo notebook](https://mybinder.org/v2/gh/numba/numba-examples/master?filepath=notebooks%2FNumba_051_CUDA_Release_Demo.ipynb)!\n", "\n", "Key internal changes:\n", "\n", "* Numba is now backed by **LLVM 10** ([@esc](https://github.com/esc)).\n", "* Numba now does not specialise compilation based on literal values unless requested, this should help with compilation times ([@sklam](https://github.com/sklam)).\n", "\n", "Intel also kindly sponsored research and development that lead to some exciting\n", "new features:\n", "\n", "* Support for immutable heterogeneous lists and immutable heterogeneous string\n", " key dictionaries. Also optional initial/construction value capturing for all\n", " lists and dictionaries containing literal values ([@stuartarchibald](https://github.com/stuartarchibald)).\n", "* A new pass-by-reference mutable structure extension type ``StructRef`` ([@sklam](https://github.com/sklam)).\n", "* Object mode blocks are now cacheable, with the side effect of numerous bug\n", " fixes and performance improvements in caching. This also permits caching of\n", " functions defined in closures ([@sklam](https://github.com/sklam)).\n", "\n", "Demonstrations of new features/changes include:\n", "* [Immutable heterogeneous containers](#Heterogeneous-immutable-containers):\n", " * [Immutable heterogeneous lists](#Immutable-heterogeneous-lists).\n", " * [Immutable heterogeneous string-key dictionaries](#Immutable-string-key-dictionaries).\n", "* [Initial value capturing of literals in lists and dictionaries](#Initial-value-capturing).\n", "* [Caching improvments](#Caching-improvements)\n", " * [Caching of object mode blocks](#Caching-of-object-mode-blocks).\n", " * [Caching of jit functions defined in closures](#Caching-of-functions-defined-in-closures).\n", "* [The new StructRef type for defining mutable pass-by-reference structures](#User-defined-mutable-pass-by-ref-structures).\n", "* [NumPy enhancements](#Newly-supported-NumPy-functions/features)\n", "\n", "First, import the necessary from Numba and NumPy..." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from numba import jit, njit, config, __version__, errors, literal_unroll, types\n", "from numba.extending import overload\n", "import numba\n", "import numpy as np\n", "assert tuple(int(x) for x in __version__.split('.')[:2]) >= (0, 51)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "New heterogeneous immutable containers\n", "===================================\n", "\n", "Numba has supported heterogeneous immutable containers (e.g. tuples!) and homogeneous mutable containers (lists and dictionaries) for some time, Numba 0.51 adds support for additional types of immutable heterogeneous containers. Practically these take the form of \"lists of mixed type items\" and \"string key dictionaries mapping to any type of value\", these are only supported by direct definition in `@jit` decorated functions (i.e. can't pass them in from Python). Motivating these by example:\n", "\n", "## Immutable heterogeneous lists" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@njit\n", "def mixed_type_list():\n", " # a list of type [literal intp, unicode string, NumPy 1d array of float64]\n", " x = [1, 'a', np.zeros(5)]\n", " \n", " # getitem works for constant indexes (a literal value known at compile time)\n", " print('getitem', x[1]) # 1 is constant, this prints 'a'\n", " print('len', len(x)) # non-mutating call on the list is ok\n", " \n", " # iteration requires `literal_unroll` as the type of the induction variable\n", " # depends on the iteration, but works for constant values as before\n", " y = [100, 'apple', 200, 'orange']\n", " for i in literal_unroll(y):\n", " print(i)\n", " \n", "mixed_type_list()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Heterogeneously typed lists are immutable, attempted mutation is a compilation error..." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@njit\n", "def mixed_type_list_error():\n", " # a list of type [literal intp, unicode string, NumPy 1d array of float64]\n", " x = [1, 'a', np.zeros(5)]\n", " \n", " x.append(2j) # illegal mutation\n", "\n", "try:\n", " mixed_type_list_error()\n", "except errors.TypingError as e:\n", " # CANNOT MUTATE A LITERAL LIST!\n", " print(\"Cannot mutate a literal list!\")\n", " assert \"Cannot mutate a literal list\" in str(e)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Heterogeneously typed lists also carry their type information, including literal values, such that it's possible to dispatch based on their value types" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def bar(x):\n", " pass\n", "\n", "@overload(bar)\n", "def ol_bar(x):\n", " # If the string \"NOP\" is in the list then return a no-operation function\n", " # else capture the types as strings and return that!\n", " # Note that heterogeneous lists use `.literal_value` to hold\n", " # the types of the item whereas e.g. a tuple uses `.types`, this is because\n", " # heterogeneous lists inherit from `types.Literal`.\n", " \n", " # Look for NOP, do nothing!\n", " if any([getattr(lv, 'literal_value', None) == \"NOP\" for lv in x.literal_value]):\n", " return lambda x: None\n", "\n", " # Capture the type strings\n", " type_str = ', '.join([str(lv) for lv in x.literal_value])\n", " def impl(x):\n", " return \"Item types: \" + type_str\n", " return impl\n", "\n", "\n", "@njit\n", "def mixed_type_list():\n", " # a list of type [literal intp, unicode string, NumPy 1d array of float64]\n", " x = [1, 'a', np.zeros(5)]\n", " print(\"type strings:\", bar(x)) # prints the type strings\n", " # a list with the magic \"NOP\" string\n", " x = [1, 'a', np.zeros(5), \"NOP\"]\n", " print(\"NOP does nothing...!\", bar(x))\n", "\n", "mixed_type_list()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Immutable string key dictionaries\n", "\n", "Following on from immutable heterogeneous lists, immutable heterogeneous string key dictionaries are also now supported. For example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@njit\n", "def mixed_value_type_str_key_dict():\n", " # str -> mixed types, including array and typed dictionary!\n", " a = {'a': 1, 'b': 'string', 'c': np.arange(5), 'd': {10:20, 30:40}}\n", " \n", " print('getitem', a['d']) # getitem works \n", " [print(\"key\", k) for k in a.keys()] # keys() works\n", " [print(\"value\", v) for v in literal_unroll(a.values())] # as does values()\n", " print('len', len(a)) # non-mutating call on the dictionary is ok\n", " print(\"contains \", 'a' in a, 'z' in a) # and contains as it's read only\n", " # it's slightly contrived, but .items() also works\n", " for item in literal_unroll(a.items()):\n", " k, v = item\n", " print(k, \"->\", v)\n", " \n", " \n", "mixed_value_type_str_key_dict()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and a more advanced example might be to use a dictionary to provide configuration" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@njit\n", "def my_function(data, config):\n", " tmp = data / np.linalg.norm(data, ord=config['normalize'])\n", " iv = config['initial_value']\n", " for i in tmp:\n", " iv += i\n", " return iv\n", "\n", "@njit\n", "def config_example(data):\n", " # pass a dictionary as configuration\n", " config_a = {'normalize': None, 'initial_value': 5}\n", " result_a = my_function(data, config_a)\n", " print(result_a)\n", " \n", " config_b = {'normalize': np.inf, 'initial_value': 10j}\n", " result_b = my_function(data, config_b)\n", " print(result_b)\n", "\n", "config_example(np.arange(10.))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Initial value capturing\n", "\n", "> **NOTE: this is an advanced feature predominantly for use by library authors. It permits dispatching on values recorded from the definition site of the container.**\n", "\n", "Locally defined homogeneous lists and string key dictionaries can now do initial value capturing (and type capturing in the case of dictionaries), this requires the use of [`literally`](https://numba.pydata.org/numba-doc/latest/developer/literal.html#specifying-for-literal-typing) to force literal value dispatch. These types now have an `.initial_value` attribute which contains any information about the values at the definition site, as directly discovered from the bytecode. This is best demonstrated by example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def demo_iv(x):\n", " pass\n", "\n", "@overload(demo_iv)\n", "def ol_demo_iv(x):\n", " # if the initial_value is not present, request literal value dispatch\n", " if x.initial_value is None:\n", " return lambda x: literally(x)\n", " else: # initial_value is present on the type\n", " print(\"type of x: {}. Initial value {}\".format(x, x.initial_value))\n", " return lambda x: ...\n", "\n", "@njit\n", "def initial_value_capturing():\n", " l = [1, 2, 3, 4] # initial value [1, 2, 3, 4]\n", " l.append(5) # not part of the initial value\n", " demo_iv(l)\n", " \n", "initial_value_capturing()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "the same works for dictionaries" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@njit\n", "def dict_initial_value_capturing():\n", " d = {'a': 10, 'b': 20, 'c': 30} # initial value {'a': 10, 'b': 20, 'c': 30}\n", " d['d'] = 40 # not part of the initial value\n", " demo_iv(d)\n", " \n", "dict_initial_value_capturing()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "given this information is evidently available at compile time it's naturally possible to dispatch specialisations based on this information." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Caching improvements\n", "\n", "Numerous improvements were made to on-disk function caching support in 0.51, to ensure the best performance on Python < 3.8 make sure the `pickle5` module is installed!\n", "\n", "\n", "## Caching of object mode blocks.\n", "\n", "A long requested piece of functionality was added in 0.51, that of being able to cache functions that contain object mode blocks. For example, this is now cacheable:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import time\n", "from numba import objmode\n", "\n", "n = 100\n", "@njit(cache=True) # request caching!\n", "def foo(): # this is a nopython mode function\n", " x = y = 0\n", " for i in range(n):\n", " x += np.sqrt(np.cos(n) ** 2 + np.sin(n) ** 2)\n", " # but this block jumps into object mode j is defined in object mode,\n", " # so we need to tell `nopython` mode its type so it can be used\n", " # outside this block in nopython mode\n", " with objmode(j='int64'): \n", " time.sleep(0.05)\n", " j = i + 10 # j is defined in object mode\n", " y += j\n", " return x, y\n", "\n", "print(foo()) # worked with no warnings!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Caching of functions defined in closures.\n", "As a result of fixing caching of object mode blocks, it's now also possible to cache functions defined in closures:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# the specialiser, close over a jitted function argument,\n", "# the inner function is compiled and cached!\n", "def make_function(specialise_on_this_function):\n", " @njit(cache=True)\n", " def specialised(x):\n", " return specialise_on_this_function(x)\n", " return specialised\n", "\n", "@njit(cache=True)\n", "def f(x):\n", " print(\"f(x)\", x)\n", "\n", "@njit(cache=True)\n", "def g(x):\n", " print(\"g(x)\", x)\n", " \n", "# these both cache miss as it had to compile it, but no complaints about doing the caching!\n", "special_f = make_function(f)\n", "special_f(10)\n", "print(special_f.stats)\n", "\n", "special_g = make_function(g)\n", "special_g(20)\n", "print(special_g.stats)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# User defined mutable pass-by-ref structures\n", "\n", "A very common question from users is:\n", "\n", "> What can I use as a mutable structure that's also pass-by-reference?\n", "\n", "the answer is the new ``StructRef`` type (warning: this is experimental!), [documentation is here](http://numba.pydata.org/numba-doc/dev/extending/high-level.html#implementing-mutable-structures)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from numba.experimental import structref\n", "\n", "\n", "# Define a StructRef.\n", "# `structref.register` associates the type with the default data model.\n", "# This will also install getters and setters to the fields of\n", "# the StructRef.\n", "@structref.register\n", "class FruitType(types.StructRef):\n", " def preprocess_fields(self, fields):\n", " # This method is called by the type constructor for additional\n", " # preprocessing on the fields.\n", " # Here, we don't want the struct to take Literal types.\n", " return tuple((name, types.unliteral(typ)) for name, typ in fields)\n", "\n", "\n", "# Define a Python type that can be used as a proxy to the StructRef\n", "# allocated inside Numba. Users can construct the StructRef via\n", "# the constructor for this type in python code and jit-code.\n", "class Fruit(structref.StructRefProxy):\n", " def __new__(cls, kind, amount):\n", " # Overriding the __new__ method is optional, doing so\n", " # allows Python code to use keyword arguments,\n", " # or add other customized behavior.\n", " # The default __new__ takes `*args`.\n", " # IMPORTANT: Users should not override __init__.\n", " return structref.StructRefProxy.__new__(cls, kind, amount)\n", "\n", " # By default, the proxy type does not reflect the attributes or\n", " # methods to the Python side. It is up to users to define\n", " # these. (This may be automated in the future.)\n", "\n", " @property\n", " def kind(self):\n", " # To access a field, we can define a function that simply\n", " # return the field in jit-code. This is to permit access\n", " # to the data in the jit representation of the structure.\n", " # The definition is shown later.\n", " return Fruit_get_kind(self)\n", "\n", " @property\n", " def amount(self):\n", " # The definition of is shown later.\n", " return Fruit_get_amount(self)\n", "\n", "\n", "@njit\n", "def Fruit_get_kind(self):\n", " # In jit-code, the StructRef's attribute is exposed via\n", " # structref.register\n", " return self.kind\n", "\n", "\n", "@njit\n", "def Fruit_get_amount(self):\n", " return self.amount\n", "\n", "\n", "# This associates the proxy with FruitType for the given set of\n", "# fields. Notice how we are not contraining the type of each field.\n", "# Field types remain generic.\n", "structref.define_proxy(Fruit, FruitType, [\"kind\", \"amount\"])\n", "\n", "\n", "from numba.core.extending import overload_method\n", "\n", "# Use @overload_method to add a method for \"eat\"\n", "@overload_method(FruitType, \"eat\")\n", "def ol_eat(self, this_many):\n", "\n", " def impl(self, this_many):\n", " if self.amount >= this_many:\n", " self.amount -= this_many\n", " else:\n", " raise ValueError(\"Insufficient quantity\")\n", "\n", " return impl" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use the above, and also demonstrate the new `str(int)` support (implemented by [@guilhermeleobas](https://github.com/guilhermeleobas), with thanks!)..." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@njit\n", "def demo_struct_mutation():\n", " fruit = Fruit(\"apple\", 5)\n", " print(\"Have \" + str(fruit.amount) + \"s \" + fruit.kind + \".\\n\\nGoing to eat 3...\")\n", " fruit.eat(3)\n", " print(\"Now have \", str(fruit.amount) + \"s \" + fruit.kind + \".\\n\\nGoing to eat 4 more...\")\n", " try:\n", " fruit.eat(4)\n", " except:\n", " print(\"Ran out of \" + fruit.kind + \"s!\")\n", " return fruit\n", "\n", "python_struct = demo_struct_mutation()\n", "print(\"Object returned to Python: kind={}, amount={}\".format(python_struct.kind,\n", " python_struct.amount))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Newly supported NumPy functions/features\n", "=====\n", "\n", "Finally, and with many thanks to contributions from the community, this release contains support for:\n", "* `setitem` with literal string on a record array (by [@luk-f-a](https://github.com/luk-f-a)).\n", "* `np.ndarray` construction from literal value (by [@guilhermeleobas](https://github.com/guilhermeleobas)).\n", "* `np.positive` ufunc support (by [@niteya-shah](https://github.com/niteya-shah)).\n", "* `minlength` kwarg support to `np.bincount` (by [@AndrewEckart](https://github.com/AndrewEckart)).\n", "* `np.divmod` ufunc support (by [@eric-wieser](https://github.com/eric-wieser)).\n", "\n", "a demonstration of these features..." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Define a record array for use in the demo\n", "rec_array = np.array([1, 2], dtype=np.dtype([('e', np.int32), ('f', np.float64)], align=True))\n", "\n", "@njit\n", "def new_numpy_features(rec):\n", " print(\"original record\", rec)\n", " print(\"setitem with literal string on record array\") \n", " for f in literal_unroll(('e', 'f')):\n", " rec[0][f] = 10 * ord(f)\n", " print(\"record updated\", rec)\n", "\n", " print(\"np.ndarray from literal\", np.asarray(\"abc\"), np.asarray(123))\n", " print(\"np.positive(np.arange(10))\",np.positive(np.arange(10)))\n", " print(\"np.bincount with minlength\", np.bincount(np.array([0, 1, 2, 1, 3, 2, 4]),\n", " minlength=10))\n", " print(\"np.divmod, multi-output ufunc!\", np.divmod(np.arange(10), 2 ))\n", " \n", "new_numpy_features(rec_array)\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" } }, "nbformat": 4, "nbformat_minor": 2 }