{ "metadata": { "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "NumPy and numba" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from __future__ import print_function\n", "import numba\n", "import numpy as np\n", "import math\n", "import llvm\n", "import ctypes\n", "print(\"numba version: %s \\nNumPy version: %s\\nllvm version: %s\" % (numba.__version__,np.__version__, llvm.__version__))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "numba version: 0.12.0-10-g4e41ab2-dirty \n", "NumPy version: 1.7.1\n", "llvm version: 0.12.0\n" ] } ], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "_NumPy_ provides a compact, typed container for homogenous arrays of data. This is ideal to store data homogeneous data in *Python* with little overhead. _NumPy_ also provides a set of functions that allows manipulation of that data, as well as operating over it. There is a rich ecosystem around _Numpy_ that results in fast manipulation of _Numpy arrays_, as long as this manipulation is done using pre-baked operations (that are typically vectorized). This operations are usually provided by extension modules and written in C, using the _Numpy C API_.\n", "\n", "_numba_ allows generating native code from Python functions just by adding decorators. This code is wrapped and directly callable from within Python.\n", "\n", "There are many cases where you want to apply code to your _NumPy_ data, and need that code to execute fast. You may get lucky and have the functions you want already written in the extensive _NumPy_ ecosystem. Otherwise you will end with some code that is not that fast, but that you can improve execution time by writing code the \"NumPy way\". If it is not fast enough, you can write an extension module using the _Numpy C API_. Writing an extension module will take quite a bit of time, and forces you to a slow compile-install-test cycle.\n", "\n", "Wouldn't it be great if you could just write code in *Python* that describes your function and execute it at speed similar to that of what you could achieve with the extension module, all without leaving the *Python interpreter*? _numba_ allows that.\n", "\n", "_Numba_ is _NumPy_ aware. This means:\n", "\n", "- It natively understands _NumPy_ arrays, shapes and dtypes. *NumPy* arrays are supported as native types.\n", "- It knows how to index/slice a _NumPy_ array without relying on *Python*.\n", "- It provides supports for generating _ufuncs_ and _gufuncs_ from inside the *Python* interpreter." ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "_Numba_ understands _NumPy_ arrays" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NumPy arrays are understood by numba. 
By using *numba.typeof* we can see that numba not only knows about the arrays themselves, but also about their shape and underlying dtype:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "array = np.arange(2000, dtype=np.float_)\n", "numba.typeof(array)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 2, "text": [ "array(float64, 1d, C)" ] } ], "prompt_number": 2 }, { "cell_type": "code", "collapsed": false, "input": [ "numba.typeof(array.reshape((2,10,100)))" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 3, "text": [ "array(float64, 3d, C)" ] } ], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "From the point of view of *numba*, there are three factors that identify the array type:\n", "\n", " - The base type (dtype).\n", " \n", " - The number of dimensions (len(shape)). Note that for numba the arity of each dimension is not considered part of the type, only the dimension count.\n", " \n", " - The arrangement of the array: 'C' for C-like, 'F' for FORTRAN-like, 'A' for a generic strided array.\n", " \n", "It is easy to illustrate that the arity of each dimension is not part of the array type in *numba* with the following samples:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "numba.typeof(array.reshape((2,10,100))) == numba.typeof(array.reshape((4,10,50)))" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 4, "text": [ "True" ] } ], "prompt_number": 4 }, { "cell_type": "code", "collapsed": false, "input": [ "numba.typeof(array.reshape((2,10,100))) == numba.typeof(array.reshape((40,50)))" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 5, "text": [ "False" ] } ], "prompt_number": 5 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Specifying an array type in *numba*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In *numba* you can build an array type specification starting from the base type of the array.\n", "\n", "So if numba.float32 specifies a single precision floating point number:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "numba.float32" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 6, "text": [ "float32" ] } ], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "then numba.float32[:] specifies a single-dimensional array of single precision floating point numbers:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "numba.float32[:]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 7, "text": [ "array(float32, 1d, A)" ] } ], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Adding dimensions is just a matter of tweaking the slice description passed:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "numba.float32[:,:]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 8, "text": [ "array(float32, 2d, A)" ] } ], "prompt_number": 8 }, { "cell_type": "code", "collapsed": false, "input": [ "numba.float32[:,:,:,:]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 9, "text": [ "array(float32, 4d, A)" ] } ], 
"prompt_number": 9 }, { "cell_type": "raw", "metadata": {}, "source": [ "As you can see, all the specified arrays are \"strided\". It is possible to specify that a given dimension is consecutive in memory by using ::1 in such dimension. This allows describing C-type arrays and F-type arrays.\n", "\n", "row-major arrays (C-type) have the elements in the last dimension packed together:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "numba.float32[:,::1]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 10, "text": [ "array(float32, 2d, C)" ] } ], "prompt_number": 10 }, { "cell_type": "code", "collapsed": false, "input": [ "numba.float32[:,:,::1]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 11, "text": [ "array(float32, 3d, C)" ] } ], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "column-major arrays (F-type) have elements in the first dimension packed together:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "numba.float32[::1,:]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 12, "text": [ "array(float32, 2d, F)" ] } ], "prompt_number": 12 }, { "cell_type": "code", "collapsed": false, "input": [ "numba.float32[::1,:,:]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 13, "text": [ "array(float32, 3d, F)" ] } ], "prompt_number": 13 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The use of any other dimension as consecutive is handled as a strided array:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "numba.float32[:,::1,:]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 14, "text": [ "array(float32, 3d, A)" ] } ], "prompt_number": 14 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the array arrangement does change the type, although numba will easily coerce a C or FORTRAN array into a strided one:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "numba.float32[::1,:] == numba.float32[:,::1]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 15, "text": [ "False" ] } ], "prompt_number": 15 }, { "cell_type": "code", "collapsed": false, "input": [ "numba.float32[::1,:] == numba.float32[:,:]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 16, "text": [ "False" ] } ], "prompt_number": 16 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "NumPy arrays as arguments" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In all cases, NumPy arrays are passed to numba functions by reference. This means that any change performed on the argument in the function will modify the contents of the original matrix. This behavior maps the usual NumPy semantics. So the array values passed as arguments to a *numba* functions can be considered as input/output arguments." ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "_Numba_ knows how to index and slice a _Numpy_ array natively" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Indexing and slicing of _NumPy arrays_ are handled natively by _numba_. 
, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "_Numba_ knows how to index and slice a _Numpy_ array natively" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Indexing and slicing of _NumPy arrays_ are handled natively by _numba_. This means that it is possible to _index_ and _slice_ a _Numpy array_ in _numba_-compiled code without relying on the _Python runtime_. In practice this means that _numba_ code running on _NumPy arrays_ will execute with a level of efficiency close to that of C.\n", "\n", "Let's make a simple function that uses indexing. For example, a really naive implementation of a sum:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def sum_all(A):\n", "    \"\"\"Naive sum of elements of an array... assumes one dimensional array of floats\"\"\"\n", "    acc = 0.0\n", "    for i in xrange(A.shape[0]):\n", "        acc += A[i]\n", "    return acc" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 17 }, { "cell_type": "code", "collapsed": false, "input": [ "sample_array = np.arange(10000.0)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 18 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The _pure Python_ version of this naive function is quite underwhelming speed-wise:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit sum_all(sample_array)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "100 loops, best of 3: 5.44 ms per loop\n" ] } ], "prompt_number": 19 }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we relied on _NumPy_, it would be much faster:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit np.sum(sample_array)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "10000 loops, best of 3: 19.8 \u00b5s per loop\n" ] } ], "prompt_number": 20 }, { "cell_type": "markdown", "metadata": {}, "source": [ "But with _numba_ the speed of that _naive_ code is quite good:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "sum_all_jit = numba.jit('float64(float64[:])')(sum_all)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 21 }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit sum_all_jit(sample_array)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "100000 loops, best of 3: 11.7 \u00b5s per loop\n" ] } ], "prompt_number": 22 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is in part possible because of the native support for indexing in _numba_. The function can be compiled in a _nopython_ context, which makes it quite fast:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "sum_all_jit = numba.jit('float64(float64[:])', nopython=True)(sum_all)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 23 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "_Numba_ supports generating _NumPy_ _ufuncs_ and _gufuncs_" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In _NumPy_ there are universal functions ([_ufuncs_](http://docs.scipy.org/doc/numpy/reference/ufuncs.html)) and generalized universal functions ([_gufuncs_](http://docs.scipy.org/doc/numpy/reference/c-api.generalized-ufuncs.html)). \n", "\n", "- _ufuncs_ are well established and allow mapping scalar operations over _NumPy_ arrays. The resulting vectorized operations follow _NumPy_'s broadcasting rules.\n", "\n", "- _gufuncs_ are a generalization of _ufuncs_ that allow vectorization of _kernels_ that work over the inner dimensions of the arrays. 
In this context a _ufunc_ would just be a _gufunc_ where all the operands of its kernel have 0 dimensions (i.e. are scalars).\n", "\n", "_ufuncs_ and _gufuncs_ are typically built using _Numpy's C API_. _Numba_ offers the possibility to create _ufuncs_ and _gufuncs_ within the Python interpreter, using Python functions to describe the _kernels_. To access this functionality _numba_ provides the *vectorize* and *guvectorize* decorators." ] }, { "cell_type": "heading", "level": 4, "metadata": {}, "source": [ "The *vectorize* decorator" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*vectorize* is the decorator used to build *ufuncs*. As of this writing it is accessed as *numba.vectorize*." ] }, { "cell_type": "code", "collapsed": false, "input": [ "print(numba.vectorize.__doc__)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "vectorize(ftylist[, target='cpu', [**kws]])\n", "\n", " A decorator to create numpy ufunc object from Numba compiled code.\n", "\n", " Args\n", " -----\n", " ftylist: iterable\n", " An iterable of type signatures, which are either\n", " function type object or a string describing the\n", " function type.\n", "\n", " target: str\n", " A string for code generation target. Defaults to 'cpu'.\n", "\n", " Returns\n", " --------\n", "\n", " A NumPy universal function\n", "\n", " Example\n", " -------\n", " @vectorize(['float32(float32, float32)',\n", " 'float64(float64, float64)'])\n", " def sum(a, b):\n", " return a + b\n", "\n", " \n" ] } ], "prompt_number": 24 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Its usage is pretty simple: just write the scalar function you want for your _ufunc_, then decorate it with _vectorize_, passing as a parameter the list of signatures for which you want code to be generated. The generated _ufunc_ will be handled like any other _NumPy_ _ufunc_. That means that type promotion and broadcasting rules follow those of _NumPy_." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For example, let's write a sample ufunc that performs a linear interpolation between A and B. The 'kernel' will look like this:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def lerp(A,B,factor):\n", "    \"\"\"interpolates A and B by factor\"\"\"\n", "    return (1-factor)*A + factor*B" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 25 }, { "cell_type": "code", "collapsed": false, "input": [ "lerp(0.0, 10.0, 0.3)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 26, "text": [ "3.0" ] } ], "prompt_number": 26 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's make a _ufunc_ for the floating point types. I will be using vectorize as a function, but remember that you could just add the decorator to the definition of the kernel itself." ] }, { "cell_type": "code", "collapsed": false, "input": [ "lerp_ufunc = numba.vectorize(['float32(float32, float32, float32)', 'float64(float64, float64, float64)'])(lerp)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 27 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can run our lerp with all of _NumPy_'s niceties, like broadcasting of one operand (in this case the factor)."
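, "\n", "\n", "As an extra illustrative sketch (this cell was added to the text and is not part of the original run, so no output is shown), broadcasting also works the other way around, with scalar endpoints and an array of factors:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# hypothetical extra example: scalar endpoints, an array of interpolation factors\n", "lerp_ufunc(0.0, 10.0, np.array([0.0, 0.25, 0.5, 0.75, 1.0]))" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Broadcasting a scalar factor over two full arrays works just as well:"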
] }, { "cell_type": "code", "collapsed": false, "input": [ "A = np.arange(0.0, 100000.0, 2.0)\n", "B = np.arange(100000.0, 0.0, -2.0)\n", "F = np.array([0.5] * 50000)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 28 }, { "cell_type": "code", "collapsed": false, "input": [ "lerp_ufunc(A,B,0.5)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 29, "text": [ "array([ 50000., 50000., 50000., ..., 50000., 50000., 50000.])" ] } ], "prompt_number": 29 }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is also quite fast:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit lerp_ufunc(A, B, 0.5)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1000 loops, best of 3: 364 \u00b5s per loop\n" ] } ], "prompt_number": 30 }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit lerp(A, B, 0.5)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "10000 loops, best of 3: 175 \u00b5s per loop\n" ] } ], "prompt_number": 31 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that in this case the same original function can be used to generate the _ufunc_ and to execute the equivalent _NumPy_ vectorized version. When executing there will be differences in how the expression is evaluated.\n", "\n", "When using _NumPy_ the expression is evaluated one operation at a time, over the entire vector. _Numba_ generated code will evaluate the full expression in one go, for each element. The _numba_ approach approach avoids having temporal intermmediate arrays built, as well as avoiding revisiting operands that are being used more than once in a expression. This is useful with big arrays of data where there will be savings in process memory usage as well as better cache usage." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "def sample_poly(x):\n", " return x - x*x*x/6.0 + x*x*x*x*x/120.0" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 32 }, { "cell_type": "code", "collapsed": false, "input": [ "S = np.arange(0, np.pi, np.pi/36000000)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 33 }, { "cell_type": "code", "collapsed": false, "input": [ "sample_poly_ufunc = numba.vectorize(['float32(float32)', 'float64(float64)'])(sample_poly)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 34 }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit sample_poly(S)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1 loops, best of 3: 2.51 s per loop\n" ] } ], "prompt_number": 35 }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit sample_poly_ufunc(S)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1 loops, best of 3: 633 ms per loop\n" ] } ], "prompt_number": 36 }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is also worth noting that numba's _vectorize_ provides similar convenience to that of NumPy's _vectorize_, but with performance similar to an _ufunc_.\n", "\n", "For example, let's take the example in [NumPy's vectorize documentation](http://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html):" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def myfunc(a, b):\n", " \"Return a-b if a>b, otherwise return a+b\"\n", " if a > b:\n", " return a - b\n", " else:\n", " return a + b" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 37 }, { "cell_type": "code", "collapsed": false, "input": [ "myfunc_input = np.arange(100000.0)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 38 }, { "cell_type": "code", "collapsed": false, "input": [ "numpy_vec_myfunc = np.vectorize(myfunc) \n", "%timeit numpy_vec_myfunc(myfunc_input, 50000)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "10 loops, best of 3: 32.1 ms per loop\n" ] } ], "prompt_number": 39 }, { "cell_type": "code", "collapsed": false, "input": [ "numba_vec_myfunc = numba.vectorize(['float64(float64, float64)'])(myfunc)\n", "%timeit numba_vec_myfunc(myfunc_input, 50000)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1000 loops, best of 3: 597 \u00b5s per loop\n" ] } ], "prompt_number": 40 }, { "cell_type": "heading", "level": 4, "metadata": {}, "source": [ "The guvectorize decorator" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the same way the _vectorize_ allows building _NumPy_'s _ufuncs_ from inside the Python interpreter just by writing the expression that forms the kernel; guvectorize allows building _Numpy_'s _gufuncs_ without the need of writing a C extension module." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "print(numba.guvectorize.__doc__)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "guvectorize(ftylist, signature, [, target='cpu', [**kws]])\n", "\n", " A decorator to create numpy generialized-ufunc object from Numba compiled\n", " code.\n", "\n", " Args\n", " -----\n", " ftylist: iterable\n", " An iterable of type signatures, which are either\n", " function type object or a string describing the\n", " function type.\n", "\n", "\n", " signature: str\n", " A NumPy generialized-ufunc signature.\n", " e.g. \"(m, n), (n, p)->(m, p)\"\n", "\n", " target: str\n", " A string for code generation target. Defaults to \"cpu\".\n", "\n", " Returns\n", " --------\n", "\n", " A NumPy generialized universal-function\n", "\n", " Example\n", " -------\n", " @guvectorize(['void(int32[:,:], int32[:,:], int32[:,:])',\n", " 'void(float32[:,:], float32[:,:], float32[:,:])'],\n", " '(x, y),(x, y)->(x, y)')\n", " def add_2d_array(a, b):\n", " for i in range(c.shape[0]):\n", " for j in range(c.shape[1]):\n", " c[i, j] = a[i, j] + b[i, j]\n", "\n", " \n" ] } ], "prompt_number": 41 }, { "cell_type": "markdown", "metadata": {}, "source": [ "[_Generalized universal functions_](http://docs.scipy.org/doc/numpy/reference/c-api.generalized-ufuncs.html) require a _dimension signature_ for the _kernel_ they implement. We call this the *NumPy generalized-ufunc signature*. Do not confuse this _dimension signature_ with the _type signature_ that _numba_ requires.\n", "\n", "The _dimension signature_ describe the dimensions of the operands, as well as constraints to the values of those dimensions so that the function can work. For example, a matrix multiply gufunc will have a dimension signature like '(m,n), (n,p) -> (m,p)'. This means:\n", "\n", "- First operand has two dimensions (m,n).\n", "\n", "- Second operand has two dimensions (n,p).\n", "\n", "- Result has two dimensions (m,p).\n", "\n", "The names of the dimensions are symbolic, and dimensions having the same name must match in *arity* (number of elements). So in our matrix multiply example the following constraints have to be met:\n", "\n", "- elements in a row of the first operand *must equal* the elements in a column of the second operand. Both are 'n'.\n", "\n", "As you can see, the arity of the dimensions of the result can be infered from the source operands:\n", "\n", "- Result will have as many rows as rows has the first operand. Both are 'm'.\n", "\n", "- Result will have as many columns as columns has the second operand. Both are 'p'.\n", "\n", "You can find more information about *Numpy generalized-ufunc signature* in [NumPy's documentation](http://docs.scipy.org/doc/numpy/reference/c-api.generalized-ufuncs.html#details-of-signature).\n", "\n", "When building a _gufunc_ you start by writing the kernel function. You have to bear in mind which is the dimension signature and write the code to handle a single element. The function will take both, *input arguments* and *results*, as parameters. The *result* will be the last argument of the function. There shouldn't be any return value to the function, as the result should be placed directly in the last argument. The result of modifying an argument other than the result argument is undefined." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "def matmulcore(A, B, C):\n", " m, n = A.shape\n", " n, p = B.shape\n", " for i in range(m):\n", " for j in range(p):\n", " C[i, j] = 0\n", " for k in range(n):\n", " C[i, j] += A[i, k] * B[k, j]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 42 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note how the m, n and p are extracted from the input arguments. The extraction of *n* is done twice to reinforce the notion that both are the same. That extraction is not really needed, as you could directly index inside the shape when defining the range." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To build a *generalized-ufunc* from the function is just a matter of using the *guvectorize* decorator. The interface to *guvectorize* is akin that of *vectorize*, but also requires the *NumPy generalized-ufunc* signature." ] }, { "cell_type": "code", "collapsed": false, "input": [ "gu_matmul = numba.guvectorize(['float32[:,:], float32[:,:], float32[:,:]', 'float64[:,:], float64[:,:], float64[:,:]' ],\n", " '(m,n),(n,p)->(n,p)')(matmulcore)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 43 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result is a _gufunc_, that can be used as any othe _gufunc_ in _NumPy_. Broadcasting and type promotion rules are those on _NumPy_." ] }, { "cell_type": "code", "collapsed": false, "input": [ "matrix_ct = 10000\n", "gu_test_A = np.arange(matrix_ct * 2 * 4, dtype=np.float32).reshape(matrix_ct, 2, 4)\n", "gu_test_B = np.arange(matrix_ct * 4 * 5, dtype=np.float32).reshape(matrix_ct, 4, 5)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 44 }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit gu_matmul(gu_test_A, gu_test_B)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "100 loops, best of 3: 9.63 ms per loop\n" ] } ], "prompt_number": 45 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some recap on the difference between *vectorize* and *guvectorize*:\n", "\n", "1. *vectorize* generates *ufuncs*, *guvectorize* generates *generalized-ufuncs*\n", "\n", "1. In both, *type signatures* for the arguments and return type are given in a list, but in *vectorize* function signatures are used to specify them, while on *guvectorize* a list of types is used instead, the resulting type being specified last.\n", "\n", "1. When using *guvectorize* the *NumPy generalized-ufunc* signature needs to be supplied. This signature must be coherent with the type signatures.\n", "\n", "1. Remember that with *guvectorize* the result is passed as the last argument to the *kernel*, while in *vectorize* the result is returned by the *kernel*." ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Caveats" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are some points to take into account when dealing with _NumPy_ arrays inside _numba_ compiled functions:" ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "No range checks when indexing in _numba_" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In _numba_ generated code **no range checking is performed when indexing**. No range checking is performed as to allow generating code that performs better. 
So you need to be careful with your code, as any indexing that goes out of range can cause a bad access or a memory overwrite, potentially *crashing* the interpreter process." ] }, { "cell_type": "code", "collapsed": false, "input": [ "arr = np.arange(16.0).reshape((2,8))\n", "print(arr)\n", "print(arr.strides)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[[ 0. 1. 2. 3. 4. 5. 6. 7.]\n", " [ 8. 9. 10. 11. 12. 13. 14. 15.]]\n", "(64, 8)\n" ] } ], "prompt_number": 46 }, { "cell_type": "markdown", "metadata": {}, "source": [ "As indexing in Python is 0-based, the following line will raise an exception, as arr.shape[1] is 8 and the valid range for the column index is (0..7):" ] }, { "cell_type": "code", "collapsed": false, "input": [ "arr[0, arr.shape[1]] = 42.0" ], "language": "python", "metadata": {}, "outputs": [ { "ename": "IndexError", "evalue": "index 8 is out of bounds for axis 1 with size 8", "output_type": "pyerr", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mIndexError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0marr\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0marr\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m42.0\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mIndexError\u001b[0m: index 8 is out of bounds for axis 1 with size 8" ] } ], "prompt_number": 47 }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, as _numba_ doesn't have range checks, it will index anyway. As the index is out of bounds and the array is in C order, the value will overflow into the next row; in this case, into the place reserved for element (1, 0)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "@numba.jit(\"void(f8[:,:])\")\n", "def bad_access(array):\n", "    array[0, array.shape[1]] = 42.0\n", "bad_access(arr)\n", "print(arr)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[[ 0. 1. 2. 3. 4. 5. 6. 7.]\n", " [ 42. 9. 10. 11. 12. 13. 14. 15.]]\n" ] } ], "prompt_number": 48 }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this sample case we were lucky, as the _out-of-bounds_ access fell within the allocated range. Unchecked indexing can potentially cause illegal accesses and crash the process running the *Python interpreter*. However, it allows for code generation that produces faster code." ] } ], "metadata": {} } ] }