{ "metadata": { "name": "Numba Parakeet Cython" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Numba vs. Parakeet vs. Cython" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*This notebook is derived from a blog *\n", "[*post*](http://jakevdp.github.io/blog/2012/08/24/numba-vs-cython/)\n", "*by Jake Vanderplas on the blog*\n", "[*Pythonic Perambulations*](http://jakevdp.github.io) and updated by Olivier Grisel to add Parakeet." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import numpy as np\n", "\n", "X = np.random.random((1000, 3))\n", "X_wide = np.random.random((1000, 100))" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Numpy Function With Broadcasting" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def pairwise_numpy(X):\n", " return np.sqrt(((X[:, None, :] - X) ** 2).sum(-1))\n", "%timeit pairwise_numpy(X)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "10 loops, best of 3: 64.7 ms per loop\n" ] } ], "prompt_number": 2 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Pure Python Function" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def pairwise_python(X):\n", " M = X.shape[0]\n", " N = X.shape[1]\n", " D = np.empty((M, M), dtype=np.float)\n", " for i in range(M):\n", " for j in range(M):\n", " d = 0.0\n", " for k in range(N):\n", " tmp = X[i, k] - X[j, k]\n", " d += tmp * tmp\n", " D[i, j] = np.sqrt(d)\n", " return D" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit pairwise_python(X)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1 loops, best of 3: 9.51 s per loop\n" ] } ], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternative python / numpy implementation closer to the parakeet example from the `examples` folder of its git repo to be fair." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def pairwise_python2(X):\n", " n_samples = X.shape[0]\n", " result = np.zeros((n_samples, n_samples), dtype=X.dtype)\n", " for i in xrange(X.shape[0]):\n", " for j in xrange(X.shape[0]):\n", " result[i, j] = np.sqrt(np.sum((X[i, :] - X[j, :]) ** 2))\n", " return result" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit pairwise_python2(X)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1 loops, best of 3: 18.2 s per loop\n" ] } ], "prompt_number": 6 }, { "cell_type": "code", "collapsed": false, "input": [ "#np.allclose(pairwise_python(X), pairwise_python2(X))" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 7 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Numba Wrapper" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: I did not use master as I get a `TypeError: 'numba.numbawrapper.NumbaCompiledWrapper' object is not callable` when calling it." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import numba\n", "\n", "numba.__version__" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 8, "text": [ "'0.9.0'" ] } ], "prompt_number": 8 }, { "cell_type": "code", "collapsed": false, "input": [ "from numba import double\n", "from numba.decorators import jit, autojit\n", "\n", "pairwise_numba = autojit(pairwise_python)\n", "\n", "%timeit pairwise_numba(X)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1 loops, best of 3: 6.72 ms per loop\n" ] } ], "prompt_number": 9 }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit pairwise_numba(X_wide)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "10 loops, best of 3: 97.3 ms per loop\n" ] } ], "prompt_number": 10 }, { "cell_type": "code", "collapsed": false, "input": [ "pairwise_numba2 = autojit(pairwise_python2)\n", "\n", "%timeit pairwise_numba2(X)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1 loops, best of 3: 13.9 s per loop" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n" ] } ], "prompt_number": 11 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Parakeet Wrapper" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Parakeet is installed from the master branch of the git repo on Jul. 3 2013" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from parakeet import jit\n", "\n", "pairwise_parakeet = jit(pairwise_python)\n", "\n", "%timeit pairwise_parakeet(X)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "100 loops, best of 3: 12.3 ms per loop\n" ] } ], "prompt_number": 12 }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit pairwise_parakeet(X_wide)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "10 loops, best of 3: 101 ms per loop\n" ] } ], "prompt_number": 13 }, { "cell_type": "code", "collapsed": false, "input": [ "pairwise_parakeet2 = jit(pairwise_python2)\n", "%timeit pairwise_parakeet2(X)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1 loops, best of 3: 13 ms per loop\n" ] } ], "prompt_number": 14 }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit pairwise_parakeet2(X_wide)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "10 loops, best of 3: 103 ms per loop\n" ] } ], "prompt_number": 15 }, { "cell_type": "code", "collapsed": false, "input": [ "np.allclose(pairwise_parakeet(X), pairwise_parakeet2(X))" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 16, "text": [ "True" ] } ], "prompt_number": 16 }, { "cell_type": "code", "collapsed": false, "input": [ "np.allclose(pairwise_parakeet(X_wide), pairwise_parakeet2(X_wide))" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 17, "text": [ "True" ] } ], "prompt_number": 17 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Optimized Cython Function" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!cython --version" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Cython version 0.19.1\r\n" ] } ], "prompt_number": 18 }, { "cell_type": "code", "collapsed": false, "input": [ "%load_ext cythonmagic" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 19 }, { "cell_type": "code", "collapsed": false, "input": [ "%%cython\n", "\n", "import numpy as np\n", "cimport cython\n", "from libc.math cimport sqrt\n", "\n", "@cython.boundscheck(False)\n", "@cython.wraparound(False)\n", "def pairwise_cython(double[:, ::1] X):\n", " cdef int M = X.shape[0]\n", " cdef int N = X.shape[1]\n", " cdef double tmp, d\n", " cdef double[:, ::1] D = np.empty((M, M), dtype=np.float64)\n", " for i in range(M):\n", " for j in range(M):\n", " d = 0.0\n", " for k in range(N):\n", " tmp = X[i, k] - X[j, k]\n", " d += tmp * tmp\n", " D[i, j] = sqrt(d)\n", " return np.asarray(D)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 20 }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit pairwise_cython(X)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "100 loops, best of 3: 6.57 ms per loop\n" ] } ], "prompt_number": 21 }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit pairwise_cython(X_wide)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "10 loops, best of 3: 95.5 ms per loop\n" ] } ], "prompt_number": 22 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Fortran/F2Py" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%%file pairwise_fortran.f\n", "\n", " subroutine pairwise_fortran(X,D,m,n)\n", " integer :: n,m\n", " double precision, intent(in) :: X(m,n)\n", " double precision, intent(out) :: D(m,m) \n", " integer :: i,j,k\n", " double precision :: r \n", " do i = 1,m \n", " do j = 1,m \n", " r = 0\n", " do k = 1,n \n", " r = r + (X(i,k) - X(j,k)) * (X(i,k) - X(j,k)) \n", " end do \n", " D(i,j) = sqrt(r) \n", " end do \n", " end do \n", " end subroutine pairwise_fortran" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Overwriting pairwise_fortran.f\n" ] } ], "prompt_number": 23 }, { "cell_type": "code", "collapsed": false, "input": [ "# Compile the Fortran with f2py.\n", "# We'll direct the output into /dev/null so it doesn't fill the screen\n", "!f2py -c pairwise_fortran.f -m pairwise_fortran > /dev/null" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 24 }, { "cell_type": "code", "collapsed": false, "input": [ "from pairwise_fortran import pairwise_fortran\n", "XF = np.asarray(X, order='F')\n", "%timeit pairwise_fortran(XF)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "100 loops, best of 3: 10.8 ms per loop\n" ] } ], "prompt_number": 25 }, { "cell_type": "code", "collapsed": false, "input": [ "XF_wide = np.asarray(X_wide, order='F')\n", "%timeit pairwise_fortran(XF_wide)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "10 loops, best of 3: 111 ms per loop\n" ] } ], "prompt_number": 26 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Scipy Pairwise Distances" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from scipy.spatial.distance import cdist\n", "%timeit cdist(X, X)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "100 loops, best of 3: 7.37 ms per loop\n" ] } ], "prompt_number": 27 }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit cdist(X_wide, X_wide)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "10 loops, best of 3: 97.6 ms per loop\n" ] } ], "prompt_number": 28 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Scikit-learn Pairwise Distances" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from sklearn.metrics import euclidean_distances\n", "%timeit euclidean_distances(X, X)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "100 loops, best of 3: 16.2 ms per loop\n" ] } ], "prompt_number": 29 }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit euclidean_distances(X_wide, X_wide)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "10 loops, best of 3: 22.4 ms per loop\n" ] } ], "prompt_number": 30 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Remarks and analysis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- This was run on a macbook air 2012 2Ghz Core i7 with the default system blas implementation (no MKL) for numpy\n", "- Some of the timings vary quite a lot from Jake's original post.\n", "- Numba seems to be able to go twice faster than Parakeet when `n_features` is small (e.g. 3 in Jake's original setting)\n", "- Numba fails to optimize the python version that uses the numpy notation to compute distances on pairs of rows\n", "- Maybe calling numba `nopython=True` would catch this but I did not understand how to add this option and make the first example work so I am not sure how to use that option correctly \n", "- Parakeet is almost as fast as numba when `n_features` grows to more realistic sizes (e.g. 100)\n", "- Parakeet can work as efficiently with the numpy row slice expression without any issue which allow for a more natural and concise syntax.\n", "- Blas (as used in the scikit-learn implementation) is still a killer as soon as all the dimensions are not small (note: the scikit-learn implementation can be less numerically stable though)" ] } ], "metadata": {} } ] }