{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"numba_intro.ipynb","version":"0.3.2","provenance":[{"file_id":"https://github.com/cbernet/maldives/blob/master/numba/numba_intro.ipynb","timestamp":1561464888268}],"collapsed_sections":[]},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.7.3"},"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"accelerator":"GPU"},"cells":[{"cell_type":"markdown","metadata":{"colab_type":"text","id":"oi5Qc67EaCuA"},"source":["## Python decorators\n","\n","With numba, the compilation of a python function is triggered by a decorator. If you already know what's a decorator, you can just skip to next section. Otherwise, please read on. \n","\n","A python decorator is a function that takes another function as input, modifies it, and returns the modified function to the user. \n","\n","I realize that this sentence sounds tricky, but it's not. We'll go over a few examples of decorators and everything will become clear! \n","\n","Before we get started, it's important to realize that in python, everything is an object. Functions are objects, and classes are objects too. For instance, let's take this simple function: "]},{"cell_type":"code","metadata":{"colab_type":"code","id":"3eH6kWTGd9KX","colab":{}},"source":["def hello():\n"," print('Hello world')"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"colab_type":"text","id":"q2hPgixweL7D"},"source":["`hello` is a function object, so we can pass it to another function like this one: "]},{"cell_type":"code","metadata":{"colab_type":"code","id":"ieTemKPNfZLM","colab":{}},"source":["def make_sure(func): \n"," def wrapper():\n"," while 1:\n"," res = input('are you sure you want to greet the world? [y/n]')\n"," if res=='n':\n"," return\n"," elif res=='y':\n"," func()\n"," return\n"," return wrapper "],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"colab_type":"text","id":"UCDcnKG3iZxC"},"source":["This is a decorator! `make_sure` takes an input function and returns a new function that wraps the input function. \n","\n","Below, we decorate the function `hello`, and `whello` is the decorated function: "]},{"cell_type":"code","metadata":{"colab_type":"code","executionInfo":{"status":"ok","timestamp":1562149326038,"user_tz":-120,"elapsed":2501,"user":{"displayName":"Colin Bernet","photoUrl":"https://lh5.googleusercontent.com/-PT0Y40VvMq0/AAAAAAAAAAI/AAAAAAAAFEI/VsJO93FEElA/s64/photo.jpg","userId":"10813031011844134122"}},"id":"QDpLTBVfe_CB","outputId":"60529bfd-6040-4108-8631-751903723148","colab":{"base_uri":"https://localhost:8080/","height":51}},"source":["whello = make_sure(hello)\n","whello()\n"],"execution_count":3,"outputs":[{"output_type":"stream","text":["are you sure you want to greet the world? [y/n]y\n","Hello world\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"colab_type":"text","id":"1MmtXXvzi_5H"},"source":["Of course, we can use the make_sure decorator on any function that has the same signature as `func` (can work without arguments, and no need to retrieve the output). \n","\n","We know enough about decorators to use numba. still, just one word about the syntax. We can also decorate a function in this way: "]},{"cell_type":"code","metadata":{"colab_type":"code","executionInfo":{"status":"ok","timestamp":1562149334821,"user_tz":-120,"elapsed":2365,"user":{"displayName":"Colin Bernet","photoUrl":"https://lh5.googleusercontent.com/-PT0Y40VvMq0/AAAAAAAAAAI/AAAAAAAAFEI/VsJO93FEElA/s64/photo.jpg","userId":"10813031011844134122"}},"id":"TfleCn-mjd5Q","outputId":"4cbf3e70-3e72-4283-fc4c-3b4226c31cdf","colab":{"base_uri":"https://localhost:8080/","height":51}},"source":["@make_sure\n","def hello():\n"," print('Hello world')\n"," \n","hello()"],"execution_count":4,"outputs":[{"output_type":"stream","text":["are you sure you want to greet the world? [y/n]y\n","Hello world\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"colab_type":"text","id":"OcykNEdcjlii"},"source":["There is really nothing mysterious about this, it's just a nice and easy syntax to decorate the function as soon as you write it. "]},{"cell_type":"markdown","metadata":{"colab_type":"text","id":"UlTYGbG47RLA"},"source":["## just-in-time compilation with numba\n"]},{"cell_type":"markdown","metadata":{"colab_type":"text","id":"016H-Bz54Fa0"},"source":["Numba is able to compile python code into bytecode optimized for your machine, with the help of LLVM. You don't really need to know what LLVM is to follow this tutorial, but here is a [nice introduction to LLVM](https://www.infoworld.com/article/3247799/what-is-llvm-the-power-behind-swift-rust-clang-and-more.html) in case you're interested. \n","\n","In this section, we will see how to do that."]},{"cell_type":"markdown","metadata":{"colab_type":"text","id":"LBjJhsU3-8I-"},"source":["Here is a function that can take a bit of time. This function takes a list of numbers, and returns the standard deviation of these numbers. "]},{"cell_type":"code","metadata":{"colab_type":"code","id":"GnPGlxxL2ROc","colab":{}},"source":["import math\n","\n","def std(xs):\n"," # compute the mean\n"," mean = 0\n"," for x in xs: \n"," mean += x\n"," mean /= len(xs)\n"," # compute the variance\n"," ms = 0\n"," for x in xs:\n"," ms += (x-mean)**2\n"," variance = ms / len(xs)\n"," std = math.sqrt(variance)\n"," return std"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"colab_type":"text","id":"PZUXpAVS_gbg"},"source":["As we can see in the code, we need to loop twice on the sample of numbers: first to compute the mean, and then to compute the variance, which is the square of the standard deviation. \n","\n","Obviously, the more numbers in the sample, the more time the function will take to complete. Let's start with 10 million numbers, drawn from a Gaussian distribution of unit standard deviation:"]},{"cell_type":"code","metadata":{"colab_type":"code","id":"IhM4eV5r26fr","colab":{}},"source":["import numpy as np\n","a = np.random.normal(0, 1, 10000000)"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"colab_type":"code","executionInfo":{"status":"ok","timestamp":1562149356352,"user_tz":-120,"elapsed":5587,"user":{"displayName":"Colin Bernet","photoUrl":"https://lh5.googleusercontent.com/-PT0Y40VvMq0/AAAAAAAAAAI/AAAAAAAAFEI/VsJO93FEElA/s64/photo.jpg","userId":"10813031011844134122"}},"id":"VjG27qOP2_fT","outputId":"f9b0443c-c07f-4f27-df97-e425d5733f52","colab":{"base_uri":"https://localhost:8080/","height":34}},"source":["std(a)"],"execution_count":7,"outputs":[{"output_type":"execute_result","data":{"text/plain":["0.9998845294678804"]},"metadata":{"tags":[]},"execution_count":7}]},{"cell_type":"markdown","metadata":{"colab_type":"text","id":"klnf87d28pNY"},"source":["The function takes a couple seconds to compute the standard deviation of the sample.\n","\n","Now, let's import the njit decorator from numba, and decorate our std function to create a new function:"]},{"cell_type":"code","metadata":{"colab_type":"code","id":"kel6N3zs3j3y","colab":{}},"source":["from numba import njit\n","c_std = njit(std)"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"colab_type":"code","executionInfo":{"status":"ok","timestamp":1562149360544,"user_tz":-120,"elapsed":747,"user":{"displayName":"Colin Bernet","photoUrl":"https://lh5.googleusercontent.com/-PT0Y40VvMq0/AAAAAAAAAAI/AAAAAAAAFEI/VsJO93FEElA/s64/photo.jpg","userId":"10813031011844134122"}},"id":"xxXWnK5a34dU","outputId":"06a6ccaa-f527-4144-87f6-af98ae5d0343","colab":{"base_uri":"https://localhost:8080/","height":34}},"source":["c_std(a)"],"execution_count":9,"outputs":[{"output_type":"execute_result","data":{"text/plain":["0.9998845294678804"]},"metadata":{"tags":[]},"execution_count":9}]},{"cell_type":"markdown","metadata":{"colab_type":"text","id":"bWQBWhJuBKBd"},"source":["The performance improvement might not seem striking, maybe due to some overhead related with interpreting the code in the notebook. Also, please keep in mind that the first time the function is called, numba will need to compile the function, which takes a bit of time. \n","\n","But we can quantify the improvement using the timeit magic function, first for the interpreted version of the std function, and then for the compiled version: "]},{"cell_type":"code","metadata":{"colab_type":"code","executionInfo":{"status":"ok","timestamp":1562149381134,"user_tz":-120,"elapsed":19716,"user":{"displayName":"Colin Bernet","photoUrl":"https://lh5.googleusercontent.com/-PT0Y40VvMq0/AAAAAAAAAAI/AAAAAAAAFEI/VsJO93FEElA/s64/photo.jpg","userId":"10813031011844134122"}},"id":"iwD_zYJ337j7","outputId":"18f32748-fcf9-46db-8e18-f3229874fee2","colab":{"base_uri":"https://localhost:8080/","height":34}},"source":["%timeit std(a)"],"execution_count":10,"outputs":[{"output_type":"stream","text":["1 loop, best of 3: 4.77 s per loop\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab_type":"code","executionInfo":{"status":"ok","timestamp":1562149382372,"user_tz":-120,"elapsed":19842,"user":{"displayName":"Colin Bernet","photoUrl":"https://lh5.googleusercontent.com/-PT0Y40VvMq0/AAAAAAAAAAI/AAAAAAAAFEI/VsJO93FEElA/s64/photo.jpg","userId":"10813031011844134122"}},"id":"7OhmC6oX5EJc","outputId":"da394aff-ff2b-429f-aa90-fcd902d96377","colab":{"base_uri":"https://localhost:8080/","height":34}},"source":["%timeit c_std(a)"],"execution_count":11,"outputs":[{"output_type":"stream","text":["10 loops, best of 3: 25.3 ms per loop\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"colab_type":"text","id":"cL37AUon7dFn"},"source":["The compiled function is 100 times faster! \n","\n","But obviously we did not have to go into such trouble to compute the standard deviation of our array. For that, we can simply use numpy: "]},{"cell_type":"code","metadata":{"colab_type":"code","executionInfo":{"status":"ok","timestamp":1562149382373,"user_tz":-120,"elapsed":17997,"user":{"displayName":"Colin Bernet","photoUrl":"https://lh5.googleusercontent.com/-PT0Y40VvMq0/AAAAAAAAAAI/AAAAAAAAFEI/VsJO93FEElA/s64/photo.jpg","userId":"10813031011844134122"}},"id":"h80HBVB45Gab","outputId":"68912967-c99d-4b23-fbcd-070388619666","colab":{"base_uri":"https://localhost:8080/","height":34}},"source":["a.std()"],"execution_count":12,"outputs":[{"output_type":"execute_result","data":{"text/plain":["0.9998845294678591"]},"metadata":{"tags":[]},"execution_count":12}]},{"cell_type":"code","metadata":{"colab_type":"code","executionInfo":{"status":"ok","timestamp":1562149384900,"user_tz":-120,"elapsed":19728,"user":{"displayName":"Colin Bernet","photoUrl":"https://lh5.googleusercontent.com/-PT0Y40VvMq0/AAAAAAAAAAI/AAAAAAAAFEI/VsJO93FEElA/s64/photo.jpg","userId":"10813031011844134122"}},"id":"Swne0D077yhU","outputId":"386d16b4-8148-4b9d-ff8f-e50f240fbf03","colab":{"base_uri":"https://localhost:8080/","height":34}},"source":["%timeit a.std()"],"execution_count":13,"outputs":[{"output_type":"stream","text":["10 loops, best of 3: 63.5 ms per loop\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"colab_type":"text","id":"zbpITZ4XCIKW"},"source":["We see that numba is even faster than numpy in this particular case, and we will see below that it is much more flexible. "]},{"cell_type":"markdown","metadata":{"colab_type":"text","id":"ojynZctYCsH3"},"source":["## Calculation of pi\n","\n","the number pi can be estimated with a very elegant Monte Carlo method. \n","\n","Just consider a square of side L=2, centred on (0,0). In this square, we fit a circle of radius R=1, as shown in the figure below. \n","\n","![](https://raw.githubusercontent.com/cbernet/maldives/master/numba/pi_mc.png)"]},{"cell_type":"markdown","metadata":{"colab_type":"text","id":"qj3SuK5rASUt"},"source":["\n","The ratio of the circle area to the square area is \n","\n","$$r = \\frac{A_c}{A_s} = \\frac{\\pi R^2}{L^2} = \\pi / 4$$\n","\n","so \n","\n","$$\\pi = 4 r$$\n","\n","So if we can estimate this ratio, we can estimate pi! \n","\n","And to estimate this ratio, we will simply shoot a large number of points in the square, following a uniform probability distribution. The fraction of the points falling in the circle is an estimator of r. \n","\n","Obviously, the more points, the more precise this estimator will be, and the more decimals of pi can be computed. \n","\n","Let's implement this method, and use it with an increasing number of points to see how the precision improves. \n"]},{"cell_type":"code","metadata":{"colab_type":"code","id":"ON3BJEkiGEJs","colab":{}},"source":["import random \n","\n","def pi(npoints): \n"," n_in_circle = 0 \n"," for i in range(npoints):\n"," x = random.random()\n"," y = random.random()\n"," if (x**2+y**2 < 1):\n"," n_in_circle += 1\n"," return 4*n_in_circle / npoints"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"colab_type":"code","executionInfo":{"status":"ok","timestamp":1562149414384,"user_tz":-120,"elapsed":4317,"user":{"displayName":"Colin Bernet","photoUrl":"https://lh5.googleusercontent.com/-PT0Y40VvMq0/AAAAAAAAAAI/AAAAAAAAFEI/VsJO93FEElA/s64/photo.jpg","userId":"10813031011844134122"}},"id":"qGFNc_sEGsUa","outputId":"c5cbdb5b-0465-4632-c280-1bd987f20afa","colab":{"base_uri":"https://localhost:8080/","height":85}},"source":["npoints = [10, 100, 10000, int(10e6)]\n","for number in npoints:\n"," print(pi(number))\n"],"execution_count":15,"outputs":[{"output_type":"stream","text":["2.8\n","3.2\n","3.126\n","3.1405876\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"colab_type":"text","id":"IYKaWi-GHDvG"},"source":["As you can see, even with N=10 million points, the precision is not great. More specifically, the relative uncertainty on pi can be calculated as \n","\n","$$\\delta = 1/ \\sqrt{N}$$\n","\n","Here is how the uncertainty evolves with the number of points: "]},{"cell_type":"code","metadata":{"colab_type":"code","executionInfo":{"status":"ok","timestamp":1562149414385,"user_tz":-120,"elapsed":2569,"user":{"displayName":"Colin Bernet","photoUrl":"https://lh5.googleusercontent.com/-PT0Y40VvMq0/AAAAAAAAAAI/AAAAAAAAFEI/VsJO93FEElA/s64/photo.jpg","userId":"10813031011844134122"}},"id":"_inYC2KBH4tc","outputId":"f81c7d82-d16c-4731-d0b7-f9a6c3a9ddc0","colab":{"base_uri":"https://localhost:8080/","height":85}},"source":["import math\n","# defining the uncertainty function \n","# with a lambda construct\n","uncertainty = lambda x: 1/math.sqrt(x)\n","for number in npoints:\n"," print('npoints', number, 'delta:', uncertainty(number))"],"execution_count":16,"outputs":[{"output_type":"stream","text":["npoints 10 delta: 0.31622776601683794\n","npoints 100 delta: 0.1\n","npoints 10000 delta: 0.01\n","npoints 10000000 delta: 0.00031622776601683794\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"colab_type":"text","id":"ddsXvcwmIv1t"},"source":["Clearly, we'll need a lot of points. How fast is our code? "]},{"cell_type":"code","metadata":{"colab_type":"code","executionInfo":{"status":"ok","timestamp":1562149428648,"user_tz":-120,"elapsed":14806,"user":{"displayName":"Colin Bernet","photoUrl":"https://lh5.googleusercontent.com/-PT0Y40VvMq0/AAAAAAAAAAI/AAAAAAAAFEI/VsJO93FEElA/s64/photo.jpg","userId":"10813031011844134122"}},"id":"K2coSx-YI5ol","outputId":"51746db3-701f-4012-db6f-aebb8a401e46","colab":{"base_uri":"https://localhost:8080/","height":34}},"source":["%timeit pi(10000000)"],"execution_count":17,"outputs":[{"output_type":"stream","text":["1 loop, best of 3: 3.54 s per loop\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"colab_type":"text","id":"2GJsoGrFJLHK"},"source":["A few seconds for 10 million points. This algorithm is O(N), so if we want to use 1 **billion** points, it will take us between 5 and 10 minutes . We don't have that much time, so let's use numba! "]},{"cell_type":"code","metadata":{"colab_type":"code","id":"8RNB6NbPJr9L","colab":{}},"source":["@njit\n","def fast_pi(npoints): \n"," n_in_circle = 0 \n"," for i in range(npoints):\n"," x = random.random()\n"," y = random.random()\n"," if (x**2+y**2 < 1):\n"," n_in_circle += 1\n"," return 4*n_in_circle / npoints"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"colab_type":"code","executionInfo":{"status":"ok","timestamp":1562149441443,"user_tz":-120,"elapsed":23997,"user":{"displayName":"Colin Bernet","photoUrl":"https://lh5.googleusercontent.com/-PT0Y40VvMq0/AAAAAAAAAAI/AAAAAAAAFEI/VsJO93FEElA/s64/photo.jpg","userId":"10813031011844134122"}},"id":"Kmw3N-HgJytS","outputId":"267a7d1c-bbc2-443c-c8ac-62368ada4c49","colab":{"base_uri":"https://localhost:8080/","height":34}},"source":["fast_pi( int(1e9) )"],"execution_count":19,"outputs":[{"output_type":"execute_result","data":{"text/plain":["3.141602888"]},"metadata":{"tags":[]},"execution_count":19}]},{"cell_type":"markdown","metadata":{"colab_type":"text","id":"GTd4sWAqMjrO"},"source":["This took about 10 s, instead of 7 minutes!"]},{"cell_type":"markdown","metadata":{"colab_type":"text","id":"oevjT7CP5KwW"},"source":["## A more involved example: Finding the closest two points"]},{"cell_type":"markdown","metadata":{"colab_type":"text","id":"EoQ10qzv8ug2"},"source":["Numpy features an efficient implementation for most array operations. Indeed, you can use numpy to map any function to all elements of an array, or to perform element-wise operations on several arrays. \n","\n","I would say: If numpy can do it, just go for it. \n","\n","But sometimes, you'll come up with an expensive algorithm that cannot easily be implemented with numpy. For instance, let's consider the following function, which takes an array of 2D points, and looks for the closest two points."]},{"cell_type":"code","metadata":{"colab_type":"code","id":"5iG7EBeK70YE","colab":{}},"source":["import sys\n","def closest(points):\n"," '''Find the two closest points in an array of points in 2D. \n"," Returns the two points, and the distance between them'''\n"," \n"," # we will search for the two points with a minimal\n"," # square distance. \n"," # we use the square distance instead of the distance\n"," # to avoid a square root calculation for each pair of \n"," # points\n"," \n"," mindist2 = 999999.\n"," mdp1, mdp2 = None, None\n"," for i in range(len(points)):\n"," p1 = points[i]\n"," x1, y1 = p1\n"," for j in range(i+1, len(points)):\n"," p2 = points[j]\n"," x2, y2 = p2\n"," dist2 = (x1-x2)**2 + (y1-y2)**2\n"," if dist2 < mindist2:\n"," # squared distance is improved, \n"," # keep it, as well as the two \n"," # corresponding points\n"," mindist2 = dist2\n"," mdp1,mdp2 = p1,p2\n"," return mdp1, mdp2, math.sqrt(mindist2)"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"colab_type":"text","id":"qJjt9SPP2Eyg"},"source":["You might be thinking that this algorithm is quite naive, and it's true! I wrote it like this on purpose. \n","\n","You can see that there is a double loop in this algorithm. So if we have N points, we would have to test NxN pairs of points. That's why we say that this algorithm has a complexity of order NxN, denoted O(NxN). \n","\n","To improve the situation a bit, please note that the distance between point i and point j is the same as the distance between point j and point i! \n","So there is no need to check this combination twice. Also, the distance between point i and itself is zero, and should not be tested...That's why I started the inner loop at i+1. So the combinations that are tested are: \n","\n","* (0,1), (0,2), ... (0, N)\n","* (1,2), (1,3), ... (1, N)\n","* ...\n","\n","Another thing to note is that I'm doing all I can to limit the amount of computing power needed for each pair. That's why I decided to minimize the square distance instead of the distance itself, which saves us a call to math.sqrt for every pair of points. \n","\n","Still, the algorithm remains O(NxN). \n","\n","Let's first run this algorithm on a small sample of 10 points, just to check that it works correctly. "]},{"cell_type":"code","metadata":{"colab_type":"code","executionInfo":{"status":"ok","timestamp":1562149449306,"user_tz":-120,"elapsed":508,"user":{"displayName":"Colin Bernet","photoUrl":"https://lh5.googleusercontent.com/-PT0Y40VvMq0/AAAAAAAAAAI/AAAAAAAAFEI/VsJO93FEElA/s64/photo.jpg","userId":"10813031011844134122"}},"id":"lkS5dbWv_TJA","outputId":"c0b50a27-6f71-42b8-8ac5-83694925bd5a","colab":{"base_uri":"https://localhost:8080/","height":238}},"source":["points = np.random.uniform((-1,-1), (1,1), (10,2))\n","print(points)\n","closest(points)"],"execution_count":21,"outputs":[{"output_type":"stream","text":["[[ 0.9484922 0.02577868]\n"," [-0.23119727 0.45611756]\n"," [ 0.02456776 0.88077104]\n"," [ 0.90098115 0.87950187]\n"," [ 0.93403365 -0.54149292]\n"," [-0.2723318 -0.26327452]\n"," [-0.34920523 0.58595109]\n"," [-0.61427508 -0.98697372]\n"," [-0.84107879 0.23247338]\n"," [-0.03779095 0.86567126]]\n"],"name":"stdout"},{"output_type":"execute_result","data":{"text/plain":["(array([0.02456776, 0.88077104]),\n"," array([-0.03779095, 0.86567126]),\n"," 0.06416082286764821)"]},"metadata":{"tags":[]},"execution_count":21}]},{"cell_type":"markdown","metadata":{"colab_type":"text","id":"QF-j2ddw8vSH"},"source":["Ok, this looks right, the two points indeed appear to be quite close. Let's see how fast is the calculation:"]},{"cell_type":"code","metadata":{"colab_type":"code","executionInfo":{"status":"ok","timestamp":1562149458145,"user_tz":-120,"elapsed":4470,"user":{"displayName":"Colin Bernet","photoUrl":"https://lh5.googleusercontent.com/-PT0Y40VvMq0/AAAAAAAAAAI/AAAAAAAAFEI/VsJO93FEElA/s64/photo.jpg","userId":"10813031011844134122"}},"id":"gnOqlYxkAnPE","outputId":"9a5f61f1-df9f-4631-d102-333a5e731241","colab":{"base_uri":"https://localhost:8080/","height":34}},"source":["%timeit closest(points)"],"execution_count":22,"outputs":[{"output_type":"stream","text":["10000 loops, best of 3: 90.5 µs per loop\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"colab_type":"text","id":"v0Dp0UJZ2_-7"},"source":["Now, let's increase a bit the number of points in the sample. You will see that the calculation will be much slower. "]},{"cell_type":"code","metadata":{"colab_type":"code","executionInfo":{"status":"ok","timestamp":1562149465621,"user_tz":-120,"elapsed":4174,"user":{"displayName":"Colin Bernet","photoUrl":"https://lh5.googleusercontent.com/-PT0Y40VvMq0/AAAAAAAAAAI/AAAAAAAAFEI/VsJO93FEElA/s64/photo.jpg","userId":"10813031011844134122"}},"id":"JJUyabLEApr0","outputId":"636aff06-68e4-43af-9f45-9eaf1e757b4b","colab":{"base_uri":"https://localhost:8080/","height":68}},"source":["points = np.random.uniform((-1,-1), (1,1), (2000,2))\n","closest(points)"],"execution_count":23,"outputs":[{"output_type":"execute_result","data":{"text/plain":["(array([-0.03506396, -0.50458327]),\n"," array([-0.0355305, -0.5046914]),\n"," 0.0004789123574545794)"]},"metadata":{"tags":[]},"execution_count":23}]},{"cell_type":"code","metadata":{"colab_type":"code","executionInfo":{"status":"ok","timestamp":1562149479332,"user_tz":-120,"elapsed":15730,"user":{"displayName":"Colin Bernet","photoUrl":"https://lh5.googleusercontent.com/-PT0Y40VvMq0/AAAAAAAAAAI/AAAAAAAAFEI/VsJO93FEElA/s64/photo.jpg","userId":"10813031011844134122"}},"id":"HkIm_2BgA2sM","outputId":"ca54a6c7-beeb-450b-d92d-9fab8185d922","colab":{"base_uri":"https://localhost:8080/","height":34}},"source":["%timeit closest(points)"],"execution_count":24,"outputs":[{"output_type":"stream","text":["1 loop, best of 3: 3.41 s per loop\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"colab_type":"text","id":"ZOwo4aRj3Ic5"},"source":["Since our algorithm is O(NxN), if we go from 10 to 2,000 points, the algorithm will be 200x200 = 40,000 times slower. \n","\n","And this is what we have measured: 2.97 / 78e-6 = 38,000 (the exact numbers may vary).\n","\n","Now let's try and speed this up with numba's just in time compilation: "]},{"cell_type":"code","metadata":{"colab_type":"code","executionInfo":{"status":"error","timestamp":1562149479532,"user_tz":-120,"elapsed":9510,"user":{"displayName":"Colin Bernet","photoUrl":"https://lh5.googleusercontent.com/-PT0Y40VvMq0/AAAAAAAAAAI/AAAAAAAAFEI/VsJO93FEElA/s64/photo.jpg","userId":"10813031011844134122"}},"id":"GSBhk2nlA4Zd","outputId":"d07e4955-5295-4124-ec5d-1ab61cfcfada","colab":{"base_uri":"https://localhost:8080/","height":266}},"source":["c_closest = njit(closest)\n","c_closest(points)"],"execution_count":25,"outputs":[{"output_type":"error","ename":"SystemError","evalue":"ignored","traceback":["\u001b[0;31m---------------------------------------------------------------------------\u001b[0m","\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)","\u001b[0;31mTypeError\u001b[0m: cannot convert native ?array(float64, 1d, C) to Python object","\nThe above exception was the direct cause of the following exception:\n","\u001b[0;31mSystemError\u001b[0m Traceback (most recent call last)","\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mc_closest\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnjit\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mclosest\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mc_closest\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpoints\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m","\u001b[0;31mSystemError\u001b[0m: CPUDispatcher() returned a result with an error set"]}]},{"cell_type":"markdown","metadata":{"colab_type":"text","id":"tba6m2CY-V-s"},"source":["It does not work! But no reason to panic. The important message is :\n","\n","```\n","TypeError: cannot convert native ?array(float64, 1d, C) to Python object\n","```\n","\n","That's a bit cryptic, but one can always google error messages. What this one means is that numba has trouble converting types between the compiled function and python. \n","\n","To solve this issue, we will use numba's just in time compiler to specify the input and output types. In the example below, we specify that the input is a 2D array containing float64 numbers, and that the output is a tuple with two float64 1D arrays (the two points), and one float64, the distance between these points. \n","\n","The nopython argument instructs numba to compile the whole function. Actually, this is the recommended practice for maximum performance, and so far, we have used the njit decorator, which is equivalent to\n","\n","```python\n","@jit(nopython=True)\n","```"]},{"cell_type":"code","metadata":{"colab_type":"code","id":"h_uvtqgkCwSt","colab":{}},"source":["from numba import jit\n","@jit('Tuple((float64[:], float64[:], float64))(float64[:,:])', \n"," nopython=True)\n","def c_closest(points):\n"," mindist2 = 999999.\n"," mdp1, mdp2 = None, None\n"," for i in range(len(points)):\n"," p1 = points[i]\n"," x1, y1 = p1\n"," for j in range(i+1, len(points)): \n"," p2 = points[j]\n"," x2, y2 = p2\n"," dist2 = (x1-x2)**2 + (y1-y2)**2\n"," if dist2 < mindist2: \n"," mindist2 = dist2\n"," mdp1 = p1\n"," mdp2 = p2\n"," return mdp1, mdp2, math.sqrt(mindist2)"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"colab_type":"code","executionInfo":{"elapsed":848,"status":"ok","timestamp":1561476465418,"user":{"displayName":"Colin Bernet","photoUrl":"https://lh5.googleusercontent.com/-PT0Y40VvMq0/AAAAAAAAAAI/AAAAAAAAFEI/VsJO93FEElA/s64/photo.jpg","userId":"10813031011844134122"},"user_tz":-120},"id":"5b5vaFXbEGCe","outputId":"e4bcb82c-aebf-41df-9309-3d2e14c334cc","colab":{"base_uri":"https://localhost:8080/","height":68}},"source":["c_closest(points)"],"execution_count":0,"outputs":[{"output_type":"execute_result","data":{"text/plain":["(array([0.34985315, 0.29602749]),\n"," array([0.34939527, 0.29609651]),\n"," 0.00046305826994962995)"]},"metadata":{"tags":[]},"execution_count":28}]},{"cell_type":"code","metadata":{"colab_type":"code","executionInfo":{"elapsed":12261,"status":"ok","timestamp":1561476479015,"user":{"displayName":"Colin Bernet","photoUrl":"https://lh5.googleusercontent.com/-PT0Y40VvMq0/AAAAAAAAAAI/AAAAAAAAFEI/VsJO93FEElA/s64/photo.jpg","userId":"10813031011844134122"},"user_tz":-120},"id":"xaWTpO4dEIP1","outputId":"506cf6b8-0638-4886-84c5-69ae7f555052","colab":{"base_uri":"https://localhost:8080/","height":34}},"source":["%timeit closest(points)"],"execution_count":0,"outputs":[{"output_type":"stream","text":["2.53 s ± 48.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab_type":"code","executionInfo":{"elapsed":12341,"status":"ok","timestamp":1561476480612,"user":{"displayName":"Colin Bernet","photoUrl":"https://lh5.googleusercontent.com/-PT0Y40VvMq0/AAAAAAAAAAI/AAAAAAAAFEI/VsJO93FEElA/s64/photo.jpg","userId":"10813031011844134122"},"user_tz":-120},"id":"ohpaXF80LCE0","outputId":"b2daf05f-e968-4107-c191-21e715316c34","colab":{"base_uri":"https://localhost:8080/","height":34}},"source":["%timeit c_closest(points)"],"execution_count":0,"outputs":[{"output_type":"stream","text":["10 loops, best of 3: 37.1 ms per loop\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"colab_type":"text","id":"Z4uXBrNe_Kv5"},"source":["Again, the compiled code is 100 times faster! "]}]}