---
aliases:
- /2015/03/numba-groupby-pixels
categories:
- python
- astrophysics
date: 2015-03-24 9:00
layout: post
slug: numba-groupby-pixels
title: Accelerate groupby operation on pixels with Numba
---

[Download the original IPython notebook](https://gist.github.com/zonca/37b4423e507d5dc2fc3b8b5d4db6d316)

## Astrophysics background

It is very common in Astrophysics to work with sky pixels. The sky is tessellated into patches with specific properties, and a sky map is then a collection of intensity values, one for each pixel. The most common pixelization used in Cosmology is [HEALPix](https://healpix.sourceforge.io).

Measurements from a telescope are then represented as an array of pixels, which encodes where the instrument was pointing at each timestamp, together with the array of measured values.

## Sample timeline

    import pandas as pd
    import numba
    import numpy as np

For simplicity let's assume we have a sky with 50K pixels:

    NPIX = 50000

And we have 50 million measurements from our instrument:

    NTIME = int(50 * 1e6)

The pointing of our instrument is an array of pixels, random in our sample case (`np.random.randint` excludes the upper bound, so this gives pixel indices from `0` to `NPIX - 1`):

    pixels = np.random.randint(0, NPIX, NTIME)

Our data are also random:

    timeline = np.random.randn(NTIME)

## Create a map of the sky with pandas

One of the most common operations is to sum all of our measurements into a sky map, so that the value of each pixel in the sky map is the sum of the individual measurements that fall in it. The easiest way is to use the `groupby` operation in `pandas`:

    timeline_pandas = pd.Series(timeline, index=pixels)

    timeline_pandas.head()

    46889    0.407097
    3638     1.300001
    6345     0.174931
    15742   -0.255958
    34308    1.147338
    dtype: float64

    %time m = timeline_pandas.groupby(level=0).sum()

    CPU times: user 4.09 s, sys: 471 ms, total: 4.56 s
    Wall time: 4.55 s

## Create a map of the sky with numba

We would like to improve the performance of this operation using `numba`, which can automatically produce compiled code with near-C performance from pure Python functions. First we need to develop a pure Python version of the code and test it, then we have `numba` optimize it:

    def groupby_python(index, value, output):
        for i in range(index.shape[0]):
            output[index[i]] += value[i]

    m_python = np.zeros_like(m)

    %time groupby_python(pixels, timeline, m_python)

    CPU times: user 37.5 s, sys: 0 ns, total: 37.5 s
    Wall time: 37.6 s

    np.testing.assert_allclose(m_python, m)

Pure Python is much slower than the `pandas` version, which is implemented in `cython`.

### Optimize the function with numba.jit

`numba.jit` takes an input function and creates a compiled version that does not depend on slow Python calls; this is enforced by `nopython=True` (`numba` raises an error if the function cannot be compiled in `nopython` mode):

    groupby_numba = numba.jit(groupby_python, nopython=True)

    m_numba = np.zeros_like(m)

    %time groupby_numba(pixels, timeline, m_numba)

    CPU times: user 274 ms, sys: 5 ms, total: 279 ms
    Wall time: 278 ms

    np.testing.assert_allclose(m_numba, m)

The performance improvement is more than 100x compared to pure Python and about 16x compared to `pandas`, pretty good!

## Use numba.jit as a decorator

The exact same result is obtained if we use `numba.jit` as a decorator:

    @numba.jit(nopython=True)
    def groupby_numba(index, value, output):
        for i in range(index.shape[0]):
            output[index[i]] += value[i]
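
## Aside: the same operation in plain NumPy

As an aside not covered in the original notebook, plain NumPy can also express this scatter-add without an explicit Python loop. `np.bincount` with a `weights` argument sums the measurements per pixel in compiled code, and `np.add.at` performs an unbuffered in-place add that, unlike the naive `m[pixels] += timeline`, handles repeated indices correctly. A quick sketch, reusing `pixels`, `timeline` and `NPIX` from above:

    import numpy as np

    # Per-pixel sums via bincount:
    # m_bincount[p] = sum of timeline values whose pixel index is p
    m_bincount = np.bincount(pixels, weights=timeline, minlength=NPIX)

    # Same result via an unbuffered scatter-add
    m_add_at = np.zeros(NPIX)
    np.add.at(m_add_at, pixels, timeline)

    np.testing.assert_allclose(m_bincount, m_add_at)

In my experience `np.bincount` is competitive with the compiled loop for a plain sum, while `np.add.at` tends to be noticeably slower; the `numba` version remains the most flexible if the per-sample operation ever becomes more complicated than a sum.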
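
## Putting it together as a script

To run the comparison outside a notebook, here is a minimal end-to-end sketch of the steps above as a plain Python script, using `time.perf_counter` in place of the `%time` magic. The warm-up call is there because `numba` compiles the function on its first invocation for a given type signature, so the timing would otherwise include compilation time; the exact numbers will of course vary by machine:

    import time

    import numba
    import numpy as np
    import pandas as pd

    NPIX = 50000
    NTIME = int(50 * 1e6)

    pixels = np.random.randint(0, NPIX, NTIME)
    timeline = np.random.randn(NTIME)

    @numba.jit(nopython=True)
    def groupby_numba(index, value, output):
        for i in range(index.shape[0]):
            output[index[i]] += value[i]

    # pandas groupby baseline
    start = time.perf_counter()
    m = pd.Series(timeline, index=pixels).groupby(level=0).sum()
    print("pandas: {:.2f} s".format(time.perf_counter() - start))

    # warm up: the first call triggers compilation for this type signature
    groupby_numba(pixels[:10], timeline[:10], np.zeros(NPIX))

    m_numba = np.zeros(NPIX)
    start = time.perf_counter()
    groupby_numba(pixels, timeline, m_numba)
    print("numba: {:.2f} s".format(time.perf_counter() - start))

    # with 50 million samples over 50K pixels, every pixel is hit,
    # so the pandas result covers all NPIX pixels and the arrays align
    np.testing.assert_allclose(m_numba, m)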