{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "> This is one of the 100 recipes of the [IPython Cookbook](http://ipython-books.github.io/), the definitive guide to high-performance scientific computing and data science in Python.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 4.8. Making efficient selections in arrays with NumPy" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Return the memory address of an array's data buffer.\n", "# Note: this name shadows Python's built-in id() for the rest of the notebook.\n", "id = lambda x: x.__array_interface__['data'][0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We create a large array." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "n, d = 100000, 100" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "a = np.random.random_sample((n, d)); aid = id(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Array views and fancy indexing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We select every tenth row using two different methods: a view (basic slicing) and fancy indexing." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "b1 = a[::10]\n", "b2 = a[np.arange(0, n, 10)]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "np.array_equal(b1, b2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The view refers to the original data buffer, whereas fancy indexing yields a copy." 
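, "\n",
"\n",
"The same distinction can also be checked with `np.shares_memory` and by writing to each selection. This is a minimal sketch on a small array; the names `view` and `copy` and the array size are illustrative and not part of the recipe.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Small array so the example runs instantly (the recipe uses a much larger one).\n",
"a = np.arange(20.0).reshape(10, 2)\n",
"\n",
"view = a[::5]                    # basic slicing: a strided view, no data copied\n",
"copy = a[np.arange(0, 10, 5)]    # fancy indexing: a new array with its own buffer\n",
"\n",
"# np.shares_memory reports whether two arrays overlap in memory.\n",
"view_shares = np.shares_memory(a, view)\n",
"copy_shares = np.shares_memory(a, copy)\n",
"\n",
"# Writing through the view changes the original; writing to the copy does not.\n",
"view[0, 0] = -1.0\n",
"copy[0, 1] = -2.0\n",
"print(view_shares, copy_shares)\n",
"print(a[0])\n",
"```\n",
"\n",
"After running it, `view_shares` is `True`, `copy_shares` is `False`, and only the write made through the view is visible in `a`."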
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "id(b1) == aid, id(b2) == aid" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fancy indexing is several orders of magnitude slower here because it copies a large array, whereas creating a view copies no data. In exchange, fancy indexing is more general: it can select any subset of an array (using any list of indices), not just a strided selection." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%timeit a[::10]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%timeit a[np.arange(0, n, 10)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Alternatives to fancy indexing: list of indices" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Given a list of indices, there are two ways of selecting the corresponding sub-array: fancy indexing, or the `np.take` function." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "i = np.arange(0, n, 10)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "b1 = a[i]\n", "b2 = np.take(a, i, axis=0)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "np.array_equal(b1, b2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%timeit a[i]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%timeit np.take(a, i, axis=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using `np.take` instead of fancy indexing is faster." 
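, "\n",
"\n",
"Outside IPython, where `%timeit` is not available, the comparison can be reproduced with the standard `timeit` module. This is a minimal sketch, assuming a smaller array than the recipe's, an arbitrary seed, and an arbitrary repeat count; the names `fancy`, `taken`, `t_fancy` and `t_take` are illustrative.\n",
"\n",
"```python\n",
"import timeit\n",
"\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(0)      # arbitrary seed, for reproducibility only\n",
"a = rng.random((10000, 100))        # smaller than the recipe's array\n",
"i = np.arange(0, a.shape[0], 10)    # row indices to select\n",
"\n",
"fancy = a[i]                        # fancy indexing along the first axis\n",
"taken = np.take(a, i, axis=0)       # same selection with np.take\n",
"same = np.array_equal(fancy, taken)\n",
"\n",
"# Total time for 200 calls of each selection, in seconds.\n",
"t_fancy = timeit.timeit(lambda: a[i], number=200)\n",
"t_take = timeit.timeit(lambda: np.take(a, i, axis=0), number=200)\n",
"print(same, t_fancy, t_take)\n",
"```\n",
"\n",
"The absolute timings, and possibly which method comes out ahead, depend on the machine and on the NumPy version."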
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note**: Performance of fancy indexing has been improved in recent versions of NumPy; this trick is especially useful on older versions of NumPy." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Alternatives to fancy indexing: mask of booleans" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create a mask of booleans, where each value indicates whether the corresponding row of `a` should be selected." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "i = np.random.random_sample(n) < .5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The selection can be made using fancy indexing or the `np.compress` function." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "b1 = a[i]\n", "b2 = np.compress(i, a, axis=0)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "np.array_equal(b1, b2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%timeit a[i]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%timeit np.compress(i, a, axis=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once again, the alternative to fancy indexing is faster." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).\n", "\n", "> [IPython Cookbook](http://ipython-books.github.io/), by [Cyrille Rossant](http://cyrille.rossant.net), Packt Publishing, 2014 (500 pages)." 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.2" } }, "nbformat": 4, "nbformat_minor": 0 }