{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "> This is one of the 100 recipes of the [IPython Cookbook](http://ipython-books.github.io/), the definitive guide to high-performance scientific computing and data science in Python.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 4.5. Understanding the internals of NumPy to avoid unnecessary array copying" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Inspect the memory address of arrays" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def id(x):\n", "    # Return the memory address of the\n", "    # array's data buffer.\n", "    # Note: this shadows Python's built-in id().\n", "    return x.__array_interface__['data'][0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "a = np.zeros(10); aid = id(a); aid" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "b = a.copy(); id(b) == aid" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## In-place and copy operations" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "a *= 2; id(a) == aid" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "c = a * 2; id(c) == aid" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Benchmarking" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In-place operation." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%%timeit a = np.zeros(10000000)\n", "a *= 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With memory copy." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%%timeit a = np.zeros(10000000)\n", "b = a * 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reshaping an array: copy or not?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "a = np.zeros((10, 10)); aid = id(a); aid" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Reshaping an array while preserving its order does not trigger a copy." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "b = a.reshape((1, -1)); id(b) == aid" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Transposing an array changes the order in which its elements are traversed, so a subsequent reshape triggers a copy." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "c = a.T.reshape((1, -1)); id(c) == aid" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To return a flattened (1D) version of a multidimensional array, one can use `flatten` or `ravel`. The former always returns a copy, whereas the latter makes a copy only if necessary." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "d = a.flatten(); id(d) == aid" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "e = a.ravel(); id(e) == aid" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%timeit a.flatten()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%timeit a.ravel()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Broadcasting" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When performing operations on arrays with different shapes, you don't necessarily have to make copies to make their shapes match. Broadcasting rules allow you to make computations on arrays with different but compatible shapes. Two dimensions are compatible if they are equal or if one of them is 1. If the arrays have different numbers of dimensions, the shape of the array with fewer dimensions is padded with dimensions of size 1 on its leading side, and the shapes are then compared from the trailing dimensions to the leading ones." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "n = 1000" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "a = np.arange(n)\n", "ac = a[:, np.newaxis]\n", "ar = a[np.newaxis, :]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%timeit np.tile(ac, (1, n)) * np.tile(ar, (n, 1))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%timeit ar * ac" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Can you explain the performance discrepancy between the following two similar operations?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "a = np.random.rand(5000, 5000)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%timeit a[0, :].sum()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%timeit a[:, 0].sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).\n", "\n", "> [IPython Cookbook](http://ipython-books.github.io/), by [Cyrille Rossant](http://cyrille.rossant.net), Packt Publishing, 2014 (500 pages)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.2" } }, "nbformat": 4, "nbformat_minor": 0 }