{ "metadata": { "name": "", "signature": "sha256:3578150c0e31a977f86ecbc4bc1024cd22aedd69363b2fe3b3388384b1a8ef7b" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# An introduction to NumPy\n", "\n", "The Python language is an excellent tool for general-purpose programming, with a highly readable syntax, rich and powerful data types (strings, lists, sets, dictionaries, arbitrary length integers, etc) and a very comprehensive standard library. It was not, however, designed specifically for mathematical and scientific computing. Neither the language nor its standard library have facilities for the efficient representation of multidimensional datasets, tools for linear algebra and general matrix manipulations (essential building blocks of virtually all technical computing), nor any data visualisation facilities.\n", "\n", "For example, Python lists are very flexible containers that can be nested arbitrarily deep and can hold any Python object in them, but they are poorly suited to represent common mathematical constructs like vectors and matrices. It is for this reason that NumPy exists. Typically NumPy is imported as `np`:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import numpy as np" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "NumPy, at its core, provides a powerful array object. Let's start by exploring how the NumPy array differs from a Python list. We start by creating a simple Python list and a NumPy array with identical contents:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "lst = [10, 20, 30, 40]\n", "arr = np.array([10, 20, 30, 40])\n", "print lst\n", "print arr" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[10, 20, 30, 40]\n", "[10 20 30 40]\n" ] } ], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Element indexing\n", "\n", "Elements of a one-dimensional array are accessed with the same syntax as a list:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print lst[0], arr[0]" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "10 10\n" ] } ], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "print lst[-1], arr[-1]" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "40 40\n" ] } ], "prompt_number": 4 }, { "cell_type": "code", "collapsed": false, "input": [ "print lst[2:], arr[2:]" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[30, 40] [30 40]\n" ] } ], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Differences between arrays and lists\n", "\n", "The first difference to note between lists and arrays is that arrays are *homogeneous*; i.e. all elements of an array must be of the same type. In contrast, lists can contain elements of arbitrary type. For example, we can change the last element in our list above to be a string:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "lst[-1] = 'a string inside a list'\n", "lst" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 6, "text": [ "[10, 20, 30, 'a string inside a list']" ] } ], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "but the same can not be done with an array, as we get an error message:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "arr[-1] = 'a string inside an array'" ], "language": "python", "metadata": {}, "outputs": [ { "ename": "ValueError", "evalue": "invalid literal for long() with base 10: 'a string inside an array'", "output_type": "pyerr", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[1;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0marr\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;33m-\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1;33m]\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;34m'a string inside an array'\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[1;31mValueError\u001b[0m: invalid literal for long() with base 10: 'a string inside an array'" ] } ], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Caveat, it can be done, but really *don't do it*; lists are generally better at non-homogeneous collections." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Array Attributes\n", "\n", "The information about the type of an array is contained in its `dtype` attribute:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "arr.dtype" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 8, "text": [ "dtype('int64')" ] } ], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once an array has been created, its `dtype` is fixed (in this case to an 8 byte/64 bit signed integer) and it can only store elements of the same type.\n", "\n", "For this example where the `dtype` is integer, if we try storing a floating point number in the array it will be automatically converted into an integer:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "arr[-1] = 1.234\n", "arr" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 9, "text": [ "array([10, 20, 30, 1])" ] } ], "prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating Arrays\n", "\n", "Above we created an array from an existing list. Now let's look into other ways in which we can create arrays, which we'll illustrate next. A common need is to have an array initialized with a constant value. Very often this value is 0 or 1 (suitable as starting value for additive and multiplicative loops respectively); `zeros` creates arrays of all zeros, with any desired dtype:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "np.zeros(5, dtype=np.float)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 10, "text": [ "array([ 0., 0., 0., 0., 0.])" ] } ], "prompt_number": 10 }, { "cell_type": "code", "collapsed": false, "input": [ "np.zeros(3, dtype=np.int)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 11, "text": [ "array([0, 0, 0])" ] } ], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "and similarly for `ones`:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print '5 ones:', np.ones(5, dtype=np.int)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "5 ones: [1 1 1 1 1]\n" ] } ], "prompt_number": 12 }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we want an array initialized with an arbitrary value, we can create an empty array and then use the fill method to put the value we want into the array:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "a = np.empty(4, dtype=np.float)\n", "a.fill(5.5)\n", "a" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 13, "text": [ "array([ 5.5, 5.5, 5.5, 5.5])" ] } ], "prompt_number": 13 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatives, such as:\n", "\n", " * ``np.ones(4) * 5.5``\n", " * ``np.zeros(4) + 5.5``\n", "\n", "are generally less efficient, but are also reasonable. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data types\n", "\n", "NumPy comes with most of the common data types (and some uncommon ones too).\n", "\n", "The most used (and portable) dtypes are:\n", "\n", "* bool\n", "* uint8\n", "* int (machine dependent)\n", "* int8\n", "* int32\n", "* int64\n", "* float (machine dependent)\n", "* float32\n", "* float64\n", "\n", "Full details can be found at http://docs.scipy.org/doc/numpy/user/basics.types.html." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Question: What are the limits of the common NumPy integer types?" ] }, { "cell_type": "code", "collapsed": false, "input": [ "np.array(256, dtype=np.uint8)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 14, "text": [ "array(0, dtype=uint8)" ] } ], "prompt_number": 14 }, { "cell_type": "code", "collapsed": false, "input": [ "float_info = ('{finfo.dtype}: max={finfo.max:<18}, '\n", " 'approx decimal precision={finfo.precision};')\n", "print float_info.format(finfo=np.finfo(np.float32))\n", "print float_info.format(finfo=np.finfo(np.float64))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "float32: max=3.40282346639e+38 , approx decimal precision=6;\n", "float64: max=1.79769313486e+308, approx decimal precision=15;\n" ] } ], "prompt_number": 15 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Floating point precision is covered in detail at http://en.wikipedia.org/wiki/Floating_point." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can convert an array from one type to another with the ``astype`` method" ] }, { "cell_type": "code", "collapsed": false, "input": [ "np.array(1, dtype=np.uint8).astype(np.float32)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 16, "text": [ "array(1.0, dtype=float32)" ] } ], "prompt_number": 16 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Filling arrays with sequences\n", "\n", "NumPy also offers the `arange` function, which works like the builtin `range` but returns an array instead of a list:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "np.arange(10, dtype=np.float64)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 17, "text": [ "array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])" ] } ], "prompt_number": 17 }, { "cell_type": "code", "collapsed": false, "input": [ "np.arange(5, 7, 0.1)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 18, "text": [ "array([ 5. , 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 6. ,\n", " 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9])" ] } ], "prompt_number": 18 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `linspace` and `logspace` functions to create linearly and logarithmically-spaced grids respectively, with a fixed number of points that include both ends of the specified interval:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print \"A linear grid between 0 and 1:\"\n", "print np.linspace(0, 1, 5)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "A linear grid between 0 and 1:\n", "[ 0. 0.25 0.5 0.75 1. ]\n" ] } ], "prompt_number": 19 }, { "cell_type": "code", "collapsed": false, "input": [ "print \"A logarithmic grid between 10**2 and 10**4:\"\n", "print np.logspace(2, 4, 3)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "A logarithmic grid between 10**2 and 10**4:\n", "[ 100. 1000. 10000.]\n" ] } ], "prompt_number": 20 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating random arrays\n", "\n", "Finally, it is often useful to create arrays with random numbers that follow a specific distribution. The `np.random` module contains a number of functions that can be used to this effect.\n", "First, we must import it:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import numpy as np\n", "import numpy.random" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 21 }, { "cell_type": "markdown", "metadata": {}, "source": [ "To produce an array of 5 random samples taken from a standard normal distribution (0 mean and variance 1):" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print np.random.randn(5)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[ 0.55206256 -1.87467532 -0.60317911 -0.50195095 -0.70764961]\n" ] } ], "prompt_number": 22 }, { "cell_type": "markdown", "metadata": {}, "source": [ "For an array of 5 samples from the normal distribution with a mean of 10 and a variance of 3:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "norm10 = np.random.normal(10, 3, 5)\n", "print norm10" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[ 9.96241735 11.19345887 5.52211376 11.13822636 11.77986771]\n" ] } ], "prompt_number": 23 }, { "cell_type": "markdown", "metadata": {}, "source": [ "For more details see http://docs.scipy.org/doc/numpy/reference/routines.random.html." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Indexing with other arrays\n", "Above we saw how to index NumPy arrays with single numbers and slices, just like Python lists. Arrays also allow for a more sophisticated kind of indexing that is very powerful: you can index an array with another array, and in particular with an array of boolean values. This is particularly useful to extract information from an array that matches a certain condition.\n", "\n", "Consider for example that in the array `norm10` we want to replace all values above 9 with the value 0. We can do so by first finding the *mask* that indicates where this condition is true or false:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "mask = norm10 > 9\n", "mask" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 24, "text": [ "array([ True, True, False, True, True], dtype=bool)" ] } ], "prompt_number": 24 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have this mask, we can use it to either read those values or to reset them to 0:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print 'Values above 9:', norm10[mask]" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Values above 9: [ 9.96241735 11.19345887 11.13822636 11.77986771]\n" ] } ], "prompt_number": 25 }, { "cell_type": "code", "collapsed": false, "input": [ "print 'Resetting all values above 9 to 0...'\n", "norm10[mask] = 0\n", "print norm10" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Resetting all values above 9 to 0...\n", "[ 0. 0. 5.52211376 0. 0. ]\n" ] } ], "prompt_number": 26 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Whilst beyond the scope of this course, it is also worth knowing that a specific masked array object exists in NumPy.\n", "Further details are available at http://docs.scipy.org/doc/numpy/reference/maskedarray.generic.html" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Arrays with more than one dimension\n", "Up until now all our examples have used one-dimensional arrays. NumPy can also create arrays of arbitrary dimensions, and all the methods illustrated in the previous section work on arrays with more than one dimension.\n", "\n", "A list of lists can be used to initialize a two dimensional array:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "lst2 = [[1, 2, 3], [4, 5, 6]]\n", "arr2 = np.array([[1, 2, 3], [4, 5, 6]])\n", "print arr2\n", "print arr2.shape" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[[1 2 3]\n", " [4 5 6]]\n", "(2, 3)\n" ] } ], "prompt_number": 27 }, { "cell_type": "markdown", "metadata": {}, "source": [ "With two-dimensional arrays we start seeing the power of NumPy: while nested lists can be indexed by repeatedly using the `[ ]` operator, multidimensional arrays support a much more natural indexing syntax using a single `[ ]` and a set of indices separated by commas:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print lst2[0][1]\n", "print arr2[0, 1]" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "2\n", "2\n" ] } ], "prompt_number": 28 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Question: Why does the following example produce different results?" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print lst2[0:2][1]\n", "print arr2[0:2, 1]" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[4, 5, 6]\n", "[2 5]\n" ] } ], "prompt_number": 29 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Multidimensional array creation\n", "\n", "The array creation functions listed previously can also be used to create arrays with more than one dimension. For example:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "np.zeros((2, 3))" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 30, "text": [ "array([[ 0., 0., 0.],\n", " [ 0., 0., 0.]])" ] } ], "prompt_number": 30 }, { "cell_type": "code", "collapsed": false, "input": [ "np.random.normal(10, 3, size=(2, 4))" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 31, "text": [ "array([[ 9.62991854, 5.91150288, 4.87681868, 5.31150873],\n", " [ 3.93823953, 10.5975548 , 11.2739624 , 9.27214526]])" ] } ], "prompt_number": 31 }, { "cell_type": "markdown", "metadata": {}, "source": [ "In fact, the shape of an array can be changed at any time, as long as the total number of elements is unchanged. For example, if we want a 2x4 array with numbers increasing from 0, the easiest way to create it is:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "arr = np.arange(8).reshape(2, 4)\n", "print arr" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[[0 1 2 3]\n", " [4 5 6 7]]\n" ] } ], "prompt_number": 32 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Views, not Copies\n", "\n", "Note that reshaping (like most NumPy operations), wherever possible, provides a **view** of *the same memory*:\n", "\n", "Reshaping, like most NumPy operations, provides a **view** of _the same memory_ wherever possible:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "arr = np.arange(8)\n", "arr_view = arr.reshape(2, 4)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 33 }, { "cell_type": "markdown", "metadata": {}, "source": [ "What this means is that if one array is modified, the other will also be updated:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Print the \"view\" array from reshape.\n", "print 'Before\\n', arr_view\n", "\n", "# Update the first element of the original array.\n", "arr[0] = 1000\n", "\n", "# Print the \"view\" array from reshape again,\n", "# noticing the first value has changed.\n", "print 'After\\n', arr_view" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Before\n", "[[0 1 2 3]\n", " [4 5 6 7]]\n", "After\n", "[[1000 1 2 3]\n", " [ 4 5 6 7]]\n" ] } ], "prompt_number": 34 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This lack of copying allows for very efficient vectorized operations, but this power should be used carefully - if used badly it can lead to some bugs that are hard to track down.\n", "\n", "If in doubt, you can always copy the data to a different block of memory with the **``copy()``** method." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Slices\n", "\n", "With multidimensional arrays you can also index using slices, and you can mix and match slices and single indices in the different dimensions:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "arr = np.arange(8).reshape(2, 4)\n", "print arr\n", "\n", "print ('Second element from dimension 0, '\n", " 'last 2 elements from dimension one: \\n'),\n", "print arr[1, 2:]\n", "\n", "print ('All elements bar the last from dimension 0, '\n", " 'third element from dimension one: \\n'),\n", "print arr[:-1, 2]\n", "\n", "print ('Second element from dimension 0 (maintaining its dimension), '\n", " 'all elements from dimension one: \\n'),\n", "print arr[1:2, :]" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[[0 1 2 3]\n", " [4 5 6 7]]\n", "Second element from dimension 0, last 2 elements from dimension one: [6 7]\n", "All elements bar the last from dimension 0, third element from dimension one: [2]\n", "Second element from dimension 0 (maintaining its dimension), all elements from dimension one: [[4 5 6 7]]\n" ] } ], "prompt_number": 35 }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you only provide one index to slice a multidimensional array, then the slice will be expanded to ``\":\"`` for all of the remaining dimensions:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print 'First row: ', arr[0], 'is equivalent to', arr[0, :]\n", "print 'Second row: ', arr[1], 'is equivalent to', arr[1, :]" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "First row: [0 1 2 3] is equivalent to [0 1 2 3]\n", "Second row: [4 5 6 7] is equivalent to [4 5 6 7]\n" ] } ], "prompt_number": 36 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is also known as \"ellipsis\".\n", "\n", "Ellipsis can be specified explicitly with ```\"...\"```. It will automatically expand to `\":\"` for each of the unspecified dimensions in the array, and can even be used at the beginning of the slice:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "arr1 = np.empty((4, 6, 3))\n", "print 'Orig shape: ', arr1.shape\n", "print arr1[...].shape\n", "print arr1[..., 0:2].shape\n", "print arr1[2:4, ..., ::2].shape\n", "print arr1[2:4, :, ..., ::-1].shape" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Orig shape: (4, 6, 3)\n", "(4, 6, 3)\n", "(4, 6, 2)\n", "(2, 6, 2)\n", "(2, 6, 3)\n" ] } ], "prompt_number": 37 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generating 2D coordinate arrays\n", "\n", "A common task is to generate a pair of arrays that represent the coordinates of our data.\n", "\n", "When orthogonal 1d coordinate arrays already exist, NumPy's `meshgrid` function is very useful:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "x = np.linspace(0, 9, 3)\n", "y = np.linspace(-8, 4, 3)\n", "x2d, y2d = np.meshgrid(x, y)\n", "print x2d\n", "print y2d" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[[ 0. 4.5 9. ]\n", " [ 0. 4.5 9. ]\n", " [ 0. 4.5 9. ]]\n", "[[-8. -8. -8.]\n", " [-2. -2. -2.]\n", " [ 4. 4. 4.]]\n" ] } ], "prompt_number": 38 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Array Properties and Methods\n", "\n", "Now that we have seen how to create arrays with more than one dimension, it's a good idea to look at some of the most useful properties and methods that arrays have. The following provide basic information about the size, shape and data in the array:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print 'Data type :', arr.dtype\n", "print 'Total number of elements :', arr.size\n", "print 'Number of dimensions :', arr.ndim\n", "print 'Shape (dimensionality) :', arr.shape\n", "print 'Memory used (in bytes) :', arr.nbytes" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Data type : int64\n", "Total number of elements : 8\n", "Number of dimensions : 2\n", "Shape (dimensionality) : (2, 4)\n", "Memory used (in bytes) : 64\n" ] } ], "prompt_number": 39 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Arrays also have many useful statistical/mathematical methods:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print 'Minimum and maximum :', arr.min(), arr.max()\n", "print 'Sum and product of all elements :', arr.sum(), arr.prod()\n", "print 'Mean and standard deviation :', arr.mean(), arr.std()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Minimum and maximum : 0 7\n", "Sum and product of all elements : 28 0\n", "Mean and standard deviation : 3.5 2.29128784748\n" ] } ], "prompt_number": 40 }, { "cell_type": "markdown", "metadata": {}, "source": [ "These methods operate on an element-by-element basis on the array, however for multidimensional arrays it is also possible to carry out the computation along a single dimension by passing the `axis` parameter:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print 'For the following array:\\n', arr\n", "print 'The sum of elements along the rows is :', arr.sum(axis=1)\n", "print 'The sum of elements along the columns is :', arr.sum(axis=0)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "For the following array:\n", "[[0 1 2 3]\n", " [4 5 6 7]]\n", "The sum of elements along the rows is : [ 6 22]\n", "The sum of elements along the columns is : [ 4 6 8 10]\n" ] } ], "prompt_number": 41 }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see in this example, the value of the `axis` parameter is the dimension that will be *consumed* once the operation has been carried out. This is why to sum along the columns we use `axis=0`. \n", "\n", "This can be easily illustrated with an example that has more dimensions; we create an array with 4 dimensions and shape `(3,4,5,6)` and sum along the axis number 2 (i.e. the *third* axis, since in Python all counts are 0-based). That consumes the dimension whose length was 5, leaving us with a new array that has shape `(3,4,6)`:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "np.zeros((3, 4, 5, 6)).sum(axis=2).shape" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 42, "text": [ "(3, 4, 6)" ] } ], "prompt_number": 42 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another widely used property of arrays is the `.T` attribute, which allows you to access the transpose of the array:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print 'Array:\\n', arr\n", "print 'Transpose:\\n', arr.T" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Array:\n", "[[0 1 2 3]\n", " [4 5 6 7]]\n", "Transpose:\n", "[[0 4]\n", " [1 5]\n", " [2 6]\n", " [3 7]]\n" ] } ], "prompt_number": 43 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Operating with arrays\n", "Arrays support all regular arithmetic operators, and the NumPy library also contains a complete collection of basic mathematical functions that operate on arrays. It is important to remember that in general, all operations with arrays are applied *element-wise*, i.e., are applied to all the elements of the array at the same time. For example:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "arr1 = np.arange(4)\n", "arr2 = np.arange(10, 14)\n", "print arr1, '+', arr2, '=', arr1 + arr2" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[0 1 2 3] + [10 11 12 13] = [10 12 14 16]\n" ] } ], "prompt_number": 44 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Importantly, even the multiplication operator is by default applied element-wise. However it is *not* the matrix multiplication from linear algebra:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print arr1, '*', arr2, '=', arr1 * arr2" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[0 1 2 3] * [10 11 12 13] = [ 0 11 24 39]\n" ] } ], "prompt_number": 45 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We may also multiply an array by a scalar:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "1.5 * arr1" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 46, "text": [ "array([ 0. , 1.5, 3. , 4.5])" ] } ], "prompt_number": 46 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is an example of **broadcasting**." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Broadcasting\n", "\n", "The fact that NumPy operates on an element-wise basis means that in principle arrays must always match one another's shape. However, NumPy will also helpfully \"*broadcast*\" dimensions when possible. Here is an example of broadcasting a scalar to a 1D array:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print np.arange(3)\n", "print np.arange(3) + 5" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[0 1 2]\n", "[5 6 7]\n" ] } ], "prompt_number": 47 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also broadcast a 1D array to a 2D array, in this case adding a vector to all rows of a matrix:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "np.ones((3, 3)) + np.arange(3)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 48, "text": [ "array([[ 1., 2., 3.],\n", " [ 1., 2., 3.],\n", " [ 1., 2., 3.]])" ] } ], "prompt_number": 48 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also broadcast in two directions at a time:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "a = np.arange(3).reshape((3, 1))\n", "b = np.arange(3)\n", "print a, '+', b, '=\\n', a + b" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[[0]\n", " [1]\n", " [2]] + [0 1 2] =\n", "[[0 1 2]\n", " [1 2 3]\n", " [2 3 4]]\n" ] } ], "prompt_number": 49 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pictorially:\n", "\n", "![Illustration of broadcasting](../images/fig_broadcast_visual_1.png)\n", "\n", "([image source](http://www.astroml.org/book_figures/appendix/fig_broadcast_visual.html))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Rules of Broadcasting\n", "\n", "Broadcasting follows these three rules:\n", "\n", "1. If the two arrays differ in their number of dimensions, the shape of the array with fewer dimensions is *padded* with ones on its leading (left) side.\n", "\n", "2. If the shape of the two arrays does not match in any dimension, either array with shape equal to 1 in a given dimension is *stretched* to match the other shape.\n", "\n", "3. If in any dimension the sizes disagree and neither has shape equal to 1, an error is raised.\n", "\n", "Note that all of this happens without ever actually creating the expanded arrays in memory! This broadcasting behavior is in practice enormously powerful, especially given that when NumPy broadcasts to create new dimensions or to 'stretch' existing ones, it doesn't actually duplicate the data. In the example above the operation is carried out *as if* the scalar 1.5 was a 1D array with 1.5 in all of its entries, but no actual array is ever created. This can save lots of memory in cases when the arrays in question are large. As such this can have significant performance implications." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Broadcasting Examples:\n", "\n", "So when we do\n", "\n", " np.arange(3) + 5\n", " \n", "The scalar 5 is:\n", "\n", "* first 'promoted' to a 1-dimensional array of length 1 (rule 1)\n", "* then, this array is 'stretched' to length 3 to match the first array. (rule 2)\n", "\n", "After these two operations are complete, the addition can proceed as now both operands are one-dimensional arrays of length 3." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When we do\n", "\n", " np.ones((3, 3)) + np.arange(3)\n", " \n", "The second array is:\n", "\n", "- first 'promoted' to a 2-dimensional array of shape (1, 3) (rule 1)\n", "- then axis 0 is 'stretched' to length 3 to match the first array (rule 2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When we do\n", "\n", " np.arange(3).reshape((3, 1)) + np.arange(3)\n", " \n", "- the second array is 'promoted' to a 2-dimensional array of shape (1, 3) (rule 1)\n", "- the second array's axis 0 is 'stretched' to form an array of shape (3, 3) (rule 2)\n", "- the first array's axis 1 is 'stretched' to form an array of shape (3, 3) (rule 2)\n", "\n", "Then the operation proceeds as if on two 3 $\\times$ 3 arrays." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The general rule is: when operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward, creating dimensions of length 1 as needed. Two dimensions are considered compatible when\n", "\n", "* they are equal to begin with, or\n", "* one of them is 1; in this case NumPy will do the 'stretching' to make them equal.\n", "\n", "If these conditions are not met, a `ValueError: operands could not be broadcast together` exception is thrown, indicating that the arrays have incompatible shapes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Questions:\n", "\n", "What will the result of adding ``arr1`` with ``arr2`` be in the following cases?" ] }, { "cell_type": "code", "collapsed": false, "input": [ "arr1 = np.ones((2, 3))\n", "arr2 = np.ones((2, 1))\n", "\n", "# arr1 + arr2" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 50 }, { "cell_type": "code", "collapsed": false, "input": [ "arr1 = np.ones((2, 3))\n", "arr2 = np.ones(3)\n", "\n", "# arr1 + arr2" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 51 }, { "cell_type": "code", "collapsed": false, "input": [ "arr1 = np.ones((1, 3))\n", "arr2 = np.ones((2, 1))\n", "\n", "# arr1 + arr2" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 52 }, { "cell_type": "code", "collapsed": false, "input": [ "arr1 = np.ones((1, 3))\n", "arr2 = np.ones((1, 2))\n", "\n", "# arr1 + arr2" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 53 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 1:\n", "\n", "Use ``np.arange`` and ``reshape`` to create the array\n", "\n", " A = [[1 2 3 4]\n", " [5 6 7 8]]" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 53 }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Use ``np.array`` to create the array\n", "\n", " B = [1 2]" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 53 }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Use broadcasting to add ``B`` to ``A`` to create the final array\n", "\n", " A + B = [[2 3 4 5]\n", " [7 8 9 10]\n", "\n", "Hint: what shape does ``B`` have to be changed to?" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 53 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reshape and newaxis\n", "\n", "Reshaping arrays is a common task in order to make the best of NumPy's powerful broadcasting.\n", "\n", "A useful tip with the **``reshape``** method is that it is possible to provide a ``-1`` length for at most one of the dimensions. This indicates that NumPy should automatically calculate the length of this dimension:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "np.arange(6).reshape((1, -1))" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 54, "text": [ "array([[0, 1, 2, 3, 4, 5]])" ] } ], "prompt_number": 54 }, { "cell_type": "code", "collapsed": false, "input": [ "np.arange(6).reshape((2, -1))" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 55, "text": [ "array([[0, 1, 2],\n", " [3, 4, 5]])" ] } ], "prompt_number": 55 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another way to increase the dimensionality of an array is to use the ``newaxis`` keyword:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "arr = np.arange(6)\n", "print arr[np.newaxis, :, np.newaxis].shape" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "(1, 6, 1)\n" ] } ], "prompt_number": 56 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Element-wise Functions\n", "\n", "NumPy ships with a full complement of mathematical functions that work on entire arrays, including logarithms, exponentials, trigonometric and hyperbolic trigonometric functions, etc. \n", "\n", "For example, sampling the sine function at 100 points between $0$ and $2\\pi$ is as simple as:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "x = np.linspace(0, 2*np.pi, 100)\n", "y = np.sin(x)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 57 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or to sample the exponential function between $-5$ and $5$ at intervals of $0.5$:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "x = np.arange(-5, 5.5, 0.5)\n", "y = np.exp(x)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 58 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Linear algebra in NumPy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NumPy ships with a basic linear algebra library, and all arrays have a `dot` method whose behavior is that of the scalar dot product when its arguments are vectors (one-dimensional arrays) and the traditional matrix multiplication when one or both of its arguments are two-dimensional arrays:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "v1 = np.array([2, 3, 4])\n", "v2 = np.array([1, 0, 1])\n", "\n", "print v1, '.', v2, '=', np.dot(v1, v2)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[2 3 4] . [1 0 1] = 6\n" ] } ], "prompt_number": 59 }, { "cell_type": "markdown", "metadata": {}, "source": [ "For matrix-matrix multiplication, the regular $matrix \\times matrix$ rules must be satisfied. For example $A \\times A^T$:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "A = np.arange(6).reshape(2, 3)\n", "print A, '\\n'\n", "print np.dot(A, A.T)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[[0 1 2]\n", " [3 4 5]] \n", "\n", "[[ 5 14]\n", " [14 50]]\n" ] } ], "prompt_number": 60 }, { "cell_type": "markdown", "metadata": {}, "source": [ "results in a (2, 2) array, yet $A^T \\times A$ results in a (3, 3). Why is this?:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print np.dot(A.T, A)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[[ 9 12 15]\n", " [12 17 22]\n", " [15 22 29]]\n" ] } ], "prompt_number": 61 }, { "cell_type": "markdown", "metadata": {}, "source": [ "NumPy makes no distinction between row and column vectors and simply verifies that the dimensions match the required rules of matrix multiplication. Below is an example of matrix-vector multiplication, and in this case we have a $2 \\times 3$ matrix multiplied by a 3-vector, which produces a 2-vector:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print A, 'x', v1, '=', np.dot(A, v1)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[[0 1 2]\n", " [3 4 5]] x [2 3 4] = [11 38]\n" ] } ], "prompt_number": 62 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: To help with the interpretation of this last result, notice that $0 \\times 2 + 1 \\times 3 + 2 \\times 4 = 11$ and $3 \\times 2 + 4 \\times 3 + 5 \\times 4 = 38$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Exercise: trapezoidal integration\n", "\n", "In this exercise, you are tasked with implementing the simple trapezoid rule\n", "formula for numerical integration. If we want to compute the definite integral\n", "\n", "$$\n", " \\int_{a}^{b}f(x)dx\n", "$$\n", "\n", "we can partition the integration interval $[a,b]$ into smaller subintervals. We then\n", "approximate the area under the curve for each subinterval by calculating the area of\n", "the trapezoid created by linearly interpolating between the two function values\n", "at each end of the subinterval:\n", "\n", "![Illustration of the trapezoidal rule](../images/trapezoidal_rule.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For a pre-computed $y$ array (where $y = f(x)$ at discrete samples) the trapezoidal rule equation is:\n", "\n", "$$\n", " \\int_{a}^{b}f(x)dx\\approx\\frac{1}{2}\\sum_{i=1}^{n}\\left(x_{i}-x_{i-1}\\right)\\left(y_{i}+y_{i-1}\\right).\n", "$$\n", "\n", "In pure python, this can be written as:\n", "\n", " def trapz_slow(x, y):\n", " area = 0.\n", " for i in range(1, len(x)):\n", " area += (x[i] - x[i-1]) * (y[i] + y[i-1])\n", " return area / 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 2\n", "\n", "#### Part 1\n", "\n", "Create two arrays $x$ and $y$, where $x$ is a linearly spaced array in the interval $[0, 3]$ of length 10, and $y$ represents the function $f(x) = x^2$ sampled at $x$." ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 62 }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Part 2\n", "\n", "Use indexing (not a for loop) to find the 9 values representing $y_{i}+y_{i-1}$ for $i$ between 1 and 10.\n", "\n", "Hint: What indexing would be needed to get all but the last element of the 1d array **``y``**. Similarly what indexing would be needed to get all but the first element of a 1d array." ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 62 }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Part 3\n", "\n", "Write a function `trapz(x, y)`, that applies the trapezoid formula to pre-computed values, where `x` and `y` are 1-d arrays. The function should not use a for loop." ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 62 }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Part 4\n", "\n", "Verify that your function is correct by using the arrays created in #1 as input to ``trapz``. Your answer should be a close approximation of $\\int_0^3 x^2$ which is $9$." ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 62 }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Part 5 (extension)\n", "\n", "``numpy`` and ``scipy.integrate`` provide many common integration schemes. Find the documentation for NumPy's own version of the trapezoidal integration scheme and check its result with your own:" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 62 }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Part 6 (extension)\n", "\n", "Write a function `trapzf(f, a, b, npts=100)` that accepts a function `f`, the endpoints `a` and `b` and the number of samples to take `npts`. Sample the function uniformly at these\n", "points and return the value of the integral.\n", "\n", "Use the trapzf function to identify the minimum number of sampling points needed to approximate the integral $\\int_0^3 x^2$ with an absolute error of $<=0.0001$. (A loop is necessary here.)" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 62 } ], "metadata": {} } ] }