{ "metadata": { "name": "", "signature": "sha256:9656c4f5889fd861f672b9af753aa0786d5b19287795668da81a50edbc1180c8" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Overview" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The goal of this tutorial is to provide an overview of the use of the NumPy library. It tries to hit all of the important parts, but it is by no means comprehensive. For more information, see:\n", "- [Tentative NumPy Tutorial](http://wiki.scipy.org/Tentative_NumPy_Tutorial)\n", "- [NumPy User Guide](http://docs.scipy.org/doc/numpy/user/)\n", "- [Introduction to NumPy from SAM](http://www.sam.math.ethz.ch/~raoulb/teaching/PythonTutorial/intro_numpy.html)\n", "\n", "NumPy is the fundamental package for scientific computing with Python. It contains, among other things:\n", "- a powerful N-dimensional array object\n", "- sophisticated (broadcasting) functions\n", "- tools for integrating C/C++ and Fortran code\n", "- useful linear algebra, Fourier transform, and random number capabilities\n", "\n", "The NumPy array object is the common interface for working with typed arrays of data across a wide variety of scientific Python packages. NumPy also features a C-API, which enables interfacing existing Fortran/C/C++ libraries with Python and NumPy." ] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Basic Usage" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The NumPy array represents a *contiguous* block of memory, holding entries of a given type (and hence fixed size). The entries are laid out in memory according to the shape, or list of dimension sizes." 
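] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a rough illustration of the memory layout described above, the cell below inspects a small array. The name `grid` and the `int32` dtype are assumptions chosen for this sketch; `itemsize`, `nbytes`, `strides`, and `flags` are standard array attributes." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Sketch only: `grid` is a hypothetical example array, not part of the tutorial data\n", "import numpy as np\n", "\n", "grid = np.arange(6, dtype=np.int32).reshape(2, 3)\n", "\n", "# Every entry occupies the same number of bytes, fixed by the dtype\n", "print(grid.itemsize)\n", "\n", "# The whole block is (number of entries) * (bytes per entry)\n", "print(grid.nbytes == grid.size * grid.itemsize)\n", "\n", "# Strides: how many bytes to skip to advance one step along each dimension\n", "print(grid.strides)\n", "\n", "# This array is stored as a single C-ordered contiguous block\n", "print(grid.flags['C_CONTIGUOUS'])" ], "language": "python", "metadata": {}, "outputs": [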
] }, { "cell_type": "code", "collapsed": false, "input": [ "# Convention for import to get shortened namespace\n", "import numpy as np" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "# Create a simple array from a list of integers\n", "a = np.array([1, 2, 3])\n", "a" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "# Print out the shape attribute\n", "a.shape" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "# And now the data type\n", "a.dtype" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "# This time use a list of floats\n", "a = np.array([1., 2., 3., 4., 5.])\n", "a" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "a.shape" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "a.dtype" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NumPy also provides helper functions for generating arrays of data. `arange`, for instance, creates a range of evenly spaced values, with the stop value excluded." ] }, { "cell_type": "code", "collapsed": false, "input": [ "a = np.arange(5)\n", "print(a)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "a = np.arange(1, 10, 2)\n", "print(a)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`linspace` is similar, but creates a given number of linearly spaced values between two endpoints (both included by default)." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "b = np.linspace(5, 15, 5)\n", "print(b)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NumPy also provides simple ways of performing mathematical operations. For instance, in core Python, element-wise addition requires an explicit loop:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print([x + y for x, y in zip(a, b)])" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using NumPy, this becomes:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "a + b" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "a * b" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "t = np.arange(0, 2 * np.pi, np.pi / 4)\n", "t" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NumPy also provides mathematical functions that operate element-wise on arrays:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "np.sin(t)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Basic Plotting" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "# Convention for import of the pyplot interface\n", "import matplotlib.pyplot as plt" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "from IPython.display import display\n", "\n", "# Set up the IPython notebook to put plots inline\n", "%matplotlib inline" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "# Create some example data\n", "x = np.linspace(0, 2, 100)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "# Go ahead and explicitly create a figure and an axes\n", "fig = plt.figure(figsize=(8,5))\n", "ax = fig.add_subplot(1, 1, 1)\n", "\n", "# Plot our x variable on the x-axis and x^2 on the y-axis\n", "ax.plot(x, x**2)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "# Add some labels to the plot\n", "ax.set_xlabel('x')\n", "ax.set_ylabel('f(x)')\n", "\n", "# Needed to reuse and see the updated plot while using inline\n", "display(fig)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "# Let's add a title with a bit of LaTeX syntax\n", "ax.set_title('$y = x^2$', fontdict={'size':22})\n", "\n", "display(fig)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "fig = plt.figure(figsize=(8,5))\n", "ax = fig.add_subplot(1, 1, 1)\n", "\n", "# Plot a set of different polynomials. 
The label argument is used when generating a legend.\n", "ax.plot(x, x, label='$x$')\n", "ax.plot(x, x * x, label='$x^2$')\n", "ax.plot(x, x**3, label='$x^3$')\n", "\n", "# Add axis labels and a title\n", "ax.set_xlabel('x')\n", "ax.set_ylabel('f(x)')\n", "ax.set_title('Polynomials')\n", "\n", "# Add gridlines\n", "ax.grid(True)\n", "\n", "# Add a legend to the upper left corner of the plot\n", "ax.legend(loc='upper left')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Exercise" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Make a plot that:\n", "- contains 3 different curves with different line styles. Suggestions:\n", " - sin, cos, tan\n", " - exp, log\n", " - sqrt\n", " - Any others you want to try\n", "- uses line labels and a legend\n", "- has axis labels and a title" ] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Indexing and Slicing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Indexing is how we pull individual data items out of an array. Slicing extends this to pulling out a regularly spaced subset of the items." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Create an array for testing\n", "a = np.arange(12).reshape(3, 4)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "a" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Indexing in Python is 0-based, so the command below looks for the 2nd item along the first dimension and the 3rd along the second dimension." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "a[1, 2]" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also index along just the first dimension, which returns an entire row:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "a[2]" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Negative indices are also allowed; they index relative to the end of the array." ] }, { "cell_type": "code", "collapsed": false, "input": [ "a[0, -1]" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Slicing syntax is written as `start:stop[:step]`, where all numbers are optional. `start` defaults to 0, `stop` to `len(dim)`, and `step` to 1. The second colon is also optional if no step is used. Note that `stop` represents one past the last item; one can also think of the slice as a half-open interval: `[start, stop)`" ] }, { "cell_type": "code", "collapsed": false, "input": [ "a[1:3]" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "a[::2, ::2]" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "a[:, 2]" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "# ... can be used to replace one or more full slices\n", "a[..., 2]" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Exercises" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the data below, write code to solve the commented problems." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "a = np.arange(60).reshape(3, 4, 5)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "# Decimate the data by a factor of 3 along all dimensions" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "# Pull out the 2nd 4x5 slice" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "# Grab the last column from all dimensions (result should be 3x4)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Broadcasting" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\"The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is \u201cbroadcast\u201d across the larger array so that they have compatible shapes. Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. It does this without making needless copies of data and usually leads to efficient algorithm implementations. 
There are, however, cases where broadcasting is a bad idea because it leads to inefficient use of memory that slows computation.\" (NumPy User Guide)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Create some test data\n", "a = np.linspace(0, np.pi, 4)\n", "a" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NumPy can perform operations between arrays and constants:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "a * 2" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This extends to operations between arrays of different sizes:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "a * np.array([2])" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "b = np.linspace(0, 50, 5)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "print(a.shape)\n", "print(b.shape)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "# This will not work, however, because shapes (4,) and (5,) are not compatible\n", "a * b" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Broadcasting works in two steps:\n", "1. Make the arrays have the same number of dimensions. If they differ, size-1 dimensions are *prepended* implicitly to the array with fewer dimensions\n", "2. Check that each dimension is compatible: the sizes are the same or one of them is 1" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# If we add a size-1 dimension to a, the shapes become compatible. 
The process of broadcasting will implicitly add a size-1 dimension to the front of b\n", "a = a.reshape((-1, 1))\n", "print(a.shape)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "result = a * b\n", "result" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "result.shape" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Exercise" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create a 1D array of 100 x values between -3 and 3, and a 1D array of 150 y values between -5 and 5. Use these to calculate an array of radius values.\n", "\n", "Radius can be calculated as:\n", "$r = \\sqrt{x^2 + y^2}$" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Logical Indexing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Logical indexing allows selecting elements from an array using a second array of `True` and `False` values." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "# Create some synthetic data representing temperature and wind speed\n", "temp = 20 * np.cos(np.linspace(0, 2 * np.pi, 100)) + 40 + 2 * np.random.randn(100)\n", "spd = np.abs(10 * np.sin(np.linspace(0, 2 * np.pi, 100)) + 10 + 5 * np.random.randn(100))" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "plt.plot(temp)\n", "plt.plot(spd)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "# By doing a comparison between a NumPy array and a value, we get an\n", "# array of values representing the results of the comparison between\n", "# each element and the value\n", "temp > 50" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "# We can take the resulting array and use this to index into the NumPy\n", "# array and retrieve the values where the result was true\n", "temp[temp > 50]" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "# So long as the size of the boolean array matches the data, the\n", "# boolean array can come from anywhere\n", "temp[spd > 10]" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "# Make a copy so we don't modify the original data\n", "temp2 = temp.copy()\n", "\n", "# Replace temperature with 0 everywhere spd is < 10\n", "temp2[spd < 10] = 0\n", "plt.plot(temp2)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "# Can also combine multiple boolean arrays using the syntax for bitwise operations\n", "# *MUST HAVE PARENTHESES* due to operator precedence\n", "temp[(temp < 45) & (spd > 10)]" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ 
"Masked Arrays" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Masked arrays are a specialization of NumPy arrays to handle flagging individual elements as masked. This allows elminating values from plots and from computations." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Create a masked array of temperature, masking values where speed is < 10\n", "temp_masked = np.ma.array(temp, mask=spd < 10)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "temp_masked" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "plt.plot(temp_masked)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "# Masked values can be set after array creation by setting the corresponding value to the special value of ma.masked\n", "temp_masked[temp_masked > 45] = np.ma.masked" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "temp_masked" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "plt.plot(np.arange(temp_masked.size), temp_masked)\n", "\n", "# Set plot limits to the same as before\n", "plt.xlim(0, 100)\n", "plt.ylim(0, 65)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Exercise" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Currently, the integral in the code below results in a bad value. Using masked arrays, fix this so that the integral returns the proper value of `2.393`.\n", "\n", "Hint: look at `np.isnan()`" ] }, { "cell_type": "code", "collapsed": false, "input": [ "t = np.linspace(0, 2*np.pi, 200)\n", "x = np.sqrt(np.sin(t))\n", "print np.trapz(x, t)" ], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }