{ "metadata": { "name": "", "signature": "sha256:8169f656b8c946436386bec1d439a1c4b1be5d1dc9ce3ee2883753809c7e33fc" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "code", "collapsed": false, "input": [ "%pylab inline" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Populating the interactive namespace from numpy and matplotlib\n" ] } ], "prompt_number": 3 }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Introduction to NumPy Arrays" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "J.R. Johansson and P.D. Nation\n", "\n", "For more information about QuTiP see [http://qutip.org](http://qutip.org)" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Introduction" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Until now we have been using lists as a way of storing multiple elements together. However, when doing numerical computations, lists are not very good. For example, what if I wanted to add one to a list of numbers? 
We couldn't write" ] }, { "cell_type": "code", "collapsed": true, "input": [ "a=[1,2,3]\n", "a=a+1" ], "language": "python", "metadata": {}, "outputs": [ { "ename": "TypeError", "evalue": "can only concatenate list (not \"int\") to list", "output_type": "pyerr", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0ma\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m3\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0ma\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m+\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: can only concatenate list (not \"int\") to list" ] } ], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Instead we would have to do" ] }, { "cell_type": "code", "collapsed": false, "input": [ "for k in range(3):\n", " a[k]=a[k]+1" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Working with lists would quickly become very complicated if we wanted to do numerical operations on many elements at the same time, or if, for example, we want to be able to construct vectors and matrices in our programs. All of these features, and more, come with using NumPy **arrays** as our preferred data structure." 
] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "NumPy Arrays" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "When dealing with numerical data in Python, one almost always uses arrays from the NumPy module to store and manipulate data. NumPy arrays look very similar to Python lists, but are actually arrays implemented in C that allow for very fast multi-dimensional numerical, vector, matrix, and linear algebra operations. Using arrays with slicing and **vectorization** leads to very fast Python code, and can replace many of the for-loops that you would have to use if you coded a problem using lists. As a general rule, **minimizing the number of for-loops maximizes the performance of your code**. To start using arrays, we can take a simple list and use it as an argument to the array function" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from numpy import *\n", "a=array([1,2,3,4,5,6])\n", "print(a)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[1 2 3 4 5 6]\n" ] } ], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have now created our first array of integers. Notice how, when using print, the array looks much like a list (only the commas are missing); however, it is a very different data structure. We can also create an array of floats, complex numbers, or even strings" ] }, { "cell_type": "code", "collapsed": false, "input": [ "a=array([2.0,4.0,8.0,16.0])\n", "b=array([0,1+0j,1+1j,2-2j])\n", "c=array(['a','b','c','d'])\n", "print(a)\n", "print(b)\n", "print(c)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[ 2. 4. 8. 
16.]\n", "[ 0.+0.j 1.+0.j 1.+1.j 2.-2.j]\n", "['a' 'b' 'c' 'd']\n" ] } ], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "In general there are three different ways of creating arrays in Python:\n", "\n", "- First create a list and then call the array function using the list as an input argument.\n", "\n", "- Use NumPy functions that are designed to create arrays: **zeros, ones, arange, linspace**.\n", "\n", "- Import data into Python from a file." ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Arrays from Lists" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have already seen how to create arrays from simple lists, but now let's look at how to create more complicated lists that we can turn into arrays. A short way of creating a list, say from 0 to 9, is as follows:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "output=[n for n in range(10)]\n", "print(output)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]\n" ] } ], "prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This code does exactly the same thing as the longer expression" ] }, { "cell_type": "code", "collapsed": false, "input": [ "output=[]\n", "for n in range(10):\n", "    output.append(n)\n", "print(output)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]\n" ] } ], "prompt_number": 10 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "We could turn this into an array quite easily" ] }, { "cell_type": "code", "collapsed": false, "input": [ "array(output)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 11, "text": [ "array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])" ] } ], 
"prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or, we can save even more space and create the list inside the array function:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "array([n for n in range(10)])" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 12, "text": [ "array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])" ] } ], "prompt_number": 12 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This can also be used to create more complicated arrays" ] }, { "cell_type": "code", "collapsed": false, "input": [ "array([2.0*k**0.563 for k in range(0,10,2)])" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 13, "text": [ "array([ 0. , 2.95467613, 4.36505551, 5.48440035, 6.44866265])" ] } ], "prompt_number": 13 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Array Creation in NumPy (see [NumPy Documentation](http://docs.scipy.org/doc/numpy/reference/routines.array-creation.html) for more info.)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "NumPy has several extremely important array creation functions that will make your life much easier. For example, creating arrays of all zeros or ones is trivial." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "zeros(5)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 14, "text": [ "array([ 0., 0., 0., 0., 0.])" ] } ], "prompt_number": 14 }, { "cell_type": "code", "collapsed": false, "input": [ "ones(10)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 15, "text": [ "array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])" ] } ], "prompt_number": 15 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "However, the most useful functions are [**arange**](http://docs.scipy.org/doc/numpy/reference/generated/numpy.arange.html#numpy.arange), which generates evenly spaced values within a given interval in much the same way as the built-in range function, and [**linspace**](http://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html), which makes an array of linearly spaced points from a starting value to an ending value, with both endpoints included." ] }, { "cell_type": "code", "collapsed": false, "input": [ "arange(5)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 16, "text": [ "array([0, 1, 2, 3, 4])" ] } ], "prompt_number": 16 }, { "cell_type": "code", "collapsed": false, "input": [ "arange(0,10,2)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 17, "text": [ "array([0, 2, 4, 6, 8])" ] } ], "prompt_number": 17 }, { "cell_type": "code", "collapsed": false, "input": [ "linspace(0,10,20) #makes an array of 20 points linearly spaced from 0 to 10" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 18, "text": [ "array([ 0. 
, 0.52631579, 1.05263158, 1.57894737,\n", " 2.10526316, 2.63157895, 3.15789474, 3.68421053,\n", " 4.21052632, 4.73684211, 5.26315789, 5.78947368,\n", " 6.31578947, 6.84210526, 7.36842105, 7.89473684,\n", " 8.42105263, 8.94736842, 9.47368421, 10. ])" ] } ], "prompt_number": 18 }, { "cell_type": "code", "collapsed": false, "input": [ "linspace(-5,5,15) #15 points in range from -5 to 5" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 19, "text": [ "array([-5. , -4.28571429, -3.57142857, -2.85714286, -2.14285714,\n", " -1.42857143, -0.71428571, 0. , 0.71428571, 1.42857143,\n", " 2.14285714, 2.85714286, 3.57142857, 4.28571429, 5. ])" ] } ], "prompt_number": 19 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Differences Between Arrays and Lists" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Having played with arrays a bit, it is now time to explain the main differences between NumPy arrays and Python lists." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Python lists are very general and can hold any combination of data types. However, NumPy **arrays can only hold one type of data** (integers, floats, strings, complex). If we try to combine different types of data, then the array function will **upcast** the data in the array such that it all has the same type" ] }, { "cell_type": "code", "collapsed": false, "input": [ "array([1,2,3.14]) # [int,int,float] -> [float,float,float]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 20, "text": [ "array([ 1. , 2. 
, 3.14])" ] } ], "prompt_number": 20 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Upcasting between integers and floats does not cause too much trouble, but mixing strings and numbers in an array can create problems" ] }, { "cell_type": "code", "collapsed": false, "input": [ "array([1.0,1+1j,'hello']) #array data is upcast to strings" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 21, "text": [ "array(['1.0', '(1+1j)', 'hello'], \n", " dtype='\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m8\u001b[0m\u001b[0;34m<\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m<=\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m5\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mValueError\u001b[0m: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()" ] } ], "prompt_number": 36 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This error occurs because Python does not know how to take an array of many True/False values and reduce it to the single True or False that the chained comparison requires." ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Example: Rewriting Sieve of Eratosthenes" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Here we will use arrays to replace most of the for-loops from the list-based version of the Sieve of Eratosthenes. This will make the code much easier to read and also much faster when computing large prime numbers. 
The main part of the original code is:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "N=20\n", "# generate a list from 2->N\n", "numbers = []\n", "for i in range(2,N+1): # This can be replaced by array\n", "    numbers.append(i)\n", "# Run Sieve of Eratosthenes algorithm marking multiples with -1\n", "for j in range(N-1):\n", "    if numbers[j]!=-1:\n", "        p=numbers[j]\n", "        for k in range(j+p,N-1,p): # This can be replaced by array\n", "            numbers[k]=-1\n", "# Collect all elements not -1 (these are the primes)\n", "primes = []\n", "for i in range(N-1): # This can be replaced by array\n", "    if numbers[i]!=-1:\n", "        primes.append(numbers[i])\n", "print(primes)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[2, 3, 5, 7, 11, 13, 17, 19]\n" ] } ], "prompt_number": 42 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Using arrays instead of lists simplifies the code:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "N=20\n", "# generate an array from 2->N\n", "numbers=arange(2,N+1) # replaced for-loop with call to arange\n", "# Run Sieve of Eratosthenes algorithm\n", "# by marking multiples with -1\n", "for j in range(N-1):\n", "    if numbers[j]!=-1:\n", "        p=numbers[j]\n", "        numbers[j+p:N-1:p]=-1 # replaced for-loop by slicing array\n", "# Collect all elements not -1 (these are the primes)\n", "primes=numbers[numbers!=-1] # Used conditional statement to get elements !=-1\n", "print(primes)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[ 2 3 5 7 11 13 17 19]\n" ] } ], "prompt_number": 41 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "

\n", "