{
"metadata": {
"name": "",
"signature": "sha256:c1e93a8c45c200cd071e378387993049bde6cbc0fb433edb5bd89916ef726a23"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Back to the main [index](../index.ipynb)*"
]
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Scientific Python: Transitioning from MATLAB to Python"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Part of the introductory series to using [Python for Vision Research](http://gestaltrevision.be/wiki/python/python) brought to you by the [GestaltReVision](http://gestaltrevision.be) group (KU Leuven, Belgium).*\n",
"\n",
"This notebook is meant as an introduction to Python's essential scientific packages: Numpy, PIL, Matplotlib, and SciPy.\n",
"\n",
"There is more Python learning material available on [our lab's wiki](http://gestaltrevision.be/wiki/python/python).\n",
"\n",
"**Author:** Maarten Demeyer \n",
"**Year:** 2014 \n",
"**Copyright:** Public Domain as in [CC0](https://creativecommons.org/publicdomain/zero/1.0/)"
]
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Contents"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- [A Quick Recap](#A-Quick-Recap)\n",
" - [Data types](#Data-types)\n",
" - [Lists](#Lists)\n",
" - [Functions](#Functions)\n",
" - [Objects](#Objects)\n",
"- [Numpy](#Numpy)\n",
" - [Why we need Numpy ](#Why-we-need-Numpy--)\n",
" - [The ndarray data type](#The-ndarray-data-type)\n",
" - [shape and dtype](#shape-and-dtype)\n",
" - [Indexing and slicing](#Indexing-and-slicing)\n",
" - [Filling and manipulating arrays](#Filling-and-manipulating-arrays)\n",
" - [A few useful functions](#A-few-useful-functions)\n",
" - [A small exercise](#A-small-exercise)\n",
" - [A bit harder: The Gabor](#A-bit-harder:-The-Gabor)\n",
" - [Boolean indexing](#Boolean-indexing)\n",
" - [Vectorizing a simulation](#Vectorizing-a-simulation)\n",
"- [PIL: the Python Imaging Library](#PIL:-the-Python-Imaging-Library)\n",
" - [Loading and showing images](#Loading-and-showing-images)\n",
" - [Resizing, rotating, cropping and converting](#Resizing,-rotating,-cropping-and-converting)\n",
" - [Advanced](#Advanced)\n",
" - [Saving](#Saving)\n",
" - [Exercise](#Exercise)\n",
"- [Matplotlib](#Matplotlib)\n",
" - [Quick plots](#Quick-plots)\n",
" - [Saving to a file](#Saving-to-a-file)\n",
" - [Visualizing arrays](#Visualizing-arrays)\n",
" - [Multi-panel figures](#Multi-panel-figures)\n",
" - [Exercise: Function plots](#Exercise:-Function-plots)\n",
" - [Finer figure control](#Finer-figure-control)\n",
" - [Exercise: Add regression lines](#Exercise:-Add-regression-lines)\n",
"- [Scipy](#Scipy)\n",
" - [Statistics](#Statistics)\n",
" - [Fast Fourier Transform](#Fast-Fourier-Transform)\n"
]
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"A Quick Recap"
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Data types"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Depending on what kind of values you want to store, Python variables can be of different data types. For instance:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"my_int = 5\n",
"print my_int, type(my_int)\n",
"\n",
"my_float = 5.0\n",
"print my_float, type(my_float)\n",
"\n",
"my_boolean = False\n",
"print my_boolean, type(my_boolean)\n",
"\n",
"my_string = 'hello'\n",
"print my_string, type(my_string)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Lists"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One useful data type is the list, which stores an **ordered**, **mutable** sequence of **any data type**, even **mixed**"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"my_list = [my_int, my_float, my_boolean, my_string]\n",
"print type(my_list)\n",
"\n",
"for element in my_list:\n",
" print type(element)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To retrieve or change specific elements in a list, **indices** and **slicing** can be used.\n",
"\n",
"Indexing **starts at zero**. Slices **do not include the last element**."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"print my_list[1]\n",
"\n",
"my_list[1] = 3.0\n",
"my_sublist = my_list[1:3]\n",
"\n",
"print my_sublist\n",
"print type(my_sublist)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Functions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Re-usable pieces of code can be put into functions. Many pre-defined functions are available in Python packages.\n",
"\n",
"Functions can have both **required** and **optional** input arguments. When the function has no output argument, it returns **None**."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Function with a required and an optional argument\n",
"def regress(x, c=0, b=1):\n",
" return (x*b)+c\n",
"\n",
"print regress(5) # Only required argument\n",
"print regress(5, 10, 3) # Use argument order\n",
"print regress(5, b=3) # Specify the name to skip an optional argument"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Function without return argument\n",
"def divisible(a,b):\n",
" if a%b:\n",
" print str(a) + \" is not divisible by \" + str(b)\n",
" else:\n",
" print str(a) + \" is divisible by \" + str(b)\n",
"\n",
"divisible(9,3)\n",
"res = divisible(9,2)\n",
"print res"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Function with multiple return arguments\n",
"def add_diff(a,b):\n",
" return a+b, a-b\n",
"\n",
"# Assigned as a tuple\n",
"res = add_diff(5,3)\n",
"print res\n",
"\n",
"# Directly unpacked to two variables\n",
"a,d = add_diff(5,3)\n",
"print a\n",
"print d"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Objects"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Every variable in Python is actually an object. Objects bundle **member variables** with tightly connected **member functions** that (typically) use these member variables. Lists are a good example of this."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"my_list = [1, False, 'boo']\n",
"my_list.append('extra element')\n",
"my_list.remove(False)\n",
"print my_list"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The member variables in this case just contain the information on the elements in the list. They are 'hidden' and not intended to be used directly - you manipulate the list through its member functions.\n",
"\n",
"The functions above are **in-place** methods, changing the original list directly, and returning None. This is not always the case. Some member functions, for instance in strings, do not modify the original object, but return a second, modified object instead."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"return_arg = my_list.append('another one')\n",
"print return_arg\n",
"print my_list"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"my_string = 'kumbaya, milord'\n",
"return_arg = my_string.replace('lord', 'lard')\n",
"print return_arg\n",
"print my_string"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Do you remember *why* list functions are in-place, while string functions are not?"
]
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Numpy"
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Why we need Numpy "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"While lists are great, they are not very suitable for scientific computing. Consider this example:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"subj_length = [180.0,165.0,190.0,172.0,156.0]\n",
"subj_weight = [75.0,60.0,83.0,85.0,62.0]\n",
"subj_bmi = []\n",
"\n",
"# EXERCISE 1: Try to compute the BMI of each subject, as well as the average BMI across subjects\n",
"# BMI = weight/(length/100)**2"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Clearly, this is clumsy. MATLAB users would expect something like this to work:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"subj_bmi = subj_weight/(subj_length/100)**2 \n",
"mean_bmi = mean(subj_bmi)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But it doesn't. `/` and `**` are not defined for lists; nor does the `mean()` function exist. `+` and `*` are defined, but they mean something else. Do you remember what they do?"
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"The ndarray data type"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Enter Numpy, and its **ndarray** data type, allowing these *elementwise* computations on ordered sequences, and implementing a host of mathematical functions operating on them.\n",
"\n",
"Lists are converted to Numpy arrays through calling the `np.array()` **constructor** function, which takes a list and creates a new array object filled with the list's values."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"\n",
"# Create a numpy array from a list\n",
"subj_length = np.array([180.0,165.0,190.0,172.0,156.0])\n",
"subj_weight = np.array([75.0,60.0,83.0,85.0,62.0])\n",
"\n",
"print type(subj_length), type(subj_weight)\n",
"\n",
"# EXERCISE 2: Try to complete the program now!\n",
"# Hint: np.mean() computes the mean of a numpy array\n",
"# Note that unlike MATLAB, Python does not need the '.' before elementwise operators"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Numpy is a very large package, that we can't possibly cover completely. But we will cover enough to get you started."
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"shape and dtype"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The most basic characteristics of a Numpy array are its shape and the data type of its elements, or dtype. For those of you who have worked in MATLAB before, this should be familiar."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Multi-dimensional lists are just nested lists\n",
"# This is clumsy to work with\n",
"my_nested_list = [[1,2,3],[4,5,6]]\n",
"\n",
"print my_nested_list\n",
"\n",
"print len(my_nested_list)\n",
"print my_nested_list[0]\n",
"print len(my_nested_list[0])"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Numpy arrays handle multidimensionality better\n",
"arr = np.array(my_nested_list)\n",
"\n",
"print arr # nicer printing\n",
"print arr.shape # direct access to all dimension sizes\n",
"print arr.size # direct access to the total number of elements\n",
"print arr.ndim # direct access to the number of dimensions"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The member variables **shape** and **size** contain the dimension lengths and the total number of elements, respectively, while **ndim** contains the number of dimensions. \n",
"\n",
"The shape is represented by a tuple, where the **last** dimension is the **inner** dimension representing the **columns** of a 2-D matrix. The first dimension is the top-level, outer dimension and represents the **rows** here. We could also make 3-D (or even higher-level) arrays:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"arr3d = np.array([ [[1,2,3],[4,5,6]] , [[7,8,9],[10,11,12]] ])\n",
"\n",
"print arr3d\n",
"\n",
"print arr3d.shape\n",
"print arr3d.size\n",
"print arr3d.ndim"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now the last or inner dimension becomes the **layer** dimension. The inner lists of the constructor represent the values at that (row,column) coordinate of the various layers. Rows and columns remain the first two dimensions. Note how what we have here now, is **three** layers of **two-by-two** matrices. *Not* two layers of two-by-three matrices.\n",
"\n",
"This implies that dimension sizes are listed from **low to high** in the shape tuple. \n",
"\n",
"The second basic property of an array is its **dtype**. Contrary to list elements, numpy array elements are (typically) all of the same type."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# The type of a numpy array is always... numpy.ndarray\n",
"arr = np.array([[1,2,3],[4,5,6]])\n",
"print type(arr)\n",
"\n",
"# So, let's do a computation\n",
"print arr/2\n",
"\n",
"# Apparently we're doing our computations on integer elements!\n",
"# How do we find out?\n",
"print arr.dtype"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# And how do we fix this?\n",
"arr = arr.astype('float') # Note: this is not an in-place function!\n",
"print arr.dtype\n",
"print arr/2"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Alternatively, we could have defined our dtype better from the start\n",
"arr = np.array([[1,2,3],[4,5,6]], dtype='float')\n",
"print arr.dtype\n",
"\n",
"arr = np.array([[1.,2.,3.],[4.,5.,6.]])\n",
"print arr.dtype"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To summarize, any numpy array is of the data type `numpy.ndarray`, but the data type of its elements can be set separately as its `dtype` member variable. It's a good idea to explicitly define the `dtype` when you create the array."
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Indexing and slicing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same indexing and slicing operations used on lists can also be used on Numpy arrays. It is possible to perform computations on slices directly. \n",
"\n",
"But pay attention - Numpy arrays must have an **identical shape** if you want to combine them. There are some exceptions though, the most common being **scalar** operands."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"arr = np.array([[1,2,3],[4,5,6],[7,8,9]], dtype='float')\n",
"\n",
"# Indexing and slicing\n",
"print arr[0,0] # or: arr[0][0]\n",
"print arr[:-1,0]"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Elementwise computations on slices\n",
"# Remember, the LAST dimension is the INNER dimension\n",
"print arr[:,0] * arr[:,1]\n",
"print arr[0,:] * arr[1,:]\n",
"\n",
"# Note that you could never slice across rows like this in a nested list!"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# This doesn't work\n",
"# print arr[1:,0] * arr[:,1]\n",
"\n",
"# And here's why:\n",
"print arr[1:,0].shape, arr[:,1].shape"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# This however does work. You can always use scalars as the other operand.\n",
"print arr[:,0] * arr[2,2]\n",
"\n",
"# Or, similarly:\n",
"print arr[:,0] * 9."
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an **exercise**, can you create a 2x3 array containing the column-wise and the row-wise means of the original matrix, respectively? Without using a for-loop."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# EXERCISE 3: Create a 2x3 array containing the column-wise and the row-wise means of the original matrix\n",
"# Do not use a for-loop, and also do not use the np.mean() function for now.\n",
"arr = np.array([[1,2,3],[4,5,6],[7,8,9]], dtype='float')"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This works, but it is still a bit clumsy. We will learn more efficient methods below."
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Filling and manipulating arrays"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Creating** arrays mustn't always be done by hand. The following functions are particularly common. Again, they are analogous to what you do in MATLAB."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# 1-D array, filled with zeros\n",
"arr = np.zeros(3)\n",
"print arr\n",
"\n",
"# Multidimensional array of a given shape, filled with ones\n",
"# This automatically allows you to fill arrays with /any/ value\n",
"arr = np.ones((3,2))*5\n",
"print arr\n",
"\n",
"# Sequence from 1 to AND NOT including 16, in steps of 3\n",
"# Note that using a float input makes the dtype a float as well\n",
"# This is equivalent to np.array(range(1.,16.,3))\n",
"arr = np.arange(1.,16.,3)\n",
"print arr\n",
"\n",
"# Sequence from 1 to AND including 16, in 3 steps\n",
"# This always returns an array with dtype float\n",
"arr = np.linspace(1,16,3)\n",
"print arr"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Array of random numbers between 0 and 1, of a given shape\n",
"# Note that the inputs here are separate integers, not a tuple\n",
"arr = np.random.rand(5,2)\n",
"print arr\n",
"\n",
"# Array of random integers from 0 to AND NOT including 10, of a given shape\n",
"# Here the shape is defined as a tuple again\n",
"arr = np.random.randint(0,10,(5,2))\n",
"print arr"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once we have an array, we may wish to **replicate** it to create a larger array. \n",
"\n",
"Here the concept of an **axis** becomes important, i.e., along which of the dimensions of the array are you working? **axis=0** corresponds to the **first** dimension of the shape tuple, **axis=-1** always corresponds to the **last** dimension (inner dimension; columns in case of 2D, layers in case of 3D)."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"arr0 = np.array([[1,2],[3,4]])\n",
"print arr0\n",
"\n",
"# 'repeat' replicates elements along a given axis\n",
"# Each element is replicated directly after itself\n",
"arr = np.repeat(arr0, 3, axis=-1)\n",
"print arr\n",
"\n",
"# We may even specify the number of times each element should be repeated\n",
"# The length of the tuple should correspond to the dimension length\n",
"arr = np.repeat(arr0, (2,4), axis=0)\n",
"print arr"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"print arr0\n",
"\n",
"# 'tile' replicates the array as a whole\n",
"# Use a tuple to specify the number of tilings along each dimensions\n",
"arr = np.tile(arr0, (2,4))\n",
"print arr"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# 'meshgrid' is commonly used to create X and Y coordinate arrays from two vectors\n",
"# where each array contains the X or Y coordinates corresponding to a given pixel in an image\n",
"x = np.arange(10)\n",
"y = np.arange(5)\n",
"print x,y\n",
"\n",
"arrx, arry = np.meshgrid(x,y)\n",
"print arrx\n",
"print arry"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Concatenating** an array allows you to make several arrays into one."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"arr0 = np.array([[1,2],[3,4]])\n",
"arr1 = np.array([[5,6],[7,8]])\n",
"\n",
"# 'concatenate' requires an axis to perform its operation on\n",
"# The original arrays should be put in a tuple\n",
"arr = np.concatenate((arr0,arr1), axis=0)\n",
"print arr # as new rows\n",
"\n",
"arr = np.concatenate((arr0,arr1), axis=1)\n",
"print arr # as new columns"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Suppose we want to create a 3-D matrix from them,\n",
"# we have to create them as being three-dimensional\n",
"# (what happens if you don't?)\n",
"arr0 = np.array([[[1],[2]],[[3],[4]]])\n",
"arr1 = np.array([[[5],[6]],[[7],[8]]])\n",
"print arr0.shape, arr1.shape\n",
"arr = np.concatenate((arr0,arr1),axis=2)\n",
"print arr"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# hstack, vstack, and dstack are short-hand functions\n",
"# which will automatically create these 'missing' dimensions\n",
"arr0 = np.array([[1,2],[3,4]])\n",
"arr1 = np.array([[5,6],[7,8]])\n",
"\n",
"# vstack() concatenates rows\n",
"arr = np.vstack((arr0,arr1))\n",
"print arr\n",
"\n",
"# hstack() concatenates columns\n",
"arr = np.hstack((arr0,arr1))\n",
"print arr\n",
"\n",
"# dstack() concatenates 2D arrays into 3D arrays\n",
"arr = np.dstack((arr0,arr1))\n",
"print arr"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Their counterparts are the hsplit, vsplit, dsplit functions\n",
"# They take a second argument: how do you want to split\n",
"arr = np.random.rand(4,4)\n",
"print arr\n",
"print '--'\n",
"\n",
"# Splitting int equal parts \n",
"arr0,arr1 = hsplit(arr,2)\n",
"print arr0\n",
"print arr1\n",
"print '--'\n",
"\n",
"# Or, specify exact split points\n",
"arr0,arr1,arr2 = hsplit(arr,(1,2))\n",
"print arr0\n",
"print arr1\n",
"print arr2"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we can easily **reshape** and **transpose** arrays"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"arr0 = np.arange(10)\n",
"print arr0\n",
"print '--'\n",
"\n",
"# 'reshape' does exactly what you would expect\n",
"# Make sure though that the total number of elements remains the same\n",
"arr = np.reshape(arr0,(5,2))\n",
"print arr\n",
"\n",
"# You can also leave one dimension blank by using -1 as a value\n",
"# Numpy will then compute for you how long this dimension should be\n",
"arr = np.reshape(arr0,(-1,5))\n",
"print arr\n",
"print '--'\n",
"\n",
"# 'transpose' allows you to switch around dimensions\n",
"# A tuple specifies the new order of dimensions\n",
"arr = np.transpose(arr,(1,0))\n",
"print arr\n",
"\n",
"# For simply transposing rows and columns, there is the short-hand form .T\n",
"arr = arr.T\n",
"print arr\n",
"print '--'\n",
"\n",
"# 'flatten' creates a 1D array out of everything\n",
"arr = arr.flatten()\n",
"print arr"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Time for an exercise!** Can you write your own 'meshgrid3d' function, which returns the resulting 2D arrays as two layers of a 3D matrix, instead of two separate 2D arrays?"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# EXERCISE 4: Create your own meshgrid3d function\n",
"# Like np.meshgrid(), it should take two vectors and replicate them; one into columns, the other into rows\n",
"# Unlike np.meshgrid(), it should return them as a single 3D array rather than 2D arrays\n",
"# ...do not use the np.meshgrid() function\n",
"\n",
"def meshgrid3d(xvec, yvec):\n",
" # fill in!\n",
"\n",
"xvec = np.arange(10)\n",
"yvec = np.arange(5)\n",
"xy = meshgrid3d(xvec, yvec)\n",
"print xy\n",
"print xy[:,:,0] # = first output of np.meshgrid()\n",
"print xy[:,:,1] # = second output of np.meshgrid()"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"A few useful functions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now handle arrays in any way we like, but we still don't know any operations to perform on them, other than the basic arithmetic operations. Luckily numpy implements a large collection of common computations. This is a very short review of some useful functions."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"arr = np.random.rand(5)\n",
"print arr\n",
"\n",
"# Sorting and shuffling\n",
"res = arr.sort() \n",
"print arr # in-place!!!\n",
"print res\n",
"\n",
"res = np.random.shuffle(arr) \n",
"print arr # in-place!!!\n",
"print res"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Min, max, mean, standard deviation\n",
"arr = np.random.rand(5)\n",
"print arr\n",
"\n",
"mn = np.min(arr)\n",
"mx = np.max(arr)\n",
"print mn, mx\n",
"\n",
"mu = np.mean(arr)\n",
"sigma = np.std(arr)\n",
"print mu, sigma"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Some functions allow you to specify an axis to work along, in case of multidimensional arrays\n",
"arr2d = np.random.rand(3,5)\n",
"print arr2d\n",
"\n",
"print np.mean(arr2d, axis=0)\n",
"print np.mean(arr2d, axis=1)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Trigonometric functions\n",
"# Note: Numpy works with radians units, not degrees\n",
"arr = np.random.rand(5)\n",
"print arr\n",
"\n",
"sn = np.sin(arr*2*np.pi)\n",
"cs = np.cos(arr*2*np.pi)\n",
"print sn\n",
"print cs"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Exponents and logarithms\n",
"arr = np.random.rand(5)\n",
"print arr\n",
"\n",
"xp = np.exp(arr)\n",
"print xp\n",
"print np.log(xp)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Rounding\n",
"arr = np.random.rand(5)\n",
"print arr\n",
"\n",
"print arr*5\n",
"print np.round(arr*5)\n",
"print np.floor(arr*5)\n",
"print np.ceil(arr*5)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A complete list of all numpy functions can be found at [the Numpy website](http://docs.scipy.org/doc/numpy/reference/). Or, a google search for 'numpy tangens', 'numpy median' or similar will usually get you there as well."
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"A small exercise"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Remember how you were asked to create a 2x3 array containing the column-wise and the row-wise means of a matrix above? We now have the knowledge to do this far shorter. Use a concatenation function and a statistical function to obtain the same thing!"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# EXERCISE 5: Make a better version of Exercise 3 with what you've just learned\n",
"arr = np.array([[1,2,3],[4,5,6],[7,8,9]], dtype='float')\n",
"\n",
"# What we had:\n",
"print np.array([(arr[:,0]+arr[:,1]+arr[:,2])/3,(arr[0,:]+arr[1,:]+arr[2,:])/3])\n",
"\n",
"# Now the new version:"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"A bit harder: The Gabor"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A Gabor patch is the product of a sinusoidal grating and a Gaussian. If we ignore orientation and just create a vertically oriented Gabor, the grating luminance (bounded between -1 and 1) is created by:\n",
"\n",
"$grating = \\sin(xf)$\n",
"\n",
"where $x$ is the $x$ coordinate of a pixel, and $f$ is the frequency of the sine wave (how many peaks per $2 \\pi$ coordinate units). A simple 2D Gaussian luminance profile (bounded between 0 and 1) with its peak at coordinate $(0,0)$ and a variance of $1$ is given by:\n",
"\n",
"$gaussian = e^{-(x^2+y^2)/2}$\n",
"\n",
"where $x$ and $y$ are again the $x$ and $y$ coordinates of a pixel. The Gabor luminance (bounded between -1 and 1) for any pixel then equals:\n",
"\n",
"$gabor = grating \\times gaussian$\n",
"\n",
"To visualize this, these are the grating, the Gaussian, and the Gabor, respectively (at maximal contrast):\n",
"\n",
"\n",
"Now you try to create a 100x100 pixel image of a Gabor. Use $x$ and $y$ coordinate values ranging from $-\\pi$ to $\\pi$, and a frequency of 10 for a good-looking result."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# EXERCISE 6: Create a Gabor patch of 100 by 100 pixels\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# Step 1: Define the 1D coordinate values\n",
"# Tip: use 100 equally spaced values between -np.pi and np.pi\n",
"\n",
"# Step 2: Create the 2D x and y coordinate arrays\n",
"# Tip: use np.meshgrid()\n",
"\n",
"# Step 3: Create the grating\n",
"# Tip: Use a frequency of 10\n",
"\n",
"# Step 4: Create the Gaussian\n",
"# Tip: use np.exp() to compute a power of e\n",
"\n",
"# Step 5: Create the Gabor\n",
"\n",
"# Visualize your result\n",
"# (we will discuss how this works later)\n",
"plt.figure(figsize=(15,5))\n",
"plt.subplot(131)\n",
"plt.imshow(grating, cmap='gray')\n",
"plt.subplot(132)\n",
"plt.imshow(gaussian, cmap='gray')\n",
"plt.subplot(133)\n",
"plt.imshow(gabor, cmap='gray')\n",
"plt.show()"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Boolean indexing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The dtype of a Numpy array can also be boolean, that is, `True` or `False`. It is then particularly convenient that given an array of the same shape, these boolean arrays can be used to **index other arrays**."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Check whether each element of a 2x2 array is greater than 0.5\n",
"arr = np.random.rand(2,2)\n",
"print arr\n",
"res = arr>0.5\n",
"print res\n",
"print '--'\n",
" \n",
"# Analogously, check it against each element of a second 2x2 array\n",
"arr2 = np.random.rand(2,2)\n",
"print arr2\n",
"res = arr>arr2\n",
"print res"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# We can use these boolean arrays as indices into other arrays!\n",
"\n",
"# Add 0.5 to any element smaller than 0.5\n",
"arr = np.random.rand(2,2)\n",
"print arr\n",
"res = arr<0.5\n",
"print res\n",
"arr[res] = arr[res]+0.5\n",
"print arr\n",
"\n",
"# Or, shorter:\n",
"arr[arr<0.5] = arr[arr<0.5] + 0.5\n",
"\n",
"# Or, even shorter:\n",
"arr[arr<0.5] += 0.5"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"While it is possible to do multiplication and addition on boolean values (this will convert them to ones and zeros), the proper way of doing elementwise boolean logic is to use **boolean operators**: **and, or, xor, not**."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"arr = np.array([[1,2,3],[4,5,6]])\n",
"\n",
"# The short-hand forms for elementwise boolean operators are: & | ~ ^\n",
"# Use parentheses around such expressions\n",
"res = (arr<4) & (arr>1)\n",
"print res\n",
"print '--'\n",
"\n",
"res = (arr<2) | (arr==5)\n",
"print res\n",
"print '--'\n",
"\n",
"res = (arr>3) & ~(arr==6)\n",
"print res\n",
"print '--'\n",
"\n",
"res = (arr>3) ^ (arr<5)\n",
"print res"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# To convert boolean indices to normal integer indices, use the 'nonzero' function\n",
"print res\n",
"print np.nonzero(res)\n",
"print '--'\n",
"\n",
"# Separate row and column indices\n",
"print np.nonzero(res)[0]\n",
"print np.nonzero(res)[1]\n",
"print '--'\n",
"\n",
"# Or stack and transpose them to get index pairs\n",
"pairs = np.vstack(np.nonzero(res)).T\n",
"print pairs"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Vectorizing a simulation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Numpy is excellent at making programs that involve iterative operations more efficient. This then requires you to re-imagine the problem as an array of values, rather than values that change with each loop iteration.\n",
"\n",
"For instance, imagine the following situation: You throw a die continuously until you either encounter the sequence \u2018123\u2019 or \u2018111\u2019. Which one can be expected to occur sooner? This could be proven mathematically, but in practice it is often faster to do a simulation instead of working out an analytical solution. \n",
"\n",
"We could just use two nested for-loops:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"\n",
"# We will keep track of the sum of first occurence positions,\n",
"# as well as the number of positions entered into this sum.\n",
"# This way we can compute the mean.\n",
"sum111 = 0.\n",
"n111 = 0.\n",
"sum123 = 0.\n",
"n123 = 0.\n",
"\n",
"for sim in range(5000):\n",
" \n",
" # Keep track of how far along we are in finding a given pattern\n",
" d111 = 0\n",
" d123 = 0\n",
" \n",
" for throw in range(2000):\n",
" \n",
" # Throw a die\n",
" die = np.random.randint(1,7)\n",
" \n",
" # 111 case\n",
" if d111==3:\n",
" pass\n",
" elif die==1 and d111==0:\n",
" d111 = 1\n",
" elif die==1 and d111==1:\n",
" d111 = 2\n",
" elif die==1 and d111==2:\n",
" d111 = 3\n",
" sum111 = sum111 + throw\n",
" n111 = n111 + 1 \n",
" else:\n",
" d111 = 0\n",
" \n",
" # 123 case\n",
" if d123==3:\n",
" pass\n",
" elif die==1:\n",
" d123 = 1\n",
" elif die==2 and d123==1:\n",
" d123 = 2\n",
" elif die==3 and d123==2:\n",
" d123 = 3\n",
" sum123 = sum123 + throw\n",
" n123 = n123 + 1\n",
" else:\n",
" d123 = 0\n",
" \n",
" # Don't continue if both have been found\n",
" if d111==3 and d123==3:\n",
" break\n",
"\n",
"# Compute the averages\n",
"avg111 = sum111/n111\n",
"avg123 = sum123/n123\n",
"print avg111, avg123 \n",
"\n",
"# ...can you spot the crucial difference between both patterns?"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However this is inefficient and makes the code unwieldy. Vectorized solutions are usually preferred.\n",
"\n",
"Try to run these 5000 simulations using Numpy, **without any loops**, and see whether the result is the same. Use a maximal die-roll sequence length of 2000, and just assume that both '123' and '111' will occur before the end of any sequence.\n",
"\n",
"You will have to make use of 2D arrays and boolean logic. A quick solution to find the first occurence in a boolean array is to use **argmax** - use the only Numpy documentation to find out how to use it.\n",
"\n",
"**Vectorizing problems is a crucial skill in scientific computing!**"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# EXERCISE 7: Vectorize the above program\n",
"\n",
"# You get these lines for free...\n",
"import numpy as np\n",
"throws = np.random.randint(1,7,(5000,2000))\n",
"one = (throws==1)\n",
"two = (throws==2)\n",
"three = (throws==3)\n",
"\n",
"# Find out where all the 111 and 123 sequences occur\n",
"find111 = \n",
"find123 = \n",
"\n",
"# Then at what index they /first/ occur in each sequence\n",
"first111 = \n",
"first123 = \n",
"\n",
"# Compute the average first occurence location for both situations\n",
"avg111 = \n",
"avg123 = \n",
"\n",
"# Print the result\n",
"print avg111, avg123"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this particular example, the nested for-loop solution does have the advantage that it can 'break' out of the die throwing sequence when first occurences of both patterns have been found, whereas Numpy will always generate complete sequences of 2000 rolls. \n",
"\n",
"Remove the `break` statement in the first solution to see what the speed difference would have been if both programs were truly doing the same thing!"
]
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"PIL: the Python Imaging Library"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As vision scientists, images are a natural stimulus to work with. The Python Imaging Library will help us handle images, similar to the Image Processing toolbox in MATLAB. Note that PIL itself has nowadays been superseded by **Pillow**, for which an excellent documentation can be found [here](http://pillow.readthedocs.org/). The module to import is however still called 'PIL'. In practice, we will mostly use its **Image** module."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from PIL import Image"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Loading and showing images"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The image we will use for this example code should be in the same directory as this file. But really, any color image will do, as long as you put it in the same directory as this notebook, and change the filename string in the code to correspond with the actual image filename."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Opening an image is simple enough:\n",
"# Construct an Image object with the filename as an argument\n",
"im = Image.open('python.jpg')\n",
"\n",
"# It is now represented as an object of the 'JpegImageFile' type\n",
"print im\n",
"\n",
"# There are some useful member variables we can inspect here\n",
"print im.format # format in which the file was saved\n",
"print im.size # pixel dimensions\n",
"print im.mode # luminance/color model used\n",
"\n",
"# We can even display it\n",
"# NOTE this is not perfect; meant for debugging\n",
"im.show()"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If the `im.show()` call does not work well on your system, use this function instead to show images in a separate window. Note, you must always close the window before you can continue using the notebook. \n",
"\n",
"(`Tkinter` is a package to write graphical user interfaces in Python, we will not discuss it here)"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Alternative quick-show method\n",
"from Tkinter import Tk, Button\n",
"from PIL import ImageTk\n",
"\n",
"def alt_show(im):\n",
" win = Tk()\n",
" tkimg = ImageTk.PhotoImage(im)\n",
" Button(image=tkimg).pack()\n",
" win.mainloop()\n",
"\n",
"alt_show(im)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once we have opened the image in PIL, we can convert it to a Numpy object."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# We can convert PIL images to an ndarray!\n",
"arr = np.array(im)\n",
"print arr.dtype # uint8 = unsigned 8-bit integer (values 0-255 only)\n",
"print arr.shape # Why do we have three layers?\n",
"\n",
"# Let's make it a float-type for doing computations\n",
"arr = arr.astype('float')\n",
"print arr.dtype\n",
"\n",
"# This opens up unlimited possibilities for image processing!\n",
"# For instance, let's make this a grayscale image, and add white noise\n",
"max_noise = 50\n",
"arr = np.mean(arr,-1)\n",
"noise = (np.random.rand(arr.shape[0],arr.shape[1])-0.5)*2\n",
"arr = arr + noise*max_noise\n",
"\n",
"# Make sure we don't exceed the 0-255 limits of a uint8\n",
"arr[arr<0] = 0\n",
"arr[arr>255] = 255"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The conversion back to PIL is easy as well"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# When going back to PIL, it's a good idea to explicitly \n",
"# specify the right dtype and the mode.\n",
"# Because automatic conversions might mess things up\n",
"arr = arr.astype('uint8')\n",
"imn = Image.fromarray(arr, mode='L')\n",
"print imn.format\n",
"print imn.size\n",
"print imn.mode # L = greyscale\n",
"imn.show() # or use alt_show() from above if show() doesn't work well for you\n",
"\n",
"# Note that /any/ 2D or 2Dx3 numpy array filled with values between 0 and 255\n",
"# can be converted to an image object in this way"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Resizing, rotating, cropping and converting"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The main operations of the PIL Image module you will probably use, are its **resizing** and **conversion** capabilities."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"im = Image.open('python.jpg')\n",
"\n",
"# Make the image smaller\n",
"ims = im.resize((800,600))\n",
"ims.show()\n",
"\n",
"# Or you could even make it larger\n",
"# The resample argument allows you to specify the method used\n",
"iml = im.resize((1280,1024), resample=Image.BILINEAR)\n",
"iml.show()"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Rotation is similar (unit=degrees)\n",
"imr = im.rotate(10, resample=Image.BILINEAR, expand=False)\n",
"imr.show()\n",
"\n",
"# If we want to lose the black corners, we can crop (unit=pixels)\n",
"imr = imr.crop((100,100,924,668))\n",
"imr.show()"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# 'convert' allows conversion between different color models\n",
"# The most important here is between 'L' (luminance) and 'RGB' (color)\n",
"imbw = im.convert('L')\n",
"imbw.show()\n",
"print imbw.mode\n",
"\n",
"imrgb = imbw.convert('RGB')\n",
"imrgb.show()\n",
"print imrgb.mode\n",
"\n",
"# Note that the grayscale conversion of PIL is more sophisticated\n",
"# than simply averaging the three layers in Numpy (it is a weighted average)\n",
"\n",
"# Also note that the color information is effectively lost after converting to L"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Advanced"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The **ImageFilter** module implements several types of filters to execute on any image. You can also define your own."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from PIL import Image, ImageFilter\n",
"\n",
"im = Image.open('python.jpg')\n",
"imbw = im.convert('L')\n",
"\n",
"# Contour detection filter\n",
"imf = imbw.filter(ImageFilter.CONTOUR)\n",
"imf.show()\n",
"\n",
"# Blurring filter\n",
"imf = imbw.filter(ImageFilter.GaussianBlur(radius=3))\n",
"imf.show()"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Similarly, you can import the **ImageDraw** module to draw shapes and text onto an image."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from PIL import Image, ImageDraw\n",
"\n",
"im = Image.open('python.jpg')\n",
"\n",
"# You need to attach a drawing object to the image first\n",
"imd = ImageDraw.Draw(im)\n",
"\n",
"# Then you work on this object\n",
"imd.rectangle([10,10,100,100], fill=(255,0,0))\n",
"imd.line([(200,200),(200,600)], width=10, fill=(0,0,255))\n",
"imd.text([500,500], 'Python', fill=(0,255,0))\n",
"\n",
"# The results are automatically applied to the Image object\n",
"im.show()"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Saving"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, you can of course save these image objects back to a file on the disk."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# PIL will figure out the file type by the extension\n",
"im.save('python.bmp')\n",
"\n",
"# There are also further options, like compression quality (0-100)\n",
"im.save('python_bad.jpg', quality=5)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Exercise"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We mentioned that the conversion to grayscale in PIL is not just a simple averaging of the RGB layers. Can you visualize as an image what the difference in result looks like, when comparing a simple averaging to a PIL grayscale conversion?\n",
"\n",
"Pixels that are less luminant in the plain averaging method should be displayed in red, with a luminance depending on the size of the difference. Pixels that are more luminant when averaging in Numpy should similarly be displayed in green.\n",
"\n",
"**Hint: you will have to make use of Boolean indexing.**\n",
"\n",
"As an extra, try to maximize the contrast in your image, so that all values from 0-255 are used. \n",
"\n",
"As a second extra, save the result as PNG files of three different sizes (large, medium, small), at respectively the full image resolution, half of the image size, and a quarter of the image size."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# EXERCISE 8: Visualize the difference between the PIL conversion to grayscale, and a simple averaging of RGB\n",
"# Display pixels where the average is LESS luminant in red, and where it is MORE luminant in shades green\n",
"# The luminance of these colors should correspond to the size of the difference\n",
"#\n",
"# Extra 1: Maximize the overall contrast in your image\n",
"#\n",
"# Extra 2: Save as three PNG files, of different sizes (large, medium, small)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Matplotlib"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"While PIL is useful for processing photographic images, it falls short for creating data plots and other kinds of schematic figures. **Matplotlib** offers a far more advanced solution for this, specifically through its **pyplot** module."
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Quick plots"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Common figures such as scatter plots, histograms and barcharts can be generated and manipulated very simply."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"from PIL import Image\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# As data for our plots, we will use the pixel values of the image\n",
"# Open image, convert to an array\n",
"im = Image.open('python.jpg')\n",
"im = im.resize((400,300))\n",
"arr = np.array(im, dtype='float')\n",
"\n",
"# Split the RGB layers and flatten them\n",
"R,G,B = np.dsplit(arr,3)\n",
"R = R.flatten()\n",
"G = G.flatten()\n",
"B = B.flatten()"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# QUICKPLOT 1: Correlation of luminances in the image\n",
"\n",
"# This works if you want to be very quick:\n",
"# (xb means blue crosses, .g are green dots) \n",
"plt.plot(R, B, 'xb')\n",
"plt.plot(R, G, '.g')"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# However we will take a slightly more disciplined approach here\n",
"# Note that Matplotlib wants colors expressed as 0-1 values instead of 0-255\n",
"\n",
"# Create a square figure\n",
"plt.figure(figsize=(5,5))\n",
"\n",
"# Plot both scatter clouds\n",
"# marker: self-explanatory\n",
"# linestyle: 'None' because we want no line\n",
"# color: RGB triplet with values 0-1\n",
"plt.plot(R, B, marker='x', linestyle='None', color=(0,0,0.6))\n",
"plt.plot(R, G, marker='.', linestyle='None', color=(0,0.35,0))\n",
"\n",
"# Make the axis scales equal, and name them\n",
"plt.axis([0,255,0,255])\n",
"plt.xlabel('Red value')\n",
"plt.ylabel('Green/Blue value')\n",
"\n",
"# Show the result\n",
"plt.show()"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# QUICKPLOT 2: Histogram of 'red' values in the image\n",
"plt.hist(R)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# ...and now a nicer version\n",
"\n",
"# Make a non-square figure\n",
"plt.figure(figsize=(7,5))\n",
"\n",
"# Make a histogram with 25 red bins\n",
"# Here we simply use the abbreviation 'r' for red\n",
"plt.hist(R, bins=25, color='r')\n",
"\n",
"# Set the X axis limits and label\n",
"plt.xlim([0,255])\n",
"plt.xlabel('Red value', size=16)\n",
"\n",
"# Remove the Y ticks and labels by setting them to an empty list\n",
"plt.yticks([])\n",
"\n",
"# Remove the top ticks by specifying the 'top' argument\n",
"plt.tick_params(top=False)\n",
"\n",
"# Add two vertical lines for the mean and the median\n",
"plt.axvline(np.mean(R), color='g', linewidth=3, label='mean')\n",
"plt.axvline(np.median(R), color='b', linewidth=1, linestyle=':', label='median')\n",
"\n",
"# Generate a legend based on the label= arguments\n",
"plt.legend(loc=2)\n",
"\n",
"# Show the plot\n",
"plt.show()"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# QUICKPLOT 3: Bar chart of mean+std of RGB values\n",
"plt.bar([0,1,2],[np.mean(R), np.mean(G), np.mean(B)], yerr=[np.std(R), np.std(G), np.std(B)])"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# ...and now a nicer version\n",
"\n",
"# Make a non-square-figure\n",
"plt.figure(figsize=(7,5))\n",
"\n",
"# Plot the bars with various options\n",
"# x location where bars start, y height of bars\n",
"# yerr: data for error bars\n",
"# width: width of the bars\n",
"# color: surface color of bars\n",
"# ecolor: color of error bars ('k' means black)\n",
"plt.bar([0,1,2], [np.mean(R), np.mean(G), np.mean(B)], \n",
" yerr=[np.std(R), np.std(G), np.std(B)],\n",
" width=0.75,\n",
" color=['r','g','b'], \n",
" ecolor='k')\n",
"\n",
"# Set the X-axis limits and tick labels\n",
"plt.xlim((-0.25,3.))\n",
"plt.xticks(np.array([0,1,2])+0.75/2, ['Red','Green','Blue'], size=16)\n",
"\n",
"# Remove all X-axis ticks by setting their length to 0\n",
"plt.tick_params(length=0)\n",
"\n",
"# Set a figure title\n",
"plt.title('RGB Color Channels', size=16)\n",
"\n",
"# Show the figure\n",
"plt.show()"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A full documentation of all these pyplot commands and options can be found [here](http://matplotlib.org/api/pyplot_summary.html). If you use Matplotlib, you will be consulting this page a lot!"
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Saving to a file"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Saving to a file is easy enough, using the **savefig()** function. However, there are some caveats, depending on the exact environment you are using. You have to use it BEFORE calling `plt.show()` and, in case of this notebook, within the same codebox. The reason for this is that Matplotlib is automatically deciding for you which plot commands belong to the same figure based on these criteria."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# So, copy-paste this line into the box above, before the plt.show() command\n",
"plt.savefig('bar.png') \n",
"\n",
"# There are some further formatting options possible, e.g.\n",
"plt.savefig('bar.svg', dpi=300, bbox_inches=('tight'), pad_inches=(1,1), facecolor=(0.8,0.8,0.8))"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Visualizing arrays"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Like PIL, Matplotlib is capable of displaying the contents of 2D Numpy arrays. The primary method is **imshow()**"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# A simple grayscale luminance map\n",
"# cmap: colormap used to display the values\n",
"plt.figure(figsize=(5,5))\n",
"plt.imshow(np.mean(arr,2), cmap='gray')\n",
"plt.show()\n",
"\n",
"# Importantly and contrary to PIL, imshow luminances are by default relative\n",
"# That is, the values are always rescaled to 0-255 first (maximum contrast)\n",
"# Moreover, colormaps other than grayscale can be used\n",
"plt.figure(figsize=(5,5))\n",
"plt.imshow(np.mean(arr,2)+100, cmap='jet') # or hot, hsv, cool,...\n",
"plt.show() # as you can see, adding 100 didn't make a difference here"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Multi-panel figures"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we noted, Matplotlib is behind the scenes keeping track of what your current figure is. This is often convenient, but in some cases you want to keep explicit control of what figure you're working on. For this, we will have to make a distinction between **Figure** and **Axes** objects."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# 'Figure' objects are returned by the plt.figure() command\n",
"fig = plt.figure(figsize=(7,5))\n",
"print type(fig)\n",
"\n",
"# Axes objects are the /actual/ plots within the figure\n",
"# Create them using the add_axes() method of the figure object\n",
"# The input coordinates are relative (left, bottom, width, height)\n",
"ax0 = fig.add_axes([0.1,0.1,0.4,0.7], xlabel='The X Axis')\n",
"ax1 = fig.add_axes([0.2,0.2,0.5,0.2], axisbg='gray')\n",
"ax2 = fig.add_axes([0.4,0.5,0.4,0.4], projection='polar')\n",
"print type(ax0), type(ax1), type(ax2)\n",
"\n",
"# This allows you to execute functions like savefig() directly on the figure object\n",
"# This resolves Matplotlib's confusion of what the current figure is, when using plt.savefig()\n",
"fig.savefig('fig.png')\n",
"\n",
"# It also allows you to add text to the figure as a whole, across the different axes objects\n",
"fig.text(0.5, 0.5, 'splatter', color='r')\n",
"\n",
"# The overall figure title can be set separate from the individual plot titles\n",
"fig.suptitle('What a mess', size=18)\n",
"\n",
"# show() is actually a figure method as well\n",
"# It just gets 'forwarded' to what is thought to be the current figure if you use plt.show()\n",
"fig.show()"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For a full list of the Figure methods and options, go [here](http://matplotlib.org/api/figure_api.html#matplotlib.figure.Figure)."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Create a new figure\n",
"fig = plt.figure(figsize=(15,10))\n",
"\n",
"# As we saw, many of the axes properties can already be set at their creation\n",
"ax0 = fig.add_axes([0.,0.,0.25,0.25], xticks=(0.1,0.5,0.9), xticklabels=('one','thro','twee'))\n",
"ax1 = fig.add_axes([0.3,0.,0.25,0.25], xscale='log', ylim=(0,0.5))\n",
"ax2 = fig.add_axes([0.6,0.,0.25,0.25])\n",
"\n",
"# Once you have the axes object though, there are further methods available\n",
"# This includes many of the top-level pyplot functions\n",
"# If you use for instance plt.plot(), Matplotlib is actually 'forwarding' this\n",
"# to an Axes.plot() call on the current Axes object\n",
"R.sort()\n",
"G.sort()\n",
"B.sort()\n",
"ax2.plot(R, color='r', linestyle='-', marker='None') # plot directly to an Axes object of choice\n",
"plt.plot(G, color='g', linestyle='-', marker='None') # plt.plot() just plots to the last created Axes object\n",
"ax2.plot(B, color='b', linestyle='-', marker='None')\n",
"\n",
"# Other top-level pyplot functions are simply renamed to 'set_' functions here\n",
"ax1.set_xticks([])\n",
"plt.yticks([])\n",
"\n",
"# Show the figure\n",
"fig.show()"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The full methods and options of Axes can be found [here](http://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Clearly, when making a multi-panel figure, we are actually creating a single **Figure** object with multiple **Axes** objects attached to it. Having to set the Axes sizes manually is annoying though. Luckily, the **subplot()** method can handle much of this automatically."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Create a new figure\n",
"fig = plt.figure(figsize=(15,5))\n",
"\n",
"# Specify the LAYOUT of the subplots (rows,columns)\n",
"# as well as the CURRENT Axes you want to work on\n",
"ax0 = fig.add_subplot(231)\n",
"\n",
"# Equivalent top-level call on the current figure\n",
"# It is also possible to create several subplots at once using plt.subplots()\n",
"ax1 = plt.subplot(232) \n",
"\n",
"# Optional arguments are similar to those of add_axes()\n",
"ax2 = fig.add_subplot(233, title='three') \n",
"\n",
"# We can use these Axes object as before\n",
"ax3 = fig.add_subplot(234)\n",
"ax3.plot(R, 'r-') \n",
"ax3.set_xticks([])\n",
"ax3.set_yticks([])\n",
"\n",
"# We skipped the fifth subplot, and create only the 6th\n",
"ax5 = fig.add_subplot(236, projection='polar')\n",
"\n",
"# We can adjust the spacings afterwards\n",
"fig.subplots_adjust(hspace=0.4)\n",
"\n",
"# And even make room in the figure for a plot that doesn't fit the grid\n",
"fig.subplots_adjust(right=0.5)\n",
"ax6 = fig.add_axes([0.55,0.1,0.3,0.8])\n",
"\n",
"# Show the figure\n",
"fig.show()"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Exercise: Function plots"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create a figure with a 2:1 aspect ratio, containing two subplots, one above the other. The TOP figure should plot one full cycle of a sine wave, that is $y=sin(x)$. Use $0$ to $2\\pi$ as values on the X axis. On the same scale, the BOTTOM figure should plot $y=sin(x^2)$ instead. Tweak your figure until you think it looks good."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# EXERCISE 9: Plot y=sin(x) and y=sin(x^2) in two separate subplots, one above the other\n",
"# Let x range from 0 to 2*pi"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Finer figure control"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you are not satisfied with the output of these general plotting functions, despite all the options they offer, you can start fiddling with the details manually.\n",
"\n",
"First, many figure elements can be **manually added** through top-level or Axes functions:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# This uses the result of the exercise above\n",
"# You have to copy-paste it into the same code-box, before the fig.show()\n",
"\n",
"# Add horizontal lines\n",
"ax0.axhline(0, color='g')\n",
"ax0.axhline(0.5, color='gray', linestyle=':')\n",
"ax0.axhline(-0.5, color='gray', linestyle=':')\n",
"ax1.axhline(0, color='g')\n",
"ax1.axhline(0.5, color='gray', linestyle=':')\n",
"ax1.axhline(-0.5, color='gray', linestyle=':')\n",
"\n",
"# Add text to the plots\n",
"ax0.text(0.1,-0.9,'$y = sin(x)$', size=16) # math mode for proper formula formatting!\n",
"ax1.text(0.1,-0.9,'$y = sin(x^2)$', size=16)\n",
"\n",
"# Annotate certain points with a value\n",
"for x_an in np.linspace(0,2*np.pi,9):\n",
" ax0.annotate(str(round(sin(x_an),2)),(x_an,sin(x_an)))\n",
" \n",
"# Add an arrow (x,y,xlength,ylength)\n",
"ax0.arrow(np.pi-0.5,-0.5,0.5,0.5, head_width=0.1, length_includes_head=True)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Second, all basic elements like lines, polygons and the individual axis lines are customizable **objects in their own right**, attached to a specific Axes object. They can be retrieved, manipulated, created from scratch, and added to existing Axes objects."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# This uses the result of the exercise above\n",
"# You have to copy-paste it into the same code-box, before the fig.show()\n",
"\n",
"# For instance, fetch the X axis\n",
"# XAxis objects have their own methods\n",
"xax = ax1.get_xaxis()\n",
"print type(xax)\n",
"\n",
"# These methods allow you to fetch the even smaller building blocks\n",
"# For instance, tick-lines are Line2D objects attached to the XAxis\n",
"xaxt = xax.get_majorticklines()\n",
"print len(xaxt)\n",
"\n",
"# Of which you can fetch AND change the properties\n",
"# Here we change just one tickline into a cross\n",
"print xaxt[6].get_color()\n",
"xaxt[6].set_color('g')\n",
"xaxt[6].set_marker('x')\n",
"xaxt[6].set_markersize(10)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# This uses the result of the exercise above\n",
"# You have to copy-paste it into the same code-box, before the fig.show()\n",
"\n",
"# Another example: fetch the lines in the plot\n",
"# Change the color, change the marker, and mark only every 100 points for one specific line\n",
"ln = ax0.get_lines()\n",
"print ln\n",
"ln[0].set_color('g')\n",
"ln[0].set_marker('o')\n",
"ln[0].set_markerfacecolor('b')\n",
"ln[0].set_markevery(100)\n",
"\n",
"# Finally, let's create a graphic element from scratch, that is not available as a top-level pyplot function\n",
"# And then attach it to existing Axes\n",
"# NOTE: we need to import something before we can create the ellipse like this. What should we import?\n",
"ell = matplotlib.patches.Ellipse((np.pi,0), 1., 1., color='r')\n",
"ax0.add_artist(ell)\n",
"ell.set_hatch('//')\n",
"ell.set_edgecolor('black')\n",
"ell.set_facecolor((0.9,0.9,0.9))"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Exercise: Add regression lines"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Take the scatterplot from the first example, and manually add a regression line to both the R-G and the R-B comparisons. Try not to use the plot() function for the regression line, but manually create a Line2D object instead, and attach it to the Axes.\n",
"\n",
"Useful functions: \n",
"- **np.polyfit(x,y,1)** performs a linear regression, returning slope and constant \n",
"- **plt.gca()** retrieves the current Axes object \n",
"- **matplotlib.lines.Line2D(x,y)** can create a new Line2D object from x and y coordinate vectors"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# EXERCISE 10: Add regression lines\n",
"import numpy as np\n",
"from PIL import Image\n",
"import matplotlib.pyplot as plt\n",
"import matplotlib.lines as lines\n",
"\n",
"# Open image, convert to an array\n",
"im = Image.open('python.jpg')\n",
"im = im.resize((400,300))\n",
"arr = np.array(im, dtype='float')\n",
"\n",
"# Split the RGB layers and flatten them\n",
"R,G,B = np.dsplit(arr,3)\n",
"R = R.flatten()\n",
"G = G.flatten()\n",
"B = B.flatten()\n",
"\n",
"# Do the plotting\n",
"plt.figure(figsize=(5,5))\n",
"plt.plot(R, B, marker='x', linestyle='None', color=(0,0,0.6))\n",
"plt.plot(R, G, marker='.', linestyle='None', color=(0,0.35,0))\n",
"\n",
"# Tweak the plot\n",
"plt.axis([0,255,0,255])\n",
"plt.xlabel('Red value')\n",
"plt.ylabel('Green/Blue value')\n",
"\n",
"# Fill in your code...\n",
"\n",
"# Show the result\n",
"plt.show()"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Scipy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Scipy** is a large library of scientific functions, covering for instance numerical integration, linear algebra, Fourier transforms, and interpolation algorithms. If you can't find the equivalent of your favorite MATLAB function in any of the previous three packages, Scipy is a good place to look. A full list of all submodules can be found [here](http://docs.scipy.org/doc/scipy/reference/).\n",
"\n",
"We will pick two useful modules from SciPy: **stats** and **fftpack**\n",
"\n",
"I will not give a lot of explanation here. I'll leave it up to you to navigate through the documentation, and find out how these functions work."
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Statistics"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"import scipy.stats as stats\n",
"\n",
"# Generate random numbers between 0 and 1\n",
"data = np.random.rand(30)\n",
"\n",
"# Do a t-test with a H0 for the mean of 0.4\n",
"t,p = stats.ttest_1samp(data,0.4)\n",
"print p\n",
"\n",
"# Generate another sample of random numbers, with mean 0.4\n",
"data2 = np.random.rand(30)-0.1\n",
"\n",
"# Do a t-test that these have the same mean\n",
"t,p = stats.ttest_ind(data, data2)\n",
"print p"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import scipy.stats as stats\n",
"\n",
"# Simulate the size of the F statistic when comparing three conditions\n",
"# Given a constant n, and an increasing true effect size. \n",
"true_effect = np.linspace(0,0.5,500)\n",
"n = 100\n",
"Fres = []\n",
"\n",
"# Draw random normally distributed samples for each condition, and do a one-way ANOVA\n",
"for eff in true_effect:\n",
" c1 = stats.norm.rvs(0,1,size=n)\n",
" c2 = stats.norm.rvs(eff,1,size=n)\n",
" c3 = stats.norm.rvs(2*eff,1,size=n)\n",
" F,p = stats.f_oneway(c1,c2,c3)\n",
" Fres.append(F)\n",
"\n",
"# Create the plot\n",
"plt.figure()\n",
"plt.plot(true_effect,Fres,'r*-')\n",
"plt.xlabel('True Effect')\n",
"plt.ylabel('F')\n",
"plt.show()"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import scipy.stats as stats\n",
"\n",
"# Compute the pdf and cdf of normal distributions, with increasing sd's\n",
"# Then plot them in different colors\n",
"# (of course, many other distributions are also available)\n",
"x = np.linspace(-5,5,1000)\n",
"sds = np.linspace(0.25,2.5,10)\n",
"cols = np.linspace(0.15,0.85,10)\n",
"\n",
"# Create the figure\n",
"fig = plt.figure(figsize=(10,5))\n",
"ax0 = fig.add_subplot(121)\n",
"ax1 = fig.add_subplot(122)\n",
"\n",
"# Compute the densities, and plot them\n",
"for i,sd in enumerate(sds):\n",
" y1 = stats.norm.pdf(x,0,sd)\n",
" y2 = stats.norm.cdf(x,0,sd)\n",
" ax0.plot(x,y1,color=cols[i]*np.array([1,0,0]))\n",
" ax1.plot(x,y2,color=cols[i]*np.array([0,1,0]))\n",
"\n",
"# Show the figure\n",
"plt.show()"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The **stats** module of SciPy contains more statistical distributions and further tests such as a Kruskall-Wallis test, Wilcoxon test, a Chi-Square test, a test for normality, and so forth. A full listing of functions is found [here](http://docs.scipy.org/doc/scipy/reference/stats.html).\n",
"\n",
"For serious statistical models however, you should be looking at the [**statsmodels**](http://statsmodels.sourceforge.net/) package, or the [**rpy**](http://rpy.sourceforge.net/) interfacing package, allowing R to be called from within Python."
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Fast Fourier Transform"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"FFT is commonly used to process or analyze images (as well as sound). Numpy has a FFT package, `numpy.fft`, but SciPy has its own set of functions as well in `scipy.fftpack`. Both are very similar, you can use whichever package you like. \n",
"\n",
"I will assume that you are familiar with the basic underlying theory. That is, that any periodic function can be described as a sum of sine-waves of different frequencies, amplitudes and phases. A **Fast Fourier Transform** allows you to do this very quickly for equally spaced samples from the function, returning a finite set of sinusoidal components with `n` equal to the number of samples, ordered by frequency.\n",
"\n",
"Let's do this for a simple 1D function."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"import scipy.fftpack as fft\n",
"\n",
"# The original data: a step function\n",
"data = np.zeros(200, dtype='float')\n",
"data[25:100] = 1\n",
"\n",
"# Decompose into sinusoidal components\n",
"# The result is a series of complex numbers as long as the data itself\n",
"res = fft.fft(data)\n",
"\n",
"# FREQUENCY is implied by the ordering, but can be retrieved as well\n",
"# It increases from 0 to the Nyquist frequency (0.5), followed by its reversed negative counterpart\n",
"# Note: in case of real input data, the FFT results will be symmetrical\n",
"freq = fft.fftfreq(data.size)\n",
"\n",
"# AMPLITUDE is given by np.abs() of the complex numbers\n",
"amp = np.abs(res)\n",
"\n",
"# PHASE is given by np.angle() of the complex numbers\n",
"phase = np.angle(res)\n",
"\n",
"# We can plot each component separately\n",
"plt.figure(figsize=(15,5))\n",
"plt.plot(data, 'k-', lw=3)\n",
"xs = np.linspace(0,data.size-1,data.size)*2*np.pi\n",
"for i in range(len(res)): \n",
" ys = np.cos(xs*freq[i]+phase[i]) * (amp[i]/data.size)\n",
" plt.plot(ys.real, 'r:', lw=1)\n",
"plt.show()\n",
"\n",
"# Can you then plot what the SUM of all these components looks like?"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Of course, there is a short-hand function available for reconstructing the original input from the FFT result:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# ifft = inverse fft\n",
"reconstructed = fft.ifft(res)\n",
"\n",
"plt.figure(figsize=(15,5))\n",
"plt.plot(data,'k-', lw=3)\n",
"plt.plot(reconstructed, 'r:', lw=3)\n",
"plt.show()\n",
"\n",
"# Note that /some/ information has been lost, but very little\n",
"print 'Total deviation:', np.sum(np.abs(data-reconstructed))"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This allows us to perform operations in the **frequency domain**, like applying a high-pass filter: "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Set components with low frequencies equal to 0\n",
"resf = res.copy()\n",
"mask = np.abs(fft.fftfreq(data.size)) < 0.25\n",
"resf[mask] = 0\n",
"\n",
"# ifft the modified components\n",
"filtered = fft.ifft(resf)\n",
"\n",
"# And the result is high-pass filtered\n",
"plt.figure(figsize=(15,5))\n",
"plt.plot(data,'k-', lw=3)\n",
"plt.plot(filtered, 'r:', lw=3)\n",
"plt.show()"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The exact same logic can be applied to 2D images, using the **ftt2** and **ifft2** functions. For instance, let us equalize the amplitude spectrum of an image, so that all frequencies are equally strong."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"from PIL import Image\n",
"import matplotlib.pyplot as plt\n",
"import scipy.fftpack as fft\n",
"\n",
"im = Image.open('python.jpg').convert('L')\n",
"arr = np.array(im, dtype='float')\n",
"res = fft.fft2(arr)\n",
"\n",
"# Just set all amplitudes to one\n",
"new_asp = np.ones(res.shape, dtype='float')\n",
"\n",
"# Then recombine the complex numbers\n",
"real_part = new_asp * np.cos(np.angle(res))\n",
"imag_part = new_asp * np.sin(np.angle(res))\n",
"eq = real_part+(imag_part*1j)\n",
"\n",
"# And do the ifft\n",
"arr_eq = fft.ifft2(eq).real\n",
"\n",
"# Show the result\n",
"# Clear the high frequencies dominate now\n",
"plt.figure()\n",
"plt.imshow(arr_eq, cmap='gray')\n",
"plt.show()"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that in practice, it is often recommended to use image dimensions that are a power of 2, for efficiency. The fft functions allow you to automatically pad images to a given size; at the end you can just crop the result to obtain the original image size again."
]
}
],
"metadata": {}
}
]
}