{
 "metadata": {
  "name": ""
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<CENTER>\n",
      "<H1>\n",
      "NASA Goddard Space Flight Center <BR>\n",
      "    Python User Group <BR>\n",
      "2014 Python Boot Camp\n",
      "</H1>\n",
      "</CENTER>"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from IPython.display import YouTubeVideo\n",
      "YouTubeVideo(\"kQFKtI6gn9Y\")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Scientific Programming I"
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Reference Document"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<OL>\n",
      "<LI> <A HREF=\"http://scipy-lectures.github.io\">Python Scientific Lecture Notes</A>\n",
      "<LI> <A HREF=\"http://scipy-central.org\">SciPy Central (code snippets, modules, etc.) </A>\n",
      "<LI> <A HREF=\"http://docs.scipy.org/doc/scipy/reference\">SciPy Reference Guide</A>\n",
      "</OL>"
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "What is a SciPy?"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The scipy package contains various toolboxes dedicated to common issues in scientific computing. Its different submodules correspond to different applications, such as interpolation, integration, optimization, image processing, statistics, special functions, etc.\n",
      "\n",
      "scipy can be compared to other standard scientific-computing libraries, such as the GSL (GNU Scientific Library for C and C++), or Matlab\u2019s toolboxes. scipy is the core package for scientific routines in Python; it is meant to operate efficiently on numpy arrays, so that numpy and scipy work hand in hand.\n",
      "\n",
      "Before implementing a routine, it is worth checking if the desired data processing is not already implemented in Scipy. As non-professional programmers, scientists often tend to re-invent the wheel, which leads to buggy, non-optimal, difficult-to-share and unmaintainable code. By contrast, Scipy\u2018s routines are optimized and tested, and should therefore be used when possible."
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Available packages"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<IMG src=\"files/modules.png\"></IMG>"
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Getting Started"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import matplotlib\n",
      "matplotlib.rcParams['savefig.dpi'] = 2 * matplotlib.rcParams['savefig.dpi'] # Increase default figure resolution"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Input / Output (scipy.io)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "scipy has built-in functions to read and write to a wide variety of data formats, including Matlab, IDL, and netcdf.  Plain text and binary capabilites are available from numpy.  hdf5 has its own python module. "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from scipy import io\n",
      "a = np.ones((3, 3))\n",
      "io.savemat('file.mat', {'a': a}); # savemat expects a dictionary\n",
      "data = io.loadmat('file.mat', struct_as_record=True)\n",
      "data['a']"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Linear Algebra Operations (scipy.linalg)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The scipy.linalg module provides standard linear algebra operations, relying on an underlying efficient implementation (BLAS, LAPACK).  The most commonly used methods are to calculate the determinant of a matrix (linalg.det) and to invert a matrix (linalg.inv)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from scipy import linalg"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "arr1 = np.array([[1, 2], [3, 4]]); arr2 = np.array([[3, 2], [6, 4]]); arr3 = np.ones((3, 4))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "linalg.det(arr1), linalg.det(arr2)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "linalg.det(arr3)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "linalg.inv(arr1)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "np.allclose(np.dot(arr1, linalg.inv(arr1)), np.eye(2))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "linalg.inv(arr2)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Fast Fourier Transforms (scipy.fftpack)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from scipy import fftpack"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The scipy.fftpack module allows to compute fast Fourier transforms. As an illustration, a (noisy) input signal may look like:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "time_step = 0.02\n",
      "period = 5.\n",
      "time_vec = np.arange(0, 20, time_step)\n",
      "sig = np.sin(2 * np.pi / period * time_vec) + 0.5 * np.random.randn(time_vec.size)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The observer doesn\u2019t know the signal frequency, only the sampling time step of the signal sig. The signal is supposed to come from a real function so the Fourier transform will be symmetric. The scipy.fftpack.fftfreq() function will generate the sampling frequencies and scipy.fftpack.fft() will compute the fast Fourier transform:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "sample_freq = fftpack.fftfreq(sig.size, d=time_step)\n",
      "sig_fft = fftpack.fft(sig)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Because the resulting power is symmetric, only the positive part of the spectrum needs to be used for finding the frequency:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "pidxs = np.where(sample_freq > 0)\n",
      "freqs = sample_freq[pidxs]\n",
      "power = np.abs(sig_fft)[pidxs]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "plt.figure()\n",
      "plt.plot(freqs, power)\n",
      "plt.xlabel('Frequency [Hz]')\n",
      "plt.ylabel('plower')\n",
      "axes = plt.axes([0.3, 0.3, 0.5, 0.5])\n",
      "plt.title('Peak frequency')\n",
      "plt.plot(freqs[:8], power[:8])\n",
      "plt.setp(axes, yticks=[])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The signal frequency can be found by:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "freq = freqs[power.argmax()]\n",
      "freq, np.allclose(freq, 1./period)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now the high-frequency noise will be removed from the Fourier transformed signal:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "sig_fft[np.abs(sample_freq) > freq] = 0"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The resulting filtered signal can be computed by the scipy.fftpack.ifft() function:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "main_sig = fftpack.ifft(sig_fft)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The result can be viewed with:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "plt.figure()\n",
      "plt.plot(time_vec, sig)\n",
      "plt.plot(time_vec, main_sig, linewidth=3)\n",
      "plt.xlabel('Time [s]')\n",
      "plt.ylabel('Amplitude')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Optimization and Fitting (scipy.optimize)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The scipy.optimize module provides useful algorithms for function minimization (scalar or multi-dimensional), curve fitting and root finding."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from scipy import optimize"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Finding the minimum of a scalar function"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "f = lambda x: x**2 + 10*np.sin(x)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "x = np.arange(-10, 10, 0.1)\n",
      "plt.plot(x, f(x))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "This function has a global minimum around -1.3 and a local minimum around 3.8.\n",
      "\n",
      "The general and efficient way to find a minimum for this function is to conduct a gradient descent starting from a given initial point. The BFGS algorithm is a good way of doing this:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "optimize.fmin_bfgs(f, 0)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "A possible issue with this approach is that, if the function has local minima the algorithm may find these local minima instead of the global minimum depending on the initial point:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "optimize.fmin_bfgs(f, 3)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "If we don\u2019t know the neighborhood of the global minimum to choose the initial point, we need to resort to costlier global optimization. To find the global minimum, the simplest algorithm is the brute force algorithm, in which the function is evaluated on each point of a given grid:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "grid = (-10, 10, 0.1)\n",
      "xmin_global = optimize.brute(f, (grid,))\n",
      "xmin_global"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "For larger grid sizes, scipy.optimize.brute() becomes quite slow. scipy.optimize.anneal() provides an alternative, using simulated annealing. More efficient algorithms for different classes of global optimization problems exist, but this is out of the scope of scipy. Some useful packages for global optimization are OpenOpt, IPOPT, PyGMO and PyEvolve.\n",
      "\n",
      "To find the local minimum, let\u2019s constraint the variable to the interval (0, 10) using scipy.optimize.fminbound():"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "xmin_local = optimize.fminbound(f, 0, 10)\n",
      "xmin_local"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Finding the roots of a scalar function"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To find a root, i.e. a point where f(x) = 0, of the function f above we can use for example scipy.optimize.fsolve():"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "root = optimize.fsolve(f, 1)  # our initial guess is 1\n",
      "root"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Note that only one root is found. Inspecting the plot of f reveals that there is a second root around -2.5. We find the exact value of it by adjusting our initial guess:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "root2 = optimize.fsolve(f, -2.5)\n",
      "root2"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Curve Fitting"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Suppose we have data sampled from f with some noise:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "xdata = np.linspace(-10, 10, num=20)\n",
      "ydata = f(xdata) + np.random.randn(xdata.size)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now if we know the functional form of the function from which the samples were drawn (x^2 + sin(x) in this case) but not the amplitudes of the terms, we can find those by least squares curve fitting. First we have to define the function to fit:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "f2 = lambda x, a, b: a*x**2 + b*np.sin(x)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Then we can use scipy.optimize.curve_fit() to find a and b:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "guess = [2, 2]\n",
      "params, params_covariance = optimize.curve_fit(f2, xdata, ydata, guess)\n",
      "params"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "fig = plt.figure()\n",
      "ax = fig.add_subplot(111)\n",
      "ax.plot(x, f(x), 'b-', label=\"f(x)\")\n",
      "ax.plot(x, f2(x, *params), 'r--', label=\"Curve fit result\")\n",
      "xmins = np.array([xmin_global[0], xmin_local])\n",
      "ax.plot(xmins, f(xmins), 'go', label=\"Minima\")\n",
      "roots = np.array([root, root2])\n",
      "ax.plot(roots, f(roots), 'kv', label=\"Roots\")\n",
      "ax.legend()\n",
      "ax.set_xlabel('x')\n",
      "ax.set_ylabel('f(x)')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "\n",
      "\n",
      "Note: In Scipy >= 0.11 unified interfaces to all minimization and root finding algorithms are available: scipy.optimize.minimize(), scipy.optimize.minimize_scalar() and scipy.optimize.root(). They allow comparing various algorithms easily through the method keyword.\n"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from IPython.display import YouTubeVideo\n",
      "YouTubeVideo(\"PPN3KTtrnZM\")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Interpolation (scipy.interpolate)\n",
      "\n",
      "\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The scipy.interpolate is useful for fitting a function from experimental data and thus evaluating points where no measure exists. The module is based on the FITPACK Fortran subroutines from the netlib project.\n",
      "\n",
      "By imagining experimental data close to a sine function:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "measured_time = np.linspace(0, 1, 10)\n",
      "noise = (np.random.random(10)*2 - 1) * 1e-1\n",
      "measures = np.sin(2 * np.pi * measured_time) + noise"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The scipy.interpolate.interp1d class can build a linear interpolation function:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from scipy.interpolate import interp1d\n",
      "linear_interp = interp1d(measured_time, measures)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Then the scipy.interpolate.linear_interp instance needs to be evaluated at the time of interest:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "computed_time = np.linspace(0, 1, 50)\n",
      "linear_results = linear_interp(computed_time)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "A cubic interpolation can also be selected by providing the kind optional keyword argument:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "cubic_interp = interp1d(measured_time, measures, kind='cubic')\n",
      "cubic_results = cubic_interp(computed_time)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "plt.plot(measured_time, measures, 'o', ms=6, label='measures')\n",
      "plt.plot(computed_time, linear_results, label='linear interp')\n",
      "plt.plot(computed_time, cubic_results, label='cubic interp')\n",
      "plt.legend()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Statistics and Random Numbers"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The module scipy.stats contains statistical tools and probabilistic descriptions of random processes. Random number generators for various random process can be found in numpy.random."
     ]
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Histogram and probability density function"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Given observations of a random process, their histogram is an estimator of the random process\u2019s PDF (probability density function):"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "a = np.random.normal(size=1000)\n",
      "bins = np.arange(-4, 5)\n",
      "bins\n",
      "\n",
      "histogram = np.histogram(a, bins=bins, normed=True)[0]\n",
      "bins = 0.5*(bins[1:] + bins[:-1])\n",
      "bins\n",
      "\n",
      "from scipy import stats\n",
      "b = stats.norm.pdf(bins)  # norm is a distribution"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "plt.plot(bins, histogram)\n",
      "plt.plot(bins, b)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "If we know that the random process belongs to a given family of random processes, such as normal processes, we can do a maximum-likelihood fit of the observations to estimate the parameters of the underlying distribution. Here we fit a normal process to the observed data:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "loc, std = stats.norm.fit(a)\n",
      "loc, std"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Percentiles"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The median is the value with half of the observations below, and half above:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "np.median(a)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "It is also called the percentile 50, because 50% of the observation are below it:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "stats.scoreatpercentile(a, 50)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Similarly, we can calculate the percentile 90:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "stats.scoreatpercentile(a, 90) "
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The percentile is an estimator of the CDF: cumulative distribution function."
     ]
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Statistical tests"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "A statistical test is a decision indicator. For instance, if we have two sets of observations, that we assume are generated from Gaussian processes, we can use a T-test to decide whether the two sets of observations are significantly different:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "a = np.random.normal(0, 1, size=100)\n",
      "b = np.random.normal(1, 1, size=10)\n",
      "stats.ttest_ind(a, b)   "
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The resulting output is composed of:\n",
      "\n",
      "1) The T statistic value: it is a number the sign of which is proportional to the difference between the two random processes and the magnitude is related to the significance of this difference.\n",
      "\n",
      "2) The p value: the probability of both processes being identical. If it is close to 1, the two process are almost certainly identical. The closer it is to zero, the more likely it is that the processes have different means."
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Signal Processing (scipy.signal)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from scipy import signal"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "scipy.signal.detrend(): remove linear trend from signal:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "t = np.linspace(0, 5, 100)\n",
      "x = t + np.random.normal(size=100)\n",
      "\n",
      "plt.plot(t, x, linewidth=3)\n",
      "plt.plot(t, signal.detrend(x), linewidth=3)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "scipy.signal.resample(): resample a signal to n points using FFT."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "t = np.linspace(0, 5, 100)\n",
      "x = np.sin(t)\n",
      "\n",
      "plt.plot(t, x, linewidth=3)\n",
      "plt.plot(t[::2], signal.resample(x, 50), 'ko')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "scipy.signal has many window functions: scipy.signal.hamming(), scipy.signal.bartlett(), scipy.signal.blackman()...\n",
      "\n",
      "scipy.signal has filtering (median filter scipy.signal.medfilt(), Wiener scipy.signal.wiener()), but we will discuss this in the image section."
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Image Processing (scipy.ndimage)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from scipy import ndimage"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Image processing routines may be sorted according to the category of processing they perform."
     ]
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Geometrical transformations on images"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Changing orientation, resolution, ..."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from scipy import misc\n",
      "lena = misc.lena()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "fig = plt.figure()\n",
      "ax = fig.add_subplot(111)\n",
      "ax.imshow(lena, cmap=cm.gray)\n",
      "ax.axis('off')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "shifted_lena = ndimage.shift(lena, (50, 50))\n",
      "shifted_lena2 = ndimage.shift(lena, (50, 50), mode='nearest')\n",
      "rotated_lena = ndimage.rotate(lena, 30)\n",
      "cropped_lena = lena[50:-50, 50:-50]\n",
      "zoomed_lena = ndimage.zoom(lena, 2)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "fig = plt.figure()\n",
      "ax = fig.add_subplot(151)\n",
      "ax.imshow(shifted_lena, cmap=cm.gray)\n",
      "ax.axis('off')\n",
      "ax2 = fig.add_subplot(152)\n",
      "ax2.imshow(shifted_lena2, cmap=cm.gray)\n",
      "ax2.axis('off')\n",
      "ax3 = fig.add_subplot(153)\n",
      "ax3.imshow(rotated_lena, cmap=cm.gray)\n",
      "ax3.axis('off')\n",
      "ax4 = fig.add_subplot(154)\n",
      "ax4.imshow(cropped_lena, cmap=cm.gray)\n",
      "ax4.axis('off')\n",
      "ax5 = fig.add_subplot(155)\n",
      "ax5.imshow(zoomed_lena, cmap=cm.gray)\n",
      "ax5.axis('off')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "noisy_lena = np.copy(lena).astype(np.float)\n",
      "noisy_lena += lena.std()*0.5*np.random.standard_normal(lena.shape)\n",
      "blurred_lena = ndimage.gaussian_filter(noisy_lena, sigma=3)\n",
      "median_lena = ndimage.median_filter(blurred_lena, size=5)\n",
      "wiener_lena = signal.wiener(blurred_lena, (5,5))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "fig = plt.figure()\n",
      "ax = fig.add_subplot(141)\n",
      "ax.imshow(noisy_lena, cmap=cm.gray)\n",
      "ax.axis('off')\n",
      "ax.set_title(\"noisy Lena\")\n",
      "ax2 = fig.add_subplot(142)\n",
      "ax2.imshow(blurred_lena, cmap=cm.gray)\n",
      "ax2.axis('off')\n",
      "ax2.set_title(\"Gaussian filter\")\n",
      "ax3 = fig.add_subplot(143)\n",
      "ax3.imshow(median_lena, cmap=cm.gray)\n",
      "ax3.axis('off')\n",
      "ax3.set_title(\"median filter\")\n",
      "ax4 = fig.add_subplot(144)\n",
      "ax4.imshow(wiener_lena, cmap=cm.gray)\n",
      "ax4.axis('off')\n",
      "ax4.set_title(\"Weiner filter\")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Measurements on images"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "x, y = np.indices((100, 100))\n",
      "sig = np.sin(2*np.pi*x/50.)*np.sin(2*np.pi*y/50.)*(1+x*y/50.**2)**2\n",
      "mask = sig > 1"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now we look for various information about the objects in the image:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "labels, nb = ndimage.label(mask)\n",
      "nb"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "areas = ndimage.sum(mask, labels, xrange(1, labels.max()+1))\n",
      "areas"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "maxima = ndimage.maximum(sig, labels, xrange(1, labels.max()+1))\n",
      "maxima"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "ndimage.find_objects(labels==4)\n",
      "sl = ndimage.find_objects(labels==4)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "fig = plt.figure()\n",
      "ax = fig.add_subplot(131)\n",
      "ax.imshow(sig)\n",
      "ax.axis('off')\n",
      "ax.set_title(\"sig\")\n",
      "ax2 = fig.add_subplot(132)\n",
      "ax2.imshow(mask)\n",
      "ax2.axis('off')\n",
      "ax2.set_title(\"mask\")\n",
      "ax3 = fig.add_subplot(133)\n",
      "ax3.imshow(labels)\n",
      "ax3.axis('off')\n",
      "ax3.set_title(\"labels\")\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Breakout Session: Imaging Processing"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<IMG src=\"files/MV_HFV_012.jpg\"></IMG>"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<OL>\n",
      "<LI> Open the image file MV_HFV_012.png and display it.  This Scanning Element Microscopy image shows a glass sample (light gray) with some bubbles (on black) and unmolten sand grains (dark gray). \n",
      "\n",
      "<LI> Crop the image to remove the lower panel with measure information.\n",
      "\n",
      "<LI> Slightly filter the image with a median filter in order to refine its histogram. Check how the histogram changes.\n",
      "\n",
      "<LI> Using the histogram of the filtered image, determine thresholds that allow to define masks for sand pixels, glass pixels and bubble pixels.\n",
      "\n",
      "<LI> Display an image in which the three phases are colored with three different colors.\n",
      "\n",
      "<LI> Attribute labels to all bubbles and sand grains, and remove from the sand mask grains that are smaller than 10 pixels. To do so, use ndimage.sum or np.bincount to compute the grain sizes.\n",
      "\n",
      "<LI> Compute the mean and median size of bubbles.\n",
      "\n",
      "</OL>"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [],
     "language": "python",
     "metadata": {},
     "outputs": []
    }
   ],
   "metadata": {}
  }
 ]
}