{ "metadata": { "signature": "sha256:cae10a1fd8af2ebca7a82c17eeb0a0ee278707d9adb3ea0f33a85bbcc8ca9af5" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Savitzky Golay Filtering\n", "========================\n", "\n", "The Savitzky Golay filter is a particular type of low-pass filter, well\n", "adapted for data smoothing. For further information see:\n", " (or\n", " for\n", "a pre-numpy implementation).\n", "\n", "Sample Code\n", "-----------" ] }, { "cell_type": "code", "collapsed": false, "input": [ "#!python\n", "def savitzky_golay(y, window_size, order, deriv=0, rate=1):\n", " r\"\"\"Smooth (and optionally differentiate) data with a Savitzky-Golay filter.\n", " The Savitzky-Golay filter removes high frequency noise from data.\n", " It has the advantage of preserving the original shape and\n", " features of the signal better than other types of filtering\n", " approaches, such as moving averages techniques.\n", " Parameters\n", " ----------\n", " y : array_like, shape (N,)\n", " the values of the time history of the signal.\n", " window_size : int\n", " the length of the window. Must be an odd integer number.\n", " order : int\n", " the order of the polynomial used in the filtering.\n", " Must be less then `window_size` - 1.\n", " deriv: int\n", " the order of the derivative to compute (default = 0 means only smoothing)\n", " Returns\n", " -------\n", " ys : ndarray, shape (N)\n", " the smoothed signal (or it's n-th derivative).\n", " Notes\n", " -----\n", " The Savitzky-Golay is a type of low-pass filter, particularly\n", " suited for smoothing noisy data. The main idea behind this\n", " approach is to make for each point a least-square fit with a\n", " polynomial of high order over a odd-sized window centered at\n", " the point.\n", " Examples\n", " --------\n", " t = np.linspace(-4, 4, 500)\n", " y = np.exp( -t**2 ) + np.random.normal(0, 0.05, t.shape)\n", " ysg = savitzky_golay(y, window_size=31, order=4)\n", " import matplotlib.pyplot as plt\n", " plt.plot(t, y, label='Noisy signal')\n", " plt.plot(t, np.exp(-t**2), 'k', lw=1.5, label='Original signal')\n", " plt.plot(t, ysg, 'r', label='Filtered signal')\n", " plt.legend()\n", " plt.show()\n", " References\n", " ----------\n", " .. [1] A. Savitzky, M. J. E. Golay, Smoothing and Differentiation of\n", " Data by Simplified Least Squares Procedures. Analytical\n", " Chemistry, 1964, 36 (8), pp 1627-1639.\n", " .. [2] Numerical Recipes 3rd Edition: The Art of Scientific Computing\n", " W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery\n", " Cambridge University Press ISBN-13: 9780521880688\n", " \"\"\"\n", " import numpy as np\n", " from math import factorial\n", " \n", " try:\n", " window_size = np.abs(np.int(window_size))\n", " order = np.abs(np.int(order))\n", " except ValueError, msg:\n", " raise ValueError(\"window_size and order have to be of type int\")\n", " if window_size % 2 != 1 or window_size < 1:\n", " raise TypeError(\"window_size size must be a positive odd number\")\n", " if window_size < order + 2:\n", " raise TypeError(\"window_size is too small for the polynomials order\")\n", " order_range = range(order+1)\n", " half_window = (window_size -1) // 2\n", " # precompute coefficients\n", " b = np.mat([[k**i for i in order_range] for k in range(-half_window, half_window+1)])\n", " m = np.linalg.pinv(b).A[deriv] * rate**deriv * factorial(deriv)\n", " # pad the signal at the extremes with\n", " # values taken from the signal itself\n", " firstvals = y[0] - np.abs( y[1:half_window+1][::-1] - y[0] )\n", " lastvals = y[-1] + np.abs(y[-half_window-1:-1][::-1] - y[-1])\n", " y = np.concatenate((firstvals, y, lastvals))\n", " return np.convolve( m[::-1], y, mode='valid')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Code explanation\n", "----------------\n", "\n", "In lines 61-62 the coefficients of the local least-square polynomial fit\n", "are pre-computed. These will be used later at line 68, where they will\n", "be correlated with the signal. To prevent spurious results at the\n", "extremes of the data, the signal is padded at both ends with its mirror\n", "image, (lines 65-67).\n", "\n", "Figure\n", "------\n", "\n", "![](files/attachments/SavitzkyGolay/cd_spec.png)\n", "\n", "CD-spectrum of a protein. Black: raw data. Red: filter applied\n", "\n", "A wrapper for cyclic voltammetry data\n", "-------------------------------------\n", "\n", "One of the most popular applications of S-G filter, apart from smoothing\n", "UV-VIS and IR spectra, is smoothing of curves obtained in\n", "electroanalytical experiments. In cyclic voltammetry, voltage (being the\n", "abcissa) changes like a triangle wave. And in the signal there are cusps\n", "at the turning points (at switching potentials) which should never be\n", "smoothed. In this case, Savitzky-Golay smoothing should be done\n", "piecewise, ie. separately on pieces monotonic in x:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "#!python numbers=disable\n", "def savitzky_golay_piecewise(xvals, data, kernel=11, order =4):\n", " turnpoint=0\n", " last=len(xvals)\n", " if xvals[1]>xvals[0] : #x is increasing?\n", " for i in range(1,last) : #yes\n", " if xvals[i]xvals[i-1] :\n", " turnpoint=i\n", " break\n", " if turnpoint==0 : #no change in direction of x\n", " return savitzky_golay(data, kernel, order)\n", " else:\n", " #smooth the first piece\n", " firstpart=savitzky_golay(data[0:turnpoint],kernel,order)\n", " #recursively smooth the rest\n", " rest=savitzky_golay_piecewise(xvals[turnpoint:], data[turnpoint:], kernel, order)\n", " return numpy.concatenate((firstpart,rest))" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Two dimensional data smoothing and least-square gradient estimate\n", "-----------------------------------------------------------------\n", "\n", "Savitsky-Golay filters can also be used to smooth two dimensional data\n", "affected by noise. The algorithm is exactly the same as for the one\n", "dimensional case, only the math is a bit more tricky. The basic\n", "algorithm is as follow: 1. for each point of the two dimensional matrix\n", "extract a sub-matrix, centered at that point and with a size equal to an\n", "odd number \"window\\_size\". 2. for this sub-matrix compute a least-square\n", "fit of a polynomial surface, defined as p(x,y) = a0 + a1\\*x + a2\\*y +\n", "a3\\*x\\^2 + a4\\*y\\^2 + a5\\*x\\*y + ... . Note that x and y are equal to\n", "zero at the central point. 3. replace the initial central point with the\n", "value computed with the fit.\n", "\n", "Note that because the fit coefficients are linear with respect to the\n", "data spacing, they can pre-computed for efficiency. Moreover, it is\n", "important to appropriately pad the borders of the data, with a mirror\n", "image of the data itself, so that the evaluation of the fit at the\n", "borders of the data can happen smoothly.\n", "\n", "Here is the code for two dimensional filtering." ] }, { "cell_type": "code", "collapsed": false, "input": [ "#!python numbers=enable\n", "def sgolay2d ( z, window_size, order, derivative=None):\n", " \"\"\"\n", " \"\"\"\n", " # number of terms in the polynomial expression\n", " n_terms = ( order + 1 ) * ( order + 2) / 2.0\n", " \n", " if window_size % 2 == 0:\n", " raise ValueError('window_size must be odd')\n", " \n", " if window_size**2 < n_terms:\n", " raise ValueError('order is too high for the window size')\n", "\n", " half_size = window_size // 2\n", " \n", " # exponents of the polynomial. \n", " # p(x,y) = a0 + a1*x + a2*y + a3*x^2 + a4*y^2 + a5*x*y + ... \n", " # this line gives a list of two item tuple. Each tuple contains \n", " # the exponents of the k-th term. First element of tuple is for x\n", " # second element for y.\n", " # Ex. exps = [(0,0), (1,0), (0,1), (2,0), (1,1), (0,2), ...]\n", " exps = [ (k-n, n) for k in range(order+1) for n in range(k+1) ]\n", " \n", " # coordinates of points\n", " ind = np.arange(-half_size, half_size+1, dtype=np.float64)\n", " dx = np.repeat( ind, window_size )\n", " dy = np.tile( ind, [window_size, 1]).reshape(window_size**2, )\n", "\n", " # build matrix of system of equation\n", " A = np.empty( (window_size**2, len(exps)) )\n", " for i, exp in enumerate( exps ):\n", " A[:,i] = (dx**exp[0]) * (dy**exp[1])\n", " \n", " # pad input array with appropriate values at the four borders\n", " new_shape = z.shape[0] + 2*half_size, z.shape[1] + 2*half_size\n", " Z = np.zeros( (new_shape) )\n", " # top band\n", " band = z[0, :]\n", " Z[:half_size, half_size:-half_size] = band - np.abs( np.flipud( z[1:half_size+1, :] ) - band )\n", " # bottom band\n", " band = z[-1, :]\n", " Z[-half_size:, half_size:-half_size] = band + np.abs( np.flipud( z[-half_size-1:-1, :] ) -band ) \n", " # left band\n", " band = np.tile( z[:,0].reshape(-1,1), [1,half_size])\n", " Z[half_size:-half_size, :half_size] = band - np.abs( np.fliplr( z[:, 1:half_size+1] ) - band )\n", " # right band\n", " band = np.tile( z[:,-1].reshape(-1,1), [1,half_size] )\n", " Z[half_size:-half_size, -half_size:] = band + np.abs( np.fliplr( z[:, -half_size-1:-1] ) - band )\n", " # central band\n", " Z[half_size:-half_size, half_size:-half_size] = z\n", " \n", " # top left corner\n", " band = z[0,0]\n", " Z[:half_size,:half_size] = band - np.abs( np.flipud(np.fliplr(z[1:half_size+1,1:half_size+1]) ) - band )\n", " # bottom right corner\n", " band = z[-1,-1]\n", " Z[-half_size:,-half_size:] = band + np.abs( np.flipud(np.fliplr(z[-half_size-1:-1,-half_size-1:-1]) ) - band ) \n", " \n", " # top right corner\n", " band = Z[half_size,-half_size:]\n", " Z[:half_size,-half_size:] = band - np.abs( np.flipud(Z[half_size+1:2*half_size+1,-half_size:]) - band ) \n", " # bottom left corner\n", " band = Z[-half_size:,half_size].reshape(-1,1)\n", " Z[-half_size:,:half_size] = band - np.abs( np.fliplr(Z[-half_size:, half_size+1:2*half_size+1]) - band ) \n", " \n", " # solve system and convolve\n", " if derivative == None:\n", " m = np.linalg.pinv(A)[0].reshape((window_size, -1))\n", " return scipy.signal.fftconvolve(Z, m, mode='valid')\n", " elif derivative == 'col':\n", " c = np.linalg.pinv(A)[1].reshape((window_size, -1))\n", " return scipy.signal.fftconvolve(Z, -c, mode='valid') \n", " elif derivative == 'row':\n", " r = np.linalg.pinv(A)[2].reshape((window_size, -1))\n", " return scipy.signal.fftconvolve(Z, -r, mode='valid') \n", " elif derivative == 'both':\n", " c = np.linalg.pinv(A)[1].reshape((window_size, -1))\n", " r = np.linalg.pinv(A)[2].reshape((window_size, -1))\n", " return scipy.signal.fftconvolve(Z, -r, mode='valid'), scipy.signal.fftconvolve(Z, -c, mode='valid')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is a demo" ] }, { "cell_type": "code", "collapsed": false, "input": [ "#!python number=enable\n", "\n", "# create some sample twoD data\n", "x = np.linspace(-3,3,100)\n", "y = np.linspace(-3,3,100)\n", "X, Y = np.meshgrid(x,y)\n", "Z = np.exp( -(X**2+Y**2))\n", "\n", "# add noise\n", "Zn = Z + np.random.normal( 0, 0.2, Z.shape )\n", "\n", "# filter it\n", "Zf = sgolay2d( Zn, window_size=29, order=4)\n", "\n", "# do some plotting\n", "matshow(Z)\n", "matshow(Zn)\n", "matshow(Zf)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "attachment:Original.pdf Original data\n", "attachment:Original+noise.pdf Original data + noise\n", "attachment:Original+noise+filtered.pdf (Original data + noise) filtered\n", "\n", "\n", "## Gradient of a two-dimensional function\n", "\n", "Since we have computed the best fitting interpolating polynomial surface it is easy to compute its gradient. This method of computing the gradient of a two dimensional function is quite robust, and partially hides the noise in the data, which strongly affects the differentiation operation. The maximum order of the derivative that can be computed obviously depends on the order of the polynomial used in the fitting.\n", "\n", "The code provided above have an option `derivative`, which as of now allows to compute the first derivative of the 2D data. It can be \"row\"or \"column\", indicating the direction of the derivative, or \"both\", which returns the gradient." ] } ], "metadata": {} } ] }