{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "**Disclaimer:** Most of the content in this notebook is coming from [www.scipy-lectures.org](http://www.scipy-lectures.org/intro/index.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Scipy - High-level Scientific Computing\n", "\n", "The [scipy](https://www.scipy.org/) package contains various toolboxes dedicated to common issues in scientific computing. Its different submodules correspond to different applications, such as interpolation, integration, optimization, image processing, statistics, special functions, etc.\n", "\n", "`scipy` is composed of task-specific sub-modules:\n", "\n", " scipy.cluster Clustering package\n", " scipy.constants Constants\n", " scipy.fftpack Discrete Fourier transforms\n", " scipy.integrate Integration and ODEs\n", " scipy.interpolate Interpolation\n", " scipy.io Input and output\n", " scipy.linalg Linear algebra\n", " scipy.misc Miscellaneous routines\n", " scipy.ndimage Multi-dimensional image processing\n", " scipy.odr Orthogonal distance regression\n", " scipy.optimize Optimization and root finding\n", " scipy.signal Signal processing\n", " scipy.sparse Sparse matrices\n", " scipy.sparse.linalg Sparse linear algebra\n", " scipy.sparse.csgraph Compressed Sparse Graph Routines\n", " scipy.spatial Spatial algorithms and data structures\n", " scipy.special Special functions\n", " scipy.stats Statistical functions\n", " scipy.stats.mstats Statistical functions for masked arrays\n", "\n", "They all depend on `numpy`, but are mostly independent of each\n", "other. The standard way of importing Numpy and these Scipy modules\n", "is:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from scipy import stats # same for other sub-modules" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Warning:** This tutorial is far from a complete introduction to numerical computing. As enumerating the different submodules and functions in scipy would be very boring, we concentrate instead on a few examples to give a general idea of how to use ``scipy`` for scientific computing." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## File input/output: `scipy.io`\n", "\n", "### Load and Save `MATLAB` files" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from scipy import io as spio\n", "a = np.ones((3, 3))\n", "spio.savemat('file.mat', {'a': a}) # savemat expects a dictionary" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ls *mat" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = spio.loadmat('file.mat')\n", "data['a']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### See also\n", "\n", "* Load text files: `numpy.loadtxt`/`numpy.savetxt`\n", "* Clever loading of text/csv files: `numpy.genfromtxt`/`numpy.recfromcsv`\n", "* Fast and efficient, but numpy-specific, binary format: `numpy.save`/`numpy.load`\n", "* More advanced input/output of images in scikit-image: `skimage.io`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Linear algebra operations: `scipy.linalg`\n", "\n", "The `scipy.linalg` module provides standard linear algebra operations." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `scipy.linalg.det` function computes the determinant of a square matrix:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from scipy import linalg\n", "arr = np.array([[1, 2],\n", " [3, 4]])\n", "linalg.det(arr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `scipy.linalg.inv` function computes the inverse of a square matrix:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "arr = np.array([[1, 2],\n", " [3, 4]])\n", "iarr = linalg.inv(arr)\n", "iarr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "More advanced operations are available, for example **singular-value decomposition (SVD)**:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "arr = np.arange(9).reshape((3, 3)) + np.diag([1, 0, 1])\n", "arr" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "uarr, spec, vharr = linalg.svd(arr)\n", "\n", "# The resulting array spectrum is\n", "spec" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The original matrix can be re-composed by matrix multiplication of the outputs of ``svd`` with ``np.dot``:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sarr = np.diag(spec)\n", "svd_mat = uarr.dot(sarr).dot(vharr)\n", "svd_mat" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "SVD is commonly used in statistics and signal processing. Many other standard decompositions (QR, LU, Cholesky, Schur), as well as solvers for linear systems, are available in `scipy.linalg`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Interpolation: `scipy.interpolate`\n", "\n", "`scipy.interpolate` is useful for fitting a function from experimental data and thus evaluating points where no measure exists." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Let's create some data\n", "measured_time = np.linspace(0, 1, 10)\n", "noise = (np.random.random(10)*2 - 1) * 1e-1\n", "measures = np.sin(2 * np.pi * measured_time) + noise" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`scipy.interpolate.interp1d` can build a linear interpolation\n", "function:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from scipy.interpolate import interp1d\n", "linear_interp = interp1d(measured_time, measures)\n", "interpolation_time = np.linspace(0, 1, 50)\n", "linear_results = linear_interp(interpolation_time)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "from matplotlib import pyplot as plt\n", "plt.plot(measured_time, measures, 'o', ms=6, label='measures')\n", "plt.plot(interpolation_time, linear_results, label='linear interp')\n", "plt.legend();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A cubic interpolation can also be selected by providing the ``kind`` optional keyword argument:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cubic_interp = interp1d(measured_time, measures, kind='cubic')\n", "cubic_results = cubic_interp(interpolation_time)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.plot(measured_time, measures, 'o', ms=6, label='measures')\n", "plt.plot(interpolation_time, cubic_results, label='cubic interp')\n", "plt.legend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Optimization and fit: `scipy.optimize`\n", "\n", "Optimization is the problem of finding a numerical solution to a minimization or equality.\n", "\n", "The `scipy.optimize` module provides algorithms for function minimization (scalar or multi-dimensional), curve fitting and root finding." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from scipy import optimize" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Curve fitting\n", "\n", "Suppose we have data on a sine wave, with some noise:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x_data = np.linspace(-5, 5, num=50)\n", "y_data = 2.9 * np.sin(1.5 * x_data) + np.random.normal(size=50)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# And plot it\n", "import matplotlib.pyplot as plt\n", "plt.figure(figsize=(6, 4))\n", "plt.scatter(x_data, y_data)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we know that the data lies on a sine wave, but not the amplitudes or the period, we can find those by least squares curve fitting. First, we have to define the test function to fit, here a sine with unknown amplitude and period:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def test_func(x, a, b):\n", " return a * np.sin(b * x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We then use `scipy.optimize.curve_fit` to find $a$ and $b$:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "params, params_covariance = optimize.curve_fit(\n", " test_func, x_data, y_data, p0=[2, 2])\n", "print(params)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# And plot the resulting curve on the data\n", "plt.figure(figsize=(6, 4))\n", "plt.scatter(x_data, y_data, label='Data')\n", "plt.plot(x_data, test_func(x_data, params[0], params[1]),\n", " label='Fitted function')\n", "plt.legend(loc='best')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Finding the minimum of a scalar function\n", "\n", "Let's define the following function:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def f(x):\n", " return x**2 + 10*np.sin(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and plot it:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = np.arange(-10, 10, 0.1)\n", "plt.plot(x, f(x))\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This function has a global minimum around -1.3 and a local minimum around 3.8.\n", "\n", "Searching for minimum can be done with `scipy.optimize.minimize`, given a starting point `x0`, it returns the location of the minimum that it has found:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = optimize.minimize(f, x0=0) \n", "result" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result.x # The coordinate of the minimum" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fast Fourier transforms: `scipy.fftpack`\n", "\n", "The `scipy.fftpack` module computes fast Fourier transforms (FFTs)\n", "and offers utilities to handle them. The main functions are:\n", "\n", "* `scipy.fftpack.fft` to compute the FFT\n", "* `scipy.fftpack.fftfreq` to generate the sampling frequencies\n", "* `scipy.fftpack.ifft` computes the inverse FFT, from frequency space to signal space\n", "\n", "As an illustration, let's create a (noisy) input signal (``sig``), and its FFT:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from scipy import fftpack\n", "\n", "time_step = 0.02\n", "period = 1. / 0.5 # 0.5 Hz\n", "\n", "time_vec = np.arange(0, 20, time_step)\n", "sig = (np.sin(2 * np.pi / period * time_vec)\n", " + 0.5 * np.random.randn(time_vec.size))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.figure(figsize=(10, 4))\n", "plt.plot(time_vec, sig, label='Original signal')\n", "plt.title('Signal')\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The FFT of the signal\n", "sig_fft = fftpack.fft(sig)\n", "\n", "# And the power (sig_fft is of complex dtype)\n", "power = np.abs(sig_fft)\n", "\n", "# The corresponding frequencies\n", "sample_freq = fftpack.fftfreq(sig.size, d=time_step)\n", "\n", "# Plot the FFT power\n", "plt.figure(figsize=(10, 4))\n", "plt.plot(sample_freq[:500], power[:500])\n", "plt.xlabel('Frequency [Hz]')\n", "plt.ylabel('plower')\n", "plt.title('FFT')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The peak signal frequency can be found with ``sample_freq[power.argmax()]``" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "peak_freq = sample_freq[power.argmax()]\n", "peak_freq" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Setting the Fourier component above this frequency to zero and inverting the FFT with `scipy.fftpack.ifft`, gives a filtered signal:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "high_freq_fft = sig_fft.copy()\n", "high_freq_fft[np.abs(sample_freq) > peak_freq] = 0\n", "filtered_sig = fftpack.ifft(high_freq_fft)\n", "\n", "plt.figure(figsize=(6, 5))\n", "plt.plot(time_vec, sig, label='Original signal')\n", "plt.plot(time_vec, filtered_sig, linewidth=3, label='Filtered signal')\n", "plt.xlabel('Time [s]')\n", "plt.ylabel('Amplitude')\n", "plt.legend(loc='best')\n", "plt.show();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Signal processing: `scipy.signal`\n", "\n", "`scipy.signal` is for typical signal processing: 1D, regularly-sampled signals." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Detrending `scipy.signal.detrend` - remove linear trend from signal:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Generate a random signal with a trend\n", "import numpy as np\n", "t = np.linspace(0, 5, 100)\n", "x = t + np.random.normal(size=100)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Detrend\n", "from scipy import signal\n", "x_detrended = signal.detrend(x)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plot\n", "from matplotlib import pyplot as plt\n", "plt.figure(figsize=(8, 4))\n", "plt.plot(t, x, label=\"x\")\n", "plt.plot(t, x_detrended, label=\"x_detrended\")\n", "plt.legend(loc='best')\n", "plt.show()" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" } }, "nbformat": 4, "nbformat_minor": 2 }