{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "> This is one of the 100 recipes of the [IPython Cookbook](http://ipython-books.github.io/), the definitive guide to high-performance scientific computing and data science in Python.\n" ] }, { "cell_type": "markdown", "metadata": { "word_id": "4818_07_pymc" }, "source": [ "# 7.7. Fitting a Bayesian model by sampling from a posterior distribution with a Markov Chain Monte Carlo method" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can find the instructions to install PyMC on the package's website. (http://pymc-devs.github.io/pymc/)\n", "\n", "You also need the *Storms* dataset from the book's website. (http://ipython-books.github.io)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. Let's import the standard packages and PyMC." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import pymc\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. Let's import the data with Pandas." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# http://www.ncdc.noaa.gov/ibtracs/index.php?name=wmo-data\n", "df = pd.read_csv(\"data/Allstorms.ibtracs_wmo.v03r05.csv\",\n", " delim_whitespace=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "3. With Pandas, it only takes a single line of code to get the annual number of storms in the North Atlantic Ocean. We first select the storms in that basin (`NA`), then we group the rows by year (`Season`), and we take the number of unique storm (`Serial_Num`) as each storm can span several days (`nunique` method)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "cnt = df[df['Basin'] == ' NA'].groupby('Season') \\\n", " ['Serial_Num'].nunique()\n", "years = cnt.index\n", "y0, y1 = years[0], years[-1]\n", "arr = cnt.values\n", "plt.figure(figsize=(8,4));\n", "plt.plot(years, arr, '-ok');\n", "plt.xlim(y0, y1);\n", "plt.xlabel(\"Year\");\n", "plt.ylabel(\"Number of storms\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "4. Now, we define our probabilistic model. We assume that storms arise following a time-dependent Poisson process with a deterministic rate. We assume this rate is a piecewise-constant function that takes a first value `early_mean` before a certain switch point, and a second value `late_mean` after that point. These three unknown parameters are treated as random variables (we will describe them more in the *How it works...* section)." ] }, { "cell_type": "markdown", "metadata": { "style": "tip" }, "source": [ "A [Poisson process](http://en.wikipedia.org/wiki/Poisson_process) is a particular **point process**, that is, a stochastic process describing the random occurence of instantaneous events. The Poisson process is fully random: the events occur independently at a given rate." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "switchpoint = pymc.DiscreteUniform('switchpoint',\n", " lower=0, upper=len(arr))\n", "early_mean = pymc.Exponential('early_mean', beta=1)\n", "late_mean = pymc.Exponential('late_mean', beta=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "5. We define the piecewise-constant rate as a Python function." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "@pymc.deterministic(plot=False)\n", "def rate(s=switchpoint, e=early_mean, l=late_mean):\n", " out = np.empty(len(arr))\n", " out[:s] = e\n", " out[s:] = l\n", " return out" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "6. Finally, the observed variable is the annual number of storms. It follows a Poisson variable with a random mean (the rate of the underlying Poisson process). This fact is a known mathematical property of Poisson processes." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "storms = pymc.Poisson('storms', mu=rate, value=arr, observed=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "7. Now, we use the MCMC method to sample from the posterior distribution, given the observed data. The `sample` method launches the fitting iterative procedure." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "model = pymc.Model([switchpoint, early_mean, late_mean, rate, storms])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "mcmc = pymc.MCMC(model)\n", "mcmc.sample(iter=10000, burn=1000, thin=10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "8. Let's plot the sampled Markov chains. Their stationary distribution corresponds to the posterior distribution we want to characterize." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "plt.figure(figsize=(8,8))\n", "plt.subplot(311);\n", "plt.plot(mcmc.trace('switchpoint')[:]);\n", "plt.ylabel(\"Switch point\"); \n", "plt.subplot(312);\n", "plt.plot(mcmc.trace('early_mean')[:]);\n", "plt.ylabel(\"Early mean\");\n", "plt.subplot(313);\n", "plt.plot(mcmc.trace('late_mean')[:]);\n", "plt.xlabel(\"Iteration\");\n", "plt.ylabel(\"Late mean\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "9. We also plot the distribution of the samples: they correspond to the posterior distributions of our parameters, after the data points have been taken into account." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "plt.figure(figsize=(14,3))\n", "plt.subplot(131);\n", "plt.hist(mcmc.trace('switchpoint')[:] + y0, 15);\n", "plt.xlabel(\"Switch point\")\n", "plt.ylabel(\"Distribution\")\n", "plt.subplot(132);\n", "plt.hist(mcmc.trace('early_mean')[:], 15);\n", "plt.xlabel(\"Early mean\");\n", "plt.subplot(133);\n", "plt.hist(mcmc.trace('late_mean')[:], 15);\n", "plt.xlabel(\"Late mean\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "10. Taking the sample mean of these distributions, we get posterior estimates for the three unknown parameters, including the year where the frequency of storms suddenly increased." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "yp = y0 + mcmc.trace('switchpoint')[:].mean()\n", "em = mcmc.trace('early_mean')[:].mean()\n", "lm = mcmc.trace('late_mean')[:].mean()\n", "print((yp, em, lm))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "11. Now we can plot the estimated rate on top of the observations." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "plt.figure(figsize=(8,4));\n", "plt.plot(years, arr, '-ok');\n", "plt.axvline(yp, color='k', ls='--');\n", "plt.plot([y0, yp], [em, em], '-b', lw=3);\n", "plt.plot([yp, y1], [lm, lm], '-r', lw=3);\n", "plt.xlim(y0, y1);\n", "plt.xlabel(\"Year\");\n", "plt.ylabel(\"Number of storms\");" ] }, { "cell_type": "markdown", "metadata": { "style": "hidden" }, "source": [ "For a possible scientific interpretation of the data considered here, see http://www.gfdl.noaa.gov/global-warming-and-hurricanes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).\n", "\n", "> [IPython Cookbook](http://ipython-books.github.io/), by [Cyrille Rossant](http://cyrille.rossant.net), Packt Publishing, 2014 (500 pages)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.2" } }, "nbformat": 4, "nbformat_minor": 0 }