{ "cells": [ { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "# Demystifying Markov Chain Monte Carlo\n", "\n", "#### Brett Morris\n", "\n", "### In this tutorial \n", "\n", "We will write our own Markov Chain Monte Carlo algorithm with a Metropolis-Hastings sampler. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "\n", "np.random.seed(42)\n", "\n", "# Set the properties of the line that we'll be fitting: \n", "true_slope = 0.3\n", "true_intercept = 0.0\n", "\n", "x = np.linspace(0, 10, 50)\n", "y = np.random.randn(len(x)) + true_slope * x + true_intercept\n", "yerr = np.random.rand(len(x))/10 + 1\n", "\n", "plt.errorbar(x, y, yerr, fmt='.')\n", "plt.plot(x, true_slope * x + true_intercept, color='k')\n", "plt.xlabel('x')\n", "plt.ylabel('y')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "We define a simple linear model to describe the data, which has parameters $\\theta = \\{m, b\\}$." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "def linear_model(theta, x): \n", " \"\"\"\n", " Simple linear model. \n", " \"\"\"\n", " m, b = theta\n", " return m * x + b\n", "\n", "\n", "def chi2(theta, x, y, yerr, model): \n", " \"\"\"\n", " Compute the \\chi^2 by comparing `model` evaluated at `theta`\n", " to `y` at each value of `x`. \n", " \"\"\"\n", " return np.sum((model(theta, x) - y)**2 / yerr**2)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Let the proposal step simply draw from a Gaussian distribution: " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "def proposal(theta, scale=1):\n", " \"\"\"\n", " Generate proposal step, by adding a draw from a \n", " Gaussian distribution to the initial step position\n", " \"\"\"\n", " return theta + scale * np.random.randn(len(theta))" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "We will decide whether or not to accept new proposal steps using the Metropolis-Hastings algorithm, as defined by [Ford (2005)](https://arxiv.org/abs/astro-ph/0305441):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "def metropolis_hastings(init_theta, x, y, yerr, acceptance, scale=0.05):\n", " \"\"\"\n", " Metropolis-Hastings algorithm, a la Ford (2005). 
\n", " \n", " 1) Generate a proposal step\n", " 2) Compare the chi^2 of the proposal step to the current step\n", " 3) Draw a random number `u` on [0, 1]\n", " 4) Compute alpha = min([exp(-0.5 * (chi2(new) - chi2(old)), 1])\n", " 5) If u <= alpha, accept step, otherwise keep step\n", " \"\"\"\n", " # Generate a proposal step: \n", " proposed_theta = proposal(init_theta, scale=scale)\n", "\n", " # Compare chi^2 of proposed step to current step:\n", " chi2_init_step = chi2(init_theta, x, y, yerr, linear_model)\n", " chi2_proposed_step = chi2(proposed_theta, x, y, yerr, linear_model)\n", " relative_likelihood = np.exp(-0.5 * (chi2_proposed_step - chi2_init_step))\n", " \n", " alpha = np.min([relative_likelihood, 1])\n", "\n", " # If U(0, 1) <= alpha, accept the step: \n", " if np.random.rand() <= alpha: \n", " return proposed_theta, acceptance + 1\n", " else: \n", " return init_theta, acceptance\n", " \n", "\n", "def sampler(x, y, yerr, init_theta, n_steps, scale=0.05):\n", " \"\"\"\n", " Markov Chain Monte Carlo sampler. \n", " \"\"\"\n", " current_theta = np.copy(init_theta)\n", " # Allocate memory for samples: \n", " samples = np.zeros((n_steps, len(init_theta)))\n", " samples[0, :] = init_theta\n", " acceptance = 0\n", " \n", " for i in range(1, n_steps):\n", " # Run the M-H algorithm to determine next step:\n", " current_theta, acceptance = metropolis_hastings(current_theta, \n", " x, y, yerr, acceptance, \n", " scale=scale)\n", " # Record the result: \n", " samples[i, :] = current_theta\n", " \n", " # Compute the final acceptance rate\n", " acceptance_rate = acceptance / n_steps\n", " \n", " return samples, acceptance_rate" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Make an initial guess (which is wrong!) and let the MCMC algorithm find the correct solution: " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "init_parameters = [1, 0.5] # slope, intercept\n", "n_steps = 50000\n", "\n", "# This tweakable parameter determines how\n", "# far new steps should be taken away from \n", "# previous steps. Increase `scale` to \n", "# decrease your acceptance rate: \n", "scale = 0.06\n", "\n", "samples, acceptance_rate = sampler(x, y, yerr, init_parameters, \n", " n_steps, scale=scale)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "What is the acceptance rate? 
{ "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Let's plot a few random draws from the posterior probability distribution functions for each parameter: " ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "for i in range(100): \n", "    random_step = np.random.randint(0, samples.shape[0])\n", "    random_theta = samples[random_step, :]\n", "    plt.plot(x, linear_model(random_theta, x), alpha=0.05, color='k')\n", "plt.errorbar(x, y, yerr, fmt='.')\n", "plt.xlabel('x')\n", "plt.ylabel('y')\n", "plt.show()" ] },
{ "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "You can see that the uncertainty in the measurements is reflected in the uncertainty in the slope and intercept parameters. \n", "\n", "We normally see the posterior probability distribution functions displayed in a \"corner\" plot like the one below: " ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "from corner import corner\n", "\n", "# Discard the first 1000 steps as \"burn-in\", taken before\n", "# the chain settled near the best-fit parameters:\n", "burned_in_samples = samples[1000:]\n", "\n", "corner(burned_in_samples, labels=['m', 'b'], truths=[true_slope, true_intercept]);" ] },
{ "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Note that the posterior distributions are consistent with the true value of each parameter, within the uncertainties! We are accurately measuring each parameter and its uncertainty. \n", "\n", "### Parameter Degeneracies\n", "\n", "But these parameters $m$ and $b$ are correlated with one another – note in the joint distribution that small values of $m$ correspond to large values of $b$, and vice versa. We can get rid of this degeneracy by reparameterizing our model to fit for the slope angle $\\theta = \\arctan m$ and the perpendicular offset $b_\\perp = b \\cos\\theta$; see [Hogg et al. (2010)](https://arxiv.org/abs/1008.4686). " ] },
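{ "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "As a minimal sketch of that idea, we don't even need to rerun the sampler: we can map our existing samples of $(m, b)$ into $(\\theta, b_\\perp)$ using the transformation above and view the posterior in the new parameter space: " ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "# Transform the (m, b) samples into the (theta, b_perp)\n", "# parameterization of Hogg et al. (2010):\n", "slope_angle = np.arctan(burned_in_samples[:, 0])\n", "b_perp = burned_in_samples[:, 1] * np.cos(slope_angle)\n", "\n", "reparameterized_samples = np.column_stack([slope_angle, b_perp])\n", "corner(reparameterized_samples, labels=['theta', 'b_perp']);" ] },
{ "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Another useful extension to our sampler is to run several independent chains, or \"walkers\", at once: we collect more samples per run, and we can check that every walker converges to the same posterior. Here is a version of our sampler with multiple walkers: " ] },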
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "def ensemble_sampler(x, y, yerr, init_thetas, n_steps, n_walkers, n_dim, scale=0.05):\n", " \"\"\"\n", " Markov Chain Monte Carlo sampler with multiple walkers.\n", " \"\"\"\n", " current_theta = np.copy(init_thetas)\n", " # Allocate memory for samples: \n", " samples = np.zeros((n_steps, n_walkers, n_dim))\n", " samples[0, :, :] = current_theta\n", " acceptance = 0\n", " \n", " for i in range(1, n_steps):\n", " for j in range(n_walkers):\n", " # Run the M-H algorithm to determine next step:\n", " current_theta, acceptance = metropolis_hastings(samples[i-1, j, :], \n", " x, y, yerr, acceptance, \n", " scale=scale)\n", " # Record the result: \n", " samples[i, j, :] = current_theta\n", " \n", " # Compute the final acceptance rate\n", " acceptance_rate = acceptance / (n_steps * n_walkers)\n", " \n", " return samples, acceptance_rate" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# `n_dim` is the number of free parameters\n", "n_dim = 2\n", "\n", "# `n_walkers` is the number of (independent) samplers\n", "n_walkers = 5\n", "\n", "# `n_steps` is the number of steps per walker \n", "n_steps = 10000\n", "\n", "# Create a group of initial parameters, one per walker, clustered\n", "# around the initial guess for the maximum-likelihood parameters: \n", "init_params_ensemble = [np.array(init_parameters) + 1e-5 * np.random.randn()\n", " for i in range(n_walkers)]\n", "\n", "samples, acceptance_rate = ensemble_sampler(x, y, yerr, init_params_ensemble, \n", " n_steps, n_walkers, n_dim)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Plot the \"trace\" of each walker, i.e. how each walker evolves from step to step: " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "fig, ax = plt.subplots(2, 1, figsize=(4, 5), sharex=True)\n", "ax[0].plot(samples[:, :, 0])\n", "ax[0].set(ylabel='m')\n", "ax[1].plot(samples[:, :, 1])\n", "ax[1].set(xlabel='Step', ylabel='b')\n", "fig.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From the \"trace\" plot above, you can see that the chains appear to be oscillate around their final values after the first 1000 steps or so. \n", "\n", "Plot the corner plot for all chains: " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Skip the first 1000 steps to allow chains to reach convergence \n", "burned_in_samples = samples[1000:]\n", "shape = burned_in_samples.shape\n", "\n", "# We need to reshape the 3D array into a 2D array to make the corner plot: \n", "corner(burned_in_samples.reshape((shape[0]*n_walkers, shape[2])), \n", " labels=['m', 'b'], truths=[true_slope, true_intercept])\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Have the chains \"converged\"? 
{ "cell_type": "markdown", "metadata": {}, "source": [ "Have the chains \"converged\"? One way to check is to compute the autocorrelation length of each chain, which we do below: " ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def autocorrelation(x):\n", "    \"\"\"\n", "    Calculate the autocorrelation function of array `x` \n", "    (which should be mean-subtracted).\n", "    \"\"\"\n", "    result = np.correlate(x, x, mode='full')\n", "    return result[result.size//2:]\n", "\n", "def first_zero_crossing(acf): \n", "    \"\"\"\n", "    Find the index of the first zero-crossing of the \n", "    autocorrelation function `acf`. \n", "    \"\"\"\n", "    return np.argwhere(np.diff(np.sign(acf)))[0][0]" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "effective_chain_lengths = []\n", "for walker in burned_in_samples[:, :, 0].T:\n", "    acf = autocorrelation(walker - walker.mean())\n", "    ind = first_zero_crossing(acf)\n", "    # Rough effective length: the chain length divided by the\n", "    # correlation length (the first zero-crossing of the ACF):\n", "    effective_chain_lengths.append(burned_in_samples.shape[0]/ind)\n", "    \n", "print(effective_chain_lengths)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "These effective chain lengths should be large to ensure convergence. If they're small (in the range of a few to ten), you haven't sampled enough steps to reach convergence." ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.4" } }, "nbformat": 4, "nbformat_minor": 2 }