{
 "metadata": {
  "name": "",
  "signature": "sha256:238268e5d05a9a7320663cdf95bbdeaf1b2ff3aae58bd730377b8106e33e17b1"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "# Afternoon Breakout: Do-It-Yourself MCMC\n",
      "\n",
      "The standard Bayesian approach to model fitting involves sampling the posterior, usually via a variant of Markov Chain Monte Carlo (MCMC). Though there are many very sophisticated MCMC samplers out there, the most simple algorithm (Metropolis-Hastings) is rather straightforward to code.\n",
      "\n",
      "Here we'll walk through creating our own Metropolis-Hastings sampler from scratch.\n",
      "\n",
      "(The following is based on material put together by Adrian Price-Whelan)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Preliminaries\n",
      "\n",
      "As usual, we start with some imports:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%matplotlib inline\n",
      "import numpy as np\n",
      "import matplotlib.pyplot as plt\n",
      "from scipy import stats\n",
      "\n",
      "# use seaborn plotting defaults\n",
      "# If this causes an error, you can comment it out.\n",
      "import seaborn as sns\n",
      "sns.set()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "And load some data:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from fig_code import linear_data_sample\n",
      "x, y, dy = linear_data_sample()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Exercise\n",
      "\n",
      "Walk through all the following steps, filling-in the code along the way."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "First plot the data to see what we're looking at (Use a ``plt.errorbar()`` plot with the provided data)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We're going to fit a line to the data, as we've done through the lecture:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def model(theta, x):\n",
      "    # the `theta` argument is a list of parameter values, e.g., theta = [m, b] for a line\n",
      "    pass"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "---\n",
      "\n",
      "We'll start with the assumption that the data are independent and identically distributed so that the likelihood is simply a product of Gaussians (one big Gaussian). We'll also assume that the uncertainties reported are correct, and that there are no uncertainties on the `x` data. We need to define a function that will evaluate the (ln)likelihood of the data, given a particular choice of your model parameters. A good way to structure this function is as follows:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def ln_likelihood(theta, x, y, dy):\n",
      "    # we will pass the parameters (theta) to the model function\n",
      "    # the other arguments are the data\n",
      "    pass "
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "What about priors? Remember your prior only depends on the model parameters, but be careful about what kind of prior you are specifying for each parameter. Do we need to properly normalize the probabilities?"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def ln_prior(theta):\n",
      "    pass"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now we can define a function that evaluates the (ln)posterior probability, which is just the sum of the ln prior and ln likelihood:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def ln_posterior(theta, x, y, dy):\n",
      "    return ln_prior(theta) + ln_likelihood(theta, x, y, dy)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now write a function to actually run a Metropolis-Hastings MCMC sampler. Ford (2005) includes a great step-by-step walkthrough of the Metropolis-Hastings algorithm, and we'll base our code on that. Fill-in the steps mentioned in the comments below:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def run_mcmc(ln_posterior, nsteps, ndim, theta0, stepsize, args=()):\n",
      "    \"\"\"\n",
      "    Run a Markov Chain Monte Carlo\n",
      "    \n",
      "    Parameters\n",
      "    ----------\n",
      "    ln_posterior: callable\n",
      "        our function to compute the posterior\n",
      "    nsteps: int\n",
      "        the number of steps in the chain\n",
      "    theta0: list\n",
      "        the starting guess for theta\n",
      "    stepsize: float\n",
      "        a parameter controlling the size of the random step\n",
      "        e.g. it could be the width of the Gaussian distribution\n",
      "    args: tuple (optional)\n",
      "        additional arguments passed to ln_posterior\n",
      "    \"\"\"\n",
      "    # Create the array of size (nsteps, ndims) to hold the chain\n",
      "    # Initialize the first row of this with theta0\n",
      "    \n",
      "    # Create the array of size nsteps to hold the log-likelihoods for each point\n",
      "    # Initialize the first entry of this with the log likelihood at theta0\n",
      "    \n",
      "    # Loop for nsteps\n",
      "    for i in range(1, nsteps):\n",
      "        # Randomly draw a new theta from the proposal distribution.\n",
      "        # for example, you can do a normally-distributed step by utilizing\n",
      "        # the np.random.randn() function\n",
      "        \n",
      "        # Calculate the probability for the new state\n",
      "        \n",
      "        # Compare it to the probability of the old state\n",
      "        # Using the acceptance probability function\n",
      "        # (remember that you've computed the log probability, not the probability!)\n",
      "        \n",
      "        # Chose a random number r between 0 and 1 to compare with p_accept\n",
      "        \n",
      "        # If p_accept>1 or p_accept>r, accept the step\n",
      "            # Save the position to the i^th row of the chain\n",
      "            \n",
      "            # Save the probability to the i^th entry of the array\n",
      "            \n",
      "        # Else, do not accept the step\n",
      "            # Set the position and probability are equal to the previous values\n",
      "        pass\n",
      "            \n",
      "    # Return the chain and probabilities"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 1
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now run the MCMC code on the data provided."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Plot the position of the walker as a function of step number for each of the parameters. Are the chains converged? "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Make histograms of the samples for each parameter. Should you include all of the samples? "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Report to us your constraints on the model parameters -- you have some freedom in interpreting what this means..."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [],
     "language": "python",
     "metadata": {},
     "outputs": []
    }
   ],
   "metadata": {}
  }
 ]
}