{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Simple Bayesian Modeling\n", "\n", "Here we'll apply what we learned in the previous section, and use Python to fit some Bayesian models.\n", "We'll start with the classic model fitting problem: **Fitting a line to data**.\n", "\n", "We'll begin with some standard Python imports:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import seaborn; seaborn.set() # for plot formatting" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Data\n", "\n", "For the sake of this demonstration, let's start by creating some data that we will fit with a straight line:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def make_data(intercept, slope, N=20, dy=5, rseed=42):\n", " rand = np.random.RandomState(rseed)\n", " x = 100 * rand.rand(N)\n", " y = intercept + slope * x\n", " y += dy * rand.randn(N)\n", " return x, y, dy * np.ones_like(x)\n", "\n", "theta_true = [25, 0.5]\n", "x, y, dy = make_data(*theta_true)\n", "plt.errorbar(x, y, dy, fmt='o');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Model\n", "\n", "Next we need to specify a model. We're fitting a straight line to data, so we'll need a slope and an intercept; i.e.\n", "\n", "$$\n", "y_M(x) = mx + b\n", "$$\n", "\n", "where our paramteter vector might be \n", "\n", "$$\n", "\\theta = [b, m]\n", "$$\n", "\n", "But this is only half the picture: what we mean by a \"model\" in a Bayesian sense is not only this expected value $y_M(x;\\theta)$, but a **probability distribution** for our data.\n", "That is, we need an expression to compute the likelihood $P(D\\mid\\theta)$ for our data as a function of the parameters $\\theta$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we are given data with simple errorbars, which imply that the probability for any *single* data point is a normal distribution about the true value. That is,\n", "\n", "$$\n", "y_i \\sim \\mathcal{N}(y_M(x_i;\\theta), \\sigma)\n", "$$\n", "\n", "or, in other words,\n", "\n", "$$\n", "P(x_i,y_i\\mid\\theta) = \\frac{1}{\\sqrt{2\\pi\\varepsilon_i^2}} \\exp\\left(\\frac{-\\left[y_i - y_M(x_i;\\theta)\\right]^2}{2\\varepsilon_i^2}\\right)\n", "$$\n", "\n", "where $\\varepsilon_i$ are the (known) measurement errorsindicated by the errorbars." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Assuming all the points are independent, we can find the full likelihood by multiplying the individual likelihoods together:\n", "\n", "$$\n", "P(D\\mid\\theta) = \\prod_{i=1}^N P(x_i,y_i\\mid\\theta)\n", "$$\n", "\n", "For convenience (and also for numerical accuracy) this is often expressed in terms of the log-likelihood:\n", "\n", "$$\n", "\\log P(D\\mid\\theta) = -\\frac{1}{2}\\sum_{i=1}^N\\left(\\log(2\\pi\\varepsilon_i^2) + \\frac{\\left[y_i - y_M(x_i;\\theta)\\right]^2}{\\varepsilon_i^2}\\right)\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Quick Exercise\n", "\n", "1. Write a Python function which computes the log-likelihood given a parameter vector $\\theta$, an array of errors $\\varepsilon$, and an array of $x$ and $y$ values\n", "\n", "2. Use tools in [``scipy.optimize``](http://docs.scipy.org/doc/scipy/reference/optimize.html) to maximize this likelihood (i.e. minimize the negative log-likelihood). How close is this result to the input ``theta_true`` above?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Write your code here" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "See the [Solutions-02](Solutions-02.ipynb#Quick-Exercise) notebook for one possible approach." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Prior\n", "\n", "We have computed the likelihood, now we need to think about the prior $P(\\theta\\mid I)$.\n", "\n", "This is where Bayesianism gets a bit controversial... what can we actually say about the slope and intercept before we fit our data?\n", "\n", "There are a couple approaches to choosing priors that you'll come across in practice:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 0. Conjugate Priors\n", "\n", "In the days before computational power caught up with the aspirations of Bayesians, it was important to choose priors which would make the posterior analytically computable. A [conjugate prior](https://en.wikipedia.org/wiki/Conjugate_prior) is a prior which, due to its mathematical relation to the likelihood, makes the result analytically computable.\n", "The problem is that the form of the conjugate prior is very rarely defensible on any grounds other than computational convenience, and so this is **almost never a good choice**.\n", "You'll still occasionally hear people attack Bayesian approaches because of the use of conjugate priors: these people's arguments are decades outdated, and you should treat them appropriately." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. Empirical Priors\n", "\n", "Empirical Priors are priors which are actually posteriors from previous studies of the same phenomenon. For example, it's common in Supernova cosmology studies to use the WMAP results as a prior: that is, we actually plug-in a *real result* and use our new data to improve on that. This situation is where Bayesian approaches really shine.\n", "\n", "For our linear fit, you might imagine that our $x, y$ data is a more accurate version of a previous experiment, where we've found that the intercept is $\\theta_0 = 50 \\pm 30$ and the slope is $\\theta_1 = 1.0 \\pm 0.5$.\n", "In this case, we'd encode this prior knowledge in the prior distribution itself." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. Flat Priors\n", "\n", "If you don't have an empirical prior, you might be tempted to simply use a *flat prior* – i.e. a prior that is constant between two reasonable limits (i.e. equal probability slopes from -1000 to +1000).\n", "\n", "The problem is that flat priors are not always non-informative! For example, a flat prior on the slope will effectively give a higher weight to larger slopes.\n", "We can see this straightforwardly by plotting regularly-spaced slopes between 0 and 20:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "xx = np.linspace(-1, 1)\n", "for slope in np.linspace(0, 20, 100):\n", " plt.plot(xx, slope * xx, '-k', linewidth=1)\n", "plt.axis([-1, 1, -1, 1], aspect='equal');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The density of the lines is a proxy for the probability of those slopes with a flat prior.\n", "This is an important point to realize: **flat priors are not necessarily minimally informative**." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3. Non-informative Priors\n", "\n", "What we *really* want in cases where no empirical prior is available is a **non-informative prior**. Among other things, such a prior should not depend on the units of the data.\n", "Perhaps the most principled approach to choosing non-informative priors was the *principle of maximum entropy* advocated by Jaynes ([book](http://omega.albany.edu:8008/JaynesBook.html)).\n", "\n", "Similar in spirit is the commonly-used [Jeffreys Prior](https://en.wikipedia.org/wiki/Jeffreys_prior), which in many cases of interest amounts to a \"scale invariant\" prior: a flat prior on the logarithm of the parameter.\n", "\n", "In the case of the linear slope, we often want a prior which does not artificially over-weight large slopes: there are a couple possible approaches to this (see http://arxiv.org/abs/1411.5018 for some discussion). For our situation, we might use a flat prior on the angle the line makes with the x-axis, which gives\n", "\n", "$$\n", "P(m) \\propto (1 + m^2)^{-3/2}\n", "$$\n", "\n", "For lack of a better term, I like to call this a \"symmetric prior\" on the slope (because it's the same whether we're fitting $y = mx + b$ or $x = m^\\prime y + b^\\prime$)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Implementation\n", "\n", "Let's define two python functions to compute the options for our prior: we'll use both a (log) flat prior and a (log) symmetric prior.\n", "In general, we need not worry about the normalization of the prior or the likelihood, which makes our lives easier:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def log_flat_prior(theta):\n", " if np.all(np.abs(theta) < 1000):\n", " return 0 # log(1)\n", " else:\n", " return -np.inf # log(0)\n", " \n", "def log_symmetric_prior(theta):\n", " if np.abs(theta[0]) < 1000:\n", " return -1.5 * np.log(1 + theta[1] ** 2)\n", " else:\n", " return -np.inf # log(0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With these defined, we now have what we need to compute the log posterior as a function of the model parameters.\n", "You might be tempted to maximize this posterior in the same way that we did with the likelihood above, but this is not a Bayesian result! The Bayesian result is a (possibly marginalized) posterior probability for our parameters.\n", "The mode of a probability distribution is perhaps slightly informative, but it is in no way a Bayesian result.\n", "\n", "In the breakout, we will take a few minutes now to plot the posterior probability as a function of the slope and intercept." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Breakout\n", "\n", "1. Using matplotlib, plot the posterior probability distribution for the slope and intercept, once for each prior. I would suggest using ``plt.contourf()`` or ``plt.pcolor()``. How different are the distributions?\n", "\n", "2. Modify the dataset – how do the results change if you have very few data points or very large errors?\n", "\n", "3. If you finish this quickly, try adding 1-sigma and 2-sigma contours to your plot, keeping in mind that the probabilities are not normalized! You can add them to your plot with ``plt.contour()``." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Write your code here" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "See the [Solutions-02](Solutions-02.ipynb#Breakout) notebook for one possible approach to answering these questions. But give this a good try before simply clicking! It's by working through this yourself that you will really learn the material." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.1" } }, "nbformat": 4, "nbformat_minor": 0 }