{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Bayesian Statistics Made Simple\n", "===\n", "\n", "Code and exercises from my workshop on Bayesian statistics in Python.\n", "\n", "Copyright 2016 Allen Downey\n", "\n", "MIT License: https://opensource.org/licenses/MIT" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from __future__ import print_function, division\n", "\n", "%matplotlib inline\n", "\n", "import warnings\n", "warnings.filterwarnings('ignore')\n", "\n", "from thinkbayes2 import Pmf, Suite\n", "import thinkplot" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Working with Pmfs\n", "---\n", "Create a Pmf object to represent a six-sided die." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d6 = Pmf()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A Pmf is a map from possible outcomes to their probabilities." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for x in [1,2,3,4,5,6]:\n", " d6[x] = 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Initially the probabilities don't add up to 1." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d6.Print()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Normalize` adds up the probabilities and divides through. The return value is the total probability before normalizing." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d6.Normalize()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now the Pmf is normalized." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d6.Print()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And we can compute its mean (which only works if it's normalized)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d6.Mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Random` chooses a random value from the Pmf." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d6.Random()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`thinkplot` provides methods for plotting Pmfs in a few different styles." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "thinkplot.Hist(d6)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise 1:** The Pmf object provides `__add__`, so you can use the `+` operator to compute the Pmf of the sum of two dice.\n", "\n", "Compute and plot the Pmf of the sum of two 6-sided dice." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise 2:** Suppose I roll two dice and tell you the result is greater than 3.\n", "\n", "Plot the Pmf of the remaining possible outcomes and compute its mean." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The cookie problem\n", "---\n", "Create a Pmf with two equally likely hypotheses.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cookie = Pmf(['Bowl 1', 'Bowl 2'])\n", "cookie.Print()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Update each hypothesis with the likelihood of the data (a vanilla cookie)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cookie['Bowl 1'] *= 0.75\n", "cookie['Bowl 2'] *= 0.5\n", "cookie.Normalize()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Print the posterior probabilities." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cookie.Print()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise 3:** Suppose we put the first cookie back, stir, choose again from the same bowl, and get a chocolate cookie.\n", "\n", "Hint: The posterior (after the first cookie) becomes the prior (before the second cookie)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise 4:** Instead of doing two updates, what if we collapse the two pieces of data into one update?\n", "\n", "Re-initialize `Pmf` with two equally likely hypotheses and perform one update based on two pieces of data, a vanilla cookie and a chocolate cookie.\n", "\n", "The result should be the same regardless of how many updates you do (or the order of updates)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The dice problem\n", "---\n", "Create a Suite to represent dice with different numbers of sides." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pmf = Pmf([4, 6, 8, 12])\n", "pmf.Print()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise 5:** We'll solve this problem two ways. First we'll do it \"by hand\", as we did with the cookie problem; that is, we'll multiply each hypothesis by the likelihood of the data, and then renormalize.\n", "\n", "In the space below, update `suite` based on the likelihood of the data (rolling a 6), then normalize and print the results." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise 6:** Now let's do the same calculation using `Suite.Update`.\n", "\n", "Write a definition for a new class called `Dice` that extends `Suite`. Then define a method called `Likelihood` that takes `data` and `hypo` and returns the probability of the data (the outcome of rolling the die) for a given hypothesis (number of sides on the die).\n", "\n", "Hint: What should you do if the outcome exceeds the hypothetical number of sides on the die?\n", "\n", "Here's an outline to get you started:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class Dice(Suite):\n", " # hypo is the number of sides on the die\n", " # data is the outcome\n", " def Likelihood(self, data, hypo):\n", " return 1" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can create a `Dice` object and update it." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dice = Dice([4, 6, 8, 12])\n", "dice.Update(6)\n", "dice.Print()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we get more data, we can perform more updates." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for roll in [8, 7, 7, 5, 4]:\n", " dice.Update(roll)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are the results." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dice.Print()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The German tank problem\n", "---\n", "The German tank problem is actually identical to the dice problem." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class Tank(Suite):\n", " # hypo is the number of tanks\n", " # data is an observed serial number\n", " def Likelihood(self, data, hypo):\n", " if data > hypo:\n", " return 0\n", " else:\n", " return 1 / hypo" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are the posterior probabilities after seeing Tank #37." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tank = Tank(range(100))\n", "tank.Update(37)\n", "thinkplot.Pdf(tank)\n", "tank.Mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise 7:** Suppose we see another tank with serial number 17. What effect does this have on the posterior probabilities?\n", "\n", "Update the suite again with the new data and plot the results." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Euro problem\n", "---\n", "\n", "**Exercise 8:** Write a class definition for `Euro`, which extends `Suite` and defines a likelihood function that computes the probability of the data (heads or tails) for a given value of `x` (the probability of heads).\n", "\n", "Note that `hypo` is in the range 0 to 100. Here's an outline to get you started." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class Euro(Suite):\n", " \n", " def Likelihood(self, data, hypo):\n", " \"\"\" \n", " hypo is the prob of heads (0-100)\n", " data is a string, either 'H' or 'T'\n", " \"\"\"\n", " return 1" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll start with a uniform distribution from 0 to 100." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "euro = Euro(range(101))\n", "thinkplot.Pdf(euro)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can update with a single heads:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "euro.Update('H')\n", "thinkplot.Pdf(euro)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another heads:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "euro.Update('H')\n", "thinkplot.Pdf(euro)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And a tails:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "euro.Update('T')\n", "thinkplot.Pdf(euro)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Starting over, here's what it looks like after 7 heads and 3 tails." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "euro = Euro(range(101))\n", "\n", "for outcome in 'HHHHHHHTTT':\n", " euro.Update(outcome)\n", "\n", "thinkplot.Pdf(euro)\n", "euro.MaximumLikelihood()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The maximum posterior probability is 70%, which is the observed proportion.\n", "\n", "Here are the posterior probabilities after 140 heads and 110 tails." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "euro = Euro(range(101))\n", "\n", "evidence = 'H' * 140 + 'T' * 110\n", "for outcome in evidence:\n", " euro.Update(outcome)\n", " \n", "thinkplot.Pdf(euro)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The posterior mean s about 56%" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "euro.Mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So is the value with Maximum Aposteriori Probability (MAP)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "euro.MAP()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The posterior credible interval has a 90% chance of containing the true value (provided that the prior distribution truly represents our background knowledge)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "euro.CredibleInterval(90)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Swamping the prior\n", "\n", "The following function makes a Euro object with a triangle prior." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def TrianglePrior():\n", " \"\"\"Makes a Suite with a triangular prior.\"\"\"\n", " suite = Euro(label='triangle')\n", " for x in range(0, 51):\n", " suite[x] = x\n", " for x in range(51, 101):\n", " suite[x] = 100-x \n", " suite.Normalize()\n", " return suite" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And here's what it looks like:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "euro1 = Euro(range(101), label='uniform')\n", "euro2 = TrianglePrior()\n", "thinkplot.Pdfs([euro1, euro2])\n", "thinkplot.Config(title='Priors')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise 9:** Update euro1 and euro2 with the same data we used before (140 heads and 110 tails) and plot the posteriors. How big is the difference in the means?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 1 }