{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Bayesian Statistics Made Simple\n", "===\n", "\n", "Code and exercises from my workshop on Bayesian statistics in Python.\n", "\n", "Copyright 2016 Allen Downey\n", "\n", "MIT License: https://opensource.org/licenses/MIT" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# If we're running on Colab, install empiricaldist\n", "# https://pypi.org/project/empiricaldist/\n", "\n", "import sys\n", "IN_COLAB = 'google.colab' in sys.modules\n", "\n", "if IN_COLAB:\n", " !pip install empiricaldist" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "\n", "import seaborn as sns\n", "sns.set_style('white')\n", "sns.set_context('talk')\n", "\n", "import matplotlib.pyplot as plt\n", "\n", "from empiricaldist import Pmf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The Euro problem\n", "\n", "*\"When spun on edge 250 times, a Belgian one-euro coin came up heads 140 times and tails 110. 'It looks very suspicious to me,' said Barry Blight, a statistics lecturer at the London School of Economics. 'If the coin were unbiased, the chance of getting a result as extreme as that would be less than 7%.' \"*\n", "\n", "From “The Guardian” quoted by MacKay, *Information Theory, Inference, and Learning Algorithms*.\n", "\n", "\n", "**Exercise 1:** Write a function called `likelihood_euro` that defines the likelihood function for the Euro problem. Note that `hypo` is in the range 0 to 100.\n", "\n", "Here's an outline to get you started." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def likelihood_euro(data, hypo):\n", " \"\"\" Likelihood function for the Euro problem.\n", " \n", " data: string, either 'H' or 'T'\n", " hypo: prob of heads (0-100)\n", " \n", " returns: float probability\n", " \"\"\"\n", " # TODO: fill this in!\n", " return 1" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the prior, we'll start with a uniform distribution from 0 to 100." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "def decorate_euro(title):\n", " \"\"\"Labels the axes.\n", " \n", " title: string\n", " \"\"\"\n", " plt.xlabel('Probability of heads')\n", " plt.ylabel('PMF')\n", " plt.title(title)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "euro = Pmf.from_seq(range(101))\n", "euro.plot()\n", "decorate_euro('Prior distribution')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can update with a single heads:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "euro.update(likelihood_euro, 'H')\n", "euro.plot()\n", "decorate_euro('Posterior distribution, one heads')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another heads:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "euro.update(likelihood_euro, 'H')\n", "euro.plot()\n", "decorate_euro('Posterior distribution, two heads')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And a tails:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "euro.update(likelihood_euro, 'T')\n", "euro.plot()\n", "decorate_euro('Posterior distribution, HHT')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Starting over, here's what it looks like after 7 heads and 3 tails." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "euro = Pmf.from_seq(range(101))\n", "\n", "for outcome in 'HHHHHHHTTT':\n", " euro.update(likelihood_euro, outcome)\n", "\n", "euro.plot()\n", "decorate_euro('Posterior distribution, 7 heads, 3 tails')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The maximum apostiori probability (MAP) is 70%, which is the observed proportion." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "euro.max_prob()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are the posterior probabilities after 140 heads and 110 tails." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "euro = Pmf.from_seq(range(101))\n", "\n", "evidence = 'H' * 140 + 'T' * 110\n", "for outcome in evidence:\n", " euro.update(likelihood_euro, outcome)\n", " \n", "euro.plot()\n", "\n", "decorate_euro('Posterior distribution, 140 heads, 110 tails')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The posterior mean is about 56%" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "euro.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So is the MAP." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "euro.max_prob()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And the median (50th percentile)." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "euro.quantile(0.5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The posterior credible interval has a 90% chance of containing the true value (provided that the prior distribution truly represents our background knowledge)." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "euro.credible_interval(0.9)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Swamping the prior\n", "\n", "The following function makes a Euro object with a triangle prior." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "def TrianglePrior():\n", " \"\"\"Makes a Suite with a triangular prior.\n", " \"\"\"\n", " suite = Pmf(name='triangle')\n", " for x in range(0, 51):\n", " suite[x] = x\n", " for x in range(51, 101):\n", " suite[x] = 100-x \n", " suite.normalize()\n", " return suite" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And here's what it looks like:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "euro1 = Pmf.from_seq(range(101), name='uniform')\n", "euro1.plot()\n", "\n", "euro2 = TrianglePrior()\n", "euro2.plot()\n", "\n", "plt.legend()\n", "decorate_euro('Prior distributions')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise 9:** Update `euro1` and `euro2` with the same data we used before (140 heads and 110 tails) and plot the posteriors. How big is the difference in the means?" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The posterior distributions are not identical, but with this data, they converge to the point where there is no practical difference, for most purposes." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 1 }