{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Bayesian Statistics Made Simple\n",
    "===\n",
    "\n",
    "Code and exercises from my workshop on Bayesian statistics in Python.\n",
    "\n",
    "Copyright 2016 Allen Downey\n",
    "\n",
    "MIT License: https://opensource.org/licenses/MIT"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# If we're running on Colab, install empiricaldist\n",
    "# https://pypi.org/project/empiricaldist/\n",
    "\n",
    "import sys\n",
    "IN_COLAB = 'google.colab' in sys.modules\n",
    "\n",
    "if IN_COLAB:\n",
    "    !pip install empiricaldist"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "import seaborn as sns\n",
    "sns.set_style('white')\n",
    "sns.set_context('talk')\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "from empiricaldist import Pmf"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The Euro problem\n",
    "\n",
    "*\"When spun on edge 250 times, a Belgian one-euro coin came up heads 140 times and tails 110.  'It looks very suspicious to me,' said Barry Blight, a statistics lecturer at the London School of Economics.  'If the coin were unbiased, the chance of getting a result as extreme as that would be less than 7%.' \"*\n",
    "\n",
    "From “The Guardian” quoted by MacKay, *Information Theory, Inference, and Learning Algorithms*.\n",
    "\n",
    "\n",
    "**Exercise 1:**  Write a function called `likelihood_euro` that defines the likelihood function for the Euro problem.  Note that `hypo` is in the range 0 to 100.\n",
    "\n",
    "Here's an outline to get you started."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "def likelihood_euro(data, hypo):\n",
    "    \"\"\" Likelihood function for the Euro problem.\n",
    "    \n",
    "    data: string, either 'H' or 'T'\n",
    "    hypo: prob of heads (0-100)\n",
    "    \n",
    "    returns: float probability\n",
    "    \"\"\"\n",
    "    # TODO: fill this in!\n",
    "    return 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For the prior, we'll start with a uniform distribution from 0 to 100."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "def decorate_euro(title):\n",
    "    \"\"\"Labels the axes.\n",
    "    \n",
    "    title: string\n",
    "    \"\"\"\n",
    "    plt.xlabel('Probability of heads')\n",
    "    plt.ylabel('PMF')\n",
    "    plt.title(title)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "euro = Pmf.from_seq(range(101))\n",
    "euro.plot()\n",
    "decorate_euro('Prior distribution')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can update with a single heads:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "euro.update(likelihood_euro, 'H')\n",
    "euro.plot()\n",
    "decorate_euro('Posterior distribution, one heads')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another heads:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "euro.update(likelihood_euro, 'H')\n",
    "euro.plot()\n",
    "decorate_euro('Posterior distribution, two heads')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And a tails:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "euro.update(likelihood_euro, 'T')\n",
    "euro.plot()\n",
    "decorate_euro('Posterior distribution, HHT')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Starting over, here's what it looks like after 7 heads and 3 tails."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "euro = Pmf.from_seq(range(101))\n",
    "\n",
    "for outcome in 'HHHHHHHTTT':\n",
    "    euro.update(likelihood_euro, outcome)\n",
    "\n",
    "euro.plot()\n",
    "decorate_euro('Posterior distribution, 7 heads, 3 tails')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The maximum apostiori probability (MAP) is 70%, which is the observed proportion."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "euro.max_prob()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here are the posterior probabilities after 140 heads and 110 tails."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "euro = Pmf.from_seq(range(101))\n",
    "\n",
    "evidence = 'H' * 140 + 'T' * 110\n",
    "for outcome in evidence:\n",
    "    euro.update(likelihood_euro, outcome)\n",
    "    \n",
    "euro.plot()\n",
    "\n",
    "decorate_euro('Posterior distribution, 140 heads, 110 tails')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The posterior mean is about 56%"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "euro.mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So is the MAP."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "euro.max_prob()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And the median (50th percentile)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "euro.quantile(0.5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The posterior credible interval has a 90% chance of containing the true value (provided that the prior distribution truly represents our background knowledge)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "euro.credible_interval(0.9)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Swamping the prior\n",
    "\n",
    "The following function makes a Euro object with a triangle prior."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "def TrianglePrior():\n",
    "    \"\"\"Makes a Suite with a triangular prior.\n",
    "    \"\"\"\n",
    "    suite = Pmf(name='triangle')\n",
    "    for x in range(0, 51):\n",
    "        suite[x] = x\n",
    "    for x in range(51, 101):\n",
    "        suite[x] = 100-x \n",
    "    suite.normalize()\n",
    "    return suite"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And here's what it looks like:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "euro1 = Pmf.from_seq(range(101), name='uniform')\n",
    "euro1.plot()\n",
    "\n",
    "euro2 = TrianglePrior()\n",
    "euro2.plot()\n",
    "\n",
    "plt.legend()\n",
    "decorate_euro('Prior distributions')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercise 9:** Update `euro1` and `euro2` with the same data we used before (140 heads and 110 tails) and plot the posteriors.  How big is the difference in the means?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The posterior distributions are not identical, but with this data, they converge to the point where there is no practical difference, for most purposes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}