{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The double dice problem\n",
    "\n",
    "This notebook demonstrates a way of doing simple Bayesian updates using the table method, with a Pandas DataFrame as the table.\n",
    "\n",
    "Copyright 2018 Allen Downey\n",
    "\n",
    "MIT License: https://opensource.org/licenses/MIT\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Configure Jupyter so figures appear in the notebook\n",
    "%matplotlib inline\n",
    "\n",
    "# Configure Jupyter to display the assigned value after an assignment\n",
    "%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "from fractions import Fraction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The BayesTable class\n",
    "\n",
    "Here's the class that represents a Bayesian table: one row per hypothesis, with columns for the prior, the likelihood, the unnormalized posterior, and the posterior."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "class BayesTable(pd.DataFrame):\n",
    "    # One row per hypothesis; likelihoods are filled in by the caller\n",
    "    def __init__(self, hypo, prior=1, **options):\n",
    "        columns = ['hypo', 'prior', 'likelihood', 'unnorm', 'posterior']\n",
    "        super().__init__(columns=columns, **options)\n",
    "        self.hypo = hypo\n",
    "        self.prior = prior\n",
    "\n",
    "    def mult(self):\n",
    "        # Unnormalized posterior: prior times likelihood, row by row\n",
    "        self.unnorm = self.prior * self.likelihood\n",
    "\n",
    "    def norm(self):\n",
    "        # Divide by the total so the posteriors add up to 1\n",
    "        nc = np.sum(self.unnorm)\n",
    "        self.posterior = self.unnorm / nc\n",
    "        return nc\n",
    "\n",
    "    def update(self):\n",
    "        # Run both steps and return the normalizing constant\n",
    "        self.mult()\n",
    "        return self.norm()\n",
    "\n",
    "    def reset(self):\n",
    "        # New table whose priors are the current posteriors\n",
    "        return BayesTable(self.hypo, self.posterior)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The double dice problem\n",
    "\n",
    "Suppose I have a box that contains one each of 4-sided, 6-sided, 8-sided, and 12-sided dice. I choose a die at random and roll it twice\n",
    "without letting you see the die or the outcome. I report that I got\n",
    "the same outcome on both rolls.\n",
    "\n",
    "1) What is the posterior probability that I rolled each of the dice?\n",
    "\n",
    "\n",
    "2) If I roll the same die again, what is the probability that I get the same outcome a third time?\n",
    "\n",
    "**Solution**\n",
    "\n",
    "Here's a `BayesTable` that represents the four hypothetical dice."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "
<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\"><th></th><th>hypo</th><th>prior</th><th>likelihood</th><th>unnorm</th><th>posterior</th></tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr><th>0</th><td>4</td><td>1</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
       "    <tr><th>1</th><td>6</td><td>1</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
       "    <tr><th>2</th><td>8</td><td>1</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
       "    <tr><th>3</th><td>12</td><td>1</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   hypo  prior  likelihood  unnorm  posterior\n",
       "0     4      1         NaN     NaN        NaN\n",
       "1     6      1         NaN     NaN        NaN\n",
       "2     8      1         NaN     NaN        NaN\n",
       "3    12      1         NaN     NaN        NaN"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "hypo = [Fraction(sides) for sides in [4, 6, 8, 12]]\n",
    "table = BayesTable(hypo)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since we didn't specify prior probabilities, every hypothesis gets the default prior of 1, so the priors are equal. They don't have to be normalized, because we have to normalize the posteriors anyway.\n",
    "\n",
    "Now we can specify the likelihoods: if a die has `n` sides, the chance of getting the same outcome twice is `1/n`, because the first roll can be anything and the second roll matches it with probability `1/n`.\n",
    "\n",
    "So the likelihoods are:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "
<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\"><th></th><th>hypo</th><th>prior</th><th>likelihood</th><th>unnorm</th><th>posterior</th></tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr><th>0</th><td>4</td><td>1</td><td>1/4</td><td>NaN</td><td>NaN</td></tr>\n",
       "    <tr><th>1</th><td>6</td><td>1</td><td>1/6</td><td>NaN</td><td>NaN</td></tr>\n",
       "    <tr><th>2</th><td>8</td><td>1</td><td>1/8</td><td>NaN</td><td>NaN</td></tr>\n",
       "    <tr><th>3</th><td>12</td><td>1</td><td>1/12</td><td>NaN</td><td>NaN</td></tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   hypo  prior  likelihood  unnorm  posterior\n",
       "0     4      1         1/4     NaN        NaN\n",
       "1     6      1         1/6     NaN        NaN\n",
       "2     8      1         1/8     NaN        NaN\n",
       "3    12      1        1/12     NaN        NaN"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "table.likelihood = 1/table.hypo\n",
    "table"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can use `update` to compute the posterior probabilities:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "
<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\"><th></th><th>hypo</th><th>prior</th><th>likelihood</th><th>unnorm</th><th>posterior</th></tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr><th>0</th><td>4</td><td>1</td><td>1/4</td><td>1/4</td><td>2/5</td></tr>\n",
       "    <tr><th>1</th><td>6</td><td>1</td><td>1/6</td><td>1/6</td><td>4/15</td></tr>\n",
       "    <tr><th>2</th><td>8</td><td>1</td><td>1/8</td><td>1/8</td><td>1/5</td></tr>\n",
       "    <tr><th>3</th><td>12</td><td>1</td><td>1/12</td><td>1/12</td><td>2/15</td></tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   hypo  prior  likelihood  unnorm  posterior\n",
       "0     4      1         1/4     1/4        2/5\n",
       "1     6      1         1/6     1/6       4/15\n",
       "2     8      1         1/8     1/8        1/5\n",
       "3    12      1        1/12    1/12       2/15"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "table.update()\n",
    "table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    0.400000\n",
       "1    0.266667\n",
       "2    0.200000\n",
       "3    0.133333\n",
       "Name: posterior, dtype: float64"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "table.posterior.astype(float)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The 4-sided die is the most likely because doubles are more likely on a 4-sided die than on a 6-, 8-, or 12-sided die.\n",
    "\n",
    "\n",
    "### Part two\n",
    "\n",
    "The second part of the problem asks for the (posterior predictive) probability of getting the same outcome a third time, if we roll the same die again.\n",
    "\n",
    "If the die has `n` sides, the probability of getting the same value again is `1/n`, which should look familiar.\n",
    "\n",
    "To get the total probability of getting the same outcome, we compute the following product for each die and add up the results:\n",
    "\n",
    "```\n",
    "P(n | data) * P(same outcome | n)\n",
    "```\n",
    "\n",
    "The first term is the posterior probability; the second term is `1/n`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Fraction(13, 72)"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "total = 0\n",
    "for _, row in table.iterrows():\n",
    "    total += row.posterior / row.hypo\n",
    "\n",
    "total"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This calculation is similar to the first step of the update, so we can also compute it by\n",
    "\n",
    "1) Creating a new table with the posteriors from `table`.\n",
    "\n",
    "2) Adding the likelihood of getting the same outcome a third time.\n",
    "\n",
    "3) Computing the normalizing constant."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "
<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\"><th></th><th>hypo</th><th>prior</th><th>likelihood</th><th>unnorm</th><th>posterior</th></tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr><th>0</th><td>4</td><td>2/5</td><td>1/4</td><td>NaN</td><td>NaN</td></tr>\n",
       "    <tr><th>1</th><td>6</td><td>4/15</td><td>1/6</td><td>NaN</td><td>NaN</td></tr>\n",
       "    <tr><th>2</th><td>8</td><td>1/5</td><td>1/8</td><td>NaN</td><td>NaN</td></tr>\n",
       "    <tr><th>3</th><td>12</td><td>2/15</td><td>1/12</td><td>NaN</td><td>NaN</td></tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   hypo  prior  likelihood  unnorm  posterior\n",
       "0     4    2/5         1/4     NaN        NaN\n",
       "1     6   4/15         1/6     NaN        NaN\n",
       "2     8    1/5         1/8     NaN        NaN\n",
       "3    12   2/15        1/12     NaN        NaN"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "table2 = table.reset()\n",
    "table2.likelihood = 1/table2.hypo\n",
    "table2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Fraction(13, 72)"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "table2.update()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "
<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\"><th></th><th>hypo</th><th>prior</th><th>likelihood</th><th>unnorm</th><th>posterior</th></tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr><th>0</th><td>4</td><td>2/5</td><td>1/4</td><td>1/10</td><td>36/65</td></tr>\n",
       "    <tr><th>1</th><td>6</td><td>4/15</td><td>1/6</td><td>2/45</td><td>16/65</td></tr>\n",
       "    <tr><th>2</th><td>8</td><td>1/5</td><td>1/8</td><td>1/40</td><td>9/65</td></tr>\n",
       "    <tr><th>3</th><td>12</td><td>2/15</td><td>1/12</td><td>1/90</td><td>4/65</td></tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   hypo  prior  likelihood  unnorm  posterior\n",
       "0     4    2/5         1/4    1/10      36/65\n",
       "1     6   4/15         1/6    2/45      16/65\n",
       "2     8    1/5         1/8    1/40       9/65\n",
       "3    12   2/15        1/12    1/90       4/65"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "table2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The posterior column of `table2` is the same as the posterior we would get after seeing the same outcome three times."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This example demonstrates a general truth: to compute the predictive probability of an event, you can pretend you saw the event, do a Bayesian update, and record the normalizing constant.\n",
    "\n",
    "(With one caveat: this only works if your priors are normalized. Here they are, because the priors in `table2` are the normalized posteriors from `table`.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}