{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Bite Size Bayes\n", "\n", "Copyright 2020 Allen B. Downey\n", "\n", "License: [Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def prob(A):\n", " \"\"\"Computes the probability of a proposition, A.\n", " \n", " A: Boolean series\n", " \n", " returns: probability\n", " \"\"\"\n", " return A.mean()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def conditional(A, B):\n", " \"\"\"Conditional probability of A given B.\n", " \n", " A: Boolean series\n", " B: Boolean series\n", " \n", " returns: probability\n", " \"\"\"\n", " return prob(A[B])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Two coins\n", "\n", "> Suppose I flip two fair coins and tell you (honestly) that at least one of the coins is heads. What is the probability that both coins are heads?\n", "\n", "The answer is 1/3, and here's an argument that explains it.\n", "\n", "1. If you toss two coins, there are 4 equally likely outcomes: HH, HT, TH, TT\n", "\n", "2. If I tell you that at least one is heads, that eliminates TT. \n", "\n", "3. The remaining 3 outcomes are still equally likely, so their probability is now 1/3 each.\n", "\n", "4. Therefore, the probability of HH is now 1/3.\n", "\n", "However, you might still have some doubts. For me, Step 3 feels like an unsupported assertion: How do we know the 3 remaining outcomes are still equally likely?\n", "\n", "The following simulation might help convince you.\n", "\n", "First I'll generate two sets of coin flips." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "size = 10000\n", "first = np.random.choice(['H', 'T'], size)\n", "second = np.random.choice(['H', 'T'], size)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can confirm that each coin lands heads about 50% of the time:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "prob(first == 'H')" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "prob(second == 'H')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can compute a Boolean array that is `True` if either coin landed heads, or both." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "at_least_one = (first == 'H') | (second == 'H')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And we can confirm that happens about 75% of the time:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "prob(at_least_one)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can compute a Boolean array that is `True` if both coins landed heads." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "both = (first == 'H') & (second == 'H')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And confirm that it happens about 25% of the time." 
] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "prob(both)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we can compute the conditional probability of `both` given `at_least_one`:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "conditional(both, at_least_one)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Monty Hall problem\n", "\n", "From [Wikipedia](https://en.wikipedia.org/wiki/Monty_Hall_problem):\n", "\n", "> Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, \"Do you want to pick door No. 2?\" Is it to your advantage to switch your choice?\n", "\n", "To avoid ambiguities, we have to make some assumptions about the behavior of the host:\n", "\n", "1. The host never opens the door you picked.\n", "\n", "2. The host never opens the door with the car.\n", "\n", "3. If you choose the door with the car, the host chooses one of the other doors at random.\n", "\n", "4. The host always offers you the option to switch.\n", "\n", "Under these assumptions, are you better off sticking or switching?\n", "\n", "The correct answer is that you are better off switching. If you stick, you win 1/3 of the time. If you switch, you win 2/3 of the time.\n", "\n", "Here's one of many arguments that might persuade you.\n", "\n", "> If you always stick, you win if you initially choose the door with the car, so the probability is 1/3.\n", ">\n", "> If you always switch, you win if you did _not_ choose the door with the car, so the probability is 2/3.\n", "\n", "However, many people do not find any verbal arguments persuasive. So, maybe a simulation will help." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise:** Write a simulation that confirms that you are better off switching if the host opens Door 3." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bayes's Theorem\n", "\n", "In the previous two examples, you might have noticed a seeming contradiction:\n", "\n", "* In the coin example, we start with four hypotheses with equal probability; one of them is eliminated by the data, and I argue that the other three still have equal probability.\n", "\n", "* In the Monty Hall example, we start with three hypotheses with equal probability; one of them is eliminated by the data, but it turns out that the other two do _not_ have equal probability.\n", "\n", "When one hypothesis is eliminated, its probability is redistributed to the remaining hypotheses, but it seems like there is no general rule for _how_ it is redistributed.\n", "\n", "Fortunately, Bayes's Theorem resolves this contradiction; if we apply it carefully, it tells us exactly how the probability should be redistributed.\n", "\n", "First I'll solve the coin problem using a Bayes table. 
Again, we start with four hypotheses with equal prior probability. Just for fun, I'll use `Fraction` objects so the results are represented as rational numbers rather than floating-point." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "from fractions import Fraction\n", "\n", "hypos = ['HH', 'HT', 'TH', 'TT']\n", "table = pd.DataFrame(index=hypos)\n", "table['prior'] = Fraction(1, 4)\n", "table" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data, in this example, is my report that there is at least one heads. Assuming that I report honestly, we can compute the likelihood of the data under each hypothesis." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "table['likelihood'] = [1, 1, 1, 0]\n", "table" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And we fill in the rest of the table in the usual way." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "table['unnorm'] = table['prior'] * table['likelihood']\n", "prob_data = table['unnorm'].sum()\n", "table['posterior'] = table['unnorm'] / prob_data\n", "table" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this example the remaining hypotheses have the same posterior probability because the likelihood of the data is the same under any of them.\n", "\n", "For the Monty Hall problem, that is not the case. We'll start with three hypotheses, one for each door, and equal priors." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "hypos = ['Door 1', 'Door 2', 'Door 3']\n", "table = pd.DataFrame(index=hypos)\n", "table['prior'] = Fraction(1, 3)\n", "table" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, the data is that the host opens Door 3. So the question is: what is the probability of the data under each hypothesis? 
In reverse order:\n", "\n", "* Door 3: If the car is behind Door 3 and you choose Door 1, the host has no choice but to open Door 2, so the probability that he opens Door 3 is 0.\n", "\n", "* Door 2: If the car is behind Door 2 and you choose Door 1, the host has no choice but to open Door 3, so the probability that he opens Door 3 is 1.\n", "\n", "* Door 1: If the car is behind Door 1 and you choose Door 1, the host has a choice, and by assumption he chooses either Door 2 or Door 3 with equal probability, so the probability that he opens Door 3 is 1/2.\n", "\n", "That's all we need to fill in the likelihoods:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "table['likelihood'] = [Fraction(1,2), 1, 0]\n", "table" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And we fill in the rest of the table in the usual way." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "table['unnorm'] = table['prior'] * table['likelihood']\n", "prob_data = table['unnorm'].sum()\n", "table['posterior'] = table['unnorm'] / prob_data\n", "table" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this example the likelihood of the data is not the same under the remaining hypotheses, so the posterior probabilities are not the same." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise:** Here's a variation on the Monty Hall problem. Suppose that whenever the host has a choice, he opens Door 3. In that case, what are the posterior probabilities for the three doors?" 
] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise:** Suppose that whenever the host has a choice he chooses Door 3 with probability `p` and Door 2 with probability `1-p`. What are the posterior probabilities for the three doors?\n", "\n", "Hint: If you use SymPy to create a symbol for `p`, it will carry through the computation." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "from sympy import symbols\n", "\n", "p = symbols('p')" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 1 }