{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Bayes' theorem" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$ P(A \\mid B) = \\frac{P(B \\mid A) P(A)}{P(B)}$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem statement\n", "\n", "- Consider a drug test where a user tests positive with probability 0.99 and a non-user tests negative with probability 0.99.\n", "- If someone tests positive, what is the probability that they are a user?\n", "- To answer, we need the proportion of users in the population; suppose it is 0.5%." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Events, outcomes, probabilities\n", "\n", "![Deck of cards](https://i.imgur.com/1THqdvh.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- A probability is a number between zero and one inclusive: $p \\in [0,1]$.\n", "- Start with a set of elements called possible outcomes.\n", "- An experiment is the selection of one outcome.\n", "- An event is a subset of the possible outcomes.\n", "- An event occurs if the selected outcome is in the subset.\n", "- When the outcomes are finite in number and equally likely, the probability of an event is the number of outcomes in the event divided by the total number of possible outcomes.\n", "- Watch out: sets of outcomes can be infinite and/or uncountable, in which case this counting rule does not apply directly." 
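, "\n", "\n", "As a small worked example of the rule that an event's probability is its number of outcomes divided by the total number, assume a standard 52-card deck in which every card is equally likely to be drawn (an assumption made for illustration; the deck is only pictured above). The event of drawing an ace then contains four of the 52 possible outcomes:\n", "\n", "$$ P(\\text{Ace}) = \\frac{4}{52} = \\frac{1}{13} \\approx 0.077 $$"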
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Simulation" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n", "9\n" ] } ], "source": [ "import numpy as np\n", "\n", "# A single trial that succeeds (returns 1) with probability 0.01.\n", "print(np.random.binomial(1, 0.01))\n", "\n", "# 1,000 such trials; the sum counts the successes.\n", "# https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.binomial.html\n", "x = np.random.binomial(1, 0.01, 1000)\n", "print(np.sum(x))" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Helper function, returns True with probability p, False otherwise.\n", "def true_with_prob_p(p):\n", "    return bool(np.random.binomial(1, p) == 1)\n", "\n", "# Simulate the selection of a random person from the population.\n", "# Return True if they are a drug user, False otherwise.\n", "# True is returned with probability 0.005.\n", "def select_random_person():\n", "    return true_with_prob_p(0.005)\n", "\n", "# Simulate the testing of a person from the population.\n", "# Return True if they test positive, False otherwise.\n", "# Non-users test positive with probability 0.01.\n", "# Users test positive with probability 0.99.\n", "def test_person(user):\n", "    if user:\n", "        return true_with_prob_p(0.99)\n", "    else:\n", "        return true_with_prob_p(0.01)\n", "\n", "# Run an experiment - take a random person from the population\n", "# and test whether or not they are positive.\n", "# Returns a (user, test) pair of booleans.\n", "def run_experiment():\n", "    user = select_random_person()\n", "    test = test_person(user)\n", "    return (user, test)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Run the experiment 10,000 times.\n", "y = [run_experiment() for _ in range(10000)]\n", "\n", "# Collect one entry for each user who tested positive.\n", "user_and_positive = [True for i in y if i[0] and i[1]]\n", "\n", "# Collect one entry for each non-user who tested positive.\n", "nonuser_and_positive = [True 
for i in y if not i[0] and i[1]]" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "52" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.sum(user_and_positive)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "98" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.sum(nonuser_and_positive)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Text(0.5,1,'People who tested positive')" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import matplotlib.pyplot as plt\n", "\n", "# Bar chart of positive tests, split into users and non-users.\n", "plt.bar([0, 1], [np.sum(user_and_positive), np.sum(nonuser_and_positive)])\n", "plt.xticks([0, 1], ('Users', 'Non-Users'))\n", "plt.title(\"People who tested positive\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Analysis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$ P(\\text{User} \\mid \\text{Positive}) = \\frac{P(\\text{Positive} \\mid \\text{User}) P(\\text{User})}{P(\\text{Positive})} = \\frac{P(\\text{Positive} \\mid \\text{User}) P(\\text{User})}{P(\\text{Positive} \\mid \\text{User}) P(\\text{User}) + P(\\text{Positive} \\mid \\text{Nonuser}) P(\\text{Nonuser})}$$" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.33221476510067094\n" ] } ], "source": [ "# Probability that you're a user.\n", "p_user = 0.005\n", "\n", "# Probability that you're a non-user.\n", "p_nonuser = 1 - p_user\n", "\n", "# Probability that a user tests positive.\n", "p_positive_user = 0.99\n", "\n", "# Probability that a non-user tests positive,\n", "# i.e. one minus the probability that a non-user tests negative.\n", "p_positive_nonuser = 1.0 - 0.99\n", "\n", "# Probability that you test positive (law of total probability).\n", "p_positive = p_positive_user * p_user + p_positive_nonuser * p_nonuser\n", "\n", "# Bayes' theorem.\n", "top_line = p_positive_user * p_user\n", "bottom_line = 
p_positive\n", "p_user_positive = top_line / bottom_line\n", "\n", "# Show result.\n", "print(p_user_positive)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## End" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 4 }