{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Bayes' theorem"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$ P(A \\mid B) = \\frac{P(B \\mid A) P(A)}{P(B)}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Problem statement\n",
    "\n",
    "- A drug test where the probability of a user testing positive is 0.99 and the probability of a non-user testing negative is also 0.99.\n",
    "- If someone tests positive, what's the probability of them being a user?\n",
    "- Need to know the level of users in the population, suppose it's 0.5%."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Events, outcomes, probabilities\n",
    "\n",
    "![Deck of cards](https://i.imgur.com/1THqdvh.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- A probability is a number between zero and one inclusive: $p \\in [0,1]$.\n",
    "- Start with a set of elements called possible outcomes.\n",
    "- Experiment is the selection of one outcome.\n",
    "- Event is a subset of possible outcomes.\n",
    "- An event occurs if the selected outcome is in the subset.\n",
    "- Probability of an event is number of possible outcomes in event divided by the total number.\n",
    "- Watch out, sets can be infinite and/or uncountable."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Simulation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0\n",
      "9\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "print(np.random.binomial(1, 0.01))\n",
    "\n",
    "# https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.binomial.html\n",
    "x = np.random.binomial(1, 0.01, 1000)\n",
    "print(np.sum(x))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Helper function, returns True with probability P, False otherwise.\n",
    "def true_with_prob_p(p):\n",
    "    return True if np.random.binomial(1, p) == 1 else False\n",
    "\n",
    "# Simulate the selection of a random person from the population.\n",
    "# Return True if they are a drug user, False otherwise.\n",
    "# True is returned with probability 0.005.\n",
    "def select_random_person():\n",
    "    return true_with_prob_p(0.005)\n",
    "\n",
    "# Simulate the testing of a person from the population.\n",
    "# Return True if they test positive, False otherwise.\n",
    "# Non-users test positive with probability 0.01.\n",
    "# Users test positive with probability 0.99.\n",
    "def test_person(user):\n",
    "    if user:\n",
    "        return true_with_prob_p(0.99)\n",
    "    else:\n",
    "        return true_with_prob_p(0.01)\n",
    "    \n",
    "# Run an experiment - take a random person from the population\n",
    "# and test whether or not they are positive.\n",
    "def run_experiment():\n",
    "    user = select_random_person()\n",
    "    test = test_person(user)\n",
    "    return (user, test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Run the experiemnt 10,000 times.\n",
    "y = [run_experiment() for i in range(10000)]\n",
    "\n",
    "# Count the number of users who tested positive.\n",
    "user_and_positive = [True for i in y if i[0] == True and i[1] == True]\n",
    "\n",
    "# Count the number of non-users who tested positive.\n",
    "nonuser_and_positive = [True  for i in y if i[0] == False and i[1] == True]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "52"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.sum(user_and_positive)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "98"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.sum(nonuser_and_positive)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Text(0.5,1,'People who tested positive')"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "\n",
    "plt.bar([0, 1], [np.sum(user_and_positive), np.sum(nonuser_and_positive)])\n",
    "plt.xticks([0, 1], ('Users', ('Non-Users')))\n",
    "plt.title(\"People who tested positive\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Analysis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$ P(User \\mid Positive) = \\frac{P(Positive \\mid User) P(User)}{P(Positive)} =  \\frac{P(Positive \\mid User) P(User)}{P(Positive \\mid User)P(User) + P(Positive \\mid Nonuser)P(Nonuser)}$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.33221476510067094\n"
     ]
    }
   ],
   "source": [
    "# Probability that you're a user.\n",
    "p_user = 0.005\n",
    "\n",
    "# Probability that you're a non-user.\n",
    "p_nonuser = 1 - p_user\n",
    "\n",
    "# Probability that a user tests positive.\n",
    "p_positive_user = 0.99\n",
    "\n",
    "# Probability that a non-user tests negative.\n",
    "p_positive_nonuser = 1.0 - 0.99\n",
    "\n",
    "# Probability that you test positive.\n",
    "p_positive = p_positive_user * p_user + p_positive_nonuser * p_nonuser\n",
    "\n",
    "# Bayes' theorem.\n",
    "top_line = p_positive_user * p_user\n",
    "bottom_line = p_positive\n",
    "p_user_positive = top_line / bottom_line\n",
    "\n",
    "# Show result.\n",
    "print(p_user_positive)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## End"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}