{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lecture 3: Birthday Problem, Properties of Probability\n", "\n", "\n", "## Stat 110, Prof. Joe Blitzstein, Harvard University\n", "\n", "----" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Birthday Problem\n", "Given $k$ people, what is the probability of at least 2 people having the same birthday?\n", "\n", "First, we need to define the problem:\n", "\n", "1. there are 365 days in a year (no leap-years)\n", "1. births can be on any day with equal probability (birthdays are independent of one another)\n", "1. treat people as distinguishable, because.\n", "\n", "$k \\le 1$ is meaningless, so we will not consider those cases.\n", "\n", "Now consider the case where you have more people than there are days in a year. In such a case, \n", "\n", "$$ P(k\\ge365) = 1$$\n", "\n", "Now think about the event of _no matches_. We can compute this probability using the naïve definition of probability:\n", "\n", "$$ P(\\text{no match}) = \\frac{365 \\times 364 \\times \\cdots \\times 365-k+1}{365^k} $$\n", "\n", "Now the event of _at least one match_ is the complement of _no matches_, so \n", "\n", "\\begin{align}\n", " P(\\text{at least one match}) &= 1 - P(\\text{no match}) \\\\\n", " &= 1 - \\frac{365 \\times 364 \\times \\cdots \\times 365-k+1}{365^k}\n", "\\end{align}\n", "\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "With k=23 people, the probability of a match is 0.507297, already exceeding 0.5.\n", "With k=50 people, the probability of a match is 0.970374, which is approaching 1.0.\n" ] } ], "source": [ "import numpy as np\n", "\n", "DAYS_IN_YEAR = 365\n", "\n", "def bday_prob(k):\n", " def no_match(k):\n", " days = [(DAYS_IN_YEAR-x) for x in range(k)]\n", " num = np.multiply.reduce(days, dtype=np.float64)\n", " return num / DAYS_IN_YEAR**k\n", " return 1.0 - no_match(k)\n", "\n", "print(\"With k=23 people, the probability of a match is {:0f}, already exceeding 0.5.\".format(bday_prob(23)))\n", "print(\"With k=50 people, the probability of a match is {:0f}, which is approaching 1.0.\".format(bday_prob(50)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Properties\n", "\n", "Let's derive some properties using nothing but the two axioms stated earlier.\n", "\n", "### Property 1\n", "> _The probability of an event $A$ is 1 minus the probability of that event's inverse (or complement)._\n", ">\n", "> \\begin\\{align\\}\n", "> P(A^{c}) &= 1 - P(A) \\\\\\\\\n", "> \\\\\\\\\n", "> \\because 1 &= P(S) \\\\\\\\\n", "> &= P(A \\cup A^{c}) \\\\\\\\\n", "> &= P(A) + P(A^{c}) & \\quad \\text{since } A \\cap A^{c} = \\emptyset ~~ \\blacksquare\n", "> \\end\\{align\\}\n", "\n", "### Property 2\n", "\n", "> _If $A$ is contained within $B$, then the probability of $A$ must be less than or equal to that for $B$._\n", ">\n", "> \\begin\\{align\\}\n", "> \\text{If } A &\\subseteq B \\text{, then } P(A) \\leq P(B) \\\\\\\\\n", "> \\\\\\\\\n", "> \\because B &= A \\cup ( B \\cap A^{c}) \\\\\\\\\n", "> P(B) &= P(A) + P(B \\cap A^{c}) \\\\\\\\\n", "> \\\\\\\\\n", "> \\implies P(B) &\\geq P(A) \\text{, since } P(B \\cap A^{c}) \\geq 0 & \\quad \\blacksquare\n", "> \\end\\{align\\}\n", "\n", "### Property 3, or the Inclusion/Exclusion Principle\n", "\n", "_The probability of a union of 2 events $A$ and $B$_\n", "\n", "> \\begin\\{align\\}\n", "> P(A \\cup B) &= P(A) + P(B) - P(A \\cap B) \\\\\\\\\n", "> \\\\\\\\\n", "> \\text{since } P(A \\cup B) &= P(A \\cup (B \\cap A^{c})) \\\\\\\\\n", "> &= P(A) + P(B \\cap A^{c}) \\\\\\\\ \n", "> \\\\\\\\\n", "> \\text{but note that } P(B) &= P(B \\cap A) + P(B \\cap A^{c}) \\\\\\\\\n", "> \\text{and since } P(B) - P(A \\cap B) &= P(B \\cap A^{c}) \\\\\\\\\n", "> \\\\\\\\\n", "> \\implies P(A \\cup B) &= P(A) + P(B) - P(A \\cap B) ~~~~ \\blacksquare\n", "> \\end\\{align\\}\n", "\n", "![title](images/L0301.png)\n", "\n", "This is the simplest case of the [principle of inclusion/exclusion](https://en.wikipedia.org/wiki/Inclusion%E2%80%93exclusion_principle).\n", "\n", "Considering the 3-event case, we have:\n", "\n", "> \\begin\\{align\\}\n", "> P(A \\cup B \\cup C) &= P(A) + P(B) + P(C) - P(A \\cap B) - P(B \\cap C) - P(A \\cap C) + P(A \\cap B \\cap C)\n", "> \\end\\{align\\}\n", ">\n", "> ...where we sum up all of the separate events;\n", "> and then subtract each of the pair-wise intersections;\n", "> and finally add back in that 3-event intersection since that was subtracted in the previous step.\n", "\n", "![title](images/L0302.png)\n", "\n", "For the general case, we have:\n", "\n", "$$\n", " P(A_1 \\cup A_2 \\cup \\cdots \\cup A_n) = \\sum_{j=1}^n P(A_{j}) - \\sum_{i