{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lecture 12: Discrete vs Continuous Distributions\n", "\n", "\n", "## Stat 110, Prof. Joe Blitzstein, Harvard University\n", "\n", "----" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Discrete vs Continuous Random Variables\n", "\n", "So, we've completed our whirlwind introduction of discrete random variables, covering the following:\n", "\n", "1. Bernoulli\n", "1. Binomial\n", "1. Hypergeometric\n", "1. Geometric\n", "1. Negative Binomial\n", "1. Poisson\n", "\n", "As we now move into continuous random variables, let's create a table to compare/contrast important random variable properties and concepts.\n", "\n", "| Discrete | Continuous |\n", "| ------------- |---------------|\n", "| $X$ | $X$ |\n", "| PMF $P(X=x)$ | PDF $f_X(x) = F^\\prime(x)$; note $P(X=x)=0$ |\n", "| CDF $F_X(x)=P(X \\le x)$| CDF $F_X(x) = P(X \\le x)$ |\n", "| $\\mathbb{E}(X) = \\sum_{x} x P(X=x)$ | $\\mathbb{E}(X) = \\int_{-\\infty}^{\\infty} x f(x)dx$ |\n", "| $Var(X) = \\mathbb{E}X^2 - (\\mathbb{E}X)^2$ | $Var(X) = \\mathbb{E}X^2 - (\\mathbb{E}X)^2$ |\n", "| $SD(X) = \\sqrt{Var(X)}$ | $SD(X) = \\sqrt{Var(X)}$ |\n", "| LOTUS $\\mathbb{E}(g(X)) = \\sum_{x} g(x) P(X=x)$ | LOTUS $\\mathbb{E}(g(X)) = \\int_{-\\infty}^{\\infty} g(x) f(x)dx$ |\n", "\n", "----" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Probability Density Function\n", "\n", "In the _discrete_ case, we calculate a probability by __summing__ the PMF over the values of interest, since each value carries a bit of probability mass and the total mass is 1.\n", "\n", "But we cannot count individual values in the _continuous_ case, since any single point has probability 0. Instead, we __integrate__ the density function over a range: the probability is the _area_ under the density over that range.\n", "\n", "#### Definition: probability density function\n", "\n", "> A __random variable__ $X$ has PDF $f(x)$ if for all $a \\le b$\n", ">\n", "> \\\\begin{align}\n", "> & P(a \\le X \\le b) = \\int_a^b f(x) dx \\\\\\\\\n", "> \\\\end{align}\n", "\n", "### Test for validity\n", "\n", "Note that to be a valid PDF, $f$ must satisfy\n", "\n", "1. $f(x) \\ge 0$ for all $x$\n", "1. $\\int_{-\\infty}^{\\infty} f(x) dx = 1$\n", "\n", "\n", "### Probability at a _point_ is 0\n", "\n", "- $a = b \\Rightarrow P(X = a) = \\int_{a}^{a} f(x)dx = 0$\n", "\n", "\n", "### Density $\\times$ Length \n", "\n", "But for some point $x_0$ and some _very, very small value_ $\\epsilon$, the probability of landing near $x_0$ is approximately density times length: $f(x_0) \\epsilon \\approx P(X \\in (x_0-\\frac{\\epsilon}{2}, x_0+\\frac{\\epsilon}{2}))$\n", "\n", "\n", "\n", "----" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cumulative Distribution Function\n", "\n", "### Deriving CDF from PDF\n", "\n", "If continuous r.v. 
$X$ has PDF $f$, the CDF is\n", "\n", "\\begin{align}\n", " F(x) &= P(X \\le x) \\\\\n", " &= \\int_{-\\infty}^{x} f(t) dt \\\\\n", " \\\\\n", " \\Rightarrow P(a \\le X \\le b) &= \\int_{a}^{b} f(x)dx \\\\\n", " &= F(b) - F(a) \\\\\n", "\\end{align}\n", "\n", "### Deriving PDF from CDF\n", "\n", "If continuous r.v. $X$ has CDF $F$, the PDF is\n", "\n", "\\begin{align}\n", " f(x) &= F^\\prime(x) & &\\text{ by the Fundamental Theorem of Calculus} \\\\\n", "\\end{align}\n", "\n", "----" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Variance\n", "\n", "The mean only tells you where the center of the distribution is. Another useful statistic is the _variance_ of a random variable, which tells you how the random variable is spread out around the mean.\n", "\n", "In other words, _variance_ answers the question _How far is $X$ from its mean, on average?_\n", "\n", "#### Definition: variance\n", "> Variance is a measure of how a random variable is spread about its mean.\n", ">\n", "> \\\\begin{align}\n", "> \\operatorname{Var}(X) &= \\mathbb{E}(X - \\mathbb{E}X)^2 & \\quad \\text{or alternatively} \\\\\\\\\n", "> \\\\\\\\\n", "> &= \\mathbb{E}(X^2 - 2X(\\mathbb{E}X) + (\\mathbb{E}X)^2) & \\quad \\text{expanding the square}\\\\\\\\\n", "> &= \\mathbb{E}X^2 - 2(\\mathbb{E}X)(\\mathbb{E}X) + (\\mathbb{E}X)^2 & \\quad \\text{by Linearity}\\\\\\\\\n", "> &= \\boxed{\\mathbb{E}X^2 - (\\mathbb{E}X)^2}\n", "> \\\\end{align}\n", "\n", "Sometimes the second (boxed) form of variance is easier to use.\n", "\n", "Note that the formula for variance is the same for both discrete and continuous r.v.\n", "\n", "But you might be wondering right now how to calculate $\\mathbb{E}X^2$. We will get to that in a bit...\n", "\n", "----" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Standard Deviation\n", "\n", "But note that variance is expressed in terms of units __squared__. 
_Standard deviation_ is sometimes easier to use than variance, as it is given in the original units.\n", "\n", "#### Definition: standard deviation\n", "> The _standard deviation_ is the square root of the variance.\n", ">\n", "> \\\\begin{align} \n", "> SD(X) &= \\sqrt{\\operatorname{Var}(X)}\n", "> \\\\end{align}\n", "\n", "Note that like variance, the formula for standard deviation is the same for both discrete and continuous r.v.\n", "\n", "----" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Uniform Distribution\n", "\n", "### Description\n", "\n", "The uniform is the simplest and perhaps the most famous continuous distribution. Given starting point $a$ and ending point $b$, probability $\\propto$ length for any subinterval of $[a,b]$.\n", "\n", "![title](images/L1201.png)\n", "\n", "### Notation\n", "\n", "$X \\sim \\operatorname{Unif}(a,b)$\n", "\n", "### Parameters\n", "\n", "- $a$ start of the segment, $a < b$\n", "- $b$ end of the segment, $b > a$\n", "\n", "### Probability density function\n", "\n", "\\begin{align}\n", " f(x) &= \n", " \\begin{cases}\n", " c & \\quad \\text{ if } a \\le x \\le b \\\\\n", " 0 & \\quad \\text{ otherwise }\n", " \\end{cases} \\\\\n", " \\\\\n", " \\\\\n", " 1 &= \\int_{a}^{b} c dx \\\\\n", " \\Rightarrow c &= \\boxed{\\frac{1}{b-a}}\n", "\\end{align}\n", "\n", "### Cumulative distribution function\n", "\n", "\\begin{align}\n", " F(x) &= \\int_{-\\infty}^{x} f(t)dt \\\\\n", " &= \n", " \\begin{cases}\n", " 0 & \\quad \\text{if } x \\lt a \\\\\n", " \\int_{a}^{x} \\frac{1}{b-a} dt = \\frac{x-a}{b-a} & \\quad \\text{if } a \\le x \\le b \\\\\n", " 1 & \\quad \\text{if } x \\gt b\n", " \\end{cases} \\\\\n", "\\end{align}\n", "\n", "So this means that as $x$ increases from $a$ to $b$, $F(x)$ increases in a _linear_ fashion.\n", "\n", "### Expected value\n", "\n", "For a continuous r.v., we integrate $x$ against the PDF:\n", "\n", "\\begin{align}\n", " \\mathbb{E}(X) &= \\int_{a}^{b} x \\frac{1}{b-a} dx \\\\\n", " &=\\left. 
\\frac{x^2}{2(b-a)} ~~ \\right\\vert_{a}^{b} \\\\\n", " &= \\frac{(b^2-a^2)}{2(b-a)} \\\\\n", " &= \\boxed{\\frac{b+a}{2}}\n", "\\end{align}\n", "\n", "### Variance\n", "\n", "Remember that lingering question about $\\mathbb{E}X^2$?\n", "\n", "Let random variable $Y = X^2$.\n", "\n", "\\begin{align}\n", " \\mathbb{E}X^2 &= \\mathbb{E}(Y) \\\\\n", " &\\stackrel{?}{=} \\int_{-\\infty}^{\\infty} x^2 f(x) dx & &\\text{or do we need the PDF of } Y \\text{ first?}\n", "\\end{align}\n", "\n", "----" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Law of the Unconscious Statistician (LOTUS)\n", "\n", "Actually, that last bit of wishful thinking is correct and will work in both the discrete and continuous cases: we can use the distribution of $X$ directly, without first finding the distribution of $g(X)$. \n", "\n", "In general for continuous r.v. $X$ with PDF $f$\n", "\n", "\\begin{align}\n", " \\mathbb{E}( g(X) ) = \\int_{-\\infty}^{\\infty} g(x) f(x)dx\n", "\\end{align}\n", "\n", "And likewise for discrete r.v.\n", "\n", "\\begin{align}\n", " \\mathbb{E}(g(X)) = \\sum_{x} g(x) P(X=x)\n", "\\end{align}\n", "\n", "### Variance of $U \\sim \\operatorname{Unif}(0,1)$\n", "\n", "\\begin{align}\n", " \\mathbb{E}(U) &= \\frac{a+b}{2} = \\frac{0+1}{2} \\\\\n", " &= \\frac{1}{2} \\\\\n", " \\\\\n", " \\\\\n", " \\mathbb{E}U^2 &= \\int_{0}^{1} u^2 \\underbrace{f(u)}_{1} du \\\\\n", " &= \\left.\\frac{u^3}{3} ~~ \\right\\vert_{0}^{1} \\\\\n", " &= \\frac{1}{3} \\\\\n", " \\\\\n", " \\\\\n", " \\Rightarrow Var(U) &= \\mathbb{E}U^2 - (\\mathbb{E}U)^2 \\\\\n", " &= \\frac{1}{3} - \\left(\\frac{1}{2}\\right)^2 \\\\\n", " &= \\frac{1}{12}\n", "\\end{align}\n", "\n", "----" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Universality of the Uniform\n", "\n", "Given an arbitrary CDF $F$ and the uniform $U \\sim \\operatorname{Unif}(0,1)$, it is possible to simulate a draw from the continuous distribution with CDF $F$.\n", "\n", "Assume:\n", "\n", "1. $F$ is strictly increasing\n", "1. $F$ is continuous as a function\n", "\n", "Define $X = F^{-1}(U)$. 
Then $X \\sim F$.\n", "\n", "\\begin{align}\n", " P(X \\le x) &= P(F^{-1}(U) \\le x) \\\\\n", " &= P(U \\le F(x)) & \\quad \\text{ since } F \\text{ is strictly increasing} \\\\\n", " &= F(x) & \\quad \\text{ since } P(U \\le u) = u \\text{ for } 0 \\le u \\le 1 ~~ \\blacksquare\n", "\\end{align}\n", "\n", "![title](images/L1202.png)\n", "\n", "----" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "View [Lecture 12: Discrete vs. Continuous, the Uniform | Statistics 110](http://bit.ly/2wex5yh) on YouTube." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.3" } }, "nbformat": 4, "nbformat_minor": 1 }