{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lecture 20: Expected distance between Normals, Multinomial, Cauchy\n", "\n", "\n", "## Stat 110, Prof. Joe Blitzstein, Harvard University\n", "\n", "----" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In lecture 19, we saw how to obtain the expected distance between two uniform r.v. How would we do the same, but with the _Normal distribution_?\n", "\n", "### Example: expected distance between 2 Normally distributed random points $\\mathbb{E} | Z_1 - Z_2 |$\n", "\n", "Given $Z_1,Z_2 \\sim \\mathcal{N}(0,1)$, $Z_1$ and $Z_2$ are i.i.d., can we find $\\mathbb{E} | Z_1 - Z_2 |$?\n", "\n", "Now, we could just jump in and try using 2D LOTUS, but let's take a step back and recall what we have seen earlier in Lecture 14 about the _linearity of Normals_, and see if we can apply what we know about MGFs." ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Theorem: If $X \\sim \\mathcal{N}(\\mu_1, \\sigma_1^2)$ and $Y \\sim \\mathcal{N}(\\mu_2, \\sigma_2^2)$ are *independent*, then $X + Y \\sim \\mathcal{N}(\\mu_1 + \\mu_2, \\sigma_1^2 + \\sigma_2^2)$\n", "\n", "Recall from Lecture 17 that the MGF of a sum of independent r.v. is just the product of their respective MGF.\n", "\n", "\\begin{align}\n", " M(X + Y) &= \\mathbb{E}(e^{t(X+Y)}) \\\\\n", " &= \\mathbb{E}(e^{tX}) \\, \\mathbb{E}(e^{tY}) \\\\\n", " &= \\mathbb{E}(e^{\\mu_1 t + \\frac{1}{2} \\sigma_1^2 t^2 }) \\, \\mathbb{E}(e^{\\mu_2 t + \\frac{1}{2} \\sigma_2^2 t^2 }) \\\\\n", " &= \\mathbb{E} \\left( e^{(\\mu_1 + \\mu_2) t + \\frac{1}{2} (\\sigma_1^2 + \\sigma_2^2) t^2 } \\right) &\\quad \\blacksquare\n", "\\end{align}\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Returning now to our original question, we want to find $\\mathbb{E} |Z_1 - Z_2|$, with $Z_1, Z_2 \\sim \\mathcal{N}(0,1)$.\n", "\n", "Keep in mind that $Z_1 - Z_2 \\sim \\mathcal{N}(0, 2)$. But since this is just the standard normal scaled with $\\sqrt{2}$, we can do this:\n", "\n", "\\begin{align}\n", " \\mathbb{E} | Z_1 - Z_2 | &= \\mathbb{E} | \\sqrt{2} Z | & \\quad \\text{, where } Z \\sim \\mathcal{N}(0,1) \\\\\n", " &= \\sqrt{2} \\mathbb{E}|Z| \\\\\n", " &= \\sqrt{2} \\int_{-\\infty}^{\\infty} |z| \\, \\frac{1}{\\sqrt{2 \\pi}} \\, e~{-\\frac{z^2}{2}} \\, dz \\\\\n", " &= \\sqrt{\\frac{2}{\\pi}}\n", "\\end{align}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Multinomial Distribution\n", "\n", "A generalization of the binomial distribution, $\\operatorname{Mult}(n, \\vec{p})$ has parameter $n$ for number of items observed, with probability vector $\\vec{p} = p_1, p_2, \\dots , p_k)$ such that $p_j \\ge 0$ and $\\sum{p_i} = 1$\n", "\n", "So while the binomial has 2 classes, with success equating to $p$ and failure equating to $q$, the multinomial has $k$ classes, each with respective probability $p_i$.\n", "\n", "We have $n$ objects which we are independently putting into $k$ categories.\n", "\n", "\\begin{align}\n", " \\vec{X} &= \\operatorname{Mul}t(n, \\vec{p}) \\quad \\text{, with } \\vec{X} = ( X_1, X_2, \\dots , X_k) \\\\\n", " \\\\\n", " P_j &= P(\\text{category }j) \\\\\n", " \\\\\n", " X_j &= \\text{number of objects in category } j\n", "\\end{align}\n", "\n", "#### Joint PMF of Multinomial\n", "\n", "Because this is a _joint distribution_, $\\operatorname{Mult}(n, \\vec{p})$ has a joint PMF.\n", "\n", "\\begin{align}\n", " P(X_1=n_1, X_2=n_2, \\dots , X_k=n_k) &= \\frac{n!}{n_{1}!n_{2}! \\dots n_{k}!} \\, p_1^{n_1} p_2^{n_2} \\dots p_k^{n_k}\n", "\\end{align}\n", "\n", "#### Marginal distribution $X_j$ of Multinomial\n", "\n", "Given $\\vec{X} \\sim \\operatorname{Mult_k}(n, \\vec{p})$, what is the marginal distribution of $X_j$?\n", "\n", "Well, since we are interested in only the two cases of an object being in class $k$ or not, this is _binomial_. \n", "\n", "So for the marginal distribution $X_j$ of $\\operatorname{Mult}$,\n", "\n", "\\begin{align}\n", " &X_j \\sim \\operatorname{Bin}(n, p_j) \\\\\n", " \\\\\n", " &\\mathbb{E}(X_j) = n \\, p_j \\\\\n", " \\\\\n", " &\\operatorname{Var}(X_j) = n \\, p_j \\, (1 - p_j)\n", "\\end{align}\n", "\n", "\n", "#### Lumping Property\n", "\n", "Given a situation where there are n voters in a population sample, and there are 10 political parties, $\\vec{X} = (X_1, X_2, \\dots, X_10) \\sim Mult(n, (p_1, p_2, \\dots, p_10))$...\n", "\n", "Let us say that political parties 3 through 10 are relatively minor compared to parties 1 and 2. We can describe this case where we want to _lump together_ parties 3 through 10 with another multinomial $Y$ such that:\n", "\n", "\\begin{align}\n", " \\vec{Y} = (Y_1, Y_2, Y_{3, \\dots,10}) \\sim \\operatorname{Mult}(n, (p_1, p_2, p_{3, \\dots , 10}))\n", "\\end{align}\n", "\n", "Here, we gather up the counts/probabilities for parties 3 through 10, and now we have a multinomial with essentially three classes.\n", "\n", "\n", "#### Conditional Joint PMF\n", "\n", "Say that we have a k-class $\\vec{X} \\sim \\operatorname{Mult_k}(n, \\vec{p})$, but we know $X_1 = n_1$, but we don't know about the other $k-1$ classes...\n", "\n", "Given $X_1 = n_1$\n", "\n", "\\begin{align}\n", " (X_2, \\dots , X_k) &\\sim \\operatorname{Mult_{k-1}}(n-n_1, (p'_2, \\dots , p'_k )) & \\quad \\text{ where } p'_j = P(\\text{in class j} | \\text{not in class }1) \\\\\n", " &\\sim \\operatorname{Mult}((n - n_1), (\\frac{p_2}{1-p_1}, \\dots , \\frac{p_k}{1-p_1})) & \\quad \\text{ or alternatively... } \\\\\n", " &\\sim \\operatorname{Mult}((n - n_1), (\\frac{p_2}{p_2 + \\dots + p_k}, \\dots , \\frac{p_k}{p_2 + \\dots + p_k})) & \\quad \\text{ or alternatively... }\n", "\\end{align}\n", "\n", "All we are doing here is simply re-normalizing the probability vector to take into account that we have information that $X_1 = n_1$. Multinomials are simple and intuitive!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Cauchy Interview Problem\n", "\n", "### Or a good example of working with a continuous joint PDF\n", "\n", "The Cauchy Distribution is $T = \\frac{X}{Y}$ with $X, Y$ i.i.d. $\\mathcal{N}(0,1)$.\n", "\n", "It looks simple enough and appears to be quite useful, but it does have some weird properties. \n", "\n", "* it has no mean\n", "* it has no variance\n", "* it defies the Law of Large Numbers; averaging up a bunch of Cauchy will not yield a normal distribution, but remains Cauchy!\n", "\n", "_Can you find the PDF of T_?\n", "\n", "#### Using double integrals\n", "\n", "Let us try to find the CDF, and derive the PDF from that.\n", "\n", "\\begin{align}\n", " \\text{CDF: }P(\\frac{X}{Y} \\le t) &= P\\left(\\frac{X}{|Y|}\\right) & \\quad \\text{ following from the symmetry of the Normal} \\\\\n", " &= P(X \\le t \\, |Y| ) \\\\\n", " &= \\int_{-\\infty}^{\\infty} \\int_{-\\infty}^{t|y|} \\frac{1}{\\sqrt{2\\pi}} e^{-\\frac{x^2}{2}} \\, \\frac{1}{\\sqrt{2\\pi}} e^{-\\frac{y^2}{2}} \\, dx \\, dy \\\\\n", " &= \\frac{1}{\\sqrt{2\\pi}} \\, \\int_{-\\infty}^{\\infty} e^{-\\frac{y^2}{2}} \\, \\int_{-\\infty}^{t|y|} \\frac{1}{\\sqrt{2\\pi}} e^{-\\frac{x^2}{2}} \\, dx \\, dy & \\quad \\text{ starting from } x \\\\\n", " &= \\frac{1}{\\sqrt{2\\pi}} \\, \\int_{-\\infty}^{\\infty} e^{-\\frac{y^2}{2}} \\, \\Phi(t|y|) \\, dy & \\quad \\text{but we have an even function} \\\\\n", " &= \\sqrt{ \\frac{2}{\\pi}} \\, \\int_{0}^{\\infty} e^{-\\frac{y^2}{2}} \\, \\Phi(t|y|) \\, dy \\\\\n", " \\\\\n", " \\text{PDF: } F'(t) &= \\sqrt{\\frac{2}{\\pi}} \\int_{0}^{\\infty} y \\, e^{-\\frac{y^2}{2}} \\, \\frac{1}{\\sqrt{2\\pi}} \\, e^{-\\frac{t^2y^2}{2}} \\, dy \\\\\n", " &= \\frac{1}{\\pi} \\int_{0}^{\\infty} y \\, e^{-\\frac{(1+t^2)y^2}{2}} \\, dy & \\quad \\text{and with } u = \\frac{(1+t^2)y^2}{2} \\text{, } du = (1+t^2) \\, y \\, dy \\\\\n", " &= \\frac{1}{\\pi} \\int_{0}^{\\infty} \\frac{(1 + t^2)}{(1 + t^2)} \\, y \\, e^{-\\frac{(1+t^2)y^2}{2}} \\, dy \\\\\n", " &= \\frac{1}{\\pi} \\int_{0}^{\\infty} \\frac{1}{1 + t^2} \\, e^{-u} \\, du \\\\\n", " &= \\boxed{ \\frac{1}{\\pi (1+t^2)} } & \\quad \\text{ and just integrate this to get the CDF} \\\\\n", " \\\\\n", " \\text{CDF: } &= \\boxed{ \\frac{tan^{-1}(t)}{\\pi} }\n", "\\end{align}\n", "\n", "#### Using the Law of Total Probability\n", "\n", "Recall that the PDF of $\\mathcal{N}(0,1)$ is $\\phi(y)$\n", "\n", "\\begin{align}\n", " P(X \\le t|Y|) &= \\int_{-\\infty}^{^infty} P\\left(X \\le t \\lvert Y \\rvert \\mid Y=y\\right) \\, \\phi(y) \\, dy \\\\\n", " &= \\int_{-\\infty}^{\\infty} \\Phi \\left(t|y\\right) \\, \\phi(y) \\, dy & \\quad \\text{ which is pretty much what we did above }\n", "\\end{align}\n", "\n", "----" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "View [Lecture 20: Multinomial and Cauchy | Statistics 110](http://bit.ly/2oQNiGU) on YouTube." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.3" } }, "nbformat": 4, "nbformat_minor": 2 }