{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Lecture 22: Transformations, Log-Normal, Convolutions, Proving Existence\n",
    "\n",
    "\n",
    "## Stat 110, Prof. Joe Blitzstein, Harvard University\n",
    "\n",
    "----"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Variance of Hypergeometric, con't\n",
    "\n",
    "Returning to where we left off in Lecture 21, recall that we are considering $X \\sim \\operatorname{HGeom}(w, b, n)$ where $p = \\frac{w}{w+b}$ and $w + b = N$.\n",
    "\n",
    "\\begin{align}\n",
    "  Var\\left( \\sum_{j=1}^{n} X_j \\right) &= \\operatorname{Var}(X_1) + \\dots + \\operatorname{Var}(X_n) + 2 \\, \\sum_{i<j} \\operatorname{Cov}(X_i, X_j) \\\\\n",
    "  &= n \\, Var(X_1) + 2 \\, \\binom{n}{2} \\operatorname{Cov} (X_1, X_2) & \\quad \\text{symmetry, amirite?} \\\\\n",
    "  &= n \\, p \\, (1-p) + 2 \\, \\binom{n}{2} \\left( \\frac{w}{w+b} \\, \\frac{w-1}{w+b-1} - p^2  \\right) \\\\\n",
    "  &= \\frac{N-n}{N-1} \\, n \\, p \\, (1-p) \\\\\n",
    "  \\\\\n",
    "  \\text{where } \\frac{N-n}{N-1} &\\text{  is known as the finite population correction}\n",
    "\\end{align}\n",
    "\n",
    "\n",
    "Note how this closely resembles the variance for a binomial distribution, except for scaling by that finite population correction.\n",
    "\n",
    "Let's idiot-check this:\n",
    "\n",
    "\\begin{align}\n",
    "  \\text{let } n &= 1 \\\\\n",
    "  \\\\\n",
    "  \\operatorname{Var}(X) &= \\frac{N-1}{N-1} 1 \\, p \\, (1-p) \\\\\n",
    "  &= p \\, (1-p) & \\quad \\text{ ... just a Bernoulli, since we only sample once!} \\\\\n",
    "  \\\\\n",
    "  \\text{let } N &\\gg n \\\\\n",
    "  \\Rightarrow \\frac{N-n}{N-1} &= 1\n",
    "  \\\\\n",
    "  \\operatorname{Var}(X) &= \\frac{N-n}{N-1} n \\, p \\, (1-p) \\\\\n",
    "  &= n \\, p \\, (1-p) & \\quad \\text{ ... Binomial, we probably never sample same element twice!} \\\\\n",
    "\\end{align}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Transformations\n",
    "\n",
    "### Or a change of variables\n",
    "\n",
    "A function of an r.v. is itself an r.v., and we can use LOTUS to find mean and variance. But what if we want more than just the mean and variance? What if we want to know the entire distribution (PDF)?\n",
    "\n",
    "### Theorem\n",
    "\n",
 Let $X$">
    "> Let $X$ be a continuous r.v. with PDF $f_X$, and let $Y = g(X)$.\n",
    "> Given that $g$ is differentiable and strictly increasing\n",
    "> (at least on the region in which we are interested),\n",
    "> then the PDF of $Y$ is given by\n",
    ">\n",
    "> \\begin\\{align\\}\n",
    ">   f_Y(y) &= f_X(x) \\, \\frac{dx}{dy} & \\quad \\text{ where } y = g(x) \\text{ , } x = g^{-1}(y)\n",
    "> \\end\\{align\\}\n",
    ">\n",
    "\n",
    "And since we know from the [Chain Rule](https://en.wikipedia.org/wiki/Chain_rule) that $\\frac{dx}{dy} = \\left( \\frac{dy}{dx} \\right)^{-1}$, you can substitute $\\left( \\frac{dy}{dx} \\right)^{-1}$ for $\\frac{dx}{dy}$ if that makes things easier.\n",
    "\n",
    "#### Proof\n",
    "\n",
    "\\begin{align}\n",
    "  &\\text{starting from the CDF...} \\\\\n",
    "  \\\\\n",
    "  F_Y(y) &= P(Y \\le y) \\\\\n",
    "  &= P \\left(g(x) \\le y \\right) \\\\\n",
    "  &= P \\left(X \\le g^{-1}(y) \\right) \\\\\n",
    "  &= F_X \\left( g^{-1}(y) \\right) \\\\\n",
    "  &= F_X(x) \\\\\n",
    "  \\\\\n",
    "  &\\text{and now differentiating to get the PDF...} \\\\\n",
    "  \\\\\n",
    "  \\Rightarrow f_{Y}(y) &=  f_{X}(x) \\frac{dx}{dy} \n",
    "\\end{align}"
   ]
  },
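  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick illustration of the theorem, here is a sketch with a distribution and a function chosen purely for the example (neither is from the lecture): take $X \\sim \\operatorname{Unif}(0,1)$ and $Y = g(X) = X^2$, which is differentiable and strictly increasing on $(0,1)$.\n",
    "\n",
    "\\begin{align}\n",
    "  x &= g^{-1}(y) = \\sqrt{y} \\\\\n",
    "  \\frac{dx}{dy} &= \\frac{1}{2\\sqrt{y}} \\\\\n",
    "  \\\\\n",
    "  f_Y(y) &= f_X(\\sqrt{y}) \\, \\frac{1}{2\\sqrt{y}} \\\\\n",
    "  &= \\frac{1}{2\\sqrt{y}} & \\quad \\text{for } 0 \\lt y \\lt 1 \\\\\n",
    "  \\\\\n",
    "  \\int_{0}^{1} \\frac{1}{2\\sqrt{y}} \\, dy &= \\sqrt{y} \\, \\Big|_{0}^{1} = 1 & \\quad \\text{... so this is a valid PDF}\n",
    "\\end{align}"
   ]
  },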
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "### Log-Normal\n",
    "\n",
    "Now let's try applying what we now know about transformations to get the PDF of a [Log-Normal distribution](https://en.wikipedia.org/wiki/Log-normal_distribution).\n",
    "\n",
    "Given the log-normal distribution $Y = e^{z}$, where $Z \\sim (0,1)$, find the PDF.\n",
    "\n",
    "Note that $\\frac{dy}{dz} = e^z = y$.\n",
    "\n",
    "\\begin{align}\n",
    "  f_Y(y) &= f_Z{z} \\, \\frac{dz}{dy} \\\\\n",
    "  &= \\frac{1}{\\sqrt{2\\pi}} \\, e^{-\\frac{lny^2}{2}} \\, \\frac{1}{y} & \\quad \\text{where }y \\gt 0\n",
    "\\end{align}"
   ]
  },
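  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to check a PDF like this is to confirm that it integrates to 1. A short sketch, using the substitution $z = \\ln y$ (so $dz = \\frac{dy}{y}$, and $y \\in (0, \\infty)$ maps to $z \\in (-\\infty, \\infty)$):\n",
    "\n",
    "\\begin{align}\n",
    "  \\int_{0}^{\\infty} \\frac{1}{\\sqrt{2\\pi}} \\, e^{-\\frac{(\\ln y)^2}{2}} \\, \\frac{1}{y} \\, dy &= \\int_{-\\infty}^{\\infty} \\frac{1}{\\sqrt{2\\pi}} \\, e^{-\\frac{z^2}{2}} \\, dz \\\\\n",
    "  &= 1 & \\quad \\text{... the standard Normal PDF integrates to 1}\n",
    "\\end{align}"
   ]
  },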
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Transformations in $\\mathbb{R}^n$\n",
    "\n",
    "### Multi-dimensional Example\n",
    "\n",
    "Here's a multi-dimensional example.\n",
    "\n",
    "Given the distribution $\\vec{Y} = g(\\vec{X})$, where $g \\colon \\mathbb{R}^n \\rightarrow \\mathbb{R}^n$, with continuous joint PDF $\\vec{X} = \\{ X_1, \\dots , X_n \\}$.\n",
    "\n",
    "What is the joint PDF of $Y$ in terms of the joint PDF $X$?\n",
    "\n",
    "\\begin{align}\n",
    "  f_Y(\\vec{y}) &= f_X(\\vec{x}) \\, | \\frac{d\\vec{x}}{d\\vec{y}} | \\\\ \n",
    "  \\\\\n",
    "  \\text{where } \\frac{d\\vec{x}}{d\\vec{y}} &= \n",
    "    \\begin{bmatrix} \n",
    "      \\frac{\\partial x_1}{\\partial y_1} & \\cdots & \\frac{\\partial x_1}{\\partial y_n} \\\\\n",
    "      \\vdots&\\ddots&\\vdots \\\\\n",
    "      \\frac{\\partial x_n}{\\partial y_1}& \\cdots &\\frac{\\partial x_n}{\\partial y_n}\n",
    "    \\end{bmatrix} & \\text{... is the Jacobian} \\\\\n",
    "    \\\\\n",
    "  \\text{and }  | \\frac{d\\vec{x}}{d\\vec{y}} | &= \\left| \\, det \\, \\frac{d\\vec{x}}{d\\vec{y}} \\, \\right| & \\quad \\text{... absolute value of determinant of Jacobian}\n",
    "\\end{align}\n",
    "\n",
    "Similar to the previous explanation on transformations,  you can substitute $\\left( | \\, \\frac{d\\vec{y}}{d\\vec{x}} \\, | \\right)^{-1}$ for $\\frac{d\\vec{x}}{d\\vec{y}}$ if that makes things easier."
   ]
  },
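  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the Jacobian concrete, here is a sketch for $n = 2$ with a linear map chosen purely for illustration (it is not from the lecture): $Y_1 = X_1 + X_2$ and $Y_2 = X_1 - X_2$.\n",
    "\n",
    "\\begin{align}\n",
    "  x_1 &= \\frac{y_1 + y_2}{2} , \\quad x_2 = \\frac{y_1 - y_2}{2} \\\\\n",
    "  \\\\\n",
    "  \\frac{d\\vec{x}}{d\\vec{y}} &= \n",
    "    \\begin{bmatrix}\n",
    "      \\frac{1}{2} & \\frac{1}{2} \\\\\n",
    "      \\frac{1}{2} & -\\frac{1}{2}\n",
    "    \\end{bmatrix} \\\\\n",
    "  \\\\\n",
    "  \\left| \\det \\frac{d\\vec{x}}{d\\vec{y}} \\right| &= \\left| -\\frac{1}{4} - \\frac{1}{4} \\right| = \\frac{1}{2} \\\\\n",
    "  \\\\\n",
    "  f_Y(y_1, y_2) &= \\frac{1}{2} \\, f_X \\left( \\frac{y_1 + y_2}{2}, \\frac{y_1 - y_2}{2} \\right)\n",
    "\\end{align}\n",
    "\n",
    "Going the other way, $\\frac{d\\vec{y}}{d\\vec{x}}$ has rows $(1, 1)$ and $(1, -1)$, so its determinant is $-2$ and $\\left| -2 \\right|^{-1} = \\frac{1}{2}$, which agrees."
   ]
  },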
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## Convolutions\n",
    "\n",
    "### Distribution for a Sum of Random Variables\n",
    "\n",
    "Let $T = X + Y$, where $X,Y$ are independent.\n",
    "\n",
    "\\begin{align}\n",
    "  P(T=t) &= \\sum_{x} P(X=x) \\, P(Y=t-x) & \\quad \\text{discrete case}\\\\\n",
    "  \\\\\n",
    "  f_T(t) &= \\int_{-\\infty}^{\\infty} f_X(x) \\, f_Y(t-x) \\, dx & \\quad \\text{continuous case} \\\\\n",
    "\\end{align}\n",
    "\n",
    "#### Proof of continuous case\n",
    "\n",
    "\\begin{align}\n",
    "  &\\text{starting from the CDF...} \\\\\n",
    "  \\\\\n",
    "  F_T(t) &= P(T \\le t) \\\\\n",
    "  &= \\int_{-\\infty}^{\\infty} P(X + Y \\le t \\, | \\, X=x) \\, f_X(x) \\, dx & \\quad \\text{ law of total probability} \\\\\n",
    "  &= \\int_{-\\infty}^{\\infty} P(Y \\le t - x) \\, f_X(x) \\, dx \\\\\n",
    "  &= \\int_{-\\infty}^{\\infty} F_Y(t-x) \\, f_X(x) \\, dx \\\\\n",
    "  \\\\\n",
    "  &\\text{and now differentiating w.r.t. } T \\text{ ...} \\\\\n",
    "  \\\\\n",
    "  \\Rightarrow f_{T}(t) &= \\int_{-\\infty}^{\\infty} f_Y(t-x) \\, f_X(x) \\, dx\n",
    "\\end{align}"
   ]
  },
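  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a sketch of the continuous formula in action, take $X, Y$ i.i.d. $\\operatorname{Expo}(\\lambda)$ (a choice made for this example, not from the lecture). The integrand is nonzero only when $x \\gt 0$ and $t - x \\gt 0$, so the limits shrink to $0 \\lt x \\lt t$.\n",
    "\n",
    "\\begin{align}\n",
    "  f_T(t) &= \\int_{0}^{t} \\lambda e^{-\\lambda x} \\, \\lambda e^{-\\lambda (t-x)} \\, dx \\\\\n",
    "  &= \\lambda^2 e^{-\\lambda t} \\int_{0}^{t} dx \\\\\n",
    "  &= \\lambda^2 \\, t \\, e^{-\\lambda t} & \\quad \\text{for } t \\gt 0\n",
    "\\end{align}\n",
    "\n",
    "This is the $\\operatorname{Gamma}(2, \\lambda)$ PDF."
   ]
  },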
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Proving Existence\n",
    "\n",
    "### Using Probability to Prove the Existence of Object with Desired Properties\n",
    "\n",
    "Let us say that $A$ is our desired property.\n",
    "\n",
    "Can we show that $P(A) \\gt 0$ for a _random object_? For if $P(A) \\gt 0$, it follows that there should be _at least one object with property $A$_.\n",
    "\n",
    "Suppose each object has some associated \"score\". We can pick a random object, and use that to compute the average score. From there, we can reason that there must be an object where this score is $\\ge \\mathbb{E}(X)$\n",
    "\n",
    "\n",
    "Suppose we have:\n",
    "\n",
    "* 100 people\n",
    "* 15 committees\n",
    "* each committee has 20 people\n",
    "* assume that each person is on 3 committees\n",
    "\n",
    "Show that there exists 2 committees where a group of 3 people are on both committees (overlap $\\ge 3$).\n",
    "\n",
    "Rather than try to enumerate all possible committee permutations, find the average overlap of 2 _random_ committees using indicator random variables.\n",
    "\n",
    "#### Proof\n",
    "\n",
    "\\begin{align}\n",
    "  \\text{let } \\, I_1 &= \\text{person 1 on both the randomly chosen committees} \\\\\n",
    "  \\\\\n",
    "  \\text{then } \\, P(I_1) &= \\frac{\\binom{3}{2}}{\\binom{15}{2}} \\\\\n",
    "  \\\\\n",
    "  \\mathbb{E}(overlap) &= 100 \\, \\frac{\\binom{3}{2}}{\\binom{15}{2}} & \\quad \\text{... by symmetry} \\\\\n",
    "  &= 100 \\, \\frac{3}{105} \\\\\n",
    "  &= \\frac{20}{7} \\\\\n",
    "  &= 2.857142857142857 \\\\\n",
    "\\end{align}\n",
    "\n",
    "But if the average overlap is $\\frac{20}{7}$, since overlap must be an integer, we can safely round up and assume that average overlap is 3. And so we conclude that there must be at least one pair of committees where the overlap $\\ge 3$.\n",
    "\n",
    "This is similar to how Shannon proved his theory on channel capacity.\n",
    "\n",
    "----"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "View [Lecture 22: Transformations and Convolutions | Statistics 110](http://bit.ly/2wRz77T) on YouTube."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}