{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Lecture 27: Conditional expectation (cont.); taking out what's known; Adam's Law, Eve's Law; projection picture\n",
    "\n",
    "\n",
    "## Stat 110, Prof. Joe Blitzstein, Harvard University\n",
    "\n",
    "----"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conditioning on Random Variables\n",
    "\n",
    "### Ex.  $\\mathbb{E}(Y|X)$ where $X \\sim N(0,1)$\n",
    "\n",
    "Let $X \\sim N(0,1)$ and $Y=X^2$.\n",
    "\n",
    "Then \n",
    "\n",
    "\\begin{align}\n",
    "  \\mathbb{E}(Y|X) &= \\mathbb{E}(X^2|X) \\\\\n",
    "  &= X^2 \\\\\n",
    "  &= Y\n",
    "\\end{align}\n",
    "\n",
    "* this is simple enough, and very clear. \n",
    "\n",
    "But how about the other way 'round?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Ex. $\\mathbb{E}(X|Y)$ where $X \\sim N(0,1)$\n",
    "\n",
    "\\begin{align}\n",
    "  \\mathbb{E}(X|Y) &= \\mathbb{E}(X|X^2) \\\\\n",
    "  &= 0\n",
    "\\end{align}\n",
    "\n",
    "Why?\n",
    "\n",
    "* we don't know $X$, but what we are given is $X^2$\n",
    "* if we observe $x^2 = a$, then we know $x = \\pm \\sqrt{a}$\n",
    "* by _symmetry_, both $x=-\\sqrt{a}$ and $x=\\sqrt{a}$ are equally likely\n",
    "* hence the best estimate of $X$ would be... 0! "
   ]
  },
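  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To spell out the symmetry argument, here is a short, informal worked version. The symbols $a > 0$ (the observed value of $X^2$) and $\\varphi$ (the standard Normal density) are notation introduced only for this sketch.\n",
    "\n",
    "\\begin{align}\n",
    "  P(X = \\sqrt{a} \\,|\\, X^2 = a) &= \\frac{\\varphi(\\sqrt{a})}{\\varphi(\\sqrt{a}) + \\varphi(-\\sqrt{a})} = \\frac{1}{2} &\\text{since } \\varphi(-x) = \\varphi(x) \\\\\n",
    "  \\\\\n",
    "  \\mathbb{E}(X \\,|\\, X^2 = a) &= \\sqrt{a} \\cdot \\frac{1}{2} + (-\\sqrt{a}) \\cdot \\frac{1}{2} \\\\\n",
    "  &= 0\n",
    "\\end{align}\n",
    "\n",
    "This holds for every $a > 0$, which is why $\\mathbb{E}(X|X^2) = 0$."
   ]
  },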
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Uniform\n",
    "\n",
    "Say we have a stick of length 1, and we break it at point $x$. Then we take that stick of length $x$, and break that at point $y$.\n",
    "\n",
    "What is $\\mathbb{E}(Y|X)$?\n",
    "\n",
    "![title](images/L2701.png)\n",
    "\n",
    "\n",
    "* $X \\sim \\operatorname{Unif}(0,1)$\n",
    "* $Y|X \\sim \\operatorname{Unif}(0,X)$\n",
    "\n",
    "\\begin{align}\n",
    "  \\mathbb{E}(Y|X=x) &= 0 \\\\\n",
    "  &= \\frac{x}{2} \\\\\n",
    "  \\\\\n",
    "  \\Rightarrow \\mathbb{E}(Y|X) &= \\frac{X}{2} \\\\\n",
    "  \\\\\n",
    "  \\mathbb{E} \\left( \\mathbb{E}(Y|X) \\right) &= \\frac{1}{4} \\\\\n",
    "  &= \\mathbb{E}(Y)\n",
    "\\end{align}\n",
    "\n",
    "* the expected length of $y = \\frac{1}{4}$ is pretty intuitive; take a stick, break it in half, break that half in half again\n",
    "* we will get more into that $\\mathbb{E} \\left( \\mathbb{E}(Y|X) \\right) = \\mathbb{E}(Y)$ in a bit"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## Useful Properties\n",
    "\n",
    "Here are some useful properties related to conditional expectation.\n",
    "\n",
    "\\begin{align}\n",
    "  &\\text{(1) } \\mathbb{E}\\left( h(X) Y|X \\right) = h(X) \\, \\mathbb{E}(Y|X) &\\text{\"taking out what is known\"} \\\\\\\\\n",
    "  &\\text{(2) } \\mathbb{E}(Y|X) = \\mathbb{E}(Y) &\\text{if } X,Y \\text{ are independent} \\\\\\\\\n",
    "  &\\text{(3) } \\mathbb{E}\\left( \\mathbb{E}(Y|X) \\right) = \\mathbb{E}(Y) &\\text{Iterated Expectation, or Adam's Law} \\\\\\\\\n",
    "  &\\text{(4) } \\mathbb{E}\\left( (Y - \\mathbb{E}(Y|X) \\, h(X) \\right) = 0 &\\text{residual is uncorrelated with } h(X) \\\\\\\\\n",
    "  &\\text{(5) } \\operatorname{Var}(Y) = \\mathbb{E}\\left( \\operatorname{Var}(Y|X) \\right) + \\operatorname{Va}r\\left( \\mathbb{E}(Y|X) \\right) &\\text{EVvE's Law} \\\\\\\n",
    "\\end{align}"
   ]
  },
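  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick illustration of Properties 1 and 3 working together, here is a sketch that reuses the stick-breaking example, where $X \\sim \\operatorname{Unif}(0,1)$ and $\\mathbb{E}(Y|X) = \\frac{X}{2}$. Choosing $h(X) = X$ is just for illustration.\n",
    "\n",
    "\\begin{align}\n",
    "  \\mathbb{E}(XY|X) &= X \\, \\mathbb{E}(Y|X) &\\text{taking out what is known} \\\\\n",
    "  &= \\frac{X^2}{2} \\\\\n",
    "  \\\\\n",
    "  \\mathbb{E}(XY) &= \\mathbb{E}\\left( \\mathbb{E}(XY|X) \\right) &\\text{Adam's Law} \\\\\n",
    "  &= \\frac{1}{2} \\, \\mathbb{E}(X^2) \\\\\n",
    "  &= \\frac{1}{2} \\cdot \\frac{1}{3} = \\frac{1}{6}\n",
    "\\end{align}\n",
    "\n",
    "Since $\\mathbb{E}(X) \\, \\mathbb{E}(Y) = \\frac{1}{2} \\cdot \\frac{1}{4} = \\frac{1}{8}$, this also gives $\\operatorname{Cov}(X,Y) = \\frac{1}{6} - \\frac{1}{8} = \\frac{1}{24}$."
   ]
  },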
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Proof of Property 4\n",
    "\n",
    "Here is a pictorial explanation to aid your intuition.\n",
    "\n",
    "A vector could be anything (point, function, cow); as long as it follows the axioms of vector space, then anything can be treated as a vector.\n",
    "\n",
    "![title](images/L2702.png)\n",
    "\n",
    "* The \"plane\" in the image represents all of the possible functions of $X$. As such, it neccessarily passes through the origin.\n",
    "* Conditional expectation is simply projecting $Y$ into the plane of all functions of $X$. \n",
    "* $\\mathbb{E}(Y|X)$ is the point in $X$ that is closest to $Y$.\n",
    "* If $Y$ is already a function of $X$, then $Y$ lies in that plane of $X$ functions.\n",
    "* If $Y$ is not a function of $X$, then the length of that projection is the _residual_.\n",
    "* in this image, we implicitly assume finite variance for all functions of $X$.\n",
    "\n",
    "So let us show that the residual $Y - \\mathbb{E}(Y|X)$ is uncorrelated with any function $h(X)$:\n",
    "\n",
    "\\begin{align}\n",
    "  \\operatorname{Cov}\\left( Y - \\mathbb{E}(Y|X) , h(X) \\right) &= \\mathbb{E}\\left( (Y - \\mathbb{E}(Y|X)) \\, h(X) \\right) - \\mathbb{E}\\left(Y-\\mathbb{E}(Y|X)\\right) \\, \\mathbb{E}\\left(h(X)\\right) \\\\\n",
    "  &= \\mathbb{E}\\left( (Y - \\mathbb{E}(Y|X)) \\, h(X) \\right) - \\left[\\mathbb{E}(Y) - \\mathbb{E}(Y) \\right] \\, \\mathbb{E}\\left(h(X)\\right) &\\text{ linearity, Adam's Law} \\\\\n",
    "  &= \\mathbb{E}\\left( (Y - \\mathbb{E}(Y|X)) \\, h(X) \\right) - 0 \\\\\n",
    "  &= \\mathbb{E}\\left( (Y - \\mathbb{E}(Y|X)) \\, h(X) \\right)\\\\\n",
    "  &= \\mathbb{E}\\left( Y \\, h(X) \\right) - \\mathbb{E}\\left( \\mathbb{E}(Y|X) \\, h(X) \\right) \\\\\n",
    "  &= \\mathbb{E}\\left( Y \\, h(X) \\right) - \\mathbb{E} \\left( \\mathbb{E}(Y \\, h(X))|X) \\right) & \\text{if we can take out, we can put back} \\\\\n",
    "  &= \\mathbb{E}\\left( Y \\, h(X) \\right) - \\mathbb{E}\\left( Y \\, h(X) \\right) & \\text{Adam's Law} \\\\\n",
    "  &= 0 &\\quad \\blacksquare\n",
    "\\end{align}\n",
    "\n",
    "And so the residual $Y - \\mathbb{E}(Y|X)$ is indeed uncorrelated with any function $h(X)$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Proof of Property 3\n",
    "\n",
    "Returning to Property 3, let's do the discrete case (but the continuous case is analogous).\n",
    "\n",
    "Since $\\mathbb{E}(Y|X)$ is just a function of $X$, we can call it by another name, say $g(X)$.\n",
    "\n",
    "\\begin{align}\n",
    "  \\mathbb{E}\\left( \\mathbb{E}(Y|X) \\right) &= \\mathbb{E}\\left( g(X) \\right)  \\\\\n",
    "  &= \\sum_x g(x) \\, P(X=x) &\\text{by LOTUS, definition} \\\\\n",
    "  &= \\sum_x \\mathbb{E}(Y|X=x) \\, P(X=x) \\\\\n",
    "  &= \\sum_x \\left[ \\sum_y y \\, P(Y=y|X=x) \\right] P(X=x) \\\\ \n",
    "  &= \\sum_y \\sum_x y \\, P(Y=y, X=x) \\\\\n",
    "  &= \\sum_y y \\sum_x P(Y=y, X=x) \\\\\n",
    "  &= \\sum_y y P(Y=y) \\\\\n",
    "  &= \\mathbb{E}(Y) &\\quad \\blacksquare\n",
    "\\end{align}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "### Conditional Variance\n",
    "\n",
    "Conditional variance is defined thusly:\n",
    "\n",
    "\\begin{align}\n",
    "  \\operatorname{Var}(Y|X) &= \\mathbb{E}(Y^2|X) - \\mathbb{E}(Y|X)^2 &\\text{or alternately} \\\\\\\\\n",
    "  &= \\mathbb{E}\\left[ (Y - \\mathbb{E}(Y|X))^2 | X \\right]\n",
    "\\end{align}\n",
    "\n",
    "#### Proof\n",
    "\n",
    "Let $g(X) = \\mathbb{E}(Y|X)$; this will make things a bit clearer.\n",
    "\n",
    "\\begin{align}\n",
    "  \\operatorname{Var}(Y|X) &= \\mathbb{E}\\left[ (Y - \\mathbb{E}g(X))^2 | X \\right] \\\\\n",
    "  &= \\mathbb{E}\\left[ Y^2 - 2Y \\, g(X) + g(X)^2 | X \\right] \\\\\n",
    "  &= \\mathbb{E}(Y^2|X) - 2\\mathbb{E}(Y\\,g(X)|X) + \\mathbb{E}(g(X)^2 | X) \\\\\n",
    "  &= \\mathbb{E}(Y^2|X) - 2 \\, g(X) \\, \\mathbb{E}(Y|X) + \\mathbb{E}(g(X)^2 | X) \\\\\n",
    "  &= \\mathbb{E}(Y^2|X) - 2 \\, g(X) \\, g(X) + g(X)^2 \\\\\n",
    "  &= \\mathbb{E}(Y^2|X) - 2 \\, g(X)^2 + g(X)^2 \\\\\n",
    "  &= \\mathbb{E}(Y^2|X) - g(X)^2 \\\\\n",
    "  &= \\mathbb{E}(Y^2|X) - \\mathbb{E}(Y|X)^2 &\\quad \\blacksquare \\\\\n",
    "\\end{align}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "### Proof of Property 5\n",
    "\n",
    "EVvE's Law states that $\\operatorname{Var}(Y) = \\mathbb{E}\\left( \\operatorname{Var}(Y|X) \\right) + \\operatorname{Var}\\left( \\mathbb{E}(Y|X) \\right)$.\n",
    "\n",
    "![title](images/L2703.png)\n",
    "\n",
    "Graphically, conditional variance deals with both the variance _within_ a sub-groups $\\mathbb{E}\\left( \\operatorname{Var}(Y|X) \\right)$, and the variance amongst the groups $\\operatorname{Var}\\left( \\mathbb{E}(Y|X) \\right)$.\n",
    "\n",
    "In order to prove EVvE's Law, we will do the following to make things simpler:\n",
    "\n",
    "* let $g(X) = \\mathbb{E}(Y|X)$\n",
    "* by Adam's Law, $\\mathbb{E}(g(X))=\\mathbb{E}(Y)$\n",
    "\n",
    "Then:\n",
    "\n",
    "\\begin{align}\n",
    "  \\mathbb{E}\\left( \\operatorname{Var}(Y|X) \\right) &= \\mathbb{E}\\left[ \\mathbb{E}(Y^2|X) - (\\mathbb{E}(Y|X))^2 \\right] &\\text{ for the first part} \\\\\n",
    "  &= \\mathbb{E}(Y^2) - \\mathbb{E}(g(X))^2 \\\\\n",
    "  \\\\\n",
    "  Var\\left( \\mathbb{E}(Y|X) \\right) &= \\operatorname{Var}(g(X)) &\\text{ for the second part} \\\\\n",
    "  &= \\mathbb{E}(g(X))^2 - (\\mathbb{E}(g(X))^2 \\\\\n",
    "  \\\\\n",
    "  \\operatorname{Var}(Y) &= \\mathbb{E}(Y^2) - \\mathbb{E}\\left(g(X)\\right)^2 + \\mathbb{E}(g(X))^2 - (\\mathbb{E}\\left(g(X)\\right)^2 \\\\\n",
    "  &= \\mathbb{E}(Y^2) - (\\mathbb{E}( \\mathbb{E}(Y|X) ))^2 \\\\\n",
    "  &= \\mathbb{E}(Y^2) - (\\mathbb{E}(Y))^2 &\\quad \\blacksquare\n",
    "\\end{align}"
   ]
  },
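  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see EVvE's Law in action, here is a worked sketch that reuses the stick-breaking example, where $X \\sim \\operatorname{Unif}(0,1)$ and $Y|X \\sim \\operatorname{Unif}(0,X)$. It relies on the standard fact that a $\\operatorname{Unif}(0,c)$ random variable has variance $\\frac{c^2}{12}$.\n",
    "\n",
    "\\begin{align}\n",
    "  \\mathbb{E}\\left( \\operatorname{Var}(Y|X) \\right) &= \\mathbb{E}\\left( \\frac{X^2}{12} \\right) = \\frac{1}{12} \\cdot \\frac{1}{3} = \\frac{1}{36} \\\\\n",
    "  \\\\\n",
    "  \\operatorname{Var}\\left( \\mathbb{E}(Y|X) \\right) &= \\operatorname{Var}\\left( \\frac{X}{2} \\right) = \\frac{1}{4} \\cdot \\frac{1}{12} = \\frac{1}{48} \\\\\n",
    "  \\\\\n",
    "  \\operatorname{Var}(Y) &= \\frac{1}{36} + \\frac{1}{48} = \\frac{7}{144}\n",
    "\\end{align}\n",
    "\n",
    "As a check, writing $Y = XU$ with $U \\sim \\operatorname{Unif}(0,1)$ independent of $X$ gives $\\mathbb{E}(Y^2) = \\frac{1}{3} \\cdot \\frac{1}{3} = \\frac{1}{9}$, so $\\operatorname{Var}(Y) = \\frac{1}{9} - \\frac{1}{16} = \\frac{7}{144}$."
   ]
  },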
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## Example: Epidemiology and Conditional Variance\n",
    "\n",
    "Suppose we are studying infectious disease in a certain state. Due to circumstances (lack of resources and/or time), rather than taking samples across the state, we will randomly select a city and study a random sample of $n$ people there.\n",
    "\n",
    "Let $X$ be the number of infected people in the sample.\n",
    "\n",
    "Let $Q$ be the proportion of infected people in the randomly selected city. Keep in mind that different cities will have different proportions, hence $Q$ is a random variable.\n",
    "\n",
    "Find $\\mathbb{E}(X)$ and $\\operatorname{Var}(X)$.\n",
    "\n",
    "But to do this, we need to make an assumption about the distribution of $Q$. Given its flexibility, computational convenience and the fact that it is the conjugate prior to the binomial distribution, we will assume $Q \\sim \\operatorname{Beta}(a,b)$.\n",
    "\n",
    "It should be clear then that we are assuming that $X|Q \\sim \\operatorname{Bin}(n, Q)$. A hypergeometric might also work, but since $n$ is probably small compared to the total population size, and since we are sampling without replacement, we can choose to use the binomial along with the Beta distribution.\n",
    "\n",
    "_Remember that conditioning is the soul of statistics, and so we will condition on the proportion of infection $Q$ of our randomly selected city._"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  $\\mathbb{E}(X)$ via Adam's Law\n",
    "\n",
    "Thinking conditionally, we have:\n",
    "\n",
    "\\begin{align}\n",
    "  \\mathbb{E}(X) &= \\mathbb{E}\\left( \\mathbb{E}(X|Q) \\right) \\\\\n",
    "  &= \\mathbb{E}( nQ) &\\text{expected value of }\\operatorname{Bin}(n,Q) \\\\\n",
    "  &= n \\, \\mathbb{E}(Q) \\\\\n",
    "  &= n \\, \\frac{a}{a+b} &\\text{expected value of }\\operatorname{Beta}(a,b) \n",
    "\\end{align}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## $\\operatorname{Var}(X)$ via EVvE's Law\n",
    "\n",
    "Again thinking conditionally, we have:\n",
    "\n",
    "\\begin{align}\n",
    "  \\operatorname{Var}(X) &= \\mathbb{E}\\left( \\operatorname{Var}(X|Q) \\right) + \\operatorname{Var}\\left( \\mathbb{E}(X|Q) \\right) &\\text{ by EVvE's Law} \\\\\n",
    "  \\\\\n",
    "  \\mathbb{E}\\left( \\operatorname{Var}(X|Q) \\right) &= \\mathbb{E}\\left( n \\, Q \\, (1-Q) \\right) &\\text{ for the first part} \\\\\n",
    "  &= n \\, \\mathbb{E}\\left( Q \\, (1-Q) \\right) \\\\\n",
    "  &= n \\, \\frac{\\Gamma(a+b)}{\\Gamma(a)\\Gamma(b)} \\, \\int_{0}^{1} q \\, (1-q) \\, q^{a-1} \\, (1-q)^{b-1} \\, dq &\\text{LOTUS} \\\\\n",
    "  &= n \\, \\frac{\\Gamma(a+b)}{\\Gamma(a)\\Gamma(b)} \\, \\int_{0}^{1} q^{a} \\, (1-q)^{b} \\, dq \\\\\n",
    "  &= n \\, \\frac{\\Gamma(a+b)}{\\Gamma(a)\\Gamma(b)} \\, \\frac{\\Gamma(a+1)\\Gamma(b+1)}{\\Gamma(a+b+2)} &\\text{that is another }Beta \\\\\n",
    "  &= n \\, \\frac{\\Gamma(a+b)}{\\Gamma(a)\\Gamma(b)} \\, \\frac{a\\Gamma(a)b\\Gamma(b)}{(a+b+1)(a+b)\\Gamma(a+b)} \\\\\n",
    "  &= \\frac{n \\, a \\, b}{(a+b+1)(a+b)} \\\\\n",
    "  \\\\\n",
    "  Var\\left( \\mathbb{E}(X|Q) \\right) &= Var(n \\, Q) &\\text{ for the second part} \\\\\n",
    "  &= n^2 \\, \\operatorname{Var}(Q) \\\\\n",
    "  &= n^2 \\, \\frac{\\mu(1-\\mu)}{a+b+1} &\\text{where } \\mu = \\frac{a}{a+b} \\\\\n",
    "\\end{align}\n",
    "\n",
    "----"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "View [Lecture 27: Conditional Expectation given an R.V. | Statistics 110](http://bit.ly/2NTiQXk) on YouTube."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}