{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Random Variables, Expectation, Random Vectors, and Stochastic Processes\n", "\n", "## Random Variables\n", "A _real-valued random variable_ is a mapping from outcome space $\\mathcal{S}$ to the real line $\\Re$.\n", "A _real-valued random variable_ $X$ can be characterized by its probability distribution, which specifies (for a suitable collection of subsets of the real line $\\Re$ that comprises a sigma-algebra), the chance that the value of $X$ will be in each such subset.\n", "There are technical requirements regarding _measurability_, which generally we will ignore.\n", "Perhaps the most natural mathematical setting for probability theory involves _Lebesgue integration_;\n", "we will largely ignore the difference between a _Riemann integral_ and a _Lebesgue integral_.\n", "\n", "Let $P_X$ denote the probability distribution of the random variable $X$. \n", "Then if $A \\subset \\Re$, $P_X(A) = {\\mathbb P} \\{ X \\in A \\}$.\n", "We write $X \\sim P_X$,\n", "pronounced \"$X$ is distributed as $P_X$\" or \"$X$ has distribution $P_X$.\" \n", "\n", "If two random variables $X$ and $Y$ have the same distribution, we write $X \\sim Y$ and we say that $X$ and $Y$\n", "are _identically distributed_.\n", "\n", "Real-valued random variables can be _continuous_, _discrete_, or _mixed (general)_.\n", "\n", "Continuous random variables have _probability density functions_ with respect to Lebesgue measure.\n", "If $X$ is a continuous random variables, there is some nonnegative function $f(x)$,\n", "the probability density of $X$, such that\n", "for any (suitable) set $A \\subset \\Re$,\n", "$$\n", " {\\mathbb P} \\{ X \\in A \\} = \\int_A f(x) dx.\n", "$$\n", "Since ${\\mathbb P} \\{ X \\in \\Re \\} = 1$, it follows that $\\int_{-\\infty}^\\infty f(x) dx = 1$.\n", "\n", "_Example._ \n", "Let $f(x) = \\lambda e^{-\\lambda x}$ for $x \\ge 0$, with $\\lambda > 0$ fixed, and $f(x) = 0$ otherwise.\n", "Clearly $f(x) \\ge 0$.\n", "$$\n", " \\int_{-\\infty}^\\infty f(x) dx = \\int_0^\\infty \\lambda e^{-\\lambda x} dx\n", " = - e^{-\\lambda x}|_0^\\infty = - 0 + 1 = 1.\n", "$$\n", "Hence, $\\lambda e^{-\\lambda x}$ can be the probability density of a continuous random variable.\n", "A random variable with this density is said to be _exponentially distributed_.\n", "Exponentially distributed random variables are used to model radioactive decay and the failure\n", "of items that do not \"fatigue.\" For instance, the lifetime of a semiconductor after an initial\n", "\"burn-in\" period is often modeled as an exponentially distributed random variable.\n", "It is also a common model for the occurrence of earthquakes (although it does not fit the data well).\n", "\n", "_Example._\n", "Let $a$ and $b$ be real numbers with $a < b$, and let $f(x) = \\frac{1}{b-a}$, $x \\in [a, b]$ and \n", "$f(x)=0$, otherwise. 
\n", "Then $f(x) \\ge 0$ and $\\int_{-\\infty}^\\infty f(x) dx = \\int_a^b \\frac{1}{b-a} = 1$,\n", "so $f(x)$ can be the probability density function of a continuous random variable.\n", "A random variable with this density is sad to be _uniformly distributed on the interval $[a, b]$_.\n", "\n", "Discrete random variables assign all their probability to some _countable_ set of points $\\{x_i\\}_{i=1}^n$,\n", "where $n$ might be infinite.\n", "Discrete random variables have _probability mass functions_.\n", "If $X$ is a discrete random variable, there is a nonnegative function $p$, the probability mass function\n", "of $X$, such that\n", "for any set $A \\subset \\Re$,\n", "$$\n", " {\\mathbb P} \\{X \\in A \\} = \\sum_{i: x_i \\in A} p(x_i).\n", "$$\n", "The value $p(x_i) = {\\mathbb P} \\{X = x_i\\}$, and $\\sum_{i=1}^\\infty p(x_i) = 1$.\n", "\n", "_Example._\n", "Fix $\\lambda > 0$.\n", "Let $x_i = i-1$ for $i=1, 2, \\ldots$, and let $p(x_i) = e^{-\\lambda} \\lambda^{x_i}/x_i!$.\n", "Then $p(x_i) > 0$ and \n", "$$ \n", "\\sum_{i=1}^\\infty p(x_i) = e^{-\\lambda} \\sum_{j=0}^\\infty \\lambda^j/j! = e^{-\\lambda} e^{\\lambda} = 1.\n", "$$\n", "Hence, $p(x)$ is the probability mass function of a discrete random variable.\n", "A random variable with this probability mass function is said to be _Poisson distributed (with parameter\n", "$\\lambda$)_.\n", "Poisson-distributed random variables are often used to model rare events.\n", "\n", "\n", "_Example._\n", "Let $x_i = i$ for $i=1, \\ldots, n$, and let $p(x_i) = 1/n$ and $p(x) = 0$, otherwise.\n", "Then $p(x) \\ge 0$ and $\\sum_{x_i} p(x_i) = 1$.\n", "Hence, $p(x)$ can be the probability mass function of a discrete random variable.\n", "A random variable with this probability mass function is said to be _uniformly distributed on $1, \\ldots, n$_.\n", "\n", "_Example._\n", "Let $x_i = i-1$ for $i=1, \\ldots, n+1$, and let $p(x_i) = {n \\choose x_i} p^{x_i} (1-p)^{n-x_i}$, and\n", "$p(x) = 0$ otherwise.\n", "Then $p(x) \\ge 0$ and \n", "$$\n", "\\sum_{x_i} p(x_i) = \\sum_{j=0}^n {n \\choose j} p^j (1-p)^{n-j} = 1,\n", "$$\n", "by the binomial theorem.\n", "Hence $p(x)$ is the probability mass function of a discrete random variable.\n", "A random variable with this probability mass function is said to be _binomially distributed\n", "with parameters $n$ and $p$_.\n", "The number of successes in $n$ independent trials that each have the same probability $p$ of success\n", "has a binomial distribution with parameters $n$ and $p$\n", "For instance, the number of times a fair die lands with 3 spots showing in 10 independent rolls has\n", "a binomial distribution with parameters $n=10$ and $p = 1/6$.\n", "\n", "For general random variables, the chance that $X$ is in some subset of $\\Re$ cannot be written as\n", "a sum or as a Riemann integral; it is more naturally represented as a Lebesgue integral (with respect to\n", "a measure other than Lebesgue measure).\n", "For example, imagine a random variable $X$ that has probability $\\alpha$ of being equal to zero;\n", "and if $X$ is not zero, it has a uniform distribution on the interval $[0, 1]$.\n", "Such a random variable is neither continuous nor discrete.\n", "\n", "Most of the random variables in this class are either discrete or continuous.\n", "\n", "If $X$ is a random variable such that, for some constant $x_1 \\in \\Re$, ${\\mathbb P}(X = x_1) = 1$, $X$\n", "is called a _constant random variable_." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "### Exercises\n", "\n", "1. Show analytically that $\\sum_{x_i} p(x_i) = \\sum_{j=0}^n {n \\choose j} p^j (1-p)^{n-j} = 1$.\n", "+ Write a Python program that verifies that equation numerically for $n=10$: for 1000 values of $p$ \n", "equispaced on the interval $(0, 1)$, find the maximum absolute value of the difference between the sum and 1.\n", "1. Let $ \\in (0, 1]$; let $x_i = 1, 2, \\ldots$; and define $p(x_i) = (1-p)^{x_i-1}p$, and $p(x) = 0$ otherwise. Show analytically that $p(x)$ is the probability mass function of a discrete random variable. \n", "(A random variable with this probability mass function is said to be _geometrically distributed with parameter $p$_.)\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Cumulative Distribution Functions\n", "\n", "The _cumulative distribution function_ or _cdf_ of a real-valued random variable is the chance that the variable is less than $x$, as a function of $x$.\n", "Cumulative distribution functions are often denoted with capital Roman letters ($F$ is especially common notation):\n", "\n", "$$F_X(x) \\equiv \\mathbb{P}(X \\le x).$$\n", "\n", "Clearly:\n", "\n", "+ $0 \\le F_X(x) \\le 1$\n", "+ $F_X(x)$ increases monotonically with $x$ (i.e., $F_X(a) \\le F_X(b)$ if $a \\le b$.\n", "+ $\\lim_{x \\rightarrow -\\infty} F_X(x) = 0$\n", "+ $\\lim_{x \\rightarrow \\infty} F_X(x) = 1$\n", "\n", "The cdf of a continuous real-valued random variable is a continuous function.\n", "The cdf of a discrete real-valued random variable is piecewise constant, with jumps at the possible values of the random variable.\n", "If the cdf of a real-valued random variable has jumps and also regions where it is not constant, the random variable is neither continuous nor discrete.\n", "\n", "### Examples\n", "[To Do]" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# boilerplate\n", "%matplotlib inline\n", "from __future__ import division\n", "import math\n", "import numpy as np\n", "import scipy as sp\n", "from scipy import stats # distributions\n", "from scipy import special # special functions\n", "import matplotlib.pyplot as plt\n", "from ipywidgets import interact, interactive, FloatRangeSlider, fixed # interactive stuff" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Examples of densities and cdfs\n", "\n", "# U[0,1]\n", "def pltUnif(a,b):\n", " ffac = 0.1\n", " s = b-a\n", " fudge = ffac*s\n", " x = np.arange(a-fudge, b+fudge, s/200)\n", " y = np.ones(len(x))/s\n", " y[xb] = np.zeros(np.sum(x > b)) # zero for x > b\n", " Y = (x-a)/s # uniform CDF is linear\n", " Y[x= b] = np.ones(np.sum(x >= b))\n", " plt.plot(x,y,'b-',x,Y,'r-',linewidth=2)\n", " plt.plot((a-fudge, b+fudge), (0.5, 0.5), 'g--') # horizontal green dashed line at 0.5\n", " plt.plot((a-fudge, b+fudge), (0, 0), 'k-') # horizontal black line at 0\n", " plt.xlabel('$x$') # axis labels. Can use LaTeX math markup\n", " plt.ylabel(r'$f(x) = 1_{[a,b]}/(b-a)')\n", " plt.axis([a-fudge,b+fudge,-0.1,(1+ffac)*max(1, 1/s)]) # axis limits\n", " plt.title('The $U[$' + str(a) + ',' + str(b) + '$]$ density and cdf')\n", " plt.show()\n", "\n", "interactive(pltUnif, \\\n", " [a, b] = FloatRangeSlider(min = -5, max = 5, step = 0.05, lower=-1, upper=1))\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Exponential(lambda)\n", "\n", "def plotExp(lam):\n", " ffac = 0.05\n", " x = np.arange(0, 5/lam, step=(5/lam)/200)\n", " y = sp.stats.expon.pdf(x, scale = 1/lam)\n", " Y = sp.stats.expon.cdf(x, scale = 1/lam)\n", " plt.plot(x,y,'b-',x,Y,'r-',linewidth=2)\n", " plt.plot((-.1, (1+ffac)*np.max(x)), (0.5, 0.5), 'g--') # horizontal line at 0.5\n", " plt.plot((-.1, (1+ffac)*np.max(x)), (1, 1), 'k:') # horizontal line at 1\n", " plt.xlabel('$x$') # axis labels. 
Can use LaTeX math markup\n", "    plt.ylabel(r'$f(x) = \lambda e^{-\lambda x}; F(x) = 1-e^{-\lambda x}$.')\n", "    plt.title(r'The exponential density and cdf for $\lambda=$' + str(lam))\n", "    plt.axis([-.1,(1+ffac)*np.max(x),-0.1,(1+ffac)*max(1, lam)]) # axis limits\n", "    plt.show()\n", " \n", "interact(plotExp, lam=(0.5, 10, 0.5)) # keep lam > 0 to avoid dividing by zero" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Jointly Distributed Random Variables\n", "\n", "Often we work with more than one random variable at a time.\n", "Indeed, much of this course concerns _random vectors_, the components of which are individual\n", "real-valued random variables.\n", "\n", "The _joint probability distribution_ of a collection of random variables $\{X_i\}_{i=1}^n$ gives the probability that\n", "the variables simultaneously fall in subsets of their possible values.\n", "That is, for every (suitable) subset $A \subset \Re^n$, the joint probability distribution of $\{X_i\}_{i=1}^n$\n", "gives ${\mathbb P} \{ (X_1, \ldots, X_n) \in A \}$.\n", "\n", "An _event determined by the random variable $X$_ is an event of the form $X \in A$, where $A \subset \Re$.\n", "\n", "An _event determined by the random variables $\{X_j\}_{j \in J}$_ is an event of the form\n", "$(X_j)_{j \in J} \in A$, where $A \subset \Re^{\#J}$.\n", "\n", "Two random variables $X_1$ and $X_2$ are _independent_ if every event determined by $X_1$ is independent\n", "of every event determined by $X_2$.\n", "If two random variables are not independent, they are _dependent_.\n", "\n", "A collection of random variables $\{X_i\}_{i=1}^n$ is _independent_ if every event determined by every subset\n", "of those variables is independent of every event determined by any disjoint subset of those variables.\n", "If a collection of random variables is not independent, it is _dependent_.\n", "\n", "Loosely speaking, a collection of random variables is independent if learning the values of some of them\n", "tells you nothing about the values of the rest of them.\n", "If learning the values of some of them tells you anything about the values of the rest of them,\n", "the collection is dependent.\n", "\n", "For instance, imagine tossing a fair coin twice and rolling a fair die.\n", "Let $X_1$ be the number of times the coin lands heads, and $X_2$ be the number of spots that show on the die.\n", "Then $X_1$ and $X_2$ are independent: learning how many times the coin lands heads tells you nothing about what\n", "the die did.\n", "\n", "On the other hand, let $X_1$ be the number of times the coin lands heads, and let $X_2$ be the sum of the\n", "number of heads and the number of spots that show on the die.\n", "Then $X_1$ and $X_2$ are dependent. For instance, if you know the coin landed heads twice, you know that the sum\n", "of the number of heads and the number of spots must be at least 3." 
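] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next cell is a small simulation sketch (an illustration added to these notes, not a derivation): it estimates ${\mathbb P}\{X_2 \ge 3\}$ with and without conditioning on $X_1 = 2$ for the coin-and-die example above. If $X_1$ and $X_2$ were independent, the two estimates would agree up to simulation error; here they do not." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# simulation sketch: X1 = heads in 2 tosses; X2 = heads + spots on a die\n", "np.random.seed(12345) # for reproducibility\n", "reps = 100000\n", "heads = np.random.binomial(2, 0.5, size=reps) # X1\n", "spots = np.random.randint(1, 7, size=reps) # die spots, uniform on 1..6\n", "total = heads + spots # X2\n", "print('P(X2 >= 3) estimate: %.4f' % np.mean(total >= 3)) # theory: 5/6\n", "print('P(X2 >= 3 | X1 = 2) estimate: %.4f' % np.mean(total[heads == 2] >= 3)) # theory: 1" 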
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Expectation\n", "\n", "See [SticiGui: The Long Run and the Expected Value](http://www.stat.berkeley.edu/~stark/SticiGui/Text/expectation.htm) for an elementary introduction to expectation.\n", "\n", "The _expectation_ or _expected value_ of a random variable $X$, denoted ${\mathbb E}X$, is a probability-weighted average of its possible values.\n", "From a frequentist perspective, it is the long-run limit (in probability) of the average of its values in repeated experiments.\n", "The expected value of a real-valued random variable (when it exists) is a fixed number, not a random value.\n", "The expected value depends on the probability distribution of $X$ but not on any realized value of $X$.\n", "If two random variables have the same probability distribution, they have the same expected value.\n", "\n", "
\n", "### Properties of Expectation\n", "\n", "+ For any real $\\alpha \\in \\Re$, if ${\\mathbb P} \\{X = \\alpha\\} = 1$, then ${\\mathbb E}X = \\alpha$: the expected\n", "value of a constant random variable is that constant.\n", "+ For any real $\\alpha \\in \\Re$, ${\\mathbb E}(\\alpha X) = \\alpha {\\mathbb E}X$: scalar homogeneity.\n", "+ If $X$ and $Y$ are random variables, ${\\mathbb E}(X+Y) = {\\mathbb E}X + {\\mathbb E}Y$: additivity.\n", "\n", "
\n", "\n", "### Calculating Expectation\n", "If $X$ is a continuous real-valued random variable with density $f(x)$, then the expected value of $X$ is\n", "$$\n", " {\\mathbb E}X = \\int_{-\\infty}^\\infty x f(x) dx,\n", "$$\n", "provided the integral exists.\n", "\n", "If $X$ is a discrete real-valued random variable with probability function $p$, then the expected value of $X$ is\n", "$$\n", " {\\mathbb E}X = \\sum_{i=1}^\\infty x_i p(x_i),\n", "$$\n", "where $\\{x_i\\} = \\{ x \\in \\Re: p(x) > 0\\}$,\n", "provided the sum exists." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Examples\n", "\n", "### Uniform\n", "Suppose $X$ has density $f(x) = \\frac{1}{b-a}$ for $a \\le x \\le b$ and $0$ otherwise.\n", "Then \n", "\n", "$$ \\mathbb{E} = \\int_{-\\infty}^\\infty x f(x) dx = \\frac{1}{b-a} \\int_a^b x dx = \\frac{b^2-a^2}{2(b-a)} =\n", "\\frac{a+b}{2}.$$\n", "\n", "\n", "\n", "### Poisson\n", "Suppose $X$ has a Poisson distribution with parameter $\\lambda$.\n", "Then \n", "\n", "$$\\mathbb{E}X = e^{-\\lambda} \\sum_{j=0}^\\infty j \\lambda^j/j! = \\lambda.$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Examples relates to Bernoulli Trials\n", "\n", "### Bernoulli\n", "Suppose $X$ can take only two values, 0 and 1, and the probability that $X= 1$ is $p$.\n", "Then\n", "\n", "$$\\mathbb{E} X = 1 \\times p + 0 \\times (1-p) = p.$$\n", "\n", "### Binomial\n", "[To do.] Derive the Binomial distribution as the number of successes in $n$ iid Bernoulli trials.\n", "\n", "The number of successes $X$ in $n$ trials is equivalent to the sum of indicators for the success in each trial. That is, \n", "\n", "$$ X = \\sum_{i=1}^n X_i,$$\n", "\n", "where $X_i = 1$ if the $i$th trial results in success, and $X_i = 0$ otherwise.\n", "By the additive property of expectation,\n", "\n", "$$ \\mathbb{E}X = \\mathbb{E} \\sum_{i=1}^n X_i = \\sum_{i=1}^n \\mathbb{E}X_i =\n", "\\sum_{i=1}^n p = np.$$\n", "\n", "### Geometric\n", "\n", "The number of trials to the first success in iid Bernoulli($p$) trials has a _geometric distribution with parameter $p$_.\n", "\n", "[To do.] Derive the geometric and calculate expectation.\n", "\n", "\n", "### Negative Binomial\n", "The number of trials to the $k$th success in iid Bernoulli($p$) trials\n", "has a _negative binomial distribution with parameters $p$ and $k$_.\n", "\n", "[To do.] Derive the negative binomial.\n", "\n", "The number of trials $X$ until the $k$th success in iid Bernoulli trials can be written as the number of trials until the 1st success plus the number to the second success plus \\hellip; plus the number of trials to the $k$th success.\n", "Each of those $k$ \"waiting times\" $X_i$ has a geometric distribution.\n", "Hence\n", "\n", "$$ \\mathbb{E}X = \\mathbb{E} \\sum_{i=1}^k X_i = \\sum_{i=1}^k \\mathbb{E}X_i =\n", "\\sum_{i=1}^k 1/p = k/p.$$\n", "\n", "### Hypergeometric\n", "[To do.] Derive hypergeometric.\n", "\n", "Population of $N$ numbers of which $G$ equal 1 and $N-G$ equal 0.\n", "Number of 1s in a sample of size $n$ drawn without replacement.\n", "\n", "$$ \\mathbb{P} \\{X = x\\} = \\frac{ {{G} \\choose {x}}{{N-g} \\choose {n-x}}}{{N}\\choose{n}}.$$\n", "\n", "[To do.] Calculate expected value. Use random permutations of \"tickets\" to show that expected value in each position is $G/N$." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Examples related to sampling from finite populations\n", "\n", "### One draw from a box of numbered tickets\n", "[To do.]\n", "\n", "### The sample sum of $n$ draws from a box\n", "[To do.]\n", "\n", "### The sample mean of $n$ draws from a box\n", "[To do.]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Variance, Standard Error, and Covariance\n", "\n", "See [SticiGui: Standard Error](http://www.stat.berkeley.edu/~stark/SticiGui/Text/standardError.htm) for an elementary introduction to variance and standard error.\n", "\n", "The _variance_ of a random variable $X$ is $\mbox{Var }X = {\mathbb E}(X - {\mathbb E}X)^2$.\n", "\n", "Algebraically, the following identity holds:\n", "$$\n", "\mbox{Var } X = {\mathbb E}(X - {\mathbb E}X)^2 = {\mathbb E}(X^2 - 2X{\mathbb E}X + ({\mathbb E}X)^2) = {\mathbb E}X^2 - 2({\mathbb E}X)^2 + ({\mathbb E}X)^2 =\n", "{\mathbb E}X^2 - ({\mathbb E}X)^2.\n", "$$\n", "However, this is generally not a good way to calculate $\mbox{Var }X$ numerically, because of roundoff:\n", "it sacrifices precision unnecessarily.\n", "\n", "The _standard error_ of a random variable $X$ is $\mbox{SE }X = \sqrt{\mbox{Var } X}$.\n", "\n", "If $\{X_i\}_{i=1}^n$ are independent, then $\mbox{Var } \sum_{i=1}^n X_i = \sum_{i=1}^n \mbox{Var }X_i$.\n", "\n", "If $X$ and $Y$ have a joint distribution, then $\mbox{cov}(X,Y) = {\mathbb E}(X - {\mathbb E}X)(Y - {\mathbb E}Y)$.\n", "It follows from this definition (and the commutativity of multiplication)\n", "that $\mbox{cov}(X,Y) = \mbox{cov}(Y,X)$.\n", "Also,\n", "$$\n", "\mbox{Var }(X+Y) = \mbox{Var }X + \mbox{Var }Y + 2\mbox{cov}(X,Y).\n", "$$\n", "\n", "If $X$ and $Y$ are independent, $\mbox{cov}(X,Y) = 0$. \n", "However, the converse is not necessarily true: $\mbox{cov}(X,Y) = 0$ does not in general imply that\n", "$X$ and $Y$ are independent." 
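] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next cell is an added numerical sketch of the roundoff warning above: for data with a small spread around a large mean, the \"computational\" formula ${\mathbb E}X^2 - ({\mathbb E}X)^2$ can lose most or all of its precision, while the two-pass formula ${\mathbb E}(X - {\mathbb E}X)^2$ remains accurate. The data here are an arbitrary grid, standing in for an empirical distribution." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# roundoff sketch: naive EX^2 - (EX)^2 vs. the two-pass formula\n", "x = 1.0e8 + np.arange(0, 1, 0.001) # 1000 points with tiny spread and huge mean\n", "naive = np.mean(x**2) - np.mean(x)**2 # catastrophic cancellation\n", "twopass = np.mean((x - np.mean(x))**2) # stable\n", "print('naive: %.6f' % naive)\n", "print('two-pass: %.6f' % twopass)\n", "print('np.var: %.6f' % np.var(x)) # for comparison" 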
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Examples\n", "\n", "### Variance of a Bernoulli random variable\n", "\n", "### Variance of a Binomial random variable\n", "\n", "### Variance of a Geometric and Negative Binomial random variable\n", "\n", "### Variance of the sample sum and sample mean" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Random Vectors\n", "\n", "Suppose $\{X_i\}_{i=1}^n$ are jointly distributed random variables, and let\n", "$$\n", "X = \n", "\begin{pmatrix}\n", "X_1 \\\n", "\vdots \\\n", "X_n\n", "\end{pmatrix}\n", ".\n", "$$\n", "Then $X$ is a random vector, an $n$ by $1$ vector of real-valued random variables.\n", "\n", "The expected value of $X$ is\n", "$$\n", "{\mathbb E} X \equiv\n", "\begin{pmatrix}\n", "{\mathbb E} X_1 \\\n", "\vdots \\\n", "{\mathbb E} X_n\n", "\end{pmatrix}\n", ".\n", "$$\n", "\n", "The _covariance matrix_ of $X$ is\n", "$$\n", "\mbox{cov } X \equiv\n", "{\mathbb E} \n", "\left (\n", "\begin{pmatrix}\n", "X_1 - {\mathbb E} X_1 \\\n", "\vdots \\\n", "X_n - {\mathbb E} X_n\n", "\end{pmatrix}\n", "\begin{pmatrix}\n", "X_1 - {\mathbb E} X_1 & \cdots & X_n - {\mathbb E} X_n\n", "\end{pmatrix}\n", "\right )\n", "=\n", "{\mathbb E} \n", "\begin{pmatrix}\n", "(X_1 - {\mathbb E} X_1)^2 & (X_1 - {\mathbb E} X_1)(X_2 - {\mathbb E} X_2) & \cdots & (X_1 - {\mathbb E} X_1)(X_n - {\mathbb E} X_n) \\\n", "(X_1 - {\mathbb E} X_1)(X_2 - {\mathbb E} X_2) & (X_2 - {\mathbb E} X_2)^2 & \cdots & (X_2 - {\mathbb E} X_2)(X_n - {\mathbb E} X_n) \\\n", "\vdots & \vdots & \ddots & \vdots \\\n", "(X_1 - {\mathbb E} X_1)(X_n - {\mathbb E} X_n) & (X_2 - {\mathbb E} X_2)(X_n - {\mathbb E} X_n) & \cdots & (X_n - {\mathbb E} X_n)^2\n", "\end{pmatrix}\n", "$$\n", "\n", "$$\n", "= \n", "\begin{pmatrix}\n", "{\mathbb E}(X_1 - {\mathbb E} X_1)^2 & {\mathbb E}((X_1 - {\mathbb E} X_1)(X_2 - {\mathbb E} X_2)) & \cdots & \n", "{\mathbb E}((X_1 - {\mathbb E} X_1)(X_n - {\mathbb E} X_n)) \\\n", "{\mathbb E}((X_1 - {\mathbb E} X_1)(X_2 - {\mathbb E} X_2)) & {\mathbb E}(X_2 - {\mathbb E} X_2)^2 & \cdots & \n", "{\mathbb E}((X_2 - {\mathbb E} X_2)(X_n - {\mathbb E} X_n)) \\\n", "\vdots & \vdots & \ddots & \vdots \\\n", "{\mathbb E}((X_1 - {\mathbb E} X_1)(X_n - {\mathbb E} X_n)) & {\mathbb E}((X_2 - {\mathbb E} X_2)(X_n - {\mathbb E} X_n)) & \cdots & {\mathbb E}(X_n - {\mathbb E} X_n)^2\n", "\end{pmatrix}\n", ".\n", "$$\n", "\n", "Covariance matrices are always _positive semidefinite_.\n", "(If $x'Ax \ge 0$ for all $x \in \Re^n$, $A$ is _nonnegative definite_, or _positive semi-definite_. 
[Here](./linalg.ipynb) is a review of linear algebra.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Multivariate Normal Distribution\n", "\n", "The notation $X \sim {\mathcal N}(\mu, \sigma^2)$ means that $X$ has a normal distribution with mean $\mu$\n", "and variance $\sigma^2$.\n", "This distribution is continuous, with probability density function\n", "$$\n", "\frac{1}{\sqrt{2\pi} \sigma} e^{\frac{-(x-\mu)^2}{2\sigma^2}}.\n", "$$\n", "\n", "If $X \sim {\mathcal N}(\mu, \sigma^2)$, then $\frac{X-\mu}{\sigma} \sim {\mathcal N}(0, 1)$,\n", "the _standard normal distribution_.\n", "The probability density function of the standard normal distribution is\n", "$$\n", "\phi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}.\n", "$$" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "## Plot the normal density and cdf\n", "\n", "def plotNorm(mu, sigma):\n", "    x = np.arange(mu-4*sigma, mu+4*sigma, 8*sigma/200)\n", "    y = np.exp(-(x-mu)**2/(2*sigma**2))/(sigma*math.sqrt(2*math.pi)) # the density, written out for clarity\n", "    Y = sp.stats.norm.cdf(x, loc=mu, scale=sigma) # using scipy for convenience\n", "    plt.plot(x,y,'b-',x,Y,'r-',linewidth=2)\n", "    plt.plot((mu-4.1*sigma, mu+4.1*sigma), (0.5, 0.5), 'g--') # horizontal line at 0.5\n", "    plt.xlabel('$x$') # axis labels. Can use LaTeX math markup\n", "    plt.ylabel(r'$f(x) = \frac{1}{\sqrt{2\pi}\sigma} e^{-(x-\mu)^2/2\sigma^2}$; $F(x)$')\n", "    plt.axis([mu-4.1*sigma, mu+4.1*sigma,0,max(1.1,max(y))]) # axis limits\n", "    plt.title(r'The $\mathcal{N}($' + str(mu) + ',' + str(sigma**2) + '$)$ density and cdf')\n", "    plt.show()\n", " \n", "interact(plotNorm, mu=(-5,5,.05), sigma=(0.1, 10, .1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A collection of random variables $\{ X_1, X_2, \ldots, X_n\} = \{X_j\}_{j=1}^n$ is _jointly normal_\n", "if all linear combinations of those variables have normal distributions.\n", "That is, the collection is jointly normal if for all $\alpha \in \Re^n$, $\sum_{j=1}^n \alpha_j X_j$\n", "has a normal distribution.\n", "\n", "If $\{X_j \}_{j=1}^n$ are independent, normally distributed random variables, they are jointly normal.\n", "\n", "If for some $\mu \in \Re^n$ and positive-definite matrix $G$, the joint density of $\{X_j \}_{j=1}^n$ is\n", "$$ \n", "\left ( \frac{1}{\sqrt{2 \pi}}\right )^n \frac{1}{\sqrt{\left | G \right |}} \n", "\exp \left \{ - \frac{1}{2} (x - \mu)'G^{-1}(x-\mu) \right \},\n", "$$\n", "then $\{X_j \}_{j=1}^n$ are jointly normal; the mean vector of $\{X_j\}_{j=1}^n$ is $\mu$ and the covariance matrix is $G$." 
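] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next cell is an added simulation sketch: it draws from a bivariate normal with an arbitrary mean vector $\mu$ and positive-definite covariance matrix $G$, checks that the sample covariance is close to $G$, and checks that a linear combination $\alpha'X$ has the mean $\alpha'\mu$ and variance $\alpha'G\alpha$ that joint normality implies." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# simulation sketch: multivariate normal samples and their covariance\n", "np.random.seed(54321) # for reproducibility\n", "mu = np.array([1.0, -2.0]) # arbitrary mean vector\n", "G = np.array([[2.0, 0.8], [0.8, 1.0]]) # arbitrary positive-definite covariance\n", "X = np.random.multivariate_normal(mu, G, size=100000)\n", "print('sample covariance (should be close to G):')\n", "print(np.cov(X.T))\n", "a = np.array([0.5, -1.5]) # arbitrary linear combination\n", "print('combo mean %.4f (theory %.4f); combo var %.4f (theory %.4f)' % (X.dot(a).mean(), a.dot(mu), X.dot(a).var(), a.dot(G).dot(a)))" 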
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Central Limit Theorem\n", "\n", "For an elementary discussion, see [SticiGui: The Normal Curve, The Central Limit Theorem, and Markov's and Chebychev's Inequalities for Random Variables](http://www.stat.berkeley.edu/~stark/SticiGui/Text/clt.htm).\n", "\n", "Suppose $\{X_j \}_{j=1}^\infty$ are independent and identically distributed (iid), have finite expected value ${\mathbb E}X_j = \mu$, and have finite variance $\mbox{Var }X_j = \sigma^2$.\n", "\n", "Define the sum $S_n \equiv \sum_{j=1}^n X_j$.\n", "Then \n", "$$\n", "{\mathbb E}S_n = {\mathbb E} \sum_{j=1}^n X_j = \sum_{j=1}^n {\mathbb E} X_j = \sum_{j=1}^n \mu = n\mu,\n", "$$\n", "and\n", "$$\n", "\mbox{Var }S_n = \mbox{Var } \sum_{j=1}^n X_j = n\sigma^2.\n", "$$\n", "(The last step follows from the independence of $\{X_j\}$: the variance of the sum is the sum of the variances.)\n", "\n", "Define $Z_n \equiv \frac{S_n - n\mu}{\sqrt{n}\sigma}$.\n", "Then for every $a, b \in \Re$ with $a \le b$,\n", "$$\n", "\lim_{n \rightarrow \infty} {\mathbb P} \{ a \le Z_n \le b \} = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2} dx.\n", "$$\n", "This is a basic form of the Central Limit Theorem." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conditional Distributions\n", "\n", "The conditional distribution of a random variable or random vector\n", "$X$ given the event $A$ is \n", "\n", "$$\mathbb{P}_{X|A}(B) = \mathbb{P} \{ X \in B | A \}$$\n", "\n", "as a function of $B$, provided $\mathbb{P} (A) > 0$.\n", "\n", "\n", "[To do]\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conditional Expectation\n", "[To do]\n", "\n", "Conditional expectation is a random variable...\n", "The expectation of the conditional expectation is the unconditional expectation:\n", "$\mathbb{E}(\mathbb{E}(X|Y)) = \mathbb{E} X$.\n", "This is essentially another expression of the law of total probability.\n", "\n", "### Examples\n", "[To do] Use random permutation of a list of numbers to illustrate: $\mathbb{E}X_j$, $\mathbb{E}(X_j | X_k = x)$, $\mathbb{E}(X_j | X_k)$, $\mathbb{E} (\mathbb{E}(X_j | X_k)) = \mathbb{E}X_j$.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Point Processes\n", "\n", "Point processes formalize the notion of something occurring at a random place or time (or both).\n", "\n", "For instance, imagine the radioactive decay of a mass of uranium; the times at which individual atoms decay are modeled well as a Poisson process (described below).\n", "\n", "\n", "### Poisson Processes\n", "\n", "Temporal, spatiotemporal. 
Waiting times (inter-arrival times) are exponential.\n", "Alternative characterizations.\n", "\n", "Temporal point processes: the counting function.\n", "[To Do]\n", "\n", "\n", "#### Marked Poisson Processes\n", "[To Do]\n", "\n", "### Renewal Processes\n", "[To Do]\n", "\n", "\n", "### Branching Processes\n", "[To Do]\n", "\n", "\n", "#### Hawkes Processes\n", "[To Do]\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Previous: [Theories of Probability](probTheory.ipynb) Next: [Probability Inequalities](ineq.ipynb)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%run talkTools.py" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.10" } }, "nbformat": 4, "nbformat_minor": 0 }