{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# ipython magic\n", "%matplotlib notebook\n", "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# plot configuration\n", "import matplotlib\n", "import matplotlib.pyplot as plt\n", "plt.style.use(\"ggplot\")\n", "# import seaborn as sns # sets another style\n", "matplotlib.rcParams['lines.linewidth'] = 3\n", "fig_width, fig_height = (7.0,5.0)\n", "matplotlib.rcParams['figure.figsize'] = (fig_width, fig_height)\n", "\n", "# font = {'family' : 'sans-serif',\n", "# 'weight' : 'normal',\n", "# 'size' : 18.0}\n", "# matplotlib.rc('font', **font) # pass in the font dict as kwar" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Imports and definitions\n", "from sympy import *\n", "from sympy.stats import Normal, density\n", "init_printing(use_latex='mathjax')\n", "from IPython.display import display;\n", "\n", "x1,x2,y =symbols(\"x1,x2,y\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Conditional variance as a means to quantify uncertainty and sensitivity\n", "\n", "\n", "## Recap of basic statistics\n", "\n", "Inspired by [Rachel Fewster's presentation](https://www.stat.auckland.ac.nz/~fewster/325/notes/ch3.pdf)\n", "### Expectation\n", "\n", "The mean, expected value, or expectation of a random variable $X$ is\n", "written $E(X)$ or sometimes $\\mu_X$. When it is assumed to be clear\n", "from the context which random varialbe it pertains to, the supscript\n", "of the expectation may by dropped, i.e. $E(X)=\\mu$. The meaning of\n", "$E(X)$ is that if we observe $N$ random values of $X$, the mean of\n", "these $N$ values will be approximately equal to $E(X)$ for large\n", "values of $N$ ." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "
\n", "\n", "$$\n", "\\begin{equation}\n", "E(X) \\equiv \\mu = {\\int\\limits}_{x_{\\text{min}}}^{x_{\\text{max}}} x \\; p(x) \\, dx \n", "\\label{eq:definition} \\tag{1}\n", "\\end{equation}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "where $p(x)$ is the [probability density function](https://en.wikipedia.org/wiki/Probability_density_function) (pdf) of\n", "for the random variable $X$.\n", "\n", "With [sympy.stats](https://docs.scipy.org/doc/scipy/reference/stats.html) we may compute\n", "the analytical expectation of a normal distribution easily. Random\n", "variables may be declared by means of ready made functions like\n", "`Normal, Exponential`,etc. A corresponding probability density\n", "function may declared with the `density` function. The expectation\n", "$E(X) = \\mu$ may then be computed analyticaly by following the\n", "definition in ([1](#eq:definition)):\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Expectation of a normally distributed variable \n", "mu1 = symbols(\"mu1\", positive=True)\n", "V1 = symbols(\"sigma1\", positive=True)\n", "X1 = Normal(\"X\", mu1, V1)\n", "D1 = density(X1)(x1)\n", "E1=Integral(x1*D1,(x1,-oo,oo))\n", "display(Eq(E1,E1.doit())) # use doit to evaluate an unevaluated integral" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In our snippet above the random variable $Z$ is normally distributed with variance `mu1` and variance `V1` (rendered $\\mu_1$ and $\\sigma^2$ repsectively, with mathjax). \n", "\n", "We observe that when we carry out the integration\n", "`E1=Integral(x1*D1,(x1,-oo,oo))` the expectation $\\mu_1$ is returned\n", "as it should.\n", "\n", "### Variance\n", "\n", "The variance $V(X)$ is introduced to quantify \"how much\" the samples\n", "of the random variable $X$ varies around the mean $E(X)$:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "
\n", "\n", "$$\n", "\\begin{equation}\n", "V(X) \\equiv \\sigma_X^2 = E \\left [\\left ( X - E(X) \\right )^2 \\right ] \n", "\\label{eq:var_def} \\tag{2}\n", "\\end{equation}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The variance for our random variable `Z` in the code snippet above may also computed from the definition in ([2](#eq:var_def)):" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Compute the variance analytically\n", "V = Integral((x1-E1.doit())**2*D1, (x1,-oo,oo))\n", "display(Eq(V,V.doit()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A common way of denoting the variance of $X$ is also\n", "$V(X)=\\sigma_X^2$. For later use we observe that from\n", "([2](#eq:var_def)) we may deduce the commonly used relations:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "
\n", "\n", "$$\n", "\\begin{equation}\n", "V(X) = E \\left [\\left ( X - E(X) \\right )^2 \\right ] = E[X^2] -2 \\; E\\left[X\\, E[X] \\right ] + E^2[X]= E[X^2] - E^2[X]\n", "\\label{eq:Vx} \\tag{3}\n", "\\end{equation}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and thus:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "
\n", "\n", "$$\n", "\\begin{equation}\n", " E[X^2] = V(X) + E^2[X]\n", "\\label{eq:Exsq} \\tag{4}\n", "\\end{equation}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Conditional probability density functions and expectation\n", "\n", "**Joint probability density functions:**\n", "\n", "Suppose that $X_1, X_2, \\ldots$ are random varibles, possibly\n", "dependent on each other, then their [joint probability density\n", "function](https://en.wikipedia.org/wiki/Joint_probability_distribution), (jpdf) \n", "expresses the probability that each variable falls within the range\n", "specificed for that particular variable. In the case of two random\n", "variables, the joint probability distribution is normally called the\n", "bivariate distribution.\n", "\n", "Below we will consider the jpdfs for the input parameters of a\n", "computational model, which we assume to be independent random\n", "variables. We will further assume that the jpdf may simply be\n", "represented as:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "
\n", "\n", "$$\n", "\\begin{equation}\n", "p(x_1,x_2) = p(x_1) \\, p(x_2)\n", "\\label{eq:jpdf} \\tag{5}\n", "\\end{equation}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below we compute the expectation for a bivariate distribution:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Expectation of the product of two normally distributed parameters\n", "mu2 = symbols(\"mu2\", positive=True)\n", "V2 = symbols(\"sigma2\", positive=True)\n", "X2 = Normal(\"X\", mu2, V2)\n", "D2=density(X2)(x2)\n", "jpdf=D1*D2\n", "E2=Integral(x1*x2*jpdf,(x2,-oo,oo),(x1,-oo,oo)) \n", "display(Eq(E2,E2.doit())) # use doit to evaluate an unevaluated integral" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and we note that the expectation of the bivariate distribution is the\n", "scalar number given (the product of the two mean values).\n", "\n", "**Conditional pdfs:**\n", "\n", "Suppose that $X$ and $Y$ are two random varibles, possibly dependent\n", "on each other and that we fix $X$ at the value $x$.This situation\n", "gives rise to a conditional pdf of $Y$ for a\n", "given $X=x$:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "
\n", "\n", "$$\n", "\\begin{equation}\n", "p(Y\\; | \\,X=x) = \\frac{p(X,Y)}{p(X)}\n", "\\label{eq:cond_pdf} \\tag{6}\n", "\\end{equation}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "where $p(X,Y)$ is the joint pdf. That is, whenever the joint pdf may\n", "be represented as the product of the random variables, the conditional\n", "pdf reduces to the marginal pdf of the variable which is not taken as\n", "a constant. We then realize that the conditional pdf of $Y$ for a\n", "given $X=x$ may be simplified to:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "
\n", "\n", "$$\n", "\\begin{equation}\n", "p(Y\\; | \\,X=x) = p(Y)\n", "\\label{eq:cond_pdf_simple} \\tag{7}\n", "\\end{equation}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Conditional expectation:**\n", "\n", "Once the conditional pdfs has been introduced, the conditional\n", "expectation $E(Y\\; | \\,X=x)$ has natural formulation following from the\n", "definition in ([1](#eq:definition)):" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "
\n", "\n", "$$\n", "\\begin{equation}\n", "E(Y\\; | \\,X=x) = {\\int\\limits}_{y_{\\text{min}}}^{y_{\\text{max}}} y \\; p(y\\; | \\,X=x) \\, dy = f(x)\n", "\\label{eq:conditionalE} \\tag{8}\n", "\\end{equation}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Conditional expectation as a random variable:**\n", "\n", "However, the conditional expectation, $E(Y \\;|\\, X=x)$ is a number\n", "depending on $x$, and therefore the value of $x$ will also\n", "influence the mean or expectation of $Y$.\n", "The normal, unconditional expectation $E(X)$ of a random variable $X$, is just a number." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Conditional expectation\n", "E_given_x1=Integral(x2*jpdf,(x2,-oo,oo)) \n", "display(E_given_x1.doit())\n", "\n", "from sympy.plotting import plot\n", "mu1_value=1\n", "V1_value=2\n", "mu2_value=2\n", "V2_value=1\n", "\n", "Ex=E_given_x1.subs([(V1,V1_value),(mu2,mu2_value),(mu1,mu1_value)]).doit()\n", "_=plot(Ex,(x1,-10,10))" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Compute the conditional variance analytically\n", "V_given_x1=Integral((x2-E_given_x1.doit())**2*jpdf,(x2,-oo,oo))\n", "display(simplify(V_given_x1.doit())) \n", "Vx=V_given_x1.subs([(mu1,mu1_value),(V1,V1_value),(mu2,mu2_value),(V2,V2_value)]).doit()\n", "display(Vx)\n", "_=plot(Vx,(x1,-10,10))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "### Adam and Eve and their statistical relation\n", "\n", "For two integrable random variables $X$ and $Y$ Adam's and Eve's laws are formulated below.\n", "\n", "**Adam's Law of total expectation:**\n", "\n", "\n", "[Adam's law](https://en.wikipedia.org/wiki/Law_of_total_expectation) of\n", "total expectation may be stated as (follow the hyperlink for more details\n", "and proof):" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "
\n", "\n", "$$\n", "\\begin{equation}\n", "E[Y] = E[E[Y\\; | \\,X]]\n", "\\label{eq:Adam} \\tag{9}\n", "\\end{equation}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "the law is also known as law of iterated expectations, the tower rule, the smoothing theorem.\n", "\n", "To make it more clear over which variables the expectation operators are applied to we may present Adams's law as:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "
\n", "\n", "$$\n", "\\begin{equation}\n", "E[Y] = E_X \\, [E_{Y|X}[Y\\; | \\,X]]\n", "\\label{eq:Adam_explained} \\tag{10}\n", "\\end{equation}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A short proof:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "
\n", "\n", "$$\n", "\\begin{equation}\n", "E[Y] = \\int_x E[Y\\; | \\,X] \\; p(x) dx = \\int_x \\int_y y \\; p(y \\,| X=x) \\; dy \\; p(x) dx = \\int_x \\int_y y \\, p(x,y) \\; dy \\, dx = \\int_y y \\, p(y) \\; dy \n", "\\label{eq:Adam_proof} \\tag{11}\n", "\\end{equation}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An alternative proof for the discrete case may be found on [YouTube](https://www.youtube.com/watch?v=Ki2HpTCPwhM)\n", "\n", "\n", "**Eve's Law of total variance:**\n", "\n", "The somewhat related [Eve's law](https://en.wikipedia.org/wiki/Law_of_total_variance), which we will\n", "refer to as the *law of total variance* may be represented as:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "
\n", "\n", "$$\n", "\\begin{equation}\n", "V(Y) = V(E(Y\\; | \\,X)) + E(V(Y\\; | \\,X)) \n", "\\label{eq:Eve} \\tag{12}\n", "\\end{equation}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Generalizations to a computational model with $k$ uncorrelated parameters\n", "We consider a mathematical/computational model $f$ which produces an output $y$ as a function of $k$ uncorrelated parameters $x_1, x_2, \\ldots, x_k$:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "
\n", "\n", "$$\n", "\\begin{equation}\n", " y = f(x_1, x_2, \\ldots, x_k)\n", "\\label{eq:model} \\tag{13}\n", "\\end{equation}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Condtional variance for our computational model\n", "\n", "Given that our computational model $f(x_1, x_2, \\ldots, x_k)$ takes\n", "$x_1, x_2, \\ldots, x_k$ are $k$ independent random parameters with\n", "marginal pdfs $p_i(x_i)$, a joint probability\n", "density function may be constructed:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$\n", "\\begin{equation}\n", "P(x_1, x_2, \\ldots, x_k) = \\prod_{i=1}^k p_i(x_i)\n", "\\label{}\n", "\\end{equation}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As the parameters of our model are random parameters, the predicted\n", "values $y$ will be random too, with the following statistical moments:\n", "\n", "**Expected value $E(y)=\\mu$:**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$\n", "\\begin{equation}\n", "\\mu=E(y)=\\idotsint f(x_1, x_2, \\ldots, x_k)\\; \\prod_{i=1}^k p_i(x_i) \\, \\; \\mbox{dx}_i\n", "\\label{}\n", "\\end{equation}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Variance $V(y)=\\sigma_y^2$:**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$\n", "V(y) = \\idotsint \\left (f(x_1, x_2, \\ldots, x_k) -E(y) \\right )^2\\;\\prod_{i=1}^k p_i(x_i) \\, \\; \\mbox{dx}_i \\nonumber\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "
\n", "\n", "$$\n", "\\begin{equation} \n", " = \\idotsint f(x_1, x_2, \\ldots, x_k)^2\\; \\prod_{i=1}^k p_i(x_i)\\, \\; \\mbox{dx}_i -E^2(y) \\label{eq:VarDef} \\tag{14}\n", "\\end{equation}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "where the latter equivalence may be deduced following the same lines of deduction as for ([3](#eq:Vx))." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$\n", "\\begin{align*}\n", "V(y\\; | \\,x_j = \\tilde{x}_j) & = \\idotsint \\left (f(x_1, x_2, \\ldots, \\tilde{x}_j,\\ldots, x_k) -E(y\\; | \\,x_j = \\tilde{x}_j) \\right )^2\\;\\prod_{i=1 \\atop i\\neq j}^k p_i(x_i) \\, \\; \\mbox{dx}_i \\\\\n", " & = \\idotsint f(x_1, x_2, \\ldots, \\tilde{x}_j, x_k)^2\\; \\prod_{i=1\\atop i\\neq j}^k p_i(x_i)\\, \\; \\mbox{dx}_i -E^2(y\\; | \\,x_j = \\tilde{x}_j) \n", "\\label{}\n", "\\end{align*}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "
\n", "\n", "$$\n", "\\begin{equation}\n", "E\\left ( V(y\\; | \\,x_j = \\tilde{x}_j) \\right ) = \\idotsint f(x_1, x_2, \\ldots, \\tilde{x}_j, x_k)^2\\; \\prod_{i=1}^k p_i(x_i)\\, \\; \\mbox{dx}_i -\\int E^2(y\\; | \\,x_j = \\tilde{x}_j) \\; p_j(\\tilde{x}_j) \\; d \\tilde{x}_j \\label{eq:EVarCondz} \\tag{15}\n", "\\end{equation}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [The Law of Total Variance](https://en.wikipedia.org/wiki/Law_of_total_variance) (also known as\n", "`Eve's Law`) can be presented as:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "
\n", "\n", "$$\n", "\\begin{equation}\n", "V\\left ( E(y\\; | \\,x_j ) \\right ) = V(y) - E\\left ( V(y\\; | \\,x_j ) \\right ) \n", "\\label{eq:VarCond} \\tag{16}\n", "\\end{equation}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By subtraction of ([15](#eq:EVarCondz)) from ([14](#eq:VarDef)) we realize\n", "from ([16](#eq:VarCond)) that `Eve's Law` also may be represented:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "
\n", "\n", "$$\n", "\\begin{equation}\n", "V \\left ( E(y\\; | \\,x_j ) \\right ) = \\int E^2(y\\; | \\,x_j = \\tilde{x}_j) \\; p_j(\\tilde{x}_j) \\; d \\tilde{x}_j -E^2(y)\n", "\\label{eq:VarCondII} \\tag{17}\n", "\\end{equation}\n", "$$" ] } ], "metadata": {}, "nbformat": 4, "nbformat_minor": 2 }