{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Solutions for Chapter 4. Gaussian Models\n", "+ date: 2017-03-20\n", "+ tags: mlapp, statistics\n", "+ author: Peijun Zhu\n", "+ slug: mlapp-ch4-sol" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "$\\def\\bra#1{\\mathinner{\\langle{#1}|}}\n", "\\def\\ket#1{\\mathinner{|{#1}\\rangle}}\n", "\\def\\braket#1{\\mathinner{\\langle{#1}\\rangle}}\n", "\\newcommand{\\mdef}{\\overset{\\mathrm{def}}{=}}\n", "\\newcommand{\\bm}{\\mathbf}\n", "\\newcommand{\\pfr}[2]{\\frac{\\pp #1}{\\pp #2}} % Partial derivative\n", "\\newcommand{\\pfr}[2]{\\frac{\\pp #1}{\\pp #2}} % Partial derivative\n", "\\newcommand{\\inv}[1]{#1^{-1}} % Inverse Matrix\n", "\\newcommand{\\invt}[1]{#1^{-T}} % Inverse Transposed Matrix\n", "\\renewcommand{\\nl}{\\\\&\\phantom{{}={}}}% Newline In aligned equations\n", "\\newcommand{\\dfr}[2]{\\frac{\\dd #1}{\\dd #2}} % Total derivative\n", "\\DeclareMathOperator{\\Var}{Var} \n", "\\DeclareMathOperator{\\det}{det} \n", "\\DeclareMathOperator{\\tr}{tr} \n", "\\DeclareMathOperator{\\E}{E} \n", "\\DeclareMathOperator{\\Cov}{Cov} \n", "\\DeclareMathOperator{\\Beta}{B} \n", "\\DeclareMathOperator{\\Bdist}{Beta}\n", "\\DeclareMathOperator{\\sgn}{sgn}\n", "\\DeclareMathOperator{\\adj}{adj}\n", "\\DeclareMathOperator{\\ii}{i} \n", "\\DeclareMathOperator{\\dd}{d} \n", "\\newcommand{\\pp}{\\partial} \n", "\\DeclareMathOperator{\\rhs}{RHS}\n", "\\DeclareMathOperator{\\lhs}{LHS}$\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.1\n", "As\n", "$$\\E[x]=0,\\quad \\E[y]=\\E[x]^2+\\Var[x]^2=1/3$$\n", "\n", "\\begin{align}\n", "\\Cov[x,y]&=E[xy-x\\bar y-\\bar xy+\\bar x\\bar y]\\\\\n", "&=\\E[xy]-\\E[x]\\E[y]\\\\\n", "&=\\int_{-1}^1 \\frac{x^3}{2}dx\\\\\n", "&=0\\\\\n", "&\\Rightarrow\\rho=0\\end{align}" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## 4.2\n", "$$P(Y=y)=\\frac{P(X=y)+P(X=-y)}{2}=P(X=y)$$\n", "It is obvious that $\\E[x]=\\E[y]=0$,\n", "\n", "\\begin{align}\n", "\\Cov[x,y]&=\\E[xy]-\\E[x]\\E[y]\\\\\n", "&=P(W=1)\\E[x,y|W=1]+P(W=-1)\\E[x,y|W=-1]\\\\\n", "&=(E[x^2]+E[-x^2])/2\\\\\n", "&=0\n", "\\end{align}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.3\n", "We need to prove $|\\rho|<1$, i.e. \n", "$$\\Cov[x,y]^2\\leq \\Cov[x,x]\\Cov[y,y]$$\n", "\n", "The $\\Cov$ function satisty\n", "+ Symmetry $\\Cov[x,y]=\\Cov[y,x]$\n", "+ Double linearity $\\Cov[\\lambda x+\\mu y, z]=\\lambda\\Cov[x, z]+\\mu\\Cov[y,z]$\n", "+ Positive definiteness $\\Cov[x, x]\\geq 0$, equal sign holds iff $x=0$\n", "\n", "Using Cauchy-Schwartz Inequality, the conclusion is straightforward." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.4\n", "It is obvious that $\\Cov[X,Y+c]=\\Cov[X,Y]$ for any const $c$. So for $Y=aX+b$, \n", "\n", "\\begin{align}\n", "\\Cov[X,Y]&=\\Cov[x,aX]=a\\Cov[x,x]\\\\\n", "\\Cov[Y,Y]&=a^2\\Cov[X,X]\n", "\\end{align}\n", "\n", "So we have\n", "$$\\rho=\\frac{\\Cov[X,Y]}{\\sqrt{\\Cov[X,X]\\Cov[Y,Y]}}=\\frac{a\\Cov[X,X]}{|a|\\Cov[X,X]}=\\sgn[a]$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.5\n", "Suppose $\\Lambda=\\Sigma^{-1}=U^TDU$, where $U$ is orthonormal and $D$ is diagonal, then we can find basis $\\vec y=U(\\vec x-\\vec{\\mu})$ , which simplifies the integral to\n", "\n", "\\begin{align}\n", "\\int \\exp\\left(-\\frac{y^TDy}{2}\\right)&=\\prod \\int \\exp\\left(-\\frac{D_iy_i^2}{2}\\right)dy_i\\\\\n", "&=\\prod \\sqrt{\\frac{2\\pi}{D_i}}\\\\\n", "&=\\frac{(2\\pi)^{d/2}}{\\sqrt{\\det D}}\\\\\n", "&=(2\\pi)^{d/2}\\sqrt{\\det \\Sigma}\\\\\n", "\\end{align}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.6\n", "Obviously, $\\det\\Sigma=(1-\\rho^2)\\sigma_1^2\\sigma_2^2$\n", "\n", "The inverse matrix of $\\Sigma$ is \n", "\n", "\\begin{align}\n", "\\Lambda&=\\adj\\Sigma/\\det\\Sigma\\\\\n", "&=\\begin{bmatrix}\n", "\\sigma_2^2& -\\rho\\sigma_1\\sigma_2\\\\\n", "-\\rho\\sigma_1\\sigma_2& \\sigma_1^2\n", "\\end{bmatrix}/\\left[(1-\\rho^2)\\sigma_1^2\\sigma_2^2\\right]\\\\\n", "&=\\frac{1}{1-\\rho^2}\\begin{bmatrix}\n", "\\dfrac{1}{\\sigma_1^2}& -\\dfrac{\\rho}{\\sigma_1\\sigma_2}\\\\\n", "-\\dfrac{\\rho}{\\sigma_1\\sigma_2}& \\dfrac{1}{\\sigma_2^2}\n", "\\end{bmatrix}\n", "\\end{align}\n", "Plug thess expressions into the original pdf, we can easily prove the (4.268) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.7 Bivariate conditioning\n", "\\begin{align}\n", "p(x_1,x_2)&=\\frac{1}{2\\pi\\sigma_1\\sigma_2\\sqrt{1-\\rho^2}}\\exp\\left(-\\frac{1}{2(1-\\rho^2)}\\left(\\frac{(x_1-\\mu_1)^2}{\\sigma_1^2}+\\frac{(x_2-\\mu_2)^2}{\\sigma_2^2}-2\\rho \\frac{x_1-\\mu_1}{\\sigma_1}\\frac{x_2-\\mu_2}{\\sigma_2}\\right)\\right)\\\\\n", "& =\\frac{1}{2\\pi\\sigma_1\\sigma_2\\sqrt{1-\\rho^2}}\\exp\\left(-\\frac{1}{2(1-\\rho^2)}\\left(\\frac{x_1-\\mu_1}{\\sigma_1}-\\rho\\frac{x_2-\\mu_2}{\\sigma_2}\\right)^2+\\frac{(x_2-\\mu_2)^2}{2\\sigma_2^2}\\right)\\\\\n", "& =\\frac{N(x_2|\\mu_2,\\sigma_2^2)}{\\sqrt{2\\pi}\\sigma_1\\sqrt{1-\\rho^2}}\\exp\\left(-\\frac{1}{2\\sigma_1^2(1-\\rho^2)}\\left(x_1-\\mu_1-\\rho\\frac{\\sigma_1}{\\sigma_2}(x_2-\\mu_2)\\right)^2\\right)\\\\\n", "&=N(x_2|\\mu_2,\\sigma_2^2)N\\left(x_1\\Big|\\mu_1+\\rho\\frac{\\sigma_1}{\\sigma_2}(x_2-\\mu_2),\\sigma_1^2(1-\\rho^2)\\right)\\\\\n", "&=p(x_2)N\\left(x_1\\Big|\\mu_1+\\rho\\frac{\\sigma_1}{\\sigma_2}(x_2-\\mu_2),\\sigma_1^2(1-\\rho^2)\\right)\\\\\n", "\\Rightarrow\\quad p(x_1|x_2)&=N\\left(x_1\\Big|\\mu_1+\\rho\\frac{\\sigma_1}{\\sigma_2}(x_2-\\mu_2),\\sigma_1^2(1-\\rho^2)\\right)\n", "\\end{align}\n", "\n", "If $\\sigma_i=1$, then it is simplified to\n", "$$p(x_1|x_2)=N\\left(x_1|\\mu_1+\\rho(x_2-\\mu_2),1-\\rho^2\\right)$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.8 TBD\n", "Try to use python to solve it!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.9 Sensor fusion with known variances in 1d\n", "\\begin{align}\n", "p(\\mu|D)&\\propto \\prod_i N(y_i^{(1)}|\\mu, v_1)\\prod_i N(y_i^{(2)}|\\mu, v_2)\\\\\n", "&\\propto \\prod_i N(\\mu|y_i^{(1)}, v_1)\\prod_i N(\\mu|y_i^{(2)}, v_2)\\\\\n", "&\\propto \\prod_i N_c(\\mu|v_1^{-1}y_i^{(1)}, v_1^{-1})\\prod_i N_c(\\mu|v_2^{-1}y_i^{(2)}, v_2^{-1})\\\\\n", "&\\propto N_c\\left(\\mu\\Big|v_1^{-1}\\sum_i y_i^{(1)}+v_2^{-1}\\sum_i y_i^{(2)}, n_1v_1^{-1}+n_2v_2^{-1}\\right)\\\\\n", "&\\propto N_c\\left(\\mu\\Big|n_1v_1^{-1}\\bar y^{(1)}+n_2v_2^{-1}\\bar y^{(2)}, n_1v_1^{-1}+n_2v_2^{-1}\\right)\\\\\n", "&\\propto N\\left(\\mu\\Big|\\frac{n_1\\bar y^{(1)}/v_1+n_2\\bar y^{(2)}/v_1}{n_1/v_1+n_2/v_2}, \\frac{1}{n_1/v_1+n_2/v_2}\\right)\n", "\\end{align}\n", "\n", "So the mean of $\\mu$ is $\\dfrac{n_1\\bar y^{(1)}/v_1+n_2\\bar y^{(2)}/v_1}{n_1/v_1+n_2/v_2}$, and variance of $\\mu$ is $\\dfrac{1}{n_1/v_1+n_2/v_2}$." ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## 4.10 Information form marginalization formula\n", "$$\\det \\Lambda=\\det \\Sigma^{-1}=(\\det\\Sigma)^{-1}$$\n", "\n", "\\begin{align}\n", "(x-\\mu)^T\\Sigma^{-1}(x-\\mu)&=x^T\\Lambda x-2x^T\\Lambda \\mu+\\mu^T\\Lambda\\mu\\\\\n", "&=x^T\\Lambda x-2x^T\\xi+\\xi^T\\Lambda^{-1}\\xi\n", "\\end{align}\n", "\n", "From $$\\begin{bmatrix}\n", "\\Sigma_{11}& \\Sigma_{12}\\\\\n", "\\Sigma_{21}& \\Sigma_{22}\n", "\\end{bmatrix}\n", "\\cdot\\begin{bmatrix}\n", "\\Lambda_{11}&\\Lambda_{12}\\\\\n", "\\Lambda_{21}&\\Lambda_{22}\n", "\\end{bmatrix}=I,$$\n", "we find \n", "\n", "\\begin{align}\n", "\\Sigma_{11}^{-1}=\\Lambda_{11}-\\Lambda_{12}\\Lambda_{22}^{-1}\\Lambda_{21},\\quad \\Sigma_{12}^{-1}=\\Lambda_{21}-\\Lambda_{22}\\Lambda_{12}^{-1}\\Lambda_{11}\\\\\n", "\\Sigma_{21}^{-1}=\\Lambda_{12}-\\Lambda_{11}\\Lambda_{21}^{-1}\\Lambda_{22},\\quad \\Sigma_{22}^{-1}=\\Lambda_{22}-\\Lambda_{21}\\Lambda_{11}^{-1}\\Lambda_{12}\n", "\\end{align}\n", "\n", "Using the rule $N(\\mu,\\Sigma)\\rightarrow N_c(\\Sigma^{-1}\\mu,\\Sigma^{-1})$\n", "\n", "\\begin{align}\n", "p(x_2)&=N(x_2|\\mu_2,\\Sigma_{22})\\\\\n", "&=N_c(x_2|\\Sigma_{22}^{-1}\\mu_2, \\Sigma_{22}^{-1})\\\\\n", "&=N_c(x_2|(\\Lambda_{22}-\\Lambda_{21}\\Lambda_{11}^{-1}\\Lambda_{12})\\mu_2,\\Sigma_{22}^{-1})\\\\\n", "&=N_c(x_2|(\\Lambda_{22}\\mu_2+\\Lambda_{21}\\mu_1)-\\Lambda_{21}\\Lambda_{11}^{-1}(\\Lambda_{11}\\mu_1+\\Lambda_{12}\\mu_2),\\Sigma_{22}^{-1})\\\\\n", "&=N_c(x_2|\\xi_2-\\Lambda_{21}\\Lambda_{11}^{-1}\\xi_1,\\Lambda_{22}-\\Lambda_{21}\\Lambda_{11}^{-1}\\Lambda_{12})\\\\\n", "p(x_1|x_2)&=N(x_1|\\mu_{1|2}, \\Sigma_{1|2})\\\\\n", "&=N(x_1|\\Lambda_{11}^{-1}(\\Lambda_{11}\\mu_1-\\Lambda_{12}(x_2-\\mu_2)), \\Lambda_{11}^{-1})\\\\\n", "&=N_c(x_1|\\Lambda_{11}\\mu_1-\\Lambda_{12}(x_2-\\mu_2),\\Lambda_{11})\\\\\n", "&=N_c(x_1|(\\Lambda_{11}\\mu_1+\\Lambda_{12}\\mu_2)-\\Lambda_{12}x_2,\\Lambda_{11})\\\\\n", "&=N_c(x_1|\\xi_1-\\Lambda_{12}x_2,\\Lambda_{11})\\\\\n", "\\end{align}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 11,12\n", "## 13\n", "$$p(\\mu|D)\\propto N(\\mu|\\mu_0,9)\\prod_i N(y_i|\\mu, 4)$$\n", "And we find $\\mu\\sim N(\\mu_n,\\sigma_n^2)$, where $\\sigma_n^2=\\dfrac{1}{1/9+n/4}$, so $n=4/\\sigma_n^2-4/9$. We need\n", "$$1.96\\sigma_n=1/2\\quad\\Rightarrow n>62.29$$" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" } }, "nbformat": 4, "nbformat_minor": 2 }