{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Notes for Chapter 4. Gaussian Models\n", "+ date: 2017-03-20\n", "+ tags: mlapp, statistics, Calculus of Variations\n", "+ author: Peijun Zhu\n", "+ slug: mlapp-ch4-notes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "$\\def\\bra#1{\\mathinner{\\langle{#1}|}}\n", "\\def\\ket#1{\\mathinner{|{#1}\\rangle}}\n", "\\def\\braket#1{\\mathinner{\\langle{#1}\\rangle}}\n", "\\newcommand{\\mdef}{\\overset{\\mathrm{def}}{=}}\n", "\\newcommand{\\bm}{\\mathbf}\n", "\\newcommand{\\pfr}[2]{\\frac{\\pp #1}{\\pp #2}} % Partial derivative\n", "\\newcommand{\\inv}[1]{#1^{-1}} % Inverse Matrix\n", "\\newcommand{\\invt}[1]{#1^{-T}} % Inverse Transposed Matrix\n", "\\DeclareMathOperator{\\Var}{Var} \n", "\\DeclareMathOperator{\\det}{det} \n", "\\DeclareMathOperator{\\tr}{tr} \n", "\\DeclareMathOperator{\\E}{E} \n", "\\DeclareMathOperator{\\Cov}{Cov} \n", "\\DeclareMathOperator{\\Beta}{B} \n", "\\DeclareMathOperator{\\Bdist}{Beta}\n", "\\DeclareMathOperator{\\sgn}{sgn}\n", "\\DeclareMathOperator{\\adj}{adj}\n", "\\DeclareMathOperator{\\ii}{i} \n", "\\DeclareMathOperator{\\dd}{d} \n", "\\newcommand{\\pp}{\\partial} \n", "\\DeclareMathOperator{\\rhs}{RHS}\n", "\\DeclareMathOperator{\\lhs}{LHS}\n", "\\DeclareMathOperator{\\Beta}{B}\n", "\\newcommand{\\nl}{\\\\&\\phantom{={}}}\n", "$\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Matrix derivative\n", "We are using Einstein's summation rule below: sum over repeating index. It is useful in evaluating matrix/tensor.\n", "\n", "\\begin{align}\n", "\\delta |A|&=|A+\\delta A|-|A|\\\\\n", "&=|D+\\inv U\\delta A\\inv V|-|A|\\\\\n", "&=|D||I+\\inv A\\delta A|-|A|\\\\\n", "&=|A|\\tr(\\inv A\\delta A)\n", "\\end{align}\n", "\n", "Using\n", "\n", "\\begin{align}\n", "\\pfr{|A|}{A}&=e_ie_j^T\\pfr{|A|}{A_{ij}}\\\\\n", "&=e_ie_j^T|A|\\tr(\\inv A\\pfr{A}{A_{ij}})\\\\\n", "&=e_ie_j^T|A|\\tr(\\inv Ae_ie_j)\\\\\n", "&=e_ie_j^T|A|\\inv A_{ji}\\\\\n", "&=|A|\\invt A\n", "\\end{align}\n", "\n", "$$\\pfr{A_{ij}B_{ji}}{A}=e_me_n^T\\pfr{A_{ij}B_{ji}}{A_{mn}}=B_{nm}e_me_n^T=B^T$$\n", "\n", "$$\\pfr{a^TAa}{a}=e_m\\pfr{a_iA_{ij}a_j}{a_m}=e_m(A_{mj}a_j+a_i A_{im})=(A+A^T)a$$\n", "\n", "$$\\pfr{a^TAa}{A}=e_me_n^T\\pfr{a_iA_{ij}a_j}{A_{mn}}=aa^T$$\n", "\n", "$$\\pfr{\\tr AB}{A}=e_me_n^T\\pfr{A_{ij}B_{ji}}{A_{mn}}=e_me_n^TB_{nm}=B^T$$\n", "\n", "Using these relationships, it is straightforward to MLE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sigmoid model\n", "$$\\exp\\left[\\frac{(x-\\mu)^2}{2\\sigma^2}\\right]=\\exp\\left(\\frac{x^2+\\mu^2-2\\mu x}{2\\sigma^2}\\right)=Af(x)\\exp(\\mu x/\\sigma^2)$$\n", "$$\\frac{A\\exp(ax)}{A\\exp(ax)+B\\exp(bx)}=\\frac{1}{1+\\exp(Dx+C)}$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Maximal Entropy of Gaussian Dist from Variation\n", "Consider entropy\n", "\n", "\\begin{align}\n", "S&=-\\int p\\ln p \\dd V-a\\left(\\int p\\dd V-1\\right)-\\sum_i b_i\\left(\\int x_ip\\dd V\\right)\\nl-\\sum_{i,j} C_{ij}\\left(\\int x_ix_jp\\dd V-\\Sigma_{ij}\\right)\\\\\n", "&=-\\int (\\ln p+a+b^Tx+x^TCx)p \\dd V+a+\\tr(C\\Sigma)\\\\\n", "&=-\\int L(p,x)\\dd x+a+c\n", "\\end{align}\n", "\n", "The Euler-Lagrange equation is\n", "\n", "\\begin{align}\n", "\\pfr{L}{p}&=1+a+b^Tx+x^TCx+\\ln p=0\\\\\n", "\\Rightarrow p(x)&=\\exp(-1-a-b^Tx-x^TCx)\n", "\\end{align}\n", "\n", "It is a Gaussian distribution, with constraints:\n", "\n", "\\begin{align}\n", "\\int p\\dd V&=1\\\\\n", "\\int x_ip\\dd V&=0\\\\\n", "\\int x_ix_jp\\dd V&=\\Sigma_{ij}\\\\\n", "\\end{align}\n", "\n", "and we can determine $a,b,c$ from constraints. Multivariable case is harder, but the method is the same: using multiplier and EL equation. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Linear Gaussian system\n", "\\begin{align}\n", "p(x)&=N(x|\\mu_x, \\Sigma_x)\\\\\n", "p(y|x)&=N(y|Ax+b,\\Sigma_y)\n", "\\end{align}\n", "As $y=Ax+b$, we find $x=A^{-1}(y-b)$, so $$(y-Ax-b)^T\\Sigma_y(y-Ax-b)=[x-A^{-1}(y+b)]^TA^T\\Sigma_y^{-1}A[x-A^{-1}(y+b)]$$\n", "Thus\n", "\n", "\\begin{align}\n", "N(y|Ax+b,\\Sigma_y)&=N\\left(x|A^{-1}(y-b), A^{-1}\\Sigma_yA^{-T}\\right)\\\\\n", "&=N_c(x|A^T\\Sigma_y^{-1}AA^{-1}(y-b), A^T\\Sigma_y^{-1}A)\\\\\n", "&=N_c(x|A^T\\Sigma_y^{-1}(y-b), A^T\\Sigma_y^{-1}A)\\\\\n", "&=N_c(y|\\Sigma_y^{-1}(Ax+b), \\Sigma_y^{-1})\n", "\\end{align}\n", "\n", "\\begin{align}\n", "p(x,y)&= N(x|\\mu_x, \\Sigma_x)N(y|Ax+b,\\Sigma_y)\\\\\n", "&=N_c(x|\\Sigma_x^{-1}\\mu_x, \\Sigma_x^{-1})N_c(x|A^T\\Sigma_y^{-1}(y-b), A^T\\Sigma_y^{-1}A)\\\\\n", "&=p(y)N_c(x|\\Sigma_x^{-1}\\mu_x+A^T\\Sigma_y^{-1}(y-b), \\Sigma_x^{-1}+A^T\\Sigma_y^{-1}A)\\\\\n", "&=p(y)N(x|\\mu_{x|y},\\Sigma_{x|y})\\\\\n", "&=p(y)p(x|y)\\\\\n", "\\end{align}\n", "Where\n", "\n", "\\begin{align}\n", "\\Sigma_{x|y}^{-1}&=\\Sigma_x^{-1}+A^T\\Sigma_y^{-1}A\\\\\n", "\\mu_{x|y}&=\\Sigma_{x|y}[\\Sigma_x^{-1}\\mu_x+A^T\\Sigma_y^{-1}(y-b)]\\end{align}\n", "How to calculate $p(y)$?\n", "### Attempt\n", "We can rewrite \n", "$$N_c(x|A^T\\Sigma_y^{-1}y+(A^T\\Sigma_y^{-1}b+\\Sigma_x^{-1}\\mu_x), \\Sigma_x^{-1}+A^T\\Sigma_y^{-1}A)=N_c(y|Cx+d, xx)$$\n", "From $p(x,y)=p(y)p(x|y)$, we find $\\Sigma^{-1}=\\Sigma_x^{-1}+A^T\\Sigma_y^{-1}A-\\Sigma_{x|y}^{-1}$" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" } }, "nbformat": 4, "nbformat_minor": 2 }