{ "metadata": { "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "View the assignment description at [http://www.cs.ubc.ca/~nando/540-2013/lectures/homework1.pdf](http://www.cs.ubc.ca/~nando/540-2013/lectures/homework1.pdf)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let $\\mathbf{X} \\in \\mathbb{R}^{n \\times d}$ be the matrix of input vectors, $\\mathbf{y} \\in \\mathbb{R}^{n \\times 1}$ be the vector of targets and $\\boldsymbol{\\theta} \\in \\mathbb{R}^{d \\times 1}$ be the vector of weights. Assume that the likelihood is a Gaussian:\n", "$$p\\left(\\mathbf{y} | \\mathbf{X}, \\boldsymbol{\\theta}, \\mathbf\\Sigma \\right) =\n", "\\mathcal{N}(\\mathbf{y} | \\mathbf{X}\\boldsymbol{\\theta}, \\mathbf{\\Sigma}) =\n", "|2\\pi\\mathbf\\Sigma|^{-\\frac{1}{2}} \\exp\\left\\{ -\\frac{1}{2} \\left(\\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta\\right)^T \\mathbf\\Sigma^{-1} \\left(\\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta\\right) \\right\\}$$\n", "where $\\mathbf{\\Sigma} \\in \\mathbb{R}^{n \\times n}$ is the covariance matrix, which we assume is given.\n", "Assume also that the prior for $\\boldsymbol{\\theta}$ is a Gaussian:\n", "$$p\\left(\\boldsymbol{\\theta}\\right) = \\mathcal{N}(\\boldsymbol\\theta | \\mathbf{0}, \\mathbf{\\Delta}) = |2\\pi\\mathbf{\\Delta}|^{-\\frac{1}{2}}\n", "\\exp\\left\\{-\\frac{1}{2} \\boldsymbol{\\theta}^T \\mathbf\\Delta^{-1} \\boldsymbol{\\theta}\\right\\}$$\n", "where $\\mathbf{\\Delta} \\in \\mathbb{R}^{d \\times d}$ is the prior covariance matrix." 
] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Exercise 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then the posterior for $\\boldsymbol{\\theta}$ is:\n", "$$p\\left( \\boldsymbol{\\theta} | \\mathbf{y}, \\mathbf{X}, \\mathbf{\\Sigma} \\right) \\propto\n", "p\\left(\\mathbf{y} | \\mathbf{X}, \\boldsymbol{\\theta}, \\mathbf\\Sigma \\right) p\\left(\\boldsymbol{\\theta}\\right) \\propto\n", "\\exp\\left\\{ -\\frac{1}{2} \\boldsymbol{\\theta}^T \\left(\\mathbf{X}^T\\mathbf{\\Sigma}^{-1}\\mathbf{X} + \\mathbf{\\Delta}^{-1}\\right) \\boldsymbol{\\theta} + \\boldsymbol{\\theta}^T\\mathbf{X}^T\\mathbf{\\Sigma}^{-1}\\mathbf{y} \\right\\}$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now if we want the posterior of $\\boldsymbol\\theta$ to be a Gaussian of the form:\n", "$$\\mathcal{N}\\left( \\boldsymbol\\theta | \\boldsymbol{\\theta}_n, \\mathbf{V}_n \\right) =\n", "|2\\pi\\mathbf{V}_n|^{-\\frac{1}{2}} \\exp\\left\\{ -\\frac{1}{2} \\left(\\boldsymbol\\theta - \\boldsymbol{\\theta}_n\\right)^T \\mathbf{V}_{n}^{-1} \\left(\\boldsymbol\\theta - \\boldsymbol{\\theta}_n\\right) \\right\\} \\propto\n", "\\exp\\left\\{ -\\frac{1}{2} \\boldsymbol{\\theta}^T\\mathbf{V}_{n}^{-1}\\boldsymbol{\\theta} + \\boldsymbol{\\theta}^T\\mathbf{V}_{n}^{-1}\\boldsymbol{\\theta}_n \\right\\}$$\n", "we have to equate:\n", "$$\\mathbf{V}_n^{-1} = \\mathbf{X}^T\\mathbf{\\Sigma}^{-1}\\mathbf{X} + \\mathbf{\\Delta}^{-1}$$\n", "and\n", "$$\\mathbf{V}_n^{-1}\\boldsymbol{\\theta}_n = \\mathbf{X}^T\\mathbf{\\Sigma}^{-1}\\mathbf{y} $$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Thus we can write the posterior for $\\boldsymbol\\theta$ as:\n", "$$p\\left( \\boldsymbol{\\theta} | \\mathbf{y}, \\mathbf{X}, \\mathbf{\\Sigma} \\right) = \n", "\\mathcal{N}\\left( \\boldsymbol\\theta | \\boldsymbol{\\theta}_n, \\mathbf{V}_n \\right)$$\n", "where $\\mathbf{V}_n^{-1} = \\mathbf{X}^T\\mathbf{\\Sigma}^{-1}\\mathbf{X} + 
\\mathbf{\\Delta}^{-1}$ and\n", "$\\boldsymbol{\\theta}_n = \\left( \\mathbf{X}^T\\mathbf{\\Sigma}^{-1}\\mathbf{X} + \\mathbf{\\Delta}^{-1} \\right)^{-1}\n", "\\mathbf{X}^T\\mathbf{\\Sigma}^{-1}\\mathbf{y}$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The ridge estimator for $\\boldsymbol\\theta$ is given by\n", "$$\\hat{\\boldsymbol{\\theta}}_R = \\left( \\mathbf{X}^T\\mathbf{X} + \\delta^2I_d \\right)^{-1} \\mathbf{X}^T\\mathbf{y}$$\n", "and is equal to the posterior mean $\\boldsymbol{\\theta}_n$ when $\\mathbf\\Sigma = \\sigma^2I_n$ (i.e. the observation noise is uncorrelated and has the same variance for every data point) and $\\mathbf{\\Delta}=\\tau^2I_d$ (i.e. the weights are uncorrelated under the prior and have the same variance). In this case the posterior mean is\n", "$$\\boldsymbol{\\theta}_n = \\left( \\frac{1}{\\sigma^2} \\mathbf{X}^T\\mathbf{X} + \\frac{1}{\\tau^2}I_d \\right)^{-1}\n", "\\frac{1}{\\sigma^2}\\mathbf{X}^T\\mathbf{y} =\n", "\\left( \\mathbf{X}^T\\mathbf{X} + \\frac{\\sigma^2}{\\tau^2}I_d \\right)^{-1}\n", "\\mathbf{X}^T\\mathbf{y}$$\n", "which is the ridge estimator with $\\delta^2 = \\sigma^2/\\tau^2$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The maximum likelihood estimator for $\\boldsymbol\\theta$ is given by\n", "$$\\hat{\\boldsymbol{\\theta}}_{ML} = \\left( \\mathbf{X}^T\\mathbf{X} \\right)^{-1} \\mathbf{X}^T\\mathbf{y}$$\n", "and is equal to the posterior mean $\\boldsymbol{\\theta}_n$ when $\\mathbf\\Sigma = \\sigma^2I_n$ (i.e. the observation noise is uncorrelated and has the same variance for every data point) and $\\mathbf{\\Delta}^{-1}=0$ (i.e. 
the variance in the prior tends to infinity). In this case the posterior mean is\n", "$$\\boldsymbol{\\theta}_n = \\left( \\frac{1}{\\sigma^2} \\mathbf{X}^T\\mathbf{X} \\right)^{-1}\n", "\\frac{1}{\\sigma^2}\\mathbf{X}^T\\mathbf{y} =\n", "\\left( \\mathbf{X}^T\\mathbf{X} \\right)^{-1}\n", "\\mathbf{X}^T\\mathbf{y}$$" ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Exercise 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now calculate the maximum likelihood estimate for $\\mathbf{\\Sigma}$. To do so we differentiate the log-likelihood (up to an additive constant) with respect to $\\mathbf{\\Sigma}^{-1}$:\n", "\n", "$$\\frac{\\partial}{\\partial{\\mathbf{\\Sigma}^{-1}}} \\left[ -\\frac{1}{2}\\mathbf{y}^T\\mathbf{\\Sigma}^{-1}\\mathbf{y} +\n", "\\boldsymbol{\\theta}^T\\mathbf{X}^T\\mathbf{\\Sigma}^{-1}\\mathbf{y} -\n", "\\frac{1}{2} \\boldsymbol{\\theta}^T\\mathbf{X}^T\\mathbf{\\Sigma}^{-1}\\mathbf{X}\\boldsymbol{\\theta} +\n", "\\frac{1}{2} \\log{|\\mathbf{\\Sigma}^{-1}|} \\right]$$\n", "$$=-\\frac{1}{2}\\mathbf{y}\\mathbf{y}^T + \\frac{1}{2}\\left( \\mathbf{y}\\boldsymbol{\\theta}^T\\mathbf{X}^T + \\mathbf{X}\\boldsymbol{\\theta}\\mathbf{y}^T \\right) -\n", "\\frac{1}{2} \\mathbf{X}\\boldsymbol{\\theta}\\boldsymbol{\\theta}^T\\mathbf{X}^T + \\frac{1}{2}\\mathbf{\\Sigma}$$\n", "where the cross term is written in symmetric form because $\\mathbf{\\Sigma}^{-1}$ is symmetric." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Setting the derivative of the log-likelihood to zero gives\n", "$$\\mathbf{\\Sigma}_{ML} = \\mathbf{yy}^T - \\mathbf{y}\\boldsymbol{\\theta}^T\\mathbf{X}^T - \\mathbf{X}\\boldsymbol{\\theta}\\mathbf{y}^T + \\mathbf{X}\\boldsymbol{\\theta\\theta}^T\\mathbf{X}^T = \\left( \\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta \\right)\n", "\\left( \\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta \\right)^T$$" ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Exercise 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Assume the covariance matrix is unknown and place an inverse Wishart prior on $\\mathbf{\\Sigma}$, with fixed known parameters $\\alpha$ and $\\mathbf{\\Sigma}^*$:\n", "$$p\\left( \\mathbf{\\Sigma} | \\alpha, \\mathbf{\\Sigma}^* \\right) 
\\propto\n", "|\\mathbf{\\Sigma}|^{-\\left(\\alpha+n+1\\right)/2}\n", "\\exp\\left\\{ -\\frac{1}{2} \\mathrm{trace}\\left( \\mathbf{\\Sigma}^* \\mathbf{\\Sigma}^{-1} \\right) \\right\\}$$\n", "Assume that the likelihood is a Gaussian:\n", "$$p\\left(\\mathbf{y} | \\mathbf{X}, \\boldsymbol{\\theta}, \\mathbf\\Sigma \\right) = \\mathcal{N}(\\mathbf{y} | \\mathbf{X}\\boldsymbol{\\theta}, \\mathbf{\\Sigma}) = |2\\pi\\mathbf\\Sigma|^{-\\frac{1}{2}} \\exp\\left\\{ -\\frac{1}{2} \\left(\\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta\\right)^T \\mathbf\\Sigma^{-1} \\left(\\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta\\right) \\right\\}=\n", "|2\\pi\\mathbf\\Sigma|^{-\\frac{1}{2}} \\exp\\left\\{ -\\frac{1}{2} \\mathrm{trace}\\left( \\left(\\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta\\right)^T \\mathbf\\Sigma^{-1} \\left(\\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta\\right) \\right)\\right\\}$$\n", "and, since $\\mathrm{trace}\\left(\\mathbf{z}^T\\mathbf{Az}\\right) = \\mathrm{trace}\\left(\\mathbf{z}\\mathbf{z}^T\\mathbf{A}\\right)$,\n", "$$p\\left(\\mathbf{y} | \\mathbf{X}, \\boldsymbol{\\theta}, \\mathbf\\Sigma \\right) \\propto\n", "|\\mathbf\\Sigma|^{-\\frac{1}{2}} \n", "\\exp\\left\\{ -\\frac{1}{2} \\mathrm{trace}\\left( \\left(\\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta\\right) \\left(\\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta\\right)^T \\mathbf\\Sigma^{-1} \\right)\\right\\}$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can write the posterior as\n", "$$p\\left( \\mathbf{\\Sigma} | \\mathbf{y}, \\mathbf{X}, \\boldsymbol\\theta, \\alpha, \\mathbf{\\Sigma}^* \\right) \\propto\n", "p\\left(\\mathbf{y} | \\mathbf{X}, \\boldsymbol{\\theta}, \\mathbf\\Sigma \\right)\n", "p\\left( \\mathbf{\\Sigma} | \\alpha, \\mathbf{\\Sigma}^* \\right) \\propto\n", "|\\mathbf\\Sigma|^{-\\left(\\left(\\alpha+1\\right)+n+1\\right)/2} \n", "\\exp\\left\\{ -\\frac{1}{2} \\mathrm{trace}\\left[ \\left( \\left( \\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta \\right)\n", "\\left( 
\\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta \\right)^T + \\mathbf{\\Sigma}^* \\right) \\mathbf{\\Sigma}^{-1} \\right] \\right\\}$$\n", "thus the posterior is an inverse Wishart with parameters $\\alpha + 1$ and\n", "$\\left( \\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta \\right)\n", "\\left( \\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta \\right)^T + \\mathbf{\\Sigma}^*$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If $\\alpha = n+1$ and $\\mathbf{\\Sigma}^* = 0$ then the posterior has parameters $\\alpha + 1 = n + 2$ and $\\left( \\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta \\right)\n", "\\left( \\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta \\right)^T$. The mean of an inverse Wishart distribution over $n \\times n$ matrices with parameters $\\nu$ and $\\mathbf{\\Psi}$ is $\\mathbf{\\Psi}/\\left(\\nu - n - 1\\right)$, thus the expectation of the posterior is\n", "$$ \\mathbb{E}\\left(\\mathbf\\Sigma\\right) = \\frac{\\left( \\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta \\right)\n", "\\left( \\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta \\right)^T}{\\left(n+2\\right) - n - 1} = \\left( \\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta \\right)\n", "\\left( \\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta \\right)^T$$\n", "which is the maximum likelihood estimate $\\mathbf{\\Sigma}_{ML}$." ] } ], "metadata": {} } ] }