{
"metadata": {
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"View the assignment description at [http://www.cs.ubc.ca/~nando/540-2013/lectures/homework1.pdf](http://www.cs.ubc.ca/~nando/540-2013/lectures/homework1.pdf)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let $\\mathbf{X} \\in \\mathbb{R}^{n \\times d}$ be the matrix of input vectors, $\\mathbf{y} \\in \\mathbb{R}^{n \\times 1}$ be the vector of targets and $\\boldsymbol{\\theta} \\in \\mathbb{R}^{n \\times 1}$ be the vector of weights. Assume that the likelihood is a Gaussian:\n",
"$$p\\left(\\mathbf{y} | \\mathbf{X}, \\boldsymbol{\\theta}, \\mathbf\\Sigma \\right) =\n",
"\\mathcal{N}(\\mathbf{y} | \\mathbf{X}\\boldsymbol{\\theta}, \\mathbf{\\Sigma}) =\n",
"|2\\pi\\mathbf\\Sigma|^{-\\frac{1}{2}} \\exp\\left\\{ -\\frac{1}{2} \\left(\\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta\\right)^T \\mathbf\\Sigma^{-1} \\left(\\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta\\right) \\right\\}$$\n",
"where $\\mathbf{\\Sigma} \\in \\mathbb{R}^{n \\times n}$ is the covariance matrix and we assume is given.\n",
"Assume also that the prior for $\\boldsymbol{\\theta}$ is a Gaussian:\n",
"$$p\\left(\\boldsymbol{\\theta}\\right) = \\mathcal{N}(\\boldsymbol\\theta | \\mathbf{0}, \\mathbf{\\Delta}) = |2\\pi\\mathbf{\\Delta}|^{-\\frac{1}{2}}\n",
"\\exp\\left\\{-\\frac{1}{2} \\boldsymbol{\\theta}^T \\mathbf\\Delta^{-1} \\boldsymbol{\\theta}\\right\\}$$\n",
"where $\\mathbf{\\Delta} \\in \\mathbb{R}^{d \\times d}$ is the covariance matrix."
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Exercise 1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then the posterior for $\\boldsymbol{\\theta}$ is:\n",
"$$p\\left( \\boldsymbol{\\theta} | \\mathbf{y}, \\mathbf{X}, \\mathbf{\\Sigma} \\right) \\propto\n",
"p\\left(\\mathbf{y} | \\mathbf{X}, \\boldsymbol{\\theta}, \\mathbf\\Sigma \\right) p\\left(\\boldsymbol{\\theta}\\right) \\propto\n",
"\\exp\\left\\{ -\\frac{1}{2} \\boldsymbol{\\theta}^T \\left(\\mathbf{X}^T\\mathbf{\\Sigma}^{-1}\\mathbf{X} + \\mathbf{\\Delta}^{-1}\\right) \\boldsymbol{\\theta} + \\boldsymbol{\\theta}^T\\mathbf{X}^T\\mathbf{\\Sigma}^{-1}\\mathbf{y} \\right\\}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now if we want the posterior of $\\boldsymbol\\theta$ to be a Gaussian of the form:\n",
"$$\\mathcal{N}\\left( \\boldsymbol\\theta | \\boldsymbol{\\theta}_n, \\mathbf{V}_n \\right) =\n",
"|2\\pi\\mathbf{V}_n|^{-\\frac{1}{2}} \\exp\\left\\{ -\\frac{1}{2} \\left(\\boldsymbol\\theta - \\boldsymbol{\\theta}_n\\right)^T \\mathbf{V}_{n}^{-1} \\left(\\boldsymbol\\theta - \\boldsymbol{\\theta}_n\\right) \\right\\} \\propto\n",
"\\exp\\left\\{ -\\frac{1}{2} \\boldsymbol{\\theta}^T\\mathbf{V}_{n}^{-1}\\boldsymbol{\\theta} + \\boldsymbol{\\theta}^T\\mathbf{V}_{n}^{-1}\\boldsymbol{\\theta}_n \\right\\}$$\n",
"we have to equate:\n",
"$$\\mathbf{V}_n^{-1} = \\mathbf{X}^T\\mathbf{\\Sigma}^{-1}\\mathbf{X} + \\mathbf{\\Delta}^{-1}$$\n",
"and\n",
"$$\\mathbf{V}_n^{-1}\\boldsymbol{\\theta}_n = \\mathbf{X}^T\\mathbf{\\Sigma}^{-1}\\mathbf{y} $$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Thus we can write the posterior for $\\boldsymbol\\theta$ as:\n",
"$$p\\left( \\boldsymbol{\\theta} | \\mathbf{y}, \\mathbf{X}, \\mathbf{\\Sigma} \\right) = \n",
"\\mathcal{N}\\left( \\boldsymbol\\theta | \\boldsymbol{\\theta}_n, \\mathbf{V}_n \\right)$$\n",
"where $\\mathbf{V}_n^{-1} = \\mathbf{X}^T\\mathbf{\\Sigma}^{-1}\\mathbf{X} + \\mathbf{\\Delta}^{-1}$ and\n",
"$\\boldsymbol{\\theta}_n = \\left( \\mathbf{X}^T\\mathbf{\\Sigma}^{-1}\\mathbf{X} + \\mathbf{\\Delta}^{-1} \\right)^{-1}\n",
"\\mathbf{X}^T\\mathbf{\\Sigma}^{-1}\\mathbf{y}$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The ridge estimator for $\\boldsymbol\\theta$ is given by\n",
"$$\\hat{\\boldsymbol{\\theta}}_R = \\left( \\mathbf{X}^T\\mathbf{X} + \\delta^2I_d \\right)^{-1} \\mathbf{X}^T\\mathbf{y}$$\n",
"and is equal to the posterior mean $\\boldsymbol{\\theta}_n$ when $\\mathbf\\Sigma = \\sigma^2I_n$ (i.e. the elements in the dataset are uncorrelated and have the same variance) and $\\mathbf{\\Delta}=\\tau^2I_d$ (i.e. the elements of the prior are uncorrelated and have the same variance), in fact in this case the posterior mean is equal to\n",
"$$\\boldsymbol{\\theta}_n = \\left( \\frac{1}{\\sigma^2} \\mathbf{X}^T\\mathbf{X} + \\frac{1}{\\tau^2}I_d \\right)^{-1}\n",
"\\frac{1}{\\sigma^2}\\mathbf{X}^T\\mathbf{y} =\n",
"\\left( \\mathbf{X}^T\\mathbf{X} + \\frac{\\sigma^2}{\\tau^2}I_d \\right)^{-1}\n",
"\\mathbf{X}^T\\mathbf{y}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The MLE estimator for $\\boldsymbol\\theta$ is given by\n",
"$$\\hat{\\boldsymbol{\\theta}}_{ML} = \\left( \\mathbf{X}^T\\mathbf{X} \\right)^{-1} \\mathbf{X}^T\\mathbf{y}$$\n",
"and is equal to the posterior mean $\\boldsymbol{\\theta}_n$ when $\\mathbf\\Sigma = \\sigma^2I_n$ (i.e. the elements in the dataset are uncorrelated and have the same variance) and $\\mathbf{\\Delta}^{-1}=0$ (i.e. the variance in the prior tends to infinite), in fact in this case the posterior mean is equal to\n",
"$$\\boldsymbol{\\theta}_n = \\left( \\frac{1}{\\sigma^2} \\mathbf{X}^T\\mathbf{X} \\right)^{-1}\n",
"\\frac{1}{\\sigma^2}\\mathbf{X}^T\\mathbf{y} =\n",
"\\left( \\mathbf{X}^T\\mathbf{X} \\right)^{-1}\n",
"\\mathbf{X}^T\\mathbf{y}$$"
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Exercise 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We now calculate the maximum likelihood estimate for $\\mathbf{\\Sigma}$. To do so we calculate the derivative of the log-likelihood\n",
"\n",
"$$\\frac{\\partial}{\\partial{\\mathbf{\\Sigma}^{-1}}} -\\frac{1}{2}\\mathbf{y}^T\\mathbf{\\Sigma}^{-1}\\mathbf{y} +\n",
"\\boldsymbol{\\theta}^T\\mathbf{X}^T\\mathbf{\\Sigma}^{-1}\\mathbf{y} -\n",
"\\frac{1}{2} \\boldsymbol{\\theta}^T\\mathbf{X}^T\\mathbf{\\Sigma}^{-1}\\mathbf{X}\\boldsymbol{\\theta} +\n",
"\\frac{1}{2} \\log{|\\mathbf{\\Sigma}^{-1}|}$$\n",
"$$=-\\frac{1}{2}\\mathbf{y}\\mathbf{y}^T + \\mathbf{y}\\boldsymbol{\\theta}^T\\mathbf{X}^T -\n",
"\\frac{1}{2} \\mathbf{X}\\boldsymbol{\\theta}\\boldsymbol{\\theta}^T\\mathbf{X}^T + \\frac{1}{2}\\mathbf{\\Sigma}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then we set the derivative log-likelihood to zero and we get\n",
"$$\\mathbf{\\Sigma}_{ML} = \\mathbf{yy}^T + \\mathbf{X}\\boldsymbol{\\theta\\theta}^T\\mathbf{X}^T - 2\\mathbf{y}\\boldsymbol{\\theta}^T\\mathbf{X}^T = \\left( \\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta \\right)\n",
"\\left( \\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta \\right)^T$$"
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Exercise 3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Assume the covariance matrix is unkown and is given an inverse Wishart prior on $\\mathbf{\\Sigma}$, with fixed know parameters $\\alpha$ and $\\mathbf{\\Sigma}^*$:\n",
"$$p\\left( \\mathbf{\\Sigma} | \\alpha, \\mathbf{\\Sigma}^* \\right) \\propto\n",
"|\\mathbf{\\Sigma}|^{-\\left(\\alpha+n+1\\right)/2} + \n",
"\\exp\\left\\{ -\\frac{1}{2} \\mathrm{trace}\\left( \\mathbf{\\Sigma}^* \\mathbf{\\Sigma}^{-1} \\right) \\right\\}$$\n",
"Assume that the likelihood is a Gaussian:\n",
"$$p\\left(\\mathbf{y} | \\mathbf{X}, \\boldsymbol{\\theta}, \\mathbf\\Sigma \\right) = \\mathcal{N}(\\mathbf{y} | \\mathbf{X}\\boldsymbol{\\theta}, \\mathbf{\\Sigma}) = |2\\pi\\mathbf\\Sigma|^{-\\frac{1}{2}} \\exp\\left\\{ -\\frac{1}{2} \\left(\\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta\\right)^T \\mathbf\\Sigma^{-1} \\left(\\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta\\right) \\right\\}=\n",
"|2\\pi\\mathbf\\Sigma|^{-\\frac{1}{2}} \\exp\\left\\{ -\\frac{1}{2} \\mathrm{trace}\\left( \\left(\\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta\\right)^T \\mathbf\\Sigma^{-1} \\left(\\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta\\right) \\right)\\right\\}$$\n",
"and since $\\mathrm{trace}\\left(\\mathbf{z}^T\\mathbf{Az}\\right) = \\mathrm{trace}\\left(\\mathbf{z}\\mathbf{z}^T\\mathbf{A}\\right)$\n",
"$$p\\left(\\mathbf{y} | \\mathbf{X}, \\boldsymbol{\\theta}, \\mathbf\\Sigma \\right) \\propto\n",
"|\\mathbf\\Sigma|^{-\\frac{1}{2}} \n",
"\\exp\\left\\{ -\\frac{1}{2} \\mathrm{trace}\\left( \\left(\\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta\\right) \\left(\\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta\\right)^T \\mathbf\\Sigma^{-1} \\right)\\right\\}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can write the posterior as\n",
"$$p\\left( \\mathbf{\\Sigma} | \\mathbf{y}, \\mathbf{X}, \\boldsymbol\\theta, \\alpha, \\mathbf{\\Sigma}^* \\right) \\propto\n",
"p\\left(\\mathbf{y} | \\mathbf{X}, \\boldsymbol{\\theta}, \\mathbf\\Sigma \\right)\n",
"p\\left( \\mathbf{\\Sigma} | \\alpha, \\mathbf{\\Sigma}^* \\right) \\propto\n",
"|\\mathbf\\Sigma|^{-\\left(\\left(\\alpha+1\\right)+n+1\\right)/2} \n",
"\\exp\\left\\{ -\\frac{1}{2} \\mathrm{trace}\\left[ \\left( \\left( \\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta \\right)\n",
"\\left( \\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta \\right)^T + \\mathbf{\\Sigma}^* \\right) \\mathbf{\\Sigma}^{-1} \\right] \\right\\}$$\n",
"thus we can write the posterior as an inverse Wishart with parameters $\\alpha + 1$ and\n",
"$\\left( \\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta \\right)\n",
"\\left( \\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta \\right)^T + \\mathbf{\\Sigma}^*$."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If $\\alpha = n+1$ and $\\mathbf{\\Sigma}^* = 0$ then the posterior has parameters $\\alpha +2$ and $\\left( \\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta \\right)\n",
"\\left( \\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta \\right)^T$, thus the expectation of the distribution is\n",
"$$ \\mathbb{E}\\left(\\mathbf\\Sigma\\right) = \\left( \\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta \\right)\n",
"\\left( \\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta \\right)^T$$\n",
"which is the maximum likelihood estimate $\\mathbf{\\Sigma}_{ML}$."
]
}
],
"metadata": {}
}
]
}