{
 "metadata": {
  "name": ""
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "View the assignment description at [http://www.cs.ubc.ca/~nando/540-2013/lectures/homework1.pdf](http://www.cs.ubc.ca/~nando/540-2013/lectures/homework1.pdf)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let $\\mathbf{X} \\in \\mathbb{R}^{n \\times d}$ be the matrix of input vectors, $\\mathbf{y} \\in \\mathbb{R}^{n \\times 1}$ be the vector of targets and $\\boldsymbol{\\theta} \\in \\mathbb{R}^{d \\times 1}$ be the vector of weights. Assume that the <font color='red'>likelihood</font> is a Gaussian:\n",
      "$$p\\left(\\mathbf{y} | \\mathbf{X}, \\boldsymbol{\\theta}, \\mathbf\\Sigma \\right) =\n",
      "\\mathcal{N}(\\mathbf{y} | \\mathbf{X}\\boldsymbol{\\theta}, \\mathbf{\\Sigma}) =\n",
      "|2\\pi\\mathbf\\Sigma|^{-\\frac{1}{2}} \\exp\\left\\{ -\\frac{1}{2} \\left(\\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta\\right)^T \\mathbf\\Sigma^{-1} \\left(\\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta\\right) \\right\\}$$\n",
      "where $\\mathbf{\\Sigma} \\in \\mathbb{R}^{n \\times n}$ is the covariance matrix, which we assume is given.\n",
      "Assume also that the <font color='red'>prior</font> for $\\boldsymbol{\\theta}$ is a Gaussian:\n",
      "$$p\\left(\\boldsymbol{\\theta}\\right) = \\mathcal{N}(\\boldsymbol\\theta | \\mathbf{0}, \\mathbf{\\Delta}) = |2\\pi\\mathbf{\\Delta}|^{-\\frac{1}{2}}\n",
      "\\exp\\left\\{-\\frac{1}{2} \\boldsymbol{\\theta}^T \\mathbf\\Delta^{-1} \\boldsymbol{\\theta}\\right\\}$$\n",
      "where $\\mathbf{\\Delta} \\in \\mathbb{R}^{d \\times d}$ is the covariance matrix."
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Exercise 1"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Then the posterior for $\\boldsymbol{\\theta}$ is:\n",
      "$$p\\left( \\boldsymbol{\\theta} | \\mathbf{y}, \\mathbf{X}, \\mathbf{\\Sigma} \\right) \\propto\n",
      "p\\left(\\mathbf{y} | \\mathbf{X}, \\boldsymbol{\\theta}, \\mathbf\\Sigma \\right) p\\left(\\boldsymbol{\\theta}\\right) \\propto\n",
      "\\exp\\left\\{ -\\frac{1}{2} \\boldsymbol{\\theta}^T \\left(\\mathbf{X}^T\\mathbf{\\Sigma}^{-1}\\mathbf{X} + \\mathbf{\\Delta}^{-1}\\right) \\boldsymbol{\\theta} + \\boldsymbol{\\theta}^T\\mathbf{X}^T\\mathbf{\\Sigma}^{-1}\\mathbf{y} \\right\\}$$"
     ]
    },
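    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "As a sketch of the intermediate step, the exponent above can be obtained by adding the exponents of the likelihood and the prior and expanding the quadratic form. This assumes $\\mathbf{\\Sigma}$ is symmetric, so that the two cross terms coincide:\n",
      "$$-\\frac{1}{2} \\left(\\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta\\right)^T \\mathbf\\Sigma^{-1} \\left(\\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta\\right) - \\frac{1}{2} \\boldsymbol{\\theta}^T \\mathbf\\Delta^{-1} \\boldsymbol{\\theta} =\n",
      "-\\frac{1}{2}\\mathbf{y}^T\\mathbf{\\Sigma}^{-1}\\mathbf{y} + \\boldsymbol{\\theta}^T\\mathbf{X}^T\\mathbf{\\Sigma}^{-1}\\mathbf{y} - \\frac{1}{2} \\boldsymbol{\\theta}^T\\mathbf{X}^T\\mathbf{\\Sigma}^{-1}\\mathbf{X}\\boldsymbol{\\theta} - \\frac{1}{2} \\boldsymbol{\\theta}^T \\mathbf\\Delta^{-1} \\boldsymbol{\\theta}$$\n",
      "The term $-\\frac{1}{2}\\mathbf{y}^T\\mathbf{\\Sigma}^{-1}\\mathbf{y}$ does not depend on $\\boldsymbol\\theta$ and is absorbed into the proportionality constant; collecting the two quadratic terms in $\\boldsymbol\\theta$ gives the expression in the posterior."
     ]
    },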
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now if we want the posterior of $\\boldsymbol\\theta$ to be a Gaussian of the form:\n",
      "$$\\mathcal{N}\\left( \\boldsymbol\\theta | \\boldsymbol{\\theta}_n, \\mathbf{V}_n \\right) =\n",
      "|2\\pi\\mathbf{V}_n|^{-\\frac{1}{2}} \\exp\\left\\{ -\\frac{1}{2} \\left(\\boldsymbol\\theta - \\boldsymbol{\\theta}_n\\right)^T \\mathbf{V}_{n}^{-1} \\left(\\boldsymbol\\theta - \\boldsymbol{\\theta}_n\\right) \\right\\} \\propto\n",
      "\\exp\\left\\{ -\\frac{1}{2} \\boldsymbol{\\theta}^T\\mathbf{V}_{n}^{-1}\\boldsymbol{\\theta} + \\boldsymbol{\\theta}^T\\mathbf{V}_{n}^{-1}\\boldsymbol{\\theta}_n \\right\\}$$\n",
      "we have to equate:\n",
      "$$\\mathbf{V}_n^{-1} = \\mathbf{X}^T\\mathbf{\\Sigma}^{-1}\\mathbf{X} + \\mathbf{\\Delta}^{-1}$$\n",
      "and\n",
      "$$\\mathbf{V}_n^{-1}\\boldsymbol{\\theta}_n = \\mathbf{X}^T\\mathbf{\\Sigma}^{-1}\\mathbf{y} $$"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Thus we can write the <font color='red'>posterior</font> for $\\boldsymbol\\theta$ as:\n",
      "$$p\\left( \\boldsymbol{\\theta} | \\mathbf{y}, \\mathbf{X}, \\mathbf{\\Sigma} \\right) = \n",
      "\\mathcal{N}\\left( \\boldsymbol\\theta | \\boldsymbol{\\theta}_n, \\mathbf{V}_n \\right)$$\n",
      "where $\\mathbf{V}_n^{-1} = \\mathbf{X}^T\\mathbf{\\Sigma}^{-1}\\mathbf{X} + \\mathbf{\\Delta}^{-1}$ and\n",
      "$\\boldsymbol{\\theta}_n = \\left( \\mathbf{X}^T\\mathbf{\\Sigma}^{-1}\\mathbf{X} + \\mathbf{\\Delta}^{-1} \\right)^{-1}\n",
      "\\mathbf{X}^T\\mathbf{\\Sigma}^{-1}\\mathbf{y}$"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The ridge estimator for $\\boldsymbol\\theta$ is given by\n",
      "$$\\hat{\\boldsymbol{\\theta}}_R = \\left( \\mathbf{X}^T\\mathbf{X} + \\delta^2I_d \\right)^{-1} \\mathbf{X}^T\\mathbf{y}$$\n",
      "and is equal to the posterior mean $\\boldsymbol{\\theta}_n$ when $\\mathbf\\Sigma = \\sigma^2I_n$ (i.e. the elements in the dataset are uncorrelated and have the same variance), $\\mathbf{\\Delta}=\\tau^2I_d$ (i.e. the elements of the prior are uncorrelated and have the same variance) and $\\delta^2 = \\sigma^2/\\tau^2$. In this case the posterior mean is\n",
      "$$\\boldsymbol{\\theta}_n = \\left( \\frac{1}{\\sigma^2} \\mathbf{X}^T\\mathbf{X} + \\frac{1}{\\tau^2}I_d \\right)^{-1}\n",
      "\\frac{1}{\\sigma^2}\\mathbf{X}^T\\mathbf{y} =\n",
      "\\left( \\mathbf{X}^T\\mathbf{X} + \\frac{\\sigma^2}{\\tau^2}I_d \\right)^{-1}\n",
      "\\mathbf{X}^T\\mathbf{y}$$"
     ]
    },
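    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "One way to see why the ratio $\\sigma^2/\\tau^2$ plays the role of the ridge penalty is to write out the negative log-posterior for this special case. The following is only a sketch, and the symbol $J(\\boldsymbol\\theta)$ is introduced here for illustration:\n",
      "$$J(\\boldsymbol\\theta) = \\frac{1}{2\\sigma^2} \\left(\\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta\\right)^T \\left(\\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta\\right) + \\frac{1}{2\\tau^2} \\boldsymbol{\\theta}^T\\boldsymbol{\\theta} + \\mathrm{const}$$\n",
      "Multiplying by $2\\sigma^2$ does not change the minimizer, so the posterior mode (which for a Gaussian coincides with the posterior mean) minimizes\n",
      "$$\\left(\\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta\\right)^T \\left(\\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta\\right) + \\frac{\\sigma^2}{\\tau^2} \\boldsymbol{\\theta}^T\\boldsymbol{\\theta}$$\n",
      "which is the ridge objective with penalty $\\sigma^2/\\tau^2$."
     ]
    },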
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The maximum likelihood estimator (MLE) for $\\boldsymbol\\theta$ is given by\n",
      "$$\\hat{\\boldsymbol{\\theta}}_{ML} = \\left( \\mathbf{X}^T\\mathbf{X} \\right)^{-1} \\mathbf{X}^T\\mathbf{y}$$\n",
      "and is equal to the posterior mean $\\boldsymbol{\\theta}_n$ when $\\mathbf\\Sigma = \\sigma^2I_n$ (i.e. the elements in the dataset are uncorrelated and have the same variance) and $\\mathbf{\\Delta}^{-1}=0$ (i.e. the variance of the prior tends to infinity). In this case the posterior mean is\n",
      "$$\\boldsymbol{\\theta}_n = \\left( \\frac{1}{\\sigma^2} \\mathbf{X}^T\\mathbf{X} \\right)^{-1}\n",
      "\\frac{1}{\\sigma^2}\\mathbf{X}^T\\mathbf{y} =\n",
      "\\left( \\mathbf{X}^T\\mathbf{X} \\right)^{-1}\n",
      "\\mathbf{X}^T\\mathbf{y}$$"
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Exercise 2"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We now calculate the maximum likelihood estimate for $\\mathbf{\\Sigma}$. To do so we differentiate the log-likelihood (up to an additive constant) with respect to $\\mathbf{\\Sigma}^{-1}$, using the symmetry of $\\mathbf{\\Sigma}$:\n",
      "\n",
      "$$\\frac{\\partial}{\\partial{\\mathbf{\\Sigma}^{-1}}} \\left[ -\\frac{1}{2}\\mathbf{y}^T\\mathbf{\\Sigma}^{-1}\\mathbf{y} +\n",
      "\\boldsymbol{\\theta}^T\\mathbf{X}^T\\mathbf{\\Sigma}^{-1}\\mathbf{y} -\n",
      "\\frac{1}{2} \\boldsymbol{\\theta}^T\\mathbf{X}^T\\mathbf{\\Sigma}^{-1}\\mathbf{X}\\boldsymbol{\\theta} +\n",
      "\\frac{1}{2} \\log{|\\mathbf{\\Sigma}^{-1}|} \\right]$$\n",
      "$$=-\\frac{1}{2}\\mathbf{y}\\mathbf{y}^T + \\frac{1}{2}\\left( \\mathbf{y}\\boldsymbol{\\theta}^T\\mathbf{X}^T + \\mathbf{X}\\boldsymbol{\\theta}\\mathbf{y}^T \\right) -\n",
      "\\frac{1}{2} \\mathbf{X}\\boldsymbol{\\theta}\\boldsymbol{\\theta}^T\\mathbf{X}^T + \\frac{1}{2}\\mathbf{\\Sigma}$$"
     ]
    },
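    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The derivative above relies on two standard matrix-calculus identities, recalled here as a reminder. The generic matrix $\\mathbf{A}$ and vectors $\\mathbf{a}$, $\\mathbf{b}$ are symbols introduced only for illustration:\n",
      "$$\\frac{\\partial}{\\partial\\mathbf{A}} \\mathbf{a}^T\\mathbf{A}\\mathbf{b} = \\mathbf{a}\\mathbf{b}^T, \\qquad\n",
      "\\frac{\\partial}{\\partial\\mathbf{A}} \\log{|\\mathbf{A}|} = \\left(\\mathbf{A}^{-1}\\right)^T$$\n",
      "With $\\mathbf{A} = \\mathbf{\\Sigma}^{-1}$, the second identity gives $\\mathbf{\\Sigma}^T = \\mathbf{\\Sigma}$. For the cross term, $\\mathbf{a} = \\mathbf{X}\\boldsymbol\\theta$ and $\\mathbf{b} = \\mathbf{y}$; because $\\mathbf{\\Sigma}^{-1}$ is symmetric, $\\mathbf{a}^T\\mathbf{\\Sigma}^{-1}\\mathbf{b} = \\mathbf{b}^T\\mathbf{\\Sigma}^{-1}\\mathbf{a}$, so the roles of $\\mathbf{a}$ and $\\mathbf{b}$ can be exchanged, and the symmetric combination $\\frac{1}{2}\\left(\\mathbf{a}\\mathbf{b}^T + \\mathbf{b}\\mathbf{a}^T\\right)$ is the form consistent with a symmetric $\\mathbf{\\Sigma}$."
     ]
    },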
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Then we set the derivative of the log-likelihood to zero and obtain\n",
      "$$\\mathbf{\\Sigma}_{ML} = \\mathbf{yy}^T - \\mathbf{y}\\boldsymbol{\\theta}^T\\mathbf{X}^T - \\mathbf{X}\\boldsymbol{\\theta}\\mathbf{y}^T + \\mathbf{X}\\boldsymbol{\\theta\\theta}^T\\mathbf{X}^T = \\left( \\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta \\right)\n",
      "\\left( \\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta \\right)^T$$"
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Exercise 3"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Assume the covariance matrix is unknown and is given an <font color='red'>inverse Wishart prior</font> with fixed, known parameters $\\alpha$ and $\\mathbf{\\Sigma}^*$:\n",
      "$$p\\left( \\mathbf{\\Sigma} | \\alpha, \\mathbf{\\Sigma}^* \\right) \\propto\n",
      "|\\mathbf{\\Sigma}|^{-\\left(\\alpha+n+1\\right)/2}\n",
      "\\exp\\left\\{ -\\frac{1}{2} \\mathrm{trace}\\left( \\mathbf{\\Sigma}^* \\mathbf{\\Sigma}^{-1} \\right) \\right\\}$$\n",
      "Assume that the <font color='red'>likelihood</font> is a Gaussian:\n",
      "$$p\\left(\\mathbf{y} | \\mathbf{X}, \\boldsymbol{\\theta}, \\mathbf\\Sigma \\right) = \\mathcal{N}(\\mathbf{y} | \\mathbf{X}\\boldsymbol{\\theta}, \\mathbf{\\Sigma}) = |2\\pi\\mathbf\\Sigma|^{-\\frac{1}{2}} \\exp\\left\\{ -\\frac{1}{2} \\left(\\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta\\right)^T \\mathbf\\Sigma^{-1} \\left(\\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta\\right) \\right\\}=\n",
      "|2\\pi\\mathbf\\Sigma|^{-\\frac{1}{2}} \\exp\\left\\{ -\\frac{1}{2} \\mathrm{trace}\\left( \\left(\\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta\\right)^T \\mathbf\\Sigma^{-1} \\left(\\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta\\right) \\right)\\right\\}$$\n",
      "and since $\\mathrm{trace}\\left(\\mathbf{z}^T\\mathbf{Az}\\right) = \\mathrm{trace}\\left(\\mathbf{z}\\mathbf{z}^T\\mathbf{A}\\right)$ by the cyclic property of the trace,\n",
      "$$p\\left(\\mathbf{y} | \\mathbf{X}, \\boldsymbol{\\theta}, \\mathbf\\Sigma \\right) \\propto\n",
      "|\\mathbf\\Sigma|^{-\\frac{1}{2}} \n",
      "\\exp\\left\\{ -\\frac{1}{2} \\mathrm{trace}\\left( \\left(\\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta\\right) \\left(\\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta\\right)^T \\mathbf\\Sigma^{-1} \\right)\\right\\}$$"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can write the <font color='red'>posterior</font> as\n",
      "$$p\\left( \\mathbf{\\Sigma} | \\mathbf{y}, \\mathbf{X}, \\boldsymbol\\theta, \\alpha, \\mathbf{\\Sigma}^* \\right) \\propto\n",
      "p\\left(\\mathbf{y} | \\mathbf{X}, \\boldsymbol{\\theta}, \\mathbf\\Sigma \\right)\n",
      "p\\left( \\mathbf{\\Sigma} | \\alpha, \\mathbf{\\Sigma}^* \\right) \\propto\n",
      "|\\mathbf\\Sigma|^{-\\left(\\left(\\alpha+1\\right)+n+1\\right)/2} \n",
      "\\exp\\left\\{ -\\frac{1}{2} \\mathrm{trace}\\left[ \\left( \\left( \\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta \\right)\n",
      "\\left( \\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta \\right)^T + \\mathbf{\\Sigma}^* \\right) \\mathbf{\\Sigma}^{-1} \\right] \\right\\}$$\n",
      "thus we can write the posterior as an inverse Wishart with parameters $\\alpha + 1$ and\n",
      "$\\left( \\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta \\right)\n",
      "\\left( \\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta \\right)^T + \\mathbf{\\Sigma}^*$."
     ]
    },
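    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The exponent of $|\\mathbf\\Sigma|$ and the trace term in the posterior can be checked by multiplying the two densities directly. As a sketch, write $\\mathbf{r} = \\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta$ for the residual (a shorthand introduced here only for illustration):\n",
      "$$|\\mathbf\\Sigma|^{-\\frac{1}{2}} \\, |\\mathbf{\\Sigma}|^{-\\left(\\alpha+n+1\\right)/2} = |\\mathbf\\Sigma|^{-\\left(\\left(\\alpha+1\\right)+n+1\\right)/2}$$\n",
      "$$\\exp\\left\\{ -\\frac{1}{2} \\mathrm{trace}\\left( \\mathbf{r}\\mathbf{r}^T \\mathbf\\Sigma^{-1} \\right) \\right\\}\n",
      "\\exp\\left\\{ -\\frac{1}{2} \\mathrm{trace}\\left( \\mathbf{\\Sigma}^* \\mathbf{\\Sigma}^{-1} \\right) \\right\\} =\n",
      "\\exp\\left\\{ -\\frac{1}{2} \\mathrm{trace}\\left[ \\left( \\mathbf{r}\\mathbf{r}^T + \\mathbf{\\Sigma}^* \\right) \\mathbf{\\Sigma}^{-1} \\right] \\right\\}$$\n",
      "The second line uses the linearity of the trace. Comparing with the form of the prior, the first parameter increases by one and the second gains the term $\\mathbf{r}\\mathbf{r}^T$."
     ]
    },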
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
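      "If $\\alpha = n+1$ and $\\mathbf{\\Sigma}^* = 0$ then the posterior has parameters $\\alpha + 1 = n+2$ and $\\left( \\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta \\right)\n",
      "\\left( \\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta \\right)^T$. Since an inverse Wishart over $n \\times n$ matrices with parameters $\\nu$ and $\\mathbf{\\Psi}$ has mean $\\mathbf{\\Psi}/\\left(\\nu-n-1\\right)$, the expectation of the posterior is\n",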
      "$$ \\mathbb{E}\\left(\\mathbf\\Sigma\\right) = \\left( \\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta \\right)\n",
      "\\left( \\mathbf{y}-\\mathbf{X}\\boldsymbol\\theta \\right)^T$$\n",
      "which is the maximum likelihood estimate $\\mathbf{\\Sigma}_{ML}$."
     ]
    }
   ],
   "metadata": {}
  }
 ]
}