{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Biostat M280 Homework 4\n", "\n", "**Due Friday, May 24 @ 11:59PM**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Q1 - Ridge regression" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q1(1)\n", "\n", "Read in the [`longley.txt`](http://hua-zhou.github.io/teaching/biostatm280-2019spring/hw/hw4/longley.txt) with the response (`number of people employed`) in the first column and six explanatory variables in the other columns (`GNP implicit price deflator`, `Gross National Product`, `number of unemployed`, `number of people in the armed forces`, `noninstitutionalized population >=14 years of age`, `year`). Include an intercept in your model." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q1(2)\n", "\n", "One popular regularization method is the ridge regression, which estimates regression coefficients by minimizing a penalized least squares criterion\n", "\\begin{eqnarray*}\n", "\t\\frac 12 \\| \\mathbf{y} - \\mathbf{X} \\beta\\|_2^2 + \\frac{\\lambda}{2} \\|\\beta\\|_2^2.\n", "\\end{eqnarray*}\n", "Show that, regardless the shape of $\\mathbf{X}$, there is always a unique global minimum for any $\\lambda>0$ and the ridge solution is given by\n", "\\begin{eqnarray*}\n", "\t\\widehat \\beta(\\lambda) = (\\mathbf{X}^T \\mathbf{X} + \\lambda \\mathbf{I})^{-1} \\mathbf{X}^T \\mathbf{y}.\n", "\\end{eqnarray*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q1(3)\n", "\n", "Show that the ridge estimator is equivalent to the solution of a regular least squares problem with added observations. Compute the ridge regression estimates $\\widehat \\beta(\\lambda)$ at $\\lambda=5,10,15,...,100$ by solving this augmented least squares problem and report the timing. You can use any method of your choice (QR, Cholesky, or sweep)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q1(4)\n", "\n", "Express ridge solution $\\widehat \\beta(\\lambda)$ in terms of the singular value decomposition (SVD) of $\\mathbf{X}$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q1(5)\n", "\n", "Show that (i) the $\\ell_2$ norms of ridge solution $\\|\\widehat \\beta(\\lambda)\\|_2$ and corresponding fitted values $\\|\\widehat{\\mathbf{y}} (\\lambda)\\|_2 = \\|\\mathbf{X} \\widehat \\beta (\\lambda)\\|_2$ are non-increasing in $\\lambda$ and (ii) the $\\ell_2$ norm of the residual vector $\\|\\mathbf{y} - \\widehat{\\mathbf{y}}(\\lambda)\\|_2$ is non-decreasing in $\\lambda$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q1(6)\n", "\n", "Re-compute and plot the ridge solution for the Longley data in Q1(3) at $\\lambda = 5, 10, 15,\\ldots, 100$ using the SVD approach and report the timing. Comment on the computation efficiency of SVD approach compared to the approach you used in Q1(3)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q1(7)\n", "\n", "Let's address how to choose the optimal tuning parameter $\\lambda$. Let $\\widehat{\\beta}_k(\\lambda)$ be the solution to the ridge problem\n", "\\begin{eqnarray*}\n", " \\text{minimize} \\,\\, \\frac 12 \\| \\mathbf{y}_{-k} - \\mathbf{X}_{-k} \\beta\\|_2^2 + \\frac{\\lambda}{2} \\|\\beta\\|_2^2,\n", "\\end{eqnarray*}\n", "where $\\mathbf{y}_{-k}$ and $\\mathbf{X}_{-k}$ are the data with the $k$-th observation taken out. 
 { "cell_type": "markdown", "metadata": {}, "source": [ "### Q1(7)\n", "\n", "Let's address how to choose the optimal tuning parameter $\\lambda$. Let $\\widehat{\\beta}_k(\\lambda)$ be the solution to the ridge problem\n", "\\begin{eqnarray*}\n", " \\text{minimize} \\,\\, \\frac 12 \\| \\mathbf{y}_{-k} - \\mathbf{X}_{-k} \\beta\\|_2^2 + \\frac{\\lambda}{2} \\|\\beta\\|_2^2,\n", "\\end{eqnarray*}\n", "where $\\mathbf{y}_{-k}$ and $\\mathbf{X}_{-k}$ are the data with the $k$-th observation taken out. The optimal $\\lambda$ can be chosen to minimize the cross-validation squared error\n", "\\begin{eqnarray*}\n", " C(\\lambda) = \\frac 1n \\sum_{k=1}^n [y_k - \\mathbf{x}_k^T \\widehat{\\beta}_k(\\lambda)]^2.\n", "\\end{eqnarray*}\n", "However, computing the $n$ ridge solution paths $\\widehat{\\beta}_k(\\lambda)$ is expensive. Show that, using the SVD $\\mathbf{X}=\\mathbf{U} \\Sigma \\mathbf{V}^T$, \n", "\\begin{eqnarray*}\n", " C(\\lambda) = \\frac 1n \\sum_{k=1}^n \\left[ \\frac{y_k - \\sum_{j=1}^r u_{kj} \\tilde y_j \\left( \\frac{\\sigma_j^2}{\\sigma_j^2 + \\lambda} \\right)}{1 - \\sum_{j=1}^r u_{kj}^2 \\left( \\frac{\\sigma_j^2}{\\sigma_j^2 + \\lambda} \\right)} \\right]^2,\n", "\\end{eqnarray*}\n", "where $\\tilde{\\mathbf{y}} = \\mathbf{U}^T \\mathbf{y}$ and $r$ is the rank of $\\mathbf{X}$." ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "### Q1(8)\n", "\n", "Find the optimal $\\lambda$ for the Longley data." ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "## Q2 - Matching Images\n", "\n", "The figure below displays two 3s written on a piece of paper, and we want to align them properly. \n", "\n", "\n", "\n", "Let matrices $\\mathbf{X}, \\mathbf{Y} \\in \\mathbb{R}^{n \\times p}$ record $n$ points on the two shapes. In this case $n=10$ and $p=2$. Mathematically, we consider the problem\n", "\\begin{eqnarray*}\n", "\t\\text{minimize}_{\\beta, \\mathbf{O}, \\mu} \\quad \\|\\mathbf{X} - (\\beta \\mathbf{Y} \\mathbf{O} + \\mathbf{1}_n \\mu^T)\\|_{\\text{F}}^2,\n", "\\end{eqnarray*}\n", "where $\\beta > 0$ is a scaling factor, $\\mathbf{O} \\in \\mathbb{R}^{p \\times p}$ is an orthogonal matrix, and $\\mu \\in \\mathbb{R}^{p}$ is a vector of shifts. Here $\\|\\mathbf{M}\\|_{\\text{F}}^2 = \\sum_{i,j} m_{ij}^2$ is the squared Frobenius norm. Intuitively, we want to rotate, stretch, and shift the shape [$\\mathbf{Y}$](http://hua-zhou.github.io/teaching/biostatm280-2019spring/hw/hw4/Y.txt) to match the shape [$\\mathbf{X}$](http://hua-zhou.github.io/teaching/biostatm280-2019spring/hw/hw4/X.txt) as closely as possible." ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "### Q2(1)\n", "\n", "Let $\\bar{\\mathbf{x}}$ and $\\bar{\\mathbf{y}}$ be the column mean vectors of the two matrices, and let $\\tilde{\\mathbf{X}}$ and $\\tilde{\\mathbf{Y}}$ be the versions of these matrices centered by their column means. Show that the solution $(\\hat{\\beta}, \\hat{\\mathbf{O}}, \\hat{\\mu})$ satisfies \n", "\\begin{eqnarray*}\n", "\t\\hat{\\mu} = \\bar{\\mathbf{x}} - \\hat{\\beta} \\cdot \\hat{\\mathbf{O}}^T \\bar{\\mathbf{y}}.\n", "\\end{eqnarray*}\n", "Therefore, we can center each matrix at its column centroid and then ignore the location completely." ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "### Q2(2)\n", "\n", "Derive the solution to\n", "\\begin{eqnarray*}\n", "\t\\text{minimize}_{\\beta, \\mathbf{O}} \\quad \\|\\tilde{\\mathbf{X}} - \\beta \\tilde{\\mathbf{Y}} \\mathbf{O}\\|_{\\text{F}}^2\n", "\\end{eqnarray*}\n", "using the SVD of $\\tilde{\\mathbf{Y}}^T \\tilde{\\mathbf{X}}$." ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "### Q2(3)\n", "\n", "Implement your method and solve the alignment problem in the figure. Display $\\mathbf{X}$ and your solution $\\hat{\\beta} \\mathbf{Y} \\hat{\\mathbf{O}} + \\mathbf{1}_n \\hat{\\mu}^T$ together." ] },
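 { "cell_type": "markdown", "metadata": {}, "source": [ "For concreteness, here is a minimal Julia sketch of this alignment, assuming the two shape files have been saved locally as `X.txt` and `Y.txt`; the variable names `Ohat`, `βhat`, and `μhat` are illustrative only.\n", "\n", "```julia\n", "using DelimitedFiles, LinearAlgebra, Statistics\n", "\n", "# assumes X.txt and Y.txt sit in the working directory\n", "X = readdlm(\"X.txt\")\n", "Y = readdlm(\"Y.txt\")\n", "\n", "# center each shape at its column centroid (Q2(1))\n", "xbar = vec(mean(X, dims = 1))\n", "ybar = vec(mean(Y, dims = 1))\n", "Xc = X .- xbar'\n", "Yc = Y .- ybar'\n", "\n", "# the SVD of Yc' * Xc gives the optimal orthogonal matrix and scaling (Q2(2))\n", "Fsvd = svd(Yc' * Xc)\n", "Ohat = Fsvd.U * Fsvd.Vt                  # orthogonal matrix\n", "βhat = sum(Fsvd.S) / sum(abs2, Yc)       # trace of Σ over the squared Frobenius norm of Yc\n", "μhat = xbar - βhat * Ohat' * ybar        # shift from Q2(1)\n", "\n", "Yaligned = βhat * Y * Ohat .+ μhat'\n", "\n", "# overlay plot, e.g. with Plots.jl:\n", "# using Plots\n", "# scatter(X[:, 1], X[:, 2], label = \"X\")\n", "# scatter!(Yaligned[:, 1], Yaligned[:, 2], label = \"aligned Y\")\n", "```"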
] } ], "metadata": { "@webio": { "lastCommId": null, "lastKernelId": null }, "kernelspec": { "display_name": "Julia 1.1.0", "language": "julia", "name": "julia-1.1" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.1.0" }, "toc": { "colors": { "hover_highlight": "#DAA520", "running_highlight": "#FF0000", "selected_highlight": "#FFD700" }, "moveMenuLeft": true, "nav_menu": { "height": "87px", "width": "252px" }, "navigate_menu": true, "number_sections": false, "sideBar": true, "skip_h1_title": true, "threshold": 4, "toc_cell": false, "toc_section_display": "block", "toc_window_display": true, "widenNotebook": false } }, "nbformat": 4, "nbformat_minor": 2 }