{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Biostat M280 Homework 4\n", "\n", "**Due Friday, May 24 @ 11:59PM**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Q1 - Ridge regression" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q1(1)\n", "\n", "Read in the [`longley.txt`](http://hua-zhou.github.io/teaching/biostatm280-2019spring/hw/hw4/longley.txt) with the response (`number of people employed`) in the first column and six explanatory variables in the other columns (`GNP implicit price deflator`, `Gross National Product`, `number of unemployed`, `number of people in the armed forces`, `noninstitutionalized population >=14 years of age`, `year`). Include an intercept in your model." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q1(2)\n", "\n", "One popular regularization method is the ridge regression, which estimates regression coefficients by minimizing a penalized least squares criterion\n", "\\begin{eqnarray*}\n", "\t\\frac 12 \\| \\mathbf{y} - \\mathbf{X} \\beta\\|_2^2 + \\frac{\\lambda}{2} \\|\\beta\\|_2^2.\n", "\\end{eqnarray*}\n", "Show that, regardless the shape of $\\mathbf{X}$, there is always a unique global minimum for any $\\lambda>0$ and the ridge solution is given by\n", "\\begin{eqnarray*}\n", "\t\\widehat \\beta(\\lambda) = (\\mathbf{X}^T \\mathbf{X} + \\lambda \\mathbf{I})^{-1} \\mathbf{X}^T \\mathbf{y}.\n", "\\end{eqnarray*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q1(3)\n", "\n", "Show that the ridge estimator is equivalent to the solution of a regular least squares problem with added observations. Compute the ridge regression estimates $\\widehat \\beta(\\lambda)$ at $\\lambda=5,10,15,...,100$ by solving this augmented least squares problem and report the timing. You can use any method of your choice (QR, Cholesky, or sweep)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q1(4)\n", "\n", "Express ridge solution $\\widehat \\beta(\\lambda)$ in terms of the singular value decomposition (SVD) of $\\mathbf{X}$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q1(5)\n", "\n", "Show that (i) the $\\ell_2$ norms of ridge solution $\\|\\widehat \\beta(\\lambda)\\|_2$ and corresponding fitted values $\\|\\widehat{\\mathbf{y}} (\\lambda)\\|_2 = \\|\\mathbf{X} \\widehat \\beta (\\lambda)\\|_2$ are non-increasing in $\\lambda$ and (ii) the $\\ell_2$ norm of the residual vector $\\|\\mathbf{y} - \\widehat{\\mathbf{y}}(\\lambda)\\|_2$ is non-decreasing in $\\lambda$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q1(6)\n", "\n", "Re-compute and plot the ridge solution for the Longley data in Q1(3) at $\\lambda = 5, 10, 15,\\ldots, 100$ using the SVD approach and report the timing. Comment on the computation efficiency of SVD approach compared to the approach you used in Q1(3)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q1(7)\n", "\n", "Let's address how to choose the optimal tuning parameter $\\lambda$. Let $\\widehat{\\beta}_k(\\lambda)$ be the solution to the ridge problem\n", "\\begin{eqnarray*}\n", " \\text{minimize} \\,\\, \\frac 12 \\| \\mathbf{y}_{-k} - \\mathbf{X}_{-k} \\beta\\|_2^2 + \\frac{\\lambda}{2} \\|\\beta\\|_2^2,\n", "\\end{eqnarray*}\n", "where $\\mathbf{y}_{-k}$ and $\\mathbf{X}_{-k}$ are the data with the $k$-th observation taken out. 
 { "cell_type": "markdown", "metadata": {}, "source": [ "### Q1(7)\n", "\n", "Let's address how to choose the optimal tuning parameter $\\lambda$. Let $\\widehat{\\beta}_k(\\lambda)$ be the solution to the ridge problem\n", "\\begin{eqnarray*}\n", " \\text{minimize} \\,\\, \\frac 12 \\| \\mathbf{y}_{-k} - \\mathbf{X}_{-k} \\beta\\|_2^2 + \\frac{\\lambda}{2} \\|\\beta\\|_2^2,\n", "\\end{eqnarray*}\n", "where $\\mathbf{y}_{-k}$ and $\\mathbf{X}_{-k}$ are the data with the $k$-th observation taken out. The optimal $\\lambda$ can be chosen to minimize the cross-validation squared error\n", "\\begin{eqnarray*}\n", " C(\\lambda) = \\frac 1n \\sum_{k=1}^n [y_k - \\mathbf{x}_k^T \\widehat{\\beta}_k(\\lambda)]^2.\n", "\\end{eqnarray*}\n", "However, computing the $n$ ridge solution paths $\\widehat{\\beta}_k(\\lambda)$ is expensive. Show that, using the SVD $\\mathbf{X}=\\mathbf{U} \\Sigma \\mathbf{V}^T$, \n", "\\begin{eqnarray*}\n", " C(\\lambda) = \\frac 1n \\sum_{k=1}^n \\left[ \\frac{y_k - \\sum_{j=1}^r u_{kj} \\tilde y_j \\left( \\frac{\\sigma_j^2}{\\sigma_j^2 + \\lambda} \\right)}{1 - \\sum_{j=1}^r u_{kj}^2 \\left( \\frac{\\sigma_j^2}{\\sigma_j^2 + \\lambda} \\right)} \\right]^2,\n", "\\end{eqnarray*}\n", "where $\\tilde{\\mathbf{y}} = \\mathbf{U}^T \\mathbf{y}$ and $r$ is the rank of $\\mathbf{X}$." ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "### Q1(8)\n", "\n", "Find the optimal $\\lambda$ for the Longley data." ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "## Q2 - Matching Images\n", "\n", "The figure below displays two 3s written on a piece of paper, and we want to align them properly. \n", "\n", "\n", "\n", "Let matrices $\\mathbf{X}, \\mathbf{Y} \\in \\mathbb{R}^{n \\times p}$ record $n$ points on the two shapes. In this case $n=10$ and $p=2$. Mathematically, we consider the problem\n", "\\begin{eqnarray*}\n", "\t\\text{minimize}_{\\beta, \\mathbf{O}, \\mu} \\quad \\|\\mathbf{X} - (\\beta \\mathbf{Y} \\mathbf{O} + \\mathbf{1}_n \\mu^T)\\|_{\\text{F}}^2,\n", "\\end{eqnarray*}\n", "where $\\beta > 0$ is a scaling factor, $\\mathbf{O} \\in \\mathbb{R}^{p \\times p}$ is an orthogonal matrix, and $\\mu \\in \\mathbb{R}^{p}$ is a vector of shifts. Here $\\|\\mathbf{M}\\|_{\\text{F}}^2 = \\sum_{i,j} m_{ij}^2$ is the squared Frobenius norm. Intuitively, we want to rotate, stretch, and shift the shape [$\\mathbf{Y}$](http://hua-zhou.github.io/teaching/biostatm280-2019spring/hw/hw4/Y.txt) to match the shape [$\\mathbf{X}$](http://hua-zhou.github.io/teaching/biostatm280-2019spring/hw/hw4/X.txt) as closely as possible." ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "### Q2(1)\n", "\n", "Let $\\bar{\\mathbf{x}}$ and $\\bar{\\mathbf{y}}$ be the column mean vectors of the two matrices, and let $\\tilde{\\mathbf{X}}$ and $\\tilde{\\mathbf{Y}}$ be the versions of these matrices centered by their column means. Show that the solution $(\\hat{\\beta}, \\hat{\\mathbf{O}}, \\hat{\\mu})$ satisfies \n", "\\begin{eqnarray*}\n", "\t\\hat{\\mu} = \\bar{\\mathbf{x}} - \\hat{\\beta} \\cdot \\hat{\\mathbf{O}}^T \\bar{\\mathbf{y}}.\n", "\\end{eqnarray*}\n", "Therefore, we can center each matrix at its column centroid and then ignore the location completely." ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "### Q2(2)\n", "\n", "Derive the solution to\n", "\\begin{eqnarray*}\n", "\t\\text{minimize}_{\\beta, \\mathbf{O}} \\quad \\|\\tilde{\\mathbf{X}} - \\beta \\tilde{\\mathbf{Y}} \\mathbf{O}\\|_{\\text{F}}^2\n", "\\end{eqnarray*}\n", "using the SVD of $\\tilde{\\mathbf{Y}}^T \\tilde{\\mathbf{X}}$." ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "### Q2(3)\n", "\n", "Implement your method and solve the alignment problem in the figure. Display $\\mathbf{X}$ and your solution $\\hat{\\beta} \\mathbf{Y} \\hat{\\mathbf{O}} + \\mathbf{1}_n \\hat{\\mu}^T$ together." ] },
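 { "cell_type": "markdown", "metadata": {}, "source": [ "For concreteness, here is a minimal Julia sketch of this alignment, assuming the two shape files have been saved locally as `X.txt` and `Y.txt`; the variable names `Ohat`, `βhat`, and `μhat` are illustrative only.\n", "\n", "```julia\n", "using DelimitedFiles, LinearAlgebra, Statistics\n", "\n", "# assumes X.txt and Y.txt sit in the working directory\n", "X = readdlm(\"X.txt\")\n", "Y = readdlm(\"Y.txt\")\n", "\n", "# center each shape at its column centroid (Q2(1))\n", "xbar = vec(mean(X, dims = 1))\n", "ybar = vec(mean(Y, dims = 1))\n", "Xc = X .- xbar'\n", "Yc = Y .- ybar'\n", "\n", "# the SVD of Yc' * Xc gives the optimal orthogonal matrix and scaling (Q2(2))\n", "Fsvd = svd(Yc' * Xc)\n", "Ohat = Fsvd.U * Fsvd.Vt                  # orthogonal matrix\n", "βhat = sum(Fsvd.S) / sum(abs2, Yc)       # trace of Σ over the squared Frobenius norm of Yc\n", "μhat = xbar - βhat * Ohat' * ybar        # shift from Q2(1)\n", "\n", "Yaligned = βhat * Y * Ohat .+ μhat'\n", "\n", "# overlay plot, e.g. with Plots.jl:\n", "# using Plots\n", "# scatter(X[:, 1], X[:, 2], label = \"X\")\n", "# scatter!(Yaligned[:, 1], Yaligned[:, 2], label = \"aligned Y\")\n", "```"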
] } ], "metadata": { "@webio": { "lastCommId": null, "lastKernelId": null }, "kernelspec": { "display_name": "Julia 1.1.0", "language": "julia", "name": "julia-1.1" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.1.0" }, "toc": { "colors": { "hover_highlight": "#DAA520", "running_highlight": "#FF0000", "selected_highlight": "#FFD700" }, "moveMenuLeft": true, "nav_menu": { "height": "87px", "width": "252px" }, "navigate_menu": true, "number_sections": false, "sideBar": true, "skip_h1_title": true, "threshold": 4, "toc_cell": false, "toc_section_display": "block", "toc_window_display": true, "widenNotebook": false } }, "nbformat": 4, "nbformat_minor": 2 }