{ "cells": [ { "cell_type": "raw", "metadata": {}, "source": [ "---\n", "title: Biostat 216 Homework 6\n", "subtitle: 'Due Dec 1 @ 11:59pm'\n", "format:\n", " html:\n", " theme: cosmo\n", " embed-resources: true\n", " number-sections: false\n", " toc: true\n", " toc-depth: 4\n", " toc-location: left\n", " code-fold: false\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Submit a PDF (scanned/photographed from handwritten solutions, or converted from RMarkdown or Jupyter Notebook) to Gracescope on BruinLearn." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Eigenvalues and eigenvectors\n", "\n", "### Q1\n", "\n", "Diagonalize (show the steps to find eigenvalues and eigenvectors)\n", "$$\n", "\\mathbf{A} = \\begin{pmatrix} 2 & -1 \\\\ -1 & 2 \\end{pmatrix}\n", "$$\n", "and compute $\\mathbf{X} \\boldsymbol{\\Lambda}^k \\mathbf{X}^{-1}$ to prove the formula\n", "$$\n", "\\mathbf{A}^k = \\frac 12 \\begin{pmatrix} 1 + 3^k & 1 - 3^k \\\\ 1 - 3^k & 1 + 3^k \\end{pmatrix}.\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q2\n", "\n", "Suppose the eigenvalues of a square matrix $\\mathbf{A} \\in \\mathbb{R}^{n \\times n}$ are $\\lambda_1, \\ldots, \\lambda_n$. Show that $\\det (\\mathbf{A}) = \\prod_{i=1}^n \\lambda_i$. Hint: $\\lambda_i$s are roots of the characteristic polynomial." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q3\n", "\n", "Ture of false. For each statement, indicate it is true or false and gives a brief explanation.\n", "\n", "1. If the columns of $\\mathbf{X}$ (eigenvectors of a square matrix $\\mathbf{A}$) are linearly independent, then (a) $\\mathbf{A}$ is invertible; (b) $\\mathbf{A}$ is diagonalizable; (c) $\\mathbf{X}$ is invertible; (d) $\\mathbf{X}$ is diagonalizable.\n", " \n", "2. If the eigenvalues of $\\mathbf{A}$ are 2, 2, 5 then the matrix is certainly (a) invertible; (b) diagonalizable. \n", " \n", "3. If the only eigenvectors of $\\mathbf{A}$ are multiples of $\\begin{pmatrix} 1 \\\\ 4 \\end{pmatrix}$, then the matrix has (a) no inverse; (b) a repeated eigenvalue; (c) no diagonalization $\\mathbf{X} \\boldsymbol{\\Lambda} \\mathbf{X}^{-1}$. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q4\n", "\n", "Let $\\mathbf{A} \\in \\mathbb{R}^{m \\times n}$ and $\\mathbf{B} \\in \\mathbb{R}^{n \\times m}$. Show that $\\mathbf{A} \\mathbf{B}$ and $\\mathbf{B} \\mathbf{A}$ share the same non-zero eigenvalues. Hint: examine the eigen-equations." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q5\n", "\n", "Let\n", "$$\n", "\\mathbf{A} = \\begin{pmatrix}\n", "0 & 2 \\\\\n", "1 & 1\n", "\\end{pmatrix}, \\quad \\mathbf{B} = \\begin{pmatrix}\n", "1 & 2 \\\\\n", "0 & 1\n", "\\end{pmatrix}.\n", "$$\n", "\n", "1. Find eigenvalues and eigenvectors of $\\mathbf{A}$ and $\\mathbf{A}^{-1}$. Do they have same eigenvectors? What's the relationship between their eigenvalues? \n", "\n", "2. Find eigenvalues of $\\mathbf{B}$ and $\\mathbf{A} + \\mathbf{B}$. Are eigenvalues of $\\mathbf{A} + \\mathbf{B}$ equal to eigenvalues of $\\mathbf{A}$ plus eigenvalues of $\\mathbf{B}$? \n", "\n", "3. Find eigenvalues of $\\mathbf{A} \\mathbf{B}$ and $\\mathbf{B} \\mathbf{A}$. Are the eigenvalues of $\\mathbf{A} \\mathbf{B}$ equal to eigenvalues of $\\mathbf{A}$ times eigenvalues of $\\mathbf{B}$? Are the eigenvalues of $\\mathbf{A} \\mathbf{B}$ equal to eigenvalues of $\\mathbf{B} \\mathbf{A}$?" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q6\n", "\n", "Suppose $\\mathbf{A}$ has eigenvalues 0, 3, 5 with independent eigenvectors $\\mathbf{u}$, $\\mathbf{v}$, $\\mathbf{w}$ respectively. \n", "\n", "1. Give a basis for $\\mathcal{C}(\\mathbf{A})$ and a basis for $\\mathcal{N}(\\mathbf{A})$.\n", "\n", "2. Find a particular solution to $\\mathbf{A} \\mathbf{x} = \\mathbf{v} + \\mathbf{w}$. Find all solutions. \n", "\n", "3. Show that the linear equation $\\mathbf{A} \\mathbf{x} = \\mathbf{u}$ has no solution. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Positive definite matrices\n", "\n", "### Q7\n", "\n", "Suppose $\\mathbf{C}$ is positive definite and $\\mathbf{A}$ has independent columns. Apply the energy test to show that $\\mathbf{A}' \\mathbf{C} \\mathbf{A}$ is positive definite. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q8\n", "\n", "Show that the diagonal entries of a positive definite matrix are positive. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q9 (Rayleigh quotient)\n", "\n", "Suppose $\\mathbf{S}$ is positive definite with eigenvalues $\\lambda_1 \\ge \\lambda_2 \\ge \\cdots \\ge \\lambda_n > 0$ with corresponding orthonormal eigenvectors $\\mathbf{u}_1, \\mathbf{u}_2, \\ldots, \\mathbf{u}_n$. \n", "\n", "1. What are the eigenvalues of the matrix $\\lambda_1 \\mathbf{I} - \\mathbf{S}$? Is it positive semidefinite? \n", "\n", "2. How does it follow that $\\lambda_1 \\mathbf{x}'\\mathbf{x} \\ge \\mathbf{x}' \\mathbf{S} \\mathbf{x}$ for every $\\mathbf{x}$? \n", "\n", "3. Draw the conclusion: The maximum value of the Rayleigh quotient \n", "$$\n", "R(\\mathbf{x}) = \\frac{\\mathbf{x}'\\mathbf{S}\\mathbf{x}}{\\mathbf{x}'\\mathbf{x}}\n", "$$ \n", "is $\\lambda_1$. \n", "\n", "4. Show that the maximum value of the Rayleigh quotient subject to the condition $\\mathbf{x}\\perp \\mathbf{u}_1$ is $\\lambda_2$. Hint: expand $\\mathbf{x}$ in eigenvectors $\\mathbf{u}_1,\\ldots,\\mathbf{u}_n$. \n", "\n", "5. Show that the maximum value of the Rayleigh quotient subject to the conditions $\\mathbf{x}\\perp \\mathbf{u}_1$ and $\\mathbf{x}\\perp \\mathbf{u}_2$ is $\\lambda_3$. \n", "\n", "6. What is the maximum value of $\\frac 12 \\mathbf{x}' \\mathbf{S} \\mathbf{x}$ subject to the constraint $\\mathbf{x}'\\mathbf{x}=1$. Hint: write down the Lagrangian and set its derivative to zero. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## SVD\n", "\n", "### Q10\n", "\n", "Find the closest rank-1 approximation (in Frobenius norm or spectral norm) to these matrices\n", "$$\n", "\\mathbf{A} = \\begin{pmatrix} 3 & 0 & 0 \\\\ 0 & 2 & 0 \\\\ 0 & 0 & 1 \\end{pmatrix}, \\quad \\mathbf{A} = \\begin{pmatrix} 0 & 3 \\\\ 2 & 0 \\end{pmatrix}, \\quad \\mathbf{A} = \\begin{pmatrix} 2 & 1 \\\\ 1 & 2 \\end{pmatrix}, \\quad \\mathbf{A} = \\begin{pmatrix} \\cos \\theta & - \\sin \\theta \\\\ \\sin \\theta & \\cos \\theta \\end{pmatrix}. \n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q11 (Moore-Penrose inverse)\n", "\n", "1. 
With singular value decomposition $\\mathbf{X} = \\mathbf{U} \\boldsymbol{\\Sigma} \\mathbf{V}'$, verify that\n", "$$\n", "\\mathbf{X}^+ = \\mathbf{V} \\boldsymbol{\\Sigma}^+ \\mathbf{U}' = \\mathbf{V}_r \\boldsymbol{\\Sigma}_r^{-1} \\mathbf{U}_r' = \\sum_{i=1}^r \\sigma_i^{-1} \\mathbf{v}_i \\mathbf{u}_i',\n", "$$\n", "where $\\boldsymbol{\\Sigma}^+ = \\text{diag}(\\sigma_1^{-1}, \\ldots, \\sigma_r^{-1}, 0, \\ldots, 0)$ and $r= \\text{rank}(\\mathbf{X})$, satisfies the four properties of the [Moore-Penrose inverse](https://ucla-biostat-216.github.io/2023fall/slides/07-matinv/07-matinv.html#generalized-inverse-and-moore-penrose-inverse). \n", "\n", "2. Show that $\\text{rank}(\\mathbf{X}^+) = \\text{rank}(\\mathbf{X})$. \n", "\n", "3. Show that $\\mathbf{X}^+ \\mathbf{X}$ is the orthogonal projector onto $\\mathcal{C}(\\mathbf{X}')$ and $\\mathbf{X} \\mathbf{X}^+$ is the orthogonal projector onto $\\mathcal{C}(\\mathbf{X})$. \n", "\n", "4. Show that $\\boldsymbol{\\beta}^+ = \\mathbf{X}^+ \\mathbf{y}$ is a minimizer of the least squares criterion $f(\\boldsymbol{\\beta}) = \\|\\mathbf{y} - \\mathbf{X} \\boldsymbol{\\beta}\\|^2$. Hint: check that $\\boldsymbol{\\beta}^+$ satisfies the normal equation $\\mathbf{X}'\\mathbf{X}\\boldsymbol{\\beta} = \\mathbf{X}'\\mathbf{y}$. \n", "\n", "5. Show that $\\boldsymbol{\\beta}^+ \\in \\mathcal{C}(\\mathbf{X}')$. \n", "\n", "6. Show that if another $\\boldsymbol{\\beta}^\\star$ minimizes $f(\\boldsymbol{\\beta})$, then $\\|\\boldsymbol{\\beta}^\\star\\| \\ge \\|\\boldsymbol{\\beta}^+\\|$. This says that $\\boldsymbol{\\beta}^+ = \\mathbf{X}^+ \\mathbf{y}$ is the least squares solution with the smallest $\\ell_2$ norm. Hint: both $\\boldsymbol{\\beta}^\\star$ and $\\boldsymbol{\\beta}^+$ satisfy the normal equation, so $\\mathbf{X}'\\mathbf{X} \\boldsymbol{\\beta}^\\star = \\mathbf{X}'\\mathbf{X} \\boldsymbol{\\beta}^+$; deduce that $\\boldsymbol{\\beta}^\\star - \\boldsymbol{\\beta}^+ \\in \\mathcal{N}(\\mathbf{X})$. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q12\n", "\n", "Let $\\mathbf{B}$ be a submatrix of $\\mathbf{A} \\in \\mathbb{R}^{m \\times n}$. Show that the largest singular value of $\\mathbf{B}$ is always less than or equal to the largest singular value of $\\mathbf{A}$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q13\n", "\n", "Show that all three matrix norms ($\\ell_2$, Frobenius, nuclear) are invariant under orthogonal transforms. That is,\n", "$$\n", "\\|\\mathbf{Q}_1 \\mathbf{A} \\mathbf{Q}_2'\\| = \\|\\mathbf{A}\\| \\text{ for orthogonal } \\mathbf{Q}_1 \\text{ and } \\mathbf{Q}_2.\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Optimization and multivariate calculus\n", "\n", "### Q14\n", "\n", "1. Explain why the intersection $K_1 \\cap K_2$ of two convex sets is a convex set. \n", "\n", "2. Prove that the maximum $F_3$ of two convex functions $F_1$ and $F_2$ is a convex function. Hint: What is the set above the graph of $F_3$? " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q15\n", "\n", "Show that these functions are convex: \n", "\n", "1. Entropy $x \\log x$. \n", "\n", "2. $\\log (e^x + e^y)$. \n", "\n", "3. $\\ell_p$ norm $\\|\\mathbf{x}\\|_p = (|x_1|^p + |x_2|^p)^{1/p}$, $p \\ge 1$. \n", "\n", "4. $\\lambda_{\\max}(\\mathbf{S})$ as a function of the symmetric matrix $\\mathbf{S}$. Hint: HW6 Q9.6 and Q14.2."
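, "\n", "\n", "Optional numerical sanity check (not a proof; the matrices and seed below are arbitrary illustrations): a minimal Julia sketch that tests the convexity inequality of item 4 on two randomly generated symmetric matrices.\n", "\n", "```julia\n", "using LinearAlgebra, Random\n", "\n", "Random.seed!(216)\n", "S1 = Symmetric(randn(5, 5))   # arbitrary symmetric test matrices\n", "S2 = Symmetric(randn(5, 5))\n", "t = 0.3                       # a convex combination weight\n", "# convexity: eigmax(t*S1 + (1-t)*S2) <= t*eigmax(S1) + (1-t)*eigmax(S2)\n", "lhs = eigmax(t * S1 + (1 - t) * S2)\n", "rhs = t * eigmax(S1) + (1 - t) * eigmax(S2)\n", "@show lhs <= rhs\n", "```"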
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q16\n", "\n", "Minimize $f(x_1,x_2)= \\frac 12 \\mathbf{x}'\\mathbf{S} \\mathbf{x}= \\frac 12 x_1^2 + 2 x_2^2$ subject to the constraint $\\mathbf{A}'\\mathbf{x}=x_1 + 3x_2 = b$. \n", "\n", "1. What is the Lagrangian $L(\\mathbf{x},\\lambda)$ for this problem. \n", "\n", "2. What are the three equations \"derivative of L=zero\"? \n", "\n", "3. Solve these equations to find $\\mathbf{x}^\\star = (x_1^\\star, x_2^\\star)$ and the multiplier $\\lambda^\\star$. \n", "\n", "4. Verify that the derivative of the minimum cost is $\\partial f^\\star / \\partial b = -\\lambda^\\star$. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q17 (MLE of multivariate normal)\n", "\n", "Let $\\mathbf{y}_1, \\ldots, \\mathbf{y}_n \\in \\mathbb{R}^{p}$ be iid samples from a $p$-dimensional multivariate normal distribution $N(\\boldsymbol{\\mu}, \\boldsymbol{\\Omega})$, where the mean $\\boldsymbol{\\mu} \\in \\mathbb{R}^p$ and covariance $\\boldsymbol{\\Omega} \\in \\mathbb{R}^{p \\times p}$ are unkonwn parameters. The log-likelihood is\n", "$$\n", "\\ell(\\boldsymbol{\\mu}, \\boldsymbol{\\Omega}) = - \\frac n2 \\log \\det \\boldsymbol{\\Omega} - \\frac 12 \\sum_{i=1}^n (\\mathbf{y}_i - \\boldsymbol{\\mu})' \\boldsymbol{\\Omega}^{-1} (\\mathbf{y}_i - \\boldsymbol{\\mu}) - \\frac{n}{2} \\log 2\\pi.\n", "$$\n", "Show that the maximum likelihood estimate (MLE) is\n", "\\begin{eqnarray*}\n", "\\widehat{\\boldsymbol{\\mu}} &=& \\frac{\\sum_{i=1}^n \\mathbf{y}_i}{n} \\\\\n", "\\widehat{\\boldsymbol{\\Omega}} &=& \\frac{\\sum_{i=1}^n (\\mathbf{y}_i - \\hat{\\boldsymbol{\\mu}})(\\mathbf{y}_i - \\hat{\\boldsymbol{\\mu}})'}{n}.\n", "\\end{eqnarray*}\n", "That is to show that $\\widehat{\\boldsymbol{\\mu}}, \\widehat{\\boldsymbol{\\Omega}}$ maximize $\\ell$.\n", " \n", "Hint: Use the first order optimality condition to find $\\widehat{\\boldsymbol{\\mu}}$ and $\\widehat{\\boldsymbol{\\Omega}}$. To check the optimality of $\\widehat{\\boldsymbol{\\Omega}}$, use its Cholesky factor." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q18 (Smallest matrix subject to linear constraints)\n", "\n", "Find the matrix $\\mathbf{X}$ with the smallest Frobenius norm subject to the constraint $\\mathbf{X} \\mathbf{U} = \\mathbf{V}$, assuming $\\mathbf{U}$ has full column rank. \n", "\n", "Hint: Write down the optimization problem and use the method of Lagrange multipliers." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q19 (Minimizing a convex quadratic form over manifold)\n", "\n", "$\\mathbf{A} \\in \\mathbb{R}^{n \\times n}$ is a positive semidefinite matrix. Find a matrix $\\mathbf{U} \\in \\mathbb{R}^{n \\times r}$ with orthonomal columns that maximizes $\\text{tr} (\\mathbf{U}' \\mathbf{A} \\mathbf{U})$. That is to \n", "\\begin{eqnarray*}\n", " &\\text{maximize}& \\quad \\text{tr} (\\mathbf{U}' \\mathbf{A} \\mathbf{U}) \\\\\n", " &\\text{subject to}& \\quad \\mathbf{U}' \\mathbf{U} = \\mathbf{I}_r.\n", "\\end{eqnarray*}\n", "This result generalizes the fact that the top eigenvector of $\\mathbf{A}$ maximizes $\\mathbf{u}' \\mathbf{A} \\mathbf{u}$ subject to constraint $\\mathbf{u}'\\mathbf{u} = 1$.\n", "\n", "Hint: Use the method of Lagrange multipliers." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q20 ([Procrustes rotation](https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.orthogonal_procrustes.html#scipy.linalg.orthogonal_procrustes))\n", "\n", "The Procrustes problem studies how to properly align images. \n", "![](./signature.png)\n", "![](./procrustes.jpg) \n", "Let matrices $\\mathbf{X}, \\mathbf{Y} \\in \\mathbb{R}^{n \\times p}$ record $n$ points on the two shapes. Mathematically we consider the problem\n", "\\begin{eqnarray*}\n", "\\text{minimize}_{\\beta, \\mathbf{O}, \\mu} \\quad \\|\\mathbf{X} - (\\beta \\mathbf{Y} \\mathbf{O} + \\mathbf{1}_n \\mu^T)\\|_{\\text{F}}^2,\n", "\\end{eqnarray*}\n", "where $\\beta > 0$ is a scaling factor, $\\mathbf{O} \\in \\mathbb{R}^{p \\times p}$ is an orthogonal matrix, and $\\mu \\in \\mathbb{R}^{p}$ is a vector of shifts. Here $\\|\\mathbf{M}\\|_{\\text{F}}^2 = \\sum_{i,j} m_{ij}^2$ is the squared Frobenius norm. Intuitively we want to rotate, stretch and shift the shape $\\mathbf{Y}$ to match the shape $\\mathbf{X}$ as much as possible. \n", " \n", "1. Let $\\bar{\\mathbf{x}}$ and $\\bar{\\mathbf{y}}$ be the column mean vectors of the matrices and $\\tilde{\\mathbf{X}}$ and $\\tilde{\\mathbf{Y}}$ be the versions of these matrices centered by column means. Show that the solution $(\\hat{\\beta}, \\hat{\\mathbf{O}}, \\hat{\\mu})$ satisfies \n", "\\begin{eqnarray*}\n", "\t\\hat{\\mu} = \\bar{\\mathbf{x}} - \\hat{\\beta} \\cdot \\hat{\\mathbf{O}}^T \\bar{\\mathbf{y}}.\n", "\\end{eqnarray*}\n", "Therefore we can center each matrix at its column centroid and then ignore the location completely.\n", "\n", "2. Derive the solution to\n", "\\begin{eqnarray*}\n", "\\text{minimize}_{\\beta, \\mathbf{O}} \\quad \\|\\tilde{\\mathbf{X}} - \\beta \\tilde{\\mathbf{Y}} \\mathbf{O}\\|_{\\text{F}}^2\n", "\\end{eqnarray*}\n", "using the SVD of $\\tilde{\\mathbf{Y}}^T \\tilde{\\mathbf{X}}$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q21 ([**Ridge regression**](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html))\n", "\n", "One popular regularization method in machine learning is the ridge regression, which estimates regression coefficients by minimizing a penalized least squares criterion\n", "\\begin{eqnarray*}\n", "\\frac 12 \\| \\mathbf{y} - \\mathbf{X} \\beta\\|_2^2 + \\frac{\\lambda}{2} \\|\\beta\\|_2^2.\n", "\\end{eqnarray*}\n", "Here $\\mathbf{y} \\in \\mathbb{R}^n$ and $\\mathbf{X} \\in \\mathbb{R}^{n \\times p}$ are fixed data. $\\boldsymbol{\\beta} \\in \\mathbb{R}^p$ are the regression coefficients to be estimated. \n", "\n", "1. Show that, regardless the shape of $\\mathbf{X}$, there is always a unique global minimum for any $\\lambda>0$ and the ridge solution is given by\n", "\\begin{eqnarray*}\n", "\\widehat{\\boldsymbol{\\beta}}(\\lambda) = (\\mathbf{X}^T \\mathbf{X} + \\lambda \\mathbf{I})^{-1} \\mathbf{X}^T \\mathbf{y}.\n", "\\end{eqnarray*}\n", "\n", "2. Express ridge solution $\\widehat{\\boldsymbol{\\beta}}(\\lambda)$ in terms of the singular value decomposition (SVD) of $\\mathbf{X}$.\n", " \n", "3. 
Show that (i) the $\\ell_2$ norms of the ridge solution $\\|\\widehat{\\boldsymbol{\\beta}}(\\lambda)\\|_2$ and the corresponding fitted values $\\|\\widehat{\\mathbf{y}} (\\lambda)\\|_2 = \\|\\mathbf{X} \\widehat{\\boldsymbol{\\beta}}(\\lambda)\\|_2$ are non-increasing in $\\lambda$ and (ii) the $\\ell_2$ norm of the residual vector $\\|\\mathbf{y} - \\widehat{\\mathbf{y}}(\\lambda)\\|_2$ is non-decreasing in $\\lambda$.\n", " \n", "4. Let's address how to choose the optimal tuning parameter $\\lambda$. Let $\\widehat{\\boldsymbol{\\beta}}_k(\\lambda)$ be the solution to the ridge problem\n", "\\begin{eqnarray*}\n", " \\text{minimize} \\,\\, \\frac 12 \\| \\mathbf{y}_{-k} - \\mathbf{X}_{-k} \\boldsymbol{\\beta}\\|_2^2 + \\frac{\\lambda}{2} \\|\\boldsymbol{\\beta}\\|_2^2,\n", "\\end{eqnarray*}\n", "where $\\mathbf{y}_{-k}$ and $\\mathbf{X}_{-k}$ are the data with the $k$-th observation taken out. The optimal $\\lambda$ can be chosen to minimize the cross-validation squared error\n", "\\begin{eqnarray*}\n", " C(\\lambda) = \\frac 1n \\sum_{k=1}^n [y_k - \\mathbf{x}_k^T \\widehat{\\boldsymbol{\\beta}}_k(\\lambda)]^2.\n", "\\end{eqnarray*}\n", "However, computing $n$ ridge solutions $\\widehat{\\boldsymbol{\\beta}}_k(\\lambda)$ is expensive. Show that, using the SVD $\\mathbf{X}=\\mathbf{U} \\boldsymbol{\\Sigma} \\mathbf{V}^T$, \n", "\\begin{eqnarray*}\n", " C(\\lambda) = \\frac 1n \\sum_{k=1}^n \\left[ \\frac{y_k - \\sum_{j=1}^r u_{kj} \\tilde y_j \\left( \\frac{\\sigma_j^2}{\\sigma_j^2 + \\lambda} \\right)}{1 - \\sum_{j=1}^r u_{kj}^2 \\left( \\frac{\\sigma_j^2}{\\sigma_j^2 + \\lambda} \\right)} \\right]^2,\n", "\\end{eqnarray*}\n", "where $\\tilde{\\mathbf{y}} = \\mathbf{U}^T \\mathbf{y}$. Therefore, in practice, we only need to compute the SVD of $\\mathbf{X}$ and then find the optimal value of $\\lambda$ that minimizes the leave-one-out (LOO) cross-validation error." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q22 ([**Factor analysis**](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.FactorAnalysis.html))\n", "\n", "Let $\\mathbf{y}_1, \\ldots, \\mathbf{y}_n \\in \\mathbb{R}^p$ be iid samples from a multivariate normal distribution $N(\\mathbf{0}_p, \\mathbf{F} \\mathbf{F}' + \\mathbf{D})$, where $\\mathbf{F} \\in \\mathbb{R}^{p \\times r}$ and $\\mathbf{D} \\in \\mathbb{R}^{p \\times p}$ is a diagonal matrix with positive entries. We estimate the factor matrix $\\mathbf{F}$ and the diagonal matrix $\\mathbf{D}$ by maximizing the log-likelihood function\n", "$$\n", "\\ell(\\mathbf{F}, \\mathbf{D}) = - \\frac n 2 \\ln \\det (\\mathbf{F} \\mathbf{F}' + \\mathbf{D}) - \\frac n2 \\text{tr} \\left[(\\mathbf{F} \\mathbf{F}' + \\mathbf{D})^{-1} \\mathbf{S} \\right] - \\frac{np}{2} \\ln 2\\pi,\n", "$$\n", "where $\\mathbf{S} = n^{-1} \\sum_{i=1}^n \\mathbf{y}_i \\mathbf{y}_i'$.\n", "\n", "1. We first show that, for fixed $\\mathbf{D}$, we can find the maximizer $\\mathbf{F}$ explicitly using SVD by the following steps. \n", " - Step 1: Take the derivative with respect to $\\mathbf{F}$ and set it to 0 to obtain the first-order optimality condition. \n", " - Step 2: Reparameterize $\\mathbf{H} = \\mathbf{D}^{-1/2} \\mathbf{F}$ and $\\tilde{\\mathbf{S}} = \\mathbf{D}^{-1/2} \\mathbf{S} \\mathbf{D}^{-1/2}$, and express the first-order optimality condition in terms of $\\mathbf{H}$ and $\\tilde{\\mathbf{S}}$. \n", " - Step 3: Let $\\mathbf{H} = \\mathbf{U} \\boldsymbol{\\Sigma} \\mathbf{V}'$ be its SVD. Show that the columns of $\\mathbf{U}$ must be $r$ eigenvectors of $\\tilde{\\mathbf{S}}$. 
\n", " - Step 4: Identify which $r$ eigenvectors of $\\tilde{\\mathbf{S}}$ to use in $\\mathbf{U}$ and then derive the solution to $\\mathbf{F}$. \n", " \n", "2. Show that, for fixed $\\mathbf{F}$, we can find the maximizer $\\mathbf{D}$ explicitly. (Hint: first-order optimality condition.)\n", " \n", "Combining 1 and 2, a natural algorithm for finding the MLE of factor analysis model is to alternately update $\\mathbf{F}$ and $\\mathbf{D}$ until convergence. " ] } ], "metadata": { "@webio": { "lastCommId": null, "lastKernelId": null }, "hide_input": false, "kernelspec": { "display_name": "Julia 1.8.0", "language": "julia", "name": "julia-1.8" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.8.0" }, "toc": { "colors": { "hover_highlight": "#DAA520", "running_highlight": "#FF0000", "selected_highlight": "#FFD700" }, "moveMenuLeft": true, "nav_menu": { "height": "87px", "width": "252px" }, "navigate_menu": true, "number_sections": true, "sideBar": true, "skip_h1_title": true, "threshold": 4, "toc_cell": false, "toc_position": { "height": "447px", "left": "0px", "right": "842.667px", "top": "32.5625px", "width": "148px" }, "toc_section_display": "block", "toc_window_display": true, "widenNotebook": false } }, "nbformat": 4, "nbformat_minor": 4 }