{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Biostat 257 Homework 6\n", "\n", "**This homework is optional. Do it if you want to get hands-on experience with derivation and implementation of MM algorithm. I am glad to answer any questions you have.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again we continue with the linear mixed effects model (LMM)\n", "$$\n", " \\mathbf{Y}_i = \\mathbf{X}_i \\boldsymbol{\\beta} + \\mathbf{Z}_i \\boldsymbol{\\gamma} + \\boldsymbol{\\epsilon}_i, \\quad i=1,\\ldots,n,\n", "$$\n", "where \n", "- $\\mathbf{Y}_i \\in \\mathbb{R}^{n_i}$ is the response vector of $i$-th individual, \n", "- $\\mathbf{X}_i \\in \\mathbb{R}^{n_i \\times p}$ is the fixed effects predictor matrix of $i$-th individual, \n", "- $\\mathbf{Z}_i \\in \\mathbb{R}^{n_i \\times q}$ is the random effects predictor matrix of $i$-th individual, \n", "- $\\boldsymbol{\\epsilon}_i \\in \\mathbb{R}^{n_i}$ are multivariate normal $N(\\mathbf{0}_{n_i},\\sigma^2 \\mathbf{I}_{n_i})$, \n", "- $\\boldsymbol{\\beta} \\in \\mathbb{R}^p$ are fixed effects, and \n", "- $\\boldsymbol{\\gamma} \\in \\mathbb{R}^q$ are random effects assumed to be $N(\\mathbf{0}_q, \\boldsymbol{\\Sigma}_{q \\times q}$) independent of $\\boldsymbol{\\epsilon}_i$.\n", "\n", "The log-likelihood of the $i$-th datum $(\\mathbf{y}_i, \\mathbf{X}_i, \\mathbf{Z}_i)$ is \n", "$$\n", " \\ell_i(\\boldsymbol{\\beta}, \\mathbf{L}, \\sigma_0^2) = - \\frac{n_i}{2} \\log (2\\pi) - \\frac{1}{2} \\log \\det \\boldsymbol{\\Omega}_i - \\frac{1}{2} (\\mathbf{y} - \\mathbf{X}_i \\boldsymbol{\\beta})^T \\boldsymbol{\\Omega}_i^{-1} (\\mathbf{y} - \\mathbf{X}_i \\boldsymbol{\\beta}),\n", "$$\n", "where\n", "$$\n", " \\boldsymbol{\\Omega}_i = \\sigma^2 \\mathbf{I}_{n_i} + \\mathbf{Z}_i \\boldsymbol{\\Sigma} \\mathbf{Z}_i^T.\n", "$$\n", "Given $m$ independent data points $(\\mathbf{y}_i, \\mathbf{X}_i, \\mathbf{Z}_i)$, $i=1,\\ldots,m$, we seek the maximum likelihood estimate (MLE) by maximizing the log-likelihood\n", "$$\n", "\\ell(\\boldsymbol{\\beta}, \\boldsymbol{\\Sigma}, \\sigma_0^2) = \\sum_{i=1}^m \\ell_i(\\boldsymbol{\\beta}, \\boldsymbol{\\Sigma}, \\sigma_0^2).\n", "$$\n", "\n", "In HW4 and HW5, we considered the nonlinear programming (NLP) and EM algorithm for optimization. In this assignment, we derive and implement an **minorization-maximization (MM) algorithm** for the same problem." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Q1. (20 pts) Convex matrix functions\n", "\n", "We say a matrix-valued function $f$ is (matrix) convex if\n", "$$\n", "f[\\lambda \\mathbf{A} + (1 - \\lambda) \\mathbf{B}] \\preceq \\lambda f(\\mathbf{A}) + (1 - \\lambda) f(\\mathbf{B})\n", "$$\n", "for all $\\mathbf{A}$, $\\mathbf{B}$, and $\\lambda \\in (0, 1)$. \n", "\n", "1. Show that the matrix fractional function\n", "$$\n", "f(\\mathbf{A}, \\mathbf{B}) = \\mathbf{A}^T \\mathbf{B}^{-1} \\mathbf{A}\n", "$$\n", "is jointly convex in $m \\times n$ matrix $\\mathbf{A}$ and $m \\times m$ positive definite matrix $\\mathbf{B}$. \n", "\n", "2. Show that the log determinant function\n", "$$\n", "f(\\mathbf{B}) = \\log \\det \\mathbf{B}\n", "$$\n", "is concave on the set of positive definite matrix." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Q2. 
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Q2. (20 pts) MM derivation - minorization step\n",
    "\n",
    "Let the covariance matrix of the $i$-th datum be\n",
    "$$\n",
    "\\boldsymbol{\\Omega}_i = \\mathbf{Z}_i \\boldsymbol{\\Sigma} \\mathbf{Z}_i^T + \\sigma^2 \\mathbf{I}_{n_i}\n",
    "$$\n",
    "and let $\\boldsymbol{\\Omega}_i^{(t)}$ be the covariance matrix evaluated at the current parameter iterate $(\\boldsymbol{\\Sigma}^{(t)}, \\sigma^{2(t)})$:\n",
    "$$\n",
    "\\boldsymbol{\\Omega}_i^{(t)} = \\mathbf{Z}_i \\boldsymbol{\\Sigma}^{(t)} \\mathbf{Z}_i^T + \\sigma^{2(t)} \\mathbf{I}_{n_i}.\n",
    "$$\n",
    "\n",
    "1. From Q1.1, show that\n",
    "\\begin{eqnarray*}\n",
    "& & \\boldsymbol{\\Omega}_i^{(t)} \\boldsymbol{\\Omega}_i^{-1} \\boldsymbol{\\Omega}_i^{(t)} \\\\\n",
    "&\\preceq& \\left( \\mathbf{Z}_i \\boldsymbol{\\Sigma}^{(t)} \\mathbf{Z}_i^T \\right) \\left( \\mathbf{Z}_i \\boldsymbol{\\Sigma} \\mathbf{Z}_i^T \\right)^+ \\left( \\mathbf{Z}_i \\boldsymbol{\\Sigma}^{(t)} \\mathbf{Z}_i^T \\right) + \\frac{\\sigma^{4(t)}}{\\sigma^2} \\mathbf{I}_{n_i} \\\\\n",
    "&=& \\left( \\mathbf{Z}_i \\boldsymbol{\\Sigma}^{(t)} \\mathbf{Z}_i^T \\mathbf{Z}_i^{T+} \\right) \\boldsymbol{\\Sigma}^{-1} \\left( \\mathbf{Z}_i^+ \\mathbf{Z}_i \\boldsymbol{\\Sigma}^{(t)} \\mathbf{Z}_i^T \\right) + \\frac{\\sigma^{4(t)}}{\\sigma^2} \\mathbf{I}_{n_i}.\n",
    "\\end{eqnarray*}\n",
    "Thus, writing $\\boldsymbol{\\Omega}_i^{-1} = \\boldsymbol{\\Omega}_i^{-(t)} \\left( \\boldsymbol{\\Omega}_i^{(t)} \\boldsymbol{\\Omega}_i^{-1} \\boldsymbol{\\Omega}_i^{(t)} \\right) \\boldsymbol{\\Omega}_i^{-(t)}$,\n",
    "\\begin{eqnarray*}\n",
    "& & - \\frac 12 (\\mathbf{y}_i - \\mathbf{X}_i \\boldsymbol{\\beta})^T \\boldsymbol{\\Omega}_i^{-1} (\\mathbf{y}_i - \\mathbf{X}_i \\boldsymbol{\\beta}) \\\\\n",
    "&\\ge& - \\frac 12 (\\mathbf{y}_i - \\mathbf{X}_i \\boldsymbol{\\beta})^T \\boldsymbol{\\Omega}_i^{-(t)} \\left( \\mathbf{Z}_i \\boldsymbol{\\Sigma}^{(t)} \\mathbf{Z}_i^T \\mathbf{Z}_i^{T+} \\right) \\boldsymbol{\\Sigma}^{-1} \\left( \\mathbf{Z}_i^+ \\mathbf{Z}_i \\boldsymbol{\\Sigma}^{(t)} \\mathbf{Z}_i^T \\right) \\boldsymbol{\\Omega}_i^{-(t)} (\\mathbf{y}_i - \\mathbf{X}_i \\boldsymbol{\\beta}) \\\\\n",
    "& & - \\frac{\\sigma^{4(t)}}{2\\sigma^2} (\\mathbf{y}_i - \\mathbf{X}_i \\boldsymbol{\\beta})^T \\boldsymbol{\\Omega}_i^{-2(t)} (\\mathbf{y}_i - \\mathbf{X}_i \\boldsymbol{\\beta}).\n",
    "\\end{eqnarray*}\n",
    "\n",
    "2. From Q1.2, show that\n",
    "$$\n",
    "- \\frac 12 \\log \\det \\boldsymbol{\\Omega} \\ge - \\frac 12 \\log \\det \\boldsymbol{\\Omega}^{(t)} - \\frac 12 \\operatorname{tr} [\\boldsymbol{\\Omega}^{-(t)} (\\boldsymbol{\\Omega} - \\boldsymbol{\\Omega}^{(t)})].\n",
    "$$\n",
    "Hint: supporting hyperplane inequality for a concave function.\n",
    "\n",
    "3. Combining 1 and 2, we obtain a minorization function\n",
    "\\begin{eqnarray*}\n",
    "g(\\boldsymbol{\\Omega}, \\sigma^2 \\mid \\boldsymbol{\\Omega}^{(t)}, \\sigma^{2(t)}) &=& \\sum_i g_i(\\boldsymbol{\\Omega}, \\sigma^2 \\mid \\boldsymbol{\\Omega}^{(t)}, \\sigma^{2(t)}) \\\\\n",
    " &=& - \\frac 12 \\sum_i \\operatorname{tr} (\\mathbf{Z}_i^T \\boldsymbol{\\Omega}_i^{-(t)} \\mathbf{Z}_i \\boldsymbol{\\Sigma}) - \\frac 12 \\sum_i \\mathbf{r}_i^{(t)T} \\boldsymbol{\\Sigma}^{-1} \\mathbf{r}_i^{(t)} \\\\\n",
    " & & - \\frac{\\sigma^2}{2} \\sum_i \\operatorname{tr} \\boldsymbol{\\Omega}_i^{-(t)} - \\frac{\\sigma^{4(t)}}{2\\sigma^2} \\sum_i (\\mathbf{y}_i - \\mathbf{X}_i \\boldsymbol{\\beta})^T \\boldsymbol{\\Omega}_i^{-2(t)} (\\mathbf{y}_i - \\mathbf{X}_i \\boldsymbol{\\beta}) + c^{(t)}\n",
    "\\end{eqnarray*}\n",
    "for the LMM log-likelihood, where $c^{(t)}$ is a constant irrelevant to the optimization and\n",
    "\\begin{eqnarray*}\n",
    "\\mathbf{r}_i^{(t)} &=& \\mathbf{Z}_i^+ \\mathbf{Z}_i \\boldsymbol{\\Sigma}^{(t)} \\mathbf{Z}_i^T \\boldsymbol{\\Omega}_i^{-(t)} (\\mathbf{y}_i - \\mathbf{X}_i \\boldsymbol{\\beta}) \\\\\n",
    "&=& \\boldsymbol{\\Sigma}^{(t)} \\mathbf{Z}_i^T \\boldsymbol{\\Omega}_i^{-(t)} (\\mathbf{y}_i - \\mathbf{X}_i \\boldsymbol{\\beta}).\n",
    "\\end{eqnarray*}\n",
    "The second equality uses $\\mathbf{Z}_i^+ \\mathbf{Z}_i = \\mathbf{I}_q$, i.e., $\\mathbf{Z}_i$ has full column rank. A numerical check of this minorization is sketched in the next cell."
   ]
  },
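  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a sanity check on this derivation, one can translate $\\ell_i$ and $g_i$ into code and verify that $\\ell_i - g_i$ (with the constant $c^{(t)}$ dropped) is smallest at the current iterate, as a minorization requires. Below is a minimal sketch along these lines; the simulated dimensions and all variable and function names are illustrative only, not part of the required solution.\n",
    "\n",
    "```julia\n",
    "using LinearAlgebra, Random\n",
    "\n",
    "Random.seed!(257)\n",
    "ni, p, q = 10, 3, 2\n",
    "Zi, Xi = randn(ni, q), randn(ni, p)\n",
    "beta   = randn(p)\n",
    "yi     = Xi * beta + randn(ni)\n",
    "\n",
    "# per-datum log-likelihood, dropping the additive constant -ni/2 * log(2pi)\n",
    "function loglik_i(Sigma, sigma2)\n",
    "    Omega = Zi * Sigma * Zi' + sigma2 * I\n",
    "    r = yi - Xi * beta\n",
    "    -logdet(Symmetric(Omega)) / 2 - dot(r, Omega \\ r) / 2\n",
    "end\n",
    "\n",
    "# surrogate g_i from part 3, dropping the constant c^(t)\n",
    "function surrogate_i(Sigma, sigma2, Sigmat, sigma2t)\n",
    "    Omegat = Zi * Sigmat * Zi' + sigma2t * I\n",
    "    r  = yi - Xi * beta\n",
    "    wr = Omegat \\ r                 # Omega_i^{-(t)} (y_i - X_i beta)\n",
    "    ri = Sigmat * (Zi' * wr)        # r_i^(t)\n",
    "    -tr(Zi' * (Omegat \\ Zi) * Sigma) / 2 - dot(ri, Sigma \\ ri) / 2 -\n",
    "        sigma2 * tr(inv(Omegat)) / 2 - sigma2t^2 / (2sigma2) * dot(wr, wr)\n",
    "end\n",
    "\n",
    "# minorization check: loglik - surrogate should be minimized at the current iterate\n",
    "Sigmat  = let T = randn(q, q); T * T' + I; end\n",
    "sigma2t = 1.5\n",
    "gap(Sigma, sigma2) = loglik_i(Sigma, sigma2) - surrogate_i(Sigma, sigma2, Sigmat, sigma2t)\n",
    "Sigma1  = let T = randn(q, q); T * T' + I; end\n",
    "@show gap(Sigmat, sigma2t) <= gap(Sigma1, 2.0) + 1e-10\n",
    "```"
   ]
  },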
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Q3. (20 pts) MM derivation - maximization step\n",
    "\n",
    "In the maximization step of the MM algorithm, we maximize the minorization function $g$. It turns out there are analytical updates for the parameters $\\boldsymbol{\\beta}$, $\\boldsymbol{\\Sigma}$, and $\\sigma^2$.\n",
    "\n",
    "1. Write down the analytical update of $\\boldsymbol{\\beta}$.\n",
    "\n",
    "2. Write down the analytical update of $\\sigma^2$.\n",
    "\n",
    "3. To update $\\boldsymbol{\\Sigma}$, we set the gradient to zero:\n",
    "$$\n",
    "\\frac{\\partial}{\\partial \\boldsymbol{\\Sigma}} g(\\boldsymbol{\\Omega}, \\sigma^2 \\mid \\boldsymbol{\\Omega}^{(t)}, \\sigma^{2(t)}) = - \\frac 12 \\sum_i \\mathbf{Z}_i^T \\boldsymbol{\\Omega}_i^{-(t)} \\mathbf{Z}_i + \\frac 12 \\boldsymbol{\\Sigma}^{-1} \\left( \\sum_i \\mathbf{r}_i^{(t)} \\mathbf{r}_i^{(t)T} \\right) \\boldsymbol{\\Sigma}^{-1} = \\mathbf{0}_{q \\times q}.\n",
    "$$\n",
    "Find an analytical solution to the estimating equation\n",
    "$$\n",
    "\\boldsymbol{\\Sigma}^{-1} \\left( \\sum_i \\mathbf{r}_i^{(t)} \\mathbf{r}_i^{(t)T} \\right) \\boldsymbol{\\Sigma}^{-1} = \\sum_i \\mathbf{Z}_i^T \\boldsymbol{\\Omega}_i^{-(t)} \\mathbf{Z}_i.\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Q4. (50 pts) Implementation\n",
    "\n",
    "Mimic the code in HW4 and HW5 to implement the MM algorithm for finding the MLE of the LMM.\n",
    "\n",
    "1. Break the coding task into pieces: an objective evaluator (10 pts), a single MM iteration (20 pts), and a `fit` function for running MM iterations (10 pts). One possible skeleton is sketched in the cell after this question.\n",
    "\n",
    "2. Modularize your code into small functions. Test the correctness and efficiency of each function separately.\n",
    "\n",
    "3. Test the MM algorithm on the same data (1000 individuals with 1500-2000 observations per individual). Make sure it achieves the same log-likelihood as the EM or NLP solutions. (10 pts)"
   ]
  },
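  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One possible way to organize the implementation, mirroring the modular structure suggested above, is sketched below. The type name `LmmObs`, the field and function names, and the convergence criterion are illustrative placeholders rather than a required interface, and the actual parameter updates (your answers to Q3) are left as comments.\n",
    "\n",
    "```julia\n",
    "using LinearAlgebra\n",
    "\n",
    "# one possible container for the i-th datum; add preallocated working arrays as needed\n",
    "struct LmmObs{T <: AbstractFloat}\n",
    "    y :: Vector{T}\n",
    "    X :: Matrix{T}\n",
    "    Z :: Matrix{T}\n",
    "end\n",
    "\n",
    "# objective evaluator (Q4 part 1): log-likelihood contribution of one datum\n",
    "function logl(obs::LmmObs, beta, Sigma, sigma2)\n",
    "    Omega = obs.Z * Sigma * obs.Z' + sigma2 * I   # naive; exploit structure in your version\n",
    "    r = obs.y - obs.X * beta\n",
    "    -length(obs.y) * log(2pi) / 2 - logdet(Symmetric(Omega)) / 2 - dot(r, Omega \\ r) / 2\n",
    "end\n",
    "\n",
    "# a single MM iteration (Q4 part 2)\n",
    "function mm_step(data::Vector{<:LmmObs}, beta, Sigma, sigma2)\n",
    "    # 1. form r_i^(t) = Sigma^(t) Z_i' Omega_i^{-(t)} (y_i - X_i beta) for each datum\n",
    "    # 2. update beta, sigma2, and Sigma with the analytical formulas derived in Q3\n",
    "    return beta, Sigma, sigma2\n",
    "end\n",
    "\n",
    "# driver (Q4 part 3): iterate MM steps until the log-likelihood stabilizes\n",
    "function fit_mm(data::Vector{<:LmmObs}, beta, Sigma, sigma2;\n",
    "                maxiter::Int = 10_000, reltol = 1e-12)\n",
    "    obj = sum(logl(o, beta, Sigma, sigma2) for o in data)\n",
    "    for iter in 1:maxiter\n",
    "        beta, Sigma, sigma2 = mm_step(data, beta, Sigma, sigma2)\n",
    "        objnew = sum(logl(o, beta, Sigma, sigma2) for o in data)\n",
    "        abs(objnew - obj) < reltol * (abs(obj) + 1) && break\n",
    "        obj = objnew\n",
    "    end\n",
    "    beta, Sigma, sigma2\n",
    "end\n",
    "```\n",
    "\n",
    "Once `mm_step` implements the Q3 updates, the log-likelihood returned by `logl` at convergence can be compared directly against the EM (HW5) and NLP (HW4) solutions, as asked in part 3."
   ]
  },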
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Q5. (10 pts) MM vs EM vs Newton-type algorithms\n",
    "\n",
    "Using the same starting point and convergence criterion, contrast the MM algorithm with the EM algorithm (HW5) and the Newton-type algorithms (HW4) in terms of convergence rate.\n",
    "\n",
    "Keep in mind that comparisons of algorithms are highly problem specific; conclusions drawn on one problem usually cannot be generalized to other problems."
   ]
  }
 ],
 "metadata": { "@webio": { "lastCommId": null, "lastKernelId": null }, "kernelspec": { "display_name": "Julia 1.4.1", "language": "julia", "name": "julia-1.4" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.4.1" }, "toc": { "colors": { "hover_highlight": "#DAA520", "running_highlight": "#FF0000", "selected_highlight": "#FFD700" }, "moveMenuLeft": true, "nav_menu": { "height": "87px", "width": "252px" }, "navigate_menu": true, "number_sections": true, "sideBar": true, "skip_h1_title": true, "threshold": 4, "toc_cell": false, "toc_section_display": "block", "toc_window_display": false, "widenNotebook": false } },
 "nbformat": 4,
 "nbformat_minor": 4
}