{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Biostat M280 Homework 6\n",
    "\n",
    "**Due ~~June 7 @ 11:59PM~~ June 9 @ 11:59PM**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Consider again the MLE of the Dirichlet-multinomial model. In [HW5](http://hua-zhou.github.io/teaching/biostatm280-2019spring/hw/hw5/hw05.html), we worked out a Newton's method. In this homework, we explore the MM and EM approach."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Q1\n",
    "\n",
    "Show that, given iid observations $\\mathbf{x}_1,\\ldots,\\mathbf{x}_n$, the log-likelihood can be written as\n",
    "$$\n",
    "L(\\alpha) = \\sum_{i=1}^n \\ln \\binom{|\\mathbf{x}_i|}{\\mathbf{x}_i}\n",
    "+\\sum_{i=1}^n \\sum_{j=1}^d \\sum_{k=0}^{x_{ij}-1} \\ln(\\alpha_j+k) - \\sum_{i=1}^n \\sum_{k=0}^{|\\mathbf{x}_i|-1} \\ln(|\\alpha|+k).\n",
    "$$\n",
    "Hint: $\\Gamma(a + k) / \\Gamma(a) = a (a + 1) \\cdots (a+k-1)$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Q2\n",
    "\n",
    "Suppose $(P_1,\\ldots,P_d) \\in \\Delta_d = \\{\\mathbf{p}: p_i \\ge 0, \\sum_i p_i = 1\\}$ follows a Dirichlet distribution with parameter $\\alpha = (\\alpha_1,\\ldots,\\alpha_d)$. Show that\n",
    "$$\n",
    "\t\\mathbf{E}(\\ln P_j) = \\Psi(\\alpha_j) - \\Psi(|\\alpha|),\n",
    "$$\n",
    "where $\\Psi(z) = \\Gamma'(z)/\\Gamma(z)$ is the digamma function and $|\\alpha| = \\sum_{j=1}^d \\alpha_j$. Hint: Differentiate the identity \n",
    "$$\n",
    "1 = \\int_{\\Delta_d} \\frac{\\Gamma(|\\alpha|)}{\\prod_{j=1}^d \\Gamma(\\alpha_j)} \\prod_{j=1}^d p_j^{\\alpha_j-1} \\, d\\mathbf{p}.\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Q3\n",
    "\n",
    "The admixture representation of the Dirichlet-multinomial distribution suggests that we can treat the unobserved multinomial parameters $\\mathbf{p}_1,\\ldots,\\mathbf{p}_n$ as missing data and derive an EM algorithm. Show that the Q function is\n",
    "$$\n",
    "    Q(\\alpha|\\alpha^{(t)}) = \n",
    "\\sum_{j=1}^d \\sum_{i=1}^n \\alpha_j \\left[\\Psi(x_{ij}+\\alpha_j^{(t)}) - \\Psi(|\\mathbf{x}_i|+|\\alpha^{(t)}|)\\right] - n \\sum_{j=1}^d \\ln \\Gamma(\\alpha_j) + n \\ln \\Gamma(|\\alpha|) + c^{(t)},\n",
    "$$\n",
    "where $c^{(t)}$ is a constant irrelevant to optimization. Explain why it is not easy to maximize the Q function."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Q4\n",
    "\n",
    "We derive an MM algorithm for maximing $L$. Consider the formulation of the log-likelihood that contains terms $\\ln (\\alpha_j + k)$ and $- \\ln (|\\alpha|+k)$. Applying Jensen's inequality to the concave term $\\ln (\\alpha_j + k)$ and supporting hyperplane inequality to the convex term $- \\ln (|\\alpha|+k)$, show that a minorizing function to $L(\\alpha)$ is\n",
    "$$\n",
    "\tg(\\alpha|\\alpha^{(t)}) = - \\sum_{k=0}^{\\max_i |\\mathbf{x}_i|-1} \\frac{r_k}{|\\alpha^{(t)}|+k} |\\alpha| + \\sum_{j=1}^d \\sum_{k=0}^{\\max_i x_{ij}-1} \\frac{s_{jk} \\alpha_j^{(t)}}{\\alpha_j^{(t)}+k} \\ln \\alpha_j + c^{(t)},\n",
    "$$\n",
    "where $s_{jk} = \\sum_{i=1}^n 1_{\\{x_{ij} > k\\}}$, $r_k = \\sum_{i=1}^n 1_{\\{|\\mathbf{x}_i| > k\\}}$, and  $c^{(t)}$ is a constant irrelevant to optimization. Maximizing the surrogate function $g(\\alpha|\\alpha^{(t)})$ is trivial since $\\alpha_j$ are separated. Show that the MM updates are\n",
    "$$\n",
    "\t\\alpha_j^{(t+1)} = \\frac{\\sum_{k=0}^{\\max_i x_{ij}-1} \\frac{s_{jk}}{\\alpha_j^{(t)}+k}}{\\sum_{k=0}^{\\max_i |\\mathbf{x}_i|-1} \\frac{r_k}{|\\alpha^{(t)}|+k}} \\alpha_j^{(t)}, \\quad j=1,\\ldots,d.\n",
    "$$\n",
    "The quantities $s_{jk}$, $r_k$, $\\max_i x_{ij}$ and $\\max_i |\\mathbf{x}_i|$ only depend on data and can be pre-computed. Comment on whether the MM updates respect the parameter constraint $\\alpha_j>0$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Q5\n",
    "\n",
    "Write a function for finding MLE of Dirichlet-multinomial distribution given iid observations $\\mathbf{x}_1,\\ldots,\\mathbf{x}_n$, using MM algorithm."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dirmult_mm"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\"\"\"\n",
    "    dirmult_mm(X)\n",
    "\n",
    "Find the MLE of Dirichlet-multinomial distribution using MM algorithm.\n",
    "\n",
    "# Argument\n",
    "* `X`: an `d`-by-`n` matrix of counts; each column is one data point.\n",
    "\n",
    "# Optional argument  \n",
    "* `alpha0`: starting point. \n",
    "* `maxiters`: the maximum allowable Newton iterations (default 100). \n",
    "* `tolfun`: the tolerance for  relative change in objective values (default 1e-6). \n",
    "\n",
    "# Output\n",
    "# Output\n",
    "* `logl`: the log-likelihood at MLE.   \n",
    "* `niter`: the number of iterations performed.\n",
    "# `α`: the MLE.\n",
    "* `∇`: the gradient at MLE. \n",
    "* `obsinfo`: the observed information matrix at MLE. \n",
    "\"\"\"\n",
    "function dirmult_mm(\n",
    "    X::AbstractMatrix; \n",
    "    α0::Vector = dirmult_mom(X), \n",
    "    maxiters::Int = 100, \n",
    "    tolfun = 1e-6\n",
    "    )\n",
    "    \n",
    "    # your code here\n",
    "    \n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Q6\n",
    "\n",
    "Re-do [HW5 Q9](http://hua-zhou.github.io/teaching/biostatm280-2019spring/hw/hw5/hw05.html#Q9) using your new `dirmult_mm` function. Compare the number of iterations and run time by MM algorithm to those by Newton's method. Comment on the efficiency of Newton's algorithm vs MM algorithm for this problem."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Q7\n",
    "\n",
    "Finally let us re-consider the EM algorithm. The difficulty with the M step in EM algorithm can be remedied. Discuss how we can further minorize the $\\ln \\Gamma(|\\alpha|)$ term in the $Q$ function to produce a minorizing function with all $\\alpha_j$ separated. For this homework, you do **not** need to implement this EM-MM hybrid algorithm. Hint: $z \\mapsto \\ln \\Gamma(z)$ is a convex function."
   ]
  }
 ],
 "metadata": {
  "@webio": {
   "lastCommId": null,
   "lastKernelId": null
  },
  "kernelspec": {
   "display_name": "Julia 1.1.0",
   "language": "julia",
   "name": "julia-1.1"
  },
  "language_info": {
   "file_extension": ".jl",
   "mimetype": "application/julia",
   "name": "julia",
   "version": "1.1.0"
  },
  "toc": {
   "colors": {
    "hover_highlight": "#DAA520",
    "running_highlight": "#FF0000",
    "selected_highlight": "#FFD700"
   },
   "moveMenuLeft": true,
   "nav_menu": {
    "height": "65px",
    "width": "252px"
   },
   "navigate_menu": true,
   "number_sections": false,
   "sideBar": true,
   "skip_h1_title": true,
   "threshold": 4,
   "toc_cell": false,
   "toc_section_display": "block",
   "toc_window_display": false,
   "widenNotebook": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}