{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": "true" }, "source": [ "# Table of Contents\n", "

1  MM Algorithm (KL Chapter 12)
1.1  MM algorithm
1.2  MM example: t-distribution revisited
1.3  MM example: non-negative matrix factorization (NNMF)
1.4  MM example: Netflix and matrix completion
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# MM Algorithm (KL Chapter 12)\n", "\n", "## MM algorithm\n", "\n", "* Recall our picture for understanding the ascent property of EM\n", "\n", "* EM as a minorization-maximization (MM) algorithm\n", " * The $Q$ function constitutes a **minorizing** function of the objective function up to an additive constant\n", "$$\n", "\\begin{eqnarray*}\n", "\tL(\\theta) &\\ge& Q(\\theta|\\theta^{(t)}) + c^{(t)} \\quad \\text{for all } \\theta \\\\\n", "\tL(\\theta^{(t)}) &=& Q(\\theta^{(t)}|\\theta^{(t)}) + c^{(t)}\n", "\\end{eqnarray*}\n", "$$\n", " * **Maximizing** the $Q$ function generates an ascent iterate $\\theta^{(t+1)}$\n", "\n", "* Questions:\n", " 1. Is the EM principle limited to maximizing likelihood models?\n", " 2. Can we flip the picture and apply the same principle to **minimization** problems?\n", " 3. Is there any other way to produce such a surrogate function?\n", "\n", "* These thoughts lead to a powerful tool - the MM algorithm. \n", "**Lange, K.**, Hunter, D. R., and Yang, I. (2000). Optimization transfer using surrogate objective functions. _J. Comput. Graph. Statist._, 9(1):1-59. With discussion, and a rejoinder by Hunter and Lange.\n", "\n", "* For maximization of $f(\\theta)$ - **minorization-maximization (MM)** algorithm\n", " * **M**inorization step: Construct a surrogate function $g(\\theta|\\theta^{(t)})$ that satisfies\n", "$$\n", "\\begin{eqnarray*}\n", "\tf(\\theta) &\\ge& g(\\theta|\\theta^{(t)}) \\quad \\text{(dominance condition)} \\\\\n", "\tf(\\theta^{(t)}) &=& g(\\theta^{(t)}|\\theta^{(t)}) \\quad \\text{(tangent condition)}.\n", "\\end{eqnarray*}\n", "$$\n", " * **M**aximization step: \n", "$$\n", "\t\\theta^{(t+1)} = \\text{argmax} \\, g(\\theta|\\theta^{(t)}).\n", "$$\n", "\n", "* _Ascent_ property of the minorization-maximization algorithm\n", "$$\n", "f(\\theta^{(t)}) = g(\\theta^{(t)}|\\theta^{(t)}) \\le g(\\theta^{(t+1)}|\\theta^{(t)}) \\le f(\\theta^{(t+1)}).\n", "$$\n", "\n", "* EM is a special case of the minorization-maximization (MM) algorithm.\n", "\n", "* For minimization of $f(\\theta)$ - **majorization-minimization (MM)** algorithm\n", " * **M**ajorization step: Construct a surrogate function $g(\\theta|\\theta^{(t)})$ that satisfies\n", "$$\n", "\\begin{eqnarray*}\n", "\tf(\\theta) &\\le& g(\\theta|\\theta^{(t)}) \\quad \\text{(dominance condition)} \\\\\n", "\tf(\\theta^{(t)}) &=& g(\\theta^{(t)}|\\theta^{(t)}) \\quad \\text{(tangent condition)}.\n", "\\end{eqnarray*}\n", "$$\n", " * **M**inimization step: \n", "$$\n", "\t\\theta^{(t+1)} = \\text{argmin} \\, g(\\theta|\\theta^{(t)}).\n", "$$\n", "\n", "* Similarly we have the _descent_ property of the majorization-minimization algorithm.\n", "\n", "* Same convergence theory as EM.\n", "\n", "* Rationale of the MM principle for developing optimization algorithms\n", " * Free the derivation from the missing data structure.\n", " * Avoid matrix inversion.\n", " * Linearize an optimization problem.\n", " * Deal gracefully with certain equality and inequality constraints.\n", " * Turn a non-differentiable problem into a smooth problem.\n", " * Separate the parameters of a problem (perfect for massive, _fine-scale_ parallelization).\n", "\n", "* Generic methods for majorization and minorization - **inequalities**\n", " * Jensen's (information) inequality - EM algorithms\n", " * Supporting hyperplane property of a convex function\n", " * The Cauchy-Schwarz inequality - multidimensional scaling\n", " * Arithmetic-geometric mean inequality\n", " * Quadratic upper bound principle - Böhning and Lindsay\n", " * ...\n", "\n", "* Numerous examples in KL Chapter 12, or take Kenneth Lange's optimization course BIOMATH 210.\n", "\n", "* History: the name _MM algorithm_ originates from the discussion (by Xiaoli Meng) and rejoinder of the Lange, Hunter, Yang (2000) paper." ] }
, { "cell_type": "markdown", "metadata": {}, "source": [ "## MM example: t-distribution revisited\n", "\n", "* Recall that given iid data $\\mathbf{w}_1, \\ldots, \\mathbf{w}_n$ from the multivariate $t$-distribution $t_p(\\mu,\\Sigma,\\nu)$, the log-likelihood is\n", "\\begin{eqnarray*}\n", "\tL(\\mu, \\Sigma, \\nu) &=& - \\frac{np}{2} \\log (\\pi \\nu) + n \\left[ \\log \\Gamma \\left(\\frac{\\nu+p}{2}\\right) - \\log \\Gamma \\left(\\frac{\\nu}{2}\\right) \\right] \\\\\n", "\t&& - \\frac n2 \\log \\det \\Sigma + \\frac n2 (\\nu + p) \\log \\nu \\\\\n", "\t&& - \\frac{\\nu + p}{2} \\sum_{j=1}^n \\log [\\nu + \\delta (\\mathbf{w}_j,\\mu;\\Sigma) ].\n", "\\end{eqnarray*}\n", "\n", "* Early on we discussed the EM algorithm and its variants for maximizing this log-likelihood.\n", "\n", "* Since $t \\mapsto - \\log t$ is a convex function, we can invoke the supporting hyperplane inequality to minorize the terms $- \\log [\\nu + \\delta (\\mathbf{w}_j,\\mu;\\Sigma)]$:\n", "\\begin{eqnarray*}\n", " - \\log [\\nu + \\delta (\\mathbf{w}_j,\\mu;\\Sigma)] &\\ge& - \\log [\\nu^{(t)} + \\delta (\\mathbf{w}_j,\\mu^{(t)};\\Sigma^{(t)})] - \\frac{\\nu + \\delta (\\mathbf{w}_j,\\mu;\\Sigma) - \\nu^{(t)} - \\delta (\\mathbf{w}_j,\\mu^{(t)};\\Sigma^{(t)})}{\\nu^{(t)} + \\delta (\\mathbf{w}_j,\\mu^{(t)};\\Sigma^{(t)})} \\\\\n", " &=& - \\frac{\\nu + \\delta (\\mathbf{w}_j,\\mu;\\Sigma)}{\\nu^{(t)} + \\delta (\\mathbf{w}_j,\\mu^{(t)};\\Sigma^{(t)})} + c^{(t)},\n", "\\end{eqnarray*}\n", "where $c^{(t)}$ is a constant irrelevant to optimization.\n", "\n", "* This yields the minorizing function\n", "\\begin{eqnarray*}\n", " g(\\mu, \\Sigma, \\nu) &=& - \\frac{np}{2} \\log (\\pi \\nu) + n \\left[ \\log \\Gamma \\left(\\frac{\\nu+p}{2}\\right) - \\log \\Gamma \\left(\\frac{\\nu}{2}\\right) \\right] \\\\\n", "\t&& - \\frac n2 \\log \\det \\Sigma + \\frac n2 (\\nu + p) \\log \\nu \\\\\n", "\t&& - \\frac{\\nu + p}{2} \\sum_{j=1}^n \\frac{\\nu + \\delta (\\mathbf{w}_j,\\mu;\\Sigma)}{\\nu^{(t)} + \\delta (\\mathbf{w}_j,\\mu^{(t)};\\Sigma^{(t)})} + c^{(t)}.\n", "\\end{eqnarray*}\n", "Now we can deploy the block ascent strategy.\n", " * Fixing $\\nu$, the updates of $\\mu$ and $\\Sigma$ are the same as in the EM algorithm. \n", " * Fixing $\\mu$ and $\\Sigma$, we can update $\\nu$ by working on the observed log-likelihood directly. \n", "\n", "Overall this MM algorithm coincides with ECME. A quick numerical check of the minorization above is given below.\n", "\n", "* An alternative MM derivation leads to the optimal EM algorithm derived in Meng and van Dyk (1997). See [Wu and Lange (2010)](https://doi.org/10.1214/08-STS264)." ] }
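, { "cell_type": "markdown", "metadata": {}, "source": [ "A quick numerical sanity check of the supporting-hyperplane minorization above (a sketch with arbitrary made-up numbers), writing $u = \\nu + \\delta(\\mathbf{w}_j,\\mu;\\Sigma)$ and $u^{(t)} = \\nu^{(t)} + \\delta(\\mathbf{w}_j,\\mu^{(t)};\\Sigma^{(t)})$: the affine surrogate lies below $-\\log u$ everywhere (dominance) and touches it at $u^{(t)}$ (tangency).\n",
 "\n",
 "```julia\n",
 "# Supporting-hyperplane minorization of -log(u) at u_t (a made-up value):\n",
 "#   -log(u) >= -log(u_t) - (u - u_t) / u_t   for all u > 0, with equality at u = u_t.\n",
 "u_t = 5.0                                    # stands in for ν^(t) + δ^(t)\n",
 "minorizer(u) = -log(u_t) - (u - u_t) / u_t   # affine surrogate, tangent at u_t\n",
 "\n",
 "us = 0.1:0.01:20.0\n",
 "@show maximum(minorizer.(us) .+ log.(us))    # <= 0 up to roundoff (dominance condition)\n",
 "@show minorizer(u_t) + log(u_t)              # exactly 0 (tangent condition)\n",
 "```" ] }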
, { "cell_type": "markdown", "metadata": {}, "source": [ "## MM example: non-negative matrix factorization (NNMF)\n", "\n", "* Nonnegative matrix factorization (NNMF) was introduced by Lee and Seung ([1999](http://www.nature.com/nature/journal/v401/n6755/abs/401788a0.html), [2001](https://papers.nips.cc/paper/1861-algorithms-for-non-negative-matrix-factorization.pdf)) as an analog of principal components and vector quantization, with applications in data compression and clustering.\n", "\n", "* In mathematical terms, one approximates a data matrix $\\mathbf{X} \\in \\mathbb{R}^{m \\times n}$ with nonnegative entries $x_{ij}$ by a product of two low-rank matrices $\\mathbf{V} \\in \\mathbb{R}^{m \\times r}$ and $\\mathbf{W} \\in \\mathbb{R}^{r \\times n}$ with nonnegative entries $v_{ik}$ and $w_{kj}$. \n", "\n", "* Consider minimization of the squared Frobenius norm\n", "$$\n", "\tL(\\mathbf{V}, \\mathbf{W}) = \\|\\mathbf{X} - \\mathbf{V} \\mathbf{W}\\|_{\\text{F}}^2 = \\sum_i \\sum_j \\left( x_{ij} - \\sum_k v_{ik} w_{kj} \\right)^2, \\quad v_{ik} \\ge 0, w_{kj} \\ge 0,\n", "$$\n", "which should lead to a good factorization.\n", "\n", "* $L(\\mathbf{V}, \\mathbf{W})$ is not convex, but bi-convex. The strategy is to alternately update $\\mathbf{V}$ and $\\mathbf{W}$.\n", "\n", "* The key is the majorization, via convexity of the function $(x_{ij} - x)^2$, \n", "$$\n", "\t\\left( x_{ij} - \\sum_k v_{ik} w_{kj} \\right)^2 \\le \\sum_k \\frac{a_{ikj}^{(t)}}{b_{ij}^{(t)}} \\left( x_{ij} - \\frac{b_{ij}^{(t)}}{a_{ikj}^{(t)}} v_{ik} w_{kj} \\right)^2,\n", "$$\n", "where\n", "$$\n", "\ta_{ikj}^{(t)} = v_{ik}^{(t)} w_{kj}^{(t)}, \\quad b_{ij}^{(t)} = \\sum_k v_{ik}^{(t)} w_{kj}^{(t)}.\n", "$$\n", "\n", "* This suggests the alternating multiplicative updates\n", "$$\n", "\\begin{eqnarray*}\n", "\tv_{ik}^{(t+1)} &=& v_{ik}^{(t)} \\frac{\\sum_j x_{ij} w_{kj}^{(t)}}{\\sum_j b_{ij}^{(t)} w_{kj}^{(t)}} \\\\\n", "\tb_{ij}^{(t+1/2)} &=& \\sum_k v_{ik}^{(t+1)} w_{kj}^{(t)} \\\\\n", "\tw_{kj}^{(t+1)} &=& w_{kj}^{(t)} \\frac{\\sum_i x_{ij} v_{ik}^{(t+1)}}{\\sum_i b_{ij}^{(t+1/2)} v_{ik}^{(t+1)}}.\n", "\\end{eqnarray*}\n", "$$\n", "\n", "* The updates in matrix notation are extremely simple\n", "$$\n", "\\begin{eqnarray*}\n", "\t\\mathbf{B}^{(t)} &=& \\mathbf{V}^{(t)} \\mathbf{W}^{(t)} \\\\\n", "\t\\mathbf{V}^{(t+1)} &=& \\mathbf{V}^{(t)} \\odot (\\mathbf{X} \\mathbf{W}^{(t)T}) \\oslash (\\mathbf{B}^{(t)} \\mathbf{W}^{(t)T}) \\\\\n", "\t\\mathbf{B}^{(t+1/2)} &=& \\mathbf{V}^{(t+1)} \\mathbf{W}^{(t)} \\\\\n", "\t\\mathbf{W}^{(t+1)} &=& \\mathbf{W}^{(t)} \\odot (\\mathbf{V}^{(t+1)T} \\mathbf{X}) \\oslash (\\mathbf{V}^{(t+1)T} \\mathbf{B}^{(t+1/2)}),\n", "\\end{eqnarray*}\n", "$$\n", "where $\\odot$ denotes elementwise multiplication and $\\oslash$ denotes elementwise division. If we start with $v_{ik}, w_{kj} > 0$, the parameter iterates stay positive.\n", "\n", "* Monotonicity of the algorithm is free (from the MM principle). A small Julia sketch of these updates is given below.\n", "\n", "* Database #1 from the MIT Center for Biological and Computational Learning (CBCL, http://cbcl.mit.edu) reduces to a matrix X containing $m = 2,429$ gray-scale face images with $n = 19 \\times 19 = 361$ pixels per face. Each image (row) is scaled to have mean and standard deviation 0.25.\n", "\n", "* My implementation in around **2010**:\n", "\n", "CPU: Intel Core 2 @ 3.2GHz (4 cores), implemented using BLAS in the GNU Scientific Library (GSL) \n", "GPU: NVIDIA GeForce GTX 280 (240 cores), implemented using cuBLAS." ] }
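, { "cell_type": "markdown", "metadata": {}, "source": [ "A minimal sketch of the multiplicative updates above, run on a small random nonnegative matrix for illustration (not the CBCL face data; the rank, iteration limit, and tolerance are arbitrary choices). By the MM principle the objective decreases monotonically along the iterations.\n",
 "\n",
 "```julia\n",
 "using LinearAlgebra, Random\n",
 "\n",
 "# MM (multiplicative) updates for NNMF: minimize the squared Frobenius norm\n",
 "# of X - V * W over elementwise nonnegative V and W.\n",
 "function nnmf_mm(X, r; maxiter = 500, tol = 1e-6)\n",
 "    m, n = size(X)\n",
 "    V = rand(m, r)                      # strictly positive start keeps iterates positive\n",
 "    W = rand(r, n)\n",
 "    obj = norm(X - V * W)^2\n",
 "    for iter in 1:maxiter\n",
 "        B = V * W                       # B^(t)\n",
 "        V .*= (X * W') ./ (B * W')      # update V\n",
 "        B = V * W                       # B^(t+1/2)\n",
 "        W .*= (V' * X) ./ (V' * B)      # update W\n",
 "        objnew = norm(X - V * W)^2      # decreases monotonically (MM)\n",
 "        abs(obj - objnew) < tol * (abs(obj) + 1) && break\n",
 "        obj = objnew\n",
 "    end\n",
 "    return V, W, obj\n",
 "end\n",
 "\n",
 "Random.seed!(280)\n",
 "X = rand(100, 60)            # random nonnegative data, for illustration only\n",
 "V, W, obj = nnmf_mm(X, 5)\n",
 "@show obj\n",
 "```" ] }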
, { "cell_type": "markdown", "metadata": {}, "source": [ "## MM example: Netflix and matrix completion\n", "\n", "* Snapshot of the kind of data collected by Netflix: only 100,480,507 ratings (1.2% of the entries of the 480K-by-18K matrix) are observed.\n", "\n", "* Netflix challenge: impute the unobserved ratings for personalized recommendation.\n", "http://en.wikipedia.org/wiki/Netflix_Prize\n", "\n", "* **Matrix completion problem**. We observe a very sparse matrix $\\mathbf{Y} = (y_{ij})$ and want to impute all the missing entries. This is possible only when the matrix is structured, e.g., of _low rank_.\n", "\n", "* Let $\\Omega = \\{(i,j): \\text{observed entries}\\}$ index the set of observed entries and $P_\\Omega(\\mathbf{M})$ denote the projection of a matrix $\\mathbf{M}$ onto $\\Omega$. The problem\n", "$$\n", "\t\\min_{\\text{rank}(\\mathbf{X}) \\le r} \\frac{1}{2} \\| P_\\Omega(\\mathbf{Y}) - P_\\Omega(\\mathbf{X})\\|_{\\text{F}}^2 = \\frac{1}{2} \\sum_{(i,j) \\in \\Omega} (y_{ij} - x_{ij})^2\n", "$$\n", "unfortunately is non-convex and difficult.\n", "\n", "* _Convex relaxation_: [Mazumder, Hastie, Tibshirani (2010)](https://web.stanford.edu/~hastie/Papers/mazumder10a.pdf)\n", "$$\n", "\t\\min_{\\mathbf{X}} f(\\mathbf{X}) = \\frac{1}{2} \\| P_\\Omega(\\mathbf{Y}) - P_\\Omega(\\mathbf{X})\\|_{\\text{F}}^2 + \\lambda \\|\\mathbf{X}\\|_*,\n", "$$\n", "where $\\|\\mathbf{X}\\|_* = \\|\\sigma(\\mathbf{X})\\|_1 = \\sum_i \\sigma_i(\\mathbf{X})$ is the nuclear norm.\n", "\n", "* **Majorization step**:\n", "$$\n", "\\begin{eqnarray*}\n", "\tf(\\mathbf{X}) &=& \\frac{1}{2} \\sum_{(i,j) \\in \\Omega} (y_{ij} - x_{ij})^2 + \\frac 12 \\sum_{(i,j) \\notin \\Omega} 0 + \\lambda \\|\\mathbf{X}\\|_* \\\\\n", "\t&\\le& \\frac{1}{2} \\sum_{(i,j) \\in \\Omega} (y_{ij} - x_{ij})^2 + \\frac{1}{2} \\sum_{(i,j) \\notin \\Omega} (x_{ij}^{(t)} - x_{ij})^2 + \\lambda \\|\\mathbf{X}\\|_* \\\\\n", "\t&=& \\frac{1}{2} \\| \\mathbf{X} - \\mathbf{Z}^{(t)}\\|_{\\text{F}}^2 + \\lambda \\|\\mathbf{X}\\|_* \\\\\n", "\t&=& g(\\mathbf{X}|\\mathbf{X}^{(t)}),\n", "\\end{eqnarray*}\n", "$$\n", "where $\\mathbf{Z}^{(t)} = P_\\Omega(\\mathbf{Y}) + P_{\\Omega^\\perp}(\\mathbf{X}^{(t)})$, i.e., \"fill in the missing entries by the current imputation\".\n", "\n", "* **Minimization step**: Rewrite the surrogate function\n", "$$\n", "\\begin{eqnarray*}\n", "\tg(\\mathbf{X}|\\mathbf{X}^{(t)}) = \\frac{1}{2} \\|\\mathbf{X}\\|_{\\text{F}}^2 - \\text{tr}(\\mathbf{X}^T \\mathbf{Z}^{(t)}) + \\frac{1}{2} \\|\\mathbf{Z}^{(t)}\\|_{\\text{F}}^2 + \\lambda \\|\\mathbf{X}\\|_*.\n", "\\end{eqnarray*}\n", "$$\n", "Let $\\sigma_i$ be the (ordered) singular values of $\\mathbf{X}$ and $\\omega_i$ be the (ordered) singular values of $\\mathbf{Z}^{(t)}$. Observe\n", "$$\n", "\\begin{eqnarray*}\n", "\t\\|\\mathbf{X}\\|_{\\text{F}}^2 &=& \\|\\sigma(\\mathbf{X})\\|_2^2 = \\sum_{i} \\sigma_i^2 \\\\\n", "\t\\|\\mathbf{Z}^{(t)}\\|_{\\text{F}}^2 &=& \\|\\sigma(\\mathbf{Z}^{(t)})\\|_2^2 = \\sum_{i} \\omega_i^2\n", "\\end{eqnarray*}\n", "$$\n", "and by the [Fan-von Neumann inequality](https://en.wikipedia.org/wiki/Trace_inequalities#Von_Neumann.27s_trace_inequality)\n", "$$\n", "\\begin{eqnarray*}\n", "\t\\text{tr}(\\mathbf{X}^T \\mathbf{Z}^{(t)}) \\le \\sum_i \\sigma_i \\omega_i\n", "\\end{eqnarray*}\n", "$$\n", "with equality achieved if and only if the left and right singular vectors of the two matrices coincide. Thus we should choose $\\mathbf{X}$ to have the same singular vectors as $\\mathbf{Z}^{(t)}$, so that\n", "$$\n", "\\begin{eqnarray*}\n", "\tg(\\mathbf{X}|\\mathbf{X}^{(t)}) &=& \\frac{1}{2} \\sum_i \\sigma_i^2 - \\sum_i \\sigma_i \\omega_i + \\frac{1}{2} \\sum_i \\omega_i^2 + \\lambda \\sum_i \\sigma_i \\\\\n", "\t&=& \\frac{1}{2} \\sum_i (\\sigma_i - \\omega_i)^2 + \\lambda \\sum_i \\sigma_i,\n", "\\end{eqnarray*}\n", "$$\n", "with minimizer given by **singular value thresholding**\n", "$$\n", " \\sigma_i^{(t+1)} = \\max(0, \\omega_i - \\lambda) = (\\omega_i - \\lambda)_+.\n", "$$\n", "\n", "* Algorithm for matrix completion:\n", " * Initialize $\\mathbf{X}^{(0)} \\in \\mathbb{R}^{m \\times n}$\n", " * Repeat\n", " * $\\mathbf{Z}^{(t)} \\gets P_\\Omega(\\mathbf{Y}) + P_{\\Omega^\\perp}(\\mathbf{X}^{(t)})$ \n", " * Compute SVD: $\\mathbf{U} \\text{diag}(\\mathbf{w}) \\mathbf{V}^T \\gets \\mathbf{Z}^{(t)}$ \n", " * $\\mathbf{X}^{(t+1)} \\gets \\mathbf{U} \\text{diag}[(\\mathbf{w}-\\lambda)_+] \\mathbf{V}^T$\n", " * until the objective value converges\n", "\n", "* The Golub-Kahan-Reinsch algorithm for a full SVD takes $4m^2n + 8mn^2 + 9n^3$ flops for an $m \\ge n$ matrix and is **not** going to work for the 480K-by-18K Netflix matrix.\n", "\n", "* Notice that only the top singular values are needed, since low-rank solutions are obtained for large $\\lambda$. The Lanczos/Arnoldi algorithm is the way to go. Matrix-vector multiplication $\\mathbf{Z}^{(t)} \\mathbf{v}$ is fast (why?) A small Julia sketch of this algorithm on simulated data is given below." ] }
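, { "cell_type": "markdown", "metadata": {}, "source": [ "A minimal sketch of the algorithm above on a small simulated low-rank matrix (the dimensions, noise level, and $\\lambda$ are arbitrary). A dense `svd` from `LinearAlgebra` is used for clarity; at Netflix scale one would instead compute only the top singular triplets with a Lanczos/Arnoldi-type solver.\n",
 "\n",
 "```julia\n",
 "using LinearAlgebra, Random\n",
 "\n",
 "# MM for the nuclear norm regularized matrix completion problem above:\n",
 "# fill in the missing entries by the current imputation, then apply\n",
 "# singular value thresholding to the filled-in matrix Z.\n",
 "function matrix_complete(Y, Ω, λ; maxiter = 200, tol = 1e-6)\n",
 "    X = zeros(size(Y))                     # X^(0)\n",
 "    objold = Inf\n",
 "    for iter in 1:maxiter\n",
 "        Z = copy(X)\n",
 "        Z[Ω] .= Y[Ω]                       # observed entries from Y, current X elsewhere\n",
 "        F = svd(Z)\n",
 "        σ = max.(F.S .- λ, 0)              # soft-threshold the singular values\n",
 "        X = F.U * Diagonal(σ) * F.Vt\n",
 "        obj = 0.5 * sum(abs2, Y[Ω] .- X[Ω]) + λ * sum(σ)   # objective f(X)\n",
 "        abs(objold - obj) < tol * (abs(obj) + 1) && break\n",
 "        objold = obj\n",
 "    end\n",
 "    return X\n",
 "end\n",
 "\n",
 "Random.seed!(123)\n",
 "m, n, r = 100, 80, 3\n",
 "Xtrue = randn(m, r) * randn(r, n)          # low-rank truth\n",
 "Ω = findall(rand(m, n) .< 0.3)             # observe about 30% of the entries\n",
 "Y = zeros(m, n)\n",
 "Y[Ω] .= Xtrue[Ω] .+ 0.1 .* randn(length(Ω))\n",
 "Xhat = matrix_complete(Y, Ω, 5.0)\n",
 "@show rank(Xhat)\n",
 "@show norm(Xhat - Xtrue) / norm(Xtrue)     # relative recovery error\n",
 "```" ] }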
], "metadata": { "@webio": { "lastCommId": null, "lastKernelId": null }, "kernelspec": { "display_name": "Julia 1.1.0", "language": "julia", "name": "julia-1.1" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.1.0" }, "toc": { "colors": { "hover_highlight": "#DAA520", "running_highlight": "#FF0000", "selected_highlight": "#FFD700" }, "moveMenuLeft": true, "nav_menu": { "height": "82px", "width": "252px" }, "navigate_menu": true, "number_sections": true, "sideBar": true, "skip_h1_title": true, "threshold": 4, "toc_cell": true, "toc_position": { "height": "555.5999755859375px", "left": "0px", "right": "1069.199951171875px", "top": "109.4000015258789px", "width": "181.1999969482422px" }, "toc_section_display": "block", "toc_window_display": true, "widenNotebook": false } }, "nbformat": 4, "nbformat_minor": 2 }