{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 4.3.3 Reduced-Rank Linear Discriminant Analysis\n", "\n", "The K centroids (p-dimensional) lie in an affine subspace of dimension <= K - 1. Morever, in locating the closest centroid, we can ignore distances orthogonal to this subspace, since they contribute equally to each class. Thus, we might just project the $X^{*}$ onto this subspace.\n", "\n", "If K = 3, this could allow us to view the data in a two dimensional plot.\n", "\n", "If K > 3, we might then ask for a L < K - 1 dimensional subspace $H_l \\subseteq H_{k-1}$. In summary, finding the sequences of optimal subspaces for LDA involves the following steps:\n", "\n", "\n", "- compute the $K \\times p$ matrix of class centroids **M** and the common covariance matrix **W** (for within-class covariance);\n", "\n", "- Compute $\\mathbf{M}^{*}=\\mathbf{M}\\mathbf{W}^{-1/2}$ using the eigen-decomposition of **W**;\n", "\n", "- Compute $\\mathbf{B}^{*}$, the covariance matrix of $\\mathbf{M}^{*}$ (**B** the *between-class* covariance), and its eigen-decomposition $\\mathbf{B}^{*}=\\mathbf{V}^{*}\\mathbf{D}_B\\mathbf{V}^{*T}$. The columns $v_l^{*} \\text{ of } \\mathbf{V}^{*}$ in sequence from first to last define the coordinates of the optimal subspaces.\n", "\n", "Combining all these operations the lth *discriminant variable* is given by $Z_l = v_l^TX$ with $v_l=\\mathbf{W}^{-1/2}v_l^{*}$.\n", "\n", "**TODO**: Implement the algorithm and plot Figure 4.8. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fisher arrived at this decomposition via a different route. He posed the problem:\n", "\n", ">*Find the linear combination Z = a^T X such that the between-class variance is maximized relative to the within-class variance.*\n", "\n", "The between-class variance of Z is $a^T\\mathbf{B}a$ and within-class variance $a^T\\mathbf{W}a$, where **W** is defined earlier, and **B** is the covariance matrix of the class centroid matrix **M**. Note that $\\mathbf{B}+\\mathbf{W}=\\mathbf{T}$, where **T** is the total covariance matrix of X, ignoring class information.\n", "\n", "Fisher's problem amounts to maximizing the *Rayleigh quotient* (4.15):\n", "$$\n", "\\max_{a} \\cfrac{a^T\\mathbf{B}a}{a^T\\mathbf{W}a}\n", "$$\n", "\n", "or equivalently (4.16):\n", "\n", "$$\n", "\\max_{a} a^T\\mathbf{B}a \\text{ subject to } a^T\\mathbf{W}a = 1.\n", "$$\n", "\n", "This is a generalized eigenvalue problem, with *a* given by the largest eigenvalue of $\\mathbf{W}^{-1}\\mathbf{B}$.\n", "- The optimal $a_1$ is identical to $v_1$ (defined above)\n", "- The next direction $a_2$, orthogonal in **W** to $a_1$ is $a_2=v_2$.\n", "- The $a_l$ are referred to as *discriminant coordinates*, or *canonical variates*.\n", "\n", "To summarize the developments so far:\n", "\n", "- Classification can be achieved by sphering the data with respect to **W**.\n", "\n", "- Since only the relative distance to the centroids count, one can confine the data to the subspace spanned by the centroids in the sphered space.\n", "\n", "- This subsapce can be further decomposed into successively optimal subspaces in term of centroid separation. This decomposition is identical to the decomposition due to Fisher." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**TODO**: Finish the last paragraphs." 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }