{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 4.3.3 Reduced-Rank Linear Discriminant Analysis\n", "\n", "The K centroids (p-dimensional) lie in an affine subspace of dimension <= K - 1. Morever, in locating the closest centroid, we can ignore distances orthogonal to this subspace, since they contribute equally to each class. Thus, we might just project the $X^{*}$ onto this subspace.\n", "\n", "If K = 3, this could allow us to view the data in a two dimensional plot.\n", "\n", "If K > 3, we might then ask for a L < K - 1 dimensional subspace $H_l \\subseteq H_{k-1}$. In summary, finding the sequences of optimal subspaces for LDA involves the following steps:\n", "\n", "\n", "- compute the $K \\times p$ matrix of class centroids **M** and the common covariance matrix **W** (for within-class covariance);\n", "\n", "- Compute $\\mathbf{M}^{*}=\\mathbf{M}\\mathbf{W}^{-1/2}$ using the eigen-decomposition of **W**;\n", "\n", "- Compute $\\mathbf{B}^{*}$, the covariance matrix of $\\mathbf{M}^{*}$ (**B** the *between-class* covariance), and its eigen-decomposition $\\mathbf{B}^{*}=\\mathbf{V}^{*}\\mathbf{D}_B\\mathbf{V}^{*T}$. The columns $v_l^{*} \\text{ of } \\mathbf{V}^{*}$ in sequence from first to last define the coordinates of the optimal subspaces.\n", "\n", "Combining all these operations the lth *discriminant variable* is given by $Z_l = v_l^TX$ with $v_l=\\mathbf{W}^{-1/2}v_l^{*}$.\n", "\n", "**TODO**: Implement the algorithm and plot Figure 4.8. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fisher arrived at this decomposition via a different route. He posed the problem:\n", "\n", ">*Find the linear combination Z = a^T X such that the between-class variance is maximized relative to the within-class variance.*\n", "\n", "The between-class variance of Z is $a^T\\mathbf{B}a$ and within-class variance $a^T\\mathbf{W}a$, where **W** is defined earlier, and **B** is the covariance matrix of the class centroid matrix **M**. Note that $\\mathbf{B}+\\mathbf{W}=\\mathbf{T}$, where **T** is the total covariance matrix of X, ignoring class information.\n", "\n", "Fisher's problem amounts to maximizing the *Rayleigh quotient* (4.15):\n", "$$\n", "\\max_{a} \\cfrac{a^T\\mathbf{B}a}{a^T\\mathbf{W}a}\n", "$$\n", "\n", "or equivalently (4.16):\n", "\n", "$$\n", "\\max_{a} a^T\\mathbf{B}a \\text{ subject to } a^T\\mathbf{W}a = 1.\n", "$$\n", "\n", "This is a generalized eigenvalue problem, with *a* given by the largest eigenvalue of $\\mathbf{W}^{-1}\\mathbf{B}$.\n", "- The optimal $a_1$ is identical to $v_1$ (defined above)\n", "- The next direction $a_2$, orthogonal in **W** to $a_1$ is $a_2=v_2$.\n", "- The $a_l$ are referred to as *discriminant coordinates*, or *canonical variates*.\n", "\n", "To summarize the developments so far:\n", "\n", "- Classification can be achieved by sphering the data with respect to **W**.\n", "\n", "- Since only the relative distance to the centroids count, one can confine the data to the subspace spanned by the centroids in the sphered space.\n", "\n", "- This subsapce can be further decomposed into successively optimal subspaces in term of centroid separation. This decomposition is identical to the decomposition due to Fisher." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**TODO**: Finish the last paragraphs." 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }