{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Continuous Latent Variable Models - PCA and FA"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Preliminaries\n",
"\n",
"- Goal \n",
" - Introduction to Linear Latent Variable Models on continuous domains, specifically **factor analysis** and **principal component analysis**\n",
"- Materials \n",
" - Mandatory\n",
" - These lecture notes\n",
" - Optional\n",
" - Bishop pp. 570-573, 577-580, 584-586 (PCA and FA)\n",
" - M. Tipping and C. Bishop, [Probabilistic Principal Component Analysis](./files/bishop-ppca-jrss.pdf), Journal of the Royal Statistical Society. Series B, Vol.61, No.3, 1999 "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Continuous Latent Variable Models\n",
"\n",
"- (Recall that) mixture models use a discrete class variable."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- Sometimes, it is more appropriate to think in terms of **continuous**\n",
"underlying causes (factors) that control the observed data.\n",
"\n",
" - E.g., consider observed test scores for three subjects: English, Spanish and French\n",
"\n",
"$$\\begin{align*}\n",
" \\underbrace{ \\begin{bmatrix} x_1\\;(=\\text{English})\\\\ x_2\\;(=\\text{Spanish})\\\\ x_3\\;(=\\text{French}) \\end{bmatrix} }_{\\text{observed}}% &= f(\\text{causes},\\theta) + \\text{noise}\\\\\n",
"&= \\begin{bmatrix} \\lambda_{11} & \\lambda_{12}\\\\ \\lambda_{21} & \\lambda_{22}\\\\ \\lambda_{31} & \\lambda_{32}\\end{bmatrix} \\cdot \\underbrace{ \\begin{bmatrix} z_1\\;(=\\text{literacy})\\\\ z_2\\;(=\\text{intelligence})\\end{bmatrix} }_{\\text{causes}} + \\underbrace{\\begin{bmatrix} v_1\\\\v_2\\\\v_3\\end{bmatrix} }_{\\text{noise}}\n",
"\\end{align*}$$"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- (**Unsupervised Regression**). This is like (linear) regression with unobserved inputs."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Dimensionality Reduction\n",
"\n",
"- If the dimension of the hidden 'causes' ($z$) is smaller than the dimension of the observed data ($x$), then the model (tries to) achieve **dimensionality reduction**."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- Key applications include \n",
" 1. **compression** (store $z$ rather than $x$) \n",
" - Compression through **real-valued** latent variables can be far more efficient than with discrete clusters.\n",
" - E.g., with two 8-bit hidden factors, one can describe $2^{16} = 65536$ settings; a discrete cluster model would need $2^{16}$ clusters to do the same!\n",
" 2. **noise reduction** (e.g. in biomedical, financial or speech signals)\n",
" 3. **feature extraction** (e.g. as a pre-processor for classification) \n",
" 4. **visualization** (particularly if $\\mathrm{dim}(Z)=2$) "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Example Problem: Visualization with missing data\n",
"\n",
"\n",
"- We consider the 38 examples of the 18-dimensional **Tobamovirus** data set; see Section 4.1 in Tipping and Bishop (1999) (originally from Ripley (1996)). "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- We will visualize this data set after projection onto the two leading principal axes (i.e., the two axes that explain the largest share of the data variance). "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- We will also consider the visualization problem when 20% of the data set is missing. "
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"38×18 Array{Int64,2}:\n",
" 17 13 14 16 4 9 14 1 13 0 11 13 5 7 1 4 11 5\n",
" 12 11 9 12 6 5 12 1 9 1 7 12 5 6 0 4 8 2\n",
" 18 16 16 16 8 6 14 1 14 0 9 12 4 8 0 2 11 3\n",
" 18 16 15 19 8 6 11 1 15 1 7 13 5 8 0 2 9 3\n",
" 17 13 13 22 8 4 18 1 10 3 8 11 7 6 1 2 10 2\n",
" 16 13 16 21 9 3 17 1 10 4 7 12 7 5 1 2 11 3\n",
" 22 19 10 16 10 4 18 1 12 2 8 11 6 8 0 1 8 2\n",
" 20 10 24 10 6 9 21 0 7 0 7 18 4 9 1 4 8 2\n",
" 20 21 12 15 9 7 11 1 9 3 8 14 6 7 0 1 10 3\n",
" 20 21 12 15 9 7 11 1 9 3 9 14 5 7 0 1 10 3\n",
" 18 11 24 10 9 6 19 0 12 0 7 14 4 11 0 4 9 1\n",
" 20 12 23 10 8 5 20 0 13 0 6 13 4 11 0 4 10 1\n",
" 18 19 18 16 8 4 12 0 12 0 10 15 8 6 1 1 12 1\n",
" ⋮ ⋮ ⋮ ⋮ \n",
" 17 12 22 10 8 5 18 0 14 0 5 13 4 10 0 3 9 1\n",
" 17 16 16 16 8 6 15 1 14 0 9 12 4 8 0 2 11 3\n",
" 19 17 15 17 7 6 15 1 14 0 8 12 4 8 0 2 10 3\n",
" 18 16 16 19 8 6 11 1 15 1 7 13 5 8 0 2 9 3\n",
" 18 17 15 17 8 6 15 1 14 0 8 12 4 8 0 3 9 3\n",
" 15 12 14 23 8 3 17 1 9 4 7 15 6 6 1 2 11 2\n",
" 13 11 14 22 7 3 17 1 10 4 8 13 6 6 1 3 11 2\n",
" 16 11 15 23 10 4 18 1 10 3 7 12 6 5 1 2 9 3\n",
" 14 11 14 25 11 3 19 2 10 2 7 12 6 5 1 2 9 3\n",
" 11 11 15 24 10 5 18 1 11 1 7 14 5 7 2 3 11 2\n",
" 15 9 12 21 8 4 21 1 10 3 7 15 7 6 1 3 10 3\n",
" 15 11 15 22 7 3 19 1 8 3 4 14 6 5 1 2 10 2"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"include(\"scripts/pca_demo_helpers.jl\")\n",
"X = readDataSet(\"datasets/virus3.dat\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Model Specification for LC-LVM\n",
"\n",
"- In this lesson, we focus on _Linear_ Continuous Latent Variable Models (**LC-LVM**). "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- Introduce observation vector ${x}\\in\\mathbb{R}^D$ and $M$-dimensional (with $M \\lt D$) real-valued **latent factor** $z$:\n",
"$$\\begin{align*}\n",
" x &= W z + \\mu + \\epsilon \\\\\n",
" z &\\sim \\mathcal{N}(0,I) \\\\\n",
" \\epsilon &\\sim \\mathcal{N}(0,\\Psi)\n",
"\\end{align*}$$\n",
"or equivalently\n",
"$$\\begin{align*}\n",
"p(x|z) &= \\mathcal{N}(x|\\,W z + \\mu,\\Psi) \\tag{likelihood}\\\\\n",
"p(z) &= \\mathcal{N}(z|\\,0,I) \\tag{prior}\n",
"\\end{align*}$$"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- $W$ is the so-called $(D\\times M)$-dim **factor loading matrix**. The parameters of this model are given by $\\theta=\\{W,\\mu,\\Psi\\}$. "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- In the models considered here, the **observation noise covariance matrix** $\\Psi$ is taken to be **diagonal**. "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"\n",
"\n",
"- Note the similarity of the likelihood function in LC-LVM and [linear regression](http://nbviewer.ipython.org/github/bertdv/AIP-5SSB0/blob/master/lessons/06_linear_regression/Linear-Regression.ipynb): \n",
"$$p(y|x) = \\mathcal{N}(y|\\theta^T {x}, \\sigma^2)$$\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### LC-LVM Analysis (1): The marginal distribution $p({x})$\n",
"\n",
"- Since $p(x|z)$ is Gaussian with a mean that is linear in $z$, and $p(z)$ is Gaussian, the joint $p(x,z) = p(x|z)p(z)$, the marginal $p(x)$ and the conditional\n",
"$p(z|x)$ distributions are all Gaussian."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- The marginal distribution for the observed data is\n",
"$$\n",
"\\boxed{ p(x) = \\mathcal{N}({x}|\\,{\\mu},W W^T + \\Psi) } \n",
"$$\n",
"since the **mean** evaluates to \n",
"$$\\begin{align*}\n",
"\\mathrm{E}[x] &= \\mathrm{E}[W z + \\mu+ \\epsilon] \\\\\n",
" &= W \\mathrm{E}[z] + \\mu + \\mathrm{E}[\\epsilon] \\\\\n",
" &= \\mu \n",
"\\end{align*}$$\n",
"and the **covariance** matrix is\n",
"$$\\begin{align*}\n",
"\\mathrm{cov}[x] &= \\mathrm{E}[({x}-{\\mu})({x}-{\\mu})^T] \\\\\n",
" &= \\mathrm{E}[(W z +\\epsilon)(W z +\\epsilon)^T] \\\\\n",
" &= W \\mathrm{E}[z z^T] W^T + \\mathrm{E}[\\epsilon \\epsilon^T] \\\\\n",
" &= W W^T + \\Psi \n",
"\\end{align*}$$"
]
},
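{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- The step from the second to the third line of the covariance derivation relies on $z$ and $\\epsilon$ being zero-mean and (as is implicit in the model specification) independent, so that the cross terms vanish. A worked version of that intermediate step:\n",
"$$\\begin{align*}\n",
"\\mathrm{E}[(W z +\\epsilon)(W z +\\epsilon)^T] &= W \\mathrm{E}[z z^T] W^T + W \\mathrm{E}[z \\epsilon^T] + \\mathrm{E}[\\epsilon z^T] W^T + \\mathrm{E}[\\epsilon \\epsilon^T] \\\\\n",
" &= W \\mathrm{E}[z z^T] W^T + W \\mathrm{E}[z] \\mathrm{E}[\\epsilon]^T + \\mathrm{E}[\\epsilon] \\mathrm{E}[z]^T W^T + \\mathrm{E}[\\epsilon \\epsilon^T] \\\\\n",
" &= W \\mathrm{E}[z z^T] W^T + \\mathrm{E}[\\epsilon \\epsilon^T]\n",
"\\end{align*}$$"
]
},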
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- $\\Rightarrow$ **LC-LVM is just a MultiVariate Gaussian (MVG) model** $x \\sim \\mathcal{N}({\\mu},\\Sigma)$ with the restriction that\n",
"\n",
"$$\\Sigma= W W^T + \\Psi \\,.$$"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### The Covariance Matrix of $p(x)$ is of Intermediate Complexity\n",
"\n",
"- The effective covariance $\\mathrm{cov}[x] = W W^T + \\Psi$ is the low-rank outer product of two\n",
"long skinny matrices plus a diagonal matrix.\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- $\\Rightarrow$ LC-LVM provides an MVG model of **intermediate complexity**. Compare the number of free covariance parameters:\n",
" - $D(D+1)/2$ for full Gaussian covariance $\\Sigma$\n",
" - $D(M+1)$ for the LC-LVM model with diagonal $\\Psi$, where $\\Sigma = W W^T + \\Psi$ ($DM$ parameters for $W$ plus $D$ for $\\Psi$) \n",
" - $D$ for diagonal Gaussian covariance $\\Sigma = \\mathrm{diag}(\\sigma_i^2)$\n",
" - $1$ for isotropic Gaussian noise $\\Sigma = \\sigma^2 \\mathrm{I}$\n",
" \n",
"\n",
"\n",
" "
]
},
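{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- As a worked example of these counts, take $D=18$ (the dimension of the Tobamovirus data) and assume $M=2$ latent dimensions:\n",
"$$\\begin{align*}\n",
"\\text{full:}\\quad & D(D+1)/2 = 18 \\cdot 19/2 = 171 \\\\\n",
"\\text{LC-LVM:}\\quad & D(M+1) = 18 \\cdot 3 = 54 \\\\\n",
"\\text{diagonal:}\\quad & D = 18 \\\\\n",
"\\text{isotropic:}\\quad & 1\n",
"\\end{align*}$$"
]
},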
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### LC-LVM Analysis (2): The Factor Loading Matrix $W$ is Not Unique\n",
"\n",
"- The factor loading matrix $W$ can only be estimated up to a rotation matrix $R$. Namely, if we rotate $W \\rightarrow WR $, then the covariance matrix for observations $x$ does not change (N.B.: a rotation (or orthogonal) matrix $R$ is a matrix such that $R^TR = R R^T = I$):\n",
"\n",
"$$\n",
"W R (W R)^T + \\Psi = W R R^T W^T + \\Psi = W W^T + \\Psi\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- $\\Rightarrow$ Two persons that estimate ML parameters for FA on the same data are **not guaranteed to find the same parameters**, since any rotation of $W$ is equally likely."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- $\\Rightarrow$ we can infer latent **subspaces** rather than individual components. One has to be careful when interpreting the numerical values of $W$ and $z$."
]
},
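{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- As an illustration, assume $M=2$ and let $R$ be a rotation over an (arbitrary) angle $\\phi$. Then\n",
"$$\\begin{align*}\n",
"R &= \\begin{bmatrix} \\cos\\phi & -\\sin\\phi \\\\ \\sin\\phi & \\cos\\phi \\end{bmatrix} \\\\\n",
"R R^T &= \\begin{bmatrix} \\cos^2\\phi + \\sin^2\\phi & \\cos\\phi\\sin\\phi - \\sin\\phi\\cos\\phi \\\\ \\sin\\phi\\cos\\phi - \\cos\\phi\\sin\\phi & \\sin^2\\phi + \\cos^2\\phi \\end{bmatrix} = I\n",
"\\end{align*}$$\n",
"so every angle $\\phi$ yields a loading matrix $WR$ with the same covariance $W W^T + \\Psi$. Equivalently, the rotated factors $z' = R^T z$ are again distributed as $\\mathcal{N}(0,I)$, since $\\mathrm{cov}[z'] = R^T I R = I$, and $W z = (WR) z'$."
]
},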
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### LC-LVM analysis (3): Constraints on the Noise Variance $\\Psi$\n",
"\n",
"- When doing ML estimation for the parameters without any restriction on $\\Psi$, a trivial solution for the covariance matrix $\\Sigma_x = W W^T + \\Psi$ is setting $\\hat W=0$ and $\\hat\\Psi$ equal to the sample covariance matrix of the data."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- In this case, all data correlation is explained as noise. (We'd like to avoid this.) "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- $\\Rightarrow$ The LC-LVM model is uninteresting without some restriction on the observation noise covariance matrix $\\Psi$. "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- The interesting cases are mostly for diagonal $\\Psi$. Note that if $\\Psi$ is diagonal, all correlations between the $(D)$ components of $x$ **must be explained** by the rank-$M$ matrix $W W^T$. Three model choices are common:"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"##### 1. Factor Analysis \n",
"\n",
"- In Factor Analysis (**FA**), $\\Psi$ is restricted to be _diagonal_:\n",
"\n",
"$$\\begin{align*} \n",
"\\Psi = \\mathrm{diag}(\\psi_i) \n",
"\\end{align*}$$"
]
},
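{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- Written out per element (a worked version of the FA covariance, using $W_{im}$ as notation for the $(i,m)$-th entry of $W$), the model covariance $\\Sigma = W W^T + \\mathrm{diag}(\\psi_i)$ has entries\n",
"$$\\begin{align*}\n",
"\\Sigma_{ii} &= \\sum_{m=1}^M W_{im}^2 + \\psi_i \\\\\n",
"\\Sigma_{ij} &= \\sum_{m=1}^M W_{im} W_{jm} \\quad (i \\neq j)\n",
"\\end{align*}$$\n",
"so the noise variances $\\psi_i$ only affect the diagonal, and every off-diagonal covariance has to come from the loadings."
]
},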
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### LC-LVM analysis (3): Constraints on the Noise Variance $\\Psi$, cont'd\n",
"\n",
"\n",
"##### 2. Probabilistic Principal Component Analysis \n",
"\n",
"- In Probabilistic Principal Component Analysis (**pPCA**), the variances are further restricted to be the same,\n",
" \n",
"$$\\begin{align*} \n",
"\\Psi = \\sigma^2 I \n",
"\\end{align*}$$"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"##### 3. Principal Component Analysis \n",
"\n",
"- The 'regular' (deterministic) Principal Component Analysis (**PCA**) procedure can be obtained by further requiring that\n",
"$$\\begin{align*} \n",
"\\Psi &= \\lim_{\\sigma^2\\rightarrow 0} \\sigma^2 I \\\\\n",
"W^T W &= I\n",
"\\end{align*}$$ \n",
"i.e., the noise model is discarded altogether and the columns of $W$ are orthonormal. "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- Regular PCA is a well-known deterministic procedure for dimensionality reduction (that predates pPCA)."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"$\\Rightarrow$ FA, pPCA and PCA differ only by their model for the noise variance $\\Psi$ (namely, diagonal, isotropic and 'zeros')."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Typical Applications\n",
"- In PCA (or pPCA), the noise variance is assumed to be the same for all components. This is appropriate if all components of the observed data are 'shifted' versions of each other.\n",
"\n",
"$\\Rightarrow$ **PCA is very widely applied to image and signal processing tasks!**\n",
"\n",
"- Google (May-2015): [PCA \"face recognition\"] $>$ 300K hits; [PCA \"noise reduction\"] $>$ 100K hits "
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- FA is insensitive to scaling of individual components in the observed data (see appendix).\n",
"- Use FA if the components of the data are not shifted versions of the same kind of measurement.\n",
"\n",
"$\\Rightarrow$ **FA has a strong history in the social sciences**"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### ML estimation for pPCA Model\n",
"\n",
"- Given the generative model for pPCA \n",
"$$\\begin{align*}\n",
"p(x_n|z_n) &= \\mathcal{N}(x_n\\mid W z_n + \\mu,\\sigma^2 \\mathrm{I})\\\\\n",
"p(z_n) &= \\mathcal{N}(z_n \\mid0,\\mathrm{I})\n",
"\\end{align*}$$\n",
"and observations ${D}=\\{x_1,\\dotsc,x_N\\}$, find ML estimates for the parameters $\\theta=\\{W,\\mu,\\sigma^2\\}$ "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- **Inference for ${\\mu}$** is easy: ${x}$ is a multivariate Gaussian with mean ${\\mu}$, so its ML estimate is\n",
"$$ \\hat {\\mu} = \\frac{1}{N}\\sum_n {x}_n$$\n",
"Now subtract $\\hat {\\mu}$ from all data points (${x}_n:= {x}_n-\\hat {\\mu}$) and assume that we have zero-mean data."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- For ML estimation of $W$ and $\\sigma^2$, both gradient-ascent and EM are possible. "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Solution method 1: Gradient-ascent on the log-likelihood \n",
"\n",
"- Work out the gradients for the log-likelihood\n",
"$$\\begin{align*}\\log\\, &p({D}|{\\theta}) \\\\ &= -\\frac{N}{2} \\log \\lvert 2\\pi(W W^T + \\sigma^2 \\mathrm{I})\\rvert -\\frac{1}{2}\\sum_n {x}_n^T(W W^T + \\sigma^2 \\mathrm{I})^{-1}{x}_n\\end{align*}$$\n",
"and optimize w.r.t. $W$ and $\\sigma^2$. "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- As with ML estimation in Gaussian mixture models, working out the gradient turns out to be quite difficult because of the coupling between $W$ and $\\sigma^2$ (but it is possible, see [Tipping and Bishop, 1999](./files/bishop-ppca-jrss.pdf))."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Solution method 2: Use EM\n",
"\n",
"- A big bonus for EM over gradient-based methods is that EM comfortably handles missing observations, e.g. through sensor malfunction. Missing observations are simply treated as hidden variables. "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- Maximizing the _expected complete-data log-likelihood_ leads to the following (see Bishop, pg.578 for derivation): "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"$$\\begin{align*}\n",
"\\textbf{E-step}:& \\\\\n",
"M &= W^T W + \\sigma^2 \\mathrm{I}\\\\\n",
"\\mathrm{E}\\left[ z_n\\right] &= M^{-1} W^T x_n \\\\\n",
"\\mathrm{E}\\left[ z_n z_n^T\\right] &= \\sigma^2 M^{-1} + \\mathrm{E}\\left[ z_n\\right] \\mathrm{E}\\left[ z_n\\right]^T\\\\\n",
"\\\\\n",
"\\textbf{M-step}:& \\\\\n",
"W_{\\text{new}} &= \\left[ \\sum_{n=1}^N x_n \\mathrm{E}\\left[z_n\\right]^T\\right] \\left[ \\sum_{n=1}^N \\mathrm{E}\\left[ z_n z_n^T\\right]\\right]^{-1} \\\\\n",
"\\sigma^2_{\\text{new}} &= \\frac{1}{ND} \\sum_{n=1}^N \\left\\{ x_n^T x_n - 2 \\mathrm{E}\\left[z_n\\right]^T W_{\\text{new}}^T x_n + \\mathrm{Tr}\\left( \\mathrm{E}\\left[ z_n z_n^T\\right] W_{\\text{new}}^T W_{\\text{new}} \\right) \\right\\}\n",
"\\end{align*}$$"
]
},
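{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- The E-step moments are the moments of the posterior over $z_n$. As a sketch (assuming zero-mean data and using the standard conditioning result for linear-Gaussian models), the posterior precision and mean are\n",
"$$\\begin{align*}\n",
"\\mathrm{cov}[z_n|x_n]^{-1} &= \\mathrm{I} + W^T (\\sigma^2 \\mathrm{I})^{-1} W = \\sigma^{-2} \\left( W^T W + \\sigma^2 \\mathrm{I} \\right) = \\sigma^{-2} M \\\\\n",
"\\mathrm{E}[z_n|x_n] &= \\left( \\sigma^{-2} M \\right)^{-1} W^T (\\sigma^2 \\mathrm{I})^{-1} x_n = M^{-1} W^T x_n\n",
"\\end{align*}$$\n",
"so that $p(z_n|x_n) = \\mathcal{N}(z_n|\\,M^{-1} W^T x_n, \\sigma^2 M^{-1})$, and the second moment follows from $\\mathrm{E}[z_n z_n^T] = \\mathrm{cov}[z_n|x_n] + \\mathrm{E}[z_n]\\mathrm{E}[z_n]^T$. Note that in these updates $M$ denotes a matrix (following Bishop's notation), not the latent dimension."
]
},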
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Solution method 2: Use EM, cont'd\n",
"\n",
"- Note that after $x_n$ is observed, the unobserved 'input' $z_n$ is not known exactly; the uncertainty about input $z_n$, as expressed by the covariance $$\\text{cov}(z_n) = \\mathrm{E}\\left[ z_n z_n^T\\right] - \\mathrm{E}\\left[ z_n\\right] \\mathrm{E}\\left[ z_n\\right] ^T = \\sigma^2 M^{-1}$$ can be computed _before the data point $x_n$ has been seen_. \n",
" - Exercise: Show that the precision about $z_n$ increases through observing $x_n.$\n",
" - Compare this to linear regression, where we have full knowledge about an input-output pair."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- If there was no uncertainty about $z_n$, i.e., $\\mathrm{E}\\left[ z_n\\right] = z_n$ and $\\mathrm{E}\\left[ z_n z_n^T\\right] = z_n z_n^T$, then\n",
"$$\n",
"W_{\\text{new}} = \\left[ \\sum_{n=1}^N x_n z_n^T\\right] \\left[ \\sum_{n=1}^N z_n z_n^T\\right]^{-1}\n",
"$$\n",
" - Exercise: Verify that this solution resembles the [ML solution for linear regression](http://nbviewer.ipython.org/github/bertdv/AIP-5SSB0/blob/master/lessons/notebooks/06_Linear-Regression.ipynb). "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Example Problem Revisited\n",
"\n",
"Let's perform pPCA on the example (**Tobamovirus**) data set using EM. We'll find the two principal components ($M=2$), and then visualize the data in a 2-D plot. The implementation is quite straightforward; have a look at the [source file](https://github.com/bertdv/AIP-5SSB0/blob/master/lessons/notebooks/scripts/pca_demo_helpers.jl) if you're interested in the details."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"Figure(PyObject