Part 1 covers the expectation maximization (EM) algorithm and its application to Gaussian mixture models. [Part 2](latent_variable_models_part_2.ipynb) covers approximate inference and variational autoencoders.\n", "\n", "## Introduction\n", "\n", "Given a probabilistic model $p(\\mathbf{x} \\lvert \\boldsymbol{\\theta})$ and $N$ observations $\\mathbf{X} = \\left\\{ \\mathbf{x}_1, \\ldots, \\mathbf{x}_N \\right\\}$, we often want to find a value for parameter $\\boldsymbol{\\theta}$ that maximizes the likelihood function $p(\\mathbf{X} \\lvert \\boldsymbol{\\theta})$, a function of parameter $\\boldsymbol{\\theta}$. This is known as [maximum likelihood estimation](https://en.wikipedia.org/wiki/Maximum_likelihood_estimation) (MLE). \n", "\n", "$$\n", "\\boldsymbol{\\theta}_{MLE} = \\underset{\\boldsymbol{\\theta}}{\\mathrm{argmax}} p(\\mathbf{X} \\lvert \\boldsymbol{\\theta})\n", "\\tag{1}\n", "$$\n", "\n", "If the model is a simple probability distribution, such as a single Gaussian, then $\\boldsymbol{\\theta}_{MLE} = \\left\\{ \\boldsymbol{\\mu}_{MLE}, \\boldsymbol{\\Sigma}_{MLE} \\right\\}$ has an analytical solution. A common approach for more complex models is *gradient descent* using the *negative log likelihood*, $-\\log p(\\mathbf{X} \\lvert \\boldsymbol{\\theta})$, as the loss function. This can easily be implemented with frameworks like PyTorch or TensorFlow, provided that $p(\\mathbf{X} \\lvert \\boldsymbol{\\theta})$ is differentiable w.r.t. $\\boldsymbol{\\theta}$. But this is not necessarily the most efficient approach.\n", "\n", "## Gaussian mixture model\n", "\n", "MLE can often be simplified by introducing *latent variables*. A latent variable model makes the assumption that an observation $\\mathbf{x}_i$ is caused by some underlying latent variable, a variable that cannot be observed directly but can be inferred from observed variables and parameters. 
For example, the following plot shows observations in 2-dimensional space, and one can see that their overall distribution doesn't seem to follow a simple distribution like a single Gaussian." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "image/png": 