{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Gaussian Mixture Models\n", "by Marc Deisenroth" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this notebook, we will look at density modeling with Gaussian mixture models (GMMs).\n", "In Gaussian mixture models, we describe the density of the data as\n", "$$\n", "p(\\boldsymbol x) = \\sum_{k=1}^K \\pi_k \\mathcal{N}(\\boldsymbol x|\\boldsymbol \\mu_k, \\boldsymbol \\Sigma_k)\\,,\\quad \\pi_k \\geq 0\\,,\\quad \\sum_{k=1}^K\\pi_k = 1\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The goal of this notebook is to get a better understanding of GMMs and to write some code for training GMMs using the EM algorithm. We provide a code skeleton and mark the bits and pieces that you need to implement yourself." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# imports\n", "import numpy as np\n", "import matplotlib as mpl\n", "import matplotlib.pyplot as plt\n", "from scipy.stats import multivariate_normal\n", "import scipy.linalg as la\n", "import matplotlib.cm as cm\n", "from matplotlib import rc\n", "import time\n", "from IPython import display\n", "\n", "%matplotlib inline\n", "\n", "np.random.seed(42)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Define a GMM from which we generate data\n", "Set up the true GMM from which we will generate data." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Choose a GMM with 3 components\n", "\n", "# means\n", "m = np.zeros((3,2))\n", "m[0] = np.array([1.2, 0.4])\n", "m[1] = np.array([-4.4, 1.0])\n", "m[2] = np.array([4.1, -0.3])\n", "\n", "# covariances\n", "S = np.zeros((3,2,2))\n", "S[0] = np.array([[0.8, -0.4], [-0.4, 1.0]])\n", "S[1] = np.array([[1.2, -0.8], [-0.8, 1.0]])\n", "S[2] = np.array([[1.2, 0.6], [0.6, 3.0]])\n", "\n", "# mixture weights\n", "w = np.array([0.3, 0.2, 0.5])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Generate some data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "N_split = 200 # number of data points per mixture component\n", "N = N_split*3 # total number of data points\n", "x = []\n", "y = []\n", "for k in range(3):\n", " x_tmp, y_tmp = np.random.multivariate_normal(m[k], S[k], N_split).T \n", " x = np.hstack([x, x_tmp])\n", " y = np.hstack([y, y_tmp])\n", "\n", "data = np.vstack([x, y])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Visualization of the dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X, Y = np.meshgrid(np.linspace(-10,10,100), np.linspace(-10,10,100))\n", "pos = np.dstack((X, Y))\n", "\n", "mvn = multivariate_normal(m[0,:].ravel(), S[0,:,:])\n", "xx = mvn.pdf(pos)\n", "\n", "# plot the dataset\n", "plt.figure()\n", "plt.title(\"Mixture components\")\n", "plt.plot(x, y, 'ko', alpha=0.3)\n", "plt.xlabel(\"$x_1$\")\n", "plt.ylabel(\"$x_2$\")\n", "\n", "# plot the individual components of the GMM\n", "plt.plot(m[:,0], m[:,1], 'or')\n", "\n", "for k in range(3):\n", " mvn = multivariate_normal(m[k,:].ravel(), S[k,:,:])\n", " xx = mvn.pdf(pos)\n", " plt.contour(X, Y, xx, alpha = 1.0, zorder=10)\n", " \n", "# plot the GMM \n", "plt.figure()\n", "plt.title(\"GMM\")\n", "plt.plot(x, y, 'ko', alpha=0.3)\n", "plt.xlabel(\"$x_1$\")\n", 
"plt.ylabel(\"$x_2$\")\n", "\n", "# build the GMM\n", "gmm = 0\n", "for k in range(3):\n", " mix_comp = multivariate_normal(m[k,:].ravel(), S[k,:,:])\n", " gmm += w[k]*mix_comp.pdf(pos)\n", " \n", "plt.contour(X, Y, gmm, alpha = 1.0, zorder=10);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Train the GMM via EM\n", "### Initialize the parameters for EM" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "K = 3 # number of clusters\n", "\n", "means = np.zeros((K,2))\n", "covs = np.zeros((K,2,2))\n", "for k in range(K):\n", " means[k] = np.random.normal(size=(2,))\n", " covs[k] = np.eye(2)\n", "\n", "weights = np.ones((K,1))/K\n", "print(\"Initial mean vectors (one per row):\\n\" + str(means))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#EDIT THIS FUNCTION\n", "NLL = [] # log-likelihood of the GMM\n", "gmm_nll = 0\n", "NLL += [gmm_nll] #<-- REPLACE THIS LINE\n", "\n", "plt.figure()\n", "plt.plot(x, y, 'ko', alpha=0.3)\n", "plt.plot(means[:,0], means[:,1], 'oy', markersize=25)\n", "\n", "for k in range(K):\n", " rv = multivariate_normal(means[k,:], covs[k,:,:])\n", " plt.contour(X, Y, rv.pdf(pos), alpha = 1.0, zorder=10)\n", " \n", "plt.xlabel(\"$x_1$\");\n", "plt.ylabel(\"$x_2$\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we define the responsibilities (which are updated in the E-step), given the model parameters $\\pi_k, \\boldsymbol\\mu_k, \\boldsymbol\\Sigma_k$ as\n", "$$\n", "r_{nk} := \\frac{\\pi_k\\mathcal N(\\boldsymbol\n", " x_n|\\boldsymbol\\mu_k,\\boldsymbol\\Sigma_k)}{\\sum_{j=1}^K\\pi_j\\mathcal N(\\boldsymbol\n", " x_n|\\boldsymbol \\mu_j,\\boldsymbol\\Sigma_j)} \n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Given the responsibilities we just defined, we can update the model parameters in the M-step as follows:\n", "\\begin{align*}\n", "\\boldsymbol\\mu_k^\\text{new} &= 
\\frac{1}{N_k}\\sum_{n = 1}^Nr_{nk}\\boldsymbol x_n\\,,\\\\\n", " \\boldsymbol\\Sigma_k^\\text{new}&= \\frac{1}{N_k}\\sum_{n=1}^Nr_{nk}(\\boldsymbol x_n-\\boldsymbol\\mu_k)(\\boldsymbol x_n-\\boldsymbol\\mu_k)^\\top\\,,\\\\\n", " \\pi_k^\\text{new} &= \\frac{N_k}{N}\n", "\\end{align*}\n", "where $$\n", "N_k := \\sum_{n=1}^N r_{nk}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### EM Algorithm" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#EDIT THIS FUNCTION\n", "r = np.zeros((K,N)) # will store the responsibilities\n", "\n", "for em_iter in range(100): \n", "    means_old = means.copy()\n", "    \n", "    # E-step: update responsibilities\n", "    r = np.zeros((K,N)) #<-- REPLACE THIS LINE\n", "    \n", "    # M-step\n", "    N_k = np.sum(r, axis=1)\n", "\n", "    for k in range(K): \n", "        # update the means\n", "        means[k] = 0 #<-- REPLACE THIS LINE\n", "        \n", "        # update the covariances\n", "        covs[k] = 0 #<-- REPLACE THIS LINE\n", "        \n", "    # weights\n", "    weights = np.zeros((K,)) #<-- REPLACE THIS LINE\n", "    \n", "    # negative log-likelihood\n", "    NLL += [10] #<-- REPLACE THIS LINE\n", "    \n", "    plt.figure() \n", "    plt.plot(x, y, 'ko', alpha=0.3)\n", "    plt.plot(means[:,0], means[:,1], 'oy', markersize=25)\n", "    for k in range(K):\n", "        rv = multivariate_normal(means[k,:], covs[k])\n", "        plt.contour(X, Y, rv.pdf(pos), alpha = 1.0, zorder=10)\n", "    \n", "    plt.xlabel(\"$x_1$\")\n", "    plt.ylabel(\"$x_2$\")\n", "    plt.text(x=3.5, y=8, s=\"EM iteration \"+str(em_iter+1))\n", "    \n", "    if la.norm(NLL[em_iter+1]-NLL[em_iter]) < 1e-6:\n", "        print(\"Converged after iteration \", em_iter+1)\n", "        break\n", "    \n", "# plot the final mixture model\n", "plt.figure() \n", "gmm = 0\n", "for k in range(K):\n", "    mix_comp = multivariate_normal(means[k,:].ravel(), covs[k,:,:])\n", "    gmm += weights[k]*mix_comp.pdf(pos)\n", "\n", "plt.plot(x, y, 'ko', alpha=0.3)\n", "plt.contour(X, Y, gmm, alpha = 1.0, zorder=10) \n", "plt.xlim([-8,8]);\n", 
"plt.ylim([-6,6]);" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.figure()\n", "plt.semilogy(np.linspace(1,len(NLL), len(NLL)), NLL)\n", "plt.xlabel(\"EM iteration\");\n", "plt.ylabel(\"Negative log-likelihood\");\n", "\n", "idx = [0, 1, 9, em_iter+1]\n", "\n", "for i in idx:\n", " plt.plot(i+1, NLL[i], 'or')" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" } }, "nbformat": 4, "nbformat_minor": 2 }