{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n# Gaussian Mixture Model Ellipsoids\n\nPlot the confidence ellipsoids of a mixture of two Gaussians\nobtained with Expectation Maximisation (``GaussianMixture`` class) and\nVariational Inference (``BayesianGaussianMixture`` class models with\na Dirichlet process prior).\n\nBoth models have access to five components with which to fit the data. Note\nthat the Expectation Maximisation model will necessarily use all five\ncomponents while the Variational Inference model will effectively only use as\nmany as are needed for a good fit. Here we can see that the Expectation\nMaximisation model splits some components arbitrarily, because it is trying to\nfit too many components, while the Dirichlet Process model adapts it number of\nstate automatically.\n\nThis example doesn't show it, as we're in a low-dimensional space, but\nanother advantage of the Dirichlet process model is that it can fit\nfull covariance matrices effectively even when there are less examples\nper cluster than there are dimensions in the data, due to\nregularization properties of the inference algorithm.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Authors: The scikit-learn developers\n# SPDX-License-Identifier: BSD-3-Clause\n\nimport itertools\n\nimport matplotlib as mpl\nimport matplotlib.pyplot as plt\nimport numpy as np\nfrom scipy import linalg\n\nfrom sklearn import mixture\n\ncolor_iter = itertools.cycle([\"navy\", \"c\", \"cornflowerblue\", \"gold\", \"darkorange\"])\n\n\ndef plot_results(X, Y_, means, covariances, index, title):\n splot = plt.subplot(2, 1, 1 + index)\n for i, (mean, covar, color) in enumerate(zip(means, covariances, color_iter)):\n v, w = linalg.eigh(covar)\n v = 2.0 * np.sqrt(2.0) * np.sqrt(v)\n u = w[0] / linalg.norm(w[0])\n # as the DP will not use every component it has access to\n # unless it needs it, we shouldn't plot the redundant\n # components.\n if not np.any(Y_ == i):\n continue\n plt.scatter(X[Y_ == i, 0], X[Y_ == i, 1], 0.8, color=color)\n\n # Plot an ellipse to show the Gaussian component\n angle = np.arctan(u[1] / u[0])\n angle = 180.0 * angle / np.pi # convert to degrees\n ell = mpl.patches.Ellipse(mean, v[0], v[1], angle=180.0 + angle, color=color)\n ell.set_clip_box(splot.bbox)\n ell.set_alpha(0.5)\n splot.add_artist(ell)\n\n plt.xlim(-9.0, 5.0)\n plt.ylim(-3.0, 6.0)\n plt.xticks(())\n plt.yticks(())\n plt.title(title)\n\n\n# Number of samples per component\nn_samples = 500\n\n# Generate random sample, two components\nnp.random.seed(0)\nC = np.array([[0.0, -0.1], [1.7, 0.4]])\nX = np.r_[\n np.dot(np.random.randn(n_samples, 2), C),\n 0.7 * np.random.randn(n_samples, 2) + np.array([-6, 3]),\n]\n\n# Fit a Gaussian mixture with EM using five components\ngmm = mixture.GaussianMixture(n_components=5, covariance_type=\"full\").fit(X)\nplot_results(X, gmm.predict(X), gmm.means_, gmm.covariances_, 0, \"Gaussian Mixture\")\n\n# Fit a Dirichlet process Gaussian mixture using five components\ndpgmm = mixture.BayesianGaussianMixture(n_components=5, covariance_type=\"full\").fit(X)\nplot_results(\n X,\n dpgmm.predict(X),\n dpgmm.means_,\n dpgmm.covariances_,\n 1,\n \"Bayesian Gaussian Mixture with a Dirichlet process prior\",\n)\n\nplt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", 
"mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.21" } }, "nbformat": 4, "nbformat_minor": 0 }