{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Gaussian Mixture Models (GMM)\n", "\n", "A Gaussian mixture model (GMM) is a probabilistic approach to clustering that addresses many of the limitations of k-means. In this approach we describe each cluster by its centroid (mean), its covariance, and its size (weight).\n", "\n", "### Approach\n", "\n", "* Rather than identifying clusters by “nearest” centroids as k-means does, we fit a set of k Gaussians to the data.\n", "* We then estimate the parameters of each Gaussian, namely the mean and covariance of each cluster, along with the weight of each cluster.\n", "* After learning the parameters, we can calculate, for each data point, the probability that it belongs to each of the clusters." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### How do we estimate the Gaussian distribution parameters?\n", "\n", "* Expectation-maximization (EM) is the technique most commonly used to estimate a mixture model's parameters.\n", "* In frequentist probability theory, models are typically learned with maximum likelihood estimation, which seeks the parameter values that maximize the probability, or likelihood, of the observed data given the model parameters." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These concepts are hard to visualize and grasp on a first pass. Here is some material for a deeper dive into GMMs:\n", "* http://www.cse.iitm.ac.in/~vplab/courses/DVP/PDF/gmm.pdf" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "sns.set(style=\"white\", color_codes=True)\n", "\n", "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|  | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|
| 75 | 6.6 | 3.0 | 4.4 | 1.4 | Iris-versicolor |
| 36 | 5.5 | 3.5 | 1.3 | 0.2 | Iris-setosa |
| 129 | 7.2 | 3.0 | 5.8 | 1.6 | Iris-virginica |
| 73 | 6.1 | 2.8 | 4.7 | 1.2 | Iris-versicolor |
| 83 | 6.0 | 2.7 | 5.1 | 1.6 | Iris-versicolor |