{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Clustering\n",
"\n",
"See our notes on [unsupervised learning](https://jennselby.github.io/MachineLearningCourseNotes/#unsupervised-learning), [K-means](https://jennselby.github.io/MachineLearningCourseNotes/#k-means-clustering), [DBSCAN](https://jennselby.github.io/MachineLearningCourseNotes/#dbscan-clustering), and [clustering validation](https://jennselby.github.io/MachineLearningCourseNotes/#clustering-validation).\n",
"\n",
"For documentation of various clustering methods in scikit-learn, see http://scikit-learn.org/stable/modules/clustering.html\n",
"\n",
"This code was based on the example at http://scikit-learn.org/stable/auto_examples/cluster/plot_cluster_iris.html\n",
"which has the following comments:\n",
"\n",
"Code source: Gaƫl Varoquaux
\n",
"Modified for documentation by Jaques Grobler
\n",
"License: BSD 3 clause\n",
"## Instructions\n",
"0. If you haven't already, follow [the setup instructions here](https://jennselby.github.io/MachineLearningCourseNotes/#setting-up-python3) to get all necessary software installed.\n",
"1. Read through the code in the following sections:\n",
" * [Iris Dataset](#Iris-Dataset)\n",
" * [Visualization](#Visualization)\n",
" * [Training and Visualization](#Training-and-Visualization)\n",
"2. Complete the three-part [Exercise](#Exercise)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import numpy\n",
"import matplotlib.pyplot\n",
"from mpl_toolkits.mplot3d import Axes3D\n",
"\n",
"from sklearn.cluster import KMeans\n",
"from sklearn import datasets\n",
"\n",
"import pandas"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Iris Dataset\n",
"\n",
"Before you go on, if you haven't used the iris dataset in a previous assignment, make sure you understand it. Modify the cell below to examine different parts of the dataset that are contained in the iris dictionary object.\n",
"\n",
"What are the features? What are we trying to classify?"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"iris = datasets.load_iris()\n",
"iris.keys()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n", " | sepal length (cm) | \n", "sepal width (cm) | \n", "petal length (cm) | \n", "petal width (cm) | \n", "
---|---|---|---|---|
0 | \n", "5.1 | \n", "3.5 | \n", "1.4 | \n", "0.2 | \n", "
1 | \n", "4.9 | \n", "3.0 | \n", "1.4 | \n", "0.2 | \n", "
2 | \n", "4.7 | \n", "3.2 | \n", "1.3 | \n", "0.2 | \n", "
3 | \n", "4.6 | \n", "3.1 | \n", "1.5 | \n", "0.2 | \n", "
4 | \n", "5.0 | \n", "3.6 | \n", "1.4 | \n", "0.2 | \n", "