{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lesson 10 - Neighbours and clusters\n", "\n", "> Use distance measures to classify and cluster data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/lewtun/dslectures/master?urlpath=lab/tree/notebooks%2Flesson10_clusters.ipynb) [![slides](https://img.shields.io/static/v1?label=slides&message=lesson10_clusters.pdf&color=blue&logo=Google-drive)](https://drive.google.com/open?id=1PZ_Yh3nK1qLVJoLBAc5nMpd2wyFQKKX8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Learning objectives\n", "In this lesson we look at distance and similarity measures for analysing data. If one can determine the distance between data samples (e.g. between rows in a table, different texts or images), one has access to a range of tools for investigating and modelling data. We will look at two of them: k-nearest neighbour classification and k-means clustering.\n", "\n", "* **k-nearest neighbour** classification is a supervised algorithm, a category we already encountered with Decision Trees and Random Forests.\n", "\n", "* **k-means** clustering, on the other hand, is an unsupervised method and therefore belongs to a class of algorithms we have not covered so far. We will investigate how both algorithms can be used.\n", "\n", "This notebook is split according to three learning goals:\n", "1. Understand different measures of distance and the curse of dimensionality.\n", "2. Understand how nearest neighbours can be used to build a classifier in scikit-learn.\n", "3. Explore a dataset with the unsupervised k-means method.\n", "\n", "## References\n", "* Chapter 6: Similarity, Neighbors, and Clusters of _Data Science for Business_ by F. Provost and T. Fawcett\n", "\n", "## Homework\n", "Work through the notebook and solve the exercises." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Figure: A group of cats is called a cluster.
\n", "Figure: Obsolete Russian measures of distance.
\n", "