{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "from IPython.display import YouTubeVideo, IFrame" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Concepts\n", "\n", "This block is all about grouping: grouping of _similar_ observations, areas, records... We start by discussing why grouping, or clustering in statistical parlance, is important and what it can do for us. Then we move on to different types of clustering. We focus on two: one is traditional non-spatial clustering, or unsupervised learning, for which we cover the most popular technique; the other is explicitly spatial clustering, or regionalisation, which imposes additional (geographic) constraints when grouping observations." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The need to group data\n", "\n", "This video motivates the block: *what do we mean by \"grouping data\" and why is it useful?*\n", "\n", "```{sidebar} Slides\n", "\n", "The slides used in the clip are available at:\n", "\n", "- `[HTML]` \n", "- `[PDF]` \n", "\n", "```" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "tags": [ "remove-input" ] }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "IFrame(\"https://liverpool.instructuremedia.com/embed/1227f575-c31a-45ec-b988-a3cf08cd18d7\",\n", " width=500,\n", " height=300\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Non-spatial clustering\n", "\n", "Non-spatial clustering is the most common form of data grouping. In this section, we cover the basics and mention a few approaches. 
We wrap up with an example of clustering that is very dear to human geography: geodemographics.\n", "\n", "```{sidebar} Slides\n", "\n", "The slides used in the clip are available at:\n", "\n", "- `[HTML]` \n", "- `[PDF]` \n", "\n", "```" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "tags": [ "remove-input" ] }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "IFrame(\"https://liverpool.instructuremedia.com/embed/0647598b-7b1d-4021-bad5-df26db367ac7\",\n", " width=500,\n", " height=300\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### K-Means\n", "\n", "In the clip above, we talk about K-Means, by far the most common clustering algorithm. Watch the video in the expandable section below to build an intuition for the algorithm and better understand how it does its \"magic\"." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "tags": [ "remove-input", "hide-output" ] }, "outputs": [ { "data": { "image/jpeg": "\n", "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "YouTubeVideo(\"hDmNF9JG3lo\", width=700)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For a striking visual comparison of K-Means with other clustering algorithms, check out this figure produced by the `scikit-learn` project, a Python package for machine learning (more on this [later](lab_G)):\n", "\n", "````{toggle}\n", "```{figure} https://scikit-learn.org/stable/_images/sphx_glr_plot_cluster_comparison_0011.png\n", "---\n", "height: 500px\n", "name: clustering-algos\n", "---\n", "Clustering algorithms comparison [`[Source]`](https://scikit-learn.org/stable/modules/clustering.html#overview-of-clustering-methods)\n", "```\n", "````\n", "\n", "### Geodemographics\n", "\n", "If you are interested in geodemographics, a very good 
reference for a broader perspective on the idea, origins and history of the field is \"The Predictive Postcode\" {cite}`webber2018predictive`, by Richard Webber and Roger Burrows. In particular, the first four chapters provide an excellent overview.\n", "\n", "Furthermore, the clip mentions the Output Area Classification (OAC), which you can access, for example, through the CDRC Maps platform:\n", "\n", "```{margin}\n", "Explore the resource further [here](https://maps.cdrc.ac.uk/#/geodemographics/oac11/default/BTTTFFT/13/-2.9700/53.4000/), and if you want to peek into the next generation of the platform, have a look [here](https://mapmaker.cdrc.ac.uk/#/internet-user-classification?lon=-2.97643&lat=53.38031&zoom=11).\n", "```" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "tags": [ "remove-input" ] }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "IFrame(\"https://mapmaker.cdrc.ac.uk/#/internet-user-classification?lon=-2.97643&lat=53.38031&zoom=11\",\n", " width=700,\n", " height=600\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Regionalisation\n", "\n", "Regionalisation is explicitly spatial clustering. 
We cover the conceptual basics in the following clip:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{sidebar} Slides\n", "\n", "The slides used in the clip are available at:\n", "\n", "- `[HTML]` \n", "- `[PDF]` \n", "\n", "```" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "tags": [ "remove-input" ] }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "IFrame(\"https://liverpool.instructuremedia.com/embed/f70ba654-b5fc-4dd6-a918-2f488fe8cf9e\",\n", " width=500,\n", " height=300\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you are interested in the idea of regionalisation, a very good place to continue reading is Duque et al. (2007) {cite}`duque2007supervised`, which was an important inspiration for the structure of the clip.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Further readings\n", "\n", "```{margin}\n", "The chapter is available for free [here](https://geographicdata.science/book/notebooks/10_clustering_and_regionalization.html).\n", "```\n", "\n", "A similar treatment of clustering and regionalisation, in somewhat more detail, is available in the corresponding chapter of the GDS book (in progress) {cite}`reyABwolf`." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.8" } }, "nbformat": 4, "nbformat_minor": 4 }