{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
build
method from SOMFactory creates self organizing map, give it the size of the map and the data. the method takes the size of the map and the data.\n",
"\n",
"initialization='random'
is a type of initial node weights, the random values to all weights."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"som = sompy.SOMFactory.build(data_full, mapsize, initialization=\"random\")\n",
"som.train(n_job=1, verbose=\"info\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For visualizaion used mapview.View2DPacked."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"v = sompy.mapview.View2DPacked(10, 10, \"example\", text_size=8)\n",
"v.show(som)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The som could recognize four clusters. Although the scope of the cluster are far from ideal.\n",
"\n",
"The \"cluster\" method is using [sklearn.Kmeans](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html) for predict clusters on the raw data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"v = sompy.mapview.View2DPacked(5, 5, \"test\", text_size=8)\n",
"som.cluster(n_clusters=4)\n",
"som.cluster_labels"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"v.show(som, what=\"cluster\");"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's look at the visualization of clusters on the grid. For this use HitMapView."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"h = sompy.hitmap.HitMapView(8, 8, \"hitmap\", text_size=8, show_text=True)\n",
"h.show(som);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The grid of self organizing map have a two types:\n",
" - square grid\n",
" - hexagonal grid"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we will create a new SOM and add some arguments for best result.\n",
"\n",
"Increasing map size."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mapsize = [20, 20]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"lattice='rect'
is a square grid of SOM.\n",
"\n",
"normalization='var'
is the type of [normalization](https://en.wikipedia.org/wiki/Normalization_(statistics)) of the input data. 'var' is [t-statistic](https://en.wikipedia.org/wiki/T-statistic).\n",
"$$\\frac{X-\\bar{X}}{s}$$\n",
"- $X$ is input data.\n",
"- $\\bar{X}$ is average of input data.\n",
"- $s$ is [standard deviation](https://en.wikipedia.org/wiki/Standard_deviation).\n",
"\n",
"initialization='pca'
is a type of initial node weights, principal component initialization.\n",
"\n",
"neighborhood='gaussian'
use the 'gaussian' function for \"measure of neighborhood\"."
]
},
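{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an illustration of the normalization formula, here is a minimal NumPy sketch applied to a small made-up array. The array `X` and the choice of the population standard deviation (`ddof=0`) are assumptions for this example and not necessarily what sompy does internally. After normalization each column should have mean 0 and standard deviation 1."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical toy data: 4 samples, 2 features on very different scales.\n",
"X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0], [4.0, 400.0]])\n",
"\n",
"# Per-feature mean and standard deviation (ddof=0 is an assumption).\n",
"mean = X.mean(axis=0)\n",
"std = X.std(axis=0)\n",
"\n",
"# Variance normalization: (X - mean) / s, applied column by column.\n",
"X_norm = (X - mean) / std\n",
"\n",
"print(X_norm.mean(axis=0))\n",
"print(X_norm.std(axis=0))"
]
},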
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"som = sompy.SOMFactory.build(\n",
" data_full,\n",
" mapsize,\n",
" lattice=\"rect\",\n",
" normalization=\"var\",\n",
" initialization=\"random\",\n",
" neighborhood=\"gaussian\",\n",
")\n",
"som.train(n_job=1, verbose=\"info\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"v = sompy.mapview.View2DPacked(10, 10, \"example\", text_size=8)\n",
"v.show(som)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"v = sompy.mapview.View2DPacked(5, 5, \"test\", text_size=8)\n",
"som.cluster(n_clusters=4)\n",
"som.cluster_labels"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"h = sompy.hitmap.HitMapView(8, 8, \"hitmap\", text_size=8, show_text=True)\n",
"h.show(som);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's use the SOM for [the Iris Dataset](https://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn import datasets"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"iris = datasets.load_iris()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"iris.target_names"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mapsize = [20, 20]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"iris.target"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"som = sompy.SOMFactory.build(\n",
" iris.data,\n",
" mapsize,\n",
" lattice=\"rect\",\n",
" normalization=\"var\",\n",
" initialization=\"random\",\n",
" neighborhood=\"gaussian\",\n",
")\n",
"som.train(n_job=1, verbose=False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"v = sompy.mapview.View2DPacked(10, 10, \"iris\", text_size=8)\n",
"v.show(som, which_dim=[0, 1, 2])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The raw data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"view2D = sompy.mapview.View2D(10, 10, \"Iris_raw_data\", text_size=8)\n",
"view2D.show(som, col_sz=4, which_dim=\"all\", desnormalize=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After training, SOM separates four distinct clusters, which is true."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"iris.data.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Visualization of a grid."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"v = sompy.mapview.View2DPacked(5, 5, \"test\", text_size=8)\n",
"som.cluster(n_clusters=3)\n",
"som.cluster_labels"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"h = sompy.hitmap.HitMapView(8, 8, \"hitmap_iris\", text_size=8, show_text=True)\n",
"h.show(som,);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Also we can build the [U-matrix](https://en.wikipedia.org/wiki/U-matrix). Use umatrix.UMatrixView for visualization."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"u = sompy.umatrix.UMatrixView(20, 20, \"umatrix\")\n",
"UMAT = u.build_u_matrix(som)\n",
"UMAT = u.show(som)"
]
},
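{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the idea of the U-matrix concrete, the following is a minimal NumPy sketch, not sompy's own implementation. It assumes a rectangular grid whose node weights are stored row by row, and it assigns to each node the mean Euclidean distance to its up, down, left, and right neighbours. The helper `u_matrix` and the toy codebook are hypothetical. High values indicate nodes that sit on a border between clusters."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"\n",
"def u_matrix(codebook, rows, cols):\n",
"    # codebook: (rows * cols, dim) node weights in row-major order (an assumption).\n",
"    grid = codebook.reshape(rows, cols, -1)\n",
"    umat = np.zeros((rows, cols))\n",
"    for r in range(rows):\n",
"        for c in range(cols):\n",
"            dists = []\n",
"            # Visit the up, down, left, and right neighbours that exist.\n",
"            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):\n",
"                rr, cc = r + dr, c + dc\n",
"                if 0 <= rr < rows and 0 <= cc < cols:\n",
"                    dists.append(np.linalg.norm(grid[r, c] - grid[rr, cc]))\n",
"            umat[r, c] = np.mean(dists)\n",
"    return umat\n",
"\n",
"\n",
"# Hypothetical 2x3 map: the two left columns hold weight 0, the right column weight 1.\n",
"toy = np.array([[0.0], [0.0], [1.0], [0.0], [0.0], [1.0]])\n",
"print(u_matrix(toy, 2, 3))"
]
},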
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### normalization='var'
is only one implementation of the normalization.\n",
"\n",
"Kohonen self-organizing maps solve many issues and are a powerful tool for data analysis. In this article, we learned the principle of the SOM, as well as considered small examples of clustering and data visualization. But at the moment, the SOM is losing its popularity in favor of other algorithms."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}