{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Case study: Classification of shapes\n", "\n", "This notebook explains how to use `giotto-tda` to be able to classify topologically different high-dimensional spaces.\n", "\n", "If you are looking at a static version of this notebook and would like to run its contents, head over to [github](https://github.com/giotto-ai/giotto-tda/blob/master/examples/classifying_shapes.ipynb).\n", "\n", "**License: AGPLv3**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import libraries\n", "The first step consists in importing relevant `giotto-tda` components and other useful libraries or modules." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Importing libraries\n", "from gtda.homology import VietorisRipsPersistence\n", "from gtda.diagrams import PersistenceEntropy\n", "\n", "import numpy as np\n", "\n", "from gtda.pipeline import Pipeline\n", "from sklearn.linear_model import LogisticRegression\n", "\n", "# Plotting functions\n", "from gtda.plotting import plot_diagram, plot_point_cloud, plot_heatmap" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sampling orientable surfaces\n", "\n", "We are going to consider three classical topological spaces: the circle, the 2-torus and the 2-sphere.\n", "The purpose of this tutorial is to go through the most famous topological spaces and compute their homology groups.\n", "\n", "Each of the topological spaces we are going to encounter will be sampled. The resulting point cloud will be the input of the *persistent homology pipeline*. The first step is to apply the Vietoris–Rips technique to the point cloud. Finally, the persistent homology groups will be computed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Representing the circle in 3d with parametric equations.\n", "circle = np.asarray([[np.sin(t),np.cos(t),0] for t in range(400)])\n", "plot_point_cloud(circle)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Representing the sphere in 3d with parametric equations\n", "sphere = np.asarray([[np.cos(s)*np.cos(t),np.cos(s)*np.sin(t),np.sin(s)] for t in range(20) for s in range(20)])\n", "plot_point_cloud(sphere)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Representing the torus in 3d with parametric equations\n", "torus = np.asarray([[(2+np.cos(s))*np.cos(t),(2+np.cos(s))*np.sin(t),np.sin(s)] for t in range(20) for s in range(20)])\n", "plot_point_cloud(torus)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Saving the results into an array\n", "topological_spaces = np.asarray([circle,sphere,torus])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Computing persistent homology\n", "\n", "In the next section we will use giotto-tda to compute the persistent homology groups of the topological spaces we just constructed.\n", "\n", "We will use the Vietoris–Rips technique to generate a filtration out of a point cloud:\n", "\n", "![SegmentLocal](https://miro.medium.com/max/1200/1*w3BiQI1OX93KXcezctRQTQ.gif \"segment\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The homology ranks we choose to consider\n", "homology_dimensions = (0, 1, 2)\n", "VR = VietorisRipsPersistence(\n", " metric='euclidean', max_edge_length=10, homology_dimensions=homology_dimensions)\n", "\n", "# Array of persistence diagrams, one per point cloud in the input\n", "diagrams = VR.fit_transform(topological_spaces)\n", "print(f'diagrams.shape = {diagrams.shape}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Persistence diagrams\n", "\n", "The topological information of the point cloud is synthesised in the persistence diagram. The horizontal axis corresponds to the moment in which an homological generator is born, while the vertical axis corresponds to the moments in which a homological generator dies.\n", "\n", "The generators of the homology groups (at given rank) are colored differently." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plotting the persistence diagram of the circle\n", "plot_diagram(diagrams[0])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plotting the persistence diagram of the sphere\n", "plot_diagram(diagrams[1])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# plotting the persistence diagram of the torus\n", "plot_diagram(diagrams[2])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion of the first part\n", "\n", "As you can see from the persistence diagrams, all the betti numbers were found. Some other persistent generators are also appearing, depending on how dense the sampling is and how much noise there is. For example, we see a rather neat persistence diagram over the Torus bottle (we see two persistent generators for $H_1$ and one persistent generator for $H_2$). Notice though that there are other persistent $H_1$ generators, possibly due to the non-uniform sampling method we used for the torus.\n", "\n", "On the other hand, the persistence diagram for the circle is as perfect as it could be: one unique generator of $H_1$ and no other persistent generator, as expected." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Classification of noisy orientable surfaces\n", "In the next section we generate hundreds of noisy spheres and noisy tori. The effect of noise is to displace the points sampling the aforementioned surfaces by a random amount in a random direction.\n", "\n", "The Vietoris–Rips algorithm is used to compute persistence diagrams. Afterwards, from each diagram, the *persistence entropy* is computed.\n", "\n", "A simple linear classifier is then applied to the 3D-space of entropies to classify shapes." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Building dataset\n", "n_train = 10\n", "n_range = 15\n", "eps = 0.3\n", "\n", "train_Xs = [np.asarray([[np.cos(s)*np.cos(t) + eps*(np.random.rand(1)[0]-0.5),np.cos(s)*np.sin(t) + eps*(np.random.rand(1)[0]-0.5),np.sin(s) + eps*(np.random.rand(1)[0] - 0.5)] for t in range(n_range) for s in range(n_range)]) for kk in range(n_train)]\n", "train_ys = np.zeros(n_train)\n", "train_Xt = [np.asarray([[(2+np.cos(s))*np.cos(t) + eps*(np.random.rand(1)[0]-0.5),(2+np.cos(s))*np.sin(t) + eps*(np.random.rand(1)[0]-0.5),np.sin(s) + eps*(np.random.rand(1)[0] - 0.5)] for t in range(n_range) for s in range(n_range)]) for kk in range(n_train)]\n", "train_yt = np.ones(n_train)\n", "\n", "# Training set\n", "train_X = np.concatenate((train_Xs, train_Xt))\n", "train_y = np.concatenate((train_ys, train_yt))\n", "\n", "test_Xs = [np.asarray([[np.cos(s)*np.cos(t) + eps*(np.random.rand(1)[0]-0.5),np.cos(s)*np.sin(t) + eps*(np.random.rand(1)[0]-0.5),np.sin(s) + eps*(np.random.rand(1)[0] - 0.5)] for t in range(n_range) for s in range(n_range)]) for kk in range(n_train)]\n", "test_ys = np.zeros(n_train)\n", "test_Xt = [np.asarray([[(2+np.cos(s))*np.cos(t) + eps*(np.random.rand(1)[0]-0.5),(2+np.cos(s))*np.sin(t) + eps*(np.random.rand(1)[0]-0.5),np.sin(s) + eps*(np.random.rand(1)[0] - 0.5)] for t in range(n_range) for s in range(n_range)]) for kk in range(n_train)]\n", "test_yt = np.ones(n_train)\n", "\n", "\n", "# Test set\n", "test_X = np.concatenate((test_Xs, test_Xt))\n", "test_y = np.concatenate((test_ys, test_yt))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Build persistence diagrams\n", "\n", "# The homology ranks we choose to consider\n", "homology_dimensions = (0, 1, 2)\n", "VR = VietorisRipsPersistence(\n", " metric='euclidean', max_edge_length=10, homology_dimensions=homology_dimensions)\n", "\n", "# List of all the time-ordered persistence diagrams obtained from the list of correlation matrices\n", "train_diagrams = VR.fit_transform(train_X)\n", "test_diagrams = VR.transform(test_X)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Extract and plot persistent entropies" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Extract vector features from diagrams: persistence entropy\n", "PE = PersistenceEntropy()\n", "X_train = PE.fit_transform(train_diagrams)\n", "X_test = PE.transform(test_diagrams)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_point_cloud(X_train)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Training logistic regression\n", "LR = LogisticRegression(solver='lbfgs')\n", "LR.fit(X_train, train_y)\n", "# Score\n", "LR.score(X_test, test_y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generating non-orientable surfaces\n", "\n", "We are going to consider different classical shapes: the real projective space and the Klein bottle.\n", "The purpose of the second part of the tutorial is to define shapes via a distance matrix. We also add noise to the distance matrix: the main reason is not to have overlapping points in the persistence diagram.\n", "\n", "Each of the topological spaces we are going to encounter will be sampled discretely. Afterwards, the Vietoris–Rips technique will be applied to the surface and the persistent homology groups will be computed." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "# Compute the adjacency matrix of the grid points, with boundaries identified as in the real projective space\n", "from sklearn.utils.graph_shortest_path import graph_shortest_path\n", "\n", "# This functions prepares the grid matrix with boundary identification\n", "def make_matrix(rows, cols):\n", " n = rows*cols\n", " M = np.zeros((n,n))\n", " for r in range(rows):\n", " for c in range(cols):\n", " i = r*cols + c\n", " # Two inner diagonals\n", " if c > 0: M[i-1,i] = M[i,i-1] = 1 + 0.15*(np.random.rand(1)[0]-0.5)\n", " # Two outer diagonals\n", " if r > 0: M[i-cols,i] = M[i,i-cols] = 1 + 0.15*(np.random.rand(1)[0]-0.5)\n", " # vertical twisted boundary identification\n", " if c == 0: M[n-i-1,i] = M[i,n-i-1] = 1 + 0.15*(np.random.rand(1)[0]-0.5)\n", " # horizontal twisted boundary identification\n", " if r == 0: M[n-i-1,i] = M[i,n-i-1] = 1 + 0.15*(np.random.rand(1)[0]-0.5)\n", " \n", " return M\n", "\n", "M = make_matrix(20,20)\n", "\n", "# Compute the distance matrix of the points over the Klein bottle\n", "\n", "rp2 = graph_shortest_path(M)\n", "\n", "# Plot of the distance matrix\n", "plot_heatmap(rp2, colorscale='viridis')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Compute the adjacency matrix of the grid points, with boundaries identified as in the Klein bottle\n", "from sklearn.utils.graph_shortest_path import graph_shortest_path\n", "\n", "# This functions prepares the grid matrix with boundary identification\n", "def make_matrix(rows, cols):\n", " n = rows*cols\n", " M = np.zeros((n,n))\n", " for r in range(rows):\n", " for c in range(cols):\n", " i = r*cols + c\n", " # Two inner diagonals\n", " if c > 0: M[i-1,i] = M[i,i-1] = 1 + 0.15*(np.random.rand(1)[0]-0.5)\n", " # Two outer diagonals\n", " if r > 0: M[i-cols,i] = M[i,i-cols] = 1 + 0.15*(np.random.rand(1)[0]-0.5)\n", " # vertical boundary identification\n", " if c == 0: M[i+cols-1,i] = M[i,i+cols-1] = 1 + 0.15*(np.random.rand(1)[0]-0.5)\n", " # horizontal twisted boundary identification\n", " if r == 0: M[n-i-1,i] = M[i,n-i-1] = 1 + 0.15*(np.random.rand(1)[0]-0.5)\n", " \n", " return M\n", "\n", "M = make_matrix(20,20)\n", "\n", "# computing the distance matrix of the points over the Klein bottle\n", "\n", "klein = graph_shortest_path(M)\n", "\n", "# Plot of the distance matrix\n", "plot_heatmap(klein, colorscale='viridis')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Saving the results into an array\n", "topological_spaces_mat = np.asarray([rp2, klein])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Computing persistent homology\n", "\n", "In the next section we will use `giotto-tda` to compute the persistent homology groups of the topological spaces we just constructed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# the homology ranks we choose to consider\n", "homology_dimensions = (0, 1, 2)\n", "VR = VietorisRipsPersistence(\n", " metric='precomputed', max_edge_length=np.inf, homology_dimensions=homology_dimensions)\n", "\n", "# List of all the time-ordered persistence diagrams obtained from the list of correlation matrices\n", "diagrams = VR.fit_transform(topological_spaces_mat)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Persistence diagrams\n", "\n", "The topological information of the point cloud is synthesised in the persistence diagram. The horizontal axis corresponds to the moment in which an homological generator is born, while the vertical axis corresponds to the moments in which an homological generator dies.\n", "\n", "The generators of the homology groups (at given rank) are colored differently." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plotting the persistence diagram of the projective space\n", "plot_diagram(diagrams[0])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plotting the persistence diagram of the Klein bottle\n", "plot_diagram(diagrams[1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion\n", "\n", "As you can see from the persistence diagrams, all the Betti numbers were found. Some other persistent generators are also appearing, depending on how dense the sampling is and how much noise there is.\n", "\n", "For example, we see a rather neat persistence diagram over the Klein bottle (we see two persistent generators for $H_1$ and one persistent generator for $H_2$). Notice that all these homology groups are computed over the field $\\mathbb{F}_2$." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.1" } }, "nbformat": 4, "nbformat_minor": 4 }