{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Case study: Classification of shapes\n", "\n", "##### License: AGPLv3\n", "\n", "\n", "The following notebook explains how to use giotto-tda to be able to classify topologically different high-dimensional spaces.\n", "\n", "The first step consists in importing the *gtda* module." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Importing libraries\n", "import gtda as go\n", "import gtda.time_series as ts\n", "import gtda.diagrams as diag\n", "import gtda.homology as hl\n", "from gtda.diagrams import PersistenceEntropy, BettiCurve, PersistenceLandscape, HeatKernel\n", "import numpy as np\n", "import sklearn as sk\n", "from sklearn.pipeline import Pipeline\n", "from sklearn.metrics import pairwise_distances\n", "\n", "import matplotlib.pyplot as plt\n", "from mpl_toolkits.mplot3d import Axes3D \n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Plotting functions\n", "\n", "The *plotting.py* file is required to use the following plotting functions. It can be found in the *examples* folder on out github ." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plotting functions\n", "from plotting import plot_diagram, plot_landscapes\n", "from plotting import plot_betti_surfaces, plot_betti_curves\n", "from plotting import plot_point_cloud" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Sampling orientable surfaces\n", "\n", "We are going to consider three classical topological spaces: the circle, the 2-torus and the 2-sphere.\n", "The purpose of this tutorial is to go thgough the most famous topological spaces and compute their homology groups.\n", "\n", "Each of the topological spaces we are going to encounter will be sampled. The resulting point clound will be the input of the *persistent homology pipeline*. The first step is to apply the Vietoris-Rips technique to the point cloud. Finally, the persistent homology groups will be computed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Representing the circle in 3d with parametric equations.\n", "circle = np.asarray([[np.sin(t),np.cos(t),0] for t in range(400)])\n", "plot_point_cloud(circle)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "# Representing the sphere in 3d with parametric equations\n", "sphere = np.asarray([[np.cos(s)*np.cos(t),np.cos(s)*np.sin(t),np.sin(s)] for t in range(20) for s in range(20)])\n", "plot_point_cloud(sphere)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Representing the torus in 3d with parametric equations\n", "torus = np.asarray([[(2+np.cos(s))*np.cos(t),(2+np.cos(s))*np.sin(t),np.sin(s)] for t in range(20) for s in range(20)])\n", "plot_point_cloud(torus)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Saving the results into an array\n", "\n", "topological_spaces=np.asarray([circle,sphere,torus])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Computing persistent homology\n", "\n", "In the next section we will use giotto-tda to compute the persistent homology groups of the topological spaces we just constructed.\n", "We will use the Vietoris-Rips technique to generate a filtration out of a point cloud:\n", "\n", "![SegmentLocal](https://miro.medium.com/max/1200/1*w3BiQI1OX93KXcezctRQTQ.gif \"segment\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "# the homology ranks we choose to consider\n", "homologyDimensions = (0, 1 ,2)\n", "persistenceDiagram = hl.VietorisRipsPersistence(metric='euclidean', max_edge_length=10, \n", " homology_dimensions=homologyDimensions, \n", " n_jobs=-1)\n", "persistenceDiagram.fit(topological_spaces)\n", "\n", "# List of all the time-pordered persistent diagrams obtained from the list of correlation matrices\n", "Diagrams = persistenceDiagram.transform(topological_spaces)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(Diagrams.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Persistent diagrams\n", "\n", "The topological information of the point cloud is synthesised in the persistent diagram. The horizonral axis corresponds to the moment in which an homological generator is born, while the vertical axis corresponds to the moments in which an homological generator dies.\n", "The generators of the homology groups (at given rank) are colored differently" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# plotting the persistent diagram of the circle\n", "plot_diagram(Diagrams[0])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# plotting the persistent diagram of the sphere\n", "plot_diagram(Diagrams[1])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# plotting the persistent diagram of the torus\n", "plot_diagram(Diagrams[2])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Conclusion of the first part\n", "As you can see from the persistence diagrams, all the betti numbers were found. Some other persistent generators are also appearing, depending on how dense the sampling is and how much noise there is. For example, we see a rahter neat persistent diagram over the Torus bottle (wesee 2 persistent generators for $H^1$ and 1 persistent generator for $H^2$). Notice though that there are other persistent H1 generators, possibly due to the non-uniform sampling method we used for the torus.\n", "On the other hand, the persistent diagram for the circle is as perfect as it could be: one unique generator of $H^1$ and no other persistent generator, as expeceted.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Classification of noisy orientable surfaces\n", "In the next section we generate hundreds of noisy spheres and noisy tori. The effect of noise is to displace the points sampling the aforementioned surfaces by a random amount in a random direction.\n", "The Vietoris-Rips algorithm is used to compute persistent diagrams. Afterwards, from each diagram, the *persistece entropy* is computed. \n", "A simple linear classifier is then applied to the 3d-space of entropies to classify shapes." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# building dataset\n", "n_train = 10\n", "n_range = 15\n", "eps = 0.3\n", "\n", "train_Xs = [np.asarray([[np.cos(s)*np.cos(t) + eps*(np.random.rand(1)[0]-0.5),np.cos(s)*np.sin(t) + eps*(np.random.rand(1)[0]-0.5),np.sin(s) + eps*(np.random.rand(1)[0] - 0.5)] for t in range(n_range) for s in range(n_range)]) for kk in range(n_train)]\n", "train_ys = np.zeros(n_train)\n", "train_Xt = [np.asarray([[(2+np.cos(s))*np.cos(t) + eps*(np.random.rand(1)[0]-0.5),(2+np.cos(s))*np.sin(t) + eps*(np.random.rand(1)[0]-0.5),np.sin(s) + eps*(np.random.rand(1)[0] - 0.5)] for t in range(n_range) for s in range(n_range)]) for kk in range(n_train)]\n", "train_yt = np.ones(n_train)\n", "\n", "# complete training set\n", "train_X = np.concatenate((train_Xs, train_Xt))\n", "train_y = np.concatenate((train_ys, train_yt))\n", "\n", "test_Xs = [np.asarray([[np.cos(s)*np.cos(t) + eps*(np.random.rand(1)[0]-0.5),np.cos(s)*np.sin(t) + eps*(np.random.rand(1)[0]-0.5),np.sin(s) + eps*(np.random.rand(1)[0] - 0.5)] for t in range(n_range) for s in range(n_range)]) for kk in range(n_train)]\n", "test_ys = np.zeros(n_train)\n", "test_Xt = [np.asarray([[(2+np.cos(s))*np.cos(t) + eps*(np.random.rand(1)[0]-0.5),(2+np.cos(s))*np.sin(t) + eps*(np.random.rand(1)[0]-0.5),np.sin(s) + eps*(np.random.rand(1)[0] - 0.5)] for t in range(n_range) for s in range(n_range)]) for kk in range(n_train)]\n", "test_yt = np.ones(n_train)\n", "\n", "\n", "# complete test set\n", "test_X = np.concatenate((test_Xs, test_Xt))\n", "test_y = np.concatenate((test_ys, test_yt))\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# build persistent diagrams\n", "\n", "# the homology ranks we choose to consider\n", "homologyDimensions = (0, 1, 2)\n", "persistenceDiagram = hl.VietorisRipsPersistence(metric='euclidean', \n", " max_edge_length=10, \n", " homology_dimensions=homologyDimensions, \n", " n_jobs=4)\n", "persistenceDiagram.fit(np.asarray(train_X))\n", "\n", "# List of all the time-pordered persistent diagrams obtained from the list of correlation matrices\n", "train_Diagrams = persistenceDiagram.transform(np.asarray(train_X))\n", "test_Diagrams = persistenceDiagram.transform(np.asarray(test_X))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Extract and plot persistent entropies" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Extract vector features from diagrams: persistence entropy\n", "pe = PersistenceEntropy()\n", "pe.fit(train_Diagrams)\n", "X_train = pe.transform(train_Diagrams)\n", "X_test = pe.transform(test_Diagrams)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_point_cloud(X_train)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# training logistic regression\n", "lr = sk.linear_model.LogisticRegression(solver='lbfgs')\n", "lr.fit(X_train, train_y)\n", "# score\n", "lr.score(X_test, test_y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Generating non-orientable surfaces\n", "\n", "We are going to consider different classical shapes: the real projective space and the Klein bottle.\n", "The purpose of the second part of the tutorial is to define shapes via a distance matrix. We also add noise to the distace matrix: the main reason is not to have overlapping points in the persistent diagram.\n", "\n", "Each of the topological spaces we are going to encounter will be sampled discretely. Aftewards the Vietoris-Rips technique will be applied to the surface and the persistent homology groups will be computed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# computing the adjacency matrix of the grid points, with boundaries identified as in the real projective space\n", "from sklearn.utils.graph_shortest_path import graph_shortest_path\n", "\n", "# This functions prepares the grid matrix with boundary identification\n", "def make_matrix(rows, cols):\n", " n = rows*cols\n", " M = np.zeros((n,n))\n", " for r in range(rows):\n", " for c in range(cols):\n", " i = r*cols + c\n", " # Two inner diagonals\n", " if c > 0: M[i-1,i] = M[i,i-1] = 1 + 0.15*(np.random.rand(1)[0]-0.5)\n", " # Two outer diagonals\n", " if r > 0: M[i-cols,i] = M[i,i-cols] = 1 + 0.15*(np.random.rand(1)[0]-0.5)\n", " # vertical twisted boundary identification\n", " if c == 0: M[n-i-1,i] = M[i,n-i-1] = 1 + 0.15*(np.random.rand(1)[0]-0.5)\n", " # horizontal twisted boundary identification\n", " if r == 0: M[n-i-1,i] = M[i,n-i-1] = 1 + 0.15*(np.random.rand(1)[0]-0.5)\n", " \n", " return M\n", "\n", "M = make_matrix(20,20)\n", "\n", "# computing the distance matrix of the points over the Klein bottle\n", "\n", "rp2 = graph_shortest_path(M)\n", "\n", "# Plot of the distance matrix\n", "figure = plt.figure(figsize=(10,10))\n", "plt.imshow(rp2)\n", "plt.title('Reciprocal distance between points over the Klein bottle')\n", "plt.colorbar()\n", "plt.show()\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# computing the adjacency matrix of the grid points, with boundaries identified as in the klein bottle\n", "from sklearn.utils.graph_shortest_path import graph_shortest_path\n", "\n", "# This functions prepares the grid matrix with boundary identification\n", "def make_matrix(rows, cols):\n", " n = rows*cols\n", " M = np.zeros((n,n))\n", " for r in range(rows):\n", " for c in range(cols):\n", " i = r*cols + c\n", " # Two inner diagonals\n", " if c > 0: M[i-1,i] = M[i,i-1] = 1 + 0.15*(np.random.rand(1)[0]-0.5)\n", " # Two outer diagonals\n", " if r > 0: M[i-cols,i] = M[i,i-cols] = 1 + 0.15*(np.random.rand(1)[0]-0.5)\n", " # vertical boundary identification\n", " if c == 0: M[i+cols-1,i] = M[i,i+cols-1] = 1 + 0.15*(np.random.rand(1)[0]-0.5)\n", " # horizontal twisted boundary identification\n", " if r == 0: M[n-i-1,i] = M[i,n-i-1] = 1 + 0.15*(np.random.rand(1)[0]-0.5)\n", " \n", " return M\n", "\n", "M = make_matrix(20,20)\n", "\n", "# computing the distance matrix of the points over the Klein bottle\n", "\n", "klein = graph_shortest_path(M)\n", "\n", "# Plot of the distance matrix\n", "figure = plt.figure(figsize=(10,10))\n", "plt.imshow(klein)\n", "plt.title('Reciprocal distance between points over the Klein bottle')\n", "plt.colorbar()\n", "plt.show()\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Saving the results into an array\n", "\n", "topological_spaces_mat=np.asarray([rp2, klein])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Computing persistent homology\n", "\n", "In the next section we will use giotto-tda to compute the persistent homology groups of the topological spaces we just constructed" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "# the homology ranks we choose to consider\n", "homologyDimensions = (0, 1 ,2)\n", "persistenceDiagram = hl.VietorisRipsPersistence(metric='precomputed', max_edge_length=np.inf, homology_dimensions=homologyDimensions, n_jobs=-1)\n", "persistenceDiagram.fit(topological_spaces_mat)\n", "\n", "# List of all the time-pordered persistent diagrams obtained from the list of correlation matrices\n", "zDiagrams = persistenceDiagram.transform(topological_spaces_mat)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Persistent diagrams\n", "\n", "The topological information of the point cloud is synthesised in the persistent diagram. The horizonral axis corresponds to the moment in which an homological generator is born, while the vertical axis corresponds to the moments in which an homological generator dies.\n", "The generators of the homology groups (at given rank) are colored differently" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# plotting the persistent diagram of the projective space\n", "plot_diagram(zDiagrams[0])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# plotting the persistent diagram of the Klein bottle\n", "plot_diagram(zDiagrams[1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Conclusion\n", "\n", "As you can see from the persistence diagrams, all the betti numbers were found. \n", "Some other persistent generators are also appearing, depending on how dense the sampling is and how much noise there is.\n", "For example, we see a rahter neat persistent diagram over the Klein bottle (wesee 2 persistent generators for $H^1$ and 1 persistent generator for $H^2$). Notice that all these homology groups are computed over the field $\\mathbb{F}_2$." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }