{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Topological feature extraction using `VietorisRipsPersistence` and `PersistenceEntropy`\n", "\n", "In this notebook, we showcase the ease of use of one of the core components of `giotto-tda`: `VietorisRipsPersistence`, along with vectorisation methods. We first list steps in a typical, topological-feature extraction routine and then show to encapsulate them with a standard `scikit-learn`–like pipeline.\n", "\n", "If you are looking at a static version of this notebook and would like to run its contents, head over to [github](https://github.com/giotto-ai/giotto-tda/blob/master/examples/vietoris_rips_quickstart.ipynb).\n", "\n", "**License: AGPLv3**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import libraries" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from gtda.diagrams import PersistenceEntropy\n", "from gtda.homology import VietorisRipsPersistence\n", "from sklearn.ensemble import RandomForestClassifier\n", "from sklearn.model_selection import train_test_split" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generate data\n", "\n", "Let's begin by generating 3D point clouds of spheres and tori, along with a label of 0 (1) for each sphere (torus). We also add noise to each point cloud, whose effect is to displace the points sampling the surfaces by a random amount in a random direction. **Note**: You will need the auxiliary module [datasets.py](https://github.com/giotto-ai/giotto-tda/blob/master/examples/datasets.py) to run this cell." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from datasets import generate_point_clouds\n", "point_clouds, labels = generate_point_clouds(100, 10, 0.1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Calculate persistent homology\n", "\n", "Instantiate a `VietorisRipsPersistence` transformer and calculate persistence diagrams for this collection of point clouds." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "vietorisrips_tr = VietorisRipsPersistence()\n", "diagrams = vietorisrips_tr.fit_transform(point_clouds)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Extract features\n", "\n", "Instantiate a `PersistenceEntropy` transformer and extract features from the persistence diagrams." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "entropy_tr = PersistenceEntropy()\n", "features = entropy_tr.fit_transform(diagrams)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Use the new features in a standard classifier\n", "\n", "Leverage the compatibility with `scikit-learn` to perform a train-test split and score the features." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X_train, X_valid, y_train, y_valid = train_test_split(features, labels)\n", "model = RandomForestClassifier()\n", "model.fit(X_train, y_train)\n", "model.score(X_valid, y_valid)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Encapsulates the steps above in a pipeline\n", "\n", "Subdivide into train-validation first, and use the pipeline." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from gtda.pipeline import make_pipeline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define the pipeline\n", "\n", "Chain transformers from `giotto-tda` with `scikit-learn` ones." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "steps = [VietorisRipsPersistence(),\n", " PersistenceEntropy(),\n", " RandomForestClassifier()]\n", "pipeline = make_pipeline(*steps)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prepare the data\n", "Train-test split on the point-cloud data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pcs_train, pcs_valid, labels_train, labels_valid = train_test_split(\n", " point_clouds, labels)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Train and score" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pipeline.fit(pcs_train, labels_train)\n", "pipeline.score(pcs_valid, labels_valid)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }