{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Topological feature extraction using `VietorisRipsPersistence` and `PersistenceEntropy`\n", "\n", "In this notebook, we showcase the ease of use of one of the core components of `giotto-tda`: `VietorisRipsPersistence`, along with vectorisation methods. We first list steps in a typical, topological-feature extraction routine and then show to encapsulate them with a standard `scikit-learn`–like pipeline.\n", "\n", "If you are looking at a static version of this notebook and would like to run its contents, head over to [github](https://github.com/giotto-ai/giotto-tda/blob/master/examples/vietoris_rips_quickstart.ipynb).\n", "\n", "**License: AGPLv3**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import libraries" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from gtda.diagrams import PersistenceEntropy\n", "from gtda.homology import VietorisRipsPersistence\n", "from sklearn.ensemble import RandomForestClassifier\n", "from sklearn.model_selection import train_test_split" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generate data\n", "\n", "Let's begin by generating 3D point clouds of spheres and tori, along with a label of 0 (1) for each sphere (torus). We also add noise to each point cloud, whose effect is to displace the points sampling the surfaces by a random amount in a random direction. **Note**: You will need the auxiliary module [datasets.py](https://github.com/giotto-ai/giotto-tda/blob/master/examples/datasets.py) to run this cell." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from datasets import generate_point_clouds\n", "point_clouds, labels = generate_point_clouds(100, 10, 0.1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Calculate persistent homology\n", "\n", "Instantiate a `VietorisRipsPersistence` transformer and calculate persistence diagrams for this collection of point clouds." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "vietorisrips_tr = VietorisRipsPersistence()\n", "diagrams = vietorisrips_tr.fit_transform(point_clouds)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Extract features\n", "\n", "Instantiate a `PersistenceEntropy` transformer and extract features from the persistence diagrams." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "entropy_tr = PersistenceEntropy()\n", "features = entropy_tr.fit_transform(diagrams)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Use the new features in a standard classifier\n", "\n", "Leverage the compatibility with `scikit-learn` to perform a train-test split and score the features." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X_train, X_valid, y_train, y_valid = train_test_split(features, labels)\n", "model = RandomForestClassifier()\n", "model.fit(X_train, y_train)\n", "model.score(X_valid, y_valid)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Encapsulates the steps above in a pipeline\n", "\n", "Subdivide into train-validation first, and use the pipeline." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from gtda.pipeline import make_pipeline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define the pipeline\n", "\n", "Chain transformers from `giotto-tda` with `scikit-learn` ones." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "steps = [VietorisRipsPersistence(),\n", " PersistenceEntropy(),\n", " RandomForestClassifier()]\n", "pipeline = make_pipeline(*steps)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prepare the data\n", "Train-test split on the point-cloud data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pcs_train, pcs_valid, labels_train, labels_valid = train_test_split(\n", " point_clouds, labels)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Train and score" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pipeline.fit(pcs_train, labels_train)\n", "pipeline.score(pcs_valid, labels_valid)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }