{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Case study: Lorenz attractor\n", "\n", "This notebook contains a full TDA pipeline to analyse the transitions of the Lorenz system to a chaotic regime from the stable one and viceversa.\n", "\n", "If you are looking at a static version of this notebook and would like to run its contents, head over to [github](https://github.com/giotto-ai/giotto-tda/blob/master/examples/lorenz_attractor.ipynb).\n", "\n", "**License: AGPLv3**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import libraries\n", "The first step consists in importing relevant *gtda* components and other useful libraries or modules." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Import the gtda modules\n", "from gtda.time_series import Resampler, TakensEmbedding, SlidingWindow, PermutationEntropy\n", "from gtda.homology import VietorisRipsPersistence\n", "from gtda.diagrams import Scaler, Filtering, PersistenceEntropy, BettiCurve, PairwiseDistance\n", "from gtda.graphs import KNeighborsGraph, GraphGeodesicDistance\n", "\n", "from gtda.pipeline import Pipeline\n", "\n", "import numpy as np\n", "from sklearn.metrics import pairwise_distances\n", "\n", "import plotly.express as px\n", "from plotly.offline import init_notebook_mode, iplot\n", "init_notebook_mode(connected=True)\n", "\n", "# gtda plotting functions\n", "from gtda.plotting import plot_heatmap\n", "\n", "# Import data from openml\n", "import openml" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plotting functions\n", "from gtda.plotting import plot_diagram, plot_betti_surfaces\n", "from gtda.plotting import plot_point_cloud" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setting up the Lorenz attractor simulation\n", "\n", "In the next block we set up all the parameters of the Lorenz system and we define also the instants at which the regime (stable VS chaotic) changes." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plotting the trajectories of the Lorenz system\n", "from openml.datasets.functions import get_dataset\n", "\n", "point_cloud = get_dataset(42182).get_data(dataset_format='array')[0]\n", "plot_point_cloud(point_cloud)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Selecting the z-axis and the label rho\n", "X = point_cloud[:,2]\n", "y = point_cloud[:,3]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig = px.line(title='Trajectory of the Lorenz solution, projected along the z-axis')\n", "fig.add_scatter(y=X, name='X')\n", "fig.add_scatter(y=y, name='y')\n", "fig.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Resampling the time series\n", "\n", "It is important to find the correct time scale at which key signals take place. Here we propose one possible resampling period: *10h*. Recall that the unit time is *1h*. The resampler method is used to perform the resampling." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "period = 10\n", "periodicSampler = Resampler(period=period)\n", "\n", "X_sampled, y_sampled = periodicSampler.fit_transform_resample(X, y)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig = px.line(title='Trajectory of the Lorenz solution, projected along the z-axis and resampled every 10h')\n", "fig.add_scatter(y=X_sampled.flatten(), name='X_sampled')\n", "fig.add_scatter(y=y_sampled, name='y_sampled')\n", "fig.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Takens Embedding\n", "\n", "In order to obtain meaningful topological features from a time series, we use a delayed-time embedding technique, invented by F. Takens in the late 1960s.\n", "The idea is simple: given a time series $X(t)$, one can extract a sequence of vectors of the form $X_i := [(X(t_i)), X(t_i + 2 \\tau), ..., X(t_i + M \\tau)]$.\n", "The difference between $t_i$ and $t_{i-1}$ is called *stride*.\n", "\n", "$M$ and $\\tau$ are optimized automatically in this example (they can be set by the user if needed)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "embedding_dimension = 10\n", "time_delay = 3\n", "TE = TakensEmbedding(\n", " parameters_type='search', dimension=embedding_dimension, time_delay=time_delay)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "TE.fit(X_sampled)\n", "time_delay_ = TE.time_delay_\n", "embedding_dimension_ = TE.dimension_\n", "\n", "print('Optimal embedding time delay based on mutual information: ', time_delay_)\n", "print('Optimal embedding dimension based on false nearest neighbors: ', embedding_dimension_)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X_embedded, y_embedded = TE.transform_resample(X_sampled, y_sampled)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also \"localise\" our Takens embedding procedure to multiple sliding windows over the data, rather than over the whole time series as we just did. The result is therefore a \"time series of point clouds\" with possibly interesting topologies, which we will be able to feed directly to our homology transformers." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "window_width = 40\n", "window_stride = 5\n", "SW = SlidingWindow(width=window_width, stride=window_stride)\n", "\n", "X_windows, y_windows = SW.fit_transform_resample(X_embedded, y_embedded)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can plot the Takens embedding of a specific window either by using `plot_point_cloud`, or by using the `plot` method of `SlidingWindow`, as follows (*note*: when `embedding_dimension > 3`, only the first three coordinates are plotted!):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "window_number = 3\n", "SW.plot(X_windows, sample=window_number)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For comparison, here is the portion of time series containing the data which originates this point cloud. Notice the quasi-periodicity, corresponding to the loop in the point cloud." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "embedded_begin, embedded_end = SW._slice_windows(X_embedded)[window_number]\n", "window_indices = np.arange(embedded_begin, embedded_end + time_delay_ * (embedding_dimension_ - 1))\n", "fig = px.line(title=f'Resampled Lorenz solution over sliding window {window_number}')\n", "fig.add_scatter(x=window_indices, y=X_sampled[window_indices], name='X_sampled')\n", "fig.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Persistence diagram\n", "The topological information in the embedding is synthesised via the persistence diagram. The horizontal axis corresponds to the moment in which a homological generator is born, while the vertical axis corresponds to the moments in which a homological generator dies.\n", "The generators of the homology groups (at given rank) are colored differently." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "homology_dimensions = (0, 1, 2)\n", "VR = VietorisRipsPersistence(\n", " metric='euclidean', max_edge_length=100, homology_dimensions=homology_dimensions)\n", "\n", "X_diagrams = VR.fit_transform(X_windows)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can plot the persistence diagram for the embedding of the same sliding window as before. One way is using the `plot_diagram` function." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_diagram(X_diagrams[window_number])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatively, we could have used the `plot` method of `VietorisRipsPersistence` as follows:\n", "```\n", "VR.plot(X_diagrams, sample=window_number)\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Scikit-learn–style pipeline\n", "\n", "One of the advantages of `giotto-tda` is the compatibility with `scikit-learn`. It is possible to set up and run a full pipeline such as the one above in a few lines:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Steps of the Pipeline\n", "steps = [\n", " ('sampling', periodicSampler),\n", " ('embedding', TE),\n", " ('window', SW),\n", " ('diagrams', VR)\n", "]\n", "\n", "# Define the Pipeline\n", "pipeline = Pipeline(steps)\n", "\n", "# Run the pipeline\n", "X_diagrams = pipeline.fit_transform(X)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The final result is the same as before:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_diagram(X_diagrams[window_number])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Rescaling the diagram\n", "\n", "Rescaling a diagram means normalizing points such that the maximum \"bottleneck distance\" from the *empty diagram* (by default, across all homology dimensions) is equal to one. Notice that this means the birth and death scales are modified. We can use `Scaler` as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "diagramScaler = Scaler()\n", "\n", "X_scaled = diagramScaler.fit_transform(X_diagrams)\n", "\n", "diagramScaler.plot(X_scaled, sample=window_number)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Filtering diagrams\n", "\n", "Filtering a diagram means eliminating the homology generators whose lifespan is considererd too short to be significant. We can use `Filtering` as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "diagramFiltering = Filtering(epsilon=0.1, homology_dimensions=(1, 2))\n", "\n", "X_filtered = diagramFiltering.fit_transform(X_scaled)\n", "\n", "diagramFiltering.plot(X_filtered, sample=window_number)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can add the steps above to our pipeline:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "steps_new = [\n", " ('scaler', diagramScaler),\n", " ('filtering', diagramFiltering)\n", "]\n", "\n", "pipeline_filter = Pipeline(steps + steps_new)\n", "\n", "X_filtered = pipeline_filter.fit_transform(X)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_diagram(X_filtered[window_number])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Persistence entropy\n", "\n", "In this section we show how to compute the *entropy* of persistence diagrams." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "PE = PersistenceEntropy()\n", "\n", "X_persistence_entropy = PE.fit_transform(X_scaled)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig = px.line(title='Persistence entropies, indexed by sliding window number')\n", "for dim in range(X_persistence_entropy.shape[1]):\n", " fig.add_scatter(y=X_persistence_entropy[:, dim], name=f'PE in homology dimension {dim}')\n", "fig.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Betti Curves\n", "\n", "In this section we show how to compute the Betti curves of a persistence diagram. We also show the plot of the Betti surface, i.e. the time-stack of the Betti curves." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "BC = BettiCurve()\n", "\n", "X_betti_curves = BC.fit_transform(X_scaled)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "BC.plot(X_betti_curves, sample=window_number)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Distances among diagrams\n", "\n", "In this section we show how to compute several notions of distances among persistence diagrams.\n", "\n", "In each case, we will obtain distance matrices whose i-th row encodes the distance of the i-th diagram from all the others.\n", "\n", "We start with the so-called \"landscape $L^2$ distance\": when the parameter `order` is `None`, the output is one distance matrix per sample and homology dimension." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "p_L = 2\n", "n_layers = 5\n", "PD = PairwiseDistance(\n", " metric='landscape', metric_params={'p': p_L, 'n_layers': n_layers, 'n_bins': 1000}, order=None)\n", "\n", "X_distance_L = PD.fit_transform(X_diagrams)\n", "print(X_distance_L.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is what distances in homology dimension 0 look like:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_heatmap(X_distance_L[:, :, 0], colorscale='blues')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now change metric and compute the \"$2$-Wasserstein distances\" between the diagrams. This one takes longer to compute!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "p_W = 2\n", "PD = PairwiseDistance(\n", " metric='wasserstein', metric_params={'p': p_W, 'delta': 0.1}, order=None)\n", "\n", "X_distance_W = PD.fit_transform(X_diagrams)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And again this is what distances in homology dimension 0 look like:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_heatmap(X_distance_W[:, :, 0], colorscale='blues')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that how dramatically things can change when the metrics are modified." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## New distances in the embedding space: kNN graphs and geodesic distances\n", "\n", "We propose here a new way to compute distances between points in the embedding space. Instead of considering the Euclidean distance in the Takens space, we propose to build a $k$-nearest neighbors graph and then use the geodesic distance on such graph." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n_neighbors = 3\n", "kNN = KNeighborsGraph(n_neighbors=n_neighbors)\n", "\n", "X_kNN = kNN.fit_transform(X_windows)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Given the graph embedding, the natural notion of distance between vertices corresponds to the lengths of the shortest path connecting two vertices. This is also known as *graph geodesic distance*. We compute it and plot it for our chosen window number." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "GGD = GraphGeodesicDistance()\n", "\n", "GGD.fit_transform_plot(X_kNN, sample=window_number);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For comparison, this is what the ordinary pairwise Euclidean distance matrix looks like for the same window:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_heatmap(pairwise_distances(X_windows[window_number]), colorscale='blues')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is what the first few steps (before scaling and filtering) of the pipeline described above would be if you'd like persistence diagrams to be obtained using this new distance instead. Notice that we have to pass `metric='precomputed'` to the `VietorisRipsPersistence` constructor this time, because the input already consists of distance matrices:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Steps of the Pipeline\n", "steps = [\n", " ('sampling', periodicSampler),\n", " ('embedding', TE),\n", " ('window', SW),\n", " ('kNN_graph', kNN),\n", " ('graph_geo_distance', GGD),\n", " ('diagrams', VietorisRipsPersistence(\n", " metric='precomputed', max_edge_length=100, homology_dimensions=homology_dimensions))\n", "]\n", "\n", "# Define the Pipeline\n", "pipeline = Pipeline(steps)\n", "\n", "# Run the pipeline\n", "X_diagrams = pipeline.fit_transform(X)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's look at the persistence diagrams obtained with this new distance:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_diagram(X_diagrams[window_number])" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }