{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Case study: Lorenz attractor\n", "##### License: Apache 2.0\n", "\n", "\n", "This notebook contains a full TDA pipeline to analyse the transitions of the Lorenz system to a chaotic regime from the stable one and viceversa.\n", "\n", "The first step consists in importing the *giotto* library." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Import the giotto library\n", "import giotto as go\n", "import giotto.time_series as ts\n", "import giotto.graphs as gr\n", "import giotto.diagrams as diag\n", "import giotto.homology as hl\n", "import numpy as np\n", "import sklearn as sk\n", "from giotto.pipeline import Pipeline\n", "from sklearn.ensemble import RandomForestRegressor\n", "import matplotlib.pyplot as plt\n", "\n", "import plotly.graph_objs as gobj\n", "from plotly.offline import init_notebook_mode, iplot\n", "init_notebook_mode(connected=True)\n", "\n", "# import data from openml\n", "import openml\n", "openml.config.start_using_configuration_for_example()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Plotting functions\n", "\n", "The *plotting.py* file is required to use the following plotting functions. It can be found in the *examples* folder on out github." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plotting functions\n", "from plotting import plot_diagram, plot_landscapes\n", "from plotting import plot_betti_surfaces, plot_betti_curves\n", "from plotting import plot_point_cloud" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Setting up the Lorenz attractor simulation\n", "\n", "In the next block we set up all the parameters of the Lorenz system and we define also the instants at which the regime (stable VS chaotic) changes." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plotting the trajectories of the Lorenz system\n", "from openml.datasets.functions import get_dataset\n", "\n", "point_cloud = get_dataset(74800).get_data(dataset_format='array')[0]\n", "plot_point_cloud(point_cloud)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Selecting the z-axis and the label rho\n", "X = point_cloud[:,2].reshape(-1, 1)\n", "y = point_cloud[:,3]\n", "\n", "# Plotting the Lorenz system\n", "fig = plt.figure(figsize=(16,6))\n", "plt.plot(X)\n", "plt.plot(y)\n", "plt.title('Trajectory of the Lorenz solution, projected along the z-axis')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Resampling the time series\n", "\n", "It is important to find the correct time scale at which key signals take place. Here we propose one possible resampling period: *10h*. Recall that the unit time is *1h*. The resampler method is used to perform the resampling." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "period = 10\n", "periodicSampler = ts.Resampler(period=period)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "periodicSampler.fit(X)\n", "X_sampled, y_sampled = periodicSampler.transform_resample(X, y)\n", "\n", "fig = plt.figure(figsize=(16,6))\n", "plt.plot(X_sampled)\n", "plt.plot(y_sampled)\n", "plt.title('Trajectory of the Lorenz solution, projected along the z-axis and resampled every 10h')\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Steps of the Pipeline\n", "steps = [\n", " ('sampling', ts.Resampler(period=period)),\n", " ('regressor', RandomForestRegressor(n_estimators=100))\n", "]\n", "\n", "# Sklearn Pipeline\n", "pipeline = Pipeline(steps)\n", "\n", "pipeline.fit(X, y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Takens Embedding\n", "\n", "In order to obtain meaningful topological features from a time series, we use a delayed-time embedding technique, invented by F. Takens in the late sixties.\n", "The idea is simple: given a time series X(t), one can extract a sequence of vectors of the form X_i := [(X(t_i)), X(t_i + 2 tau), ..., X(t_i + M tau)].\n", "The difference between t_i and t_i-1 is called *stride*; the numbers M and tau are optimized authomatically in this example (they can be set by the user if needed).\n", " \n", "The *outer window* allows us to apply Takens embedding locally on a certain interval rather than over the whole time series. The result of this procedure is therefore a time series of point clouds with possibly interesting topologies.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "embedding_dimension = 10\n", "embedding_time_delay = 3\n", "embedder = ts.TakensEmbedding(parameters_type='search', dimension=embedding_dimension, \n", " time_delay=embedding_time_delay, n_jobs=-1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "embedder.fit(X)\n", "embedder_time_delay = embedder.time_delay_\n", "embedder_dimension = embedder.dimension_\n", "\n", "print('Optimal embedding time delay based on mutual information: ', embedder_time_delay)\n", "print('Optimal embedding dimension based on false nearest neighbors: ', embedder_dimension)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X_embedded, y_embedded = embedder.transform_resample(X_sampled, y_sampled)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "window_width = 40\n", "window_stride = 5\n", "sliding_window = ts.SlidingWindow(width=window_width, stride=window_stride)\n", "sliding_window.fit(X_embedded, y_embedded)\n", "\n", "X_windows, y_windows = sliding_window.transform_resample(X_embedded, y_embedded)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plotting Takens embedding of the third outer window\n", "window_number = 3\n", "window = X_windows[window_number][:,:3]\n", "plot_point_cloud(window)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plotting the time series associated to the above point cloud. \n", "# Notice the two periodicities corresponding to the loops in the embedding\n", "fig = plt.figure(figsize=(16,6))\n", "plt.plot(X_sampled[(window_number*window_stride)*embedder_time_delay*embedder_dimension:\n", " (window_number*window_stride + window_width)*embedder_time_delay*embedder_dimension])\n", "plt.title('The Lorenz solution over one outer window ')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Persistence diagram\n", "The topological information of the correlation metric is synthesised in the persistent diagram. The horizonral axis corresponds to the moment in which an homological generator is born, while the vertical axis corresponds to the moments in which an homological generator dies.\n", "The generators of the homology groups (at given rank) are colored differently." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "homology_dimensions = [0, 1, 2]\n", "persistenceDiagram = hl.VietorisRipsPersistence(metric='euclidean', max_edge_length=100, \n", " homology_dimensions=homology_dimensions, n_jobs=1)\n", "persistenceDiagram.fit(X_windows)\n", "X_diagrams = persistenceDiagram.transform(X_windows, y_windows)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plot the persistence diagram\n", "plot_diagram(X_diagrams[window_number])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Scikit-learn *Pipeline*\n", "One of the advantages of our giotto library is the compatibility with scikit-learn. It is possible to set up a full pipeline such as the one above in a few lines. We will show how in the next section.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Steps of the Pipeline\n", "steps = [\n", " ('sampling', ts.Resampler(period=period)),\n", " ('embedding', ts.TakensEmbedding(parameters_type='search', dimension=embedding_dimension, \n", " time_delay=embedding_time_delay, n_jobs=-1)),\n", " ('window', ts.SlidingWindow(width=window_width, stride=window_stride)),\n", " ('diagrams', hl.VietorisRipsPersistence(metric='euclidean', max_edge_length=100, \n", " homology_dimensions=homology_dimensions, n_jobs=-1))\n", "]\n", "\n", "# Sklearn Pipeline\n", "pipeline = Pipeline(steps)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Running the pipeline\n", "pipeline.fit(X)\n", "X_diagrams = pipeline.transform(X)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plotting the final persistent diagram of one outer window\n", "plot_diagram(X_diagrams[window_number])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Rescaling the diagram\n", "Rescaling the diagram means normalizing the points such that their distance from the *empty diagram* is equal to one.\n", "Once the diagram is rescaled, we can filter noise by removing all the points that are close to the diagonal." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "diagram_scaler = diag.Scaler()\n", "diagram_scaler.fit(X_diagrams)\n", "X_scaled = diagram_scaler.transform(X_diagrams)\n", "\n", "plot_diagram(X_scaled[window_number])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Filtering diagrams\n", "\n", "Filtering noise from a diagram corresponds to eliminating all the homology generators whose lifespan is too short to be significant. It means equivalently that we are removing the points closer to the diagonal than a centrain threshold." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Perparing the filtering transformer\n", "diagram_filter = diag.Filtering(epsilon=0.1, homology_dimensions=[1, 2])\n", "diagram_filter.fit(X_scaled)\n", "X_filtered = diagram_filter.transform(X_scaled)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_diagram(X_filtered[window_number])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Wrapping up all the steps inside a scikit-learn pipeline\n", "steps = [\n", " ('sampling', ts.Resampler(period=period)),\n", " ('embedding', ts.TakensEmbedding(parameters_type='search', dimension=embedding_dimension, \n", " time_delay=embedding_time_delay, n_jobs=1)),\n", " ('window', ts.SlidingWindow(width=window_width, stride=window_stride)),\n", " ('diagrams', hl.VietorisRipsPersistence(metric='euclidean', max_edge_length=100, \n", " homology_dimensions=homology_dimensions, n_jobs=1)),\n", " ('diagrams_scaler', diag.Scaler()),\n", " ('diagrams_filter', diag.Filtering(epsilon=0.1))\n", "]\n", "\n", "pipeline_filter = Pipeline(steps)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pipeline_filter.fit(X)\n", "X_filtered_noisy = pipeline_filter.transform(X)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_diagram(X_filtered_noisy[window_number])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Persistence entropy\n", "\n", "In this section we show how to compute the entropy of persistence diagrams." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Perparing the filtering transformer\n", "persistent_entropy = diag.PersistenceEntropy()\n", "persistent_entropy.fit(X_scaled)\n", "X_persistent_entropy = persistent_entropy.transform(X_scaled)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig = plt.figure(figsize=(16,6))\n", "plt.plot(X_persistent_entropy)\n", "plt.title('Trajectory of the Lorenz solution, projected along the z-axis')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Persistent Landscape\n", "In this section we shoew how to compute the persistent landscape of a persistent diagram." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "persistent_landscape = diag.PersistenceLandscape(n_layers=4)\n", "persistent_landscape.fit(X_scaled)\n", "X_persistent_landscape = persistent_landscape.transform(X_scaled)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_landscapes(X_persistent_landscape[window_number])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Betti Curves\n", "In this section we show how to compute the Betti curves of a persistent diagram. We also show the plot of the Betti surface, i.e. the time-stack of the bettu curves." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "betti_curves = diag.BettiCurve()\n", "betti_curves.fit(X_scaled)\n", "X_betti_curves = betti_curves.transform(X_scaled)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_betti_curves(X_betti_curves[window_number])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_betti_surfaces(X_betti_curves)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Distances among diagrams\n", "\n", "In this section we show how to compute distances among persistent diagrams. There are many notions of distances: here we use the *bottleneck distance* as an example.\n", "\n", "We stress that the *i-th* row of this matrix is the distance of the *i-th* diagram from all the others." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We change metric: we compute the landscape distance among the diagrams \n", "diagram_distance = diag.PairwiseDistance(metric='landscape', metric_params={'p': 2, 'n_layers':1, 'n_values':1000}, \n", " order=None, n_jobs=2)\n", "diagram_distance.fit(X_diagrams)\n", "X_distance_L = diagram_distance.transform(X_diagrams)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plot of the landascape L2 distance between persistent diagrams \n", "figure = plt.figure(figsize=(10,10))\n", "plt.imshow(X_distance_L[:,:, 0])\n", "plt.colorbar()\n", "plt.title('Landscape L2 distance matrix for H0')\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We change metric: we compute the wasserestein distance among the diagrams \n", "diagram_distance = diag.PairwiseDistance(metric='wasserstein', metric_params={'p': 2, 'delta':0.2}, \n", " order=2, n_jobs=1)\n", "diagram_distance.fit(X_diagrams)\n", "X_distance_W = diagram_distance.transform(X_diagrams)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plot of the landascape L2 distance between persistent diagrams \n", "figure = plt.figure(figsize=(10,10))\n", "plt.imshow(X_distance_W)\n", "plt.colorbar()\n", "plt.title('2-norm of 2-wassertein distance matrices')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# New distance in the embedding space: permutations and graphs\n", "\n", "We propose here a new way to compute distances between points in the embedding space. Instead of considering the Euclidean distance in the Takens space, we propose to build a k-nearest neightbors graph and then use the geodesic distance on such graph." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "kNN_graph = gr.KNeighborsGraph(n_neighbors=3, n_jobs=1)\n", "kNN_graph.fit(X_embedded)\n", "X_kNN = kNN_graph.transform(X_embedded[0].reshape((1, X_embedded.shape[1], -1)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Geodesic distance on graphs\n", "Given the graph embedding, the natural notion of distance between vertices corresponds to the lengths of the shortest path connecting two vertices. This is also known as *graph geodesic distance*. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "geodesic_distance = gr.GraphGeodesicDistance(n_jobs=1)\n", "geodesic_distance.fit(X_kNN)\n", "X_GeoNN = geodesic_distance.transform(X_kNN)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plotting the geodesic distance\n", "figure = plt.figure(figsize=(10,10))\n", "plt.imshow(X_GeoNN[0])\n", "plt.colorbar()\n", "plt.title('Plot of the geodesic distance of the first outer window kNN graph')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Synthetic features extracted from topology\n", "\n", "One can use persistent diagrams as he pleases. Here we show one way of extracting summary information from the time-series of diagrams: the *permutation entropy*. The entropy is computed by estimating a probability based on teh counting how many permutation patterns appear along the time series within the outer window." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Computing permutation entropy\n", "permutation_entropy = ts.PermutationEntropy(n_jobs=1)\n", "permutation_entropy.fit(X_windows)\n", "X_entropy = permutation_entropy.transform(X_windows)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plot the sampled Lorenz solution, projected along the z-axis\n", "fig = plt.figure(figsize=(16,6))\n", "plt.plot(X_sampled)\n", "plt.title('Resampled solution of the Lorenz attractor projected along the z-axis')\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plot of the permutation entropy time-series\n", "fig = plt.figure(figsize=(16,6))\n", "plt.plot(X_entropy)\n", "plt.title('Permutation entropy')\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 2 }