{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Topology of time series\n", "\n", "This notebook explores how ``giotto-tda`` can be used to gain insights from time-varying data by using ideas from from dynamical systems and persistent homology.\n", "\n", "If you are looking at a static version of this notebook and would like to run its contents, head over to [GitHub](https://github.com/giotto-ai/giotto-tda/blob/master/examples/topology_time_series.ipynb) and download the source.\n", "\n", "## Useful references\n", "\n", "* [Topological Methods for the Analysis of Data](https://youtu.be/DZwK2gT-d8g) by Jose Perea\n", "* The sliding window notebooks from Chris Tralie's [TDALabs](https://github.com/ctralie/TDALabs)\n", "* [Detection of gravitational waves using topological data analysis and convolutional neural network: An improved approach](https://arxiv.org/abs/1910.08245) by Christopher Bresten and Jae-Hun Jung. We thank Christopher Bresten for sharing the code and data used in the article.\n", "\n", "## See also\n", "\n", "- [Gravitational waves detection](https://giotto-ai.github.io/gtda-docs/latest/notebooks/gravitational_waves_detection.html), where,following [arXiv:1910.08245](https://arxiv.org/abs/1910.08245), the *Takens embedding* technique is shown to be effective for the detection of gravitational waves signals buried in background noise.\n", "- [Topology in time series forecasting](https://giotto-ai.github.io/gtda-docs/latest/notebooks/time_series_forecasting.html), in which the Takens embedding technique is used in time series forecasting tasks by using sliding windows.\n", "- [Topological feature extraction using VietorisRipsPersistence and PersistenceEntropy](https://giotto-ai.github.io/gtda-docs/latest/notebooks/vietoris_rips_quickstart.html) for a quick introduction to general topological feature extraction in ``giotto-tda``.\n", "\n", "**License: AGPLv3**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## From time series to time delay embeddings" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first step in analysing the topology of time series is to construct a _**time delay embedding**_ or _**Takens embedding**_, named after [Floris Takens](https://en.wikipedia.org/wiki/Floris_Takens) who pioneered its use in the study of [dynamical systems](https://en.wikipedia.org/wiki/Takens's_theorem). A time delay embedding can be thought of as sliding a \"window\" of fixed size over a signal, with each window represented as a point in a (possibly) higher-dimensional space. A simple example is shown in the animation below, where pairs of points in a 1-dimensional signal are mapped to coordinates in a 2-dimensional embedding space. \n", "\n", "![A 2-dimensional time delay embedding](images/time_delay_embedding.gif)\n", "\n", "More formally, given a time series $f(t)$, one can extract a _**sequence of vectors**_ of the form $f_i = [f(t_i), f(t_i + 2 \\tau), \\ldots, f(t_i + (d-1) \\tau)] \\in \\mathbb{R}^{d}$, where $d$ is the _**embedding dimension**_ and $\\tau$ is the _**time delay**_. The quantity $(d-1)\\tau$ is known as the \"window size\" and the difference between $t_{i+1}$ and $t_i$ is called the **_stride_**. In other words, the time delay embedding of $f$ with parameters $(d,\\tau)$ is the function\n", "\n", "$$\n", "TD_{d,\\tau} f : \\mathbb{R} \\to \\mathbb{R}^{d}\\,, \\qquad t \\to \\begin{bmatrix}\n", " f(t) \\\\\n", " f(t + \\tau) \\\\\n", " f(t + 2\\tau) \\\\\n", " \\vdots \\\\\n", " f(t + (d-1)\\tau)\n", " \\end{bmatrix}\n", "$$\n", "\n", "\n", "and the main idea we will explore in this notebook is that if $f$ has a non-trivial recurrent structure, then the image of $TD_{d,\\tau}f$ will have non-trivial topology for appropriate choices of $(d, \\tau)$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A periodic example" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a warm-up, recall that a function is periodic with period $T > 0$ if $f(t + T) = f(t)$ for all $t \\in \\mathbb{R}$. For example, consider the function $f(t) = \\cos(5 t)$ which can be visualised as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import plotly.graph_objects as go\n", "\n", "x_periodic = np.linspace(0, 10, 1000)\n", "y_periodic = np.cos(5 * x_periodic)\n", "\n", "fig = go.Figure(data=go.Scatter(x=x_periodic, y=y_periodic))\n", "fig.update_layout(xaxis_title=\"Timestamp\", yaxis_title=\"Amplitude\")\n", "fig.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can show that periodicity implies ellipticity of the time delay embedding. To do that we need to specify the embedding dimension $d$ and the time delay $\\tau$ for the Takens embedding, which in ``giotto-tda`` can be achieved as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from gtda.time_series import SingleTakensEmbedding\n", "\n", "embedding_dimension_periodic = 3\n", "embedding_time_delay_periodic = 8\n", "stride = 10\n", "\n", "embedder_periodic = SingleTakensEmbedding(\n", " parameters_type=\"fixed\",\n", " n_jobs=2,\n", " time_delay=embedding_time_delay_periodic,\n", " dimension=embedding_dimension_periodic,\n", " stride=stride,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Tip:** You can use the `stride` parameter to downsample the time delay embedding. This is handy when you want to quickly compute persistence diagrams on a dense signal." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's apply this embedding to our one-dimensional time series to get a 3-dimensional _point cloud_:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y_periodic_embedded = embedder_periodic.fit_transform(y_periodic)\n", "print(f\"Shape of embedded time series: {y_periodic_embedded.shape}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can then use ``giotto-tda``'s plotting API to visualise the result:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from gtda.plotting import plot_point_cloud\n", "\n", "plot_point_cloud(y_periodic_embedded)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As promised, the periodicity of $f$ is reflected in the ellipticity of the time delay embedding! It turns out that in general, _**periodic functions trace out ellipses**_ in $\\mathbb{R}^{d}$. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A non-periodic example" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is another type of recurrent behaviour: if we let $f(t) = \\cos(t) + \\cos(\\pi t)$ then it follows that $f$ is not periodic since the ratio of the two frequencies is irrational, i.e. we say that $\\cos(t)$ and $\\cos(\\pi t)$ are _incommensurate_. Nevertheless, their sum produces recurrent behaviour:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x_nonperiodic = np.linspace(0, 50, 1000)\n", "y_nonperiodic = np.cos(x_nonperiodic) + np.cos(np.pi * x_nonperiodic)\n", "\n", "fig = go.Figure(data=go.Scatter(x=x_nonperiodic, y=y_nonperiodic))\n", "fig.update_layout(xaxis_title=\"Timestamp\", yaxis_title=\"Amplitude\")\n", "fig.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As before, let's create a time delay embedding for this signal and visualise the resulting point cloud:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "embedding_dimension_nonperiodic = 3\n", "embedding_time_delay_nonperiodic = 16\n", "stride = 3\n", "\n", "embedder_nonperiodic = SingleTakensEmbedding(\n", " parameters_type=\"fixed\",\n", " n_jobs=2,\n", " time_delay=embedding_time_delay_nonperiodic,\n", " dimension=embedding_dimension_nonperiodic,\n", " stride=stride,\n", ")\n", "\n", "y_nonperiodic_embedded = embedder_nonperiodic.fit_transform(y_nonperiodic)\n", "\n", "plot_point_cloud(y_nonperiodic_embedded)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## From time delay embeddings to persistence diagrams" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the examples above we saw that the resulting point clouds appear to exhibit distinct topology. We can verify this explicitly using persistent homology! First we need to reshape our point cloud arrays in a form suitable for the [VietorisRipsPersistence transformer](https://giotto-ai.github.io/gtda-docs/latest/notebooks/vietoris_rips_quickstart.html), namely `(n_samples, n_points, n_dimensions)`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y_periodic_embedded = y_periodic_embedded[None, :, :]\n", "y_nonperiodic_embedded = y_nonperiodic_embedded[None, :, :]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next step is to calculate the persistence diagrams associated with each point cloud. In ``giotto-tda`` we can do this with the Vietoris-Rips construction as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from gtda.homology import VietorisRipsPersistence\n", "\n", "# 0 - connected components, 1 - loops, 2 - voids\n", "homology_dimensions = [0, 1, 2]\n", "\n", "periodic_persistence = VietorisRipsPersistence(\n", " homology_dimensions=homology_dimensions, n_jobs=6\n", ")\n", "print(\"Persistence diagram for periodic signal\")\n", "periodic_persistence.fit_transform_plot(y_periodic_embedded)\n", "\n", "nonperiodic_persistence = VietorisRipsPersistence(\n", " homology_dimensions=homology_dimensions, n_jobs=6\n", ")\n", "print(\"Persistence diagram for nonperiodic signal\")\n", "nonperiodic_persistence.fit_transform_plot(y_nonperiodic_embedded);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What can we conclude from these diagrams? The first thing that stands out is the different types of homology dimensions that are most persistent. In the periodic case we see a single point associated with 1-dimensional persistent homology, namely a loop! On the other hand, the non-periodic signal has revealed two points associated with 2-dimensional persistent homology, namely _voids_. These clear differences in topology make the time delay embedding technique especially powerful at classifying different time series." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Picking the embedding dimension and time delay" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the examples above, we manually chose values for the embedding dimension $d$ and time delay $\\tau$. However, it turns out there are two techniques that can be used to determine these parameters _automatically_:\n", "\n", "* [Mutual information](https://en.wikipedia.org/wiki/Mutual_information) to determine $\\tau$\n", "* [False nearest neighbours](https://en.wikipedia.org/wiki/False_nearest_neighbor_algorithm) to determine $d$\n", "\n", "In ``giotto-tda``, these techniques are applied when we select ``parameters_type=\"search\"`` in the ``SingleTakensEmbedding`` transformer, e.g.\n", "\n", "```python\n", "embedder = SingleTakensEmbedding(\n", " parameters_type=\"search\", time_delay=time_delay, dimension=embedding_dimension,\n", ")\n", "```\n", "\n", "where the values of `time_delay` and `embedding_dimension` provide _**upper bounds**_ on the search algorithm. Before applying this to our sample signals, let's have a look at how these methods actually work under the hood." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Mutual information\n", "To determine an optimal value for $\\tau$ we first calculate the maximum $x_\\mathrm{max}$ and minimum $x_\\mathrm{min}$ values of the time series, and divide the interval $[x_\\mathrm{min}, x_\\mathrm{max}]$ into a large number of bins. We let $p_k$ be the probability that an element of the time series is in the $k$th bin and let $p_{j,k}$ be the probability that $x_i$ is in the $j$th bin while $x_{i+\\tau}$ is in the $k$th bin. Then the mutual information is defined as:\n", "\n", "$$ I(\\tau) = - \\sum_{j=1}^{n_\\mathrm{bins}} \\sum_{k=1}^{n_\\mathrm{bins}} p_{j,k}(\\tau) \\log \\frac{p_{j,k}(\\tau)}{p_j p_k} $$\n", "\n", "The first minimum of $I(\\tau)$ gives the optimal time delay since there we get the most information by adding $x_{i+\\tau}$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### False nearest neighbours\n", "\n", "The false nearest neighbours algorithm is based on the assumption that \"unfolding\" or embedding a deterministic system into successively higher dimensions is smooth. In other words, points which are close in one embedding dimension should be close in a higher one. More formally, if we have a point $p_i$ and neighbour $p_j$, we check if the normalised distance $R_i$ for the next dimension is greater than some threshold $R_\\mathrm{th}$:\n", "\n", "$$ R_i = \\frac{\\mid x_{i+m\\tau} - x_{j+m\\tau} \\mid}{\\lVert p_i - p_j \\rVert} > R_\\mathrm{th}$$\n", "\n", "If $R_i > R_\\mathrm{th}$ then we have a \"false nearest neighbour\" and the optimal embedding dimension is obtained by minimising the total number of such neighbours." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Running the search algorithm\n", "\n", "Let's now apply these ideas to our original signals to see what the algorithm determines as optimal choices for $d$ and $\\tau$. We will allow the search to scan up to relatively large values of $(d, \\tau)$ to ensure we do not get stuck in a sub-optimal minimum.\n", "\n", "For the periodic signal, we initialise the Takens embedding as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "max_embedding_dimension = 30\n", "max_time_delay = 30\n", "stride = 5\n", "\n", "embedder_periodic = SingleTakensEmbedding(\n", " parameters_type=\"search\",\n", " time_delay=max_time_delay,\n", " dimension=max_embedding_dimension,\n", " stride=stride,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create a helper function to view the optimal values found during the search:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def fit_embedder(embedder: SingleTakensEmbedding, y: np.ndarray, verbose: bool=True) -> np.ndarray:\n", " \"\"\"Fits a Takens embedder and displays optimal search parameters.\"\"\"\n", " y_embedded = embedder.fit_transform(y)\n", "\n", " if verbose:\n", " print(f\"Shape of embedded time series: {y_embedded.shape}\")\n", " print(\n", " f\"Optimal embedding dimension is {embedder.dimension_} and time delay is {embedder.time_delay_}\"\n", " )\n", "\n", " return y_embedded" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y_periodic_embedded = fit_embedder(embedder_periodic, y_periodic)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Although the resulting embedding is in a high dimensional space, we can apply dimensionality reduction techniques like [principal component analysis](https://en.wikipedia.org/wiki/Principal_component_analysis) to project down to 3-dimensions for visualisation:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.decomposition import PCA\n", "\n", "pca = PCA(n_components=3)\n", "y_periodic_embedded_pca = pca.fit_transform(y_periodic_embedded)\n", "plot_point_cloud(y_periodic_embedded_pca)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now for the non-periodic case we have:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "embedder_nonperiodic = SingleTakensEmbedding(\n", " parameters_type=\"search\",\n", " n_jobs=2,\n", " time_delay=max_time_delay,\n", " dimension=max_embedding_dimension,\n", " stride=stride,\n", ")\n", "\n", "y_nonperiodic_embedded = fit_embedder(embedder_nonperiodic, y_nonperiodic)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pca = PCA(n_components=3)\n", "y_nonperiodic_embedded_pca = pca.fit_transform(y_nonperiodic_embedded)\n", "plot_point_cloud(y_nonperiodic_embedded_pca)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So we have embedding point clouds whose geometry looks clearly distinct; how about the persistence diagrams? As we did earlier, we first need to reshape our arrays into the form `(n_samples, n_points, n_dimensions)`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y_periodic_embedded = y_periodic_embedded[None, :, :]\n", "y_nonperiodic_embedded = y_nonperiodic_embedded[None, :, :]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next step is to calculate the persistence diagrams associated with each point cloud:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "homology_dimensions = [0, 1, 2]\n", "\n", "periodic_persistence = VietorisRipsPersistence(homology_dimensions=homology_dimensions)\n", "print(\"Persistence diagram for periodic signal\")\n", "periodic_persistence.fit_transform_plot(y_periodic_embedded)\n", "\n", "nonperiodic_persistence = VietorisRipsPersistence(\n", " homology_dimensions=homology_dimensions, n_jobs=6\n", ")\n", "print(\"Persistence diagram for nonperiodic signal\")\n", "nonperiodic_persistence.fit_transform_plot(y_nonperiodic_embedded);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this case the persistence diagram for the periodic signal is essentially unchanged, but the non-periodic signal now reveals two $H_1$ points and one $H_2$ one - the signature of a hypertorus! It turns out that in general, the image of $TD_{d,\\tau}f$ is a hypertorus." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }