{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "**Machine Learning for Time Series (Master MVA)**\n", "\n", "- [Link to the class material.](http://www.laurentoudre.fr/ast.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction\n", "\n", "In this notebook, we illustrate the following concept:\n", "- graph signal processing." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Import**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import re\n", "from math import asin, cos, radians, sin, sqrt\n", "\n", "import contextily as cx\n", "import geopandas\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "import seaborn as sns\n", "from loadmydata.load_molene_meteo import load_molene_meteo_dataset\n", "from matplotlib.dates import DateFormatter\n", "from pygsp import graphs\n", "from scipy.linalg import eigh\n", "from scipy.spatial.distance import pdist, squareform" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "CRS = \"EPSG:4326\"\n", "\n", "STATION_LIST = [\n", " \"ARZAL\",\n", " \"AURAY\",\n", " \"BELLE ILE-LE TALUT\",\n", " \"BIGNAN\",\n", " \"BREST-GUIPAVAS\",\n", " \"BRIGNOGAN\",\n", " \"DINARD\",\n", " \"GUERANDE\",\n", " \"ILE DE GROIX\",\n", " \"ILE-DE-BREHAT\",\n", " \"KERPERT\",\n", " \"LANDIVISIAU\",\n", " \"LANNAERO\",\n", " \"LANVEOC\",\n", " \"LORIENT-LANN BIHOUE\",\n", " \"LOUARGAT\",\n", " \"MERDRIGNAC\",\n", " \"NOIRMOUTIER EN\",\n", " \"OUESSANT-STIFF\",\n", " \"PLEUCADEUC\",\n", " \"PLEYBER-CHRIST SA\",\n", " \"PLOERMEL\",\n", " \"PLOUDALMEZEAU\",\n", " \"PLOUGUENAST\",\n", " \"PLOUMANAC'H\",\n", " \"POMMERIT-JAUDY\",\n", " \"PONTIVY\",\n", " \"PTE DE CHEMOULIN\",\n", " \"PTE DE PENMARCH\",\n", " \"PTE DU RAZ\",\n", " \"QUIMPER\",\n", " \"QUINTENIC\",\n", " \"ROSTRENEN\",\n", " \"SAINT-CAST-LE-G\",\n", " \"SARZEAU SA\",\n", " \"SIBIRIL S A\",\n", " \"SIZUN\",\n", " \"SPEZET\",\n", " \"ST BRIEUC\",\n", " \"ST NAZAIRE-MONTOIR\",\n", " \"ST-SEGAL S A\",\n", " \"THEIX\",\n", " \"VANNES-SENE\",\n", "]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Utility functions**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def fig_ax(figsize=(15, 3)):\n", " return plt.subplots(figsize=figsize)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def get_line_graph(n_nodes=10) -> graphs.Graph:\n", " \"\"\"Return a line graph.\"\"\"\n", " adjacency_matrix = np.eye(n_nodes)\n", " adjacency_matrix = np.c_[adjacency_matrix[:, -1], adjacency_matrix[:, :-1]]\n", " adjacency_matrix += adjacency_matrix.T\n", " line_graph = graphs.Graph(adjacency_matrix)\n", " line_graph.set_coordinates(kind=\"ring2D\")\n", " line_graph.compute_laplacian(\"combinatorial\")\n", " return line_graph" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def get_grid_graph(n_nodes_height=10, n_nodes_width=10) -> graphs.Graph:\n", " \"\"\"Return a 2D grid graph.\"\"\"\n", " g = graphs.Grid2d(n_nodes_height, n_nodes_width)\n", " xx, yy = np.meshgrid(np.arange(n_nodes_height), np.arange(n_nodes_width))\n", " coords = np.array((xx.ravel(), yy.ravel())).T\n", " g.set_coordinates(coords)\n", " g.compute_laplacian(\"combinatorial\")\n", " return g" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def dms2dd(s):\n", " \"\"\"Convert longitude and latitude strings to float.\"\"\"\n", " # https://stackoverflow.com/a/50193328\n", " # example: s = \"\"\"48°51'18\"\"\"\n", " degrees, minutes, seconds = re.split(\"[°'\\\"]+\", s[:-1])\n", " direction = s[-1]\n", " dd = float(degrees) + float(minutes) / 60 + float(seconds) / (60 * 60)\n", " if direction in (\"S\", \"W\"):\n", " dd *= -1\n", " return dd" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def get_geodesic_distance(point_1, point_2) -> float:\n", " \"\"\"\n", " Calculate the great circle distance (in km) between two points\n", " on the earth (specified in decimal degrees)\n", "\n", " https://stackoverflow.com/a/4913653\n", " \"\"\"\n", "\n", " lon1, lat1 = point_1\n", " lon2, lat2 = point_2\n", "\n", " # convert decimal degrees to radians\n", " lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])\n", "\n", " # haversine formula\n", " dlon = lon2 - lon1\n", " dlat = lat2 - lat1\n", " a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2\n", " c = 2 * asin(sqrt(a))\n", " r = 6371 # Radius of earth in kilometers. Use 3956 for miles\n", " return c * r" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def get_exponential_similarity(condensed_distance_matrix, bandwidth, threshold):\n", " exp_similarity = np.exp(-(condensed_distance_matrix**2) / bandwidth / bandwidth)\n", " res_arr = np.where(exp_similarity > threshold, exp_similarity, 0.0)\n", " return res_arr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Graph signal processing\n", "\n", "A graph $G$ is a set of $N$ **nodes** connected with **edges**. A **graph signal** is a $\\mathbb{R}^N$ vector that is supported by the nodes of the graph $G$.\n", "\n", "Graph Signal Processing (GSP) is the set of methods of methods to study such objects." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Toy data\n", "\n", "Let us illustrate the basic principles of GSP on two toy graphs: the line graph and the 2D grid graph." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "line_graph = get_line_graph(50) # 50 nodes\n", "line_graph.plot()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "grid_graph = get_grid_graph(20, 20) # 20 by 20 grid\n", "grid_graph.plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now generate noisy signals on those two graphs." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, (ax_0, ax_1) = plt.subplots(nrows=1, ncols=2, figsize=(20, 5))\n", "# generate a noisy sinusoid\n", "tt = np.linspace(0, 6 * np.pi, line_graph.N)\n", "signal_line = 5 * np.sin(tt) + np.random.normal(size=line_graph.N)\n", "# plot\n", "line_graph.plot_signal(signal_line, ax=ax_0)\n", "ax_1.plot(signal_line)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, (ax_0, ax_1) = plt.subplots(nrows=1, ncols=2, figsize=(20, 5))\n", "\n", "# generate a noisy sinusoid\n", "x = np.linspace(-2 * np.pi, 2 * np.pi, 20)\n", "y = np.linspace(-2 * np.pi, 2 * np.pi, 20)\n", "xx, yy = np.meshgrid(x, y, sparse=True)\n", "z = 5 * np.sin(xx + yy)\n", "z += np.random.normal(size=z.shape)\n", "signal_grid = z.flatten()\n", "# plot\n", "ax_0.contourf(x, y, z)\n", "grid_graph.plot_signal(signal_grid, ax=ax_1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fourier basis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question
\n", "Recall the definition of the Laplacian matrix.
\n", "Question
\n", "Compute the Laplacian matrix for both the line graph and the grid graph.
\n", "Verify your result with the Laplacian matrix provided by the Graph class (available in g.L.todense() for a graph g).
\n", "Question
\n", "What do you observe on the shape of the eigenvectors?
\n", "Question
\n", "For the grid graph, compute and display the first and last eigenvectors.
\n", "Question
\n", "Visually, which one the two eigenvectors is the smoothest?
\n", "Recall the definition of a graph signal smoothness.
\n", "Question
\n", "Compute and plot the smoothness of each eigenvector of the Laplacian (of the grid graph).
\n", "What do you observe? Is this expected?
\n", "Question
\n", "Given the Fourier representation of a signal, how can we recover the original signal?
\n", "Question
\n", "What are the two closest stations?
\n", "Question
\n", "What are the two most distant stations?
\n", "Question
\n", "Plot the temperature evolution for the two closest stations.
\n", "Question
\n", "Do the same for the two most distant stations.
\n", "Question
\n", "Give two ways to derive an adjacency matrix from a distance matrix?
\n", "Question
\n", "The number of connected components of a graph is the multiplicity of the eigenvalue 0 of the Laplacian matrix. Can you explain why? Write a function `is_connected` which returns True if the input graph is connected and False otherwise.
\n", "Question
\n", "Plot the number of edges with respect to the threshold.
\n", "What is approximatively the lowest threshold possible in order to have a connected graph?
\n", "Question
\n", "Plot the average degree with respect to the threshold.
\n", "Question
\n", "What is the influence of the threshold.
\n", "Choose an appropriate value for the threshold.
\n", "Question
\n", "Describe how to create a graph using the signal correlation. Code it.
\n", "Question
\n", "Compute the smoothness for a specific hour of measure.
\n", "Question
\n", "Display the smoothess evolution with time.
\n", "Question
\n", "Show the state of the network, when the signal is the smoothest.
\n", "Question
\n", "Do the same, when the signal is the least smooth.
\n", "