{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 4D change analysis of near-continuous LiDAR time series for applications in geomorphic monitoring\n",
    "\n",
    "## Time series-based change analysis of surface dynamics using PCA\n",
    "\n",
    "In this notebook we will perform time series-based surface change analysis on a time series of permanent TLS point clouds of the sandy beach at Kijkduin for a timespan of around 6 months (<a href=\"#references\">Vos et al., 2022</a>). An introduction to the case study and dataset can be found [here](https://3dgeo-heidelberg.github.io/etrainee/module3/06_casestudy_sandybeach/06_casestudy_sandybeach.html). \n",
    "\n",
    "The objective is to assess surface dynamics using Principal Component Analysis (PCA; following Frantzen et al., 2023). Look into the related article for comparison of possible surface dynamics of the use case and help for deciding on suitable parameters, etc.\n",
    "\n",
    "The workflow is  introduced throughout this notebook. Also, make use of the software documentations!\n",
    "\n",
    "## Software and data\n",
    "This task is solved using Python with the [`py4dgeo`](https://github.com/3dgeo-heidelberg/py4dgeo) library. \n",
    "\n",
    "You can use CloudCompare or GIS Software (e.g. QGIS) to check the data and visualize your results.\n",
    "\n",
    "The dataset is a subsampled version of the original time series, using 12-hourly epochs of point clouds and spatial subsampling to 50 cm. The dataset [can be downloaded](https://zenodo.org/records/10003575) (module3.zip) from the E-learning course E-TRAINEE. In the data directory `kijkduin`, you find the prepared input point clouds and a core points point cloud, which is manually cleaned from noise.\n",
    "\n",
    "[E-TRAINEE](https://github.com/3dgeo-heidelberg/etrainee) is an e-learning course on Time Series Analysis in Remote Sensing for Understanding Human-Environment Interactions. This course has been developed by research groups from four partner universities – Charles University, Heidelberg University, University of Innsbruck, and University of Warsaw."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loading data and calculation of surface changes\n",
    "\n",
    "Prepare the analysis by compiling the list of files (epochs) and read the timestamps from the file names (format `YYMMDD_hhmmss`) into `datetime` objects. Use the point cloud files and timestamps to create a py4dgeo `SpatiotemporalAnalysis` object. For this you need to instantiate the M3C2 algorithm. You can use the point cloud file `170115_150816_aoi_50cm.laz` as core points. Explore the point cloud properties in CloudCompare: \n",
    "\n",
    "* Considering the available point density and surface characteristics, what would be a suitable cylinder radius for the distance calculation?\n",
    "* What would be a suitable approach to derive the surface normals in this topography and expected types of surface changes?\n",
    "\n",
    "Hint: In this flat topography and predominant provess of sand deposition and erosion, it can be suitable to orient the normals purely vertically. In this case, they do not need to be computed, and you can [customize the py4dgeo algorithm](https://py4dgeo.readthedocs.io/en/latest/customization.html#Changing-search-directions) accordingly.\n",
    "\n",
    "Use the first point cloud in the time series (list of files) as reference epoch. You can assume a registration error of 1.9 cm for the M3C2 distance calculation (cf. <a href=\"#references\">Vos et al., 2022</a>).\n",
    "\n",
    "Explore the spatiotemporal change information by visualizing the changes at a selected epoch and visualizing the time series at a selected location. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, we start by setting up the Python environment and data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import required modules\n",
    "import py4dgeo\n",
    "import os\n",
    "import numpy as np\n",
    "from datetime import datetime\n",
    "import pooch\n",
    "\n",
    "# Download data from zenodo and set path to point cloud folder\n",
    "p = pooch.Pooch(base_url=\"doi:10.5281/zenodo.10003574/\", path=pooch.os_cache(\"py4dgeo\"))\n",
    "p.load_registry_from_doi()\n",
    "p.fetch(\"module3.zip\", processor=pooch.Unzip())\n",
    "pc_dir = p.path / \"module3.zip.unzip\" / \"module3\" / \"kijkduin\" / \"pointclouds\"\n",
    "\n",
    "# List of point clouds (time series)\n",
    "pc_list = os.listdir(pc_dir)\n",
    "pc_list[:5]  # print the first elements"
   ]
  },
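  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To get a feeling for a suitable cylinder radius (cf. the questions above), we can estimate the point spacing of the subsampled data. The following is a minimal sketch (an addition to the original workflow) that computes the mean nearest-neighbor distance in the first point cloud using `scipy.spatial.cKDTree`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A sketch: estimate the mean point spacing of the first epoch as a rough\n",
    "# guide for choosing the M3C2 cylinder radius (assumes pc_list[0] is a laz file)\n",
    "from scipy.spatial import cKDTree\n",
    "\n",
    "sample_epoch = py4dgeo.read_from_las(pc_dir / pc_list[0])\n",
    "tree = cKDTree(sample_epoch.cloud)\n",
    "\n",
    "# Query k=2 neighbors: the nearest neighbor of each point is the point itself\n",
    "nn_dists, _ = tree.query(sample_epoch.cloud, k=2)\n",
    "print(f\"Mean point spacing: {nn_dists[:, 1].mean():.2f} m\")"
   ]
  },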
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the list of point cloud files you can see that we have one laz file per epoch available. The file name contains the timestamp of the epoch, respectively, in format `YYMMDD_hhmmss`. To use this information for our analysis, we read the timestamp information from the file names into `datetime` objects:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Read the timestamps from file names\n",
    "timestamps = []\n",
    "for f in pc_list:\n",
    "    if not f.endswith(\".laz\"):\n",
    "        continue\n",
    "\n",
    "    # Get the timestamp from the file name\n",
    "    timestamp_str = \"_\".join(f.split(\".\")[0].split(\"_\")[1:])  # yields YYMMDD_hhmmss\n",
    "\n",
    "    # Convert string to datetime object\n",
    "    timestamp = datetime.strptime(timestamp_str, \"%y%m%d_%H%M%S\")\n",
    "    timestamps.append(timestamp)\n",
    "\n",
    "timestamps[:5]"
   ]
  },
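  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick sanity check (an addition to the original workflow), we verify that the timestamps are in chronological order and inspect the intervals between epochs, which should be around 12 hours:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Check chronological order and the spacing between consecutive epochs\n",
    "intervals = [t2 - t1 for t1, t2 in zip(timestamps[:-1], timestamps[1:])]\n",
    "print(f\"Chronologically sorted: {all(dt.total_seconds() > 0 for dt in intervals)}\")\n",
    "print(f\"Min interval: {min(intervals)}, max interval: {max(intervals)}\")"
   ]
  },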
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we use the point cloud files and timestamp information to create a `SpatiotemporalAnalysis` object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "analysis = py4dgeo.SpatiotemporalAnalysis(pc_dir / \"kijkduin.zip\", force=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As reference epoch, we use the first epoch in our time series:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Specify the reference epoch\n",
    "reference_epoch_file = pc_dir / pc_list[0]\n",
    "\n",
    "# Read the reference epoch and set the timestamp\n",
    "reference_epoch = py4dgeo.read_from_las(reference_epoch_file)\n",
    "reference_epoch.timestamp = timestamps[0]\n",
    "\n",
    "# Set the reference epoch in the spatiotemporal analysis object\n",
    "analysis.reference_epoch = reference_epoch"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For epochs to be added, we now configure the M3C2 algorithm to derive the change values. We would like to set the normals purely vertically, so we define a customized computation of cylinder `directions`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Inherit from the M3C2 algorithm class to define a custom direction algorithm\n",
    "class M3C2_Vertical(py4dgeo.M3C2):\n",
    "    def directions(self):\n",
    "        return np.array([0, 0, 1])  # vertical vector orientation\n",
    "\n",
    "\n",
    "# Specify corepoints, here all points of the reference epoch\n",
    "analysis.corepoints = reference_epoch.cloud[::]\n",
    "\n",
    "# Specify M3C2 parameters for our custom algorithm class\n",
    "analysis.m3c2 = M3C2_Vertical(\n",
    "    cyl_radii=(1.0,), max_distance=10.0, registration_error=0.019\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we add all the other epochs with their timestamps:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a list to collect epoch objects\n",
    "epochs = []\n",
    "for e, pc_file in enumerate(pc_list[1:]):\n",
    "    if not pc_file.endswith(\"laz\"):\n",
    "        continue\n",
    "    epoch_file = pc_dir / pc_file\n",
    "    epoch = py4dgeo.read_from_las(epoch_file)\n",
    "    epoch.timestamp = timestamps[e]\n",
    "    epochs.append(epoch)\n",
    "\n",
    "# Add epoch objects to the spatiotemporal analysis object\n",
    "analysis.add_epochs(*epochs)\n",
    "\n",
    "# Print the spatiotemporal analysis data for 3 corepoints and 5 epochs, respectively\n",
    "print(f\"Space-time distance array:\\n{analysis.distances[:3,:5]}\")\n",
    "print(\n",
    "    f\"Uncertainties of M3C2 distance calculation:\\n{analysis.uncertainties['lodetection'][:3, :5]}\"\n",
    ")\n",
    "print(f\"Timestamp deltas of analysis:\\n{analysis.timedeltas[:5]}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We visualize the changes in the scene for a selected epoch, together with the time series of surface changes at a selected location. The location here was selected separately in CloudCompare (as the corepoint id):"
   ]
  },
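  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you prefer not to pick the index in CloudCompare, the following sketch (our addition) looks up the core point closest to a chosen planimetric position; the [x, y] values below are hypothetical placeholders to adapt to your area of interest:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A sketch: find the core point index nearest to a chosen [x, y] position\n",
    "# (the query coordinates are placeholders, not from the original workflow)\n",
    "from scipy.spatial import cKDTree\n",
    "\n",
    "xy_query = [-50.0, 20.0]  # hypothetical planimetric coordinates\n",
    "tree_xy = cKDTree(analysis.corepoints.cloud[:, :2])\n",
    "_, cp_idx_nearest = tree_xy.query(xy_query)\n",
    "print(f\"Nearest core point index: {cp_idx_nearest}\")"
   ]
  },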
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cp_idx_sel = 15162  # selected core point index\n",
    "epoch_idx_sel = 28  # selected epoch index\n",
    "\n",
    "# Import plotting module\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Allow interactive rotation in notebook\n",
    "%matplotlib inline\n",
    "\n",
    "# Create the figure\n",
    "fig = plt.figure(figsize=(15, 5))\n",
    "ax1 = fig.add_subplot(1, 2, 1)\n",
    "ax2 = fig.add_subplot(1, 2, 2)\n",
    "\n",
    "# Get the corepoints\n",
    "corepoints = analysis.corepoints.cloud\n",
    "\n",
    "# Get change values of last epoch for all corepoints\n",
    "distances = analysis.distances\n",
    "distances_epoch = [d[epoch_idx_sel] for d in distances]\n",
    "\n",
    "# Get the time series of changes at a specific core point locations\n",
    "coord_sel = analysis.corepoints.cloud[cp_idx_sel]\n",
    "timeseries_sel = distances[cp_idx_sel]\n",
    "\n",
    "# Get the list of timestamps from the reference epoch timestamp and timedeltas\n",
    "timestamps = [t + analysis.reference_epoch.timestamp for t in analysis.timedeltas]\n",
    "\n",
    "# Plot the scene\n",
    "d = ax1.scatter(\n",
    "    corepoints[:, 0],\n",
    "    corepoints[:, 1],\n",
    "    c=distances_epoch[:],\n",
    "    cmap=\"seismic_r\",\n",
    "    vmin=-1.5,\n",
    "    vmax=1.5,\n",
    "    s=1,\n",
    "    zorder=1,\n",
    ")\n",
    "plt.colorbar(d, format=(\"%.2f\"), label=\"Distance [m]\", ax=ax1, pad=0.15)\n",
    "\n",
    "# Add the location of the selected coordinate\n",
    "ax1.scatter(\n",
    "    coord_sel[0],\n",
    "    coord_sel[1],\n",
    "    facecolor=\"yellow\",\n",
    "    edgecolor=\"black\",\n",
    "    s=100,\n",
    "    zorder=2,\n",
    "    label=\"Selected location\",\n",
    "    marker=\"*\",\n",
    ")\n",
    "ax1.legend()\n",
    "\n",
    "# Configure the plot layout\n",
    "ax1.set_xlabel(\"X [m]\")\n",
    "ax1.set_ylabel(\"Y [m]\")\n",
    "ax1.set_aspect(\"equal\")\n",
    "ax1.set_title(\n",
    "    \"Changes at %s\"\n",
    "    % (analysis.reference_epoch.timestamp + analysis.timedeltas[epoch_idx_sel])\n",
    ")\n",
    "\n",
    "# Plot the time series\n",
    "ax2.scatter(timestamps, timeseries_sel, s=5, color=\"black\", label=\"Raw\")\n",
    "ax2.plot(timestamps, timeseries_sel, color=\"blue\", label=\"Smoothed\")\n",
    "ax2.set_xlabel(\"Date\")\n",
    "\n",
    "# Add the epoch of the plotted scene\n",
    "ax2.scatter(\n",
    "    timestamps[epoch_idx_sel],\n",
    "    timeseries_sel[epoch_idx_sel],\n",
    "    facecolor=\"yellow\",\n",
    "    marker=\"*\",\n",
    "    edgecolor=\"black\",\n",
    "    s=100,\n",
    "    color=\"red\",\n",
    "    label=\"Selected epoch\",\n",
    ")\n",
    "ax2.legend()\n",
    "\n",
    "# Format the date labels\n",
    "import matplotlib.dates as mdates\n",
    "\n",
    "dtFmt = mdates.DateFormatter(\"%b-%d\")\n",
    "plt.gca().xaxis.set_major_formatter(dtFmt)\n",
    "\n",
    "# configure the plot layout\n",
    "ax2.set_ylabel(\"Distance [m]\")\n",
    "ax2.grid()\n",
    "ax2.set_ylim(-0.3, 1.6)\n",
    "ax2.set_title(\"Time series at selected location\")\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The map of changes in the scene shows us linear structures of sand deposition near the coast which can be interpreted as sand bars (with knowledge about coastal processes). This is confirmed by the surface behavior over time, expressed in the time series plot. However, the time series is quite noisy especially in this part of the beach, which is regularly covered by water during high tides (leading to missing data) and also varies strongly in surface moisture (influencing the LiDAR range measurement and causing noise). We therefore continue with temporal smoothing of the time series."
   ]
  },
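  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal sketch of such smoothing (kept to plain NumPy here; the window size is an assumption to tune), we apply a moving median to the time series at the selected location:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A sketch of temporal smoothing: moving median over a window of epochs\n",
    "# (a window of 25 epochs, ~12.5 days at 12-hourly spacing, is an assumption)\n",
    "def moving_median(ts, window):\n",
    "    half = window // 2\n",
    "    smoothed = np.full(ts.shape, np.nan)\n",
    "    for i in range(len(ts)):\n",
    "        smoothed[i] = np.nanmedian(ts[max(0, i - half) : i + half + 1])\n",
    "    return smoothed\n",
    "\n",
    "\n",
    "timeseries_smoothed = moving_median(distances[cp_idx_sel], 25)\n",
    "\n",
    "fig, ax = plt.subplots(figsize=(8, 4))\n",
    "ax.scatter(timestamps, distances[cp_idx_sel], s=5, color=\"black\", label=\"Raw\")\n",
    "ax.plot(timestamps, timeseries_smoothed, color=\"blue\", label=\"Moving median\")\n",
    "ax.set_xlabel(\"Date\")\n",
    "ax.set_ylabel(\"Distance [m]\")\n",
    "ax.legend()\n",
    "plt.show()"
   ]
  },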
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Principal component analysis\n",
    "\n",
    "In the following, we will use principal component analysis (PCA; also referred to as empirical orthogonal functions) as a tool to summarize the change patterns within a spatiotemporal dataset. The method is based on [this master thesis](http://resolver.tudelft.nl/uuid:ce98c4e3-6ca1-4966-a5cf-2120f2fa44bf), to be published in [Frantzen et al. (2023)](#references). \n",
    " \n",
    "For this technique, the 4D data shall be represented as a two-dimensional array in which the rows correspond to different epochs, and the columns correspond to each individual [x, y] coordinate where a z or surface change value is represented. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Reshape the data from the `SpatiotemporalAnalysis` object so that all coordinates are represented in one axis, i.e. an array holding [time, space]:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X = distances.transpose()"
   ]
  },
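  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A quick check (our addition) that the array now has the expected [time, space] orientation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Rows correspond to epochs, columns to core point locations\n",
    "print(f\"X shape (epochs, locations): {X.shape}\")"
   ]
  },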
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Find coordinates without nan values, and create a new array without nan values:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "idxs = np.argwhere(\n",
    "    ~np.isnan(X).any(axis=0)\n",
    ")  # store the column indices where actual data is present\n",
    "X_no_nan = X[:, idxs.flatten()].reshape((X.shape[0], -1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Compute the loading and scores of the prinicpal components in 'NaN-free' X:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.decomposition import PCA\n",
    "\n",
    "pca = PCA()\n",
    "scores = pca.fit_transform(X_no_nan)\n",
    "result = pca.components_"
   ]
  },
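  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before plotting, it is useful to inspect how much variance the leading components explain (a quick aside using standard scikit-learn attributes):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fraction of the total variance explained by the first components\n",
    "for i, evr in enumerate(pca.explained_variance_ratio_[:5]):\n",
    "    print(f\"PC {i}: {evr:.2%} of variance\")"
   ]
  },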
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Plot the principal component loadings and scores along with their fraction of explained variance:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "coords_no_nan = analysis.corepoints.cloud[idxs]\n",
    "for i in range(1):  # range(pca.n_components_):\n",
    "    fig, axes = plt.subplots(1, 2, figsize=(10, 4))\n",
    "    sat_val = (\n",
    "        np.abs(np.nanpercentile(result[i], 10))\n",
    "        + np.abs(np.nanpercentile(result[i], 90)) / 2\n",
    "    )\n",
    "    s = axes[0].scatter(\n",
    "        coords_no_nan[:, 0, 0],\n",
    "        coords_no_nan[:, 0, 1],\n",
    "        c=result[i],\n",
    "        clim=(-sat_val, sat_val),\n",
    "        cmap=\"PiYG\",\n",
    "    )\n",
    "    axes[0].set_aspect(\"equal\")\n",
    "    axes[1].plot(scores[i])\n",
    "    axes[1].set_xlabel(\"Epoch\")\n",
    "    axes[1].set_ylabel(\"Score\")\n",
    "    fig.suptitle(\n",
    "        f\"PC loading (left) and scores (right) for principal component {i} with EVF {pca.explained_variance_ratio_[i]}\"\n",
    "    )\n",
    "    plt.colorbar(s, ax=axes[0])\n",
    "    plt.tight_layout()\n",
    "    fig.show()"
   ]
  },
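  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see how well a few components summarize the spatiotemporal changes, we can reconstruct a low-rank approximation of the space-time array. This sketch (our addition) uses standard scikit-learn attributes; the number of components `k` is an arbitrary choice:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A sketch: rank-k reconstruction of the change array from the first k PCs\n",
    "k = 3  # arbitrary choice; inspect explained_variance_ratio_ to pick k\n",
    "X_approx = scores[:, :k] @ pca.components_[:k, :] + pca.mean_\n",
    "\n",
    "rmse = np.sqrt(np.mean((X_no_nan - X_approx) ** 2))\n",
    "print(f\"Variance explained by first {k} PCs: {pca.explained_variance_ratio_[:k].sum():.2%}\")\n",
    "print(f\"RMSE of rank-{k} reconstruction: {rmse:.3f} m\")"
   ]
  },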
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id='references'></a>\n",
    "# References\n",
    "\n",
    "* Anders, K., Lindenbergh, R. C., Vos, S. E., Mara, H., de Vries, S., & Höfle, B. (2019). High-Frequency 3D Geomorphic Observation Using Hourly Terrestrial Laser Scanning Data Of A Sandy Beach. ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci., IV-2/W5, pp. 317-324. doi: [10.5194/isprs-annals-IV-2-W5-317-2019](https://doi.org/10.5194/isprs-annals-IV-2-W5-317-2019).\n",
    "* Anders, K., Winiwarter, L., Mara, H., Lindenbergh, R., Vos, S. E., & Höfle, B. (2021). Fully automatic spatiotemporal segmentation of 3D LiDAR time series for the extraction of natural surface changes. ISPRS Journal of Photogrammetry and Remote Sensing, 173, pp. 297-308. doi: [10.1016/j.isprsjprs.2021.01.015](https://doi.org/10.1016/j.isprsjprs.2021.01.015).\n",
    "* Frantzen, P., Voordendag, A., Kuschnerus, M., Rutzinger, M.,  Wouters, B., Lindenbergh, R. (2023, in review). Identifying Paraglacial Geomorphic Process Dynamics Using A Principal Component Analysis Of Terrestrial Laser Scanning Time Series.\n",
    "* Vos, S., Anders, K., Kuschnerus, M., Lindenberg, R., Höfle, B., Aarnikhof, S. & Vries, S. (2022). A high-resolution 4D terrestrial laser scan dataset of the Kijkduin beach-dune system, The Netherlands.  Scientific Data, 9:191. doi: [10.1038/s41597-022-01291-9](https://doi.org/10.1038/s41597-022-01291-9)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}