{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# ECG Processing Example"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-block alert-info\">\n",
    "    \n",
    "This example illustrates how to import and process electrocardiogram (ECG) data recorded per subject and how to save intermediate processing results so that further analysis can be performed (e.g., in [<code>ECG_Analysis_Example.ipynb</code>](ECG_Analysis_Example.ipynb) or in [<code>Protocol_Example.ipynb</code>](Protocol_Example.ipynb)).\n",
    "    \n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup and Helper Functions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "import re\n",
    "\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "from fau_colors import cmaps\n",
    "import biopsykit as bp\n",
    "from biopsykit.signals.ecg import EcgProcessor\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "\n",
    "%matplotlib widget\n",
    "%load_ext autoreload\n",
    "%autoreload 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.close(\"all\")\n",
    "\n",
    "palette = sns.color_palette(cmaps.faculties)\n",
    "sns.set_theme(context=\"notebook\", style=\"ticks\", font=\"sans-serif\", palette=palette)\n",
    "\n",
    "plt.rcParams[\"figure.figsize\"] = (8, 4)\n",
    "plt.rcParams[\"pdf.fonttype\"] = 42\n",
    "plt.rcParams[\"mathtext.default\"] = \"regular\"\n",
    "\n",
    "palette"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def display_dict_structure(dict_to_display):\n",
    "    _display_dict_recursive(dict_to_display)\n",
    "\n",
    "\n",
    "def _display_dict_recursive(dict_to_display):\n",
    "    if isinstance(dict_to_display, dict):\n",
    "        display(dict_to_display.keys())\n",
    "        _display_dict_recursive(list(dict_to_display.values())[0])\n",
    "    else:\n",
    "        display(\"Dataframe shape: {}\".format(dict_to_display.shape))\n",
    "        display(dict_to_display.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example 1: 1 Dataset, 1 Phase"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This example assumes that we have one dataset with only one phase, i.e. the dataset does not need to be split into multiple parts internally."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ecg_data, sampling_rate = bp.example_data.get_ecg_example()\n",
    "ep = EcgProcessor(data=ecg_data, sampling_rate=sampling_rate)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Process ECG Signal"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Calling [ep.ecg_process()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.signals.ecg.ecg.html#biopsykit.signals.ecg.ecg.EcgProcessor.ecg_process) will perform R peak detection and perform outlier correcrtion with the default outlier correction pipeline.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ep.ecg_process()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Optional: Use other outlier correction parameters"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Calling [ep.outlier_params_default()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.signals.ecg.ecg.html#biopsykit.signals.ecg.ecg.EcgProcessor.outlier_params_default) will list available outlier correction parameters and their default parameter. See the doumentation of [ep.outlier_corrections()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.signals.ecg.ecg.html#biopsykit.signals.ecg.ecg.EcgProcessor.outlier_corrections) for further information.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# List available outlier correction parameters and their default parameter. See the doumentation of 'EcgProcessor.outlier_corrections()' for further information.\n",
    "# print(ep.outlier_params_default())\n",
    "# ep.ecg_process(outlier_correction=['statistical_rr', 'statistical_rr_diff', 'physiological'], outlier_params={'statistical_rr': 2.576, 'statistical_rr_diff': 1.96, 'physiological': (50, 180)})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### ECG and Heart Rate Result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "display_dict_structure(ep.ecg_result)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get heart rate and print resulting heart rate\n",
    "hr_data = ep.heart_rate[\"Data\"]\n",
    "hr_data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "nbsphinx-thumbnail"
    ]
   },
   "outputs": [],
   "source": [
    "# Plot an overview of the acquired data - with ECG Signal\n",
    "fig, axs = bp.signals.ecg.plotting.ecg_plot(ep, key=\"Data\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot an overview of the acquired data - without ECG signal\n",
    "fig, axs = bp.signals.ecg.plotting.ecg_plot(ep, key=\"Data\", plot_ecg_signal=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note: See the documentation of [ep.ecg_result()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.signals.ecg.ecg.html#biopsykit.signals.ecg.ecg.EcgProcessor.ecg_result), [ep.heart_rate()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.signals.ecg.ecg.html#biopsykit.signals.ecg.ecg.EcgProcessor.heart_rate) and [plotting.ecg_plot()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.signals.ecg.plotting.html#biopsykit.signals.ecg.plotting.ecg_plot) for further information."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Compute Heart Rate Variability (HRV)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Heart rate variability (HRV) is the physiological phenomenon of varying time intervals of consecutive heart beats. HRV is an important marker for the activity of the autonomic nervous system (ANS) since it can be used to assess the activities of the two brances of the ANS: The sympathetic nervous system (SNS) and the parasympathetic nervous system (PNS).\n",
    "\n",
    "The function [EcgProcessor.hrv_process()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.signals.ecg.ecg.html#biopsykit.signals.ecg.ecg.EcgProcessor.hrv_process) computes HRV over the complete data. If you want to compute HRV over different subintervals, you need to split the data first."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ep.hrv_process(ep, \"Data\", index=\"Vp01\", index_name=\"subject_id\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Plot HRV Overview"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, axs = bp.signals.ecg.plotting.hrv_plot(ep, \"Data\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example 2: 1 Dataset, Multiple Phases"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This example illustrates the pipeline for one single dataset which contains data from multiple phases."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ecg_data, sampling_rate = bp.example_data.get_ecg_example()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For this example, we use the example ECG Data and assume we want to split it into 4 phases (names: Preparation, Stress, Recovery, End) of 3 minutes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# provide list of edge times (start times of the phases and the total end time)\n",
    "time_intervals = pd.Series([\"12:32\", \"12:35\", \"12:38\", \"12:41\"], index=[\"Preparation\", \"Stress\", \"Recovery\", \"End\"])\n",
    "# alternatively: provide dict with start-end times and names per phase\n",
    "# time_intervals = {\"Preparation\": (\"12:32\", \"12:35\"), \"Stress\": (\"12:35\", \"12:38\"), \"Recovery\": (\"12:38\", \"12:41\")}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Process all Phases"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ep = EcgProcessor(data=ecg_data, sampling_rate=sampling_rate, time_intervals=time_intervals)\n",
    "ep.ecg_process()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Compute HRV parameters for each Phase"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using List Comprehension ([EcgProcessor.hrv_process()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.signals.ecg.ecg.html#biopsykit.signals.ecg.ecg.EcgProcessor.hrv_process) for each phase) and concat the results into one dataframe using [pd.concat()](https://pandas.pydata.org/docs/reference/api/pandas.concat.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hrv_result = pd.concat([ep.hrv_process(ep, key=key, index=key) for key in ep.phases])\n",
    "# alternatively: call EcgProcessor.hrv_batch_process()\n",
    "# hrv_result = ep.hrv_batch_process()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Plot one Phase"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, axs = bp.signals.ecg.plotting.ecg_plot(ep, key=\"Stress\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, axs = bp.signals.ecg.plotting.hrv_plot(ep, key=\"Stress\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example 3: Multiple Subjects, Multiple Phases per Recording (Example Processing)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ecg_data, sampling_rate = bp.example_data.get_ecg_example()\n",
    "ecg_data_02, sampling_rate = bp.example_data.get_ecg_example_02()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For this example, we use two ECG example datasets, where the last phase (\"Recovery\") differs in length"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "subject_dict = {\n",
    "    \"Vp01\": {\n",
    "        \"Data\": ecg_data,\n",
    "        \"Time\": pd.Series([\"12:32\", \"12:35\", \"12:38\", \"12:41\"], index=[\"Preparation\", \"Stress\", \"Recovery\", \"End\"]),\n",
    "    },\n",
    "    \"Vp02\": {\n",
    "        \"Data\": ecg_data_02,\n",
    "        # The last phase of Vp02 has a length of only 1 minute to demonstrate the functionality of cutting to equal length\n",
    "        \"Time\": pd.Series([\"12:55\", \"12:58\", \"13:01\", \"13:02\"], index=[\"Preparation\", \"Stress\", \"Recovery\", \"End\"]),\n",
    "    },\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-block alert-info\">\n",
    "\n",
    "**Note**: For the further steps of the Processing Pipeline, please refer to **Example 4**.\n",
    "    \n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Example 4: Multiple Subjects, Multiple Phases per Recording (Batch Processing)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This example illustrates how to set up a proper data analysis pipeline to process multiple subjects in a loop. It consists of the following steps:\n",
    "\n",
    "\n",
    "1. **Get Time Information**: Load Excel sheet with time information (to process multiple subjects in a loop).\n",
    "2. **Query Data**: Iterate through a folder and search for all files with a specific file pattern\n",
    "    *optional*: Iterate through a folder that contains *subfolders* for each subjects where data is stored  \n",
    "    *optional*: Extract Subject ID either from data filename or from folder name\n",
    "3. **Process Data**:\n",
    "    1.  Load ECG Dataset, split into phases based on time information from Excel sheet.\n",
    "    2.  Perform ECG processing with outlier correction.\n",
    "    3.  Compute Heart Rate for each subject and each phase.\n",
    "    4.  Store and Export Intermediate Processing Results: Heart rate processing results are stored in a [SubjectDataDict](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.datatype_helper.html#biopsykit.utils.datatype_helper.SubjectDataDict), a special nested dictionary structure that contains processed data of all subjects, and export processed heart rate data as well as R-peaks (needed for computing HRV parameters) per subject (as Excel files).\n",
    "\n",
    "Further heart rate processing steps (such as resampling data, splitting data into subphases, normalizing data, computing aggregated results, computing HRV parameters, ...) will be performed in [<code>ECG_Analysis_Example.ipynb</code>](ECG_Analysis_Example.ipynb)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Get Time Information"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_time_log = bp.example_data.get_time_log_example()\n",
    "# Alternatively: load your own file\n",
    "# df_time_log = bp.io.load_time_log(\"<path-to-time-log-file>\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_time_log"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Query Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# path to folder containing ECG raw data\n",
    "ecg_base_path = bp.example_data.get_ecg_path_example()\n",
    "# Alternatively: Use your own data\n",
    "# ecg_base_path = Path(\"<path-to-ecg-data-folder>\")\n",
    "file_list = list(sorted(ecg_base_path.glob(\"*.bin\")))\n",
    "print(file_list)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Process Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from tqdm.auto import tqdm  # to visualize for-loop progress\n",
    "\n",
    "# folder to save processing results (comment to skip)\n",
    "ecg_export_path = Path(\"./results/ecg\")\n",
    "# create directory and its parent directories (if it not already exists)\n",
    "bp.utils.file_handling.mkdirs(ecg_export_path)\n",
    "\n",
    "# dicionaries to store processing results\n",
    "dict_hr_subjects = {}\n",
    "dict_hrv_subjects = {}\n",
    "\n",
    "# for loop to iterate though subjects\n",
    "for file_path in tqdm(file_list, desc=\"Subjects\"):\n",
    "    # optional: extract Subject ID from file name; multiple ways, e.g. using regex or by splitting the filename string\n",
    "    subject_id = file_path.name.split(\".\")[0].split(\"_\")[-1]\n",
    "\n",
    "    # optional: if data is stored in subfolders: additional .glob() on file_path, get subject_id from folder name\n",
    "    # ecg_files = list(sorted(file_path.glob(\"*\")))\n",
    "    # subject_id = file_path.name\n",
    "\n",
    "    # check if folder contains data\n",
    "    # if len(ecg_files) == 0:\n",
    "    # print(\"No data for subject {}.\".format(subject_id))\n",
    "\n",
    "    # the next step depends on the file structure:\n",
    "    # if you only have one recording per subject: load this file\n",
    "    # df, fs = bp.io.load_dataset_nilspod(ecg_files[0])\n",
    "    # if you have more than one recording per subject: loop through them, load the files and e.g. put them into a dictionary\n",
    "\n",
    "    # load dataset\n",
    "    data, fs = bp.io.nilspod.load_dataset_nilspod(file_path)\n",
    "\n",
    "    ep = EcgProcessor(data=data, time_intervals=df_time_log.loc[subject_id], sampling_rate=256.0)\n",
    "    ep.ecg_process(title=subject_id)\n",
    "\n",
    "    # save ecg processing result into HR subject dict\n",
    "    dict_hr_subjects[subject_id] = ep.heart_rate\n",
    "    dict_hrv_subjects[subject_id] = ep.rpeaks\n",
    "\n",
    "    # save HR data and R-Peak data to files in export folder (comment to skip)\n",
    "    # bp.io.ecg.write_hr_phase_dict(ep.heart_rate, ecg_export_path.joinpath(\"hr_result_{}.xlsx\".format(subject_id)))\n",
    "    # bp.io.ecg.write_hr_phase_dict(ep.rpeaks, ecg_export_path.joinpath(\"rpeaks_result_{}.xlsx\".format(subject_id)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Structure of the [SubjectDataDict](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.datatype_helper.html#biopsykit.utils.datatype_helper.SubjectDataDict):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "display_dict_structure(dict_hr_subjects)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "display_dict_structure(dict_hrv_subjects)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "biopsykit",
   "language": "python",
   "name": "biopsykit"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.13"
  },
  "toc-showtags": false
 },
 "nbformat": 4,
 "nbformat_minor": 4
}