{ "cells": [ { "cell_type": "markdown", "metadata": { "cell_marker": "\"\"\"" }, "source": [ "# ECG Analysis Example" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "This example illustrates how to further process time-series heart rate data extracted from, for instance, ECG data. This includes assigning study conditions, splitting data into subphases, resampling data, cutting the data of all subjects to equal length, normalizing data, rearranging data, and computing aggregated results (such as mean and standard error over all subjects for each (sub)phase).\n", "\n", "As input, this notebook uses the heart rate processing outputs generated by the ECG processing pipeline shown in [ECG_Processing_Example.ipynb](ECG_Processing_Example.ipynb).\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup and Helper Functions" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "\n", "import re\n", "\n", "import pandas as pd\n", "import numpy as np\n", "\n", "from fau_colors import cmaps\n", "\n", "import biopsykit as bp\n", "from biopsykit.signals.ecg import EcgProcessor\n", "\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "\n", "%matplotlib widget\n", "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.close(\"all\")\n", "\n", "palette = sns.color_palette(cmaps.faculties)\n", "sns.set_theme(context=\"notebook\", style=\"ticks\", font=\"sans-serif\", palette=palette)\n", "\n", "plt.rcParams[\"figure.figsize\"] = (10, 5)\n", "plt.rcParams[\"pdf.fonttype\"] = 42\n", "plt.rcParams[\"mathtext.default\"] = \"regular\"\n", "\n", "palette" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# path to import and export example processing results\n", "result_path = Path(\"./results\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "lines_to_next_cell": 1 }, "outputs": [], "source": [ "def display_dict_structure(dict_to_display):\n", " _display_dict_recursive(dict_to_display)\n", "\n", "\n", "def _display_dict_recursive(dict_to_display):\n", " if isinstance(dict_to_display, dict):\n", " display(dict_to_display.keys())\n", " _display_dict_recursive(list(dict_to_display.values())[0])\n", " else:\n", " display(\"Dataframe shape: {}\".format(dict_to_display.shape))\n", " display(dict_to_display.head())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When we collect ECG data from a study, we typically extract heart rate (among others), resulting in time-series heart rate data 
for each subject. These data are then usually processed further in order to visualize results, perform statistical analyses, etc. How the data is further processed depends on the application. In this example, we will look at two different applications:\n", "\n", "1. **Aggregated Results**: This means that we compute aggregations over time-series data. In the end, we will get tabular data with mean values for each subject within one time interval of the study procedure. We compute these *Aggregated Results* when we want to analyze differences, for example, between different study conditions, or between different phases of one study (or both). \n", "2. **Ensemble Time-series**: This means that we have time-series data where the data of each subject has equal length. In the end, this allows us to concatenate the data of all subjects into a tabular format. We compute these *Ensemble Time-series* data when we want to visualize time-series data of all subjects in one plot by overlaying the data and computing the mean ± standard deviation over all subjects for each time point (a so-called *ensemble plot*). \n", "\n", "\n", "Additionally, we will show how to **compute heart rate variability (HRV)** parameters for each subject and for each (sub)phase of the recorded data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Aggregated Results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For computing *Aggregated Results* the following processing steps will be performed:\n", "\n", "1. **Load Data**: Load heart rate processing results per subject and concatenate them into a [SubjectDataDict](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.datatype_helper.html#biopsykit.utils.datatype_helper.SubjectDataDict), a special nested dictionary structure that contains processed data of all subjects. \n", "2. **Resample**: Resample heart rate data of all subjects to 1 Hz. This makes it easier to split the data into subphases later. \n", "3. 
**Normalize Data** (*optional*): Normalize heart rate data with respect to the mean heart rate of a baseline phase. This can make heart rate reactions within one study easier to compare because it removes biases from different resting heart rates. \n", "4. **Select Phases** (*optional*): If the following steps should not be performed on the data of *all* phases, but only on a *subset* of them (e.g., because the first phase is not relevant for your analysis goal, or because you only recorded it for normalizing all other phases against it), this subset of phases can be specified and further processing will only be performed on this subset. \n", "5. **Split into Subphases** (*optional*): If the different phases of your study consist of subphases, which you want to aggregate and analyze separately, you can split the data in the [SubjectDataDict](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.datatype_helper.html#biopsykit.utils.datatype_helper.SubjectDataDict) into subphases, which adds a new dictionary level.\n", "6. **Aggregate Results**: Aggregate time-series data by computing the mean heart rate per (sub)phase for each subject. The results will be concatenated into a dataframe. \n", "7. **Add Conditions** (*optional*): If the study included multiple conditions, the condition can be added as a new index level to the dataframe. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Heart Rate Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load processed heart rate time-series data of all subjects and concatenate into one dictionary."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "study_data_dict = {}\n", "# or use your own path\n", "ecg_path = bp.example_data.get_ecg_processing_results_path_example()\n", "\n", "for file in sorted(ecg_path.glob(\"hr_result_*.xlsx\")):\n", " subject_id = re.findall(\"hr_result_(Vp\\w+).xlsx\", file.name)[0]\n", " study_data_dict[subject_id] = pd.read_excel(file, sheet_name=None, index_col=\"time\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Structure of the [SubjectDataDict](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.datatype_helper.html#biopsykit.utils.datatype_helper.SubjectDataDict):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "display_dict_structure(study_data_dict)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Subject Conditions\n", "(will be needed later)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**Note**: This is only an example, thus, the condition list is manually constructed. Usually, you should import a condition list from a file.\n", " \n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "condition_list = pd.DataFrame(\n", " [\"Control\", \"Intervention\"], columns=[\"condition\"], index=pd.Index([\"Vp01\", \"Vp02\"], name=\"subject\")\n", ")\n", "condition_list" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Resample Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Resample all heart rate values to a sampling rate of 1 Hz." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dict_result = bp.utils.data_processing.resample_dict_sec(study_data_dict)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Structure of the resampled [SubjectDataDict](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.datatype_helper.html#biopsykit.utils.datatype_helper.SubjectDataDict):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "display_dict_structure(dict_result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Normalize Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Normalize heart rate data with respect to the mean heart rate of a baseline phase. In this case, the baseline phase is called `\"Baseline\"`. All heart rate samples are then interpreted as \"heart rate change relative to baseline in percent\"." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dict_result_norm = bp.utils.data_processing.normalize_to_phase(dict_result, \"Baseline\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Structure of the normalized [SubjectDataDict](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.datatype_helper.html#biopsykit.utils.datatype_helper.SubjectDataDict):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "display_dict_structure(dict_result_norm)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Select Phases" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If desired, select only specific phases, such as only \"Intervention\", \"Stress\", and \"Recovery\" (i.e., drop the \"Baseline\" phase because it was only used for normalizing all other phases)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dict_result_norm_selected = bp.utils.data_processing.select_dict_phases(\n", " dict_result_norm, phases=[\"Intervention\", \"Stress\", \"Recovery\"]\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Structure of the normalized [SubjectDataDict](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.datatype_helper.html#biopsykit.utils.datatype_helper.SubjectDataDict) with only the selected phases:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "display_dict_structure(dict_result_norm_selected)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Split into Subphases" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By splitting the data in the [SubjectDataDict](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.datatype_helper.html#biopsykit.utils.datatype_helper.SubjectDataDict) further into subphases a new index level is added as the most inner dictionary level. 
You can specify the different subphases in multiple ways (depending on your study protocol):" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Fixed Lengths, Consecutive Subphases\n", "\n", "If all subphases have fixed lengths and subphases are consecutive, i.e., each subphase begins right after the previous one, you can pass a dict with subphase names as keys and subphase *durations* (in seconds) as values." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dict_result_subph = bp.utils.data_processing.split_dict_into_subphases(\n", "    dict_result, subphases={\"Start\": 20, \"Middle\": 30, \"End\": 10}\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Structure of the [SubjectDataDict](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.datatype_helper.html#biopsykit.utils.datatype_helper.SubjectDataDict) with a new subphase level added:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "display_dict_structure(dict_result_subph)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Fixed Lengths, Non-Consecutive Subphases\n", "\n", "If all subphases have fixed lengths, but subphases are *not* consecutive, i.e., do *not* begin right after the previous subphase, you can pass a dict with subphase names as keys and a tuple with relative start and end *times* (not durations!) in seconds as values."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dict_result_subph = bp.utils.data_processing.split_dict_into_subphases(\n", " dict_result, subphases={\"Start\": (0, 10), \"Middle\": (20, 40), \"End\": (50, 60)}\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Structure of the [SubjectDataDict](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.datatype_helper.html#biopsykit.utils.datatype_helper.SubjectDataDict) with a new subphase level added:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "display_dict_structure(dict_result_subph)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Last Subphase without Fixed Length" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If all subphases have fixed length, except the last subphase, which does not have fixed length (e.g. because the last subphase was a phase where feedback was provided to the subject and thus had variable length), but you still want to extract all data from this subphase, you can set the duration of the last subphase to `0`. The duration of this subphase is then inferred from the data by finding the subject with the longest recording. Shorter recordings of other subjects are then also automatically cut to their respective duration." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dict_result_subph = bp.utils.data_processing.split_dict_into_subphases(\n", "    dict_result, subphases={\"Start\": 20, \"Middle\": 30, \"End\": 0}\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Structure of the [SubjectDataDict](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.datatype_helper.html#biopsykit.utils.datatype_helper.SubjectDataDict) with a new subphase level added:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "display_dict_structure(dict_result_subph)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Aggregate Results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Aggregate Over Phases" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_result = bp.utils.data_processing.mean_per_subject_dict(\n", "    dict_result, dict_levels=[\"subject\", \"phase\"], param_name=\"HR\"\n", ")\n", "\n", "df_result.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Aggregate Over Subphases" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_result_subph = bp.utils.data_processing.mean_per_subject_dict(\n", "    dict_result_subph, dict_levels=[\"subject\", \"phase\", \"subphase\"], param_name=\"HR\"\n", ")\n", "\n", "df_result_subph.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Add Conditions" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_result_cond = bp.utils.data_processing.add_subject_conditions(df_result, condition_list)\n", "df_result_cond.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_result_subph_cond = bp.utils.data_processing.add_subject_conditions(df_result_subph, condition_list)\n", 
"df_result_subph_cond.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Compute Parameter" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The resulting aggregated data can then be further processed, and \"population averages\" can be computed by averaging the heart rate data of each (sub)phase over all subjects, resulting in mean ± standard error values per (sub)phase and condition (if applicable). The resulting dataframe is a [MeanSeDataFrame](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.datatype_helper.html#biopsykit.utils.datatype_helper.MeanSeDataFrame), a dataframe containing mean and standard error of time-series data in a standardized format." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Without Conditions" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_result_mse = bp.utils.data_processing.mean_se_per_phase(df_result)\n", "df_result_mse.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_result_subph_mse = bp.utils.data_processing.mean_se_per_phase(df_result_subph)\n", "df_result_subph_mse.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### With Conditions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We call the same function, but with a different dataframe (one where the condition was added as an index level)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**Note**: In this example, the standard error is `NaN` because we only have one subject per condition and can thus not compute the standard error.\n", " \n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bp.utils.data_processing.mean_se_per_phase(df_result_cond).head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bp.utils.data_processing.mean_se_per_phase(df_result_subph_cond).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plotting" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The final data can then either be statistically analyzed, or plotted. Here, we plot the mean and standard error of each phase using the plotting function [plotting.hr_mean_plot()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.protocols.plotting.html#biopsykit.protocols.plotting.hr_mean_plot) from `BioPsyKit`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=(8, 4))\n", "bp.protocols.plotting.hr_mean_plot(df_result_mse, ax=ax);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we plot our data with [plotting.hr_mean_plot()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.protocols.plotting.html#biopsykit.protocols.plotting.hr_mean_plot), and our data consists of different phases, where each phase consist of subphases, the phases will be highlighted in the background for better visibility." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=(8, 4))\n", "bp.protocols.plotting.hr_mean_plot(df_result_subph_mse, ax=ax);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Ensemble Time-series" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For computing *Ensemble Time-series* the following processing steps will be performed:\n", "\n", "1. 
**Load Data**: Load heart rate processing results per subject and concatenate them into a [SubjectDataDict](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.datatype_helper.html#biopsykit.utils.datatype_helper.SubjectDataDict), a special nested dictionary structure that contains processed data of all subjects.\n", "2. **Resample**: Resample heart rate data of all subjects to 1 Hz. This makes it easier to split the data into subphases later.\n", "3. **Normalize Data** (*optional*): Normalize heart rate data with respect to the mean heart rate of a baseline phase. This can make heart rate reactions within one study easier to compare because it removes biases from different resting heart rates.\n", "4. **Select Phases** (*optional*): If the following steps should not be performed on the data of *all* phases, but only on a *subset* of them (e.g., because the first phase is not relevant for your analysis goal, or because you only recorded it for normalizing all other phases against it), this subset of phases can be specified and further processing will only be performed on this subset.\n", "5. **Rearrange Data**: The data in the form of a `SubjectDataDict` is rearranged into a [StudyDataDict](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.datatype_helper.html#biopsykit.utils.datatype_helper.StudyDataDict). A `StudyDataDict` is constructed from a `SubjectDataDict` by swapping outer (subject IDs) and inner (phase names) dictionary levels.\n", "6. **Cut Phases to Equal Length**: If the data do not have equal length per subject (e.g., because the last phase had a flexible duration or was not timed accurately), you can cut the data to the same length in order to concatenate the time-series data afterwards. This is beneficial if you want to overlay time-series data from multiple subjects in a so-called *Ensemble Plot*.\n", "7. 
**Merge StudyDataDict**: Now that the time-series data of all subjects have equal length within each phase they can be merged into a [MergedStudyDataDict](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.datatype_helper.html#biopsykit.utils.datatype_helper.MergedStudyDataDict) where the inner level of the nested `StudyDataDict` is removed by merging the individual data into one dataframe for each phase.\n", "8. **Add Conditions** (*optional*): If multiple study conditions were present during the study the `StudyDataDict` can further be split by adding a new dictionary level with study conditions to the `StudyDataDict`.\n", "\n", "Note: See Documentation for [biopsykit.utils.data_processing](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.data_processing.html) for further information of the used functions. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Heart Rate Data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# or use your own path\n", "ecg_path = bp.example_data.get_ecg_processing_results_path_example()\n", "\n", "# Load processed heart rate time-series data of all subjects and concatenate into one dict\n", "subject_data_dict = {}\n", "for file in sorted(ecg_path.glob(\"hr_result_*.xlsx\")):\n", " subject_id = re.findall(\"hr_result_(Vp\\w+).xlsx\", file.name)[0]\n", " subject_data_dict[subject_id] = pd.read_excel(file, sheet_name=None, index_col=\"time\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Structure of the [SubjectDataDict](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.datatype_helper.html#biopsykit.utils.datatype_helper.SubjectDataDict):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "display_dict_structure(subject_data_dict)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 
Subject Conditions\n", "(will be needed later)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note**: This is only an example, thus, the condition list is manually constructed. Usually, you should import a condition list from a file." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "condition_list = pd.DataFrame(\n", " [\"Control\", \"Intervention\"], columns=[\"condition\"], index=pd.Index([\"Vp01\", \"Vp02\"], name=\"subject\")\n", ")\n", "condition_list" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Resample Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Resample all heart rate values to a sampling rate of 1 Hz." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dict_result = bp.utils.data_processing.resample_dict_sec(subject_data_dict)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Structure of the resampled [SubjectDataDict](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.datatype_helper.html#biopsykit.utils.datatype_helper.SubjectDataDict):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "display_dict_structure(dict_result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Normalize Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Normalize heart rate data with respect to the mean heart rate of a baseline phase. In this case, the baseline phase is called `\"Baseline\"`. All heart rate samples are then interpreted as \"heart rate change relative to baseline in percent\"." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dict_result_norm = bp.utils.data_processing.normalize_to_phase(dict_result, \"Baseline\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Structure of the normalized [SubjectDataDict](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.datatype_helper.html#biopsykit.utils.datatype_helper.SubjectDataDict):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "display_dict_structure(dict_result_norm)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Select Phases" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If desired, select only specific phases, such as only \"Intervention\", \"Stress\", and \"Recovery\" (i.e., drop the \"Baseline\" phase because it was only used for normalizing all other phases)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dict_result_norm_selected = bp.utils.data_processing.select_dict_phases(\n", "    dict_result_norm, phases=[\"Intervention\", \"Stress\", \"Recovery\"]\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Structure of the normalized [SubjectDataDict](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.datatype_helper.html#biopsykit.utils.datatype_helper.SubjectDataDict) with only the selected phases:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "display_dict_structure(dict_result_norm_selected)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Rearrange Data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dict_study = bp.utils.data_processing.rearrange_subject_data_dict(dict_result_norm)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Structure of the normalized 
[StudyDataDict](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.datatype_helper.html#biopsykit.utils.datatype_helper.StudyDataDict):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "display_dict_structure(dict_study)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Cut Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Data for all subjects will be cut to the shortest duration for each phase." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**Note:** If only a subset of phases should be cut, these phases can be specified using the `phases` parameter.\n", "\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dict_result = bp.utils.data_processing.cut_phases_to_shortest(dict_study)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Structure of the cut [StudyDataDict](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.datatype_helper.html#biopsykit.utils.datatype_helper.StudyDataDict):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "display_dict_structure(dict_result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Merge `StudyDataDict`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dict_merged = bp.utils.data_processing.merge_study_data_dict(dict_result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Structure of the [MergedStudyDataDict](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.datatype_helper.html#biopsykit.utils.datatype_helper.MergedStudyDataDict):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "display_dict_structure(dict_merged)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Add Conditions" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dict_merged_cond = bp.utils.data_processing.split_subject_conditions(dict_merged, condition_list)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Structure of the [MergedStudyDataDict](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.datatype_helper.html#biopsykit.utils.datatype_helper.MergedStudyDataDict), split into different conditions:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "display_dict_structure(dict_merged_cond)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plotting" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 
"After these processing steps we can finally visualize the time-series data of all subjects in an *Ensemble Plot*, where the data of all subjects is overlaid and each point in the time-series is plotted as mean ± standard error over all subjects. For creating ensemble plots, `BioPsyKit` provides the function [plotting.hr_ensemble_plot](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.protocols.plotting.html#biopsykit.protocols.plotting.hr_ensemble_plot)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**Note**: These plots do not look as pretty as they should because this is just example data from two subjects. If you want to see plots from \"actual\" data, just scroll down!\n", "\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=(8, 4))\n", "bp.protocols.plotting.hr_ensemble_plot(dict_merged, ax=ax);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If the phases have subphases they can be passed to the plotting function in order to highlight them in the background:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "subphases = {\"Start\": 20, \"Middle\": 100, \"End\": 20}\n", "subphase_dict = {\"Baseline\": subphases, \"Intervention\": subphases, \"Stress\": subphases, \"Recovery\": subphases}\n", "\n", "fig, ax = plt.subplots(figsize=(8, 4))\n", "bp.protocols.plotting.hr_ensemble_plot(dict_merged, subphases=subphase_dict, ax=ax);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load example data from Excel file. The data is already in the format of a [MergedStudyDataDict](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.datatype_helper.html#biopsykit.utils.datatype_helper.MergedStudyDataDict), so it can directly be passed to [plotting.hr_ensemble_plot](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.protocols.plotting.html#biopsykit.protocols.plotting.hr_ensemble_plot)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dict_merged_norm = bp.example_data.get_hr_ensemble_sample()\n", "\n", "print(bp.utils.datatype_helper.is_merged_study_data_dict(dict_merged_norm))\n", "display_dict_structure(dict_merged_norm)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "nbsphinx-thumbnail" ] }, "outputs": [], "source": [ "subphases = {\"Start\": 60, \"Middle\": 240, \"End\": 0}\n", "subphase_dict = {\"Phase1\": subphases, \"Phase2\": subphases, \"Phase3\": subphases}\n", "\n", "fig, ax = plt.subplots(figsize=(8, 4))\n", "bp.protocols.plotting.hr_ensemble_plot(dict_merged_norm, subphases=subphase_dict, ax=ax);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compute Heart Rate Variability (HRV)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For computing *Heart Rate Variability (HRV)*, the following processing steps will be performed: \n", "\n", "1. **Load Data**: Load R-peak data per subject and concatenate them into a [SubjectDataDict](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.datatype_helper.html#biopsykit.utils.datatype_helper.SubjectDataDict), a special nested dictionary structure that contains processed data of all subjects. \n", "1. **Select Phases** (*optional*): If the following steps should be performed only on a *subset* of phases rather than on *all* of them (e.g., because the first phase is not relevant for your analysis goal, or because you only recorded it to normalize all other phases against it), this subset of phases can be specified, and further processing will only be performed on it. \n", "1. **Split into Subphases** (*optional*): If the phases of your study consist of subphases that you want to aggregate and analyze separately, you can split the data in the `SubjectDataDict` further into subphases, which adds a new dictionary level.\n", "1. **Compute HRV**: Compute HRV parameters from R-peak data. \n", "1. **Add Conditions** (*optional*): If multiple study conditions were present during the study, the [StudyDataDict](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.datatype_helper.html#biopsykit.utils.datatype_helper.StudyDataDict) can further be split by adding a new dictionary level with study conditions to the `StudyDataDict`. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### R-Peak Data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# or use your own path\n", "ecg_path = bp.example_data.get_ecg_processing_results_path_example()\n", "\n", "# Load processed R-peak data of all subjects and concatenate into one dict\n", "rpeaks_subject_data_dict = {}\n", "for file in sorted(ecg_path.glob(\"rpeaks_result_*.xlsx\")):\n", "    # raw string avoids invalid escape sequences; the dot is escaped to match a literal '.'\n", "    subject_id = re.findall(r\"rpeaks_result_(Vp\\w+)\\.xlsx\", file.name)[0]\n", "    rpeaks_subject_data_dict[subject_id] = pd.read_excel(file, sheet_name=None, index_col=\"time\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# rename to make it consistent with the other examples\n", "dict_result_rpeaks = rpeaks_subject_data_dict" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Structure of the R-peak [SubjectDataDict](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.datatype_helper.html#biopsykit.utils.datatype_helper.SubjectDataDict):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "display_dict_structure(dict_result_rpeaks)" ] 
}, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Subject Conditions\n", "(will be needed later)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**Note**: This is only an example, so the condition list is constructed manually. Usually, you would import the condition list from a file." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "condition_list = pd.DataFrame(\n", "    [\"Control\", \"Intervention\"], columns=[\"condition\"], index=pd.Index([\"Vp01\", \"Vp02\"], name=\"subject\")\n", ")\n", "condition_list" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Select Phases" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If desired, select only specific phases, such as only \"Intervention\", \"Stress\", and \"Recovery\"." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dict_result_rpeaks_selected = bp.utils.data_processing.select_dict_phases(\n", "    dict_result_rpeaks, phases=[\"Intervention\", \"Stress\", \"Recovery\"]\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Structure of the [SubjectDataDict](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.datatype_helper.html#biopsykit.utils.datatype_helper.SubjectDataDict) with only the selected phases:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "display_dict_structure(dict_result_rpeaks_selected)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Split into Subphases" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By splitting the data in the [SubjectDataDict](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.datatype_helper.html#biopsykit.utils.datatype_helper.SubjectDataDict) further into subphases, a new level is added as the innermost dictionary level. In this example, we assume that all subphases are consecutive and have a fixed length (see the *Aggregated Results* example for further options)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dict_result_rpeaks_subph = bp.utils.data_processing.split_dict_into_subphases(\n", "    dict_result_rpeaks, subphases={\"Start\": 20, \"Middle\": 30, \"End\": 10}\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Structure of the [SubjectDataDict](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.datatype_helper.html#biopsykit.utils.datatype_helper.SubjectDataDict) with a new subphase level added:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "display_dict_structure(dict_result_rpeaks_subph)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Compute HRV" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Iterate through all subjects and all (sub)phases, call [EcgProcessor.hrv_process()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.signals.ecg.html#biopsykit.signals.ecg.EcgProcessor.hrv_process), and concatenate the results into one dataframe." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Without Subphases" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dict_hrv_subject = {}\n", "for subject_id, data_dict in dict_result_rpeaks.items():\n", "    list_hrv_phases = []\n", "    for phase, rpeaks in data_dict.items():\n", "        list_hrv_phases.append(EcgProcessor.hrv_process(rpeaks=rpeaks, index=phase, index_name=\"phase\"))\n", "    dict_hrv_subject[subject_id] = pd.concat(list_hrv_phases)\n", "\n", "df_hrv = pd.concat(dict_hrv_subject, names=[\"subject\"])\n", "df_hrv.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### With Subphases" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Assuming we only want to compute a certain type of HRV measures (in this example, only time-domain measures), we can specify the `hrv_types` argument of [EcgProcessor.hrv_process()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.signals.ecg.html#biopsykit.signals.ecg.EcgProcessor.hrv_process)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dict_hrv_subject = {}\n", "for subject_id, data_dict in dict_result_rpeaks_subph.items():\n", "    dict_hrv_phases = {}\n", "    for phase, phase_dict in data_dict.items():\n", "        list_hrv_subphases = []\n", "        for subphase, rpeaks in phase_dict.items():\n", "            list_hrv_subphases.append(\n", "                EcgProcessor.hrv_process(rpeaks=rpeaks, index=subphase, index_name=\"subphase\", hrv_types=\"hrv_time\")\n", "            )\n", "        dict_hrv_phases[phase] = pd.concat(list_hrv_subphases)\n", "\n", "    dict_hrv_subject[subject_id] = pd.concat(dict_hrv_phases, names=[\"phase\"])\n", "\n", "df_hrv_subph = pd.concat(dict_hrv_subject, names=[\"subject\"])\n", "df_hrv_subph.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Add Conditions" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_hrv_cond = bp.utils.data_processing.add_subject_conditions(df_hrv, condition_list)\n", "df_hrv_cond.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "df_hrv_subph_cond = bp.utils.data_processing.add_subject_conditions(df_hrv_subph, condition_list)\n", "df_hrv_subph_cond.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Note at the End\n", "\n", "The purpose of this example notebook was to show how to \"manually\" process such data step by step. However, `BioPsyKit` also offers an easier, object-oriented way to perform these processing steps via the [biopsykit.protocols](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.protocols.html) module. To use it, follow these steps:\n", "\n", "1. Create a new `Protocol` instance (choose from a set of pre-defined, established laboratory protocols, such as [MIST](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.protocols.mist.html#biopsykit.protocols.mist.MIST) or [TSST](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.protocols.tsst.html#biopsykit.protocols.tsst.TSST), or use the [BaseProtocol](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.protocols.base.html#biopsykit.protocols.base.BaseProtocol)). \n", "2. Add heart rate data (and R-peak data, if you additionally want to compute HRV) in the form of a [SubjectDataDict](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.datatype_helper.html#biopsykit.utils.datatype_helper.SubjectDataDict) to the `Protocol` instance via [Protocol.add_hr_data()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.protocols.html#biopsykit.protocols.BaseProtocol.add_hr_data). \n", "3. Compute *aggregated results* by calling [Protocol.compute_hr_results()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.protocols.html#biopsykit.protocols.BaseProtocol.compute_hr_results), *ensemble time-series* by calling [Protocol.compute_hr_ensemble()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.protocols.html#biopsykit.protocols.BaseProtocol.compute_hr_ensemble), and HRV data by calling [Protocol.compute_hrv_results()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.protocols.html#biopsykit.protocols.BaseProtocol.compute_hrv_results). The individual processing steps can be enabled or disabled by passing `True` or `False` as function arguments (see the documentation for details). Additional parameters can be passed to the processing steps by adding them to a dictionary and passing it to the function via the `params` argument. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "jupytext": { "encoding": "# -*- coding: utf-8 -*-", "text_representation": { "extension": ".py", "format_name": "sphinx", "format_version": "1.1", "jupytext_version": "1.13.0" } }, "kernelspec": { "display_name": "biopsykit", "language": "python", "name": "biopsykit" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" }, "toc-showtags": false }, "nbformat": 4, "nbformat_minor": 4 }