{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# StatsPipeline & Plotting Example"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-block alert-info\">\n",
    "    \n",
    "This example shows how to set up a statistical analysis pipeline using [StatsPipeline](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.stats.html#biopsykit.stats.StatsPipeline), how to export results, and how to create LaTeX output from statistical analysis results. \n",
    "\n",
    "Additionally, it demonstrates the integrated plotting functions of `BioPsyKit`, that wrap the [boxplot()](https://seaborn.pydata.org/generated/seaborn.boxplot.html) function of [seaborn](https://seaborn.pydata.org/index.html) and offer additional, often used features, such as adding significance brackets:\n",
    "    \n",
    "* [feature_boxplot()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.plotting.html#biopsykit.plotting.feature_boxplot) and [multi_feature_boxplot()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.plotting.html#biopsykit.plotting.multi_feature_boxplot)\n",
    "* as well as the derived functions specialized for saliva features: [saliva_feature_boxplot()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.protocols.plotting.html#biopsykit.protocols.plotting.saliva_feature_boxplot) and [saliva_multi_feature_boxplot()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.protocols.plotting.html#biopsykit.protocols.plotting.saliva_multi_feature_boxplot).\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Import and Helper Functions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "from fau_colors import cmaps\n",
    "import biopsykit as bp\n",
    "from biopsykit.utils.dataframe_handling import multi_xs\n",
    "from biopsykit.stats import StatsPipeline\n",
    "\n",
    "import pingouin as pg\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "\n",
    "%matplotlib widget\n",
    "%load_ext autoreload\n",
    "%autoreload 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.close(\"all\")\n",
    "\n",
    "palette = sns.color_palette(cmaps.faculties_light)\n",
    "sns.set_theme(context=\"notebook\", style=\"ticks\", font=\"sans-serif\", palette=palette)\n",
    "\n",
    "plt.rcParams[\"figure.figsize\"] = (8, 4)\n",
    "plt.rcParams[\"pdf.fonttype\"] = 42\n",
    "plt.rcParams[\"mathtext.default\"] = \"regular\"\n",
    "\n",
    "palette"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Import"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sample_times = [-30, -1, 30, 40, 50, 60, 70]\n",
    "\n",
    "condition_order = [\"Control\", \"Intervention\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Raw Cortisol"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cort_samples = bp.example_data.get_saliva_example()\n",
    "cort_samples.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots(figsize=(8, 4))\n",
    "# specialized function for plotting saliva data\n",
    "bp.protocols.plotting.saliva_plot(\n",
    "    data=cort_samples,\n",
    "    saliva_type=\"cortisol\",\n",
    "    sample_times=sample_times,\n",
    "    test_times=[0, 30],\n",
    "    sample_times_absolute=True,\n",
    "    test_title=\"TEST\",\n",
    "    ax=ax,\n",
    ");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Compute Cortisol Features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "auc = bp.saliva.auc(\n",
    "    cort_samples, saliva_type=\"cortisol\", sample_times=sample_times, compute_auc_post=True, remove_s0=True\n",
    ")\n",
    "max_inc = bp.saliva.max_increase(cort_samples, saliva_type=\"cortisol\", remove_s0=True)\n",
    "slope = bp.saliva.slope(cort_samples, sample_idx=[1, 4], sample_times=sample_times, saliva_type=\"cortisol\")\n",
    "\n",
    "cort_features = pd.concat([auc, max_inc, slope], axis=1)\n",
    "cort_features = bp.saliva.utils.saliva_feature_wide_to_long(cort_features, saliva_type=\"cortisol\")\n",
    "cort_features.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "See the [<code>Saliva_Example.ipynb</code>](ECG_Analysis_Example.ipynb) notebook for further explanations on the different processing steps."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## General Information"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The purpose of creating such a statistical analysis pipeline is to assemble several steps of a typical statistical analysis procedure while setting different parameters. The parameters passed to this class depend on the used statistical functions.\n",
    "\n",
    "The interface of this class is inspired by the scikit-learn [Pipeline](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html) for machine learning tasks.\n",
    "\n",
    "\n",
    "The different steps of statistical analysis can be divided into categories:\n",
    "\n",
    "* *Preparatory Analysis* (`prep`): Analyses applied to the data before performing the actual statistical analysis, such as:\n",
    "    * `normality`: Testing whether a random sample comes from a normal distribution.\n",
    "    * `equal_var`: Testing the equality of variances (homoscedasticity).\n",
    "* *Statistical Tests* (`test`): Statistical test to determine differences or similarities in the data, such as:\n",
    "    * `pairwise_ttests`: Pairwise T-tests (either for independent or dependent samples).\n",
    "    * `anova`: One-way or N-way ANOVA.\n",
    "    * `welch_anova`: One-way Welch-ANOVA.\n",
    "    * `rm_anova`: One-way and two-way repeated measures ANOVA.\n",
    "    * `mixed_anova`: Mixed-design (split-plot) ANOVA.\n",
    "    * `kruskal`: Kruskal-Wallis H-test for independent samples.\n",
    "* *Posthoc Tests* (`posthoc`): Posthoc tests to determine differences of individual groups if more than two\n",
    "  groups are analyzed, such as:\n",
    "    * `pairwise_ttests`: Pairwise T-tests (either for independent or dependent samples).\n",
    "    * `pairwise_tukey`: Pairwise Tukey-HSD post-hoc test.\n",
    "    * `pairwise_gameshowell`: Pairwise Games-Howell post-hoc test."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A [StatsPipeline](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.stats.html#biopsykit.stats.StatsPipeline) consists of a list of tuples specifying the individual `steps` of the pipeline.\n",
    "The first value of each tuple indicates the category this step belongs to (`prep`, `test`, or `posthoc`),\n",
    "the second value indicates the analysis function to use in this step (e.g., `normality`, or `rm_anova`).\n",
    "\n",
    "Furthermore, a `params` dictionary specifying the parameters and variables for statistical analysis\n",
    "needs to be supplied. Parameters can either be specified *globally*, i.e., for all steps in the pipeline\n",
    "(the default), or *locally*, i.e., only for one specific category, by prepending the category name and separating it from the parameter name by a `__`. The parameters depend on the type of analysis used in the pipeline. \n",
    "\n",
    "\n",
    "Examples are:\n",
    "\n",
    "* `dv`: column name of the dependent variable\n",
    "* `between`: column name of the between-subject factor\n",
    "* `within`: column name of the within-subject factor\n",
    "* `effsize`: type of effect size to compute (if applicable)\n",
    "* `multicomp`: whether (and how) to apply multi-comparison correction of p-values to the *last* step in the\n",
    "  pipeline (either \"test\" or \"posthoc\") using [`StatsPipeline.multicomp()`](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.stats.html#biopsykit.stats.StatsPipeline.multicomp). The arguments are supplied via dictionary.\n",
    "* ...\n",
    "\n",
    "More information can be found here: [StatsPipeline](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.stats.html#biopsykit.stats.StatsPipeline)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## T-Tests"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cort_features.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`cort_features` contains multiple features that need to be analyzed individually. The analysis pipeline can be applied to each feature individually by specifying a column to group the dataframe by (`groupby` parameter). The result dataframes will then contain the `groupby` column as index level."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pipeline = StatsPipeline(\n",
    "    steps=[(\"prep\", \"normality\"), (\"prep\", \"equal_var\"), (\"test\", \"pairwise_tests\")],\n",
    "    params={\"dv\": \"cortisol\", \"groupby\": \"saliva_feature\", \"between\": \"condition\"},\n",
    ")\n",
    "\n",
    "pipeline.apply(cort_features);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Display Results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# display all results\n",
    "pipeline.display_results()\n",
    "# don't show results from the \"prep\" cagegory\n",
    "# pipeline.display_results(prep=False)\n",
    "# only significant results\n",
    "# pipeline.display_results(sig_only=True)\n",
    "# only significant results from the \"posthoc\" category (results from other categories will all be displayed)\n",
    "# pipeline.display_results(sig_only=\"posthoc\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Further functions of ``StatsPipeline``"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Analysis categories and their respective analysis steps"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pipeline.category_steps"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Dictionary with analysis results per step"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "results = pipeline.results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Get results from normality check"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "results[\"normality\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Return only results from one category"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pipeline.results_cat(\"test\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Export the whole pipeline as Excel sheet"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# pipeline.export_statistics(\"path_to_file.xlsx\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### LaTeX Output"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Inline LaTeX Code"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Calling [StatsPipeline.stats_to_latex()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.stats.html#biopsykit.stats.StatsPipeline.stats_to_latex)  generates LaTeX code for each row. The output can be copied and pasted into LaTeX documents."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pipeline.stats_to_latex(\"pairwise_tests\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Generate LaTeX output for selected rows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pipeline.stats_to_latex(\"pairwise_tests\", \"auc_g\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### LaTeX Table"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[StatsPipeline.results_to_latex_table()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.stats.html#biopsykit.stats.StatsPipeline.results_to_latex_table) uses [DataFrame.to_latex()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_latex.html) to convert a pandas dataframe into a LaTeX table."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### Default Output"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "results_out = pipeline.results_to_latex_table(\"pairwise_tests\")\n",
    "print(results_out)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### With Caption and Label"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "results_out = pipeline.results_to_latex_table(\"pairwise_tests\", caption=\"This is a caption.\", label=\"tab:label\")\n",
    "print(results_out)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### Further Customization"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By default, the column format for columns that contain numbers is \"S\" which is provided by the [siunitx](https://ctan.org/pkg/siunitx?lang=en) package. The column format can be configured by the `si_table_format` argument. The default table format is `\"<1.3\"`.\n",
    "\n",
    "Additionally, [StatsPipeline.results_to_latex_table()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.stats.html#biopsykit.stats.StatsPipeline.results_to_latex_table) provides many options to customize the LaTeX table output:\n",
    "\n",
    "* `collapse_dof`: `True` to collapse degree(s)-of-freedom (dof) from a separate column into the column header of the t- or F-value, respectively, `False` to keep it as separate \"dof\" column(s). This only works if the degrees-of-freedom are the same for all tests in the table. Default: `True`\n",
    "* `unstack_levels`: name(s) of dataframe index level(s) to be unstacked in the resulting LaTeX table or `None` to not unstack any index level(s). Default: `None`\n",
    "* All other arguments that are passed down to [DataFrame.to_latex](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_latex.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "results_out = pipeline.results_to_latex_table(\"pairwise_tests\", collapse_dof=False, si_table_format=\"table-format=1.1\")\n",
    "print(results_out)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### Index Customization"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The index of the LaTeX table can be further configured by passing further arguments as a `index_kws` dict:\n",
    "\n",
    "* `index_italic`: `True` to format index columns in italic, `False` otherwise. Default: `True`\n",
    "* `index_level_order`: list of index level names indicating the index level order of a [MultiIndex](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.MultiIndex.html) in the LaTeX table. If `None` the index order of the dataframe will be used\n",
    "* `index_value_order`: list of index values if rows in LaTeX table should have a different order than the underlying dataframe or if only specific rows should be exported as LaTeX table. If the table index is a [MultiIndex](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.MultiIndex.html) then `index_value_order` should be a dictionary with the index level names as keys and lists of index values of the specific level as values\n",
    "* `index_rename_map`: mapping with dictionary with index values as keys and new index values to be exported\n",
    "* `index_level_names_tex`: names of index levels in the LaTeX table or `None` to keep the index level names of the dataframe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "index_kws = {\n",
    "    \"index_italic\": False,\n",
    "    \"index_rename_map\": {\n",
    "        \"auc_g\": \"$AUC_{G}$\",\n",
    "        \"auc_i\": \"$AUC_{I}$\",\n",
    "        \"auc_i_post\": \"$AUC_{I}^{Post}$\",\n",
    "        \"max_inc\": \"$\\Delta c_{max}$\",\n",
    "        \"slope14\": \"$a_{S1S4}$\",\n",
    "    },\n",
    "    \"index_level_names_tex\": \"Saliva Feature\",\n",
    "}\n",
    "\n",
    "results_out = pipeline.results_to_latex_table(\"pairwise_tests\", index_kws=index_kws)\n",
    "print(results_out)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Mixed ANOVA"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To demonstrate Mixed-ANOVA analysis we construct some artificial example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_example = multi_xs(cort_samples, [\"2\", \"3\"], level=\"sample\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_example.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pipeline = StatsPipeline(\n",
    "    steps=[(\"prep\", \"normality\"), (\"prep\", \"equal_var\"), (\"test\", \"mixed_anova\"), (\"posthoc\", \"pairwise_tests\")],\n",
    "    params={\n",
    "        \"dv\": \"cortisol\",\n",
    "        \"between\": \"condition\",\n",
    "        \"within\": \"sample\",\n",
    "        \"subject\": \"subject\",\n",
    "        \"padjust\": \"bonf\",  # specify multicorrection method to be applied on the posthoc tests\n",
    "    },\n",
    ")\n",
    "\n",
    "pipeline.apply(data_example)\n",
    "pipeline.display_results()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Repeated-Measurements ANOVA"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_slice = data_example.xs(\"Control\", level=\"condition\")\n",
    "\n",
    "pipeline = StatsPipeline(\n",
    "    steps=[(\"prep\", \"normality\"), (\"prep\", \"equal_var\"), (\"test\", \"rm_anova\"), (\"posthoc\", \"pairwise_tests\")],\n",
    "    params={\"dv\": \"cortisol\", \"within\": \"sample\", \"subject\": \"subject\"},\n",
    ")\n",
    "\n",
    "pipeline.apply(data_slice)\n",
    "pipeline.display_results()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Get Significance Brackets from [StatsPipeline](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.stats.html#biopsykit.stats.StatsPipeline) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[StatsPipeline.sig_brackets()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.stats.html#biopsykit.stats.StatsPipeline.sig_brackets) returns the significance brackets and the corresponding p-values to add to the plotting functions of `BioPsyKit`.\n",
    "\n",
    "The method takes the following parameters (from the documentation):\n",
    "\n",
    "* `stats_category_or_data`: either a string of the pipeline category to use for generating significance brackets or a dataframe with statistical if significance brackets should be generated from the dataframe\n",
    "* `stats_effect_type`: effect type of analysis performed (\"between\", \"within\", \"interaction\"). Needed to extract the correct information from the analysis dataframe\n",
    "* `plot_type`: type of plot for which significance brackets are generated: \"multi\" if boxplots are grouped (by a `hue` variable), \"single\" (the default) otherwise\n",
    "* `features`: feature(s) displayed in the boxplot. The resulting significance brackets will be filtered accordingly to only contain features present in the boxplot. It can have the following formats:\n",
    "    * `str`: only one feature is plotted in the boxplot  \n",
    "      => returns significance brackets of only one feature\n",
    "    * `list`: multiple features are combined into *one* `Axes` object (i.e., no subplots)  \n",
    "      => returns significance brackets of multiple features\n",
    "    * `dict`: if boxplots of features are organized in subplots then `features` needs to dictionary with the feature (or list of features) per subplot (`subplots` is `True`)  \n",
    "      => returns dictionary with significance brackets per subplot\n",
    "* `x`: name of column used as `x` axis in the boxplot. Only required if `plot_type` is \"multi\".\n",
    "     "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Single Plot – One Feature"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This example shows how to add significance brackets to a *single plot* with a *single boxplot* where only one type of feature is plotted (e.g., only to display `max_inc` feature, where the different groups are separated by the `x` variable). \n",
    "\n",
    "If the feature to be plotted is only a subset of different features, it must be filtered via the `features` parameter:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# rerun the t-test pipeline for plotting\n",
    "pipeline = StatsPipeline(\n",
    "    steps=[(\"prep\", \"normality\"), (\"prep\", \"equal_var\"), (\"test\", \"pairwise_tests\")],\n",
    "    params={\"dv\": \"cortisol\", \"groupby\": \"saliva_feature\", \"between\": \"condition\"},\n",
    ")\n",
    "\n",
    "pipeline.apply(cort_features);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "box_pairs, pvalues = pipeline.sig_brackets(\"test\", stats_effect_type=\"between\", plot_type=\"single\", features=\"max_inc\")\n",
    "print(box_pairs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Single Plot – Multiple Features"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This example shows how to add significance brackets to a *single plot* with *multiple boxplots* where multiple types of features are plotted (e.g., to display the `max_inc` and `slope14` features, where different features are separated by the `x` variable and the different groups are separated by the `hue` variable).\n",
    "\n",
    "If only a subset of features should be plotted, the features of interest must be filtered via the `features` parameter:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "box_pairs, pvalues = pipeline.sig_brackets(\n",
    "    \"test\", stats_effect_type=\"between\", plot_type=\"multi\", x=\"saliva_feature\", features=[\"max_inc\", \"slope14\"]\n",
    ")\n",
    "print(box_pairs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Multiple Plots – Multiple Features"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This example shows how to add significance brackets to *multiple subplots* with *multiple boxplots* per subplot (e.g., to display the `max_inc` and the `slope14` feature where `max_inc` and `slope14` each have their own subplot, the features are plotted along the `x` variable and the different groups are separated by the `hue` variable). For creating significance brackets to be used in subplots set the `subplots` parameter to `True`. \n",
    "\n",
    "By default, each feature is assumed to be in its **own subplot** (see the next example if you want to change the behavior).\n",
    "\n",
    "If only a subset of features should be plotted, the features of interest must be filtered via the `features` parameter:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "box_pairs, pvalues = pipeline.sig_brackets(\n",
    "    \"test\",\n",
    "    stats_effect_type=\"between\",\n",
    "    plot_type=\"multi\",\n",
    "    x=\"saliva_feature\",\n",
    "    features=[\"max_inc\", \"slope14\"],\n",
    "    subplots=True,\n",
    ")\n",
    "print(box_pairs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you want to specify a **custom mapping** of features to subplots you can provide a dictionary specifying this mapping as `features` argument (here, we want to have the features `auc_i` and `auc_g` in one subplot, and `max_inc` and `slope14` in another subplot):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "box_pairs, pvalues = pipeline.sig_brackets(\n",
    "    \"test\",\n",
    "    stats_effect_type=\"between\",\n",
    "    plot_type=\"multi\",\n",
    "    x=\"saliva_feature\",\n",
    "    features={\"auc\": [\"auc_i\", \"auc_g\"], \"inc\": [\"max_inc\", \"slope14\"]},\n",
    "    subplots=True,\n",
    ")\n",
    "print(box_pairs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plotting"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Single Plot – One Feature"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This example shows how to plot a single feature in a single boxplot using [plotting.feature_boxplot()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.plotting.plotting.html#biopsykit.plotting.plotting.feature_boxplot). The two conditions are plotted along the `x` axis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "features = \"max_inc\"\n",
    "data_plot = multi_xs(cort_features, features, level=\"saliva_feature\")\n",
    "\n",
    "box_pairs, pvalues = pipeline.sig_brackets(\"test\", stats_effect_type=\"between\", plot_type=\"single\", features=features)\n",
    "\n",
    "fig, ax = plt.subplots()\n",
    "bp.plotting.feature_boxplot(\n",
    "    data=data_plot,\n",
    "    x=\"condition\",\n",
    "    y=\"cortisol\",\n",
    "    stats_kwargs={\"box_pairs\": box_pairs, \"pvalues\": pvalues, \"verbose\": 0},\n",
    "    ax=ax,\n",
    ")\n",
    "fig.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Single Plot – Multiple Features"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This example is the *same* as the one in the **Single Plot – One Feature** example above, but this time, the (single) feature is plotted along the `x` axis and the two groups are separated by the `hue` parameter. This makes it `plot_type` \"multi\" and thus requires to specify the `x` parameter in [StatsPipeline.sig_brackets()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.stats.html#biopsykit.stats.StatsPipeline.sig_brackets)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "features = \"max_inc\"\n",
    "data_plot = multi_xs(cort_features, features, level=\"saliva_feature\")\n",
    "\n",
    "box_pairs, pvalues = pipeline.sig_brackets(\n",
    "    \"test\", stats_effect_type=\"between\", plot_type=\"multi\", x=\"saliva_feature\", features=features\n",
    ")\n",
    "\n",
    "fig, ax = plt.subplots()\n",
    "bp.plotting.feature_boxplot(\n",
    "    data=data_plot,\n",
    "    x=\"saliva_feature\",\n",
    "    y=\"cortisol\",\n",
    "    hue=\"condition\",\n",
    "    stats_kwargs={\"box_pairs\": box_pairs, \"pvalues\": pvalues, \"verbose\": 0},\n",
    "    ax=ax,\n",
    ");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now with a \"real\" example: We use [plotting.feature_boxplot()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.plotting.plotting.html#biopsykit.plotting.plotting.feature_boxplot) to plot actually multiple features along the `x` axis with the `hue` variable separating the conditions.\n",
    "\n",
    "In this example, however, no feature shows statistically significant differences between the two conditions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "features = [\"auc_g\", \"auc_i\"]\n",
    "data_plot = multi_xs(cort_features, features, level=\"saliva_feature\")\n",
    "\n",
    "box_pairs, pvalues = pipeline.sig_brackets(\n",
    "    \"test\", stats_effect_type=\"between\", features=features, plot_type=\"multi\", x=\"saliva_feature\"\n",
    ")\n",
    "\n",
    "fig, ax = plt.subplots()\n",
    "bp.plotting.feature_boxplot(\n",
    "    data=data_plot,\n",
    "    x=\"saliva_feature\",\n",
    "    y=\"cortisol\",\n",
    "    hue=\"condition\",\n",
    "    stats_kwargs={\"box_pairs\": box_pairs, \"pvalues\": pvalues, \"verbose\": 0},\n",
    "    ax=ax,\n",
    ");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Multiple Plots – Multiple Features"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This example shows how to plot features in *multiple subplots* with *multiple boxplots* per subplot. Here, `max_inc` and the `slope14` features are displayed in their own subplot and the features `auc_i` and `auc_g` are combined into one joint subplot. The features are separated by the `x` variable and the different groups are separated by the `hue` variable.\n",
    "\n",
    "This *custom mapping* of features to subplots can be provided via a dictionary passed to `features` argument."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "nbsphinx-thumbnail"
    ]
   },
   "outputs": [],
   "source": [
    "features = {\"auc\": [\"auc_g\", \"auc_i\"], \"max_inc\": \"max_inc\", \"slope14\": \"slope14\"}\n",
    "\n",
    "box_pairs, pvalues = pipeline.sig_brackets(\n",
    "    \"test\", stats_effect_type=\"between\", plot_type=\"multi\", x=\"saliva_feature\", features=features, subplots=True\n",
    ")\n",
    "\n",
    "data_plot = cort_features.copy()\n",
    "\n",
    "bp.plotting.multi_feature_boxplot(\n",
    "    data=data_plot,\n",
    "    x=\"saliva_feature\",\n",
    "    y=\"cortisol\",\n",
    "    hue=\"condition\",\n",
    "    group=\"saliva_feature\",\n",
    "    features=features,\n",
    "    stats_kwargs={\"box_pairs\": box_pairs, \"pvalues\": pvalues, \"verbose\": 0},\n",
    ")\n",
    "fig.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Use Specialized [saliva_feature_boxplot()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.protocols.plotting.html#biopsykit.protocols.plotting.saliva_feature_boxplot) functions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In addition to the \"general-purpose\" plotting functions `BioPsyKit` also offers specialized plotting functions for saliva features since plotting saliva data is a commonly performed task. These functions offer a better styling of axis and labels for saliva data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# rerun the pipeline\n",
    "pipeline = StatsPipeline(\n",
    "    steps=[(\"prep\", \"normality\"), (\"prep\", \"equal_var\"), (\"test\", \"pairwise_tests\")],\n",
    "    params={\"dv\": \"cortisol\", \"groupby\": \"saliva_feature\", \"between\": \"condition\"},\n",
    ")\n",
    "\n",
    "pipeline.apply(cort_features)\n",
    "pipeline.display_results()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Single Plot – One Feature"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "features = \"max_inc\"\n",
    "data_plot = multi_xs(cort_features, features, level=\"saliva_feature\")\n",
    "\n",
    "box_pairs, pvalues = pipeline.sig_brackets(\"test\", stats_effect_type=\"between\", features=features, plot_type=\"single\")\n",
    "\n",
    "fig, ax = plt.subplots()\n",
    "bp.protocols.plotting.saliva_feature_boxplot(\n",
    "    data=data_plot,\n",
    "    x=\"condition\",\n",
    "    saliva_type=\"cortisol\",\n",
    "    feature=features,\n",
    "    stats_kwargs={\"box_pairs\": box_pairs, \"pvalues\": pvalues, \"verbose\": 0},\n",
    "    ax=ax,\n",
    ")\n",
    "fig.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Single Plot – Multiple Feature"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "features = [\"auc_g\", \"auc_i\"]\n",
    "data_plot = multi_xs(cort_features, features, level=\"saliva_feature\")\n",
    "\n",
    "box_pairs, pvalues = pipeline.sig_brackets(\n",
    "    \"test\", stats_effect_type=\"between\", features=features, plot_type=\"multi\", x=\"saliva_feature\"\n",
    ")\n",
    "\n",
    "fig, ax = plt.subplots()\n",
    "bp.protocols.plotting.saliva_feature_boxplot(\n",
    "    data=data_plot,\n",
    "    x=\"saliva_feature\",\n",
    "    saliva_type=\"cortisol\",\n",
    "    hue=\"condition\",\n",
    "    feature=features,\n",
    "    stats_kwargs={\"box_pairs\": box_pairs, \"pvalues\": pvalues, \"verbose\": 0},\n",
    "    ax=ax,\n",
    ")\n",
    "fig.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Plot Multiple Features in Subplots"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using [saliva_multi_feature_boxplot()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.protocols.plotting.html#biopsykit.protocols.plotting.saliva_multi_feature_boxplot):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "features = {\"auc\": [\"auc_g\", \"auc_i\"], \"max_inc\": \"max_inc\", \"slope14\": \"slope14\"}\n",
    "\n",
    "box_pairs, pvalues = pipeline.sig_brackets(\n",
    "    \"test\", stats_effect_type=\"between\", plot_type=\"multi\", x=\"saliva_feature\", features=features, subplots=True\n",
    ")\n",
    "\n",
    "data_plot = cort_features.copy()\n",
    "\n",
    "bp.protocols.plotting.saliva_multi_feature_boxplot(\n",
    "    data=data_plot,\n",
    "    saliva_type=\"cortisol\",\n",
    "    features=features,\n",
    "    hue=\"condition\",\n",
    "    stats_kwargs={\"box_pairs\": box_pairs, \"pvalues\": pvalues, \"verbose\": 0},\n",
    ")\n",
    "fig.tight_layout()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "biopsykit",
   "language": "python",
   "name": "biopsykit"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  },
  "toc-showtags": false
 },
 "nbformat": 4,
 "nbformat_minor": 4
}