{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Questionnaire Example" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "This example illustrates how to process questionnaire data.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup and Helper Functions" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "\n", "import re\n", "\n", "import pandas as pd\n", "import numpy as np\n", "\n", "from fau_colors import cmaps\n", "import biopsykit as bp\n", "import pingouin as pg\n", "\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "\n", "%matplotlib widget\n", "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.close(\"all\")\n", "\n", "palette = sns.color_palette(cmaps.faculties)\n", "sns.set_theme(context=\"notebook\", style=\"ticks\", font=\"sans-serif\", palette=palette)\n", "\n", "plt.rcParams[\"figure.figsize\"] = (8, 4)\n", "plt.rcParams[\"pdf.fonttype\"] = 42\n", "plt.rcParams[\"mathtext.default\"] = \"regular\"\n", "\n", "palette" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load Questionnaire Data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Example data\n", "data = bp.example_data.get_questionnaire_example()\n", "# Alternatively: Load your own data using bp.io.load_questionnaire_data()\n", "# bp.io.load_questionnaire_data(\"\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example 1: Compute Perceived Stress Scale (PSS)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this example we compute the Perceived Stress Scale (PSS).\n", "\n", "The PSS is a widely used self-report questionnaire with adequate reliability and validity asking about how stressful a person has found his/her life during the previous month." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Slice Dataframe and Select Columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To extract only the columns belonging to the PSS questionnaire, we can use the function [utils.find_cols()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.questionnaires.utils.html#biopsykit.questionnaires.utils.find_cols). This function returns the sliced dataframe and the columns belonging to the questionnaire." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_pss, columns_pss = bp.questionnaires.utils.find_cols(data, starts_with=\"PSS\")\n", "data_pss.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Compute PSS Score" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can compute the PSS score by passing the questionnaire data to the function \n", "[questionnaires.pss()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.questionnaires.html#biopsykit.questionnaires.pss).\n", "\n", "\n", "This can be achieved in two ways:\n", "1. Directly passing the sliced PSS dataframe\n", "2. Passing the whole dataframe and a list of all column names that belong to the PSS. 
This option is better suited for computing multiple questionnaire scores at once (more on that later!)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Option 1: Sliced PSS dataframe\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pss = bp.questionnaires.pss(data_pss)\n", "pss.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Option 2: Whole dataframe + PSS columns" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pss = bp.questionnaires.pss(data, columns=columns_pss)\n", "pss.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### *Feature Demo*: Compute PSS Score with Wrong Item Ranges" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This example demonstrates how `BioPsyKit` asserts that questionnaire items are provided in the correct value range, according to the original definition of the questionnaire, before computing the actual questionnaire score.\n", "\n", "Here, we load an example dataset in which the *PSS* items are (wrongly) coded from `1` to `5`. The *PSS*, however, was originally defined for items coded from `0` to `4`. Attempting to compute the *PSS* by passing the data to [questionnaires.pss()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.questionnaires.html#biopsykit.questionnaires.pss) will result in a [ValueRangeError](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.exceptions.html#biopsykit.utils.exceptions.ValueRangeError)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Load Questionnaire Data with Wrong Item Ranges" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_wrong = bp.example_data.get_questionnaire_example_wrong_range()\n", "data_wrong.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Slice Columns and Compute PSS Score" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note**: This code will fail on purpose (the exception is caught) because the items are provided in the wrong range." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_pss_wrong, columns_pss = bp.questionnaires.utils.find_cols(data_wrong, starts_with=\"PSS\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "try:\n", "    pss = bp.questionnaires.pss(data_pss_wrong)\n", "except bp.utils.exceptions.ValueRangeError as e:\n", "    print(\"ValueRangeError: {}\".format(e))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Solution: Convert (Recode) Questionnaire Items" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To solve this issue, we first need to convert the PSS questionnaire items into the correct value range by subtracting `1` from all values (i.e., applying an offset of `-1`). This can easily be done using the function [utils.convert_scale()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.questionnaires.utils.html#biopsykit.questionnaires.utils.convert_scale). This can also be done in two different ways:\n", "\n", "1. Convert the whole, sliced PSS dataframe\n", "2. 
Convert only the PSS columns, leave the other columns unchanged" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Option 1: Convert the sliced PSS dataframe" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_pss_conv = bp.questionnaires.utils.convert_scale(data_pss_wrong, offset=-1)\n", "data_pss_conv.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Option 2: Convert only the PSS columns, leave the other columns unchanged" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_conv = bp.questionnaires.utils.convert_scale(data_wrong, cols=columns_pss, offset=-1)\n", "data_conv.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Compute PSS Score (Finally!)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now the items are in the correct range and we can compute the *PSS* score:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Option 1: the sliced PSS dataframe\n", "pss = bp.questionnaires.pss(data_pss_conv)\n", "pss.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Option 2: the whole dataframe + PSS columns\n", "pss = bp.questionnaires.pss(data_conv, columns=columns_pss)\n", "pss.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example 2: Compute Positive and Negative Affect Schedule (PANAS)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The PANAS assesses *positive affect* (interested, excited, strong, enthusiastic, proud, alert, inspired, determined, attentive, and active) and *negative affect* (distressed, upset, guilty, scared, hostile, irritable, ashamed, nervous, jittery, and afraid).\n", "\n", "Higher scores on each subscale indicate greater positive or negative affect." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Slice Dataframe and Select Columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this example, the PANAS was assessed before (*pre*) and after (*post*) stress:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_panas_pre, columns_panas_pre = bp.questionnaires.utils.find_cols(data, starts_with=\"PANAS\", ends_with=\"Pre\")\n", "data_panas_post, columns_panas_post = bp.questionnaires.utils.find_cols(data, starts_with=\"PANAS\", ends_with=\"Post\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Compute PANAS" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "panas_pre = bp.questionnaires.panas(data_panas_pre)\n", "panas_pre.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "panas_post = bp.questionnaires.panas(data_panas_post)\n", "panas_post.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example 3: Compute Multiple Scores at Once" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Build a dictionary where each key corresponds to the questionnaire score to be computed and each value corresponds to the columns of the questionnaire. If some scores were assessed repeatedly (e.g. PANAS was assessed at two different time points, *pre* and *post*), separate the suffix from the questionnaire name with a `-` (e.g. `panas-pre` and `panas-post`)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load Example Questionnaire Data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = bp.example_data.get_questionnaire_example()\n", "data.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from biopsykit.questionnaires.utils import find_cols\n", "\n", "dict_scores = {\n", "    \"pss\": find_cols(data, starts_with=\"PSS\")[1],\n", "    \"pasa\": find_cols(data, starts_with=\"PASA\")[1],\n", "    \"panas-pre\": find_cols(data, starts_with=\"PANAS\", ends_with=\"Pre\")[1],\n", "    \"panas-post\": find_cols(data, starts_with=\"PANAS\", ends_with=\"Post\")[1],\n", "}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Compute all scores and store in result dataframe\n", "data_scores = bp.questionnaires.utils.compute_scores(data, dict_scores)\n", "data_scores.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Convert Scores into Long Format" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_scores.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Questionnaires that only have different *subscales* => Create one new index level `subscale`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(list(data_scores.filter(like=\"PASA\").columns))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pasa = bp.questionnaires.utils.wide_to_long(data_scores, quest_name=\"PASA\", levels=[\"subscale\"])\n", "pasa.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Questionnaires that have different *subscales* and different *assessment times* => Create two new index levels `subscale` and `time`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], 
"source": [ "print(list(data_scores.filter(like=\"PANAS\").columns))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[utils.wide_to_long()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.questionnaires.utils.html#biopsykit.questionnaires.utils.wide_to_long) converts the data into the long format recursively from the *first* level (here: `subscale`) to the *last* level (here: `time`):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "panas = bp.questionnaires.utils.wide_to_long(data_scores, quest_name=\"PANAS\", levels=[\"subscale\", \"time\"])\n", "panas.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plotting" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### In One Plot" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots()\n", "bp.plotting.feature_boxplot(\n", "    data=panas, x=\"subscale\", y=\"PANAS\", hue=\"time\", hue_order=[\"pre\", \"post\"], palette=cmaps.faculties_light, ax=ax\n", ");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: See the documentation of [plotting.feature_boxplot()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.plotting.html#biopsykit.plotting.feature_boxplot) for further information on the functions used." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### In Subplots" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Regular" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, axs = plt.subplots(ncols=3)\n", "bp.plotting.multi_feature_boxplot(\n", "    data=panas,\n", "    x=\"time\",\n", "    y=\"PANAS\",\n", "    features=[\"NegativeAffect\", \"PositiveAffect\", \"Total\"],\n", "    group=\"subscale\",\n", "    order=[\"pre\", \"post\"],\n", "    palette=cmaps.faculties_light,\n", "    ax=axs,\n", ")\n", "fig.tight_layout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: See the documentation of [plotting.multi_feature_boxplot()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.plotting.html#biopsykit.plotting.multi_feature_boxplot) for further information on the functions used." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### With Significance Brackets\n", "\n", "**Note**: See [StatsPipeline_Plotting_Example.ipynb](StatsPipeline_Plotting_Example.ipynb) for further information!" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pipeline = bp.stats.StatsPipeline(\n", "    steps=[(\"prep\", \"normality\"), (\"test\", \"pairwise_tests\")],\n", "    params={\"dv\": \"PANAS\", \"groupby\": \"subscale\", \"subject\": \"subject\", \"within\": \"time\"},\n", ")\n", "\n", "pipeline.apply(panas);" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "nbsphinx-thumbnail" ] }, "outputs": [], "source": [ "fig, axs = plt.subplots(ncols=3)\n", "\n", "features = [\"NegativeAffect\", \"PositiveAffect\", \"Total\"]\n", "\n", "box_pairs, pvalues = pipeline.sig_brackets(\n", "    \"test\", stats_effect_type=\"within\", plot_type=\"single\", x=\"time\", features=features, subplots=True\n", ")\n", "\n", "bp.plotting.multi_feature_boxplot(\n", "    data=panas,\n", "    x=\"time\",\n", "    y=\"PANAS\",\n", "    features=features,\n", "    group=\"subscale\",\n", "    order=[\"pre\", \"post\"],\n", "    stats_kwargs={\"box_pairs\": box_pairs, \"pvalues\": pvalues, \"verbose\": 0},\n", "    palette=cmaps.faculties_light,\n", "    ax=axs,\n", ")\n", "for ax, feature in zip(axs, features):\n", "    ax.set_title(feature)\n", "\n", "fig.tight_layout()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "biopsykit", "language": "python", "name": "biopsykit" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 4 }