# Questionnaire Example

<div class="alert alert-block alert-info">
This example illustrates how to process questionnaire data.
</div>

## Setup and Helper Functions

In [None]:
from pathlib import Path

import re

import pandas as pd
import numpy as np

from fau_colors import cmaps
import biopsykit as bp
import pingouin as pg

import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib widget
%load_ext autoreload
%autoreload 2

In [None]:
plt.close("all")

palette = sns.color_palette(cmaps.faculties)
sns.set_theme(context="notebook", style="ticks", font="sans-serif", palette=palette)

plt.rcParams["figure.figsize"] = (8, 4)
plt.rcParams["pdf.fonttype"] = 42
plt.rcParams["mathtext.default"] = "regular"

palette

## Load Questionnaire Data

In [None]:
# Example data
data = bp.example_data.get_questionnaire_example()
# Alternatively: Load your own data using bp.io.load_questionnaire_data()
# bp.io.load_questionnaire_data("<path-to-questionnaire-data>")

In [None]:
data.head()

## Example 1: Compute Perceived Stress Scale (PSS)

In this example we compute the Perceived Stress Scale (PSS).

The PSS is a widely used self-report questionnaire with adequate reliability and validity. It asks how stressful respondents have found their lives during the previous month.

### Slice Dataframe and Select Columns

To extract only the columns belonging to the PSS questionnaire we can use the function [utils.find_cols()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.questionnaires.utils.html#biopsykit.questionnaires.utils.find_cols). This function returns the sliced dataframe and the columns belonging to the questionnaire.

In [None]:
data_pss, columns_pss = bp.questionnaires.utils.find_cols(data, starts_with="PSS")
data_pss.head()
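Conceptually, this kind of column selection is a name filter over the column labels. The following minimal sketch reproduces the idea with a regular expression over plain strings. The helper `select_columns` and the column names are hypothetical and only serve as illustration; this is not how `find_cols()` is implemented.

```python
import re


def select_columns(columns, starts_with="", ends_with=""):
    """Return the column names that match the given prefix and suffix."""
    # Escape the user-supplied parts so they are matched literally
    pattern = re.compile(rf"^{re.escape(starts_with)}.*{re.escape(ends_with)}$")
    return [col for col in columns if pattern.match(col)]


# Hypothetical column names, for illustration only
columns = ["PSS_01", "PSS_02", "PANAS_01_Pre", "PANAS_01_Post", "PASA_01"]

print(select_columns(columns, starts_with="PSS"))
print(select_columns(columns, starts_with="PANAS", ends_with="Pre"))
```

Returning the matching names (rather than only the sliced data) is what makes it possible to pass the whole dataframe plus a column list to a scoring function later on.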

### Compute PSS Score

We can compute the PSS score by passing the questionnaire data to the function 
[questionnaires.pss()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.questionnaires.html#biopsykit.questionnaires.pss).


This can be achieved in two ways:
1. Directly passing the sliced PSS dataframe
2. Passing the whole dataframe and a list of all column names that belong to the PSS. This option is better suited for computing multiple questionnaire scores at once (more on that later!)

#### Option 1: Sliced PSS dataframe


In [None]:
pss = bp.questionnaires.pss(data_pss)
pss.head()

#### Option 2: Whole dataframe + PSS columns

In [None]:
pss = bp.questionnaires.pss(data, columns=columns_pss)
pss.head()
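For intuition about what a score computation like this involves, here is a minimal sketch of PSS scoring in plain Python. It assumes the 10-item version of the PSS with items coded from `0` to `4` and items 4, 5, 7, and 8 reverse-scored. The function `pss_total` and the example responses are hypothetical; `questionnaires.pss()` may differ in details such as subscales or the handling of missing values.

```python
REVERSED_ITEMS = {4, 5, 7, 8}  # 1-based item numbers (assumption: PSS-10)
SCALE_MAX = 4  # items are assumed to be coded from 0 to 4


def pss_total(responses):
    """Sum the ten item responses after reverse-scoring the positively worded items."""
    if len(responses) != 10:
        raise ValueError("Expected exactly 10 item responses")
    total = 0
    for item_number, value in enumerate(responses, start=1):
        if item_number in REVERSED_ITEMS:
            # Reverse scoring maps 0 -> 4, 1 -> 3, ..., 4 -> 0
            value = SCALE_MAX - value
        total += value
    return total


# Hypothetical responses of one participant
responses = [2, 1, 3, 0, 1, 2, 4, 3, 2, 1]
print(pss_total(responses))  # prints 19
```

The reverse-scoring step is the reason the item range matters: `SCALE_MAX - value` only yields sensible results if the items are coded in the expected range.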

### *Feature Demo*: Compute PSS Score with Wrong Item Ranges

This example demonstrates `BioPsyKit`'s feature of asserting that questionnaire items are provided in the correct value range, according to the original definition of the questionnaire, before the actual questionnaire score is computed.

In this example, we load an example dataset in which the *PSS* items are (wrongly) coded from `1` to `5`. The *PSS*, however, was originally defined for items coded from `0` to `4`. Attempting to compute the *PSS* by passing the data to [questionnaires.pss()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.questionnaires.html#biopsykit.questionnaires.pss) will result in a [ValueRangeError](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.utils.exceptions.html#biopsykit.utils.exceptions.ValueRangeError).

#### Load Questionnaire Data with Wrong Item Ranges

In [None]:
data_wrong = bp.example_data.get_questionnaire_example_wrong_range()
data_wrong.head()

#### Slice Columns and Compute PSS Score

**Note**: This code fails on purpose (the exception is caught) because the items are provided in the wrong range.

In [None]:
data_pss_wrong, columns_pss = bp.questionnaires.utils.find_cols(data_wrong, starts_with="PSS")

In [None]:
try:
    pss = bp.questionnaires.pss(data_pss_wrong)
except bp.utils.exceptions.ValueRangeError as e:
    print("ValueRangeError: {}".format(e))
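The idea behind such a range check can be sketched in a few lines of plain Python. The names `ItemRangeError` and `assert_item_range` are hypothetical stand-ins introduced for this illustration; they are not part of `BioPsyKit`, whose actual check may work differently.

```python
class ItemRangeError(ValueError):
    """Hypothetical stand-in for an error raised on out-of-range item values."""


def assert_item_range(values, expected_range):
    """Raise an ItemRangeError if any value lies outside the expected range."""
    low, high = expected_range
    out_of_range = [v for v in values if not low <= v <= high]
    if out_of_range:
        raise ItemRangeError(
            f"Expected values in [{low}, {high}], "
            f"got out-of-range values {sorted(set(out_of_range))}"
        )


try:
    # Items coded from 1 to 5, but the questionnaire expects 0 to 4
    assert_item_range([1, 5, 3, 2], expected_range=(0, 4))
except ItemRangeError as e:
    print(f"ItemRangeError: {e}")
```

Note that a check of this kind can only detect values that actually fall outside the range: wrongly coded data that happens to contain no `5` would pass this sketch unnoticed.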

#### Solution: Convert (Recode) Questionnaire Items

To solve this issue, we first need to convert the PSS questionnaire items into the correct value range by applying an offset of `-1`, i.e., subtracting `1` from all values. This can easily be done using the function [utils.convert_scale()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.questionnaires.utils.html#biopsykit.questionnaires.utils.convert_scale). This can also be done in two different ways:

1. Convert the whole, sliced PSS dataframe
2. Convert only the PSS columns, leave the other columns unchanged

##### Option 1: Convert the sliced PSS dataframe

In [None]:
data_pss_conv = bp.questionnaires.utils.convert_scale(data_pss_wrong, offset=-1)
data_pss_conv.head()

##### Option 2: Convert only the PSS columns, leave the other columns unchanged

In [None]:
data_conv = bp.questionnaires.utils.convert_scale(data_wrong, cols=columns_pss, offset=-1)
data_conv.head()
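The difference between the two options comes down to which columns the offset is applied to. The following sketch illustrates this with a plain dictionary of columns; the helper `shift_columns` and the example table are hypothetical and do not reflect how `convert_scale()` is implemented.

```python
def shift_columns(table, offset, cols=None):
    """Return a copy of `table` with `offset` added to the selected columns.

    `table` maps column names to lists of values. If `cols` is None,
    all columns are shifted; otherwise only the listed ones.
    """
    selected = set(table) if cols is None else set(cols)
    return {
        name: [v + offset for v in values] if name in selected else list(values)
        for name, values in table.items()
    }


# Hypothetical data: two PSS items coded 1-5 and one unrelated column
table = {"PSS_01": [1, 5], "PSS_02": [3, 2], "age": [23, 31]}

# Only the PSS columns are shifted; "age" stays untouched
print(shift_columns(table, offset=-1, cols=["PSS_01", "PSS_02"]))
```

Restricting the conversion to selected columns matters as soon as the dataframe contains other variables (or other questionnaires) that are already coded correctly.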

#### Compute PSS Score (Finally!)

Now the scores are in the correct range and we can compute the *PSS* score:

In [None]:
# Option 1: the sliced PSS dataframe
pss = bp.questionnaires.pss(data_pss_conv)
pss.head()

In [None]:
# Option 2: the whole dataframe + PSS columns
pss = bp.questionnaires.pss(data_conv, columns=columns_pss)
pss.head()

## Example 2: Compute Positive and Negative Affect Schedule (PANAS)

The PANAS assesses *positive affect* (interested, excited, strong, enthusiastic, proud, alert, inspired, determined, attentive, and active) and *negative affect* (distressed, upset, guilty, scared, hostile, irritable, ashamed, nervous, jittery, and afraid).

Higher scores on each subscale indicate greater positive or negative affect.
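As a rough illustration of how the two subscales are formed, the sketch below sums the ratings of the adjectives listed above. It assumes that each subscale score is the plain sum of its item ratings; the function `panas_subscales` and the example ratings are hypothetical, and `questionnaires.panas()` may compute additional scores.

```python
POSITIVE = (
    "interested", "excited", "strong", "enthusiastic", "proud",
    "alert", "inspired", "determined", "attentive", "active",
)
NEGATIVE = (
    "distressed", "upset", "guilty", "scared", "hostile",
    "irritable", "ashamed", "nervous", "jittery", "afraid",
)


def panas_subscales(ratings):
    """Sum the ratings per subscale; `ratings` maps adjective -> rating."""
    return {
        "PositiveAffect": sum(ratings[adjective] for adjective in POSITIVE),
        "NegativeAffect": sum(ratings[adjective] for adjective in NEGATIVE),
    }


# Hypothetical ratings of one participant
ratings = {adjective: 4 for adjective in POSITIVE}
ratings.update({adjective: 2 for adjective in NEGATIVE})

print(panas_subscales(ratings))
```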

### Slice Dataframe and Select Columns

In this example, the PANAS was assessed *pre* and *post* stress:

In [None]:
data_panas_pre, columns_panas_pre = bp.questionnaires.utils.find_cols(data, starts_with="PANAS", ends_with="Pre")
data_panas_post, columns_panas_post = bp.questionnaires.utils.find_cols(data, starts_with="PANAS", ends_with="Post")

### Compute PANAS

In [None]:
panas_pre = bp.questionnaires.panas(data_panas_pre)
panas_pre.head()

In [None]:
panas_post = bp.questionnaires.panas(data_panas_post)
panas_post.head()

## Example 3: Compute Multiple Scores at Once

Build a dictionary where each key corresponds to the questionnaire score to be computed and each value corresponds to the columns of the questionnaire. If a score was assessed repeatedly (e.g., the PANAS was assessed at two different time points, *pre* and *post*), append a suffix to the score name, separated by a `-` (e.g., `panas-pre` and `panas-post`).

### Load Example Questionnaire Data

In [None]:
data = bp.example_data.get_questionnaire_example()
data.head()

In [None]:
from biopsykit.questionnaires.utils import find_cols

dict_scores = {
    "pss": find_cols(data, starts_with="PSS")[1],
    "pasa": find_cols(data, starts_with="PASA")[1],
    "panas-pre": find_cols(data, starts_with="PANAS", ends_with="Pre")[1],
    "panas-post": find_cols(data, starts_with="PANAS", ends_with="Post")[1],
}

In [None]:
# Compute all scores and store in result dataframe
data_scores = bp.questionnaires.utils.compute_scores(data, dict_scores)
data_scores.head()
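The key convention can be pictured as follows: the part before the `-` selects the scoring function, the part after it labels the assessment. The sketch below is a hypothetical illustration of that idea in plain Python. The helper `split_score_key`, the stand-in scoring functions (plain sums), the item values, and the result labels are all assumptions and do not reflect the internals or the output naming of `compute_scores()`.

```python
def split_score_key(key):
    """Split 'panas-pre' into ('panas', 'pre'); keys without '-' get no suffix."""
    name, _, suffix = key.partition("-")
    return name, suffix or None


# Hypothetical stand-in scoring functions: each one just sums the item values
score_funcs = {"pss": sum, "panas": sum}

# Hypothetical item values of one participant
items = {
    "PSS_01": 2, "PSS_02": 1,
    "PANAS_01_Pre": 3, "PANAS_02_Pre": 4,
    "PANAS_01_Post": 2, "PANAS_02_Post": 5,
}

dict_scores = {
    "pss": ["PSS_01", "PSS_02"],
    "panas-pre": ["PANAS_01_Pre", "PANAS_02_Pre"],
    "panas-post": ["PANAS_01_Post", "PANAS_02_Post"],
}

results = {}
for key, columns in dict_scores.items():
    name, suffix = split_score_key(key)
    # The same scoring function is reused for every assessment of a questionnaire
    score = score_funcs[name]([items[col] for col in columns])
    label = name.upper() if suffix is None else f"{name.upper()}_{suffix}"
    results[label] = score

print(results)
```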

## Convert Scores into Long Format

In [None]:
data_scores.head()

For questionnaires that only have different *subscales*, create one new index level `subscale`:

In [None]:
print(list(data_scores.filter(like="PASA").columns))

In [None]:
pasa = bp.questionnaires.utils.wide_to_long(data_scores, quest_name="PASA", levels=["subscale"])
pasa.head()

For questionnaires that have different *subscales* and different *assessment times*, create two new index levels `subscale` and `time`:

In [None]:
print(list(data_scores.filter(like="PANAS").columns))

[utils.wide_to_long()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.questionnaires.utils.html#biopsykit.questionnaires.utils.wide_to_long) converts the data into the long format recursively from the *first* level (here: `subscale`) to the *last* level (here: `time`):

In [None]:
panas = bp.questionnaires.utils.wide_to_long(data_scores, quest_name="PANAS", levels=["subscale", "time"])
panas.head()
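The reshaping itself can be pictured as splitting each column name into its parts and turning those parts into index levels. The sketch below shows this in plain Python, assuming column names of the form `<questionnaire>_<subscale>_<time>` joined by underscores. The function `wide_to_long_sketch`, the subject ID, and the score values are hypothetical; the actual `wide_to_long()` operates on dataframes and may parse names differently.

```python
def wide_to_long_sketch(wide, quest_name, levels):
    """Convert {subject: {column: score}} into a list of long-format rows."""
    long_rows = []
    for subject, scores in wide.items():
        for column, value in scores.items():
            prefix, *parts = column.split("_")
            # Skip columns of other questionnaires or with a different number of levels
            if prefix != quest_name or len(parts) != len(levels):
                continue
            row = {"subject": subject}
            row.update(zip(levels, parts))  # e.g. subscale="NegativeAffect", time="pre"
            row[quest_name] = value
            long_rows.append(row)
    return long_rows


# Hypothetical wide-format scores of one participant
wide = {
    "Vp01": {
        "PANAS_NegativeAffect_pre": 12,
        "PANAS_NegativeAffect_post": 15,
        "PASA_Threat": 20,
    }
}

for row in wide_to_long_sketch(wide, quest_name="PANAS", levels=["subscale", "time"]):
    print(row)
```

In the long format, each row holds exactly one score, which is the layout that grouping-based plotting and statistics functions typically expect.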

## Plotting

### In One Plot

In [None]:
fig, ax = plt.subplots()
bp.plotting.feature_boxplot(
    data=panas, x="subscale", y="PANAS", hue="time", hue_order=["pre", "post"], palette=cmaps.faculties_light, ax=ax
);

Note: See the documentation of [plotting.feature_boxplot()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.plotting.html#biopsykit.plotting.feature_boxplot) for further information on this function.

### In Subplots

#### Regular

In [None]:
fig, axs = plt.subplots(ncols=3)
bp.plotting.multi_feature_boxplot(
    data=panas,
    x="time",
    y="PANAS",
    features=["NegativeAffect", "PositiveAffect", "Total"],
    group="subscale",
    order=["pre", "post"],
    palette=cmaps.faculties_light,
    ax=axs,
)
fig.tight_layout()

Note: See the documentation of [plotting.multi_feature_boxplot()](https://biopsykit.readthedocs.io/en/latest/api/biopsykit.plotting.html#biopsykit.plotting.multi_feature_boxplot) for further information on this function.

#### With Significance Brackets

**Note**: See [<code>StatsPipeline_Plotting_Example.ipynb</code>](StatsPipeline_Plotting_Example.ipynb) for further information!

In [None]:
pipeline = bp.stats.StatsPipeline(
    steps=[("prep", "normality"), ("test", "pairwise_tests")],
    params={"dv": "PANAS", "groupby": "subscale", "subject": "subject", "within": "time"},
)

pipeline.apply(panas);

In [None]:
fig, axs = plt.subplots(ncols=3)

features = ["NegativeAffect", "PositiveAffect", "Total"]

box_pairs, pvalues = pipeline.sig_brackets(
    "test", stats_effect_type="within", plot_type="single", x="time", features=features, subplots=True
)

bp.plotting.multi_feature_boxplot(
    data=panas,
    x="time",
    y="PANAS",
    features=features,
    group="subscale",
    order=["pre", "post"],
    stats_kwargs={"box_pairs": box_pairs, "pvalues": pvalues, "verbose": 0},
    palette=cmaps.faculties_light,
    ax=axs,
)
for ax, feature in zip(axs, features):
    ax.set_title(feature)

fig.tight_layout()