{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n\n# Regression-based baseline correction\n\nThis tutorial compares traditional baseline correction (adding or subtracting a\nscalar amount from every timepoint in an epoch) to a regression-based approach\nto baseline correction (which allows the effect of the baseline period to vary\nby timepoint). Specifically, this tutorial follows the method introduced by\n:footcite:t:`Alday2019`.\n\nThere are at least two reasons you might consider using regression-based\nbaseline correction:\n\n1. Unlike traditional baseline correction, the regression-based approach does\n not assume that the effect of the baseline is equivalent between different\n experimental conditions. Thus it is safer against introduced bias.\n\n2. Assuming that pre-trial baseline signal level is mostly determined by slow\n drifts in the data, the further away (in time) you get from the baseline\n period, the less likely it is that the signal level is similar in amplitude\n to the baseline amplitude. Thus using a time-varying baseline correction is\n less likely to introduce signal distortions / spurious effects in the later\n spans of long-duration epochs.\n\nOne issue that affects both traditional and regression-based baseline\ncorrection is the question of what time window to choose as the baseline\nwindow.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Authors: Carina Forster\n# Email: carinaforster0611@gmail.com\n\n# License: BSD-3-Clause\n# Copyright the MNE-Python contributors.\n\nimport numpy as np\n\nimport mne" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load the data\n\nWe'll start by loading the MNE-Python `sample dataset `\nand extracting the experimental events to get trial locations and trial\ntypes. Since for this tutorial we're only going to look at EEG channels, we\ncan drop the other channel types, to speed things up:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "data_path = mne.datasets.sample.data_path()\nraw_fname = data_path / \"MEG\" / \"sample\" / \"sample_audvis_filt-0-40_raw.fif\"\nraw = mne.io.read_raw_fif(raw_fname, preload=True)\n\nevents = mne.find_events(raw)\n\nraw.pick(picks=\"eeg\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we merge visual and auditory events from both hemispheres, and make our\n``event_id`` dictionary for use during epoching.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "events = mne.merge_events(events, [1, 2], 1) # auditory events will be \"1\"\nevents = mne.merge_events(events, [3, 4], 2) # visual events will be \"2\"\nevent_id = {\"auditory\": 1, \"visual\": 2}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Preprocessing\n\nNext we'll define some variables needed to epoch and preprocess the\ndata. 
We'll be combining left- and right-side stimuli, so we'll look at a\nsingle *central* electrode to visualize the difference between auditory and\nvisual trials.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "tmin, tmax = -0.2, 0.5\nlowpass, highpass = 40, 0.1\nbaseline_tmin, baseline_tmax = None, 0 # None takes the first timepoint\nch = \"EEG 021\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll do some standard preprocessing (a bandpass filter) and then epoch\nthe data. Note that we don't baseline correct the epochs (we specify\n``baseline=None``); we just minimally clean the data by rejecting epochs\nwith very high or very low (flat) signal amplitudes. Note also that we\noperate on a *copy* of the data so that we can later compare this with\ntraditional baselining.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "raw_filtered = raw.copy().filter(highpass, lowpass)\n\nepochs = mne.Epochs(\n raw_filtered,\n events,\n event_id,\n tmin=tmin,\n tmax=tmax,\n reject=dict(eeg=150e-6),\n flat=dict(eeg=5e-6),\n baseline=None,\n preload=True,\n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Traditional baselining\n\nFirst let's baseline correct the data the traditional way. We average epochs\nwithin each condition and subtract the condition-specific baseline\nseparately for auditory and visual trials.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "baseline = (baseline_tmin, baseline_tmax)\ntrad_aud = epochs[\"auditory\"].average().apply_baseline(baseline)\ntrad_vis = epochs[\"visual\"].average().apply_baseline(baseline)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Regression-based baselining\n\nNow let's try out the regression-based baseline correction approach. We'll\nuse :func:`mne.stats.linear_regression`, which needs a *design matrix* to\nrepresent the regression predictors. We'll use four predictors: one for each\nexperimental condition, one for the effect of the baseline, and one that is an\ninteraction between the baseline and one of the conditions (to account for\nany heterogeneity of the effect of the baseline between the two conditions).\nHere are the first two:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "aud_predictor = epochs.events[:, 2] == epochs.event_id[\"auditory\"]\nvis_predictor = epochs.events[:, 2] == epochs.event_id[\"visual\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The baseline predictor is a bit trickier to compute: we'll find the average\nvalue within the baseline period *separately for each epoch*, and use that\nvalue as our (trial-level) predictor. 
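First, though, a quick sanity check on the condition predictors we just\nbuilt (a sketch): every epoch should belong to exactly one condition, so the\ntwo boolean arrays must be mutually exclusive and exhaustive.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# each retained epoch is either auditory or visual, never both\nassert np.array_equal(aud_predictor, ~vis_predictor)\nprint(f\"{aud_predictor.sum()} auditory and {vis_predictor.sum()} visual epochs\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "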
Here, since we're focused on one\nparticular channel, we'll use the baseline value *in that channel* as our\npredictor, but depending on your research question you may want to do this\nseparately for each channel or combine information across channels.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "baseline_predictor = (\n epochs.copy()\n .crop(*baseline)\n .pick([ch])\n .get_data(copy=False) # convert to NumPy array\n .mean(axis=-1) # average across timepoints\n .squeeze() # only 1 channel, so remove singleton dimension\n)\nbaseline_predictor *= 1e6 # convert V \u2192 \u03bcV" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that we converted *just the predictor* (not the epochs data) from volts\nto microvolts. This is done for regression-model-fitting purposes (very small\nvalues can make model fitting unstable).\n\nNow we can set up the design matrix, stacking the 1-D predictors as rows,\nthen transposing with ``.T`` to make them columns. Combining them into a\nsingle :class:`~numpy.ndarray` with :func:`~numpy.vstack` will also\nautomatically convert the :class:`boolean <bool>` ``aud_predictor`` and\n``vis_predictor`` into ones and zeros:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "design_matrix = np.vstack(\n [\n aud_predictor,\n vis_predictor,\n baseline_predictor,\n baseline_predictor * vis_predictor,\n ]\n).T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally we fit the regression model:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "reg_model = mne.stats.linear_regression(\n epochs, design_matrix, names=[\"auditory\", \"visual\", \"baseline\", \"baseline:visual\"]\n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The function returns a dictionary of ``mne.stats.regression.lm`` objects,\neach of which is a :func:`~collections.namedtuple` with the various estimated\nvalues stored as if it were an :class:`~mne.Evoked` object. Let's inspect it:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(reg_model.keys())\nprint(f\"model attributes: {reg_model['auditory']._fields}\")\nprint(\"values are stored in Evoked objects:\")\nprint(reg_model[\"auditory\"].t_val)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plot the baseline regressor\n\nFirst let's look at the estimated effect of the baseline period. What we care\nabout is the ``beta`` values, which tell us how strongly predictive the\nbaseline value is at each timepoint. The model will estimate the baseline's\npredictive value *for every channel*, but since we used only one channel to\nform our baseline predictor, let's examine how it looks for that channel\nonly. We'll add a horizontal line at \u03b2=1 to represent traditional baselining,\nwhere the effect is assumed to be constant across timepoints:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "effect_of_baseline = reg_model[\"baseline\"].beta\neffect_of_baseline.plot(\n picks=ch,\n hline=[1.0],\n units=dict(eeg=r\"$\\beta$ value\"),\n titles=dict(eeg=ch),\n selectable=False,\n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Unsurprisingly, the trend is that the farther away in time we get from the\nbaseline period, the weaker the predictive value of the baseline amplitude\nbecomes. 
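We can put rough numbers on this decay (a sketch; the factor of 1e6 mirrors\nthe microvolt scaling that the plot applies to EEG data by default):\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# average the baseline beta for this channel in an early vs. a late window\nearly_beta = effect_of_baseline.copy().crop(0.0, 0.1).get_data(picks=ch).mean() * 1e6\nlate_beta = effect_of_baseline.copy().crop(0.4, 0.5).get_data(picks=ch).mean() * 1e6\nprint(f\"mean beta 0-100 ms: {early_beta:.2f}, 400-500 ms: {late_beta:.2f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "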
Put another way, early time points (in these data) should be more\nstrongly baseline-corrected than later time points.\n\n## Plot the ERPs\n\nNow let's look at the ``beta`` values for the two\nconditions (``auditory`` and ``visual``): these are the coefficients that\nrepresent the \"pure\" influence of the experimental stimuli on the signal,\nafter taking into account the (time-varying!) effect of the baseline. We'll\nplot them together, side by side with the traditional baseline approach:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "reg_aud = reg_model[\"auditory\"].beta\nreg_vis = reg_model[\"visual\"].beta\n\nkwargs = dict(picks=ch, show_sensors=False, truncate_yaxis=False)\nmne.viz.plot_compare_evokeds(\n dict(auditory=trad_aud, visual=trad_vis), title=\"Traditional\", **kwargs\n)\nmne.viz.plot_compare_evokeds(\n dict(auditory=reg_aud, visual=reg_vis), title=\"Regression-based\", **kwargs\n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "They look pretty similar, but there are some subtle differences in how far\napart the two conditions are (e.g., around 400-500 ms).\n\n## Plot the scalp topographies and difference waves\n\nNow let's compare the\nscalp topographies for the traditional and regression-based approaches. We'll\ndo this by computing the difference between conditions:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "diff_traditional = mne.combine_evoked([trad_aud, trad_vis], weights=[1, -1])\ndiff_regression = mne.combine_evoked([reg_aud, reg_vis], weights=[1, -1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before we plot, let's make sure we get the same color scale for both figures:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "vmin = min(diff_traditional.get_data().min(), diff_regression.get_data().min()) * 1e6\nvmax = max(diff_traditional.get_data().max(), diff_regression.get_data().max()) * 1e6\ntopo_kwargs = dict(vlim=(vmin, vmax), ch_type=\"eeg\", times=np.linspace(0.05, 0.45, 9))\n\nfig = diff_traditional.plot_topomap(**topo_kwargs)\nfig.suptitle(\"Traditional\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "fig = diff_regression.plot_topomap(**topo_kwargs)\nfig.suptitle(\"Regression-based\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that the regression-based approach shows a *stronger* difference\nbetween conditions early on (around 100-150 ms) and *weaker* differences\nlater (around 250-350 ms, and again around 450 ms). This is also reflected in\nthe difference waves themselves: notice how the regression-based difference\nwave is *further from zero* around 150 ms but *closer to zero* around 250-350\nms.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "title = \"Difference in evoked potential (auditory minus visual)\"\nfig = mne.viz.plot_compare_evokeds(\n dict(Traditional=diff_traditional, Regression=diff_regression),\n title=title,\n **kwargs,\n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Examine the interaction term\n\nFinally, let's look at the interaction term from the regression model. This\ntells us whether the effect of the baseline period differs between the visual\nand auditory trials. 
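Note how the terms are coded: because the interaction predictor is the\nbaseline predictor multiplied by the visual predictor (and there is no\nintercept column), the ``baseline`` beta is the baseline effect in auditory\ntrials, and the baseline effect in visual trials is the sum of the\n``baseline`` and ``baseline:visual`` betas. As a sketch, one way to\nreconstruct the visual-trial baseline effect is to sum the two beta maps:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# baseline effect in visual trials = baseline beta + interaction beta\nvis_baseline_effect = mne.combine_evoked(\n [reg_model[\"baseline\"].beta, reg_model[\"baseline:visual\"].beta],\n weights=[1, 1],\n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "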
Here we'll add a horizontal\nline at zero, representing the null hypothesis of no interaction (i.e., the\nstrength of the baseline's effect, and how long that effect persists through\nthe epoch, should be the same in both types of trial).\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "interaction_effect = reg_model[\"baseline:visual\"].beta\ninteraction_effect.plot(\n picks=ch,\n hline=[0.0],\n units=dict(eeg=r\"$\\beta$ value\"),\n titles=dict(eeg=ch),\n selectable=False,\n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Indeed, the interaction beta weights are rather small and seem to fluctuate\nrandomly around zero, suggesting that there is no systematic difference in\nthe effect of the baseline between our two trial types.\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References\n.. footbibliography::\n\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.2" } }, "nbformat": 4, "nbformat_minor": 0 }