{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n\n# Decoding (MVPA)\n\n.. include:: ../../links.inc\n\n## Design philosophy\nDecoding (a.k.a. MVPA) in MNE largely follows the machine learning API of the\nscikit-learn package.\nEach estimator implements ``fit``, ``transform``, ``fit_transform``, and\n(optionally) ``inverse_transform`` methods. For more details on this design,\nvisit scikit-learn_. For additional theoretical insights into the decoding\nframework in MNE, see :footcite:`KingEtAl2018`.\n\nFor ease of comprehension, we will denote instantiations of classes using\nthe same name as the class but written in lowercase instead of CamelCase.\n\nLet's start by loading data for a simple two-class problem:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\nimport numpy as np\nfrom sklearn.linear_model import LogisticRegression\nfrom sklearn.pipeline import make_pipeline\nfrom sklearn.preprocessing import StandardScaler\n\nimport mne\nfrom mne.datasets import sample\nfrom mne.decoding import (\n    CSP,\n    GeneralizingEstimator,\n    LinearModel,\n    Scaler,\n    SlidingEstimator,\n    Vectorizer,\n    cross_val_multiscore,\n    get_coef,\n)\n\ndata_path = sample.data_path()\n\nsubjects_dir = data_path / \"subjects\"\nmeg_path = data_path / \"MEG\" / \"sample\"\nraw_fname = meg_path / \"sample_audvis_filt-0-40_raw.fif\"\ntmin, tmax = -0.200, 0.500\nevent_id = {\"Auditory/Left\": 1, \"Visual/Left\": 3}  # just use two\nraw = mne.io.read_raw_fif(raw_fname)\nraw.pick(picks=[\"grad\", \"stim\", \"eog\"])\n\n# The subsequent decoding analyses only capture evoked responses, so we can\n# low-pass the MEG data. Usually a value more like 40 Hz would be used,\n# but here we low-pass at 20 Hz so we can more heavily decimate, and allow\n# the example to run faster. The 2 Hz high-pass helps improve CSP.\nraw.load_data().filter(2, 20)\nevents = mne.find_events(raw, \"STI 014\")\n\n# Set up bad channels (modify to your needs)\nraw.info[\"bads\"] += [\"MEG 2443\"]  # mark a known bad channel\n\n# Read epochs\nepochs = mne.Epochs(\n    raw,\n    events,\n    event_id,\n    tmin,\n    tmax,\n    proj=True,\n    picks=(\"grad\", \"eog\"),\n    baseline=(None, 0.0),\n    preload=True,\n    reject=dict(grad=4000e-13, eog=150e-6),\n    decim=3,\n    verbose=\"error\",\n)\nepochs.pick(picks=\"meg\", exclude=\"bads\")  # remove stim and EOG\ndel raw\n\nX = epochs.get_data(copy=False)  # MEG signals: n_epochs, n_meg_channels, n_times\ny = epochs.events[:, 2]  # target: auditory left vs visual left" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Transformation classes\n\n### Scaler\nThe :class:`mne.decoding.Scaler` will standardize the data based on channel\nscales. In the simplest modes ``scalings=None`` or ``scalings=dict(...)``,\neach data channel type (e.g., mag, grad, eeg) is treated separately and\nscaled by a constant. This is the approach used by, e.g.,\n:func:`mne.compute_covariance` to standardize channel scales.\n\nIf ``scalings='mean'`` or ``scalings='median'``, each channel is scaled using\nempirical measures. Each channel is scaled independently by the mean and\nstandard deviation, or median and interquartile range, respectively, across\nall epochs and time points during :meth:`~mne.decoding.Scaler.fit`\n(during training). The :meth:`~mne.decoding.Scaler.transform` method is\ncalled to transform data (training or test set) by scaling all time points\nand epochs on a channel-by-channel basis.
To perform both the ``fit`` and\n``transform`` operations in a single call, the\n:meth:`~mne.decoding.Scaler.fit_transform` method may be used. To invert the\ntransform, :meth:`~mne.decoding.Scaler.inverse_transform` can be used. For\n``scalings='median'``, scikit-learn_ version 0.17+ is required.\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>Using this class is different from directly applying\n    :class:`sklearn.preprocessing.StandardScaler` or\n    :class:`sklearn.preprocessing.RobustScaler` offered by\n    scikit-learn_. These scale each *classification feature*, e.g.\n    each time point for each channel, with mean and standard\n    deviation computed across epochs, whereas\n    :class:`mne.decoding.Scaler` scales each *channel* using mean and\n    standard deviation computed across all of its time points\n    and epochs.</p></div>
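\n\nAs a minimal illustration of this behavior (a sketch only, using the ``X`` and\n``y`` arrays created above with ``scalings='mean'``; the variable names below\nare our own), the scaling preserves the 3D shape of the data and can be\ninverted:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Sketch of Scaler usage: each channel is standardized using the mean and\n# standard deviation pooled over all epochs and time points (see text above)\nscaler = Scaler(scalings=\"mean\")\nX_scaled = scaler.fit_transform(X, y)  # fit and transform in one call\nprint(X.shape, X_scaled.shape)  # 3D shape (n_epochs, n_channels, n_times) is kept\n# channels now have approximately zero mean and unit standard deviation\nprint(X_scaled.mean(axis=(0, 2))[:3], X_scaled.std(axis=(0, 2))[:3])\n# inverse_transform maps the standardized data back to the original units\nX_orig = scaler.inverse_transform(X_scaled)\nprint(X_orig.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "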
\n\n### Vectorizer\nThe scikit-learn API provides functionality to chain transformers and\nestimators by using :class:`sklearn.pipeline.Pipeline`. We can construct\ndecoding pipelines and perform cross-validation and grid-search. However,\nscikit-learn transformers and estimators generally expect 2D data\n(n_samples * n_features), whereas MNE transformers typically output data\nwith a higher dimensionality\n(e.g. n_samples * n_channels * n_frequencies * n_times). A\n:class:`~mne.decoding.Vectorizer` therefore needs to be applied between the\nMNE and the scikit-learn steps, for example:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Uses all MEG sensors and time points as separate classification\n# features, so the resulting filters used are spatio-temporal\nclf = make_pipeline(\n    Scaler(epochs.info),\n    Vectorizer(),\n    LogisticRegression(solver=\"liblinear\"),  # liblinear is faster than lbfgs\n)\n\nscores = cross_val_multiscore(clf, X, y, cv=5, n_jobs=None)\n\n# Mean scores across cross-validation splits\nscore = np.mean(scores, axis=0)\nprint(f\"Spatio-temporal: {100 * score:0.1f}%\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### PSDEstimator\nThe :class:`mne.decoding.PSDEstimator`\ncomputes the power spectral density (PSD) using the multitaper\nmethod. It takes a 3D array as input, converts it into 2D, and computes the\nPSD.\n\n### FilterEstimator\nThe :class:`mne.decoding.FilterEstimator` filters the 3D epochs data.\n\n## Spatial filters\n\nJust like temporal filters, spatial filters provide weights to modify the\ndata along the sensor dimension. They are popular in the BCI community\nbecause of their simplicity and ability to distinguish spatially-separated\nneural activity.\n\n### Common spatial pattern\n\n:class:`mne.decoding.CSP` is a technique to analyze multichannel data based\non recordings from two classes :footcite:`Koles1991` (see also\nhttps://en.wikipedia.org/wiki/Common_spatial_pattern).\n\nLet $X \\in R^{C\\times T}$ be a segment of data with\n$C$ channels and $T$ time points. The data at a single time point\nis denoted by $x(t)$ such that $X=[x(t), x(t+1), ..., x(t+T-1)]$.\nCommon spatial pattern (CSP) finds a decomposition that projects the signal\nin the original sensor space to CSP space using the following transformation:\n\n\\begin{align}x_{CSP}(t) = W^{T}x(t)\n   :name: csp\\end{align}\n\nwhere each column of $W \\in R^{C\\times C}$ is a spatial filter and each\nrow of $x_{CSP}$ is a CSP component. The matrix $W$ is also\ncalled the de-mixing matrix in other contexts. Let\n$\\Sigma^{+} \\in R^{C\\times C}$ and $\\Sigma^{-} \\in R^{C\\times C}$\nbe the estimates of the covariance matrices of the two conditions.\nCSP analysis is given by the simultaneous diagonalization of the two\ncovariance matrices\n\n\\begin{align}W^{T}\\Sigma^{+}W = \\lambda^{+}\n   :name: diagonalize_p\\end{align}\n\\begin{align}W^{T}\\Sigma^{-}W = \\lambda^{-}\n   :name: diagonalize_n\\end{align}\n\nwhere $\\lambda^{C}$ is a diagonal matrix whose entries are the\neigenvalues of the following generalized eigenvalue problem\n\n\\begin{align}\\Sigma^{+}w = \\lambda \\Sigma^{-}w\n   :name: eigen_problem\\end{align}\n\nLarge entries in the diagonal matrix correspond to spatial filters that give\nhigh variance in one class but low variance in the other. Such filters thus\nfacilitate discrimination between the two classes.\n\n.. topic:: Examples\n\n    * `ex-decoding-csp-eeg`\n    * `ex-decoding-csp-eeg-timefreq`\n\n
<div class=\"alert alert-info\"><h4>Note</h4><p>The winning entry of the Grasp-and-lift EEG competition in Kaggle used\n    the :class:`~mne.decoding.CSP` implementation in MNE and was featured as\n    a [script of the week](sotw_).</p></div>
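\n\nTo make the generalized eigenvalue problem above concrete, here is a small\nillustrative sketch (it is **not** MNE's :class:`~mne.decoding.CSP`\nimplementation, which is used just below): it simply builds one covariance\nmatrix per condition from ``X`` and ``y`` and solves the eigenvalue equation\nwith SciPy, adding a small ridge for numerical stability:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Illustrative sketch of the CSP eigenvalue problem (not MNE's implementation):\n# estimate one covariance per condition and solve Sigma+ w = lambda Sigma- w\nfrom scipy import linalg\n\ncov_p = np.cov(np.hstack(X[y == 1]))  # auditory-left epochs -> (n_ch, n_ch)\ncov_n = np.cov(np.hstack(X[y == 3]))  # visual-left epochs -> (n_ch, n_ch)\n\n# small scaled ridge so the (possibly rank-deficient) covariances are invertible\nridge = 1e-6 * np.trace(cov_p + cov_n) / len(cov_p) * np.eye(len(cov_p))\nevals, evecs = linalg.eigh(cov_p + ridge, cov_n + ridge)\n\n# the columns of evecs play the role of the spatial filters W; the largest and\n# smallest eigenvalues mark filters whose variance most favors one condition\nprint(evals[:2], evals[-2:])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "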
\n\nWe can use CSP on these data as follows:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "csp = CSP(n_components=3, norm_trace=False)\nclf_csp = make_pipeline(csp, LinearModel(LogisticRegression(solver=\"liblinear\")))\nscores = cross_val_multiscore(clf_csp, X, y, cv=5, n_jobs=None)\nprint(f\"CSP: {100 * scores.mean():0.1f}%\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Source power comodulation (SPoC)\nSource Power Comodulation (:class:`mne.decoding.SPoC`)\n:footcite:`DahneEtAl2014` identifies the composition of\northogonal spatial filters that maximally correlate with a continuous target.\n\nSPoC can be seen as an extension of CSP in which the target is driven by a\ncontinuous variable rather than a discrete one. Typical applications\ninclude extraction of motor patterns using EMG power or of auditory patterns\nusing the sound envelope.\n\n.. topic:: Examples\n\n    * `ex-spoc-cmc`\n\n### xDAWN\n:class:`mne.preprocessing.Xdawn` is a spatial filtering method designed to\nimprove the signal to signal-plus-noise ratio (SSNR) of ERP responses\n:footcite:`RivetEtAl2009`. Xdawn was originally\ndesigned for the P300 evoked potential by enhancing the target response with\nrespect to the non-target response. The implementation in MNE-Python is a\ngeneralization to any type of ERP.\n\n.. topic:: Examples\n\n    * `ex-xdawn-denoising`\n    * `ex-xdawn-decoding`\n\n### Effect-matched spatial filtering\nThe result of :class:`mne.decoding.EMS` is a spatial filter at each time\npoint and a corresponding time course :footcite:`SchurgerEtAl2013`.\nIntuitively, the result gives the similarity between the filter at\neach time point and the data vector (sensors) at that time point.\n\n.. topic:: Examples\n\n    * `ex-ems-filtering`\n\n### Patterns vs. filters\n\nWhen interpreting the components of the CSP (or spatial filters in general),\nit is often more intuitive to think about how $x(t)$ is composed of\nthe different CSP components $x_{CSP}(t)$. In other words, we can\nrewrite Equation :eq:`csp` as follows:\n\n\\begin{align}x(t) = (W^{-1})^{T}x_{CSP}(t)\n   :name: patterns\\end{align}\n\nThe columns of the matrix $(W^{-1})^T$ are called spatial patterns, and\n$(W^{-1})^T$ is also called the mixing matrix. The example\n`ex-linear-patterns` discusses the difference between patterns and filters.\n\nThese can be plotted with:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Fit CSP on full data and plot\ncsp.fit(X, y)\ncsp.plot_patterns(epochs.info)\ncsp.plot_filters(epochs.info, scalings=1e-9)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Decoding over time\n\nThis strategy consists of fitting a multivariate predictive model at each\ntime instant and evaluating its performance at the same instant on new\nepochs. The :class:`mne.decoding.SlidingEstimator` will take as input a\npair of features $X$ and targets $y$, where $X$ has\nmore than 2 dimensions. For decoding over time, the data $X$\nis the epochs data of shape n_epochs \u00d7 n_channels \u00d7 n_times.
As the\nlast dimension of $X$ is time, an estimator will be fit\nat every time instant.\n\nThis approach is analogous to searchlight-based approaches in fMRI; here we\nare interested in when one can discriminate experimental conditions, and can\ntherefore figure out when the effect of interest happens.\n\nWhen working with linear models as estimators, this approach boils\ndown to estimating a discriminative spatial filter for each time instant.\n\n### Temporal decoding\n\nWe'll use logistic regression as the machine learning model for binary\nclassification.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# We will train the classifier on all left visual vs auditory trials on MEG\n\nclf = make_pipeline(StandardScaler(), LogisticRegression(solver=\"liblinear\"))\n\ntime_decod = SlidingEstimator(clf, n_jobs=None, scoring=\"roc_auc\", verbose=True)\n# here we use cv=3 just for speed\nscores = cross_val_multiscore(time_decod, X, y, cv=3, n_jobs=None)\n\n# Mean scores across cross-validation splits\nscores = np.mean(scores, axis=0)\n\n# Plot\nfig, ax = plt.subplots()\nax.plot(epochs.times, scores, label=\"score\")\nax.axhline(0.5, color=\"k\", linestyle=\"--\", label=\"chance\")\nax.set_xlabel(\"Times\")\nax.set_ylabel(\"AUC\")  # Area Under the Curve\nax.legend()\nax.axvline(0.0, color=\"k\", linestyle=\"-\")\nax.set_title(\"Sensor space decoding\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can retrieve the spatial filters and spatial patterns if you explicitly\nuse a :class:`~mne.decoding.LinearModel`:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "clf = make_pipeline(\n    StandardScaler(), LinearModel(LogisticRegression(solver=\"liblinear\"))\n)\ntime_decod = SlidingEstimator(clf, n_jobs=None, scoring=\"roc_auc\", verbose=True)\ntime_decod.fit(X, y)\n\ncoef = get_coef(time_decod, \"patterns_\", inverse_transform=True)\nevoked_time_gen = mne.EvokedArray(coef, epochs.info, tmin=epochs.times[0])\njoint_kwargs = dict(ts_args=dict(time_unit=\"s\"), topomap_args=dict(time_unit=\"s\"))\nevoked_time_gen.plot_joint(\n    times=np.arange(0.0, 0.500, 0.100), title=\"patterns\", **joint_kwargs\n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Temporal generalization\n\nTemporal generalization is an extension of the decoding over time approach.\nIt consists of evaluating whether the model estimated at a particular\ntime instant accurately predicts the data at any other time instant. It is\nanalogous to transferring a trained model to a distinct learning problem,\nwhere the problems correspond to decoding the patterns of brain activity\nrecorded at distinct time instants.\n\nThe object for temporal generalization is\n:class:`mne.decoding.GeneralizingEstimator`. It expects as input $X$\nand $y$ (similarly to :class:`~mne.decoding.SlidingEstimator`) but\ngenerates predictions from each model for all time instants. The class\n:class:`~mne.decoding.GeneralizingEstimator` is generic and will treat the\nlast dimension as the one to be used for generalization testing. For\nconvenience, here we refer to the entries along this dimension as different\ntasks.
If $X$\ncorresponds to epochs data, then the last dimension is time.\n\nThis runs the analysis used in :footcite:`KingEtAl2014` and further detailed\nin :footcite:`KingDehaene2014`:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# define the Temporal generalization object\ntime_gen = GeneralizingEstimator(clf, n_jobs=None, scoring=\"roc_auc\", verbose=True)\n\n# again, cv=3 just for speed\nscores = cross_val_multiscore(time_gen, X, y, cv=3, n_jobs=None)\n\n# Mean scores across cross-validation splits\nscores = np.mean(scores, axis=0)\n\n# Plot the diagonal (it's exactly the same as the time-by-time decoding above)\nfig, ax = plt.subplots()\nax.plot(epochs.times, np.diag(scores), label=\"score\")\nax.axhline(0.5, color=\"k\", linestyle=\"--\", label=\"chance\")\nax.set_xlabel(\"Times\")\nax.set_ylabel(\"AUC\")\nax.legend()\nax.axvline(0.0, color=\"k\", linestyle=\"-\")\nax.set_title(\"Decoding MEG sensors over time\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Plot the full (generalization) matrix:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "fig, ax = plt.subplots(1, 1)\nim = ax.imshow(\n    scores,\n    interpolation=\"lanczos\",\n    origin=\"lower\",\n    cmap=\"RdBu_r\",\n    extent=epochs.times[[0, -1, 0, -1]],\n    vmin=0.0,\n    vmax=1.0,\n)\nax.set_xlabel(\"Testing Time (s)\")\nax.set_ylabel(\"Training Time (s)\")\nax.set_title(\"Temporal generalization\")\nax.axvline(0, color=\"k\")\nax.axhline(0, color=\"k\")\ncbar = plt.colorbar(im, ax=ax)\ncbar.set_label(\"AUC\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Projecting sensor-space patterns to source space\nIf you use a linear classifier (or regressor) for your data, you can also\nproject the resulting patterns to source space. For example, using our\n``evoked_time_gen`` from before:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "cov = mne.compute_covariance(epochs, tmax=0.0)\ndel epochs\nfwd = mne.read_forward_solution(meg_path / \"sample_audvis-meg-eeg-oct-6-fwd.fif\")\ninv = mne.minimum_norm.make_inverse_operator(evoked_time_gen.info, fwd, cov, loose=0.0)\nstc = mne.minimum_norm.apply_inverse(evoked_time_gen, inv, 1.0 / 9.0, \"dSPM\")\ndel fwd, inv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And this can be visualized using :meth:`stc.plot <mne.SourceEstimate.plot>`:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "brain = stc.plot(\n    hemi=\"split\", views=(\"lat\", \"med\"), initial_time=0.1, subjects_dir=subjects_dir\n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Source-space decoding\n\nSource space decoding is also possible, but because the number of features\ncan be much larger than in the sensor space, univariate feature selection\nusing an ANOVA F-test (or some other metric) can be done to reduce the\nfeature dimension. Interpreting decoding results might be easier in source\nspace as compared to sensor space.\n\n.. topic:: Examples\n\n    * `ex-dec-st-source`\n\n## Exercise\n\n - Explore other datasets from MNE (e.g. the Face dataset from SPM to predict\n   Face vs. Scrambled)\n\n## References\n.. footbibliography::\n\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.2" } }, "nbformat": 4, "nbformat_minor": 0 }