{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n\n# Working with Epoch metadata\n\nThis tutorial shows how to add metadata to `~mne.Epochs` objects, and\nhow to use `Pandas query strings <pandas:indexing.query>` to select and\nplot epochs based on metadata properties.\n\nFor this tutorial we'll use a different dataset than usual: the\n`kiloword-dataset`, which contains EEG data averaged across 75 subjects\nwho were performing a lexical decision (word/non-word) task. The data is in\n`~mne.Epochs` format, with each epoch representing the response to a\ndifferent stimulus (word). As usual we'll start by importing the modules we\nneed and loading the data:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Authors: The MNE-Python contributors.\n# License: BSD-3-Clause\n# Copyright the MNE-Python contributors."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\nimport pandas as pd\n\nimport mne\n\nkiloword_data_folder = mne.datasets.kiloword.data_path()\nkiloword_data_file = kiloword_data_folder / \"kword_metadata-epo.fif\"\nepochs = mne.read_epochs(kiloword_data_file)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Viewing ``Epochs`` metadata\n\n.. admonition:: Restrictions on metadata DataFrames\n   :class: sidebar warning\n\n   Metadata dataframes are less flexible than typical\n   :class:`Pandas DataFrames <pandas.DataFrame>`. For example, the allowed\n   data types are restricted to strings, floats, integers, or booleans;\n   and the row labels are always integers corresponding to epoch numbers.\n   Other capabilities of :class:`DataFrames <pandas.DataFrame>` such as\n   :class:`hierarchical indexing <pandas.MultiIndex>` are possible while the\n   `~mne.Epochs` object is in memory, but will not survive saving and\n   reloading the `~mne.Epochs` object to/from disk.\n\nThe metadata attached to `~mne.Epochs` objects is stored as a\n:class:`pandas.DataFrame`:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "assert isinstance(epochs.metadata, pd.DataFrame)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Each row corresponds to one epoch. The columns can contain just about any information\nyou want to store about each epoch; in this case, the metadata encodes\ninformation about the stimulus seen on each trial, including properties of\nthe visual word form itself (e.g., ``NumberOfLetters``, ``VisualComplexity``)\nas well as properties of what the word means (e.g., its ``Concreteness``) and\nits prominence in the English lexicon (e.g., ``WordFrequency``). Here are all\nthe variables; note that in a Jupyter notebook, a\n:class:`pandas.DataFrame` is rendered as an HTML table instead of the\nnormal Python output block:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "epochs.metadata"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "You can view the metadata values for a given epoch and metadata variable\nusing any of the `Pandas indexing <pandas:/reference/indexing.rst>`\nmethods such as :obj:`~pandas.DataFrame.loc`,\n:obj:`~pandas.DataFrame.iloc`, :obj:`~pandas.DataFrame.at`,\nand :obj:`~pandas.DataFrame.iat`. Because the\nindex of the dataframe is the integer epoch number, name-based and\nposition-based selection work similarly for selecting rows, except that\nname-based selection (with :obj:`~pandas.DataFrame.loc`) includes the\nendpoint. Here, ``.loc[2:4]`` returns the rows labeled 2, 3, and 4, whereas\n``.iloc[2:4]`` returns only the rows at positions 2 and 3:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(\"Name-based selection with .loc\")\nprint(epochs.metadata.loc[2:4])\n\nprint(\"\\nIndex-based selection with .iloc\")\nprint(epochs.metadata.iloc[2:4])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Modifying the metadata\n\nAs with any :class:`pandas.DataFrame`, you can modify the data or add columns as\nneeded. Here we convert the ``NumberOfLetters`` column from :class:`float` to\n:class:`integer <int>` data type, and add a :class:`boolean <bool>` column\nthat arbitrarily divides the variable ``VisualComplexity`` into high and low\ngroups at a threshold of 65.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "epochs.metadata[\"NumberOfLetters\"] = epochs.metadata[\"NumberOfLetters\"].map(int)\nepochs.metadata[\"HighComplexity\"] = epochs.metadata[\"VisualComplexity\"] > 65\nepochs.metadata.head()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Selecting epochs using metadata queries\n\nAll `~mne.Epochs` objects can be subselected by event name, index, or\n:term:`slice` (see `tut-section-subselect-epochs`). But\n`~mne.Epochs` objects with metadata can also be queried using\n`Pandas query strings <pandas:indexing.query>` by passing the query\nstring just as you would normally pass an event name. For example:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(epochs['WORD.str.startswith(\"dis\")'])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "This capability uses the :meth:`pandas.DataFrame.query` method under the\nhood, so see that method's documentation to learn how to\nformat query strings. Here's another example:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(epochs[\"Concreteness > 6 and WordFrequency < 1\"])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Note also that traditional epochs subselection by condition name still works;\nMNE-Python tries the traditional method first and only then falls back on rich\nmetadata querying.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "epochs[\"solenoid\"].compute_psd().plot(picks=\"data\", exclude=\"bads\", amplitude=False)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "One use of the Pandas query string approach is to select specific words for\nplotting:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "words = [\"typhoon\", \"bungalow\", \"colossus\", \"drudgery\", \"linguist\", \"solenoid\"]\nepochs[f\"WORD in {words}\"].plot(n_channels=29, events=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Notice that in this dataset, each \"condition\" (a.k.a. each word) occurs only\nonce, whereas in the `sample-dataset` each condition (e.g.,\n\"auditory/left\", \"visual/right\", etc.) occurred dozens of times. This makes\nthe Pandas querying methods especially useful when you want to aggregate\nepochs that have different condition names but that share similar stimulus\nproperties. For example, here we group epochs based on the number of letters\nin the stimulus word, and compare the average signal at electrode ``Pz`` for\neach group:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "evokeds = dict()\nquery = \"NumberOfLetters == {}\"\nfor n_letters in epochs.metadata[\"NumberOfLetters\"].unique():\n    evokeds[str(n_letters)] = epochs[query.format(n_letters)].average()\n\nmne.viz.plot_compare_evokeds(evokeds, cmap=(\"word length\", \"viridis\"), picks=\"Pz\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Metadata can also be useful for sorting the epochs in an image plot. For\nexample, here we order the epochs based on word frequency to see if there's a\npattern to the latency or intensity of the response:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "sort_order = np.argsort(epochs.metadata[\"WordFrequency\"])\nepochs.plot_image(order=sort_order, picks=\"Pz\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Although there's no obvious relationship in this case, such analyses may be\nuseful for metadata variables that more directly index the time course of\nstimulus processing (such as reaction time).\n\n\n## Adding metadata to an ``Epochs`` object\n\nYou can add a metadata :class:`~pandas.DataFrame` to any\n`~mne.Epochs` object (or replace existing metadata) simply by\nassigning to the :attr:`~mne.Epochs.metadata` attribute:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "new_metadata = pd.DataFrame(\n    data=[\"foo\"] * len(epochs), columns=[\"bar\"], index=range(len(epochs))\n)\nepochs.metadata = new_metadata\nepochs.metadata.head()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "You can remove metadata from an `~mne.Epochs` object by setting its\nmetadata to ``None``:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "epochs.metadata = None"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.12.2"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}