"""
.. _tut-epochs-metadata:

===========================
Working with Epoch metadata
===========================

This tutorial shows how to add metadata to `~mne.Epochs` objects, and
how to use :ref:`Pandas query strings <pandas:indexing.query>` to select and
plot epochs based on metadata properties.

For this tutorial we'll use a different dataset than usual: the
:ref:`kiloword-dataset`, which contains EEG data averaged across 75 subjects
who were performing a lexical decision (word/non-word) task. The data is in
`~mne.Epochs` format, with each epoch representing the response to a
different stimulus (word). As usual we'll start by importing the modules we
need and loading the data:
"""

# Authors: The MNE-Python contributors.
# License: BSD-3-Clause
# Copyright the MNE-Python contributors.

# %%

import numpy as np
import pandas as pd

import mne

kiloword_data_folder = mne.datasets.kiloword.data_path()
kiloword_data_file = kiloword_data_folder / "kword_metadata-epo.fif"
epochs = mne.read_epochs(kiloword_data_file)

# %%
# Viewing ``Epochs`` metadata
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# .. admonition:: Restrictions on metadata DataFrames
#     :class: sidebar warning
#
#     Metadata dataframes are less flexible than typical
#     :class:`Pandas DataFrames <pandas.DataFrame>`. For example, the allowed
#     data types are restricted to strings, floats, integers, or booleans;
#     and the row labels are always integers corresponding to epoch numbers.
#     Other capabilities of :class:`DataFrames <pandas.DataFrame>` such as
#     :class:`hierarchical indexing <pandas.MultiIndex>` are possible while the
#     `~mne.Epochs` object is in memory, but will not survive saving and
#     reloading the `~mne.Epochs` object to/from disk.
#
# The metadata attached to `~mne.Epochs` objects is stored as a
# :class:`pandas.DataFrame`:

assert isinstance(epochs.metadata, pd.DataFrame)

# %%
# Each row corresponds to one epoch.
# The columns can contain just about any information you want to store about
# each epoch; in this case, the metadata encodes information about the stimulus
# seen on each trial, including properties of the visual word form itself
# (e.g., ``NumberOfLetters``, ``VisualComplexity``) as well as properties of
# what the word means (e.g., its ``Concreteness``) and its prominence in the
# English lexicon (e.g., ``WordFrequency``). Here are all the variables; note
# that in a Jupyter notebook, viewing a :class:`pandas.DataFrame` gets rendered
# as an HTML table instead of the normal Python output block:

epochs.metadata

# %%
# Viewing the metadata values for a given epoch and metadata variable is done
# using any of the :ref:`Pandas indexing <pandas:/reference/indexing.rst>`
# methods such as :obj:`~pandas.DataFrame.loc`,
# :obj:`~pandas.DataFrame.iloc`, :obj:`~pandas.DataFrame.at`,
# and :obj:`~pandas.DataFrame.iat`. Because the
# index of the dataframe is the integer epoch number, the name- and index-based
# selection methods will work similarly for selecting rows, except that
# name-based selection (with :obj:`~pandas.DataFrame.loc`) is inclusive of the
# endpoint:

print("Name-based selection with .loc")
print(epochs.metadata.loc[2:4])

print("\nIndex-based selection with .iloc")
print(epochs.metadata.iloc[2:4])

# %%
# Modifying the metadata
# ^^^^^^^^^^^^^^^^^^^^^^
#
# Like any :class:`pandas.DataFrame`, you can modify the data or add columns as
# needed. Here we convert the ``NumberOfLetters`` column from :class:`float` to
# :class:`integer <int>` data type, and add a :class:`boolean <bool>` column
# that arbitrarily divides the variable ``VisualComplexity`` into high and low
# groups.

epochs.metadata["NumberOfLetters"] = epochs.metadata["NumberOfLetters"].map(int)

epochs.metadata["HighComplexity"] = epochs.metadata["VisualComplexity"] > 65
epochs.metadata.head()

# %%
# Selecting epochs using metadata queries
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# All `~mne.Epochs` objects can be subselected by event name, index, or
# :term:`slice` (see :ref:`tut-section-subselect-epochs`). But
# `~mne.Epochs` objects with metadata can also be queried using
# :ref:`Pandas query strings <pandas:indexing.query>` by passing the query
# string just as you would normally pass an event name. For example:

print(epochs['WORD.str.startswith("dis")'])

# %%
# This capability uses the :meth:`pandas.DataFrame.query` method under the
# hood, so you can check out the documentation of that method to learn how to
# format query strings. Here's another example:

print(epochs["Concreteness > 6 and WordFrequency < 1"])

# %%
# Note also that traditional epochs subselection by condition name still works;
# MNE-Python will try the traditional method first before falling back on rich
# metadata querying.

epochs["solenoid"].compute_psd().plot(picks="data", exclude="bads", amplitude=False)

# %%
# One use of the Pandas query string approach is to select specific words for
# plotting:

words = ["typhoon", "bungalow", "colossus", "drudgery", "linguist", "solenoid"]
epochs[f"WORD in {words}"].plot(n_channels=29, events=True)

# %%
# Notice that in this dataset, each "condition" (A.K.A., each word) occurs only
# once, whereas with the :ref:`sample-dataset` dataset each condition (e.g.,
# "auditory/left", "visual/right", etc) occurred dozens of times. This makes
# the Pandas querying methods especially useful when you want to aggregate
# epochs that have different condition names but that share similar stimulus
# properties.
# For example, here we group epochs based on the number of letters
# in the stimulus word, and compare the average signal at electrode ``Pz`` for
# each group:

evokeds = dict()
query = "NumberOfLetters == {}"
for n_letters in epochs.metadata["NumberOfLetters"].unique():
    evokeds[str(n_letters)] = epochs[query.format(n_letters)].average()

# sphinx_gallery_thumbnail_number = 3
mne.viz.plot_compare_evokeds(evokeds, cmap=("word length", "viridis"), picks="Pz")

# %%
# Metadata can also be useful for sorting the epochs in an image plot. For
# example, here we order the epochs based on word frequency to see if there's a
# pattern to the latency or intensity of the response:

sort_order = np.argsort(epochs.metadata["WordFrequency"])
epochs.plot_image(order=sort_order, picks="Pz")

# %%
# Although there's no obvious relationship in this case, such analyses may be
# useful for metadata variables that more directly index the time course of
# stimulus processing (such as reaction time).
#
#
# Adding metadata to an ``Epochs`` object
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# You can add a metadata :class:`~pandas.DataFrame` to any
# `~mne.Epochs` object (or replace existing metadata) simply by
# assigning to the :attr:`~mne.Epochs.metadata` attribute:

new_metadata = pd.DataFrame(
    data=["foo"] * len(epochs), columns=["bar"], index=range(len(epochs))
)
epochs.metadata = new_metadata
epochs.metadata.head()

# %%
# You can remove metadata from an `~mne.Epochs` object by setting its
# metadata to ``None``:

epochs.metadata = None
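
# %%
# Because the query strings used in this tutorial are ordinary
# :meth:`pandas.DataFrame.query` expressions, it can help to prototype them on
# a plain :class:`~pandas.DataFrame` before applying them to an `~mne.Epochs`
# object. The cell below is only a minimal sketch under stated assumptions:
# ``toy_metadata`` is a hypothetical table whose values are made up for
# illustration (only the column names mirror the kiloword metadata), and
# ``engine="python"`` is passed explicitly here as an assumption so that the
# string-method query does not depend on whether ``numexpr`` is installed. It
# shows the endpoint difference between ``.loc`` and ``.iloc`` and two query
# styles like the ones used earlier:

import pandas as pd

# Hypothetical stand-in for an Epochs metadata table (values are invented)
toy_metadata = pd.DataFrame(
    {
        "WORD": ["discord", "film", "dismal", "typhoon"],
        "NumberOfLetters": [7, 4, 6, 7],
        "Concreteness": [3.0, 6.2, 2.5, 6.1],
    }
)

# .loc slices by row label and includes the endpoint (labels 1 and 2), whereas
# .iloc slices by position and excludes it (position 1 only)
loc_rows = toy_metadata.loc[1:2]
iloc_rows = toy_metadata.iloc[1:2]
print(len(loc_rows), len(iloc_rows))

# Query using a string method on a column
dis_words = toy_metadata.query('WORD.str.startswith("dis")', engine="python")
print(dis_words["WORD"].tolist())

# Query combining two numeric conditions
long_concrete = toy_metadata.query("NumberOfLetters == 7 and Concreteness > 6")
print(long_concrete["WORD"].tolist())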