{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n\n# Exporting Epochs to Pandas DataFrames\n\nThis tutorial shows how to export the data in :class:`~mne.Epochs` objects to a\n:class:`Pandas DataFrame <pandas.DataFrame>`, and applies a typical Pandas\n:doc:`split-apply-combine <pandas:user_guide/groupby>` workflow to examine the\nlatencies of the response maxima across epochs and conditions.\n\nWe'll use the `sample-dataset` dataset, but load a version of the raw file\nthat has already been filtered and downsampled, and has an average reference\napplied to its EEG channels. As usual we'll start by importing the modules we\nneed and loading the data:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Authors: The MNE-Python contributors.\n# License: BSD-3-Clause\n# Copyright the MNE-Python contributors."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import matplotlib.pyplot as plt\nimport seaborn as sns\n\nimport mne\n\nsample_data_folder = mne.datasets.sample.data_path()\nsample_data_raw_file = (\n    sample_data_folder / \"MEG\" / \"sample\" / \"sample_audvis_filt-0-40_raw.fif\"\n)\nraw = mne.io.read_raw_fif(sample_data_raw_file, verbose=False)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Next we'll load a list of events from file, map them to condition names with\nan event dictionary, set some signal rejection thresholds (cf.\n`tut-reject-epochs-section`), and segment the continuous data into\nepochs:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "sample_data_events_file = (\n    sample_data_folder / \"MEG\" / \"sample\" / \"sample_audvis_filt-0-40_raw-eve.fif\"\n)\nevents = mne.read_events(sample_data_events_file)\n\nevent_dict = {\n    \"auditory/left\": 1,\n    \"auditory/right\": 2,\n    \"visual/left\": 3,\n    \"visual/right\": 4,\n}\n\nreject_criteria = dict(\n    mag=3000e-15,  # 3000 fT\n    grad=3000e-13,  # 3000 fT/cm\n    eeg=100e-6,  # 100 \u00b5V\n    eog=200e-6,\n)  # 200 \u00b5V\n\ntmin, tmax = (-0.2, 0.5)  # epoch from 200 ms before event to 500 ms after it\nbaseline = (None, 0)  # baseline period from start of epoch to time=0\n\nepochs = mne.Epochs(\n    raw,\n    events,\n    event_dict,\n    tmin,\n    tmax,\n    proj=True,\n    baseline=baseline,\n    reject=reject_criteria,\n    preload=True,\n)\ndel raw"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Converting an ``Epochs`` object to a ``DataFrame``\n\nOnce we have our :class:`~mne.Epochs` object, converting it to a\n:class:`~pandas.DataFrame` is simple: just call :meth:`epochs.to_data_frame()\n<mne.Epochs.to_data_frame>`. Each channel's data will be a column of the new\n:class:`~pandas.DataFrame`, alongside three additional columns of event name,\nepoch number, and sample time. Here we'll just show the first few rows and\ncolumns:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "df = epochs.to_data_frame()\ndf.iloc[:5, :10]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Scaling time and channel values\n\nBy default, time values are converted from seconds to milliseconds and\nthen rounded to the nearest integer; if you don't want this, you can pass\n``time_format=None`` to keep time as a :class:`float` value in seconds, or\nconvert it to a :class:`~pandas.Timedelta` value via\n``time_format='timedelta'``.\n\nNote also that, by default, channel measurement values are scaled so that EEG\ndata are converted to \u00b5V, magnetometer data are converted to fT, and\ngradiometer data are converted to fT/cm. These scalings can be customized\nthrough the ``scalings`` parameter, or suppressed by passing\n``scalings=dict(eeg=1, mag=1, grad=1)``.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "df = epochs.to_data_frame(time_format=None, scalings=dict(eeg=1, mag=1, grad=1))\ndf.iloc[:5, :10]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Notice that the time values are no longer integers, and the channel values\nhave changed by several orders of magnitude compared to the earlier\nDataFrame.\n\n\n### Setting the ``index``\n\nIt is also possible to move one or more of the indicator columns (event name,\nepoch number, and sample time) into the `index <pandas:indexing>`, by\npassing a string or list of strings as the ``index`` parameter. We'll also\ndemonstrate here the effect of ``time_format='timedelta'``, yielding\n:class:`~pandas.Timedelta` values in the \"time\" column.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "df = epochs.to_data_frame(index=[\"condition\", \"epoch\"], time_format=\"timedelta\")\ndf.iloc[:5, :10]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Wide- versus long-format DataFrames\n\nAnother parameter, ``long_format``, determines whether each channel's data\nis in a separate column of the :class:`~pandas.DataFrame`\n(``long_format=False``), or whether the measured values are pivoted into a\nsingle ``'value'`` column with an extra indicator column for the channel name\n(``long_format=True``). Passing ``long_format=True`` will also create an\nextra column ``ch_type`` indicating the channel type.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "long_df = epochs.to_data_frame(time_format=None, index=\"condition\", long_format=True)\nlong_df.head()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Generating the :class:`~pandas.DataFrame` in long format can be helpful when\nusing other Python modules for subsequent analysis or plotting. For example,\nhere we'll take data from the \"auditory/left\" condition, pick a couple MEG\nchannels, and use :func:`seaborn.lineplot` to automatically plot the mean and\nconfidence band for each channel, with confidence computed across the epochs\nin the chosen condition:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "plt.figure()\nchannels = [\"MEG 1332\", \"MEG 1342\"]\ndata = long_df.loc[\"auditory/left\"].query(\"channel in @channels\")\n# convert channel column (CategoryDtype \u2192 string; for a nicer-looking legend)\ndata[\"channel\"] = data[\"channel\"].astype(str)\ndata.reset_index(drop=True, inplace=True)  # speeds things up\nsns.lineplot(x=\"time\", y=\"value\", hue=\"channel\", data=data)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We can also now use all the power of Pandas for grouping and transforming our\ndata. Here, we find the latency of peak activation of 2 gradiometers (one\nnear auditory cortex and one near visual cortex), and plot the distribution\nof the timing of the peak in each channel as a :func:`~seaborn.violinplot`:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "plt.figure()\ndf = epochs.to_data_frame(time_format=None)\npeak_latency = (\n    df.filter(regex=r\"condition|epoch|MEG 1332|MEG 2123\")\n    .groupby([\"condition\", \"epoch\"])\n    .aggregate(lambda x: df[\"time\"].iloc[x.idxmax()])\n    .reset_index()\n    .melt(\n        id_vars=[\"condition\", \"epoch\"], var_name=\"channel\", value_name=\"latency of peak\"\n    )\n)\nax = sns.violinplot(\n    x=\"channel\",\n    y=\"latency of peak\",\n    hue=\"condition\",\n    data=peak_latency,\n    palette=\"deep\",\n    saturation=1,\n)"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.12.2"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}