{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# NWB Tutorial - Reading NWB Data in Python\n",
    "\n",
    "## Introduction\n",
    "\n",
    "In this tutorial, we will read single neuron spiking data that is in the NWB standard format and do a basic visualization of the data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Download the data\n",
    "\n",
    "First, let's download an NWB data file from the [DANDI neurophysiology data archive](https://dandiarchive.org/). \n",
    "\n",
    "An NWB file represents a single session of an experiment. It contains all the data of that session and the metadata required to understand the data. \n",
    "\n",
    "We will use data from one session of an experiment by [Chandravadia et al. (2020)](https://www.nature.com/articles/s41597-020-0415-9), where the authors recorded single neuron activity from the medial temporal lobes of human subjects while they performed a recognition memory task.\n",
    "\n",
    "1. Go to the DANDI page for this dataset: https://dandiarchive.org/dandiset/000004/draft\n",
    "2. Toward the top middle of the page, click the \"View Data\" button. \n",
    "<img src=\"images/dandi_download1.png\" width=\"600\">\n",
    "\n",
    "3. Click on the folder \"sub-P11MHM\" (click the folder name, not the checkbox).\n",
    "<img src=\"images/dandi_download2.png\" width=\"600\">\n",
    "\n",
    "4. Then click on the download symbol to the right of the filename \"sub-P11HMH_ses-20061101_ecephys+image.nwb\" to download the data file (69 MB) to your computer.\n",
    "<img src=\"images/dandi_download3.png\" width=\"600\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Installing PyNWB\n",
    "\n",
    "To read data in the NWB format in Python, we recommend using the [PyNWB](https://pynwb.readthedocs.io/) package created by the NWB development team.\n",
    "\n",
    "First, install PyNWB using `pip` or `conda`. You will need Python 3.5+ installed.\n",
    "- `pip install -U pynwb`\n",
    "- `conda install -c conda-forge pynwb`\n",
    "\n",
    "In addition you will need `matplotlib` later on for the generation of plots. This can also be installed via `pip` or `conda`\n",
    "- `pip install -U matplotlib`\n",
    "- `conda install matplotlib`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using NWBHDF5IO\n",
    "\n",
    "To read and write data in NWB, we will use the `NWBHDF5IO` class. You can read this as \"NWB\" \"HDF5\" \"IO\". This class reads NWB data that is in the HDF5 storage format, a popular, hierarchical format for storing large-scale scientific data. \"IO\" stands for Input/Output which means reading/writing data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pynwb import NWBHDF5IO\n",
    "\n",
    "# Change the string below to the path of the file on your computer\n",
    "filepath = 'C:/Users/Ryan/Downloads/sub-P11HMH_ses-20061101_ecephys+image.nwb'\n",
    "io = NWBHDF5IO(filepath, 'r')  # open the file in read mode 'r'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Call the `NWBHDF5IO.read()` method to read the NWB data into an `NWBFile` object. Print the `NWBFile` object to inspect its contents."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nwb = io.read()\n",
    "nwb"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Access stimulus data in NWB\n",
    "\n",
    "Data representing stimuli that were presented to the experimental subject are stored in `NWBFile.stimulus`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nwb.stimulus"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`NWBFile.stimulus` is a dictionary that can contain PyNWB objects representing different types of data, such as images or time series of images. In this file, `NWBFile.stimulus` contains a single key `'StimulusPresentation'` with an `OpticalSeries` object representing what images were shown to the subject and at what times."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "stim = nwb.stimulus['StimulusPresentation']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Lazy loading of datasets\n",
    "\n",
    "Data arrays are read passively from the NWB file. Calling the `data` attribute on a `TimeSeries` such as an `OpticalSeries` does not read the data values, but presents an `h5py` object that can be indexed to read the data. You can use the `[:]` operator to read the entire data array into memory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "stim = nwb.stimulus['StimulusPresentation']\n",
    "\n",
    "print(stim.data.shape)\n",
    "all_stim_data = stim.data[:]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Time series data in NWB can have multiple dimensions, and the first dimension always represents time. So this dataset has 200 images of size 400x300 pixels with three channels (red, green, and blue)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Slicing datasets\n",
    "It is often preferable to read only a portion of the data, e.g., because the full data array is too large to fit into your computer's RAM. To do this, index or slice into the `data` attribute just like if you were indexing or slicing a numpy array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pic_index = 31\n",
    "img = stim.data[pic_index]\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "img = img[...,::-1]  # reverse the last dimension because the data were stored in BGR instead of RGB\n",
    "plt.imshow(img, aspect='auto')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Access single unit data in NWB\n",
    "Data and metadata about sorted single units are stored in `NWBFile.units`. `NWBFile.units` is a `Units` object that stores metadata about each single unit in a tabular form, where each row represents a unit and has spike times and additional metadata. Printing the `Units` object shows the column names and basic metadata associated with each unit."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nwb.units"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also view the single unit data as a [pandas DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "units_df = nwb.units.to_dataframe()\n",
    "units_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To access the spike times of the first single unit, index `nwb.units` with the column name 'spike_times' and then the row index, 0. All times in NWB are stored in seconds relative to the session start time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nwb.units['spike_times'][0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Visualize spiking activity relative to stimulus onset\n",
    "\n",
    "Now we can look at when these single units spike relative to when image stimuli were presented to the subject.\n",
    "\n",
    "Let's loop through the first 10 units and get their spike times. For each unit, loop through each stimulus onset time and compute the spike times relative to stimulus onset. Finally, create a raster plot and histogram of these aligned spike times."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "before = 1.  # in seconds\n",
    "after = 3.\n",
    "\n",
    "stim_on_times = stim.timestamps[:]  # get the stimulus times for all stimuli\n",
    "\n",
    "for unit in range(10):\n",
    "    unit_spike_times = nwb.units['spike_times'][unit]\n",
    "    trial_spikes = []\n",
    "    for time in stim_on_times:\n",
    "        # compute spike times relative to stimulus onset\n",
    "        aligned_spikes = unit_spike_times - time\n",
    "        # keep only spike times in a given time window around the stimulus onset\n",
    "        aligned_spikes = aligned_spikes[(-before < aligned_spikes) & (aligned_spikes < after)]\n",
    "        trial_spikes.append(aligned_spikes)\n",
    "    fig, axs = plt.subplots(2, 1, sharex=True)\n",
    "    plt.xlabel('time (s)')\n",
    "    axs[0].eventplot(trial_spikes)\n",
    "    \n",
    "    axs[0].set_ylabel('trial')\n",
    "    axs[0].set_title('unit {}'.format(unit))\n",
    "    axs[0].axvline(0, color=[.5,.5,.5])\n",
    "    \n",
    "    axs[1].hist(np.hstack(trial_spikes), 30)\n",
    "    axs[1].axvline(0, color=[.5,.5,.5])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Accessing trial information in NWB\n",
    "\n",
    "Trial data are stored in `NWBFile.trials`. `NWBFile.trials` is a `TimeIntervals` object that stores metadata about each trial in a tabular form, where each row represents a trial and has a start time, stop time, and additional metadata. Like for the `Units` table, you can also view the trial data as a pandas DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "trials_df = nwb.trials.to_dataframe()\n",
    "trials_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The stimuli are stored in `nwb.stimuli` and can be mapped one to one to each row (trial) of `nwb.trials` based on the `stim_on_time` column. Let's visualize the first 10 images that were categorized as landscapes in the session. We can interact with the trial data as a PyNWB `TimeIntervals` type or as a pandas DataFrame. Here, let's use pandas functions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "assert np.all(stim.timestamps[:] == trials_df.stim_on_time[:])\n",
    "\n",
    "stim_on_times_landscapes = trials_df[trials_df.category_name == 'landscapes'].stim_on_time\n",
    "for time in stim_on_times_landscapes[:10]:\n",
    "    img = np.squeeze(stim.data[np.where(stim.timestamps[:] == time)])\n",
    "    img = img[...,::-1]  # reverse the last dimension because the data were stored in BGR instead of RGB\n",
    "    plt.figure()\n",
    "    plt.imshow(img, aspect='auto')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exploring the NWB file\n",
    "\n",
    "You can explore the NWB file by printing the `NWBFile` object and accessing its attributes, but it may be useful to explore the data in a less programmatic, more visual way. You can use NWBWidgets, a package containing interactive widgets for visualizing NWB data, or you can use the HDFView tool, which can open any generic HDF5 file, which an NWB file is.\n",
    "\n",
    "### NWBWidgets\n",
    "\n",
    "To use NWBWidgets, first install NWBWidgets:\n",
    "- `pip install -U nwbwidgets`\n",
    "\n",
    "Then import and run the `nwb2widget` function on the `NWBFile` object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from nwbwidgets import nwb2widget\n",
    "\n",
    "nwb2widget(nwb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### HDFView\n",
    "\n",
    "To use HDFView to inspect and explore the NWB file, download and install HDFView from here: https://www.hdfgroup.org/downloads/hdfview/ and then open the NWB file using the program."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conclusion\n",
    "\n",
    "This is an example of how to get started with understanding and analyzing public NWB datasets. This particular dataset was published with an extensive open analysis in both MATLAB and Python, which you can find [here](https://github.com/rutishauserlab/recogmem-release-NWB). For more datasets, or to publish your own NWB data for free, check out the DANDI archive. Also, make sure to check out the DANDI breakout session later in this event."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Learn more!\n",
    "\n",
    "## Python tutorials\n",
    "### See our tutorials for more details about your data type:\n",
    "* [Extracellular electrophysiology](https://pynwb.readthedocs.io/en/stable/tutorials/domain/ecephys.html#sphx-glr-tutorials-domain-ecephys-py)\n",
    "* [Calcium imaging](https://pynwb.readthedocs.io/en/stable/tutorials/domain/ophys.html#sphx-glr-tutorials-domain-ophys-py)\n",
    "* [Intracellular electrophysiology](https://pynwb.readthedocs.io/en/stable/tutorials/domain/icephys.html#sphx-glr-tutorials-domain-icephys-py)\n",
    "\n",
    "### Check out other tutorials that teach advanced NWB topics:\n",
    "* [Iterative data write](https://pynwb.readthedocs.io/en/stable/tutorials/general/iterative_write.html#sphx-glr-tutorials-general-iterative-write-py)\n",
    "* [Extensions](https://pynwb.readthedocs.io/en/stable/tutorials/general/extensions.html#sphx-glr-tutorials-general-extensions-py)\n",
    "* [Advanced HDF5 I/O](https://pynwb.readthedocs.io/en/stable/tutorials/general/advanced_hdf5_io.html#sphx-glr-tutorials-general-advanced-hdf5-io-py)\n",
    "\n",
    "\n",
    "## MATLAB tutorials\n",
    "* [Extracellular electrophysiology](https://neurodatawithoutborders.github.io/matnwb/tutorials/html/ecephys.html)\n",
    "* [Calcium imaging](https://neurodatawithoutborders.github.io/matnwb/tutorials/html/ophys.html)\n",
    "* [Intracellular electrophysiology](https://neurodatawithoutborders.github.io/matnwb/tutorials/html/icephys.html)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}