{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## About this resource" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All of the tutorial notebooks as well as information about the dependent package (`nma-ibl`) can be found at [nma-ibl GitHub repository](https://github.com/int-brain-lab/nma-ibl)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setting up the environment (particularly for Colab users)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Please execute the cells below to install the necessary dependencies and prepare the environment." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# install IBL pipeline package to access and navigate the pipeline\n", "!pip install --quiet nma-ibl\n", "\n", "# Download data needed for plot recreation\n", "!wget https://github.com/vathes/nma-ibl/raw/master/uuids_trained1.npy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Replication of study figures " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One of the immense strenghts of [DataJoint](https://datajoint.io) pipelines lies in the tight data integrity and full tracking of all processing and computations as captured by the data pipeline. Here we demonstrate how a study figure based on the IBL pipeline can be replicated using data freshly fetched from the data pipeline." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the study [A standardized and reproducible method to measure decision-making in mice](https://doi.org/10.1101/2020.01.17.909838), the authors have shown that the animal behavior in a visual decision-making task is similar across 9 labs in 7 institutions across 3 countries, when using a standardized, reproduciable experimental hardware, software, and procedures." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook replicates Figure 2 from that work, which shows a similar learning rate of animals across different labs.\n", "This notebook was generated based on [this repository](https://github.com/int-brain-lab/paper-behavior), allowing us to perform figure replications on a local machine!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's connect to the database again. Use the public credentials `ibl-public`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import datajoint as dj\n", "\n", "dj.config['database.host'] = 'datajoint-public.internationalbrainlab.org'\n", "dj.config['database.user'] = 'ibl-public'\n", "dj.config['database.password'] = 'ibl-public'\n", "dj.conn() # explicitly verify that the connection to database can be established" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Import modules" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To start with, we import some modules that will be used in the rest of the notebook:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import os\n", "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "import warnings\n", "warnings.filterwarnings(\"ignore\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We also import modules that allow us to interact with the schemas and tables in the IBL DataJoint pipeline." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from nma_ibl import reference, subject, behavior_analyses" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are some overview of what each schema contains:\n", "* `reference` schema contains lab, user and project information\n", "* `subject` schema contains information about subjects\n", "* `behavior_analyses` schema contains results of standardized analyses, including the training status" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are some modules that defines pre-defined figure settings" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from nma_ibl.paper_behavior_functions import (query_subjects, seaborn_style,\n", " group_colors, institution_map, seaborn_style)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Initialize figure settings" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "seaborn_style()\n", "pal = group_colors()\n", "institution_map, col_names = institution_map()\n", "col_names = col_names[:-1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Query subjects that are trained" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We pre-selected the \"trained\" animals based on the following criteria and save their uuids in the file `uuids_trained1.npy`\n", "- 0% and 6% contrasts had been introduced to the contrast set. \n", "- 200 trials were completed with >80% performance on easy (100% and 50% contrasts) trials in each of the last three sessions. \n", "- A four-parameter psychometric curve (bias, lapse left, lapse right, threshold) fitted to performance on all trials from the last three sessions had parameter values of bias < 16, threshold < 19, and lapses < 0.2.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "uuids = np.load('uuids_trained1.npy', allow_pickle=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We could then fetch the animals in the data pipeline corresponding to their uuids:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "subjects = subject.Subject & [{'subject_uuid': uuid} for uuid in uuids]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These are the 101 subjects reported in this study:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "subjects" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To include all information that are needed for subjects, we pre-queried subjects with the function `query_subject`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "use_subjects = query_subjects()\n", "use_subjects" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One important field used in Figure 2 in this table is `date_trained`, which is the first date that the animal reached the trained criteria." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Fetch data from the trained animals as a data frame" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The summary statistics of the behavior are processed and saved in `behavior_analyses.BehavioralSummaryByDate`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "behavior_analyses.BehavioralSummaryByDate()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- performance: the correct rate on all trials of the date\n", "- performance_easy: the correct rate on easy trials that contrast is greater than 50%\n", "- n_trials_date: totoal number of trials on the date\n", "- training_day: days since the animal is in training, starting from zero. \n", "- training_week: days since the animal is in training, starting from zero." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Join the BehavioralSummaryByDate table with subject query to gather info together:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "b = behavior_analyses.BehavioralSummaryByDate * use_subjects\n", "b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then we could fetch the contents in the table and return the data as a data frame:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "behav = b.fetch(order_by='institution_short, subject_nickname, training_day',\n", " format='frame').reset_index()\n", "behav['institution_code'] = behav.institution_short.map(institution_map)\n", "behav" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Now compute how many mice are there for each institution and add the column to the dataframe" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "N = behav.groupby(['institution_code'])['subject_nickname'].nunique().to_dict()\n", "behav['n_mice'] = behav.institution_code.map(N)\n", "behav['institution_name'] = behav.institution_code + \\\n", " ': ' + behav.n_mice.apply(str) + ' mice'\n", "behav" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Fig 2a, plot learning curves of animals in each of the institution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In Fig 2a, we plot the performance on easy trials `performance_easy` as a function of `training_day` for each animal in each institution.\n", "\n", "For plotting purpose, we create another column only after the mouse is trained, and performance before the training date is marked as NaN:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "behav2 = pd.DataFrame([])\n", "for index, group in behav.groupby(['institution_code', 'subject_nickname']):\n", " group['performance_easy_trained'] = group.performance_easy\n", " group.loc[group['session_date'] < pd.to_datetime(group['date_trained']),\n", " 'performance_easy_trained'] = np.nan\n", " # add this\n", " behav2 = behav2.append(group)\n", "behav = behav2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally we generate the figure. The following cell may take some time to run." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "behav['performance_easy'] = behav.performance_easy * 100\n", "behav['performance_easy_trained'] = behav.performance_easy_trained * 100\n", "\n", "# plot one curve for each animal, one panel per lab\n", "fig = sns.FacetGrid(behav,\n", " col=\"institution_code\", col_wrap=4, col_order=col_names,\n", " sharex=True, sharey=True, aspect=1, hue=\"subject_uuid\", xlim=[-1, 41.5])\n", "fig.map(sns.lineplot, \"training_day\",\n", " \"performance_easy\", color='gray', alpha=0.3)\n", "fig.map(sns.lineplot, \"training_day\",\n", " \"performance_easy_trained\", color='darkblue', alpha=0.3)\n", "fig.set_titles(\"{col_name}\")\n", "for axidx, ax in enumerate(fig.axes.flat):\n", " ax.set_title(behav.institution_name.unique()[\n", " axidx], color=pal[axidx], fontweight='bold')\n", "\n", "# overlay the example mouse\n", "sns.lineplot(ax=fig.axes[0], x='training_day', y='performance_easy', color='black',\n", " data=behav[behav['subject_nickname'].str.contains('KS014')], legend=False)\n", "\n", "fig.set_axis_labels('Training day', 'Performance (%) on easy trials')\n", "fig.despine(trim=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Performance on easy contrast trials (50% and 100% contrast) across mice and laboratories. Each panel represents a different lab, and each curve represents a mouse (gray). The transition from gray to blue indicates when performance criteria for \"trained\" are met. Black, performance for example mouse `KS014`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Fig 2b - plot the learning curve averaged over animals for all institutions" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plot all labs\n", "fig, ax1 = plt.subplots(1, 1, figsize=(5, 4))\n", "sns.lineplot(x='training_day', y='performance_easy', hue='institution_code', palette=pal,\n", " ax=ax1, legend=False, data=behav, ci=None)\n", "ax1.set_title('All labs', color='k', fontweight='bold')\n", "ax1.set(xlabel='Training day',\n", " ylabel='Performance (%) on easy trials', xlim=[-1, 41.5])\n", "\n", "seaborn_style()\n", "plt.tight_layout(pad=2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Print some statistics" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "behav_summary_std = behav.groupby(['training_day'])[\n", " 'performance_easy'].std().reset_index()\n", "behav_summary = behav.groupby(['training_day'])[\n", " 'performance_easy'].mean().reset_index()\n", "print('number of days to reach 80% accuracy on easy trials: ')\n", "print(behav_summary.loc[behav_summary.performance_easy >\n", " 80, 'training_day'].min())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Conclusion" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And that's it! You have now completed the introductory tutorials for navigating and accessing IBL data pipepline, and hopefully this gets sets you on a good track to take a deeper dive into this rich and exciting datasets.\n", "\n", "Be sire tp visit [DataJoint.io](https://datajoint.io) for further learning resources for DataJoint. Also be sure to signup to our DataJoint Slack group (link on the website) to join the vibrant DataJoint user community!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 4 }