{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## About this resource"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "All of the tutorial notebooks, as well as information about the package they depend on (`nma-ibl`), can be found in the [nma-ibl GitHub repository](https://github.com/int-brain-lab/nma-ibl)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setting up the environment (particularly for Colab users)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Please execute the cells below to install the necessary dependencies and prepare the environment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# install IBL pipeline package to access and navigate the pipeline\n",
    "!pip install --quiet nma-ibl\n",
    "\n",
    "# Download data needed for plot recreation\n",
    "!wget https://github.com/vathes/nma-ibl/raw/master/uuids_trained1.npy"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Replication of study figures "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One of the great strengths of [DataJoint](https://datajoint.io) pipelines is their tight data integrity and the full tracking of all processing and computation captured by the pipeline. Here we demonstrate how a study figure based on the IBL pipeline can be replicated using data freshly fetched from the pipeline."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the study [A standardized and reproducible method to measure decision-making in mice](https://doi.org/10.1101/2020.01.17.909838), the authors show that animal behavior in a visual decision-making task is similar across 9 labs in 7 institutions across 3 countries when standardized, reproducible experimental hardware, software, and procedures are used."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This notebook replicates Figure 2 from that work, which shows that animals learn at a similar rate across labs.\n",
    "It is based on [this repository](https://github.com/int-brain-lab/paper-behavior), which allows us to replicate the figures on a local machine!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's connect to the database again. Use the public credentials `ibl-public`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import datajoint as dj\n",
    "\n",
    "dj.config['database.host'] = 'datajoint-public.internationalbrainlab.org'\n",
    "dj.config['database.user'] = 'ibl-public'\n",
    "dj.config['database.password'] = 'ibl-public'\n",
    "dj.conn() # explicitly verify that the connection to database can be established"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Import modules"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To start with, we import some modules that will be used in the rest of the notebook:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import os\n",
    "import seaborn as sns\n",
    "import matplotlib.pyplot as plt\n",
    "import warnings\n",
    "warnings.filterwarnings(\"ignore\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We also import modules that allow us to interact with the schemas and tables in the IBL DataJoint pipeline."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from nma_ibl import reference, subject, behavior_analyses"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is an overview of what each schema contains:\n",
    "* `reference` schema contains lab, user and project information\n",
    "* `subject` schema contains information about subjects\n",
    "* `behavior_analyses` schema contains results of standardized analyses, including the training status"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We also import helper functions that provide the predefined figure settings and the subject query:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from nma_ibl.paper_behavior_functions import (query_subjects, seaborn_style,\n",
    "                                              group_colors, institution_map)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Initialize figure settings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
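    "seaborn_style()\n",
    "pal = group_colors()\n",
    "# note: this rebinds `institution_map` from the function to its returned mapping,\n",
    "# so re-run the import cell above before re-running this cell\n",
    "institution_map, col_names = institution_map()\n",
    "col_names = col_names[:-1]"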
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Query subjects that are trained"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We preselected the \"trained\" animals based on the following criteria and saved their UUIDs in the file `uuids_trained1.npy`:\n",
    "- 0% and 6% contrasts had been introduced to the contrast set.\n",
    "- 200 trials were completed with >80% performance on easy (100% and 50% contrasts) trials in each of the last three sessions.\n",
    "- A four-parameter psychometric curve (bias, lapse left, lapse right, threshold) fitted to performance on all trials from the last three sessions had parameter values of bias < 16, threshold < 19, and lapses < 0.2."
   ]
  },
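  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough sketch, the numeric criteria listed above can be written as a single check. The function `meets_trained_criteria` and its argument names are hypothetical and not part of `nma_ibl`. The sketch assumes that \"200 trials\" means at least 200 and that the bias bound applies to the absolute value, and it leaves out the first criterion (which contrasts have been introduced). All input values are made up."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Toy illustration only: hypothetical helper, not part of nma_ibl\n",
    "def meets_trained_criteria(n_trials, perf_easy, bias, threshold, lapse_low, lapse_high):\n",
    "    '''Check the numeric criteria for the last three sessions of one animal.'''\n",
    "    # assumption: '200 trials' is read as at least 200 trials per session\n",
    "    enough_trials = all(n >= 200 for n in n_trials)\n",
    "    # performance on easy trials given as a fraction between 0 and 1\n",
    "    good_easy = all(p > 0.8 for p in perf_easy)\n",
    "    # assumption: the bias bound is applied to the absolute value\n",
    "    good_fit = (abs(bias) < 16 and threshold < 19\n",
    "                and lapse_low < 0.2 and lapse_high < 0.2)\n",
    "    return len(n_trials) == 3 and enough_trials and good_easy and good_fit\n",
    "\n",
    "# made-up values for two imaginary animals; the second has one short session\n",
    "print(meets_trained_criteria([412, 388, 450], [0.93, 0.88, 0.91], 3.1, 12.4, 0.05, 0.08))\n",
    "print(meets_trained_criteria([412, 150, 450], [0.93, 0.88, 0.91], 3.1, 12.4, 0.05, 0.08))"
   ]
  },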
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "uuids = np.load('uuids_trained1.npy', allow_pickle=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can then restrict the subject table to the animals with these UUIDs:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "subjects = subject.Subject & [{'subject_uuid': uuid} for uuid in uuids]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "These are the 101 subjects reported in this study:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "subjects"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To gather all the information needed about the subjects, we use the prepared query function `query_subjects`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "use_subjects = query_subjects()\n",
    "use_subjects"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An important field in this table, used in Figure 2, is `date_trained`: the first date on which the animal met the trained criteria."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Fetch data from the trained animals as a data frame"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The summary statistics of the behavior are processed and saved in `behavior_analyses.BehavioralSummaryByDate`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "behavior_analyses.BehavioralSummaryByDate()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- `performance`: the correct rate on all trials of the date\n",
    "- `performance_easy`: the correct rate on easy trials, i.e. trials with a contrast of 50% or higher\n",
    "- `n_trials_date`: total number of trials on the date\n",
    "- `training_day`: days since the animal started training, starting from zero\n",
    "- `training_week`: weeks since the animal started training, starting from zero"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Join the `BehavioralSummaryByDate` table with the subject query to gather the information in one place:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "b = behavior_analyses.BehavioralSummaryByDate * use_subjects\n",
    "b"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can then fetch the contents of the joined table as a pandas data frame:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "behav = b.fetch(order_by='institution_short, subject_nickname, training_day',\n",
    "                format='frame').reset_index()\n",
    "behav['institution_code'] = behav.institution_short.map(institution_map)\n",
    "behav"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now compute how many mice there are in each institution and add that count as a column of the data frame:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "N = behav.groupby(['institution_code'])['subject_nickname'].nunique().to_dict()\n",
    "behav['n_mice'] = behav.institution_code.map(N)\n",
    "behav['institution_name'] = behav.institution_code + \\\n",
    "    ': ' + behav.n_mice.apply(str) + ' mice'\n",
    "behav"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Fig 2a - plot learning curves of animals in each institution"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In Fig 2a, we plot the performance on easy trials `performance_easy` as a function of `training_day` for each animal in each institution.\n",
    "\n",
    "For plotting purposes, we create another column that holds the performance only from the date the mouse is trained onward; performance before that date is set to NaN:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
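    "# collect the per-animal frames in a list and concatenate once\n",
    "# (DataFrame.append was removed in pandas 2.0)\n",
    "groups = []\n",
    "for index, group in behav.groupby(['institution_code', 'subject_nickname']):\n",
    "    group = group.copy()  # work on a copy, not a view of behav\n",
    "    group['performance_easy_trained'] = group.performance_easy\n",
    "    # mark sessions before the trained date as NaN\n",
    "    group.loc[pd.to_datetime(group['session_date']) < pd.to_datetime(group['date_trained']),\n",
    "              'performance_easy_trained'] = np.nan\n",
    "    groups.append(group)\n",
    "behav = pd.concat(groups)"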
   ]
  },
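  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The per-animal loop above can also be expressed as one vectorized call. The cell below is a minimal sketch on made-up data: the column names mirror the tutorial, but the frame `toy` and its values are hypothetical. `Series.mask` replaces a value with NaN wherever the condition is true, which here means every session before `date_trained`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# toy data: two imaginary animals, both trained on 2020-01-02\n",
    "toy = pd.DataFrame({\n",
    "    'subject_nickname': ['m1', 'm1', 'm1', 'm2', 'm2'],\n",
    "    'session_date': pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-03',\n",
    "                                    '2020-01-01', '2020-01-02']),\n",
    "    'date_trained': pd.to_datetime(['2020-01-02'] * 5),\n",
    "    'performance_easy': [0.6, 0.85, 0.9, 0.7, 0.95],\n",
    "})\n",
    "\n",
    "# NaN for sessions before the trained date, original value otherwise\n",
    "toy['performance_easy_trained'] = toy['performance_easy'].mask(\n",
    "    toy['session_date'] < toy['date_trained'])\n",
    "toy"
   ]
  },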
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally we generate the figure. The following cell may take some time to run."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "behav['performance_easy'] = behav.performance_easy * 100\n",
    "behav['performance_easy_trained'] = behav.performance_easy_trained * 100\n",
    "\n",
    "# plot one curve for each animal, one panel per lab\n",
    "fig = sns.FacetGrid(behav,\n",
    "                    col=\"institution_code\", col_wrap=4, col_order=col_names,\n",
    "                    sharex=True, sharey=True, aspect=1, hue=\"subject_uuid\", xlim=[-1, 41.5])\n",
    "fig.map(sns.lineplot, \"training_day\",\n",
    "        \"performance_easy\", color='gray', alpha=0.3)\n",
    "fig.map(sns.lineplot, \"training_day\",\n",
    "        \"performance_easy_trained\", color='darkblue', alpha=0.3)\n",
    "fig.set_titles(\"{col_name}\")\n",
    "for axidx, ax in enumerate(fig.axes.flat):\n",
    "    ax.set_title(behav.institution_name.unique()[\n",
    "                 axidx], color=pal[axidx], fontweight='bold')\n",
    "\n",
    "# overlay the example mouse\n",
    "sns.lineplot(ax=fig.axes[0], x='training_day', y='performance_easy', color='black',\n",
    "             data=behav[behav['subject_nickname'].str.contains('KS014')], legend=False)\n",
    "\n",
    "fig.set_axis_labels('Training day', 'Performance (%) on easy trials')\n",
    "fig.despine(trim=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Performance on easy contrast trials (50% and 100% contrast) across mice and laboratories. Each panel represents a different lab, and each curve represents a mouse (gray). The transition from gray to blue indicates when the performance criteria for \"trained\" are met. Black: performance of the example mouse `KS014`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Fig 2b - plot the learning curve averaged over animals for all institutions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot all labs\n",
    "fig, ax1 = plt.subplots(1, 1, figsize=(5, 4))\n",
    "sns.lineplot(x='training_day', y='performance_easy', hue='institution_code', palette=pal,\n",
    "             ax=ax1, legend=False, data=behav, ci=None)\n",
    "ax1.set_title('All labs', color='k', fontweight='bold')\n",
    "ax1.set(xlabel='Training day',\n",
    "        ylabel='Performance (%) on easy trials', xlim=[-1, 41.5])\n",
    "\n",
    "seaborn_style()\n",
    "plt.tight_layout(pad=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Print some statistics"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "behav_summary_std = behav.groupby(['training_day'])[\n",
    "    'performance_easy'].std().reset_index()\n",
    "behav_summary = behav.groupby(['training_day'])[\n",
    "    'performance_easy'].mean().reset_index()\n",
    "print('number of days to reach 80% accuracy on easy trials: ')\n",
    "print(behav_summary.loc[behav_summary.performance_easy >\n",
    "                        80, 'training_day'].min())"
   ]
  },
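  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the number printed above is the first training day on which the mean performance across all animals exceeds 80%, not the average time each animal needs to reach 80%. The cell below is a minimal sketch of the same computation on made-up numbers (two imaginary animals, three training days); the frame `toy` is hypothetical."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# toy data: performance in percent, one row per animal and training day\n",
    "toy = pd.DataFrame({\n",
    "    'training_day': [0, 1, 2, 0, 1, 2],\n",
    "    'performance_easy': [55.0, 75.0, 90.0, 60.0, 80.0, 85.0],\n",
    "})\n",
    "\n",
    "# mean over animals for each training day: 57.5, 77.5, 87.5\n",
    "daily_mean = toy.groupby('training_day')['performance_easy'].mean()\n",
    "\n",
    "# first training day on which the across-animal mean exceeds 80%\n",
    "first_day = daily_mean[daily_mean > 80].index.min()\n",
    "print(first_day)"
   ]
  },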
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Conclusion"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And that's it! You have now completed the introductory tutorials for navigating and accessing the IBL data pipeline, and hopefully this sets you on a good track to take a deeper dive into these rich and exciting datasets.\n",
    "\n",
    "Be sure to visit [DataJoint.io](https://datajoint.io) for further DataJoint learning resources. Also be sure to sign up for our DataJoint Slack group (link on the website) to join the vibrant DataJoint user community!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}