{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "view-in-github" }, "source": [ "\"Open" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "# Neuromatch Academy: Week 1, Day 1, Tutorial 3\n", "# Model Types: \"Why\" models\n", "__Content creators:__ Matt Laporte, Byron Galbraith, Konrad Kording\n", "\n", "__Content reviewers:__ Dalin Guo, Aishwarya Balwani, Madineh Sarvestani, Maryam Vaziri-Pashkam, Michael Waskom\n", "\n", "We would like to acknowledge [Steinmetz _et al._ (2019)](https://www.nature.com/articles/s41586-019-1787-x) for sharing their data, a subset of which is used here.\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "___\n", "# Tutorial Objectives\n", "This is tutorial 3 of a 3-part series on different flavors of models used to understand neural data. In parts 1 and 2 we explored mechanisms that would produce the data. In this tutorial we will explore models and techniques that can potentially explain *why* the spiking data we have observed is produced the way it is.\n", "\n", "To understand why different spiking behaviors may be beneficial, we will learn about the concept of entropy. Specifically, we will:\n", "\n", "- Write code to compute formula for entropy, a measure of information\n", "- Compute the entropy of a number of toy distributions\n", "- Compute the entropy of spiking activity from the Steinmetz dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "colab": { "base_uri": "https://localhost:8080/", "height": 520 }, "colab_type": "code", "outputId": "752e3d33-c179-4735-999c-6a55bb6478a2" }, "outputs": [], "source": [ "#@title Video 1: “Why” models\n", "from IPython.display import YouTubeVideo\n", "video = YouTubeVideo(id='OOIDEr1e5Gg', width=854, height=480, fs=1)\n", "print(\"Video available at https://youtube.com/watch?v=\" + video.id)\n", "video" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "# Setup" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "both", "colab": {}, "colab_type": "code" }, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from scipy import stats" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "colab": {}, "colab_type": "code" }, "outputs": [], "source": [ "#@title Figure Settings\n", "import ipywidgets as widgets #interactive display\n", "\n", "%matplotlib inline\n", "%config InlineBackend.figure_format = 'retina'\n", "plt.style.use(\"https://raw.githubusercontent.com/NeuromatchAcademy/course-content/NMA2020/nma.mplstyle\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "colab": {}, "colab_type": "code" }, "outputs": [], "source": [ "#@title Helper Functions\n", "\n", "def plot_pmf(pmf,isi_range):\n", " \"\"\"Plot the probability mass function.\"\"\"\n", " ymax = max(0.2, 1.05 * np.max(pmf))\n", " pmf_ = np.insert(pmf, 0, pmf[0])\n", " plt.plot(bins, pmf_, drawstyle=\"steps\")\n", " plt.fill_between(bins, pmf_, step=\"pre\", alpha=0.4)\n", " plt.title(f\"Neuron {neuron_idx}\")\n", " plt.xlabel(\"Inter-spike interval (s)\")\n", " plt.ylabel(\"Probability mass\")\n", " plt.xlim(isi_range);\n", " plt.ylim([0, ymax])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "colab": {}, "colab_type": "code" }, "outputs": [], "source": [ "#@title Download Data\n", "import io\n", "import 
requests\n", "r = requests.get('https://osf.io/sy5xt/download')\n", "if r.status_code != 200:\n", " print('Could not download data')\n", "else:\n", " steinmetz_spikes = np.load(io.BytesIO(r.content), allow_pickle=True)['spike_times']" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "# Section 1: Optimization and Information\n", "\n", "Neurons can only fire so often in a fixed period of time, as the act of emitting a spike consumes energy that is depleted and must eventually be replenished. To communicate effectively for downstream computation, the neuron would need to make good use of its limited spiking capability. This becomes an optimization problem: \n", "\n", "What is the optimal way for a neuron to fire in order to maximize its ability to communicate information?\n", "\n", "In order to explore this question, we first need to have a quantifiable measure for information. Shannon introduced the concept of entropy to do just that, and defined it as\n", "\n", "\\begin{align}\n", " H_b(X) &= -\\sum_{x\\in X} p(x) \\log_b p(x)\n", "\\end{align}\n", "\n", "where $H$ is entropy measured in units of base $b$ and $p(x)$ is the probability of observing the event $x$ from the set of all possible events in $X$. See the Appendix for a more detailed look at how this equation was derived.\n", "\n", "The most common base of measuring entropy is $b=2$, so we often talk about *bits* of information, though other bases are used as well e.g. when $b=e$ we call the units *nats*." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "First, let's explore how entropy changes between some simple discrete probability distributions. In the rest of this tutorial we will refer to these as probability mass functions (PMF), where $p(x_i)$ equals the $i^{th}$ value in an array, and mass refers to how much of the distribution is contained at that value.\n", "\n", "For our first PMF, we will choose one where all of the probability mass is located in the middle of the distribution." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 430 }, "colab_type": "code", "outputId": "cf1b675d-f893-4c1d-bac2-d30329b51410" }, "outputs": [], "source": [ "n_bins = 50 # number of points supporting the distribution\n", "x_range = (0, 1) # will be subdivided evenly into bins corresponding to points\n", "\n", "bins = np.linspace(*x_range, n_bins + 1) # bin edges\n", "\n", "pmf = np.zeros(n_bins)\n", "pmf[len(pmf) // 2] = 1.0 # middle point has all the mass\n", "\n", "# Since we already have a PMF, rather than un-binned samples, `plt.hist` is not\n", "# suitable. Instead, we directly plot the PMF as a step function to visualize\n", "# the histogram:\n", "pmf_ = np.insert(pmf, 0, pmf[0]) # this is necessary to align plot steps with bin edges\n", "plt.plot(bins, pmf_, drawstyle=\"steps\")\n", "# `fill_between` provides area shading\n", "plt.fill_between(bins, pmf_, step=\"pre\", alpha=0.4)\n", "plt.xlabel(\"x\")\n", "plt.ylabel(\"p(x)\")\n", "plt.xlim(x_range)\n", "plt.ylim(0, 1);" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "If we were to draw a sample from this distribution, we know exactly what we would get every time. Distributions where all the mass is concentrated on a single event are known as *deterministic*.\n", "\n", "How much entropy is contained in a deterministic distribution?" 
] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "## Exercise 1: Computing Entropy\n", "\n", "Your first exercise is to implement a method that computes the entropy of a discrete probability distribution, given its mass function. Remember that we are interested in entropy in units of _bits_, so be sure to use the correct log function. \n", "\n", "Recall that $\\log(0)$ is undefined. When evaluated at $0$, NumPy log functions (such as `np.log2`) return `np.nan` (\"Not a Number\"). By convention, these undefined terms— which correspond to points in the distribution with zero mass—are excluded from the sum that computes the entropy." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code" }, "outputs": [], "source": [ "def entropy(pmf):\n", " \"\"\"Given a discrete distribution, return the Shannon entropy in bits.\n", "\n", " This is a measure of information in the distribution. For a totally\n", " deterministic distribution, where samples are always found in the same bin,\n", " then samples from the distribution give no more information and the entropy\n", " is 0.\n", "\n", " For now this assumes `pmf` arrives as a well-formed distribution (that is,\n", " `np.sum(pmf)==1` and `not np.any(pmf < 0)`)\n", "\n", " Args:\n", " pmf (np.ndarray): The probability mass function for a discrete distribution\n", " represented as an array of probabilities.\n", " Returns:\n", " h (number): The entropy of the distribution in `pmf`.\n", "\n", " \"\"\"\n", " ############################################################################\n", " # Exercise for students: compute the entropy of the provided PMF\n", " # 1. Exclude the points in the distribution with no mass (where `pmf==0`).\n", " # Hint: this is equivalent to including only the points with `pmf>0`.\n", " # 2. Implement the equation for Shannon entropy (in bits).\n", " # When ready to test, comment or remove the next line\n", " raise NotImplementedError(\"Excercise: implement the equation for entropy\")\n", " ############################################################################\n", "\n", " # reduce to non-zero entries to avoid an error from log2(0)\n", " pmf = ...\n", "\n", " # implement the equation for Shannon entropy (in bits)\n", " h = ...\n", "\n", " # return the absolute value (avoids getting a -0 result)\n", " return np.abs(h)\n", "\n", "# Uncomment to test your entropy function\n", "# print(f\"{entropy(pmf):.2f} bits\")" ] }, { "cell_type": "markdown", "metadata": { "cellView": "both", "colab": { "base_uri": "https://localhost:8080/", "height": 35 }, "colab_type": "text", "outputId": "56dd3f8f-b57d-4543-f4cc-36285463a978" }, "source": [ "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content/tree/NMA2020/tutorials/W1D1_ModelTypes/solutions/W1D1_Tutorial3_Solution_3dc69011.py)\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "We expect zero surprise from a deterministic distribution. If we had done this calculation by hand, it would simply be" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "$-1\\log_2 1 = -0=0$" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "Note that changing the location of the peak (i.e. the point and bin on which all the mass rests) doesn't alter the entropy. The entropy is about how predictable a sample is with respect to a distribution. A single peak is deterministic regardless of which point it sits on." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 430 }, "colab_type": "code", "outputId": "20d23d57-6484-4565-a429-02e60c171da1" }, "outputs": [], "source": [ "pmf = np.zeros(n_bins)\n", "pmf[2] = 1.0 # arbitrary point has all the mass\n", "\n", "pmf_ = np.insert(pmf, 0, pmf[0])\n", "plt.plot(bins, pmf_, drawstyle=\"steps\")\n", "plt.fill_between(bins, pmf_, step=\"pre\", alpha=0.4)\n", "plt.xlabel(\"x\")\n", "plt.ylabel(\"p(x)\")\n", "plt.xlim(x_range)\n", "plt.ylim(0, 1);" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "What about a distribution with mass split equally between two points?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 430 }, "colab_type": "code", "outputId": "05166a7e-334f-4c73-922d-8179743fa840" }, "outputs": [], "source": [ "pmf = np.zeros(n_bins)\n", "pmf[len(pmf) // 3] = 0.5\n", "pmf[2 * len(pmf) // 3] = 0.5\n", "\n", "pmf_ = np.insert(pmf, 0, pmf[0])\n", "plt.plot(bins, pmf_, drawstyle=\"steps\")\n", "plt.fill_between(bins, pmf_, step=\"pre\", alpha=0.4)\n", "plt.xlabel(\"x\")\n", "plt.ylabel(\"p(x)\")\n", "plt.xlim(x_range)\n", "plt.ylim(0, 1);" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "Here, the entropy calculation is" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "$-(0.5 \\log_2 0.5 + 0.5\\log_2 0.5)=1$" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "There is 1 bit of entropy. This means that before we take a random sample, there is 1 bit of uncertainty about which point in the distribution the sample will fall on: it will either be the first peak or the second one. \n", "\n", "Likewise, if we make one of the peaks taller (i.e. its point holds more of the probability mass) and the other one shorter, the entropy will decrease because of the increased certainty that the sample will fall on one point and not the other:\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "$-(0.2 \\log_2 0.2 + 0.8\\log_2 0.8)\\approx 0.72$" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "Try changing the definition of the number and weighting of peaks, and see how the entropy varies." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "If we split the probability mass among even more points, the entropy continues to increase. Let's derive the general form for $N$ points of equal mass, where $p_i=p=1/N$:\n", "\n", "\\begin{align}\n", " -\\sum_i p_i \\log_b p_i&= -\\sum_i^N \\frac{1}{N} \\log_b \\frac{1}{N}\\\\\n", " &= -\\log_b \\frac{1}{N} \\\\\n", " &= \\log_b N\n", "\\end{align}\n", "$$$$" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "If we have $N$ discrete points, the _uniform distribution_ (where all points have equal mass) is the distribution with the highest entropy: $\\log_b N$. 
This upper bound on entropy is useful when considering binning strategies, as any estimate of entropy over $N$ discrete points (or bins) must be in the interval $[0, \\log_b N]$.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 430 }, "colab_type": "code", "outputId": "d7df2eda-e152-4bcb-8709-0d34596c7175" }, "outputs": [], "source": [ "pmf = np.ones(n_bins) / n_bins # [1/N] * N\n", "\n", "pmf_ = np.insert(pmf, 0, pmf[0])\n", "plt.plot(bins, pmf_, drawstyle=\"steps\")\n", "plt.fill_between(bins, pmf_, step=\"pre\", alpha=0.4)\n", "plt.xlabel(\"x\")\n", "plt.ylabel(\"p(x)\")\n", "plt.xlim(x_range);\n", "plt.ylim(0, 1);" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "Here, there are 50 points and the entropy of the uniform distribution is $\\log_2 50\\approx 5.64$. If we construct _any_ discrete distribution $X$ over 50 points (or bins) and calculate an entropy of $H_2(X)>\\log_2 50$, something must be wrong with our implementation of the discrete entropy computation." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "# Section 2: Information, neurons, and spikes" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "colab": { "base_uri": "https://localhost:8080/", "height": 520 }, "colab_type": "code", "outputId": "0e8d91c6-7f32-4482-c10c-706d37249ac6" }, "outputs": [], "source": [ "#@title Video 2: Entropy of different distributions\n", "from IPython.display import YouTubeVideo\n", "video = YouTubeVideo(id='o6nyrx3KH20', width=854, height=480, fs=1)\n", "print(\"Video available at https://youtube.com/watch?v=\" + video.id)\n", "video" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "Recall the discussion of spike times and inter-spike intervals (ISIs) from Tutorial 1. What does the information content (or distributional entropy) of these measures say about our theory of nervous systems? \n", "\n", "We'll consider three hypothetical neurons that all have the same mean ISI, but with different distributions:\n", "\n", "1. Deterministic\n", "2. Uniform\n", "3. Exponential\n", "\n", "Fixing the mean of the ISI distribution is equivalent to fixing its inverse: the neuron's mean firing rate. If a neuron has a fixed energy budget and each of its spikes has the same energy cost, then by fixing the mean firing rate, we are normalizing for energy expenditure. This provides a basis for comparing the entropy of different ISI distributions. In other words: if our neuron has a fixed budget, what ISI distribution should it express (all else being equal) to maximize the information content of its outputs?\n", "\n", "Let's construct our three distributions and see how their entropies differ." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code" }, "outputs": [], "source": [ "n_bins = 50\n", "mean_isi = 0.025\n", "isi_range = (0, 0.25)\n", "\n", "bins = np.linspace(*isi_range, n_bins + 1)\n", "mean_idx = np.searchsorted(bins, mean_isi)\n", "\n", "# 1. all mass concentrated on the ISI mean\n", "pmf_single = np.zeros(n_bins)\n", "pmf_single[mean_idx] = 1.0\n", "\n", "# 2. mass uniformly distributed about the ISI mean\n", "pmf_uniform = np.zeros(n_bins)\n", "pmf_uniform[0:2*mean_idx] = 1 / (2 * mean_idx)\n", "\n", "# 3. 
mass exponentially distributed about the ISI mean\n", "pmf_exp = stats.expon.pdf(bins[1:], scale=mean_isi)\n", "pmf_exp /= np.sum(pmf_exp)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "colab": { "base_uri": "https://localhost:8080/", "height": 358 }, "colab_type": "code", "outputId": "a6d5ffd8-d99b-445a-de1a-adeb65c409bc" }, "outputs": [], "source": [ "#@title\n", "#@markdown Run this cell to plot the three PMFs\n", "fig, axes = plt.subplots(ncols=3, figsize=(18, 5))\n", "\n", "dists = [# (subplot title, pmf, ylim)\n", " (\"(1) Deterministic\", pmf_single, (0, 1.05)),\n", " (\"(2) Uniform\", pmf_uniform, (0, 1.05)),\n", " (\"(3) Exponential\", pmf_exp, (0, 1.05))]\n", "\n", "for ax, (label, pmf_, ylim) in zip(axes, dists):\n", " pmf_ = np.insert(pmf_, 0, pmf_[0])\n", " ax.plot(bins, pmf_, drawstyle=\"steps\")\n", " ax.fill_between(bins, pmf_, step=\"pre\", alpha=0.4)\n", " ax.set_title(label)\n", " ax.set_xlabel(\"Inter-spike interval (s)\")\n", " ax.set_ylabel(\"Probability mass\")\n", " ax.set_xlim(isi_range);\n", " ax.set_ylim(ylim);" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 70 }, "colab_type": "code", "outputId": "dacd5378-369b-47ec-9de5-5e2d1f2829ea" }, "outputs": [], "source": [ "print(\n", " f\"Deterministic: {entropy(pmf_single):.2f} bits\",\n", " f\"Uniform: {entropy(pmf_uniform):.2f} bits\",\n", " f\"Exponential: {entropy(pmf_exp):.2f} bits\",\n", " sep=\"\\n\",\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "colab": { "base_uri": "https://localhost:8080/", "height": 520 }, "colab_type": "code", "outputId": "349f56bf-9ddf-4830-ec91-cf26e69af27d" }, "outputs": [], "source": [ "#@title Video 3: Probabilities from histogram\n", "from IPython.display import YouTubeVideo\n", "video = YouTubeVideo(id='e2U_-07O9jo', width=854, height=480, fs=1)\n", "print(\"Video available at https://youtube.com/watch?v=\" + video.id)\n", "video" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "In the previous example we created the PMFs by hand to illustrate idealized scenarios. How would we compute them from data recorded from actual neurons?\n", "\n", "One way is to convert the ISI histograms we've previously computed into discrete probability distributions using the following equation:\n", "\n", "\\begin{align}\n", "p_i = \\frac{n_i}{\\sum\\nolimits_{i}n_i}\n", "\\end{align}\n", "\n", "where $p_i$ is the probability of an ISI falling within a particular interval $i$ and $n_i$ is the count of how many ISIs were observed in that interval." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "### Exercise 2: Probability Mass Function\n", "\n", "Your second exercise is to implement a method that will produce a probability mass function from an array of ISI bin counts.\n", "\n", "To verify your solution, we will compute the probability distribution of ISIs from real neural data taken from the Steinmetz dataset."
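] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "If `np.diff` and `np.histogram` are new to you, the toy sketch below (using a handful of made-up spike times) shows how spike times become ISIs and then bin counts; the next cell does the same thing on the real recordings." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code" }, "outputs": [], "source": [ "# Toy illustration (made-up spike times): from spike times to ISI counts.\n", "toy_spike_times = np.array([0.00, 0.02, 0.03, 0.07, 0.12])  # seconds, invented for illustration\n", "toy_isi = np.diff(toy_spike_times)   # intervals between successive spikes\n", "toy_bins = np.linspace(0, 0.25, 6)   # 5 coarse bins, for illustration only\n", "toy_counts, _ = np.histogram(toy_isi, toy_bins)\n", "print(\"ISIs:  \", toy_isi)\n", "print(\"Counts:\", toy_counts)"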
] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code" }, "outputs": [], "source": [ "neuron_idx = 283\n", "\n", "isi = np.diff(steinmetz_spikes[neuron_idx])\n", "bins = np.linspace(*isi_range, n_bins + 1)\n", "counts, _ = np.histogram(isi, bins)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code" }, "outputs": [], "source": [ "def pmf_from_counts(counts):\n", " \"\"\"Given counts, normalize by the total to estimate probabilities.\"\"\"\n", " ###########################################################################\n", " # Exercise: Compute the PMF. Remove the next line to test your function\n", " raise NotImplementedError(\"Student excercise: compute the PMF from ISI counts\")\n", " ###########################################################################\n", "\n", " pmf = ...\n", "\n", " return pmf\n", "\n", "# Uncomment when ready to test your function\n", "# pmf = pmf_from_counts(counts)\n", "# plot_pmf(pmf,isi_range)" ] }, { "cell_type": "markdown", "metadata": { "cellView": "both", "colab": { "base_uri": "https://localhost:8080/", "height": 430 }, "colab_type": "text", "outputId": "40291ce9-4842-4c66-c467-3292a96e9b19" }, "source": [ "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content/tree/NMA2020/tutorials/W1D1_ModelTypes/solutions/W1D1_Tutorial3_Solution_49231923.py)\n", "\n", "*Example output:*\n", "\n", "Solution hint\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "# Section 3: Calculate entropy from a PMF" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "colab": { "base_uri": "https://localhost:8080/", "height": 520 }, "colab_type": "code", "outputId": "b1c2cf2d-ae11-4e8b-e98a-5f0aba37640f" }, "outputs": [], "source": [ "#@title Video 4: Calculating entropy from pmf\n", "from IPython.display import YouTubeVideo\n", "video = YouTubeVideo(id='Xjy-jj-6Oz0', width=854, height=480, fs=1)\n", "print(\"Video available at https://youtube.com/watch?v=\" + video.id)\n", "video" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "Now that we have the probability distribution for the actual neuron spiking activity, we can calculate its entropy." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 35 }, "colab_type": "code", "outputId": "da585b68-092e-4de1-9041-bb7ab806e85c" }, "outputs": [], "source": [ "print(f\"Entropy for Neuron {neuron_idx}: {entropy(pmf):.2f} bits\")" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "## Interactive Demo: Entropy of neurons\n", "\n", "We can combine the above distribution plot and entropy calculation with an interactive widget to explore how the different neurons in the dataset vary in spiking activity and relative information. 
Note that the mean firing rate across neurons is not fixed, so some neurons with a uniform ISI distribution may have higher entropy than neurons with a more exponential distribution.\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "colab": { "base_uri": "https://localhost:8080/", "height": 462, "referenced_widgets": [ "2371a383fca2465c848a7530644a94a5", "6ee4e3b0704c461b9fddb0d2bd4cb768", "65176fdad49d45379a9c9e1238dd4f4e", "4c9e02fdedcd42f1ad15f0ae771c9692", "d918baf72b2a4b9395346122bcbf817e", "fed5ec1c321a44ad92c0ce942f96b9b6", "a422984464d44beabb836b92d1d66022" ] }, "colab_type": "code", "outputId": "dc8a4b5c-44a8-4b91-b748-1e6c756fab46" }, "outputs": [], "source": [ "#@title\n", "#@markdown **Run the cell** to enable the sliders.\n", "\n", "def _pmf_from_counts(counts):\n", " \"\"\"Given counts, normalize by the total to estimate probabilities.\"\"\"\n", " pmf = counts / np.sum(counts)\n", " return pmf\n", "\n", "def _entropy(pmf):\n", " \"\"\"Given a discrete distribution, return the Shannon entropy in bits.\"\"\"\n", " # keep only the non-zero entries to avoid an error from log2(0)\n", " pmf = pmf[pmf > 0]\n", " h = -np.sum(pmf * np.log2(pmf))\n", " # absolute value applied to avoid getting a -0 result\n", " return np.abs(h)\n", "\n", "@widgets.interact(neuron=widgets.IntSlider(0, min=0, max=(len(steinmetz_spikes)-1)))\n", "def steinmetz_pmf(neuron):\n", " \"\"\" Given a neuron from the Steinmetz data, compute its PMF and entropy \"\"\"\n", " isi = np.diff(steinmetz_spikes[neuron])\n", " bins = np.linspace(*isi_range, n_bins + 1)\n", " counts, _ = np.histogram(isi, bins)\n", " pmf = _pmf_from_counts(counts)\n", "\n", " plot_pmf(pmf,isi_range)\n", " plt.title(f\"Neuron {neuron}: H = {_entropy(pmf):.2f} bits\")" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "---\n", "# Summary\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "colab": { "base_uri": "https://localhost:8080/", "height": 520 }, "colab_type": "code", "outputId": "3727ad8b-22ce-4358-b82d-cc11d82e37dc" }, "outputs": [], "source": [ "#@title Video 5: Summary of model types\n", "from IPython.display import YouTubeVideo\n", "video = YouTubeVideo(id='X4K2RR5qBK8', width=854, height=480, fs=1)\n", "print(\"Video available at https://youtube.com/watch?v=\" + video.id)\n", "video" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "Congratulations! You've finished your first NMA tutorial. In this 3-part tutorial series, we used different types of models to understand the spiking behavior of neurons recorded in the Steinmetz data set. 
\n", "\n", " - We used \"what\" models to discover that the ISI distribution of real neurons is closest to an exponential distribution\n", " - We used \"how\" models to discover that balanced excitatory and inbhitiory inputs, coupled with a leaky membrane, can give rise to neuronal spiking with exhibiting such an exponential ISI distribution\n", " - We used \"why\" models to discover that exponential ISI distributions contain the most information when the mean spiking is constrained\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "# Bonus" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "### The foundations for Entropy\n", "\n", "In his foundational [1948 paper](https://en.wikipedia.org/wiki/A_Mathematical_Theory_of_Communication) on information theory, Claude Shannon began with three criteria for a function $H$ defining the entropy of a discrete distribution of probability masses $p_i\\in p(X)$ over the points $x_i\\in X$:\n", "1. $H$ should be continuous in the $p_i$. \n", " - That is, $H$ should change smoothly in response to smooth changes to the mass $p_i$ on each point $x_i$.\n", "2. If all the points have equal shares of the probability mass, $p_i=1/N$, $H$ should be a non-decreasing function of $N$. \n", " - That is, if $X_N$ is the support with $N$ discrete points and $p(x\\in X_N)$ assigns constant mass to each point, then $H(X_1) < H(X_2) < H(X_3) < \\dots$\n", "3. $H$ should be preserved by (invariant to) the equivalent (de)composition of distributions.\n", " - For example (from Shannon's paper) if we have a discrete distribution over three points with masses $(\\frac{1}{2},\\frac{1}{3},\\frac{1}{6})$, then their entropy can be represented in terms of a direct choice between the three and calculated $H(\\frac{1}{2},\\frac{1}{3},\\frac{1}{6})$. However, it could also be represented in terms of a series of two choices: \n", " 1. either we sample the point with mass $1/2$ or not (_not_ is the other $1/2$, whose subdivisions are not given in the first choice), \n", " 2. if (with probability $1/2$) we _don't_ sample the first point, we sample one of the two remaining points, masses $1/3$ and $1/6$.\n", " \n", " Thus in this case we require that $H(\\frac{1}{2},\\frac{1}{3},\\frac{1}{6})=H(\\frac{1}{2},\\frac{1}{2}) + \\frac{1}{2}H(\\frac{1}{3}, \\frac{1}{6})$\n", "\n", "There is a unique function (up to a linear scaling factor) which satisfies these 3 requirements: \n", "\n", "\\begin{align}\n", " H_b(X) &= -\\sum_{x\\in X} p(x) \\log_b p(x)\n", "\\end{align}\n", "\n", "Where the base of the logarithm $b>1$ controls the units of entropy. The two most common cases are $b=2$ for units of _bits_, and $b=e$ for _nats_.\n", "\n", "We can view this function as the expectation of the self-information over a distribution:\n", "\n", "$$H_b(X) = \\mathbb{E}_{x\\in X} \\left[I_b(x)\\right]$$\n", "\n", "$$I_b(x)=-\\log_b p(x)$$\n", "\n", "Self-information is just the negative logarithm of probability, and is a measure of how surprising an event sampled from the distribution would be. Events with $p(x)=1$ are certain to occur, and their self-information is zero (as is the entropy of the distribution they compose) meaning they are totally unsurprising. The smaller the probability of an event, the higher its self-information, and the more surprising the event would be to observe. 
\n" ] } ], "metadata": { "colab": { "collapsed_sections": [], "include_colab_link": true, "name": "W1D1_Tutorial3", "provenance": [], "toc_visible": true }, "kernel": { "display_name": "Python 3", "language": "python", "name": "python3" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.8" } }, "nbformat": 4, "nbformat_minor": 0 }