{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# MLE of Normal parameters\n",
    "\n",
    "[Dataset download](https://s3.amazonaws.com/bebi103.caltech.edu/data/good_invitro_droplet_data.csv)\n",
    "\n",
    "<hr>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Colab setup ------------------\n",
    "import os, sys, subprocess\n",
    "if \"google.colab\" in sys.modules:\n",
    "    cmd = \"pip install --upgrade watermark\"\n",
    "    process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)\n",
    "    stdout, stderr = process.communicate()\n",
    "    data_path = \"https://s3.amazonaws.com/bebi103.caltech.edu/data/\"\n",
    "else:\n",
    "    data_path = \"../data/\"\n",
    "# ------------------------------\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<hr>\n",
    "\n",
    "Recall our first model for spindle length.\n",
    "\n",
    "\\begin{align}\n",
    "l_i \\sim \\text{Norm}(\\phi, \\sigma) \\;\\;\\forall i.\n",
    "\\end{align}\n",
    "\n",
    "For this model, we have already worked out analytically that the maximum likelihood estimates for the parameters $\\phi$ and $\\sigma$ are given by their respective plug-in estimates.\n",
    "\n",
    "\\begin{align}\n",
    "&\\phi^* = \\hat{\\phi}\\\\[1em]\n",
    "&\\sigma^* = \\hat{\\sigma}.\n",
    "\\end{align}\n",
    "\n",
    "We can directly compute these and use parametric bootstrap to get their confidence intervals. We start by loading in the data and computing the MLEs for $\\phi$ and $\\sigma$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "φ MLE: 32.86402985074626 µm\n",
      "σ MLE: 4.784665043782949 µm\n"
     ]
    }
   ],
   "source": [
    "df = pd.read_csv(os.path.join(data_path, 'good_invitro_droplet_data.csv'), comment='#')\n",
    "\n",
    "spindle_length = df['Spindle Length (um)'].values\n",
    "\n",
    "phi_mle = np.mean(spindle_length)\n",
    "sigma_mle = np.std(spindle_length)\n",
    "\n",
    "print('φ MLE:', phi_mle, 'µm')\n",
    "print('σ MLE:', sigma_mle, 'µm')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we use parametric bootstrap to compute the confidence intervals. First, we'll write functions to perform the bootstrapping."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "rg = np.random.default_rng()\n",
    "\n",
    "def draw_parametric_bs_reps(data, phi, sigma, size=1):\n",
    "    \"\"\"Parametric bootstrap replicates of parameters of\n",
    "    Normal distribution.\"\"\"\n",
    "    bs_reps_phi = np.empty(size)\n",
    "    bs_reps_sigma = np.empty(size)\n",
    "    \n",
    "    for i in range(size):\n",
    "        bs_sample = np.random.normal(phi, sigma, size=len(data))\n",
    "        bs_reps_phi[i] = np.mean(bs_sample)\n",
    "        bs_reps_sigma[i] = np.std(bs_sample)\n",
    "                                     \n",
    "    return bs_reps_phi, bs_reps_sigma"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can get the bootstrap replicates."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "bs_reps_phi, bs_reps_sigma = draw_parametric_bs_reps(\n",
    "    spindle_length, phi_mle, sigma_mle, size=100000\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With the bootstrap replicates in hand, we can compute the 95% confidence intervals using percentiles."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "φ conf int: [32.50257673 33.22919193]\n",
      "σ conf int: [4.5262453  5.03866745]\n"
     ]
    }
   ],
   "source": [
    "print('φ conf int:', np.percentile(bs_reps_phi, [2.5, 97.5]))\n",
    "print('σ conf int:', np.percentile(bs_reps_sigma, [2.5, 97.5]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So, we have estimated the parameters under this model."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Computing environment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPython 3.8.5\n",
      "IPython 7.19.0\n",
      "\n",
      "numpy 1.19.2\n",
      "pandas 1.1.3\n",
      "jupyterlab 2.2.6\n"
     ]
    }
   ],
   "source": [
    "%load_ext watermark\n",
    "%watermark -v -p numpy,pandas,jupyterlab"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}