{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Homework 7.2: Models for microtubule catastrophe (70 pts)\n",
    "\n",
    "[Data set download](https://s3.amazonaws.com/bebi103.caltech.edu/data/gardner_mt_catastrophe_only_tubulin.csv)\n",
    "\n",
    "<hr>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We have thoroughly investigated the process by which microtubules undergo catastrophe using data from the [Gardner, Zanic, et al. paper](../protected/papers/gardner_2011.pdf). We used an Exponential, Gamma, Weibull, and our custom two-step distribution to model the catastrophe times. We have consistently found that the Gamma model works best. It does have the shortcoming,though, that it does not *directly* match a story that might arise from chemical kinetics, which we strongly suspect would regulate microtubule castastrophe. We expect an *integer* number $m$ of Poisson processes to arrive sequentially in order for catastrophe to occur. The Exponential and two-step model are special cases we have already considered where $m = 1$ and $m = 2$, respectively. In this problem, we will consider a model for arbitrary $m$ and assess which $m$ gives the most plausible generative model. We will again use the experiment run with a tubulin concentration of 12 µM as our data set.\n",
    "\n",
    "The probability density function for the time $t$ to catastrophe triggered by the arrival of $m$ Poisson processes may be shown to be\n",
    "\n",
    "\\begin{align}\n",
    "f(t\\mid \\boldsymbol{\\beta}, m) = \\sum_{j=1}^m \\frac{\\mathrm{e}^{-\\beta_j t}}{\\beta_j^{m-2}\\,\\prod_{k=1,k\\ne j}^m \\frac{\\beta_k - \\beta_j}{\\beta_j\\beta_k}}.\n",
    "\\end{align}\n",
    "\n",
    "The expression is a bit cleaner when written in terms of $\\tau_i = 1/\\beta_i$.\n",
    "\n",
    "\\begin{align}\n",
    "f(t\\mid \\boldsymbol{\\tau}, m) = \\sum_{j=1}^m \\frac{\\tau_j^{m-2}\\,\\mathrm{e}^{-t/\\tau_j}}{\\prod_{k=1,k\\ne j}^m (\\tau_j - \\tau_k)}.\n",
    "\\end{align}\n",
    "\n",
    "For clarity, the likelihoods for the first few $m$ are\n",
    "\n",
    "\\begin{align}\n",
    "f(t\\mid \\tau_1, 1) &= \\frac{\\mathrm{e}^{-t/\\tau_1}}{\\tau_1},\\\\[1em]\n",
    "f(t\\mid \\tau_1, \\tau_2, 2) &=\n",
    "\\frac{\\mathrm{e}^{-t/\\tau_1}}{\\tau_1 - \\tau_2} + \\frac{\\mathrm{e}^{-t/\\tau_2}}{\\tau_2 - \\tau_1}\n",
    "= \\frac{\\mathrm{e}^{-t/\\tau_2} - \\mathrm{e}^{-t/\\tau_1}}{\\tau_2 - \\tau_1}, \\\\[1em]\n",
    "f(t\\mid \\tau_1, \\tau_2, \\tau_3, 3) &=\n",
    "\\frac{\\tau_1^\\,\\mathrm{e}^{-t/\\tau_1}}{(\\tau_1 - \\tau_2)(\\tau_1-\\tau_3)}\n",
    "+\\frac{\\tau_2\\,\\mathrm{e}^{-t/\\tau_2}}{(\\tau_2 - \\tau_1)(\\tau_2-\\tau_3)}\n",
    "+\\frac{\\tau_3\\,\\mathrm{e}^{-t/\\tau_3}}{(\\tau_3 - \\tau_1)(\\tau_3-\\tau_2)},\\\\[1em]\n",
    "f(t\\mid \\tau_1, \\tau_2, \\tau_3, \\tau_4, 4) &=\n",
    "\\frac{\\tau_1^2\\,\\mathrm{e}^{-t/\\tau_1}}{(\\tau_1 - \\tau_2)(\\tau_1-\\tau_3)(\\tau_1 - \\tau_4)}\n",
    "+\\frac{\\tau_2^2\\,\\mathrm{e}^{-t/\\tau_2}}{(\\tau_2 - \\tau_1)(\\tau_2-\\tau_3)(\\tau_2 - \\tau_4)} \\\\[1em]\n",
    "&\\;\\;\\;\\;\\;+\\frac{\\tau_3^2\\,\\mathrm{e}^{-t/\\tau_3}}{(\\tau_3 - \\tau_1)(\\tau_3-\\tau_2)(\\tau_3 - \\tau_4)}\n",
    "+\\frac{\\tau_4^2\\,\\mathrm{e}^{-t/\\tau_4}}{(\\tau_4 - \\tau_1)(\\tau_4-\\tau_2)(\\tau_4 - \\tau_3)}.\n",
    "\\end{align}\n",
    "\n",
    "Note that these probability distributions assume that none of the $\\tau_j$'s are equal, and you should explicitly ensure this in your calculations (*Hint*: You may want to read the solutions to [Homework 5.2](http://bebi103.caltech.edu/2024b_hw_solutions/hw5.2_solution.html), in particular the Stan model for the two-step model, to see how to implement this.)\n",
    "\n",
    "**a)** Build a model for arbitrary $m$ and code it up in Stan. Draw samples out of this model for values of $m$ ranging from 1 to whatever you think is reasonable.\n",
    "\n",
    "**b)** Compare the models for various $m$. Which value of $m$ gives the best predictive model based on a model comparison problem?\n",
    "\n",
    "**c)** This is another example where I think the model comparison is unnecessary and should actually be avoided. Without directly doing a model comparison by computing a LOO or WAIC like you did in part (b), interpret the results of your sampling to advocate for a physical picture of how catastrophe proceeds."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<br />"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}