{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Homework 7.2: Models for microtubule catastrophe (70 pts)\n", "\n", "[Data set download](https://s3.amazonaws.com/bebi103.caltech.edu/data/gardner_mt_catastrophe_only_tubulin.csv)\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have thoroughly investigated the process by which microtubules undergo catastrophe using data from the [Gardner, Zanic, et al. paper](../protected/papers/gardner_2011.pdf). We used Exponential, Gamma, Weibull, and our custom two-step distributions to model the catastrophe times, and we have consistently found that the Gamma model works best. It does have a shortcoming, though: it does not *directly* match a story that might arise from chemical kinetics, which we strongly suspect regulates microtubule catastrophe. We expect an *integer* number $m$ of Poisson processes to arrive sequentially in order for catastrophe to occur. The Exponential and two-step models, which we have already considered, are the special cases $m = 1$ and $m = 2$, respectively. In this problem, we will consider a model for arbitrary $m$ and assess which $m$ gives the most plausible generative model. We will again use the experiment run with a tubulin concentration of 12 µM as our data set.\n", "\n", "The probability density function for the time $t$ to catastrophe triggered by the arrival of $m$ Poisson processes may be shown to be\n", "\n", "\\begin{align}\n", "f(t\\mid \\boldsymbol{\\beta}, m) = \\sum_{j=1}^m \\frac{\\mathrm{e}^{-\\beta_j t}}{\\beta_j^{m-2}\\,\\prod_{k=1,k\\ne j}^m \\frac{\\beta_k - \\beta_j}{\\beta_j\\beta_k}}.\n", "\\end{align}\n", "\n", "The expression is a bit cleaner when written in terms of $\\tau_i = 1/\\beta_i$:\n", "\n", "\\begin{align}\n", "f(t\\mid \\boldsymbol{\\tau}, m) = \\sum_{j=1}^m \\frac{\\tau_j^{m-2}\\,\\mathrm{e}^{-t/\\tau_j}}{\\prod_{k=1,k\\ne j}^m (\\tau_j - \\tau_k)}.\n", "\\end{align}\n", "\n", "For clarity, the probability density functions for the first few values of $m$ are\n", "\n", "\\begin{align}\n", "f(t\\mid \\tau_1, 1) &= \\frac{\\mathrm{e}^{-t/\\tau_1}}{\\tau_1},\\\\[1em]\n", "f(t\\mid \\tau_1, \\tau_2, 2) &=\n", "\\frac{\\mathrm{e}^{-t/\\tau_1}}{\\tau_1 - \\tau_2} + 
\\frac{\\mathrm{e}^{-t/\\tau_2}}{\\tau_2 - \\tau_1}\n", "= \\frac{\\mathrm{e}^{-t/\\tau_2} - \\mathrm{e}^{-t/\\tau_1}}{\\tau_2 - \\tau_1}, \\\\[1em]\n", "f(t\\mid \\tau_1, \\tau_2, \\tau_3, 3) &=\n", "\\frac{\\tau_1\\,\\mathrm{e}^{-t/\\tau_1}}{(\\tau_1 - \\tau_2)(\\tau_1-\\tau_3)}\n", "+\\frac{\\tau_2\\,\\mathrm{e}^{-t/\\tau_2}}{(\\tau_2 - \\tau_1)(\\tau_2-\\tau_3)}\n", "+\\frac{\\tau_3\\,\\mathrm{e}^{-t/\\tau_3}}{(\\tau_3 - \\tau_1)(\\tau_3-\\tau_2)},\\\\[1em]\n", "f(t\\mid \\tau_1, \\tau_2, \\tau_3, \\tau_4, 4) &=\n", "\\frac{\\tau_1^2\\,\\mathrm{e}^{-t/\\tau_1}}{(\\tau_1 - \\tau_2)(\\tau_1-\\tau_3)(\\tau_1 - \\tau_4)}\n", "+\\frac{\\tau_2^2\\,\\mathrm{e}^{-t/\\tau_2}}{(\\tau_2 - \\tau_1)(\\tau_2-\\tau_3)(\\tau_2 - \\tau_4)} \\\\[1em]\n", "&\\;\\;\\;\\;\\;+\\frac{\\tau_3^2\\,\\mathrm{e}^{-t/\\tau_3}}{(\\tau_3 - \\tau_1)(\\tau_3-\\tau_2)(\\tau_3 - \\tau_4)}\n", "+\\frac{\\tau_4^2\\,\\mathrm{e}^{-t/\\tau_4}}{(\\tau_4 - \\tau_1)(\\tau_4-\\tau_2)(\\tau_4 - \\tau_3)}.\n", "\\end{align}\n", "\n", "Note that these probability distributions assume that all of the $\\tau_j$'s are distinct, and you should explicitly ensure this in your calculations. (*Hint*: You may want to read the solutions to [Homework 5.2](http://bebi103.caltech.edu/2024b_hw_solutions/hw5.2_solution.html), in particular the Stan model for the two-step model, to see how to implement this.)\n", "\n", "**a)** Build a model for arbitrary $m$ and code it up in Stan. Draw samples from this model for values of $m$ ranging from 1 up to whatever maximum you think is reasonable.\n", "\n", "**b)** Compare the models for various $m$. Which value of $m$ gives the best predictive model according to a model comparison?\n", "\n", "**c)** This is another example where I think the model comparison is unnecessary and should actually be avoided. 
Without directly performing a model comparison by computing LOO or WAIC as you did in part (b), interpret the results of your sampling to advocate for a physical picture of how catastrophe proceeds." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" } }, "nbformat": 4, "nbformat_minor": 4 }