{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Homework 9.2: Data transformations and parameter estimation (30 pts)\n",
    "\n",
    "[Data download](https://s3.amazonaws.com/bebi103.caltech.edu/data/fret_binding_curve.csv)\n",
    "\n",
    "<hr />"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We often want to ascertain how tightly two proteins are bound by measuring their dissociation constant, $K_d$. This is usually done by doing a titration experiment and then performing a maximum likelihood estimate of $K_d$. For example, imagine two proteins, $a$ and $b$ may bind to each other in the reaction\n",
    "\n",
    "\\begin{align}\n",
    "ab \\rightleftharpoons a + b\n",
    "\\end{align}\n",
    "\n",
    "with dissociation constant $K_d$. At equilibrium\n",
    "\n",
    "\\begin{align}\n",
    "K_d = \\frac{c_a\\,c_b}{c_{ab}},\n",
    "\\end{align}\n",
    "\n",
    "were $c_i$ is the concentration of species $i$. If we add known amounts of $a$ and $b$ to a solution such that the total concentration of a is $c_a^0$ and the total concentration of b is $c_b^0$, we can compute the equilibrium concentrations of all species. Specifically, in addition to the equation above, we have conservation of mass equations,\n",
    "\n",
    "\\begin{align}\n",
    "c_a^0 &= c_a + c_{ab}\\\\[1em]\n",
    "c_b^0 &= c_b + c_{ab},\n",
    "\\end{align}\n",
    "\n",
    "fully specifying the problem. We can solve the three equations for $c_{ab}$ in terms of the known quantities $c_a^0$ and $c_b^0$, along with the parameter we are trying to measure, $K_d$. We get\n",
    "\n",
    "\\begin{align}\n",
    "c_{ab} = \\frac{2c_a^0\\,c_b^0}{K_d+c_a^0+c_b^0 + \\sqrt{\\left(K_d+c_a^0+c_b^0\\right)^2 - 4c_a^0\\,c_b^0}}.\n",
    "\\end{align}\n",
    "\n",
    "The technique, then, is to hold $c_a^0$ fixed and measure $c_{ab}$ for various $c_b^0$. We can then perform devise a variate-covariate model and obtain an MLE of $K_d$.\n",
    "\n",
    "In order to do this, though, we need some readout of $c_{ab}$. For this problem, we will use FRET (fluorescence resonance energy transfer) to monitor how much of $a$ is bound to $b$. Specifically, we take $a$ with a fluorophore and $b$ is a receptor. When the two are unbound, we get a fluorescence signal per molecule of $f_0$. When they are bound, the receptor absorbs the light coming out of the fluorophore, so we get less fluorescence per molecule, which we will call $f_q$ (for \"quenched\"). Let $f$ be the total per-fluorophore fluorescence signal. Then, the measured fluorescence signal, $F$, is\n",
    "\n",
    "\\begin{align}\n",
    "F = c_a^0\\,V f = \\left(c_a \\,f_0 + c_{ab}\\, f_q\\right)V,\n",
    "\\end{align}\n",
    "\n",
    "where $V$ is the reaction volume.\n",
    "\n",
    "As is commonly done by biochemists, we can define a FRET efficiency, $e$, as\n",
    "\n",
    "\\begin{align}\n",
    "e = 1 - \\frac{f}{f_0}.\n",
    "\\end{align}\n",
    "\n",
    "If we measure $F_0$, the measured fluorescence when there is no b protein in the sample, we can compute the FRET efficiency from the measured values $F$ and $F_0$\n",
    "\n",
    "\\begin{align}\n",
    "e =  1 - \\frac{c_a^0\\,V f}{c_a^0\\,Vf_0} = 1 - \\frac{F}{F_0}.\n",
    "\\end{align}\n",
    "\n",
    "Substituting in our expressions for $F$ and $F_0$, we get\n",
    "\n",
    "\\begin{align}\n",
    "e = 1 - \\frac{\\left(c_a \\,f_0 + c_{ab}\\, f_q\\right)V}{c_a^0\\,V f_0}\n",
    "= 1 - \\frac{c_a}{c_a^0} - \\frac{c_{ab}}{c_a^0}\\,\\frac{f_q}{f_0}.\n",
    "\\end{align}\n",
    "\n",
    "Using the fact that $c_a^0 = c_a + c_{ab}$, this becomes\n",
    "\n",
    "\\begin{align}\n",
    "e = \\left(1-\\frac{f_q}{f_0}\\right)\\frac{c_{ab}}{c_a^0}.\n",
    "\\end{align}\n",
    "\n",
    "In other words, the FRET efficiency is proportional to the fraction of a that is bound, or\n",
    "\n",
    "\\begin{align}\n",
    "e = \\alpha \\, \\frac{c_{ab}}{c_a^0} = \\frac{2\\alpha\\,c_b^0}{K_d+c_a^0+c_b^0 + \\sqrt{\\left(K_d+c_a^0+c_b^0\\right)^2 - 4c_a^0\\,c_b^0}},\n",
    "\\end{align}\n",
    "\n",
    "where $\\alpha = 1 - f_q/f_0$. Biochemists then typically consider $e$ to be a covariate and $c_b^0$ to be a variate and then obtain MLEs for the parameters $\\alpha$ and $K_d$.\n",
    "\n",
    "**a)** Load in the data for one of these FRET efficiency titration curves. You can download the data set [here](https://s3.amazonaws.com/bebi103.caltech.edu/data/fret_binding_curve.csv). These are real data from here on campus, collected by a former student in this class, Emily Blythe. They were never published, but were preliminary experiments for [this publication](https://doi.org/10.1016/j.str.2019.09.011). To get the fluorescence for each measurement, you need to subtract the background fluorescence. Do that, and then also compute the FRET efficiency.\n",
    "\n",
    "**b)** One could use a variate-covariate model based on the typical approach used by biochemists using the FRET efficiency as described above to obtain estimates for $K_d$ and $\\alpha$. Alternatively, one could instead directly use the measured (background-subtracted) fluorescence and build a variate-covariate model around the equation\n",
    "\n",
    "\\begin{align}\n",
    "F = \\left(c_a \\,f_0 + c_{ab}\\, f_q\\right)V,\n",
    "\\end{align}\n",
    "\n",
    "where there are now three parameters, $K_d$, $f_0V$, and $f_qV$, from which $\\alpha$ may be calculated as $\\alpha = 1 - f_qV/f_0V$. Which of these two approaches is preferred, and why?\n",
    "\n",
    "**c)** Provide MLEs for $\\alpha$ and $K_d$, along with confidence intervals, and display a graphical model assessment."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}