{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Data transformations and parameter estimation\n",
        "\n",
        "[Data download](https://s3.amazonaws.com/bebi103.caltech.edu/data/fret_binding_curve.csv)\n",
        "\n",
        "<hr />"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We often want to ascertain how tightly two proteins are bound by measuring their dissociation constant, $K_d$. This is usually done by performing a titration experiment and then computing a maximum likelihood estimate (MLE) of $K_d$. For example, imagine two proteins, $a$ and $b$, that may bind to each other in the reaction\n",
        "\n",
        "\\begin{align}\n",
        "ab \\rightleftharpoons a + b\n",
        "\\end{align}\n",
        "\n",
        "with dissociation constant $K_d$. At equilibrium\n",
        "\n",
        "\\begin{align}\n",
        "K_d = \\frac{c_a\\,c_b}{c_{ab}},\n",
        "\\end{align}\n",
        "\n",
        "where $c_i$ is the concentration of species $i$. If we add known amounts of $a$ and $b$ to a solution such that the total concentration of $a$ is $c_a^0$ and the total concentration of $b$ is $c_b^0$, we can compute the equilibrium concentrations of all species. Specifically, in addition to the equation above, we have the conservation of mass equations,\n",
        "\n",
        "\\begin{align}\n",
        "c_a^0 &= c_a + c_{ab}\\\\[1em]\n",
        "c_b^0 &= c_b + c_{ab},\n",
        "\\end{align}\n",
        "\n",
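        "fully specifying the problem. We can solve the three equations for $c_{ab}$ in terms of the known quantities $c_a^0$ and $c_b^0$, along with the parameter we are trying to measure, $K_d$. Substituting $c_a = c_a^0 - c_{ab}$ and $c_b = c_b^0 - c_{ab}$ into the expression for $K_d$ gives a quadratic equation in $c_{ab}$,\n",
        "\n",
        "\\begin{align}\n",
        "c_{ab}^2 - \\left(K_d+c_a^0+c_b^0\\right)c_{ab} + c_a^0\\,c_b^0 = 0.\n",
        "\\end{align}\n",
        "\n",
        "Its physically meaningful root is the smaller one, the only root satisfying $c_{ab} \\le \\min\\left(c_a^0, c_b^0\\right)$, which can be written as\n",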
        "\n",
        "\\begin{align}\n",
        "c_{ab} = \\frac{2c_a^0\\,c_b^0}{K_d+c_a^0+c_b^0 + \\sqrt{\\left(K_d+c_a^0+c_b^0\\right)^2 - 4c_a^0\\,c_b^0}}.\n",
        "\\end{align}\n",
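        "\n",
        "As a sketch of how this expression might be evaluated numerically, the code below assumes NumPy; the function name `c_ab_equilibrium` and all numerical values are hypothetical, chosen only for illustration. It computes $c_{ab}$, recovers the free concentrations from conservation of mass, and checks that they reproduce $K_d$. The form above, with the square root in the denominator, is the rationalized version of $\\frac{1}{2}\\left[\\left(K_d+c_a^0+c_b^0\\right) - \\sqrt{\\left(K_d+c_a^0+c_b^0\\right)^2 - 4c_a^0\\,c_b^0}\\right]$ and avoids the floating-point cancellation that form suffers when $4c_a^0\\,c_b^0$ is small compared with $\\left(K_d+c_a^0+c_b^0\\right)^2$.\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "\n",
        "\n",
        "def c_ab_equilibrium(c_a0, c_b0, Kd):\n",
        "    # Equilibrium concentration of the complex ab, using the rationalized root.\n",
        "    c_a0 = np.asarray(c_a0, dtype=float)\n",
        "    c_b0 = np.asarray(c_b0, dtype=float)\n",
        "    s = Kd + c_a0 + c_b0\n",
        "    return 2.0 * c_a0 * c_b0 / (s + np.sqrt(s**2 - 4.0 * c_a0 * c_b0))\n",
        "\n",
        "\n",
        "# Hypothetical values in consistent concentration units (e.g., nM).\n",
        "c_a0, Kd = 50.0, 20.0\n",
        "c_b0 = np.array([0.0, 10.0, 50.0, 200.0, 1000.0])\n",
        "c_ab = c_ab_equilibrium(c_a0, c_b0, Kd)\n",
        "\n",
        "# Free concentrations follow from conservation of mass.\n",
        "c_a = c_a0 - c_ab\n",
        "c_b = c_b0 - c_ab\n",
        "\n",
        "# Wherever some complex forms, c_a * c_b / c_ab should reproduce Kd.\n",
        "bound = c_ab > 0\n",
        "print(np.allclose(c_a[bound] * c_b[bound] / c_ab[bound], Kd))  # True\n",
        "```\n",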
        "\n",
        "The technique, then, is to hold $c_a^0$ fixed and measure $c_{ab}$ for various $c_b^0$. We can then devise a variate-covariate model and obtain an MLE of $K_d$.\n",
        "\n",
        "In order to do this, though, we need some readout of $c_{ab}$. For this problem, we will use FRET (fluorescence resonance energy transfer) to monitor how much of $a$ is bound to $b$. Specifically, $a$ is labeled with a fluorophore and $b$ with an acceptor. When the two are unbound, we get a fluorescence signal per molecule of $f_0$. When they are bound, the acceptor takes up energy from the excited fluorophore, so we get less fluorescence per molecule, which we will call $f_q$ (for \"quenched\"). Let $f$ be the average fluorescence signal per fluorophore. Then the measured fluorescence signal, $F$, is\n",
        "\n",
        "\\begin{align}\n",
        "F = c_a^0\\,V f = \\left(c_a \\,f_0 + c_{ab}\\, f_q\\right)V,\n",
        "\\end{align}\n",
        "\n",
        "where $V$ is the reaction volume.\n",
        "\n",
        "As is commonly done by biochemists, we can define a FRET efficiency, $e$, as\n",
        "\n",
        "\\begin{align}\n",
        "e = 1 - \\frac{f}{f_0}.\n",
        "\\end{align}\n",
        "\n",
        "If we measure $F_0$, the fluorescence when there is no $b$ protein in the sample (so that all of $a$ is unbound and $F_0 = c_a^0\\,V f_0$), we can compute the FRET efficiency from the measured values $F$ and $F_0$:\n",
        "\n",
        "\\begin{align}\n",
        "e =  1 - \\frac{c_a^0\\,V f}{c_a^0\\,Vf_0} = 1 - \\frac{F}{F_0}.\n",
        "\\end{align}\n",
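        "\n",
        "A minimal sketch of this calculation follows, assuming NumPy and a single scalar background reading that is subtracted from every measurement. All numbers are made up for illustration and are not taken from the data set. Both $F$ and $F_0$ must be background-subtracted before forming the ratio, because an additive background does not cancel in $F/F_0$.\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "\n",
        "# Made-up raw readings, arbitrary units (illustration only).\n",
        "background = 50.0                                # assumed background signal\n",
        "F0_raw = 1050.0                                  # a only, no b protein\n",
        "F_raw = np.array([1050.0, 950.0, 750.0, 650.0])  # increasing c_b^0\n",
        "\n",
        "# Subtract the background from every measurement, then form e = 1 - F/F0.\n",
        "F0 = F0_raw - background\n",
        "F = F_raw - background\n",
        "e = 1.0 - F / F0\n",
        "print(e)\n",
        "```\n",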
        "\n",
        "Substituting in our expressions for $F$ and $F_0$, we get\n",
        "\n",
        "\\begin{align}\n",
        "e = 1 - \\frac{\\left(c_a \\,f_0 + c_{ab}\\, f_q\\right)V}{c_a^0\\,V f_0}\n",
        "= 1 - \\frac{c_a}{c_a^0} - \\frac{c_{ab}}{c_a^0}\\,\\frac{f_q}{f_0}.\n",
        "\\end{align}\n",
        "\n",
        "Using the fact that $c_a^0 = c_a + c_{ab}$, this becomes\n",
        "\n",
        "\\begin{align}\n",
        "e = \\left(1-\\frac{f_q}{f_0}\\right)\\frac{c_{ab}}{c_a^0}.\n",
        "\\end{align}\n",
        "\n",
        "In other words, the FRET efficiency is proportional to the fraction of $a$ that is bound, or\n",
        "\n",
        "\\begin{align}\n",
        "e = \\alpha \\, \\frac{c_{ab}}{c_a^0} = \\frac{2\\alpha\\,c_b^0}{K_d+c_a^0+c_b^0 + \\sqrt{\\left(K_d+c_a^0+c_b^0\\right)^2 - 4c_a^0\\,c_b^0}},\n",
        "\\end{align}\n",
        "\n",
        "where $\\alpha = 1 - f_q/f_0$. Biochemists typically treat $e$ as the variate (and $c_a^0$ and $c_b^0$ as covariates) and obtain MLEs for the parameters $\\alpha$ and $K_d$.\n",
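        "\n",
        "The derivation can be checked numerically. The sketch below assumes NumPy; the function names and parameter values are hypothetical. It builds $F$ and $F_0$ directly from the species concentrations and confirms that $1 - F/F_0$ equals $\\alpha\\,c_{ab}/c_a^0$ with $\\alpha = 1 - f_q/f_0$.\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "\n",
        "\n",
        "def c_ab_equilibrium(c_a0, c_b0, Kd):\n",
        "    # Rationalized root of the binding quadratic.\n",
        "    s = Kd + c_a0 + c_b0\n",
        "    return 2.0 * c_a0 * c_b0 / (s + np.sqrt(s**2 - 4.0 * c_a0 * c_b0))\n",
        "\n",
        "\n",
        "def fret_efficiency_model(c_b0, c_a0, alpha, Kd):\n",
        "    # e = alpha * c_ab / c_a^0\n",
        "    return alpha * c_ab_equilibrium(c_a0, c_b0, Kd) / c_a0\n",
        "\n",
        "\n",
        "# Hypothetical concentrations and per-molecule signals (illustration only).\n",
        "c_a0, Kd = 50.0, 20.0\n",
        "f0, fq, V = 2.0, 0.5, 1.0\n",
        "c_b0 = np.linspace(0.0, 1000.0, 11)\n",
        "\n",
        "# Direct route: F and F0 from the species concentrations.\n",
        "c_ab = c_ab_equilibrium(c_a0, c_b0, Kd)\n",
        "F = ((c_a0 - c_ab) * f0 + c_ab * fq) * V\n",
        "F0 = c_a0 * f0 * V\n",
        "e_direct = 1.0 - F / F0\n",
        "\n",
        "# Derived route: e = alpha * c_ab / c_a^0 with alpha = 1 - fq / f0.\n",
        "alpha = 1.0 - fq / f0\n",
        "e_model = fret_efficiency_model(c_b0, c_a0, alpha, Kd)\n",
        "print(np.allclose(e_direct, e_model))  # True\n",
        "```\n",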
        "\n",
        "**a)** Load in the data for one of these FRET efficiency titration curves. You can download the data set [here](https://s3.amazonaws.com/bebi103.caltech.edu/data/fret_binding_curve.csv). These are real data from Caltech's campus, collected by a former student in my data analysis class, Emily Blythe. They were never published, but were preliminary experiments for [this publication](https://doi.org/10.1016/j.str.2019.09.011). To get the fluorescence for each measurement, you need to subtract the background fluorescence. Do that, and then also compute the FRET efficiency.\n",
        "\n",
        "**b)** One could follow the typical biochemists' approach described above and build a variate-covariate model around the FRET efficiency to obtain estimates for $K_d$ and $\\alpha$. Alternatively, one could directly use the measured (background-subtracted) fluorescence and build a variate-covariate model around the equation\n",
        "\n",
        "\\begin{align}\n",
        "F = \\left(c_a \\,f_0 + c_{ab}\\, f_q\\right)V,\n",
        "\\end{align}\n",
        "\n",
        "where there are now three parameters, $K_d$, $f_0V$, and $f_qV$, from which $\\alpha$ may be calculated as $\\alpha = 1 - (f_qV)/(f_0V)$. Which of these two approaches is preferred, and why?\n",
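        "\n",
        "For concreteness, here is a sketch of the fluorescence model written in terms of the three parameters, assuming NumPy; the function names and values are hypothetical. Note that $f_0$, $f_q$, and $V$ enter only through the products $f_0V$ and $f_qV$, so $V$ cancels in $\\alpha$. At $c_b^0 = 0$ the model returns $c_a^0\\,f_0V$, which is $F_0$, and for large $c_b^0$ it approaches $c_a^0\\,f_qV$.\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "\n",
        "\n",
        "def fluorescence_model(c_b0, c_a0, Kd, f0V, fqV):\n",
        "    # F = (c_a f0 + c_ab fq) V, expressed with the products f0V and fqV.\n",
        "    s = Kd + c_a0 + c_b0\n",
        "    c_ab = 2.0 * c_a0 * c_b0 / (s + np.sqrt(s**2 - 4.0 * c_a0 * c_b0))\n",
        "    # Free a follows from conservation of mass: c_a = c_a^0 - c_ab.\n",
        "    return (c_a0 - c_ab) * f0V + c_ab * fqV\n",
        "\n",
        "\n",
        "def alpha_from_products(f0V, fqV):\n",
        "    # alpha = 1 - (fq V) / (f0 V); the volume cancels.\n",
        "    return 1.0 - fqV / f0V\n",
        "\n",
        "\n",
        "# Hypothetical parameter values (illustration only).\n",
        "c_a0, Kd, f0V, fqV = 50.0, 20.0, 2.0, 0.5\n",
        "c_b0 = np.array([0.0, 50.0, 1000.0, 1.0e6])\n",
        "F = fluorescence_model(c_b0, c_a0, Kd, f0V, fqV)\n",
        "print(alpha_from_products(f0V, fqV))  # 0.75\n",
        "```\n",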
        "\n",
        "**c)** Provide MLEs for $\\alpha$ and $K_d$, along with confidence intervals, and display a graphical model assessment."
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "default",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.13.5"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 4
}