{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Sera concentrations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Choosing sera concentrations is an important consideration for DMS selections. We'll use simulated data to estimate the optimal set of sera concentrations that provides the best performance while minimizing the **number of concentrations** (or selection experiments) required."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import pickle\n",
    "\n",
    "import altair as alt\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import polyclonal"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, we read in a simulated \"noisy\" dataset measured at six sera concentrations. The variants in this library were simulated to contain a Poisson-distributed number of mutations, with an average of three mutations per gene. The variants also generally span a wide range of escape fractions across the different sera concentrations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>library</th>\n",
       "      <th>barcode</th>\n",
       "      <th>concentration</th>\n",
       "      <th>prob_escape</th>\n",
       "      <th>aa_substitutions</th>\n",
       "      <th>IC90</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>avg3muts</td>\n",
       "      <td>AAAACTGCTGAGGAGA</td>\n",
       "      <td>0.125</td>\n",
       "      <td>0.013540</td>\n",
       "      <td></td>\n",
       "      <td>0.08212</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>avg3muts</td>\n",
       "      <td>AAAAGCAGGCTACTCT</td>\n",
       "      <td>0.125</td>\n",
       "      <td>0.066420</td>\n",
       "      <td></td>\n",
       "      <td>0.08212</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>avg3muts</td>\n",
       "      <td>AAAAGCTATAGGTGCC</td>\n",
       "      <td>0.125</td>\n",
       "      <td>0.023120</td>\n",
       "      <td></td>\n",
       "      <td>0.08212</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>avg3muts</td>\n",
       "      <td>AAAAGGTATTAGTGGC</td>\n",
       "      <td>0.125</td>\n",
       "      <td>0.005456</td>\n",
       "      <td></td>\n",
       "      <td>0.08212</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>avg3muts</td>\n",
       "      <td>AAAAGTGCCTTCGTTA</td>\n",
       "      <td>0.125</td>\n",
       "      <td>0.054760</td>\n",
       "      <td></td>\n",
       "      <td>0.08212</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>239995</th>\n",
       "      <td>avg3muts</td>\n",
       "      <td>CTTAAAATAGCTGGTC</td>\n",
       "      <td>0.250</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>Y508W</td>\n",
       "      <td>0.08212</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>239996</th>\n",
       "      <td>avg3muts</td>\n",
       "      <td>CTTAAAATAGCTGGTC</td>\n",
       "      <td>0.500</td>\n",
       "      <td>0.026480</td>\n",
       "      <td>Y508W</td>\n",
       "      <td>0.08212</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>239997</th>\n",
       "      <td>avg3muts</td>\n",
       "      <td>CTTAAAATAGCTGGTC</td>\n",
       "      <td>1.000</td>\n",
       "      <td>0.012260</td>\n",
       "      <td>Y508W</td>\n",
       "      <td>0.08212</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>239998</th>\n",
       "      <td>avg3muts</td>\n",
       "      <td>CTTAAAATAGCTGGTC</td>\n",
       "      <td>2.000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>Y508W</td>\n",
       "      <td>0.08212</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>239999</th>\n",
       "      <td>avg3muts</td>\n",
       "      <td>CTTAAAATAGCTGGTC</td>\n",
       "      <td>4.000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>Y508W</td>\n",
       "      <td>0.08212</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>240000 rows × 6 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "         library           barcode  concentration  prob_escape  \\\n",
       "0       avg3muts  AAAACTGCTGAGGAGA          0.125     0.013540   \n",
       "1       avg3muts  AAAAGCAGGCTACTCT          0.125     0.066420   \n",
       "2       avg3muts  AAAAGCTATAGGTGCC          0.125     0.023120   \n",
       "3       avg3muts  AAAAGGTATTAGTGGC          0.125     0.005456   \n",
       "4       avg3muts  AAAAGTGCCTTCGTTA          0.125     0.054760   \n",
       "...          ...               ...            ...          ...   \n",
       "239995  avg3muts  CTTAAAATAGCTGGTC          0.250     0.000000   \n",
       "239996  avg3muts  CTTAAAATAGCTGGTC          0.500     0.026480   \n",
       "239997  avg3muts  CTTAAAATAGCTGGTC          1.000     0.012260   \n",
       "239998  avg3muts  CTTAAAATAGCTGGTC          2.000     0.000000   \n",
       "239999  avg3muts  CTTAAAATAGCTGGTC          4.000     0.000000   \n",
       "\n",
       "       aa_substitutions     IC90  \n",
       "0                        0.08212  \n",
       "1                        0.08212  \n",
       "2                        0.08212  \n",
       "3                        0.08212  \n",
       "4                        0.08212  \n",
       "...                 ...      ...  \n",
       "239995            Y508W  0.08212  \n",
       "239996            Y508W  0.08212  \n",
       "239997            Y508W  0.08212  \n",
       "239998            Y508W  0.08212  \n",
       "239999            Y508W  0.08212  \n",
       "\n",
       "[240000 rows x 6 columns]"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "noisy_data = (\n",
    "    pd.read_csv(\"RBD_variants_escape_noisy.csv\", na_filter=None)\n",
    "    .query(\"library == 'avg3muts'\")\n",
    "    .reset_index(drop=True)\n",
    ")\n",
    "noisy_data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "While the concentrations are in arbitrary units (`0.125, 0.25, 0.5, 1, 2, 4`), we can describe them in terms of their ICXX's against wildtype virus. So, we can interpret each as a sera concentration that neutralizes XX % of wildtype viruses. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>barcode</th>\n",
       "      <th>ICxx_against_wt</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>concentration</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0.125</th>\n",
       "      <td>AAAAAATGTTCTATCC</td>\n",
       "      <td>95.141</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0.125</th>\n",
       "      <td>AAAAACAATCCGGACT</td>\n",
       "      <td>95.141</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0.125</th>\n",
       "      <td>AAAAACGCGGTCACTT</td>\n",
       "      <td>95.141</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0.125</th>\n",
       "      <td>AAAAACTTGGCTAGCT</td>\n",
       "      <td>95.141</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0.125</th>\n",
       "      <td>AAAAAGCAAGGCCCAG</td>\n",
       "      <td>95.141</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4.000</th>\n",
       "      <td>TTTGTATGGTCCATAT</td>\n",
       "      <td>99.999</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4.000</th>\n",
       "      <td>TTTGTCTCGAATGGTG</td>\n",
       "      <td>99.999</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4.000</th>\n",
       "      <td>TTTTAAGCTCATACGC</td>\n",
       "      <td>99.999</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4.000</th>\n",
       "      <td>TTTTATCCACCGAACT</td>\n",
       "      <td>99.999</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4.000</th>\n",
       "      <td>TTTTCGATCCTTGTCA</td>\n",
       "      <td>99.999</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>137868 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                        barcode  ICxx_against_wt\n",
       "concentration                                   \n",
       "0.125          AAAAAATGTTCTATCC           95.141\n",
       "0.125          AAAAACAATCCGGACT           95.141\n",
       "0.125          AAAAACGCGGTCACTT           95.141\n",
       "0.125          AAAAACTTGGCTAGCT           95.141\n",
       "0.125          AAAAAGCAAGGCCCAG           95.141\n",
       "...                         ...              ...\n",
       "4.000          TTTGTATGGTCCATAT           99.999\n",
       "4.000          TTTGTCTCGAATGGTG           99.999\n",
       "4.000          TTTTAAGCTCATACGC           99.999\n",
       "4.000          TTTTATCCACCGAACT           99.999\n",
       "4.000          TTTTCGATCCTTGTCA           99.999\n",
       "\n",
       "[137868 rows x 2 columns]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "wt_data_icxx_df = (\n",
    "    pd.read_csv(\"RBD_variants_escape_exact.csv\", na_filter=None)\n",
    "    .query(\"aa_substitutions == ''\")\n",
    "    .reset_index(drop=True)\n",
    "    .drop(columns=\"library\")\n",
    "    .drop_duplicates()\n",
    "    .assign(ICxx_against_wt=lambda x: round((1 - x[\"prob_escape\"]) * 100, 3))\n",
    "    .drop(columns=[\"aa_substitutions\", \"prob_escape\", \"IC90\"])\n",
    "    .set_index(\"concentration\")\n",
    ")\n",
    "\n",
    "display(wt_data_icxx_df)\n",
    "wt_data_icxx = wt_data_icxx_df[\"ICxx_against_wt\"].to_dict()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we'll fit multiple `Polyclonal` models to data measured at **single** concentrations. We'll initialize each `Polyclonal` model with the same values. We know from [prior work](https://www.nature.com/articles/s41467-021-24435-8) the three most important epitopes and a key mutation in each, so we use this prior knowledge to “seed” initial guesses that assign large escape values to a key site in each epitope:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- site 417 for class 1 epitope, which is often the least important"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- site 484 for class 2 epitope, which is often the dominant one"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- site 444 for class 3 epitope, which is often the second most dominant one"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Additionally, we'll store fit models as [pickle](https://docs.python.org/3/library/pickle.html#module-pickle) files, so that we can conveniently load them in the future without having to fit again."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Model with [0.125] was already fit.\n",
      "Model with [0.25] was already fit.\n",
      "Model with [0.5] was already fit.\n",
      "Model with [1] was already fit.\n",
      "Model with [2] was already fit.\n",
      "Model with [4] was already fit.\n"
     ]
    }
   ],
   "source": [
    "conc_sets = [[0.125], [0.25], [0.5], [1], [2], [4]]\n",
    "\n",
    "# Make a directory to house pickled models\n",
    "os.makedirs(\"fit_polyclonal_models\", exist_ok=True)\n",
    "\n",
    "\n",
    "def fit_polyclonal(conc_set):\n",
    "    \"\"\"\n",
    "    Fit `Polyclonal` model with data measured for a specific concentration set.\n",
    "    Returns fit `Polyclonal` object.\n",
    "    \"\"\"\n",
    "    poly_abs = polyclonal.Polyclonal(\n",
    "        data_to_fit=noisy_data.query(f\"concentration in {conc_set}\"),\n",
    "        activity_wt_df=pd.DataFrame.from_records(\n",
    "            [\n",
    "                (\"1\", 1.0),\n",
    "                (\"2\", 3.0),\n",
    "                (\"3\", 2.0),\n",
    "            ],\n",
    "            columns=[\"epitope\", \"activity\"],\n",
    "        ),\n",
    "        site_escape_df=pd.DataFrame.from_records(\n",
    "            [\n",
    "                (\"1\", 417, 10.0),\n",
    "                (\"2\", 484, 10.0),\n",
    "                (\"3\", 444, 10.0),\n",
    "            ],\n",
    "            columns=[\"epitope\", \"site\", \"escape\"],\n",
    "        ),\n",
    "        data_mut_escape_overlap=\"fill_to_data\",\n",
    "    )\n",
    "    poly_abs.fit(reg_escape_weight=0.01, reg_uniqueness2_weight=0)\n",
    "    return poly_abs\n",
    "\n",
    "\n",
    "# Store all fit models in a dictionary for future lookup\n",
    "fit_models = {}\n",
    "\n",
    "for s in conc_sets:\n",
    "    # These are the keys for fit models\n",
    "    model_string = f\"noisy_{s}conc_3muts\"\n",
    "\n",
    "    # If the pickled model exists in fit_polyclonal_models directory,\n",
    "    # load it and update fit_models\n",
    "    if os.path.exists(f\"fit_polyclonal_models/{model_string}.pkl\") is True:\n",
    "        model = pickle.load(open(f\"fit_polyclonal_models/{model_string}.pkl\", \"rb\"))\n",
    "        fit_models.update({model_string: model})\n",
    "        print(f\"Model with {s} was already fit.\")\n",
    "    else:\n",
    "        # Else, fit a model using fit_polyclonal(), save it to the\n",
    "        # fit_polyclonal_models directory, and update fit_models\n",
    "        model = fit_polyclonal(s)\n",
    "        fit_models.update({model_string: model})\n",
    "        pickle.dump(model, open(f\"fit_polyclonal_models/{model_string}.pkl\", \"wb\"))\n",
    "        print(f\"Model with {s} fit and saved.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can look at the correlation between the “true” and inferred mutation-escape values, $\\beta_{m,e}$, for the fit models. These mutation-escape values represent the extent to which mutations mediate escape from specific epitopes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "all_corrs = pd.DataFrame()\n",
    "\n",
    "for s in conc_sets:\n",
    "    model = fit_models[f\"noisy_{s}conc_3muts\"]\n",
    "\n",
    "    mut_escape_pred = pd.read_csv(\"RBD_mut_escape_df.csv\").merge(\n",
    "        (\n",
    "            model.mut_escape_df.assign(\n",
    "                epitope=lambda x: \"class \" + x[\"epitope\"].astype(str)\n",
    "            ).rename(columns={\"escape\": \"predicted escape\"})\n",
    "        ),\n",
    "        on=[\"mutation\", \"epitope\"],\n",
    "        validate=\"one_to_one\",\n",
    "    )\n",
    "\n",
    "    corr = (\n",
    "        mut_escape_pred.groupby(\"epitope\")\n",
    "        .apply(lambda x: x[\"escape\"].corr(x[\"predicted escape\"]) ** 2)\n",
    "        .rename(\"correlation (R^2)\")\n",
    "        .reset_index()\n",
    "    )\n",
    "\n",
    "    all_corrs = pd.concat(\n",
    "        [\n",
    "            all_corrs,\n",
    "            corr.assign(\n",
    "                icxx_set=[\" \".join([f\"IC{wt_data_icxx[c]}\" for c in s])]\n",
    "                * len(corr.index)\n",
    "            ),\n",
    "        ]\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/tyu2/.local/lib/python3.8/site-packages/altair/utils/core.py:317: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.\n",
      "  for col_name, dtype in df.dtypes.iteritems():\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "\n",
       "<div id=\"altair-viz-517cdcd5a6174091994094ab4107e9e5\"></div>\n",
       "<script type=\"text/javascript\">\n",
       "  var VEGA_DEBUG = (typeof VEGA_DEBUG == \"undefined\") ? {} : VEGA_DEBUG;\n",
       "  (function(spec, embedOpt){\n",
       "    let outputDiv = document.currentScript.previousElementSibling;\n",
       "    if (outputDiv.id !== \"altair-viz-517cdcd5a6174091994094ab4107e9e5\") {\n",
       "      outputDiv = document.getElementById(\"altair-viz-517cdcd5a6174091994094ab4107e9e5\");\n",
       "    }\n",
       "    const paths = {\n",
       "      \"vega\": \"https://cdn.jsdelivr.net/npm//vega@5?noext\",\n",
       "      \"vega-lib\": \"https://cdn.jsdelivr.net/npm//vega-lib?noext\",\n",
       "      \"vega-lite\": \"https://cdn.jsdelivr.net/npm//vega-lite@5.2.0?noext\",\n",
       "      \"vega-embed\": \"https://cdn.jsdelivr.net/npm//vega-embed@6?noext\",\n",
       "    };\n",
       "\n",
       "    function maybeLoadScript(lib, version) {\n",
       "      var key = `${lib.replace(\"-\", \"\")}_version`;\n",
       "      return (VEGA_DEBUG[key] == version) ?\n",
       "        Promise.resolve(paths[lib]) :\n",
       "        new Promise(function(resolve, reject) {\n",
       "          var s = document.createElement('script');\n",
       "          document.getElementsByTagName(\"head\")[0].appendChild(s);\n",
       "          s.async = true;\n",
       "          s.onload = () => {\n",
       "            VEGA_DEBUG[key] = version;\n",
       "            return resolve(paths[lib]);\n",
       "          };\n",
       "          s.onerror = () => reject(`Error loading script: ${paths[lib]}`);\n",
       "          s.src = paths[lib];\n",
       "        });\n",
       "    }\n",
       "\n",
       "    function showError(err) {\n",
       "      outputDiv.innerHTML = `<div class=\"error\" style=\"color:red;\">${err}</div>`;\n",
       "      throw err;\n",
       "    }\n",
       "\n",
       "    function displayChart(vegaEmbed) {\n",
       "      vegaEmbed(outputDiv, spec, embedOpt)\n",
       "        .catch(err => showError(`Javascript Error: ${err.message}<br>This usually means there's a typo in your chart specification. See the javascript console for the full traceback.`));\n",
       "    }\n",
       "\n",
       "    if(typeof define === \"function\" && define.amd) {\n",
       "      requirejs.config({paths});\n",
       "      require([\"vega-embed\"], displayChart, err => showError(`Error loading script: ${err.message}`));\n",
       "    } else {\n",
       "      maybeLoadScript(\"vega\", \"5\")\n",
       "        .then(() => maybeLoadScript(\"vega-lite\", \"5.2.0\"))\n",
       "        .then(() => maybeLoadScript(\"vega-embed\", \"6\"))\n",
       "        .catch(showError)\n",
       "        .then(() => displayChart(vegaEmbed));\n",
       "    }\n",
       "  })({\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"data\": {\"name\": \"data-8cf1de1071b8cf3f72e40839af01a692\"}, \"mark\": {\"type\": \"circle\", \"size\": 125}, \"encoding\": {\"color\": {\"field\": \"epitope\", \"legend\": null, \"scale\": {\"range\": [\"#0072B2\", \"#CC79A7\", \"#4C3549\"]}, \"type\": \"nominal\"}, \"column\": {\"field\": \"epitope\", \"type\": \"nominal\"}, \"tooltip\": [{\"field\": \"icxx_set\", \"type\": \"ordinal\"}, {\"field\": \"correlation (R^2)\", \"format\": \".3f\", \"type\": \"quantitative\"}], \"x\": {\"field\": \"icxx_set\", \"sort\": {\"field\": \"x\", \"order\": \"descending\"}, \"type\": \"ordinal\"}, \"y\": {\"field\": \"correlation (R^2)\", \"type\": \"quantitative\"}}, \"height\": 200, \"title\": \"inferred vs. true mutation escape values\", \"width\": 200, \"$schema\": \"https://vega.github.io/schema/vega-lite/v5.2.0.json\", \"datasets\": {\"data-8cf1de1071b8cf3f72e40839af01a692\": [{\"epitope\": \"class 1\", \"correlation (R^2)\": 0.6494946791057244, \"icxx_set\": \"IC95.141\"}, {\"epitope\": \"class 2\", \"correlation (R^2)\": 0.9411429752536082, \"icxx_set\": \"IC95.141\"}, {\"epitope\": \"class 3\", \"correlation (R^2)\": 0.9235798092082146, \"icxx_set\": \"IC95.141\"}, {\"epitope\": \"class 1\", \"correlation (R^2)\": 0.633408776422364, \"icxx_set\": \"IC98.854\"}, {\"epitope\": \"class 2\", \"correlation (R^2)\": 0.9449715250188195, \"icxx_set\": \"IC98.854\"}, {\"epitope\": \"class 3\", \"correlation (R^2)\": 0.9328787988314352, \"icxx_set\": \"IC98.854\"}, {\"epitope\": \"class 1\", \"correlation (R^2)\": 0.5855229863399843, \"icxx_set\": \"IC99.79\"}, {\"epitope\": \"class 2\", \"correlation (R^2)\": 0.9447264445598186, \"icxx_set\": \"IC99.79\"}, {\"epitope\": \"class 3\", \"correlation (R^2)\": 0.9320465960897137, \"icxx_set\": \"IC99.79\"}, {\"epitope\": \"class 1\", \"correlation (R^2)\": 0.47277660829732004, \"icxx_set\": \"IC99.968\"}, {\"epitope\": \"class 2\", \"correlation (R^2)\": 0.9255322091240006, \"icxx_set\": \"IC99.968\"}, {\"epitope\": \"class 3\", \"correlation (R^2)\": 0.9266370719977781, \"icxx_set\": \"IC99.968\"}, {\"epitope\": \"class 1\", \"correlation (R^2)\": 0.3998783246881719, \"icxx_set\": \"IC99.995\"}, {\"epitope\": \"class 2\", \"correlation (R^2)\": 0.9295250314996651, \"icxx_set\": \"IC99.995\"}, {\"epitope\": \"class 3\", \"correlation (R^2)\": 0.9009690071099429, \"icxx_set\": \"IC99.995\"}, {\"epitope\": \"class 1\", \"correlation (R^2)\": 0.3722224083975561, \"icxx_set\": \"IC99.999\"}, {\"epitope\": \"class 2\", \"correlation (R^2)\": 0.9126328899571241, \"icxx_set\": \"IC99.999\"}, {\"epitope\": \"class 3\", \"correlation (R^2)\": 0.8726719283299381, \"icxx_set\": \"IC99.999\"}]}}, {\"mode\": \"vega-lite\"});\n",
       "</script>"
      ],
      "text/plain": [
       "alt.Chart(...)"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# NBVAL_IGNORE_OUTPUT\n",
    "alt.Chart(all_corrs).mark_circle(size=125).encode(\n",
    "    x=alt.X(\"icxx_set:O\", sort=alt.EncodingSortField(\"x\", order=\"descending\")),\n",
    "    y=\"correlation (R^2):Q\",\n",
    "    column=\"epitope:N\",\n",
    "    tooltip=[\"icxx_set:O\", alt.Tooltip(\"correlation (R^2)\", format=\".3f\")],\n",
    "    color=alt.Color(\n",
    "        \"epitope\", scale=alt.Scale(range=[\"#0072B2\", \"#CC79A7\", \"#4C3549\"]), legend=None\n",
    "    ),\n",
    ").properties(width=200, height=200, title=\"inferred vs. true mutation escape values\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Additionally, we'll look at the correlation between \"true\" and predicted IC90's for each of the fit models. To do this, we'll predict the IC90's of variants in a separate library with a with a different (higher) mutation rate. We therefore read in the “exact” simulated data from a library containing variants with an average of four mutations per gene."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "exact_data = (\n",
    "    pd.read_csv(\"RBD_variants_escape_exact.csv\", na_filter=None)\n",
    "    .query('library == \"avg4muts\"')\n",
    "    .query(\"concentration in [1]\")\n",
    "    .reset_index(drop=True)\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We'll make the comparison on a log scale, and clip IC90s at values >50 as that is likely to be way outside the dynamic range given the concentrations used."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "ic90_corrs = pd.DataFrame()\n",
    "\n",
    "max_ic90 = 50\n",
    "for s in conc_sets:\n",
    "    model = fit_models[f\"noisy_{s}conc_3muts\"]\n",
    "\n",
    "    ic90s = (\n",
    "        exact_data[[\"aa_substitutions\", \"IC90\"]]\n",
    "        .assign(IC90=lambda x: x[\"IC90\"].clip(upper=max_ic90))\n",
    "        .drop_duplicates()\n",
    "    )\n",
    "    ic90s = model.filter_variants_by_seen_muts(ic90s)\n",
    "    ic90s = model.icXX(ic90s, x=0.9, col=\"predicted_IC90\", max_c=max_ic90)\n",
    "\n",
    "    ic90s = ic90s.assign(\n",
    "        log_IC90=lambda x: np.log10(x[\"IC90\"]),\n",
    "        predicted_log_IC90=lambda x: np.log10(x[\"predicted_IC90\"]),\n",
    "    )\n",
    "\n",
    "    corr = ic90s[\"log_IC90\"].corr(ic90s[\"predicted_log_IC90\"]) ** 2\n",
    "\n",
    "    ic90_corrs = pd.concat(\n",
    "        [\n",
    "            ic90_corrs,\n",
    "            pd.DataFrame(\n",
    "                {\n",
    "                    \"correlation (R^2)\": corr,\n",
    "                    \"icxx_set\": [f\"IC{wt_data_icxx[c]}\" for c in s],\n",
    "                }\n",
    "            ),\n",
    "        ]\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/tyu2/.local/lib/python3.8/site-packages/altair/utils/core.py:317: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.\n",
      "  for col_name, dtype in df.dtypes.iteritems():\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "\n",
       "<div id=\"altair-viz-f9d08b355c564a1797e593ca476d283c\"></div>\n",
       "<script type=\"text/javascript\">\n",
       "  var VEGA_DEBUG = (typeof VEGA_DEBUG == \"undefined\") ? {} : VEGA_DEBUG;\n",
       "  (function(spec, embedOpt){\n",
       "    let outputDiv = document.currentScript.previousElementSibling;\n",
       "    if (outputDiv.id !== \"altair-viz-f9d08b355c564a1797e593ca476d283c\") {\n",
       "      outputDiv = document.getElementById(\"altair-viz-f9d08b355c564a1797e593ca476d283c\");\n",
       "    }\n",
       "    const paths = {\n",
       "      \"vega\": \"https://cdn.jsdelivr.net/npm//vega@5?noext\",\n",
       "      \"vega-lib\": \"https://cdn.jsdelivr.net/npm//vega-lib?noext\",\n",
       "      \"vega-lite\": \"https://cdn.jsdelivr.net/npm//vega-lite@5.2.0?noext\",\n",
       "      \"vega-embed\": \"https://cdn.jsdelivr.net/npm//vega-embed@6?noext\",\n",
       "    };\n",
       "\n",
       "    function maybeLoadScript(lib, version) {\n",
       "      var key = `${lib.replace(\"-\", \"\")}_version`;\n",
       "      return (VEGA_DEBUG[key] == version) ?\n",
       "        Promise.resolve(paths[lib]) :\n",
       "        new Promise(function(resolve, reject) {\n",
       "          var s = document.createElement('script');\n",
       "          document.getElementsByTagName(\"head\")[0].appendChild(s);\n",
       "          s.async = true;\n",
       "          s.onload = () => {\n",
       "            VEGA_DEBUG[key] = version;\n",
       "            return resolve(paths[lib]);\n",
       "          };\n",
       "          s.onerror = () => reject(`Error loading script: ${paths[lib]}`);\n",
       "          s.src = paths[lib];\n",
       "        });\n",
       "    }\n",
       "\n",
       "    function showError(err) {\n",
       "      outputDiv.innerHTML = `<div class=\"error\" style=\"color:red;\">${err}</div>`;\n",
       "      throw err;\n",
       "    }\n",
       "\n",
       "    function displayChart(vegaEmbed) {\n",
       "      vegaEmbed(outputDiv, spec, embedOpt)\n",
       "        .catch(err => showError(`Javascript Error: ${err.message}<br>This usually means there's a typo in your chart specification. See the javascript console for the full traceback.`));\n",
       "    }\n",
       "\n",
       "    if(typeof define === \"function\" && define.amd) {\n",
       "      requirejs.config({paths});\n",
       "      require([\"vega-embed\"], displayChart, err => showError(`Error loading script: ${err.message}`));\n",
       "    } else {\n",
       "      maybeLoadScript(\"vega\", \"5\")\n",
       "        .then(() => maybeLoadScript(\"vega-lite\", \"5.2.0\"))\n",
       "        .then(() => maybeLoadScript(\"vega-embed\", \"6\"))\n",
       "        .catch(showError)\n",
       "        .then(() => displayChart(vegaEmbed));\n",
       "    }\n",
       "  })({\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"data\": {\"name\": \"data-20bba568f249810ece7bb6819d46dd5a\"}, \"mark\": {\"type\": \"circle\", \"size\": 125}, \"encoding\": {\"tooltip\": [{\"field\": \"icxx_set\", \"type\": \"nominal\"}, {\"field\": \"correlation (R^2)\", \"format\": \".3f\", \"type\": \"quantitative\"}], \"x\": {\"field\": \"icxx_set\", \"type\": \"ordinal\"}, \"y\": {\"field\": \"correlation (R^2)\", \"type\": \"quantitative\"}}, \"height\": 200, \"title\": \"predicted vs. true IC90\", \"width\": 200, \"$schema\": \"https://vega.github.io/schema/vega-lite/v5.2.0.json\", \"datasets\": {\"data-20bba568f249810ece7bb6819d46dd5a\": [{\"correlation (R^2)\": 0.9735547111012681, \"icxx_set\": \"IC95.141\"}, {\"correlation (R^2)\": 0.9706927418536281, \"icxx_set\": \"IC98.854\"}, {\"correlation (R^2)\": 0.9371232495080029, \"icxx_set\": \"IC99.79\"}, {\"correlation (R^2)\": 0.8524044532251411, \"icxx_set\": \"IC99.968\"}, {\"correlation (R^2)\": 0.7156542863450766, \"icxx_set\": \"IC99.995\"}, {\"correlation (R^2)\": 0.5676210971236344, \"icxx_set\": \"IC99.999\"}]}}, {\"mode\": \"vega-lite\"});\n",
       "</script>"
      ],
      "text/plain": [
       "alt.Chart(...)"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# NBVAL_IGNORE_OUTPUT\n",
    "alt.Chart(ic90_corrs).mark_circle(size=125).encode(\n",
    "    x=\"icxx_set:O\",\n",
    "    y=\"correlation (R^2):Q\",\n",
    "    tooltip=[\"icxx_set\", alt.Tooltip(\"correlation (R^2)\", format=\".3f\")],\n",
    ").properties(width=200, height=200, title=\"predicted vs. true IC90\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Across all models fit on data with single concentrations, the correlation was very strong."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we'll fit a couple more `Polyclonal` models to determine if adding a second or third concentration to the best performing single concentration will improve inference of the mutation-escape values. As a reference, we also fit a `Polyclonal` model on all six concentrations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Model with [0.25, 1] was already fit.\n",
      "Model with [0.5, 1] was already fit.\n",
      "Model with [1, 2] was already fit.\n",
      "Model with [1, 4] was already fit.\n",
      "Model with [0.5, 1, 2] was already fit.\n",
      "Model with [0.25, 1, 4] was already fit.\n",
      "Model with [0.125, 0.25, 0.5, 1, 2, 4] was already fit.\n"
     ]
    }
   ],
   "source": [
    "conc_sets = [\n",
    "    [0.25, 1],\n",
    "    [0.5, 1],\n",
    "    [1, 2],\n",
    "    [1, 4],\n",
    "    [0.5, 1, 2],\n",
    "    [0.25, 1, 4],\n",
    "    [0.125, 0.25, 0.5, 1, 2, 4],\n",
    "]\n",
    "\n",
    "for s in conc_sets:\n",
    "    # These are the keys for fit models\n",
    "    model_string = f\"noisy_{s}conc_3muts\"\n",
    "\n",
    "    # If the pickled model exists in fit_polyclonal_models directory,\n",
    "    # load it and add to fit_models\n",
    "    if os.path.exists(f\"fit_polyclonal_models/{model_string}.pkl\") is True:\n",
    "        model = pickle.load(open(f\"fit_polyclonal_models/{model_string}.pkl\", \"rb\"))\n",
    "        fit_models.update({model_string: model})\n",
    "        print(f\"Model with {s} was already fit.\")\n",
    "    else:\n",
    "        # Else, fit a model using fit_polyclonal(), save it to the\n",
    "        # fit_polyclonal_models directory, and add to fit_models\n",
    "        model = fit_polyclonal(s)\n",
    "        fit_models.update({model_string: model})\n",
    "        pickle.dump(model, open(f\"fit_polyclonal_models/{model_string}.pkl\", \"wb\"))\n",
    "        print(f\"Model with {s} fit and saved.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Again, lets look at the correlation between \"true\" and predicted mutation-escape values for each of the fit models."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "# add best performing single concentration to concentration sets, so we can plot as reference\n",
    "conc_sets_to_plot = [[1]] + conc_sets\n",
    "\n",
    "all_corrs = pd.DataFrame()\n",
    "\n",
    "for s in conc_sets_to_plot:\n",
    "    model = fit_models[f\"noisy_{s}conc_3muts\"]\n",
    "\n",
    "    mut_escape_pred = pd.read_csv(\"RBD_mut_escape_df.csv\").merge(\n",
    "        (\n",
    "            model.mut_escape_df.assign(\n",
    "                epitope=lambda x: \"class \" + x[\"epitope\"].astype(str)\n",
    "            ).rename(columns={\"escape\": \"predicted escape\"})\n",
    "        ),\n",
    "        on=[\"mutation\", \"epitope\"],\n",
    "        validate=\"one_to_one\",\n",
    "    )\n",
    "\n",
    "    corr = (\n",
    "        mut_escape_pred.groupby(\"epitope\")\n",
    "        .apply(lambda x: x[\"escape\"].corr(x[\"predicted escape\"]) ** 2)\n",
    "        .rename(\"correlation (R^2)\")\n",
    "        .reset_index()\n",
    "    )\n",
    "\n",
    "    all_corrs = pd.concat(\n",
    "        [\n",
    "            all_corrs,\n",
    "            corr.assign(\n",
    "                icxx_set=[\",\".join([f\"IC{wt_data_icxx[c]}\" for c in s])]\n",
    "                * len(corr.index),\n",
    "                num_concs=len([c for c in s]),\n",
    "            ),\n",
    "        ]\n",
    "    ).reset_index(drop=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/tyu2/.local/lib/python3.8/site-packages/altair/utils/core.py:317: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.\n",
      "  for col_name, dtype in df.dtypes.iteritems():\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "\n",
       "<div id=\"altair-viz-76de93894ce04dc8950bde4ca99d3450\"></div>\n",
       "<script type=\"text/javascript\">\n",
       "  var VEGA_DEBUG = (typeof VEGA_DEBUG == \"undefined\") ? {} : VEGA_DEBUG;\n",
       "  (function(spec, embedOpt){\n",
       "    let outputDiv = document.currentScript.previousElementSibling;\n",
       "    if (outputDiv.id !== \"altair-viz-76de93894ce04dc8950bde4ca99d3450\") {\n",
       "      outputDiv = document.getElementById(\"altair-viz-76de93894ce04dc8950bde4ca99d3450\");\n",
       "    }\n",
       "    const paths = {\n",
       "      \"vega\": \"https://cdn.jsdelivr.net/npm//vega@5?noext\",\n",
       "      \"vega-lib\": \"https://cdn.jsdelivr.net/npm//vega-lib?noext\",\n",
       "      \"vega-lite\": \"https://cdn.jsdelivr.net/npm//vega-lite@5.2.0?noext\",\n",
       "      \"vega-embed\": \"https://cdn.jsdelivr.net/npm//vega-embed@6?noext\",\n",
       "    };\n",
       "\n",
       "    function maybeLoadScript(lib, version) {\n",
       "      var key = `${lib.replace(\"-\", \"\")}_version`;\n",
       "      return (VEGA_DEBUG[key] == version) ?\n",
       "        Promise.resolve(paths[lib]) :\n",
       "        new Promise(function(resolve, reject) {\n",
       "          var s = document.createElement('script');\n",
       "          document.getElementsByTagName(\"head\")[0].appendChild(s);\n",
       "          s.async = true;\n",
       "          s.onload = () => {\n",
       "            VEGA_DEBUG[key] = version;\n",
       "            return resolve(paths[lib]);\n",
       "          };\n",
       "          s.onerror = () => reject(`Error loading script: ${paths[lib]}`);\n",
       "          s.src = paths[lib];\n",
       "        });\n",
       "    }\n",
       "\n",
       "    function showError(err) {\n",
       "      outputDiv.innerHTML = `<div class=\"error\" style=\"color:red;\">${err}</div>`;\n",
       "      throw err;\n",
       "    }\n",
       "\n",
       "    function displayChart(vegaEmbed) {\n",
       "      vegaEmbed(outputDiv, spec, embedOpt)\n",
       "        .catch(err => showError(`Javascript Error: ${err.message}<br>This usually means there's a typo in your chart specification. See the javascript console for the full traceback.`));\n",
       "    }\n",
       "\n",
       "    if(typeof define === \"function\" && define.amd) {\n",
       "      requirejs.config({paths});\n",
       "      require([\"vega-embed\"], displayChart, err => showError(`Error loading script: ${err.message}`));\n",
       "    } else {\n",
       "      maybeLoadScript(\"vega\", \"5\")\n",
       "        .then(() => maybeLoadScript(\"vega-lite\", \"5.2.0\"))\n",
       "        .then(() => maybeLoadScript(\"vega-embed\", \"6\"))\n",
       "        .catch(showError)\n",
       "        .then(() => displayChart(vegaEmbed));\n",
       "    }\n",
       "  })({\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}, \"axis\": {\"labelLimit\": 300}}, \"data\": {\"name\": \"data-a4d697f23857208c41016677ef312adf\"}, \"mark\": {\"type\": \"circle\", \"size\": 125}, \"encoding\": {\"color\": {\"field\": \"epitope\", \"legend\": null, \"scale\": {\"range\": [\"#0072B2\", \"#CC79A7\", \"#4C3549\"]}, \"type\": \"nominal\"}, \"row\": {\"field\": \"epitope\", \"type\": \"nominal\"}, \"tooltip\": [{\"field\": \"icxx_set\", \"type\": \"nominal\"}, {\"field\": \"correlation (R^2)\", \"format\": \".3f\", \"type\": \"quantitative\"}], \"x\": {\"field\": \"icxx_set\", \"sort\": {\"field\": \"x\", \"order\": \"descending\"}, \"type\": \"nominal\"}, \"y\": {\"field\": \"correlation (R^2)\", \"type\": \"quantitative\"}}, \"height\": 200, \"title\": \"inferred vs. true mutation escape values\", \"width\": 375, \"$schema\": \"https://vega.github.io/schema/vega-lite/v5.2.0.json\", \"datasets\": {\"data-a4d697f23857208c41016677ef312adf\": [{\"epitope\": \"class 1\", \"correlation (R^2)\": 0.47277660829732004, \"icxx_set\": \"IC99.968\", \"num_concs\": 1}, {\"epitope\": \"class 2\", \"correlation (R^2)\": 0.9255322091240006, \"icxx_set\": \"IC99.968\", \"num_concs\": 1}, {\"epitope\": \"class 3\", \"correlation (R^2)\": 0.9266370719977781, \"icxx_set\": \"IC99.968\", \"num_concs\": 1}, {\"epitope\": \"class 1\", \"correlation (R^2)\": 0.6909266584026795, \"icxx_set\": \"IC98.854,IC99.968\", \"num_concs\": 2}, {\"epitope\": \"class 2\", \"correlation (R^2)\": 0.9525609389728366, \"icxx_set\": \"IC98.854,IC99.968\", \"num_concs\": 2}, {\"epitope\": \"class 3\", \"correlation (R^2)\": 0.9404001493279208, \"icxx_set\": \"IC98.854,IC99.968\", \"num_concs\": 2}, {\"epitope\": \"class 1\", \"correlation (R^2)\": 0.6265691352114429, \"icxx_set\": \"IC99.79,IC99.968\", \"num_concs\": 2}, {\"epitope\": \"class 2\", \"correlation (R^2)\": 0.9557568037529991, \"icxx_set\": \"IC99.79,IC99.968\", \"num_concs\": 2}, {\"epitope\": \"class 3\", \"correlation (R^2)\": 0.9387576633420915, \"icxx_set\": \"IC99.79,IC99.968\", \"num_concs\": 2}, {\"epitope\": \"class 1\", \"correlation (R^2)\": 0.4940309374276288, \"icxx_set\": \"IC99.968,IC99.995\", \"num_concs\": 2}, {\"epitope\": \"class 2\", \"correlation (R^2)\": 0.9380210668269039, \"icxx_set\": \"IC99.968,IC99.995\", \"num_concs\": 2}, {\"epitope\": \"class 3\", \"correlation (R^2)\": 0.9271310579812604, \"icxx_set\": \"IC99.968,IC99.995\", \"num_concs\": 2}, {\"epitope\": \"class 1\", \"correlation (R^2)\": 0.5442101528058595, \"icxx_set\": \"IC99.968,IC99.999\", \"num_concs\": 2}, {\"epitope\": \"class 2\", \"correlation (R^2)\": 0.933233971350855, \"icxx_set\": \"IC99.968,IC99.999\", \"num_concs\": 2}, {\"epitope\": \"class 3\", \"correlation (R^2)\": 0.9329567842547906, \"icxx_set\": \"IC99.968,IC99.999\", \"num_concs\": 2}, {\"epitope\": \"class 1\", \"correlation (R^2)\": 0.6672003017549143, \"icxx_set\": \"IC99.79,IC99.968,IC99.995\", \"num_concs\": 3}, {\"epitope\": \"class 2\", \"correlation (R^2)\": 0.9550005721717503, \"icxx_set\": \"IC99.79,IC99.968,IC99.995\", \"num_concs\": 3}, {\"epitope\": \"class 3\", \"correlation (R^2)\": 0.9409394223955386, \"icxx_set\": \"IC99.79,IC99.968,IC99.995\", \"num_concs\": 3}, {\"epitope\": \"class 1\", \"correlation (R^2)\": 0.6903693412396601, \"icxx_set\": \"IC98.854,IC99.968,IC99.999\", \"num_concs\": 3}, {\"epitope\": \"class 2\", \"correlation (R^2)\": 0.973458806521, \"icxx_set\": \"IC98.854,IC99.968,IC99.999\", \"num_concs\": 3}, {\"epitope\": \"class 3\", \"correlation (R^2)\": 0.9496508848872558, \"icxx_set\": \"IC98.854,IC99.968,IC99.999\", \"num_concs\": 3}, {\"epitope\": \"class 1\", \"correlation (R^2)\": 0.7317139483963229, \"icxx_set\": \"IC95.141,IC98.854,IC99.79,IC99.968,IC99.995,IC99.999\", \"num_concs\": 6}, {\"epitope\": \"class 2\", \"correlation (R^2)\": 0.9568631207505862, \"icxx_set\": \"IC95.141,IC98.854,IC99.79,IC99.968,IC99.995,IC99.999\", \"num_concs\": 6}, {\"epitope\": \"class 3\", \"correlation (R^2)\": 0.9549092400778721, \"icxx_set\": \"IC95.141,IC98.854,IC99.79,IC99.968,IC99.995,IC99.999\", \"num_concs\": 6}]}}, {\"mode\": \"vega-lite\"});\n",
       "</script>"
      ],
      "text/plain": [
       "alt.Chart(...)"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# NBVAL_IGNORE_OUTPUT\n",
    "alt.Chart(all_corrs).mark_circle(size=125).encode(\n",
    "    x=alt.X(\n",
    "        \"icxx_set:N\",\n",
    "        sort=alt.EncodingSortField(\"x\", order=\"descending\"),\n",
    "    ),\n",
    "    y=\"correlation (R^2):Q\",\n",
    "    tooltip=[\"icxx_set\", alt.Tooltip(\"correlation (R^2)\", format=\".3f\")],\n",
    "    color=alt.Color(\n",
    "        \"epitope\", scale=alt.Scale(range=[\"#0072B2\", \"#CC79A7\", \"#4C3549\"]), legend=None\n",
    "    ),\n",
    "    row=\"epitope:N\",\n",
    ").properties(\n",
    "    width=375, height=200, title=\"inferred vs. true mutation escape values\"\n",
    ").configure_axis(\n",
    "    labelLimit=300\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We see that the performance of models fit on data with two or three concentrations is nearly indistinguishable from that of the model fit on data with all six concentrations. Additionally, adding two or three concentrations does lead to a modest improvement in the correlation, especially for the class 1 epitope, which is expected to be the hardest to predict since it has the lowest wildtype activity. We can also summarize this by plotting the number of concentrations used for fitting on the x-axis. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "<div id=\"altair-viz-f7fa1cfbcaef410a96500452d34e7882\"></div>\n",
       "<script type=\"text/javascript\">\n",
       "  var VEGA_DEBUG = (typeof VEGA_DEBUG == \"undefined\") ? {} : VEGA_DEBUG;\n",
       "  (function(spec, embedOpt){\n",
       "    let outputDiv = document.currentScript.previousElementSibling;\n",
       "    if (outputDiv.id !== \"altair-viz-f7fa1cfbcaef410a96500452d34e7882\") {\n",
       "      outputDiv = document.getElementById(\"altair-viz-f7fa1cfbcaef410a96500452d34e7882\");\n",
       "    }\n",
       "    const paths = {\n",
       "      \"vega\": \"https://cdn.jsdelivr.net/npm//vega@5?noext\",\n",
       "      \"vega-lib\": \"https://cdn.jsdelivr.net/npm//vega-lib?noext\",\n",
       "      \"vega-lite\": \"https://cdn.jsdelivr.net/npm//vega-lite@5.2.0?noext\",\n",
       "      \"vega-embed\": \"https://cdn.jsdelivr.net/npm//vega-embed@6?noext\",\n",
       "    };\n",
       "\n",
       "    function maybeLoadScript(lib, version) {\n",
       "      var key = `${lib.replace(\"-\", \"\")}_version`;\n",
       "      return (VEGA_DEBUG[key] == version) ?\n",
       "        Promise.resolve(paths[lib]) :\n",
       "        new Promise(function(resolve, reject) {\n",
       "          var s = document.createElement('script');\n",
       "          document.getElementsByTagName(\"head\")[0].appendChild(s);\n",
       "          s.async = true;\n",
       "          s.onload = () => {\n",
       "            VEGA_DEBUG[key] = version;\n",
       "            return resolve(paths[lib]);\n",
       "          };\n",
       "          s.onerror = () => reject(`Error loading script: ${paths[lib]}`);\n",
       "          s.src = paths[lib];\n",
       "        });\n",
       "    }\n",
       "\n",
       "    function showError(err) {\n",
       "      outputDiv.innerHTML = `<div class=\"error\" style=\"color:red;\">${err}</div>`;\n",
       "      throw err;\n",
       "    }\n",
       "\n",
       "    function displayChart(vegaEmbed) {\n",
       "      vegaEmbed(outputDiv, spec, embedOpt)\n",
       "        .catch(err => showError(`Javascript Error: ${err.message}<br>This usually means there's a typo in your chart specification. See the javascript console for the full traceback.`));\n",
       "    }\n",
       "\n",
       "    if(typeof define === \"function\" && define.amd) {\n",
       "      requirejs.config({paths});\n",
       "      require([\"vega-embed\"], displayChart, err => showError(`Error loading script: ${err.message}`));\n",
       "    } else {\n",
       "      maybeLoadScript(\"vega\", \"5\")\n",
       "        .then(() => maybeLoadScript(\"vega-lite\", \"5.2.0\"))\n",
       "        .then(() => maybeLoadScript(\"vega-embed\", \"6\"))\n",
       "        .catch(showError)\n",
       "        .then(() => displayChart(vegaEmbed));\n",
       "    }\n",
       "  })({\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"data\": {\"name\": \"data-a4d697f23857208c41016677ef312adf\"}, \"mark\": {\"type\": \"circle\", \"size\": 125}, \"encoding\": {\"color\": {\"field\": \"epitope\", \"legend\": null, \"scale\": {\"range\": [\"#0072B2\", \"#CC79A7\", \"#4C3549\"]}, \"type\": \"nominal\"}, \"column\": {\"field\": \"epitope\", \"type\": \"nominal\"}, \"tooltip\": [{\"field\": \"icxx_set\", \"type\": \"ordinal\"}, {\"field\": \"num_concs\", \"type\": \"ordinal\"}, {\"field\": \"correlation (R^2)\", \"format\": \".3f\", \"type\": \"quantitative\"}], \"x\": {\"axis\": {\"labelAngle\": 0}, \"field\": \"num_concs\", \"type\": \"ordinal\"}, \"y\": {\"field\": \"correlation (R^2)\", \"type\": \"quantitative\"}}, \"height\": 200, \"title\": \"inferred vs. true mutation escape values\", \"width\": 150, \"$schema\": \"https://vega.github.io/schema/vega-lite/v5.2.0.json\", \"datasets\": {\"data-a4d697f23857208c41016677ef312adf\": [{\"epitope\": \"class 1\", \"correlation (R^2)\": 0.47277660829732004, \"icxx_set\": \"IC99.968\", \"num_concs\": 1}, {\"epitope\": \"class 2\", \"correlation (R^2)\": 0.9255322091240006, \"icxx_set\": \"IC99.968\", \"num_concs\": 1}, {\"epitope\": \"class 3\", \"correlation (R^2)\": 0.9266370719977781, \"icxx_set\": \"IC99.968\", \"num_concs\": 1}, {\"epitope\": \"class 1\", \"correlation (R^2)\": 0.6909266584026795, \"icxx_set\": \"IC98.854,IC99.968\", \"num_concs\": 2}, {\"epitope\": \"class 2\", \"correlation (R^2)\": 0.9525609389728366, \"icxx_set\": \"IC98.854,IC99.968\", \"num_concs\": 2}, {\"epitope\": \"class 3\", \"correlation (R^2)\": 0.9404001493279208, \"icxx_set\": \"IC98.854,IC99.968\", \"num_concs\": 2}, {\"epitope\": \"class 1\", \"correlation (R^2)\": 0.6265691352114429, \"icxx_set\": \"IC99.79,IC99.968\", \"num_concs\": 2}, {\"epitope\": \"class 2\", \"correlation (R^2)\": 0.9557568037529991, \"icxx_set\": \"IC99.79,IC99.968\", \"num_concs\": 2}, {\"epitope\": \"class 3\", \"correlation (R^2)\": 0.9387576633420915, \"icxx_set\": \"IC99.79,IC99.968\", \"num_concs\": 2}, {\"epitope\": \"class 1\", \"correlation (R^2)\": 0.4940309374276288, \"icxx_set\": \"IC99.968,IC99.995\", \"num_concs\": 2}, {\"epitope\": \"class 2\", \"correlation (R^2)\": 0.9380210668269039, \"icxx_set\": \"IC99.968,IC99.995\", \"num_concs\": 2}, {\"epitope\": \"class 3\", \"correlation (R^2)\": 0.9271310579812604, \"icxx_set\": \"IC99.968,IC99.995\", \"num_concs\": 2}, {\"epitope\": \"class 1\", \"correlation (R^2)\": 0.5442101528058595, \"icxx_set\": \"IC99.968,IC99.999\", \"num_concs\": 2}, {\"epitope\": \"class 2\", \"correlation (R^2)\": 0.933233971350855, \"icxx_set\": \"IC99.968,IC99.999\", \"num_concs\": 2}, {\"epitope\": \"class 3\", \"correlation (R^2)\": 0.9329567842547906, \"icxx_set\": \"IC99.968,IC99.999\", \"num_concs\": 2}, {\"epitope\": \"class 1\", \"correlation (R^2)\": 0.6672003017549143, \"icxx_set\": \"IC99.79,IC99.968,IC99.995\", \"num_concs\": 3}, {\"epitope\": \"class 2\", \"correlation (R^2)\": 0.9550005721717503, \"icxx_set\": \"IC99.79,IC99.968,IC99.995\", \"num_concs\": 3}, {\"epitope\": \"class 3\", \"correlation (R^2)\": 0.9409394223955386, \"icxx_set\": \"IC99.79,IC99.968,IC99.995\", \"num_concs\": 3}, {\"epitope\": \"class 1\", \"correlation (R^2)\": 0.6903693412396601, \"icxx_set\": \"IC98.854,IC99.968,IC99.999\", \"num_concs\": 3}, {\"epitope\": \"class 2\", \"correlation (R^2)\": 0.973458806521, \"icxx_set\": \"IC98.854,IC99.968,IC99.999\", \"num_concs\": 3}, {\"epitope\": \"class 3\", \"correlation (R^2)\": 0.9496508848872558, \"icxx_set\": \"IC98.854,IC99.968,IC99.999\", \"num_concs\": 3}, {\"epitope\": \"class 1\", \"correlation (R^2)\": 0.7317139483963229, \"icxx_set\": \"IC95.141,IC98.854,IC99.79,IC99.968,IC99.995,IC99.999\", \"num_concs\": 6}, {\"epitope\": \"class 2\", \"correlation (R^2)\": 0.9568631207505862, \"icxx_set\": \"IC95.141,IC98.854,IC99.79,IC99.968,IC99.995,IC99.999\", \"num_concs\": 6}, {\"epitope\": \"class 3\", \"correlation (R^2)\": 0.9549092400778721, \"icxx_set\": \"IC95.141,IC98.854,IC99.79,IC99.968,IC99.995,IC99.999\", \"num_concs\": 6}]}}, {\"mode\": \"vega-lite\"});\n",
       "</script>"
      ],
      "text/plain": [
       "alt.Chart(...)"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# NBVAL_IGNORE_OUTPUT\n",
    "alt.Chart(all_corrs).mark_circle(size=125).encode(\n",
    "    x=alt.X(\"num_concs:O\", axis=alt.Axis(labelAngle=0)),\n",
    "    y=\"correlation (R^2):Q\",\n",
    "    column=\"epitope:N\",\n",
    "    tooltip=[\n",
    "        \"icxx_set:O\",\n",
    "        \"num_concs:O\",\n",
    "        alt.Tooltip(\"correlation (R^2)\", format=\".3f\"),\n",
    "    ],\n",
    "    color=alt.Color(\n",
    "        \"epitope\", scale=alt.Scale(range=[\"#0072B2\", \"#CC79A7\", \"#4C3549\"]), legend=None\n",
    "    ),\n",
    ").properties(width=150, height=200, title=\"inferred vs. true mutation escape values\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Again, lets look at the correlation between \"true\" and predicted IC90's for each of the fit models."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "ic90_corrs = pd.DataFrame()\n",
    "\n",
    "max_ic90 = 50\n",
    "for s in conc_sets_to_plot:\n",
    "    model = fit_models[f\"noisy_{s}conc_3muts\"]\n",
    "\n",
    "    ic90s = (\n",
    "        exact_data[[\"aa_substitutions\", \"IC90\"]]\n",
    "        .assign(IC90=lambda x: x[\"IC90\"].clip(upper=max_ic90))\n",
    "        .drop_duplicates()\n",
    "    )\n",
    "    ic90s = model.filter_variants_by_seen_muts(ic90s)\n",
    "    ic90s = model.icXX(ic90s, x=0.9, col=\"predicted_IC90\", max_c=max_ic90)\n",
    "\n",
    "    ic90s = ic90s.assign(\n",
    "        log_IC90=lambda x: np.log10(x[\"IC90\"]),\n",
    "        predicted_log_IC90=lambda x: np.log10(x[\"predicted_IC90\"]),\n",
    "    )\n",
    "\n",
    "    corr = ic90s[\"log_IC90\"].corr(ic90s[\"predicted_log_IC90\"]) ** 2\n",
    "\n",
    "    ic90_corrs = pd.concat(\n",
    "        [\n",
    "            ic90_corrs,\n",
    "            pd.DataFrame(\n",
    "                {\n",
    "                    \"correlation (R^2)\": corr,\n",
    "                    \"icxx_set\": [[f\"IC{wt_data_icxx[c]}\" for c in s]],\n",
    "                }\n",
    "            ),\n",
    "        ]\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/tyu2/.local/lib/python3.8/site-packages/altair/utils/core.py:317: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.\n",
      "  for col_name, dtype in df.dtypes.iteritems():\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "\n",
       "<div id=\"altair-viz-77fd88ba83ec4c418398a88c2dd39a86\"></div>\n",
       "<script type=\"text/javascript\">\n",
       "  var VEGA_DEBUG = (typeof VEGA_DEBUG == \"undefined\") ? {} : VEGA_DEBUG;\n",
       "  (function(spec, embedOpt){\n",
       "    let outputDiv = document.currentScript.previousElementSibling;\n",
       "    if (outputDiv.id !== \"altair-viz-77fd88ba83ec4c418398a88c2dd39a86\") {\n",
       "      outputDiv = document.getElementById(\"altair-viz-77fd88ba83ec4c418398a88c2dd39a86\");\n",
       "    }\n",
       "    const paths = {\n",
       "      \"vega\": \"https://cdn.jsdelivr.net/npm//vega@5?noext\",\n",
       "      \"vega-lib\": \"https://cdn.jsdelivr.net/npm//vega-lib?noext\",\n",
       "      \"vega-lite\": \"https://cdn.jsdelivr.net/npm//vega-lite@5.2.0?noext\",\n",
       "      \"vega-embed\": \"https://cdn.jsdelivr.net/npm//vega-embed@6?noext\",\n",
       "    };\n",
       "\n",
       "    function maybeLoadScript(lib, version) {\n",
       "      var key = `${lib.replace(\"-\", \"\")}_version`;\n",
       "      return (VEGA_DEBUG[key] == version) ?\n",
       "        Promise.resolve(paths[lib]) :\n",
       "        new Promise(function(resolve, reject) {\n",
       "          var s = document.createElement('script');\n",
       "          document.getElementsByTagName(\"head\")[0].appendChild(s);\n",
       "          s.async = true;\n",
       "          s.onload = () => {\n",
       "            VEGA_DEBUG[key] = version;\n",
       "            return resolve(paths[lib]);\n",
       "          };\n",
       "          s.onerror = () => reject(`Error loading script: ${paths[lib]}`);\n",
       "          s.src = paths[lib];\n",
       "        });\n",
       "    }\n",
       "\n",
       "    function showError(err) {\n",
       "      outputDiv.innerHTML = `<div class=\"error\" style=\"color:red;\">${err}</div>`;\n",
       "      throw err;\n",
       "    }\n",
       "\n",
       "    function displayChart(vegaEmbed) {\n",
       "      vegaEmbed(outputDiv, spec, embedOpt)\n",
       "        .catch(err => showError(`Javascript Error: ${err.message}<br>This usually means there's a typo in your chart specification. See the javascript console for the full traceback.`));\n",
       "    }\n",
       "\n",
       "    if(typeof define === \"function\" && define.amd) {\n",
       "      requirejs.config({paths});\n",
       "      require([\"vega-embed\"], displayChart, err => showError(`Error loading script: ${err.message}`));\n",
       "    } else {\n",
       "      maybeLoadScript(\"vega\", \"5\")\n",
       "        .then(() => maybeLoadScript(\"vega-lite\", \"5.2.0\"))\n",
       "        .then(() => maybeLoadScript(\"vega-embed\", \"6\"))\n",
       "        .catch(showError)\n",
       "        .then(() => displayChart(vegaEmbed));\n",
       "    }\n",
       "  })({\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"data\": {\"name\": \"data-dbc277a6a2b7f964ce12499caf84a24e\"}, \"mark\": {\"type\": \"circle\", \"size\": 125}, \"encoding\": {\"tooltip\": [{\"field\": \"icxx_set\", \"type\": \"nominal\"}, {\"field\": \"correlation (R^2)\", \"format\": \".3f\", \"type\": \"quantitative\"}], \"x\": {\"axis\": {\"labelAngle\": 0}, \"field\": \"icxx_set\", \"sort\": {\"field\": \"x\", \"order\": \"descending\"}, \"type\": \"ordinal\"}, \"y\": {\"field\": \"correlation (R^2)\", \"type\": \"quantitative\"}}, \"height\": 200, \"title\": \"predicted vs. true IC90\", \"width\": 375, \"$schema\": \"https://vega.github.io/schema/vega-lite/v5.2.0.json\", \"datasets\": {\"data-dbc277a6a2b7f964ce12499caf84a24e\": [{\"correlation (R^2)\": 0.8524044532251411, \"icxx_set\": [\"IC99.968\"]}, {\"correlation (R^2)\": 0.96607646947196, \"icxx_set\": [\"IC98.854\", \"IC99.968\"]}, {\"correlation (R^2)\": 0.9425893159550646, \"icxx_set\": [\"IC99.79\", \"IC99.968\"]}, {\"correlation (R^2)\": 0.8242406059856914, \"icxx_set\": [\"IC99.968\", \"IC99.995\"]}, {\"correlation (R^2)\": 0.8256362539344192, \"icxx_set\": [\"IC99.968\", \"IC99.999\"]}, {\"correlation (R^2)\": 0.9199170200617324, \"icxx_set\": [\"IC99.79\", \"IC99.968\", \"IC99.995\"]}, {\"correlation (R^2)\": 0.9647391925377282, \"icxx_set\": [\"IC98.854\", \"IC99.968\", \"IC99.999\"]}, {\"correlation (R^2)\": 0.9096933604307439, \"icxx_set\": [\"IC95.141\", \"IC98.854\", \"IC99.79\", \"IC99.968\", \"IC99.995\", \"IC99.999\"]}]}}, {\"mode\": \"vega-lite\"});\n",
       "</script>"
      ],
      "text/plain": [
       "alt.Chart(...)"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# NBVAL_IGNORE_OUTPUT\n",
    "alt.Chart(ic90_corrs).mark_circle(size=125).encode(\n",
    "    x=alt.X(\n",
    "        \"icxx_set:O\",\n",
    "        sort=alt.EncodingSortField(\"x\", order=\"descending\"),\n",
    "        axis=alt.Axis(labelAngle=0),\n",
    "    ),\n",
    "    y=\"correlation (R^2):Q\",\n",
    "    tooltip=[\"icxx_set\", alt.Tooltip(\"correlation (R^2)\", format=\".3f\")],\n",
    ").properties(width=375, height=200, title=\"predicted vs. true IC90\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Fitting with more concentrations maintained the excellent IC90 prediction exhibited by models fit on data with a single concentration."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary\n",
    "Based on these simulation experiments:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. Choose a single concentration that is potent enough (ex. `IC99.9`) to neutralize all wildtype viruses, but not potent enough to neutralize all the variants. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2. Adding a second or third concentration can help refine inference of true mutation-escape values, especially for the most subdominant epitopes. Due to technical difficulties in precisely achieving the  `IC99.9` in a given experiment, we suggest using concentrations that are 2-4 fold higher or lower than `IC99.9` in order to more likely span the correct dynamic range."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}