{ "cells": [ { "cell_type": "markdown", "id": "b9e3d531", "metadata": {}, "source": [ "# Averaging models\n", "Probably the best way to ensure robust inferences and estimate errors is to have multiple experimental replicates, ideally on different libraries.\n", "\n", "Here we describe how to average model fits across libraries and/or replicates." ] }, { "cell_type": "markdown", "id": "6e2135ec", "metadata": {}, "source": [ "## Split data into replicates\n", "We will use our data for the RBD as an earlier examples, but split it into several libraries / replicates.\n", "\n", "Specifically, we will fit two different libraries: `avg2muts` and `avg3muts`, which have different barcodes and also different mutation rates (although of course in real life you might sometimes want to average results from different libraries with the same mutation rates).\n", "We will also simulate having two replicates for each library just by sampling each library.\n", "To make this example faster, we'll just use one concentration:" ] }, { "cell_type": "code", "execution_count": 1, "id": "b242df9c", "metadata": { "execution": { "iopub.execute_input": "2023-04-11T20:19:12.879382Z", "iopub.status.busy": "2023-04-11T20:19:12.879157Z", "iopub.status.idle": "2023-04-11T20:19:15.737800Z", "shell.execute_reply": "2023-04-11T20:19:15.737059Z", "shell.execute_reply.started": "2023-04-11T20:19:12.879360Z" }, "tags": [] }, "outputs": [], "source": [ "import pandas as pd\n", "\n", "import polyclonal.polyclonal\n", "import polyclonal.polyclonal_collection\n", "\n", "\n", "# read data\n", "all_data = pd.read_csv(\"RBD_variants_escape_noisy.csv\", na_filter=None)\n", "\n", "# split by library and replicates\n", "libraries = [\"avg2muts\", \"avg3muts\"] # the two libraries to use\n", "concentrations = [1] # use just use this concentration\n", "n_replicates = 2 # number of replicates per library\n", "\n", "data_by_replicate = {\n", " (library, replicate + 1): (\n", " all_data.query(\"library == @library\")\n", " .query(\"concentration in @concentrations\")\n", " .sample(frac=0.3, random_state=replicate)\n", " )\n", " for library in libraries\n", " for replicate in range(n_replicates)\n", "}" ] }, { "cell_type": "markdown", "id": "bb47c6ce", "metadata": {}, "source": [ "## Fit models to each replicate\n", "We now fit a `Polyclonal` model to each replicate using just 2 epitopes, as the data don't seem sufficient to accurately fit all three epitopes.\n", "Then we arrange the models in a data frame:" ] }, { "cell_type": "code", "execution_count": 2, "id": "1f86179e", "metadata": { "execution": { "iopub.execute_input": "2023-04-11T20:19:15.741207Z", "iopub.status.busy": "2023-04-11T20:19:15.740852Z", "iopub.status.idle": "2023-04-11T20:20:52.151015Z", "shell.execute_reply": "2023-04-11T20:20:52.149442Z", "shell.execute_reply.started": "2023-04-11T20:19:15.741183Z" }, "tags": [] }, "outputs": [], "source": [ "# first create a data frame with all the models\n", "models_by_replicate = {}\n", "for (library, replicate), data in data_by_replicate.items():\n", " model = polyclonal.Polyclonal(data_to_fit=data, n_epitopes=2)\n", " models_by_replicate[(library, replicate)] = model\n", "models_df = (\n", " pd.Series(models_by_replicate, name=\"model\")\n", " .rename_axis([\"library\", \"replicate\"])\n", " .reset_index()\n", ")\n", "\n", "# now fit the models\n", "n_fit, n_failed, models_df[\"model\"] = polyclonal.polyclonal_collection.fit_models(\n", " models_df[\"model\"],\n", " n_threads=2,\n", " reg_escape_weight=0.01,\n", " reg_uniqueness2_weight=0,\n", ")" ] }, { "cell_type": "markdown", "id": "9b91ea0e", "metadata": {}, "source": [ "Note how the models are arranged in a data frame:" ] }, { "cell_type": "code", "execution_count": 3, "id": "d6e2060b", "metadata": { "execution": { "iopub.execute_input": "2023-04-11T20:20:52.158647Z", "iopub.status.busy": "2023-04-11T20:20:52.158108Z", "iopub.status.idle": "2023-04-11T20:20:52.176663Z", "shell.execute_reply": "2023-04-11T20:20:52.176005Z", "shell.execute_reply.started": "2023-04-11T20:20:52.158597Z" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
| \n", " | library | \n", "replicate | \n", "model | \n", "
|---|---|---|---|
| 0 | \n", "avg2muts | \n", "1 | \n", "<polyclonal.polyclonal.Polyclonal object at 0x... | \n", "
| 1 | \n", "avg2muts | \n", "2 | \n", "<polyclonal.polyclonal.Polyclonal object at 0x... | \n", "
| 2 | \n", "avg3muts | \n", "1 | \n", "<polyclonal.polyclonal.Polyclonal object at 0x... | \n", "
| 3 | \n", "avg3muts | \n", "2 | \n", "<polyclonal.polyclonal.Polyclonal object at 0x... | \n", "
| \n", " | epitope | \n", "site | \n", "wildtype | \n", "mutant | \n", "mutation | \n", "escape_mean | \n", "escape_median | \n", "escape_min_magnitude | \n", "escape_std | \n", "n_models | \n", "times_seen | \n", "frac_models | \n", "
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | \n", "1 | \n", "331 | \n", "N | \n", "A | \n", "N331A | \n", "-0.066612 | \n", "-0.031718 | \n", "0.026679 | \n", "0.324873 | \n", "4 | \n", "17.75 | \n", "1.0 | \n", "
| 1 | \n", "1 | \n", "331 | \n", "N | \n", "D | \n", "N331D | \n", "-0.098052 | \n", "-0.100283 | \n", "-0.071111 | \n", "0.022076 | \n", "4 | \n", "11.25 | \n", "1.0 | \n", "
| 2 | \n", "1 | \n", "331 | \n", "N | \n", "E | \n", "N331E | \n", "-0.061359 | \n", "-0.008731 | \n", "-0.001655 | \n", "0.112576 | \n", "4 | \n", "10.25 | \n", "1.0 | \n", "
| 3 | \n", "1 | \n", "331 | \n", "N | \n", "F | \n", "N331F | \n", "0.270243 | \n", "0.170869 | \n", "0.014552 | \n", "0.330315 | \n", "4 | \n", "10.00 | \n", "1.0 | \n", "
| 4 | \n", "1 | \n", "331 | \n", "N | \n", "G | \n", "N331G | \n", "0.186271 | \n", "0.164941 | \n", "0.031369 | \n", "0.232272 | \n", "4 | \n", "25.00 | \n", "1.0 | \n", "
| ... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
| 3859 | \n", "2 | \n", "531 | \n", "T | \n", "R | \n", "T531R | \n", "-0.176471 | \n", "-0.112467 | \n", "-0.033899 | \n", "0.187640 | \n", "4 | \n", "27.00 | \n", "1.0 | \n", "
| 3860 | \n", "2 | \n", "531 | \n", "T | \n", "S | \n", "T531S | \n", "-0.075136 | \n", "-0.011741 | \n", "0.000405 | \n", "0.155037 | \n", "4 | \n", "31.75 | \n", "1.0 | \n", "
| 3861 | \n", "2 | \n", "531 | \n", "T | \n", "V | \n", "T531V | \n", "-0.041719 | \n", "-0.005589 | \n", "0.012383 | \n", "0.094055 | \n", "4 | \n", "19.50 | \n", "1.0 | \n", "
| 3862 | \n", "2 | \n", "531 | \n", "T | \n", "W | \n", "T531W | \n", "0.169718 | \n", "0.100366 | \n", "-0.004133 | \n", "0.266617 | \n", "4 | \n", "5.25 | \n", "1.0 | \n", "
| 3863 | \n", "2 | \n", "531 | \n", "T | \n", "Y | \n", "T531Y | \n", "-0.051762 | \n", "-0.050223 | \n", "-0.032461 | \n", "0.020582 | \n", "4 | \n", "11.75 | \n", "1.0 | \n", "
3864 rows × 12 columns
\n", "| \n", " | epitope | \n", "site | \n", "wildtype | \n", "mutant | \n", "mutation | \n", "escape | \n", "times_seen | \n", "library | \n", "replicate | \n", "
|---|---|---|---|---|---|---|---|---|---|
| 0 | \n", "1 | \n", "331 | \n", "N | \n", "A | \n", "N331A | \n", "-0.090115 | \n", "19 | \n", "avg2muts | \n", "1 | \n", "
| 1 | \n", "1 | \n", "331 | \n", "N | \n", "D | \n", "N331D | \n", "-0.110795 | \n", "10 | \n", "avg2muts | \n", "1 | \n", "
| 2 | \n", "1 | \n", "331 | \n", "N | \n", "E | \n", "N331E | \n", "-0.001655 | \n", "11 | \n", "avg2muts | \n", "1 | \n", "
| 3 | \n", "1 | \n", "331 | \n", "N | \n", "F | \n", "N331F | \n", "0.037305 | \n", "10 | \n", "avg2muts | \n", "1 | \n", "
| 4 | \n", "1 | \n", "331 | \n", "N | \n", "G | \n", "N331G | \n", "0.031369 | \n", "18 | \n", "avg2muts | \n", "1 | \n", "
| ... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
| 15429 | \n", "2 | \n", "531 | \n", "T | \n", "R | \n", "T531R | \n", "-0.033899 | \n", "24 | \n", "avg3muts | \n", "2 | \n", "
| 15430 | \n", "2 | \n", "531 | \n", "T | \n", "S | \n", "T531S | \n", "0.000405 | \n", "42 | \n", "avg3muts | \n", "2 | \n", "
| 15431 | \n", "2 | \n", "531 | \n", "T | \n", "V | \n", "T531V | \n", "-0.023562 | \n", "24 | \n", "avg3muts | \n", "2 | \n", "
| 15432 | \n", "2 | \n", "531 | \n", "T | \n", "W | \n", "T531W | \n", "0.532476 | \n", "6 | \n", "avg3muts | \n", "2 | \n", "
| 15433 | \n", "2 | \n", "531 | \n", "T | \n", "Y | \n", "T531Y | \n", "-0.074142 | \n", "9 | \n", "avg3muts | \n", "2 | \n", "
15434 rows × 9 columns
\n", "