{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "2c27fe6a-0c34-48b2-aae6-c4ba47ff545d",
   "metadata": {},
   "source": [
    "# Analysis of a network of RBFE SepTop calculations\n",
    "\n",
    "In this notebook we show how to analyze a network of transformations run with the OpenFE Separated Topologies (SepTop) protocol.\n",
    "Specifically, it shows how to extract:\n",
    "\n",
    "- The overall difference in binding affinity between two ligands (DDG)\n",
    "- The MLE-derived absolute binding free energies\n",
    "- The contribution from the different legs (complex and solvent) of a transformation."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f35d97f0-2887-4be5-8c61-b5e9b0ec0eac",
   "metadata": {},
   "source": [
    "### Downloading the example dataset\n",
    "\n",
    "First let's download some example SepTop results. Please skip this section if you have already done this!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "9d3e0eec-eaec-434d-b3fa-91b8b5f842b2",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--2025-10-14 20:00:18--  https://zenodo.org/records/17435569/files/septop_results.zip\n",
      "Resolving zenodo.org (zenodo.org)... 188.185.45.92, 188.185.48.194, 188.185.43.25\n",
      "Connecting to zenodo.org (zenodo.org)|188.185.45.92|:443... connected.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 5974197 (5.7M) [application/octet-stream]\n",
      "Saving to: ‘septop_results.zip’\n",
      "\n",
      "septop_results.zip  100%[===================>]   5.70M   559KB/s    in 11s     \n",
      "\n",
      "2025-10-14 20:00:30 (529 KB/s) - ‘septop_results.zip’ saved [5974197/5974197]\n",
      "\n",
      "Archive:  septop_results.zip\n",
      "   creating: septop_results\n",
      "   creating: septop_results/results_2\n",
      "  inflating: septop_results/results_2/rbfe_7a_7b.json  \n",
      "  inflating: septop_results/results_2/rbfe_1_7b.json  \n",
      "  inflating: septop_results/results_2/rbfe_1_7a.json  \n",
      "   creating: septop_results/results_1\n",
      "  inflating: septop_results/results_1/rbfe_7a_7b.json  \n",
      "  inflating: septop_results/results_1/rbfe_1_7a.json  \n",
      "   creating: septop_results/results_0\n",
      "  inflating: septop_results/results_0/rbfe_7a_7b.json  \n",
      "  inflating: septop_results/results_0/rbfe_1_7b.json  \n",
      "  inflating: septop_results/results_0/rbfe_1_7a.json  \n",
      "  inflating: septop_results/results_0/rbfe_1_25.json  \n"
     ]
    }
   ],
   "source": [
    "!wget https://zenodo.org/records/17435569/files/septop_results.zip\n",
    "!unzip septop_results.zip"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7797cad5-08c2-404b-b5c4-cffd0df12d46",
   "metadata": {},
   "source": [
    "### Imports\n",
    "\n",
    "Here are a bunch of imports we will need later in the notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "7fbf1482-25ca-427b-a881-af88a983461c",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import os\n",
    "import pathlib\n",
    "from gufe.tokenization import JSON_HANDLER\n",
    "import pandas as pd\n",
    "from openff.units import unit\n",
    "from openfecli.commands.gather import (\n",
    "    format_estimate_uncertainty,\n",
    "    _collect_result_jsons,\n",
    "    load_json,\n",
    ")\n",
    "from cinnabar import Measurement, FEMap"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "28767183-02c3-42cc-84a6-786b4f4de470",
   "metadata": {},
   "source": [
    "### Some helper methods to load and format the SepTop results\n",
    "\n",
    "Over the next few cells, we define some helper methods that we will use to load and format the SepTop results.\n",
    "\n",
    "**Note: you do not need to interact with any of these directly, unless you want to change how the data is processed.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "c185164a-f1c6-483b-b013-f4d499889570",
   "metadata": {},
   "outputs": [],
   "source": [
    "def _load_valid_result_json(\n",
    "    fpath: os.PathLike | str,\n",
    ") -> tuple[tuple | None, dict | None]:\n",
    "    \"\"\"Load the data from a results JSON into a dict.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    fpath : os.PathLike | str\n",
    "        The path to deserialized results.\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    dict | None\n",
    "        A dict containing data from the results JSON,\n",
    "        or None if the JSON file is invalid or missing.\n",
    "\n",
    "    Raises\n",
    "    ------\n",
    "    ValueError\n",
    "      If the JSON file contains an ``estimate`` or ``uncertainty`` key with the\n",
    "      value ``None``.\n",
    "      If\n",
    "    \"\"\"\n",
    "\n",
    "    # TODO: only load this once during collection, then pass namedtuple(fname, dict) into this function\n",
    "    # for now though, it's not the bottleneck on performance\n",
    "    result = load_json(fpath)\n",
    "    try:\n",
    "        names = _get_names(result)\n",
    "    except (ValueError, IndexError):\n",
    "        print(f\"{fpath}: Missing ligand names. Skipping.\")\n",
    "        return None, None\n",
    "    if result[\"estimate\"] is None:\n",
    "        errormsg = f\"{fpath}: No 'estimate' found, assuming to be a failed simulation.\"\n",
    "        raise ValueError(errormsg)\n",
    "\n",
    "    return names, result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "3733c540-de62-45a0-ba66-268397a38b49",
   "metadata": {},
   "outputs": [],
   "source": [
    "def _get_legs_from_result_jsons(\n",
    "    result_fns: list[pathlib.Path],\n",
    ") -> dict[str, dict[str, list]]:\n",
    "    \"\"\"\n",
    "    Iterate over a list of result JSONs and populate a dict of dicts with all data needed\n",
    "    for results processing.\n",
    "\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    result_fns : list[pathlib.Path]\n",
    "        List of filepaths containing results formatted as JSON.\n",
    "    report : Literal[\"dg\", \"ddg\", \"raw\"]\n",
    "        Type of report to generate.\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    legs: dict[str, dict[str, list]]\n",
    "        Data extracted from the given result JSONs, organized by the leg's ligand names and simulation type.\n",
    "    \"\"\"\n",
    "    from collections import defaultdict\n",
    "\n",
    "    ddgs = defaultdict(lambda: defaultdict(list))\n",
    "\n",
    "    for result_fn in result_fns:\n",
    "        names, result = _load_valid_result_json(result_fn)\n",
    "        if names is None:  # this means it couldn't find names and/or simtype\n",
    "            continue\n",
    "\n",
    "        ddgs[names][\"overall\"].append([result[\"estimate\"], result[\"uncertainty\"]])\n",
    "        proto_key = [\n",
    "            k\n",
    "            for k in result[\"unit_results\"].keys()\n",
    "            if k.startswith(\"ProtocolUnitResult\")\n",
    "        ]\n",
    "        for p in proto_key:\n",
    "            if \"unit_estimate\" in result[\"unit_results\"][p][\"outputs\"]:\n",
    "                simtype = result[\"unit_results\"][p][\"outputs\"][\"simtype\"]\n",
    "                dg = result[\"unit_results\"][p][\"outputs\"][\"unit_estimate\"]\n",
    "                dg_error = result[\"unit_results\"][p][\"outputs\"][\"unit_estimate_error\"]\n",
    "\n",
    "                ddgs[names][simtype].append([dg, dg_error])\n",
    "            elif \"standard_state_correction_A\" in result[\"unit_results\"][p][\"outputs\"]:\n",
    "                corr_A = result[\"unit_results\"][p][\"outputs\"][\n",
    "                    \"standard_state_correction_A\"\n",
    "                ]\n",
    "                corr_B = result[\"unit_results\"][p][\"outputs\"][\n",
    "                    \"standard_state_correction_B\"\n",
    "                ]\n",
    "                ddgs[names][\"standard_state_correction_A\"].append(\n",
    "                    [corr_A, 0 * unit.kilocalorie_per_mole]\n",
    "                )\n",
    "                ddgs[names][\"standard_state_correction_B\"].append(\n",
    "                    [corr_B, 0 * unit.kilocalorie_per_mole]\n",
    "                )\n",
    "            else:\n",
    "                continue\n",
    "\n",
    "    return ddgs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "08afcbcf-34a8-4450-a967-eb4ed5800ee1",
   "metadata": {},
   "outputs": [],
   "source": [
    "def _get_names(result: dict) -> tuple[str, str]:\n",
    "    \"\"\"Get the ligand names from a unit's results data.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    result : dict\n",
    "        A results dict.\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    tuple[str, str]\n",
    "        Ligand names corresponding to the results.\n",
    "    \"\"\"\n",
    "    try:\n",
    "        nm = list(result[\"unit_results\"].values())[0][\"name\"]\n",
    "\n",
    "    except KeyError:\n",
    "        raise ValueError(\"Failed to guess names\")\n",
    "\n",
    "    toks = nm.split(\",\")\n",
    "    toks = toks[1].split()\n",
    "    return toks[1], toks[3]"
   ]
  },
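  {
   "cell_type": "markdown",
   "id": "e7a1b2c3-4d5e-4f60-8a9b-0c1d2e3f4a5b",
   "metadata": {},
   "source": [
    "To illustrate the parsing done by `_get_names`, here is a minimal sketch. The name string below is hypothetical: we assume the unit name is comma-separated and that its second field reads `transformation <ligand A> to <ligand B>`; the exact wording in your results files may differ.\n",
    "\n",
    "```python\n",
    "# Hypothetical unit name (an assumption); the real string is read from the results JSON\n",
    "nm = 'SepTop RBFE Setup, transformation 1 to 7a, repeat 0 generation 0'\n",
    "\n",
    "second_field = nm.split(',')[1]  # ' transformation 1 to 7a'\n",
    "toks = second_field.split()  # ['transformation', '1', 'to', '7a']\n",
    "ligand_a, ligand_b = toks[1], toks[3]\n",
    "print(ligand_a, ligand_b)\n",
    "```\n",
    "\n",
    "If the name has fewer fields than expected, the indexing raises an `IndexError`, which `_load_valid_result_json` catches and reports as missing ligand names."
   ]
  },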
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "cf88e4e4-7565-4248-8213-cdb4a3856385",
   "metadata": {},
   "outputs": [],
   "source": [
    "def _error_std(r):\n",
    "    \"\"\"\n",
    "    Calculate the error of the estimate as the std of the repeats\n",
    "    \"\"\"\n",
    "    return np.std([v[0].m for v in r[\"overall\"]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "30faf828",
   "metadata": {},
   "outputs": [],
   "source": [
    "def _error_mbar(r):\n",
    "    \"\"\"\n",
    "    Calculate the error of the estimate using the reported MBAR errors.\n",
    "\n",
    "    This also takes into account that repeats may have been run for this edge by using the average MBAR error\n",
    "    \"\"\"\n",
    "    complex_errors = [x[1].m for x in r[\"complex\"]]\n",
    "    solvent_errors = [x[1].m for x in r[\"solvent\"]]\n",
    "    return np.sqrt(np.mean(complex_errors) ** 2 + np.mean(solvent_errors) ** 2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9d3ac78b-7e96-455e-b5c5-0fe1ee407153",
   "metadata": {},
   "source": [
    "### Methods to extract and manipulate the SepTop results\n",
    "\n",
    "The next four methods allow you to extract SepTop results (``extract_results_dict``) and then manipulate them to get different types of results.\n",
    "\n",
    "These include:\n",
    "\n",
    "* ``generate_ddg``: to get the ddG values.\n",
    "* ``generate_dg_mle``: to get the MLE-derived dG values.\n",
    "* ``generate_dg_raw``: to get the raw dG values for each individual legs in the SepTop transformation cycles."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "e025eb3c-dc67-4170-afbe-0923b2cff0b4",
   "metadata": {},
   "outputs": [],
   "source": [
    "def extract_results_dict(\n",
    "    results_files: list[os.PathLike | str],\n",
    ") -> dict[str, dict[str, list]]:\n",
    "    \"\"\"\n",
    "    Get a dictionary of SepTop results from a list of directories.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    results_files : list[ps.PathLike | str]\n",
    "        A list of directors with SepTop result files to process.\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    sim_results : dict[str, dict[str, list]]\n",
    "        Simulation results, organized by the leg's ligand names and simulation type.\n",
    "    \"\"\"\n",
    "    # find and filter result jsons\n",
    "    result_fns = _collect_result_jsons(results_files)\n",
    "    # pair legs of simulations together into dict of dicts\n",
    "    sim_results = _get_legs_from_result_jsons(result_fns)\n",
    "\n",
    "    return sim_results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "411fe035-2ae2-4f98-9bab-19764af724ce",
   "metadata": {},
   "outputs": [],
   "source": [
    "def generate_ddg(results_dict: dict[str, dict[str, list]]) -> pd.DataFrame:\n",
    "    \"\"\"Compute and write out DDG values for the given results.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    results_dict : dict[str, dict[str, list]]\n",
    "        Dictionary of results created by ``extract_results_dict``.\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    pd.DataFrame\n",
    "        A pandas DataFrame with the ddG results for each ligand pair.\n",
    "    \"\"\"\n",
    "    data = []\n",
    "    # check the type of error which should be used based on the number of repeats\n",
    "    repeats = {len(v[\"overall\"]) for v in results_dict.values()}\n",
    "    error_func = _error_mbar if 1 in repeats else _error_std\n",
    "    for ligpair, results in sorted(results_dict.items()):\n",
    "        ddg = np.mean([v[0].m for v in results[\"overall\"]])\n",
    "        error = error_func(results)\n",
    "        m, u = format_estimate_uncertainty(ddg, error, unc_prec=2)\n",
    "        data.append((ligpair[0], ligpair[1], m, u))\n",
    "\n",
    "    df = pd.DataFrame(\n",
    "        data,\n",
    "        columns=[\n",
    "            \"ligand_i\",\n",
    "            \"ligand_j\",\n",
    "            \"DDG(i->j) (kcal/mol)\",\n",
    "            \"uncertainty (kcal/mol)\",\n",
    "        ],\n",
    "    )\n",
    "    return df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "213f1c7b-185f-403b-a98c-22d31a5e60e5",
   "metadata": {},
   "outputs": [],
   "source": [
    "def generate_dg_mle(results_dict: dict[str, dict[str, list]]) -> pd.DataFrame:\n",
    "    \"\"\"Compute and write out MLE-derived DG values for the given results.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    results_dict : dict[str, dict[str, list]]\n",
    "        Dictionary of results created by ``extract_results_dict``.\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    pd.DataFrame\n",
    "        A pandas DataFrame with the dG results for each ligand pair.\n",
    "    \"\"\"\n",
    "\n",
    "    DDGs = generate_ddg(results_dict)\n",
    "    fe_results = []\n",
    "    for inx, row in DDGs.iterrows():\n",
    "        ligA, ligB, DDGbind, bind_unc = row.tolist()\n",
    "        m = Measurement(\n",
    "            labelA=ligA,\n",
    "            labelB=ligB,\n",
    "            DG=DDGbind * unit.kilocalorie_per_mole,\n",
    "            uncertainty=bind_unc * unit.kilocalorie_per_mole,\n",
    "            computational=True,\n",
    "        )\n",
    "        fe_results.append(m)\n",
    "\n",
    "    # Feed into the FEMap object\n",
    "    femap = FEMap()\n",
    "\n",
    "    for entry in fe_results:\n",
    "        femap.add_measurement(entry)\n",
    "\n",
    "    femap.generate_absolute_values()\n",
    "    df = femap.get_absolute_dataframe()\n",
    "    df = df.iloc[:, :3]\n",
    "    df.rename({\"label\": \"ligand\"}, axis=\"columns\", inplace=True)\n",
    "\n",
    "    return df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "76ced921-9d12-40c1-b95e-57430ea84778",
   "metadata": {},
   "outputs": [],
   "source": [
    "def generate_dg_raw(results_dict: dict[str, dict[str, list]]) -> pd.DataFrame:\n",
    "    \"\"\"\n",
    "    Get all the transformation cycle legs found and their DG values.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    results_dict : dict[str, dict[str, list]]\n",
    "        Dictionary of results created by ``extract_results_dict``.\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    pd.DataFrame\n",
    "        A pandas DataFrame with the individual cycle leg dG results.\n",
    "    \"\"\"\n",
    "    data = []\n",
    "    for ligpair, results in sorted(results_dict.items()):\n",
    "        for simtype, repeats in sorted(results.items()):\n",
    "            if simtype != \"overall\":\n",
    "                for repeat in repeats:\n",
    "                    m, u = format_estimate_uncertainty(\n",
    "                        repeat[0].m, repeat[1].m, unc_prec=2\n",
    "                    )\n",
    "                    data.append((simtype, ligpair[0], ligpair[1], m, u))\n",
    "\n",
    "    df = pd.DataFrame(\n",
    "        data,\n",
    "        columns=[\n",
    "            \"leg\",\n",
    "            \"ligand_i\",\n",
    "            \"ligand_j\",\n",
    "            \"DG(i->j) (kcal/mol)\",\n",
    "            \"uncertainty (kcal/mol)\",\n",
    "        ],\n",
    "    )\n",
    "    return df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0128d483-e043-4205-a20e-195f9b59974c",
   "metadata": {},
   "source": [
    "## Analyzing your results\n",
    "\n",
    "Now that we have defined a set of methods to help us extract results. Let's analyze the results!"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "98a84e37-91c3-4f78-8a68-a54f8835f7d2",
   "metadata": {},
   "source": [
    "### Specify result directories and gather results\n",
    "\n",
    "Let's start by gathering all our simulation results. First we define all the directories where our SepTop results exist. Here we assume that our simulation repeats sit in three different results directories under `septop_results`, named from `results_0` to `results_2`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "3b81c26b-4e3d-4bca-bfb1-4f72c75ec1bf",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/ialibay/software/mambaforge/install/envs/openfe/lib/python3.12/site-packages/openmoltools/utils.py:9: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.\n",
      "  from pkg_resources import resource_filename\n",
      "/home/ialibay/software/mambaforge/install/envs/openfe/lib/python3.12/site-packages/Bio/Application/__init__.py:39: BiopythonDeprecationWarning: The Bio.Application modules and modules relying on it have been deprecated.\n",
      "\n",
      "Due to the on going maintenance burden of keeping command line application\n",
      "wrappers up to date, we have decided to deprecate and eventually remove these\n",
      "modules.\n",
      "\n",
      "We instead now recommend building your command line and invoking it directly\n",
      "with the subprocess module.\n",
      "  warnings.warn(\n"
     ]
    }
   ],
   "source": [
    "# Specify paths to result directories\n",
    "results_dir = [\n",
    "    pathlib.Path(\"septop_results/results_0\"),\n",
    "    pathlib.Path(\"septop_results/results_1\"),\n",
    "    pathlib.Path(\"septop_results/results_2\"),\n",
    "]\n",
    "ddgs = extract_results_dict(results_dir)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d6e47322-bd5b-4b3b-b601-c406c03f8284",
   "metadata": {},
   "source": [
    "### Obtain the overall difference in binding affinity for all edges in the network\n",
    "\n",
    "With these extracted results, we can now get the ddG prediction between each pair of ligand.\n",
    "\n",
    "**Note: if only a single repeat was run, the MBAR error is used as uncertainty estimate, while the standard deviation is used when results from more than one repeat are provided.**"
   ]
  },
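  {
   "cell_type": "markdown",
   "id": "b4c5d6e7-8f90-4a1b-9c2d-3e4f5a6b7c8d",
   "metadata": {},
   "source": [
    "As a sketch of how the two uncertainty estimates differ, the snippet below applies both to hypothetical repeat values (the numbers are made up for illustration and are not taken from the example dataset). It mirrors the arithmetic in `_error_std` and `_error_mbar`.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Hypothetical values in kcal/mol for one edge with three repeats\n",
    "overall = [0.4, 0.9, 0.5]  # DDG estimate from each repeat\n",
    "complex_err = [0.7, 0.7, 0.6]  # MBAR error of the complex leg per repeat\n",
    "solvent_err = [1.2, 1.4, 1.4]  # MBAR error of the solvent leg per repeat\n",
    "\n",
    "# Standard deviation of the DDG estimates across repeats (np.std default, ddof=0)\n",
    "err_std = np.std(overall)\n",
    "\n",
    "# Mean MBAR error of each leg, combined in quadrature\n",
    "err_mbar = np.sqrt(np.mean(complex_err) ** 2 + np.mean(solvent_err) ** 2)\n",
    "\n",
    "print(f'std over repeats: {err_std:.2f}, MBAR in quadrature: {err_mbar:.2f}')\n",
    "```\n",
    "\n",
    "The two estimates measure different things: the first reflects the spread between independent repeats, while the second reflects the statistical error reported within each simulation."
   ]
  },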
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "46996a74-709c-41f2-ac39-0f77fb33371e",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_ddg = generate_ddg(ddgs)\n",
    "df_ddg.to_csv(\"ddg.tsv\", sep=\"\\t\", lineterminator=\"\\n\", index=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "b2b6fdf0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ligand_i</th>\n",
       "      <th>ligand_j</th>\n",
       "      <th>DDG(i-&gt;j) (kcal/mol)</th>\n",
       "      <th>uncertainty (kcal/mol)</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>25</td>\n",
       "      <td>2.0</td>\n",
       "      <td>1.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>7a</td>\n",
       "      <td>0.6</td>\n",
       "      <td>1.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>7b</td>\n",
       "      <td>0.1</td>\n",
       "      <td>1.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>7a</td>\n",
       "      <td>7b</td>\n",
       "      <td>1.9</td>\n",
       "      <td>1.5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  ligand_i ligand_j DDG(i->j) (kcal/mol) uncertainty (kcal/mol)\n",
       "0        1       25                  2.0                    1.6\n",
       "1        1       7a                  0.6                    1.5\n",
       "2        1       7b                  0.1                    1.5\n",
       "3       7a       7b                  1.9                    1.5"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_ddg"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0c3525c0-416d-4c18-9ff8-074bebbda0ea",
   "metadata": {},
   "source": [
    "### Obtain the MLE-derived absolute binding affinities\n",
    "\n",
    "We can also get the estimated binding dG for each ligand.\n",
    "\n",
    "This uses the maximum-likelihood method as implemented in cinnabar."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "22e25226-0073-40a4-9cbc-da802a29fc25",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_dg = generate_dg_mle(ddgs)\n",
    "df_dg.to_csv(\"dg.tsv\", sep=\"\\t\", lineterminator=\"\\n\", index=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "cda55931",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ligand</th>\n",
       "      <th>DG (kcal/mol)</th>\n",
       "      <th>uncertainty (kcal/mol)</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>-0.675</td>\n",
       "      <td>0.664267</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>25</td>\n",
       "      <td>1.325</td>\n",
       "      <td>1.311964</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>7a</td>\n",
       "      <td>-0.875</td>\n",
       "      <td>0.903466</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>7b</td>\n",
       "      <td>0.225</td>\n",
       "      <td>0.903466</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  ligand  DG (kcal/mol)  uncertainty (kcal/mol)\n",
       "0      1         -0.675                0.664267\n",
       "1     25          1.325                1.311964\n",
       "2     7a         -0.875                0.903466\n",
       "3     7b          0.225                0.903466"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_dg"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9cb15526-7972-4776-8109-8aa8a49817be",
   "metadata": {},
   "source": [
    "### Obtain the raw DGs of every leg in the thermodynamic cycle\n",
    "\n",
    "If needed, you can also get the individual dG results for each leg of the SepTop transformation cycles.\n",
    "This can be useful in various different situation, such as when trying to diagnose which part of your simulation has the highest uncertainty."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "8b2c1dd8-ffa3-4585-94a7-4a1ed1454f30",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_raw = generate_dg_raw(ddgs)\n",
    "df_raw.to_csv(\"ddg_raw.tsv\", sep=\"\\t\", lineterminator=\"\\n\", index=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "08b72901-9c71-460b-b5da-9fb4a35e07f7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>leg</th>\n",
       "      <th>ligand_i</th>\n",
       "      <th>ligand_j</th>\n",
       "      <th>DG(i-&gt;j) (kcal/mol)</th>\n",
       "      <th>uncertainty (kcal/mol)</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>complex</td>\n",
       "      <td>1</td>\n",
       "      <td>25</td>\n",
       "      <td>39.88</td>\n",
       "      <td>0.61</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>solvent</td>\n",
       "      <td>1</td>\n",
       "      <td>25</td>\n",
       "      <td>37.9</td>\n",
       "      <td>1.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>standard_state_correction_A</td>\n",
       "      <td>1</td>\n",
       "      <td>25</td>\n",
       "      <td>-9.2</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>standard_state_correction_B</td>\n",
       "      <td>1</td>\n",
       "      <td>25</td>\n",
       "      <td>9.3</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>complex</td>\n",
       "      <td>1</td>\n",
       "      <td>7a</td>\n",
       "      <td>-0.82</td>\n",
       "      <td>0.71</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>complex</td>\n",
       "      <td>1</td>\n",
       "      <td>7a</td>\n",
       "      <td>-0.35</td>\n",
       "      <td>0.69</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>complex</td>\n",
       "      <td>1</td>\n",
       "      <td>7a</td>\n",
       "      <td>-2.10</td>\n",
       "      <td>0.62</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>solvent</td>\n",
       "      <td>1</td>\n",
       "      <td>7a</td>\n",
       "      <td>-1.9</td>\n",
       "      <td>1.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>solvent</td>\n",
       "      <td>1</td>\n",
       "      <td>7a</td>\n",
       "      <td>-1.5</td>\n",
       "      <td>1.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>solvent</td>\n",
       "      <td>1</td>\n",
       "      <td>7a</td>\n",
       "      <td>-1.6</td>\n",
       "      <td>1.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>standard_state_correction_A</td>\n",
       "      <td>1</td>\n",
       "      <td>7a</td>\n",
       "      <td>-8.8</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>standard_state_correction_A</td>\n",
       "      <td>1</td>\n",
       "      <td>7a</td>\n",
       "      <td>-9.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>standard_state_correction_A</td>\n",
       "      <td>1</td>\n",
       "      <td>7a</td>\n",
       "      <td>-9.1</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>standard_state_correction_B</td>\n",
       "      <td>1</td>\n",
       "      <td>7a</td>\n",
       "      <td>9.1</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>standard_state_correction_B</td>\n",
       "      <td>1</td>\n",
       "      <td>7a</td>\n",
       "      <td>9.1</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>standard_state_correction_B</td>\n",
       "      <td>1</td>\n",
       "      <td>7a</td>\n",
       "      <td>9.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>complex</td>\n",
       "      <td>1</td>\n",
       "      <td>7b</td>\n",
       "      <td>2.48</td>\n",
       "      <td>0.84</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>complex</td>\n",
       "      <td>1</td>\n",
       "      <td>7b</td>\n",
       "      <td>3.63</td>\n",
       "      <td>0.74</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>solvent</td>\n",
       "      <td>1</td>\n",
       "      <td>7b</td>\n",
       "      <td>2.8</td>\n",
       "      <td>1.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>solvent</td>\n",
       "      <td>1</td>\n",
       "      <td>7b</td>\n",
       "      <td>3.1</td>\n",
       "      <td>1.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>standard_state_correction_A</td>\n",
       "      <td>1</td>\n",
       "      <td>7b</td>\n",
       "      <td>-9.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>standard_state_correction_A</td>\n",
       "      <td>1</td>\n",
       "      <td>7b</td>\n",
       "      <td>-9.3</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>standard_state_correction_B</td>\n",
       "      <td>1</td>\n",
       "      <td>7b</td>\n",
       "      <td>9.1</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>standard_state_correction_B</td>\n",
       "      <td>1</td>\n",
       "      <td>7b</td>\n",
       "      <td>9.3</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>complex</td>\n",
       "      <td>7a</td>\n",
       "      <td>7b</td>\n",
       "      <td>3.17</td>\n",
       "      <td>0.80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>complex</td>\n",
       "      <td>7a</td>\n",
       "      <td>7b</td>\n",
       "      <td>1.65</td>\n",
       "      <td>0.68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>complex</td>\n",
       "      <td>7a</td>\n",
       "      <td>7b</td>\n",
       "      <td>-2.06</td>\n",
       "      <td>0.58</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>solvent</td>\n",
       "      <td>7a</td>\n",
       "      <td>7b</td>\n",
       "      <td>-1.2</td>\n",
       "      <td>1.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>solvent</td>\n",
       "      <td>7a</td>\n",
       "      <td>7b</td>\n",
       "      <td>-0.9</td>\n",
       "      <td>1.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>solvent</td>\n",
       "      <td>7a</td>\n",
       "      <td>7b</td>\n",
       "      <td>-1.1</td>\n",
       "      <td>1.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>standard_state_correction_A</td>\n",
       "      <td>7a</td>\n",
       "      <td>7b</td>\n",
       "      <td>-9.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>standard_state_correction_A</td>\n",
       "      <td>7a</td>\n",
       "      <td>7b</td>\n",
       "      <td>-9.4</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>standard_state_correction_A</td>\n",
       "      <td>7a</td>\n",
       "      <td>7b</td>\n",
       "      <td>-9.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>standard_state_correction_B</td>\n",
       "      <td>7a</td>\n",
       "      <td>7b</td>\n",
       "      <td>8.9</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>standard_state_correction_B</td>\n",
       "      <td>7a</td>\n",
       "      <td>7b</td>\n",
       "      <td>9.3</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>standard_state_correction_B</td>\n",
       "      <td>7a</td>\n",
       "      <td>7b</td>\n",
       "      <td>9.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                            leg ligand_i ligand_j DG(i->j) (kcal/mol)  \\\n",
       "0                       complex        1       25               39.88   \n",
       "1                       solvent        1       25                37.9   \n",
       "2   standard_state_correction_A        1       25                -9.2   \n",
       "3   standard_state_correction_B        1       25                 9.3   \n",
       "4                       complex        1       7a               -0.82   \n",
       "5                       complex        1       7a               -0.35   \n",
       "6                       complex        1       7a               -2.10   \n",
       "7                       solvent        1       7a                -1.9   \n",
       "8                       solvent        1       7a                -1.5   \n",
       "9                       solvent        1       7a                -1.6   \n",
       "10  standard_state_correction_A        1       7a                -8.8   \n",
       "11  standard_state_correction_A        1       7a                -9.0   \n",
       "12  standard_state_correction_A        1       7a                -9.1   \n",
       "13  standard_state_correction_B        1       7a                 9.1   \n",
       "14  standard_state_correction_B        1       7a                 9.1   \n",
       "15  standard_state_correction_B        1       7a                 9.0   \n",
       "16                      complex        1       7b                2.48   \n",
       "17                      complex        1       7b                3.63   \n",
       "18                      solvent        1       7b                 2.8   \n",
       "19                      solvent        1       7b                 3.1   \n",
       "20  standard_state_correction_A        1       7b                -9.0   \n",
       "21  standard_state_correction_A        1       7b                -9.3   \n",
       "22  standard_state_correction_B        1       7b                 9.1   \n",
       "23  standard_state_correction_B        1       7b                 9.3   \n",
       "24                      complex       7a       7b                3.17   \n",
       "25                      complex       7a       7b                1.65   \n",
       "26                      complex       7a       7b               -2.06   \n",
       "27                      solvent       7a       7b                -1.2   \n",
       "28                      solvent       7a       7b                -0.9   \n",
       "29                      solvent       7a       7b                -1.1   \n",
       "30  standard_state_correction_A       7a       7b                -9.0   \n",
       "31  standard_state_correction_A       7a       7b                -9.4   \n",
       "32  standard_state_correction_A       7a       7b                -9.0   \n",
       "33  standard_state_correction_B       7a       7b                 8.9   \n",
       "34  standard_state_correction_B       7a       7b                 9.3   \n",
       "35  standard_state_correction_B       7a       7b                 9.0   \n",
       "\n",
       "   uncertainty (kcal/mol)  \n",
       "0                    0.61  \n",
       "1                     1.4  \n",
       "2                     0.0  \n",
       "3                     0.0  \n",
       "4                    0.71  \n",
       "5                    0.69  \n",
       "6                    0.62  \n",
       "7                     1.2  \n",
       "8                     1.4  \n",
       "9                     1.4  \n",
       "10                    0.0  \n",
       "11                    0.0  \n",
       "12                    0.0  \n",
       "13                    0.0  \n",
       "14                    0.0  \n",
       "15                    0.0  \n",
       "16                   0.84  \n",
       "17                   0.74  \n",
       "18                    1.3  \n",
       "19                    1.3  \n",
       "20                    0.0  \n",
       "21                    0.0  \n",
       "22                    0.0  \n",
       "23                    0.0  \n",
       "24                   0.80  \n",
       "25                   0.68  \n",
       "26                   0.58  \n",
       "27                    1.3  \n",
       "28                    1.3  \n",
       "29                    1.2  \n",
       "30                    0.0  \n",
       "31                    0.0  \n",
       "32                    0.0  \n",
       "33                    0.0  \n",
       "34                    0.0  \n",
       "35                    0.0  "
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_raw"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.8"
  },
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "state": {},
    "version_major": 2,
    "version_minor": 0
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}