{ "cells": [ { "cell_type": "markdown", "id": "35354229", "metadata": {}, "source": [ "# Setting up and running absolute binding free energy calculations\n", "\n", "This tutorial gives a step-by-step process to set up an absolute binding free energy (ABFE) simulation campaign using OpenFE. In this tutorial we are performing an absolute binding free energy calculation of toluene binding to T4 Lysozyme.\n", "\n", "The absolute binding free energy is obtained through a thermodynamic cycle. The interactions of the molecule are decoupled both in solvent, giving $\\Delta G$(solvent) and in the complex, giving $\\Delta G$(complex), which allows calculation of the absolute binding free energy, $\\Delta G$(ABFE). \n", "\n", "Note: In this `Protocol`, the coulombic interactions of the molecule are fully turned off (annihilated), while the Lennard-Jones interactions are decoupled, meaning the intermolecular interactions are turned off, whilst keeping the intramolecular Lennard-Jones interactions." ] }, { "cell_type": "markdown", "id": "aa696921-9fef-4a5e-ab66-a5023fa56a80", "metadata": {}, "source": [ "## \"Drawing\"" ] }, { "cell_type": "markdown", "id": "f9154ba7", "metadata": { "jp-MarkdownHeadingCollapsed": true }, "source": [ "## 0. Setup for Google Colab\n", "\n", "If you are running this example in Google Colab, run the following cells to setup the environment. If you are running this notebook locally, skip down to `1. Loading the ligand`" ] }, { "cell_type": "code", "execution_count": null, "id": "6173dd67", "metadata": {}, "outputs": [], "source": [ "# NBVAL_SKIP\n", "# Only run this cell if on google colab\n", "import os\n", "if \"COLAB_RELEASE_TAG\" in os.environ:\n", " # fix for colab's torchvision causing issues\n", " !rm -r /usr/local/lib/python3.12/dist-packages/torchvision\n", " \n", " !pip install -q condacolab\n", " import condacolab\n", " condacolab.install_from_url(\"https://github.com/OpenFreeEnergy/openfe/releases/download/v1.7.0/OpenFEforge-1.7.0-Linux-x86_64.sh\")" ] }, { "cell_type": "code", "execution_count": null, "id": "2e888557", "metadata": {}, "outputs": [], "source": [ "# NBVAL_SKIP\n", "# Only run this cell if on google colab\n", "import os\n", "if \"COLAB_RELEASE_TAG\" in os.environ:\n", " import condacolab\n", " import locale\n", " locale.getpreferredencoding = lambda: \"UTF-8\"\n", " !mkdir inputs && cd inputs && openfe fetch rbfe-tutorial\n", " # quick fix for https://github.com/conda-incubator/condacolab/issues/75\n", " # if deprecation warnings persist, rerun this cell\n", " import warnings\n", " warnings.filterwarnings(action=\"ignore\", message=r\"datetime.datetime.utcnow\") \n", " for _ in range(3):\n", " # Sometimes we have to re-run the check\n", " try:\n", " condacolab.check()\n", " except:\n", " pass\n", " else:\n", " break" ] }, { "cell_type": "markdown", "id": "2fea29c3", "metadata": {}, "source": [ "## 1. Loading the ligand\n", "\n", "First we must load the chemical models for which we wish to calculate the binding free energy.\n", "In this example these are initially stored in a molfile (`.sdf`).\n", "This can be loaded using the `SDMolSupplier` class from rdkit and passed to openfe." ] }, { "cell_type": "code", "execution_count": 1, "id": "fc97de03", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/ialibay/software/mambaforge/install/envs/openfe/lib/python3.12/site-packages/openmoltools/utils.py:9: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.\n", " from pkg_resources import resource_filename\n" ] } ], "source": [ "%matplotlib inline\n", "import openfe" ] }, { "cell_type": "code", "execution_count": 2, "id": "41cf8be7", "metadata": {}, "outputs": [], "source": [ "from rdkit import Chem\n", "supp = Chem.SDMolSupplier(\"../cookbook/assets/toluene.sdf\", removeHs=False)\n", "ligands = [openfe.SmallMoleculeComponent.from_rdkit(mol) for mol in supp]" ] }, { "cell_type": "markdown", "id": "92e8c3b5-dcea-401e-be9d-7a9ced9f702c", "metadata": {}, "source": [ "## 2. Charging the ligand\n", "\n", "It is recommended to use a single set of charges for each ligand to ensure reproducibility between repeats. \n", "\n", "Here we will use some utility functions from OpenFE which can assign partial charges to a series of molecules with a variety of methods which can be configured via the `OpenFFPartialChargeSettings` class. In this example \n", "we will charge the ligands using the `am1bcc` method from `ambertools` which is the default charge scheme used by OpenFE." ] }, { "cell_type": "code", "execution_count": 3, "id": "8281b9f9-7bfc-404d-b20d-0c6136d57473", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Generating charges: 100%|█████████████████████████| 1/1 [00:00<00:00, 1.44it/s]\n" ] } ], "source": [ "from openfe.protocols.openmm_utils.omm_settings import OpenFFPartialChargeSettings\n", "from openfe.protocols.openmm_utils.charge_generation import bulk_assign_partial_charges\n", "\n", "charge_settings = OpenFFPartialChargeSettings(partial_charge_method=\"am1bcc\", off_toolkit_backend=\"ambertools\")\n", "\n", "ligands = bulk_assign_partial_charges(\n", " molecules=ligands,\n", " overwrite=False, \n", " method=charge_settings.partial_charge_method,\n", " toolkit_backend=charge_settings.off_toolkit_backend,\n", " generate_n_conformers=charge_settings.number_of_conformers,\n", " nagl_model=charge_settings.nagl_model,\n", " processors=1\n", ")" ] }, { "cell_type": "markdown", "id": "d0cb1329", "metadata": {}, "source": [ "## 3. Creating `ChemicalSystem`s\n", "\n", "OpenFE describes complex molecular systems as being composed of `Component`s. For example, we have a `SmallMoleculeComponent` for each small molecule, a `SolventComponent` to describe the solvent, and a `ProteinComponent` to describe the protein.\n", "\n", "The `Component`s are joined in a `ChemicalSystem` to define the entire system.\n", "\n", "In state A of the binding free energy `Protocol`, the ligand is fully interacting in the complex and the `ChemicalSystem` contains the ligand, the protein, and the solvent. In the other endstate, state B, the ligand is fully decoupled in the complex. Therefore, the `ChemicalSystem` in state B only contains the protein and the solvent.\n", "\n", "Note that for ABFE simulations, we are not separately defining the solvent state, but the `Protocol` creates that based on the complex states." ] }, { "cell_type": "code", "execution_count": 4, "id": "9d2fbc22", "metadata": {}, "outputs": [], "source": [ "# defaults are water with NaCl at 0.15 M\n", "solvent = openfe.SolventComponent()" ] }, { "cell_type": "code", "execution_count": 5, "id": "534184d1-3acc-4031-beea-144b0add1fac", "metadata": {}, "outputs": [], "source": [ "protein = openfe.ProteinComponent.from_pdb_file(\"../cookbook/assets/t4_lysozyme.pdb\")" ] }, { "cell_type": "code", "execution_count": 6, "id": "710285ca", "metadata": {}, "outputs": [], "source": [ "# In state A the ligand is fully interacting in the solvent\n", "systemA = openfe.ChemicalSystem({\n", " 'ligand': ligands[0],\n", " 'protein': protein,\n", " 'solvent': solvent,\n", "}, name=ligands[0].name)\n", "# In state B the ligand is fully decoupled in the solvent, therefore we are only defining the solvent here\n", "systemB = openfe.ChemicalSystem({\n", " 'protein': protein,\n", " 'solvent': solvent,\n", "})" ] }, { "cell_type": "markdown", "id": "33aa384c-54f4-4b42-8b6d-38b07b4e3c47", "metadata": {}, "source": [ "## 4. Defining the ABFE simulation settings and creating a `Protocol`" ] }, { "cell_type": "markdown", "id": "a0993eec-50f4-49d2-83ba-e3c166b9377e", "metadata": {}, "source": [ "There are various different parameters which can be set to determine how the ABFE simulation will take place. \n", "\n", "The easiest way to customize `Protocol` settings is to start with the default settings, and modify them. Many settings carry units with them." ] }, { "cell_type": "code", "execution_count": 7, "id": "c31c712b-4844-477b-8aa4-ded6f0c8ca5f", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/ialibay/software/mambaforge/install/envs/openfe/lib/python3.12/site-packages/Bio/Application/__init__.py:39: BiopythonDeprecationWarning: The Bio.Application modules and modules relying on it have been deprecated.\n", "\n", "Due to the on going maintenance burden of keeping command line application\n", "wrappers up to date, we have decided to deprecate and eventually remove these\n", "modules.\n", "\n", "We instead now recommend building your command line and invoking it directly\n", "with the subprocess module.\n", " warnings.warn(\n" ] } ], "source": [ "from openfe.protocols.openmm_afe import AbsoluteBindingProtocol;" ] }, { "cell_type": "code", "execution_count": 8, "id": "fb839094", "metadata": {}, "outputs": [], "source": [ "settings = AbsoluteBindingProtocol.default_settings()" ] }, { "cell_type": "code", "execution_count": 9, "id": "8b99f77f-c70c-436d-b4eb-fb462a4b043e", "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'alchemical_settings': {},\n", " 'complex_equil_output_settings': {'checkpoint_interval': {'unit': 'nanosecond',\n", " 'val': 1.0},\n", " 'checkpoint_storage_filename': 'checkpoint.chk',\n", " 'equil_npt_structure': 'equil_npt_structure.pdb',\n", " 'equil_nvt_structure': 'equil_nvt_structure.pdb',\n", " 'forcefield_cache': 'db.json',\n", " 'log_output': 'production_equil_simulation.log',\n", " 'minimized_structure': 'minimized.pdb',\n", " 'output_indices': 'all',\n", " 'preminimized_structure': 'system.pdb',\n", " 'production_trajectory_filename': 'production_equil.xtc',\n", " 'trajectory_write_interval': {'unit': 'picosecond',\n", " 'val': 20.0}},\n", " 'complex_equil_simulation_settings': {'equilibration_length': {'unit': 'nanosecond',\n", " 'val': 0.5},\n", " 'equilibration_length_nvt': {'unit': 'nanosecond',\n", " 'val': 0.25},\n", " 'minimization_steps': 5000,\n", " 'production_length': {'unit': 'nanosecond',\n", " 'val': 5.0}},\n", " 'complex_lambda_settings': {'lambda_elec': [0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.1,\n", " 0.2,\n", " 0.3,\n", " 0.4,\n", " 0.5,\n", " 0.6,\n", " 0.7,\n", " 0.8,\n", " 0.9,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0],\n", " 'lambda_restraints': [0.0,\n", " 0.2,\n", " 0.4,\n", " 0.6,\n", " 0.8,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0],\n", " 'lambda_vdw': [0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.1,\n", " 0.2,\n", " 0.3,\n", " 0.4,\n", " 0.5,\n", " 0.6,\n", " 0.65,\n", " 0.7,\n", " 0.75,\n", " 0.8,\n", " 0.85,\n", " 0.9,\n", " 0.95,\n", " 1.0]},\n", " 'complex_output_settings': {'checkpoint_interval': {'unit': 'nanosecond',\n", " 'val': 1.0},\n", " 'checkpoint_storage_filename': 'complex_checkpoint.nc',\n", " 'forcefield_cache': 'db.json',\n", " 'output_filename': 'complex.nc',\n", " 'output_indices': 'not water',\n", " 'output_structure': 'alchemical_system.pdb',\n", " 'positions_write_frequency': {'unit': 'picosecond',\n", " 'val': 100.0},\n", " 'velocities_write_frequency': None},\n", " 'complex_simulation_settings': {'early_termination_target_error': {'unit': 'kilocalorie_per_mole',\n", " 'val': 0.0},\n", " 'equilibration_length': {'unit': 'nanosecond',\n", " 'val': 1},\n", " 'minimization_steps': 5000,\n", " 'n_replicas': 30,\n", " 'production_length': {'unit': 'nanosecond',\n", " 'val': 10.0},\n", " 'real_time_analysis_interval': {'unit': 'picosecond',\n", " 'val': 250.0},\n", " 'real_time_analysis_minimum_time': {'unit': 'picosecond',\n", " 'val': 500.0},\n", " 'sampler_method': 'repex',\n", " 'sams_flatness_criteria': 'logZ-flatness',\n", " 'sams_gamma0': 1.0,\n", " 'time_per_iteration': {'unit': 'picosecond',\n", " 'val': 2.5}},\n", " 'complex_solvation_settings': {'box_shape': 'dodecahedron',\n", " 'box_size': None,\n", " 'box_vectors': None,\n", " 'number_of_solvent_molecules': None,\n", " 'solvent_model': 'tip3p',\n", " 'solvent_padding': {'unit': 'nanometer',\n", " 'val': 1.0}},\n", " 'engine_settings': {'compute_platform': 'cuda', 'gpu_device_index': None},\n", " 'forcefield_settings': {'constraints': 'hbonds',\n", " 'forcefields': ['amber/ff14SB.xml',\n", " 'amber/tip3p_standard.xml',\n", " 'amber/tip3p_HFE_multivalent.xml',\n", " 'amber/phosaa10.xml'],\n", " 'hydrogen_mass': 3.0,\n", " 'nonbonded_cutoff': {'unit': 'nanometer', 'val': 0.9},\n", " 'nonbonded_method': 'PME',\n", " 'rigid_water': True,\n", " 'small_molecule_forcefield': 'openff-2.2.1'},\n", " 'integrator_settings': {'barostat_frequency': {'unit': 'timestep',\n", " 'val': 25.0},\n", " 'constraint_tolerance': 1e-06,\n", " 'langevin_collision_rate': {'unit': '1 / picosecond',\n", " 'val': 1.0},\n", " 'n_restart_attempts': 20,\n", " 'reassign_velocities': False,\n", " 'remove_com': False,\n", " 'timestep': {'unit': 'femtosecond', 'val': 4.0}},\n", " 'partial_charge_settings': {'nagl_model': None,\n", " 'number_of_conformers': None,\n", " 'off_toolkit_backend': 'ambertools',\n", " 'partial_charge_method': 'am1bcc'},\n", " 'protocol_repeats': 3,\n", " 'restraint_settings': {'K_phiA': {'unit': 'kilojoule_per_mole / radian ** 2',\n", " 'val': 334.72},\n", " 'K_phiB': {'unit': 'kilojoule_per_mole / radian ** 2',\n", " 'val': 334.72},\n", " 'K_phiC': {'unit': 'kilojoule_per_mole / radian ** 2',\n", " 'val': 334.72},\n", " 'K_r': {'unit': 'kilojoule_per_mole / nanometer ** 2',\n", " 'val': 4184.0},\n", " 'K_thetaA': {'unit': 'kilojoule_per_mole / radian ** 2',\n", " 'val': 334.72},\n", " 'K_thetaB': {'unit': 'kilojoule_per_mole / radian ** 2',\n", " 'val': 334.72},\n", " 'anchor_finding_strategy': 'bonded',\n", " 'dssp_filter': True,\n", " 'host_max_distance': {'unit': 'nanometer', 'val': 1.5},\n", " 'host_min_distance': {'unit': 'nanometer', 'val': 0.5},\n", " 'host_selection': 'backbone',\n", " 'rmsf_cutoff': {'unit': 'nanometer', 'val': 0.1}},\n", " 'solvent_equil_output_settings': {'checkpoint_interval': {'unit': 'nanosecond',\n", " 'val': 1.0},\n", " 'checkpoint_storage_filename': 'checkpoint.chk',\n", " 'equil_npt_structure': 'equil_npt_structure.pdb',\n", " 'equil_nvt_structure': 'equil_nvt_structure.pdb',\n", " 'forcefield_cache': 'db.json',\n", " 'log_output': 'production_equil_simulation.log',\n", " 'minimized_structure': 'minimized.pdb',\n", " 'output_indices': 'all',\n", " 'preminimized_structure': 'system.pdb',\n", " 'production_trajectory_filename': 'production_equil.xtc',\n", " 'trajectory_write_interval': {'unit': 'picosecond',\n", " 'val': 20.0}},\n", " 'solvent_equil_simulation_settings': {'equilibration_length': {'unit': 'nanosecond',\n", " 'val': 0.2},\n", " 'equilibration_length_nvt': {'unit': 'nanosecond',\n", " 'val': 0.1},\n", " 'minimization_steps': 5000,\n", " 'production_length': {'unit': 'nanosecond',\n", " 'val': 0.5}},\n", " 'solvent_lambda_settings': {'lambda_elec': [0.0,\n", " 0.25,\n", " 0.5,\n", " 0.75,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0,\n", " 1.0],\n", " 'lambda_restraints': [0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0],\n", " 'lambda_vdw': [0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.12,\n", " 0.24,\n", " 0.36,\n", " 0.48,\n", " 0.6,\n", " 0.7,\n", " 0.77,\n", " 0.85,\n", " 1.0]},\n", " 'solvent_output_settings': {'checkpoint_interval': {'unit': 'nanosecond',\n", " 'val': 1.0},\n", " 'checkpoint_storage_filename': 'solvent_checkpoint.nc',\n", " 'forcefield_cache': 'db.json',\n", " 'output_filename': 'solvent.nc',\n", " 'output_indices': 'not water',\n", " 'output_structure': 'alchemical_system.pdb',\n", " 'positions_write_frequency': {'unit': 'picosecond',\n", " 'val': 100.0},\n", " 'velocities_write_frequency': None},\n", " 'solvent_simulation_settings': {'early_termination_target_error': {'unit': 'kilocalorie_per_mole',\n", " 'val': 0.0},\n", " 'equilibration_length': {'unit': 'nanosecond',\n", " 'val': 1.0},\n", " 'minimization_steps': 5000,\n", " 'n_replicas': 14,\n", " 'production_length': {'unit': 'nanosecond',\n", " 'val': 10.0},\n", " 'real_time_analysis_interval': {'unit': 'picosecond',\n", " 'val': 250.0},\n", " 'real_time_analysis_minimum_time': {'unit': 'picosecond',\n", " 'val': 500.0},\n", " 'sampler_method': 'repex',\n", " 'sams_flatness_criteria': 'logZ-flatness',\n", " 'sams_gamma0': 1.0,\n", " 'time_per_iteration': {'unit': 'picosecond',\n", " 'val': 2.5}},\n", " 'solvent_solvation_settings': {'box_shape': 'dodecahedron',\n", " 'box_size': None,\n", " 'box_vectors': None,\n", " 'number_of_solvent_molecules': None,\n", " 'solvent_model': 'tip3p',\n", " 'solvent_padding': {'unit': 'nanometer',\n", " 'val': 1.5}},\n", " 'thermo_settings': {'ph': None,\n", " 'pressure': {'unit': 'bar', 'val': 1},\n", " 'redox_potential': None,\n", " 'temperature': {'unit': 'kelvin', 'val': 298.15}}}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/home/ialibay/software/mambaforge/install/envs/openfe/lib/python3.12/site-packages/gufe/settings/models.py:30: PydanticDeprecatedSince20: The `dict` method is deprecated; use `model_dump` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/\n", " pprint.pprint(self.dict())\n" ] } ], "source": [ "settings" ] }, { "cell_type": "markdown", "id": "9f81f879-9eea-4af0-b772-e28832912487", "metadata": {}, "source": [ "By default we run 3 repeats with solvent and complex simulation lengths of 10 ns over 14 and 30 lambda windows. \n", "To speed things up here we instead run 1 repeat with both solvent and complex simulation lengths of 0.5 ns over 14 and 30 lambda windows.\n", "\n", "Changing default values:" ] }, { "cell_type": "code", "execution_count": 10, "id": "55067780-d228-4661-8c1e-5cb0217fd2dc", "metadata": {}, "outputs": [], "source": [ "from openff.units import unit\n", "\n", "# Run only a single repeat\n", "settings.protocol_repeats = 1\n", "# Change the min and max distance between protein and ligand atoms for Boresch restraints to avoid periodicity issues\n", "settings.restraint_settings.host_min_distance = 0.5 * unit.nanometer\n", "settings.restraint_settings.host_max_distance = 1.5 * unit.nanometer\n", "# The compute platform is CUDA by default, but we're just showing you how you could change it\n", "settings.engine_settings.compute_platform = 'CUDA'" ] }, { "cell_type": "markdown", "id": "8c32f1b9-b09e-41ac-b66d-3fbc8ee433ec", "metadata": {}, "source": [ "### Creating the `Protocol`\n", "With the Settings inspected and adjusted, we can provide these to the `Protocol`. This `Protocol` defines the procedure to estimate a free energy difference between two chemical systems, with the details of the two end states yet to be defined." ] }, { "cell_type": "code", "execution_count": 11, "id": "23f6322b-0336-4aa9-b9d0-ebe533dc5753", "metadata": {}, "outputs": [], "source": [ "protocol = AbsoluteBindingProtocol(settings=settings)" ] }, { "cell_type": "markdown", "id": "7f1dfcff-bb06-4508-9f16-4e3de18a5f93", "metadata": {}, "source": [ "## 5. Running the ABFE simulation" ] }, { "cell_type": "markdown", "id": "fe618563-c445-4fb3-bc0c-6c08148ddede", "metadata": {}, "source": [ "### (a) Using the CLI" ] }, { "cell_type": "markdown", "id": "38129396-ba19-46e9-9eb0-36b0e3a7ad3c", "metadata": {}, "source": [ "Once we have the ChemicalSystems, and the `Protocol`, we can create the Transformation. " ] }, { "cell_type": "code", "execution_count": 12, "id": "1231fad8-b37d-4008-a851-5ad546386286", "metadata": {}, "outputs": [], "source": [ "transformation = openfe.Transformation(\n", " stateA=systemA,\n", " stateB=systemB,\n", " mapping=None,\n", " protocol=protocol, # use protocol created above\n", " name=f\"{systemA.name}\"\n", " )" ] }, { "cell_type": "markdown", "id": "451e3345-0b30-4dbf-93b5-28847fad1c9a", "metadata": {}, "source": [ "We'll write out the transformation to disk, so that it can be run using the openfe quickrun command:" ] }, { "cell_type": "code", "execution_count": 13, "id": "b69efbcd-b888-4afc-a33e-01860889f408", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/ialibay/software/mambaforge/install/envs/openfe/lib/python3.12/site-packages/gufe/transformations/transformation.py:124: DeprecationWarning: use of this method is deprecated; instead use `to_json`\n", " warnings.warn(\n" ] } ], "source": [ "import pathlib\n", "# first we create the directory\n", "transformation_dir = pathlib.Path(\"abfe_json\")\n", "transformation_dir.mkdir(exist_ok=True)\n", "\n", "# then we write out the transformation\n", "transformation.dump(transformation_dir / f\"{transformation.name}.json\")" ] }, { "cell_type": "markdown", "id": "5b4f7397-d79e-4bea-a473-1750b27d7b44", "metadata": {}, "source": [ "You can run the ABFE simulation from the CLI by using the `openfe quickrun` command. It takes a transformation JSON as input, and the flags -o to give the final output JSON file and -d for the directory where simulation results should be stored. For example,\n", "\n", "`openfe quickrun path/to/transformation.json -o results.json -d working-directory`\n", "\n", "where path/to/transformation.json is the path to one of the files created above." ] }, { "cell_type": "markdown", "id": "bc919dc3-1281-49d2-ad92-4a3355d8bbe5", "metadata": {}, "source": [ "### (b) Using the Python API" ] }, { "cell_type": "markdown", "id": "21b64394-fdc7-47e9-aadd-488a55c1d0bb", "metadata": {}, "source": [ "**Creating the `ProtocolDAG`**\n", "\n", "Once we have the two `ChemicalSystem`s, and the `Protocol`, we can create the `ProtocolDAG`.\n", "\n", "This creates a directed-acyclic-graph (DAG) of computational tasks necessary for creating an estimate of the free energy difference between the two chemical systems." ] }, { "cell_type": "code", "execution_count": 14, "id": "44ba94ca", "metadata": {}, "outputs": [], "source": [ "dag = protocol.create(stateA=systemA, stateB=systemB, mapping=None)" ] }, { "cell_type": "markdown", "id": "4283dfe4", "metadata": {}, "source": [ "To summarize, this `ProtocolDAG` contains:\n", "- chemical models of both sides of the alchemical transformation in `systemA` and `systemB`\n", "- a description of the exact computational algorithm to use to perform the estimate in `Protocol`\n", "- the `mapping` is set to `None` since no atoms are mapped in the ABFE protocol" ] }, { "cell_type": "markdown", "id": "1e29d1c8", "metadata": {}, "source": [ "**Executing the simulation**\n", "\n", "The DAG contains many invdividual jobs. We could execute them sequentially in this notebook using the `gufe.protocols.execute` function, however, this calculation is too expensive to be run in that manner, therefore the `execulte_DAG` command is commented out here.\n", "\n", "In a more realistic (expansive) situation we would farm off the individual jobs to a HPC cluster or cloud compute service so they could be executed in parallel.\n", "\n", "Note: we use the `shared_basedir` and `scratch_basedir` argument of `execute_DAG` in order to set the directory where the simulation files are written to" ] }, { "cell_type": "code", "execution_count": 15, "id": "6dbedb47-46b9-4c22-ad74-a580174a359c", "metadata": {}, "outputs": [], "source": [ "from gufe.protocols import execute_DAG\n", "import pathlib" ] }, { "cell_type": "code", "execution_count": 16, "id": "be690ef1-3243-4114-b56f-5babbb660af6", "metadata": { "scrolled": true }, "outputs": [], "source": [ "# Finally we can run the simulations\n", "path = pathlib.Path('./abfe_results')\n", "\n", "# Commented out since this would be too expensive to run in this notebook.\n", "# path.mkdir()\n", "# Execute the DAG\n", "# dag_results = execute_DAG(dag, scratch_basedir=path, shared_basedir=path, n_retries=3)" ] }, { "cell_type": "markdown", "id": "d91f78b8-58dc-4780-880f-5a643cbb18c9", "metadata": {}, "source": [ "## 6. Analysis" ] }, { "cell_type": "markdown", "id": "27f1f7d0-a294-41a7-8b97-70878747a22d", "metadata": {}, "source": [ "Finally now that we've run our simulations, let's go ahead and gather the free\n", "energies for both phases." ] }, { "cell_type": "markdown", "id": "8dfef043-35de-4943-a728-571a83c13161", "metadata": {}, "source": [ "### Python API" ] }, { "cell_type": "markdown", "id": "cea729d2-231f-4759-8a11-d6d09c1428fb", "metadata": {}, "source": [ "If you executed the simulations using the Python API, you will have generated a `dag_results` object. You can analyze these results by calling the `Protocol`s' `gather()` method. This takes a **list** of completed DAG results and returns a `AbsoluteBindingProtocolResult` which can return a free energy estimate and uncertainty by calling the `get_estimate()` and `get_uncertainty()` methods.\n", "\n", "The code for this part is commented out since executing the DAG in this notebook would be too expensive. " ] }, { "cell_type": "code", "execution_count": 17, "id": "d0f6b8fb-dde2-442a-b641-3160a70e2f84", "metadata": {}, "outputs": [], "source": [ "# Get the complex and solvent results\n", "# protocol_results = protocol.gather([dag_results])\n", "\n", "# print(f\"ABFE dG: {protocol_results.get_estimate()}, err {protocol_results.get_uncertainty()}\")" ] }, { "cell_type": "markdown", "id": "b23748c3-3fa6-45f2-8fcb-e471bd28dd3d", "metadata": {}, "source": [ "You can save the `AbsoluteBindingProtocolResult` to a JSON output file in the following manner:" ] }, { "cell_type": "code", "execution_count": 18, "id": "847b5663-2e6c-4cb2-aa9d-39d132fcff3d", "metadata": {}, "outputs": [], "source": [ "# Save the results in a json file\n", "# import gzip\n", "# import json\n", "# import gufe\n", "# outdict = {\n", "# \"estimate\": protocol_results.get_estimate(),\n", "# \"uncertainty\": protocol_results.get_uncertainty(),\n", "# \"protocol_result\": protocol_results.to_dict(),\n", "# \"unit_results\": {\n", "# unit.key: unit.to_keyed_dict()\n", "# for unit in dag_results.protocol_unit_results\n", "# }\n", "# }\n", "\n", "# with open(\"abfe_json/toluene_results.json\") as stream:\n", "# json.dump(outdict, stream, cls=gufe.tokenization.JSON_HANDLER.encoder)" ] }, { "cell_type": "markdown", "id": "43f3626a-74c1-4835-be7b-fd6bf0c81562", "metadata": {}, "source": [ "### CLI / Quickrun" ] }, { "cell_type": "markdown", "id": "aa3978a2-ed90-4706-b3be-554237d40751", "metadata": {}, "source": [ "If you ran the simulation using the CLI (i.e. by calling `openfe quickrun` ) you will end up with the same JSON output file as the one created in the previous cell. To get your simulation results you can load them back into Python in the following manner:" ] }, { "cell_type": "code", "execution_count": null, "id": "f62f69a1-09c0-4a4a-9b37-9663b51a75ac", "metadata": {}, "outputs": [], "source": [ "# import gzip\n", "# import json\n", "# import gufe\n", "\n", "# outfile = \"abfe_results/toluene_results.json\"\n", "# with open(outfile) as stream:\n", "# results = json.load(stream)\n", "# estimate = results['estimate']\n", "# uncertainty = results['uncertainty']\n", "\n", "# estimate" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.11" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }