{ "cells": [ { "cell_type": "markdown", "id": "088a1bc4-e5f3-47ac-8ebf-1a904fa82f80", "metadata": {}, "source": [ "# Running a Molecular Dynamics (MD) simulation of a protein-ligand complex" ] }, { "cell_type": "markdown", "id": "7266db2c-37e5-419a-9015-929ea1635d98", "metadata": {}, "source": [ "In this notebook we run an MD simulation of benzene bound to T4-lysozyme L99A.\n", "\n", "" ] }, { "cell_type": "markdown", "id": "00d9ca69-1b38-4cf8-9c93-1fc4c08dae15", "metadata": { "editable": true, "nbsphinx": "hidden", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "## On the MD protocol" ] }, { "cell_type": "markdown", "id": "8bc61b88-0c8d-4b89-b704-8ac4ade19c6c", "metadata": { "editable": true, "raw_mimetype": "", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "The plain MD protocol allows the user to run an MD simulation either in solvent or vacuum of e.g. a small molecule, a protein, or a protein-ligand complex. Running an MD simulations can be useful for a variety of things, such as pre-equilibration of a protein-ligand complex prior to running free energy calculations or for getting insights into the general dynamics of the system of interest.\n", "\n", "The MD protocol uses different software tools in a series of steps and provides multiple outputs for the user:" ] }, { "cell_type": "markdown", "id": "1938576f-98a2-42de-b161-3ff221d53c52", "metadata": {}, "source": [ "| **Step** | **Software used** | **Outputs (with default names)** |\n", "|:----------------------------------------------------|:-----------------------------------|:-----------------------------------------------------------|\n", "| 1. Input handling using gufe | OpenFE, Gufe, RDKit ||\n", "| 2. Parameterization using OpenMMForceFields & OpenFF| OpenFE - OpenMMForceFields - OpenFF| Forcefield cache (`db.json`) |\n", "| 3. OpenMM object creation | OpenFE - OpenMM + OpenMMTools | Structure of the full system (`system.pdb`) |\n", "| 4. Minimization | OpenFE - OpenMM + OpenMMTools | Minimized Structure (`minimized.pdb`) |\n", "| 5. NVT equilibration (if not gas phase) | OpenFE - OpenMM + OpenMMTools | NVT equilibrated structure (`equil_nvt.pdb`) |\n", "| 6. NPT equilibration (if not gas phase) | OpenFE - OpenMM + OpenMMTools | NPT equilibrated structure (`equil_npt.pdb`) |\n", "| 7. NPT production (if not gas phase) | OpenFE - OpenMM + OpenMMTools | Simulation trajectory (`simulation.xtc`), Checkpoint file (`checkpoint.chk`), Log output (`simulation.log`) |\n" ] }, { "cell_type": "markdown", "id": "fc27c86c", "metadata": {}, "source": [ "## 0. Setup for Google Colab\n", "\n", "If you are running this example in Google Colab, run the following cells to setup the environment. If you are running this notebook locally, skip down to `1. Overview`" ] }, { "cell_type": "code", "execution_count": null, "id": "6559bc05", "metadata": {}, "outputs": [], "source": [ "# NBVAL_SKIP\n", "# Only run this cell if on google colab\n", "import os\n", "if \"COLAB_RELEASE_TAG\" in os.environ:\n", " # fix for colab's torchvision causing issues\n", " !rm -r /usr/local/lib/python3.12/dist-packages/torchvision\n", " \n", " !pip install -q condacolab\n", " import condacolab\n", " condacolab.install_from_url(\"https://github.com/OpenFreeEnergy/openfe/releases/download/v1.7.0/OpenFEforge-1.7.0-Linux-x86_64.sh\")" ] }, { "cell_type": "code", "execution_count": null, "id": "eb1368ca", "metadata": {}, "outputs": [], "source": [ "# NBVAL_SKIP\n", "# Only run this cell if on google colab\n", "import os\n", "if \"COLAB_RELEASE_TAG\" in os.environ:\n", " import condacolab\n", " import locale\n", " locale.getpreferredencoding = lambda: \"UTF-8\"\n", " !mkdir inputs && cd inputs && openfe fetch rbfe-tutorial\n", " # quick fix for https://github.com/conda-incubator/condacolab/issues/75\n", " # if deprecation warnings persist, rerun this cell\n", " import warnings\n", " warnings.filterwarnings(action=\"ignore\", message=r\"datetime.datetime.utcnow\") \n", " for _ in range(3):\n", " # Sometimes we have to re-run the check\n", " try:\n", " condacolab.check()\n", " except:\n", " pass\n", " else:\n", " break" ] }, { "cell_type": "markdown", "id": "0eb8203a-650f-4703-b032-dbafb605061b", "metadata": {}, "source": [ "## 1. Defining the ChemicalSystem" ] }, { "cell_type": "markdown", "id": "7c477c87-a099-4e4e-9231-fa5edf9f15f4", "metadata": {}, "source": [ "`ChemicalSystems` are OpenFE containers which define the various components which exist in a system of interest. \n", "Here, we will be passing the `SmallMoleculeComponent` for benzene, a `ProteinComponent` generated from a PDB file, and a `SolventComponent` which will contain the necessary information for OpenMM’s Modeller class to add water and 0.15 M NaCl around the solute." ] }, { "cell_type": "code", "execution_count": 3, "id": "20fc8142-c618-4d50-b903-5a04f6a34d5c", "metadata": {}, "outputs": [], "source": [ "import openfe\n", "from openfe import ChemicalSystem, ProteinComponent, SmallMoleculeComponent, SolventComponent\n", "from openff.units import unit\n", "\n", "# Define the ligand we are interested in\n", "ligand = SmallMoleculeComponent.from_sdf_file('assets/benzene.sdf')\n", "\n", "# Define the solvent environment and protein structure\n", "solvent = SolventComponent(ion_concentration=0.15 * unit.molar)\n", "protein = ProteinComponent.from_pdb_file('assets/t4_lysozyme.pdb', name='t4-lysozyme')\n", "\n", "# create the ChemicalSystem\n", "system = ChemicalSystem({'ligand': ligand, 'protein': protein, 'solvent': solvent}, name=f\"{ligand.name}_{protein.name}\")" ] }, { "cell_type": "markdown", "id": "e936355a-8184-4c7c-8f21-019d60845037", "metadata": {}, "source": [ "## 2. Defining the MD simulation settings" ] }, { "cell_type": "markdown", "id": "12ac4696-c8d7-4674-9fc0-c9415c2c888e", "metadata": {}, "source": [ "There are various different parameters which can be set to determine how the MD simulation will take place. To allow for maximum user flexibility, these are defined as a series of settings objects which control the following:\n", "\n", "| **Setting** | **Description** |\n", "|:------------------------------|:-----------------------------------------------------------|\n", "| `simulation_settings` | Parameters controlling the simulation plan, including the number of `minimization_steps`, the length of the NVT and NPT equilibration (`equilibration_length_nvt` and `equilibration_length`), and the length of the production MD run (`production_length`). |\n", "| `output_settings` | Parameters controlling the output from the MD simulations, including file names to save the system after minimization, NVT and NPT equilibration, and production run. Special `output_indices` can be defined to select which portions of the system should be saved (default: `not water`). A `trajectory_write_interval` determines the frequency of writing frames to the output trajectory. |\n", "| `forcefield_settings` | Settings that define the forcefield for the components, including the general `forcefields`, the `small_molecule_forcefield`, the `nonbonded_method`, and the `nonbonded_cutoff`. |\n", "| `engine_settings` | Parameters determining how the OpenMM engine will execute the simulation. This controls the `compute_platform` which will be used to carry out the simulation. |\n", "| `integrator_settings` | Parameters controlling the LangevinSplittingDynamicsMove integrator used for simulation, as well as the `barostat_frequency`. |\n", "| `partial_charge_settings` | Settings that define which method is used for assigning partial charges. |\n", "| `protocol_repeats` | Defines how often to run the MD protocol. |\n", "| `solvation_settings` | Parameters to control the `solvent_model` and the `solvent_padding`. |\n", "| `thermo_settings` | Parameters to control e.g. the `temperature` and the `pressure` of the system. |\n", "\n", "The easiest way to access and change settings is by first importing the default settings, printing them and then changing the settings according to the user's needs." ] }, { "cell_type": "code", "execution_count": 4, "id": "b27e54cc-fd6a-4afc-ab22-4dde0561938c", "metadata": {}, "outputs": [], "source": [ "from openfe.protocols.openmm_md.plain_md_methods import PlainMDProtocol\n", "from openff.units import unit\n", "\n", "settings = PlainMDProtocol.default_settings()\n", "settings.simulation_settings.equilibration_length_nvt = 0.01 * unit.nanosecond # setting the nvt equilibration length to 10 ps\n", "settings.simulation_settings.equilibration_length = 0.01 * unit.nanosecond # setting the npt equilibration length to 10 ps\n", "# Setting the production length and checkpoint interval to 20 ps to match the trajectory write interval, so one frame will be written\n", "settings.simulation_settings.production_length = 0.02 * unit.nanosecond # setting the npt production length to 20 ps\n", "settings.output_settings.checkpoint_interval = 0.02 * unit.nanosecond # setting the checkpoint interval to 20 ps\n", "settings.engine_settings.compute_platform = 'CPU' # running the simulation on the cpu\n", "settings.solvation_settings.solvent_padding = 1.0 * unit.nanometer # set the solvent padding to 1 nm to reduce the number of waters" ] }, { "cell_type": "code", "execution_count": 5, "id": "b69668c9-2e74-421e-a9ea-b7f3bc6c6e5a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'engine_settings': {'compute_platform': 'CPU', 'gpu_device_index': None},\n", " 'forcefield_settings': {'constraints': 'hbonds',\n", " 'forcefields': ['amber/ff14SB.xml',\n", " 'amber/tip3p_standard.xml',\n", " 'amber/tip3p_HFE_multivalent.xml',\n", " 'amber/phosaa10.xml'],\n", " 'hydrogen_mass': 3.0,\n", " 'nonbonded_cutoff': {'unit': 'nanometer', 'val': 0.9},\n", " 'nonbonded_method': 'PME',\n", " 'rigid_water': True,\n", " 'small_molecule_forcefield': 'openff-2.2.1'},\n", " 'integrator_settings': {'barostat_frequency': {'unit': 'timestep',\n", " 'val': 25.0},\n", " 'constraint_tolerance': 1e-06,\n", " 'langevin_collision_rate': {'unit': '1 / picosecond',\n", " 'val': 1.0},\n", " 'n_restart_attempts': 20,\n", " 'reassign_velocities': False,\n", " 'remove_com': False,\n", " 'timestep': {'unit': 'femtosecond', 'val': 4.0}},\n", " 'output_settings': {'checkpoint_interval': {'unit': 'nanosecond', 'val': 0.02},\n", " 'checkpoint_storage_filename': 'checkpoint.chk',\n", " 'equil_npt_structure': 'equil_npt.pdb',\n", " 'equil_nvt_structure': 'equil_nvt.pdb',\n", " 'forcefield_cache': 'db.json',\n", " 'log_output': 'simulation.log',\n", " 'minimized_structure': 'minimized.pdb',\n", " 'output_indices': 'not water',\n", " 'preminimized_structure': 'system.pdb',\n", " 'production_trajectory_filename': 'simulation.xtc',\n", " 'trajectory_write_interval': {'unit': 'picosecond',\n", " 'val': 20.0}},\n", " 'partial_charge_settings': {'nagl_model': None,\n", " 'number_of_conformers': None,\n", " 'off_toolkit_backend': 'ambertools',\n", " 'partial_charge_method': 'am1bcc'},\n", " 'protocol_repeats': 1,\n", " 'simulation_settings': {'equilibration_length': {'unit': 'nanosecond',\n", " 'val': 0.01},\n", " 'equilibration_length_nvt': {'unit': 'nanosecond',\n", " 'val': 0.01},\n", " 'minimization_steps': 5000,\n", " 'production_length': {'unit': 'nanosecond',\n", " 'val': 0.02}},\n", " 'solvation_settings': {'box_shape': 'dodecahedron',\n", " 'box_size': None,\n", " 'box_vectors': None,\n", " 'number_of_solvent_molecules': None,\n", " 'solvent_model': 'tip3p',\n", " 'solvent_padding': {'unit': 'nanometer', 'val': 1.0}},\n", " 'thermo_settings': {'ph': None,\n", " 'pressure': {'unit': 'bar', 'val': 1},\n", " 'redox_potential': None,\n", " 'temperature': {'unit': 'kelvin', 'val': 298.15}}}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/Users/atravitz/micromamba/envs/openfe-conda/lib/python3.11/site-packages/gufe/settings/models.py:30: PydanticDeprecatedSince20: The `dict` method is deprecated; use `model_dump` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/\n", " pprint.pprint(self.dict())\n" ] } ], "source": [ "settings" ] }, { "cell_type": "markdown", "id": "c0f9416e-4623-4468-8e05-155c737599e6", "metadata": {}, "source": [ "## 3. Creating a `Protocol`\n", "\n", "The actual simulation is performed by a [`Protocol`](https://docs.openfree.energy/en/stable/guide/models/execution.html#protocols-and-the-execution-model). \n", "\n", "With the `Settings` inspected and adjusted, we can provide these to the `Protocol`. Here, the OpenMM-based MD Protocol is named `PlainMDProtocol`." ] }, { "cell_type": "code", "execution_count": 6, "id": "3b0f5a26-b23b-4ab2-9f1e-293d98ed5bcd", "metadata": {}, "outputs": [], "source": [ "# Creating the Protocol\n", "from openfe.protocols.openmm_md.plain_md_methods import PlainMDProtocol\n", "protocol = PlainMDProtocol(settings=settings)" ] }, { "cell_type": "markdown", "id": "64eaff78-9d15-47c9-8c20-d9ba15449488", "metadata": {}, "source": [ "## 4. Creating the `NonTransformation`\n", "Once we have the `ChemicalSystem`s, and the `Protocol`, we can create the `NonTransformation`. `NonTransformation` here simply means that the system is not \"transformed\" between two end states as is the case in binding free energy calculations." ] }, { "cell_type": "code", "execution_count": 7, "id": "987cceda-1253-417b-8cca-bc750ac0aa1e", "metadata": {}, "outputs": [], "source": [ "nontransformation = openfe.NonTransformation(\n", " system=system,\n", " protocol=protocol, # use protocol created above\n", " name=f\"{system.name}\",\n", ")" ] }, { "cell_type": "markdown", "id": "0c6d8278-ddda-4778-902a-18fef9196a07", "metadata": {}, "source": [ "## 5. Running the MD simulation\n", "There are two ways in which you could execute this MD simulation, either using the OpenFE command-line interface (CLI) or the Python API." ] }, { "cell_type": "markdown", "id": "1f61de94-0e10-4353-9fa0-d04bad7117dc", "metadata": {}, "source": [ "**(a) Using the CLI**" ] }, { "cell_type": "markdown", "id": "e762a3d5-7de5-4b46-aa34-b4b31feb05d6", "metadata": {}, "source": [ "We'll write out the transformation to disk, so that it can be run using the `openfe quickrun` command:" ] }, { "cell_type": "code", "execution_count": 8, "id": "7689c7ba-39a2-49ea-b82b-e3b198354c5b", "metadata": {}, "outputs": [], "source": [ "import pathlib\n", "# first we create the directory\n", "md_dir = pathlib.Path(\"md_input\")\n", "md_dir.mkdir(exist_ok=True)\n", "\n", "# then we write out the transformation\n", "nontransformation.to_json(md_dir / f\"{nontransformation.name}.json\")" ] }, { "cell_type": "markdown", "id": "3fbac123-f73e-450c-82e9-15f0dd4bc7da", "metadata": {}, "source": [ "You can run the MD simulation from the CLI by using the `openfe quickrun` command. It\n", "takes a transformation JSON as input, and the flags `-o` to give the final\n", "output JSON file and `-d` for the directory where simulation results should be\n", "stored. For example,\n", "\n", "```bash\n", "openfe quickrun path/to/nontransformation.json -o results.json -d working-directory\n", "```\n", "\n", "where `path/to/nontransformation.json` is the path to one of the files created above (`md_input/benzene_t4-lysozyme.json`).\n", "\n", "**(b) Using the Python API**\n", "\n", "Alternatively, the MD simulation can be run by executing the `ProtocolDAG`. The `ProtocolDAG` is created using the `protocol.create()` method and requires as input the `ChemicalSystem`. \n", "\n", "Note: we use the ``shared_basedir`` and ``scratch_basedir`` argument of ``execute_DAG`` in order to set the directory where the simulation files are written to." ] }, { "cell_type": "code", "execution_count": 9, "id": "544d037d-3a53-4390-91b8-3104c00694f0", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:openfe.utils.system_probe.log:SYSTEM CONFIG DETAILS:\n", "INFO:openfe.utils.system_probe.log.hostname:hostname: 'Alyssas-Macbook-Pro.local'\n", "INFO:openfe.utils.system_probe.log.gpu:CUDA-based GPU not found\n", "INFO:openfe.utils.system_probe.log:Memory used: 20.7G (63.6%)\n", "INFO:openfe.utils.system_probe.log:scratch_PlainMDProtocolUnit-52e66ff618014bb3877c15dc0d137171_attempt_0: 43% full (530.3G free)\n", "INFO:gufekey.openfe.protocols.openmm_md.plain_md_methods.PlainMDProtocolUnit:Creating system\n", "INFO:openmmforcefields.generators.template_generators:Requested to generate parameters for residue \n", "INFO:openmmforcefields.generators.template_generators:Generating a residue template for [H][c]1[c]([H])[c]([H])[c]([H])[c]([H])[c]1[H] using openff-2.2.1\n", "INFO:openff.interchange.smirnoff._nonbonded:Preset charges applied to atom index 0\n", "INFO:openff.interchange.smirnoff._nonbonded:Preset charges applied to atom index 1\n", "INFO:openff.interchange.smirnoff._nonbonded:Preset charges applied to atom index 2\n", "INFO:openff.interchange.smirnoff._nonbonded:Preset charges applied to atom index 3\n", "INFO:openff.interchange.smirnoff._nonbonded:Preset charges applied to atom index 4\n", "INFO:openff.interchange.smirnoff._nonbonded:Preset charges applied to atom index 5\n", "INFO:openff.interchange.smirnoff._nonbonded:Preset charges applied to atom index 6\n", "INFO:openff.interchange.smirnoff._nonbonded:Preset charges applied to atom index 7\n", "INFO:openff.interchange.smirnoff._nonbonded:Preset charges applied to atom index 8\n", "INFO:openff.interchange.smirnoff._nonbonded:Preset charges applied to atom index 9\n", "INFO:openff.interchange.smirnoff._nonbonded:Preset charges applied to atom index 10\n", "INFO:openff.interchange.smirnoff._nonbonded:Preset charges applied to atom index 11\n", "/Users/atravitz/micromamba/envs/openfe-conda/lib/python3.11/site-packages/openfe/protocols/openmm_utils/omm_compute.py:76: UserWarning: Non-CUDA platform selected: CPU, this may significantly impact simulation performance\n", " warnings.warn(wmsg)\n", "WARNING:root:Non-CUDA platform selected: CPU, this may significantly impact simulation performance\n", "INFO:openfe.protocols.openmm_md.plain_md_methods:minimizing systems\n", "INFO:openfe.protocols.openmm_md.plain_md_methods:Running NVT equilibration\n", "INFO:openfe.protocols.openmm_md.plain_md_methods:Completed NVT equilibration in 19.77936577796936 seconds\n", "INFO:openfe.protocols.openmm_md.plain_md_methods:Running NPT equilibration\n", "INFO:openfe.protocols.openmm_md.plain_md_methods:Completed NPT equilibration in 22.49833083152771 seconds\n", "INFO:openfe.protocols.openmm_md.plain_md_methods:running production phase\n", "INFO:openfe.protocols.openmm_md.plain_md_methods:Completed simulation in 45.916950941085815 seconds\n" ] } ], "source": [ "import gufe\n", "import pathlib\n", "\n", "# Creating the Protocol\n", "protocol = PlainMDProtocol(settings=settings)\n", "# Creating the Protocol DAG\n", "dag = protocol.create(stateA=system, stateB=system, mapping=None)\n", "workdir = pathlib.Path('./')\n", "# Running the MD simulations\n", "dagres = gufe.protocols.execute_DAG(\n", " dag,\n", " shared_basedir=workdir,\n", " scratch_basedir=workdir,\n", " keep_shared=True, # set this to True to save the outputs\n", " n_retries=3\n", ")" ] }, { "cell_type": "markdown", "id": "2d69fb85-d2da-4cb0-9b73-060d684618bb", "metadata": {}, "source": [ "Following files were created for the MD run. Note that ff multiple repeats of the MD simulation were run (`protocol_repeats` > 1), you will get N full replicates of these output files with N being the number of repeats." ] }, { "cell_type": "code", "execution_count": 10, "id": "16fbf816-8d3d-4121-ab0e-5cd7741788d6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ls: shared_PlainMDProtocolUnit-6b85013e6cb94120baf447540650643b_attempt_0/: No such file or directory\n" ] } ], "source": [ "!ls shared_PlainMDProtocolUnit-6b85013e6cb94120baf447540650643b_attempt_0/" ] }, { "cell_type": "markdown", "id": "44ad3ba8-e6f5-456a-ae0e-d2fcb7b8b152", "metadata": {}, "source": [ "### Performance consideration for gas phase MD simulations\n", "For gas phase MD simulations, we suggest setting `OPENMM_CPU_THREADS` to `1` to obtain good performance." ] }, { "cell_type": "code", "execution_count": null, "id": "b7f8de97-95b6-4c83-b5ef-17fae1614557", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.14" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }