{ "cells": [ { "cell_type": "raw", "id": "4d9beae7-b6bd-4743-9614-42450df0697d", "metadata": { "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ ".. _Loading Molecules:\n", "\n", "Loading Molecules and Chemical Systems\n", "======================================" ] }, { "cell_type": "markdown", "id": "088a1bc4-e5f3-47ac-8ebf-1a904fa82f80", "metadata": { "nbsphinx": "hidden", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "# Loading Molecules and Chemical Systems" ] }, { "cell_type": "raw", "id": "acb1ed05-1874-4dd3-a41d-e18623bece44", "metadata": { "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "Loading Molecule Data into Components\n", "-------------------------------------" ] }, { "cell_type": "markdown", "id": "00d9ca69-1b38-4cf8-9c93-1fc4c08dae15", "metadata": { "nbsphinx": "hidden", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "## Loading Molecule Data into Components" ] }, { "cell_type": "markdown", "id": "8bc61b88-0c8d-4b89-b704-8ac4ade19c6c", "metadata": { "raw_mimetype": "", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "One of the first tasks you'll likely want to do is loading your various input files. In `openfe` the entire contents of a simulation volume, for example the ligand, protein and water is referred to\n", "as the `ChemicalSystem`.\n", "\n", "A free energy difference is defined as being between two such `ChemicalSystem` objects. To make expressing free energy calculations easier,this `ChemicalSystem` is broken down into various `Component` objects. It is these `Component` objects that are then transformed, added or removed when performing a free energy calculation.\n", "\n", "
\n", " Once a chemical model is loaded into a Component it is read only and cannot be modified.\n", " This means that any modification/tweaking of the inputs must be done before any Component objects are created.\n", " This is done so that any data cannot be accidentally modified, ruining the provenance chain.\n", "
\n", "\n", "As these all behave slightly differently to accomodate their contents, there are specialised versions of Component to handle the different items in your system. We will walk through how different items can be loaded, and then how these are assembled to form `ChemicalSystem` objects." ] }, { "cell_type": "raw", "id": "d7277536-be61-4d45-b09f-1a7b355bee82", "metadata": { "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ ".. _Loading small molecules:\n", "\n", "Loading small molecules\n", "~~~~~~~~~~~~~~~~~~~~~~~" ] }, { "cell_type": "markdown", "id": "d2658885-8ad9-4142-b163-17c8aa576b00", "metadata": { "nbsphinx": "hidden", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "### Loading small molecules" ] }, { "cell_type": "markdown", "id": "15115699-8bdb-4261-9afd-18ebd55bd06a", "metadata": { "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "Small molecules, such as ligands, are handled using the `SmallMoleculeComponent` class. These are lightweight wrappers around RDKit Molecules and can be created directly from an RDKit molecule:" ] }, { "cell_type": "code", "execution_count": 1, "id": "1436cd71-2a65-4cef-9f9f-73013c791a17", "metadata": { "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "from rdkit import Chem\n", "import openfe\n", "\n", "m = Chem.MolFromMol2File('assets/benzene.mol2', removeHs=False)\n", "\n", "smc = openfe.SmallMoleculeComponent(m, name='benzene')" ] }, { "cell_type": "markdown", "id": "8c78efe2-5a0d-4aac-afd1-3e6c5b7a1efc", "metadata": { "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "
\n", " Remember to include the removeHs=False keyword argument so that RDKit does not strip your hydrogens!\n", "
" ] }, { "cell_type": "markdown", "id": "b092b780-a2cb-4ece-9d77-c09f2e035b8f", "metadata": { "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "As these types of structures are typically stored inside sdf files, there is a `from_sdf_file` convenience class method:" ] }, { "cell_type": "code", "execution_count": 2, "id": "7f7e56c9-2bee-46a1-9ccf-7003b4886249", "metadata": { "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "import openfe\n", "\n", "smc = openfe.SmallMoleculeComponent.from_sdf_file('assets/benzene.sdf')" ] }, { "cell_type": "markdown", "id": "2dbd57c0-87ee-426a-b0ac-1861480a0d34", "metadata": { "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "
\n", " The from_sdf_file method will only read the first molecule in a multi-molecule file.\n", "
\n", "\n", "To load multiple molcules, use RDKit's `Chem.SDMolSupplier` to iterate over the contents, and create a `SmallMoleculeComponent` from each." ] }, { "cell_type": "code", "execution_count": 3, "id": "85704a16-86b6-4c25-9a4d-050fd215a2ef", "metadata": { "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "from rdkit import Chem\n", "import openfe\n", "\n", "molecules = [\n", " openfe.SmallMoleculeComponent(mol) \n", " for mol in Chem.SDMolSupplier(\n", " \"assets/somebenzenes.sdf\", \n", " removeHs=False,\n", " )\n", "]" ] }, { "cell_type": "raw", "id": "b1c7eea8-7aaa-4a72-8d48-dfb7c32a60eb", "metadata": { "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ ".. _Loading proteins:\n", "\n", "Loading proteins\n", "~~~~~~~~~~~~~~~~" ] }, { "cell_type": "markdown", "id": "852c038e-d4dc-4fac-975c-d5e6baee719e", "metadata": { "nbsphinx": "hidden", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "### Loading proteins" ] }, { "cell_type": "markdown", "id": "13839df1-fa2c-4e0a-8322-56ce466aea37", "metadata": { "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "Proteins are handled using a `ProteinComponent`. Like `SmallMoleculeComponent`, these are based upon RDKit Molecules; however, they are expected to have the `MonomerInfo` struct present on all atoms. This struct contains the residue and chain information and is essential to apply many popular force fields. A \"protein\" here is considered as the fully modelled entire biological assembly, including all chains, structural waters, ions, and so forth.\n", "\n", "To load a protein, use the `ProteinComponent.from_pdb_file` or `ProteinComponent.from_pdbx_file` class methods." ] }, { "cell_type": "code", "execution_count": 4, "id": "669ea0d6-3f49-4e91-b992-95f1bd5e7174", "metadata": { "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "import openfe\n", "\n", "p = openfe.ProteinComponent.from_pdb_file('assets/t4_lysozyme.pdb')" ] }, { "cell_type": "raw", "id": "f0e245cc-8763-48da-8908-1ab28e6b84d9", "metadata": { "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ ".. _Defining solvents:\n", "\n", "Defining solvents\n", "~~~~~~~~~~~~~~~~~" ] }, { "cell_type": "markdown", "id": "21ebdadc-1bf6-4ad2-827e-b6b7dc19572a", "metadata": { "nbsphinx": "hidden", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "### Defining solvents" ] }, { "cell_type": "markdown", "id": "40e3a23f-cc13-402e-8197-9552fbb387e7", "metadata": { "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "The bulk solvent phase is defined using a `SolventComponent` object. Unlike the previously detailed Components, this does not have any explicit molecules or coordinates, but instead represents the way that the overall system will be solvated. This information is then interpreted inside the `Protocol` when solvating the system.\n", "\n", "By default, this solvent is water with 0.15 M NaCl salt. All parameters; the positive and negative ion as well as the ion concentration (which must be specified along with the unit) can be freely defined." ] }, { "cell_type": "code", "execution_count": 5, "id": "e6c35e81-d0c9-4680-bb35-e7ae2b3fd2da", "metadata": { "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "import openfe\n", "from openff.units import unit\n", "\n", "solv = openfe.SolventComponent(ion_concentration=0.15 * unit.molar)" ] }, { "cell_type": "raw", "id": "02e4137f-f823-4dd7-beaa-a8382735bc64", "metadata": { "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ ".. _Assembling into ChemicalSystems:\n", "\n", "Assembling into ChemicalSystems\n", "-------------------------------" ] }, { "cell_type": "markdown", "id": "1e388bd2-f1b3-4a90-970e-72b652f6227b", "metadata": { "nbsphinx": "hidden", "raw_mimetype": "", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "## Assembling into ChemicalSystems" ] }, { "cell_type": "markdown", "id": "10655021-f3f0-4b21-9d8b-fc5a885555e4", "metadata": { "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "With individual components defined, we can then proceed to assemble combinations of these into a description of an entire **system** with the `ChemicalSystem` class.\n", "The end result of this is a chemical model which describes the chemical topology (e.g. bonds, formal charges) and atom positions, but does not describe the force field parameters or atom types or any other energetic terms.\n", "\n", "The input to the `ChemicalSystem` constructor is a dictionary mapping string labels (e.g. 'ligand' or 'protein') to individual `Component` objects. The nature of these labels must match the labels that a given `Protocol` expects. For free energy calculations, we often want to describe two systems which feature many similar components but differ in one component, which is the subject of the free energy perturbation.\n", "For example we could define two `ChemicalSystem` objects which we could perform a relative binding free energy calculation between as:" ] }, { "cell_type": "code", "execution_count": 6, "id": "bf52b73e-5705-48e7-b69a-fb63748588d8", "metadata": { "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "from openfe import ChemicalSystem, ProteinComponent, SmallMoleculeComponent, SolventComponent\n", "\n", "# Define the solvent environment and protein structure, these are common across both systems\n", "sol = SolventComponent(ion_concentration=0.15 * unit.molar)\n", "p = ProteinComponent.from_pdb_file('assets/t4_lysozyme.pdb')\n", "\n", "# Specify the dictionary of shared components\n", "shared_components = {'solvent': sol, 'protein': p}\n", "\n", "# Define the two ligands we are interested in\n", "m1 = SmallMoleculeComponent.from_sdf_file('assets/benzene.sdf')\n", "m2 = SmallMoleculeComponent.from_sdf_file('assets/toluene.sdf')\n", "\n", "# create the systems\n", "cs1 = ChemicalSystem({'ligand': m1, **shared_components})\n", "cs2 = ChemicalSystem({'ligand': m2, **shared_components})" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.6" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }