{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Check two SMIRNOFFs have the same typing\n", "\n", "This notebook was created to check that a change to a SMIRNOFF force field doesn't change the way it types\n", "molecules. This concern came from switching the decorator `'R'` to `'x'` so that the SMIRKS in the SMIRNOFF format would be compatible with OEtoolkits and RDKit. See [smirnoff99Frosst issue#54](https://github.com/openforcefield/smirnoff99Frosst/issues/54)\n", "\n", "This notebook will:\n", "1. Convert a specified smirks-frcmod file to SMIRNOFF FFXML (this is the test force field)\n", "2. Load the current release of smirnoff99Frosst (this is the reference force field)\n", "3. Generate (or take in) a set of molecules in OpenEye oemol format. Label these molecules with both force fields.\n", "4. Identify molecules where parameter assignment doesn't agree\n", "5. Visualize molecules by force type\n", "\n", "**Authors**:\n", "* Caitlin C. Bannan (UCI)\n", "* functions copied from parameter_usage.ipynb written by David L. Mobley (UCI)\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Imports\n", "from __future__ import print_function\n", "from convert_frcmod import *\n", "import openeye.oechem as oechem\n", "import openeye.oeiupac as oeiupac\n", "import openeye.oeomega as oeomega\n", "import openeye.oedepict as oedepict\n", "from IPython.display import display\n", "from openff.toolkit.typing.engines.smirnoff.forcefield import *\n", "from openff.toolkit.typing.engines.smirnoff.forcefield_utils import get_molecule_parameterIDs\n", "from openff.toolkit.utils import *\n", "% matplotlib inline\n", "import matplotlib\n", "import numpy as np\n", "import pylab as pl\n", "import matplotlib.pyplot as plt\n", "import time\n", "import IPython\n", "import pickle\n", "import glob" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Relevant methods" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def depictAtomByIdx(mol_copy, atomIdxList, supH = True, width=900, height=500):\n", " mol = oechem.OEMol(mol_copy)\n", " OEGenerate2DCoordinates(mol)\n", " atomBondSet = oechem.OEAtomBondSet()\n", " for atom in mol.GetAtoms():\n", " if atom.GetIdx() in atomIdxList:\n", " atomBondSet.AddAtom( atom)\n", " for bond in atom.GetBonds():\n", " nbrAtom = bond.GetNbr(atom)\n", " nbrIdx = nbrAtom.GetIdx()\n", " if (nbrIdx in atomIdxList) and nbrIdx>atom.GetIdx():\n", " atomBondSet.AddBond( bond)\n", " from IPython.display import Image\n", " dopt = oedepict.OEPrepareDepictionOptions()\n", " dopt.SetDepictOrientation( oedepict.OEDepictOrientation_Horizontal)\n", " dopt.SetSuppressHydrogens(supH)\n", " oedepict.OEPrepareDepiction(mol, dopt)\n", " opts = oedepict.OE2DMolDisplayOptions(width, height, oedepict.OEScale_AutoScale)\n", " disp = oedepict.OE2DMolDisplay(mol, opts)\n", " aroStyle = oedepict.OEHighlightStyle_Color\n", " aroColor = oechem.OEColor(oechem.OEGrey)\n", " oedepict.OEAddHighlighting(disp, aroColor, aroStyle, \n", " oechem.OEIsAromaticAtom(), oechem.OEIsAromaticBond() )\n", " hstyle = oedepict.OEHighlightStyle_BallAndStick\n", " hcolor = oechem.OEColor(oechem.OELightGreen)\n", " oedepict.OEAddHighlighting(disp, hcolor, hstyle, atomBondSet)\n", " #ofs = oechem.oeosstream()\n", " img = oedepict.OEImage(width, height)\n", " oedepict.OERenderMolecule(img, disp)\n", " #oedepict.OERenderMolecule(ofs, 'png', disp)\n", " #ofs.flush()\n", " #return Image(data = \"\".join(ofs.str()))\n", " return Image(oedepict.OEWriteImageToString(\"png\",img))" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def getMolParamIDToAtomIndex( oemol, ff):\n", " \"\"\"Take an OEMol and a SMIRNOFF force field object and return a dictionary,\n", " keyed by parameter ID, where each entry is a tuple of\n", " ( smirks, [[atom1, ... atomN], [atom1, ... atomN]) giving the SMIRKS\n", " corresponding to that parameter ID and a list of the atom groups in that\n", " molecule that parameter is applied to.\n", "\n", " Parameters\n", " ----------\n", " oemol : OEMol\n", " OpenEye OEMol with the molecule to investigate.\n", " ff : ForceField\n", " SMIRNOFF ForceField object (obtained from an ffxml via ForceField(ffxml)) containing FF of interest.\n", "\n", " Returns\n", " -------\n", " param_usage : dictionary\n", " Dictionary, keyed by parameter ID, where each entry is a tuple of\n", " ( smirks, [[atom1, ... atomN], [atom1, ... atomN]) giving the SMIRKS\n", " corresponding to that parameter ID and a list of the atom groups in\n", " that molecule that parameter is applied to.\n", "\n", " \"\"\"\n", " labels = ff.labelMolecules([oemol])\n", " param_usage = {}\n", " for mol_entry in range(len(labels)):\n", " for force in labels[mol_entry].keys():\n", " for (atom_indices, pid, smirks) in labels[mol_entry][force]:\n", " if not pid in param_usage:\n", " param_usage[pid] = (smirks, [atom_indices])\n", " else:\n", " param_usage[pid][1].append( atom_indices )\n", "\n", " return param_usage" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def labels_to_pidDict(labels):\n", " \"\"\"\n", " This method takes a set of SMIRNOFF force field labels and returns\n", " a dictionary with information for each molecule at each force type\n", " in the form:\n", " { force_type: {mol_index: {(indice tuple): pid, ...}, ... } }\n", " \"\"\"\n", " force_type_dict = dict()\n", " for idx, mol_dict in enumerate(labels):\n", " for force_type, label_set in mol_dict.items():\n", " if not force_type in force_type_dict:\n", " force_type_dict[force_type] = dict()\n", " force_type_dict[force_type][idx] = dict()\n", " \n", " for (indices, pid, smirks) in label_set:\n", " force_type_dict[force_type][idx][tuple(indices)] = {'pid': pid, 'smirks':smirks}\n", " return force_type_dict" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Convert specified SMIRKS `frcmod` file to SMIRNOFF FFXML" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Input and output info\n", "#infile = 'example.frcmod' # smirnoffish frcmod file to convert\n", "infile = 'smirnoffishFrcmod.parm99Frosst.txt' # smirnoffish frcmod file to convert\n", "ffxmlFile = 'smirnoff99FrosstFrcmod.offxml'\n", "template = 'template.offxml' # Template FFXML file without parameters (but with remainder of contents)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Convert\n", "# Already converted\n", "convert_frcmod_to_ffxml( infile, template, ffxmlFile)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Load SMIRNOFF FFXML\n", "test_ff = ForceField(ffxmlFile) # We will use this below to access details of parameters" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Load smirnoff99Frosst from current release\n", "\n", "This is currently linking to `openforcefield/data/test_forcefields/smirnoff99Frosst.offxml` which is the current release" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true }, "outputs": [], "source": [ "ref_ff = ForceField('test_forcefields/smirnoff99Frosst.offxml')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Generate or take in a set of molecules in OpenEye OEMol format\n", "\n", "Here we will generate a list of molecules. We will label all molecules given. Here we don't care rather the assigned parameters are generic or not just that the typing doesn't change between the two force fields. \n", "\n", "Currently this section expects a relative or absolute path to a single file. The utils.read_molecules function will access `/openforcefield/data/molecules/[your path]` if there is no file at the relative path.\n", " " ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Loading molecules from '/Users/bannanc/anaconda3/lib/python3.5/site-packages/openforcefield/data/molecules/DrugBank_tripos.mol2'...\n", "5928 molecules read\n", "1.180 s elapsed\n" ] } ], "source": [ "molecule_file = \"DrugBank_tripos.mol2\"\n", "molecules = utils.read_molecules(molecule_file)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Typed 5928 molecules with test and reference force fields in 22.57 minutes\n" ] } ], "source": [ "init = time.time()\n", "test_labels = test_ff.labelMolecules(molecules)\n", "ref_labels = ref_ff.labelMolecules(molecules)\n", "t = (time.time() - init) / 60.0\n", "print(\"Typed %i molecules with test and reference force fields in %.2f minutes\" % (len(molecules), t))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Identify any molecules not assigned the same parameters by both force fields" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "created indices tuple to pid dictionaries in 0.06 minutes\n" ] } ], "source": [ "# Make dictionary by molecule and tuple indices\n", "init = time.time()\n", "test_dict = labels_to_pidDict(test_labels)\n", "ref_dict = labels_to_pidDict(ref_labels)\n", "t = (time.time() - init) / 60.0\n", "print(\"created indices tuple to pid dictionaries in %.2f minutes\" % t)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Force Type Number mismatches\n", "-------------------------------------------------------\n", "NonbondedGenerator 0\n", "HarmonicBondGenerator 0\n", "PeriodicTorsionGenerator 0\n", "HarmonicAngleGenerator 0\n" ] } ], "source": [ "# Make a dictionary to store mismatches:\n", "mismatch = dict()\n", "# This will have embedded dictionaries with this form:\n", "# force_type: {mol_idx:{(index tuple): {test_pid, test_smirks, ref_pid, ref_smirks}}}\n", "mismatch_count = dict()\n", "\n", "# loop through force types\n", "for force_type, test_mol_dict in test_dict.items():\n", " if force_type not in mismatch:\n", " mismatch[force_type] = dict()\n", " if force_type not in mismatch_count:\n", " mismatch_count[force_type] = 0\n", " \n", " # loop through molecules in each force type\n", " for mol_idx, test_tuple_dict in test_mol_dict.items():\n", " if not mol_idx in mismatch[force_type]:\n", " mismatch[force_type][mol_idx] = dict()\n", " \n", " # loop through all atom indice tuples in this molecule\n", " for indice_tuple, test_info in test_tuple_dict.items():\n", " \n", " # compare pid assignment\n", " test_pid = test_info['pid']\n", " ref_pid = ref_dict[force_type][mol_idx][indice_tuple]['pid']\n", " # if they don't match store info in mismatch dictionary and update count\n", " if test_pid != ref_pid:\n", " test_smirks = test_info['smirks']\n", " ref_smirks = ref_dict[force_type][mol_idx][indice_tuple]['smirks']\n", " mismatch[force_type][mol_idx][indice_tuple] = {'test_pid': test_pid, 'test_smirks': test_smirks,\n", " 'ref_pid': ref_pid, 'ref_smirks': ref_smirks}\n", " mismatch_count[force_type] +=1\n", "\n", "print(\"%-35s %s\" % (\"Force Type\", \"Number mismatches\"))\n", "print(\"-\"*55)\n", "for force_type, count in mismatch_count.items():\n", " print(\"%-35s %i\" % (force_type, count))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Visualize mismatches by force type\n", "\n", "Chose a force type and then the molecules for each will be displayed" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": true, "scrolled": false }, "outputs": [], "source": [ "ForceType = \"PeriodicTorsionGenerator\"\n", "\n", "for mol_idx, tuple_dict in mismatch[ForceType].items():\n", " # only visualize molecules with mismatch indices\n", " keys = [k for k in tuple_dict.keys()]\n", " if len(keys) == 0:\n", " continue\n", " \n", " mol = OEMol(molecules[mol_idx])\n", " print(\"Looking at molecule %i\" % mol_idx) \n", " \n", " for indice_tuple, pid_info in tuple_dict.items():\n", " test_pid = pid_info['test_pid']\n", " test_smirks = pid_info['test_smirks']\n", " \n", " ref_pid = pid_info['ref_pid']\n", " ref_smirks = pid_info['ref_smirks']\n", " \n", " print(\"%-10s %-40s %-40s\" % ('', 'test force field', 'reference force field'))\n", " print(\"%-10s %-40s %-40s\" % ('pid'))\n", " print(\"%-10s %-30s %-10s %-30s\" % (test_pid, test_smirks, ref_pid, ref_smirks))\n", " display(depictAtomByIdx(mol, indice_tuple, supH = False))\n", " print(\"\\n\")\n", " print(\"\\n\")\n", " print(\"-\"*100)\n", " print(\"\\n\")" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "### Extra check for `R`\n", "\n", "Since this notebook was made explicitly for the change from `'R'` to `'x'` I want to make sure that all of the `'R'`s were replaced" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# loop through force types\n", "for force_type, test_mol_dict in test_dict.items(): \n", " # loop through molecules in each force type\n", " for mol_idx, test_tuple_dict in test_mol_dict.items():\n", " # loop through all atom indice tuples in this molecule\n", " for indice_tuple, test_info in test_tuple_dict.items():\n", " # compare pid assignment\n", " test_pid = test_info['pid']\n", " test_smirks = test_info['smirks']\n", " # Check for 'R'\n", " if 'R' in test_smirks:\n", " print(\"Found 'R' in %s (%s)\" % )\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" } }, "nbformat": 4, "nbformat_minor": 1 }