{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Protein preparation" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## What is protein preparation?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "The **protein preparation** phase, based on the PDB2PQR and propKa softwares, addresses e.g. the problems of assigning titration states at the user-chosen pH; flipping the side chains of HIS, ASN, and GLN residues; and optimizing the overall hydrogen bonding network. \n", "\n", "After preparing, the **build** phase takes a prepared system and applies the chosen forcefield in order to obtain simulation-ready input files." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Let's start" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false, "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Please cite -- HTMD: High-Throughput Molecular Dynamics for Molecular Discovery\n", "J. Chem. Theory Comput., 2016, 12 (4), pp 1845-1852. \n", "http://pubs.acs.org/doi/abs/10.1021/acs.jctc.6b00049\n", "\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Using Anaconda API: https://api.anaconda.org\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "You are on the latest HTMD version (unpackaged : /home/joao/maindisk/software/repos/Acellera/htmd/htmd).\n", "\n" ] } ], "source": [ "from htmd.ui import *\n", "config(viewer='ngl')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Protein Preparation in HTMD" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The system preparation phase is based on the PDB2PQR software. It \n", "includes the following steps (from the\n", "[PDB2PQR algorithm\n", "description](http://www.poissonboltzmann.org/docs/pdb2pqr-algorithm-description/)):\n", "\n", " * Compute empirical pKa values for the residues' local environment (propKa)\n", " * Assign titration states at the user-chosen pH;\n", " * Flipping the side chains of HIS (including user defined HIS states), ASN, and GLN residues;\n", "\n", " * Rotating the sidechain hydrogen on SER, THR, TYR, and CYS (if available);\n", " * Determining the best placement for the sidechain hydrogen on neutral HIS, protonated GLU, and protonated ASP;\n", " * Optimizing all water hydrogens." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The hydrogen bonding network calculations are performed by the\n", "[PDB2PQR](http://www.poissonboltzmann.org/) software package. The pKa\n", "calculations are performed by the [PROPKA\n", "3.1](https://github.com/jensengroup/propka-3.1) software packages.\n", "Please see the copyright, license and citation terms distributed with each.\n", "\n", "Note that this version was modified in order to use an \n", "externally-supplied propKa **3.1** (installed automatically via dependencies), whereas\n", "the original had propKa 3.0 *embedded*!\n", "\n", "The results of the function should be roughly equivalent of the system\n", "preparation wizard's preprocessing and optimization steps\n", "of Schrodinger's Maestro software." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Protein residue pKas in water" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Modified residue names" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "The molecule produced by the preparation modifies residue names\n", "according to their protonation.\n", "Later system-building functions assume these residue naming conventions. \n", "**Note**: support for alternative charge states varies between the forcefields.\n", "\n", "Charge +1 | Neutral | Charge -1\n", "-------------|------------|----------\n", " - | ASH | ASP\n", " - | CYS | CYM\n", " - | GLH | GLU\n", "HIP | HID/HIE | -\n", "LYS | LYN | -\n", " - | TYR | TYM\n", "ARG | AR0 | -" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Limitations" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ " * *PDB2PQR*: returns **one** solution consistent with its restraints, not the optimal one;\n", " * *Membrane proteins*: propKa ignores **lipid exposure** (more on this later);\n", " * *Large conformational changes*: local environment changes may be large enough that pKa decisions are **not transferable**. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## `proteinPrepare` function" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "The `proteinPrepare` function requires a `Molecule` object, the protein to be prepared, as an argument, and returns the prepared system, also as a `Molecule`. Logging messages will provide information and warnings about the process.\n", "\n", "```python\n", "def proteinPrepare(mol_in,\n", " pH=7.0,\n", " verbose=0,\n", " returnDetails=False,\n", " hydrophobicThickness=None,\n", " holdSelection=None):\n", "```\n", "\n", "Returns a Molecule object, where residues have been renamed to follow internal conventions on protonation (below). Coordinates are changed to optimize the H-bonding network. This should be roughly comparable to Schroedinger Maestro's preparation wizard." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Parameters" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ " mol_in : htmd.Molecule\n", " the object to be optimized\n", " pH : float\n", " pH to decide titration\n", " verbose : int\n", " verbosity\n", " returnDetails : bool\n", " whether to return just the prepared Molecule (False, default) or a molecule *and* a ResidueInfo\n", " object including computed properties\n", " hydrophobicThickness : float\n", " the thickness of the membrane in which the protein is embedded, or None if globular protein.\n", " Used to provide a warning about membrane-exposed residues.\n", " holdSelection : str\n", " Atom selection to be excluded from optimization.\n", " Only the carbon-alpha atom will be considered for the corresponding residue." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "`proteinPrepare()` is a convenience function. Using it\n", "is **not** mandatory. You can \n", "manipulate the input molecule with your custom functions. \n", "In particular,\n", "\n", "* Addition of hydrogen atoms is not required\n", "* Protonation states are set by renaming residues\n", "* HIS and other residues can be edited as coordinates\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Example" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Prepare trypsin (PDB: 3PTB) at pH 7." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "tryp = Molecule(\"3PTB\")\n", "tryp_op = proteinPrepare(tryp)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Visualize protonation of residue 40" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true, "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "tryp_op.view(style=\"Licorice\",sel=\"resid 40\",hold=True)\n", "tryp_op.view(style=\"Lines\",sel=\"same residue as exwithin 4 of resid 40\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Preparation report" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "If the `returnDetails` argument is set, an object of type `ResidueData` is returned as a **second** return value. It carries a wealth of information on the preparation results. " ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "PreparationData object about 290 residues.\n", "Unparametrized residue names: CA, BEN\n", "Please find the full info in the .data property, e.g.: \n", " resname resid insertion chain pKa protonation flipped buried\n", "0 ILE 16 A NaN ILE NaN NaN\n", "1 VAL 17 A NaN VAL NaN NaN\n", "2 GLY 18 A NaN GLY NaN NaN\n", "3 GLY 19 A NaN GLY NaN NaN\n", "4 TYR 20 A 9.590845 TYR NaN 14.642857\n", " . . ." ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tryp_op, prepData = proteinPrepare(tryp, returnDetails=True)\n", "prepData" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Most of it is accessible in the `data` property, as a [pandas DataFrame](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html)." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Index(['resname', 'resid', 'insertion', 'chain', 'pKa', 'protonation',\n", " 'flipped', 'patches', 'buried', 'z', 'membraneExposed',\n", " 'forced_protonation', 'pka_group_id', 'pka_residue_type', 'pka_type',\n", " 'pka_charge', 'pka_atom_name', 'pka_atom_sybyl_type', 'pdb2pqr_idx'],\n", " dtype='object')" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prepData.data.columns" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
resnameresidpKaprotonation
0ILE16NaNILE
1VAL17NaNVAL
2GLY18NaNGLY
3GLY19NaNGLY
4TYR209.590845TYR
5THR21NaNTHR
6CYS2299.990000CYX
7GLY23NaNGLY
8ALA24NaNALA
9ASN25NaNASN
\n", "
" ], "text/plain": [ " resname resid pKa protonation\n", "0 ILE 16 NaN ILE\n", "1 VAL 17 NaN VAL\n", "2 GLY 18 NaN GLY\n", "3 GLY 19 NaN GLY\n", "4 TYR 20 9.590845 TYR\n", "5 THR 21 NaN THR\n", "6 CYS 22 99.990000 CYX\n", "7 GLY 23 NaN GLY\n", "8 ALA 24 NaN ALA\n", "9 ASN 25 NaN ASN" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prepData.data.loc[:,['resname','resid','pKa','protonation']].head(10)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "As such, it can be easily queried and written as a spreadsheet in Excel or CSV format." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [], "source": [ "prepData.data.to_excel(\"./tryp_data.xlsx\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Special case: Membrane proteins" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Membrane-embedded proteins are in contact with an hydrophobic region which may alter pKa values for membrane-exposed residues ([Teixeira et al., J. Chem. Theory Comput., 2016, 12 (3), pp 930–934](http://dx.doi.org/10.1021/acs.jctc.5b01114)). Although the effect is not currently taken into account quantitatively, if a `hydrophobicThickness` argument is provided, warnings will be generated for residues exposed to the lipid region." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The following example shows the preparation of the mu opioid receptor, 4DKL. \n", "The **pre-oriented** structure is retrieved from the OPM database." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false, "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "32.0\n" ] } ], "source": [ "from htmd.util import opm\n", "mor, thickness = opm(\"4dkl\") \n", "print(thickness)\n", "mor.filter(\"protein and noh\")\n", "mor_opt, mor_data = proteinPrepare(mor, returnDetails=True,\n", " hydrophobicThickness=thickness)\n", "\n", "exposedRes = mor_data.data.membraneExposed\n", "mor_data.data[exposedRes]\n", "mor_data.data[exposedRes].to_excel(\"./mor_exposed_residues.xlsx\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Case 2. Building with a ligand\n", "\n", "Coexistence and automatic placement of a ligand requires further manipulation,\n", "because:\n", "\n", "1. The ligand may have to be arranged in a geometrically sensible way\n", "2. We likely need additional parameters and topologies (M. J. Harvey's parametrization session)\n", "\n", "See the tutorial [System Building Trypsin-Benzamidine](https://www.htmd.org/docs/latest/tutorials/system-building-protein-ligand.html)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Case 3. Membrane proteins\n", "\n", "Pre-equilibrated membranes can be merged with pre-oriented systems,\n", "e.g. downloaded from the OPM. See the tutorial [System Building μ-opioid Receptor in Membrane](https://www.htmd.org/docs/latest/tutorials/system-building-protein-in-membrane.html)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Citations" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Please acknowledge your use of PDB2PQR and PropKa by citing:\n", "\n", "* Dolinsky TJ, Czodrowski P, Li H, Nielsen JE, Jensen JH, Klebe G, Baker NA. PDB2PQR: Expanding and upgrading automated preparation of biomolecular structures for molecular simulations. Nucleic Acids Res, 35, W522-5, 2007. \n", "* Sondergaard, Chresten R., Mats HM Olsson, Michal Rostkowski, and Jan H. Jensen. \"Improved Treatment of Ligands and Coupling Effects in Empirical Calculation and Rationalization of pKa Values.\" Journal of Chemical Theory and Computation 7, no. 7 (2011): 2284-2295." ] } ], "metadata": { "anaconda-cloud": {}, "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python [default]", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3.0 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" }, "widgets": { "state": { "97d97c883f3143d9b5c0d6e4341c6b21": { "views": [ { "cell_index": 22.0 } ] }, "ca44adb71e664d59b619bf10f8c210b9": { "views": [ { "cell_index": 22.0 } ] } }, "version": "1.2.0" } }, "nbformat": 4, "nbformat_minor": 0 }