{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Protein preparation" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## What is protein preparation?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "The **protein preparation** phase, based on the PDB2PQR and propKa software packages, addresses problems such as assigning titration states at the user-chosen pH; flipping the side chains of HIS, ASN, and GLN residues; and optimizing the overall hydrogen bonding network. \n", "\n", "After preparation, the **build** phase takes the prepared system and applies the chosen forcefield to obtain simulation-ready input files." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Let's start" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false, "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Please cite -- HTMD: High-Throughput Molecular Dynamics for Molecular Discovery\n", "J. Chem. Theory Comput., 2016, 12 (4), pp 1845-1852. \n", "http://pubs.acs.org/doi/abs/10.1021/acs.jctc.6b00049\n", "\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Using Anaconda API: https://api.anaconda.org\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "You are on the latest HTMD version (unpackaged : /home/joao/maindisk/software/repos/Acellera/htmd/htmd).\n", "\n" ] } ], "source": [ "from htmd.ui import *\n", "config(viewer='ngl')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Protein Preparation in HTMD" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The system preparation phase is based on the PDB2PQR software. 
It \n", "includes the following steps (from the\n", "[PDB2PQR algorithm\n", "description](http://www.poissonboltzmann.org/docs/pdb2pqr-algorithm-description/)):\n", "\n", " * Computing empirical pKa values for the residues' local environment (propKa);\n", " * Assigning titration states at the user-chosen pH;\n", " * Flipping the side chains of HIS (including user-defined HIS states), ASN, and GLN residues;\n", " * Rotating the sidechain hydrogen on SER, THR, TYR, and CYS (if available);\n", " * Determining the best placement for the sidechain hydrogen on neutral HIS, protonated GLU, and protonated ASP;\n", " * Optimizing all water hydrogens." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The hydrogen bonding network calculations are performed by the\n", "[PDB2PQR](http://www.poissonboltzmann.org/) software package. The pKa\n", "calculations are performed by the [PROPKA\n", "3.1](https://github.com/jensengroup/propka-3.1) software package.\n", "Please see the copyright, license and citation terms distributed with each.\n", "\n", "Note that this version was modified to use an \n", "externally supplied propKa **3.1** (installed automatically via dependencies), whereas\n", "the original had propKa 3.0 *embedded*.\n", "\n", "The results of the function should be roughly equivalent to the system\n", "preparation wizard's preprocessing and optimization steps\n", "in Schrödinger's Maestro software." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Protein residue pKas in water" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Modified residue names" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "The molecule produced by the preparation modifies residue names\n", "according to their protonation.\n", "Later system-building functions assume these residue naming conventions. \n", "**Note**: support for alternative charge states varies between the forcefields.\n", "\n", "Charge +1 | Neutral | Charge -1\n", "-------------|------------|----------\n", " - | ASH | ASP\n", " - | CYS | CYM\n", " - | GLH | GLU\n", "HIP | HID/HIE | -\n", "LYS | LYN | -\n", " - | TYR | TYM\n", "ARG | AR0 | -" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Limitations" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ " * *PDB2PQR*: returns **one** solution consistent with its restraints, not the optimal one;\n", " * *Membrane proteins*: propKa ignores **lipid exposure** (more on this later);\n", " * *Large conformational changes*: local environment changes may be large enough that pKa decisions are **not transferable**. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## `proteinPrepare` function" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "The `proteinPrepare` function requires a `Molecule` object, the protein to be prepared, as an argument, and returns the prepared system, also as a `Molecule`. 
Logging messages will provide information and warnings about the process.\n", "\n", "```python\n", "def proteinPrepare(mol_in,\n", "                   pH=7.0,\n", "                   verbose=0,\n", "                   returnDetails=False,\n", "                   hydrophobicThickness=None,\n", "                   holdSelection=None):\n", "```\n", "\n", "Returns a Molecule object in which residues have been renamed to follow the internal conventions on protonation (see the table of modified residue names above). Coordinates are changed to optimize the H-bonding network. The result should be roughly comparable to that of the preparation wizard in Schrödinger's Maestro." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Parameters" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "    mol_in : htmd.Molecule\n", "        the object to be optimized\n", "    pH : float\n", "        pH at which titration states are decided\n", "    verbose : int\n", "        verbosity level\n", "    returnDetails : bool\n", "        whether to return just the prepared Molecule (False, default) or a Molecule *and* a PreparationData\n", "        object including computed properties\n", "    hydrophobicThickness : float\n", "        the thickness of the membrane in which the protein is embedded, or None for a globular protein.\n", "        Used to provide a warning about membrane-exposed residues.\n", "    holdSelection : str\n", "        Atom selection to be excluded from optimization.\n", "        Only the carbon-alpha atom will be considered for the corresponding residue." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "`proteinPrepare()` is a convenience function. Using it\n", "is **not** mandatory. You can \n", "manipulate the input molecule with your custom functions. 
\n", "In particular,\n", "\n", "* Addition of hydrogen atoms is not required\n", "* Protonation states are set by renaming residues\n", "* HIS and other residues can be flipped by editing their coordinates\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Example" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Prepare trypsin (PDB: 3PTB) at pH 7." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "tryp = Molecule(\"3PTB\")\n", "tryp_op = proteinPrepare(tryp)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Visualize protonation of residue 40" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true, "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "tryp_op.view(style=\"Licorice\", sel=\"resid 40\", hold=True)\n", "tryp_op.view(style=\"Lines\", sel=\"same residue as exwithin 4 of resid 40\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Preparation report" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "If the `returnDetails` argument is set to `True`, an object of type `PreparationData` is returned as a **second** return value. It carries detailed per-residue information on the preparation results, such as computed pKa values and assigned protonation states. " ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "PreparationData object about 290 residues.\n", "Unparametrized residue names: CA, BEN\n", "Please find the full info in the .data property, e.g.: \n", " resname resid insertion chain pKa protonation flipped buried\n", "0 ILE 16 A NaN ILE NaN NaN\n", "1 VAL 17 A NaN VAL NaN NaN\n", "2 GLY 18 A NaN GLY NaN NaN\n", "3 GLY 19 A NaN GLY NaN NaN\n", "4 TYR 20 A 9.590845 TYR NaN 14.642857\n", " . . ." 
] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tryp_op, prepData = proteinPrepare(tryp, returnDetails=True)\n", "prepData" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Most of it is accessible in the `data` property, as a [pandas DataFrame](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html)." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Index(['resname', 'resid', 'insertion', 'chain', 'pKa', 'protonation',\n", " 'flipped', 'patches', 'buried', 'z', 'membraneExposed',\n", " 'forced_protonation', 'pka_group_id', 'pka_residue_type', 'pka_type',\n", " 'pka_charge', 'pka_atom_name', 'pka_atom_sybyl_type', 'pdb2pqr_idx'],\n", " dtype='object')" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prepData.data.columns" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n",
"| | resname | resid | pKa | protonation |\n",
"|---|---|---|---|---|\n",
"| 0 | ILE | 16 | NaN | ILE |\n",
"| 1 | VAL | 17 | NaN | VAL |\n",
"| 2 | GLY | 18 | NaN | GLY |\n",
"| 3 | GLY | 19 | NaN | GLY |\n",
"| 4 | TYR | 20 | 9.590845 | TYR |\n",
"| 5 | THR | 21 | NaN | THR |\n",
"| 6 | CYS | 22 | 99.990000 | CYX |\n",
"| 7 | GLY | 23 | NaN | GLY |\n",
"| 8 | ALA | 24 | NaN | ALA |\n",
"| 9 | ASN | 25 | NaN | ASN |\n", "