{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Protein preparation" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## What is protein preparation?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "The **protein preparation** phase, based on the PDB2PQR and propKa software packages, addresses problems such as assigning titration states at the user-chosen pH; flipping the side chains of HIS, ASN, and GLN residues; and optimizing the overall hydrogen bonding network. \n", "\n", "After preparation, the **build** phase takes the prepared system and applies the chosen forcefield to obtain simulation-ready input files." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Let's start" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Please cite HTMD: Doerr et al.(2016)JCTC,12,1845. \n", "https://dx.doi.org/10.1021/acs.jctc.6b00049\n", "Documentation: http://software.acellera.com/\n", "To update: conda update htmd -c acellera -c psi4\n", "\n", "You are on the latest HTMD version (unpackaged : /home/joao/maindisk/software/repos/Acellera/htmd/htmd).\n", "\n" ] } ], "source": [ "from htmd.ui import *\n", "config(viewer='ngl')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Protein Preparation in HTMD" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The system preparation phase is based on the PDB2PQR software. 
It \n", "includes the following steps (from the\n", "[PDB2PQR algorithm\n", "description](http://apbs-pdb2pqr.readthedocs.io/en/latest/pdb2pqr/invoking.html)):\n", "\n", " * Compute empirical pKa values for the residues' local environment (propKa);\n", " * Assign titration states at the user-chosen pH;\n", " * Flip the side chains of HIS (including user-defined HIS states), ASN, and GLN residues;\n", " * Rotate the sidechain hydrogen on SER, THR, TYR, and CYS (if available);\n", " * Determine the best placement for the sidechain hydrogen on neutral HIS, protonated GLU, and protonated ASP;\n", " * Optimize all water hydrogens." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The hydrogen bonding network calculations are performed by the\n", "[PDB2PQR](http://www.poissonboltzmann.org/) software package. The pKa\n", "calculations are performed by the [PROPKA\n", "3.1](https://github.com/jensengroup/propka-3.1) software package.\n", "Please see the copyright, license and citation terms distributed with each.\n", "\n", "Note that this version of PDB2PQR was modified to use an \n", "externally supplied propKa **3.1** (installed automatically via dependencies), whereas\n", "the original had propKa 3.0 *embedded*.\n", "\n", "The results of the function should be roughly equivalent to those of the\n", "preprocessing and optimization steps of the system preparation wizard\n", "in Schrödinger's Maestro software."
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Protein residue pKas in water" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "![](http://pub.htmd.org/tutorials/protein-preparation/naming.svg)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Modified residue names" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "In the molecule produced by the preparation, residues are renamed\n", "according to their protonation state.\n", "The system-building functions used later assume these residue naming conventions." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Charge +1 | Neutral | Charge -1\n", "-------------|------------|----------\n", " - | ASH | ASP\n", " - | CYS | CYM\n", " - | GLH | GLU\n", "HIP | HID/HIE | -\n", "LYS | LYN | -\n", " - | TYR | TYM\n", "ARG | AR0 | -" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note**: support for alternative charge states varies between forcefields." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Limitations" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ " * *PDB2PQR*: returns **one** solution consistent with its restraints, not necessarily the optimal one;\n", " * *Membrane proteins*: propKa ignores **lipid exposure** (more on this later);\n", " * *Large conformational changes*: local environment changes may be large enough that pKa decisions are **not transferable**."
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## `proteinPrepare` function" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "The `proteinPrepare` function takes a `Molecule` object (the protein to be prepared) as its argument and returns the prepared system, also as a `Molecule`. Logging messages provide information and warnings about the process.\n", "\n", "```python\n", "def proteinPrepare(mol_in,\n", " pH=7.0,\n", " verbose=0,\n", " returnDetails=False,\n", " hydrophobicThickness=None,\n", " holdSelection=None):\n", "```\n", "\n", "Returns a `Molecule` object in which residues have been renamed to follow the internal conventions on protonation (see the table above). Coordinates are changed to optimize the H-bonding network. This should be roughly comparable to the preparation wizard of Schrödinger's Maestro." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Parameters" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "```\n", "mol_in : htmd.Molecule\n", " the object to be optimized\n", "pH : float\n", " pH to decide titration\n", "verbose : int\n", " verbosity\n", "returnDetails : bool\n", " whether to return just the prepared Molecule (False, default) or a molecule *and* a PreparationData\n", " object including computed properties\n", "hydrophobicThickness : float\n", " the thickness of the membrane in which the protein is embedded, or None if globular protein.\n", " Used to provide a warning about membrane-exposed residues.\n", "holdSelection : str\n", " Atom selection to be excluded from optimization.\n", " Only the carbon-alpha atom will be considered for the corresponding residue.\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "`proteinPrepare()` is a convenience function. Using it\n", "is **not** mandatory. 
You can \n", "manipulate the input molecule with your custom functions. \n", "In particular,\n", "\n", "* Addition of hydrogen atoms is not required\n", "* Protonation states are set by renaming residues\n", "* HIS and other residues can be edited as coordinates\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Example" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Prepare trypsin (PDB: 3PTB) at pH 7." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2018-03-16 16:45:27,014 - htmd.molecule.readers - INFO - Using local copy for 3PTB: /home/joao/maindisk/software/repos/Acellera/htmd/htmd/data/pdb/3ptb.pdb\n", "2018-03-16 16:45:28,394 - htmd.molecule.molecule - WARNING - Residue insertions were detected in the Molecule. It is recommended to renumber the residues using the Molecule.renumberResidues() method.\n", "2018-03-16 16:45:28,556 - propka - INFO - No pdbfile provided\n", "2018-03-16 16:45:31,188 - htmd.builder.preparation - WARNING - The following residue has not been optimized: CA\n", "2018-03-16 16:45:31,189 - htmd.builder.preparation - WARNING - The following residue has not been optimized: BEN\n", "2018-03-16 16:45:39,499 - htmd.builder.preparationdata - INFO - The following residues are in a non-standard state: CYS 22 A (CYX), HIS 40 A (HIE), CYS 42 A (CYX), HIS 57 A (HIP), CYS 58 A (CYX), HIS 91 A (HID), CYS 128 A (CYX), CYS 136 A (CYX), CYS 157 A (CYX), CYS 168 A (CYX), CYS 182 A (CYX), CYS 191 A (CYX), CYS 201 A (CYX), CYS 220 A (CYX), CYS 232 A (CYX)\n", "2018-03-16 16:45:39,935 - htmd.builder.preparationdata - WARNING - Dubious protonation state: the pKa of 3 residues is within 1.0 units of pH 7.0.\n", "2018-03-16 16:45:39,939 - htmd.builder.preparationdata - WARNING - Dubious protonation state: HIS 57 A (pKa= 7.44)\n", "2018-03-16 16:45:39,940 - 
htmd.builder.preparationdata - WARNING - Dubious protonation state: GLU 70 A (pKa= 6.10)\n", "2018-03-16 16:45:39,942 - htmd.builder.preparationdata - WARNING - Dubious protonation state: N+ 16T A (pKa= 7.41)\n", "2018-03-16 16:45:40,042 - htmd.builder.preparationdata - WARNING - Found N-terminus 83.9% buried (> 50.0% threshold)\n", "2018-03-16 16:45:40,043 - htmd.builder.preparationdata - WARNING - Found C-terminus involved in H bonds\n" ] } ], "source": [ "tryp = Molecule(\"3PTB\")\n", "tryp_op = proteinPrepare(tryp)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Visualize protonation of residue 40" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "cfa19ea4a2074c09b0dcfd45f37b8475", "version_major": 2, "version_minor": 0 }, "text/plain": [ "A Jupyter Widget" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "tryp_op.view(style=\"Licorice\", sel=\"resid 40\", hold=True)\n", "tryp_op.view(style=\"Lines\", sel=\"same residue as exwithin 4 of resid 40\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Preparation report" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "If the `returnDetails` argument is set to `True`, an object of type `PreparationData` is returned as a **second** return value. It carries detailed information on the preparation results, such as the computed pKa values and the assigned protonation states. 
" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2018-03-16 16:46:21,332 - htmd.builder.preparation - WARNING - The following residue has not been optimized: CA\n", "2018-03-16 16:46:21,334 - htmd.builder.preparation - WARNING - The following residue has not been optimized: BEN\n", "2018-03-16 16:46:29,334 - htmd.builder.preparationdata - INFO - The following residues are in a non-standard state: CYS 22 A (CYX), HIS 40 A (HIE), CYS 42 A (CYX), HIS 57 A (HIP), CYS 58 A (CYX), HIS 91 A (HID), CYS 128 A (CYX), CYS 136 A (CYX), CYS 157 A (CYX), CYS 168 A (CYX), CYS 182 A (CYX), CYS 191 A (CYX), CYS 201 A (CYX), CYS 220 A (CYX), CYS 232 A (CYX)\n", "2018-03-16 16:46:29,337 - htmd.builder.preparationdata - WARNING - Dubious protonation state: the pKa of 3 residues is within 1.0 units of pH 7.0.\n", "2018-03-16 16:46:29,339 - htmd.builder.preparationdata - WARNING - Dubious protonation state: HIS 57 A (pKa= 7.44)\n", "2018-03-16 16:46:29,340 - htmd.builder.preparationdata - WARNING - Dubious protonation state: GLU 70 A (pKa= 6.10)\n", "2018-03-16 16:46:29,341 - htmd.builder.preparationdata - WARNING - Dubious protonation state: N+ 16T A (pKa= 7.41)\n", "2018-03-16 16:46:29,433 - htmd.builder.preparationdata - WARNING - Found N-terminus 83.9% buried (> 50.0% threshold)\n", "2018-03-16 16:46:29,434 - htmd.builder.preparationdata - WARNING - Found C-terminus involved in H bonds\n" ] }, { "data": { "text/plain": [ "PreparationData object about 290 residues.\n", "Unparametrized residue names: CA, BEN\n", "Please find the full info in the .data property, e.g.: \n", " resname resid insertion chain pKa protonation flipped buried\n", "0 ILE 16 A NaN ILE NaN NaN\n", "1 VAL 17 A NaN VAL NaN NaN\n", "2 GLY 18 A NaN GLY NaN NaN\n", "3 GLY 19 A NaN GLY NaN NaN\n", "4 TYR 20 A 9.590845 TYR NaN 14.642857\n", " . . ." 
] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tryp_op, prepData = proteinPrepare(tryp, returnDetails=True)\n", "prepData" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Most of it is accessible in the `data` property, as a [pandas DataFrame](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html)." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['resname', 'resid', 'insertion', 'chain', 'pKa', 'protonation',\n", " 'flipped', 'patches', 'buried', 'z', 'membraneExposed',\n", " 'forced_protonation', 'default_protonation', 'pka_group_id',\n", " 'pka_residue_type', 'pka_type', 'pka_charge', 'pka_atom_name',\n", " 'pka_atom_sybyl_type', 'pdb2pqr_idx', 'guessedAtoms'],\n", " dtype='object')" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prepData.data.columns" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | resname | \n", "resid | \n", "pKa | \n", "protonation | \n", "
---|---|---|---|---|
0 | \n", "ILE | \n", "16 | \n", "NaN | \n", "ILE | \n", "
1 | \n", "VAL | \n", "17 | \n", "NaN | \n", "VAL | \n", "
2 | \n", "GLY | \n", "18 | \n", "NaN | \n", "GLY | \n", "
3 | \n", "GLY | \n", "19 | \n", "NaN | \n", "GLY | \n", "
4 | \n", "TYR | \n", "20 | \n", "9.590845 | \n", "TYR | \n", "
5 | \n", "THR | \n", "21 | \n", "NaN | \n", "THR | \n", "
6 | \n", "CYS | \n", "22 | \n", "99.990000 | \n", "CYX | \n", "
7 | \n", "GLY | \n", "23 | \n", "NaN | \n", "GLY | \n", "
8 | \n", "ALA | \n", "24 | \n", "NaN | \n", "ALA | \n", "
9 | \n", "ASN | \n", "25 | \n", "NaN | \n", "ASN | \n", "