{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Protein preparation" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## What is protein preparation?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "The **protein preparation** phase, based on the PDB2PQR and propKa softwares, addresses e.g. the problems of assigning titration states at the user-chosen pH; flipping the side chains of HIS, ASN, and GLN residues; and optimizing the overall hydrogen bonding network. \n", "\n", "After preparing, the **build** phase takes a prepared system and applies the chosen forcefield in order to obtain simulation-ready input files." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Let's start" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Please cite HTMD: Doerr et al.(2016)JCTC,12,1845. \n", "https://dx.doi.org/10.1021/acs.jctc.6b00049\n", "Documentation: http://software.acellera.com/\n", "To update: conda update htmd -c acellera -c psi4\n", "\n", "You are on the latest HTMD version (unpackaged : /home/joao/maindisk/software/repos/Acellera/htmd/htmd).\n", "\n" ] } ], "source": [ "from htmd.ui import *\n", "config(viewer='ngl')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Protein Preparation in HTMD" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The system preparation phase is based on the PDB2PQR software. It \n", "includes the following steps (from the\n", "[PDB2PQR algorithm\n", "description](http://apbs-pdb2pqr.readthedocs.io/en/latest/pdb2pqr/invoking.html)):\n", "\n", " * Compute empirical pKa values for the residues' local environment (propKa)\n", " * Assign titration states at the user-chosen pH;\n", " * Flipping the side chains of HIS (including user defined HIS states), ASN, and GLN residues;\n", "\n", " * Rotating the sidechain hydrogen on SER, THR, TYR, and CYS (if available);\n", " * Determining the best placement for the sidechain hydrogen on neutral HIS, protonated GLU, and protonated ASP;\n", " * Optimizing all water hydrogens." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The hydrogen bonding network calculations are performed by the\n", "[PDB2PQR](http://www.poissonboltzmann.org/) software package. The pKa\n", "calculations are performed by the [PROPKA\n", "3.1](https://github.com/jensengroup/propka-3.1) software packages.\n", "Please see the copyright, license and citation terms distributed with each.\n", "\n", "Note that this version was modified in order to use an \n", "externally-supplied propKa **3.1** (installed automatically via dependencies), whereas\n", "the original had propKa 3.0 *embedded*!\n", "\n", "The results of the function should be roughly equivalent of the system\n", "preparation wizard's preprocessing and optimization steps\n", "of Schrodinger's Maestro software." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Protein residue pKas in water" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "![](http://pub.htmd.org/tutorials/protein-preparation/naming.svg)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Modified residue names" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "The molecule produced by the preparation modifies residue names\n", "according to their protonation.\n", "Later system-building functions assume these residue naming conventions." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Charge +1 | Neutral | Charge -1\n", "-------------|------------|----------\n", " - | ASH | ASP\n", " - | CYS | CYM\n", " - | GLH | GLU\n", "HIP | HID/HIE | -\n", "LYS | LYN | -\n", " - | TYR | TYM\n", "ARG | AR0 | -" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note**: support for alternative charge states varies between the forcefields." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Limitations" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ " * *PDB2PQR*: returns **one** solution consistent with its restraints, not the optimal one;\n", " * *Membrane proteins*: propKa ignores **lipid exposure** (more on this later);\n", " * *Large conformational changes*: local environment changes may be large enough that pKa decisions are **not transferable**." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## `proteinPrepare` function" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "The `proteinPrepare` function requires a `Molecule` object, the protein to be prepared, as an argument, and returns the prepared system, also as a `Molecule`. Logging messages will provide information and warnings about the process.\n", "\n", "```python\n", "def proteinPrepare(mol_in,\n", " pH=7.0,\n", " verbose=0,\n", " returnDetails=False,\n", " hydrophobicThickness=None,\n", " holdSelection=None):\n", "```\n", "\n", "Returns a `Molecule` object, where residues have been renamed to follow internal conventions on protonation (below). Coordinates are changed to optimize the H-bonding network. This should be roughly comparable to Schroedinger Maestro's preparation wizard." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Parameters" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "```\n", "mol_in : htmd.Molecule\n", " the object to be optimized\n", "pH : float\n", " pH to decide titration\n", "verbose : int\n", " verbosity\n", "returnDetails : bool\n", " whether to return just the prepared Molecule (False, default) or a molecule *and* a ResidueInfo\n", " object including computed properties\n", "hydrophobicThickness : float\n", " the thickness of the membrane in which the protein is embedded, or None if globular protein.\n", " Used to provide a warning about membrane-exposed residues.\n", "holdSelection : str\n", " Atom selection to be excluded from optimization.\n", " Only the carbon-alpha atom will be considered for the corresponding residue.\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "`proteinPrepare()` is a convenience function. Using it\n", "is **not** mandatory. You can \n", "manipulate the input molecule with your custom functions. \n", "In particular,\n", "\n", "* Addition of hydrogen atoms is not required\n", "* Protonation states are set by renaming residues\n", "* HIS and other residues can be edited as coordinates\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Example" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Prepare trypsin (PDB: 3PTB) at pH 7." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2018-03-16 16:45:27,014 - htmd.molecule.readers - INFO - Using local copy for 3PTB: /home/joao/maindisk/software/repos/Acellera/htmd/htmd/data/pdb/3ptb.pdb\n", "2018-03-16 16:45:28,394 - htmd.molecule.molecule - WARNING - Residue insertions were detected in the Molecule. It is recommended to renumber the residues using the Molecule.renumberResidues() method.\n", "2018-03-16 16:45:28,556 - propka - INFO - No pdbfile provided\n", "2018-03-16 16:45:31,188 - htmd.builder.preparation - WARNING - The following residue has not been optimized: CA\n", "2018-03-16 16:45:31,189 - htmd.builder.preparation - WARNING - The following residue has not been optimized: BEN\n", "2018-03-16 16:45:39,499 - htmd.builder.preparationdata - INFO - The following residues are in a non-standard state: CYS 22 A (CYX), HIS 40 A (HIE), CYS 42 A (CYX), HIS 57 A (HIP), CYS 58 A (CYX), HIS 91 A (HID), CYS 128 A (CYX), CYS 136 A (CYX), CYS 157 A (CYX), CYS 168 A (CYX), CYS 182 A (CYX), CYS 191 A (CYX), CYS 201 A (CYX), CYS 220 A (CYX), CYS 232 A (CYX)\n", "2018-03-16 16:45:39,935 - htmd.builder.preparationdata - WARNING - Dubious protonation state: the pKa of 3 residues is within 1.0 units of pH 7.0.\n", "2018-03-16 16:45:39,939 - htmd.builder.preparationdata - WARNING - Dubious protonation state: HIS 57 A (pKa= 7.44)\n", "2018-03-16 16:45:39,940 - htmd.builder.preparationdata - WARNING - Dubious protonation state: GLU 70 A (pKa= 6.10)\n", "2018-03-16 16:45:39,942 - htmd.builder.preparationdata - WARNING - Dubious protonation state: N+ 16T A (pKa= 7.41)\n", "2018-03-16 16:45:40,042 - htmd.builder.preparationdata - WARNING - Found N-terminus 83.9% buried (> 50.0% threshold)\n", "2018-03-16 16:45:40,043 - htmd.builder.preparationdata - WARNING - Found C-terminus involved in H bonds\n" ] } ], "source": [ "tryp = Molecule(\"3PTB\")\n", "tryp_op = proteinPrepare(tryp)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Visualize protonation of residue 40" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "cfa19ea4a2074c09b0dcfd45f37b8475", "version_major": 2, "version_minor": 0 }, "text/plain": [ "A Jupyter Widget" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "tryp_op.view(style=\"Licorice\",sel=\"resid 40\",hold=True)\n", "tryp_op.view(style=\"Lines\",sel=\"same residue as exwithin 4 of resid 40\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Preparation report" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "If the `returnDetails` argument is set, an object of type `ResidueData` is returned as a **second** return value. It carries a wealth of information on the preparation results. " ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2018-03-16 16:46:21,332 - htmd.builder.preparation - WARNING - The following residue has not been optimized: CA\n", "2018-03-16 16:46:21,334 - htmd.builder.preparation - WARNING - The following residue has not been optimized: BEN\n", "2018-03-16 16:46:29,334 - htmd.builder.preparationdata - INFO - The following residues are in a non-standard state: CYS 22 A (CYX), HIS 40 A (HIE), CYS 42 A (CYX), HIS 57 A (HIP), CYS 58 A (CYX), HIS 91 A (HID), CYS 128 A (CYX), CYS 136 A (CYX), CYS 157 A (CYX), CYS 168 A (CYX), CYS 182 A (CYX), CYS 191 A (CYX), CYS 201 A (CYX), CYS 220 A (CYX), CYS 232 A (CYX)\n", "2018-03-16 16:46:29,337 - htmd.builder.preparationdata - WARNING - Dubious protonation state: the pKa of 3 residues is within 1.0 units of pH 7.0.\n", "2018-03-16 16:46:29,339 - htmd.builder.preparationdata - WARNING - Dubious protonation state: HIS 57 A (pKa= 7.44)\n", "2018-03-16 16:46:29,340 - htmd.builder.preparationdata - WARNING - Dubious protonation state: GLU 70 A (pKa= 6.10)\n", "2018-03-16 16:46:29,341 - htmd.builder.preparationdata - WARNING - Dubious protonation state: N+ 16T A (pKa= 7.41)\n", "2018-03-16 16:46:29,433 - htmd.builder.preparationdata - WARNING - Found N-terminus 83.9% buried (> 50.0% threshold)\n", "2018-03-16 16:46:29,434 - htmd.builder.preparationdata - WARNING - Found C-terminus involved in H bonds\n" ] }, { "data": { "text/plain": [ "PreparationData object about 290 residues.\n", "Unparametrized residue names: CA, BEN\n", "Please find the full info in the .data property, e.g.: \n", " resname resid insertion chain pKa protonation flipped buried\n", "0 ILE 16 A NaN ILE NaN NaN\n", "1 VAL 17 A NaN VAL NaN NaN\n", "2 GLY 18 A NaN GLY NaN NaN\n", "3 GLY 19 A NaN GLY NaN NaN\n", "4 TYR 20 A 9.590845 TYR NaN 14.642857\n", " . . ." ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tryp_op, prepData = proteinPrepare(tryp, returnDetails=True)\n", "prepData" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Most of it is accessible in the `data` property, as a [pandas DataFrame](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html)." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['resname', 'resid', 'insertion', 'chain', 'pKa', 'protonation',\n", " 'flipped', 'patches', 'buried', 'z', 'membraneExposed',\n", " 'forced_protonation', 'default_protonation', 'pka_group_id',\n", " 'pka_residue_type', 'pka_type', 'pka_charge', 'pka_atom_name',\n", " 'pka_atom_sybyl_type', 'pdb2pqr_idx', 'guessedAtoms'],\n", " dtype='object')" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prepData.data.columns" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
resnameresidpKaprotonation
0ILE16NaNILE
1VAL17NaNVAL
2GLY18NaNGLY
3GLY19NaNGLY
4TYR209.590845TYR
5THR21NaNTHR
6CYS2299.990000CYX
7GLY23NaNGLY
8ALA24NaNALA
9ASN25NaNASN
\n", "
" ], "text/plain": [ " resname resid pKa protonation\n", "0 ILE 16 NaN ILE\n", "1 VAL 17 NaN VAL\n", "2 GLY 18 NaN GLY\n", "3 GLY 19 NaN GLY\n", "4 TYR 20 9.590845 TYR\n", "5 THR 21 NaN THR\n", "6 CYS 22 99.990000 CYX\n", "7 GLY 23 NaN GLY\n", "8 ALA 24 NaN ALA\n", "9 ASN 25 NaN ASN" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prepData.data.loc[:,['resname','resid','pKa','protonation']].head(10)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "As such, it can be easily queried and written as a spreadsheet in Excel or CSV format." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "prepData.data.to_excel(\"./tryp_data.xlsx\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Special case: Membrane proteins" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Membrane-embedded proteins are in contact with an hydrophobic region which may alter pKa values for membrane-exposed residues (image taken from [Teixeira et al., J. Chem. Theory Comput., 2016, 12 (3), pp 930–934](http://dx.doi.org/10.1021/acs.jctc.5b01114))." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "
![](http://pubs.acs.org/appl/literatum/publisher/achs/journals/content/jctcce/2016/jctcce.2016.12.issue-3/acs.jctc.5b01114/20160302/images/large/ct-2015-01114c_0002.jpeg)
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Although the effect is not currently taken into account quantitatively, if a `hydrophobicThickness` argument is provided, warnings will be generated for residues exposed to the lipid region." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The following example shows the preparation of the mu opioid receptor, 4DKL. \n", "The **pre-oriented** structure is retrieved from the OPM database." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2018-03-16 16:47:01,783 - htmd.molecule.molecule - INFO - Removed 2546 atoms. 4836 atoms remaining in the molecule.\n", "2018-03-16 16:47:01,876 - htmd.molecule.molecule - INFO - Removed 364 atoms. 4472 atoms remaining in the molecule.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "32.0\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2018-03-16 16:47:29,115 - htmd.builder.preparationdata - INFO - The following residues are in a non-standard state: ASP 114 A (ASH), CYS 140 A (CYX), HIS 171 A (HID), CYS 217 A (CYX), HIS 223 A (HID), HIS 297 A (HID), HIS 319 A (HIE), ASP 114 B (ASH), CYS 140 B (CYX), HIS 171 B (HID), CYS 217 B (CYX), HIS 223 B (HID), HIS 297 B (HID), HIS 319 B (HIE)\n", "2018-03-16 16:47:29,118 - htmd.builder.preparationdata - WARNING - Dubious protonation state: the pKa of 6 residues is within 1.0 units of pH 7.0.\n", "2018-03-16 16:47:29,119 - htmd.builder.preparationdata - WARNING - Dubious protonation state: ASP 114 A (pKa= 7.85)\n", "2018-03-16 16:47:29,121 - htmd.builder.preparationdata - WARNING - Dubious protonation state: HIS 223 A (pKa= 6.36)\n", "2018-03-16 16:47:29,122 - htmd.builder.preparationdata - WARNING - Dubious protonation state: ASP 114 B (pKa= 7.85)\n", "2018-03-16 16:47:29,123 - htmd.builder.preparationdata - WARNING - Dubious protonation state: HIS 223 B (pKa= 6.36)\n", "2018-03-16 16:47:29,124 - htmd.builder.preparationdata - WARNING - Dubious protonation state: N+ 65T A (pKa= 7.76)\n", "2018-03-16 16:47:29,125 - htmd.builder.preparationdata - WARNING - Dubious protonation state: N+ 65T B (pKa= 7.76)\n", "2018-03-16 16:47:29,342 - htmd.builder.preparationdata - WARNING - Predictions for 34 residues may be incorrect because they are exposed to the membrane (-16.0