{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# HTMD Molecules" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Molecule getters and setters" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "* The `Molecule` objects can be accessed and/or manipulated.\n", "* This is principally done through a pair of methods which are known as \"getter/setter\" methods in the object-oriented jargon.\n", " * `Molecule.get()` to access properties\n", " * `Molecule.set()` to change properties\n", "* **The following sections will show the usage of property getters and setters in a number of real-world tasks.**" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "First, let's import HTMD and load 3PTB into a `Molecule` object:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2018-02-24 04:26:31,141 - htmd.molecule.readers - INFO - Using local copy for 3PTB: /home/joao/maindisk/software/repos/Acellera/htmd/htmd/data/pdb/3ptb.pdb\n", "2018-02-24 04:26:31,269 - htmd.molecule.molecule - WARNING - Residue insertions were detected in the Molecule. It is recommended to renumber the residues using the Molecule.renumberResidues() method.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Please cite HTMD: Doerr et al.(2016)JCTC,12,1845. \n", "https://dx.doi.org/10.1021/acs.jctc.6b00049\n", "Documentation: http://software.acellera.com/\n", "To update: conda update htmd -c acellera -c psi4\n", "\n", "You are on the latest HTMD version (unpackaged : /home/joao/maindisk/software/repos/Acellera/htmd/htmd).\n", "\n" ] } ], "source": [ "from htmd.ui import *\n", "mol = Molecule('3PTB')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's show-case some examples." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Check the residue IDs of Cysteines in the `Molecule`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In order to get the residue IDs of cysteines in the `Molecule`, one can do:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 22, 22, 22, 22, 22, 22, 42, 42, 42, 42, 42, 42, 58,\n", " 58, 58, 58, 58, 58, 128, 128, 128, 128, 128, 128, 136, 136,\n", " 136, 136, 136, 136, 157, 157, 157, 157, 157, 157, 168, 168, 168,\n", " 168, 168, 168, 182, 182, 182, 182, 182, 182, 191, 191, 191, 191,\n", " 191, 191, 201, 201, 201, 201, 201, 201, 220, 220, 220, 220, 220,\n", " 220, 232, 232, 232, 232, 232, 232])" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mol.get('resid', sel='resname CYS')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "**Note:** residue IDs are outputted multiple times. This is due to the fact that one value is returned per matched atom, and there are 6 atoms resolved per Cys residue." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The names of the 6 atoms of cysteine residue 58 can be checked with:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['N', 'CA', 'C', 'O', 'CB', 'SG'], dtype=object)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mol.get('name','resname CYS and resid 58')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "To obtain one residue ID per residue, one can either,\n", "\n", "* further restrict the selection to carbon α atoms (for example):" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 22, 42, 58, 128, 136, 157, 168, 182, 191, 201, 220, 232])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mol.get('resid',sel='name CA and resname CYS')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "or,\n", "\n", "* take advantage of `python` and use the `numpy.unique` function to remove repeated entries:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 22, 42, 58, 128, 136, 157, 168, 182, 191, 201, 220, 232])" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.unique(mol.get('resid',sel='resname CYS'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note:** `numpy` is imported as `np` when `from htmd.ui import *` is ran." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Retrieve the coordinates of atoms" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is done by accessing the `Molecule.coords` property. It is a special property, since it returns a 3-column vector (for the x, y, z coordinates)." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 4.23999977, 16.49500084, 27.98600006]], dtype=float32)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mol.get('coords','resname CYS and resid 58 and name CA')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "**Note**: the precision is restricted to the one in the PDB file." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "If more than one atom (say, N>1) are selected, an `N x 3` matrix is returned:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 5.12200022, 16.71899986, 26.86300087],\n", " [ 4.23999977, 16.49500084, 27.98600006],\n", " [ 4.87400007, 16.95800018, 29.29999924],\n", " [ 4.23799992, 16.76399994, 30.36199951],\n", " [ 3.94099998, 14.9989996 , 28.07099915],\n", " [ 2.79200006, 14.45199966, 26.72200012]], dtype=float32)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mol.get('coords','resname CYS and resid 58')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Know the chains or segments present in the `Molecule`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The chains present in the `Molecule` can be known using:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['A'], dtype=object)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.unique(mol.get('chain'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "which means that every atom is assigned to the same chain (i.e., chain A)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The same can be done for segments:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['0', '1'], dtype=object)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.unique(mol.get('segid'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These segment IDs may need to be changed for system building." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Set properties of the `Molecule`" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "`Molecule.set` can be used to set or change specific fields of the molecule or a selection.\n", "\n", "For example, it can create a segment ID called 'P' for all protein atoms:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['P'], dtype=object)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mol.set('segid','P','protein')\n", "np.unique(mol.get('segid', 'protein'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another useful task for `Molecule.set` is to rename residues.\n", "\n", "For example, to change the residue name of all histidine residues to 'HSN':" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 57, 57, 57, 57, 57, 57, 57,\n", " 57, 57, 57, 91, 91, 91, 91, 91, 91, 91, 91, 91, 91])" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mol.set('resname','HSN','resname HIS')\n", "mol.get('resid',sel='resname HSN')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Count the number of waters in the `Molecule`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Get the indices of atoms that were recognized as water:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1641, 1642, 1643, 1644, 1645, 1646, 1647, 1648, 1649, 1650, 1651,\n", " 1652, 1653, 1654, 1655, 1656, 1657, 1658, 1659, 1660, 1661, 1662,\n", " 1663, 1664, 1665, 1666, 1667, 1668, 1669, 1670, 1671, 1672, 1673,\n", " 1674, 1675, 1676, 1677, 1678, 1679, 1680, 1681, 1682, 1683, 1684,\n", " 1685, 1686, 1687, 1688, 1689, 1690, 1691, 1692, 1693, 1694, 1695,\n", " 1696, 1697, 1698, 1699, 1700, 1701, 1702])" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mol.get('serial',sel='water')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note:** in this case, the hydrogens of waters are not present, so we only get one index per water without using the `np.unique` function." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, the number of waters present in the `Molecule` can be obtained with:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "62" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(mol.get('serial',sel='water and name O'))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Alternatively, the `Molecule.atomselect` method can be used. It returns a vector of boolean values:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([False, False, False, ..., True, True, True], dtype=bool)" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mol.atomselect('water and name O')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The fact that `True` counts as 1 in the `sum` function can be used to obtain, through a different way, the number of waters:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[False False False ..., True True True]\n" ] }, { "data": { "text/plain": [ "62" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "selection = mol.atomselect('water and name O')\n", "print(selection)\n", "sum(selection)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Duplicate, modify and write `Molecule` objects" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use `Molecule.copy` to duplicate the molecule into a new object:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "newmol = mol.copy()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Molecule.write` can be used to output a PDB file of your whole molecule (or just a selection). The following command uses the above copied molecule `newmol` to write out a PDB file of the benzamidine ligand atoms present in the fetched PDB file, except for hydrogen atoms:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "newmol.write('/tmp/ligand.pdb','resname BEN and noh')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "`Molecule.filter` can be used to select specific parts of the molecule (e.g. chains, segments) and clean/remove all the rest.\n", "\n", "For example, clean all except for protein atoms in chain A:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2018-02-24 04:27:52,480 - htmd.molecule.molecule - INFO - Removed 72 atoms. 1629 atoms remaining in the molecule.\n" ] }, { "data": { "text/plain": [ "array([], dtype=int64)" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mol.filter('chain A and protein')\n", "mol.get('serial', sel='not chain A or not protein')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Joining molecules/segments" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Molecule.append` appends a Molecule object (e.g. ligand, water or ion segments, etc.) to another molecule object (e.g. protein, membrane).\n", "\n", "For example, to append the benzamidine ligand (saved above as a PDB file) to the molecule we are working with (which is now only the protein chain A), simply do:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[]\n", "['C1' 'C2' 'C3' 'C4' 'C5' 'C6' 'C' 'N1' 'N2']\n" ] } ], "source": [ "ligand = Molecule('/tmp/ligand.pdb')\n", "print(mol.get('name', 'resname BEN'))\n", "mol.append(ligand)\n", "print(mol.get('name', 'resname BEN'))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "You can also add an atom using `Molecule.insert`. This method allows to choose the index at which to insert the `Molecule` object:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "atom = Molecule()\n", "atom.empty(1)\n", "atom.record = 'ATOM'\n", "atom.name = 'CA'\n", "atom.resid = 0\n", "atom.resname = 'XXX'\n", "atom.chain = 'X'\n", "atom.coords = [6, 3, 2]" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "name ['CA']\n", "resid [0]\n", "resname ['XXX']\n", "chain ['X']\n", "coords [[ 6. 3. 2.]]\n" ] } ], "source": [ "newligand = ligand.copy()\n", "newligand.insert(atom, 0)\n", "for field in ['name', 'resid', 'resname', 'chain', 'coords']:\n", " print(field, newligand.get(field, sel='index 0'))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Playing with coordinates" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Coordinates can be used to perform geometric tasks on your molecule, such as translations or rotations.\n", "\n", "For example, calculate the geometric center of your molecule:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "coo = mol.get('coords')\n", "c = np.mean(coo, axis=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, `Molecule.moveBy` can be used to translate the molecule and center it on the origin [0, 0, 0] using the previosly calculated center `c`:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "mol.moveBy(-c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, check the new center (which is [0, 0, 0] within the precision):" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 2.15420940e-07, -2.36497249e-06, 4.63504330e-06], dtype=float32)" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.mean(mol.get('coords'),axis=0)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Rotations" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Another common task is rotations (e.g. to build a protein - ligand system).\n", "For example, load up your ligand and visualize its orientation:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2018-02-24 04:28:23,209 - htmd.molecule.readers - INFO - Using local copy for 3PTB: /home/joao/maindisk/software/repos/Acellera/htmd/htmd/data/pdb/3ptb.pdb\n", "2018-02-24 04:28:23,332 - htmd.molecule.molecule - WARNING - Residue insertions were detected in the Molecule. It is recommended to renumber the residues using the Molecule.renumberResidues() method.\n", "2018-02-24 04:28:23,367 - htmd.molecule.molecule - INFO - Removed 1692 atoms. 9 atoms remaining in the molecule.\n" ] } ], "source": [ "ligand = Molecule('3PTB')\n", "ligand.filter('resname BEN')\n", "ligand_orig = ligand.copy()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Then, calculate its geometric center and use `Molecule.rotateBy` to rotate your ligand with the use of a rotation matrix `M`:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "ligcenter = np.mean(ligand.get('coords'),axis=0)\n", "M = uniformRandomRotation()\n", "ligand.rotateBy(M, center=ligcenter)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One can use the transpose of rotation matrix `M` to rotate the ligand back to the original position and verify its coordinates are the same within precision:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ligand.rotateBy(np.transpose(M), center=ligcenter)\n", "np.allclose(ligand.get('coords'), ligand_orig.get('coords'))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Another example is to rotate around an axis" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "ligand.rotateBy(rotationMatrix([1, 0, 0], math.pi),center=[0, 0, 0])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "**Note:** the `uniformRandomRotation()` function provides the transformation matrix for uniformly distributed random rotation." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Working with MD trajectories" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Molecule` also allows to:\n", "* read MD trajectories:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2018-02-24 04:28:37,143 - htmd.molecule.readers - WARNING - No time information read from /home/joao/maindisk/software/repos/Acellera/htmd/htmd/data/dhfr/dhfr.xtc. Defaulting to 0.1ns framestep.\n" ] }, { "data": { "text/plain": [ "100" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from htmd.home import home\n", "molTraj = Molecule(os.path.join(home(dataDir='dhfr'), 'dhfr.psf'))\n", "molTraj.read(os.path.join(home(dataDir='dhfr'), 'dhfr.xtc'))\n", "molTraj.numFrames" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "* and manipulate them (e.g. drop/keep frames):" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "frame1 = molTraj.copy()\n", "frame1.dropFrames(keep=0)\n", "frame1.numFrames" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "90" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "last10frames = molTraj.copy()\n", "last10frames.dropFrames(drop=range(molTraj.numFrames)[-10:])\n", "last10frames.numFrames" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Biochemistry of HTMD Molecules" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Print the protein sequence:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2018-02-24 04:28:49,658 - htmd.molecule.readers - INFO - Using local copy for 3PTB: /home/joao/maindisk/software/repos/Acellera/htmd/htmd/data/pdb/3ptb.pdb\n", "2018-02-24 04:28:49,793 - htmd.molecule.molecule - WARNING - Residue insertions were detected in the Molecule. It is recommended to renumber the residues using the Molecule.renumberResidues() method.\n" ] }, { "data": { "text/plain": [ "{'0': 'IVGGYTCGANTVPYQVSLNSGYHFCGGSLINSQWVVSAAHCYKSGIQVRLGEDNINVVEGNEQFISASKSIVHPSYNSNTLNNDIMLIKLKSAASLNSRVASISLPTSCASAGTQCLISGWGNTKSSGTSYPDVLKCLKAPILSDSSCKSAYPGQITSNMFCAGYLEGGKDSCQGDSGGPVVCSGKLQGIVSWGSGCAQKNKPGVYTKVCNYVSWIKQTIASN'}" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mol = Molecule('3PTB')\n", "mol.sequence()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create mutant proteins by mutating residues:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['LEU' 'LEU' 'LEU' 'LEU' 'LEU' 'LEU' 'LEU' 'LEU']\n", "['ARG' 'ARG' 'ARG' 'ARG']\n" ] } ], "source": [ "mutant = mol.copy()\n", "print(mol.get('resname', 'resid 158'))\n", "mutant.mutateResidue('resid 158', 'ARG')\n", "print(mutant.get('resname', 'resid 158'))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Exercise 1: find residues in contact with a ligand" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "solution2": "hidden", "solution2_first": true }, "outputs": [], "source": [ "#Load the 'clean' molecule with 3PTB once again" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "solution2": "hidden" }, "outputs": [], "source": [ "mol = Molecule('3PTB')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "solution2": "hidden", "solution2_first": true }, "outputs": [], "source": [ "#Identify residues in contact with the benzamidine ligand.\n", "#We exploit the fact that there is exactly one carbon alpha (`name CA`) atom per residue." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "solution2": "hidden" }, "outputs": [], "source": [ "mol.get('resid', sel='name CA and same residue as protein within 4 of resname BEN')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Exercise 2: find duplicate residues" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "solution2": "hidden", "solution2_first": true }, "outputs": [], "source": [ "#Load the 'clean' 3PTB molecule once again:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "solution2": "hidden" }, "outputs": [], "source": [ "mol = Molecule('3PTB')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Solution 1." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Identify duplicate residues, relying on PDB's *insertion* attribute:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "solution2": "hidden", "solution2_first": true }, "outputs": [], "source": [ "# Quick and dirty" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "solution2": "hidden" }, "outputs": [], "source": [ "np.unique(mol.get('resid', sel='insertion A'))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "solution2": "hidden", "solution2_first": true }, "outputs": [], "source": [ "# Or equivantly, more explicit and pretty-printed" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "solution2": "hidden" }, "outputs": [], "source": [ "ia = mol.copy()\n", "ia.filter(\"insertion A and name CA\")\n", "rid = ia.get('resid') # ia.resid also works!\n", "rn = ia.get('resname')\n", "\n", "for f, b in zip(rn, rid):\n", " print(f, b)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Solution 2." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "If one doesn't want to rely on the *insertion* property, we can process the list of gaps between residues." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "-" }, "solution2": "hidden", "solution2_first": true }, "outputs": [], "source": [ "#Let's try to use the power of the return_counts property of numpy.unique" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "-" }, "solution2": "hidden" }, "outputs": [], "source": [ "dups = mol.copy() # Make a working copy\n", "dups.filter(\"name CA and protein\") # Keep only C-alphas\n", "rid = dups.get('resid') # Get list of residue ids\n", "\n", "nrid, count= np.unique(rid,return_counts=True)\n", " # Get how many times each residue id appeared\n", "nrid[count>1] # Show duplicates" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Exercise 3: find gaps in the sequence of residue numbers" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "-" }, "solution2": "hidden", "solution2_first": true }, "outputs": [], "source": [ "#We can exploit the numpy.diff() function to compute deltas between array elements." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "solution2": "hidden" }, "outputs": [], "source": [ "# array with the \"holes\":\n", "# - 0 means duplicate residues\n", "# - >1 means segments of protein missing\n", "deltas = np.diff(rid)\n", "print(deltas)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "solution2": "hidden", "solution2_first": true }, "outputs": [], "source": [ "#Print residue IDs at which the holes (including duplicate residues) occur" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "solution2": "hidden" }, "outputs": [], "source": [ "new_rid = rid[:np.size(rid)-1] # no delta for last residue\n", "new_rid[deltas!=1]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" }, "solution2": "hidden", "solution2_first": true }, "outputs": [], "source": [ "# Try printing a prettier and more explicit output" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "-" }, "solution2": "hidden" }, "outputs": [], "source": [ "# Iterate over all residues (excluding last one)\n", "rn = dups.get('resname')\n", "for i in range(np.size(rid)-1):\n", " # If there is a break...\n", " if(deltas[i]>1):\n", " # Remember that deltas[i]=rid[i+1]-rid[i]\n", " print(rid[i],rn[i],' followed by ',rid[i+1],rn[i+1])" ] } ], "metadata": { "anaconda-cloud": {}, "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" }, "nav_menu": {}, "toc": { "navigate_menu": true, "number_sections": true, "sideBar": true, "threshold": 6, "toc_cell": false, "toc_section_display": "block", "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 1 }