{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# HTMD Molecules"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Molecule getters and setters"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "* The `Molecule` objects can be accessed and/or manipulated.\n",
    "* This is principally done through a pair of methods which are known as \"getter/setter\" methods in the object-oriented jargon.\n",
    "   * `Molecule.get()` to access properties\n",
    "   * `Molecule.set()` to change properties\n",
    "* **The following sections will show the usage of property getters and setters in a number of real-world tasks.**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "First, let's import HTMD and load 3PTB into a `Molecule` object:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2018-02-24 04:26:31,141 - htmd.molecule.readers - INFO - Using local copy for 3PTB: /home/joao/maindisk/software/repos/Acellera/htmd/htmd/data/pdb/3ptb.pdb\n",
      "2018-02-24 04:26:31,269 - htmd.molecule.molecule - WARNING - Residue insertions were detected in the Molecule. It is recommended to renumber the residues using the Molecule.renumberResidues() method.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Please cite HTMD: Doerr et al.(2016)JCTC,12,1845. \n",
      "https://dx.doi.org/10.1021/acs.jctc.6b00049\n",
      "Documentation: http://software.acellera.com/\n",
      "To update: conda update htmd -c acellera -c psi4\n",
      "\n",
      "You are on the latest HTMD version (unpackaged : /home/joao/maindisk/software/repos/Acellera/htmd/htmd).\n",
      "\n"
     ]
    }
   ],
   "source": [
    "from htmd.ui import *\n",
    "mol = Molecule('3PTB')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's show-case some examples."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Check the residue IDs of Cysteines in the `Molecule`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In order to get the residue IDs of cysteines in the `Molecule`, one can do:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 22,  22,  22,  22,  22,  22,  42,  42,  42,  42,  42,  42,  58,\n",
       "        58,  58,  58,  58,  58, 128, 128, 128, 128, 128, 128, 136, 136,\n",
       "       136, 136, 136, 136, 157, 157, 157, 157, 157, 157, 168, 168, 168,\n",
       "       168, 168, 168, 182, 182, 182, 182, 182, 182, 191, 191, 191, 191,\n",
       "       191, 191, 201, 201, 201, 201, 201, 201, 220, 220, 220, 220, 220,\n",
       "       220, 232, 232, 232, 232, 232, 232])"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mol.get('resid', sel='resname CYS')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "**Note:** residue IDs are outputted multiple times. This is due to the fact that one value is returned per matched atom, and there are 6 atoms resolved per Cys residue."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "The names of the 6 atoms of cysteine residue 58 can be checked with:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['N', 'CA', 'C', 'O', 'CB', 'SG'], dtype=object)"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mol.get('name','resname CYS and resid 58')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "To obtain one residue ID per residue, one can either,\n",
    "\n",
    "* further restrict the selection to carbon &alpha; atoms (for example):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 22,  42,  58, 128, 136, 157, 168, 182, 191, 201, 220, 232])"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mol.get('resid',sel='name CA and resname CYS')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "or,\n",
    "\n",
    "* take advantage of `python` and use the `numpy.unique` function to remove repeated entries:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 22,  42,  58, 128, 136, 157, 168, 182, 191, 201, 220, 232])"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.unique(mol.get('resid',sel='resname CYS'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Note:** `numpy` is imported as `np` when `from htmd.ui import *` is ran."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Retrieve the coordinates of atoms"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is done by accessing the `Molecule.coords` property. It is a special property, since it returns a 3-column vector (for the x, y, z coordinates)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[  4.23999977,  16.49500084,  27.98600006]], dtype=float32)"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mol.get('coords','resname CYS and resid 58 and name CA')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "**Note**: the precision is restricted to the one in the PDB file."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "If more than one atom (say, N>1) are selected, an `N x 3` matrix is returned:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[  5.12200022,  16.71899986,  26.86300087],\n",
       "       [  4.23999977,  16.49500084,  27.98600006],\n",
       "       [  4.87400007,  16.95800018,  29.29999924],\n",
       "       [  4.23799992,  16.76399994,  30.36199951],\n",
       "       [  3.94099998,  14.9989996 ,  28.07099915],\n",
       "       [  2.79200006,  14.45199966,  26.72200012]], dtype=float32)"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mol.get('coords','resname CYS and resid 58')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Know the chains or segments present in the `Molecule`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The chains present in the `Molecule` can be known using:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['A'], dtype=object)"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.unique(mol.get('chain'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "which means that every atom is assigned to the same chain (i.e., chain A)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same can be done for segments:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['0', '1'], dtype=object)"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.unique(mol.get('segid'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "These segment IDs may need to be changed for system building."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Set properties of the `Molecule`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "`Molecule.set` can be used to set or change specific fields of the molecule or a selection.\n",
    "\n",
    "For example, it can create a segment ID called 'P' for all protein atoms:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['P'], dtype=object)"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mol.set('segid','P','protein')\n",
    "np.unique(mol.get('segid', 'protein'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another useful task for `Molecule.set` is to rename residues.\n",
    "\n",
    "For example, to change the residue name of all histidine residues to 'HSN':"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 57, 57, 57, 57, 57, 57, 57,\n",
       "       57, 57, 57, 91, 91, 91, 91, 91, 91, 91, 91, 91, 91])"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mol.set('resname','HSN','resname HIS')\n",
    "mol.get('resid',sel='resname HSN')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Count the number of waters in the `Molecule`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Get the indices of atoms that were recognized as water:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([1641, 1642, 1643, 1644, 1645, 1646, 1647, 1648, 1649, 1650, 1651,\n",
       "       1652, 1653, 1654, 1655, 1656, 1657, 1658, 1659, 1660, 1661, 1662,\n",
       "       1663, 1664, 1665, 1666, 1667, 1668, 1669, 1670, 1671, 1672, 1673,\n",
       "       1674, 1675, 1676, 1677, 1678, 1679, 1680, 1681, 1682, 1683, 1684,\n",
       "       1685, 1686, 1687, 1688, 1689, 1690, 1691, 1692, 1693, 1694, 1695,\n",
       "       1696, 1697, 1698, 1699, 1700, 1701, 1702])"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mol.get('serial',sel='water')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Note:** in this case, the hydrogens of waters are not present, so we only get one index per water without using the `np.unique` function."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then, the number of waters present in the `Molecule` can be obtained with:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "62"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(mol.get('serial',sel='water and name O'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Alternatively, the `Molecule.atomselect` method can be used. It returns a vector of boolean values:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([False, False, False, ...,  True,  True,  True], dtype=bool)"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mol.atomselect('water and name O')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The fact that `True` counts as 1 in the `sum` function can be used to obtain, through a different way, the number of waters:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[False False False ...,  True  True  True]\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "62"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "selection = mol.atomselect('water and name O')\n",
    "print(selection)\n",
    "sum(selection)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Duplicate, modify and write `Molecule` objects"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use `Molecule.copy` to duplicate the molecule into a new object:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "newmol = mol.copy()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`Molecule.write` can be used to output a PDB file of your whole molecule (or just a selection). The following command uses the above copied molecule `newmol` to write out a PDB file of the benzamidine ligand atoms present in the fetched PDB file, except for hydrogen atoms:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "newmol.write('/tmp/ligand.pdb','resname BEN and noh')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "`Molecule.filter` can be used to select specific parts of the molecule (e.g. chains, segments) and clean/remove all the rest.\n",
    "\n",
    "For example, clean all except for protein atoms in chain A:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2018-02-24 04:27:52,480 - htmd.molecule.molecule - INFO - Removed 72 atoms. 1629 atoms remaining in the molecule.\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "array([], dtype=int64)"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mol.filter('chain A and protein')\n",
    "mol.get('serial', sel='not chain A or not protein')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Joining molecules/segments"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`Molecule.append` appends a Molecule object (e.g. ligand, water or ion segments, etc.) to another molecule object (e.g. protein, membrane).\n",
    "\n",
    "For example, to append the benzamidine ligand (saved above as a PDB file) to the molecule we are working with (which is now only the protein chain A), simply do:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[]\n",
      "['C1' 'C2' 'C3' 'C4' 'C5' 'C6' 'C' 'N1' 'N2']\n"
     ]
    }
   ],
   "source": [
    "ligand = Molecule('/tmp/ligand.pdb')\n",
    "print(mol.get('name', 'resname BEN'))\n",
    "mol.append(ligand)\n",
    "print(mol.get('name', 'resname BEN'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "You can also add an atom using `Molecule.insert`. This method allows to choose the index at which to insert the `Molecule` object:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "atom = Molecule()\n",
    "atom.empty(1)\n",
    "atom.record = 'ATOM'\n",
    "atom.name = 'CA'\n",
    "atom.resid = 0\n",
    "atom.resname = 'XXX'\n",
    "atom.chain = 'X'\n",
    "atom.coords = [6, 3, 2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "name ['CA']\n",
      "resid [0]\n",
      "resname ['XXX']\n",
      "chain ['X']\n",
      "coords [[ 6.  3.  2.]]\n"
     ]
    }
   ],
   "source": [
    "newligand = ligand.copy()\n",
    "newligand.insert(atom, 0)\n",
    "for field in ['name', 'resid', 'resname', 'chain', 'coords']:\n",
    "    print(field, newligand.get(field, sel='index 0'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Playing with coordinates"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Coordinates can be used to perform geometric tasks on your molecule, such as translations or rotations.\n",
    "\n",
    "For example, calculate the geometric center of your molecule:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "coo = mol.get('coords')\n",
    "c = np.mean(coo, axis=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then, `Molecule.moveBy` can be used to translate the molecule and center it on the origin [0, 0, 0] using the previosly calculated center `c`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "mol.moveBy(-c)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, check the new center (which is [0, 0, 0] within the precision):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([  2.15420940e-07,  -2.36497249e-06,   4.63504330e-06], dtype=float32)"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.mean(mol.get('coords'),axis=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Rotations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "Another common task is rotations (e.g. to build a protein - ligand system).\n",
    "For example, load up your ligand and visualize its orientation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2018-02-24 04:28:23,209 - htmd.molecule.readers - INFO - Using local copy for 3PTB: /home/joao/maindisk/software/repos/Acellera/htmd/htmd/data/pdb/3ptb.pdb\n",
      "2018-02-24 04:28:23,332 - htmd.molecule.molecule - WARNING - Residue insertions were detected in the Molecule. It is recommended to renumber the residues using the Molecule.renumberResidues() method.\n",
      "2018-02-24 04:28:23,367 - htmd.molecule.molecule - INFO - Removed 1692 atoms. 9 atoms remaining in the molecule.\n"
     ]
    }
   ],
   "source": [
    "ligand = Molecule('3PTB')\n",
    "ligand.filter('resname BEN')\n",
    "ligand_orig = ligand.copy()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Then, calculate its geometric center and use `Molecule.rotateBy` to rotate your ligand with the use of a rotation matrix `M`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "ligcenter = np.mean(ligand.get('coords'),axis=0)\n",
    "M = uniformRandomRotation()\n",
    "ligand.rotateBy(M, center=ligcenter)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One can use the transpose of rotation matrix `M` to rotate the ligand back to the original position and verify its coordinates are the same within precision:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ligand.rotateBy(np.transpose(M), center=ligcenter)\n",
    "np.allclose(ligand.get('coords'), ligand_orig.get('coords'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "Another example is to rotate around an axis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "ligand.rotateBy(rotationMatrix([1, 0, 0], math.pi),center=[0, 0, 0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "**Note:** the `uniformRandomRotation()` function provides the transformation matrix  for uniformly distributed random rotation."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Working with MD trajectories"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`Molecule` also allows to:\n",
    "* read MD trajectories:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2018-02-24 04:28:37,143 - htmd.molecule.readers - WARNING - No time information read from /home/joao/maindisk/software/repos/Acellera/htmd/htmd/data/dhfr/dhfr.xtc. Defaulting to 0.1ns framestep.\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "100"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from htmd.home import home\n",
    "molTraj = Molecule(os.path.join(home(dataDir='dhfr'), 'dhfr.psf'))\n",
    "molTraj.read(os.path.join(home(dataDir='dhfr'), 'dhfr.xtc'))\n",
    "molTraj.numFrames"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "* and manipulate them (e.g. drop/keep frames):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame1 = molTraj.copy()\n",
    "frame1.dropFrames(keep=0)\n",
    "frame1.numFrames"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "90"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "last10frames = molTraj.copy()\n",
    "last10frames.dropFrames(drop=range(molTraj.numFrames)[-10:])\n",
    "last10frames.numFrames"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Biochemistry of HTMD Molecules"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Print the protein sequence:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2018-02-24 04:28:49,658 - htmd.molecule.readers - INFO - Using local copy for 3PTB: /home/joao/maindisk/software/repos/Acellera/htmd/htmd/data/pdb/3ptb.pdb\n",
      "2018-02-24 04:28:49,793 - htmd.molecule.molecule - WARNING - Residue insertions were detected in the Molecule. It is recommended to renumber the residues using the Molecule.renumberResidues() method.\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'0': 'IVGGYTCGANTVPYQVSLNSGYHFCGGSLINSQWVVSAAHCYKSGIQVRLGEDNINVVEGNEQFISASKSIVHPSYNSNTLNNDIMLIKLKSAASLNSRVASISLPTSCASAGTQCLISGWGNTKSSGTSYPDVLKCLKAPILSDSSCKSAYPGQITSNMFCAGYLEGGKDSCQGDSGGPVVCSGKLQGIVSWGSGCAQKNKPGVYTKVCNYVSWIKQTIASN'}"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mol = Molecule('3PTB')\n",
    "mol.sequence()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create mutant proteins by mutating residues:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['LEU' 'LEU' 'LEU' 'LEU' 'LEU' 'LEU' 'LEU' 'LEU']\n",
      "['ARG' 'ARG' 'ARG' 'ARG']\n"
     ]
    }
   ],
   "source": [
    "mutant = mol.copy()\n",
    "print(mol.get('resname', 'resid 158'))\n",
    "mutant.mutateResidue('resid 158', 'ARG')\n",
    "print(mutant.get('resname', 'resid 158'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Exercise 1: find residues in contact with a ligand"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "solution2": "hidden",
    "solution2_first": true
   },
   "outputs": [],
   "source": [
    "#Load the 'clean' molecule with 3PTB once again"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "solution2": "hidden"
   },
   "outputs": [],
   "source": [
    "mol = Molecule('3PTB')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "solution2": "hidden",
    "solution2_first": true
   },
   "outputs": [],
   "source": [
    "#Identify residues in contact with the benzamidine ligand.\n",
    "#We exploit the fact that there is exactly one carbon alpha (`name CA`) atom per residue."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "solution2": "hidden"
   },
   "outputs": [],
   "source": [
    "mol.get('resid', sel='name CA and same residue as protein within 4 of resname BEN')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Exercise 2: find duplicate residues"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "solution2": "hidden",
    "solution2_first": true
   },
   "outputs": [],
   "source": [
    "#Load the 'clean' 3PTB molecule once again:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "solution2": "hidden"
   },
   "outputs": [],
   "source": [
    "mol = Molecule('3PTB')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Solution 1."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "Identify duplicate residues, relying on PDB's *insertion* attribute:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "solution2": "hidden",
    "solution2_first": true
   },
   "outputs": [],
   "source": [
    "# Quick and dirty"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "solution2": "hidden"
   },
   "outputs": [],
   "source": [
    "np.unique(mol.get('resid', sel='insertion A'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "solution2": "hidden",
    "solution2_first": true
   },
   "outputs": [],
   "source": [
    "# Or equivantly, more explicit and pretty-printed"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "solution2": "hidden"
   },
   "outputs": [],
   "source": [
    "ia = mol.copy()\n",
    "ia.filter(\"insertion A and name CA\")\n",
    "rid = ia.get('resid') # ia.resid also works!\n",
    "rn = ia.get('resname')\n",
    "\n",
    "for f, b in zip(rn, rid):\n",
    "    print(f, b)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Solution 2."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "If one doesn't want to rely on the *insertion* property, we can process the list of gaps between residues."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    },
    "solution2": "hidden",
    "solution2_first": true
   },
   "outputs": [],
   "source": [
    "#Let's try to use the power of the return_counts property of numpy.unique"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    },
    "solution2": "hidden"
   },
   "outputs": [],
   "source": [
    "dups = mol.copy()                       # Make a working copy\n",
    "dups.filter(\"name CA and protein\")      # Keep only C-alphas\n",
    "rid = dups.get('resid')                 # Get list of residue ids\n",
    "\n",
    "nrid, count= np.unique(rid,return_counts=True)\n",
    "                                        # Get how many times each residue id appeared\n",
    "nrid[count>1]                           # Show duplicates"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Exercise 3: find gaps in the sequence of residue numbers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    },
    "solution2": "hidden",
    "solution2_first": true
   },
   "outputs": [],
   "source": [
    "#We can exploit the numpy.diff() function to compute deltas between array elements."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "solution2": "hidden"
   },
   "outputs": [],
   "source": [
    "# array with the \"holes\":\n",
    "# - 0 means duplicate residues\n",
    "# - >1 means segments of protein missing\n",
    "deltas = np.diff(rid)\n",
    "print(deltas)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "solution2": "hidden",
    "solution2_first": true
   },
   "outputs": [],
   "source": [
    "#Print residue IDs at which the holes (including duplicate residues) occur"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "solution2": "hidden"
   },
   "outputs": [],
   "source": [
    "new_rid = rid[:np.size(rid)-1] # no delta for last residue\n",
    "new_rid[deltas!=1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "solution2": "hidden",
    "solution2_first": true
   },
   "outputs": [],
   "source": [
    "# Try printing a prettier and more explicit output"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    },
    "solution2": "hidden"
   },
   "outputs": [],
   "source": [
    "# Iterate over all residues (excluding last one)\n",
    "rn = dups.get('resname')\n",
    "for i in range(np.size(rid)-1):\n",
    "    # If there is a break...\n",
    "    if(deltas[i]>1):\n",
    "        # Remember that deltas[i]=rid[i+1]-rid[i]\n",
    "        print(rid[i],rn[i],' followed by ',rid[i+1],rn[i+1])"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.2"
  },
  "nav_menu": {},
  "toc": {
   "navigate_menu": true,
   "number_sections": true,
   "sideBar": true,
   "threshold": 6,
   "toc_cell": false,
   "toc_section_display": "block",
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}