{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Demo of Iterating over Residue Secondary Structure\n",
    "\n",
    "This notebook will demonstrate using PyMOL to iterate over the residues of a chain and report the secondary structure designation assigned to each residue.\n",
    "\n",
    "Return to [the index page](index.ipynb) for the list of demonstration notebooks in this series.\n",
    "\n",
    "----\n",
    "\n",
    "<div class=\"alert alert-block alert-warning\">\n",
    "<p>If you haven't used one of these notebooks before, they're basically web pages in which you can write, edit, and run live code. They're meant to encourage experimentation, so don't feel nervous. Just try running a few cells and see what happens!.</p>\n",
    "\n",
    "<p>\n",
    "    Some tips:\n",
    "    <ul>\n",
    "        <li>Code cells have boxes around them. When you hover over them a <i class=\"fa-step-forward fa\"></i> icon appears.</li>\n",
    "        <li>To run a code cell either click the <i class=\"fa-step-forward fa\"></i> icon, or click on the cell and then hit <b>Shift+Enter</b>. The <b>Shift+Enter</b> combo will also move you to the next cell, so it's a quick way to work through the notebook.</li>\n",
    "        <li>While a cell is running a <b>*</b> appears in the square brackets next to the cell. Once the cell has finished running the asterix will be replaced with a number.</li>\n",
    "        <li>In most cases you'll want to start from the top of notebook and work your way down running each cell in turn. Later cells might depend on the results of earlier ones.</li>\n",
    "        <li>To edit a code cell, just click on it and type stuff. Remember to run the cell once you've finished editing.</li>\n",
    "    </ul>\n",
    "</p>\n",
    "</div>\n",
    "\n",
    "We'll get a structure file using a bash shell command (unix command) this time and then iterate on a chain in that structure. We'll use this instead of PyMOL's `fetch` command because **the intended use case for this approach is when the structure is not yet published**. For example, imagine you've had [I-TASSER](https://zhanglab.ccmb.med.umich.edu/I-TASSER/) make you a model of your protein chain of interest. **For published structures, you are much better off using PDBsum's 'protein' tab for the chain of interest.** For example, this is [the protein tab page for chain A of 1d66](http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl?pdbcode=1d66&template=protein.html&r=wiring&l=1&chain=A). The reason it is better to use PDBsum's report is because this has done an official analysis of the secondary structure and even reported a topology wiring diagram, provided on the right side of the page, that makes it easy to be sure the residues that compose a helix or strand of a beta sheet. Therefore, you have an independent call for the secondary structure that you can reference.\n",
    "\n",
    "So we'll get a file from the PDB to simulate the file we'd upload and use here to iterate on the residues to make a report for an unpublished structure model.\n",
    "\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Preparation\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A described above, we need to get a structure file to simulate the file we'd upload and use here to iterate on the residues to make a report for an unpublished structure model. Running the next cell will do that:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n",
      "                                 Dload  Upload   Total   Spent    Left  Speed\n",
      "100  183k    0  183k    0     0   137k      0 --:--:--  0:00:01 --:--:--  137k\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "curl -OL https://files.rcsb.org/download/1d66.pdb"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When you are ready to analyze your structure you can skip that cell and instead use the Jupyter Dashboard, which can be accessed by clicking on the Jupyter logo in the upper left, to upload your file to the session. For convenience, you'd want to place the file in the `notebooks` directory where this Jupyter notebook is found. If you understand paths though, you can upload this to the default root directory and point at it when you edit the code below.\n",
    "\n",
    "The intial steps to set up to send commands to PyMol are similar to the other notebooks in this series and so we'll define those as block of code we can prepend in front of specific things to do."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "init_block = '''#!/usr/bin/python\n",
    "\n",
    "import sys, os\n",
    "\n",
    "# pymol environment\n",
    "moddir='/opt/pymol-svn/modules'\n",
    "sys.path.insert(0, moddir)\n",
    "os.environ['PYMOL_PATH'] = os.path.join(moddir, 'pymol/pymol_path')\n",
    "\n",
    "import pymol\n",
    "cmd = pymol.cmd\n",
    "'''"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With a structure file and a block of code defined that we can use within this running notebook, we can now step through each of the steps below to get the secondry structure on a per residue basis with PyMOL."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Iterating on the residues of a chain: basics\n",
    "\n",
    "To give an idea of what is going to be done here, you can quickly try the following code by pasting it into the command console in your typical desktop graphical user interface version of PyMOL after having loaded any structure file you'd like. If you structure, doesn't have a chain A, change the text to match a chain it has. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```python\n",
    "secondary_structure_list_by_resnumber = []\n",
    "iterate (chain A and name ca), secondary_structure_list_by_resnumber.append((resv,ss))\n",
    "print(secondary_structure_list_by_resnumber)    \n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You'll get a nice report by residue number of the secondary structure assignments that will look a lot like what you'll see below if you happen to use the same structure file.  \n",
    "That is easy enough to do. However, it isn't easily reproduced or scaled. For example, to come back and do the same thing a week later because you misplaced your record of the results, wouldn't it be nicer to just open a Jupyter notebook and press `Run all` on a notebook? Or what if you wanted to do this for 200 structures?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And so now that you have an idea of what is going on, back to using scripting to do this in a reproducible and easily documented, pipelined, and scalable manner...\n",
    "\n",
    "First, we'll need to load the structure file. When the line below actually gets run later, the following command will load the file into PyMOL. Edit the file name `1d66.pdb` to match your file name if you want to perform the iteration on your own structure that you've uploaded."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "cmds2run = \"cmd.load('1d66.pdb')\\n\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "PyMOL's [iterate](http://pymolwiki.org/index.php/Iterate) command iterates on a selection includes the ability to accessing a string for secondary structure as it exposes a Python string `ss` for secondary structure. If we assign that to a list we can see the report after. Running the following will set that up and run everything. If your chain of interest isn't chain A, edit the code to indicate the chain you want to iterate on. (Note: the use of `stored` was first covered after the sixth code cell in the Jupyter notebook in this series of notebooks [Demo of Dealing with PyMol colors](demo_colors.ipynb), and used again in [Demo of Sampling Various Combinations of Applying A Color Palette to a Complex](demo_palette.ipynb) and [Demo of Applying a Color Combination Choice to a Complex](demo_apply_combo.ipynb).)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Writing 'script_txt' (str) to file 'script_sec.py'.\n"
     ]
    }
   ],
   "source": [
    "cmds2run += '''stored.secondary_structure_list_by_resnumber = []\n",
    "secondary_structure_list_by_resnumber = []\n",
    "cmd.iterate('(chain A and name ca)', 'stored.secondary_structure_list_by_resnumber.append((resv,ss))', quiet=0)\n",
    "print(stored.secondary_structure_list_by_resnumber)\n",
    "'''\n",
    "script_txt = init_block + cmds2run\n",
    "%store script_txt >script_sec.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's run that script now."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " Iterate: iterated over 57 atoms.\r\n",
      "[(8, ''), (9, ''), (10, ''), (11, 'H'), (12, 'H'), (13, 'H'), (14, 'H'), (15, 'H'), (16, 'H'), (17, 'H'), (18, 'H'), (19, ''), (20, ''), (21, ''), (22, ''), (23, ''), (24, ''), (25, ''), (26, ''), (27, ''), (28, 'H'), (29, 'H'), (30, 'H'), (31, 'H'), (32, 'H'), (33, 'H'), (34, 'H'), (35, 'H'), (36, ''), (37, ''), (38, ''), (39, ''), (40, ''), (41, ''), (42, ''), (43, ''), (44, ''), (45, ''), (46, ''), (47, ''), (48, ''), (49, ''), (50, 'H'), (51, 'H'), (52, 'H'), (53, 'H'), (54, 'H'), (55, 'H'), (56, 'H'), (57, 'H'), (58, 'H'), (59, 'H'), (60, 'H'), (61, 'H'), (62, 'H'), (63, 'H'), (64, 'H')]\r\n"
     ]
    }
   ],
   "source": [
    "!pymol -cq script_sec.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`H` is for residues in helices. `S` would be for those in beta-strands; `1d66` only has alphe-helices. If you used the demo structure 1d66 and you compare the residues in runs of helices, you'll see it is a good match to the topology wiring diagram on the right side of the  [the protein tab page for chain A of 1d66](http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl?pdbcode=1d66&template=protein.html&r=wiring&l=1&chain=A).\n",
    "\n",
    "The keen-sighted among you may notice that the last two residues observed in the structure (residues 63 and 64) differ in calls from the topology wiring diagram shown at PDBsum. This is because different software uses different algorithms.  According to [here](http://pymol.sourceforge.net/faq.html), \"PyMOL will interpret HELIX and SHEET records from PDB files if they are availablw.\" And that seems to be what is happening here. Run the next command to see the HELX records for the demo file. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "HELIX    1 H1A CYS A   11  LYS A   18  1                                   8    \r\n",
      "HELIX    2 H2A CYS A   28  ASN A   35  1                                   8    \r\n",
      "HELIX    3 H3A THR A   50  LEU A   64  1                                  15    \r\n"
     ]
    }
   ],
   "source": [
    "!sed -n '494,496p' < 1d66.pdb # based on https://stackoverflow.com/a/6022441/8508004"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The last line shows the PDB file itself assigns the last two residues, 64 & 64, as C-terminal end of an alpha-helix beginning at residue 50, whereas the [the protein tab page for chain A of 1d66](http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl?pdbcode=1d66&template=protein.html&r=wiring&l=1&chain=A) ends the helical segment at 62.\n",
    "\n",
    "What about files that don't include such records?  \n",
    "PyMOL will use its algorithm to assign secondary structure when those records aren't present, such as will be the case with models made by software.As it says for PyMOL [here](https://pymolwiki.org/index.php/Dss), \"PyMOL's algorithm is 'fuzzy' in that there is a grey area where residues may be accepted or rejected as helust the representation seen in PyMOL to match say the topology wiring diagrams elsewhere, it is possible to assign second structure per residue using the `alter` command, see the bottom of the very bottom of the page [here](http://pymol.sourceforge.net/newman/user/S0260cartoons.html) for a good example of how you'd do it when in the typical PyMOL graphical user interface."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Iterating on the residues of a chain: linking to Jupyter\n",
    "\n",
    "The result above was nice; however, you can imagine you might be running the steps above as a larger process in a Jupyter notebook or Python environment and want to bring the data from the above result in as a Python object. To demonstrate how to do that, we'll get the result above as a Python object in this section.\n",
    "\n",
    "Jupyter/IPython offers a general utility method to bring stdout into the namespace of the current kernel easily. It was first briefly covered in dealing with the third cell in the Jupyter notebook in this series of notebooks [Demo of Dealing with PyMol colors](notebooks/demo_colors.ipynb). We'll use that again here. (Even if there was a more direct route to go from PyMOL's environment to returning such a list in useful form, knowing about IPython `utils.text` methods are useful for doing bioinformatics in Jupyter because a lot of other software used isn't based on Python and just returns text.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Writing 'script_txt' (str) to file 'script_exp.py'.\n"
     ]
    }
   ],
   "source": [
    "cmds2run = '''cmd.load('1d66.pdb')\n",
    "stored.secondary_structure_list_by_resnumber = []\n",
    "secondary_structure_list_by_resnumber = []\n",
    "cmd.iterate('(chain A and name ca)', 'stored.secondary_structure_list_by_resnumber.append((resv,ss))')\n",
    "[print(x) for x in stored.secondary_structure_list_by_resnumber]\n",
    "'''\n",
    "script_txt = init_block + cmds2run\n",
    "%store script_txt >script_exp.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(8, '')    (17, 'H')  (26, '')   (35, 'H')  (44, '')   (53, 'H')  (62, 'H')\n",
      "(9, '')    (18, 'H')  (27, '')   (36, '')   (45, '')   (54, 'H')  (63, 'H')\n",
      "(10, '')   (19, '')   (28, 'H')  (37, '')   (46, '')   (55, 'H')  (64, 'H')\n",
      "(11, 'H')  (20, '')   (29, 'H')  (38, '')   (47, '')   (56, 'H')\n",
      "(12, 'H')  (21, '')   (30, 'H')  (39, '')   (48, '')   (57, 'H')\n",
      "(13, 'H')  (22, '')   (31, 'H')  (40, '')   (49, '')   (58, 'H')\n",
      "(14, 'H')  (23, '')   (32, 'H')  (41, '')   (50, 'H')  (59, 'H')\n",
      "(15, 'H')  (24, '')   (33, 'H')  (42, '')   (51, 'H')  (60, 'H')\n",
      "(16, 'H')  (25, '')   (34, 'H')  (43, '')   (52, 'H')  (61, 'H')\n",
      "\n"
     ]
    }
   ],
   "source": [
    "ll = !pymol -cq script_exp.py\n",
    "import IPython\n",
    "#print(ll.l)\n",
    "print(IPython.utils.text.columnize(ll.l)) #based on https://github.com/ipython/ipython/issues/8741"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To touch upon the possibilities now that the data is in a Python form, the rest of the section will step through using these to get at items or information.\n",
    "\n",
    "With the [SList object](https://ipython.readthedocs.io/en/stable/api/generated/IPython.utils.text.html#IPython.utils.text.SList) as a list, it can be indexed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\"(11, 'H')\""
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ll.l[3]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The items can be filtered several ways. What if we just the residue numbers for helices?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[11, 12, 13, 14, 15, 16, 17, 18, 28, 29, 30, 31, 32, 33, 34, 35, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64]\n"
     ]
    }
   ],
   "source": [
    "helix_res = []\n",
    "for x in ll.l:\n",
    "    if 'H' in x:\n",
    "        helix_res.append(x.split(\",\")[0])\n",
    "helix_res = [x[1:] for x in helix_res] # this strips off the '(' in front\n",
    "helix_res = [int(x) for x in helix_res] # this converts the numbers in string form to integers\n",
    "print(helix_res)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Fancier stuff is possible. For example, from inside the PyMOL commands, we could have saved the residue and secondary structure calls as tab delimited values with residue on a separate line and then read than into this notebook via Pandas. However, hopefully this gave you a flavor of the possibilities.\n",
    "\n",
    "---"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}