{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Demo of PDBrenum in your broswer via MyBinder.org\n",
    "\n",
    "-----\n",
    "\n",
    "<div class=\"alert alert-block alert-warning\">\n",
    "<p>If you haven't used one of these notebooks before, they're basically web pages in which you can write, edit, and run live code. They're meant to encourage experimentation, so don't feel nervous. Just try running a few cells and see what happens!.</p>\n",
    "\n",
    "<p>\n",
    "    Some tips:\n",
    "    <ul>\n",
    "        <li>Code cells have boxes around them.</li>\n",
    "        <li>To run a code cell, click on the cell and either click the <i class=\"fa-play fa\"></i> button on the toolbar above, or then hit <b>Shift+Enter</b>. The <b>Shift+Enter</b> combo will also move you to the next cell, so it's a quick way to work through the notebook. Selecting from the menu above the toolbar, <b>Cell</b> > <b>Run All</b> is a shortcut to trigger attempting to run all the cells in the notebook.</li>\n",
    "        <li>While a cell is running a <b>*</b> appears in the square brackets next to the cell. Once the cell has finished running the asterisk will be replaced with a number.</li>\n",
    "        <li>In most cases you'll want to start from the top of notebook and work your way down running each cell in turn. Later cells might depend on the results of earlier ones.</li>\n",
    "        <li>To edit a code cell, just click on it and type stuff. Remember to run the cell once you've finished editing.</li>\n",
    "    </ul>\n",
    "</p>\n",
    "</div>\n",
    "\n",
    "----\n",
    "\n",
    "Step through running the cells below. Then substitute in your PDB entry identifiers of interest."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Downloading PDB files: 100%|██████████| 4/4 [00:01<00:00,  2.07it/s]\n",
      "Downloading SIFTS files: 100%|██████████| 4/4 [00:00<00:00, 12.80it/s]\n",
      "Renumbering PDB files: 100%|██████████| 4/4 [00:03<00:00,  1.22it/s]\n"
     ]
    }
   ],
   "source": [
    "%run PDBrenum.py -rfla 1d5t 1bxw 2vl3 5e6h -PDB"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "That's it. Really.  \n",
    "Below this demonstration notebook will demonstrate that it worked and fill in some information about running the script here, where to find the output, options for running it elsewhere, etc.. But mostly that is it as you'll see. \n",
    "\n",
    "There's some other options that are handy. If instead you wanted the converted results i the `mmCIF` format you'd use the following command here:\n",
    "\n",
    "```python\n",
    "%run PDBrenum.py -rfla 1d5t 1bxw 2vl3 5e6h -mmCIF\n",
    "\n",
    "```\n",
    "\n",
    "Or simply leave off any reference to format because it defaults to `mmCIF` format if no type is indicated when calling the script. `mmCIF_assembly` and `-PDB_assembly` are also valid types\n",
    "\n",
    "Note the `%run` part is magic for properly running a script in a Jupyter environnemt. If you were running the first demonstration command in a terminal you'd use the following:\n",
    "\n",
    "```bash\n",
    "python PDBrenum.py -rfla 1d5t 1bxw 2vl3 5e6h -PDB\n",
    "```\n",
    "\n",
    "Depending on your system and how you installed Python, you may need to replace `python` with `python3`.\n",
    "\n",
    "The `-rfla` flag in the call the the script above stands for `--renumber_from_list_of_arguments` to indicate we are providing the PDB entry identifiers as part of the command. Use of a text file to provide the PDB ids will be demonstrated [below](#Using-a-list-of-PDB-entry-identifiers).\n",
    "\n",
    "If you ever need the full ist of options and flags just call the script, with the `help` flag like below to print out the full usage details:\n",
    "\n",
    "```python\n",
    "%run PDBrenum.py --help\n",
    "```\n",
    "Running the below cell will do that."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "PDB.py\n",
      "optional arguments:\n",
      "-h, --help            show this help message and exit\n",
      "\n",
      "-rftf text_file_with_PDB.txt, --renumber_from_text_file text_file_with_PDB.txt\n",
      "This option will download and renumber specified files\n",
      "usage $ python3 PDB.py -rftf text_file_with_PDB_in_it.txt -mmCIF\n",
      "usage $ python3 PDB.py -rftf text_file_with_PDB_in_it.txt -PDB\n",
      "usage $ python3 PDB.py -rftf text_file_with_PDB_in_it.txt -mmCIF_assembly\n",
      "usage $ python3 PDB.py -rftf text_file_with_PDB_in_it.txt -PDB_assembly\n",
      "usage $ python3 PDB.py -rftf text_file_with_PDB_in_it.txt -all\n",
      "\n",
      "-rfla [6dbp 3v03 2jit ...], --renumber_from_list_of_arguments [6dbp 3v03 2jit ...]\n",
      "This option will download and renumber specified files\n",
      "usage $ python3 PDB.py -rfla 6dbp 3v03 2jit -mmCIF\n",
      "usage $ python3 PDB.py -rfla 6dbp 3v03 2jit -PDB\n",
      "usage $ python3 PDB.py -rfla 6dbp 3v03 2jit -mmCIF_assembly\n",
      "usage $ python3 PDB.py -rfla 6dbp 3v03 2jit -PDB_assembly\n",
      "usage $ python3 PDB.py -rfla 6dbp 3v03 2jit -all\n",
      "\n",
      "-dftf text_file_with_PDB.txt, --download_from_text_file text_file_with_PDB.txt\n",
      "This option will read given input file parse by space\n",
      "or tab or comma or new line and download it example \n",
      "usage $ python3 PDB.py -dftf text_file_with_PDB_in_it.txt -mmCIF\n",
      "usage $ python3 PDB.py -dftf text_file_with_PDB_in_it.txt -PDB\n",
      "usage $ python3 PDB.py -dftf text_file_with_PDB_in_it.txt -mmCIF_assembly\n",
      "usage $ python3 PDB.py -dftf text_file_with_PDB_in_it.txt -PDB_assembly\n",
      "usage $ python3 PDB.py -dftf text_file_with_PDB_in_it.txt -all\n",
      "\n",
      "-dfla [6dbp 3v03 2jit ...], --download_from_list_of_arguments 6dbp 3v03 2jit ...]\n",
      "This option will read given list of arguments separated by space. \n",
      "Format of the list should be without any commas or quotation marks\n",
      "usage $ python3 PDB.py -dfla 6dbp 3v03 2jit -mmCIF\n",
      "usage $ python3 PDB.py -dfla 6dbp 3v03 2jit -PDB\n",
      "usage $ python3 PDB.py -dfla 6dbp 3v03 2jit -mmCIF_assembly\n",
      "usage $ python3 PDB.py -dfla 6dbp 3v03 2jit -PDB_assembly\n",
      "usage $ python3 PDB.py -dfla 6dbp 3v03 2jit -all\n",
      "\n",
      "-redb, --renumber_entire_database\n",
      "This option will download and renumber entire PDB database in PDB or/and mmCIF format\n",
      "usage $ python3 PDB.py -redb -mmCIF\n",
      "usage $ python3 PDB.py -redb -PDB\n",
      "usage $ python3 PDB.py -redb -mmCIF_assembly\n",
      "usage $ python3 PDB.py -redb -PDB_assembly\n",
      "usage $ python3 PDB.py -redb -all \n",
      "\n",
      "-dall, --download_entire_database\n",
      "This option will download entire mmCIF database\n",
      "usage $ python3 PDB.py -dall -mmCIF\n",
      "usage $ python3 PDB.py -dall -PDB\n",
      "usage $ python3 PDB.py -dall -mmCIF_assembly\n",
      "usage $ python3 PDB.py -dall -PDB_assembly\n",
      "usage $ python3 PDB.py -dall -all\n",
      "\n",
      "-refr, --refresh_entire_database\n",
      "This option will delete outdated files and download\n",
      "fresh ones. This option makes sense and only works if\n",
      "you work with entire database\n",
      "usage $ python3 PDB.py -refr -mmCIF\n",
      "usage $ python3 PDB.py -refr -PDB\n",
      "usage $ python3 PDB.py -refr -mmCIF_assembly\n",
      "usage $ python3 PDB.py -refr -PDB_assembly\n",
      "usage $ python3 PDB.py -refr -all    \n",
      "\n",
      "-PDB, --PDB_format_only\n",
      "This option will specify working format to pdb format\n",
      "-mmCIF, --mmCIF_format_only\n",
      "This option will specify working format to mmCIF format (default)\n",
      "-PDB_assembly, --PDB_assembly_format_only\n",
      "This option will specify working format to pdb format\n",
      "-mmCIF_assembly, --mmCIF_assembly_format_only\n",
      "This option will specify working format to mmCIF format\n",
      "-all, --all_formats   This option will work with both formats\n",
      "\n",
      "argpar.add_argument(\"-sipm\", \"--set_default_input_path_to_mmCIF\", type=str, help=argparse.SUPPRESS)\n",
      "argpar.add_argument(\"-sipma\", \"--set_default_input_path_to_mmCIF_assembly\", type=str, help=argparse.SUPPRESS)\n",
      "argpar.add_argument(\"-sipp\", \"--set_default_input_path_to_PDB\", type=str, help=argparse.SUPPRESS)\n",
      "argpar.add_argument(\"-sippa\", \"--set_default_input_path_to_PDB_assembly\", type=str, help=argparse.SUPPRESS)\n",
      "argpar.add_argument(\"-sips\", \"--set_default_input_path_to_SIFTS\", type=str, help=argparse.SUPPRESS)\n",
      "argpar.add_argument(\"-sopm\", \"--set_default_output_path_to_mmCIF\", type=str, help=argparse.SUPPRESS)\n",
      "argpar.add_argument(\"-sopma\", \"--set_default_output_path_to_mmCIF_assembly\", type=str, help=argparse.SUPPRESS)\n",
      "argpar.add_argument(\"-sopp\", \"--set_default_output_path_to_PDB\", type=str, help=argparse.SUPPRESS)\n",
      "argpar.add_argument(\"-soppa\", \"--set_default_output_path_to_PDB_assembly\", type=str, help=argparse.SUPPRESS)\n",
      "\n",
      "-sipm, --set_default_input_path_to_mmCIF\n",
      "This option will set default input path to mmCIF files (default: <./mmCIF>)\n",
      "usage $ python3 PDB.py -sipm /Users/bulatfaezov/PycharmProjects/renum/venv/mmCIF\n",
      "-sipp, --set_default_input_path_to_PDB\n",
      "This option will set default input path to PDB files (default: <./PDB>)\n",
      "usage $ python3 PDB.py -sipp /Users/bulatfaezov/PycharmProjects/renum/venv/PDB\n",
      "-sipma, --set_default_input_path_to_mmCIF_assembly\n",
      "This option will set default input path to mmCIF_assembly files (default: <./mmCIF_assembly>)\n",
      "usage $ python3 PDB.py -sipm /Users/bulatfaezov/PycharmProjects/renum/venv/mmCIF_assembly\n",
      "-sippa, --set_default_input_path_to_PDB_assembly\n",
      "This option will set default input path to PDB_assembly files (default: <./PDB_assembly>)\n",
      "usage $ python3 PDB.py -sipp /Users/bulatfaezov/PycharmProjects/renum/venv/PDB_assembly\n",
      "-sips, --set_default_input_path_to_SIFTS\n",
      "This option will set default input path to SIFTS files (default: <./SIFTS>)\n",
      "usage $ python3 PDB.py -sips /Users/bulatfaezov/PycharmProjects/renum/venv/SIFTS\n",
      "-sopm, --set_default_output_path_to_mmCIF\n",
      "This option will set default output path to mmCIF files (default: <./output_mmCIF>)\n",
      "usage $ python3 PDB.py -sopm /Users/bulatfaezov/PycharmProjects/renum/venv/output_mmCIF\n",
      "-sopp, --set_default_output_path_to_PDB\n",
      "This option will set default output path to PDB files (default: <./output_PDB>)\n",
      "usage $ python3 PDB.py -sopp /Users/bulatfaezov/PycharmProjects/renum/venv/output_PDB\n",
      "-sopma, --set_default_output_path_to_mmCIF_assembly\n",
      "This option will set default output path to mmCIF_assembly files (default: <./output_mmCIF_assembly>)\n",
      "usage $ python3 PDB.py -sopm /Users/bulatfaezov/PycharmProjects/renum/venv/output_mmCIF_assembly\n",
      "-soppa, --set_default_output_path_to_PDB_assembly\n",
      "This option will set default output path to PDB_assembly files (default: <./output_PDB_assembly>)\n",
      "usage $ python3 PDB.py -sopp /Users/bulatfaezov/PycharmProjects/renum/venv/output_PDB_assembly\n",
      "\n",
      "-sdmn, --set_default_mmCIF_num\n",
      "This option will set default mmCIF number which will be added to 1 to end numbering in cases \n",
      "when there are no UniProt numbering (default: 50000)\n",
      "usage $ python3 PDB.py -rfla 6dbp 3v03 2jit -mmCIF -sdmn 50000\n",
      "\n",
      "-sdpn, --set_default_PDB_num\n",
      "This option will set default PDB number which will be added to 1 to end numbering in cases \n",
      "when there are no UniProt numbering (default: 5000)\n",
      "usage $ python3 PDB.py -rfla 6dbp 3v03 2jit -mmCIF -sdpn 5000\n",
      "\n",
      "\"-offz\", \"--set_to_off_mode_gzip\"\n",
      "By default program will compress files with gzip this option will turn that off\n",
      "(default: gzip is on)\n",
      "usage $ python3 PDB.py -rfla 6dbp 3v03 2jit -mmCIF -offz\n",
      "\n",
      "\"-nproc\", \"--set_number_of_processes\"\n",
      "By default program will use all available CPUs. User can reduce number of CPUs for PDBrenum.\n",
      "In this example: only 4 CPUs will be used by the PDBrenum even if more CPUs available\n",
      "(default: nproc = None)\n",
      "usage $ python3 PDB.py -rfla 6dbp 3v03 2jit -mmCIF -nproc 4\n",
      "\n",
      "\n",
      "Roland Dunbrack's Lab\n",
      "Fox Chase Cancer Center\n",
      "Philadelphia, PA\n",
      "2020\n",
      "        \n"
     ]
    }
   ],
   "source": [
    "%run PDBrenum.py --help"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "----\n",
    "\n",
    "### Locating results and showing it worked\n",
    "\n",
    "\n",
    "Let's demonstrate that the `%run PDBrenum.py -rfla 1d5t 1bxw 2vl3 5e6h -PDB` command first run worked.\n",
    "\n",
    "When the script runs, it creates a directory for the data it obtains from the PDB. Because the demo command indicated we wanted the legacy PDB format, the script created a directory called `PDB` as it ran and saved the PDB files there.\n",
    "\n",
    "We can see that in some steps. First by running the following to list the contents of that working directory:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[0m\u001b[01;34mbinder\u001b[0m/     LICENSE             \u001b[01;34moutput_PDB\u001b[0m/      \u001b[01;32mPDBrenum.py\u001b[0m*  \u001b[01;34msrc\u001b[0m/\n",
      "demo.ipynb  log_corrected.txt   \u001b[01;34mPDB\u001b[0m/             \u001b[01;32mREADME.md\u001b[0m*\n",
      "\u001b[01;32minput.txt\u001b[0m*  log_translator.txt  \u001b[01;32mPDBrenum.ipynb\u001b[0m*  \u001b[01;34mSIFTS\u001b[0m/\n"
     ]
    }
   ],
   "source": [
    "ls "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "(Note that listing the files and directory show `log_corrected.txt` present in the directory along with this `demo.ipynb` notebook. That file harboring useful information will be discussed below in the section 'There's some good information that PDBrenum exposes as part of its process¶\n",
    "'.)\n",
    "\n",
    "We see the `PDB` directory and we can check the contents of that with the following command:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "pdb1bxw.ent.gz  pdb1d5t.ent.gz  pdb2vl3.ent.gz  pdb5e6h.ent.gz\n"
     ]
    }
   ],
   "source": [
    "ls PDB"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Those files are compressed in the gzip format; however, using the unix `zcat` command to uncompress in combination with the unix command `head` to grab the start of a text file and display, we can view the start of one of them by running the following command:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "HEADER    MEMBRANE PROTEIN                        03-OCT-98   1BXW              \n",
      "TITLE     OUTER MEMBRANE PROTEIN A (OMPA) TRANSMEMBRANE DOMAIN                  \n",
      "COMPND    MOL_ID: 1;                                                            \n",
      "COMPND   2 MOLECULE: PROTEIN (OUTER MEMBRANE PROTEIN A);                        \n",
      "COMPND   3 CHAIN: A;                                                            \n",
      "COMPND   4 FRAGMENT: TRANSMEMBRANE DOMAIN;                                      \n",
      "COMPND   5 ENGINEERED: YES;                                                     \n",
      "COMPND   6 MUTATION: YES                                                        \n",
      "SOURCE    MOL_ID: 1;                                                            \n",
      "SOURCE   2 ORGANISM_SCIENTIFIC: ESCHERICHIA COLI BL21(DE3);                     \n",
      "\n",
      "gzip: stdout: Broken pipe\n"
     ]
    }
   ],
   "source": [
    "!zcat PDB/pdb1bxw.ent.gz|head"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The exclamation point at the beginning is to tell Jupyter this is a Unix command and to run it in the shell. `ls` we used above is so commonly used that Jupyter has been told to recognize it without needing the exclamation point.\n",
    "\n",
    "Don't mind the `gzip: stdout: Broken pipe` at the end; zcat is meant to handle an entire file and so it causes 'Broken pipe' notice when it doesn't get to write all the file to the destination. (Also, if you ever see the `gzip` line somewhere other than the very end of the output, just run the cell again and it will probably move to the end where it should show.) The point is you can read the PDB file.\n",
    "\n",
    "So that is the initial input? What did the script do?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The outout from the `PDBrenum.py` script gets saved over in `output_PDB/` because the PDB format was specified when calling the script. Using commands similar to when viewing the initial PDB files, the output can be viewed like so:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1bxw_renum.pdb.gz  1d5t_renum.pdb.gz  2vl3_renum.pdb.gz  5e6h_renum.pdb.gz\n"
     ]
    }
   ],
   "source": [
    "ls output_PDB"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alright, we can see the renumbered verison of the file we looked at earlier is `1bxw_renum.pdb.gz`. \n",
    "\n",
    "But how do we see the difference?\n",
    "\n",
    "We can add in the Unix `tail` command to the 'pipe' the outout of our earlier `zcat` & `head` combination to show part of the middle of the original and renumbered files. \n",
    "\n",
    "First let's display a section of the original by running the command below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "gzip: stdout: Broken pipe\n",
      "ATOM     11  C   ALA A   1      46.036  12.651  40.029  1.00 51.14           C  \n",
      "ATOM     12  O   ALA A   1      47.195  12.259  40.003  1.00 53.33           O  \n",
      "ATOM     13  CB  ALA A   1      44.229  11.697  41.473  1.00 53.91           C  \n",
      "ATOM     14  N   PRO A   2      45.736  13.936  40.024  1.00 49.63           N  \n",
      "ATOM     15  CA  PRO A   2      46.822  14.919  40.021  1.00 51.58           C  \n",
      "ATOM     16  C   PRO A   2      47.754  14.618  41.197  1.00 56.34           C  \n",
      "ATOM     17  O   PRO A   2      47.328  14.035  42.194  1.00 54.38           O  \n",
      "ATOM     18  CB  PRO A   2      46.081  16.238  40.142  1.00 45.83           C  \n",
      "ATOM     19  CG  PRO A   2      44.708  15.943  39.588  1.00 44.84           C  \n",
      "ATOM     20  CD  PRO A   2      44.381  14.536  40.054  1.00 41.42           C  \n"
     ]
    }
   ],
   "source": [
    "!zcat PDB/pdb1bxw.ent.gz|head -n 510|tail"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now to display the renumbered version by running the command below:  \n",
    "(the renumbered version gets an extra 14 lines in the header and so that is why `510` used in command above and `524` in command below)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ATOM     11  C   ALA A  22      46.036  12.651  40.029  1.00 51.14           C  \n",
      "ATOM     12  O   ALA A  22      47.195  12.259  40.003  1.00 53.33           O  \n",
      "ATOM     13  CB  ALA A  22      44.229  11.697  41.473  1.00 53.91           C  \n",
      "ATOM     14  N   PRO A  23      45.736  13.936  40.024  1.00 49.63           N  \n",
      "ATOM     15  CA  PRO A  23      46.822  14.919  40.021  1.00 51.58           C  \n",
      "ATOM     16  C   PRO A  23      47.754  14.618  41.197  1.00 56.34           C  \n",
      "ATOM     17  O   PRO A  23      47.328  14.035  42.194  1.00 54.38           O  \n",
      "ATOM     18  CB  PRO A  23      46.081  16.238  40.142  1.00 45.83           C  \n",
      "ATOM     19  CG  PRO A  23      44.708  15.943  39.588  1.00 44.84           C  \n",
      "ATOM     20  CD  PRO A  23      44.381  14.536  40.054  1.00 41.42           C  \n",
      "\n",
      "gzip: stdout: Broken pipe\n"
     ]
    }
   ],
   "source": [
    "!zcat output_PDB/1bxw_renum.pdb.gz|head -n 524|tail"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Comparing the results of the two commands shows that what the original PDB has as residues `#1` and `#2` correspond to residues `#22` and `#23` in the UniProt numbering.  \n",
    "\n",
    "By viewing [the corresponding UniProt entry](https://www.uniprot.org/uniprot/P0A910#sequences) (shown below for convenience), we can convince ourselves of the validity of this renumbering:\n",
    "![](binder/start_of1bxw_at_uniprot.png)\n",
    "\n",
    "This sample above shows that the numbering has been corrected in `1bxw_renum.pdb.gz` and the similarly processed PDB entries.\n",
    "\n",
    "\n",
    "\n",
    "#### Locating the output for download\n",
    "\n",
    "Above we showed how we can see the results listed from within this notebook and even display contents; however, if anything useful is created, you'll want to get those files out of the `output` directories and download them to your local computer. Jupyter has a file navigator accessible from the dashboard that allows you to download files from this session to your local machine. Click on the Jupyter icon in the upper left side above this notebook, next to 'demo'. That will take you to the Juptyer Dashboard. You should see the directory `output_PDB` listed there. Click on the word `output_PDB` and you should go into it where you can click the checkbox next to a file name and get a 'Download' button up at the top. Click 'Download' to initiate downloading the file to your local machine. \n",
    "\n",
    "#### Dealing with compression\n",
    "\n",
    "The files that get used in running the `PDBrenum.py` get the gzip flavor of compression applied. At any point to convert them you can uncompress witht the `gunzip` Unix command. For example, to uncompress the above example output, use:\n",
    "\n",
    "```text\n",
    "!gunzip output_PDB/1bxw_renum.pdb.gz\n",
    "```\n",
    "\n",
    "After that you can view the file directly as text by either navigating to it in the file navigator and clicking on it to open it in the Jupyter Dashboard, or running the command below to view the first few lines of it directly:\n",
    "\n",
    "```text\n",
    "!head output_PDB/1bxw_renum.pdb\n",
    "```\n",
    "\n",
    "Substitute `cat` in place of `head` to display the entire file in this notebook."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### There's some good information that PDBrenum exposes as part of its process\n",
    "\n",
    "It's important to point out that in the process PDBrenum exposes some information that can be useful in other contexts. Luckily, it makes that information available in a an easy to access form.  \n",
    "The file generated during the process `log_corrected.txt` contains some useful information, such as mapping chain IDs for each PDB file to UniProt accession identifiers. The location of it in the directory where the `output_PDB/` and `PDB/` directories get generated was illustrated above where `ls` was run in the section 'Locating results and showing it worked' after PDBrenum was first run."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "SP PDB_id chain_PDB   chain_auth  UniProt             SwissProt              uni_len chain_len     renum 5k_or_50k\n",
      "+  5e6h   A           A           P29375              KDM5A_HUMAN                294       294         0         0\n",
      "+  1bxw   A           A           P0A910              OMPA_ECOLI                 171       172       171         1\n",
      "+  2vl3   A           A           P30044              PRDX5_HUMAN                161       162       161         1\n",
      "+  2vl3   B           B           P30044              PRDX5_HUMAN                161       161       161         0\n",
      "+  2vl3   C           C           P30044              PRDX5_HUMAN                161       161       161         0\n",
      "+  1d5t   A           A           P21856              GDIA_BOVIN                 431       433         0         2\n"
     ]
    }
   ],
   "source": [
    "cat log_corrected.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Demonstrations of various ways of taking advantage of this information to map chains in PDB files to UniProt ids is found in a companion notebook, [Demo of using PDBrenum to perform mapping of chain IDs in PDB files to UniProt IDs](chainID_mapping_to_UniProt_id_demo.ipynb). That was originally suggested as an option to address this Biostars question: [Mapping PDB ID + chain ID to UniProt ID](https://www.biostars.org/p/9540519/#9540519). PDBrenum provides the necessary information, via the SIFTS database, parsed out as side product of its efforts and the information is in an easy to mine fixed width text-based data table."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "### Using a list of PDB entry identifiers\n",
    "\n",
    "You may have a lot of PDB entries that you want to process. The script allows for listing them in a separate text file with each id separated by a space and then indicating that file when calling the script. Such a file is included along with the script as `input.txt`. Let's examine the contents of that:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2aa3 4zah 2aa2 2af2 2aac 2aaa 2asd\n"
     ]
    }
   ],
   "source": [
    "!head input.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can point the script at it when calling it, like so using the `-rftf` flag this time:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Downloading PDB files: 100%|██████████| 7/7 [00:01<00:00,  3.98it/s]\n",
      "Downloading SIFTS files: 100%|██████████| 7/7 [00:32<00:00,  4.63s/it]\n",
      "Renumbering PDB files: 100%|██████████| 7/7 [04:23<00:00, 37.65s/it]\n"
     ]
    }
   ],
   "source": [
    "%run PDBrenum.py -rftf input.txt -PDB"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When that is finished, we can run the following cell to see that `output_PDB/` contains additional files that corresponds to the contents of `input.txt`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1bxw_renum.pdb.gz  2aa3_renum.pdb.gz  2af2_renum.pdb.gz  4zah_renum.pdb.gz\n",
      "1d5t_renum.pdb.gz  2aaa_renum.pdb.gz  2asd_renum.pdb.gz  5e6h_renum.pdb.gz\n",
      "2aa2_renum.pdb.gz  2aac_renum.pdb.gz  2vl3_renum.pdb.gz\n"
     ]
    }
   ],
   "source": [
    "ls output_PDB"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using the means we used to analyze the contents of `1bxw_renum.pdb.gz` above, you could convince yourself those have been processed.\n",
    "\n",
    "----\n",
    "\n",
    "----"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}