{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Using Biopython's PDB Header parser to get missing residues\n", "\n", "\n", "Previously, this notebook had to be run with a development version of Biopython that I got working [here](https://github.com/fomightez/BernBiopython). Current Biopython now includes the essential functionality for reporting missing residues in structure files, so this notebook can be run here, where the current Biopython is installed.\n", "\n", "You may also be interested in the notebook entitled 'Using Biopython's PDB module to list resolved residues and construct fit commands'. Think of this notebook as complementing that one. Depending on what you are trying to do (or what you are using as the source of information), that one may be better suited." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " % Total % Received % Xferd Average Speed Time Time Time Current\n", " Dload Upload Total Spent Left Speed\n", "100 491k 100 491k 0 0 666k 0 --:--:-- --:--:-- --:--:-- 665k\n", " % Total % Received % Xferd Average Speed Time Time Time Current\n", " Dload Upload Total Spent Left Speed\n", "100 519k 100 519k 0 0 872k 0 --:--:-- --:--:-- --:--:-- 871k\n" ] } ], "source": [ "# get structures\n", "import os\n", "files_needed = [\"6AGB.pdb\",\"6AH3.pdb\"]\n", "for file_needed in files_needed:\n", "    if not os.path.isfile(file_needed):\n", "        #os.system(f\"curl -OL https://files.rcsb.org/download/{file_needed}.gz\") #version of next line that works outside Jupyter/IPython\n", "        !curl -OL https://files.rcsb.org/download/{file_needed}.gz # gives more feedback in Jupyter\n", "        # os.system(f\"gunzip {file_needed}.gz\") #version of next line that works outside Jupyter/IPython\n", "        !gunzip {file_needed}.gz" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 2, "metadata": {}, 
"output_type": "execute_result" } ], "source": [ "from Bio.PDB import *\n", "h =parse_pdb_header('6AGB.pdb')\n", "h['has_missing_residues']" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'model': None,\n", " 'res_name': 'MET',\n", " 'chain': 'B',\n", " 'ssseq': 1,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'SER',\n", " 'chain': 'B',\n", " 'ssseq': 2,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLY',\n", " 'chain': 'B',\n", " 'ssseq': 3,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'SER',\n", " 'chain': 'B',\n", " 'ssseq': 4,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LEU',\n", " 'chain': 'B',\n", " 'ssseq': 5,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'SER',\n", " 'chain': 'B',\n", " 'ssseq': 6,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ARG',\n", " 'chain': 'B',\n", " 'ssseq': 7,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLY',\n", " 'chain': 'B',\n", " 'ssseq': 8,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASN',\n", " 'chain': 'B',\n", " 'ssseq': 9,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLY',\n", " 'chain': 'B',\n", " 'ssseq': 10,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLY',\n", " 'chain': 'B',\n", " 'ssseq': 11,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LYS',\n", " 'chain': 'B',\n", " 'ssseq': 12,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LYS',\n", " 'chain': 'B',\n", " 'ssseq': 13,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'VAL',\n", " 'chain': 'B',\n", " 'ssseq': 14,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LEU',\n", " 'chain': 'B',\n", " 'ssseq': 15,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASN',\n", " 'chain': 'B',\n", " 'ssseq': 
16,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LYS',\n", " 'chain': 'B',\n", " 'ssseq': 17,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASN',\n", " 'chain': 'B',\n", " 'ssseq': 18,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLN',\n", " 'chain': 'B',\n", " 'ssseq': 19,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LEU',\n", " 'chain': 'B',\n", " 'ssseq': 20,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LEU',\n", " 'chain': 'B',\n", " 'ssseq': 21,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LYS',\n", " 'chain': 'B',\n", " 'ssseq': 22,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ARG',\n", " 'chain': 'B',\n", " 'ssseq': 23,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASN',\n", " 'chain': 'B',\n", " 'ssseq': 24,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ARG',\n", " 'chain': 'B',\n", " 'ssseq': 25,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ILE',\n", " 'chain': 'B',\n", " 'ssseq': 26,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ARG',\n", " 'chain': 'B',\n", " 'ssseq': 27,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASN',\n", " 'chain': 'B',\n", " 'ssseq': 28,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ALA',\n", " 'chain': 'B',\n", " 'ssseq': 29,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ARG',\n", " 'chain': 'B',\n", " 'ssseq': 30,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'SER',\n", " 'chain': 'B',\n", " 'ssseq': 31,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ILE',\n", " 'chain': 'B',\n", " 'ssseq': 32,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ARG',\n", " 'chain': 'B',\n", " 'ssseq': 33,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ALA',\n", " 'chain': 
'B',\n", " 'ssseq': 34,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLU',\n", " 'chain': 'B',\n", " 'ssseq': 35,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ALA',\n", " 'chain': 'B',\n", " 'ssseq': 36,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'VAL',\n", " 'chain': 'B',\n", " 'ssseq': 37,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ALA',\n", " 'chain': 'B',\n", " 'ssseq': 38,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ALA',\n", " 'chain': 'B',\n", " 'ssseq': 39,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'SER',\n", " 'chain': 'B',\n", " 'ssseq': 40,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'SER',\n", " 'chain': 'B',\n", " 'ssseq': 41,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'THR',\n", " 'chain': 'B',\n", " 'ssseq': 42,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LYS',\n", " 'chain': 'B',\n", " 'ssseq': 43,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'THR',\n", " 'chain': 'B',\n", " 'ssseq': 44,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLY',\n", " 'chain': 'B',\n", " 'ssseq': 45,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'THR',\n", " 'chain': 'B',\n", " 'ssseq': 46,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'PRO',\n", " 'chain': 'B',\n", " 'ssseq': 47,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'SER',\n", " 'chain': 'B',\n", " 'ssseq': 48,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASP',\n", " 'chain': 'B',\n", " 'ssseq': 49,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LEU',\n", " 'chain': 'B',\n", " 'ssseq': 50,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'SER',\n", " 'chain': 'B',\n", " 'ssseq': 51,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 
'GLU',\n", " 'chain': 'B',\n", " 'ssseq': 52,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'SER',\n", " 'chain': 'B',\n", " 'ssseq': 53,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLN',\n", " 'chain': 'B',\n", " 'ssseq': 125,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLN',\n", " 'chain': 'B',\n", " 'ssseq': 126,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASP',\n", " 'chain': 'B',\n", " 'ssseq': 127,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'VAL',\n", " 'chain': 'B',\n", " 'ssseq': 128,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LEU',\n", " 'chain': 'B',\n", " 'ssseq': 129,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LYS',\n", " 'chain': 'B',\n", " 'ssseq': 130,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLY',\n", " 'chain': 'B',\n", " 'ssseq': 131,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'SER',\n", " 'chain': 'B',\n", " 'ssseq': 132,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'SER',\n", " 'chain': 'B',\n", " 'ssseq': 133,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ALA',\n", " 'chain': 'B',\n", " 'ssseq': 134,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'SER',\n", " 'chain': 'B',\n", " 'ssseq': 135,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'SER',\n", " 'chain': 'B',\n", " 'ssseq': 136,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ARG',\n", " 'chain': 'B',\n", " 'ssseq': 137,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LYS',\n", " 'chain': 'B',\n", " 'ssseq': 138,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'PRO',\n", " 'chain': 'B',\n", " 'ssseq': 525,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASN',\n", " 'chain': 'B',\n", " 'ssseq': 526,\n", " 'insertion': None},\n", " 
{'model': None,\n", " 'res_name': 'SER',\n", " 'chain': 'B',\n", " 'ssseq': 527,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ILE',\n", " 'chain': 'B',\n", " 'ssseq': 528,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASN',\n", " 'chain': 'B',\n", " 'ssseq': 529,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'SER',\n", " 'chain': 'B',\n", " 'ssseq': 686,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LYS',\n", " 'chain': 'B',\n", " 'ssseq': 687,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LYS',\n", " 'chain': 'B',\n", " 'ssseq': 688,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'THR',\n", " 'chain': 'B',\n", " 'ssseq': 689,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLY',\n", " 'chain': 'B',\n", " 'ssseq': 690,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLN',\n", " 'chain': 'B',\n", " 'ssseq': 691,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'PHE',\n", " 'chain': 'B',\n", " 'ssseq': 692,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASN',\n", " 'chain': 'B',\n", " 'ssseq': 693,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ALA',\n", " 'chain': 'B',\n", " 'ssseq': 694,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLN',\n", " 'chain': 'B',\n", " 'ssseq': 695,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'PRO',\n", " 'chain': 'B',\n", " 'ssseq': 743,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASP',\n", " 'chain': 'B',\n", " 'ssseq': 744,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'SER',\n", " 'chain': 'B',\n", " 'ssseq': 745,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ILE',\n", " 'chain': 'B',\n", " 'ssseq': 746,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'SER',\n", " 'chain': 'B',\n", " 'ssseq': 
747,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'VAL',\n", " 'chain': 'B',\n", " 'ssseq': 748,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASN',\n", " 'chain': 'B',\n", " 'ssseq': 749,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'SER',\n", " 'chain': 'B',\n", " 'ssseq': 750,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'SER',\n", " 'chain': 'B',\n", " 'ssseq': 751,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'MET',\n", " 'chain': 'C',\n", " 'ssseq': 1,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'SER',\n", " 'chain': 'C',\n", " 'ssseq': 2,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLY',\n", " 'chain': 'C',\n", " 'ssseq': 3,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'SER',\n", " 'chain': 'C',\n", " 'ssseq': 4,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LEU',\n", " 'chain': 'C',\n", " 'ssseq': 5,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LYS',\n", " 'chain': 'C',\n", " 'ssseq': 6,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'SER',\n", " 'chain': 'C',\n", " 'ssseq': 7,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LEU',\n", " 'chain': 'C',\n", " 'ssseq': 8,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASP',\n", " 'chain': 'C',\n", " 'ssseq': 9,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LYS',\n", " 'chain': 'C',\n", " 'ssseq': 10,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LYS',\n", " 'chain': 'C',\n", " 'ssseq': 11,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ILE',\n", " 'chain': 'C',\n", " 'ssseq': 12,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ALA',\n", " 'chain': 'C',\n", " 'ssseq': 13,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'VAL',\n", " 'chain': 
'C',\n", " 'ssseq': 189,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'SER',\n", " 'chain': 'C',\n", " 'ssseq': 190,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LYS',\n", " 'chain': 'C',\n", " 'ssseq': 191,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LYS',\n", " 'chain': 'C',\n", " 'ssseq': 192,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ARG',\n", " 'chain': 'C',\n", " 'ssseq': 193,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLN',\n", " 'chain': 'C',\n", " 'ssseq': 194,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LYS',\n", " 'chain': 'C',\n", " 'ssseq': 195,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASN',\n", " 'chain': 'D',\n", " 'ssseq': 25,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLU',\n", " 'chain': 'D',\n", " 'ssseq': 26,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASN',\n", " 'chain': 'D',\n", " 'ssseq': 27,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ARG',\n", " 'chain': 'D',\n", " 'ssseq': 28,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'PHE',\n", " 'chain': 'D',\n", " 'ssseq': 29,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLN',\n", " 'chain': 'D',\n", " 'ssseq': 30,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASP',\n", " 'chain': 'D',\n", " 'ssseq': 31,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'THR',\n", " 'chain': 'D',\n", " 'ssseq': 32,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LEU',\n", " 'chain': 'D',\n", " 'ssseq': 33,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LEU',\n", " 'chain': 'D',\n", " 'ssseq': 34,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LEU',\n", " 'chain': 'D',\n", " 'ssseq': 35,\n", " 'insertion': None},\n", " {'model': None,\n", " 
'res_name': 'LEU',\n", " 'chain': 'D',\n", " 'ssseq': 36,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'PRO',\n", " 'chain': 'D',\n", " 'ssseq': 37,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'THR',\n", " 'chain': 'D',\n", " 'ssseq': 38,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASP',\n", " 'chain': 'D',\n", " 'ssseq': 39,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLY',\n", " 'chain': 'D',\n", " 'ssseq': 40,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLY',\n", " 'chain': 'D',\n", " 'ssseq': 41,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LEU',\n", " 'chain': 'D',\n", " 'ssseq': 42,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'THR',\n", " 'chain': 'D',\n", " 'ssseq': 43,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'SER',\n", " 'chain': 'D',\n", " 'ssseq': 44,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ARG',\n", " 'chain': 'D',\n", " 'ssseq': 45,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LEU',\n", " 'chain': 'D',\n", " 'ssseq': 46,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLN',\n", " 'chain': 'D',\n", " 'ssseq': 47,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ARG',\n", " 'chain': 'D',\n", " 'ssseq': 48,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLN',\n", " 'chain': 'D',\n", " 'ssseq': 49,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLN',\n", " 'chain': 'D',\n", " 'ssseq': 50,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ARG',\n", " 'chain': 'D',\n", " 'ssseq': 51,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LYS',\n", " 'chain': 'D',\n", " 'ssseq': 52,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'SER',\n", " 'chain': 'D',\n", " 'ssseq': 53,\n", " 'insertion': None},\n", " 
{'model': None,\n", " 'res_name': 'LYS',\n", " 'chain': 'D',\n", " 'ssseq': 54,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LEU',\n", " 'chain': 'D',\n", " 'ssseq': 55,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASN',\n", " 'chain': 'D',\n", " 'ssseq': 56,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LEU',\n", " 'chain': 'D',\n", " 'ssseq': 57,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASP',\n", " 'chain': 'D',\n", " 'ssseq': 58,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASN',\n", " 'chain': 'D',\n", " 'ssseq': 59,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LEU',\n", " 'chain': 'D',\n", " 'ssseq': 60,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLN',\n", " 'chain': 'D',\n", " 'ssseq': 61,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LYS',\n", " 'chain': 'D',\n", " 'ssseq': 62,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'VAL',\n", " 'chain': 'D',\n", " 'ssseq': 63,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'SER',\n", " 'chain': 'D',\n", " 'ssseq': 64,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLN',\n", " 'chain': 'D',\n", " 'ssseq': 65,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LEU',\n", " 'chain': 'D',\n", " 'ssseq': 66,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLU',\n", " 'chain': 'D',\n", " 'ssseq': 67,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'SER',\n", " 'chain': 'D',\n", " 'ssseq': 68,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ALA',\n", " 'chain': 'D',\n", " 'ssseq': 69,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASP',\n", " 'chain': 'D',\n", " 'ssseq': 70,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LYS',\n", " 'chain': 'D',\n", " 'ssseq': 71,\n", " 
'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLN',\n", " 'chain': 'D',\n", " 'ssseq': 72,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LEU',\n", " 'chain': 'D',\n", " 'ssseq': 73,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLU',\n", " 'chain': 'D',\n", " 'ssseq': 74,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LYS',\n", " 'chain': 'D',\n", " 'ssseq': 75,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ARG',\n", " 'chain': 'D',\n", " 'ssseq': 76,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'MET',\n", " 'chain': 'E',\n", " 'ssseq': 1,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'HIS',\n", " 'chain': 'E',\n", " 'ssseq': 148,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LEU',\n", " 'chain': 'E',\n", " 'ssseq': 149,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'SER',\n", " 'chain': 'E',\n", " 'ssseq': 150,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASP',\n", " 'chain': 'E',\n", " 'ssseq': 151,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASN',\n", " 'chain': 'E',\n", " 'ssseq': 152,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASP',\n", " 'chain': 'E',\n", " 'ssseq': 153,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'PHE',\n", " 'chain': 'E',\n", " 'ssseq': 154,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ILE',\n", " 'chain': 'E',\n", " 'ssseq': 155,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ILE',\n", " 'chain': 'E',\n", " 'ssseq': 156,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASN',\n", " 'chain': 'E',\n", " 'ssseq': 157,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASP',\n", " 'chain': 'E',\n", " 'ssseq': 158,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'PHE',\n", " 'chain': 
'E',\n", " 'ssseq': 159,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LYS',\n", " 'chain': 'E',\n", " 'ssseq': 160,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LYS',\n", " 'chain': 'E',\n", " 'ssseq': 161,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ILE',\n", " 'chain': 'E',\n", " 'ssseq': 162,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLY',\n", " 'chain': 'E',\n", " 'ssseq': 163,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ARG',\n", " 'chain': 'E',\n", " 'ssseq': 164,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLU',\n", " 'chain': 'E',\n", " 'ssseq': 165,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASN',\n", " 'chain': 'E',\n", " 'ssseq': 166,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLU',\n", " 'chain': 'E',\n", " 'ssseq': 167,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASN',\n", " 'chain': 'E',\n", " 'ssseq': 168,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLU',\n", " 'chain': 'E',\n", " 'ssseq': 169,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASN',\n", " 'chain': 'E',\n", " 'ssseq': 170,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLU',\n", " 'chain': 'E',\n", " 'ssseq': 171,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASP',\n", " 'chain': 'E',\n", " 'ssseq': 172,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASP',\n", " 'chain': 'E',\n", " 'ssseq': 173,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'MET',\n", " 'chain': 'F',\n", " 'ssseq': 1,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'MET',\n", " 'chain': 'G',\n", " 'ssseq': 1,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ALA',\n", " 'chain': 'G',\n", " 'ssseq': 2,\n", " 'insertion': None},\n", " {'model': None,\n", " 
'res_name': 'LEU',\n", " 'chain': 'G',\n", " 'ssseq': 3,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LYS',\n", " 'chain': 'G',\n", " 'ssseq': 4,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LYS',\n", " 'chain': 'G',\n", " 'ssseq': 5,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASN',\n", " 'chain': 'G',\n", " 'ssseq': 6,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'THR',\n", " 'chain': 'G',\n", " 'ssseq': 7,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'HIS',\n", " 'chain': 'G',\n", " 'ssseq': 8,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASN',\n", " 'chain': 'G',\n", " 'ssseq': 9,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LYS',\n", " 'chain': 'G',\n", " 'ssseq': 10,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'SER',\n", " 'chain': 'G',\n", " 'ssseq': 11,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'THR',\n", " 'chain': 'G',\n", " 'ssseq': 12,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLN',\n", " 'chain': 'G',\n", " 'ssseq': 108,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ALA',\n", " 'chain': 'G',\n", " 'ssseq': 109,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASP',\n", " 'chain': 'G',\n", " 'ssseq': 110,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ILE',\n", " 'chain': 'G',\n", " 'ssseq': 111,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASP',\n", " 'chain': 'G',\n", " 'ssseq': 112,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'MET',\n", " 'chain': 'G',\n", " 'ssseq': 113,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLU',\n", " 'chain': 'G',\n", " 'ssseq': 114,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'MET',\n", " 'chain': 'H',\n", " 'ssseq': 1,\n", " 'insertion': None},\n", " 
{'model': None,\n", " 'res_name': 'GLY',\n", " 'chain': 'H',\n", " 'ssseq': 2,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'HIS',\n", " 'chain': 'I',\n", " 'ssseq': 243,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LYS',\n", " 'chain': 'I',\n", " 'ssseq': 244,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLN',\n", " 'chain': 'I',\n", " 'ssseq': 245,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'THR',\n", " 'chain': 'I',\n", " 'ssseq': 246,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ILE',\n", " 'chain': 'I',\n", " 'ssseq': 247,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'VAL',\n", " 'chain': 'I',\n", " 'ssseq': 248,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'THR',\n", " 'chain': 'I',\n", " 'ssseq': 249,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLY',\n", " 'chain': 'I',\n", " 'ssseq': 250,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLY',\n", " 'chain': 'I',\n", " 'ssseq': 251,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLY',\n", " 'chain': 'I',\n", " 'ssseq': 252,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'SER',\n", " 'chain': 'I',\n", " 'ssseq': 253,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLY',\n", " 'chain': 'I',\n", " 'ssseq': 254,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASN',\n", " 'chain': 'I',\n", " 'ssseq': 255,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLY',\n", " 'chain': 'I',\n", " 'ssseq': 256,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASP',\n", " 'chain': 'I',\n", " 'ssseq': 257,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASP',\n", " 'chain': 'I',\n", " 'ssseq': 258,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'VAL',\n", " 'chain': 'I',\n", " 'ssseq': 
259,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'VAL',\n", " 'chain': 'I',\n", " 'ssseq': 260,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASN',\n", " 'chain': 'I',\n", " 'ssseq': 261,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASP',\n", " 'chain': 'I',\n", " 'ssseq': 262,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'VAL',\n", " 'chain': 'I',\n", " 'ssseq': 263,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLN',\n", " 'chain': 'I',\n", " 'ssseq': 264,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLY',\n", " 'chain': 'I',\n", " 'ssseq': 265,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ILE',\n", " 'chain': 'I',\n", " 'ssseq': 266,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASP',\n", " 'chain': 'I',\n", " 'ssseq': 267,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASP',\n", " 'chain': 'I',\n", " 'ssseq': 268,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'VAL',\n", " 'chain': 'I',\n", " 'ssseq': 269,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLN',\n", " 'chain': 'I',\n", " 'ssseq': 270,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'THR',\n", " 'chain': 'I',\n", " 'ssseq': 271,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ILE',\n", " 'chain': 'I',\n", " 'ssseq': 272,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LYS',\n", " 'chain': 'I',\n", " 'ssseq': 273,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'VAL',\n", " 'chain': 'I',\n", " 'ssseq': 274,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'VAL',\n", " 'chain': 'I',\n", " 'ssseq': 275,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LYS',\n", " 'chain': 'I',\n", " 'ssseq': 276,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 
'ARG',\n", " 'chain': 'I',\n", " 'ssseq': 277,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'SER',\n", " 'chain': 'I',\n", " 'ssseq': 278,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'MET',\n", " 'chain': 'I',\n", " 'ssseq': 279,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASP',\n", " 'chain': 'I',\n", " 'ssseq': 280,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ALA',\n", " 'chain': 'I',\n", " 'ssseq': 281,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLU',\n", " 'chain': 'I',\n", " 'ssseq': 282,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLN',\n", " 'chain': 'I',\n", " 'ssseq': 283,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LEU',\n", " 'chain': 'I',\n", " 'ssseq': 284,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLY',\n", " 'chain': 'I',\n", " 'ssseq': 285,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'HIS',\n", " 'chain': 'I',\n", " 'ssseq': 286,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ALA',\n", " 'chain': 'I',\n", " 'ssseq': 287,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'SER',\n", " 'chain': 'I',\n", " 'ssseq': 288,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LYS',\n", " 'chain': 'I',\n", " 'ssseq': 289,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ARG',\n", " 'chain': 'I',\n", " 'ssseq': 290,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'HIS',\n", " 'chain': 'I',\n", " 'ssseq': 291,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LYS',\n", " 'chain': 'I',\n", " 'ssseq': 292,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'PRO',\n", " 'chain': 'I',\n", " 'ssseq': 293,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'MET',\n", " 'chain': 'K',\n", " 'ssseq': 1,\n", " 'insertion': None},\n", " 
{'model': None,\n", " 'res_name': 'GLY',\n", " 'chain': 'K',\n", " 'ssseq': 2,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LYS',\n", " 'chain': 'K',\n", " 'ssseq': 3,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LYS',\n", " 'chain': 'K',\n", " 'ssseq': 4,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ALA',\n", " 'chain': 'K',\n", " 'ssseq': 5,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'HIS',\n", " 'chain': 'K',\n", " 'ssseq': 6,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLY',\n", " 'chain': 'K',\n", " 'ssseq': 7,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLY',\n", " 'chain': 'K',\n", " 'ssseq': 8,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LYS',\n", " 'chain': 'K',\n", " 'ssseq': 9,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'MET',\n", " 'chain': 'K',\n", " 'ssseq': 10,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'LYS',\n", " 'chain': 'K',\n", " 'ssseq': 11,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'PRO',\n", " 'chain': 'K',\n", " 'ssseq': 12,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLU',\n", " 'chain': 'K',\n", " 'ssseq': 13,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ILE',\n", " 'chain': 'K',\n", " 'ssseq': 14,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'ASP',\n", " 'chain': 'K',\n", " 'ssseq': 15,\n", " 'insertion': None},\n", " {'model': None,\n", " 'res_name': 'GLU',\n", " 'chain': 'K',\n", " 'ssseq': 16,\n", " 'insertion': None}]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from Bio.PDB import *\n", "h =parse_pdb_header('6AGB.pdb')\n", "h['missing_residues']" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Missing 
from chain 'G':\n", "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 108, 109, 110, 111, 112, 113, 114]\n", "\n", "\n", "\n", "F [1]\n", "G [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 108, 109, 110, 111, 112, 113, 114]\n" ] } ], "source": [ "# Missing residue positions for specific chains\n", "from Bio.PDB import *\n", "from collections import defaultdict\n", "h = parse_pdb_header('6AGB.pdb')\n", "#parse per chain\n", "chains_of_interest = [\"F\",\"G\"]\n", "# make a dictionary for each chain of interest with value of a list. The list will be the list of residues later\n", "missing_per_chain = defaultdict(list)\n", "# go through missing residues and populate each chain's list\n", "for residue in h['missing_residues']:\n", "    if residue[\"chain\"] in chains_of_interest:\n", "        missing_per_chain[residue[\"chain\"]].append(residue[\"ssseq\"])\n", "#print(missing_per_chain)\n", "\n", "\n", "\n", "print('')\n", "print(\"Missing from chain 'G':\\n{}\".format(missing_per_chain['G']))\n", "print('\\n\\n')\n", "for chain in missing_per_chain:\n", "    print(chain,missing_per_chain[chain])" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "defaultdict(<class 'list'>, {'B': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 525, 526, 527, 528, 529, 686, 687, 688, 689, 690, 691, 692, 693, 694, 695, 743, 744, 745, 746, 747, 748, 749, 750, 751], 'C': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 189, 190, 191, 192, 193, 194, 195], 'D': [25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76], 'E': [1, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 
160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173], 'F': [1], 'G': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 108, 109, 110, 111, 112, 113, 114], 'H': [1, 2], 'I': [243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293], 'K': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]})\n", "\n", "Missing from chain 'K':\n", "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]\n" ] } ], "source": [ "# Missing residue positions for ALL chains\n", "from Bio.PDB import *\n", "from collections import defaultdict\n", "# extract information on chains in structure\n", "structure = PDBParser().get_structure('6AGB', '6AGB.pdb')\n", "chains = [each.id for each in structure.get_chains()]\n", "\n", "h =parse_pdb_header('6AGB.pdb')\n", "\n", "# make a dictionary for each chain of interest with value of a list. 
The list will be the list of residue positions later\n", "missing_per_chain = defaultdict(list)\n", "# go through missing residues and populate each chain's list\n", "for residue in h['missing_residues']:\n", "    if residue[\"chain\"] in chains:\n", "        missing_per_chain[residue[\"chain\"]].append(residue[\"ssseq\"])\n", "print(missing_per_chain)\n", "\n", "\n", "\n", "print('')\n", "print(\"Missing from chain 'K':\\n{}\".format(missing_per_chain['K']))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "\n", "#### Compare two structures" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Missing from chains in AGH3:\n", "defaultdict(<class 'list'>, {'B': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 525, 526, 527, 528, 529, 686, 687, 688, 689, 690, 691, 692, 693, 694, 695, 743, 744, 745, 746, 747, 748, 749, 750, 751], 'C': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 189, 190, 191, 192, 193, 194, 195], 'D': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76], 'E': [1, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173], 'F': [1], 'G': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 108, 109, 110, 111, 112, 113, 114], 'H': [1, 2], 'I': [243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 
286, 287, 288, 289, 290, 291, 292, 293], 'K': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]})\n", "\n", "Missing from chain 'K':\n", "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]\n", "\n", "Same residues missing for chains shared by 6AGB and 6AH3?:\n", "False\n", "\n", "Chain by chain accounting of whether missing same residues between 6AGB and 6AH3?:\n", "A\n", "True\n", "B\n", "True\n", "C\n", "True\n", "D\n", "False\n", "E\n", "True\n", "F\n", "True\n", "G\n", "True\n", "H\n", "True\n", "I\n", "True\n", "J\n", "True\n", "K\n", "True\n", "\n", "\n", "Further details on those chains where not the same residues missing:\n", "chain 'D' in 6AH3 has more missing residues than 6AGB:\n", "True\n", "\n", "total # residues missing in chain 'D' of 6AH3: 76\n", "total # residues missing in chain 'D' of 6AGB: 52\n", "\n" ] } ], "source": [ "# Does 6AH3 have any residues missing that 6AGB doesn't, for chains shared between them?\n", "from Bio.PDB import *\n", "from collections import defaultdict\n", "# extract information on chains in structure\n", "structure = PDBParser().get_structure('6AGB', '6AGB.pdb') # USING CHAIN LISTING FROM THAT BECAUSE ONLY CARE ABOUT ONES SHARED\n", "chains = [each.id for each in structure.get_chains()]\n", "\n", "h =parse_pdb_header('6AH3.pdb')\n", "\n", "# make a dictionary for each chain of interest with value of a list. 
The list will be the list of residue positions later\n", "missing_per_chainh3 = defaultdict(list)\n", "# go through missing residues and populate each chain's list\n", "for residue in h['missing_residues']:\n", "    if residue[\"chain\"] in chains:\n", "        missing_per_chainh3[residue[\"chain\"]].append(residue[\"ssseq\"])\n", "print(\"Missing from chains in AGH3:\\n{}\".format(missing_per_chainh3))\n", "\n", "\n", "\n", "print('')\n", "print(\"Missing from chain 'K':\\n{}\".format(missing_per_chainh3['K']))\n", "\n", "print ('')\n", "same_result = missing_per_chainh3 == missing_per_chain\n", "print(\"Same residues missing for chains shared by 6AGB and 6AH3?:\\n{}\".format(same_result))\n", "print ('')\n", "print(\"Chain by chain accounting of whether missing same residues between 6AGB and 6AH3?:\")\n", "chains_that_differ = []\n", "for chain in chains:\n", "    print(chain)\n", "    print (missing_per_chainh3[chain] == missing_per_chain[chain])\n", "    if missing_per_chainh3[chain] != missing_per_chain[chain]:\n", "        chains_that_differ.append(chain)\n", "print(\"\\n\\nFurther details on those chains where not the same residues missing:\")\n", "for chain in chains_that_differ:\n", "    print(\"chain '{}' in 6AH3 has more missing residues than 6AGB:\\n{}\".format(chain,len(missing_per_chainh3[chain]) > len(missing_per_chain[chain]) ))\n", "    print(\"\\ntotal # residues missing in chain '{}' of 6AH3: {}\".format(chain,len(missing_per_chainh3[chain]) ))\n", "    print(\"total # residues missing in chain '{}' of 6AGB: {}\\n\".format(chain,len(missing_per_chain[chain]) ))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Chain D is the only one with differences. The one in 6AGB (without the substrate) has more residues total, as it has fewer missing. (**Note that depending on how the gaps are distributed, the chain with the most residues may not have more information than the other for a region you are particularly interested in. 
It is just a rough metric and you should look into details. The 'Protein' tab at [PDBsum](https://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl?pdbcode=index.html) is particularly useful for comparing missing residues in specific regions of proteins in structures.**)\n", "\n", "Python's set math can be used to look into some of the specific missing residues. Here that is done to provide some insight into the parts to examine for Chain D:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "total missing: 52\n", "total missing: 76\n", "These are missing in 6AH3 Chain D but present in 6AGB: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24}\n", "These are present in 6AH3 Chain D but missing in 6AGB: set()\n" ] } ], "source": [ "# How does Chain D compare specifically:\n", "print(\"total missing:\",len(missing_per_chain['D']))\n", "print(\"total missing:\",len(missing_per_chainh3['D']))\n", "A= set(missing_per_chainh3['D'])\n", "B= set(missing_per_chain['D']) \n", "print(\"These are missing in 6AH3 Chain D but present in 6AGB:\",A-B) # set math based on https://stackoverflow.com/a/1830195/8508004\n", "print(\"These are present in 6AH3 Chain D but missing in 6AGB:\",B-A) # set math based on https://stackoverflow.com/a/1830195/8508004" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that there are none present in 6AH3 Chain D but missing in 6AGB and so the result is an empty set for this case." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---- " ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.8" } }, "nbformat": 4, "nbformat_minor": 4 }