{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# myChEMBL BLAST Tutorial\n", "\n", "### myChEMBL team, ChEMBL Group, EMBL-EBI.\n", "\n", "This notebook is intended to illustrate the following:\n", "\n", "* How to run a BLAST search and parse the results\n", "* How to create a basic druggability score and link this score to a BLAST search\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How to run a BLAST search and parse the results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### What is a BLAST search?\n", "\n", "BLAST, the Basic Local Alignment Search Tool, allows a user to search with either a protein (amino acid based) or nucleotide (DNA or RNA based) sequence and find statistically significant similar sequences within a BLAST sequence database. The finer details of how BLAST works can be reviewed at the [NCBI website](http://blast.ncbi.nlm.nih.gov/Blast.cgi) or on [Wikipedia](http://en.wikipedia.org/wiki/BLAST).\n", "\n", "### Does ChEMBL provide any BLAST services?\n", "\n", "Yes. The website allows a user to run a BLAST search against the [protein target sequences](https://www.ebi.ac.uk/chembl/target) or the [biotherapeutic sequences](https://www.ebi.ac.uk/chembl/). Users can also download all of the 'sequence databases' from the [FTP site](ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/latest/) (look for the .fa.gz downloads) and run BLAST searches locally. Currently the ChEMBL Web Services do not provide this functionality, but this will be added in the future.\n", "\n", "### How do I run a BLAST search?\n", "\n", "If you want to search ChEMBL, you could use the ChEMBL web interface. Alternatively, you could run the search locally by working through the following steps:\n", "\n", "1. Download the BLAST software from NCBI\n", "2. Download or create the sequence database you wish to search against\n", "3. Format the sequence database\n", "4. 
Run the search\n", "\n", "As you can see, it is not too complicated, and steps 1-3 have already been carried out on this version of myChEMBL. Before we run a search, we will first set up some parameters:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import re\n", "\n", "# Input parameters\n", "blast_exe = '/home/chembl/blast/ncbi-blast-2.2.29+/bin/blastp'\n", "query_file = '/tmp/test.fa'\n", "eval_threshold = 0.001\n", "num_descriptions = 5 \n", "num_alignments = 5\n", "database = '/home/chembl/blast/chembl/chembl_21.fa'\n", "\n", "# Output parameters\n", "results_txt = '/tmp/test.out'\n", "results_xml = '/tmp/test.xml'\n", "results_csv = '/tmp/test.csv'\n", "\n", "\n", "# Query sequence used throughout this tutorial\n", "# ** Feel free to edit the protein sequence below **\n", "# ** DO NOT INCLUDE WHITESPACE IN SEQUENCE HEADER LINES ** \n", "# **\n", "query_sequence = '''\n", ">Q96P68_OXGR1_HUMAN\n", "MNEPLDYLANASDFPDYAAAFGNCTDENIPLKMHYLPVIYGIIFLVGFPGNAVVISTYIF\n", "KMRPWKSSTIIMLNLACTDLLYLTSLPFLIHYYASGENWIFGDFMCKFIRFSFHFNLYSS\n", "ILFLTCFSIFRYCVIIHPMSCFSIHKTRCAVVACAVVWIISLVAVIPMTFLITSTNRTNR\n", "SACLDLTSSDELNTIKWYNLILTATTFCLPLVIVTLCYTTIIHTLTHGLQTDSCLKQKAR\n", "RLTILLLLAFYVCFLPFHILRVIRIESRLLSISCSIENQIHEAYIVSRPLAALNTFGNLL\n", "LYVVVSDNFQQAVCSTVRCKVSGNLEQAKKISYSNNP\n", ">Q86XF0_DHFRL1_HUMAN\n", "MFLLLNCIVAVSQNMGIGKNGDLPRPPLRNEFRYFQRMTTTSSVEGKQNLVIMGRKTWFS\n", "IPEKNRPLKDRINLVLSRELKEPPQGAHFLARSLDDALKLTERPELANKVDMIWIVGGSS\n", "VYKEAMNHLGHLKLFVTRIMQDFESDTFFSEIDLEKYKLLPEYPGVLSDVQEGKHIKYKF\n", "EVCEKDD\n", ">Q9UKX5_ITGA11_HUMAN\n", "MDLPRGLVVAWALSLWPGFTDTFNMDTRKPRVIPGSRTAFFGYTVQQHDISGNKWLVVGA\n", "PLETNGYQKTGDVYKCPVIHGNCTKLNLGRVTLSNVSERKDNMRLGLSLATNPKDNSFLA\n", "CSPLWSHECGSSYYTTGMCSRVNSNFRFSKTVAPALQRCQTYMDIVIVLDGSNSIYPWVE\n", "VQHFLINILKKFYIGPGQIQVGVVQYGEDVVHEFHLNDYRSVKDVVEAASHIEQRGGTET\n", "RTAFGIEFARSEAFQKGGRKGAKKVMIVITDGESHDSPDLEKVIQQSERDNVTRYAVAVL\n", 
"GYYNRRGINPETFLNEIKYIASDPDDKHFFNVTDEAALKDIVDALGDRIFSLEGTNKNET\n", "SFGLEMSQTGFSSHVVEDGVLLGAVGAYDWNGAVLKETSAGKVIPLRESYLKEFPEELKN\n", "HGAYLGYTVTSVVSSRQGRVYVAGAPRFNHTGKVILFTMHNNRSLTIHQAMRGQQIGSYF\n", "GSEITSVDIDGDGVTDVLLVGAPMYFNEGRERGKVYVYELRQNLFVYNGTLKDSHSYQNA\n", "RFGSSIASVRDLNQDSYNDVVVGAPLEDNHAGAIYIFHGFRGSILKTPKQRITASELATG\n", "LQYFGCSIHGQLDLNEDGLIDLAVGALGNAVILWSRPVVQINASLHFEPSKINIFHRDCK\n", "RSGRDATCLAAFLCFTPIFLAPHFQTTTVGIRYNATMDERRYTPRAHLDEGGDRFTNRAV\n", "LLSSGQELCERINFHVLDTADYVKPVTFSVEYSLEDPDHGPMLDDGWPTTLRVSVPFWNG\n", "CNEDEHCVPDLVLDARSDLPTAMEYCQRVLRKPAQDCSAYTLSFDTTVFIIESTRQRVAV\n", "EATLENRGENAYSTVLNISQSANLQFASLIQKEDSDGSIECVNEERRLQKQVCNVSYPFF\n", "RAKAKVAFRLDFEFSKSIFLHHLEIELAAGSDSNERDSTKEDNVAPLRFHLKYEADVLFT\n", "RSSSLSHYEVKPNSSLERYDGIGPPFSCIFRIQNLGLFPIHGMMMKITIPIATRSGNRLL\n", "KLRDFLTDEANTSCNIWGNSTEYRPTPVEEDLRRAPQLNHSNSDVVSINCNIRLVPNQEI\n", "NFHLLGNLWLRSLKALKYKSMKIMVNAALQRQFHSPFIFREEDPSRQIVFEISKQEDWQV\n", "PIWIIVGSTLGGLLLLALLVLALWKLGFFRSARRRREPGLDPTPKVLE\n", ">P06804_TNFA_MOUSE\n", "MSTESMIRDVELAEEALPQKMGGFQNSRRCLCLSLFSFLLVAGATTLFCLLNFGVIGPQR\n", "DEKFPNGLPLISSMAQTLTLRSSSQNSSDKPVAHVVANHQVEEQLEWLSQRANALLANGM\n", "DLKDNQLVVPADGLYLVYSQVLFKGQGCPDYVLLTHTVSRFAISYQEKVNLLSAVKSPCP\n", "KDTPEGAELKPWYEPIYLGGVFQLEKGDQLSAEVNLPKYLDFAESGQVYFGVIAL\n", ">P48050_KCNJ4_HUMAN\n", "MHGHSRNGQAHVPRRKRRNRFVKKNGQCNVYFANLSNKSQRYMADIFTTCVDTRWRYMLM\n", "IFSAAFLVSWLFFGLLFWCIAFFHGDLEASPGVPAAGGPAAGGGGAAPVAPKPCIMHVNG\n", "FLGAFLFSVETQTTIGYGFRCVTEECPLAVIAVVVQSIVGCVIDSFMIGTIMAKMARPKK\n", "RAQTLLFSHHAVISVRDGKLCLMWRVGNLRKSHIVEAHVRAQLIKPYMTQEGEYLPLDQR\n", "DLNVGYDIGLDRIFLVSPIIIVHEIDEDSPLYGMGKEELESEDFEIVVILEGMVEATAMT\n", "TQARSSYLASEILWGHRFEPVVFEEKSHYKVDYSRFHKTYEVAGTPCCSARELQESKITV\n", "LPAPPPPPSAFCYENELALMSQEEEEMEEEAAAAAAVAAGLGLEAGSKEEAGIIRMLEFG\n", "SHLDLERMQASLPLDNISYRRESAI\n", ">Q80Z70_SE1L1_RAT\n", "MQVRVRLLLLLCAVLLGSAAASSDEETNQDESLDSKGALPTDGSVKDHTTGKVVLLARDL\n", "LILKDSEVESLLQDEEESSKSQEEVSVTEDISFLDSPNPSSKTYEELKRVRKPVLTAIEG\n", "TAHGEPCHFPFLFLDKEYDECTSDGREDGRLWCATTYDYKTDEKWGFCETEEDAAKRRQM\n", 
"QEAEAIYQSGMKILNGSTRKNQKREAYRYLQKAAGMNHTKALERVSYALLFGDYLTQNIQ\n", "AAKEMFEKLTEEGSPKGQTGLGFLYASGLGVNSSQAKALVYYTFGALGGNLIAHMVLGYR\n", "YWAGIGVLQSCESALTHYRLVANHVASDISLTGGSVVQRIRLPDEVENPGMNSGMLEEDL\n", "IQYYQFLAEKGDVQAQVGLGQLHLHGGRGVEQNHQRAFDYFNLAANAGNSHAMAFLGKMY\n", "SEGSDIVPQSNETALHYFKKAADMGNPVGQSGLGMAYLYGRGVQVNYDLALKYFQKAAEQ\n", "GWVDGQLQLGSMYYNGIGVKRDYKQALKYFNLASQGGHILAFYNLAQMHASGTGVMRSCH\n", "TAVELFKNVCERGRWSERLMTAYNSYKDDDYNAAVVQYLLLAEQGYEVAQSNAAFILDQR\n", "EATIVGENETYPRALLHWNRAASQGYTVARIKLGDYHFYGFGTDVDYETAFIHYRLASEQ\n", "QHSAQAMFNLGYMHEKGLGIKQDIHLAKRFYDMAAEASPDAQVPVFLALCKLGVVYFLQY\n", "IREANIRDLFTQLDMDQLLGPEWDLYLMTIIALLLGTVIAYRQRQHQDIPVPRPPGPRPA\n", "PPQQEGPPEQQPPQ\n", ">P33277_GAP1_SCHPO\n", "MTKRHSGTLSSSVLPQTNRLSLLRNRESTSVLYTIDLDMESDVEDAFFHLDRELHDLKQQ\n", "ISSQSKQNFVLERDVRYLDSKIALLIQNRMAQEEQHEFAKRLNDNYNAVKGSFPDDRKLQ\n", "LYGALFFLLQSEPAYIASLVRRVKLFNMDALLQIVMFNIYGNQYESREEHLLLSLFQMVL\n", "TTEFEATSDVLSLLRANTPVSRMLTTYTRRGPGQAYLRSILYQCINDVAIHPDLQLDIHP\n", "LSVYRYLVNTGQLSPSEDDNLLTNEEVSEFPAVKNAIQERSAQLLLLTKRFLDAVLNSID\n", "EIPYGIRWVCKLIRNLTNRLFPSISDSTICSLIGGFFFLRFVNPAIISPQTSMLLDSCPS\n", "DNVRKTLATIAKIIQSVANGTSSTKTHLDVSFQPMLKEYEEKVHNLLRKLGNVGDFFEAL\n", "ELDQYIALSKKSLALEMTVNEIYLTHEIILENLDNLYDPDSHVHLILQELGEPCKSVPQE\n", "DNCLVTLPLYNRWDSSIPDLKQNLKVTREDILYVDAKTLFIQLLRLLPSGHPATRVPLDL\n", "PLIADSVSSLKSMSLMKKGIRAIELLDELSTLRLVDKENRYEPLTSEVEKEFIDLDALYE\n", "RIRAERDALQDVHRAICDHNEYLQTQLQIYGSYLNNARSQIKPSHSDSKGFSRGVGVVGI\n", "KPKNIKSSNTVKLSSQQLKKESVLLNCTIPEFNVSNTYFTFSSPSTDNFVIAVYQRGHSK\n", "VLVEVCICLDDVLQRRYASNPVVDLGFLTFEANKLYHLFEQLFLRK\n", ">Q96PD4_IL17F_HUMAN\n", "MTVKTLHGPAMVKYLLLSILGLAFLSEAAARKIPKVGHTFFQKPESCPPVPGGSMKLDIG\n", "IINENQRVSMSRNIESRSTSPWNYTVTWDPNRYPSEVVQAQCRNLGCINAQGKEDISMNS\n", "VPIQQETLVVRRKHQGCSVSFQLEKVLVTVGCTCVTPVIHHVQ\n", ">P10144_GRAB_HUMAN\n", "MQPILLLLAFLLLPRADAGEIIGGHEAKPHSRPYMAYLMIWDQKSLKRCGGFLIRDDFVL\n", "TAAHCWGSSINVTLGAHNIKEQEPTQQFIPVKRPIPHPAYNPKNFSNDIMLLQLERKAKR\n", "TRAVQPLRLPSNKAQVKPGQTCSVAGWGQTAPLGKHSHTLQEVKMTVQEDRKCESDLRHY\n", 
"YDSTIELCVGDPEIKKTSFKGDSGGPLVCNKVAQGIVSYGRNNGMPPRACTKVSSFVHWI\n", "KKTMKRY\n", "'''\n", "\n", "# We will use the query sequence lengths later - just store these for now\n", "query_sequence_details = {}\n", "query_sequence_order = [];\n", "\n", "seq_counter = 0\n", "for seq in query_sequence.split('>'):\n", " seq = seq.strip(' \\n\\t')\n", " if(len(seq) == 0):\n", " continue\n", " \n", " seq_header = seq.split('\\n')[0].strip()\n", " seq_length = len(''.join(seq.split('\\n')[1:]))\n", " seq_counter = seq_counter+1\n", " query_sequence_details[seq_header] = {}\n", " query_sequence_details[seq_header]['seq_length'] = seq_length\n", " query_sequence_order.append(seq_header)\n", " \n", "# Write test query sequence above to query_file location\n", "text_file = open(query_file, \"w\")\n", "text_file.write(query_sequence)\n", "text_file.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have defined some query parameters we can run a BLAST search. The query below will execute the BLAST search and the raw BLAST output will be printed afterwards (Note, the -num_descriptions and -num_alignments arguments are used to limit the size of the output in the this online notebook)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "BLASTP 2.2.29+\n", "\n", "\n", "Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A.\n", "Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.\n", "Lipman (1997), \"Gapped BLAST and PSI-BLAST: a new generation of\n", "protein database search programs\", Nucleic Acids Res. 25:3389-3402.\n", "\n", "\n", "Reference for composition-based statistics: Alejandro A. Schaffer,\n", "L. Aravind, Thomas L. Madden, Sergei Shavirin, John L. Spouge, Yuri\n", "I. Wolf, Eugene V. Koonin, and Stephen F. 
Altschul (2001),\n", "\"Improving the accuracy of PSI-BLAST protein database searches with\n", "composition-based statistics and other refinements\", Nucleic Acids\n", "Res. 29:2994-3005.\n", "\n", "\n", "\n", "Database: chembl_21.fa\n", " 8,834 sequences; 5,161,060 total letters\n", "\n", "\n", "\n", "Query= Q96P68_OXGR1_HUMAN\n", "\n", "Length=337\n", " Score E\n", "Sequences producing significant alignments: (Bits) Value\n", "\n", " CHEMBL2150840 [Q96P68] 2-oxoglutarate receptor 1 (Homo sapiens) 688 0.0 \n", " CHEMBL2325 [Q6Y1R5] 2-oxoglutarate receptor 1 (Rattus norvegicus) 579 0.0 \n", " CHEMBL4315 [P47900] P2Y purinoceptor 1 (Homo sapiens) 216 9e-67\n", " CHEMBL5720 [P49652] P2Y purinoceptor 1 (Meleagris gallopavo) 215 1e-66\n", " CHEMBL2497 [P49651] P2Y purinoceptor 1 (Rattus norvegicus) 215 3e-66\n", "\n", "\n", "> CHEMBL2150840 [Q96P68] 2-oxoglutarate receptor 1 (Homo sapiens)\n", "Length=337\n", "\n", " Score = 688 bits (1775), Expect = 0.0, Method: Compositional matrix adjust.\n", " Identities = 337/337 (100%), Positives = 337/337 (100%), Gaps = 0/337 (0%)\n", "\n", "Query 1 MNEPLDYLANASDFPDYAAAFGNCTDENIPLKMHYLPVIYGIIFLVGFPGNAVVISTYIF 60\n", " MNEPLDYLANASDFPDYAAAFGNCTDENIPLKMHYLPVIYGIIFLVGFPGNAVVISTYIF\n", "Sbjct 1 MNEPLDYLANASDFPDYAAAFGNCTDENIPLKMHYLPVIYGIIFLVGFPGNAVVISTYIF 60\n", "\n", "Query 61 KMRPWKSSTIIMLNLACTDLLYLTSLPFLIHYYASGENWIFGDFMCKFIRFSFHFNLYSS 120\n", " KMRPWKSSTIIMLNLACTDLLYLTSLPFLIHYYASGENWIFGDFMCKFIRFSFHFNLYSS\n", "Sbjct 61 KMRPWKSSTIIMLNLACTDLLYLTSLPFLIHYYASGENWIFGDFMCKFIRFSFHFNLYSS 120\n", "\n", "Query 121 ILFLTCFSIFRYCVIIHPMSCFSIHKTRCAVVACAVVWIISLVAVIPMTFLITSTNRTNR 180\n", " ILFLTCFSIFRYCVIIHPMSCFSIHKTRCAVVACAVVWIISLVAVIPMTFLITSTNRTNR\n", "Sbjct 121 ILFLTCFSIFRYCVIIHPMSCFSIHKTRCAVVACAVVWIISLVAVIPMTFLITSTNRTNR 180\n", "\n", "Query 181 SACLDLTSSDELNTIKWYNLILTATTFCLPLVIVTLCYTTIIHTLTHGLQTDSCLKQKAR 240\n", " SACLDLTSSDELNTIKWYNLILTATTFCLPLVIVTLCYTTIIHTLTHGLQTDSCLKQKAR\n", "Sbjct 181 
SACLDLTSSDELNTIKWYNLILTATTFCLPLVIVTLCYTTIIHTLTHGLQTDSCLKQKAR 240\n", "\n", "Query 241 RLTILLLLAFYVCFLPFHILRVIRIESRLLSISCSIENQIHEAYIVSRPLAALNTFGNLL 300\n", " RLTILLLLAFYVCFLPFHILRVIRIESRLLSISCSIENQIHEAYIVSRPLAALNTFGNLL\n", "Sbjct 241 RLTILLLLAFYVCFLPFHILRVIRIESRLLSISCSIENQIHEAYIVSRPLAALNTFGNLL 300\n", "\n", "Query 301 LYVVVSDNFQQAVCSTVRCKVSGNLEQAKKISYSNNP 337\n", " LYVVVSDNFQQAVCSTVRCKVSGNLEQAKKISYSNNP\n", "Sbjct 301 LYVVVSDNFQQAVCSTVRCKVSGNLEQAKKISYSNNP 337\n", "\n", "\n", "> CHEMBL2325 [Q6Y1R5] 2-oxoglutarate receptor 1 (Rattus norvegicus)\n", "Length=337\n", "\n", " Score = 579 bits (1492), Expect = 0.0, Method: Compositional matrix adjust.\n", " Identities = 289/337 (86%), Positives = 299/337 (89%), Gaps = 0/337 (0%)\n", "\n", "Query 1 MNEPLDYLANASDFPDYAAAFGNCTDENIPLKMHYLPVIYGIIFLVGFPGNAVVISTYIF 60\n", " M E LD AN SDF DY A NCTDE I KM YLPVIY IIFLVGFPGN V IS Y+F\n", "Sbjct 1 MIETLDSPANDSDFLDYITALENCTDEQISFKMQYLPVIYSIIFLVGFPGNTVAISIYVF 60\n", "\n", "Query 61 KMRPWKSSTIIMLNLACTDLLYLTSLPFLIHYYASGENWIFGDFMCKFIRFSFHFNLYSS 120\n", " KMRPWKSSTIIMLNLA TDLLYLTSLPFLIHYYASGENWIFGDFMCKFIRF FHFNLYSS\n", "Sbjct 61 KMRPWKSSTIIMLNLALTDLLYLTSLPFLIHYYASGENWIFGDFMCKFIRFGFHFNLYSS 120\n", "\n", "Query 121 ILFLTCFSIFRYCVIIHPMSCFSIHKTRCAVVACAVVWIISLVAVIPMTFLITSTNRTNR 180\n", " ILFLTCFS+FRY VIIHPMSCFSI KTR AVVACA VW+ISLVAV+PMTFLITST RTNR\n", "Sbjct 121 ILFLTCFSLFRYIVIIHPMSCFSIQKTRWAVVACAGVWVISLVAVMPMTFLITSTTRTNR 180\n", "\n", "Query 181 SACLDLTSSDELNTIKWYNLILTATTFCLPLVIVTLCYTTIIHTLTHGLQTDSCLKQKAR 240\n", " SACLDLTSSD+L TIKWYNLILTATTFCLPL+IVTLCYTTII TLTHG +T SC KQKAR\n", "Sbjct 181 SACLDLTSSDDLTTIKWYNLILTATTFCLPLLIVTLCYTTIISTLTHGPRTHSCFKQKAR 240\n", "\n", "Query 241 RLTILLLLAFYVCFLPFHILRVIRIESRLLSISCSIENQIHEAYIVSRPLAALNTFGNLL 300\n", " RLTILLLL FYVCFLPFHILRVIRIESRLLSISCSIE+ IHEAYIVSRPLAALNTFGNLL\n", "Sbjct 241 RLTILLLLVFYVCFLPFHILRVIRIESRLLSISCSIESHIHEAYIVSRPLAALNTFGNLL 300\n", "\n", "Query 301 LYVVVSDNFQQAVCSTVRCKVSGNLEQAKKISYSNNP 337\n", " LYVVVS+NFQQA CS VRCK G+LEQAKK S SNNP\n", "Sbjct 
301 LYVVVSNNFQQAFCSAVRCKAIGDLEQAKKDSCSNNP 337\n", "\n", "\n", "> CHEMBL4315 [P47900] P2Y purinoceptor 1 (Homo sapiens)\n", "Length=373\n", "\n", " Score = 216 bits (550), Expect = 9e-67, Method: Compositional matrix adjust.\n", " Identities = 108/300 (36%), Positives = 176/300 (59%), Gaps = 4/300 (1%)\n", "\n", "Query 23 NCTDENIPLKMHYLPVIYGIIFLVGFPGNAVVISTYIFKMRPWKSSTIIMLNLACTDLLY 82\n", " C + +YLP +Y ++F++GF GN+V I ++F M+PW ++ M NLA D LY\n", "Sbjct 41 KCALTKTGFQFYYLPAVYILVFIIGFLGNSVAIWMFVFHMKPWSGISVYMFNLALADFLY 100\n", "\n", "Query 83 LTSLPFLIHYYASGENWIFGDFMCKFIRFSFHFNLYSSILFLTCFSIFRYCVIIHPMSCF 142\n", " + +LP LI YY + +WIFGD MCK RF FH NLY SILFLTC S RY +++P+ \n", "Sbjct 101 VLTLPALIFYYFNKTDWIFGDAMCKLQRFIFHVNLYGSILFLTCISAHRYSGVVYPLKSL 160\n", "\n", "Query 143 SIHKTRCAVVACAVVWIISLVAVIPMTFLITSTNRTNRS-ACLDLTSSDELNTIKWYNLI 201\n", " K + A+ +VW+I +VA+ P+ F + R N++ C D TS + L + Y++ \n", "Sbjct 161 GRLKKKNAICISVLVWLIVVVAISPILFYSGTGVRKNKTITCYDTTSDEYLRSYFIYSMC 220\n", "\n", "Query 202 LTATTFCLPLVIVTLCYTTIIHTLTHGLQTDSCLKQKARRLTILLLLAFYVCFLPFHILR 261\n", " T FC+PLV++ CY I+ L + +S L++K+ L I++L F V ++PFH+++\n", "Sbjct 221 TTVAMFCVPLVLILGCYGLIVRALIYKDLDNSPLRRKSIYLVIIVLTVFAVSYIPFHVMK 280\n", "\n", "Query 262 VIRIESRL---LSISCSIENQIHEAYIVSRPLAALNTFGNLLLYVVVSDNFQQAVCSTVR 318\n", " + + +RL C+ ++++ Y V+R LA+LN+ + +LY + D F++ + R\n", "Sbjct 281 TMNLRARLDFQTPAMCAFNDRVYATYQVTRGLASLNSCVDPILYFLAGDTFRRRLSRATR 340\n", "\n", "\n", "> CHEMBL5720 [P49652] P2Y purinoceptor 1 (Meleagris gallopavo)\n", "Length=362\n", "\n", " Score = 215 bits (548), Expect = 1e-66, Method: Compositional matrix adjust.\n", " Identities = 106/300 (35%), Positives = 174/300 (58%), Gaps = 4/300 (1%)\n", "\n", "Query 23 NCTDENIPLKMHYLPVIYGIIFLVGFPGNAVVISTYIFKMRPWKSSTIIMLNLACTDLLY 82\n", " C+ + +YLP +Y ++F+ GF GN+V I ++F MRPW ++ M NLA D LY\n", "Sbjct 30 KCSLTKTGFQFYYLPTVYILVFITGFLGNSVAIWMFVFHMRPWSGISVYMFNLALADFLY 89\n", "\n", "Query 83 LTSLPFLIHYYASGENWIFGDFMCKFIRFSFHFNLYSSILFLTCFSIFRYCVIIHPMSCF 142\n", " + +LP LI YY + 
+WIFGD MCK RF FH NLY SILFLTC S+ RY ++HP+ \n", "Sbjct 90 VLTLPALIFYYFNKTDWIFGDVMCKLQRFIFHVNLYGSILFLTCISVHRYTGVVHPLKSL 149\n", "\n", "Query 143 SIHKTRCAVVACAVVWIISLVAVIPMTFLITSTNRTNRS-ACLDLTSSDELNTIKWYNLI 201\n", " K + AV ++VW + + + P+ F + R N++ C D T+ + L + Y++ \n", "Sbjct 150 GRLKKKNAVYVSSLVWALVVAVIAPILFYSGTGVRRNKTITCYDTTADEYLRSYFVYSMC 209\n", "\n", "Query 202 LTATTFCLPLVIVTLCYTTIIHTLTHGLQTDSCLKQKARRLTILLLLAFYVCFLPFHILR 261\n", " T FC+P +++ CY I+ L + +S L++K+ L I++L F V +LPFH+++\n", "Sbjct 210 TTVFMFCIPFIVILGCYGLIVKALIYKDLDNSPLRRKSIYLVIIVLTVFAVSYLPFHVMK 269\n", "\n", "Query 262 VIRIESRL---LSISCSIENQIHEAYIVSRPLAALNTFGNLLLYVVVSDNFQQAVCSTVR 318\n", " + + +RL C+ ++++ Y V+R LA+LN+ + +LY + D F++ + R\n", "Sbjct 270 TLNLRARLDFQTPQMCAFNDKVYATYQVTRGLASLNSCVDPILYFLAGDTFRRRLSRATR 329\n", "\n", "\n", "> CHEMBL2497 [P49651] P2Y purinoceptor 1 (Rattus norvegicus)\n", "Length=373\n", "\n", " Score = 215 bits (548), Expect = 3e-66, Method: Compositional matrix adjust.\n", " Identities = 108/300 (36%), Positives = 175/300 (58%), Gaps = 4/300 (1%)\n", "\n", "Query 23 NCTDENIPLKMHYLPVIYGIIFLVGFPGNAVVISTYIFKMRPWKSSTIIMLNLACTDLLY 82\n", " C + +YLP +Y ++F++GF GN+V I ++F M+PW ++ M NLA D LY\n", "Sbjct 41 RCALIKTGFQFYYLPAVYILVFIIGFLGNSVAIWMFVFHMKPWSGISVYMFNLALADFLY 100\n", "\n", "Query 83 LTSLPFLIHYYASGENWIFGDFMCKFIRFSFHFNLYSSILFLTCFSIFRYCVIIHPMSCF 142\n", " + +LP LI YY + +WIFGD MCK RF FH NLY SILFLTC S RY +++P+ \n", "Sbjct 101 VLTLPALIFYYFNKTDWIFGDVMCKLQRFIFHVNLYGSILFLTCISAHRYSGVVYPLKSL 160\n", "\n", "Query 143 SIHKTRCAVVACAVVWIISLVAVIPMTFLITSTNRTNRS-ACLDLTSSDELNTIKWYNLI 201\n", " K + A+ +VW+I +VA+ P+ F + R N++ C D TS + L + Y++ \n", "Sbjct 161 GRLKKKNAIYVSVLVWLIVVVAISPILFYSGTGIRKNKTVTCYDSTSDEYLRSYFIYSMC 220\n", "\n", "Query 202 LTATTFCLPLVIVTLCYTTIIHTLTHGLQTDSCLKQKARRLTILLLLAFYVCFLPFHILR 261\n", " T FC+PLV++ CY I+ L + +S L++K+ L I++L F V ++PFH+++\n", "Sbjct 221 TTVAMFCIPLVLILGCYGLIVRALIYKDLDNSPLRRKSIYLVIIVLTVFAVSYIPFHVMK 280\n", "\n", "Query 262 
VIRIESRL---LSISCSIENQIHEAYIVSRPLAALNTFGNLLLYVVVSDNFQQAVCSTVR 318\n", " + + +RL C ++++ Y V+R LA+LN+ + +LY + D F++ + R\n", "Sbjct 281 TMNLRARLDFQTPEMCDFNDRVYATYQVTRGLASLNSCVDPILYFLAGDTFRRRLSRATR 340\n", "\n", "\n", "\n", "Lambda K H a alpha\n", " 0.331 0.141 0.446 0.792 4.96 \n", "\n", "Gapped\n", "Lambda K H a alpha sigma\n", " 0.267 0.0410 0.140 1.90 42.6 43.6 \n", "\n", "Effective search space used: 1045882860\n", "\n", "\n", "Query= Q86XF0_DHFRL1_HUMAN\n", "\n", "Length=187\n", " Score E\n", "Sequences producing significant alignments: (Bits) Value\n", "\n", " CHEMBL202 [P00374] Dihydrofolate reductase (Homo sapiens) 352 9e-125\n", " CHEMBL2363 [Q920D2] Dihydrofolate reductase (Rattus norvegicus) 336 2e-118\n", " CHEMBL2097172 [Q920D2] Dihydrofolate reductase (Rattus norvegicus) 336 2e-118\n", " CHEMBL2097173 [Q920D2] Dihydrofolate reductase (Rattus norvegicus) 336 2e-118\n", " CHEMBL4564 [P00375] Dihydrofolate reductase (Mus musculus) 335 5e-118\n", "\n", "\n", "> CHEMBL202 [P00374] Dihydrofolate reductase (Homo sapiens)\n", "Length=187\n", "\n", " Score = 352 bits (904), Expect = 9e-125, Method: Compositional matrix adjust.\n", " Identities = 171/183 (93%), Positives = 176/183 (96%), Gaps = 0/183 (0%)\n", "\n", "Query 5 LNCIVAVSQNMGIGKNGDLPRPPLRNEFRYFQRMTTTSSVEGKQNLVIMGRKTWFSIPEK 64\n", " LNCIVAVSQNMGIGKNGDLP PPLRNEFRYFQRMTTTSSVEGKQNLVIMG+KTWFSIPEK\n", "Sbjct 5 LNCIVAVSQNMGIGKNGDLPWPPLRNEFRYFQRMTTTSSVEGKQNLVIMGKKTWFSIPEK 64\n", "\n", "Query 65 NRPLKDRINLVLSRELKEPPQGAHFLARSLDDALKLTERPELANKVDMIWIVGGSSVYKE 124\n", " NRPLK RINLVLSRELKEPPQGAHFL+RSLDDALKLTE+PELANKVDM+WIVGGSSVYKE\n", "Sbjct 65 NRPLKGRINLVLSRELKEPPQGAHFLSRSLDDALKLTEQPELANKVDMVWIVGGSSVYKE 124\n", "\n", "Query 125 AMNHLGHLKLFVTRIMQDFESDTFFSEIDLEKYKLLPEYPGVLSDVQEGKHIKYKFEVCE 184\n", " AMNH GHLKLFVTRIMQDFESDTFF EIDLEKYKLLPEYPGVLSDVQE K IKYKFEV E\n", "Sbjct 125 AMNHPGHLKLFVTRIMQDFESDTFFPEIDLEKYKLLPEYPGVLSDVQEEKGIKYKFEVYE 184\n", "\n", "Query 185 KDD 187\n", " K+D\n", "Sbjct 185 KND 187\n", "\n", "\n", "> 
CHEMBL2363 [Q920D2] Dihydrofolate reductase (Rattus norvegicus)\n", "Length=187\n", "\n", " Score = 336 bits (862), Expect = 2e-118, Method: Compositional matrix adjust.\n", " Identities = 161/183 (88%), Positives = 173/183 (95%), Gaps = 0/183 (0%)\n", "\n", "Query 5 LNCIVAVSQNMGIGKNGDLPRPPLRNEFRYFQRMTTTSSVEGKQNLVIMGRKTWFSIPEK 64\n", " LNCIVAVSQNMGIGKNGDLP P LRNEF+YFQRMTTTSSVEGKQNLVIMGRKTWFSIPEK\n", "Sbjct 5 LNCIVAVSQNMGIGKNGDLPWPLLRNEFKYFQRMTTTSSVEGKQNLVIMGRKTWFSIPEK 64\n", "\n", "Query 65 NRPLKDRINLVLSRELKEPPQGAHFLARSLDDALKLTERPELANKVDMIWIVGGSSVYKE 124\n", " NRPLKDRIN+VLSRELKEPPQGAHFLA+SLDDALKL E+PELA+KVDM+W+VGGSSVY+E\n", "Sbjct 65 NRPLKDRINIVLSRELKEPPQGAHFLAKSLDDALKLIEQPELASKVDMVWVVGGSSVYQE 124\n", "\n", "Query 125 AMNHLGHLKLFVTRIMQDFESDTFFSEIDLEKYKLLPEYPGVLSDVQEGKHIKYKFEVCE 184\n", " AMN GHL+LFVTRIMQ+FESDTFF EIDLEKYKLLPEYPGVLS++QE K IKYKFEV E\n", "Sbjct 125 AMNQPGHLRLFVTRIMQEFESDTFFPEIDLEKYKLLPEYPGVLSEIQEEKGIKYKFEVYE 184\n", "\n", "Query 185 KDD 187\n", " K D\n", "Sbjct 185 KKD 187\n", "\n", "\n", "> CHEMBL2097172 [Q920D2] Dihydrofolate reductase (Rattus norvegicus)\n", "Length=187\n", "\n", " Score = 336 bits (862), Expect = 2e-118, Method: Compositional matrix adjust.\n", " Identities = 161/183 (88%), Positives = 173/183 (95%), Gaps = 0/183 (0%)\n", "\n", "Query 5 LNCIVAVSQNMGIGKNGDLPRPPLRNEFRYFQRMTTTSSVEGKQNLVIMGRKTWFSIPEK 64\n", " LNCIVAVSQNMGIGKNGDLP P LRNEF+YFQRMTTTSSVEGKQNLVIMGRKTWFSIPEK\n", "Sbjct 5 LNCIVAVSQNMGIGKNGDLPWPLLRNEFKYFQRMTTTSSVEGKQNLVIMGRKTWFSIPEK 64\n", "\n", "Query 65 NRPLKDRINLVLSRELKEPPQGAHFLARSLDDALKLTERPELANKVDMIWIVGGSSVYKE 124\n", " NRPLKDRIN+VLSRELKEPPQGAHFLA+SLDDALKL E+PELA+KVDM+W+VGGSSVY+E\n", "Sbjct 65 NRPLKDRINIVLSRELKEPPQGAHFLAKSLDDALKLIEQPELASKVDMVWVVGGSSVYQE 124\n", "\n", "Query 125 AMNHLGHLKLFVTRIMQDFESDTFFSEIDLEKYKLLPEYPGVLSDVQEGKHIKYKFEVCE 184\n", " AMN GHL+LFVTRIMQ+FESDTFF EIDLEKYKLLPEYPGVLS++QE K IKYKFEV E\n", "Sbjct 125 AMNQPGHLRLFVTRIMQEFESDTFFPEIDLEKYKLLPEYPGVLSEIQEEKGIKYKFEVYE 184\n", "\n", "Query 185 KDD 187\n", " K 
D\n", "Sbjct 185 KKD 187\n", "\n", "\n", "> CHEMBL2097173 [Q920D2] Dihydrofolate reductase (Rattus norvegicus)\n", "Length=187\n", "\n", " Score = 336 bits (862), Expect = 2e-118, Method: Compositional matrix adjust.\n", " Identities = 161/183 (88%), Positives = 173/183 (95%), Gaps = 0/183 (0%)\n", "\n", "Query 5 LNCIVAVSQNMGIGKNGDLPRPPLRNEFRYFQRMTTTSSVEGKQNLVIMGRKTWFSIPEK 64\n", " LNCIVAVSQNMGIGKNGDLP P LRNEF+YFQRMTTTSSVEGKQNLVIMGRKTWFSIPEK\n", "Sbjct 5 LNCIVAVSQNMGIGKNGDLPWPLLRNEFKYFQRMTTTSSVEGKQNLVIMGRKTWFSIPEK 64\n", "\n", "Query 65 NRPLKDRINLVLSRELKEPPQGAHFLARSLDDALKLTERPELANKVDMIWIVGGSSVYKE 124\n", " NRPLKDRIN+VLSRELKEPPQGAHFLA+SLDDALKL E+PELA+KVDM+W+VGGSSVY+E\n", "Sbjct 65 NRPLKDRINIVLSRELKEPPQGAHFLAKSLDDALKLIEQPELASKVDMVWVVGGSSVYQE 124\n", "\n", "Query 125 AMNHLGHLKLFVTRIMQDFESDTFFSEIDLEKYKLLPEYPGVLSDVQEGKHIKYKFEVCE 184\n", " AMN GHL+LFVTRIMQ+FESDTFF EIDLEKYKLLPEYPGVLS++QE K IKYKFEV E\n", "Sbjct 125 AMNQPGHLRLFVTRIMQEFESDTFFPEIDLEKYKLLPEYPGVLSEIQEEKGIKYKFEVYE 184\n", "\n", "Query 185 KDD 187\n", " K D\n", "Sbjct 185 KKD 187\n", "\n", "\n", "> CHEMBL4564 [P00375] Dihydrofolate reductase (Mus musculus)\n", "Length=187\n", "\n", " Score = 335 bits (859), Expect = 5e-118, Method: Compositional matrix adjust.\n", " Identities = 161/183 (88%), Positives = 173/183 (95%), Gaps = 0/183 (0%)\n", "\n", "Query 5 LNCIVAVSQNMGIGKNGDLPRPPLRNEFRYFQRMTTTSSVEGKQNLVIMGRKTWFSIPEK 64\n", " LNCIVAVSQNMGIGKNGDLP PPLRNEF+YFQRMTTTSSVEGKQNLVIMGRKTWFSIPEK\n", "Sbjct 5 LNCIVAVSQNMGIGKNGDLPWPPLRNEFKYFQRMTTTSSVEGKQNLVIMGRKTWFSIPEK 64\n", "\n", "Query 65 NRPLKDRINLVLSRELKEPPQGAHFLARSLDDALKLTERPELANKVDMIWIVGGSSVYKE 124\n", " NRPLKDRIN+VLSRELKEPP+GAHFLA+SLDDAL+L E+PELA+KVDM+WIVGGSSVY+E\n", "Sbjct 65 NRPLKDRINIVLSRELKEPPRGAHFLAKSLDDALRLIEQPELASKVDMVWIVGGSSVYQE 124\n", "\n", "Query 125 AMNHLGHLKLFVTRIMQDFESDTFFSEIDLEKYKLLPEYPGVLSDVQEGKHIKYKFEVCE 184\n", " AMN GHL+LFVTRIMQ+FESDTFF EIDL KYKLLPEYPGVLS+VQE K IKYKFEV E\n", "Sbjct 125 AMNQPGHLRLFVTRIMQEFESDTFFPEIDLGKYKLLPEYPGVLSEVQEEKGIKYKFEVYE 
184\n", "\n", "Query 185 KDD 187\n", " K D\n", "Sbjct 185 KKD 187\n", "\n", "\n", "\n", "Lambda K H a alpha\n", " 0.320 0.138 0.409 0.792 4.96 \n", "\n", "Gapped\n", "Lambda K H a alpha sigma\n", " 0.267 0.0410 0.140 1.90 42.6 43.6 \n", "\n", "Effective search space used: 433983132\n", "\n", "\n", "Query= Q9UKX5_ITGA11_HUMAN\n", "\n", "Length=1188\n", " Score E\n", "Sequences producing significant alignments: (Bits) Value\n", "\n", " CHEMBL5883 [Q9UKX5] Integrin alpha-11 (Homo sapiens) 2473 0.0 \n", " CHEMBL5882 [O75578] Integrin alpha-10 (Homo sapiens) 919 0.0 \n", " CHEMBL3682 [P56199] Integrin alpha-1 (Homo sapiens) 825 0.0 \n", " CHEMBL3137278 [P56199] Integrin alpha-1 (Homo sapiens) 825 0.0 \n", " CHEMBL4998 [P17301] Integrin alpha-2 (Homo sapiens) 710 0.0 \n", "\n", "\n", "> CHEMBL5883 [Q9UKX5] Integrin alpha-11 (Homo sapiens)\n", "Length=1188\n", "\n", " Score = 2473 bits (6409), Expect = 0.0, Method: Compositional matrix adjust.\n", " Identities = 1188/1188 (100%), Positives = 1188/1188 (100%), Gaps = 0/1188 (0%)\n", "\n", "Query 1 MDLPRGLVVAWALSLWPGFTDTFNMDTRKPRVIPGSRTAFFGYTVQQHDISGNKWLVVGA 60\n", " MDLPRGLVVAWALSLWPGFTDTFNMDTRKPRVIPGSRTAFFGYTVQQHDISGNKWLVVGA\n", "Sbjct 1 MDLPRGLVVAWALSLWPGFTDTFNMDTRKPRVIPGSRTAFFGYTVQQHDISGNKWLVVGA 60\n", "\n", "Query 61 PLETNGYQKTGDVYKCPVIHGNCTKLNLGRVTLSNVSERKDNMRLGLSLATNPKDNSFLA 120\n", " PLETNGYQKTGDVYKCPVIHGNCTKLNLGRVTLSNVSERKDNMRLGLSLATNPKDNSFLA\n", "Sbjct 61 PLETNGYQKTGDVYKCPVIHGNCTKLNLGRVTLSNVSERKDNMRLGLSLATNPKDNSFLA 120\n", "\n", "Query 121 CSPLWSHECGSSYYTTGMCSRVNSNFRFSKTVAPALQRCQTYMDIVIVLDGSNSIYPWVE 180\n", " CSPLWSHECGSSYYTTGMCSRVNSNFRFSKTVAPALQRCQTYMDIVIVLDGSNSIYPWVE\n", "Sbjct 121 CSPLWSHECGSSYYTTGMCSRVNSNFRFSKTVAPALQRCQTYMDIVIVLDGSNSIYPWVE 180\n", "\n", "Query 181 VQHFLINILKKFYIGPGQIQVGVVQYGEDVVHEFHLNDYRSVKDVVEAASHIEQRGGTET 240\n", " VQHFLINILKKFYIGPGQIQVGVVQYGEDVVHEFHLNDYRSVKDVVEAASHIEQRGGTET\n", "Sbjct 181 VQHFLINILKKFYIGPGQIQVGVVQYGEDVVHEFHLNDYRSVKDVVEAASHIEQRGGTET 240\n", "\n", "Query 241 
RTAFGIEFARSEAFQKGGRKGAKKVMIVITDGESHDSPDLEKVIQQSERDNVTRYAVAVL 300\n", " RTAFGIEFARSEAFQKGGRKGAKKVMIVITDGESHDSPDLEKVIQQSERDNVTRYAVAVL\n", "Sbjct 241 RTAFGIEFARSEAFQKGGRKGAKKVMIVITDGESHDSPDLEKVIQQSERDNVTRYAVAVL 300\n", "\n", "Query 301 GYYNRRGINPETFLNEIKYIASDPDDKHFFNVTDEAALKDIVDALGDRIFSLEGTNKNET 360\n", " GYYNRRGINPETFLNEIKYIASDPDDKHFFNVTDEAALKDIVDALGDRIFSLEGTNKNET\n", "Sbjct 301 GYYNRRGINPETFLNEIKYIASDPDDKHFFNVTDEAALKDIVDALGDRIFSLEGTNKNET 360\n", "\n", "Query 361 SFGLEMSQTGFSSHVVEDGVLLGAVGAYDWNGAVLKETSAGKVIPLRESYLKEFPEELKN 420\n", " SFGLEMSQTGFSSHVVEDGVLLGAVGAYDWNGAVLKETSAGKVIPLRESYLKEFPEELKN\n", "Sbjct 361 SFGLEMSQTGFSSHVVEDGVLLGAVGAYDWNGAVLKETSAGKVIPLRESYLKEFPEELKN 420\n", "\n", "Query 421 HGAYLGYTVTSVVSSRQGRVYVAGAPRFNHTGKVILFTMHNNRSLTIHQAMRGQQIGSYF 480\n", " HGAYLGYTVTSVVSSRQGRVYVAGAPRFNHTGKVILFTMHNNRSLTIHQAMRGQQIGSYF\n", "Sbjct 421 HGAYLGYTVTSVVSSRQGRVYVAGAPRFNHTGKVILFTMHNNRSLTIHQAMRGQQIGSYF 480\n", "\n", "Query 481 GSEITSVDIDGDGVTDVLLVGAPMYFNEGRERGKVYVYELRQNLFVYNGTLKDSHSYQNA 540\n", " GSEITSVDIDGDGVTDVLLVGAPMYFNEGRERGKVYVYELRQNLFVYNGTLKDSHSYQNA\n", "Sbjct 481 GSEITSVDIDGDGVTDVLLVGAPMYFNEGRERGKVYVYELRQNLFVYNGTLKDSHSYQNA 540\n", "\n", "Query 541 RFGSSIASVRDLNQDSYNDVVVGAPLEDNHAGAIYIFHGFRGSILKTPKQRITASELATG 600\n", " RFGSSIASVRDLNQDSYNDVVVGAPLEDNHAGAIYIFHGFRGSILKTPKQRITASELATG\n", "Sbjct 541 RFGSSIASVRDLNQDSYNDVVVGAPLEDNHAGAIYIFHGFRGSILKTPKQRITASELATG 600\n", "\n", "Query 601 LQYFGCSIHGQLDLNEDGLIDLAVGALGNAVILWSRPVVQINASLHFEPSKINIFHRDCK 660\n", " LQYFGCSIHGQLDLNEDGLIDLAVGALGNAVILWSRPVVQINASLHFEPSKINIFHRDCK\n", "Sbjct 601 LQYFGCSIHGQLDLNEDGLIDLAVGALGNAVILWSRPVVQINASLHFEPSKINIFHRDCK 660\n", "\n", "Query 661 RSGRDATCLAAFLCFTPIFLAPHFQTTTVGIRYNATMDERRYTPRAHLDEGGDRFTNRAV 720\n", " RSGRDATCLAAFLCFTPIFLAPHFQTTTVGIRYNATMDERRYTPRAHLDEGGDRFTNRAV\n", "Sbjct 661 RSGRDATCLAAFLCFTPIFLAPHFQTTTVGIRYNATMDERRYTPRAHLDEGGDRFTNRAV 720\n", "\n", "Query 721 LLSSGQELCERINFHVLDTADYVKPVTFSVEYSLEDPDHGPMLDDGWPTTLRVSVPFWNG 780\n", " LLSSGQELCERINFHVLDTADYVKPVTFSVEYSLEDPDHGPMLDDGWPTTLRVSVPFWNG\n", 
"Sbjct 721 LLSSGQELCERINFHVLDTADYVKPVTFSVEYSLEDPDHGPMLDDGWPTTLRVSVPFWNG 780\n", "\n", "Query 781 CNEDEHCVPDLVLDARSDLPTAMEYCQRVLRKPAQDCSAYTLSFDTTVFIIESTRQRVAV 840\n", " CNEDEHCVPDLVLDARSDLPTAMEYCQRVLRKPAQDCSAYTLSFDTTVFIIESTRQRVAV\n", "Sbjct 781 CNEDEHCVPDLVLDARSDLPTAMEYCQRVLRKPAQDCSAYTLSFDTTVFIIESTRQRVAV 840\n", "\n", "Query 841 EATLENRGENAYSTVLNISQSANLQFASLIQKEDSDGSIECVNEERRLQKQVCNVSYPFF 900\n", " EATLENRGENAYSTVLNISQSANLQFASLIQKEDSDGSIECVNEERRLQKQVCNVSYPFF\n", "Sbjct 841 EATLENRGENAYSTVLNISQSANLQFASLIQKEDSDGSIECVNEERRLQKQVCNVSYPFF 900\n", "\n", "Query 901 RAKAKVAFRLDFEFSKSIFLHHLEIELAAGSDSNERDSTKEDNVAPLRFHLKYEADVLFT 960\n", " RAKAKVAFRLDFEFSKSIFLHHLEIELAAGSDSNERDSTKEDNVAPLRFHLKYEADVLFT\n", "Sbjct 901 RAKAKVAFRLDFEFSKSIFLHHLEIELAAGSDSNERDSTKEDNVAPLRFHLKYEADVLFT 960\n", "\n", "Query 961 RSSSLSHYEVKPNSSLERYDGIGPPFSCIFRIQNLGLFPIHGMMMKITIPIATRSGNRLL 1020\n", " RSSSLSHYEVKPNSSLERYDGIGPPFSCIFRIQNLGLFPIHGMMMKITIPIATRSGNRLL\n", "Sbjct 961 RSSSLSHYEVKPNSSLERYDGIGPPFSCIFRIQNLGLFPIHGMMMKITIPIATRSGNRLL 1020\n", "\n", "Query 1021 KLRDFLTDEANTSCNIWGNSTEYRPTPVEEDLRRAPQLNHSNSDVVSINCNIRLVPNQEI 1080\n", " KLRDFLTDEANTSCNIWGNSTEYRPTPVEEDLRRAPQLNHSNSDVVSINCNIRLVPNQEI\n", "Sbjct 1021 KLRDFLTDEANTSCNIWGNSTEYRPTPVEEDLRRAPQLNHSNSDVVSINCNIRLVPNQEI 1080\n", "\n", "Query 1081 NFHLLGNLWLRSLKALKYKSMKIMVNAALQRQFHSPFIFREEDPSRQIVFEISKQEDWQV 1140\n", " NFHLLGNLWLRSLKALKYKSMKIMVNAALQRQFHSPFIFREEDPSRQIVFEISKQEDWQV\n", "Sbjct 1081 NFHLLGNLWLRSLKALKYKSMKIMVNAALQRQFHSPFIFREEDPSRQIVFEISKQEDWQV 1140\n", "\n", "Query 1141 PIWIIVGSTLGGLLLLALLVLALWKLGFFRSARRRREPGLDPTPKVLE 1188\n", " PIWIIVGSTLGGLLLLALLVLALWKLGFFRSARRRREPGLDPTPKVLE\n", "Sbjct 1141 PIWIIVGSTLGGLLLLALLVLALWKLGFFRSARRRREPGLDPTPKVLE 1188\n", "\n", "\n", "> CHEMBL5882 [O75578] Integrin alpha-10 (Homo sapiens)\n", "Length=1167\n", "\n", " Score = 919 bits (2374), Expect = 0.0, Method: Compositional matrix adjust.\n", " Identities = 515/1180 (44%), Positives = 726/1180 (62%), Gaps = 41/1180 (3%)\n", "\n", "Query 1 
MDLPRGLVVAWALSLWPGFTDTFNMDTRKPRVIPGSRTAFFGYTVQQHDISGNKWLVVGA 60\n", " M+LP + L G FN+D PR+ PG A FGY+V QH G +W++VGA\n", "Sbjct 1 MELPFVTHLFLPLVFLTGLCSPFNLDEHHPRLFPGPPEAEFGYSVLQHVGGGQRWMLVGA 60\n", "\n", "Query 61 PLETNGYQKTGDVYKCPVIHGN---CTKLNLGRVTLSNVSERKDNMRLGLSLATNPKDNS 117\n", " P + + GDVY+CPV + C K +LG L N S NM LG+SL D \n", "Sbjct 61 PWDGPSGDRRGDVYRCPVGGAHNAPCAKGHLGDYQLGNSSHPAVNMHLGMSLLETDGDGG 120\n", "\n", "Query 118 FLACSPLWSHECGSSYYTTGMCSRVNSNFRFSKTVAPALQRCQTYMDIVIVLDGSNSIYP 177\n", " F+AC+PLWS CGSS +++G+C+RV+++F+ ++AP QRC TYMD+VIVLDGSNSIYP\n", "Sbjct 121 FMACAPLWSRACGSSVFSSGICARVDASFQPQGSLAPTAQRCPTYMDVVIVLDGSNSIYP 180\n", "\n", "Query 178 WVEVQHFLINILKKFYIGPGQIQVGVVQYGEDVVHEFHLNDYRSVKDVVEAASHIEQRGG 237\n", " W EVQ FL ++ K +I P QIQVG+VQYGE VHE+ L D+R+ ++VV AA ++ +R G\n", "Sbjct 181 WSEVQTFLRRLVGKLFIDPEQIQVGLVQYGESPVHEWSLGDFRTKEEVVRAAKNLSRREG 240\n", "\n", "Query 238 TETRTAFGIEFARSEAFQK--GGRKGAKKVMIVITDGESHDSPDLEKVIQQSERDNVTRY 295\n", " ET+TA I A +E F + GGR A ++++V+TDGESHD +L ++ E VTRY\n", "Sbjct 241 RETKTAQAIMVACTEGFSQSHGGRPEAARLLVVVTDGESHDGEELPAALKACEAGRVTRY 300\n", "\n", "Query 296 AVAVLGYYNRRGINPETFLNEIKYIASDPDDKHFFNVTDEAALKDIVDALGDRIFSLEGT 355\n", " +AVLG+Y RR +P +FL EI+ IASDPD++ FFNVTDEAAL DIVDALGDRIF LEG+\n", "Sbjct 301 GIAVLGHYLRRQRDPSSFLREIRTIASDPDERFFFNVTDEAALTDIVDALGDRIFGLEGS 360\n", "\n", "Query 356 N-KNETSFGLEMSQTGFSSHVVEDGVLLGAVGAYDWNGAVLKETSAGKVIPLRESYLKEF 414\n", " + +NE+SFGLEMSQ GFS+H ++DG+L G VGAYDW G+VL ++ P R + EF\n", "Sbjct 361 HAENESSFGLEMSQIGFSTHRLKDGILFGMVGAYDWGGSVLWLEGGHRLFPPRMALEDEF 420\n", "\n", "Query 415 PEELKNHGAYLGYTVTSVVSSRQGRVYVAGAPRFNHTGKVILFTMHNNRSLTIHQAMRGQ 474\n", " P L+NH AYLGY+V+S++ R++++GAPRF H GKVI F + + ++ + Q+++G+\n", "Sbjct 421 PPALQNHAAYLGYSVSSMLLRGGRRLFLSGAPRFRHRGKVIAFQLKKDGAVRVAQSLQGE 480\n", "\n", "Query 475 QIGSYFGSEITSVDIDGDGVTDVLLVGAPMYFN-EGRERGKVYVYEL-RQNLFVYNGTLK 532\n", " QIGSYFGSE+ +D D DG TDVLLV APM+ + +E G+VYVY + +Q+L GTL+\n", "Sbjct 481 QIGSYFGSELCPLDTDRDGTTDVLLVAAPMFLGPQNKETGRVYVYLVGQQSLLTLQGTLQ 
540\n", "\n", "Query 533 DSHSYQNARFGSSIASVRDLNQDSYNDVVVGAPLEDNHAGAIYIFHGFRGSILKTPKQRI 592\n", " Q+ARFG ++ ++ DLNQD + DV VGAPLED H GA+Y++HG + + P QRI\n", "Sbjct 541 PEPP-QDARFGFAMGALPDLNQDGFADVAVGAPLEDGHQGALYLYHGTQSGVRPHPAQRI 599\n", "\n", "Query 593 TASELATGLQYFGCSIHGQLDLNEDGLIDLAVGALGNAVILWSRPVVQINASLHFEPSKI 652\n", " A+ + L YFG S+ G+LDL+ D L+D+AVGA G A++L SRP+V + SL P I\n", "Sbjct 600 AAASMPHALSYFGRSVDGRLDLDGDDLVDVAVGAQGAAILLSSRPIVHLTPSLEVTPQAI 659\n", "\n", "Query 653 NIFHRDCKRSGRDATCLAAFLCFTPIFLAPHFQTTTVGIRYNATMDERRYTPRAHLDEGG 712\n", " ++ RDC+R G++A CL A LCF P +R+ A++DE RA D G\n", "Sbjct 660 SVVQRDCRRRGQEAVCLTAALCFQVTSRTPGRWDHQFYMRFTASLDEWTAGARAAFDGSG 719\n", "\n", "Query 713 DRFTNRAVLLSSGQELCERINFHVLDTADYVKPVTFSVEYSLEDPDH-GPMLDDGWPTTL 771\n", " R + R + LS G CE+++FHVLDT+DY++PV +V ++L++ GP+L++G PT++\n", "Sbjct 720 QRLSPRRLRLSVGNVTCEQLHFHVLDTSDYLRPVALTVTFALDNTTKPGPVLNEGSPTSI 779\n", "\n", "Query 772 RVSVPFWNGCNEDEHCVPDLVLDARSDLPTAMEYCQRVLRKPAQDCSAYTLSFDTTVFII 831\n", " + VPF C D CV DLVL D+ R RK F++\n", "Sbjct 780 QKLVPFSKDCGPDNECVTDLVLQVNMDI--------RGSRK--------------APFVV 817\n", "\n", "Query 832 ESTRQRVAVEATLENRGENAYSTVLNISQSANLQFASLIQKEDSDGSIECVNEERRLQKQ 891\n", " R++V V TLENR ENAY+T L++ S NL ASL + +S +EC +\n", "Sbjct 818 RGGRRKVLVSTTLENRKENAYNTSLSLIFSRNLHLASLTPQRESPIKVECAAPSA--HAR 875\n", "\n", "Query 892 VCNVSYPFFRAKAKVAFRLDFEFSKSIFLHHLEIELAAGSDSNERDSTKEDNVAPLRFHL 951\n", " +C+V +P F+ AKV F L+FEFS S L + ++L A SDS ER+ T +DN A ++\n", "Sbjct 876 LCSVGHPVFQTGAKVTFLLEFEFSCSSLLSQVFVKLTASSDSLERNGTLQDNTAQTSAYI 935\n", "\n", "Query 952 KYEADVLFTRSSSLSHYEVKPNSSLERYDGIGPPFSCIFRIQNLGLFPIHGMMMKITIPI 1011\n", " +YE +LF+ S+L YEV P +L G GP F R+QNLG + + G+++ +P \n", "Sbjct 936 QYEPHLLFSSESTLHRYEVHPYGTLPV--GPGPEFKTTLRVQNLGCYVVSGLIISALLPA 993\n", "\n", "Query 1012 ATRSGNRLLKLRDFLTDEANTSCNIWGNSTEYRPTPVE-EDLRRAPQLNHSNSDVVSINC 1070\n", " GN L L +T+ N SC I N TE PV E+L+ +LN SN+ + C\n", "Sbjct 994 VAHGGNYFLSLSQVITN--NASC-IVQNLTEPPGPPVHPEELQHTNRLNGSNTQCQVVRC 1050\n", "\n", "Query 
1071 NI-RLVPNQEINFHLLGNLWLRSLKALKYKSMKIMVNAALQRQFHSPFIFREEDPSRQIV 1129\n", " ++ +L E++ LL + + K+KS+ ++ L + S E + +\n", "Sbjct 1051 HLGQLAKGTEVSVGLLRLVHNEFFRRAKFKSLTVVSTFELGTEEGSVLQLTEASRWSESL 1110\n", "\n", "Query 1130 FEISKQEDWQVPIWIIVGSTLGGLLLLALLVLALWKLGFF 1169\n", " E+ + + +WI++GS LGGLLLLALLV LWKLGFF\n", "Sbjct 1111 LEVVQTRPILISLWILIGSVLGGLLLLALLVFCLWKLGFF 1150\n", "\n", "\n", "> CHEMBL3682 [P56199] Integrin alpha-1 (Homo sapiens)\n", "Length=1179\n", "\n", " Score = 825 bits (2131), Expect = 0.0, Method: Compositional matrix adjust.\n", " Identities = 464/1211 (38%), Positives = 704/1211 (58%), Gaps = 84/1211 (7%)\n", "\n", "Query 6 GLVVA--WALSLWPGFTDTFNMDTRKPRVIPGSRTAFFGYTVQQHDISGNKWLVVGAPLE 63\n", " G+ VA W L++ +FN+D + G FGYTVQQ++ KW+++G+PL \n", "Sbjct 10 GVAVACCWLLTVVLRCCVSFNVDVKNSMTFSGPVEDMFGYTVQQYENEEGKWVLIGSPLV 69\n", "\n", "Query 64 TNGYQKTGDVYKCPVIHGN---CTKLNLG-RVTLSNVSERKDNMRLGLSLATNPKDNSFL 119\n", " +TGDVYKCPV G C KL+L ++ NV+E K+NM G +L TNP + FL\n", "Sbjct 70 GQPKNRTGDVYKCPVGRGESLPCVKLDLPVNTSIPNVTEVKENMTFGSTLVTNP-NGGFL 128\n", "\n", "Query 120 ACSPLWSHECGSSYYTTGMCSRVNSNFRFSKTVAPALQRCQTYMDIVIVLDGSNSIYPWV 179\n", " AC PL+++ CG +YTTG+CS V+ F+ ++AP +Q C T +DIVIVLDGSNSIYPW \n", "Sbjct 129 ACGPLYAYRCGHLHYTTGICSDVSPTFQVVNSIAP-VQECSTQLDIVIVLDGSNSIYPWD 187\n", "\n", "Query 180 EVQHFLINILKKFYIGPGQIQVGVVQYGEDVVHEFHLNDYRSVKDVVEAASHIEQRGGTE 239\n", " V FL ++L++ IGP Q QVG+VQYGE+V HEF+LN Y S ++V+ AA I QRGG +\n", "Sbjct 188 SVTAFLNDLLERMDIGPKQTQVGIVQYGENVTHEFNLNKYSSTEEVLVAAKKIVQRGGRQ 247\n", "\n", "Query 240 TRTAFGIEFARSEAFQ--KGGRKGAKKVMIVITDGESHDSPDLEKVIQQSERDNVTRYAV 297\n", " T TA GI+ AR EAF +G R+G KKVM+++TDGESHD+ L+KVIQ E +N+ R+++\n", "Sbjct 248 TMTALGIDTARKEAFTEARGARRGVKKVMVIVTDGESHDNHRLKKVIQDCEDENIQRFSI 307\n", "\n", "Query 298 AVLGYYNRRGINPETFLNEIKYIASDPDDKHFFNVTDEAALKDIVDALGDRIFSLEGT-N 356\n", " A+LG YNR ++ E F+ EIK IAS+P +KHFFNV+DE AL IV LG+RIF+LE T +\n", "Sbjct 308 AILGSYNRGNLSTEKFVEEIKSIASEPTEKHFFNVSDELALVTIVKTLGERIFALEATAD 367\n", "\n", "Query 357 
KNETSFGLEMSQTGFSSHVVEDGVLLGAVGAYDWNGAVLKETSAGKVIPLRESYLKEFPE 416\n", " ++ SF +EMSQTGFS+H +D V+LGAVGAYDWNG V+ + ++ +IP ++ E +\n", "Sbjct 368 QSAASFEMEMSQTGFSAHYSQDWVMLGAVGAYDWNGTVVMQKASQIIIPRNTTFNVESTK 427\n", "\n", "Query 417 ELKNHGAYLGYTVTSVVSSRQGRVYVAGAPRFNHTGKVILFTMHNNRSLTIHQAMRGQQI 476\n", " + + +YLGYTV S +S +Y+AG PR+NHTG+VI++ M + ++ I Q + G+QI\n", "Sbjct 428 KNEPLASYLGYTVNSATASSGDVLYIAGQPRYNHTGQVIIYRMEDG-NIKILQTLSGEQI 486\n", "\n", "Query 477 GSYFGSEITSVDIDGDGVTDVLLVGAPMYF-NEGRERGKVYVYELRQNLFVYNGTLK--- 532\n", " GSYFGS +T+ DID D TD+LLVGAPMY E E+GKVYVY L Q F Y +L+ \n", "Sbjct 487 GSYFGSILTTTDIDKDSNTDILLVGAPMYMGTEKEEQGKVYVYALNQTRFEYQMSLEPIK 546\n", "\n", "Query 533 ---------DSHSYQN------ARFGSSIASVRDLNQDSYNDVVVGAPLEDNHAGAIYIF 577\n", " +S + +N ARFG++IA+V+DLN D +ND+V+GAPLED+H GA+YI+\n", "Sbjct 547 QTCCSSRQHNSCTTENKNEPCGARFGTAIAAVKDLNLDGFNDIVIGAPLEDDHGGAVYIY 606\n", "\n", "Query 578 HGFRGSILKTPKQRITASELATGLQYFGCSIHGQLDLNEDGLIDLAVGALGNAVILWSRP 637\n", " HG +I K QRI + L++FG SIHG++DLN DGL D+ +G LG A + WSR \n", "Sbjct 607 HGSGKTIRKEYAQRIPSGGDGKTLKFFGQSIHGEMDLNGDGLTDVTIGGLGGAALFWSRD 666\n", "\n", "Query 638 VVQINASLHFEPSKINIFHRDCKRSGRDATCLAAFLCFTPIFLAPHFQTTTVGIRYNATM 697\n", " V + +++FEP+K+NI ++C G++ C+ A +CF + ++Y T+\n", "Sbjct 667 VAVVKVTMNFEPNKVNIQKKNCHMEGKETVCINATVCFDVKLKSKEDTIYEADLQYRVTL 726\n", "\n", "Query 698 DERRYTPRAHLDEGGDRFTNRAVLLSSGQELCERINFHVLDTADYVKPVTFSVEYSLEDP 757\n", " D R R+ +R R + + + C + +F++LD D+ V +++++L DP\n", "Sbjct 727 DSLRQISRSFFSGTQERKVQRNITVRKSE--CTKHSFYMLDKHDFQDSVRITLDFNLTDP 784\n", "\n", "Query 758 DHGPMLDDGWPTTLRVSVPFWNGCNEDEHCVPDLVLDARSDLPTAMEYCQRVLRKPAQDC 817\n", " ++GP+LDD P ++ +PF C E C+ DL L \n", "Sbjct 785 ENGPVLDDSLPNSVHEYIPFAKDCGNKEKCISDLSL------------------------ 820\n", "\n", "Query 818 SAYTLSFDTTVFIIESTRQRVAVEATLENRGENAYSTVLNISQSANLQFASL--IQKEDS 875\n", " + + + + I+ S + V T++N ++AY+T + S NL F+ + IQK+ \n", "Sbjct 821 --HVATTEKDLLIVRSQNDKFNVSLTVKNTKDSAYNTRTIVHYSPNLVFSGIEAIQKDSC 878\n", "\n", "Query 876 
DGSIECVNEERRLQKQVCNVSYPFFRAKAKVAFRLDFEFSKSIFLHHLEIELAAGSDSNE 935\n", " + + C V YPF R V F++ F+F+ S + ++ I L+A SDS E\n", "Sbjct 879 ESN----------HNITCKVGYPFLRRGEMVTFKILFQFNTSYLMENVTIYLSATSDSEE 928\n", "\n", "Query 936 RDSTKEDNVAPLRFHLKYEADVLFTRSSSLSHYEVKPNSS----LERYDGIGPPFSCIFR 991\n", " T DNV + +KYE + F S+S H + N + + + IG + + \n", "Sbjct 929 PPETLSDNVVNISIPVKYEVGLQFYSSASEYHISIAANETVPEVINSTEDIGNEINIFYL 988\n", "\n", "Query 992 IQNLGLFPIHGMMMKITIPIATRSGNRLLKLRDFLTDE-ANTSCNIWGN----STEYRPT 1046\n", " I+ G FP+ + + I+ P T +G +L + E AN +I+ + ++ + T\n", "Sbjct 989 IRKSGSFPMPELKLSISFPNMTSNGYPVLYPTGLSSSENANCRPHIFEDPFSINSGKKMT 1048\n", "\n", "Query 1047 PVEEDLRRAPQLNHSNSDVVSINCNIRLVPNQEINFHLLGNLWLRSLKALKYKSMKIMVN 1106\n", " + L+R L+ + +I CN+ ++N L+ LW + + S+ + + \n", "Sbjct 1049 TSTDHLKRGTILDCNTCKFATITCNLTSSDISQVNVSLI--LWKPTFIKSYFSSLNLTIR 1106\n", "\n", "Query 1107 AALQRQFHSPFIFREEDPSRQIVFEISKQE-DWQVPIWIIVGSTLGGLLLLALLVLALWK 1165\n", " L R ++ + + R++ +ISK +VP+W+I+ S GLLLL LL+LALWK\n", "Sbjct 1107 GEL-RSENASLVLSSSNQKRELAIQISKDGLPGRVPLWVILLSAFAGLLLLMLLILALWK 1165\n", "\n", "Query 1166 LGFFRSARRRR 1176\n", " +GFF+ +++\n", "Sbjct 1166 IGFFKRPLKKK 1176\n", "\n", "\n", "> CHEMBL3137278 [P56199] Integrin alpha-1 (Homo sapiens)\n", "Length=1179\n", "\n", " Score = 825 bits (2131), Expect = 0.0, Method: Compositional matrix adjust.\n", " Identities = 464/1211 (38%), Positives = 704/1211 (58%), Gaps = 84/1211 (7%)\n", "\n", "Query 6 GLVVA--WALSLWPGFTDTFNMDTRKPRVIPGSRTAFFGYTVQQHDISGNKWLVVGAPLE 63\n", " G+ VA W L++ +FN+D + G FGYTVQQ++ KW+++G+PL \n", "Sbjct 10 GVAVACCWLLTVVLRCCVSFNVDVKNSMTFSGPVEDMFGYTVQQYENEEGKWVLIGSPLV 69\n", "\n", "Query 64 TNGYQKTGDVYKCPVIHGN---CTKLNLG-RVTLSNVSERKDNMRLGLSLATNPKDNSFL 119\n", " +TGDVYKCPV G C KL+L ++ NV+E K+NM G +L TNP + FL\n", "Sbjct 70 GQPKNRTGDVYKCPVGRGESLPCVKLDLPVNTSIPNVTEVKENMTFGSTLVTNP-NGGFL 128\n", "\n", "Query 120 ACSPLWSHECGSSYYTTGMCSRVNSNFRFSKTVAPALQRCQTYMDIVIVLDGSNSIYPWV 179\n", " AC PL+++ CG +YTTG+CS V+ F+ ++AP +Q C T +DIVIVLDGSNSIYPW \n", 
"Sbjct 129 ACGPLYAYRCGHLHYTTGICSDVSPTFQVVNSIAP-VQECSTQLDIVIVLDGSNSIYPWD 187\n", "\n", "Query 180 EVQHFLINILKKFYIGPGQIQVGVVQYGEDVVHEFHLNDYRSVKDVVEAASHIEQRGGTE 239\n", " V FL ++L++ IGP Q QVG+VQYGE+V HEF+LN Y S ++V+ AA I QRGG +\n", "Sbjct 188 SVTAFLNDLLERMDIGPKQTQVGIVQYGENVTHEFNLNKYSSTEEVLVAAKKIVQRGGRQ 247\n", "\n", "Query 240 TRTAFGIEFARSEAFQ--KGGRKGAKKVMIVITDGESHDSPDLEKVIQQSERDNVTRYAV 297\n", " T TA GI+ AR EAF +G R+G KKVM+++TDGESHD+ L+KVIQ E +N+ R+++\n", "Sbjct 248 TMTALGIDTARKEAFTEARGARRGVKKVMVIVTDGESHDNHRLKKVIQDCEDENIQRFSI 307\n", "\n", "Query 298 AVLGYYNRRGINPETFLNEIKYIASDPDDKHFFNVTDEAALKDIVDALGDRIFSLEGT-N 356\n", " A+LG YNR ++ E F+ EIK IAS+P +KHFFNV+DE AL IV LG+RIF+LE T +\n", "Sbjct 308 AILGSYNRGNLSTEKFVEEIKSIASEPTEKHFFNVSDELALVTIVKTLGERIFALEATAD 367\n", "\n", "Query 357 KNETSFGLEMSQTGFSSHVVEDGVLLGAVGAYDWNGAVLKETSAGKVIPLRESYLKEFPE 416\n", " ++ SF +EMSQTGFS+H +D V+LGAVGAYDWNG V+ + ++ +IP ++ E +\n", "Sbjct 368 QSAASFEMEMSQTGFSAHYSQDWVMLGAVGAYDWNGTVVMQKASQIIIPRNTTFNVESTK 427\n", "\n", "Query 417 ELKNHGAYLGYTVTSVVSSRQGRVYVAGAPRFNHTGKVILFTMHNNRSLTIHQAMRGQQI 476\n", " + + +YLGYTV S +S +Y+AG PR+NHTG+VI++ M + ++ I Q + G+QI\n", "Sbjct 428 KNEPLASYLGYTVNSATASSGDVLYIAGQPRYNHTGQVIIYRMEDG-NIKILQTLSGEQI 486\n", "\n", "Query 477 GSYFGSEITSVDIDGDGVTDVLLVGAPMYF-NEGRERGKVYVYELRQNLFVYNGTLK--- 532\n", " GSYFGS +T+ DID D TD+LLVGAPMY E E+GKVYVY L Q F Y +L+ \n", "Sbjct 487 GSYFGSILTTTDIDKDSNTDILLVGAPMYMGTEKEEQGKVYVYALNQTRFEYQMSLEPIK 546\n", "\n", "Query 533 ---------DSHSYQN------ARFGSSIASVRDLNQDSYNDVVVGAPLEDNHAGAIYIF 577\n", " +S + +N ARFG++IA+V+DLN D +ND+V+GAPLED+H GA+YI+\n", "Sbjct 547 QTCCSSRQHNSCTTENKNEPCGARFGTAIAAVKDLNLDGFNDIVIGAPLEDDHGGAVYIY 606\n", "\n", "Query 578 HGFRGSILKTPKQRITASELATGLQYFGCSIHGQLDLNEDGLIDLAVGALGNAVILWSRP 637\n", " HG +I K QRI + L++FG SIHG++DLN DGL D+ +G LG A + WSR \n", "Sbjct 607 HGSGKTIRKEYAQRIPSGGDGKTLKFFGQSIHGEMDLNGDGLTDVTIGGLGGAALFWSRD 666\n", "\n", "Query 638 VVQINASLHFEPSKINIFHRDCKRSGRDATCLAAFLCFTPIFLAPHFQTTTVGIRYNATM 697\n", " V + +++FEP+K+NI ++C G++ 
C+ A +CF + ++Y T+\n", "Sbjct 667 VAVVKVTMNFEPNKVNIQKKNCHMEGKETVCINATVCFDVKLKSKEDTIYEADLQYRVTL 726\n", "\n", "Query 698 DERRYTPRAHLDEGGDRFTNRAVLLSSGQELCERINFHVLDTADYVKPVTFSVEYSLEDP 757\n", " D R R+ +R R + + + C + +F++LD D+ V +++++L DP\n", "Sbjct 727 DSLRQISRSFFSGTQERKVQRNITVRKSE--CTKHSFYMLDKHDFQDSVRITLDFNLTDP 784\n", "\n", "Query 758 DHGPMLDDGWPTTLRVSVPFWNGCNEDEHCVPDLVLDARSDLPTAMEYCQRVLRKPAQDC 817\n", " ++GP+LDD P ++ +PF C E C+ DL L \n", "Sbjct 785 ENGPVLDDSLPNSVHEYIPFAKDCGNKEKCISDLSL------------------------ 820\n", "\n", "Query 818 SAYTLSFDTTVFIIESTRQRVAVEATLENRGENAYSTVLNISQSANLQFASL--IQKEDS 875\n", " + + + + I+ S + V T++N ++AY+T + S NL F+ + IQK+ \n", "Sbjct 821 --HVATTEKDLLIVRSQNDKFNVSLTVKNTKDSAYNTRTIVHYSPNLVFSGIEAIQKDSC 878\n", "\n", "Query 876 DGSIECVNEERRLQKQVCNVSYPFFRAKAKVAFRLDFEFSKSIFLHHLEIELAAGSDSNE 935\n", " + + C V YPF R V F++ F+F+ S + ++ I L+A SDS E\n", "Sbjct 879 ESN----------HNITCKVGYPFLRRGEMVTFKILFQFNTSYLMENVTIYLSATSDSEE 928\n", "\n", "Query 936 RDSTKEDNVAPLRFHLKYEADVLFTRSSSLSHYEVKPNSS----LERYDGIGPPFSCIFR 991\n", " T DNV + +KYE + F S+S H + N + + + IG + + \n", "Sbjct 929 PPETLSDNVVNISIPVKYEVGLQFYSSASEYHISIAANETVPEVINSTEDIGNEINIFYL 988\n", "\n", "Query 992 IQNLGLFPIHGMMMKITIPIATRSGNRLLKLRDFLTDE-ANTSCNIWGN----STEYRPT 1046\n", " I+ G FP+ + + I+ P T +G +L + E AN +I+ + ++ + T\n", "Sbjct 989 IRKSGSFPMPELKLSISFPNMTSNGYPVLYPTGLSSSENANCRPHIFEDPFSINSGKKMT 1048\n", "\n", "Query 1047 PVEEDLRRAPQLNHSNSDVVSINCNIRLVPNQEINFHLLGNLWLRSLKALKYKSMKIMVN 1106\n", " + L+R L+ + +I CN+ ++N L+ LW + + S+ + + \n", "Sbjct 1049 TSTDHLKRGTILDCNTCKFATITCNLTSSDISQVNVSLI--LWKPTFIKSYFSSLNLTIR 1106\n", "\n", "Query 1107 AALQRQFHSPFIFREEDPSRQIVFEISKQE-DWQVPIWIIVGSTLGGLLLLALLVLALWK 1165\n", " L R ++ + + R++ +ISK +VP+W+I+ S GLLLL LL+LALWK\n", "Sbjct 1107 GEL-RSENASLVLSSSNQKRELAIQISKDGLPGRVPLWVILLSAFAGLLLLMLLILALWK 1165\n", "\n", "Query 1166 LGFFRSARRRR 1176\n", " +GFF+ +++\n", "Sbjct 1166 IGFFKRPLKKK 1176\n", "\n", "\n", "> CHEMBL4998 [P17301] Integrin alpha-2 (Homo sapiens)\n", 
"Length=1181\n", "\n", " Score = 710 bits (1833), Expect = 0.0, Method: Compositional matrix adjust.\n", " Identities = 428/1194 (36%), Positives = 675/1194 (57%), Gaps = 73/1194 (6%)\n", "\n", "Query 23 FNMDTRKPRVIPGSRTAFFGYTVQQHDISGNKWLVVGAPLETNGYQKTGDVYKCPV--IH 80\n", " +N+ + ++ G + FGY VQQ WL+VG+P + GDVYKCPV \n", "Sbjct 30 YNVGLPEAKIFSGPSSEQFGYAVQQFINPKGNWLLVGSPWSGFPENRMGDVYKCPVDLST 89\n", "\n", "Query 81 GNCTKLNLGRVT-LSNVSERKDNMRLGLSLATNPKDNSFLACSPLWSHECGSSYYTTGMC 139\n", " C KLNL T + NV+E K NM LGL L N FL C PLW+ +CG+ YYTTG+C\n", "Sbjct 90 ATCEKLNLQTSTSIPNVTEMKTNMSLGLILTRNMGTGGFLTCGPLWAQQCGNQYYTTGVC 149\n", "\n", "Query 140 SRVNSNFRFSKTVAPALQRCQTYMDIVIVLDGSNSIYPWVEVQHFLINILKKFYIGPGQI 199\n", " S ++ +F+ S + +PA Q C + +D+V+V D SNSIYPW V++FL ++ IGP + \n", "Sbjct 150 SDISPDFQLSASFSPATQPCPSLIDVVVVCDESNSIYPWDAVKNFLEKFVQGLDIGPTKT 209\n", "\n", "Query 200 QVGVVQYGEDVVHEFHLNDYRSVKDVVEAASHIEQRGGTETRTAFGIEFARSEAFQ--KG 257\n", " QVG++QY + F+LN Y++ ++++ A S Q GG T T I++AR A+ G\n", "Sbjct 210 QVGLIQYANNPRVVFNLNTYKTKEEMIVATSQTSQYGGDLTNTFGAIQYARKYAYSAASG 269\n", "\n", "Query 258 GRKGAKKVMIVITDGESHDSPDLEKVIQQSERDNVTRYAVAVLGYYNRRGINPETFLNEI 317\n", " GR+ A KVM+V+TDGESHD L+ VI Q DN+ R+ +AVLGY NR ++ + + EI\n", "Sbjct 270 GRRSATKVMVVVTDGESHDGSMLKAVIDQCNHDNILRFGIAVLGYLNRNALDTKNLIKEI 329\n", "\n", "Query 318 KYIASDPDDKHFFNVTDEAALKDIVDALGDRIFSLEGTNKNETSFGLEMSQTGFSSHVVE 377\n", " K IAS P +++FFNV+DEAAL + LG++IFS+EGT + +F +EMSQ GFS+ \n", "Sbjct 330 KAIASIPTERYFFNVSDEAALLEKAGTLGEQIFSIEGTVQGGDNFQMEMSQVGFSADYSS 389\n", "\n", "Query 378 --DGVLLGAVGAYDWNGAVLKETSAGKVIPLRESYLKEFPEELKNHGAYLGYTVTSVVSS 435\n", " D ++LGAVGA+ W+G ++++TS G +I ++++ + + +NH +YLGY+V + +S+\n", "Sbjct 390 QNDILMLGAVGAFGWSGTIVQKTSHGHLIFPKQAFDQILQD--RNHSSYLGYSV-AAIST 446\n", "\n", "Query 436 RQGRVYVAGAPRFNHTGKVILFTMHNNRSLTIHQAMRGQQIGSYFGSEITSVDIDGDGVT 495\n", " + +VAGAPR N+TG+++L++++ N ++T+ QA RG QIGSYFGS + SVD+D D +T\n", "Sbjct 447 GESTHFVAGAPRANYTGQIVLYSVNENGNITVIQAHRGDQIGSYFGSVLCSVDVDKDTIT 506\n", "\n", "Query 496 
DVLLVGAPMYFNE-GRERGKVYVYELRQNLFVYNGTLKDSHSYQNARFGSSIASVRDLNQ 554\n", " DVLLVGAPMY ++ +E G+VY++ +++ + + L+ +N RFGS+IA++ D+N \n", "Sbjct 507 DVLLVGAPMYMSDLKKEEGRVYLFTIKKGILGQHQFLEGPEGIENTRFGSAIAALSDINM 566\n", "\n", "Query 555 DSYNDVVVGAPLEDNHAGAIYIFHGFRGSILKTPKQRITASELA--TGLQYFGCSIHGQL 612\n", " D +NDV+VG+PLE+ ++GA+YI++G +G+I Q+I S+ A + LQYFG S+ G \n", "Sbjct 567 DGFNDVIVGSPLENQNSGAVYIYNGHQGTIRTKYSQKILGSDGAFRSHLQYFGRSLDGYG 626\n", "\n", "Query 613 DLNEDGLIDLAVGALGNAVILWSRPVVQINASLHFEPSKINIFHRDCKRSGRDATCLAAF 672\n", " DLN D + D+++GA G V LWS+ + + F P KI + +++ + + \n", "Sbjct 627 DLNGDSITDVSIGAFGQVVQLWSQSIADVAIEASFTPEKITLVNKNAQ--------IILK 678\n", "\n", "Query 673 LCFTPIFLAPHFQTTTVGIRYNATMD----ERRYTPRAHLDEGGDRFTNRAVLLSSGQEL 728\n", " LCF+ F P Q V I YN T+D R T R E +R + ++++ Q \n", "Sbjct 679 LCFSAKF-RPTKQNNQVAIVYNITLDADGFSSRVTSRGLFKENNERCLQKNMVVNQAQSC 737\n", "\n", "Query 729 CERINFHVLDTADYVKPVTFSVEYSLEDPDHGPMLDDGWPTTLRVSVPFWNGCNEDEHCV 788\n", " E I ++ + +D V + V+ SLE+P P L+ T S+PF C ED C+\n", "Sbjct 738 PEHI-IYIQEPSDVVNSLDLRVDISLENPGTSPALEAYSETAKVFSIPFHKDCGEDGLCI 796\n", "\n", "Query 789 PDLVLDARSDLPTAMEYCQRVLRKPAQDCSAYTLSFDTTVFIIESTRQRVAVEATLENRG 848\n", " DLVLD R +P A E +P FI+ + +R+ TL+N+ \n", "Sbjct 797 SDLVLDVR-QIPAAQE-------QP---------------FIVSNQNKRLTFSVTLKNKR 833\n", "\n", "Query 849 ENAYSTVLNISQSANLQFASLIQKEDSDGSIECVNEERRLQKQV-CNVSYPFFRAKAKVA 907\n", " E+AY+T + + S NL FAS D E + QK V C+V YP + + +V \n", "Sbjct 834 ESAYNTGIVVDFSENLFFASFSLPVD---GTEVTCQVAASQKSVACDVGYPALKREQQVT 890\n", "\n", "Query 908 FRLDFEFSKSIFLHHLEIELAAGSDSNERDSTKEDNVAPLRFHLKYEADVLFTRSSSLSH 967\n", " F ++F+F+ + + A S+S E + K DN+ L+ L Y+A++ TRS++++ \n", "Sbjct 891 FTINFDFNLQNLQNQASLSFQALSESQEEN--KADNLVNLKIPLLYDAEIHLTRSTNINF 948\n", "\n", "Query 968 YEVKPN----SSLERYDGIGPPFSCIFRIQ-NLGLFPIHGMMMKITIPIATRSGNRLLKL 1022\n", " YE+ + S + ++ +GP F IF ++ G P+ + I IP T+ N L+ L\n", "Sbjct 949 YEISSDGNVPSIVHSFEDVGPKF--IFSLKVTTGSVPVSMATVIIHIPQYTKEKNPLMYL 1006\n", "\n", "Query 1023 
RDFLTDEA-NTSCNIWGNSTEYRPTPV-----EEDLRRAPQLNHSNSDVVSINCNIRLVP 1076\n", " TD+A + SCN N + T E+ R +LN + ++ C ++ V \n", "Sbjct 1007 TGVQTDKAGDISCNADINPLKIGQTSSSVSFKSENFRHTKELNCRTASCSNVTCWLKDVH 1066\n", "\n", "Query 1077 NQ-EINFHLLGNLWLRSLKALKYKSMKIMVNAALQRQFHSPFIFREEDPSRQIVFEISK- 1134\n", " + E ++ +W + + ++++++ AA + ++P I+ ED + I I K \n", "Sbjct 1067 MKGEYFVNVTTRIWNGTFASSTFQTVQL--TAAAEINTYNPEIYVIEDNTVTIPLMIMKP 1124\n", "\n", "Query 1135 QEDWQVPIWIIVGSTLGGLLLLALLVLALWKLGFFRSARRRREPGLDPTPKVLE 1188\n", " E +VP +I+GS + G+LLL LV LWKLGFF+ + D + E\n", "Sbjct 1125 DEKAEVPTGVIIGSIIAGILLLLALVAILWKLGFFKRKYEKMTKNPDEIDETTE 1178\n", "\n", "\n", "\n", "Lambda K H a alpha\n", " 0.320 0.136 0.410 0.792 4.96 \n", "\n", "Gapped\n", "Lambda K H a alpha sigma\n", " 0.267 0.0410 0.140 1.90 42.6 43.6 \n", "\n", "Effective search space used: 4584869670\n", "\n", "\n", "Query= P06804_TNFA_MOUSE\n", "\n", "Length=235\n", " Score E\n", "Sequences producing significant alignments: (Bits) Value\n", "\n", " CHEMBL4984 [P06804] Tumor necrosis factor (Mus musculus) 483 6e-175\n", " CHEMBL1825 [P01375] Tumor necrosis factor (Homo sapiens) 379 9e-134\n", " CHEMBL2059 [P01374] Lymphotoxin-alpha (Homo sapiens) 88.6 2e-21 \n", " CHEMBL5714 [P48023] Tumor necrosis factor ligand superfamily me... 59.3 8e-11 \n", " CHEMBL2364162 [O14788] Tumor necrosis factor ligand superfamily... 
52.8 1e-08 \n", "\n", "\n", "> CHEMBL4984 [P06804] Tumor necrosis factor (Mus musculus)\n", "Length=235\n", "\n", " Score = 483 bits (1244), Expect = 6e-175, Method: Compositional matrix adjust.\n", " Identities = 235/235 (100%), Positives = 235/235 (100%), Gaps = 0/235 (0%)\n", "\n", "Query 1 MSTESMIRDVELAEEALPQKMGGFQNSRRCLCLSLFSFLLVAGATTLFCLLNFGVIGPQR 60\n", " MSTESMIRDVELAEEALPQKMGGFQNSRRCLCLSLFSFLLVAGATTLFCLLNFGVIGPQR\n", "Sbjct 1 MSTESMIRDVELAEEALPQKMGGFQNSRRCLCLSLFSFLLVAGATTLFCLLNFGVIGPQR 60\n", "\n", "Query 61 DEKFPNGLPLISSMAQTLTLRSSSQNSSDKPVAHVVANHQVEEQLEWLSQRANALLANGM 120\n", " DEKFPNGLPLISSMAQTLTLRSSSQNSSDKPVAHVVANHQVEEQLEWLSQRANALLANGM\n", "Sbjct 61 DEKFPNGLPLISSMAQTLTLRSSSQNSSDKPVAHVVANHQVEEQLEWLSQRANALLANGM 120\n", "\n", "Query 121 DLKDNQLVVPADGLYLVYSQVLFKGQGCPDYVLLTHTVSRFAISYQEKVNLLSAVKSPCP 180\n", " DLKDNQLVVPADGLYLVYSQVLFKGQGCPDYVLLTHTVSRFAISYQEKVNLLSAVKSPCP\n", "Sbjct 121 DLKDNQLVVPADGLYLVYSQVLFKGQGCPDYVLLTHTVSRFAISYQEKVNLLSAVKSPCP 180\n", "\n", "Query 181 KDTPEGAELKPWYEPIYLGGVFQLEKGDQLSAEVNLPKYLDFAESGQVYFGVIAL 235\n", " KDTPEGAELKPWYEPIYLGGVFQLEKGDQLSAEVNLPKYLDFAESGQVYFGVIAL\n", "Sbjct 181 KDTPEGAELKPWYEPIYLGGVFQLEKGDQLSAEVNLPKYLDFAESGQVYFGVIAL 235\n", "\n", "\n", "> CHEMBL1825 [P01375] Tumor necrosis factor (Homo sapiens)\n", "Length=233\n", "\n", " Score = 379 bits (973), Expect = 9e-134, Method: Compositional matrix adjust.\n", " Identities = 186/236 (79%), Positives = 211/236 (89%), Gaps = 4/236 (2%)\n", "\n", "Query 1 MSTESMIRDVELAEEALPQKMGGFQNSRRCLCLSLFSFLLVAGATTLFCLLNFGVIGPQR 60\n", " MSTESMIRDVELAEEALP+K GG Q SRRCL LSLFSFL+VAGATTLFCLL+FGVIGPQR\n", "Sbjct 1 MSTESMIRDVELAEEALPKKTGGPQGSRRCLFLSLFSFLIVAGATTLFCLLHFGVIGPQR 60\n", "\n", "Query 61 DEKFPNGLPLISSMAQTLTLRSSSQNSSDKPVAHVVANHQVEEQLEWLSQRANALLANGM 120\n", " +E FP L LIS +AQ + RSSS+ SDKPVAHVVAN Q E QL+WL++RANALLANG+\n", "Sbjct 61 EE-FPRDLSLISPLAQAV--RSSSRTPSDKPVAHVVANPQAEGQLQWLNRRANALLANGV 117\n", "\n", "Query 121 DLKDNQLVVPADGLYLVYSQVLFKGQGCPD-YVLLTHTVSRFAISYQEKVNLLSAVKSPC 179\n", " 
+L+DNQLVVP++GLYL+YSQVLFKGQGCP +VLLTHT+SR A+SYQ KVNLLSA+KSPC\n", "Sbjct 118 ELRDNQLVVPSEGLYLIYSQVLFKGQGCPSTHVLLTHTISRIAVSYQTKVNLLSAIKSPC 177\n", "\n", "Query 180 PKDTPEGAELKPWYEPIYLGGVFQLEKGDQLSAEVNLPKYLDFAESGQVYFGVIAL 235\n", " ++TPEGAE KPWYEPIYLGGVFQLEKGD+LSAE+N P YLDFAESGQVYFG+IAL\n", "Sbjct 178 QRETPEGAEAKPWYEPIYLGGVFQLEKGDRLSAEINRPDYLDFAESGQVYFGIIAL 233\n", "\n", "\n", "> CHEMBL2059 [P01374] Lymphotoxin-alpha (Homo sapiens)\n", "Length=205\n", "\n", " Score = 88.6 bits (218), Expect = 2e-21, Method: Compositional matrix adjust.\n", " Identities = 63/178 (35%), Positives = 87/178 (49%), Gaps = 18/178 (10%)\n", "\n", "Query 67 GLPLISSMAQTLTLRSSSQ--NSSDKPVAHVVANHQVEEQLEWLSQRANALLANGMDLKD 124\n", " G+ L S AQT +S+ KP AH++ + + L W + A L +G L +\n", "Sbjct 37 GVGLTPSAAQTARQHPKMHLAHSTLKPAAHLIGDPSKQNSLLWRANTDRAFLQDGFSLSN 96\n", "\n", "Query 125 NQLVVPADGLYLVYSQVLFKGQG-------CPDYVLLTHTVSRFAISYQEKVNLLSAVKS 177\n", " N L+VP G+Y VYSQV+F G+ P Y L H V F+ Y V LLS+ K \n", "Sbjct 97 NSLLVPTSGIYFVYSQVVFSGKAYSPKATSSPLY--LAHEVQLFSSQYPFHVPLLSSQKM 154\n", "\n", "Query 178 PCPKDTPEGAELKPWYEPIYLGGVFQLEKGDQLSAEVNLPKYLDFAESGQVYFGVIAL 235\n", " P G + +PW +Y G FQL +GDQLS + +L + S V+FG AL\n", "Sbjct 155 VYP-----GLQ-EPWLHSMYHGAAFQLTQGDQLSTHTDGIPHLVLSPS-TVFFGAFAL 205\n", "\n", "\n", "> CHEMBL5714 [P48023] Tumor necrosis factor ligand superfamily \n", "member 6 (Homo sapiens)\n", "Length=281\n", "\n", " Score = 59.3 bits (142), Expect = 8e-11, Method: Compositional matrix adjust.\n", " Identities = 46/147 (31%), Positives = 64/147 (44%), Gaps = 10/147 (7%)\n", "\n", "Query 90 KPVAHVVANHQVEEQ-LEWLSQRANALLANGMDLKDNQLVVPADGLYLVYSQVLFKGQGC 148\n", " + VAH+ LEW LL+ G+ K LV+ GLY VYS+V F+GQ C\n", "Sbjct 144 RKVAHLTGKSNSRSMPLEWEDTYGIVLLS-GVKYKKGGLVINETGLYFVYSKVYFRGQSC 202\n", "\n", "Query 149 PDYVLLTHTVSRFAISYQEKVNLLSAVKSPCPKDTPEGAELKPWYEPIYLGGVFQLEKGD 208\n", " + L R + Q+ V + + S C + W YLG VF L D\n", "Sbjct 203 NNLPLSHKVYMRNSKYPQDLVMMEGKMMSYCTTG-------QMWARSSYLGAVFNLTSAD 255\n", "\n", "Query 209 
QLSAEVNLPKYLDFAESGQVYFGVIAL 235\n", " L V+ ++F ES Q +FG+ L\n", "Sbjct 256 HLYVNVSELSLVNFEES-QTFFGLYKL 281\n", "\n", "\n", "> CHEMBL2364162 [O14788] Tumor necrosis factor ligand superfamily \n", "member 11 (Homo sapiens)\n", "Length=317\n", "\n", " Score = 52.8 bits (125), Expect = 1e-08, Method: Compositional matrix adjust.\n", " Identities = 42/166 (25%), Positives = 71/166 (43%), Gaps = 35/166 (21%)\n", "\n", "Query 90 KPVAHVVAN--------HQVEEQLEWLSQRANALLANGMDLKDNQLVVPADGLYLVYSQV 141\n", " +P AH+ N H+V W R A ++N M + +L+V DG Y +Y+ +\n", "Sbjct 163 QPFAHLTINATDIPSGSHKVSLS-SWYHDRGWAKISN-MTFSNGKLIVNQDGFYYLYANI 220\n", "\n", "Query 142 LFK-----GQGCPDYVLLTHTVSRFAISYQEKVNLLSAVKSPCPKDTPEGAELKPW---- 192\n", " F+ G +Y+ L V++ +++K P +G K W \n", "Sbjct 221 CFRHHETSGDLATEYLQLMVYVTK------------TSIKIPSSHTLMKGGSTKYWSGNS 268\n", "\n", "Query 193 ---YEPIYLGGVFQLEKGDQLSAEVNLPKYLDFAESGQVYFGVIAL 235\n", " + I +GG F+L G+++S EV+ P LD + YFG +\n", "Sbjct 269 EFHFYSINVGGFFKLRSGEEISIEVSNPSLLD-PDQDATYFGAFKV 313\n", "\n", "\n", "\n", "Lambda K H a alpha\n", " 0.319 0.135 0.396 0.792 4.96 \n", "\n", "Gapped\n", "Lambda K H a alpha sigma\n", " 0.267 0.0410 0.140 1.90 42.6 43.6 \n", "\n", "Effective search space used: 627431904\n", "\n", "\n", "Query= P48050_KCNJ4_HUMAN\n", "\n", "Length=445\n", " Score E\n", "Sequences producing significant alignments: (Bits) Value\n", "\n", " CHEMBL2146347 [P48050] Inward rectifier potassium channel 4 (Ho... 925 0.0 \n", " CHEMBL1293290 [P35561] Inward rectifier potassium channel 2 (Mu... 544 0.0 \n", " CHEMBL1914276 [P63252] Inward rectifier potassium channel 2 (Ho... 540 0.0 \n", " CHEMBL3038488 [P48544] G protein-activated inward rectifier pot... 402 6e-137\n", " CHEMBL2406895 [P48051] G protein-activated inward rectifier pot... 
395 9e-134\n", "\n", "\n", "> CHEMBL2146347 [P48050] Inward rectifier potassium channel 4 \n", "(Homo sapiens)\n", "Length=445\n", "\n", " Score = 925 bits (2391), Expect = 0.0, Method: Compositional matrix adjust.\n", " Identities = 445/445 (100%), Positives = 445/445 (100%), Gaps = 0/445 (0%)\n", "\n", "Query 1 MHGHSRNGQAHVPRRKRRNRFVKKNGQCNVYFANLSNKSQRYMADIFTTCVDTRWRYMLM 60\n", " MHGHSRNGQAHVPRRKRRNRFVKKNGQCNVYFANLSNKSQRYMADIFTTCVDTRWRYMLM\n", "Sbjct 1 MHGHSRNGQAHVPRRKRRNRFVKKNGQCNVYFANLSNKSQRYMADIFTTCVDTRWRYMLM 60\n", "\n", "Query 61 IFSAAFLVSWLFFGLLFWCIAFFHGDLEASPGVPAAGGPAAGGGGAAPVAPKPCIMHVNG 120\n", " IFSAAFLVSWLFFGLLFWCIAFFHGDLEASPGVPAAGGPAAGGGGAAPVAPKPCIMHVNG\n", "Sbjct 61 IFSAAFLVSWLFFGLLFWCIAFFHGDLEASPGVPAAGGPAAGGGGAAPVAPKPCIMHVNG 120\n", "\n", "Query 121 FLGAFLFSVETQTTIGYGFRCVTEECPLAVIAVVVQSIVGCVIDSFMIGTIMAKMARPKK 180\n", " FLGAFLFSVETQTTIGYGFRCVTEECPLAVIAVVVQSIVGCVIDSFMIGTIMAKMARPKK\n", "Sbjct 121 FLGAFLFSVETQTTIGYGFRCVTEECPLAVIAVVVQSIVGCVIDSFMIGTIMAKMARPKK 180\n", "\n", "Query 181 RAQTLLFSHHAVISVRDGKLCLMWRVGNLRKSHIVEAHVRAQLIKPYMTQEGEYLPLDQR 240\n", " RAQTLLFSHHAVISVRDGKLCLMWRVGNLRKSHIVEAHVRAQLIKPYMTQEGEYLPLDQR\n", "Sbjct 181 RAQTLLFSHHAVISVRDGKLCLMWRVGNLRKSHIVEAHVRAQLIKPYMTQEGEYLPLDQR 240\n", "\n", "Query 241 DLNVGYDIGLDRIFLVSPIIIVHEIDEDSPLYGMGKEELESEDFEIVVILEGMVEATAMT 300\n", " DLNVGYDIGLDRIFLVSPIIIVHEIDEDSPLYGMGKEELESEDFEIVVILEGMVEATAMT\n", "Sbjct 241 DLNVGYDIGLDRIFLVSPIIIVHEIDEDSPLYGMGKEELESEDFEIVVILEGMVEATAMT 300\n", "\n", "Query 301 TQARSSYLASEILWGHRFEPVVFEEKSHYKVDYSRFHKTYEVAGTPCCSARELQESKITV 360\n", " TQARSSYLASEILWGHRFEPVVFEEKSHYKVDYSRFHKTYEVAGTPCCSARELQESKITV\n", "Sbjct 301 TQARSSYLASEILWGHRFEPVVFEEKSHYKVDYSRFHKTYEVAGTPCCSARELQESKITV 360\n", "\n", "Query 361 LPAPPPPPSAFCYENELALMSQEEEEMEEEAAAAAAVAAGLGLEAGSKEEAGIIRMLEFG 420\n", " LPAPPPPPSAFCYENELALMSQEEEEMEEEAAAAAAVAAGLGLEAGSKEEAGIIRMLEFG\n", "Sbjct 361 LPAPPPPPSAFCYENELALMSQEEEEMEEEAAAAAAVAAGLGLEAGSKEEAGIIRMLEFG 420\n", "\n", "Query 421 SHLDLERMQASLPLDNISYRRESAI 445\n", " 
SHLDLERMQASLPLDNISYRRESAI\n", "Sbjct 421 SHLDLERMQASLPLDNISYRRESAI 445\n", "\n", "\n", "> CHEMBL1293290 [P35561] Inward rectifier potassium channel 2 \n", "(Mus musculus)\n", "Length=428\n", "\n", " Score = 544 bits (1401), Expect = 0.0, Method: Compositional matrix adjust.\n", " Identities = 270/440 (61%), Positives = 325/440 (74%), Gaps = 44/440 (10%)\n", "\n", "Query 7 NGQAHVPRRKR-RNRFVKKNGQCNVYFANLSNKSQRYMADIFTTCVDTRWRYMLMIFSAA 65\n", " NG++ V R++ R+RFVKK+G CNV F N+ K QRY+ADIFTTCVD RWR+ML+IF A\n", "Sbjct 32 NGKSKVHTRQQCRSRFVKKDGHCNVQFINVGEKGQRYLADIFTTCVDIRWRWMLVIFCLA 91\n", "\n", "Query 66 FLVSWLFFGLLFWCIAFFHGDLEASPGVPAAGGPAAGGGGAAPVAPKPCIMHVNGFLGAF 125\n", " F++SWLFFG +FW IA HGDL+ S K C+ VN F AF\n", "Sbjct 92 FVLSWLFFGCVFWLIALLHGDLDTSK------------------VSKACVSEVNSFTAAF 133\n", "\n", "Query 126 LFSVETQTTIGYGFRCVTEECPLAVIAVVVQSIVGCVIDSFMIGTIMAKMARPKKRAQTL 185\n", " LFS+ETQTTIGYGFRCVT+ECP+AV VV QSIVGC+ID+F+IG +MAKMA+PKKR +TL\n", "Sbjct 134 LFSIETQTTIGYGFRCVTDECPIAVFMVVFQSIVGCIIDAFIIGAVMAKMAKPKKRNETL 193\n", "\n", "Query 186 LFSHHAVISVRDGKLCLMWRVGNLRKSHIVEAHVRAQLIKPYMTQEGEYLPLDQRDLNVG 245\n", " +FSH+AVI++RDGKLCLMWRVGNLRKSH+VEAHVRAQL+K +T EGEY+PLDQ D+NVG\n", "Sbjct 194 VFSHNAVIAMRDGKLCLMWRVGNLRKSHLVEAHVRAQLLKSRITSEGEYIPLDQIDINVG 253\n", "\n", "Query 246 YDIGLDRIFLVSPIIIVHEIDEDSPLYGMGKEELESEDFEIVVILEGMVEATAMTTQARS 305\n", " +D G+DRIFLVSPI IVHEIDEDSPLY + K+++++ DFEIVVILEGMVEATAMTTQ RS\n", "Sbjct 254 FDSGIDRIFLVSPITIVHEIDEDSPLYDLSKQDIDNADFEIVVILEGMVEATAMTTQCRS 313\n", "\n", "Query 306 SYLASEILWGHRFEPVVFEEKSHYKVDYSRFHKTYEVAGTPCCSARELQESKITVLPAPP 365\n", " SYLA+EILWGHR+EPV+FEEK +YKVDYSRFHKTYEV TP CSAR+L E K + A \n", "Sbjct 314 SYLANEILWGHRYEPVLFEEKHYYKVDYSRFHKTYEVPNTPLCSARDLAEKKYILSNA-- 371\n", "\n", "Query 366 PPPSAFCYENELALMSQEEEEMEEEAAAAAAVAAGLGLEAGSKEEAGIIRMLEFGSHLDL 425\n", " ++FCYENE+AL S+EEEE E G+ + GI DL\n", "Sbjct 372 ---NSFCYENEVALTSKEEEEDSEN---------GVPESTSTDSPPGI----------DL 409\n", "\n", "Query 426 ERMQASLPLDNISYRRESAI 445\n", " QAS+PL+ RRES I\n", "Sbjct 
410 HN-QASVPLEPRPLRRESEI 428\n", "\n", "\n", "> CHEMBL1914276 [P63252] Inward rectifier potassium channel 2 \n", "(Homo sapiens)\n", "Length=427\n", "\n", " Score = 540 bits (1390), Expect = 0.0, Method: Compositional matrix adjust.\n", " Identities = 266/440 (60%), Positives = 325/440 (74%), Gaps = 45/440 (10%)\n", "\n", "Query 7 NGQAHVPRRKR-RNRFVKKNGQCNVYFANLSNKSQRYMADIFTTCVDTRWRYMLMIFSAA 65\n", " NG++ V R++ R+RFVKK+G CNV F N+ K QRY+ADIFTTCVD RWR+ML+IF A\n", "Sbjct 32 NGKSKVHTRQQCRSRFVKKDGHCNVQFINVGEKGQRYLADIFTTCVDIRWRWMLVIFCLA 91\n", "\n", "Query 66 FLVSWLFFGLLFWCIAFFHGDLEASPGVPAAGGPAAGGGGAAPVAPKPCIMHVNGFLGAF 125\n", " F++SWLFFG +FW IA HGDL+AS K C+ VN F AF\n", "Sbjct 92 FVLSWLFFGCVFWLIALLHGDLDASK------------------EGKACVSEVNSFTAAF 133\n", "\n", "Query 126 LFSVETQTTIGYGFRCVTEECPLAVIAVVVQSIVGCVIDSFMIGTIMAKMARPKKRAQTL 185\n", " LFS+ETQTTIGYGFRCVT+ECP+AV VV QSIVGC+ID+F+IG +MAKMA+PKKR +TL\n", "Sbjct 134 LFSIETQTTIGYGFRCVTDECPIAVFMVVFQSIVGCIIDAFIIGAVMAKMAKPKKRNETL 193\n", "\n", "Query 186 LFSHHAVISVRDGKLCLMWRVGNLRKSHIVEAHVRAQLIKPYMTQEGEYLPLDQRDLNVG 245\n", " +FSH+AVI++RDGKLCLMWRVGNLRKSH+VEAHVRAQL+K +T EGEY+PLDQ D+NVG\n", "Sbjct 194 VFSHNAVIAMRDGKLCLMWRVGNLRKSHLVEAHVRAQLLKSRITSEGEYIPLDQIDINVG 253\n", "\n", "Query 246 YDIGLDRIFLVSPIIIVHEIDEDSPLYGMGKEELESEDFEIVVILEGMVEATAMTTQARS 305\n", " +D G+DRIFLVSPI IVHEIDEDSPLY + K+++++ DFEIVVILEGMVEATAMTTQ RS\n", "Sbjct 254 FDSGIDRIFLVSPITIVHEIDEDSPLYDLSKQDIDNADFEIVVILEGMVEATAMTTQCRS 313\n", "\n", "Query 306 SYLASEILWGHRFEPVVFEEKSHYKVDYSRFHKTYEVAGTPCCSARELQESKITVLPAPP 365\n", " SYLA+EILWGHR+EPV+FEEK +YKVDYSRFHKTYEV TP CSAR+L E K + A \n", "Sbjct 314 SYLANEILWGHRYEPVLFEEKHYYKVDYSRFHKTYEVPNTPLCSARDLAEKKYILSNA-- 371\n", "\n", "Query 366 PPPSAFCYENELALMSQEEEEMEEEAAAAAAVAAGLGLEAGSKEEAGIIRMLEFGSHLDL 425\n", " ++FCYENE+AL S+EE++ E + + + +DL\n", "Sbjct 372 ---NSFCYENEVALTSKEEDDSENGVPESTST--------------------DTPPDIDL 408\n", "\n", "Query 426 ERMQASLPLDNISYRRESAI 445\n", " QAS+PL+ RRES I\n", "Sbjct 409 HN-QASVPLEPRPLRRESEI 427\n", "\n", "\n", 
"> CHEMBL3038488 [P48544] G protein-activated inward rectifier \n", "potassium channel 4 (Homo sapiens)\n", "Length=419\n", "\n", " Score = 402 bits (1034), Expect = 6e-137, Method: Compositional matrix adjust.\n", " Identities = 193/391 (49%), Positives = 279/391 (71%), Gaps = 25/391 (6%)\n", "\n", "Query 15 RKRRNRFVKKNGQCNVYFANLSNKSQRYMADIFTTCVDTRWRYMLMIFSAAFLVSWLFFG 74\n", " +K R R+++K+G+CNV+ N+ ++ RY++D+FTT VD +WR+ L++F+ + V+WLFFG\n", "Sbjct 47 KKPRQRYMEKSGKCNVHHGNV-QETYRYLSDLFTTLVDLKWRFNLLVFTMVYTVTWLFFG 105\n", "\n", "Query 75 LLFWCIAFFHGDLEASPGVPAAGGPAAGGGGAAPVAPKPCIMHVNGFLGAFLFSVETQTT 134\n", " ++W IA+ GDL+ G + PC+ +++GF+ AFLFS+ET+TT\n", "Sbjct 106 FIWWLIAYIRGDLDHV-------------GDQEWI---PCVENLSGFVSAFLFSIETETT 149\n", "\n", "Query 135 IGYGFRCVTEECPLAVIAVVVQSIVGCVIDSFMIGTIMAKMARPKKRAQTLLFSHHAVIS 194\n", " IGYGFR +TE+CP +I ++VQ+I+G ++++FM+G + K+++PKKRA+TL+FS++AVIS\n", "Sbjct 150 IGYGFRVITEKCPEGIILLLVQAILGSIVNAFMVGCMFVKISQPKKRAETLMFSNNAVIS 209\n", "\n", "Query 195 VRDGKLCLMWRVGNLRKSHIVEAHVRAQLIKPYMTQEGEYLPLDQRDLNVGYDIGLDRIF 254\n", " +RD KLCLM+RVG+LR SHIVEA +RA+LIK T+EGE++PL+Q D+NVG+D G DR+F\n", "Sbjct 210 MRDEKLCLMFRVGDLRNSHIVEASIRAKLIKSRQTKEGEFIPLNQTDINVGFDTGDDRLF 269\n", "\n", "Query 255 LVSPIIIVHEIDEDSPLYGMGKEELESEDFEIVVILEGMVEATAMTTQARSSYLASEILW 314\n", " LVSP+II HEI++ SP + M + +L E+FE+VVILEGMVEAT MT QARSSY+ +E+LW\n", "Sbjct 270 LVSPLIISHEINQKSPFWEMSQAQLHQEEFEVVVILEGMVEATGMTCQARSSYMDTEVLW 329\n", "\n", "Query 315 GHRFEPVVFEEKSHYKVDYSRFHKTYEVAGTPCCSARELQESK-----ITVLPAPPPPPS 369\n", " GHRF PV+ EK Y+VDY+ FH TYE TP C A+EL E K + LP+PP \n", "Sbjct 330 GHRFTPVLTLEKGFYEVDYNTFHDTYE-TNTPSCCAKELAEMKREGRLLQYLPSPPLLGG 388\n", "\n", "Query 370 AFCYENELALMSQEEEEMEEEAAAAAAVAAG 400\n", " C E L +++ EE E + + A G\n", "Sbjct 389 --CAEAGLDAEAEQNEEDEPKGLGGSREARG 417\n", "\n", "\n", "> CHEMBL2406895 [P48051] G protein-activated inward rectifier \n", "potassium channel 2 (Homo sapiens)\n", "Length=423\n", "\n", " Score = 395 bits (1014), Expect = 9e-134, Method: Compositional matrix 
adjust.\n", " Identities = 183/343 (53%), Positives = 257/343 (75%), Gaps = 19/343 (6%)\n", "\n", "Query 14 RRKRR-NRFVKKNGQCNVYFANLSNKSQRYMADIFTTCVDTRWRYMLMIFSAAFLVSWLF 72\n", " R KR+ R+V+K+G+CNV+ N+ ++ RY+ DIFTT VD +WR+ L+IF + V+WLF\n", "Sbjct 48 RTKRKIQRYVRKDGKCNVHHGNV-RETYRYLTDIFTTLVDLKWRFNLLIFVMVYTVTWLF 106\n", "\n", "Query 73 FGLLFWCIAFFHGDLEASPGVPAAGGPAAGGGGAAPVAPKPCIMHVNGFLGAFLFSVETQ 132\n", " FG+++W IA+ GD++ P+ PC+ ++NGF+ AFLFS+ET+\n", "Sbjct 107 FGMIWWLIAYIRGDMDH------IEDPSW----------TPCVTNLNGFVSAFLFSIETE 150\n", "\n", "Query 133 TTIGYGFRCVTEECPLAVIAVVVQSIVGCVIDSFMIGTIMAKMARPKKRAQTLLFSHHAV 192\n", " TTIGYG+R +T++CP +I +++QS++G ++++FM+G + K+++PKKRA+TL+FS HAV\n", "Sbjct 151 TTIGYGYRVITDKCPEGIILLLIQSVLGSIVNAFMVGCMFVKISQPKKRAETLVFSTHAV 210\n", "\n", "Query 193 ISVRDGKLCLMWRVGNLRKSHIVEAHVRAQLIKPYMTQEGEYLPLDQRDLNVGYDIGLDR 252\n", " IS+RDGKLCLM+RVG+LR SHIVEA +RA+LIK T EGE++PL+Q D+NVGY G DR\n", "Sbjct 211 ISMRDGKLCLMFRVGDLRNSHIVEASIRAKLIKSKQTSEGEFIPLNQTDINVGYYTGDDR 270\n", "\n", "Query 253 IFLVSPIIIVHEIDEDSPLYGMGKEELESEDFEIVVILEGMVEATAMTTQARSSYLASEI 312\n", " +FLVSP+II HEI++ SP + + K +L E+ EIVVILEGMVEAT MT QARSSY+ SEI\n", "Sbjct 271 LFLVSPLIISHEINQQSPFWEISKAQLPKEELEIVVILEGMVEATGMTCQARSSYITSEI 330\n", "\n", "Query 313 LWGHRFEPVVFEEKSHYKVDYSRFHKTYEVAGTPCCSARELQE 355\n", " LWG+RF PV+ E Y+VDY+ FH+TYE + TP SA+EL E\n", "Sbjct 331 LWGYRFTPVLTLEDGFYEVDYNSFHETYETS-TPSLSAKELAE 372\n", "\n", "\n", "\n", "Lambda K H a alpha\n", " 0.322 0.137 0.415 0.792 4.96 \n", "\n", "Gapped\n", "Lambda K H a alpha sigma\n", " 0.267 0.0410 0.140 1.90 42.6 43.6 \n", "\n", "Effective search space used: 1497848376\n", "\n", "\n", "Query= Q80Z70_SE1L1_RAT\n", "\n", "Length=794\n", " Score E\n", "Sequences producing significant alignments: (Bits) Value\n", "\n", " CHEMBL2214 [P41245] Matrix metalloproteinase-9 (Mus musculus) 84.7 4e-17\n", " CHEMBL3870 [P50282] Matrix metalloproteinase-9 (Rattus norvegicus) 80.1 1e-15\n", " CHEMBL321 [P14780] Matrix metalloproteinase-9 (Homo sapiens) 79.7 
1e-15\n", " CHEMBL2095216 [P14780] Matrix metalloproteinase-9 (Homo sapiens) 79.7 1e-15\n", " CHEMBL333 [P08253] 72 kDa type IV collagenase (Homo sapiens) 77.0 8e-15\n", "\n", "\n", "> CHEMBL2214 [P41245] Matrix metalloproteinase-9 (Mus musculus)\n", "Length=730\n", "\n", " Score = 84.7 bits (208), Expect = 4e-17, Method: Compositional matrix adjust.\n", " Identities = 34/56 (61%), Positives = 42/56 (75%), Gaps = 0/56 (0%)\n", "\n", "Query 116 TAIEGTAHGEPCHFPFLFLDKEYDECTSDGREDGRLWCATTYDYKTDEKWGFCETE 171\n", " T + G + GE C FPF+FL K+Y CTSDGR DGRLWCATT ++ TD+KWGFC +\n", "Sbjct 336 TVVGGNSAGELCVFPFVFLGKQYSSCTSDGRRDGRLWCATTSNFDTDKKWGFCPDQ 391\n", "\n", "\n", " Score = 76.3 bits (186), Expect = 2e-14, Method: Compositional matrix adjust.\n", " Identities = 36/108 (33%), Positives = 54/108 (50%), Gaps = 1/108 (1%)\n", "\n", "Query 114 VLTAIEGTAHGEPCHFPFLFLDKEYDECTSDGREDGRLWCATTYDYKTDEKWGFCETEED 173\n", " V+ G ++G PCHFPF F + Y CT+DGR DG WC+TT DY D K+GFC +E \n", "Sbjct 217 VIPTYYGNSNGAPCHFPFTFEGRSYSACTTDGRNDGTPWCSTTADYDKDGKFGFCPSERL 276\n", "\n", "Query 174 AAKRRQMQEAEAIYQSGMKILNGSTRKNQKR-EAYRYLQKAAGMNHTK 220\n", " + + ++ + + S + R + YR+ A + K\n", "Sbjct 277 YTEHGNGEGKPCVFPFIFEGRSYSACTTKGRSDGYRWCATTANYDQDK 324\n", "\n", "\n", " Score = 68.6 bits (166), Expect = 4e-12, Method: Compositional matrix adjust.\n", " Identities = 28/56 (50%), Positives = 35/56 (63%), Gaps = 0/56 (0%)\n", "\n", "Query 120 GTAHGEPCHFPFLFLDKEYDECTSDGREDGRLWCATTYDYKTDEKWGFCETEEDAA 175\n", " G G+PC FPF+F + Y CT+ GR DG WCATT +Y D+ +GFC T DA \n", "Sbjct 281 GNGEGKPCVFPFIFEGRSYSACTTKGRSDGYRWCATTANYDQDKLYGFCPTRVDAT 336\n", "\n", "\n", "> CHEMBL3870 [P50282] Matrix metalloproteinase-9 (Rattus norvegicus)\n", "Length=708\n", "\n", " Score = 80.1 bits (196), Expect = 1e-15, Method: Compositional matrix adjust.\n", " Identities = 39/109 (36%), Positives = 55/109 (50%), Gaps = 1/109 (1%)\n", "\n", "Query 114 VLTAIEGTAHGEPCHFPFLFLDKEYDECTSDGREDGRLWCATTYDYKTDEKWGFCETEED 173\n", " V+ G A+G PCHFPF F + Y CT+DGR DG+ 
WC TT DY TD K+GFC +E \n", "Sbjct 218 VVPTYFGNANGAPCHFPFTFEGRSYLSCTTDGRNDGKPWCGTTADYDTDRKYGFCPSENL 277\n", "\n", "Query 174 AAKRRQMQEAEAIYQSGMKILNGSTRKNQKR-EAYRYLQKAAGMNHTKA 221\n", " + ++ + + S + R + YR+ A + KA\n", "Sbjct 278 YTEHGNGDGKPCVFPFIFEGHSYSACTTKGRSDGYRWCATTANYDQDKA 326\n", "\n", "\n", " Score = 79.7 bits (195), Expect = 1e-15, Method: Compositional matrix adjust.\n", " Identities = 32/57 (56%), Positives = 41/57 (72%), Gaps = 0/57 (0%)\n", "\n", "Query 115 LTAIEGTAHGEPCHFPFLFLDKEYDECTSDGREDGRLWCATTYDYKTDEKWGFCETE 171\n", " +T G + GE C FPF+FL K+Y CTS+GR DGRLWCATT ++ D+KWGFC +\n", "Sbjct 336 VTVTGGNSAGEMCVFPFVFLGKQYSTCTSEGRSDGRLWCATTSNFDADKKWGFCPDQ 392\n", "\n", "\n", " Score = 63.9 bits (154), Expect = 1e-10, Method: Compositional matrix adjust.\n", " Identities = 27/54 (50%), Positives = 32/54 (59%), Gaps = 0/54 (0%)\n", "\n", "Query 120 GTAHGEPCHFPFLFLDKEYDECTSDGREDGRLWCATTYDYKTDEKWGFCETEED 173\n", " G G+PC FPF+F Y CT+ GR DG WCATT +Y D+ GFC T D\n", "Sbjct 282 GNGDGKPCVFPFIFEGHSYSACTTKGRSDGYRWCATTANYDQDKADGFCPTRAD 335\n", "\n", "\n", "> CHEMBL321 [P14780] Matrix metalloproteinase-9 (Homo sapiens)\n", "Length=707\n", "\n", " Score = 79.7 bits (195), Expect = 1e-15, Method: Compositional matrix adjust.\n", " Identities = 33/56 (59%), Positives = 41/56 (73%), Gaps = 0/56 (0%)\n", "\n", "Query 116 TAIEGTAHGEPCHFPFLFLDKEYDECTSDGREDGRLWCATTYDYKTDEKWGFCETE 171\n", " T + G + GE C FPF FL KEY CTS+GR DGRLWCATT ++ +D+KWGFC +\n", "Sbjct 336 TVMGGNSAGELCVFPFTFLGKEYSTCTSEGRGDGRLWCATTSNFDSDKKWGFCPDQ 391\n", "\n", "\n", " Score = 73.6 bits (179), Expect = 1e-13, Method: Compositional matrix adjust.\n", " Identities = 32/69 (46%), Positives = 44/69 (64%), Gaps = 1/69 (1%)\n", "\n", "Query 105 EELKRVRKPVLTAIE-GTAHGEPCHFPFLFLDKEYDECTSDGREDGRLWCATTYDYKTDE 163\n", " +EL + K V+ G A G CHFPF+F + Y CT+DGR DG WC+TT +Y TD+\n", "Sbjct 207 DELWSLGKGVVVPTRFGNADGAACHFPFIFEGRSYSACTTDGRSDGLPWCSTTANYDTDD 266\n", "\n", "Query 164 KWGFCETEE 172\n", " ++GFC +E \n", "Sbjct 267 RFGFCPSER 
275\n", "\n", "\n", " Score = 72.8 bits (177), Expect = 2e-13, Method: Compositional matrix adjust.\n", " Identities = 29/57 (51%), Positives = 38/57 (67%), Gaps = 0/57 (0%)\n", "\n", "Query 119 EGTAHGEPCHFPFLFLDKEYDECTSDGREDGRLWCATTYDYKTDEKWGFCETEEDAA 175\n", " +G A G+PC FPF+F + Y CT+DGR DG WCATT +Y D+ +GFC T D+ \n", "Sbjct 280 DGNADGKPCQFPFIFQGQSYSACTTDGRSDGYRWCATTANYDRDKLFGFCPTRADST 336\n", "\n", "\n", "> CHEMBL2095216 [P14780] Matrix metalloproteinase-9 (Homo sapiens)\n", "Length=707\n", "\n", " Score = 79.7 bits (195), Expect = 1e-15, Method: Compositional matrix adjust.\n", " Identities = 33/56 (59%), Positives = 41/56 (73%), Gaps = 0/56 (0%)\n", "\n", "Query 116 TAIEGTAHGEPCHFPFLFLDKEYDECTSDGREDGRLWCATTYDYKTDEKWGFCETE 171\n", " T + G + GE C FPF FL KEY CTS+GR DGRLWCATT ++ +D+KWGFC +\n", "Sbjct 336 TVMGGNSAGELCVFPFTFLGKEYSTCTSEGRGDGRLWCATTSNFDSDKKWGFCPDQ 391\n", "\n", "\n", " Score = 73.6 bits (179), Expect = 1e-13, Method: Compositional matrix adjust.\n", " Identities = 32/69 (46%), Positives = 44/69 (64%), Gaps = 1/69 (1%)\n", "\n", "Query 105 EELKRVRKPVLTAIE-GTAHGEPCHFPFLFLDKEYDECTSDGREDGRLWCATTYDYKTDE 163\n", " +EL + K V+ G A G CHFPF+F + Y CT+DGR DG WC+TT +Y TD+\n", "Sbjct 207 DELWSLGKGVVVPTRFGNADGAACHFPFIFEGRSYSACTTDGRSDGLPWCSTTANYDTDD 266\n", "\n", "Query 164 KWGFCETEE 172\n", " ++GFC +E \n", "Sbjct 267 RFGFCPSER 275\n", "\n", "\n", " Score = 72.8 bits (177), Expect = 2e-13, Method: Compositional matrix adjust.\n", " Identities = 29/57 (51%), Positives = 38/57 (67%), Gaps = 0/57 (0%)\n", "\n", "Query 119 EGTAHGEPCHFPFLFLDKEYDECTSDGREDGRLWCATTYDYKTDEKWGFCETEEDAA 175\n", " +G A G+PC FPF+F + Y CT+DGR DG WCATT +Y D+ +GFC T D+ \n", "Sbjct 280 DGNADGKPCQFPFIFQGQSYSACTTDGRSDGYRWCATTANYDRDKLFGFCPTRADST 336\n", "\n", "\n", "> CHEMBL333 [P08253] 72 kDa type IV collagenase (Homo sapiens)\n", "Length=660\n", "\n", " Score = 77.0 bits (188), Expect = 8e-15, Method: Compositional matrix adjust.\n", " Identities = 29/55 (53%), Positives = 38/55 (69%), Gaps = 0/55 
(0%)\n", "\n", "Query 114 VLTAIEGTAHGEPCHFPFLFLDKEYDECTSDGREDGRLWCATTYDYKTDEKWGFC 168\n", " ++ + G + G PC FPF FL +Y+ CTS GR DG++WCATT +Y D KWGFC\n", "Sbjct 336 AMSTVGGNSEGAPCVFPFTFLGNKYESCTSAGRSDGKMWCATTANYDDDRKWGFC 390\n", "\n", "\n", " Score = 72.8 bits (177), Expect = 2e-13, Method: Compositional matrix adjust.\n", " Identities = 32/58 (55%), Positives = 39/58 (67%), Gaps = 0/58 (0%)\n", "\n", "Query 114 VLTAIEGTAHGEPCHFPFLFLDKEYDECTSDGREDGRLWCATTYDYKTDEKWGFCETE 171\n", " V+ G A GE C FPFLF KEY+ CT GR DG LWC+TTY+++ D K+GFC E\n", "Sbjct 220 VVRVKYGNADGEYCKFPFLFNGKEYNSCTDTGRSDGFLWCSTTYNFEKDGKYGFCPHE 277\n", "\n", "\n", " Score = 70.5 bits (171), Expect = 9e-13, Method: Compositional matrix adjust.\n", " Identities = 29/55 (53%), Positives = 35/55 (64%), Gaps = 0/55 (0%)\n", "\n", "Query 114 VLTAIEGTAHGEPCHFPFLFLDKEYDECTSDGREDGRLWCATTYDYKTDEKWGFC 168\n", " L + G A G+PC FPF F YD CT++GR DG WC TT DY D+K+GFC\n", "Sbjct 278 ALFTMGGNAEGQPCKFPFRFQGTSYDSCTTEGRTDGYRWCGTTEDYDRDKKYGFC 332\n", "\n", "\n", "\n", "Lambda K H a alpha\n", " 0.317 0.134 0.394 0.792 4.96 \n", "\n", "Gapped\n", "Lambda K H a alpha sigma\n", " 0.267 0.0410 0.140 1.90 42.6 43.6 \n", "\n", "Effective search space used: 2947914464\n", "\n", "\n", "Query= P33277_GAP1_SCHPO\n", "\n", "Length=766\n", " Score E\n", "Sequences producing significant alignments: (Bits) Value\n", "\n", " CHEMBL2176807 [Q04690] Neurofibromin (Mus musculus) 75.1 5e-14\n", " CHEMBL2176804 [Q9QUH6] Ras/Rap GTPase-activating protein SynGAP... 
51.2 9e-07\n", "\n", "\n", "> CHEMBL2176807 [Q04690] Neurofibromin (Mus musculus)\n", "Length=2841\n", "\n", " Score = 75.1 bits (183), Expect = 5e-14, Method: Compositional matrix adjust.\n", " Identities = 62/243 (26%), Positives = 109/243 (45%), Gaps = 53/243 (22%)\n", "\n", "Query 164 YESREEHLLLSLFQMVLTTEFEATSDVLSLLRANTPVSRMLTTYTRRGPGQAYLRSILYQ 223\n", " ++SR HLL L + + E E + +L R N+ S+++T + + G YL+ +L \n", "Sbjct 1249 FDSR--HLLYQLLWNMFSKEVELADSMQTLFRGNSLASKIMT-FCFKVYGATYLQKLLDP 1305\n", "\n", "Query 224 CINDVAIHPDLQLDIHPLSVYRYLVNTGQLSPSEDDNLLTNEEVSEFPAVKNAIQERSAQ 283\n", " + + D Q H + V+ +L PSE +++E \n", "Sbjct 1306 LLRVIITSSDWQ---H----VSFEVDPTRLEPSE------------------SLEENQRN 1340\n", "\n", "Query 284 LLLLTKRFLDAVLNSIDEIPYGIRWVCKLI---------------------RNLTNRLFP 322\n", " LL +T++F A+++S E P +R VC + +++ ++ FP\n", "Sbjct 1341 LLQMTEKFFHAIISSSSEFPSQLRSVCHCLYQATCHSLLNKATVKERKENKKSVVSQRFP 1400\n", "\n", "Query 323 SISDSTICSLIGGFFFLRFVNPAIISPQTSMLLDSCPSDNVRKTLATIAKIIQSVANGTS 382\n", " S +G FLRF+NPAI+SP + +LD P + + L ++K++QS+AN \n", "Sbjct 1401 QNS----IGAVGSAMFLRFINPAIVSPYEAGILDKKPPPRIERGLKLMSKVLQSIANHVL 1456\n", "\n", "Query 383 STK 385\n", " TK\n", "Sbjct 1457 FTK 1459\n", "\n", "\n", "> CHEMBL2176804 [Q9QUH6] Ras/Rap GTPase-activating protein SynGAP \n", "(Rattus norvegicus)\n", "Length=1308\n", "\n", " Score = 51.2 bits (121), Expect = 9e-07, Method: Compositional matrix adjust.\n", " Identities = 52/192 (27%), Positives = 84/192 (44%), Gaps = 8/192 (4%)\n", "\n", "Query 264 NEEVSEFPAVKNAIQERSAQLLLLTKRFLDAVLNSIDEIPYGIRWVCKLIR-NLTNRLFP 322\n", " N EV +++ E A L + + L V+NS P ++ V R R \n", "Sbjct 523 NCEVDPIKCTASSLAEHQANLRMCCELALCKVVNSHCVFPRELKEVFASWRLRCAERGRE 582\n", "\n", "Query 323 SISDSTICSLIGGFFFLRFVNPAIISPQTSMLLDSCPSDNVRKTLATIAKIIQSVANGTS 382\n", " I+D LI FLRF+ PAI+SP L+ P + +TL IAK+IQ++AN + \n", "Sbjct 583 DIADR----LISASLFLRFLCPAIMSPSLFGLMQEYPDEQTSRTLTLIAKVIQNLANFSK 638\n", "\n", "Query 383 STKTHLDVSFQPMLKEYE-EKVHNLLRKLGNVGDFFEALELDQYIALSKKSLALEMTVNE 441\n", " T 
+ F E E + L ++ N+ + + YI L ++ L + E\n", "Sbjct 639 FTSKEDFLGFMNEFLELEWGSMQQFLYEISNLDTLTNSSSFEGYIDLGRELSTLHALLWE 698\n", "\n", "Query 442 I--YLTHEIILE 451\n", " + L+ E +L+\n", "Sbjct 699 VLPQLSKEALLK 710\n", "\n", "\n", "\n", "Lambda K H a alpha\n", " 0.320 0.135 0.381 0.792 4.96 \n", "\n", "Gapped\n", "Lambda K H a alpha sigma\n", " 0.267 0.0410 0.140 1.90 42.6 43.6 \n", "\n", "Effective search space used: 2828634688\n", "\n", "\n", "Query= Q96PD4_IL17F_HUMAN\n", "\n", "Length=163\n", " Score E\n", "Sequences producing significant alignments: (Bits) Value\n", "\n", " CHEMBL3390822 [Q16552] Interleukin-17A (Homo sapiens) 125 9e-37\n", "\n", "\n", "> CHEMBL3390822 [Q16552] Interleukin-17A (Homo sapiens)\n", "Length=155\n", "\n", " Score = 125 bits (315), Expect = 9e-37, Method: Compositional matrix adjust.\n", " Identities = 61/108 (56%), Positives = 76/108 (70%), Gaps = 0/108 (0%)\n", "\n", "Query 55 MKLDIGIINENQRVSMSRNIESRSTSPWNYTVTWDPNRYPSEVVQAQCRNLGCINAQGKE 114\n", " + L+I N N S + +RSTSPWN DP RYPS + +A+CR+LGCINA G \n", "Sbjct 47 VNLNIHNRNTNTNPKRSSDYYNRSTSPWNLHRNEDPERYPSVIWEAKCRHLGCINADGNV 106\n", "\n", "Query 115 DISMNSVPIQQETLVVRRKHQGCSVSFQLEKVLVTVGCTCVTPVIHHV 162\n", " D MNSVPIQQE LV+RR+ C SF+LEK+LV+VGCTCVTP++HHV\n", "Sbjct 107 DYHMNSVPIQQEILVLRREPPHCPNSFRLEKILVSVGCTCVTPIVHHV 154\n", "\n", "\n", "\n", "Lambda K H a alpha\n", " 0.320 0.133 0.406 0.792 4.96 \n", "\n", "Gapped\n", "Lambda K H a alpha sigma\n", " 0.267 0.0410 0.140 1.90 42.6 43.6 \n", "\n", "Effective search space used: 338902872\n", "\n", "\n", "Query= P10144_GRAB_HUMAN\n", "\n", "Length=247\n", " Score E\n", "Sequences producing significant alignments: (Bits) Value\n", "\n", " CHEMBL2316 [P10144] Granzyme B (Homo sapiens) 519 0.0 \n", " CHEMBL5622 [P28293] Cathepsin G (Mus musculus) 270 2e-90\n", " CHEMBL4071 [P08311] Cathepsin G (Homo sapiens) 266 8e-89\n", " CHEMBL4068 [P23946] Chymase (Homo sapiens) 238 5e-78\n", " CHEMBL2132 [O35164] Mast cell protease 9 (Mus musculus) 209 1e-66\n", "\n", 
"\n", "> CHEMBL2316 [P10144] Granzyme B (Homo sapiens)\n", "Length=247\n", "\n", " Score = 519 bits (1337), Expect = 0.0, Method: Compositional matrix adjust.\n", " Identities = 247/247 (100%), Positives = 247/247 (100%), Gaps = 0/247 (0%)\n", "\n", "Query 1 MQPILLLLAFLLLPRADAGEIIGGHEAKPHSRPYMAYLMIWDQKSLKRCGGFLIRDDFVL 60\n", " MQPILLLLAFLLLPRADAGEIIGGHEAKPHSRPYMAYLMIWDQKSLKRCGGFLIRDDFVL\n", "Sbjct 1 MQPILLLLAFLLLPRADAGEIIGGHEAKPHSRPYMAYLMIWDQKSLKRCGGFLIRDDFVL 60\n", "\n", "Query 61 TAAHCWGSSINVTLGAHNIKEQEPTQQFIPVKRPIPHPAYNPKNFSNDIMLLQLERKAKR 120\n", " TAAHCWGSSINVTLGAHNIKEQEPTQQFIPVKRPIPHPAYNPKNFSNDIMLLQLERKAKR\n", "Sbjct 61 TAAHCWGSSINVTLGAHNIKEQEPTQQFIPVKRPIPHPAYNPKNFSNDIMLLQLERKAKR 120\n", "\n", "Query 121 TRAVQPLRLPSNKAQVKPGQTCSVAGWGQTAPLGKHSHTLQEVKMTVQEDRKCESDLRHY 180\n", " TRAVQPLRLPSNKAQVKPGQTCSVAGWGQTAPLGKHSHTLQEVKMTVQEDRKCESDLRHY\n", "Sbjct 121 TRAVQPLRLPSNKAQVKPGQTCSVAGWGQTAPLGKHSHTLQEVKMTVQEDRKCESDLRHY 180\n", "\n", "Query 181 YDSTIELCVGDPEIKKTSFKGDSGGPLVCNKVAQGIVSYGRNNGMPPRACTKVSSFVHWI 240\n", " YDSTIELCVGDPEIKKTSFKGDSGGPLVCNKVAQGIVSYGRNNGMPPRACTKVSSFVHWI\n", "Sbjct 181 YDSTIELCVGDPEIKKTSFKGDSGGPLVCNKVAQGIVSYGRNNGMPPRACTKVSSFVHWI 240\n", "\n", "Query 241 KKTMKRY 247\n", " KKTMKRY\n", "Sbjct 241 KKTMKRY 247\n", "\n", "\n", "> CHEMBL5622 [P28293] Cathepsin G (Mus musculus)\n", "Length=261\n", "\n", " Score = 270 bits (691), Expect = 2e-90, Method: Compositional matrix adjust.\n", " Identities = 142/247 (57%), Positives = 189/247 (77%), Gaps = 2/247 (1%)\n", "\n", "Query 1 MQPILLLLAFLLLPRADAGEIIGGHEAKPHSRPYMAYLMIWDQKSLKRCGGFLIRDDFVL 60\n", " MQP+LLLL F+LL +AG+IIGG EA+PHS PYMA+L+I + L CGGFL+R+DFVL\n", "Sbjct 1 MQPLLLLLTFILLQGDEAGKIIGGREARPHSYPYMAFLLIQSPEGLSACGGFLVREDFVL 60\n", "\n", "Query 61 TAAHCWGSSINVTLGAHNIKEQEPTQQFIPVKRPIPHPAYNPKNFSNDIMLLQLERKAKR 120\n", " TAAHC GSSINVTLGAHNI+ +E TQQ I V R I HP YNP+N NDIMLLQL R+A+R\n", "Sbjct 61 TAAHCLGSSINVTLGAHNIQMRERTQQLITVLRAIRHPDYNPQNIRNDIMLLQLRRRARR 120\n", "\n", "Query 121 
TRAVQPLRLPSNKAQVKPGQTCSVAGWGQTAPLGKHSHTLQEVKMTVQEDRKCESDLRHY 180\n", " + +V+P+ LP +++PG C+VAGWG+ + + ++ LQEV++ VQ D+ C + +\n", "Sbjct 121 SGSVKPVALPQASKKLQPGDLCTVAGWGRVSQ-SRGTNVLQEVQLRVQMDQMCANRF-QF 178\n", "\n", "Query 181 YDSTIELCVGDPEIKKTSFKGDSGGPLVCNKVAQGIVSYGRNNGMPPRACTKVSSFVHWI 240\n", " Y+S ++CVG+P +K++F+GDSGGPLVC+ VAQGIVSYG NNG PP TK+ SF+ WI\n", "Sbjct 179 YNSQTQICVGNPRERKSAFRGDSGGPLVCSNVAQGIVSYGSNNGNPPAVFTKIQSFMPWI 238\n", "\n", "Query 241 KKTMKRY 247\n", " K+TM+R+\n", "Sbjct 239 KRTMRRF 245\n", "\n", "\n", "> CHEMBL4071 [P08311] Cathepsin G (Homo sapiens)\n", "Length=255\n", "\n", " Score = 266 bits (679), Expect = 8e-89, Method: Compositional matrix adjust.\n", " Identities = 143/249 (57%), Positives = 185/249 (74%), Gaps = 6/249 (2%)\n", "\n", "Query 1 MQPILLLLAFLLLPRADAGEIIGGHEAKPHSRPYMAYLMIWDQKSLKRCGGFLIRDDFVL 60\n", " MQP+LLLLAFLL A+AGEIIGG E++PHSRPYMAYL I RCGGFL+R+DFVL\n", "Sbjct 1 MQPLLLLLAFLLPTGAEAGEIIGGRESRPHSRPYMAYLQIQSPAGQSRCGGFLVREDFVL 60\n", "\n", "Query 61 TAAHCWGSSINVTLGAHNIKEQEPTQQFIPVKRPIPHPAYNPKNFSNDIMLLQLERKAKR 120\n", " TAAHCWGS+INVTLGAHNI+ +E TQQ I +R I HP YN + NDIMLLQL R+ +R\n", "Sbjct 61 TAAHCWGSNINVTLGAHNIQRRENTQQHITARRAIRHPQYNQRTIQNDIMLLQLSRRVRR 120\n", "\n", "Query 121 TRAVQPLRLPSNKAQVKPGQTCSVAGWGQTAPLGKHSHTLQEVKMTVQEDRKCESDLRHY 180\n", " R V P+ LP + ++PG C+VAGWG+ + + + + TL+EV++ VQ DR+C LR +\n", "Sbjct 121 NRNVNPVALPRAQEGLRPGTLCTVAGWGRVS-MRRGTDTLREVQLRVQRDRQC---LRIF 176\n", "\n", "Query 181 --YDSTIELCVGDPEIKKTSFKGDSGGPLVCNKVAQGIVSYGRNNGMPPRACTKVSSFVH 238\n", " YD ++CVGD +K +FKGDSGGPL+CN VA GIVSYG+++G+PP T+VSSF+ \n", "Sbjct 177 GSYDPRRQICVGDRRERKAAFKGDSGGPLLCNNVAHGIVSYGKSSGVPPEVFTRVSSFLP 236\n", "\n", "Query 239 WIKKTMKRY 247\n", " WI+ TM+ +\n", "Sbjct 237 WIRTTMRSF 245\n", "\n", "\n", "> CHEMBL4068 [P23946] Chymase (Homo sapiens)\n", "Length=247\n", "\n", " Score = 238 bits (607), Expect = 5e-78, Method: Compositional matrix adjust.\n", " Identities = 125/232 (54%), Positives = 156/232 (67%), Gaps = 3/232 (1%)\n", "\n", "Query 
15 RADAGEIIGGHEAKPHSRPYMAYL-MIWDQKSLKRCGGFLIRDDFVLTAAHCWGSSINVT 73\n", " RA+AGEIIGG E KPHSRPYMAYL ++ K CGGFLIR +FVLTAAHC G SI VT\n", "Sbjct 16 RAEAGEIIGGTECKPHSRPYMAYLEIVTSNGPSKFCGGFLIRRNFVLTAAHCAGRSITVT 75\n", "\n", "Query 74 LGAHNIKEQEPTQQFIPVKRPIPHPAYNPKNFSNDIMLLQLERKAKRTRAVQPLRLPSNK 133\n", " LGAHNI E+E T Q + V + HP YN +DIMLL+L+ KA T AV L PS \n", "Sbjct 76 LGAHNITEEEDTWQKLEVIKQFRHPKYNTSTLHHDIMLLKLKEKASLTLAVGTLPFPSQF 135\n", "\n", "Query 134 AQVKPGQTCSVAGWGQTAPLGKHSHTLQEVKMTVQEDRKCESDLRHYYDSTIELCVGDPE 193\n", " V PG+ C VAGWG+T L S TLQEVK+ + + + C S R +D ++LCVG+P \n", "Sbjct 136 NFVPPGRMCRVAGWGRTGVLKPGSDTLQEVKLRLMDPQAC-SHFRD-FDHNLQLCVGNPR 193\n", "\n", "Query 194 IKKTSFKGDSGGPLVCNKVAQGIVSYGRNNGMPPRACTKVSSFVHWIKKTMK 245\n", " K++FKGDSGGPL+C VAQGIVSYGR++ PP T++S + WI + ++\n", "Sbjct 194 KTKSAFKGDSGGPLLCAGVAQGIVSYGRSDAKPPAVFTRISHYRPWINQILQ 245\n", "\n", "\n", "> CHEMBL2132 [O35164] Mast cell protease 9 (Mus musculus)\n", "Length=246\n", "\n", " Score = 209 bits (531), Expect = 1e-66, Method: Compositional matrix adjust.\n", " Identities = 111/232 (48%), Positives = 150/232 (65%), Gaps = 3/232 (1%)\n", "\n", "Query 15 RADAGEIIGGHEAKPHSRPYMAYLMIWDQKS-LKRCGGFLIRDDFVLTAAHCWGSSINVT 73\n", " RA A EIIGG E++PHSRPYMAY+ + +K + CGGFLI FV+TAAHC G + VT\n", "Sbjct 15 RAGAEEIIGGVESEPHSRPYMAYVNTFSKKGYVAICGGFLIAPQFVMTAAHCSGRRMTVT 74\n", "\n", "Query 74 LGAHNIKEQEPTQQFIPVKRPIPHPAYNPKNFSNDIMLLQLERKAKRTRAVQPLRLPSNK 133\n", " LGAHN++++E TQQ I V++ I P YN + NDI+LL+L+++A T AV + LP \n", "Sbjct 75 LGAHNVRKRECTQQKIKVEKYILPPNYNVSSKFNDIVLLKLKKQANLTSAVDVVPLPGPS 134\n", "\n", "Query 134 AQVKPGQTCSVAGWGQTAPLGKHSHTLQEVKMTVQEDRKCESDLRHYYDSTIELCVGDPE 193\n", " KPG C AGWG+T SHTL+EV++ + ++ C+ RHY DS +++CVG \n", "Sbjct 135 DFAKPGTMCWAAGWGRTGVKKSISHTLREVELKIVGEKACK-IFRHYKDS-LQICVGSST 192\n", "\n", "Query 194 IKKTSFKGDSGGPLVCNKVAQGIVSYGRNNGMPPRACTKVSSFVHWIKKTMK 245\n", " + + GDSGGPL+C VA GIVS GR N PP T++S V WI + +K\n", "Sbjct 193 KVASVYMGDSGGPLLCAGVAHGIVSSGRGNAKPPAIFTRISPHVPWINRVIK 244\n", "\n", "\n", 
"\n", "Lambda K H a alpha\n", " 0.320 0.136 0.425 0.792 4.96 \n", "\n", "Gapped\n", "Lambda K H a alpha sigma\n", " 0.267 0.0410 0.140 1.90 42.6 43.6 \n", "\n", "Effective search space used: 679717896\n", "\n", "\n", " Database: chembl_21.fa\n", " Posted date: Jun 14, 2016 2:00 PM\n", " Number of letters in database: 5,161,060\n", " Number of sequences in database: 8,834\n", "\n", "\n", "\n", "Matrix: BLOSUM62\n", "Gap Penalties: Existence: 11, Extension: 1\n", "Neighboring words threshold: 11\n", "Window for multiple hits: 40\n" ] } ], "source": [ "# So lets try and run the 'raw' commandline version\n", "!$blast_exe -query $query_file -db $database -evalue $eval_threshold -num_descriptions $num_descriptions -num_alignments $num_alignments\n", "\n", "# Stdout should be printed below:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### That's great, but is there a way to run the command above programmatically?\n", "\n", "Yes.\n", "\n", "The output above can described as 'classic' BLAST output and though it is fairly easy to read, can be quite tricky to parse. To help us, there are many programming language specific libraries which allow you to run BLAST searches and easily parse the output, for example [BioPerl](http://www.bioperl.org), [BioJava](http://biojava.org) and [BioRuby](http://bioruby.org/). Do not worry, as we are working in a Python environment we have the [Biopython](http://biopython.org) library to our disposal, so lets get started. 
We can wrap a command-line BLAST search with Biopython's NcbiblastpCommandline class:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from Bio.Blast.Applications import NcbiblastpCommandline\n", "\n", "# The outfmt=5 value creates an XML formatted file\n", "blastp_cmd = NcbiblastpCommandline(cmd=blast_exe, query=query_file, db=database, outfmt=5, out=results_xml, evalue=eval_threshold)\n", "stdout, stderr = blastp_cmd()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now parse the BLAST result file and create BLAST record objects. This gives us access to the underlying BLAST result values, using the [Alignment](http://biopython.org/DIST/docs/api/Bio.Blast.Record.Alignment-class.html) and [HSP](http://biopython.org/DIST/docs/api/Bio.Blast.Record.HSP-class.html) classes:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "# Result 1 #\n", "Sequence: gnl|BL_ORD_ID|7974 CHEMBL2150840 [Q96P68] 2-oxoglutarate receptor 1 (Homo sapiens)\n", "Length: 337\n", "E-Value: 0.0\n", "Score: 1775.0\n", "Identities: 337\n", "MNEPLDYLANASDFPDYAAAFGNCTDENIPLKMHYLPVIYGIIFLVGFPGNAVVISTYIFKMRPWKSSTIIMLNL...\n", "MNEPLDYLANASDFPDYAAAFGNCTDENIPLKMHYLPVIYGIIFLVGFPGNAVVISTYIFKMRPWKSSTIIMLNL...\n", "MNEPLDYLANASDFPDYAAAFGNCTDENIPLKMHYLPVIYGIIFLVGFPGNAVVISTYIFKMRPWKSSTIIMLNL...\n", "\n", "# Result 2 #\n", "Sequence: gnl|BL_ORD_ID|1266 CHEMBL2325 [Q6Y1R5] 2-oxoglutarate receptor 1 (Rattus norvegicus)\n", "Length: 337\n", "E-Value: 0.0\n", "Score: 1492.0\n", "Identities: 289\n", "MNEPLDYLANASDFPDYAAAFGNCTDENIPLKMHYLPVIYGIIFLVGFPGNAVVISTYIFKMRPWKSSTIIMLNL...\n", "M E LD AN SDF DY A NCTDE I KM YLPVIY IIFLVGFPGN V IS Y+FKMRPWKSSTIIMLNL...\n", "MIETLDSPANDSDFLDYITALENCTDEQISFKMQYLPVIYSIIFLVGFPGNTVAISIYVFKMRPWKSSTIIMLNL...\n", "\n", "# Result 3 #\n", "Sequence: gnl|BL_ORD_ID|3367 CHEMBL4315 [P47900] P2Y 
purinoceptor 1 (Homo sapiens)\n", "Length: 373\n", "E-Value: 9.48885e-67\n", "Score: 550.0\n", "Identities: 108\n", "NCTDENIPLKMHYLPVIYGIIFLVGFPGNAVVISTYIFKMRPWKSSTIIMLNLACTDLLYLTSLPFLIHYYASGE...\n", " C + +YLP +Y ++F++GF GN+V I ++F M+PW ++ M NLA D LY+ +LP LI YY + ...\n", "KCALTKTGFQFYYLPAVYILVFIIGFLGNSVAIWMFVFHMKPWSGISVYMFNLALADFLYVLTLPALIFYYFNKT...\n", "\n", "# Result 4 #\n", "Sequence: gnl|BL_ORD_ID|5348 CHEMBL5720 [P49652] P2Y purinoceptor 1 (Meleagris gallopavo)\n", "Length: 362\n", "E-Value: 1.43544e-66\n", "Score: 548.0\n", "Identities: 106\n", "NCTDENIPLKMHYLPVIYGIIFLVGFPGNAVVISTYIFKMRPWKSSTIIMLNLACTDLLYLTSLPFLIHYYASGE...\n", " C+ + +YLP +Y ++F+ GF GN+V I ++F MRPW ++ M NLA D LY+ +LP LI YY + ...\n", "KCSLTKTGFQFYYLPTVYILVFITGFLGNSVAIWMFVFHMRPWSGISVYMFNLALADFLYVLTLPALIFYYFNKT...\n", "\n", "# Result 5 #\n", "Sequence: gnl|BL_ORD_ID|1748 CHEMBL2497 [P49651] P2Y purinoceptor 1 (Rattus norvegicus)\n", "Length: 373\n", "E-Value: 2.52191e-66\n", "Score: 548.0\n", "Identities: 108\n", "NCTDENIPLKMHYLPVIYGIIFLVGFPGNAVVISTYIFKMRPWKSSTIIMLNLACTDLLYLTSLPFLIHYYASGE...\n", " C + +YLP +Y ++F++GF GN+V I ++F M+PW ++ M NLA D LY+ +LP LI YY + ...\n", "RCALIKTGFQFYYLPAVYILVFIIGFLGNSVAIWMFVFHMKPWSGISVYMFNLALADFLYVLTLPALIFYYFNKT...\n" ] } ], "source": [ "from Bio.Blast import NCBIXML\n", "result_handle = open(results_xml)\n", "blast_records = NCBIXML.parse(result_handle)\n", "\n", "E_VALUE_THRESH = 0.04\n", "result_counter = 0\n", "\n", "# Print the HSPs of the first five alignments\n", "for blast_record in blast_records:\n", "    for alignment in blast_record.alignments:\n", "        result_counter += 1\n", "        for hsp in alignment.hsps:\n", "            if result_counter <= 5:\n", "                print('\\n# Result %d #' % result_counter)\n", "                print('Sequence: ' + alignment.title)\n", "                print('Length: %s' % alignment.length)\n", "                print('E-Value: %s' % hsp.expect)\n", "                print('Score: %s' % hsp.score)\n", "                print('Identities: %s' % hsp.identities)\n", "                print(hsp.query[0:75] + '...')\n", "                print(hsp.match[0:75] + '...')\n", "                print(hsp.sbjct[0:75] + '...')\n", "\n", "result_handle.close()" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "### BLAST Data Processing with Pandas\n", "\n", "Biopython is great and provides you with lots of additional functionality, but for the purpose of this tutorial we will now turn our attention to processing BLAST data using [pandas](http://pandas.pydata.org/). To get started we need to turn our BLAST output into a 'tabular' format. Fortunately we can create a CSV BLAST results file, so let's create this now (note that the BLAST CSV output does not include the BLAST alignments):" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Create a BLAST output file in CSV format so that it can easily be loaded by pandas\n", "# The outfmt=10 value creates a CSV formatted file\n", "!$blast_exe -query $query_file -db $database -outfmt 10 -out $results_csv -evalue $eval_threshold" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now load the BLAST results into a pandas dataframe. You should be able to map the result values (e.g. length, identity, e-value, ...) in the table below to the earlier 'classic' and Biopython BLAST results:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " query chembl_target_id identity length mismatch gapopen \\\n", "0 Q96P68_OXGR1_HUMAN CHEMBL2150840 100.00 337 0 0 \n", "1 Q96P68_OXGR1_HUMAN CHEMBL2325 85.76 337 48 0 \n", "2 Q96P68_OXGR1_HUMAN CHEMBL4315 36.00 300 188 2 \n", "3 Q96P68_OXGR1_HUMAN CHEMBL5720 35.33 300 190 2 \n", "4 Q96P68_OXGR1_HUMAN CHEMBL2497 36.00 300 188 2 \n", ".. ... ... ... ... ... ... \n", "775 P10144_GRAB_HUMAN CHEMBL1075308 27.61 268 160 13 \n", "776 P10144_GRAB_HUMAN CHEMBL3078 28.03 239 140 11 \n", "777 P10144_GRAB_HUMAN CHEMBL2040703 27.71 231 134 12 \n", "778 P10144_GRAB_HUMAN CHEMBL5731 25.94 239 120 13 \n", "779 P10144_GRAB_HUMAN CHEMBL3929 24.38 160 113 4 \n", "\n", " qstart qend sstart send evalue bitscore \n", "0 1 337 1 337 0.000000e+00 688.0 \n", "1 1 337 1 337 0.000000e+00 579.0 \n", "2 23 318 41 340 9.000000e-67 216.0 \n", "3 23 318 30 329 1.000000e-66 215.0 \n", "4 23 318 41 340 3.000000e-66 215.0 \n", ".. ... ... ... ... ... ... \n", "775 12 247 352 617 1.000000e-12 65.9 \n", "776 12 220 351 587 1.000000e-11 62.8 \n", "777 49 247 5 234 1.000000e-11 61.2 \n", "778 30 220 491 720 1.000000e-09 57.0 \n", "779 53 206 1 158 4.000000e-07 47.4 \n", "\n", "[780 rows x 12 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Now load BLAST information into pandas dataframe\n", "import pandas\n", "from pandas import DataFrame, read_csv, merge\n", "from pandas.io import sql\n", "from pandas.io.sql import read_sql\n", "# Limit the default pandas dataframe size \n", "pandas.set_option('display.max_rows', 10)\n", "\n", "# Setup database connection to local ChEMBL instance\n", "import psycopg2\n", "con = psycopg2.connect(port=5432, user='chembl', dbname='chembl_21')\n", "\n", "Location = results_csv\n", "blast_df = read_csv(Location, names=['query', 'chembl_target_id', 'identity', 'length', 'mismatch', 'gapopen', 'qstart', 'qend', 'sstart', 'send', 'evalue', 'bitscore'])\n", "blast_df" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "We have not done anything new yet, just presented the BLAST results in yet another format, so what next?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating a basic Druggability Score and linking this score to a BLAST search" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The benefit of using a pandas dataframe is that it makes it very easy for us to join the BLAST resultset to another pandas dataframe resultset, in a similar way to how you might join 2 or more tables in an SQL query. So lets create some additional dataframes.\n", "\n", "\n", "First, lets get some additional about the ChEMBL targets, such as names, organism:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " chembl_target_id pref_name \\\n", "0 CHEMBL2074 Maltase-glucoamylase \n", "1 CHEMBL1971 Sulfonylurea receptor 2 \n", "2 CHEMBL1827 Phosphodiesterase 5A \n", "3 CHEMBL1859 Voltage-gated T-type calcium channel alpha-1H ... \n", "4 CHEMBL1884 Nicotinic acetylcholine receptor alpha subunit \n", "... ... ... \n", "11014 CHEMBL3559688 Frizzled-7 \n", "11015 CHEMBL3559689 Frizzled-8 \n", "11016 CHEMBL3559691 Cyclin-dependent kinase \n", "11017 CHEMBL3559701 Proto-oncogene Mas \n", "11018 CHEMBL3559703 PI3-kinase class I \n", "\n", " organism tax_id tid target_type \n", "0 Homo sapiens 9606.0 1 SINGLE PROTEIN \n", "1 Homo sapiens 9606.0 2 SINGLE PROTEIN \n", "2 Homo sapiens 9606.0 3 SINGLE PROTEIN \n", "3 Homo sapiens 9606.0 4 SINGLE PROTEIN \n", "4 Ascaris suum 6253.0 5 SINGLE PROTEIN \n", "... ... ... ... ... \n", "11014 Homo sapiens 9606.0 109743 SINGLE PROTEIN \n", "11015 Homo sapiens 9606.0 109744 SINGLE PROTEIN \n", "11016 Homo sapiens 9606.0 109746 PROTEIN FAMILY \n", "11017 Homo sapiens 9606.0 109748 SINGLE PROTEIN \n", "11018 Homo sapiens 9606.0 109750 PROTEIN COMPLEX GROUP \n", "\n", "[11019 rows x 6 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Select additional target information from the target_dictionary table \n", "sql1 = \"\"\"\n", "select td.chembl_id as chembl_target_id,\n", " td.pref_name,\n", " td.organism,\n", " td.tax_id,\n", " td.tid,\n", " td.target_type\n", " from target_dictionary td\n", "\"\"\"\n", "\n", "chembl_target_df = read_sql(sql1, con)\n", "chembl_target_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, let's use the ChEMBL database to get a count of FDA-approved drugs that bind each of the targets in the database:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " chembl_target_id drug_count\n", "0 CHEMBL3004 7\n", "1 CHEMBL3829 8\n", "2 CHEMBL2658 1\n", "3 CHEMBL5464 6\n", "4 CHEMBL3563 1\n", "... ... ...\n", "1145 CHEMBL3350222 2\n", "1146 CHEMBL1697657 2\n", "1147 CHEMBL281 99\n", "1148 CHEMBL1667701 1\n", "1149 CHEMBL3994 2\n", "\n", "[1150 rows x 2 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# We can traverse the ChEMBL activities table to get the count of FDA approved molecules,\n", "# which bind ChEMBL targets with high affinity\n", "sql2 = \"\"\"\n", "select t.chembl_id as chembl_target_id,\n", " count(m.chembl_id) as drug_count\n", " from activities a,\n", " assays s,\n", " target_dictionary t,\n", " molecule_dictionary m\n", " where a.assay_id=s.assay_id\n", " and s.tid=t.tid\n", " and m.molregno=a.molregno\n", " and a.pchembl_value >= 6\n", " and s.confidence_score >= 8\n", " and m.max_phase = 4\n", " and m.therapeutic_flag=1\n", "group by t.chembl_id\n", "\"\"\"\n", "\n", "chembl_drug_df = read_sql(sql2, con)\n", "chembl_drug_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use the ChEMBL database again to get a count of 'drug-like' molecules that bind each of the targets in the database:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
chembl_target_iddruglike_count
0CHEMBL26588
1CHEMBL546470
2CHEMBL129324624
3CHEMBL107514089
4CHEMBL47221051
.........
3048CHEMBL502376
3049CHEMBL280385
3050CHEMBL553315
3051CHEMBL58591
3052CHEMBL598221
\n", "

3053 rows × 2 columns

\n", "
" ], "text/plain": [ " chembl_target_id druglike_count\n", "0 CHEMBL2658 8\n", "1 CHEMBL5464 70\n", "2 CHEMBL1293246 24\n", "3 CHEMBL1075140 89\n", "4 CHEMBL4722 1051\n", "... ... ...\n", "3048 CHEMBL5023 76\n", "3049 CHEMBL2803 85\n", "3050 CHEMBL5533 15\n", "3051 CHEMBL5859 1\n", "3052 CHEMBL5982 21\n", "\n", "[3053 rows x 2 columns]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Similar to the previous query, but this time get the count of 'drug-like' compounds (defined by\n", "# having no rule-of-5 violations), which bind ChEMBL targets with high affinity \n", "sql3 = \"\"\"\n", "select t.chembl_id as chembl_target_id,\n", " count(m.chembl_id) as druglike_count\n", " from activities a,\n", " assays s,\n", " target_dictionary t,\n", " molecule_dictionary m,\n", " compound_properties p\n", " where a.assay_id=s.assay_id\n", " and s.tid=t.tid\n", " and m.molregno=a.molregno\n", " and m.molregno=p.molregno\n", " and a.pchembl_value >= 6\n", " and s.confidence_score >= 8\n", " and p.num_ro5_violations=0\n", "group by t.chembl_id\n", "\"\"\"\n", "\n", "chembl_druglike_df = read_sql(sql3, con)\n", "chembl_druglike_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, let's get the list of known drug-target interactions from the ChEMBL [Mechanism of Action](http://en.wikipedia.org/wiki/Mechanism_of_action) tables, as not all interactions will be captured in the activities table:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
chembl_target_idmoa_count
0CHEMBL10753191
1CHEMBL11695961
2CHEMBL11696001
3CHEMBL12504172
4CHEMBL12933161
.........
714CHEMBL59362
715CHEMBL59718
716CHEMBL60071
717CHEMBL61204
718CHEMBL61389710
\n", "

719 rows × 2 columns

\n", "
" ], "text/plain": [ " chembl_target_id moa_count\n", "0 CHEMBL1075319 1\n", "1 CHEMBL1169596 1\n", "2 CHEMBL1169600 1\n", "3 CHEMBL1250417 2\n", "4 CHEMBL1293316 1\n", ".. ... ...\n", "714 CHEMBL5936 2\n", "715 CHEMBL5971 8\n", "716 CHEMBL6007 1\n", "717 CHEMBL6120 4\n", "718 CHEMBL613897 10\n", "\n", "[719 rows x 2 columns]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get the count of molecules associated with a ChEMBL target through a known Mechanism of Action\n", "sql4 = \"\"\"\n", "select td.chembl_id as chembl_target_id,\n", " count(distinct dm.molregno) as moa_count\n", " from drug_mechanism dm, \n", " target_dictionary td\n", "where dm.tid=td.tid\n", "group by td.chembl_id\n", "\"\"\"\n", "\n", "chembl_moa_df = read_sql(sql4, con)\n", "chembl_moa_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now have 5 resultsets:\n", "\n", "1. The BLAST results (blast_df)\n", "2. The additional ChEMBL Target information (chembl_target_df)\n", "3. The FDA approved drug binding counts (chembl_drug_df)\n", "4. The 'drug-like' binding counts (chembl_druglike_df)\n", "5. The Mechanism-of-Action molecule counts (chembl_moa_df)\n", "\n", "So we can now think about merging the resultsets together. By planned good fortune, each of the resultsets shares the attribute 'chembl_target_id', so let's use that to merge:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
querychembl_target_idpref_nameorganismlengthevalueidentitybitscoremoa_countdrug_countdruglike_count
0Q96P68_OXGR1_HUMANCHEMBL21508402-oxoglutarate receptor 1Homo sapiens3370.000000e+00100.00688.00.00.00.0
1Q96P68_OXGR1_HUMANCHEMBL2325G protein-coupled receptor 80Rattus norvegicus3370.000000e+0085.76579.00.00.037.0
2Q96P68_OXGR1_HUMANCHEMBL4315Purinergic receptor P2Y1Homo sapiens3009.000000e-6736.00216.00.00.030.0
3Q96P68_OXGR1_HUMANCHEMBL5720P2Y purinoceptor 1Meleagris gallopavo3001.000000e-6635.33215.00.00.00.0
4Q96P68_OXGR1_HUMANCHEMBL2497Purinergic receptor P2Y1Rattus norvegicus3003.000000e-6636.00215.00.00.00.0
....................................
775P10144_GRAB_HUMANCHEMBL1075308ThrombinMus musculus2681.000000e-1227.6165.90.00.00.0
776P10144_GRAB_HUMANCHEMBL3078ThrombinRattus norvegicus2391.000000e-1128.0362.80.00.08.0
777P10144_GRAB_HUMANCHEMBL2040703ThrombinOryctolagus cuniculus2311.000000e-1127.7161.20.00.00.0
778P10144_GRAB_HUMANCHEMBL5731Complement factor BHomo sapiens2391.000000e-0925.9457.00.00.00.0
779P10144_GRAB_HUMANCHEMBL3929Coagulation factor XCanis lupus familiaris1604.000000e-0724.3847.40.00.00.0
\n", "

780 rows × 11 columns

\n", "
" ], "text/plain": [ " query chembl_target_id pref_name \\\n", "0 Q96P68_OXGR1_HUMAN CHEMBL2150840 2-oxoglutarate receptor 1 \n", "1 Q96P68_OXGR1_HUMAN CHEMBL2325 G protein-coupled receptor 80 \n", "2 Q96P68_OXGR1_HUMAN CHEMBL4315 Purinergic receptor P2Y1 \n", "3 Q96P68_OXGR1_HUMAN CHEMBL5720 P2Y purinoceptor 1 \n", "4 Q96P68_OXGR1_HUMAN CHEMBL2497 Purinergic receptor P2Y1 \n", ".. ... ... ... \n", "775 P10144_GRAB_HUMAN CHEMBL1075308 Thrombin \n", "776 P10144_GRAB_HUMAN CHEMBL3078 Thrombin \n", "777 P10144_GRAB_HUMAN CHEMBL2040703 Thrombin \n", "778 P10144_GRAB_HUMAN CHEMBL5731 Complement factor B \n", "779 P10144_GRAB_HUMAN CHEMBL3929 Coagulation factor X \n", "\n", " organism length evalue identity bitscore \\\n", "0 Homo sapiens 337 0.000000e+00 100.00 688.0 \n", "1 Rattus norvegicus 337 0.000000e+00 85.76 579.0 \n", "2 Homo sapiens 300 9.000000e-67 36.00 216.0 \n", "3 Meleagris gallopavo 300 1.000000e-66 35.33 215.0 \n", "4 Rattus norvegicus 300 3.000000e-66 36.00 215.0 \n", ".. ... ... ... ... ... \n", "775 Mus musculus 268 1.000000e-12 27.61 65.9 \n", "776 Rattus norvegicus 239 1.000000e-11 28.03 62.8 \n", "777 Oryctolagus cuniculus 231 1.000000e-11 27.71 61.2 \n", "778 Homo sapiens 239 1.000000e-09 25.94 57.0 \n", "779 Canis lupus familiaris 160 4.000000e-07 24.38 47.4 \n", "\n", " moa_count drug_count druglike_count \n", "0 0.0 0.0 0.0 \n", "1 0.0 0.0 37.0 \n", "2 0.0 0.0 30.0 \n", "3 0.0 0.0 0.0 \n", "4 0.0 0.0 0.0 \n", ".. ... ... ... 
\n", "775 0.0 0.0 0.0 \n", "776 0.0 0.0 8.0 \n", "777 0.0 0.0 0.0 \n", "778 0.0 0.0 0.0 \n", "779 0.0 0.0 0.0 \n", "\n", "[780 rows x 11 columns]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Carry out the merge and also only return columns we are interested in\n", "rs_merge_df = merge(blast_df, \n", " chembl_target_df, how='left', on='chembl_target_id' ).merge(\n", " chembl_drug_df, how='left', on='chembl_target_id' ).merge(\n", " chembl_druglike_df, how='left', on='chembl_target_id' ).merge(\n", " chembl_moa_df, how='left', on='chembl_target_id')[[\n", " 'query', 'chembl_target_id','pref_name', 'organism', 'length', 'evalue', 'identity', 'bitscore', 'moa_count', 'drug_count', 'druglike_count' \n", " ]].fillna(0)\n", "\n", "rs_merge_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So let's create a **really simple** score, based on the information we have, to predict whether a target is likely to be druggable:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def druggability_score(query_sequence_length, align_length, identity, moa_count, drug_count, druglike_count):\n", "\n", "    # Cast to float so the divisions below are not truncated under Python 2\n", "    align_length = float(align_length)\n", "    identity = float(identity)\n", "\n", "    # Each term is (alignment coverage of the query) * (fractional identity),\n", "    # switched on only if the target has at least one molecule of that type\n", "    moa_score = (align_length/query_sequence_length) * (identity/100) * (1 if (moa_count > 0) else 0)\n", "    drug_score = (align_length/query_sequence_length) * (identity/100) * (1 if (drug_count > 0) else 0) * 0.8\n", "    druglike_score = (align_length/query_sequence_length) * (identity/100) * (1 if (druglike_count > 0) else 0) * 0.5\n", "    total_score = round((moa_score + drug_score + druglike_score),2)\n", "\n", "    # Cap the combined score at 1\n", "    return (1 if (total_score > 1) else total_score)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The cryptic 0.8 and 0.5 values are there to down-weight the contribution of the drug_count and druglike_count values (I said it was simple). 
It is also assumed that the mechanism of action information is a fact, i.e. it is known that this target binds one or more drugs, so no down-weighting is applied. \n", "\n", "So let's add this new druggability score column to the results table:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
querychembl_target_idpref_nameorganismlengthevalueidentitybitscoremoa_countdrug_countdruglike_countdruggability_score
0Q96P68_OXGR1_HUMANCHEMBL21508402-oxoglutarate receptor 1Homo sapiens3370.000000e+00100.00688.00.00.00.00.00
1Q96P68_OXGR1_HUMANCHEMBL2325G protein-coupled receptor 80Rattus norvegicus3370.000000e+0085.76579.00.00.037.00.43
2Q96P68_OXGR1_HUMANCHEMBL4315Purinergic receptor P2Y1Homo sapiens3009.000000e-6736.00216.00.00.030.00.16
3Q96P68_OXGR1_HUMANCHEMBL5720P2Y purinoceptor 1Meleagris gallopavo3001.000000e-6635.33215.00.00.00.00.00
4Q96P68_OXGR1_HUMANCHEMBL2497Purinergic receptor P2Y1Rattus norvegicus3003.000000e-6636.00215.00.00.00.00.00
.......................................
775P10144_GRAB_HUMANCHEMBL1075308ThrombinMus musculus2681.000000e-1227.6165.90.00.00.00.00
776P10144_GRAB_HUMANCHEMBL3078ThrombinRattus norvegicus2391.000000e-1128.0362.80.00.08.00.14
777P10144_GRAB_HUMANCHEMBL2040703ThrombinOryctolagus cuniculus2311.000000e-1127.7161.20.00.00.00.00
778P10144_GRAB_HUMANCHEMBL5731Complement factor BHomo sapiens2391.000000e-0925.9457.00.00.00.00.00
779P10144_GRAB_HUMANCHEMBL3929Coagulation factor XCanis lupus familiaris1604.000000e-0724.3847.40.00.00.00.00
\n", "

780 rows × 12 columns

\n", "
" ], "text/plain": [ " query chembl_target_id pref_name \\\n", "0 Q96P68_OXGR1_HUMAN CHEMBL2150840 2-oxoglutarate receptor 1 \n", "1 Q96P68_OXGR1_HUMAN CHEMBL2325 G protein-coupled receptor 80 \n", "2 Q96P68_OXGR1_HUMAN CHEMBL4315 Purinergic receptor P2Y1 \n", "3 Q96P68_OXGR1_HUMAN CHEMBL5720 P2Y purinoceptor 1 \n", "4 Q96P68_OXGR1_HUMAN CHEMBL2497 Purinergic receptor P2Y1 \n", ".. ... ... ... \n", "775 P10144_GRAB_HUMAN CHEMBL1075308 Thrombin \n", "776 P10144_GRAB_HUMAN CHEMBL3078 Thrombin \n", "777 P10144_GRAB_HUMAN CHEMBL2040703 Thrombin \n", "778 P10144_GRAB_HUMAN CHEMBL5731 Complement factor B \n", "779 P10144_GRAB_HUMAN CHEMBL3929 Coagulation factor X \n", "\n", " organism length evalue identity bitscore \\\n", "0 Homo sapiens 337 0.000000e+00 100.00 688.0 \n", "1 Rattus norvegicus 337 0.000000e+00 85.76 579.0 \n", "2 Homo sapiens 300 9.000000e-67 36.00 216.0 \n", "3 Meleagris gallopavo 300 1.000000e-66 35.33 215.0 \n", "4 Rattus norvegicus 300 3.000000e-66 36.00 215.0 \n", ".. ... ... ... ... ... \n", "775 Mus musculus 268 1.000000e-12 27.61 65.9 \n", "776 Rattus norvegicus 239 1.000000e-11 28.03 62.8 \n", "777 Oryctolagus cuniculus 231 1.000000e-11 27.71 61.2 \n", "778 Homo sapiens 239 1.000000e-09 25.94 57.0 \n", "779 Canis lupus familiaris 160 4.000000e-07 24.38 47.4 \n", "\n", " moa_count drug_count druglike_count druggability_score \n", "0 0.0 0.0 0.0 0.00 \n", "1 0.0 0.0 37.0 0.43 \n", "2 0.0 0.0 30.0 0.16 \n", "3 0.0 0.0 0.0 0.00 \n", "4 0.0 0.0 0.0 0.00 \n", ".. ... ... ... ... 
\n", "775 0.0 0.0 0.0 0.00 \n", "776 0.0 0.0 8.0 0.14 \n", "777 0.0 0.0 0.0 0.00 \n", "778 0.0 0.0 0.0 0.00 \n", "779 0.0 0.0 0.0 0.00 \n", "\n", "[780 rows x 12 columns]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rs_merge_df['druggability_score'] = rs_merge_df.apply(lambda row: druggability_score(query_sequence_details[row['query']]['seq_length'],\n", " row['length'], \n", " row['identity'], \n", " row['moa_count'], \n", " row['drug_count'], \n", " row['druglike_count']),axis=1)\n", "\n", "rs_merge_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Great, we now have an extra column in the data frame, which contains the Druggability Score. As sequence identity contributes significantly to the score, we can take the maximum druggability_score across all hits for a query and treat it as the Druggability Score for that protein. Note that groupby sorts its keys, so the first row of the grouped results is the alphabetically first query name, which is not necessarily the first sequence defined in query_sequence. The predicted druggability_score for that first row is:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "P06804_TNFA_MOUSE: 1.0\n" ] } ], "source": [ "grouped_df = rs_merge_df.groupby('query')['druggability_score'].max().reset_index()\n", "print grouped_df.ix[0]['query']+\":\",grouped_df.ix[0]['druggability_score']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The Conclusion\n", "\n", "We can wrap up this tutorial by presenting the Druggability Score for all sequences defined in query_sequence in a friendly pandas data frame:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
querydruggability_score
0Q96P68_OXGR1_HUMAN0.66
1Q86XF0_DHFRL1_HUMAN1.00
2Q9UKX5_ITGA11_HUMAN0.26
3P06804_TNFA_MOUSE1.00
4P48050_KCNJ4_HUMAN0.60
5Q80Z70_SE1L1_RAT0.05
6P33277_GAP1_SCHPO0.00
7Q96PD4_IL17F_HUMAN0.37
8P10144_GRAB_HUMAN0.75
\n", "
" ], "text/plain": [ " query druggability_score\n", "0 Q96P68_OXGR1_HUMAN 0.66\n", "1 Q86XF0_DHFRL1_HUMAN 1.00\n", "2 Q9UKX5_ITGA11_HUMAN 0.26\n", "3 P06804_TNFA_MOUSE 1.00\n", "4 P48050_KCNJ4_HUMAN 0.60\n", "5 Q80Z70_SE1L1_RAT 0.05\n", "6 P33277_GAP1_SCHPO 0.00\n", "7 Q96PD4_IL17F_HUMAN 0.37\n", "8 P10144_GRAB_HUMAN 0.75" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Show all results in final table\n", "pandas.set_option('display.max_rows', 500)\n", "\n", "druggability_results_df = DataFrame({'query':query_sequence_order}).merge(\n", " grouped_df,\n", " how='left', \n", " on='query').fillna('No BLAST hits')\n", "druggability_results_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Caveats with the Druggability Score presented in this tutorial\n", "\n", "* Species information is ignored\n", "* No distinction is made between small molecule drugs and biotherapeutics, e.g. monoclonal antibodies\n", "* Multiple HSPs between query and target hits are ignored\n", "* It has not been tested beyond the scope of this notebook" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.11" }, "widgets": { "state": {}, "version": "1.1.2" } }, "nbformat": 4, "nbformat_minor": 0 }