{ "metadata": { "name": "", "signature": "sha256:6540f25d738ead3103e8fd847199ee82ed7c768235ffeb1a31c2ebc19689634d" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Finding the best local alignment in a database" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next idea we'll explore is database searching. In this context, what that means is that **we have some *query* sequence, and we want to know which *reference* sequence in a database it is most similar to**. This could be achieved in a few ways. It could be implemented with local alignment by representing the database as one long sequnence (if we build some functionality to support that into the algorithm), or with local or global alignment (depending on your application) by running our align function many times to search one *query* sequence against many *reference* sequences in the database. \n", "\n", "When our database starts getting hundreds of millions of bases long (as would be the case if we were searching against 97% OTUs from the Greengenes rRNA reference database), billions of bases long (as would be the case if we were searching against the human genome) or trillions of bases long (as would be the case if we were seraching against the NCBI non-redundant nucleotide database), **runtime becomes an important consideration**. \n", "\n", "First, let's do some notebook configuration." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from __future__ import division, print_function\n", "from time import time\n", "from random import random, shuffle\n", "from IPython.core import page\n", "import numpy as np\n", "import pandas as pd\n", "\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "from scipy.spatial.distance import hamming\n", "from skbio.parse.sequences import parse_fasta\n", "from skbio.alignment import local_pairwise_align_ssw\n", "\n", "page.page = print" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 36 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The main function that we'll use for database searching is ``local_pairwise_align_ssw``. Remember that you can always get help with a function by passing it as an argument to ``help``:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "help(local_pairwise_align_ssw)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Help on built-in function local_pairwise_align_ssw in module skbio.alignment._ssw_wrapper:\n", "\n", "local_pairwise_align_ssw(...)\n", " Align query and target sequences with Striped Smith-Waterman.\n", " \n", " Parameters\n", " ----------\n", " sequence1 : str or BiologicalSequence\n", " The first unaligned sequence\n", " sequence2 : str or BiologicalSequence\n", " The second unaligned sequence\n", " \n", " Returns\n", " -------\n", " ``skbio.alignment.Alignment``\n", " The resulting alignment as an Alignment object\n", " \n", " Notes\n", " -----\n", " This is a wrapper for the SSW package [1]_.\n", " \n", " For a complete list of optional keyword-arguments that can be provided,\n", " see ``skbio.alignment.StripedSmithWaterman``.\n", " \n", " The following kwargs will not have any effect: `suppress_sequences` and\n", " `zero_index`\n", " \n", " If an alignment does not meet a provided filter, `None` will be returned.\n", " \n", " References\n", " ----------\n", " .. [1] Zhao, Mengyao, Wan-Ping Lee, Erik P. Garrison, & Gabor T.\n", " Marth. \"SSW Library: An SIMD Smith-Waterman C/C++ Library for\n", " Applications\". PLOS ONE (2013). Web. 11 July 2014.\n", " http://www.plosone.org/article/info:doi/10.1371/journal.pone.0082138\n", " \n", " See Also\n", " --------\n", " skbio.alignment.StripedSmithWaterman\n", "\n" ] } ], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we're going to define some sequences in fasta format to serve as our reference database. These are derived from the [Greengenes](http://greengenes.secondgenome.com/) [13_8](ftp://greengenes.microbio.me/greengenes_release/gg_13_5/) database, and we're pulling them from the [QIIME default reference project](https://github.com/biocore/qiime-default-reference). We can load these as a list of sequences using ``skbio.parse.sequences.parse_fasta``, and count them by taking the length of the list." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from qiime_default_reference import get_reference_sequences \n", "\n", "reference_db = list(parse_fasta(get_reference_sequences()))\n", "shuffle(reference_db)\n", "print(len(reference_db))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "99322\n" ] } ], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the sake of runtime, let's work with only a random (approximately) 10% of these sequences." ] }, { "cell_type": "code", "collapsed": false, "input": [ "temp_reference_db = []\n", "fraction_to_keep = 0.10\n", "for e in reference_db:\n", " if random() < fraction_to_keep:\n", " temp_reference_db.append(e)\n", "reference_db = temp_reference_db\n", "print(len(temp_reference_db))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "9932\n" ] } ], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "And we can next define our search function which takes a query sequence, a reference sequence, and an aligner function:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from iab.algorithms import local_alignment_search\n", "\n", "%psource local_alignment_search" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\u001b[0;32mdef\u001b[0m \u001b[0mlocal_alignment_search\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mquery\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mreference_db\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maligner\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mlocal_pairwise_align_ssw\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mbest_score\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m0.0\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mbest_match\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mNone\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mbest_a1\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mNone\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mbest_a2\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mNone\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mseq_id\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mseq\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mreference_db\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0malignment\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0maligner\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mquery\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mseq\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mscore\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0malignment\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mscore\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mscore\u001b[0m \u001b[0;34m>\u001b[0m \u001b[0mbest_score\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mbest_score\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mscore\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mbest_match\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mseq_id\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mbest_a1\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mstr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0malignment\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mbest_a2\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mstr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0malignment\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mbest_a1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mbest_a2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mbest_score\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mbest_match\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\n" ] } ], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, let's define a query sequence to search against this database. We'll define `query1` to be an exact subsequence of one of the database sequences." ] }, { "cell_type": "code", "collapsed": false, "input": [ "query1_id, query1 = reference_db[-1]\n", "query1 = query1[100:300]\n", "\n", "print(query1_id)\n", "print(query1)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "273556\n", "ACCTGCCCTCGAGTGGGGAATAACCTTGGGAAACCGGGGCTAATACCGCATATGGTCTACGGACTAAAGCTTTATGCGCTCGAGGAGGGGCCCGCGGTCGATTAGCTAGTTGGTGAGGTAATGGCTCACCAAGGCATCGATCGATAGCCGGCCTGAGAGGGCACACGGCCACACTGGCACTGAGACACGGGCCAGGCTCC\n" ] } ], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "And we can now perform some database searches. Experiment with different sequences to see how they align by modifying `query1` in the next cell and then executing it (remember that you'll need to execute all of the above cells before executing this one). \n", "\n", "Also, think about the runtime here. How many sequences are we searching, and how long are they? Does this strategy seem scalable?" ] }, { "cell_type": "code", "collapsed": false, "input": [ "start_time = time()\n", "a1, a2, score, ref_id = local_alignment_search(query1, reference_db)\n", "stop_time = time()\n", "\n", "alignment_length = len(a1)\n", "percent_id = 1 - (hamming(a1, a2)/alignment_length)\n", "\n", "print(a1)\n", "print(a2)\n", "print(score)\n", "print(ref_id)\n", "print(alignment_length)\n", "print(percent_id)\n", "print(\"Runtime: %1.4f sec\" % (stop_time - start_time))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "ACCTGCCCTCGAGTGGGGAATAACCTTGGGAAACCGGGGCTAATACCGCATATGGTCTACGGACTAAAGCTTTATGCGCTCGAGGAGGGGCCCGCGGTCGATTAGCTAGTTGGTGAGGTAATGGCTCACCAAGGCATCGATCGATAGCCGGCCTGAGAGGGCACACGGCCACACTGGCACTGAGACACGGGCCAGGCTCC\n", "ACCTGCCCTCGAGTGGGGAATAACCTTGGGAAACCGGGGCTAATACCGCATATGGTCTACGGACTAAAGCTTTATGCGCTCGAGGAGGGGCCCGCGGTCGATTAGCTAGTTGGTGAGGTAATGGCTCACCAAGGCATCGATCGATAGCCGGCCTGAGAGGGCACACGGCCACACTGGCACTGAGACACGGGCCAGGCTCC\n", "400.0\n", "273556\n", "200\n", "1.0\n", "Runtime: 5.6031 sec\n" ] } ], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the next cell, I took a shorter exact match from `query1`. What is the effect on our database base search here? How can this happen?" ] }, { "cell_type": "code", "collapsed": false, "input": [ "query2 = query1[25:35]\n", "\n", "start_time = time()\n", "a1, a2, score, ref_id = local_alignment_search(query2, reference_db)\n", "stop_time = time()\n", "\n", "alignment_length = len(a1)\n", "percent_id = 1 - (hamming(a1, a2)/alignment_length)\n", "\n", "print(a1)\n", "print(a2)\n", "print(score)\n", "print(ref_id)\n", "print(alignment_length)\n", "print(percent_id)\n", "print(\"Runtime: %1.4f sec\" % (stop_time - start_time))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "TTGGGAAACC\n", "TTGGGAAACC\n", "20.0\n", "311349\n", "10\n", "1.0\n", "Runtime: 3.5120 sec\n" ] } ], "prompt_number": 8 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Using heuristics to reduce runtime for database searches" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**As illustrated above, runtimes for performing pairwise alignments can be prohibitive for database searching.** The Smith-Waterman implementation we're using here, [SSW](http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0082138), is very fast. As we covered in the prvious chapter, if we were to create an even faster implementation, that provide some initial help (it'd reduce our runtime by some factor $f$) but ultimately, we're still going to have a problem regardless of how big $f$ is, because **the runtime of the algorithm scales quadratically with sequence lengths**. Experiment with different values of $f$ to see how it changes the curve below. \n", "\n", "$f$ represents the fold-reduction in runtime (e.g., $f=2$ represents a two-fold reduction, or halving, of runtime, and $f=10$ equals a ten-fold reduction in runtime)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "seq_lengths = range(25)\n", "times = [t * t for t in range(25)]\n", "f = 10000 # no scaling factor\n", "times = [t / f for t in times]\n", "\n", "plt.plot(range(25), times)\n", "plt.xlabel('Sequence Length')\n", "plt.ylabel('Runtime (s)')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 9, "text": [ "" ] }, { "metadata": {}, "output_type": "display_data", "png": "iVBORw0KGgoAAAANSUhEUgAAAfoAAAFkCAYAAADIefl6AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xd8VNed//+XJFRRQQIhOqIeQFSDGzjY2OCOwTa4xIlb\nHNfYTpx8d7PZzf52s/vdzfeR2KkuiW3sxGk2NrjhFlcMNnZsiqgHJKpAEkKAepuZ+/tjRs6YUCTQ\n1Z3yfj4ePJiZM+Wjw4j3vefee06C4ziIiIhIbEr0ugARERFxj4JeREQkhinoRUREYpiCXkREJIYp\n6EVERGKYgl5ERCSG9XDrjY0xicAjwESgBbjNWlsa1j4X+CHgAxZZa58IPf4vwFwgGfi1tfZ3btUo\nIiIS69zco58PpFhrpwPfBx5sbzDGJAMPAXOAc4HbjTF9jTHnAWeHXnMeMNzF+kRERGKem0E/A3gD\nwFr7CTAtrG0sUGKtrbHWtgErgJnAhcB6Y8yLwCvAyy7WJyIiEvPcDPpsoDbsvj80nN/eVhPWVgfk\nAH0IbhAsAO4E/uhifSIiIjHPtWP0BEM+K+x+orU2ELpdc0RbFnAYqAa2WGt9wFZjTLMxpo+19sCx\nPsRxHCchIaGLSxcREYlYnQo9N4N+JcGT6hYbY84CisPatgCjjDG5QAPBYfufAM3A/cBDxpgBQE+C\n4X9MCQkJVFXVuVC+tMvPz1IfdwP1s/vUx+5TH7svPz/rxE8K42bQLwXmGGNWhu7fYoy5Hsi01j5u\njHkAeJPg4YMnrbXlwDJjzExjzKehx++21mrVHRERkZOUEAOr1znaenSXttC7h/rZfepj96mP3Zef\nn9WpoXtNmCMiIhLDFPQiIiIxTEEvIiISwxT0IiIiMUxBLyIiEsMU9CIiIjFMQS8iIhLDFPQiIiIu\nq65p5rcvb6S8uqHbP1tBLyIi4iKfP8CjL21g1aZKqg43d/vnK+hFRERctGT5drbvq+WsogImDM/r\n9s9X0IuIiLikuPQAb3yym4LcdL5+ocGL1VYV9CIiIi44WNvME69upkdSInfNH096qpvryB2bgl5E\nRKSL+QMBfvvyRuqb2rjugpEMKejc0rJdSUEvIiLSxV5esZOtZTVMNfnMmjLQ01oU9CIiIl1o086D\nvPrRTvrkpHHLJWM8OS4fTkEvIiLSRWrqW/jtK5tITEzgznnjyUhL9rokBb2IiEhXCAQcfvvKJmob\nWllw3giGD8j2uiRAQS8iItIllq3axeZdh5g0ojcXnj7Y63K+oKAXERE5RVv3HObFD7eTm5XKNy4f\n5/lx+XAKehERkVNQ19jKb17eSAIJ3HFFEZnp3h+XD6egFxEROUkBx+HJZZs5VNfClTOHMXpwL69L\n+gcKehERkZP01qd7KC6tpqgwl0vOGup1OUeloBcRETkJpXtreOGDUnJ6pnDb3CISI+i4fDgFvYiI\nSCc1NLfx2EsbCQQcbp87jpyeKV6XdEwKehERkU5wHIenXttCdW0zc2cUMraw+5ee7QwFvYiISCe8\nu3ovq7dWMWZIL66YMczrck5IQS8iItJBuyrqePbdbWSmJ/PNuUUkJkbmcflwCnoREZEOaGrx8ehL\nG/D5Hb45dxy5Walel9QhCnoREZETcByH372xhf2HmrjkrCFMGN7b65I6TEEvIiJyAsvX7ePTzfsZ\nMTCbK78y3OtyOkVBLyIichxl++v509vb6JnWgzuuKKJHUnRFZ3RVKyIi0o2aW4PH5dt8AW69dCx9\nctK9LqnTFPQiIiJHETwubymvbmT2tEFMGZ3vdUknRUEvIiJyFO98XsYnmyoZMTCba2aN9Lqck6ag\nFxEROUJJWQ3PvltCVkYyd8+fEHXH5cNFb+UiIiIuqG1o5ZEX1xNwHO68oihqrpc/FgW9iIhIiD8Q\n4LGXNnC4vpWrzx0R8fPYd4SCXkREJGTp8h1s2X2YKaP6cMmZQ7wup0so6EVERIA1W6t4bdUu+uam\n843LxpEQoevLd5aCXkRE4l7loUaeWLaJlB6J3HPlBDLSenhdUpdR0IuISFxrafPz8JINNLX4ufFi\nw+C+mV6X1KUU9CIiErccx+H3b1jKquqZNWUg08f397qkLqegFxGRuPX+2n18vLGCYf2zue6CUV6X\n4wrXDkIYYxKBR4CJQAtwm7W2NKx9LvBDwAcsstY+EXp8NVATetp2a+033KpRRETi1/Z9tfz57a1k\npidz9/zxJPeIzX1fN882mA+kWGunG2POBB4MPYYxJhl4CJgGNAIrjTEvAXUA1tpZLtYlIiJxrq4x\nOCmO3+9wxxVF9M5J87ok17i5+TIDeAPAWvsJwVBvNxYosdbWWGvbgBXAucAkIMMY86Yx5p3QBoKI\niEiXCQQcfvvyRg7WtjD/K8MoGhb9k+Icj5tBnw3Uht33h4bz29tqwtrqgBygAfiJtfYi4E7gj2Gv\nEREROWUvrdjBxp2HmDSiN5dNL/S6HNe5OXRfC2SF3U+01gZCt2uOaMsCDgFbgRIAa+02Y0w10B/Y\ne7wPys/POl6zdAH1cfdQP7tPfey+SO7jv22q4JWPdlKQl8H3bz6DzIwUr0tynZtBvxKYCyw2xpwF\nFIe1bQFGGWNyCe7FzwR+AtxC8OS9e4wxAwju+Zef6IOqquq6uHQJl5+fpT7uBupn96mP3RfJfbz/\ncBM//cPnJPdI5M4rimhqaKGpocXrsjqtsxtSbg6LLwWajTErCZ6I9x1jzPXGmG+Gjss/ALwJfAQ8\naa0tB54Eso0xy4G/ALeEjQKIiIiclNY2P48sXU9ji4+vXTiaof0id9ShqyU4juN1DafKidStx1gR\nyVvosUT97D71sfsitY8XvbaZFcXlzJzUn5svGet1OackPz+rU5Pw60Q3ERGJacvX7WNFcTlD+2Vx\nw5zRXpfT7RT0IiISs3ZW1PKHt7bSM60H98wfT3KPJK9L6nYKehERiUn1TW08snQDfn+Ab84tok+v\ndK9L8oSCXkREYk4g4PD4K5s4UNPM3BmFTBzR2+uSPKOgFxGRmLP0w+2s317N+GF5XDFjmNfleEpB\nLyIiMeXTzZUs+3gXfXPTuWNeEYmJnTpJPeYo6EVEJGbsrqxj0bLNpKYkce/VE+mZlux1SZ5T0IuI\nSEyobWzlVy8U0+oLcPvccQzs09PrkiKCgl5ERKKezx/gkaUbqK5t4cqvDGPKqHyvS4oYCnoREYl6\nf35nG1v3HGaayefyOFiRrjMU9CIiEtU+WLuX91bvZVB+JrdeNpaEhPg++e5ICnoREYla28oO84e3\ntpKZnsy9V08gLcXNRVmjk4JeRESi0sHaZh5eugHHgbvmFZEfpzPfnYiCXkREok5rm59fLVlPbUMr\n110wkrGFeV6XFLEU9CIiElUcx+HpN7awq6KOcyb254Kpg7wuKaIp6EVEJKq8+ekeVm2sZMSAbL5+\nodHJdyegoBcRkaixfns1i98voVdmCvdcNYHkHoqxE1EPiYhIVKg82MhjL20kKTGRb101kV6ZqV6X\nFBUU9CIiEvGaWnz88oVimlp83HSxYfiAbK9LihoKehERiWgBJ7i2fHl1IxeePpgZE/p7XVJUUdCL\niEhEe/HDHawtOcC4wlwWzhrhdTlRR0EvIiIR629b9vPqRzvJ75XGnfPGk5So2Oos9ZiIiESk3ZV1\nPLlsE6kpSdx39UQy07W2/MlQ0IuISMSpa2zlVy+sp7UtwDcvH8fA/EyvS4paCnoREYkoPn+AR1/c\nQHVtM/POGcZpo7W2/KlQ0IuISER59p0Stuw+zGmj85k7o9DrcqKegl5ERCLGO5+X8c7qMgbm9+Qb\nl40lUdPbnjIFvYiIRITi0gP86e2tZGckc/+CiaSnam35rqCgFxERz+3ZX8+jL22kR1Ii9y6YSJ8c\nrS3fVRT0IiLiqZr6Fn7x/DpaWv3cdvk4RgzI8bqkmKKgFxERz7S0+fnlC8UcrG3hqpnDOX1MX69L\nijkKehER8UTAcXji1U3sKK9jxvh+XHb2UK9LikkKehER8cSSD7bzua3CDO7FTZeMIUFn2LtCQS8i\nIt3uw3X7eG3VLgpy07nnqgn0SFIcuUU9KyIi3WrzrkP8/k1Lz7QefHvhJM1h7zIFvYiIdJvy6gYe\nXrIegG9dNYGCvAyPK4p9CnoREekWdY2t/GJxMY0tPm6+ZAxmSK7XJcUFBb2IiLiuzRfg10vWs/9w\nE5dPH8qMCf29LiluKOhFRMRVjuPw9Oub2VZWw+lj+jL/K8O9LimuKOhFRMRVr3y0k483VjJ8QLYW\nqvGAgl5ERFyzalMFL364g97Zadx79URSkpO8LinuKOhFRMQVJWU1LFq2hfTUJL69cCI5PVO8Liku\nKehFRKTL7T/cxK+WFBMIONw1bzwD8zO9LiluubbYrzEmEXgEmAi0ALdZa0vD2ucCPwR8wCJr7RNh\nbX2Bz4ELrLVb3apRRES6XmNzG79YvI66xja+fuFoxg/v7XVJcc3NPfr5QIq1djrwfeDB9gZjTDLw\nEDAHOBe4PRTu7W2/ARpcrE1ERFzg8wd45MUNlFc3MmfaYGadNsjrkuKem0E/A3gDwFr7CTAtrG0s\nUGKtrbHWtgErgJmhtp8AjwLlLtYmIiJdzHEcHltSzKadh5g0ojfXnj/S65IEd4M+G6gNu+8PDee3\nt9WEtdUBOcaYm4Eqa+1bocd1DYaISJR489M9vLlqF0P6ZnLHvCISE/VfeCRw7Rg9wZDPCrufaK0N\nhG7XHNGWBRwG7gMcY8xsYDLwO2PMPGtt5fE+KD8/63jN0gXUx91D/ew+9bE7PlhdxnPvlZCXncZ/\n3jGdPr3SvS5JQtwM+pXAXGCxMeYsoDisbQswyhiTS/BY/EzgJ9baF9qfYIx5D7jjRCEPUFVV16WF\ny5fl52epj7uB+tl96mN3bNp5kJ89t4701CT+45tn4bT51M8u6uzGqptBvxSYY4xZGbp/izHmeiDT\nWvu4MeYB4E2Chw+etNbqmLyISJTZXVnHr5esJyEBvnXVRIYNyFHIR5gEx3G8ruFUOfpSuUt7Qd1D\n/ew+9XHXqjrcxP888zm1Da3cMa+IM8YWqI+7QX5+VqdOftCEOSIi0ml1ja089Nw6ahpaue6CUZwx\ntsDrkuQYFPQiItIpLW1+fvl8MZUHG7n4zCHMOX2w1yXJcSjoRUSkw/yBAI+9uIHSfbWcVVTAgvNG\neF2SnICCXkREOsRxHJ5507KutJqiwlxuvVRLzkYDBb2IiHTISyt2sHxdOUMKMrn7ygn0SFKERAP9\nK4mIyAm9v2YvL6/cSZ+cNL6zcBLpqW5enS1dSUEvIiLHtWZrFc+8ZclMT+a7104mJzPV65KkExT0\nIiJyTCVlNTz28kaSeyTy7YWTKMjL8Lok6SQFvYiIHFV5dQO/eH4dfr/D3fPHM3xAttclyUlQ0IuI\nyD84VNfCQ8+upaHZx02XGCaO6ON1SXKSFPQiIvIljc0+fvbcOqprW7jyK8P4ysQBXpckp0BBLyIi\nX2jzBfj1kmLKquqZNWUgl08v9LokOUUKehERASDgODzx6ia27D7MaaPzuWHOaBI0IU7UU9CLiAiO\n4/DsOyX8bct+Rg7K4fa540hMVMjHAgW9iIjw5qd7+Otne+jfO4P7rp5ISnKS1yVJF1HQi4jEuY83\nVPDceyXkZqXywDWTyUxP9rok6UIKehGROLZmaxVPLttMemoPvrNwEr1z0rwuSbqYgl5EJE5t3HmQ\nR1/aQHKPRL5zzSQG9c30uiRxgYJeRCQOlZTV8KsXigH41tUTGDkwx+OKxC0KehGROLO7so6fLV6H\nz+dw17zxFBXmeV2SuOiE6wwaY7KAWcAoIABsA9621ja7XJuIiHSx8uoGHnx2Lc0tPm6bO44po/O9\nLklcdsygN8b0BP4duAooBnYBbcDZwM+NMS8A/2Wtre+OQkVE5NQcqGnip39ZS11jG1+/yHB2UT+v\nS5JucLw9+meAx4EfWGv94Q3GmCTgcuCPwDz3yhMRka5wuL6Fn/55LYfqWlg4awSzpgz0uiTpJscL\n+gXW2sDRGkLB/5Ix5hV3yhIRka5S39TGg8+uZf/hJi6fPpRLzhzqdUnSjY4Z9O0hb4wZCZwF/Al4\nDDgN+I619sNjbQiIiEhkaGrx8bPn1rK3qoELpg7iyq8M97ok6WYdOev+KaAVuAIYDTwA/NTNokRE\n5NS1tvn55fPF7CivY8aEflw/e5QWqYlDHQn6NGvtcwSPyf/JWrucDpytLyIi3vH5Azzy4gbsnsNM\nNfncfMkYEhXycakjQe8zxiwgGPSvGmPmA/4TvEZERDwSCDg8/somikurGT88j9vnFpGUqGlT4lVH\n/uXvAC4F7rHW7gOuAW5ztSoRETkpjuPwuze28Lct+xk9KId7rpxAcg+FfDw73nX0k621a621xcCt\n7Y9ba7965HNcrlFERDrAcRz+8k4JHxaXM7RfFvctmESqlpuNe8c71n6DMea7BK+n/9Ba2wRgjMkA\nzgVuAfYACnoRkQjw0ood/PWzPQzo05MHrplERppOp5LjX173f4wxk4DvAn82xgD4CA73vw78d2hv\nX0REPPbmp7t5eeVO8nul8d1rJ5OVkeJ1SRIhjru5Z61dB9xojEkA+gABa211t1QmIiIdsnzdPp59\nt4RemSl877op5Galel2SRJAOjetYax2gyuVaRESkkz7dXMnvXt9CZnoy37tuCvm90r0uSSKMTsUU\nEYlSa7cd4PFXNpGWmsR3r53MgD49vS5JIpCCXkQkCq3ddoCHl64nKSmB+xdMYmi/LK9LkgjVkfXo\nU4HvAQa4D7gf+F9rbavLtYmIyFGEh/x3Fk5i9OBeXpckEawje/QPA5nAVIJn3Y8EnnSzKBERObq1\nJV8OeTMk1+uSJMJ1JOinWmv/BWi11tYDNxJcwU5ERLrR2pIDPBIK+W8vUMhLx3Qk6APGmPALMvsA\nWp5WRKQbrQuFfGJCMOTHDFXIS8d0JOh/AbwN9DPG/AL4HPi5q1WJiMgXikuDw/WJCQncv1AhL51z\nwpPxrLW/N8Z8DswiuGFwuWbEExHpHsWlB/j1kr+H/FiFvHTSCffoQ2fdjwDqgBpgijHmRrcLExGJ\nd18K+QUTFfJyUjoyM97rob93HfH477u4FhERCSkurebXS9aTkJDAfQsmMrYwz+uSJEp1JOh7W2sn\ndfaNjTGJwCPARKAFuM1aWxrWPhf4IcFL9hZZa58wxiQBjwOjAQe401q7sbOfLSISzYIhX0xCaE9+\nnEJeTkFHTsZ71xgzJxTcnTEfSLHWTge+DzzY3mCMSQYeAuYQXPL2dmNMX2AuwYVzzgH+Dfi/nfxM\nEZGotn773/fkFfLSFToS3ruBNwGfMSYQ+uPvwOtmAG8AWGs/AaaFtY0FSqy1NdbaNmAFMNNa+yJw\nR+g5hcChjv0YIiLRb/32an71wnoSEuA+hbx0kY4M3X8bKLTW7u7ke2cDtWH3/caYRGttINRWE9ZW\nB+QAWGv9xpingSuBBR35oPx8zfHsNvVx91A/uy9S+3j1lv2hE+/g3249kymmr9clnbRI7eN41ZGg\nLwMOnsR71wLh/9rtIQ/BkA9vyyJs791ae7Mx5p+BT4wxY621Tcf7oKqqupMoTzoqPz9LfdwN1M/u\ni9Q+3rC9ml+G9uTvvXoig/LSI7LOjojUPo4lnd2Q6kjQ7wM2GGNWAu0L2TjW2ltP8LqVBI+5LzbG\nnAWEX3u/BRhljMkFGoCZwE+MMV8HBllr/xdoIjgDn2bhE5GY1R7yAPdePYGiYRqul67VkaBfFvoT\nzunA65YCc0IbCAC3GGOuBzKttY8bYx4geOw/EXjSWltujHkeeNoY8wGQDNxvrW3p0E8iIhJlNuz4\ne8jft2AC44f19rgiiUXHDHpjTD9rbQXwHsFgTwhrPmHQW2sd4K4jHt4a1v4q8OoRr2kCrj1x2SIi\n0W3DjuCJdwD3Xa2QF/ccb4/+SeAy4AP+MdgdYLhbRYmIxLLi0moeXroexwmF/HCFvLjnmEFvrb0s\ndPM0a+2XTsYzxhS6WZSISKz625b9/PbljSQmJnCvQl66wfGG7gcTPH6+zBhzaVhTMsFj9mNcrk1E\nJKasKC7nqdc3k5KcxLcXTNR68tItjjd0/yPgPGAAweH7dj6OOLYuIiLH9/Zne/jT29vomdaD71wz\nmeEDsr0uSeLE8YbubwEwxnzfWvvj7itJRCR2OI7Dso93sWT5drJ7pvC9ayczqG+m12VJHOnI5XW/\nMcZ8C8gleOZ9AsHr6H/kamUiIlHOcRyef7+U1z/ZTe/sVL533RQK8jK8LkviTEeCfjFwGNjAP15m\nJyIiRxFwHP741lbeW7OXgtx0vnfdFHrnpHldlsShjgR9gbV2tuuViIjECH8gwKJlW/h4YwWD8jP5\n7nWTyemZ4nVZEqc6snrdGmNMp9ejFxGJR22+AI++uJGPN1YwfEA2/3zDFIW8eKoje/QTgNXGmP1A\nc+gxx1qrCXNERMK0tPr59ZJiNu48xJghvbj36omkp3bkv1kR93TkGziffzwu35G57kVE4kZjs4+f\nP7+OkrIaJo3ozd1Xjie5R5LXZYl0KOjP4+jB/vuuLUVEJDrVNrbys2fXsauyjjPG9uW2y8fRI6kj\nR0ZF3NeRoJ/F34M+GfgKsBwFvYgIh+pa+Olf1lBe3cjMSf258aIxJCbq4iSJHCcMemvtzeH3jTF5\nwHNuFSQiEi32H27ip39ew4GaZi48fTDXnj+ShASFvESWkzlLpAEo7OI6RESiyt4DDTz4lzUcrm9l\n3jnDuGJGoUJeItIJg94Y817Y3QSCy9Muc60iEZEIt6uijgefXUt9UxvXnj+Si84Y4nVJIsfUkT36\n/wy77QAHrLUbXapHRCSibSs7zM8Xr6O5xc9NFxvOnTzQ65JEjuu4QW+MMcBWa+2+sMcKjDG/tdbe\n7np1IiIRZF3JAR59aQN+v8PtVxRx5rgCr0sSOaHjrUf/H8D3QrevBN4L3f8BsKo7ihMRiRQfrN3L\n79+0JCclcs9VE5g8so/XJYl0yPH26G8CRhFcj/6/gH8GCoCF1to3u6E2ERHPOY7DSyt28PLKnWSm\nJ3P/gomMGJjjdVkiHXa8oK+11pYD5caY04FngP9jrfV3T2kiIt7y+QP8/k3LiuJy+uSk8cC1k+mn\nZWYlyhwv6ANhtw8A37XWaupbEYkLza0+Hn1xI+u3VzO0XxbfXjhJi9NIVOrodfTNCnkRiRc1Da38\nfPE6dlXUMWF4b+6aX0Raihankeh0vG9ukTFmR+j2gLDboNXrRCRGVRxs5GfPraXqcDPnTOjPjRcb\nzVsvUe14QT+626oQEYkApXtr+MXzxdQ3tXHFjELmnTNMs91J1Dtm0Ftrd3ZjHSIinlq77QCPvbSB\nNn9AE+FITNFBJxGJe++v2cszb1mSeyRy79UTdY28xBQFvYjELcdxWPrhDl79KHiN/LcXTmL4gGyv\nyxLpUgp6EYlLPn+A372xhZXrK+jbK53vXDuJglxdIy+xR0EvInGnqcXHoy9uYMOOgwzrn8X9CyaR\nrWvkJUYp6EUkrtTUt/DzxcXsqqxj4oje3DVvPKkpSV6XJeIaBb2IxI2y/XX832c+50BNMzMn9efr\nFxmSEnWNvMQ2Bb2IxIWSvTX86oX11DW2Mu+cYVwxo1DXyEtcUNCLSMxbtamCp17bgj/gcPMlY5g5\naYDXJYl0GwW9iMSsgOPwYujyufTUJH5w8xkM7aMz6yW+KOhFJCa1tPp54tVNfL61ivxeady3YBKT\nxxZQVVXndWki3UpBLyIx52BtM798vpjd++sZM6QXd185gcz0ZK/LEvGEgl5EYkrp3hp+tWQ9tQ2t\nnDt5ADfMGa3V5ySuKehFJGZ8vKGCp17fgj8Q4PrZo5g9dZDOrJe4p6AXkagXcByWfLCd11btIj21\nB/fNn8D4Yb29LkskIijoRSSqNbf6ePyVTazZdoC+uencv2Ai/Xv39LoskYihoBeRqHWgpolfPr+e\nsqp6xg7N5a7543XSncgRFPQiEpW2lR3m10vWU9fYxqwpA7l+9iiddCdyFK4FvTEmEXgEmAi0ALdZ\na0vD2ucCPwR8wCJr7RPGmGRgETAUSAX+21r7ils1ikh0WlFczu/e2ILjwNcuHM35pw3yuiSRiOXm\n5u98IMVaOx34PvBge0Mo0B8C5gDnArcbY/oCNwBV1tqZwMXAr12sT0SiTCDg8Ny7JSx6bTOpyUl8\n59pJCnmRE3Bz6H4G8AaAtfYTY8y0sLaxQIm1tgbAGLMCmAksBp4PPSeR4N6+iAhNLT5++/JG1pVW\n0y8vg/sXTKQgT9PZipyIm0GfDdSG3fcbYxKttYFQW01YWx2QY61tADDGZBEM/X/tyAfl52d1TcVy\nTOrj7qF+PrqK6gb+35/XsLuijimj8/mnG08/6ZPu1MfuUx9HFjeDvhYI/9duD3kIhnx4WxZwCMAY\nMxhYAjxsrf1LRz5Ic1e7Kz8/S33cDdTPR2d3H+LhpRuob2pj9tRBXHvBSJrqm2mqb+70e6mP3ac+\ndl9nN6TcDPqVwFxgsTHmLKA4rG0LMMoYkws0EBy2/4kxpgB4C7jbWvuei7WJSIRzHIe//m0Pz71X\nSkIC3Hix4bzJA70uSyTquBn0S4E5xpiVofu3GGOuBzKttY8bYx4A3iR4LP5Ja225MeYXQA7w78aY\nfw+97hJrbec33UUkajW1+Hjqtc18ZqvI7pnCXfOKMENyvS5LJColOI7jdQ2nytEwkbs0FNc91M9B\ne6vqeXjpBioONjJ6UA53zh9Pr8zULnlv9bH71Mfuy8/P6tQCDpowR0QixqpNFTz9+hZa2wJcfMYQ\nrjp3uCbBETlFCnoR8ZzPH+DZd0p4Z3UZaSlJ3D1/PNPG9PW6LJGYoKAXEU8drG3m0Rc3ULqvloH5\nPbnnygn00/XxIl1GQS8intm48yC/eWkj9U1tnFVUwE0XjSE1JcnrskRiioJeRLpdwHFY9vEuXly+\nncTEBL5+4WjOmzKQhIROnWMkIh2goBeRbtXQ3Mbjr2yiuLSavOxU7po/nhEDcrwuSyRmKehFpNvs\nqqjj4aV/mkraAAAV9ElEQVTrOVDTTFFhLrdfUURWRorXZYnENAW9iHSL5ev28Ye3tuLzB7hiRiFX\nzBhGYqKG6kXcpqAXEVe1tvn5w1+3sqK4nJ5pPfjWVeOZOKKP12WJxA0FvYi4Zv/hJh5Zsp7d++sZ\n2i+Le+aPp0+vdK/LEokrCnoRccWabVU8+epmGlt8nDt5AF+dPYrkHrp0TqS7KehFpEu1tPl57t0S\n3luzl+Qeidx66VjOmdjf67JE4paCXkS6zJ799fzm5Y3sO9DAwPye3DG3iEF9M70uSySuKehF5JQF\nHIe3Pyvj+fdL8PkdLpg6iIXnjSAlWUP1Il5T0IvIKalpaOXJZZvYsP0gWRnJ3HrpWCaN1Fn1IpFC\nQS8iJ21dyQEWvbaZusY2xg/P4xuXjiWni9aOF5GuoaAXkU5r8/l57r1S3vm8jB5JCVx3wShmTxtE\nouaqF4k4CnoR6ZSyquAJd3urGhjQpye3zx3HkIIsr8sSkWNQ0ItIhziOw7ur9/LsuyX4/AFmTRnI\nNeePJFUn3IlENAW9iJxQbUMri17bTHFpNZnpydxyaRFTRuV7XZaIdICCXkSOa/32ap5ctpnahlbG\nFebyjcvGkZulE+5EooWCXkSOqs3n5/n3t/PXz/aQlJjANbNGcuEZg3XCnUiUUdCLyD/Ye6CB37y0\nkbKqevrlZXDHFUUM7acT7kSikYJeRL4QcBze+byM598vpc0X4NzJA7ju/FGkpuiEO5FopaAXEQAq\nDjby1Gub2VZWQ8+0Htw+t4ipRifciUQ7Bb1InAsEHN762x6WfridNl+AqaPz+dqFozXDnUiMUNCL\nxLG9Bxp46rXNbN9XS1ZGMrddPo7Tx/T1uiwR6UIKepE45PMHeOOT3by8cgc+v8OZ4wr46uxRZGWk\neF2aiHQxBb1InNmzv55Fyzazq7KOnJ4p3HiRYcpoHYsXiVUKepE44fMHePWjnSz7eBf+gMOM8f24\nbvYoeqYle12aiLhIQS8SB3ZW1LJo2RbKqurJzUrlpovHMHFEb6/LEpFuoKAXiWFtPj8vr9zJ66t2\nE3AcZk4awDWzRpKRpl99kXih33aRGFW6r4ZFyzZTXt1In5w0brpkDEWFeV6XJSLdTEEvEmNa2/ws\n/XA7b/1tD44D5582kAXnjSAtRb/uIvFIv/kiMWTrnsM89dpmKg810bdXOrdcOgYzJNfrskTEQwp6\nkRhQ39TGkg9K+WDtPgAuPH0wV84cTmqy5qgXiXcKepEoFnAcPly3j+ffL6Wh2ceAPj25+ZIxjByY\n43VpIhIhFPQiUWpnRS3PvLmVHeW1pKYkce35I7lg6iB6JCV6XZqIRBAFvUiUqW9qY8ny7XywZi8O\ncOa4Aq6ZNZLcLC1CIyL/SEEvEiUCjsOK4nKef7+U+qY2+vfO4GtzRjNWl8yJyHEo6EWiwK6KOv7w\nlqV0Xy2pyUksnDWCOdMGa5heRE5IQS8SwRqa21i6fDvvrdmL48DpY/py7fkjyctO87o0EYkSCnqR\nCBRwHD5aX8Hi90uoa2yjX14GN1w4WjPbiUinuR70xphE4BFgItAC3GatLQ1rnwv8EPABi6y1T4S1\nnQn82Fo7y+06RSLF7so6/vDWVkr21pCSnMiC80Zw4ekapheRk9Mde/TzgRRr7fRQcD8YegxjTDLw\nEDANaARWGmNettbuN8b8E/A1oL4bahTxXH1TG3/861beXV2G48A0k891F4zSML2InJLuCPoZwBsA\n1tpPjDHTwtrGAiXW2hoAY8wKYCbwPFACXAU80w01ingm4Dh8vKGCFz7YzuH6FgryMrhhzijGD9My\nsiJy6roj6LOB2rD7fmNMorU2EGqrCWurA3IArLVLjDGF3VCfiGc27TzI4vdK2VVZR0pyElefO5wL\nTx9Ccg8N04tI1+iOoK8FssLut4c8BEM+vC0LONTZD8jPzzrxk+SUqI+71o59NTz96iZW2/0AzJwy\nkJsuHUffvAyPK4t9+i67T30cWboj6FcCc4HFxpizgOKwti3AKGNMLtBAcNj+J539gKqquq6oU44h\nPz9LfdxFqmuaWfrhdj7eUIEDjB2ay8JZIyjsl01+Xob62WX6LrtPfey+zm5IdUfQLwXmGGNWhu7f\nYoy5Hsi01j5ujHkAeBNIBJ601pYf8XqnG2oUcVVDcxvLPt7F25+V4fMHGJSfycJZIxg/LI+EhASv\nyxORGJbgOFGfo462Ht2lLfST1+bz887ne1n28U4amn3kZady5VeGc3ZRPxITvxzw6mf3qY/dpz52\nX35+Vqf2DjRhjogLAo7Dqo0VLF2+neraFjJSe7Bw1ghmTx1Ecg+tES8i3UdBL9LFNuyo5vn3Stm9\nv54eSQlcdMZgLju7kMz0ZK9LE5E4pKAX6SK7KupY/H4Jm3YeIgE4u6gfV84cRp+cdK9LE5E4pqAX\nOUUHDjex5MPtrNpYCUDRsDwWnjeCIQW6xEhEvKegFzlJNfUtvP7Jbt5dXYbP7zCkbyYLZ42kaJgW\nnhGRyKGgF+mkg7XNvP7Jbpav20ebL0Dv7DSuOnc4Z44rIFGXyolIhFHQi3RQ1eEmXl+1ixXry/H5\nHXpnp3LpWUM5Z+IATVkrIhFLQS9yApUHG3n14518vKGSgOPQt1c6l509lLPH99PSsSIS8RT0Isew\nt6qeZR/v4pPNlTgO9O+dweXTCzljbF+SEhXwIhIdFPQiR9hVUcerH+/kc1sFwOC+mcydXshpJl/H\n4EUk6ijoRUK276vllZU7WFdaDUBhvyzmzihk8sg+mo9eRKKWgl7i3tY9h3nlo51s3HEQgJGDcrhi\neiFFWnBGRGKAgl7ikuM4bN51iFdW7sTuOQwEl4ydO70QM6SXAl5EYoaCXuKKzx/gM7uftz8rY/u+\nWgAmDO/N3OmFjByU43F1IiJdT0EvcaG2oZUP1u7l3TV7qalvJQGYMqoPl08vZFj/bK/LExFxjYJe\nYtrOilre+ayMTzZX4vM7pKcmMWfaYM6fOpCC3AyvyxMRcZ2CXmKOzx9g9dYq3v68jJKyGgAK8jKY\nPXUQ08f3Iz1VX3sRiR/6H09iRm1jK8vX7uO9NXs5VNcCwMQRvZk9dRDjhuXpGngRiUsKeol6uyrq\neOfzMlZtqsTnD5CWksTsqYM4f+og+uVpeF5E4puCXqKSPxBgzdYDvP3ZHra2D8/npnPB1EHMmNBf\nw/MiIiH631CiSl1jK8vX7ePd1X8fnh8/PI/ZUwczfriG50VEjqSgl4jn8wdYX1rNyg0VrCs5gD/g\nkJqcxPmnDeSCqYPo37un1yWKiEQsBb1EJMdx2F1Zz8oN5XyyqZK6xjYABuX35JyJAzhnQn8y0vT1\nFRE5Ef1PKRGlpqGVVRsrWLm+nLKqBgAy05OZPW0QM8b3Z0hBpqanFRHpBAW9eK7NF2BdyQFWrC9n\nw/aDBByHpMQEThudz4zx/Zgwojc9krT+u4jIyVDQiyccx2FHeR0r15fz6eZKGpp9AAztl8WM8f04\nc1wBWRkpHlcpIhL9FPTSrQ7VtfDRhnI+2lBBeXUjADk9U7j4jCFMn9CPQfmZHlcoIhJbFPTiuvqm\nNtaVHGDVpko27TiIA/RISuT0MX2ZMaE/RcNySUrU0LyIiBsU9OKK6ppm1myrYvXWKrbuqSHgOACM\nGJDNjAn9OX1sX3qmJXtcpYhI7FPQS5dwHId9BxpYve0Aq7dWsaui7ou24QOyOW10PqeNzteUtCIi\n3UxBLyct4Dhs31fLmq3BPffKQ00AJCUmUFSYy2mj85k8Kp/crFSPKxURiV8KeukUnz/All2HWL3t\nAGu2VVFT3wpASnIiU01wr33iiN4alhcRiRAKejmhphYff9uynzVbq1hXWk1TS/BSuMz0ZM6Z0J/T\nRuczrjCXlOQkjysVEZEjKejlH/gDAXZW1LFl1yE27zrEtrIa2nwBAHpnpzJjQj+mjs5n5KAcnS0v\nIhLhFPRCwHEo21//RbDbPYdpbvV/0V7YP5sJw/I4bXS+pqAVEYkyCvo45DgOFQcbvwj2LbsPU9/U\n9kV7QW46Z43LZczQXMYMyWVEYW+qquqO844iIhKpFPRx4kBNUzDUQ+F+OHQSHUBediozRvZjzJBc\nxg7NJS87zcNKRUSkKynoY5DjOFQdbmL7vlq27A4Ge9Xh5i/aszOSOWNsX8YODQZ7fq90DceLiMQo\nBX2Uaw/1nRV17Kqo++LvxtCZ8QAZqT2YMqrPF8E+oE9PBbuISJxQ0EcRx3GoqmkOBXotO8vr2F1Z\n98XKb+0KctMZPzyPwn7ZjBnaiyF9s0hMVLCLiMQjBX2EchyHA6FQ31FRy67QnvqRod43N52iYcFQ\nH9ovi6EFWWSk6Z9VRESClAge8/kDVB1uovJQE/sPNVF5qJGK6saj7qm3h/rQflnBYC/IJEMz0ImI\nyHEo6LuBzx/gQE0zlQcbQ4Ee/LvyYCPVtc2EFnb7kr690hlXmEdh/ywKC7IY2i9LoS4iIp3mWtAb\nYxKBR4CJQAtwm7W2NKx9LvBDwAcsstY+caLXRCrHcWhs8VFT38qBmiYqDwb3zNv30KtrWr5YpjVc\nds8URg7MoW9uOgW5GRTkZdC3Vzp9c9NJT9U2mIiInDo302Q+kGKtnW6MORN4MPQYxphk4CFgGtAI\nrDTGvAycA6Qe7TXdrbXNT21jK3WNbdQ2tH7pdl1jK7WNbdSFPe4PHGW3nOClbMMHZlOQm07f3AwK\nQqGuMBcRke7gZtLMAN4AsNZ+YoyZFtY2Fiix1tYAGGNWADOBs4HXj/Gaoyo/0EBlVT0+v0ObL0Cb\nP4DPH6DN9/e/2/wBfL5A6Dl+2vzOl57j8wVoaPaFAjwY4i1hU8AeS2pKEtkZyRT2yyIrI4WsjGT6\n5KRRkJehMBcRkYjgZgplA7Vh9/3GmERrbSDUVhPWVgfknOA1R3X7/77dZQUnJSaQlZFMQa90snqm\nkJ2RTFZGCtk9gyGeHXY7KyOFVK3WJiIiEc7NoK8FssLuhwd2zRFtWcDhE7zmqF55cJ4uEO8G+flZ\nJ36SnDL1s/vUx+5TH0cWN9cYXQlcCmCMOQsoDmvbAowyxuQaY1IIDtt/dILXiIiISCclOEe7tqsL\nGGMS+PsZ9AC3AFOBTGvt48aYy4F/J7ix8aS19tGjvcZau9WVAkVEROKAa0EvIiIi3nNz6F5EREQ8\npqAXERGJYQp6ERGRGKagFxERiWFRO21btM6LH22MMav5++RG26213/CynlgSmub5x9baWcaYkcDT\nQADYANxjrdWZsqfoiD6eArwCbAs1P2qtfc676qJfaDrzRcBQIBX4b2Az+i53mWP0cRnwKtB+Vdpx\nv8tRG/QcZy596RrGmDQAa+0sr2uJNcaYfwK+BtSHHnoI+IG1drkx5lFgHvCiV/XFgqP08VTgIWvt\nQ95VFXNuAKqstV83xuQC64A16LvclY7Wx/8JPNjR73I0D91/aS59ggvkSNeaBGQYY940xrwT2qCS\nrlECXAW0z+x4mrV2eej268BsT6qKLUf28VTgMmPMB8aYJ4wxmd6VFjMWE5wPBYJ50oa+y13taH3c\nqe9yNAf9UefF96qYGNUA/MRaexFwJ/BH9XHXsNYuIbhEc7vwqZzrCa79IKfgKH38CfA9a+25wHbg\n//OksBhirW2w1tYbY7IIBtK/8eVc0Xf5FB2lj/8V+JROfJej+T/tTs+LL522FfgjgLV2G1AN9Pe0\notgV/t1tX/tButZSa+2a0O0XgSleFhMrjDGDgXeB31tr/4y+y13uiD7+C538Lkdz0GtefPfdQvDc\nB4wxAwiOopR7WlHsWmOMOTd0+xJg+fGeLCflDWPM6aHbFwCfeVlMLDDGFABvAf9krX069LC+y13o\nGH3cqe9yNJ+MtxSYY4xZGbp/i5fFxKgngaeMMe2/qLdo1KTLtZ+N/F3g8dAiT5uA570rKea09/Gd\nwMPGmDaCG6y3e1dSzPgBwaH5fzfGtB9Hvh/4pb7LXeZoffxt4Gcd/S5rrnsREZEYFs1D9yIiInIC\nCnoREZEYpqAXERGJYQp6ERGRGKagFxERiWEKehERkRgWzdfRi8QkY8wC4PsEfz8TCc6G9VNvqzp5\nxpingfestb9z6f1zgKettVcaYwpDnzXMjc8SiUbaoxeJIMaYgcBPgTnW2snA2cB1xpi53lZ2Shz+\nPmmNG3KByS6+v0hU0x69SGTpAyQDPYFD1toGY8xNQDNAaNrLh4AM4ABwh7V2pzHmNOCJ0Hu8Btxg\nrR125N60MSZgrU0MrXb1MFAEJAH/z1r7F2PMzcDFBMNzOPCWtfYeY0wC8GOCS0H7gN9Ya39pjBkJ\nPAL0BhqBe621a4/ycyUc+UBoas/HgMEE50f/F2vtO8aY/wAGAiMJrsH9hLX2f0Lrcj9GcOXKvQQ3\nHv6L4KyCA4wxLwAPAOnGmD8D44FDwHxr7cGO/gOIxBrt0YtEEGvtOuAlYLsx5hNjzI+BJGttaWhK\n0SeA6621UwkG/uOhl/4B+L619jSCqw6270Efa0/634DPrLXTgHOBfzXGtA93n01wedeJwFxjzHhg\nATCdYHieAdwSCurfEZyDeypwB/CXTvy4vwAWhWqYB/wmbLnNCcAc4Ezg+6Hh+TuBdGvtGIJTXp8e\n+vnuBfZZa68muEGRT3Ct7glAJXBdJ2oSiTnaoxeJMNbau40x/wVcFPqzyhhzA7CN4F72K8aY9qdn\nGWP6AAXW2rdCj/2GE8/jPpvgnu+tofsZBPfuHeAja20DgDFmO5AHzASetda2EVwPe0oolKcRXA+h\n/X17GmNyrbWHOvCjzg5+hPlR6H4PYESohnettT6gyhhzkOBc37OB34b6aLcx5p3Q644cLdhnrW1f\n5GMjwVESkbiloBeJIMaYy4AMa+1i4GngaWPMbcA3CC5usd1aOyX03ESgH9DCl8OuLey2094WGvpu\nl0hweH9tqK0fwWWIv0roMMERr28L/4zQSW+HgOb2ekKPDz5GyB9tZCERmGWtPRx67UCCC3TMD/1M\nR9bgJ3iY4UTC16D/4ucXiVcauheJLA3A/xpjhgCEjo0XAauBLUCeMeac0HNvBf5ora0DNhtj5oce\n/2rY+x0IvR6CAdruXeDu0Gf0B9YQPFZ+rFBcDlxljOlhjMkA3gD6AttCow0YY2YD7x/j9Ud733eB\ne0KvLQLWERxZOFYNfyU0DB9aNvk8gkHuQzstIsekoBeJINba94EfAa8aYzYDmwkG34+sta3AQuBB\nY8w64EaCYU/o9v3GmM+Bc8Le8lHg3NDzpwP7Qo//J8Gh+/XAOwSPs2/n6GfIO9baF4GVBDc4PgV+\nZq3dBtwA3BZ6//8BrjnGj/aYMaYu7M8MgsfWzwq99s8ERxjqj1UDwfMR6kI1Pw3sApqACqB9KP9Y\nrxWJW1qmViTGhE6SWxVr15IbYy4FEqy1y0In560GprYP/YvI0Wm4SyT2JBCbe7GbgGeMMf8duv9D\nhbzIiWmPXkREJIbpGL2IiEgMU9CLiIjEMAW9iIhIDFPQi4iIxDAFvYiISAz7/wFybAitPkJz4QAA\nAABJRU5ErkJggg==\n", "text": [ "" ] } ], "prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Database seearching is a slightly different problem however. There are a few different scenarios here:\n", "\n", "1. we may have a database that is growing in size (for example, over months and years as more sequences are discovered);\n", "2. we may have a fixed database, but increasingly be obtaining larger numbers of sequences that we want to search against that database;\n", "3. or, the situation that we find ourselves in as of this writing: both. \n", "\n", "For the purposes of an exercise, think of the database as one long sequence that we want to align against and a collection of query sequence as another long sequence that we want to search. What do you expect a curve for each of the above to look like? " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**The core issue here is that it just takes too long to search each query sequence against the whole database. If we can search against a subset of the database by quickly deciding on certain sequences that are unlikely to match, that may help us to reduce runtime.** A heurtistic in this case would be a rule that we apply to determine which sequecnces we're going to align and which sequences we're not going to align. If we decided to not align against a given reference sequence, it becomes possible that we might miss the best alignment, so we want our heurtistics to be good to make that unlikely. So, when thinking about heurtistics, there are some important considerations:\n", "\n", "1. How often do I fail to find the best alignment? \n", "2. Is my runtime reduced enough that I can tolerate not getting the best alignment this often? \n", "\n", "Let's look at a few heuristics, starting with a silly one first: we'll select a random `p` percent of database to align our query against. We'll start by defining `p` as 10%. " ] }, { "cell_type": "code", "collapsed": false, "input": [ "random()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 10, "text": [ "0.09427767107139828" ] } ], "prompt_number": 10 }, { "cell_type": "code", "collapsed": false, "input": [ "from iab.algorithms import approximated_local_alignment_search_random\n", "%psource approximated_local_alignment_search_random" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\u001b[0;32mdef\u001b[0m \u001b[0mapproximated_local_alignment_search_random\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mquery\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mreference_db\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mp\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m0.10\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maligner\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mlocal_pairwise_align_ssw\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mbest_score\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m0.0\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mbest_match\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mNone\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mbest_a1\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mNone\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mbest_a2\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mNone\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mseq_id\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mseq\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mreference_db\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mrandom\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m<\u001b[0m \u001b[0mp\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0malignment\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0maligner\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mquery\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mseq\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mscore\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0malignment\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mscore\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mscore\u001b[0m \u001b[0;34m>\u001b[0m \u001b[0mbest_score\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mbest_score\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mscore\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mbest_match\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mseq_id\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mbest_a1\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mstr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0malignment\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mbest_a2\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mstr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0malignment\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mbest_a1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mbest_a2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mbest_score\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mbest_match\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\n" ] } ], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's pass our initial `query1` (again, an exact match to a database sequence) and see if we get the right answer, and how much runtime is reduced." ] }, { "cell_type": "code", "collapsed": false, "input": [ "start_time = time()\n", "a1, a2, score, ref_id = approximated_local_alignment_search_random(query1, reference_db)\n", "stop_time = time()\n", "\n", "print(a1)\n", "print(a2)\n", "print(score)\n", "print(ref_id)\n", "print(\"Runtime: %1.4f sec\" % (stop_time - start_time))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "AGTGGGGAATAACCTTGGGAAACCGGGGCTAATACCGCAT-ATGGTCT-ACGGACTAAA-GCTTTATGCGCTCGAGGAGGGGCCCGCGGTCGATTAGCTAGTTGGTGAGGTAATGGCTCACCAAGGCATCGATCGATAGCCGGCCTGAGAGGGCACACGGCCACACTGGCACTGAGACACGGGCCAGGCTCC\n", "AGTGGGGAATAACCCTGGGAAACCGGGGCTAATACCGCATAATCCTGTAAA--GGGAAAGCAGCAATGCGCTGAAAGAGGAGCCCGCGGCCGATTAGCTAGTTGGTGAGGTAATGGCTCACCAAGGCGGTGATCGGTAGCCGGCCTGAGAGGGCACACGGCCACACTGGCACTGAAACACGGGCCAGACTCC\n", "252.0\n", "1127642\n", "Runtime: 0.5750 sec\n" ] } ], "prompt_number": 12 }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this case, we know what the right answer is, so we can run this a bunch of times and figure out how often we'll get that right answer. In this case, we'll iterate over the first 100 sequences in the reference database and search 50 bases of each against the full database." ] }, { "cell_type": "code", "collapsed": false, "input": [ "results = []\n", "\n", "for query_id, query_seq in reference_db[:100]:\n", " query_seq = query_seq[50:101]\n", " a1, a2, score, ref_id = approximated_local_alignment_search_random(query_seq, reference_db)\n", " results.append(ref_id == query_id)\n", "fraction_correct = results.count(True) / len(results)\n", "\n", "print(\"We get the right answer %.2f%% of the time.\" % (fraction_correct * 100))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "We get the right answer 12.00% of the time.\n" ] } ], "prompt_number": 13 }, { "cell_type": "markdown", "metadata": {}, "source": [ "How much was the run time reduced here? How often were we right? What do you think: good heurtistic? \n", "\n", "Let's go with something a little smarter. **We can hypothesize that if the overall composition of a query sequence is different than the overall composition of a reference sequence, it's unlikely that the best alignment will result from that pairwise alignment.** One metric of sequence composition is GC content, so let's use that and only align against sequences whose GC content is within `p` percent of the query sequence." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from iab.algorithms import gc_content, approximated_local_alignment_search_gc\n", "%psource gc_content" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\u001b[0;32mdef\u001b[0m \u001b[0mgc_content\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mseq\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mseq\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcount\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'G'\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0mseq\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcount\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'C'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mseq\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\n" ] } ], "prompt_number": 14 }, { "cell_type": "code", "collapsed": false, "input": [ "gc_contents = {}\n", "for seq_id, seq in reference_db:\n", " gc_contents[seq_id] = gc_content(seq)\n", "\n", "sns.set(style=\"white\", palette=\"muted\")\n", "ax = sns.distplot(gc_contents.values())" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "display_data", "png": "iVBORw0KGgoAAAANSUhEUgAAAeMAAAFVCAYAAADc5IdQAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XuQpXld3/H3uXX3TM/s7IXdnVVuAvLzgglGEV3DrRQi\nKpVIrDKyBjXxUqApolQwkIiWiSlKE43REpWLQkCsUohRYigkQMEiKjddrr9ld1kWZmd2dm49py/n\n+jz54zlPdzPMdPc5/TzndJ/n/aqipvtMX36H2T6f/v4u318tTVMkSdLs1Gc9AEmSqs4wliRpxgxj\nSZJmzDCWJGnGDGNJkmbMMJYkacaau31ACOGpwKtijM8KIdwCvAa4HqgBL4wx3l/uECVJmm87VsYh\nhJeRhe/i6KFfAf5njPEZwCuBJ5U7PEmS5t9u09T3AM8nq4IBbgceFUL4S+AO4N0ljk2SpErYcZo6\nxvi2EMJjtz30WOBCjPHZIYSfB34O+IVrfX4IYRF4CnAaGO57tJIkHWwN4DbgQzHG7l4/adc14yuc\nB/5s9PafA7+8y8c/BXj/mN9DkqTD7mnAnXv94HHD+E7ge4A3Ac8APrHLx58GePOb38zJkyfH/FaS\nJB0uZ86c4Y477oBR/u3VXsM4v03ipcBrQwgvAi4BL9jl84YAJ0+e5JGPfOQ445Ik6TAba2l21zAe\nHV26ffT2A8BzJhqWJEm6Kpt+SJI0Y4axJJIkYWVlhSRJZj0UqZIMY0m0223e8o5P0m63Zz0UqZIM\nY0kALB05OushSJVlGEuSNGOGsVRh29eKXTeWZscwlips+1pxZ2ON//Xee1w3lmbAMJYqbvta8eKS\n68bSLBjGkiTNmGEsSdKMGcaSJM2YYSxJ0owZxlLFdfvw2r9cZa1b23zMY07SdBnGUsU9tAIfuqfH\n/ecXNh+zPaY0XYaxVHH90a2r51YbX/K47TGl6TGMpYrLw/hyp05vMNuxSFVlGEsVNxjmb9U4v+pL\ngjQL/uRJFdcfbr19zjCWZsKfPKni+tumpg1jaTaasx6ApNkajE4vHVsccnmjzuW1PulgnTRJZzsw\nqUIMY6ni8sr45HUD7nl4kY999hL33ncfR5avm+3ApApxTkqquP4wpV6Dm49lqRxP9VlcOjLjUUnV\nYhhLFdfrJ9RrQ5abG9RrKXc/6PkmadoMY6ni+kmNZh3qdbhxOeGL54eeN5amzDCWKirvPz0c1mjW\ns81ajzie7eY6v+Z2EmmaDGOpotrtNn/yrs/QT6DZGIXxsVEYX9EaU1K5DGOpwpqLR4Ctyvj6oymt\nBpxbM4ylaTKMpQobDLNrE/MwbtTh8SebtDsNuq4bS1NjGEsVljf8aIzCOEkSHnNTPlXty4M0Lf60\nSRV2ZWXc7axz5qHTAFxc8+VBmhZ/2qQKGyR5GCebjx072gKgN7zqp0gqwa5hHEJ4agjhPVc89oIQ\nwl+VNyxJ09C/ojIGaI3e7g9qMxmTVEU7HiYMIbwM+CFgddtj3wj8q5LHJWkKrlwzhq1jTn0rY2lq\ndquM7wGeD9QAQgg3Ab8M/Nv8MUmH15VrxgD1GjRq6WbVLKl8O4ZxjPFtwAAghFAHXgf8LNsqZUmH\n1+aaceNLr0tsNhIrY2mKxtnA9U3AE4BXA28Bvi6E8GuljErSVAxGgbu9Ms7ftzKWpmfPDWhjjB8C\nngQQQngM8Ecxxp8ta2CSypdXxo0rwrjVSFnrQZqmV/s0SQXba2V85U9k7SqPSTpkrrabeuv9Gp2e\nP+bSNOxaGccY7wdu3+0xSYdPvpv6yjBujc4dr3WTKz9FUgls+iFVWL6b+spp6nxD13rHMJamwTCW\nKmyQ1KjXUupX7NXKK+W1jluqpWkwjKUKGwxrXzZFDdkGLnCaWpoWw1iqsEHypX2pc/ljTlNL02EY\nSxVmZSwdDIaxVFHDJGWY1mg0vjyM84C2MpamwzCWKmqjlwXt1StjjzZJ02QYSxW10b12GFsZS9Nl\nGEsVtdHNAveqYeyasTRVhrFUUTtNU2+dMzaMpWkwjKWKysP4yu5bkN1p3KynTlNLU2IYSxW105ox\nQKvhNLU0LYaxVFEbvWuvGUN21tjKWJoOw1iqqL1UxuvdhCTxGkWpbIaxVFE7rRlDVhmn2z5OUnkM\nY6miNivjq3Tgyh7P/lzd8OYmqWyGsVRRW0ebrl75tppeoyhNi2EsVVCSJKys9oCd14zByliaBsNY\nqqB2u83nT7eBnXdTA6xuDKY2LqmqDGOpooY0gJR67ep/n1fGTlNL5TOMpYoaJNldxrVrhnFeGRvG\nUtkMY6miBsPaNaeowcpYmibDWKqovDK+FitjaXoMY6mC0jRlMLz25i2wMpamyTCWKqg/SEmp0bhG\nww/YqowNY6l8hrFUQeu7XBIBnjOWpskwlipot0siAGo1OLJYM4ylKTCMpQraaoW5841My4sNp6ml\nKTCMpQrKK+Nr3diUW16qWxlLU2AYSxW02yURuaOLdda7CUPvNJZK1dztA0IITwVeFWN8VgjhycD/\nAIZAF3hhjPFsyWOUVLCN7u4buCCrjAHWu0OOH9n15ULShHasjEMILwNeAyyOHvrvwE/HGJ8FvA34\nuXKHJ6kMe10zPrqYvUSsOVUtlWq3aep7gOcDeffafxFjvGv0dgvYKGtgksqzlzXjJElo1voArLqJ\nSyrVjmEcY3wbMNj2/hmAEMLtwE8Bv17q6CSVYi+VcbezzudPXwKsjKWyjb2BK4TwA8Crge+OMZ4v\nfkiSyra5ZrxDBy6AI4vZOrGVsVSusXZkhBB+CPgJ4JkxxovlDElS2fa6ZuxlEdJ07LUyTkMIdeA3\ngGPA20II7wkh/GJpI5NUmr104IJtl0UYxlKpdq2MY4z3A7eP3r2p1NFImoq8Mt6t6cdmZew0tVQq\nm35IFbTRS6jXUuq1nT/OyliaDsNYqqCNbrpZ9e6k1fQaRWkaDGOpgjZ6Cc09/PRvXqNoGEulMoyl\nClrvJrseawJo1rOOP05TS+UyjKWKGSYp3X66605qyO40PrpYtzKWSmYYSxXTyc8Y76EyhuyyCCtj\nqVyGsVQx66Mqdy9rxgBHvdNYKp1hLFXMencUxnuojJMkYbGRsNFLGA6901gqi2EsVcx6J5umbu1h\nzbjbWedSex3weJNUJsNYqpitynhvH7/Yyl4m3MQllccwlipmfY99qXObXbgMY6k0hrFUMeOsGYM3\nN0nTYBhLFbN1Y9PePr5pZSyVzjCWKqbb39uNTbm8MvassVQew1iqmO7m9Yl7+/h8bfny+qCsIUmV\nZxhLFbNZGdf2eG54uAHAxcvrZQ1JqjzDWKqYTn/Myjifph6dT5ZUPMNYqpitaeo9rhnXDWOpbIax\nVDHdCSvj/HyypOIZxlLF9PpZuO51zbhRS6mRsmplLJXGMJYqZtzKuFaDVtNpaqlMhrFUMZ1+QqtR\no1bb++e0GqlhLJXIMJYqpttPWGiNkcRk/anXOkPS1GsUpTIYxlLFdHsJi2OG8UIjZTCEbt8wlspg\nGEsV0+0nLDTHr4wBVjfswiWVwTCWKqbbTzbvKN6rVtObm6QyGcZSxXT76T4qY8NYKoNhLFXIcJgy\nGKZjb+Ba8E5jqVSGsVQh3UF2PGlx3Mq4mf256p3GUikMY6lC8r7UC+OuGVsZS6Vq7vYBIYSnAq+K\nMT4rhPAE4A+ABPgE8FMxRs86SIdE3n1r3KNN7qaWyrXjr8chhJcBrwEWRw/9GvCKGOPTgRrwT8sd\nnqQiTRrGzVpWEbfXDWOpDLvNVd0DPJ8seAH+UYzxfaO3/y/wnWUNTFLxOr1sImuhOd40dTrcAODi\n5W7hY5K0SxjHGN8GbP9VePuv06vAiTIGJakceWU87tGm5uadxq4ZS2UYdwPX9k7xx4FLBY5FUsl6\nk05T11MgZc07jaVSjBvGHwshPGP09nOB9+30wZIOls3KeMwwrtXyyyIMY6kMu+6mHsl3TL8UeE0I\nYQH4FPAnpYxKUim2NnCNf6pxwWsUpdLsGsYxxvuB20dvfxZ4ZrlDklSWzoRrxpA1/lg1jKVS2PRD\nqpC86ce4a8aQNf7oD9LNdWdJxTGMpQrJ7yMed80YvCxCKpNhLFXI1tGmydaMAdqGsVQ4w1iqkH1N\nU+eXRdgSUyqcYSxVSLefVbW9zhppMl5beS+LkMqz16NNkubA6noPgA/8/WlqSW+sz3XNWCqPlbFU\nIfma8fLy0bE/d8HKWCqNYSxVSG+QBeoE+7e2rRkbxlLRDGOpQvKjTY1JwjivjL0sQiqcYSxVSLef\n0qhDrbafc8buppaKZhhLFdIbJBNVxQCNWlYRt9etjKWiGcZShXT7Kc3G+FUxQNJfB1JWVrvFDkqS\nYSxVSX8weRhvXqPoncZS4QxjqUK6/XSindS5ltcoSqUwjKUK6faTiStjGFXGhrFUOMNYqojBMGWY\nQKMx+ddoNVK6/ZT+wECWimQYSxWRd99q1ievjBds/CGVwjCWKmIzjPdZGYNhLBXNMJYqYiuM97dm\nDIaxVDTDWKqI3iiMJ236AVuVcdswlgplGEsV0emNLonYR2W8tWZsS0ypSIaxVBGuGUsHl2EsVUQR\nu6nzNeNzl9ZIEo83SUUxjKWKyMN4X+eMm1ll/NHPPEy73S5iWJIwjKXK6HSzqeXm5IUxC6MgH9Iq\nYESScoaxVBEr7XUAkmTyzVf5mnHf/VtSoQxjqSK6/Xw39eRfI18z7hnGUqEMY6kieoMsjPdzzrhW\ngyMLNXpuppYKZRhLFdEb5Lup9/d1ji7WnKaWCmYYSxWRT1M39tH0A7IwdppaKpZhLFVEr7//aeok\nSVhoJAyS7EpGScVojvsJIYQ68FrgiUAC/HiMMRY9MEnF2tzAtY8w7nbWWWkDHGW9m3BTISOTNMmP\n5XOA5RjjPwZ+CfjlYockqQzdwf6bfgAstrKXjdWOHbikokwSxhvAiRBCDTgB9IodkqQy9AqojLPP\nz0J4reOWaqkoY09TAx8AloDPADcBzyt0RJJK8SVHm/ZR1OaNP9asjKXCTPI78suAD8QYA/Bk4A0h\nhIVihyWpaN1+Sr2WUqvtbzd1q24YS0WbJIyXgcujty8CLWCfq1CSytYbJPvaSZ1rNvJpasNYKsok\n09S/Cvx+COH9ZEH88hjjRrHDklS0bj+lUd//cSSnqaXijR3GMcZLwPeVMBZJJer1U/bZ7wPYmqZe\ndQOXVBibfkgV0R0khVTGTlNLxTOMpYro9dNC1ozdwCUVzzCWKmAwTBkmFFQZG8ZS0QxjqQK6/VH3\nrQLWjOs1aNZTw1gqkGEsVcBmGBdQGQO0GnbgkopkGEsVsBXGxXy9VtPKWCqSYSxVQLdXbGW80EjZ\n6KUMvUZRKoRhLFVAkWvGkE1Tg2eNpaIYxlIFFL9mPGr8sWEYS0UwjKUK6PS23dhUgNaod9/qxqCY\nLyhVnGEsVUBvYGUsHWSGsVQBWxu4ivl6C6M147ZhLBXCMJYqIF8zrtesjKWDyDCWKqD4c8bZn4ax\nVAzDWKqATmm7qd3AJRXBMJYqoFvwbmrXjKViGcZSBWw1/SioMm66ZiwVyTCWKqDwNeO8A5dhLBXC\nMJYqoFfwmnG9BkcWaq4ZSwUxjKUK6BRcGQMsL9WtjKWCGMZSBRR9axPA8lLDDVxSQQxjqQKKvrUJ\nssp4o5t4jaJUAMNYqoBuP6XVrFErOIzBaxSlIhjGUgV0+wmLzQKTmG1h7FS1tG+GsVQB3V7CQqus\nMHZHtbRfhrFUAd1+QquekibFre8eW8oOG1sZS/tnGEsV0OkPWe906fV7hX3NvDI+e75NkiSFfV2p\nigxjqQJ6/ZRmkYeM2Qrj9370FO12u9CvLVWNYSzNucEwZZgUe8YYtsI4rS0U+nWlKjKMpTlXxhlj\n2Arjnvu3pH1rTvJJIYSXA88DWsBvxRjfUOioJBWmjO5bsLWByzCW9m/syjiE8Ezg22KMtwPPBB5X\n8JgkFaiMvtSwrTJ2M7W0b5NUxs8BPh5C+FPgOuDfFTskSUXqFnxjUy4P476VsbRvk4TxzcCjgO8l\nq4r/DPiaIgclqTidbla61mvFhnGjXuPIQo3ewN7U0n5NMnF1DnhnjHEQY7wb6IQQHlHwuCQV5OLK\navZGWvx88vJS3TVjqQCThPGdwHcBhBC+AlgGzhc5KEnFySvXRoGVcZIkrKyssNTCylgqwNhhHGP8\nP8DHQgh/SzZF/eIYoz+N0gHV7Y/CuMA1425nnXf8zRk6nR6DpMawwDabUhVNdLQpxvhzRQ9EUjl6\now1cRa8ZLx1dZmmxD2uw1km4sdCvLlWLTT+kOdcdFF8Z51rZUWPWOvamlvbDMJbmXC+fpi64AxfA\nwmhuzTCW9scwluZcvsGqXkJlvBXGdv6Q9sMwlubcVm/qMsPYyljaD8NYmnOdXnlrxnkYrxrG0r4Y\nxtKca29kU8gLjRLC2A1cUiEMY2nOtTeyoGw1ig/MrcrYNWNpPwxjac6tbgyp19LCb20C14ylohjG\n0pxrbyQsNMvpkGUYS8UwjKU5194YlrJeDNAyjKVCGMbSHOsNEjq9tLTKuF6r0aynhrG0T4axNMfa\n6+XtpM61Gjb9kPbLMJbm2OX17LLhVkmVcf61rYyl/TGMpTnWHoVxmZXxQiNlo5cyHHqNojQpw1ia\nYytro2nqMivjUeMPzxpLkzOMpTl2ea0PQKte3jRya1R1r24YxtKkDGNpjj18cR2AOv3Svkd+vGl1\nY1Da95DmnWEszbG8L3WrhEsiclbG0v4ZxtIcWy2xL3UuvywiP0YlaXyGsTTHyryxKWdlLO2fYSzN\nsfZGQo20lLuMc/macbvjmrE0KcNYmmOrG0MWmim1WnnfI6+Mz19aJ0ls/iFNwjCW5lh7Iyl1ihq2\n1ow/fs8F2u12qd9LmleGsTSnhsOsTWWZDT9gq9XmkGap30eaZ4axNKc2jzWVXBk3atnUdK9vO0xp\nUoaxNKfySyLKroz73XUatYTuwDCWJmUYS3Pq8hQuici1Gin9QYm7xKQ5ZxhLc2rzLuOSK+PseyR0\nB5CmVsfSJAxjaU5dWs0viZhCGDdSkrTGRtcwliZhGEtz6tzFNQDq9Er/XovNbBPXpTUbf0iTmPgs\nQgjhFuAjwHfEGO8ubkiSitDe7EtdfrW6Fca2xJQmMVFlHEJoAb8LrBU7HElF2TraVH5XrIWGYSzt\nx6TT1L8KvBo4XeBYJBUov7FpGrupF/LKeNUwliYxdhiHEH4EeDjG+M7RQ55nkA6grDJOaU5hA9fi\nqDJesTKWJjJJZfyjwLNDCO8Bngy8IYRwa7HDkrRfeV/qMi+JyLlmLO3P2Bu4YozPyN8eBfJPxhgf\nKnRUkvZtdWO42Te6bPk0tZWxNBmPNklzKElSVqdwY1OuWYdGPbUylia0r2tWYozPKmogkoqz1h2S\npNPpvpVbaqZWxtKErIylOdQehWJ+1/A0LLayaephYhcuaVyGsTSHpnVj03aLrZQkhfa6XbikcRnG\n0hya5o1NucVR8F9sG8bSuAxjaQ5dXs+7b01xzbiVfa8L7f7Uvqc0LwxjaQ5dXpvBNPVoO+ilVStj\naVyGsTSHLk/xLuNc3p/6Qrv8W6KkeWMYS3OoPYM143qyAcBD59en9j2leWEYS3NoZRa7qW2JKU3M\nMJbmUL5m3KyVf31izmsUpckZxtIcWlnt0awnDAbTW7+t17Pd23bhksZnGEtzqL2RsLCvZreTWWyl\n3mksTcAwluZMmqa0N4ZT3byVW2qmrHYS+oPpTY9L88AwluZMp5cwGE5381ZusZX9eWnNs8bSOAxj\nac7MohVmbsmWmNJEDGNpzsyi4UdusZWHsS0xpXEYxtKcyY81TbMvdW7zsghbYkpjMYylOTPLynhp\ntGZsZSyNxzCW5sws14wXGtkvAhcuG8bSOAxjac6srGZB2KrP4HjRMOtLffbSxvS/t3SIGcbSnDm/\nkgVhLZ3+7UlZNW4XLmlchrE0Zy6vZxVxawZrxrVadq+xYSyNxzCW5szZS33qtZTFxmy6YC21Ui+L\nkMZkGEtzJE1TzlwccHQhoVabzRgWmymdXspG10CW9sowlubIytqQ9W7C8uLsekNvNv7wrLG0Z4ax\nNEcePN8F4NjC9NeLc541lsZnGEtz5NS5DsBsK2O7cEljM4ylOXLqXHacaZZhvGR/amlshrE0R/Jp\n6uWF2YVxa7SL+4JhLO2ZYSzNkVPnuiy2apvV6SzUh1nTkbMX7MIl7ZVhLM2JNE158HyXW69vzexY\nE8BCM6uMbfwh7Z1hLM2Ji+0BnV7CyRubMx1Hs55Sr6VcWnMDl7RXY//UhhBawOuBxwCLwH+OMf55\n0QOTNJ5To/Xi225okazPbhy1Giy2sAuXNIZJKuM7gIdjjE8Hvgv4rWKHJGkSD57LwvjE0oA0md2a\nMcBiM+HS6pDBwECW9mKSMP5j4JXbPt+5KOkA+MLD2RnjT9/zBXr96d/YtN2RRp9hAvd+8eJMxyEd\nFmNPU8cY1wBCCMfJgvk/FD0oSeN74KE1AI4uzrYqBji+NOR0G+5/qEd47KxHIx18E23gCiE8Cng3\n8MYY4x8VOyRJkzhzsU+jno7uFJ6t44vZhNn9D3VnPBLpcJhkA9etwDuBF8cY31P8kCSNK0my25qW\nZ3hb03Z5GH/+7Gyny6XDYpIzEK8ATgCvDCHka8fPjTF2ihuWpHGcv9ynP0i5eXl2nbe2W2imLLVS\n7n/IMJb2YpI145cALylhLJIm9MXR5q1ZtsG80okjCQ9dHnJptc/1x1qzHo50oNn0Q5oD951aAWCx\ncXAq0RNHs7Xrex+0Laa0G8NYmgOnL2SXMhxdODjnek8cyar0+04bxtJuDGNpDpy5mG2YOto6SGGc\nVcaGsbQ7w1iaA6cv9mk1UloH4FhT7uhCypHFmtPU0h4YxtIhN0xSzl7qH5hjTblaDR57yyJfPNel\n0zs4Fbt0EBnG0iH38KUegyEsLx6cndQASZJw2/UpaQr3n/Hko7QTw1g65E6NLog4SMeaALqddc6e\nOw/Ava4bSzsyjKVD7tS5rOo8SDupczddl50vvs91Y2lHhrF0yN1/ehWAxfrBOWOcO76U0qhbGUu7\nMYylQ+4gnjHONerwyEcscP+ZDYYzvmNZOsgMY+kQS9OU+850WWolB+pY03aPuWWBbj/dXNuW9OUM\nY+kQe+Bsl8vrCTctH7yqGLId1bdeN+rE5bqxdE2GsXSIffy+bL34oIZxt7PO/V88A8Cn7r9Ekhys\nHd/SQWEYS4fYxz+XhfEjjg1mPJJru/nEAgB//akLtNvtGY9GOpgMY+mQStOUuz63yg3HGhxdOJjr\nxQCtJtx0vM7ljQZpenDHKc2SYSwdUl94uMul1QFf9+ilA9UG80pJknDyREp3AKcvHtwKXpolw1g6\npO66N5vy/ZpHLs54JDvrdtZJOg8D8I4Pr8x4NNLBZBhLh9RHP3sJgMfcdDA3b233lTfCkYWU9961\nysqa1bF0JcNYOoTSNOXTD3RYasEtJw7+j3G9Bo+/eUBvkPL2D56b9XCkA+fg/xRL+jKnznW5tDbk\n5utS2u026SHobvWoG/ocXazxZx98mE7PI07SdoaxdAjdNTrSdP1Sl7ffeR+9/sHrS32lYX+dk8fW\nubw+5F0fuTDr4UgHimEsHULbm30sLh2d8Wj27qtvq9Fq1HjrnWftVS1tYxhLh0yaptx13yonlhsc\nWzxc071LLXj6NxzjzIUef/VJd1ZLOcNYOmQePN/jQnvAE0424BA20fiebzlBrQZ/9O4HuXDhgi0y\nJQxj6dC5azRFvd4+fyjWiq908voG//CxLe470+NX//CTtsiUMIylQyfvR33r9YfvxzdJEk6dOsVy\ncpp6LeWTZ45x/rLnjqXD99MsVdjffmaFD35qhRNH64duvRiyblxvv/M+lpcg3LJGf1jjt9/+sJu5\nVHmGsXQAJUnCysrK5npqmqa89f1n+cU3fo4kSfmx73rEge5HvZN89/cjT3Q4eWLIJx/o8Nb3n53x\nqKTZMoylA6jdbvOWd2Trqb1Bwq+/9Qu89i8e5MZjTV75g7fy1bf0D0Wjj53UavDkR/e54ViDN77z\nNJ89tf5lv4RIVdGc9QAkXV2fJd76vof4yH2nue9Mh8edXOAnnn2Ud//139PtdQ/V+eJradUTXvjM\no/zG29v8lzffx4u+5xF8+OP3ccdzv54TJ07MenjS1IwdxiGEOvDbwD8AusCPxRjvLXpgUhV94ewG\n7/rwQ/xtXOf+s3VgnRpw2/ENvv5kjw/edRHqjbkIYsjWkB948DyPv7nGvQ/DL7zpDM36ET538Ys8\n5Ws3+MavavJVX3k9a2trHD9+nHrdyTzNp0kq438GLMQYbw8hPBX4b6PHJO1BkiS0222OHz9OSo3T\n57vc+YlLvPfvLvD5s9lRpUYdbj424OblDU60Vji+vMSxY7eRDgdsdDZm/AyKtXR0mW9+fJ8bj/U4\ncuQ4H7l3nU880OcTD5zh94EnfsUZlptrvOT7n8itN98w6+FKpZgkjL8deAdAjPFvQgjfXOyQxtPr\nJzad18RS8r4ZKWmavb/eGbKyNuDy+pC7P/cQg2HK4x59C4stSAddjh9fZpiktFfXWFw8QppCp7PB\n8vJR6jXodTe44cQxjiw2WGzVqddrnF/p8/BKj7OX+pw6u8rHPnuRemORhy8PGIxuQKzVUr72K+ss\n1tZZrp1j+eiR0SiXZvL/zTTVajVOXjfg6U8acl19jcsbAx66lHKucz13PwiwyE/+5gM89uTDXHcE\nbrlhieNLKbfeuMwNx1tcf6zJ9ceaLLYa277m6M+ZPCPNg+WlBo3GdP4LmiSMrwMub3t/GEKoxxiv\nlogNgDNnzkwytl2tdYa89Hc+axhrCj5f+Fc80oIGPZbqPY631rnluiHDCxvUF5fYADau6IUx6HVJ\nhgMuX764r+97kL/OG+7usLCY/fKxDCy3vsCNi0PWuZm1wVE+fXfC0B93TcnjbjvCK//lV431Odvy\nrrHTx11pkjC+DBzf9v61ghjgNoA77rhjgm8jSdLsfAH4jj+Y+NNvA/a8n2qSMP4A8Dzgj0MI3wrc\ntcPHfgjUUr5OAAAEdUlEQVR4GnAaGE7wvSRJOkwaZEH8oXE+qZaO2Wg+hFBjazc1wI/GGO8e64tI\nkqRNY4exJEkqlof2JEmaMcNYkqQZM4wlSZoxw1iSpBkr9KKIEMIR4E3AzUAb+OEY47krPuZngB8Y\nvfsXMcZfKnIMs7Bbv+4QwvOAnwcGwOtjjK+dyUBLsofn/4PAS8ie/8eBF8cY52bn4F77tYcQfg84\nH2N8+ZSHWJo9/Ns/haxlbg04BbwwxtibxVjLsIfn/33AK8iau70+xvg7MxloiUZtkV8VY3zWFY/P\n9etebofnP9brXtGV8YuAv48xPh14I/Afrxjc44AXAN8WY/xW4DkhhG8oeAyzsNmvG/j3ZC8+AIQQ\nWsCvAc8GngH8RAjhlpmMsjw7Pf8jwH8Cnhlj/MfACeB7ZzLK8lzz+edCCD8JPInsRXme7PRvXwN+\nD/iRGOPTgP8HjNfO6ODb7d8+/9n/duClIYS5uooqhPAy4DXA4hWPV+F1b6fnP/brXtFhvNm3evTn\nd17x9w8A/2TbbwctYB663n9Jv25ge7/urwXuiTGuxBj7wJ3A06c/xFLt9Pw7ZL98dUbvN5mPf/Pt\ndnr+hBBuB74F+F3mr1XyTs/9icB54GdDCO8Fro8xxqmPsFw7/tsDfeB64AjZv/28/TJ2D/B8vvy/\n6yq87sG1n//Yr3sTh3EI4V+HED6+/X9k6Z/3rW6P3t8UYxzEGC+EEGohhP8KfDTGeM+kYzhArtqv\ne9vfrWz7uy/7/2UOXPP5xxjTGOPDACGEfwMsxxjfNYMxlumazz+EcBvwSuCnmb8ghp3/238EcDvw\nm2S/mH9HCOFZzJednj9klfJHgE8Afx5j3P6xh16M8W1k07BXqsLr3jWf/ySvexOvGccYXwe8bvtj\nIYS3stW3+jhw6crPCyEsAa8n+4d68aTf/4DZqV/3yhV/dxzYX2f9g2fHfuWjF6dfAZ4A/PMpj20a\ndnr+308WSn8BnASOhhA+HWN845THWJadnvt5suooAoQQ3kFWOb5nukMs1TWffwjh0WS/hD0GWAfe\nFEL4/hjjn0x/mFNXhde9HY37ulf0NPUHgO8evf1c4H1XDK4G/G/g72KML5qjTTybz/sq/bo/A3x1\nCOGGEMIC2VTNB6c/xFLt9Pwhm55dBL5v27TNPLnm848x/maM8ZtHmzteBfzhHAUx7Pxvfx9wLITw\n+NH7TyOrEOfJTs9/iawnf3cU0GfJpqyroAqve7sZ63Wv0HaYo0XrN5A1ye4CL4gxnh3toL6HrIH2\nW8j+UfIpu5fHGP+6sEHMwNX6dQPfBByLMb4mhPC9ZFOVdeB1McZXz2ak5djp+QMfHv1v+y9mvxFj\n/NOpDrJEu/37b/u4HwZCjPEV0x9lOfbw337+S0gN+ECM8WdmM9Jy7OH5/wzZptUO2Wvgj8cYrzat\ne2iFEB5L9kvm7aMdxJV43ctd7fkzweuevaklSZoxm35IkjRjhrEkSTNmGEuSNGOGsSRJM2YYS5I0\nY4axJEkzZhhLkjRj/x/N5Tmrp5NWSAAAAABJRU5ErkJggg==\n", "text": [ "" ] } ], "prompt_number": 15 }, { "cell_type": "code", "collapsed": false, "input": [ "%psource approximated_local_alignment_search_gc" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\u001b[0;32mdef\u001b[0m \u001b[0mapproximated_local_alignment_search_gc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mquery\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mreference_db\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mreference_db_gc_contents\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mp\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m0.05\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0maligner\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mlocal_pairwise_align_ssw\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mquery_gc\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mgc_content\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mquery\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mbest_score\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m0.0\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mbest_match\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mNone\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mbest_a1\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mNone\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mbest_a2\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mNone\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mseq_id\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mseq\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mreference_db\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mref_gc\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mreference_db_gc_contents\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mseq_id\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mref_gc\u001b[0m \u001b[0;34m-\u001b[0m \u001b[0mp\u001b[0m \u001b[0;34m<\u001b[0m \u001b[0mquery_gc\u001b[0m \u001b[0;34m<\u001b[0m \u001b[0mref_gc\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0mp\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0malignment\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0maligner\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mquery\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mseq\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mscore\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0malignment\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mscore\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mscore\u001b[0m \u001b[0;34m>\u001b[0m \u001b[0mbest_score\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mbest_score\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mscore\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mbest_match\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mseq_id\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mbest_a1\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mstr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0malignment\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mbest_a2\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mstr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0malignment\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mbest_a1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mbest_a2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mbest_score\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mbest_match\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\n" ] } ], "prompt_number": 16 }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we run our `query1` again, do we get the right answer? How much did we reduce runtime? Do you think this is a better or worse heurtistic? " ] }, { "cell_type": "code", "collapsed": false, "input": [ "start_time = time()\n", "a1, a2, score, ref_id = approximated_local_alignment_search_gc(query1, reference_db, gc_contents)\n", "stop_time = time()\n", "\n", "print(a1)\n", "print(a2)\n", "print(score)\n", "print(ref_id)\n", "print(\"Runtime: %1.4f sec\" % (stop_time - start_time))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "ACCTGCCCTCGAGTGGGGAATAACCTTGGGAAACCGGGGCTAATACCGCATATGGTCTACGGACTAAAGCTTTATGCGCTCGAGGAGGGGCCCGCGGTCGATTAGCTAGTTGGTGAGGTAATGGCTCACCAAGGCATCGATCGATAGCCGGCCTGAGAGGGCACACGGCCACACTGGCACTGAGACACGGGCCAGGCTCC\n", "ACCTGCCCTCGAGTGGGGAATAACCTTGGGAAACCGGGGCTAATACCGCATATGGTCTACGGACTAAAGCTTTATGCGCTCGAGGAGGGGCCCGCGGTCGATTAGCTAGTTGGTGAGGTAATGGCTCACCAAGGCATCGATCGATAGCCGGCCTGAGAGGGCACACGGCCACACTGGCACTGAGACACGGGCCAGGCTCC\n", "400.0\n", "273556\n", "Runtime: 2.6838 sec\n" ] } ], "prompt_number": 17 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's make the alignment a litte harder by deleting some bases from our query. Now what happens?" ] }, { "cell_type": "code", "collapsed": false, "input": [ "query2 = query1[:75] + query1[85:]\n", "start_time = time()\n", "a1, a2, score, ref_id = approximated_local_alignment_search_gc(query2, reference_db, gc_contents)\n", "stop_time = time()\n", "\n", "print(a1)\n", "print(a2)\n", "print(score)\n", "print(ref_id)\n", "print(\"Runtime: %1.4f sec\" % (stop_time - start_time))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "ACCTGCCCTCGAGTGGGGAATAACCTTGGGAAACCGGGGCTAATACCGCATATGGTCTACGGACTAAAGCTTTAT----------AGGGGCCCGCGGTCGATTAGCTAGTTGGTGAGGTAATGGCTCACCAAGGCATCGATCGATAGCCGGCCTGAGAGGGCACACGGCCACACTGGCACTGAGACACGGGCCAGGCTCC\n", "ACCTGCCCTCGAGTGGGGAATAACCTTGGGAAACCGGGGCTAATACCGCATATGGTCTACGGACTAAAGCTTTATGCGCTCGAGGAGGGGCCCGCGGTCGATTAGCTAGTTGGTGAGGTAATGGCTCACCAAGGCATCGATCGATAGCCGGCCTGAGAGGGCACACGGCCACACTGGCACTGAGACACGGGCCAGGCTCC\n", "357.0\n", "273556\n", "Runtime: 3.3852 sec\n" ] } ], "prompt_number": 18 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, let's look at how many times we'd be right if we ran this on a subsequence of each reference sequence in the database." ] }, { "cell_type": "code", "collapsed": false, "input": [ "results = []\n", "\n", "for query_id, query_seq in reference_db[:20]:\n", " query_seq = query_seq[50:151]\n", " a1, a2, score, ref_id = approximated_local_alignment_search_gc(query_seq, reference_db, gc_contents)\n", " results.append(ref_id == query_id)\n", "fraction_correct = results.count(True) / len(results)\n", "\n", "print(\"We get the right answer %.2f%% of the time.\" % (fraction_correct * 100))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "We get the right answer 80.00% of the time.\n" ] } ], "prompt_number": 19 }, { "cell_type": "markdown", "metadata": {}, "source": [ "So this is looking a little better. It doesn't get our runtime quite as low, but we seem to get the right answer a lot more often. Our evaluation here was pretty limited though: our query sequences are always perfect subsequences of sequences in the reference database. If you were evaluating heuristics for a real-world application, what are some of the other things you might want to test?\n", "\n", "**TODO**: Port k-mer composition comparison code from [multiple sequence alignment notebook](http://nbviewer.ipython.org/github/gregcaporaso/An-Introduction-To-Applied-Bioinformatics/blob/master/algorithms/4-multiple-sequence-alignment.ipynb) to look at a third heuristic." ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Is my alignment \"good\"? Determining whether an alignment is statistically significant." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You may have noticed that the score you get back for an alignment isn't extremely informative. It's dependent on the query and reference sequence lengths (and possibly composition, depending on your substitution matrix). An important question then is: **is my alignment score good?**\n", "\n", "Remember that an alignment of a pair of sequences represents a hypothesis about homology between those sequences. So when we are asking whether an alignment is good, what we really want to know is: **what fraction of the time would I obtain a score at least this good if my sequences are not homologous?** If this fraction is high, then our alignment is not good. If it's low, then our alignment is good. What is defined as high and low in this context is dependent on **how often you are willing to be wrong**. \n", "\n", "If being wrong 5% of the time is acceptable (i.e., you can tolerate a false positive, or calling a pair of sequences homologous when they are actually not, one in twenty times) then you'd set your *cut-off fraction* as 0.05. This fraction is usually call our **alpha**. You have to balance this with how often you can accept false negatives, or deciding that a pair of sequences are not homologous when they actually are. If alpha is high, then you will err on the side of false positives. If alpha is low, then you will err on the side of false negatives. **There is not a hard-and-fast rule for whether false positives or false negatives are better**. It's application specific, so you need to understand your application when making this decision.\n", "\n", "In general, when might you prefer to have false positives? When might you prefer to have false negatives?\n", "\n", "**In this section, we are going to empiricially determine if a pairwise alignment is better than we would expect by chance.** For each pair of sequences, we're going to align them to determine the score of the alignment, and then we're going to align pairs of sequences that are similar to the query and reference, but that we know are not homologous. We'll do this by *shuffling* or randomizing the order of the bases in the query sequences, and performing another pairwise alignment." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from random import shuffle, choice\n", "from collections import Counter" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 20 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We're going to use python's `random.shuffle` and `random.choice` functions for this. `random.choice` randomly selects an element from a sequence, so we can use it to contruct a random sequence of length `n` as follows: " ] }, { "cell_type": "code", "collapsed": false, "input": [ "n = 10\n", "seq = [choice('ACGT') for e in range(n)]\n", "print(\"\".join(seq), Counter(seq))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "CTTATCCGGA Counter({'C': 3, 'T': 3, 'A': 2, 'G': 2})\n" ] } ], "prompt_number": 21 }, { "cell_type": "markdown", "metadata": {}, "source": [ "`random.shuffle` randomly re-orders the order of the elements in a sequence, but keeps the composition and length of the sequence the same. Run this next cell a few times to see the sequences that are generated." ] }, { "cell_type": "code", "collapsed": false, "input": [ "shuffle(seq)\n", "print(\"\".join(seq), Counter(seq))\n", "shuffle(seq)\n", "print(\"\".join(seq), Counter(seq))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "TATCGACGCT Counter({'C': 3, 'T': 3, 'A': 2, 'G': 2})\n", "CTAATGTCCG Counter({'C': 3, 'T': 3, 'A': 2, 'G': 2})\n" ] } ], "prompt_number": 22 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's generate a random query sequence. Then we'll generate 99 random variants of that sequence with ``shuffle`` and compute the pairwise alignment for each of those variants against the query sequence. We'll then look at the distribution of those scores." ] }, { "cell_type": "code", "collapsed": false, "input": [ "query_seq = [choice('ACGT') for e in range(40)]\n", "''.join(query_seq)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 23, "text": [ "'TTACGGCGCTCGCAAAAAGAAATAGGAGCACGCGGGCTCC'" ] } ], "prompt_number": 23 }, { "cell_type": "code", "collapsed": false, "input": [ "from iab.algorithms import generate_random_score_distribution\n", "%psource generate_random_score_distribution" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\u001b[0;32mdef\u001b[0m \u001b[0mgenerate_random_score_distribution\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mquery_sequence\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0msubject_sequence\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mn\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m99\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0maligner\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mlocal_pairwise_align_ssw\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mresult\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mrandom_sequence\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlist\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mquery_sequence\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mi\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mrange\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mn\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mshuffle\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrandom_sequence\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0malignment\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0maligner\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrandom_sequence\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0msubject_sequence\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mresult\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0malignment\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mscore\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mresult\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\n" ] } ], "prompt_number": 24 }, { "cell_type": "code", "collapsed": false, "input": [ "random_scores = generate_random_score_distribution(query_seq, query_seq)\n", "print(random_scores)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[11.0, 14.0, 11.0, 12.0, 13.0, 10.0, 14.0, 13.0, 14.0, 14.0, 11.0, 14.0, 9.0, 17.0, 14.0, 10.0, 11.0, 9.0, 14.0, 18.0, 10.0, 12.0, 10.0, 10.0, 12.0, 16.0, 11.0, 8.0, 15.0, 10.0, 10.0, 18.0, 9.0, 8.0, 12.0, 10.0, 12.0, 10.0, 10.0, 12.0, 10.0, 12.0, 12.0, 12.0, 10.0, 10.0, 16.0, 12.0, 10.0, 10.0, 8.0, 10.0, 9.0, 8.0, 9.0, 16.0, 10.0, 10.0, 16.0, 14.0, 10.0, 10.0, 8.0, 10.0, 9.0, 11.0, 12.0, 10.0, 9.0, 10.0, 10.0, 11.0, 10.0, 13.0, 10.0, 10.0, 12.0, 10.0, 10.0, 10.0, 8.0, 11.0, 10.0, 14.0, 14.0, 10.0, 10.0, 12.0, 11.0, 16.0, 10.0, 16.0, 9.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0]\n" ] } ], "prompt_number": 25 }, { "cell_type": "code", "collapsed": false, "input": [ "import matplotlib.pyplot as plt\n", "\n", "n, bins, patches = plt.hist(random_scores, facecolor='green', alpha=0.5, bins=range(0,100,1), normed=1)\n", "plt.xlabel('Score')\n", "plt.ylabel('Frequency')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 26, "text": [ "" ] }, { "metadata": {}, "output_type": "display_data", "png": "iVBORw0KGgoAAAANSUhEUgAAAf0AAAFkCAYAAAAqpeIDAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGuVJREFUeJzt3X2QXFd55/GvZoQ8DKMXl2xi2SRZKOCBwMoERCxLYFCI\nbMeLE/FSBY7LgEAB24RQlU0BpjaonGQX8EuKmEWsI78H5aW8hSEBLOM12IJhV0AMCBbzyEiQKlh5\nl0gajS1Lsjwz+8e9I7eG0Uxrpq9Ho/P9VKmq7zn3dJ++Nepf33NP3zNnZGQESZJ08uua6Q5IkqSn\nh6EvSVIhDH1Jkgph6EuSVAhDX5KkQhj6kiQVYm5TTxwRXcAGYClwCFiXmTvG2e9vgN2ZeVW9/SCw\nr67emZnvaqqPkiSVpLHQB9YA8zJzRUScA1xflx0REe8BXgrcX2/3AGTmqgb7JUlSkZoc3l8JbAbI\nzK3AstbKiFgB/BZwIzCnLj4b6I2IeyLivvrLgiRJ6oAmz/QXAIMt20MR0ZWZwxGxBPgI8AbgLS37\n7AeuzcybI+IFwN0R8cLMHB7vBSLiFOCVwC5gqJF3IUnSiaMbWAJ8KzMPHW/jJkN/EJjfst3VEt5v\nBk4DvgScQXV2/xDwD8CPATLz4YjYTfXmfn6M13gl8LUG+i5J0ons1cDXj7dRk6HfD1wM3BkRy4Ft\noxWZ+UngkwAR8XYgMvOOiLgc+PfAeyPiTKrRgl0TvMYugE2bNnHGGWc08y4kSTpBPPLII1x66aUw\ncTYeU5OhfxewOiL66+21EXEJ0JeZG4/R5ibg1ojYMtrmWEP7tSGAM844g+c85zkd6bQkSbPAlC5p\nNxb6mTkCXDGmePs4+93e8vhJ4LKm+iRJUsm8OY8kSYUw9CVJKoShL0lSIQx9SZIKYehLklQIQ1+S\npEIY+pIkFcLQlySpEIa+JEmFMPQlSSqEoS9JUiEMfUmSCmHoS5JUCENfkqRCGPqSJBXC0JckqRCG\nviRJhTD0JUkqhKEvSVIhDH1Jkgph6EuSVAhDX5KkQhj6kiQVwtCXJKkQhr4kSYUw9CVJKsTcpp44\nIrqADcBS4BCwLjN3jLPf3wC7M/OqdtvMZkNDQwwMDBxVtmjRIrq7u2eoR5KkUjQW+sAaYF5mroiI\nc4Dr67IjIuI9wEuB+9ttM9sNDAxw9ear6enrAeDgYwdZf+F6Fi9ePMM9kySd7Joc3l8JbAbIzK3A\nstbKiFgB/BZwIzCnnTYni56+HnoX9tK7sPdI+EuS1LQmQ38BMNiyPVQP3xMRS4CPAH/EU4E/YRtJ\nkjQ9TQ7vDwLzW7a7MnO4fvxm4DTgS8AZQG9E/GiSNpIkaRqaPIvuBy4CiIjlwLbRisz8ZGYuy8xV\nwMeATZl5+0RtJEnS9DR5pn8XsDoi+uvttRFxCdCXmRvbbdNg/yRJKkpjoZ+ZI8AVY4q3j7Pf7ZO0\nkSRJHeAkOUmSCmHoS5JUCENfkqRCGPqSJBXC0JckqRCGviRJhTD0JUkqhKEvSVIhDH1Jkgph6EuS\nVAhDX5KkQhj6kiQVwtCXJKkQhr4kSYUw9CVJKoShL0lSIQx9SZIKYehLklQIQ1+SpEIY+pIkFcLQ\nlySpEIa+JEmFMPQlSSqEoS9JUiEMfUmSCmHoS5JUiLlNPXFEdAEbgKXAIWBdZu5oqX8T8EFgBNiU\nmTfU5Q8C++rddmbmu5rqoyRJJWks9IE1wLzMXBER5wDX12VERDfwUeAVwH7ghxHxGeBxgMxc1WC/\nJEkqUpPD+yuBzQCZuRVYNlqRmUPAizLzUeB0oBt4Ajgb6I2IeyLivvrLgiRJ6oAmQ38BMNiyPVQP\n+QOQmcMR8UbgO8BXqc7y9wPXZuYFwOXAptY2kiRp6poM1EFgfutrZeZw6w6Z+VngLOAU4G3AdmBT\nXfcwsBtY0mAfJUkqRpOh3w9cBBARy4FtoxURsSAiHoiIeZk5QnWGPwSspbr2T0ScSTVasKvBPkqS\nVIwmJ/LdBayOiP56e21EXAL0ZebGeuLelog4DHwP+AzVtf1bI2LLaJuxowOSJGlqGgv9+gz+ijHF\n21vqNwIbx9Q/CVzWVJ8kSSqZk+QkSSqEoS9JUiEMfUmSCmHoS5JUCENfkqRCGPqSJBXC0JckqRCG\nviRJhTD0JUkqhKEvSVIhDH1Jkgph6EuSVAhDX5KkQhj6kiQVwtCXJKkQhr4kSYUw9CVJKoShL0lS\nIQx9SZIKYehLklQIQ1+SpEIY+pIkFcLQlySpEIa+JEmFMPQlSSqEoS9JUiHmNvXEEdEFbACWAoeA\ndZm5o6X+TcAHgRFgU2beMFkbSZI0dU2e6a8B5mXmCuBDwPWjFRHRDXwUeB1wLnBlRCyu25wyXhtJ\nkjQ9TYb+SmAzQGZuBZaNVmTmEPCizHwUOB3oBp6o29w9XhtJkjQ9TYb+AmCwZXuoHr4HIDOHI+KN\nwHeArwL7J2sjSZKmrslAHQTmt75WZg637pCZnwXOAk4B3tZOG0mSNDVNhn4/cBFARCwHto1WRMSC\niHggIuZl5gjVWf7QRG0kSdL0NDZ7H7gLWB0R/fX22oi4BOjLzI0R8RlgS0QcBr4HfKbe76g2DfZP\nkqSiNBb69Rn8FWOKt7fUbwQ2jtN0bBtJktQBTpKTJKkQhr4kSYUw9CVJKoShL0lSIQx9SZIKYehL\nklQIQ1+SpEIY+pIkFcLQlySpEIa+JEmFMPQlSSqEoS9JUiEMfUmSCmHoS5JUCENfkqRCGPqSJBXC\n0JckqRCGviRJhTD0JUkqhKEvSVIhDH1Jkgph6EuSVAhDX5KkQhj6kiQVYu5kO0TEl4Bbgc9l5uHm\nuyRJkprQzpn+x4HfBR6OiE9FxCsb7pMkSWrApGf6mfkA8EBEPBN4M/DZiBgENgKfzsxDDfdRkiR1\nwKShDxARq4DLgNXA3cA/1o//CbjgGG26gA3AUuAQsC4zd7TUXwK8H3gS+D5wZWaORMSDwL56t52Z\n+a4pvC9JkjRGO9f0/xX4CXAL8N7MPFCX3w98e4Kma4B5mbkiIs4Brq/LqEcN/gJ4aWYejIi/A14f\nEfcCZOaqqb8lSZI0nnau6b8OeEtm3gHMiYjnA2TmUGb+5gTtVgKb6323Asta6g4C52bmwXp7LnAA\nOBvojYh7IuK++suCJEnqgHZC/yLq8AaeDXwhIt7TRrsFwGDL9lA95E9mjmTmLwAi4n3AszLzfwD7\ngWsz8wLgcmDTaBtJkjQ97QTqe4BXAWTmT4GXA+9ro90gML/1tTJzeHQjIroi4jqqkYQ31cXbgU31\naz0M7AaWtPFakiRpEu2E/lzgiZbtJ4DhY+zbqp9qlICIWA5sG1N/I3AK8IaWYf61VNf+iYgzqUYL\ndrXxWpIkaRLtzN7/HPCViPhHYA7wRqpZ+5O5C1gdEf319tp6xn4f1QTAdwJb6ucG+ARwM3BrRGwZ\nbdM6OiBJkqaundD/ENXv888DDgN/nZmfm6xRZo4AV4wp3t7yuPsYTS9ro0+SJOk4TTq8X4f3Q8Cd\nwOeBvRFxXtMdkyRJndXO7/Q/BVwM7ARGWqr8Lb0kSbNIO8P75wMxelMeSZI0O7Uze39nm/tJkqQT\nWDtn+nuBH0bEN6jupAcwkpnvbK5bkiSp09oJ/c31v9Hr+XM4+tq+JEmaBdpZWve2iHgu8BLgHuBX\nM3Nn4z2TJEkdNem1+oh4K9XNeP4aWAz0R4S/pZckaZZpZ4LeB6lWzBvMzEeo7r1/VaO9kiRJHddO\n6A9l5pHV8jJzFzDUXJckSVIT2pnI97/r5W/nRcTLgCuB7zbbLUmS1GntnOm/FzgLOADcQrVk7pVN\ndkqSJHVeO7P3H6NadEeSJM1i7dx7f7ylbf9PZj6ngf5IkqSGtHOmf+QSQEQ8A1gDrGiyU5IkqfOO\n6576mXk4M+8Efruh/kiSpIa0M7z/9pbNOVR35jvUWI8kSVIj2vnJ3iqeutf+CPBvwFsa65EkSWpE\nO9f03/E09EOSJDWsneH9n1Cd4c8Zp3okM5/X8V5JkqSOa2d4fxOwH7gROAz8AfAq4E8Z/4uAJEk6\nAbUT+hdl5stbtm+MiHdn5v9tqlOSJKnz2vrJXkSc3/J4DdWteCVJ0izSzpn+OuAzEfErVMP5DwFv\na7RXkiSp49qZvf8g8BsRcRpwKDMfbb5bkiSp0yYd3o+IfxcR9wL/C5gfEV+NiOc23zVJktRJ7Qzv\n3whcB3wMeIRqNv/twHkTNYqILmADsJTqDn7rMnNHS/0lwPuBJ4HvUy3XO2eiNpIkaeramch3Wmbe\nA5CZw5l5E7CwjXZrgHmZuYJqad7rRysi4pnAXwCvzcxX1c/3+rrNKeO1kSRJ09NO6D8eEUeW0Y2I\nVwEH22i3EtgMkJlbgWUtdQeBczNz9Hnm1mUrgbuP0UaSJE1DO6H/J8AXgedHxPeAv6calp/MAo7+\nad9QPeRPZo5k5i8AIuJ9wLMy896J2kiSpOlp55r+s4FXAi8EuoEfZWY7q+wNAvNbtrsyc3h0ow7z\na4DnA29qp40kSZq6dkL/2sz8DeAHx/nc/cDFwJ0RsRzYNqb+Rqoh/Tdk5kibbSRJ0hS1E/o7IuIW\nYCtPXcsfycw7Jml3F7A6Ivrr7bX1jP0+4NvAO4EtwFciAuAT47Vp+51IkqQJHTP0I+KszPw5sJvq\np3TLx+wyYejXZ+9XjCne3vK4+xhNx7aRJEkdMNGZ/heA38zMd0TEn2bmdU9XpyRJUue1OzP+0kZ7\nIUmSGufP4SRJKoShL0lSISa6pv+SiPhJ/fjMlsdQzd5/XoP9kiRJHTZR6L/waeuFJElq3DFDPzN/\n+jT2Q5IkNcxr+pIkFcLQlySpEIa+JEmFMPQlSSqEoS9JUiEMfUmSCmHoS5JUCENfkqRCGPqSJBXC\n0JckqRCGviRJhTD0JUkqhKEvSVIhDH1Jkgph6EuSVAhDX5KkQhj6kiQVwtCXJKkQhr4kSYWY29QT\nR0QXsAFYChwC1mXmjjH79AL3Au/MzKzLHgT21bvszMx3NdVHSZJK0ljoA2uAeZm5IiLOAa6vywCI\niGXAfwPOBEbqsh6AzFzVYL8kSSpSk8P7K4HNAJm5FVg2pn4e1ZeAbCk7G+iNiHsi4r76y4IkSeqA\nJkN/ATDYsj1UD/kDkJnfyMyfjWmzH7g2My8ALgc2tbaRJElT12SgDgLzW18rM4cnabMd2ASQmQ8D\nu4ElzXRPkqSyNBn6/cBFABGxHNjWRpu1VNf+iYgzqUYLdjXVQUmSStLkRL67gNUR0V9vr42IS4C+\nzNx4jDY3A7dGxJbRNm2MDkiSpDY0FvqZOQJcMaZ4+zj7rWp5/CRwWVN9kiSpZE6SkySpEIa+JEmF\nMPQlSSqEoS9JUiEMfUmSCmHoS5JUCENfkqRCNHlzHk3R0NAQAwMDR5UtWrSI7u7uGeqRJOlkYOif\ngAYGBrh689X09PUAcPCxg6y/cD2LFy+e4Z5JkmYzQ/8E1dPXQ+/C3pnuhiTpJOI1fUmSCmHoS5JU\nCENfkqRCGPqSJBXC0JckqRCGviRJhTD0JUkqhKEvSVIhDH1Jkgph6EuSVAhDX5KkQhj6kiQVwtCX\nJKkQhr4kSYUw9CVJKoShL0lSIeY29cQR0QVsAJYCh4B1mbljzD69wL3AOzMz22kjSZKmpskz/TXA\nvMxcAXwIuL61MiKWAVuA5wIj7bSRJElT12TorwQ2A2TmVmDZmPp5VCGfx9FGwNDQELt37/6lf0ND\nQzPdNUnSCayx4X1gATDYsj0UEV2ZOQyQmd8AiIi226gyMDDA1Zuvpqev50jZwccOsv7C9SxevHgG\neyZJOpE1GfqDwPyW7XbCeyptitTT10Pvwt6Z7oYkaRZpcni/H7gIICKWA9saaiNJktrQ5Jn+XcDq\niOivt9dGxCVAX2ZubLdNg/2TJKkojYV+Zo4AV4wp3j7OfqsmaSNJkjrAm/NIklSIJof31YbhoWH2\n7NlzVNmePXsYHnb+oiSpswz9GXZo/yGueeAaTj391CNle3ftpWdRD330zWDPJEknG0P/BDD253cH\nBg/MYG8kSScrr+lLklQIQ1+SpEIY+pIkFcLQlySpEE7ka9DQ0BADAwNHlflzPEnSTDH0GzTeanj+\nHE+SNFMM/Yb5czxJ0onCa/qSJBXC0JckqRCGviRJhTD0JUkqhKEvSVIhDH1Jkgph6EuSVAhDX5Kk\nQhj6kiQVwtCXJKkQ3oa3g8YusDOTi+uMt9gPwKJFi+ju7p6BHkmSZpqh30FjF9iZycV1xlvs5+Bj\nB1l/4XoWL178tPdHkjTzDP0Oa11gZ6YX1xm72I8kqWxe05ckqRCGviRJhWhseD8iuoANwFLgELAu\nM3e01F8M/BnwJHBLZt5Ulz8I7Kt325mZ72qqj5IklaTJa/prgHmZuSIizgGur8uIiGcAfwUsAx4H\n+iPi88CjAJm5qsF+SZJUpCaH91cCmwEycytVwI96MfDjzNyXmYeBrwOvAc4GeiPinoi4r/6yIEmS\nOqDJ0F8ADLZsD9VD/qN1+1rqHgUWAvuBazPzAuByYFNLG0mSNA1NBuogML/1tTJz9E41+8bUzQf2\nAtuBTQCZ+TCwG1jSYB8lSSpGk6HfD1wEEBHLgW0tdT8CXhARp0bEPOA84H8Ca6mu/RMRZ1KNCOxq\nsI+SJBWjyYl8dwGrI6K/3l4bEZcAfZm5MSL+BLiH6ovHzZm5KyJuBm6NiC2jbVpGByRJ0jQ0FvqZ\nOQJcMaZ4e0v9F4AvjGnzJHBZU32SJKlkTpKTJKkQhr4kSYUw9CVJKoShL0lSIQx9SZIKYehLklQI\nQ1+SpEIY+pIkFaLJO/KpQ4aHhtmzZ8+R7T179jA87I0KJUnHx9CfBQ7tP8Q1D1zDqaefCsDeXXvp\nWdRDH30z3DNJ0mxi6M8SPX099C7sBeDA4IEZ7o0kaTbymr4kSYUw9CVJKoShL0lSIQx9SZIK4US+\ngg0NDTEwMPBL5YsWLaK7u3sGeiRJapKhX7CBgQGu3nw1PX09R8oOPnaQ9ReuZ/HixTPYM0lSEwz9\nwrX+FFCSdHLzmr4kSYUw9CVJKoShL0lSIQx9SZIK4US+k0Q7K/G5Wp8klc3QP0m0sxKfq/VJUtkM\n/ZNIOyvxuVqfJJXLa/qSJBWisTP9iOgCNgBLgUPAuszc0VJ/MfBnwJPALZl502RtJEnS1DV5pr8G\nmJeZK4APAdePVkTEM4C/AlYDrwHeHRHPrtucMl4bSZI0PU2G/kpgM0BmbgWWtdS9GPhxZu7LzMPA\n14Hz6jZ3H6ONJEmahiYn8i0ABlu2hyKiKzOH67p9LXWPAgsnaTOeboBvfvOb7Ny5syro7mbhwoUd\negvHZ2BggL0/28vjex4HYN//2wdzYeTAyJF9xpZNZZ+mnhfg0OOHeOihh1i0aFFHj40kaWpaP48f\neeSR0YdTWgq1ydAfBOa3bLeG974xdfOBgUnajGcJwFVXXTX93uqILWyZ6S5Ikia2BDjuOW9Nhn4/\ncDFwZ0QsB7a11P0IeEFEnArspxravxYYmaDNeL4FvBrYBQx1tvuSJJ1wuqkC/1tTaTxnZGRk8r2m\nICLm8NRMfIC1wCuAvszcGBGvBz5CNa/g5sz89HhtMnN7Ix2UJKkwjYW+JEk6sXhzHkmSCmHoS5JU\nCENfkqRCGPqSJBVi1q6y5336m1HfIvkW4NeBU4C/BB4CbgOGgR8A781MZ4BOU33r6X8BXkd1bG/D\nY9xREXEV1c+AnwH8V6qfEt+Gx7kj6s/hm4AXUh3TP6T6+fRteIynLSLOAT6Wmasi4vmMc1wj4g+B\nd1OtY/OXmfnFiZ5zNp/pH/Pe/pqWS4FfZOZ5wIXAp6iO7YfrsjnA789g/04K9ZerG6nuUzGHai0K\nj3EHRcRrgXPrz4jXAs/Dv+VOOx94Vma+Cvhz4L/gMe6IiPgAsJHq5AvG+YyIiDOA9wErgAuAj0bE\nvImedzaH/kT39tfU3Ul1/wSo/j4OAy/PzNHb9N0N/M5MdOwkcy3waaobS4HHuAnnA9+PiM8B/wz8\nE/AKj3NHHQAW1vdYWQg8gce4U34MvJEq4GH8z4hXAv2ZeTgzB+s2S3/pmVrM5tAf9z79M9WZk0Vm\n7s/MxyJiPtUXgP/E0X8nj1H959YURcQ7qEZTvlwXzeGp/9jgMe6U06luCPZm4HLg7/A4d1o/0EN1\nl9UbgRvwGHdEZn6Wash+VOtxbV2vZrx1bI5pNofk8d6nX22KiF8FvgLckZl/T3UNadToOgmaurXA\n6oj4KvAy4HaqgBrlMe6MfwO+nJlP1nf2PMjRH4ge5+n7ANWZZlD9Ld9BNX9ilMe4c1o/hxcw/no1\n84G9Ez3JbA79fuAigDbv0682RMSvAF8GPpCZt9XF34mI19SPfxdckWc6MvM1mfnazFwFfBd4G7DZ\nY9xxX6eal0JEnAn0Avd5nDvqWTw14rqXanK4nxfNGO+4fhN4dUScEhELqZat/8FETzJrZ+8Dd1Gd\nLfXX22tnsjMnkQ9TnQ19JCJGr+2/H7ihniDyQ+C/z1TnTlIjwH8ENnqMOyczvxgR50XEN6lOcK4E\nforHuZOuBW6NiK9RneFfRfWLFI9x54z+8uGXPiPq2fs3AF+j+hv/cGY+MdGTee99SZIKMZuH9yVJ\n0nEw9CVJKoShL0lSIQx9SZIKYehLklQIQ1+SpELM5t/pS+qQiHgz1cJVc6lOBu7IzOtmtleSOs0z\nfalwEXEWcB2wOjNfBpwLvDUiLp7ZnknqNM/0JZ1GdTe1ZwF7M3N/RLwdOBgRv0P1haAL+FfgD6iW\nA/4E8NtUdwv728y8pl7K9pp63+8DfwRsAF4CdAMfz8x/eDrfmKSjeUc+SUTEBmAd8B3gq1Qr0iVV\n0J+fmdsi4j9TLQU8TLWs55uoVli7H7gaeJzq9ti/lpmPRsTHgJ9n5icjYgHVehm/l5k/eVrfnKQj\nDH1JAETEEuCC+t/vA+uBt2bmK8bsdyfVNf9/rrf/GPh1qvXqP56Zy+vybwPPpFpjHaqVwf44M7/4\nNLwdSeNweF8qXET8B6A3M+8EbgNui4h1VEP5rfstoAruLo5e27uLpz5LDowpvzQzv1u3PwPY3cR7\nkNQeJ/JJ2g98NCJ+DSAi5lBdh/8X4LSIeHG93weB9wBfAd4eEV0R0Uv15eArHP1FgLrsyvo5l1Bd\nOnhOw+9F0gQMfalwmXk/8OfAFyLiIeAhqgC/CrgMuCMivge8CPgocCPwM+B7wIPA5zPz8/XTtV4v\nvBp4ZkR8H7gP+IDX86WZ5TV9SZIK4Zm+JEmFMPQlSSqEoS9JUiEMfUmSCmHoS5JUCENfkqRCGPqS\nJBXi/wPf9f9+/sJ+uQAAAABJRU5ErkJggg==\n", "text": [ "" ] } ], "prompt_number": 26 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we'll compute the score for aligning the query sequence against itself. How does the actual score compare to the random distribution of scores? What does that suggest about our alignment?" ] }, { "cell_type": "code", "collapsed": false, "input": [ "alignment = local_pairwise_align_ssw(query_seq, query_seq)\n", "print(alignment.score())\n", "\n", "# plot the distribution of random scores, but add in the actual score\n", "n, bins, patches = plt.hist(random_scores + [alignment.score()], facecolor='green', alpha=0.5, bins=range(0,100,1), normed=1)\n", "plt.xlabel('Score')\n", "plt.ylabel('Frequency')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "80.0\n" ] }, { "metadata": {}, "output_type": "pyout", "prompt_number": 27, "text": [ "" ] }, { "metadata": {}, "output_type": "display_data", "png": "iVBORw0KGgoAAAANSUhEUgAAAf0AAAFkCAYAAAAqpeIDAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGvlJREFUeJzt3X2QHVd55/GvZoQ8DKMXlwyxDEkWCnggsIKAiIUEBoUI\nO16ciJcqcFwGhBWwTQhV2RRgaoPKSXYBv6QILGId+T0oL+UtDAlgGdY2Fgy7MsSAYDGPjASpgpV3\niaTR2LIl23dm/+ge+WoYzVyNbns0c76fqqm63adP33O7pPu7ffp0n3mjo6NIkqS5r2emGyBJkp4c\nhr4kSYUw9CVJKoShL0lSIQx9SZIKYehLklSI+U3tOCJ6gE3AcuAwsCEzd02w3d8AezPzsnr5XuBA\nXbw7My9qqo2SJJWksdAH1gELMnNVRJwJXF2vOyIi3gO8GPhavdwHkJlrGmyXJElFarJ7fzWwFSAz\ntwMr2gsjYhXwW8A1wLx69UuA/oi4PSLuqH8sSJKkLmjyTH8RMNy23IqInswciYhlwEeANwJvbdvm\nIHBlZl4XEc8DbouI52fmyERvEBGnAK8A9gCtRj6FJEknj15gGfCtzDx8vJWbDP1hYGHbck9beL8F\nOA34MnA61dn9fcA/AD8GyMz7I2Iv1Yf7+THe4xXA1xtouyRJJ7NXA9843kpNhv4gcB5wS0SsBHaM\nFWTmp4BPAUTEO4DIzJsj4mLg3wPvjYgzqHoL9kzyHnsAtmzZwumnn97Mp5Ak6STxwAMPcMEFF8Dk\n2XhMTYb+rcDaiBisl9dHxPnAQGZuPkada4EbImLbWJ1jde3XWgCnn346z3rWs7rSaEmSZoFpXdJu\nLPQzcxS4ZNzqnRNsd1Pb68eBC5tqkyRJJfPhPJIkFcLQlySpEIa+JEmFMPQlSSqEoS9JUiEMfUmS\nCmHoS5JUCENfkqRCGPqSJBXC0JckqRCGviRJhTD0JUkqhKEvSVIhDH1Jkgph6EuSVAhDX5KkQhj6\nkiQVwtCXJKkQhr4kSYUw9CVJKoShL0lSIQx9SZIKYehLklQIQ1+SpEIY+pIkFcLQlySpEPOb2nFE\n9ACbgOXAYWBDZu6aYLu/AfZm5mWd1pEkScevyTP9dcCCzFwFfAi4evwGEfEe4MXAaKd1JEnS9DQZ\n+quBrQCZuR1Y0V4YEauA3wKuAeZ1UkeSJE1fk6G/CBhuW27V3fdExDLgI8Af8UTgT1pnrmi1Wuzd\nu/eov1arNdPNkiQVoLFr+lThvbBtuSczR+rXbwFOA74MnA70R8SPpqgzJwwNDXH51svpG+gD4NBD\nh9h4zkaWLl06wy2TJM11TZ5FDwLnAkTESmDHWEFmfiozV2TmGuBjwJbMvGmyOnNJ30Af/Yv76V/c\nfyT8JUlqWpNn+rcCayNisF5eHxHnAwOZubnTOg22T5KkojQW+pk5ClwybvXOCba7aYo6kiSpC+bU\nIDlJknRshr4kSYUw9CVJKoShL0lSIQx9SZIKYehLklQIQ1+SpEIY+pIkFcLQlySpEIa+JEmFMPQl\nSSqEoS9JUiEMfUmSCmHoS5JUCENfkqRCGPqSJBXC0JckqRCGviRJhTD0JUkqhKEvSVIhDH1Jkgph\n6EuSVAhDX5KkQhj6kiQVwtCXJKkQhr4kSYWY39SOI6IH2AQsBw4DGzJzV1v5m4EPAqPAlsz8ZL3+\nXuBAvdnuzLyoqTZKklSSxkIfWAcsyMxVEXEmcHW9jojoBT4KvBw4CPwwIj4LPAyQmWsabJckSUVq\nsnt/NbAVIDO3AyvGCjKzBbwgMx8Eng70Ao8CLwH6I+L2iLij/rEgSZK6oMnQXwQMty236i5/ADJz\nJCLeBHwHuIvqLP8gcGVmng1cDGxpryNJkqavyUAdBha2v1dmjrRvkJmfA54JnAK8HdgJbKnL7gf2\nAssabKMkScVoMvQHgXMBImIlsGOsICIWRcTdEbEgM0epzvBbwHqqa/9ExBlUvQV7GmyjJEnFaHIg\n363A2ogYrJfXR8T5wEBmbq4H7m2LiMeA7wGfpbq2f0NEbBurM753QJIkTU9joV+fwV8ybvXOtvLN\nwOZx5Y8DFzbVJkmSSuYgOUmSCmHoS5JUCENfkqRCGPqSJBXC0JckqRCGviRJhTD0JUkqhKEvSVIh\nDH1Jkgph6EuSVAhDX5KkQhj6kiQVwtCXJKkQhr4kSYUw9CVJKoShL0lSIQx9SZIKYehLklQIQ1+S\npEIY+pIkFcLQlySpEIa+JEmFMPQlSSqEoS9JUiEMfUmSCmHoS5JUiPlN7TgieoBNwHLgMLAhM3e1\nlb8Z+CAwCmzJzE9OVUeSJE1fk2f664AFmbkK+BBw9VhBRPQCHwVeB7wSuDQiltZ1TpmojiRJOjFN\nhv5qYCtAZm4HVowVZGYLeEFmPgg8HegFHq3r3DZRHUmSdGKaDP1FwHDbcqvuvgcgM0ci4k3Ad4C7\ngINT1ZEkSdPXZKAOAwvb3yszR9o3yMzPAc8ETgHe3kkdSZI0PU2G/iBwLkBErAR2jBVExKKIuDsi\nFmTmKNVZfmuyOpIk6cQ0NnofuBVYGxGD9fL6iDgfGMjMzRHxWWBbRDwGfA/4bL3dUXUabJ8kSUVp\nLPTrM/hLxq3e2Va+Gdg8QdXxdSRJUhc4SE6SpEIY+pIkFcLQlySpEIa+JEmFMPQlSSqEoS9JUiEM\nfUmSCmHoS5JUCENfkqRCGPqSJBXC0JckqRCGviRJhTD0JUkqhKEvSVIhDH1Jkgph6EuSVAhDX5Kk\nQhj6kiQVwtCXJKkQhr4kSYUw9CVJKoShL0lSIQx9SZIKYehLklSI+VNtEBFfBm4APp+ZjzXfJEmS\n1IROzvQ/DvwucH9EfDoiXtFwmyRJUgOmPNPPzLuBuyPiqcBbgM9FxDCwGfhMZh5uuI2SJKkLpgx9\ngIhYA1wIrAVuA/6xfv1PwNnHqNMDbAKWA4eBDZm5q638fOD9wOPA94FLM3M0Iu4FDtSb7c7Mi6bx\nuSRJ0jidXNP/V+AnwPXAezPzkXr914BvT1J1HbAgM1dFxJnA1fU66l6DvwBenJmHIuLvgDdExFcB\nMnPN9D+SJEmaSCfX9F8HvDUzbwbmRcRzATKzlZm/OUm91cDWetvtwIq2skPAKzPzUL08H3gEeAnQ\nHxG3R8Qd9Y8FSZLUBZ2E/rnU4Q08A/hiRLyng3qLgOG25Vbd5U9mjmbmLwAi4n3A0zLzfwAHgSsz\n82zgYmDLWB1JknRiOgnU9wCvAsjMnwIvA97XQb1hYGH7e2XmyNhCRPRExFVUPQlvrlfvBLbU73U/\nsBdY1sF7SZKkKXQS+vOBR9uWHwVGjrFtu0GqXgIiYiWwY1z5NcApwBvbuvnXU137JyLOoOot2NPB\ne0mSpCl0Mnr/88CdEfGPwDzgTVSj9qdyK7A2Igbr5fX1iP0BqgGA7wK21fsG+ARwHXBDRGwbq9Pe\nOyBJkqavk9D/ENX9+WcBjwF/nZmfn6pSZo4Cl4xbvbPtde8xql7YQZskSdJxmrJ7vw7v+4BbgC8A\n+yPirKYbJkmSuquT+/Q/DZwH7AZG24q8l16SpFmkk+791wMx9lAeSZI0O3Uyen93h9tJkqSTWCdn\n+vuBH0bEN6mepAcwmpnvaq5ZkiSp2zoJ/a3139j1/HkcfW1fkiTNAp1MrXtjRDwbeBFwO/Crmbm7\n8ZZJkqSumvJafUS8jephPH8NLAUGI8J76SVJmmU6GaD3QaoZ84Yz8wGqZ+9f1mirJElS13US+q3M\nPDJbXmbuAVrNNUmSJDWhk4F8/7ue/nZBRLwUuBT4brPNkiRJ3dbJmf57gWcCjwDXU02Ze2mTjZIk\nSd3Xyej9h6gm3ZEkSbNYJ8/en2hq2/+Tmc9qoD2SJKkhnZzpH7kEEBFPAdYBq5pslCRJ6r7jeqZ+\nZj6WmbcAv91QeyRJUkM66d5/R9viPKon8x1urEWSJKkRndyyt4YnnrU/Cvwb8NbGWiRJkhrRyTX9\ndz4J7ZAkSQ3rpHv/J1Rn+PMmKB7NzOd0vVWSJKnrOune3wIcBK4BHgP+AHgV8KdM/ENAkiSdhDoJ\n/XMz82Vty9dExLsz8/821ShJktR9Hd2yFxGvb3u9jupRvJIkaRbp5Ex/A/DZiPgVqu78+4C3N9oq\nSZLUdZ2M3r8X+I2IOA04nJkPNt8sSZLUbVN270fEv4uIrwL/C1gYEXdFxLObb5okSeqmTrr3rwGu\nAj4GPEA1mv8m4KzJKkVED7AJWE71BL8Nmbmrrfx84P3A48D3qabrnTdZHUmSNH2dDOQ7LTNvB8jM\nkcy8FljcQb11wILMXEU1Ne/VYwUR8VTgL4DXZuar6v29oa5zykR1JEnSiekk9B+OiCPT6EbEq4BD\nHdRbDWwFyMztwIq2skPAKzNzbD/z63WrgduOUUeSJJ2ATkL/T4AvAc+NiO8Bf0/VLT+VRRx9a1+r\n7vInM0cz8xcAEfE+4GmZ+dXJ6kiSpBPTyTX9ZwCvAJ4P9AI/ysxOZtkbBha2Lfdk5sjYQh3mVwDP\nBd7cSR1JkjR9nYT+lZn5G8APjnPfg8B5wC0RsRLYMa78Gqou/Tdm5miHdSRJ0jR1Evq7IuJ6YDtP\nXMsfzcybp6h3K7A2Igbr5fX1iP0B4NvAu4BtwJ0RAfCJiep0/EkkSdKkjhn6EfHMzPw5sJfqVrqV\n4zaZNPTrs/dLxq3e2fa69xhVx9eRJEldMNmZ/heB38zMd0bEn2bmVU9WoyRJUvd1OjL+gkZbIUmS\nGuftcJIkFcLQlySpEJNd039RRPykfn1G22uoRu8/p8F2SZKkLpss9J//pLVCkiQ17pihn5k/fRLb\nIUmSGuY1fUmSCmHoS5JUCENfkqRCGPqSJBXC0JckqRCGviRJhTD0JUkqhKEvSVIhDH1Jkgph6EuS\nVAhDX5KkQhj6kiQVwtCXJKkQhr4kSYUw9CVJKoShL0lSIQx9SZIKYehLklQIQ1+SpELMb2rHEdED\nbAKWA4eBDZm5a9w2/cBXgXdlZtbr7gUO1JvszsyLmmqjJEklaSz0gXXAgsxcFRFnAlfX6wCIiBXA\nfwPOAEbrdX0AmbmmwXZJklSkJrv3VwNbATJzO7BiXPkCqh8B2bbuJUB/RNweEXfUPxYkSVIXNBn6\ni4DhtuVW3eUPQGZ+MzN/Nq7OQeDKzDwbuBjY0l5HkiRNX5OBOgwsbH+vzByZos5OYAtAZt4P7AWW\nNdM8SZLK0mToDwLnAkTESmBHB3XWU137JyLOoOot2NNUAyVJKkmTA/luBdZGxGC9vD4izgcGMnPz\nMepcB9wQEdvG6nTQOyBJkjrQWOhn5ihwybjVOyfYbk3b68eBC5tqkyRJJXOQnCRJhTD0JUkqhKEv\nSVIhDH1Jkgph6EuSVAhDX5KkQhj6kiQVwtCXJKkQTT6RT9PUarUYGho6at2SJUvo7e2doRZJkuYC\nQ/8kNDQ0xOVbL6dvoA+AQw8dYuM5G1m6dOkMt0ySNJsZ+iepvoE++hf3z3QzJElziNf0JUkqhKEv\nSVIhDH1Jkgph6EuSVAhDX5KkQhj6kiQVwtCXJKkQhr4kSYUw9CVJKoShL0lSIQx9SZIKYehLklQI\nQ1+SpEIY+pIkFcLQlySpEPOb2nFE9ACbgOXAYWBDZu4at00/8FXgXZmZndSRJEnT0+SZ/jpgQWau\nAj4EXN1eGBErgG3As4HRTupIkqTpazL0VwNbATJzO7BiXPkCqpDP46gjSZKmqcnQXwQMty236u57\nADLzm5n5s+Opo0qr1WLv3r2/9NdqtWa6aZKkk1hj1/Spwnth23JPZo40UKc4Q0NDXL71cvoG+o6s\nO/TQITaes5GlS5fOYMskSSezJs+iB4FzASJiJbCjoTpF6hvoo39x/5G/9h8AkiRNpMkz/VuBtREx\nWC+vj4jzgYHM3NxpnQbbJ0lSURoL/cwcBS4Zt3rnBNutmaKOJEnqAgfJSZJUiCa799WBkdYI+/bt\nO2rdvn37GBlx/KIkqbsM/Rl2+OBhrrj7Ck59+qlH1u3fs5++JX0MMDCDLZMkzTWG/klgbCT+mEeG\nH5nB1kiS5iqv6UuSVAhDX5KkQhj6kiQVwtCXJKkQDuRrUKvVYmho6Kh13o4nSZophn6DJpoYx9vx\nJEkzxdBvmLfjSZJOFl7TlySpEIa+JEmFMPQlSSqEoS9JUiEMfUmSCmHoS5JUCENfkqRCGPqSJBXC\n0JckqRCGviRJhTD0JUkqhM/e76Lxs+rN5Ix6E83wB7BkyRJ6e3tnoEWSpJlm6HfR+Fn1ZnJGvYlm\n+Dv00CE2nrORpUuXPuntkSTNPEO/y9pn1ZvpGfXGz/AnSSqb1/QlSSpEY2f6EdEDbAKWA4eBDZm5\nq638PODPgMeB6zPz2nr9vcCBerPdmXlRU22UJKkkTXbvrwMWZOaqiDgTuLpeR0Q8BfgrYAXwMDAY\nEV8AHgTIzDUNtkuSpCI12b2/GtgKkJnbqQJ+zAuBH2fmgcx8DPgG8BrgJUB/RNweEXfUPxYkSVIX\nNBn6i4DhtuVW3eU/VnagrexBYDFwELgyM88GLga2tNWRJEknoMlAHQYWtr9XZo7dtH5gXNlCYD+w\nE9gCkJn3A3uBZQ22UZKkYjQZ+oPAuQARsRLY0Vb2I+B5EXFqRCwAzgL+J7Ce6to/EXEGVY/Angbb\nKElSMZocyHcrsDYiBuvl9RFxPjCQmZsj4k+A26l+eFyXmXsi4jrghojYNlanrXdAkiSdgMZCPzNH\ngUvGrd7ZVv5F4Ivj6jwOXNhUmyRJKpmD5CRJKoShL0lSIQx9SZIKYehLklQIQ1+SpEIY+pIkFcLQ\nlySpEIa+JEmFaPKJfOqSkdYI+/btO7K8b98+RkZ8UKEk6fgY+rPA4YOHueLuKzj16acCsH/PfvqW\n9DHAwAy3TJI0mxj6s0TfQB/9i/sBeGT4kRlujSRpNvKaviRJhTD0JUkqhKEvSVIhDH1JkgrhQL6C\ntVothoaGfmn9kiVL6O3tnYEWSZKaZOgXbGhoiMu3Xk7fQN+RdYceOsTGczaydOnSGWyZJKkJhn7h\n2m8FlCTNbV7TlySpEIa+JEmFMPQlSSqEoS9JUiEcyDdHdDITn7P1SWqStwGf/Az9OaKTmficrU9S\nk7wN+ORn6M8hnczE52x9kprkbcAnN6/pS5JUiMbO9COiB9gELAcOAxsyc1db+XnAnwGPA9dn5rVT\n1ZEkSdPX5Jn+OmBBZq4CPgRcPVYQEU8B/gpYC7wGeHdEPKOuc8pEdSRJ0olpMvRXA1sBMnM7sKKt\n7IXAjzPzQGY+BnwDOKuuc9sx6kiSpBPQ5EC+RcBw23IrInoyc6QuO9BW9iCweIo6E+kFuOeee9i9\ne3e1oreXxYsXd+kjHJ+hoSH2/2w/D+97GIAD/+8AzIfRR0aPbDN+3XS2aWq/AIcfPsx9993HkiVL\nunpsJM19478Dwe+Ubmg/dg888MDYy2ndA9lk6A8DC9uW28P7wLiyhcDQFHUmsgzgsssuO/HW6oht\nbJvpJkiaQ/xOacQy4LjHvDUZ+oPAecAtEbES2NFW9iPgeRFxKnCQqmv/SmB0kjoT+RbwamAP0Opu\n8yVJOun0UgX+t6ZTed7o6OjUW01DRMzjiZH4AOuBlwMDmbk5It4AfIRqXMF1mfmZiepk5s5GGihJ\nUmEaC31JknRy8eE8kiQVwtCXJKkQhr4kSYUw9CVJKsSsnWXP5/Q3o35E8vXArwOnAH8J3AfcCIwA\nPwDem5mOAD1B9aOn/wV4HdWxvRGPcVdFxGVUtwE/BfivVLcS34jHuSvq7+FrgedTHdM/pLp9+kY8\nxicsIs4EPpaZayLiuUxwXCPiD4F3U81j85eZ+aXJ9jmbz/SP+Wx/nZALgF9k5lnAOcCnqY7th+t1\n84Dfn8H2zQn1j6trqJ5TMY9qLgqPcRdFxGuBV9bfEa8FnoP/lrvt9cDTMvNVwJ8D/wWPcVdExAeA\nzVQnXzDBd0REnA68D1gFnA18NCIWTLbf2Rz6kz3bX9N3C9XzE6D69/EY8LLMHHuk1m3A78xEw+aY\nK4HPUD1YCjzGTXg98P2I+Dzwz8A/AS/3OHfVI8Di+hkri4FH8Rh3y4+BN1EFPEz8HfEKYDAzH8vM\n4brO8l/aU5vZHPoTPqd/phozV2Tmwcx8KCIWUv0A+E8c/e/kIar/3JqmiHgnVW/KV+pV83jiPzZ4\njLvl6VQPBHsLcDHwd3icu20Q6KN6yuo1wCfxGHdFZn6Oqst+TPtxbZ+vZqJ5bI5pNofk8T6nXx2K\niF8F7gRuzsy/p7qGNGZsngRN33pgbUTcBbwUuIkqoMZ4jLvj34CvZObj9ZM9D3H0F6LH+cR9gOpM\nM6j+Ld9MNX5ijMe4e9q/hxcx8Xw1C4H9k+1kNof+IHAuQIfP6VcHIuJXgK8AH8jMG+vV34mI19Sv\nfxecPeNEZOZrMvO1mbkG+C7wdmCrx7jrvkE1LoWIOAPoB+7wOHfV03iix3U/1eBwvy+aMdFxvQd4\ndUScEhGLqaat/8FkO5m1o/eBW6nOlgbr5fUz2Zg55MNUZ0MfiYixa/vvBz5ZDxD5IfDfZ6pxc9Qo\n8B+BzR7j7snML0XEWRFxD9UJzqXAT/E4d9OVwA0R8XWqM/zLqO5I8Rh3z9idD7/0HVGP3v8k8HWq\nf+MfzsxHJ9uZz96XJKkQs7l7X5IkHQdDX5KkQhj6kiQVwtCXJKkQhr4kSYUw9CVJKsRsvk9fUpdE\nxFuoJq6aT3UycHNmXjWzrZLUbZ7pS4WLiGcCVwFrM/OlwCuBt0XEeTPbMknd5pm+pNOonqb2NGB/\nZh6MiHcAhyLid6h+EPQA/wr8AdV0wJ8AfpvqaWF/m5lX1FPZXlFv+33gj4BNwIuAXuDjmfkPT+YH\nk3Q0n8gniYjYBGwAvgPcRTUjXVIF/eszc0dE/GeqqYBHqKb1fDPVDGtfAy4HHqZ6PPavZeaDEfEx\n4OeZ+amIWEQ1X8bvZeZPntQPJ+kIQ18SABGxDDi7/vt9YCPwtsx8+bjtbqG65v/P9fIfA79ONV/9\nxzNzZb3+28BTqeZYh2pmsD/OzC89CR9H0gTs3pcKFxH/AejPzFuAG4EbI2IDVVd++3aLqIK7h6Pn\n9u7hie+SR8atvyAzv1vXPx3Y28RnkNQZB/JJOgh8NCJ+DSAi5lFdh/8X4LSIeGG93QeB9wB3Au+I\niJ6I6Kf6cXAnR/8QoF53ab3PZVSXDp7V8GeRNAlDXypcZn4N+HPgixFxH3AfVYBfBlwI3BwR3wNe\nAHwUuAb4GfA94F7gC5n5hXp37dcLLweeGhHfB+4APuD1fGlmeU1fkqRCeKYvSVIhDH1Jkgph6EuS\nVAhDX5KkQhj6kiQVwtCXJKkQhr4kSYX4/5TVJOlbgllEAAAAAElFTkSuQmCC\n", "text": [ "" ] } ], "prompt_number": 27 }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Let's do this experiment again, but this time quanitfy the result by computing the fraction of the random alignments that achieve equal or better scores than the random sequences.**" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from iab.algorithms import fraction_better_or_equivalent_alignments\n", "%psource fraction_better_or_equivalent_alignments" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\u001b[0;32mdef\u001b[0m \u001b[0mfraction_better_or_equivalent_alignments\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mquery_sequence\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0msubject_sequence\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mn\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m99\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0maligner\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mlocal_pairwise_align_ssw\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mrandom_scores\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mgenerate_random_score_distribution\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mquery_sequence\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0msubject_sequence\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mn\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0maligner\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0maligner\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0malignment\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0maligner\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mquery_sequence\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msubject_sequence\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mcount_better\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0me\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mrandom_scores\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0me\u001b[0m \u001b[0;34m>=\u001b[0m \u001b[0malignment\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mscore\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mcount_better\u001b[0m \u001b[0;34m+=\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mcount_better\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mn\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\n" ] } ], "prompt_number": 28 }, { "cell_type": "code", "collapsed": false, "input": [ "print(fraction_better_or_equivalent_alignments(query_seq, query_seq))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "0.0\n" ] } ], "prompt_number": 29 }, { "cell_type": "markdown", "metadata": {}, "source": [ "What does this tell us about the quality of our alignment?\n", "\n", "Let's now try this for some harder cases, where the query and subject sequences are not identical. \n", "\n", "First, let's generate a longer subject sequence at random. Then, we'll create a random query sequence and compare it. Since we're doing this in two random steps, we know that these sequences are not homologous. Does the resulting fraction reflect that?" ] }, { "cell_type": "code", "collapsed": false, "input": [ "subject = \"\".join([choice('ACGT') for e in range(250)])\n", "query = \"\".join([choice('ACGT') for e in range(250)])\n", "\n", "print(query)\n", "print(subject)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "ACGGACTGTTTGTTACACGCCGTAGGACTTCTAAGTCGAGCCGGAACAGGGGCCGCGCCGCCTCCAGGGCTCTCTTAATGCATCTTTAGGAAGTAGACCTTTCTTACTGAAAGAATCAACACTCGAGCCGTCCGACTGATGTAAGCGATCGTACACGCAGCTCACTAACAAATAAATGGTATACTTACACCATTAACCTTAAATGTATATGTCAGATAGGCAAGGAATGCACCCTTAATTGGCCGGGTTG\n", "ACATTCTTTCTCCGAAGCTTTAGCTGTACGGTGATTGAAATTCCTAAATGGTCTGGTTTTATGAACGAGTCGCCGAAGATTAATATACCAGGCCAATTAACGTCAAGACGGGTGTGTATGCGACATATGTGTAGGTATGTTGGTCGTTATTCAAAGCCAGTCAACTGACAATTCCGTGATTCCAGGCCCTCATTAACACATACCGTCGCTACCTCTCTGATAATGTCGCGCACACTAGTAGCTGTAGCGG\n" ] } ], "prompt_number": 30 }, { "cell_type": "code", "collapsed": false, "input": [ "print(fraction_better_or_equivalent_alignments(query,subject))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "0.33\n" ] } ], "prompt_number": 31 }, { "cell_type": "markdown", "metadata": {}, "source": [ "**We've now look at two extremes: where sequences are obviously homologous, and where sequences are obviously not homologous. Next, we'll explore the region between these.** We'll obscurce the homology of a pair of sequences by randomly introducing some number of substitutions to make them approximately ``percent_id`` equal. By doing this, we can explore how this strategy works for increasingly more distantly related pairs of sequences." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def query_at_percent_id(percent_id, subject):\n", " result = []\n", " for b in subject:\n", " if random() < percent_id:\n", " result.append(b)\n", " else:\n", " # choose a base at random that is not the current base\n", " # i.e., simulate a substitution event\n", " result.append(choice([c for c in 'ACGT' if c != b]))\n", " return ''.join(result)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 32 }, { "cell_type": "code", "collapsed": false, "input": [ "q = query_at_percent_id(0.95,subject)\n", "print(q)\n", "print(subject)\n", "print(fraction_better_or_equivalent_alignments(q,subject))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "ACATTATTTCTCCGAAGCTTTAGCTGTACGGTGATTGAAATTCCTAAATGGTCTGGTTTTATGAACGAGTCTCCGTAGATTAATATTCCAGGCCAATTAACGTCAAGACGGCTGTGTATGCGACATATGTGTAGGTATGTTGGTCGTTATGCAAAGCCAGTCAACTGACAATTCCGTGATTCCAGGCCCTCATTAACACTTACCGTCGCAACCTCTCTGATAATGTCGCGCACACTAGTAGCTGTAGCGG\n", "ACATTCTTTCTCCGAAGCTTTAGCTGTACGGTGATTGAAATTCCTAAATGGTCTGGTTTTATGAACGAGTCGCCGAAGATTAATATACCAGGCCAATTAACGTCAAGACGGGTGTGTATGCGACATATGTGTAGGTATGTTGGTCGTTATTCAAAGCCAGTCAACTGACAATTCCGTGATTCCAGGCCCTCATTAACACATACCGTCGCTACCTCTCTGATAATGTCGCGCACACTAGTAGCTGTAGCGG\n", "0.0" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n" ] } ], "prompt_number": 33 }, { "cell_type": "code", "collapsed": false, "input": [ "q = query_at_percent_id(0.25,subject)\n", "print(q)\n", "print(subject)\n", "print(fraction_better_or_equivalent_alignments(q,subject))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "GAAACGCCACCCAGGAGATCGTTTTCCAAGCATCGCTGTACCTTTAGTGAGCGGCTCTCTCAAAATTTTAAAAATCGCTTTCAGGCCCTTAGAACTAGATAGCTGGATTAGATTCTTATGTCGACAAGATCTAAGTTCCCATTACCTAAAGCCCACTATCGTAGCCCTATCGCATCTGTTCCACTCTCGTGCTCGCGTAGAACTAATATTGTGGTATATGGGCGACACGTCGTTTCCCGTCCGGACACTA\n", "ACATTCTTTCTCCGAAGCTTTAGCTGTACGGTGATTGAAATTCCTAAATGGTCTGGTTTTATGAACGAGTCGCCGAAGATTAATATACCAGGCCAATTAACGTCAAGACGGGTGTGTATGCGACATATGTGTAGGTATGTTGGTCGTTATTCAAAGCCAGTCAACTGACAATTCCGTGATTCCAGGCCCTCATTAACACATACCGTCGCTACCTCTCTGATAATGTCGCGCACACTAGTAGCTGTAGCGG\n", "0.55" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n" ] } ], "prompt_number": 34 }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this case we know that our input sequences are \"homologous\" because `query` is derived from `subject`. Our method detected that homology when `query` was roughly 95% identical to `subject` (because we got a low fraction) but did not detect that homology when `query` was roughly 25% identicial to `subject`. This gives us an idea of the limit of detection of this method, and is a **real-world problem that biologists face: as sequences are more divergent from one another, detecting homology becomes increasingly difficult.**\n", "\n", "**If we want to gain some insight into our limit of detection, we can run a simulation.** If we simulate alignment of different pairs of sequences in steps of different percent identities, we can see where we start failing to observe homology. The following cell illustrates a very simplistic simulation, though this still takes a few minutes to run. \n", "\n", "What does this tell us about our limit of detection for homology? What are some things that we might want to do more robustly if we weren't as concerned about runtime?" ] }, { "cell_type": "code", "collapsed": false, "input": [ "percent_ids = np.arange(0.0, 1.0, 0.05)\n", "num_trials = 10\n", "results = []\n", "\n", "for percent_id in percent_ids:\n", " p_values = []\n", " for i in range(num_trials):\n", " subject = \"\".join([choice('ACGT') for e in range(250)])\n", " q = query_at_percent_id(percent_id,subject)\n", " p = fraction_better_or_equivalent_alignments(q,subject)\n", " p_values.append(p)\n", " results.append((percent_id, np.median(p_values), np.mean(p_values)))\n", "pd.DataFrame(results, columns=[\"Percent id between query and subject\", \n", " \"Median p-value\", \"Mean p-value\"])" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Percent id between query and subjectMedian p-valueMean p-value
0 0.00 0.310 0.327
1 0.05 0.730 0.556
2 0.10 0.670 0.559
3 0.15 0.410 0.400
4 0.20 0.485 0.539
5 0.25 0.840 0.747
6 0.30 0.725 0.744
7 0.35 0.570 0.571
8 0.40 0.655 0.535
9 0.45 0.445 0.459
10 0.50 0.115 0.191
11 0.55 0.000 0.004
12 0.60 0.000 0.000
13 0.65 0.000 0.000
14 0.70 0.000 0.000
15 0.75 0.000 0.000
16 0.80 0.000 0.000
17 0.85 0.000 0.000
18 0.90 0.000 0.000
19 0.95 0.000 0.000
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 39, "text": [ " Percent id between query and subject Median p-value Mean p-value\n", "0 0.00 0.310 0.327\n", "1 0.05 0.730 0.556\n", "2 0.10 0.670 0.559\n", "3 0.15 0.410 0.400\n", "4 0.20 0.485 0.539\n", "5 0.25 0.840 0.747\n", "6 0.30 0.725 0.744\n", "7 0.35 0.570 0.571\n", "8 0.40 0.655 0.535\n", "9 0.45 0.445 0.459\n", "10 0.50 0.115 0.191\n", "11 0.55 0.000 0.004\n", "12 0.60 0.000 0.000\n", "13 0.65 0.000 0.000\n", "14 0.70 0.000 0.000\n", "15 0.75 0.000 0.000\n", "16 0.80 0.000 0.000\n", "17 0.85 0.000 0.000\n", "18 0.90 0.000 0.000\n", "19 0.95 0.000 0.000" ] } ], "prompt_number": 39 } ], "metadata": {} } ] }