{ "metadata": { "name": "", "signature": "sha256:4760d0f817697b8b03d8addd60b88dfe2cecb94766621c19139a0c8161d6c283" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Generalized dynamic programming for multiple sequence alignment" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's possible to generalize Smith-Waterman and Needleman-Wunsch, the dynamic programming algorithms that we explored for pairwise sequence alignment, to identify the optimal alignment of more than two sequences. Remember that our scoring scheme for pairwise global alignment with Needleman-Wunsch, using a linear gap penalty $d$, looked like the following:\n", "\n", "$$\n", "\\begin{align}\n", "& F(0, 0) = 0\\\\\n", "& F(i, 0) = F(i-1, 0) - d\\\\\n", "& F(0, j) = F(0, j-1) - d\\\\\n", "\\\\\n", "& F(i, j) = \\max \\begin{pmatrix}\n", "F(i-1, j-1) + s(c_i, c_j)\\\\\n", "F(i-1, j) - d\\\\\n", "F(i, j-1) - d\\\\\n", "\\end{pmatrix}\n", "\\end{align}\n", "$$\n", "\n", "To generalize this to three sequences, we could create three-dimensional dynamic programming and traceback matrices, scoring each aligned column as the sum of its pairwise scores. Our scoring scheme would then look like the following:\n", "\n", "$$\n", "\\begin{align}\n", "& F(0, 0, 0) = 0\\\\\n", "& F(i, 0, 0) = F(i-1, 0, 0) - d\\\\\n", "& F(0, j, 0) = F(0, j-1, 0) - d\\\\\n", "& F(0, 0, k) = F(0, 0, k-1) - d\\\\\n", "\\\\\n", "& F(i, j, k) = \\max \\begin{pmatrix}\n", "F(i-1, j-1, k-1) + s(c_i, c_j) + s(c_i, c_k) + s(c_j, c_k)\\\\\n", "F(i, j-1, k-1) + s(c_j, c_k) - d\\\\\n", "F(i-1, j, k-1) + s(c_i, c_k) - d\\\\\n", "F(i-1, j-1, k) + s(c_i, c_j) - d\\\\\n", "F(i, j, k-1) - 2d\\\\\n", "F(i, j-1, k) - 2d\\\\\n", "F(i-1, j, k) - 2d\\\\\n", "\\end{pmatrix}\n", "\\end{align}\n", "$$\n", "\n", "However, the complexity of this algorithm is much worse than for pairwise alignment. For pairwise alignment, remember that when aligning two sequences of lengths $m$ and $n$, the runtime of the algorithm will be proportional to $m \\times n$. 
If $n$ is greater than or equal to $m$, we can simplify the statement to say that the runtime of the algorithm will be at most proportional to $n^2$. Plotted against sequence length, this gives a steeply rising curve: the runtime of pairwise alignment with dynamic programming is said to scale quadratically." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# first we need to do some configuration of the notebook, don't worry about this for now\n", "from __future__ import division, print_function\n", "from functools import partial\n", "from IPython.core import page\n", "page.page = print" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "import matplotlib.pyplot as plt\n", "\n", "# hypothetical runtimes (arbitrary units) that grow with the square of sequence length\n", "seq_lengths = range(25)\n", "s2_times = [t ** 2 for t in seq_lengths]\n", "\n", "plt.plot(seq_lengths, s2_times)\n", "plt.xlabel('Sequence Length')\n", "plt.ylabel('Runtime (s)')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 2, "text": [ "" ] }, { "metadata": {}, "output_type": "display_data", "png": 
"iVBORw0KGgoAAAANSUhEUgAAAYcAAAEPCAYAAACp/QjLAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xm4XfPZ//H3pyFRtEIRoSliitQ8V6hD8KghPIqgPKnW\nVGMVP0l7IaXG5zG0isbYVFAhhkRbSVSOKSItiSBCoqJCJCiSIGS4f39815HtJDk55+TsvfbweV3X\nvs7aa6+1zn1Wdva9v7MiAjMzs0JfyzsAMzMrP04OZma2GCcHMzNbjJODmZktxsnBzMwW4+RgZmaL\nKXpykNRR0n2SXpE0UdLOktaQNFLSa5JGSOpYcHw/SZMlTZK0b7HjMzOzxZWi5PBb4K8RsTmwFTAJ\n6AuMjIhNgb9nz5HUHegNdAf2A26Q5NKNmVmJFfWDV9JqwO4RcRtARMyPiI+BXsDA7LCBwCHZ9sHA\n3RExLyKmAlOAnYoZo5mZLa7Y38o3BN6TdLuk5yXdLGkVoFNEzMiOmQF0yrbXBaYVnD8NWK/IMZqZ\nWSPFTg4rANsBN0TEdsAnZFVIDSLN39HUHB6e38PMrMRWKPL1pwHTIuIf2fP7gH7Au5LWiYh3JXUG\nZmavvw10KTj/29m+L0lysjAza4WIUHOPLWrJISLeBd6StGm2a2/gZWAY0Cfb1wd4MNseChwpqb2k\nDYFNgLFLuK4fEVx44YW5x1AuD98L3wvfi6YfLVXskgPA6cCdktoDrwPHAe2AwZJ+CkwFjgCIiImS\nBgMTgfnAKdGav8rMzJZL0ZNDRLwA7LiEl/ZeyvGXApcWNSgzM2uSxxBUsLq6urxDKBu+F4v4Xizi\ne9F6qrRaG0muaTIzayFJRLk0SJuZWWVycjAzqyBz55bm9zg5mJlViH/9CzbfHD76qPi/y8nBzKwC\nzJ0Lhx8Ov/gFdOy47OOXlxukzcwqwMknw4cfwp//DGp2s/IiLW2QLsUgODMzWw533AH19fCPf7Qu\nMbSGSw5mZmXsxRdhr71g1CjYYovWX8ddWc3MqsTs2XDYYXD11cuXGFrDJQczszIUAUcemRqfBwxY\n/uu5zcHMrAr8/vcweTKMHp3P73fJwcyszIwZA716pZ9du7bNNd3mYGZWwd5/H3r3hltuabvE0Bou\nOZiZlYmFC2H//WHrreGKK9r22i45mJlVqEsugc8+Sz/z5gZpM7My8OijcOON8NxzsEIZfDKXQQhm\nZrVt2jQ49li46y7o3DnvaBJXK5mZ5WjevNQAfcYZsOeeeUeziBukzcxy9ItfpPEMDz0EXyvi13UP\ngjMzqxBDhsADD6R2hmImhtZwycHMLAeTJ0OPHvDXv8IOOxT/97krq5lZmZs9Gw45BC6+uDSJoTVc\ncjAzK6GINNPqt74FN91Uut/rNgczszJ26aXwzjup22o5c3IwMyuRv/wFbrghrejWoUPe0TTNycHM\nrAQmT4bjjoMHH4R11807mmVzg7SZWZEVNkDvumve0TRP0ZODpKmSJkgaJ2lstm8NSSMlvSZphKSO\nBcf3kzRZ0iRJ+xY7PjOzYoqAH/84dVs96aS8o2m+UpQcAqiLiG0jYqdsX19gZERsCvw9e46k7kBv\noDuwH3CDJJduzKxiNTRAX3dd3pG0TKk+eBt3n+oFDMy2BwKHZNsHA3dHxLyImApMAXbCzKwCNTRA\nDxlS/g3QjZWq5PCopH9KOiHb1ykiZmTbM4BO2fa6wLSCc6cB65UgRjOzNtXQAH3vvZXRAN1YKXor\n9YiI6ZLWAkZKmlT4YkSEpKZGtXnEm5lVlEpsgG6s6MkhIqZnP9+T9ACpmmiGpHUi4l1JnYGZ2eFv\nA10KTv92tu8r+vfv/+V2XV0ddXV1xQnezKyFFi6EPn1SUjjxxPziqK+vp76+vtXnF3X6DEkrA+0i\nYrakVYARwK+BvYEPIuIKSX2BjhHRN2uQvouUQNYDHgU2Lpwvw
9NnmFk5u+QSePhhqK8vr3aGcps+\noxPwgKSG33VnRIyQ9E9gsKSfAlOBIwAiYqKkwcBEYD5wijOBmVWKShoBvSyeeM/MrA00TMH94IPl\n2c7gKbvNzEqsGhqgG3PJwcxsOSxc+NUpuNXs7+alVW5tDmZmVe388+G99+Duu8s3MbSGk4OZWSvd\ncUdKCs8+W/kN0I25WsnMrBVGj4aDD4ZRo2CLLfKOZtncIG1mVmRvvpnaGQYOrIzE0BpODmZmLTB7\nNhx4IPy//wf77593NMXjaiUzs2ZasCB1We3cGQYMqKwGaFcrmZkVSd++MGcO/P73lZUYWsO9lczM\nmuHWW9Po5zFjoH37vKMpPlcrmZktw+OPw+GHwxNPQLdueUfTOq5WMjNrQ6+/Dr17w513Vm5iaA0n\nBzOzpfj4YzjoILjgAthnn7yjKS1XK5mZLcH8+anL6sYbpwboSudqJTOzNnD22WlSvWuvzTuSfLi3\nkplZI3/4AwwfnnomrVCjn5KuVjIzK/Doo/CjH8HTT6cqpWrhaiUzs1aaNCklhnvuqa7E0BpODmZm\nwLvvprmSLr8c6uryjiZ/Tg5mVvPmzEk9k/r0geOOyzua8uA2BzOrafPnp3UZOnVKU2RU65xJbnMw\nM2umCDj11JQgKm2W1WKr0U5aZmapfeHZZ9OcSSuumHc05cXJwcxq0p13pvEMzzwD3/xm3tGUHycH\nM6s5jz0GZ52V1n9ed928oylPbnMws5ry0ktw5JFpLMN3v5t3NOXLycHMasbbb8MBB8A118Cee+Yd\nTXlzcjCzmjBrVkoMJ5+cRkFb0zzOwcyq3rx5KTF07Qo33libXVbLbpyDpHaSxkkalj1fQ9JISa9J\nGiGpY8Gx/SRNljRJ0r7Fjs3Mql8EnHgidOiQ1mWoxcTQGqWoVjoTmAg0fN3vC4yMiE2Bv2fPkdQd\n6A10B/YDbpDkai8zWy4XXQQvvgh//nPtTr/dGkX98JX0bWB/4BagIV/3AgZm2wOBQ7Ltg4G7I2Je\nREwFpgA7FTM+M6tut98OAwfCww/DKqvkHU1lKfY382uAc4GFBfs6RcSMbHsG0CnbXheYVnDcNGC9\nIsdnZlVq+HDo2xf++ldYZ528o6k8RStkSToQmBkR4yTVLemYiAhJTbUuL/G1/v37f7ldV1dHnefX\nNbMCY8bAMcfAAw9At255R5OP+vp66uvrW31+0XorSboUOBaYD6wEfBO4H9gRqIuIdyV1BkZFRDdJ\nfQEi4vLs/EeACyPi2UbXdW8lM1uql1+Gnj3TDKsHHJB3NOWjbHorRcQvI6JLRGwIHAk8FhHHAkOB\nPtlhfYAHs+2hwJGS2kvaENgEGFus+Mys+kydCvvtB1dd5cSwvErZdt/wdf9yYLCknwJTgSMAImKi\npMGknk3zgVNcRDCz5poxA/bdF84914Pc2oIHwZlZxfv44zQdxkEHwa9/nXc05aml1UpODmZW0T77\nLFUlbbklXHedB7ktjZODmdWM+fPh0ENh1VVh0CD4mofNLlXZNEibmRXTwoVw/PFp3qQ//tGJoa15\nMLmZVZwIOOccmDwZRoyA9u3zjqj6ODmYWcW57DIYOTKt/expMYqjyeQgaTvgKOD7wAak7qhvAk8A\nd0XEuGIHaGZWaMAAuOUWeOopWH31vKOpXkttkJb0V+BD0uC0scB00uR5nUkT4h0EdIyIkg41cYO0\nWe269174+c/h8cdh443zjqaytFlvJUmFE+Qt7Zi1I2JmC2NcLk4OZrVp5Mg0X9KIEbD11nlHU3na\nrLdSQ2KQtIqkdtn2ZpJ6SVoxO6akicHMatOzz6ZRz0OGODGUSnM6fz0JdJC0HjCcNJneH4sZlJlZ\ngxdegF690toMu+2WdzS1oznJQRHxKXAocENEHA5sUdywzMzSDKv77ZeW9/REeqXVrGEjkr4H/Aj4\nS0vOMzNrrVdfTRPpXXUVH
H543tHUnuZ8yP8c6Ac8EBEvS9oIGFXcsMyslr3+Ouy9N/zmN3D00XlH\nU5s8t5KZlZU334Q99oB+/eCkk/KOpnq0WW8lSbdJ2rGJ13eWdHtLAzQzW5pp02CvveDss50Y8tbU\nOIctgXOBXYBXWTQIbh1gM2A08H8R8VJpQv0yLpcczKrQ9OmpxHDiiWneJGtbbT5lt6QOwLbA+iya\nPuOFiJi7PIG2lpODWfWZORPq6tJYhl/9Ku9oqpPXczCzivLBB2kVt0MOgYsuyjua6uXkYGYV48MP\noWdP2GcfuPxyr+JWTE4OZlYRZs1KSWHXXeHqq50Yiq1oK8FJWrl1IZmZfdWcOfCDH8AOOzgxlKtl\nJgdJu0qaSOqxhKRtJN1Q9MjMrCp9+ikcdBB07w7XXefEUK6aU3K4FtgPeB8gIsYDexQzKDOrTnPn\npobnLl3Soj1e97l8NeufJiL+3WjX/CLEYmZV7NNP0+yq3/oW3HabE0O5a84/z78l9QCQ1F7SOcAr\nxQ3LzKrJnDlpVtV11oE77oAVvHp92WtOcvgZcCqwHvA2aUDcqcUMysyqx6xZadrtjTZKazI4MVQG\nd2U1s6L58MOUGLbfPq3J4Kqk/LS0K+syc7ikrsDpwAYFx0dE9GpVhGZWEz74II1j2GMPd1etRM0p\n4D0I3AIMAxZm+5b51V3SSsDjQAegPfBQRPSTtAZwD2mupqnAERHxUXZOP+AnwALgjIgY0aK/xszK\nwsyZaT2G/feHyy5zYqhEzZl4b2xE7NSqi0srR8SnklYAngLOAXoB70fElZLOA1aPiL6SugN3ATuS\n2jceBTaNiIWNrulqJbMyNn16mhLjiCPgwgudGMpFMUZIXyepv6TvSdqu4dGci2drT0MqObQDPiQl\nh4HZ/oHAIdn2wcDdETEvIqYCU4BWJSUzy8dbb6VqpGOOgf79nRgqWXOqlb4LHAvsyaJqJbLnTZL0\nNeB5YCPgxmyZ0U4RMSM7ZAbQKdteFxhTcPo0UgnCzCrA1KlpoZ5TT02L9Vhla05yOBzYMCK+aOnF\nsyqhbSStBgyXtGej10NSU3VES3ytf//+X27X1dVRV1fX0tDMrA1NmZKqks49F047Le9oDKC+vp76\n+vpWn9+cNocHgZMKvu237hdJ5wOfAccDdRHxrqTOwKiI6CapL0BEXJ4d/whwYUQ82+g6bnMwKyOT\nJqVeSeefn1Zxs/JUjDaH1YFJkkZIGpY9hjYjkDUldcy2vw7sA4wDhgJ9ssP6kHpDke0/MhuFvSGw\nCTC2uX+ImZXeSy+lqqSLL3ZiqDbNqVa6sJXX7gwMzNodvgbcERF/lzQOGCzpp2RdWQEiYqKkwcBE\n0txNp7iIYFa+xo9P025fdRUcfXTe0Vhb8whpM2uxp56CH/4Qrr8eDjss72isOdqsWknS09nPOZJm\nN3rMaotgzazyDBsGhx4KgwY5MVQzlxzMrNkGDoTzzoOhQ2Enj0KqKG3eIC3pjubsM7PqdtVVcMEF\nMGqUE0MtaE6D9BaFT7KpMLYvTjhmVm4ioF8/eOih1NbQpUveEVkpLDU5SPol0A/4uqTZBS/NA24q\ndmBmlr/58+Gkk1KX1aeeSqu4WW1oziC4yyOib4niWSa3OZiVxty5cNRRaXnPIUNg1VXzjsiWR0vb\nHJrVIC1pPdIU21+WNCLiiVZFuJycHMyK7+OP4eCD07Kef/oTtG+fd0S2vIqx2M8VQG/S4LQFBS/l\nkhzMrLhmzEirt/XoAb/9LbRrl3dElofmVCu9BmwZEZ+XJqSmueRgVjxvvAH77pum3L7gAk+5XU2K\nMbfS66T1GMysik2YALvvDmed5UV6rHldWT8Dxkv6O9BQeoiIOKN4YZlZKTVMh/G730Hv3nlHY+Wg\nOclhaPYo5Hodsypx//1w8slw551p6m0z8PQZZjUrAq6+Gq65Jk2HsV2zFv+1SlWM3kpvLGF
3RETX\nFkVmZmVj/nw4/XR4+ml45hmPerbFNadaaceC7ZWAwwCPkzSrULNmLWpXeOop+OY3843HylOrqpUk\nPR8RuRRCXa1k1npvvQUHHgjf+x78/vewQnO+HlpVKEa10vYsaoD+GrAD4GExZhVm3Djo1QvOPBPO\nPttdVa1pzfnecBWLksN8Cpb2NLPK8PDD8JOfwI03pi6rZsuyzOQQEXWFzyWJlBxeLVJMZtaGrrsO\nLrssreC28855R2OVoqkpu1cFTgI2Al4C/gAcDFwCTAHuKUWAZtY6Cxak6qMRI1KvpA03zDsiqyRN\nlRz+BMwCxgD7AD8G5gJHR8T44odmZq31ySdw9NEwZw6MHg0dO+YdkVWapfZWkjQhIrbKttsB04H1\nI+KzEsa3pLjcW8msCdOnpx5JW20FAwZ4um1L2nLivS+n546IBcDbeScGM2vahAmwyy5w6KFw221O\nDNZ6TZUcFgCfFuz6OmkSPkgjpHMZOuOSg9mS3Xcf/OxnafK8o47KOxorN202ziEiPJbBrAIsWJDW\nXhg0CIYP9xxJ1jY8PtKsgn30UVqYZ84c+Mc/YO21847IqkVzFvsxszL0yitp3ELXrjBypBODtS0n\nB7MKNGwY7LEH9O2b2hhWXDHviKzaFDU5SOoiaZSklyW9JOmMbP8akkZKek3SCEkdC87pJ2mypEmS\n9i1mfGaVZuFCuPhiOOWUlCCOOy7viKxaFXWxH0nrAOtExPhsxPVzwCHAccD7EXGlpPOA1SOir6Tu\nwF2kacLXAx4FNo2IhQXXdG8lq0mzZ0OfPvDuuzBkCHTunHdEVknacpzDcouIdxtGU0fEHOAV0od+\nL2BgdthAUsKAND3H3RExLyKmkqbp2KmYMZpVgilT0viFNdeEUaOcGKz4StbmIGkDYFvgWaBTRMzI\nXpoBdMq21wWmFZw2jZRMzGrW8OHQoweccQbcdBN06JB3RFYLStKVNatSGgKcGRGzVTCRfESEpKbq\niRZ7rX///l9u19XVUVdX12axmpWLCPjf/4Vrr03VSLvtlndEVknq6+upr69v9flFbXMAkLQi8DDw\nt4i4Nts3CaiLiHcldQZGRUQ3SX0BIuLy7LhHgAsj4tmC67nNware7Nlwwgnw+utw//1e49mWX1m1\nOWRrP9wKTGxIDJmhQJ9suw/wYMH+IyW1l7QhsAkwtpgxmpWbCRNghx3gG9+AJ55wYrB8FLu30m7A\nE8AEFlUP9SN94A8GvkO2slxEfJSd80vgJ6RV586MiOGNrumSg1WlCLjlFvjlL1NV0o9+lHdEVk1a\nWnIoerVSW3NysGo0ezacfHIqNdx7L3TrlndEVm3KqlrJzJatoRpp5ZXh2WedGKw8ODmY5SQCbr4Z\nevZMs6refHNKEGblwLOymuWgsBrpySddWrDy45KDWYm5GskqgZODWYm4GskqiauVzErA1UhWaVxy\nMCuyceNcjWSVx8nBrEgWLIBLL4X/+i+48EJXI1llcbWSWRG8/jr8z//ASivBc895CgyrPC45mLWh\nhkbnXXaBI45Iazs7MVglcsnBrI3MmAHHHw9vvw2PPw7du+cdkVnrueRg1gYefBC22Qa22grGjHFi\nsMrnkoPZcpg1C846K5UUhgyBXXfNOyKztuGSg1krPflkKi20awfjxzsxWHVxycGshT7/PHVN/dOf\nYMAAOOigvCMya3tODmYt8OKLcOyxsMEG8MILsNZaeUdkVhyuVjJrhrlz03xIe+0FZ5wBDzzgxGDV\nzSUHs2V48kk44YTUA2n8eFhvvbwjMis+Jwezpfj4YzjvPBg2DK67Dg49NO+IzErH1UpmS/DAA/Dd\n76YRzy+/7MRgtcclB7MC77wDp58OL70Ed90F3/9+3hGZ5cMlBzNg4UK46SbYeuvUtvDCC04MVttc\ncrCa9+qrqcH5iy/gscdgyy3zjsgsfy45WM364gu45BL
o0QMOOwyeftqJwayBSw5Wk554Ak49NU2n\n/dxzsP76eUdkVl6cHKym/PvfcO658Mwz8H//B4cfDlLeUZmVH1crWU347DO46CLYdtu0hvOkSWkx\nHicGsyVzycGqWgTcfz+cfTbsuGOqQtpgg7yjMit/RS05SLpN0gxJLxbsW0PSSEmvSRohqWPBa/0k\nTZY0SdK+xYzNqt+LL0LPntC/P9x+O9x7rxODWXMVu1rpdmC/Rvv6AiMjYlPg79lzJHUHegPds3Nu\nkORqL2ux//wnDWTr2TONbB43DvbcM++ozCpLUT98I+JJ4MNGu3sBA7PtgcAh2fbBwN0RMS8ipgJT\ngJ2KGZ9VlwUL4A9/gM03T9sTJ8Jpp8EKrjw1a7E8/tt0iogZ2fYMoFO2vS4wpuC4aYDnv7RmeeKJ\nNJX2aqvBiBFppLOZtV6u36kiIiRFU4eULBirSK+8AuefD2PHumuqWVvKIznMkLRORLwrqTMwM9v/\nNtCl4LhvZ/sW079//y+36+rqqKurK06kVrbefBN+/Wt4+GE455y0ZOfKK+cdlVn5qK+vp76+vtXn\nK6K4X84lbQAMi4gts+dXAh9ExBWS+gIdI6Jv1iB9F6mdYT3gUWDjaBSgpMa7rIbMnJmmvBg0CH72\ns5QYOnZc9nlmtU4SEdHscnWxu7LeDYwGNpP0lqTjgMuBfSS9BuyVPSciJgKDgYnA34BTnAWswccf\np2U6N988jV2YOBF+8xsnBrNiKXrJoa255FBbPvsMrr8errwS9t8/jVnwWAWzlmtpycGd/KwszZuX\nBq5ddFEa2TxqVFqZzcxKw8nBysrChTB4cOqB9J3vwJAhsPPOeUdlVnucHKwsLFgA990Hl10GK64I\nN94Ie++dd1RmtcvJwXL1+ecwcGBqU1h7bbj4YjjwQI9VMMubk4PlYvZsGDAArrkGttoKbr01rdns\npGBWHpwcrKTefx9+97tUbbTXXmkQ27bb5h2VmTXmWU+tJN56C37+c9h0U5g+HUaPhnvucWIwK1dO\nDlZUr74KP/lJmghvhRXSGgs33wybbJJ3ZGbWFFcrWZuLSCWDa65Js6WedhpMmQJrrJF3ZGbWXE4O\n1mY++QTuvBNuuAE+/RROPTX1RFpllbwjM7OW8vQZttxefTUlhEGDYPfdU1Lo2RO+5kpLs7Lh6TOs\nJObPh2HDUlKYMAGOPx6efx7WXz/vyMysLTg5WIvMmAG33JLGKHTpAqecAocdBh065B2ZmbUlJwdb\npoYG5uuvh7/9LSWDhx5yN1SzauY2B1uqt96Cu+6CO+6AL75IpYQ+fWD11fOOzMxaqqVtDk4O9hWz\nZqUJ8AYNghdegB/+EI45BnbbzQ3MZpXMycFabN48GD48lRAeeSRNa3HMMXDAAbDSSnlHZ2ZtwcnB\nmiUCxo5NJYR77kkjlo89Fg4/HL71rbyjM7O25q6s1qR//SslhEGD0vNjj4UxY6Br13zjMrPy4uRQ\n5RYuTOMPhg5NPYymT4fevVNy2HFHT5FtZkvmaqUqNHduWnP5oYfSQLVvfAMOPhh69YJddoF27fKO\n0MxKzdVKNer99+Evf0klhEcfTbOg9uoFjz0Gm22Wd3RmVmlccqhgr72WksHQoanb6d57pxLC/vvD\nmmvmHZ2ZlRP3Vqpi//431Ncvenz+eSod9OoFe+7pbqdmtnRODlWkcTL45BOoq1v06NbNDcpm1jxO\nDhXMycDMisXJoULMnp3aCZ5/Hp57Dp580snAzIrHyaEMffABjBuXHs8/nx7TpsEWW8B226XZTXff\n3cnAzIqn4pODpP2Aa4F2wC0RcUWj18s2OUSkQWYNSaDh53/+A9tskxJBw6NbN1jBHYnNrEQqOjlI\nage8CuwNvA38AzgqIl4pOCbX5LBwIbzzDkyZAq+/nn4Wbrdvn0oChYlgo42KM6NpfX09dXV1bX/h\nCuR7sYjvxSK+F4t
U+iC4nYApETEVQNKfgYOBV5o6qS0tXAgffgjvvZcaiBsngTfegNVWg403Th/6\nG28Mhx666Pkaa5QqUr/xC/leLOJ7sYjvReuVW3JYD3ir4Pk0YOfWXCgirXP8+edpOon3308f+DNn\npp9L2/7ggzTdxFprpWUwGxLAbrul7a5dYdVV2+RvNTMrW+WWHJpVX7TXXulDv6nH3LmpKqdDB/j6\n19M01GuvnT7011orbW+yCey666Lna62VRhavuGKx/0wzs/JWbm0OuwD9I2K/7Hk/YGFho7Sk8gnY\nzKyCVHKD9AqkBumewDvAWBo1SJuZWfGVVbVSRMyXdBownNSV9VYnBjOz0iurkoOZmZWHIvS+Lx5J\n+0maJGmypPPyjidPkqZKmiBpnKSxecdTSpJukzRD0osF+9aQNFLSa5JGSOqYZ4ylspR70V/StOy9\nMS4bWFrVJHWRNErSy5JeknRGtr/m3hdN3IsWvS8qpuTQnAFytUTSG8D2EfGfvGMpNUm7A3OAP0XE\nltm+K4H3I+LK7IvD6hHRN884S2Ep9+JCYHZEXJ1rcCUkaR1gnYgYL2lV4DngEOA4aux90cS9OIIW\nvC8qqeTw5QC5iJgHNAyQq2U1ORNTRDwJfNhody9gYLY9kPSfoeot5V5Ajb03IuLdiBifbc8hDZxd\njxp8XzRxL6AF74tKSg5LGiC33lKOrQUBPCrpn5JOyDuYMtApImZk2zOATnkGUwZOl/SCpFtroSql\nkKQNgG2BZ6nx90XBvRiT7Wr2+6KSkkNl1H+VTo+I2Bb4AXBqVr1gQDb5Vi2/X24ENgS2AaYDV+Ub\nTulk1ShDgDMjYnbha7X2vsjuxX2kezGHFr4vKik5vA10KXjehVR6qEkRMT37+R7wAKnarZbNyOpa\nkdQZmJlzPLmJiJmRAW6hRt4bklYkJYY7IuLBbHdNvi8K7sWghnvR0vdFJSWHfwKbSNpAUnugNzA0\n55hyIWllSd/ItlcB9gVebPqsqjcU6JNt9wEebOLYqpZ9CDb4b2rgvSFJwK3AxIi4tuClmntfLO1e\ntPR9UTG9lQAk/YBFaz3cGhGX5RxSLiRtSCotQBrIeGct3QtJdwN7AGuS6pEvAB4CBgPfAaYCR0TE\nR3nFWCpLuBcXAnWkqoMA3gBOKqh3r0qSdgOeACawqOqoH2mWhZp6XyzlXvwSOIoWvC8qKjmYmVlp\nVFK1kpmZlYiTg5mZLcbJwczMFuPkYGZmi3FyMDOzxTg5mJnZYpwcrCJI+lU2/fAL2XTDFT3qV9If\nJf2wiNcROZnCAAADhElEQVTfQ9L3SvX7rPqU1UpwZkuSfcgdAGwbEfMkrQF0yDms5VXseX72BGYD\nzxT8PrNmc8nBKsE6pDn55wFExH8a5paStL2k+mx22kcK5tHZPitljJf0vw2L4Uj6saTrGi4s6WFJ\ne2Tb+0oaLek5SYOzqUkaFlbqn+2fIGmzbP+qkm7P9r0g6dCmrrMEX5k+WVK7LNax2fVOzPbXZX/j\nvZJekTSo4Jz9s33/lPQ7ScMkrQ+cBJwl6flsxCzA9yU9Lel1lyJsWZwcrBKMALpIelXS9ZK+D19O\nLnYd8MOI2AG4HbgkO+d24NSIaJguYGnfnAMISWsCvwJ6RsT2pAVSflFwzHvZ/huBc7L95wMfRsRW\nEbE18NgyrrMsPwU+ioidSJOinZBNuQxp2oMzge5AV0m7SloJ+AOwX/b3r0mafPTNbP/VEbFdRDxF\nSkTrREQP4EDg8mbGZDXK1UpW9iLiE0nbA7uTqkvukdSX9MH7XdK6FpDm3HpH0mrAatmHIsAdpKnN\nl0bALqQP3tHZtdoDowuOuT/7+TxwaLbdkzQBZEOcH0k6cBnXacq+wJaSDsuefxPYGJgHjI2IdwAk\njSdNvfwp8K8sGQDcDZzY6O/6MjyySeci4hVJNbWugbWck4NVhIhYCDwOPJ5VEfUhJ
YeXI2LXwmOX\nsIhJ4YfkfL5aYl6pYHtkRBy9lBA+z34u4Kv/b5a0slZT1ym0pNLMaRExsnCHpLqC318YQ+Pzl7XK\n1xctONZqnKuVrOxJ2lTSJgW7tiXNsPkqsJakXbLjVpTUPZt18yNJPbLjf1Rw7lRgGyVdSNU3QVop\nq4ekjbJrrdLody7JSODUgjg7tvA6jT+ghwOnSFqh4O9eeSnnRvb3d83aGCCVYhoSxmzgG8uI32yp\nnBysEqwK/FHSy5JeALoB/bMG6sOAK7KqlnFAQ/fN44DrJY0rvFBW1fQGMBH4Lan0QUS8D/wYuDv7\nHaOBzZYQS2H7xW+A1SW9mP3+uhZcB2CApLeyx9OkBVgmAs9npaMbWVRCWKyUERFzgVOARyT9E5iV\nPQCGAf/dqEG68BruvWRN8pTdVvWyb9YPR8SWecfS1iStEhGfZNvXA69FxG9zDsuqgEsOVgtE9X5T\nPiEbFPgyqQF7QN4BWXVwycHMzBbjkoOZmS3GycHMzBbj5GBmZotxcjAzs8U4OZiZ2WKcHMzMbDH/\nH0ZDNoWOo057AAAAAElFTkSuQmCC\n", "text": [ "" ] } ], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "**The exponent in the $n^2$ term comes from the fact that, in pairwise alignment, if we assume our sequences are both of length $n$, there are $n \\times n$ cells to fill in in the dynamic programming matrix. If we were to generalize either Smith-Waterman or Needleman-Wunsch to three sequences, we would need to create a 3 dimensional array to score and traceback the alignment.** For sequences of length $n$, we would therefore have $n \\times n \\times n$ cells to fill in, and our runtime versus sequence length curve would look like the following." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "s3_times = [t ** 3 for t in range(25)]\n", "\n", "plt.plot(range(25), s3_times)\n", "plt.xlabel('Sequence Length')\n", "plt.ylabel('Runtime (s)')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 3, "text": [ "" ] }, { "metadata": {}, "output_type": "display_data", "png": "iVBORw0KGgoAAAANSUhEUgAAAZMAAAEPCAYAAACHuClZAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xm8lHXd//HXOxVXcglzT7FIwbQUvTE1OWqSuWFpCpXh\nkmlkWpkK2p1Ulkilot1agQpqovhTEe/bUEyPSwgnFQRBFk1U1pRFJUU58vn9cV1HxukAZ5mZa5b3\n8/GYx1zznWv5zDyG8+G7XooIzMzM2uNjWQdgZmaVz8nEzMzazcnEzMzazcnEzMzazcnEzMzazcnE\nzMzarWjJRNJNkhZLmtbMexdIWi1pm5yygZLmSJopqVdOeXdJ09L3huaUbyzpzrR8oqRdi/VZzMxs\n3YpZM7kZOCq/UNIuwJHAKzll3YBTgG7pMddLUvr2DcCZEdEF6CKp6ZxnAkvS8quBK4v1QczMbN2K\nlkwi4glgWTNvXQVclFfWGxgVEasiYi7wItBD0g5Ax4hoSPe7BTgh3T4eGJlu3w0cUcDwzcysFUra\nZyKpNzAvIqbmvbUjMC/n9Txgp2bK56flpM+vAUREI/BmbrOZmZmVzoalupCkzYBLSJq4Piwu1fXN\nzKx4SpZMgE8DuwHPpd0hOwPPSOpBUuPYJWffnUlqJPPT7fxy0vc+BSyQtCGwZUQszb+oJC8+ZmbW\nBhHR4v/wl6yZKyKmRcR2EdE5IjqTJIX9ImIxMBboI6mDpM5AF6AhIhYBb0nqkXbInwrcl55yLNAv\n3T4J+Ns6ru1HBJdddlnmMZTLw9+Fv4ta+S4WLWrbca1VzKHBo4AJwGclvSbp9LxdPow2ImYAo4EZ\nwF+B/rHm0/QHhgNzgBcjYlxafiPwCUlzgB8BA4r1WczMKtGyZbDXXrBgQfGvVbRmrojou573d897\n/RvgN83s9wywdzPl7wEntzNMM7OqddVV0Ls37Lhj8a9Vyj4Ty1hdXV3WIZQNfxdr+LtYo5q+iyVL\n4Prr4emnS3M9taVtrJJIimr/jGZm+S65BN54A/7857YdL4loRQe8k4mZWZV5/XXYYw+YPBl2beNC\nU61NJl7o0cysyvz2t9CnT9sTSVu4ZmJmVkUWL4auXWHqVNh55/XvvzaumZiZ1bAhQ+Db325fImkL\n10zMzKrEwoXJvJLnn2//cGB3wOdxMjGzWnH++fCxj8HVV7f/XE4meZxMzKwWzJ8Pe+8NM2bA9tu3\n/3xOJnmcTMysFpx7Lmy6aTKSqxCcTPI4mZhZtXv1Vdh3X5g5E7bdtjDn9GguM7Ma85vfwFlnFS6R\ntIVrJmZmFWzuXOjeHWbNgk6dCnde10zMzGrI5ZfD979f2ETSFl412MysQr30EowZA7NnZx2JayZm\nZhXr8suTUVzbbJN1JK6ZmJlVpDlz4P774cUXs44k4ZqJmVkF+uUvkxnvW22VdSQJj+YyM6swM2fC\noYcmtZKPf7w41/BoLjOzKvfLX8KPf1y8RNIWrpmYmVWQ6dPh8MOTWknHjsW7jmsmZmZV7Be/gAsu\nKG4iaQvXTMzMKsS0aXDkkcn8ks03L+61yqZmIukmSY
slTcsp+62kFyQ9J+keSVvmvDdQ0hxJMyX1\nyinvLmla+t7QnPKNJd2Zlk+UVMK7HZuZld6gQXDRRcVPJG1RzGaum4Gj8soeAvaKiM8Ds4GBAJK6\nAacA3dJjrpfUlBFvAM6MiC5AF0lN5zwTWJKWXw1cWcTPYmaWqWefhaeegnPOyTqS5hUtmUTEE8Cy\nvLLxEbE6fTkJaLpLcW9gVESsioi5wItAD0k7AB0joiHd7xbghHT7eGBkun03cERRPoiZWcYikhrJ\nz34Gm22WdTTNy7ID/gzggXR7R2BeznvzgJ2aKZ+flpM+vwYQEY3Am5LKYFEBM7PCGjcO5s1Llpkv\nV5kspyLpUuD9iLi9FNcbNGjQh9t1dXXU1dWV4rJmZu3W2AgXXghDhsBGGxXvOvX19dTX17f5+JIn\nE0mnAUfz0Wap+cAuOa93JqmRzGdNU1huedMxnwIWSNoQ2DIiljZ3zdxkYmZWSUaMgE98Ao47rrjX\nyf+P9i9+8YtWHV/SZq608/xCoHdErMx5ayzQR1IHSZ2BLkBDRCwC3pLUI+2QPxW4L+eYfun2ScDf\nSvIhzMxKZMUKuOwy+N3vQC0epJuNotVMJI0CegKdJL0GXEYyeqsDMD4drPVURPSPiBmSRgMzgEag\nf87kkP7ACGBT4IGIGJeW3wjcKmkOsAToU6zPYmaWhd//Hnr2hAMOyDqS9fOkRTOzMrRwIXzuc/DM\nM7DbbqW/fmsnLTqZmJmVobPOSpaX/+1vs7l+a5OJb45lZlZmnn8e7rsPZs3KOpKW80KPZmZl5qKL\n4NJLYeuts46k5VwzMTMrIw8/DLNnw5gxWUfSOq6ZmJmVidWrkwmKgwdDhw5ZR9M6TiZmZmXitttg\n003hxBOzjqT1PJrLzKwMvPMO7LEH3HknHHRQ1tGU0f1MzMys5a65Bnr0KI9E0haumZiZZexf/4Ju\n3WDiRPjMZ7KOJuFJi3mcTMys3P3gB7DhhjB06Pr3LRVPWjQzqyCzZsHo0fDCC1lH0j7uMzEzy9DF\nFyfDgTt1yjqS9nHNxMwsI48/DlOmwB13ZB1J+7lmYmaWgdWr4ac/hV//GjbZJOto2s/JxMwsA3fe\nmSSUvn2zjqQwPJrLzKzEVq6Erl3h5psh5065ZcWTFs3Mytwf/gB7712+iaQtXDMxMyuhBQtgn33g\nySdhzz2zjmbtPGkxj5OJmZWTb34Tdt0Vrrgi60jWzckkj5OJmZWLRx6B00+HGTNg882zjmbd3Gdi\nZlaG3n8/WTblmmvKP5G0hZOJmVkJXH01dO4MJ5yQdSTF4WYuM7Mie/VV2G8/mDQJPv3prKNpmbJp\n5pJ0k6TFkqbllG0jabyk2ZIekrRVznsDJc2RNFNSr5zy7pKmpe8NzSnfWNKdaflESbsW67OYmbXH\nj38M555bOYmkLYrZzHUzcFRe2QBgfER8Fvhb+hpJ3YBTgG7pMddLasqINwBnRkQXoIukpnOeCSxJ\ny68GriziZzEza5Nx45L1ty6+OOtIiqtoySQingCW5RUfD4xMt0cCTa2HvYFREbEqIuYCLwI9JO0A\ndIyIhnS/W3KOyT3X3cARBf8QZmbtsHJlUiO57rrk3u7VrNQd8NtFxOJ0ezGwXbq9IzAvZ795wE7N\nlM9Py0mfXwOIiEbgTUnbFCluM7NWGzIkmaB49NFZR1J8mS1BHxEhqSQ944MGDfpwu66ujrpqWsPA\nzMrSP/8J114Lzz6bdSQtU19fT319fZuPL+poLkm7AfdHxN7p65lAXUQsSpuwHo2IPSUNAIiIwel+\n44DLgFfSfbqm5X2BQyPi++k+gyJioqQNgYURsW0zMXg0l5mVVAQcdxwccggMGJB1NG1TNqO51mIs\n0C/d7geMySnvI6mDpM5AF6AhIhYBb0nqkXbInwrc18y5TiLp0Dczy9zYsfDSS/CTn2QdSekUrWYi\naRTQE+hE0j/yc5
JEMBr4FDAXODkilqf7XwKcATQC50fEg2l5d2AEsCnwQEScl5ZvDNwK7AssAfqk\nnff5cbhmYmYl88470K0b3HQTHH541tG0ndfmyuNkYmaldOmlSX/JqFFZR9I+TiZ5nEzMrFRmzYKD\nD4apU2HHHbOOpn3Kvc/EzKwqRSRzSi69tPITSVs4mZiZFcBdd8HixfDDH2YdSTbczGVm1k5vv53c\n0/2OO5LhwNXAfSZ5nEzMrNh++lN44w0YMSLrSAqntckksxnwZmbV4Pnn4ZZbkuda5j4TM7M2ikju\nnjhoEHzyk1lHky0nEzOzNho2DN59F84+O+tIsuc+EzOzNnjlFdh/f3jssWTGe7XxPBMzsyKLgO9+\nFy64oDoTSVs4mZiZtdKwYfDmm8koLku4mcvMrBWamrfq62GvvbKOpnjczGVmViRNzVs/+Ul1J5K2\ncDIxM2uhpuatCy/MOpLy42YuM7MWqJXmrSZu5jIzKzA3b63fOpdTkbQf0Bc4FNgNCJL7sj8O3B4R\nk4sdoJlZ1oYNg+XL3by1Lmtt5pL0ALCM5F7rDcBCQMAOwH8BxwFbRcQxpQm1bdzMZWbtUWvNW00K\ntmqwpO0iYvF6LvbJiPhXK2MsKScTM2urCPjKV+Cww2DgwKyjKa2C9Zk0JRJJm0vaIN3eQ9LxkjZK\n9ynrRGJm1h7DhsGyZW7eaon1juaS9CxwCLA18HfgH8D7EfGt4ofXfq6ZmFlbvPIKdO+eNG997nNZ\nR1N6xRjNpYh4B/g6cH1EfAOowa/WzGpFBJx1VjJ6qxYTSVu0aGiwpC8C3wL+rzXHmZlVouHDk+at\niy7KOpLK0ZKk8CNgIHBvREyX9Gng0fZcVNJASdMlTZN0u6SNJW0jabyk2ZIekrRV3v5zJM2U1Cun\nvHt6jjmShrYnJjMzgFdfhUsugZtvhg19L9oWK/kMeEm7AY8AXSPiPUl3Ag8AewFvRMQQSRcDW0fE\nAEndgNuBA4CdgIeBLhERkhqAcyOiIR3KfG1EjMu7nvtMzKxFmkZv1dUlCaWWFazPRNJNkg5Yx/s9\nJN3c2gCBt4BVwGaSNgQ2AxYAxwMj031GAiek272BURGxKiLmAi8CPSTtAHSMiIZ0v1tyjjEza7Xh\nw2HpUjdvtcW6KnFXAxdKOhCYxZpJi9sDewATgN+19oIRsVTS74FXgXeBByNifN68lsXAdun2jsDE\nnFPMI6mhrEq3m8xPy83MWq2peevRR9281RZr/coiYhrwHUkbA/sCu7JmOZXnImJlWy6Y9rn8iGR5\nljeBuyR9O+/aIalgbVODBg36cLuuro66urpCndrMqkBjI5x6am2P3qqvr6e+vr7Nx2fRZ3IKcGRE\nfDd9fSpwIHA4cFhELEqbsB6NiD0lDQCIiMHp/uOAy0iS2qMR0TUt7wv0jIhz8q7nPhMzW6dBg+DJ\nJ+HBB2GDDbKOpjxUwqrBM4EDJW0qScCXgRnA/UC/dJ9+wJh0eyzQR1IHSZ2BLkBDRCwC3kr7bgSc\nmnOMmVmL1NfDn/4Et97qRNIeJW8ZjIjnJN0CPA2sBp4F/gx0BEZLOhOYC5yc7j9D0miShNMI9M+p\navQHRgCbAg/kj+QyM1uX11+Hb38bRoyAHXbIOprK1uJmLkmbpTPhK4qbucysOatXw3HHJX0kV16Z\ndTTlp+DNXJIOkjSDZEQXkr4g6fp2xGhmlrlrroElS+Dyy7OOpDq0ZKHHBuAk4L6I2Dctmx4RFbGy\nv2smZpbvH/+AY46BSZOgc+esoylPRemAj4hX84oaWxWVmVmZePNN6NMHrr/eiaSQWtIB/6qkgwEk\ndQDOA14oalRmZkUQAWefDb16wUknZR1NdWlJMvk+MJRkdvl84CHgB8UMysysGG68EWbMSJq3rLBK\nPmmx1NxnYmYA06cnCzg+/jh07Zp1NOWvtX0m662ZSNod+CHJ8idN+0dEHN+mCM3M
Suzdd+GUU5Ih\nwE4kxdGS0VxTgeHA8ySTDCFJJo8VObaCcM3EzM4+G1asgNtuA7X4/9q1reA1E2BlRFzbjpjMzDIz\nejQ88gg884wTSTG1pGZyKvBp4EHgvabyiHi2uKEVhmsmZrXrn/+EAw+Ev/4VunfPOprKUoyayV4k\niygexppmLtLXZmZl6f33oW/f5B4lTiTF15KayUskt9h9vzQhFZZrJma16cILYeZMGDvWzVttUYya\nyTRga5K7H5qZlb2//hXuuAMmT3YiKZWWJJOtgZmS/sGaPhMPDTazsjR3LpxxRpJMOnXKOpra0ZJk\nclnRozAzK4AVK6B3bxgwAHr2zDqa2uIZ8GZWFVavhm98A7baCoYPd/NWexWsz0TS3yPiYEkrgPy/\nxhERH29rkGZmhfarX8GiRXD77U4kWVhrMomIg9PnLUoXjplZ6919d7KIY0MDbLxx1tHUppbcafHW\nlpSZmWVh6lQ45xy4917Yfvuso6ldLbk51udyX0jaEPAUIDPL3OuvJx3u117riYlZW2sykXSJpLeB\nvSW93fQA/gWMLVmEZmbNWLUq6XDv2zd5WLZaMgN+cEQMKFE8BefRXGbVqX9/eO01uO8++FiLbkBu\nrVHwGfARMUDSTsCuuftHxONtC9HMrH3++Eeor4eJE51IykVLaiZXAqcAM4APmsoj4rg2X1TaiuQe\nKXuRDDs+HZgD3EmStOYCJ0fE8nT/gcAZ6fXPi4iH0vLuwAhgE+CBiDi/mWu5ZmJWRR57DE4+Gf7+\nd/jMZ7KOpnq1tmbSkmQyG9g7It5b546tIGkk8FhE3JR26G8OXAq8ERFDJF0MbJ3WiroBtwMHkNyH\n/mGgS0SEpAbg3IhokPQAcG1EjMu7lpOJWZWYOxe++EW45RY48siso6lurU0mLakgvgR0aHtIHyVp\nS+BLEXETQEQ0RsSbwPHAyHS3kcAJ6XZvYFRErIqIucCLQA9JOwAdI6Ih3e+WnGPMrMrkLpXiRFJ+\nWrI217vAFEl/46MLPZ7Xxmt2Bl6XdDPweeAZ4EfAdhHRtDLxYmC7dHtHYGLO8fNIaiir0u0m89Ny\nM6syq1fDaaclw3/Pa+tfHiuqliSTsfznUOD2tBttCOxH0jz1D0nXAB8ZLZY2YRWsbWrQoEEfbtfV\n1VFXV1eoU5tZCfzqV7BgAfzlL14qpVjq6+upr69v8/ElX+hR0vbAUxHROX19CDAQ2B04LCIWpU1Y\nj0bEnpIGAETE4HT/cSQrGb+S7tM1Le8L9IyIc/Ku5z4Tswp2zz3wox8lS6V4hnvpFLzPRNLLzTz+\n2dYAI2IR8Jqkz6ZFXwamA/cD/dKyfsCYdHss0EdSB0mdgS5AQ3qetyT1kCSSWws3HWNmVWDyZDj7\nbC+VUgla0sx1QM72JsBJwCfaed0fAn+R1IGkg/90YANgtKQzSYcGA0TEDEmjSYYmNwL9c6oa/UmG\nBm9KMjT4IyO5zKxyvfgiHHNMMqfES6WUvzY1c0l6NiL2K0I8BedmLrPKs3AhHHIIXHwxfO97WUdT\nmwo+Az6dGNj01/hjwP4ktQgzs4Jbvhy++tXk1rtOJJWjJZMW61mTTBpJmqB+FxGzihpZgbhmYlY5\n3n0XjjoKvvAFuOYaj9zKUsFnwDdzAZEsdXJna4PLgpOJWWVobISTToLNNoPbbvOaW1kr2GguSVtI\nukDS9ZL6S/qYpK+RjLz6ViGCNTMDiEhGba1cCSNGOJFUonX1mdwCvEUy+/xI4DRgJfDNiJhS/NDM\nrFZccglMnw5/+xt0KNjiTVZKa23mkjQ1IvZJtzcAFgK7RsS7JYyv3dzMZVberroKhg+HJ56AT7R3\n0oEVTCFHc+UuN/+BpPmVlkjMrLzdeisMHQpPPulEUunWVTP5AHgnp2hTkkUfIVk+6+NFjq0gXDMx\nK0//939w5pnw6KPQtWvW0Vi+gtVMIsJzScys
KCZMgNNPh/vvdyKpFh4zYWYl9fzz8LWvJU1cPXpk\nHY0VipOJmZXMK68ks9uvuQa+8pWso7FCcjIxs5J4/XXo1QsuvBD69s06Gis0JxMzK7o33kgSyTe+\n4TslVisnEzMrqsWLoa4Ojj46uWOiVScnEzMrmgULkkRy8slw+eVeuLGaOZmYWVG8+ir07An9+sHP\nf+5EUu2cTMys4F5+OUkk/fvDgAFZR2Ol4GRiZgU1Z06SSC68EH7846yjsVJxMjGzgnnhBTjssKRZ\nq3//rKOxUlrvbXvNzFri+eeT4b+DB8N3vpN1NFZqTiZm1m6TJ6+Z2d6nT9bRWBacTMysXRoa4Ljj\n4IYb4Otfzzoay4qTiZm12YQJcMIJcNNNcOyxWUdjWcqsA17SBpImS7o/fb2NpPGSZkt6SNJWOfsO\nlDRH0kxJvXLKu0ualr43NIvPYVarHnssSSS33upEYtmO5jofmAE03blqADA+Ij4L/C19jaRuwClA\nN+Ao4Hrpw+lPNwBnRkQXoIuko0oYv1nNevhhOOkkuOMOr/5riUySiaSdgaOB4UBTYjgeGJlujwRO\nSLd7A6MiYlVEzAVeBHpI2gHoGBEN6X635BxjZkUyejR885twzz1w+OFZR2PlIqs+k6uBC4HcW/9u\nFxGL0+3FwHbp9o7AxJz95gE7AavS7Sbz03IzK4IIuOIK+OMfYfx4+Pzns47IyknJk4mkY4F/RcRk\nSXXN7RMRIalgN24fNGjQh9t1dXXU1TV7WTNbi/ffh3POgSlT4KmnYCf/t63q1NfXU19f3+bjFVGw\nv9ktu6D0G+BUoBHYhKR2cg9wAFAXEYvSJqxHI2JPSQMAImJwevw44DLglXSfrml5X6BnRJyTd70o\n9Wc0qybLl8OJJ8Lmm8Ptt8MWW2QdkZWCJCKixctzlrzPJCIuiYhdIqIz0Ad4JCJOBcYC/dLd+gFj\n0u2xQB9JHSR1BroADRGxCHhLUo+0Q/7UnGPMrABefhkOOgj23hvuvdeJxNauHNbmaqo2DAaOlDQb\nODx9TUTMAEaTjPz6K9A/p6rRn6QTfw7wYkSMK2XgZtVs4kQ4+GD4/veTme0bbJB1RFbOSt7MVWpu\n5jJrvbvuShZqvPlmzyGpVa1t5vIMeDP7UAQMGQJ/+AM89BDsu2/WEVmlcDIxMwBWrUqatJ55Jhmx\ntfPOWUdklcTJxMxYvjyZ0b7JJvD449CxY9YRWaUphw54M8vQ3LlJR3vXrjBmjBOJtY2TiVkNmzAh\nGfp79tlw3XWwodsqrI380zGrQRFw1VVJZ/uNN3rElrWfk4lZjVm6FE47DRYvhkmTYLfdso7IqoGb\nucxqyMSJsN9+0KULPPGEE4kVjmsmZjUgAq6+GgYPhmHDoHfvrCOyauNkYlblli1LmrUWLkyatTp3\nzjoiq0Zu5jKrYpMmJc1au+8OTz7pRGLF45qJWRWKgKFD4Te/gT//OblXu1kxOZmYVZlly+D002HB\nAjdrWem4mcusijQ0JM1au+3mZi0rLddMzKrA6tVJs9YVV8Cf/gRf+1rWEVmtcTIxq3CzZsGZZybb\nEycmne1mpeZmLrMKtWpVUhM5+GDo0ydZ7deJxLLimolZBZoyBc44Azp1gqef9kx2y55rJmYVZOVK\n+NnPoFcvOO88ePBBJxIrD66ZmFWICROSvpGuXeG552CHHbKOyGwNJxOzMrdiBVx6Kdx1F1x7bXJH\nRLNy42YuszL28MOwzz7JbXWnTXMisfLlmolZGVq+HC64IEkmf/wjfPWrWUdktm4lr5lI2kXSo5Km\nS3pe0nlp+TaSxkuaLekhSVvlHDNQ0hxJMyX1yinvLmla+t7QUn8Ws0JbvRpuuw0+9znYeOOkNuJE\nYpVAEVHaC0rbA9tHxBRJWwDPACcApwNvRMQQSRcDW0fEAEndgNuBA4CdgIeBLhERkhqAcyOiQdID\nwLURMS7v
elHqz2jWFo8/ntRGpOTeIwcfnHVEVsskERFq6f4lr5lExKKImJJurwBeIEkSxwMj091G\nkiQYgN7AqIhYFRFzgReBHpJ2ADpGREO63y05x5hVjNmzk+VPvvMd+MlPklnsTiRWaTLtgJe0G7Av\nMAnYLiIWp28tBrZLt3cE5uUcNo8k+eSXz0/LzSrCkiVw/vlw0EFw4IEwcyb07Qsf87AYq0CZdcCn\nTVx3A+dHxNvSmtpU2oRVsLapQYMGfbhdV1dHXV1doU5t1mrvvQfXXQdXXgknnwwvvADbbpt1VFbr\n6uvrqa+vb/PxJe8zAZC0EfC/wF8j4pq0bCZQFxGL0iasRyNiT0kDACJicLrfOOAy4JV0n65peV+g\nZ0Sck3ct95lYWYhI5ooMGAB77QVDhiQTEM3KUdn3mSipgtwIzGhKJKmxQL90ux8wJqe8j6QOkjoD\nXYCGiFgEvCWpR3rOU3OOMSsrTz2V9INccQUMHw733+9EYtUli9FchwCPA1OBposPBBqA0cCngLnA\nyRGxPD3mEuAMoJGkWezBtLw7MALYFHggIs5r5nqumVhmXnoJLrkkWQrl8svh1FPdJ2KVobU1k0ya\nuUrJycSyMG0aDB6cLMR4/vnJkN/NNss6KrOWK/tmLrNqNmECHHdcsqrvPvskNZP//m8nEqt+Xk7F\nrJ0ikhrIFVfAq6/CRRfB6NGw6aZZR2ZWOk4mZm30wQdw991Jc9aqVckorVNOgQ39r8pqkH/2Zq30\n3ntw663J0N5OneAXv4BjjnHHutU2JxOzFlqxAv78Z7jqqmQhxmHD4NBDk7W0zGqdk4nZekyfniSO\nv/wFDjsMxo6F/fbLOiqz8uJkYtaMf/876UQfNgxeeQVOPx0aGqBz56wjMytPnmdiluOZZ5IEMnp0\nMmP9rLPg6KPdqW61p7XzTPxPxGrem2/C7bcnSWTpUvjud2HqVNh556wjM6scrplYTYpI1ssaNgzG\njIEvfzmphXz5yx6VZQZeTuU/OJlYrtmz4Z57kqG9jY1JLaRfP/jkJ7OOzKy8OJnkcTKpbRHJOll3\n350kkSVLkrsannIKfOlLHtZrtjZOJnmcTGpPRDLy6p57kkdjI5x4Inz968kdDd2MZbZ+7oC3mvTB\nB/Dkk0kN5N57YYstkgQyejR84QuugZgVm5OJVawVK+Dxx5MO9Pvug512SmofDz3kG0+ZlZqTiVWM\nlSth4kR45JHkMWUK7L8/HHtsMjJr992zjtCsdrnPxMpWYyM8/fSa5DFpEnTrBkccAYcfDgcd5PuE\nmBWLO+DzOJlUjtWrk5FXTcnjiSdg112TxHH44cmiiltumXWUZrXBySSPk0l5ioAFC5LlS55+Onme\nNAm22WZN8jjsMNh226wjNatNTiZ5nEzKw8KFa5JG03NjI3TvnvR7dO8OBxzgJUzMyoWTSR4nk9KK\ngPnz4bnnPpo83n//o4mje3f41Kc8ZNesXDmZ5HEyKY633kqWJpk1a83zrFkwZw507JjcPGr//dck\nj113deIwqyQ1l0wkHQVcA2wADI+IK/PedzJpo3fegXnzmk8ab70Fn/1s8thjj+TR9Nqd5GaVr6aS\niaQNgFnAl4H5wD+AvhHxQs4+Tiap+vp66urqiEiWXZ8/P0kWTY/81++8k0wE7NLlowljjz2S8kpe\nlqTpuzBNL0TmAAAHh0lEQVR/F7n8XaxRa8up/BfwYkTMBZB0B9AbeGFdB1Wb1auT5LBkSfJYuvQ/\nt5cuhYkT6+nQoY5585Imp112SZLCzjsnj/32g+OPX/P6E5+o3qYp/9FYw9/FGv4u2q7Sk8lOwGs5\nr+cBPTKKZZ0iYNWq5PH++2se772X3CK26bFixdpf524vW7YmSSxfnvRTbLNNkgDyn/fYI9neYAMY\nODBJFB//eNbfiJlVk0pPJi1qvzr22HTnnL3zW77y31u9uvnHBx+svbyx8T
+TRdPrVauSW7926LDm\nsdFGyfMWW8DmmyeP5ra33jqpRTSVN5U1JYytt27ZbWXnzElmkJuZFVql95kcCAyKiKPS1wOB1bmd\n8JIq9wOamWWoljrgNyTpgD8CWAA0kNcBb2ZmxVfRzVwR0SjpXOBBkqHBNzqRmJmVXkXXTMzMrDxU\n8EyB9ZN0lKSZkuZIujjreLIkaa6kqZImS2rIOp5SknSTpMWSpuWUbSNpvKTZkh6StFWWMZbKWr6L\nQZLmpb+NyelE4KomaRdJj0qaLul5Seel5TX3u1jHd9Gq30XV1kxaMqGxlkh6GegeEUuzjqXUJH0J\nWAHcEhF7p2VDgDciYkj6H42tI2JAlnGWwlq+i8uAtyPiqkyDKyFJ2wPbR8QUSVsAzwAnAKdTY7+L\ndXwXJ9OK30U110w+nNAYEauApgmNtaxKpyCuW0Q8ASzLKz4eGJlujyT5x1P11vJdQI39NiJiUURM\nSbdXkEx03oka/F2s47uAVvwuqjmZNDehcae17FsLAnhY0tOSzso6mDKwXUQsTrcXA9tlGUwZ+KGk\n5yTdWAtNO7kk7QbsC0yixn8XOd/FxLSoxb+Lak4m1dl+13YHR8S+wFeBH6TNHQaki7fV8u/lBqAz\n8AVgIfD7bMMpnbRZ527g/Ih4O/e9WvtdpN/F/yP5LlbQyt9FNSeT+cAuOa93Iamd1KSIWJg+vw7c\nS9IMWMsWp23FSNoB+FfG8WQmIv4VKWA4NfLbkLQRSSK5NSLGpMU1+bvI+S5ua/ouWvu7qOZk8jTQ\nRdJukjoApwBjM44pE5I2k9Qx3d4c6AVMW/dRVW8s0C/d7geMWce+VS39o9nka9TAb0OSgBuBGRFx\nTc5bNfe7WNt30drfRdWO5gKQ9FXW3Ovkxoi4IuOQMiGpM0ltBJKJqn+ppe9C0iigJ9CJpB3858B9\nwGjgU8Bc4OSIWJ5VjKXSzHdxGVBH0pQRwMvA2Tn9BlVJ0iHA48BU1jRlDSRZRaOmfhdr+S4uAfrS\nit9FVScTMzMrjWpu5jIzsxJxMjEzs3ZzMjEzs3ZzMjEzs3ZzMjEzs3ZzMjEzs3ZzMrGqJOnSdDnt\n59Llsyt6VrekEZJOLOL5e0r6YqmuZ9Wnou+0aNac9I/iMcC+EbFK0jbAxhmH1V7FXifqMOBt4Kmc\n65m1mGsmVo22J7knxSqAiFjatDaZpO6S6tPVk8flrMPUPa3FTJH026abR0k6TdJ1TSeW9L+Seqbb\nvSRNkPSMpNHpUjVNNyIblJZPlbRHWr6FpJvTsuckfX1d52nGR5YDl7RBGmtDer7vpeV16We8S9IL\nkm7LOebotOxpSddKul/SrsDZwI8lPZvOiAY4VNLfJb3kWoqtj5OJVaOHgF0kzZL0P5IOhQ8Xs7sO\nODEi9gduBn6dHnMz8IOIaFo+Ym3/Mw8gJHUCLgWOiIjuJDcU+knOPq+n5TcAP03L/xtYFhH7RMTn\ngUfWc571ORNYHhH/RbII31npEuKQLINxPtAN2F3SQZI2Af4IHJV+/k4ki+O+kpZfFRH7RcSTJIlr\n+4g4GDgWGNzCmKxGuZnLqk5E/FtSd+BLJM03d0oaQPKHei+S+7pAsmbbAklbAlumf0QBbiVZqn9t\nBBxI8od6QnquDsCEnH3uSZ+fBb6ebh9BsuBoU5zLJR27nvOsSy9gb0knpa8/DnwGWAU0RMQCAElT\nSJYSfwf4Z5o8AEYB38v7XB+GR7rIYUS8IKmm7uthredkYlUpIlYDjwGPpU1W/UiSyfSIOCh332Zu\n+pP7R7WRj9bgN8nZHh8R31xLCO+lzx/w0X9nzd25bl3nydVcbenciBifWyCpLuf6uTHkH7++u+i9\n34p9rca5mcuqjqTPSuqSU7QvyQqws4BtJR2Y7reRpG7pqrDLJR2c7v+tnGPnAl9QYheS5qQguRPd\nwZI+nZ5r87xrNmc88IOcOLdq5Xny/6
A/CPSXtGHO595sLcdG+vl3T/tIIKklNSWYt4GO64nfbK2c\nTKwabQGMkDRd0nPAnsCgtEP+JODKtOlnMtA0HPZ04H8kTc49Udr09TIwAxhKUrshIt4ATgNGpdeY\nAOzRTCy5/S+XA1tLmpZev64V5wH4k6TX0sffSW5YNAN4Nq193cCaGsh/1GIiYiXQHxgn6WngrfQB\ncD/wtbwO+NxzeHSXrZOXoDfLk/7P/X8jYu+sYyk0SZtHxL/T7f8BZkfE0IzDsirgmonZfxLV+z/x\ns9JJnNNJOuz/lHVAVh1cMzEzs3ZzzcTMzNrNycTMzNrNycTMzNrNycTMzNrNycTMzNrNycTMzNrt\n/wNW3C+n5wmUKwAAAABJRU5ErkJggg==\n", "text": [ "" ] } ], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "That curve looks steeper than the curve for pairwise alignment, and the values on the y-axis are bigger, but it's not really clear how much of a problem this is until we plot runtime for three sequences in the context of the run times for pairwise alignment." ] }, { "cell_type": "code", "collapsed": false, "input": [ "plt.plot(range(25), s2_times)\n", "plt.plot(range(25), s3_times)\n", "plt.xlabel('Sequence Length')\n", "plt.ylabel('Runtime (s)')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 4, "text": [ "" ] }, { "metadata": {}, "output_type": "display_data", "png": 
"iVBORw0KGgoAAAANSUhEUgAAAZMAAAEPCAYAAACHuClZAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmcFNW5//HPw75v4sKmYIIKYlRGg4LLiErcUSECSQwq\nEBRx3wDv7zom3oiYqGiuGyICKopXRUwUQXFEZBnZZN9UkF1l32eGeX5/dA004wCzdddM9/f9etWr\nT52q6nq6aeqZc6rqlLk7IiIixVEu7ABERKTsUzIREZFiUzIREZFiUzIREZFiUzIREZFiUzIREZFi\ni1kyMbNXzWyDmc3LZ9l9ZpZjZvWi6vqb2TIzW2xmHaLqU8xsXrBscFR9ZTN7O6ifZmYnxOqziIjI\n4cWyZTIMuCxvpZk1AS4FVkbVtQS6AC2DbZ43MwsWvwD0cPfmQHMzy33PHsDGoP5p4IlYfRARETm8\nmCUTd/8S2JzPoqeAB/PUdQRGuXuWu68AlgNtzKwBUNPdM4L1RgDXBuVrgOFB+V3g4hIMX0RECiGu\n50zMrCOw2t3n5lnUEFgdNb8aaJRP/ZqgnuB1FYC7ZwNbo7vNREQkfirEa0dmVg0YQKSLa391vPYv\nIiKxE7dkAvwKaAp8E5wOaQzMNLM2RFocTaLWbUykRbImKOetJ1h2PLDWzCoAtd19U96dmpkGHxMR\nKQJ3L/Af/HHr5nL3ee5+rLs3c/dmRJJCa3ffAIwFuppZJTNrBjQHMtx9PbDNzNoEJ+RvBD4I3nIs\n0D0odwY+O8y+NbnzyCOPhB5DaZn0Xei7SJbvYv329UXarrBieWnwKGAKcJKZrTKzm/Ossj9ad18I\njAYWAh8DffzAp+kDvAIsA5a7+7igfihwlJktA+4G+sXqs4iIlEWbd2/m1OdPZe32tTHfV8y6udy9\n2xGWn5hn/u/A3/NZbyZwWj71e4EbihmmiEjCemrqU3Q8uSMNazaM+b7iec5EQpaamhp2CKWGvosD\n9F0ckEjfxcZdG3l+xvPM6DUjLvuzovSNlSVm5on+GUVE8hrw2QB+3vUzL1/9cpG2NzO8ECfg1TIR\nEUkwP+38iRdnvMjs3rPjtk8N9CgikmCenPIkXVt15YQ68RuyUC0TEZEEsmHHBl6Z9Qpzb8s70Ehs\nqWUiIpJABn01iD/95k80rtX4yCuXILVMREQSxLrt6xg2Zxjz+8yP+77VMhERSRADJw+k++nd43Jf\nSV5qmYiIJIA129Ywcu5IFt6+MJT9q2UiIpIAHp/8OD3O7MFxNY4LZf9qmYiIlHE/bP2BUfNHsfj2\nxaHFoJaJiEgZ9/cv/06v1r04uvrRocWglomISBm2YssK3ln4Dkv6Lgk1DrVMRETKsMcmPcZtZ91G\n/Wr1Q41DLRMRkTLq203fMmbxGJbesTTsUNQyEREpqx778jH6/rYv9arWCzsUtUxERMqiZRuX8eGS\nD1l+5/KwQwHUMhERKZP+Oumv3NXmLupUqRN2KIAejiUiUuYs/nkxFwy7gOV3LqdW5Vox2UdhH46l\nlomISBnz1y/+yj3n3BOzRFIUSiYiImXIgh8X8Nn3n9H3t33DDuUgSiYiImXIo188yn3n3kfNyjXD\nDuUgSiYiImXEvA3zmLRyEreffXvYofxCzJKJmb1qZhvMbF5U3ZNmtsjMvjGz98ysdtSy/ma2zMwW\nm1mHqPoUM5sXLBscVV/ZzN4O6qeZWfwediwiEoK0L9J4sN2DVK9UPexQfiGWLZNhwGV56sYDp7r7\n6cBSoD+AmbUEugAtg22eN7PcqwheAHq4e3OguZnlvmcPYGNQ/zTwRAw/i4hIqGatm8XUVVO59axb\nww4lXzFLJu7+JbA5T90Ed88JZqcDuQ8p7giMcvcsd18BLAfamFkDoKa7ZwTrjQCuDcrXAMOD8rvA\nxTH5ICIiIXN3HpzwIP91wX9RrWK1sMPJV5jnTG4BPgrKDYHVU
ctWA43yqV8T1BO8rgJw92xgq5mF\nP6aAiEgJG7d8HKu3raZX615hh3JIoQynYmYPA5nu/mY89peWlra/nJqaSmpqajx2KyJSbNk52Tww\n4QEGXTqIiuUrxmw/6enppKenF3n7uCcTM7sJuIKDu6XWAE2i5hsTaZGs4UBXWHR97jbHA2vNrAJQ\n29035bfP6GQiIlKWvDbnNY6qdhRXn3R1TPeT9w/tRx99tFDbx7WbKzh5/gDQ0d33RC0aC3Q1s0pm\n1gxoDmS4+3pgm5m1CU7I3wh8ELVN96DcGfgsLh9CRCROdmTu4JH0R/jHpf/gwDVJpVPMWiZmNgq4\nEKhvZquAR4hcvVUJmBB8MVPdvY+7LzSz0cBCIBvoEzWgVh/gNaAq8JG7jwvqhwIjzWwZsBHoGqvP\nIiIShn9O+ScXnnAhZzc6O+xQjkgDPYqIlELrtq+j1QutmPmXmTSt0zTu+y/sQI9KJiIipVCvsb2o\nU6UOT3Z4MpT9FzaZ6OFYIiKlzPwf5/PBkg9Y0ndJ2KEUmMbmEhEpZR6c8CAPn/8wdavWDTuUAlPL\nRESkFPn0u09ZunEpY7qOCTuUQlHLRESklMjxHB6Y8AADLxlIpfKVwg6nUJRMRERKidfnvk7VClXp\n1KJT2KEUmrq5RERKgV1Zu3h44sO83fntUn+DYn7UMhERKQWemfYMbRq1oW2TtmGHUiRqmYiIhOzH\nnT/y1NSnmNZzWtihFJluWhQRCdnt/7mdCuUqMPjywUdeOU5006KISBmy5OcljF44mkW3Lwo7lGLR\nORMRkRA99OlDPND2AepXqx92KMWilomISEgmrZzEnPVzeKvzW2GHUmxqmYiIhCDHc7h//P38T/v/\noUqFKmGHU2xKJiIiIXh7/tvkeA7dTusWdiglQt1cIiJxtid7DwMmDmBYx2GUs8T4mz4xPoWISBny\nr4x/cdoxp5HaNDXsUEqMWiYiInG0dvtaBk4eyORbJocdSolSy0REJI7uH38/vVr34pT6p4QdSolS\ny0REJE4mfj+Rr1Z9xZCrh4QdSolTy0REJA4y92Vy+0e388zvnqF6pephh1PilExEROLg6alP06xO\nM6495dqwQ4kJdXOJiMTYD1t/4MkpTzK95/Qy+aySgohZy8TMXjWzDWY2L6qunplNMLOlZjbezOpE\nLetvZsvMbLGZdYiqTzGzecGywVH1lc3s7aB+mpmdEKvPIiJSHPd8cg99f9uXX9X7VdihxEwsu7mG\nAZflqesHTHD3k4DPgnnMrCXQBWgZbPO8HUjfLwA93L050NzMct+zB7AxqH8aeCKGn0VEpEjGLR/H\nnPVzeKjdQ2GHElMxSybu/iWwOU/1NcDwoDwcyO087AiMcvcsd18BLAfamFkDoKa7ZwTrjYjaJvq9\n3gUuLvEPISJSDHuy99D3o748d/lzVK1YNexwYireJ+CPdfcNQXkDcGxQbgisjlpvNdAon/o1QT3B\n6yoAd88GtppZvRjFLSJSaIO+GsRvjv0NVzS/IuxQYi60E/Du7mYWl0cgpqWl7S+npqaSmpoaj92K\nSBL7bvN3PDv9WWb1nhV2KAWSnp5Oenp6kbePdzLZYGbHufv6oAvrx6B+DdAkar3GRFoka4Jy3vrc\nbY4H1ppZBaC2u2/Kb6fRyUREJNbcnTs/vpP7297P8bWPDzucAsn7h/ajjz5aqO3j3c01FugelLsD\nY6Lqu5pZJTNrBjQHMtx9PbDNzNoEJ+RvBD7I5706EzmhLyISurFLxvLt5m+599x7ww4lbmLWMjGz\nUcCFQH0zWwX8NzAQGG1mPYAVwA0A7r7QzEYDC4FsoI+753aB9QFeA6oCH7n7uKB+KDDSzJYBG4Gu\nsfosIiIFtStrF3eNu4tXO75KpfKVwg4nbuzAMTsxmZkn+mcUkdLj4c8e5rst3zGq06iwQykWM8Pd\nC3yHpe6AFxEpIUt+XsJLM
19i7m1zww4l7jQ2l4hICXB3+n7cl4fPf5iGNRuGHU7cKZmIiJSAdxa+\nw4YdG7ijzR1hhxIKdXOJiBTT9r3bufeTe3mr81tUKJech1W1TEREiunRLx7lkhMv4bzjzws7lNAk\nZwoVESkh83+cz4hvRjC/z/ywQwmVWiYiIkXk7tz+0e2kpaZxTPVjwg4nVEomIiJFNGTWEHZn7aZ3\nSu+wQwmdurlERIpg5ZaVPDzxYb646QvKlysfdjihU8tERKSQ3J2eH/bkvnPvo+XRLcMOp1RQMhER\nKaQhs4awdc9W7m97f9ihlBrq5hIRKYTc7q307ulJe09JftQyEREpoNzurXvPuZdTjzk17HBKFSUT\nEZECyu3eeqDdA2GHUuqojSYiUgDq3jo8tUxERI5A3VtHdtj0amatgW7ABUBTwIGVwCTgTXefHesA\nRUTCNmTWELbs2aLurcM45JMWzewjYDORZ61nAOsAAxoAvwWuBuq4+5XxCbVo9KRFESmOlVtWctaQ\ns0jvnp5UrZLCPmnxcMnkWHffcISdHePuPxYyxrhSMhGRonJ3fvf677io6UX0P79/2OHEVWGTySHP\nmeQmEjOrbmblg/LJZnaNmVUM1inViUREpDiGzBrC5j2b1b1VAIdsmexfwWwWcB5QF/gK+BrIdPc/\nxj684lPLRESKYuWWlaS8nEL6Tem0OqZV2OHEXYm1TKLf0913AdcDz7v774Hk+2ZFJGm4O70+7MW9\n596blImkKAp0abCZnQv8EfhPYbYTESmLXpn1Cpv3bObBdg+GHUqZUZCkcDfQH3jf3ReY2a+Az4uz\nUzPrb2YLzGyemb1pZpXNrJ6ZTTCzpWY23szq5Fl/mZktNrMOUfUpwXssM7PBxYlJRATgh60/MGDi\nAIZ1HKabEwvhiOdMSnyHZk2BiUALd99rZm8DHwGnAj+7+yAzewio6+79zKwl8CZwNtAI+BRo7u5u\nZhlAX3fPCC5lftbdx+XZn86ZiEiB5F69ldo0lQHnDwg7nFCV2DkTM3vVzM4+zPI2ZjassAEC24As\noJqZVQCqAWuBa4DhwTrDgWuDckdglLtnufsKYDnQxswaADXdPSNYb0TUNiIihfbKrFfYtHuTureK\n4HBtuKeBB8zsHGAJB25aPA44GZgC/KOwO3T3TWb2T+AHYDfwibtPyHNfywbg2KDcEJgW9RaribRQ\nsoJyrjVBvYhIoeV2b33e/XN1bxXBIb8xd58H/NnMKgNnAidwYDiVb9x9T1F2GJxzuZvI8CxbgXfM\n7E959u1mVmJ9U2lpafvLqamppKamltRbi0gCyM7J5sb3b+Tec5L36q309HTS09OLvH0Y50y6AJe6\ne89g/kbgHKA9cJG7rw+6sD5391PMrB+Auw8M1h8HPEIkqX3u7i2C+m7Ahe5+a5796ZyJiBxWWnoa\nk3+YzCd/+kTPcw/E4j6TkrYYOMfMqpqZAZcAC4EPge7BOt2BMUF5LNDVzCqZWTOgOZDh7uuBbcG5\nGwNujNpGRKRA0lek89LMlxh53UglkmKIe8egu39jZiOAGUAOMAt4GagJjDazHsAK4IZg/YVmNppI\nwskG+kQ1NfoArwFVgY/yXsklInI4P+38iT+99yde6/gaDWo2CDucMq3A3VxmVi24E75MUTeXiOQn\nx3O4etTVtDq6FU9c+kTY4ZQ6Jd7NZWZtzWwhkSu6MLMzzOz5YsQoIhK6Z6Y9w8ZdG3ms/WNhh5IQ\nCnLO5BngMuBnAHefA1wYy6BERGLp6zVfM3DyQEZ1GkXF8hXDDichFOgEvLv/kKcqOwaxiIjE3NY9\nW+n6bleev/J5mtVtFnY4CaMgJ+B/MLN2AGZWCbgTWBTTqEREYsDd6f3v3nQ4sQOdW3YOO5yEUpBk\nchswmMjd5WuA8cDtsQxKRCQWhs4eysKfFjK95/SwQ0k4cb9pMd50NZeIACz4cQGpw1OZdNM
kWhzd\nIuxwSr3CXs11xJaJmZ0I3EFk+JPc9d3drylShCIicbY7azdd/q8LT1zyhBJJjBTksb1zgVeA+URu\nMoRIMvkixrGVCLVMRKT3h73ZkbWD1697nciAGXIkJd4yAfa4+7PFiElEJDSjF4xm4oqJzPzLTCWS\nGCpIy+RG4FfAJ8De3Hp3nxXb0EqGWiYiyeu7zd9xzivn8PEfPyalYUrY4ZQpsWiZnEpkEMWLONDN\nRTAvIlIqZe7LpNu73Rhw/gAlkjgoSMvkWyKP2M2MT0glSy0TkeT0wPgHWLxxMWO7jlX3VhHEomUy\nD6hL5OmHIiKl3sfLPuatBW8xu/dsJZI4KUgyqQssNrOvOXDORJcGi0iptGLLCm4ZewtvdXqL+tXq\nhx1O0ihIMnkk5lGIiJSAHZk76PhWR/q168eFTTUebTzpDngRSQg5nsPv3/k9dSrX4ZVrXlH3VjGV\n2DkTM/vK3duZ2Q4g79HY3b1WUYMUESlpf/vib6zfsZ43r39TiSQEh0wm7t4ueK0Rv3BERArv3YXv\nMnT2UDJ6ZVC5QuWww0lKBXnS4siC1ImIhGHuhrnc+p9beb/L+xxX47iww0laBXk4VqvoGTOrAOgO\nIBEJ3U87f6LjWx159rJndWNiyA6ZTMxsgJltB04zs+25E/AjMDZuEYqI5CNrXxa/f+f3dGvVjW6n\ndQs7nKRXkDvgB7p7vzjFU+J0NZdIYurznz6s2raKD7p+QDkr0BPIpRBK/A54d+9nZo2AE6LXd/dJ\nRQtRRKR4XpzxIukr0pnWc5oSSSlRkIdjPQF0ARYC+6IWFTmZmFkdIs9IOZXIZcc3A8uAt4kkrRXA\nDe6+JVi/P3BLsP873X18UJ8CvAZUAT5y97uKGpOIlA1frPiCR9If4atbvqJWZd2hUFoUpJtrKXCa\nu+897IqF2anZcOALd381OKFfHXgY+NndB5nZQ0DdoFXUEngTOJvIc+g/BZq7u5tZBtDX3TPM7CPg\nWXcfl2df6uYSSRArtqzg3KHnMuLaEVz6q0vDDiehFbabqyDtw2+BSkUP6WBmVhs4391fBXD3bHff\nClwDDA9WGw5cG5Q7AqPcPcvdVwDLgTZm1gCo6e4ZwXojorYRkQQTPVSKEknpU5CxuXYDc8zsMw4e\n6PHOIu6zGfCTmQ0DTgdmAncDx7p77sjEG4Bjg3JDYFrU9quJtFCygnKuNUG9iCSYHM/hpjE3kdIg\nhTvbFPXQI7FUkGQyll9eClycfqMKQGsi3VNfm9kzwEFXiwVdWCXWN5WWlra/nJqaSmpqakm9tYjE\nwd+++Btrt6/ljevf0FApMZKenk56enqRt4/7QI9mdhww1d2bBfPnAf2BE4GL3H190IX1ubufYmb9\nANx9YLD+OCIjGa8M1mkR1HcDLnT3W/PsT+dMRMqw9xa9x93j7iajV4bucI+jEj9nYmbf5zN9V9QA\n3X09sMrMTgqqLgEWAB8C3YO67sCYoDwW6GpmlcysGdAcyAjeZ5uZtbHInyo3Rm0jIglg9rrZ9P53\nbw2VUgYUpJvr7KhyFaAzcFQx93sH8IaZVSJygv9moDww2sx6EFwaDODuC81sNJFLk7OBPlFNjT5E\nLg2uSuTS4IOu5BKRsmv5puVc+eaVvHjlixoqpQwoUjeXmc1y99YxiKfEqZtLpOxZt30d5w07j4fa\nPcRfUv4SdjhJqcTvgA9uDMw9GpcDziLSihARKXFb9mzh8jcu55YzblEiKUMKctNiOgeSSTaRLqh/\nuPuSmEZWQtQyESk7dmft5rI3LuOMY8/gmcue0ZVbISpsy6TQ3VzBye4b3P3twgYXBiUTkbIhOyeb\nzqM7U61iNV6//nWNuRWyEruay8xqmNl9Zva8mfUxs3Jmdh2RK6/+WBLBiogAuDu9P+zNnuw9vHbt\na0okZdDhzpmMALYRufv8UuAmYA/wB3efE/vQRCRZDPh
sAAt+WsBnf/6MSuVLbPQmiaPDJZNfu/tv\nAMxsCLAOOMHdd8clMhFJCk9NfYoPlnzAlzd/SfVK1cMOR4rocMlk/3Dz7r7PzNYokYhISRr5zUgG\nTx/M5Jsnc1S14t6+JmE65Al4M9sH7Iqqqkpk0EeIDJ9VJh4koBPwIqXTf5b+hx5je/B5989pcXSL\nsMORPErsPhN3170kIhITU1ZN4eYPbubDbh8qkSQIXTIhInE1/8f5XPf2dYy8biRtGrcJOxwpIUom\nIhI3K7es5PI3LueZ3z3D7379u7DDkRKkZCIicfHTzp/o8HoHHmj7AN1O6xZ2OFLClExEJOZ+3vUz\nHV7vwO9b/l5PSkxQSiYiElMbdmwg9bVUrvj1Ffztor+FHY7EiJKJiMTM2u1rSR2eyg2n3sBj7R/T\nwI0JrCAPxxIRKbQftv7AxSMupseZPeh3Xr+ww5EYUzIRkRL3/ebvaT+iPXf+9k7uOfeesMOROFAy\nEZEStWzjMi4ecTH9zutHn7P7hB2OxImSiYiUmEU/LeLSkZeSlppGz9Y9ww5H4kjJRERKxPwf59Nh\nZAcGXjKQP5/+57DDkThTMhGRYpu9bnbkzvbLnqFrq65hhyMhUDIRkWLJWJPB1aOu5oUrX+D6FteH\nHY6ERMlERIpsyqopXPvWtbza8VWuOumqsMOREIV206KZlTez2Wb2YTBfz8wmmNlSMxtvZnWi1u1v\nZsvMbLGZdYiqTzGzecGywWF8DpFk9cWKL7j2rWsZed1IJRIJ9Q74u4CFQO6Tq/oBE9z9JOCzYB4z\nawl0AVoClwHP24HbaF8Aerh7c6C5mV0Wx/hFktan331K53c681bntzT6rwAhJRMzawxcAbwC5CaG\na4DhQXk4cG1Q7giMcvcsd18BLAfamFkDoKa7ZwTrjYjaRkRiZPSC0fzh3T/w3g3v0b5Z+7DDkVIi\nrHMmTwMPANGP/j3W3TcE5Q3AsUG5ITAtar3VQCMgKyjnWhPUi0gMuDuPT36cF2e8yIQbJ3D6caeH\nHZKUInFPJmZ2FfCju882s9T81nF3N7MSe3B7Wlra/nJqaiqpqfnuVkQOIXNfJrf++1bmrJ/D1B5T\naVRLf7clmvT0dNLT04u8vbmX2DG7YDs0+ztwI5ANVCHSOnkPOBtIdff1QRfW5+5+ipn1A3D3gcH2\n44BHgJXBOi2C+m7Ahe5+a579ebw/o0gi2bJnC51Gd6J6xeq82elNalSqEXZIEgdmhrsXeJjnuJ8z\ncfcB7t7E3ZsBXYGJ7n4jMBboHqzWHRgTlMcCXc2skpk1A5oDGe6+HthmZm2CE/I3Rm0jIiXg+83f\n03ZoW0475jTe7/K+EokcUmm4zyS32TAQGG1mPYAVwA0A7r7QzEYTufIrG+gT1dToA7wGVAU+cvdx\ncYxbJKFNWz2N69++nv7n9eeONneEHY6UcnHv5oo3dXOJFN47C96hz0d9GNZxmO4hSVKF7eYqDS0T\nESkl3J1BXw3iX1//i/F/Gs+ZDc4MOyQpI5RMRASArH1Z3Paf25i5biZTe0ylca3GYYckZYiSiYiw\nZc8WOo/uTJUKVZh00yRqVq4ZdkhSxoQ5nIqIlAIrtqyg3avtaFG/BWO6jlEikSJRMhFJYlNWTaHt\n0Lb0TunNc1c8R4Vy6qyQotEvRyQJuTtPTX2KQVMGMfSaobpiS4pNyUQkyWzavYmbxtzEhp0bmN5z\nOk3rNA07JEkA6uYSSSLTVk+j9UutaV6vOV/e/KUSiZQYtUxEkoC78/S0pxk4eSBDrh5Cx1M6hh2S\nJBglE5EEt3n3Zm764CbWbV/H9J7TaVa3WdghSQJSN5dIApu+ejqtX27NiXVOZPItk5VIJGbUMhFJ\nQO7O4OmD+fuXf+flq1/m2lP0EFKJLSUTkQSzefdmbv7gZtZuX6tuLYkbdXOJJJCMNRm0frk1Tes0\nVbeWxJVaJiIJIMd
zGDxtMI9PfpyXrnqJ61pcF3ZIkmSUTETKuCU/L6HH2B4ATOs5jRPrnhhyRJKM\n1M0lUkZl7cvi8S8fp92r7ejaqiuTbp6kRCKhUctEpAyas34Ot3xwC/Wr1WfGX2boTnYJnZKJSBmy\nJ3sPj016jJdnvsygSwfR/fTumBX4yaoiMaNkIlJGTFk1hR5je9Cifgu+ufUbGtRsEHZIIvspmYiU\ncjsyd/DwZw/zzsJ3ePbyZ+ncsnPYIYn8gk7Ai5Rin373Kb954Tds2buFebfNUyKRUkstE5FSaMue\nLdz3yX18+v2nvHjli1ze/PKwQxI5rLi3TMysiZl9bmYLzGy+md0Z1NczswlmttTMxptZnaht+pvZ\nMjNbbGYdoupTzGxesGxwvD+LSEnL8Rxen/s6rZ5vReUKlZl32zwlEikTzN3ju0Oz44Dj3H2OmdUA\nZgLXAjcDP7v7IDN7CKjr7v3MrCXwJnA20Aj4FGju7m5mGUBfd88ws4+AZ919XJ79ebw/o0hRTFo5\nifvG34dhPP27p2l3fLuwQ5IkZma4e4EvFYx7N5e7rwfWB+UdZraISJK4BrgwWG04kA70AzoCo9w9\nC1hhZsuBNma2Eqjp7hnBNiOIJKWDkolIabd041Ie+vQhZq+bzeMXP06XVl0oZzqdKWVLqL9YM2sK\nnAlMB4519w3Bog3AsUG5IbA6arPVRJJP3vo1Qb1ImbBx10bu+vgu2g5tyzmNzmFx38V0O62bEomU\nSaGdgA+6uN4F7nL37dE3XgVdWCXWN5WWlra/nJqaSmpqakm9tUih7c3ey3MZz/HEV09wQ8sbWHT7\nIo6ufnTYYUmSS09PJz09vcjbx/2cCYCZVQT+DXzs7s8EdYuBVHdfb2YNgM/d/RQz6wfg7gOD9cYB\njwArg3VaBPXdgAvd/dY8+9I5EykV3J13Fr5Dv0/7ceoxpzLokkG0OLpF2GGJ5KvUnzOxSBNkKLAw\nN5EExgLdgSeC1zFR9W+a2VNEurGaAxlB62WbmbUBMoAbgWfj9DFECmXqqqncN/4+dmfv5pVrXqF9\ns/ZhhyRSosK4mus8YBIwF8jdeX8iCWE0cDywArjB3bcE2wwAbgGyiXSLfRLUpwCvAVWBj9z9znz2\np5aJhObbTd8yYOIApqyawmMXPcaNp9+ocyJSJhS2ZRJKN1c8KZlIGOZtmMfArwbyyfJPuKvNXdzX\n9j6qVawWdlgiBaZkkoeSicTTlFVTeHzy48xYO4O729zNrWfdSu0qtcMOS6TQSv05E5FE4+588u0n\nPD75cX6bcRU2AAANk0lEQVTY+gMPtn2Q0Z1HU7Vi1bBDE4kbJRORItqXs493F73LwMkDycrJol+7\nfnRp1YUK5fTfSpKPfvUihbQ3ey8j545k0FeDqF+tPo+mPsqVJ12pE+uS1JRMRApoR+YOXp75Mk9N\nfYpWx7RiyNVDuOCEC/SkQxGUTESOaMGPCxgyawhvzHuDi5pexNhuY2ndoHXYYYmUKkomIvnYmbmT\n0QtGM2TWEFZuXcnNZ9xMRs8MmtVtFnZoIqWSLg0WiTJz7UyGzBrC6AWjaXd8O3q17sUVza/QSXVJ\nOro0WKSQtu7Zypvz3mTIrCFs2r2Jnq17Mve2uTSu1Tjs0ETKDLVMJCm5O1NXT2XIrCGMWTyGS068\nhF6te3HJiZfoqiwRdAf8LyiZSLSlG5fy3qL3GDl3JNk52fQ8syfdz+jOMdWPCTs0kVJFySQPJZPk\n5u7M+3Ee7y58l/cWv8fGXRu57pTr6NKqC+cff74u6xU5BCWTPJRMko+7k7Emg/cWvcd7i98jOyeb\nTi06cX2L6zmn8TnqxhIpACWTPJRMksO+nH1M/mEy7y56l/cXv0+NSjXo1KITnVp04ozjzlALRKSQ\nlEzyUDJJXDsydzBp5STGLB7DB0s+oFHNRlzf4no6teikJxiKFJMuDZaEtSd7D9NWT
2Pi9xOZ+P1E\n5qyfw1kNz+Kqk65iao+pnFj3xLBDFElaaplIqZWdk82MtTP2J4/pa6bT8uiWXNzsYto3a0/bJm31\nwCmRGFE3Vx5KJmVHjucwb8O8SPJYMZEvV37JCXVOoH3T9rRv1p4LTrhAD5qSpOMOu3bBtm0Hpq1b\nC1eeNg0aNCjcfpVM8lAyKZ3cnbXb1zJz3UxmrJ3BzHUzmb56OvWq1qN9s0jyuKjpRRxd/eiwQxUp\nNHfYuRO2bz/8FJ0goufzlqtUgZo1I1Pt2pGpVq3IVJBy48ZQoZAnNZRM8lAyKR3WbV+3P2nkvmbn\nZJPSIIWzGp5FSoMUzm50toYwkVDs23fg4L9jx5Ff89blnXbsgMqVIwfy3CSQd8o90Oeuc6hyzZqF\nTwQlQckkDyWT+HJ31mxfwzfrvzkoeWTuyySlYQpnNTiLlIYppDRI4fjax+uSXSmU3IP+zp2RA3Zu\n+VDzBZl27oS9e6F69ciUewCvUeOXr/nV5ZcoatQIJwGUJCWTPJRMYmPb3m0s3biUJT8vibxuXMKS\njUtYtnEZNSvXpNUxrTirwVmRVkfDFE6ofYISRxLIzIz07x9q2rnzwGt0+Uh1uVNm5oGDfvXqkYP2\noeZzD/75TXmXV60K+nkeLOmSiZldBjwDlAdecfcn8ixXMimiXVm7WL1tdb5JY9vebZx01EmcdNRJ\nnHzUyZx81Mn753WSvPRwjxyAd+8+eNqz55d1u3YV/jXv5B45UFerdvBUvXrkgJ27LPeAf6hy9Hy1\nagcSQJUqOujHS1IlEzMrDywBLgHWAF8D3dx9UdQ6SiaB9PR0UlNTcXe27t3Kmm1rWL1t9f5pzfaD\n53dl7aJRrUY0r9c8kjDqn7w/eTSq1ahMD0uS+13EQ3Z25OC9Z0+kOyX69VDlvXsPHPSjp7x1+c3n\nTRoVK0YOwlWr5j/t3JlOkyapVKsWmc99jS7nV5df0qhYMS5faczE83dR2iXbTYu/BZa7+woAM3sL\n6AgsOtxGiSbHc9i6Zysbd29k466NbNq96RflTbs3Me31aVSaX4nV21ZjZjSp1YRGtRrRuFZjGtds\nTOsGrbnm5Gsi87Uac1TVo8pE19S+fZG/vjMzISvrQPlw88OHp/PTT6n76/fuzf81b11RJogczHOn\nypUPfs2vXLly5ICdmwTq1Dn4PXKXRc/nbhM9VakC5csf/vtLS0snLS015v9OZYGSSdGV9WTSCFgV\nNb8aaBNSLIfl7mTlZJG1L4vMfZn7p7379rIzcyc7s3ayM3MnOzJ37C/vzArmo8vBss17Nu9PFlv2\nbKFm5ZrUq1KPulWPom7letStchR1KtejTqWjaFT5ZFrUrMfeo8pz2wX9ObpSY6qVr8W+fZG/mqNf\n922E7Rtg/j5+sbywU1bWgdfo8qFe8065B/8jzUPkQFqxIlSqFJmiy/nNf/ddZNtKlSLb5vdau3b+\nywo7lfUTsSIFUdZ/5gXqvzr2nquClaNXP3jT6GWRcg4ePVlOULcPt0gdQX1k+T6cbLxcFm6Z5JTL\nxC0zMl8u8kpOBSynUtRUMfKaVQPLro5lVceyakBWpExmDcisDll18b1NILM6njvtqkvOjqPI2VWP\nnJ112eoV2F4OVpeP/CVarlzktXzU/K5dy/h6fMv9dRUq8IvyoeoKM5UvHzl4V6wY6fqoUCFSzn2N\nLue3LPfAf7j56Loj/eWdn7S0yCQiJaOsnzM5B0hz98uC+f5ATvRJeDMrux9QRCREyXQCvgKRE/AX\nA2uBDPKcgBcRkdgr091c7p5tZn2BT4hcGjxUiUREJP7KdMtERERKh7J7o0ABmNllZrbYzJaZ2UNh\nxxMmM1thZnPNbLaZZYQdTzyZ2atmtsHM5kXV1TOzCWa21MzGm1mdMGOMl0N8F2lmtjr4bcwObgRO\naGbWxMw+N7MFZjbfzO4M6pPud3GY76JQv4uEb
ZkU5IbGZGJm3wMp7r4p7FjizczOB3YAI9z9tKBu\nEPCzuw8K/tCo6+79wowzHg7xXTwCbHf3p0INLo7M7DjgOHefY2Y1gJnAtcDNJNnv4jDfxQ0U4neR\nyC2T/Tc0unsWkHtDYzIr/XcgxoC7fwlszlN9DTA8KA8n8p8n4R3iu4Ak+224+3p3nxOUdxC50bkR\nSfi7OMx3AYX4XSRyMsnvhsZGh1g3GTjwqZnNMLNeYQdTChzr7huC8gbg2DCDKQXuMLNvzGxoMnTt\nRDOzpsCZwHSS/HcR9V1MC6oK/LtI5GSSmP13RdfO3c8ELgduD7o7BAgGb0vm38sLQDPgDGAd8M9w\nw4mfoFvnXeAud98evSzZfhfBd/F/RL6LHRTyd5HIyWQN0CRqvgmR1klScvd1wetPwPtEugGT2Yag\nrxgzawD8GHI8oXH3Hz0AvEKS/DbMrCKRRDLS3ccE1Un5u4j6Ll7P/S4K+7tI5GQyA2huZk3NrBLQ\nBRgbckyhMLNqZlYzKFcHOgDzDr9VwhsLdA/K3YExh1k3oQUHzVzXkQS/DYuMYDoUWOjuz0QtSrrf\nxaG+i8L+LhL2ai4AM7ucA886Geruj4ccUijMrBmR1ghEblR9I5m+CzMbBVwI1CfSD/7fwAfAaOB4\nYAVwg7tvCSvGeMnnu3gESCXSleHA90DvqPMGCcnMzgMmAXM50JXVn8goGkn1uzjEdzEA6EYhfhcJ\nnUxERCQ+ErmbS0RE4kTJREREik3JREREik3JREREik3JREREik3JREREik3JRBKSmT0cDKf9TTB8\ndpm+q9vMXjOzTjF8/wvN7Nx47U8ST5l+0qJIfoKD4pXAme6eZWb1gMohh1VcsR4n6iJgOzA1an8i\nBaaWiSSi44g8kyILwN035Y5NZmYpZpYejJ48LmocppSgFTPHzJ7MfXiUmd1kZs/lvrGZ/dvMLgzK\nHcxsipnNNLPRwVA1uQ8iSwvq55rZyUF9DTMbFtR9Y2bXH+598nHQcOBmVj6INSN4v78E9anBZ3zH\nzBaZ2etR21wR1M0ws2fN7EMzOwHoDdxjZrOCO6IBLjCzr8zsW7VS5EiUTCQRjQeamNkSM/tfM7sA\n9g9m9xzQyd3PAoYB/xNsMwy43d1zh4841F/mDriZ1QceBi529xQiDxS6N2qdn4L6F4D7g/r/B2x2\n99+4++nAxCO8z5H0ALa4+2+JDMLXKxhCHCLDYNwFtARONLO2ZlYFeBG4LPj89YkMjrsyqH/K3Vu7\n+2Qiies4d28HXAUMLGBMkqTUzSUJx913mlkKcD6R7pu3zawfkQP1qUSe6wKRMdvWmlltoHZwEAUY\nSWSo/kMx4BwiB+opwXtVAqZErfNe8DoLuD4oX0xkwNHcOLeY2VVHeJ/D6QCcZmadg/lawK+BLCDD\n3dcCmNkcIkOJ7wK+C5IHwCjgL3k+1/7wCAY5dPdFZpZUz/WQwlMykYTk7jnAF8AXQZdVdyLJZIG7\nt41eN5+H/kQfVLM5uAVfJao8wd3/cIgQ9gav+zj4/1l+T6473PtEy6+11NfdJ0RXmFlq1P6jY8i7\n/ZGeopdZiHUlyambSxKOmZ1kZs2jqs4kMgLsEuBoMzsnWK+imbUMRoXdYmbtgvX/GLXtCuAMi2hC\npDvJiTyJrp2Z/Sp4r+p59pmfCcDtUXHWKeT75D2gfwL0MbMKUZ+72iG29eDznxicI4FIKyk3wWwH\nah4hfpFDUjKRRFQDeM3MFpjZN8ApQFpwQr4z8ETQ9TMbyL0c9mbgf81sdvQbBV1f3wMLgcFEWje4\n+8/ATcCoYB9TgJPziSX6/MtjQF0zmxfsP7UQ7wPwkpmtCqaviDywaCEwK2h9vcCBFsgvWjHuvgfo\nA4wzsxnAtmAC+BC4Ls8J+Oj30NVdclgagl4kj+Av93+7+2lhx1LSzKy6u+8Myv8LLHX3wSGHJQlA\nLRORXzIS9
y/xXsFNnAuInLB/KeyAJDGoZSIiIsWmlomIiBSbkomIiBSbkomIiBSbkomIiBSbkomI\niBSbkomIiBTb/wf0pGtH4RIUmAAAAABJRU5ErkJggg==\n", "text": [ "" ] } ], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "And for four sequences:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "s4_times = [t ** 4 for t in range(25)]\n", "\n", "plt.plot(range(25), s2_times)\n", "plt.plot(range(25), s3_times)\n", "plt.plot(range(25), s4_times)\n", "plt.xlabel('Sequence Length')\n", "plt.ylabel('Runtime (s)')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 5, "text": [ "" ] }, { "metadata": {}, "output_type": "display_data", "png": "iVBORw0KGgoAAAANSUhEUgAAAZoAAAEPCAYAAAB7rQKTAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmcVNWZ//HPF5oGFBRc4oKIxsEoLsEQRaPRThyJmolL\nxlGMUZKYxASXzC86Gc0y4jCTRBNNNIlLjBEk6qhjFHQQwaWzqKQVRVBE1IgKIiqyKjS9PL8/7mko\n2ga6m66+3V3f9+t1X3Xq3KWeW5b9cM4991xFBGZmZsXSLe8AzMysa3OiMTOzonKiMTOzonKiMTOz\nonKiMTOzonKiMTOzoipaopHUS9LfJM2UNEfST1L9GEkLJD2TluMK9rlE0kuS5koaUVA/TNLstO7q\ngvqeku5I9dMlDSpYN0rSvLScVazzNDOzTVMx76ORtFVEfCCpDPgrcBFwNLAyIq5qtO0Q4DbgYGAA\n8BAwOCJCUhVwXkRUSZoMXBMRUySNBvaPiNGSTgNOjoiRkrYDngSGpcPPAIZFxLKinayZmTWpqF1n\nEfFBKpYD3YGl6b2a2PxE4PaIqImI+cDLwHBJuwB9I6IqbXcLcFIqnwCMT+W7yZIYwOeAqRGxLCWX\nacCxbXNWZmbWEkVNNJK6SZoJLAYejYjn06rzJT0r6SZJ/VLdrsCCgt0XkLVsGtcvTPWk1zcAIqIW\nWC5p+00cy8zM2lmxWzT1ETEU2A04UlIFcB2wJzAUWARcWcwYzMwsX2Xt8SERsVzS/wGfjIjKhnpJ\nvwPuS28XAgMLdtuNrCWyMJUb1zfsszvwZroOtG1ELJG0EKgo2Gcg8EjjuCR5ojczs1aIiKYugTSp\nmKPOdmjoFpPUGzgGeEbSzgWbnQzMTuVJwEhJ5ZL2BAYDVRHxFrBC0nBJAs4EJhbsMyqVTwEeTuWp\nwAhJ/ST1T5/9YFNxRoSXCC699NLcY+goi78Lfxf+Lja9tFQxWzS7AOMldSNLaBMi4mFJt0gaCgTw\nKnAOQETMkXQnMAeoBUbH+jMaDYwDegOTI2JKqr8JmCDpJWAJMDId6z1JY8lGngFcFh5xZmaWi6Il\nmoiYDXyiifqN3tMSET8GftxE/QzggCbqq4FTN3Ksm4GbWxCymZkVgWcGMAAqKiryDqHD8Hexnr+L\n9fxdtF5Rb9js6CRFKZ+/mVlrSCI6wmAAMzMzcKIxM7Mic6IxM7OicqIxM7OicqIxMytF9fXwyCPQ\nDgOinGjMzErR1Knwve+Bmj14rNWcaMzMStENN8A557T
LR/k+mhI+fzMrUQsXwgEHwOuvQ58+Ld7d\n99GYmdmm3XQTjBzZqiTTGm7RlPD5m1kJqq2FPfeE+++Hj3+8VYdwi8bMzDbugQdgt91anWRaw4nG\nzKyUXH89fOtb7fqR7jor4fM3sxLz2mswbBi88Qb07t3qw7jrzMzMmnbjjfDlL29RkmkNt2hK+PzN\nrITU1MCgQfDww7Dvvlt0KLdozMzswyZNgsGDtzjJtIYTjZlZKWjHmQAac9dZCZ+/mZWIl1+GT30q\nGwTQs+cWH85dZ2ZmtqEbb4RRo9okybSGWzQlfP5mVgKqq2HgQHjssewaTRtwi8bMzNb74x/hwAPb\nLMm0RtESjaRekv4maaakOZJ+kuq3kzRN0jxJUyX1K9jnEkkvSZoraURB/TBJs9O6qwvqe0q6I9VP\nlzSoYN2o9BnzJJ1VrPM0M+vQbrih3WcCaKxoiSYi1gCfiYihwIHAZyQdAVwMTIuIvYGH03skDQFO\nA4YAxwLXSuueyHMdcHZEDAYGSzo21Z8NLEn1vwAuT8faDvgP4JC0XFqY0MzMSsILL8CLL8KJJ+Ya\nRlG7ziLig1QsB7oDS4ETgPGpfjxwUiqfCNweETURMR94GRguaRegb0RUpe1uKdin8Fh3A0en8ueA\nqRGxLCKWAdPIkpeZWen47W/ha1+DHj1yDaOsmAeX1A14GtgLuC4inpe0U0QsTpssBnZK5V2B6QW7\nLwAGADWp3GBhqie9vgEQEbWSlkvaPh1rQRPHMjMrDatXw4QJ8NRTeUdS3EQTEfXAUEnbAg9K+kyj\n9SEp12FfY8aMWVeuqKigoqIit1jMzNrMXXfBIYfAHnts8aEqKyuprKxs9f5FTTQNImK5pP8DhgGL\nJe0cEW+lbrG302YLgYEFu+1G1hJZmMqN6xv22R14U1IZsG1ELJG0EKgo2Gcg8EhTsRUmGjOzLuP6\n6+Hf/71NDtX4H+GXXXZZi/Yv5qizHRouwEvqDRwDPANMAkalzUYB96byJGCkpHJJewKDgaqIeAtY\nIWl4GhxwJjCxYJ+GY51CNrgAYCowQlI/Sf3TZz9YpFM1M+tYZs2C11+Hz38+70iA4rZodgHGp+s0\n3YAJEfGwpGeAOyWdDcwHTgWIiDmS7gTmALXA6IK7KUcD44DewOSImJLqbwImSHoJWAKMTMd6T9JY\n4Mm03WVpUICZWdd3ww3w9a9DWbt0Wm2WZwYo4fM3sy5o1SrYffesVbPbbpvfvhU8M4CZWSn7n/+B\nT3+6aEmmNZxozMy6kg4wE0BjTjRmZl3FjBnwzjswYsTmt21HTjRmZl3FDTfAN78J3bvnHckGPBig\nhM/fzLqQFStg0KBsfrOddy7qR3kwgJlZKbr1VvjHfyx6kmkNJxozs84uIpsJ4Jxz8o6kSU40Zmad\n3d/+Bh98AJ/9bN6RNMmJxsyss/vVr7LWTLeO+SfdgwFK+PzNrAt47TX4xCfg73+Hbbdtl4/0YAAz\ns1Lyi19k85q1U5JpDbdoSvj8zayTW7IEBg+G556DXXdtt491i8bMrFRcdx2cfHK7JpnWcIumhM/f\nzDqx1athzz3h0Udh333b9aPdojEzKwXjx2ePam7nJNMabtGU8PmbWSdVVwcf+xiMGwdHHNHuH+8W\njZlZV3fPPfCRj8Dhh+cdSbM40ZiZdSYRcMUV8L3vgZrdqMiVE42ZWWfypz/B8uVwwgl5R9JsTjRm\nZp3JFVfAv/1bh51upikeDFDC529mncysWXDssdl0M7165RaGBwOYmXVVP/85XHBBrkmmNdyiKeHz\nN7NO5PXXYejQrDXTr1+uoXSYFo2kgZIelfS8pOckXZDqx0haIOmZtBxXsM8lkl6SNFfSiIL6YZJm\np3VXF9T3lHRHqp8uaVDBulGS5qXlrGKdp5lZu/jlL+FrX8s9ybRG0Vo0knYGdo6ImZL6ADOAk4BT\ngZURcVWj7YcAtwE
HAwOAh4DBERGSqoDzIqJK0mTgmoiYImk0sH9EjJZ0GnByRIyUtB3wJDAsHX4G\nMCwiljX6TLdozKzjW7oU9toru0az2255R9NxWjQR8VZEzEzlVcALZAkEoKkATwRuj4iaiJgPvAwM\nl7QL0DciqtJ2t5AlLIATgPGpfDdwdCp/DpgaEctScpkGHNtmJ2dm1p6uuy4bztwBkkxrtMtgAEl7\nAAcB01PV+ZKelXSTpIZ24K7AgoLdFpAlpsb1C1mfsAYAbwBERC2wXNL2mziWmVnnsmZN9gTNiy7K\nO5JWKyv2B6Rus/8FvhMRqyRdB/xnWj0WuBI4u9hxbMyYMWPWlSsqKqioqMgrFDOzD5swAYYNg/33\nzy2EyspKKisrW71/UUedSeoB3A88EBG/bGL9HsB9EXGApIsBIuKnad0U4FLgNeDRiNg31Z8OHBkR\n307bjImI6ZLKgEURsaOkkUBFRHwr7XMD8EhE3NHo832Nxsw6rro6GDIEfvtbOOqovKNZp8Nco5Ek\n4CZgTmGSSddcGpwMzE7lScBISeWS9gQGA1UR8RawQtLwdMwzgYkF+4xK5VOAh1N5KjBCUj9J/YFj\ngAfb/CTNzIpp0qRslNmRR+YdyRYpZtfZ4cCXgVmSnkl13wdOlzQUCOBV4ByAiJgj6U5gDlALjC5o\nbowGxgG9gckRMSXV3wRMkPQSsAQYmY71nqSxZCPPAC5rPOLMzKxDi4DLL+9Uk2dujG/YLOHzN7MO\n7C9/ye6bmTsXunfPO5oNdJiuMzMz2wJXXJGNNOtgSaY13KIp4fM3sw7q+efh6KPh1Vehd++8o/kQ\nt2jMzDq7n/8czj+/QyaZ1nCLpoTP38w6oAUL4MAD4eWXYbvt8o6mSW7RmJl1ZldfDaNGddgk0xpu\n0ZTw+ZtZB/POO7DPPvD00zBo0Oa3z0lLWzRONCV8/mbWwXz3u1BdDb/5Td6RbJITTQs40ZhZh/HG\nG9mDzZ57DnbZZfPb58iJpgWcaMysw/j61+EjH4Ef/zjvSDarpYmm6LM3m5nZZrz4IkycCPPm5R1J\nUXjUmZlZ3n70I7jwQujfP+9IisJdZyV8/mbWAcyYAV/4QnbfzFZb5R1Ns/g+GjOzzuT734cf/rDT\nJJnWcKIxM8tLZSW89FI2EKALc6IxM8tDBFxyCYwdC+XleUdTVE40ZmZ5uO8+eP99OP30vCMpuk0O\nb5b0CeB04EhgD7KnYr4G/Bm4LSKe2fjeZmbWpLo6+MEPsntmunX9f+9vNNFImgwsBSYB1wKLAAG7\nAIcAF0nqFxGfb49Azcy6jNtug7594Z/+Ke9I2sVGhzdL2ikiFm9yZ+kjEfF2USJrBx7ebGbtbu3a\nbOLMm2+Go47KO5pWabPhzQ1JRtLWkrqn8scknSCpR9qm0yYZM7Nc3Hgj7L13p00yrbHZGzYlPQ0c\nAfQHHgOeBNZGxBnFD6+43KIxs3b1/vsweDDcfz984hN5R9NqxbhhUxHxAfBF4NqI+Bdg/9YGaGZW\nsq65Bj796U6dZFqjWZNqSjoMOAM4O1V1/WESZmZtaelSuOoqeOyxvCNpd81JGP8KXALcExHPS9oL\neHRzO0kaKOlRSc9Lek7SBal+O0nTJM2TNFVSv4J9LpH0kqS5kkYU1A+TNDutu7qgvqekO1L9dEmD\nCtaNSp8xT9JZzfs6zMyK5PLL4aSTsuszJaZok2pK2hnYOSJmSuoDzABOAr4KvBsRV0j6d6B/RFws\naQhwG3AwMAB4CBgcESGpCjgvIqrSsOtrImKKpNHA/hExWtJpwMkRMVLSdmTXkoalcGYAwyJiWaMY\nfY3GzIpv0SLYbz+YNQt22y3vaLZYm12jkfR7SQdvYv1wSTdvbH1EvBURM1N5FfACWQI5ARifNhtP\nlnwATgRuj4iaiJgPvAwMl7QL0DciqtJ2txTsU3isu4GjU/lzwNSIWJaSyzTg2I3Fa
mZWVGPHwle/\n2iWSTGts6hrNL4B/k3Qo8CLrb9jcGfgY8Djw8+Z8iKQ9gIOAvwGF9+csBnZK5V2B6QW7LSBLTDWp\n3GBhqie9vgEQEbWSlkvaPh1rQRPHMjNrX6+8AnfeCXPn5h1JbjaaaCJiNnCWpJ5kSWIQ66egeTYi\n1jTnA1K32d3AdyJipbS+tZW6xXLtuxozZsy6ckVFBRUVFbnFYmZd0KWXwgUXwA475B1Jq1VWVlJZ\nWdnq/Yv64LN0Y+f9wAMR8ctUNxeoiIi3UrfYoxGxj6SLASLip2m7KcClZInt0YjYN9WfDhwZEd9O\n24yJiOmSyoBFEbGjpJHpM76V9rkBeCQi7mgUn6/RmFnxzJoFI0ZkjwLo2zfvaNpMh3nwmbKmy03A\nnIYkk0wCRqXyKODegvqRksol7QkMBqoi4i1gRbomJOBMYGITxzoFeDiVpwIjJPWT1B84BniwzU/S\nzGxTfvADuPjiLpVkWqOYo86OIJvleRZZlxtkw6SrgDuB3YH5wKkNo8EkfR/4GlBL1tX2YKofBowD\negOTI6JhqHRPYAJZ194SYGQaSICkrwLfT5/7XxHRMGigMEa3aMysOP76VzjjDHjxRejVK+9o2lRL\nWzTNTjSStkozBHQZTjRmVhS1tTBsWPZgs5Ej846mzbV515mkT0maQzbyDElDJV27BTGamXVtv/41\nfOQjcNppeUfSITRnUs0qsusfEyPioFT3fETs1w7xFZVbNGbW5hYuhKFDs6lmuugsAEUZDBARrzeq\nqm1RVGZmpeLCC+Fb3+qySaY1mjOp5uuSDgeQVA5cQHaXv5mZFZo2Daqqsoea2TrNadF8GziX7M76\nhWQjvM4tZlBmZp1OdTWce272KIDevfOOpkMp6g2bHZ2v0ZhZm/mv/4KnnoJ77938tp1cmw9vlvRR\n4HxgD9Z3tUVEnNDaIDsKJxozaxN//zsccgjMmAGDBm1++06upYmmOddo7gV+B9wH1Kc6/3U2MwOI\ngPPPh4suKokk0xrNSTRrIuKaokdiZtYZTZyYtWjuuSfvSDqs5nSdnQnsRTZXWHVDfUQ8XdzQis9d\nZ2a2Rd5/H4YMgXHj4DOfyTuadlOMrrP9yCay/Azru85I783MStfYsfDpT5dUkmmN5rRoXgH2jYi1\n7RNS+3GLxsxabc4cOOoomD0bdt4572jaVTFmBpgN9G99SGZmXUxEds/MpZeWXJJpjeZ0nfUH5kp6\nkvXXaLrE8GYzs1a57TZYvhy+/e28I+kUmtN1VtFUfURUFiGeduWuMzNrsWXLsgEA99wDw4fnHU0u\nivY8mq7IicbMWuz886GmBq6/Pu9IctNmo84kPRYRh0taxYdv0IyI2Ka1QZqZdUozZsBdd2UDAazZ\n3KIp4fM3sxaoq4PDDoPRo+ErX8k7mlwV4wmbE5pTZ2bWpf3ud9CzJ5x1Vt6RdDrNGXW2f+EbSWXA\nsOKEY2bWAb39NvzoR/Dww9CtWc+LtAIb/cYkfV/SSuAASSsbFuBtYFK7RWhmlreLLoIzz4QDDsg7\nkk6pOcObfxoRF7dTPO3K12jMbLPuugt+8AN4+mno0yfvaDqEogxvljQAGERBV1tE/LlVEXYgTjRm\ntkkLFsCwYXD//XDwwXlH02EUYzDA5cBjwA+BfytYmhPM7yUtljS7oG6MpAWSnknLcQXrLpH0kqS5\nkkYU1A+TNDutu7qgvqekO1L9dEmDCtaNkjQvLb56Z2YtU18Po0bBBRc4yWyh5nSdzQMOiIjqTW7Y\n9L6fBlYBt0TEAanuUmBlRFzVaNshwG3AwcAA4CFgcESEpCrgvIiokjQZuCYipkgaDewfEaMlnQac\nHBEjJW0HPMn6QQszgGERsazRZ7pFY2ZN+/nPs2fNVFZC9+55R9OhFGNSzVeA8tYEExF/AZY2saqp\nAE8Ebo+ImoiYD7wMDJe0C9A3IqrSdrcAJ6XyC
cD4VL4bODqVPwdMjYhlKblMA45tzTmYWQmaOROu\nuAImTHCSaQPNGd68Gpgp6WE2nFTzgi343PNTd9ZTwIUpGewKTC/YZgFZy6YmlRssTPWk1zdSQLWS\nlkvaPh1rQRPHMjPbtNWr4Utfgquugj32yDuaLqE5iWYSHx7OvCX9TdcB/5nKY4ErgbO34HhbZMyY\nMevKFRUVVFRU5BWKmXUE3/sefPzjcMYZeUfSYVRWVlJZWdnq/TebaCJiXKuP3vTx3m4oS/odcF96\nuxAYWLDpbmQtkYWp3Li+YZ/dgTfTjaTbRsQSSQuBioJ9BgKPNBVPYaIxsxI3eTJMmgTPPgtq9iWI\nLq/xP8Ivu+yyFu3fnFFnrzax/L3Fka4/3i4Fb08me7AaZK2mkZLKJe0JDAaqIuItYIWk4ZJE9ljp\niQX7jErlU4CHU3kqMEJSP0n9gWOAB1sbs5mVgLffhq9/HW65Bfr1yzuaLqU5XWeF4/p6kf1B3745\nB5d0O3AUsIOkN4BLgQpJQ8m6314FzgGIiDmS7gTmALXA6IIhYaOBcUBvYHJETEn1NwETJL0ELAFG\npmO9J2ks2cgzgMsajzgzM1snAs4+O5vH7Kij8o6my2nV7M2Sno6ITxQhnnbl4c1mBmTPlrnxRnji\nCShv1SDbktJmz6MpOOAw1l/87wZ8EvB4PzPrGubOzSbM/MtfnGSKpDldZ1eyPtHUAvOBU4sVkJlZ\nu1m7NhtdNnYs7LNP3tF0WS3uOksX5E+NiDuKE1L7cdeZWYm7+OLsaZkTJ3qUWQu05aOc+5BdqN8L\neA64nuzu/f8mu2u/0ycaMythlZXZCLOZM51kimyjLRpJfwRWkN2tfwzZvShrgAsiYma7RVhEbtGY\nlailS2HoULjuOjj++Lyj6XTa7DEBkmZFxIGp3B1YBAyKiNVtEmkH4ERjVoIisilmtt8efv3rvKPp\nlNpy1FldQyEi6iQt7EpJxsxK1K23Znf+z5iRdyQlY1Mtmjrgg4Kq3mQTbEI2qeY2RY6t6NyiMSsx\n8+bB4YfD1Klw0EF5R9NptVmLJiJ8r4yZdR1Ll8IXvgA//rGTTDtr1cwAXYVbNGYlorY2u+i/775w\n9dWb3942qRgPPjMz69wuvDAbwnzllXlHUpKaMzOAmVnn9dvfwpQp8Le/QZn/5OXBXWclfP5mXd6f\n/gSnnprNY7b33nlH02W468zMDODvf4fTToM//MFJJmdONGbW9axYASecAD/8IRxzTN7RlDx3nZXw\n+Zt1SXV1cNJJMGBANsWM5zFrc+46M7PSdsklsGoV/OpXTjIdhIdgmFnXMX483H03VFVBjx55R2OJ\nu85K+PzNupTHH4cTT8ym/99vv7yj6dLcdWZmpef11+GUU2DcOCeZDsiJxsw6t1WrshFmF14In/98\n3tFYE9x1VsLnb9bp1dfDv/wLbLMN/P73vvjfTtryeTRmZh3bmDHw1ltw221OMh1YUbvOJP1e0mJJ\nswvqtpM0TdI8SVMl9StYd4mklyTNlTSioH6YpNlp3dUF9T0l3ZHqp0saVLBuVPqMeZLOKuZ5mlkO\n/vAHuOUW+OMfoWfPvKOxTSj2NZqbgWMb1V0MTIuIvYGH03skDQFOA4akfa6V1v0T5Trg7IgYDAyW\n1HDMs4Elqf4XwOXpWNsB/wEckpZLCxOamXVyd90FF10E998PO+2UdzS2GUVNNBHxF2Bpo+oTgPGp\nPB44KZVPBG6PiJqImA+8DAyXtAvQNyKq0na3FOxTeKy7gaNT+XPA1IhYFhHLgGl8OOGZWWd0771w\n3nnZjMz77593NNYMeYw62ykiFqfyYqDhnyO7AgsKtlsADGiifmGqJ72+ARARtcBySdtv4lhm1pnd\nfz9885sweTIMHZp3NNZMuQ4GiIiQlOuwrzFjxqwrV1RUUFFRkVssZrYJU6bA174G990Hw4blHU1J\nqayspLKys
tX755FoFkvaOSLeSt1ib6f6hcDAgu12I2uJLEzlxvUN++wOvCmpDNg2IpZIWghUFOwz\nEHikqWAKE42ZdVAPPQRnnZV1mw0fnnc0JafxP8Ivu+yyFu2fR9fZJGBUKo8C7i2oHympXNKewGCg\nKiLeAlZIGp4GB5wJTGziWKeQDS4AmAqMkNRPUn/gGODBYp6UmRVJZSWcfjr87//Cpz6VdzTWCkVt\n0Ui6HTgK2EHSG2QjwX4K3CnpbGA+cCpARMyRdCcwB6gFRhfcTTkaGAf0BiZHxJRUfxMwQdJLwBJg\nZDrWe5LGAk+m7S5LgwLMrDP561+zGzLvuAOOPDLvaKyVPDNACZ+/WYc2fXo2tcwf/gAjRmx+e2s3\nnlTTzDq/p57Kksy4cU4yXYATjZl1LM88k02OedNNcPzxeUdjbcCJxsw6jlmz4Ljjskcwf+ELeUdj\nbcSJxsw6hjlz4Nhj4eqr4YtfzDsaa0NONGaWvxdfhGOOgZ/9DE47Le9orI050ZhZvmbMgM9+Fv77\nv+GMM/KOxorAicbM8jNxYtZd9pvfwFe+knc0ViR+8JmZtb8I+MUv4Mor4YEH4JOfzDsiKyInGjNr\nX7W1cP752V3/TzwBu++ed0RWZE40ZtZ+VqyAU0/Nyo89Bttsk2881i58jcbM2sfrr8MRR8Cee2bP\nlXGSKRlONGZWfE89BYcdll3wv/ZaKHNnSinxf20zK6577smeinnjjXDSSZvf3rocJxozK44IuOqq\nbHTZlCl+KmYJc6Ixs7ZXWwvnnQePP54tHllW0pxozKxtNYwsk7IhzL7oX/I8GMDM2s4rr8Dhh8Ne\ne8F99znJGOBEY2ZtISJ7SNmhh8K3vgW//rVHltk6/iWY2ZZ5770subzwAjzyCBxwQN4RWQfjFo2Z\ntd6jj8LQobDrrvDkk04y1iS3aMys5dauhR/+EG69FX7/e/jc5/KOyDowJxoza5m5c+FLX4KBA2Hm\nTNhxx7wjsg7OXWdm1jwRcP312Xxl55wD997rJGPNkluikTRf0ixJz0iqSnXbSZomaZ6kqZL6FWx/\niaSXJM2VNKKgfpik2Wnd1QX1PSXdkeqnSxrUvmdo1oW88w6ceGI2jcxf/5olGinvqKyTyLNFE0BF\nRBwUEYekuouBaRGxN/Bweo+kIcBpwBDgWOBaad2v/Drg7IgYDAyWdGyqPxtYkup/AVzeHidl1uVM\nmZJd8B8yJHt+zD775B2RdTJ5d501/ifRCcD4VB4PNMzAdyJwe0TURMR84GVguKRdgL4RUZW2u6Vg\nn8Jj3Q0c3fbhm3Vhq1fDd76TTYh5663w059CeXneUVknlHeL5iFJT0n6RqrbKSIWp/JiYKdU3hVY\nULDvAmBAE/ULUz3p9Q2AiKgFlkvars3PwqwreuCBrBWzaBE8+yxUVOQdkXVieY46OzwiFknaEZgm\naW7hyogISVHsIMaMGbOuXFFRQYX/h7JSNm8efPe78OKL2azLn/+8r8UYlZWVVFZWtnp/RRT9b/nm\ng5AuBVYB3yC7bvNW6hZ7NCL2kXQxQET8NG0/BbgUeC1ts2+qPx04MiK+nbYZExHTJZUBiyJix0af\nGx3h/M1yt2IFjB0LN98M3/te1mXWs2feUVkHJYmIaPa/QHLpOpO0laS+qbw1MAKYDUwCRqXNRgH3\npvIkYKSkckl7AoOBqoh4C1ghaXgaHHAmMLFgn4ZjnUI2uMDMCtXXZ8lln33g3XfhueeyROMkY20o\nr66znYB70sCxMuDWiJgq6SngTklnA/OBUwEiYo6kO4E5QC0wuqApMhoYB/QGJkfElFR/EzBB0kvA\nEmBke5yYWacxfTpccAF065bdE3PIIZvfx6wVOkTXWV7cdWYl6c034eKL4eGH4Sc/gS9/OUs2Zs3U\nKbrOzCwH1dXZEOUDD4QBA7KpZM46y0nGis5znZl1dREwaRJceCHst1/WZfY
P/5B3VFZCnGjMuqq6\nOrj77qzLZ6QaAAAOHUlEQVQVU1MD114LI0Zsfj+zNuZEY9bVrFkDt9wCP/tZNunlZZdl98O4i8xy\n4kRj1lWsWJHNrvzLX8JBB2XPiTniCN9wablzojHr7BYvhmuugRtuyB5A9sAD8PGP5x2V2TpuS5t1\nVq++CueeC/vuC0uXQlVVNvmlk4x1ME40Zp3NrFlwxhlw8MGw7bbwwgvZhf6PfjTvyMya5K4zs85g\n1apsBNn48dn9L//6r1ly2XbbvCMz2yzPDFDC528dXH09/OlPWXKZODG7sD9qFHzhC56LzHLV0pkB\nnGhK+Pytg3rllSy53HILbLMNfOUrWVfZTjttdlez9tDSROOuM7OOYMUKuPPOLMG8+CJ86Utwzz3Z\nw8c8PNk6ObdoSvj8LWd1dfDIIzBuHPzf/8FnP5t1jR13nB+ZbB2au85awInG2t3KlfDQQ1limTwZ\ndt01Sy6nnw477JB3dGbN4kTTAk40VnQRWVdYQ2KpqoLDDsumhDn+eBg8OO8IzVrMiaYFnGisKFav\nhkcfzRLL5MnZhJbHH58ll89+Fvr0yTtCsy3iwQBmeXj11fWJ5c9/zuYaO/74bFjy/vv7gr6VNLdo\nSvj8rZWqq+GZZ+CJJ7JnuzzxRDZj8nHHZcllxAjo3z/vKM0AiAhWVK/g3Q/e3WB554N3EOLCT13Y\n4mO6RWPW1t54Y31CeeKJbAqYvfeGQw/NusPGjs2utbjVYkVWH/UsX7Oc91a/x5LVS7LXD5awZPUS\nlnywJEsiq9/lnfff2SCp9CrrxQ5b7bBu2XHrHdmh9w4M6jeoXeJ2i6aEz9+asHr1h1sra9dmSeWw\nw7Llk5/0dRbbItW11Sxds5Slq5eydM1Slq1Ztq68dPXSDRNJQUJZtmYZfcr7sP1W27Nd7+3YvveG\nrztuvSM7brXjBkllh612oGdZ284k4cEALeBEU8JWrcrmDJszZ/3ywguwYEH2uOOGxHLoodlklW6t\nWIHq2mqWVy9n2ZplLF+TXquXb1BetmbZuqUwqSxdvZSa+hr69+pP/979N3xN5Q2SyFbbryv3792f\nsm75d0Q50bSAE00JWLYsSyAvvLBhUnn7bfjYx2DIkA2Xj34UevTIO2orgohgTe0aVq1dxYrqFR9a\nVq5d2WR9w1KYSOqjnm17bcu2PbelX69+G5YL6vr16rcueRSWt+6xNerE/3hxoikg6Vjgl0B34HcR\ncXmj9U40nVlElkhefx1eey17LVzmz89ukNx33yyJNLwOGQJ77AHdu+d9BrYR9VHP+2vf5/2a99e9\nrlq7qsm6ldUrs9e1jV4L6hvK3bt1p295X7bpuU2Ty8bW9e3Zd4ME0rusd6dOFFvKiSaR1B14EfhH\nYCHwJHB6RLxQsI0TTVJZWUlFRUXeYaxXXQ3vvgvvvJO9Llr04UTy+utZsth996aXQYNgwADo1rLH\nLnW47yJHjb+L2vpaVtesZnXtalbXrGZN7Zp15cavH9R80PRS23R9YRJZU7uG3j1606e8D1v32Jqt\ny7de97qurkdW7tuzb/Za3neT7/uU96G8e+un9vHvYj2POlvvEODliJgPIOl/gBOBFza1U6kq2v9E\ntbVZq2LFivWvS5asTyINiaRxefXqbEqWHXfMlp12yhLH0KFwwgnrk0kRnsfSkf6gRAQ19TWsrVvL\n2rq11NTVUF1Xzdq6tVTXVreovKZ2DdW11aypXbNuqa6rbrqctlt0/yJ6PtlzXUKpj3p6l/Wmd4/e\n9Crrta7cuG7r8q3Zqmwrtuqxftl+q+03eF+49C5LSSUllN49etNNHeu5jB3pd9HZdOVEMwB4o+D9\nAmB4TrF0HBHZneq1tRu+Ll+eTZWyZs36ZfXqDd83rvvggyx5NE4kha/V1dC3b7Zss032uv32GyaR\nffZZX071sc02hLIulPqop66+LnuNuoJ
yNXUrF62rb9iucbmuvo7a+tomy3WR3heUZy+ezYRnJ1Bb\nX9vspaa+hpq6mnWvtVG7wfvC19r6Ddc1JJG1dWs3SCpr69ZSW19Lj249KO9eTo/u2Wt593J6du+Z\nvZb13Gi5YbuGul5lvehV1ottem5Dz7Ke69737F5QblR/w4IbuOTbl2QJpEdvenTrUdJdRsXQ0KnS\n1OuW1DVnXbdu0K9f255PU7pyomlWn9jUj7bsGR8f+l+sia63xtsobSNAAWLD9w2hNqxTrD9O9wgU\nQbeAbhHZAnSrT+WAbmRlBXSvD8oiKKvPlh719XSvD3qk990D1nYTtRsssLy2npfHX8ea7t1YUybW\nlHWjukxZuXt6bWJZ0VOsKhcrysXKXcSKPWFluVhZXsaKnv34oAdEtwBqCN4BvU3wElBPqB5W1RPv\n18Hr9QT1oIbXgBCiG0Q3oBuK7ii6Ad3Xl6M7orCcbddQJrqjKEvbZ2VSuXAddEf1WfmDJ1/kL/VT\nIcqy9fXpNXpk+697X4aiZ0FdD1TfY305eqC6Huv2y9al9/VlWbm+nLIop0d9OVvX9UBRjurLoa4c\npW1p1ENR+JOrD1hNtjT+KRa+b6rcnPXvvtuPB+7ZqdX7t3T9pupas01rtt3YH+eaGrjiii3/zE31\n1jfk8MLXLanb3LpBg2DmzI3H01a68jWaQ4ExEXFsen8JUF84IEBS1zx5M7Mi82AAQFIZ2WCAo4E3\ngSoaDQYwM7Pi67JdZxFRK+k84EGy4c03OcmYmbW/LtuiMTOzjqFjjR9sJ5KOlTRX0kuS/j3vePIm\nab6kWZKekVSVdzztRdLvJS2WNLugbjtJ0yTNkzRVUjuMycnfRr6LMZIWpN/FM+kG6C5P0kBJj0p6\nXtJzki5I9SX329jEd9Gi30bJtWiacyNnqZH0KjAsIt7LO5b2JOnTwCrglog4INVdAbwbEVekf4T0\nj4iL84yzPWzku7gUWBkRV+UaXDuTtDOwc0TMlNQHmAGcBHyVEvttbOK7OJUW/DZKsUWz7kbOiKgB\nGm7kLHUld3NERPwFWNqo+gRgfCqPJ/ufqsvbyHcBpfm7eCsiZqbyKrKbvAdQgr+NTXwX0ILfRikm\nmqZu5BywkW1LRQAPSXpK0jfyDiZnO0XE4lReDLTsRquu53xJz0q6qRS6ihqTtAdwEPA3Svy3UfBd\nTE9Vzf5tlGKiKa2+wuY5PCIOAo4Dzk3dKCUvTYRXyr+X64A9gaHAIuDKfMNpX6mr6G7gOxGxsnBd\nqf020nfxv2TfxSpa+NsoxUSzEBhY8H4gWaumZEXEovT6DnAPWfdiqVqc+qWRtAvwds7x5CYi3o4E\n+B0l9LuQ1IMsyUyIiHtTdUn+Ngq+iz80fBct/W2UYqJ5ChgsaQ9J5cBpwKScY8qNpK0k9U3lrYER\nwOxN79WlTQJGpfIo4N5NbNulpT+mDU6mRH4XyiZzuwmYExG/LFhVcr+NjX0XLf1tlNyoMwBJx7H+\nOTU3RcRPcg4pN5L2JGvFQHYD762l8n1Iuh04CtiBrM/9P4CJwJ3A7sB84NSIWJZXjO2lie/iUqCC\nrGskgFeBcwquUXRZko4A/gzMYn332CVks4uU1G9jI9/F94HTacFvoyQTjZmZtZ9S7DozM7N25ERj\nZmZF5URjZmZF5URjZmZF5URjZmZF5URjZmZF5URjJUXSD9J058+m6c079d3uksZJ+uciHv8oSYe1\n1+dZ19Rln7Bp1lj6g/l54KCIqJG0HdAz57C2VLHn3PoMsBJ4ouDzzFrELRorJTuTPU+kBiAi3muY\n503SMEmVaQbrKQVzWg1LrZ+Zkn7W8GAwSV+R9KuGA0u6X9JRqTxC0uOSZki6M03t0/CAuTGpfpak\nj6X6PpJuTnXPSvripo7ThA2ma5fUPcValY73zVRfkc7xLkkvSPpDwT7Hp7qnJF0j6T5Jg4BzgP8n\n6el
0lzjAkZIek/SKWzfWHE40VkqmAgMlvSjpN5KOhHWTBv4K+OeI+CRwM/DfaZ+bgXMjomG6jY39\niz6AkLQD8APg6IgYRvagqO8WbPNOqr8OuCjV/whYGhEHRsTHgUc2c5zNORtYFhGHkE12+I00xTtk\n04Z8BxgCfFTSpyT1Aq4Hjk3nvwPZBMWvpfqrIuITEfFXsqS2c0QcDvwT8NNmxmQlzF1nVjIi4n1J\nw4BPk3UJ3SHpYrI/4vuRPZMHsjnw3pS0LbBt+gMLMIHsUQobI+BQsj/ij6djlQOPF2zzx/T6NPDF\nVD6abHLXhjiXSfqnzRxnU0YAB0g6Jb3fBvgHoAaoiog3ASTNJJvq/QPg7ymxANwOfLPRea0LjzSZ\nZES8IKmknslireNEYyUlIuqBPwF/St1go8gSzfMR8anCbZt4mFPhH9xaNuwR6FVQnhYRX9pICNXp\ntY4N//9r6mmFmzpOoaZaWedFxLTCCkkVBZ9fGEPj/Tf35MS1LdjWzF1nVjok7S1pcEHVQWSz8L4I\n7Cjp0LRdD0lD0sy8yyQdnrY/o2Df+cBQZQaSdVEF2dMHD5e0VzrW1o0+synTgHML4uzXwuM0/mP/\nIDBaUlnBeW+1kX0jnf9H0zUZyFpXDclnJdB3M/GbbZITjZWSPsA4Sc9LehbYBxiTBgecAlyeupOe\nARqG9H4V+I2kZwoPlLrTXgXmAFeTtYqIiHeBrwC3p894HPhYE7EUXu/5L6C/pNnp8ytacByAGyS9\nkZbHyB5ENQd4OrXarmN9y+VDrZ+IWAOMBqZIegpYkRaA+4CTGw0GKDyGR6HZZvkxAWbNlP7Ff39E\nHJB3LG1N0tYR8X4q/waYFxFX5xyWdRFu0Zg1n+i6/4L/RrqB9XmywQM35B2QdR1u0ZiZWVG5RWNm\nZkXlRGNmZkXlRGNmZkXlRGNmZkXlRGNmZkXlRGNmZkX1/wFxy3AcuLeBcwAAAABJRU5ErkJggg==\n", "text": [ "" ] } ], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We clearly have a problem here, and that is that **the runtime for multiple sequence alignment using full dynamic programming algoriths grows exponentially with the number of sequences to be aligned**. If $n$ is our sequence length, and $s$ is the number of sequences, that means that runtime is proportional to $n^s$. In pairwise alignment, $s$ is always equal to 2, so the problem is more manangeable. However, **for the general case of $s$ sequences, we really can't even consider Smith-Waterman or Needleman-Wunsch for more than just a few sequences.** The pattern in the plots above should illustrate why. \n", "\n", "As we explored with database searching, we need to figure out how to align fewer sequences. This is where *progressive alignment* comes in." 
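, "\n", "\n", "As a rough worked illustration of the $n^s$ scaling described above, assume (purely for the sake of example) sequences of length $n = 100$ and count the cells in the dynamic programming matrix as $s$ grows:\n", "\n", "$$\n", "\\begin{align}\n", "& s = 2: \\quad n^s = 100^2 = 10^4\\\\\n", "& s = 3: \\quad n^s = 100^3 = 10^6\\\\\n", "& s = 4: \\quad n^s = 100^4 = 10^8\\\\\n", "& s = 5: \\quad n^s = 100^5 = 10^{10}\\\\\n", "\\end{align}\n", "$$\n", "\n", "Under this assumption, each additional sequence multiplies the number of cells by $n$ (here, by 100). These are counts of matrix cells, not measured runtimes, so treat them only as a sketch of how quickly the problem grows."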
] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Progressive alignment" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**In progressive alignment, the problem of exponential growth of runtime and space is managed by selectively aligning pairs of sequences, and aligning alignments of sequences.** What we typically do is identify a pair of closely related sequences, and align those. Then, we identify the next most closely related sequence to that initial pair, and align that sequence to the alignment. This concept of aligning a sequence to an alignment is new, and we'll come back to it in just a few minutes. The other concept of identifying the most closely related sequences, and then the next most closely related sequence, and so on should sound familiar. It effectively means that we're traversing a tree. And herein lies our problem: **we need a tree to efficiently align multiple sequences, but we need an alignment to build a good tree**. \n", "\n", "You probably have two burning questions in your mind right now:\n", "\n", "1. How do we build a tree to guide the alignment process, if we need an alignment to build a good tree?\n", "2. How do we align a sequence to an alignment, or an alignment to an alignment?\n", "\n", "We'll explore both of those throughout the rest of this notebook. First, let's cover the process of progressive multiple sequence alignment, just assuming for a moment that we know how to do both of those things.\n", "\n", "The process of progressive multiple sequence alignment could look like the following. First, we start with some sequences and a tree representing the relationship between those sequences. We'll call this our **guide tree**, because it's going to guide us through the process of multiple sequence alignment. 
In progressive multiple sequence alignment, we build a multiple sequence alignment for each internal node of the tree, where the alignment at a given internal node contains all of the sequences in the clade defined by that node.\n", "\n", "\n", "\n", "Starting from the root node, descend the bottom branch of the tree until you get to an internal node. If an alignment hasn't been constructed for that node yet, continue descending the tree until you get to a pair of nodes. In this case, we follow the two branches to the tips. We then align the sequences at that pair of tips (usually with Needleman-Wunsch, for multiple sequence alignment), and assign that alignment to the node connecting those tips.\n", "\n", "\n", "\n", "Next, we want to find what to align the resulting alignment to, so start from the root node and descend the top branch of the tree. When you get to the next node, determine if an alignment has already been created for that node. If not, our job is to build that alignment so we have something to align against. In this case, that means that we need to align `s1`, `s2`, and `s3`. We can achieve this by aligning `s1` and `s3` first, to get the alignment at the internal node connecting them.\n", "\n", "\n", "\n", "We can next align the alignment of `s1` and `s3` with `s2`, to get the alignment at the internal node connecting those clades.\n", "\n", "\n", "\n", "And finally, we can compute the alignment at the root node of the tree, by aligning the alignment of `s1`, `s2`, and `s3` with the alignment of `s4` and `s5`.\n", "\n", "\n", "\n", "**The alignment at the root node is our multiple sequence alignment.**\n" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Building the guide tree" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's address the first of our outstanding questions. **I mentioned above that *we need an alignment to build a good tree*. The key word here is *good*. 
We can build a very rough tree - one that we would never want to present as representing the actual relationships between the sequences in question - without first aligning the sequences.** Remember that building a UPGMA tree requires only a distance matrix, so if we can find a non-alignment-dependent way to compute distances between the sequences, we can build a rough UPGMA tree from them.\n", "\n", "Let's compute distances between the sequences based on their *word* composition. We'll define a *word* here as `k` adjacent characters in the sequence. We can then define a function that will return all of the words in a sequence as follows. These words can be defined as being overlapping, or non-overlapping. We'll go with overlapping for this example, as the more words we have, the better our guide tree should be." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from skbio import BiologicalSequence\n", "%psource BiologicalSequence.k_words" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ " \u001b[0;32mdef\u001b[0m \u001b[0mk_words\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mk\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0moverlapping\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;34m\"\"\"Get the list of words of length k\u001b[0m\n", "\u001b[0;34m\u001b[0m\n", "\u001b[0;34m Parameters\u001b[0m\n", "\u001b[0;34m ----------\u001b[0m\n", "\u001b[0;34m k : int\u001b[0m\n", "\u001b[0;34m The word length.\u001b[0m\n", "\u001b[0;34m overlapping : bool, optional\u001b[0m\n", "\u001b[0;34m Defines whether the k-words should be overlapping or not\u001b[0m\n", "\u001b[0;34m overlapping.\u001b[0m\n", "\u001b[0;34m\u001b[0m\n", "\u001b[0;34m Returns\u001b[0m\n", "\u001b[0;34m -------\u001b[0m\n", "\u001b[0;34m iterator of BiologicalSequences\u001b[0m\n", "\u001b[0;34m 
Iterator of words of length `k` contained in the\u001b[0m\n", "\u001b[0;34m BiologicalSequence.\u001b[0m\n", "\u001b[0;34m\u001b[0m\n", "\u001b[0;34m Raises\u001b[0m\n", "\u001b[0;34m ------\u001b[0m\n", "\u001b[0;34m ValueError\u001b[0m\n", "\u001b[0;34m If k < 1.\u001b[0m\n", "\u001b[0;34m\u001b[0m\n", "\u001b[0;34m Examples\u001b[0m\n", "\u001b[0;34m --------\u001b[0m\n", "\u001b[0;34m >>> from skbio.sequence import BiologicalSequence\u001b[0m\n", "\u001b[0;34m >>> s = BiologicalSequence('ACACGACGTT')\u001b[0m\n", "\u001b[0;34m >>> [str(kw) for kw in s.k_words(4, overlapping=False)]\u001b[0m\n", "\u001b[0;34m ['ACAC', 'GACG']\u001b[0m\n", "\u001b[0;34m >>> [str(kw) for kw in s.k_words(3, overlapping=True)]\u001b[0m\n", "\u001b[0;34m ['ACA', 'CAC', 'ACG', 'CGA', 'GAC', 'ACG', 'CGT', 'GTT']\u001b[0m\n", "\u001b[0;34m\u001b[0m\n", "\u001b[0;34m \"\"\"\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mk\u001b[0m \u001b[0;34m<\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"k must be greater than 0.\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0msequence_length\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0moverlapping\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mstep\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mstep\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mk\u001b[0m\u001b[0;34m\u001b[0m\n", 
"\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mi\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mrange\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msequence_length\u001b[0m \u001b[0;34m-\u001b[0m \u001b[0mk\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstep\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32myield\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mi\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0mi\u001b[0m\u001b[0;34m+\u001b[0m\u001b[0mk\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\n" ] } ], "prompt_number": 6 }, { "cell_type": "code", "collapsed": false, "input": [ "for e in BiologicalSequence(\"ACCGGTGACCAGTTGACCAGTA\").k_words(3):\n", " print(e)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "ACC\n", "CCG\n", "CGG\n", "GGT\n", "GTG\n", "TGA\n", "GAC\n", "ACC\n", "CCA\n", "CAG\n", "AGT\n", "GTT\n", "TTG\n", "TGA\n", "GAC\n", "ACC\n", "CCA\n", "CAG\n", "AGT\n", "GTA\n" ] } ], "prompt_number": 7 }, { "cell_type": "code", "collapsed": false, "input": [ "for e in BiologicalSequence(\"ACCGGTGACCAGTTGACCAGTA\").k_words(7):\n", " print(e)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "ACCGGTG\n", "CCGGTGA\n", "CGGTGAC\n", "GGTGACC\n", "GTGACCA\n", "TGACCAG\n", "GACCAGT\n", "ACCAGTT\n", "CCAGTTG\n", "CAGTTGA\n", "AGTTGAC\n", "GTTGACC\n", "TTGACCA\n", "TGACCAG\n", "GACCAGT\n", "ACCAGTA\n" ] } ], "prompt_number": 8 }, { "cell_type": "code", "collapsed": false, "input": [ "for e in BiologicalSequence(\"ACCGGTGACCAGTTGACCAGTA\").k_words(3, overlapping=False):\n", " print(e)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "ACC\n", "GGT\n", "GAC\n", 
"CAG\n", "TTG\n", "ACC\n", "AGT\n" ] } ], "prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we then have two sequences, we can compute the word counts for each and define a distance between the sequences as the fraction of words that are unique to either sequence. " ] }, { "cell_type": "code", "collapsed": false, "input": [ "from iab.algorithms import kmer_distance\n", "%psource kmer_distance" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\u001b[0;32mdef\u001b[0m \u001b[0mkmer_distance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msequence1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msequence2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mk\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m3\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0moverlapping\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;34m\"\"\"Compute the kmer distance between a pair of sequences\u001b[0m\n", "\u001b[0;34m\u001b[0m\n", "\u001b[0;34m Parameters\u001b[0m\n", "\u001b[0;34m ----------\u001b[0m\n", "\u001b[0;34m sequence1 : BiologicalSequence\u001b[0m\n", "\u001b[0;34m sequence2 : BiologicalSequence\u001b[0m\n", "\u001b[0;34m k : int, optional\u001b[0m\n", "\u001b[0;34m The word length.\u001b[0m\n", "\u001b[0;34m overlapping : bool, optional\u001b[0m\n", "\u001b[0;34m Defines whether the k-words should be overlapping or not\u001b[0m\n", "\u001b[0;34m overlapping.\u001b[0m\n", "\u001b[0;34m\u001b[0m\n", "\u001b[0;34m Returns\u001b[0m\n", "\u001b[0;34m -------\u001b[0m\n", "\u001b[0;34m float\u001b[0m\n", "\u001b[0;34m Fraction of the set of k-mers from both sequence1 and\u001b[0m\n", "\u001b[0;34m sequence2 that are unique to either sequence1 or\u001b[0m\n", "\u001b[0;34m sequence2.\u001b[0m\n", "\u001b[0;34m\u001b[0m\n", "\u001b[0;34m Raises\u001b[0m\n", "\u001b[0;34m ------\u001b[0m\n", "\u001b[0;34m 
ValueError\u001b[0m\n", "\u001b[0;34m If k < 1.\u001b[0m\n", "\u001b[0;34m\u001b[0m\n", "\u001b[0;34m Notes\u001b[0m\n", "\u001b[0;34m -----\u001b[0m\n", "\u001b[0;34m k-mer counts are not incorporated in this distance metric.\u001b[0m\n", "\u001b[0;34m\u001b[0m\n", "\u001b[0;34m \"\"\"\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0msequence1_kmers\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mset\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msequence1\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mk_words\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mk\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0moverlapping\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0msequence2_kmers\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mset\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msequence2\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mk_words\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mk\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0moverlapping\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mall_kmers\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msequence1_kmers\u001b[0m \u001b[0;34m|\u001b[0m \u001b[0msequence2_kmers\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mshared_kmers\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msequence1_kmers\u001b[0m \u001b[0;34m&\u001b[0m \u001b[0msequence2_kmers\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mnumber_unique\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mall_kmers\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m-\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mshared_kmers\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mfraction_unique\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnumber_unique\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mall_kmers\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", 
"\u001b[0;34m\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mfraction_unique\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\n" ] } ], "prompt_number": 10 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can then use this as a distance function..." ] }, { "cell_type": "code", "collapsed": false, "input": [ "s1 = BiologicalSequence(\"ACCGGTGACCAGTTGACCAGT\")\n", "s2 = BiologicalSequence(\"ATCGGTACCGGTAGAAGT\")\n", "s3 = BiologicalSequence(\"GGTACCAAATAGAA\")\n", "\n", "print(s1.distance(s2, kmer_distance))\n", "print(s1.distance(s3, kmer_distance))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "0.75\n", "0.857142857143\n" ] } ], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we wanted to override the default to create (for example) a 5-mer distance function, we could use ``functools.partial``." ] }, { "cell_type": "code", "collapsed": false, "input": [ "fivemer_distance = partial(kmer_distance, k=5)\n", "\n", "s1 = BiologicalSequence(\"ACCGGTGACCAGTTGACCAGT\")\n", "s2 = BiologicalSequence(\"ATCGGTACCGGTAGAAGT\")\n", "s3 = BiologicalSequence(\"GGTACCAAATAGAA\")\n", "\n", "print(s1.distance(s2, fivemer_distance))\n", "print(s1.distance(s3, fivemer_distance))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "0.916666666667\n", "1.0\n" ] } ], "prompt_number": 12 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now apply one of these functions to build a distance matrix for a set of sequences that we want to align." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "from skbio import SequenceCollection\n", "query_sequences = SequenceCollection(\n", " [BiologicalSequence(\"ACCGGTGACCAGTTGACCAGT\", \"s1\"),\n", " BiologicalSequence(\"ATCGGTACCGGTAGAAGT\", \"s2\"),\n", " BiologicalSequence(\"GGTACCAAATAGAA\", \"s3\"),\n", " BiologicalSequence(\"GGCACCAAACAGAA\", \"s4\"),\n", " BiologicalSequence(\"GGCCCACTGAT\", \"s5\")])" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 13 }, { "cell_type": "code", "collapsed": false, "input": [ "guide_dm = query_sequences.distances(kmer_distance)\n", "print(guide_dm)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "5x5 distance matrix\n", "IDs:\n", "'s1', 's2', 's3', 's4', 's5'\n", "Data:\n", "[[ 0. 0.75 0.85714286 0.85714286 0.89473684]\n", " [ 0.75 0. 0.61111111 0.86363636 1. ]\n", " [ 0.85714286 0.61111111 0. 0.66666667 0.95 ]\n", " [ 0.85714286 0.86363636 0.66666667 0. 0.83333333]\n", " [ 0.89473684 1. 0.95 0.83333333 0. ]]\n" ] } ], "prompt_number": 14 }, { "cell_type": "markdown", "metadata": {}, "source": [ "scikit-bio also has some basic visualization functionality for these objects. For example, we can easily visualize this object as a heatmap." 
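, "\n", "\n",
"When reading the heatmap, it may help to restate `kmer_distance` as a formula. Writing $K_a$ and $K_b$ for the sets of distinct $k$-words in two sequences $a$ and $b$ (notation introduced here for illustration, not taken from scikit-bio), the function shown earlier computes:\n",
"\n",
"$$\n",
"\\begin{align}\n",
"& d(a, b) = \\frac{|K_a \\cup K_b| - |K_a \\cap K_b|}{|K_a \\cup K_b|} = 1 - \\frac{|K_a \\cap K_b|}{|K_a \\cup K_b|}\\\\\n",
"\\end{align}\n",
"$$\n",
"\n",
"As a small made-up example with $k = 3$, take $a$ = `ACGT` and $b$ = `ACGA`. The 3-words of $a$ are `ACG` and `CGT`, and those of $b$ are `ACG` and `CGA`, so the union contains three words, the intersection contains one, and $d(a, b) = (3 - 1) / 3 \\approx 0.67$. A value of 1, as for `s2` and `s5` in the matrix above, means the two sequences share no 3-mers at all."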
] }, { "cell_type": "code", "collapsed": false, "input": [ "fig = guide_dm.plot(cmap='Greens')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "display_data", "png": "iVBORw0KGgoAAAANSUhEUgAAAVcAAAEACAYAAAAHujVXAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFy1JREFUeJzt3XuwXWV5x/Hv7xxAczGEAAOBIGpALk6ASLmI3CqKkQh4\nRVHLiICtCtppHayKUoag3GSUCgwgRaalUqcViAQwBSUhGJFLIKGNEgwIhMAkEMLdhOTpH3vleLI9\ne6919t7vWXuv/D7MmuzLWu961pzwnDfPujyKCMzMrLP6yg7AzKyKnFzNzBJwcjUzS8DJ1cwsASdX\nM7MEnFzNzBJwcjWzTZqkf5X0jKRFTda5WNISSQ9KmlpkXCdXM9vUXQ1Ma/SlpKOAXSJiV+DzwGVF\nBnVyNbNNWkTcCaxqssoxwDXZuncD4yVtlzeuk6uZWXM7Ak8Mev8kMClvo81SRSPJ99WaWSERoXa2\nH26+aWF/9evn7i9ZcgV49fWXUw4PwIyzzuGMM7+ZfD8A77jo2BHZD8Bzs//AhCMnJ9/PE8tXJN/H\nYOvmP03/u7Yf0X2mNlLHtPalNcn3sZH7VsC+26bfz5WLOzPOe3cstt5ty4Y78jJgp0HvJ2WfNeWy\ngJlVQ7+KLcM3EzgBQNKBwPMR8UzeRklnrmZmI0atVRYk/QQ4DNhG0hPAmcDmABFxeUTcLOkoSY8A\nLwMnFhm355ProYcdUnYISYyavFXZISShSWPLDqHjqnhMAEwcXXYEw9Ni1TYiji+wzqnDHbf3k+vh\nh5YdQhKjJk8oO4Qk+naqXiKq4jEBsMOYsiMYnhZnrqn0fHI1MwO67gySk6uZVYNnrmZmCXRXbnVy\nNbOKaO0yq2ScXM2sGlwWMDNLoLtyq5OrmVVEX3dlVydXM6uG7sqtTq5mVhGuuZqZJeCygJlZAk6u\nZmYJdFdudXI1s4rospnrsB91IOlUSY9IWi+pmo9uMrPeo4LLUJtK0yT9Lmuf/bUhvt9K0vVZa+27\nJb0jL5xWniMzDzgC+GML25qZpSEVW/5iM/UDP6TWXntP4HhJe9St9g3g/ojYm1pXgh/khdM0uUoa\nI2mWpAckLZJ0XEQ8EBFOrGbWXfoKLn9pf+CRiHgsItYC1wH1DfP2AH4FEBG/B94iqWmDsbyZ6zRg\nWUTsExFTgFtz1jczK0eLM1eGbp1d3+3wQeAjtd1of2Bnctpr553QWghcKOlc4KaImJez/kZmnHXO\nwOtDDzuksl0DzGwYnnoZlr/S+XEbndDK31+RttznAj+QtABYBCwA1jXboGlyjYglkqYC04EZkm6P\niLMLBAIwYi2vzayH7DBm4xYy96/szLiN/h0+aUxtaby/+tbZO1GbvQ6IiBeBz214L+lRYGmzcJom\nV0kTgVURca2k1cBJ2ecbfkV017UPZrbpav3213uBXSW9BXgK+ASwUdNCSVsCr0bEGkmnAHMi4qVm\ng+aVBaYAF0haD6wBviDpNOB0YDtgoaRZEfH5Fg7IzKxzWu/++rqkU4FfAP3AVRGxWNLfZt9fTu0q\ngh9LCuAhsolmM3llgdnA7LqP7wf+ZfiHYGaWUBs3EUTELcAtdZ9dPuj1fGC34YzpO7TMrBr8VCwz\ns85TwZlrkUsDOsHJ1cwqQQVnrk6uZmbD0GVVASdXM6uGvoLZtemV/x3k5GpmlVC0LDBSnFzNrBKc\nXM3MEnByNTNLoK/LOhE4uZpZJXjmamaWgLrsOVJOrmZWCZ65mpkl0GW5taUGhWZmXadPKrQMpUD3\
n120k3Zr1E3xI0mdz42n/kMzMyiep0DLEdkW6v54KLIiIfYDDge9JavovfydXM6uEvj4VWoZQpPvr\ncmBc9noc8GxEvN4snqQ113dcVB9fb/vff7ix7BA67tiZXyo7hCRum3N/2SF03qMvlB1BV2vjhNZQ\n3V8PqFvnSuCXkp4C3gQclzeoT2iZWSU0Sq5rHlvN2sdWN9u0yFMIvwE8EBGHS5oM/I+kvbPGhUNy\ncjWzSmiUXN/w1vG84a3jB96/MueJ+lVyu78CBwHnAETEH7Lur7tRa244JNdczawSWj2hxaDur5K2\noNb9dWbdOr8D3pvtZztqibX11tpmZr2i1ZJrwe6v3wGulvQgtUnp6RHxXLNxnVzNrBLauUOrQPfX\nlcDRwxnTydXMKqG/r7uqnE6uZlYJ3Xb7q5OrmVWCH9xiZpaAHzloZpaAZ65mZgk4uZqZJdBludXJ\n1cyqoc+XYpmZdZ7LAmZmCXRZbnVyNbNq8MzVzCwBJ1czswS6Lbm2dHpN0rVZp8RFkq7Ka9RlZpZa\nGz20inR//aqkBdmySNLrksYPNdZAPC0ex79HxO4RMQUYBZzc4jhmZh2RsvtrRFwYEVMjYirwdeCO\niHi+WTy5M05JY4CfUmvi1Q+cHRE/HbTKPcCkvHHMzFJqoyww0P01G2dD99fFDdb/FPCTvEGL/HN+\nGrAsIqZnO97QXhZJmwOfAb5cYBwzs2TaKLkW6f6a7UOjgfcDX8wbtEhyXQhcKOlc4KaImDfou0uB\nORFx11AbPjf7DwOvR03eilGTJxTYnZlV2nN/glV/6viwjWauLz28kpceXtls0yLdXzc4GpiXVxKA\nAsk1IpZImgpMB2ZIuj0izpZ0JrB1RJzSaNsJR04eRsxmtkmY8IbassGjDbtTD0+D5Dp2t20Zu9u2\nA++fmfVw/SpFur9u8EkKlASgWM11IrAqIq6VtBo4SdLJwJHAEUV2YmaWWhs114Hur8BT1Lq/Hj/E\n+FsCh1KrueYqUhaYAlwgaT2whlqt4TfAY8D87ID+OyJmFNmhmVkKjS6zylOw+yvAh4BfRMSrRcYt\nUhaYDcyu+3jzwpGbmY2AlN1fs/fXANcUHdMX/5tZJXTbHVpOrmZWCV2WW51czawaPHM1M0vAydXM\nLAEnVzOzBFq9FCsVJ1czqwTPXM3MEnByNTNLwMnVzCyBLsutTq5mVg2euZqZpeDkambWef1ddilW\nqw0Kzcy6SqsNCrNtm3Z/zdY5POv++pCkO/Li8czVzCqhr8WywKDur++l1pXgHkkzI2LxoHXGA5cA\n74+IJyVtkxtPS9GYmXWZNmauA91fI2ItsKH762CfotYU4EmAiGjalAucXM2sIvoKLkMYqvvrjnXr\n7ApMkPQrSfdK+pu8eJKWBZ5YviLl8CPu2JlfKjuEjrvxmEvKDiGJw1/4bNkhdNzdT95bdghdrVFZ\n4NmHlvPsQ08327RI99fNgXdS6xs4mlqLq99ExJJGG7jmamaV0Ohk1TZTdmCbKTsMvH/kPx+oX6VI\n99cngJVZ/6xXJc0F9gYaJleXBcysEvr7+gotQxjo/ippC2rdX2fWrXMjcLCkfkmjgQOA/2sWj2eu\nZlYJrc4Ui3R/jYjfSboVWAisB66MCCdXM6u+Vi/FgsLdXy8ELiw6ppOrmVWCny1gZpZAOzPXFJxc\nzawSuiu1OrmaWUVsNvSVAKVxcjWzSnDN1cwsAddczcwS6K7U6uRqZhXhmauZWQJOrmZmCfiElplZ\nAv1OrmZmneeygJlZAk6uZmYJdFvNddj3i0m6StIDkhZKul7SlikCMzMbjjZ6aOW21s7aaq/OWmsv\nkHRGXjytzFz/PiJezHb4PeA0YEYL45iZdUyrM9cirbUzcyLimKLjNp25ShojaVY2U10k6bhBiVXA\nKCC3xayZWWp9UqFlCEVaa8MwbwLLKwtMA5ZFxD4RMQW4FUDS1
cByYC/gR8PZoZlZCm300CrSWjuA\ngyQ9KOlmSXvmxZNXFlgIXCjpXOCmiJgHEBEnSuqjNpX+JnDWUBuvm//ndraaNJa+ncbmxWNmVbfi\nVVjxWseH7WswsXz8/sd5YsHjzTYt0lr7fmCniHhF0geAG4C3N9ugaXKNiCWSpgLTgRmSbo+Is7Pv\n1ku6Dji90fb979q+QMxmtknZdlRt2WDx8x0ZtlHNded9d2bnfXceeP/rq++qXyW3tfaGcmj2+hZJ\nl0qaEBHPNYonr+Y6EXgtIq6l1phrX0mTs+8EHAMsaDaGmdlIaKPmmttaW9J2Wc5D0v6AmiVWyC8L\nTAEukLQeWAOcClwjadygoL6UM4aZWXJq8aGDRVprAx8DviDpdeAV4JN54+aVBWYDs+s+PriF+M3M\nkmrnJoK81toRcQlwyXDG9B1aZlYJvv3VzCyBfvWXHcJGnFzNrBK67dkCTq5mVgmtntBKxcnVzCrB\nNVczswRcFjAzS6Bv+E9QTcrJ1cwqwTNXM7ME+uWZq5lZx3nmamaWgK8WMDNLwNe5mpkl0NdlNdfu\nisbMrEWSCi0Ntm3a/XXQevtJel3SR/Li8czVzCqh1asFinZ/zdY7j1ovwdwahGeuZlYJKvjfEIp2\nfz0N+C9gRZF4nFzNrBLaaPOS2/1V0o7UEu5l2Ue5TQ1dFhiG2+bcX3YIHXf4C58tO4Qk7vjMj8sO\noeO2WXxY2SEk8TKPdmQcNSgL/P63D/PwPUuabVqk++v3gX+KiMh6aeWWBZxczawSGl2Ktfv+u7H7\n/rsNvJ916S31q+R2fwX2Ba7LTohtA3xA0tqImEkDTq5mVglt3EQw0P0VeIpa99fjB68QEW/b8FrS\n1cDPmyVWcHI1s4po9fbXgt1fh83J1cwqoZ0Ht+R1f637/MQiYzq5mlklNDqhVRYnVzOrhD4/W8DM\nrPP8yEEzswT8VCwzswQ8czUzS8A1VzOzBPrUX3YIG3FyNbNKcFnAzCwBn9AyM0vAM1czswR8QsvM\nLAHPXM3MElCXNVZpORpJF0t6sZPBmJm1qk99hZah5HV/lXSspAclLZB0n6T35MXT0sxV0l8B4ynW\nHsHMLLlWH5ZdsPvrbRFxY7b+FOB6YJem8eTsdIykWZIekLRI0sezQM4HTqdAHxkzs5GQsvtrRLw8\n6O1YYGVePHkz12nAsoiYDiBpHHAqcGNEPN1tBWQz23S1kY+G6v56wBDjfwj4LjARODJv0LzkuhC4\nUNK5wE3AUuBjwOEqcCTr5j/958AmjaVvp7F5m5hZxa17/AXWPd750zWNTmgtnL+IhfMfarZpofJm\nRNwA3CDpEODfgN2ard80uUbEEklTgenADOCX1OoMj2SrjJb0cES8fajt+9+1fZGYzWwT0v/mcfS/\nedzA+7V3Le/IuI3me3sftBd7H7TXwPtrv39d/SpFur8OiIg7JW0maeuIeLbRek2Tq6SJwKqIuFbS\nauCkiJg46PsXGyVWM7OR1MZNBLndXyVNBpZGREh6J0CzxAr5ZYEpwAWS1gNrgC/Ufe+rBcysKzS6\nzCpPwe6vHwVOkLQWeAn4ZN64eWWB2cDsJt+Pa/SdmdlIaucEe17314g4n9pVUoX5Di0zq4Ruu0PL\nydXMKqHVmwhScXI1s0rw81zNzBLotpuanFzNrBJavVogFSdXM6uEPp/QMjPrPJcFzMwS8AktM7ME\nPHM1M0vAM1czswScXM3MEpAvxTIz6zzXXM3MEui2skB3zaPNzFrURoPCIq21P5211l4o6S5Jew01\nzmCeuZpZJbRaFijYWnspcGhErJY0DbgCOLDZuJ65mlklJG6tPT8iVmdv7wYm5cWTdOa69qU1KYcf\neY++UHYEHXf3k/eWHUIS2yw+rOwQOm7lOXPKDiGJUeeN6cg4bdRcC7XWHuQk4Oa8QV0WMLNKaPRU\nrN/Ou5d75jWdRBTuBSjpr
4HPAe/OW9fJ1cwqoVHN9YBD9uOAQ/YbeH/Z+VfUr1KotXZ2EutKYFpE\nrMqLxzVXM6uENmquA621JW1BrbX2zI3Glt4M/Az4TEQ8UiQez1zNrBJarbkWbK39bWAr4LJshrw2\nIvZvNq6Tq5lVQuLW2icDJw9nTCdXM6uEbrtDy8nVzCrBydXMLAE/FcvMLAE/FcvMLAGXBczMEnBy\nNTNLwGUBM7MEPHM1M0vAydXMLIFGT8Uqi5OrmVWCa65mZgm4LGBmlkR3JddhFykk/VjSUkkLsiW3\nC6KZWWoquAy5bX73190lzZf0mqR/LBJPKzPXAL4aET9rYVszsyQSd399FjgN+FDRcZvOXCWNkTRL\n0gOSFkk6bsNXwwvfzCytxN1fV0TEvcDaovHklQWmAcsiYp+ImALcmn3+XUkPSrooa4tgZlaylgsD\nQ3V/3bHdaPLKAguBCyWdC9wUEfMkfT0ins6S6hXA14Czh9z6vhV/fj1xNOzQmRa6Zta75t4xl7lz\n7uz4uI3KAnfN/TW/nju/2aaFu78OK56I5uNKGg9MB04Bbo+Iswd9dxi1+uvRQ2wXnLJHh8Mt2aMv\nlB1B521ZzX94jNll67JD6LiV58wpO4QkRm02hohoq9QoKZ55dVmhdbcbteNG+5N0IPDPETEte/91\nYH1EnDfEfs4EXoqI7+Xtp+nMVdJEYFVEXCtpNXCSpO2zmauADwOLCh2RmVlCbVznOtD9FXiKWvfX\n4xvupqC8ssAU4AJJ64E1wBeBayVtm+1kAfCNojszM0slZfdXSdsD9wDjgPWSvgLsGREvNRq3aXKN\niNnA7LqPj2jpCMzMulSB7q9PAzsNZ0zfoWVmleBnC5iZJeBnC5iZJeHkambWcd2VWp1czawiXHM1\nM0vCydXMrOO6K7U6uZpZRajLemh1VzRmZhXhmauZVYKvczUzS8LJ1cys47ortTq5mllFdNt1rr1/\nQuupl8uOII3n/lR2BGmseLXsCDpu3eMVfIg6tY4BvaX1/q953V+zdS7Ovn9Q0tS8aHo/uS5/pewI\n0lhV1eT6WtkRdNy6x18sO4QkUrRiSanVBoWDur9OA/YEjpe0R906RwG7RMSuwOeBy/Li6f3kamZG\nrSxQZBlCbvdX4BjgGoCIuBsYL2m7ZvE4uZrZpq5I99eh1pnUbNC0J7SuXJx0+AH3rxyZ/Yy0R6v5\nz00WP598Fy/zaPJ9DLb2ruXJ9zHqvJHvnnzO2d8Z8X22avRmY1vdtGj31/ppb9PtkiXXdrs5mpkV\n1Wa+WcbGLVx2ojYzbbbOpOyzhlwWMLNN3UD3V0lbUOv+OrNunZnACTDQivv5iHim2aC+ztXMNmlF\nur9GxM2SjpL0CPAycGLeuIooWm4wM7OiXBYwM0ugEslV0hVlx9AqSZtJ+jtJMyS9u+67M8qKy8za\n0zPJVdKEBsvWwPSy42vD5cChwLPAxZIuGvTdR8sJqX2StpR0RvaLYzNJZ0q6SdLZkkaVHV+nSHq4\n7BjaJWmvQa+3kPQtST+X9B1Jo8uMrZf1TM1V0nrgjw2+3jEithjJeDpF0qKImJK93hy4FNga+BQw\nPyJy72HuRpKuB5YCo4C9gIXU7nw5BpgQEZ8rMbyWSHqR2rWNgy/7GQ28AkREjCslsDZJWrDh71n2\ny30CcDXwYWo/qxPKjK9X9dLVAkuBIyJiowSr2v1sj5cTUkdsvuFFduvdKZLOBG4HWr4qugtMjogP\nZz+f5cChEbFe0p3AgyXH1qqrgfHA6RHxdHZsSyPirSXH1UlHAPtFxBpJc6n9UrQW9ExZAPg+sFX9\nh1Gbep8/8uF0zH2SPjD4g4g4i9r/yG8pJaLOWA8DP59bImLw+54UEV8GLgb+Q9JX6K3/f5rZUtJH\nJH0UGBURa2DgZ9WzP6+y9cxfjoj4YUQ8IOk4SeMAstrQ9cBdJYfXsoj4dETcUn9c1OrIB5Y
bXVvu\nk/QmgIgYuCZQ0mSgZ5/RFxH3Au/L3t4BvLG8aDpmLnA08EHgLknbA2R/rigzsJ4WET21AIuyPw+m\n9pf7g8DdZcfl42p4XMcB47LX3wJuoPbPztJja+OYPg6MA3YAvg1cD7yz7LgS/KyuB/YtO65eXXpm\n5jrIuuzPDwJXRsRNQE+ezKpT1eP6VkS8IOlgavW8H1F7dmYv+3ZEvAC8DXgPcBUFnu/ZA+p/VldR\nO8FqLejF5Losu671E8AsSW+kN4+jXlWPq4q/NKp4TFDd4ypFz1yKtYGkMdSeGL4wIpZImghMiYjZ\nJYfWlgof1yxqTw96HzAVeI1auWPvUgNrQxWPCap7XGXpueRqvaWKvzSqeExQ3eMqi5OrmVkCVajp\nmZl1HSdXM7MEnFzNzBJwcjUzS+D/AT5OodAg6zElAAAAAElFTkSuQmCC\n", "text": [ "" ] } ], "prompt_number": 15 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can next use some functionality from SciPy to cluster the sequences with UPGMA, and print out a **dendrogram**." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from scipy.cluster.hierarchy import average, dendrogram, to_tree\n", "\n", "for q in query_sequences:\n", " print(q)\n", "\n", "guide_lm = average(guide_dm.condensed_form()) \n", "guide_d = dendrogram(guide_lm, labels=guide_dm.ids, orientation='right', \n", " link_color_func=lambda x: 'black')\n", "guide_tree = to_tree(guide_lm)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "ACCGGTGACCAGTTGACCAGT\n", "ATCGGTACCGGTAGAAGT\n", "GGTACCAAATAGAA\n", "GGCACCAAACAGAA\n", "GGCCCACTGAT\n" ] }, { "metadata": {}, "output_type": "display_data", "png": 
"iVBORw0KGgoAAAANSUhEUgAAAWsAAAD7CAYAAACsV7WPAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAC85JREFUeJzt3W2Ipeddx/Hvr7sptIvb3SFSl02LUKJV2cb6UKsWXQmV\nbadWjA8vfADrNog1rW8kSkFX2WiD0SJFLDaEWHChXVqlJQmxmholpQkG9smHQHZrtV2tmGbYRmlI\nwv59MWfLdDJz5uzsnHPPf873A8POnPuea669mPnOvdeZ2TtVhSRpe3vJ0BOQJG3MWEtSA8Zakhow\n1pLUgLGWpAaMtSQ1sHvcwST+XJ8kbUJVZSvH2/DKuqp2/MuxY8cGn8N2e3FNXBPXZfMv0+A2iCQ1\nYKwlqQFjDRw+fHjoKWw7rsmLuSZrc11mI+P2V5LUtPZfJGmnSkLN+glGSdLwjLUkNWCsJakBYy1J\nDRhrSWrAWEtSA8Zakhow1pLUgLGWpAaMtSQ1YKwlqQFjLUkNGGtJGkCSe5KcTnI2yV8lecXY8/1f\n9yRpa03yv+4l+Yaqemb0+h8BS1V1x3rne2UtSVOWZE+S+0dX0ueS/MyKUAd4GfDUuDHG3jBXkrQl\njgAXq2oRIMne0Z/3Am8BzgPvGTeA2yACYGFhgaWlpaGnIe0YK7dBktwIfAr4KHBfVT2y4thLgD8B\n/ruqfne98Yy1gK/tsQ09DWlHWGvPOsk+YBG4FXioqo6vOPZDwO1V9bb1xnQbRJKmLMkBlp9APJHk\nEvDOJK+pqgujPeu3A6fGjWGsJWn6DgF3JbkMPAfcBnz4yt418Djwq+MGcBtEgNsg0lbyhrmSNKeM\ntSQ1YKwlqQFjLUkNGGtJasBYS1IDxlqSGjDWktSAsZakBoy1JDVgrCWpAWMtSQ0Ya0lqwFhLUgPG\nWpIaMNaS1ICxlqQGvK3XJngncEmz5m29NmEn3gJrJ/6dpKF4Wy9JmlPGWpIaMNaS1ICxlqQGjLUk\nNWCsJakBYy1JDRhrSWrAWEtSA8Zakhow1pLUgLGWpAaMtSQ1YKwlqQFjLUkNGGtJasBYS9IAkpxI\n8kSSc0nuSTL2zl3GWpKG8RdV9dqqOgS8DHjnuJO9B6MkTVmSPcBJ4CCwCzheVSdXnPKPwA3jxjDW\nkjR9R4CLVbUIkGTvlQNJrgN+HnjPuAHcBpGk6TsLvDnJnUneVFVfWXHsT4G/r6rPjBvAu5tvwk68\nE/jCwgJLS0tDT0PaMVbf3TzJPmARuBV4qKqOJzkG3FRVt2w0nrHehJ0Ya0lbZ9SIrHj7ALBUVc8m\neRtwFLgfeAdwc1U9u+GYxvrqGWtJ46wR6x8F7gIuA88B7wIeBT4P/O/otI9X1R3rjmmsr56xljTO\n6lhvBZ9glKQGjLUkNWCsJakBYy1JDRhrSWrAWEtSA8ZakhqYi//IyV+lltTdXPxSzFb/Eou/FCNp\nHH8pRpLmlLGWpAaMtSQ1YKwlqQFjLUkNGGtJasBYS1IDxlqSGjDWktSAsZakBoy1JDVgrCWpAWMt\nSQ0Ya0lqwFhLUgPGWpIaMNaS1ICxlqQGjLUkNWCsJakBYy1JDRhrSWrAWEtSA8Zakhow1pLUgLGW\npAaMtSQ1YKwlaUBJPpDkmY3OM9aSNJAk3wPsA2qjc421JE1Zkj1J7k9yOsm5JD+dZBfwB8DtQDYa\nY/fUZylJOgJcrKpFgCR7gduAT1TVl5INW22sN2P//v1MsriSNHIW+MMkdwL3AZ8Dfgo4nAljkqr1\nt0qS1LjjXSRhJ/w9JPUwak5WPbYPWARuBT4N/Arw7Ojwq4ELVfUt643plbUkTVmSA8BSVZ1Icgk4\nWlUHVhx/ZlyowVhL0iwcAu5Kchl4juWr6pU2/Ke/2yCStMXW2ga5Vv7oniQ1MNVtkIWFBZaWlqb5\nISRpLkx1G2S7bD9sl3lImg9ug0jSnDLWktSAsZakBoy1JDVgr
CWpAWMtSQ0Ya0lqwFhLUgPGWpIa\nMNaS1ICxlqQGjLUkNWCsJakBYy1JDRhrSWrAWEtSA8Zakhow1pLUgLGWpAaMtSQ1YKwlqQFjLUkN\nGGtJasBYS1IDxlqSGjDWktSAsZakBoy1JDVgrCWpAWMtSQ0Ya0lqwFhLUgPGWpIGkOS2JOeTXE6y\nsNH5xlqShvEIcDPw75OcvHu6c5EkJdkDnAQOAruA41V1cnRsojGMtSRN3xHgYlUtAiTZe7UDuA0i\nSdN3FnhzkjuTvKmqvnK1A8zFlfX+/fsn/qeGJG21qnoyyeuBReCOJA9V1fGrGWMuYv30008PPQVJ\nc2T1xWGSA8BSVZ1Icgk4Onr8yokbXk26DSJJ03cIeCzJKeC3gONJ3g38B8tPOp5N8qFxA6Sq1j+Y\n1LjjG0nCtby/JHU0at+W7r16ZS1JDRhrSWrAWEtSA8Zakhow1pLUgLGWpAaMtSQ1YKwlqQFjLUkN\nGGtJasBYS1IDxlqSGjDWktSAsZakBoy1JDVgrCWpAWMtSQ0Ya0lqwFhLUgPGWpIaMNaS1ICxlqQG\njLUkNWCsJakBYy1JDRhrSWrAWEtSA8Zakhow1pLUgLGWpAaMtSQ1YKwlqQFjLUkNGGtJasBYS1ID\nxlqSGjDWkjSAJH+e5HNJTo1eXjfu/N2zmpgk6esU8OtV9ZeTnGysJWnKkuwBTgIHgV3A8SuHJh3D\nbRBJmr4jwMWq+s6qOgQ8OHr8fUnOJHl/kpeOGyBVtf7BpMYd38jCwgJLS0ubfn9J6qqqvnbVnORG\n4FPAR4H7quqRJN9UVV8aRfpDwIWqOr7OcNONtSTNoyRfF+vRY/uAReBW4KGVYU7ywyzvX//YemO6\nZy1JU5bkALBUVSeSXAKOrriyDvATwLlxYxhrSZq+Q8BdSS4DzwHvAk4k+UaWn2Q8Bbx33ABug0jS\nFltrG+Ra+dMgktSAsZakBoy1JDVgrCWpAWMtSQ0Ya0lqwFhLUgPGWpIaMNaS1ICxlqQGjLUkNWCs\nJakBYy1JDRhrSWrAWAMPP/zw0FPYdlyTF3NN1ua6zIaxxk+2tbgmL+aarM11mQ1jLUkNGGtJamDD\n23rNcC6StGNs9W29xsZakrQ9uA0iSQ0Ya0lqYG5ineRIkieSPJnkN9Y4fn2SB5OcTvJPSX5xgGnO\n3EbrMjrncJJTo3V5eMZTnLlJ1mR03vcmeSHJLbOc3xAm+Pr5uSRnkpxN8pkkrxtinrM24dfPB0bH\nzyR5/aY/WFXt+BdgF3Ae+GbgOuA08G2rzvkd4H2j168HvgzsHnru22Bd9gH/DNxwZW2GnvfQa7Li\nvE8D9wE/OfS8h14T4PuBV4xePwI8OvS8t8m6vBV4YPT6913LuszLlfUbgPNV9fmqeh74CPDjq875\nL2Dv6PW9wJer6oUZznEIk6zLzwIfr6ovAlTVUzOe46xNsiYA7wY+BvzPLCc3kA3XpKo+W1WXRm8+\nBtww4zkOYZLPlbcDHwaoqseAfUleuZkPNi+xPgh8YcXbXxw9ttLdwHck+U/gDPBrM5rbkCZZlxuB\nhSR/l+TxJL8ws9kNY8M1SXKQ5S/KD44e2uk/UjXJ58lKR4EHpjqj7WGSdVnrnE19I9u9mXdqaJIv\npvcCp6vqcJLXAH+T5KaqembKcxvSJOtyHfBdwM3Ay4HPJnm0qp6c6syGM8ma/DHwm1VVSQJs6c/T\nbkMTfzNK8iPALwE/OL3pbBuTrsvqz49NfXOfl1hfBF614u1XsfwdbqUfAH4PoKouJPk34FuBx2cy\nw2FMsi5fAJ6qqq8CX03yD8BNwE6N9SRr8t3AR5Y7zfXAW5I8X1WfnM0UZ26SNWH0pOLdwJGqWprR\n3IY0ybqsPueG0WNXb+hN+hk9EbAbuMDyEwEvZe0nAt4PHBu9/srRoi8MPfdtsC6vBf6W5SdTXg6c\nA7596LkPuSarzr8XuGXoe
Q+9JsCrWX6y7Y1Dz3ebrcvKJxjfyDU8wTgXV9ZV9UKS24C/Zjk691TV\nvyb55dHxPwN+H7g3yRmW9/Jvr6qnB5v0DEyyLlX1RJIHgbPAZeDuqvqX4WY9XRN+rsyVCdfkt4H9\nwAdH/+J4vqreMNScZ2HCr58Hkrw1yXng/4B3bPbj+evmktTAvPw0iCS1ZqwlqQFjLUkNGGtJasBY\nS1IDxlqSGjDWktSAsZakBv4fsqqfGsmCeYIAAAAASUVORK5CYII=\n", "text": [ "" ] } ], "prompt_number": 16 }, { "cell_type": "code", "collapsed": false, "input": [ "from iab.algorithms import guide_tree_from_sequences\n", "%psource guide_tree_from_sequences" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\u001b[0;32mdef\u001b[0m \u001b[0mguide_tree_from_sequences\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msequences\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mdistance_fn\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mkmer_distance\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mdisplay_tree\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mFalse\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mguide_dm\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msequences\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdistances\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdistance_fn\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mguide_lm\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0maverage\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mguide_dm\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcondensed_form\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mguide_tree\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mto_tree\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mguide_lm\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mdisplay_tree\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m 
\u001b[0mguide_d\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdendrogram\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mguide_lm\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlabels\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mguide_dm\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mids\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0morientation\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'right'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mlink_color_func\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mlambda\u001b[0m \u001b[0mx\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'black'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mguide_tree\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\n" ] } ], "prompt_number": 17 }, { "cell_type": "code", "collapsed": false, "input": [ "t = guide_tree_from_sequences(query_sequences, display_tree=True)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "display_data", "png": 
"iVBORw0KGgoAAAANSUhEUgAAAWsAAAD7CAYAAACsV7WPAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAC85JREFUeJzt3W2Ipeddx/Hvr7sptIvb3SFSl02LUKJV2cb6UKsWXQmV\nbadWjA8vfADrNog1rW8kSkFX2WiD0SJFLDaEWHChXVqlJQmxmholpQkG9smHQHZrtV2tmGbYRmlI\nwv59MWfLdDJz5uzsnHPPf873A8POnPuea669mPnOvdeZ2TtVhSRpe3vJ0BOQJG3MWEtSA8Zakhow\n1pLUgLGWpAaMtSQ1sHvcwST+XJ8kbUJVZSvH2/DKuqp2/MuxY8cGn8N2e3FNXBPXZfMv0+A2iCQ1\nYKwlqQFjDRw+fHjoKWw7rsmLuSZrc11mI+P2V5LUtPZfJGmnSkLN+glGSdLwjLUkNWCsJakBYy1J\nDRhrSWrAWEtSA8Zakhow1pLUgLGWpAaMtSQ1YKwlqQFjLUkNGGtJGkCSe5KcTnI2yV8lecXY8/1f\n9yRpa03yv+4l+Yaqemb0+h8BS1V1x3rne2UtSVOWZE+S+0dX0ueS/MyKUAd4GfDUuDHG3jBXkrQl\njgAXq2oRIMne0Z/3Am8BzgPvGTeA2yACYGFhgaWlpaGnIe0YK7dBktwIfAr4KHBfVT2y4thLgD8B\n/ruqfne98Yy1gK/tsQ09DWlHWGvPOsk+YBG4FXioqo6vOPZDwO1V9bb1xnQbRJKmLMkBlp9APJHk\nEvDOJK+pqgujPeu3A6fGjWGsJWn6DgF3JbkMPAfcBnz4yt418Djwq+MGcBtEgNsg0lbyhrmSNKeM\ntSQ1YKwlqQFjLUkNGGtJasBYS1IDxlqSGjDWktSAsZakBoy1JDVgrCWpAWMtSQ0Ya0lqwFhLUgPG\nWpIaMNaS1ICxlqQGvK3XJngncEmz5m29NmEn3gJrJ/6dpKF4Wy9JmlPGWpIaMNaS1ICxlqQGjLUk\nNWCsJakBYy1JDRhrSWrAWEtSA8Zakhow1pLUgLGWpAaMtSQ1YKwlqQFjLUkNGGtJasBYS9IAkpxI\n8kSSc0nuSTL2zl3GWpKG8RdV9dqqOgS8DHjnuJO9B6MkTVmSPcBJ4CCwCzheVSdXnPKPwA3jxjDW\nkjR9R4CLVbUIkGTvlQNJrgN+HnjPuAHcBpGk6TsLvDnJnUneVFVfWXHsT4G/r6rPjBvAu5tvwk68\nE/jCwgJLS0tDT0PaMVbf3TzJPmARuBV4qKqOJzkG3FRVt2w0nrHehJ0Ya0lbZ9SIrHj7ALBUVc8m\neRtwFLgfeAdwc1U9u+GYxvrqGWtJ46wR6x8F7gIuA88B7wIeBT4P/O/otI9X1R3rjmmsr56xljTO\n6lhvBZ9glKQGjLUkNWCsJakBYy1JDRhrSWrAWEtSA8ZakhqYi//IyV+lltTdXPxSzFb/Eou/FCNp\nHH8pRpLmlLGWpAaMtSQ1YKwlqQFjLUkNGGtJasBYS1IDxlqSGjDWktSAsZakBoy1JDVgrCWpAWMt\nSQ0Ya0lqwFhLUgPGWpIaMNaS1ICxlqQGjLUkNWCsJakBYy1JDRhrSWrAWEtSA8Zakhow1pLUgLGW\npAaMtSQ1YKwlaUBJPpDkmY3OM9aSNJAk3wPsA2qjc421JE1Zkj1J7k9yOsm5JD+dZBfwB8DtQDYa\nY/fUZylJOgJcrKpFgCR7gduAT1TVl5INW22sN2P//v1MsriSNHIW+MMkdwL3AZ8Dfgo4nAljkqr1\nt0qS1LjjXSRhJ/w9JPUwak5WPbYPWARuBT4N/Arw7Ojwq4ELVfUt643plbUkTVmSA8BSVZ1Icgk4\nWlUHVhx/ZlyowVhL0iwcAu5Kchl4juWr6pU2/Ke/2yCStMXW2ga5Vv7oniQ1MNVtkIWFBZaWlqb5\nISRpLkx1G2S7bD9sl3lImg9ug0jSnDLWktSAsZakBoy1JDVgr
CWpAWMtSQ0Ya0lqwFhLUgPGWpIa\nMNaS1ICxlqQGjLUkNWCsJakBYy1JDRhrSWrAWEtSA8Zakhow1pLUgLGWpAaMtSQ1YKwlqQFjLUkN\nGGtJasBYS1IDxlqSGjDWktSAsZakBoy1JDVgrCWpAWMtSQ0Ya0lqwFhLUgPGWpIGkOS2JOeTXE6y\nsNH5xlqShvEIcDPw75OcvHu6c5EkJdkDnAQOAruA41V1cnRsojGMtSRN3xHgYlUtAiTZe7UDuA0i\nSdN3FnhzkjuTvKmqvnK1A8zFlfX+/fsn/qeGJG21qnoyyeuBReCOJA9V1fGrGWMuYv30008PPQVJ\nc2T1xWGSA8BSVZ1Icgk4Onr8yokbXk26DSJJ03cIeCzJKeC3gONJ3g38B8tPOp5N8qFxA6Sq1j+Y\n1LjjG0nCtby/JHU0at+W7r16ZS1JDRhrSWrAWEtSA8Zakhow1pLUgLGWpAaMtSQ1YKwlqQFjLUkN\nGGtJasBYS1IDxlqSGjDWktSAsZakBoy1JDVgrCWpAWMtSQ0Ya0lqwFhLUgPGWpIaMNaS1ICxlqQG\njLUkNWCsJakBYy1JDRhrSWrAWEtSA8Zakhow1pLUgLGWpAaMtSQ1YKwlqQFjLUkNGGtJasBYS1ID\nxlqSGjDWkjSAJH+e5HNJTo1eXjfu/N2zmpgk6esU8OtV9ZeTnGysJWnKkuwBTgIHgV3A8SuHJh3D\nbRBJmr4jwMWq+s6qOgQ8OHr8fUnOJHl/kpeOGyBVtf7BpMYd38jCwgJLS0ubfn9J6qqqvnbVnORG\n4FPAR4H7quqRJN9UVV8aRfpDwIWqOr7OcNONtSTNoyRfF+vRY/uAReBW4KGVYU7ywyzvX//YemO6\nZy1JU5bkALBUVSeSXAKOrriyDvATwLlxYxhrSZq+Q8BdSS4DzwHvAk4k+UaWn2Q8Bbx33ABug0jS\nFltrG+Ra+dMgktSAsZakBoy1JDVgrCWpAWMtSQ0Ya0lqwFhLUgPGWpIaMNaS1ICxlqQGjLUkNWCs\nJakBYy1JDRhrSWrAWAMPP/zw0FPYdlyTF3NN1ua6zIaxxk+2tbgmL+aarM11mQ1jLUkNGGtJamDD\n23rNcC6StGNs9W29xsZakrQ9uA0iSQ0Ya0lqYG5ineRIkieSPJnkN9Y4fn2SB5OcTvJPSX5xgGnO\n3EbrMjrncJJTo3V5eMZTnLlJ1mR03vcmeSHJLbOc3xAm+Pr5uSRnkpxN8pkkrxtinrM24dfPB0bH\nzyR5/aY/WFXt+BdgF3Ae+GbgOuA08G2rzvkd4H2j168HvgzsHnru22Bd9gH/DNxwZW2GnvfQa7Li\nvE8D9wE/OfS8h14T4PuBV4xePwI8OvS8t8m6vBV4YPT6913LuszLlfUbgPNV9fmqeh74CPDjq875\nL2Dv6PW9wJer6oUZznEIk6zLzwIfr6ovAlTVUzOe46xNsiYA7wY+BvzPLCc3kA3XpKo+W1WXRm8+\nBtww4zkOYZLPlbcDHwaoqseAfUleuZkPNi+xPgh8YcXbXxw9ttLdwHck+U/gDPBrM5rbkCZZlxuB\nhSR/l+TxJL8ws9kNY8M1SXKQ5S/KD44e2uk/UjXJ58lKR4EHpjqj7WGSdVnrnE19I9u9mXdqaJIv\npvcCp6vqcJLXAH+T5KaqembKcxvSJOtyHfBdwM3Ay4HPJnm0qp6c6syGM8ma/DHwm1VVSQJs6c/T\nbkMTfzNK8iPALwE/OL3pbBuTrsvqz49NfXOfl1hfBF614u1XsfwdbqUfAH4PoKouJPk34FuBx2cy\nw2FMsi5fAJ6qqq8CX03yD8BNwE6N9SRr8t3AR5Y7zfXAW5I8X1WfnM0UZ26SNWH0pOLdwJGqWprR\n3IY0ybqsPueG0WNXb+hN+hk9EbAbuMDyEwEvZe0nAt4PHBu9/srRoi8MPfdtsC6vBf6W5SdTXg6c\nA7596LkPuSarzr8XuGXoe
Q+9JsCrWX6y7Y1Dz3ebrcvKJxjfyDU8wTgXV9ZV9UKS24C/Zjk691TV\nvyb55dHxPwN+H7g3yRmW9/Jvr6qnB5v0DEyyLlX1RJIHgbPAZeDuqvqX4WY9XRN+rsyVCdfkt4H9\nwAdH/+J4vqreMNScZ2HCr58Hkrw1yXng/4B3bPbj+evmktTAvPw0iCS1ZqwlqQFjLUkNGGtJasBY\nS1IDxlqSGjDWktSAsZakBv4fsqqfGsmCeYIAAAAASUVORK5CYII=\n", "text": [ "" ] } ], "prompt_number": 18 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now have a guide tree, so we can move on to the next step of progressive alignment." ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Generalization of Needleman-Wunsch (with affine gap scoring) for progressive multiple sequence alignment" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we'll address our second burning question: aligning alignments. As illustrated above, there are three different types of pairwise alignment we need to support for progressive multiple sequence alignment with Needleman-Wunsch. These are:\n", "\n", "1. Alignment of a pair of sequences.\n", "2. Alignment of a sequence and an alignment.\n", "3. Alignment of a pair of alignments.\n", "\n", "Standard Needleman-Wunsch supports the first, and it is straightforward to generalize it to support the latter two. The only change that is necessary is in how the alignment of two non-gap characters is scored. Recall that we previously scored an alignment of two characters by looking up the score of substitution from one to the other in a substitution matrix. To adapt this for aligning a sequence to an alignment, or for aligning an alignment to an alignment, we compute the substitution score as the average of the scores for every pair of characters formed by taking one character from each of the two columns being aligned. 
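\n", "\n", "As a rough numeric illustration of this averaging, assume a simple scoring scheme in which a match scores $1$ and a mismatch scores $-2$ (values chosen here only for illustration). Aligning a column containing `A` and `C` against a column containing `A` and `G` involves four character pairs (A/A, A/G, C/A and C/G), so the averaged score would be:\n", "\n", "$$\n", "s = \\frac{1 + (-2) + (-2) + (-2)}{2 \\times 2} = -\\frac{5}{4} = -1.25\n", "$$\n", "\n", "The single matching pair is outweighed by the three mismatches, so this pair of columns would score below a perfect match but above a complete mismatch.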
\n", "\n", "For example, if we want to align the alignment column from $aln1$:\n", "\n", "```\n", "A\n", "C\n", "```\n", "\n", "to the alignment column from $aln2$:\n", "\n", "```\n", "T\n", "G\n", "```\n", "\n", "we could compute the substitution score using the matrix $m$ as:\n", "\n", "$$\n", "s = \\frac{m[A][T] + m[A][G] + m[C][T] + m[C][G]}{aln1_{length} \\times aln2_{length}}\n", "$$\n", "\n", "Here $aln1_{length}$ and $aln2_{length}$ denote the number of characters in each of the two columns (that is, the number of sequences in each alignment), so the denominator is the number of character pairs being averaged: four in this example.\n", "\n", "The following code adapts our implementation of Needleman-Wunsch to support aligning a sequence to an alignment, or aligning an alignment to an alignment." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from iab.algorithms import format_dynamic_programming_matrix, format_traceback_matrix\n", "from skbio.alignment._pairwise import _compute_score_and_traceback_matrices \n", "\n", "%psource _compute_score_and_traceback_matrices" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\u001b[0;32mdef\u001b[0m \u001b[0m_compute_score_and_traceback_matrices\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0maln1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maln2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mgap_open_penalty\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mgap_extend_penalty\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msubstitution_matrix\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mnew_alignment_score\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0minf\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0minit_matrices_f\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0m_init_matrices_nw\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mpenalize_terminal_gaps\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34mTrue\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0mgap_substitution_score\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;34m\"\"\"Return dynamic programming (score) and traceback matrices.\u001b[0m\n", "\u001b[0;34m\u001b[0m\n", "\u001b[0;34m A note on the ``penalize_terminal_gaps`` parameter. When this value is\u001b[0m\n", "\u001b[0;34m ``False``, this function is no longer true Smith-Waterman/Needleman-Wunsch\u001b[0m\n", "\u001b[0;34m scoring, but when ``True`` it can result in biologically irrelevant\u001b[0m\n", "\u001b[0;34m artifacts in Needleman-Wunsch (global) alignments. Specifically, if one\u001b[0m\n", "\u001b[0;34m sequence is longer than the other (e.g., if aligning a primer sequence to\u001b[0m\n", "\u001b[0;34m an amplification product, or searching for a gene in a genome) the shorter\u001b[0m\n", "\u001b[0;34m sequence will have a long gap inserted. The parameter is ``True`` by\u001b[0m\n", "\u001b[0;34m default (so that this function computes the score and traceback matrices as\u001b[0m\n", "\u001b[0;34m described by the original authors) but the global alignment wrappers pass\u001b[0m\n", "\u001b[0;34m ``False`` by default, so that the global alignment API returns the result\u001b[0m\n", "\u001b[0;34m that users are most likely to be looking for.\u001b[0m\n", "\u001b[0;34m\u001b[0m\n", "\u001b[0;34m \"\"\"\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0maln1_length\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0maln1\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msequence_length\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0maln2_length\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0maln2\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msequence_length\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;31m# cache some values for quicker/simpler 
access\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0maend\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_traceback_encoding\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'alignment-end'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mmatch\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_traceback_encoding\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'match'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mvgap\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_traceback_encoding\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'vertical-gap'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mhgap\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_traceback_encoding\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'horizontal-gap'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mnew_alignment_score\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mnew_alignment_score\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maend\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;31m# Initialize a matrix to use for scoring the alignment and for tracing\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;31m# back the best alignment\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mscore_matrix\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtraceback_matrix\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0minit_matrices_f\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0maln1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maln2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mgap_open_penalty\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mgap_extend_penalty\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m 
\u001b[0;31m# Iterate over the characters in aln2 (which corresponds to the vertical\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;31m# sequence in the matrix)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0maln2_pos\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maln2_chars\u001b[0m \u001b[0;32min\u001b[0m \u001b[0menumerate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0maln2\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0miter_positions\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstr\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;31m# Iterate over the characters in aln1 (which corresponds to the\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;31m# horizontal sequence in the matrix)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0maln1_pos\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maln1_chars\u001b[0m \u001b[0;32min\u001b[0m \u001b[0menumerate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0maln1\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0miter_positions\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstr\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;31m# compute the score for a match/mismatch\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0msubstitution_score\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_compute_substitution_score\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0maln1_chars\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maln2_chars\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msubstitution_matrix\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mgap_substitution_score\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", 
"\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mdiag_score\u001b[0m \u001b[0;34m=\u001b[0m \\\n", " \u001b[0;34m(\u001b[0m\u001b[0mscore_matrix\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0maln2_pos\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maln1_pos\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0msubstitution_score\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mmatch\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;31m# compute the score for adding a gap in aln2 (vertical)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mpenalize_terminal_gaps\u001b[0m \u001b[0;32mand\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0maln1_pos\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0maln1_length\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;31m# we've reached the end of aln1, so adding vertical gaps\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;31m# (which become gaps in aln1) should no longer\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;31m# be penalized (if penalize_terminal_gaps == False)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mup_score\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mscore_matrix\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0maln2_pos\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maln1_pos\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mvgap\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32melif\u001b[0m 
\u001b[0mtraceback_matrix\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0maln2_pos\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maln1_pos\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0mvgap\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;31m# gap extend, because the cell above was also a gap\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mup_score\u001b[0m \u001b[0;34m=\u001b[0m \\\n", " \u001b[0;34m(\u001b[0m\u001b[0mscore_matrix\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0maln2_pos\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maln1_pos\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m-\u001b[0m \u001b[0mgap_extend_penalty\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mvgap\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;31m# gap open, because the cell above was not a gap\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mup_score\u001b[0m \u001b[0;34m=\u001b[0m \\\n", " \u001b[0;34m(\u001b[0m\u001b[0mscore_matrix\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0maln2_pos\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maln1_pos\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m-\u001b[0m \u001b[0mgap_open_penalty\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mvgap\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;31m# compute the score for adding a gap in aln1 (horizontal)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mpenalize_terminal_gaps\u001b[0m \u001b[0;32mand\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0maln2_pos\u001b[0m 
\u001b[0;34m==\u001b[0m \u001b[0maln2_length\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;31m# we've reached the end of aln2, so adding horizontal gaps\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;31m# (which become gaps in aln2) should no longer\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;31m# be penalized (if penalize_terminal_gaps == False)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mleft_score\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mscore_matrix\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0maln2_pos\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maln1_pos\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mhgap\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mtraceback_matrix\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0maln2_pos\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maln1_pos\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0mhgap\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;31m# gap extend, because the cell to the left was also a gap\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mleft_score\u001b[0m \u001b[0;34m=\u001b[0m \\\n", " \u001b[0;34m(\u001b[0m\u001b[0mscore_matrix\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0maln2_pos\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maln1_pos\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m-\u001b[0m \u001b[0mgap_extend_penalty\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mhgap\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;31m# gap open, because the cell 
to the left was not a gap\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mleft_score\u001b[0m \u001b[0;34m=\u001b[0m \\\n", " \u001b[0;34m(\u001b[0m\u001b[0mscore_matrix\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0maln2_pos\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maln1_pos\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m-\u001b[0m \u001b[0mgap_open_penalty\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mhgap\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;31m# identify the largest score, and use that information to populate\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;31m# the score and traceback matrices\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mbest_score\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_first_largest\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mnew_alignment_score\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mleft_score\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mdiag_score\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mup_score\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mscore_matrix\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0maln2_pos\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maln1_pos\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mbest_score\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mtraceback_matrix\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0maln2_pos\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maln1_pos\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mbest_score\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", 
"\u001b[0;34m\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mscore_matrix\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtraceback_matrix\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\n" ] } ], "prompt_number": 19 }, { "cell_type": "code", "collapsed": false, "input": [ "from skbio.alignment._pairwise import _traceback\n", "%psource _traceback" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\u001b[0;32mdef\u001b[0m \u001b[0m_traceback\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtraceback_matrix\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mscore_matrix\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maln1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maln2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstart_row\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mstart_col\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mgap_character\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'-'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;31m# cache some values for simpler\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0maend\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_traceback_encoding\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'alignment-end'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mmatch\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_traceback_encoding\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'match'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mvgap\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_traceback_encoding\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'vertical-gap'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mhgap\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_traceback_encoding\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'horizontal-gap'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\n", 
"\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;31m# initialize the result alignments\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0maln1_sequence_count\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0maln1\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msequence_count\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0maligned_seqs1\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0me\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mrange\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0maln1_sequence_count\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0maln2_sequence_count\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0maln2\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msequence_count\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0maligned_seqs2\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0me\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mrange\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0maln2_sequence_count\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mcurrent_row\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mstart_row\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mcurrent_col\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mstart_col\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mbest_score\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mscore_matrix\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mcurrent_row\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0mcurrent_col\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mcurrent_value\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mNone\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mwhile\u001b[0m \u001b[0mcurrent_value\u001b[0m \u001b[0;34m!=\u001b[0m \u001b[0maend\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mcurrent_value\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtraceback_matrix\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mcurrent_row\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcurrent_col\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mcurrent_value\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0mmatch\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0maligned_seq\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0minput_seq\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mzip\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0maligned_seqs1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maln1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0maligned_seq\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0minput_seq\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mcurrent_col\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0maligned_seq\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0minput_seq\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mzip\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0maligned_seqs2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maln2\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m 
\u001b[0maligned_seq\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0minput_seq\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mcurrent_row\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mcurrent_row\u001b[0m \u001b[0;34m-=\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mcurrent_col\u001b[0m \u001b[0;34m-=\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mcurrent_value\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0mvgap\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0maligned_seq\u001b[0m \u001b[0;32min\u001b[0m \u001b[0maligned_seqs1\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0maligned_seq\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'-'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0maligned_seq\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0minput_seq\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mzip\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0maligned_seqs2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maln2\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0maligned_seq\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0minput_seq\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mcurrent_row\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mcurrent_row\u001b[0m \u001b[0;34m-=\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\n", 
"\u001b[0;34m\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mcurrent_value\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0mhgap\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0maligned_seq\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0minput_seq\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mzip\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0maligned_seqs1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maln1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0maligned_seq\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0minput_seq\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mcurrent_col\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0maligned_seq\u001b[0m \u001b[0;32min\u001b[0m \u001b[0maligned_seqs2\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0maligned_seq\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'-'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mcurrent_col\u001b[0m \u001b[0;34m-=\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mcurrent_value\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0maend\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mcontinue\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;34m\"Invalid value in traceback matrix: %s\"\u001b[0m \u001b[0;34m%\u001b[0m 
\u001b[0mcurrent_value\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mi\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mrange\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0maln1_sequence_count\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0maligned_seq\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m''\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mjoin\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0maligned_seqs1\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mi\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mseq_id\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_get_seq_id\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0maln1\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mi\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mi\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0maligned_seqs1\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mi\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mBiologicalSequence\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0maligned_seq\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mid\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mseq_id\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mi\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mrange\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0maln2_sequence_count\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0maligned_seq\u001b[0m \u001b[0;34m=\u001b[0m 
\u001b[0;34m''\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mjoin\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0maligned_seqs2\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mi\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mseq_id\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_get_seq_id\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0maln2\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mi\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mi\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0maln1_sequence_count\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0maligned_seqs2\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mi\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mBiologicalSequence\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0maligned_seq\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mid\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mseq_id\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0maligned_seqs1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maligned_seqs2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mbest_score\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mcurrent_col\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcurrent_row\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\n" ] } ], "prompt_number": 20 }, { "cell_type": "code", "collapsed": false, "input": [ "from skbio.alignment import global_pairwise_align_nucleotide\n", "%psource global_pairwise_align_nucleotide " ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\u001b[0;32mdef\u001b[0m 
\u001b[0mglobal_pairwise_align_nucleotide\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mseq1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mseq2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mgap_open_penalty\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m5\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mgap_extend_penalty\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mmatch_score\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmismatch_score\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0msubstitution_matrix\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mpenalize_terminal_gaps\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mFalse\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;34m\"\"\"Globally align pair of nuc. 
seqs or alignments with Needleman-Wunsch\u001b[0m\n", "\u001b[0;34m\u001b[0m\n", "\u001b[0;34m Parameters\u001b[0m\n", "\u001b[0;34m ----------\u001b[0m\n", "\u001b[0;34m seq1 : str, BiologicalSequence, or Alignment\u001b[0m\n", "\u001b[0;34m The first unaligned sequence(s).\u001b[0m\n", "\u001b[0;34m seq2 : str, BiologicalSequence, or Alignment\u001b[0m\n", "\u001b[0;34m The second unaligned sequence(s).\u001b[0m\n", "\u001b[0;34m gap_open_penalty : int or float, optional\u001b[0m\n", "\u001b[0;34m Penalty for opening a gap (this is substracted from previous best\u001b[0m\n", "\u001b[0;34m alignment score, so is typically positive).\u001b[0m\n", "\u001b[0;34m gap_extend_penalty : int or float, optional\u001b[0m\n", "\u001b[0;34m Penalty for extending a gap (this is substracted from previous best\u001b[0m\n", "\u001b[0;34m alignment score, so is typically positive).\u001b[0m\n", "\u001b[0;34m match_score : int or float, optional\u001b[0m\n", "\u001b[0;34m The score to add for a match between a pair of bases (this is added\u001b[0m\n", "\u001b[0;34m to the previous best alignment score, so is typically positive).\u001b[0m\n", "\u001b[0;34m mismatch_score : int or float, optional\u001b[0m\n", "\u001b[0;34m The score to add for a mismatch between a pair of bases (this is\u001b[0m\n", "\u001b[0;34m added to the previous best alignment score, so is typically\u001b[0m\n", "\u001b[0;34m negative).\u001b[0m\n", "\u001b[0;34m substitution_matrix: 2D dict (or similar)\u001b[0m\n", "\u001b[0;34m Lookup for substitution scores (these values are added to the\u001b[0m\n", "\u001b[0;34m previous best alignment score). If provided, this overrides\u001b[0m\n", "\u001b[0;34m ``match_score`` and ``mismatch_score``.\u001b[0m\n", "\u001b[0;34m penalize_terminal_gaps: bool, optional\u001b[0m\n", "\u001b[0;34m If True, will continue to penalize gaps even after one sequence has\u001b[0m\n", "\u001b[0;34m been aligned through its end. 
This behavior is true Needleman-Wunsch\u001b[0m\n", "\u001b[0;34m alignment, but results in (biologically irrelevant) artifacts when\u001b[0m\n", "\u001b[0;34m the sequences being aligned are of different length. This is ``False``\u001b[0m\n", "\u001b[0;34m by default, which is very likely to be the behavior you want in all or\u001b[0m\n", "\u001b[0;34m nearly all cases.\u001b[0m\n", "\u001b[0;34m\u001b[0m\n", "\u001b[0;34m Returns\u001b[0m\n", "\u001b[0;34m -------\u001b[0m\n", "\u001b[0;34m skbio.Alignment\u001b[0m\n", "\u001b[0;34m ``Alignment`` object containing the aligned sequences as well as\u001b[0m\n", "\u001b[0;34m details about the alignment.\u001b[0m\n", "\u001b[0;34m\u001b[0m\n", "\u001b[0;34m See Also\u001b[0m\n", "\u001b[0;34m --------\u001b[0m\n", "\u001b[0;34m local_pairwise_align\u001b[0m\n", "\u001b[0;34m local_pairwise_align_protein\u001b[0m\n", "\u001b[0;34m local_pairwise_align_nucleotide\u001b[0m\n", "\u001b[0;34m skbio.alignment.local_pairwise_align_ssw\u001b[0m\n", "\u001b[0;34m global_pairwise_align\u001b[0m\n", "\u001b[0;34m global_pairwise_align_protein\u001b[0m\n", "\u001b[0;34m\u001b[0m\n", "\u001b[0;34m Notes\u001b[0m\n", "\u001b[0;34m -----\u001b[0m\n", "\u001b[0;34m Default ``match_score``, ``mismatch_score``, ``gap_open_penalty`` and\u001b[0m\n", "\u001b[0;34m ``gap_extend_penalty`` parameters are derived from the NCBI BLAST\u001b[0m\n", "\u001b[0;34m Server [1]_.\u001b[0m\n", "\u001b[0;34m\u001b[0m\n", "\u001b[0;34m This function can be use to align either a pair of sequences, a pair of\u001b[0m\n", "\u001b[0;34m alignments, or a sequence and an alignment.\u001b[0m\n", "\u001b[0;34m\u001b[0m\n", "\u001b[0;34m References\u001b[0m\n", "\u001b[0;34m ----------\u001b[0m\n", "\u001b[0;34m .. 
[1] http://blast.ncbi.nlm.nih.gov/Blast.cgi\u001b[0m\n", "\u001b[0;34m\u001b[0m\n", "\u001b[0;34m \"\"\"\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;31m# use the substitution matrix provided by the user, or compute from\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;31m# match_score and mismatch_score if a substitution matrix was not provided\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0msubstitution_matrix\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0msubstitution_matrix\u001b[0m \u001b[0;34m=\u001b[0m \\\n", " \u001b[0mmake_identity_substitution_matrix\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmatch_score\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmismatch_score\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mpass\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mglobal_pairwise_align\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mseq1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mseq2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mgap_open_penalty\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mgap_extend_penalty\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msubstitution_matrix\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mpenalize_terminal_gaps\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mpenalize_terminal_gaps\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\n" ] } ], "prompt_number": 21 }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the sake of the examples below, I'm going to override one of the ``global_pairwise_align_nucleotide`` defaults to penalize terminal gaps. 
This effectively tells the algorithm that we know we have a collection of sequences that are homologous from beginning to end." ] }, { "cell_type": "code", "collapsed": false, "input": [ "global_pairwise_align_nucleotide = partial(global_pairwise_align_nucleotide, penalize_terminal_gaps=True)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 22 }, { "cell_type": "markdown", "metadata": {}, "source": [ "For example, we can still use this function to align a pair of sequences, passing each sequence in directly:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print(query_sequences[0])\n", "print(query_sequences[1])\n", "\n", "aln1 = global_pairwise_align_nucleotide(query_sequences[0], query_sequences[1])\n", "print(aln1)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "ACCGGTGACCAGTTGACCAGT\n", "ATCGGTACCGGTAGAAGT\n", ">s1\n", "ACCGGTGACCAGTTGACCAGT\n", ">s2\n", "ATCGGT-ACCGGTAGA--AGT\n", "\n" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "/Users/caporaso/.virtualenvs/iab/lib/python2.7/site-packages/skbio/alignment/_pairwise.py:540: EfficiencyWarning: You're using skbio's python implementation of Needleman-Wunsch alignment. This is known to be very slow (e.g., thousands of times slower than a native C implementation). We'll be adding a faster version soon (see https://github.com/biocore/scikit-bio/issues/254 to track progress on this).\n", " \"to track progress on this).\", EfficiencyWarning)\n" ] } ], "prompt_number": 23 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can align that alignment to one of our other sequences. 
" ] }, { "cell_type": "code", "collapsed": false, "input": [ "\n", "print(query_sequences[2])\n", "print(global_pairwise_align_nucleotide(aln1, query_sequences[2]))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "GGTACCAAATAGAA\n", ">s1\n", "ACCGGTGACCAGTTGACCAGT\n", ">s2\n", "ATCGGT-ACCGGTAGA--AGT\n", ">s3\n", "---GGTACCAAATAGA--A--\n", "\n" ] } ], "prompt_number": 24 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatively, we can align another pair of sequences:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "aln2 = global_pairwise_align_nucleotide(query_sequences[2], query_sequences[3])\n", "print(aln2)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ ">s3\n", "GGTACCAAATAGAA\n", ">s4\n", "GGCACCAAACAGAA\n", "\n" ] } ], "prompt_number": 25 }, { "cell_type": "markdown", "metadata": {}, "source": [ "And then align that alignment against our previous alignment:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print(aln1)\n", "print(aln2)\n", "aln3 = global_pairwise_align_nucleotide(aln1, aln2)\n", "print(aln3)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ ">s1\n", "ACCGGTGACCAGTTGACCAGT\n", ">s2\n", "ATCGGT-ACCGGTAGA--AGT\n", "\n", ">s3\n", "GGTACCAAATAGAA\n", ">s4\n", "GGCACCAAACAGAA\n", "\n", ">s1\n", "ACCGGTGACCAGTTGACCAGT\n", ">s2\n", "ATCGGT-ACCGGTAGA--AGT\n", ">s3\n", "---GGTACCAAATAGA--A--\n", ">s4\n", "---GGCACCAAACAGA--A--\n", "\n" ] } ], "prompt_number": 26 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Putting it all together: progressive multiple sequence alignment" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now combine all of these steps to take a set of query sequences, build a guide tree, perform progressive multiple sequence alignment, and return the guide tree (as a 
SciPy linkage matrix) and the alignment. " ] }, { "cell_type": "code", "collapsed": false, "input": [ "from skbio import TreeNode\n", "guide_tree = TreeNode.from_linkage_matrix(guide_lm, guide_dm.ids)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 27 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can view the guide tree in [Newick format](http://scikit-bio.org/docs/latest/generated/skbio.io.newick.html) as follows:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print(guide_tree)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "(s5:0.45975877193,(s1:0.410714285714,(s4:0.382575757576,(s2:0.305555555556,s3:0.305555555556):0.0770202020202):0.0281385281385):0.0490444862155);\n", "\n" ] } ], "prompt_number": 28 }, { "cell_type": "code", "collapsed": false, "input": [ "from iab.algorithms import progressive_msa\n", "%psource progressive_msa" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\u001b[0;32mdef\u001b[0m \u001b[0mprogressive_msa\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msequences\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mguide_tree\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpairwise_aligner\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;34m\"\"\" Perform progressive msa of sequences\u001b[0m\n", "\u001b[0;34m\u001b[0m\n", "\u001b[0;34m Parameters\u001b[0m\n", "\u001b[0;34m ----------\u001b[0m\n", "\u001b[0;34m sequences : skbio.SequenceCollection\u001b[0m\n", "\u001b[0;34m The sequences to be aligned.\u001b[0m\n", "\u001b[0;34m guide_tree : skbio.TreeNode\u001b[0m\n", "\u001b[0;34m The tree that should be used to guide the alignment process.\u001b[0m\n", "\u001b[0;34m pairwise_aligner : function\u001b[0m\n", "\u001b[0;34m Function that should be used to perform the pairwise alignments,\u001b[0m\n", "\u001b[0;34m for example 
skbio.Alignment.global_pairwise_align_nucleotide. Must\u001b[0m\n", "\u001b[0;34m support skbio.BiologicalSequence objects or skbio.Alignment objects\u001b[0m\n", "\u001b[0;34m as input.\u001b[0m\n", "\u001b[0;34m\u001b[0m\n", "\u001b[0;34m Returns\u001b[0m\n", "\u001b[0;34m -------\u001b[0m\n", "\u001b[0;34m skbio.Alignment\u001b[0m\n", "\u001b[0;34m\u001b[0m\n", "\u001b[0;34m \"\"\"\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mc1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mc2\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mguide_tree\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mchildren\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mc1\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mis_tip\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mc1_aln\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msequences\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mc1\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mc1_aln\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mprogressive_msa\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msequences\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mc1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpairwise_aligner\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mc2\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mis_tip\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mc2_aln\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msequences\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mc2\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m 
\u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mc2_aln\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mprogressive_msa\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msequences\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mc2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpairwise_aligner\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mpairwise_aligner\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mc1_aln\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mc2_aln\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\n" ] } ], "prompt_number": 29 }, { "cell_type": "code", "collapsed": false, "input": [ "msa = progressive_msa(query_sequences, guide_tree, pairwise_aligner=global_pairwise_align_nucleotide)\n", "print(msa)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ ">s5\n", "---GGC--CCA-----CTGAT\n", ">s1\n", "ACCGGTGACCAGTTGACCAGT\n", ">s4\n", "---GGCACCAAACAGA--A--\n", ">s2\n", "ATCGGTACC-GGTAGA--AGT\n", ">s3\n", "---GGTACCAAATAGA--A--\n", "\n" ] } ], "prompt_number": 30 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now build a (hopefully) improved tree from our multiple sequence alignment. First we'll look at our original distance matrix again, and then the distance matrix generated from the progressive multiple sequence alignment." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "fig = guide_dm.plot(cmap='Greens')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "display_data", "png": "iVBORw0KGgoAAAANSUhEUgAAAVcAAAEACAYAAAAHujVXAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFy1JREFUeJzt3XuwXWV5x/Hv7xxAczGEAAOBIGpALk6ASLmI3CqKkQh4\nRVHLiICtCtppHayKUoag3GSUCgwgRaalUqcViAQwBSUhGJFLIKGNEgwIhMAkEMLdhOTpH3vleLI9\ne6919t7vWXuv/D7MmuzLWu961pzwnDfPujyKCMzMrLP6yg7AzKyKnFzNzBJwcjUzS8DJ1cwsASdX\nM7MEnFzNzBJwcjWzTZqkf5X0jKRFTda5WNISSQ9KmlpkXCdXM9vUXQ1Ma/SlpKOAXSJiV+DzwGVF\nBnVyNbNNWkTcCaxqssoxwDXZuncD4yVtlzeuk6uZWXM7Ak8Mev8kMClvo81SRSPJ99WaWSERoXa2\nH26+aWF/9evn7i9ZcgV49fWXUw4PwIyzzuGMM7+ZfD8A77jo2BHZD8Bzs//AhCMnJ9/PE8tXJN/H\nYOvmP03/u7Yf0X2mNlLHtPalNcn3sZH7VsC+26bfz5WLOzPOe3cstt5ty4Y78jJgp0HvJ2WfNeWy\ngJlVQ7+KLcM3EzgBQNKBwPMR8UzeRklnrmZmI0atVRYk/QQ4DNhG0hPAmcDmABFxeUTcLOkoSY8A\nLwMnFhm355ProYcdUnYISYyavFXZISShSWPLDqHjqnhMAEwcXXYEw9Ni1TYiji+wzqnDHbf3k+vh\nh5YdQhKjJk8oO4Qk+naqXiKq4jEBsMOYsiMYnhZnrqn0fHI1MwO67gySk6uZVYNnrmZmCXRXbnVy\nNbOKaO0yq2ScXM2sGlwWMDNLoLtyq5OrmVVEX3dlVydXM6uG7sqtTq5mVhGuuZqZJeCygJlZAk6u\nZmYJdFdudXI1s4rospnrsB91IOlUSY9IWi+pmo9uMrPeo4LLUJtK0yT9Lmuf/bUhvt9K0vVZa+27\nJb0jL5xWniMzDzgC+GML25qZpSEVW/5iM/UDP6TWXntP4HhJe9St9g3g/ojYm1pXgh/khdM0uUoa\nI2mWpAckLZJ0XEQ8EBFOrGbWXfoKLn9pf+CRiHgsItYC1wH1DfP2AH4FEBG/B94iqWmDsbyZ6zRg\nWUTsExFTgFtz1jczK0eLM1eGbp1d3+3wQeAjtd1of2Bnctpr553QWghcKOlc4KaImJez/kZmnHXO\nwOtDDzuksl0DzGwYnnoZlr/S+XEbndDK31+RttznAj+QtABYBCwA1jXboGlyjYglkqYC04EZkm6P\niLMLBAIwYi2vzayH7DBm4xYy96/szLiN/h0+aUxtaby/+tbZO1GbvQ6IiBeBz214L+lRYGmzcJom\nV0kTgVURca2k1cBJ2ecbfkV017UPZrbpav3213uBXSW9BXgK+ASwUdNCSVsCr0bEGkmnAHMi4qVm\ng+aVBaYAF0haD6wBviDpNOB0YDtgoaRZEfH5Fg7IzKxzWu/++rqkU4FfAP3AVRGxWNLfZt9fTu0q\ngh9LCuAhsolmM3llgdnA7LqP7wf+ZfiHYGaWUBs3EUTELcAtdZ9dPuj1fGC34YzpO7TMrBr8VCwz\ns85TwZlrkUsDOsHJ1cwqQQVnrk6uZmbD0GVVASdXM6uGvoLZtemV/x3k5GpmlVC0LDBSnFzNrBKc\nXM3MEnByNTNLoK/LOhE4uZpZJXjmamaWgLrsOVJOrmZWCZ65mpkl0GW5taUGhWZmXadPKrQMpUD3\
n120k3Zr1E3xI0mdz42n/kMzMyiep0DLEdkW6v54KLIiIfYDDge9JavovfydXM6uEvj4VWoZQpPvr\ncmBc9noc8GxEvN4snqQ113dcVB9fb/vff7ix7BA67tiZXyo7hCRum3N/2SF03qMvlB1BV2vjhNZQ\n3V8PqFvnSuCXkp4C3gQclzeoT2iZWSU0Sq5rHlvN2sdWN9u0yFMIvwE8EBGHS5oM/I+kvbPGhUNy\ncjWzSmiUXN/w1vG84a3jB96/MueJ+lVyu78CBwHnAETEH7Lur7tRa244JNdczawSWj2hxaDur5K2\noNb9dWbdOr8D3pvtZztqibX11tpmZr2i1ZJrwe6v3wGulvQgtUnp6RHxXLNxnVzNrBLauUOrQPfX\nlcDRwxnTydXMKqG/r7uqnE6uZlYJ3Xb7q5OrmVWCH9xiZpaAHzloZpaAZ65mZgk4uZqZJdBludXJ\n1cyqoc+XYpmZdZ7LAmZmCXRZbnVyNbNq8MzVzCwBJ1czswS6Lbm2dHpN0rVZp8RFkq7Ka9RlZpZa\nGz20inR//aqkBdmySNLrksYPNdZAPC0ex79HxO4RMQUYBZzc4jhmZh2RsvtrRFwYEVMjYirwdeCO\niHi+WTy5M05JY4CfUmvi1Q+cHRE/HbTKPcCkvHHMzFJqoyww0P01G2dD99fFDdb/FPCTvEGL/HN+\nGrAsIqZnO97QXhZJmwOfAb5cYBwzs2TaKLkW6f6a7UOjgfcDX8wbtEhyXQhcKOlc4KaImDfou0uB\nORFx11AbPjf7DwOvR03eilGTJxTYnZlV2nN/glV/6viwjWauLz28kpceXtls0yLdXzc4GpiXVxKA\nAsk1IpZImgpMB2ZIuj0izpZ0JrB1RJzSaNsJR04eRsxmtkmY8IbassGjDbtTD0+D5Dp2t20Zu9u2\nA++fmfVw/SpFur9u8EkKlASgWM11IrAqIq6VtBo4SdLJwJHAEUV2YmaWWhs114Hur8BT1Lq/Hj/E\n+FsCh1KrueYqUhaYAlwgaT2whlqt4TfAY8D87ID+OyJmFNmhmVkKjS6zylOw+yvAh4BfRMSrRcYt\nUhaYDcyu+3jzwpGbmY2AlN1fs/fXANcUHdMX/5tZJXTbHVpOrmZWCV2WW51czawaPHM1M0vAydXM\nLAEnVzOzBFq9FCsVJ1czqwTPXM3MEnByNTNLwMnVzCyBLsutTq5mVg2euZqZpeDkambWef1ddilW\nqw0Kzcy6SqsNCrNtm3Z/zdY5POv++pCkO/Li8czVzCqhr8WywKDur++l1pXgHkkzI2LxoHXGA5cA\n74+IJyVtkxtPS9GYmXWZNmauA91fI2ItsKH762CfotYU4EmAiGjalAucXM2sIvoKLkMYqvvrjnXr\n7ApMkPQrSfdK+pu8eJKWBZ5YviLl8CPu2JlfKjuEjrvxmEvKDiGJw1/4bNkhdNzdT95bdghdrVFZ\n4NmHlvPsQ08327RI99fNgXdS6xs4mlqLq99ExJJGG7jmamaV0Ohk1TZTdmCbKTsMvH/kPx+oX6VI\n99cngJVZ/6xXJc0F9gYaJleXBcysEvr7+gotQxjo/ippC2rdX2fWrXMjcLCkfkmjgQOA/2sWj2eu\nZlYJrc4Ui3R/jYjfSboVWAisB66MCCdXM6u+Vi/FgsLdXy8ELiw6ppOrmVWCny1gZpZAOzPXFJxc\nzawSuiu1OrmaWUVsNvSVAKVxcjWzSnDN1cwsAddczcwS6K7U6uRqZhXhmauZWQJOrmZmCfiElplZ\nAv1OrmZmneeygJlZAk6uZmYJdFvNddj3i0m6StIDkhZKul7SlikCMzMbjjZ6aOW21s7aaq/OWmsv\nkHRGXjytzFz/PiJezHb4PeA0YEYL45iZdUyrM9cirbUzcyLimKLjNp25ShojaVY2U10k6bhBiVXA\nKCC3xayZWWp9UqFlCEVaa8MwbwLLKwtMA5ZFxD4RMQW4FUDS1
cByYC/gR8PZoZlZCm300CrSWjuA\ngyQ9KOlmSXvmxZNXFlgIXCjpXOCmiJgHEBEnSuqjNpX+JnDWUBuvm//ndraaNJa+ncbmxWNmVbfi\nVVjxWseH7WswsXz8/sd5YsHjzTYt0lr7fmCniHhF0geAG4C3N9ugaXKNiCWSpgLTgRmSbo+Is7Pv\n1ku6Dji90fb979q+QMxmtknZdlRt2WDx8x0ZtlHNded9d2bnfXceeP/rq++qXyW3tfaGcmj2+hZJ\nl0qaEBHPNYonr+Y6EXgtIq6l1phrX0mTs+8EHAMsaDaGmdlIaKPmmttaW9J2Wc5D0v6AmiVWyC8L\nTAEukLQeWAOcClwjadygoL6UM4aZWXJq8aGDRVprAx8DviDpdeAV4JN54+aVBWYDs+s+PriF+M3M\nkmrnJoK81toRcQlwyXDG9B1aZlYJvv3VzCyBfvWXHcJGnFzNrBK67dkCTq5mVgmtntBKxcnVzCrB\nNVczswRcFjAzS6Bv+E9QTcrJ1cwqwTNXM7ME+uWZq5lZx3nmamaWgK8WMDNLwNe5mpkl0NdlNdfu\nisbMrEWSCi0Ntm3a/XXQevtJel3SR/Li8czVzCqh1asFinZ/zdY7j1ovwdwahGeuZlYJKvjfEIp2\nfz0N+C9gRZF4nFzNrBLaaPOS2/1V0o7UEu5l2Ue5TQ1dFhiG2+bcX3YIHXf4C58tO4Qk7vjMj8sO\noeO2WXxY2SEk8TKPdmQcNSgL/P63D/PwPUuabVqk++v3gX+KiMh6aeWWBZxczawSGl2Ktfv+u7H7\n/rsNvJ916S31q+R2fwX2Ba7LTohtA3xA0tqImEkDTq5mVglt3EQw0P0VeIpa99fjB68QEW/b8FrS\n1cDPmyVWcHI1s4po9fbXgt1fh83J1cwqoZ0Ht+R1f637/MQiYzq5mlklNDqhVRYnVzOrhD4/W8DM\nrPP8yEEzswT8VCwzswQ8czUzS8A1VzOzBPrUX3YIG3FyNbNKcFnAzCwBn9AyM0vAM1czswR8QsvM\nLAHPXM3MElCXNVZpORpJF0t6sZPBmJm1qk99hZah5HV/lXSspAclLZB0n6T35MXT0sxV0l8B4ynW\nHsHMLLlWH5ZdsPvrbRFxY7b+FOB6YJem8eTsdIykWZIekLRI0sezQM4HTqdAHxkzs5GQsvtrRLw8\n6O1YYGVePHkz12nAsoiYDiBpHHAqcGNEPN1tBWQz23S1kY+G6v56wBDjfwj4LjARODJv0LzkuhC4\nUNK5wE3AUuBjwOEqcCTr5j/958AmjaVvp7F5m5hZxa17/AXWPd750zWNTmgtnL+IhfMfarZpofJm\nRNwA3CDpEODfgN2ard80uUbEEklTgenADOCX1OoMj2SrjJb0cES8fajt+9+1fZGYzWwT0v/mcfS/\nedzA+7V3Le/IuI3me3sftBd7H7TXwPtrv39d/SpFur8OiIg7JW0maeuIeLbRek2Tq6SJwKqIuFbS\nauCkiJg46PsXGyVWM7OR1MZNBLndXyVNBpZGREh6J0CzxAr5ZYEpwAWS1gNrgC/Ufe+rBcysKzS6\nzCpPwe6vHwVOkLQWeAn4ZN64eWWB2cDsJt+Pa/SdmdlIaucEe17314g4n9pVUoX5Di0zq4Ruu0PL\nydXMKqHVmwhScXI1s0rw81zNzBLotpuanFzNrBJavVogFSdXM6uEPp/QMjPrPJcFzMwS8AktM7ME\nPHM1M0vAM1czswScXM3MEpAvxTIz6zzXXM3MEui2skB3zaPNzFrURoPCIq21P5211l4o6S5Jew01\nzmCeuZpZJbRaFijYWnspcGhErJY0DbgCOLDZuJ65mlklJG6tPT8iVmdv7wYm5cWTdOa69qU1KYcf\neY++UHYEHXf3k/eWHUIS2yw+rOwQOm7lOXPKDiGJUeeN6cg4bdRcC7XWHuQk4Oa8QV0WMLNKaPRU\nrN/Ou5d75jWdRBTuBSjpr
4HPAe/OW9fJ1cwqoVHN9YBD9uOAQ/YbeH/Z+VfUr1KotXZ2EutKYFpE\nrMqLxzVXM6uENmquA621JW1BrbX2zI3Glt4M/Az4TEQ8UiQez1zNrBJarbkWbK39bWAr4LJshrw2\nIvZvNq6Tq5lVQuLW2icDJw9nTCdXM6uEbrtDy8nVzCrBydXMLAE/FcvMLAE/FcvMLAGXBczMEnBy\nNTNLwGUBM7MEPHM1M0vAydXMLIFGT8Uqi5OrmVWCa65mZgm4LGBmlkR3JddhFykk/VjSUkkLsiW3\nC6KZWWoquAy5bX73190lzZf0mqR/LBJPKzPXAL4aET9rYVszsyQSd399FjgN+FDRcZvOXCWNkTRL\n0gOSFkk6bsNXwwvfzCytxN1fV0TEvcDaovHklQWmAcsiYp+ImALcmn3+XUkPSrooa4tgZlaylgsD\nQ3V/3bHdaPLKAguBCyWdC9wUEfMkfT0ins6S6hXA14Czh9z6vhV/fj1xNOzQmRa6Zta75t4xl7lz\n7uz4uI3KAnfN/TW/nju/2aaFu78OK56I5uNKGg9MB04Bbo+Iswd9dxi1+uvRQ2wXnLJHh8Mt2aMv\nlB1B521ZzX94jNll67JD6LiV58wpO4QkRm02hohoq9QoKZ55dVmhdbcbteNG+5N0IPDPETEte/91\nYH1EnDfEfs4EXoqI7+Xtp+nMVdJEYFVEXCtpNXCSpO2zmauADwOLCh2RmVlCbVznOtD9FXiKWvfX\n4xvupqC8ssAU4AJJ64E1wBeBayVtm+1kAfCNojszM0slZfdXSdsD9wDjgPWSvgLsGREvNRq3aXKN\niNnA7LqPj2jpCMzMulSB7q9PAzsNZ0zfoWVmleBnC5iZJeBnC5iZJeHkambWcd2VWp1czawiXHM1\nM0vCydXMrOO6K7U6uZpZRajLemh1VzRmZhXhmauZVYKvczUzS8LJ1cys47ortTq5mllFdNt1rr1/\nQuupl8uOII3n/lR2BGmseLXsCDpu3eMVfIg6tY4BvaX1/q953V+zdS7Ovn9Q0tS8aHo/uS5/pewI\n0lhV1eT6WtkRdNy6x18sO4QkUrRiSanVBoWDur9OA/YEjpe0R906RwG7RMSuwOeBy/Li6f3kamZG\nrSxQZBlCbvdX4BjgGoCIuBsYL2m7ZvE4uZrZpq5I99eh1pnUbNC0J7SuXJx0+AH3rxyZ/Yy0R6v5\nz00WP598Fy/zaPJ9DLb2ruXJ9zHqvJHvnnzO2d8Z8X22avRmY1vdtGj31/ppb9PtkiXXdrs5mpkV\n1Wa+WcbGLVx2ojYzbbbOpOyzhlwWMLNN3UD3V0lbUOv+OrNunZnACTDQivv5iHim2aC+ztXMNmlF\nur9GxM2SjpL0CPAycGLeuIooWm4wM7OiXBYwM0ugEslV0hVlx9AqSZtJ+jtJMyS9u+67M8qKy8za\n0zPJVdKEBsvWwPSy42vD5cChwLPAxZIuGvTdR8sJqX2StpR0RvaLYzNJZ0q6SdLZkkaVHV+nSHq4\n7BjaJWmvQa+3kPQtST+X9B1Jo8uMrZf1TM1V0nrgjw2+3jEithjJeDpF0qKImJK93hy4FNga+BQw\nPyJy72HuRpKuB5YCo4C9gIXU7nw5BpgQEZ8rMbyWSHqR2rWNgy/7GQ28AkREjCslsDZJWrDh71n2\ny30CcDXwYWo/qxPKjK9X9dLVAkuBIyJiowSr2v1sj5cTUkdsvuFFduvdKZLOBG4HWr4qugtMjogP\nZz+f5cChEbFe0p3AgyXH1qqrgfHA6RHxdHZsSyPirSXH1UlHAPtFxBpJc6n9UrQW9ExZAPg+sFX9\nh1Gbep8/8uF0zH2SPjD4g4g4i9r/yG8pJaLOWA8DP59bImLw+54UEV8GLgb+Q9JX6K3/f5rZUtJH\nJH0UGBURa2DgZ9WzP6+y9cxfjoj4YUQ8IOk4SeMAstrQ9cBdJYfXsoj4dETcUn9c1OrIB5Y
bXVvu\nk/QmgIgYuCZQ0mSgZ5/RFxH3Au/L3t4BvLG8aDpmLnA08EHgLknbA2R/rigzsJ4WET21AIuyPw+m\n9pf7g8DdZcfl42p4XMcB47LX3wJuoPbPztJja+OYPg6MA3YAvg1cD7yz7LgS/KyuB/YtO65eXXpm\n5jrIuuzPDwJXRsRNQE+ezKpT1eP6VkS8IOlgavW8H1F7dmYv+3ZEvAC8DXgPcBUFnu/ZA+p/VldR\nO8FqLejF5Losu671E8AsSW+kN4+jXlWPq4q/NKp4TFDd4ypFz1yKtYGkMdSeGL4wIpZImghMiYjZ\nJYfWlgof1yxqTw96HzAVeI1auWPvUgNrQxWPCap7XGXpueRqvaWKvzSqeExQ3eMqi5OrmVkCVajp\nmZl1HSdXM7MEnFzNzBJwcjUzS+D/AT5OodAg6zElAAAAAElFTkSuQmCC\n", "text": [ "" ] } ], "prompt_number": 31 }, { "cell_type": "code", "collapsed": false, "input": [ "msa_dm = msa.distances()\n", "fig = msa_dm.plot(cmap='Greens')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "display_data", "png": "iVBORw0KGgoAAAANSUhEUgAAAVcAAAD+CAYAAACOVUaKAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAF9tJREFUeJzt3X+UHXV5x/H3ZzdAIHGFQA/GsLIFQwvnJDUEQlVIUWJd\niRptqzFqrQg2/RG1p/VAbaXUk4AolENtxBM1oqelzeFUkaiE5IBC+BEwCSGRdiMJ6UKyFC0/JKDQ\nZJunf9zJcrnsvXf23jt770w+r5w5uTPzne88QzYP3zzznTuKCMzMrLW62h2AmVkRObmamWXAydXM\nLANOrmZmGXByNTPLgJOrmVkGnFzN7JAnqV/Sdkk7JF0yyv7jJN0q6UFJD0n6aN0+Pc/VzA5lkrqB\nnwLzgCFgI7AoIgbK2vw9cEREfEbScUn74yNiuFq/EzIM2FnbzFKJCDVz/FjzTcX55gA7I2Iw6WsV\nsAAYKGvz38DM5HMP8FStxAoZJleAF4Z/mWX3ACz73OV89rK/zfw8AMdf+pZxOQ/Ai3ftYeI5J2R+\nnmN6Jmd+jnK/uG0XR887KfPzPHr7TzM/x4hH9sLJPZmfpufMaZmfo9x4/Qzu/fyPW9PRvJT/fW4b\nqtwyDdhdtr4HOKuizdeAH0p6HHgV8P56p3HN1cyKoVvplldKM+r9G+DBiHgt8Abgy5JeVeuATEeu\nZmbjRlUqC0+9CE//b60jh4DesvVeSqPXcm8CLgeIiEck/RfwG8Cmap3mfuQ693fOaXcImZjwuuz/\nmdkOE086pt0htN4xR7Q7gkzk7mdQVZbjJsIpr35peaVNwHRJfZIOBxYCqyvabKd0wwtJx1NKrLtq\nhZP7kevcc+e2O4RMTDgxZz/YKRUyuU4paHLN289gtZFrHRExLGkJsBboBlZGxICkxcn+FcAVwPWS\ntlIalF4cEU/X6jf3ydXMDGjq3+ERsQZYU7FtRdnnJ4F3jaVPJ1czK4YGR65ZcXI1s2LorNzq5Gpm\nBTH6NKu2cXI1s2JwWcDMLAOdlVudXM2sILo6K7s6uZpZMXRWbnVyNbOCcM3VzCwDLguYmWXAydXM\nLAOdlVvH/jSupG9K2iVpS7LMrH+UmVnGupRuGSeNjFwD+HREfKfVwZiZNazDRq41k6ukScCNlF6D\n0A0sPbgr47jMzMamw2YL1CsL9ANDEfGGiJgB3Jps/7ykrZKuSb5c1sysvbpSLuMYTi3bgLdJulLS\n2RGxF/hMRJwCnAlMAV7xj
m8zs3EnpVvGSc2yQETskDQLmA8sk3R7RCxN9u2TdD3w6WrHL/vc5SOf\n5/7OOYV9a4CZpTf86F6GH9vb+o7zNBVL0lTgmYi4QdKzwIWSXhMRT0gS8F7gJ9WOH69XXptZfkw4\nsedlr5DZd/fjrem4w94IWG+2wAzgKkkHgH3AnwE3SPo1Sje1tlB65ayZWXt12A2temWBdcC6is3n\nZReOmVmDmsitkvqBaynNivp6RHyhYv+ngQ8lqxOAU4HjIuIX1frssIG0mVmDGnyIQFI3sJzS7KjT\ngEWSTi1vExFXR8SsiJgFfAa4o1ZiBSdXMyuKxmcLzAF2RsRgROwHVgELapzpg8C/1QvHydXMCkFd\nSrWMYhqwu2x9T7LtleeQjgLeDny7Xjz+4hYzKwRVuaEVe54n9jxf69AYw2neBdxdryQATq5mVhDV\nJguodzL0Th5Z/7/7f17ZZAjoLVvvpTR6Hc0HSFESAJcFzKwguqRUyyg2AdMl9SWP8y8EVlc2kvRq\nYC5wc5p4PHI1s0KoVhaoJyKGJS0B1lKairUyIgYkLU72r0iavgdYGxEvpOnXydXMCqHR5AoQEWuA\nNRXbVlSsfwv4Vto+nVzNrBCaSa5ZcHI1s0LoytMXt5iZ5YVHrmZmGVCHvSDFydXMCsEjVzOzDHRY\nbnVyNbNiqPKAQNs4uZpZIbgsYGaWgUNqKtbxl74ly+7H3c+W/qjdIbRc0f6MDrrs8ovaHULL3bT5\ngXaHkIltLerHI1czsww4uZqZZcDJ1cwsA06uZmYZ6LDc6uRqZsXgkauZWQa6uzrrxSpOrmZWCB02\ncHVyNbNi6LSyQGeNo83MGqSUv0Y9VuqXtF3SDkmXVGlzrqQtkh6SdEe9eDxyNbNCaHTkKqkbWA7M\no/Sa7Y2SVkfEQFmbo4EvA2+PiD2SjqvXr0euZlYIklIto5gD7IyIwYjYD6wCFlS0+SDw7YjYAxAR\nT9aLx8nVzApBSreMYhqwu2x9T7Kt3HRgiqQfSdok6Q/rxeOygJkVQlfjU7EiRZvDgNOB84CjgA2S\n7ouIHdUOcHI1s0KoVnP91c6neeGRp2sdOgT0lq33Uhq9ltsNPBkRLwAvSFoP/Bbg5GpmxVbtftak\n6VOYNH3KyPrT6x6pbLIJmC6pD3gcWAgsqmhzM7A8ufl1BHAWcE2teJxczawQGp0tEBHDkpYAa4Fu\nYGVEDEhanOxfERHbJd1K6etnDwBfi4j/rNWvk6uZFUIzDxFExBpgTcW2FRXrVwNXp+3TydXMCiH3\nT2hJWiJpp6QDkqbUP8LMLHtdXUq1jFs8DRxzN6XpCI+2OBYzs4Y18RBBJmqWBSRNAm6kNKG2G1ga\nETcm+7KPzswspU7LSfVqrv3AUETMB5DUk31IZmZj12G5tW5y3QZcLelK4PsRcfdYOn/xrpfm4U54\nXQ8TTnRuNjvUPf/wkzz/8FMt7zdXI9eI2CFpFjAfWCbp9ohYmrbzieec0Gx8ZlYwk085jsmnvPSl\nUj+/5eHWdJyn5CppKvBMRNwg6VngwmT7wavorKsxs0NWrkauwAzgKkkHgH3An0r6BHAxcDywTdIP\nIuKPM47TzKym8ZxmlUa9ssA6YF3F5geAf8osIjOzBuRt5GpmlgtOrmZmGeiw3OrkambF4JGrmVkG\nnFzNzDLg5GpmloFcTcUyM8sLj1zNzDLg5GpmloFOS64Nv+jbzKyTSOmW0Y9Vv6TtknZIumSU/edK\nelbSlmT5bL14PHI1s0JodOSavC57OTAPGAI2SlodEQMVTe+MiHen7dcjVzMrhsaHrnOAnRExGBH7\ngVXAgtHOMJZwnFzNrBC6u5RqGcU0YHfZ+p5kW7kA3iRpq6RbJJ1WLx6XBcysEJq4oRUp2jwA9EbE\nryS9A/gucEqtA5xczawQuqok16cfeoJn/uOJWocOAb1l672URq8jIuK5ss9rJF0naUpEPF2
tUydX\nMyuEaiPXY2dM5dgZU0fWd924tbLJJmC6pD7gcWAhsKii7+OBn0dESJoDqFZiBSdXMyuIRm8gRcSw\npCXAWqAbWBkRA5IWJ/tXAH9A6U0sw8CvgA/U61cRacoNYycpTvz8WzPpu12e2ft8u0NouZ8t/VG7\nQ8jEols+1e4QWu668y5tdwiZeO2kE4mIpp4AkBTn33RBqra3vPf6ps+XhkeuZlYInfaElpOrmRVC\nd1dnzSx1cjWzQuis1OrkamYFUW0qVrs4uZpZIbjmamaWAY9czcwy0Fmp1cnVzApigmcLmJm1nmuu\nZmYZcM3VzCwDnZVanVzNrCA8cjUzy4CTq5lZBnxDy8wsA91OrmZmreeygJlZBjotuTb8SIOkL0l6\nrn5LM7PsSUq1jJeGkqukM4CjSfdKWjOzzHWlXEYjqV/Sdkk7JF1S7RySzpQ0LOn30sRTlaRJkn4g\n6UFJP5H0PkndwBeBi+m8ebtmdohqdOSa5LTlQD9wGrBI0qlV2n0BuJUUua9ezbUfGIqI+UnnPcAS\n4OaIeKLTpj6Y2aGriZrrHGBnRAwCSFoFLAAGKtp9Avh34MxU8dTZvw14m6QrJZ0NTKb0itnlcmY1\nsw7S3dWVahnFNGB32fqeZNsISdMoJdyvJJvqlkRrjlwjYoekWcB8YBnwQ+D1wM6kyVGSHo6IU0Y7\n/he37Rr5PPGkY5h40jH14jGzgrt3/Qbuveu+lvfb1XiVMs29o2uBv46ISAaWzZUFJE0FnomIGyQ9\nC1wYEVPL9j9XLbECHD3vpBQxm9mh5E1z38ib5r5xZP2aK65tSb/V/jH92AOP8tiWx2odOgT0lq33\nUhq9lpsNrErOcRzwDkn7I2J1tU7r1VxnAFdJOgDsA/60Yr9nC5hZR6hWc+2b3Uff7L6R9Xu/cU9l\nk03AdEl9wOPAQmBReYOIGBkpSroe+F6txAr1ywLrgHU19vfUOt7MbLyowbJARAxLWgKsBbqBlREx\nIGlxsn9FI/36CS0zK4Rm7rFHxBpgTcW2UZNqRFyQpk8nVzMrhE57/NXJ1cwKoVvd7Q7hZZxczawQ\nOm3qvZOrmRVCoze0suLkamaF4JqrmVkGXBYwM8tAV+NfT50JJ1czKwSPXM3MMtAtj1zNzFrOI1cz\nswx4toCZWQY8z9XMLANdrrmambWea65mZhnwbAEzswy45mpmloFDarbAo7f/NMvux91ll1/U7hBa\nbtEtn2p3CJn4t/P/sd0htNyVm7/Y7hA6mlwWMDNrvU4rC3RWqjcza1CXlGoZjaR+Sdsl7ZB0ySj7\nF0jaKmmLpM2S3lovHo9czawQGp2KJakbWA7MA4aAjZJWR8RAWbPbIuLmpP0M4Cbg9bX6dXI1s0Jo\nYirWHGBnRAwCSFoFLABGkmtE/LKs/WTgyXqdOrmaWSE0cUNrGrC7bH0PcNYr+9d7gM8DU4Hfrdep\na65mVghdKNUyikjTf0R8NyJOBd4F/HO99h65mlkhVKu5Dty/nYH7a04LHQJ6y9Z7KY1eRxURd0ma\nIOnYiHiqWjsnVzMrhGpTsU4761ROO+vUkfWblq+ubLIJmC6pD3gcWAgselnf0snArogISacD1Eqs\n4ORqZgXR6GyBiBiWtARYC3QDKyNiQNLiZP8K4PeBj0jaDzwPfKBev06uZlYIVeqpqUTEGmBNxbYV\nZZ+/CIzpETknVzMrhC51tzuEl3FyNbNC8Pe5mplloNO+W8DJ1cwKwSNXM7MMNHNDKwtOrmZWCB65\nmpllQB32NH9DyVXSDcBsYD/wY2BxRAy3MjAzs7HotFdrNxrNv0TEb0bEDOBIoHjvPzGzXGnmy7Kz\nUHfkKmkScCOlr+XqBpZGxI1lTTYCJ2QTnplZOnmcitUPDEXEfABJPQd3SDoM+DDwyWzCMzNLJ483\ntLYBV0u6Evh+RNxdtu864M6IuGfUIx/Z+9LnY46AKUc
0HqmZFcLg5kEGHxhseb+5u6EVETskzQLm\nA8sk3R4RSyVdBhwbER+vevDJPVV3mdmhqW92H32z+0bW7/z6nS3pN3cjV0lTgWci4gZJzwIXSrqI\n0msOzss6QDOzNPL4EMEM4CpJB4B9wJ8B9wGDwIbk/xbfjohlWQVpZlZPp03FSlMWWAesq9h8WDbh\nmJk1JndlATOzPMjdDS0zszwYzwcE0nByNbNC6LSHCDprHG1m1iBJqZYqx/ZL2i5ph6RLRtn/IUlb\nJW2TdI+kmfXi8cjVzAqh0dkCkrqB5cA8YAjYKGl1RAyUNdsFzI2IZyX1A18FfrtWv06uZlYIXY3/\nQ3wOsDMiBgEkrQIWACPJNSI2lLW/nxTfp+KygJkVQhNlgWnA7rL1Pcm2ai4EbqkXj0euZlYI1W5o\nbb5nCw/c+2CtQyP1OaS3AB8D3lyvrZOrmRVCtZtVZ5x9OmecffrI+sqrv1nZZAjoLVvvpTR6rex/\nJvA1oD8inqkXj8sCZlYISvlrFJuA6ZL6JB0OLARWv6xv6XXAd4APR8TONPF45GpmhdDoPNeIGJa0\nBFhL6YUAKyNiQNLiZP8K4O+AY4CvJCPk/RExp1a/Tq5mVghq4otbImINsKZi24qyzxcxxtdZObma\nWSH4i1vMzDLQaY+/OrmaWSE4uZqZZcBlATOzDHTayFURqR9OGFvHUvR8puZMhdzp6z2+3SG03K1/\ntLzdIWRixUPfaHcILffXsy9udwiZOHLCJCKiqcwoKbY9tSlV25nHntH0+dLwyNXMCiF379AyM8sD\n11zNzDLQaTVXJ1czKwQnVzOzDLgsYGaWAY9czcwy4ORqZpaBZr4VKwtOrmZWCK65mpllwGUBM7MM\nOLmamWWg08oCnVUBNjNrUBMvKERSv6TtknZIumSU/b8paYOkFyX9VZp4PHI1s0JotCwgqRtYDsyj\n9JrtjZJWR8RAWbOngE8A70nbr0euZlYIXepKtYxiDrAzIgYjYj+wClhQ3iAi/iciNgH7U8fTzMWY\nmXUKSamWUUwDdpet70m2NcVlATMrhCZmC2TyxoAxJ1dJK4HZlEa9jwAfjYhnWx2YmdnYjJ5c712/\ngQ133VfrwCGgt2y9l9LotSmNjFz/IiKeA5D0D5SKvMuaDcTMrBnVxq1vnvtG3jz3jSPr11xxbWWT\nTcB0SX3A48BCYNEYT/MKNZOrpEnAjZTqD93A0oi4Mdkn4EhgR9qTmZllpdF5rhExLGkJsJZSnlsZ\nEQOSFif7V0h6DbAR6AEOSPoUcFpEPF+t33oj135gKCLmJ8H3JL9fD7wD2Al8sqErMjNroWae0IqI\nNcCaim0ryj4/wctLB3XVmy2wDXibpCslnR0Re5MTXQC8Ntn/t2M5oZlZNpRyGR81R64RsUPSLGA+\nsEzS7RGxNNl3QNIqoOr7fl+866Wa8ITX9TDhxJ7WRG1mubX+jvWsv/OulvfbaY+/1qu5TgWeiYgb\nJD0LXCTp5Ih4JKm5vhvYUu34ieec0NpozSz35p47l7nnzh1Zv3zpFW2MJjv1aq4zgKskHQD2AUuA\nbx2svVK6y/bnGcZnZpZKrr4VKyLWAesqNp+dXThmZo3ptOTqx1/NzDLgx1/NrBBydUPLzCwvOq0s\n4ORqZgXh5Gpm1nKdlVqdXM2sIFxzNTPLhJOrmVnLdVZqdXI1s4LQ6O/HapvOisbMrCA8cjWzQvA8\nVzOzTDi5mpm1XGelVidXMyuITpvnmvsbWsOP7m13CJl4/uEn2x1CJu5dv6HdIbTc4ObBdoeQifV3\nrG93CGPU+GteJPVL2i5ph6RLqrT5UrJ/a/KGlpryn1wfK2pyfardIWTi3trvj8+lwQcG2x1CJrJ4\nFUuWlPLXK46TuoHllF7IehqwSNKpFW3OB14fEdOBPwa+Ui+e3CdXMzMolQXSLKOYA+yMiMGI2A+s\nAhZUtHk38C2AiLg
fOFrS8bXicXI1s0PdNGB32fqeZFu9NjVfEqiIaEl0r+hYyqZjMyuciGjqbtRY\n8035+ST9PtAfER9P1j8MnBURnyhr8z3gyoi4J1m/Dbg4Ih6odo7MZgs0+x/LzCytJvPNENBbtt5L\naWRaq80JybaqXBYws0PdJmC6pD5JhwMLgdUVbVYDHwGQ9NvALyLiZ7U69TxXMzukRcSwpCXAWqAb\nWBkRA5IWJ/tXRMQtks6XtBP4JXBBvX4zq7mamR3KXBYwM8tAbpKrpJllnw+XdKmk70m6QtJR7Ywt\nK5K+2u4YWknSw+2OoRmSJkj6E0nLJL25Yt9n2xVXsyS9WtJnk2ubIOkySd+XtFTSke2OL69yUxaQ\ntCUiZiWfrwGmANcD7wWmRMRH2hlfoyRNqbYL2BYRlfPtckHSc0Dw8ucNjwJ+BURE9LQlsCZIWgkc\nCWwEPgzcGRF/mewb+fnMG0k3AbsoXdtMYBulifTvpvR362NtDC+38ppctwJnRsQ+lR652BYRM9ob\nYWMkHQAerbJ7WkQcPp7xtIqkLwFHU5oL+ETy57QrIn69zaE1TNJPDv6cSToMuA44FvggsCHHyXVb\nRMxM/oz+G3htRBxI1rdGxMw6Xdgo8jRb4NWSfo/SSOjIiNgHpSFQzh9Y2AWcFxEvS7DJD/Zj7Qmp\neRHxSUlnAP8q6WZKz27n3WEHPySPSX5c0mXA7cDktkXVvAMw8ndpTUSUr7c3shzLTc0VWA+8C3gn\ncI+k1wAkv/9POwNr0rXAMZUbo/RPii+OfzitExGbgLclq3cAE9sXTUtslvSO8g0R8TlK5am+tkTU\nGpslvQogIkamGEk6GSjmNyONg9yUBQ6S9H7g1ojYK+lS4HRgWURsbnNoTalyXUtrPV6XB5LeR2n+\n4GTgImAWOb+uQ+hncDZweURsbHNouZSnketBlyZ/+GcD5wErKdW+8m6066r7tWY58HcRsRc4CXgr\nxbiuQ+Vn8OsUo5zTFnlMrv+X/P5O4GsR8X0glzd9Kvi68qOI1wTFva62yGNyHUrmfy4EfiBpIvm8\njkq+rvwo4jVBca+rLfJYc51E6RvDt0XEDklTgRkRsa7NoTXF15UfRbwmKO51tUvukquZWR54yG9m\nlgEnVzOzDDi5mpllwMnVzCwDTq5mZhn4f07XhWG0wtKoAAAAAElFTkSuQmCC\n", "text": [ "" ] } ], "prompt_number": 32 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The UPGMA trees that result from these alignments are very different. First we'll look at the guide tree, and then the tree resulting from the progressive multiple sequence alignment." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "d = dendrogram(guide_lm, labels=guide_dm.ids, orientation='right', \n", " link_color_func=lambda x: 'black')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "display_data", "png": "iVBORw0KGgoAAAANSUhEUgAAAWsAAAD7CAYAAACsV7WPAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAC85JREFUeJzt3W2Ipeddx/Hvr7sptIvb3SFSl02LUKJV2cb6UKsWXQmV\nbadWjA8vfADrNog1rW8kSkFX2WiD0SJFLDaEWHChXVqlJQmxmholpQkG9smHQHZrtV2tmGbYRmlI\nwv59MWfLdDJz5uzsnHPPf873A8POnPuea669mPnOvdeZ2TtVhSRpe3vJ0BOQJG3MWEtSA8Zakhow\n1pLUgLGWpAaMtSQ1sHvcwST+XJ8kbUJVZSvH2/DKuqp2/MuxY8cGn8N2e3FNXBPXZfMv0+A2iCQ1\nYKwlqQFjDRw+fHjoKWw7rsmLuSZrc11mI+P2V5LUtPZfJGmnSkLN+glGSdLwjLUkNWCsJakBYy1J\nDRhrSWrAWEtSA8Zakhow1pLUgLGWpAaMtSQ1YKwlqQFjLUkNGGtJGkCSe5KcTnI2yV8lecXY8/1f\n9yRpa03yv+4l+Yaqemb0+h8BS1V1x3rne2UtSVOWZE+S+0dX0ueS/MyKUAd4GfDUuDHG3jBXkrQl\njgAXq2oRIMne0Z/3Am8BzgPvGTeA2yACYGFhgaWlpaGnIe0YK7dBktwIfAr4KHBfVT2y4thLgD8B\n/ruqfne98Yy1gK/tsQ09DWlHWGvPOsk+YBG4FXioqo6vOPZDwO1V9bb1xnQbRJKmLMkBlp9APJHk\nEvDOJK+pqgujPeu3A6fGjWGsJWn6DgF3JbkMPAfcBnz4yt418Djwq+MGcBtEgNsg0lbyhrmSNKeM\ntSQ1YKwlqQFjLUkNGGtJasBYS1IDxlqSGjDWktSAsZakBoy1JDVgrCWpAWMtSQ0Ya0lqwFhLUgPG\nWpIaMNaS1ICxlqQGvK3XJngncEmz5m29NmEn3gJrJ/6dpKF4Wy9JmlPGWpIaMNaS1ICxlqQGjLUk\nNWCsJakBYy1JDRhrSWrAWEtSA8Zakhow1pLUgLGWpAaMtSQ1YKwlqQFjLUkNGGtJasBYS9IAkpxI\n8kSSc0nuSTL2zl3GWpKG8RdV9dqqOgS8DHjnuJO9B6MkTVmSPcBJ4CCwCzheVSdXnPKPwA3jxjDW\nkjR9R4CLVbUIkGTvlQNJrgN+HnjPuAHcBpGk6TsLvDnJnUneVFVfWXHsT4G/r6rPjBvAu5tvwk68\nE/jCwgJLS0tDT0PaMVbf3TzJPmARuBV4qKqOJzkG3FRVt2w0nrHehJ0Ya0lbZ9SIrHj7ALBUVc8m\neRtwFLgfeAdwc1U9u+GYxvrqGWtJ46wR6x8F7gIuA88B7wIeBT4P/O/otI9X1R3rjmmsr56xljTO\n6lhvBZ9glKQGjLUkNWCsJakBYy1JDRhrSWrAWEtSA8ZakhqYi//IyV+lltTdXPxSzFb/Eou/FCNp\nHH8pRpLmlLGWpAaMtSQ1YKwlqQFjLUkNGGtJasBYS1IDxlqSGjDWktSAsZakBoy1JDVgrCWpAWMt\nSQ0Ya0lqwFhLUgPGWpIaMNaS1ICxlqQGjLUkNWCsJakBYy1JDRhrSWrAWEtSA8Zakhow1pLUgLGW\npAaMtSQ1YKwlaUBJPpDkmY3OM9aSNJAk3wPsA2qjc421JE1Zkj1J7k9yOsm5JD+dZBfwB8DtQDYa\nY/f
UZylJOgJcrKpFgCR7gduAT1TVl5INW22sN2P//v1MsriSNHIW+MMkdwL3AZ8Dfgo4nAljkqr1\nt0qS1LjjXSRhJ/w9JPUwak5WPbYPWARuBT4N/Arw7Ojwq4ELVfUt643plbUkTVmSA8BSVZ1Icgk4\nWlUHVhx/ZlyowVhL0iwcAu5Kchl4juWr6pU2/Ke/2yCStMXW2ga5Vv7oniQ1MNVtkIWFBZaWlqb5\nISRpLkx1G2S7bD9sl3lImg9ug0jSnDLWktSAsZakBoy1JDVgrCWpAWMtSQ0Ya0lqwFhLUgPGWpIa\nMNaS1ICxlqQGjLUkNWCsJakBYy1JDRhrSWrAWEtSA8Zakhow1pLUgLGWpAaMtSQ1YKwlqQFjLUkN\nGGtJasBYS1IDxlqSGjDWktSAsZakBoy1JDVgrCWpAWMtSQ0Ya0lqwFhLUgPGWpIGkOS2JOeTXE6y\nsNH5xlqShvEIcDPw75OcvHu6c5EkJdkDnAQOAruA41V1cnRsojGMtSRN3xHgYlUtAiTZe7UDuA0i\nSdN3FnhzkjuTvKmqvnK1A8zFlfX+/fsn/qeGJG21qnoyyeuBReCOJA9V1fGrGWMuYv30008PPQVJ\nc2T1xWGSA8BSVZ1Icgk4Onr8yokbXk26DSJJ03cIeCzJKeC3gONJ3g38B8tPOp5N8qFxA6Sq1j+Y\n1LjjG0nCtby/JHU0at+W7r16ZS1JDRhrSWrAWEtSA8Zakhow1pLUgLGWpAaMtSQ1YKwlqQFjLUkN\nGGtJasBYS1IDxlqSGjDWktSAsZakBoy1JDVgrCWpAWMtSQ0Ya0lqwFhLUgPGWpIaMNaS1ICxlqQG\njLUkNWCsJakBYy1JDRhrSWrAWEtSA8Zakhow1pLUgLGWpAaMtSQ1YKwlqQFjLUkNGGtJasBYS1ID\nxlqSGjDWkjSAJH+e5HNJTo1eXjfu/N2zmpgk6esU8OtV9ZeTnGysJWnKkuwBTgIHgV3A8SuHJh3D\nbRBJmr4jwMWq+s6qOgQ8OHr8fUnOJHl/kpeOGyBVtf7BpMYd38jCwgJLS0ubfn9J6qqqvnbVnORG\n4FPAR4H7quqRJN9UVV8aRfpDwIWqOr7OcNONtSTNoyRfF+vRY/uAReBW4KGVYU7ywyzvX//YemO6\nZy1JU5bkALBUVSeSXAKOrriyDvATwLlxYxhrSZq+Q8BdSS4DzwHvAk4k+UaWn2Q8Bbx33ABug0jS\nFltrG+Ra+dMgktSAsZakBoy1JDVgrCWpAWMtSQ0Ya0lqwFhLUgPGWpIaMNaS1ICxlqQGjLUkNWCs\nJakBYy1JDRhrSWrAWAMPP/zw0FPYdlyTF3NN1ua6zIaxxk+2tbgmL+aarM11mQ1jLUkNGGtJamDD\n23rNcC6StGNs9W29xsZakrQ9uA0iSQ0Ya0lqYG5ineRIkieSPJnkN9Y4fn2SB5OcTvJPSX5xgGnO\n3EbrMjrncJJTo3V5eMZTnLlJ1mR03vcmeSHJLbOc3xAm+Pr5uSRnkpxN8pkkrxtinrM24dfPB0bH\nzyR5/aY/WFXt+BdgF3Ae+GbgOuA08G2rzvkd4H2j168HvgzsHnru22Bd9gH/DNxwZW2GnvfQa7Li\nvE8D9wE/OfS8h14T4PuBV4xePwI8OvS8t8m6vBV4YPT6913LuszLlfUbgPNV9fmqeh74CPDjq875\nL2Dv6PW9wJer6oUZznEIk6zLzwIfr6ovAlTVUzOe46xNsiYA7wY+BvzPLCc3kA3XpKo+W1WXRm8+\nBtww4zkOYZLPlbcDHwaoqseAfUleuZkPNi+xPgh8YcXbXxw9ttLdwHck+U/gDPBrM5rbkCZZlxuB\nhSR/l+TxJL8ws9kNY8M1SXKQ5S/KD44e2uk/UjXJ58lKR4EHpjqj7WGSdVnrnE19I9u9mXdqaJIv\npvcCp6vqcJLXAH+T5KaqembKcxvSJOtyHfBdwM3Ay4HPJnm0qp6c6
syGM8ma/DHwm1VVSQJs6c/T\nbkMTfzNK8iPALwE/OL3pbBuTrsvqz49NfXOfl1hfBF614u1XsfwdbqUfAH4PoKouJPk34FuBx2cy\nw2FMsi5fAJ6qqq8CX03yD8BNwE6N9SRr8t3AR5Y7zfXAW5I8X1WfnM0UZ26SNWH0pOLdwJGqWprR\n3IY0ybqsPueG0WNXb+hN+hk9EbAbuMDyEwEvZe0nAt4PHBu9/srRoi8MPfdtsC6vBf6W5SdTXg6c\nA7596LkPuSarzr8XuGXoeQ+9JsCrWX6y7Y1Dz3ebrcvKJxjfyDU8wTgXV9ZV9UKS24C/Zjk691TV\nvyb55dHxPwN+H7g3yRmW9/Jvr6qnB5v0DEyyLlX1RJIHgbPAZeDuqvqX4WY9XRN+rsyVCdfkt4H9\nwAdH/+J4vqreMNScZ2HCr58Hkrw1yXng/4B3bPbj+evmktTAvPw0iCS1ZqwlqQFjLUkNGGtJasBY\nS1IDxlqSGjDWktSAsZakBv4fsqqfGsmCeYIAAAAASUVORK5CYII=\n", "text": [ "" ] } ], "prompt_number": 33 }, { "cell_type": "code", "collapsed": false, "input": [ "msa_lm = average(msa_dm.condensed_form())\n", "d = dendrogram(msa_lm, labels=msa_dm.ids, orientation='right', \n", " link_color_func=lambda x: 'black')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "display_data", "png": "iVBORw0KGgoAAAANSUhEUgAAAWsAAAD7CAYAAACsV7WPAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAADLZJREFUeJzt3X2MZfVdx/H3p7s0gW1hd2zTEsCYVCo22VZqxapVV0nr\n0qVPtBpTHxIFYlopJrapxqholsRGojFNY2MJaRvdWEip2FDaklKJYoAUBXbVElkM2m5SU8q4Qh/C\nkv36x1zsdJm5c+4w9+HLvl/JhJk5Z898Obnz3jPnzOaXqkKStNieM+8BJEkbM9aS1ICxlqQGjLUk\nNWCsJakBYy1JDWwftzGJv9cnSZtQVdnK4214ZV1Vg9+uuuqqifZflDfndu5n89ydZ+869zR4G0SS\nGjDWktTAlsZ6z549W3m4mXHu2XLu2es6e9e5pyHj7q8kqWndf5GkZ6sk1KwfMEqS5s9YS1IDxlqS\nGjDWktSAsZakBoy1JDVgrCWpAWMtSQ0Ya0lqwFhLUgPGWpIaMNaS1ICxlqQ5SHIgyQNJDiW5LsnY\nlbuMtSTNx19V1XlVtRs4Fbhs3M5jSy5JeuaS7ABuAM4CtgH7q+qGVbt8ATh73DGMtSRN317gSFXt\nA0hy+lMbkpwC/CJw5bgDuPiAFtLS0hLLy8vzHkPatNWLDyQ5F7gVuB64uaruWLXtWuCxqvrNcccz\n1lpIo5U25j2GtClrrRSTZCewD7gcuK2q9ie5CnhFVV2y0TG9DSJJU5bkTGC5qg4kOQpcmuQy4HXA\nhYOO4ZW1FpFX1ursxCvrJK8DrgGOA08A7wTuAh4GHh/tdmNVXb3uMY21FpGxVmcumCtJJyljLUkN\nGGtJasBYS1IDxlqSGjDWktSAsZakBoy1JDVgrCWpAWMtSQ0Ya0lqwFhLUgPGWpIaMNaS1ICxlqQG\njLUkNWCsJakB12BcMK7qLWktLuu1YFzOaoXnQZ25rJcknaSMtSQ1YKwlqQFjLUkNGGtJasBYS1ID\nxlqSGjDWktSAsZakBoy1JDVgrCWpAWMtSQ0Ya0lqwFhLUgPGWpIaMNaS1ICxlqQ5SHJFksNJji
dZ\n2mh/Yy1J83EHcCHwn0N2dg1GSZqyJDuAG4CzgG3A/qq6YbRt0DGMtSRN317gSFXtA0hy+qQH8DaI\nJE3fQeC1Sd6X5DVV9b+THsAray2kXbt2Df7xUFp0VfVgkvOBfcDVSW6rqv2THMNYayE9+uij8x5B\n2rQTLzSSnAksV9WBJEeBS0eff2rHDa9MvA0iSdO3G7g7yb3A7wH7k7wL+C9WHjoeTPKhcQdIVa2/\nMalx27X1kuA5l3obfR9v6X08r6wlqQFjLUkNGGtJasBYS1IDxlqSGjDWktSAsZakBoy1JDVgrCWp\nAWMtSQ0Ya0lqwFhLUgPGWpIaMNaS1ICxlqQGjLUkNWCsJakBYy1JDUx1wdylpSWWl5en+SUk6aQw\n1TUYXU9wcp4zqT/XYJSkk5SxlqQGjLUkNWCsJakBYy1JDRhrSWrAWEtSA8Zakhow1pLUgLGWpAaM\ntSQ1YKwlqQFjLUkNGGtJasBYS1IDxlqSGjDWkjQHSa5Lcl+Sg0n+JskZY/d3pZjF4jmT+huyUkyS\n51fVY6P3/wRYrqqr19vfK2tJmrIkO5J8anQlfSjJz60KdYBTgUfGHWOqC+ZKkgDYCxypqn0ASU4f\n/ffDwEXAYeDKcQfwNsiC8ZxJT7e0tMTy8vK8x5jI6tsgSc4FbgWuB26uqjtWbXsO8AHgv6vqD9c7\nnrFeMJ4z6em6fV+sdc86yU5gH3A5cFtV7V+17SeA91bVxesd09sgkjRlSc5k5QHigSRHgcuSvKSq\nHhrds34jcO+4YxhrSZq+3cA1SY4DTwBXAB996t41cA/w6+MO4G2QBeM5k56u2/fFkF/dm5S/uidJ\nDRhrSWrAWEtSA8Zakhow1pLUgLGWpAaMtSQ1YKwlqQFjLUkNGGtJasBYS1IDxlqSGjDWktSAsZak\nBoy1JDVgrCWpAWMtSQ0Ya0lqwDUYF8yuXbtYWT9Tkr7NNRglLbxuLXENRkk6SRlrSWrAWEtSA8Za\nkhow1pLUgLGWpAaMtSQ1YKwlqQFjLUkNGGtJasBYS1IDxlqSGjDWktSAsZakBoy1JDVgrCVpjpK8\nP8ljG+1nrCVpTpK8CtgJbLiygrGWpClLsiPJp5Lcl+RQkp9Nsg34Y+C9wIaryrgGoyRN317gSFXt\nA0hyOnAF8LdV9ZUh6656ZS1J03cQeG2S9yV5DfA84G3ABzJwhWyvrCUtvF27djGwaQupqh5Mcj6w\nD7ga+DzwvcDh0S6nJfn3qnrpesdwdXNJ2mInrm6e5Exguaq+leRi4NKqesuq7Y9V1fPHHdMra0ma\nvt3ANUmOA08A7zhh+4ZXtV5ZS9IWO/HKeiv4gFGSGjDWktSAsZakBoy1JDVgrCWpAWMtSQ0Ya0lq\nwFhLUgPGWpIaMNaS1ICxlqQGjLUkNWCsJakBYy1JDRhrSWrAWEtSA8Zakhow1pLUgLGWpAaMtSQ1\nYKwlqQFjLUkNGGtJasBYS1IDxlqSGjDWktSAsZakBoy1JDVgrCWpAWMtSQ0Ya0lqwFhLUgPGWpIa\nMNaS1ICxlqQGjLUkNWCsJWkOknwkyX8kuXf09vJx+2+f1WCSpO9QwHuq6hNDdjbWkjRlSXYANwBn\nAduA/U9tGnoMb4NI0vTtBY5U1Q9U1W7gM6PP/1GS+5P8aZLnjjtAqmr9jUmN276RpaUllpeXN/3n\nJamrqvr/q+Yk5wK3AtcDN1fVHUleXFVfGUX6Q8BDVbV/ncNNN9aSdDJK8h2xHn1uJ7APuBy4bXWY\nk/wkK/ev37DeMb1nLUlTluRMYLmqDiQ5Cly66so6wFuAQ+OOYawlafp2A9ckOQ48AbwTOJDkhaw8\nZLwX+J1xB/A2iCRtsbVugzxT/jaIJDVgrCWpAWMtSQ0Ya0lqwFhLUgPGWpIaMNaS1ICxlqQGjLUk\nNWCsJakBYy1JDRhrSWrAWEtSA8ZakhrY0ljffvvtW3m4mX
Hu2XLu2es6e9e5p8FY49yz5tyz13X2\nrnNPg7dBJKkBYy1JDWy4rNcMZ5GkZ42tXtZrbKwlSYvB2yCS1ICxlqQGJo51kr1JHkjyYJLfWmP7\ne5LcO3o7lOTJJDu3ZtxnZqPZR/vsGc3+L0lun/GIaxpwzvckObrqvP/uPOY80ZDzPdrvh0avk0tm\nOd96BpzvNyW5f3Su/ynJT89jzhMNmPsXRnMfTPKPSV4+jznXMmD285LcmeRbSd49jxnXMrAp7x9t\nvz/J+Zv+YlU1+A3YBhwGvgc4BbgP+P4x+18MfG6SrzGttyGzAzuBfwXOHn38giZz7wE+Oe9ZN/Na\nGe33eeBm4K0d5gZ2rHp/N3C4ydw/Apwxen8vcNe8555g9hcCrwKuBt4975knmPv1wC2j93/4mZzz\nSa+sLxi9MB+uqmPAx4A3jdn/7cBfT/g1pmXI7G8HbqyqLwNU1SMznnEtQ8/5lj553gJD534X8HHg\nq7McbowN566qr6/68HlAi9dJVd1ZVUdHH94NnD3jGdczZPavVtU9wLF5DLiOIa/xNwIfBaiqu4Gd\nSV60mS82aazPAr606uMvjz73NElOA34GuHEzg03BkNnPBZaS/F2Se5L80symW9+QuQv40dGPWbck\nednMplvfhnMnOYuVF/cHR59ahF9NGvQaT/LmJF8EPg1cOaPZxhn8vTlyKXDLVCcabtLZF8WQudfa\nZ1N/SW6fcP9JvpneANxRVf8z4deYliGznwK8ErgQOA24M8ldVfXgVCcbb8jc/wycU1XfSHIRcBPw\n0umOtaEhc/8Z8NtVVUnCYvx0MOg1XlU3ATcl+XHgL4Hvm+pUA0YaumOSnwJ+Ffix6Y0zkUX4S3oz\nhs594ut6U/+/k8b6CHDOqo/PYeVvirX8PItzCwSGzf4l4JGq+ibwzSR/D7wCmGesN5y7qh5b9f6n\nk/x5kqWqenRGM65lyPn+QeBjK53mBcBFSY5V1SdnM+KaJnmNU1X/kGR7ku+qqq9Nfbr1DZp79FDx\nWmBvVS3PaLaNTHTOF8iQuU/c5+zR5yY34Q317cBDrNxQfy7rPzQ6A/gacOq8HwJMMjtwHvA5Vh4c\nnAYcAl7WYO4X8e1/4HQB8HCH833C/h8GLukwN/CSVef7lcBDTeb+blYeiL163vNu9rUC/AGL84Bx\nyDlf/YDx1TyDB4wTXVlX1ZNJrgA+OwradVX1xSS/Ntr+F6Nd3wx8tlauUBfCkNmr6oEknwEOAseB\na6vq3+Y39eBz/jbgHUmeBL7Byk81czXBa2WhDJz7rcAvJzkGPE6f8/37wC7gg6OfZo5V1QXzmvkp\nQ2ZP8mLgC8DpwPEkv8HKhdTjizx3Vd2S5PVJDgNfB35ls1/Pf24uSQ34LxglqQFjLUkNGGtJasBY\nS1IDxlqSGjDWktSAsZakBoy1JDXwfyLSyTiWENvGAAAAAElFTkSuQmCC\n", "text": [ "" ] } ], "prompt_number": 34 }, { "cell_type": "markdown", "metadata": {}, "source": [ "And we can wrap this all up in a single convenience function:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from iab.algorithms import progressive_msa_and_tree\n", "%psource progressive_msa_and_tree" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\u001b[0;32mdef\u001b[0m 
\u001b[0mprogressive_msa_and_tree\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msequences\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mpairwise_aligner\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0msequence_distance_fn\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mkmer_distance\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mguide_tree\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mdisplay_aln\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mFalse\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mdisplay_tree\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mFalse\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;34m\"\"\" Perform progressive msa of sequences and build a UPGMA tree\u001b[0m\n", "\u001b[0;34m Parameters\u001b[0m\n", "\u001b[0;34m ----------\u001b[0m\n", "\u001b[0;34m sequences : skbio.SequenceCollection\u001b[0m\n", "\u001b[0;34m The sequences to be aligned.\u001b[0m\n", "\u001b[0;34m pairwise_aligner : function\u001b[0m\n", "\u001b[0;34m Function that should be used to perform the pairwise alignments,\u001b[0m\n", "\u001b[0;34m for example skbio.Alignment.global_pairwise_align_nucleotide. Must\u001b[0m\n", "\u001b[0;34m support skbio.BiologicalSequence objects or skbio.Alignment objects\u001b[0m\n", "\u001b[0;34m as input.\u001b[0m\n", "\u001b[0;34m sequence_distance_fn : function\u001b[0m\n", "\u001b[0;34m Function that returns and skbio.DistanceMatrix given an\u001b[0m\n", "\u001b[0;34m skbio.SequenceCollection. 
This will be used to build a guide tree if\u001b[0m\n", "\u001b[0;34m one is not provided.\u001b[0m\n", "\u001b[0;34m guide_tree : skbio.TreeNode, optional\u001b[0m\n", "\u001b[0;34m The tree that should be used to guide the alignment process.\u001b[0m\n", "\u001b[0;34m display_aln : bool, optional\u001b[0m\n", "\u001b[0;34m Print the alignment before returning.\u001b[0m\n", "\u001b[0;34m display_tree : bool, optional\u001b[0m\n", "\u001b[0;34m Print the tree before returning.\u001b[0m\n", "\u001b[0;34m\u001b[0m\n", "\u001b[0;34m Returns\u001b[0m\n", "\u001b[0;34m -------\u001b[0m\n", "\u001b[0;34m skbio.alignment\u001b[0m\n", "\u001b[0;34m skbio.TreeNode\u001b[0m\n", "\u001b[0;34m\u001b[0m\n", "\u001b[0;34m \"\"\"\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mguide_tree\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mguide_dm\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msequences\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdistances\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msequence_distance_fn\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mguide_lm\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0maverage\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mguide_dm\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcondensed_form\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mguide_tree\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mTreeNode\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfrom_linkage_matrix\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mguide_lm\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mguide_dm\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mids\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mmsa\u001b[0m \u001b[0;34m=\u001b[0m 
\u001b[0mprogressive_msa\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msequences\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mguide_tree\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mpairwise_aligner\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mpairwise_aligner\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mdisplay_aln\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmsa\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mmsa_dm\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mmsa\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdistances\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mmsa_lm\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0maverage\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmsa_dm\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcondensed_form\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mmsa_tree\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mTreeNode\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfrom_linkage_matrix\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmsa_lm\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmsa_dm\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mids\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mdisplay_tree\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"\\nOutput tree:\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0md\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdendrogram\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmsa_lm\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0mlabels\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmsa_dm\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mids\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0morientation\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'right'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mlink_color_func\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mlambda\u001b[0m \u001b[0mx\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'black'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mleaf_font_size\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m24\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mmsa\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmsa_tree\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\n" ] } ], "prompt_number": 35 }, { "cell_type": "code", "collapsed": false, "input": [ "msa, tree = progressive_msa_and_tree(query_sequences, pairwise_aligner=global_pairwise_align_nucleotide, display_tree=True, display_aln=True)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ ">s5\n", "---GGC--CCA-----CTGAT\n", ">s1\n", "ACCGGTGACCAGTTGACCAGT\n", ">s4\n", "---GGCACCAAACAGA--A--\n", ">s2\n", "ATCGGTACC-GGTAGA--AGT\n", ">s3\n", "---GGTACCAAATAGA--A--\n", "\n", "\n", "Output tree:\n" ] }, { "metadata": {}, "output_type": "display_data", "png": 
"iVBORw0KGgoAAAANSUhEUgAAAWsAAAD7CAYAAACsV7WPAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAADLZJREFUeJzt3X2MZfVdx/H3p7s0gW1hd2zTEsCYVCo22VZqxapVV0nr\n0qVPtBpTHxIFYlopJrapxqholsRGojFNY2MJaRvdWEip2FDaklKJYoAUBXbVElkM2m5SU8q4Qh/C\nkv36x1zsdJm5c+4w9+HLvl/JhJk5Z898Obnz3jPnzOaXqkKStNieM+8BJEkbM9aS1ICxlqQGjLUk\nNWCsJakBYy1JDWwftzGJv9cnSZtQVdnK4214ZV1Vg9+uuuqqifZflDfndu5n89ydZ+869zR4G0SS\nGjDWktTAlsZ6z549W3m4mXHu2XLu2es6e9e5pyHj7q8kqWndf5GkZ6sk1KwfMEqS5s9YS1IDxlqS\nGjDWktSAsZakBoy1JDVgrCWpAWMtSQ0Ya0lqwFhLUgPGWpIaMNaS1ICxlqQ5SHIgyQNJDiW5LsnY\nlbuMtSTNx19V1XlVtRs4Fbhs3M5jSy5JeuaS7ABuAM4CtgH7q+qGVbt8ATh73DGMtSRN317gSFXt\nA0hy+lMbkpwC/CJw5bgDuPiAFtLS0hLLy8vzHkPatNWLDyQ5F7gVuB64uaruWLXtWuCxqvrNcccz\n1lpIo5U25j2GtClrrRSTZCewD7gcuK2q9ie5CnhFVV2y0TG9DSJJU5bkTGC5qg4kOQpcmuQy4HXA\nhYOO4ZW1FpFX1ursxCvrJK8DrgGOA08A7wTuAh4GHh/tdmNVXb3uMY21FpGxVmcumCtJJyljLUkN\nGGtJasBYS1IDxlqSGjDWktSAsZakBoy1JDVgrCWpAWMtSQ0Ya0lqwFhLUgPGWpIaMNaS1ICxlqQG\njLUkNWCsJakB12BcMK7qLWktLuu1YFzOaoXnQZ25rJcknaSMtSQ1YKwlqQFjLUkNGGtJasBYS1ID\nxlqSGjDWktSAsZakBoy1JDVgrCWpAWMtSQ0Ya0lqwFhLUgPGWpIaMNaS1ICxlqQ5SHJFksNJjidZ\n2mh/Yy1J83EHcCHwn0N2dg1GSZqyJDuAG4CzgG3A/qq6YbRt0DGMtSRN317gSFXtA0hy+qQH8DaI\nJE3fQeC1Sd6X5DVV9b+THsAray2kXbt2Df7xUFp0VfVgkvOBfcDVSW6rqv2THMNYayE9+uij8x5B\n2rQTLzSSnAksV9WBJEeBS0eff2rHDa9MvA0iSdO3G7g7yb3A7wH7k7wL+C9WHjoeTPKhcQdIVa2/\nMalx27X1kuA5l3obfR9v6X08r6wlqQFjLUkNGGtJasBYS1IDxlqSGjDWktSAsZakBoy1JDVgrCWp\nAWMtSQ0Ya0lqwFhLUgPGWpIaMNaS1ICxlqQGjLUkNWCsJakBYy1JDUx1wdylpSWWl5en+SUk6aQw\n1TUYXU9wcp4zqT/XYJSkk5SxlqQGjLUkNWCsJakBYy1JDRhrSWrAWEtSA8Zakhow1pLUgLGWpAaM\ntSQ1YKwlqQFjLUkNGGtJasBYS1IDxlqSGjDWkjQHSa5Lcl+Sg0n+JskZY/d3pZjF4jmT+huyUkyS\n51fVY6P3/wRYrqqr19vfK2tJmrIkO5J8anQlfSjJz60KdYBTgUfGHWOqC+ZKkgDYCxypqn0ASU4f\n/ffDwEXAYeDKcQfwNsiC8ZxJT7e0tMTy8vK8x5jI6tsgSc4FbgWuB26uqjtWbXsO8AHgv6vqD9c7\nnrFeMJ4z6em6fV+sdc86yU5gH3A5cFtV7V+17SeA91bVxesd09sgkjRlSc5k5QHigSRHgcuSvKSq\nHhrds34jcO+4YxhrSZq+3cA1SY4DTwBXAB996t41cA/w6+MO4G2QBeM5k56u2/fFkF/dm5S/uidJ\nDRhrSWrAWEtSA8Zakhow1pLUgLGWpAaMtSQ1YKwlqQFjLUkNG
GtJasBYS1IDxlqSGjDWktSAsZak\nBoy1JDVgrCWpAWMtSQ0Ya0lqwDUYF8yuXbtYWT9Tkr7NNRglLbxuLXENRkk6SRlrSWrAWEtSA8Za\nkhow1pLUgLGWpAaMtSQ1YKwlqQFjLUkNGGtJasBYS1IDxlqSGjDWktSAsZakBoy1JDVgrCVpjpK8\nP8ljG+1nrCVpTpK8CtgJbLiygrGWpClLsiPJp5Lcl+RQkp9Nsg34Y+C9wIaryrgGoyRN317gSFXt\nA0hyOnAF8LdV9ZUh6656ZS1J03cQeG2S9yV5DfA84G3ABzJwhWyvrCUtvF27djGwaQupqh5Mcj6w\nD7ga+DzwvcDh0S6nJfn3qnrpesdwdXNJ2mInrm6e5Exguaq+leRi4NKqesuq7Y9V1fPHHdMra0ma\nvt3ANUmOA08A7zhh+4ZXtV5ZS9IWO/HKeiv4gFGSGjDWktSAsZakBoy1JDVgrCWpAWMtSQ0Ya0lq\nwFhLUgPGWpIaMNaS1ICxlqQGjLUkNWCsJakBYy1JDRhrSWrAWEtSA8Zakhow1pLUgLGWpAaMtSQ1\nYKwlqQFjLUkNGGtJasBYS1IDxlqSGjDWktSAsZakBoy1JDVgrCWpAWMtSQ0Ya0lqwFhLUgPGWpIa\nMNaS1ICxlqQGjLUkNWCsJWkOknwkyX8kuXf09vJx+2+f1WCSpO9QwHuq6hNDdjbWkjRlSXYANwBn\nAduA/U9tGnoMb4NI0vTtBY5U1Q9U1W7gM6PP/1GS+5P8aZLnjjtAqmr9jUmN276RpaUllpeXN/3n\nJamrqvr/q+Yk5wK3AtcDN1fVHUleXFVfGUX6Q8BDVbV/ncNNN9aSdDJK8h2xHn1uJ7APuBy4bXWY\nk/wkK/ev37DeMb1nLUlTluRMYLmqDiQ5Cly66so6wFuAQ+OOYawlafp2A9ckOQ48AbwTOJDkhaw8\nZLwX+J1xB/A2iCRtsbVugzxT/jaIJDVgrCWpAWMtSQ0Ya0lqwFhLUgPGWpIaMNaS1ICxlqQGjLUk\nNWCsJakBYy1JDRhrSWrAWEtSA8ZakhrY0ljffvvtW3m4mXHu2XLu2es6e9e5p8FY49yz5tyz13X2\nrnNPg7dBJKkBYy1JDWy4rNcMZ5GkZ42tXtZrbKwlSYvB2yCS1ICxlqQGJo51kr1JHkjyYJLfWmP7\ne5LcO3o7lOTJJDu3ZtxnZqPZR/vsGc3+L0lun/GIaxpwzvckObrqvP/uPOY80ZDzPdrvh0avk0tm\nOd96BpzvNyW5f3Su/ynJT89jzhMNmPsXRnMfTPKPSV4+jznXMmD285LcmeRbSd49jxnXMrAp7x9t\nvz/J+Zv+YlU1+A3YBhwGvgc4BbgP+P4x+18MfG6SrzGttyGzAzuBfwXOHn38giZz7wE+Oe9ZN/Na\nGe33eeBm4K0d5gZ2rHp/N3C4ydw/Apwxen8vcNe8555g9hcCrwKuBt4975knmPv1wC2j93/4mZzz\nSa+sLxi9MB+uqmPAx4A3jdn/7cBfT/g1pmXI7G8HbqyqLwNU1SMznnEtQ8/5lj553gJD534X8HHg\nq7McbowN566qr6/68HlAi9dJVd1ZVUdHH94NnD3jGdczZPavVtU9wLF5DLiOIa/xNwIfBaiqu4Gd\nSV60mS82aazPAr606uMvjz73NElOA34GuHEzg03BkNnPBZaS/F2Se5L80symW9+QuQv40dGPWbck\nednMplvfhnMnOYuVF/cHR59ahF9NGvQaT/LmJF8EPg1cOaPZxhn8vTlyKXDLVCcabtLZF8WQudfa\nZ1N/SW6fcP9JvpneANxRVf8z4deYliGznwK8ErgQOA24M8ldVfXgVCcbb8jc/wycU1XfSHIRcBPw\n0umOtaEhc/8Z8NtVVUnCYvx0MOg1XlU3ATcl+XHgL4Hvm+pUA0YaumOSnwJ+Ffix6Y0zkUX4S3oz\nhs594ut6U/+/k8b6CHDOq
o/PYeVvirX8PItzCwSGzf4l4JGq+ibwzSR/D7wCmGesN5y7qh5b9f6n\nk/x5kqWqenRGM65lyPn+QeBjK53mBcBFSY5V1SdnM+KaJnmNU1X/kGR7ku+qqq9Nfbr1DZp79FDx\nWmBvVS3PaLaNTHTOF8iQuU/c5+zR5yY34Q317cBDrNxQfy7rPzQ6A/gacOq8HwJMMjtwHvA5Vh4c\nnAYcAl7WYO4X8e1/4HQB8HCH833C/h8GLukwN/CSVef7lcBDTeb+blYeiL163vNu9rUC/AGL84Bx\nyDlf/YDx1TyDB4wTXVlX1ZNJrgA+OwradVX1xSS/Ntr+F6Nd3wx8tlauUBfCkNmr6oEknwEOAseB\na6vq3+Y39eBz/jbgHUmeBL7Byk81czXBa2WhDJz7rcAvJzkGPE6f8/37wC7gg6OfZo5V1QXzmvkp\nQ2ZP8mLgC8DpwPEkv8HKhdTjizx3Vd2S5PVJDgNfB35ls1/Pf24uSQ34LxglqQFjLUkNGGtJasBY\nS1IDxlqSGjDWktSAsZakBoy1JDXwfyLSyTiWENvGAAAAAElFTkSuQmCC\n", "text": [ "" ] } ], "prompt_number": 36 }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Progressive alignment versus iterative alignment" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In an iterative alignment, the output tree from the above progressive alignment is used as a guide tree, and the full process repeated. This is performed to reduce errors that result from a low-quality guide tree. 
" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from iab.algorithms import iterative_msa_and_tree\n", "%psource iterative_msa_and_tree" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\u001b[0;32mdef\u001b[0m \u001b[0miterative_msa_and_tree\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msequences\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mnum_iterations\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mpairwise_aligner\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0msequence_distance_fn\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mkmer_distance\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mdisplay_aln\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mFalse\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mdisplay_tree\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mFalse\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;34m\"\"\" Perform progressive msa of sequences and build a UPGMA tree\u001b[0m\n", "\u001b[0;34m Parameters\u001b[0m\n", "\u001b[0;34m ----------\u001b[0m\n", "\u001b[0;34m sequences : skbio.SequenceCollection\u001b[0m\n", "\u001b[0;34m The sequences to be aligned.\u001b[0m\n", "\u001b[0;34m num_iterations : int\u001b[0m\n", "\u001b[0;34m The number of iterations of progressive multiple sequence alignment to\u001b[0m\n", "\u001b[0;34m perform. Must be greater than zero and less than five.\u001b[0m\n", "\u001b[0;34m pairwise_aligner : function\u001b[0m\n", "\u001b[0;34m Function that should be used to perform the pairwise alignments,\u001b[0m\n", "\u001b[0;34m for example skbio.Alignment.global_pairwise_align_nucleotide. 
Must\u001b[0m\n", "\u001b[0;34m support skbio.BiologicalSequence objects or skbio.Alignment objects\u001b[0m\n", "\u001b[0;34m as input.\u001b[0m\n", "\u001b[0;34m sequence_distance_fn : function\u001b[0m\n", "\u001b[0;34m Function that returns and skbio.DistanceMatrix given an\u001b[0m\n", "\u001b[0;34m skbio.SequenceCollection. This will be used to build a guide tree.\u001b[0m\n", "\u001b[0;34m display_aln : bool, optional\u001b[0m\n", "\u001b[0;34m Print the alignment before returning.\u001b[0m\n", "\u001b[0;34m display_tree : bool, optional\u001b[0m\n", "\u001b[0;34m Print the tree before returning.\u001b[0m\n", "\u001b[0;34m\u001b[0m\n", "\u001b[0;34m Returns\u001b[0m\n", "\u001b[0;34m -------\u001b[0m\n", "\u001b[0;34m skbio.alignment\u001b[0m\n", "\u001b[0;34m skbio.TreeNode\u001b[0m\n", "\u001b[0;34m\u001b[0m\n", "\u001b[0;34m \"\"\"\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mnum_iterations\u001b[0m \u001b[0;34m>\u001b[0m \u001b[0;36m5\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"A maximum of five iterations is allowed.\"\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;34m\"You requested %d.\"\u001b[0m \u001b[0;34m%\u001b[0m \u001b[0mnum_iterations\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mprevious_iter_tree\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mNone\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mi\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mrange\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnum_iterations\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mi\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mnum_iterations\u001b[0m \u001b[0;34m-\u001b[0m 
\u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;31m# only display the last iteration\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mdisplay\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mTrue\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mdisplay\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mFalse\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mprevious_iter_msa\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mprevious_iter_tree\u001b[0m \u001b[0;34m=\u001b[0m \\\n", " \u001b[0mprogressive_msa_and_tree\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msequences\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mpairwise_aligner\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mpairwise_aligner\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0msequence_distance_fn\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0msequence_distance_fn\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mguide_tree\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mprevious_iter_tree\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mdisplay_aln\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mdisplay_aln\u001b[0m \u001b[0;32mand\u001b[0m \u001b[0mdisplay\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mdisplay_tree\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mdisplay_tree\u001b[0m \u001b[0;32mand\u001b[0m \u001b[0mdisplay\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mprevious_iter_msa\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mprevious_iter_tree\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\n" ] } ], "prompt_number": 37 }, { "cell_type": "code", 
"collapsed": false, "input": [ "msa, tree = iterative_msa_and_tree(query_sequences, pairwise_aligner=global_pairwise_align_nucleotide, num_iterations=1, display_aln=True, display_tree=True)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ ">s5\n", "---GGC--CCA-----CTGAT\n", ">s1\n", "ACCGGTGACCAGTTGACCAGT\n", ">s4\n", "---GGCACCAAACAGA--A--\n", ">s2\n", "ATCGGTACC-GGTAGA--AGT\n", ">s3\n", "---GGTACCAAATAGA--A--\n", "\n", "\n", "Output tree:\n" ] }, { "metadata": {}, "output_type": "display_data", "png": "iVBORw0KGgoAAAANSUhEUgAAAWsAAAD7CAYAAACsV7WPAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAADLZJREFUeJzt3X2MZfVdx/H3p7s0gW1hd2zTEsCYVCo22VZqxapVV0nr\n0qVPtBpTHxIFYlopJrapxqholsRGojFNY2MJaRvdWEip2FDaklKJYoAUBXbVElkM2m5SU8q4Qh/C\nkv36x1zsdJm5c+4w9+HLvl/JhJk5Z898Obnz3jPnzOaXqkKStNieM+8BJEkbM9aS1ICxlqQGjLUk\nNWCsJakBYy1JDWwftzGJv9cnSZtQVdnK4214ZV1Vg9+uuuqqifZflDfndu5n89ydZ+869zR4G0SS\nGjDWktTAlsZ6z549W3m4mXHu2XLu2es6e9e5pyHj7q8kqWndf5GkZ6sk1KwfMEqS5s9YS1IDxlqS\nGjDWktSAsZakBoy1JDVgrCWpAWMtSQ0Ya0lqwFhLUgPGWpIaMNaS1ICxlqQ5SHIgyQNJDiW5LsnY\nlbuMtSTNx19V1XlVtRs4Fbhs3M5jSy5JeuaS7ABuAM4CtgH7q+qGVbt8ATh73DGMtSRN317gSFXt\nA0hy+lMbkpwC/CJw5bgDuPiAFtLS0hLLy8vzHkPatNWLDyQ5F7gVuB64uaruWLXtWuCxqvrNcccz\n1lpIo5U25j2GtClrrRSTZCewD7gcuK2q9ie5CnhFVV2y0TG9DSJJU5bkTGC5qg4kOQpcmuQy4HXA\nhYOO4ZW1FpFX1ursxCvrJK8DrgGOA08A7wTuAh4GHh/tdmNVXb3uMY21FpGxVmcumCtJJyljLUkN\nGGtJasBYS1IDxlqSGjDWktSAsZakBoy1JDVgrCWpAWMtSQ0Ya0lqwFhLUgPGWpIaMNaS1ICxlqQG\njLUkNWCsJakB12BcMK7qLWktLuu1YFzOaoXnQZ25rJcknaSMtSQ1YKwlqQFjLUkNGGtJasBYS1ID\nxlqSGjDWktSAsZakBoy1JDVgrCWpAWMtSQ0Ya0lqwFhLUgPGWpIaMNaS1ICxlqQ5SHJFksNJjidZ\n2mh/Yy1J83EHcCHwn0N2dg1GSZqyJDuAG4CzgG3A/qq6YbRt0DGMtSRN317gSFXtA0hy+qQH8DaI\nJE3fQeC1Sd6X5DVV9b+THsAray2kXbt2Df7xUFp0VfVgkvOBfcDVSW6rqv2THMNYayE9+uij8x5B\n2rQTLzSSnAksV9WBJEeBS0eff2rHDa9MvA0iSdO3G7g7yb3A7wH7k7wL+C9WHjoeTPKhcQdIVa2/\nMalx27X1kuA5l3obfR9v6X08r6wlqQFjLUkNGGtJasBYS1IDxlqSGjDWktSAsZakBoy1JDVgrCWp\nAWMtSQ0Ya0lqwFhLUgPGWpI
aMNaS1ICxlqQGjLUkNWCsJakBYy1JDUx1wdylpSWWl5en+SUk6aQw\n1TUYXU9wcp4zqT/XYJSkk5SxlqQGjLUkNWCsJakBYy1JDRhrSWrAWEtSA8Zakhow1pLUgLGWpAaM\ntSQ1YKwlqQFjLUkNGGtJasBYS1IDxlqSGjDWkjQHSa5Lcl+Sg0n+JskZY/d3pZjF4jmT+huyUkyS\n51fVY6P3/wRYrqqr19vfK2tJmrIkO5J8anQlfSjJz60KdYBTgUfGHWOqC+ZKkgDYCxypqn0ASU4f\n/ffDwEXAYeDKcQfwNsiC8ZxJT7e0tMTy8vK8x5jI6tsgSc4FbgWuB26uqjtWbXsO8AHgv6vqD9c7\nnrFeMJ4z6em6fV+sdc86yU5gH3A5cFtV7V+17SeA91bVxesd09sgkjRlSc5k5QHigSRHgcuSvKSq\nHhrds34jcO+4YxhrSZq+3cA1SY4DTwBXAB996t41cA/w6+MO4G2QBeM5k56u2/fFkF/dm5S/uidJ\nDRhrSWrAWEtSA8Zakhow1pLUgLGWpAaMtSQ1YKwlqQFjLUkNGGtJasBYS1IDxlqSGjDWktSAsZak\nBoy1JDVgrCWpAWMtSQ0Ya0lqwDUYF8yuXbtYWT9Tkr7NNRglLbxuLXENRkk6SRlrSWrAWEtSA8Za\nkhow1pLUgLGWpAaMtSQ1YKwlqQFjLUkNGGtJasBYS1IDxlqSGjDWktSAsZakBoy1JDVgrCVpjpK8\nP8ljG+1nrCVpTpK8CtgJbLiygrGWpClLsiPJp5Lcl+RQkp9Nsg34Y+C9wIaryrgGoyRN317gSFXt\nA0hyOnAF8LdV9ZUh6656ZS1J03cQeG2S9yV5DfA84G3ABzJwhWyvrCUtvF27djGwaQupqh5Mcj6w\nD7ga+DzwvcDh0S6nJfn3qnrpesdwdXNJ2mInrm6e5Exguaq+leRi4NKqesuq7Y9V1fPHHdMra0ma\nvt3ANUmOA08A7zhh+4ZXtV5ZS9IWO/HKeiv4gFGSGjDWktSAsZakBoy1JDVgrCWpAWMtSQ0Ya0lq\nwFhLUgPGWpIaMNaS1ICxlqQGjLUkNWCsJakBYy1JDRhrSWrAWEtSA8Zakhow1pLUgLGWpAaMtSQ1\nYKwlqQFjLUkNGGtJasBYS1IDxlqSGjDWktSAsZakBoy1JDVgrCWpAWMtSQ0Ya0lqwFhLUgPGWpIa\nMNaS1ICxlqQGjLUkNWCsJWkOknwkyX8kuXf09vJx+2+f1WCSpO9QwHuq6hNDdjbWkjRlSXYANwBn\nAduA/U9tGnoMb4NI0vTtBY5U1Q9U1W7gM6PP/1GS+5P8aZLnjjtAqmr9jUmN276RpaUllpeXN/3n\nJamrqvr/q+Yk5wK3AtcDN1fVHUleXFVfGUX6Q8BDVbV/ncNNN9aSdDJK8h2xHn1uJ7APuBy4bXWY\nk/wkK/ev37DeMb1nLUlTluRMYLmqDiQ5Cly66so6wFuAQ+OOYawlafp2A9ckOQ48AbwTOJDkhaw8\nZLwX+J1xB/A2iCRtsbVugzxT/jaIJDVgrCWpAWMtSQ0Ya0lqwFhLUgPGWpIaMNaS1ICxlqQGjLUk\nNWCsJakBYy1JDRhrSWrAWEtSA8ZakhrY0ljffvvtW3m4mXHu2XLu2es6e9e5p8FY49yz5tyz13X2\nrnNPg7dBJKkBYy1JDWy4rNcMZ5GkZ42tXtZrbKwlSYvB2yCS1ICxlqQGJo51kr1JHkjyYJLfWmP7\ne5LcO3o7lOTJJDu3ZtxnZqPZR/vsGc3+L0lun/GIaxpwzvckObrqvP/uPOY80ZDzPdrvh0avk0tm\nOd96BpzvNyW5f3Su/ynJT89jzhMNmPsXRnMfTPKPSV4+jznXMmD285LcmeRbSd49jxnXMrAp7x9t\nvz/J+Zv+YlU1+A3YBhwGvgc4BbgP+P4x+18MfG6SrzGttyGzAzuBfwXOHn38giZz7wE+Oe9ZN
/Na\nGe33eeBm4K0d5gZ2rHp/N3C4ydw/Apwxen8vcNe8555g9hcCrwKuBt4975knmPv1wC2j93/4mZzz\nSa+sLxi9MB+uqmPAx4A3jdn/7cBfT/g1pmXI7G8HbqyqLwNU1SMznnEtQ8/5lj553gJD534X8HHg\nq7McbowN566qr6/68HlAi9dJVd1ZVUdHH94NnD3jGdczZPavVtU9wLF5DLiOIa/xNwIfBaiqu4Gd\nSV60mS82aazPAr606uMvjz73NElOA34GuHEzg03BkNnPBZaS/F2Se5L80symW9+QuQv40dGPWbck\nednMplvfhnMnOYuVF/cHR59ahF9NGvQaT/LmJF8EPg1cOaPZxhn8vTlyKXDLVCcabtLZF8WQudfa\nZ1N/SW6fcP9JvpneANxRVf8z4deYliGznwK8ErgQOA24M8ldVfXgVCcbb8jc/wycU1XfSHIRcBPw\n0umOtaEhc/8Z8NtVVUnCYvx0MOg1XlU3ATcl+XHgL4Hvm+pUA0YaumOSnwJ+Ffix6Y0zkUX4S3oz\nhs594ut6U/+/k8b6CHDOqo/PYeVvirX8PItzCwSGzf4l4JGq+ibwzSR/D7wCmGesN5y7qh5b9f6n\nk/x5kqWqenRGM65lyPn+QeBjK53mBcBFSY5V1SdnM+KaJnmNU1X/kGR7ku+qqq9Nfbr1DZp79FDx\nWmBvVS3PaLaNTHTOF8iQuU/c5+zR5yY34Q317cBDrNxQfy7rPzQ6A/gacOq8HwJMMjtwHvA5Vh4c\nnAYcAl7WYO4X8e1/4HQB8HCH833C/h8GLukwN/CSVef7lcBDTeb+blYeiL163vNu9rUC/AGL84Bx\nyDlf/YDx1TyDB4wTXVlX1ZNJrgA+OwradVX1xSS/Ntr+F6Nd3wx8tlauUBfCkNmr6oEknwEOAseB\na6vq3+Y39eBz/jbgHUmeBL7Byk81czXBa2WhDJz7rcAvJzkGPE6f8/37wC7gg6OfZo5V1QXzmvkp\nQ2ZP8mLgC8DpwPEkv8HKhdTjizx3Vd2S5PVJDgNfB35ls1/Pf24uSQ34LxglqQFjLUkNGGtJasBY\nS1IDxlqSGjDWktSAsZakBoy1JDXwfyLSyTiWENvGAAAAAElFTkSuQmCC\n", "text": [ "" ] } ], "prompt_number": 38 }, { "cell_type": "code", "collapsed": false, "input": [ "msa, tree = iterative_msa_and_tree(query_sequences, pairwise_aligner=global_pairwise_align_nucleotide, num_iterations=2, display_aln=True, display_tree=True)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ ">s5\n", "---GGC--CCA-----CTGAT\n", ">s4\n", "---GGCACCAAACAGA----A\n", ">s3\n", "---GGTACCAAATAGA----A\n", ">s1\n", "ACCGGTGACCAGTTGACCAGT\n", ">s2\n", "ATCGGT-ACCGGTAGA--AGT\n", "\n", "\n", "Output tree:\n" ] }, { "metadata": {}, "output_type": "display_data", "png": 
"iVBORw0KGgoAAAANSUhEUgAAAXIAAAD7CAYAAAB37B+tAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAADNFJREFUeJzt3X2MZfVdx/H3p0AT2BZ2xzYtAYxJpWKTbaVWrFp1lbQu\nXfpEqzH1IVEgppViYptqjIpmSWwkGtM0NpaQ2ujGQkSxobQlpRLFACkK7Kolshi03aSmlHGFPgTI\nfv1jLnZcdu89d57O/e6+X8mEmblnznz3cuY9Z86ZyS9VhSSpr+eNPYAkaX0MuSQ1Z8glqTlDLknN\nGXJJas6QS1Jzp671A5P4e4uStAZVlY3c37rOyKuq7cs111wz+gzOP/4czt/vpfPsVZtz/uulFUlq\nzpBLUnMnbch37do19gjr4vzjcv7xdJ59s2St12yS1GZd75GkE1USapFudkqSxmfIJak5Qy5JzRly\nSWrOkEtSc4Zckpoz5JLUnCGXpOYMuSQ1Z8glqTlDLknNGXJJas6QS9KCSbIvyUNJDiS5IcnU1dwM\nuSQtnj+vqguqaidwOnDFtI3XvGanJGn9kmwDbgLOAU4B9lbVTas2+Txw7rR9GHJJGtdu4FBV7QFI\ncuazDyQ5DfhZ4OppO3BhCWmDLS0tsby8PPYYWmCrF5ZIcj5wO3AjcGtV3bXqseuBJ6rqV6ftz5BL\nG2yyAszYY2hBHWuFoCTbgT3AlcAdVbU3yTXAq6rqsln79NKKJI0oydnAclXtS3IYuDzJFcAbgIsH\n7cMzcmljeUauaY4+I0/yBuA64AjwFPBu4B7gUeDJyWY3V9W1x92nIZc2liHXNC6+LEl6DkMuSc0Z\ncklqzpBLUnOGXJKaM+SS1Jwhl6TmDLkkNWfIJak5Qy5JzRlySWrOkEtSc4Zckpoz5JLUnCGXpOYM\nuSQ1Z8glqbkTcs1OVzGXdDI5IZd6c6ktjcnjT9O41Jsk6TkMuSQ1Z8glqTlDLknNGXJJas6QS1Jz\nhlySmjPkktScIZek5gy5JDVnyCWpOUMuSc0ZcklqzpBLUnOGXJKaM+SS1Jwhl6QFk+SqJAeTHEmy\nNGt7Qy5Ji+cu4GLgP4ZsfEKu2SlJXSTZBtwEnAOcAuytqpsmjw3ahyGXpHHtBg5V1R6AJGfOuwMv\nrUjSuPYDr0/ygSSvq6r/mXcHnpFLG2zHjh2DfySWqurhJBcCe4Brk9xRVXvn2YchlzbY448/PvYI\nWmBHf5NPcjawXFX7khwGLp+8/9kNZ54VeGlFksa1E7g3yf3AbwF7k7wH+E9WboDuT/KRaTtIVa3p\nMyeptX7sZkvCos4m6eQ26dOGXnvzjFySmjPkktScIZek5gy5JDVnyCWpOUMuSc0ZcklqzpBLUnOG\nXJKaM+SS1Jwhl6TmDLkkNWfIJak5Qy5JzRlySWrOkEtSc4Zckpoz5JLU3GiLLy8tLbG8vDzWp5ek\nE8Zoa3Zu5rqartkpaVG5Zqck6TkMuSQ1Z8glqTlDLknNGXJJas6QS1JzhlySmjPkktScIZek5gy5\nJDVnyCWpOUMuSc0ZcklqzpBLUnOGXJKaM+SS1Jwhl6QFk+SGJA8k2Z/kr5OcNXV7VwiSpK0zZIWg\nJC+sqicmr/8BsFxV1x5ve8/IJWlESbYl+eTkDPxAkp9aFfEApwOPTdvHaIsvS5IA2A0cqqo9AEnO\nnPz3o8AlwEHg6mk78NKKpLaWlpZYXl4ee4y5rb60kuR84HbgRuDWqrpr1WPPAz4E/FdV/e7x9mfI\nJbXV8Wv9WNfIk2wH9gBXAndU1d5Vj/0I8P6quvR4+/TSiiSNKMnZrNzM3JfkMHBFkpdV1SOTa+Rv\nBu6ftg9DLknj2glcl+QI8BRwFfCxZ6+VA/cBvzxtB15akdRWx6/1Ib9+OC9//VCSmjPkktScIZek\n5gy5JDVnyCWpOUMuSc0ZcklqzpBLUnOGXJKaM+SS1Jwhl6TmD
LkkNWfIJak5Qy5JzRlySWrOkEtS\nc4Zckpoz5JLU3Am5ZueOHTtYWbNUkk58J+SanZJODh074pqdkqTnMOSS1Jwhl6TmDLkkNWfIJak5\nQy5JzRlySWrOkEtSc4Zckpoz5JLUnCGXpOYMuSQ1Z8glqTlDLknNGXJJas6QS9KCSvLBJE/M2s6Q\nS9ICSvIaYDswc+UMQy5JI0qyLcknkzyQ5ECSn0xyCvD7wPuBmasJnZBrdkpSI7uBQ1W1ByDJmcBV\nwN9U1ZeHrD/sGbkkjWs/8PokH0jyOuAFwDuAD2XgKvKekUtqa8eOHQxs3cKqqoeTXAjsAa4FPgd8\nJ3BwsskZSf6tql5+vH1krStQJ6n1rF7dcfVrSVqvSfuy6u2zgeWq+maSS4HLq+ptqx5/oqpeOG2f\nnpFL0rh2AtclOQI8BbzrqMdnnvF6Ri5JW+joM/KN4M1OSWrOkEtSc4Zckpoz5JLUnCGXpOYMuSQ1\nZ8glqTlDLknNGXJJas6QS1JzhlySmjPkktScIZek5gy5JDVnyCWpOUMuSc0ZcklqzpBLUnOGXJKa\nM+SS1Jwhl6TmDLkkNWfIJak5Qy5JzRlySWrOkEtSc4Zckpoz5JLUnCGXpOYMuSQ1Z8glqTlDLknN\nGXJJas6QS1JzhlySmjPkktScIZekBZPkT5P8e5L7Jy+vnLb9qVs1mCRpsALeV1V/NWRjQy5JI0qy\nDbgJOAc4Bdj77END9+GlFUka127gUFV9T1XtBD49ef/vJXkwyR8mef60HaSq1vSZk9RaPxZgaWmJ\n5eXlNX+8JHVVVf93tp3kfOB24Ebg1qq6K8lLq+rLk4B/BHikqvYeZ3fjhVySTkZJ/l/IJ+/bDuwB\nrgTuWB3tJD/KyvXyNx1vn14jl6QRJTkbWK6qfUkOA5evOiMP8DbgwLR9GHJJGtdO4LokR4CngHcD\n+5K8mJUbnvcDvzFtB15akaQtdKxLK+vlb61IUnOGXJKaM+SS1Jwhl6TmDLkkNWfIJak5Qy5JzRly\nSWrOkEtSc4Zckpoz5JLUnCGXpOYMuSQ1Z8glqbmTNuR33nnn2COsi/OPy/nH03n2zWLIm3L+cTn/\neDrPvllO2pBL0onCkEtSc+ta6m2DZ5Gkk8JGL/W25pBLkhaDl1YkqTlDLknNzQx5kt1JHkrycJJf\nO8bj70ty/+TlQJJnkmzfnHHnN2v+yTa7JvP/c5I7t3jEqQY8/7uSHF71/+A3x5jzeIY8/5Ptvm9y\n7Fy2lfPNMuD5f0uSByfP/T8m+fEx5jyWAbP/zGT2/Un+Ickrx5jzeAbMf0GSu5N8M8l7x5hxmoHt\n+eDk8QeTXLjmT1ZVx30BTgEOAt8BnAY8AHz3lO0vBT47bZ9b+TJkfmA78C/AuZO3XzT23HPOvwv4\nxNizruf4mWz3OeBW4O1jzz3n879t1es7gYNjzz3H7D8AnDV5fTdwz9hzzzn/i4HXANcC7x175jXM\n/0bgtsnr37+e53/WGflFkwPz0ap6Gvg48JYp278T+IsZ+9xKQ+Z/J3BzVX0JoKoe2+IZpxn6/G/o\nHfANNHT+9wB/CXxlK4cbYOb8VfW1VW++AFiU42fI7HdX1eHJm/cC527xjNMMmf8rVXUf8PQYA84w\n5Nh/M/AxgKq6F9ie5CVr+WSzQn4O8MVVb39p8r7nSHIG8BPAzWsZZJMMmf98YCnJ3ya5L8nPbdl0\nsw2Zv4AfnPxodluSV2zZdLPNnD/JOawc4B+evGuRfo1q0PGf5K1JvgB8Crh6i2abZfDX7sTlwG2b\nOtF85p1/0QyZ/1jbrOmb6akzHp/ni+pNwF1V9d9rGWSTDJn/NODVwMXAGcDdSe6pqoc3dbJhhsz/\nT8B5VfX1JJcAtwAv39yxBhsy/x8Bv15VlSQs1k8Xg47/qroFuCXJDwN/BnzXpk41zOCv3SQ/Bvwi\n8EObN87cFukb+loMnf/o4
31N/+5ZIT8EnLfq7fNY+a5xLD/NYl1WgWHzfxF4rKq+AXwjyd8BrwIW\nIeQz56+qJ1a9/qkkf5xkqaoe36IZpxny/H8v8PGVhvMi4JIkT1fVJ7ZmxKnmOf6pqr9PcmqSb6uq\nr276dNMNmn1yg/N6YHdVLW/RbEPM9dwvoCHzH73NuZP3zW/GBftTgUdYuWD/fI5/s+os4KvA6WPf\nZJh3fuAC4LOs3Jw4AzgAvGLs2eeY/yV86w+7LgIeHXvueY+fVdt/FLhs7LnnfP5ftur5fzXwyNhz\nzzH7t7NyQ+61Y8+7nmMH+B0W72bnkOd/9c3O17KOm51Tz8ir6pkkVwGfmYTuhqr6QpJfmjz+J5NN\n3wp8plbOahfGkPmr6qEknwb2A0eA66vqX8eb+lsGPv/vAN6V5Bng66z8ZLQQ5jh+FtLA+d8O/HyS\np4EnWZDnf+Dsvw3sAD48+Yno6aq6aKyZVxsyf5KXAp8HzgSOJPkVVk7Cnhxt8ImB7bktyRuTHAS+\nBvzCWj+ff6IvSc35l52S1Jwhl6TmDLkkNWfIJak5Qy5JzRlySWrOkEtSc4Zckpr7X5xj706QscIm\nAAAAAElFTkSuQmCC\n", "text": [ "" ] } ], "prompt_number": 39 }, { "cell_type": "code", "collapsed": false, "input": [ "msa, tree = iterative_msa_and_tree(query_sequences, pairwise_aligner=global_pairwise_align_nucleotide, num_iterations=3, display_aln=True, display_tree=True)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ ">s5\n", "---GGC--CCA-----CTGAT\n", ">s4\n", "---GGCACCAAACAGA----A\n", ">s3\n", "---GGTACCAAATAGA----A\n", ">s1\n", "ACCGGTGACCAGTTGACCAGT\n", ">s2\n", "ATCGGT-ACCGGTAGA--AGT\n", "\n", "\n", "Output tree:\n" ] }, { "metadata": {}, "output_type": "display_data", "png": 
"iVBORw0KGgoAAAANSUhEUgAAAXIAAAD7CAYAAAB37B+tAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAADNFJREFUeJzt3X2MZfVdx/H3p0AT2BZ2xzYtAYxJpWKTbaVWrFp1lbQu\nXfpEqzH1IVEgppViYptqjIpmSWwkGtM0NpaQ2ujGQkSxobQlpRLFACkK7Kolshi03aSmlHGFPgTI\nfv1jLnZcdu89d57O/e6+X8mEmblnznz3cuY9Z86ZyS9VhSSpr+eNPYAkaX0MuSQ1Z8glqTlDLknN\nGXJJas6QS1Jzp671A5P4e4uStAZVlY3c37rOyKuq7cs111wz+gzOP/4czt/vpfPsVZtz/uulFUlq\nzpBLUnMnbch37do19gjr4vzjcv7xdJ59s2St12yS1GZd75GkE1USapFudkqSxmfIJak5Qy5JzRly\nSWrOkEtSc4Zckpoz5JLUnCGXpOYMuSQ1Z8glqTlDLknNGXJJas6QS9KCSbIvyUNJDiS5IcnU1dwM\nuSQtnj+vqguqaidwOnDFtI3XvGanJGn9kmwDbgLOAU4B9lbVTas2+Txw7rR9GHJJGtdu4FBV7QFI\ncuazDyQ5DfhZ4OppO3BhCWmDLS0tsby8PPYYWmCrF5ZIcj5wO3AjcGtV3bXqseuBJ6rqV6ftz5BL\nG2yyAszYY2hBHWuFoCTbgT3AlcAdVbU3yTXAq6rqsln79NKKJI0oydnAclXtS3IYuDzJFcAbgIsH\n7cMzcmljeUauaY4+I0/yBuA64AjwFPBu4B7gUeDJyWY3V9W1x92nIZc2liHXNC6+LEl6DkMuSc0Z\ncklqzpBLUnOGXJKaM+SS1Jwhl6TmDLkkNWfIJak5Qy5JzRlySWrOkEtSc4Zckpoz5JLUnCGXpOYM\nuSQ1Z8glqbkTcs1OVzGXdDI5IZd6c6ktjcnjT9O41Jsk6TkMuSQ1Z8glqTlDLknNGXJJas6QS1Jz\nhlySmjPkktScIZek5gy5JDVnyCWpOUMuSc0ZcklqzpBLUnOGXJKaM+SS1Jwhl6QFk+SqJAeTHEmy\nNGt7Qy5Ji+cu4GLgP4ZsfEKu2SlJXSTZBtwEnAOcAuytqpsmjw3ahyGXpHHtBg5V1R6AJGfOuwMv\nrUjSuPYDr0/ygSSvq6r/mXcHnpFLG2zHjh2DfySWqurhJBcCe4Brk9xRVXvn2YchlzbY448/PvYI\nWmBHf5NPcjawXFX7khwGLp+8/9kNZ54VeGlFksa1E7g3yf3AbwF7k7wH+E9WboDuT/KRaTtIVa3p\nMyeptX7sZkvCos4m6eQ26dOGXnvzjFySmjPkktScIZek5gy5JDVnyCWpOUMuSc0ZcklqzpBLUnOG\nXJKaM+SS1Jwhl6TmDLkkNWfIJak5Qy5JzRlySWrOkEtSc4Zckpoz5JLU3GiLLy8tLbG8vDzWp5ek\nE8Zoa3Zu5rqartkpaVG5Zqck6TkMuSQ1Z8glqTlDLknNGXJJas6QS1JzhlySmjPkktScIZek5gy5\nJDVnyCWpOUMuSc0ZcklqzpBLUnOGXJKaM+SS1Jwhl6QFk+SGJA8k2Z/kr5OcNXV7VwiSpK0zZIWg\nJC+sqicmr/8BsFxV1x5ve8/IJWlESbYl+eTkDPxAkp9aFfEApwOPTdvHaIsvS5IA2A0cqqo9AEnO\nnPz3o8AlwEHg6mk78NKKpLaWlpZYXl4ee4y5rb60kuR84HbgRuDWqrpr1WPPAz4E/FdV/e7x9mfI\nJbXV8Wv9WNfIk2wH9gBXAndU1d5Vj/0I8P6quvR4+/TSiiSNKMnZrNzM3JfkMHBFkpdV1SOTa+Rv\nBu6ftg9DLknj2glcl+QI8BRwFfCxZ6+VA/cBvzxtB15akdRWx6/1Ib9+OC9//VCSmjPkktScIZek\n5gy5JDVnyCWpOUMuSc0ZcklqzpBLUnOGXJKaM+SS1Jwhl6TmD
LkkNWfIJak5Qy5JzRlySWrOkEtS\nc4Zckpoz5JLU3Am5ZueOHTtYWbNUkk58J+SanZJODh074pqdkqTnMOSS1Jwhl6TmDLkkNWfIJak5\nQy5JzRlySWrOkEtSc4Zckpoz5JLUnCGXpOYMuSQ1Z8glqTlDLknNGXJJas6QS9KCSvLBJE/M2s6Q\nS9ICSvIaYDswc+UMQy5JI0qyLcknkzyQ5ECSn0xyCvD7wPuBmasJnZBrdkpSI7uBQ1W1ByDJmcBV\nwN9U1ZeHrD/sGbkkjWs/8PokH0jyOuAFwDuAD2XgKvKekUtqa8eOHQxs3cKqqoeTXAjsAa4FPgd8\nJ3BwsskZSf6tql5+vH1krStQJ6n1rF7dcfVrSVqvSfuy6u2zgeWq+maSS4HLq+ptqx5/oqpeOG2f\nnpFL0rh2AtclOQI8BbzrqMdnnvF6Ri5JW+joM/KN4M1OSWrOkEtSc4Zckpoz5JLUnCGXpOYMuSQ1\nZ8glqTlDLknNGXJJas6QS1JzhlySmjPkktScIZek5gy5JDVnyCWpOUMuSc0ZcklqzpBLUnOGXJKa\nM+SS1Jwhl6TmDLkkNWfIJak5Qy5JzRlySWrOkEtSc4Zckpoz5JLUnCGXpOYMuSQ1Z8glqTlDLknN\nGXJJas6QS1JzhlySmjPkktScIZekBZPkT5P8e5L7Jy+vnLb9qVs1mCRpsALeV1V/NWRjQy5JI0qy\nDbgJOAc4Bdj77END9+GlFUka127gUFV9T1XtBD49ef/vJXkwyR8mef60HaSq1vSZk9RaPxZgaWmJ\n5eXlNX+8JHVVVf93tp3kfOB24Ebg1qq6K8lLq+rLk4B/BHikqvYeZ3fjhVySTkZJ/l/IJ+/bDuwB\nrgTuWB3tJD/KyvXyNx1vn14jl6QRJTkbWK6qfUkOA5evOiMP8DbgwLR9GHJJGtdO4LokR4CngHcD\n+5K8mJUbnvcDvzFtB15akaQtdKxLK+vlb61IUnOGXJKaM+SS1Jwhl6TmDLkkNWfIJak5Qy5JzRly\nSWrOkEtSc4Zckpoz5JLUnCGXpOYMuSQ1Z8glqbmTNuR33nnn2COsi/OPy/nH03n2zWLIm3L+cTn/\neDrPvllO2pBL0onCkEtSc+ta6m2DZ5Gkk8JGL/W25pBLkhaDl1YkqTlDLknNzQx5kt1JHkrycJJf\nO8bj70ty/+TlQJJnkmzfnHHnN2v+yTa7JvP/c5I7t3jEqQY8/7uSHF71/+A3x5jzeIY8/5Ptvm9y\n7Fy2lfPNMuD5f0uSByfP/T8m+fEx5jyWAbP/zGT2/Un+Ickrx5jzeAbMf0GSu5N8M8l7x5hxmoHt\n+eDk8QeTXLjmT1ZVx30BTgEOAt8BnAY8AHz3lO0vBT47bZ9b+TJkfmA78C/AuZO3XzT23HPOvwv4\nxNizruf4mWz3OeBW4O1jzz3n879t1es7gYNjzz3H7D8AnDV5fTdwz9hzzzn/i4HXANcC7x175jXM\n/0bgtsnr37+e53/WGflFkwPz0ap6Gvg48JYp278T+IsZ+9xKQ+Z/J3BzVX0JoKoe2+IZpxn6/G/o\nHfANNHT+9wB/CXxlK4cbYOb8VfW1VW++AFiU42fI7HdX1eHJm/cC527xjNMMmf8rVXUf8PQYA84w\n5Nh/M/AxgKq6F9ie5CVr+WSzQn4O8MVVb39p8r7nSHIG8BPAzWsZZJMMmf98YCnJ3ya5L8nPbdl0\nsw2Zv4AfnPxodluSV2zZdLPNnD/JOawc4B+evGuRfo1q0PGf5K1JvgB8Crh6i2abZfDX7sTlwG2b\nOtF85p1/0QyZ/1jbrOmb6akzHp/ni+pNwF1V9d9rGWSTDJn/NODVwMXAGcDdSe6pqoc3dbJhhsz/\nT8B5VfX1JJcAtwAv39yxBhsy/x8Bv15VlSQs1k8Xg47/qroFuCXJDwN/BnzXpk41zOCv3SQ/Bvwi\n8EObN87cFukb+loMnf/o4
31N/+5ZIT8EnLfq7fNY+a5xLD/NYl1WgWHzfxF4rKq+AXwjyd8BrwIW\nIeQz56+qJ1a9/qkkf5xkqaoe36IZpxny/H8v8PGVhvMi4JIkT1fVJ7ZmxKnmOf6pqr9PcmqSb6uq\nr276dNMNmn1yg/N6YHdVLW/RbEPM9dwvoCHzH73NuZP3zW/GBftTgUdYuWD/fI5/s+os4KvA6WPf\nZJh3fuAC4LOs3Jw4AzgAvGLs2eeY/yV86w+7LgIeHXvueY+fVdt/FLhs7LnnfP5ftur5fzXwyNhz\nzzH7t7NyQ+61Y8+7nmMH+B0W72bnkOd/9c3O17KOm51Tz8ir6pkkVwGfmYTuhqr6QpJfmjz+J5NN\n3wp8plbOahfGkPmr6qEknwb2A0eA66vqX8eb+lsGPv/vAN6V5Bng66z8ZLQQ5jh+FtLA+d8O/HyS\np4EnWZDnf+Dsvw3sAD48+Yno6aq6aKyZVxsyf5KXAp8HzgSOJPkVVk7Cnhxt8ImB7bktyRuTHAS+\nBvzCWj+ff6IvSc35l52S1Jwhl6TmDLkkNWfIJak5Qy5JzRlySWrOkEtSc4Zckpr7X5xj706QscIm\nAAAAAElFTkSuQmCC\n", "text": [ "" ] } ], "prompt_number": 40 }, { "cell_type": "code", "collapsed": false, "input": [ "msa, tree = iterative_msa_and_tree(query_sequences, pairwise_aligner=global_pairwise_align_nucleotide, num_iterations=5, display_aln=True, display_tree=True)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ ">s5\n", "---GGC--CCA-----CTGAT\n", ">s4\n", "---GGCACCAAACAGA----A\n", ">s3\n", "---GGTACCAAATAGA----A\n", ">s1\n", "ACCGGTGACCAGTTGACCAGT\n", ">s2\n", "ATCGGT-ACCGGTAGA--AGT\n", "\n", "\n", "Output tree:\n" ] }, { "metadata": {}, "output_type": "display_data", "png": 
"iVBORw0KGgoAAAANSUhEUgAAAXIAAAD7CAYAAAB37B+tAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAADNFJREFUeJzt3X2MZfVdx/H3p0AT2BZ2xzYtAYxJpWKTbaVWrFp1lbQu\nXfpEqzH1IVEgppViYptqjIpmSWwkGtM0NpaQ2ujGQkSxobQlpRLFACkK7Kolshi03aSmlHGFPgTI\nfv1jLnZcdu89d57O/e6+X8mEmblnznz3cuY9Z86ZyS9VhSSpr+eNPYAkaX0MuSQ1Z8glqTlDLknN\nGXJJas6QS1Jzp671A5P4e4uStAZVlY3c37rOyKuq7cs111wz+gzOP/4czt/vpfPsVZtz/uulFUlq\nzpBLUnMnbch37do19gjr4vzjcv7xdJ59s2St12yS1GZd75GkE1USapFudkqSxmfIJak5Qy5JzRly\nSWrOkEtSc4Zckpoz5JLUnCGXpOYMuSQ1Z8glqTlDLknNGXJJas6QS9KCSbIvyUNJDiS5IcnU1dwM\nuSQtnj+vqguqaidwOnDFtI3XvGanJGn9kmwDbgLOAU4B9lbVTas2+Txw7rR9GHJJGtdu4FBV7QFI\ncuazDyQ5DfhZ4OppO3BhCWmDLS0tsby8PPYYWmCrF5ZIcj5wO3AjcGtV3bXqseuBJ6rqV6ftz5BL\nG2yyAszYY2hBHWuFoCTbgT3AlcAdVbU3yTXAq6rqsln79NKKJI0oydnAclXtS3IYuDzJFcAbgIsH\n7cMzcmljeUauaY4+I0/yBuA64AjwFPBu4B7gUeDJyWY3V9W1x92nIZc2liHXNC6+LEl6DkMuSc0Z\ncklqzpBLUnOGXJKaM+SS1Jwhl6TmDLkkNWfIJak5Qy5JzRlySWrOkEtSc4Zckpoz5JLUnCGXpOYM\nuSQ1Z8glqbkTcs1OVzGXdDI5IZd6c6ktjcnjT9O41Jsk6TkMuSQ1Z8glqTlDLknNGXJJas6QS1Jz\nhlySmjPkktScIZek5gy5JDVnyCWpOUMuSc0ZcklqzpBLUnOGXJKaM+SS1Jwhl6QFk+SqJAeTHEmy\nNGt7Qy5Ji+cu4GLgP4ZsfEKu2SlJXSTZBtwEnAOcAuytqpsmjw3ahyGXpHHtBg5V1R6AJGfOuwMv\nrUjSuPYDr0/ygSSvq6r/mXcHnpFLG2zHjh2DfySWqurhJBcCe4Brk9xRVXvn2YchlzbY448/PvYI\nWmBHf5NPcjawXFX7khwGLp+8/9kNZ54VeGlFksa1E7g3yf3AbwF7k7wH+E9WboDuT/KRaTtIVa3p\nMyeptX7sZkvCos4m6eQ26dOGXnvzjFySmjPkktScIZek5gy5JDVnyCWpOUMuSc0ZcklqzpBLUnOG\nXJKaM+SS1Jwhl6TmDLkkNWfIJak5Qy5JzRlySWrOkEtSc4Zckpoz5JLU3GiLLy8tLbG8vDzWp5ek\nE8Zoa3Zu5rqartkpaVG5Zqck6TkMuSQ1Z8glqTlDLknNGXJJas6QS1JzhlySmjPkktScIZek5gy5\nJDVnyCWpOUMuSc0ZcklqzpBLUnOGXJKaM+SS1Jwhl6QFk+SGJA8k2Z/kr5OcNXV7VwiSpK0zZIWg\nJC+sqicmr/8BsFxV1x5ve8/IJWlESbYl+eTkDPxAkp9aFfEApwOPTdvHaIsvS5IA2A0cqqo9AEnO\nnPz3o8AlwEHg6mk78NKKpLaWlpZYXl4ee4y5rb60kuR84HbgRuDWqrpr1WPPAz4E/FdV/e7x9mfI\nJbXV8Wv9WNfIk2wH9gBXAndU1d5Vj/0I8P6quvR4+/TSiiSNKMnZrNzM3JfkMHBFkpdV1SOTa+Rv\nBu6ftg9DLknj2glcl+QI8BRwFfCxZ6+VA/cBvzxtB15akdRWx6/1Ib9+OC9//VCSmjPkktScIZek\n5gy5JDVnyCWpOUMuSc0ZcklqzpBLUnOGXJKaM+SS1Jwhl6TmD
LkkNWfIJak5Qy5JzRlySWrOkEtS\nc4Zckpoz5JLU3Am5ZueOHTtYWbNUkk58J+SanZJODh074pqdkqTnMOSS1Jwhl6TmDLkkNWfIJak5\nQy5JzRlySWrOkEtSc4Zckpoz5JLUnCGXpOYMuSQ1Z8glqTlDLknNGXJJas6QS9KCSvLBJE/M2s6Q\nS9ICSvIaYDswc+UMQy5JI0qyLcknkzyQ5ECSn0xyCvD7wPuBmasJnZBrdkpSI7uBQ1W1ByDJmcBV\nwN9U1ZeHrD/sGbkkjWs/8PokH0jyOuAFwDuAD2XgKvKekUtqa8eOHQxs3cKqqoeTXAjsAa4FPgd8\nJ3BwsskZSf6tql5+vH1krStQJ6n1rF7dcfVrSVqvSfuy6u2zgeWq+maSS4HLq+ptqx5/oqpeOG2f\nnpFL0rh2AtclOQI8BbzrqMdnnvF6Ri5JW+joM/KN4M1OSWrOkEtSc4Zckpoz5JLUnCGXpOYMuSQ1\nZ8glqTlDLknNGXJJas6QS1JzhlySmjPkktScIZek5gy5JDVnyCWpOUMuSc0ZcklqzpBLUnOGXJKa\nM+SS1Jwhl6TmDLkkNWfIJak5Qy5JzRlySWrOkEtSc4Zckpoz5JLUnCGXpOYMuSQ1Z8glqTlDLknN\nGXJJas6QS1JzhlySmjPkktScIZekBZPkT5P8e5L7Jy+vnLb9qVs1mCRpsALeV1V/NWRjQy5JI0qy\nDbgJOAc4Bdj77END9+GlFUka127gUFV9T1XtBD49ef/vJXkwyR8mef60HaSq1vSZk9RaPxZgaWmJ\n5eXlNX+8JHVVVf93tp3kfOB24Ebg1qq6K8lLq+rLk4B/BHikqvYeZ3fjhVySTkZJ/l/IJ+/bDuwB\nrgTuWB3tJD/KyvXyNx1vn14jl6QRJTkbWK6qfUkOA5evOiMP8DbgwLR9GHJJGtdO4LokR4CngHcD\n+5K8mJUbnvcDvzFtB15akaQtdKxLK+vlb61IUnOGXJKaM+SS1Jwhl6TmDLkkNWfIJak5Qy5JzRly\nSWrOkEtSc4Zckpoz5JLUnCGXpOYMuSQ1Z8glqbmTNuR33nnn2COsi/OPy/nH03n2zWLIm3L+cTn/\neDrPvllO2pBL0onCkEtSc+ta6m2DZ5Gkk8JGL/W25pBLkhaDl1YkqTlDLknNzQx5kt1JHkrycJJf\nO8bj70ty/+TlQJJnkmzfnHHnN2v+yTa7JvP/c5I7t3jEqQY8/7uSHF71/+A3x5jzeIY8/5Ptvm9y\n7Fy2lfPNMuD5f0uSByfP/T8m+fEx5jyWAbP/zGT2/Un+Ickrx5jzeAbMf0GSu5N8M8l7x5hxmoHt\n+eDk8QeTXLjmT1ZVx30BTgEOAt8BnAY8AHz3lO0vBT47bZ9b+TJkfmA78C/AuZO3XzT23HPOvwv4\nxNizruf4mWz3OeBW4O1jzz3n879t1es7gYNjzz3H7D8AnDV5fTdwz9hzzzn/i4HXANcC7x175jXM\n/0bgtsnr37+e53/WGflFkwPz0ap6Gvg48JYp278T+IsZ+9xKQ+Z/J3BzVX0JoKoe2+IZpxn6/G/o\nHfANNHT+9wB/CXxlK4cbYOb8VfW1VW++AFiU42fI7HdX1eHJm/cC527xjNMMmf8rVXUf8PQYA84w\n5Nh/M/AxgKq6F9ie5CVr+WSzQn4O8MVVb39p8r7nSHIG8BPAzWsZZJMMmf98YCnJ3ya5L8nPbdl0\nsw2Zv4AfnPxodluSV2zZdLPNnD/JOawc4B+evGuRfo1q0PGf5K1JvgB8Crh6i2abZfDX7sTlwG2b\nOtF85p1/0QyZ/1jbrOmb6akzHp/ni+pNwF1V9d9rGWSTDJn/NODVwMXAGcDdSe6pqoc3dbJhhsz/\nT8B5VfX1JJcAtwAv39yxBhsy/x8Bv15VlSQs1k8Xg47/qroFuCXJDwN/BnzXpk41zOCv3SQ/Bvwi\n8EObN87cFukb+loMnf/o4
31N/+5ZIT8EnLfq7fNY+a5xLD/NYl1WgWHzfxF4rKq+AXwjyd8BrwIW\nIeQz56+qJ1a9/qkkf5xkqaoe36IZpxny/H8v8PGVhvMi4JIkT1fVJ7ZmxKnmOf6pqr9PcmqSb6uq\nr276dNMNmn1yg/N6YHdVLW/RbEPM9dwvoCHzH73NuZP3zW/GBftTgUdYuWD/fI5/s+os4KvA6WPf\nZJh3fuAC4LOs3Jw4AzgAvGLs2eeY/yV86w+7LgIeHXvueY+fVdt/FLhs7LnnfP5ftur5fzXwyNhz\nzzH7t7NyQ+61Y8+7nmMH+B0W72bnkOd/9c3O17KOm51Tz8ir6pkkVwGfmYTuhqr6QpJfmjz+J5NN\n3wp8plbOahfGkPmr6qEknwb2A0eA66vqX8eb+lsGPv/vAN6V5Bng66z8ZLQQ5jh+FtLA+d8O/HyS\np4EnWZDnf+Dsvw3sAD48+Yno6aq6aKyZVxsyf5KXAp8HzgSOJPkVVk7Cnhxt8ImB7bktyRuTHAS+\nBvzCWj+ff6IvSc35l52S1Jwhl6TmDLkkNWfIJak5Qy5JzRlySWrOkEtSc4Zckpr7X5xj706QscIm\nAAAAAElFTkSuQmCC\n", "text": [ "" ] } ], "prompt_number": 41 }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Some references that I used in assembling these notes include [1](http://statweb.stanford.edu/~nzhang/345_web/sequence_slides3.pdf), [2](http://math.mit.edu/classes/18.417/Slides/alignment.pdf), [3](http://www.sciencedirect.com/science/article/pii/0378111988903307), [4](http://bioinformatics.oxfordjournals.org/content/23/21/2947.full), and [5](http://nar.oxfordjournals.org/content/32/5/1792.full). " ] } ], "metadata": {} } ] }