{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Repetitive DNA elements (\"repeats\") are DNA sequences prevalent in genomes, especially of higher eukaryotes. Repeats make up about [50% of the human genome](http://www.nature.com/nature/journal/v409/n6822/abs/409860a0.html) and over [80% of the maize genome](http://www.sciencemag.org/content/326/5956/1112). Repeats can be categorized as *interspersed*, where similar DNA sequences are spread throughout the genome, or *tandem*, where similar sequences are adjacent (see [Treangen and Salzberg](http://www.nature.com/nrg/journal/v13/n2/full/nrg3164.html)). Some interspersed repeats are long segmental duplications, but most are relatively short transposons and retrotransposons. Though repeats are sometimes referred to as “junk,” they are involved in processes of current scientific interest, including genome expansion, speciation, and epigenetic regulation (see [Fedoroff](http://www.sciencemag.org/content/338/6108/758)). Some are still actively expressed and duplicated, including in the human genome (see [Witherspoon et al](http://genome.cshlp.org/content/23/7/1170.long), [Tyekucheva et al](http://www.biomedcentral.com/1471-2164/12/495))." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### RepeatMasker\n", "\n", "[RepeatMasker](http://www.repeatmasker.org/) is both a tool for identifying repeats in a genome sequence, and a [database of repeats](http://www.repeatmasker.org/genomicDatasets/RMGenomicDatasets.html) that have been found. The database covers some well known model species, like human, chimpanzee, gorilla, rhesus, rat, mouse, horse, cow, cat, dog, chicken, zebrafish, bee, fruitfly and roundworm. People often use RepeatMasker to remove (\"mask out\") repetitive sequences from the genome so that they can be ignored (or otherwise treated specially) in later analyses, though that's not our goal here.\n", "\n", "It's intructive to click on some of the species listed in [the database](http://www.repeatmasker.org/genomicDatasets/RMGenomicDatasets.html) and examine the associated bar and pie charts describing their repeat content. For example, note the differences between the bar charts for [human](http://www.repeatmasker.org/species/hg.html) and [mouse](http://www.repeatmasker.org/species/mm.html), especially for SINE/Alu and LINE/L1." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Working with RepeatMasker databases\n", "\n", "Let's obtain and parse a RepeatMasker database. We'll start with [roundworm](http://www.repeatmasker.org/species/ce.html) because it's relatively small (only about 2.5 megabytes compressed)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "('ce10.fa.out.gz', )" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import urllib.request\n", "rm_site = 'http://www.repeatmasker.org'\n", "fn = 'ce10.fa.out.gz'\n", "url = '%s/genomes/ce10/RepeatMasker-rm405-db20140131/%s' % (rm_site, fn)\n", "urllib.request.urlretrieve(url, fn)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " SW perc perc perc query position in query matching repeat position in repeat\n", "score div. del. ins. sequence begin end (left) repeat class/family begin end (left) ID\n", "\n", " 508 0.0 0.0 0.0 chrI 1 432 (15071991) + (GCCTAA)n Simple_repeat 1 432 (0) 1\n", " 1226 10.0 0.0 0.0 chrI 566 595 (15071828) + (GCCTAA)n Simple_repeat 1 41 (240) 2\n", " 344 22.2 0.0 0.0 chrI 596 676 (15071747) C RCS5 Satellite (41) 1387 1307 3\n", " 1226 10.0 0.0 0.0 chrI 677 846 (15071577) + (GCCTAA)n Simple_repeat 42 281 (0) 2\n", " 432 21.9 2.4 0.0 chrI 1622 1744 (15070679) + LONGPAL1 DNA/MULE-MuDR 136 261 (2330) 4\n", " 8509 0.6 0.0 0.1 chrI 2052 3026 (15069397) + PALTTTAAA3 DNA 1 974 (529) 5\n", " 4521 1.1 0.2 0.2 chrI 3124 3652 (15068771) + PALTTTAAA3 DNA 974 1502 (1) 6\n" ] } ], "source": [ "import gzip\n", "import itertools\n", "fh = gzip.open(fn, 'rt')\n", "for ln in itertools.islice(fh, 10):\n", " print(ln, end='')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Above are the first several lines of the `.out.gz` file for the roundworm (*C. elegans*). The columns have headers, which are somewhat helpful. More detail is available in the [RepeatMasker documentation](http://www.repeatmasker.org/webrepeatmaskerhelp.html) under \"How to read the results\". (Note that in addition to the 14 fields descrived in the documentation, there's also a 15th `ID` field.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's an extremely simple class that parses a line from these files and stores the individual values in its fields:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "class Repeat(object):\n", " def __init__(self, ln):\n", " # parse fields\n", " (self.swsc, self.pctdiv, self.pctdel, self.pctins, self.refid,\n", " self.ref_i, self.ref_f, self.ref_remain, self.orient, self.rep_nm,\n", " self.rep_cl, self.rep_prior, self.rep_i, self.rep_f, self.unk) = ln.split()\n", " # int-ize the reference coordinates\n", " self.ref_i, self.ref_f = int(self.ref_i), int(self.ref_f)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can parse a file into a list of Repeat objects:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "def parse_repeat_masker_db(fn):\n", " reps = []\n", " with gzip.open(fn) if fn.endswith('.gz') else open(fn) as fh:\n", " fh.readline() # skip header\n", " fh.readline() # skip header\n", " fh.readline() # skip header\n", " while True:\n", " ln = fh.readline()\n", " if len(ln) == 0:\n", " break\n", " reps.append(Repeat(ln.decode('UTF8')))\n", " return reps" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "reps = parse_repeat_masker_db('ce10.fa.out.gz')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Extracting repeats from the genome in FASTA format\n", "\n", "Now let's obtain the genome for the roundworm in FASTA format. For more information on FASTA, see [the FASTA notebook](http://nbviewer.ipython.org/gist/BenLangmead/8307011). As seen above, the name of the genome assembly used by RepeatMasker is `ce10`. We can get it from the UCSC server. It's around 30 MB." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "('chromFa.tar.gz', )" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ucsc_site = 'http://hgdownload.cse.ucsc.edu/goldenPath'\n", "fn = 'chromFa.tar.gz'\n", "urllib.request.urlretrieve(\"%s/ce10/bigZips/%s\" % (ucsc_site, fn), fn)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "chrI.fa\n", "chrII.fa\n", "chrIII.fa\n", "chrIV.fa\n", "chrM.fa\n", "chrV.fa\n", "chrX.fa\n" ] } ], "source": [ "!tar zxvf chromFa.tar.gz" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's load chromosome I into a string so that we can see the sequences of the repeats." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "from collections import defaultdict\n", "\n", "def parse_fasta(fns):\n", " ret = defaultdict(list)\n", " for fn in fns:\n", " with open(fn, 'rt') as fh:\n", " for ln in fh:\n", " if ln[0] == '>':\n", " name = ln[1:].rstrip()\n", " else:\n", " ret[name].append(ln.rstrip())\n", " for k, v in ret.items():\n", " ret[k] = ''.join(v)\n", " return ret" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "genome = parse_fasta(['chrI.fa', 'chrII.fa', 'chrIII.fa', 'chrIV.fa', 'chrM.fa', 'chrV.fa', 'chrX.fa'])" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'gcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaaAAAATTGAGATAAGAAAACATTTTACTTTTTCAAAATTGTTTTCATGCTAAATTCAAAACGTTTTTTTTTTAGTGAAGCTTCTAGATATTTGGCGGGTACCTCTAATTTTGCCTGCCTGCCAACCTATATGCTCCTGTGTTtaggcctaatactaagcctaagcctaagcctaatactaagcctaagcctaagactaagcctaatactaagcctaagcctaagactaagcctaagactaagcctaagactaagcctaatactaagcctaagcctaagactaagcctaagcctaatactaagcctaagcctaagactaagcctaatactaagcctaagcctaagactaagcctaagactaagcctaagactaagcctaatactaagcctaagcctaagactaagcctaagcctaaAAGAATATGGTAGCTACAGAAACGGTAGTACACTCTTCTGAAAATACAAAAAATTTGCAATTTTTATAGCTAGGGCACTTTTTGTCTGCCCAAATATAGGCAACCAAAAATAATTGCCAAGTTTTTAATGATTTGTTGCATATTGAAAAAAACA'" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "genome['chrI'][:1000] # printing just the first 1K nucleotides" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note the combination of lowercase and uppercase. Actually, that relates to our discussion here. The lowercase stretches are repeats! The UCSC genome sequences use the lowercase/uppercase distinction to make it clear where the repeats are -- and they know this because they ran RepeatMasker on the genome beforehand. In this case, the two repeats you can see are both simple hexamer repeats. Also, note that their position in the genome corresponds to the first two rows of the RepeatMasker database that we printed above." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We write a function that, given a Repeat and given a dictionary containing the sequences of all the chromosomes in the genome, outputs each repeat string." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "def extract_repeat(rep, genome):\n", " assert rep.refid in genome\n", " return genome[rep.refid][rep.ref_i-1:rep.ref_f]" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'gcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaa'" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "extract_repeat(reps[0], genome)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'CCTGTGTTtaggcctaatactaagcctaag'" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "extract_repeat(reps[1], genome)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'cctaagcctaatactaagcctaagcctaagactaagcctaatactaagcctaagcctaagactaagcctaagactaagcct'" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "extract_repeat(reps[2], genome)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's specifically try to extract a repeat from the DNA/CMC-Chapaev family." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "chapaevs = filter(lambda x: 'DNA/CMC-Chapaev' == x.rep_cl, reps)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['cacggcccggcggggggtacatggatgagaattctctaccgtattccaatttggctgactgcgtgctcaacgttgaatactcagtgtaaactttcgtacaccgttgcgtactgcacagcgcgcattttaattgacgacatttagcaaaaattgaacataagatttttcggaattatgaagctcaattttcacaaaaataatgagttttttgtagaatttatgaaaaaacgtgaatatatagattttttgttcatgatattcaagaaaaagcgatttttagttcttcacagaggaatcctctcgcatttcacttgctcatgatgttttttgctccactttaggacgataaaaatgcgaattgttgataaaatgaatgaataatataaaaa',\n", " 'ggggctgctgaaaccaatgtcggcatgatgagagttccggtcttctgaatccatttcctgcgtgggctgtggcgacgagctgcacgtctgaaaatcaagtttttgtaatt',\n", " 'tttgggcgcatgatatggagctgaatcattcgattttagaatcagcatgcttttattcatattttaggatctttttaaaaaatctggaccaacagttttcgaaaaaatttaatttttgttcagaaatgtgaatattcactaaatcgaaaaaaataattgcaaaatccgtcagctgaacattcaaaacttatcaatttgaaatcagcatatttcagtgtataattaaaaaagtttcaaaaattctgagaccaatttttattgagaaaaataatttttcgctcgaattattgaattttcactaaatgcaaaaaacagtaaacttgggcccatgctacaagcctgaatctttcaaattaagaaccagcatgattttttcaatattctaggacgtttaaaaaaaatctggaccaacagtttttgaggaacgtaattttttatacaaaaatgttctgatttttcactaaactcaaaaaaatagtcaagttgggcccatgctgtacacctaaatcattaaaattcagaaccgccatgtattttttcttaccaaaggctctttaaaaaaaatctggaccaacagtttttgagatatttagaaaaacaactcacttttcgacgtttttcgccttttcgtggctcacccggttgatttttgcggcgatttgtggtctttcgctgaaaatattatttttatttcaattattaacgaagaaaacaagaaaaaacgacgagaaaacatcaaaaaaacgcgaaaaaacatcgaaaaaccaccgcaacctcatgaacaaaaaaaaagcattgcagccgcgggactagttttcgcaactttctaggccatgtcccgttcgccgtgccgtg',\n", " 'cacggcccggcggggggtacatggacgagaattctctaccgtattccaatttggctgactgcgtgctcaacgttgaatactcagtttaaagtttcgtacaccgttgcgtactgcacagcgcgcattttaattgacgaaatttcgcgaaaattaacagaagatttttttcggaattatagagctgaaattgaaaaaaaaactatcaaattttcatcgaatttgtgaaaaatcgtaagtatgaagatcttttcttcactatattcaaggaaaatcgatatttcgcttttcacagacgaatgatgtctcattttactcgatgaaagtttctgatgagctgtttttatcgatttttgagcgataaaaatgcgatttgttgataaaatggatcaattatataaagaaacaacatatattgctctgagattactttttgagaatcaattctttatttttcggtcattttaaattaagcattaaaataaaaatattagaaatcataataaaaaaaacagaaaatcgatatattactttttcttcggaatttcacgacttttttggacgaattttattctgtaaactttcttcttcgaatttgtgtccacgtggctttcagtcgaagaagattctgcagcactccttcttgcttgcccacaacttactcgaattttctaaaatttttaacttattgaaattgtcatttcacctttacactcacttcagctaaactattactgcatttcggaagttgataggatactggtggagcaacaagtggatggcttctagtgattggctggcttgtcgagcaagtttgtgtgattgcctgaaataatttttgatttcaattttgagttgatttaaagcagtgaacctaccaccgggttcggacgagaaagagcattactcggtagaccacggaatccaattttcgttgaattgcctccaaatgcaatagaagtttgtacgttttgtgagaagtcgggctgaaaattttcaaaatttgaaacttttcgagaaaaataaaaatctcaccacagcatttcgagattttgtcgattgtggaagccttttcctggagcgaaaattgattttttttttcgctaaattttttcttttttgggcagccgtgacgtcccgaataactgcttttgggtcccgaagatcattttgcgaagaaattggcagaactgttgcatcttttggtacgatggaaagaccgggaatggacgtgttctgaaatagttgtgtttttaagaatgcagaaatgtttttctgtaccaaaattaccatagtcatgtcattcatgatgttacgacacatgagctctctcagaacatggatgtaacgccttttcttgtcccggtaattgcaaaatctcctctcaagtgcattgaaaatcgcgtggacagattcaactccttgttctgtgatccttccaatgtttctcacatcttttgccatttgtggtgcatggtagaccaacaagtgcagctttaaaataattgtttcttcgggaaccgctactttcaaatcctccacaaatccgcgaatcgaattttgaagtattaagacgtcggaatcatttaaaaacttgtttcccgaaagtgacataatagttgaaagctttcccattgctgatttcaatccgagcaacattgggcataaatttgggccaaaaatgttgaaagtctcctctacaacagccggcgttagcagcaatttcaaatggtttccgcaaaatgattggaaccaagcctgcttgtccgctccaaacttagcccaacactgtcccattttttcaagtgttccttcgggagtaccattcacaattgtatcgagcaacaatttttccgattgaagtgctttcagttcagcatgcgactccaatttcatctttccggtggctccttgatacttttcttccgcacttttaattaggttaacagcgttttttagagttgcttttcgtgttttcaggataggaaaagaagtagtgttatccaaagtatcagaatatttccagaggggattgaagatatatttgtcaaaaatacccatgataatgtgcagaagaggaatcaaatagaacatgatcgcaacgtgtggcagaagtggagtacatcctttgcgaacacccaagtcgccattttcacaacaagctttgtaaagatcgattgttcgtgggtggaatgtttcatcaacattcatatccttgattttcatcctctcttcagctccccgtggattctgtgcaaaacatttgaagcagaaattgtgggatgaatgtccttggtgtccaagaatatcagattgaaacttgcaatctccagttgcaatttgcacaatttttgcggttttttgaactcctttgtccaaatatcaaattttcgttagcttgccaagctgctcaagaacgtccggaatgaattttttcagagacgaataattgtcggatccgtcatatactgcaattaccataacgtgtctcgaagaattcggtcgagatacgtttccgattaccaatgccaactttgtgcttccacctccagcgtcaccaacgactccaatcttgattactcctttcgtgtatccgtcgtccacaaattgatttgaattgcatagaagctctattcgataggctaaaacttctgcaattttcatgcactgcacaatggtaatcacttttcctttattgtcgaacgaagtggaaactttgaaactggagatcattgataactggattgacaaatctcttgtgttctttaccgatggaagcaaatcatagccaatggcattagtcaaatagtttttgattttttccatctgacttagagataatccgcattttgataaaaagtcaacggcctcaaagtttgaaagcttgtttttgtagctttgattctcttctgaattcaggaattttgtgaattttcgaataaattgtccgacgtcatcctcgaggcagatttcgtgttgaagcaagtgaagagctttgcgaaatcgatttttgatacaacttttgcttcttagattcgaaatattaactttaaaagctgattttttaaggttttcaacttcttcggcgtgtctttgtagactcagaaccatagctttgccacttttcttcacatctgcacagcttctcaccaatcgaccttctataccactgacgatcgttcgtatattgcatacttccatttgcagcgaagaattagatgctcttatagtgatattttcatggcggactatttgcatttcttccgaaaacaccgcaaactcatcaatccgcttttgtatttcttctgatatttcatttttttcatttttcagtcgttcgatcgttagtcggagcattttgatctgcggaatttgctcaacattggagattattcgaaccctcggtgtactgaacgagtttcgtaaaggtgtcggtggaaatacgggattggagaatctcagcaaaatcatataatattagttttgaaatattgaaaaaaattacattgtgagaaaaagtcggaatttcgtcactaaaatccatttccacgtctctcgtcagaattccttcatccatattgaaacaatttgacgacctgcatgtagttgcggagctactggaagcaatgtcgggatggtgggagtttcgatcttctgaactgatttcctgattagcctgtggcgacgagctgcacgtctgaaaatcacgtttttgaagttagaacaaactactccaacttaattaaagttgacaaaattgagctgaacgaacctccactttcgaattgttcagttcttcctcttcagtttgatcttttgaaactccattagcactgttccttgctctctgggcatttgctaaaagaaggcctgcacaagatttttcttttcttttttgtttgaagtatacttttgtcatctggaaatattgcatgaatattataagggaaacaatttttaaatatcgattttcacgaaatttgaaaaaatcaataatttgggcgcatgatattgagctgaatgtttcgaatttagaatcagcatgcttttattcatattttaggatctttttaaaaaatctggaccaacagtttttgaaaaaaaaatacttttcgttcagaaatgtactgattttccactgattttcacgaaatttgaaaaaatcaataatttaggcgcatgatattgagctgaatgttttgaatttagaatcagcatgcttttattcatattttaggatctttttaaaaaatctggaccaacagttttcgaaaaaattcaatttttgttcagaaatgtgaatattcactaaatcgaaaaaaataattgcaaaatccgtcggctgaacattcaaaacttatcaatttgaaatcagcatatttcagtgtataattaaaaaaggtttcaaaaattctgagaccaatttttgttgagaaaaataatttttcgttcgaattatcgatttttcacgaaatgccaaaaacagtaaacttgggcccatgctaaaagcctgaatctttcaaattaaaaaccagcatgattttttctatattctaagacgtttaaaaaaaatctggaccaacagttcttgaggaaagtaattttttatacaaaaatgtgctgatttttcactaaattcaaaaaaataatcaagttgggcccatgctatacacctaaatcattaaaattcagaaccgccatgtatgtattttttcataccataggctctttaaaaaaaatctggaccaacagtttttgagatatgtcaaaaaaaacaactcactttttgacgtttttcgccttttcgcggatgatgcggtcgatttttgcggcgatttgtggtctttcgctgaaaatattatttttatttcaatttttaacgaagaaaacaagaaaaaacgacgagaaaacatcaaaaaacacgaaaaaaacgtcgaaaaactcccgcaacctcatgaaaaaaaataaagcactgcagccgcgggactagttttcgcaactttctaggccatgtcccgttcgccgtgccgtg',\n", " 'acgtggctgaagaaatttctacagtagtcccatttggctgactgaatattcaacgcgaataagttttgtacactattgcgtactttgcgtacgcgcattttatttgacgacaattcgtcaatatcagc',\n", " 'aattcctaaattttttattaaaatcgaaaaaaaaaaatgaaatacgtgagattgagtttcgagacttttttattcagaatcagcatatatttctccatatttgagtaggttttcagaaatattgtaccataatttttggaaaaatgtaatttttaattcgaaattgcactgaatttctcgaatttttcactaaaatcgagaaaataaatatgaaatacgcgagattgaggttcaagactttttaattcggaatcagcatatatttttccatatttgagtagattttcagaaatattgtaccataatttttcgagatattttgaataataacttacttttcgacgttttttgcctttgtccggtttaatccatcgaatttcgaagcggtttgcgtagattagctgaaaacattatgcttattccacgtagtaacaagaaaaaacaagaaaaaataagaaaaaacgaagaaaa',\n", " 'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcagttaata',\n", " 'cacggtatcacaaaaactagatctctcgtaaaatttgagaaagatctcgcaggtacgcagtgaaatggtccgcaatgtgtcatcgcggtgtttgcgtacttgcgtaccgtagtccgcaaaacattgcagcggcaaatagatttttgaagcaaattttagcagaaaaaaggcagaattaatagtttcaaggtgaaaaaaaaaataaaaaatagattttattattcaatttagtagctaacaattgagaattgaattattgaaacacagaaaaattaattttgaacagtaaaacaaagaaataatcgaaacggaagattgaaaattgaaaaatacatacaaatcgattcaaaaaacgaatttaatatgtggaatcagcctccttcttcgctttacggatgcaagttgaaatttttcttggtaaaactttcgaaaataaggaaacccggtgaaatttcttaattgcctcgactcctggtgaactttttgagataaatccatcgaaattttgctgagtagctcattgacaaggtttcaatctgaaatttcataaaatcaattattttcgaataattttaatcccaacaaaccgaaggaaatcctgaattttagcttttcgatagaatcctaaggtgtcacatcgacaattttccaagttgaaagaaaccatgtggagcatctccgcttataatatcctcgacaaaatattgatgagaaatccatcaaatttgcaccgagtatctgcttgtcaatgtttttacctgaaatttaatgaaattaataattttttaataattttaagcacaactaaccatagaaaatcccgaattttatgatattgaggaaatcccgaaattttaattgcgaaaaggttcagaattgtgtgagggagagcgctgggtggtttattgggaagacgaggcgtcctgcaaagaaaatgttacgatttgtttttttgaaaggttccacttacttcctccatatcagaaattagattgctacatgacagctccttagcaatgtacaggtatcttttccctgtgtttctaaccgagtggaagcgtctttcgaggcgattgaaaacggcatgaagcgattcgattccttgctccgaaagccttcccaaattcctttccgtttttgcgacttctactacgtgagcaaccaggacatgcaatttttgagtgattgattcttctgggtgggcagcttgtaggaattcaacgaattccttcatcgactcatccaattcgctgatgtcatcgtcagagagcaatcgattggccgacaatgacatgattttggacagcctgctcatcgcgttcttgacattcagaagcatcggagtcatgtggtttttcagatttttgaatgcagctgtgacacctttctcagacaaaatcagcttcgtgtgatttccagtgtacatttgaaaccacgctcggcgtgttgcaccaacttcttccagatcattctcaaattgtttaagatatccaccagcaagtccatccagcgtctcgtccaataacactttttcttctttcagagcagtaaactgtgctttcatctctctttttctcttaagtggtgaagcttcgaattttttgttcgcgtcttgaatcttcaaatcggcaactctttttgtctcttctttatttcttcgaatttcaaatgatgttttattgtctaaactcacaacggccatccaaatcggctcaaagatgtatttcgtgaacagtccgacaatcaagtgtagcattgcaggcaagtagtgttccaacttgacattttgaagaattggtccacttccgcatctgacgccaaagctaccgtatttagaattgagcttgtaagaattcattgttttcaagaaatatatcttttgcaagttcagatctttaagcttttgcatcagtcctcctcttggattggtttcgaaacaaaacgggcaaaaataggtggcacattgttttttatgtgacagtaaatcacatgtaaacttaaaatcaccgactactttttggacaacgccacgtgtcaccacccttccatcttccatataggtgatgctggtgaagttgttgatcttcacgataagatcagacaggtaggccatgataagttcccgcgaatcagagtcatcaaaaacagcgagaaggacgattcggtgcggcgagttcgcatgatcacaatttccgatcagaagacaaagtttcgtcgttcctcctcccgaatctccaccaattccgataacaattttgccatcggtataggaatcatggcgtaattgttttgatacagacaaacgctccagcttcggaatgacacttttctcgacatcgataattttgacaacggtcaccttttctccttttgatatatatgttgtggccttataattgtcgatagttgacattctcgttttccgttgcatcgtcaaattaatagttggcatgatttcaagatcggtgaacatcttatagttttgcttcacccgccgcaattgattgttagagaatctacatttttcttgaaaaataatcgtttgccaatgcgtcaattggacttggaaggactgcgaatcattttgttggagatatttctggaaatcaatcatgaaattacgaatatcctcttctcgactgattcgttgaagcaaatccagtgccaaatctatcctagattttccagagaacgattccttcttcgcaactctgatatccatattattctgtactttttcgcccttgacgtaatttgagcgatttgttttcttaatatcagattttcgttggttttccgttttgatgtttttcttattgaaattatctctttcctttgcatgaaaaattcatcagaaaaaacggcaaactcgtcgtctctctccatcaaccgatttagaagtgtctcttcgtcaatgtctgttcttggattgagctggaccgaatggtttgagctggacaggatggcggttctgaaatggaaaattgaataaacaatcaacaacaaagaaataaatctacctctgatatgctatccaaaatggaatcaacttctagaaatctcattcgttctctactcaagtcttctctaattcccttcacatatggtgtctcattcgaaaacacgccgacgctgtcactcgatgaatagagtggacttgacgatctttcaaaagagtgtggtgttcttggaggtgatgaagaagattccgcaatagaagaagttgatggttgtgtgtcgaacggctggaaaaattatttttaaaatattataatcgtcttaggatccgagttgtagcatggattaattcacttacaatatcagtttcaggtgttattgtcagattcaaatcgatattgcgtttcgactccacatccagattatcatcaaaatccacaaatggagacactcgtcttctctcaggagtgatctgcgcatctccgtttctgtcaagagcttctgtcaattctacattcgacaggtttaggtccagggaagcatttctttcagtatttgaattttcgtcttcttcatcttttttccatcgtcgtttccgagcatttagaagatttgaagaccatgctgattgatttgatggaggaggcatctgaaaaaaaaattttttactcaattttcagtgaaaaaaatttacttttggaaaattttaagtgaaaaatgtacgatttgtgtatatgtttgcctttatcttgtagagaatttttagctgattctaattcggcaataaaaataaaacattttgtgatttttttcgaaaaaaatcatttttcctccattttcagtgcaaaattttagtttcgaaaaaattttcttgaagaatgtacgatttctaaaaaattctctctgtatcttgcttaaaattaacagttctttctctttgaagcaaaaaactaacacattttgtgatttttttggcaaaaaaaattattttataaattcttattacaaaaaaaaattttttcgaaaaaattttaatgaaaaattaagatttctctatgaattgagttccatcttatacacaattaaattttgatcataaaacaactataaatcgtaaagagtttttgattttcttgaaaaagggaacgtttttaaaaactattttcagtgaaaaaaatttacttttggaaaattttaagtgaaaaatgtacgattcgtgtatatgtttgcctttatcttgtagagattttttagctgattctaattcggcaataaaaatataacattttgtgatttttttcgaaaaaaatcatttttcctccattttcagtgcaaaattttagtttcgaaaaaaaaattaatgaagaaggtacgatttctaaaaaattcggcccgtatcttgttcagaatttttagctctctctttttcaatcaaaaaaataaaatattttgtgattttttaagagaactcgtcaaaaaactcacttaattcgttttcctttctgcccacgccgacgctcctttgttttcctcgaattttttcgcttactacctgaaacaagacacactttttcgattttacaaaaaaaaataacaaaaaagatacggaaaaacggttaaaatcggcaaaaaacggagagaatcgatgccgagtgaaaggcttgaaatttaaacaattgttcgcaatagagcgtgtttgcctccatctagagattgaaccaccgtg',\n", " 'tgctgaaaattgctgaaaatcgaaatttcgtcagctgatgtcgattattctgcgcgggggtacggtacgcaagtccgcaaacactgtcacgccaaattgcgga',\n", " 'cacggtagcacagaaactagatctctcgtaaaatttgagaaagatctcgcaggtacgcagcgaaatggtccgcaatgtgtctcgcggtgtttgcgtacttgcgtaccgtagtccgcaaaacattgcagcggcaaatagatttttgaagcaaattttagcagaaaaaaggcagaattactagtttaaagtgaataaaattaaaaaaaaaagattttattattaaatttatcaactaacaattcaaaattaatgtattaaaacacagaaaagttgattttgaacagaaaaacggagtaatcatttaaaaagacaatattaaagtgaaaaaacacgcaaatcgattgaaaaaacgaatttaatatgtggaatcggccttctttttcgcttttcggctgcaagttggaatttttcttgggaaaaactttcgaaaatgaggaaatcagtgggaacttcttaatttcctcgactcctggtgaactttttgtgttaaatccatcgaaattgtgcggaatagctcattgacaaggtttcaatctgcaatttgatgaaattctatatttttaaattattttaatcacaacaaacctgaggaaatcccgaattcgatcctttcgataaaatccagagatgtcactttgccacttttccaaaatgaaggaaaccatgtggagcttctcagctttttgagttcctggccgaaatcatgatgaaaaactatcgaaattgacttgagcagcttcttggcaaggtttttgtctgaaattttaagattttaatgatttttgaacgtttttaacacaacgaaccaaaggaaatcctgaatttcacatttctgactgtttcctgggatgttacatcggcagttttccaaaatgaaggacatcatgtagagcatctccacttattgaaattctggtgaagttcttgccgacaaatccatcgacattacgttgaacgtcttctaggcaaggtgtttatctgaaaattcatga',\n", " 'taagcagtttttgaaaagttttcgaaaaaaaAAAGAATTTCCGTTTTTTGAGATttaattttcagtgaaaaaaatttacttttggaaaatttcaagtgaaaaatgtacgattcgtgtatatgtttgcctttatcttgtagagaatttttagctgattctaattcggcaataaaaatataacattttgtgatcgttttcgaaaaaaaaatctttttctttatttttagtgcaaaattttagtttcgataatttttctatgaagaatgtacgatttctagaaaattctgcctgtatcttgctcaaaattaacagttctttctttttaaagcaaaaaattaacacattttgtgattttttggcaaaaaaaattattttataatttcttatttcaaaaaattttttttcgaaaaaatcttaatgaaaaattaagatttctctatgaatttagttccatcttatacaaaatttaatgctgatcataaaacaactataaaatgtgaagactttttgattttcttgaaaaatggaacgtttttaaaaactgttttcagtgaaaaaaatttacttttgaaaaattgtgaaaattgaaaaattgtgaataagtgaaatatgtacgatctctcaataattttgtcttcatcttgtagagaattgttagctgtttctgattcggcaagaaaaatacaacattttgtgatcgttttcgaaaaaaaaaatttttttcttaatttttagtgcaaaattttagtttcgaaaaaaaatttatgaagaaggtacgatttctagaaaattctgctcgtatcttgttcagaatttttagctctttctttttcataccaaaaaataaaatattttgtgattttttaagagaactcgtcaaaaaactcacttaattcgttttccttgccgcccacttcgacgttcctttgtttttctcgaattttttcgcttactacttgaaacaagacacactcttttgattttacaaaaaaaaattacaaaaaagatacggaaaaacggttaaaatcggcaaaaaacggagagaatcgatgccgagcgaaaggcttgaaatttaaacgattgttcgcaatagagcgtgtttaccgccatctagcgactgaaccaccgtg',\n", " 'cacggtggttcagttgctagatgggtgcaaacgcgctccaccgaacaa',\n", " 'cacggcccggcgaaagagacgtggccgcgagagctgcgccggctaggccaccgcctcctatggttaagatttttgaacgaataaacatttttaatttggctgctaagctcatttatctttgttttttctcgttttttctcatttttatcgataaaaatatattttttgttgcagaaaatcacaaaaccgcggcaaaacagcactcaaccgccaactgggaggaggaaaatccgaaaaaagagtttttt',\n", " 'tgcgaaaaactgtttaaagtatcgattttcttcaatatcagcaacatacaatcctttaaaatgattattttttgtaaattcgataaaaattcatttatttttcacaacttctgcccgaaaattaccgaaataaccagcgtttctataactaagaaagtgtcgtcaattaaaatgccgcgtccgcaaaatgtcgtacgaaacttttcgctgagtatcaaacgttgaatattcagtcagccaaattttactacggtagagattttacagccacgtacggttcgccgggccgtg',\n", " 'ttgttcggtggagtgcgtttgcacccatctagcaactgaaccaccgtg',\n", " 'tttttcgtgtttttttatgtttttttatgttttttcgtcgttttttcttgttttcttcgttaataattgaatttaaaataatattttcagtaaaaggacttaaatcgccgcaaaaatcgaccgcgtgagccgcgaaacggcggaaaacgtctaaaagtgagttgtttt',\n", " 'aaacatgtttgttcggaattagaaagatttgagactcaagctcgcgtatttcatatttattttttcgattttagcgaaaactcaacacattttcgatcatttttgaacaaaaaaattgttttctgaaaaatttgacgcttaatttttttgaaactgcactcaaatatgaagaaaagcatgattaaccagaaaaagtaacatttgagacccaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacactttttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttattttttttgaaactgcactcaaatataaagaaaaacatgcttgttctgaattagtaagaattgcgactcaagctcgcgtatctcatatttatttgttcgatttcagtggaaaatcgacacatttttgggaaaatttttttttttcgaaatattcgctcctaatttttaatgatttccagatgaca',\n", " 'tctctatattattcattcattttatcaacaaacttctatcgccctaacgtcgatcaaaaaagctcatcagcaactgccgtcgagt',\n", " 'tctcgtcgagtgaaatgcgatagaatttgtctgtgaaaaaccaaatatcgattttccttgaatatcgtgaagaacaaatcttcatacttacgattcttcacaaattcgatgaaaatctgatagttttttcaattttagctctataattccgaaaaaaatcttctgttaattttcgcgaaatgtcgtcaattaaaatgcgcgctgtgcagtacgcaacggtgtacgaaagtttacactgagtattcaacgttgagcacgcagtcagccaaattggaatacggtagagaatcctcgtccatgtaccccccgccgggccgtg',\n", " 'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcaattaata',\n", " 'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcaattaata',\n", " 'ttgttcggtggagcgcgtttgcacccatctagcaactgaaccaccgtg',\n", " 'cacggtggttcagttgctagatgggtgcaaacgcgctccaccgaacaa',\n", " 'gttttttcttattttttcttgtttttttttgttactacgtggaataagcataatgttttcagctaatctacgcaaaccgcttcgaaattcgatggattaaaccggacaaaggtaagaaacgtcgaaaagtaagttattattcaaaatatctcgaaaaattatggtacaatatttctgaaaacctactcaaatatggaaaaatatatgctgattccggattgaaaagtcttgaacctcaatctcgcgtatttcttattgagtttctcgattttagtgaaaaattcgagaaattcagtgcaatttcgaattaaaaattaaatttttccaagaattatggtacaatatttctgaaaacccactcaaatatggaaaaatatatgctgattctgaattaaaaagtcttgaacctcaatctcgcgtatttcatatttagtttctcaattttagtgaaaaattcgagaaattcagtgcaatttcgaattaaaaattaaatttttccaaaaattatggtacaatatttctgaaaacccactcaaatatggaaaaatatatgctgattctgaattaaaaagtcttgaacctcaatctcgcgtatttcatatttagtttctcaattttagtgaaaaattcgagaaattcagtgcaatttcgaattaaaaattaaatttttccaaaaattatggtacaatatttctgaaaacccactcaaatatagaaaaatatatgctgattctgaattagtaacatttgaaactcaatctcacgtatttcataatttttttttcgattttaatgaaaaatatattcagtgcaattttcgattttttttcaaaa',\n", " 'gctgatattgacgaattgtcgtcaaataaaatacgcgtacgcaaagtacgcaatagtgtacaaaacttattcgcgttgaatattcagtcagccaaatgggactactgtagaaatttcttcagccacgt',\n", " 'tgtgggctacggtagtcaagtacgcaaacaccacgagcattttcacaattgcgtacaaaatttttttcaagcttt',\n", " 'tattaattgctaaaatttatgtggactacggtagtcaagtccgcaaacaccacg',\n", " 'ttttgaaaaaaagttactttttgttcgaaaatgtattgattttcacttattttcactagattcaaaaaaa',\n", " 'ttggacccatgctatactcaacatttttttggaattctgaatcagcattctcttcataaattagacaatttctaaaaaatctggaccaa',\n", " 'ttttccggtgatttgctaaaactataatttctatttcaattattaaccgagaaaaccagaaaaaa',\n", " 'ttgttcggtggagcgcgtttgcacccatctagcaactgaaccaccgtg',\n", " 'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcagttaata',\n", " 'tattaattgctaaaatttatgtggactacgatagccaagtccgcaaacaccacg',\n", " 'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcagttaatacttcccggtttcgttttacatgataatatcgttgaatttaagcaaaaaatgcagcattagtttcatgaaaaaaaaaataagaga',\n", " 'atttttcctcaaaaactagaatttttcattgaaaaatggcttaaaaatcgatttttgttcgaaaaaa',\n", " 'aaggttttcatcgataaagtcacgaatttgtcgaaatgctttggtgagttttatctttttcagaaaaaaaattcgaaaattttcag',\n", " 'cacggtggttcagttgctagatgggtgcaaacgcgct',\n", " 'attgttcggtagagcgcgtttgcactcatctagctgctgtaccaccgtg',\n", " 'agaaatttctacagtagtcccatttggctgactgaatattcaacgcgaataagttttgtacactattgcgtactttgcgtacgcgcattttatttgacgacaattcg',\n", " 'ttttggaaaaatttaatttttaattcgaaattgcactgaatttctcgaatttttcactaaaatcgaaaaaaaatatatgaaatacgcgagattgaggttcaaggcttttcaattcggaatcagcctatatttttccatacttcaatgggttttcagaaatattgtatgtgaatttttggaaaaatgtgatttttaatttgaaattgcactgaatttcttgaatttttcaataaaatcgagaaaataaatatgaaatacgcgagattgaggttcaagactttttaattcggaatcagcatatatttttctatatttgagtgggttttcagaaatattgtatgtgaattttttgagaaaa',\n", " 'tttttcgaaaagtgttggaccagatttttttgatcttagaatatgaataaaagcatgctg',\n", " 'caaggcccggcaaaccgtacgtggctgcaaaatctctaccgtagtaaaatttggctgactgaatattcaacgtttgatactcagtgaaaagattcgtac',\n", " 'cacggtggttcagtcgctagatggcggtaaacacgctctattgcgaacaatcgtttaaatttcaagcctttcgctcggcatcgattctctccgttttttgccgattttaaccgtttttccgtattttttttgttatttttttttgcaaaattgaaaaagtgtgtcttgtttcaggtagtaatcgaaaaatttcgaggaaaacgaaggaacgtcaccgtgagcagcaaggaaaacgaattcagtgagttttttgactatttctctctgaaaaaattaaaaaatgtattactttttgattgaaaaagaaagagcaaaaaattctgaagaagatgtaacttaatttattcagaacaaaagttttttttgtctcgaaattacaatttttcgaccaaaaatagaaaatgatcattttctcgaaaatgaatcccaaatttatgtcagttttttatgaaaaaataatcagcgtaaacttctgtactagaaatactaccttttttttggattgaaagtttgtatcgtgctcaaaattttaaaagtaaaattattttacgctgaaaaatgcaaaatcataactttttttgaggagaaacccccaaatgttatcgattttgttatcaaattgaactcagcttaaaattatctacaagatgaagctaaattcattgaga',\n", " 'atattaattaagttttttgtatcaaattgtgtgttttctcaattttatattgcctttttatttcataactttccttttctgttcaaaatcaacttttttttgtgttttaacacttcaattatcaattgttagtttataaatttcataataaactctgattttttattttttttcatcttgaaactattaattctgctgtttttctgctaaaatttgcttcaaaaatctatttgccgcttcattgttttgcggactacggtacgcaagtacgcaaacaccgcaacgacacattgcggaccatttcgctgcgtacgctgcgagatctttctcaaattttacgagagatctagtttctgtgctactgtg',\n", " 'cacggcccggcgaaccgtacgtggctgtaaaatctctaccgtagtaaaatttagctgactgaatattcaacgtttgatactcagcgaaaagtttcatacgacattttgcggacgcggcattttaattgacgacactttcttagttatagaaacgctggttatttcggcaattttcgggcagaaattgtgaaaaataaatgaatttttatcgaatttacaaaaaataataattttaaaggattgtatgttgctgatattgaagaaaatcgatactttaaacagtttttcgca',\n", " 'aaaaaaaactcacttttttcgaattttcctccttccagttggcggttgagtgctgttttgccgcggttttgtgattttctgcaacaaaaaaaatatttttatcgataaaaatgagaaaaaacgagaaaaaacgaagataaatgagcttagcagtcaaattaaaaatgtttattcgttcaaaaatcttaaccataggaggcggtggcctagccggcgcagctctcgcggccacgtctctttcgccgggccgtg',\n", " 'ttttcttcgttttttcttattttttcttgttttttcttgttactacgtggaataagcataatgttttcagctaaactacgcaaatcgcttcgaaattcgatggattaaaccggacaaaggcaaaaaacgtcgaaaagtaagttattattcaaaatatctcgaaaaattctggtacaatatttctgaaaacccactgaaatatggatatgctgatttcgaattaaaaagtcttgaacctcaatctcgcgtatttcatatttattttctcgattttattgaaaaatttgagaaattcagtacaatttcgaattaaaaattaaatttttccaaaaattctggtacaatatttctgaaaacccactgaaatatggaaaaatatgctgatttcgaattaaaaagtcttgaacctcaatctcgcgtatttcatatttattttctcgattttattgaaaaattcaagaaattcagtgcaatttcaaattaaaaatcacatttttccacaa',\n", " 'gctgatattgacgaattgtcgtcaaataaaatgcgcgtacgcaatgtacacaatagtgtaaaaaactttatcgcgttgaatattcagtcagccaaatgggactactgtagaaatttcttcagccacgt',\n", " 'acgtggctgaagaaatttctacagtagtcccatttggctgactgaatattcaacgcgaataagttttgtacactattgcgtactttgcgtacgcgcattttatttgacgacaattcgtcaatatcagc',\n", " 'aaaatgttgaagagaaaccagagaaattgatcgagtagattcttggcaagttttgaaattatatggttttaataagttttgaacatttttaaatacaactaaccatgga',\n", " 'ttgtttggtggagcgcgtttgcaccaatctagcaactgaaccaccgtg',\n", " 'tttttttaaatttttttcttggctgctttactgatgtttttttctcaattttttcttgttttctttgttactaatttaaattaaaaaaactattttcagcttatcacagcaaatcggagcgaaactcgaccgcgataacaggaaaaagtcgaaaagtgagttttttgccaaaatatctcgaaaaactcatattttgttttgaaaacagatgcaaataaaaagaaatacat',\n", " 'atgtattaattgctaaaatttatgtggactacggtagtcaagtccgcaaacaccacg',\n", " 'tttcgtaatgtttttttcgagtttttgattgttttttctcatttttttttgtttttctttattattagttaaaatataaaaactattttaagctaatcaacgcaaatcgaggcgaaaaccgatcgcagaaagaggaaaagtcgaaaagtgagtttttttgcaaaaatatttca',\n", " 'acgtggctgaagaaatttctacagtagtcccatttggctgactgaatattcaacgcgaataagttttgtacactattgcgtactctgcgtacgcgcattttatttgacgacaattcgttaatatcagc',\n", " 'ttcattaaaatcgaaaaaaaaattatgaaatacgtgagattgagtttcaaatgttactaattcagaatcagcatatatttttctatatttgagtgggttttcagaaatattgtaccataatttttggaaaaattaaattttaattcgaaattgcactgaatttctcgaatttttcactaaaatcgagaaactaaatatgaaatacgcgaaattgaggttcaagacttttaattcagaatcagcatatatttttctatatttgagtgggttttcagaaatattgtaccataatttttggaaaaatttaatttttaattcgaaattgcactgaatttctcgaatttttcactaaaatcgagaaactaaatatgaaatacgcgagattgagattcaagacttttaaattcggaatcagcacatatttttccatatttgagtaggttttcagaaatattgtaccatattttttcgagatattttgaataataacttacttttcgacgttttttgcctttgtccggtttaatccatcgaatttcgaagcggtttgcgtagattagctgaaaacattatgcttattcca',\n", " 'gcaaacatactcttttgcgaataagcgatttttttgttttttttttttggtgttttccgttttttgcttgttttcaccgtttcccctctttttttttgtttttttttgtcaaatcgagaaagagtgtgtttttttttcaggtgttaaacaagatttgcgagcaaaacgagggcacaccatcgtaagaagcgaagaaaacgagaaaagtgagttttttgaagattcctctttaaaaaatagggaaatgttttagttttgagccaaaaaagaaagagctgaatttttcaaacaagatacatgc',\n", " 'cacggcccggcgaaccgtacgtggctgtaaaatctctaccgtagtgaaatttggctgactgaatattcaacgtttgatactcagcgaaaagtttcgtacgacattttgcggacgcggcattttaattgacgacactttcttagttatagaaacgctggttatttcggcaatttcgggcagaaattgtgaaaaataaatggatttttatcgaatttacagaaaataattatttgaaagtattgtatgttgctgatattgaagaaaatcgatactttaaacagtttttcgca',\n", " 'aaaaaactcacttttttcggattttccgcctcccagttggcggttgagtgctgttttgtcgcggttttttaattttctgcaacaaaatgtatatttttatcgataaaaatgagaaaaaacgagaaaaaac',\n", " 'ttgttcggtggagcgcgtttgcacccatctagcagctgaaccaccgtg',\n", " 'cacggtggttcacttgctagatgggtgcaaacgcgctccactgaacaa',\n", " 'tattaattgctaaaatttatgtggactacggtagtcaagtccgcaaacaccacg',\n", " 'gcgaaattctgcattttgtcgtgagatccgcggtgtttgcgtacttctggggctaccgtaacccggaaaa',\n", " 'tcgagttttacattgaaaaaaaatggccaaaaatcggagaaaaatgggcaaaaaacggagagaattgatgacaaatcaaag',\n", " 'tcgagttttacattgaaaaaaaatggccaaaaatcggagaaaaatgggcaaaaaacggagagaattgatgacaaatcaaag',\n", " 'ccttaaaaggaagaaatttggtggaaaaatacaattttcgctctaaaaaattccgtaaattcgagaatttatgaaaaatactttggttttttat',\n", " 'gcaaaattctgcaatatgtcgtcaaattcggtgtttgcgtattttcgacgctaccgtaccccgcggaa',\n", " 'ttttcttcgttttttcttattttttcttgttttttcttgttactacgtggaataagcataatgttttcagctaatctacgcaaaccgcttcgaaattcgatggattaaaccggacaaaggtaaaaaacgtcgaaaagtaagttattattcaaaatatctcgaaaaattatggtacaatatttctgaaaacctactcaaatatggaaaaatatatgctgattccggattgaaaagtcttgaacctcaatctcgcgtatttcttattgagtttctcgattttagtgaaaaattcgagaaattcagtgcaatttcgaattaaaaattaaatttttccaaaaattatggtacaatatttctgaaaacccactcaaatatggaaaaatatatgctgattctgaattaaaaagtcttgaacctcaatctcgcgtatttcatatttagtttctcaattttagtgaaaaattcgagaaattcagtgcaatttcgaattaaaaattaaatttttccaaaaattatggtacaatatttctgaaaacccactcaaatatagaaaaatatatgctgattctgaattagtaacatttgaaactcaatctcacgtatttcataatttttttttcgattttaatgaaaaatctaagaattcagtgcaattttcgattttttttcaaaa',\n", " 'gctgatattgacgaattgtcgtcaaataaaatgcgcgtacgcaaagtacgcaatagtgtacaaaacttattcgcgttgaatattcagtcagccaaatgggactactgtagaaatttcttcagccacgt',\n", " 'actacggtgcgcaagtactcaaacactgcgacgtcagagcgcagac',\n", " 'tttagacgtatttttcttttctctgctcttatgatcgattttcgcagaggtttttgattatccggtaaatattactagttattctaatttttcattaaaaaattacatcgaaaataacgaaaaaacatcgaaaaacgcgaaagatcaacgaaaccaattcatgaattaattcgaatttataattcagtacaaaagcgattcggtcgcgggactagattttgcaacttcctaggccatttccaatttgcagtgc',\n", " 'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcaattaata',\n", " 'ggtggttcagttgctagatgggtgcaaacgcgctccaccgaacaa',\n", " 'ttgttcggtggagctcgtttgcacccatctagcaactgaaccaccgtg',\n", " 'cacggtgcttcagttgctagatgggtgcgaacgcgctccaccgaacaa',\n", " 'ttgttcggtggagcgcgtttgcacccatctagcaactgaaccaccgtg',\n", " 'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcaattaata',\n", " 'gcatggggcgtggccgaaaattctctactaccgtttaccaatttggctaatttgccaatcaacgttgaaaagttttgtacatcg',\n", " 'cacggcccggcggggggtacatggacgagaattctctaccgtattccaatttgactgactgcgtgctcaacgttgagtactcagtttaaagtttcgtacaccgttgcgtactacacagcgcgcattttaattgacgacatttcgcgaaaattaacagaagattttttcggaattatagagctgaaattgaaaaaaaactatcaaattttcatcgaatttgtgaaaaatcgtaagtatgaagatcttttcttcactatattcaaggaaaatcgatatttagcttttcacagacgaatgatgtctcattttact',\n", " 'gcctcattttactcgatggaagtttctgatgagctgtttttatcgatttttgagcgataaaaatgcgatttgttgataaaatggataaattatataaagaaacaacatatattgctctgagattactttttgagaatcaattctttatttttcggtcattttaaattaagcattaaaataaaaatattagaaatcataataaaaaaaacagaaaatcgatatattactatttcttcggaatttcacgacttttttggacgaattttagtctgtaaactttcttcttcgaatttgtgtccacgtggctttcagtcgaagaagattctgcagcactccttcttgcttgcccacaacttgctcgaattttctaaaatttttaacttattgaaattgtcatttcacctttacactctcttcagctaaactattactgcatttcggaagttgataagatactggtggagcaacaagtggatggcttctagtgattggctggcttgtcgagcaagtttgtgtgattgcctgaaataatttttgatttcaattttgagttgatttaaag',\n", " 'gatttaaagcagtgaacctaccatcgggttcggacgagaaagagcattgctcggtagaccacggaatccaattttcgttgaattgcctccaaatgcaatagaagtttgtacgttttgtgagaagtcgggctggaaattttcaaaatttgaaacttttcgtgaaaaataaaaatctcaccacagcatttcgagattttgtcgattgtggaagccttttcttggagctaaaattgattt',\n", " 'tacgatggaaagaccgggaatggacgtgttctgaaatagttgtgtttttaagaatgcataaatttttttctgtaccaaaattaccatagtcatgtcattcatgatgttacgacacatgagctctctcagaacatggatgtaacgccttttcttgtcccggtaattgcaaaatctcctctcaagtgcattgaaaatcgcgtggacagattcaactccttgttctgtgatccttccaatgtttctcacatcttttgccatttgtggtgcatggtagaccaacaagtgcagctttaaaataattgtttcttcgggaaccgctactttcaaatcctccacaaatccgcgaatcgaattttgaagtattaagacgtcggaatcatttaaaaacttgtttcccgaaagtgacataatagttgaaagctttcccattgctgatttcaatccgagcaacattgggcataaatttgggccaaaaatgttgaaagtctcctctacaacagccggcgttagcagcaatttcaaatggtttccgcaaaatgattggaaccaagcctgcttgtccgctccaaacttagcccaacactgttccattttttcaagtgttcctccgggagtaccattcacaattgtgtcgagcaacaatttttccgattgaagtgctttcagttcagcatgcgactccaatttcatctttccggtggctgcttgatacttttcttccgcacttttgattaggttaacagcgttttttagagttgcttttcgtgttttcaggatagggaaacaagtagtgttatccaaagtgacagaatatttccagaggggattgaagatatatttgtcaaaaatacccatgataatgtgcagaagaggaatcaaatagaacatgatcgcaacgtgtggcagaagtggagtacatcctttgcgaacacccaagtcgccattttcacaacaagctttgtaaagatcgattgttcgtgggtggaatgtttcatcaacattcatatccttgattttcatcctctcttcagctccccgtggattctgtgcaaaacatttgaagcagaaattgtgggatgaatgtccttggtgtccaagaatatcagattgaaacttgcaatctccagttgcaatttgcacaatttttgcggttttttgaactcctttgtccaaatatcgaattttcgttagcttgccaagctgctcaagaacgtccggaatgaattttttcagagacgagtaattgtcggatccgtcatatactgcaattaccataacgtgtctcgaagaattcggtcgagatacgtttccgattaccaatgccaactttgtgcttccacctccagcgtcaccaacgactccaatgttgattactcctttcgtgtatccgtcgtccacaaattgatttgaattgcatagaagctctattcgataggctaaaacttctgcaattttcatgcactgcacaatggtaatcacttttcctttattgtcgaacgaagtggaaactttgaaactggagatcattgataactggattggcaaatctcttgcgttctttaccgatggaagcaaatcatagccaatggcattagtcaaatagtttttgattttttccatctgacttagagataatccgcattttgataaaaagtcaacggcctcaaagtttgaaagcttgtttttgtagctttgattctcttctgaattcaggaattttgtaaattttcgaataaattgtccgacgtcatcctcgaggcagatttcgtgttgaagcaagtgaagagctttgcgaaatcgatttttgatacaacttttgtttcttagattcgaaaatttaactttaaaagctgattttttaaggttttcaacttcttcggcgtgtctttgtagactcagaaccatagctttgccacttttcttcacatctgcacagcttctcaccaatcgaccttctataccactgacgatcgttcgtatattgcatacttccatttgcagcgaagaattagatgctcttatagtgatattttcatggcggactatttgtatttcttccgaaaacaccgcaaacgcatcattctgcttttgtatttcttctgatatttcatttttttcatttttcagtcgttcgatcgttagtcggagcattttgatctgcggaatttgctcaacattggagattattcgaaccctcggtgtactgaacgagtttcgtgaaggtgtcggtggaaatacgggattggagaatctctgcgaaatcatataatataatattagttttgaaatattgaaaaaaattacattgtgagaaaaagtcggaatatcgtcactaaaatccatttccacgtctctcgtcagaattccttcatccatattgaaacaatttgacgacctgcatgtagttgcggagctactggaagcaatgtcgggatggtgggagtttcgatcttctgaactgatttcctgattagcctgtggcgacgaactgcacgtctgaaaatcacgtttttgaagttagaacaaactactccaacttaattaaagtagacaaaattgagctgaacgaacctccactttcgaattgttcagttcttcctcttcagtttgatcttttgaaactccattagcactgttccttgctctctgggcatttgctaaaagaaggcctgcacaagatttttcttttcttttttgtttgaagtatacttttgtcatctggaaatattgcatgaatattataagggaaacaatttttaaatatcgattttcacgaaatttgaaaaaatcaataatttgggcgcatgatattgagctgaatgtttcgaatttagaatcagcatgcttttattcatattttaggatctttttaaaaaatctggaccaacagtttttgaaaaaaaatacttttcgttcagaaatgtactgattttccactgattttcacgaaatttgaaaaaatcaataatttgggcgcatgatattgagctgaatgtttcgaatttagaatcagcatgcttttattcatattttaggatctttttaaaaaatctggaccaacagttttcgaaaaaattcaatttttgttcagaaatgtgaatattcactaaatcgaaaaaaataattgcaaaatccgtcggctgagcattcaaaacttatcaatttgaaatcagcatatttcagtgtataattaaaaaagatttcaaaaattctgagaccaatttttgttgagaaaaataatttttcgttcgaattatcgatttttcacgaaatgccaaaaacagtaaacttgggcccatgctaaaagcctgaatctttcaaattaaaaaccaacatgattttttctatattctaagacgtttaaaaaaaatctggaccaacagttcttgaggaaagtaattttttatacaaaaatgtgctgatttttcactaaattcaaaaaaatagtcaagttgggcccatgctatacacctaaatcattaaaattcagaaccgccatgtattttttcataccataggctctttaaaaaaaatctggaccaacagtttttgagagatgtcaaaaaaacaactcacttttcgacgtttttcgtgtttccccggatgatgcggtcgatttttgctgcgatttgtggtctttcgctgaaaatattatttttatttcaatttttaacgaagaaaacaagaaaaaacgacgagaaaacatcaaaaaacacgaaaaaaacgtcgaaaaactcccgcaacctcatgaaaaaaaataaagcactgcagccgcgggactagttttcgcaactttctaggccatgtcccgttcgccgtgtcgtg',\n", " 'aaaaaaactcacttttcgactttttcctgtttctgcgatcgggttttgcgtcgatttgtggtaattagctgaaaatataaactatagtttttatattttaactattaataaagaaaacaagagaaaagtgagaaaaaacaatcaaaaactcgaaaaa',\n", " 'tattaattgctaaaatttatgtggactacggtagtcaagtccgcaaacaccacg',\n", " 'TGCGAAAAACTGTTTAaagtatcgattttcttcaatatcagcaacatacaattctttaaaatgattattttttgtaaattcgataaaaattaatttatttttcacaatttctgcccgaaaattgccgaaatgaccagcgtttctagaactaaaacaagtgtcgtcaattaaaatgccgcgtccgcaaaatgtcgtacgaaacttttcgctgagtatcaaacgttgaatattcagtcagccaaattttactacggtagagattttacagccacgtacggttcgccgggccgtg',\n", " 'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcaattaata',\n", " 'cacggtggttcagttgctagatgggtgcaaacgcgctccaccgaacaa',\n", " 'cacggtggttcaatcgctagatggaggcaaacacgctctattgcgaacaattgtttaaatttcaagcctttcactcggcatcgattctctccgttttttgccgattttaaccgtttttccgtatcttttttgttatttttttttgtaaaatcgaaaaagtgtgtcttgtttcaggtagtaagcgaaaaaattcgaggaaaacaaaggagcgtcggcgtgggcagcaaggaaaacgaattaagtgagttttttgacgagttctcttaaaaaatcacaaaatattttatttttttgattgaaaaagagagagctaaaaattctgaacaagatacgggccgaattttttagaaatcgtaccttcttcattaattttttttcgaaactaaaattttgcactgaaaatggaggaaaaatgatttttttcgaaaaaaatcacaaaatgttatatttttattgccgaattagaatcagctaaaaaatctctacaagataaaggcaaacatatacacgaatcgtacatttttcacttaaaattttccaaaagtaaatttttttcactgaaaatagtttttaaaaacgttccctttttcaagaaaatcaaaaactctttacgatttatagttgttttatgatcaaaatttaattgtgtataagatggaactcaattcatagagaaatcttaatttttcattaaaattttttcgaaaaaatttttttttgtaataagaatttataaaataattttttttgccaaaaaaatcacaaaatgtgttagttttttgcttcaaagagaaagaactgttaattttaagcaagatacagagagaattttttagaaatcgtacattcttcaagaaaattttttcgaaactaaaattttgcactgaaaatggaggaaaaatgatttttttcgaaaaaaatcacaaaatgttttatttttattgccgaattagaatcagctaaaaattctctacaagataaaggcaaacatatacataaatcgtacatttttcacttaaaattttccaaaagtaaatttttttcactgaaaattgaggaaaaaatttttttttcagatgcctcctccatcaaatcaatcagcatggtcttcaaatcttctaaatgctcggaaacgacgatggaaaaaagatgaagaagacgaaaattcaaatactgaaagaaatgcttccctggacctaaacctgtcgaatgtagaattgacagaagctctcgacagaaacggagatgcgcagatcactcctgagagaagacgagtgtctccatttgtggattttgatgataatctggatgtggagtcgaaacgcaatatcgatttgaatctgacaataacacctgaaactgatattgtaagtgaattaatccatgctacaactcggatcctaagacgattataatattttaaaaatgatttttccagccgttcgacacacaaccatcaacttcttctattgcggaatcttcttcattacctccaagaacaccacactcttttgaaagatcgtcaagtccactctattcatcgagtgacagcgtcggcgtattttcgaatgagacaccatatgtgaagggaattagagaagacttgagtagagaacgaatgagattgctagaagttgattccattttggatagcatatcagaggtagatttatttctttgttgttgattgtttattcaattttccatttcagaaccttgacgccatcctgtccagctcaaaccattcggtccagctcaatccaagaacagacattgacgaagagacacttctaaatcggttgatggagagagacgacgagtttgccgttttttctgatgaatttttcatgcaaaggaaagagataatttcaataagaaaaacatcaaaacggaaaaccaacgaaaatctgatattaagaaaacaaatcgctcaaattacgccaagggcgaaaaagtacagaataatatggatatcagagttgcgaagaaggaatcgttcacaaagccgaaacgttgtcacgcttcccaagtataccatcgaatggagcaagctcgaaaataagagctctggaaaatctaggatagatttggcactggatttgcttcaacgaatcagtcgagaagaggatattcgtaatttcatgattgatttccagaaatatctccaacaaaatgattcgcagtccttccaagtccaattgacgcattggcaaacgattatttttcaagaaaaatgtagattctctaacaatcaattgcggcgggtgaagcaatactataagatgttcaccgatcttgaaatcatgccaactattaatttgacgatgcaactgaaaacgagaatgtcaactatcgacaattataaggccacaacatatatatcaaaaggagaaaaggtgaccgttgtcaaaattatcgatgtcgagaaaagtgtcattccgaagctggagcgtttttctgtatcaaaacaattacgccatgattcctataccgatggcaaaattgttatcggaattggtggagattcgggaggaggaacgacgaaactttgtcttctgatcggaaattgtgatcatgcgaactcgccgcaccgaatcgtccttctcgctgtttttgatgactctgattcgcgggaacttatcatggcctacctgtctgatcttatcgtgaagatcaacaacttcaccagcatcacctatatggaagatggaagggtggtgacacgtggcgttgtccaaaaagtagtcggtgattttaagtttacatgtgatttactgtcacataaaaaacaatgtgccacctatttttgcccgttttgtttcgaaaccaatccaagaggaggactgatgcaaaagcttaaagatctgaacttgcaaaagatatatttcttgagaacaatgaattcttacaagctcaattctaaatacggtagctttggtgtcagatgcggaagtggaccaattcttcaaaatgtcaagttggaacactacttgcctgcaatgctacacttgattgtcggactgttcacgaaatacatctttgagccgatttggatggccgttgtgagtttagacaataaaacatcatttgaaattcgaagaaataaagaagagacaaaaagagttgccgatttgaagattcaagacgcgaacaaaaaattcgaagcttcaccacttaagagaaaaagagagatgaaagcacagtttactgctctgaaagaagaaaaagtgttattggacgagacgctggatggacttgctggtggatatcttaaacaatttgagaatgatctggaagaagttggtgcaacacgccgagcgtggtttcaaatgtacactggaaatcacacgaagctgattttgtctgagaaaggtgtcacagctgcattcaaaaatctgaaaaaccacatgactccgatgcttctgaatgtcaagaacgcgatgagcaggctgtccaaaatcatgtcattgtcggccaatcgattgctctctgacgatgacatcagcgaattggatgagtcgatgaaggaattcgttgaattcctacaagctgcccacccagaagaatcaatcactcaaaaattgcatgtcctggttgctcacgtagtagaagtcgcaaaaacggaaaggagctgggaaggctttcggagcaaggaatcgaatcgcttcatgccgttttcaatcgcctcgaaagacgcttccactcggttagaaacacagggaaaagatacctgtacattgctaaggagctgtcatgtagcaatctaatttctgatatggaggaagtaagtggaacctttcaaaaaaacaaatcgtaacattttctttgcaggacgcctcgtcttcccaataaaccacccagcgctctccctcacacaattctgaaccttttcgcaattaaaatttcgggatttcctcaatatcataaaattcgggattttctatggttagttgtgcttaaaattattaaaaaattattaatttcattaaatttcaggtaaaaacattgacaagcagatactcggtgcaaatttgatggatttctcatcaatattttgtcgaggatttcataagcggagatgctccacatggtttctttcaacttggaaaattgtcgatgtgacaccttaggattctatcgaaaagctaaaattcaggatttccttcggtttgttgggattaaaattattcgaaaataattgattttatgaaatttcagattgaaaccttgtcaatgagctactcagcaaaatttcgatggatttatctcaaaaagttcaccaggagtcgaggcaattaagaaatttcaccgggtttccttattttcgaaagttttacctagaaaaatttcaacttgcatccgtaaagcgaagaaggaggctgattccacatattaaattcgttttttgaatcgatttgtatgtatttttcaattttcaatcttccgtttcgattatttctttgttttactgttcaaaattaatttttctgtgtttcaataattcaattctcaattgttagctactaaattgaataataaaatctattttttatttttttttcaccttgaaactattaattctgccttttttctgctaaaatttgcttcaaaaatctatttgccgctgcaatgttttgcggactacggtacgcaagtacgcaaacaccgcgatgacacattgcggaccatttcgctgcgtacctgcgagatctttctcaaattttacgagagatctagtttttgtgataccgtg',\n", " 'ttgttcggtggagcgcgtttgcacccatctagcaactgaaccaccgtg',\n", " 'cacggtggttcaatcgctagatggaggcaaacacgctctattgcgaacaattgtttaaatttcaagcctttcactcggcatcgattctctccgttttttgccgattttaaccgtttttccgtatcttttttgttatttttttttgtaaaatcgaaaaagtgtgtcttgtttcaggtagtaagcgaaaaaattcgaggaaaacaaaggagcgtcggcgtgggcagcaaggaaaacgaattaagtgagttttttgacgagttctcttaaaaaatcacaaaatattttatttttttgattgaaaaagagagagctaaaaattctgaacaagatacgggccgaattttttagaaatcgtaccttcttcattaattttttttcgaaactaaaattttgcactgaaaatggaggaaaaatgatttttttcgaaaaaaatcacaaaatgttatatttttattgccgaattagaatcagctaaaaaatctctacaagataaaggcaaacatatacacgaatcgtacatttttcacttaaaattttccaaaagtaaatttttttcactgaaaatagtttttaaaaacgttccctttttcaagaaaatcaaaaactctttacgatttatagttgttttatgatcaaaatttaattgtgtataagatggaactcaattcatagagaaatcttaatttttcattaaaattttttcgaaaaaatttttttttgtaataagaatttataaaataattttttttgccaaaaaaatcacaaaatgtgttagttttttgcttcaaagagaaagaactgttaattttaagcaagatacagagagaattttttagaaatcgtacattcttcaagaaaattttttcgaaactaaaattttgcactgaaaatggaggaaaaatgatttttttcgaaaaaaatcacaaaatgttttatttttattgccgaattagaatcagctaaaaattctctacaagataaaggcaaacatatacataaatcgtacatttttcacttaaaattttccaaaagtaaatttttttcactgaaaattgaggaaaaaatttttttttcagatgcctcctccatcaaatcaatcagcatggtcttcaaatcttctaaatgctcggaaacgacgatggaaaaaagatgaagaagacgaaaattcaaatactgaaagaaatgcttccctggacctaaacctgtcgaatgtagaattgacagaagctctcgacagaaacggagatgcgcagatcactcctgagagaagacgagtgtctccatttgtggattttgatgataatctggatgtggagtcgaaacgcaatatcgatttgaatctgacaataacacctgaaactgatattgtaagtgaattaatccatgctacaactcggatcctaagacgattataatattttaaaaatgatttttccagccgttcgacacacaaccatcaacttcttctattgcggaatcttcttcattacctccaagaacaccacactcttttgaaagatcgtcaagtccactctattcatcgagtgacagcgtcggcgtattttcgaatgagacaccatatgtgaagggaattagagaagacttgagtagagaacgaatgagattgctagaagttgattccattttggatagcatatcagaggtagatttatttctttgttgttgattgtttattcaattttccatttcagaaccttgacgccatcctgtccagctcaaaccattcggtccagctcaatccaagaacagacattgacgaagagacacttctaaatcggttgatggagagagacgacgagtttgccgttttttctgatgaatttttcatgcaaaggaaagagataatttcaataagaaaaacatcaaaacggaaaaccaacgaaaatctgatattaagaaaacaaatcgctcaaattacgccaagggcgaaaaagtacagaataatatggatatcagagttgcgaagaaggaatcgttcacaaagccgaaacgttgtcacgcttcccaagtataccatcgaatggagcaagctcgaaaataagagctctggaaaatctaggatagatttggcactggatttgcttcaacgaatcagtcgagaagaggatattcgtaatttcatgattgatttccagaaatatctccaacaaaatgattcgcagtccttccaagtccaattgacgcattggcaaacgattatttttcaagaaaaatgtagattctctaacaatcaattgcggcgggtgaagcaatactataagatgttcaccgatcttgaaatcatgccaactattaatttgacgatgcaactgaaaacgagaatgtcaactatcgacaattataaggccacaacatatatatcaaaaggagaaaaggtgaccgttgtcaaaattatcgatgtcgagaaaagtgtcattccgaagctggagcgtttttctgtatcaaaacaattacgccatgattcctataccgatggcaaaattgttatcggaattggtggagattcgggaggaggaacgacgaaactttgtcttctgatcggaaattgtgatcatgcgaactcgccgcaccgaatcgtccttctcgctgtttttgatgactctgattcgcgggaacttatcatggcctacctgtctgatcttatcgtgaagatcaacaacttcaccagcatcacctatatggaagatggaagggtggtgacacgtggcgttgtccaaaaagtagtcggtgattttaagtttacatgtgatttactgtcacataaaaaacaatgtgccacctatttttgcccgttttgtttcgaaaccaatccaagaggaggactgatgcaaaagcttaaagatctgaacttgcaaaagatatatttcttgagaacaatgaattcttacaagctcaattctaaatacggtagctttggtgtcagatgcggaagtggaccaattcttcaaaatgtcaagttggaacactacttgcctgcaatgctacacttgattgtcggactgttcacgaaatacatctttgagccgatttggatggccgttgtgagtttagacaataaaacatcatttgaaattcgaagaaataaagaagagacaaaaagagttgccgatttgaagattcaagacgcgaacaaaaaattcgaagcttcaccacttaagagaaaaagagagatgaaagcacagtttactgctctgaaagaagaaaaagtgttattggacgagacgctggatggacttgctggtggatatcttaaacaatttgagaatgatctggaagaagttggtgcaacacgccgagcgtggtttcaaatgtacactggaaatcacacgaagctgattttgtctgagaaaggtgtcacagctgcattcaaaaatctgaaaaaccacatgactccgatgcttctgaatgtcaagaacgcgatgagcaggctgtccaaaatcatgtcattgtcggccaatcgattgctctctgacgatgacatcagcgaattggatgagtcgatgaaggaattcgttgaattcctacaagctgcccacccagaagaatcaatcactcaaaaattgcatgtcctggttgctcacgtagtagaagtcgcaaaaacggaaaggagctgggaaggctttcggagcaaggaatcgaatcgcttcatgccgttttcaatcgcctcgaaagacgcttccactcggttagaaacacagggaaaagatacctgtacattgctaaggagctgtcatgtagcaatctaatttctgatatggaggaagtaagtggaacctttcaaaaaaacaaatcgtaacattttctttgcaggacgcctcgtcttcccaataaaccacccagcgctctccctcacacaattctgaaccttttcgcaattaaaatttcgggatttcctcaatatcataaaattcgggattttctatggttagttgtgcttaaaattattaaaaaattattaatttcattaaatttcaggtaaaaacattgacaagcagatactcggtgcaaatttgatggatttctcatcaatattttgtcgaggatttcataagcggagatgctccacatggtttctttcaacttggaaaattgtcgatgtgacaccttaggattctatcgaaaagctaaaattcaggatttccttcggtttgttgggattaaaattattcgaaaataattgattttatgaaatttcagattgaaaccttgtcaatgagctactcagcaaaatttcgatggatttatctcaaaaagttcaccaggagtcgaggcaattaagaaatttcaccgggtttccttattttcgaaagttttacctagaaaaatttcaacttgcatccgtaaagcgaagaaggaggctgattccacatattaaattcgttttttgaatcgatttgtatgtatttttcaattttcaatcttccgtttcgattatttctttgttttactgttcaaaattaatttttctgtgtttcaataattcaattctcaattgttagctactaaattgaataataaaatctattttttatttttttttcaccttgaaactattaattctgccttttttctgctaaaatttgcttcaaaaatctatttgccgctgcaatgttttgcggactacggtacgcaagtacgcaaacaccgcgatgacacattgcggaccatttcgctgcgtacctgcgagatctttctcaaattttacgagagatctagtttttgtgataccgtg',\n", " 'cacggcccggcggggggtacatggacgagaattctctaccgtattccaatttggctgactgcgtgctcaacgttgaatactcagtgtaaactttcgtacaccgttgcgtactgcacagcgcgcattttaattgacgacatttagcaaaaattgaacagaagatttttcggaattatgaagctcaattttcacaaaaataatgagttttttgtagaatttatgaaaaaacgtgaatatatagattttttgttcatgatattcaagaaaaatcgatttttagttcttcacagagtaatcctatcgcatttcacttgctcatgatgtttttgctcgactttaggacgataaaaatgcgaattgttgataaaatgaatgaacaatataaagaa',\n", " 'ggggctgctggaaccaatgtcggcatgacgagagttccggtcttctggatccatttcctgcgtgggctgtggcgacgagctgcacgtctgaaaatcaagtttttgtaatt',\n", " 'tttgggcgcatgatatggagctgaatcattcgattttagaatcagcaagcttttattcatattttaggatctttttaaaaaatctggaccaacagtttttgaaaaaaaatacttttcgttcagaaatgtactgattttccactgattttcacgaaatttgaaaaaatcaataatttgggcgcatgatattgagctgaatgtttcgaatttagaatcagcatgcttttattcatattttaggatctttttaaaaaatctggcccaacagttttcgaaaaaatttaatttttgttcagaaatgtgaatattcacgaaatcgaaaaaaataattgcaaaatccgtcagctgaacattcaaaacttatcaatttgaaatcagcatatttcagtgtataattaaaaaaggtttcaaaaattctgagaccaatttttattgagaaaaataatttttcgctcgaattattgaattttcactaaatgcaaaaaacagtaaacttgggcccgtgctacaagcctgaatctttcaaattaaaaaccagcatgattttttcaatattctaggacgtttaaaaaaaatctggaccaacagtttttgaggaacgtaattttttatacaaaaatgtactgatttttcactaaactcaaaaaaatagtcaagttgggcccatgctatacacctaaattattaaaattcagaaccgccatgtattttttcatactataggctctttaaaaaaaatctggaccaacagtttttgagatatttagaaaaacaactcacttttcgacgtttttcgccttttcgcggctcacccggtcgatttttgcggcgatttgtgttctttcgctgaaaatattatttttatttcaattattaacgaagaaaacaagaaaaaacgacgagaaaacatcaaaaaaacgcgaaaaaacatcgaaaaaccaccgaaacctcatgaaaaaaataaagcattgcagccgcgggattagttttcgcaactttctaggccatgtcccgttcgccgtgccgtg',\n", " 'aactagatctctcgtaaaatttgagaaagatctcgcaggtacgcagcgaaatggtccgcaatgtgtcatcgcggtgtttgcgtacttgcgtaccgtagtccgcaaaacattgcagcggcaaatagatttttgaagcaaattttagcagaaaaaaggcagaattaatagtttcaaggtgaaaaaaaaaaaaatagattttattattcaatttattagctaacaattgagaattgaattatcaaaacacagaaaaattaattttgaacagtaaaacaaagaaataatcgaaacggtagattgaaaattgaaaaatacatacaaaacgattcaaaaaacgaattaatatgtggaatcggcctccttcttcgctttacggatgcaagttgagatttttcttggaaaaactttcgaaaataaggaaatcagtgggaacttcttatttcctcgactcctgcaggatcctggtgaactttttctgttaaatccatcgaaattgtgcggagtagctcattgacaaggtttcaatctgaaattttgtgaaattttatatttttgaataattttaatcacagcaaacctagggaaatcccgaattcgagcctttcgataaaatccagagatgtcacatcgccacttttccaaaatgaaggaaaccaggtggagcttctcagctttttggcttcctggtcgaaatcttgatgaaaaaaccatcgaaatttacttgagcagcttcttggcaaggtttttgtctgaaattttaggattttaatgatttttaacatttttaaacacaactaaccataaacaatccggattttttcggttttgactgaatccttggatttatgtagaaaacatgcccagaaatcaaggaacgaggtggaacatctcatttttttgaaattctggtgtaattcttgatgaaaaatccatcgacattacgttgaacgtcttcttggcaaggtgttttcttctgaaaattcatga',\n", " 'ctgtaacatctaagcagtttttgaaaagttttcgaaaaaaaaataaatttcagtttttgagatttaattttcagtgaaaaaaatttacttttggaaaattttaagtgaaaaatgtaccgtttctgaaaatgtttgcttttatcgtgtagagaatttttagctggttctaatccggcaagaaaaacagaacattttgtgatcgttttcgaaaaaaaaatttttttctttaattttaagtgcaaaattttagtttcgataatttttctgtgaagaatgtacgatttctagaaaattctgcctgtatcttgcttaaaatgaacagttctttctttttaaagcaaaaaactaacacattttgtgattttttttggcaaaaaaaattattttataatttcttatttcaaaaaatgttttttcgaaaaaattttaatgaaaaattaatatttctttatgaacttagttccgtcttatacaaaatttaatgctgattataaaataactataaaacgtgaaga',\n", " 'cacggcccggcgaaagagacttggccgcgagagctgcgccggctaggccaccgcctcctatggttaagatttttgaacgaataaacatttttaatttggctgctaagctcatttattttcgttttttctcgtttttttctcatttttatcgataaaaatatattttttgttgcagaaaatcaaaaaaccgcgacaaaacagcactcaaccgccaactgggaggaggaaaatccgaaaaaagtgagtttttt',\n", " 'tgcgaaaaactgtttaaagtatcgattttcttcaatatcagcaacatacaatcctttaaaatgattattttttgtaaattcgataaaaattaatttatttttcacaatttctgcccgaaaattgccgaaataaccagcgtttctataactaaaacaagtgtcgtcaattaaaatgccgcatccgcaaaatgtcgtacgaaacttttcgctgagtatcaaacgttgaatattcagtcagccaaattttactacggtagagattttacagccacgtacggttcgccgggccgtg',\n", " 'ttgttcagtggagcgtatttgcataaatctagcaactgaaccaccatg',\n", " 'attgaccaaaatcgagaaacattgcgaaaaactgtttaaagtgtcgattttcttcaatatcagcaacatacaatactttcaaatgattattttctgtaaattcgataaaaatccatttatttttcacaatttctgcccgaaaattgccgaaataaccagcgtttctataactaagaaagcgtcgtcaattaaaatgccgcgtccgcaaaatgtcgtacgaaacttttcgctgagtatcaaacgttgaatattcagtcagccaaattttactacggtagagaatttacagccacgtacggttcgccgggccgtg',\n", " 'cacggcccggcgaaccgtacgtggctgtaaaatctctaccgtagtaaaatttggctgactgaatattcaacgtttgatactcagcgaaaagtttcgtacgacattttgcggacgcggcattttaattgacgacacttgttttagttatagaaacgctggttatttcggcaattttcgggcagaaattgtgacaaataaatgaatttttatcgaatttacaaaaaataatcattttaaaggattgtatgttgctgatattgaagaaaatcgatactttaaacagtttttcgca',\n", " 'aaattgtagtcagtatcactgcagatgctggagcaggaatcacaaagttttgtctgattatcgagaattgt',\n", " 'ttttcgagatattttggcaaaaacctcacttttcgtcgttttcctcctactgcgatcgattttcgccccgatgattagctgaaaataattttatatgttagttagtaacaaagaaaataagaaaaattgagaaaaaacaatcaaaaactcgagaaaa',\n", " 'cgaattgtcgtcaaattaaatgcgcgtacgcaaagtacgcaatagtgtacaaaacttcttcgcgttgaatattcagtcgcgactagtcagccaaatgggactactgtagaaatttcctcggccacgttccaaacgccgctccgtg',\n", " 'ttgttcgatggagcgcgtttgaacccatctagcaactgaaccaccgtg',\n", " 'cacggtggttcagttgctagttgggtgcaaacgcgctccaccgaacaa',\n", " 'cacggtggttcagttgctagttgggtgcaaacgcgctccaccgaacaa',\n", " 'cacggtagcacagaaactagatctctcgtaaaatttgagaaagatctcgcagcgtacgcagcgaaatggtccgcaatgtgtcatcgcggtgtttgcgtacttgcgtaccgtagtccgcaaaacaatgaagcggcaaatagatttttgaagcaaattttaacagaaaaacagcagaattaatagtttcaagatgaaaaaaaaaataaaaaatcagagtttattatgaaatttataaactaacaattgataattgaagtgttaaaacacaaaaaaagttgattttgaacagaaaaggaaagttatgaaataaaaaggcaatataaaattgagaaaacacacaatttgatacaaaaaacttaattaatat',\n", " 'tgcataaattcaatttcatcttgcccagaattttaagctgattcgaattcatatcaaaatctgaagattcttcaattttttttatcgaaaaatttttttttctaaattttgtgaaaaatttt',\n", " 'tttagcttcatcttgtagataactttaagctgagttcaatttgataacaaaatcgataacatttggggatttctccacataaaaaattacgatttttaattttttagtgaaaaatatttttactttcgaaattttgagcatgacacaaactttcaatcgaaaaaaaagcgtacatatctggtacagaattttatgtagattattttttcataaaaaactgatccaattttgggattcgttttcgagatagtgatcattttctatttttggtctaaaatttttgatttcgaggcaaaaaaaaattattctgaataaattaagttacatcttattcagaatttttagcttaatttctagctcgaattaaataaaaaatctgaagattcttcaatttttttttaccaaaaacagttttttttctaaactttgtgaaaaaatttttttaattcaaatggatctgacaaatttttttcaatctcaattattttagctttctcttgtacagaattttaagctgttttcaattcaataacaaaatcgataacatttgggggtttctcctcaaaaaaagttatgattttgcatttttcagcgtaaaataattttacttttaaaattttgagcacgatacaaactttcaatccaaaaaaaggtagtatttctagtacagaattttacgctgattattttttcataaaaaacttacataaatttgggattcattttcgagaaaatgatgattttctatttttggccgaaaaattgtaatttcgagacaaaaaaaaaccttttgttctgaataaattaagttatatcttcttcagaattttttgctctttctttttcaatcgaaaagtaatacattttttaattttttcaaagagaaatagtcaaaaaactcactgaattcgttttccttgttgctcacggtgacgttccttcgttttcctcgaaatgtttcgattactacctaaaacaagacacactttttcaattttgcaaaaaaaaataacaaaaaagatacggaaaaacggttaaaatcggcaaaaaacggagagaatccatgccgagcgaaaggcttgaaatttaaacgattgttcgcaatagagcgtgtttgccgccatctagcgactgaaccaccgtg',\n", " 'tcagttttcacaaaatgtcgtcaattaaaatgcgcgctgtgcagtacgcattcggtgtataaattgtaaac',\n", " 'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcaattaata',\n", " 'tttgctggttacggtaaaaaagtacgcaaacaccaaacgtgaagtgcagacattgcgttttaccgtacttccgtgttcttttt',\n", " 'cacggcacggcgaacgggacatggcctagaaagttgcgaaaactagtcccgcggctgcaatgctttatttttttcatttggttgcggtgggcttttcgatgttttttcgtgttttttgaagttttttcgtcgttttttcttgttttcttcgttaataattgaatttaaaataatattttcagtaaaaggccacaaatcgccgcaaaaatcgaccgcgtgagccgcgaaacggcggaaaacgtctaaaagtgagttttttttctaaatatctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttcggaattagaaagatttgagactcaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttgaaactgcactcaaatatgaagaaaagcatgattaaccagaaaaagtaacatttgagacccaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttcggaattagaaagatttgagactcaa',\n", " 'tagaaagatttgagactcaagctcgcgtatttcagcttatttttttcatttggttgcggtgggcttttcgatgttttttcgtgttttttgaagttttttcgtcgttttttcttgttttcttcgttaataattgaatttaaaataatattttcagtaaaaggccacaaatcgccgcaaaaatcgaccgcgtgagccgcgaaacggcggaaaacgtctaaaagtgagttttttttctaaatatctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttcggaattagaaagatttgagactcaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttcggaattagaaagatttgagactcaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttgaaactgcactcaaatatgaagaaaagcatgattaaccagaaaaagtaacatttgagacccaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttcggaattagaaagatttgagactcaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttattttttttgaaactgcactcaaataagaagaaaagcatgattaaccagaaaaagtaacatttgagacccaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttcggaattagaaagatttgagactcaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttcggaattagaaagatttgagactcaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttgaaactgcactcaaatatgaagaaaagcatgattaaccagaaaaagtaacatttgagacccaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttcggaattagaaagatttgagactcaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttattttttttgaaactgcactcaaataagaagaaaagcatgattaaccagaaaaagtaacatttgagacccaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaaattgttttctcaaaaatttgacgcttattttttttgaaactgcactcaaatatgaagaaaagcatgattaaccagaaaaagtaacagttgagacccaagctcgcgtatttcatattttttttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaaattgttttctcaaaaatttgacgcttttttttttgaaactgcactcaaatatgaagaaaagcatgattaaccagaaaaagtaacatttgagactcaagctcgcgtatttcattttttttttcgattttagcacaaaatcaacactcttttgaaaaaaatattcgaaatattcgctcctaatttttaatgatttccagatgacaagctggttcacaccacattctactgaggaaatcgcttcaaataaccgaaatcatcccgacattggctccagcagctccgcaactacctacagctcgtcaaattgtttgaatatggatgaaggaattctgacgagagacgttgaaatggctttcagtgatgaaattccgactttttctaaaaccgtaatttttttaaaaatttcaaaaataacattatatgattttgcagaggacctccaattccgtgtttcctcctgcaccttatcgaaattcgttcagt',\n", " 'ttctttacattattcattcattttatcaacaattcgcacttctatcgctctaaagtcgatcaaaaaagcttatcagcaactgccgtcgagtgaaatgcgatacaatttgtctgtgaaaaaacaaatatcgattttccttgaatatcgtgaagaacaaatcttcatacttacgattattcacaaattcgatgaaaatctgatagttttttccaatttcagctctataattccaaaaaaaaaccttctgttaattttcgcgaaatgtcgtcaattaaaatgcgcgctgtgcagtacgcaacggtgtacgaaactttaaactgagtattcaacgttgagcacgcagtcagccaaattggaatacggtagagaattctcgtccatgtaccccccgccgggccgtg',\n", " 'gctgatattgacgaattgtcgtcgaataaaatgcgcgtacgcaaagtacgcaatagtgtacaaaacttattcgcgttgaatattcagtcagccaaatgggactactgtagaaatttcttcggccacgt',\n", " 'cacggtggttcaatcgctagatggaggcaaacacgctctattgcgaacaattgtttaaatttcaagcctttcactcggcatcgattctctccgttttttgccgattttaaccgtttttccgtatcttttttgttatttttttttgtaaaatcgaaaaagtgtgtcttgtttcaggtagtaagcgaaaaaattcgaggaaaacaaaggagcgtcggcgtgggcagcaaggaaaacgaattaagtgagttttttgacgagttctcttaaaaaatcacaaaatattttatttttttgattgaaaaagagagagctaaaaattctgaacaagatacgggccgaattttttagaaatcgtaccttcttcattaatttttttttcgaaactaaaattttgcactgaaaatggaggaaaaatgatttttttcgaaaaaaatcacaaaatgttatatttttattgccgaattagaatcagctaaaaaatctctacaagataaaggcaaacatatacacgaatcgtacatttttcacttaaaattttccaaaagtaaattattttcactgaaaatagtttttaaaaacgttccctttttcaagaaaatcaaaaactctttacgatttatagttgttttatgatcaaaatttaattgtgtataagatggaactcaattcatagagaaatcttaatttttcattaaaattttttcgaaaaaatttttttttgtaataagaatttataaaataattttttttgccaaaaaaatcacaaaatgtgttagttttttgcttcaaagagaaagaactgtttattttaagcaagatacagagagaattttttagaaatcgtacattcttcaagaaaattttttcgaaactaaaattttgcactgaaaatggaggaaaaatgatttttttcgaaaaaaatcacaaaatgttttatttttattgccgaattagaatcagctaaaaattctctacaagataaaggcaaacatatacacaaatcgtacatttttcacttaaaattttccaaaagtaaatttttttcactgaaaattgaggaaaaaatttttttttcagatgcctcctccatcaaatcaatcagcatggtcttcaaattttttaaatgctcggaaacgacgatggaaaaaagatgaagaagacgaaaattcaaatactgaaagaaatgcttccctggacctaaacctgtcgaatgtagaattgacagaagctcttgacagaaacggagatgcgccgatcactcctgagagaagacgagtgtctccatttgtggattttggtgataatctggatgtggagtcgaaacgcaatattgatttgaatctgacaataacacctgaaactgatattgtaagtgaattaatccatgctacaactcggatcctaagacgattataatattttaaaaataattttcccagccgttcgacacacaaccatcaacttcttctattgcggaatcttcttcatcacctccaagaacaccacactcttttgaaagatcgtcaagtccactctattcatcgagtgacagcgtcggcgtattttcgaatgagacaccatatgtgaagggaattagagaagacttgagtagagaacgaatgagattgctagaagttgattccattttggatagcatatcagaggtagatttatttctttgttgttgattgtttattcaattttccatttcagaaccttgacgccatcctgtccagctcaaaccattcggtccagctcaatccaagaacagacattgacgaagagacacttctaaatcggttgatggagagagacgacgagtttgccgttttttctgatgaatttttcatgcaaaggaaagagataatttcaataagaaaaacatcaaaacggaaaaccaacgaaaatctgatattaagaaaacaaatcgctcaaattacgtcaagggcgaaaaagtacagaataatatggatatcagagttgcgaagaaggaatcgttctcaaagccgaaacgttgtcacgcttcccaagtatatcatcgaatggagcaagctcgaaaataagagctctggaaaatctaggatagatttggcactggatttgcttcaacgaatcagtcgagaagaggatattcgtaatttcatgattgatttccagaaatatctccaacaaaatgattcgcagtccttccaagtccaattgacgcattggcaaacgattatttttcaagaaaaatgtagattctctaacaatcaattgcggcgggtgaagcaatactataagatgttcaccgatcttgaaatcatgccaactattaatttgacgatgcaactgaaaacgagaatgtcaactatcgacaattataaggccacaacatatatatcaaaaggagaaaaggtgaccgttgtcaaaattatcgatgtcgagaaaagtgtcattccgaagctggagcgtttgtctgtatcaaaacaattacgccatgattcctataccgatggcaaaattgttatcggaattggtggagattcgggaggaggaacgacgaaactttgtcttctgatcggaaattgtgatcatgcgaactcgccgcaccgaattgtccttctcgctgtttttgatgactctgattcgcgggaacttatcatggcctacctgtctgatcttatcgtgaagatcaacaacttcaccagcatcacctatatggaagatggaagggtggtgacacgtggcgttgtccaaaaagtagtcggtgattttaagtttacatgtgatttactgtcacataaaaaacaatgtgccacctatttttgcccgttttgtttcgaaaccaatccaagaggaggactgatgcaaaacttaaagatctgaacttgcaaaagatatatttcttgagaacaatgaattcttacaagctcaattctaaatacggtagctttggtgtcagatgcggaagtggaccaattcttcaaaatgtcaagttggaacactacttgcctgcaatgctacacttgattgtcggactgttcacgaaatacatctttgagccgatttggatggccgttgtgagtttagacaataaaacatcatttgaaattcgaagaaataaagaagagacaaaaagagttgccgatttgaagattcaagacgcgaacaaaaaattcgaagcttcaccacttaagagaaaaagagagatgaaagcacagtttactgctctgaaagaagaaaaagtgttattggacgagacgctggatggacttgctggtggatatcttaaacaatttgagaatgatctggaagaagttggtgcaacacgccgagcgtggtttcaaatgtacactggagctgattttgtctgagaaagctgattttgtctgagaaaggtgtcacagctgcattcaaaaatctgaaaaaccacatgactccgatgcttctgaatgtcaagaacgcgatgagcaggctgtccaaaatcatgtcattgtcggccaatcgattgctctctgacgatgacatcagcgaattggatgagtcgatgaaggaattcgttgaattcctacaagctgcccacccagaagaatcaatcactcaaaaattgcatgtcctggttgctcacgtagtagaagtcgcaaaaacggaaaggaatttgggaaggctttcggagcaaggaatcgaatcgcttcatgccgttttcaatcgcctcgaaagacgcttccactcggttagaaacacagggaaaagatacctgtacattgctaaggagctgtcatgtagcaatctaatttctgatatggaggaagtaagtggaacctttcaaaaaaacaaatcgtaacattttctttgcaggacgcctcgtcttcccaataaaccacccagcgctctccctcacacaattctgaaccttttcgcaattaaaatttcgggatttcctcaatatcataaaattcgggattttctatggttagttgtgcttaaaattattaaaaaattattaatttcattaaatttcaggtaaaaacattgacaagcagatactcggtgcaaatttgatggatttctcatcaatattttgtcgaggatttcataagcggagatgctccacatggtttctttcaacttggaaaattgtcgatgtgacaccttaggattctatcgaaaagctaaaattcaggatttccttcggtttgttgggattaaaattattcgaaaataattgattttatgaaatttcagattgaaaccttgtcaatgagctactcagcaaaatttcgatggatttatctcaaaaagttcaccaggagtcgaggcaattaagaaatttcaccgggtttccttattttcgaaagttttaccaagaaaaatttcaacttgcatccgtaaagcgaagaaggaggctgattccacatattaaattcgttttttgaatcgatttgtatgtatttttcaattttcaatcttccgtttcgattatttctttgttttactgttcaaaattaatttttctgtgtttcaataattcaattctcaattgttagctactaaattgaataataaaatctattttttatttttttttcaccttgaaactattaattctgccttttttctgctaaaatttgcttcaaaaatctatttgccgctgcaatgttttgcggactacggtacgcaagtacgcaaacaccgcgatgacacattgcggaccatttcgctgcgtacctgcgagatctttctcaaattttacgagagatctagtttttgtgataccgtg',\n", " 'atgtcgtcaactaaaatgccgcggtacacgtccgcagtcggtgaacaaaacttttcgttgaatactcagtcagccaaatttaactactgtagaaatttctccccacacgtcgcgattgccgctccgtg',\n", " 'ttgtttaaaataagtttccttttttttgaatacgcaaagaacttgatttttccaaaaaaaaaaattgttttcgaatttttatgataaaaaaaaatttttt',\n", " 'cacggcacggcgaacgggacatggcctagaaagttgcgaaaactagtcccgcggctgcaatgctttattttttcatttggttgcggtgggtttttcgatgttttttcgtgtttttttaatgttttttcgtcgttttttcttgttttcttcgttaataattgaatttaaaataatattttcagcaaaaggccacaaatcgccgcaaaaatcgaccgcgtgagccgcgaaacggcggaaaacgtctaaaagtgagttgtttttctaaatatctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttctgaattagaaagatttgagactcaagctcgcgtatttcaaatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaaattgttttctcaaaaatttgacgcttattttttttgaaactgcactcaaatatgaagaaaattttgattaaccggaaaaagtaacatttgagacccaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcattttcgaacaaaaaattcattttctcaaaaattcaacgcttattttttttgaaactgcactcaaatatgaagaaaaacatgcttgttctgaattagtaagaattgcgactcaagctcgcgtatctcatatttatttgttcgatttcagtggaaaatcaacacatttttgggaaaaattttttttcgaaa',\n", " 'ttctaatttttaatgattttcagatgacacgccggttcacaccacattctactgaggagatcgcttcaaataaccgaaaccatcccgacattggctcca',\n", " 'ctctatattattcattcattttatcaacaattcgcacttctatcgccctaacgtcgatcaaaaaagctcatcagcaactgccgtcgagtgaaatgcgatagaatttgtctgtgaataaccaaatatcgattttccttgtatatcgtgaagaacaaatcttcatatttacgattcttcacaaattcgatgaaaatctgatagttttttcaatttcagctctataattccaaaaaaaatcttttgtcaatttcgcgaaatgtcgtcaattaaaatgcgcgctgtgcagtacgcaacggtgtacaaaactttaaactgagtattcaacgttgagcacgcagtcagccaaattggaatacggtagagaattctcgtctatgtaccccccgccgggccgtg',\n", " 'gctgctgcagtgttttggcgactacggtacgctgttacgcaaaccgcgcaatgacaacatt',\n", " 'attgaacagggcatgaaaggattcaatgccctgctccgatgatcttccc',\n", " 'cacggcccggcgaaagagacgtggccgcgagagctgcgccggctaggccaccgcctcctatggttaagatttttgaacggataaaaatttttaatttggctgctaagctcatttatcttcgttttttctcgttttttctcatttttatcgataaaaatatattttttgttgcagaaaatcaaaaaaccacgacaaaacagcactcaaccgccaactgggaggaggaaaatccgaaaaaagtgagtttttt',\n", " 'tgcgaaaaactgtttaaagtatcgattttcttagtaaatatcagcatcataaaattatttaaaattattattttctgtaaattcgataaaaatccatttatttttcacaatttctgcccgaaaattaataaccagcgtttctataactaagaaggtgtcgtcaattaaaatgccgcgtccgcaaaatgtcgtacgaaacttttcgctgagtatcaaacgttgaatattcagtcagccaaattttactacggtagagattttacagccacgtacggttcgccgggccgtg',\n", " 'gttcgttcagctcaattttgtctactttaattaagttggagtagtttgttctaacttcaaaaacgtgattttcagacgtgcagctcgtcgccacaggctaatcaggaaatcagttcagaagatcgaaactcccaccatcccgacattgcttccagtagctccgcaactacatgcaggtcgtcaaattgtttcaatatggatgaaggaattctgacgagagacgtggaaatggattttagtgacgaaattccgactttttctcacaatgtaattttttttcaatatttcaaaactaatattatatgattttgcagagattctccaatcccgtgtttccaccgacaccttcacgaaactcgttcagtacaccgagggttcgaataatctccaatgttgagcaaattccgcagatcaaaatgctccgactaacgatcgaacgactgaaaaatgaaaaaatgaaatatcagaagaaatacaaaagcggaatggtgagtttgcggtgttttcggaagaaatactaatagtccgccaagaaaatatcactgtaagagcatcgaattcttcgctgcaaatggaagtatgcaatatgcgaacgatcgtcagtggtatagaaggtcgattggtgagaagctgggcagatgtgaagaaaagtggcaaagctatggttctgagtctacaaagacacgccgaagaaattgaaaacctcaaaaaatcagcttttaaagttaaattttcgaatctaagaaacaaaagttgtatcaaaaatcgatttcgcaaagctcttcacttgcttcaacacgaaatctgcctcgaggatgacgtcggacaatttattcgaaaattcacaaaattcctgaattcagaagagaatcaaagctacaaaaacaagctttcaaactttgaggccgttgactttttatcaaaatgcggattatctctaagtcagatggaaaaaatcaaaaactatttgactaattccattggctatgatttgcttccattggtaaagaacacaagagatttgtcaatccagttatcaatgatctccagtttcaaagtttccacttcgttcgacaataaaggaaaagtgattaccattgtgcagtgcatgaaaattgcagaagttttagcctatcgaatagagcttctatgcaattcaaatcaatttgtggacgatggctacacgaaaggagtaatcaagattggagtcgttggtgacgctggaggtggaagcacaaagttggcattggtaatcggaatgtatctcgaccgaattcttcgagacacgttatggtaattgcagtatatgacggatccgacaattactcgtgtctgaaaaaattcattcccgacgttcttgagcagcttggcaagctgacaaaaattcgatatttggacaaaggagttcaaaaaaccgcaaaaattgtgcaaattgcaactggagattgcaagtttcaatttgatattcttggacaccaaggacattcatcccacaatttctgcttcaagtgttttgcccagaatccacggggagctgaagagaggatgaaaatcaaggatatgaatgttgatgaaacattccacccacgaacaatcgatctttacaaagcttgttgtactccacttctgccacacgttgcgatcatgttctatttgattcctcttctgcacattatcatgggtatttttgacaaatatatcttcaatcccctctggaaatattctgtcactttggataacactacttgtttccctatcctgaaaacgcgaaaagcaactctaaaaaacgctgttaacctaatcaaaagtgcggaagaaaagtatcaagcagccaccggaaagatgaaattggagtcgcatgctgaactgaaagcacttcaatcggaaaaattgttgctcgacacaattgtgaatggtactcccggaggaacacttgaaaaaatggaacagtgttgggctaagtttggagcggacaagcaggcttggttccaatcattttgcggaaaccatttgaaattgctgctaacgccggctattgtagaggagactttcaacatttttggcccaaatttatgcccaatgttgctcggattgaaatcagcaatgggaaagctttcaactattatgtcactttcgggaaacaagtttttaaatgattccgacgtcttaatacttcaaaattcgattcgcggatttgtggaggatttgaaagtagcggttcccgaagaaacaattattttaaagctgcacttggtggtctaccatgcaccacaaatggcaaaagatgtgagaaacattggaaggatcacagaaccaggagttgaatctgtccacgcgattttcaatgcacttgagaggagattttgcaattaccgggacaagaaaaggcgttacatccatgttctgagagagctcatgtgtcgtaacatcatgaatgacatgactatggtaatttttgtacagaaaaaaatttctgcattcttaaaaacacaactatttcagaacacgtccattcccggtctttccatcgtaccaaaagatgcaacagttctgccaatttcttcgcaaaatgatcttcgggacccaaaagtagttattcgggacgtcacggctgcccaaaaaagaaaaaatttagcgaaaaaaaaaatcaattttagctccaagaaaaggcttccacaatcgacaaaatctcgaaatgctgtggtgagatttttatttttcacgaaaagtttcaaattttgaacattttcagcccgacttctcacaaaacgtacaaacttctattgcatttggaggcaattcaacgaaaattggattccgtggtctaccgagcaatgctctttctcgtccgaacccgatggtaggttcactgctttaaatcaactcaaaattgaaatcaaaaattatttcaggcaatcacacaaacttgctcgacaagccagccaatcactagaagccatccacttgttgctccaccagtatcttatcaacttccgaaatgcagtaatagtttagctgaagtgagtgtaaaggtgaaatgacaatttcaataagttaaaaattttagaaaattcgagcaagttgtgggcaagcaagaaggagtgctgcagaatcttcttcgactgaaagccacgtggacacaaattcgaagaagaaagtttacagactaaaattcgtccaaaaaagtcgtgaaattccgaagaaatagtaatatatcgattttctgttttttttattatgatttctaatatttttattttaatgcttaatttaaaatgaccgaaaataaagaattgattctcaaaaagtaatctcagagcaatatatgttgtttctttatataatttatccattttatcaacaaatcgcatttttatcgctcaaaaatcgataaaaacagctcatcagaaactttcatcgagtaaaatgagacatcattcgtctgtgaaaagctaaatatcgattttccttgaatatagtgaagaaaagatcttcatacttacgatttttcacaaattcgatgaaaatttgatagttttttttcaatttcagctctataattccgaaaaaaatcttctgataattttcgcgaaatgtcgtcaattaaaatgcgcgctgtgcagtacgcaacggtgtacgaaactttaaactgagtattcaacgttgagcacgcagtcagccaaattggaatacggtagagaattctcgtccatgtaccccccgccgggccgtg',\n", " 'ttgttcggtggatcgcgtttgcacccatctagcaactgaaccacagtg',\n", " 'GAAGCCAATGTCGGGATGGTTTCGGTTATTTGAAGCGATTTCCTTAGTAGAATGTGGTGTGAACCGGCGtgccatctggaaatc',\n", " 'tgccatctggaaatccttaaaaatttggtgcgaatatttcgaaaaaaagttttccaaaaatgtgttgattttccactaaaatcgaaaaaataaatatgaaatacgcgagcttgagtctcaattcttactaattcagaacaagcatttttttctccatattcgagtgcagtttcaaaaaaattaaacgttgaatttttgagaaaataatttttttgttcgaaaatgtgttgattttccactaacataaaatcgaaaaagt',\n", " 'cacggtggttcagttgctagatgggtgcaaacgcgctccaccgaataa',\n", " 'gtcttaatacttcaaaattcgattcgcggatttgtggaggatttgaaagtagctgttcccgaagaaacaattattttaaagctgcacttgttgatctaccatgcaccacaaatgacaaaagatctgagaaacattggaaagatcacagaacaaggagttgaatctgtccacgcgattttcaatgcacttgagaggagattttgcaattaccgggacaagaaaagacgttacatccatgttctgagagagctcatgtgtcgtaacatcatgaatggtaattttggtacagaaaaacatttctgcattcttaaaaacacaactatttcagaacacgtccattcccggtctttccatcgtaccaaaatatgcaacagttctgccaatttcttcgcaaaatgatcttcgggacccaaaagcagttatacgggacgtcacggctgcccaaaaaagaaaaaatttagcgaaaaaaaaaaatcaattttcgctccaagaaaaggcttccacaatcgacaaaatctcgaaatgctgtggtgagatttttatttttcacgaaaagttttaaattttgaaaattttcagcccgacttctcacaaaacgtacaaacttctatttcatttggaggcaattcgacgaaaattggattccgtggtctaccgagcaatgctctttctcgtccgaacccgatggtacgttcactgctttaaatccactcaaaattgaaatcaaaaattatttcagccaatcacacaaacttgctcaacaagccagcaaatcactagaagccatccacttgttgctccaccagtatcttatcaacttccgaaatgcagtaatagtttagctgaagtgagtgtaaacgcgaaatgacaatttcaataagttaaaaattttagaaaattcgagcgagttgtgggcaagcaagaaggagtgctgcagaatcttcttcgactcaaagccacgtggacacaaattcgaagaagaaagtttacagactaaaattcgttcaaaaaagtcgtgaaattccgaagaaatagtaatatatcgattttctgtttaaatatttatgatttctaatatttttattttaatgcttcatttaaaatgaccgaaaaataaagaatagattttcaaaaagtaatctcagagcaatatatgttgtttctttatataatttatccattttatcaacaaatcgcatttttatccctcaaaaatcgataaaaacagctca',\n", " 'ttgttcggtggagcgcgtttgcacccatttagcaactgaaccaccgtg',\n", " 'actacggtgtgcaagtacgcaaacaccgcggcggcaatttgc',\n", " 'ttcttcaaaaaaacttcttcgaaattcaaattttgcaccaaaaa',\n", " 'ttgttcggtggagcgcgtttgcacctatttaacaactgaaccaccgtg',\n", " 'tattaattgctaaaatttatgtggactacggtagtcaagtccgcaaacaccacg',\n", " 'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcaattaata',\n", " 'aaattatacacgtttgttcggtggagcgagtttgcttccatctagcaactgaaccaccgtg',\n", " 'gtgtgcaacttgccgccgcggtgtttgcgtacttgcacaccgtagt']" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[extract_repeat(chapaev, genome) for chapaev in chapaevs]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### How are repeats related?\n", "\n", "Look at the repeat family/class names for the first several repeats in the roundworm database:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Simple_repeat Simple_repeat Satellite Simple_repeat DNA/MULE-MuDR DNA DNA DNA Simple_repeat Unknown Unknown DNA/PiggyBac? DNA DNA DNA DNA DNA DNA DNA DNA/TcMar-Tc1? DNA/TcMar-Tc1? DNA DNA Simple_repeat DNA/hAT DNA/hAT DNA/MULE-MuDR DNA/hAT DNA/TcMar-Tc1? DNA/TcMar-Tc1? DNA/TcMar-Tc1? DNA/hAT DNA/MULE-MuDR Simple_repeat DNA/hAT Unknown Unknown Unknown DNA DNA DNA/CMC-Chapaev DNA/CMC-Chapaev DNA/CMC-Chapaev RC/Helitron Simple_repeat DNA DNA/TcMar-Tc1? DNA/TcMar-Tc1? DNA/TcMar-Tc1? DNA/TcMar-Tc1? DNA/TcMar-Pogo Simple_repeat DNA Simple_repeat DNA DNA DNA/TcMar-Tc1 DNA/TcMar-Tc1 DNA/hAT Simple_repeat'" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from operator import attrgetter\n", "\n", "' '.join(map(attrgetter('rep_cl'), reps[:60]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You'll notice a few things. (1) The family names seem to have some hierarchical relationships; e.g. `DNA/TcMar-Tc1` seems to be more specific than `DNA`, (2) some of them end in a question mark, (3) some of them are `Unknown`. I don't really know what these mean or what to do as a result -- you'll have to navigate that issue. Seems like you can often look up the family names on RepeatMasker's site and find more detailed info (e.g., here are the details for [DNA/TcMar-Tc1](http://www.repeatmasker.org/cgi-bin/ViewRepeat?id=Tc1-7_Xt))." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Alternatives to RepeatMasker\n", "\n", "[Dfam](http://www.dfam.org/) is an alternative. Note that Dfam ultimately relies on Repbase for its \"seed alignments.\" Also, the only the human genome has a pre-built Dfam database, as far as I can tell." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 1 }