{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Minimizers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One way to partition the space of all possible $k$-mers is by each $k$-mer's minimal (lexicographically smallest) $l$-mer, where $l < k$. For example, the minimal 2-mer in the string `ABC` is `AB` and the minimal 4-mer in the string `abracadabra` is `abra`. In this context, the minimal $l$-mer is called a *minimizer*, and we'll call such a partitioning scheme a $k,l$-minimizing scheme." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "def string_to_kmers(s, length):\n", "    \"\"\" Return every substring of s with the given length, left to right \"\"\"\n", "    return [s[i:i+length] for i in range(len(s)-length+1)]\n", "\n", "def minimizer(k, l):\n", "    \"\"\" Given a k-mer, return its lexicographically smallest l-mer \"\"\"\n", "    assert l <= len(k)\n", "    return min(string_to_kmers(k, l))" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'AB'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "minimizer('ABC', 2)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'abra'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "minimizer('abracadabra', 4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But if our goal is to partition the space of $k$-mers, couldn't we use a hash function instead? Say $k$ is 10 and $l$ is 4. A 10,4-minimizing scheme is a way of dividing the space of $4^{10}$ 10-mers (about a million) into $4^4 = 256$ partitions. We could accomplish the same thing with a hash function that maps $k$-mers to integers in $[0, 255]$. Why would we prefer minimizers over hash functions?\n", "\n", "The answer is that two strings that share long substrings tend to have the same minimizer, but not the same hash value. 
For example, the strings `abracadabr` and `bracadabra` have the substring `bracadabr` in common, and they have the same minimal 4-mer:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "('abra', 'abra')" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "minimizer('abracadabr', 4), minimizer('bracadabra', 4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But their hash values (modulo 256) are not the same:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(26, 224)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# you might need to 'pip install mmh3' first\n", "import mmh3\n", "mmh3.hash('abracadabr') % 256, mmh3.hash('bracadabra') % 256" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Partition size distribution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A feature of hash functions is that they divide the 10-mers quite uniformly (evenly) among the 256 buckets. 10,4-minimizers divide them much less uniformly. 
This becomes clear when you consider that, given a random 10-mer, the 4-mer `TTTT` is very unlikely to be its minimizer, whereas the 4-mer `AAAA` is much more likely: `TTTT` is the minimizer only of the all-`T` 10-mer, while `AAAA` is the minimizer of every 10-mer that contains it.\n", "\n", "We can also show this empirically by partitioning a collection of random 10-mers:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "import random\n", "random.seed(629)\n", "\n", "def random_kmer(k):\n", "    \"\"\" Return a uniformly random DNA string of length k \"\"\"\n", "    return ''.join([random.choice('ACGT') for _ in range(k)])" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "\n", "def plot_counts(counter, title=None):\n", "    \"\"\" Bar plot of how many items fall into each of the 256 partitions \"\"\"\n", "    idx = range(256)\n", "    cnts = [counter.get(i, 0) for i in idx]\n", "    plt.bar(idx, cnts, ec='none')\n", "    plt.xlim(0, 256)\n", "    plt.ylim(0, 35)\n", "    if title is not None:\n", "        plt.title(title)\n", "    plt.show()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[Figure: bar chart titled 'Frequency of partitions using hash mod 256'; x-axis: partition 0 to 255, y-axis: count]
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from collections import Counter\n", "# hash 1000 random 10-mers\n", "cnt = Counter([mmh3.hash(s) % 256 for s in [random_kmer(10) for _ in range(1000)]])\n", "plot_counts(cnt, 'Frequency of partitions using hash mod 256')" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "def lmer_to_int(mer):\n", "    \"\"\" Maps AAAA to 0, AAAC to 1, etc. Works for any length argument. \"\"\"\n", "    cum = 0\n", "    charmap = {'A':0, 'C':1, 'G':2, 'T':3}\n", "    for c in mer:\n", "        # treat the l-mer as a base-4 number, most significant digit first\n", "        cum *= 4\n", "        cum += charmap[c]\n", "    return cum" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[Figure: bar chart titled 'Frequency of partitions using minimal 4-mer; AAAA at left, TTTT at right'; x-axis: partition 0 to 255, y-axis: count]
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# get minimal 4-mers from 1000 random 10-mers\n", "cnt = Counter([lmer_to_int(minimizer(s, 4)) for s in [random_kmer(10) for _ in range(1000)]])\n", "plot_counts(cnt, 'Frequency of partitions using minimal 4-mer; AAAA at left, TTTT at right')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### TODO: Salting minimizers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### TODO: Making minimizer partitionings more uniform" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 1 }