{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Markov chains for finding CpG islands" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from __future__ import print_function\n", "import random\n", "import re\n", "import gzip\n", "from itertools import islice\n", "from operator import itemgetter\n", "import numpy as np\n", "\n", "from future.standard_library import install_aliases\n", "install_aliases()\n", "from urllib.request import urlopen, urlcleanup, urlretrieve" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As training data, we use some already-called CpG islands. These were called in a prior study that used a kind of Hidden Markov Model. Relevant studies:\n", "\n", "* A species-generalized probabilistic model-based definition of CpG islands. Irizarry RA, Wu H, Feinberg AP. [doi:10.1007/s00335-009-9222-5](https://doi.org/10.1007/s00335-009-9222-5)\n", "* Redefining CpG islands using hidden Markov models. Wu H, Caffo B, Jaffee HA, Irizarry RA, Feinberg AP. [doi:10.1093/biostatistics/kxq005](https://doi.org/10.1093/biostatistics/kxq005)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "islands_url = 'http://www.haowulab.org/software/makeCGI/model-based-cpg-islands-hg19.txt'\n", "\n", "# URL for chromosome of the hg19 human genome assembly\n", "def hg19_chr_url(chrom):\n", " return 'ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/%s.fa.gz' % chrom" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def sample(iterable, n):\n", " \"\"\" Samples n items from a stream \"\"\"\n", " samp = []\n", " for t, item in enumerate(iterable):\n", " if t < n:\n", " samp.append(item)\n", " else:\n", " m = random.randint(0, t)\n", " if m < n:\n", " samp[m] = item\n", " return samp" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "def kmers_from_fasta(fh, k):\n", " \"\"\" Yield k-mer, offset pairs from FASTA filehandle.\n", " Ignore k-mers with chars besides A, C, G or T. \"\"\"\n", " non_acgt = re.compile('[^ACGTacgt]') # regex for detecting non-A/C/G/Ts\n", " kmer, off = [], 0\n", " for ln in fh:\n", " if ln[0] == r'>':\n", " kmer, off = [], 0 # new sequence\n", " continue\n", " for c in filter(lambda x: x.isalpha(), ln.decode()):\n", " if len(kmer) == k:\n", " kmer.pop(0) # k-mer buffer full, so bump one element\n", " kmer.append(c.upper())\n", " off += 1\n", " if len(kmer) == k:\n", " kmerstr = ''.join(kmer)\n", " if not non_acgt.search(kmerstr):\n", " yield kmerstr, off - k" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def kmers_islands_from_fasta(fh, k, isles, want_inside):\n", " \"\"\" Yield k-mers along with string indicating whether k-mer lies\n", " entirely within an island (True) or not (False) \"\"\"\n", " cur = 0\n", " for kmer, off in kmers_from_fasta(fh, k):\n", " while cur < len(isles) and off >= isles[cur][1]:\n", " cur += 1\n", " was_inside = False\n", " if cur < len(isles) and off >= isles[cur][0]:\n", " if off + k <= isles[cur][1]:\n", " was_inside = True\n", " if want_inside:\n", " yield kmer\n", " if not was_inside and not want_inside:\n", " yield kmer" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def parse_islands(fh, chromosome):\n", " \"\"\" Parse a file with island annotations. Only take\n", " records from given chromosome name. \"\"\"\n", " islands = []\n", " for ln in fh:\n", " ch, st, en, _ = ln.split(b'\\t', 3)\n", " if ch == chromosome.encode('utf8'):\n", " # convert 1-based closed interval to 0-based right-open\n", " islands.append((int(st)-1, int(en)))\n", " return islands" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "def get_islands(chromosome):\n", " with urlopen(islands_url) as fh:\n", " return parse_islands(fh, chromosome) # takes a few seconds" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Islands are described simply as a pair of numbers giving the 0-based right open interval for each island." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[(16096917, 16097083),\n", " (16097226, 16097940),\n", " (16122658, 16123497),\n", " (16155779, 16157985),\n", " (16192710, 16193099),\n", " (16200190, 16202154),\n", " (16216495, 16218123),\n", " (16227376, 16227533),\n", " (16228205, 16228802)]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "get_islands('chr22')[1:10]" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "def kmers_islands_from_hg19(k, chromosome, islands, inside):\n", " fa_fn, _ = urlretrieve(hg19_chr_url(chromosome))\n", " with gzip.open(fa_fn, 'rb') as fa_fh:\n", " # Yield all the k-mer tuples\n", " for r in kmers_islands_from_fasta(fa_fh, k, islands, inside):\n", " yield r" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "def samples_from_hg19(k, chromosome, n, upto):\n", " \"\"\" Given given k, and n, sample n k-mers from both inside\n", " and outside CpG islands, then return histograms of number\n", " of times each k-mer occurs inside and outside. \"\"\"\n", " islands = get_islands(chromosome)\n", " ins = sample(islice(kmers_islands_from_hg19(\n", " k, chromosome, islands, True), upto), n)\n", " out = sample(islice(kmers_islands_from_hg19(\n", " k, chromosome, islands, False), upto), n)\n", " return ins, out" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our first idea for a model is to count how many times our $k$-mer of interest occurs inside and outside CpG islands. This get problematic as $k$ grows as it requires exponentially more training data." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "inside: 1553 out of 500000\n", "outside: 45 out of 500000\n", "p(inside): 0.97184\n", "p(outside): 0.02816\n" ] } ], "source": [ "from collections import Counter\n", "\n", "random.seed(723444)\n", "q = 'CGCGC'\n", "n = 500000\n", "upto = 5000000\n", "ins, out = samples_from_hg19(len(q), 'chr22', n, upto)\n", "assert len(ins) == n, (len(ins), len(out), n)\n", "assert len(out) == n, (len(ins), len(out), n)\n", "hist_in, hist_out = Counter(ins), Counter(out)\n", "\n", "# print info about inside/outside counts and probabilities\n", "print(\"inside: %d out of %d\" % (hist_in[q], n))\n", "print(\"outside: %d out of %d\" % (hist_out[q], n))\n", "print(\"p(inside): %0.5f\" % (float(hist_in[q]) / (hist_in[q] + hist_out[q])))\n", "print(\"p(outside): %0.5f\" % (float(hist_out[q]) / (hist_in[q] + hist_out[q])))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we adopt the Markov assumption and estimate all the conditional probabilities, e.g. $P(A|C)$." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# Now to build inside and outside Markov chains\n", "\n", "# compile dinucleotide tables\n", "samp_in, samp_out = samples_from_hg19(2, 'chr22', n=100000, upto=1000000)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "def markov_chain_from_dinucs(dinucs):\n", " ''' Given dinucleotide frequencies, make a transition table. '''\n", " conds = np.zeros((4, 4), dtype=np.float64)\n", " margs = np.zeros(4, dtype=np.float64)\n", " for i, ci in enumerate('ACGT'):\n", " tot = 0\n", " for j, cj in enumerate('ACGT'):\n", " count = dinucs.get(ci + cj, 0)\n", " tot += count\n", " margs[i] += count\n", " if tot > 0:\n", " for j, cj in enumerate('ACGT'):\n", " conds[i, j] = dinucs.get(ci + cj, 0) / float(tot)\n", " return conds, margs" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "ins_conds, ins_margs = markov_chain_from_dinucs(Counter(samp_in))\n", "out_conds, out_margs = markov_chain_from_dinucs(Counter(samp_out))" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0.18427153, 0.27129525, 0.4055757 , 0.13885752],\n", " [ 0.19081672, 0.36113346, 0.24897947, 0.19907035],\n", " [ 0.17440554, 0.32764433, 0.35676759, 0.14118254],\n", " [ 0.09348595, 0.3474561 , 0.36885 , 0.19020795]])" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# transition probabilities inside CpG island\n", "ins_conds" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array([ 1., 1., 1., 1.]), array([ 1., 1., 1., 1.]))" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# confirm that rows add to 1\n", "np.sum(ins_conds, 1), np.sum(out_conds, 1)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-2.44009488, -1.8820643 , -1.30195688, -2.84832282],\n", " [-2.38974049, -1.469396 , -2.00590131, -2.32864974],\n", " [-2.51948223, -1.60979755, -1.48694353, -2.82436637],\n", " [-3.41910668, -1.52509737, -1.43889385, -2.39435058]])" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# elementwise log2 of above table\n", "np.log2(ins_conds)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-0.87536356, 0.59419041, 0.81181564, -0.85527103],\n", " [-0.98532149, 0.49570561, 2.64256972, -0.7126391 ],\n", " [-0.79486196, 0.68874785, 0.51821792, -0.79549511],\n", " [-1.22085697, 0.73036913, 0.48119354, -0.69736839]])" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# log ratio table\n", "np.log2(ins_conds) - np.log2(out_conds)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "def classify(seq, lrTab):\n", " \"\"\" Classify seq using given log-ratio table. We're ignoring the\n", " initial probability for simplicity. \"\"\"\n", " bits = 0\n", " nucmap = { 'A':0, 'C':1, 'G':2, 'T':3 }\n", " for dinuc in [ seq[i:i+2] for i in range(len(seq)-1) ]:\n", " i, j = nucmap[dinuc[0]], nucmap[dinuc[1]]\n", " bits += lrTab[i, j]\n", " return bits" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "log_ratios = np.log2(ins_conds) - np.log2(out_conds)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "42.618380534506265" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "classify('CGCGCGCGCGCGCGCGCGCGCGCGCG', log_ratios)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-10.839270703924727" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "classify('ATTCTACTATCATCTATCTATCTTCT', log_ratios)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "itest, otest = samples_from_hg19(100, 'chr18', 1000, 100000)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "itestClass = [ classify(x, log_ratios) for x in itest ]\n", "otestClass = [ classify(x, log_ratios) for x in otest ]" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Populating the interactive namespace from numpy and matplotlib\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAD8CAYAAABn919SAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4wLCBo\ndHRwOi8vbWF0cGxvdGxpYi5vcmcvpW3flQAAELRJREFUeJzt3X2sZHV9x/H3R0RtfAgoF7oBbhcT\nsGKrQFeCITUiokCNYNRE0tBNpbnWIEFiUwH/qI0mxValNm1NV6FuEyoShEIMtd1SWmMiWEDkwa1i\nURTdsvWBSmOiWfj2jzngZb2zM3ce7sz89v1KbmbmzDkz33NP7md/e+Y7v5OqQpK0+J426wIkSZNh\noEtSIwx0SWqEgS5JjTDQJakRBrokNcJAl6RGGOiS1AgDXZIa8fSNfLNDDjmkNm/evJFvKUkL7447\n7vh+VS0NWm9DA33z5s3cfvvtG/mWkrTwkjw4zHqecpGkRhjoktQIA12SGmGgS1IjDHRJaoSBLkmN\nMNAlqREGuiQ1wkCXpEZs6DdFJc2fy3d8/cn7F512zAwr0bgcoUtSIwx0SWqEgS5JjTDQJakRBrok\nNcJAl6RGGOiS1AgDXZIaYaBLUiMMdElqhIEuSY1wLpf9xS1/8vP7p1wyuzokTY0jdElqhIEuSY0Y\nGOhJnpXkS0m+kuS+JH/cLT8qyW1J7k/y6STPmH65kqR+hhmh/xR4dVW9DDgOOD3JScAHgcur6mjg\nR8B50ytTkjTIwECvnv/rHh7Y/RTwauDabvl24OypVChJGspQ59CTHJDkLmA3sAP4L+CRqtrTrfIQ\ncPh0SpQkDWOotsWqegw4LslBwPXAi9daba1tk6wAKwDLy8sjlqmZWt3yCLY9SnNqXV0uVfUI8G/A\nScBBSZ74B+EI4Ht9ttlWVVuqasvS0tI4tUqS9mGYLpelbmROkl8CXgPsBG4B3tytthW4YVpFSpIG\nG+aUyyZge5ID6P0DcE1VfTbJV4Grk3wA+DJwxRTrlCQNMDDQq+pu4Pg1lj8AnDiNoiRJ6+c3RSWp\nEU7OpY3lJGHS1DhCl6RGGOiS1AgDXZIaYaBLUiMMdElqhF0umr6954KRNBWO0CWpEQa6JDXCQJek\nRhjoktQIA12SGmGgS1IjDHRJaoSBLkmNMNAlqREGuiQ1wkCXpEYY6JLUCCfnakG/y7o5KVbTLt/x\n9SfvX3TaMTOsRPPCEbokNWJgoCc5MsktSXYmuS/Jhd3y9yX5bpK7up8zp1+uJKmfYU657AHeXVV3\nJnkucEeSHd1zl1fVh6ZXniRpWAMDvap2Abu6+48m2QkcPu3CJEnrs65z6Ek2A8cDt3WL3pnk7iRX\nJjl4wrVJktZh6C6XJM8BPgO8q6p+nORjwPuB6m4/DLxtje1WgBWA5eXlSdQ8n/p1mmzE+7Vso3+v\nGkq/Dhs7b2ZrqBF6kgPphflVVXUdQFU9XFWPVdXjwMeBE9fatqq2VdWWqtqytLQ0qbolSXsZpssl\nwBXAzqr6yKrlm1at9kbg3smXJ0ka1jCnXE4GzgXuSXJXt+xS4Jwkx9E75fIt4O1TqVCSNJRhuly+\nAGSNp26afDmSpFH5TVFJaoRzuWhy7EhRH3a/bAxH6JLUCANdkhphoEtSIwx0SWqEgS5JjbDLZd6t\nt3Nkf5njRRtudaeK5pMjdElqhIEuSY0w0CWpEQa6JDXCQJekRhjoktQI2xZnZe/2wv1xMqtZXrZv\nf/x9q3mO0CWpEQa6JDXCQJekRhjoktQIA12SGmGXy0Zy4qz2NN45M86EXE7mtfEcoUtSIwx0SWrE\nwEBPcmSSW5LsTHJfkgu75c9PsiPJ/d3twdMvV5LUzzAj9D3Au6vqxcBJwPlJjgUuBm6uqqOBm7vH\nkqQZGRjoVbWrqu7s7j8K7AQOB84CtnerbQfOnlaRkqTB1tXlkmQzcDxwG3BYVe2CXugnObTPNivA\nCsDy8vI4tWraRunC6bfNOB09jXUDre72uOi0Y/b7OoYx77XOa31Dfyia5DnAZ4B3VdWPh92uqrZV\n1Zaq2rK0tDRKjZKkIQwV6EkOpBfmV1XVdd3ih5Ns6p7fBOyeTomSpGEM0+US4ApgZ1V9ZNVTNwJb\nu/tbgRsmX54kaVjDnEM/GTgXuCfJXd2yS4HLgGuSnAd8G3jLdEqUJA1jYKBX1ReA9Hn61MmWI0ka\nld8UlaRGODmXpIW09+Rf024fXITJxhyhS1IjDHRJaoSBLkmNMNAlqREGuiQ1wi6X/dE8XjZtmAm5\n5rDuLz7wgyfvv+KUyb3uvE7+NM/8nTlCl6RmGOiS1AgDXZIaYaBLUiMMdElqhF0u82K9XR4b+b4b\nUcc49q5nnA6YSXXSPKWmNw21yUbOFdLvvUapYRHmONlfOEKXpEYY6JLUCANdkhphoEtSIwx0SWqE\nXS7TMIdzjmgf1tm1s7qr46Q+yy8a8i9rvR0iLXSULNKcK4v2+3aELkmNMNAlqREDAz3JlUl2J7l3\n1bL3Jflukru6nzOnW6YkaZBhRuifBE5fY/nlVXVc93PTZMuSJK3XwECvqs8DP9yAWiRJYxjnHPo7\nk9zdnZI5eGIVSZJGMmrb4seA9wPV3X4YeNtaKyZZAVYAlpeXR3w7qQ3TaoNbpFbAfiY5Ydj+aqQR\nelU9XFWPVdXjwMeBE/ex7raq2lJVW5aWlkatU5I0wEiBnmTTqodvBO7tt64kaWMMPOWS5FPAq4BD\nkjwE/BHwqiTH0Tvl8i3g7VOsUZI0hIGBXlXnrLH4iinUIkkag98UlaRGODmX2jbO5fLmZJK1cbo8\n7BB5qha6gfbFEbokNcJAl6RGGOiS1AgDXZIaYaBLUiPscpm2cbostCG++MAPnrz/ihe+YO2VnnIc\n3zTdgsZkZ8t4Fvn35whdkhphoEtSIwx0SWqEgS5JjTDQJakRdrms15zM76F9GKOzaJiOl5O+vW3k\n199721uXV0Z+LY2nxXldHKFLUiMMdElqhIEuSY0w0CWpEQa6JDXCQJekRti2qMU07qRnU540bXX7\nI8tTfauFNq2JsGY1wdasWyEdoUtSIwYGepIrk+xOcu+qZc9PsiPJ/d3twdMtU5I0yDAj9E8Cp++1\n7GLg5qo6Gri5eyxJmqGBgV5Vnwd+uNfis4Dt3f3twNkTrkuStE6jnkM/rKp2AXS3h06uJEnSKKbe\n5ZJkBVgBWF72435N3lM6StjHZeTGfN1Bxpm0S5qEUUfoDyfZBNDd7u63YlVtq6otVbVlaWlpxLeT\nJA0yaqDfCGzt7m8FbphMOZKkUQ3Ttvgp4IvAi5I8lOQ84DLgtCT3A6d1jyVJMzTwHHpVndPnqVMn\nXIskaQx+U1SSGuFcLsPoN+/HlOcD0fj6XVJuvR0s4xi2+2X1el6aTqNwhC5JjTDQJakRBrokNcJA\nl6RGGOiS1AgDXZIaYduiFka/FsR9rSdNW7/L3e29fCMuSecIXZIaYaBLUiMMdElqhIEuSY0w0CWp\nEXa5rJ5g65RLZleH1qWVThYvWzcf+nWqLBpH6JLUCANdkhphoEtSIwx0SWqEgS5Jjdg/u1y8dJzm\nnJej0ygcoUtSIwx0SWrEWKdcknwLeBR4DNhTVVsmUZQkaf0mcQ79lKr6/gReR5I0Bk+5SFIjUlWj\nb5x8E/gRUMDfVNUvTEyRZAVYAVheXv6NBx98cOT3G4udLRMz7JWDRn2dVuZpmQY7XhbXOFcsSnLH\nMKe0xx2hn1xVJwBnAOcneeXeK1TVtqraUlVblpaWxnw7SVI/YwV6VX2vu90NXA+cOImiJEnrN3Kg\nJ3l2kuc+cR94LXDvpAqTJK3POF0uhwHXJ3nidf6+qj43kaokSes2cqBX1QPAyyZYiyRpDLYtSlIj\n9s/JuTQxw7QeDtPaaKvicPpN2jXMZF5O+NU+R+iS1AgDXZIaYaBLUiMMdElqhIEuSY2wy0Vq2OrO\nlmHWsftlsTlCl6RGGOiS1AgDXZIaYaBLUiMMdElqxOJ0uay+hNwpl6y9XDM1zHwsztkiTY8jdElq\nhIEuSY0w0CWpEQa6JDXCQJekRhjoktSIxWlbXM1WxakY5nJy476uJmeYibcmuW2/bfpdCq+fcSYA\n29frO7GYI3RJasZYgZ7k9CRfS/KNJBdPqihJ0vqNHOhJDgD+CjgDOBY4J8mxkypMkrQ+44zQTwS+\nUVUPVNXPgKuBsyZTliRpvcYJ9MOB76x6/FC3TJI0A6mq0TZM3gK8rqp+r3t8LnBiVV2w13orwBMf\nP78I+NqItR4CfH/EbeeN+zJ/WtkPcF/m1Tj78itVtTRopXHaFh8Cjlz1+Ajge3uvVFXbgNH7qzpJ\nbq+qLeO+zjxwX+ZPK/sB7su82oh9GeeUy38ARyc5KskzgLcCN06mLEnSeo08Qq+qPUneCfwTcABw\nZVXdN7HKJEnrMtY3RavqJuCmCdUyyNinbeaI+zJ/WtkPcF/m1dT3ZeQPRSVJ88Wv/ktSIxYi0JNc\n0E0xcF+SP121/JJu2oGvJXndLGtcjyR/kKSSHNI9TpK/6Pbl7iQnzLrGfUnyZ0n+s6v1+iQHrXpu\n4Y7JIk9hkeTIJLck2dn9fVzYLX9+kh1J7u9uD551rcNIckCSLyf5bPf4qCS3dfvx6a4BY+4lOSjJ\ntd3fyc4kr9iIYzL3gZ7kFHrfQH1pVb0E+FC3/Fh6nTUvAU4H/rqbjmCuJTkSOA349qrFZwBHdz8r\nwMdmUNp67AB+rapeCnwduAQW85g0MIXFHuDdVfVi4CTg/K7+i4Gbq+po4Obu8SK4ENi56vEHgcu7\n/fgRcN5Mqlq/jwKfq6pfBV5Gb5+mfkzmPtCBdwCXVdVPAapqd7f8LODqqvppVX0T+Aa96Qjm3eXA\nHwKrP7w4C/i76rkVOCjJpplUN4Sq+ueq2tM9vJXedxBgMY/JQk9hUVW7qurO7v6j9ILjcHr7sL1b\nbTtw9mwqHF6SI4DfAj7RPQ7wauDabpVF2Y/nAa8ErgCoqp9V1SNswDFZhEA/BvjN7r9d/57k5d3y\nhZt6IMkbgO9W1Vf2emrh9mWVtwH/2N1fxP1YxJrXlGQzcDxwG3BYVe2CXugDh86usqH9Ob3BzuPd\n4xcAj6waPCzKsXkh8D/A33anjz6R5NlswDGZiwtcJPkX4JfXeOq99Go8mN5/J18OXJPkhUDWWH/m\nLTsD9uVS4LVrbbbGspnuy772o6pu6NZ5L73/8l/1xGZrrD/zYzLAItb8C5I8B/gM8K6q+nFvcLs4\nkrwe2F1VdyR51ROL11h1EY7N04ETgAuq6rYkH2WDTnnNRaBX1Wv6PZfkHcB11euv/FKSx+nNiTDU\n1AMbrd++JPl14CjgK90f2xHAnUlOZA73ZV/HBCDJVuD1wKn1897XuduPISxizU+R5EB6YX5VVV3X\nLX44yaaq2tWdvtvd/xXmwsnAG5KcCTwLeB69EftBSZ7ejdIX5dg8BDxUVbd1j6+lF+hTPyaLcMrl\nH+idRyPJMcAz6E1wcyPw1iTPTHIUvQ8UvzSzKgeoqnuq6tCq2lxVm+kd9BOq6r/p7cvvdN0uJwH/\n+8R/zeZRktOB9wBvqKqfrHpqoY5JZ6GnsOjOM18B7Kyqj6x66kZga3d/K3DDRte2HlV1SVUd0f1t\nvBX416r6beAW4M3danO/HwDd3/R3kryoW3Qq8FU24JjMxQh9gCuBK5PcC/wM2NqNCO9Lcg29X9Qe\n4PyqemyGdY7jJuBMeh8i/gT43dmWM9BfAs8EdnT/27i1qn6/qhbumDQwhcXJwLnAPUnu6pZdClxG\n7/TkefQ6qt4yo/rG9R7g6iQfAL5M90HjArgAuKobJDxA72/6aUz5mPhNUUlqxCKccpEkDcFAl6RG\nGOiS1AgDXZIaYaBLUiMMdElqhIEuSY0w0CWpEf8PvCTn3iemCK0AAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%pylab inline --no-import-all\n", "from matplotlib import pyplot\n", "bins = numpy.linspace(-60, 60, 100)\n", "pyplot.hist(itestClass, bins, alpha=0.5)\n", "pyplot.hist(otestClass, bins, alpha=0.5)\n", "pyplot.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" } }, "nbformat": 4, "nbformat_minor": 1 }