{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Introduction to PMFs\n", "====================\n", "\n", "Copyright 2015 Allen Downey\n", "\n", "License: [Creative Commons Attribution 4.0 International](http://creativecommons.org/licenses/by/4.0/)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# this line makes the code compatible with Python 2 and 3\n", "from __future__ import print_function, division\n", "\n", "# this line makes Jupyter show figures in the notebook\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Hist objects\n", "\n", "A histogram is a map from each possible value to the number of times it appears. A map can be a mathematical function or, as in the examples below, a Python data structure that lets you look up a value and get the corresponding count.\n", "\n", "`Counter` is a data structure provided by Python's `collections` module; I am defining a new data structure, called a `Hist`, that inherits all the features of a `Counter` and adds a few more of my own." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import random\n", "import matplotlib.pyplot as plt\n", "from collections import Counter\n", "from itertools import product\n", "\n", "class Hist(Counter):\n", "\n", "    def __add__(self, other):\n", "        \"\"\"Returns a Hist of the sums of all pairs of elements from self and other.\"\"\"\n", "        return Hist(x + y for x, y in product(self.elements(), other.elements()))\n", "\n", "    def choice(self):\n", "        \"\"\"Chooses a random element, weighted by its count.\"\"\"\n", "        return random.choice(list(self.elements()))\n", "\n", "    def plot(self, **options):\n", "        \"\"\"Plots the Hist as a bar chart.\"\"\"\n", "        plt.bar(*zip(*self.items()), **options)\n", "        plt.xlabel('Values')\n", "        plt.ylabel('Counts')\n", "\n", "    def ranks(self):\n", "        \"\"\"Returns ranks (starting at 0) and counts, sorted by decreasing count.\"\"\"\n", "        return zip(*enumerate(sorted(self.values(), reverse=True)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As an example, I'll make a Hist of the letters in my name:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Hist({'a': 1, 'e': 1, 'l': 2, 'n': 1})" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hist = Hist('allen')\n", "hist" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can look up a letter and get the corresponding count:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hist['l']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or loop through all the letters and print their counts:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a 1\n", "e 1\n", "l 2\n", "n 1\n" ] } ], 
"source": [ "for letter in hist:\n", "    print(letter, hist[letter])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Counter` provides `most_common`, which returns a list of (element, count) pairs, sorted from most to least common:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[('l', 2), ('a', 1), ('e', 1), ('n', 1)]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hist.most_common()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here they are in a more readable form:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "l 2\n", "a 1\n", "e 1\n", "n 1\n" ] } ], "source": [ "for letter, count in hist.most_common():\n", "    print(letter, count)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I defined `choice`, which returns a random element from the Hist, chosen with probability proportional to its count. On average, 'l' should appear twice as often as each of the other letters." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a\n", "a\n", "e\n", "a\n", "e\n", "n\n", "l\n", "l\n", "l\n", "l\n" ] } ], "source": [ "for i in range(10):\n", "    print(hist.choice())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One (perhaps surprising) use for Hists is checking whether two words are anagrams of each other. 
Two words are anagrams exactly when they have the same Hist:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def is_anagram(word1, word2):\n", "    return Hist(word1) == Hist(word2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's a simple test:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "is_anagram('allen', 'nella')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And my favorite anagram pair:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "is_anagram('tachymetric', 'mccarthyite')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And here's a pair that is not an anagram, just to make sure:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "is_anagram('abcd', 'abccd')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So far the elements in the Hists have been letters (actually strings), but in statistics it is more common to work with numbers. 
Here's a Hist that represents the possible outcomes of a six-sided die:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Hist({1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1})" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d6 = Hist([1,2,3,4,5,6])\n", "d6" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Hist` provides a plot function:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAfMAAAFmCAYAAAB5pHO7AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGdFJREFUeJzt3X9MVff9x/HX5RJUBJWrgC3+SEcqbogWa92XYYqxGHXT\nLFV0KGJr7Oaqa+boYhS/w66DQLta24067WqjmxaqnaAuXZnV6pqCSl2HSuvcMFNaInCrQKGI0t7v\nH83ut1aLx9se7/1cno/E5N577pE3n6hPzrnXcx0ej8cjAABgrBB/DwAAAL4aYg4AgOGIOQAAhiPm\nAAAYjpgDAGA4Yg4AgOFsj/np06c1depUbd++/ZptlZWVmjt3rjIzM7Vhwwa7RwEAICjZGvPOzk7l\n5+crJSXlutsLCgpUXFyskpISvfXWW6qrq7NzHAAAgpKtMe/Tp49eeOEFxcTEXLOtvr5egwYNUmxs\nrBwOh9LS0nT48GE7xwEAICjZGvOQkBCFhYVdd5vb7ZbL5fLed7lcampqsnMcAACCUsC8AY6rygIA\n4JtQf33hmJgYNTc3e+83NjZe93T85y2a80sNGhhr92gBoaW1Uf9bOF+jRo266X1Pnz6t/NUlvWKt\nWCfrfF0r1sm63rRWrJN1X2WtrPJbzOPi4tTR0aGGhgbFxMTo4MGDWrduXY/7DBoYqyFRcbdoQv+7\ncKFdzc0f+bRfb1or1sk6X9aKdbq5/XrTWrFO1vm6VpIUHR15w+fYGvPa2loVFRWpoaFBoaGhqqio\n0JQpUzRs2DClp6dr7dq1ysnJkSTNnDlTI0eOtHMcAACCkq0xT0xM1B//+Mcv3T5hwgSVlpbaOQIA\nAEEvYN4ABwAAfEPMAQAwHDEHAMBwxBwAAMMRcwAADEfMAQAwHDEHAMBwxBwAAMMRcwAADEfMAQAw\nHDEHAMBwxBwAAMMRcwAADEfMAQAwHDEHAMBwxBwAAMMRcwAADEfMAQAwHDEHAMBwxBwAAMMRcwAA\nDEfMAQAwHDEHAMBwxBwAAMMRcwAADEfMAQAwHDEHAMBwxBwAAMMRcwAADEfMAQAwHDEHAMBwxBwA\nAMMRcwAADEfMAQAwHDEHAMBwxBwAAMMRcwAADEfMAQAwHDEHAMBwxBwAAMMRcwAADEfMAQAwHDEH\nAMBwxBwAAMMRcwAADEfMAQAwHDEHAMBwxBwAAMMRcwAADEfMAQAwHDEHAMBwxBwAAMMRcwAADBdq\n9xcoLCxUTU2NHA6HcnNzlZSU5N22fft27d27V06nU2PGjNHq1avtHgcAgKBja8yrq6t19uxZlZaW\nqq6uTmvWrFFpaakkqb29XZs3b9b+/fvlcDi0ZMkSHT9+XGPHjrVzJAAAgo6tp9mrqqqUnp4uSYqP\nj
1dbW5s6OjokSWFhYQoLC1N7e7u6u7t16dIlDRw40M5xAAAISrbG3O12y+Vyee9HRUXJ7XZL+izm\ny5cvV3p6uu677z6NHTtWI0eOtHMcAACCku2vmX+ex+Px3m5vb9emTZv017/+Vf3799eiRYv0z3/+\nUwkJCbdypIDmckUoOjrypve7eDHChmkCF+tknS9rxTpZ19vWinWyzte1ssrWmMfExHiPxCWpqalJ\n0dHRkqQzZ85o+PDh3lPrEyZMUG1tLTH/nAsX2tXc/JFP+/UmrJN1vqwV63Rz+/UmrJN1vq6VJEs/\nBNh6mj01NVUVFRWSpNraWsXGxio8PFySFBcXpzNnzujy5cuSpJMnT3KaHQAAH9h6ZJ6cnKzExERl\nZmbK6XQqLy9PZWVlioyMVHp6upYsWaLs7GyFhoYqOTlZd999t53jAAAQlGx/zTwnJ+eq+58/jT5v\n3jzNmzfP7hEAAAhqXAEOAADDEXMAAAxHzAEAMBwxBwDAcMQcAADDEXMAAAxHzAEAMBwxBwDAcMQc\nAADDEXMAAAxHzAEAMBwxBwDAcMQcAADDEXMAAAxHzAEAMBwxBwDAcMQcAADDEXMAAAxHzAEAMBwx\nBwDAcMQcAADDEXMAAAxHzAEAMBwxBwDAcMQcAADDEXMAAAxHzAEAMBwxBwDAcMQcAADDEXMAAAxH\nzAEAMBwxBwDAcMQcAADDEXMAAAxHzAEAMBwxBwDAcMQcAADDEXMAAAxHzAEAMBwxBwDAcMQcAADD\nEXMAAAxHzAEAMBwxBwDAcMQcAADDEXMAAAxHzAEAMBwxBwDAcMQcAADDEXMAAAxHzAEAMBwxBwDA\ncKF2f4HCwkLV1NTI4XAoNzdXSUlJ3m3nz59XTk6Ouru79a1vfUuPPfaY3eMAABB0bD0yr66u1tmz\nZ1VaWqr8/HwVFBRctb2oqEhLlizRjh075HQ6df78eTvHAQAgKNka86qqKqWnp0uS4uPj1dbWpo6O\nDkmSx+PRsWPHNGXKFEnSL37xCw0dOtTOcQAACEq2xtztdsvlcnnvR0VFye12S5IuXLig8PBwFRQU\naMGCBXr66aftHAUAgKB1S98A5/F4rrrd1NSkBx98UNu2bdO7776rQ4cO3cpxAAAICra+AS4mJsZ7\nJC5JTU1Nio6OlvTZUXpcXJyGDRsmSUpJSdG///1vpaWl2TmSUVyuCEVHR970fhcvRtgwTeBinazz\nZa1YJ+t621qxTtb5ulZW2Rrz1NRUFRcXa968eaqtrVVsbKzCw8MlSU6nU8OGDdO5c+c0YsQI1dbW\naubMmXaOY5wLF9rV3PyRT/v1JqyTdb6sFet0c/v1JqyTdb6ulSRLPwTYGvPk5GQlJiYqMzNTTqdT\neXl5KisrU2RkpNLT05Wbm6tVq1bJ4/Fo1KhR3jfDAQAA62z/f+Y5OTlX3U9ISPDeHjFihF566SW7\nRwAAIKhxBTgAAAxHzAEAMBwxBwDAcMQcAADDEXMAAAxHzAEAMBwxBwDAcMQcAADDEXMAAAxHzAEA\nMBwxBwDAcJZifuXKFZ0/f16SdOrUKZWXl6uzs9PWwQAAgDWWYr5q1Sr94x//UGNjox555BGdPn1a\nq1atsns2AABggaWYNzY2avr06Xr11Ve1YMECrVy5Uq2trXbPBgAALLAU88uXL8vj8Wjfvn2aPHmy\nJKmjo8POuQAAgEWWYj5x4kTdfffdio6O1h133KEtW7boG9/4ht2zAQAAC0KtPOn+++/Xj370Iw0Y\nMECSdN9992nMmDG2DgYAAKzp8ci8ra1N586dU25urlpbW1VfX6/6+npduXJFa9asuVUzAgCAHvR4\nZP7OO+9o69ateu+99/TAAw94Hw8JCdGkSZNsHw4AANxYjzFPS0tTWlqaSkpKNH/+/Fs1EwAAuAmW\nXjNPT0/X1q1b1draKo/H4338pz/9qW2DAQAAayy9m33p0qU6deq
UQkJC5HQ6vb8AAID/WToyDw8P\nV2Fhod2zAAAAH1g6Mh83bpzq6ursngUAAPjA0pH5m2++qS1btigqKkqhoaHyeDxyOBw6ePCgzeMB\nAIAbsRTz3/3ud3bPAQAAfGQp5lVVVdd9PCMj42sdBgAA3DxLMT927Jj39uXLl3X8+HGNHz+emAMA\nEAAsxfyL72Tv7OzU6tWrbRkIAADcHEvvZv+ifv366dy5c1/3LAAAwAeWjswXLFggh8Phvd/Y2KiE\nhATbhgIAANZZivmKFSu8tx0OhyIiIjR69GjbhgIAANZZOs0+ceJEhYSEqLa2VrW1tbp06dJVR+oA\nAMB/LMX82Wef1ZNPPqmmpiY1NjYqPz9fmzZtsns2AABggaXT7EeOHFFpaalCQj5rf3d3txYuXKil\nS5faOhwAALgxS0fmn376qTfkkhQaGsppdgAAAoSlI/MxY8boxz/+sb7zne9IkiorKzVmzBhbBwMA\nANbcMOb19fXKzc3VX/7yF9XU1MjhcGjChAl66KGHbsV8AADgBno8zV5VVaX58+ero6ND3/ve95Sb\nm6vZs2erpKREJ0+evFUzAgCAHvQY8+LiYr344ouKjIz0PpaQkKCNGzfqmWeesX04AABwYz3G3OPx\naNSoUdc8fuedd6qrq8u2oQAAgHU9xvzjjz/+0m0tLS1f+zAAAODm9RjzO++8UyUlJdc8/vvf/17j\nxo2zbSgAAGBdj+9mX7lypZYvX67du3drzJgx+vTTT/X3v/9dERERXAEOAIAA0WPMo6OjtWPHDlVV\nVelf//qXnE6nZsyYoXvuuedWzQcAAG7A0kVjUlJSlJKSYvcsAADAB5Yu5woAAAIXMQcAwHDEHAAA\nwxFzAAAMR8wBADAcMQcAwHC2x7ywsFCZmZmaP3++Tpw4cd3nrFu3TtnZ2XaPAgBAULI15tXV1Tp7\n9qxKS0uVn5+vgoKCa55TV1ent99+Ww6Hw85RAAAIWrbGvKqqSunp6ZKk+Ph4tbW1qaOj46rnFBUV\nKScnx84xAAAIarbG3O12y+Vyee9HRUXJ7XZ775eVlenb3/62br/9djvHAAAgqN3SN8B5PB7v7dbW\nVu3atUuLFy+Wx+O5ahsAALDO0rXZfRUTE3PVkXhTU5Oio6MlSYcPH9bFixeVlZWlrq4u1dfXq6io\nSKtWrbJzJKO4XBGKjo686f0uXoywYZrAxTpZ58tasU7W9ba1Yp2s83WtrLI15qmpqSouLta8efNU\nW1ur2NhYhYeHS5KmTZumadOmSZI++OADrV69mpB/wYUL7Wpu/sin/XoT1sk6X9aKdbq5/XoT1sk6\nX9dKkqUfAmyNeXJyshITE5WZmSmn06m8vDyVlZUpMjLS+8Y4AADw1dgac0nXvFM9ISHhmufExcXp\nD3/4g92jAAAQlLgCHAAAhiPmAAAYjpgDAGA4Yg4AgOGIOQAAhiPmAAAYjpgDAGA4Yg4AgOGIOQAA\nhiPmAAAYjpgDAGA4Yg4AgOGIOQAAhiPmAAAYjpgDAGA4Yg4AgOGIOQAAhiPmAAAYjpgDAGA4Yg4A\ngOGIOQAAhiPmAAAYjpgDAGA4Yg4AgOGIOQAAhiPmAAAYjpgDAGA4Yg4AgOGIOQAAhiPmAAAYjpgD\nAGA4Yg4AgOGIOQAAhiPmAAAYjpgDAGA4Yg4AgOGIOQAAhiPmAAAYjpgDAGA4Yg4AgOGIOQAAhiPm\nAAAYjpgDAGA4Yg4AgOGIOQAAhiPmAAAYjpgDAGA4Yg4AgOGIOQAAhiPmAAAYjpgDAGA4Yg4AgOGI\nOQAAhgu1+wsUFhaqpqZGDodDubm5SkpK8m47fPiw1q9fL6fTqTvuuEMFBQV2jwMAQNCx9ci8urpa\nZ8+eVWlpqfLz86+J9dq1a/Xb3/5WL730ktrb2/W3v/3NznEAAAhKtsa8qqpK6enpkqT4+Hi1tbWp\no6PDu33Xrl2KiYmRJLlcLrW
0tNg5DgAAQcnWmLvdbrlcLu/9qKgoud1u7/3+/ftLkpqamlRZWam0\ntDQ7xwEAICjZ/pr553k8nmse+/DDD/Xwww/rscce08CBA2/lOAHP5YpQdHTkTe938WKEDdMELtbJ\nOl/WinWyrretFetkna9rZZWtMY+JibnqSLypqUnR0dHe++3t7frhD3+oRx99VCkpKXaOYqQLF9rV\n3PyRT/v1JqyTdb6sFet0c/v1JqyTdb6ulSRLPwTYepo9NTVVFRUVkqTa2lrFxsYqPDzcu72oqEiL\nFy9WamqqnWMAABDUbD0yT05OVmJiojIzM+V0OpWXl6eysjJFRkZq0qRJ2rNnj86dO6cdO3bI4XBo\n1qxZmjt3rp0jAQAQdGx/zTwnJ+eq+wkJCd7bx48ft/vLAwAQ9LgCHAAAhiPmAAAYjpgDAGA4Yg4A\ngOGIOQAAhiPmAAAYjpgDAGA4Yg4AgOGIOQAAhiPmAAAYjpgDAGA4Yg4AgOGIOQAAhiPmAAAYjpgD\nAGA4Yg4AgOGIOQAAhiPmAAAYjpgDAGA4Yg4AgOGIOQAAhiPmAAAYjpgDAGA4Yg4AgOGIOQAAhiPm\nAAAYjpgDAGA4Yg4AgOGIOQAAhiPmAAAYjpgDAGA4Yg4AgOGIOQAAhiPmAAAYjpgDAGA4Yg4AgOGI\nOQAAhiPmAAAYjpgDAGA4Yg4AgOGIOQAAhiPmAAAYjpgDAGA4Yg4AgOGIOQAAhiPmAAAYjpgDAGA4\nYg4AgOGIOQAAhiPmAAAYjpgDAGA4Yg4AgOFC7f4ChYWFqqmpkcPhUG5urpKSkrzbKisrtX79ejmd\nTt17771atmyZ3eMAABB0bD0yr66u1tmzZ1VaWqr8/HwVFBRctb2goEDFxcUqKSnRW2+9pbq6OjvH\nAQAgKNka86qqKqWnp0uS4uPj1dbWpo6ODklSfX29Bg0apNjYWDkcDqWlpenw4cN2jgMAQFCyNeZu\nt1sul8t7PyoqSm63+7rbXC6Xmpqa7BwHAICgZPtr5p/n8Xh82vZfLa2NX+c4Ae2rfq+9Za1YJ+u+\nyvfKOt26/U3BOll3K75XW2MeExPjPRKXpKamJkVHR3u3NTc3e7c1NjYqJiamx9/vD39aa8+gQSY6\nerz+50/j/T1GwGOdrGGdrGOtrGGdvn62nmZPTU1VRUWFJKm2tlaxsbEKDw+XJMXFxamjo0MNDQ3q\n7u7WwYMHNWnSJDvHAQAgKDk8Vs5vfwVPP/20jh49KqfTqby8PL377ruKjIxUenq63n77bT311FOS\npOnTp+vBBx+0cxQAAIKS7TEHAAD24gpwAAAYjpgDAGA4Yg4AgOGMifnp06c1depUbd++3d+jBLQn\nn3xSmZmZmjt3rvbt2+fvcQLWpUuXtGLFCmVnZ+sHP/iBDh486O+RAlpXV5emTp2q8vJyf48SkI4e\nPaqUlBQtWrRI2dnZys/P9/dIAW3Pnj36/ve/rzlz5ujQoUP+HicgvfLKK8rOzvb+mRo/vuf/yndL\nLxrjq87OTuXn5yslJcXfowS0I0eOqK6uTqWlpWppadH999+vqVOn+nusgHTgwAElJSVpyZIlamho\n0OLFizV58mR/jxWwNmzYoEGDBvl7jIA2ceJEPfvss/4eI+C1tLToueeeU3l5uTo6OvSb3/xGaWlp\n/h4r4GRkZCgjI0PSZ59z8tprr/X4fCNi3qdPH73wwgt6/vnn/T1KQJs4caLGjRsnSRowYIA6Ozvl\n8XjkcDj8PFng+e53v+u93dDQoNtuu82P0wS2M2fO6MyZM/yDewP8xyBrKisrlZqaqn79+qlfv356\n/PHH/T1SwHvuuee0bt26Hp9jxGn2kJAQhYWF+XuMgOdwONS3b19J0s6dO5WWlkbIbyAzM1MrV65U\nbm6uv0cJWE888YRWrVrl7zECXl1dnZYtW6asrCxVVlb6e5yA9cEHH6izs1MPP/ywFi5cqKqqK
n+P\nFNBOnDih2267TYMHD+7xeUYcmePmvP7669q1a5c2b97s71ECXmlpqU6dOqWf//zn2rNnj7/HCTjl\n5eVKTk5WXFycJI4+v8zIkSP1k5/8RDNmzFB9fb0WLVqkffv2KTSUf2K/yOPxqKWlRRs2bND777+v\nRYsW6Y033vD3WAFr586dmj179g2fx5+0IPPmm2/q+eef1+bNmxUREeHvcQJWbW2tBg8erKFDh2r0\n6NH65JNPdOHChas+yQ/SoUOH9P777+uNN97Q+fPn1adPHw0dOpT3r3xBbGysZsyYIUkaPny4hgwZ\nosbGRu8PQfh/Q4YMUXJyshwOh4YPH67+/fvzd68HR48eVV5e3g2fZ8RpdljT3t6uX//619q4caMi\nIyP9PU5Aq66u1osvvijps4/j7ezs5B+T61i/fr127typl19+WXPnztWyZcsI+XXs3bvX++epublZ\nH374oWJjY/08VWBKTU3VkSNH5PF4dPHiRX388cf83fsSTU1N6t+/v6UzPEYcmdfW1qqoqEgNDQ0K\nDQ1VRUWFiouLNWDAAH+PFlBeffVVtbS0aMWKFd43vj355JMaOnSov0cLOPPnz1dubq6ysrLU1dWl\ntWv5RD74bsqUKXr00Ue1f/9+dXd365e//CWn2L9EbGyspk2bpnnz5snhcFg66uytmpubb/ha+X9x\nbXYAAAzHaXYAAAxHzAEAMBwxBwDAcMQcAADDEXMAAAxHzAEAMBwxB3qBhQsX6sCBA1c91tXVpYkT\nJ6qxsfG6+2RnZ3PdbMAQxBzoBTIyMlRWVnbVY/v27dNdd93FlcqAIEDMgV5g+vTpOnbsmFpbW72P\nlZeXKyMjQ6+//royMzP1wAMPaOHChWpoaLhq36NHj2rBggXe+6tXr9Yrr7wi6bOrDmZlZSkrK0uP\nPPKIWltb9cknn2j16tXKzMzU/Pnz9atf/erWfJNAL0bMgV6gb9++mjp1qv785z9L+uyaz6dOndKU\nKVPU1tamZ555Rlu3btW9996rbdu2XbP/9T5K9/z589q0aZO2bNmi7du365577tHGjRt1+vRp1dTU\nqLS0VCUlJRo9erTa29tt/x6B3oyLBwO9xJw5c/T4448rKytLe/fu1axZsxQaGqrBgwdr5cqV8ng8\ncrvduuuuuyz9fu+8846am5u1ZMkSeTweXblyRcOHD1d8fLxcLpeWLl2qyZMna8aMGXyCH2AzYg70\nEmPHjtXly5dVV1en3bt3a/369eru7tbPfvYz7d69W8OHD9f27dt18uTJq/b74lH55cuXJUlhYWEa\nO3asNm7ceM3X2rZtm9577z0dOHBAGRkZKi0t1ZAhQ+z75oBejtPsQC+SkZGhDRs2KDw8XPHx8ero\n6JDT6dTtt9+urq4u7d+/3xvr/4qIiPC+472zs1PHjx+XJCUlJenEiRNyu92SpNdee00HDhzQyZMn\nVV5erm9+85tavny5EhMT9Z///OeWfp9Ab8OROdCLzJo1S0899ZT3YycHDhyomTNnas6cOYqLi9ND\nDz2klStXqqKiwntEPnr0aCUkJGj27NkaMWKExo8fL0mKiYnRmjVrtHTpUoWHh6tv37564oknFBoa\nquLiYr388ssKCwvTyJEjvfsAsAcfgQoAgOE4zQ4AgOGIOQAAhiPmAAAYjpgDAGA4Yg4AgOGIOQAA\nhiPmAAAYjpgDAGC4/wNfN1FpXeEFYgAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import seaborn as sns\n", "COLORS = sns.color_palette()\n", "\n", "d6.plot(color=COLORS[3])" ] }, { "cell_type": "markdown", "metadata": 
{}, "source": [ "`elements` returns an iterator that yields each element as many times as its count:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d6.elements()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Its contents are easier to see if you convert it to a list:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[1, 2, 3, 4, 5, 6]" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(d6.elements())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Cartesian product of two iterables, computed by `itertools.product`, is an iterator that enumerates all pairs:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from itertools import product\n", "\n", "product(d6.elements(), d6.elements())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are the elements of the product:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[(1, 1),\n", " (1, 2),\n", " (1, 3),\n", " (1, 4),\n", " (1, 5),\n", " (1, 6),\n", " (2, 1),\n", " (2, 2),\n", " (2, 3),\n", " (2, 4),\n", " (2, 5),\n", " (2, 6),\n", " (3, 1),\n", " (3, 2),\n", " (3, 3),\n", " (3, 4),\n", " (3, 5),\n", " (3, 6),\n", " (4, 1),\n", " (4, 2),\n", " (4, 3),\n", " (4, 4),\n", " (4, 5),\n", " (4, 6),\n", " (5, 1),\n", " (5, 2),\n", " (5, 3),\n", " (5, 4),\n", " (5, 5),\n", " (5, 6),\n", " (6, 1),\n", " (6, 2),\n", " (6, 3),\n", " (6, 4),\n", " (6, 5),\n", " (6, 6)]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(product(d6.elements(), d6.elements()))" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "Now we can compute the sum of each pair:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[2,\n", " 3,\n", " 4,\n", " 5,\n", " 6,\n", " 7,\n", " 3,\n", " 4,\n", " 5,\n", " 6,\n", " 7,\n", " 8,\n", " 4,\n", " 5,\n", " 6,\n", " 7,\n", " 8,\n", " 9,\n", " 5,\n", " 6,\n", " 7,\n", " 8,\n", " 9,\n", " 10,\n", " 6,\n", " 7,\n", " 8,\n", " 9,\n", " 10,\n", " 11,\n", " 7,\n", " 8,\n", " 9,\n", " 10,\n", " 11,\n", " 12]" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(x + y for x, y in product(d6.elements(), d6.elements()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And finally, make a Hist of the sums:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Hist({2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 6, 8: 5, 9: 4, 10: 3, 11: 2, 12: 1})" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Hist(x + y for x, y in product(d6.elements(), d6.elements()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`__add__` does all of that for us, and Python calls it when we use the `+` operator:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Hist({2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 6, 8: 5, 9: 4, 10: 3, 11: 2, 12: 1})" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "twice = d6 + d6\n", "twice" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can plot the histogram of outcomes from rolling two dice:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAe4AAAFmCAYAAACr9HnjAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGWZJREFUeJzt3XtwVPX5x/HPJmnAJaESCOEiF6GEdAARWzOmtI0iEbEy\nDkLTAAmMeEEK/IRppRAc2lpaAe0gM5EGBAYKlFhbCdGhTbmVoUMGrCiWjHQ7ASsQCIElwZBIIOzv\nD0s0kMsqnt3zJO/XDDPJbna/T85e3tmT5cQTCAQCAgAAJkSEewAAABA8wg0AgCGEGwAAQwg3AACG\nEG4AAAwh3AAAGOJ4uAsKCvTII49o3Lhx2rNnj9PLAQDQqjka7oqKCr3yyivKy8vTypUrtXPnTieX\nAwCg1fM4eQCWbdu26Z///KcWLlzo1BIAALQpjr7iPnnypGpqajR9+nRlZmaqqKjIyeUAAGj1opy8\n8kAgoIqKCq1YsUInTpzQ5MmTtXv3bieXBACgVXM03F26dNGwYcPk8XjUq1cvdejQQX6/X3FxcY1+\nfSAQkMfjcXIkIKx8Pp/emjJV3b3ekK57qrpaD69fq8TExJCuC+Cr52i4hw8fruzsbD355JOqqKhQ\ndXV1k9GWJI/Ho/Lyj50cqVWIj49lOwXJbdvK769Sd69XvWNiw7J2U9vCbdvJrdhOwWNbBSc+/os/\nFzga7oSEBI0aNUrp6enyeDy8SQ0AgJvkaLglKT09Xenp6U4vAwBAm8CR0wAAMIRwAwBgCOEGAMAQ\nwg0AgCGEGwAAQwg3AACGEG4AAAwh3AAAGEK4AQAwhHADAGAI4QYAwBDCDQCAIYQbAABDCDcAAIYQ\nbgAADCHcAAAYQrgBADCEcAMAYAjhBgDAEMINAIAhhBsAAEMINwAAhhBuAAAMIdwAABhCuAEAMIRw\nAwBgCOEGAMAQwg0AgCGEGwAAQwg3AACGEG4AAAwh3AAAGEK4AQAwhHADAGAI4QYAwBDCDQCAIYQb\nAABDCDcAAIYQbgAADCHcAAAYQrgBADCEcAMAYEiUk1d+4MABPfPMMxowYIACgYAGDhyo5557zskl\nAQBo1RwNtyQlJydr+fLlTi8DAECb4Piu8kAg4PQSAAC0GY6Hu6SkRD/+8Y81adIk7du3z+nlAABo\n1RzdVd6nTx/NnDlTo0eP1vHjxzV58mRt375dUVGO76EHVFdXJ5/PJ7+/KqTr9u3bT5GRkSFd82aE\naztJ9rYV4AaOFjQhIUGjR4+WJPXq1UtdunRRWVmZevbs2eRl4uNjnRyp1WA7tczn8+mtKVPV3esN\n2ZqnqqsVt36tEhMTGz3//PkYHQvZNA3FxcU0er8Jx3aSWt5WbsVjL3hsK2c4Gu4333xT5eXlmjp1\nqsrLy3Xu3DklJCQ0e5ny8o+dHKlViI+PZTsFwe+vUnevV71jQvvk4fdXNXn7hONV7efXbmyucG2n\n5mZyKx57wWNbBefL/HDjaLhHjBihn/zkJ9q5c6euXLmiX/7yl+wmBwDgJjha0Q4dOig3N9fJJQAA\naFM4choAAIYQbgAADCHcAAAYQrgBADCEcAMAYAjhBgDAEMINAIAhhBsAAEMINwAAhhBuAAAMIdwA\nABhCuAEAMIRwAwBgCOEGAMAQwg0AgCGEGwAAQwg3AACGEG4AAAwh3AAAGEK4AQAwhHADAGAI4QYA\nwBDCDQCAIYQbAABDCDcAAIYQbgAADCHcAAAYQrgBADCEcAMAYAjhBgDAEMINAIAhhBsAAEMINwAA\nhhBuAAAMIdwAABhCuAEAMIRwAwBgCOEGAMAQwg0AgCGEGwAAQwg3AACGEG4AAAwh3AAAGOJ4uC9d\nuqS0tDTl5+c7vRQAAK2e4+FesWKFbr31VqeXAQCgTXA03EePHtXRo0eVmprq5DIAALQZUU5e+ZIl\nS7Rw4UJt2bLFyWUQZnV1dfrww6NhWbtv336KjIwMy9pwDvcpo
GmOhTs/P1/Dhg1Tz549JUmBQCCo\ny8XHxzo1Uqvipu3k8/m0b87/qbvXG9J1T1VXK279WiUmJjZ6/vnzMToW0ok+FRcX0+TtE66ZpKbn\ncuNMbr1PSe567Lkd28oZjoV7z549OnHihHbv3q3Tp0+rXbt26tatm1JSUpq9XHn5x06N1GrEx8e6\najv5/VXq7vWqd0zoH6R+f1WT28LvrwrxNJ+t67aZrq3d2FxuncmN9ym3PfbcjG0VnC/zw41j4V62\nbFn9xzk5ObrttttajDYAAGge/48bAABDHH1z2jUzZ84MxTIAALR6vOIGAMAQwg0AgCGEGwAAQwg3\nAACGEG4AAAwh3AAAGEK4AQAwhHADAGAI4QYAwBDCDQCAIYQbAABDCDcAAIYQbgAADCHcAAAYQrgB\nADCEcAMAYAjhBgDAEMINAIAhhBsAAEMINwAAhhBuAAAMIdwAABhCuAEAMIRwAwBgCOEGAMAQwg0A\ngCGEGwAAQwg3AACGEG4AAAwh3AAAGEK4AQAwJKhwX758WadPn5YkHTlyRPn5+aqpqXF0MAAAcKOg\nwj1v3jy99957Kisr06xZs+Tz+TRv3jynZwMAANcJKtxlZWV68MEHtW3bNk2cOFFz585VZWWl07MB\nAIDrBBXu2tpaBQIBbd++Xffee68k6eLFi07OBQAAGhFUuJOTk/Wtb31L8fHxuv3227Vu3Tr169fP\n6dkAAMB1ooL5orFjx+qpp55Sx44dJUn333+/Bg8e7OhgAADgRs2+4r5w4YI++ugjZWdnq7KyUseP\nH9fx48d1+fJlLViwIFQzAgCA/2n2Ffe7776r9evX64MPPtCUKVPqT4+IiNB3v/tdx4cDAAANNRvu\n1NRUpaamavPmzZowYUKoZgIAAE0I6nfcI0eO1Pr161VZWalAIFB/+jPPPOPYYAAA4EZBvat82rRp\nOnLkiCIiIhQZGVn/DwAAhFZQr7i9Xq9eeOGFL3zln3zyiebNm6dz586ptrZW06dPr/9/4AAA4IsL\nKtxDhw5VSUmJ+vfv/4WufNeuXRoyZIgef/xxlZaW6rHHHiPcAADchKDCvXfvXq1bt06dOnVSVFSU\nAoGAPB6P/v73vzd7uYceeqj+49LSUnXv3v2mhgUAoK0LKty/+93vbmqRjIwMnTlzRrm5uTd1PZDq\n6urk8/nk91eFfO2+ffvx3ga0WTz24BZBhbuoqKjR08ePHx/UInl5eTpy5Ih++tOfqqCgoNmvjY+P\nDeo62yqfz6e3pkxVd683pOueqq5W3Pq1SkxMvOG88+djdCyk03wmLi6myftMuOZy40xS03MxU0NN\nzeTGx57b8XzujKDC/c4779R/XFtbq/fff1933XVXi+EuLi5W586d1a1bNyUlJamurk5+v19xcXFN\nXqa8/OMgR2+b/P4qdfd61Tsm9A8Iv7+q0dsnHK9APr92U/eZcM3lxpmurW3l9nPrTG577LlZfHys\nuZnD4cv8cBNUuK9/R3lNTY3mz5/f4uXefvttlZaWKjs7W2fPnlVNTU2z0QYAAM0L6v9xX++WW27R\nRx991OLXTZgwQefOndOkSZP09NNP6+c///mXWQ4AAPxPUK+4J06cKI/HU/95WVmZBg4c2OLl2rVr\np9/+9rdffjoAANBAUOGePXt2/ccej0cxMTFKSkpybCgAANC4oHaVJycnKyIiQsXFxSouLtYnn3zS\n4BU4AAAIjaDCvXz5ci1dulRnzpxRWVmZFi1apJUrVzo9GwAAuE5Qu8r379+vvLw8RUR82vkrV64o\nMzNT06ZNc3Q4AADQUFCvuK9evVofbUmKiopiVzkAAGEQ1CvuwYMH6+mnn9Z3vvMdSdK+ffs0ePBg\nRwcDAAA3ajHcx48fV3Z2tv7yl7/o0KFD8ng8+va3v60nnngiFPMBAIDPaXZXeVFRkSZMmKCLFy/q\nBz/4gbKzs/Xoo49q8+bNO
nz4cKhmBAAA/9NsuHNycrR27VrFxn52LNWBAwcqNzdXL7/8suPDAQCA\nhpoNdyAQaPQv0gwYMECXLl1ybCgAANC4ZsNdXV3d5HkVFRVf+TAAAKB5zYZ7wIAB2rx58w2nv/rq\nqxo6dKhjQwEAgMY1+67yuXPnasaMGdq6dasGDx6sq1ev6uDBg4qJieHIaQAAhEGz4Y6Pj9cf//hH\nFRUV6T//+Y8iIyM1evRo3X333aGaDwAAfE5QB2BJSUlRSkqK07MAAIAWBHXIUwAA4A6EGwAAQwg3\nAACGEG4AAAwh3AAAGEK4AQAwhHADAGAI4QYAwBDCDQCAIYQbAABDCDcAAIYQbgAADCHcAAAYQrgB\nADCEcAMAYAjhBgDAEMINAIAhhBsAAEMINwAAhhBuAAAMIdwAABhCuAEAMIRwAwBgCOEGAMAQwg0A\ngCGEGwAAQwg3AACGRDm9wNKlS3Xw4EHV1dXpqaeeUlpamtNLAgDQajka7v3796ukpER5eXmqqKjQ\n2LFjCTcAADfB0XAnJydr6NChkqSOHTuqpqZGgUBAHo/HyWUBAGi1HP0dt8fjUfv27SVJr7/+ulJT\nU4k2AAA3wfHfcUvSjh079MYbb2jNmjWhWO4rU1dXpw8/PBqWtfv27afIyMiwrA3AjnA9T/EcFT6O\nh3vv3r1atWqV1qxZo5iYmBa/Pj4+1umRgubz+bRvzv+pu9cb0nVPVVcrbv1aJSYm3nDe+fMxOhbS\naT4TFxfT6O3jxpmk8M3lxpkkW7cfMzXU3H0qHM9TzT1HfZ6bns9bE0fDXVVVpRdffFHr1q1TbGxw\nN2B5+cdOjvSF+P1V6u71qndM6O98fn9Vo9vC768K+SyfX9vKTNfOCwc3znRtbSu3HzPduHZz96lw\nPE81N5P0abTd9HzuVl/mhxtHw71t2zZVVFRo9uzZ9W9KW7p0qbp16+bksgAAtFqOhjs9PV3p6elO\nLgEAQJvCkdMAADCEcAMAYAjhBgDAEMINAIAhhBsAAEMINwAAhhBuAAAMIdwAABhCuAEAMIRwAwBg\nCOEGAMAQwg0AgCGEGwAAQwg3AACGEG4AAAwh3AAAGEK4AQAwhHADAGAI4QYAwBDCDQCAIYQbAABD\nCDcAAIYQbgAADCHcAAAYQrgBADCEcAMAYAjhBgDAEMINAIAhhBsAAEMINwAAhhBuAAAMIdwAABhC\nuAEAMIRwAwBgCOEGAMAQwg0AgCGEGwAAQwg3AACGEG4AAAwh3AAAGEK4AQAwhHADAGCI4+H2+XxK\nS0vTpk2bnF4KAIBWz9Fw19TUaNGiRUpJSXFyGQAA2gxHw92uXTutXr1aXbt2dXIZAADaDEfDHRER\noejoaCeXAACgTYkK9wCf5/P55PdXhXzdvn37KTIyMuTrAkBrVFdXx/O5g1wV7remTFV3rzeka56q\nrlbc+rVKTEy84bzz52N0LKTTfCYuLkbx8bE3nM5MDTU1kxS+udw4k2Tr9mOmhtx4n2puJp/P57rn\n89bEVeHu7vWqd0zjdwQn+f1VKi//uNHTw4WZgtPUTNfOCwc3znRtbSu3HzPduLbb7lMtzeS253O3\nauqHn+Y4Gu7i4mItXrxYpaWlioqKUmFhoXJyctSxY0cnlwUAoNVyNNyDBg3Shg0bnFwCAIA2hSOn\nAQBgCOEGAMAQwg0AgCGEGwAAQwg3AACGEG4AAAwh3AAAGEK4AQAwhHADAGAI4QYAwBDCDQCAIYQb\nAABDCDcAAIYQbgAADCHcAAAYQrgBADCEcAMAYAjhBgDAEMINAIAhhBsAAEMINwAAhhBuAAAMIdwA\nABhCuAEAMIRwAwBgCOEGAMAQwg0AgCGEGwAAQwg3AACGEG4AAAwh3AAAGEK4AQAwhHADAGAI4QYA\nwBDCDQCAIYQbAABDCDcAAIYQbgAADCHcAAAYQrgBADCEcAMAYAjhBgDAkCinF3jhhRd06NA
heTwe\nZWdna8iQIU4vCQBAq+VouN9++23997//VV5enkpKSrRgwQLl5eU5uSQAAK2ao7vKi4qKNHLkSElS\n//79deHCBV28eNHJJQEAaNUcDffZs2cVFxdX/3mnTp109uxZJ5cEAKBVc/x33J8XCASaPf9UdXWI\nJmm45u0tnB9qzBSclma69jWh5MaZrq1p7fZjps/WdNt9yo0zXVuzpblaA0+gpZrehJycHHXt2lXp\n6emSpJEjR6qgoEBer9epJQEAaNUc3VU+fPhwFRYWSpKKi4uVkJBAtAEAuAmO7iofNmyYBg0apIyM\nDEVGRmrhwoVOLgcAQKvn6K5yAADw1eLIaQAAGEK4AQAwhHADAGCIa8K9dOlSZWRk6Ic//KG2b98e\n7nFc7dKlS0pLS1N+fn64R3GtgoICPfLIIxo3bpz27NkT7nFcq7q6WrNmzdLkyZM1YcIE/eMf/wj3\nSK7j8/mUlpamTZs2SZJOnz6trKwsZWZmas6cObp8+XKYJ3SH67fTqVOn9NhjjykrK0tTp07VuXPn\nwjyhO1y/na7Zu3evkpKSgroOV4R7//79KikpUV5enl599VX95je/CfdIrrZixQrdeuut4R7DtSoq\nKvTKK68oLy9PK1eu1M6dO8M9kmtt2bJF/fr10+9//3stX75cv/71r8M9kqvU1NRo0aJFSklJqT9t\n+fLlysrK0saNG9W7d2/9+c9/DuOE7tDUdsrIyNCGDRt0//33a+3atWGc0B0a206SVFtbq1WrVqlr\n165BXY8rwp2cnKzly5dLkjp27KiampoWj7LWVh09elRHjx5VampquEdxrX379mn48OG65ZZb1KVL\nFz3//PPhHsm1OnXqpPPnz0uSKisrGxyiGFK7du20evXqBk+oBw4c0H333SdJuu+++7Rv375wjeca\njW2nX/ziF3rggQckSXFxcaqsrAzXeK7R2HaSpNzcXGVmZuprX/taUNfjinB7PB61b99ekvT6668r\nNTVVHo8nzFO505IlSzRv3rxwj+FqJ0+eVE1NjaZPn67MzEwVFRWFeyTXeuihh1RaWqoHHnhAWVlZ\n+tnPfhbukVwlIiJC0dHRDU6rqampf4Lt3LmzysvLwzGaqzS2ndq3by+Px6OrV6/qD3/4gx5++OEw\nTecejW2nY8eO6d///rdGjRoV9AvWkB6rvCU7duzQG2+8oTVr1oR7FFfKz8/XsGHD1LNnT0ktH/u9\nrQoEAqqoqNCKFSt04sQJTZ48Wbt37w73WK5UUFCgHj16aPXq1Tpy5IgWLFjArt8vgMdg865evapn\nn31W99xzj+65555wj+NKixcv1nPPPfeFLuOacO/du1erVq3SmjVrFBMTE+5xXGnPnj06ceKEdu/e\nrdOnT6tdu3bq1q3bDb8vaeu6dOmiYcOGyePxqFevXurQoYP8fj+7gRtx8OBBfe9735MkJSUl6cyZ\nMwoEAuzxakaHDh1UW1ur6OholZWVBf17ybZo/vz5uv322zVjxoxwj+JKZWVlOnbsmJ599lkFAgGV\nl5crKytLGzZsaPZyrgh3VVWVXnzxRa1bt06xsbHhHse1li1bVv9xTk6ObrvtNqLdiOHDhys7O1tP\nPvmkKioqVF1dTbSb0KdPH7333ntKS0vTyZMn1aFDB6LdgpSUFBUWFmrMmDEqLCys/8EHDRUUFCg6\nOlozZ84M9yiulZCQoL/97W/1n48YMaLFaEsuCfe2bdtUUVGh2bNn1/+0v3TpUnXr1i3co8GghIQE\njRo1Sunp6fJ4PBwjvxk/+tGPlJ2draysLNXV1fFGvusUFxdr8eLFKi0tVVRUlAoLC/XSSy9p3rx5\neu2119SjRw+NHTs23GOGXWPbye/3Kzo6WllZWfJ4PPrGN77R5h+LjW2nnJwcdezYUZKC/qGZY5UD\nAGCIK95VDgAAgkO4AQAwhHADAGAI4QYAwBDCDQCAIYQ
bAABDCDdgXGZmpnbt2tXgtEuXLik5OVll\nZWWNXiYrK4tjuANGEW7AuPHjx2vLli0NTtu+fbvuvPNOJSQkhGkqAE4h3IBxDz74oN55550GfzYx\nPz9f48eP144dO5SRkaEpU6YoMzNTpaWlDS574MABTZw4sf7z+fPn609/+pOkT49oOGnSJE2aNEmz\nZs1SZWWl6urqNH/+fGVkZGjChAn61a9+FZpvEkA9wg0Y1759e6Wlpemtt96SJJ05c0ZHjhzRiBEj\ndOHCBb388stav369vv/972vjxo03XL6xwyyePn1aK1eu1Lp167Rp0ybdfffdys3Nlc/n06FDh5SX\nl6fNmzcrKSlJVVVVjn+PAD7jimOVA7g548aN0/PPP69JkybpzTff1JgxYxQVFaXOnTtr7ty5CgQC\nOnv2rO68886gru/dd99VeXm5Hn/8cQUCAV2+fFm9evVS//79FRcXp2nTpunee+/V6NGj+Wt+QIgR\nbqAVuOOOO1RbW6uSkhJt3bpVy5Yt05UrVzRnzhxt3bpVvXr10qZNm3T48OEGl7v+1XZtba0kKTo6\nWnfccYdyc3NvWGvjxo364IMPtGvXLo0fP155eXnq0qWLc98cgAbYVQ60EuPHj9eKFSvk9XrVv39/\nXbx4UZGRkerRo4cuXbqknTt31of5mpiYmPp3ntfU1Oj999+XJA0ZMkT/+te/dPbsWUnSX//6V+3a\ntUuHDx9Wfn6+vvnNb2rGjBkaNGiQPvzww5B+n0BbxytuoJUYM2aMXnrppfo/nfj1r39dDz/8sMaN\nG6eePXvqiSee0Ny5c1VYWFj/SjspKUkDBw7Uo48+qt69e+uuu+6SJHXt2lULFizQtGnT5PV61b59\ney1ZskRRUVHKycnRa6+9pujoaPXp06f+MgBCgz/rCQCAIewqBwDAEMINAIAhhBsAAEMINwAAhhBu\nAAAMIdwAABhCuAEAMIRwAwBgyP8Dd7bJ8T4AxeYAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "twice.plot(color=COLORS[2])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or three dice:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Hist({3: 1,\n", " 4: 3,\n", " 5: 6,\n", " 6: 10,\n", " 7: 15,\n", " 8: 21,\n", " 9: 25,\n", " 10: 27,\n", " 11: 27,\n", " 12: 25,\n", " 13: 21,\n", " 14: 15,\n", " 15: 10,\n", " 16: 6,\n", " 17: 3,\n", " 18: 1})" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "thrice = twice + d6\n", "thrice" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that this is looking more and more like a bell curve:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAfMAAAFmCAYAAAB5pHO7AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHoJJREFUeJzt3X1UlHX+//HXAOLNgIvoaNGaJpns0W50N45UJmpWtllp\n6kGF7uxkZTeaySKuHtd1vaFa48QakXosdUXNMttjkWbbugcPutmdnGz2YGxuJI4iKjeJjvP7o198\nU1mcobnm4jM+H+d0jo7N53p/kJknc4HXOHw+n08AAMBYEXYPAAAAfh5iDgCA4Yg5AACGI+YAABiO\nmAMAYDhiDgCA4aKsXPz7779XVlaWjhw5ooaGBj322GNKSkrSjBkz5PP55HK5lJOTozZt2lg5BgAA\nYc1h5b8z37Jli7777jtNmjRJFRUVevDBBzVgwAClpqbqtttu05IlS3TppZcqLS3NqhEAAAh7lp5m\nv+OOOzRp0iRJUkVFhS699FLt3r1bQ4cOlSQNGTJExcXFVo4AAEDYs/Q0+4/S0tJ06NAhvfzyy3ro\noYcaT6t37txZHo8nFCMAABC2QhLzwsJC7du3T88++6x+elafK8kCAPDzWXqavbS0VAcPHpQkJSUl\n6cyZM3I6nWpoaJAkVVZWqmvXrs2uQfABAGiepa/Md+/erYqKCmVnZ+vw4cOqq6vToEGD9N577+mu\nu+5SUVGRBg0a1OwaDodDHs8JK8cMCZcrln20EuGwByk89hEOe5DYR2sSDnuQfthHICyN+fjx45Wd\nna2JEyfq5MmTmjt3rvr27avMzEytX79eCQkJGjVqlJUjAAAQ9iyNedu2bfXCCy+cd/uKFSusPCwA\nABcVrgAHAIDhiDkAAIYj5gAAGI6YAwBgOGIOAIDhiDkAAIYj5gAAGI6YAwBgOGIOAIDhiDkAAIYj\n5gAAGI6YAwBgOGIOAIDhiDkAAIYj5gAAGI6YAwBgOGIOAIDhiDkAAIYj5gAAGI6YAwBgOGIOAIDh\niDkAAIYj5gAAGI6YAwBgOGIOAIDhiDkAAIaLsnsAAIHzer1yu92qqqoJ2po9e/ZSZGTkWccoL99v\n2foAgoeYAwYqL9+vzM1z5HTFBmW9Ws8J5dw1T4mJvS05RlPrAwgeYg4YyumKVWxCnPHHAPDz8T1z\nAAAMR8wBADAcMQcAwHDEHAAAwxFzAAAMR8wBADAcMQcAwHDEHAAAwxFzAAAMR8wBADAcMQcAwHDE\nHAAAwxFzAAAMZ/m7puXk5GjPnj3yer165JFHtH37du3du1edOnWSJE2aNEmDBw+2egwAAMKWpTEv\nKSlRWVmZCgsLVV1drVGjRmngwIF69tlnCTgAAEFiacyTk5N17bXXSpI6duyouro6nTlzRj6fz8rD\nAgBwUbH0e+YOh0Pt2rWTJG3YsEGpqamKiIjQ6tWrdf/992v69Omqrq62cgQAAMKe5d8zl6Rt27bp\nzTff1PLly7V3717FxcUpKSlJBQUFeumllzR79uxQjAGEhNfrVXn5/qCu2bNnL0VGRgZ1Tbt5vV65\n3W5VVdUEbc1w/DgB/rA85jt27FBBQYGWL1+umJgYDRw4sPHPhg0bprlz515wDZcr1sIJQ4d9tB5W\n7sHtditz8xw5g3SMWs8JLXvgz7rqqqsabzt6NCYoa/9UfHzMWR+XYB/j3PXdbrceXvmMpR+nUAqH\nx4UUHvsIhz0EytKY19TU6LnnntPKlSsVG/vDB/epp57SjBkz1L17d5WUlPj1wPN4Tlg5Zki4XLHs\no5Wweg9VVTVyumIVmxAX1DV/OnMwX82G6hhNrW/1xylUwuFxIYXHPsJhD1LgX5BYGvMtW7aourpa\nU6dOlc/nk8Ph0OjRozVt2jS1b99eTqdTCxYssHIEAADCnqUxHzdunMaNG3fe7ffcc4+VhwUA4KLC\nFeAAADAcMQcAwHDEHAAAwxFzAAAMR8wBADAcMQcAwHDEHAAAw
xFzAAAMR8wBADAcMQcAwHDEHAAA\nwxFzAAAMR8wBADAcMQcAwHDEHAAAwxFzAAAMR8wBADAcMQcAwHDEHAAAwxFzAAAMR8wBADAcMQcA\nwHDEHAAAwxFzAAAMR8wBADAcMQcAwHDEHAAAwxFzAAAMR8wBADAcMQcAwHDEHAAAwxFzAAAMR8wB\nADAcMQcAwHDEHAAAwxFzAAAMF2X3AEAoeb1eud1uVVXVBG3Nnj17KTIyMmjrIXi8Xq/Ky/cHdU3+\nvtEaEXNcVMrL9ytz8xw5XbFBWa/Wc0I5d81TYmLvoKyH4OLvGxcLYo6LjtMVq9iEOLvHQIjw942L\nAd8zBwDAcMQcAADDEXMAAAxHzAEAMJzlPwCXk5OjPXv2yOv16pFHHtHVV1+tGTNmyOfzyeVyKScn\nR23atLF6DAAAwpalMS8pKVFZWZkKCwtVXV2tUaNGaeDAgUpPT9dtt92mJUuWaOPGjUpLS7NyDAAA\nwpqlp9mTk5OVm5srSerYsaPq6uq0e/duDR06VJI0ZMgQFRcXWzkCAABhz9KYOxwOtWvXTpL0xhtv\nKDU1VfX19Y2n1Tt37iyPx2PlCAAAhL2QXDRm27Zt2rhxo5YvX65bb7218Xafz+fX/V1BunqT3diH\n/Y4ejQn6mvHxMWd9TDiGPevbdYwfmfy4+Klw2Ec47CFQlsd8x44dKigo0PLlyxUTEyOn06mGhgZF\nR0ersrJSXbt2veAaHs8Jq8e0nMsVyz5agWBek/2na/70Y8Ix7FnfrmNI5j8ufhQO+wiHPUiBf0Fi\n6Wn2mpoaPffcc8rPz1ds7A+DpaSkqKioSJJUVFSkQYMGWTkCAABhz9JX5lu2bFF1dbWmTp0qn88n\nh8OhxYsXa9asWVq3bp0SEhI0atQoK0cAACDsWRrzcePGady4cefdvmLFCisPCwDARYUrwAEAYDhi\nDgCA4Yg5AACGI+YAABiOmAMAYDhiDgCA4Yg5AACGI+YAABiOmAMAYDhiDgCA4Yg5AACGI+YAABiO\nmAMAYDhiDgCA4Yg5AACGI+YAABiOmAMAYDhiDgCA4Yg5AACGI+YAABiOmAMAYDhiDgCA4Yg5AACG\nI+YAABiOmAMAYDhiDgCA4Yg5AACGI+YAABiOmAMAYDhiDgCA4Yg5AACG8yvmp06d0sGDByVJ+/bt\n06ZNm1RfX2/pYAAAwD9+xTwrK0uffvqpKisr9eSTT8rtdisrK8vq2QAAgB/8inllZaVuv/12bdmy\nRRMmTFBmZqaOHTtm9WwAAMAPfsW8oaFBPp9PW7duVWpqqiSptrbWyrkAAICf/Ip5cnKyfv3rX8vl\ncumKK67QypUr1atXL6tnAwAAfojy538aNWqUHnnkEXXs2FGSNGzYMPXr18/SwQAAgH+afWV+/Phx\nffPNN8rOztaxY8d04MABHThwQKdOndKsWbNCNSMAAGhGs6/MP/nkE7322mv68ssvdf/99zfeHhER\noZtuusny4QAAwIU1G/PBgwdr8ODBWrt2rcaPHx+qmQAAQAD8+p75Lbfcotdee03Hjh2Tz+drvP3p\np5+2bDAAAOAfv36affLkydq3b58iIiIUGRnZ+B8AALCfX6/MO3TooIULF7boAG63W1OmTNEDDzyg\niRMnaubMmdq7d686deokSZo0aZIGDx7corUBAICfMb/22mtVVlamxMTEgBavr6/X/PnzlZKSctbt\nzz77LAEHACBI/Ir5jh07tHLlSnXq1ElRUVHy+XxyOBz6+9//3uz92rZtq2XLlqmgoCAYswIAgCb4\nFfOXX365RYtHREQoOjr6vNtXr16tFStWqEuXLpo9e7bi4uJatD7Ci9frVXn5/qCu2bNnL36+A5by\ner1yu92qqqoJ2pp83iJQfsV8586dTd4+ZsyYgA949913Ky4uTklJSSooKNBLL72k2bNnN3sflys2\n4OO0RuyjeW63W5mb58gZp
PVrPSe07IE/66qrrmq87ejRmKCs/VPx8TFnfUw4hj3r23UMt9uth1c+\nY+nnbSiFw/NUOOwhUH7F/OOPP278dUNDgz7//HMNGDCgRTEfOHBg46+HDRumuXPnXvA+Hs+JgI/T\n2rhcsezjAqqqauR0xSo2IXhnaqqqas6aN5ivnjhG61rfzmNY/XkbKuHwPBUOe5AC/4LEr5if+5Ps\n9fX1mjlzZkAH+tFTTz2lGTNmqHv37iopKbHtq08AAMKFXzE/V/v27fXNN99c8P8rLS3VokWLVFFR\noaioKBUVFSkjI0PTpk1T+/bt5XQ6tWDBgpaMAAAA/j+/Yj5hwgQ5HI7G31dWVqpPnz4XvF/fvn21\natWq824fPnx4ACMCAIDm+BXzqVOnNv7a4XAoJiZGSUlJlg0FAAD859flXJOTkxUREaHS0lKVlpbq\n+++/P+uVOgAAsI9fMc/NzVVOTo4OHTqkyspKzZ8/X6+88orVswEAAD/4dZq9pKREhYWFioj4of2n\nT59Wenq6Jk+ebOlwAADgwvx6ZX7mzJnGkEtSVFQUp9kBAGgl/Hpl3q9fPz366KO64YYbJEnFxcXq\n16+fpYMBAAD/XDDmBw4cUHZ2tt5991199tlncjgc+s1vfqOHH344FPMBAIALaPY0+86dOzV+/HjV\n1tbqt7/9rbKzszV69GitXbtWe/fuDdWMAACgGc3GPC8vTytWrFBs7P9dI7ZPnz7Kz8/Xiy++aPlw\nAADgwpqNuc/na/La6b1799bJkyctGwoAAPiv2ZjX1dX9zz+rrq4O+jAAACBwzca8d+/eWrt27Xm3\nv/rqq7r22mstGwoAAPiv2Z9mz8zM1JQpU/T222+rX79+OnPmjPbs2aOYmBiuAAcAQCvRbMxdLpfW\nr1+vnTt36t///rciIyM1YsQIXX/99aGaDwAAXIBfF41JSUlRSkqK1bMAAIAW8OtyrgAAoPUi5gAA\nGI6YAwBgOGIOAIDhiDkAAIYj5gAAGI6YAwBgOGIOAIDhiDkAAIYj5gAAGI6YAwBgOGIOAIDhiDkA\nAIYj5gAAGI6YAwBgOGIOAIDhiDkAAIYj5gAAGI6YAwBgOGIOAIDhiDkAAIYj5gAAGI6YAwBgOGIO\nAIDhiDkAAIYj5gAAGI6YAwBgOMtj7na7NXz4cK1Zs0aSdPDgQWVkZCg9PV3Tpk3TqVOnrB4BAICw\nZmnM6+vrNX/+fKWkpDTelpubq4yMDK1evVqXX365Nm7caOUIAACEPUtj3rZtWy1btkxdu3ZtvG3X\nrl0aMmSIJGnIkCEqLi62cgQAAMKepTGPiIhQdHT0WbfV19erTZs2kqTOnTvL4/FYOQIAAGEvys6D\n+3w+Ow+PAHm9XrndblVV1QRlvZ49eykyMjIoawHhzOv1qrx8f1DX5PEXXkIec6fTqYaGBkVHR6uy\nsvKsU/D/i8sVG4LJrGf6Ptxutx5e+YycQdhHreeElj3wZ1111VWNtx09GvOz1z1XfHzMWR93jmHf\nMcJhD3Ydw+12K3PznKA89qSmH38/Mv15SgqPPQQq5DFPSUlRUVGRRo4cqaKiIg0aNOiC9/F4ToRg\nMmu5XLHG76OqqkZOV6xiE+KCtt5PPybBesXPMVrnMcJhD3YeI5iPvaaOIYXH81Q47EEK/AsSS2Ne\nWlqqRYsWqaKiQlFRUSoqKtLzzz+vrKwsrVu3TgkJCRo1apSVIwAAEPYsjXnfvn21atWq825fsWKF\nlYcFAOCiwhXgAAAwHDEHAMBwxBwAAMMRcwAADEfMAQAwHDEHAMBwxBwAAMMRcwAADEfMAQAwHDEH\nAMBwxBwAAMMRcwAADEfMAQAwHDEHAMBwxBwAAMMRcwAADEfMAQAwHDEHAMBwxBwAAMMRcwAADEfM\nAQAwHDEHAMBwxBwAAMMRcwAADEfMAQAwHDEHAMBwxBwAAMMRcwAADEfMAQAwHDEHAMBwxBw
AAMMR\ncwAADEfMAQAwHDEHAMBwxBwAAMNF2T0AgsPr9aq8fH9Q1+zZs5ciIyODuiaA1snr9crtdquqqiYo\n6/H8EVrEPEyUl+9X5uY5crpig7JereeEcu6ap8TE3kFZD0DrFsznEJ4/Qo+YhxGnK1axCXF2jwHA\nUDyHmIvvmQMAYDhiDgCA4Yg5AACGI+YAABgu5D8At2vXLj399NPq3bu3fD6f+vTpo9///vehHgMA\ngLBhy0+zJycnKzc3145DAwAQdmw5ze7z+ew4LAAAYcmWmJeVlenxxx/XxIkTVVxcbMcIAACEjZCf\nZu/Ro4eeeOIJjRgxQgcOHNB9992nrVu3KiqK69cAANASIS9ot27dNGLECElS9+7d1aVLF1VWVuqy\nyy77n/dxBekSpXazch9Hj8YEfc34+JizZg72Maxen2O0rmOEwx44RsvXD6VwaUYgQh7zd955Rx6P\nRw899JA8Ho+OHDmibt26NXsfj+dEiKazjssVa+k+gvXmCOeu+dOZg30Mq9fnGK3rGOGwB47R8vVD\nxern2lAJ9AuSkMd86NChmj59uj744AOdPn1af/jDHzjFDgDAzxDyijqdTuXn54f6sAAAhC2uAAcA\ngOGIOQAAhiPmAAAYjpgDAGA4Yg4AgOGIOQAAhiPmAAAYjpgDAGA4Yg4AgOGIOQAAhiPmAAAYjpgD\nAGA4Yg4AgOGIOQAAhiPmAAAYLuTvZ34x8nq9crvdqqqqCdqaPXv2UmRkZNDWAwAreb1elZfvD+qa\nPA/+H2IeAuXl+5W5eY6crtigrFfrOaGcu+YpMbF3UNYDAKvxPGgtYh4iTlesYhPi7B4DAGzD86B1\n+J45AACGI+YAABiOmAMAYDhiDgCA4Yg5AACGI+YAABiOmAMAYDhiDgCA4Yg5AACGI+YAABiOmAMA\nYDhiDgCA4Yg5AACGI+YAABiOmAMAYDhiDgCA4Yg5AACGI+YAABguyu4B7Ob1elVevj+oa/bs2UuR\nkZFBXRMA0Dyv1yu3262qqpqgrWnK8/lFH/Py8v3K3DxHTldsUNar9ZxQzl3zlJjYOyjrAQD8czE/\nn1/0MZckpytWsQlxdo8BAPiZLtbnc75nDgCA4Yg5AACGI+YAABjOlu+ZL1y4UJ999pkcDoeys7N1\n9dVX2zEGAABhIeQx3717t/7zn/+osLBQZWVlmjVrlgoLC0M9BgAAYSPkp9l37typW265RZKUmJio\n48ePq7a2NtRjAAAQNkIe88OHDys+Pr7x9506ddLhw4dDPQYAAGHD9n9n7vP57B5BtZ4Tlq/FMUK7\nPsdoXccIhz1wjNazvt3HaI0cvhDXNC8vT127dtW4ceMkSbfccos2b96sDh06hHIMAADCRshPs994\n440qKiqSJJWWlqpbt26EHACAnyHkp9n79++vvn37Ki0tTZGRkZozZ06oRwAAIKyE/DQ7AAAILq4A\nBwCA4Yg5AACGI+YAABiuVcc8JydHaWlpGjt2rLZu3Wr3OC128uRJDR8+XJs2bbJ7lBbbvHmz7r77\nbt1777366KOP7B6nRerq6vTkk0/qvvvu0/jx4/XPf/7T7pEC4na7NXz4cK1Zs0aSdPDgQWVkZCg9\nPV3Tpk3TqVOnbJ7QP+fu47vvvtODDz6ojIwMPfTQQzpy5IjNE17YuXv40Y4dO5SUlGTTVIE7dx+n\nT5/W9OnTNXbsWD344IM6caL1/zvrc/ewe/duTZgwQffdd58effRRI/Ygnd+7QB/frTbmJSUlKisr\nU2FhoV599VUtWLDA7pFabOnSpYqLi7N7jBarrq7WX/7yFxUWFuqVV17RBx98YPdILfLWW2+pV69e\nev3115Wbm6s//elPdo/kt/r6es2fP18pKSmNt+Xm5iojI0OrV6/W5Zdfro0bN9o4oX/+1z7S0tK0\natUqDRs2TCtWrLBxwgtrag+S1NDQoIKCAnXt2tWmyQL
T1D7Wr1+vzp07a8OGDbrjjjv0r3/9y8YJ\nL6ypPSxatEgLFy7U66+/rv79+xvx3h9N9S43N1fp6el+P75bbcyTk5OVm5srSerYsaPq6+tbxdXi\nArV//37t379fgwcPtnuUFisuLtaNN96o9u3bq0uXLpo3b57dI7VIp06ddPToUUnSsWPHzrqscGvX\ntm1bLVu27KxQ7Nq1S0OGDJEkDRkyRMXFxXaN57em9jF37lzdeuutkqT4+HgdO3bMrvH80tQeJCk/\nP1/p6elq06aNTZMFpql9fPjhhxo5cqQkaezYsY2fX61VU3uIj49XVVWVpB8e5506dbJrPL+d27u6\nujrt3r1bQ4cOleTf47vVxtzhcKhdu3aSpA0bNmjw4MFyOBw2TxW4xYsXKysry+4xfpZvv/1W9fX1\neuyxx5Senq6dO3faPVKL3HHHHaqoqNCtt96qjIwM/e53v7N7JL9FREQoOjr6rNvq6+sbw9G5c2d5\nPB47RgtIU/to166dHA6Hzpw5o7/+9a+68847bZrOP03t4euvv9ZXX32l2267zZgXHU3t49tvv9VH\nH32kjIwMTZ8+XcePH7dpOv80tYesrCxNmTJFI0aM0J49ezR69GibpvPfT3v3xhtvKDU1NeDHd6uN\n+Y+2bdumN998U7Nnz7Z7lIBt2rRJ/fv312WXXSapdVyHviV8Pp+qq6u1dOlSLVy4UNnZ2XaP1CKb\nN29WQkKC3n//fa1cuVJ/+MMf7B4paEz93PrRmTNnNGPGDA0cOFADBw60e5yALVq0yPgv2qUfPo8S\nExO1atUqXXnllcrPz7d7pIDNnz9fS5cu1bvvvqsBAwac93MNrdm2bdu0ceNGzZ49+6zHtD+Pb9vf\naKU5O3bsUEFBgZYvX66YmBi7xwnYRx99pP/+97/68MMPdfDgQbVt21aXXHLJed9ra+26dOmi/v37\ny+FwqHv37nI6naqqqjLqNLUk7dmzR4MGDZIkJSUl6dChQ/L5fEae8ZEkp9OphoYGRUdHq7Ky0pjv\n1TZl5syZuuKKKzRlyhS7RwlYZWWlvv76a82YMUM+n08ej0cZGRlatWqV3aMFrEuXLrr++uslSTfd\ndJPy8vJsnihwX331la677jpJ0g033KC//e1vNk/kn3N7F+jju9W+Mq+pqdFzzz2n/Px8xcbG2j1O\niyxZskQbNmzQunXrNHbsWD3++OPGhVz64Xr6JSUl8vl8Onr0qOrq6owLuST16NFDn376qaQfTic6\nnU5jQy5JKSkpje9zUFRU1PiFimk2b96s6OhoPfHEE3aP0iLdunXT+++/r8LCQq1bt04ul8vIkEvS\nzTffrH/84x+SfnjvjCuuuMLmiQLncrlUVlYmSfriiy/Uo0cPmye6sKZ6F+jju9VeznX9+vXKy8tT\nz549G1895eTk6JJLLrF7tBbJy8vTL3/5S91zzz12j9Ii69ev14YNG+RwOPT4448rNTXV7pECVldX\np+zsbB05ckRer1dTp05VcnKy3WP5pbS0VIsWLVJFRYWioqLUrVs3Pf/888rKylJDQ4MSEhK0cOFC\nRUZG2j1qs5raR1VVlaKjoxu/uLryyitb9Xs2NLWHvLw8dezYUZI0bNgwI/7FR1P7eOGFFzR//nx5\nPB45nU4tXry4VX/h3tQennnmGS1evFht2rRRXFycFixY0OrP7DbVu8WLF2vWrFl+P75bbcwBAIB/\nWu1pdgAA4B9iDgCA4Yg5AACGI+YAABiOmAMAYDhiDgCA4Yg5EKbS09O1ffv2s247efKkkpOTVVlZ\n2eR9MjIyjL32PnAxI+ZAmBozZozeeuuts27bunWrrrvuOnXr1s2mqQBYgZgDYer222/Xxx9/fNZb\nim7atEljxozRtm3blJaWpvvvv1/p6emqqKg46767du3ShAkTGn8/c+ZMvfHGG5KkLVu2aOLEiZo4\ncaKefPJJHTt2TF6
vVzNnzlRaWprGjx+vP/7xj6HZJABJxBwIW+3atdPw4cMb32ji0KFD2rdvn4YO\nHarjx4/rxRdf1Guvvaabb75Zq1evPu/+TV23/uDBg3rllVe0cuVKrVmzRtdff73y8/Pldrv12Wef\nqbCwUGvXrlVSUpJqamos3yOAH7Tqd00D8PPce++9mjdvniZOnKh33nlHI0eOVFRUlDp37qzMzEz5\nfD4dPny48V2mLuSTTz6Rx+PRpEmT5PP5dOrUKXXv3l2JiYmKj4/X5MmTlZqaqhEjRrT662ED4YSY\nA2HsmmuuUUNDg8rKyvT2229ryZIlOn36tKZNm6a3335b3bt315o1a7R3796z7nfuq/KGhgZJUnR0\ntK655pom3+d69erV+vLLL7V9+3aNGTNGhYWF6tKli3WbA9CI0+xAmBszZoyWLl2qDh06KDExUbW1\ntYqMjFRCQoJOnjypDz74oDHWP4qJiWn8iff6+np9/vnnkqSrr75aX3zxhQ4fPixJeu+997R9+3bt\n3btXmzZt0q9+9StNmTJFffv2VXl5eUj3CVzMeGUOhLmRI0fq+eefb3xb0V/84he68847de+99+qy\nyy7Tww8/rMzMTBUVFTW+Ik9KSlKfPn00evRoXX755RowYIAkqWvXrpo1a5YmT56sDh06qF27dlq8\neLGioqKUl5endevWKTo6Wj169Gi8DwDr8RaoAAAYjtPsAAAYjpgDAGA4Yg4AgOGIOQAAhiPmAAAY\njpgDAGA4Yg4AgOGIOQAAhvt/0Qpow5o5T9wAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "thrice.plot(color=COLORS[1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As the number of dice increases, the result converges to a normal distribution, also known as a Gaussian distribution." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Are first babies more likely to be late?\n", "----------------------------------------\n", "\n", "This is one of the first topics I wrote about in my blog, and still the most popular, with more than 100,000 page views:\n", "\n", "http://allendowney.blogspot.com/2011/02/are-first-babies-more-likely-to-be-late.html\n", "\n", "I used data from the National Survey of Family Growth (NSFG):\n", "\n" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>caseid</th><th>pregordr</th><th>howpreg_n</th><th>howpreg_p</th><th>moscurrp</th><th>nowprgdk</th><th>pregend1</th><th>pregend2</th><th>nbrnaliv</th><th>multbrth</th><th>...</th><th>poverty_i</th><th>laborfor_i</th><th>religion_i</th><th>metro_i</th><th>basewgt</th><th>adj_mod_basewgt</th><th>finalwgt</th><th>secu_p</th><th>sest</th><th>cmintvw</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>1</td><td>1</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>6.0</td><td>NaN</td><td>1.0</td><td>NaN</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>3410.389399</td><td>3869.349602</td><td>6448.271112</td><td>2</td><td>9</td><td>1231</td></tr>\n",
"    <tr><th>1</th><td>1</td><td>2</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>6.0</td><td>NaN</td><td>1.0</td><td>NaN</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>3410.389399</td><td>3869.349602</td><td>6448.271112</td><td>2</td><td>9</td><td>1231</td></tr>\n",
"    <tr><th>2</th><td>2</td><td>1</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>5.0</td><td>NaN</td><td>3.0</td><td>5.0</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>7226.301740</td><td>8567.549110</td><td>12999.542264</td><td>2</td><td>12</td><td>1231</td></tr>\n",
"    <tr><th>3</th><td>2</td><td>2</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>6.0</td><td>NaN</td><td>1.0</td><td>NaN</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>7226.301740</td><td>8567.549110</td><td>12999.542264</td><td>2</td><td>12</td><td>1231</td></tr>\n",
"    <tr><th>4</th><td>2</td><td>3</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>6.0</td><td>NaN</td><td>1.0</td><td>NaN</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>7226.301740</td><td>8567.549110</td><td>12999.542264</td><td>2</td><td>12</td><td>1231</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>5 rows × 243 columns</p>\n",
"</div>" ], "text/plain": [ " caseid pregordr howpreg_n howpreg_p moscurrp nowprgdk pregend1 \\\n", "0 1 1 NaN NaN NaN NaN 6.0 \n", "1 1 2 NaN NaN NaN NaN 6.0 \n", "2 2 1 NaN NaN NaN NaN 5.0 \n", "3 2 2 NaN NaN NaN NaN 6.0 \n", "4 2 3 NaN NaN NaN NaN 6.0 \n", "\n", " pregend2 nbrnaliv multbrth ... poverty_i laborfor_i religion_i \\\n", "0 NaN 1.0 NaN ... 0 0 0 \n", "1 NaN 1.0 NaN ... 0 0 0 \n", "2 NaN 3.0 5.0 ... 0 0 0 \n", "3 NaN 1.0 NaN ... 0 0 0 \n", "4 NaN 1.0 NaN ... 0 0 0 \n", "\n", " metro_i basewgt adj_mod_basewgt finalwgt secu_p sest cmintvw \n", "0 0 3410.389399 3869.349602 6448.271112 2 9 1231 \n", "1 0 3410.389399 3869.349602 6448.271112 2 9 1231 \n", "2 0 7226.301740 8567.549110 12999.542264 2 12 1231 \n", "3 0 7226.301740 8567.549110 12999.542264 2 12 1231 \n", "4 0 7226.301740 8567.549110 12999.542264 2 12 1231 \n", "\n", "[5 rows x 243 columns]" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import thinkstats2\n", "\n", "dct_file = '2002FemPreg.dct'\n", "dat_file = '2002FemPreg.dat.gz'\n", "\n", "dct = thinkstats2.ReadStataDct(dct_file)\n", "preg = dct.ReadFixedWidth(dat_file, compression='gzip')\n", "\n", "preg.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The variable `outcome` encodes the outcome of the pregnancy. Outcome 1 is a live birth." ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1 9148\n", "2 1862\n", "3 120\n", "4 1921\n", "5 190\n", "6 352\n", "Name: outcome, dtype: int64" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "preg.outcome.value_counts().sort_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`pregordr` is the pregnancy order: 1 for the first pregnancy, 2 for the second, and so on."
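`value_counts().sort_index()` tallies how often each value occurs, ordered by value. Since `Hist` is built on `collections.Counter`, the same tally can be sketched with the standard library. The list below is made-up `pregordr` data for illustration, not values from the NSFG file:

```python
from collections import Counter

# Made-up pregnancy-order values for illustration (not NSFG data).
pregordr = [1, 2, 1, 3, 2, 1]

# Counter maps each value to its frequency; sorting the items orders
# them by value, similar to value_counts().sort_index() in pandas.
counts = Counter(pregordr)
for value, count in sorted(counts.items()):
    print(value, count)
```

This prints `1 3`, `2 2` and `3 1`, one pair per line.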
] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1 5033\n", "2 3766\n", "3 2334\n", "4 1224\n", "5 613\n", "6 308\n", "7 158\n", "8 78\n", "9 38\n", "10 17\n", "11 8\n", "12 5\n", "13 3\n", "14 3\n", "15 1\n", "16 1\n", "17 1\n", "18 1\n", "19 1\n", "Name: pregordr, dtype: int64" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "preg.pregordr.value_counts().sort_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I selected live births, then used `birthord` (birth order) to split them into first babies and others." ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(4413, 4735)" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "live = preg[preg.outcome == 1]\n", "firsts = live[live.birthord == 1]\n", "others = live[live.birthord != 1]\n", "len(firsts), len(others)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The mean pregnancy lengths are slightly different:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(38.60095173351461, 38.52291446673706)" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "firsts.prglngth.mean(), others.prglngth.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The difference is 0.078 weeks:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0.07803726677754952" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "diff = firsts.prglngth.mean() - others.prglngth.mean()\n", "diff" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That is about 13 hours. Note: the best units to report are often not the units you computed."
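The selection above relies on pandas boolean indexing. As a rough sketch of the same logic in plain Python, here is the filter, split, and difference-of-means pipeline run on a few made-up records; the field names mirror the NSFG columns, but every value is invented:

```python
from statistics import mean

# Made-up records for illustration only; field names mirror the NSFG
# columns used above, but the values are invented.
records = [
    {'outcome': 1, 'birthord': 1, 'prglngth': 40},
    {'outcome': 1, 'birthord': 2, 'prglngth': 39},
    {'outcome': 2, 'birthord': None, 'prglngth': 12},
    {'outcome': 1, 'birthord': 1, 'prglngth': 41},
    {'outcome': 1, 'birthord': 3, 'prglngth': 38},
]

# Keep live births (outcome 1), then split on birth order.
live = [r for r in records if r['outcome'] == 1]
firsts = [r for r in live if r['birthord'] == 1]
others = [r for r in live if r['birthord'] != 1]

# Difference in mean pregnancy length, in weeks and then in hours.
diff_weeks = mean(r['prglngth'] for r in firsts) - mean(r['prglngth'] for r in others)
diff_hours = diff_weeks * 7 * 24  # 7 days per week, 24 hours per day

print(len(firsts), len(others))  # 2 2
print(diff_weeks)                # 2.0
print(diff_hours)                # 336.0
```

With real data the pandas version is shorter and much faster; the sketch only spells out what each step does.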
] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "13.11026081862832" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "diff * 7 * 24" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's see if we can visualize the difference in the histograms:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false }, "outputs": [], "source": [ "first_hist = Hist(firsts.prglngth)\n", "other_hist = Hist(others.prglngth)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I used some plotting options to put two bar charts side-by-side:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def plot_distributions(dist1, dist2):\n", " dist1.plot(width=-0.45, align='edge', color=COLORS[3], label='firsts')\n", " dist2.plot(width=0.45, align='edge', color=COLORS[4], label='others')\n", " plt.xlim(33.5, 43.5)\n", " plt.legend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are the two histograms:" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false }, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAfkAAAFmCAYAAABuhuNyAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XtcVXW+//H3vgAmKLoRUNIsHdNJo6DJax4vGZmVHh3s\nmEnN0SanDMvjJRRnpul4RFEj58eQXfSRt5FK5ph1HJmKZmqMURPGkl+OHYysYcC9FVEUxQ37/NGE\neQlJ92LD19fz8ejxwMXa3/3Zn5a+97p9l83n8/kEAACMYw90AQAAwBqEPAAAhiLkAQAwFCEPAICh\nCHkAAAxFyAMAYCinlYOfPHlSKSkpOnTokGpqavToo4+qV69emj17tnw+nyIjI5Wenq6goCBt3rxZ\na9askcPh0Pjx45WYmCiv16uUlBSVlpbK4XAoLS1NnTt3trJkAACMYbPyPvktW7boH//4h6ZMmaLS\n0lL9+7//u+Lj4zV06FDdeeedysjIUKdOnTRmzBiNHTtWOTk5cjqdSkxM1Pr165WXl6dPPvlEP//5\nz7Vt2zZt3LhRGRkZVpULAIBRLD1cP2rUKE2ZMkWSVFpaqk6dOmnnzp0aPny4JGnYsGH68MMPtXv3\nbsXGxio0NFQhISGKj4/Xrl27lJ+frxEjRkiSBg4cqIKCAivLBQDAKJYerv/GhAkTdPDgQT3//POa\nPHmygoKCJEkRERE6ePCgDh06JJfLVb++y+WS2+2Wx+OpX26z2WS32+X1euV0NknZAAC0aE2SltnZ\n2dq7d69mzZqlb58d+K4zBd+1vK6uzpL6AAAwkaWH64uKilRWViZJ6tWrl+rq6hQaGqqamhpJUnl5\nuaKjoxUVFSW3213/um8v93g8kiSv1ytJF92L93prrfgoAAC0OJbuye/cuVOlpaWaN2+ePB6PTpw4\nocGDB2vr1q0aPXq0cnNzNXjwYMXGxmr+/PmqqqqSzWZTYWGhUlNTdezYMW3dulWDBg1SXl6e+vXr\nd9H3rKg4YclniYxsI7f7mCVj42v0uGnQZ+vRY+vR4zMiI9t85+8svbr+1KlTmjdvnsrKynTq1Ckl\nJyerd+/emjNnjmpqahQTE6O0tDQ5HA794Q9/0Msvvyy73a6kpCTdfffdqqurU2pqqr744guFhIRo\n0aJFio6ObvA9rfqfzgZlPXrcNOiz9eix9ejxGQEL+UAg5Fsuetw06LP16LH16PEZDYU8M94BAGAo\nQh4AAEMR8gAAGIqQBwDAUIQ8AACGYn5YAECLVFtbq5KS/X4d89pru8nhcHzn771erx59dIqOHKnQ\n9On/ocGDh150zD//+X317z8wIFOyE/IAgBappGS/spa8qXbhDc+f0lhHKsv12Ox71b17j+9cx+Px\nqLbWq9df39zocV99db1uueVWQh4AgO+jXXi0OrS/usneLzPzWf39719p4cJfqWfPH6pbt+7asGGt\nTp48qccff1Jbtrylv/3tU9XV1elf//XHstvtKirao9mzn1BGxm+0YMEvdOjQIZ0+fVpTpkxV3779\nLa2Xc/IAADTS44/PUJcuXdWpU4xsNpsk6fPP9+vZZzPVsWOM8vP/rOefX6nf/OYl1dZ6deedoxQR\n0UHLlv1an39erMrKSmVmvqhnn/1/Onq00vJ6CXkAAC7DD37QQ06nU23bttU113TV3LmzlJf3tkaO\nvOefa/jk80ldu16rEydOaMGCX+qjj3ZqxIg7La+NkAcA4DI4nUH1Py9ZslyTJ/9Un322T3PmzDhr\nvZCQVnrxxVc0Zsw4/eUvHyot7RnLayPkAQDwg7Kyf2jjxmz16NFT06Y9UX843mazyev1at++vfrD\nH36vG2+8STNnPqUvviixvCYuvAMAtFhHKsubfKx/noo/T4cOkfrkk4/17rt/UHBwiO65Z7QkKS7u\nFk2b9rB+/esVeuGFLL3xxu/kcDh0//1J/ir9u2vlKXSNwxOPr
EePmwZ9th49tl5kZBuVlR1p8vvk\nm6OGnkLHnjwAoEVyOBwN3tMOzskDAGAsQh4AAEMR8gAAGIqQBwDAUIQ8AACG4up6AECLFIhHzX6X\n3bsL1bXrdWrXrp3Gjx+ttWtfU6tWrfxa26Ug5AEALVJJyX59lLdQMR3D/TJeaVmlNHzeJd2W9z//\ns1n335+kdu3aSfqO2XICgJAHALRYMR3D1bWzq0nf0+v1Kj39v1Ra+nd5vV5NnvyI3n//j/r88/1a\nsGCxJJ82bszWX/7yoWpra/Xss5kKCQlRevp/6R//KJXX69WUKVMVH/8jJSdPVbdu3WWz2XT33aO1\nbNliBQcHKygoWM88s1ChoWGXVSshDwDA9/DOO7kKCWmlzMwX5fF4lJz8iK6/vqf+4z+eUnR0R0lS\n9+49NGnST/SrX83Xrl07dPz4cXXoEKmUlJ+rsvKIpk9/VKtXb5Akdev2A40ZM07PPbdU48aNV0LC\nXSoo+EiHDh0i5AEAaEp7936quLhbJEkdOnRQcHCwjh49qm/PEh8be9M/fx+pqqoq7dnziT755K/6\n+OO/yufz6fTpGnm9XknSDTf0liQNHjxES5em6csvD2jYsBG65pqul10rIQ8AwPdgs9nOCvTTp0+f\nd5Gdw3EmXn0+n4KDg/Tgg5N1++0J5433zaNqb7nlVq1cuVZ//vMHWrjwV5o27Yn6LxOXilvoAAD4\nHn74wxtUWPiRJKm8vEx2u11t2oSrtrb2O19zww199P77f5QkVVQc1gsv/Oa8dXJyXlNlZaUSEkbq\nvvvu12ef/e2ya2VPHgDQYpWWVfp1rJgbLr7e7bcnqLBwl6ZP/5m8Xq9mz05VQcFOzZ//lNLSlurb\nV9d/81ja4cPv0K5dO/Xoo5NVV+fTlClT//n7M+t27txFP/95ikJDwxQSEqy5c3952Z+JR802Eo+O\ntB49bhr02Xr02Ho8avYMHjULADAOj5q9OM7JAwBgKEIeAABDEfIAABiKkAcAwFCEPAAAhiLkAQAw\nFCEPAIChCHkAAAzFZDgAmq3a2lpmNAMuAyEPoNkqKdmvj/IWKqZjuF/GKy2rlIbPY5Y0XDEIeQDN\nWkzHcHXt7Ap0GUCLxDl5AAAMZfmefHp6ugoKClRbW6tHHnlEeXl52rNnj9q3by9JmjJlioYMGaLN\nmzdrzZo1cjgcGj9+vBITE+X1epWSkqLS0lI5HA6lpaWpc+fOVpcMAIARLA357du3q7i4WNnZ2Tpy\n5IjGjh2r/v37a9asWRoyZEj9etXV1crKylJOTo6cTqcSExOVkJCgvLw8hYeHa+nSpdq2bZuWLVum\njIwMK0sGAMAYlh6u79u3r5YvXy5Jatu2rU6cOKG6ujqd+wj73bt3KzY2VqGhoQoJCVF8fLx27dql\n/Px8jRgxQpI0cOBAFRQUWFkuAABGsTTkbTabWrVqJUl6/fXXNXToUNntdq1bt04PPfSQZs6cqYqK\nCnk8HrlcZy6scblccrvdZy232Wyy2+3yer1WlgwAgDGa5Or6d955R7/73e+0cuVK7dmzR+3atVOv\nXr300ksvKTMzU3FxcWetf+6e/jfq6uqaolwAAIxgech/8MEHevHFF7Vy5UqFhYWpf//+9b8bPny4\nnn76aY0cOVLvvfde/fLy8nLFxcUpKipKHo9HPXv2rN+DdzobLrl9+9ZyOq2Z6CIyso0l4+IMetw0\nWkqfKyrCVOrnMV2usCb5/C2lxy0ZPb44S0O+qqpKS5Ys0SuvvKI2bb7+nzF9+nTNnj1bXbp00fbt\n23X99dcrNjZW8+fPV1VVlWw2mwoLC5Wamqpjx45p69atGjRokPLy8tSvX7+LvmdFxQlLPktkZBu5\n3ccsGRtfo8dNoyX1+fDhKkvGtPrzt6Qet1T0+IyGvuxYGvJbtmzRkSNH9OSTT8rn88lms2ncuHGa\nMWOGrrrqKoWGhmrhwoUKC
QnRzJkzNXnyZNntdiUnJyssLEyjRo3Stm3bNHHiRIWEhGjRokVWlgsA\ngFFsvu86Ad5CWfXNjm+N1qPHTaMl9bm4+DOV/v/f+G3Guy++OqyYG6ZZPq1tS+pxS0WPz2hoT54Z\n7wAAMBQhDwCAoQh5AAAMRcgDAGAoQh4AAEMR8gAAGIqQBwDAUIQ8AACGIuQBADAUIQ8AgKEIeQAA\nDEXIAwBgKEIeAABDEfIAABiKkAcAwFCEPAAAhiLkAQAwFCEPAIChCHkAAAxFyAMAYChCHgAAQxHy\nAAAYipAHAMBQhDwAAIYi5AEAMBQhDwCAoQh5AAAMRcgDAGAoQh4AAEMR8gAAGIqQBwDAUIQ8AACG\nIuQBADAUIQ8AgKEIeQAADOUMdAEAzFFbW6uSkv1+G+/AgS/4Rwq4DPz9AeA3JSX7lbXkTbULj/bL\neAe+KtKDE/wyFHBFIuQB+FW78Gh1aH+1X8aqqCyXdMAvYwFXIs7JAwBgKEIeAABDEfIAABiKkAcA\nwFCEPAAAhrL86vr09HQVFBSotrZWjzzyiG688UbNnj1bPp9PkZGRSk9PV1BQkDZv3qw1a9bI4XBo\n/PjxSkxMlNfrVUpKikpLS+VwOJSWlqbOnTtbXTIAAEawNOS3b9+u4uJiZWdn68iRIxo7dqz69++v\nSZMm6c4771RGRoZycnI0ZswYZWVlKScnR06nU4mJiUpISFBeXp7Cw8O1dOlSbdu2TcuWLVNGRoaV\nJQMAYAxLD9f37dtXy5cvlyS1bdtWJ06c0M6dOzV8+HBJ0rBhw/Thhx9q9+7dio2NVWhoqEJCQhQf\nH69du3YpPz9fI0aMkCQNHDhQBQUFVpYLAIBRLA15m82mVq1aSZI2btyooUOHqrq6WkFBQZKkiIgI\nHTx4UIcOHZLL5ap/ncvlktvtlsfjqV9us9lkt9vl9XqtLBkAAGM0yYx377zzjnJycrRy5UolJCTU\nL/f5fBdc/7uW19XVXfS92rdvLafTcWmFXkRkZBtLxsUZ9LhpWNXnioowS8b1J5crrEm2M7Zl69Hj\ni7M85D/44AO9+OKLWrlypcLCwhQaGqqamhoFBwervLxc0dHRioqKktvtrn9NeXm54uLiFBUVJY/H\no549e9bvwTudDZdcUXHCks8RGdlGbvcxS8bG1+hx07Cyz4cPV1kyrj8dPlxl+XbGtmw9enxGQ192\nLD1cX1VVpSVLlmjFihVq0+brIgYMGKDc3FxJUm5urgYPHqzY2Fjt2bNHVVVVOn78uAoLC3XLLbdo\n0KBB2rp1qyQpLy9P/fr1s7JcAACMYume/JYtW3TkyBE9+eST8vl8stlsWrx4sVJTU/Xqq68qJiZG\nY8eOlcPh0MyZMzV58mTZ7XYlJycrLCxMo0aN0rZt2zRx4kSFhIRo0aJFVpYLAIBRLA35++67T/fd\nd995y1etWnXesoSEhLPO10uS3W5XWlqaZfUBAGAyZrwDAMBQhDwAAIYi5AEAMBQhDwCAoQh5AAAM\nRcgDAGAoQh4AAEMR8gAAGIqQBwDAUIQ8AACGIuQBADAUIQ8AgKEIeQAADEXIAwBgKEIeAABDEfIA\nABiKkAcAwFCEPAAAhiLkAQAwFCEPAIChCHkAAAxFyAMAYChCHgAAQxHyAAAYipAHAMBQhDwAAIYi\n5AEAMBQhDwCAoQh5AAAMRcgDAGAoQh4AAEM1KuRPnz6tsrIySdLevXu1adMmVVdXW1oYAAC4PI0K\n+ZSUFP31r39VeXm5kpOTtW/fPqWkpFhdGwAAuAyNCvny8nKNHDlSW7Zs0cSJEzVnzhxVVlZaXRsA\nALgMjQr5mpoa+Xw+vf322xo6dKgk6fjx41bWBQAALlOjQr5v37665ZZbFBkZqeuuu06vvPKKunXr\nZnVtAADgMjgbs9LYsWP1yCOPqG3btpKk22+/XX369LG0MAAAcHka3JM/evSoDhw4oHnz5qm
yslJf\nfvmlvvzyS50+fVqpqalNVSMAALgEDe7JFxYWavXq1fr000/10EMP1S+32+267bbbLC8OAABcugZD\nfsiQIRoyZIg2bNig+++/v6lqAgAAftCoc/IjRozQ6tWrVVlZKZ/PV7/8iSeesKwwAABweRp1df3U\nqVO1d+9e2e12ORyO+v8AAEDz1ag9+datWystLe2S3mDfvn2aNm2afvKTn+iBBx7Q3LlztWfPHrVv\n316SNGXKFA0ZMkSbN2/WmjVr5HA4NH78eCUmJsrr9SolJUWlpaVyOBxKS0tT586dL6kOAACuNI0K\n+ZtuuknFxcXq3r379xq8urpaCxYs0IABA85aPmvWLA0ZMuSs9bKyspSTkyOn06nExEQlJCQoLy9P\n4eHhWrp0qbZt26Zly5YpIyPje9UAAMCVqlGH6z/44AONHj1at912m4YOHaohQ4bUz3zXkJCQEL38\n8suKiopqcL3du3crNjZWoaGhCgkJUXx8vHbt2qX8/HyNGDFCkjRw4EAVFBQ0plwAAKBG7sk///zz\nlzS43W5XcHDwecvXrVunVatWqUOHDpo/f748Ho9cLlf9710ul9xu91nLbTab7Ha7vF6vnM5GlQ0A\nwBWtUWmZn59/weWJiYnf+w3HjBmjdu3aqVevXnrppZeUmZmpuLi4s9b59hX831ZXV3fR8du3by2n\n05qLAiMj21gyLs6gx03Dqj5XVIRZMq4/uVxhTbKdsS1bjx5fXKNCfteuXfU/19TU6OOPP1Z8fPwl\nhXz//v3rfx4+fLiefvppjRw5Uu+991798vLycsXFxSkqKkoej0c9e/aU1+v9uuCL7MVXVJz43jU1\nRmRkG7ndxywZG1+jx03Dyj4fPlxlybj+dPhwleXbGduy9ejxGQ192WlUyJ97ZX11dbXmzp17ScVM\nnz5ds2fPVpcuXbR9+3Zdf/31io2N1fz581VVVSWbzabCwkKlpqbq2LFj2rp1qwYNGqS8vDz169fv\nkt4TAIAr0SWd3L7qqqt04MCBi65XVFSkRYsWqbS0VE6nU7m5uUpKStKMGTN01VVXKTQ0VAsXLlRI\nSIhmzpypyZMny263Kzk5WWFhYRo1apS2bdumiRMnKiQkRIsWLbqUcgEAuCI1KuQnTpwom81W/+fy\n8nL17Nnzoq/r3bu31q5de97yO+6447xlCQkJSkhIOGuZ3W6/5PvzAQC40jUq5J988sn6n202m8LC\nwtSrVy/LigIAAJevUffJ9+3bV3a7XUVFRSoqKtLJkyfP2rMHAADNT6NCfvny5UpPT9fBgwdVXl6u\nBQsW6IUXXrC6NgAAcBkadbh++/btys7Olt3+9XcCr9erSZMmaerUqZYWBwAALl2j9uTr6urqA176\n+l51DtcDANC8NWpPvk+fPvrZz36mgQMHSpI+/PBD9enTx9LCAADA5bloyH/55ZeaN2+efv/732v3\n7t2y2Wz60Y9+pIcffrgp6gMAAJeowcP1+fn5uv/++3X8+HHdfffdmjdvnsaNG6cNGzZoz549TVUj\nAAC4BA2GfGZmplatWqU2bc7Mi9uzZ0+tWLFCzz33nOXFAQCAS9dgyPt8Pl1//fXnLe/Ro4dOnTpl\nWVEAAODyNRjyJ0589xPdjhw54vdiAACA/zQY8j169NCGDRvOW/7SSy/ppptusqwoAABw+Rq8un7O\nnDmaNm2a3njjDfXp00d1dXUqKChQWFgYM94BANDMNRjykZGReu2115Sfn6/PPvtMDodDd911l269\n9damqg8AAFyiRk2GM2DAAA0YMMDqWgAAgB81alpbAADQ8hDyAAAYipAHAMBQhDwAAIYi5AEAMBQh\nDwCAoQh5AAAMRcgDAGAoQh4AAEMR8gAAGIqQBwDAUIQ8AACGIuQBADAUIQ8AgKEIeQAADEXIAwBg\nKEIeAABDEfIAABiKkAcAwFCEPAAAhiLkAQAwFCEPAIC
hCHkAAAxFyAMAYChCHgAAQxHyAAAYipAH\nAMBQlof8vn37dMcdd2j9+vWSpLKyMiUlJWnSpEmaMWOGTp8+LUnavHmzEhMT9W//9m/auHGjJMnr\n9WrWrFmaOHGikpKS9NVXX1ldLgAAxrA05Kurq7VgwQINGDCgftny5cuVlJSkdevW6ZprrlFOTo6q\nq6uVlZWl1atXa82aNVq9erWOHj2qt956S+Hh4frtb3+rn/3sZ1q2bJmV5QIAYBRLQz4kJEQvv/yy\noqKi6pft2LFDw4YNkyQNGzZMH374oXbv3q3Y2FiFhoYqJCRE8fHx2rVrl/Lz8zVixAhJ0sCBA1VQ\nUGBluQAAGMXSkLfb7QoODj5rWXV1tYKCgiRJEREROnjwoA4dOiSXy1W/jsvlktvtlsfjqV9us9lk\nt9vl9XqtLBkAAGME9MI7n8/3vZbX1dVZWQ4AAEZxNvUbhoaGqqamRsHBwSovL1d0dLSioqLkdrvr\n1ykvL1dcXJyioqLk8XjUs2fP+j14p7Phktu3by2n02FJ7ZGRbSwZF2fQ46ZhVZ8rKsIsGdefXK6w\nJtnO2JatR48vrslDfsCAAcrNzdW9996r3NxcDR48WLGxsZo/f76qqqpks9lUWFio1NRUHTt2TFu3\nbtWgQYOUl5enfv36XXT8iooTltQdGdlGbvcxS8bG1+hx07Cyz4cPV1kyrj8dPlxl+XbGtmw9enxG\nQ192LA35oqIiLVq0SKWlpXI6ncrNzdXSpUuVkpKiV199VTExMRo7dqwcDodmzpypyZMny263Kzk5\nWWFhYRo1apS2bdumiRMnKiQkRIsWLbKyXAAAjGJpyPfu3Vtr1649b/mqVavOW5aQkKCEhISzltnt\ndqWlpVlWHwAAJmPGOwAADEXIAwBgKEIeAABDEfIAABiKkAcAwFCEPAAAhiLkAQAwFCEPAIChCHkA\nAAxFyAMAYChCHgAAQxHyAAAYipAHAMBQhDwAAIYi5AEAMBQhDwCAoQh5AAAMRcgDAGAoQh4AAEMR\n8gAAGIqQBwDAUIQ8AACGIuQBADAUIQ8AgKEIeQAADEXIAwBgKEIeAABDEfIAABiKkAcAwFCEPAAA\nhiLkAQAwFCEPAIChCHkAAAxFyAMAYChCHgAAQxHyAAAYipAHAMBQzkAXAAAmqa2t1b59+3T4cJXf\nxrz22m5yOBx+Gw9XDkIeAPyopGS/PspbqJiO4X4Zr7SsUho+T9279/DLeLiyEPIA4GcxHcPVtbMr\n0GUAnJMHAMBUhDwAAIZq8sP1O3bs0BNPPKEePXrI5/OpZ8+eevjhhzV79mz5fD5FRkYqPT1dQUFB\n2rx5s9asWSOHw6Hx48crMTGxqcsFYLja2lqVlOz323gHDnzBeVA0GwHZFvv27avly5fX/3nu3LlK\nSkpSQkKCMjIylJOTozFjxigrK0s5OTlyOp1KTExUQkKC2rZtG4iSARiqpGS/spa8qXbh0X4Z78BX\nRXpwgl+GAi5bQELe5/Od9ecdO3bomWeekSQNGzZMq1at0rXXXqvY2FiFhoZKkuLj41VQUKChQ4c2\ndbkADNcuPFod2l/tl7EqKsslHfDLWMDlCkjIFxcX67HHHlNlZaWmTZumkydPKigoSJIUERGhgwcP\n6tChQ3K5zlyd6nK55Ha7A1EuAAAtUpOHfNeuXfX444/rrrvu0pdffqkHH3xQXq+3/vfn7uVfbDkA\nALiwJg/56Oho3XXXXZKkLl26qEOHDtqzZ49qamoUHBys8vJyRUdHKyoq6qw99/LycsXFxV10/Pbt\nW8vptGZmqMjINpaMizPocdOwqs8VFWGWjOtPLlfYWZ+/JdaMr9GTi2vykH/zzTfldrs1efJkud1u\nHTp0SOPGjdPWrVs1evRo5ebmavDgwYqNjdX8+fNVVVUlm82mwsJCpaamXnT8iooTltQdGdlGbvcx\nS8bG1+hx07Cyz/6
cytUqhw9XnfX5W2LN4N+Lb2voy06Th/zw4cM1c+ZMvfvuu/J6vfrVr36lXr16\n6amnntJrr72mmJgYjR07Vg6HQzNnztTkyZNlt9uVnJyssLDm/40bAIDmoslDPjQ0VCtWrDhv+apV\nq85blpCQoISEhKYoCwAA4zDjHQAAhiLkAQAwFCEPAIChCHkAAAxFyAMAYChCHgAAQxHyAAAYipAH\nAMBQhDwAAIYKyKNmAQCXpra2ViUl+/065rXXdpPDYc2DvRBYhDwAtCAlJfuVteRNtQuP9st4RyrL\n9djse9W9ew+/jIfmhZAHgBamXXi0OrS/OtBloAUg5IErSG1trfbt2+fXx6tyqBdovgh54ApSUrJf\nH+UtVEzHcL+MV1pWKQ2fx6FeoJki5IErTEzHcHXt7Ap0GQCaALfQAQBgKEIeAABDEfIAABiKc/JA\nM2XFpCcHDnzBX3rgCsLfd6CZ8vekJ5J04KsiPTjBb8MBaOYIeaAZ8/ekJxWV5ZIO+G08AM0bIQ9c\nIuYQB9DcEfLAJWJiGQDNHSEPXAYmloEJamtrVVz8mV/H5KhU80DIA8AV7u9//0oH/3c9R6UMRMjj\niuHvc+jcjgaTWHlUyorrV1yum/w6nqn4NwpXDH/fksbtaEDj+Pvv3pHKcs1PC1P79p38Mp7JCHlc\nUfx5Sxq3owGN5+/bQdE4hDwuCYffAKD5I+RxSTj8BgDNHyHfTPh7z7i2tlaSTQ6H/55BdO4tMRx+\nA4DmjZBvJqy4KGzE0H+0mFti6urq9Pnnn+vw4Sq/jcl9ugCudIR8M+Lvi8JiOp5oMRO1VB5z67Nd\neS3mSwkAtASEPJoNZo8D0Bgc+Ws8Qh4A0KJw5K/xCHkAQIvDkb/G8d+l1wAAoFlhTx4AgHNYMeFX\nIM77E/IAAJzDigm/Hpt9b5Of9yfkAQC4ABMm/CLkAQCwWF1dnQ4c+MKvYzbm8D8hDwCAxSqPuXXw\nf/PkrGra2/6afcinpaVp9+7dstlsmjdvnm688cZAlwQAwPcWiNv+mnXI79y5U1988YWys7NVXFys\n1NRUZWdnX/R1VjzsxeMJU2Vltd/GNHV2JQBA89GsQz4/P18jRoyQJHXv3l1Hjx7V8ePHFRoa2uDr\nrvSHvQAAIDXzkPd4POrTp0/9n9u3by+Px3PRkJeu7Ie9AAAgNfOQP5fP52v0ukcqy/32vseOeb7e\n+/aT0rJKxdxw/vKWVnNLq1dqeTX7s16p5dXMdmF9j78eK7pF1cx28d01n8vm+z7J2cQyMzMVFRWl\n++67T5KjO2YDAAAH5UlEQVQ0YsQIbd68Wa1btw5wZQAANH/Neu76QYMGKTc3V5JUVFSk6OhoAh4A\ngEZq1ofr4+Li1Lt3b02YMEEOh0O/+MUvAl0SAAAtRrM+XA8AAC5dsz5cDwAALh0hDwCAoQh5AAAM\n1awvvAuUkydPKiUlRYcOHVJNTY0effRRDR06VJL0wQcf6Kc//an27t0b2CJbuHN7/Nhjj2nQoEF6\n6qmndODAAYWFhenXv/612rRpE+hSW6wLbcdhYWF69tln5XQ61bp1ay1ZsoQe+8GpU6d0zz33aNq0\naerfv79mz54tn8+nyMhIpaenKygoKNAlGuHcPs+dO1der1dBQUFasmSJIiIiAl1is8Oe/AXk5eXp\nxhtv1Nq1a5WRkaG0tDRJUk1NjV588UVFRUUFuMKW79weL1y4UK+99poiIiL0+uuva9SoUfroo48C\nXWaLdqHteNGiRUpLS9OaNWsUFxfXqGdB4OKysrLUrl07SdLy5cuVlJSkdevW6ZprrlFOTk6AqzPH\nt/v83HPPacKECVq7dq1uv/12rVq1KsDVNU/syV/AqFGj6n8uLS1Vp06dJEkrVqzQpEmTlJ6eHqjS\njHGhHv/xj39UcnKyJGn8+PGBKs0YF+pxUFCQDh8+rK5du6qyslLdunULYIVm2L9/v
/bv368hQ4bI\n5/Np586deuaZZyRJw4YN06pVqzRhwoQAV9nyfbvPkvT0008rJCREkuRyufTpp58Gsrxmi5BvwIQJ\nE3Tw4EGtWLFCJSUl+tvf/qbp06dr8eLFgS7NGN/0+Pnnn9eMGTP0pz/9Senp6YqKitIvf/lLtW3b\nNtAltnjf3o4dDoeSkpIUHh6u8PBwzZo1K9DltXiLFy/WL37xC/33f/+3JKm6urr+8HxERITcbncg\nyzPGuX1u1aqVJKmurk6//e1vNW3atECW12wR8g3Izs7W3r17NWvWLHXq1Enz588PdEnG+abH35zD\n7Natmx5//HE9//zzWrFihebMmRPoElu8b2/HLpdLWVlZuvnmm5Wenq7169crKSkp0CW2WJs2bVJc\nXJyuvvrCD8NiGhL/OLfP3/S1rq5Os2fPVv/+/dW/f/9AlthsEfIXUFRUpIiICHXs2FG9evXS8ePH\nVVxcXB9EbrdbSUlJWrt2baBLbbHO7XFtba3sdrv69u0rSbrtttuUmZkZ4Cpbtgv1eMeOHbr55psl\nSQMHDtRbb70V4Cpbtj/96U/66quv9N5776m8vFxBQUFq3bq1ampqFBwcrPLycq7h8YNv97msrEwh\nISHq2LGjNm3apOuuu469+AYQ8hewc+dOlZaWat68efJ4PPL5fHr33Xfrfz98+HAC/jKd2+Pq6mpN\nmDBB77//vsaNG6eioiJdd911gS6zRTu3xydOnFCPHj1UXFys7t2765NPPlHXrl0DXWaLlpGRUf9z\nZmamOnfurIKCAm3dulWjR49Wbm6uBg8eHMAKzXChPns8HgUHB+vxxx8PYGXNH9PaXsCpU6c0b948\nlZWV6dSpU0pOTq6/2EOSbr/99rNCH9/fhXrcr18/PfXUU3K73QoNDdXixYvlcrkCXWqLdaEeh4eH\na/HixQoKClK7du20cOFChYWFBbpUI3wTPrfddpvmzJmjmpoaxcTEKC0tTQ6HI9DlGeObPmdnZ6um\npkahoaGy2Wz6wQ9+wPNNLoCQBwDAUNwnDwCAoQh5AAAMRcgDAGAoQh4AAEMR8gAAGIqQBwDAUIQ8\nAE2aNEl5eXlnLTt16pT69u2r8vLyC74mKSlJ+fn5TVEegEtEyANQYmJi/YM/vvH222/r5ptvVnR0\ndICqAnC5CHkAGjlypHbt2qXKysr6ZZs2bVJiYqLeeecdTZgwQQ899JAmTZqk0tLSs167Y8cOTZw4\nsf7Pc+fO1caNGyVJW7Zs0QMPPKAHHnhAycnJqqysVG1trebOnasJEybo/vvv13/+5382zYcErkCE\nPAC1atVKd9xxR/0Daw4ePKi9e/dq+PDhOnr0qJ577jmtXr1a//Iv/6J169ad93qbzXbesrKyMr3w\nwgt65ZVXtH79et16661asWKF9u3bp927dys7O1sbNmxQr169VFVVZflnBK5EPKAGgCTpxz/+sZ55\n5hk98MADevPNN3XvvffK6XQqIiJCc+bMkc/nk8fjqX+K3cUUFhbK7XZrypQp8vl8On36tLp06aLu\n3bvL5XJp6tSpGjp0qO666y7mzwcsQsgDkCTFxsaqpqZGxcXFeuONN5SRkSGv16sZM2bojTfeUJcu\nXbR+/Xrt2bPnrNeduxdfU1MjSQoODlZsbKxWrFhx3nutW7dOn376qfLy8pSYmKjs7Gx16NDBug8H\nXKE4XA+gXmJiorKystS6dWt1795dx48fl8PhUExMjE6dOqV33323PsS/ERYWVn8FfnV1tT7++GNJ\n0o033qhPPvlEHo9HkrR161bl5eVpz5492rRpk374wx9q2rRp6t27t0pKSpr0cwJXCvbkAdS79957\ntXTp0vpHdoaHh+uee+7Rj3/8Y1199dV6+OGHNWfOHOXm5tbvwffq1Us9e/bUuHHjdM011yg+Pl6S\nFBUVpdTUVE2dOlWtW7dWq1attHjxYjmdTmVmZ
urVV19VcHCwunbtWv8aAP7Fo2YBADAUh+sBADAU\nIQ8AgKEIeQAADEXIAwBgKEIeAABDEfIAABiKkAcAwFCEPAAAhvo/h6YtceqVzXwAAAAASUVORK5C\nYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plot_distributions(first_hist, other_hist)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Remember that the vertical axis is counts. In this case, we are comparing counts with different totals, which might be misleading.\n", "\n", "An alternative is to compute a probability mass function (PMF), which divides the counts by the totals, yielding a map from each element to its probability.\n", "\n", "The probabilities are \"normalized\" to add up to 1.\n" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\n", "\n", "class Pmf(Hist):\n", " \n", " def normalize(self):\n", " total = sum(self.values())\n", " for element in self:\n", " self[element] /= total\n", " return self\n", " \n", " def plot_cumulative(self, **options):\n", " xs, ps = zip(*sorted(self.iteritems()))\n", " cs = np.cumsum(ps, dtype=np.float)\n", " cs /= cs[-1]\n", " plt.plot(xs, cs, **options)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can compare PMFs fairly." 
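The `Pmf` class above targets Python 2 (`iteritems`) and uses `np.float`, an alias that was deprecated in NumPy 1.20 and removed in 1.24. Here is a hedged Python 3 sketch of the same two ideas, normalizing counts and accumulating them into a cumulative distribution, using only the standard library; `PmfSketch` and `cumulative` are hypothetical names, not part of the original code:

```python
from collections import Counter
from itertools import accumulate


class PmfSketch(Counter):
    # Hypothetical Python 3 variant of Pmf: a Counter whose counts can be
    # turned into probabilities.

    def normalize(self):
        # Divide each count by the total so the values add up to 1.
        total = sum(self.values())
        for element in self:
            self[element] /= total
        return self

    def cumulative(self):
        # Sort by value, take running sums, and rescale by the last sum,
        # mirroring the cumsum-then-divide steps in plot_cumulative.
        xs, ps = zip(*sorted(self.items()))
        cs = list(accumulate(ps))
        return list(xs), [c / cs[-1] for c in cs]


pmf = PmfSketch([39, 39, 40, 38]).normalize()
print(pmf[39])           # 0.5
print(pmf.cumulative())  # ([38, 39, 40], [0.25, 0.75, 1.0])
```

Plotting is left out here; the returned values and cumulative probabilities could be passed to `plt.plot` the same way `plot_cumulative` does.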
] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": false }, "outputs": [], "source": [ "first_pmf = Pmf(firsts.prglngth).normalize()\n", "other_pmf = Pmf(others.prglngth).normalize()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using PMFs, we see that some of the difference at 39 weeks was an artifact of the different samples sizes:" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": false }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAfAAAAFmCAYAAACSk8i4AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XtcVHX+x/H3MIiGEIIOmJqmZJhmRZZpWLrKpl2s1szw\nQrVZuaXmZmmIRW2r4i2tljVrd11LS9atsMuvsja3yz7CDbW8UKaLAZZxGcEh8ILg/P4wx1Ag0jkM\n3+H1fDx6PObMmfmeD5/H2HvOZb7H5na73QIAAEYJ8HUBAADglyPAAQAwEAEOAICBCHAAAAxEgAMA\nYCACHAAAAwVavYHU1FRt3rxZNptNycnJ6t27t2ddQUGBpk6dqqqqKvXs2VOPP/641eUAAOAXLN0D\nz8rKUl5entLT0zVr1izNnj27xvq5c+dq/PjxWr16tex2uwoKCqwsBwAAv2FpgGdmZio+Pl6SFB0d\nrbKyMlVUVEiS3G63Nm7cqMGDB0uSHn30UbVv397KcgAA8BuWBrjT6VRERIRnOTw8XE6nU5JUUlKi\n4OBgzZ49W2PGjNGiRYusLAUAAL/SqBex/XTWVrfbraKiIt1xxx1auXKlvvzyS3300UeNWQ4AAMay\nNMAjIyM9e9ySVFRUJIfDIeno3njHjh3VqVMnBQQEqH///vrf//5X73hVVdVWlgsAgDEsvQo9Li5O\naWlpGjVqlLKzsxUVFaXg4GBJkt1uV6dOnZSfn6/OnTsrOztb119/fb3jlZbut6xWhyNUxcU/WDY+\n6HFjoMfWo8eNgz4f53CE1vq8pQEeGxurXr16KSEhQXa7XSkpKcrIyFBoaKji4+OVnJyspKQkud1u\nnXfeeZ4L2gAAQP1sJt1O1MpvY3zbsx49th49th49bhz0+bi69sCZiQ0AAAMR4AAAGIgABwDAQAQ4\nAAAGIsABADCQ5XcjAwDgVFRXVys3d5dXxzznnG6y2+11rq+qqtK9947Xvn2luv/+qbryykE/O+Z/\n/vOx+vW7QoGBjRupBDgAoEnKzd2lJQveVJuwKK+Mt89VqPumDVd0dPc6X+N0OlVdXaV//vONBo/7\nj3+8pD59LiPAAQA4pk1YlNqFd2y07aWlLdJ3332rOXP+oJiY89WtW7RWrVqhgwcPatKk3+vtt9/S\n119/pSNHjuimm25WQECAsrO3adq0KVq8+M+aNStFe/fu1eHDhzV+/AT17dvPslo5Bw4AwI8mTXpA\nZ5/dRWed1UE2m02S9M03u7RoUZrat++gzMz/6Nln/6Y///kvqq6u0tCh16pt23Z68sln9M03OXK5\nXEpLe16LFv1JZWUuS2slwAEAqMe553ZXYGCgzjzzTHXu3EUzZjykdeve17Bhx+7f4ZbbLXXpco72\n79+vWbMe04YNWYqPH2ppXQQ4AAD1CAxs4Xm8YMHTuvPOu7Vz5w5Nn/5Ajde1bNlKzz+/XDfeOELr\n13+q1NQnLK2L
AAcAoAEKCr7XK6+kq3v3GE2cOMVziNxms6mqqko7dmzXe++9o969L9KDDz6svLxc\nS+vhIjYAQJO1z1XY6GP9eOr7JO3aObR16xZ98MF7Cgpqqeuvv0GSFBvbRxMn3qVnnlmq555botdf\nf012u12jRyd6q/Ta6+RuZEdx5xvr0WPr0WPr0ePG4XCEqqBgX6P/Drwp8sn9wAEAOFV2u73e32w3\nd5wDBwDAQAQ4AAAGIsABADAQAQ4AgIEIcAAADMRV6ACAJskXtxOtzebNn6tLl65q06aNbrnlBq1Y\nsVqtWrXyal2nggAHADRJubm7tGHdHHVoH+aV8fYUuKTByb/4p2n/939vaPToRLVp00ZSHbO8+AAB\nDgBosjq0D1OXThGNtr2qqirNnz9be/Z8p6qqKt155z36+OMP9c03uzRr1jxJbr3ySrrWr/9U1dXV\nWrQoTS1bttT8+bP1/fd7VFVVpfHjJ+iSSy7V5MkT1K1btGw2m6677gY9+eQ8BQUFqUWLID3xxBy1\nbh1yWrUS4AB8pqkcIgWO+de/1qply1ZKS3teTqdTkyffo/POi9HUqQ8rKqq9JCk6urvGjbtDf/jD\nI9q48TNVVFSoXTuHkpIelcu1T/fff69eeGGVJKlbt3N1440j9NRTCzVixC26+uprtGnTBu3du5cA\nB2CupnKIFDhm+/avFBvbR5LUrl07BQUFqaysTD+ddfzCCy/6cb1D5eXl2rZtq7Zu/UJbtnwht9ut\nw4crVVVVJUnq2bOXJOnKKwdq4cJU7d6dr1/9Kl6dO3c57VoJcAA+1diHSIH62Gy2GmF9+PDhky5Y\ns9uPR6fb7VZQUAvddtudGjLk6pPGO3Yr0j59LtPf/rZC//nPJ5oz5w+aOHGK54vCqeJnZAAA/Oj8\n83vq8883SJIKCwsUEBCg0NAwVVdX1/menj0v0McffyhJKi0t0XPP/fmk17z66mq5XC5dffUwjRo1\nWjt3fn3atbIHDgBosvYUuLw6Voee9b9myJCr9fnnG3X//b9TVVWVpk2bqU2bsvTIIw8rNXWhfnoV\n+rHbjg4e/Gtt3Jile++9U0eOuDV+/IQf1x9/badOZ+vRR5PUunWIWrYM0owZj53238PtRH/ELQKt\nR4+tZ1qPc3J2as+Xf/baIfS8b0vUoedES8+Bm9ZjU3E70eO4nSgAwCjcTrR+nAMHAMBABDgAAAYi\nwAEAMBABDgCAgQhwAAAMRIADAGAgAhwAAAMR4AAAGIgABwDAQAQ4AAAGIsABADAQAQ4AgIEsv5lJ\namqqNm/eLJvNpuTkZPXu3duzbvDgwerQoYNsNptsNpsWLlyoyMhIq0sCAMB4lgZ4VlaW8vLylJ6e\nrpycHM2cOVPp6eme9TabTX/961/VqlUrK8sAAMDvWHoIPTMzU/Hx8ZKk6OholZWVqaKiwrPe7XbL\noNuRAwDQZFga4E6nUxEREZ7l8PBwOZ3OGq957LHHNGbMGC1atMjKUgAA8CuNehHbiXvbU6ZMUVJS\nklauXKkdO3bovffea8xyAAAwlqXnwCMjI2vscRcVFcnhcHiWb7zxRs/jq666Sjt27NDVV19d53jh\n4cEKDLRbU6wkhyPUsrFxFD22nkk9Li0N0R4vjxkREWJ5D0zqscnoc/0sDfC4uDilpaVp1KhRys7O\nVlRUlIKDgyVJ5eXlmjJlipYuXaoWLVooKytLw4YNq3e80tL9ltXqcISquPgHy8YHPW4MpvW4pKTc\nkjGt7IFpPTYVfT6uri8ylgZ4bGysevXqpYSEBNntdqWkpCgjI0OhoaGKj4/XoEGDdOutt6pVq1bq\n2bOnhg4damU5AAD4Dct/Bz516tQayzExMZ7HiYmJSkxMtLoEAAD8DjOxAQBgIAIcAAADEeAAABiI\nAAcAwEAEOAAABiLAAQAwkOU/IwPgP6qrq5Wbu8tr4+Xn5/E/IeAU8W8HQIPl5u
7SkgVvqk1YlFfG\ny/82W7cleGUooNkhwAH8Im3CotQuvKNXxip1FUrK98pYQHPDOXAAAAxEgAMAYCACHAAAAxHgAAAY\niAAHAMBABDgAAAYiwAEAMBABDgCAgQhwAAAMRIADAGAgAhwAAAMR4AAAGIgABwDAQAQ4AAAGIsAB\nADAQAQ4AgIEIcAAADESAAwBgIAIcAAADEeAAABiIAAcAwEAEOAAABiLAAQAwEAEOAICBCHAAAAxE\ngAMAYCACHAAAAxHgAAAYiAAHAMBABDgAAAYiwAEAMBABDgCAgQhwAAAMRIADAGAgywM8NTVVCQkJ\nGj16tLZu3Vrra5588kklJiZaXQoAAH7D0gDPyspSXl6e0tPTNWvWLM2ePfuk1+Tk5GjDhg2y2WxW\nlgIAgF+xNMAzMzMVHx8vSYqOjlZZWZkqKipqvGbu3LmaOnWqlWUAAOB3LA1wp9OpiIgIz3J4eLic\nTqdnOSMjQ5dffrk6dOhgZRkAAPidwMbcmNvt9jx2uVx67bXXtHz5cn3//fc11tUlPDxYgYF2y+pz\nOEItGxtH0WPrWdnj0tIQy8b2loiIEMs/Z3yOGwd9rp+lAR4ZGVljj7uoqEgOh0OStH79epWWlmrs\n2LE6dOiQdu/erblz5yopKanO8UpL91tWq8MRquLiHywbH/S4MVjd45KScsvG9paSknJLe8DnuHHQ\n5+Pq+iJj6SH0uLg4rV27VpKUnZ2tqKgoBQcHS5KGDh2qt956S+np6UpLS1PPnj3rDW8AAHCcpXvg\nsbGx6tWrlxISEmS325WSkqKMjAyFhoZ6Lm4DAAC/nOXnwE+8wjwmJuak13Ts2FEvvvii1aUAAOA3\nmIkNAAADEeAAABiIAAcAwEAEOAAABiLAAQAwEAEOAICBCHAAAAxEgAMAYCACHAAAAxHgAAAYiAAH\nAMBABDgAAAYiwAEAMBABDgCAgQhwAAAMRIADAGAgAhwAAAMR4AAAGIgABwDAQAQ4AAAGIsABADAQ\nAQ4AgIEIcAAADESAAwBgIAIcAAADEeAAABiIAAcAwEAEOAAABiLAAQAwEAEOAICBCHAAAAzUoAA/\nfPiwCgoKJEnbt2/XmjVrdODAAUsLAwAAdWtQgCclJemLL75QYWGhJk+erB07digpKcnq2gAAQB0a\nFOCFhYUaNmyY3n77bY0ZM0bTp0+Xy+WyujYAAFCHBgV4ZWWl3G633n//fQ0aNEiSVFFRYWVdAACg\nHg0K8L59+6pPnz5yOBzq2rWrli9frm7dulldGwAAqENgQ170m9/8Rvfcc4/OPPNMSdKQIUN0wQUX\nWFoYAACoW7174GVlZcrPz1dycrJcLpd2796t3bt36/Dhw5o5c2Zj1QgAAE5Q7x74559/rhdeeEFf\nffWVbr/9ds/zAQEBGjBggOXFAQCA2tUb4AMHDtTAgQO1atUqjR49urFqAgAAP6NB58Dj4+P1wgsv\nyOVyye12e56fMmWKZYUBAIC6Negq9AkTJmj79u0KCAiQ3W73/AcAAHyjQXvgwcHBSk1NPaUNpKam\navPmzbLZbEpOTlbv3r0961avXq1XX31VdrtdPXr0UEpKyiltAwCA5qZBe+AXXXSRcnJyfvHgWVlZ\nysvLU3p6umbNmqXZs2d71h08eFDvvPOOVq1apZdfflk5OTn64osvfvE2AABojhq0B/7JJ59o+fLl\nCg8PV2BgoNxut2w2mz788MN635eZman4+HhJUnR0tMrKylRRUaHWrVurVatW+vvf/y5JOnDggMrL\ny9WuXbvT+2sAAGgmGhTgzz777CkN7nQ6a0z4Eh4eLqfTqdatW3uee/7557VixQrdfvvt6tSp0ylt\nBwCA5qZBAZ6ZmVnr8yNHjvxFG/vpFezH3HPPPbrjjjt01113qU+fPoqNja3z/eHhwQoMtO7iOYcj\n1LKxcRQ9tp6VPS4tDbFsbG+JiAix/HPG57
hx0Of6NSjAN27c6HlcWVmpLVu26JJLLvnZAI+MjJTT\n6fQsFxUVyeFwSJJcLpd27typSy+9VEFBQbrqqqu0adOmegO8tHR/Q8o9JQ5HqIqLf7BsfNDjxmB1\nj0tKyi0b21tKSsot7QGf48ZBn4+r64tMgwL8xCvQDxw4oBkzZvzs++Li4pSWlqZRo0YpOztbUVFR\nCg4OliRVVVUpKSlJb775ps444wxt2bJFN910U0PKAQCg2WtQgJ/ojDPOUH5+/s++LjY2Vr169VJC\nQoLsdrtSUlKUkZGh0NBQxcfHa9KkSUpMTFRgYKB69OihwYMHn0o5AAA0Ow0K8DFjxshms3mWCwsL\nFRMT06ANTJ06tcbyT9930003sdcNAMApaFCA//73v/c8ttlsCgkJUY8ePSwrCgAA1K9BE7n07dtX\nAQEBys7OVnZ2tg4ePFhjjxwAADSuBgX4008/rfnz56uoqEiFhYWaNWuWnnvuOatrAwAAdWjQIfT/\n/ve/Sk9PV0DA0byvqqrSuHHjNGHCBEuLAwAAtWvQHviRI0c84S1JgYGBHEIHAMCHGrQHfsEFF+h3\nv/udrrjiCknSp59+WmOKVAAA0Lh+NsB3796t5ORkvfPOO57bgl566aW66667GqM+AABQi3oPoWdm\nZmr06NGqqKjQddddp+TkZI0YMUKrVq3Stm3bGqtGAABwgnoDPC0tTcuWLVNo6PF5WGNiYrR06VI9\n9dRTlhcHAABqV2+Au91unXfeeSc93717dx06dMiyogAAQP3qDfD9++u++9e+ffu8XgwAAGiYegO8\ne/fuWrVq1UnP/+Uvf9FFF11kWVEAAKB+9V6FPn36dE2cOFGvv/66LrjgAh05ckSbNm1SSEgIM7EB\nAOBD9Qa4w+HQ6tWrlZmZqZ07d8put+uaa67RZZdd1lj1AQCAWjRoIpf+/furf//+VtcCAAAaqEFT\nqQIAgKaFAAcAwEAEOAAABiLAAQAwEAEOAICBCHAAAAxEgAMAYCACHAAAAxHgAAAYiAAHAMBABDgA\nAAYiwAEAMBABDgCAgQhwAAAMRIADAGAgAhwAAAMR4AAAGIgABwDAQAQ4AAAGIsABADAQAQ4AgIEI\ncAAADESAAwBgIAIcAAADEeAAABiIAAcAwEAEOAAABgq0egOpqanavHmzbDabkpOT1bt3b8+69evX\na/HixbLb7eratatmz55tdTkAAPgFS/fAs7KylJeXp/T0dM2aNeukgH7sscf0pz/9SS+//LLKy8v1\n8ccfW1kOAAB+w9IAz8zMVHx8vCQpOjpaZWVlqqio8Kx/7bXXFBkZKUmKiIjQvn37rCwHAAC/YWmA\nO51ORUREeJbDw8PldDo9y61bt5YkFRUV6dNPP9XAgQOtLAcAAL/RqBexud3uk57bu3ev7r33Xj3+\n+OMKCwtrzHIAADCWpRexRUZG1tjjLioqksPh8CyXl5fr7rvv1oMPPqj+/fv/7Hjh4cEKDLRbUqsk\nORyhlo2No+ix9azscWlpiGVje0tERIjlnzM+x42DPtfP0gCPi4tTWlqaRo0apezsbEVFRSk4ONiz\nfu7cufrtb3+ruLi4Bo1XWrrfqlLlcISquPgHy8YHPW4MVve4pKTcsrG9paSk3NIe8DluHPT5uLq+\nyFga4LGxserVq5cSEhJkt9uVkpKijIwMhYaGasCAAXrjjTeUn5+v1atXy2azafjw4brlllusLAkA\nAL9g+e/Ap06dWmM5JibG83jLli1Wbx4AAL/ETGwAABiIAAcAwEAEOAAABiLAAQAwEAEOAICBCHAA\nAAxEgAMAYCACHAAAAxHgAAAYiAAHAMBABDgAAAYiwAEAMBABDgCAgQhwAAAMRIADAGAgAhwAAAMR\n4AAAGIgABwDAQAQ4AAAGIsABADAQAQ4AgIEIcAAADESAAwBgIAIcAAADEeAAABiIAAcAwEAEOAAA\nBiLAAQ
AwEAEOAICBCHAAAAxEgAMAYCACHAAAAxHgAAAYiAAHAMBAgb4uAABMUV1drR07dqikpNxr\nY55zTjfZ7XavjYfmgwAH4Leqq6uVm7vLa+Pl5+ep6H8vqUP7MK+Mt6fAJQ1OVnR0d6+Mh+aFAAfg\nt3Jzd2nJgjfVJizKK+Plf5ut2xLC1KVThFfGA04HAQ7Ar7UJi1K78I5eGavUVSgp3ytjAaeLi9gA\nADAQAQ4AgIEIcAAADESAAwBgIMsDPDU1VQkJCRo9erS2bt1aY11lZaWSkpJ08803W10GAAB+xdIA\nz8rKUl5entLT0zVr1izNnj27xvr58+fr/PPPl81ms7IMAAD8jqUBnpmZqfj4eElSdHS0ysrKVFFR\n4Vk/depUz3oAANBwlga40+lURMTxCQ/Cw8PldDo9y8HBwVZuHgAAv9WoF7G53e7G3BwAAH7L0pnY\nIiMja+xxFxUVyeFwnPJ44eHBCgy0btJ/hyPUsrFxFD22npU9Li0NsWxsb4mICPH0wLR6URN9qZ+l\nAR4XF6e0tDSNGjVK2dnZioqKOumwudvtbvCeeWnpfivKlHT0g1Jc/INl44MeNware+zNu3BZpaSk\n3NMD0+rFcfz/4ri6vshYGuCxsbHq1auXEhISZLfblZKSooyMDIWGhio+Pl5TpkxRQUGBcnNzddtt\nt+nWW2/VddddZ2VJAAD4BctvZjJ16tQayzExMZ7HTz/9tNWbBwDALzETGwAABiLAAQAwEAEOAICB\nCHAAAAxk+UVsAICGq66uVm7uLq+Oec453WS3WzeHBnyDAAeAJiQ3d5eWLHhTbcKivDLePleh7ps2\nXNHR3b0yHpoOAhwAmpg2YVFqF97R12WgieMcOAAABiLAAQAwEAEOAICBOAcO+Inq6mrt2LHDqzfw\n4OploOkiwAE/kZu7SxvWzVGH9mFeGW9PgUsanMzVy0ATRYADfqRD+zB16RTh6zIANALOgQMAYCD2\nwAEf8faMW/n5efyDBpoR/r0DPuLtGbfyv83WbQleGQqAAQhwwIe8OeNWqatQUr5XxgLQ9HEOHAAA\nAxHgAAAYiEPoQC2suKWjxMQoALyHAAdq4e1JUSQmRgHgXQQ4UAcmRYE/OHLkiPLz87w6JkeSmgYC\nHH6B31QDtXP9UKyi/61TYLl1U+xaccopIuIir47nj/h/FPwCv6kG6mb10SRv//vb5yrUI6khCg8/\nyyvj+SsCHH6D31QDvuPNf39oGH5GBgCAgdgDR604pwUATRsB3gisCEOrrwLlnBYANG0EeCOwIgzv\nmzbc8t8Tc04LAJouAryReDMM+V0nAIAAN1Bj/K7T244cOaJvvvlGJSXlXhuTLx0AmjMC3FCmzRLm\n+qFYOzeu89rUpExLCqC5I8DRaEz70gEATRkBDgBoUjjl1jAEOACgSeGUW8MQ4ACAJodTbj+PAAcA\nNCsmTq5VGwIcANCsmDq51okIcABAs+MPM00S4AAAnAZfzY5JgAMAcBp8NTsmAQ4AwGnyxVXzlgd4\namqqNm/eLJvNpuTkZPXu3duz7tNPP9XixYtlt9t11VVX6b777vvZ8ay4erC6ulpOZ4hcrgNeG9Mf\nJw0AADQdlgZ4VlaW8vLylJ6erpycHM2cOVPp6eme9bNnz9ayZcsUGRmpcePGaejQoYqOjq53TG9f\nPShJ+d9mK37Q90waAAAwhqUBnpmZqfj4eElSdHS0ysrKVFFRodatW2v37t1q06aNoqKOBvHAgQO1\nfv36nw1wyftXD5a6CtWh/X4mDQAAGCPAysGdTqciIo6HYnh4uJxOZ63rIiIiVFRUZGU5AAD4jUa9\niM3tdp/SuhPtcxV6oxyPH35wHj3s7SV7Clzq0LPmc96suTHqlcyruSnXK5lXM5+Lk9Fj/6nZtHpr\nY3P/kuT8hdLS0hQZGalRo0ZJkuLj4/XGG28oODhY3333nR588EHPOfG0
tDSFh4dr7NixVpUDAIDf\nsPQQelxcnNauXStJys7OVlRUlIKDgyVJHTt2VEVFhfbs2aOqqip9+OGHGjBggJXlAADgNyzdA5ek\nRYsW6bPPPpPdbldKSoq+/PJLhYaGKj4+Xhs2bNDChQslScOGDdMdd9xhZSkAAPgNywMcAAB4n6WH\n0AEAgDUIcAAADESAAwBgoGZ3M5ODBw8qKSlJe/fuVWVlpe69914NGjRIkvTJJ5/o7rvv1vbt231b\npB84sc/33Xef4uLi9PDDDys/P18hISF65plnFBoa6utSjVXbZzkkJESLFi1SYGCggoODtWDBAnrs\nBYcOHdL111+viRMnql+/fpo2bZrcbrccDofmz5+vFi1a+LpE453Y4xkzZqiqqkotWrTQggUL1LZt\nW1+X2OQ0uz3wdevWqXfv3lqxYoUWL16s1NRUSVJlZaWef/55RUZG+rhC/3Bin+fMmaPVq1erbdu2\n+uc//6lrr71WGzZs8HWZRqvtszx37lylpqbqxRdfVGxsbI17D+DULVmyRG3atJEkPf3000pMTNTK\nlSvVuXNnvfrqqz6uzj/8tMdPPfWUEhIStGLFCg0ZMkTLli3zcXVNU7PbA7/22ms9j/fs2aOzzjpL\nkrR06VKNGzdO8+fP91VpfqW2Pn/44YeaPHmyJOmWW27xVWl+o7Yet2jRQiUlJerSpYtcLpe6devm\nwwr9w65du7Rr1y4NHDhQbrdbWVlZeuKJJyRJv/rVr7Rs2TIlJCT4uEqz/bTHkvT444+rZcuWko5O\ns/3VV1/5srwmq9kF+DEJCQkqKirS0qVLlZubq6+//lr333+/5s2b5+vS/MqxPj/77LN64IEH9NFH\nH2n+/PmKjIzUY489pjPPPNPXJRrvp59lu92uxMREhYWFKSwsTA899JCvyzPevHnzlJKSooyMDEnS\ngQMHPIfM27Ztq+LiYl+W5xdO7HGrVq0kSUeOHNHLL7+siRMn+rK8JqvZBnh6erq2b9+uhx56SGed\ndZYeeeQRX5fkl471+dg5w27dumnSpEl69tlntXTpUk2fPt3XJRrvp5/liIgILVmyRBdffLHmz5+v\nl156SYmJib4u0Vhr1qxRbGysOnas/e6HTKNx+k7s8bGeHjlyRNOmTVO/fv3Ur18/X5bYZDW7AM/O\nzlbbtm3Vvn179ejRQxUVFcrJyfEETHFxsRITE7VixQpfl2q0E/tcXV2tgIAA9e3bV5I0YMAApaWl\n+bhKs9XW488++0wXX3yxJOmKK67QW2+95eMqzfbRRx/p22+/1b///W8VFhaqRYsWCg4OVmVlpYKC\nglRYWMh1M6fppz0uKChQy5Yt1b59e61Zs0Zdu3Zl77sezS7As7KytGfPHiUnJ8vpdMrtduuDDz7w\nrB88eDDh7QUn9vnAgQNKSEjQxx9/rBEjRig7O1tdu3b1dZlGO7HH+/fvV/fu3ZWTk6Po6Ght3bpV\nXbp08XWZRlu8eLHncVpamjp16qRNmzbp3Xff1Q033KC1a9fqyiuv9GGF5qutx06nU0FBQZo0aZIP\nK2v6mt1UqocOHVJycrIKCgp06NAhTZ482XPhhCQNGTKkRqDj1NTW58svv1wPP/ywiouL1bp1a82b\nN6/GPeHxy9TW47CwMM2bN08tWrRQmzZtNGfOHIWEhPi6VL9wLFwGDBig6dOnq7KyUh06dFBqaqrs\ndruvy/MLx3qcnp6uyspKtW7dWjabTeeee65SUlJ8XV6T0+wCHAAAf9DsfgcOAIA/IMABADAQAQ4A\ngIEIcAAaQ5N3AAAC7klEQVQADESAAwBgIAIcAAADEeBAMzBu3DitW7euxnOHDh1S3759VVhYWOt7\nEhMTlZmZ2RjlATgFBDjQDIwcOdJzo4hj3n//fV188cWKioryUVUATgcBDjQDw4YN08aNG+VyuTzP\nrVmzRiNHjtS//vUvJSQk6Pbbb9e4
ceO0Z8+eGu/97LPPNGbMGM/yjBkz9Morr0iS3n77bY0dO1Zj\nx47V5MmT5XK5VF1drRkzZighIUGjR4/WH//4x8b5I4FmhgAHmoFWrVrp17/+tefmJkVFRdq+fbsG\nDx6ssrIyPfXUU3rhhRd01VVXaeXKlSe932aznfRcQUGBnnvuOS1fvlwvvfSSLrvsMi1dulQ7duzQ\n5s2blZ6erlWrVqlHjx4qLy+3/G8EmptmdzMToLm6+eab9cQTT2js2LF68803NXz4cAUGBqpt27aa\nPn263G63nE6n525mP+fzzz9XcXGxxo8fL7fbrcOHD+vss89WdHS0IiIiNGHCBA0aNEjXXHMN87ED\nFiDAgWbiwgsvVGVlpXJycvT6669r8eLFqqqq0gMPPKDXX39dZ599tl566SVt27atxvtO3PuurKyU\nJAUFBenCCy/U0qVLT9rWypUr9dVXX2ndunUaOXKk0tPT1a5dO+v+OKAZ4hA60IyMHDlSS5YsUXBw\nsKKjo1VRUSG73a4OHTro0KFD+uCDDzwBfUxISIjnSvUDBw5oy5YtkqTevXtr69atcjqdkqR3331X\n69at07Zt27RmzRqdf/75mjhxonr16qXc3NxG/TuB5oA9cKAZGT58uBYuXOi5NWNYWJiuv/563Xzz\nzerYsaPuuusuTZ8+XWvXrvXseffo0UMxMTEaMWKEOnfurEsuuUSSFBkZqZkzZ2rChAkKDg5Wq1at\nNG/ePAUGBiotLU3/+Mc/FBQUpC5dunjeA8B7uJ0oAAAG4hA6AAAGIsABADAQAQ4AgIEIcAAADESA\nAwBgIAIcAAADEeAAABiIAAcAwED/D8UBSTL827U6AAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plot_distributions(first_pmf, other_pmf)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Even so, it is not easy to compare PMFs. One more alternative is the cumulative mass function (CMF), which shows, for each $t$, the total probability up to and including $t$." 
] }, { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAeEAAAFXCAYAAACV2fZmAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xl0XNWBLvrvDDWoSlWq0jzY8iDwPMQ2GBtDbGNhm4RM\nNPCcsUPcnTwawn3P6aTBCUPfZceEBLjd8e3VsALrdTcEBzo4I9hcMGDANsgMHoQHPMmyZM1jzXWG\n94fkko6qNFepqlTfby0t6exzStr2kfRp77MHQdd1HURERDThxGRXgIiIKFMxhImIiJKEIUxERJQk\nDGEiIqIkYQgTERElCUOYiIgoSUYUwqdPn8bNN9+M559/PurcgQMHcMcdd2DTpk34t3/7t7hXkIiI\naLIaNoT9fj+2bduGlStXxjy/fft27Ny5Ey+88ALee+89nD17Nu6VJCIimoyGDWGLxYLf/OY3KCws\njDpXW1sLl8uFoqIiCIKA1atX49ChQwmpKBER0WQzbAiLogiz2RzzXEtLC3JzcyPHubm5aGpqil/t\niIiIJrG4DsziCphEREQjJ4/nxYWFhWhubo4cNzY2xuy27k9RVMiyNJ4vS0REFJOmhuDvboCvuw6t\nDTXoar8EaK2QRCWp9Vq2/pcxy8cVwmVlZfB6vaivr0dhYSHeeustPP7440O+pr3dN54vOW4FBQ40\nN3cntQ40PN6n9MF7lR4m233SdR2a4kHI14CQvxFhfyOCvgYooTYI6OuVlYCUnow7bAhXV1fj0Ucf\nRX19PWRZxt69e3HTTTdhypQpqKysxMMPP4wtW7YAAG699VZMmzYt4ZUmIqLMoesawoEWhP2NCPkb\net83QlO8UdcKCfj64bCMUFiGopig6WZougWABYIoA5AgCAIgiP3e93wsRD4e/K8AYaK3Mkz2X2KT\n7a/ByYr3KX3wXqWHdLlPmhqItGxD/kaEfQ0IBZoAXR3X59V1IBQ2IRyWEQ7LULW+MIVghSBZIZuy\nIJlsMJntMFvtsNiyYbXZkWWzwGKVIYpjb1IXFDhilo+rO5qIiGi8NMUPb3s1vG2fIOSrH/fn83qt\n6OrOhsfngNlWhLzi6cgtKoLbbobFaoLZ0tt6TQEMYSIimnC6riHQdRbetiPwdZ4aU0tXVUV0ddvR\n3W1HV3c2urrtEM0FmDK9COVzc1EyNQeSlMIPhMEQJiKiCRT2N8PTdgS+tqNQFc+IXxcImtHV1Re2\nXd3Z8HqzYDLLmDLdjenzc1E+MxfZTmsCax9/DGEiIkqo0XY3d3tsUYEbCvUtGpVbYMfVC3pCt3hK\n6rd2h8IQJiKiuBttd7PXZ8WluiLU1RfBHzC2Zs0WCTNmuVFekYvyGenX2h0KQxiAoii4++7N6Oho\nx333bcGNN64Z9jXvvrsfK1ZcD1nmfyER0RWj6W5WVQl1lwtwqa4I7R1O9J9glFdoR/nMPJTPzEVR\nmTOtW7tDYYKgZw1sVVXw0kt/GvFrfve757Fs2bUMYSLKeKPtbg5rJThxIgf1l/OgqsYVFKddlYcb\nb74ajpzJ09odSkolSF1NO/a/9hk6WuO7qpYrz4bPr78aZdPcMc/v3PkE6uou4ec//2fMnj0XM2dW\n4IUX/guBQAD33vv/4JVX/oJTp05A0zR89at/A1EUUV19HD/+8f/Ak0/+b2zb9hBaW1sRDoexefMP\nsHz5irjWn4go1Yy2u1k2uyHZ5uPDKgtqzoWjzpstMm64+SrM
ml+UMtOHJkJKhfDbe06js90f98/b\n0erD23tO4xs/uC7m+Xvv/X/xs5/9E0pKSiM3//z5c3jhhZfh8/lw8OC7+N3v/gBFUbBnz19wyy1f\nxW9+8xQef/xfcf78WXR2dmLnzqfh9Xpw8OB7ca8/EVEq8XeeRlvtq1DDnUNeJ4hm2NzzYXcvwpnT\nEg7uOQclHB3A0ypysXrjbNgdlkRVOWWlVAinkquuuhqyLMPpdKK8fBoeeOAfsXbtOmzceGvvFTp0\nHZg2bTp8Ph+2bXsYN964BpWVG5JabyKiRNHUINrrXoO39eMhr7Nkz0B23mJk5cyBp1vF3j+fQv3F\njqjrzBYJq9ZdhdkLizOq9dtfSoXw6o2z8M5rn6E9zt3R7jwbblx/9aheI8umyMe//OW/4LPPTuG1\n1/Zgz55X8MQTv46cs1isePrp/w/Hjh3BK6/8BQcOvIMHHngobnUnIkoFQc9FtNb8EUqoPeZ52eyG\nPW8x7LmLIJtd0HUd1R/X4+CbZ6GEtajrp87MxZqNsybVSOexSKkQLpvmxqa/X57sahg0NFzGu+++\njdtv34Srr56Nv/u77wAABEGAoig4ffokLlw4j/Xrb8HcufNx773fT3KNiYjiR9dUdDa8ha7GAwAG\nbjUgwJ67GPa8z8FinxppzXZ3BvDmKydRVxPd+jWZe1q/cxZlbuu3v5QK4WQa7HshP78Ax44dxRtv\nvAaz2YJbb/0yAGDJkmW4556/w7/+67/jqaf+DX/848uQJAlf//q3J7DWRESJE/I3obXmDwj7G6LO\nyZY85E37Kiz2skiZrus4ceQyDuw7i3AoeqDWlOlurLlldsaMfB4J7qJEKYn3KX3wXqWH0dwnXdfR\n3XQIHZf3xRz1nF2wHK7SdRDFvsd2nq4A3nr1FGrPR3dXm8wSrr+pAnMXl2Rs65e7KBER0bCUUAda\na/6IoKcm6pxkciC3/MvIclZEynRdx8mjDTiw7wxCwejALpvmwppbZsPpykpovdMVQ5iIiKDrOrxt\nR9B+aQ90LRR13uZegNwpt0CU+8LU0x3E26+ewsVzbVHXyyYRK9dWYP6S0oxt/Y4EQ5iIKMOpYS/a\nav8Kf+fJqHOiZIV76hdhd8+PlOm6jlPHG/He65/FbP2Wlruw9gts/Y4EQ5iIKIP5O0+j9eKfoSne\nqHNWx0zkln8ZstlpKK965wI+PBDdXS2bRKxYMxMLlpax9TtCDGEiogw01MIbgiDDVXYzsvOviQrT\nk8caYgZwyZQcrP3iHOS42fodDYYwEVGGGWrhDbOtFHnTvgqTNT/qXP3FDrz96ilDmSyLuG71TCy8\nhq3fsZice0PFyZEjH6Ojo2ey+R13fBmBQCDJNSIiGjtdU9FR/wYaP/uPGAEsIKd4NYpm3RUzgDvb\nfdjz8nFoWt+sVkkS8KWvL8aia6cwgMeIITyEv/71T2hvvzLqj99gRJS+/N0NaDj9G3Q1voeBK1/J\nljwUzfoeckpWQxCkqNcG/GH89aVjCAYUQ/lNt85FcVlOIqs96aVUd3Sg+zzaal+FEmyJ6+eVLfnI\nnXoLrI4Zg16jKAoee2w76uvroCgKvve972P//rdw/vw5bNv2CwA6/vu/d+HQoQNQVRVPPLETFosF\njz22HZcv10NRFGze/AMsXXoNfvjDH2DmzAoIgoAvfvHLePzxX8BsNsNkMuN//s+fw27Pjuu/j4ho\nMFcW3qi9vA/6CBfe6E9VNezdXY3ONuMOd9feOB1XzS1MSJ0zSUqFcFvtX6EEo+ebjZcSbEFb7V9R\nOu/eQa95/fW9sFis2LnzabS0tOCHP/w+Zs2ajS1b/glFRcUAgIqKq/Gtb30X//zPP8OHH34Ar9eL\n/PwC3H//g+js7MB9992N//iPFwAAM2deha985Tb8r//1K9x22x1Yv/4WfPTRYbS2tjKEiWjCtNX+\nJebgq1gLbwyk6zr27z0d
tQPSrPlFWHb9tLjXNROlVAgn08mTJ7BkyTIAQH5+PsxmM7q6utB/Vc9F\nixb3ni+Ax+PB8ePHcOzYJzh69BPouo5wOARF6emumTevZ07djTeuxq9+tQO1tRexdm0lysv5jUtE\nEyPQfT5mAMdaeCOWT96vxcmjxnWji6fkYM0ts/kMOE5SKoRzp34RbZdehRKIc3e0NR+5U24Z8hpB\nEAyBGw6HYbUaFxmXpL7/Ll3XYTab8J3vfA/r1q2P/pq9WyEuW3Ytnnnmv/Duu+/g5z//Z9xzz/+I\nhD0RUaLouob2utcMZbEW3hjMuVPNOPTWOUOZ02XFxtvmQ5I5nCheUiqErY4ZKJ37D0n52nPnzsPH\nHx/GunU3o7GxAaIowuHIgapGP0O5Yt68Bdi//y2sW7ce7e1tePHFF/CDH9xjuOb3v38R119/A9av\n3whAx2efnWIIE1HCeduOIOxvNJQVVHwDFvuUYV/bdLkLb/z5hKHMbJHwhTsWIstmjms9M11KhXAy\nrVu3Hh9//CHuu+//hqIo+PGPf4qPPqrCz372T9ix41foPzr6Si/MTTfdjA8/rMLdd38PmqZj8+Yf\n9J7vu3bKlKl48MH7Ybdnw2Ix44EHHp7IfxYRZSBNDaKjfp+hLLd4yYgC2NMVwKu/Pw5F0SJloihg\nw9cWwJ1nj3tdMx23MqSUxPuUPnivUk9H/T50Nb4bORYEGQtu+Ak6PUO3u8IhBbuf+xitTcYlLFdv\nnIV5nytNSF0zxWBbGbJjn4hoElFCnehuOmQocxSuhDnLPeTrNE3H//njiagAXrx8KgM4gRjCREST\nSEf9G9D1vkU1RDkbzqJVw77u4JtnUXO21VA2/eo8rFgzM+51pD4MYSKiSSLovQRf+3FDmat0LURp\n6MFU1R/X4WjVJUNZflE2Kr80D6LIqUiJxBAmIpoEdF1H+6W9hjJTVjHsuYuHfF3t+Ta889pnhjJ7\nthm33L4QJnP0EpYUXwxhIqJJwNdRjZCvzlDmLrsZgjD4r/m2Fi9e+0M1+g/PlU0ibrl9IbIdlkRV\nlfphCBMRpTlNC6Oj7g1DWVbO7CHXy/d5Q3jlpWMIBY1rIVR+aR4KimOP5KX4YwgTEaW57qZDUMOd\nfQWCCFdp5aDXK4qKPS8fR3encXvWlWtnYsas6G0MKXEYwkREaUwNe3q3J+zjyL8WJmtezOt1Xceb\nr5xCY12XoXzu4hIsXj41YfWk2BjCRERprOPym9C1UORYlLKQU/z5Qa8//O4FnPm0yVBWNs2FG9df\nzU0ZkoAhTESUpkK+hqhdknJKVg+6O9Lp6kYcfq/GUObKzcKGr82HJDEOkoH/60REaUjX9ahdkmRL\nPrLzY28Qc/F8G9585aShzJol4wt3LILFakpYPWlo3MCBiCgN+btOI+i5YChzl1VCEKLn9nZ1+LH7\nvz6GpvbNRRIlARtvW4Ac99B7ClNisSVMRJRmdE1FR93/MZRZHTNhdV4ddW0wEMYrLx2DzxsylK+9\nZTZKproSWk8aHkOYiCjNdLcchhJs61ciwFV2c9TAKl3X8cafT6K91WcoX3b9NMxaUDwBNaXhMISJ\niNKIqvjR1fC2oSw7bwnMWUVR19acbY3alOGquQW49sbpiawijQJDmIgojXQ2vA1N7VtkQxDNyClZ\nE3Wdqmo4sO+soaywxIG1X5jDqUgphCFMRJQmwoEWeJoPG8pyim+EZMqOuvb4R3XobPNHjgUBWHPL\nbMgmbsqQShjCRERpoqPudQBa5Fgyu+AouC7qOr8vhMPvGucDL7muHHmF0WFNycUQJiJKA4Huc/B3\nnTaUuUrXQRCjZ5oefvcCQkElcmwyS1i7cU7C60ijxxAmIkpxuq6h/ZJxSpLFPhU217yoa9uavaj+\nuN5QtmzVNNi5NWFKYggTEaU4b+snCAcaDWWusvUxpyQd2HfGsD+w02XFomVTJqKaNAYMYS
KiFKap\nQXRcftNQZnMvhMVeFnXtxXNtqD3fbihbubYCksxf9amKd4aIKIV1Nb4LTfFGjgVBhqv0pqjrVFXD\ngTfOGMpKy13cHzjFMYSJiFKUEuxAV9MhQ5mjaCVkc07UtdUf16Oj35QkAFi1roJzglPciDZw2LFj\nB44cOQJBELB161YsXLgwcu7555/Hn//8Z0iShAULFuCBBx5IWGWJiDJJR/0bgK5GjiU5G87CVVHX\nBfxhHH73gqFs7uIS5Bc5El1FGqdhQ7iqqgo1NTXYtWsXzp49i5/+9KfYtWsXAMDj8eCZZ57BG2+8\nAUEQsHnzZhw9ehSLFi1KeMWJiCazoKcWvo5qQ1lO6U0QJXPUtYffvYBgwDglafnnZyS8jjR+w3ZH\nHzx4EJWVlQCAiooKdHV1wevteT5hNpthNpvh8XigKAoCgQBycqK7SYiIaOR69greaygzZRXDnrs4\n6tr2Fi+Of1RnKFt2/TTY7NFhTaln2BBuaWlBbm5u5NjtdqOlpQVATwjfc889qKysxLp167Bo0SJM\nmzYtcbUlIsoAvvbjCPmMc33dMaYkAcCBfWejpyRdwylJ6WJEz4T70/vdbY/Hg6eeegqvvfYa7HY7\nvvOd7+DUqVOYPXv2oK93u22Q5eSuXVpQwOck6YD3KX3wXsWPpoZw+YRxSpKrcD6mzlwYde1nJxpx\n8VyboWzDVxaguCR2jyTvU+oZNoQLCwsjLV8AaGpqQkFBAQDg3LlzmDp1aqQL+pprrkF1dfWQIdze\n7hv03EQoKHCgubk7qXWg4fE+pQ/eq/jqbNiPcKCjr0AQkZW/Nur/WFU17Nl93FBWOjUHecX2mPeD\n9ym5BvsDaNju6FWrVmHv3p5nE9XV1SgqKoLNZgMAlJWV4dy5cwiFQgCA48ePszuaiGiM1HA3uhrf\nM5Q58pfDZMmNuvbTT+rR3mps1Fy/7ipOSUozw7aElyxZgvnz52PTpk2QJAkPPfQQdu/eDYfDgcrK\nSmzevBnf/va3IcsylixZgmXLlk1EvYmIJp2O+jeha+HIsShlIaf481HXBfxhVL1zwVA2Z1ExCorZ\n3ZxuRvRMeMuWLYbj/t3Nd955J+6888741oqIKMMowXZ42z4xlOWUrIYoW6OuPfxe9JSk6zglKS1x\nxSwiohQQ9NYajmVLPrLzo3sW21u9qP7IOHJ66cpy2LK5S1I6YggTEaUAJdRhOM7KuQqCED2T5OC+\ns9C0vlkqjhwrFl3LKUnpiiFMRJQClKAxhGWzO+qai+faUHPWOCVp5dqZSZ/2SWPHECYiSgEDW8ID\nN2nQNA0H9hl3SSqZkoOZswsSXjdKHIYwEVEKiA5hY0v4008uo73FOCVpVSWnJKU7hjARUZLpugY1\n1GkokyyuyMfBQBhV75w3nJ+zkFOSJgOGMBFRkqmhLgB9g61E2Q5RNEWOD79Xg4C/b0qSbBKxfDWn\nJE0GDGEioiSL7oruawV3tPlw/EPjLklLV06DnVOSJgWGMBFRkg0VwgcGTklyWrCYU5ImDYYwEVGS\nRYVw7/Pg2vNtqDnTaji3Ym0FZBOnJE0WDGEioiQbOEdYMrt6pySdNZQXT3GiYg6nJE0mDGEioiRT\nQ+2GY9nswokjl9HW7DWUr+IuSZMOQ5iIKMmUAdOTNN2OD/ZfMJTNXlCEwhLnBNaKJgJDmIgoiXRN\ngRruMpQd+bATAX/floayScR1q2dOdNVoAjCEiYiSSBkQwIKUjWOHGwxlS1eUw+7glKTJiCFMRJRE\nStD4PNjjNRumJGU7LVi8fOpEV4smCEOYiCiJBi5X2d4mG45XrJnJKUmTGEOYiCiJlAEjo/1+a+Tj\nojInrppbONFVognEECYiSqKBI6N9/UKYU5ImP4YwEVESDXwm7Pf3DMCaMSsfRaWckjTZMYSJiJJo\nsJZwYQm3KcwEDGEioiTRtDA0xRM51nUgEOhpCee4s5
JVLZpADGEioiRRB2zc4A9YoOs9v5Zz3LZk\nVIkmGEOYiChJBm7c0H9kNFvCmYEhTESUJAO3MLzyPNiebYbJzLnBmYAhTESUJIOFcE4uu6IzBUOY\niChJBobwle5odkVnDoYwEVGSDFyy0ue70hJmCGcKhjARUZJELdTROz3JxZZwxmAIExElgaYGoan+\nvmNN6DdHmM+EMwVDmIgoCaKeBwcsAHrWiXa6rTFeQZMRQ5iIKAmiRkb3Pg/Odlogy5yelCkYwkRE\nSaAOXKgjwJHRmYghTESUBIO1hDlHOLMwhImIkmCwOcIcGZ1ZGMJEREkw6GpZDOGMwhAmIkqCQVfL\n4kIdGYUhTEQ0wTTFD10NRo5VVUQwZIIgAE4XQziTMISJiCZYdCu4Z45wttMKSeKv5UzCu01ENMEG\nex7sYld0xmEIExFNMGXgHGEOyspYDGEiogk2+MhozhHONAxhIqIJNmgIszs64zCEiYgmmDrY9CR2\nR2cchjAR0QTSdT1mS1gQAEcOd0/KNAxhIqIJpCk+6Fo4cqwoEsJhGU5XFqcnZSDecSKiCRTdCu6Z\nI8yu6MzEECYimkCDLlfJEM5IDGEiogk0cFAWR0ZnNoYwEdEEGnyhDs4RzkQMYSKiCcQlK6k/hjAR\n0QSK9UxYFAVkOy1JqhElkzySi3bs2IEjR45AEARs3boVCxcujJxraGjAli1boCgK5s2bh0ceeSRR\ndSUiSmuDzRF2uqwQRbaJMtGwd72qqgo1NTXYtWsXtm3bhu3btxvOP/roo9i8eTNefPFFSJKEhoaG\nhFWWiCidqYoH0NXIcTgsQ1FkPg/OYMOG8MGDB1FZWQkAqKioQFdXF7xeL4Cev+o+/PBD3HTTTQCA\nBx98EMXFxQmsLhFR+lKDHBlNRsOGcEtLC3JzcyPHbrcbLS0tAIC2tjbYbDZs374d3/jGN/DEE08k\nrqZERGku+nlwz3NgzhHOXKN+CKHruuHjpqYmfPe738Vzzz2HTz/9FG+//XZcK0hENFlwZDQNNOzA\nrMLCwkjLFwCamppQUFAAoKdVXFZWhilTpgAAVq5ciTNnzmD16tWDfj632wZZlsZb73EpKHAk9evT\nyPA+pQ/eq5HxNnkNx1fmCM+oKIArN/HPhXmfUs+wIbxq1Srs3LkTd955J6qrq1FUVASbreebRZIk\nTJkyBRcvXkR5eTmqq6tx6623Dvn52tt98an5GBUUONDc3J3UOtDweJ/SB+/VyHk6mwzHPr8VkiQg\npCgJ/z/kfUquwf4AGjaElyxZgvnz52PTpk2QJAkPPfQQdu/eDYfDgcrKSmzduhX3338/dF3HrFmz\nIoO0iIjISA11Go59fiuc7iwIgpCkGlGyjWie8JYtWwzHs2fPjnxcXl6O3/72t/GtFRHRJKPrGpQB\nIez3W5FfwufBmYyzw4mIJoAa7gagRY6DQRNUVeIc4QzHECYimgBKsN1w7A9wZDQxhImIJsTArmif\nj/sIE0OYiGhCKKGBLWEu1EEMYSKiCRE1MtpnhSyLsDu4e1ImYwgTEU2A6JYwpycRQ5iIaEIoAzdv\n8FnZFU0MYSKiRNM1tXeKUh9/wMqR0cQQJiJKNCXcCaBv85tAwAxNEzlHmBjCRESJpg6yexK7o4kh\nTESUYAOfB0f2EWZ3dMZjCBMRJVisfYRNZgk2uzlJNaJUwRAmIkqwgSHs91uR4+L0JGIIExElXKyW\nMLuiCWAIExElnBr1TJhzhKkHQ5iIKIE0LQxV8USOdb1n3WiGMAEMYSKihBq4ZnQgYIGui8jJ5Rxh\nYggTESVUrOfBAOcIUw+GMBFRAsUKYbNFQpbNlKQaUSphCBMRJVCshTpyuHsS9WIIExElUKwlK7lm\nNF3BECYiSqCYC3XweTD1YggTESUQF+qgoTCEiYgSRFND0BRf37EmIMA5wtQPQ5iIKEGiuqIDFgAC\nXJwjTL0YwkRECT
JwUJbfb4XFKsOaxelJ1IMhTESUIDGfB7MrmvphCBMRJUj0HGEOyiIjhjARUYJE\nt4QtnCNMBgxhIqIE4RxhGg5DmIgoQWI9E3axO5r6YQgTESWApgSgq4HIsaoKCAbNbAmTAUOYiCgB\noucIW2G1mWGxcnoS9WEIExElQMyuaLaCaQCGMBFRAnBQFo0EQ5iIKAFib9zA6UlkxBAmIkoAdeBC\nHT62hCkaQ5iIKAGiWsLcPYliYAgTEcWZruvRz4TZEqYYGMJERHGmqX7oWihyrCgiZLMdZoucxFpR\nKmIIExHFWeyR0RyURdEYwkREcTZwUJaPuyfRIBjCRERxxjnCNFIMYSKiOIs5R5jd0RQDQ5iIKM6U\nYLvh2M/dk2gQDGEiojiL1RJ2sjuaYmAIExHFUc8c4U5DmSA5YDJJSaoRpTKGMBFRHGmKF9CVyHE4\nLMHuzElijSiVMYSJiOJICRmfB/s4MpqGwBAmIoojJWjsivZzjjANgSFMRBRHsVrCLraEaRAMYSKi\nOBo4KItLVtJQRhTCO3bswKZNm/D1r38dx44di3nN448/jm9/+9txrRwRUboJB6Nbwk63NUm1oVQ3\nbAhXVVWhpqYGu3btwrZt27B9+/aoa86ePYvDhw9DEISEVJKIKF2EA8YQFiQHZJnTkyi2YUP44MGD\nqKysBABUVFSgq6sLXq/XcM2jjz6KLVu2JKaGRERpQtc1aEqXocyclZuk2lA6GDaEW1pakJvb903k\ndrvR0tISOd69ezeuu+46lJaWJqaGRERpQg13Q4AWOQ6FZDhcziTWiFLdqAdm6boe+bizsxMvv/wy\n7rrrLui6bjhHRJRpYi1XyZHRNBR5uAsKCwsNLd+mpiYUFBQAAA4dOoT29nZ885vfRDAYRG1tLR59\n9FHcf//9g34+t9uW9OcjBQWOpH59Ghnep/TBe9WjNRwwHPv9Vlw9Lzdl/n9SpR7UZ9gQXrVqFXbu\n3Ik777wT1dXVKCoqgs3WM9x+w4YN2LBhAwCgrq4ODzzwwJABDADt7b44VHvsCgocaG7uTmodaHi8\nT+mD96pPZ0uD4djnt0KQkBL/P7xPyTXYH0DDhvCSJUswf/58bNq0CZIk4aGHHsLu3bvhcDgiA7aI\niAgIDRgZ7Q9Y4XSxO5oGN2wIA4ga+Tx79uyoa8rKyvCf//mf8akVEVEaCvraDMe64IAkcU0kGhy/\nO4iI4kQNG1fLMlvdSaoJpQuGMBFRHOi6CmgeQ1mWnXOEaWgMYSKiOFBDXRCEvmmagaAZTjfnCNPQ\nGMJERHEwcI6w32/hFoY0LIYwEVEcxFqoI4cLddAwGMJERHEQGjAy2u+3wpHD3ZNoaAxhIqI48Htb\nDcecnkQjwe8QIqI4CAeN3dGy2ZWkmlA6YQgTEcWBrhq3MLRwC0MaAYYwEdE46ZoCEX3r4us6YHfl\nJbFGlC4YwkRE46SEOiEIfceBgBk5bu5YRMNjCBMRjVPMfYQ5R5hGgCFMRDROQZ9xZHQgkIVspyVJ\ntaF0whA9JWnIAAAX5klEQVQmIhonX5cxhFU9G6LIX680PH6XEBGNU9BvXKhDlHOSVBNKNwxhIqJx\n0gZuYZjFLQxpZBjCRETjJKDbcJzl4PQkGhmGMBHROGhaGLIU7HcMON35SawRpROGMBHROKgDlqsM\nBCxw5WYnqTaUbhjCRETj4Pe0GI85PYlGgSFMRDQOnk5jCCuaHUL/5bOIhsAQJiIah8CALQwF0Zmk\nmlA6YggTEY3DwCUrZQunJ9HIMYSJiMZDM25haLVzC0MaOYYwEdE4yKLXcJztKkhSTSgdMYSJiMZI\nVQKQ5XDfsSYgJ5dzhGnkGMJERGPk6zKOjA4ErMh2WpNUG0pHDGEiojHq7mgyHIcVG6cn0agwhImI\nxsjXbZyepMORpJpQumIIExGNUTjQbjiWzNzCkEaHIUxENEaaMnALQ+6eRKPDECYi
GiNR8BiO7Q6O\njKbRYQgTEY2BpmmwmHyGMmdeYZJqQ+mKIUxENAY+TzdkWY0cq6oIu9OVxBpROmIIExGNQVdro+E4\nFLJBFPkrlUaH3zFERGPgHbBQh6pnJ6kmlM4YwkREYxDwtRmOBZlbGNLoMYSJiMZAHbCFoYlbGNIY\nMISJiMZA0LsNx1kOzhGm0WMIExGNkq7rMMnGLQwdbk5PotFjCBMRjZKnO4gsa8BQZuNCHTQGDGEi\nolHqbG2FJGmRY1WVIclZSawRpSuGMBHRKHk6mg3HYdXOLQxpTBjCRESj0FDXiXMnzhoLRW5hSGPD\nECYiGqELn7Xgzy8cgd1m3MLQkpWbpBpRupOTXQEionTw6Sf12L/3NKaXX8KMafWGc+7C0iTVitId\nQ5iIaAi6rqPqnQv48EANppXXYd6cc4bzgmiBPXdhkmpH6Y4hTEQ0CFXVsH/PaZw81oDyKfVYMNf4\nLFgQZBRUbOLIaBozhjARUQzhkILX/vApLp5rw5SyBiycf8ZwvieAvw5r9rQk1ZAmA4YwEdEAPm8I\nr7x0DM0N3SgrbcSi+aeNFwgS8mf+X7A6ZiSngjRpMISJiPrpbPfhL787iq6OAEpLmrB4wSkYpgAL\nIgpm3IksZ0XS6kiTB0OYiKhXY30XXnnpGAL+MIqLmrF4wUljAENE/ow7kJVzdbKqSJMMQ5iICEDN\nmVa89sdqKGENRYUtWLLoJETDSgoC8mf8DWw5s5NVRZqERhTCO3bswJEjRyAIArZu3YqFC/uG4x86\ndAhPPvkkJEnCjBkzsH379oRVlogoET49Uo/9e05D14HCglYsXXwCoqj3u0JA3vTbYHPNTVodaXIa\ndsWsqqoq1NTUYNeuXdi2bVtUyD788MP49a9/jd/+9rfweDzYv39/wipLRBRPPXOAz+PtV3sCuCC/\nDUs/9+mAAAbypn0Fdvf8JNWSJrNhW8IHDx5EZWUlAKCiogJdXV3wer2w2+0AgJdffjnycW5uLjo6\nOhJYXSKi+NA0DW/vOY2TRxsAAHm57Vj2uWpIAwI4t/zLsOcuSkYVKQMM2xJuaWlBbm7fuqhutxst\nLS2R4ysB3NTUhAMHDmD16tUJqCYRUfyEQyr2/P54JIBz3R24dmk1JGlAAE+9Fdl5n0tGFSlDjHpg\nlq7rUWWtra24++678cgjjyAnJ2fI17vdNsiyNNovG1cFBdzxJB3wPqWPdLpX3u4gXnj+A9TX9vTa\nuV2duHbpccP+wAAwdc7XUFh+fTKqmDDpdJ8yxbAhXFhYaGj5NjU1oaCgIHLs8Xjw93//9/jRj36E\nlStXDvsF29t9Y6xqfBQUONDc3J3UOtDweJ/SRzrdq/5zgAHAldOF5cuOQ5aNAewq2wAha2Ha/LtG\nIp3u02Q02B9Aw3ZHr1q1Cnv37gUAVFdXo6ioCDabLXL+0UcfxV133YVVq1bFqapERPHXdLkLL//X\nx5EAzsnpwvJlxyDLquE6V2klnIXXJaOKlIGGbQkvWbIE8+fPx6ZNmyBJEh566CHs3r0bDocDN9xw\nA/70pz/h4sWLePHFFyEIAr70pS/hjjvumIi6ExGNSM3ZVrz2h545wADgdHTjumXHYTIZAzin5CY4\niyZXFzSlNkGP9ZA3gZLdHcIumfTA+5Q+UvleaZqGT96vxQf7z+PKbzqHw4MV1xyF2awYrs0pXo2c\nksk7sDSV71MmGKw7mitmEdGk1NHmw76/nERjfVekLDvbi+uuORYVwM6iG+As/vxEV5GIIUxEk4uu\n6zj2YR3ef+scFKVvwJXd7sOKa47CYg4brncUrkROyVoIxkWiiSYEQ5iIJo2uDj/efOUU6i8aFw2y\n2fxYee1RWCwDArjgOrhKKxnAlDQMYSJKe7qu4+TRBrz3xhmEQ8bBVna7D6tWHIdJDhnKs/Ovhats\nPQOYkoohTERpzdsdxFt7TuHi2baoc+VTW7Bg
3mkIMD4Dzs5bCveUjQxgSjqGMBGlJV3XceZEE955\n7TMEA8aQFUUVy6+9hDxXTdTr7Lmfg3vqFxnAlBIYwkSUdvy+EN557TOcPdkcdc7pDOD6FWcgCdEt\nY3veEuQygCmFMISJKK2c/6wFb796Cn5fOOrc3HkezCw/BugDzgkScqdshD1vKQOYUgpDmIjSQjCg\n4L03zuDUsYaoc7JJx+rVTbBKp4AByw/JllzkT78dZlvxBNWUaOQYwkSU8i5daMObr5yCpysYda5s\nqoCli09CCzdFnbO55iO3/FaIkmUiqkk0agxhIkpZ4ZCKQ2+dxfGP6qPOiaKAlTfqcNsOQgsbpx9B\nkOAu24Ds/GXsfqaUxhAmopR0+VIn9v3lRGTXo/5yC6y44YZGqL4j0I27EEI2u5E/43aYbSUTVFOi\nsWMIE1FKURQVVe9cwCfv18Y8v2ylC1OLPkDYF/1sOMs1F3nlX4IoWRNdTaK4YAgTUcpoqOvEW6+e\nQnuLL+qc02XFmptN0Lx7EfYPeDYsSHCX3Yzs/GvZ/UxphSFMREnX1eHH+2+fw5kT0fN+AWDB0iLM\nnX0BvraqqHOS2YX8GbfDYitNcC2J4o8hTERJEwwo+PhQDY5WXYKqRm9tnu20YM2GEpiU1+Frix6c\nlZUzB3nlX4Yos/uZ0hNDmIgmnKZpOHHkMj545wICMRbdAIDZC4uxbLmGrssvIqQOGJwliHCV3gxH\nwXJ2P1NaYwgT0YS6eK4VB/adjfncFwDyi7Kxcu102OWP0HnpUNR5yZyD/Om3w2IvS3BNiRKPIUxE\nE6K12YOD+86i9nx7zPP2bDOWr56JillWtF74Pbrb66KuycqZhdzyr0CSsxJdXaIJwRAmooTyeUOo\neuc8Thy5DD36sS9kk4jPXVeOxdcUI+g5jsZTb0JT/QOuEuEqWwdHwQp2P9OkwhAmooRQFBVHqy7h\no4MXEQ6pMa+ZvbAY16wqgu4/iqbP/huaEt1FLZmcyJ/xN7DYpya6ykQTjiFMRHF1ZZ/f9986h+4Y\naz0DQOnUHKxYnQeTfgydF16Grisxr7M6r0betK9Akm2JrDJR0jCEiShuGuo6ceCNs2is74p5Psed\nhZWftyHb8in8zacQO6IBCBJcJWvgKLye3c80qTGEiWjchltsw2KVsPIGAW5HNUK+S/BHLwcNABBE\nC7Lzl8FRsByy2ZnAGhOlBoYwEY1ZKKjg9b+cwPv7z8ZcbEOWNVx7XQAF7rNQw+0IxZ6VBMnkhKNw\nBbLzlnDbQcooDGEiGjVV7Vlso+rd2IttmE0hLF7SgaK8GuiaH2rs9ThgyiqGs3AlbO55EAQpwbUm\nSj0MYSIaMU3TcPp4Iw6/V4Puzug+ZZvNj3lzGlFUUAdAjdpm8AqrowLOopWwZM/gM1/KaAxhIhqW\nrus4d6oZH7xzAR2t0X3Kblcnrr6qHvl5zRg0UgURdvdCOApXwJxVlND6EqULhjARDUrXdVw824YP\n9p9HS5Nn4FkUFbaiYuYluHNij4YGAEGywJG3DNkcbEUUhSFMRDHV1bTjg/3n0VBnDFhR1FBW2oiK\n6Zdgtw9c2aqPZMqBo/A6DrYiGgJDmIgMGuu78MH+87h0wbjGsywrmDa1HjOm1cFiGWSkFa4Mtroe\nNvdcDrYiGgZDmIgAAK1NHnyw/zwunGk1lFssQcycVofyqZchy7GXnwQAq/MqOAtXwpI9nYOtiEaI\nIUyU4TrafKh69wLOfNpkKM+2+zBzei3KSpsgijF2XgAAQUReyRKYnNfCnFU4AbUlmlwYwkQZqrsz\ngMPvXcCpYw2G3Y3crk7MnHEJxYWtg75WEM3Izl8KR8EKlJSVobm5ewJqTDT5MISJMozPE8RHBy+i\n+pN6aJFVrnQUFrShYkYtct2Dj3QWZTscBdfBkb8MIvf0JRo3hjBRhgj4w/jk/Voc+/ASlHDPKhqC\noKG0pAkV
My7BkT3ImpIAZEsuHIUrkZ27GILIXxtE8cKfJqJJQtd1hEMqfN4QfJ6Q4b3XE8SFz1oQ\nCvYMrJIkBeVTGjBj2iVkZYUG/ZxmWymchdcjyzUHgiBO1D+FKGMwhIlSnKpq8PvC8PeGqc8bgt8T\ngrc3ZP3evsBVlEHWiexlNocwo7wO08ovw2SKvYcvcGVZyes50pkowRjCREmg63okWP2+Ky3WcE/A\nXgnV3rdYGySM8KvAZFJgMYdgsYRQUtyCKaUNkKRBRjpDgM09H87C62G2FY/1n0ZEo8AQJooTXdcR\n8Ifh7w3T/oHa9743aH0hw4jk0RBFFRZLCFZLCBZzGBZLT8heCdu+j8ODTy3qRxBk2POWwFm4ArLF\nPbZKEdGYMISJRkHTNHS2+dHa7EVrkwetzV54u4O9LdowNG20yapDFDWYTApMstLz3qTAbLoSruGo\ncDWZBl8wYzREKQvZBdfCkX8tJJM9Lp+TiEaHIUw0CL8vhNamvrBtbfKgvcUbtXm9IGiQZRVWa78g\nlRXIpr6Pr5TLJuM1JpMyotZqPEmmHDgLV8CetwSiZJ7Qr01ERgxhyniqqqGj1YfWJg9amrxoa/ag\ntckLn9c4algUVTiyfXA6PHA4vHA6vHBke2E2Dz7AKdkE0QLJlA3JZIdkciLLeTVs7nlc05koRTCE\nKWPoug6fJ4TW3pC98r6j1TegG1mHxRJCQb4XTocHTkfPe7vdj9QYKCz2Bms2RNke+ViSje9F2c6W\nLlGKYwhTylNVDeGQilBQQTisIhzqewuFVCghFaFQ77mgGrkmFLlOQTikIuBXEAoaW62ioCHb0Lrt\nCd0Jbd0KEkTJanyTs/pCtTdYxSvBK2Vx2hDRJMEQppSgKCqa6rtx+VInLtd2oKsjgIA/jHBIHcNg\np4F0SJIGWVaQn+eNtGydDi/sdh/EOKxBIUpWCJIVopQFUbJEgrTn46zokJWsEOXe14im8VeAiNIS\nQ5iSIhhQ0FDX2Ru6nWi63AVN1SEIOux2H+xZfthdGkRJgySpkEQNkqRBFHuPJa23TO0tu3K+/zkN\noqRCiuPAJ8nsgjmrCKasIpizimHOKoJkdrFlSkRjwhCmCeHzBCOBe7m2E63NHgA9XcE5Dg/mzvIg\nx9kNp8MLSRp61aeJIIgmmKyF/QK3570oWZJdNSKaRBjCFHe6rqOrI4DLtR2R4O3q8MKR7UOOsxvF\neR7MmuGB0+EZYvWmiSOZcqLCVra4uVYyESUcQ5jGTdd1tDV7e1q5lzrQcKkdotCOHGdP63bB7J5B\nT/HsFh4tQZAhiCbIFrchbM3WIoiyNWn1IqLMxhDOALquIxRUegc6aVBVDarS815RNGi97yPlim64\nJvq9DkVVoakKdC2MkL8NNmsHcnI8KHZ5cPUU77gWoBBlO+w5pVBVMwSxJzwFUYYgmPo+Fk0QRVNv\nuF65pvdN6H8s9xzzmS0RpSCGcJrRdR1KWO1Z/N8XRNAfRMAXQCAQRDAQRCgQQigYQjgQRCgURjgU\nQjgUhihqEAS9bzBTv4FNknhlAJQGqXdgkyhqkCUVFkmDaDaeu/IWD5KcDZOtBOYrb1klkEwOFBY6\n0dzcHZevQUSUqkYUwjt27MCRI0cgCAK2bt2KhQsXRs4dOHAATz75JCRJwuc//3n8wz/8Q8IqO1F0\nXYeuhaFpQehqEErYj1DQh3DQDyXkhxIOQFP8UJUANDUE6CHoug7gSuuv//sYH+vGMgE6dADCgGt1\nXYOuq4CuQoAGCD3vRVHvDdWeq60ArDKA7N63FCWZHJGgvRK6ksmR7GoRESXNsCFcVVWFmpoa7Nq1\nC2fPnsVPf/pT7Nq1K3J++/btePbZZ1FYWIhvfetb2LBhAyoqKgb9fAHPJWhaTxeoqqrQtZ4uTl3T\nIuWarkFT1d6wuhJI6PfxwEC7UtbvOCr0et7qTQJ83m5oahC6HgK0IIAQBI
QgCuGeN0mBKAzfnSr1\nvkHofaMIyeQ0tG57AjeF/0IgIkqCYUP44MGDqKysBABUVFSgq6sLXq8XdrsdtbW1cLlcKCoqAgCs\nXr0ahw4dGjKEmz57Nk5VNxqYgYNlohICzEBvetL4iIBggiBZYbUP7FLmrjxERMMZNoRbWlqwYMGC\nyLHb7UZLSwvsdjtaWlqQm5sbOZebm4va2trE1JSi6Dqg6yJ0iLjSLhcECYLY8yaKMkSp902UAUHs\nGcwUGeQkGwY6DSwTo8r6D3qSOYWHiGicRj0wSx9iJ/KhzqUbVRWgKDLCigxFkaBqJmiaCbpugg4z\ndJghiGZAtEAUzYj0SQsA9N5wEoTeFrkAHUK/Ebr93gv9XydA6Pd5TGYzLFYzLFYLLFk9b1k2K2Sz\nGSJDkIgo7Q0bwoWFhWhpaYkcNzU1oaCgIHKuubk5cq6xsRGFhYVDfr5l63851rpShiko4KCtdMF7\nlR54n1LPsE2pVatWYe/evQCA6upqFBUVwWazAQDKysrg9XpRX18PRVHw1ltv4YYbbkhsjYmIiCYJ\nQR9BH/ITTzyBDz74AJIk4aGHHsKnn34Kh8OByspKHD58GL/61a8AABs3bsR3v/vdRNeZiIhoUhhR\nCBMREVH8cWQPERFRkjCEiYiIkoQhTERElCSTfgOHxx57DB999BFUVcX3v/997Nu3D8ePH4fb7QYA\nbN68GatXr05yLTNbIBDA/fffj9bWVoRCIdx9992YM2cOfvzjH0PXdRQUFOCxxx6DyWRKdlUzWqz7\ntHfvXv48pbBgMIhbb70V99xzD1asWMGfqRQ0qQdmvf/++3j22Wfx1FNPoaOjA1/72tewYsUKbNy4\nkb8oUsgrr7yCy5cvY/Pmzaivr8ddd92FpUuXYs2aNdiwYQOefPJJlJSUYNOmTcmuakYb7D7x5yl1\nPfnkkzhw4AC++c1v4v3338fatWuxfv16/kylkEndHb18+XL8y7/8CwDA6XTC5/NB07RJtbLXZPCF\nL3wBmzdvBgDU19ejpKQEVVVVuOmmmwAAa9euxYEDB5JZRULs+wRMrpXyJpNz587h3LlzWL16NXRd\nR1VVFdauXQuAP1OpZFKHsCAIsFqtAICXXnoJa9asgSiKeO655/C3f/u3+NGPfoSOjo4k15Ku2LRp\nE37yk5/ggQcegN/vj3SV5eXlGVZmo+S6cp+2bt0KAHj++ef585SCfvGLX+D++++PHPNnKjVN+mfC\nAPD666/j5ZdfxjPPPIPjx4/D5XJhzpw5ePrpp/HrX/8aDz74YLKrSAB27dqFkydP4h//8R8NrSu2\ntFJL//u0detW/jyloD/84Q9YsmQJysrKYp7nz1TqmPQh/M477+Dpp5/GM888g+zsbKxYsSJybt26\ndXjkkUeSVzkC0LMcal5eHoqLizFnzhxomga73Y5QKASz2TyiNckp8QbeJ1VVMWvWrMhOavx5Sh1v\nv/02Ll26hDfffBONjY0wmUyw2Wz8mUpBk7o72uPx4Je//CX+/d//HQ5Hz8Ll9913X2S7xffffx+z\nZs1KZhUJQFVVFZ59tmef6ZaWFvh8PqxcuRJ79uwBAOzduxc33nhjMqtIiH2fHn74Yf48paAnn3wS\nL730En73u9/h9ttvxz333MOfqRQ1qUdHv/jii9i5cyemT58OXdchCAJuu+02PPfcc8jKyoLdbsfP\nf/5zw57INPGCwSC2bt2KhoYGBINB/PCHP8T8+fPxk5/8BKFQCKWlpdixYwckSUp2VTPawPt07733\nwmaz4bHHHuPPUwrbuXMnpkyZghtuuIE/UyloUocwERFRKpvU3dFERESpjCFMRESUJAxhIiKiJGEI\nExERJQlDmIiIKEkYwkREREnCECYiIkoShjAREVGS/P8gW3N8M4na4wAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, 
"metadata": {}, "output_type": "display_data" } ], "source": [ "first_pmf.plot_cumulative(linewidth=4, color=COLORS[3], label='firsts')\n", "other_pmf.plot_cumulative(linewidth=4, color=COLORS[4], label='others')\n", "plt.xlim(23.5, 44.5)\n", "plt.legend(loc='upper left')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The CMFs are similar up to week 38. After that, first babies are more likely to be born late." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: don't be afraid of thick lines. Differences that are only visible with thin lines are unlikely to be real.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Word Frequencies\n", "----------------\n", "\n", "Next topic: let's look at histograms of words, bigrams, and trigrams.\n", "\n", "The following function reads lines from a local file and splits them into words:" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def iterate_words(filename):\n", "    \"\"\"Read lines from a file and split them into words.\"\"\"\n", "    with open(filename) as fp:\n", "        for line in fp:\n", "            # split() with no arguments already drops surrounding whitespace\n", "            for word in line.split():\n", "                yield word" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's an example using a book from Project Gutenberg. 
`wc` is a histogram of word counts:" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# FAIRY TALES\n", "# By The Brothers Grimm\n", "# http://www.gutenberg.org/cache/epub/2591/pg2591.txt\n", "wc = Hist(iterate_words('pg2591.txt'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are the 20 most common words:" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[('the', 6507),\n", " ('and', 5250),\n", " ('to', 2707),\n", " ('a', 1932),\n", " ('he', 1817),\n", " ('of', 1450),\n", " ('was', 1337),\n", " ('in', 1080),\n", " ('she', 1049),\n", " ('that', 1021),\n", " ('his', 1014),\n", " ('you', 941),\n", " ('it', 881),\n", " ('her', 880),\n", " ('had', 827),\n", " ('I', 755),\n", " ('they', 751),\n", " ('for', 721),\n", " ('with', 720),\n", " ('as', 718)]" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "wc.most_common(20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Word frequencies in natural languages follow a predictable pattern called Zipf's law (which is an instance of Stigler's law of eponymy, which is also an instance of Stigler's law).\n", "\n", "We can see the pattern by lining up the words in descending order of frequency and plotting their ranks (1st, 2nd, 3rd, ...) 
versus counts (6507, 5250, 2707):" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAgcAAAFmCAYAAAD54TlZAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3X90VPWd//HXnR8ZdBLBCUk0xl9dduO3StyARyQpR8NB\n3LL+KD1hdSPx7MbVKjRql1+R4I+69hD5YcquZvsDuwtFFpEc3Wg5SU9Nj9smkeQkMavsoe1Kj1uJ\nTGZSAwkJhMnc7x/Ukaugd5CbITPPxx+ezM1M5n3fDuTF58e9hmmapgAAAP7ElegCAADAuYVwAAAA\nLAgHAADAgnAAAAAsCAcAAMCCcAAAACw8Tv7wXbt26T//8z9lGIZM09TevXu1e/durVixQqZpKisr\nS+vWrZPX61VDQ4O2bt0qt9utRYsWqbS0VJFIRFVVVert7ZXb7dbatWuVl5fnZMkAAKQ8Y7yuc9DR\n0aHGxkYNDw+rpKRE8+fPV21trS6++GLdcccdWrhwoerr6+XxeFRaWqoXX3xRzc3Neuedd/TYY4+p\npaVFu3btUm1t7XiUCwBAyhq3aYXnn39eS5YsUXt7u0pKSiRJJSUlam1tVU9PjwoKCuT3++Xz+TRj\nxgx1dnaqra1N8+bNkyQVFRWpq6trvMoFACBljUs4eOedd3TxxRcrMzNTIyMj8nq9kqTMzEz19fWp\nv79fgUAg9vxAIKBQKKRwOBw7bhiGXC6XIpHIeJQMAEDKGpdw8PLLL+ub3/zmZ46fbkbjdMej0ehZ\nrQsAAHzWuISD9vZ2FRYWSpL8fr9GR0clScFgUDk5OcrOzlYoFIo9/+Tj4XBYkmIjBh7P56+h5FYR\nAAB8OY7uVpCkvr4++f3+2C/12bNnq6mpSbfddpuampo0Z84cFRQUaM2aNRoaGpJhGOru7lZ1dbUG\nBwfV2Nio4uJiNTc3a9asWV/4foZhKBQadPq0JrysrAz6ZBO9soc+2UOf7KNX9mRlZZz1n+l4OAiF\nQsrMzIw9rqys1KpVq/TSSy8pNzdXCxculNvt1rJly1RRUSGXy6XKykqlp6drwYIFamlpUVlZmXw+\nn2pqapwuFwCAlDduWxnHE0nzi5HI7aNX9tAne+iTffTKHidGDrhCIgAAsCAcAAAAC8IBAACwIBwA\nAAALwgEAALAgHAAAAAvCAQAAsCAcAAAAC8IBAACwIBwAAAALwgEAALAgHAAAAIukCwedH36kgWPH\nE10GAAATluO3bB5vP+j+vVySbrw4oJLcC+VxJV3+AQDAUUn5mzMq6Zcf/lE/+0M40aUAADDhJGU4\n+NievkP6v6GRRJcBAMCEktThQJJ+d2g40SUAADChJH04ODhyLNElAAAwoSR9OBiLJroCAAAmlqQP\nBwAAID6EAwAAYJH04cCUmegSAACYUJI+HAAAgPgQDgAAgAXhAAAAWBAOAACABeEAAABYEA4AAIBF\n0ocDNjICABCfpA8HAAAgPoQDAABgQTgAAAAWhAMAAGBBOAAAABaEAwAAYOFx+g0aGhr0wgsvyOPx\n6KGHHlJ+fr5WrFgh0zSVlZWldevWyev1qqGhQVu3bpXb7daiRYtUWlqqSCSiqqoq9fb2yu12a+3a\ntcrLy4vr/U32MgIAEBdHRw4GBgb0/PPPa8eOHfrhD3+oN954Q5s2bVJ5ebm2bdumyy67TPX19RoZ\nGVFdXZ22bNmir
Vu3asuWLTp8+LBef/11TZ48Wdu3b9cDDzygjRs3OlkuAACQw+GgtbVVxcXFOu+8\n8zR16lQ99dRTam9vV0lJiSSppKREra2t6unpUUFBgfx+v3w+n2bMmKHOzk61tbVp3rx5kqSioiJ1\ndXU5WS4AAJDD0woHDhzQyMiIHnzwQQ0ODmrp0qU6evSovF6vJCkzM1N9fX3q7+9XIBCIvS4QCCgU\nCikcDseOG4Yhl8ulSCQij8fx2RAAAFKWo79lTdOMTS0cOHBA99xzj8yTFgGYp1kQcLrj0WjUkToB\nAMAnHA0HU6dOVWFhoVwuly699FL5/X55PB6Njo4qLS1NwWBQOTk5ys7OVigUir0uGAyqsLBQ2dnZ\nCofDys/PVyQSOVFwnKMG3jS3srIyzup5JQv6Yh+9soc+2UOf7KNXieFoOCguLtbq1at13333aWBg\nQMPDw/ra176mxsZG3X777WpqatKcOXNUUFCgNWvWaGhoSIZhqLu7W9XV1RocHFRjY6OKi4vV3Nys\nWbNmxV3D8dExhUKDDpzdxJaVlUFfbKJX9tAne+iTffTKHicClKPhICcnR7fccov+5m/+RoZh6PHH\nH9c111yjlStXaufOncrNzdXChQvldru1bNkyVVRUyOVyqbKyUunp6VqwYIFaWlpUVlYmn8+nmpqa\nuGtgJyMAAPExzNNN8E9Q9+227miYdsH5qsi/JEHVnLtI5PbRK3vokz30yT56ZY8TIwdcIREAAFik\nQDhIqoERAAAclwLhAAAAxINwAAAALAgHAADAIunDASsOAACIT9KHAwAAEB/CAQAAsCAcAAAAi6QP\nB8l1/UcAAJyX9OEAAADEh3AAAAAsCAcAAMCCcAAAACwIBwAAwIJwAAAALJI+HLCTEQCA+CR9OAAA\nAPEhHAAAAAvCAQAAsCAcAAAAC8IBAACwIBwAAACLpA8HbGUEACA+SR8OAABAfAgHAADAIvnDgcnE\nAgAA8Uj+cAAAAOJCOAAAABaEAwAAYJH04YAVBwAAxCfpwwEAAIgP4QAAAFgQDgAAgAXhAAAAWHic\n/OHt7e16+OGH9ed//ucyTVP5+fn6h3/4B61YsUKmaSorK0vr1q2T1+tVQ0ODtm7dKrfbrUWLFqm0\ntFSRSERVVVXq7e2V2+3W2rVrlZeX52TJAACkPEfDgSRdf/312rRpU+zxo48+qvLycs2fP1+1tbWq\nr6/XHXfcobq6OtXX18vj8ai0tFTz589Xc3OzJk+erA0bNqilpUUbN25UbW2t0yUDAJDSHJ9WMD91\n+eL29naVlJRIkkpKStTa2qqenh4VFBTI7/fL5/NpxowZ6uzsVFtbm+bNmydJKioqUldXV/zv/+VP\nAQCAlOL4yMF7772nJUuW6NChQ1q6dKmOHj0qr9crScrMzFRfX5/6+/sVCARirwkEAgqFQgqHw7Hj\nhmHI5XIpEonI43G8bAAAUpajv2Uvv/xyffvb39bXv/51/eEPf9A999yjSCQS+/6nRxW+6Hg0GnWk\nTgAA8AlHw0FOTo6+/vWvS5IuvfRSTZ06Ve+++65GR0eVlpamYDConJwcZWdnKxQKxV4XDAZVWFio\n7OxshcNh5efnx0JFvKMGXo9bWVkZZ++kkgh9sY9e2UOf7KFP9tGrxHA0HLz22msKhUKqqKhQKBRS\nf3+/vvnNb6qxsVG33367mpqaNGfOHBUUFGjNmjUaGhqSYRjq7u5WdXW1BgcH1djYqOLiYjU3N2vW\nrFlx13D8+JhCoUEHzm5iy8rKoC820St76JM99Mk+emWPEwHK0XAwd+5cLVu2TG+88YYikYi++93v\n6qqrrtKqVau0c+dO5ebmauHChXK73Vq2bJkqKirkcrlUWVmp9PR0LViwQC0tLSorK5PP51NNTY2T\n5QIAAEmGeboJ/gnqvt3WHQ2X+Sfpga9emqBqzl0kcvvolT30yR76ZB+9sseJkYO
kv0JiUiUfAADG\nQdKHAwAAEB/CAQAAsCAcAAAAC8IBAACwIBwAAAALwgEAALBI+nBgspkRAIC4JH04AAAA8SEcAAAA\nC8IBAACwIBwAAAALwgEAALAgHAAAAIukDwfJdUNqAACcl/ThAAAAxIdwAAAALAgHAADAgnAAAAAs\nCAcAAMCCcAAAACySPhywkxEAgPgkfTgAAADxIRwAAAALwgEAALAgHAAAAAvCAQAAsCAcAAAAi6QP\nB2xlBAAgPkkfDgAAQHwIBwAAwIJwAAAALJI/HJisOgAAIB7JHw4AAEBcCAcAAMDC8XBw7Ngx3Xzz\nzXr11Vd18OBBlZeXa/HixfrOd76j48ePS5IaGhpUWlqqO++8U7t27ZIkRSIRLV++XGVlZSovL9cH\nH3zgdKkAAEDjEA7q6uo0ZcoUSdKmTZtUXl6ubdu26bLLLlN9fb1GRkZUV1enLVu2aOvWrdqyZYsO\nHz6s119/XZMnT9b27dv1wAMPaOPGjWf0/qw4AAAgPo6Gg/3792v//v268cYbZZqmOjo6VFJSIkkq\nKSlRa2urenp6VFBQIL/fL5/PpxkzZqizs1NtbW2aN2+eJKmoqEhdXV1OlgoAAP7E0XDwzDPPqKqq\nKvZ4ZGREXq9XkpSZmam+vj719/crEAjEnhMIBBQKhRQOh2PHDcOQy+VSJBJxslwAACDJ49QPfvXV\nV1VYWKhLLrnklN83T7PF8HTHo9HoGdXh9riUlZVxRq9NdvTFPnplD32yhz7ZR68Sw7Fw8Oabb+qD\nDz7QL3/5SwWDQXm9Xp1//vkaHR1VWlqagsGgcnJylJ2drVAoFHtdMBhUYWGhsrOzFQ6HlZ+fHxsx\n8HjiL3csElUoNHjWzitZZGVl0Beb6JU99Mke+mQfvbLHiQDlWDiora2Nff3cc88pLy9PXV1damxs\n1O23366mpibNmTNHBQUFWrNmjYaGhmQYhrq7u1VdXa3BwUE1NjaquLhYzc3NmjVrllOlAgCAkzgW\nDk7loYce0sqVK7Vz507l5uZq4cKFcrvdWrZsmSoqKuRyuVRZWan09HQtWLBALS0tKisrk8/nU01N\nzXiWCgBAyjLM003yT1D37bbuasg5L00PX3N5gqo5dzFcZx+9soc+2UOf7KNX9jgxrcAVEgEAgAXh\nAAAAWCR9OEiqORMAAMZB0ocDAAAQH8IBAACwIBwAAAALW+Hg5PsjfOzee+8968U4gkUHAADE5XMv\ngtTQ0KAdO3bod7/7ne6+++7Y8ePHjyscDjteHAAAGH+fGw5uv/12zZo1S8uXL1dlZWXsuMvl0rRp\n0xwvDgAAjL8vvHxyTk6OfvrTn2pwcFADAwOx44ODg5oyZYqjxZ0NzCoAABAfW/dWePrpp1VfX69A\nIBC7pbJhGHrjjTccLQ4AAIw/W+Fgz549euutt+Tz+ZyuBwAAJJit3QqXX345wQAAgBRha+Tgoosu\n0t13362ZM2fK7XbHjj/88MOOFXb2sOoAAIB42AoHU6ZM0ezZs52uBQAAnANshYMlS5Y4XQcAADhH\n2AoHX/3qV2UYRuyxYRjKyMjQnj17HCvsbGFSAQCA+NgKB/v27Yt9PTo6qra2Nv3mN79xrCgAAJA4\ncd94KS0tTTfeeKNaWlqcqAcAACSYrZGDXbt2WR4fPHhQwWDQkYIAAEBi2QoHnZ2dlsfp6en6/ve/\n70hBAAAgsWyFg7Vr10qSBgYGZBiGJk+e7GhRAAAgcWyFg66uLq1cuVJHjhyRaZqaMmWK1q9fr+nT\npztdHwAAGGe2wsHGjRtVV1env/iLv5Ak/c///I++973v6cUXX3S0uLPBZC8jAABxsbVbweVyxYKB\ndOK6BydfRhkAACQP2+GgqalJQ0NDGhoa0u7duwkHAAAkKVvTCt/97nf1T//0T1qzZo1cLpeuuuoq\nPf30007XBgAAEsDWyEFLS4vS0tLU0dGhPXv
2KBqN6s0333S6NgAAkAC2wkFDQ4Oee+652OOf/OQn\neu211xwrCgAAJI6tcDA2NmZZY+ByxX3VZQAAMEHYWnMwd+5c3XXXXZo5c6ai0ajeeustzZ8/3+na\nAABAAtgKB0uWLNH111+v//7v/5ZhGHriiSf0l3/5l07XdlZwmQMAAOJjKxxI0nXXXafrrrvOyVoA\nAMA5gMUDAADAgnAAAAAsbE8rnImjR4+qqqpK/f39Gh0d1YMPPqirrrpKK1askGmaysrK0rp16+T1\netXQ0KCtW7fK7XZr0aJFKi0tVSQSUVVVlXp7e+V2u7V27Vrl5eU5WTIAACnP0XDQ3Nys6dOn6957\n71Vvb6/+/u//XjNmzNDixYt1yy23qLa2VvX19brjjjtUV1en+vp6eTwelZaWav78+WpubtbkyZO1\nYcMGtbS0aOPGjaqtrXWyZAAAUp6j0woLFizQvffeK0nq7e3VxRdfrI6ODs2dO1eSVFJSotbWVvX0\n9KigoEB+v18+n08zZsxQZ2en2traNG/ePElSUVGRurq6nCwXAADI4ZGDj911113q6+vTv/7rv6qi\nokJer1eSlJmZqb6+PvX39ysQCMSeHwgEFAqFFA6HY8cNw5DL5VIkEpHHY79sk82MAADEZVzCwY4d\nO7Rv3z4tX75cpvnJL+uTvz7Z6Y5Ho9G439vtcikrKyPu16UC+mIfvbKHPtlDn+yjV4nhaDjYu3ev\nMjMzddFFF+mqq65SNBqV3+/X6Oio0tLSFAwGlZOTo+zsbIVCodjrgsGgCgsLlZ2drXA4rPz8fEUi\nkRMFxzFqIElj0ahCocGzel7JICsrg77YRK/soU/20Cf76JU9TgQoR9ccdHR06Cc/+YkkKRwOa3h4\nWLNnz1ZjY6MkqampSXPmzFFBQYHeffddDQ0N6ciRI+ru7tbMmTNVXFwce25zc7NmzZoVfxHMKgAA\nEBdHRw7+9m//VqtXr9bdd9+tY8eO6cknn9TVV1+tlStXaufOncrNzdXChQvldru1bNkyVVRUyOVy\nqbKyUunp6VqwYIFaWlpUVlYmn8+nmpoaJ8sFAACSDPN0E/wT1H27rTsaLkzzaMW1VyaomnMXw3X2\n0St76JM99Mk+emXPhJtWAAAAE0/Sh4OkGhYBAGAcJH04AAAA8SEcAAAAC8IBAACwIBwAAAALwgEA\nALAgHAAAAIukDwdsZQQAID5JHw4AAEB8CAcAAMAi+cMB8woAAMQl+cMBAACIC+EAAABYEA4AAIBF\n0ocDlhwAABCfpA8HAAAgPoQDAABgQTgAAAAWKRAOWHUAAEA8UiAcAACAeBAOAACARdKHAyYVAACI\nT9KHAwAAEB/CAQAAsCAcAAAAC8IBAACwIBwAAAALwgEAALBI+nBgspcRAIC4JH04AAAA8SEcAAAA\nC8IBAACwSPpwwJIDAADi43H6DdatW6euri6NjY3p/vvv1/Tp07VixQqZpqmsrCytW7dOXq9XDQ0N\n2rp1q9xutxYtWqTS0lJFIhFVVVWpt7dXbrdba9euVV5entMlAwCQ0hwNB3v27NF7772nHTt2aGBg\nQAsXLtQNN9ygxYsX65ZbblFtba3q6+t1xx13qK6uTvX19fJ4PCotLdX8+fPV3NysyZMna8OGDWpp\nadHGjRtVW1vrZMkAAKQ8R6cVrr/+em3atEmSdMEFF2h4eFgdHR2aO3euJKmkpEStra3q6elRQUGB\n/H6/fD6fZsyYoc7OTrW1tWnevHmSpKKiInV1dTlZLgAAkMPhwDAMTZo0SZK0a9cu3XTTTRoZGZHX\n65UkZWZmqq+vT/39/QoEArHXBQIBhUIhhcPh2HHDMORyuRSJRJwsGQCAlDcuCxJ/8YtfqL6+Xo89\n9pjMk65KZJ7mCkWnOx6NRh2pDwAAfMLxBYm/+tWv9KMf/UgvvPCC0tPT5ff7NTo6qrS0NAWDQeXk\n5Cg7O1u
hUCj2mmAwqMLCQmVnZyscDis/Pz82YuDxxFeyy2UoKyvjrJ5TsqAv9tEre+iTPfTJPnqV\nGI6Gg6GhIa1fv17//u//royME/+DZ8+eraamJt12221qamrSnDlzVFBQoDVr1mhoaEiGYai7u1vV\n1dUaHBxUY2OjiouL1dzcrFmzZsVdw1jUVCg0eLZPbcLLysqgLzbRK3vokz30yT56ZY8TAcrRcLB7\n924NDAzokUcekWmaMgxDzzzzjKqrq/XSSy8pNzdXCxculNvt1rJly1RRUSGXy6XKykqlp6drwYIF\namlpUVlZmXw+n2pqapwsFwAASDLM003wT1D37bbuaDjf49aawq8kqJpzF4ncPnplD32yhz7ZR6/s\ncWLkIOmvkMg1EgEAiE8KhAMAABAPwgEAALAgHAAAAIukDwfJtdwSAADnJX04AAAA8SEcAAAAC8IB\nAACwIBwAAAALwgEAALAgHAAAAIukDwfsZAQAID5JHw4AAEB8CAcAAMCCcAAAACwIBwAAwIJwAAAA\nLAgHAADAIunDAVsZAQCIT9KHAwAAEB/CAQAAsCAcAAAAi+QPByw6AAAgLskfDgAAQFwIBwAAwCLp\nw4HJvAIAAHFJ+nAAAADiQzgAAAAWhAMAAGBBOAAAABaEAwAAYEE4AAAAFkkfDtjICABAfJI+HAAA\ngPgQDgAAgIXj4eC3v/2tbr75Zr344ouSpIMHD6q8vFyLFy/Wd77zHR0/flyS1NDQoNLSUt15553a\ntWuXJCkSiWj58uUqKytTeXm5PvjgA6fLBQAg5TkaDkZGRvT0009r9uzZsWObNm1SeXm5tm3bpssu\nu0z19fUaGRlRXV2dtmzZoq1bt2rLli06fPiwXn/9dU2ePFnbt2/XAw88oI0bN8Zdg8miAwAA4uJo\nOPD5fNq8ebOys7Njx9rb21VSUiJJKikpUWtrq3p6elRQUCC/3y+fz6cZM2aos7NTbW1tmjdvniSp\nqKhIXV1dcdcQMU2ZJAQAAGxzNBy4XC6lpaVZjo2MjMjr9UqSMjMz1dfXp/7+fgUCgdhzAoGAQqGQ\nwuFw7LhhGHK5XIpEInHXMUY4AADAtoQuSDzdv+hPdzwajZ7R+0SihAMAAOzyjPcb+v1+jY6OKi0t\nTcFgUDk5OcrOzlYoFIo9JxgMqrCwUNnZ2QqHw8rPz4+NGHg88Zc8OeDXBT7vWTuHZJGVlZHoEiYM\nemUPfbKHPtlHrxJj3MPB7Nmz1dTUpNtuu01NTU2aM2eOCgoKtGbNGg0NDckwDHV3d6u6ulqDg4Nq\nbGxUcXGxmpubNWvWrDN6z4OhQR0jHFhkZWUoFBpMdBkTAr2yhz7ZQ5/so1f2OBGgHA0He/fuVU1N\njXp7e+XxeNTU1KQNGzaoqqpKL730knJzc7Vw4UK53W4tW7ZMFRUVcrlcqqysVHp6uhYsWKCWlhaV\nlZXJ5/OppqbmjOpgWgEAAPsMM8mW8t+3+7M7GiqvvkwXn+9LQDXnLhK5ffTKHvpkD32yj17Z48TI\nQUpcIZGRAwAA7EuJcHD8DHc5AACQilIiHESSa+YEAABHpUY4YFoBAADbCAcAAMAiNcIB0woAANiW\nEuHgOCMHAADYlhLhIMJuBQAAbEuNcMC0AgAAtqVEOGBaAQAA+1IiHLBbAQAA+1IjHDCtAACAbakR\nDliQCACAbSkSDhg5AADArpQIB8eZVgAAwLaUCAeMHAAAYF9KhIOjY6w5AADArtQIB5GxRJcAAMCE\nkRLhYJiRAwAAbEuJcDDCyAEAALalRDgYjZosSgQAwKaUCAeSNDLG6AEAAHakTjiIsO4AAAA7Uigc\nMHIAAIAdqRMOmFYAAMCWlAkHw0wrAABgS8qEgyNMKwAAYEvKhIPW4ECiSwAAYEJImXDgMhJdAQAA\nE0PKhIOPjkU0eDyS6DIAADjnpUw4kKSG9/sSXQIAAOe8lAoHez86op7+w
USXAQDAOS2lwoEkvbT/\noJ7f+38KjhxLdCkAAJyTPIkuIBEODB/Tpnf/T16XoUv9k3Shz6uAz6vJaR6d53Ypw+uRYUg+t0t+\nj1uGpDS3Sy6DVY0AgOSXkuHgY8ejpvYPjkiDI1/4XLdhyOMy5JKU5nLpfI9LUUku48Qxl2Eo3evW\nmGnKkGQYhgyd2CVhyJBhSIYkj8vQeW73ibtE/unYicjxyXM+jiAf/4wTX39y3G0YchvGSTswjE/+\na5x85LNff/xz04+MaGjo2Cm+d6rXWev4vJ/rNiTXn87FaWf+FvG9cvJYRIcOHT2jcxrPOHnG73WW\nzqs3GtXhw8Nn/c3GtYfj8GZBmTp0aHicPxtn+G4J/sz3u6SBz/lMjVcPDcPQ1EleZXhT51fmOX+m\na9euVU9PjwzD0OrVqzV9+vSE1DFmmhobO3Hb55GxqA4dT0gZAIAEuTDNo1svz9L/m5Ke6FIcd06v\nOejo6ND777+vHTt26Omnn9b3vve9RJcEAEhRH41G9NPffahfH/wo0aU47pwOB21tbZo3b54k6c/+\n7M90+PBhHTly5HNfk+Y+p08JADDBNX3Qr76R0USX4ahz+jdpOBxWIBCIPb7wwgsVDoc/9zVPfO3/\nOV0WACCFjZmm3vljcm+LP6fDwaeZpvmFz8n2+/Stq/LGoRoAQKr6cDi5t8Of0wsSs7OzLSMFfX19\nysrK+sLXXfeVbF33lWwnSwMAIGmd0yMHxcXFampqkiTt3btXOTk5Ov/88xNcFQAAye2cHjkoLCzU\n1Vdfrbvuuktut1uPP/54oksCACDpGaadiXwAAJAyzulpBQAAMP4IBwAAwIJwAAAALM7pBYnxOFfu\nwZBo69atU1dXl8bGxnT//fdr+vTpWrFihUzTVFZWltatWyev16uGhgZt3bpVbrdbixYtUmlpqSKR\niKqqqtTb2yu32621a9cqLy+5rxlx7Ngx3XrrrVq6dKluuOEGenUKDQ0NeuGFF+TxePTQQw8pPz+f\nPn3K8PCwVq1apUOHDun48eNaunSppk2bRp9O8tvf/lZLly7V3/3d3+nuu+/WwYMHv3R/9u3bpyef\nfFIul0v5+fl64oknEn2aZ8Wne/Xhhx9q9erVikQi8nq9Wr9+vTIzM53tlZkE2tvbzW9961umaZrm\n//7v/5p33nlngitKjLfeesu8//77TdM0zY8++si86aabzKqqKrOxsdE0TdN89tlnzf/4j/8wh4eH\nzVtuucUcGhoyjx49at56663moUOHzFdeecV86qmnTNM0zV//+tfmI488krBzGS/PPvusWVpaar7y\nyitmVVWV2dTUFDtOr058jubPn28ODw+boVDIfOyxx+jTKWzbts189tlnTdM0zWAwaP7VX/0Vf/ZO\nMjw8bJaXl5uPPfaYuW3bNtM0zbPyOSovLzffffdd0zRN8x//8R/N//qv/0rA2Z1dp+rVqlWrYp+l\nbdu2mevXr3e8V0kxrXAm92BIRtdff702bdokSbrgggs0PDysjo4OzZ07V5JUUlKi1tZW9fT0qKCg\nQH6/Xz5POjxZAAAGGklEQVSfTzNmzFBnZ6elj0VFRerq6krYuYyH/fv3a//+/brxxhtlmqY6OjpU\nUlIiiV59rLW1VcXFxTrvvPM0depUPfXUU2pvb6dPn3LhhRfqo49O3Izn0KFDCgQC/Nk7ic/n0+bN\nm5Wd/cnF6b7M56i7u1vHjx/XBx98oKuvvlqSNHfuXLW2to7/yZ1lp+rVk08+qfnz50uSAoGABgYG\nHO9VUoSDM7kHQzIyDEOTJk2SJO3atUs33XSTRkZG5PV6JUmZmZnq6+tTf3+/pV+BQEChUMjSR8Mw\n5HK5FIlExv9Exskzzzyjqqqq2GN69VkHDhzQyMiIHnzwQS1evFhtbW06evQoffqUBQsWqLe3V/Pn\nz1d5eblWrlzJ5+kkLpdLaWlplmNfp
j+GYSgcDmvKlCmfee5Ed6peTZo0SYZhKBqNavv27br11ls/\n83vvbPcqadYcnMxM8Us3/OIXv1B9fb1eeOGFWNqUTt+X0x2PRqOO1HcuePXVV1VYWKhLLrnklN+n\nVyeYpqmBgQE9//zzOnDggO655x5LD+jTCQ0NDcrNzdXmzZv1m9/8Ro8++qjl+/Tp88XbH9M0ZRhG\nSv1dH41GtWLFCs2ePVs33HCDXn/9dcv3z3avkmLk4EzvwZCMfvWrX+lHP/qRNm/erPT0dPn9fo2O\nnri1aDAYVE5OjrKzsy2p8eTjH/fx43+1eDxJmR/15ptv6o033tCdd96pXbt2qa6uTueffz69+pSp\nU6eqsLBQLpdLl156qfx+P5+pU+jq6tKcOXMkSfn5+QqFQjrvvPPo0+f4Mp8j80+LGAcGBizPPXko\nPtk8+uijuvLKK7VkyRJJcrxXSREOuAfDCUNDQ1q/fr1+8IMfKCMjQ5I0e/bsWG+ampo0Z84cFRQU\n6N1339XQ0JCOHDmi7u5uzZw5U8XFxWpsbJQkNTc3a9asWQk7F6fV1tbq5Zdf1ksvvaTS0lItXbpU\ns2fPjp0/vTqhuLhYe/bskWma+uijjzQ8PEyfTuHyyy/X22+/LenEVIzf71dRURF9+hxf9u8mt9ut\nr3zlK7H1GT//+c9jAS3ZNDQ0KC0tTd/+9rdjx6699lpHe5U0l09+9tln1d7eHrsHQ35+fqJLGnc7\nd+7Uc889pyuuuCI2lPTMM8+ourpao6Ojys3N1dq1a+V2u/Xzn/9cmzdvlsvlUnl5uf76r/9a0WhU\n1dXVev/99+Xz+VRTU6OcnJxEn5bjnnvuOeXl5elrX/uaVq5cSa8+ZefOnXr55ZdlGIaWLFmia665\nhj59yvDwsFavXq3+/n6NjY3pkUce0ZVXXqlVq1bRJ534R1tNTY16e3vl8XiUk5OjDRs2qKqq6kv1\n57333tPjjz8u0zR17bXXatWqVYk+1S/tVL364x//qLS0NPn9fhmGoWnTpunxxx93tFdJEw4AAMDZ\nkRTTCgAA4OwhHAAAAAvCAQAAsCAcAAAAC8IBAACwIBwAAAALwgEAx7S3t6usrCzRZQCIE+EAgKMM\nw0h0CQDilNwX7wbwpbW3t6uurk6TJk3SzJkz9dZbb2lsbEyDg4MqLy/XN77xDb3yyitqbW1VNBrV\n73//e+Xl5emf//mfLT9n3759WrFihTZv3pxUV/8DkhHhAMAX2rt3r5qbm3XgwAFNmzZNJSUlCoVC\nuu222/SNb3xDkvT222/rZz/7mdLS0nTzzTdr3759sdcHg0FVVVXpX/7lXwgGwARAOADwha688kpl\nZGQoKytLP/7xj/XjH/9Ybrdbhw4dij2noKAgdh/6iy66SAMDA3K5XBoaGtJ9992nRx55RFdccUWC\nzgBAPFhzAOALeb1eSdL3v/99XXHFFdq+fbt++MMfWp7jdrstjz++bcuBAwdUXFysf/u3fxufYgF8\naYQDALaFw2FNmzZNkvTaa6/J5XJpdHT0c1+Tn5+vVatW6aKLLlJdXd14lAngSyIcALBt8eLF2rRp\nk+69915lZGTohhtu0PLlyz+zI+FUOxSeeOIJNTQ06O233x6vcgGcIW7ZDAAALBg5AAAAFoQDAABg\nQTgAAAAWhAMAAGBBOAAAABaEAwAAYEE4AAAAFoQDAABg8f8B3rgqpCgMim0AAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "ranks, counts = wc.ranks()\n", "plt.plot(ranks, counts, linewidth=10, color=COLORS[5])\n", "plt.xlabel('rank')\n", "plt.ylabel('count')" ] }, { 
"cell_type": "markdown", "metadata": {}, "source": [ "Huh. Maybe that's not so clear after all. The problem is that the counts drop off very quickly. If we use the highest count to scale the figure, most of the other counts are indistinguishable from zero.\n", "\n", "Also, there are more than 10,000 words, but most of them appear only a few times, so we are wasting most of the space in the figure in a regime where nothing is happening.\n", "\n", "This kind of thing happens a lot. A common way to deal with it is to compute the log of the quantities or to plot them on a log scale:" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "collapsed": false }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAfwAAAFpCAYAAAB5+ZrjAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xt81OWd9//3d47J5BwySYCEAOFkpB6Ku0rR21bBHmyt\ne+8uhraudt2e7N2tra61xaKt6UJ/vWtby9IqrNp1Vephu6W37dK79a51WxAQEQjnUyDnDDlMMplk\nMoffH0pgMkNIQub8ej4ePh7Odc1MPlwMec/3+72+12WEQqGQAABAWjMlugAAABB7BD4AABmAwAcA\nIAMQ+AAAZAACHwCADEDgAwCQAQh8AAAyQMwD/9ChQ1q2bJmeffbZ4bbVq1ertrZWK1as0J49e4bb\nOzo6dO211yoYDMa6LAAAMoollm/u9XpVV1enxYsXD7dt375dDQ0N2rhxo44ePaqVK1dq48aNkqSn\nn35aV199dSxLAgAgI8X0CN9ut2vDhg0qLS0dbtuyZYuWLl0qSaqurpbb7ZbH49GmTZt00003yWaz\nxbIkAAAyUkwD32QyRQS4y+VScXHx8OPi4mK5XC7t3r1br7/+uvbv369XXnkllmUBAJBxYnpKfyzO\nXK9/8MEHJUlNTU26+eabE1kSAABpJ+6BX1paKpfLNfy4vb1dTqdz+PHq1avH9D6hUEiGYUx6fQAA\npKO4B/6SJUu0du1aLV++XPX19SorK5PD4Rj3+xiGoY6O3hhUiHM5nXmMc4wxxrHHGMceYxwfTmfe\nhF8b08Cvr6/XmjVr1NzcLIvFos2bN2vt2rWqqalRbW2tzGazVq1aFcsSAACAJCMUCoUSXcRE8W0y\n9vjWHnuMcewxxrHHGMfHxRzhs9IeAAAZgMAHACADEPgAAGQAAh8AgAxA4AMAkAEIfAAAMgCBDwBA\nBiDwAQDIAAQ+AAAZgMAHACADEPgAAGQAAh8AgAxA4AMAkAEIfAAAMgCBDwBABkjZwF/5h3q92typ\nQCiU6FIAAEh6KRv47f2D+l3Taf3sUJP6/YFElwMAQFJL2cA/44jbq5/sO6V2ry/RpQAAkLRSPvAl\n6fTgkH6y75QOdHsSXQoAAEkpZQO/uign7PFgMKhnDjfrjy2dCnFdHwCAMCkb+Pf+5VwtKskPawtJ\n+q/G03rxWJuGgsHEFAYAQBJK2cC3mk36nzNL9dEZThkj+nZ19mr9gUa5ff6E1AYAQLJJ2cCXJMMw\n9L6yQt05b5qyzOF/lEbPoP5l30md6htIUHUAACS
PlA78M+YW5Ojumko5s6xh7b1DAa0/0Ki3XO4E\nVQYAQHJIi8CXpJIsm75wSaXmFzjC2v2hkF483qb/OuVSkMl8AIAMlTaBL0lZFrNunztN/6O8KKLv\nj61deuZwswYCLNIDAMg8aRX4kmQyDH2oskR/O6tMFiN8Ot/Bnn79ZF+jTg+wSA8AILOkXeCfcWVJ\nvj6zoEJ5VnNYe8eAT+v2ndIRd3+CKgMAIP7SNvAlqTI3S3fXzNB0hz2s3RsI6umDTfpzWzeL9AAA\nMkJaB74kFdgs+uwlFbq8OC+sPSjp/5zs0C9OtMsfJPQBAOkt7QNfkqwmk5bPLtMHK6ZELNKzw+XW\nvx5sVN8Qi/QAANJXRgS+9M4iPddPLdbtc6fKbgr/Yzf0DWjdvlNq7h9MUHUAAMRWxgT+GQsKc/X5\nmgoV28MX6en2+fX4/lPa29mboMoAAIidjAt8SSrLtuvumkrNzssOax8KhvTc0Vb9ruk0i/QAANJK\nRga+JDksZn163nQtLi2I6Hu1uVPPH22VL8COewCA9JCxgS9JZpOhj1WV6q9mlso8YjZffVefHt9/\nSl2DQ4kpDgCASZTRgX/GXzgL9PfzK+SwhC/S0+J9Z5Ge473eBFUGAMDkIPDfNSsvW1+sqVR5ti2s\n3eMP6MmDjdraziI9AIDUReCfo8hu1ecuqdSlRTlh7YGQtKmhQ88eaVG/n813AACph8AfwW42aUX1\nVN0wrTiib1+3R4/tbWAdfgBAyiHwozAZhpZOn6IV1eWymcJn87mHAnrqYJN+c8rFkrwAgJRB4I/i\nPcV5+tKlM1SRE775TkjS661d+un+U+rwstUuACD5EfgXMCXLps8tqNT7pxZFrMPf3D+otftOantH\nDxP6AABJjcAfA7PJ0E0VJbpr/nQVWC1hfUPBkH5xop0JfQCApEbgj8PsfIf+ceEMLSzKjeh7Z0Lf\nSR1lQh8AIAkR+OOUbTFrRXW5/npmaZQJfX49yYQ+AEASIvAnwDAMLXIW6H+NMqHvcSb0AQCSCIF/\nEUrendB3fZQJfU1M6AMAJBHLhZ+C0ZhNhj5YUaK5+Q69eKxNPUP+4b4zE/r2d3u0sChXUx12ObNs\nsphGfj0AACC2CPxJMjvfoS8tnKH/PNGuvV19YX0Huj060O2RJJkMqTTLpnKHXVOz7Sp32DTVYVeu\nlb8KAEDskDKTyPHuhL6dLrd+dbJDvigT94IhqdXrU6vXp13qHW7Ps5pVnm3XjNwsXVNaqByrOeK1\nAABMFNfwJ9m5E/qmO+wXfsG7eocCOuzu1++bO/Xj+gY19w/GsEoAQKYh8GOkJMumz9dU6u/mTtP7\npxZpQUGOCmxjO6HiHgroif2ndKjHE+MqAQCZglP6MWQ2DC0ozNGCwrPb7fb7A2rtH1RL/6BavT61\n9A+q3euTf8RMfl8wpH871KyPzyzVXzgL4l06ACDNEPhx5rCYNTvfodn5juG2QCgk14BPb7T3aGt7\nz3B7UNIvTrSrc3BIy6ZPkclgdj8AYGI4pZ8EzIahsmy7bqkq1UdnOCPu6X+tpUsvHGvVUDCYkPoA\nAKmPwE8y7ysr1CfnTJV1xL36uzv79OTBJjboAQBMCIGfhGqKcvUP8yuUYwm/Na+hb0A/3X9KpwdY\nshcAMD4EfpKqzM3SF2oq5cyyhrW7Boa0bt8pbT7lkovgBwCMEYGfxIrtVn3+kkrNyssOa/cGgnqt\ntUuP7mnQ4/tP6c2OHg0GuL4PADg/Aj/JZVvM+vS8abqiOC9qf0PfgF4+0a7Vu47p5eNtavQMxLlC\nAEAqIPBTgMVk0t/OLtMHK6bIdp6Nd3zBkN50ufWTfaf0p9auOFcIAEh23IefIgzD0PVTi3VNaaH2\ndvXpzY4eneiLPJoPSfr1KZeK7FbVFOXGv1AAQFIi8FOM3WzSopJ8LSrJl2vAp50ut3a63HIPnb1d\nLyTp58da9bk
FFZqWk5W4YgEASYNT+imsJMummypK9E+Xz9Jts8vD/jKHgiE9c7hFbp8/YfUBAJIH\ngZ8GzIahy6fk6WNVzrD2niG//v1Is3zM4AeAjEfgp5GrSwv1vrLCsLZGz6BeOt6m4IjNeQAAmYXA\nTzMfqSzRvAJHWNverj79ruk0oQ8AGSzmgX/o0CEtW7ZMzz777HDb6tWrVVtbqxUrVmjv3r2SpJ07\nd+r+++/XV7/6VdXX18e6rLRlMgzVVperNNsW1v6Hli49svOYnth/Sr9q6NCbLrda+gcVCPIlAAAy\nQUxn6Xu9XtXV1Wnx4sXDbdu3b1dDQ4M2btyoo0ePauXKldq4caPy8vJUV1enAwcOaNu2bbr00ktj\nWVpayzKb9Xdzp2ndvlNhm+0MBoM60TcQdjuf2TBUnm3TVIdd03LsyrNaZDeblGUyKctiUglnBQAg\nLcQ08O12uzZs2KAnnnhiuG3Lli1aunSpJKm6ulput1sej0dz587Va6+9pqeeekqPPPJILMvKCMV2\nq26fM1UbDjYpMEpoB0IhNfUPqql/UHJF9pccadFHKkq0oDAnhtUCAGItpqf0TSaTbLbwU8sul0vF\nxcXDj4uLi+VyubR7925df/31+sEPfqCnn346lmVljKq8bH3+kgrNK3Aoyzyxv2qX16d/O9ysF4+1\nsjUvAKSwhC+8Ewy+c8tYT0+PVq1aJa/Xq1tuuWVMr3U6o68vj7OczjxdMdOpUCgkl9enU+5+nXR7\ndbKnX6fcXnUPDo3pfd463atjfV59auEMXTHiTgBcPD7LsccYxx5jnNziHvilpaVyuc6eO25vb5fT\n6VRVVZWuu+66cb1XR0fvZJeX9irNFlUW5WlJ0Tv/MPuG/GruH1RL/6DavT4NBIIaCAQ1GAiqtX9Q\n597B3zPo17+8eUxXFOfpo1VOOSzmxPwh0ozTmcdnOcYY49hjjOPjYr5UxT3wlyxZorVr12r58uWq\nr69XWVmZHA7HhV+ImMi1WjSvwKJ5BZHX6Js8A/rPkx1qGrFm/67OXh1x9+t9ZYWymgyZTYZm5GRp\nqsMuw4i+uQ8AILFiGvj19fVas2aNmpubZbFYtHnzZq1du1Y1NTWqra2V2WzWqlWrYlkCLsL0nCw9\neO0CvbD7pF5r6dS5d/D1+QP6bdPpsOdfVpyrj1eVKpsjfwBIOkYolLr3XXH6KPbOnKZr9gzo5eNt\navH6Rn1+oc2i22aXqyovO04Vpj5OhcYeYxx7jHF8XMwpfVbaw5hMy8nSF2pm6MZpxTKNcta+2+fX\n+gON2tvJP3wASCYJn6WP1GExGbpx+hRdVpynvV196vcHFAyFdLzXq9ZzjvyDkl441qY8q4UjfQBI\nEgQ+xs2ZbdMHss+upTAUDOq3jaf1p7bu4TZ/KKR/O9ys5bPLVWi3aIrdKouJE0oAkCgEPi6a1WTS\nzTOcKs+26eUT7cPt3kBQPzvcLEmym026rrxQ15UXyUrwA0DcEfiYNIucBer2+fX75s6IvsFAUL9r\n6tS29h6VZNlkNRmymEzKNpu0sDg36m2BAIDJQ+BjUt0wrVg9Pr92uNxR+91DAbmHvGFtO1xu/aUz\nXx+dUSrLaDMCAQATRuBjUhmGoVtnlqo026aj7n55/UG1D7yzgt9otnW4daDbo0uLcnWVs0BTHfY4\nVQwAmYHAx6QzGYauLS/SteVFkiSvP6BXmzu1pb07bPGekdxDAW1p79HW9h4tKSvU0ulTZJvgpj8A\ngHAEPmIu22LWzTOcumFasdq8PvmDIflD76zZ/3+bTqtr0B/2/JCk/27rlmtgSJ+aO1UmlusFgItG\n4CNusi1mzRxxX/68ghz98kS79nb1aeTB/4Eej473elWdz14LAHCxCHwklMNi1oo5U+X1B3Swx6Pf\nNp5Wt+/sEf/erj4CHwAmARdIkRSyLWZdMSVft1Q5w9r3dfUpmLrbPQBA0iDwk
VTm5DtkP2dhnt6h\ngBpGbM8LABg/TukjqVhMJi0ozNHb52y+s/5Ao+bmO1SZm6VCm0V5Vouc2TYV260JrBQAUguBj6Sz\nsDg3LPAl6bC7X4fd/WFtc/IdWlSSL2e2TaVZNhbtAYBREPhIOvMKHMqxmOXxB0Z93hF3v468+yXA\n9u5Ofte9e+8/ACAc1/CRdKwmk1ZUl6sixy7zGA/afcGQfnPKpeb+wdgWBwApiiN8JKXZ+Q7dXTND\ngVBIrgGfTvYN6PTAkNxDfp3sG1Dn4FDU1/25rUt/M6s8ztUCQPIj8JHUzIahsmy7yrLPrq0fCoW0\np7NP+7s92tPZq3NX6d/p6lWPz69PVE9VtsUc/4IBIElxSh8pxzAMXTYlT7dVl+vbV81RkT38e+tR\nt1cvHmtTiPv3AWAYgY+UZjIMLSmLnKh3oMej3zd3KjDabj0AkEEIfKS8a0oLdE1pQUT7q82deqz+\npHZ39sozNPqMfwBId1zDR8ozGYZuqSrV4tJC/WhvQ9g1/Y4BnzYebZUh6S+c+fp4VakMdt8DkIE4\nwkfacGbb9Km5U5VljvxYhyRt63DrWK83/oUBQBIg8JFWFhTm6qvvqdKikvyo/Q19BD6AzETgI+3k\nWi3661ll+sdLZ0Qs3NPS70tMUQCQYAQ+0la5w67PLqgMa2tlJT4AGYrAR1ory7bp3IP8zsEhDQaC\n530+AKQrAh9pzWY2aUrW2W10Q5KOjth1DwAyAYGPtDf1nGV5Jennx1p1uMeToGoAIDEIfKS9q5zh\nM/aHgiH92+EW7e/uS1BFABB/BD7S3tyCHN00fUpYWyAU0nNHWtXYN5CgqgAgvgh8ZIT3TyvWzZUl\nYW2BUEjPHW1Rj8+foKoAIH4IfGSMJeVFurWqNKyt2+fXM4ebFWBnPQBpjsBHRvnL0gItLi0Ma2vu\nH9ShbibxAUhvBD4yzkdmlGh2XnZY2zNHWvTrkx3cow8gbRH4yDhmw9CHKkoi2v+7rVu/PtWRgIoA\nIPYIfGSk6Tl2VeZkRbRv73Dr9ZYu+YNc0weQXgh8ZCTDMHRbdbkuKcyJ6PtNo0vr9p1Uu5eNdgCk\nDwIfGavYbtXtc6fpxmnFEX2tXp8eq2/QUwebdLyXLXUBpD4CHxnv2vIilWbZItqDIemwu1/rDzTq\nlyfa5eZ+fQApzJLoAoBEs5tN+uKllTrm9mqHy636rsgld9/o6NEbHT2a5rCrtrpcJVG+IABAMuMI\nH5BkNZk0vzBHn6gu19/MKlOBLfp34eb+QT26p0F/aO5ksR4AKYUjfOAchmHovSX5unJKnra29+i3\nTaej3pv/26bTGggE9aHKyNv7ACAZcYQPRGEYhhaXFeq+98zUkrLCqM/5Y2uXWvsH41wZAEwMgQ+M\nIsdq1s0znFr13tmaX+CI6H+s/qSOuvsTUBkAjA+BD4xBltmsO+ZN1zWlBRF9zxxu1r6uPoW4pg8g\niRH4wDh8dIZTM3PDV+jzBUP69yMteuStY9yzDyBpEfjAOJgMQ383b5quKM6L6BsIBLX+QKNebe5U\nkKN9AEmGwAfGKcts1t/OLtPVzsjT+5L0u6bTenx/o7z+QJwrA4DzI/CBCTAMQx+fWaq75k+XEaX/\nlGdATxxoVPfgUNxrA4BoCHzgIlTnO/TIVXO0sCg3oq/N69MTBxrli3IfPwDEG4EPXCSTYegTc6bq\n0/OmRfyD6vb5VffWMe3o6ElIbQBwBoEPTJK5BTn65nurtaAgfMtdfyik/zjRrjW7jskfZDIfgMQg\n8IFJZDeb9NezyuSwmCP63EMBbeNIH0CCEPjAJMuxmvW5BRW6rDjyuv4J7tMHkCAEPhADzmybaqun\n6rry8HX4Tw/4ElQRgExH4AMxdF15UdjjFq9PB7s9CaoGQCYj8IEYyrGYlTPiev7PDjfr5eNtcvv8\nCaoKQCYi8IEYMgwj4rS+JL3pcmvN28d1h
J32AMSJJdEFAOnu2vIiDQZD+n/NnRF9Tx1s0kHPgC7L\nc6hyxKY8ADCZxnSE/8ADD0S03XXXXZNeDJCOTIahZdOnRD3SD0n6U+Np/WT/KW040KidLjcb7wCI\niVGP8Ddt2qSNGzfq8OHD+uQnPzncPjQ0JJfLFfPigHTy4UqnFhbl6Sf7T0XtP9br1bFer35zyqWP\nVJboiil5MoxoK/UDwPgZodDohxNtbW2677779KUvfWm4zWQyac6cOSosjDxiiaeOjt6E/vxM4HTm\nMc6TLBgK6f82ndaf27o1NMrKe3PzHaqtLld2lEV8MD58jmOPMY4PpzNya+6xumDgn9Hb26vu7u6w\ntsrKygn/4MnAhyv2+EccO71Dfv3niXbtH+U2vbn5Dn16/vQ4VpWe+BzHHmMcHxcT+GOatFdXV6eX\nX35ZxcXFOvP9wDAM/f73v5/wDwYyXZ7VotvnTlNuoUO/O9SsPZ19Oj5iJb7D7n597+3j+seFVbKb\nuakGwMSNKfDfeOMNbd26VXa7Pdb1ABkn22rWNaWFuqa0UCf7vPrp/saw/i6fX9/aeVR/4czXhytK\nlMUpfgATMKZDhqqqKsIeiIMZudm6bXZ51L7tHW59+61j2ni0Rf3+QJwrA5DqxnSEX15erk9+8pNa\ntGiRzOazRxdf/vKXL/jaQ4cO6Ytf/KLuvPPO4Zn+q1ev1ttvvy3DMLRy5UotXLhQu3bt0osvvqhg\nMKjbb79dNTU1E/wjAantsuJc1Xflam9XX9T+3Z192t3Zpy9dOkNTHXwRBzA2YzrCLyws1OLFi2Wz\n2WQ2m4f/uxCv16u6ujotXrx4uG379u1qaGjQxo0bVVdXp7q6OkmSw+HQQw89pDvuuEM7duyY4B8H\nSH2GYegTc6bqizWVKrZbz/u8DQca1TkwFMfKAKSyMR3h33333RN6c7vdrg0bNuiJJ54YbtuyZYuW\nLl0qSaqurpbb7ZbH49G8efPU19en5557Tvfdd9+Efh6QTqbnZOne91TpWK9XLx1vU8+Itfe9gaA2\nN7m0onpqgioEkErGFPg1NTVhC4AYhqG8vDy98cYbo77OZDLJZrOFtblcLi1cuHD4cVFRkVwul0Kh\nkL73ve/p3nvvVX5+/nj+DEDaMgxD1fkOfe3yWdrT2avnj7aG9R/s9sgfDMliYoEeAKMbU+AfOHBg\n+P99Pp+2bNmigwcPTkoBZ27zW79+vTwej9atW6errrpKy5Ytu+BrL+Z+RIwd4xx7YxnjG5x5um5O\nue7evGu4zRcMqdts6JIS/o4uhM9x7DHGyW3cm+fYbDZdf/31evLJJ/XZz3523D+wtLQ0bFne9vZ2\nOZ1OfeUrXxn3e7HIQ+yxmEbsjXeM31uSp52us89/dNthPfzeatm4T/+8+BzHHmMcHzFfeOell14K\ne9za2qq2trYJ/cAlS5Zo7dq1Wr58uerr61VWViaHwzGh9wIy0byCnLDAl6SHdx7VlxfOUFk2s/YB\nRDemwH/zzTfDHufm5uqHP/zhBV9XX1+vNWvWqLm5WRaLRZs3b9batWtVU1Oj2tpamc1mrVq1amKV\nAxlqXr5DDos54l78H+09qXsWVqk023aeVwLIZGNeS1+Suru7ZRiGCgoKYlnTmHH6KPY4TRd7Exnj\nnS63Xjoe/SzbtxZVy2ri9P65+BzHHmMcHxdzSn9MvxV27typpUuX6sMf/rA++MEP6kMf+pD27Nkz\n4R8K4OK8tyRfn1tQEbXvoTePyj/KLnwAMtOYAv/73/++1q1bpy1btmjr1q169NFHtWbNmljXBmAU\nVXnZeui91VH7Vr15RIGxn7wDkAHGFPgmk0nz5s0bflxTUzOmlfYAxJbdbNI/Xjojat83dxzRnk5O\nsQJ4x5gDf/Pmzerr61NfX59+/etfE/hAkih32HXH3GlR+54/2qo/t3XHuSIAyWhMgf+tb31LL7zw\ngj7wg
Q/oxhtv1M9//nN9+9vfjnVtAMZofmGOPjUn+hK7/+dkhw71eOJcEYBkM6bA/9Of/iSbzabt\n27frjTfeUDAY1GuvvRbr2gCMQ01Rrj55ntB/+lCz3j7N6X0gk40p8Ddt2qS1a9cOP37yySf1q1/9\nKmZFAZiYS4ty9Z2r5kTt+/mxVn1j+2EFmMEPZKQxBX4gEAi7Zm/iHl8gaRmGobrzhL4kffuto3Gs\nBkCyGNNKezfccINqa2u1aNEiBYNBbd26VTfddFOsawMwQSbD0DevnK1H3joW0TcUDGlbe4/+sjQ5\nFtACEB9jXmlvx44d2r17twzD0JVXXqkrrrgi1rVdEKs6xR6rZ8VeLMd4KBjU93efkHsoENH3tctn\nqcA27v2zUhKf49hjjOPjYlbaG9fSusmGD1fs8Y849uIxxj8/2qq3o9yT/8Dls5SfAaHP5zj2GOP4\niPnSugBS2/LZZVHb17x9PM6VAEgUAh/IAIZh6GuXz4zaxz36QGYg8IEMUWCz6tPzIlfke/pQs1L4\nyh6AMSLwgQwytyBH0x32iPY/tnYloBoA8UTgAxnmU3MjV+Pb3HhargFfAqoBEC8EPpBhCmxWlWfb\nItof3dOgZs9AAioCEA8EPpCB/mFBRdT2tftO6T+Ot8W5GgDxQOADGchhMevBK2dH7dvhcmtvZ1+c\nKwIQawQ+kKEcFrPujDJrX5KeO9qiw9yuB6QVAh/IYPMKcnTPwqqofU8datZgIBjnigDECoEPZLjS\nbJseWRR9d71v7TyqAPfoA2mBwAcgs8nQly6dEbXvmzuOaNdpN4vzACmOwAcgSZrqsOvzl0Sfvf/C\nsTat3HFEvUP+OFcFYLIQ+ACGzcjN1jWlBeftX73ruF5t7pQ/yNE+kGoIfABhbqkq1funFp23/3dN\np/X93SfiVxCASUHgA4hwU0WJvvqe6LP3JalnyK9t7T1xrAjAxSLwAURVkmXTd66aoxunFUft/8+G\ndr3ewqY7QKog8AGcl2EYunH6FN33nplR+3/T6NK/H2Z7XSAVEPgALqg4y3repXj3dXv03bdPaCjI\nIj1AMiPwAYyJw2I+77367iG/Htl5LM4VARgPAh/AmE112PXN8xzp+0MhvcFEPiBpEfgAxiXbYtbD\n761WRY49ou+XDe3a2t6dgKoAXAiBD2DcbGaT7q6ZoQUFORF9mxo69F+nXAmoCsBoCHwAE3b73KlR\n2//Y2kXoA0mGwAcwYYZh6HMLoq+//8fWLr3B6X0gaRD4AC5KVV62vn7FrKh9v2zo0JsutwKsvQ8k\nHIEP4KLlWS1aeUX02fsvH2/T04ebWJwHSDACH8CkyLGa9bXLZ0btO+r26qf7GxUk9IGEIfABTJoC\nm1V/P2961L5TngG9eKyNI30gQQh8AJNqToFDn543LWrf2529+vmxVnV4fXGuCgCBD2DSzS3I0arz\nrMi3u7NPj9U36Ji7P85VAZmNwAcQE1kWs75xntn7gZC04WCT/vVgo3qH/HGuDMhMBD6AmMm1Ws67\nta70zmS+1buOq2twKH5FARmKwAcQU8VZVn2xplKXF+ed9zn/su+k+v2BOFYFZB4CH0DMTc/J0m3V\n5fpIZUnU/n5/UHVvHdPBbk+cKwMyB4EPIG6uLS9S3VVzVJZti9r/s8PN6uOaPhATBD6AuDIZhj6z\noELl5wn9f951XJ1c0wcmHYEPIO4cFrM+d0ml5hc4ovb/790ndLzXG+eqgPRG4ANICLvZpDvmTdfi\n0oKo/f96sFGt/YNxrgpIXwQ+gIT6cKVT0x32iPZgSHqs/qTaWZUPmBQEPoCEspgM/cOCChXbrVH7\nnzvaEueKgPRE4ANIOLvZpC/WVOqSwpyIvnavT//f28f1psudgMqA9EHgA0gK2Razls8uV0VO5On9\nbp9f/3G8TW4ft+wBE0XgA0gadrNJX7ikUrkWc0RfSNL395xg7X1gggh
8AEnFMAx9Ys5U5UQJ/aFg\nSP91ypUbPOKEAAAPg0lEQVSAqoDUR+ADSDoz87L19Stm6copkevvv3W6V7tOcz0fGC8CH0BSMhmG\nlk2fIrMR2ffCsTad7GNhHmA8CHwASavQbtWDV1ZH7Xv2SIv2dfUpEArFuSogNRH4AJKa3WzS4tLC\niPbeoYD+/UiLNjW0J6AqIPUQ+ACS3kdnlGhJWWToS9Jbrl4FOcoHLojAB5D0DMPQ+8oKZTVFXtD3\nh0J6cMcRfXfXcW3v6ElAdUBqIPABpIQiu1Wfv6RS759aFLW/Z8ivX55oV78/EOfKgNRA4ANIGVMd\ndt1UUaLZedlR+4OS/ru1S21edtkDRiLwAaScD1eWaMp5Ntv5Q0uXfrT3pF5r6YxzVUByI/ABpJzp\nOVm697KZ+taias0rcER9zn+3dse5KiC5WRJdAABMlNVk0ryCHB3q6Y/o8/gDeqO9W5KhuSapKBSS\nYURZxQfIEAQ+gJR2TWmBgqGQjvd6tb/bE9b3y4aOd/6noV1Lp0/RDdOKE1AhkBxifkr/0KFDWrZs\nmZ599tnhttWrV6u2tlYrVqzQnj17JEkdHR2655579NJLL8W6JABpxGQYura8SLfPnaY8a+SGO2e8\nyS17yHAxDXyv16u6ujotXrx4uG379u1qaGjQxo0bVVdXp+985zvvFGIy6bbbbotlOQDS3PyCnPP2\nefwBNfR6dapvQL5AMI5VAckhpqf07Xa7NmzYoCeeeGK4bcuWLVq6dKkkqbq6Wm63Wx6PR1OmTJHZ\nfP5v5wBwIR+rcqo026aOAZ9CIWmH6+yuer5gSI8faJT0znK9d82froqcrESVCsRdTI/wTSaTbDZb\nWJvL5VJx8dnraEVFRXK5zu5vHWKJTAATZDWZdG15kf5qZpn+ambpeX/BDQaC2tLGLH5kloRP2jsT\n8Fu2bNHzzz8vj8ejoqKi4bMAo3E6I/fKxuRjnGOPMY6N2UU5OtLlido3IMZ9sjGeyS3ugV9aWhp2\nRN/e3i6n06mqqqqwa/1j0dHRO9nlYQSnM49xjjHGOHb+Z6VTvzWZ1BsIqm9wSG1e33Df8W6Pvven\nA5pit+nG6cVyWLikeDH4HMfHxXypivvCO0uWLNHmzZslSfX19SorK5PDEX3hDAC4GIV2q5bPLtcD\n75uvv5lVFtY3EAjqUE+/trR36+XjbQmqEIifmB7h19fXa82aNWpubpbFYtHmzZu1du1a1dTUqLa2\nVmazWatWrYplCQAgScqznv/X3YlebxwrARIjpoF/6aWX6plnnolov/fee2P5YwEgQr7NosuKc7W7\nsy+ibyjIZGGkv4RP2gOAeFk+u1zXlQ+qbyignx1uHm73h0Jat+/k8OOKnCx9qKJENjPbjSB9EPgA\nMobJMDT93XvvLYYh/zm3ATd6BsP+32wYunmGM+41ArHC11cAGSnPNvqs/EbPQJwqAeKDwAeQkW6Y\nWqzR9s7zc10faYZT+gAy0iJngS4pytXpgSFJ0ulBn144dvb2PI8/oN2nz95XPiXLqmkOO1vsImUR\n+AAylsNiliP3nVP79hET9Lp9fm081hrWtmz6FH2ALXaRojilDwCSrKYLH7lvbWf9faQuAh8AJBXa\nLCq2W0d9jtfPtrpIXZzSBwBJhmHo7+dP15/butXr80uSQpL2dp1dqMcfCikUCnEdHymJwAeAdxXb\nrfroiHvvH9x+WOce15/yDMh0TuCX2K3KYuMdpAACHwBGYTYZCp5zi95P9zeG9ZsMqXZ2uRYWszUs\nkhvX8AFgFOYLnL4PhqQ/tHTFqRpg4gh8ABhFxbtL8Y6md8gfh0qAi8MpfQAYxV/PKtPmUy51DPiG\n24KhkFq8Zx8HQqzKh+RH4APAKApsFi2vLg9rG/AH9O23jg0/DnC3HlIAgQ8A42QacV0/EAqpc3Bo\n+LEhKd9mueD1fyCeCHwAGCfziFX
5/KGQ/vfuE2Ft2WaT/m7uNFXlZcexMuD8mLQHAONkkkbdaU+S\nvIGgXm3ujEc5wJgQ+AAwToZhqCzbdsHn9fiYvY/kQeADwAQsn12u6vxsFdksw//lW8OvkgbF7H0k\nD67hA8AElDvsumt+RVjb6QGfvr+nYfhxkLxHEuEIHwAmSbTZ+0CyIPABYJKMvA0vSOAjiXBKHwAm\nyYi79dTvD+qZw81Rn1tst+p/TC1SnpVfw4gPPmkAMElGHuEHQiHt7/ac9/kt/YP6hwUV5+0HJhOn\n9AFgklhNJlnGsbreiV5vDKsBwhH4ADBJLCZDV5cWjPn5QXGdH/HDKX0AmEQfqSzRlSX56j5nbf1z\nPXukJezu/FBIF162D5gEBD4ATCLDMDTNYdc0hz1qv8kwwm7XCyokM4mPOOCUPgDE0ciZ/JzRR7wQ\n+AAQRyOP5Vl+F/FC4ANAHI1cjY/ldxEvXMMHgDgaeYTf4/Nr6AKpbzMZyraYY1cUMgKBDwBxNPII\n/8f1J8f0uvcU5Wr57HKZR04CAMaIU/oAEEcTzes9XX063sdCPZg4Ah8A4qgsO/rtemPh9vknsRJk\nGk7pA0AcfbzKqV82dKjdO3jB53oDwbDr+yHu4cNFIPABII6mZNn09/Onj+m5Lx9v05su9/Bj4h4X\ng1P6AJCkRl7uJ/BxMQh8AEhSIzfe4559XAwCHwCSlDHiGD/EMT4uAoEPAElq5BE+c/ZwMQh8AEhS\nXMPHZCLwASBJRZzS5xAfF4HAB4AkFbGVbmLKQJog8AEgSUWc0ifxcRFYeAcAktTISXt7unrlGvSN\n+31MMjQzL1uXFefKGPmmyBgEPgAkqZHX8Bs9g2r0XHhJ3mje6OjRULBUVzkLJqM0pCBO6QNAkrKb\nJ/dX9KGe/kl9P6QWAh8AktSlRbmymybv13SASQAZjVP6AJCkSrNt+vLCGTra65V/AuvqtvYP6o2O\nnuHHxH1mI/ABIIkV2q1aZLdO6LX7u/vCAp/Ez2yc0geANBW5Uh+Jn8kIfABIW9yCh7MIfABIU6zF\nj3MR+ACQpthtD+ci8AEAyAAEPgCkKSbt4VwEPgBkCOI+sxH4AJCmRm6UwzX8zEbgA0Ca4qY8nIvA\nB4AMwQF+ZiPwASBNcR8+zkXgA0CaGnkNn4v4mY3AB4AMQdxnNgIfANIUp/RxLgIfANJUxCx9Ej+j\nxTzwDx06pGXLlunZZ58dblu9erVqa2u1YsUK7d27V5K0e/durVy5Ut/4xjfU0tIS67IAIO1FXMJP\nTBlIEjENfK/Xq7q6Oi1evHi4bfv27WpoaNDGjRtVV1enuro6SdLGjRv18MMP6wtf+IJeeOGFWJYF\nABmJpXUzmyWWb26327VhwwY98cQTw21btmzR0qVLJUnV1dVyu93yeDzy+/2yWq0qLS3V6dOnY1kW\nAGQEY8RJfbfPr981xeb3a05Xrzz9vuHH5dk2LSjMlcXE8j/JIqaBbzKZZLPZwtpcLpcWLlw4/Li4\nuFgul0vZ2dny+XxqbW3VtGnTYlkWAGSEkVHrHgro1ebOuP38a0oLdEtVadx+HkYX08Afi2AwKEmq\nra3Vww8/rGAwqK985SsJrgoAUp/VnNij68M9/Qn9+QgX98AvLS2Vy+Uaftze3i6n0ymHw6F//ud/\nHtd7OZ15k10eomCcY48xjr1MHGOn8rR+Rkmiy0CSiPtteUuWLNHmzZslSfX19SorK5PD4Yh3GQAA\nZJSYHuHX19drzZo1am5ulsVi0ebNm7V27VrV1NSotrZWZrNZq1atimUJAABAkhEKsbgyAADpjpX2\nAADIAAQ+AAAZgMAHACADEPgAAGQAAh8AgAyQ9IHPbnuxd6Ex3rNnjySpo6ND99xzj1566aVElZqy\nxvo53rVrl1auXKmvf/3r2rdvX6LKTVljHeedO3fq/vvv11e/+lXV19cnqtyUNNbfF9I7vzOuvfba\
n4RVVMTZj/RyvXbtWDz74oL773e/qwIEDF3zfpA58dtuLvbGM8Xe+8x1J7+yNcNtttyWq1JQ1ns+x\nw+HQQw89pDvuuEM7duxIVMkpaTzjnJeXp7q6Ot15553atm1bokpOOeP5fSFJTz/9tK6++upElJqy\nxvM5lqSsrCwFAgGVll54z4KkDvwzu+2d+wdht73JNZ4xnjJlisxmc6JKTVnjGeN58+bJ5/Ppueee\n06233pqoklPSeMZ57ty52rJlix599NHhflzYeMZ406ZNuummmyI2UMPoxjPGt912m+6//37deeed\n+tnPfnbB907qwD/fbnvFxcXDj9lt7+KMZYyLiorC9j9grabxGc8Y9/X16Xvf+57uvfde5efnx7vU\nlDae3xe7d+/W9ddfrx/84Ad6+umn41xp6hrvGL/++uvav3+/XnnllXiXmrLGM8ZHjhyRxWJRXl6e\nfD7fyLeKkPDd8i4Wu+3F3pmA37Jli55//nl5PB4VFRVxZDSJzozx+vXr5fF4tG7dOl111VVatmxZ\ngitLL2d+X/T09GjVqlXyer265ZZbElxVejkzxg8++KAkqampSTfffHMiS0o7Z8Z4cHBQDzzwgKxW\nqz772c9e8HUpF/iTudseojvfGFdVVYVdV8LEnW+M+bI6uUb7LF933XUJrCx9nG+Mz1i9enUiykor\no32O3//+94/5fZL6lH407LYXe4xx7DHG8cE4xx5jHHuTNcZJfYTPbnuxxxjHHmMcH4xz7DHGsRfL\nMWa3PAAAMkDKndIHAADjR+ADAJABCHwAADIAgQ8AQAYg8AEAyAAEPgAAGYDABwAgAxD4ACbFtm3b\n9IlPfCLRZQA4DwIfwKQxDCPRJQA4j6ReWhdAfG3btk3r1q1TVlaWFi1apK1btyoQCKi3t1e33367\nbr31Vv3iF7/Qn//8ZwWDQR0/flwVFRV67LHHwt7nwIED+qd/+idt2LBBZWVlCfrTADgXgQ8gTH19\nvV599VU1NTVpzpw5+sAHPqCOjg597GMf06233ipJ2rVrl1555RXZbDYtW7ZMBw4cGH59W1ubHnjg\nAf34xz8m7IEkQuADCDNr1izl5eXJ6XRq/fr1Wr9+vcxms3p6eoafc9lll8lms0mSysvL1d3dLZPJ\npL6+Pn3mM5/RPffco5kzZyboTwAgGq7hAwhjtVolST/84Q81c+ZMPffcc3r88cfDnmM2m8Men9mD\nq6mpSUuWLNFTTz0Vn2IBjBmBDyAql8ulOXPmSJJ+9atfyWQyyefzjfqa+fPn62tf+5rKy8u1bt26\neJQJYIwIfABRfepTn9KPfvQj3XXXXcrLy9M111yj++67L2ImfrSZ+Q899JA2bdqkXbt2xatcABdg\nhM6ciwMAAGmLI3wAADIAgQ8AQAYg8AEAyAAEPgAAGYDABwAgAxD4AABkAAIfAIAM8P8DCT3hIrQy\nGMsAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "ranks, counts = wc.ranks()\n", "plt.plot(ranks, counts, linewidth=4, color=COLORS[5])\n", "plt.xlabel('rank')\n", "plt.ylabel('count')\n", "plt.xscale('log')\n", "plt.yscale('log')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This (approximately) straight line is characteristic of Zipf's law.\n", "\n", "n-grams\n", "-------\n", "\n", "On to the next topic: bigrams and trigrams." 
] }, { "cell_type": "code", "execution_count": 44, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from itertools import tee\n", "\n", "def pairwise(iterator):\n", "    \"\"\"Iterates through a sequence in overlapping pairs.\n", "\n", "    If the sequence is 1, 2, 3, 4, the result is (1, 2), (2, 3), (3, 4).\n", "    \"\"\"\n", "    a, b = tee(iterator)\n", "    # advance b by one position so it runs one element ahead of a\n", "    next(b, None)\n", "    return izip(a, b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`bigrams` is the histogram of word pairs:" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "collapsed": false }, "outputs": [], "source": [ "bigrams = Hist(pairwise(iterate_words('pg2591.txt')))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And here are the 20 most common:" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[(('to', 'the'), 444),\n", " (('in', 'the'), 399),\n", " (('of', 'the'), 369),\n", " (('and', 'the'), 349),\n", " (('into', 'the'), 294),\n", " (('said', 'the'), 251),\n", " (('on', 'the'), 199),\n", " (('and', 'when'), 168),\n", " (('he', 'was'), 164),\n", " (('he', 'had'), 164),\n", " (('to', 'be'), 163),\n", " (('it', 'was'), 152),\n", " (('Then', 'the'), 151),\n", " (('I', 'will'), 149),\n", " (('that', 'he'), 143),\n", " (('at', 'the'), 142),\n", " (('came', 'to'), 138),\n", " (('and', 'he'), 135),\n", " (('she', 'was'), 129),\n", " (('all', 'the'), 125)]" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bigrams.most_common(20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similarly, we can iterate through the trigrams:" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def triplewise(iterator):\n", "    \"\"\"Iterates through a sequence in overlapping triples.\n", "\n", "    If the sequence is 1, 2, 3, 4, the result is (1, 2, 3), (2, 3, 4).\n", "    \"\"\"\n", "    a, b, c = tee(iterator, 3)\n", "    # advance b by one position and c by two\n", "    next(b, None)\n", "    next(c, None)\n", "    next(c, None)\n", "    return izip(a, b, c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [
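"`pairwise` and `triplewise` follow the same pattern, so they generalize to windows of any length. Here's a sketch of one way to do that with `tee` and `islice`; `ngramwise` is a name I made up for illustration, not a library function. It uses the built-in `zip`, which is lazy in Python 3 (in Python 2 it builds a list, and `izip` is the lazy equivalent). With `n=2` or `n=3` it should yield the same tuples as `pairwise` or `triplewise`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from itertools import tee, islice\n", "\n", "def ngramwise(iterable, n):\n", "    \"\"\"Iterates through a sequence in overlapping n-tuples.\n", "\n", "    Hypothetical helper for illustration: ngramwise([1, 2, 3, 4], 3)\n", "    yields (1, 2, 3), (2, 3, 4).\n", "    \"\"\"\n", "    # make n independent iterators over the same input\n", "    iterators = tee(iterable, n)\n", "    # advance the i-th iterator by i positions\n", "    shifted = [islice(it, i, None) for i, it in enumerate(iterators)]\n", "    # zip stops as soon as the most-advanced iterator is exhausted\n", "    return zip(*shifted)\n", "\n", "list(ngramwise([1, 2, 3, 4], 3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [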
"And make a histogram:" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "collapsed": false }, "outputs": [], "source": [ "trigrams = Hist(triplewise(iterate_words('pg2591.txt')))\n", "\n", "# Uncomment this line to run the analysis with Elvis Presley lyrics\n", "#trigrams = Hist(triplewise(iterate_words('lyrics-elvis-presley.txt')))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are the 20 most common:" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[(('came', 'to', 'the'), 65),\n", " (('and', 'when', 'he'), 50),\n", " (('out', 'of', 'the'), 50),\n", " (('said', 'to', 'the'), 34),\n", " (('he', 'came', 'to'), 33),\n", " (('and', 'when', 'she'), 33),\n", " (('went', 'into', 'the'), 32),\n", " (('went', 'to', 'the'), 31),\n", " (('and', 'said', 'to'), 31),\n", " (('came', 'to', 'a'), 30),\n", " (('one', 'of', 'the'), 30),\n", " (('and', 'as', 'he'), 29),\n", " (('they', 'came', 'to'), 29),\n", " (('he', 'did', 'not'), 28),\n", " (('there', 'was', 'a'), 28),\n", " (('that', 'he', 'had'), 28),\n", " (('and', 'I', 'will'), 27),\n", " (('that', 'it', 'was'), 25),\n", " (('and', 'at', 'last'), 24),\n", " (('and', 'when', 'the'), 24)]" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trigrams.most_common(20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And now for a little fun. I'll make a dictionary that maps from each word pair to a Hist of the words that can follow." 
] }, { "cell_type": "code", "execution_count": 50, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from collections import defaultdict\n", "\n", "d = defaultdict(Hist)\n", "for a, b, c in trigrams:\n", " d[a, b][c] += trigrams[a, b, c]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can look up a pair and see what might come next:" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Hist({'came,': 1,\n", " 'fell': 1,\n", " 'might': 1,\n", " 'of': 2,\n", " 'on': 1,\n", " 'ran': 2,\n", " 'ran.': 1,\n", " 'streamed': 1,\n", " 'that': 1})" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d['the', 'blood']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are the most common words that follow \"into the\":" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[('forest', 15),\n", " ('forest,', 13),\n", " ('garden', 9),\n", " ('kitchen,', 8),\n", " ('cellar', 8),\n", " ('room,', 7),\n", " ('wide', 7),\n", " ('water,', 7),\n", " ('wood', 6),\n", " ('kitchen', 6)]" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d['into', 'the'].most_common(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are the most common words that follow \"said the\":" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[('old', 13),\n", " ('man,', 12),\n", " ('little', 10),\n", " ('fisherman,', 8),\n", " ('father,', 7),\n", " ('ass,', 6),\n", " ('tailor,', 5),\n", " ('wife,', 5),\n", " ('fish;', 5),\n", " ('other;', 5)]" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d['said', 'the'].most_common(10)" ] }, { "cell_type": "markdown", "metadata": {}, 
"source": [ "`Hist` provides 
`choice`, which chooses a random word with probability proportional to count:" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'wife,'" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d['said', 'the'].choice()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Given a prefix, we can choose a random suffix:" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'sparrow;'" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prefix = 'said', 'the'\n", "suffix = d[prefix].choice()\n", "suffix" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then we can shift the words and compute the next prefix:" ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "('the', 'sparrow;')" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prefix = prefix[1], suffix\n", "prefix" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Repeating this process, we can generate random new text in which each word is chosen according to how often it followed the previous two words in the original, so the new text has the same local (trigram) structure as the original:" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "but killed his horse with his finger, and three times with the sack of pearls and precious stones for his wife, being awakened by the hand up to her, 'Take us out, or alas! we shall see what sort of game lies there.' 
And the king all that had been with the assistance they need, is critical to reaching Project Gutenberg-tm's goals and ensuring that the wife was not dead, but had been made to fetch this ring up from the gate,' as if it cost what it is said that it was the wolf, who lived in great comfort, " ] } ], "source": [ "for i in range(100):\n", " suffix = d[prefix].choice()\n", " print(suffix, end=' ')\n", " prefix = prefix[1], suffix" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With a prefix of two words, we typically get text that flirts with sensibility." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.11" } }, "nbformat": 4, "nbformat_minor": 0 }