{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Exploring a Text with NLTK\n", "\n", "This notebook shows how you can explore aspects of a text using the Natural Language Toolkit (NLTK).\n", "\n", "Some of the things you can do include:\n", "\n", "* [Tokenize a text](#Tokenization)\n", "* [Generate a concordance for a word](#Concording)\n", "* [Explore collocations (words that frequently occur together)](#Collocations)\n", "* [Count words and frequencies](#Counting-Words-and-Frequencies)\n", "* [Find similar words and contexts](#Similar-Words)\n", "\n", "For more on NLTK, see the online version of the book [Natural Language Processing with Python](http://www.nltk.org/book/). " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Preparing for Exploration\n", "\n", "Before we can analyze a text, we need to load it and tokenize it." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Installing NLTK\n", "\n", "Before you can use NLTK, you need to make sure it is installed. If you installed Python with Anaconda, NLTK is included by default (see [Anaconda Navigator](https://docs.continuum.io/anaconda/navigator)). You can always check whether it is installed by importing it with `import nltk`. Try it: if NLTK is not installed, you will get an `ImportError`." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import nltk" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### (more on) Installing NLTK\n", "\n", "If you don't have it, there are several ways to install it.\n", "\n", "* The NLTK 3.0 documentation has a page on [Installing NLTK](http://www.nltk.org/install.html).\n", "* You can have Anaconda install or update it for you. See [Using Anaconda Navigator](https://docs.continuum.io/anaconda/navigator-using#) and scroll down to the section on updating packages. In short, click the checkbox to the left of the package, choose \"Mark for upgrade\" from the menu, and then click the Apply button at the bottom."
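, "\n", "\n", "`nltk.word_tokenize`, used in the Tokenization section, also needs NLTK's Punkt tokenizer models. These are distributed as separate data rather than with the package, and tokenizing raises a `LookupError` if they are missing. The cell below is a minimal sketch that checks for the models and downloads them if needed. The resource names `punkt` and `punkt_tab` (the latter used by newer NLTK releases) are assumptions to verify against your installed version." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import nltk\n", "\n", "print('NLTK version:', nltk.__version__)\n", "\n", "# word_tokenize needs the Punkt models. Older NLTK releases load 'punkt';\n", "# newer ones load 'punkt_tab' (assumption: check against your version).\n", "for resource in ('punkt', 'punkt_tab'):\n", "    try:\n", "        nltk.data.find('tokenizers/' + resource)\n", "    except LookupError:\n", "        # Not found locally: download it (requires network access).\n", "        nltk.download(resource, quiet=True)"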
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Getting a Text\n", "\n", "Now we will get a text to process with NLTK." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we list the text files in the current directory." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hume Enquiry.txt negative.txt positive.txt\r\n", "Hume Treatise.txt obama_tweets.txt\r\n" ] } ], "source": [ "ls *.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will use \"Hume Enquiry.txt\", David Hume's *An Enquiry Concerning Human Understanding* from Project Gutenberg. You can use any text you want by changing the file name. We print the number of characters and the first 50 characters to check that the file loaded correctly." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "This string has 366798 characters.\n", "The Project Gutenberg EBook of An Enquiry Concerni\n" ] } ], "source": [ "theText2Use = \"Hume Enquiry.txt\"\n", "with open(theText2Use, \"r\") as fileToRead:\n", "    theString = fileToRead.read()\n", "\n", "print(\"This string has\", len(theString), \"characters.\")\n", "print(theString[:50])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tokenization\n", "\n", "Now we tokenize the text with NLTK's `word_tokenize`, producing a list called `listOfTokens`, and check the first 50 tokens. Note that this tokenizer neither removes punctuation nor lowercases the words; you can tokenize using another method if you prefer. Then we create an NLTK `Text` object from the tokens. A `Text` object can be indexed and sliced like a list of tokens, and it adds methods for exploring the text, such as `concordance`."
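, "\n", "\n", "For example, a regular-expression tokenizer can drop punctuation and make lowercasing easy. The cell below is a small sketch, not the method used in this notebook. It applies `nltk.tokenize.RegexpTokenizer` with the illustrative pattern `[A-Za-z]+`, which keeps only runs of ASCII letters, to a sample sentence and then lowercases the results. Note that this pattern splits hyphenated names such as `Selby-Bigge` into two tokens." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from nltk.tokenize import RegexpTokenizer\n", "\n", "# Keep runs of ASCII letters only; punctuation and digits are discarded.\n", "letterTokenizer = RegexpTokenizer('[A-Za-z]+')\n", "\n", "sample = 'The Project Gutenberg EBook of An Enquiry, by David Hume.'\n", "sampleTokens = [token.lower() for token in letterTokenizer.tokenize(sample)]\n", "print(sampleTokens)\n", "# ['the', 'project', 'gutenberg', 'ebook', 'of', 'an', 'enquiry', 'by', 'david', 'hume']"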
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['The', 'Project', 'Gutenberg', 'EBook', 'of', 'An', 'Enquiry', 'Concerning', 'Human', 'Understanding', ',', 'by', 'David', 'Hume', 'and', 'L.', 'A.', 'Selby-Bigge', 'This', 'eBook', 'is', 'for', 'the', 'use', 'of', 'anyone', 'anywhere', 'at', 'no', 'cost', 'and', 'with', 'almost', 'no', 'restrictions', 'whatsoever', '.', 'You', 'may', 'copy', 'it', ',', 'give', 'it', 'away', 'or', 're-use', 'it', 'under', 'the']\n" ] } ], "source": [ "listOfTokens = nltk.word_tokenize(theString)\n", "theText = nltk.Text(listOfTokens)\n", "print(listOfTokens[:50])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Concording\n", "\n", "Now we get a concordance for a word in one line. Note that we can control the width of the concordances. Edit the word to explore." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Displaying 25 of 3499 matches:\n", " The Project Gutenberg EBook of An Enquiry Concernin\n", " the use of anyone anywhere at no cost and with almo\n", "u may copy it , give it away or re-use it under the terms of the Project Gutenberg License included\n", " , give it away or re-use it under the terms of the Project Gutenberg License included with this eB\n", "AVID HUME Extracted from : Enquiries Concerning the Human Understanding , and Concerning the Princi\n", "erning the Human Understanding , and Concerning the Principles of Morals , By David Hume . Reprinte\n", "ples of Morals , By David Hume . Reprinted from The Posthumous Edition of 1777 , and Edited with In\n", " Oxford . Second Edition , 1902 CONTENTS I . Of the different Species of Philosophy II . Of the Ori\n", " Of the different Species of Philosophy II . Of the Origin of Ideas III . Of the Association of Ide\n", "Philosophy II . Of the Origin of Ideas III . 
Of the Association of Ideas IV . Sceptical Doubts conc\n", "ation of Ideas IV . Sceptical Doubts concerning the Operations of the Understanding V. Sceptical So\n", ". Sceptical Doubts concerning the Operations of the Understanding V. Sceptical Solution of these Do\n", "on of these Doubts VI . Of Probability VII . Of the Idea of necessary Connexion VIII . Of Liberty a\n", "nnexion VIII . Of Liberty and Necessity IX . Of the Reason of Animals X . Of Miracles XI . Of a par\n", "cular Providence and of a future State XII . Of the academical or sceptical Philosophy INDEX SECTIO\n", "al or sceptical Philosophy INDEX SECTION I . OF THE DIFFERENT SPECIES OF PHILOSOPHY . 1 . Moral phi\n", "ECIES OF PHILOSOPHY . 1 . Moral philosophy , or the science of human nature , may be treated after \n", " has its peculiar merit , and may contribute to the entertainment , instruction , and reformation o\n", "nt , instruction , and reformation of mankind . The one considers man chiefly as born for action ; \n", "ne object , and avoiding another , according to the value which these objects seem to possess , and\n", "hese objects seem to possess , and according to the light in which they present themselves . As vir\n", ". As virtue , of all objects , is allowed to be the most valuable , this species of philosophers pa\n", "ble , this species of philosophers paint her in the most amiable colours ; borrowing all helps from\n", "s manner , and such as is best fitted to please the imagination , and engage the affections . They \n", "t fitted to please the imagination , and engage the affections . They select the most striking obse\n" ] } ], "source": [ "theText.concordance(\"the\", width=100)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that `concordance` is not case sensitive: it matches both capitalized and lowercase forms of the word (here \"The\" as well as \"the\").\n", "\n", "By default, `concordance` displays only the first 25 matches. To see more, pass the `lines` parameter."
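, "\n", "\n", "The `lines` and `width` parameters also work with `Text.concordance_list`, which recent NLTK releases provide (it may be missing from older versions). Unlike `concordance`, it returns the matches instead of printing them, so they can be written to a file. The cell below is a minimal sketch on a toy token list. The file name `concordance.txt` is a hypothetical choice." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import nltk\n", "\n", "# A toy token list; the same call works on theText from above.\n", "toyText = nltk.Text('The cat sat on the mat because the mat was warm .'.split())\n", "\n", "# concordance_list returns ConcordanceLine objects instead of printing them.\n", "matches = toyText.concordance_list('the', width=40, lines=30)\n", "print(len(matches), 'matches')  # 3 matches: 'The' counts too, since matching ignores case\n", "\n", "# Write each formatted line (the .line attribute) to a hypothetical output file.\n", "with open('concordance.txt', 'w') as outFile:\n", "    for match in matches:\n", "        print(match.line, file=outFile)"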
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Displaying 30 of 3499 matches:\n", " The Project Gutenberg EBook of An Enquiry\n", "d L. A. Selby-Bigge This eBook is for the use of anyone anywhere at no cost and\n", " it , give it away or re-use it under the terms of the Project Gutenberg Licens\n", " away or re-use it under the terms of the Project Gutenberg License included wi\n", "Extracted from : Enquiries Concerning the Human Understanding , and Concerning \n", " Human Understanding , and Concerning the Principles of Morals , By David Hume \n", "rals , By David Hume . Reprinted from The Posthumous Edition of 1777 , and Edit\n", "Second Edition , 1902 CONTENTS I . Of the different Species of Philosophy II . \n", "fferent Species of Philosophy II . Of the Origin of Ideas III . Of the Associat\n", " II . Of the Origin of Ideas III . Of the Association of Ideas IV . Sceptical D\n", "deas IV . Sceptical Doubts concerning the Operations of the Understanding V. Sc\n", "l Doubts concerning the Operations of the Understanding V. Sceptical Solution o\n", "e Doubts VI . Of Probability VII . Of the Idea of necessary Connexion VIII . Of\n", "II . Of Liberty and Necessity IX . Of the Reason of Animals X . Of Miracles XI \n", "idence and of a future State XII . Of the academical or sceptical Philosophy IN\n", "tical Philosophy INDEX SECTION I . OF THE DIFFERENT SPECIES OF PHILOSOPHY . 1 .\n", "HILOSOPHY . 1 . Moral philosophy , or the science of human nature , may be trea\n", "eculiar merit , and may contribute to the entertainment , instruction , and ref\n", "uction , and reformation of mankind . 
The one considers man chiefly as born for\n", ", and avoiding another , according to the value which these objects seem to pos\n", "ts seem to possess , and according to the light in which they present themselve\n", "e , of all objects , is allowed to be the most valuable , this species of philo\n", " species of philosophers paint her in the most amiable colours ; borrowing all \n", " and such as is best fitted to please the imagination , and engage the affectio\n", "o please the imagination , and engage the affections . They select the most str\n", "d engage the affections . They select the most striking observations and instan\n", "roper contrast ; and alluring us into the paths of virtue by the views of glory\n", "luring us into the paths of virtue by the views of glory and happiness , direct\n", " , direct our steps in these paths by the soundest precepts and most illustriou\n", "trious examples . They make us _feel_ the difference between vice and virtue ; \n" ] } ], "source": [ "theText.concordance(\"the\", lines=30)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One limitation is that `concordance` prints its results to the screen for exploration rather than returning them, so saving a concordance to a file is not straightforward. One option is to copy and paste it into a word processor." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plot the Dispersion of Words\n", "\n", "We can plot where words occur (their dispersion) throughout the text. Note that, unlike `concordance`, `dispersion_plot` is case sensitive, so \"Truth\" and \"truth\" are plotted as separate words.\n", "\n", "The line `%matplotlib inline` makes Jupyter display the plot inside the notebook."
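, "\n", "\n", "Each mark in a dispersion plot is a word's position (its token offset) in the text. The cell below is a rough sketch of a case-insensitive alternative: it finds those positions after lowercasing each token, using a short made-up token list rather than the Hume text. Passing a lowercased copy of the text, such as `nltk.Text([token.lower() for token in listOfTokens])`, to `dispersion_plot` would plot positions found the same way." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tokens = ['Truth', 'is', 'truth', ',', 'said', 'the', 'truth', 'teller']\n", "target = 'truth'\n", "\n", "# Offsets of every token that matches the target once lowercased.\n", "offsets = [i for i, token in enumerate(tokens) if token.lower() == target]\n", "print(offsets)  # [0, 2, 6]"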
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYwAAAEWCAYAAAB1xKBvAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4wLCBo\ndHRwOi8vbWF0cGxvdGxpYi5vcmcvpW3flQAAFaNJREFUeJzt3XucZGV95/HPVwZFBEFgVFR0FK94\nyYgTDEQCbjZeEN2YxBUXN+Bq0Gh0vaDCYmT4w2wEo5KYDRhjSKIgyGrWJRp12TUaiEAPgqBAAAEF\n5KaZcJGIwG//OE9DTdvd8/R09XT38Hm/XvWqc55zznN+VXOmvlXPqT6VqkKSpI150GIXIElaHgwM\nSVIXA0OS1MXAkCR1MTAkSV0MDElSFwNDS0KSLyU5ZJ59HJrkH+fZx3eS7D+fPsZpHM/LJuxzbZJP\nbc59ankwMDRnSa5O8u/H2WdVvbSq/mqcfY5KsipJJbm93W5MckaSX5tSxzOr6msLVcdcLdTzkuSk\nJHe15+LHSb6a5Omb0M/YjwUtXQaGHmh2rKrtgF8Avgp8Psmhi1VMkhWLtW/g2PZcPA64CThpEWvR\nMmBgaKySHJjkgiTrk5yd5Dmtfff2TnbPNv+YJLdMDv8k+VqSN4z08ztJLklyW5Lvjmx3RJIrR9pf\nuSl1VtUNVXU8sBb4YJIHtf7ve8ecZK8kE0lubZ9IPtzaJz+tHJbk+iQ/TPKukdofNFLnj5KclmSn\nKdu+Psn3gf+bZJskn2rrrk9yXpJHTX1eWr/vS3JNkpuS/HWSHab0e0iS77fn9qjO5+InwMnAs6Zb\nnuQVbahufavnGa39b4DHA/+7fVJ5z1z/HbS8GBgam/ai/kngjcDOwInAF5I8pKquBN4LfDrJtsBf\nAidNN/yT5FUML+S/DTwceAXwo7b4SmBfYAfgGOBTSXadR9mfAx4JPG2aZccDx1fVw4HdgdOmLH8h\n8BTgRcARI0MzbwN+HdgPeAzwL8CfTtl2P+AZwIuBQ9rj2Y3heXsTcOc09Rzabi8EngRsB3xsyjov\naI/lV4H3T764zybJdsDBwLemWfZU4BTg7cBK4IsMAfHgqvrPwPeBl1fVdlV17Mb2peXNwNA4/Q5w\nYlWdU1X3tLH3nwK/BFBVfw5cDpwD7ArM9A74DQzDJefV4Iqquqb18dmqur6q7q2qU1t/e82j5uvb\n/U7TLPsZ8OQku1TV7VX1zSnLj6mqO6rqIoYAfE1rfyNwVFVdW1U/ZQi/35oy/LS2bXtn28/OwJPb\n87auqm6dpp6DgQ9X1feq6nbgSOCgKf0eU1V3VtWFwIUMQ28zOTzJeuAKhvA5dJp1Xg38XVV9tap+\nBnwIeCiwzyz9agtlYGicngC8qw1drG8vRrsxvMue9OcMQx9/0l5Mp7MbwyeJn5Pkt0eGvNa3vnaZ\nR82Pbfc/nmbZ64GnApe2YaIDpyz/wcj0Ndz/OJ/AcG5kssZLgHuAR82w7d8AXwY+04a4jk2y9TT1\nPKbtZ3SfK6b0e8PI9E8YgmAmH6qqHavq0VX1ivYpcNZ9VtW9rfbHTrOutnAGhsbpB8AH2ovQ5G3b\nqjoF7hv6+CjwF8DayXH9GfrZfWpjkicwBM7vATtX1Y7AxUDmUfMrGU74XjZ1QVVdXlWvYRiy+iBw\nepKHjayy28j047n/08oPgJdOeR62qarrRrsf2c/PquqYqtqD4Z37gQzDcVNdzxBGo/u8G7ix87Fu\nig32mSQMj3vysXi56wcQA0Obaut2snbytoLhxfxNSZ6fwcOSvCzJ9m2b44F1VfUG4O+AE2bo+xMM\nwyXPa/08uYXFwxheoG4GSPI
6ZjhRuzFJHpXk94CjgSPbO+ep67w2ycq2bH1rvmdkld9Psm2SZwKv\nA05t7ScAH2g1k2Rlkv8wSy0vTPLsJFsBtzIMUd0zzaqnAO9I8sQWvn8AnFpVd8/lsc/RacDLkvxq\n+9TzLoZhxrPb8hsZzqfoAcDA0Kb6IsOJ2cnb2qqaYDiP8TGGE71X0MbF2wvmSxhO6AK8E9gzycFT\nO66qzwIfYPjmzm3A3wI7VdV3gT8C/onhherZwFlzrHt9kjuAi4ADgFdV1SdnWPclwHeS3M4QdgdV\n1b+NLP+H9hjPZBje+UprPx74AvCVJLcB3wSeP0tNjwZOZwiLS1q/0/3h3CcZhq++DlwF/Bvw1tkf\n7vxU1WXAa4E/AW4BXs5wkvuutsp/B97Xht8OX8hatPjiDyhJc5NkFcML9tYL/O5eWlL8hCFJ6mJg\nSJK6OCQlSeriJwxJUpfFvPBZt1122aVWrVq12GVI0rKybt26W6pq5bj6WxaBsWrVKiYmJha7DEla\nVpJcs/G1+jkkJUnqYmBIkroYGJKkLgaGJKmLgSFJ6mJgSJK6GBiSpC4GhiSpi4EhSepiYEiSuhgY\nkqQuBoYkqYuBIUnqYmBIkroYGJKkLgaGJKmLgSFJ6mJgSJK6GBiSpC4GhiSpi4EhSepiYEiSuhgY\nkqQuBoYkqYuBIUnqYmBIkroYGJKkLgaGJKmLgSFJ6mJgSJK6GBiSpC4GhiSpi4EhSepiYEiSuhgY\nkqQuBoYkqYuBIUnqYmBIkroYGJKkLgaGJKmLgSFJ6mJgSJK6GBiSpC4GhiSpi4EhSepiYEiSuhgY\nkqQuBoYkqYuBIUnqYmBIkroYGJKkLgaGJKmLgSFJ6mJgSJK6GBiSpC4GhiSpi4EhSepiYEiSuhgY\nkqQuBoYkqYuBIUnqYmBIkroYGJKkLgaGJKmLgSFJ6mJgSJK6GBiSpC4GhiSpy6yBkbBzwgXtdkPC\ndSPzD+7ZQcJvJDx9ZP4fE1bPt3BJeiBYu3axK7jfrIFRxY+qWF3FauAE4COT81XcBZCQZNZ+fgPu\nDwxJUr9jjlnsCu63SUNSCU9OuDjhBOB8YLeE9SPLD0r4RMK+wAHAR9qnklVtlYMSzk24LGGf+T4I\nSdLCm885jD2Av6jiucB1061QxTeALwLvaJ9Krm6LUsVewLuB90+3bZLDkkwkmbj55pvnUaYkaRzm\nExhXVnHeJm77uXa/Du771LGBqvp4Va2pqjUrV67cxN1IksZlPoFxx8j0vUBG5rfZyLY/bff3ACvm\nUYMkaTMZy9dqq7gX+JeEp7QT4K8cWXwbsP049iNJDzRHH73YFdxvnH+H8V7g74EzgWtH2k8B/tuU\nk96SpA5L6Wu1qarFrmGj1qxZUxMTE4tdhiQtK0nWVdWacfXnX3pLkroYGJKkLgaGJKmLgSFJ6mJg\nSJK6GBiSpC4GhiSpi4EhSepiYEiSuhgYkqQuBoYkqYuBIUnqYmBIkroYGJKkLgaGJKmLgSFJ6mJg\nSJK6GBiSpC4GhiSpi4EhSepiYEiSuhgYkqQuBoYkqYuBIUnqYmBIkroYGJKkLgaGJKmLgSFJ6mJg\nSJK6GBiSpC4GhiSpi4EhSepiYEiSuhgYkqQuBoYkqYuBIUnqYmBIkroYGJKkLgaGJKmLgSFJ6mJg\nSJK6GBiSpC4GhiSpi4EhSepiYEiSuhgYkqQuBoYkqYuBIUnqYmBIkroYGJKkLgaGJKmLgSFJ6mJg\nSJK6GBiSpC4GhiSpi4EhSepiYEiSuhgYkqQuBoYkqYuBIUnqYmBIkroYGJKkLgaGJKlLd2Ak7Jjw\n5rnuIOHQhMeMzF+dsMtc+9lUa9fOfZv9959/H7NtO5/+5rqvzbFfbWghnudxHpPjMl0N+++/NGqb\nNFnLTDUtlVqXSh0bk6rqWzGsAs6o4llT2req4p5ZtvsacHgVE23+amBNFbf0FrlmzZqamJjoX
X3q\n/ul8iDNusyl9zLbtfPqb6742x361oYV4nsd5TI7LTMc2LH5tkyZrnOn5WgrP40LWkWRdVa0ZV38r\n5rDuHwK7J1wA/Ay4HfghsDrhAEbCJOFwYDvgYmAN8OmEO4G9W19vTXg5sDXwqiouHcujkSQtmLmc\nwzgCuLKK1cC7gb2Ao6rYY6YNqjgdmAAOrmJ1FXe2RbdUsSfwZ8Dh022b5LAkE0kmbr755jmUKUla\nCPM56X1uFVdt4rafa/frgFXTrVBVH6+qNVW1ZuXKlZu4G0nSuMwnMO4Ymb57Sl/bbGTbn7b7e5jb\nsJgkaZHM5cX6NmD7GZbdCDwyYWeGcxsHAn/fsd2CO/rouW+z337z72O2befT31z3tTn2qw0txPM8\nzmNyXKarYb/9fv4bXYtpssaZnq+l8DzC0qljY7q/JQWQcDLwHOBO4MYqDhxZ9jbgbcBVwHXA1VWs\nTfhN4A/aNnsDl9C+JZWwBvhQFfvPtt/5fEtKkh6oxv0tqTkFxmIxMCRp7sYdGP6ltySpi4EhSepi\nYEiSuhgYkqQuBoYkqYuBIUnqYmBIkroYGJKkLgaGJKmLgSFJ6mJgSJK6GBiSpC4GhiSpi4EhSepi\nYEiSuhgYkqQuBoYkqYuBIUnqYmBIkroYGJKkLgaGJKmLgSFJ6mJgSJK6GBiSpC4GhiSpi4EhSepi\nYEiSuhgYkqQuBoYkqYuBIUnqYmBIkroYGJKkLgaGJKmLgSFJ6mJgSJK6GBiSpC4GhiSpi4EhSepi\nYEiSuhgYkqQuBoYkqYuBIUnqYmBIkroYGJKkLgaGJKmLgSFJ6mJgSJK6GBiSpC4GhiSpi4EhSepi\nYEiSuhgYkqQuBoYkqYuBIUnqYmBIkroYGJKkLgaGJKmLgSFJ6mJgSJK6GBiSpC4GhiSpi4EhSepi\nYEiSuhgYkqQuBoYkqYuBIUnqYmBIkrqkqha7ho1KcjNwzRw22QW4ZYHKWUjWvXktx7qXY81g3Zvb\nZN1PqKqV4+p0WQTGXCWZqKo1i13HXFn35rUc616ONYN1b24LVbdDUpKkLgaGJKnLlhoYH1/sAjaR\ndW9ey7Hu5VgzWPfmtiB1b5HnMCRJ47elfsKQJI2ZgSFJ6rLFBUaSlyS5LMkVSY5YhP1/MslNSS4e\nadspyVeTXN7uH9Hak+SPW63fTrLnyDaHtPUvT3LISPvzklzUtvnjJBlT3bsl+X9JLknynST/dTnU\nnmSbJOcmubDVfUxrf2KSc1oNpyZ5cGt/SJu/oi1fNdLXka39siQvHmlfkGMqyVZJvpXkjGVU89Xt\n3/CCJBOtbUkfI63fHZOcnuTSdozvvdTrTvK09jxP3m5N8vZFrbuqtpgbsBVwJfAk4MHAhcAem7mG\nXwH2BC4eaTsWOKJNHwF8sE0fAHwJCPBLwDmtfSfge+3+EW36EW3ZucDebZsvAS8dU927Anu26e2B\nfwb2WOq1t762a9NbA+e0ek4DDmrtJwC/26bfDJzQpg8CTm3Te7Tj5SHAE9txtNVCHlPAO4GTgTPa\n/HKo+WpglyltS/oYaf3+FfCGNv1gYMflUPdI/VsBNwBPWMy6F+yFczFu7YF/eWT+SODIRahjFRsG\nxmXArm16V+CyNn0i8Jqp6wGvAU4caT+xte0KXDrSvsF6Y34M/wv4teVUO7AtcD7wfIa/cl0x9bgA\nvgzs3aZXtPUy9ViZXG+hjingccCZwL8Dzmg1LOmaW19X8/OBsaSPEeDhwFW0L/ksl7qn1Poi4KzF\nrntLG5J6LPCDkflrW9tie1RV/RCg3T+ytc9U72zt107TPlZtyOO5DO/Wl3ztbWjnAuAm4KsM767X\nV9Xd0+zrvvra8n8Fdt6ExzNfHwXeA9zb5ndeBjUDFPCVJOuSHNbalvox8iTgZuAv2xDgJ5I8bBnU\nPeog4JQ2vWh1b2mBMd3421L+3vBM9c61fXwFJdsB/xN4e
1XdOtuqM9Sy2WuvqnuqajXDu/a9gGfM\nsq9FrzvJgcBNVbVutHmW/Sx6zSN+uar2BF4KvCXJr8yy7lKpewXDMPGfVdVzgTsYhnJmslTqHooZ\nzmW9AvjsxladoY6x1b2lBca1wG4j848Drl+kWkbdmGRXgHZ/U2ufqd7Z2h83TftYJNmaISw+XVWf\nW061A1TVeuBrDOO3OyZZMc2+7quvLd8B+PFG6h73MfXLwCuSXA18hmFY6qNLvGYAqur6dn8T8HmG\ngF7qx8i1wLVVdU6bP50hQJZ63ZNeCpxfVTe2+cWre5zjbIt9Y3gn8T2GE4CTJ/ueuQh1rGLDcxjH\nseFJqmPb9MvY8CTVua19J4Yx10e021XATm3ZeW3dyZNUB4yp5gB/DXx0SvuSrh1YCezYph8KfAM4\nkOHd2OgJ5De36bew4Qnk09r0M9nwBPL3GE40LugxBezP/Se9l3TNwMOA7UemzwZestSPkdbvN4Cn\ntem1reYlX3fr+zPA65bC/8kFe9FcrBvDNwX+mWEc+6hF2P8pwA+BnzEk+OsZxpvPBC5v95P/WAH+\ntNV6EbBmpJ//AlzRbqMHyxrg4rbNx5hyIm8edb+A4ePot4EL2u2ApV478BzgW63ui4H3t/YnMXwD\n5AqGF+KHtPZt2vwVbfmTRvo6qtV2GSPfFlnIY4oNA2NJ19zqu7DdvjPZ71I/Rlq/q4GJdpz8LcML\n53Koe1vgR8AOI22LVreXBpEkddnSzmFIkhaIgSFJ6mJgSJK6GBiSpC4GhiSpi4GhLUKSjyR5+8j8\nl5N8YmT+j5K8cx79r01y+AzLDmtXQb00w5VzXzCybN8MV9G9IMlDkxzX5o+b4/5XJflPm1q/NA4G\nhrYUZwP7ACR5ELALwx+2TdoHOKunoyRb9e60XebjjcALqurpwJuAk5M8uq1yMPChqlpdVXe2dfes\nqnf37qNZBRgYWlQGhrYUZ9ECgyEoLgZuS/KIJA9huL7Ut9pvBhyX5OL2OwCvBkiyf4bfAzmZ4Y+e\nSHJUht+U+D/A02bY73uBd1fVLQBVdT7DpbTfkuQNwH8E3p/k00m+wPAX0uckeXWSV7U6Lkzy9bbP\nrVp957XfNHhj288fAvu2TyrvGOcTJ/VasfFVpKWvqq5PcneSxzMExz8xXHlzb4aru367qu5K8psM\nf/X7CwyfQs6bfLFmuC7Ss6rqqiTPY7gMx3MZ/p+cD6zj5z1zmvYJ4JCq+v02PHVGVZ0OkOT2Gi6U\nSJKLgBdX1XVJdmzbvh7416r6xRZ0ZyX5CsMlIA6vqgPn90xJm87A0JZk8lPGPsCHGQJjH4bAOLut\n8wLglKq6h+Eibv8A/CJwK8O1d65q6+0LfL6qfgLQPh30Cn1XKz0LOCnJacDkxR5fBDwnyW+1+R2A\npwB3zWH/0oJwSEpbksnzGM9mGJL6JsMnjNHzF7P9BOUdU+Z7XvS/CzxvStuerX1WVfUm4H0MVxK9\nIMnOrb63tnMeq6vqiVX1lY46pAVnYGhLchbDlWp/XMNvZPyY4ac492YYogL4OvDqdq5gJcNP6p47\nTV9fB17Zvtm0PfDyGfZ5LPDB9mJPktXAocD/2FixSXavqnOq6v0Mv6K3G8Ov5v1uu9Q8SZ7afuzn\nNoafzpUWjUNS2pJcxHBe4uQpbdtNnpRm+A2HvRmuuFrAe6rqhiRPH+2oqs5PcirDVXuvYbg89s+p\nqi8keSxwdpJieGF/bbVfRNuI45I8heFTxZmtpm8zfCPq/CRh+KW4X2/tdye5EDipqj7S0b80Vl6t\nVpLUxSEpSVIXA0OS1MXAkCR1MTAkSV0MDElSFwNDktTFwJAkdfn/W3/7luQrlHIAAAAASUVORK5C\nYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": 
[ "%matplotlib inline\n", "theText.dispersion_plot([\"Truth\",\"truth\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Counting Words and Frequencies\n", "\n", "You can also count words. Counting with the `Text` object is case sensitive, so \"Truth\" and \"truth\" are counted separately. " ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 20\n" ] } ], "source": [ "print(theText.count(\"Truth\"), theText.count(\"truth\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To count case-insensitively, we use a [list comprehension](http://python-3-patterns-idioms-test.readthedocs.io/en/latest/Comprehensions.html) to lowercase every token, producing a new list of tokens. At the same time, we drop punctuation by keeping only tokens whose first character is a letter (`token[0].isalpha()`); this also drops tokens that start with a digit. Then we can count items in the new list." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['the', 'project', 'gutenberg', 'ebook', 'of', 'an', 'enquiry', 'concerning', 'human', 'understanding', 'by', 'david', 'hume', 'and', 'l.', 'a.', 'selby-bigge', 'this', 'ebook', 'is']\n" ] } ], "source": [ "theLowerTokens = [token.lower() for token in listOfTokens if token[0].isalpha()]\n", "print(theLowerTokens[:20])" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "21" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "theLowerTokens.count(\"truth\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NLTK's `FreqDist` counts how often each word occurs. Its `tabulate` method displays the most frequent words and their counts as a table, and the frequency distribution object can also be queried for individual words. 
" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " the of and to a in that is it which or be we by from \n", " 3499 2848 2210 1809 1165 1117 1002 955 786 750 711 674 663 564 529 \n" ] } ], "source": [ "theLowerFreqs = nltk.FreqDist(theLowerTokens)\n", "theLowerFreqs.tabulate(15)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "21" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "theLowerFreqs[\"truth\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Rather than get the count we can get the relative frequency which is the count divided by the number of tokens." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.058443293803240356" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "theLowerFreqs.freq(\"the\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plot the Frequency of Words\n", "We can also plot the high frequency words." 
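, "\n", "\n", "`FreqDist.plot` draws the most frequent words. The same numbers can be retrieved as data with `most_common(n)`, which returns `(word, count)` pairs sorted by count. `N()` gives the total number of tokens that `freq` divides by. The cell below is a small sketch on made-up data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import nltk\n", "\n", "toyFreqs = nltk.FreqDist('the cat and the hat and the bat'.split())\n", "\n", "print(toyFreqs.most_common(2))  # [('the', 3), ('and', 2)]\n", "print(toyFreqs.N())             # 8 tokens in total\n", "print(toyFreqs.freq('the'))     # 3 / 8 = 0.375"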
] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAY4AAAEdCAYAAAAb9oCRAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4wLCBo\ndHRwOi8vbWF0cGxvdGxpYi5vcmcvpW3flQAAIABJREFUeJzt3Xl8nFW9+PHPd7KnaZbu6UIXKBRa\nuiXQAmVXtqsCIioqFi5aFxCveBXwiiCK4lXh5waKllUU2by2BYVaoFBKaZPuGzRdaEOXNG3WZk++\nvz+eM+l0OpPMJJlMlu/79XpeM3PmPOc5mZnMd87ynEdUFWOMMSZSvnhXwBhjTO9igcMYY0xULHAY\nY4yJigUOY4wxUbHAYYwxJioWOIwxxkTFAocxxpioWOAwxhgTFQscxhhjopIY7wrEwpAhQ3TcuHEd\n3r+2tpa0tLQuy2dlWplWppXZE8sMVlhYWKqqQ9vNqKp9bsvLy9POKCgo6NJ8VqaVaWVamT2xzGBA\ngUbwHWtdVcYYY6JigcMYY0xULHAYY4yJigUOY4wxUbHAYYwxJioxCxwikioiK0VknYhsEpEfuvTH\nRWSniKx123SXLiLyaxEpEpH1IjIzoKy5IrLNbXNjVWdjjDHti+V5HPXARapaLSJJwDIR+ad77juq\n+nxQ/suBiW6bBTwMzBKRQcDdQD6gQKGILFDVslhUuq6xmcr6llgUbYwxfULMWhxuWnC1e5jktrau\nU3sl8KTbbwWQLSK5wKXAYlU97ILFYuCyWNT5+cJipv7wVZ7bXN1+ZmOM6adEY3jNcRFJAAqBk4Df\nqertIvI4cBZei2QJcIeq1ovIIuB+VV3m9l0C3A5cAKSq6o9d+l1Arar+IuhY84B5ALm5uXkLFy6M\nur5bShv4/uuHGZnh4zeXD2s3f01NDenp6RGVHWleK9PKtDKtzO4qM1h+fn6hqua3mzGSswQ7uwHZ\nwOvAFCAXECAFeAL4gcvzEjAnYJ8lQB7wHeD7Ael3Ad9u63gdPXO8oalZT7vrnzr29kW6r7y23fzx\nPuvTyrQyrUwrszNlBqMnnTmuquXAG8BlqrrP1bEeeAw402UrBsYE7DYa2NtGepdLSvAxa8JgAN4u\nKo3FIYwxpteL5ayqoSKS7e6nAR8BtrpxC0REgKuAjW6XBcAX3eyq2UCFqu4DXgEuEZEcEckBLnFp\nMXHOSUMACxzGGBNOLGdV5QJPuHEOH/Csqi4SkddEZChed9Va4Ksu/8vAFUARUAPcCKCqh0XkR8Aq\nl+9eVT0cq0rPcYFjWVEpqooX34wxxvjFLHCo6npgRoj0i8LkV+DmMM89CjzapRUM4+ThGWSn+Cip\nqqeopJqJwwd2x2GNMabXsDPHg4gIU4cnA16rwxhjzLEscIRwugscNs5hjDHHs8ARwtRhKQCs2HGY\nxmY7i9wYYwJZ4AhhSHoCE4YOoLq+ifXF5fGujjHG9CgWOMJonV217VCca2KMMT2LBY4w7HwOY4wJ\nzQJHGLMnDMYnsHp3GUfqm+JdHWOM6TEscISRlZbE1NHZNLUoK3fG7HxDY4zpdSxwtCHwLHJjjDEe\nCxxtsHEOY4w5ngWONswcm01qko+t+6soqaqLd3WMMaZHsMDRhpTEBM4c7y2z/s52m5ZrjDFggaNd\nc07yAseybdZdZYwxYIGjXYHjHBrDy+waY0xvYYGjHaeOyGTQgGT2VtSxs/RIvKtjjDFxZ4GjHT6f\ncPaJdjlZY4zxs8ARgTmt3VU2QG6MMRY4IuAf51i+vZTmFhvnMMb0bxY4IjBmUDpjB6dTWdfExg8r\n4l0dY4yJKwscETrHlh8xxh
jAAkfE5tjyI8YYA8QwcIhIqoisFJF1IrJJRH7o0seLyLsisk1E/iYi\nyS49xT0ucs+PCyjrTpf+nohcGqs6t+WsCYMRgYJdZdQ2NMejCsYY0yPEssVRD1ykqtOA6cBlIjIb\n+BnwoKpOBMqAm1z+m4AyVT0JeNDlQ0ROAz4LTAYuAx4SkYQY1juknAHJTBmZRUNzCwUf2DLrxpj+\nK2aBQz3V7mGS2xS4CHjepT8BXOXuX+ke456/WETEpT+jqvWquhMoAs6MVb3bYuMcxhgDEstlNFzL\noBA4Cfgd8HNghWtVICJjgH+q6hQR2QhcpqrF7rntwCzgHrfPn136fLfP80HHmgfMA8jNzc1buHBh\nh+tdU1NDenr6cenrDtRz75tlTMhO5OcfHRI2XzRldjSflWllWplWZmfLDJafn1+oqvntZlTVmG9A\nNvA6cC5QFJA+Btjg7m8CRgc8tx0YjBdwvhCQPh+4pq3j5eXlaWcUFBSETK9taNKJ//OyjrtjkR6q\nrg+bL5oyO5rPyrQyrUwrs7NlBgMKNILv9G6ZVaWq5cAbwGwgW0QS3VOjgb3ufrELJLjns4DDgekh\n9ulWqUkJnDEuB1VbZt0Y03/FclbVUBHJdvfTgI8AW/BaHp9y2eYC/3D3F7jHuOdfcxFwAfBZN+tq\nPDARWBmrerfHxjmMMf1dLFscucDrIrIeWAUsVtVFwO3AbSJShNcVNd/lnw8Mdum3AXcAqOom4Flg\nM/Av4GZVjdt8WDufwxjT3yW2n6VjVHU9MCNE+g5CzIpS1Trg2jBl3Qfc19V17IjJI7PISkti9+Ea\n9ld3bADKGGN6MztzPEoJAcusbyhpiHNtjDGm+1ng6AD/OMf6AxY4jDH9jwWODvCPc2woqafFllk3\nxvQzFjg6YOzgdEZlp1HVoGzeVxnv6hhjTLeywNEBIsK5E71Wx1vbbHaVMaZ/scDRQXMm+s/nOBjn\nmhhjTPeywNFB55w4BAFW7bRl1o0x/YsFjg7KGZDMhJxEGppbWLnLllk3xvQfFjg6YdrwFACWbbPu\nKmNM/2GBoxOmDk8GbIDcGNO/WODohEmDk0lLSmDr/ipKKuviXR1jjOkWFjg6ISlBmDVhEGCr5Rpj\n+g8LHJ3kP4t8mXVXGWP6CQscnXTeyUMBeKuo1H+FQmOM6dMscHTSxGEZDM9M4WBVPe8dqIp3dYwx\nJuYscHSSiDDnJK/VYd1Vxpj+wAJHF7B1q4wx/YkFji7gvz7HuzsPUddoy48YY/o2CxxdYOjAFE7N\nzaSusYXVH5TFuzrGGBNTFji6iL+76k3rrjLG9HEWOLpI6/kctsy6MaaPi1ngEJExIvK6iGwRkU0i\n8k2Xfo+IfCgia912RcA+d4pIkYi8JyKXBqRf5tKKROSOWNW5M84cP4jkRB+b9lZyqLo+3tUxxpiY\niWWLown4tqqeCswGbhaR09xzD6rqdLe9DOCe+ywwGbgMeEhEEkQkAfgdcDlwGnBdQDk9RmpSAmeO\nG4QqvL39ULyrY4wxMROzwKGq+1R1tbtfBWwBRrWxy5XAM6par6o7gSLgTLcVqeoOVW0AnnF5e5zW\nqwLaMuvGmD6sW8Y4RGQcMAN41yXdIiLrReRREclxaaOAPQG7Fbu0cOk9zrkTj65bZcuPGGP6Kon1\nF5yIZABLgftU9UURGQ6UAgr8CMhV1f8Ukd8B76jqn91+84GX8YLbpar6JZd+PXCmqn4j6DjzgHkA\nubm5eQsXLuxwnWtqakhPT486X4sqNy08SGV9C7+6dAijMxM7XWYs6mllWplWppUZSn5+fqGq5reb\nUVVjtgFJwCvAbWGeHwdsdPfvBO4MeO4V4Cy3vRKQfky+UFteXp52RkFBQYfz3frX1Tr29kX62LId\nXVZmZ/NamVamlWllRgIo0Ai+22M5q0qA+cAWVX0gID03INvVwEZ3fwHwWRFJEZHxwERgJbAK
mCgi\n40UkGW8AfUGs6t1Z/mm5tvyIMaavSmw/S4edA1wPbBCRtS7te3izoqbjdVXtAr4CoKqbRORZYDPe\njKybVbUZQERuwWuBJACPquqmGNa7U86d6C14uGLHIRqbW0hKsFNljDF9S8wCh6ouAyTEUy+3sc99\nwH0h0l9ua7+eZERWKhOHZbCtpJo1u8s5c/ygeFfJGGO6lP0cjoE5ravl2rRcY0zfY4EjBmyZdWNM\nX2aBIwZmjR9MUoKwvriciprGeFfHGGO6lAWOGBiQksjME3JoUVi+3Vodxpi+xQJHjLR2VxVZ4DDG\n9C0WOGLEPy3XBsiNMX2NBY4YmTIqi6y0JPYcruWDQ0fiXR1jjOkyFjhiJMEndha5MaZPssARQ3Y+\nhzGmL7LAEUP+Fsfy7YdobrFl1o0xfYMFjhgaMyid8UMGUFXXxPYyO5/DGNM3WOCIMX+rY+2BhjjX\nxBhjuoYFjhi74BRvWu6bH9TSYt1Vxpg+wAJHjJ1/8lBGZaexr7qZpTZIbozpAyxwxFhigo/rzxoL\nwGNv74pvZYwxpgtY4OgGnz1jDMkJ8Ob7BykqqY53dYwxplMscHSD7PRkzh+bBsCT7+yKa12MMaaz\nLHB0kytOSgfg+cJiKmptaq4xpveKOnCISI6ITI1FZfqyE7KSOPvEwdQ0NPNcwZ54V8cYYzososAh\nIm+ISKaIDALWAY+JyAOxrVrfc+M54wF44p1ddia5MabXirTFkaWqlcAngcdUNQ/4SOyq1TddNGkY\nYwalsedwLa9tLYl3dYwxpkMiDRyJIpILfBpYFMP69GkJPmHuWeMAeHz5zvhWxhhjOijSwPFD4BWg\nSFVXicgEYFtbO4jIGBF5XUS2iMgmEfmmSx8kIotFZJu7zXHpIiK/FpEiEVkvIjMDyprr8m8Tkbkd\n+1N7hmvzx5CenMDbRYd4b39VvKtjjDFRizRw7FPVqar6dQBV3QG0N8bRBHxbVU8FZgM3i8hpwB3A\nElWdCCxxjwEuBya6bR7wMHiBBrgbmAWcCdztDza9UVZaEtfMHA3A48t3xbcyxhjTAZEGjt9EmNZK\nVfep6mp3vwrYAowCrgSecNmeAK5y968EnlTPCiDbdY9dCixW1cOqWgYsBi6LsN490tyzxwHw9zXF\nlNfY4ofGmN5FVMPP7hGRs4Czgf8CHgx4KhO4WlWnRXQQkXHAm8AUYLeqZgc8V6aqOSKyCLhfVZe5\n9CXA7cAFQKqq/til3wXUquovgo4xD6+lQm5ubt7ChQsjqVpINTU1pKend1m+UHl/9OZh1h5o4PrT\nM7hqUkaXlBmLelqZVqaV2XfLDJafn1+oqvntZlTVsBtwPl430T53699uAya2tW9AGRlAIfBJ97g8\n6Pkyd/sSMCcgfQmQB3wH+H5A+l14XWBhj5mXl6edUVBQ0KX5QuV9bcsBHXv7Ij37p0u0sam5S8rs\nbD4r08q0MvtXmcGAAo3gez2xnaCyFFgqIo+r6gdRBC4ARCQJeAF4WlVfdMkHRCRXVfe5rij/vNRi\nYEzA7qOBvS79gqD0N6KtS09z/slDGT9kADtLj7B48wEuPz033lUyxpiIRDrGkSIij4jIqyLymn9r\nawcREWA+sEVVAwfSFwD+mVFzgX8EpH/Rza6aDVSo6j682VyXuDPWc4BLXFqv5vMJc/2r5toguTGm\nF2mzxRHgOeD3wJ+A5gj3OQe4HtggImtd2veA+4FnReQmYDdwrXvuZeAKoAioAW4EUNXDIvIjYJXL\nd6+qHo6wDj3aNXmj+cWr77Ny52E27a1g8siseFfJGGPaFWngaFLVh6MpWL1Bbgnz9MUh8itwc5iy\nHgUejeb4vcHA1CSuzR/NY2/v4vG3d/HzayOaa2CMMXEVaVfVQhH5uojkuhP4BrnzK0wnzT1rHCLw\nj3V7OVRdH+/qGGNMuyINHHPxZjctx5shVQgUxKpS/cm4
IQO48JRhNDS18MwqWzXXGNPzRRQ4VHV8\niG1CrCvXX9x4zjgAnnrnA5ps1VxjTA8X0RiHiHwxVLqqPtm11emf5pw0hJOGZVBUUs27H6Yw64x4\n18gYY8KLtKvqjIDtXOAe4BMxqlO/IyKty5C8tK0mvpUxxph2RNpV9Y2A7cvADCA5tlXrX66ZOYrM\n1ETeO9RIUYmtmmuM6bk6es3xGrxVbE0XSU9O5KJJwwB48/3SONfGGGPCi/TSsQtFZIHbXgLe4+gZ\n36aLzJk4FIBlRRY4jDE9V6QnAAauRNsEfKCqxTGoT78256QhAKzYcYjG5haSEjraIDTGmNiJdIxj\nKbAVGAjkAHYRiRgYkZXK6IEJ1DQ0s2Z3ebyrY4wxIUXaVfVpYCXeulKfBt4VkU/FsmL91dThKQAs\n23YwzjUxxpjQIu0L+R/gDFWdq6pfxLuE612xq1b/NXW4N1ntLRvnMMb0UJEGDp+qlgQ8PhTFviYK\nU4Ymk+gT1u0pp7KuMd7VMcaY40T65f8vEXlFRG4QkRvwrtb3cuyq1X+lJfmYcUI2LQrvbD8U7+oY\nY8xx2gwcInKSiJyjqt8B/gBMBaYB7wCPdEP9+qU5J7lpudusu8oY0/O01+L4f0AVgKq+qKq3qeq3\n8Fob/y/Wleuv5kwcDMDbNs5hjOmB2gsc41R1fXCiqhYA42JSI8O00dkMTElkR+kRPiyvjXd1jDHm\nGO0FjtQ2nkvryoqYoxITfMw+0Wt12LRcY0xP017gWCUiXw5OdNcLL4xNlQzAuRO9s8jfsnEOY0wP\n096SI/8F/F1EPs/RQJGPtzLu1bGsWH93jlt+ZPn2Q7S0KD5fuMu3G2NM92ozcKjqAeBsEbkQmOKS\nX1LV12Jes35uwpABjMxKZW9FHZv3VTJlVFa8q2SMMUDka1W9rqq/cVtEQUNEHhWREhHZGJB2j4h8\nKCJr3XZFwHN3ikiRiLwnIpcGpF/m0opE5I5o/rjeTESY47qrbLVcY0xPEsuzvx8HLguR/qCqTnfb\nywAichrwWWCy2+chEUkQkQTgd8DlwGnAdS5vv+DvrrLzOYwxPUnMAoeqvgkcjjD7lcAzqlqvqjuB\nIrz1sM4EilR1h6o2AM+4vP2CP3Cs3HWYusbmONfGGGM8oqqxK1xkHLBIVae4x/cANwCVQAHwbVUt\nE5HfAitU9c8u33zgn66Yy1T1Sy79emCWqt4S4ljzgHkAubm5eQsXLuxwvWtqakhPT++yfJ0p878X\nl7KzvIkfnJfDNLdybk+sp5VpZVqZva/MYPn5+YWqmt9uRlWN2YZ3kuDGgMfDgQS8ls59wKMu/XfA\nFwLyzQeuwVvG/U8B6dcDv2nvuHl5edoZBQUFXZqvM2X+5KXNOvb2RfqTlzd3WZldkdfKtDKtzN5f\nZjCgQCP4bu/WFW5V9YCqNqtqC/BHvK4ogGJgTEDW0cDeNtL7DX93lS0/YozpKbo1cIhIbsDDqwH/\njKsFwGdFJEVExgMT8S4ctQqYKCLjRSQZbwB9QXfWOd7OHD+I5EQfm/ZWcviIXXjRGBN/MQscIvJX\nvFV0TxGRYne2+f+KyAYRWQ9cCHwLQFU3Ac8Cm4F/ATe7lkkTcAvwCrAFeNbl7TdSkxI4Y1wOqtbq\nMMb0DO2dOd5hqnpdiOT5beS/D2/cIzj9Zfr5tT/OOWkIbxcdYtm2Uj4+bWS8q2OM6efsKn69wLn+\n63MUlfonCRhjTNxY4OgFJo/MJCc9iQ/La9l1qCbe1THG9HMWOHoBn084u/Uscltm3RgTXxY4eok5\nJ9ky68aYnsECRy/hDxzv7DhEU3NLnGtjjOnPLHD0EmMGpTNucDpVdU2s/7Ai3tUxxvRjFjh6kdZl\n1q27yhgTRxY4ehF/d5Vdn8MYE08WOHqRs04cgk9gze4yjtQ3xbs6xph+ygJHL5KVlsTU0dk0Nivv\n7jwU7+oYY/opCxy9
jE3LNcbEmwWOXsY/QG4LHhpj4sUCRy8z84Qc0pMTeP9ANYdr7XKyxpjuZ4Gj\nl0lO9DFr/CAAVu+rj3NtjDH9kQWOXsh/VcCHCyv5yANLuXfhZt54r4TaBmuBGGNiL2bX4zCxc83M\n0azeXcZrWw5QVFJNUUk1j769s7U1ct7EoZx38lBOHp6BiMS7usaYPsYCRy+UMyCZhz6fx4pVBTB4\nPG++f5A3tx1k44eVvLWtlLe2lXLfy1sYkZnKuROHcFJqHTNnqgURY0yXsMDRiyX5hLwJg5k9YTDf\nvWwSpdX1LNtW6gJJKfsr63iusBiAVYcLuO/q0xmemRrnWhtjejsLHH3IkIwUrpoxiqtmjKKlRdmy\nv5LXt5bw0Gvb+PeWElbuXMo9n5jM1TNGWevDGNNhFjj6KJ9PmDwyi8kjszg5qYy/FMEb7x3ktmfX\n8dL6ffzkk9b6MMZ0jM2q6gcGpyfw2A1n8PNPTWVgaiJLtpbw0QeW8kJhsV3D3BgTtZgFDhF5VERK\nRGRjQNogEVksItvcbY5LFxH5tYgUich6EZkZsM9cl3+biMyNVX37OhHh2vwxLP7W+Vw0aRiVdU18\n+7l13PREAfsr6uJdPWNMLxLLFsfjwGVBaXcAS1R1IrDEPQa4HJjotnnAw+AFGuBuYBZwJnC3P9iY\njhmRlcr8ufn88tppZKYm8trWEj764FKeK9hjrQ9jTERiFjhU9U3gcFDylcAT7v4TwFUB6U+qZwWQ\nLSK5wKXAYlU9rKplwGKOD0YmSiLCNXmjWXzb+Vw8aRhVdU185/n1/Ofjq3j/UINdmtYY06buHhwf\nrqr7AFR1n4gMc+mjgD0B+YpdWrh00wWGZ6byp7n5/H3Nh9yzYBOvv3eQ19+Dny5fzNknDuHck4dw\n3sShjBmUHu+qGmN6EIll94SIjAMWqeoU97hcVbMDni9T1RwReQn4qaouc+lLgO8CFwEpqvpjl34X\nUKOqvwxxrHl43Vzk5ubmLVy4sMP1rqmpIT29/S/LSPP1hjLLapt5cesRCvfWcaDm2BbHiAEJTBuR\nzLThKUwZlsyAJF+f+tutTCuzv5YZLD8/v1BV89vNqKox24BxwMaAx+8Bue5+LvCeu/8H4LrgfMB1\nwB8C0o/JF27Ly8vTzigoKOjSfL2tzN2HjujTKz7Qrz5VoKff/S8de/ui1m3CnS/pJx96W+/5y1Jt\nbm6Jaz2tTCvTyuxcmcGAAo3gu727u6oWAHOB+93tPwLSbxGRZ/AGwivU68p6BfhJwID4JcCd3Vzn\nfmfMoHQ+N+sEPjfrBJpblPXF5by1rZRl20pZvbuMwg/KKPwAqhLX87NrppLgs5MJjelPYhY4ROSv\nwAXAEBEpxpsddT/wrIjcBOwGrnXZXwauAIqAGuBGAFU9LCI/Ala5fPeqavCAu4mhBJ8w44QcZpyQ\nw60XT6S6vol/bz7A7c+v4/nCYpqaW/jFtdNITLBTgozpL2IWOFT1ujBPXRwirwI3hynnUeDRLqya\n6YSMlESumjGKygO7uX95Bf+3di9NLcqDn5lOkgUPY/oF+083HTJ5aDJP3XQmGSmJLFq/j2/8ZQ0N\nTTaN15j+wAKH6bC8sYP485dmMTA1kX9t2s/Xny6kvskuJmVMX2eBw3TK9DHZ/PXLs8lOT+LfW0r4\nylOF1DVa8DCmL7PAYTptyqgs/vKl2QwakMwb7x3ky08W2GVsjenDLHCYLnHayEyemTebIRkpvLWt\nlBsfX8mR+qZ4V8sYEwMWOEyXOXn4QJ6ZN5thA1NYseMwNzy2kmoLHsb0ORY4TJc6aVgGf/vKWeRm\npbJqVxnXz3+XvVVNtvKuMX2IXQHQdLnxQwbwt3lncd0fV7Bmdznf2A3fX7qYqaOzmDY627sdk21X\nIDSml7LAYWLihMHp/O0rs/nJy1tY/n4J5bWNvLWtlLe2lbbmGZGZ2hpEpo3OpqXBzg
MxpjewwGFi\nZnROOg99Po+CggJGnjSZ9cXlrCuuYH1xOev3VLC/so79m+t4dfOB1n1OXrGUmSfkMHNsDnljc5gw\nZAAithaWMT2JBQ4TcyLCyOw0RmancdmUXABaWpSdh454wWRPBeuKy9lYXM77B6p5/0A1z6zyLsOS\nnZ7EzBO8IDLzhBymjckiPdk+tsbEk/0Hmrjw+YQTh2Zw4tAMrp4xGoAVKwtIHnEiqz/wr8BbRklV\nPa9tLeG1rSWAt+jiqbkDGZXayEcpZvqYbCYMGYDPVug1pttY4DA9RlKCeN1UJ+TwpXO9a8V8WF5L\n4QdlXjDZXcaWfVVs/LCSjcAr29cBMDA1keljspnuxkqmn5DNkIyU+P4xxvRhFjhMjyUijM5JZ3RO\nOldO964YXNPQxLo9Fby0YiMHWwawdk85Byrrjxt4H52TxvQx2QxsrmZzwy4y05LITE0iMy2JrLTE\n1vupSQnx+vOM6bUscJheJT05kbNOHExyeQZ5eXkA7K+oY+2eMtbsKWft7nI2fFhBcVktxWW13k6b\nNoUtLznRR2ZqEjnpSZwxDCZPbbZgYkw7LHCYXm9EViqXZeW2Drw3tyjbSqpYu7ucFZt2MCB7MJV1\nTVTUNlJZ20hlnXdbUdtIQ1MLpdX1lFbXs60ElhYv5XtXnMoVp4+w2VzGhGGBw/Q5CT5h0ohMJo3I\nZGLCQfLyTg+ZT1Wpb2qhsraRTfsqufvFNewur+Xmv6zmzHGD+MHHT2PKqKxurr0xPZ8FDtNviQip\nSQmkJiUwLDOVAR8ZzLaWofzy1fdZueswH//tMq7NG81/X3oKwwbaWe7G+NlaVcY4CT7h87PG8vp/\nX8CX5ownQYRnC4q58Odv8NAbRXadEWMcCxzGBMlKS+L7HzuNV791Hh85dRhHGpr533+9x0cfXMo/\nN+yzBRtNv2ddVcaEMWFoBn+aewZvbTvIjxZt5v0D1Xzt6dWMz07kgn2bmDo6i9NH2QmIpv+xwGFM\nO86dOJSXbz2Xv67awwOvvsfO8kZ2vr2r9fkByQlMHpXF1FFZnD46i6mjsxk7KN2Ciemz4hI4RGQX\nUAU0A02qmi8ig4C/AeOAXcCnVbVMvDmRvwKuAGqAG1R1dTzqbfqvxAQf188ey9UzRvHMv1dSlzaU\n9cUVbPiwgn0VdazceZiVOw+35h+YmsiUkVkMTqhlh+7h1NxMThqWYeeImD4hni2OC1W1NODxHcAS\nVb1fRO5wj28HLgcmum0W8LC7NabbZaQkMmNECnl5E1vTDlbVs/HDChdIyllfXEFJVT3v7DgEwKJt\n6wFv8H38kAFMGjGQU3MzmTRiIJNyMxmZZTO2TO/Sk7qqrgQucPefAN7ACxxXAk+qNyK5QkSyRSRX\nVffFpZbGBBk6MIULJw3jwknDWtMOVNaxobiC11ZvpdI3kK37q9hxsJqiEm9btP7ox3dgaiLD0mD0\nupXkpCeRnZ7MoAHJrfdz0pNRytuIAAAbA0lEQVTJGZBEjks3Jt4kHjNERGQnUAYo8AdVfUREylU1\nOyBPmarmiMgi4H5VXebSlwC3q2pBUJnzgHkAubm5eQsXLuxw/WpqakhPT++yfFamlQlQ36x8WNnE\nropGPihvYndFE7sqmqisj+4CVtkpwujMJEZlJjBqYKK3ZSYyOM2HL+Bs9570t1uZPbPMYPn5+YWq\nmt9evni1OM5R1b0iMgxYLCJb28gbaoTxuGinqo8AjwDk5+erfx2jjigsLCSS/SPNZ2VamW0pqapj\nyTtrGH7CBA4faaS8poGymgbKarz7h480UF7TSJn/fr1SfrCBjQePLSctKYEJQwe0LlffVFHLjMmj\nyUpLIistmez0JLLSkkhKOH4Wfl96Pa3MyMvsqLgEDlXd625LROTvwJnAAX8XlIjkAiUuezEwJmD3\n0cDebq2wMTE0bGAqJw9OJm/S8HbztrQoryxbSd
rw8RSVVLP94BG2H6xmx8FqSqsb2LS3kk17K4/u\nsKrguDIyUhJdMEkiO93rAkturGInxZw4dAAThmaQlZbUlX+i6WO6PXCIyADAp6pV7v4lwL3AAmAu\ncL+7/YfbZQFwi4g8gzcoXmHjG6a/8vmEYQMSyTtlGBecMuyY58prGth+8Ig3lnKwms079+JLHUiF\nW9CxvKaBitpGquubqK5v4sPy2mP2//vWda33hw5M4cSA1suJwzI4cegAmu3kR0N8WhzDgb+7lUcT\ngb+o6r9EZBXwrIjcBOwGrnX5X8abiluENx33xu6vsjE9X3Z6Mnljk8kbmwNAYWHNcV0WLS1KdUMT\nFTWNlNd4AeXQkXqWry+iJimT7SXV7Cit5mBVPQer6lmx4/Bxx/G98BJJCT6SE3wkJfpIShASfT6S\n3f2kBB9JCT4aao8weO27pCR6z6UkJpCc4L9/NO3I4RpkaBmnjsgkLdmmK/cG3R44VHUHMC1E+iHg\n4hDpCtzcDVUzps/z+cS7iFVqEmMGHU0f3byfvLwZgBdc9lbUet1gJdVsP+jfjlBaVU+LQn1TC/VN\nLVDfzgFLS9vJ4Hlk9XJ84p2tf1puJpNHZjJ5ZBaTR2aSYzPJepyeNB3XGNMD+HxHr7x4/slDj3mu\nsLCQ6TNm0tjcQkNzC41NLTQ2a+vjJne/vqmFjZu3MO7EidQ3NtPQ3EKDCzbebXPr4w3bi9lfn9Q6\nVbmopJoF644OY47MSuU0F0QyG+qZVN/EgBT76oone/WNMVFJ8AkJvoT2z4IvTSEvKPCEUlhYTV5e\nHnWNzWw7UM2mvRVukL+CLfuq2FtRx96KOv695QAAP3n7VaaMymLW+EGcOW4QZ4wbRFa6DeZ3Jwsc\nxpgeITUpgdNHe+t9+TW3KLsOHWHT3ko2FJfzxqZidpQ3sW5POev2lPPImzsQgUkjMr1AMt4LJCa2\nLHAYY3qsBJ+0zuz6xLSRXDailklTplH4QVnr+mBr95SzZV8lW/ZV8vjyXQAMTvMxesXbDBuYwvDM\nFIYNTGXYwBSGBdwfnJFCgi1E2SEWOIwxvcqAlETOO3ko57lusLrGZtbuKW8NJIUflHGotplDe8rb\nLMcnMDgjhRRpZsTK5WSmJZGZmuhuvfNcMtMSvckEaUnsLWtk2OEastOTyEhJ7NfXpLfAYYzp1VKT\nEpg9YTCzJwwGoLG5hcVvr2LYCRMpqaqnpLLOu/VvlXUcrKrn0JEGDlZ508KKK8siO9i/Xwe8llBW\nWhLZaUlkpbvbNG9tsczURA4cqGZZ2bZ2i6s+VEPS8HImjcgkObH3XFfPAocxpk9JSvAxfEAiee2M\ndTQ0tXDoSD0rCtcxavzJVNY2UlnXSGVtIxW1Ta33K+u88132H6qkgUQqahs50tDM4SPeEjBhbXo/\novr+cc3bJCf6mDwyk+ljspk+JpsZY3IYMyitx7ZqLHAYY/ql5EQfuVlpnJCVRN749gfUA9eAamhq\ncWfkN7iz8t1W6wWbD/fuZWRubpvlKbBhezG7axLYcfAIa3aXs2b30e61QQOSmT4mm2mjs5k2Jovy\nyibGH2kgOy0p7hcJs8BhjDFRSk70MXRgCkMHpoR8vrCwiry8U9otxz8VuaKmkXXF3kyxtW47dKSB\n17aW8NrWkqM7vLKYBJ+Qk57MkIxkhmSkMDgjmcEDvNuhGSmU7a9jyrRmUhJjdxa+BQ5jjImzrPSk\nYwb8VZXislrW7Cln7e5yNu2tYE9pBUeafFTUNlJaXU9pdT3ehVSP94VLlFieI2mBwxhjehgRYcyg\ndMYMSucT00YCR7vKGppaOHykgdJqb4D/UHU9h6obKD1ST2lVA7v3HyQ9xmt+WeAwxpheJDnRx4is\nVEaEueRwYWFhzAfVe8/8L2OMMT2CBQ5jjDFRscBhjDEmKhY4jDHGRMUChzHGmKhY4DDGGBMVCxzG\nGGOiYoHDGG
NMVERV412HLiciB4EPOlHEEKC0C/NZmVamlWll9sQyg41V1fav96uqtgVtQEFX5rMy\nrUwr08rsiWV2dLOuKmOMMVGxwGGMMSYqFjhCe6SL81mZVqaVaWX2xDI7pE8OjhtjjIkda3EYY4yJ\nigUOY4wxUbELOfUyIpIDTARar+Kiqm92orwUVa1vLy2exLsqzWhV3RPvuvQVveF9Nz2XtTgiJCJP\nudtvxqDs4SLyMbcNayPfl4A3gVeAH7rbe9rIf7aIfE5EvujfQmR7J8I0ROQcERng7n9BRB4QkbFh\n8h73OoV77dqrp3oDcf8Xat/OEJF0EblLRP7oHk8UkY+FyfuCiPyHiPTo/xkRifSaoRG/77EQ4WcT\nETknwrQEEflWFMdPEJGRInKCf4vuLziurD9HkX9KR4/VU/Tof4Lu4r6454vIP93j00TkpqBsee5L\n8j9FJEdEBgVuAWVViUhluC3EsT8NrASuBT4NvCsinwpT1W8CZwAfqOqFwAzgYJi/6SngF8Act88Z\nQH7A8yNEJA9IE5EZIjLTbRcA6WGO/zBQIyLTgO/inZ3/ZJi8c0Ok3RBtPQOsEJEzwhwruMxI3k+A\nx4B64Cz3uBj4cZhiHwY+B2wTkftFZFKYY2eLyK0uqP7av4XI900RyRTPfBFZLSKXhCkz4rxAkYj8\nXEROC1NWxO97G5/lqjCf5Yhe9yjec4DfRJKmqs3AlWHKCD7+N4ADwGLgJbctCpP3ZBFZIiIb3eOp\nIvL9EMceKiLJkRwf+L2IrBSRr4tIdhv1zBKRB0WkwG2/FJGsMHmHisj3ROQREXnUv0VYn+jF+gzD\n3rAB/8T70l7nHicCG4Ly3Apswfui2RGw7QR2hCjzXuDrwEAgE/ga8N0Q+dYBwwIeD/XXI0TeVe52\nLZDivx8m7xbcrLkwz88FXgeq3K1/WwB8Msw+q93tD4CbAtMC8lwHLATKXFn+7XXg39HWMyDfZqAZ\n2A6sBzYA6zv6frr0Ane7JvD9aKceWcBXgT3AcuBGICng+eXAAy59rn8L9b6720vd6zMt+LXsYN6B\nwJddPVYA84DMzrzvXfl/FOl7jhfMv+1e59sCtnva+P+4D/gtcC4w07+FyFcEDI7wb1oKnBn0GdkY\nIt8fgFXAXYH1baPcicBPXV3+Anw0RJ4X8HoWJrjtbuDFMOUtB37mXv9r/Ftn3s+2Nhvj8AxR1WdF\n5E4AVW0SkebADKr6a+DXIvIw8HvgPPfUm6q6LkSZl6rqrIDHD4vIu8D/BuXzqWpJwONDhG8JFrtf\nKP8HLBaRMmBvmLwbgRHAvlBPquoTwBMico2qvhCmjGBV7jX6AnCeeN0iSUF5lrtjDgF+Gbgv3hd+\nVPUMcDmQg/elAF6XXXmYvO2+n06DiKQBCiAiJ+L9MAhJRAYD1+P9/WuAp/F+Nc8FLnDZUlX1tnb+\nFgBxt1cAj6nqOhGRzuZV1Srgj8AfReQ84K/AgyLyPPCjaN73wJZ0mGMdDkqK9HWP5D1PBjLwgs/A\ngPRKIFyL/Gx3e29gNYGLgvLtASraOHagdFVdGfRyN4XIt9dtvqD6hqSq21zLpQD4NTDDvaffU9UX\nXbYTVfWagN1+KCJr26jn7e0dt6tY4PAccV8K/i+Q2YT/YG0F/gy8iPcP/ZSI/FFVg5vPzSLyeeAZ\nV+51eL+Yg/1TRF7B+wcH+AzwcqgDq+rV7u49IvI63q/ff4Wp5xBgs4isJODLUFU/EVTmCyLyH8Bk\njh1wD/zn8/sMXnfNTaq6X7x+4Z8HlfcBXhfWWSH2byUiC/Fel4GR1BO4CvgSAa873hdkqK6MSN/P\nu/FevzEi8jRwDiG601wZLwKT3HE/pqr73VN/E5GCgKxPiciX8bo+Av+e4C/ZQhF5FRgP3CkiA4GW\nUMeOJq8L5v+B1+IZhxe8n8YLuC8DJ7usS0TkAY7+AFoK3Kuqga9TId5rGPit
6X+seL+CA7X5ukfz\nnqvqUmCpiDzuPlPtUq/7NhI7gDdE5KWg4z8QIm+p+0Hh/5s+RYiAp6o/dM8P9B5qdbiDi8hUvPfn\nP/C6yz6uqqtFZCTeOJM/cNSKyBxVXeb2OweoDVPsIhG5QlVDfnd0NTsBEBCRmXhfQFPwfg0NBT6l\nqsf9QhaR9cBZqnrEPR4AvKOqU4PyjQN+hfdlpMDbwH+p6q6gfD8D3sX75Sp4v6Rnd/bXg4icHyrd\n/UMG5vs9Xt/2hcCf8H7NrVTVUGMCkRx3marOEZEq3D+b/ynv8JrZVv3aqGdEr7t7zv9+TgY2Eeb9\ndH3tG/D+GXcA76pqyFVFReQK4DS897MFWAY8rKp1QfluxusyKQ/4+1VVJwTl8wHT8bo5y90X7qgw\nnzl/3iQgBe9HwagQP1YQkR14XU/zVXV50HO/VtVb3f0X8D7rT7inrwemqeonw/z9gzh+Nl/we9Tm\n/1G077nb53WO/Rz58wa3IhCRH4Qp996gfHeHyffDEGVOwDsT+2y87tedwOeDg5l4A95PAf5WWinw\nRVXdFKLMN/F+9DyvqrVBz12vqv6JONPx3h//uEYZXrdnqM9IFTAALxA2EvT/1tUscDgikgicgveC\nv6eqjWHybQDO8H9hiEgq3tjD6R087mpVnRmUtj7UF2Is+I8VcJuB1496SUCeiIJBB48/HtgX8Hqm\nAcNDBNiIX3f33C14YwJVeL/ifhPiS/4ivIB9Lt6v57V4XY+/ClHms3jdJE+7pOuAHFW9NijfdmBW\nGwFokqpudV+yx1HV1SH2+RLexIjRro6z8YJmqC/PjLZ+7QbkW6uq09tLa+P4y1X14oA8Ppe+knb+\nj0TkZ8E/jEKlufS8gIepeH33Tar63RB5vx2U92PAFlX9z+C8kRKRBFVtdj9UfK4rMFS+5cD/qOrr\n7vEFwE9U9exQ+SM8dgreD7kTgWy81puG6Q2IKLh3FeuqOupMvKZ9IjBTRFDVUDOGHsOb+fR39/gq\nYH5wJhEZijdI6S8TAP+HWES+hjd4PsH9mvYbiNc66ZAOfMn7f/HUuKbyIbwukVaqOsfdttt32wHP\ncbRvGrzuvOfwZtoEiuh1d57E+5L/iXt8Hd6vwWO+5FX1NRFZ6o51Id6g92S8lmKwU1R1WsDj10Uk\n1NjWJqAmTL3AGzSdx7HjP61V4vj+eDg6m26Fql4o3oyu434dO2kicithPncBoukGaff4qtoiIr9U\n1bPwXoO2fBQIDhKXh0hDVQuDkt5279lxVPWY11REfoE36E9Q+lC8WYHB3bOhXvudIvIv4G/Aa6GO\n6wzwBw1X1hsu2BxHRPwD46cFHT+46+8feC3X1cCHbRw7bHAHLm5rv46ywEFrl8WJeC+4fxxCCTHV\nVFUfEJE3ONq1dKOqrglR7D+At4B/E3ps4y94s1B+CtwRkF4Voj88Yh34kl8k3oD7z/E+oIrXZdVd\nElW1wf9AVRskxLTGKF53iPBLXkSW4DXv38F7r87QYycqBFojIrNVdYXbdxahA3wzsNZ1sQT2n9/q\nbue520j74wHqVLVORBDvJL2tInJKmLztfe78voY3SH5MN0gnj/+qiFyD12I9riujIz+W5NgBeh/e\ntN0RbfxdgdI5fhwGvFbj3/BaJF/F+7tDTmvHaz19HLgZmC8ii4Bn/AE3wA4RuQvvBwp4Eyh2hinz\nMbzxtQfxfrDcyLHjSH6jVfWyMGUEi+bHRadZ4PDkA6eF+rCH4roTjutSCNLmLAf1BiEr8H4Nx42q\n/sjdfcH9U6TqsQOksXZQRD6hqgsARORKwly9LMLXHSL/kl8P5OH1yVcA5SLyTmC/s+siU7zxhS+K\nyG73eCzeFOFg/0eEJyuKyNkc3zII1cqNZjZdpLNrtuDN8AvsBrmK0DPfIj3+bXiBuElE6ji+lduR\nH0v+AXrwZjPtAkKOvwW8V+AFmWHAj0Jk
Hayq80Xkm3p0ED5cK6YWeBZ4VrxVG36FN5EgwR3zKVW9\nHi9Yj+Po5I2leAEhlDRVXSIi4sZK7hGRt/CCSaDlInK6qm4IU06gaH5cdJoFDk+kU0Kj0a2zHDoj\n+AusjW66WPgq8LSI/BbvH24PEPIs4vZE+yWvqt9y+2Xg/ZM/hvc5SAnIFvJM8nBU9QnXYvLPXgrX\nzx9NKzea2XSRfu4i7gaJ9PiqOjBUP/uxWXSXeBMIjiEig8IEj9PwWilz8F6ft/CmsIbyMY5O2c4G\nXg7R1QXe4DHAPvFmFO7F6+IJSbxB/c/gdaetwjtXws9/YvBcvNaDf8YZhG5FANS5MaFtInIL3usf\nasWIOcANIrITr/XqD8Shxj+j+XHRaf16cFyOnR44HW9gr60podGU3a2zHDoq3BeYv2ulG+uRgfd5\nDDn4GGEZIZc/8dPjZ8Lcgvclk4c3hfhN4C1Vbasvu706XIA3E2YX3ns+Bm8mzJtB+bYQRSs3iuNH\n9LkTkY2q2qVLX7Q3iC4ii1T1Y+6L8LhpviH6+COelODy3oo3ruj/1X8VcNxUefGWlXkL7735Dd4J\nuveo6sIQZe50f8uzwAJ1s/qCjvk1vC6xwADsf91D/U1n4LX4svFaRJnA/6rqu0H5Qn6egz/HIco/\nHxfcA7uBu1J/Dxzn473BP8MbLGt9CviZHnsCX0fK77ZZDh0Vqy+wCI77BVX9s4iEPFlOQ8+p7+o6\nfAcvWBSqaqiTujpSZiHwOVV9zz0+GfirquYF5XsOuFVVu7KV6y87kqmzj+DNNIukGyTS427gaD/7\ndH8/u6p+JijfUxwN0lvbKXNd0HhVyDSXHulU+SeAb6pquXs8CPhFiAkEiEimqh63vEqIfA+r6tfa\ny+fy5gP/g9cS9p9AG64l0SP1664q/z+TiCSF+MdK60zZ3T3LoRNi0U0XCf+Mk1jM1IqIqv68/VxR\nS/IHDXeM90Wk9ex6if7Ex6hE8bmLphskUpH2sz/mjv8b8c6TWIMXRELNZot0vAr3NwROCGgmdHfR\nVH/QAO/kTBGZEabMBte1FjwD65ggE2nQcJ4GvoN3DlG4kz57tH4dODoyyyMK3TrLIVqx/gJrj6r+\nwd32mNekixSIyHyOzq75PN4Ar98vONrKvSog3Z/WWZF+7i7vgmMFi6ifXUNPg55CwDToDkxKgMin\nbPtEJEdVy9yxBhH+u/ApvNUiLsVbyuTzeN1MnXHQPxmkt+rvXVVZeINpXTol1pW9SlXPEG9tmVmq\nWi9hTrCKh1h300VRjzbPd+ltxDtp62aOXQngIT3+2hcxOfGzp3zu2upnl+OnQS/ToGnQ0Y5XBew3\nk4DXXkNM2RZvCfc7gefxgtGngfvUnbEdlHeNqs6QoyfIJgGvaOhzPiIiIhfjjdUs4dgfay+G3amH\n6dctDo3tlNhuneUQrVh200Up0vMOejzx1omar6pfwFshN1SeWLZyoYd87toZy2t3GnR7A8BtHLfd\nKduq+qR464tdhBdgPqmq4Vox/hlY5eItK7If70dOZ9yIt+5ZEke7qpSja1T1eP26xdFdumOWQ7QC\nv8Dwlir3Gwi87b78uqMePaYV1hXEW7Dy4+He51i2ckMcq8d97gLJ0WnQ/w2MUNWUdnbpdm7M6AXg\ndOBxvBV77/J3tXawzA3awSWKegoLHP1Ud36BtVOPH+NN2ezx57tEQkT+gHcdiAVA69TN7pgl1lvE\nYhp0rLiux2vwWhmBM6BCrhcVYZl/BB5so5XT41ngMHEhx66llYHX1+ufEtvjzndpj7gziEWkHG8p\niWP0wUkAHRaLadCxIt46VRV4Exxau1I1aF2sKMvcgnfuVFfOaOtWFjhMXLk5/W/h/eLs7GyVuBGR\nzXgzlRZy9KJOrbqzFWe6ToxOlOzQiX09Sb8eHDc9gn9O/68jmNPfk/0ebxmO8Ry7JEa4ix6Z3iGa\n9aIi
0psCRDjW4jBx52YjBc7pr1XVSfGtVcdEcwax6bkCziNJxDsLfwe9tFspFixwmLiKZE6/Md2t\no+eR9BfWVWXird05/cZ0t/4eGNpjLQ7TI/SGOf3GGI+1OExchZjT/yhel5UxpoeywGHiLQ1veY4e\nP6ffGOOxripjjDFR8cW7AsYYY3oXCxzGGGOiYoHDmHaIyP+IyCYRWS8ia91V6GJ1rDfcpUWN6bFs\ncNyYNojIWcDHgJnuokhDgOQ4V8uYuLIWhzFtywVK/VfwU9VSVd0rIj8QkVUislFEHhERgdYWw4Mi\n8qaIbBGRM0TkRRHZ5paQR0TGichWEXnCtWKeF5H04AOLyCUi8o6IrBaR59y5LojI/SKy2e37i258\nLYwBLHAY055XgTEi8r6IPOQujgTwW1U9w62cmobXKvFrUNXz8BY+/AfepWSnADeIyGCX5xTgEbfm\nUSXeRbVauZbN94GPuEvMFgC3uetjXw1Mdvv+OAZ/szFtssBhTBtUtRrv5MR5wEHgbyJyA3ChiLzr\nFsO7CJgcsNsCd7sB2KSq+1yLZQcwxj23R1X9l4r9M94KwYFmA6cBb7vrh88FxuIFmTrgTyLySaCm\ny/5YYyJkYxzGtENVm4E3gDdcoPgKMBXIV9U9InIPkBqwS727bQm473/s/58LPoEq+LEAi1X1uuD6\niMiZwMXAZ4Fb8AKXMd3GWhzGtEFEThGRiQFJ04H33P1SN+7wqQ4UfYIbeAe4DlgW9PwK4BwROcnV\nI11ETnbHy3KX2v0vVx9jupW1OIxpWwbwGxHJxru0bRFet1U5XlfULmBVB8rdAsx11yjfBjwc+KSq\nHnRdYn91170Gb8yjCviHiKTitUq+1YFjG9MptuSIMd1MRMYBi7r6kqTGdBfrqjLGGBMVa3EYY4yJ\nirU4jDHGRMUChzHGmKhY4DDGGBMVCxzGGGOiYoHDGGNMVCxwGGOMicr/B3loUC2uUKqPAAAAAElF\nTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%matplotlib inline\n", "theLowerFreqs.plot(30)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plotting Content Words\n", "\n", "What if we want to see just the high frequency content words. Here we get the NLTK English stop-word list." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers']\n" ] } ], "source": [ "stopwords = nltk.corpus.stopwords.words(\"english\")\n", "print(stopwords[:20])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We need to create a new list of tokens without the stopwords." 
] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['project',\n", " 'gutenberg',\n", " 'ebook',\n", " 'enquiry',\n", " 'concerning',\n", " 'human',\n", " 'understanding',\n", " 'david',\n", " 'hume',\n", " 'l.']" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "theLowerContentWords = [token for token in theLowerTokens if token not in stopwords]\n", "theLowerContentWords[:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can create a table of the most frequent content words." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " may one nature must us experience cause human mind never \n", " 295 203 200 177 169 166 157 149 145 125 \n" ] } ], "source": [ "theLowerContFreqs = nltk.FreqDist(theLowerContentWords)\n", "theLowerContFreqs.tabulate(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you still see words you want to remove, add them to a second list and filter them out too. Note that the next cell overwrites the variables ```theLowerContentWords``` and ```theLowerContFreqs```. If you want to recover the removed words, you need to start over from **3.1**." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " nature experience cause human mind effect ideas objects idea reason \n", " 200 166 157 149 145 124 120 120 116 116 \n" ] } ], "source": [ "moreStopwords = [\"may\",\"one\",\"must\",\"us\",\"never\",\"every\"]\n", "theLowerContentWords = [token for token in theLowerContentWords if token not in moreStopwords]\n", "theLowerContFreqs = nltk.FreqDist(theLowerContentWords)\n", "theLowerContFreqs.tabulate(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we get the frequency distribution of the remaining content words and plot it." 
] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYgAAAE3CAYAAACw39aGAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4wLCBo\ndHRwOi8vbWF0cGxvdGxpYi5vcmcvpW3flQAAIABJREFUeJzsnXl4VNX5+D9vFhKSsBMgsoMoAoqS\nuGvdrVZbrVtr1Vrbaq22bl20vy5Wu6ltrVu/aq1Vq9atrqBWFBVEAU0AZRFlE0WQfQ973t8f5wy5\nmdzJ3EkymYR5P89zn2TOfe+578zcue89512OqCqGYRiGEU9OphUwDMMwWidmIAzDMIxQzEAYhmEY\noZiBMAzDMEIxA2EYhmGEYgbCMAzDCMUMhGEYhhGKGQjDMAwjFDMQhmEYRih5mVagKXTv3l0HDBjQ\nqGM3b95M+/btm1XW+rQ+rU/rs7X1GUZVVdVKVS1NKqiqbXYrLy/XxlJZWdnsstan9Wl9Wp+trc8w\ngEqNcI+1KSbDMAwjFDMQhmEYRihmIAzDMIxQzEAYhmEYoZiBMAzDMEJJm4EQkb4i8oaIfCgis0Tk\nSt/eVUReFZG5/m8X3y4icoeIzBORD0RkVLp0MwzDMJKTzhHEDuAnqroPcAhwuYgMA64DxqnqEGCc\nfw1wMjDEb5cAd6dLsVlL1vHyvE0sXLkpXacwDMNo86TNQKjqUlWd6v/fAHwI9AZOAx7yYg8Bp/v/\nTwP+7cN0JwOdRaQsHbr9a+In/HPaBibOXZGO7g3DMHYLRFtgTWoRGQBMAEYAn6pq58C+NaraRUTG\nADep6kTfPg64VlUr4/q6BDfCoKysrHz06NEp6zNm7iYemL6B4we254cVnZLKV1dXU1RU1Gxy1qf1\naX1any3VZxgVFRVVqlqRVDBKNl1TNqAEqALO8K/Xxu1f4/++CBwRaB8HlDfUd2MzqSfNX6n9rx2j\nX7vzrUjyu1tmpfVpfVqf2dNnGLSGTGoRyQeeBh5V1Wd887LY1JH/u9y3Lwb6Bg7vAyxJh177lHUE\nYM4XG9ixsyYdpzAMw2jzpDOKSYD7gQ9V9dbArheAC/3/FwLPB9q/7aOZDgHWqerSdOjWqX0+PYpz\n2bqjhgXmqDYMwwglnSOIw4ELgGNFZLrfvgLcBJwgInOBE/xrgJeABcA84D7gsjTqxsDOrpDt7CXr\n03kawzCMNkvayn2rczZLgt3HhcgrcHm69IlnQOd8pny+lVlL1nH6Ab1b6rSGYRhthqzNpN41glhq\nIwjDMIwwsthA5ANuiklbINTXMAyjrZG1BqJb+xw6F+Wzpno7S9dtybQ6hmEYrY6sNRAiwvA9XLir\nOaoNwzDqk7UGAmCYz4eYZQbCMAyjHlltIIbv4cpszF66LsOaGIZhtD6y2kAMi00xWSSTYRhGPbLa\nQAzqXkxBXg6frd7Mus3bM62OYRhGqyKrDURebg5De3UA4EMbRRiGYdQhqw0EBKaZzFFtGIZRBzMQ\nFslkGIYRihmIXZFMZiAMwzCCZL2BGNqrAyIwd9kGtu7YmWl1DMMwWg1ZbyCKC/IY2L2YHTXK3GUb\nM62OYRhGqyHrDQTU+iFsmskwDKMWMxBYJJNhGEYYZiAIlNwwA2EYhrELMxDUnWKqqbG1IQzDMMAM\nBAClHQoo7VDAxq07+GxNdabVMQzDaBWYgfDY2hCGYRh1MQPhsYxqwzCMupiB8Fjpb8MwjLqYgfBY\nJJNhGEZd0mYgRORfIrJcRGYG2vYXkckiMl1EKkXkIN8uInKHiMwTkQ9EZFS69EpE/65FFLfL5Yv1\nW1i1cWtLn94wDKPVkc4RxI
PASXFttwA3qOr+wG/8a4CTgSF+uwS4O416hZKTI+xjGdWGYRi7SJuB\nUNUJwOr4ZqCj/78TsMT/fxrwb3VMBjqLSFm6dEuEZVQbhmHUIqrpSwwTkQHAGFUd4V/vA7wCCM44\nHaaqi0RkDHCTqk70cuOAa1W1MqTPS3CjDMrKyspHjx7dKN2qq6spKiqq0/bagmrurlrPEX0LufqQ\nzg3KRu2zqbLWp/VpfVqfTZWNp6KiokpVK5IKqmraNmAAMDPw+g7gTP//OcBr/v8XgSMCcuOA8mT9\nl5eXa2OprKys1/bBZ2u1/7Vj9Li/vplUNmqfTZW1Pq1P69P6bKpsPEClRriHt3QU04XAM/7/p4CD\n/P+Lgb4BuT7UTj+1GEN6lpCbIyxYsZHN22xtCMMwspuWNhBLgKP8/8cCc/3/LwDf9tFMhwDrVHVp\nC+tGYX4ue5aWUKMw5wvzQxiGkd3kpatjEXkMOBroLiKLgeuBi4HbRSQP2IL3JQAvAV8B5gHVwEXp\n0isZw/foyEfLNjB76XoO6NclU2oYhmFknLQZCFU9N8Gu8hBZBS5Ply6pMGyPjjwz7XMruWEYRtZj\nmdRx7Cr9bQbCMIwsxwxEHLFciDlfrGenrQ1hGEYWYwYijs5F7ejduT1bttewcOXGTKtjGIaRMcxA\nhBAbRZgfwjCMbMYMRAjmhzAMwzADEYqtDWEYhmEGIpTg8qOaxlpVhmEYrRkzECH07tyejoV5rNq0\njWXrbW0IwzCyEzMQIYhIYJppXYa1MQzDyAxmIBJgS5AahpHtmIFIQCySyUJdDcPIVsxAJMAimQzD\nyHbMQCRgzx4ltMvNYdGqajZtr8m0OoZhGC2OGYgE5OfmsFevEgAWrd2RYW0MwzBaHjMQDTC8zDmq\nP1m7PcOaGIZhtDxmIBog5odYaCMIwzCyEDMQDRAzEPPWbLeMasMwsg4zEA0wfI+OdCjI49N1O3iq\ncnGm1TEMw2hRzEA0QFG7PG44bTgAvx09i4UrN2VYI8MwjJbDDEQSvn5Abw7vW0j1tp1c+fg0tu+0\nkFfDMLIDMxBJEBF+MKojvTu354PF6/jbqx9nWiXDMIwWwQxEBIrb5XDbN/cnR+Du8fOZNH9VplUy\nDMNIO2YgInLggK786Jg9UYVrnpzO2uptmVbJMAwjrZiBSIErjhvCAf06s3TdFv7fszMs9NUwjN2a\ntBkIEfmXiCwXkZlx7T8WkY9EZJaI3BJo/4WIzPP7vpwuvZpCXm4Ot31jf4rb5fLSjC94qspCXw3D\n2H1J5wjiQeCkYIOIHAOcBuynqsOBv/j2YcA3geH+mP8Tkdw06tZo+ncr5sbTRgDw2xcs9NUwjN2X\ntBkIVZ0ArI5r/iFwk6pu9TLLfftpwOOqulVVFwLzgIPSpVtTOWNUb746cg+qt+3kKgt9NQxjN0XS\nOY8uIgOAMao6wr+eDjyPGyVsAX6qqu+JyF3AZFV9xMvdD7ysqv8N6fMS4BKAsrKy8tGjRzdKt+rq\naoqKihotu2lbDT95dSUrqms4Y2gx5+3bocl9pkNP69P6tD6zu88wKioqqlS1IqmgqqZtAwYAMwOv\nZwJ3AIIbISz0//8dOD8gdz9wZrL+y8vLtbFUVlY2WXbKglU68LoxOuC6MTpp/spm6bOxctan9Wl9\nWp9RASo1wj28paOYFgPPeB3fBWqA7r69b0CuD7CkhXVLmYMGduVyH/p69RPT2bjNppoMw9h9aGkD\n8RxwLICI7AW0A1YCLwDfFJECERkIDAHebWHdGsUVxw1h/74u9PWeqvUW+moYxm5DOsNcHwMmAXuL\nyGIR+R7wL2CQD319HLjQjyZmAU8Cs4H/AZer6s506dac5OfmcPs3XejrpMVb+M+7n2ZaJcMwjGYh\nL10dq+q5CXadn0D+D8Af0qVPOunfrZg/fH1frnpiOr99YRZDe3WgvH/XTKtlGIbRJCyTupk4
/YDe\nnDKkiO07lUsfmcqy9VsyrZJhGEaTMAPRjHx7vw4cOqgbKzZs5QcPV7F1R5uYJTMMwwjFDEQzkpcj\n3PWtA+jduT3TP1vLr5+baU5rwzDaLGYgmpluJQXce0E5hfk5PFm5mEcmL8q0SoZhGI3CDEQaGNG7\nEzefuR8AN4yezbsL4yuOGIZhtH7MQKSJ0/bvzcVHDmRHjXLZo1UsWbs50yoZhmGkhBmINHLtSUM5\nfM9urNy4jUsfqWLLdnNaG4bRdjADkUbycnO469xR9Oni1rP+5bPmtDYMo+1gBiLNdCluxz8uqKAw\nP4enpy7mwXc+ybRKhmEYkTAD0QIM26Mjfz5rJAC/f/FDJs1flWGNDMMwkmMGooX46sg9+MFRg9hZ\no1z+n6ks32T+CMMwWjdmIFqQn395KEcO6c7qTdu4Yfxq3pm/MtMqGYZhJMQMRAuSmyPcee4BDO3V\ngS827eRb903hmiens2rj1kyrZhiGUY+UDYSIdBGR/dKhTDbQuagdz//ocM4dXkK7vByemfo5x906\nnife+5SaGotwMgyj9RDJQIjImyLSUUS6Au8DD4jIrelVbfelIC+Xs4aV8MpVX+KIPbuztno71z49\ng2/+YzJzl23ItHqGYRhA9BFEJ1VdD5wBPKCq5cDx6VMrOxjYvZiHv3cQt31jf7qXtOPdT1bzlTve\n4i+vfGRJdYZhZJyoBiJPRMqAc4AxadQn6xARTj+gN+OuOZpzD+rH9p3KXW/M48u3TWDCxysyrZ5h\nGFlMVANxA/AKME9V3xORQcDc9KmVfXQqyudPZ+zLfy89lL17dmDRqmq+/a93uW3KWhtNGIaREaIa\niKWqup+qXgagqgsA80GkgYoBXRlzxRFce9JQCvNzeOvTLTw8yUqGG4bR8kQ1EHdGbDOagfzcHH54\n9GBu8dnXr85elmGNDMPIRvIa2ikihwKHAaUick1gV0cgN52KGXDM3qXkCVQuWs3qTdvoWtwu0yoZ\nhpFFJBtBtANKcIakQ2BbD5yVXtWMDoX5jOjRjhqF1+csz7Q6hmFkGQ2OIFR1PDBeRB5UVZsIzwAH\n7lHI9GXbeG32Ms4q75NpdQzDyCIaNBABCkTkH8CA4DGqemw6lDJqqdijgPumwYS5K9iyfSeF+Taz\nZxhGyxDVSf0UMA34FfCzwJYQEfmXiCwXkZkh+34qIioi3f1rEZE7RGSeiHwgIqNSexu7L92Lchm+\nR0eqt+20MuGGYbQoUQ3EDlW9W1XfVdWq2JbkmAeBk+IbRaQvcALwaaD5ZGCI3y4B7o6oV1ZwwrCe\nALz6oUUzGYbRckQ1EKNF5DIRKRORrrGtoQNUdQKwOmTX34CfA8HKdKcB/1bHZKCzz9w2gOP3cQbi\ntdnLrKCfYRgthkRZI1lEFoY0q6oOSnLcAGCMqo7wr78GHKeqV4rIJ0CFqq4UkTHATao60cuNA65V\n1cqQPi/BjTIoKysrHz16dFL9w6iurqaoqKhZZdPVZ/v27bn0xRWs3FzDzcd1Y8+u+a1ST+vT+rQ+\nW1+fYVRUVFSpakVSQVVN24Zzas/0/xcBU3CF/wA+Abr7/18EjggcNw4oT9Z/eXm5NpbKyspml01n\nn79+bob2v3aM/uWVOc3WZ3PKWp/Wp/XZOvsMA6jUCPfwqOW+vx22pWazGAwMBN73o4c+wFQR6QUs\nBvoGZPsAS1Lsf7cmNs1kWdWGYbQUUcNcDwz8XwgcB0wF/h31RKo6A+gRex03xfQC8CMReRw4GFin\nqkuj9p0NHDyoKyUFecz5YgOfra6mb9fGDS0NwzCiEmkEoao/DmwXAwfgsqwTIiKPAZOAvUVksYh8\nrwHxl4AFwDzgPuCySNpnEQV5uRy1dykAr1k0k2EYLUDUEUQ81biQ1ISo6rlJ9g8I/K/A5Y3UJWs4\nYZ+evPjBUl6dvYyLDh+YaXUMw9jNiWQgRGQ0tWGpucA+
wJPpUsoI5+i9S8nNEaYsXM266u10KgqP\nZjIMw2gOoo4g/hL4fwewSFUXp0EfowE6F7XjwAFdmLxgNW9+vJzT9u+daZUMw9iNieqDGA/MwVVy\n7QJsS6dSRmJOGNYLsGgmwzDST9Qw13OAd4GzcetSTxERK/edAY7fxwWCjf9oBdt21GRYG8Mwdmei\nTjH9EjhQVZcDiEgp8Brw33QpZoTTv1sxe/Us4eNlG3l34WqOGNI90yoZhrGbErUWU07MOHhWpXCs\n0czsKt43+4sMa2IYxu5M1Jv8/0TkFRH5joh8B1ca46X0qWU0xK7ifR8uj5UmMQzDaHaSrUm9J9BT\nVX8mImcARwCCS4B7tAX0M0IY2acz3UsK+HztZj5cuoFhe3TMtEqGYeyGJBtB3AZsAFDVZ1T1GlW9\nGjd6uC3dyhnh5OTILme1ZVUbhpEukhmIAar6QXyjujLcA9KikRGJWj+EGQjDMNJDMgNR2MC+9s2p\niJEah+/ZncL8HGZ8vo6l6zZnWh3DMHZDkhmI90Tk4vhGX3gv2ZKjRhopzM/lyCGueN+4D5cnkTYM\nw0idZAbiKuAiEXlTRP7qt/HA94Er06+e0RA2zWQYRjppMIpJVZcBh4nIMcAI3/yiqr6eds2MpBw7\ntAciMGn+KjZu3ZFpdQzD2M2IlEmtqm8Ab6RZFyNFupcUMKpfF6oWreGtj1fUrsZkGIbRDFg2dBvH\nppkMw0gXZiDaOLGs6tc/Ws7OGsuqNgyj+TAD0cYZXFrMwO7FrK3ezpxV2zOtjmEYuxFmINo4IrVZ\n1e8t2ZJhbQzD2J0wA7EbEFtE6LUFm3lk8iJqbKrJMIxmwAzEbkBF/y6csm8Zm3cov3puJmfd8w5z\nvlifabUMw2jjmIHYDcjJEe761gH89NDO9OhQwNRP13LKHRP508sfUr3N8iMMw2gcZiB2E0SEQ/sU\n8tpPjuLCQ/tTo8q94xdwwq0TeGOOleIwDCN10mYgRORfIrJcRGYG2v4sInNE5AMReVZEOgf2/UJE\n5onIRyLy5XTptbvTsTCfG04bwbOXHc6wso58vnYzFz34Hpc9WsWy9ebENgwjOukcQTwInBTX9iow\nQlX3Az4GfgEgIsOAbwLD/TH/JyK5adRtt2f/vp154UeH86tT9qGoXS4vzfiC4/46nofe+YSdtgqd\nYRgRiFRqozGo6gQRGRDXNjbwcjJwlv//NOBxVd0KLBSRecBBuJXrjEaSl5vD948cxMn7lnH987N4\n7cNlXP/CLPp1ymPv2ZWR+uiVV82IkTspyDN7bRjZhqRzTWNvIMao6oiQfaOBJ1T1ERG5C5isqo/4\nffcDL6vqf0OOuwS4BKCsrKx89OjRjdKturqaoqKiZpVt7X1O+XwL909bz6rNNZH6i9G7Qy4/KO/E\n8NJ2LaKn9Wl9Wp/N12cYFRUVVapakVRQVdO24VadmxnS/kvgWWoN1N+B8wP77wfOTNZ/eXm5NpbK\nyspml20LfW7csl3vHT1R/zdzadLt6arP9NDfvaz9rx2j/a8doz99crqu2ri1RfS0Pq1P67P5ZOMB\nKjXCPTxtU0yJEJELgVOB47yiAIuBvgGxPsCSltYtGyguyKO8rJDy4b0iyZft+IIp6zvyf2/M56mq\nxbz24TJ+ecowzhzVGxFJs7aGYWSSFg1zFZGTgGuBr6lqdWDXC8A3RaRARAYCQ4B3W1I3I5x2ucJV\nx+/Fy1cdyaGDurGmejs/fep9zr1vMvOWb8y0eoZhpJF0hrk+hnMy7y0ii/0ypXcBHYBXRWS6iNwD\noKqzgCeB2cD/gMtVdWe6dDNSZ3BpCf+5+GBuPWckXYvbMXnBar5y+1vc+urHbNluX5Vh7I6kM4rp\n3JDm+xuQ/wPwh3TpYzQdEeGMUX04Zu8e3PTyHJ6o/Iw7xs1l9PtLOHdoPp1XRBtRWJitYbQNWtwH\nYbR9uhS34+az9uPM
8j78v2dnMG/5Rv44Ef44cXyk4/t1zOO5fbbRtThxVJRhGJnHDITRaA4a2JWX\nrjiS+95awOOT5pPfriDpMaurt/Hp+u1c9MC7PHrxIZQU2CVoGK0V+3UaTaJdXg6XH7Mnh3RcR3l5\neVL5Zeu38NXb3uT9xeu49OEq7v9OhSXhGUYrxYr1GS1Kz46F/OaoLnQvacfEeSu55on3balUw2il\nmIEwWpyykjwevOggOhTk8eKMpfzm+ZmoOa4No9VhBsLICCN6d+K+Cytol5fDo1M+5W+vzc20SoZh\nxGEGwsgYhwzqxp3nHkCOwB3j5vLg2wszrZJhGAHMQBgZ5cvDe3HTGfsB8NvRs3l++ucZ1sgwjBhm\nIIyMc86Bfbnu5KEA/OTJ93nzI1sBzzBaA2YgjFbBpUcN5pIvDWJHjfLDR6ZStWhNplUyjKzHDITR\navjFyUM5q7wPm7fv5LsPvsen67ZnWiXDyGrMQBitBhHhpjP25fh9erJu83Z+N2EN85ZvyLRahpG1\nmIEwWhV5uTnc9a0DOGhgV1ZvqeG0u97mxQ+WZlotw8hKzEAYrY7C/FwevOhAjuhbyKZtO7n8P1P5\n3ZjZbN+Z2lKphmE0DTMQRqukqF0eVx3cieu/Ooy8HOH+iQs5774pLF+/JdOqGUbWYAbCaLWICBcd\nPpDHLzmEnh0LePeT1Zxy50TeXbg606oZRlZgBsJo9VQM6MqYHx/JwQO7smLDVs69bzL/fGuB1W8y\njDRjBsJoE5R2KODR7x/MD740iJ01yu9f/JAfPTaNjVt3ZFo1w9htMQNhtBnycnP4xVf24e7zRlFS\nkMeLHyzl9L+/baGwhpEmbMEgo81x8r5l7NWrA5c+XMXc5Rs57a63OXVIe1YWfsHg0hL6dysiP9ee\nfQyjqZiBMNokg0tLeO7yw7numRmMfn8JT8zayBOzqgDIyxH6dSticGkJg0qLGVxawuDSEvYsLcmw\n1obRtjADYbRZigvyuOOb+/OVEb148d05bJBi5q/YyOdrN7NgxSYWrNhU75hOBTnsXTnJG41iBvdw\nhmOPzu3JzZEMvAvDaL2YgTDaNCLCyfuW0WPbkl1rYm/ZvpOFKzcxf8VG5i/3f1dsZMGKTazbupN3\nF66uFyrbLi+HQd2L6xiO4i07M/GWDKPVYAbC2O0ozM9ln7KO7FPWsU57TY0y9u33KO41kPnLNzJ/\nRa3xWLZ+K3O+2MCcL2od3oW5wk9rFvCdwwaQZz4NIwtJm4EQkX8BpwLLVXWEb+sKPAEMAD4BzlHV\nNSIiwO3AV4Bq4DuqOjVduhnZSU6OUFqUS/mQUo4cUlpn34Yt21kQMBjTP1vL2/NW8fsXP+TZaZ/z\nx6/vy8i+nTOkuWFkhnQ+Fj0InBTXdh0wTlWHAOP8a4CTgSF+uwS4O416GUY9OhTmM7JvZ84Y1Yef\nfXkoj37/EK47vDO9O7dn1pL1nP5/b3P98zPZsMVKkBvZQ9oMhKpOAOJrIpwGPOT/fwg4PdD+b3VM\nBjqLSFm6dDOMKBy4RyFjr/4Sl3xpEDkiPDRpEcffOp6XZiy1LG4jK5B0XugiMgAYE5hiWquqnQP7\n16hqFxEZA9ykqhN9+zjgWlWtDOnzEtwog7KysvLRo0c3Srfq6mqKioqaVdb63H37/GTtdu6pWs/c\n1W4EUV5WwPcP6ECP4rxWpaf1aX1GoaKiokpVK5IKqmraNpyvYWbg9dq4/Wv83xeBIwLt44DyZP2X\nl5drY6msrGx2Wetz9+5zx84a/fekT3TE9f/T/teO0aG/elnveXOeTn73vValp/VpfSYDqNQI9/CW\njmJaJiJlqrrUTyHFVqdfDPQNyPUBlrSwbobRILk5wgWH9OfLw3py45jZjPlgKX96eQ69O+Ry7Ocz\nd4XHDi4toVfHQnIsr8Jo47S0gXgBuBC4yf99PtD+IxF5HDgYWKeqtoyY0Srp0bGQu7
41irPKl/Pr\n52fy2erNPDx5UR2Z9vm5dbK4B/dw/2/dab4Lo+2QzjDXx4Cjge4ishi4HmcYnhSR7wGfAmd78Zdw\nIa7zcGGuF6VLL8NoLo7euwevXn0Uj4ydgnQq84l5Lr9i5catzFqynllL1tc5RoDeb7xez3AMKi2m\ntKQAF/FtGK2DtBkIVT03wa7jQmQVuDxduhhGuijMz+WAXgWUlw+s076uejvzV26sm5C3fCOfrNrE\n4jWbWbxmM+M/XlHnmA6FebsMR96WjUxaNzeSDptWVVPTbTWDuhfTtbidGRmj2bBMasNIA52K8hnV\nrwuj+nWp0z75vUq69x/KghV1M7nnL9/I+i07mP7ZWqZ/ttYJz/448vnurpoEQOei/NpyIaUlDPL/\n9+vauGgXI7sxA2EYLUh+jrBnjxL27FG3sqyqsnLjtl2Go2rOAnr16pW0vxqF2Qs/Z+3OAuav2MTa\n6u1ULVpD1aI1dc+bK/QsymH47Mo6hmNQaQmd2uc363s0dh/MQBhGK0BEKO1QQGmHAg4e1I2981ZQ\nXj400rFVVZsoLy9HVVm+YasflWzy01uuSOHnazezeMNOFs9aBiyrc3xph4J6Iw621aThXRptDTMQ\nhrGbICL07FhIz46FHDa4e5191dt28OKESgpK++0yHPNXbGLhyo2s2LCVFRu2MnlBbeGDHIGDZ0zm\nxOE9OWFYT/p0sSmqbMQMhGFkAUXt8hjUJZ/ykXvUaa+pUZas21xnxDF32UaqFq1m0oJVTFqwihtG\nz2ZYWUdOHN6TE4f1Yp+yDuYIzxLMQBhGFpOTI/TpUkSfLkUctVdthdsJk95jTfs9GDtrGW9+tJzZ\nS9cze+l6bnttLr07t99lLHJrLK9jd8YMhGEY9Shul8OX9u/Nafv3ZuuOnbwzfxVjZy3j1dnL+Hzt\nZh54+xMeePsT8nOgYMwrkfosyVeGz3jPZ5vXJhF2KW6X5ndjNBYzEIZhNEhBXi7H7N2DY/buwR9O\nH8H0xWsZO2sZY2d/wYIVm9i+dUekfjZuhS/mLGfcnOV12rvsCs11iYO6bgvd+m+iT5f2tlBThjED\nYRhGZHJyZFd+x3UnD2XilPcYOXL/pMfVKLw+qYqC0v51c0CWb2RN9XYqF62hMhCa+6e33yQ/VxjQ\nrbhOxnks67xDoYXmtgRmIAzDaDTt83Ii36z7dcqnfN+6y7yoKsvWb/XhuM5wTJ23hJVbc1iybgtz\nl29k7vKNMKtuXz06FDC4tISOUs0hWxbuCs/do1N7K5LYjJiBMAwjY4gIvToV0qtTIYfv6UJzq6q2\nUF5ezqatO1i4clNtXocfcSxcuYnlG7ayfMNWAF6ZP3tXf4X5OQzqXrLLzxEzHFYksXGYgTAMo1VS\nXJDHiN6dGNG7U532nTXKkrWbmbdiIxOmzmFrYZc6RRJjEVdB2uXCFRvncsmXBtMuz/waUTEDYRhG\nmyI3R+jbtYi+XYvouPEzystUAbjcAAAgAElEQVT33bVv3ebt9Xwc83w2+V/Gfsxz05fwx6/vy0ED\nu2bwHbQdzEAYhrHb0Kl9Pgf068IBcUUSH3z5HR6atY15yzdyzr2T+EZFX647eaiF2CbBxlqGYez2\n7NujgJevPJIrjxtCu9wcnqj8jONuHc/TVYtjyxwbIZiBMAwjKyjMz+XqE/bi5auO5NBB3Vi9aRs/\neep9zvvnFBas2Jhp9VolZiAMw8gqBpeW8J+LD+avZ4+ka3E73pm/ipNue4vbXvuYrTt2Zlq9VoUZ\nCMMwsg4R4czyPoy75ijOqejDtp013PbaXE6+7S3e+nQzG7Zsz7SKrQJzUhuGkbV0KW7HLWeN5MxR\nffjlczOZt3wjt62Ev1e+yqGDu3PiMFfuvGfHwkyrmhHMQBiGkfUcPKgbL11xJI+9+ylPvPMxc1Zt\nZ8LHK5jw8Qp+9dxMRvbtzInDevLl4T0ZXFqSNe
XOzUAYhmEA7fJyuPCwAYwoWMWAvUfw+pzljJ29\njLfmruD9z9by/mdr+fMrHzGwezEnDutJqW5lx4JVSfsVEbZub5sr9JmBMAzDiKNbSQFnV/Tl7Iq+\nbN62k7fmrmDs7GWM+3AZC1du4t4JC5zgW5Mj9dexIIcbChZz+v6929TowwyEYRhGA7Rvl8uJw3tx\n4vBe7NhZQ9WiNYydvYxJcxZTUtIh6fFrqrcxd/lGrn7iff5btZjfnTaCQaUlLaB50zEDYRiGEZG8\n3BwOHtSNgwd1o6psM+Xl5UmPUVX+8vREHp29mbfnreKk29/i8qP35NKjB1GQl9sCWjeejIS5isjV\nIjJLRGaKyGMiUigiA0VkiojMFZEnRMRy4A3DaPOICMcOLGLcNUdx5qg+bNtRw99e+5iTb3+LSfOT\n+zAySYsbCBHpDVwBVKjqCCAX+CZwM/A3VR0CrAG+19K6GYZhpItuJQX89ZyRPHbxIQwqLWbBik2c\ne99kfvLk+6zetC3T6oWSqUS5PKC9iOQBRcBS4Fjgv37/Q8DpGdLNMAwjbRw6uBsvX3kkVx+/F+3y\ncnh66mKO++ubPFn5WaurCyWZUEhErgT+AGwGxgJXApNVdU+/vy/wsh9hxB97CXAJQFlZWfno0aMb\npUN1dTVFRUXNKmt9Wp/Wp/WZiuySDTv4x9T1zFjuRhD9O+bQrSjaCn092isXVzSubHlFRUWVqlYk\nFVTVFt2ALsDrQCmQDzwHXADMC8j0BWYk66u8vFwbS2VlZbPLWp/Wp/VpfaYqW1NTo89M/UxH3ThW\n+187JvJ2/M2vRD5/PEClRrhfZyKK6XhgoaquABCRZ4DDgM4ikqeqO4A+wJIM6GYYhtGiiAhfP6AP\nx+3TkyfHvcegwYMjHbd00YI0a5aZMNdPgUNEpAg3xXQcUAm8AZwFPA5cCDyfAd0MwzAyQsfCfA7o\nVUD50J6R5Ks2LU6zRhlwUqvqFJwzeioww+vwD+Ba4BoRmQd0A+5vad0MwzCMWjKSKKeq1wPXxzUv\nAA7KgDqGYRhGCLYehGEYhhGKGQjDMAwjFDMQhmEYRihmIAzDMIxQzEAYhmEYoWSk1EZzISIrgEWN\nPLw7sLKZZa1P69P6tD5bW59h9FfV0qRSUdKtd8eNiKnmqchan9an9Wl9trY+m7LZFJNhGIYRihkI\nwzAMI5RsNhD/SIOs9Wl9Wp/WZ2vrs9G0aSe1YRiGkT6yeQRhGIZhNIAZCMMwDCMUMxCGYRhGKGYg\nDCONiEiOiBzWXHKG0ZJknYEQkSNE5CL/f6mIDMy0TjFE5IyGthD5m6O0hch0EZH9EuwrFpEc//9e\nIvI1EYm2inry8+aKyB4i0i+2pbtPv//PTT1PY1HVGuCvzSUHICKHR2lrDCLSW0QOE5EvxbYQmadF\n5JTYddJAX2dHafPtV0ZpSwURGRexLVdEHmnKuQJ9jWpoa45ztCRZFcUkItcDFcDeqrqXiOwBPKWq\nh8fJ7QXcDfRU1RH+Zvo1Vf19SJ89gT8Ce6jqySIyDDhUVeutiOdv8jcDPQDxm6pqR7//AS/aA7dO\n9+v+9THAm6p6Rlx/U1V1VFzbB6pa7+YvIm8CX8MtEjUdWAGMV9Vr4uSqgCOBLsBk3HKw1ap6Xkif\nhcD3gOFAYaxdVb8bIvtj3CJRy4CaWtG6uorIaCD+olzn9bhXVbc0os/XgeM0ycUuIgXAmcAAAotp\nqeqNAZkZIfoRkA377G8APgCeaUiHFOTCvvd6bb69FLg45D2FfUc3A98AZgM7a0X1a3FyxwMXAYcA\nTwEPquqcJuoZJjtNVQ+Ia0t6ffjrsgi3jPHRuN8ZQEfgZVXdJ+T8rwBfVdVt8fvi5Bq8N4jIG160\nEHeved+ffz9giqoeEdLnLcDvcUsw/w8YCVylqo/4/Ymuudj9I/RhrznIyIpyGeTrwAG45U5R1SUi\n0iFE7j7gZ8
C9Xu4DEfkP7kuM50HgAeCX/vXHwBOEL5l6C+4i/DBMOVWNjWzGAMNUdal/XQb8PSYn\nIj8ELgMGi8gHgS46AO+E9Q10UtX1IvJ94AFVvT7u2F3dq2q1iHwPuFNVbxGRaQn6fBiYA3wZuBE4\nDwh9b8CVOMO8KsH+GAuAUuAx//obOAOwF+57uaARfU4DnheRp4BNsUZVfSZO7nnczaYK2Jqgr1P9\n38v934f93/OA6gTHXAMUAztEZAtxDwYhcjtFZHO8nIgcintwKBWRoGHvCOQmOPfzwFvAa9Te9BNx\nOu7zTPTewSn0GvCaiHQCzgVeFZHPcN/PI8DxwFeA3iJyR5yeO4J9ici5wLeAgSLyQmBXByDse41y\nffwAuArYA/ddxgzEegK/ozg+Ad72OgSvkVvj5Bq8N6jqMf59PQ5coqoz/OsRwE8TnPtEVf25iHwd\nWAycjTNusVHNqQmOSzvZZiC2qaqKiIKbTkkgV6Sq74pIsG1HAtnuqvqkiPwCQFV3iEiiH+KyRMYh\njgEx4xA7DvcDiPEf4GXgT8B1gfYNqro6QZ953tCcQ60xC0P8jeg83OgAEl8ne6rq2SJymqo+5H8o\nrySQ/Qx3803GAaoanNYYLSITVPVLIjKrkX12xd1sjg20KRBvIPqo6kkNdaSqi8BN6cSNPK8Tkbdx\nhjL+mLCHkLC+k8m1A0pw30dQdj1wVoJjilT12ijnx91880lsHHchIt2A83E35GnAo8ARwIU4w12J\nG7FWBQ7bAFwd19U7wFJc4bm/xsmGPcAkvT5U9XbgdhH5saremey9eJb4LYe6n208Ue8NQ2PGwes0\nU0T2T9BnbAr3K8Bjqro62H/smssE2WYgnhSRe4HOInIx8F3cE0E8K0VkMH5YJyJn4S7iMDb5H0tM\n9hAS37QqReQJ4DkCP8KQJ9k3/ZD3Md/vN3FPFDH5dcA6EbkdWK2qG/y5O4jIwao6JeTcN+Ju3hNV\n9T0RGQTMDZG7EvgF8KyqzvJyb4TIAWz3f9f6J6QvcFMZYSzw7+vFuPce/4RWKiL9VPVT/5764W4e\nAPHD/0h9xkZmEXhHRPYN/rAboFhEjlDViV7Pw3BP/6GISBdgCHWn4ibEyQjOMA9U1d+JSF+gTFXf\n9fLjgfEi8mAKN40xIvIVVX0pgmw1MN3P0wc/zyvi9HwGGIobPX018DDzhIhUqur7wPsi8iywSVV3\n+uNygYJgX/59LBKR84AlsSlEEWkP9ME92QdJ5fqoEZHOqrrWy3YBzlXV/4t/46p6g5cpVtVN8fsD\nRL03fCgi/8SNAhRnTBM9HI4WkTm4KabL/LRgcCp1Q+x8cSQaiTYbWeWDABCRE4ATcR/uK6r6aojM\nIFwa+2HAGmAhcL6qfhIiOwq4ExgBzMQNf89S1XpPP1LrYwiiCeaDz8D5AgAmqOqzITLTgFGx+Wpx\nTsPKsDneqIjI2ar6VLI23/594Gnc/OoDuKfb36jqPSGy14edL/bDDMh9BbgHmI/7jgbiptPeBC5W\n1dsa0WeyeePYHG8e7ia+AHeDTDjHKyLlwL+ATr5pLfBdVZ0aIvt9nOHtg/P/HAJMUtVj4+TuxvlS\njlXVffwNbayqHhgn9wYhN4z4/rzsBpzh2ua3hDcVEbkwvs33+1Cc3LGq+nqYbJzcZOB4Vd3oX5f4\n91MvWktEKoHDYj4AEWkHvB3y3lO5Pqar6v5xx9fza/j2Q3HTwiWq2k9ERgI/UNXL4uTC7g3nxRts\ncX6QHwKx0c4E4G4N+NDi5LsA61V1p5/Z6KCqX4TJtiRZYyD808srqnp8CscUAzmxJ/QG5PKAvXEX\n7Eequr0h+eYiwQ8gkZM6kkNZUnAsNlLnDu607qaRQKYA94QqwJxEP6qofYrIePy8cezmICIzVXWE\n/79/Q/039LQuIh1xv6OEU13eAB0ITFbV/UVkKHCDqn4jTm6qqo4K3sRE5H1V
HRknVx54WYhzrO9Q\n1Z839D6aEz9iGkBdx/e/42TCrs96bQ3I1nvvvj3S9SHOxzYy8ACVC3ygqsNDZKfgpuleCLtG/Osc\n3MPfk1HvDVEQkSKc/6mfql4iIkNwvqAxfn9Hdf7DrmHHa+Jp5SaTNVNM3jJXi0inhn7MACLyR+CW\nuKHpT1T1VyGyZwP/89MxvwJGicjvEzxJRr1JNxjtFGCBiFyBezoG9yS1IMHbatChLCInE9GxGDgm\nlQiuEV6Hrv71SuDbqhrvVwAop/bms5+I1Lv5pNhng/PGWutXCPsB1rkBSF3ncLA91lf8lBnAFnXR\nNYhIgarOEZG9Q+S2+5tY7IZWSm101i5UtSqu6W1vBMP0anDaKk52CM6vNYy61+egOLmHgcG40dCu\naCcg/jvaJCKjYr8Fb9g2h+kJrBCRr6nqC172NBIvhhPp+sBNqT4pIvd4/S7FRQmFoqqfxV0jO+P2\n14jIj4AnE01DSSOi3HCj7yrcqASco/opYIx//R+co7rK9x1UUoE6309zkjUGwrMFmCEir1I3UuGK\nOLmTVfX/Bfav8UPbegYC+LWqPiUiR+Buvn/B3bAPDpGNGvXTYLRTgEuBO7xeCowDLkkgm8yhvITo\njsUYDxI9gusfwDWq+gaAiByN8//UmW5I4eYTuU+izxtPBfripg4E6AwsFZHluOmLKhp2YCZisYh0\nxvmeXhWRNbjPO547gGeBHiLyB9wTbdhDSdCQ5eBumL0SnPv/8NNWwO+AjbhIngNDZB/AhQ3/DRda\nfRF1b0YxKnBRdsmmH64CnhKR2Hstw0UdhXEp8KiI/B33PS0Gvh0vlOL1cS0uoumH/n2MBf6Z4Pyf\n+VGR+umtKwj/bb4qIj/FXefBe0jsKb4xEUeDVfUb4iK6UNXNErBUqnqq/9vyOVua5hWJWtOGi7Co\nt4XIfQAUBF63B2Yl6HOa//sn4FvBtgZkP/B/84HXQ+TeTsN7f9f/nYDzl3QHFoTIdQRyA69zcU/g\nYX2+F/9+gekJZN+P2PYhfuozwnuK2ucgXJhnNfA5MBG35GK83D3AlwOvTwRuxfkMpjTT93AUzgi3\nS7B/KC6E9kfAPglkFuJGigtxgQZjgSMSyE4N+Y7qfUa+vcr/nRFoeytE7incKCTK+83319u+QH4E\n+RLc/Hui/ZGvjxS/l+64SKxlwHKcc7lrgs8+fqv3O0rx3O/4e0zsuxoc+73GyY2L0tacW1aNIDTO\n2dYAjwDjxDmVFRftlOjYz8VFRh0P3OznRxNlmEaN+okU7SQpJPQB//BTZb8GXsA7lEPkxvr3EpvP\nb+/bwspApBLBtUBEfk1t3sD5uB9XPDNxT8OJosZS7lNVFwDHR5g3rlDVSwPHjRWRP6rqNf573UXg\n2og/13cDMmFTVrEIqRKgztyxuKi0J1Q1Uax+7BypPElGmrbybPHz7HP9VMrnuGnOmH6xJLUOwGwR\neZe612d8Ql1sbr2/ql4sIkNEZNfcepxs1OnKyNdH1Ckzz94alwwqLjv97WBb1M9e6kYetcMZyk0a\nHnF0PW7qq6+IPAocDnwn0Fcs8a+7/w0HE//2iKJPY8kaJzWAiCwk/Edd74Lxc/LH4Yemqhoa3+9/\nBCfhnrrmiss12FdVx4bIRor6kYjRTpLE+doYUnQsphLB1QW4ARcrL8B4nKN2TZzcG8D+QIM3nwR9\nTgB+G9LnfFxW+Fu4iLDZCd77WNw03eO+6RvACbjv9z0NOOpF5MzAoYW4JMwlGpiuDFxvYdM0Gn/d\niYsi+gYu5+VZnLGoDNEzn7oRMm/iroF6wRHiwke/AYzCPeScBfxKw6PSDsQ9oXfGTUd1xPnipvj9\nR4W8j+AbquMH8Q85VTi/0AhxoauTElxLL+OnK1V1pLjAj2mqum+cXCrXx0Rqp8y+ip8yU9V60W8S\nMTgjlc8+7rjTgYM0MHUdt78bbqQquGCG
lYF9V1Kb+Pe5l1Hc9O8/kj1QNIVsMxDdAi8LcRmLXVU1\n7Ek6ap+h9YTUx2mnExF5T1UPlLpRL4lu5pGe0MQle/1Y6zoW71LVQxPokHIEl3+iLVbV9SH7Qm9C\n8TefuGM6ATWJRgb+6f9gXNjw4bhpnPdV9etxct1xN5SYwZmIM0DrcBEm8xrQIQd4TUNCTVPFjzzO\nxOW/9FPVIXH7/4l7Io2Nai8Adqrq9xP0N5Tah51xmsC3JSIVOH9Sf2qTt1TDo+J6AQfhblTvaUhI\npriciApJEpXl2yNdy6lcHyJSparlIjIjZmhE5C1VPTIgE8tOvwpnSGJ0BL4er2uqn33csZNV9ZDA\n6wYjAzUu0EVEfgPcpi6i6dc4o/+7eLnmJNummOJT92/zTxl1DIREjyICeJHap8RCYCDwES5SqQ7e\nUflt6ocHXuH3/1xdaYs7CR/pxDvTU0noe5BoDuWkjkUJKRzo2UtcREl84h/inOKX4hyLVUAnEblV\nVesU0mvIEIT0eSAuF6GDf70Ol4sQH+WzEze9txM3vRKbZ66Df2r7cYLTJTQOniFA6MOChBS88+eb\nENYO7IkzYgNwdZHiOTDuxvW6iLyf4NyRpq08j+JGpDNIPA0VGwn/BlcrTIA7ReRGVf1XnOg2P2qI\nXZ+DSZylHWm6MpXrgyRTZp5Us9MjffZxv5EcnGM//jfdUHFGpW7mP7jR+Y3iAmJO8McnCohpFrLK\nQMRZ7NiXFhaVEjWKiJAh8Chc5EQYL+GmOhL9AGPnq6SBULkAl+MieYaKyOf4pJ0EspFKgqjLsh5K\n7ahgTsio4Kv+b2hRQeqXsAAX9bLeT3m8hIswqQL+DG46QFWPkPpZow0Z5/uBy1T1Ld/HETgjGP/E\nux73md8K3BfyoIA/PtUEtNiDgeL8SYlKWvws8H8h7sm7irgbgLhieV/HOaCfwD0drg3pb6eIDFbV\n+f64QSSuszQV+JU4f1XCaSvPCvVhpkn4Ga7kxSp//m44R2u8gWhwbj2Oa3C+scF+FFtK4AadyvUh\nIg+r6gW4OlRFuIik3+E+7wuDJ9Xa7PTNqnpLcJ+4EPb4agNRP/uvBv7fgcsIPy3u3MeEfRANEDvP\nKcA9qvq8iPw2xT5SQ9PoAW9tG65kRGx7FXdz3TtErklRRPhohKjtIXIH4n7M03A3thn4yCe//0r/\n93D/N5Z52VCfbwLdqI2UOARXzTVerggXWnmffz0EODVBn2MIRLPgRhvPJJCdhRuaPwUc5dtCo2lS\n+JzrfU8J2k7DGaLx/nu/AVfdNV6uPLAdjjMot6ThOuyLq7kT334ZboT5G/+6H27eOl7uOOBT/52O\nx918jklyzq64qq7jgLkJZI7DhYGeC5wR20LkxhGIwsI9hb+WoM9uuBvaqbiHlIZ0zMONvEcQIeKp\ngX5m46bJ3sdVJe4a3BIcU++3maAt5c8+gr6FOAP5DM5HeRVQGCI3BlckcD7OT1TQ1N9Qsi3bfBCD\n1EW0BNsGqurCuLbbcZESyWomxSdO5eDmBbup6pdDZK/GRQeNies3PprlI0KG+lqb0DVdXUZu5Azn\nqA7lFB2LYZmmH2iIk1xcQt+1uB/tKbib3yMamA8OyOYCPak7DVfPpyMif8MZtFjNqm/gchie9sfE\nz+EOBU7G/QB7qGr7+D5DzjFeVUPnvUXkawSclRoSnZPgOMF9TvGjz0ilNrxsAXVHeQ0W2BORg3Cf\nz+nAbFX9aojMI7iprVnULZ8eHxzxb1zY6vO4z/00nNP4Yy/ykrpkwLBrU3H1w2LX8rGq+nqCaUvF\nRXpNVF/PyR/T4PXhr7Uf4sKbg07d2GhjUEA2liB6Dm7UFqMjbtR7kJc7W12+00BcDkuDn72I9MH9\n3g73556Ie7BbHCL7JM7hHKveei7QRVXPjpOLHBDTXGSbgQiLSqhS1fK4tkhRRF42GBERG0o+rSHp\n/yJy
OfAHXN2e2Adf54L1chM1pG58YP9jwKG4m/z84C4SOBX9cUkdyik6Fu/CjTCCRQXnqWqiefx6\n+qhqfPnnSGs8eNk3YvtjTdS9ERzr5Z7GRb7Mw/1QJ+DyGrbE9ReWgHaHqtbLehaRm3AjvUd907m4\nOli/CJEN+pRyvC6fqOr5cXKRSm349qSlLrxc/LTVsxo+bUXQmdsQkqAGVoDe6kpGvJFgfzfck+8F\nInKDutLzYb+5mGx7VT3BnzuV6+NuVf1hkvcyEvd93EhdX+QG4A31EXGB7ybSQ5m4ZNz/UDcE+7zY\n+4iTDSunEvq9tzRZYSD8k+NwnG8hOB/cEfiZhtRmSZMe84GDNRDClkDuONwNJ76q5jMBmV64TOh6\n4X0aqB3UgEO5Xp9e/h3cMPpt/4MYjJsOOSiBrkmLCgZkT6F+mZEb42Tm4T6jZGs8hN2oNEGfB+Km\nCxpcD0HqhqXuwPl0blRfsTVO9gNgf3UrwcWeaqcluFEF57134IzD2yFyU3A+nff8Z1+KG0HEL5oT\nmk2s9YMYEJHLcKPWAeocnP2AXhpeauM+4G+aIAw4RD5pXa0Gjh2rqidGlL1fVb/n/498faSoT37Y\nA1Ng/6s4Y7w/Lly6Dlo/BySVcPEHcT6Fyf71wbgE3sviZVuabHFS742bA+1MXefRBty8bB0ktRXl\nSoGfU//GFxbuOIvEi8oEuQg31M8n8JSEd/6KyDhVPU5EXtHkZZ9TdSin4liMGZgwp3QdxNXDKfLn\n/SfOAVnvJkX0NR6gNpkP3Gd/KuHlEaYDl0ttNNF43A+yzg1BUy9l0JnaZLdODcj9F1ePaVfZaxEp\nUtX4ayFSqQ2il7oANxUUK7VxI+6af5rwUhtHABd6Q5mwmq1ErIElLsHrMt+v4m6s96jqlnjjIC5U\n+Xpqp+zG44zzuphx8KRyfaTCABFpKKnuFNz08cNEWxp2pYicT+3CRucSvgASuCikb4tIbJqsH65c\n+AwamBFoCbJiBBFDRA5V1UkR5CInoIlLrnoCt1rUpbgoiRUaskiLuPr4w3FO8obq7Tc41BeR2bg5\n1nsIiVrS8EKBY3D1hOqsUqdxy5j6fQmTdvz+lCOOxFeZDfwtwTm0428U9+MMerJ1I+rh5+Vf0Dj/\nj0SMXZfUEtC+CdzkZcQf8wtVfTxENpWy10lzFsStjHeF1l1UKpQUp636h/UR/xDiR5m/1Lo1sP4Y\n/36izq172adxvrHgdzQy/vpsyvXREBIxqU5ESlV1hSRZN8KP1O7CTQUrLsrrCg33pYV+7jEiPASm\njWwZQcSY5v0AydZQTmVFuW6qer+IXKm1IXOJYrWf81syJovIsAaG+r/BrSTXB1ccENiVrRsWPw1J\nVqkTkaFa17EYk+0nrgLoLseiev+IRlwpzROr4lktbi3wVUDYE/unfmvnt1QoIryyZdS8gbtxhiS2\noMwFvi0sCeoUXFjnGq/vtZq4fn9hcBpGVTeKczjWQ93azvXWdwaQFEtdeCKX2kjhRlQcMw7+uDcl\nfHXGveM+9zcSfO7gCtYFs9NvEJHpIXJNuT4aor2qjhMR8Z/Db0XkLZzRCLKn/32X4H4boetG4MJq\nL9RaH0ZX3G81WIqlo7pk0dAET01jGe+oZJuBiFpNNZUEtNjT5VI/x74Ed+Ouh0avBdXgUF9V/wv8\nV1xm5UJgYHB+OUGfb0oDq9ThwuwuIfHwuZt/8rwgwf5kjBGXKPhnXGy+ElJZU2tX9oqybkSwtHIu\nzmlfb8lPoseuR05Aw+VbHIHzAQ3CrcQ2Qd1yl/GkUva6If6SXKQeUaetUiFqXa1pInJI3Nx6Pd+L\nZ7PUXaHvcOI+I2/oSlT1Z2EdNJEoSXUAt+HuHy8AqOr7Ep4IuZ8GSr6oW0Y0fqGijJXxjoymMYa2\ntW1Er6YaVv1zQII+T8XNP4/A3XCrcEl2YbILcdEkdbYQuf5hW4jcPb
jSzR/6113wFVYTnP8M3BD6\nb7gyAql+fmOb6XsoADol2DcCl/+xyG9VwPAEssHPpzeQl0AuGLv+Jgli13GGa3DcdZAwdwVnlA7B\nLdG6CBfyGCZ3IC7a7C2/zQPKm/D53RylLbAvaYXYFM/fBWd4pvrv6jbc1FFs/wxcReQPcaOVT/y1\nXwPMTNDnSFwI9Cd+m4a7ycbLpaV6qf+OSnAPdw/g/DQHh8hN8X8brI7r30vwM+lKoEpuW9mybQQR\nqZqqRq/+Ca6e00RVnQkcExhKjg6RrQj8v6sWVMj5ow71D1I/v+yPWyOuln0oGsGhnIpjMRX8lMpP\ncLWFLhaRfiJypNbPHYi6xkMqn9PbuASj4/zre4EwX9TPcNMgC3BPc/1xc9Fh72ccLkFxEu4zOlBV\n65Xv8HpGyU5PhROon7V9ckhb7PwJp60ag7on41h5mLC6WsE1EboQiHLDhXjXwT+5762uSF9Hf456\ndbo800XkBVzCZXA9hqSBEklQ3IioP7V1qO6jflZ+1HUj/opb4/y/vu9zcCHu9RCRr+MeVNf5152B\no1U1ynR0Wsk2J3Wsmuq+uNpEJbgFf+71+89X1UckwaphGuIIk5A1bsPaGtCpwZyHJMcmDYtM1aGc\nimMxRV0jJeCFOVATOVVTOPeTuHIbwZyFRM7SSAlo4pL0ynFTgG/jbn6TVHVzQKahJLCUb2oi8kOc\n8R5E3fyXDriw5PNDD5oPH9IAAAw9SURBVGxmJKSuFlCvrpa4KqTfxz2UCC5J7z5VvTOkzwmqGlqz\nKk4uco5SKkiS5NSAXHfgdlxJ/NgiRFdqSNituIKYx1IbcJCoinDktbNbmmwbQTyMq5I5gNpoiZ6B\n/TFHWyrO1xwR6aJ1nVGhn6tErwUVlaTzy5q6QzkVx2IqNLhqVoCo89upkMp7Ci5nOVISLGepqlfD\nroiki3DTEr1w02cxjsKFFdfLWiYQtpwC/wFexq1xcF2gfYO2rEOzwbpaAb4HHKI+2kdc0t4kXIZx\nPMlWaou9Dh3RNQNJ61D50dIFGrduRCK8QYiSUxK2fkyruDe3CiVakOdxMdRVhFSVVNV7/UWwXlX/\nFr8/AZGHkl429hQfy7pu9JO5qj4qIlXUhkWerhEKDCYhFcdiKkSt7PldXK2k2FPneBJM86RApPck\nKSxn6R2ZR+IMyiJcRFOdBCqtDZG8UeuXc2nM8pGqqp+Ii8SL16drCxqJfHEhwafjSsFvF5GwqQih\nbjDATuo6YoN8F/dZx0cDxVcZSGWRrFS4Xlw4dMLkVHXr2p9G3bLgzUGliNyK8ycqrqJwfEXijJBt\nU0yRFtMRkTc0hUqLKQwlC6kdwcSMs2pc5m8mCEQE5eOmWD71r/vjavc0ZREiwYWMfg+XiDQWn4Cn\nqm82cFzCdSNSPP+H1L4n8IlIuKkEVR8d5uUiJaCJyM9w00pVGlcuJEQ2UomXCOcco6qnSvhCRKrh\nK6U1OxKxrpafqr0QN8oFZ1AeVNXbQvpsT7jvKz6SqdkXyfJ9RK1D9QfclFr8SKfRazJ4X+evqTtt\n9XttIM+ipcg2A/EP4E5VnZFErtkvAt/v/3BOuqkEnqxUNUpmZlqRusk69RyLKTiEE/VfhVvjOWEC\nnpeLNL+d4rkjJSJJCgloEc+blhIvfqQzAbdedLM5n5uChNTV8u2jCKz4p6rTEhwf5ifqrKrnxMlF\nXiQrRf2j1qEKqy+l2gwLRbVGsm2K6QjgO5KklAC1ETPBJ/tECWip0EdVT2piH2khcJOMdyw+jIvm\nCJs3ToXJwCBVfTGJXNT57cgkM27SuAS0KKRU4iUFYjkYd4rL6ZiGMxZhORjNjiQoi0H4Aj9TcQ9E\nyYjqJ0olRykVkiWnApDKzEJU/LTZT6lffDHjRifbRhCRSgmk8fyRRjCZRFwRukMDjsViXHROk+rB\niCsPshduvn4TCYyziMzCFUT7D2
5+e3xTo5gi6HaU1+dmXF2tXbtw+QWNXrHLT5Ndq6p/bJqWof0e\niKttdSmwWVWHNuc5Gjh3pLIYKfb5IBEK1nmD+A/cQ9waXADD+ar6SWPP7fv9EOd/SlaHKtLSvSme\n+31cTlMVdWcWMu6HyCoDEZXmvggC8/t5uPLYC2h4BJMxvK4Hqi+F7f0m70UZfifpN2qdn8jrRjQ3\nCXwFHzSDcUzJpxWhv/gcjImaIAcjHSQIy2zSNE9UP1FAPkqOUirnj3p9voxfuldd3kYeLmmu0b+P\nxvijWopsm2KKyoNEW785KqcmF2k1PABMEVdYEJxjsdFPRzGijtJU9Q5c+G6MRSLS7MP6IBLIL/Aj\nqBgdaJ4IrnfErZ3RXD6tD3DRUyNw0zprRaRODkaaSVoWoxFEmnoVkT/iVvlb6193AX6iqk0qH5LC\nLEKkpXtTZLS4suzP0sBCYpnARhAhpMsR1laI6lhM4/mTrhvRzOfrhHPMpyW/IF2OTanNwfgpbo2H\ngiSHNAsisj9ueqkT7hpZjYtIa458mWTnDktMjbyyYjOc/01cJOKr6pJTD8FNQx7VhD7D8nxaLCqt\nIWwEEc4mcSWvY46wQ0hPDfpWSQqOxWZHoq8b0WyoK3GwDhc5k47+m3UEJBFyMNKJqk7HJREmK4uR\nDnJFpEB9hrsPj20Rw+i5Bleob7CIvI1furcpHWrq65C0GDaCCEEirt9sND8Scd2ItkZzjopSycFo\nTiRBCZoY2sQ1GSLq8HNcBd0HcA9w38WtAXJLus8d0CHp0r0R+2nWUizpwEYQIajqVB/Z0uSLwEiZ\nqOtGtBmae1TUlJyQJtKUsjDNgqre4gMpYtUDfqeqr7SwGgdRG5I6ShKUY4lAc5diaXZsBBGCNFDR\nNKOKZQHi6jDdibsBxEoP/FNVf51RxZrA7joqykYkhfXAI/aXg5udeLKZVGxWzECEIGmqaGqkhrjK\nqoXeR9BmEZEpqnqwuKVHz8CNimaq6pAMq5YSIvJz/wR/J3UrAwPQ2JtkijqcgctX6YEbQSRc5jZN\n549cjiWFPiNVss0ENsUUTroqmhpJkOjrRrQlYqvp3UJtEbZ6q+m1AWKFICszqMMtuAW5mlqUsrHM\nxFXtbZZyLJ5IlWwzgY0gQoia1Wk0PxJx3Yi2hH8PP8RFHsWmLO+2KcvUEZG3VfXwDJw3WI5lf5wP\nqTnKscTCXMNGZBkPczUDEUKqWZ1G8yEilapaEZeDktZSG+kmwZRlvUJ0bYVM1g4SkdtxT/DPkaAs\nd5rOm85yLJEq2WYCm2IKp1UW1MsSoq4b0ZbY3aYsn8LVDvonddd7aAk64taKDzr40x7xo6rjAUQk\nP/Z/DH+9NoWHcJVsYxUEzvVtGX+AMAMRzhBVfS3YICIXqupDiQ4wmo6ICO7G8z+gr4g8il83IpN6\nNQPpWoQpU+xQ1bszcWJN34pyDZLmciyt9gHCpphCEJEJuIVDfopbt/qfwFZVbVLGpJEcibhuRFti\nd5myFLecLsAVwArcU3uL1g4SkT64MOjDcSOHibg1oRen+bxpK8fSmn2eZiBC8E+yPwF+4Jt+o6qP\nZVClrEFE/o5bdey9TOvSXCSqFBojhUJxGUXqr2RX5+bREk5VEXkVVwo+uGb5eap6QrrPnS5a8wOE\nGYgQ/JPSvbjhYx+cc/Hm5ox9NsKRiOtGGJkjk07VsKKZbb2QZmt+gDAfRDiTgZtU9V/+x3Azbp7x\nsIYPM5qBkzOtgJGUTDpVV4rI+UBsRH8uLvGwzdKaR5A2gghBRPrh6qQMVNUb/esBqjohw6oZRsYJ\nCztuqVBk/1u8CzgUN3p5B7eO+KcNHmg0ipxMK9BK+QXOSRor/7wB+Gvm1DGMVsU0XwIfaPGorN/h\nHLilqtoDV831ty107qzDppjCOdgvBjINQFXXiEi7TCtlGK2Eg4Fvi8j/b+/+QeSqojiOf782JhJQ
\nMQoW0RRqJAkh5o8ogmAQKxstRKuk0kILtTWIRQIWEUGDYrARBBGDEBELRVhEUVEkqEnUiAgBLVw0\niIUxMcfi3nFnl2eKjbOD6+/TzM57c/e9GZg9e95995x5k6q9yuqk54s2VdUvoydV9bN6w7kGxOIl\nQAw7bWsKP1qsdTntjoKImO5C0gvUS0dBot9Qkr9jE5IPdtgztP6wV6h7afX7z6vnbcRyMeVJ1ado\nPb4P0v6BuwfYO8XzWdYySf0P1OuZa0ry7hSrR0bEGHU9sIO57+bRKZ/SspUAERERg3IXU0REDEqA\niIiIQQkQEZ36mHpE/Vw93O/vn9SxZtRtk/r9Ef+G3MUUAag3A3cCW6rqlLoayNqX+F9LBhHRXAnM\nVtUpgKqaraof1MfVT9Qv1QO90u8oA3hafU89pm5XX1ePq3v6a9aqX6kv9azkYO+5PY96h/qh+pn6\nmrqqb39SPdrH7lvCzyICSICIGHmb1qToG/W53mISYH9Vba+qjcBKWpYx8kdV3UprcnQIeBDYCOxS\nL+uvWQcc6KuLf6VVQf1bz1R2A7dX1RbgU+DRvgDsLmBDH7tnAu854pwSICKAqvoN2ArcT2uG86q6\nC7hN/biXkdgBbBgb9kZ//AI4UlU/9gzkO2BN33eiqkZ1il6mlcgedxOwHvhAPQzsBK6mBZPfgRfV\nu2ltNiOWVOYgIrqq+hOYAWZ6QHgA2ARsq6oT6hPAirEho25qZ5nfN/ssc9+thQuNFj4XeKeq7mPh\nDr2RtljzXuAhWoCKWDLJICIAdZ167dimzcDX/efZPi+wmJazV/UJcGjVgd9fsP8j4Bb1mn4eF6nX\n9eNdXFVvAQ/384lYUskgIppVwLPqJcAZ4Fva5aaTtEtI3wOLaYN6DNipvgAcB54f31lVP/VLWa+o\nF/bNu2kl5g+pK2hZxiOLOHbEeUmpjYgJUdcCb/YJ7oj/nFxiioiIQckgIiJiUDKIiIgYlAARERGD\nEiAiImJQAkRERAxKgIiIiEF/Adr9OpUnzXCmAAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "theLowerContFreqs.plot(30)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We might also want to check how these words are used by looking at their concordance. 
" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Displaying 5 of 103 matches:\n", " in the subdividing and balancing of power ; the lawyer more method and finer p\n", "n , which not only escapes all human power and authority , but is not even rest\n", "ceived ; nor is any thing beyond the power of thought , except what implies an \n", " limits , and that all this creative power of the mind amounts to no more than \n", "o show distinctly the action of that power , which produces any single effect i\n" ] } ], "source": [ "theText.concordance(\"power\", width=80, lines=5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Collocations, Similar Words, and Contexts\n", "\n", "### Collocations\n", "\n", "NLTK will also let you explore collocations: pairs of words (bigrams) that occur together more often than their individual frequencies would lead you to expect." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Project Gutenberg-tm; Project Gutenberg; Literary Archive; Gutenberg-\n", "tm electronic; common life; Archive Foundation; electronic works;\n", "Gutenberg Literary; sensible qualities; United States\n" ] } ], "source": [ "theText.collocations(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note how many of these bigrams involve \"Gutenberg\". That's because the Project Gutenberg header and license are still part of the text: their phrases repeat often and hardly appear anywhere else, so they score highly. If you ask for more collocations, you can see some that come from the text itself." 
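] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The ranking behind ```collocations()``` can be rebuilt by hand on a slice of the tokens, which is one way to keep the Project Gutenberg boilerplate out. The cell below is a minimal sketch, assuming NLTK 3.x, where ```Text.collocations()``` counts adjacent word pairs, drops pairs seen fewer than twice, drops pairs containing an English stopword or a word shorter than three characters, and ranks the rest by log-likelihood ratio. ```top_bigrams``` is a hypothetical helper written for this notebook, not an NLTK function; it takes the words to ignore as an argument instead of loading a stopword list itself." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from nltk.collocations import BigramCollocationFinder\n", "from nltk.metrics import BigramAssocMeasures\n", "\n", "\n", "def top_bigrams(tokens, num=10, ignored=frozenset()):\n", "    # Hypothetical helper: count every pair of adjacent tokens.\n", "    finder = BigramCollocationFinder.from_words(tokens)\n", "    # Drop pairs that occur fewer than two times.\n", "    finder.apply_freq_filter(2)\n", "    # Drop pairs containing a short token (such as punctuation) or an ignored word.\n", "    finder.apply_word_filter(lambda w: len(w) < 3 or w.lower() in ignored)\n", "    # Rank the remaining pairs by log-likelihood ratio and keep the top num.\n", "    return finder.nbest(BigramAssocMeasures().likelihood_ratio, num)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For example, ```top_bigrams(theText.tokens[start:end], 10, set(nltk.corpus.stopwords.words('english')))``` would rank only the tokens between ```start``` and ```end```, two hypothetical indices you would choose so that the Project Gutenberg header and license fall outside the slice (the stopword list may first need ```nltk.download('stopwords')```). Without any trimming, asking ```collocations()``` for a longer list gives the following:"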
] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Project Gutenberg-tm; Project Gutenberg; Literary Archive; Gutenberg-\n", "tm electronic; common life; Archive Foundation; electronic works;\n", "Gutenberg Literary; sensible qualities; United States; external\n", "objects; human nature; set forth; human testimony; voluntary actions;\n", "electronic work; necessary connexion; public domain; secret powers;\n", "Gutenberg-tm License; regular conjunction; human life; _in infinitum_;\n", "reasonings concerning; usual attendant; one object; constantly\n", "conjoined; David Hume; universally allowed; human understanding; seems\n", "evident; concerning matter; Human Understanding; copyright holder;\n", "take place; simple ideas; real existence; every moment; may observe;\n", "shall find; certain degree; infinitely less; CONCERNING HUMAN; ENQUIRY\n", "CONCERNING; HUMAN UNDERSTANDING; PROJECT GUTENBERG; past experience;\n", "Enquiry Concerning; one event; give rise; good fortune; conjoined\n", "together; human actions; customary transition; infinite number; must\n", "confess; constant conjunction; common sense; narrow limits; mutual\n", "destruction; strictly examined; first appearance; conclusions\n", "concerning; one instance; primary qualities; Concerning Human; mental\n", "geography; physical points; experimental reasoning; universally\n", "acknowledged; natural instinct; inward sentiment; universal doubt;\n", "paragraph 1.F.3; two kinds; may serve; new effects; uniform\n", "experience; usual course; Distributed Proofreaders; Jonathan Ingram;\n", "Plain Vanilla; Vanilla ASCII; _Christian Religion_; _necessary\n", "connexion_; _vis inertiae_; secondary qualities; natural events;\n", "derivative works; Gutenberg-tm trademark; greater variety; well known;\n", "distinctly conceived; may seem; strong presumption; divine existence;\n", "similar instances; draw inferences; human reason; human 
 action\n" ] } ], "source": [ "theText.collocations(100)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Similar Words\n", "\n", "We can get words that are **similar** to a target word. These are not synonyms but words that are used in similar contexts: NLTK looks at the words immediately before and after each occurrence of the target word and lists the other words that share the most of those contexts. You can use this to expand on a word you are interested in." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "cause reason nature men it ideas necessity mankind action objects\n", "conduct them body power experience resemblance first miracles science\n", "life\n" ] } ], "source": [ "theText.similar(\"truth\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can combine this with ```concordance``` to compare how a set of related words is used." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "REASON: \n", "Displaying 5 of 116 matches:\n", "Of Liberty and Necessity IX . Of the Reason of Animals X . Of Miracles XI . Of a\n", "eigns . 7 . But is this a sufficient reason , why philosophers should desist fro\n", "iscover the proper province of human reason . For , besides , that many persons \n", "er parts of nature . And there is no reason to despair of equal success in our e\n", "RT I . 20 . All the objects of human reason or enquiry may naturally be divided \n", "--------------------------------------------------\n", "\n", "FACT: \n", "Displaying 5 of 89 matches:\n", "tainty and evidence . 21 . Matters of fact , which are the second objects of hum\n", "ing . The contrary of every matter of fact is still possible ; because it can ne\n", "s of any real existence and matter of fact , beyond the present testimony of our\n", ". 
 All reasonings concerning matter of fact seem to be founded on the relation of\n", "a man , why he believes any matter of fact , which is absent ; for instance , th\n", "--------------------------------------------------\n", "\n", "KNOWLEDGE: \n", "Displaying 5 of 37 matches:\n", "prehension , possesses an accurate knowledge of the internal fabric , the opera\n", " make any addition to our stock of knowledge , in subjects of such unspeakable \n", " letter received from him , or the knowledge of his former resolutions and prom\n", " must enquire how we arrive at the knowledge of cause and effect . I shall vent\n", " admits of no exception , that the knowledge of this relation is not , in any i\n", "--------------------------------------------------\n", "\n", "IDEAS: \n", "Displaying 5 of 120 matches:\n", " of Philosophy II . Of the Origin of Ideas III . Of the Association of Ideas IV\n", "of Ideas III . Of the Association of Ideas IV . Sceptical Doubts concerning the\n", "rror ! SECTION II . OF THE ORIGIN OF IDEAS . 11 . Every one will readily allow \n", "d impressions are distinguished from ideas , which are the less lively percepti\n", "untain , we only join two consistent ideas , _gold_ , and _mountain_ , with whi\n", "--------------------------------------------------\n", "\n" ] } ], "source": [ "listOfWords2Conc = [\"reason\",\"fact\",\"knowledge\",\"ideas\"]\n", "for i in listOfWords2Conc:\n", " print(i.upper() + \": \")\n", " theText.concordance(i, width=80, lines=5)\n", " print(\"--------------------------------------------------\\n\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Common Contexts\n", "\n", "NLTK can show the contexts that two or more words have in common, where a context is the pair of words immediately before and after an occurrence. In the output below, ```human_it``` means that both words appear between *human* and *it*." 
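] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The idea behind ```common_contexts``` can be sketched in plain Python. The cell below is an illustrative approximation, assuming NLTK's default behavior of lower-casing tokens and using the single token on each side of a word as its context; ```shared_contexts``` is a hypothetical helper, not an NLTK function. It collects the (left, right) neighbors of every occurrence of each target word, keeps only the pairs that all the targets share, and ranks them by their combined count." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from collections import Counter\n", "\n", "\n", "def shared_contexts(tokens, words, num=10):\n", "    # Hypothetical helper: lower-case everything, as NLTK does by default.\n", "    lowered = [t.lower() for t in tokens]\n", "    contexts = {w.lower(): Counter() for w in words}\n", "    for i, tok in enumerate(lowered):\n", "        if tok in contexts:\n", "            # The context is the token on each side; sentinels mark the text edges.\n", "            left = lowered[i - 1] if i > 0 else '*START*'\n", "            right = lowered[i + 1] if i < len(lowered) - 1 else '*END*'\n", "            contexts[tok][(left, right)] += 1\n", "    # Keep only the contexts that every target word appears in.\n", "    common = set.intersection(*(set(c) for c in contexts.values()))\n", "    # Rank the shared contexts by how often they occur across all target words.\n", "    totals = Counter({ctx: sum(c[ctx] for c in contexts.values()) for ctx in common})\n", "    return ['%s_%s' % ctx for ctx, _ in totals.most_common(num)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Calling ```shared_contexts(theText.tokens, ['nature', 'experience'])``` should give a list much like NLTK's own output for the same two words, although ties may be ordered differently:"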
] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "human_it from_and of_are in_and of_but this_he by_that of_which of_in\n", "the_and\n" ] } ], "source": [ "theText.common_contexts([\"nature\", \"experience\"],10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Finding Patterns\n", "\n", "We can use regular expressions on tokens with the ```findall``` method of the Text object. Some guidelines:\n", "\n", "* You are matching tokens, not the raw text. Each pair of angle brackets, ```<``` and ```>```, stands for exactly one token, and the expression inside them is matched against that token.\n", "* ```<.*>``` matches any token, since ```.``` means any character and ```*``` means zero or more of the preceding item. ```?``` means zero or one of the preceding item, so ```<.*>?``` is an optional token.\n", "* The parentheses tell NLTK which part of each match to print. In the first example below, only the word right before the target word is shown.\n", "\n", "Here are some examples." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "and; from; by; by; to; without; by; not; and; ,; assist; by; to; from;\n", "have; that; this; past; from; from; of; to; of; of; by; from; all;\n", "from; past; by; from; more; more; this; his; from; from; and; of; and;\n", "pure; is; farther; of; our; from; daily; any; and; from; without;\n", "from; by; besides; of; from; And; common; by; we; by; except; certain;\n", "by; and; and; fancied; of; by; without; have; this; have; this;\n", "uniform; and; that; of; no; and; past; past; the; our; seeming; from;\n", "and; not; even; past; greater; 's; and; Though; to; of; infallible;\n", "past; our; by; past; the; from; this; of; uniform; his; have; uniform;\n", "unalterable; from; uniform; uniform; no; we; regular; is; same; of;\n", "the; and; from; past; my; human; make; same; by; in; other; from; of;\n", "If; By; here; any; from; our; and; on; only; by; from; from; by; from;\n", "long; by; from; from; and; of; by; uniform; and; of; by; .; to; 
for;\n", "only; from; .\n" ] } ], "source": [ "theText.findall(\"(<.*>)\")" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "of human nature; regard human nature; , that nature; into the nature;\n", "of human nature; derived from nature; parts of nature; concerning\n", "human nature; limits of nature; , where nature; concerning their\n", "nature; triangle in nature; a like nature; is the nature; the same\n", "nature; of this nature; concerning the nature; course of nature;\n", "course of nature; laws of nature; discover in nature; established by\n", "nature; is the nature; , that nature; of their nature; course of\n", "nature; of human nature; similarity which nature; Of what nature;\n", "course of nature; learned the nature; Their secret nature; and\n", "transitory nature; as human nature; of human nature; priori_ the\n", "nature; of human nature; of human nature; of this nature; accurately\n", "the nature; excited by nature; the whole nature; the peculiar nature;\n", "observed that nature; the same nature; a similar nature; course of\n", "nature; works of nature; wisdom of nature; . 
As nature; the very\n", "nature; contrivance of nature; constitutes the nature; irregularity in\n", "nature; how soon nature; productions in nature; in all nature; and the\n", "nature; with the nature; and the nature; with the nature; scenes of\n", "nature; operations of nature; powers of nature; appears in nature;\n", "force in nature; author of nature; They rob nature; throughout all\n", "nature; course of nature; of this nature; laws of nature; scenes of\n", "nature; operations of nature; operations of nature; that human nature;\n", "of human nature; with the nature; course of nature; of human nature;\n", "of human nature; part of nature; characters which nature; course of\n", "nature; part of nature; laws of nature; of human nature; part of\n", "nature; the same nature; the inflexible nature; but their nature; of\n", "human nature; a similar nature; powers of nature; being in nature;\n", "their very nature; of that nature; phenomena of nature; system of\n", "nature; formed by nature; intention of nature; of the nature; course\n", "of nature; of this nature; uniformity of nature; hand of nature; a\n", "like nature; in human nature; state of nature; is placing nature;\n", "course of nature; laws of nature; the very nature; laws of nature;\n", "course of nature; from the nature; laws of nature; laws of nature;\n", "laws of nature; contrary to nature; law of nature; not its nature; in\n", "human nature; frame of nature; from human nature; the public nature;\n", "_singular_ a nature; of this nature; or miraculous nature; by the\n", "nature; of this nature; laws of nature; the very nature; laws of\n", "nature; course of nature; dissolution of nature; laws of nature;\n", "course of nature; laws of nature; extraordinary in nature; of human\n", "nature; of human nature; order of nature; phenomena of nature;\n", "phenomena in nature; appearances of nature; appearances of nature;\n", "course of nature; course of nature; course of nature; order of nature;\n", "course of 
nature; course of nature; course of nature; course of\n", "nature; course of nature; order of nature; laws which nature; with the\n", "nature; works of nature; works of nature; Author of nature; course of\n", "nature; In human nature; course of nature; delicate a nature;\n", "particular a nature; of this nature; a like nature; from the nature;\n", "instinct of nature; instincts of nature; instinct of nature; contrary\n", "a nature; a like nature; propensities of nature; a like nature; of our\n", "nature; of our nature; necessities of nature; in human nature;\n", "situation of nature; us the nature; course of nature; entrusted by\n", "nature; production in nature; . Human nature; course of nature; in\n", "human nature; of human nature; course of nature; , for nature; design\n", "in nature; appears in nature; match for nature\n" ] } ], "source": [ "theText.findall(\"<.*><.*><nature>\")" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "talk of; is a; and a; , and; of their; the same; love of; . The;\n", "discovery of; for the; for the; inclination to; distinguish between;\n", "violation of; violations of; depart from; to reach; _criteria_ of;\n", "with great; love of\n" ] } ], "source": [ "theText.findall(\"(<.*><.*>)\")" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "not universally true; not a true; not true\n" ] } ], "source": [ "theText.findall(\"<not><.*>?<true>\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "[CC BY-SA](https://creativecommons.org/licenses/by-sa/4.0/) From [The Art of Literary Text Analysis](../ArtOfLiteraryTextAnalysis.ipynb) by [Stéfan Sinclair](http://stefansinclair.name) & [Geoffrey Rockwell](http://geoffreyrockwell.com). Edited and revised by [Melissa Mony](http://melissamony.com). 
Created October 10th, 2016 (Jupyter 4.2.1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.3" } }, "nbformat": 4, "nbformat_minor": 1 }