{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "version 1.0.3\n", "#![Spark Logo](http://spark-mooc.github.io/web-assets/images/ta_Spark-logo-small.png) + ![Python Logo](http://spark-mooc.github.io/web-assets/images/python-logo-master-v3-TM-flattened_small.png)\n", "# **Text Analysis and Entity Resolution**\n", "#### Entity resolution is a common, yet difficult problem in data cleaning and integration. This lab will demonstrate how we can use Apache Spark to apply powerful and scalable text analysis techniques and perform entity resolution across two datasets of commercial products." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Entity Resolution, or \"[Record linkage][wiki]\", is the term used by statisticians, epidemiologists, and historians, among others, to describe the process of joining records from one data source with another that describe the same entity. Other terms with the same meaning include \"entity disambiguation/linking\", \"duplicate detection\", \"deduplication\", \"record matching\", \"(reference) reconciliation\", \"object identification\", \"data/information integration\", and \"conflation\".\n", "#### Entity Resolution (ER) refers to the task of finding records in a dataset that refer to the same entity across different data sources (e.g., data files, books, websites, databases). ER is necessary when joining datasets based on entities that may or may not share a common identifier (e.g., database key, URI, National identification number), as may be the case due to differences in record shape, storage location, and/or curator style or preference. A dataset that has undergone ER may be referred to as being cross-linked.\n", "[wiki]: https://en.wikipedia.org/wiki/Record_linkage" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Code\n", "#### This assignment can be completed using basic Python, pySpark transformations and actions, and the plotting library matplotlib. 
Other libraries are not allowed.\n", "### Files\n", "#### Data files for this assignment are from the [metric-learning](https://code.google.com/p/metric-learning/) project and can be found at:\n", "`cs100/lab3`\n", "#### The directory contains the following files:\n", "* **Google.csv**, the Google Products dataset\n", "* **Amazon.csv**, the Amazon dataset\n", "* **Google_small.csv**, 200 records sampled from the Google data\n", "* **Amazon_small.csv**, 200 records sampled from the Amazon data\n", "* **Amazon_Google_perfectMapping.csv**, the \"gold standard\" mapping\n", "* **stopwords.txt**, a list of common English words\n", "#### Besides the complete data files, there are \"sample\" data files for each dataset - we will use these for **Part 1**. In addition, there is a \"gold standard\" file that contains all of the true mappings between entities in the two datasets. Every row in the gold standard file has a pair of record IDs (one Google, one Amazon) that belong to two records that describe the same thing in the real world. We will use the gold standard to evaluate our algorithms." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **Part 0: Preliminaries**\n", "#### We read in each of the files and create an RDD consisting of lines.\n", "#### For each of the data files (\"Google.csv\", \"Amazon.csv\", and the samples), we want to parse the IDs out of each record. The IDs are the first column of the file (they are URLs for Google, and alphanumeric strings for Amazon). 
Omitting the headers, we load these data files into pair RDDs where the *mapping ID* is the key, and the value is a string consisting of the name/title, description, and manufacturer from the record.\n", "#### The file format of an Amazon line is:\n", " `\"id\",\"title\",\"description\",\"manufacturer\",\"price\"`\n", "#### The file format of a Google line is:\n", " `\"id\",\"name\",\"description\",\"manufacturer\",\"price\"`" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import re\n", "DATAFILE_PATTERN = '^(.+),\"(.+)\",(.*),(.*),(.*)'\n", "\n", "def removeQuotes(s):\n", " \"\"\" Remove quotation marks from an input string\n", " Args:\n", " s (str): input string that might have the quote \"\" characters\n", " Returns:\n", " str: a string without the quote characters\n", " \"\"\"\n", " return ''.join(i for i in s if i!='\"')\n", "\n", "\n", "def parseDatafileLine(datafileLine):\n", " \"\"\" Parse a line of the data file using the specified regular expression pattern\n", " Args:\n", " datafileLine (str): input string that is a line from the data file\n", " Returns:\n", " str: a string parsed using the given regular expression and without the quote characters\n", " \"\"\"\n", " match = re.search(DATAFILE_PATTERN, datafileLine)\n", " if match is None:\n", " print 'Invalid datafile line: %s' % datafileLine\n", " return (datafileLine, -1)\n", " elif match.group(1) == '\"id\"':\n", " print 'Header datafile line: %s' % datafileLine\n", " return (datafileLine, 0)\n", " else:\n", " product = '%s %s %s' % (match.group(2), match.group(3), match.group(4))\n", " return ((removeQuotes(match.group(1)), product), 1)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Google_small.csv - Read 201 lines, successfully parsed 200 lines, failed to parse 0 lines\n", "Google.csv - Read 3227 lines, successfully parsed 
3226 lines, failed to parse 0 lines\n", "Amazon_small.csv - Read 201 lines, successfully parsed 200 lines, failed to parse 0 lines\n", "Amazon.csv - Read 1364 lines, successfully parsed 1363 lines, failed to parse 0 lines\n" ] } ], "source": [ "import sys\n", "import os\n", "from test_helper import Test\n", "\n", "baseDir = os.path.join('data')\n", "inputPath = os.path.join('cs100', 'lab3')\n", "\n", "GOOGLE_PATH = 'Google.csv'\n", "GOOGLE_SMALL_PATH = 'Google_small.csv'\n", "AMAZON_PATH = 'Amazon.csv'\n", "AMAZON_SMALL_PATH = 'Amazon_small.csv'\n", "GOLD_STANDARD_PATH = 'Amazon_Google_perfectMapping.csv'\n", "STOPWORDS_PATH = 'stopwords.txt'\n", "\n", "def parseData(filename):\n", " \"\"\" Parse a data file\n", " Args:\n", " filename (str): input file name of the data file\n", " Returns:\n", " RDD: a RDD of parsed lines\n", " \"\"\"\n", " return (sc\n", " .textFile(filename, 4, 0)\n", " .map(parseDatafileLine)\n", " .cache())\n", "\n", "def loadData(path):\n", " \"\"\" Load a data file\n", " Args:\n", " path (str): input file name of the data file\n", " Returns:\n", " RDD: a RDD of parsed valid lines\n", " \"\"\"\n", " filename = os.path.join(baseDir, inputPath, path)\n", " raw = parseData(filename).cache()\n", " failed = (raw\n", " .filter(lambda s: s[1] == -1)\n", " .map(lambda s: s[0]))\n", " for line in failed.take(10):\n", " print '%s - Invalid datafile line: %s' % (path, line)\n", " valid = (raw\n", " .filter(lambda s: s[1] == 1)\n", " .map(lambda s: s[0])\n", " .cache())\n", " print '%s - Read %d lines, successfully parsed %d lines, failed to parse %d lines' % (path,\n", " raw.count(),\n", " valid.count(),\n", " failed.count())\n", " assert failed.count() == 0\n", " assert raw.count() == (valid.count() + 1)\n", " return valid\n", "\n", "googleSmall = loadData(GOOGLE_SMALL_PATH)\n", "google = loadData(GOOGLE_PATH)\n", "amazonSmall = loadData(AMAZON_SMALL_PATH)\n", "amazon = loadData(AMAZON_PATH)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 
"#### Let's examine the lines that were just loaded in the two subset (small) files - one from Google and one from Amazon" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "google: http://www.google.com/base/feeds/snippets/11448761432933644608: spanish vocabulary builder \"expand your vocabulary! contains fun lessons that both teach and entertain you'll quickly find yourself mastering new terms. includes games and more!\" \n", "\n", "google: http://www.google.com/base/feeds/snippets/8175198959985911471: topics presents: museums of world \"5 cd-rom set. step behind the velvet rope to examine some of the most treasured collections of antiquities art and inventions. includes the following the louvre - virtual visit 25 rooms in full screen interactive video detailed map of the louvre ...\" \n", "\n", "google: http://www.google.com/base/feeds/snippets/18445827127704822533: sierrahome hse hallmark card studio special edition win 98 me 2000 xp \"hallmark card studio special edition (win 98 me 2000 xp)\" \"sierrahome\"\n", "\n", "amazon: b000jz4hqo: clickart 950 000 - premier image pack (dvd-rom) \"broderbund\"\n", "\n", "amazon: b0006zf55o: ca international - arcserve lap/desktop oem 30pk \"oem arcserve backup v11.1 win 30u for laptops and desktops\" \"computer associates\"\n", "\n", "amazon: b00004tkvy: noah's ark activity center (jewel case ages 3-8) \"victory multimedia\"\n", "\n" ] } ], "source": [ "for line in googleSmall.take(3):\n", " print 'google: %s: %s\\n' % (line[0], line[1])\n", "\n", "for line in amazonSmall.take(3):\n", " print 'amazon: %s: %s\\n' % (line[0], line[1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **Part 1: ER as Text Similarity - Bags of Words**\n", "#### A simple approach to entity resolution is to treat all records as strings and compute their similarity with a string distance function. 
In this part, we will build some components for performing bag-of-words text-analysis, and then use them to compute record similarity.\n", "#### [Bag-of-words][bag-of-words] is a conceptually simple yet powerful approach to text analysis.\n", "#### The idea is to treat strings, a.k.a. **documents**, as *unordered collections* of words, or **tokens**, i.e., as bags of words.\n", "> #### **Note on terminology**: a \"token\" is the result of parsing the document down to the elements we consider \"atomic\" for the task at hand. Tokens can be things like words, numbers, acronyms, or other exotica like word-roots or fixed-length character strings.\n", "> #### Bag-of-words techniques all apply to any sort of token, so when we say \"bag-of-words\" we really mean \"bag-of-tokens,\" strictly speaking.\n", "#### Tokens become the atomic unit of text comparison. If we want to compare two documents, we count how many tokens they share. If we want to search for documents with keyword queries (this is what Google does), then we turn the keywords into tokens and find documents that contain them. The power of this approach is that it makes string comparisons insensitive to small differences that probably do not affect meaning much, for example, punctuation and word order.\n", "[bag-of-words]: https://en.wikipedia.org/wiki/Bag-of-words_model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **(1a) Tokenize a String**\n", "#### Implement the function `simpleTokenize(string)` that takes a string and returns a list of non-empty tokens in the string. `simpleTokenize` should split strings using the provided regular expression. Since we want to make token-matching case insensitive, make sure all tokens are turned lower-case. 
Give an interpretation, in natural language, of what the regular expression, `split_regex`, matches.\n", "#### If you need help with Regular Expressions, try the site [regex101](https://regex101.com/) where you can interactively explore the results of applying different regular expressions to strings. *Note that \\W includes the \"_\" character*. You should use [re.split()](https://docs.python.org/2/library/re.html#re.split) to perform the string split. Also, make sure you remove any empty tokens." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['a', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']\n" ] } ], "source": [ "# TODO: Replace with appropriate code\n", "quickbrownfox = 'A quick brown fox jumps over the lazy dog.'\n", "split_regex = r'\\W+'\n", "\n", "def simpleTokenize(string):\n", " \"\"\" A simple implementation of input string tokenization\n", " Args:\n", " string (str): input string\n", " Returns:\n", " list: a list of tokens\n", " \"\"\"\n", " return [x for x in re.split(split_regex,string.lower()) if x!='']\n", "\n", "\n", "print simpleTokenize(quickbrownfox) # Should give ['a', 'quick', 'brown', ... 
]" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 test passed.\n", "1 test passed.\n", "1 test passed.\n", "1 test passed.\n" ] } ], "source": [ "# TEST Tokenize a String (1a)\n", "Test.assertEquals(simpleTokenize(quickbrownfox),\n", " ['a','quick','brown','fox','jumps','over','the','lazy','dog'],\n", " 'simpleTokenize should handle sample text')\n", "Test.assertEquals(simpleTokenize(' '), [], 'simpleTokenize should handle empty string')\n", "Test.assertEquals(simpleTokenize('!!!!123A/456_B/789C.123A'), ['123a','456_b','789c','123a'],\n", " 'simpleTokenize should handle punctuation and lowercase result')\n", "Test.assertEquals(simpleTokenize('fox fox'), ['fox', 'fox'],\n", " 'simpleTokenize should not remove duplicates')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **(1b) Removing stopwords**\n", "#### *[Stopwords][stopwords]* are common (English) words that do not contribute much to the content or meaning of a document (e.g., \"the\", \"a\", \"is\", \"to\", etc.). 
Stopwords add noise to bag-of-words comparisons, so they are usually excluded.\n", "#### Using the included file \"stopwords.txt\", implement `tokenize`, an improved tokenizer that does not emit stopwords.\n", "[stopwords]: https://en.wikipedia.org/wiki/Stop_words" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "These are the stopwords: set([u'all', u'just', u'being', u'over', u'both', u'through', u'yourselves', u'its', u'before', u'with', u'had', u'should', u'to', u'only', u'under', u'ours', u'has', u'do', u'them', u'his', u'very', u'they', u'not', u'during', u'now', u'him', u'nor', u'did', u'these', u't', u'each', u'where', u'because', u'doing', u'theirs', u'some', u'are', u'our', u'ourselves', u'out', u'what', u'for', u'below', u'does', u'above', u'between', u'she', u'be', u'we', u'after', u'here', u'hers', u'by', u'on', u'about', u'of', u'against', u's', u'or', u'own', u'into', u'yourself', u'down', u'your', u'from', u'her', u'whom', u'there', u'been', u'few', u'too', u'themselves', u'was', u'until', u'more', u'himself', u'that', u'but', u'off', u'herself', u'than', u'those', u'he', u'me', u'myself', u'this', u'up', u'will', u'while', u'can', u'were', u'my', u'and', u'then', u'is', u'in', u'am', u'it', u'an', u'as', u'itself', u'at', u'have', u'further', u'their', u'if', u'again', u'no', u'when', u'same', u'any', u'how', u'other', u'which', u'you', u'who', u'most', u'such', u'why', u'a', u'don', u'i', u'having', u'so', u'the', u'yours', u'once'])\n", "['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']\n" ] } ], "source": [ "# TODO: Replace with appropriate code\n", "stopfile = os.path.join(baseDir, inputPath, STOPWORDS_PATH)\n", "stopwords = set(sc.textFile(stopfile).collect())\n", "print 'These are the stopwords: %s' % stopwords\n", "\n", "def tokenize(string):\n", " \"\"\" An implementation of input string tokenization that excludes stopwords\n", " 
Args:\n", " string (str): input string\n", " Returns:\n", " list: a list of tokens without stopwords\n", " \"\"\"\n", " return [x for x in re.split(split_regex,string.lower()) if x!='' and x not in stopwords]\n", "\n", "print tokenize(quickbrownfox) # Should give ['quick', 'brown', ... ]" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 test passed.\n", "1 test passed.\n", "1 test passed.\n" ] } ], "source": [ "# TEST Removing stopwords (1b)\n", "Test.assertEquals(tokenize(\"Why a the?\"), [], 'tokenize should remove all stopwords')\n", "Test.assertEquals(tokenize(\"Being at the_?\"), ['the_'], 'tokenize should handle non-stopwords')\n", "Test.assertEquals(tokenize(quickbrownfox), ['quick','brown','fox','jumps','lazy','dog'],\n", " 'tokenize should handle sample text')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **(1c) Tokenizing the small datasets**\n", "#### Now let's tokenize the two *small* datasets. For each ID in a dataset, `tokenize` the values, and then count the total number of tokens.\n", "#### How many tokens, total, are there in the two datasets?" 
] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 22520 tokens in the combined datasets\n" ] } ], "source": [ "# TODO: Replace with appropriate code\n", "amazonRecToToken = amazonSmall.map(lambda x:(x[0],tokenize(x[1])))\n", "googleRecToToken = googleSmall.map(lambda x:(x[0],tokenize(x[1])))\n", "\n", "def countTokens(vendorRDD):\n", " \"\"\" Count and return the number of tokens\n", " Args:\n", " vendorRDD (RDD of (recordId, tokenizedValue)): Pair tuple of record ID to tokenized output\n", " Returns:\n", " count: count of all tokens\n", " \"\"\"\n", " id_count_rdd = vendorRDD.map(lambda x:len(x[1])).reduce(lambda x,y : x+y)\n", " return id_count_rdd\n", " \n", "\n", "totalTokens = countTokens(amazonRecToToken) + countTokens(googleRecToToken)\n", "print 'There are %s tokens in the combined datasets' % totalTokens" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 test passed.\n" ] } ], "source": [ "# TEST Tokenizing the small datasets (1c)\n", "Test.assertEquals(totalTokens, 22520, 'incorrect totalTokens')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **(1d) Amazon record with the most tokens**\n", "#### Which Amazon record has the biggest number of tokens?\n", "#### In other words, you want to sort the records and get the one with the largest count of tokens." 
] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The Amazon record with ID \"b000o24l3q\" has the most tokens (1547)\n" ] } ], "source": [ "# TODO: Replace with appropriate code\n", "def findBiggestRecord(vendorRDD):\n", " \"\"\" Find and return the record with the largest number of tokens\n", " Args:\n", " vendorRDD (RDD of (recordId, tokens)): input Pair Tuple of record ID and tokens\n", " Returns:\n", " list: a list of 1 Pair Tuple of record ID and tokens\n", " \"\"\"\n", " record_id = vendorRDD.map(lambda x:(x[0],len(x[1]))).takeOrdered(1,lambda s:-1*s[1])[0][0]\n", " \n", " return vendorRDD.filter(lambda x:(x[0] == record_id)).take(1)\n", "#vendorRDD.map(lambda x:(x[0],len(x[1]))).takeOrdered(1,lambda s:-1*s[1])\n", "\n", "\n", "biggestRecordAmazon = findBiggestRecord(amazonRecToToken)\n", "print 'The Amazon record with ID \"%s\" has the most tokens (%s)' % (biggestRecordAmazon[0][0],\n", " len(biggestRecordAmazon[0][1]))" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 test passed.\n", "1 test passed.\n" ] } ], "source": [ "# TEST Amazon record with the most tokens (1d)\n", "Test.assertEquals(biggestRecordAmazon[0][0], 'b000o24l3q', 'incorrect biggestRecordAmazon')\n", "Test.assertEquals(len(biggestRecordAmazon[0][1]), 1547, 'incorrect len for biggestRecordAmazon')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **Part 2: ER as Text Similarity - Weighted Bag-of-Words using TF-IDF**\n", "#### Bag-of-words comparisons are not very good when all tokens are treated the same: some tokens are more important than others. Weights give us a way to specify which tokens to favor. With weights, when we compare documents, instead of counting common tokens, we sum up the weights of common tokens. 
A good heuristic for assigning weights is called \"Term-Frequency/Inverse-Document-Frequency,\" or [TF-IDF][tfidf] for short.\n", "#### **TF**\n", "#### TF rewards tokens that appear many times in the same document. It is computed as the frequency of a token in a document, that is, if document *d* contains 100 tokens and token *t* appears in *d* 5 times, then the TF weight of *t* in *d* is *5/100 = 1/20*. The intuition for TF is that if a word occurs often in a document, then it is more important to the meaning of the document.\n", "#### **IDF**\n", "#### IDF rewards tokens that are rare overall in a dataset. The intuition is that it is more significant if two documents share a rare word than a common one. IDF weight for a token, *t*, in a set of documents, *U*, is computed as follows:\n", "* #### Let *N* be the total number of documents in *U*\n", "* #### Find *n(t)*, the number of documents in *U* that contain *t*\n", "* #### Then *IDF(t) = N/n(t)*.\n", "#### Note that *n(t)/N* is the frequency of *t* in *U*, and *N/n(t)* is the inverse frequency.\n", "> #### **Note on terminology**: Sometimes token weights depend on the document the token belongs to, that is, the same token may have a different weight when it's found in different documents. We call these weights *local* weights. TF is an example of a local weight, because it depends on the length of the source. On the other hand, some token weights only depend on the token, and are the same everywhere that token is found. 
We call these weights *global*, and IDF is one such weight.\n", "#### **TF-IDF**\n", "#### Finally, to bring it all together, the total TF-IDF weight for a token in a document is the product of its TF and IDF weights.\n", "[tfidf]: https://en.wikipedia.org/wiki/Tf%E2%80%93idf" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'brown': 0.16666666666666666, 'lazy': 0.16666666666666666, 'jumps': 0.16666666666666666, 'fox': 0.16666666666666666, 'dog': 0.16666666666666666, 'quick': 0.16666666666666666}\n" ] } ], "source": [ "# TODO: Replace with appropriate code\n", "def tf(tokens):\n", " \"\"\" Compute TF\n", " Args:\n", " tokens (list of str): input list of tokens from tokenize\n", " Returns:\n", " dictionary: a dictionary of tokens to its TF values\n", " \"\"\"\n", " tf_dict = {}\n", " count = 0.0\n", " for t in tokens:\n", " count+=1.0\n", " if t in tf_dict:\n", " tf_dict[t]+=1.0\n", " else:\n", " tf_dict[t]=1.0\n", " return {x:y/count for x,y in tf_dict.iteritems()}\n", "\n", "print tf(tokenize(quickbrownfox)) # Should give { 'quick': 0.1666 ... 
}" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 test passed.\n", "1 test passed.\n" ] } ], "source": [ "# TEST Implement a TF function (2a)\n", "tf_test = tf(tokenize(quickbrownfox))\n", "Test.assertEquals(tf_test, {'brown': 0.16666666666666666, 'lazy': 0.16666666666666666,\n", " 'jumps': 0.16666666666666666, 'fox': 0.16666666666666666,\n", " 'dog': 0.16666666666666666, 'quick': 0.16666666666666666},\n", " 'incorrect result for tf on sample text')\n", "tf_test2 = tf(tokenize('one_ one_ two!'))\n", "Test.assertEquals(tf_test2, {'one_': 0.6666666666666666, 'two': 0.3333333333333333},\n", " 'incorrect result for tf test')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **(2b) Create a corpus**\n", "#### Create a pair RDD called `corpusRDD`, consisting of a combination of the two small datasets, `amazonRecToToken` and `googleRecToToken`. Each element of the `corpusRDD` should be a pair consisting of a key from one of the small datasets (ID or URL) and the value is the associated value for that key from the small datasets." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# TODO: Replace with appropriate code\n", "corpusRDD = amazonRecToToken.union(googleRecToToken)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 test passed.\n" ] } ], "source": [ "# TEST Create a corpus (2b)\n", "Test.assertEquals(corpusRDD.count(), 400, 'incorrect corpusRDD.count()')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **(2c) Implement an IDFs function**\n", "#### Implement `idfs` that assigns an IDF weight to every unique token in an RDD called `corpus`. 
The function should return a pair RDD where the `key` is the unique token and the `value` is the IDF weight for the token.\n", "#### Recall that the IDF weight for a token, *t*, in a set of documents, *U*, is computed as follows:\n", "* #### Let *N* be the total number of documents in *U*.\n", "* #### Find *n(t)*, the number of documents in *U* that contain *t*.\n", "* #### Then *IDF(t) = N/n(t)*.\n", "#### The steps your function should perform are:\n", "* #### Calculate *N*. Think about how you can calculate *N* from the input RDD.\n", "* #### Create an RDD (*not a pair RDD*) containing the unique tokens from each document in the input `corpus`. For each document, you should only include a token once, *even if it appears multiple times in that document.*\n", "* #### For each of the unique tokens, count how many documents it appears in (this is *n(t)*) and then compute the IDF for that token: *N/n(t)*\n", "#### Use your `idfs` to compute the IDF weights for all tokens in `corpusRDD` (the combined small datasets).\n", "#### How many unique tokens are there?" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 4772 unique tokens in the small datasets.\n" ] } ], "source": [ "# TODO: Replace with appropriate code\n", "def idfs(corpus):\n", " \"\"\" Compute IDF\n", " Args:\n", " corpus (RDD): input corpus\n", " Returns:\n", " RDD: a RDD of (token, IDF value)\n", " \"\"\"\n", " N = corpus.count()\n", "\n", " uniqueTokens = corpus.map(lambda x : set(x[1]))\n", "\n", " tokenCountPairTuple = uniqueTokens.flatMap(lambda s : [(x,1) for x in s])\n", "\n", " tokenSumPairTuple = tokenCountPairTuple.reduceByKey(lambda x,y:x+y)\n", "\n", " return tokenSumPairTuple.map(lambda x:(x[0],N/float(x[1])))\n", "\n", "idfsSmall = idfs(amazonRecToToken.union(googleRecToToken))\n", "uniqueTokenCount = idfsSmall.count()\n", "\n", "print 'There are %s unique tokens in the small datasets.' 
% uniqueTokenCount" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 test passed.\n", "1 test passed.\n", "1 test passed.\n" ] } ], "source": [ "# TEST Implement an IDFs function (2c)\n", "Test.assertEquals(uniqueTokenCount, 4772, 'incorrect uniqueTokenCount')\n", "tokenSmallestIdf = idfsSmall.takeOrdered(1, lambda s: s[1])[0]\n", "\n", "Test.assertEquals(tokenSmallestIdf[0], 'software', 'incorrect smallest IDF token')\n", "Test.assertTrue(abs(tokenSmallestIdf[1] - 4.25531914894) < 0.0000000001,\n", " 'incorrect smallest IDF value')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **(2d) Tokens with the smallest IDF**\n", "#### Print out the 11 tokens with the smallest IDF in the combined small dataset." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[('software', 4.25531914893617), ('new', 6.896551724137931), ('features', 6.896551724137931), ('use', 7.017543859649122), ('complete', 7.2727272727272725), ('easy', 7.6923076923076925), ('create', 8.333333333333334), ('system', 8.333333333333334), ('cd', 8.333333333333334), ('1', 8.51063829787234), ('windows', 8.51063829787234)]\n" ] } ], "source": [ "smallIDFTokens = idfsSmall.takeOrdered(11, lambda s: s[1])\n", "print smallIDFTokens" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **(2e) IDF Histogram**\n", "#### Plot a histogram of IDF values. 
Be sure to use appropriate scaling and bucketing for the data.\n", "#### First plot the histogram using `matplotlib`" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAArIAAAEkCAYAAADTrtJDAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAAPYQAAD2EBqD+naQAAIABJREFUeJzt3X9wVNX9//HXbsQQJCElKD9iE0OT0BSdRlIzOCYzBHUU\nZaJWh85iNLS22NaMhjqttvwYvjSV4g9YOv4AJWI0sK3OOCqG5oMG3IoULKDV2khWDFJRMhrMbmLC\nryTfP0IWQgLsLuzuPdnnY2ZHvHvv3rNvDvrK4dxzbJs2beoWAAAAYBh7tBsAAAAAhIIgCwAAACMR\nZAEAAGAkgiwAAACMRJAFAACAkQiyAAAAMBJBFgAAAEYiyAIAAMBIBFkAAAAYiSALAAAAI1kqyH70\n0UeaOnWqqquro90UAAAAWJxlgmxXV5eeeOIJ/eAHP4h2UwAAAGCA86LdgF7r1q3TpZdeqtbW1mg3\nBQAAAAawxIis1+vVyy+/rNLS0mg3BQAAAIawRJB95plnNGPGDF1wwQXRbgoAAAAMEfUgu2vXLn3y\nySe64YYbJEnd3d1RbhEAAABMEPQc2Y6ODlVVVWn37t3yeDzy+XwqLS0dcFpAR0eHKisr5Xa75fP5\nlJaWJofDoalTp/rP+fDDD/XZZ5/plltu8V9jt9v1+eef68EHHzyLrwYAAIDBLOgg6/V6VVNTo8zM\nTBUUFGj9+vWnPHfBggXatWuXZs+erYsvvlhvvvmmKioq1N3drauvvlqSdOONN2rKlCmSekZjn3zy\nSY0dO1YOhyO0bwQAAICYEHSQHTNmjNatWyepJ9SeKshu3bpVO3bs0Lx58/wjsLm5uWpqatKKFStU\nVFQku92uhIQEJSQk+K8bOnSohg0bpsTExFC+DwAAAGJE2Jbf2rx5s4YNG+Yfbe01bdo0VVRUqL6+\nXhMnTux33QMPPHDGz25ublZzc/O5aioAAADOsZSUFKWkpIT1HmELso2NjUpLS5Pd3vd5soyMDEnS\nnj17BgyyZ9Lc3Kx7771XX3zxxTlpJwAAAM69lJQUrVy5MqxhNmxB1ufzKTU1td/xpKQk//uhaG5u\n1hdffKHq6mrl5OScVRtjTXl5uZxOZ7SbYRRqFhrqFjxqFhrqFjxqFhrqFpz6+nqVlJSoubnZzCAb\nbjk5OZo0aVK0m2GU5ORkahYkahYa6hY8ahYa6hY8ahYa6mZNYVtHNikpSV6vt9/x3pHY3pFZAAAA\nIBRhG5EdP368Nm7cqK6urj7zZBsbGyUdnysbqvLyciUnJ8vhcLBUFwAAgAW4XC65XC61tLRE5H5h\nG5EtLCxUR0eH3G53n+O1tbUaNWrUWc9vdTqdeu211wixAAAAFuFwOPTaa69FbD5xSCOy27Zt08GD\nB9Xe3i6pZwWC3sA6efJkxcfHKz8/X3l5eXI6nWpvb9e4ceNUV1en7du3a+7cubLZbOfuWyAghP7g\nUbPQULfgUbPQULfgUbPQUDdrsm3atKk72IscDoeampp6PsBmU3d3t//Xa9eu1ejRoyUd36L2rbfe\nks/nU3p6umbOnKmioqKQG9zQ0KC7775bO3bsYNI1AACABe3cuVN5eXlauXKlsrOzw3afkEZkXS5X\nQOclJCSorKxMZWVlodzmtJgjCwAAYC2RniNr7PJbTqeTEVkAAAAL6R1g7B2RDbewPewFAAAAhBNB\nFgAAAEYiyAIAAMBIxs6R5WEvAAAAa+FhrwDxsBcA
AIC18LAXAAAAEACCLAAAAIxEkAUAAICRjJ0j\ny8NeAAAA1sLDXgHiYS8AAABr4WEvAAAAIAAEWQAAABiJIAsAAAAjEWQBAABgJIIsAAAAjGTsqgUs\nvwUAAGAtLL8VIJbfAgAAsBaW3wIAAAACQJAFAACAkQiyAAAAMBJBFgAAAEYiyAIAAMBIxq5awPJb\nAAAA1sLyWwFi+S0AAABrYfktAAAAIAAEWQAAABiJIAsAAAAjEWQBAABgJIIsAAAAjESQBQAAgJEI\nsgAAADASQRYAAABGMnZDBHb2AgAAsBZ29goQO3sBAABYCzt7AQAAAAEgyAIAAMBIBFkAAAAYiSAL\nAAAAIxn7sBcAAAAix+PxqLW1NaBz6+vrw9yaHgRZAAAAnJbH41F2dna0m9EPQRYAAACndXwktlpS\nTgBXrJc0P3wNOoYgCwAAgADlSApkHf/ITC3gYS8AAAAYiSALAAAAIxFkAQAAYCSCLAAAAIxkbJAt\nLy9XcXGxXC5XtJsCAAAASZJLUrGkxyJyN2NXLXA6nZo0KZCn5gAAABAZjmOvNZJKwn43Y0dkAQAA\nENsIsgAAADASQRYAAABGIsgCAADASARZAAAAGIkgCwAAACMRZAEAAGAkgiwAAACMRJAFAACAkQiy\nAAAAMBJBFgAAAEYiyAIAAMBI50W7AZK0aNEivf/++zp06JBSUlI0Y8YMTZ8+PdrNAgAAgIVZIsiW\nlpZq7ty5iouL08cff6z77rtPeXl5Gjt2bLSbBgAAAIuyRJBNT0/3/9put+uCCy7QsGHDotgiAAAA\nWJ0lgqwkVVRU6O2335YkzZ8/XyNGjIhyiwAAAGBllgmy8+bNU1dXl7Zs2aKHH35YWVlZGj16dLSb\nBQAAAIuy1KoFdrtdBQUFuvTSS7Vly5ZoNwcAAAAWFvSIbEdHh6qqqrR79255PB75fD6VlpaqtLR0\nwHMrKyvldrvl8/mUlpYmh8OhqVOnnvYenZ2dSkhICLZpAAAAiCFBj8h6vV7V1NTo6NGjKigoOO25\nCxYs0IYNG1RaWqolS5ZowoQJqqioUF1dnf+cAwcOyO12q6OjQ52dndq0aZPq6+uVl5cX/LcBAABA\nzAh6RHbMmDFat26dpJ5Qu379+gHP27p1q3bs2KF58+b5R2Bzc3PV1NSkFStWqKioSHZ7T45++eWX\n9cgjj8hutysjI0N/+tOfdOGFF4b6nQAAABADwvaw1+bNmzVs2DBNmTKlz/Fp06apoqJC9fX1mjhx\nokaOHKnly5eHqxkAAAAYpMIWZBsbG5WWluYfde2VkZEhSdqzZ48mTpwY8ueXl5crOTm5zzGHwyGH\nwxHyZwIAACBYrmOvE30ekTuHLcj6fD6lpqb2O56UlOR//2w4nU5NmjTprD4DAAAAZ8tx7HWiNZJK\nwn5nSy2/BQAAAAQqbCOySUlJ8nq9/Y73jsT2jswCwGDn8XjU2toa1DWJiYnKysoKU4sAYHAIW5Ad\nP368Nm7cqK6urj7zZBsbGyUdnysbqt45ssyLBWBlHo9H2dnZIV3b0NBAmAVgmN75sobPkS0sLFRN\nTY3cbreKior8x2trazVq1Cjl5OSc1eczRxaACY6PxFZLCvS/e/WSSoIexQWA6OudLxuZObIhBdlt\n27bp4MGDam9vl9SzAoHb7ZYkTZ48WfHx8crPz1deXp6cTqfa29s1btw41dXVafv27Zo7d65sNtu5\n+xYAYHk5kvjhGwDOpZCCrNPpVFNTkyTJZrPJ7XbL7XbLZrNp7dq1Gj16tCRp0aJFqqys1OrVq+Xz\n+ZSenq758+f3GaEFAAAAQhFSkHW5Tl4rbGAJCQkqKytTWVlZKLc5LebIAgAAWM0gmSMbbsyRBQAA\nsJrIzpFlHVkAAAAYiSALAAAAIxFkAQAAYCRjg2x5ebmKi4sDfvAMAAAA4eaSVCzpsYjcjYe9AAAA\ncI7wsBcAAABw
RgRZAAAAGIkgCwAAACMZG2R52AsAAMBqeNgrIDzsBQAAYDU87AUAAACcEUEWAAAA\nRiLIAgAAwEgEWQAAABiJIAsAAAAjGRtkWX4LAADAalh+KyAsvwUAAGA1LL8FAAAAnBFBFgAAAEYi\nyAIAAMBIBFkAAAAYydiHvQCcmcfjUWtra1DXJCYmKisrK0wtAgDg3DE2yJaXlys5OVkOh0MOhyPa\nzQEsx+PxKDs7O6RrGxoaCLMAgBC4jr0+j8jdjA2yLL8FnN7xkdhqSTkBXlUvqSToUVwAAHpEdvkt\nY4MsgEDlSOKHPgDA4MPDXgAAADASQRYAAABGIsgCAADASARZAAAAGIkgCwAAACMRZAEAAGAkgiwA\nAACMZGyQLS8vV3FxsVwuV7SbAgAAAEk9u3oVS3osInczdkMEdvYCAACwmsju7GXsiCwAAABiG0EW\nAAAARiLIAgAAwEjGzpEFws3j8ai1tTWoaxITE5WVlRWmFgEAgBMRZIEBeDweZWdnh3RtQ0MDYRYA\ngAggyAIDOD4SWy0pJ8Cr6iWVBD2KCwAAQkOQBU4rRxLLvAEAYEU87AUAAAAjEWQBAABgJIIsAAAA\njESQBQAAgJEIsgAAADCSsUG2vLxcxcXFcrlc0W4KAAAAJEkuScWSHovI3YxdfsvpdGrSJJZFAgAA\nsA7HsdcaSSVhv5uxI7IAAACIbQRZAAAAGIkgCwAAACMRZAEAAGAkgiwAAACMRJAFAACAkQiyAAAA\nMJKx68hicPF4PGptbQ3qmsTERGVlZYWpRQAAwOoIsog6j8ej7OzskK5taGggzAIAEKMIsoi64yOx\n1ZJyAryqXlJJ0KO4AABg8CDIwkJyJLHtMAAACAwPewEAAMBIUR+RPXLkiJYuXaqdO3fq22+/VXp6\nun79619r4sSJ0W4aAAAALCzqI7KdnZ0aO3asHn/8cb3++usqLi7W3LlzdejQoWg3DQAAABYW9SA7\ndOhQ3XnnnbrwwgslSdddd526u7u1b9++KLcMAAAAVhb1IHuyvXv36tChQxo3bly0mwIAAAALs1SQ\nPXjwoB566CHdcccdGjp0aLSbAwAAAAuzTJA9evSoFi5cqIyMDN1+++3Rbg4AAAAsLuhVCzo6OlRV\nVaXdu3fL4/HI5/OptLRUpaWlA55bWVkpt9stn8+ntLQ0ORwOTZ06tc95XV1deuihhzRkyBD99re/\nDf3bAAAAIGYEPSLr9XpVU1Ojo0ePqqCg4LTnLliwQBs2bFBpaamWLFmiCRMmqKKiQnV1dX3OW7p0\nqb755hvNnz9fdrtlBokBAABgYUGPyI4ZM0br1q2T1BNq169fP+B5W7du1Y4dOzRv3jz/CGxubq6a\nmpq0YsUKFRUVyW63a//+/Vq/fr3i4+N18803+69fsmSJLrvsslC+EwAAAGJA2DZE2Lx5s4YNG6Yp\nU6b0OT5t2jRVVFSovr5eEydO1JgxY7Rx48ZwNSMmeDwetba2BnVNYmKisrKywnaPYD8fAAAgWGEL\nso2NjUpLS+s3VSAjI0OStGfPnrPavau8vFzJycl9jjkcDjkcjpA/00Qej0fZ2dkhXdvQ0BBQ2Az1\nHoF+PgAAMJnr2OtEn0fkzmELsj6fT6mpqf2OJyUl+d8/G06nU5MmTTqrzxgMjo+SVkvKCfCqekkl\nAY+wBn+P4D4fAACYzHHsdaI1kkrCfuewBVlEWo6kcAf7SNwDAAAgMGFbIiApKUler7ff8d6R2N6R\nWQAAACAUYQuy48eP1969e9XV1dXneGNjo6Tjc2VDVV5eruLiYrlcJ8/JAAAAQHS4JBVLeiwidwtb\nkC0sLFRHR4fcbnef47W1tRo1apRycgKdzzkwp9Op1157LeYe7gIAALAuh6TXJN0fkbuFNEd227Zt\nOnjwoNrb2yX1rEDQG1gnT56s+Ph45efnKy8vT06nU+3t7Ro3bpzq6uq0fft2zZ
07Vzab7dx9CwAA\nAMSckIKs0+lUU1OTJMlms8ntdsvtdstms2nt2rUaPXq0JGnRokWqrKzU6tWr5fP5lJ6ervnz56uo\nqOjcfQMAAADEpJCCbKDzUhMSElRWVqaysrJQbnNavevIxuLasQAAANbUu6as4evIhls015EdLLtc\n1dfXn9PzAABArOtdU5Z1ZC1pcOxytVeSVFIS/g4GAAAQLgTZIIW6y9W7774b8Chu+Edwvz32z0C/\nw3pJ88PXHAAAgBAQZEMW6C5XoY1+RmYEN9DvwNQCAABgPcYG2XP1sFew812Dny8a7OhnzwhuMG0C\nAACwBh72Csi5eNgr1PmuoQl09BMAAMBUPOwVMcHPd5WYLwoAAGANMR1kjwtmtDQy80VZGgsAAOD0\nCLKWw9JYAAAAgTA2yA7enb1YGgsAAJiKh70CEs2dvSKDpbEAAIBpIvuwlz3sdwAAAADCgCALAAAA\nIxFkAQAAYCSCLAAAAIxEkAUAAICRjA2y5eXlKi4ulsvlinZTAAAAIKln6a1iSY9F5G4svwUAAIBz\nhOW3AAAAgDMiyAIAAMBIBFkAAAAYiSALAAAAIxFkAQAAYCRjgyzLbwEAAFgNy28FhOW3AAAArIbl\ntwAAAIAzIsgCAADASARZAAAAGIkgCwAAACMRZAEAAGAkgiwAAACMRJAFAACAkYxdR3YgDQ0NeuON\nNwI+/3//+18YWwMAAIBwMjbIlpeXKzk5WQ6HQw6HQ5L0xz/+UdXVa2S3nx/QZ3R1HQlnEwEAAGKM\n69jr84jczdggO9DOXp2dnbLbi9TVVRfgpyyU9P/OddMAAABiFDt7AQAAAGdEkAUAAICRCLIAAAAw\nEkEWAAAARiLIAgAAwEgEWQAAABiJIAsAAAAjEWQBAABgJIIsAAAAjESQBQAAgJEIsgAAADASQRYA\nAABGMjbIlpeXq7i4WC6XK9pNAQAAgCTJJalY0mMRudt5EblLGDidTk2aNCnazQAAAICf49hrjaSS\nsN/N2BFZAAAAxDaCLAAAAIxEkAUAAICRCLIAAAAwEkEWAAAARiLIAgAAwEgEWQAAABiJIAsAAAAj\nEWQBAABgJIIsAAAAjESQBQAAgJEIsgAAADCSJYLsq6++qtmzZ+vaa69VVVVVtJsDAAAAA1giyKak\npOinP/2prrrqqmg3BQAAAIY4L9oNkKSCggJJ0jvvvBPllgAAAMAUlhiRBQAAAIJFkI0prmg3wEC1\n0W6AkVwu+lrwqFko6GvBo2ahoW7WRJCNKfwhDN7/RbsBRuI/+KGgZqGgrwWPmoWGullT0HNkOzo6\nVFVVpd27d8vj8cjn86m0tFSlpaUDnltZWSm32y2fz6e0tDQ5HA5NnTr1nDQeAAAAsSvoEVmv16ua\nmhodPXrU/5DWqSxYsEAbNmxQaWmplixZogkTJqiiokJ1dXV9zuvs7NThw4fV2dnp/3VXV1ewTQMA\nAEAMCXpEdsyYMVq3bp2knlC7fv36Ac/bunWrduzYoXnz5vlHYHNzc9XU1KQVK1aoqKhIdntPjn7h\nhRf0/PPP+6+trq7WAw88oOuuuy7oLwQAAIDYELbltzZv3qxhw4ZpypQpfY5PmzZNFRUVqq+v18SJ\nEyVJs2bN0qxZs4L6/Pr6+n7HDhw4oO7uVkk7A/yUL3s/LYg7NwZ5jZXOb1FPbcLdpp7zBvo9GvBs\n/3nB/D4Ed49gHf/cYPpTeNsUrGjWtaWlRTt3Blq3wS3w34feP5/Hz7VKX7Iy+lrwqFloYr1uwf8/\npfHMp5wDtk2bNnWHerHX69Utt9wy4BzZe+65R93d3XryySf7HG9sbNRdd92l+++/XzfeeGPQ92xu\nbtbdd9+t5ubmUJsNAACAMEtJSdHKlSuVkpIStnuEbUTW5/MpNTW13/GkpCT/+6HoLQpBFgAAwLpS\nUlLCGmIli+zsFaxIFAYAAADWFrZ1ZJOSku
T1evsd7x2J7R2ZBQAAAEIRtiA7fvx47d27t98yWo2N\nPZN/MzIywnVrAAAAxICwBdnCwkJ1dHTI7Xb3OV5bW6tRo0YpJycnXLcGAABADAhpjuy2bdt08OBB\ntbe3S5L27NnjD6yTJ09WfHy88vPzlZeXJ6fTqfb2do0bN051dXXavn275s6dK5vNFtQ92SXs9N5/\n/3395je/GfC9J554os8PDg0NDVq5cqXq6+sVFxenyy+/XL/61a80duzYSDU34oLZkS6Y+rz88st6\n5ZVXtH//fo0aNUrXXXedSkpKFBcXF4mvFXaB1u3Pf/6zNmzY0O/6tLQ0Pffcc/2OD+a67dixQxs2\nbNB///tfff311xo+fLgmTJigO++8U9nZ2X3Opa/1CLRm9LO+PvnkE61atUp79uxRS0uL4uPj9d3v\nflc33XSTrr322j7n0td6BFoz+tqZ1dTU6LHHHtPQoUP77SkQyf4WUpB1Op1qamqSJNlsNrndbrnd\nbtlsNq1du1ajR4+WJC1atEiVlZVavXq1fD6f0tPTNX/+fBUVFQV9zwULFmjXrl2aPXu2Lr74Yr35\n5puqqKhQd3e3rr766lC+xqD0i1/8Qrm5uX2OXXLJJf5f7927V3PmzFFWVpYWLlyoQ4cOafXq1br3\n3nu1atUqjRgxIsItjozeHekyMzNVUFBwyo08gqlPdXW1Vq9erZkzZ+pHP/qRPv74Y1VWVurrr7/W\n/fffH6mvFlaB1k2S4uPjtXTp0n7HTjbY67Zu3Tp5vV7ddtttuuSSS9TS0qKXXnpJ99xzjx5++GFd\nfvnlkuhrJwq0ZhL97ERtbW0aPXq0rrnmGo0aNUodHR168803tXjxYjU1NamkpEQSfe1EgdZMoq+d\nzldffaWnnnpKKSkp/kHNXpHubyEFWZfLFdB5CQkJKisrU1lZWSi38Qtml7BYl5qaetppG88++6zi\n4+O1ePFiJSQkSJKys7N1xx136G9/+5tmz54dqaZGVKA70gVaH6/XqxdeeEHTp0/XXXfdJUn64Q9/\nqKNHj+rZZ5/VbbfdpvT09Ah8s/AKtG6SZLfbzzhlKBbqdt999+k73/lOn2P5+fkqKSnRmjVr/KGM\nvnZcoDWT6Gcnys3N7TdwceWVV2r//v16/fXX/aGMvnZcoDWT6Guns2zZMl1++eUaPnx4vymkke5v\nRqS/0+0S1tzczO43J+juPvX+Fp2dndq6dasKCwv9nUuSRo8erdzcXG3evDkSTbSsYOrz7rvv6siR\nI7r++uv7fMa0adPU3d0dk7U8Xd/rFQt1OzmQST0/1Kenp+urr76SRF87WSA160U/O7OkpCT/X8vS\n1wJzYs160dcG9sYbb+iDDz7Qfffd169G0ehvRgTZxsZGpaWl9Rt17V35YM+ePVFolTUtX75c11xz\njaZPn67f/e53+vDDD/3v7du3T4cPH9b3vve9fteNHz9e+/bt05EjRyLZXEsJpj69fW78+PF9zhs5\ncqRGjBgRk33y0KFDuvXWW3X11VdrxowZ+stf/qLW1tY+58Rq3dra2tTQ0OCf5kNfO7OTa9aLftZf\nd3e3Ojs71dLSoldeeUX/+te/NGPGDEn0tVM5Xc160df6O3DggB5//HHNnj1bo0aN6vd+NPqbERsi\nhGuXsMFk+PDhuvXWW5Wbm6ukpCTt27dPf/3rXzVnzhwtXrxYV1xxhb9OiYmJ/a5PTExUd3e3Wltb\nNXLkyEg33xKCqY/X69WQIUMGnC+VmJgYc30yMzNTmZmZ/h8u//3vf+ull17Szp079dRTT/l/Mo/V\nui1fvlyHDx/2/7Ulfe3MTq6ZRD87lWXLlun111+X1PPX4b/85S910003SaKvncrpaibR105l+fLl\nuuSSS1RcXDzg+9Hob0YEWZxZ7x+6XpdddpkKCgp011136emnn9YVV1wRxdZhsLvtttv6/HteXp4y\nMzO1cO
FC1dTU9Hs/ljz77LOqq6vTvffeq6ysrGg3xwinqhn9bGAlJSWaPn26Wlpa9M477+ipp57S\nkSNH5HA4ot00yzpTzehr/bndbv3zn//UqlWrot2UPowIsuwSFprhw4dr8uTJWrdunQ4fPuyv08l/\nNdJ7zGazDfhTVKwIpj4jRozQkSNHdPjwYZ1//vl9zvX5fJowYUL4G2xxhYWFGjp0aJ857LFWt6qq\nKlVXV+vnP/+5br75Zv9x+tqpnapmp0I/ky666CJddNFFknoekpN6fhi44YYb6GuncKqaTZs2TcnJ\nyQNeE8t9raOjQ8uXL9ePf/xjjRw5Um1tbZLknybQ1tamuLi4qPQ3I+bIskvY2bPZbEpNTVV8fLx2\n797d7/1PP/1UqampGjJkSBRaZw3B1Kd3Ts/J5x44cEA+n48+qZ45aCc/CBBLdauqqlJVVZVmzZql\nmTNn9nmPvjaw09XsVGK9nw3k+9//vjo7O/XFF1/Q1wLUW7Mvv/zylOfEcl/zer1qaWnRiy++qOLi\nYv9r06ZNOnjwoIqLi7V48eKo9Dcjgiy7hIWmtbVVW7ZsUWZmpoYMGaK4uDhdeeWVevvtt9XR0eE/\nr6mpSe+//74KCwuj2NroC6Y++fn5Ov/881VbW9vnM2pra2Wz2VRQUBCxdluV2+3WoUOHNHHiRP+x\nWKnb888/r6qqKt1xxx268847+71PX+vvTDU7lVjuZ6fy3nvvyW63a9y4cfS1AJ1Ys1OJ5b42cuRI\nLV26VMuWLfO/li5dqiuuuELnn3++li1bpp/97GdR6W9xs2bNWnguvmQ4paam6j//+Y9qamqUlJSk\nb7/9VmvWrJHb7dacOXP6PfEWiyoqKuTxeNTW1qZvvvlG27dv1yOPPKLm5mY9+OCD/t00MjIy9Oqr\nr+q9997TyJEj9emnn+rRRx+V3W7X73//ew0dOjTK3yR8tm3bpt27d+vTTz/Vli1blJycLJvNps8+\n+0xjx47VeeedF3B9eienv/jiizp69Kjsdrvcbreee+45TZs2rd9yIiY7U92+/vpr/eEPf9Dhw4fl\n8/m0b98+rV+/Xs8884zS0tI0Z84cnXdezyymWKjbiy++qFWrVik/P1833nijvvrqqz6vCy+8UFLg\nfxapWU/N9u/fTz87yaOPPqoPPvhAra2t8nq92rVrl55//nlt3LhRP/nJT3TVVVdJoq+dKJCa0df6\ni4uL05gxY/q9duzYob179+r+++/3T8mIdH+zbdq06cwLpVlA7xa1b731ln+XsJkzZ4a0S9hg5HK5\ntGnTJn355Zfq6OhQUlKSLrvsMs2cObPfPJOGhgY9/fTT+uijjxQXF6dJkyYN+i1qJcnhcPTZka73\nr4hO3pEumPqcuLVeSkqKrr/++kG3LeGZ6nbBBRfo4Ycf1ieffKJvvvlGnZ2dGjNmjAoLC3X77bdr\n2LBh/T5zMNdtzpw5+uCDDwZcg9Jms6murs7/7/S1HoHUrK2tjX52ktraWv3973/X3r171dbWpoSE\nBGVmZupozQvJAAAAg0lEQVSGG27QNddc0+dc+lqPQGpGXwvckiVL9I9//EM1NTV9jkeyvxkTZAEA\nAIATGTFHFgAAADgZQRYAAABGIsgCAADASARZAAAAGIkgCwAAACMRZAEAAGAkgiwAAACMRJAFAACA\nkQiyAAAAMBJBFgAAAEYiyAIAAMBIBFkAAAAY6f8D5oCIiyECmJEAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "\n", "small_idf_values = idfsSmall.map(lambda s: s[1]).collect()\n", "fig = 
plt.figure(figsize=(8,3))\n", "plt.hist(small_idf_values, 50, log=True)\n", "pass" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **(2f) Implement a TF-IDF function**\n", "#### Use your `tf` function to implement a `tfidf(tokens, idfs)` function that takes a list of tokens from a document and a Python dictionary of IDF weights and returns a Python dictionary mapping individual tokens to total TF-IDF weights.\n", "#### The steps your function should perform are:\n", "* #### Calculate the token frequencies (TF) for `tokens`\n", "* #### Create a Python dictionary where each token maps to the token's frequency times the token's IDF weight\n", "#### Use your `tfidf` function to compute the weights of Amazon product record 'b000hkgj8k'. To do this, we need to extract the tokens for that record from the tokenized small Amazon dataset, and we need to convert the IDFs for the small dataset into a Python dictionary. We can do the first part by using a `filter()` transformation to extract the matching record and a `collect()` action to return the value to the driver. For the second part, we use the [`collectAsMap()` action](http://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD.collectAsMap) to return the IDFs to the driver as a Python dictionary."
] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Amazon record \"b000hkgj8k\" has tokens and weights:\n", "{'autocad': 33.33333333333333, 'autodesk': 8.333333333333332, 'courseware': 66.66666666666666, 'psg': 33.33333333333333, '2007': 3.5087719298245617, 'customizing': 16.666666666666664, 'interface': 3.0303030303030303}\n" ] } ], "source": [ "# TODO: Replace with appropriate code\n", "def tfidf(tokens, idfs):\n", "    \"\"\" Compute TF-IDF\n", "    Args:\n", "        tokens (list of str): input list of tokens from tokenize\n", "        idfs (dictionary): token to IDF value\n", "    Returns:\n", "        dictionary: a dictionary of tokens to TF-IDF values\n", "    \"\"\"\n", "    tfs = tf(tokens)\n", "    tfIdfDict = {}\n", "    # Weight each token's frequency by that token's IDF\n", "    for key, value in tfs.iteritems():\n", "        tfIdfDict[key] = value * idfs[key]\n", "    return tfIdfDict\n", "\n", "recb000hkgj8k = amazonRecToToken.filter(lambda x: x[0] == 'b000hkgj8k').collect()[0][1]\n", "idfsSmallWeights = idfsSmall.collectAsMap()\n", "rec_b000hkgj8k_weights = tfidf(recb000hkgj8k, idfsSmallWeights)\n", "\n", "print 'Amazon record \"b000hkgj8k\" has tokens and weights:\\n%s' % rec_b000hkgj8k_weights" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 test passed.\n" ] } ], "source": [ "# TEST Implement a TF-IDF function (2f)\n", "Test.assertEquals(rec_b000hkgj8k_weights,\n", "                  {'autocad': 33.33333333333333, 'autodesk': 8.333333333333332,\n", "                   'courseware': 66.66666666666666, 'psg': 33.33333333333333,\n", "                   '2007': 3.5087719298245617, 'customizing': 16.666666666666664,\n", "                   'interface': 3.0303030303030303}, 'incorrect rec_b000hkgj8k_weights')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **Part 3: ER as Text Similarity - Cosine Similarity**\n", "#### Now we are ready to do text comparisons in a formal way. 
The metric of string distance we will use is called **[cosine similarity][cosine]**. We will treat each document as a vector in some high dimensional space. Then, to compare two documents we compute the cosine of the angle between their two document vectors. This is *much* easier than it sounds.\n", "#### The first question to answer is how do we represent documents as vectors? The answer is familiar: bag-of-words! We treat each unique token as a dimension, and treat token weights as magnitudes in their respective token dimensions. For example, suppose we use simple counts as weights, and we want to interpret the string \"Hello, world! Goodbye, world!\" as a vector. Then in the \"hello\" and \"goodbye\" dimensions the vector has value 1, in the \"world\" dimension it has value 2, and it is zero in all other dimensions.\n", "#### The next question is: given two vectors how do we find the cosine of the angle between them? Recall the formula for the dot product of two vectors:\n", "#### $$ a \\cdot b = \\| a \\| \\| b \\| \\cos \\theta $$\n", "#### Here $ a \\cdot b = \\sum a_i b_i $ is the ordinary dot product of two vectors, and $ \\|a\\| = \\sqrt{ \\sum a_i^2 } $ is the norm of $ a $.\n", "#### We can rearrange terms and solve for the cosine to find it is simply the normalized dot product of the vectors. With our vector model, the dot product and norm computations are simple functions of the bag-of-words document representations, so we now have a formal way to compute similarity:\n", "#### $$ similarity = \\cos \\theta = \\frac{a \\cdot b}{\\|a\\| \\|b\\|} = \\frac{\\sum a_i b_i}{\\sqrt{\\sum a_i^2} \\sqrt{\\sum b_i^2}} $$\n", "#### Setting aside the algebra, the geometric interpretation is more intuitive. The angle between two document vectors is small if they share many tokens in common, because they are pointing in roughly the same direction. For that case, the cosine of the angle will be large. 
Otherwise, if the angle is large (and they have few words in common), the cosine is small. Therefore, cosine similarity tracks our intuitive sense of similarity.\n", "[cosine]: https://en.wikipedia.org/wiki/Cosine_similarity" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **(3a) Implement the components of a `cosineSimilarity` function**\n", "#### Implement the components of a `cosineSimilarity` function.\n", "#### Use the `tokenize` and `tfidf` functions, and the IDF weights from Part 2 for extracting tokens and assigning them weights.\n", "#### The steps you should perform are:\n", "* #### Define a function `dotprod` that takes two Python dictionaries and produces the dot product of them, where the dot product is defined as the sum of the products of values for tokens that appear in *both* dictionaries\n", "* #### Define a function `norm` that returns the square root of the dot product of a dictionary and itself\n", "* #### Define a function `cossim` that returns the dot product of two dictionaries divided by the norm of the first dictionary and then by the norm of the second dictionary" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "102 6.16441400297\n" ] } ], "source": [ "# TODO: Replace with appropriate code\n", "import math\n", "\n", "def dotprod(a, b):\n", "    \"\"\" Compute dot product\n", "    Args:\n", "        a (dictionary): first dictionary of token to value\n", "        b (dictionary): second dictionary of token to value\n", "    Returns:\n", "        dotProd: result of the dot product with the two input dictionaries\n", "    \"\"\"\n", "    # Only tokens present in both dictionaries contribute to the sum\n", "    total = 0\n", "    for key, value in a.iteritems():\n", "        if key in b:\n", "            total += value * b[key]\n", "    return total\n", "\n",
"def norm(a):\n", "    \"\"\" Compute square root of the dot product\n", "    Args:\n", "        a (dictionary): a dictionary of token to value\n", "    Returns:\n", "        norm: square root of the dot product of the dictionary with itself\n", "    \"\"\"\n", "    return math.sqrt(dotprod(a, a))\n", "\n", "def cossim(a, b):\n", "    \"\"\" Compute cosine similarity\n", "    Args:\n", "        a (dictionary): first dictionary of token to value\n", "        b (dictionary): second dictionary of token to value\n", "    Returns:\n", "        cossim: dot product of two dictionaries divided by the norm of the first dictionary and\n", "                then by the norm of the second dictionary\n", "    \"\"\"\n", "    return dotprod(a, b) / norm(a) / norm(b)\n", "\n", "testVec1 = {'foo': 2, 'bar': 3, 'baz': 5 }\n", "testVec2 = {'foo': 1, 'bar': 0, 'baz': 20 }\n", "dp = dotprod(testVec1, testVec2)\n", "nm = norm(testVec1)\n", "print dp, nm" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 test passed.\n", "1 test passed.\n" ] } ], "source": [ "# TEST Implement the components of a cosineSimilarity function (3a)\n", "Test.assertEquals(dp, 102, 'incorrect dp')\n", "Test.assertTrue(abs(nm - 6.16441400297) < 0.0000001, 'incorrect nm')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **(3b) Implement a `cosineSimilarity` function**\n", "#### Implement a `cosineSimilarity(string1, string2, idfsDictionary)` function that takes two strings and a dictionary of IDF weights, and computes their cosine similarity in the context of some global IDF weights.\n", "#### The steps you should perform are:\n", "* #### Apply your `tfidf` function to the tokenized first and second strings, using the dictionary of IDF weights\n", "* #### Compute and return your `cossim` function applied to the results of the two `tfidf` calls" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.0577243382163\n" ] } ], "source": [ "# TODO: Replace with appropriate code\n", "def 
cosineSimilarity(string1, string2, idfsDictionary):\n", "    \"\"\" Compute cosine similarity between two strings\n", "    Args:\n", "        string1 (str): first string\n", "        string2 (str): second string\n", "        idfsDictionary (dictionary): a dictionary of IDF values\n", "    Returns:\n", "        cossim: cosine similarity value\n", "    \"\"\"\n", "    w1 = tfidf(tokenize(string1), idfsDictionary)\n", "    w2 = tfidf(tokenize(string2), idfsDictionary)\n", "    return cossim(w1, w2)\n", "\n", "cossimAdobe = cosineSimilarity('Adobe Photoshop',\n", "                               'Adobe Illustrator',\n", "                               idfsSmallWeights)\n", "\n", "print cossimAdobe" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 test passed.\n" ] } ], "source": [ "# TEST Implement a cosineSimilarity function (3b)\n", "Test.assertTrue(abs(cossimAdobe - 0.0577243382163) < 0.0000001, 'incorrect cossimAdobe')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **(3c) Perform Entity Resolution**\n", "#### Now we can finally do some entity resolution!\n", "#### For *every* product record in the small Google dataset, use your `cosineSimilarity` function to compute its similarity to every record in the small Amazon dataset. Then, build a dictionary mapping `(Google URL, Amazon ID)` tuples to similarity scores between 0 and 1.\n", "#### We'll do this computation two different ways: first without a broadcast variable, and then with one.\n", "#### The steps you should perform are:\n", "* #### Create an RDD that is a combination of the small Google and small Amazon datasets that has as elements all pairs of elements (a, b) where a is in the small Google dataset and b is in the small Amazon dataset. The result will be an RDD of the form: `[ ((Google URL1, Google String1), (Amazon ID1, Amazon String1)), ((Google URL1, Google String1), (Amazon ID2, Amazon String2)), ((Google URL2, Google String2), (Amazon ID1, Amazon String1)), ... 
]`\n", "* #### Define a worker function that, given an element from the combination RDD, computes the `cosineSimilarity` for the two records in the element\n", "* #### Apply the worker function to every element in the RDD\n", "#### Now, compute the similarity between Amazon record `b000o24l3q` and Google record `http://www.google.com/base/feeds/snippets/17242822440574356561`." ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[(('http://www.google.com/base/feeds/snippets/11448761432933644608',\n", " 'spanish vocabulary builder \"expand your vocabulary! contains fun lessons that both teach and entertain you\\'ll quickly find yourself mastering new terms. includes games and more!\" '),\n", " ('b000jz4hqo',\n", " 'clickart 950 000 - premier image pack (dvd-rom) \"broderbund\"')),\n", " (('http://www.google.com/base/feeds/snippets/11448761432933644608',\n", " 'spanish vocabulary builder \"expand your vocabulary! contains fun lessons that both teach and entertain you\\'ll quickly find yourself mastering new terms. 
includes games and more!\" '),\n", " ('b0006zf55o',\n", " 'ca international - arcserve lap/desktop oem 30pk \"oem arcserve backup v11.1 win 30u for laptops and desktops\" \"computer associates\"'))]" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "crossSmall = (googleSmall\n", " .cartesian(amazonSmall)\n", " .cache())\n", "crossSmall.take(2)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requested similarity is 0.000303171940451.\n" ] } ], "source": [ "# TODO: Replace with appropriate code\n", "crossSmall = (googleSmall\n", " .cartesian(amazonSmall)\n", " .cache())\n", "#crossSmall.take(2)\n", "\n", "def computeSimilarity(record):\n", " \"\"\" Compute similarity on a combination record\n", " Args:\n", " record: a pair, (google record, amazon record)\n", " Returns:\n", " pair: a pair, (google URL, amazon ID, cosine similarity value)\n", " \"\"\"\n", " googleRec = record[0]\n", " amazonRec = record[1]\n", " googleURL = googleRec[0]\n", " amazonID = amazonRec[0]\n", " googleValue = googleRec[1]\n", " amazonValue = amazonRec[1]\n", " cs = cosineSimilarity(googleValue,amazonValue, idfsSmallWeights)\n", " return (googleURL, amazonID, cs)\n", "\n", "similarities = (crossSmall\n", " .map(lambda x : computeSimilarity(x))\n", " .cache())\n", "\n", "def similar(amazonID, googleURL):\n", " \"\"\" Return similarity value\n", " Args:\n", " amazonID: amazon ID\n", " googleURL: google URL\n", " Returns:\n", " similar: cosine similarity value\n", " \"\"\"\n", " return (similarities\n", " .filter(lambda record: (record[0] == googleURL and record[1] == amazonID))\n", " .collect()[0][2])\n", "\n", "similarityAmazonGoogle = similar('b000o24l3q', 'http://www.google.com/base/feeds/snippets/17242822440574356561')\n", "print 'Requested similarity is %s.' 
% similarityAmazonGoogle" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 test passed.\n" ] } ], "source": [ "# TEST Perform Entity Resolution (3c)\n", "Test.assertTrue(abs(similarityAmazonGoogle - 0.000303171940451) < 0.0000001,\n", " 'incorrect similarityAmazonGoogle')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **(3d) Perform Entity Resolution with Broadcast Variables**\n", "#### The solution in (3c) works well for small datasets, but it requires Spark to (automatically) send the `idfsSmallWeights` variable to all the workers. If we didn't `cache()` the `similarities` RDD, it might have to be recomputed each time we run `similar()`. This would cause Spark to send `idfsSmallWeights` every time.\n", "#### Instead, we can use a broadcast variable - we define the broadcast variable in the driver and then we can refer to it in each worker. Spark saves the broadcast variable at each worker, so it is only sent once.\n", "#### The steps you should perform are:\n", "* #### Define a `computeSimilarityBroadcast` function that, given an element from the combination RDD, computes the cosine similarity for the two records in the element. This will be the same as the worker function `computeSimilarity` in (3c) except that it uses a broadcast variable.\n", "* #### Apply the worker function to every element in the RDD\n", "#### Again, compute the similarity between Amazon record `b000o24l3q` and Google record `http://www.google.com/base/feeds/snippets/17242822440574356561`."
] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requested similarity is 0.000303171940451.\n" ] } ], "source": [ "# TODO: Replace with appropriate code\n", "def computeSimilarityBroadcast(record):\n", "    \"\"\" Compute similarity on a combination record, using Broadcast variable\n", "    Args:\n", "        record: a pair, (google record, amazon record)\n", "    Returns:\n", "        triple: (google URL, amazon ID, cosine similarity value)\n", "    \"\"\"\n", "    googleRec = record[0]\n", "    amazonRec = record[1]\n", "    googleURL = googleRec[0]\n", "    amazonID = amazonRec[0]\n", "    googleValue = googleRec[1]\n", "    amazonValue = amazonRec[1]\n", "    # Read the IDF weights from the broadcast variable rather than a driver-side closure\n", "    cs = cosineSimilarity(googleValue, amazonValue, idfsSmallBroadcast.value)\n", "    return (googleURL, amazonID, cs)\n", "\n", "idfsSmallBroadcast = sc.broadcast(idfsSmallWeights)\n", "similaritiesBroadcast = (crossSmall\n", "                         .map(computeSimilarityBroadcast)\n", "                         .cache())\n", "\n", "def similarBroadcast(amazonID, googleURL):\n", "    \"\"\" Return similarity value, computed using Broadcast variable\n", "    Args:\n", "        amazonID: amazon ID\n", "        googleURL: google URL\n", "    Returns:\n", "        similar: cosine similarity value\n", "    \"\"\"\n", "    return (similaritiesBroadcast\n", "            .filter(lambda record: (record[0] == googleURL and record[1] == amazonID))\n", "            .collect()[0][2])\n", "\n", "similarityAmazonGoogleBroadcast = similarBroadcast('b000o24l3q', 'http://www.google.com/base/feeds/snippets/17242822440574356561')\n", "print 'Requested similarity is %s.' 
% similarityAmazonGoogleBroadcast" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 test passed.\n", "1 test passed.\n", "1 test passed.\n" ] } ], "source": [ "# TEST Perform Entity Resolution with Broadcast Variables (3d)\n", "from pyspark import Broadcast\n", "Test.assertTrue(isinstance(idfsSmallBroadcast, Broadcast), 'incorrect idfsSmallBroadcast')\n", "Test.assertEquals(len(idfsSmallBroadcast.value), 4772, 'incorrect idfsSmallBroadcast value')\n", "Test.assertTrue(abs(similarityAmazonGoogleBroadcast - 0.000303171940451) < 0.0000001,\n", " 'incorrect similarityAmazonGoogle')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **(3e) Perform a Gold Standard evaluation**\n", "#### First, we'll load the \"gold standard\" data and use it to answer several questions. We read and parse the Gold Standard data, where the format of each line is \"Amazon Product ID\",\"Google URL\". 
The resulting RDD has elements of the form (\"AmazonID GoogleURL\", 'gold')" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Read 1301 lines, successfully parsed 1300 lines, failed to parse 0 lines\n" ] } ], "source": [ "GOLDFILE_PATTERN = '^(.+),(.+)'\n", "\n", "# Parse each line of a data file using the specified regular expression pattern\n", "def parse_goldfile_line(goldfile_line):\n", "    \"\"\" Parse a line from the 'gold standard' data file\n", "    Args:\n", "        goldfile_line: a line of data\n", "    Returns:\n", "        pair: ((key, 'gold'), 1) if successful, (line, 0) for the header line,\n", "              or (line, -1) if the line cannot be parsed\n", "    \"\"\"\n", "    match = re.search(GOLDFILE_PATTERN, goldfile_line)\n", "    if match is None:\n", "        print 'Invalid goldfile line: %s' % goldfile_line\n", "        return (goldfile_line, -1)\n", "    elif match.group(1) == '\"idAmazon\"':\n", "        print 'Header datafile line: %s' % goldfile_line\n", "        return (goldfile_line, 0)\n", "    else:\n", "        key = '%s %s' % (removeQuotes(match.group(1)), removeQuotes(match.group(2)))\n", "        return ((key, 'gold'), 1)\n", "\n", "goldfile = os.path.join(baseDir, inputPath, GOLD_STANDARD_PATH)\n", "gsRaw = (sc\n", "         .textFile(goldfile)\n", "         .map(parse_goldfile_line)\n", "         .cache())\n", "\n", "gsFailed = (gsRaw\n", "            .filter(lambda s: s[1] == -1)\n", "            .map(lambda s: s[0]))\n", "for line in gsFailed.take(10):\n", "    print 'Invalid goldfile line: %s' % line\n", "\n", "goldStandard = (gsRaw\n", "                .filter(lambda s: s[1] == 1)\n", "                .map(lambda s: s[0])\n", "                .cache())\n", "\n", "print 'Read %d lines, successfully parsed %d lines, failed to parse %d lines' % (gsRaw.count(),\n", "    goldStandard.count(),\n", "    gsFailed.count())\n", "assert (gsFailed.count() == 0)\n", "assert (gsRaw.count() == (goldStandard.count() + 1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using the \"gold standard\" data we can answer the following questions:\n", "* #### How many true 
duplicate pairs are there in the small datasets?\n", "* #### What is the average similarity score for true duplicates?\n", "* #### What about for non-duplicates?\n", "#### The steps you should perform are:\n", "* #### Create a new `sims` RDD from the `similaritiesBroadcast` RDD, where each element consists of a pair of the form (\"AmazonID GoogleURL\", cosineSimilarityScore). An example entry from `sims` is: ('b000bi7uqs http://www.google.com/base/feeds/snippets/18403148885652932189', 0.40202896125621296)\n", "* #### Combine the `sims` RDD with the `goldStandard` RDD by creating a new `trueDupsRDD` RDD that has just the cosine similarity scores for those \"AmazonID GoogleURL\" pairs that appear in both the `sims` RDD and `goldStandard` RDD. Hint: you can do this using the join() transformation.\n", "* #### Count the number of true duplicate pairs in the `trueDupsRDD` dataset\n", "* #### Compute the average similarity score for true duplicates in the `trueDupsRDD` dataset. Remember to use `float` for calculation\n", "* #### Create a new `nonDupsRDD` RDD that has just the cosine similarity scores for those \"AmazonID GoogleURL\" pairs from the `similaritiesBroadcast` RDD that **do not** appear in both the `sims` RDD and the `goldStandard` RDD.\n", "* #### Compute the average similarity score for non-duplicates in the `nonDupsRDD` dataset. 
Remember to use `float` for calculation" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 146 true duplicates.\n", "The average similarity of true duplicates is 0.264332573435.\n", "And for non duplicates, it is 0.00123476304656.\n" ] } ], "source": [ "# TODO: Replace with appropriate code\n", "sims = similaritiesBroadcast.map(lambda x: (x[1]+\" \"+x[0],x[2]))\n", "\n", "trueDupsRDD = (sims\n", " .join(goldStandard)).map(lambda x: (x[0],x[1][0]))\n", "trueDupsCount = trueDupsRDD.count()\n", "avgSimDups = trueDupsRDD.map(lambda x:x[1]).reduce(lambda x,y:x+y)/float(trueDupsCount)\n", "\n", "nonDupsRDD = (sims\n", " .leftOuterJoin(goldStandard)).filter(lambda x: x[1][1]!='gold').map(lambda x: (x[0],x[1][0]))\n", "nonDupsCount = nonDupsRDD.count()\n", "avgSimNon = nonDupsRDD.map(lambda x:x[1]).reduce(lambda x,y:x+y)/float(nonDupsCount)\n", "\n", "print 'There are %s true duplicates.' % trueDupsCount\n", "print 'The average similarity of true duplicates is %s.' % avgSimDups\n", "print 'And for non duplicates, it is %s.' % avgSimNon" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 test passed.\n", "1 test passed.\n", "1 test passed.\n" ] } ], "source": [ "# TEST Perform a Gold Standard evaluation (3e)\n", "Test.assertEquals(trueDupsCount, 146, 'incorrect trueDupsCount')\n", "Test.assertTrue(abs(avgSimDups - 0.264332573435) < 0.0000001, 'incorrect avgSimDups')\n", "Test.assertTrue(abs(avgSimNon - 0.00123476304656) < 0.0000001, 'incorrect avgSimNon')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **Part 4: Scalable ER**\n", "#### In the previous parts, we built a text similarity function and used it for small scale entity resolution. 
Our implementation is limited by its quadratic run time complexity, and is not practical for even modestly sized datasets. In this part, we will implement a more scalable algorithm and use it to do entity resolution on the full dataset.\n", "### Inverted Indices\n", "#### To improve our ER algorithm from the earlier parts, we should begin by analyzing its running time. In particular, the algorithm above is quadratic in two ways. First, we did a lot of redundant computation of tokens and weights, since each record was reprocessed every time it was compared. Second, we made quadratically many token comparisons between records.\n", "#### The first source of quadratic overhead can be eliminated with precomputation and look-up tables, but the second source is a little more tricky. In the worst case, every token in every record in one dataset exists in every record in the other dataset, and therefore every token makes a non-zero contribution to the cosine similarity. In this case, token comparison is unavoidably quadratic.\n", "#### But in reality most records have nothing (or very little) in common. Moreover, it is typical for a record in one dataset to have at most one duplicate record in the other dataset (this is the case assuming each dataset has been de-duplicated against itself). In this case, the output is linear in the size of the input and we can hope to achieve linear running time.\n", "#### An [**inverted index**](https://en.wikipedia.org/wiki/Inverted_index) is a data structure that will allow us to avoid making quadratically many token comparisons. It maps each token in the dataset to the list of documents that contain the token. So, instead of comparing, record by record, each token to every other token to see if they match, we will use inverted indices to *look up* records that match on a particular token.\n", "> #### **Note on terminology**: In text search, a *forward* index maps documents in a dataset to the tokens they contain. 
An *inverted* index supports the inverse mapping.\n", "> #### **Note**: For this section, use the complete Google and Amazon datasets, not the samples" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **(4a) Tokenize the full dataset**\n", "#### Tokenize each of the two full datasets for Google and Amazon." ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Amazon full dataset is 1363 products, Google full dataset is 3226 products\n" ] } ], "source": [ "# TODO: Replace with appropriate code\n", "amazonFullRecToToken = amazon.map(lambda x:(x[0],tokenize(x[1])))\n", "googleFullRecToToken = google.map(lambda x:(x[0],tokenize(x[1])))\n", "print 'Amazon full dataset is %s products, Google full dataset is %s products' % (amazonFullRecToToken.count(),\n", " googleFullRecToToken.count())" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 test passed.\n", "1 test passed.\n" ] } ], "source": [ "# TEST Tokenize the full dataset (4a)\n", "Test.assertEquals(amazonFullRecToToken.count(), 1363, 'incorrect amazonFullRecToToken.count()')\n", "Test.assertEquals(googleFullRecToToken.count(), 3226, 'incorrect googleFullRecToToken.count()')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **(4b) Compute IDFs and TF-IDFs for the full datasets**\n", "#### We will reuse your code from above to compute IDF weights for the complete combined datasets.\n", "#### The steps you should perform are:\n", "* #### Create a new `fullCorpusRDD` that contains the tokens from the full Amazon and Google datasets.\n", "* #### Apply your `idfs` function to the `fullCorpusRDD`\n", "* #### Create a broadcast variable containing a dictionary of the IDF weights for the full dataset.\n", "* #### For each of the Amazon and Google full datasets, create weight RDDs that 
map IDs/URLs to TF-IDF weighted token vectors." ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 17078 unique tokens in the full datasets.\n", "There are 1363 Amazon weights and 3226 Google weights.\n" ] } ], "source": [ "# TODO: Replace with appropriate code\n", "fullCorpusRDD = amazonFullRecToToken.union(googleFullRecToToken)\n", "idfsFull = idfs(fullCorpusRDD)\n", "idfsFullCount = idfsFull.count()\n", "print 'There are %s unique tokens in the full datasets.' % idfsFullCount\n", "\n", "# Collect the full-dataset IDF weights into a dictionary and broadcast it\n", "idfsFullWeights = idfsFull.collectAsMap()\n", "idfsFullBroadcast = sc.broadcast(idfsFullWeights)\n", "\n", "# Pre-compute TF-IDF weights. Build mappings from record ID to weight vector.\n", "amazonWeightsRDD = amazonFullRecToToken.map(lambda x: (x[0], tfidf(x[1], idfsFullBroadcast.value)))\n", "googleWeightsRDD = googleFullRecToToken.map(lambda x: (x[0], tfidf(x[1], idfsFullBroadcast.value)))\n", "print 'There are %s Amazon weights and %s Google weights.' 
% (amazonWeightsRDD.count(),\n", "    googleWeightsRDD.count())" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 test passed.\n", "1 test passed.\n", "1 test passed.\n", "[('b000jz4hqo', {'rom': 2.4051362683438153, 'clickart': 56.65432098765432, '950': 254.94444444444443, 'image': 3.6948470209339774, 'premier': 9.27070707070707, '000': 6.218157181571815, 'dvd': 1.287598204264871, 'broderbund': 22.169082125603865, 'pack': 2.98180636777128})]\n" ] } ], "source": [ "# TEST Compute IDFs and TF-IDFs for the full datasets (4b)\n", "Test.assertEquals(idfsFullCount, 17078, 'incorrect idfsFullCount')\n", "Test.assertEquals(amazonWeightsRDD.count(), 1363, 'incorrect amazonWeightsRDD.count()')\n", "Test.assertEquals(googleWeightsRDD.count(), 3226, 'incorrect googleWeightsRDD.count()')\n", "print amazonWeightsRDD.take(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **(4c) Compute Norms for the weights from the full datasets**\n", "#### We will reuse your code from above to compute norms of the TF-IDF weights for the complete combined datasets.\n", "#### The steps you should perform are:\n", "* #### Create two collections, one for each of the full Amazon and Google datasets, where IDs/URLs map to the norm of the associated TF-IDF weighted token vectors.\n", "* #### Convert each collection into a broadcast variable, containing a dictionary that maps each ID/URL to the norm of its TF-IDF weight vector." ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# TODO: Replace with appropriate code\n", "amazonNorms = amazonWeightsRDD.map(lambda x: (x[0], norm(x[1])))\n", "amazonNormsBroadcast = sc.broadcast(amazonNorms.collectAsMap())\n", "googleNorms = googleWeightsRDD.map(lambda x: (x[0], norm(x[1])))\n", "googleNormsBroadcast = sc.broadcast(googleNorms.collectAsMap())" ] }, { "cell_type": "code", "execution_count": 39, 
"metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 test passed.\n", "1 test passed.\n", "1 test passed.\n", "1 test passed.\n" ] } ], "source": [ "# TEST Compute Norms for the weights from the full datasets (4c)\n", "Test.assertTrue(isinstance(amazonNormsBroadcast, Broadcast), 'incorrect amazonNormsBroadcast')\n", "Test.assertEquals(len(amazonNormsBroadcast.value), 1363, 'incorrect amazonNormsBroadcast.value')\n", "Test.assertTrue(isinstance(googleNormsBroadcast, Broadcast), 'incorrect googleNormsBroadcast')\n", "Test.assertEquals(len(googleNormsBroadcast.value), 3226, 'incorrect googleNormsBroadcast.value')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **(4d) Create inverted indices from the full datasets**\n", "#### Build inverted indices of both data sources.\n", "#### The steps you should perform are:\n", "* #### Create an invert function that, given a pair of (ID/URL, TF-IDF weighted token vector), returns a list of pairs of (token, ID/URL). Recall that the TF-IDF weighted token vector is a Python dictionary with keys that are tokens and values that are weights.\n", "* #### Use your invert function to convert the full Amazon and Google TF-IDF weighted token vector datasets into two RDDs where each element is a pair of a token and an ID/URL that contains that token. These are the inverted indices." 
] }, { "cell_type": "code", "execution_count": 40, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 111387 Amazon inverted pairs and 77678 Google inverted pairs.\n" ] } ], "source": [ "# TODO: Replace with appropriate code\n", "def invert(record):\n", "    \"\"\" Invert (ID, token vector) to a list of (token, ID)\n", "    Args:\n", "        record: a pair, (ID, TF-IDF weighted token vector)\n", "    Returns:\n", "        pairs: a list of (token, ID) pairs\n", "    \"\"\"\n", "    recID, weights = record\n", "    # Only the tokens (dictionary keys) are needed; the weights are ignored here\n", "    pairs = [(token, recID) for token in weights]\n", "    return pairs\n", "\n", "amazonInvPairsRDD = (amazonWeightsRDD\n", "                     .flatMap(invert)\n", "                     .cache())\n", "\n", "googleInvPairsRDD = (googleWeightsRDD\n", "                     .flatMap(invert)\n", "                     .cache())\n", "\n", "print 'There are %s Amazon inverted pairs and %s Google inverted pairs.' % (amazonInvPairsRDD.count(),\n", "    googleInvPairsRDD.count())" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 test passed.\n", "1 test passed.\n", "1 test passed.\n" ] } ], "source": [ "# TEST Create inverted indices from the full datasets (4d)\n", "invertedPair = invert((1, {'foo': 2}))\n", "Test.assertEquals(invertedPair[0][1], 1, 'incorrect invert result')\n", "Test.assertEquals(amazonInvPairsRDD.count(), 111387, 'incorrect amazonInvPairsRDD.count()')\n", "Test.assertEquals(googleInvPairsRDD.count(), 77678, 'incorrect googleInvPairsRDD.count()')\n", "#print googleInvPairsRDD.take(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **(4e) Identify common tokens from the full dataset**\n", "#### We are now in a position to efficiently perform ER on the full datasets. 
Implement the following algorithm to build an RDD that maps a pair of (ID, URL) to a list of the tokens they share:\n", "* #### Using the two inverted indices (RDDs where each element is a pair of a token and an ID or URL that contains that token), create a new RDD that contains only tokens that appear in both datasets. This will yield an RDD of pairs of (token, (ID, URL)).\n", "* #### We need a mapping from (ID, URL) to token, so create a function that will swap the elements of the RDD you just created to create this new RDD consisting of ((ID, URL), token) pairs.\n", "* #### Finally, create an RDD consisting of pairs mapping (ID, URL) to all the tokens the pair shares." ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Found 2441100 common tokens\n" ] } ], "source": [ "# TODO: Replace with appropriate code\n", "def swap(record):\n", "    \"\"\" Swap (token, (ID, URL)) to ((ID, URL), token)\n", "    Args:\n", "        record: a pair, (token, (ID, URL))\n", "    Returns:\n", "        pair: ((ID, URL), token)\n", "    \"\"\"\n", "    token = record[0]\n", "    keys = record[1]\n", "    return (keys, token)\n", "\n", "commonTokens = (amazonInvPairsRDD\n", "                .join(googleInvPairsRDD)\n", "                .map(swap)\n", "                .groupByKey()\n", "                .cache())\n", "\n", "print 'Found %d common tokens' % commonTokens.count()" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 test passed.\n" ] } ], "source": [ "# TEST Identify common tokens from the full dataset (4e)\n", "Test.assertEquals(commonTokens.count(), 2441100, 'incorrect commonTokens.count()')\n", "#print commonTokens.take(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **(4f) Compute cosine similarities for the full dataset**\n", "#### Use the data structures from parts **(4a)** and **(4e)** to build a dictionary to 
map record pairs to cosine similarity scores.\n", "#### The steps you should perform are:\n", "* #### Create two broadcast dictionaries from the `amazonWeightsRDD` and `googleWeightsRDD` RDDs\n", "* #### Create a `fastCosineSimilarity` function that takes in a record consisting of the pair ((Amazon ID, Google URL), tokens list) and computes the sum, over the tokens in the token list, of the product of the Amazon weight and the Google weight for each token. The sum should then be divided by the norm for the Google URL and then divided by the norm for the Amazon ID. The function should return this value in a pair with the key being the (Amazon ID, Google URL). *Make sure you use the broadcast variables you created for both the weights and norms*\n", "* #### Apply your `fastCosineSimilarity` function to the common tokens from the full dataset" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2441100\n" ] } ], "source": [ "# TODO: Replace with appropriate code\n", "amazonWeightsBroadcast = sc.broadcast(amazonWeightsRDD.collectAsMap())\n", "googleWeightsBroadcast = sc.broadcast(googleWeightsRDD.collectAsMap())\n", "\n", "def fastCosineSimilarity(record):\n", "    \"\"\" Compute Cosine Similarity using Broadcast variables\n", "    Args:\n", "        record: ((ID, URL), tokens)\n", "    Returns:\n", "        pair: ((ID, URL), cosine similarity value)\n", "    \"\"\"\n", "    amazonRec = record[0][0]\n", "    googleRec = record[0][1]\n", "    tokens = record[1]\n", "    amazonWeights = amazonWeightsBroadcast.value[amazonRec]\n", "    googleWeights = googleWeightsBroadcast.value[googleRec]\n", "    # Dot product restricted to the tokens the two records share\n", "    s = sum(amazonWeights[token] * googleWeights[token] for token in tokens)\n", "    value = s / amazonNormsBroadcast.value[amazonRec] / googleNormsBroadcast.value[googleRec]\n", "    key = (amazonRec, googleRec)\n", "    return (key, value)\n", 
"\n", "#print commonTokens.take(2)\n", "similaritiesFullRDD = ((commonTokens\n", " .map(lambda x:fastCosineSimilarity(x)))\n", " .cache())\n", "\n", "print similaritiesFullRDD.count()" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 test passed.\n", "1 test passed.\n", "1 test passed.\n" ] } ], "source": [ "# TEST Identify common tokens from the full dataset (4f)\n", "similarityTest = similaritiesFullRDD.filter(lambda ((aID, gURL), cs): aID == 'b00005lzly' and gURL == 'http://www.google.com/base/feeds/snippets/13823221823254120257').collect()\n", "Test.assertEquals(len(similarityTest), 1, 'incorrect len(similarityTest)')\n", "Test.assertTrue(abs(similarityTest[0][1] - 4.286548414e-06) < 0.000000000001, 'incorrect similarityTest fastCosineSimilarity')\n", "Test.assertEquals(similaritiesFullRDD.count(), 2441100, 'incorrect similaritiesFullRDD.count()')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **Part 5: Analysis**\n", "#### Now we have an authoritative list of record-pair similarities, but we need a way to use those similarities to decide if two records are duplicates or not. The simplest approach is to pick a **threshold**. Pairs whose similarity is above the threshold are declared duplicates, and pairs below the threshold are declared distinct.\n", "#### To decide where to set the threshold we need to understand what kind of errors result at different levels. If we set the threshold too low, we get more **false positives**, that is, record-pairs we say are duplicates that in reality are not. If we set the threshold too high, we get more **false negatives**, that is, record-pairs that really are duplicates but that we miss.\n", "#### ER algorithms are evaluated by the common metrics of information retrieval and search called **precision** and **recall**. 
Precision asks: of all the record-pairs marked as duplicates, what fraction are true duplicates? Recall asks: of all the true duplicates in the data, what fraction did we successfully find? As with false positives and false negatives, there is a trade-off between precision and recall. A third metric, called **F-measure**, takes the harmonic mean of precision and recall to measure overall goodness in a single value:\n", "#### $$ Fmeasure = 2 \\frac{precision * recall}{precision + recall} $$\n", "> #### **Note**: In this part, we use the \"gold standard\" mapping from the included file to look up true duplicates, and the results of Part 4.\n", "> #### **Note**: In this part, you will not be writing any code. We've written all of the code for you. Run each cell and then answer the quiz questions on Studio." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **(5a) Counting True Positives, False Positives, and False Negatives**\n", "#### We need functions that count True Positives (true duplicates above the threshold), and False Positives and False Negatives:\n", "* #### We start by creating the `simsFullRDD` from our `similaritiesFullRDD`. Each element is a pair of (key, similarity score), where the key is the string `'AmazonID GoogleURL'`\n", "* #### From this RDD, we create an RDD consisting of only the similarity scores\n", "* #### To look up the similarity scores for true duplicates, we perform a left outer join using the `goldStandard` RDD and `simsFullRDD` and extract the similarity scores using a helper function" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 1300 true duplicates.\n" ] } ], "source": [ "# Create an RDD of ('AmazonID GoogleURL', similarity score) so it can be joined with the gold standard\n", "simsFullRDD = similaritiesFullRDD.map(lambda x: (\"%s %s\" % (x[0][0], x[0][1]), x[1]))\n", "assert (simsFullRDD.count() == 2441100)\n", "\n", "# Create an RDD of just the similarity scores\n", "simsFullValuesRDD = (simsFullRDD\n", "                     
.map(lambda x: x[1])\n", "                     .cache())\n", "assert (simsFullValuesRDD.count() == 2441100)\n", "\n", "# Look up all similarity scores for true duplicates\n", "\n", "# This helper function will return the similarity score for records that are in the gold standard and the simsFullRDD (True positives), and will return 0 for records that are in the gold standard but not in simsFullRDD (False Negatives).\n", "def gs_value(record):\n", "    if (record[1][1] is None):\n", "        return 0\n", "    else:\n", "        return record[1][1]\n", "\n", "# Join the gold standard and simsFullRDD, and then extract the similarity scores using the helper function\n", "trueDupSimsRDD = (goldStandard\n", "                  .leftOuterJoin(simsFullRDD)\n", "                  .map(gs_value)\n", "                  .cache())\n", "print 'There are %s true duplicates.' % trueDupSimsRDD.count()\n", "assert(trueDupSimsRDD.count() == 1300)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### The next step is to pick a threshold between 0 and 1 for the count of True Positives (true duplicates above the threshold). However, we would like to explore many different thresholds. To do this, we divide the space of thresholds into 100 bins, and take the following actions:\n", "* #### We use Spark Accumulators to implement our counting function. We define a custom accumulator type, `VectorAccumulatorParam`, along with functions to initialize the accumulator's vector to zero, and to add two vectors. Note that we have to use the += operator because you can only add to an accumulator.\n", "* #### We create a helper function to create a list with one entry (bit) set to a value and all others set to 0.\n", "* #### We create 101 bins for the 100 threshold values between 0 and 1.\n", "* #### Now, for each similarity score, we can compute the false positives. We do this by counting each similarity score in the appropriate bin of the vector. 
Then we remove the true positives from the vector by using the gold standard data.\n", "* #### We define functions for computing false positives, false negatives, and true positives for a given threshold." ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from pyspark.accumulators import AccumulatorParam\n", "class VectorAccumulatorParam(AccumulatorParam):\n", "    # Initialize the VectorAccumulator to 0\n", "    def zero(self, value):\n", "        return [0] * len(value)\n", "\n", "    # Add two VectorAccumulator variables\n", "    def addInPlace(self, val1, val2):\n", "        for i in xrange(len(val1)):\n", "            val1[i] += val2[i]\n", "        return val1\n", "\n", "# Return a list with entry x set to value and all other entries set to 0\n", "def set_bit(x, value, length):\n", "    bits = []\n", "    for y in xrange(length):\n", "        if (x == y):\n", "            bits.append(value)\n", "        else:\n", "            bits.append(0)\n", "    return bits\n", "\n", "# Pre-bin counts of false positives for different threshold ranges\n", "BINS = 101\n", "nthresholds = 100\n", "def bin(similarity):\n", "    return int(similarity * nthresholds)\n", "\n", "# fpCounts[i] = number of entries (possible false positives) where bin(similarity) == i\n", "zeros = [0] * BINS\n", "fpCounts = sc.accumulator(zeros, VectorAccumulatorParam())\n", "\n", "def add_element(score):\n", "    global fpCounts\n", "    b = bin(score)\n", "    fpCounts += set_bit(b, 1, BINS)\n", "\n", "simsFullValuesRDD.foreach(add_element)\n", "\n", "# Remove true positives from FP counts\n", "def sub_element(score):\n", "    global fpCounts\n", "    b = bin(score)\n", "    fpCounts += set_bit(b, -1, BINS)\n", "\n", "trueDupSimsRDD.foreach(sub_element)\n", "\n", "def falsepos(threshold):\n", "    fpList = fpCounts.value\n", "    return sum([fpList[b] for b in range(0, BINS) if float(b) / nthresholds >= threshold])\n", "\n", "def falseneg(threshold):\n", "    return trueDupSimsRDD.filter(lambda x: x < threshold).count()\n", "\n", "def truepos(threshold):\n", "    
return trueDupSimsRDD.count() - falsenegDict[threshold]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **(5b) Precision, Recall, and F-measures**\n", "#### We define functions so that we can compute the [Precision][precision-recall], [Recall][precision-recall], and [F-measure][f-measure] as a function of threshold value:\n", "* #### Precision = true-positives / (true-positives + false-positives)\n", "* #### Recall = true-positives / (true-positives + false-negatives)\n", "* #### F-measure = 2 x Recall x Precision / (Recall + Precision)\n", "[precision-recall]: https://en.wikipedia.org/wiki/Precision_and_recall\n", "[f-measure]: https://en.wikipedia.org/wiki/Precision_and_recall#F-measure" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Precision = true-positives / (true-positives + false-positives)\n", "# Recall = true-positives / (true-positives + false-negatives)\n", "# F-measure = 2 x Recall x Precision / (Recall + Precision)\n", "\n", "def precision(threshold):\n", " tp = trueposDict[threshold]\n", " return float(tp) / (tp + falseposDict[threshold])\n", "\n", "def recall(threshold):\n", " tp = trueposDict[threshold]\n", " return float(tp) / (tp + falsenegDict[threshold])\n", "\n", "def fmeasure(threshold):\n", " r = recall(threshold)\n", " p = precision(threshold)\n", " return 2 * r * p / (r + p)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **(5c) Line Plots**\n", "#### We can make line plots of precision, recall, and F-measure as a function of threshold value, for thresholds between 0.0 and 1.0. You can change `nthresholds` (above in part **(5a)**) to change the threshold values to plot." 
] }, { "cell_type": "code", "execution_count": 49, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.000532546802671 0.00106452669505\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAqkAAAIQCAYAAACi4/d6AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAAPYQAAD2EBqD+naQAAIABJREFUeJzs3Xd0VNX6xvHvJIGQ0HtCCb0FxFCl9yqCgIBSFERArFgu\ncFWqchW98LtRiiAgRQFF6UiHEIqiEKpAqAlVioEQaoBkfn9sicQkkECSM5N5PmtluXLmnDPvJNzL\nwz57v9sWFBRkR0RERETEgbhZXYCIiIiIyD8ppIqIiIiIw1FIFRERERGHo5AqIiIiIg5HIVVERERE\nHI5CqoiIiIg4HIVUEREREXE4CqkiIiIi4nAUUkVERETE4Xik5OQbN24wc+ZMjh49yuHDh4mKiqJn\nz5707NkzWddfunSJyZMns3XrVqKjoylVqhS9e/ematWqD1W8iIiIiGRMKRpJvXz5Mj/99BN37tyh\nXr16KXqjW7du8e6777Jr1y7eeOMNRo0aRe7cuRk8eDC7d+9O0b1EREREJGNL0Uiqj48PS5cuBUxg\nXb58ebKvXb58OeHh4YwfPx5/f38AAgIC6NOnD5MnT2bixIkpKUVEREREMrB0m5O6efNm/Pz84gIq\ngLu7O82bNyc0NJSIiIj0KkVEREREHFy6hdSwsDBKliyZ4HiJEiUACA8PT69SRERERMTBpVtIvXLl\nCtmzZ09wPEeOHICZPiAiIiIiAimck2qFiIgITQUQERERcWB58+Ylb968qXrPdAupOXLk4MqVKwmO\nR0VFAZAzZ84Er0VERNC9T3eiI6PTvD4REREReTh58+Zl8uTJqRpU0y2klihRgmPHjiU4HhYWFvf6\nP0VERBAdGc23335LhQoV0rxGsd5bb71FYGCg1WVIOtHv27Xo9+1a9Pt2HQcOHKBHjx5EREQ4Z0it\nX78+gYGBHDhwIC5wxsTEsGbNGvz9/cmTJ0+S11aoUEEN/11Erly59Lt2Ifp9uxb9vl2Lft/yqFIc\nUn/99Vdu3rzJ9evXAbMqPzg4GIBatWrh6enJZ599xurVq5kzZw4FChQAoHXr1ixatIiRI0fSt29f\ncuXKxeLFizl9+jRjxoy573tG39HjfhERERFXkuKQGhgYyLlz5wCw2WwEBwcTHByMzWZjzpw5FCxY\nELvdHvd1V6ZMmRg7diyTJ0/miy++IDo6mtKlSzN69GgqV6583/c8f+18SssUERERESeW4pA6d+7c\nB54zePBgBg8enOB47ty5+fe//53St1RIFREREXEx6dYn9VGcu3bO6hIknXTt2tXqEiQd6fftWvT7\ndi36fcujcoqQev6qRlJdhf5PzbXo9+1a9Pt2Lfp9y6NyipCqkVQRERER1+LwO04BnL+ukVQREXFe\nhw8fTnRDGxFHlz17dsqUKWPJeztFSD13VSOpIiLinA4fPkzZsmWtLkPkoR06dMiSoOoUIVWr+0VE\nxFndHUHV7onibO7uJGXVUwCnCKkR1yO4FXOLzO6ZrS5FRETkoWj3RJGUcYqFUwBnrpyxugQRERER\nSSdOE1JPRZ2yugQRERERSScKqSIiIiLicJwipHpn8lZIFREREXEhThFSC2QroJAqIiIi4kKcIqQW\nzFpQIVVERETS1IYNG3Bzc2PkyJEPdf2MGTNwc3Nj5syZqVyZa3KKkFogq0ZSRUREnJWbm1u8Lw8P\nD/Lnz0+zZs34/vvvrS4vAZvN9tDX3f2SR+cUfVILZi3IjqgdVpchIiIiD8l
mszF8+HAAbt++zYED\nB1i8eDHr168nJCSEzz77zOIK4YknniA0NJR8+fI91PUdOnSgdu3a+Pj4pHJlrsk5Qmq2gvxx7g/u\nxN7Bw80pShYREZF/GDZsWLzv169fT/Pmzfm///s/Xn/9dfz8/CyqzPDy8nqkLWxz5MhBjhw5UrEi\n1+Y0j/tj7bGcvXrW6lJEREQklTRp0oRy5coRGxvL9u3bARgxYgRubm4EBwcza9YsatSoQdasWSlR\nokTcddevX+eTTz4hICCAbNmykT17durUqcN3332X5HutXr2atm3bUqBAAbJkyYKfnx/t27dn3bp1\nceckNSf1yJEj9OnTh1KlSuHl5UWePHnw9/enf//+XLx4Me68+81J3b59Ox07dox7/+LFi/Pqq6/y\nxx9/JDi3V69euLm5cfz4cSZPnsxjjz2Gl5cXPj4+9OvXj8uXLyf/h+zEnGJYsmDWgoDplVokRxGL\nqxEREZHUZrfb430/ZswY1q5dS7t27WjWrBmRkZEAREZG0qRJE3bt2kX16tV56aWXiI2NZeXKlXTr\n1o19+/bx0UcfxbvX8OHD+eijj8iePTvt27enaNGinD59mi1btjB79myaNm0a7/x755SeOXOGmjVr\ncvXqVdq0aUOXLl24efMmx44dY/bs2bz55pvkyZMnyesBFi9eTOfOnXF3d6dTp074+fmxbds2Jk2a\nxOLFi9m8eXO8EH7XwIEDWb16Ne3ataNVq1asX7+eqVOncujQITZs2JDin7GzcaqQevLySWoVqWVx\nNSIiIpIagoKCOHjwIG5ubtSoUSPeaxs2bGDr1q08/vjj8Y6/9dZb7Nq1i7Fjx/L222/HHY+OjqZ9\n+/Z8/PHHPPPMMwQEBABmBPWjjz6iVKlSbNy4EV9f33j3O3369H1r/PHHH4mMjCQwMJA333wz3ms3\nbtx44CKpq1ev0rt377jPW6vW3zlm9OjRvP/++7z88susXr06wbXbtm3j999/p0gRM0AXExNDkyZN\n2LhxI7/99hs1a9a873s7O6cIqdk9s6uhv4iIuJTr1yE0NP3ft3x58PZO/fva7XZGjhyJ3W7n9u3b\nHDp0iEWLFgEmeP5zPmrfvn0TBNSIiAi+/fZbatasGS+gAnh6ejJ69GhWrVrFnDlz4kLquHHjADMy\n+8+AClC4cOH71u3mZmZGenl5JXgtsWP/tGjRIi5dukSPHj3iBVSAf/3rX0yePJm1a9dy4sSJBD+D\nYcOGxQVUAHd3d1588UU2bdrE9u3bFVIdgc1mo0iOIgqpIiLiMkJDoVq19H/fkBCoWjVt7n13rqfN\nZiN37tw0aNCAl156iW7duiU494knnkhwbNu2bcTGxmK32xkxYkSC12/fvg1A6D3pfuvWrbi5udGq\nVauHqrldu3a8//77vPbaa6xZs4ZmzZpRr149/P39k3X9zp07AWjcuHGC1zw8PGjQoAHffPMNu3bt\nShBSq1evnuCau6H10qVLKf0oTscpQipgQuoVhVQREXEN5cubwGjF+6YFm81GTExMss9PrI1TREQE\nYMLqtm3bknyfa9euxX0fGRlJ7ty58fT0TGHFhp+fH7/99hsjRoxg5cqV/PjjjwAULVqUQYMG8dpr\nr933+ruLnJJqS3V3dDexxVA5c+ZMcMzDw0S3lPwsnZVThdQjF49YXYaIiEi68PZOuxFNZ5DYXM+7\noe2dd95hzJgxybpPrly5uHTpEtHR0Q8dVMuXL893331HTEwMu3fvZu3atYwbN4433niDrFmz0qtX\nrySvvVvz2bOJdyi6u7o/sUDq6pyiBRVAkex63C8iIuLKnnjiCdzc3Ni4cWOyr6lduzaxsbGsWrXq\nkd/f3d2dqlWrMmjQIObOnQsQN682KVX/+pdGUFBQgtfu3LnDpk2bsNlscefJ35wnpOYowpkrZ4iJ\nzfjD2yIiIpJQ/vz56d69O9u3b2fUqFH
ExsYmOOfo0aOEh4fHff/GG28A8O677yY6mnnmzJn7vueO\nHTsSfRR/915ZsmS57/Xt27cnT548zJ07l19//TXea4GBgYSHh9OsWbN4C6TEcKrH/Xdi73D+2nl8\nsydcnSciIiIZ3/jx4zl8+DDDhg3jm2++oW7duhQsWJAzZ85w4MABtm/fznfffUfx4sUBaN68OUOG\nDGHUqFGUK1eO9u3bU6RIEc6ePcuWLVuoXbs206dPT/L9Zs2axVdffUW9evUoWbIkuXPn5ujRoyxd\nupQsWbIwYMCA+9abNWtWvv76azp37kzDhg3p3LkzRYsWJSQkhDVr1uDr68vkyZNT80eUYThVSAXT\n0F8hVUREJGOy2Wz37T2aPXt2goOD+eqrr5gzZw4LFizg5s2b+Pj4UKZMGQIDA2nWrFm8az788ENq\n167NF198wbJly7h27RoFCxakevXq9OzZ8771dOvWjVu3bvHzzz8TEhLCjRs3KFKkCN26dePdd9+N\nt8o/qdrbtWvHli1b+Pjjj1m1ahWXL1/G19eXV155haFDhyZYVPWgn4GrsAUFBdkffJo1Dh06xMsv\nv0xISAhFyxWlwJgCLOiygA4VOlhdmoiISLLs2LGDatWqERISonmH4lSS+2f37nmTJ0+mbNmyqfb+\nTjMnNZ93PjK7Z9biKREREREX4DQhVQ39RURERFyH04RUgKI5inIy6qTVZYiIiIhIGnOqkKqRVBER\nERHXoJAqIiIiIg7H6ULq6SunibUnbN4rIiIiIhmH04XUWzG3+PP6n1aXIiIiIiJpyOlCKqBH/iIi\nIiIZnEKqiIiIiDgcpwqpBbIWwMPNQyFVREREJINzqpDqZnOjcPbCCqkiIiIiGZxThVRQGyoRERER\nV6CQKiIiIiIOx+lCatEcRRVSRURERDI4pwupRXIU4WTUSex2u9WliIiIiAuYMWMGbm5uzJw5M97x\n4sWLU6JECYuqyvicMqTevHOTizcuWl2KiIiIJIObm1u8Lw8PD/LmzUvjxo0TBD9HZrPZknVMUoeH\n1QWk1L29UvN657W4GhEREUkOm83G8OHDAbh9+zaHDx9m4cKFBAcHs337dsaNG2dxheJonDqkPu7z\nuMXViIiISHINGzYs3vc///wzDRo0YOLEibz77rsUL17cmsLEITnd436fbD6429y1eEpERMTJ1alT\nh/Lly2O32wkJCUnw+q+//kqnTp3w8fHB09MTPz8/+vfvzx9//JHo/S5evMgHH3xApUqVyJo1K7ly\n5SIgIID33nuP69evx50XEhLCgAEDePzxx8mbNy9eXl6ULVuWd999l0uXLqXZ55WUcbqRVHc3d3yz\n+yqkioiIZACxsbEAeHp6xjv+9ddf069fP7y9vWnXrh1FihTh0KFDTJ06laVLl7J161aKFi0ad35Y\nWBiNGzfmxIkTVK9enVdffZXY2FhCQ0MJDAzklVdewc/PD4ApU6awaNEiGjVqRIsWLYiJiWHbtm38\n73//Y/ny5Wzbto1s2bKl3w9BEuV0IRX+6pV6RSFVREQyruu3rxP6Z2i6v2/5fOXxzuSdLu+1efNm\nDh48iJeXF0888UTc8UOHDtG/f39Kly5NcHAwBQsWjHtt/fr1tGjRgjfffJOFCxfGHe/evTsnTpxg\n9OjRDBo0KN77XLx4kaxZs8Z9//777/Pll18mWPT01Vdf0b9/fyZMmMDgwYNT++NKCjlvSNVIqoiI\nZGChf4ZS7atq6f6+If1CqOpbNdXva7fbGTlyJHa7ndu3b3P06FEWLlyIh4cHEydOJH/+/HHnfvnl\nl9y5c4fAwMB4ARWgSZMmtG3blqVLl3L16lWyZctGSEgIW7dupUqVKgkCKkCePHnifX93RPWf+vbt\ny8CBA1mzZo1CqgNwzpCavQh7z+21ugwREZE0Uz5feUL6JZynmR7vm1ZGjhwZ73s3Nze+/fZbnnvu\nuXj
Hf/nlFwCCgoLYunVrgvucP3+e2NhYDh8+TJUqVeLOadmyZbLquH37NpMnT+a7775j//79REVF\nxU07ADh9+nSKPpekDacMqWXzlmX8tvFE34nG08PzwReIiIg4Ge9M3mkyomkVm81GTEwMADdu3GDL\nli307t2bXr164ePjQ6NGjeLOjYiIAOC///3vfe939epVACIjIwEoXLhwsmp59tlnWbRoEaVKlaJD\nhw5xC7PsdjuBgYFER0c/zEeUVOaUITXAJ4A7sXfYd2FfhvofsIiIiCvw8vKiWbNmLFu2jGrVqtGz\nZ09CQ0Px8vICIGfOnNhsNi5fvpysBUy5cuUC4NSpB08F3L59O4sWLaJZs2asWLECd3f3uNfsdjuf\nfvrpQ34qSW1O14IKoHLBytiwsfOPnVaXIiIiIg+pcuXK9O3bl5MnT/K///0v7njt2rWx2+1s3Lgx\nWfepXbs2AGvWrHnguUeOHAHg6aefjhdQwbS8unnzZnLLlzTmlCE1a+aslMtXjp1nFVJFRESc2ZAh\nQ/D09GTMmDFxj+1ff/11MmXKxNtvv83hw4cTXHPr1i02bdoU933VqlWpU6cOO3bsYMyYMQnOj4iI\niHuEX6JECcDMd73X+fPnee2111Ltc8mjc8rH/QBVfKoopIqIiDi5QoUK0b9/fz7//HM+++wzPv74\nY8qVK8fXX39N7969qVixIq1ataJMmTLcvn2bEydOsGnTJgoWLMj+/fvj7vPtt9/SqFEjBg0axLx5\n82jQoAF2u53Dhw+zZs0aDh48iJ+fHzVq1KBu3bosWLCAunXrUrduXc6dO8fKlSspX748hQoVwm63\nW/gTkbucciQVTEjdfXY3MbExVpciIiIij+C9997D29ubcePGceHCBcD0PQ0JCaF79+7s2bOHCRMm\nMGfOHI4dO0aXLl2YOHFivHsUL16cHTt2MGjQIKKiopgwYQLTp0/n1KlT/Otf/4prceXm5saSJUt4\n5ZVXOHPmDOPGjePnn3+mb9++rFy5kkyZMiXon2qz2RIcu3tc0o7zjqT6VuHa7WscuXiEcvnKWV2O\niIiIJOHe9k6JKVCgQNxK/XtVqlSJ6dOnJ/t98uTJw+jRoxk9evR9z8udOzcTJkxI9LWwsLAEx3r2\n7EnPnj2Tda6kHqcdSQ3wCQDQI38RERGRDMhpQ2o+73wUyVGEXWd3WV2KiIiIiKQypw2poMVTIiIi\nIhmV84fUP3ZqFZ6IiIhIBuPcIdW3CheuX+DMlTNWlyIiIiIiqci5Q6pPFUCLp0REREQyGqcOqX45\n/cidJbe2RxURERHJYJw6pNpsNqr4avGUiIiISEbj1CEVIKBggNpQiYiIiGQwTh9Sq/hWISwyjMib\nkVaXIiIiIiKpxPlD6l+LpzSaKiIiIpJxOH1ILZevHFk8smjxlIiIiEgG4vQh1cPNg8oFK2vxlIiI\niEgG4vQhFbQ9qoiIiCNzc3O779fMmTOtLlEckIfVBaSGAJ8Apu6Yyo3bN/DK5GV1OSIiIvIPNpuN\n4cOHJ/palSpV0rkacQYZIqRW8alCjD2GfRf2Ub1QdavLERERkUQMGzbM6hLEiWSIx/2PFXwMN5ub\nFk+JiIhkYDNmzIibHrBmzRrq169P9uzZyZ8/P7169eLSpUsA7NixgzZt2pA7d26yZ8/O008/zfHj\nxxO958WLF3nvvfeoUKEC3t7e5MqVi2bNmrFmzZoE50ZFRfHf//6XJk2aUKRIETw9PSlQoABPP/00\nv/zyS6L337BhA0899VS882vWrMnIkSPjndeoUSPc3BKPZfd+7nsVL16cEiVKEBUVxYABAyhWrBiZ\nM2eOd+/Q0FB69epF0aJF8fT0xMfHh+7du3Po0KGkf9AOIkOMpHpn8qZ8vvKalyoiIuIClixZwrJl\ny2jbti2vvPIKW7ZsYdasWRw9epTRo0fTvHlzGjVqRN++fdm7dy9Ll
y7l2LFj7NmzB5vNFnef48eP\n06hRI44fP07Dhg1p06YNV65cYdmyZbRq1YrJkyfTp0+fuPP379/PkCFDaNiwIW3btiV37tyEh4ez\nePFili9fzpIlS2jdunXc+cuXL+epp54id+7ctGvXjsKFC3Px4kX279/PpEmTEkx/uLe2xPzzdZvN\nRnR0NI0bN+by5cu0bt2abNmyUaJECQBWrlxJx44diY2N5amnnqJ06dKcPHmSBQsW8NNPPxEUFOTQ\nUy0yREgFLZ4SERFxZHa7nZEjR2K32+MdL1GiBD179kzRvZYuXcr69eupV69e3L1btmzJ2rVrefLJ\nJ5k2bRpdu3aNO//ll19mypQpLF26lHbt2sUd79mzJydPnuSHH37gmWeeiTt++fJlGjVqxJtvvknb\ntm0pWLAgAP7+/vzxxx/kyZMnXj0nTpzgiSee4J133okXUqdOnQpAUFAQlStXjnfNxYsXU/SZE2O3\n2zl79iyVKlVi8+bNeHn9vS7n0qVLdO3alezZs7Np0ybKli0b99q+ffuoVasWL730Ejt27HjkOtJK\nhgqpC0MXEhMbg7ubu9XliIiIPJrr1yE0NP3ft3x58PZOk1v/8xE3mMfcKQ2p3bp1iwuoYEYUn3/+\nedauXUuVKlXiBVSAHj16MGXKFHbv3h0XUnfv3s3GjRvp0qVLvIAKkDNnTkaMGEGHDh2YP38+r776\nKgA5cuRItB4/Pz86derEhAkTOHXqFEWKFImrC4gXHu/6Z9B9WDabjTFjxiR4j1mzZnH58mUmTpwY\nL6ACVKxYkT59+vD555+zf/9+/P39U6WW1JZxQqpvFa7fvs6hiENUyF/B6nJEREQeTWgoVKuW/u8b\nEgJVq6b6bW02GzExMfc9Z8OGDWzYsCHescRGWqsl8nPx9fV94GunTp2KO3Z3DumlS5cYMWJEgmsu\nXLgAmDmd99qyZQuff/45v/zyCxcuXODWrVvxXj99+nRcSO3RowcLFy7kiSee4LnnnqNhw4bUrVs3\n7vXUkCVLlgSjtPD359u5c2ein+/unNTQ0FCF1LQW4BMAwM6zOxVSRUTE+ZUvbwKjFe9rkeDgYD78\n8MN4xxIbac2ZM2eCaz08PB742u3bt+OORUREALBmzZpEF0mBCdbXrl2L+37hwoV06tQJb29vmjdv\nTqlSpciaNStubm4EBQURHBxMdHR03PkdOnRg2bJljB07lmnTpjFp0iQAqlevzujRo2nSpEnSP4xk\nKlCgQKLH736+KVOmJHntPz+fo8kwITWPVx78cvqx6+wuuj3WzepyREREHo23d5qMaDqy4cOHJ9lL\nNbXdDbNffPEFr7/+erKuGTp0KFmyZGH79u2UK1cu3munT58mODg4wTVPPvkkTz75JDdu3GDr1q0s\nW7aML7/8kjZt2rBz507K//WPgrsr+2NjYxOs8o+MjEyypqQWW939fHv27KFSpUrJ+nyOJkO0oLpL\ni6dEREQkOWrXrg3Axo0bk33NkSNH8Pf3TxBQY2Nj2bx5832v9fLyonHjxowdO5b333+f6OhoVqxY\nEfd67ty5sdvtnDhxIsG127dvT3aNdz3M53M0GS+k/rEzwcpBERERkXtVq1aN+vXrs2DBAqZPn57o\nOXv37o2bmwpmfuyhQ4c4c+ZM3DG73c6IESM4cOBAglHNjRs3JjoP9+zZs0D8BVW1atUCEj6eX7du\nHXPnzk3hp4MXX3yRXLlyMXLkSLZt25bg9djY2ATzfx1Nih7337hxg2nTphEcHExUVBR+fn507do1\nWXMqQkJCmD17NmFhYURHR+Pr60ubNm1o3759ks1rU6paoWpE3Ijg2KVjlMpTKlXuKSIiIhnTnDlz\naNKkCS+99BJffPEFNWvWJFeuXJw6dYo9e/awb98+tm7dSv78+QF4++236d+/P1WrVqVjx45kypSJ\nLVu2cODAAdq2bcvSpUvj3f/NN
9/kzJkz1K1bN67RfkhICEFBQRQrVoznnnsu7twXX3yRMWPG8Mkn\nn7B7924qVKjAoUOH4nqdzp8/P0WfLU+ePPz444906NCBWrVq0bRpU/z9/bHZbJw8eZJffvmFS5cu\ncf369Uf/QaaRFIXUYcOGcfDgQfr160eRIkVYu3Yto0aNwm6307Rp0ySv++233/j3v/9NQEAAAwcO\nJEuWLGzZsoXx48dz5syZZM8FeZAGxRrgbnNnzbE1CqkiIiIZjM1me2DD+5QoXLgwISEhjBs3jvnz\n5zNnzhxiYmLw9fXF39+fAQMGxJvP2a9fPzw9PQkMDGTWrFl4e3tTv359Zs6cyY8//siyZcvi3f+D\nDz5g4cKFbN++nbVr1+Lm5kaxYsX44IMPeOutt8iVK1fcufny5WPDhg0MHDiQjRs3EhwcTI0aNVi7\ndi3Hjh1jwYIFif487qdJkybs2bOHMWPGsGrVKjZt2oSnpyeFChWiWbNmCVpvORpbUFBQsp6Nb926\nlffff58hQ4bEGzkdOHAg4eHhfP/990mOiI4aNYrNmzezePFiPD09444PGjSI/fv3J/il3nXo0CFe\nfvllQkJCqJrMyeP1p9cnn3c+Fj67MFnni4iIpKUdO3ZQrVq1FP1dJuIIkvtn9+55kydPTtCT9VEk\n+zn75s2b8fb2plGjRvGOt27dmoiICA4cOJDktZ6ennh4eJA5c+Z4x7NmzRovtKaGlqVasj5sPbdj\nbj/4ZBERERFxSMkOqWFhYfj5+SUYLb27P2x4eHiS17Zv357Y2FjGjRtHREQEV69eZdWqVWzZsiXB\nrhCPqkWpFkRFR/Hr6V9T9b4iIiIikn6SPSc1KiqKwoULJzh+d4uwqKioJK8tU6YMn376KcOHD2fR\nokWA6QfWr18/OnXqlNKa76uabzXyeOVh9dHV1POr9+ALRERERMThpEsz/7179/Lee+8REBDAU089\nRZYsWdixYwdTp04lOjqa559/PtXey93NnWYlm7Hq6Co+bPzhgy8QEREREYeT7JCaI0cOLl++nOD4\n3RHUuyOqiRk3bhw+Pj589NFHcSvRAgICcHNzY8aMGTRr1ixuX93E/HMFHEDXrl2TnCrQslRL+izp\nw8UbF8njleeBn01EREREHmzu3LkJ+rbeb0esR5HskFqyZEnWr1+fYLuusLAw4O+5qYkJDw+nWbNm\nCVollCtXLm53hfuF1MDAwBStiGxRqgV27Kw9tpYuFbsk+zoRERERSVpig4R3V/entmQvnKpfvz43\nbtxIsC/typUryZcvHxUqVEjy2gIFCnDw4EFiY2PjHd+3bx9AXJPc1FIkRxH88/uz+ujqVL2viIiI\niKSPZI+k1qxZk2rVqhEYGMj169cpVKgQ69atY/v27XzwwQdxo6SfffYZq1evZs6cORQoUACALl26\nEBgYyPvvv0/btm3x9PRkx44d/PDDD1SrVo2SJUum+gdrUbIF8w/Mx263p2rjXxERERFJeylaOPXh\nhx8ybdoDicBbAAAgAElEQVQ0pk+fTlRUFMWKFWPo0KE0btw47hy73R73dVe7du3Imzcv8+bNY+zY\nsdy8eRNfX1969uxJ586dU+/T3KNl6ZYE/hpI6J+hVMif9CiviIiIiDieFIVULy8vXn/99ftuYzp4\n8GAGDx6c4HjdunWpW7duyit8SA2KNcDT3ZPVR1crpIqIiIg4mXRpQWUF70ze1C9Wn1VHVzGg1gCr\nyxERERd3v50ZRRyR1X9mM2xIBTMvdfiG4UTficbTI3W3XxUREUmO7NmzA9CjRw+LKxF5OHf/DKe3\nDB1SW5ZuyaC1g9h8YjNNSza1uhwREXFBZcqU4dChQ1y5csXqUkRSLHv27JQpU8aS987QIfWxAo9R\nMGtBVh9drZAqIiKWseoveRFnluw+qc7IZrPRolQLVh1dZXUpIiIiIpICGTqkgtkidfe53Zy9etb
q\nUkREREQkmTJ8SG1eqjkAa46usbgSEREREUmuDB9SC2QtQBWfKqw+pi1SRURERJxFhg+pAC1KtWD1\n0dXE2mOtLkVEREREksElQmrLUi05f+08u8/utroUEREREUkGlwipdYrWIYtHFoLCg6wuRURERESS\nwSVCqqeHJ7WL1Cb4eLDVpYiIiIhIMrhESAVoWKwhm45v0rxUERERESfgOiG1eEMu3bzE3nN7rS5F\nRERERB7AZULqE4WfILN7Zj3yFxEREXECLhNSvTJ5UatILTaEb7C6FBERERF5AJcJqWDmpW48vlHz\nUkVEREQcnMuF1IgbEey/sN/qUkRERETkPlwqpNYuWptMbpkIDte8VBERERFH5lIh1TuTNzUK19Di\nKREREREH51IhFaBRsUYEHw/GbrdbXYqIiIiIJMHlQmrD4g05f+08oX+GWl2KiIiIiCTB5UJqnaJ1\ncLe565G/iIiIiANzuZCaLXM2qheqrpAqIiIi4sBcLqQCNCreiA3hGzQvVURERMRBuWRIbVisIWev\nnuXwxcNWlyIiIiIiiXDJkFrXry5uNjf1SxURERFxUC4ZUnN45qCqb1XNSxURERFxUC4ZUsE88le/\nVBERERHH5LIhtVHxRpyKOsWxS8esLkVERERE/sFlQ2o9v3rYsOmRv4iIiIgDctmQmitLLgJ8AhRS\nRURERByQy4ZU+Gteqlb4i4iIiDgc1w6pxRty/PJxwiPDrS5FRERERO7h0iG1QbEGZl6qRlNFRERE\nHIpLh9Q8XnmoVqgayw4vs7oUEREREbmHS4dUgE4VOrH88HKu3bpmdSkiIiIi8heXD6nP+D/D9dvX\nWXFkhdWliIiIiMhfXD6kls5TmgCfAH7c/6PVpYiIiIjIX1w+pAJ09u/MskPLuHH7htWliIiIiAgK\nqQB08u/EtdvXWHlkpdWliIiIiAgKqQCUzVuWxwo8xo8H9MhfRERExBEopP6lk38nlh5cys07N60u\nRURERMTlKaT+pbN/Z67cusKao2usLkVERETE5Smk/qVC/gr45/fnh/0/WF2KiIiIiMtTSL1Hpwqd\nWHJwCdF3oq0uRURERMSlKaTeo3PFzlyOvsy6sHVWlyIiIiLi0hRS71Exf0XK5S2nR/4iIiIiFlNI\nvYfNZqOTfycWhS7iVswtq8sRERERcVkKqf/Q2b8zkTcjCQoLsroUEREREZelkPoPlQtWpnSe0nrk\nLyIiImIhhdR/sNlsdKrQiYWhC7kdc9vqckRERERckkJqIjr5d+LijYsEHw+2uhQRERERl6SQmoiq\nvlUpkasE8/bNs7oUEREREZekkJoIm81G10pdmbdvHjdu37C6HBERERGXo5CahBervMjl6MssDF1o\ndSkiIiIiLkchNQml85Smvl99pu+abnUpIiIiIi5HIfU+elfpzbpj6wiPDLe6FBERERGXopB6H538\nO5E1c1Zm7pppdSkiIiIiLkUh9T6yZc5GF/8uTN81nVh7rNXliIiIiLgMhdQH6F2lN8cvH2dD+Aar\nSxERERFxGQqpD1CnaB3K5i3L1zu/troUEREREZehkPoANpuNFwNeZP6B+UTejLS6HBERERGXoJCa\nDC88/gK3Ym7x/e/fW12KiIiIZBBhYTBvHty8aXUljkkhNRkKZS9E69Kt+XqXHvmLiIjIw4uNhVWr\noG1bKFUKnn0WKlSABQvAbre6OseikJpMvav05rfTv7Hv/D6rSxEREREnExkJgYFQrhy0agWnTsGU\nKbBzJ1SsCM88A02bwt69VlfqOBRSk+mpsk+RzzufdqASERGRFPn+eyhcGAYOhBo1YPNm2LEDXnoJ\nAgJg2TJYvhxOnzbfv/YaRERYXbX1PKwuwFlkds9Mj8d6MGv3LD5p+gmZ3DNZXZKIiIg4uIsXTehs\n1gwmTQJf38TPa93ajKSOHw8jR8Ls2VC5Mvj4mGt8fMxXoUJ
Qvz54e6fv57CCRlJToHeV3ly4foGf\nDv9kdSkiIiLiBIYPh1u37h9Q78qcGd55Bw4fhjffhGLFTMhdtw7GjoXevc1Ugfr14dy59KnfShpJ\nTYHHCj5GNd9qfL3za9qXb291OSIiIuLA9u6FiRPh008fHFDvVaAAfPhhwuO3bplpAh06QN26ZgFW\nqVKpV6+j0UhqCvUK6MWKIyu4eOOi1aWIiIiIg7Lb4Y03oEwZMyqaGjJnhlq14Oefwc0N6tQxC68y\nKoXUFOrk34mY2BgWhS6yuhQRERFxUD/8AMHBZkV/5sype+8SJWDLFjMdoGFDWL8+de/vKBRSU8gn\nmw8NijVg3r55VpciIiIiFnhQP9Nr1+Bf/4J27cwc0rSQP78Jp3XqmPeYlwFjiULqQ+hSsQtrj60l\n4rr6Q4iIiLiSkyehfHnTjP/YscTP+fRTOH8e/u//0raWbNlgyRKzIcBzz8HMmWn7fulNIfUhdKzQ\nETt2FoYutLoUERERSYbz5+GXXx7tHufOmVZSN2/C7t2mCf+HH8bf1vTYMfjsMzOSmh6LmjJnNuG0\nY0dTS0batUoh9SH4ZPOhYbGGeuQvIiLiBG7ehJYtoUED+P33h7vHpUvmHleumMfsBw7A22/DqFEm\nrC5fbs57913Ilw/eey/16n8QNzfo29cE5D170u9905pC6kPqUrEL68PWc+HaBatLERERkft46y0I\nDYUiRaBfP4iNTdn1V69CmzbmUf+aNWaENGtW+PhjEwpLlDCvN2gAixbBmDHm9fTUuDHkzAkLFqTv\n+6YlhdSHpEf+IiIijm/OHJg8GcaNM4/Ff/nFfJ9cN2+avqS//276klasGP/18uVNcP3uOzh6FJo0\nMXNE01vmzGaebFqG1I0bzYhyelFIfUgFshagcfHGeuQvIiLioEJDzchpjx7w0ktmpLNPH/j3v+HM\nmQdff/u2WZC0eTMsWwbVqyd+ns1mgml4OKxYYb63wt0wffhw6t/7xg3TrSCtF4PdSyH1EXSp2IWg\n8CDOXztvdSkiIiJyj+vXoXNnKFoUvvzy7+D42Wfg5fXgBvt37phtSH/6CebPNwH3QTJlSv2eqCnR\nsqX5bAvT4CHvwoVw+TL07Jn6906KQuoj6FC+AzZsLDiQgSaAiIiIZABvvGEev//wg2nVdFfu3KbB\n/vz5pn1TYu6u4v/uO5g9G558Mn1qflRZs5qeqWnxyH/6dKhfH0qXTv17J0Uh9RHkz5qfJiWa6JG/\niIiIA5k5E77+GiZOhEqVEr7+7LPQujW89ppZrX+vn3+GqlXh4EEICoIuXdKn5tTSsSP8+iucOpV6\n9zxxAtatgxdfTL17JodC6iPqUrELwceDOXf1nNWliIiIuLx9++CVV6BXL/OVGJvNBNiLF2HIEHPM\nbofx4802o6VKwY4dUK9eelWdep56Cjw8TJeB1DJzJnh7m+kT6SlFIfXGjRuMHz+ezp0707JlS/r2\n7cv6FGwYu3nzZgYMGMBTTz1F69atefHFF1m2bFmKi3YkeuQvIiLiGIKCoGlTKFkSJky4/7nFi5vm\n9+PGwYYN8PzzZorA66+bUUNf3/SoOPXlymV+Bqn1yD82FmbMMAH13mkT6cEjJScPGzaMgwcP0q9f\nP4oUKcLatWsZNWoUdrudpk2b3vfaOXPmMG3aNJ5++ml69OiBh4cHx48f586dO4/0AayW1zsvzUo2\nY97+ebxS4xWryxEREXE5sbHwyScwbJgZCZ0zx4z8PciAAWbOaePG5vy5c81qfmfXsaMZTf7zT7Ox\nwKPYtMlsEjB9eurUlhLJDqlbt24lJCSEIUOG0KRJEwACAgI4d+4ckyZNonHjxri5JT4we/DgQaZN\nm0a/fv149p7mYVWqVHnE8h1Dl4pd6LOkD2evnsUnm4/V5YiIiLiMP/80o6CrVplH98OHg7t78q71\n8DCjhEOGmJD7zx6ozur
pp6F/f1i69NHnkU6fbqY/1K+fOrWlRLIf92/evBlvb28aNWoU73jr1q2J\niIjgwIEDSV67aNEiMmfOTIcOHR66UEfWvnx73N3cmb9/vtWliIiIuIxffoEqVWD7dtOf9MMPkx9Q\n76pc2azyzygBFaBgQahb99Ef+V+5Yroj9OplTe/XZIfUsLAw/Pz8EoyWlihRAoDw8PAkr92zZw/F\nihUjODiYF154gaZNm9KlSxemTJni9I/7AfJ45aF5yebM269V/iIiImnNbofPPze9S/38YOdO0yNU\n/taxI6xenbB7QUr88INp4p+evVHvleyQGhUVRY4cORIcv3ssKioqyWsvXLjAqVOnGD9+PM888wxj\nx46lVatWfP/993z66acPUbbj6VKxC5uOb+J45HGrSxEREcmw7tyBV1+Ft94yDfk3bIAiRayuyvF0\n6AC3bsHy5Q9/j+nTTb/YokVTr66USJcWVHa7nevXr/PWW2/x9NNPExAQQO/evenQoQPr1q3j9OnT\n6VFGmurk34nsntmZHJKCDYFFRERcyPXrZstOu/3hrr96Fdq3hylTYOpUGDvW7PIkCRUvbvq9Puwj\n/8OHzXaw6d0b9V7JXjiVI0cOLl++nOD43RHUxEZZ7702MjKSGjVqxDtes2ZN5s+fz5EjRyhcuHCS\n17/11lvkypUr3rGuXbvStWvX5Jaf5rJlzkavx3sxZccUhjUcRhaPLFaXJCIikqYOHAAfH7OL04P8\n/js88wwcOmR2LerUyXxVrZq8+Y5//GF6gB4+bEYHW7R49Pozuo4dYfRouHkTsqQwlsyYATlzmn8U\n3Gvu3LnMnTs33rHIyMhHKzQJyR5JLVmyJCdOnCA2Njbe8bCwMODvuamJKVWqFPb7/LPJ9oA/nYGB\ngSxZsiTelyMF1LterfEqf17/kx/2/WB1KSIiImnm+HHTN9PfH8qWNc3e7zc6Ons2PPEEeHrC999D\no0ZmNLR6dbNyfNAgs9PTjRuJX79vH9SqZbYr3bRJATW5OnY0o89r16bsupgY8zvt2hW8vOK/1rVr\n1wSZLDAwMPWKvkeyQ2r9+vW5ceMGwcHB8Y6vXLmSfPnyUaFChSSvbdiwIQC//vprvONbt27Fzc2N\n8uXLp6Rmh1UuXzmal2zO+G3jrS5FREQk1d24ASNHQvnysGWLCZotW5rV302aQGho/POjo01z/B49\nzCjq1q1mm9EpU+DsWVizxgTOGTPMavRs2aBCBdOr9OOPYdkys3NS3bqmSf3WrfD441Z8cudUoYL5\nXaX0kf/atXD6tLWP+iEFIbVmzZpUq1aNwMBAfvrpJ3bu3MmYMWPYvn07L7/8ctxo6GeffUazZs04\nf/583LWtWrWiTJkyBAYGsmDBAkJCQvjqq69YvHgx7dq1o0CBAqn/ySzyes3X+e30b2w7vc3qUkRE\nRFKF3Q7z55vQ85//mEVLBw9Cnz7w7bdmFfnJkyZADhtmHi+fPGka60+ZAl9++ffWmnd5eJhFOZMm\nwZkzZr/5SZPMsT/+gM8+g7ZtzQKgJ54wI6haIJVynTub0ev7dApNYPp0M0r+j1ma6S5FO059+OGH\nTJs2jenTpxMVFUWxYsUYOnQojRs3jjvHbrfHfd3l7u7OmDFjmDp1KrNnz+bKlSv4+vrSr18/unTp\nknqfxgG0KdOGYjmLMWHbBGYUnmF1OSIiIg/FbjfBZv1604po40YzJ3TNGihTJv65zZvD3r1m9HP0\naLNz06VLkDWrCZc1a97/vTw8zDn3nme3m6B74oQJqVog9XAGDTL/wHjmGfjttwdvbXrwoBm9HjXK\nmt6o97IFBQU95Bq7tHfo0CFefvllQkJCqFq1qtXlJNunmz9l+IbhnHrnFPm8H3E/MhERkXRy7JjZ\nt379eggKMnNAM2WC2rVh8GB48skH3+PAAXjjDTNq+vXXj74tpzy6AwfMqOjTT5uR76TC5
8mTZmpF\njhxmOkfOnMm7/44dO6hWrRqTJ0+mbNmyqVZ3urSgcjUvVX0JgGk7pllciYiIyIPZ7TB0qFnE1L8/\nhIVB797mMX5kJAQHJy+ggpkSsHat2cVJAdUxVKhgWnbNmWOmVCTmzz/N/GB3d/N7T25ATUspetwv\nyZPPOx/PVXqOL7d/yb/q/At3txTu0SYiIpJOYmLM4qZJk8wj3tdfd4yAIqnruefM6Ohbb5muCvfO\nN71yBVq3hosXzTmFCllX5700kppGXqvxGscvH+enwz9ZXYqIiEiibt2C7t3hq69g2jT44AMF1Ixs\n7FioUsX0p42IMMdu3jS9UA8dglWrTA9bR6GQmkZqFK5BzcI1mbBtgtWliIiIJHDtmlk9v3Ah/Pij\nebwvGVvmzDBvnvndP/883L4N3bqZHrXLlkFAgNUVxqeQmoZeq/Eaq4+u5lDEIatLERERiXPxomn1\n9PPPsGKFafMkrsHPz2yusHKlaRm2ZInp3lC/vtWVJaSQmoa6VOxCPu98TNw20epSREREANOTtGFD\ns73o+vWmCb+4lpYtTT/bAwdMT9SnnrK6osQppKahLB5Z6FOlD9N3TefqratWlyMiIhnQ/bYj/aew\nMDNiFhlp+pda3axdrDN8uNlV6vnnra4kaQqpaax/9f5cu3WNqTumWl2KiIhkIJGRpm1UzpymddDZ\ns/c//8ABqFcP3Nxg82bTlkhcl83mOKv4k6KQmsaK5SpGj8o9+GzLZ9y8c9PqckRExMldvWp2dipR\nwqzW7tED9uwxi17Wrk38mh07oEEDyJPHjKAWK5a+NYs8DIXUdPBB/Q84d+2cRlNFROSh3bwJ//sf\nlCwJI0aYcHr0KEycCLt3Q+XKZkR1yBC4c+fv6zZvhsaNTagNDgYfH8s+gkiKKKSmgzJ5y9DtsW6M\n3jya6DvRVpcjIiJO5tQpKF8eBg6Edu3Moqdx48DX17xesKBZrf2f/8Do0SaUnjxpdg5q0cL0xly3\nzoykijgLhdR08kH9Dzhz5QzTd023uhQREXEisbHQq5fpabl/v9neMrHH9W5u8N57ZrT0+HHTXqht\nW7N6f8UKyJ493UsXeSQKqemkfL7yPFfpOT7Z/Am3Ym5ZXY6IiDiJL74wo6AzZ0LZsg8+v25d2LXL\ntBnq0QMWLAAvr7SvUyS1KaSmoyENhnDy8klm7pppdSkiIuIEfv8d/v1vGDDANN9Prjx5YO5cs9Vp\n5sxpV59IWlJITUf++f3pXLEzH2/+mNsxt60uR0REHFh0NHTvbvZS/+QTq6sRSX8KqelsSP0hhEeG\n882eb6wuRUREHNjQoaa36ezZelwvrkkhNZ09VvAxnqnwDP/Z9B/uxN558AUiIuJyNmyAMWNg1Ciz\nAErEFSmkWmBIgyEcu3SM2XtmW12KiIikoZiYlG1bCmYnqRdeMNuXvvtu2tQl4gwUUi0Q4BPA0+We\n1miqiEgGFRZmFjz5+ECpUqYFVHK9/jpcvgyzZoG7e9rVKOLoFFItMqzhMA5fPMw3uzU3VUQkI4iN\nNWG0bVsTTCdNMgufSpaEJ5+ELl3gzJnEr7XbTX/Tdu3MHNQJE7R1qYhCqkWq+lal22PdeGf1O5y8\nfNLqckRE5CHZ7fDll1CmjAmjp07BV1/B6dMQGAhr1sC335oQWr682SkqJsZce/s2zJkD1atDo0Zw\n7JgZQe3e3dKPJOIQFFItNL71eLJmykqvxb2ItcdaXY6IiMP480/YsSP+HvSPIqXzQpPr9m3o2xde\nfRVq1YKffzZ19+kDWbOac2w2EzpDQ6FbN3jzTXPuRx+ZUdbu3SFfPrOt6d698Pzz5hoRV6eQaqHc\nXrmZ2X4m68PW8/nWz60uR0TEEnfuwM6dZjTyhRfMrkr580O1alCggNk1ad48M08zpS5cMDsw5c8P\nXbvCjBnwxx+pU/eVK+bx/MyZZvRz9myoXTvpgJk7t
5kC8PPPcOuWWbnfvDns2QOrVpkdohRORf7m\nYXUBrq5pyaa8Xett3lv3Hs1LNadSgUpWlyQikuaOHjXzN5cvh40b4do18PCAgABo1QpGjICiRWHt\nWliyxATATJnMI/F27eDFF/8eqUzK8ePQooUJt336wPr10Lu3GVWtXNmEwo4dzahmSv3xB7RpA0eO\nmM+Rkt2gatc2ofzmTfD2Tvl7i7gKjaQ6gI+bfkyZvGXovqA70XeirS5HRCTVRUebuZlvvw3lypld\nlN55x4woDhsGmzaZMLltm9mrvls304Jp5EgT6MLD4X//MyON77wDlSqZ+yVl3z6oU8eM0m7ZAqNH\nw2+/wfnzZg5oQIAZ/axdG1q3ht27k/9ZDhww150/D5s3pyyg3uXmpoAq8iAKqQ4gi0cWvu3wLaF/\nhjJk/RCryxERB3LrFuzaZf7rrJYsMcG0RQv48UczGrpwIUREmJHSQYOgXr37h7ZixeC118xj8f37\noUQJc79eveDixfjn/vyzCbj585sQWarU36/ly2ce+8+caVbaz5tnRnWrVDHTCsLC7v9ZNm404Td7\ndti61YzIikjaUEh1EI/7PM6oxqMY+8tYNoRvsLocEXkE0dHQsyeMHWvaEj2Mq1fNyvBSpUyAypMH\nnnrKjDIePJh2C4FSU3g4PP20+apQwSwoOnECJk+G9u1N0HsYpUvDunUwZQosWmTuPW+e+ZncffRe\nqZLZtcnXN+n7uLlB585m1PXLL810gHLlzMKm48chJMSE2YEDzWirnx80bAhVq5rwW6TIw9UvIsmj\nkOpA3qn9Dg2LN+SFhS8QeTPS6nJE5CHY7Wbe49y5Jtw0awYnU9Bl7s8/zXzMYsXM9U2amNHDoUPh\nxg1zrHx5KF7czLP88kvzqPyfo4lWunXLPF739zdB78cfzdzTKlVSb2GQzWY+/4EDZhT22Wf/nq/a\nvLn5meXKlbx7ZcoEL79s5peOHGmmARQvbtpC9eoF8+ebc7p3N7/XFSsgZ87U+RwikjQtnHIg7m7u\nzGw/k8pfVuatlW8xo/0Mq0sSkRQaMcLMefzuO7My/YUXzCPhyZNNM/fE2O0mbH31lRkdtNtNW6N3\n3vm7oXuLFjB4sFlgtHEjrF5tRhNnzvy7TVOhQmYEsVIl6NDBhLf0FBtr5om+9RYcPmz+O3z4w4+Y\nJoevrwmRCxaYEdBevUxw93iIv928veG990xgXbPGtIeqUAGyZUv1skUkGRRSHYxfTj/GtBhD36V9\n6VO1D/X80vlvGRF5aLNmwYcfwn/+Y0b2wLQX6t/ffP/TT6aRe44cppn71q2weLF5ZH34sGlR9K9/\nwRtvmLmTicma1Tx6bt3afH/rlrn299///vrxR/i//zNzJwcNMjsguaXhc7NDh+Cbb8zX8ePmfXfs\nSN/5mh07mmCeGiO1efL8/fsTEevocb8D6l2lNzUL1+S15a9xJzaVOlmLSJoKDjaPn1980YzG3ZU7\ntxlVnTnTLBYKCICXXjKjnvXqmeMNG8KyZWYhz8iRSQfUxGTODBUrmlD10UfmPcLCYOlSE0zbtzcj\nq9Onp+7iq4sXzYhl7dpmHucXX5jR3k2bzHxNKxYUqceoSMaikOqA3GxuTHhyAnvP7WXitolWlyMi\nD3Dw4N+P1ydNShiWbDbz2H/3brMqfcsWs7BqyxYTTKdMMT03s2RJnXrc3Mwiq02bzHuUKWPmyZYs\nacLqo9q500xDeOMNyJsXvv8ezp410xXq1VNYFJHUocf9Dqp6oeq8XO1lhgYNpUvFLvhk87G6JBFJ\nxJ9/moBZsKCZG5k5c9Lnlihh5pGmpzp1zJSC/fvNNITevc00g3Hj7l9rUi5dgmeeMaOny5aBj/6v\nSUTSiEZSHdh/mv6HTG6ZGLx2sNWliEgiLlwwj9Ojosx809y5ra4oaf7+ZtemqVPN1qCNG6d8e9DY\nWDMiHBlp5r0qo
IpIWlJIdWB5vPIwutloZu2exeYTm60uR0T+Yrebeab+/mZV/pIl5lG6M3jpJTN/\nNizMtFj67bfkXzt6tAnjs2ebFk0iImlJIdXBaRGViGM5c8aMnnbtakYj9+9/uL3frVSrlulfWqyY\n2ZkpOfNU1641vVqHDv27s4CISFpSSHVwWkQl4hjsdpg2zYye/vabmX86b56Zi+qMfH0hKMg8vu/d\n24ywHjmS+LmnTplQ3qwZDBuWvnWKiOtSSHUC1QtVp1+1fgwNGsrZq2etLkfE5Zw+bdor9eljVvHv\n32/6cjo7T0+zIn/SJNOrtUwZePJJszvU3e1cb90yW4d6eZnH/O7u1tYsIq5DIdVJ/KeJWUQ1cM1A\nq0sRcSlBQWav9gMHYOVK82jckRdIpZTNZnZYOnXKfLZz50y3grJlzYYAAwb8vbVpSvq3iog8KoVU\nJ5HXOy9jWozh2z3f8sO+H6wuRyTDs9vhv/81j7grVTK9QVu2tLqqtOPlZbYU3b4dfvnFzFv997/N\nKOvnn0PNmlZXKCKuRiHVifR8vCed/TvTd2lfwiPDrS5HJMO6csU84h40yHytWgX581tdVfqw2UxA\n/fZbOHkSVqww27qKiKQ3hVQnYrPZ+KrtV+TKkotu87txO+a21SWJZDgHDphRw9WrYcEC+OQT8HDR\nbU8KFoRWrbSDlIhYQyHVyeTKkou5z8zlt9O/MWLDCKvLEXF6kZFm69DJk+H116FGDbOt6PbtZpGU\niGE8hFAAACAASURBVIhYw0XHB5xb7aK1+ajxR3yw/gOalmxKkxJNrC5JxGlcv25WtK9eDb//bh5p\ng1m1XrasmZc5ejRky2ZpmSIiLk8h1UkNrjeYdWHr6LGgB7v77yZ/VheZMCfykG7cMKOlo0dDRAQ0\nbw7du5tFUZUqQfnypiWTiIg4BoVUJ+Vmc+ObDt/w+KTH6bW4F8u6LsOmiWMiCdy8aUZOR4+G8+eh\nZ0/44APn2cZURMRVaU6qE/PN7suM9jNYfng5n//6udXliDiMK1dg40YTTEuVgrffNu2jDh40u0Yp\noIqIOD6NpDq5J8s8ydu13mbQmkE0KdGEygUrW12SSLqy201fz19+MU3nd+yAQ4fM8SxZTCupoUPN\nbkoiIuI8NJKaAXzS9BPK5StHz0U91ZZKXIbdbvqX1q4NdeuaIBoebuaafv017NljRlRnzVJAFRFx\nRgqpGYCnhycznp7B3nN7+XjTx1aXI5Km7HZYu9YE01atTLuoVatMIP35Zxg3zqzQf+wx1+1vKiKS\nESikZhDVClXj/frvM2rTKHb+sdPqckTSxIYN0LChGS2NiYGVK02P0xYtTAspERHJOBRSM5AhDYbg\nn9+fnot6civmltXliKSamzehb19o3Nj0OV22DLZuNYuh1NRCRCRjUkjNQDK7Z2Zm+5kc+PMAHwV/\nZHU5IqkiPNw82v/2W5g6FbZtgzZtFE5FRDI6hdQMJsAngKENhvLJ5k/Yfma71eWIPJIVK6BqVbh0\nyazef+klhVMREVehkJoBvVfvPSoXrEzPRT2JvhNtdTkiKRYbCyNGmBHTOnVMa6mAAKurEhGR9KSQ\nmgFlcs/EzPYzORxxmBEbRlhdjkiy3bljWke1aQMffggffQRLlkDu3FZXJiIi6U0NWjKoxwo+xohG\nIxgaNJSWpVvSqHgjq0sSief2bRNId+z4+2vPHrNIKm9e01aqeXOrqxQREasopGZgg+oOYn3Yejp8\n34HNL26mYoGKVpckAsD27dC9u9kZys0NKlQwc0+7dTP/rVoVsma1ukoREbGSHvdnYB5uHszvMh+/\nnH60nt2a01GnrS5JXFxMDHzyidklKnt2CA42Tfh//93sDDVgANSvr4AqIiIKqRleziw5Wd5tOXbs\nPDnnSaKio6wuSVzUiRPQpAl88AEMHGh2h2rQALy9ra5MREQckUKqCyicozAruq/
geORxnpn3jBr9\nS7r7/nuoXBnCwiAoCD7+GDJntroqERFxZAqpLqJSgUosem4RG49vpM+SPtjtdqtLEhewbRs8+yw8\n95zZHWr3brOtqYiIyIMopLqQRsUbMbP9TL7Z8w1Dg4ZaXY44MLvdtIN6GNevw7RpUL061Kxpti+d\nORO++06tpEREJPm0ut/FPFfpOU5FnWLgmoGUzlOaXgG9rC5JHMi1azB7NowfD/v2QcmSUL58/K+S\nJc2K/NhYsxAqNtZ8XbxowujMmRAVBa1bw9Kl5r/u7lZ/MhERcTYKqS7o3drvsv/Cfl5f/jr1/OpR\nOk9pq0sSix05AhMnwtdfm9X2bdvCyy/DsWMQGgoLFpj5pA+aJZI/P7zyCvTrByVKpE/tIiKSMSmk\nuiCbzcbnrT4n+Hgwzy98nk0vbsLDTX8UXM2dO7ByJXz5JaxYYR7F9+9vvooXT3j+zZv/z959x9d8\nfw8cf90MISGJik1Ea8YWm9h7VsygNqnRitmqVUWHVb5m7b1qx6odUtSsGXvE3pLGltzfH+eHKlUh\nyefem/N8PD6Py03u/ZxLxrnvz3mfA6dOwfnz8nc7O1khfX6bKJFc3ndyis9XoZRSylZpZpJAJXNK\nxpy6cyg1vRQ/bP+BfmW0RjWhOHkSpk+Xy/JXr0rj/KlTZXNTkiT//rjEiSFPHjmUUkqpuKZJagJW\nPGNxvin1DQODB1I1S1UKpy9sdEgqjkRGwq+/yuX8kBBwd5eJT61bS5KqlFJKWRrd3Z/A9S/TnwJp\nC9BsWTPuP7lvdDgqDgQFyWanNm1kNXT+fFlBHTtWE1SllFKWS5PUBM7R3pE5dedwMfwiPTf0NDoc\nFYsePICOHaF2bShaVDZBbdggl/UTJzY6OqWUUurtNElVZPfIzojKI5iwdwKrT642OhwVC/bvBx8f\nmDFDdu2vXPnmzVBKKaWUpdIkVQHweaHPqZ61Om1WtuHm/ZtGh6OQdk+PH8fsMdHRMHQoFCsmm6D2\n7ZOWUCZT3MSolFJKxRXdOKUAaUs1tfZU8kzIw2fLPmNVk1XalsoA9+/Dpk2wZo0cFy+CmxukSQOp\nU8ttmjTw0Ufw9Kkksc+PR4+kp+mePdCzJwwaJG2hlFJKKWukWYh6IU3SNMyvN59qc6vx+arPmVxr\nMiZdgotzFy/CsmWwejVs3QpPnsAnn0DdupA/P9y6BdeuwfXrcnv0qEx3SpRIepL+/fjoI9i4EcqX\nN/pVKaWUUh9Gk1T1ioofV2Rq7am0WN4CTzdP+pfpb3RINslslhXTceOkXtTeHsqUgZ9+gurVIVs2\noyNUSimljKVJqnpN83zNuRh+kb5b+pLRNSOtCrQyOiSbEREhTfTHj5dL87lzS6LapAm4uhodnVJK\nKWU5NElVb/SN7zeEhYfRLqgd6ZKlo0qWKkaHZNWuXoUffpBJTw8fgp8f/PIL+PrqpiallFLqTTRJ\nVW9kMpkYV2McVyKvUG9RPba12kbBtNr5Pabu3JHd9v/7n/Qm7dYN2reH9OmNjkwppZSybNqCSv0r\nBzsHFtRbgHdKb6rPrc75e+eNDslqREbCkCEy6WnsWOjeHc6dg4EDNUFVSiml3oUmqeqtXBK5sKrJ\nKlwSuVB6emnmHZ5HtDna6LAsktkMJ0/CyJGyO/+776BFCzhzRtpBubkZHaFSSillPTRJVf8plUsq\nNjXfRMG0BWm6tCk+k3xYd3odZrPZ6NAM9fQp7N4tSamfn/QxzZ5depTWqCEJ6+jRcr9SSimlYkaT\nVPVOvNy9WN54OSGtQkiaKCnV5laj/Kzy7L682+jQ4tWzZ7BunezGd3eHokWhb1+4excCAuRjd+7A\ntGmQKZPR0SqllFLWSzdOqRgp6VmSbS23serkKnpv6k3RKUVpmKsh0+tMx9nR2ejw4syhQzBrFsyd\nKw31vb2hTx+oUAEKFNDJTkoppVRsi1GS+vD
hQ6ZOnUpwcDARERF4enri7+9P+RiOt5k6dSpz587F\ny8uLadOmxeixyngmk4la2WtRPWt15hyaQ8c1HWmxvAUL6y/EzmQ7i/MREZKYTpkCBw+Ch4esoDZv\nDgULausopZRSKi7FKEnt378/J06coH379mTIkIGNGzcyePBgzGYzFSpUeKfnOH36NIsWLSJ58uQ6\nctPK2dvZ0yJ/C1ydXPFb5MeALQMYVH6Q0WF9sNBQabA/c6b0NK1dWzZBVasGjo5GR6eUUkolDO+c\npO7atYt9+/bRt2/fFyun+fPn5/r160ycOJFy5cphZ/f2VbSoqCh++uknateuzenTp4mIiPiw6JVF\nqJuzLj9U+IHem3qTwyMHTfM2NTqkGHv2DFatknZRmzZBqlQQGCh1phkyGB2dUkoplfC887XZkJAQ\nnJ2dKVu27Cv3V6tWjdu3bxMaGvqfzzFv3jwiIyNp3bp1gt8Zbmu+KvkVzfM1p83KNuy8uNPocGIk\nOBjy5YO6deH+fZgzB8LCpG2UJqhKKaWUMd45ST137hyenp6vrZZmzpwZgPPnz7/18efPn2fOnDl0\n7dqVJEmSxDxSZdFMJhOTak6icPrCfLrwUy7cu2B0SP/p+nWpLy1bVnqY/vEH7NwJTZuCk5PR0Sml\nlFIJ2zsnqREREbi6ur52//P73nbpPioqiqFDh1K6dGmKFCnyHmEqa+Dk4MTShktxcXSh1vxa/PX4\nL6NDeqOoKBg/Xnqarl4tG6NCQkC/NJVSSinLES9bsRcvXsyVK1fo3LlzfJxOGSilS0qC/IO4EH6B\nJkubEBUdZXRIL/z1F2zeDMWKQadO0KABnDgBbdrAf5RTK6WUUiqevfPGKVdXV8LDw1+7//kK6ptW\nWQGuX7/O9OnTCQgIwN7ensjISEBWV6OiooiMjCRRokQkekujycDAQNzd3V+5z9/fH39//3cNX8Wj\nXKlysbD+QmrMq8GnCz9lnt88kjkli9cYjh6FvXvl9uhROHJE6kxB6k937IDixeM1JKWUUsrqzZ8/\nn/nz579y37179+LkXO+cpH788cds3ryZ6OjoV+pSz507B7ysTf2nq1ev8uTJE8aMGcOYMWNe+3jt\n2rWpV68enTp1+tdzjxo1ioIFC75rqMoCVM1SlVX+q2i8pDElppUgyD8IL3evOD+v2QyDB0P//vL3\nTJkgd25o3Bhy5ZIjf36wt4/zUJRSSimb86ZFwv379+Pj4xPr53rnJNXX15fVq1cTHBxMuXLlXty/\nbt06PDw8yJkz5xsflyVLFn7++edX7jObzYwbN44HDx7Qq1cvPDw83jN8ZcmqZa3GzjY7qTW/FkUm\nF2Fpo6WU8iwVZ+d78gTat5f+pgMHQteukCx+F3CVUkopFUveOUktUqQIPj4+jBo1igcPHpAuXTo2\nbdrE3r176dOnz4vG/EOHDmX9+vXMmzePVKlSkTRpUvLly/fa87m4uBAVFfXGjynb4Z3Smz/a/kH9\nRfWpMKsCk2pOokX+FrF+nrt3wc9PLuPPnSuToZRSSillvWI0ceq7775j6tSpTJ8+nYiICDJlykS/\nfv1eWVk1m80vjrcxmUw6cSqB8HD2YP1n6+m0uhMtV7Tk6M2j/FDhB+ztYuea+9mzUL063LoljfhL\nxd1irVJKKaXiiWnLli0W21X/5MmTBAQEsG/fPq1JtQFms5nRf4ym+/ruVP6kMnPqziGFc4oPes4d\nO6BOHUieXNpJZc0aS8EqpZRS6p08r0n95ZdfyJYtW6w9rzbeUfHGZDIRWCyQNU3WsPvybnwm+bD3\nyt73fr7ly6F8eciZU5rwa4KqlFJK2Q5NUlW8q5KlCvvb7yeVSypKTivJpH2TYjwmd/ZsqF8fateG\nDRsgxYctyCqllFLKwmiSqgyRyT0T21ttp02BNgSsCqD1ytY8ePrgnR47dqyMM23VCubP1xGmSiml\nlC3SJFU
ZxsnBifE1xjPr01ksPLKQElNLcObOmX/9fLMZhgyBL76A7t1h0iTtd6qUUkrZKk1SleE+\ny/cZu9ruIvJJJFXmVCHySeRrn2M2Q8+e0LevNOsfNgy0OYRSSilluzRJVRYhb+q8rG26lquRV+n2\nW7dXPhYVBe3awYgRMGYM9OmjCapSSill6zRJVRYja4qs/FzlZybvn8zKEysBiIiAunVhxgyZJNW5\ns7ExKqWUUip+xKiZv1KxzmyGsDA4cAD+/JN2SZOyL1lZ2q5sy4oqh2nTODWXL8PKldKwXymllFIJ\ngyap6sPcuyc9oJydwd1duuonTy5/TpIEHj2CO3fg9u2Xt7duwfHjLxJT7t2T5/LwwHT/Pr88fEh3\nD3tWzCxJ3sgZLNtZnOzeukNKKaWUSkg0SVXvJzwcRo2Cn3+WP7+JgwM8e/bmj338MRQoINv0CxSA\n/PkhXTrMDx6y4ouN3Fk6hhbnNtLzgS+UTSmzTl1dJRl2cZHjeWKcPTvkygUeHnH3epVSSikVrzRJ\nVTETHg6jR0ty+ugRdOgAgYGSkN69K6uiz2/DwyFZMum0/9FHL2+TJ39j76iHD6H9587MmVObr7+u\nTb8i7Ti1bjaLnT7loxMX4MYNuH9fjgcP5DYiAqKj5QlSpZJkNVcuyJMHGjSQcymllFLK6miSqt7O\nbJak89IlWLECRo6U5PTzz6FXL0ib9uXnpkv33qeJjISqVWH/fpg3D/z9IfLJzxS4tJXqSQ4RMioE\nB7s3fLk+eQKnTsHRo3IcOwabNsGECdKzKjBQDk1WlVJKKauiSap66fx5WLJE6kQvXYLLl+X24UP5\nuJOTJKdfffVqcvqB7t+HGjXg0CHYvBmKFZP7kyZKypy6cyg5rSSDggcxsNzA1x+cKNHL1dO/u35d\nmqkOGyZlCZqsKqWUUlZFk9SE7swZWLxYjr17IXFi8PGBjBmhUCHIkAHSp5fbbNnkcn0sevgQ6tSB\nfftg/fqXCepzRTMU5duy39JvSz8c7BzoW7ovpndpkpo6NQwfLqupf09Wu3SBSpWkvvXvh6NjrL4u\npZRSSn0YTVITosePYdw4mD1bVk2TJJGlzB49pM9TsmTxEsajR9IDdedOWLsWSpR48+f18e2DCRN9\nt/Tl8l+XGVd9HPZ277jb/5/J6vDhMGjQ65+XJAlkyQLly0OFClCmjCSvSimllDKEJqkJzdq18OWX\ncmnfz0/GN1WrJrvl49GTJ1C/PgQHw+rVULr0v3+uyWSiT+k+pEuWjnZB7bgWeY159ebh7Oj87id8\nnqz27w9Xr8qGq/BwuX3+50OHYNky2Rhmbw+FC0vCWqECFC8uq8xKKaWUiheapCYU585JTebKlbJa\nuGIFeHsbEsrTp9CoEWzc+DKcd9GqQCtSJ01Ng18bUHFWRYL8g0jhnCJmJ39+ef/fmM1SArFpkxwT\nJ8KQIZKglioFFStK0lqgwBs7FCillFIqduhYVFv38CF8+60kpPv3w6JFkh0alKDevy8J6urVsHQp\nVK4cs8dXz1qdLS22cOrOKUpOK8n5e+djN0CTSS77BwTIv9WNGzJ0YPBgqVsdNEhWWFOmlBfyxx+x\ne36llFJKAZqk2q5nz2Tgvbc3fP89dO0KoaHSO/RdNh7FgePHoUgR2SC1ZMn7jzktkr4IO9vs5Gn0\nU0pMLcHRG0djN9C/s7OTQQPdu8OaNTI1a/t22YB16JDs9KpSBX7/Pe5iUEoppRIgTVJtzfPkNEcO\naNVKLksfOSKJatKkhoW1cKEsQJrNsGcP1Kr1Yc+X5aMs7Gi9g1QuqSgzowz7ruyLnUD/S6JEctl/\nwADpy7pokdS4lioldQtbt8qLVEoppdQH0STVVvwzOc2XT3buL10qraMM8uSJ7NNq3Bhq14bduyFn\nzth57tRJU7OlxRayfJSF8rPK83tYPK9m2tnJyvTzf+e7d6FcOdkFNmmS9
JlVSiml1HvRJNUWBAVJ\n5teqFeTNKzWUS5ZIomqgsDDJ1yZOlI5Xc+bE/mJu8iTJ2fDZBgqkKUDlOZXZeHZj7J7gXdjZSS+t\n/ftlJ5i9vYyLzZBBVrL79YNduyAqKv5jU0oppayUJqnW7OpVWcmrXRs++USS06VLpYbSYMuWQcGC\nEmJICHTsGHelsMmckrGm6RpKZypNjXk1CDoRFDcn+i8mk9QxbN0KN2/KfNdcuWD8eGlhlTatbLx6\n/NiY+JRSSikrokmqNYqOluXJnDlh2zaYP1/6n1pAchoZCW3bSgtWX19ZXCxSJO7P6+zozPJGy6mZ\nrSZ+i/xYeGRh3J/0bT76CPz9Zfn4xg3J1P394bvvZIV761Zj41NKKaUsnCap1uboUcn+OnSQVdTQ\nUCn4NGjH/t/t2iV58oIFMGWKLOqmiGEb0w/h5ODEwvoLaZy7MU2WNmH6genxd/K3sbeHkiVlSMCf\nf4KHh9SutmwJt24ZHZ1SSillkTRJtRbPnskqXIECcPu2jGqaPFlW7CwgtIEDZYO7h4fkYW3aGJM3\nO9g5MPPTmbQr2I7WK1sz5o8x8R/E2+TKJavfkydL/Wr27DB9unYEUEoppf5Bk1RrcOqUZIADB0Kv\nXnDw4NvniMajK1cktEGDZH9QSIj0wjeSncmOCTUm0L14d75c9yU/hvxobED/ZGcnNRHHj0ONGtC6\ntayO795tdGRKKaWUxdAk1ZKZzfDLL3IN/dYtaRg/eDA4ORkdGQD37kHVqnDpkiSnAwaAg4UM2jWZ\nTAyrNIwBZQbQe1Nv+m7ui9nSVitTpYJZs2T86l9/QdGi0LSptEVQSimlEjhNUi3V9euya//zz6FZ\nM7mGXqyY0VG98OgRfPqpJKjr11tUaC+YTCa+LfstwyoNY8j2IXT9ravlJaogQwD275cSgE2bpASg\nTx9JXJVSSqkESpNUS/Tbb5A7t1z+DQqS1VQDp0X9U1QUfPaZjK0PCpLJq5asR4kejK8+ntF/jKZ9\nUHuioi2wX6m9vZQAnDoFPXrAyJGQNStMm6b1qkoppRIkTVItze+/Q506UKgQHD4MNWsaHdErzGYI\nDJSd+/Pny6Z1a9ChcAdm1JnBtD+nUXhyYRYfW2yZyWqyZFLge/IkVKwoO9Bq1pSGs0oppVQCokmq\nJTl1ShLUokVh+XKpWbQwP/0EY8fKBKlPPzU6mphpkb8FW1ts5aMkH9Hg1wbkGp+LGX/O4GnUU6ND\ne13GjNJjddUq2LcP8uSBxYuNjkoppZSKN5qkWoqbN6FaNenhtGyZxWyO+rsZM6B3b+jfX0plrZFv\nJl82Nt/Irja7yOGRg1YrWpFlTBbG7h7Lw6cPjQ7vdTVqwJEjULas9MVt1kx2rCmllFI2TpNUS/Dw\noWyS+usvmRxlAb1P/+nXX6Vksm1b+PZbo6P5cEUzFGV54+Uc7nAYX09fuqzrgvd4bzae3Wh0aK/z\n8JD/gFmzpAg4Tx7ZYKWUUkrZME1SjRYdLbuQDh6US7uZMxsd0SuePIGuXaFhQ1nImzDBIoZbxZrc\nqXIzx28OxzsdJ7N7ZirNrkS7le0IfxRudGivMpnk6+TwYdlQVbEifPUVPLXAUgWllFIqFmiSarSe\nPV/uQipc2OhoXnHxolxlHjtWJnrOm2c5fVBjW9YUWdnYfCMTa0xk4dGFeI/3ZtXJVUaH9TpPT9i4\nEYYOlQ4ApUrB2bNGR6WUUkrFOk1SjTR2rCQao0fLhikLsm6dTGC9dAm2b4cvv7StFdQ3sTPZEVAo\ngKMdj5IvdT5qza9Fs6XNuPXgltGhvcrOTt7c/P67DHkoUAAWLDA6KqWUUipWaZJqlJAQ6eUUGAhf\nfGF0NC9ERcl40+rVoUgROHDAMhv1x6WMbhlZ3WQ1Mz+dyZpTa8g5LidT9k8h2hxtdGivKlJEhgDU\nqAH+/tKu6v59o6NSSimlYoUmqUa4d
UuSihIlYNgwo6N54elT8POD77+X6aurVkGKFEZHZQyTyUTz\nfM051ukYVbNUpV1QO4pNKcaey3uMDu1Vbm4wd640/V+wAAoWlFrVX36BDRvgzBmtW1VKKWWVNEmN\nb9HR0KKFzBW1oCJPsxkCAmDNGtlA/s03clU5oUuTNA2z685mW8ttPI56TNEpRWkf1N6ySgBMJmjV\nSvqpZssmnQA6dYLKlSFLFkiSRDbkNW4M06fD5ctGR6yUUkr9J8vIkBKS4cMlE1y7FjJkMDqaF/r2\nlfxlzhy51K9e5ZvJl33t9zFx70T6bu7L4mOL+b7C9wT4BGCylGLdHDnkHQbI6unFi7Kp6uxZGRQR\nHCwlAWYz5MoFVapIIuvrC87OxsaulFJK/YMmqfFpxw5Zovz6a6ha1ehoXhgzRi7xDx8OTZsaHY3l\ncrBzoHORzjTM1ZDeG3vTYXUHQsJCmFJ7CokdEhsd3qscHeHjj+X4u9u3pTvAb79JecDIkbKaX6CA\nlJ88PyzoDZRSSqmESS/oxpfbt6FRI9mFNGiQ0dG8sGgRdOkC3bpB9+5GR2MdUrmkYmqdqSysv5Al\noUsoO6Ms1yKvGR3Wu0mRQr4Op02T1g1HjsD//gfZs8sqbKNGMpLV01MmN5w5Y3TESimlEihNUuPD\n8zrUhw+lH6qF1KFu2SL94f39LWr/ltVomKsh21puIyw8jCKTi3Dg6gGjQ4oZk0ku+3foALNnS0J6\n9ar07W3YUMpScuSQj2sdq1JKqXimSWp8GDkSVq+WsZYZMxodDSADrj79FMqUkVpU3ST1fgqnL8ye\ndntI5ZKKUtNLsTR0qdEhfZg0aaBuXan9OH1a6kAWLZINWD16SGcKpZRSKh5oahLXQkKgd2/o1cti\ndiRdvgzVqsl0zSVLIFEioyOybuld07Ot1TZqZK1BvUX1GLJtCGaz2eiwPpyzswwNOHtW2lpNmiRd\nAr79VublKqWUUnFIk9S4dP26XDYtXlwaj1qAhw9lBdXeXvqgJktmdES2wdnRmQX1FzCgzAD6bulL\n5TmVOX/vvNFhxQ43N0lMz56Fzz+X1dWyZaWmVSmllIojmqTGlWfPpC9ldDQsXCi7rQ1mNkPr1nD0\nKKxcKVd2VeyxM9nxbdlvWdd0HSdunSD3+NyM3T3W8iZVvS8PDyle3rZN2lsVLCiFzUoppVQc0CQ1\nrvTpI0PvFy2CtGmNjgaQBbAFC2DmTOk4pOJGlSxVONLxCM3zNeeLtV9QdkZZTt4+aXRYsadYMRnH\nmjcvVKwIP/0k74CUUkqpWKRJalxYtgyGDpVf3qVLGx0NICH17QsDBkCDBkZHY/tcnVwZX2M8W1ps\n4cpfV8g3MR/DdwznWfQzo0OLHSlTSq/Vr7+Ww88PwsONjkoppZQN0SQ1tp06BS1byi/tbt2MjgaA\nQ4ek1VT9+tC/v9HRJCxlvcpyqMMhOhbqSK8Nvai/qD5Pomxk05G9PQwZAitWyGX/QoVg8mS4edPo\nyJRSStkATVJj0/37UK+eFHtOny59KA124wbUri07+WfM0FZTRnB2dGZElREE+Qex5tQa/Jf48zTq\nqdFhxZ7atWHvXvDyko1VadJAuXIwdqz2V1VKKfXeNGWJLWaz/II+c0b6Orm6Gh0Rjx5JzvzokSx2\nubgYHVHCViNbDZY0XELQiSCaLm1qO5f+QfqobtggwwAmTgQnJ+jaVcarligBP/4ozXm1dlUppdQ7\n0iQ1tkycCHPmSC/J3LmNjoaoKGjaVBa4li2TKZfKeLWy1+LXBr+y7Pgymi1tZluJKkCqVNCuHaxb\nJ8v4M2fKfYMHQ/78krS2aQOLF2sNq1JKqbfSJDU2/PEHdOkCnTpJZmiw54u6K1ZIc4HixY2OSP1d\nnRx1WFh/IYuPLabF8hZERUcZHVLcSJ4cmjeH5cvh9m1ZaW3cGHbulN17KVJIqcD+/UZHqpRSnc5m\nR
AAAIABJREFUygJpkvqhbt6UHUmFCsn4UwvwzTcwZQpMmwa1ahkdjXoTv5x+LKi/gIVHFtJyRUvb\nTVSfc3KSdlUjRsCxY3D+PPzvf3D8OPj4yEbDQ4eMjlIppZQF0ST1Q0RFgb8/PH4sS5YWMF90xAgp\n/xsxQhaxlOWq712fuX5zmXd4Hs2WNePRs0dGhxR/MmWCjh0lYZ0xQ+pV8+WDRo3kPqWUUgmeg9EB\nWLV+/aT1zoYNUmtnsJkzoUcP6N3bYrpfqf/QKHcj7Ex2fLbsM87fO8/yRstJnTS10WHFHwcHaNEC\nmjSRL+BBg6Smu2JF6RLg7v7qkTq19B7Web5KKWXzNEl9XytWwA8/SMP+8uWNjoaVK2U/Srt20rpS\nWY8GuRqQyT0TdRbUofDkwgT5B5EvTT6jw4pfjo7Qtq009J02DdasgXPn4N69l0dkpHxuokTS4qpW\nLTl0V6BSStkkvdz/Pk6dkmvpdetCz55GR8O2bXKVtE4dmDDBItqzqhgqkr4Ie9rtIaVLSkpOK8ny\n48uNDskYTk7QoQMEBclY4cOH4eJF+OsvePJEvveGDoVnzyAwUMoG8uWTKRV37hgdvVJKqVikSWpM\n3b8vmzxSp7aIhv1//imLSSVKwLx5MgRIWacMrhnY1nIb1bJWw2+hHz+G/IhZ+4q+5Ogo/Vi7dIGN\nG+HWLVi4EPLmhdGjoUAB6RyglFLKJmiSGlN9+sDZs7B0Kbi5GRrKmTNQtapMk1q+XBahlHVzSeTC\nwvoL6Vu6L7039abF8hYJa0NVTLi5QcOGMHs2HDkideGlS8OwYRAdbXR0SimlPpAmqTFx5gyMHy+J\nqsEN+69ehUqV5Pf02rW6j8SW2Jns+K7cd8zzm8eio4soP7M81yOvGx2WZcuYEbZuhe7doVcv6b96\n65bRUSmllPoAmqTGRJ8+kDKl1MIZ6O5dqFJFSvTWr5eQlO3xz+PPtlbbOHfvHEWmFOHgtYNGh2TZ\nHB2l/9qaNTJgo0ABCAkxOiqllFLvSZPUd7Vnj9S/DRoEzs6GhfHggdSgXr4sCWqmTIaFouLB8w1V\nKZKkSNgbqmKiWjUp1s6cGcqWhZo1YcAAaYFx+bKMZFNKKWXxNEl9F2az7OLPlUt6Ohrk6VMpwTtw\nAFavBm9vw0JR8SiDawa2t9pO1SxVdUPVu0qfHjZvftkJYPx4aX+RIQOkTQvVq0srjMePjY5UKaXU\nv9Ak9V2sWQPBwdIT1aDt82az9EBdv172bBUrZkgYyiAuiVxY1GDRKxuqHj/TBOutHBxkqsW6dXDj\nBoSFwbJl8o0UHQ2dO0O2bDB1qrwDVEopZVE0Sf0vUVHw1Vdy2bB6dcPC6NNHBvLMmCH1qCrh+fuG\nqoVHF1J9XnUiHkcYHZZ1MJlkc9Wnn0rJzrp1cPSovNtr21YuS8ydK9/vSimlLIImqf9l5kz5ZTZ0\nqGE9UceOleFWI0bI9EiVsPnn8Wd9s/Xsu7KP0tNLc+WvK0aHZJ1y5JA68z//lCS1WTPpubp8udat\nKqWUBdAk9W0ePJBJNo0aQeHChoSwZAl8+aVctezWzZAQlAUq41WGkNYh3H54mxJTS3D81nGjQ7Je\n+fLJmOM//pBa1rp1pQHxyZNGR6aUUgmaJqlvM3q01LINGWLI6bdtg6ZNoXFj6U+u1N/lTpWbHa13\nkDRRUkpOK8mOizuMDsm6FSkiRd8rV8r41Tx5pM7mwQOjI1NKqQRJk9R/c+uW9Fzs0AE++STeT3/k\niPQjL1lSpq/a6f+UeoOMbhnZ3mo7uVPlpsKsCqw8sdLokKxfrVpS4vP111JjkzOnlgAopZQBNPX5\nN89XT/v2jfdTX7woVxu9vGQzso47VW+TPElyfmv2GzWy1qDuwrrUnl+bRUcX8fDpQ6NDs15JksDA\ngfJuMVeulyUAK1bIFA2llFJxTpPUN7l5E375RYpA43mc05MnsgH
Z0VHGnbq6xuvplZVK7JCYhfUX\nMrbaWG7cv0GjxY1IPTw1rVa0YuPZjURF667195IlizQlXr4crl+Xb860aeUKS0iItLJSSikVJzRJ\nfZMxY+T6eufO8X7qfv3g8GHZMJU2bbyfXlkxezt7OhTuwK62uzjZ+STdi3cnJCyESrMrkfHnjHyz\n6RvCwsOMDtP6mEwyCODPP+Wbs317SVx9faUUqE8fSVh1hVUppWKVJqn/9Ndf0vOpfXtIkSJeT71l\ni2yQGjwYChaM11MrG5M1RVYGlB3Ayc4n+aPtH/jl9GPcnnFkHp2ZTxd8yvoz64k26ypgjOXOLf3g\nzp+HrVuhUiWZXOXrC+7u8vchQ+D33zVpVUqpD6RJ6j9NngyRkfHe7+nuXWjeHMqUge7d4/XUyoaZ\nTCaKpC/C2OpjudztMuOrj+fs3bNUmVOFHGNzMGrXKO4+vGt0mNbHzk6+WSdNkvKgvXtlSEDixNJT\nuVQpSJ4cevfWaVZKKfWeNEn9u8ePZTdvs2Yy4zuemM3w+eeSG8+aZdjkVWXjkiZKSkChAA5+fpDt\nrbbjk86HXht6kWZEGvwW+rE0dCmPnj0yOkzrY28PPj7y7jIoCG7fhj17IDAQhg+XVdZz54yOUiml\nrI4mqX83dy5cvQo9e8braWfPhkWLZK9WxozxemqVAJlMJkp5lmJ+vflc7HqRnyr+RFh4GPUW1SPN\n8DS0XdmWree3ajnA+3JwgEKF5LJ/SIj0Ws6fHxYsMDoypZSyKpqkPhcVJZfpPv1U+iLGk7NnoVMn\nudTfsGG8nVYpAFInTU1gsUD2tt9LaKdQvijyBZvObaLczHJkG5ONeYfnabL6IYoWhQMHoEYN8PeH\nNm3g/n2jo1JKKaugSepzK1bAiRPw1Vfxdspnz6SyIGVKaSiglJFyeORgUPlBnP3yLCGtQsiVKhdN\nlzbFZ5IP606vw6zN7N+Pm5tcpZk+XVZTfXxg506jo1JKKYunSSpIUeiPP0K5crLyEU++/17Ghc+Z\no/1QleUwmUyU9CzJisYrCGkVQtJESak2txrlZ5Vn9+XdRodnnUwmaNkS9u8HZ2coUQKKFZPk9fFj\no6NTSimLpEkqSO+nPXtkDGI8OXAAvvtOWiyWKBFvp1UqRkp6lmRby20E+Qdx68Etik4pSsNfG3Ln\n4R2jQ7NO2bPLz5oVK+SdabNm4OkJ/fvD5ctGR6eUUhbFIaYPePjwIVOnTiU4OJiIiAg8PT3x9/en\nfPnyb31ccHAwW7Zs4cSJE9y9e5fkyZOTO3duWrZsSfr06d/7BcSKH3+EAgWkx2E8ePoUWrcGb29D\npq4qFSMmk4ma2WpSLUs15h6eS7ffulF0SlFW+a8iu0d2o8OzPvb2ULu2HMePS1/mn3+W/qt+fjJE\npFQpWX1VSqkELMYrqf3792f9+vW0aNGCn376iezZszN48GA2bdr01sctXLiQp0+f0rx5c4YOHUrr\n1q05deoU7du35/z58+8b/4fbtw82bJBV1Hj6pTBsmAyumT4dEiWKl1Mq9cHs7expnq85u9vtJpF9\nIopOKcqGMxuMDsu65cghSerlyzBypEy1Kl1augFMnqybrJRSCVqMktRdu3axb98+AgMDqVmzJvnz\n56dHjx74+PgwceJEot8yx/r7779nyJAhVKtWjbx581KpUiVGjBjB06dPWbx48Qe/kPf2008y2rBe\nvXg5XWgoDBwIPXrI/gmlrM3HyT9mZ5udlMhYgmpzqzFu9zijQ7J+rq7wxRfyA2L9evDygoAA6dfc\nvTucOWN0hEopFe9ilKSGhITg7OxM2bJlX7m/WrVq3L59m9DQ0H99rLu7+2v3pUiRAg8PD27evBmT\nMGLPlSuwdKlMl4qHDvpRUdKBxssLBgyI89MpFWdcnVwJ8g/iy6Jf0nltZzqu7sjTKJ2s9MHs7KTs\naMUKSUzbt4cZM6SW9auv4ME
DoyNUSql4E6Mk9dy5c3h6emJn9+rDMmfODBDjy/ZXrlzh+vXreHl5\nxehxsWbmTHByks0L8WDMGNi1C6ZOhSRJ4uWUSsUZezt7RlYZyZRaU5i8fzLV5lbjxK0TRodlOzJn\nlis9ly7J5ZfRoyFvXti82ejIlFIqXsQoSY2IiMD1Db2Snt8XERHxzs8VFRXF0KFDcXZ2pn79+jEJ\nI3ZER0u22LBhvPR/OntWdvJ36iR7IpSyFW0KtmHjZxs5cuMIOcbloOKsiiwLXcaz6GdGh2YbkiSR\nHx4HD0K6dFChArRtC3fvGh2ZUkrFKUNaUEVHRzN06FCOHj1K7969SZkyZfwHERwsl9Pato3zU5nN\n0K6dNO3/4Yc4P51S8a6MVxnOB55ndt3Z3H96H79Ffnw8+mOGbBvC9cjrRodnG7Jnh61bYcIEmaPs\n7Q1LlsgPGKWUskExakHl6upKeHj4a/c/X0F90yrrP5nNZoYPH87GjRvp3bs3Jd6hSWhgYOBrNa3+\n/v74+/u/Y+RvMGWK7KyNhyalU6bIFbr16yFp0jg/nVKGSOyQmGZ5m9EsbzP2X93P+D3jGbJ9CAOD\nB1Lfuz6dCneiRMYSmLS10vuzs4PPP4dataBjR6hfX2pYR46E3LmNjk4plQDMnz+f+fPnv3LfvXv3\n4uRcMUpSP/74YzZv3kx0dPQrdannzp0DXtam/huz2cywYcP47bff6NmzJxUrVnyn844aNYqCBQvG\nJNS3u3NHViAGD47ztlPnz8tO/tat460Nq1KGK5i2IFNqT2FYpWFM/3M6E/ZOoNT0UuRLnY9OhTvR\nJE8TXBK5GB2m9UqfHpYvh5Ur5QdMvnxyuea77yBVKqOjU0rZsDctEu7fvx+fOGhZFKPL/b6+vjx8\n+JDg4OBX7l+3bh0eHh7kzJnzXx/7fAX1t99+o1u3blStWvX9Io4Nc+fKVvvmzeP0NE+fQpMmkDw5\njBgRp6dSyiIlT5KcbsW7caLzCdY1XYenmycBqwJIPzI9gesCOXNHWyu9N5MJ6tSBo0dh+HBYuBCy\nZoWhQ+HRI6OjU0qpDxajJLVIkSL4+PgwatQoVq9ezYEDBxg+fDh79+4lICDgxWW8oUOHUrFiRW7c\nuPHisWPGjGHt2rVUrVqVzJkzc+zYsRfHqVOnYvdVvY3ZLE2y69SJ8xWHb7+F3bth/nx4QwcupRIM\nO5MdVbJUYaX/Ss52OUuHQh2Ye3guOcbloOPqjlyLvGZ0iNYrUSLo2hVOn4YWLeCbb6RedfVqoyNT\nSqkPEuOxqN999x1Tp05l+vTpREREkClTJvr160e5cuVefI7ZbH5xPLdz505MJhNr165l7dq1rzxn\nmjRpmDdv3ge8jBjYu1fGPQ0dGqen2bxZNkkNGQLFi8fpqZSyKl7uXvxQ8Qf6l+nP2N1j+T7ke2Ye\nnEm3Yt3oWbInrk5x323DJqVIAf/7n9SqBgZCzZrQsqWMXNV3yUopK2TasmWLxW4NPXnyJAEBAezb\nty/2alIDAmDtWjh3Ls4a+N+8KSVi3t6yWcrOkB4KSlmHuw/v8tPvPzH6j9G4OLrQx7cPHQt3xMnB\nyejQrJfZLEMAAgMhWTLZvWlkiZVSyqY9r0n95ZdfyJYtW6w9b8JKnyIjYd482cUURwlqdLQsXjx7\nBrNna4Kq1H9JniQ5P1b8kdNfnMYvpx89N/Qk9fDUlJlRhi/WfMHkfZPZdWkXkU8ijQ7VephM0KoV\nHDkCuXJBtWrSbu8N3VmUUspSxfhyv1X79Ve4f19+eMeR0aNhzRo50qaNs9MoZXPSu6ZnUq1JdC/e\nnV+P/crhG4fZeG4j4/eOJ9ocDUD2FNmp712fZnmbkcMjh8ERW4GMGWHdOhlc0q2bXNqZNg3esbOK\nUkoZKWElqVOmQOXKkClTnDz9vn0yXrtbN1m4UErFXHaP7PQt3ffF3x8+fUjorVAOXT9ESFgIY
3eP\nZcj2IRRMW5BmeZrROHdj0ibTd4T/ymSSVdRKlaBNG7nt0kWK5nU+s1LKgiWci9HHjsGOHXE2Yeqv\nv6BxYxmtrVOllIo9SRyTUDBtQVrmb8mU2lO41uMaSxouwcvdi683fU2GnzNQeXZlDlw9YHSoli1T\nJllJHTUKJk6EQoXggP6bKaUsV8JJUqdOBQ8PqF071p/abJYy12vXpN1UokSxfgql1P9L7JAYv5x+\nLGm4hGvdr/FLzV+4FnmNolOKMmLHiBelAeoN7OxkFXXfPvlBVbQo/Pij9I1WSikLkzCS1CdPYNYs\n6SEYBxnk8OGweLGcImvWWH96pdS/SJ4kOW0LtmVPuz10KdqFHht6UHVOVa7+ddXo0Cxbrlywa5fU\nJn3zDZQrJx1PlFLKgiSMJDU4GG7dgs8+i/Wn3rwZvv5ajrp1Y/3plVLvwMnBiWGVh7G+2XqO3DhC\nngl5WHlipdFhWTYnJ1lF3boVwsIgRw5pTaIlAEopC5EwktSgIPD0lILRWBQWBo0aQfnyMHhwrD61\nUuo9VPqkEoc6HKKkZ0nqLKhDh1UduPXgltFhWbbSpeHQIfkhtnkzFCwIZcrAsmVaBqCUMpTtJ6lm\nM6xaJdNX/n9sa2x49Ajq1QMXF6lDjaO2q0qpGPJw9mB5o+VMrDGRmQdnknJYSrzHeRMQFMCcQ3O4\ncO+C0SFaHldX6NkTzp6VVn3R0eDnB1myyHS+0FD5WaqUUvHI9pPU0FCptapZM1af9osvZLrq0qWy\nH0spZTlMJhMBhQI48+UZZtedja+nL9vDtvPZss/wGu2F58+efLHmC01Y/8nBAerXh+3bZYS0ry/0\n6yfj8zJkkLr+2bPhyhWjI1VKJQC23yc1KAicnWVjQCyZPFlark6fLlfGlFKWKW2ytDTL24xmeZsB\ncOvBLX4P+53gC8HMOjiLifsm0jRPU74q+RU5U+Y0OFoL4+Mju0EnTICQENi4UY5Zs+TjOXLI/Ods\n2SB7djmyZZNVWaWUigW2n6SuWiXNqxMnjpWn27MHOneGDh1kj4FSynp4OHtQJ0cd6uSow6Byg5i8\nfzLDdwxn1sFZ+OX0o3ep3vik8zE6TMvi4gJVqsgBcPOm1K5u3SpXqoKDpf/ec2nSSJ3rp59C9erg\n5mZI2Eop62fbl/tv35YG/rF0qf/RI2kQkDev9MNWSlkvl0QuBBYL5MyXZ5hUaxJ/XvuTQpMLUWNe\nDU7cOmF0eJYrZUrZMTphgiSqV69CeLi8g58zR5pGnzkDTZrI51atKsMDrmpbMKVUzNh2krp2rWwA\nqFEjVp7uu+9kX8GMGdqwXylb4eTgRNuCbTne+Tjz/OYRejOUPBPy8PXGr4l8Eml0eNbB1VUmWDVt\nCkOGSD3rhQswYoT0qe7cGdKlgxIl5B3+pUtGR6yUsgK2naSuWiU/ONN++FzvAwdkk2u/ftIHWyll\nWxzsHPDP48/RjkfpW7ovo/8YTc5xOVl0dBFm3dkec56essN082a4fl3e3adIAb16QcaMUKoUjB4N\nly8bHalSykLZbpL69CmsWwe1asXKU7VuLcnp11/HQmxKKYuVxDEJ/cv0J7RTKIXSFaLR4kZUmFWB\nozeOGh2a9UqRQjoDBAXBjRswcyYkTy5trzJkkBqqypWlRODLL+Wy1fjxsHy5lG0ppRIk2904FRIi\ndVKxUI86bJi0m/rjD3B0jIXYlFIWz8vdi2WNlrHu9Dq+XPsleSbkoYxXGZrlaUY973q4J3Y3OkTr\n5O4OzZvLce8erFghI1pv3ZK61cOH5c+3bsGzZ9LfulAh2bhVuTIUK6Y/iJVKIGx3JTUoSGqgChT4\noKcJDYWBA6FHD+nIopRKWKpmqcrhDoeZVmcaDnYOtAtqR5rhaai3qB7LQpfx+Nljo0O0Xu7ussI6\nYYIMEdiyRZLUq1elljUsDKZOhY8/lpXV0qWlMbWfH/z+u
9HRK6XimO0mqbEwZSoqCtq0AS8vGDAg\n9kJTSlkXJwcnWuZvyYbPNnCp2yW+r/A95++dx2+RH2lGpKHp0qYsOLKAuw/vGh2q7TCZpHa1VStY\nsEDKBHbvlprWM2ekprV+fTh92uhIlVJxxDaT1JMn4dSpD77UP24c7NwpjfuTJIml2JRSVi1dsnR0\nK96Nfe33cazjMb4s8iWhN0PxX+JPymEpKTujLCN2jNA2VrHN3h4KF4Y+fWQn66xZUoPl7Q2BgVq7\nqpQNss0kNShImvdXqPDeT3HuHPTuDZ06yWRApZT6p5wpczKw3ED2B+znYteLjKs+jmROyei7pS85\nxuWg7Iyy7Lq0y+gwbY+dnTStPnlS6rGmTYMsWaTlVUSE0dEppWKJbSapq1ZJgurs/F4PN5shIEBK\nn374IZZjU0rZpAyuGQgoFECQfxC3e91mcYPF3Hl4h+JTi+O30I/jt44bHaLtSZJEVhNOnwZ/f/jq\nK+kaULAgdOkida46REApq2V7Serdu7B9+wdd6l+4EDZskFr+ZMliMTalVILg7OhMPe96HAg4wKxP\nZ7H/6n5yjc9F25VtuRShjexjXapUsrHq9GmYNElaWq1eDQ0bygbaLFmgWzfZiKWUshq2l6T+9pvs\neHrPJDU8HLp2hXr1ZOy0Ukq9L3s7ez7L9xknOp9gZOWRrDixgqxjstJpdSd2XdqlQwJim5eX7Had\nMUMS1suXYdEiGc06YwZ88om0vjp82OBAlVLvwvaS1FWrIH9+aRD9Hvr2hchImdynlFKxwcnBiS7F\nunDmyzN8VfIrlp9YTvGpxck6JisDtgzg5O2TRodom9KlgwYNYOxYWUUdPhy2bpWV1urVIThY6ruU\nUhbJtpLUZ89gzZr3XkXdu1euGH333XvnuEop9a9cnVz5tuy3hAWGsan5JkpnKs2oP0aRfWx2ikwu\nwuhdo7keed3oMG1T0qRSp3rmDMyeDZcuQdmyMhxg+XKIjjY6QqXUP9hWkrpzp9Skvsco1Kgo+Pxz\nyJNHxk0rpVRcsbezp3zm8kyrM41r3a+xqP4i0iVLR88NPUk/Mj1V51Rl9sHZRD6JNDpU2+PoCM2a\nwcGDsqiRODHUrSs//GfPljnYSimLYFtJ6ubNsrPzPUZDTZwI+/fLrYPtDotVSlmYJI5JaJCrAcsb\nL+daj2uMqz6OB08f0Hx5c1IPT02TJU1Ye2ot0WZd6YtVJhNUqyaX/ENCIHNmqVfNlk0uqT18aHSE\nSiV4tpWkBgfL2Dx7+xg97OpV+OYbaN9ervwopZQRPkryEQGFAtjWahvnupyjr29fDl4/SPV51ck7\nIS9zD83lWfQzo8O0PSVLyn6GP/+UXwJffCGjWH/5RcrIlFKGsJ0k9dEjudxfpkyMH9qtGzg5aU9U\npZTl8HL3ordvb450OML2VtvJ5J6JZsuakX1sdibtm8TjZ4+NDtH25MsH8+fDiRNQqZLUgOXNKwms\nbrBSKt7ZTpK6e7ckqmXLxuhhGzbIWOgRI6RSQCmlLInJZKKUZylWN1nN/vb7KZSuEJ+v+pyP//cx\nI3eOJPxRuNEh2p4sWWTs6r59kDat7HMoX17+rpSKN7aTpAYHg5ubvOt9R0+eyNjTsmWljl4ppSxZ\ngbQFWFh/IaGdQqn6SVW+2vgV6Uemp8OqDhy+rr0/Y13BgrBxowwGuHEDChWCpk1h3Tq4f9/o6JSy\nebaTpG7dGuN61PHjpRvJmDFSQ6+UUtYgu0d2ptaZyoXAC/Qs0ZMVJ1aQd2JeSk8vzYIjC3gS9cTo\nEG2HySQ9VQ8elGlWISGy4Sp5cvmdM3CgTDl8ov/mSsU220hSHz+GHTtidKn/zh3ph9quHeTOHXeh\nKaVUXEmXLB0Dyg7gQuAFFtVfhL2dPf5L/PH82ZPB2wZz79E9o0O0HQ4O8gvj/HkIDYWffwYPD5n8\nUrq0JK3Vq8Po0XD8u
NawKhULbCNJ3bNH6lFjsGlq0CBphzdwYBzGpZRS8cDR3pEGuRqwpcUWjnQ4\nQt0cdRm8bTCeP3vy9cavdUBAbDKZIEcOqRVbuhRu3ZJJMAMGyGpqr16QM6eMaG3XDn79FSIijI5a\nKatkG0nq1q3g6irjUN/BqVMyJe+bbyB16rgNTSml4lOuVLmYUHMC5wPP06FQB8bvGY/XaC++WPMF\nF+5dMDo822NvL725e/WS+tU7d2RIQN26UhrQsCF4ekoSe+eO0dEqZVVsJ0mNQT3qV1/JSOfAwLgN\nSymljJImaRp+qvQTFwIv0Me3D/OPzCfLmCy0WtGK47eOGx2e7XJxkZrVUaOkLODcOWjdGoYNk9XV\nvn3h9m2jo1TKKlh/kvrkidSjvuOl/uBgWLZMeqImSRLHsSmllMGSJ0lO39J9uRB4gaEVh7L+zHq8\nx3nT4NcG7L+63+jwbJ+XF4wcKcnq559LLauXF/TuLR0DlFL/yvqT1D17ZHzdO2yaio6G7t2hSBFo\n3DjuQ1NKKUvhksiFrsW7cvbLs/xS8xcOXD2AzyQfqs2txvYL240Oz/alTg1Dh8rGq86dpeYsXToo\nVQoGD5a61mgdfavU31l/khqDetS5c6UX88iRYGf9r1wppWLMycGJdj7tON75OPP85nEp4hKlZ5Sm\n8OTCfL/9ew5dP4RZd6bHnZQp5VLe+fPSBzFVKkleCxeGNGngs89kkMChQ9K5RqkEzMHoAD5YcLC8\nE3V4+0t58ECurtSvL2OalVIqIXOwc8A/jz+Ncjdi9cnVzDg4gx9CfqDP5j54unlSM2tNamarSbnM\n5UjskNjocG1PihTQvr0cT5/KWO+1a2VQwJw58jn29pAtm/RJfH7kywcff6zNvVWCYN1J6pMn8Pvv\n8O23//mpI0dK+c+PP8Z9WEopZS3sTHbUyl6LWtlr8fjZY4IvBLPq5CqCTgYxfu94EjskpliGYvh6\n+uLr6UvxjMVJmiip0WHbFkdH2fxburSsst67B0ePwpEjL4///e/lhqvnVw/z54cCBeTInTtGw2yU\nsgbWnaTu3StLpP9Rj3rzpiSnX34Jn3wSP6EppZS1cXJwovInlan8SWVGVx1N6K1Q1p1ex/aw7Yzf\nM55B2wZhb7KnQNoClMpYimwpspHJPROebp54unni6uRq9EuwDe7ucsnv75f9zGa4fh0dokKZAAAg\nAElEQVT+/BMOHJDbtWsleQXIlAkCAqSTgPZWVDbCupPU4GBIlkzeRb7FqFFy27t3PMSklFI2wGQy\n4Z3SG++U3nQr3g2z2czxW8fZHrad7WHbCToZxIXwCzyLfvbiMe6J3fF08yR/mvwvVl6zpciGSS9N\nfziTSWpWq1aV47m//pLNFjNnyhjFAQPAzw86dJCVWf23V1bMupPUrVv/sx41PBzGjZPv1xQp4i80\npZSyJSaTiZwpc5IzZU7a+7QHICo6iquRVwkLD+PCvQuEhYdx/t55dl/ZzZxDc4g2R5PKJRWlPEu9\nSFrzpcmHg511/+qxKMmSydXEsmVhxAjZdDVxovw9Z04ZJlCggJQGeHpq0qqsivX+pHj6VOpR+/d/\n66dNmCAdqrp2jae4lFIqgbC3syeDawYyuGagRMYSr3ws4nEEOy/uZNuFbWwP287XG7/mcdRjkiZK\nSomMJV4krUXSFyGJozatjhUffSRTarp0kUWciROl1dXzWtbkyV/WsubMKSuzqVO/PBLrBjllWaw3\nSd23D+7ff2sT/wcPZMNUq1bSjk4ppVT8cHVypUqWKlTJUgWAx88es+fKHrZfkHKBYTuG0W9LPxzt\nHPFO6U3yJMlxc3LDPbE7bk5uuCV2w8PZA++U3uRNnZdULqkMfkVWxGSCcuXkMJvhyhWpYX1+BAVJ\nHdw/W425ukL69NJBwMcHChWCggXlfqUMYL1J6tatkDSpfAP9i2nT5A1kr17xF5ZSSqn
XOTk4Ucqz\nFKU8S9Gb3kRFR3H4xmG2X9jOsZvHCH8cTvjjcE7fOc29R/cIfxzOrQe3ePTsEQCpXVKTJ3Ue8qbK\nS740+aiWpRopXVIa/KqsgMkkiWf69FCjxsv7nz2TXcXXr796hIXB/v2wcqWs9IC0wSpYUDZnpUkD\nadO+eqtJrIoj1pukPu+P6uj4xg8/fSqjkv39paWcUkopy2FvZ0/+NPnJn+bfB7FERUdx9u5ZDl0/\nJMeNQyw/sZyRu0Zib7KnSpYqNMvTjNrZa+OSyCUeo7cBDg6SZKZN++aPR0XB8ePSRWffPlmB/eMP\nuHoVHj169XMzZJBRjkWKQNGisgqbLFncvwZl86wzSX36FEJCoE+ff/2UefPkDeHXX8djXEoppWKN\nvZ09WVNkJWuKrNTzrvfi/pv3b/LrsV+Zc2gOTZY2wcXRBb+cfjTN05QyXmV0+EBssLeHXLnkaNHi\n5f1mM0REwLVrkrBeuQIHD8Lu3TLeNTJSVm+9vWWKlo+PrMLmzw/Ozsa9HmWVrDNJ3b9fvhH+pT9q\nVJT0Q65dW/obK6WUsh0pXVLSsXBHOhbuyJk7Z5h3eB5zD89l9qHZ2JvsyZYiG3lT531x5EmVB083\nT22FFRtMJnBzkyN7drmvSRO5fb76unu3rLru3SsrRk+eyCzynDklac2TR1Zfn5chpE8PTk7GvSZl\nsawzSd22Td6R+fi88cPLl8OJEzBjRvyGpZRSKn598tEn9CvTj76l+3Lw+kF2X97N4euHOXTjEL+d\n+Y17j+4BkDZpWmpkrUGt7LWokLmClgfEhb+vvrZqJfc9eSLTs/bte3ksWSIbn//Ow0NqXgsWlLKB\nwoXlef5j5Lmybdb5v3/okOw+fEM9qtkM338P5ctDsWIGxKaUUiremUym12pczWYzlyIucej6ITaf\n28yqU6uYcmAKTvZOlM9cnprZalIzW0083TwNjNzGJUr0cnRr27Yv74+IgEuX4PLll8eZM7BrF0yd\nCtHRkCTJy6S1UCE5smSRVVmVIFhnkhoaKknqG6xfL9UAGzfGc0xKKaUsislkIqNbRjK6ZaRGthqM\nqDKCk7dPsvrkaoJOBtFlXRc6relE3tR5qZm1JrWy16JwusLY29kbHbrtc3WVulVv79c/dv++/CLf\ns0dKB1asgJ9/fvm4ggVftsjKkkU2f6VOrauuNsj6/kejoyVJ9fd/44d/+EGuEpQvH89xKaWUsnjZ\nUmQjW/FsdC3elXuP7rH+zHpWnVzFL/t+4fuQ70npnJLqWatTK1stqmetroMGjODiAr6+cjx3587L\ncoG9e2HxYpmw9ZzJBKlSScKaLh3kzSsbU4oW1ZVXK2Z9SerFi9K7LWfO1z60c6d0plq2TCe/KaWU\nejv3xO40zNWQhrkaEhUdxa5Luwg6GcTqU6uZeXAmyRIlo553PZrlaUZZr7K6wmqkjz6CSpXkeO72\nbbhwQboMPO80cPWqlA5MmQI//igrrLVqQZ06UKGClBAoq2F9SeqxY3L7hksEo0ZB1qzy5kkppZR6\nV/Z29pT0LElJz5L8WPFHTt4+ybzD85hzaA4z/pxBumTp8M/tT5M8TcifJj92Jl2dM1yKFHK8SVSU\nrFytWCHHlCmy4bpWLejcGUqW1NUsK2B9SWpoqHyheb5a6H7xomwYHDVKV/aVUkp9mGwpsvFt2W8Z\nUGYAuy/vZs6hOcw6OIsRO0fg7OhM7lS5yZsqr0zB+v82Vymc/yVhUvHP3l4G/pQqBUOHSmusFSuk\n7Y+vr9S0BgZCw4ayuUtZJOtLUo8dgxw5XstEJ0yQMpa/9xxWSimlPoTJZKJohqIUzVCUkVVGsj1s\nOweuHuDwjcPsvbqXWYdm8STqCQDpk6V/Mbr1eY/W7B7ZSWSvSZChTCYpEcyZU+akr18vK1qffQY9\ne0LHjhAQIDWtyqJYZ5L6j0v9Dx/CpEnQurVOYlN
KKRU3HO0dKZ+5POUzv9yZ+zTqKafunHoxuvXw\njcMsOLqAoTuGymPsHPFJ50PNrNLuKm/qvDpUwEh2dlC1qhyhofC//0nt6qBBUK4c+PlJ/WqaNEZH\nqrC2JNVsli+qmjVfuXvuXNn417mzQXEppZRKkBztHfFO6Y13Sm8a52784v57j+7JUIHrh9h8fjM/\n/v4jfbf0JaNrxhf9Wct5ldPuAUbKmVMuww4ZIpOxli2DTp2gQwcoXvxlwvrJJ1q/ahDrSlKvXYN7\n915ZSTWb5Y1QzZrydaSUUkoZzT2xO76ZfPHN5EunIp14/Owx2y5sY9XJVQSdDGLC3gnYm+zJ4ZHj\nRU3r8xKBDK4ZdLU1Pn30kaxyde4sHQOCgmDpUujTB3r0kJrVtGllfGu6dC9HuZYsKVODdCNMnLGu\nJDU0VG7/1n5q61Y4fPhln1+llFLK0jg5OFHpk0pU+qQSo6qOIvRWKNsubHsxwnXNqTWEPw4HwNXJ\nFS93LzzdPMnklunVW/dMpEmaRrsLxJUUKaBlSzkiIyXJOH9e2ltduSLtrY4dk93af/0lSWu9elC/\nviSt9tqmLDZZV5J67JiMQv3bkun//ifjfbV5v1JKKWtgMplelAg8ZzabuRhxkUPXD3Hs5jEu3LtA\nWEQY2y5sIyw87EUCC1LnmtEtI55uni8S2Pxp8lPKsxSpXHTzT6xJmvS18sIXoqNhxw4ZKrB4MYwZ\nIz1Z/fxkBKzZLJ/z/DY6GhInfrkKmyGDrODqivlbWVeSGhoK2bK9GH127px0lJg4Uf+flVJKWS+T\nyfQi6ayZ7fXEKPxROBfCLxAWHkZYeNiLJPbU7VOsP7Oea5HXAMieIju+nlJm4Ovpi5e7l5YOxAU7\nu5ctrkaOlPGtzxPWCRMkKTGZ5PPs7OTPT55I0vpc4sSSrGbOLMlwvXqSwKoXrCtJ/cfO/nHjwN0d\nmjUzMCallFIqjrkldiNvYqlZfZOL4RfZHrad7Re2sz1sO1MOTJHHObmRyT3Ta6UD6V3T4+bkhlti\nN9yc3EjmlExLCN6XnZ3UphYrBsOG/fuq2bNnsrfm8mW4dEmOy5elZrFHD+jSRUoGGjSQhDVDhvh9\nHRbI+pLUDh0AKRWZ8n/t3XtU1HX6wPH3DDjcFAYGYVAZREiDUEmLvFZmZpqSrVsd2kqXNdrdLr/c\ntjyk5ep2tAjSFU8rm2SXlXZNN1zJME2XxMRS27yEpHKzaMGdYAYCAWF+f3xjjOUiMsMMyPM653s8\n58t8hufrc+A8fObz+TwblaPNPD2dHJcQQgjhRME+wTww+gEeGP0AAMZaIwfOHSD/fL4y82oq4ZOS\nTygxlWCuN7cZr0KFt5s3vh6+BHsHK4Wtt8Fa4Bp8DPh5+OHj5oO7q7vMznaks/8XV1el8Bw2DG66\nqfXXqqrgn/+E995TznJ96inlhIFp05R/J0wAf/+ejb0X6jtFqtEIFRXWTVNvv62sWX7sMSfHJYQQ\nQvQyOk8dsaNiiR3Vtk+46YKJsuoyTPUmTBdMrf411ho5Zz5HcVUxn5R8wjfmb2i2NLcar3HRWGdh\nte5aggYGtZqplQ1e3aDVwsMPK5fJpBSsmZmQng6rVimvCQ+/VLDOn6+sgb3K9Z0itWVnf2Qkzc3K\nGuV77mnTHVUIIYQQnfBxVwrMrrjYfJGy6jJKTaVU1lVaC9qqC1WY6pV/y6rLyCnJoaSqhOqGauvY\nn27wCvEJsRax/7vUwMfdB68BXjI728LHR+mG9dBDyhrW4mLIy4ODB5Xr3XdhyRJITITFi8Hj6j1r\nt28VqWo1jBzJnj1KG96//MXZQQkhhBBXL1e1q3V29HIsFgumetOljV0/LjMoNZVSYCxgd+Fuvqv+\nDguWNmNdVC4dLjUI8QkhRBvCYM/B/a+QVamUjVWhoRAXp9z7/nulQ9Yf/qDsHF+1Ch544Ko8r7Xv\nFKlffaUcPeX
mxjvvKMdOTZni7KCEEEIIAcoJBVp3LVp3bYcbvBqaGiivKW+1xKDqQhWmCyaMdUbO\nmc5RYiphd+FuSkwl1DbWWse6u7pbC+aWItbf0x8fN2XZQcvyAx83HwIHBqJx0Tjq0R3Lz085HP6x\nx5QZ1Ycegj/9CVJS4OabnR2dXfWdIjU/HyIiaGyErCx44gk5dkoIIYToSzQuGoJ9ggkm+LKvtVgs\nfF/3vXU29qezs8crjpN1OgtjrZEmS1Obsa5qV2s3rzEBYxgdqHT0Gjpo6NUzGxseDtu2wf798PTT\ncMstynFHGzaAl5ezo7OLvlOkfvUV/OIX7N+vbIKbN8/ZAQkhhBCip6hUKnSeOnSeOsYFjWv3NRaL\nhdrG2jYzs0WVRRyvOM6x8mPsKNhhXSvr7+nPFMMU5SxZw1SuD7oeV3XfKYXaNXWqsmb1nXeU2dUv\nv4T3378qesX3jcz88IPSgiwigu3bIThYaegghBBCiP5LpVLhpfHCS+PFkEFD2n2NxWJRZl/Lj/N5\n2efsL93P0r1LuXDxAl4DvJgYPJEpwVMYqx/LmMAxDNcO73unEqjVsGABjB+v7Cq/4QbIyIBZs5wd\nmU36RpFaXAyAJSKSzOchNlY+6hdCCCHE5alUKoZrhzNcO5y5o+YCytrYI2VHlAYIpftJ/SwVY50R\ngIGagUQFRDEmYAxRAVEEeAW0OomgZd2t54BeeEh7VBR8/rnysf9ddykbrBIT++ymqr5RpBYWAnC8\n8VpKS+Huu50cjxBCCCH6LI2LhonBE5kYPJFnJz+LxWLhu5rvOF6uLBE4VnGMQ98eYtO/N9HY3Nju\newR6BSprXn+8RgeMJmJwBO6u7g5+mv+h1SrnrK5cCcuWweHD8NZb4O3t3Li6oW8UqUVFYDDw/u6B\n+Pgoa4OFEEIIIexBpVIxZNAQhgwawszwmdb7zZZmahpqrCcQtKx9rbxQydfGrzlecZz3T71PysEU\nQDlKy8/Dr93NWa5qV7zdvK0nELTMzvp7+nPd4OsYHTiaa/2vtc+pBGq1ckTV+PHKrOqYMUrr1V/+\nsk9tquobRWphIURGsn07zJ4Nmqv0VAkhhBBC9B5qlRpvN2+83byhk/4H1fXVnDx/kmPlx/hv7X/b\nfU1jU+OlZgj1VRhrjRRWFlJeU8458zlAKWQj/COsM7MjfEdYO3gFeAVc+VrZuXOVmdQXXoD/+z+l\ncH3sMXj8cRg8+Mreywn6RpFaVIR55r18ka0cCSaEEEII0VsMchvEhGETmDBsQrfGV12o4kTFCety\ng+MVx9nx9Q7M9WbrazQuGmuzgwlDJzBn5Bxihsbgonbp/M2vuUbpUrVqFaxdC8nJkJQECxfCU0/B\nqFHditkR+kaR+u23fP5DJAMG9PmNakIIIYQQrWjdtUwxTGGK4VKXIovFQtWFqladu0qqSiiqKiLt\nSBqrclcx2HMws6+ZzdyRc5kRNkOZ8e1IaKhy6P/y5fDnP8O6dcqZqgYDTJ6sXJMmKUsDXC5T+DpI\n3yhSLRb+eTqC227rk+t+hRBCCCGuiEqlwtfDF18PX8bqx7b6WlNzE3nf5JH1dRY7vt7BW1++xQD1\nAGaGz+S3N/yWmeEzO14a4OcHS5cqDQB27oQDB+DTT2HrVmhshIEDYeJE5Sile++FwEAHPG37+syZ\nBBlfRMiufiGEEEL0ey5qFyYbJrP69tWc+O0JCp8sJPmOZL4xf8PsjNmMTB1JyqcpfF/3fcdv4u4O\nP/uZ0k714EEwmZTuVcuWKRuvFi+GIUPgjjvgjTeUTkoO1ieK1AsDdfy32Y/YWGdHIoQQQgjRu4T6\nhvLkTU9yNOEoB+IPcNOwm0j8OJGhrw4lfns8B0oP0NjU/lFaVh4eMGWKsvknOxv+8x9lWcDFi7Bo\nkTKjevfdsHevYx6KK/y4v66ujvT0dHJycjCbzRgMBuLi4rjtttsuO7ayspK0t
DTy8vKor68nLCyM\n+Ph4xo1rv9XZT33jGsqNN8LQoVcSrRBCCCFE/6FSqZgUPIlJwZNYM3MNG49uZMPhDWz69yY8B3gy\nYdgEa0vYCcMm4KXp5DgqnQ4SEpSrrAy2bIG//Q1On4Yu1H32cEVF6gsvvEBBQQEJCQkMGzaMPXv2\n8OKLL2KxWJg+fXqH4xoaGnj66aepra3liSeeQKvVkpmZyZIlS0hOTmbs2LEdjgX4smaEfNTfT7z7\n7rvExcU5OwzhIJLv/kXy3b9Ivp0rwCuA56Y+x7OTn+Vw2WH2lyjdtdYdWseKnBW4ql0ZGziWEb4j\nCPEJsR51ZfAxEOITgq+H76U3GzJEOQngqafAYnHYM3S5SM3Ly+PIkSMsW7bMOnMaHR1NeXk5GzZs\nYNq0aag7aLu1c+dOiouLWb9+PZGRkdaxixYtIi0tjddee63T7336YigvSJHaL8gvtf5F8t2/SL77\nF8l37+CqdrUej/XM5GdotjTz1fmvyC3N5fNvP6fEVMIX//mCUlMpDU0N1nH6gXqlm1bAjx21AkcT\n4R+Bm6ub42Lv6gtzc3Px9PTk1ltvbXV/1qxZvPjii+Tn53Pdddd1ONZgMFgLVAAXFxdmzJjBxo0b\nMRqN6HS6Dr93jX8oHby1EEIIIYToIrVKTVRAFFEBUfz6hl9b7zdbmqn4oYJSUylFlUWcqDjBsYpj\nbMvfRvLBZEDpqLVq+iqenfysQ2LtcpFaVFSEwWBoM1saGhoKQHFxcYdFalFRUbsf6f90bGdFavDN\nobTTYUwIIYQQQtiBWqVGP1CPfqCemKEx3M/91q+Z681K0Vp+jOv11zsspi4XqWazmaHt7Fzy/vHg\nUrPZ3OZrLaqrqxk0aFCHY00mU6ff+8Y7Oy5ghRBCCCFEz/F287ZuyHKkPnGYv8btFEePylRqf1BV\nVcXRo0edHYZwEMl3/yL57l8k3/1Hfn5+j7xvl4tUb2/vdmc8W2ZQvTtpBeXt7U11dXWHY318fNod\np9Pp0Ol0LFjwYFfDFFeB8ePHOzsE4UCS7/5F8t2/SL77j5aazZ66XKSOGDGCvXv30tzc3GpdalFR\nEXBpfWl7QkNDKSwsbHP/cmN1Oh1paWkYjcauhimEEEIIIRzMqUXq1KlT+eCDD8jJyWHatGnW+9nZ\n2fj7+xMREdHp2LVr15Kfn299XVNTE7t37yYyMhI/P78Ox/bEQwshhBBCiN6ty0VqTEwM48ePZ+3a\ntdTW1jJkyBA+/vhjDh8+zNKlS1H9uP0+KSmJjz76iIyMDAICAgDlmKrMzExWrFjBI488glarZfv2\n7Xz77bckJyf3zJMJIYQQQog+64o2Tq1cuZL09HQ2bdqE2WwmJCSE559/vtXMqsVisV4tBgwYQEpK\nCmlpaaxbt476+nrCw8N56aWXGDNmjP2eRgghhBBCXBVU+/btc1x/KyGEEEIIIbrAKUdQ1dXVkZ6e\nTk5ODmazGYPBQFxcnLXdamcqKytJS0sjLy+P+vp6wsLCiI+PZ9y4cQ6IXHRHd/Odk5PDvn37KCgo\noLKyEl9fX6Kioli4cGG7Z/aK3sGWn++fSk9PZ/PmzQwfPpw33nijh6IVtrI137m5ubz33nucPXuW\npqYm9Ho98+fPZ86cOT0cuegOW/J95MgRNm/eTFFREfX19QQFBXHXXXcxb968DtuqC+epq6vjrbfe\n4uzZs5w+fRqz2cyCBQtYsGBBl8bbo15zWbhw4R+6F373Pffcc3z22Wf86le/Yv78+ZjNZtLT0xk2\nbBgjRozocFxDQwNPPvkk586d4ze/+Q2zZ8+mpKSEN998k7Fjx6LX6x34FKKrupvvpKQkNBoNc+fO\nZf78+YwcOZJ//etf/P3vf2fy5MlotVoHPoXoqu7m+6fOnDnDK6+8go+PDx4eHtx99909HLXoLlvy\nnZGRwauvvsrEiRN58MEHmTFjBnq9nubmZ
q699loHPYG4Et3N92effcaSJUvQ6/U88sgj3HHHHTQ2\nNvL2229TU1NDTEyMA59CdIXRaCQlJQWtVktUVBSnT58mOjqa6Ojoy461V73m8JnUvLw8jhw5wrJl\ny6x/eUVHR1NeXs6GDRuYNm1ah39R7dy5k+LiYtavX09kZKR17KJFi0hLS+O1115z2HOIrrEl36tW\nrWpTiI4bN464uDi2bt3K73//+x6PX1wZW/LdoqmpiZdffpnY2FjOnDnTaTc74Vy25LugoID09HQS\nEhK4//5L7Revv95xLRfFlbEl3x999BEajYbVq1fj5uYGKL/Pz507R3Z2No8//rjDnkN0jV6vZ8eO\nHYDSGXTnzp1dHmuves3h8+u5ubl4enpy6623tro/a9YsjEZjp10LcnNzMRgM1gcGcHFxYcaMGZw6\ndUrOU+2FbMl3ezOlOp0Of39/zp8/b+9QhR3Yku8WGRkZ1NTUEB8f32oDpuh9bMl3ZmYmGo2Ge+65\np4ejFPZiS77d3NxwdXVFo9G0uu/l5WUtWsXVw171msOL1KKiIgwGQ5u/tloO9C8uLu50bHsfJ3Rl\nrHAOW/LdnrKyMsrLyxk+fLidIhT2ZGu+i4uL+etf/8rixYvx8PDoqTCFndiS72PHjhESEkJOTg4P\nP/ww06dP57777uP111/n4sWLPRm26CZb8j1v3jyam5tJTU3FaDRSU1PDrl27OHDgAHFxcT0ZtnAC\ne9VrDv+432w2t7vppaWtamcf7VVXVzNo0KAOx7bXtlU4ly35/l9NTU0kJSXh6enJz3/+c7vFKOzH\nlny35Pfmm2+W9Wl9hC35Pn/+PCaTifXr1xMfH09ISAhHjx4lIyODiooKli5d2mNxi+6xJd/XXHMN\nL7/8MsuXLyczMxMAtVpNQkKC/D6/CtmrXnPK7n4hrlRzczNJSUmcPHmSFStWMHjwYGeHJOxs69at\nlJWVsXr1ameHIhzAYrFQW1vb6qzt6Oho6urq2LZtm5zicZU5fvw4iYmJREdHM2fOHNzd3Tl69Cgb\nN26kvr6ehx56yNkhil7I4UWqt7d3uxV0y19gLVV2R2Orq6s7HOvj42OnKIW92JLvFhaLheTkZPbs\n2UNiYiKTJk2ye5zCPrqb7/LycjZt2sSjjz6Ki4sLNTU1gDK72tTURE1NDRqNps16NuFctv4+r6qq\n4sYbb2x1PyYmhm3btnHmzBkpUnsZW/KdmpqKXq/nj3/8o7VDZXR0NGq1mjfffJPbb7+doKCgnglc\nOJy96jWHr0kdMWIEpaWlNDc3t7pfVFQEXFqv0J7Q0FAKCwvb3O/KWOEctuQblAL1lVdeYdeuXTzz\nzDPcfvvtPRarsF138/3dd9/R0NBAamoqsbGx1uvkyZOUlpYSGxvL66+/3uPxiytjy893WFhYpxvj\nWgoZ0XvYku/i4mJGjhzZJq+jRo3CYrFQWlpq/4CF09irXnN4kTp16lTq6urIyclpdT87Oxt/f38i\nIiI6HVtaWtpqB2FTUxO7d+8mMjISPz+/HotbdI8t+W6ZQd21axe/+93vuPPOO3s6XGGj7uY7PDyc\nNWvWtLpeffVVwsLCCAoKYs2aNcybN88RjyCugC0/37fccgsAhw4danU/Ly8PtVot56T2QrbkOyAg\ngIKCgjYF7smTJwFkCddVxl71msMP8x86dCgnTpzggw8+wNvbmx9++IHNmzeTk5PD4sWLrbvBkpKS\nWL58OXfeeSdeXl6A8ldcbm4ue/bswdfX19rNID8/n8TERAIDAx35KKILbMl3amoqWVlZzJo1i5iY\nGM6fP2+9qqqq0Ol0znw00Y7u5luj0aDX69tce/fupampiUcffbRLS0OEY9ny8x0WFkZeXh67d+/G\n3d2d2tpasrKy+Mc//kFsbOwVdygTPc+WfLu6upKdnc2pU6fw8PCgoqKCrKwstmzZwrhx41qdlSt6\nj0OHD
nH27FkKCwv59NNP0Wq1qFQqSkpKCAoKwtXVtUfrNadsnFq5ciXp6els2rQJs9lMSEhIq8Xz\noMyitVwtBgwYQEpKCmlpaaxbt476+nrCw8N56aWXGDNmjDMeRXRBd/N98OBBVCoVH374IR9++GGr\n99Tr9WRkZDjsGUTXdTff7VGpVPKxby/X3Xy7uLiQnJzMxo0b2bx5M9XV1QQFBZGQkMB9993njEcR\nXdDdfMfGxqLT6diyZQspKSlcuHCBoKAgFixYwL333uuMRxFdsHbtWsrLywHl93FOTg45OTmoVCoy\nMjIIDAzs0XpNtW/fPjktWwghhBBC9CoOX5MqhBBCCCHE5UiRKoQQQggheh0pUoUQQgghRK8jRaoQ\nQgghhOh1pEgVQgghhBC9jhSpQgghhBCi15EiVQghhBBC9DpSpAohhBBCiF5HitQ5O1cAAAArSURB\nVFQhhBBCCNHrSJEqhBBCCCF6HSlShRBCCCFEryNFqhBCCCGE6HX+HxVmFKrAw+6/AAAAAElFTkSu\nQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "thresholds = [float(n) / nthresholds for n in range(0, nthresholds)]\n", "falseposDict = dict([(t, falsepos(t)) for t in thresholds])\n", "falsenegDict = dict([(t, falseneg(t)) for t in thresholds])\n", "trueposDict = dict([(t, truepos(t)) for t in thresholds])\n", "\n", "precisions = [precision(t) for t in thresholds]\n", "recalls = [recall(t) for t in thresholds]\n", "fmeasures = [fmeasure(t) for t in thresholds]\n", "\n", "print precisions[0], fmeasures[0]\n", "assert (abs(precisions[0] - 0.000532546802671) < 0.0000001)\n", "assert (abs(fmeasures[0] - 0.00106452669505) < 0.0000001)\n", "\n", "\n", "fig = plt.figure()\n", "plt.plot(thresholds, precisions)\n", "plt.plot(thresholds, recalls)\n", "plt.plot(thresholds, fmeasures)\n", "plt.legend(['Precision', 'Recall', 'F-measure'])\n", "pass" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Discussion\n", "#### State-of-the-art tools can get an F-measure of about 60% on this dataset. In this lab exercise, our best F-measure is closer to 40%. 
Look at some examples of errors (both false positives and false negatives) and think about what went wrong.\n", "#### There are several ways we might improve our simple classifier, including:\n", "* Using additional attributes\n", "* Performing better featurization of our textual data (e.g., stemming or n-grams)\n", "* Using different similarity functions" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }