{ "metadata": { "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Joining DataFrames in Pandas\n", "\n", "In previous labs, we've explored the power of tables as a data management abstraction, in particular with the Pandas DataFrame object.\n", "Tables let us select rows and columns of interest, group data, and measure aggregates.\n", "\n", "But what happens when we have more than one table?\n", "Traditional relational databases usually contain many tables.\n", "Moreover, when integrating multiple data sets, we need tools to combine them.\n", "\n", "In this lab, we will use Pandas' take on the database **join** operation to see how tables can be linked together.\n", "Specifically, we're going to perform a \"fuzzy join\" based on string edit-distance as another approach to finding duplicate records." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "### Data\n", "\n", "Today we'll be using a small data set of restaurants.\n", "Download the data from [here](https://raw.github.com/amplab/datascience-sp14/master/lab4/data/restaurants.csv).\n", "Put the data file, \"restaurants.csv\", in the same directory as this notebook.\n", "\n", "### Edit Distance\n", "\n", "We're going to be using a string-similarity Python library to compute \"edit distance\".\n", "Install it on your VM by running the following:\n", "\n", "`sudo apt-get install python-levenshtein`\n", "\n", "**NOTE**: You may need to run `sudo apt-get update` first.\n", "\n", "To test that it works, the following import should run without errors:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import Levenshtein as L" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Joins\n", "\n", "A **join** is a way to connect rows in two different data tables based on some criterion.\n", "Suppose a university has a database for student records with two tables in it: *Students* and *Grades*.\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pandas as pd\n", "\n", "Students = pd.DataFrame({'student_id': [1, 2], 'name': ['Alice', 'Bob']})\n", "Students" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 2, "text": [ " name student_id\n", "0 Alice 1\n", "1 Bob 2\n", "\n", "[2 rows x 2 columns]" ] } ], "prompt_number": 2 }, { "cell_type": "code", "collapsed": false, "input": [ "Grades = pd.DataFrame({'student_id': [1, 1, 2, 2], 'class_id': [1, 2, 1, 3], 'grade': ['A', 'C', 'B', 'B']})\n", "Grades" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 3, "text": [ " class_id grade student_id\n", "0 1 A 1\n", "1 2 C 1\n", "2 1 B 2\n", "3 3 B 2\n", "\n", "[4 rows x 3 columns]" ] } ], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's say we want to know all of Bob's grades.\n", "We can look up Bob's student ID in the Students table and then use that ID to look up his grades in the Grades table.\n", "Joins naturally express this process: when two tables share a common type of column (student ID in this case), we can join the tables together to get a complete view.\n", "\n", "In Pandas, we can use the **merge** function to perform a join.\n", "Pass the two tables to join as the first two arguments, and set the \"on\" parameter to the name of the join column." ] }, { "cell_type": "code", "collapsed": false, "input": [ "pd.merge(Students, Grades, on='student_id')" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 4, "text": [ " name student_id class_id grade\n", "0 Alice 1 1 A\n", "1 Alice 1 2 C\n", "2 Bob 2 1 B\n", "3 Bob 2 3 B\n", "\n", "[4 rows x 4 columns]" ] } ], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### DIY\n", "\n", "1. Use **merge** to join Grades with the Classes table below, and find out what class Alice got an A in." ] }, { "cell_type": "code", "collapsed": false, "input": [ "Classes = pd.DataFrame({'class_id': [1, 2, 3], 'title': ['Math', 'English', 'Spanish']})\n", "pd.merge(pd.merge(Students, Grades, on='student_id'), Classes, on='class_id')" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 5, "text": [ " name student_id class_id grade title\n", "0 Alice 1 1 A Math\n", "1 Bob 2 1 B Math\n", "2 Alice 1 2 C English\n", "3 Bob 2 3 B Spanish\n", "\n", "[4 rows x 5 columns]" ] } ], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Joining the Restaurant Data\n", "\n", "Now let's load the restaurant data that we will be analyzing:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "resto = pd.read_csv('restaurants.csv')\n", "resto.info()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "<class 'pandas.core.frame.DataFrame'>\n", "Int64Index: 858 entries, 0 to 857\n", "Data columns (total 4 columns):\n", "id 858 non-null int64\n", "cluster 858 non-null int64\n", "name 858 non-null object\n", "city 858 non-null object\n", "dtypes: int64(2), object(2)" ] } ], "prompt_number": 6 }, { "cell_type": "code", "collapsed": false, "input": [ "resto[:10]" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 7, "text": [ " id cluster name city\n", "0 560 453 2223 san francisco\n", "1 781 675 103 west atlanta\n", "2 279 172 20 mott new york\n", "3 43 23 21 club new york\n", "4 44 23 21 club new york city\n", "5 280 173 9 jones street new york\n", "6 486 379 abbey atlanta\n", "7 145 74 abruzzi atlanta\n", "8 146 74 abruzzi atlanta\n", "9 561 454 acquarello san francisco\n", "\n", "[10 rows x 4 columns]" ] } ], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The restaurant data has four columns.\n", "**id** is a unique ID field (unique for each row), **name** is the name of the restaurant, and **city** is where it is located.\n", "The fourth column, **cluster**, is a \"gold standard\" column.\n", "If two records have the same **cluster** value, they describe the same restaurant.\n", "\n", "The type of join we made above between Students and Grades, where we link records with equal values in a common column, is called an *equijoin*.\n", "Equijoins may join on more than one column, too (in which case the values in every join column have to match).\n", "\n", "Let's use an equijoin to find pairs of duplicate restaurant records.\n", "We join the data to itself, on the **cluster** column.\n", "\n", "> Note: a join between a table and itself is called a *self-join*.\n", "\n", "The result (\"clusters\" below) has a lot of extra records in it.\n", "For example, since we're joining a table to itself, every record matches itself.\n", "We can filter on IDs to get rid of these extra join results.\n", "Note that when Pandas joins two tables that have non-join columns with the same name, it appends \"_x\" and \"_y\" to the names to distinguish them." ] }, { "cell_type": "code", "collapsed": false, "input": [ "clusters = pd.merge(resto, resto, on='cluster')\n", "clusters = clusters[clusters.id_x != clusters.id_y]\n", "clusters[:10]" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 8, "text": [ " id_x cluster name_x city_x id_y name_y \\\n", "4 43 23 21 club new york 44 21 club \n", "5 44 23 21 club new york city 43 21 club \n", "10 145 74 abruzzi atlanta 146 abruzzi \n", "11 146 74 abruzzi atlanta 145 abruzzi \n", "20 184 94 alain rondelli san francisco 185 alain rondelli \n", "21 185 94 alain rondelli san francisco 184 alain rondelli \n", "36 186 95 aqua san francisco 187 aqua \n", "37 187 95 aqua san francisco 186 aqua \n", "40 45 24 aquavit new york 46 aquavit \n", "41 46 24 aquavit new york city 45 aquavit \n", "\n", " city_y \n", "4 new york city \n", "5 new york \n", "10 atlanta \n", "11 atlanta \n", "20 san francisco \n", "21 san francisco \n", "36 san francisco \n", "37 san francisco \n", "40 new york city \n", "41 new york \n", "\n", "[10 rows x 7 columns]" ] } ], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### DIY\n", "\n", "1. There are still extra records in *clusters*, above. If records *A* and *B* match each other, then we will get both (*A*, *B*) and (*B*, *A*) in the output.\n", "Filter *clusters* so that we only keep one instance of each matching pair (HINT: use the IDs again).\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "clusters = clusters[clusters.id_x < clusters.id_y]\n", "clusters[:10]" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 9, "text": [ " id_x cluster name_x city_x id_y \\\n", "4 43 23 21 club new york 44 \n", "10 145 74 abruzzi atlanta 146 \n", "20 184 94 alain rondelli san francisco 185 \n", "36 186 95 aqua san francisco 187 \n", "40 45 24 aquavit new york 46 \n", "46 1 0 arnie morton's of chicago los angeles 2 \n", "51 3 1 art's delicatessen studio city 4 \n", "58 47 25 aureole new york 48 \n", "62 147 75 bacchanalia atlanta 148 \n", "79 5 2 hotel bel-air bel air 6 \n", "\n", " name_y city_y \n", "4 21 club new york city \n", "10 abruzzi atlanta \n", "20 alain rondelli san francisco \n", "36 aqua san francisco \n", "40 aquavit new york city \n", "46 arnie morton's of chicago los angeles \n", "51 art's deli studio city \n", "58 aureole new york city \n", "62 bacchanalia atlanta \n", "79 bel-air hotel bel air \n", "\n", "[10 rows x 7 columns]" ] } ], "prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fuzzy Joins\n", "\n", "Sometimes an equijoin isn't good enough.\n", "\n", "Say you want to match up records that are *almost* equal in a column.\n", "Or records where a *function* of a column is equal.\n", "Or maybe you don't care about equality: maybe \"less than\" or \"greater than or equal to\" is what you want.\n", "These cases call for a more general join than an equijoin.\n", "\n", "We are going to make one of these joins between the restaurant data and itself.\n", "Specifically, we want to match up pairs of records whose restaurant names are *almost* the same.\n", "We call this a **fuzzy join**.\n", "\n", "To do a fuzzy join in Pandas we need to go about it in a few steps:\n", "\n", "1. Join every record in the first table with every record in the second table. This is called the **Cartesian product** of the tables, and it's simply a list of all possible pairs of records.\n", "2. Add a column to the Cartesian product that measures how \"similar\" each pair of records is. 
This is our **join criterion**.\n", "3. Filter the Cartesian product based on when the join criterion is \"similar enough.\"\n", "\n", "> SQL Aside: In SQL, all of these joins are supported in about the same way as equijoins are.\n", "> Essentially, you write a boolean expression on columns from the joined tables, and whenever that expression is true, you join the records together.\n", "> This is very similar to writing an **if** statement in Python or Java.\n", "\n", "Let's do an example to get the hang of it." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1. Join every record in the first table with every record in the second table.\n", "\n", "We use a \"dummy\" column to compute the Cartesian product of the data with itself.\n", "**dummy** takes the same value for every record, so we can do an equijoin and get back all pairs." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Add a constant column so that an equijoin on it pairs every row with every row\n", "resto['dummy'] = 0\n", "prod = pd.merge(resto, resto, on='dummy')\n", "\n", "# Clean up\n", "del prod['dummy']\n", "del resto['dummy']\n", "\n", "# Show that prod is the size of \"resto\" squared:\n", "print('%d %d' % (len(prod), len(resto)**2))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "736164 736164\n" ] } ], "prompt_number": 10 }, { "cell_type": "code", "collapsed": false, "input": [ "prod[:10]" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 11, "text": [ " id_x cluster_x name_x city_x id_y cluster_y name_y \\\n", "0 560 453 2223 san francisco 560 453 2223 \n", "1 560 453 2223 san francisco 781 675 103 west \n", "2 560 453 2223 san francisco 279 172 20 mott \n", "3 560 453 2223 san francisco 43 23 21 club \n", "4 560 453 2223 san francisco 44 23 21 club \n", "5 560 453 2223 san francisco 280 173 9 jones street \n", "6 560 453 2223 san francisco 486 379 abbey \n", "7 560 453 2223 san francisco 145 74 abruzzi \n", "8 560 453 2223 san francisco 146 74 abruzzi \n", "9 560 453 2223 san francisco 561 454 acquarello \n", "\n", " city_y \n", "0 san francisco \n", "1 atlanta \n", "2 new york \n", "3 new york \n", "4 new york city \n", "5 new york \n", "6 atlanta \n", "7 atlanta \n", "8 atlanta \n", "9 san francisco \n", "\n", "[10 rows x 8 columns]" ] } ], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### DIY\n", "\n", "* As we did with *clusters*, remove \"extra\" record pairs: ones where a record is paired with itself, and mirrored duplicates of the same pair." ] }, { "cell_type": "code", "collapsed": false, "input": [ "prod = prod[prod.id_x < prod.id_y]\n", "prod[:10]" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 12, "text": [ " id_x cluster_x name_x city_x id_y cluster_y \\\n", "1 560 453 2223 san francisco 781 675 \n", "9 560 453 2223 san francisco 561 454 \n", "12 560 453 2223 san francisco 708 602 \n", "20 560 453 2223 san francisco 782 676 \n", "24 560 453 2223 san francisco 762 656 \n", "28 560 453 2223 san francisco 640 534 \n", "33 560 453 2223 san francisco 709 603 \n", "40 560 453 2223 san francisco 641 535 \n", "47 560 453 2223 san francisco 642 536 \n", "48 560 453 2223 san francisco 783 677 \n", "\n", " name_y city_y \n", "1 103 west atlanta \n", "9 acquarello san francisco \n", "12 afghan kebab house new york city \n", "20 alon's at the terrace atlanta \n", "24 andre's french restaurant las vegas \n", "28 apple pan the west la \n", "33 arcadia new york city \n", "40 asahi ramen west la \n", "47 baja fresh westlake village \n", "48 baker's cajun cafe atlanta \n", "\n", "[10 rows x 8 columns]" ] } ], "prompt_number": 12 }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 2. 
Add a column to the Cartesian product that measures how \"similar\" each pair of records is.\n", "\n", "In the homework assignment, we used a string similarity metric called *cosine similarity*, which measured how many \"tokens\" two strings had in common.\n", "Now, we're going to use an alternative measure of string similarity called *edit-distance*.\n", "[Edit-distance](http://en.wikipedia.org/wiki/Edit_distance) counts the minimum number of simple changes (single-character insertions, deletions, or substitutions) you have to make to a string to turn it into another string.\n", "\n", "Import the edit distance library and try it on a pair of strings:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import Levenshtein as L\n", "\n", "L.distance('Hello, World!', 'Hallo, World!')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 13, "text": [ "1" ] } ], "prompt_number": 13 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we add a computed column, named **distance**, that measures the edit distance between the names of two restaurants:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# This takes a minute or two to run\n", "prod['distance'] = prod.apply(lambda r: L.distance(r['name_x'], r['name_y']), axis=1)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 14 }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3. Filter the Cartesian product based on when the join criterion is \"similar enough.\"\n", "\n", "Now we complete the join by filtering out pairs of records that aren't similar enough for our liking.\n", "As in the first homework assignment, we can only figure out how similar is \"similar enough\" by trying out some different options.\n", "Let's try distance thresholds from 1 to 10, keeping the pairs whose edit distance is strictly less than the threshold, and compute precision and recall for each."
] }, { "cell_type": "code", "collapsed": false, "input": [ "%matplotlib inline\n", "import pylab\n", "\n", "def accuracy(max_distance):\n", " similar = prod[prod.distance < max_distance]\n", " correct = float(sum(similar.cluster_x == similar.cluster_y))\n", " precision = correct / len(similar)\n", " recall = correct / len(clusters)\n", " return (precision, recall)\n", "\n", "thresholds = range(1, 11)\n", "p = []\n", "r = []\n", "\n", "for t in thresholds:\n", " acc = accuracy(t)\n", " p.append(acc[0])\n", " r.append(acc[1])\n", "\n", "pylab.plot(thresholds, p)\n", "pylab.plot(thresholds, r)\n", "pylab.legend(['precision', 'recall'], loc=2)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 15, "text": [ "" ] }, { "metadata": {}, "output_type": "display_data", "png": "iVBORw0KGgoAAAANSUhEUgAAAXUAAAEACAYAAABMEua6AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XlYlWXixvHvUXBNEFRItlBQZFGkKCYbC80lLW1cKjRb\n3LLFKZtmxqmpxmbK1MmmJqdraMaylZzmakYtwdJCLX/KlFtmmVrE4haCIqIgx/f3x5snEWSRA+9Z\n7s91nevAOS/n3EDePD3Pu9gMwzAQERGP0MrqACIi4jwqdRERD6JSFxHxICp1EREPolIXEfEgKnUR\nEQ9Sb6lPmTKF4OBg+vbte95t7r//fnr16kViYiJbtmxxakAREWm4ekt98uTJZGVlnff5lStXsmfP\nHnbv3s1LL73EPffc49SAIiLScPWW+sCBAwkICDjv88uXL+eOO+4AICUlhSNHjnDw4EHnJRQRkQZr\n8px6YWEh4eHhjs/DwsIoKCho6suKiMgFcMpC6blnGrDZbM54WRERaSSfpr5AaGgo+fn5js8LCgoI\nDQ2tsV10dDR79+5t6tuJiHiVqKgo9uzZ0+DtmzxSHz16NK+99hoAGzdupHPnzgQHB9fYbu/evRiG\n4VK3P/zhD5ZncIdMrppLmZTJG3I1djBc70h9woQJrF27lqKiIsLDw3niiSc4deoUADNmzGDkyJGs\nXLmS6OhoOnbsyCuvvNKoACIi4jz1lnpGRka9L7Jo0SKnhBERkabx6iNKU1NTrY5QgytmAtfMpUwN\no0wN56q5GsNmGEaLXCTDZrPRQm8lIuIxGtudTd77pakCAwMpKSmxOoZXCggIoLi42OoYIuJElo/U\nNYK3jn72Iq6vsf9OvXpOXUTE06jURUQ8iEpdRMSDqNQtdM899/Dkk0/Wu11CQgLr1q1rgUQi4u60\nUOrF9LMXcX1aKG1hVVVVVkcQEXFQqZ9HZGQk8+bNIz4+nsDAQKZMmUJFRQXZ2dmEhYWxYMECunfv\nztSpUzEMg3nz5hEdHU3Xrl255ZZbqu17/8knnzBgwAACAgKIiIhwnADtzjvv5LHHHgOgqKiIG264\ngYCAALp06cLVV19dLcuaNWsAqKioYNasWYSGhhIaGsqDDz5IZWUlgCPbs88+S3BwMCE
hISxZsqSF\nfmIi4gpatNRPnmzJd2u6t956iw8++IC9e/fyzTff8OSTT2Kz2Th48CAlJSXk5eWRnp7OX//6V5Yv\nX866devYv38/AQEB3HfffQB8//33jBw5kgceeICioiK2bt1KYmIiYP5v1Zlzzy9cuJDw8HCKioo4\ndOgQTz/9tCPH2ds99dRT5OTksG3bNrZt20ZOTk61efmDBw9SWlrKvn37WLx4Mffddx9Hjx5tqR+Z\niFjNaCGA0a2bYTz8sGHk5VV/vO6vc86tsSIjI4309HTH5ytXrjSioqKM7Oxso02bNkZFRYXjudjY\nWGPNmjWOz/ft22f4+voaVVVVxty5c42xY8fW+h533nmn8dhjjxmGYRiPP/64ceONNxp79uypNcuZ\n14+KijIyMzMdz61atcqIjIw0DMMwPv74Y6N9+/aG3W53PB8UFGRs2rSp1vdvwV+/iFygxv47bdGR\n+iefwPHj0L8/3HQTrF9f/9c4q9YvxNmX6YuIiGDfvn0AdOvWjTZt2jiey83NZcyYMQQEBBAQEEBc\nXBw+Pj4cPHiQgoICevbsWcf3Z4b7zW9+Q3R0NMOGDSMqKor58+fXuv2+ffu45JJLas0F0KVLF1q1\n+unX2qFDB8rKyhr5nYuIu2rRUu/dG55/HnJz4ZprYPr0lnz3xsvLy6v2cUhICFDzcn0RERFkZWVR\nUlLiuJWXlxMSEkJ4eHiDTnJ/0UUX8cwzz7B3716WL1/Os88+y8cff1xju5CQEHJzc2vNJSJiyUJp\np04wcybs3GnFuzeMYRi8+OKLFBYWUlxczFNPPUVaWlqt295999088sgjjj8CP/zwA8uXLwfg1ltv\nZfXq1bzzzjtUVVVx+PBhtm3b5niPM9577z327NmDYRj4+fnRunXraiPuMyZMmMCTTz5JUVERRUVF\n/PGPf+S2225z9rcvIm7K0r1fauksl2Gz2Zg4caJjOqRXr148+uijGIZRY6T+wAMPMHr0aIYNG4af\nnx9XXnklOTk5gDmFs3LlShYuXEiXLl1ISkpi+/btjvc481p79uxh6NChdOrUiQEDBnDfffdxzTXX\n1Mj16KOPkpycTL9+/ejXrx/Jyck8+uij1XKLiPfSwUfn0aNHDxYvXszgwYOtjtJsXPVnLyI/0cFH\nIiJeTKUuIuJBNP3ixfSzF3F9mn4REfFiKnUREQ+iUhcR8SAqdRERD6JSFxHxICp1C6WmprJ48WIA\nlixZwsCBAy1OJCLuTqVuobNPEyAi4gwq9QbQJetExF2o1M8jMjKSBQsW0K9fPzp16sSnn37quCRd\n//79Wbt2rWPb4uJiJk+eTGhoKIGBgYwZMwaAkpISbrjhBoKCgggMDGTUqFEUFhZa9S2JiBdQqdfh\n7bffJjMzk71793LjjTfy+OOPU1JSwjPPPMO4ceM4fPgwALfddhsnT55k586dHDp0iF/96leAeWrd\nqVOnkpeXR15eHu3bt2fmzJlWfksi4uF8rA5QH9sTzplzNv7QuMPhbTYb999/P6GhocyfP5+RI0dy\n3XXXATBkyBCSk5N5//33GTp0KFlZWRQXF+Pv7w/gWPA8e9QO8Mgjj3j0WR9FxHouX+qNLWNnOnM5\nu++//5533nmHFStWOJ6rqqpi8ODB5OfnExgY6Cj0s5WXl/Pggw+yatUqSkpKACgrK6v1nOwiIs7g\n8qVupTPFGxERwW233cZLL71UY5v9+/dTXFzM0aNHaxT7woUL+eabb8jJySEoKIitW7dy6aWXqtRF\npNloTr0BJk2axIoVK/jggw+w2+2cPHmS7OxsCgsL6d69OyNGjODee+/lyJEjnDp1ivU/XlG7rKyM\n9u3b4+/vT3FxMU888YTF34mIeDqN1BsgLCyMZcuW8dvf/pYJEybQunVrUlJSePHFFwF4/fXXefDB\nB+nTpw+VlZUMHjyYgQMHMmvWLCZOnEjXrl0JDQ3
lV7/6lePapefSPusinscwDCrsFRw9eZTSilKO\nVvx4X9/nP96XVpQ2+j3rPZ96VlYWs2bNwm63M23aNGbPnl3t+aKiIiZNmsSBAweoqqri17/+NXfe\neWfNN9L51F2OfvYi51d1uqrWwq2rhGsr61a2Vvi19cO/nb9539b/p8/bnOfxsz6PDIhs1L/TOkvd\nbrcTExPD6tWrCQ0N5fLLLycjI4PY2FjHNnPmzKGiooKnn36aoqIiYmJiOHjwID4+1f8nQKXuevSz\nF29jP21nf9l+8o/mk1+a/9N9aT6FpYUcOXnEUcgVVRV0atvpvGV73sfP+bytT9smZW7sv9M6p19y\ncnKIjo4mMjISgLS0NJYtW1at1Lt378727dsBKC0tpUuXLjUKXUSkuZ02TvPD8R+ql/VZpZ1/NJ8D\nZQfo0qEL4X7hhPuHm/d+4QwIH0CYXxgB7QIcpdzRt6NbTonW2b6FhYWO3frAnFvetGlTtW2mT5/O\n4MGDCQkJ4dixY/zrX/9qnqQi4rUMw6DkZMl5y/rMSLtT2041Cjupe5LjsZBOIbRp3cbqb6dZ1Vnq\nDfkrNXfuXPr37092djZ79+5l6NChbNu2jU6dOtXYds6cOY6PU1NTSU1NbXRgEfE8pRWl5B/Np6C0\n4Lyl7dvKt1pZh/uHM6THEMdjYX5htPdtb/W30mTZ2dlkZ2df8NfXWeqhoaHk5+c7Ps/PzycsLKza\nNhs2bOD3v/89AFFRUfTo0YNdu3aRnJxc4/XOLnUR8TyV9soG7d1RfKK4WmFXna6qdUrk7Mc6ta05\nUPRE5w54G7srdJ2lnpyczO7du8nNzSUkJISlS5eSkZFRbZs+ffqwevVqrrrqKg4ePMiuXbvo2bNn\no0KIiLVOG6c5VnGs0bvcnft41emquhcR2/rj39afyM6RjI4Z7SjtgHYBbjl/7YrqLHUfHx8WLVrE\n8OHDsdvtTJ06ldjYWNLT0wGYMWMGjzzyCJMnTyYxMZHTp0+zYMECAgMDGxwgIEC/TKsEBARYHUGa\nyDAMTlSduOD9oM88fvzUcTr6dqx3t7uogKg69/ho79Ne/54tVu9+6k57IzfcfW73brjySti5E4KC\nrE4jnuZ8UxWNGRmXVpTi08qnRrmeXbiOEXIdu951atuJVjYdYO6KGtudKvV6zJoFlZXw48GjIjWc\nNk6TdzSPHYd2sO/YvgYX8in7Kfzb+ddZyPXtC+3X1s/j9+bwdip1JysuhpgYWLcOzto9X7yQYRgU\nHivky0NfsuPQDr78wbz/qugrOrfrTHy3eML9wus9GEVTFdIYKvVm8Oyz8NFH8N57VieRlmAYBoeO\nH3KU9peHvmTHD+Z9O592xAfFk9AtwbwPSiCuWxyd23W2OrZ4KJV6M6iogLg4eOkluPZaq9OIMxWf\nKHYUt6PEf/iS08ZpEoISiO8W77iPD4qna4euVkcWL6NSbyb//jc8+SR8/jm0bm11Gmms0orSGsW9\n49AOyk+Vm4V9prx/HH0HdwzW1Ii4BJV6MzEMGDgQpk2DWk5CKS7ieOVxvir6qsa8d/GJYmK7xVYb\neScEJRDmF6byFpemUm9GmzbBuHGwaxd07Gh1mpZhGAblp8oxcK3f3WnjNN+VfFetuL/84Uv2H9tP\n7y69q0+dBMUT2TlSu+yJW1KpN7MJE8y9YB5/3Ook9TtZdbLarnQXcoTgsYpj+Lb2pbXNteacbDYb\nEf4RNea9owKj8Gmls4SK51CpN7PcXEhOhi++gO7dm+c9qk5XOeWQbcMw6t+9rgH7QaskRayjUm8B\ns2fD4cPwz39Wf9wwDMoqy+ofFddTyieqTtCpTadGXynl3Mfatm6r+WIRN6dSv0DnTlXUVciHy0pZ\n8eFR4pJKqfL56fFjlcdo59OuyVdK6dimo+Z/RQRw8pWP3EFLTlV0ad+FHp174N/On64H/Ph8rT+v\nvOBH5/aaqhA
\n", "text": [ "" ] } ], "prompt_number": 15 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### DIY\n", "\n", "1. Another common way to visualize the tradeoff between precision and recall is to plot them directly against each other.\n", "Create a scatterplot with precision on one axis and recall on the other.\n", "Where are the \"good\" points on the plot, and where are the \"bad\" ones?\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "pylab.scatter(p, r)\n", "# Top-right of the chart is \"good\": precision and recall are both high (close to 1).\n", "# Bottom-left is \"bad\": both precision and recall are low." ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 16, "text": [ "" ] }, { "metadata": {}, "output_type": "display_data", "png": "
\n", "text": [ "" ] } ], "prompt_number": 16 }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. 
The Python Levenshtein library provides another metric of string similarity called \"ratio\" (use `L.ratio(s1, s2)`).\n", "`ratio` gives a similarity score between 0 and 1, with higher meaning more similar.\n", "Add a column to \"prod\" with the `ratio` similarities of the **name** columns, and redo the precision/recall tradeoff analysis with the new metric.\n", "(Note: you will have to alter the `accuracy` function and the threshold range.)\n", "On this data, does `Levenshtein.ratio` do better than `Levenshtein.distance`?" ] }, { "cell_type": "code", "collapsed": false, "input": [ "prod['ratio'] = prod.apply(lambda r: L.ratio(r['name_x'], r['name_y']), axis=1)\n", "prod[:10]" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>id_x</th><th>cluster_x</th><th>name_x</th><th>city_x</th><th>id_y</th><th>cluster_y</th><th>name_y</th><th>city_y</th><th>distance</th><th>ratio</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>1</th><td>560</td><td>453</td><td>2223</td><td>san francisco</td><td>781</td><td>675</td><td>103 west</td><td>atlanta</td><td>8</td><td>0.166667</td></tr>\n",
"    <tr><th>9</th><td>560</td><td>453</td><td>2223</td><td>san francisco</td><td>561</td><td>454</td><td>acquarello</td><td>san francisco</td><td>10</td><td>0.000000</td></tr>\n",
"    <tr><th>12</th><td>560</td><td>453</td><td>2223</td><td>san francisco</td><td>708</td><td>602</td><td>afghan kebab house</td><td>new york city</td><td>18</td><td>0.000000</td></tr>\n",
"    <tr><th>20</th><td>560</td><td>453</td><td>2223</td><td>san francisco</td><td>782</td><td>676</td><td>alon's at the terrace</td><td>atlanta</td><td>21</td><td>0.000000</td></tr>\n",
"    <tr><th>24</th><td>560</td><td>453</td><td>2223</td><td>san francisco</td><td>762</td><td>656</td><td>andre's french restaurant</td><td>las vegas</td><td>25</td><td>0.000000</td></tr>\n",
"    <tr><th>28</th><td>560</td><td>453</td><td>2223</td><td>san francisco</td><td>640</td><td>534</td><td>apple pan the</td><td>west la</td><td>14</td><td>0.000000</td></tr>\n",
"    <tr><th>33</th><td>560</td><td>453</td><td>2223</td><td>san francisco</td><td>709</td><td>603</td><td>arcadia</td><td>new york city</td><td>7</td><td>0.000000</td></tr>\n",
"    <tr><th>40</th><td>560</td><td>453</td><td>2223</td><td>san francisco</td><td>641</td><td>535</td><td>asahi ramen</td><td>west la</td><td>11</td><td>0.000000</td></tr>\n",
"    <tr><th>47</th><td>560</td><td>453</td><td>2223</td><td>san francisco</td><td>642</td><td>536</td><td>baja fresh</td><td>westlake village</td><td>10</td><td>0.000000</td></tr>\n",
"    <tr><th>48</th><td>560</td><td>453</td><td>2223</td><td>san francisco</td><td>783</td><td>677</td><td>baker's cajun cafe</td><td>atlanta</td><td>18</td><td>0.000000</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>10 rows \u00d7 10 columns</p>\n",
"</div>
\n", "" ], "metadata": {}, "output_type": "pyout", "prompt_number": 17, "text": [ " id_x cluster_x name_x city_x id_y cluster_y \\\n", "1 560 453 2223 san francisco 781 675 \n", "9 560 453 2223 san francisco 561 454 \n", "12 560 453 2223 san francisco 708 602 \n", "20 560 453 2223 san francisco 782 676 \n", "24 560 453 2223 san francisco 762 656 \n", "28 560 453 2223 san francisco 640 534 \n", "33 560 453 2223 san francisco 709 603 \n", "40 560 453 2223 san francisco 641 535 \n", "47 560 453 2223 san francisco 642 536 \n", "48 560 453 2223 san francisco 783 677 \n", "\n", " name_y city_y distance ratio \n", "1 103 west atlanta 8 0.166667 \n", "9 acquarello san francisco 10 0.000000 \n", "12 afghan kebab house new york city 18 0.000000 \n", "20 alon's at the terrace atlanta 21 0.000000 \n", "24 andre's french restaurant las vegas 25 0.000000 \n", "28 apple pan the west la 14 0.000000 \n", "33 arcadia new york city 7 0.000000 \n", "40 asahi ramen west la 11 0.000000 \n", "47 baja fresh westlake village 10 0.000000 \n", "48 baker's cajun cafe atlanta 18 0.000000 \n", "\n", "[10 rows x 10 columns]" ] } ], "prompt_number": 17 }, { "cell_type": "code", "collapsed": false, "input": [
"def accuracy_ratio(min_ratio):\n",
"    # Predicted matches: pairs whose name similarity exceeds min_ratio.\n",
"    similar = prod[prod.ratio > min_ratio]\n",
"    # A predicted match is correct when both records share a cluster id.\n",
"    correct = float(sum(similar.cluster_x == similar.cluster_y))\n",
"    precision = correct / len(similar)\n",
"    recall = correct / len(clusters)\n",
"    return (precision, recall)\n",
"\n",
"# Integer steps 1..9 stand for ratio thresholds 0.1..0.9.\n",
"thresholds = range(1, 10)\n",
"p = []\n",
"r = []\n",
"\n",
"for t in thresholds:\n",
"    acc = accuracy_ratio(float(t)/10)\n",
"    p.append(acc[0])\n",
"    r.append(acc[1])\n",
"\n",
"pylab.plot(thresholds, p)\n",
"pylab.plot(thresholds, r)\n",
"pylab.legend(['precision', 'recall'], loc=2)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 18, "text": [ "" ] }, { "metadata": {}, "output_type": "display_data", "png": 
"iVBORw0KGgoAAAANSUhEUgAAAXIAAAEACAYAAACuzv3DAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XtYVOXePvAbBBVBOShyJhRkQFFEQVIz0cwDCeahQisV\nUdmWlZnltrcDuc20tJ2Ktd39TLMS7ShiiIqvqIlIaomvFgcTORlyVBTkMKzfHytGiWEYYGDN4f5c\n11zMOItZX5Zx+/Ss9X2WkSAIAoiISGcZS10AERG1D4OciEjHMciJiHQcg5yISMcxyImIdByDnIhI\nx7UY5AsWLICdnR0GDx7c7DYvvvgiBgwYAF9fX/zyyy8aLZCIiFRrMcjDw8ORkJDQ7Pvx8fHIyspC\nZmYm/vvf/2LJkiUaLZCIiFRrMcjHjBkDa2vrZt/fv38/5s2bBwAIDAxEeXk5CgsLNVchERGp1O45\n8vz8fLi4uCheOzs7Iy8vr70fS0REatLIyc6/d/kbGRlp4mOJiEgNJu39ACcnJ+Tm5ipe5+XlwcnJ\nqcl2bv3dcO3qtfbujojIoLi7uyMrK0vlNu0O8tDQUERHRyMsLAwpKSmwsrKCnZ1dk+2uXb3WZOSu\njaKiohAVFdXs+/VCPUqrSlFQUYCCigJcr7h+7/nte8//vP0nenXrBceejnDo6QDHno5wtHBs/Lqn\nI+wt7NG1S1eN16ktdKFOXagRYJ2apit1qjPD0WKQz549G8ePH0dxcTFcXFzwzjvvoLa2FgAQGRmJ\n4OBgxMfHw8PDA+bm5tixY0f7K9dixkbG6NOjD/r06IMhdkOa3a5eqEdJZYki2BuC/lLRJRz544ji\ndeHtQlh2t1QEu6NF46BveNiZ28G0i2kn/qREpCtaDPKYmJgWPyQ6OlojxegTYyNj2JrbwtbcFr72\nvs1uJ6+Xo7iyuMmIPq0wDYeuHFK8vnHnBmzMbMQRvYUDci/noiCuAD279kTPbj2VfrXoatHoz8xN\nzXn+gkgPtXtqRd8EBQV16v66GHeBnYUd7Czs4Ae/ZreT18tRVFmkmM45XX0aLg4uqKipQEV1BfIr\n8lFRXCG+rqnA7ZrbqKiuULxfUVOBu3V3YW5q3ijcLbpaNP5HQI1/EBq+mpmYtfgPQ2cfz7bQhRoB\n1qlpulKnOow668YSRkZGOjFHrs/k9XIx4P8K9/ufK/va0vu18lpYdLVoNuitulmhv3V/yPrI4Nnb\nE25WbjAx5tiBqDXUyU7Jg9zGxgZlZWWdUQL9jbW1NUpLS9v8/XX1dUpH/g1fy6rKcKXsCtJL0pFe\nnI7CO4XoZ9UPnr09IestUwS8rLcMfXr04bQPkRI6EeQcqUuns499VW0VskqzkF6SjoySDEXAp5ek\nwwhGYqj3kUHW+17Ae9h4wMzUrNNqJNI2DHJSSVuOvSAIKK4svhfwf4V7RkkG/ij7A/YW9uLo3aZx\n0LtYusDYiAt4kn5jkJNKunDs6+rrkF2e3STg00vSUVZVhgG9B9ybqmkYyfeRwaq7ldSlE2kEg5xU\n0vVjX1FdgczSTKQX3zdV81fQ9zDtoTTg+1v3b1MDFpFUGORabsmSJXBycsIbb7yhcjsfHx98/PHH\nePjhhzW6f3099oIg4Prt60oDPvdmLlwsXRrNw3vbemOo/VD06tZL6tKJmmCQk0qGeOxr5DW4UnpF\nEfAZJRm4VHQJaYVpcOnlguGOw+Hv4I/hjsPhZ++Hnt16Sl0yGTgGeSeoq6uDiYluXhut68dek+rq\n6/Bb0W84W3AW566fw9mCs7h44yJcLV3h7+iP4Q7D4e/oj6H2Q2HR1ULqcsmAqPN7ylP+zXBzc8O6\ndeswaNAg2NjYYMGCBaiurkZSUhKcnZ3x/vvvw8HBARERERAEAevWrYOHhwf69OmDp556qtG18T/9\n9BNGjRoFa2truLq6YteuXQCA+fPn48033wQAFBcXY+rUqbC2t
kbv3r0bTaO4ubnh6NGjAIDq6mos\nW7YMTk5OcHJywssvv4yamhoAUNT24Ycfws7ODo6Ojti5c2cnHTHdZmJsgsF2gxHuF47o4GikLExB\n+cpy7Jm5B+PcxiGjJAOvHH4FfT/oi0EfD8LcH+Zi85nNSM5NRmVtpdTlk4HTzaFkJ9m9ezcOHz6M\nHj16ICQkBGvWrMGECRNQWFiIsrIy5OTkQC6XY/Pmzdi/fz9OnDgBW1tbvPDCC3j++eexe/duXLt2\nDcHBwfj0008xa9Ys3Lx5U7Hsr5GRkaIJZuPGjXBxcUFxcTEAICUlRVHH/du9++67SE1NxYULFwAA\n06ZNw5o1a7B69WoAQGFhIW7duoWCggIcPnwYs2bNwvTp02Fpadlpx01fmHYxha+9L3ztfbHAbwEA\noFZei0tFl8SRe8E5fJH2BS7duAR3G3fFqH24w3D42vuih2kPiX8CMhhCJ2luVy2VAGjm0Vpubm7C\ntm3bFK/j4+MFd3d3ISkpSejatatQXV2teM/b21s4evSo4nVBQYFgamoq1NXVCWvXrhVmzJihdB/z\n588X3nzzTUEQBOGtt94Spk2bJmRlZSmtpeHz3d3dhYMHDyreO3TokODm5iYIgiAcO3ZMMDMzE+Ry\nueL9vn37CmfOnFG6/07869dr1XXVwrmCc8K2s9uExfsXC8O2DRPM1pgJgz8eLITvCxeiz0QLKbkp\nQmVNpdSlkg5S5/dU60fkUk7h3n8LO1dXVxQUFAAAbG1t0bXrvUvYsrOzMX36dBgb35upMjExQWFh\nIfLy8tC/f/9m9yH89QO++uqriIqKwsSJEwEAixcvxsqVK5tsX1BQgAceeEBpXQDQu3fvRnX06NED\nt2/fVvtnptbr2qUrhjkMwzCHYcBw8c+q66px8cZFnCsQ59u3/7Idvxf/Ds/enoqRu7+jPwbbDUZ3\nk+7S/gCk87Q+yKWUk5PT6LmjoyOApgu9u7q6YseOHRg5cmSTz3BxcUFqamqL+7KwsMCGDRuwYcMG\nXLp0CePHj8eIESMwbty4Rts5OjoiOzsb3t7eTeoi7dHNpJsirCMRCQC4W3cXFwsvKk6obju3DRkl\nGfDq43VvWsZxOAb3HYxuJt0k/glIlzDImyEIAj7++GNMnToVZmZmePfddxEWFqZ023/84x94/fXX\n8fnnn8PV1RVFRUU4ffo0QkND8fTTT2Pt2rX45ptvMH36dNy8eRN5eXnw9fVtdCb6wIED8PLygru7\nO3r16oUuXbo0Glk3mD17NtasWYOAgAAAwOrVq/Hss892zEEgjepu0h0BTgEIcApQ/FlVbRXSCtNw\n7vo5nMk/g4/PfozMkkx423orLoP0d/SHT18fNjJRsxjkzTAyMsKcOXMwceJEFBQU4PHHH8cbb7yB\nlJSUJiPyl156CYIgKLbt27cvwsLCEBoaChcXF8THx2PFihVYuHAhLC0t8e6778LX17fRScysrCy8\n8MILKCoqgrW1NZ5//nmMHTu2SV1vvPEGbt26hSFDxLsTPfnkk40airiCoG4xMzVDoHMgAp0DFX9W\nWVuJtMI0nC04i+TcZGxJ3YIrpVfwoPODeCnwJYTIQrjGDDXC68ib0a9fP2zfvh3jx4+XupQOo63H\nnpq6U3MHcRlx2JC8ARU1FVj+4HLM9Z3LlSENAK8jJ9IT5l3NEeYThp8X/Yz/Tv0vDmQegNsmN7yT\n9A6K7hRJXR5JjEFOpEOMjIww1m0s4mbHIWleEvJu5cEz2hNLDixBRkmG1OWRRDi1YsB47PXDn7f/\nxNbUrfjPuf/gIdeHsGLkCoxyGcXzJTpOEIBNm4CXX+ZaK6QCj71+uVNzB59f+Bwfnv4Qtua2WDFy\nBR73ehxdjLtIXRq1kiAAK1YAR44AFy8yyEkFHnv9JK+XIzY9Fh8kf4Abd25g+YPLMX/ofJh3NZe6\nNFKDXA5ERgKXLgHx8YCND
YOcVOCx13/JucnYkLwBJ3NOInJ4JJaOWAp7C3upy6Jm1NQAzzwDlJYC\n+/YBFha8aoXI4I1yGYXvn/oeyQuSUVpVCu+t3li4fyEuF12WujT6m8pKYNo0oLYWOHBADHF1cURu\nwHjsDU/RnSJ8cvYTbP15KwIcA7Bi1AqMfWAsT4xK7OZNYOpUoH9/YPt24P5bHHBEruWCgoKwfft2\nAMDOnTsxZswYiSsifWdrbou3xr6F7JeyMU02Df848A8EfBqAPf+3B3X1dVKXZ5Bu3ADGjQP8/IAd\nOxqHuLoY5BK6v0WfqDOZmZph0fBFuPz8Zbw99m18cvYTeGz2wEcpH6GiukLq8gxGbi7w8MPiaHzT\nJkDJ8kpqYZCroa6OIxXST8ZGxgiRheD4/OP4+omvcTrvNPpt6od/Jv4T+bfypS5Pr2VmAmPGAIsW\nAatXA+0Z0zHIm+Hm5ob3338fQ4YMQc+ePXHq1CnF7dqGDh2K48ePK7YtLS1FeHg4nJycYGNjg+nT\npwMAysrKMHXqVPTt2xc2NjYICQlBfj5/OUg7jXAagb2z9uLnRT+jqrYKgz8ZjHn75iGtME3q0vRO\nWhoQFAS88Qbwyivt/zwGuQp79uzBwYMHceXKFUybNg1vvfUWysrKsGHDBsycORMlJSUAgGeffRZ3\n797F5cuXcePGDSxfvhyAuBRuREQEcnJykJOTAzMzMyxdulTKH4moRf2s+2HTlE3IejELXr29MPnL\nyZj05SQcuXKEJ8c14PRp4NFHgX//G1i4UDOfqfXL2Bq9o5k5ZOHt1v0HaGRkhBdffBFOTk5Yv349\ngoODMXnyZADAhAkT4O/vjx9//BGPPvooEhISUFpaqrgvZsNJy/tH5wDw+uuv6/VqiqRfbMxssGrM\nKiwfuRy7L+7GskPLYGJsghUjV+Apn6e4PnobJCYCs2cDu3YBU6Zo7nO1PshbG8Ca1HCrt2vXruGb\nb75BXFyc4r26ujqMHz8eubm5sLGxUXpz48rKSrz88ss4dOgQysrKAAC3b9+GIAg8yUk6o5tJN4T7\nhWP+0Pk4dOUQNiRvwKqjq/BS4EtYPHwxLLvzxt7q+OEHsWPz++/FuXFN4tSKCg1h6+rqimeffRZl\nZWWKR0VFBV577TW4uLigtLQUN2/ebPL9GzduREZGBlJTU3Hz5k0cP34cgiDwf09JJxkZGWGyx2Qk\nzk1E3Ow4XCi8gH6b+uGVQ68g52ZOyx9gwHbtAp57DkhI0HyIAwxytTzzzDOIi4vD4cOHIZfLcffu\nXSQlJSE/Px8ODg6YMmUKnnvuOZSXl6O2thYnT54EII6+zczMYGlpidLSUrzzzjsS/yREmuHn4Icv\nZ3yJX//xK4yMjOC3zQ9Pf/80zl8/L3VpWmfLFvGk5rFjwLBhHbMPBrkanJ2dERsbi7Vr16Jv375w\ndXXFxo0bUV9fDwD44osvYGpqCi8vL9jZ2WHTpk0AgGXLlqGqqgp9+vTBqFGjMGXKlGanVHhNOeki\nV0tXbJi4AX+8+Af87P0wbc80jPh0BN498S7SCtMM+v8+BQFYswbYvBk4cQLw8uq4fbXYop+QkIBl\ny5ZBLpdj4cKFWLlyZaP3i4uL8cwzz+DPP/9EXV0dVqxYgfnz5zfdEVv0tQ6PPWlarbwWx68dR1x6\nHOIy4lAv1CPEMwQhshCMfWAsupl0k7rETiEIwKuvAocPiw/7dqxTps7vqcogl8vlkMlkSExMhJOT\nEwICAhATEwNvb2/FNlFRUaiursZ7772H4uJiyGQyFBYWwuRvfaYMcu3DY08dSRAEXCq6pAj1y0WX\n8aj7owjxDEHwgGD06dFH6hI7xP3L0P74I2Bj077Pa/daK6mpqfDw8ICbmxtMTU0RFhaG2NjYRts4\nODjg1q1bAIBbt26hd+/eTUKciAyPkZERfPr6YNWYVUiOSEbGCxkI9gjGvt/3wX2zOx767CG
8f+p9\n/Fb0m94MKGpqxMsLs7PFm0K0N8TVpTJx8/PzFZfgAeJc8ZkzZxpts2jRIowfPx6Ojo6oqKjA119/\n3TGVEpFO62veF+F+4Qj3C8fdurs4dvUY4jLiMPHLiejWpRtCZaEI8QzBQ64PwbSLqdTltlplJTBz\nJtCtm7gMbffunbdvlUGuzsm3tWvXYujQoUhKSsKVK1fw6KOP4sKFC+jZs2eTbaOiohTPg4KCEBQU\n1OqCiUj3dTfpjikDpmDKgCnYGrwVFwovYH/6fryW+BqulF7BZI/JCPEMwWSPybA2s5a63BY1LEPb\nrx/w2WdtW8GwQVJSEpKSklr1PSrnyFNSUhAVFYWEhAQAwHvvvQdjY+NGJzyDg4PxP//zPxg9ejQA\n4JFHHsH69evh7+/feEecI9c6PPakjQoqCnAg4wDiMuJwPPs4/B39FSdMPWw8pC6viaIiYNIkYPTo\n9q1g2Jx2n+ysq6uDTCbD0aNH4ejoiBEjRjQ52bl8+XJYWlri7bffRmFhIYYPH460tDTY/G1yiEGu\nfXjsSdtV1lbi6B9HsT99Pw5kHoBVdyuEeoYiRBaCkc4jJb+xdF6euG7KrFntX8GwOe0OcgA4ePCg\n4vLDiIgIrFq1Ctu2bQMAREZGori4GOHh4cjJyUF9fT1WrVqFOXPmqF2MjY2Non2dOpe1tTVKS0ul\nLoNILfVCPc4VnMP+9P2Iy4hDfkU+ggcEI8QzBBPdJ6JXt16dWk9mJjBxIvD88+Id7zuKRoK8M4sh\nIlJXzs0cHMg4gP3p+5Gcm4wHnR9UnDB9wOqBDt33xYvA5MlAVJS4nnhHYpATkUGoqK7AkT+OYH/6\nfsRnxsPewl4R6gFOATA20tzEdUqKeJPkzZuBp57S2Mc2i0FORAZHXi/HmfwziEuPw/6M/SipLMFU\nz6kI8QzBhP4TYN7VvM2fffSoeJ34zp1AcLDmalaFQU5EBu+Psj8Uof5z/s8Y88AYhHqGYpTLKHjY\neMDM1Eytz4mNFadRvv1WvM9mZ2GQExHdp/xuOQ5lHUJcRhzOXz+Pq+VXYWduB1kfGWS9ZfDs7an4\n6mLpopiS+eIL4LXXxEaf4cM7t2YGORGRCnX1dbhWfg3pJelIL05HRkmG+LwkHeV3y+Fh44EuZZ64\nkirD64tlCBrsCVkfGay6W3VajQxyIqI2qqi+jX9uyMB3xzIwe2k6CuX3gr6HaY9Go3dZbxlkfWTo\nb91f47fAY5ATEbWBIAArVwIHD4rL0Do43P+egOu3r4uhXiyO3hsCPvdmLlwsXZpM08j6yOBg4dCm\new4wyImIWkkuB5YsAdLSgPj41q1gWCOvwR9lfzQJ+IySDFTWVt4bvd8X8ANsBqBnt6ZrUzVgkBMR\ntUJNDTB3rrh+SmwsYGGhuc8uqypDRklGo3n4jJIMZJZkwtrMusk0jWdvT7hZucG0iymDnIhIHZWV\nwBNPAKamwJ49nbcMbb1Qj9ybufcCvjgdGaXitM2ft/9E9ZvVDHIiopbcugWEhACuruIytKZashx6\nVW0VenTt0b47BBER6bviYmD8eGDwYODzz7UnxAGo3azEICcig5WfL3ZpTp4MbNmi+bXEO4uOlk1E\n1D5ZWcCYMUB4OLBmTcesJd5ZeJdkIjI4//d/4ij87bc7fhnazsCTnURkUAQBGDkSiIjQjRBXJzs5\ntUJEBuWnn4DSUmDBAqkr0RwGOREZlA0bgOXLgS7S3u5Tozi1QkQGIz1dvEolOxswU+/KPslxaoWI\n6D4bNwLPPac7Ia4ujsiJyCAUFgJeXkBGBmBrK3U16uOInIjoL1u3AmFhuhXi6uKInIj03p07QL9+\nwKlTwIABUlfTOhyRExFBvOv9Qw/pXoiriyNyItJrcjng6SneQHnUKKmraT2OyInI4P3wA2Bvr5sh\nri4GORHpLUEAPvgAWLFC6ko6FoOciPTWTz8BZWVAaKj
UlXQsBjkR6S19bMdXhic7iUgv6WI7vjI8\n2UlEBktf2/GV4YiciPSOrrbjK8MROREZJH1ux1eGI3Ii0iu63I6vDEfkRGRw9L0dX5kWgzwhIQFe\nXl4YMGAA1q9fr3SbpKQk+Pn5wcfHB0FBQZqukYhILXI58OGH+t8A9Hcqp1bkcjlkMhkSExPh5OSE\ngIAAxMTEwNvbW7FNeXk5Ro8ejUOHDsHZ2RnFxcXo06dP0x1xaoWIOti33wL//rc4raIv2j21kpqa\nCg8PD7i5ucHU1BRhYWGIjY1ttM3u3bsxc+ZMODs7A4DSECci6miG0o6vjMogz8/Ph4uLi+K1s7Mz\n8vPzG22TmZmJ0tJSjBs3Dv7+/vjiiy86plIiIhVOnQJKS/W/HV8ZE1VvGhkZtfgBtbW1OH/+PI4e\nPYrKykqMHDkSDz74IAYY0pkGIpLcBx8Ar7yi/+34yqgMcicnJ+Tm5ipe5+bmKqZQGri4uKBPnz4w\nMzODmZkZHn74YVy4cEFpkEdFRSmeBwUF8cQoEWlEejqQkgLExEhdSfslJSUhKSmpVd+j8mRnXV0d\nZDIZjh49CkdHR4wYMaLJyc7ff/8dS5cuxaFDh1BdXY3AwEDs3bsXAwcObLwjnuwkog4SGQk4OAD3\njRX1hjrZqXJEbmJigujoaEyaNAlyuRwRERHw9vbGtm3bAACRkZHw8vLC5MmTMWTIEBgbG2PRokVN\nQpyIqKMUFgLffCOOyg0VOzuJSKe99RZQVAR88onUlXQMdbKTQU5EOquyEnBz0592fGXYok9Eem3H\nDsNrx1eGI3Ii0klyOeDpCXzxhX7fWJkjciLSW/v2AXZ2+h3i6mKQE5HOaWjHf/VVqSvRDgxyItI5\np04BJSWG2Y6vDIOciHTOhg3A8uWG2Y6vDE92EpFOSU8HHn4YuHoV6NFD6mo6Hk92EpHe+fBDYMkS\nwwhxdXFETkQ648YNwMtLHJUbzI2VOSInIn0SHQ089ZThhLi6OCInIp3Q0I7/009iI5Ch4IiciPTG\njh3A6NGGFeLq4oiciLSeobTjK8MRORHpBbbjq8YgJyKt1tCOv2KF1JVoLwY5EWm1hnb8adOkrkR7\nMciJSKuxHb9lPNlJRFrL0NrxleHJTiLSaWzHVw9H5ESklQyxHV8ZjsiJSGexHV99HJETkdYx1HZ8\nZTgiJyKdtHMn2/FbgyNyItIqcjkgkwG7drGTE+CInIh00L59QN++DPHWYJATkdZgO37bMMiJSGuw\nHb9tGOREpDXYjt82PNlJRFqB7fjK8WQnEekMtuO3HUfkRCS5GzfESw7T08UrVugejsiJSCds3Sq2\n4zPE24YjciKSFNvxVeOInIi0Htvx248jciKSTEM7/uefi2FOTWlkRJ6QkAAvLy8MGDAA69evb3a7\nn3/+GSYmJvj+++9bXykRGaSGdnyGePuoDHK5XI6lS5ciISEBly9fRkxMDH777Tel261cuRKTJ0/m\nqJuI1MJ2fM1RGeSpqanw8PCAm5sbTE1NERYWhtjY2CbbbdmyBbNmzYItV4AnIjUlJ7MdX1NUBnl+\nfj5cXFwUr52dnZGfn99km9jYWCxZsgSAOJ9DRNSSDz5gO76mmKh6U51QXrZsGdatW6eYkOfUChG1\nJD1dHJHv3i11JfpBZZA7OTkhNzdX8To3NxfOzs6Ntjl37hzCwsIAAMXFxTh48CBMTU0RGhra5POi\noqIUz4OCghAUFNSO0olIV7Edv3lJSUlISkpq1feovPywrq4OMpkMR48ehaOjI0aMGIGYmBh4e3sr\n3T48PBwhISGYMWNG0x3x8kMiAtvxW0ud7FQ5IjcxMUF0dDQmTZoEuVyOiIgIeHt7Y9u2bQCAyMhI\nzVVLRAaB7fiax4YgIuo0bMdvPbboE5FWYTt+x+CInIg6Bdvx24YjciLSGvv2Aba2wKhRUleifxjk\nRNThGtrxX30VYM+
g5jHIiajDJScDxcVsx+8oDHIi6nBsx+9YPNlJRB0qPR0YMwbIzmYnZ1vwZCcR\nSe7f/2Y7fkfjiJyIOsyNG4CXF/D77+zkbCuOyIlIUlu3Ak8+yRDvaByRE1GHaGjHP3lSbASituGI\nnIgks3On2PzDEO94HJETkcaxHV9zOCInIknExrIdvzMxyIlIo9iO3/kY5ESkUcnJQFER2/E7E4Oc\niDRqwwa243c2nuwkIo3JyAAeeojt+JrEk51E1GmqqoAFC8TROEO8c3FETkTtVl8v3lDZxAT46ivA\nmENEjVEnO006qRYi0mOvviquq3L4MENcCgxyImqXzZuB+Hjg1CmgWzepqzFMDHIiarMffgDWrxdD\n3MZG6moMF4OciNrk9Glg8WIgIUFcHIukw9ksImq1zExg+nRxLZXhw6WuhhjkRNQqRUXAlCnAv/4F\nBAdLXQ0BvPyQiFqhshIYPx6YMAFYs0bqagyDOtnJICcitcjlwMyZQM+ewK5dXBCrs/A6ciLSCEEA\nli0DKiqAr79miGsbBjkRtejDD4GkJOCnn4CuXaWuhv6OQU5EKn39NfDRR+LytJaWUldDyjDIiahZ\nJ08CS5cCR44ALi5SV0PN4eWHRKTU778Ds2YBX34J+PpKXQ2pwiAnoib+/FO8Rnz9emDiRKmroZYw\nyImokTt3gKlTgXnzgPnzpa6G1MHryIlIoa4OePxxoG9fYPt2XmaoDTR2h6CEhAR4eXlhwIABWL9+\nfZP3v/rqK/j6+mLIkCEYPXo00tLS2lYxEUlGEMQTm7W1wLZtDHFd0uJVK3K5HEuXLkViYiKcnJwQ\nEBCA0NBQeHt7K7bp378/Tpw4AUtLSyQkJGDx4sVISUnp0MKJSLPWrwdSUoATJwBTU6mrodZocUSe\nmpoKDw8PuLm5wdTUFGFhYYiNjW20zciRI2H51wWmgYGByMvL65hqiahD7N4NfPKJeIOIXr2kroZa\nq8Ugz8/Ph8t9F5A6OzsjPz+/2e23b9+OYC6JRqQzjh0T2+9//BFwdJS6GmqLFqdWjFoxUXbs2DF8\n9tlnOHXqlNL3o6KiFM+DgoIQFBSk9mcTkeZdugSEhQF79wI+PlJXQwCQlJSEpKSkVn1Pi0Hu5OSE\n3Nxcxevc3Fw4Ozs32S4tLQ2LFi1CQkICrK2tlX7W/UFORNIqKAAee0xcR2XcOKmroQZ/H+S+8847\nLX5Pi1Mr/v7+yMzMRHZ2NmpqarB3716EhoY22iYnJwczZszAl19+CQ8Pj9ZXTkSdqqJCDPHFi4Gn\nn5a6GmrEdGaDAAALSUlEQVSvFkfkJiYmiI6OxqRJkyCXyxEREQFvb29s27YNABAZGYnVq1ejrKwM\nS5YsAQCYmpoiNTW1YysnojaprQWeeAIYMQJYtUrqakgT2BBEZEAEAVi4UGzBj40FTLhsntbjjSWI\nqJE1a4ALF8S1xRni+oN/lUQG4vPPgc8+A06fBiwspK6GNIlTK0QG4MgR4JlnxJH4fU3ZpAM4tUJE\nSEsTr0z57juGuL7iMrZEeiwvT1ySdssWYMwYqauhjsIgJ9JTN28CU6YAL74IPPWU1NVQR+IcOZEe\nqqkR7/Dj5SWOxrkkre5SJzsZ5ER6RhDEO/vcvCnOi3fpInVF1B482UlkgN56S7xx8rFjDHFDwSAn\n0iP/7/8BMTFAcjLQo4fU1VBn4dQKkZ44eBBYsEC8w8+AAVJXQ5rCqRUiA3H+vHjX+9hYhrgh4uWH\nRDru2jUgJAT4z3+AkSOlroakwCAn0mFlZeK14q+9BsyYIXU1JBXOkRPpqOpqYNIkYNgw8S4/pJ94\nHTmRnqqvF9dPqa0Fvv4aMOb/W+stnuwk0lOvvw7k5ACJiQxxYpAT6ZxPPgF++EG8VtzMTOpqSBsw\nyIl0SFwc8K9/ASdPAr17S10NaQsGOZGO+PlnICICOHAAcHeXuhrSJpxdI9JyVVXiV
Mq0aWIL/ogR\nUldE2oYjciItVFEBxMeLqxceOgQMHw5s3QqEhkpdGWkjXn5IpCXKy4H9+8XwPnYMGD0amDlTHInb\n2kpdHUmF15ETabmiInF9lO++A06dAsaNE8M7JASwtpa6OtIGDHIiLXT9ujjn/e23wLlzYnfmzJni\nHX169pS6OtI2DHIiLXHtGvD99+LI+9Il4LHHxPCeNInrhpNqDHIiCWVlicH97bfA1aviXPfMmcAj\njwDdukldHekKBjlRJ7t8WQzu774DCguB6dPF8B47FjA1lbo60kUMcqIOJgjAr7+Kwf3dd8Dt22Jw\nz5wJjBrFe2ZS+zHIiTpAfb3YZfntt+K8N3AvvAMCuIgVaRZXPyTSELlcvDzwu+/E8LawAGbNEl/7\n+gJGRlJXSIaMQU7UjNpa4PhxceS9bx9gby+Oug8dAgYOlLo6onsY5ET3qa4W1/j+7juxy7J/fzG8\nf/oJ8PCQujoi5ThHTgavslIcZX/7rbi+iY+PGN4zZgCurlJXR4aOJzvJ4NXWip2UBQWNH/n5957n\n5oonKWfOFC8XdHCQumqiezQS5AkJCVi2bBnkcjkWLlyIlStXNtnmxRdfxMGDB9GjRw/s3LkTfn5+\nbSqGSF319cCNG6oDuqBAvMt8376AoyPg5CR+bXg0vHZ1BXr1kvonIlKu3VetyOVyLF26FImJiXBy\nckJAQABCQ0Ph7e2t2CY+Ph5ZWVnIzMzEmTNnsGTJEqSkpGjmJ5BAUlISgoKCpC6jRfpapyCIqwD+\nPZD/HtKFhYCVVdNQ9vdv/NrWtuVrufX1WEqFdXY+lUGempoKDw8PuLm5AQDCwsIQGxvbKMj379+P\nefPmAQACAwNRXl6OwsJC2NnZdVzVHUhX/nJ1sc47d1oO6IICsX397yNnmUxcGbDhtb090LWr5mvU\nZqxTs3SlTnWoDPL8/Hy4uLgoXjs7O+PMmTMtbpOXl6ezQa7r5HKgrk6cG66tvfe8pa+t2Vbd77l7\nV7xB8N69YkDX1Cif3ggIaPxn5uZSH0Ui3aIyyI3U7HL4+/xNc983apSqz2hpH+rU0f738/PFeyI2\nvG74nvu/asOflZcD0dFNw9PICDAxEdf1aO3XtnyPiQnQvbvyz+jaVexy/Oc/xYC2smLjDFGHEFQ4\nffq0MGnSJMXrtWvXCuvWrWu0TWRkpBATE6N4LZPJhD///LPJZ7m7uwsA+OCDDz74aMXD3d1dVUwL\ngiAIKkfk/v7+yMzMRHZ2NhwdHbF3717ExMQ02iY0NBTR0dEICwtDSkoKrKyslE6rZGVlqdoVERG1\nkcogNzExQXR0NCZNmgS5XI6IiAh4e3tj27ZtAIDIyEgEBwcjPj4eHh4eMDc3x44dOzqlcCIiEnVa\nQxAREXWMDl9wc8GCBbCzs8PgwYM7eldtlpubi3HjxmHQoEHw8fHB5s2bpS5Jqbt37yIwMBBDhw7F\nwIEDsWrVKqlLUkkul8PPzw8hISFSl9IsNzc3DBkyBH5+fhgxYoTU5TSrvLwcs2bNgre3NwYOHKiV\nvRrp6enw8/NTPCwtLbXyd+m9997DoEGDMHjwYMyZMwfV1dVSl6TUpk2bMHjwYPj4+GDTpk2qN25x\nFr2dTpw4IZw/f17w8fHp6F212fXr14VffvlFEARBqKioEDw9PYXLly9LXJVyd+7cEQRBEGpra4XA\nwEDh5MmTElfUvI0bNwpz5swRQkJCpC6lWW5ubkJJSYnUZbRo7ty5wvbt2wVBEP/uy8vLJa5INblc\nLtjb2ws5OTlSl9LI1atXhX79+gl3794VBEEQnnzySWHnzp0SV9XUxYsXBR8fH6Gqqkqoq6sTJkyY\nIGRlZTW7fYePyMeMGQNra+uO3k272NvbY+jQoQAACwsLeHt7o6CgQOKqlOvx1516a2pqIJfLYWNj\nI3FFyuXl5SE+Ph4LFy7U+qUZtL2+mzdv4uTJk
1iwYAEA8dyVpaWlxFWplpiYCHd390Y9JtqgV69e\nMDU1RWVlJerq6lBZWQknJyepy2ri999/R2BgILp3744uXbpg7Nix+L7hLiZK8F4mf5OdnY1ffvkF\ngYGBUpeiVH19PYYOHQo7OzuMGzcOA7V0YeyXX34ZH3zwAYy1/HY5RkZGmDBhAvz9/fHpp59KXY5S\nV69eha2tLcLDwzFs2DAsWrQIlZWVUpel0p49ezBnzhypy2jCxsYGr7zyClxdXeHo6AgrKytMmDBB\n6rKa8PHxwcmTJ1FaWorKykr8+OOPyMvLa3Z77f4t62S3b9/GrFmzsGnTJlhYWEhdjlLGxsb49ddf\nkZeXhxMnTiApKUnqkpo4cOAA+vbtCz8/P60f7Z46dQq//PILDh48iK1bt+LkyZNSl9REXV0dzp8/\nj+eeew7nz5+Hubk51q1bJ3VZzaqpqUFcXByeeOIJqUtp4sqVK/joo4+QnZ2NgoIC3L59G1999ZXU\nZTXh5eWFlStXYuLEiZgyZQr8/PxUDooY5H+pra3FzJkz8cwzz+Dxxx+XupwWWVpa4rHHHsPZs2el\nLqWJ5ORk7N+/H/369cPs2bPxv//7v5g7d67UZSnl8Neatba2tpg+fTpSU1MlrqgpZ2dnODs7IyAg\nAAAwa9YsnD9/XuKqmnfw4EEMHz4ctra2UpfSxNmzZzFq1Cj07t0bJiYmmDFjBpKTk6UuS6kFCxbg\n7NmzOH78OKysrCCTyZrdlkEOcY40IiICAwcOxLJly6Qup1nFxcUoLy8HAFRVVeHIkSNKlwyW2tq1\na5Gbm4urV69iz549GD9+PHbt2iV1WU1UVlaioqICAHDnzh0cPnxYK6+usre3h4uLCzIyMgCI88+D\nBg2SuKrmxcTEYPbs2VKXoZSXlxdSUlJQVVUFQRCQmJiotdOTN27cAADk5OTghx9+UDlV1eG3eps9\nezaOHz+OkpISuLi4YPXq1QgPD+/o3bbKqVOn8OWXXyouQwPES5QmT54scWWNXb9+HfPmzUN9fT3q\n6+vx7LPP4pFHHpG6rBapu2ZPZyssLMT06dMBiNMXTz/9NCZOnChxVcpt2bIFTz/9NGpqauDu7q61\njXd37txBYmKi1p5v8PX1xdy5c+Hv7w9jY2MMGzYMixcvlrospWbNmoWSkhKYmpri448/Ri8Vi+az\nIYiISMdxaoWISMcxyImIdByDnIhIxzHIiYh0HIOciEjHMciJiHQcg5yISMcxyImIdNz/B761DuPQ\n5kfZAAAAAElFTkSuQmCC\n", "text": [ "" ] } ], "prompt_number": 18 }, { "cell_type": "code", "collapsed": false, "input": [ "pylab.scatter(p, r)\n", "# Ratio appears to do a little better than distance---it achieves higher precision at reasonable levels of recall (say >0.8)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 19, "text": [ "" ] }, { "metadata": {}, "output_type": "display_data", "png": 
"iVBORw0KGgoAAAANSUhEUgAAAX0AAAEACAYAAABfxaZOAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHE5JREFUeJzt3X9UVOeBN/Dv6ExXFKuiBGGGFISRGQMM447StNKMq4A1\nXQya7IHavv5gfYlNjiZv1hNPbE/g7KlKmtONLX+EGBNXNxL2TaxogxNLmomNxtI11HiQpUjEDBNF\nUBGUyI/x2T9sZyXI/ALmMjzfzzk9h3vv81y+z/H0y82duTMqIYQAERFJYYLSAYiIKHhY+kREEmHp\nExFJhKVPRCQRlj4RkURY+kREEvFa+uvXr0dUVBRSUlKGHLNp0ybo9XqYTCbU1ta698fFxSE1NRVm\nsxkLFy4cmcRERBQwr6W/bt062Gy2IY9XVVXh/PnzaGxsxGuvvYaNGze6j6lUKtjtdtTW1qKmpmZk\nEhMRUcC8ln5GRgZmzJgx5PHDhw9jzZo1AID09HR0dHSgtbXVfZzPfhERjR3DvqfvdDoRGxvr3tbp\ndHA6nQDuXukvXboUFosFu3fvHu6vIiKiYVKPxEmGupr/+OOPERMTg7a2NmRmZsJgMCAjI2MkfiUR\nEQVg2KWv1WrhcDjc2y0tLdBqtQCAmJgYAEBkZCRyc3NRU1MzqPQTExPR1NQ03BhERFJJSEjA+fPn\n/Z437Ns7OTk52LdvHwDg1KlTmD59OqKiotDd3Y2uri4AwK1bt3Ds2LH7vgOoqakJQohx+78XX3xR\n8QxcH9cn29pkWF+gF8ter/Tz8/Px0Ucfob29HbGxsSguLkZfXx8AoLCwEMuXL0dVVRUSExMxZcoU\nvPnmmwCAy5cvY+XKlQCA/v5+rF69GllZWQGFJCKikeG19MvLy72epLS0dNC+OXPm4M9//nNgqYiI\naFTwidxRZrValY4wqri+0DWe1waM//UFSiWEUPSN9CqVCgpHICIKOYF2J6/0iYgkwtInIpIIS5+I\nSCIsfSIiibD0iYgkwtInIpIIS5+ISCIsfSIiibD0iYgkwtInIpIIS5+ISCIsfSIiibD0iYgkwtIn\nIpIIS5+ISCIsfSIiibD0iYgkwtInIpIIS5+ISCIsfSIiibD0iYgk4rX0169fj6ioKKSkpAw5ZtOm\nTdDr9TCZTKitrXXvt9lsMBgM0Ov1KCkpGZnEREQUMK+lv27dOthstiGPV1VV4fz582hsbMRrr72G\njRs3AgBcLheefvpp2Gw2nDt3DuXl5aivrx+55GNcf38/Dh48iP3796Orq0vpOEREAAC1twEZGRlo\nbm4e8vjhw4exZs0aAEB6ejo6Ojpw+fJlXLhwAYmJiYiLiwMA5OXlobKyEkajcUSCj2VdXV148MGH\n0NHRA2AK1OpN+K//ssNkMikdjYgkN+x7+k6nE7Gxse5tnU4Hp9OJL7/88r77ZfDEE/no6EgA0AKg\nCf39TyI7+wmlYxEReb/S94UQYljzi4qK3D9brVZYrdbhBVJYfX0zgJ8A0Px1zxO4evV15QIRUciz\n2+2w2+3DPs+wS1+r1cLhcLi3W1paoNPp0NfXN2C/w+GATqe77znuLf3xICUlAV98UQ6gAMA3ALyF\nyMiZCqciolD29Qvi4uLigM4z7Ns7OTk52LdvHwDg1KlTmD59OqKiomCxWNDY2Ijm5mb09vaioqIC\nOTk5w/11IeE//7Mcs2Y5AUQD0EGj2Yvq6oMKpyIi8uFKPz8/Hx999BHa29sRGxuL4uJi9PX1AQAK\nCwuxfPlyVFVVITExEVOmTMGbb75598RqNUpLS5GdnQ2Xy4WCggIpXsQFgMmTJ6O19Tx+//vf4+bN\nm8jOzkZYWJjSsYiIoBLDvSE/3AAq1bBfEyAikk2g3ckncomIJMLSJyKSCEufiEgiLH0iIomw9EfR\nF198gTNnzuD27dtKRyEiAsDSHxVCCPzkJ/8PSUnzkZHxQ8TFz
UNDQ4PSsYiIWPqj4ciRI9i37xhu\n3z6Prq46XLnyHJ54Yp3SsYiIWPqjoa6uDrdvLwcwHQAgRD4aG+uUDUVEBJb+qEhKSsKkSccA3AQA\nqFS/QXx8krKhiIjAJ3JHhRAC69c/hYqK30Cj0UKjuQK7vQrJyclKRyOicSLQ7mTpj6K//OUvqK+v\nh1arhclkgkaj8T6JiMgHLP0x5s6dO1iz5km8885BqNXTERkZhuPHjw758dJERP7gZ++MMfv378fB\ng5/h9u1m3LzZiC++WIk1a55SOhYRSY6lP0o+/fQsurtzAYQDUMHl+hHOnj2rdCwikhxLf5TMm6fH\n5Mk2AD0AgAkTDmPu3LnKhiIi6fGe/ijp7+/HihX5sNv/BLU6EmFhV3HixO+QkJCgdDQiGgf4Qu4Y\nJITA2bNncfPmTZhMJkyZMkXpSEQ0TrD0iYgkwnfvEBGRVyx9IiKJsPSJiCTC0icikghLn4hIIl5L\n32azwWAwQK/Xo6SkZNDx69evIzc3FyaTCenp6air+9/PjY+Li0NqairMZjMWLlw4ssmJiMhvHt+y\n6XK5kJSUhOrqami1WixYsADl5eUwGo3uMVu2bME3v/lN/OxnP0NDQwOeeuopVFdXAwDi4+Nx+vRp\nREREDB2Ab9kkIvLbqLxls6amBomJiYiLi4NGo0FeXh4qKysHjKmvr8fixYsB3P3ykObmZrS1tbmP\ns9CJiMYOj6XvdDoRGxvr3tbpdHA6nQPGmEwmHDx4EMDdPxIXL15ES0sLgLt/iZYuXQqLxYLdu3eP\ndHYiIvKT2tNBlUrl9QRbt27F5s2bYTabkZKSArPZjIkTJwIAPv74Y8TExKCtrQ2ZmZkwGAzIyMgY\ndI6ioiL3z1arFVar1b9VEBGNc3a7HXa7fdjn8XhP/9SpUygqKoLNZgMA7NixAxMmTMDzzz8/5Anj\n4+Nx9uxZhIeHD9hfXFyM8PBwPPfccwMDSHhPv6enB5cuXcLs2bMxadIkpeMQUQgalXv6FosFjY2N\naG5uRm9vLyoqKpCTkzNgzI0bN9Db2wsA2L17Nx555BGEh4eju7sbXV1dAIBbt27h2LFjSElJ8Tvg\neFNdXY1Zs3R46KEMzJqlxXvvVSkdiYgk4vH2jlqtRmlpKbKzs+FyuVBQUACj0YiysjIAQGFhIc6d\nO4e1a9dCpVIhOTkZe/bsAQC0trYiNzcXwN2PGV69ejWysrJGeTljW2dnJ3Jz83Hz5jsAHgHwCf7p\nn/4RFy/+N2bNmqV0PCKSAD9lM4hqa2thtf4fdHb+7zdoTZuWjqqqf8N3vvMdBZMRUajhp2yGAJ1O\nh97eFgDn/7rnInp6zuPBBx9UMhYRSYSlH0SRkZF45ZVfICzsO5g2LRthYQuxY0cxdDqd0tGISBK8\nvaOApqYmNDQ0IDExkd+bS0QB4TdnERFJhPf0iYjIK5Y+EZFEWPpERBJh6RMRSYSlT0QkEZY+EZFE\nWPpERBJh6RMRSYSlT0QkEZY+EZFEWPpERBJh6RMRSYSlT0QkEZY+EZFEWPpERBJh6RMRSYSlH6KE\nEOjo6OAX0BCRX1j6IejkyZOYNSsWDzwQi4iIGBw/flzpSEQUIvh1iSGmq6sLOp0enZ2vA/gBgGOY\nOvXHuHjxvzFjxgyl4xFRkIza1yXabDYYDAbo9XqUlJQMOn79+nXk5ubCZDIhPT0ddXV1Ps8l/50/\nfx7AA7hb+ACQhQkTHkRDQ4OCqYgoVHgsfZfLhaeffho2mw3nzp1DeXk56uvrB4zZvn075s+fjzNn\nzmDfvn3YvHmzz3PJf9HR0ejpcQBw/nVPK3p6LiA6OlrJWEQUIjyWfk1NDRITExEXFweNRoO8vDxU\nVlYOGFNfX4/FixcDAJKSktDc3IwrV674NJf8N3v2bBQV/RSTJ6dj6tQ8TJ5swdatz+Fb3/qW0tGI\nKASoPR10Op2IjY11b+t0O
vzxj38cMMZkMuHgwYNYtGgRampqcPHiRbS0tPg0lwKzdetzyMpajPr6\neiQl/QssFovSkYgoRHgsfZVK5fUEW7duxebNm2E2m5GSkgKz2YyJEyf6NPdvioqK3D9brVZYrVaf\n58pq/vz5mD9/vtIxiChI7HY77Hb7sM/jsfS1Wi0cDod72+FwQKfTDRgzdepUvPHGG+7t+Ph4JCQk\n4KuvvvI692/uLX0iIhrs6xfExcXFAZ3H4z19i8WCxsZGNDc3o7e3FxUVFcjJyRkw5saNG+jt7QUA\n7N69G4888gjCw8N9mktERMHl8UpfrVajtLQU2dnZcLlcKCgogNFoRFlZGQCgsLAQ586dw9q1a6FS\nqZCcnIw9e/Z4nEtERMrhw1lERCFo1B7OIiKi8YOlT0QkEZY+EZFEWPpERBJh6RMRSYSlT0QkEZY+\nEZFEWPpERBJh6RMRSYSlT0QkEZY+EZFEWPpERBJh6RMRSYSlT0QkEZY+EZFEWPpERBJh6RMRSYSl\nT0QkEZY+EZFEWPpERBJh6RMRSYSlT0QkEZY+EZFEvJa+zWaDwWCAXq9HSUnJoOPt7e1YtmwZ0tLS\nkJycjL1797qPxcXFITU1FWazGQsXLhzR4ERE5D+VEEIMddDlciEpKQnV1dXQarVYsGABysvLYTQa\n3WOKiorQ09ODHTt2oL29HUlJSWhtbYVarUZ8fDxOnz6NiIiIoQOoVPAQgYiI7iPQ7vR4pV9TU4PE\nxETExcVBo9EgLy8PlZWVA8ZER0ejs7MTANDZ2YmZM2dCrVa7j7PQiYjGDo+l73Q6ERsb697W6XRw\nOp0DxmzYsAF1dXWIiYmByWTCrl273MdUKhWWLl0Ki8WC3bt3j3B0IiLyl9rTQZVK5fUE27dvR1pa\nGux2O5qampCZmYkzZ85g6tSpOHHiBKKjo9HW1obMzEwYDAZkZGQMOkdRUZH7Z6vVCqvV6vdCiIjG\nM7vdDrvdPuzzeCx9rVYLh8Ph3nY4HNDpdAPGnDx5Etu2bQMAJCQkID4+Hg0NDbBYLIiOjgYAREZG\nIjc3FzU1NV5Ln4iIBvv6BXFxcXFA5/F4e8disaCxsRHNzc3o7e1FRUUFcnJyBowxGAyorq4GALS2\ntqKhoQFz5sxBd3c3urq6AAC3bt3CsWPHkJKSElBIIiIaGR6v9NVqNUpLS5GdnQ2Xy4WCggIYjUaU\nlZUBAAoLC/HCCy9g3bp1MJlMuHPnDl566SVERETg888/x8qVKwEA/f39WL16NbKyskZ/RURENCSP\nb9kMSgC+ZZOIyG+j8pZNIiIaX1j6REQSYekTEUmEpU9EJBGWPhGRRFj6REQSYekTEUmEpU9EJBGW\nPhGRRFj6REQSYekTEUmEpU9j3p49b2L27ERMnx6DJ598Bn19fUpHIgpZ/MA1GtNsNhtWrfq/6O5+\nB0AkwsI24MknF+CXv9yhdDQiRfED12hc+s1vqtDd/QyAhQDi8dVXL+Hdd3+rdCyikMXSpzFt5sxp\nUKs/v2fP55g2bZpieYhCHW/v0JjW2tqK1NR03LhhRX9/JP7u7/bivff+P79HmaQXaHey9GnMa2tr\nw/79+9Hd/RVycv4RqampSkciUhxLn4hIInwhl4iIvGLpExFJhKVPRCQRlj4RkURY+kREEvFa+jab\nDQaDAXq9HiUlJYOOt7e3Y9myZUhLS0NycjL27t3r81wiIgouj2/ZdLlcSEpKQnV1NbRaLRYsWIDy\n8nIYjUb3mKKiIvT09GDHjh1ob29HUlISWltboVKpvM4F+JZNIqJAjMpbNmtqapCYmIi4uDhoNBrk\n5eWhsrJywJjo6Gh0dnYCADo7OzFz5kyo1Wqf5hIRUXB5LH2n04nY2Fj3tk6ng9PpHDBmw4YNqKur\nQ0xMDEwmE3bt2uXzXCIiCi61p4MqlcrrCbZv3460tDTY7XY0NTUhMzMTZ86c8StEUVGR+2e
r1crP\nVSEi+hq73Q673T7s83gsfa1WC4fD4d52OBzQ6XQDxpw8eRLbtm0DACQkJCA+Ph4NDQ3Q6XRe5/7N\nvaVPRESDff2CuLi4OKDzeLy9Y7FY0NjYiObmZvT29qKiogI5OTkDxhgMBlRXVwO4+4mIDQ0NmDNn\njk9ziYgouDxe6avVapSWliI7OxsulwsFBQUwGo0oKysDABQWFuKFF17AunXrYDKZcOfOHbz00kuI\niIgAgPvOJSIi5fBTNomIQhA/ZZOIiLxi6RMRSYSlT0QkEZY+EZFEWPpERBJh6RMRSYSlT0QkEZY+\nEZFEWPpERBJh6RMRSYSlT0QkEZY+EZFEWPpERBJh6RMRSYSlT0QkEZY+EQXNyZMnYbH8A+bMMePZ\nZ7eit7dX6UjS4ZeoEFFQNDQ04O//fhFu3XoFwFyEhf0UP/yhHq+/Xqp0tJDEL1EhojHtyJEj6O3N\nA7AawAJ89dWbKC8vVzqWdFj6RBQUkyZNwsSJ1+7Zcw3f+MYkxfLIiqVPREGRn5+PadM+gVr9NIBf\nY/LkFSgufkHpWNLhPX0iCprW1lb88pe/wpUr1/DYY8uwYsUKpSOFrEC7k6VPRBSC+EIuERF55bX0\nbTYbDAYD9Ho9SkpKBh1/+eWXYTabYTabkZKSArVajY6ODgBAXFwcUlNTYTabsXDhwpFPT0REfvF4\ne8flciEpKQnV1dXQarVYsGABysvLYTQa7zv+t7/9LV555RVUV1cDAOLj43H69GlEREQMHYC3d4iI\n/DYqt3dqamqQmJiIuLg4aDQa5OXlobKycsjxBw4cQH5+/oB9LHQiorHDY+k7nU7Exsa6t3U6HZxO\n533Hdnd34/3338eqVavc+1QqFZYuXQqLxYLdu3ePUGQiIgqU2tNBlUrl84mOHDmCRYsWYfr06e59\nJ06cQHR0NNra2pCZmQmDwYCMjIxBc4uKitw/W61WWK1Wn38vEZEM7HY77Hb7sM/jsfS1Wi0cDod7\n2+FwQKfT3Xfs22+/PejWTnR0NAAgMjISubm5qKmp8Vr6REQ02NcviIuLiwM6j8fbOxaLBY2NjWhu\nbkZvby8qKiqQk5MzaNyNGzdw/PjxAQ9adHd3o6urCwBw69YtHDt2DCkpKQGFJCKikeHxSl+tVqO0\ntBTZ2dlwuVwoKCiA0WhEWVkZAKCwsBAAcOjQIWRnZyMsLMw9t7W1Fbm5uQCA/v5+rF69GllZWaO1\nDiIi8gGfyCUiCkF8IpeIiLxi6RMR+aG+vh5ZWSuRkrIIW7b8NOS+/Yu3d4iIfHTp0iUYjfPR2fk8\nhDAjLGwHVq6MxX/8R/CfQ+KnbBIRjbLXX38dmzd/iO7ut/665wbU6ij09HRjwoTg3jjhPX0iolGm\n0WigUt26Z083JkyY6NeDrEpj6RMR+WjFihWYOvUs1OpnAfw7Jk9+FJs3PxtSpc/bO0REfmhtbcW/\n/msJWlqu4NFHF+Of/3m9IqXPe/pERBLhPX0iIvKKpU9EJBGWPhGRRFj6REQSYekTEUmEpU9EJBGW\nPhGRRFj6REQSYekTEUmEpU9EJBGWPhGRRFj6REQSYekTEUmEpU9EJBGvpW+z2WAwGKDX61FSUjLo\n+Msvvwyz2Qyz2YyUlBSo1Wp0dHT4NJeIiIJMeNDf3y8SEhLEhQsXRG9vrzCZTOLcuXNDjj9y5IhY\nsmSJX3O9RAh5H374odIRRhXXF7rG89qEGP/rC7Q7PV7p19TUIDExEXFxcdBoNMjLy0NlZeWQ4w8c\nOID8/PyA5o5Xdrtd6QijiusLXeN5bcD4X1+gPJa+0+lEbGyse1un08HpdN53bHd3N95//32sWrXK\n77lERBQcHkvfn+99PHLkCBYtWoTp06f7PZeIiIJD7emgVquFw+FwbzscDuh0uvuOffvtt923dvyZ\nm5CQMO7/QBQXFysdYVRxfaFrPK8NGN/rS0hICGiexy9
G7+/vR1JSEj744APExMRg4cKFKC8vh9Fo\nHDDuxo0bmDNnDlpaWhAWFubXXCIiCh6PV/pqtRqlpaXIzs6Gy+VCQUEBjEYjysrKAACFhYUAgEOH\nDiE7O9td+J7mEhGRcjxe6RMR0fgS9Cdyr127hszMTMydOxdZWVnuB7nu5XA4sHjxYjz00ENITk7G\nr371q2DH9JsvD6Jt2rQJer0eJpMJtbW1QU44PN7W99Zbb8FkMiE1NRXf/e538dlnnymQMjC+PkT4\npz/9CWq1GgcPHgxiuuHzZX12ux1msxnJycmwWq3BDThM3tbX3t6OZcuWIS0tDcnJydi7d2/wQwZo\n/fr1iIqKQkpKypBj/O6VEXxWwCdbtmwRJSUlQgghdu7cKZ5//vlBYy5duiRqa2uFEEJ0dXWJuXPn\nenwoTGm+PIj23nvvie9///tCCCFOnTol0tPTlYgaEF/Wd/LkSdHR0SGEEOLo0aMhsz5fHyLs7+8X\nixcvFo8++qh45513FEgaGF/Wd/36dTFv3jzhcDiEEEK0tbUpETUgvqzvxRdfFFu3bhVC3F1bRESE\n6OvrUyKu344fPy4+/fRTkZycfN/jgfRK0K/0Dx8+jDVr1gAA1qxZg0OHDg0aM3v2bKSlpQEAwsPD\nYTQa8eWXXwY1pz98eRDt3nWnp6ejo6MDra2tSsT1my/re/jhhzFt2jQAd9fX0tKiRFS/+foQ4a9/\n/Ws8/vjjiIyMVCBl4HxZ34EDB7Bq1Sr3u+tmzZqlRNSA+LK+6OhodHZ2AgA6Ozsxc+ZMqNUeX84c\nMzIyMjBjxowhjwfSK0Ev/dbWVkRFRQEAoqKivAZsbm5GbW0t0tPTgxEvIL48iHa/MaFSjP4+aLdn\nzx4sX748GNGGzdd/u8rKSmzcuBFAaD2D4sv6Ghsbce3aNSxevBgWiwX79+8PdsyA+bK+DRs2oK6u\nDjExMTCZTNi1a1ewY46aQHplVP7cZWZm4vLly4P2//znPx+wrVKpPP4f6ObNm3j88cexa9cuhIeH\nj3jOkeJrCYivvWYeKuXhT84PP/wQb7zxBk6cODGKiUaOL2t75plnsHPnTqhUKgghBv07jmW+rK+v\nrw+ffvopPvjgA3R3d+Phhx/Gt7/9bej1+iAkHB5f1rd9+3akpaXBbrejqakJmZmZOHPmDKZOnRqE\nhKPP314ZldL/3e9+N+SxqKgoXL58GbNnz8alS5fwwAMP3HdcX18fVq1ahR/96Ed47LHHRiPmiPHl\nQbSvj2lpaYFWqw1axuHw9UG7zz77DBs2bIDNZvP4n6RjiS9rO336NPLy8gDcfVHw6NGj0Gg0yMnJ\nCWrWQPiyvtjYWMyaNQthYWEICwvD9773PZw5cyYkSt+X9Z08eRLbtm0DcPeBpvj4eDQ0NMBisQQ1\n62gIqFdG7BUHH23ZskXs3LlTCCHEjh077vtC7p07d8SPf/xj8cwzzwQ7XkD6+vrEnDlzxIULF0RP\nT4/XF3I/+eSTkHmhUwjf1nfx4kWRkJAgPvnkE4VSBsaXtd1r7dq14t133w1iwuHxZX319fViyZIl\nor+/X9y6dUskJyeLuro6hRL7x5f1Pfvss6KoqEgIIcTly5eFVqsVV69eVSJuQC5cuODTC7m+9krQ\nS//q1atiyZIlQq/Xi8zMTHH9+nUhhBBOp1MsX75cCCHEH/7wB6FSqYTJZBJpaWkiLS1NHD16NNhR\n/VJVVSXmzp0rEhISxPbt24UQQrz66qvi1VdfdY956qmnREJCgkhNTRWnT59WKmpAvK2voKBARERE\nuP+9FixYoGRcv/jyb/c3oVb6Qvi2vl/84hdi3rx5Ijk5WezatUupqAHxtr62tjbxgx/8QKSmpork\n5GTx1ltvKRnXL3l5eSI6OlpoNBqh0+nEnj17ht0rfDiLiEgi/LpEIiKJsPSJiCTC0icikghLn4hI\nIix9IiKJsPSJiCT
C0icikghLn4hIIv8DZWKfgA37zg4AAAAASUVORK5CYII=\n", "text": [ "" ] } ], "prompt_number": 19 } ], "metadata": {} } ] }