{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "In this notebook, we'll try to solve a problem that arose when I was working on the Kaggle What's Cooking challenge (see my previous posts on the subject [here](http://flothesof.github.io/kaggle-whats-cooking-machine-learning.html) and [here](http://flothesof.github.io/kaggle-whatscooking-bokeh-plots.html)). \n", "\n", "In this challenge, the data, which consists of recipes, contains a few quirks. For example, the data may contain brand names, as in *knorr garlic minicub*, or mention weights and quantities that should have been cleaned up beforehand, as in *(10 oz.) frozen chopped spinach, thawed and squeezed dry*.\n", "\n", "The goal of this notebook is therefore to develop a method that suggests better ingredient names in an automatic way, based on a given recipe.\n", "\n", "The outline of this blog post is as follows: we will first look at some overall statistics about the data, then develop a probabilistic model inspired by the principle behind spellcheckers and Google auto-correct as explained by [Peter Norvig](http://norvig.com/spell-correct.html), and finally apply it to the machine learning algorithm used in the previous posts, logistic regression." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Statistics about the data, at ingredient and word level" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this section, we'll do some basic data exploration. In particular, let's try to answer the following questions:\n", "\n", "- how many ingredients do we find in the data? how many are unique?\n", "- how many individual words make up these ingredients?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, let's load the data. We read it using codecs because, at the time of writing, pandas had no support for specifying the encoding in its JSON loading routine (see )." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import codecs\n", "df_train = pd.read_json(codecs.open('train.json', 'r', 'utf-8'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The head of the dataframe looks like this:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " cuisine id ingredients\n", "0 greek 10259 [romaine lettuce, black olives, grape tomatoes...\n", "1 southern_us 25693 [plain flour, ground pepper, salt, tomatoes, g...\n", "2 filipino 20130 [eggs, pepper, salt, mayonaise, cooking oil, g...\n", "3 indian 22213 [water, vegetable oil, wheat, salt]\n", "4 indian 13162 [black pepper, shallots, cornflour, cayenne pe..." ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_train.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, let's build a single list with all the ingredients found in the dataset." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [], "source": [ "all_ingredients_text = []\n", "for ingredient_list in df_train.ingredients:\n", " all_ingredients_text += [ing.lower() for ing in ingredient_list]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's have a look at the stats. We have the following number of ingredients in our recipes:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "428275" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(all_ingredients_text)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Among those, the number of unique ingredients is:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "6703" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(set(all_ingredients_text))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How about looking at these ingredients in terms of words? 
We can split each ingredient at a word boundary using a regexp and then count them:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "['KRAFT',\n", " 'Shredded',\n", " 'Pepper',\n", " 'Jack',\n", " 'Cheese',\n", " 'with',\n", " 'a',\n", " 'TOUCH',\n", " 'OF',\n", " 'PHILADELPHIA']" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import re\n", "re.split(re.compile('[,. ]+'), 'KRAFT Shredded Pepper Jack Cheese with a TOUCH OF PHILADELPHIA')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's do that and compute the number of words and then the unique words:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": true }, "outputs": [], "source": [ "splitter = re.compile('[,. ]+')\n", "all_words = []\n", "for ingredient in all_ingredients_text:\n", " all_words += re.split(splitter, ingredient)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "807802" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(all_words)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How about unique ones?" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "3152" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(set(all_words))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So to conclude, our dataset consists of:\n", "\n", "- 428275 ingredients\n", "- among which 6703 are unique\n", "- all these ingredients are made of 807802 words\n", "- among which 3152 are unique\n", "\n", "\n", "Let's now turn to the problems one can see with these ingredients." 
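, "

Before that, the counting steps of this section can be condensed into a single helper. This is only a sketch: the name `summarize` and the toy recipes are made up for illustration, and the splitter is the same `[,. ]+` regexp used above.

```python
import re

SPLITTER = re.compile('[,. ]+')

def summarize(recipes):
    'Returns (ingredients, unique ingredients, words, unique words) for a list of recipes.'
    # Flatten the recipes into one lowercase list of ingredient names.
    ingredients = [ing.lower() for recipe in recipes for ing in recipe]
    # Split every ingredient name into words.
    words = [word for ing in ingredients for word in SPLITTER.split(ing)]
    return len(ingredients), len(set(ingredients)), len(words), len(set(words))

# Toy data, not taken from the Kaggle dataset.
toy_recipes = [['Olive oil', 'salt'], ['olive oil', 'black pepper', 'Salt']]
print(summarize(toy_recipes))  # (5, 3, 8, 5)
```

Note that a name starting or ending with a separator, such as one ending in a period, yields an empty string as a word with this splitter."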
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Why some ingredients found in the data are strange" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The longest ingredient names found in the dataset are:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "['pillsbury™ crescent recipe creations® refrigerated seamless dough sheet',\n", " 'kraft mexican style shredded four cheese with a touch of philadelphia',\n", " 'bertolli vineyard premium collect marinara with burgundi wine sauc',\n", " 'kraft shredded pepper jack cheese with a touch of philadelphia',\n", " 'hidden valley® farmhouse originals italian with herbs dressing',\n", " 'hidden valley® original ranch salad® dressing & seasoning mix',\n", " 'condensed reduced fat reduced sodium cream of mushroom soup',\n", " 'hellmann’ or best food canola cholesterol free mayonnais',\n", " 'condensed reduced fat reduced sodium cream of chicken soup',\n", " \"i can't believ it' not butter! made with olive oil spread\",\n", " 'wish-bone light asian sesame ginger vinaigrette dressing',\n", " '(10 oz.) 
frozen chopped spinach, thawed and squeezed dry',\n", " 'kraft mexican style 2% milk finely shredded four cheese',\n", " 'kraft shredded low-moisture part-skim mozzarella cheese',\n", " 'hurst family harvest chipotle lime black bean soup mix',\n", " 'reduced fat reduced sodium tomato and herb pasta sauce',\n", " 'frozen orange juice concentrate, thawed and undiluted',\n", " 'crystal farms reduced fat shredded marble jack cheese',\n", " \"i can't believe it's not butter!® all purpose sticks\",\n", " 'sargento® traditional cut shredded mozzarella cheese',\n", " 'lipton sparkling diet green tea with strawberry kiwi',\n", " 'sargento® traditional cut shredded 4 cheese mexican',\n", " 'hidden valley® original ranch® spicy ranch dressing',\n", " 'hidden valley® greek yogurt original ranch® dip mix',\n", " 'conimex woksaus specials vietnamese gember knoflook',\n", " 'honeysuckle white® hot italian turkey sausage links',\n", " 'ragu old world style sweet tomato basil pasta sauc',\n", " 'sargento® artisan blends® shredded parmesan cheese',\n", " 'reduced fat reduced sodium cream of mushroom soup',\n", " 'frozen lemonade concentrate, thawed and undiluted',\n", " 'reduced sodium reduced fat cream of mushroom soup',\n", " 'condensed reduced fat reduced sodium tomato soup',\n", " 'shredded reduced fat reduced sodium swiss cheese',\n", " 'knorr reduc sodium chicken flavor bouillon cube',\n", " '2 1/2 to 3 lb. 
chicken, cut into serving pieces',\n", " 'kraft mexican style finely shredded four cheese',\n", " 'frozen chopped spinach, thawed and squeezed dry',\n", " \"frank's® redhot® original cayenne pepper sauce\",\n", " 'reduced sodium condensed cream of chicken soup',\n", " 'foster farms boneless skinless chicken breasts',\n", " 'knorr tomato bouillon with chicken flavor cube',\n", " 'pompeian canola oil and extra virgin olive oil',\n", " 'hidden valley® original ranch® light dressing',\n", " \"uncle ben's ready rice whole grain brown rice\",\n", " 'soy vay® veri veri teriyaki® marinade & sauce',\n", " 'taco bell® home originals® taco seasoning mix',\n", " 'pillsbury™ refrigerated crescent dinner rolls',\n", " 'kraft reduced fat shredded mozzarella cheese',\n", " 'reduced sodium italian style stewed tomatoes',\n", " 'skinless and boneless chicken breast fillet']" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sorted(set(all_ingredients_text), key=len, reverse=True)[:50]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What's going on here? Several observations can be made from this ingredient list:\n", "\n", "- some ingredients contain special symbols that are not relevant (trademark, registered trademark, ...)\n", "- some ingredients feature brand names, for instance *KRAFT, Pillsbury, Hidden Valley*, that I would say are not relevant to the identification of the ingredient itself and that I would thus want to exclude\n", "- some ingredients contain sentences of English words instead of ingredients: *i can't believ it' not butter! made with olive oil spread*\n", "- some ingredients contain spelling errors: *burgundi wine sauc* should actually be spelled *burgundy wine sauce*\n", "- some ingredients contain quantities that shouldn't be there: *2 1/2 to 3 lb. chicken, cut into serving pieces*\n", "\n", "Ideally, we want a function that returns a more relevant version of these ingredients. 
For example:\n", "\n", "| original ingredient | improved ingredient |\n", "|---------------------|---------------------|\n", "| pillsbury™ crescent recipe creations® refrigerated seamless dough sheet | refrigerated seamless dough sheet |\n", "| i can't believe it's not butter!® all purpose sticks | all purpose sticks |\n", "| 2 1/2 to 3 lb. chicken, cut into serving pieces | chicken pieces |\n", "\n", "These suggested improved ingredients are not unique and depend upon the knowledge of the person formulating them. The problem is that they're not easy to systematically generate. Therefore, it would be great if we could use a model based on the data itself to suggest them. This is what we're going to do in the next section." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Building a simple model " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Theory " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Given an ingredient, we have a choice to throw out some of its components (i.e. words). Judging if a component of an ingredient is useful or not is difficult. However, by following [Peter Norvig's approach on spell correcting](http://norvig.com/spell-correct.html) (or see these slides at Cornell University that explain Peter's approach in detail), we can formulate the problem of throwing out ingredient components in a probabilistic way. Although not stated in Norvig's writeup, this approach is close to something called a [Naive Bayes classifier](https://en.wikipedia.org/wiki/Naive_Bayes_classifier). \n", "\n", "Let's take a closer look at what we want to do. 
We want the improved ingredient that is most probable given our input ingredient:\n", "\n", "$$\n", "\\DeclareMathOperator*{\\argmax}{arg\\,max}\n", "\\argmax_{b}P\\left({b \\mid i}\\right)\n", "$$\n", "\n", "Here, I denoted by $b$ the better ingredient and by $i$ the original ingredient.\n", "\n", "The previous expression can be transformed according to Bayes' rule (following Peter Norvig) to yield:\n", "\n", "$$\n", "\\argmax_{b} {P\\left({i \\mid b}\\right) P(b) \\over P(i)}\n", "$$\n", "\n", "Given that the ingredient $i$ is fixed, the denominator is the same for every candidate $b$, so we neglect it (as Peter does) and end up with:\n", "\n", "$$\n", "\\argmax_{b} P\\left({i \\mid b}\\right) P(b)\n", "$$\n", "\n", "This leaves us with three parts that work together: \n", "\n", "- the probability of occurrence of the ingredient $b$ in our data, $P(b)$ (this will favor commonly used ingredients)\n", "- the error model $P(i \\mid b)$, which says how likely it is that the input ingredient $i$ is a modified version of the better ingredient $b$\n", "- the argmax, our control mechanism, which chooses the $b$ that gives the best combined probability score\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our error model will be that we can only delete words from our ingredient. So what we need is to generate a list of possible ingredients based on our original ingredient by removing words. Also, we will assume that word order doesn't matter, so we can represent an ingredient by a set." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Modelling ingredients using sets " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Ingredients** contain a fixed number of words. We will therefore model them as **frozensets** of **words**. This will allow us to manipulate them more easily in the rest of this document." 
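, "

To make the choice of `frozenset` concrete, here is a small sketch on made-up strings (none of the names below come from the rest of the notebook). Two texts that differ only in word order compare equal, and because a `frozenset` is hashable it can be used as a key in a `Counter` or a `dict`, which a plain `set` cannot.

```python
from collections import Counter

a = frozenset('olive oil'.split())
b = frozenset('oil olive'.split())

# Word order is ignored: both texts map to the same ingredient.
print(a == b)  # True

# frozensets are hashable, so they can be counted directly.
counts = Counter([a, b, frozenset(['salt'])])
print(counts[a])  # 2

# A mutable set is not hashable and cannot serve as a key.
try:
    Counter([set(['salt'])])
    set_is_hashable = True
except TypeError:
    set_is_hashable = False
print(set_is_hashable)  # False
```

One side effect to keep in mind: a set also collapses repeated words, so an ingredient such as *taco bell® home originals® taco seasoning mix* keeps a single *taco*."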
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's define a function that creates an ingredient from a text string:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import re\n", "\n", "def to_ingredient(text):\n", "    \"Transforms text into an ingredient.\"\n", "    return frozenset(re.split(re.compile('[,. ]+'), text))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's build a list of all our ingredients using this function:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [], "source": [ "all_ingredients = [to_ingredient(text) for text in all_ingredients_text]" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[frozenset({'lettuce', 'romaine'}),\n", " frozenset({'black', 'olives'}),\n", " frozenset({'grape', 'tomatoes'}),\n", " frozenset({'garlic'}),\n", " frozenset({'pepper'}),\n", " frozenset({'onion', 'purple'}),\n", " frozenset({'seasoning'}),\n", " frozenset({'beans', 'garbanzo'}),\n", " frozenset({'cheese', 'crumbles', 'feta'}),\n", " frozenset({'flour', 'plain'})]" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "all_ingredients[:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's now implement our algorithm given our model of an ingredient." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Implementation " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now write a function that generates all possible candidate ingredients that can be built from a starting ingredient, using `itertools.combinations` from the Python standard library." 
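, "

As a quick illustration of what `itertools.combinations` yields, here is a sketch on a made-up three-word ingredient (the helper name `count_candidates` is hypothetical). Choosing 1, 2, ... up to n words out of n distinct words enumerates every non-empty subset, so there are `2**n - 1` candidates.

```python
import itertools

def count_candidates(n_words):
    'Number of non-empty subsets of n_words distinct words.'
    return 2 ** n_words - 1

words = ['green', 'bell', 'pepper']

# Enumerate subsets of size 1, then 2, then 3.
subsets = [combo
           for size in range(1, len(words) + 1)
           for combo in itertools.combinations(words, size)]

print(len(subsets))  # 7
print(len(subsets) == count_candidates(len(words)))  # True
```

For the five-word examples used in this notebook, this gives 31 candidates, and the count roughly doubles with each extra word, so the longest ingredient names are the most expensive to enumerate."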
] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import itertools\n", "\n", "def candidates(ingredient):\n", "    \"Returns a list of candidate ingredients obtained from the original ingredient by keeping at least one of its words.\"\n", "    n = len(ingredient)\n", "    possible = []\n", "    for i in range(1, n + 1):\n", "        possible += [frozenset(combi) for combi in itertools.combinations(ingredient, i)]\n", "    return possible" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's see how this works on examples:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[frozenset({'pasta'}),\n", " frozenset({'herb'}),\n", " frozenset({'sauce'}),\n", " frozenset({'and'}),\n", " frozenset({'tomato'}),\n", " frozenset({'herb', 'pasta'}),\n", " frozenset({'pasta', 'sauce'}),\n", " frozenset({'and', 'pasta'}),\n", " frozenset({'pasta', 'tomato'}),\n", " frozenset({'herb', 'sauce'}),\n", " frozenset({'and', 'herb'}),\n", " frozenset({'herb', 'tomato'}),\n", " frozenset({'and', 'sauce'}),\n", " frozenset({'sauce', 'tomato'}),\n", " frozenset({'and', 'tomato'}),\n", " frozenset({'herb', 'pasta', 'sauce'}),\n", " frozenset({'and', 'herb', 'pasta'}),\n", " frozenset({'herb', 'pasta', 'tomato'}),\n", " frozenset({'and', 'pasta', 'sauce'}),\n", " frozenset({'pasta', 'sauce', 'tomato'}),\n", " frozenset({'and', 'pasta', 'tomato'}),\n", " frozenset({'and', 'herb', 'sauce'}),\n", " frozenset({'herb', 'sauce', 'tomato'}),\n", " frozenset({'and', 'herb', 'tomato'}),\n", " frozenset({'and', 'sauce', 'tomato'}),\n", " frozenset({'and', 'herb', 'pasta', 'sauce'}),\n", " frozenset({'herb', 'pasta', 'sauce', 'tomato'}),\n", " frozenset({'and', 'herb', 'pasta', 'tomato'}),\n", " frozenset({'and', 'pasta', 'sauce', 'tomato'}),\n", " frozenset({'and', 'herb', 'sauce', 'tomato'}),\n", " frozenset({'and', 'herb', 'pasta', 'sauce', 'tomato'})]" ] }, 
"execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "candidates(to_ingredient(\"tomato and herb pasta sauce\"))" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[frozenset({'flavor'}),\n", " frozenset({'chicken'}),\n", " frozenset({'cube'}),\n", " frozenset({'knorr'}),\n", " frozenset({'bouillon'}),\n", " frozenset({'chicken', 'flavor'}),\n", " frozenset({'cube', 'flavor'}),\n", " frozenset({'flavor', 'knorr'}),\n", " frozenset({'bouillon', 'flavor'}),\n", " frozenset({'chicken', 'cube'}),\n", " frozenset({'chicken', 'knorr'}),\n", " frozenset({'bouillon', 'chicken'}),\n", " frozenset({'cube', 'knorr'}),\n", " frozenset({'bouillon', 'cube'}),\n", " frozenset({'bouillon', 'knorr'}),\n", " frozenset({'chicken', 'cube', 'flavor'}),\n", " frozenset({'chicken', 'flavor', 'knorr'}),\n", " frozenset({'bouillon', 'chicken', 'flavor'}),\n", " frozenset({'cube', 'flavor', 'knorr'}),\n", " frozenset({'bouillon', 'cube', 'flavor'}),\n", " frozenset({'bouillon', 'flavor', 'knorr'}),\n", " frozenset({'chicken', 'cube', 'knorr'}),\n", " frozenset({'bouillon', 'chicken', 'cube'}),\n", " frozenset({'bouillon', 'chicken', 'knorr'}),\n", " frozenset({'bouillon', 'cube', 'knorr'}),\n", " frozenset({'chicken', 'cube', 'flavor', 'knorr'}),\n", " frozenset({'bouillon', 'chicken', 'cube', 'flavor'}),\n", " frozenset({'bouillon', 'chicken', 'flavor', 'knorr'}),\n", " frozenset({'bouillon', 'cube', 'flavor', 'knorr'}),\n", " frozenset({'bouillon', 'chicken', 'cube', 'knorr'}),\n", " frozenset({'bouillon', 'chicken', 'cube', 'flavor', 'knorr'})]" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "candidates(to_ingredient('knorr chicken flavor bouillon cube'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The final step is to compute probabilities of candidate words. 
This is done using a counter of ingredients: the higher the count, the higher its probability (we don't bother normalizing here)." ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[(frozenset({'salt'}), 18049),\n", " (frozenset({'oil', 'olive'}), 7972),\n", " (frozenset({'onions'}), 7972),\n", " (frozenset({'water'}), 7457),\n", " (frozenset({'garlic'}), 7380),\n", " (frozenset({'sugar'}), 6434),\n", " (frozenset({'cloves', 'garlic'}), 6237),\n", " (frozenset({'butter'}), 4848),\n", " (frozenset({'black', 'ground', 'pepper'}), 4785),\n", " (frozenset({'all-purpose', 'flour'}), 4632),\n", " (frozenset({'pepper'}), 4438),\n", " (frozenset({'oil', 'vegetable'}), 4385),\n", " (frozenset({'eggs'}), 3388),\n", " (frozenset({'sauce', 'soy'}), 3296),\n", " (frozenset({'kosher', 'salt'}), 3113),\n", " (frozenset({'green', 'onions'}), 3078),\n", " (frozenset({'tomatoes'}), 3058),\n", " (frozenset({'eggs', 'large'}), 2948),\n", " (frozenset({'carrots'}), 2814),\n", " (frozenset({'butter', 'unsalted'}), 2782)]" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from collections import Counter\n", "\n", "c = Counter(all_ingredients)\n", "\n", "c.most_common(20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we're all set to compute the best candidate for a given input.\n", "\n", "First, let's build the probability evaluation for a possible ingredient using a default dict (so that we don't end up asking for an ingredient that doesn't exist):" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from collections import defaultdict\n", "probability = defaultdict(lambda: 1, c.most_common())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's test the probability:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false }, "outputs": [ { 
"data": { "text/plain": [ "1" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "probability[to_ingredient('pasta and herb')]" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "867" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "probability[to_ingredient('tomato sauce')]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Seems like what we expect: tomato sauce has a higher probability than pasta and herb, which doesn't appear in our initial words." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's now write the function that yields the most probable replacement ingredient among all possible candidates." ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def best_replacement(ingredient):\n", " \"Computes best replacement ingredient for a given input.\"\n", " return max(candidates(ingredient), key=lambda c: probability[c])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's now look at some examples:" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "frozenset({'sauce', 'tomato'})" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "best_replacement(to_ingredient(\"tomato sauce\"))" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "frozenset({'pasta'})" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "best_replacement(to_ingredient(\"pasta and herb\"))" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "frozenset({'cheese'})" ] }, "execution_count": 38, "metadata": {}, 
"output_type": "execute_result" } ], "source": [ "best_replacement(to_ingredient(\"kraft mexican style shredded four cheese with a touch of philadelphia\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These examples all look good. What about the less frequent ingredients we had in our data?" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " original ingredient \\\n", "0 pillsbury™ crescent recipe creations® refriger... \n", "1 kraft mexican style shredded four cheese with ... \n", "2 bertolli vineyard premium collect marinara wit... \n", "3 kraft shredded pepper jack cheese with a touch... \n", "4 hidden valley® farmhouse originals italian wit... \n", "5 hidden valley® original ranch salad® dressing ... \n", "6 condensed reduced fat reduced sodium cream of ... \n", "7 hellmann’ or best food canola cholesterol fr... \n", "8 condensed reduced fat reduced sodium cream of ... \n", "9 i can't believ it' not butter! made with olive... \n", "10 wish-bone light asian sesame ginger vinaigrett... \n", "11 (10 oz.) frozen chopped spinach, thawed and sq... \n", "12 kraft mexican style 2% milk finely shredded fo... \n", "13 kraft shredded low-moisture part-skim mozzarel... \n", "14 hurst family harvest chipotle lime black bean ... \n", "15 reduced fat reduced sodium tomato and herb pas... \n", "16 frozen orange juice concentrate, thawed and un... \n", "17 crystal farms reduced fat shredded marble jack... \n", "18 i can't believe it's not butter!® all purpose ... \n", "19 sargento® traditional cut shredded mozzarella ... \n", "20 lipton sparkling diet green tea with strawberr... \n", "21 sargento® traditional cut shredded 4 cheese me... \n", "22 hidden valley® original ranch® spicy ranch dre... \n", "23 hidden valley® greek yogurt original ranch® di... \n", "24 conimex woksaus specials vietnamese gember kno... \n", "25 honeysuckle white® hot italian turkey sausage ... \n", "26 ragu old world style sweet tomato basil pasta ... \n", "27 sargento® artisan blends® shredded parmesan ch... 
\n", "28 reduced fat reduced sodium cream of mushroom soup \n", "29 frozen lemonade concentrate, thawed and undiluted \n", "30 reduced sodium reduced fat cream of mushroom soup \n", "31 condensed reduced fat reduced sodium tomato soup \n", "32 shredded reduced fat reduced sodium swiss cheese \n", "33 knorr reduc sodium chicken flavor bouillon cube \n", "34 2 1/2 to 3 lb. chicken, cut into serving pieces \n", "35 kraft mexican style finely shredded four cheese \n", "36 frozen chopped spinach, thawed and squeezed dry \n", "37 frank's® redhot® original cayenne pepper sauce \n", "38 reduced sodium condensed cream of chicken soup \n", "39 foster farms boneless skinless chicken breasts \n", "40 knorr tomato bouillon with chicken flavor cube \n", "41 pompeian canola oil and extra virgin olive oil \n", "42 hidden valley® original ranch® light dressing \n", "43 uncle ben's ready rice whole grain brown rice \n", "44 soy vay® veri veri teriyaki® marinade & sauce \n", "45 taco bell® home originals® taco seasoning mix \n", "46 pillsbury™ refrigerated crescent dinner rolls \n", "47 kraft reduced fat shredded mozzarella cheese \n", "48 reduced sodium italian style stewed tomatoes \n", "49 skinless and boneless chicken breast fillet \n", "\n", " improved ingredient \n", "0 dough \n", "1 cheese \n", "2 wine \n", "3 pepper \n", "4 herbs \n", "5 seasoning \n", "6 cream \n", "7 canola \n", "8 chicken \n", "9 oil olive \n", "10 ginger \n", "11 spinach \n", "12 milk \n", "13 mozzarella cheese shredded \n", "14 lime \n", "15 sauce tomato \n", "16 orange \n", "17 cheese \n", "18 believe can't butter!® purpose it's not all st... 
\n", "19 mozzarella cheese shredded \n", "20 kiwi \n", "21 cheese \n", "22 ranch dressing \n", "23 greek yogurt \n", "24 woksaus \n", "25 sausage italian \n", "26 pasta \n", "27 parmesan cheese \n", "28 cream \n", "29 frozen lemonade concentrate \n", "30 cream \n", "31 fat \n", "32 cheese \n", "33 chicken \n", "34 chicken \n", "35 cheese \n", "36 spinach \n", "37 pepper \n", "38 chicken \n", "39 breasts boneless skinless chicken \n", "40 chicken \n", "41 oil olive \n", "42 dressing \n", "43 rice \n", "44 soy sauce \n", "45 seasoning taco \n", "46 rolls \n", "47 mozzarella cheese shredded \n", "48 tomatoes \n", "49 chicken " ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame([(text, \n", " \" \".join(best_replacement(to_ingredient(text)))) \n", " for text in sorted(set(all_ingredients_text), key=len, reverse=True)[:50]],\n", " columns=['original ingredient', 'improved ingredient'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is interesting: many of the ingredients are simplified nicely, which is exactly what we want. Two comments:\n", "\n", "- some simplifications are not the ones we would expect, e.g. for *i can't believe it's not butter!® all purpose sticks* and *reduced sodium reduced fat cream of mushroom soup*\n", "- some get simplified **too much**: *skinless and boneless chicken breast fillet* becomes *chicken*, *reduced sodium italian style stewed tomatoes* becomes *tomatoes*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This model fails to take one thing into account: the more words we leave out of the replacement ingredient, the greater its distance to the original ingredient. This is analogous to Peter Norvig's spell checker, which considers candidates by increasing edit distance: first words that differ from the original word by only one modification, then two modifications, then three... 
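\n", "\n", "To make the analogy concrete, here is a minimal sketch of how the subsets of an ingredient's words could be enumerated by increasing distance, taking the distance to be the number of words left out. The helper `subsets_by_distance` is hypothetical and only meant as an illustration; it is not part of the notebook's own code.\n", "\n", "
```python
import itertools


def subsets_by_distance(words):
    '''Yield (distance, subset) pairs, closest subsets first.

    The distance is taken to be the number of words dropped from the
    original ingredient (an assumption made for this illustration).
    '''
    unique_words = sorted(set(words))
    total = len(unique_words)
    # drop one word first, then two, ... but always keep at least one word
    for dropped in range(1, total):
        for combo in itertools.combinations(unique_words, total - dropped):
            yield dropped, frozenset(combo)


pairs = list(subsets_by_distance(['bottled', 'clam', 'juice']))
for distance, subset in pairs:
    print(distance, sorted(subset))
```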
\n", "\n", "To compensate for this oversimplification, let's modify our procedure to keep a new ingredient that is as close as possible to the original." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Building a slightly more elaborate model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The only thing we have to change is the way our candidates function works. Instead of generating all possibilities, it should return only those that exist in our vocabulary of recipes and require the fewest modifications. Here, a modification means leaving out one word, so we try candidates by increasing number of words left out." ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import itertools\n", "\n", "def candidates_increasing_distance(ingredient, vocabulary):\n", "    \"Returns candidate ingredients obtained from the original ingredient by removing words, largest number of remaining words first.\"\n", "    n = len(ingredient)\n", "    # try subsets of n - 1 words first, then n - 2, ..., down to 2 words (single words are not tried)\n", "    for i in range(n - 1, 1, -1):\n", "        possible = [frozenset(combi) for combi in itertools.combinations(ingredient, i) \n", "                    if frozenset(combi) in vocabulary]\n", "        if len(possible) > 0:\n", "            return possible\n", "    # no shorter ingredient is known: fall back to the original\n", "    return [ingredient]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We define our vocabulary from the ingredient counts computed earlier:" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": true }, "outputs": [], "source": [ "vocabulary = dict(c.most_common())" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[frozenset({'chicken', 'cubes', 'stock'}),\n", " frozenset({'creations®',\n", " 'crescent',\n", " 'dough',\n", " 'pillsbury™',\n", " 'recipe',\n", " 'refrigerated',\n", " 'seamless',\n", " 'sheet'}),\n", " frozenset({'nori', 'shredded'}),\n", " frozenset({'eye', 'of', 'roast', 'round'}),\n", " frozenset({'gumbo', 'mixture', 
'vegetable'}),\n", " frozenset({'curry', 'green', 'paste', 'thai'}),\n", " frozenset({'bouillon', 'chicken', 'cube', 'flavor', 'knorr'}),\n", " frozenset({'beef', 'roast', 'shoulder'}),\n", " frozenset({'grained'}),\n", " frozenset({'dried', 'ear', 'mushrooms', 'wood'})]" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(vocabulary.keys())[:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's test our function on a few examples:" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[frozenset({'clam', 'juice'})]" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "candidates_increasing_distance(to_ingredient(\"bottled clam juice\"), vocabulary)" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[frozenset({'dressing', 'ranch'})]" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "candidates_increasing_distance(to_ingredient('hidden valley original ranch spicy ranch dressing'), vocabulary)" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[frozenset({'boneless', 'breast', 'chicken', 'skinless'})]" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "candidates_increasing_distance(to_ingredient(\"skinless and boneless chicken breast fillet\"), vocabulary)" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[frozenset({'italian', 'stewed', 'style', 'tomatoes'})]" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "candidates_increasing_distance(to_ingredient(\"reduced sodium italian style stewed tomatoes\"), vocabulary)" ] }, { 
"cell_type": "markdown", "metadata": {}, "source": [ "As we see, this works better on the examples that got simplified too much using the previous model." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's now write a new function for computing the best replacement, that takes into account the probabilities:" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def best_replacement_increasing_distance(ingredient, vocabulary):\n", " \"Computes best replacement ingredient for a given input.\"\n", " return max(candidates_increasing_distance(ingredient, vocabulary), key=lambda w: vocabulary[w])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And let's apply this to our lesser used ingredients:" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " original ingredient \\\n", "0 pillsbury™ crescent recipe creations® refriger... \n", "1 kraft mexican style shredded four cheese with ... \n", "2 bertolli vineyard premium collect marinara wit... \n", "3 kraft shredded pepper jack cheese with a touch... \n", "4 hidden valley® farmhouse originals italian wit... \n", "5 hidden valley® original ranch salad® dressing ... \n", "6 condensed reduced fat reduced sodium cream of ... \n", "7 hellmann’ or best food canola cholesterol fr... \n", "8 condensed reduced fat reduced sodium cream of ... \n", "9 i can't believ it' not butter! made with olive... \n", "10 wish-bone light asian sesame ginger vinaigrett... \n", "11 (10 oz.) frozen chopped spinach, thawed and sq... \n", "12 kraft mexican style 2% milk finely shredded fo... \n", "13 kraft shredded low-moisture part-skim mozzarel... \n", "14 hurst family harvest chipotle lime black bean ... \n", "15 reduced fat reduced sodium tomato and herb pas... \n", "16 frozen orange juice concentrate, thawed and un... \n", "17 crystal farms reduced fat shredded marble jack... \n", "18 i can't believe it's not butter!® all purpose ... \n", "19 sargento® traditional cut shredded mozzarella ... \n", "20 lipton sparkling diet green tea with strawberr... \n", "21 sargento® traditional cut shredded 4 cheese me... \n", "22 hidden valley® original ranch® spicy ranch dre... \n", "23 hidden valley® greek yogurt original ranch® di... \n", "24 conimex woksaus specials vietnamese gember kno... \n", "25 honeysuckle white® hot italian turkey sausage ... \n", "26 ragu old world style sweet tomato basil pasta ... \n", "27 sargento® artisan blends® shredded parmesan ch... 
\n", "28 reduced fat reduced sodium cream of mushroom soup \n", "29 frozen lemonade concentrate, thawed and undiluted \n", "30 reduced sodium reduced fat cream of mushroom soup \n", "31 condensed reduced fat reduced sodium tomato soup \n", "32 shredded reduced fat reduced sodium swiss cheese \n", "33 knorr reduc sodium chicken flavor bouillon cube \n", "34 2 1/2 to 3 lb. chicken, cut into serving pieces \n", "35 kraft mexican style finely shredded four cheese \n", "36 frozen chopped spinach, thawed and squeezed dry \n", "37 frank's® redhot® original cayenne pepper sauce \n", "38 reduced sodium condensed cream of chicken soup \n", "39 foster farms boneless skinless chicken breasts \n", "40 knorr tomato bouillon with chicken flavor cube \n", "41 pompeian canola oil and extra virgin olive oil \n", "42 hidden valley® original ranch® light dressing \n", "43 uncle ben's ready rice whole grain brown rice \n", "44 soy vay® veri veri teriyaki® marinade & sauce \n", "45 taco bell® home originals® taco seasoning mix \n", "46 pillsbury™ refrigerated crescent dinner rolls \n", "47 kraft reduced fat shredded mozzarella cheese \n", "48 reduced sodium italian style stewed tomatoes \n", "49 skinless and boneless chicken breast fillet \n", "\n", " improved ingredient \n", "0 refrigerated seamless dough crescent \n", "1 cheese shredded \n", "2 wine marinara bertolli vineyard with burgundi ... \n", "3 kraft jack pepper cheese shredded \n", "4 herbs italian \n", "5 ranch dressing \n", "6 mushroom fat cream reduced soup of sodium \n", "7 best canola cholesterol mayonnais food or free... 
\n", "8 cream chicken reduced soup condensed of sodium \n", "9 oil olive \n", "10 vinaigrette dressing \n", "11 chopped frozen spinach thawed squeezed and dry \n", "12 mexican shredded finely style cheese kraft four \n", "13 kraft mozzarella cheese shredded \n", "14 bean soup mix \n", "15 sauce tomato \n", "16 orange frozen juice concentrate \n", "17 cheese reduced shredded fat \n", "18 believe can't butter!® sticks not it's all i p... \n", "19 mozzarella cheese shredded \n", "20 tea green \n", "21 cheese shredded \n", "22 hidden valley® ranch® original dressing \n", "23 greek yogurt \n", "24 woksaus specials knoflook gember conimex vietn... \n", "25 hot sausage turkey italian links \n", "26 pasta world old ragu style sauc \n", "27 parmesan cheese shredded \n", "28 mushroom fat cream reduced soup of \n", "29 frozen lemonade concentrate \n", "30 mushroom fat cream reduced soup of \n", "31 condensed soup tomato \n", "32 swiss reduced cheese fat \n", "33 bouillon reduc flavor knorr chicken sodium \n", "34 pieces chicken \n", "35 cheese shredded \n", "36 chopped frozen spinach \n", "37 pepper cayenne sauce \n", "38 condensed soup cream chicken of \n", "39 breasts boneless skinless chicken \n", "40 bouillon tomato with flavor knorr chicken \n", "41 oil virgin olive \n", "42 hidden valley® ranch® original dressing \n", "43 whole grain rice \n", "44 soy sauce \n", "45 seasoning taco mix \n", "46 refrigerated rolls crescent \n", "47 cheese reduced shredded fat \n", "48 tomatoes style italian stewed \n", "49 skinless breast chicken boneless " ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame([(text, \n", " \" \".join(best_replacement_increasing_distance(to_ingredient(text), vocabulary))) \n", " for text in sorted(set(all_ingredients_text), key=len, reverse=True)[:50]],\n", " columns=['original ingredient', 'improved ingredient'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This result is interesting: 
the refined labels given here seem much better than the originals. However, there are still problems with the new ingredients. In particular, special symbols pollute the data, so it's a good idea to clean the data before applying our method.\n", "\n", "Hopefully, this gives us better ingredient names and, in turn, better classification. To check this, we'll train on the dataset again and see whether we get an improvement over our previous model." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Training a machine learning model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, let's rebuild the features and labels of the [previous model](http://flothesof.github.io/kaggle-whats-cooking-machine-learning.html):" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df_train['all_ingredients'] = df_train['ingredients'].map(\";\".join)\n", "from sklearn.feature_extraction.text import CountVectorizer\n", "cv = CountVectorizer()\n", "X = cv.fit_transform(df_train['all_ingredients'].values)\n", "from sklearn.preprocessing import LabelEncoder\n", "enc = LabelEncoder()\n", "y = enc.fit_transform(df_train.cuisine)\n", "# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in sklearn.model_selection\n", "from sklearn.model_selection import train_test_split\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's evaluate the accuracy of our model by doing a K-fold cross validation (training the model several times and checking the score on the held-out fold each time)." 
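, "\n", "\n", "As a reminder of what K-fold cross validation does, here is a minimal sketch in plain Python: the samples are split into K folds, and each fold serves once as the test set while the remaining folds are used for training. The helper `kfold_indices` is hypothetical and, for simplicity, does not shuffle; the actual evaluation relies on scikit-learn.\n", "\n", "
```python
def kfold_indices(n_samples, k):
    '''Split range(n_samples) into k (train, test) index lists, without shuffling.'''
    # the first n_samples % k folds get one extra sample
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


folds = kfold_indices(10, 5)
for train, test in folds:
    print('train:', train, 'test:', test)
```
"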
] }, { "cell_type": "code", "execution_count": 56, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# sklearn.cross_validation was removed in scikit-learn 0.20; its contents live in sklearn.model_selection\n", "from sklearn.model_selection import cross_val_score, KFold\n", "from scipy.stats import sem\n", "import numpy as np\n", "def evaluate_cross_validation(clf, X, y, K):\n", "    # create a k-fold cross validation iterator\n", "    cv = KFold(n_splits=K, shuffle=True, random_state=0)\n", "    # by default the score used is the one returned by the score method of the estimator (accuracy)\n", "    scores = cross_val_score(clf, X, y, cv=cv)\n", "    print(scores)\n", "    print(\"Mean score: {0:.3f} (+/-{1:.3f})\".format(\n", "        np.mean(scores), sem(scores)))" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0.79195475 0.78830924 0.78290383 0.79019485 0.78124214]\n", "Mean score: 0.787 (+/-0.002)\n" ] } ], "source": [ "from sklearn.linear_model import LogisticRegression\n", "logistic = LogisticRegression()\n", "evaluate_cross_validation(logistic, X, y, 5)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "Now that we have a baseline evaluation of our algorithm, let's feed it our improved features. All we have to do is rebuild the X matrix using our improved ingredients. We also clean the strings to get rid of the special characters." 
] }, { "cell_type": "code", "execution_count": 63, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def clean_string(ingredient_text):\n", " \"Cleans the ingredient from special characters and returns it as lowercase.\"\n", " return \"\".join([char for char in ingredient_text.lower() if char.isalnum() or char.isspace()])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's test this:" ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'franks redhot original cayenne pepper sauce'" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "clean_string(\"frank's® redhot® original cayenne pepper sauce\")" ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'2 12 to 3 lb chicken cut into serving pieces'" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "clean_string(\"2 1/2 to 3 lb. chicken, cut into serving pieces\")" ] }, { "cell_type": "code", "execution_count": 66, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'i cant believe its not butter all purpose sticks'" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "clean_string(\"i can't believe it's not butter!® all purpose sticks\")" ] }, { "cell_type": "code", "execution_count": 93, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'extravirgin olive oil'" ] }, "execution_count": 93, "metadata": {}, "output_type": "execute_result" } ], "source": [ "clean_string(\"extra-virgin olive oil\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we can put together our function for improving a list of ingredients from the dataframe. 
We'll use a threshold for improving ingredients because very popular ingredients should just stay the same and not get cleaned. To choose this threshold, let's plot the value counts of the different ingredients found:" ] }, { "cell_type": "code", "execution_count": 78, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "plt.style.use('bmh')" ] }, { "cell_type": "code", "execution_count": 81, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAlYAAAFyCAYAAAA3cJSiAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmYXFWd//H3tzvp7DtJSAJJIA0JCYEmIqK2RowaGARG\nUDSoIFHHZUYZB5U44g8ddWyXzAzK6KhgBEZEDTICggQBwQaBhKSTQBJCJ0B2sqez9X5+f9xT3dVL\n9Xqq61bV5/U8eVL3VtWtU5+uunXqnm+da845RERERKT3CjLdABEREZFcoY6ViIiISCDqWImIiIgE\noo6ViIiISCDqWImIiIgEoo6ViIiISCDqWImIiIgEoo6ViIiISCB90rEys7lm9qSZ/cTM3t4Xjyki\nIiLS1/rqiJUDDgMDgG199JgiIiIifapHHSszu83MXjezNa3WX2hmG8xso5ndkFjvnHvSOXcxsAj4\nt941WURERCSeenrEagkwP3mFmRUAt/j1s4AFZjaj1f0OAkU9fEwRERGRWOvXkzs558rNbEqr1ecB\nLzvnXgMws7uBy4ANZvY+og7XCKLOl4iIiEjO6VHHKoVJwNak5W1EnS2cc/cC93Z050svvdRVV1dz\n4oknAjBkyBCKi4spKSkBoKKiAkDLXVhOXI5Le7J9WXmGW06si0t7sn1ZeSrPuC5XVlby/ve/Pzbt\nCbEMsHr1anbt2gXA/Pnzuf76641WzDnXel2X+CNW9zvnzvLLVwDznXP/4Jc/ApznnPt8V7Z39dVX\nu5tvvrlHbZGWysrKWLRoUaabkTOUZzjKMizlGZbyDCcfsly5ciXz5s1r07EK+avA7cDkpOWT/Lou\nSfQApfe2bNmS6SbkFOUZjrIMS3mGpTzDyecse9OxMv8vYTlQbGZTzKwI+BBwX1c3tn//fsrKyigv\nL+9Fk0RERETSp7y8nLKyshZDhMl6VGNlZncB7wDGmNkW4Cbn3BIz+xywjKjDdptzbn1Xt7lgwQIW\nLlzYk+ZIK1dddVWmm5BTlGc4yjIs5RmW8gwnl7MsLS2ltLSUlStXtnt9j2usQnv00UfdnDlzMt0M\nERERkU71RY1VryxdulRDgYEow7CUZzjKMizlGZbyDCeXs0zLUGA6FBcXayhQREREYk1DgSIiIiKB\nxX4oUERERCTbxaZjpRqrcJRhWMozHGUZlvIMS3mGk8tZqsZKREREJBDVWImIiIgEphorERERkTSL\nTccq1VildF8uj21ngvIMR1mGpTzDUp7h5HOWselYVVZWqnhdREREYq2z4nXVWImIiIh0k2qsRERE\nRNIsNh0r1ViFo+HUsJRnOMoyL
OUZlvIMJ5+zjE3HSkRERCTbxabGavHixW737t1NE2+JiIiIxE15\neTnl5eWMGzeO66+/vk2NVWw6VipeFxERkWwR++J11ViFk89j2+mgPMNRlmEpz7CUZzj5nGVsOlYi\nIiIi2U5DgSIiIiLdFPuhQBEREZFsF5uOlWqswsnnse10UJ7hKMuwlGdYyjOcfM4yNh0rnStQRERE\n4k7nChQREREJTDVWIiIiImkWm46VaqzC0XBqWMozHGUZlvIMS3mGk89ZxqZjJSIiIpLtVGMlIiIi\n0k2qsRIRERFJs9h0rFRjFU4+j22ng/IMR1mGpTzDUp7h5HOWselYaR4rERERiTvNYyUiIiISmGqs\nRERERNIsNh0r1ViFo+HUsJRnOMoyLOUZlvIMJ5+zjE3HSkRERCTbqcZKREREpJtUYyUiIiKSZrHp\nWKnGKpx8HttOB+UZjrIMS3mGpTzDyecsY9OxEhEREcl2qrESERER6SbVWImIiIikWWw6VqqxCief\nx7bTQXmGoyzDUp5hKc9w8jnL2HSsdK5AERERiTudK1BEREQkkEPV9eyoquH4tpdUYyUiIiLSG89t\nPcR1921MeX1sOlaqsQpHw6lhKc9wlGVYyjMs5RlOPmcZm46ViIiISNx1VkGlGisRERGRLlq2cR8/\neHILZXOcaqxEREREeqOzw1Gx6VipxiqcfB7bTgflGY6yDEt5hqU8w8nlLDsb6ItNx0pEREQk7jo7\nYqUaKxEREZEuemjDXv6zfKtqrERERER6SzVWeSiXx7YzQXmGoyzDUp5hKc9wcjnLrOlYiYiIiGQ7\n1ViJiIiIdNED6/fyw6cyXGNlZoPNbLmZ/V1fPJ6IiIhIJvTVUOANwG86uoFqrMLJ5bHtTFCe4SjL\nsJRnWMoznFzOsrORvm53rMzsNjN73czWtFp/oZltMLONZnZD0vp3AeuAPUCbQ2YiIiIi2SL4PFZm\nVgocAe5wzp3l1xUAG4F5wA5gOfAh59wGM/sWMBiYBRxzzr2vve2qxkpERETi7r51e7jl6W0pa6z6\ndXeDzrlyM5vSavV5wMvOudcAzOxu4DJgg3PuRr/uamBvt5+BiIiISEz01SltJgFbk5a3+XVJDXF3\nOOceTLUB1ViFk8tj25mgPMNRlmEpz7CUZzi5nGVn43zdPmKVLk888QQrVqxg8uTJAIwYMYLZs2dT\nWloKNP+RtKxlLWfvckJc2pPtywlxaU+2LyfEpT3ZvLx27dpYtSfEcuJy+ZqNbN5zjIqC9zBv3jxa\n69E8Vn4o8P6kGqvzga875y70y4sA55z7ble3qRorERERibt7X9jNT57ZHnweK6PlL/yWA8VmNsXM\nioAPAff1cNsiIiIiWakn0y3cBTwNnG5mW8zsWudcA/A5YBnwInC3c259d7a7dOlSysrK2hySle5T\nhmEpz3CUZVjKMyzlGU4uZ7l+1bNsX3Z7ytrwft3doHPuqhTrHwIe6u72EoqLi1m4cGFP7y4iIiKS\ndtNL3sSkmpMoKWm/lErnChQRERHpoqVrd/OzZ8PXWAVXUVGhoUARERGJtfUrOx4KjE3HCmDRokVN\nP2+UnlPnNCzlGY6yDEt5hqU8w8nlLCecMYdJ77mGkpKSdq+PVcdKREREJM4OVdd3eL1qrERERES6\n6NuPvcITmw+qxkpERESktzatXq4aq3yjzmlYyjMcZRmW8gxLeYaTy1mOnX6OaqxEREREQqip77iE\nSjVWIiIiIl302Xs3ULnvePxrrERERETirqGx4wNSselY6VyB4SjDsJRnOMoyLOUZlvIMJ5ez3Ln+\n+bDnCkwXnStQRERE4m706edQfeIsnStQREREpLc+cvcL7D5SpxorERERkd6qz5Yaq1RjldJ9uTy2\nnQnKMxxlGZbyDEt5hpPLWTY0dnx9bDpWlZWVKl4XERGRWNvz0soOi9dVYyUiIiLSBQ2Njr/7RQU
O\nVGMlIiIi0hsHj9fjgBEDU0+qEJuOlWqswtFwaljKMxxlGZbyDEt5hpOrWR44XgfA6EFZ0LESERER\nibPahqh8akC/1N0n1ViJiIiIdMGanUf44h9f5szxQ/jIpCOqsRIRERHpqfrGaK6FfoVt+lNNYtOx\n0rkCw1GGYSnPcJRlWMozLOUZTq5mWd/oqNpUwarf/1znChQRERHpjYZGGD6thPMveDslJxxq9zaq\nsRIRERHpgidfOcC3Hn2V0qkjeO/og6qxEhEREempBn+ewH4FWVBjpXmswsnVse1MUZ7hKMuwlGdY\nyjOcXM2yzk+30K8wdfcpNh0rERERkTg7XNMAwLCiwpS3UY2ViIiISBcsWbGDX1e8zkfnnMgsdqrG\nSkRERKSndlbVADB+aFHK28SmY6V5rMJRhmEpz3CUZVjKMyzlGU6uZnngeD1Vmyr44+23aB4rERER\nkd7Ye7SO4dNK+OfLP8SBV9e3exvVWImIiIh0orahkUuWrMYM7rvmbF5YU6EaKxEREZGeqK5rxAFD\nigop6pcF0y1oHqtwcnVsO1OUZzjKMizlGZbyDCcXs6yuj07APKCDOawgRh0rERERkbiqSXSsOjha\nBTHqWJWUlGS6CTmjtLQ0003IKcozHGUZlvIMS3mGk4tZZl3HSkRERCSuEh2rgdnSsVKNVTi5OLad\nScozHGUZlvIMS3mGk4tZ1jQkjlilPgEzxKhjJSIiIhJXNfXR9FRF2VK8rhqrcHJxbDuTlGc4yjIs\n5RmW8gwnF7OszrahQBEREZG4OlJTD8DgosIObxebjpXOFRiOMgxLeYajLMNSnmEpz3ByMcudh2sB\nOFhZQVlZmc4VKCIiItJTWw5WA/C20lLeddqlrFy5st3b6VyBIiIiIp345D3ree1ANT+67HSmjx3C\nypUrda5AERERke46eLyOrf6I1bihRR3eNjYdK81jFU4ujm1nkvIMR1mGpTzDUp7h5FqWK7YdptHB\nWScOZdSg/h3eNjYdKxEREZE42nYoOlp15olDOr1tbDpWmscqnFycPySTlGc4yjIs5RmW8gwn17Lc\nuPcYAKeMHtTpbWPTsRIRERGJm9qGRlZsOwzAzPFZdMRKNVbh5NrYdqYpz3CUZVjKMyzlGU4uZVm5\n93jT5bFDOi5chxh1rERERETiZvP+qGP1ruJRXbp9bDpWqrEKJ9fGtjNNeYajLMNSnmEpz3ByKcvN\n+6KO1aljBnfp9rHpWImIiIjEzab9UeH6tDGdF65DjDpWqrEKJ5fGtuNAeYajLMNSnmEpz3ByJcuG\nRsfm/dFUC9O68ItAiFHHSkRERCROnt9eRU19I+OHFjF8YNdOr5z2cwWa2QzgOmAM8Jhz7n/au53O\nFSgiIiJxcuPDm3huaxUXTBvFVy6Y2uK6jJ0r0Dm3wTn3GeCDwFvS/XgiIiIivVXf6Fi5PZq/akHJ\n+C7fr9sdKzO7zcxeN7M1rdZfaGYbzGyjmd3Q6rpLgAeAB1NtVzVW4eTK2HZcKM9wlGVYyjMs5RlO\nLmS5avth6hsdg/sXMHVU1+qroGdHrJYA85NXmFkBcItfPwtY4IcAAXDO3e+cuxj4SA8eT0RERKRP\nJU5jUzJxWLfu17VKrCTOuXIzm9Jq9XnAy8651wDM7G7gMmCDmc0FLgcGAH9MtV3NYxVOLs0fEgfK\nMxxlGZbyDEt5hpMLWb52IJq/avaJQ7t1v253rFKYBGxNWt5G1NnCOfcE8ESgxxERERFJu52HawEo\n7uL8VQmhOla9dvPNNzNkyBAmT54MwIgRI5g9e3ZTrzcxXqvlzpeTx7bj0J5sX1ae4ZYT6+LSnmxf\nVp7KM67La9eu5TOf+Uxs2tPd5WN1Dbx6YCQA29c9z+HN/Zqu27JlCwDnnnsu8+bNo7UeTbfghwLv\nd86d5ZfPB77unLvQLy8CnHPuu13d5uLFi93ChQu73RZpq7y
8vOkFIr2nPMNRlmEpz7CUZzjZnuVv\n17zOrc/tYM6kYZRdVNzubUJPt2D+X8JyoNjMpphZEfAh4L7ubFA1VuFk84s5jpRnOMoyLOUZlvIM\nJ9uzXOWnWbho+phu37cn0y3cBTwNnG5mW8zsWudcA/A5YBnwInC3c259d7ZbUVFBWVlZi0OyIiIi\nIn1pR1UNL+w6AsCZ49sWrpeXl1NWVpZymqhud6ycc1c55yY65wY45yY755b49Q8556Y7505zzpV1\nd7sAixYtyvpebhyocxqW8gxHWYalPMNSnuFkc5a/Wf06NQ2ON08ewZgh/dtcX1payqJFi1KOtOlc\ngSIiIiJeYrb1q99wYo/un/ZzBXbV4sWL3e7duyktLdVRKxEREelzWw9W8/Gl6xncv4B7PnoWhQVt\natMpLy+nvLyccePGcf3117e5Qb8+aWkXlJSUoJMwi4iISKbc88JuAKaPHdJupwpoOgC0cuXKdq+P\nzVCgzhUYTjaPbceR8gxHWYalPMNSnuFka5ZbD9YAUDp1RI+3EZuOlYiIiEgmHTxeB8DM8UN6vA3V\nWImIiEje23esjgV3vUCBwd1XncnIQW1/EQiqsRIRERHp1OOV+wE4f/KIlJ0qUI1VXsrWse24Up7h\nKMuwlGdYyjOcbMxyR1V00uWSicN6tZ3YdKxEREREMmXn4ahwfWw7k4J2R2xqrB599FGnoUARERHp\na0drG7jyV2upb3DcteDMdmdcby3VSZhjU2NVUVHBsmXLVLwuIiIiferOlTupa3CcPWFop52q5OL1\nefPmtbk+VkOBOldgGNk4th1nyjMcZRmW8gxLeYaTTVk651i2MSpcv/bciZ3eXucKFBEREUlhy8Fq\njtQ2MGJgP84YN7jX24tNxypVz0+6T0f9wlKe4SjLsJRnWMoznGzKcuPeYwCcMW4wZu2fxqY7YtOx\nEhEREelrf1y/D4AZY3s+23qy2HSsli5dSllZWVaNy8aVMgxLeYajLMNSnmEpz3CyJcsdVTWs230U\nAy6aMaZL9ykvL6esrCzl/Jux+VVgcXExCxcuzHQzREREJA845/jF8h0AnD9lBKM6mG09WWczr2se\nKxEREck7P3t2O0vX7saAW/5+Oqed0L3C9VTzWMVmKFBERESkL+w9WssfXtwDwBfnTu52p6ojselY\n6VyB4WTL2Ha2UJ7hKMuwlGdYyjOcuGd50yObqWt0vO2Ukbz7tK7VVnVVbDpWIiIiIum2/VA1L+89\nDsDCLkwI2l2qsRIREZG88dl7N1C57zjTxw7mR5dN7/F2dK5AERERyWurth+mcl90tOqf3nJSj7ah\ncwXmobiPbWcb5RmOsgxLeYalPMOJY5bV9Y18+7FXAHhX8Sim93BCUJ0rUERERPLa0doGvvqnTVTV\nNDB2SH+uK52ctsdSjZWIiIjkrHWvH+WmRzZzqLoegG++51TeNHlEr7cb+xorERERkZCO1NRz48Ob\nOFLbwLABhdz0rlM4a8KwtD5mbIYCNY9VOHEc285myjMcZRmW8gxLeYYTlyzvXLmLI7UNzBo/hF9f\ndWbaO1UQo46ViIiISCi7DtfwwPq9AHz8jRMpKuybLo9qrERERCSnNDQ6vrZsEyu2HWZe8ShueMfU\n4I8R+xorzWMlIiIiITz00j5WbDvMsAGFXPOGCUG3rXms8lBcxrZzhfIMR1mGpTzDUp7hZDLLl/ce\n45antwLw0TkTOHHYgKDb72weq9gcsRIRERHpjcc37afs8ddwwBsmDeOSM07o8zaoxkpERESy3pOb\nD/Ctx14F4OwJQ/n2hdPSWrAe+xorERERkZ5odI47V+4CYP7po/nC2yZTYG36PH0iNjVWmscqHNUJ\nhKU8w1GWYSnPsJRnOH2d5b0v7OG1g9UM6FfA5996csY6VRCjjpWIiIhId+08XMNPn90OwL9eMJX+\nfTRfVSqqsRIREZGs9OK
uI3z5oUrqGhxvnTKCm959ap89tmqsREREJGfc+tx2frtmNwAnDO7Pp86f\nlOEWRWIzFKgaq3BUJxCW8gxHWYalPMNSnuGkO8vKvceaOlVvnjKCJVfODD5fVU/piJWIiIhkjV2H\na7juvo0ADCkq5Bt9OPzXFaqxEhERkazQ6BzX3beRl/YcY8TAfvzw0tOZMDwzR6piX2OlcwWKiIhI\nKlXV9Xzh/o1sPVTD4P4F/M/lMxgzuH+ft0PnCsxDqhMIS3mGoyzDUp5hKc9wQmdZ29DY1KkC+OLb\np2SkUwU6V6CIiIhksZr6Rv7hnvXsPFwLwE/eN51pYwZnuFWpqcZKREREYqm+0fGJpevZURUdqfrC\n2yZz0fQxGW5VJPY1ViIiIiIJtfWNfOKe9ezyR6q++s6pzD11VIZb1bnY1FhpHqtwVCcQlvIMR1mG\npTzDUp7h9DbLA8fq+PjS5k7VN99zalZ0qkBHrERERCRGnn7tIF9/5JWm5X+/cBrnnjQ8gy3qHtVY\niYiISCzsqKrhY79dB0STf371nVNj26lSjZWIiIjE1tHaBj65dH3T8p0fnMnQAdnXTVGNVQ5SnUBY\nyjMcZRmW8gxLeYbT3SyP1jbwsd+uo64xGkX7+RUzsrJTBTpiJSIiIhlUufcYX3jgZWrqGwH41wum\nMmXUoMw2qhdUYyUiIiIZsWzjPn7w5Jam5evfPpn5p8djnqrOqMZKREREYmP51qqmTlVRofHzK87I\n2AmVQ1KNVQ5SnUBYyjMcZRmW8gxLeYbTWZYrtlXx1Yc3ATCgXwF/uObsnOhUQR8dsTKzy4CLgWHA\nL5xzj/TF44qIiEi83LlyJ3eu3NW0/OsFsygsaDOilrX6tMbKzEYC33fOfbL1daqxEhERyW13PL+T\n/10VdaoKDO6+6kxGDuqf4Vb1TKoaqx4NBZrZbWb2upmtabX+QjPbYGYbzeyGdu56I/DfPXlMERER\nyV43l29p6lRNHF7EQwtLsrZT1ZGe1lgtAeYnrzCzAuAWv34WsMDMZiRdXwY86Jxrt5hKNVbhqE4g\nLOUZjrIMS3mGpTzDSc5y075jXPLL1fxxwz4Axgzuz23vn4lZ7gz/JetRjZVzrtzMprRafR7wsnPu\nNQAzuxu4DNhgZp8D5gHDzazYOfez3jRaRERE4u++dXu45eltTcvnTBxK2UXFOdupgl7UWPmO1f3O\nubP88hXAfOfcP/jljwDnOec+35XtqcZKREQkd/zs2e0sXbu7afk7F07jDTE9719PxH4eq6VLl3Lr\nrbcyefJkAEaMGMHs2bMpLS0Fmg8ralnLWtaylrWs5Xgv37duDxUFUwEo2P4CN1wwpalTFYf29WQ5\ncXnLlmjurXPPPZd58+bRWsgjVucDX3fOXeiXFwHOOffdrmxv8eLFbuHChT1qi7RUXl7e9IKQ3lOe\n4SjLsJRnWMqz917Zf5wvP1jJ1hdXMHxaCaMH9+NXHzozp6ZTSAj6q0DP/L+E5UCxmU0xsyLgQ8B9\nvdi+iIiIZImjtQ186vcbOFRdD8Cpowfx6wW52anqSI+OWJnZXcA7gDHA68BNzrklZnYR8F9EHbbb\nnHNlXd3m4sWL3e7duyktLdU3BhERkSxyrLaBK3+1ltqGqE/xrfmnct7JIzLcqvQoLy+nvLyccePG\ncf3117fpNeokzCIiItJj2w/VcO3v1jUt3/COKcwrHp3BFvWNdAwFBqV5rMJJLrST3lOe4SjLsJRn\nWMqz+57beqhFp+rqN0xgXvHovM6yX6YbkFBZWUlZWZmGAkVERGKuvtGx6MFK1uw60rRu8XtPY/aJ\nQzPYqr6RPBQY9FeBoWkoUEREJP7qGhq5/I411DQ09x9u/+BMJgwbkMFW9b3Yz2MlIiIi8bZm5xG+\n+MeXm5bnFY/iy3On5PRM6t2lGqsclM9j2+mgPMNRlmEpz7CUZ2qNzvGDJ15r0am64syx3
PCOqe12\nqvI5y9gcsVKNlYiISPw8v62Kr/xpU4t1P3nfdKaNGZyhFmWWaqxERESkR773l1f5c+WBpuUZYwez\n+L2n0b8wNgNeGaMaKxEREelUQ6PjpT3H+Nc/VXKsrrFp/c+umMHUUYMy2LLsEJsup2qswsnnse10\nUJ7hKMuwlGdY+Z7n4Zp6lizfwUW/qOCf79/Y1Kk6ZdRAHlxY0q1OVT5nqSNWIiIiee6+dXu45elt\nLdaNHdKfy2aO5QNnjdOv/rohNjVWOlegiIhI33LOcd19G9mw51jTujmThvGpN03ilNEa9muPzhUo\nIiIibby05yif/8NGknsBdy2YxQlDijLWpmyicwXmkXwe204H5RmOsgxLeYaVT3k+VrmfzyV1qs6Z\nOJSHFpYE61TlU5atqcZKREQkTyzfWsUvn9/By3uPN627cd5U3jZ1pOqoAtFQoIiISA47UlPPkhU7\neXDDXhpafeT/esGZjBnSPzMNy3Kxn8eqoqKCZcuWqXhdREQkgEbnWLPzCF9+sLLF+gGFxqfffBLv\nOW20Jvrsgc5mXo9VoosWLVKnKoB8HttOB+UZjrIMS3mGlUt5Prf1EBfeVtGiU/Wmk4fzmw+fyf3X\nlnDxjBPS2qnKpSxbKy0tZdGiRZSUlLR7fWyOWImIiEjPOefYvP84i5/cQuW+5hqqYQMK+fLcKbxp\n8ogMti5/qMZKREQkyx08XsdX/rSJTUkdKoB/e8+pnK8OVVrEvsZKREREuudobQNlj7/Ks1urWqz/\n2Bsm8KGS8RTol359LjY1VprHKpxcHtvOBOUZjrIMS3mGlW15Prn5AO+7Y02LTtVlM0/ggWvP5qpz\nTsxopyrbsgxJR6xERESyyJGaem54qLLFXFSXnHECC984kSFFhRlsmUCMaqx0rkAREZH27ayqYf/x\nOr7+yCscqq5vcd0PLz2dGeOGZKhl+UfnChQREclCx+sa+OsrB3l++2Ee33SgzfVXnjWOq+dMoKhf\nbKp68orOFZhH8nlsOx2UZzjKMizlGVac8nz6tYNcdvsafvDklhadquljB/PluVN4aGEJnzhvUmw7\nVXHKsq+pxkpERCQm1r1+lNuf38GqHUea1k0YVsRbp47kA7PHMWqwTj8TdxoKFBERybBD1fXc+PAm\nXtpzrMV6nSA5vjSPlYiISIwcr2ugodHxrcdeZeX2wy2ui2ZKH86wAfqYzjaxGZxVjVU4+Ty2nQ7K\nMxxlGZbyDKsv8mx0jj1Ha/nt6te57PY1XH7n2hadqk+eN5H/u/os3nXa6KzuVOXzazN7/2oiIiJZ\nwjnH3mN1fO3hTWzeX93iuiFFhbyreBSfPv8kCgs05JftYlNjpXmsREQkF1XXN7L4idd44pWDTetG\nDOzHuKH9+db8aYwapIL0bKJ5rERERDKgur6RvUdr+dQ9G6hrbP6sfd+ssXzmzSdlsGUSguaxyiP5\nPLadDsozHGUZlvIMK2SeN5dv4dJfrmbh79Y3dareeNJwHlpYkhedqnx+barGSkREJICq6nr+5YGX\n2Xu0lmN1jU3rhw0o5PNvPZm5p47KYOukr2goUEREpBde2HWEW5/bwbrdR1usnz52MD+89HTNQZWj\nNI+ViIhIQNsPVXPP2j08sGFvi/ULzh7PlWePZ1D/AnWq8pBqrHJQPo9tp4PyDEdZhqU8w+pqnut3\nH+VPL+3jH+7Z0KJT9YXSk/n5FTP42LkTGFJUSEEed6ry+bWpI1YiIiKdqKqup6HRcbimgevu29ji\nunnFo5hXPJpzTxqeodZJnKjGSkREpAN3PL+T/121q836+aePZuyQIj58zoma2DMPqcZKRESkE/es\n3c1zWw+1WFe57zgQ/bqv0IyCAvhwyYlcMnNsJpooMRebjlVFRQU6YhVGeXm5Zq8PSHmGoyzDUp69\nV9vQyOrkDQ/OAAAgAElEQVQdR6hpaGTt8me4r2o8D
e0M5BQVGr/4wExGDIzNx2as5fNrU68QERHJ\nW0vX7OaXz+8EoGrTToZPG8/Igf1YdMGUFrebMHyAOlXSJbGpsdK5AkVEJN2cc/xX+VbW+zmn9h6t\n40htA9PHDmbM4OicfRdMG6XJPCWlzs4VGJvud0lJiYYCRUQkqMM19WzyNVLRcgMPvbSvxW0KDL40\ndwqTRw7s6+ZJFkocAFq5cmW718emY6Uaq3DyeWw7HZRnOMoyLOXZuS/9sZLN+4+3WT911EC+csFU\nAEYO7Meowf2VZ0D5nGVsOlYiIiK9cbimnt+ufp2jtc3n6Xv1QNSpOnvC0KZ1ZnDZzLGcMnpQn7dR\ncl9saqw0j5WIiPTG/724hx//bVub9ScOK+KOD87KQIskl2keKxERySm3P7+T8lcPNi0fPF4PwFun\njGDOpGFN689KOlolkm46V2AOyudzNKWD8gxHWYaVz3k65/jt6td57UB1079D1VHH6pKZJ3DJzLFN\n/6aM6tqQXz7nGVo+Z6kjViIiEmtHaxv4l/s3sutIbdM656Cu0TGofwH/dcnpTeuHDihk7JCiTDRT\nBFCNlYiIxFzFjsN8+cHKdq+be+pIvvrOU/q4RSKqsRIRkZhrdI4bH97EutePtlhf3xgdAHjzlBHc\nMLfljOiD+semokUEUI1VTsrnse10UJ7hKMuwci3PA8fqWbHtMMfqGlv8q21wGPDGk4YzuKiwxT+z\nNgcMeizX8sykfM5SR6xERKTPvXbgOLc8vY3q+uY5p2r95ckjB3Lzpae3uH1hgTGwX2yOBYikpBor\nERHpc0tW7ODXFa+3e90F00Y1zYouEleqsRIRkT5VU9/Inyv3c7Smoc11q3ccAeDKs8ZROnVk0/oC\nM04doxnRJXv1yXFVMzvFzG41s9+muo1qrMLJ57HtdFCe4SjLsOKe52OV+7m5fCu3Lt/R5t+63VGB\n+hnjhjAj6d/pYwfTryBc3VR3xD3PbJLPWfbJESvn3CvAJzrqWImISG7Zc7QOgFnjhzBz3JA2148c\n1I/zTh7e180SSase1ViZ2W3Ae4HXnXNnJa2/EPgvoiNhtznnvtvqfr91zl3Z3jZVYyUikl2O1zVw\n/QMvs+twbbvX19Q3Utfo+NSbJnHF7HF93DqR9EpVY9XTocAlwPzkFWZWANzi188CFpjZjFb3y8zx\nXRERCa5y33Eq9x3nSG1Du//qGh1FhcbM8W2PVonkqh4NBTrnys1sSqvV5wEvO+deAzCzu4HLgA1m\nNhr4NlBiZje0PpIFUY2VjliFUV5eTmlpaaabkTOUZzjKMqx05dnQ6NheVQOdDGi8dqAagDmThvHV\nd05t9zZFhQUMyJJpEvT6DCefswxZYzUJ2Jq0vI2os4Vzbj/wmYCPJSIiafLtx16l/NWDXb79iIH9\nGDZAPzIXgRhNt1BZWclnP/tZJk+eDMCIESOYPXt2U4838QsDLXe+XFpaGqv2ZPuy8tRyvi0/+7en\nqDpWx4xzzqPQjD0vrQRg7PRoVCF5uX9hAeMPvkR5+bbYtF/L8VhOiEt7Qjyf8vJytmzZAsC5557L\nvHnzaK3HE4T6ocD7E8XrZnY+8HXn3IV+eRHg2hv2a4+K10VE0qe2vpHDtW3nk2rPP9yznsM1Dfzm\nw2cyalD/NLdMJDulY4JQo2Ux+nKg2He4dgIfAhZ0dWOqsQqnvDx/x7bTQXmGoyzD6mqeVdX1XPu7\ndRxuZ6LOjgzuX9jTpmUlvT7Dyecse9SxMrO7gHcAY8xsC3CTc26JmX0OWEbzdAvru7rNyspKysrK\nmoZeREQkjK2Hqjlc00C/AmP4gK51lt5w0vCsKToX6Uvl5eWUl5czbty4sEOBoWkoUEQkPVZsq+Jf\n/7SJOZOGUXZRcaabI5ITdK5AEZEcUlVdz23Ld3RpeG//sWgG9EE6AiWSdrHpWC1dupRly5ZpKDCA\nfB7bTgflGY6yD
OepVw/ymwcfZfi0ki7fZ/ywojS2KPvp9RlOLmfZ2VBgbDpWxcXFLFy4MNPNEBHJ\nCkf9L/zedPJw3n366E5v37+ggJKJQ9PdLJGclzgAtHLlynavj03HqqSk69+6pGO5+i0hU5RnOMoy\nnOr6RoZPK2HamEG8/ZRRmW5OTtDrM5x8zjI2HSsRkXy0ed9xfvLMNqrrG7t1vz1HoxMfD+yvuimR\nOInNO3Lp0qWUlZW1mbFVuk8ZhqU8w1GWbT2+aT+rdx7hpT3HuvVv/7F6qjZVMHnkwEw/hZyh12c4\nuZxleXk5ZWVlVFRUtHt9bI5YqcZKRPJR4kjVFWeOZe6p3RvSW7/qAG+ZMjIdzRKRFFRjlYfyeWw7\nHZRnOMqyrUTHavLIgcwYN6Rb950x/53paFLe0usznHzOMjYdKxGRbFJVXc+Lrx/t9XZ2VEW1UkWa\nY0okJ8SmY6VzBYaTy/OHZILyDCeXsvz2Y6+waseRYNvryXn5cinPOFCe4eRzlrHpWOlcgSKSTV4/\nEs1mXjJxKAN7ebRp1KD+mmNKJEvoXIEiImlw1a9fYO/ROn61YBZjh2hGc5F8k+pcgRrUFxHpgbqG\n6EtpUaF2oyLSLDZ7hFTzQUj35fL8IZmgPMPJRJbOubT8q/G/5isqbPOFtc/otRmW8gwnn7OMTY2V\niEhoP3t2O0vX7k7rY+iIlYgki02N1eLFi93u3btVvC4iwXxi6Xq2HKxO2/bfeNJwvn3htLRtX0Ti\nJ7l4/frrr29zyDo2R6xKSko03YKIBFXXEA3XLfnATCaNGJDh1ohILuhs5vXYHMNWjVU4+Ty2nQ7K\nM5y+zrKuMToi3z+DdVDppNdmWMoznHzOMjYdKxGR0BK/3OtfkJsdKxGJn9jUWGkeKxEJ7X13rOFo\nbQO//+hshg6ITeWDiOSAVPNYaU8jIn3mqVcPsnHvsT57vOq6BgD66Zd7ItJHYtOxWrp0KcuWLdOv\nAgPI53M0pYPyDONYbQNf+tm9DD21pE8fd2C/gpwdCtRrMyzlGU4uZ9nZKW1i07EqLi5m4cKFmW6G\niKTJ8fpGGh0M6l/AlWeN77PHnTl+CIU52rESkb7X2a8CVWMlIn1i95FaPnL3i5wwpD93LTgz080R\nEekVnStQRDIq8Qu9fjp6JCI5LDYdK81jFU4+zx+SDsozjIZGR9WmCnWsAtJrMyzlGU4+ZxmbjpWI\n5Lb6Rh2xEpHcF5uOVUlJ3/5SKJfl6i8xMkV5hlHvHMOnlahjFZBem2Epz3DyOcvY/CpQRHrveF0D\n+47VZboZ7dpVVQOgX+iJSE6LTceqoqJCJ2EOJJfnD8mEbMmzur6Rq3+zjkPV9ZluSkpVmyroP/6t\nmW5GzsiW12a2UJ7h5HOWselYVVZWUlZWpglCRXrowPE6DlXXU2AwYdiATDenXQOH9OfC6WMy3QwR\nkR7rbIJQzWMlkiO2H6rm2t+tZ+LwIn555axMN0dEJKdpHiuRHOeniaLAVMMkIpIpselYaR6rcPJ5\n/pB0yJY8G/x0BoUx7lhlS5bZQnmGpTzDyecsY9OxEpHeafTD+oV6V4uIZExsdsGaxyocFf+HlS15\nZsNQYLZkmS2UZ1jKM5x8zjI2HSsR6Z2moUDNEyUikjGx6VipxiqcfB7bTodsybNpKDDGR6yyJcts\noTzDUp7h5HOWselYiUjvNDRG/xfoXS0ikjGx2QWrxiqcfB7bTodsybMhC45YZUuW2UJ5hqU8w8nn\nLGMz87pIb+w/VkfFjsPEY7rbzHj1QDUQ7+J1EZFcF5uOlc4VGE4+nqPpO4+/yuqdR9Ky7apNFQyf\nlj1HVAf0i2/HKh9fm+mkPMNSnuHkc5ax6VjpXIHSGwePRycePu/k4QwpKgy67S3HhzF52qig20yX\nwgLj0jNOyHQzRERyls4VKHnh479bx9ZDNdx6xRlMHjUw080REZEcp3MFSk7zUzi
h8iIREcmk2HSs\nNI9VOPk4f4jzZevpmBszH/NMF2UZlvIMS3mGk89ZxqZjJdIbjVlwOhcREcl9selYaR6rcPKx+N+l\ncSgwH/NMF2UZlvIMS3mGk89ZxqZjJdIbidO56IiViIhkUmw6VqqxCicfx7bTecQqH/NMF2UZlvIM\nS3mGk89ZxqZjJdIbjYnidXTESkREMkfzWElO+OCv1nLgeD13X3Umowf3z3RzREQkx2keK8lpmsdK\nRETiIDYdK9VYhZOPY9sujcXr+ZhnuijLsJRnWMoznHzOMjYdK5HeSAxo64CViIhkkmqsJCe87441\nHK1t4Pcfnc3QAbE5t7iIiOSoVDVWaf8EMrPBwI+BGuAJ59xd6X5MyT+JeaxMRVYiIpJBfTEUeDnw\nO+fcp4BLU91INVbh5OPYdvMpbcJvOx/zTBdlGZbyDEt5hpPPWXa7Y2Vmt5nZ62a2ptX6C81sg5lt\nNLMbkq46CdjqLzek2m5lZWV3myIprF27NtNN6HPpLF7PxzzTRVmGpTzDUp7h5HOWPTlitQSYn7zC\nzAqAW/z6WcACM5vhr95K1LmCDmqLjx492oOmSHsOHTqU6Sb0uXTOvJ6PeaaLsgxLeYalPMPJ5yy7\nXWPlnCs3symtVp8HvOycew3AzO4GLgM2APcCt5jZxcD9HW17495j3W2OtGPfsbq8y7JB5woUEZEY\nCFW8Ponm4T6AbUSdLZxzx4CFnW1g165d/NP/vRSoOflt899eZPXk/MwyHd2qLVu2pGGr+UlZhqU8\nw1Ke4eRzlrH5Xfq0adM4uvaXTctnn302JSUlmWtQFqsoeA8lJfGYRqOvra5YFXyb5557LitXrgy+\n3XykLMNSnmEpz3ByMcuKigpWr17dtHz22Wczb968Nrfr0TxWfijwfufcWX75fODrzrkL/fIiwDnn\nvtuz5ouIiIhkn55Ot2C0HHVZDhSb2RQzKwI+BNzX28aJiIiIZJOeTLdwF/A0cLqZbTGza51zDcDn\ngGXAi8Ddzrn1YZsqIiIiEm9d6liZ2UAz+4uZmXPuKufcROfcAOfcZOfcEgDn3EPOuenOudOcc2Xp\nbXbXmVl/M3vCTwnR14/dlFug7b1iZqPTdftubHeEmX0mafkEM3so9OOIQPffR2aW9pkJzWyJmV2e\n7sfpCTP7StLlFu/VND/uFDNrd/IiM/uGmb2zg/teY2Y/CtCGjO3vRRK6+uJbCNzj0nRiQTMrTMd2\nAZxzdcCfiYYn+1qw3PyOorvbCf738n+rUcBnmx7Eub3ADjN7c+jH6600dG7nmtn9/vIlZvblDm57\nppktSWd7UjzuNWZ2Yjfvk/JDsRftaOp8mNl1Zjawm/dPZN2t95FzrrSbjxPkQ91vK+U+1cwuNrNv\nhHicTvxr0uUW79We6sbrtt2/kXPuJufcYz25bxfaNjex7wmxv0/uBJrZ42Y2x18+3M3tvN/M1pnZ\no53crukLsJkd7izrpO3Wd7DNKWa2IGm5zb6ou3r6ZaKj91d3M+3CYzXtn3t4/6b9YG8y62rH6sPA\nH/yDfd/M1prZajO70q8zM/ux/2M/bGZ/TNqhJr9o3mBmj/vLN5nZHf7b5R3+CT1pZiv8v/NTPPH3\nmtkzZva8mS0zs7FJ27vNvxEqzexzSXf7g38OfS05t7n+m9QDFs1Q/+PEjcxsgZmt8f/KktYfNrMf\nmNkq4M1J6weZ2YNm9vFOHt+Az/usVpvZ6f7+g31WiRwv8evb/Rv4tj9pZn8gGur9DjDNzFaaWeIH\nCn8APtLLvNIhHV8KHIBz7n7n3PdS3si5F4BJZnZS0up222Nhv1x8jGgKlO5K509J/xkY3IP7OVq+\nj24xs/f6y/ea2a3+8rVm9k1/+bD/f67fH/zOzNab2Z2JjZrZ3/l1y/32L+6sIf6x15vZMmBc0vpX\nzKzMzFYAH2j1YTzGzF7xNz2B6P34ZzPbbGb
/aGZf8O+jp81spL/P42b2H2a23MxeNLNzzeweM3sp\n8Rz97T5sZs/6+//EzArM7DvAIL/uTtp5r5rZF83sOTOrMLOb/LrBft+0yu+HPtDq6bd43ZrZODP7\nvd/GqqT9dT8z+5mZvWBmfzKzAf72yZ3sN5rZU/6+z5jZkFY5X+yvH23R0fCl/nk+a74Dlby/B34P\nJH/B6dX+voNOYHffHx8HPuGca/uzsdTbdXS+z/o48Amgo8kKTwGuatpo+/uivpTquaRjn9PbbSb2\n7z3PzDnX4T+gP7DDX74CeNhfHge8Boz36x/w68cD+4HL/fJmYLS//AbgMX/5JqKi9yK/PDDpcjGw\nPEV7RiRd/jjw/aTtlRNNITEG2AsU+usKgN2dPdeQ/5Jz88tzid4IU4g6PMuIzqM4wec42rfzUeBS\nf59G4IqkbWz2938E+HAX2vAK8Fl/+TPAz/zlbwNXJfIEXgIGpfob+LYfBib75SnAmlaPNbH1ujj8\nA55KavcNwBpgFfDvfl0J8DegArgn8foCHgfm+MtjgFeSsrjPX74G+JG//AFgrd/2X5Ie//PAF1O0\nZy7wJNEHwQa/7l/8dtYA1yXd717/fllLtLNOvK6X+NuuBq4jei8eBtYDK4EBwBzgL/7+DwHjk96P\nFb7N3+vs70fUMXrA334N8AG//mvAs37d/yTdfgnRa/xzRCdhXw086q97D1Gt5grgN8Bgv/5C3/YV\nwM1Ekwonv48+CHzXX37Wb+MGon1OJfDvQBXRB896oN5vY6C/7VeBF4Ba4G9+O38DdvhsXkps31/3\nbn+/SmCnz2C+v//lRBMhNxJ9sA8ANhG9dr5H9CVkLXAk6fWyn+gD7wTgIPBJf91/AJ9Peu19J+n1\ns51of1tENF/gKGAG0Q+EEvu4/wY+4i9XJbW/xXvVP5+f+svmsyn1z+WnSbcblup95JcrfLtW+cyH\nARf5LDYQvZd+75/rNH/bV4DnifZ3c4Dv+4xW+/b/0P99DhK93tcT7fPe4h9zC83vk//xtzsV2EV0\nurSVwFtJsb/3+d9LtO/dDPwj8AV/v6eBkcmv23b2A4f9/yf421/UwXvlazS/D79L0r7CX38/8Pak\n/XTiM7KKlvuIIURH4Fb4nP43abs1/u+72q874F8f3wFeJnrtHwO+4bd1k19fQfQZclLS873ZP25l\n4rn7627xj7UM+GNSLmVE76MK4Hud7DeuAf7PZ/kS8P+SrqtK8TwTn4FTgHXAz/zj/QkY4K+b5p9H\nhb/fKUT71MeB3/l235n0WN3eD9Jq/93lz50ufDBNANYlvfk/lnTd7cAlwH8C1yStvyfpD5D8omnd\nsfpa0n2GA3fQ/MF3JEV7zgQe9rdbDzyYtL2vJN3uRWBi0vJWYEh3A+rpv+Tc/PJcWn7gXuvzvBT4\nZdL6hcAP/OU6/JQYSVmuAhZ0sQ2vABP85fOAZf7y8qScV/nbTU/1N/BtfzRpu+11rPoBe/oq3y4+\n/+QvBRcRdbwTb8rETnQ1UOovfwP4D3+5dcdqc1IWyR2rH/rLa5KyHp7UhrcAf2jdnqRtJXdY5/j2\nDCTa0bwAnN2qvQOJPqxH+dsvS34P+f8fA85J+rs8BYzxy1cCtyU997f6y13pWLX74Ztom798B3Cx\nv5z8AfUKMCopzyeAQX75y8CNRB2TLcCpfv1viN7rye+jiUQdoTP89p8i6mBt8JmNJPpwGuXzfRj4\nJtGH6I/99uf5v28ir/8EjgJDfRteJTri19ROf5vf+HYWAkd8Ht8Hqok6WG8HfuW3vdv/vZM75dcQ\nfXG62S+/SvNr5lpavvbe7C9fgP8y65f/Apzln882ok7BKqJ94df8bQ6neq/69m5Out9G/9in+fXf\nwb8f2nsf+eULiTqWQ1q9NtcBW5LeS08SDUs+Q/SheTnRF5lyfzn5S/pe/zd8gaiDMIGo41dL9GG8\niqgjsY2
oc/s/wCZ//5uIOlcd7u99/hv9/Tvq2Ca/bpP3A1W+rc8A7+zC/udxmt+HTfsKv9xRxyo5\n6wJgaNL75mX8+9vfdiDR63k/0T5+hc/oF0Qd78/T/Lr6K/B80uvt3qTn+xt/+Qyis6jQ6m80wf9d\nLic6CLCh9X6ngxyuIepYj6R5/9WUqf+/sPXzTHr91gKzk/YJiYMCz9DcASvy257b6vXzNNE+uEf7\nQZL2393515WhwOO+we0xOj/sVk/zkGPr7SSfIPALwC4XzY11LlFQmNm3/KHmxExjPyJ6gZ4FfLrV\nNmuSLjfScgLUAUQ7wL5ynGiHnKx1Vs7/S1W3cNz5v26Sp4h2bF2VyKSB5jyM6EjYOf7fKc65l0jx\nN/A6O5njQKLnHCeJnSdEO58lzrkaAOfcQTMbTnSEKlHsfDvRh2NPlAO3m9knaPm6203UGWjdnoTn\nnHOJKYpLiXZ21c65o0Tf+N/mr/tnM6sg2pmcRPMH4SlmdrOZzSfqpEHL6VCmE30ZecQPKX8VmGhm\nI/xzf8rfrmmYrANrgXeb2XfMrNQ5l3i8eX5IZw1RR2BWivsn2nQ+MBN4yrfpaqId6AyiDuxmf7v/\nJXrdNr3HnXM7iHbQ84k6PbVER4mqnHNHnXOJfGcTHQE5n+ioySy/rY1EHa2JtPw77XHOHfGvjxd9\ne5raSTS09FaiTnADUcdqEtGHWRVRJ/dtRB9e9UQfgHcBC2ipiubXg6P5/dl6f5W8Pnm/5vztDLjd\nOTfHv4fPcM59k84Z0dGwxP1Od84tcc697J/DWuBbZnZj0n1av27fRfRer4UW76VhSbe7negIwmBa\nnpkjURdUCvza3383UafqkL/9BufcTr/vqwe+5Zw7h+io4lkuOptHIpvWuSSk2t8/7pw75qK60INE\nR2Dxz3tqe4ElKSLqIH7JdV4vltDdWkqjZdYFwHfMbLV/7Im+HYntFhF9MekP/JToPbSJ6MsztHxe\nM2nO7E6i13PC/wG46Nf8iWHut9H8N9pJ1KGD6O903MxuNbP30bX9/iPOuYPOuWqi/VqiDtKS/m/x\nPM0s0Y5XnHOJ+s/ngalmNpSoI32fb1+t3zZE+9TE66fCP/+e7geT999d1mnHyu+oCi2an+qvwAf9\nWP5YouCfI9rxvN8i44F3JG3iFaIjVRANU6QyguhQO0Q72kL/+Df6HcAcf91wojcYRD3hTllU47XX\n7xD7hM+twOeWcJ5FdUwFREMa5URvgLf7eoJCoh3xXxJNb2fT/w84aGb/nVjhazYmdKN5DxN9k0nc\nPzHFfbt/g3YcJtqJJjud6NtmnHT0paAzHX0haMM591miN+vJwPNmNirpvokdT3ud7U7PPm5mc4F3\nAm9yzpUQ7SwG+tfY2USvl08DP2/v7sALSR+kZzvnLursMdvT3oevr6H5b6Jv+GcBt9J5XkZ0pC3R\npjOdc59Mui5ZPc37n4RniL4EPEn0TXg+0b4pefu/JDrK9ATwb0ltugP4ElHJQoX/O70J/4FjZm8k\nOiKU6Lws8/ueT/nn/Wn/XhtOdNSglujvej7RB9VfifZ5dxINo1xBVKeReC0VEuYLyKNE+9xEjeko\nMzvZX1drZolORuv36sPAwkRdk5lNNLOx/jkdd87dRXRUK1Ej9u9EneXWr9v1+KJ4/9wSj9FZR+Il\n4ET8h7eZDbXm+sLXiYbQZpnZGX7dFuDv/OV6og4zdHDWkE729607qak6tu2pJ/pg784X2+T7Jn/e\npnqPOFpm/WGiju05vnO5u9V2vkB0tOpxmr8MNxKNdkDb55XqtZecS4d/Q5/recBS4L1Ew3Odae+g\nQvL/7T3PREbJbWt9gKA97d2+p/vBHh0w6Grx+jKiw8P3Eu1cEr3KL/lvG/cQfSN5kWjH9TxRrxai\nndoPzew5mr+ttOfHwMd8b/J0Un/gfANYalHR6Z4Otpf8h7yAaHy4ry2ju
WcO0WHaW4hy2uScu9c5\ntwtYRPThuApY4ZxLfItq98XonLuOqEC1zMyMaKx5fzuPn+po4jeB/r5IdS3R3wi6+Ddwzu0nOtqw\nxpqL1zOVcUqtvhQ8AlxrZoMg+iByzlUBB8ws8c3to0QfxBAN05zrL7cu5G3DzE51zi13zt1EtFNI\nfMg1dThTdLaT/RX4e4t+FTQEeJ9fNwI44JyrMbMZRB/imNkYohqbe4mGqBJfPg4TffBD9EE21pp/\niNDPzGY65w4RddDf4m/X9MMD/2H753aeY3sfvgOJXmf7/LfI96d4blVJbXoGeKuZTfPbHWxmpxEd\ntZhiZqf42yWO9iwDPm5mtyflVOiPbP3K5/OM39Yo356hRO8Jo7mQ2QFj/bfTDwNj/baajr4Ak4l2\nxi3a6TN+leiI1y+JOrfvJRpqaCAaHpnunHsRWIwfgiH60lng2wPR0bbEF5BU78+ORgES+4D1RH/z\nZf5b/jKi4Q+I6lHWmNmd/r36dOK96px7hOgoxN/8Ecbf+bbNBp7z7/3/R7SPwK/fRMvX7SNEuc7z\n21hFdBT1EM2dgo/6+9USfTYk3g9G9KXtjURHFJcRHdGaTtSxSgxv/s6/Dv5KdFR2NVGH7It+OzOT\nMjlMy8+yG4heb6E5olKNGZb0a2Az68qcja8CJf7gw8n4c+m2w/BZ++2OIKoXazSzC4iOpCbfdgSw\nzy8nfxk+Rtsvv5U0v84/QssvI63bANEXl8SBlAlE+3j8vmmkc+5PRDWhiTOw/L3viLfn3WY20u9/\n/57ooELyY3X2PFtwzh0BtprZZf6xixL79hS6uh9s/aOHnh0w6Mp4IdE3s9s7uU1ivH000WHwcV3Z\ndl/8I+r4FWfgcZtyI6k2J/BjzMLXZGU447+Q9MOCuPwjOorzTn/5BqJO7Uqi4QWIjvgkitd/T3Px\n+nSiLxDPE3U8O6uxuoeozmoN8J9Jj/8jfM1RO+1p85og+vVconj9c35dEfCgb/vviQ7Jv51oh/Y8\n0QfbSuA9/vaXE3VSEsXrZxN1GCv8tj/ubzfHr1tJVIy6xq9/A/BQO1m+x2eyiqiuKVE/8k2infZf\ngdvwxalEdR6JWpV/8m1KFK9fQHS0e7Vvw3v9+vk0F6//J1GdyDlE38h/kuJv/OV2/q6fJhoqfYao\nMBMJNVMAAAINSURBVPcXrf5Oa2muPfkz8IS//F3/WIn6l3ekaGfim+w8v/xTmmtW+vksVvvH+lJS\nW+8HZmX6fdGN989DrV+3HWSe6r00jegI22qiI/RT/frv0fxF/f3tvSeIOl9X+8ulRB+Qz/n7Jup1\nT/PbSBSvr6Gdgmra1jkl/7Aq+b2c/Lp9jLb1QEVExc+fJqoHWp8iu6b7+uX/JapDu8df9/Z22lHl\ns/57ovfBGKLO+2qi99aLRJ31Of62xUTvvUNE9XFVfttv9Jm/DKz1217i/zati9ebnm/y80zaf60n\nOtL5ANG+5USi9/9q/y/xo4nrgRtS5J7Yb70E3Nj6sVI8z8m0rRG8nub9SzGtXledvH7Oohv7waTn\nf3F7f98O3zfdeIN9jKRC6nauf5xoh/sC8NFM7xCS2tU/8YfP0ON/jKjHnZaOVRz+ER3CvTTT7UjR\ntk6/FKTxsYv8zqIgDu3pRrv/Ed+BiMs/oqOhZwbc3j/7/dWLRMN2A/vgOYwjqjXJeJ49aHvsX7e+\nnf2JOrVj+ujxLgb+KQ1Z/znkdtvbF6Uhizv6Kvc++tv2OLMenYRZJJuY2ceIPhT69MVuZsVEBZZP\nxqE9kllmdi5Q65xbk+m29IRet30ndNap9kWSWm8yU8dKREREJBCdT0lEREQkEHWsRERERAJRx0pE\nREQkEHWsRERERAJRx0pEREQkkP8PGPox6H5psBYAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, 
"output_type": "display_data" } ], "source": [ "plt.figure(figsize=(10, 6))\n", "pd.Series(vocabulary).sort_values().plot(logy=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This gives us a criterion: let's only improved ingredients whose counts are lower 100. The others are assumed to be already well defined." ] }, { "cell_type": "code", "execution_count": 94, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def improve_ingredients(ingredient_list):\n", " \"Improves the list of ingredients given as input.\"\n", " better_ingredients = []\n", " for ingredient in ingredient_list:\n", " cleaned_ingredient = clean_string(ingredient)\n", " if vocabulary[to_ingredient(cleaned_ingredient)] <= 100:\n", " better_ingredients.append(\" \".join(best_replacement_increasing_distance(\n", " to_ingredient(cleaned_ingredient), vocabulary)))\n", " else:\n", " better_ingredients.append(cleaned_ingredient)\n", " return \";\".join(better_ingredients)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before using our function, we also need to rebuild our vocabulary using our new string cleaning function:" ] }, { "cell_type": "code", "execution_count": 95, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df_test = pd.read_json(codecs.open('test.json', 'r', 'utf-8'))\n", "all_ingredients_text = []\n", "for df in [df_train, df_test]:\n", " for ingredient_list in df.ingredients:\n", " all_ingredients_text += [clean_string(ing) for ing in ingredient_list]\n", "all_ingredients = [to_ingredient(text) for text in all_ingredients_text]\n", "c = Counter(all_ingredients)\n", "vocabulary = dict(c.most_common())" ] }, { "cell_type": "code", "execution_count": 96, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df_train['better_ingredients'] = df_train['ingredients'].map(improve_ingredients)" ] }, { "cell_type": "code", "execution_count": 97, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cuisineidingredientsall_ingredientsbetter_ingredients
0greek10259[romaine lettuce, black olives, grape tomatoes...romaine lettuce;black olives;grape tomatoes;ga...romaine lettuce;black olives;grape tomatoes;ga...
1southern_us25693[plain flour, ground pepper, salt, tomatoes, g...plain flour;ground pepper;salt;tomatoes;ground...plain flour;ground pepper;salt;tomatoes;ground...
2filipino20130[eggs, pepper, salt, mayonaise, cooking oil, g...eggs;pepper;salt;mayonaise;cooking oil;green c...eggs;pepper;salt;mayonaise;cooking oil;green c...
3indian22213[water, vegetable oil, wheat, salt]water;vegetable oil;wheat;saltwater;vegetable oil;wheat;salt
4indian13162[black pepper, shallots, cornflour, cayenne pe...black pepper;shallots;cornflour;cayenne pepper...black pepper;shallots;cornflour;cayenne pepper...
5jamaican6602[plain flour, sugar, butter, eggs, fresh ginge...plain flour;sugar;butter;eggs;fresh ginger roo...plain flour;sugar;butter;eggs;fresh ginger roo...
6spanish42779[olive oil, salt, medium shrimp, pepper, garli...olive oil;salt;medium shrimp;pepper;garlic;cho...olive oil;salt;medium shrimp;pepper;garlic;cho...
7italian3735[sugar, pistachio nuts, white almond bark, flo...sugar;pistachio nuts;white almond bark;flour;v...sugar;nuts pistachio;white bark almond;flour;v...
8mexican16903[olive oil, purple onion, fresh pineapple, por...olive oil;purple onion;fresh pineapple;pork;po...olive oil;purple onion;fresh pineapple;pork;po...
9italian12734[chopped tomatoes, fresh basil, garlic, extra-...chopped tomatoes;fresh basil;garlic;extra-virg...chopped tomatoes;fresh basil;garlic;extravirgi...
\n", "
" ], "text/plain": [ " cuisine id ingredients \\\n", "0 greek 10259 [romaine lettuce, black olives, grape tomatoes... \n", "1 southern_us 25693 [plain flour, ground pepper, salt, tomatoes, g... \n", "2 filipino 20130 [eggs, pepper, salt, mayonaise, cooking oil, g... \n", "3 indian 22213 [water, vegetable oil, wheat, salt] \n", "4 indian 13162 [black pepper, shallots, cornflour, cayenne pe... \n", "5 jamaican 6602 [plain flour, sugar, butter, eggs, fresh ginge... \n", "6 spanish 42779 [olive oil, salt, medium shrimp, pepper, garli... \n", "7 italian 3735 [sugar, pistachio nuts, white almond bark, flo... \n", "8 mexican 16903 [olive oil, purple onion, fresh pineapple, por... \n", "9 italian 12734 [chopped tomatoes, fresh basil, garlic, extra-... \n", "\n", " all_ingredients \\\n", "0 romaine lettuce;black olives;grape tomatoes;ga... \n", "1 plain flour;ground pepper;salt;tomatoes;ground... \n", "2 eggs;pepper;salt;mayonaise;cooking oil;green c... \n", "3 water;vegetable oil;wheat;salt \n", "4 black pepper;shallots;cornflour;cayenne pepper... \n", "5 plain flour;sugar;butter;eggs;fresh ginger roo... \n", "6 olive oil;salt;medium shrimp;pepper;garlic;cho... \n", "7 sugar;pistachio nuts;white almond bark;flour;v... \n", "8 olive oil;purple onion;fresh pineapple;pork;po... \n", "9 chopped tomatoes;fresh basil;garlic;extra-virg... \n", "\n", " better_ingredients \n", "0 romaine lettuce;black olives;grape tomatoes;ga... \n", "1 plain flour;ground pepper;salt;tomatoes;ground... \n", "2 eggs;pepper;salt;mayonaise;cooking oil;green c... \n", "3 water;vegetable oil;wheat;salt \n", "4 black pepper;shallots;cornflour;cayenne pepper... \n", "5 plain flour;sugar;butter;eggs;fresh ginger roo... \n", "6 olive oil;salt;medium shrimp;pepper;garlic;cho... \n", "7 sugar;nuts pistachio;white bark almond;flour;v... \n", "8 olive oil;purple onion;fresh pineapple;pork;po... \n", "9 chopped tomatoes;fresh basil;garlic;extravirgi... 
" ] }, "execution_count": 97, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_train.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's now see the performance of this new version of the ingredients." ] }, { "cell_type": "code", "execution_count": 98, "metadata": { "collapsed": true }, "outputs": [], "source": [ "X = cv.fit_transform(df_train['better_ingredients'].values)" ] }, { "cell_type": "code", "execution_count": 99, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0.78830924 0.78680075 0.78214959 0.78918919 0.77923058]\n", "Mean score: 0.785 (+/-0.002)\n" ] } ], "source": [ "evaluate_cross_validation(logistic, X, y, 5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Well that's too bad... Our \"clever\" ingredients don't improve our score on the test set. Still let's try to submit a solution given that the test data might have some other issues (unknown ingredients that our method can simplify...)." 
] }, { "cell_type": "code", "execution_count": 100, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df_test['better_ingredients'] = df_test['ingredients'].map(improve_ingredients)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And now, let's build our feature matrix:" ] }, { "cell_type": "code", "execution_count": 101, "metadata": { "collapsed": true }, "outputs": [], "source": [ "X_submit = cv.transform(df_test['better_ingredients'].values)" ] }, { "cell_type": "code", "execution_count": 102, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n", " intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,\n", " penalty='l2', random_state=None, solver='liblinear', tol=0.0001,\n", " verbose=0, warm_start=False)" ] }, "execution_count": 102, "metadata": {}, "output_type": "execute_result" } ], "source": [ "logistic.fit(X, y)" ] }, { "cell_type": "code", "execution_count": 103, "metadata": { "collapsed": false }, "outputs": [], "source": [ "y_submit = logistic.predict(X_submit)" ] }, { "cell_type": "code", "execution_count": 104, "metadata": { "collapsed": false }, "outputs": [], "source": [ "with open(\"submission_better_words.csv\", 'w') as f:\n", " f.write(\"id,cuisine\\n\")\n", " for i, cuisine in zip(df_test.id, enc.inverse_transform(y_submit)):\n", " f.write(\"{},{}\\n\".format(i, cuisine))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Conclusion " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "My submission (to the already closed contest) achieves a ranking of 576th, nothing to frill about. That's too bad. I would have hoped it would somehow improve my score, even though the K-fold had shown worse performance. Nevertheless, I think what is valuable here is that we have devised a way to improve a given text feature vector by using simple probability. 
This technique might be useful in other machine learning settings where the data can't be fully trusted. As demonstrated on the most exotic ingredients found in the dataset, the method can turn a quite exotic ingredient into something more common that still makes sense." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.1" } }, "nbformat": 4, "nbformat_minor": 0 }