{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Modern NLP in Python\n", "### _- Or -_\n", "## What you can learn about food by analyzing a million Yelp reviews" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Before we get started...\n", "__whois?__\n", "- Patrick Harrison\n", "- Lead Data Scientist @ S&P Global Market Intelligence - _**we are hiring**_\n", "- University of Virginia — Systems Engineering\n", "- patrick@skipgram.io / @skipgram\n", "\n", "__Join Charlottesville Data Science!__\n", "- On Meetup.com ... http://www.meetup.com/CharlottesvilleDataScience\n", "- On Slack ... __https://cville.typeform.com/to/UEzMVh__\n", " - _link invites you to join the Cville team on Slack. Join Cville, then join channel __#datascience__._\n", " \n", "_Note: I presented this notebook as a tutorial during the [PyData DC 2016 conference](http://pydata.org/dc2016/schedule/presentation/11/). To view the video of the presentation on YouTube, see [here](https://www.youtube.com/watch?v=6zm9NC9uRkk)._" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Our Trail Map\n", "This tutorial features an end-to-end data science & natural language processing pipeline, starting with **raw data** and running through **preparing**, **modeling**, **visualizing**, and **analyzing** the data. We'll touch on the following points:\n", "1. A tour of the dataset\n", "1. Introduction to text processing with spaCy\n", "1. Automatic phrase modeling\n", "1. Topic modeling with LDA\n", "1. Visualizing topic models with pyLDAvis\n", "1. Word vector models with word2vec\n", "1. Visualizing word2vec with t-SNE\n", "\n", "...and we might even learn a thing or two about Python along the way.\n", "\n", "Let's get started!" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Yelp Dataset\n", "[**The Yelp Dataset**](https://www.yelp.com/dataset_challenge/) is a dataset published by the business review service [Yelp](http://yelp.com) for academic research and educational purposes. I really like the Yelp dataset as a subject for machine learning and natural language processing demos, because it's big (but not so big that you need your own data center to process it), well-connected, and anyone can relate to it — it's largely about food, after all!\n", "\n", "**Note:** If you'd like to execute this notebook interactively on your local machine, you'll need to download your own copy of the Yelp dataset. If you're reviewing a static copy of the notebook online, you can skip this step. Here's how to get the dataset:\n", "1. Please visit the Yelp dataset webpage [here](https://www.yelp.com/dataset_challenge/)\n", "1. Click \"Get the Data\"\n", "1. Please review, agree to, and respect Yelp's terms of use!\n", "1. The dataset downloads as a compressed .tgz file; uncompress it\n", "1. Place the uncompressed dataset files (*yelp_academic_dataset_business.json*, etc.) in a directory named *yelp_dataset_challenge_academic_dataset*\n", "1. Place the *yelp_dataset_challenge_academic_dataset* within the *data* directory in the *Modern NLP in Python* project folder\n", "\n", "That's it! You're ready to go.\n", "\n", "The current iteration of the Yelp dataset (as of this demo) consists of the following data:\n", "- __552K__ users\n", "- __77K__ businesses\n", "- __2.2M__ user reviews\n", "\n", "When focusing on restaurants alone, there are approximately __22K__ restaurants with approximately __1M__ user reviews written about them.\n", "\n", "The data is provided in a handful of files in _.json_ format. 
We'll be using the following files for our demo:\n", "- __yelp\\_academic\\_dataset\\_business.json__ — _the records for individual businesses_\n", "- __yelp\\_academic\\_dataset\\_review.json__ — _the records for reviews users wrote about businesses_\n", "\n", "The files are text files (UTF-8) with one _json object_ per line, each one corresponding to an individual data record. Let's take a look at a few examples." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\"business_id\": \"vcNAWiLM4dR7D2nwwJ7nCA\", \"full_address\": \"4840 E Indian School Rd\\nSte 101\\nPhoenix, AZ 85018\", \"hours\": {\"Tuesday\": {\"close\": \"17:00\", \"open\": \"08:00\"}, \"Friday\": {\"close\": \"17:00\", \"open\": \"08:00\"}, \"Monday\": {\"close\": \"17:00\", \"open\": \"08:00\"}, \"Wednesday\": {\"close\": \"17:00\", \"open\": \"08:00\"}, \"Thursday\": {\"close\": \"17:00\", \"open\": \"08:00\"}}, \"open\": true, \"categories\": [\"Doctors\", \"Health & Medical\"], \"city\": \"Phoenix\", \"review_count\": 9, \"name\": \"Eric Goldberg, MD\", \"neighborhoods\": [], \"longitude\": -111.98375799999999, \"state\": \"AZ\", \"stars\": 3.5, \"latitude\": 33.499313000000001, \"attributes\": {\"By Appointment Only\": true}, \"type\": \"business\"}\n", "\n" ] } ], "source": [ "import os\n", "import codecs\n", "\n", "data_directory = os.path.join('..', 'data',\n", " 'yelp_dataset_challenge_academic_dataset')\n", "\n", "businesses_filepath = os.path.join(data_directory,\n", " 'yelp_academic_dataset_business.json')\n", "\n", "with codecs.open(businesses_filepath, encoding='utf_8') as f:\n", " first_business_record = f.readline() \n", "\n", "print first_business_record" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The business records consist of _key, value_ pairs containing information about the particular business. 
A few attributes we'll be interested in for this demo include:\n", "- __business\\_id__ — _unique identifier for businesses_\n", "- __categories__ — _an array containing relevant category values of businesses_\n", "\n", "The _categories_ attribute is of special interest. This demo will focus on restaurants, which are indicated by the presence of the _Restaurants_ tag in the _categories_ array. In addition, the _categories_ array may contain more detailed information about restaurants, such as the type of food they serve." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The review records are stored in a similar manner — _key, value_ pairs containing information about the reviews." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\"votes\": {\"funny\": 0, \"useful\": 2, \"cool\": 1}, \"user_id\": \"Xqd0DzHaiyRqVH3WRG7hzg\", \"review_id\": \"15SdjuK7DmYqUAj6rjGowg\", \"stars\": 5, \"date\": \"2007-05-17\", \"text\": \"dr. goldberg offers everything i look for in a general practitioner. he's nice and easy to talk to without being patronizing; he's always on time in seeing his patients; he's affiliated with a top-notch hospital (nyu) which my parents have explained to me is very important in case something happens and you need surgery; and you can get referrals to see specialists without having to see him first. really, what more do you need? i'm sitting here trying to think of any complaints i have about him, but i'm really drawing a blank.\", \"type\": \"review\", \"business_id\": \"vcNAWiLM4dR7D2nwwJ7nCA\"}\n", "\n" ] } ], "source": [ "review_json_filepath = os.path.join(data_directory,\n", " 'yelp_academic_dataset_review.json')\n", "\n", "with codecs.open(review_json_filepath, encoding='utf_8') as f:\n", " first_review_record = f.readline()\n", " \n", "print first_review_record" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A few attributes of note on the review records:\n", "- __business\\_id__ — _indicates which business the review is about_\n", "- __text__ — _the natural language text the user wrote_\n", "\n", "The _text_ attribute will be our focus today!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_json_ is a handy file format for data interchange, but it's typically not the most usable for any sort of modeling work. Let's do a bit more data preparation to get our data in a more usable format. Our next code block will do the following:\n", "1. Read in each business record and convert it to a Python `dict`\n", "2. Filter out business records that aren't about restaurants (i.e., not in the \"Restaurants\" category)\n", "3. 
Create a `frozenset` of the business IDs for restaurants, which we'll use in the next step" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "21,892 restaurants in the dataset.\n" ] } ], "source": [ "import json\n", "\n", "restaurant_ids = set()\n", "\n", "# open the businesses file\n", "with codecs.open(businesses_filepath, encoding='utf_8') as f:\n", " \n", " # iterate through each line (json record) in the file\n", " for business_json in f:\n", " \n", " # convert the json record to a Python dict\n", " business = json.loads(business_json)\n", " \n", " # if this business is not a restaurant, skip to the next one\n", " if u'Restaurants' not in business[u'categories']:\n", " continue\n", " \n", " # add the restaurant business id to our restaurant_ids set\n", " restaurant_ids.add(business[u'business_id'])\n", "\n", "# turn restaurant_ids into a frozenset, as we don't need to change it anymore\n", "restaurant_ids = frozenset(restaurant_ids)\n", "\n", "# print the number of unique restaurant ids in the dataset\n", "print '{:,}'.format(len(restaurant_ids)), u'restaurants in the dataset.'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we will create a new file that contains only the text from reviews about restaurants, with one review per line in the file." 
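, "\n", "One wrinkle: the review text itself can contain newline characters, but the new file stores one review per line, so embedded newlines are escaped on write and must be un-escaped later when a review is read back. A minimal round-trip sketch (on made-up review text):\n", "\n", "```python\n", "original = u'Great fries.\\nDecent Reuben.'\n", "\n", "# escape embedded newlines so the review fits on a single line\n", "escaped = original.replace('\\n', '\\\\n')\n", "assert '\\n' not in escaped\n", "\n", "# reverse the escaping to recover the original text\n", "restored = escaped.replace('\\\\n', '\\n')\n", "assert restored == original\n", "```\n"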
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "intermediate_directory = os.path.join('..', 'intermediate')\n", "\n", "review_txt_filepath = os.path.join(intermediate_directory,\n", " 'review_text_all.txt')" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Text from 991,714 restaurant reviews in the txt file.\n", "CPU times: user 26.7 s, sys: 1.21 s, total: 27.9 s\n", "Wall time: 28.1 s\n" ] } ], "source": [ "%%time\n", "\n", "# this is a bit time consuming - make the if statement True\n", "# if you want to execute data prep yourself.\n", "if 0 == 1:\n", " \n", " review_count = 0\n", "\n", " # create & open a new file in write mode\n", " with codecs.open(review_txt_filepath, 'w', encoding='utf_8') as review_txt_file:\n", "\n", " # open the existing review json file\n", " with codecs.open(review_json_filepath, encoding='utf_8') as review_json_file:\n", "\n", " # loop through all reviews in the existing file and convert to dict\n", " for review_json in review_json_file:\n", " review = json.loads(review_json)\n", "\n", " # if this review is not about a restaurant, skip to the next one\n", " if review[u'business_id'] not in restaurant_ids:\n", " continue\n", "\n", " # write the restaurant review as a line in the new file\n", " # escape newline characters in the original review text\n", " review_txt_file.write(review[u'text'].replace('\\n', '\\\\n') + '\\n')\n", " review_count += 1\n", "\n", " print u'''Text from {:,} restaurant reviews\n", " written to the new txt file.'''.format(review_count)\n", " \n", "else:\n", " \n", " with codecs.open(review_txt_filepath, encoding='utf_8') as review_txt_file:\n", " for review_count, line in enumerate(review_txt_file):\n", " pass\n", " \n", " print u'Text from {:,} restaurant reviews in the txt file.'.format(review_count + 1)" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "## spaCy — Industrial-Strength NLP in Python" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![spaCy](https://s3.amazonaws.com/skipgram-images/spaCy.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[**spaCy**](https://spacy.io) is an industrial-strength natural language processing (_NLP_) library for Python. spaCy's goal is to take recent advancements in natural language processing out of research papers and put them in the hands of users to build production software.\n", "\n", "spaCy handles many tasks commonly associated with building an end-to-end natural language processing pipeline:\n", "- Tokenization\n", "- Text normalization, such as lowercasing, stemming/lemmatization\n", "- Part-of-speech tagging\n", "- Syntactic dependency parsing\n", "- Sentence boundary detection\n", "- Named entity recognition and annotation\n", "\n", "In the \"batteries included\" Python tradition, spaCy contains built-in data and models which you can use out-of-the-box for processing general-purpose English language text:\n", "- Large English vocabulary, including stopword lists\n", "- Token \"probabilities\"\n", "- Word vectors\n", "\n", "spaCy is written in optimized Cython, which means it's _fast_. According to a few independent sources, it's the fastest syntactic parser available in any language. Key pieces of the spaCy parsing pipeline are written in pure C, enabling efficient multithreading (i.e., spaCy can release the _GIL_)." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import spacy\n", "import pandas as pd\n", "import itertools as it\n", "\n", "nlp = spacy.load('en')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's grab a sample review to play with." 
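, "\n", "The next cell uses `itertools.islice` to pull a single line out of the review file lazily, without reading the whole file into memory. Here is the same idiom on a plain in-memory iterator (a sketch with made-up data):\n", "\n", "```python\n", "import itertools as it\n", "\n", "lines = iter(['review %d' % n for n in range(100)])\n", "\n", "# islice(iterable, 8, 9) lazily yields only the element at index 8\n", "ninth = list(it.islice(lines, 8, 9))[0]\n", "assert ninth == 'review 8'\n", "```\n"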
] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "After a morning of Thrift Store hunting, a friend and I were thinking of lunch, and he suggested Emil's after he'd seen Chris Sebak do a bit on it and had tried it a time or two before, and I had not. He said they had a decent Reuben, but to be prepared to step back in time.\n", "\n", "Well, seeing as how I'm kind of addicted to late 40's and early 50's, and the whole Rat Pack scene, stepping back in time is a welcomed change in da burgh...as long as it doesn't involve 1979, which I can see all around me every day.\n", "\n", "And yet another shot at finding a decent Reuben in da burgh...well, that's like hunting the Holy Grail. So looking under one more bush certainly wouldn't hurt.\n", "\n", "So off we go right at lunchtime in the middle of...where exactly were we? At first I thought we were lost, driving around a handful of very rather dismal looking blocks in what looked like a neighborhood that had been blighted by the building of a highway. And then...AHA! Here it is! And yep, there it was. This little unassuming building with an add-on entrance with what looked like a very old hand painted sign stating quite simply 'Emil's. \n", "\n", "We walked in the front door, and entered another world. Another time, and another place. Oh, and any Big Burrito/Sousa foodies might as well stop reading now. I wouldn't want to see you walk in, roll your eyes and say 'Reaaaaaalllly?'\n", "\n", "This is about as old world bar/lounge/restaurant as it gets. Plain, with a dark wood bar on one side, plain white walls with no yinzer pics, good sturdy chairs and actual white linens on the tables. This is the kind of neighborhood dive that I could see Frank and Dino pulling a few tables together for some poker, a fish sammich, and some cheap scotch. And THAT is exactly what I love.\n", "\n", "Oh...but good food counts too. 
\n", "\n", "We each had a Reuben, and my friend had a side of fries. The Reubens were decent, but not NY awesome. A little too thick on the bread, but overall, tasty and definitely filling. Not too skimpy on the meat. I seriously CRAVE a true, good NY Reuben, but since I can't afford to travel right now, what I find in da burgh will have to do. But as we sat and ate, burgers came out to an adjoining table. Those were some big thick burgers. A steak went past for the table behind us. That was HUGE! And when we asked about it, the waitress said 'Yeah, it's huge and really good, and he only charges $12.99 for it, ain't that nuts?' Another table of five came in, and wham. Fish sandwiches PILED with breaded fish that looked amazing. Yeah, I want that, that, that and THAT!\n", "\n", "My friend also mentioned that they have a Chicken Parm special one day of the week that is only served UNTIL 4 pm, and that it is fantastic. If only I could GET there on that week day before 4...\n", "\n", "The waitress did a good job, especially since there was quite a growing crowd at lunchtime on a Saturday, and only one of her. She kept up and was very friendly. \n", "\n", "They only have Pepsi products, so I had a brewed iced tea, which was very fresh, and she did pop by to ask about refills as often as she could. As the lunch hour went on, they were getting busy.\n", "\n", "Emil's is no frills, good portions, very reasonable prices, VERY comfortable neighborhood hole in the wall...kind of like Cheers, but in a blue collar neighborhood in the 1950's. Fan-freakin-tastic! I could feel at home here.\n", "\n", "You definitely want to hit Mapquest or plug in your GPS though. I am not sure that I could find it again on my own...it really is a hidden gem. I will be making my friend take me back until I can memorize where the heck it is.\n", "\n", "Addendum: 2nd visit for the fish sandwich. Excellent. Truly. 
A pound of fish on a fish-shaped bun (as opposed to da burgh's seemingly popular hamburger bun). The fish was flavorful, the batter excellent, and for just $8. This may have been the best fish sandwich I've yet to have in da burgh.\n", "\n" ] } ], "source": [ "with codecs.open(review_txt_filepath, encoding='utf_8') as f:\n", " sample_review = list(it.islice(f, 8, 9))[0]\n", " sample_review = sample_review.replace('\\\\n', '\\n')\n", " \n", "print sample_review" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hand the review text to spaCy, and be prepared to wait..." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 222 ms, sys: 11.6 ms, total: 234 ms\n", "Wall time: 251 ms\n" ] } ], "source": [ "%%time\n", "parsed_review = nlp(sample_review)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "...about a quarter of a second. Let's take a look at what we got during that time..." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "After a morning of Thrift Store hunting, a friend and I were thinking of lunch, and he suggested Emil's after he'd seen Chris Sebak do a bit on it and had tried it a time or two before, and I had not. He said they had a decent Reuben, but to be prepared to step back in time.\n", "\n", "Well, seeing as how I'm kind of addicted to late 40's and early 50's, and the whole Rat Pack scene, stepping back in time is a welcomed change in da burgh...as long as it doesn't involve 1979, which I can see all around me every day.\n", "\n", "And yet another shot at finding a decent Reuben in da burgh...well, that's like hunting the Holy Grail. So looking under one more bush certainly wouldn't hurt.\n", "\n", "So off we go right at lunchtime in the middle of...where exactly were we? 
At first I thought we were lost, driving around a handful of very rather dismal looking blocks in what looked like a neighborhood that had been blighted by the building of a highway. And then...AHA! Here it is! And yep, there it was. This little unassuming building with an add-on entrance with what looked like a very old hand painted sign stating quite simply 'Emil's. \n", "\n", "We walked in the front door, and entered another world. Another time, and another place. Oh, and any Big Burrito/Sousa foodies might as well stop reading now. I wouldn't want to see you walk in, roll your eyes and say 'Reaaaaaalllly?'\n", "\n", "This is about as old world bar/lounge/restaurant as it gets. Plain, with a dark wood bar on one side, plain white walls with no yinzer pics, good sturdy chairs and actual white linens on the tables. This is the kind of neighborhood dive that I could see Frank and Dino pulling a few tables together for some poker, a fish sammich, and some cheap scotch. And THAT is exactly what I love.\n", "\n", "Oh...but good food counts too. \n", "\n", "We each had a Reuben, and my friend had a side of fries. The Reubens were decent, but not NY awesome. A little too thick on the bread, but overall, tasty and definitely filling. Not too skimpy on the meat. I seriously CRAVE a true, good NY Reuben, but since I can't afford to travel right now, what I find in da burgh will have to do. But as we sat and ate, burgers came out to an adjoining table. Those were some big thick burgers. A steak went past for the table behind us. That was HUGE! And when we asked about it, the waitress said 'Yeah, it's huge and really good, and he only charges $12.99 for it, ain't that nuts?' Another table of five came in, and wham. Fish sandwiches PILED with breaded fish that looked amazing. Yeah, I want that, that, that and THAT!\n", "\n", "My friend also mentioned that they have a Chicken Parm special one day of the week that is only served UNTIL 4 pm, and that it is fantastic. 
If only I could GET there on that week day before 4...\n", "\n", "The waitress did a good job, especially since there was quite a growing crowd at lunchtime on a Saturday, and only one of her. She kept up and was very friendly. \n", "\n", "They only have Pepsi products, so I had a brewed iced tea, which was very fresh, and she did pop by to ask about refills as often as she could. As the lunch hour went on, they were getting busy.\n", "\n", "Emil's is no frills, good portions, very reasonable prices, VERY comfortable neighborhood hole in the wall...kind of like Cheers, but in a blue collar neighborhood in the 1950's. Fan-freakin-tastic! I could feel at home here.\n", "\n", "You definitely want to hit Mapquest or plug in your GPS though. I am not sure that I could find it again on my own...it really is a hidden gem. I will be making my friend take me back until I can memorize where the heck it is.\n", "\n", "Addendum: 2nd visit for the fish sandwich. Excellent. Truly. A pound of fish on a fish-shaped bun (as opposed to da burgh's seemingly popular hamburger bun). The fish was flavorful, the batter excellent, and for just $8. This may have been the best fish sandwich I've yet to have in da burgh.\n", "\n" ] } ], "source": [ "print parsed_review" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looks the same! What happened under the hood?\n", "\n", "What about sentence detection and segmentation?" 
] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sentence 1:\n", "After a morning of Thrift Store hunting, a friend and I were thinking of lunch, and he suggested Emil's after he'd seen Chris Sebak do a bit on it and had tried it a time or two before, and I had not.\n", "\n", "Sentence 2:\n", "He said they had a decent Reuben, but to be prepared to step back in time.\n", "\n", "\n", "\n", "Sentence 3:\n", "Well, seeing as how I'm kind of addicted to late 40's and early 50's, and the whole Rat Pack scene, stepping back in time is a welcomed change in da burgh...as long as it doesn't involve 1979, which I can see all around me every day.\n", "\n", "\n", "\n", "Sentence 4:\n", "And yet another shot at finding a decent Reuben in da burgh...\n", "\n", "Sentence 5:\n", "well, that's like hunting the Holy Grail.\n", "\n", "Sentence 6:\n", "So looking under one more bush certainly wouldn't hurt.\n", "\n", "\n", "\n", "Sentence 7:\n", "So off we go right at lunchtime in the middle of...where exactly were we?\n", "\n", "Sentence 8:\n", "At first I thought we were lost, driving around a handful of very rather dismal looking blocks in what looked like a neighborhood that had been blighted by the building of a highway.\n", "\n", "Sentence 9:\n", "And then...AHA!\n", "\n", "Sentence 10:\n", "Here it is!\n", "\n", "Sentence 11:\n", "And yep, there it was.\n", "\n", "Sentence 12:\n", "This little unassuming building with an add-on entrance with what looked like a very old hand painted sign stating quite simply 'Emil's. 
\n", "\n", "\n", "\n", "Sentence 13:\n", "We walked in the front door, and entered another world.\n", "\n", "Sentence 14:\n", "Another time, and another place.\n", "\n", "Sentence 15:\n", "Oh, and any Big Burrito/Sousa foodies might as well stop reading now.\n", "\n", "Sentence 16:\n", "I wouldn't want to see you walk in, roll your eyes and say 'Reaaaaaalllly?'\n", "\n", "\n", "\n", "Sentence 17:\n", "This is about as old world bar/lounge/restaurant as it gets.\n", "\n", "Sentence 18:\n", "Plain, with a dark wood bar on one side, plain white walls with no yinzer pics, good sturdy chairs and actual white linens on the tables.\n", "\n", "Sentence 19:\n", "This is the kind of neighborhood dive that I could see Frank and Dino pulling a few tables together for some poker, a fish sammich, and some cheap scotch.\n", "\n", "Sentence 20:\n", "And THAT is exactly what I love.\n", "\n", "\n", "\n", "Sentence 21:\n", "Oh...but good food counts too. \n", "\n", "\n", "\n", "Sentence 22:\n", "We each had a Reuben, and my friend had a side of fries.\n", "\n", "Sentence 23:\n", "The Reubens were decent, but not NY awesome.\n", "\n", "Sentence 24:\n", "A little too thick on the bread, but overall, tasty and definitely filling.\n", "\n", "Sentence 25:\n", "Not too skimpy on the meat.\n", "\n", "Sentence 26:\n", "I seriously CRAVE a true, good NY Reuben, but since I can't afford to travel right now, what I find in da burgh will have to do.\n", "\n", "Sentence 27:\n", "But as we sat and ate, burgers came out to an adjoining table.\n", "\n", "Sentence 28:\n", "Those were some big thick burgers.\n", "\n", "Sentence 29:\n", "A steak went past for the table behind us.\n", "\n", "Sentence 30:\n", "That was HUGE!\n", "\n", "Sentence 31:\n", "And when we asked about it, the waitress said 'Yeah, it's huge and really good, and he only charges $12.99 for it, ain't that nuts?'\n", "\n", "Sentence 32:\n", "Another table of five came in, and wham.\n", "\n", "Sentence 33:\n", "Fish sandwiches PILED 
with breaded fish that looked amazing.\n", "\n", "Sentence 34:\n", "Yeah, I want that, that, that and THAT!\n", "\n", "\n", "\n", "Sentence 35:\n", "My friend also mentioned that they have a Chicken Parm special one day of the week that is only served UNTIL 4 pm, and that it is fantastic.\n", "\n", "Sentence 36:\n", "If only I could GET there on that week day before 4...\n", "\n", "\n", "\n", "Sentence 37:\n", "The waitress did a good job, especially since there was quite a growing crowd at lunchtime on a Saturday, and only one of her.\n", "\n", "Sentence 38:\n", "She kept up and was very friendly. \n", "\n", "\n", "\n", "Sentence 39:\n", "They only have Pepsi products, so I had a brewed iced tea, which was very fresh, and she did pop by to ask about refills as often as she could.\n", "\n", "Sentence 40:\n", "As the lunch hour went on, they were getting busy.\n", "\n", "\n", "\n", "Sentence 41:\n", "Emil's is no frills, good portions, very reasonable prices, VERY comfortable neighborhood hole in the wall...\n", "\n", "Sentence 42:\n", "kind of like Cheers, but in a blue collar neighborhood in the 1950's.\n", "\n", "Sentence 43:\n", "Fan-freakin-tastic!\n", "\n", "Sentence 44:\n", "I could feel at home here.\n", "\n", "\n", "\n", "Sentence 45:\n", "You definitely want to hit Mapquest or plug in your GPS though.\n", "\n", "Sentence 46:\n", "I am not sure that I could find it again on my own...it really is a hidden gem.\n", "\n", "Sentence 47:\n", "I will be making my friend take me back until I can memorize where the heck it is.\n", "\n", "\n", "\n", "Sentence 48:\n", "Addendum: 2nd visit for the fish sandwich.\n", "\n", "Sentence 49:\n", "Excellent.\n", "\n", "Sentence 50:\n", "Truly.\n", "\n", "Sentence 51:\n", "A pound of fish on a fish-shaped bun (as opposed to da burgh's seemingly popular hamburger bun).\n", "\n", "Sentence 52:\n", "The fish was flavorful, the batter excellent, and for just $8.\n", "\n", "Sentence 53:\n", "This may have been the best fish 
sandwich I've yet to have in da burgh.\n", "\n", "\n" ] } ], "source": [ "for num, sentence in enumerate(parsed_review.sents):\n", " print 'Sentence {}:'.format(num + 1)\n", " print sentence\n", " print ''" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What about named entity detection?" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Entity 1: Thrift Store - ORG\n", "\n", "Entity 2: Emil - PERSON\n", "\n", "Entity 3: Chris Sebak - PERSON\n", "\n", "Entity 4: two - CARDINAL\n", "\n", "Entity 5: Reuben - PERSON\n", "\n", "Entity 6: Rat Pack - ORG\n", "\n", "Entity 7: 1979 - DATE\n", "\n", "Entity 8: every day - DATE\n", "\n", "Entity 9: Reuben - PERSON\n", "\n", "Entity 10: one - CARDINAL\n", "\n", "Entity 11: Emil - PERSON\n", "\n", "Entity 12: Frank - PERSON\n", "\n", "Entity 13: Dino - PERSON\n", "\n", "Entity 14: Reuben - PERSON\n", "\n", "Entity 15: Reubens - PERSON\n", "\n", "Entity 16: Reuben - PERSON\n", "\n", "Entity 17: HUGE - ORG\n", "\n", "Entity 18: 12.99 - MONEY\n", "\n", "Entity 19: five - CARDINAL\n", "\n", "Entity 20: one day - DATE\n", "\n", "Entity 21: UNTIL - ORG\n", "\n", "Entity 22: 4 pm - TIME\n", "\n", "Entity 23: that week day - DATE\n", "\n", "Entity 24: Saturday - DATE\n", "\n", "Entity 25: only one - CARDINAL\n", "\n", "Entity 26: Pepsi - ORG\n", "\n", "Entity 27: the lunch hour - TIME\n", "\n", "Entity 28: Emil - PERSON\n", "\n", "Entity 29: 1950 - DATE\n", "\n", "Entity 30: Mapquest - LOC\n", "\n", "Entity 31: 2nd - CARDINAL\n", "\n", "Entity 32: Truly - PERSON\n", "\n", "Entity 33: 8 - MONEY\n", "\n" ] } ], "source": [ "for num, entity in enumerate(parsed_review.ents):\n", " print 'Entity {}:'.format(num + 1), entity, '-', entity.label_\n", " print ''" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What about part of speech tagging?" 
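, "\n", "spaCy labels every token with a coarse part-of-speech tag, exposed as `token.pos_`. Once the per-token tags are in hand, aggregate views are cheap to compute; for example, tag frequencies via `collections.Counter` (a sketch using a hand-made tag list in place of a real parse):\n", "\n", "```python\n", "from collections import Counter\n", "\n", "# stand-in for [token.pos_ for token in parsed_review]\n", "pos_tags = ['ADP', 'DET', 'NOUN', 'ADP', 'PROPN',\n", "            'PROPN', 'NOUN', 'PUNCT', 'DET', 'NOUN']\n", "\n", "pos_counts = Counter(pos_tags)\n", "assert pos_counts['NOUN'] == 3\n", "assert pos_counts.most_common(1)[0] == ('NOUN', 3)\n", "```\n"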
] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>token_text</th>\n", "      <th>part_of_speech</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr><th>0</th><td>After</td><td>ADP</td></tr>\n", "    <tr><th>1</th><td>a</td><td>DET</td></tr>\n", "    <tr><th>2</th><td>morning</td><td>NOUN</td></tr>\n", "    <tr><th>3</th><td>of</td><td>ADP</td></tr>\n", "    <tr><th>4</th><td>Thrift</td><td>PROPN</td></tr>\n", "    <tr><th>5</th><td>Store</td><td>PROPN</td></tr>\n", "    <tr><th>6</th><td>hunting</td><td>NOUN</td></tr>\n", "    <tr><th>7</th><td>,</td><td>PUNCT</td></tr>\n", "    <tr><th>8</th><td>a</td><td>DET</td></tr>\n", "    <tr><th>9</th><td>friend</td><td>NOUN</td></tr>\n", "    <tr><th>10</th><td>and</td><td>CONJ</td></tr>\n", "    <tr><th>11</th><td>I</td><td>PRON</td></tr>\n", "    <tr><th>12</th><td>were</td><td>VERB</td></tr>\n", "    <tr><th>13</th><td>thinking</td><td>VERB</td></tr>\n", "    <tr><th>14</th><td>of</td><td>ADP</td></tr>\n", "    <tr><th>15</th><td>lunch</td><td>NOUN</td></tr>\n", "    <tr><th>16</th><td>,</td><td>PUNCT</td></tr>\n", "    <tr><th>17</th><td>and</td><td>CONJ</td></tr>\n", "    <tr><th>18</th><td>he</td><td>PRON</td></tr>\n", "    <tr><th>19</th><td>suggested</td><td>VERB</td></tr>\n", "    <tr><th>20</th><td>Emil</td><td>PROPN</td></tr>\n", "    <tr><th>21</th><td>'s</td><td>PART</td></tr>\n", "    <tr><th>22</th><td>after</td><td>ADP</td></tr>\n", "    <tr><th>23</th><td>he</td><td>PRON</td></tr>\n", "    <tr><th>24</th><td>'d</td><td>VERB</td></tr>\n", "    <tr><th>25</th><td>seen</td><td>VERB</td></tr>\n", "    <tr><th>26</th><td>Chris</td><td>PROPN</td></tr>\n", "    <tr><th>27</th><td>Sebak</td><td>PROPN</td></tr>\n", "    <tr><th>28</th><td>do</td><td>VERB</td></tr>\n", "    <tr><th>29</th><td>a</td><td>DET</td></tr>\n", "    <tr><th>...</th><td>...</td><td>...</td></tr>\n", "    <tr><th>855</th><td>flavorful</td><td>ADJ</td></tr>\n", "    <tr><th>856</th><td>,</td><td>PUNCT</td></tr>\n", "    <tr><th>857</th><td>the</td><td>DET</td></tr>\n", "    <tr><th>858</th><td>batter</td><td>NOUN</td></tr>\n", "    <tr><th>859</th><td>excellent</td><td>ADJ</td></tr>\n", "    <tr><th>860</th><td>,</td><td>PUNCT</td></tr>\n", "    <tr><th>861</th><td>and</td><td>CONJ</td></tr>\n", "    <tr><th>862</th><td>for</td><td>ADP</td></tr>\n", "    <tr><th>863</th><td>just</td><td>ADV</td></tr>\n", "    <tr><th>864</th><td>$</td><td>SYM</td></tr>\n", "    <tr><th>865</th><td>8</td><td>NUM</td></tr>\n", "    <tr><th>866</th><td>.</td><td>PUNCT</td></tr>\n", "    <tr><th>867</th><td>This</td><td>DET</td></tr>\n", "    <tr><th>868</th><td>may</td><td>VERB</td></tr>\n", "    <tr><th>869</th><td>have</td><td>VERB</td></tr>\n", "    <tr><th>870</th><td>been</td><td>VERB</td></tr>\n", "    <tr><th>871</th><td>the</td><td>DET</td></tr>\n", "    <tr><th>872</th><td>best</td><td>ADJ</td></tr>\n", "    <tr><th>873</th><td>fish</td><td>NOUN</td></tr>\n", "    <tr><th>874</th><td>sandwich</td><td>NOUN</td></tr>\n", "    <tr><th>875</th><td>I</td><td>PRON</td></tr>\n", "    <tr><th>876</th><td>'ve</td><td>VERB</td></tr>\n", "    <tr><th>877</th><td>yet</td><td>ADV</td></tr>\n", "    <tr><th>878</th><td>to</td><td>PART</td></tr>\n", "    <tr><th>879</th><td>have</td><td>VERB</td></tr>\n", "    <tr><th>880</th><td>in</td><td>ADP</td></tr>\n", "    <tr><th>881</th><td>da</td><td>PROPN</td></tr>\n", "    <tr><th>882</th><td>burgh</td><td>NOUN</td></tr>\n", "    <tr><th>883</th><td>.</td><td>PUNCT</td></tr>\n", "    <tr><th>884</th><td>\\\\n</td><td>SPACE</td></tr>\n", "  </tbody>\n", "</table>\n", "<p>885 rows × 2 columns</p>\n", "</div>\n", "<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>token_text</th>\n", "      <th>token_lemma</th>\n", "      <th>token_shape</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr><th>0</th><td>After</td><td>after</td><td>Xxxxx</td></tr>\n", "    <tr><th>1</th><td>a</td><td>a</td><td>x</td></tr>\n", "    <tr><th>2</th><td>morning</td><td>morning</td><td>xxxx</td></tr>\n", "    <tr><th>3</th><td>of</td><td>of</td><td>xx</td></tr>\n", "    <tr><th>4</th><td>Thrift</td><td>thrift</td><td>Xxxxx</td></tr>\n", "    <tr><th>5</th><td>Store</td><td>store</td><td>Xxxxx</td></tr>\n", "    <tr><th>6</th><td>hunting</td><td>hunting</td><td>xxxx</td></tr>\n", "    <tr><th>7</th><td>,</td><td>,</td><td>,</td></tr>\n", "    <tr><th>8</th><td>a</td><td>a</td><td>x</td></tr>\n", "    <tr><th>9</th><td>friend</td><td>friend</td><td>xxxx</td></tr>\n", "    <tr><th>10</th><td>and</td><td>and</td><td>xxx</td></tr>\n", "    <tr><th>11</th><td>I</td><td>i</td><td>X</td></tr>\n", "    <tr><th>12</th><td>were</td><td>be</td><td>xxxx</td></tr>\n", "    <tr><th>13</th><td>thinking</td><td>think</td><td>xxxx</td></tr>\n", "    <tr><th>14</th><td>of</td><td>of</td><td>xx</td></tr>\n", "    <tr><th>15</th><td>lunch</td><td>lunch</td><td>xxxx</td></tr>\n", "    <tr><th>16</th><td>,</td><td>,</td><td>,</td></tr>\n", "    <tr><th>17</th><td>and</td><td>and</td><td>xxx</td></tr>\n", "    <tr><th>18</th><td>he</td><td>he</td><td>xx</td></tr>\n", "    <tr><th>19</th><td>suggested</td><td>suggest</td><td>xxxx</td></tr>\n", "    <tr><th>20</th><td>Emil</td><td>emil</td><td>Xxxx</td></tr>\n", "    <tr><th>21</th><td>'s</td><td>'s</td><td>'x</td></tr>\n", "    <tr><th>22</th><td>after</td><td>after</td><td>xxxx</td></tr>\n", "
23 | \n", "he | \n", "he | \n", "xx | \n", "
24 | \n", "'d | \n", "would | \n", "'x | \n", "
25 | \n", "seen | \n", "see | \n", "xxxx | \n", "
26 | \n", "Chris | \n", "chris | \n", "Xxxxx | \n", "
27 | \n", "Sebak | \n", "sebak | \n", "Xxxxx | \n", "
28 | \n", "do | \n", "do | \n", "xx | \n", "
29 | \n", "a | \n", "a | \n", "x | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "
855 | \n", "flavorful | \n", "flavorful | \n", "xxxx | \n", "
856 | \n", ", | \n", ", | \n", ", | \n", "
857 | \n", "the | \n", "the | \n", "xxx | \n", "
858 | \n", "batter | \n", "batter | \n", "xxxx | \n", "
859 | \n", "excellent | \n", "excellent | \n", "xxxx | \n", "
860 | \n", ", | \n", ", | \n", ", | \n", "
861 | \n", "and | \n", "and | \n", "xxx | \n", "
862 | \n", "for | \n", "for | \n", "xxx | \n", "
863 | \n", "just | \n", "just | \n", "xxxx | \n", "
864 | \n", "$ | \n", "$ | \n", "$ | \n", "
865 | \n", "8 | \n", "8 | \n", "d | \n", "
866 | \n", ". | \n", ". | \n", ". | \n", "
867 | \n", "This | \n", "this | \n", "Xxxx | \n", "
868 | \n", "may | \n", "may | \n", "xxx | \n", "
869 | \n", "have | \n", "have | \n", "xxxx | \n", "
870 | \n", "been | \n", "be | \n", "xxxx | \n", "
871 | \n", "the | \n", "the | \n", "xxx | \n", "
872 | \n", "best | \n", "best | \n", "xxxx | \n", "
873 | \n", "fish | \n", "fish | \n", "xxxx | \n", "
874 | \n", "sandwich | \n", "sandwich | \n", "xxxx | \n", "
875 | \n", "I | \n", "i | \n", "X | \n", "
876 | \n", "'ve | \n", "have | \n", "'xx | \n", "
877 | \n", "yet | \n", "yet | \n", "xxx | \n", "
878 | \n", "to | \n", "to | \n", "xx | \n", "
879 | \n", "have | \n", "have | \n", "xxxx | \n", "
880 | \n", "in | \n", "in | \n", "xx | \n", "
881 | \n", "da | \n", "da | \n", "xx | \n", "
882 | \n", "burgh | \n", "burgh | \n", "xxxx | \n", "
883 | \n", ". | \n", ". | \n", ". | \n", "
884 | \n", "\\n | \n", "\\n | \n", "\\n | \n", "
885 rows × 3 columns
\n", "\n", " | token_text | \n", "entity_type | \n", "inside_outside_begin | \n", "
---|---|---|---|
0 | \n", "After | \n", "\n", " | O | \n", "
1 | \n", "a | \n", "\n", " | O | \n", "
2 | \n", "morning | \n", "\n", " | O | \n", "
3 | \n", "of | \n", "\n", " | O | \n", "
4 | \n", "Thrift | \n", "ORG | \n", "B | \n", "
5 | \n", "Store | \n", "ORG | \n", "I | \n", "
6 | \n", "hunting | \n", "\n", " | O | \n", "
7 | \n", ", | \n", "\n", " | O | \n", "
8 | \n", "a | \n", "\n", " | O | \n", "
9 | \n", "friend | \n", "\n", " | O | \n", "
10 | \n", "and | \n", "\n", " | O | \n", "
11 | \n", "I | \n", "\n", " | O | \n", "
12 | \n", "were | \n", "\n", " | O | \n", "
13 | \n", "thinking | \n", "\n", " | O | \n", "
14 | \n", "of | \n", "\n", " | O | \n", "
15 | \n", "lunch | \n", "\n", " | O | \n", "
16 | \n", ", | \n", "\n", " | O | \n", "
17 | \n", "and | \n", "\n", " | O | \n", "
18 | \n", "he | \n", "\n", " | O | \n", "
19 | \n", "suggested | \n", "\n", " | O | \n", "
20 | \n", "Emil | \n", "PERSON | \n", "B | \n", "
21 | \n", "'s | \n", "\n", " | O | \n", "
22 | \n", "after | \n", "\n", " | O | \n", "
23 | \n", "he | \n", "\n", " | O | \n", "
24 | \n", "'d | \n", "\n", " | O | \n", "
25 | \n", "seen | \n", "\n", " | O | \n", "
26 | \n", "Chris | \n", "PERSON | \n", "B | \n", "
27 | \n", "Sebak | \n", "PERSON | \n", "I | \n", "
28 | \n", "do | \n", "\n", " | O | \n", "
29 | \n", "a | \n", "\n", " | O | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "
855 | \n", "flavorful | \n", "\n", " | O | \n", "
856 | \n", ", | \n", "\n", " | O | \n", "
857 | \n", "the | \n", "\n", " | O | \n", "
858 | \n", "batter | \n", "\n", " | O | \n", "
859 | \n", "excellent | \n", "\n", " | O | \n", "
860 | \n", ", | \n", "\n", " | O | \n", "
861 | \n", "and | \n", "\n", " | O | \n", "
862 | \n", "for | \n", "\n", " | O | \n", "
863 | \n", "just | \n", "\n", " | O | \n", "
864 | \n", "$ | \n", "\n", " | O | \n", "
865 | \n", "8 | \n", "MONEY | \n", "B | \n", "
866 | \n", ". | \n", "\n", " | O | \n", "
867 | \n", "This | \n", "\n", " | O | \n", "
868 | \n", "may | \n", "\n", " | O | \n", "
869 | \n", "have | \n", "\n", " | O | \n", "
870 | \n", "been | \n", "\n", " | O | \n", "
871 | \n", "the | \n", "\n", " | O | \n", "
872 | \n", "best | \n", "\n", " | O | \n", "
873 | \n", "fish | \n", "\n", " | O | \n", "
874 | \n", "sandwich | \n", "\n", " | O | \n", "
875 | \n", "I | \n", "\n", " | O | \n", "
876 | \n", "'ve | \n", "\n", " | O | \n", "
877 | \n", "yet | \n", "\n", " | O | \n", "
878 | \n", "to | \n", "\n", " | O | \n", "
879 | \n", "have | \n", "\n", " | O | \n", "
880 | \n", "in | \n", "\n", " | O | \n", "
881 | \n", "da | \n", "\n", " | O | \n", "
882 | \n", "burgh | \n", "\n", " | O | \n", "
883 | \n", ". | \n", "\n", " | O | \n", "
884 | \n", "\\n | \n", "\n", " | O | \n", "
885 rows × 3 columns
\n", "\n", " | text | \n", "log_probability | \n", "stop? | \n", "punctuation? | \n", "whitespace? | \n", "number? | \n", "out of vocab.? | \n", "
---|---|---|---|---|---|---|---|
0 | \n", "After | \n", "-9.091193 | \n", "Yes | \n", "\n", " | \n", " | \n", " | \n", " |
1 | \n", "a | \n", "-3.929788 | \n", "Yes | \n", "\n", " | \n", " | \n", " | \n", " |
2 | \n", "morning | \n", "-9.529314 | \n", "\n", " | \n", " | \n", " | \n", " | \n", " |
3 | \n", "of | \n", "-4.275874 | \n", "Yes | \n", "\n", " | \n", " | \n", " | \n", " |
4 | \n", "Thrift | \n", "-14.550483 | \n", "\n", " | \n", " | \n", " | \n", " | \n", " |
5 | \n", "Store | \n", "-11.719210 | \n", "\n", " | \n", " | \n", " | \n", " | \n", " |
6 | \n", "hunting | \n", "-10.961483 | \n", "\n", " | \n", " | \n", " | \n", " | \n", " |
7 | \n", ", | \n", "-3.454960 | \n", "\n", " | Yes | \n", "\n", " | \n", " | \n", " |
8 | \n", "a | \n", "-3.929788 | \n", "Yes | \n", "\n", " | \n", " | \n", " | \n", " |
9 | \n", "friend | \n", "-8.210516 | \n", "\n", " | \n", " | \n", " | \n", " | \n", " |
10 | \n", "and | \n", "-4.113108 | \n", "Yes | \n", "\n", " | \n", " | \n", " | \n", " |
11 | \n", "I | \n", "-3.791565 | \n", "Yes | \n", "\n", " | \n", " | \n", " | \n", " |
12 | \n", "were | \n", "-6.673175 | \n", "Yes | \n", "\n", " | \n", " | \n", " | \n", " |
13 | \n", "thinking | \n", "-8.442947 | \n", "\n", " | \n", " | \n", " | \n", " | \n", " |
14 | \n", "of | \n", "-4.275874 | \n", "Yes | \n", "\n", " | \n", " | \n", " | \n", " |
15 | \n", "lunch | \n", "-10.572958 | \n", "\n", " | \n", " | \n", " | \n", " | \n", " |
16 | \n", ", | \n", "-3.454960 | \n", "\n", " | Yes | \n", "\n", " | \n", " | \n", " |
17 | \n", "and | \n", "-4.113108 | \n", "Yes | \n", "\n", " | \n", " | \n", " | \n", " |
18 | \n", "he | \n", "-5.931905 | \n", "Yes | \n", "\n", " | \n", " | \n", " | \n", " |
19 | \n", "suggested | \n", "-10.656719 | \n", "\n", " | \n", " | \n", " | \n", " | \n", " |
20 | \n", "Emil | \n", "-15.862375 | \n", "\n", " | \n", " | \n", " | \n", " | \n", " |
21 | \n", "'s | \n", "-4.830559 | \n", "\n", " | \n", " | \n", " | \n", " | \n", " |
22 | \n", "after | \n", "-7.265652 | \n", "Yes | \n", "\n", " | \n", " | \n", " | \n", " |
23 | \n", "he | \n", "-5.931905 | \n", "Yes | \n", "\n", " | \n", " | \n", " | \n", " |
24 | \n", "'d | \n", "-7.075287 | \n", "\n", " | \n", " | \n", " | \n", " | \n", " |
25 | \n", "seen | \n", "-7.973224 | \n", "\n", " | \n", " | \n", " | \n", " | \n", " |
26 | \n", "Chris | \n", "-10.966099 | \n", "\n", " | \n", " | \n", " | \n", " | \n", " |
27 | \n", "Sebak | \n", "-19.502029 | \n", "\n", " | \n", " | \n", " | \n", " | Yes | \n", "
28 | \n", "do | \n", "-5.246997 | \n", "Yes | \n", "\n", " | \n", " | \n", " | \n", " |
29 | \n", "a | \n", "-3.929788 | \n", "Yes | \n", "\n", " | \n", " | \n", " | \n", " |
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
855 | \n", "flavorful | \n", "-14.094742 | \n", "\n", " | \n", " | \n", " | \n", " | \n", " |
856 | \n", ", | \n", "-3.454960 | \n", "\n", " | Yes | \n", "\n", " | \n", " | \n", " |
857 | \n", "the | \n", "-3.528767 | \n", "Yes | \n", "\n", " | \n", " | \n", " | \n", " |
858 | \n", "batter | \n", "-12.895466 | \n", "\n", " | \n", " | \n", " | \n", " | \n", " |
859 | \n", "excellent | \n", "-10.147964 | \n", "\n", " | \n", " | \n", " | \n", " | \n", " |
860 | \n", ", | \n", "-3.454960 | \n", "\n", " | Yes | \n", "\n", " | \n", " | \n", " |
861 | \n", "and | \n", "-4.113108 | \n", "Yes | \n", "\n", " | \n", " | \n", " | \n", " |
862 | \n", "for | \n", "-4.880109 | \n", "Yes | \n", "\n", " | \n", " | \n", " | \n", " |
863 | \n", "just | \n", "-5.630868 | \n", "Yes | \n", "\n", " | \n", " | \n", " | \n", " |
864 | \n", "$ | \n", "-7.450107 | \n", "\n", " | \n", " | \n", " | \n", " | \n", " |
865 | \n", "8 | \n", "-8.940966 | \n", "\n", " | \n", " | \n", " | Yes | \n", "\n", " |
866 | \n", ". | \n", "-3.067898 | \n", "\n", " | Yes | \n", "\n", " | \n", " | \n", " |
867 | \n", "This | \n", "-6.783917 | \n", "Yes | \n", "\n", " | \n", " | \n", " | \n", " |
868 | \n", "may | \n", "-7.678495 | \n", "Yes | \n", "\n", " | \n", " | \n", " | \n", " |
869 | \n", "have | \n", "-5.156485 | \n", "Yes | \n", "\n", " | \n", " | \n", " | \n", " |
870 | \n", "been | \n", "-6.670917 | \n", "Yes | \n", "\n", " | \n", " | \n", " | \n", " |
871 | \n", "the | \n", "-3.528767 | \n", "Yes | \n", "\n", " | \n", " | \n", " | \n", " |
872 | \n", "best | \n", "-7.492557 | \n", "\n", " | \n", " | \n", " | \n", " | \n", " |
873 | \n", "fish | \n", "-10.166230 | \n", "\n", " | \n", " | \n", " | \n", " | \n", " |
874 | \n", "sandwich | \n", "-11.186007 | \n", "\n", " | \n", " | \n", " | \n", " | \n", " |
875 | \n", "I | \n", "-3.791565 | \n", "Yes | \n", "\n", " | \n", " | \n", " | \n", " |
876 | \n", "'ve | \n", "-6.593011 | \n", "\n", " | \n", " | \n", " | \n", " | \n", " |
877 | \n", "yet | \n", "-8.229137 | \n", "Yes | \n", "\n", " | \n", " | \n", " | \n", " |
878 | \n", "to | \n", "-3.856022 | \n", "Yes | \n", "\n", " | \n", " | \n", " | \n", " |
879 | \n", "have | \n", "-5.156485 | \n", "Yes | \n", "\n", " | \n", " | \n", " | \n", " |
880 | \n", "in | \n", "-4.619072 | \n", "Yes | \n", "\n", " | \n", " | \n", " | \n", " |
881 | \n", "da | \n", "-10.829142 | \n", "\n", " | \n", " | \n", " | \n", " | \n", " |
882 | \n", "burgh | \n", "-16.942732 | \n", "\n", " | \n", " | \n", " | \n", " | \n", " |
883 | \n", ". | \n", "-3.067898 | \n", "\n", " | Yes | \n", "\n", " | \n", " | \n", " |
884 | \n", "\\n | \n", "-6.050651 | \n", "\n", " | \n", " | Yes | \n", "\n", " | \n", " |
885 rows × 7 columns
\n", "\n", " | 0 | \n", "1 | \n", "2 | \n", "3 | \n", "4 | \n", "5 | \n", "6 | \n", "7 | \n", "8 | \n", "9 | \n", "... | \n", "90 | \n", "91 | \n", "92 | \n", "93 | \n", "94 | \n", "95 | \n", "96 | \n", "97 | \n", "98 | \n", "99 | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
the | \n", "-0.035762 | \n", "-0.173890 | \n", "-0.035782 | \n", "-0.007144 | \n", "0.032371 | \n", "-0.065272 | \n", "-0.219383 | \n", "-0.064665 | \n", "0.002739 | \n", "0.025802 | \n", "... | \n", "0.050136 | \n", "0.044030 | \n", "0.145281 | \n", "-0.020442 | \n", "0.128879 | \n", "-0.076461 | \n", "0.075532 | \n", "-0.012841 | \n", "0.024710 | \n", "-0.067555 | \n", "
be | \n", "-0.074780 | \n", "-0.049524 | \n", "0.085974 | \n", "-0.098892 | \n", "0.141556 | \n", "0.024878 | \n", "-0.011119 | \n", "-0.175374 | \n", "0.005410 | \n", "-0.110996 | \n", "... | \n", "-0.199047 | \n", "-0.081284 | \n", "-0.198344 | \n", "0.007257 | \n", "0.075339 | \n", "0.070266 | \n", "-0.008326 | \n", "-0.127542 | \n", "-0.046246 | \n", "0.110279 | \n", "
and | \n", "-0.070505 | \n", "-0.026918 | \n", "0.028344 | \n", "-0.099909 | \n", "0.127974 | \n", "-0.058155 | \n", "-0.056091 | \n", "-0.028973 | \n", "0.197281 | \n", "-0.040528 | \n", "... | \n", "-0.049051 | \n", "-0.212434 | \n", "-0.042576 | \n", "0.055731 | \n", "0.117097 | \n", "-0.206737 | \n", "0.055435 | \n", "-0.065056 | \n", "0.052316 | \n", "-0.078666 | \n", "
i | \n", "-0.161238 | \n", "0.050831 | \n", "-0.081706 | \n", "-0.084479 | \n", "0.053073 | \n", "-0.102327 | \n", "-0.108607 | \n", "-0.001920 | \n", "-0.057367 | \n", "-0.050715 | \n", "... | \n", "0.028528 | \n", "-0.016578 | \n", "-0.179229 | \n", "0.053357 | \n", "0.070913 | \n", "0.036893 | \n", "-0.000544 | \n", "-0.007254 | \n", "-0.056005 | \n", "0.106345 | \n", "
a | \n", "-0.083491 | \n", "-0.033712 | \n", "-0.124125 | \n", "-0.110776 | \n", "-0.033046 | \n", "-0.089950 | \n", "0.025416 | \n", "-0.052321 | \n", "-0.059281 | \n", "0.074985 | \n", "... | \n", "-0.101939 | \n", "0.022392 | \n", "0.057049 | \n", "0.015819 | \n", "-0.001798 | \n", "0.001103 | \n", "0.003096 | \n", "0.037175 | \n", "-0.074279 | \n", "0.001683 | \n", "
to | \n", "-0.012082 | \n", "0.033135 | \n", "-0.063183 | \n", "-0.057252 | \n", "-0.018721 | \n", "-0.017931 | \n", "-0.027784 | \n", "0.112110 | \n", "0.020549 | \n", "-0.174336 | \n", "... | \n", "-0.017111 | \n", "-0.067532 | \n", "-0.022149 | \n", "0.154788 | \n", "-0.093789 | \n", "-0.020456 | \n", "0.065478 | \n", "0.075484 | \n", "-0.053530 | \n", "-0.005314 | \n", "
it | \n", "0.025022 | \n", "0.081581 | \n", "0.127987 | \n", "-0.188015 | \n", "0.041450 | \n", "-0.126222 | \n", "0.172725 | \n", "-0.149931 | \n", "-0.069566 | \n", "-0.036031 | \n", "... | \n", "0.045720 | \n", "0.094828 | \n", "0.089329 | \n", "0.051623 | \n", "-0.108989 | \n", "-0.145476 | \n", "0.068617 | \n", "0.090687 | \n", "-0.101725 | \n", "0.090377 | \n", "
have | \n", "-0.140812 | \n", "-0.070552 | \n", "0.022102 | \n", "0.001077 | \n", "0.109890 | \n", "-0.061365 | \n", "0.046450 | \n", "0.003073 | \n", "0.113845 | \n", "-0.038957 | \n", "... | \n", "-0.051071 | \n", "-0.090922 | \n", "-0.022011 | \n", "0.157082 | \n", "-0.082406 | \n", "-0.010306 | \n", "-0.063481 | \n", "-0.098728 | \n", "-0.064020 | \n", "0.153466 | \n", "
of | \n", "-0.036341 | \n", "-0.054903 | \n", "0.000644 | \n", "-0.010602 | \n", "0.168195 | \n", "-0.058505 | \n", "-0.052342 | \n", "0.039159 | \n", "-0.053572 | \n", "-0.160039 | \n", "... | \n", "0.085908 | \n", "-0.211464 | \n", "-0.084990 | \n", "0.082315 | \n", "0.223018 | \n", "-0.142501 | \n", "0.280647 | \n", "0.003435 | \n", "-0.037710 | \n", "-0.145140 | \n", "
not | \n", "-0.075276 | \n", "0.109047 | \n", "0.055135 | \n", "0.052251 | \n", "0.209437 | \n", "0.084334 | \n", "-0.122419 | \n", "-0.193307 | \n", "0.000699 | \n", "-0.099067 | \n", "... | \n", "-0.150619 | \n", "-0.060446 | \n", "0.181940 | \n", "-0.118538 | \n", "-0.002879 | \n", "0.018827 | \n", "0.084586 | \n", "0.040437 | \n", "0.070277 | \n", "-0.047521 | \n", "
for | \n", "-0.102976 | \n", "0.001369 | \n", "-0.069402 | \n", "-0.122936 | \n", "0.028278 | \n", "-0.074256 | \n", "-0.013786 | \n", "-0.147065 | \n", "0.204125 | \n", "-0.033473 | \n", "... | \n", "0.032123 | \n", "0.013365 | \n", "0.008156 | \n", "-0.021331 | \n", "0.025385 | \n", "0.105075 | \n", "0.184737 | \n", "0.087325 | \n", "-0.230621 | \n", "0.075051 | \n", "
in | \n", "-0.053390 | \n", "-0.175599 | \n", "-0.091688 | \n", "-0.153791 | \n", "0.003205 | \n", "0.013146 | \n", "-0.013261 | \n", "0.162506 | \n", "-0.036985 | \n", "-0.123813 | \n", "... | \n", "-0.101997 | \n", "-0.025117 | \n", "0.101147 | \n", "0.002555 | \n", "-0.075434 | \n", "-0.031021 | \n", "0.170358 | \n", "-0.070997 | \n", "-0.143472 | \n", "-0.039543 | \n", "
we | \n", "-0.015929 | \n", "-0.019187 | \n", "-0.186680 | \n", "-0.240963 | \n", "0.077926 | \n", "-0.122313 | \n", "-0.183584 | \n", "-0.038707 | \n", "0.067121 | \n", "-0.108626 | \n", "... | \n", "0.054799 | \n", "-0.029601 | \n", "-0.197221 | \n", "-0.081994 | \n", "0.114129 | \n", "0.127746 | \n", "0.057743 | \n", "-0.044793 | \n", "-0.080014 | \n", "-0.001816 | \n", "
that | \n", "-0.026609 | \n", "0.085940 | \n", "0.118164 | \n", "0.011576 | \n", "0.156952 | \n", "-0.061402 | \n", "-0.068207 | \n", "0.008184 | \n", "-0.169472 | \n", "0.051105 | \n", "... | \n", "-0.088909 | \n", "0.062827 | \n", "-0.114507 | \n", "0.007300 | \n", "-0.075059 | \n", "-0.202200 | \n", "0.003658 | \n", "0.042448 | \n", "-0.091925 | \n", "0.045213 | \n", "
but | \n", "-0.063436 | \n", "0.089140 | \n", "-0.057425 | \n", "-0.093110 | \n", "0.066531 | \n", "-0.079715 | \n", "-0.049745 | \n", "-0.161346 | \n", "0.097094 | \n", "0.035439 | \n", "... | \n", "-0.129163 | \n", "-0.022460 | \n", "-0.200731 | \n", "0.079950 | \n", "-0.002590 | \n", "-0.113734 | \n", "0.048470 | \n", "0.037333 | \n", "0.111525 | \n", "-0.001558 | \n", "
with | \n", "0.042318 | \n", "-0.186670 | \n", "-0.230563 | \n", "0.076302 | \n", "0.216593 | \n", "-0.056183 | \n", "0.004471 | \n", "-0.087819 | \n", "0.073513 | \n", "-0.219137 | \n", "... | \n", "0.036514 | \n", "0.135176 | \n", "-0.056771 | \n", "-0.020261 | \n", "0.213735 | \n", "-0.116074 | \n", "0.162992 | \n", "0.015298 | \n", "-0.152731 | \n", "0.070306 | \n", "
my | \n", "-0.002067 | \n", "-0.023159 | \n", "0.035879 | \n", "0.036316 | \n", "-0.110738 | \n", "-0.033034 | \n", "-0.100291 | \n", "-0.039403 | \n", "0.109342 | \n", "0.024952 | \n", "... | \n", "-0.051417 | \n", "0.220378 | \n", "-0.106171 | \n", "0.159718 | \n", "-0.036391 | \n", "-0.025573 | \n", "0.133651 | \n", "-0.157615 | \n", "0.010161 | \n", "-0.172925 | \n", "
this | \n", "0.139739 | \n", "0.184600 | \n", "0.137359 | \n", "-0.109916 | \n", "0.021484 | \n", "-0.018423 | \n", "-0.027546 | \n", "-0.055886 | \n", "-0.137625 | \n", "-0.058589 | \n", "... | \n", "-0.035842 | \n", "-0.075413 | \n", "-0.068598 | \n", "0.122231 | \n", "-0.097841 | \n", "0.114074 | \n", "0.111075 | \n", "0.174843 | \n", "-0.018743 | \n", "0.087721 | \n", "
you | \n", "-0.171262 | \n", "-0.119866 | \n", "0.063801 | \n", "-0.087287 | \n", "-0.061923 | \n", "0.023105 | \n", "-0.196524 | \n", "-0.043654 | \n", "-0.003327 | \n", "-0.078496 | \n", "... | \n", "0.094615 | \n", "0.029243 | \n", "0.020553 | \n", "-0.101657 | \n", "0.039655 | \n", "0.059782 | \n", "-0.073931 | \n", "-0.002060 | \n", "-0.068405 | \n", "-0.246893 | \n", "
on | \n", "0.057481 | \n", "0.044937 | \n", "-0.063766 | \n", "-0.007839 | \n", "0.161119 | \n", "-0.047322 | \n", "-0.024250 | \n", "-0.038904 | \n", "0.085989 | \n", "0.036280 | \n", "... | \n", "0.084352 | \n", "-0.119525 | \n", "0.076835 | \n", "-0.010369 | \n", "0.035561 | \n", "0.055588 | \n", "0.119598 | \n", "0.306402 | \n", "-0.095085 | \n", "0.053575 | \n", "
they | \n", "-0.235321 | \n", "-0.026314 | \n", "0.143165 | \n", "-0.170460 | \n", "0.042189 | \n", "-0.019444 | \n", "-0.171945 | \n", "-0.087666 | \n", "0.005467 | \n", "-0.034397 | \n", "... | \n", "-0.122975 | \n", "-0.054745 | \n", "0.022250 | \n", "-0.068428 | \n", "-0.009932 | \n", "-0.012489 | \n", "0.102740 | \n", "0.071282 | \n", "-0.165166 | \n", "0.126805 | \n", "
food | \n", "-0.164133 | \n", "0.007745 | \n", "0.058311 | \n", "-0.169839 | \n", "-0.042278 | \n", "0.004095 | \n", "0.203732 | \n", "-0.021252 | \n", "-0.084491 | \n", "-0.016372 | \n", "... | \n", "-0.096869 | \n", "0.060159 | \n", "-0.133541 | \n", "0.166804 | \n", "0.084901 | \n", "0.109261 | \n", "0.137871 | \n", "0.018093 | \n", "-0.158754 | \n", "-0.042917 | \n", "
do | \n", "0.091428 | \n", "-0.132115 | \n", "0.105080 | \n", "0.135949 | \n", "0.038100 | \n", "0.066993 | \n", "-0.046825 | \n", "-0.165575 | \n", "-0.087334 | \n", "0.068053 | \n", "... | \n", "-0.101949 | \n", "-0.037880 | \n", "-0.187836 | \n", "0.037602 | \n", "-0.094156 | \n", "-0.040069 | \n", "-0.013014 | \n", "-0.013038 | \n", "-0.033346 | \n", "-0.056112 | \n", "
good | \n", "-0.239592 | \n", "-0.232940 | \n", "-0.005036 | \n", "-0.028226 | \n", "0.149816 | \n", "-0.133312 | \n", "-0.034164 | \n", "-0.130310 | \n", "-0.013757 | \n", "0.008618 | \n", "... | \n", "-0.034984 | \n", "-0.135347 | \n", "-0.112965 | \n", "0.056312 | \n", "0.055106 | \n", "-0.026181 | \n", "-0.135510 | \n", "0.087664 | \n", "0.009934 | \n", "-0.111619 | \n", "
place | \n", "0.025479 | \n", "0.130311 | \n", "0.119834 | \n", "-0.096365 | \n", "0.013793 | \n", "0.074431 | \n", "-0.063780 | \n", "0.063191 | \n", "-0.004273 | \n", "0.111458 | \n", "... | \n", "-0.076251 | \n", "-0.076574 | \n", "-0.086146 | \n", "-0.023936 | \n", "0.136419 | \n", "-0.001543 | \n", "-0.084301 | \n", "0.016356 | \n", "-0.148379 | \n", "-0.016498 | \n", "
so | \n", "0.021455 | \n", "0.079794 | \n", "0.192058 | \n", "-0.093809 | \n", "-0.094279 | \n", "-0.147522 | \n", "-0.066564 | \n", "-0.073133 | \n", "0.009708 | \n", "0.050529 | \n", "... | \n", "-0.050627 | \n", "-0.008651 | \n", "-0.034267 | \n", "0.045445 | \n", "-0.104442 | \n", "-0.012076 | \n", "-0.118052 | \n", "-0.015163 | \n", "-0.006679 | \n", "-0.074553 | \n", "
get | \n", "0.009313 | \n", "-0.101684 | \n", "-0.163864 | \n", "-0.159002 | \n", "0.018936 | \n", "-0.056202 | \n", "-0.074619 | \n", "-0.127081 | \n", "0.182303 | \n", "-0.001993 | \n", "... | \n", "-0.040294 | \n", "-0.038149 | \n", "-0.180993 | \n", "-0.143341 | \n", "0.140279 | \n", "0.181399 | \n", "0.054530 | \n", "-0.152596 | \n", "0.028443 | \n", "-0.030319 | \n", "
go | \n", "0.031094 | \n", "-0.126839 | \n", "-0.054429 | \n", "-0.221885 | \n", "-0.063464 | \n", "0.024554 | \n", "0.060154 | \n", "-0.011108 | \n", "-0.020744 | \n", "0.038979 | \n", "... | \n", "-0.074458 | \n", "-0.172092 | \n", "-0.123518 | \n", "0.006400 | \n", "-0.085149 | \n", "0.157569 | \n", "-0.048633 | \n", "0.017931 | \n", "0.111066 | \n", "0.040107 | \n", "
at | \n", "0.102501 | \n", "-0.095756 | \n", "-0.216304 | \n", "-0.107230 | \n", "-0.112544 | \n", "-0.036979 | \n", "-0.066605 | \n", "-0.016080 | \n", "0.046475 | \n", "-0.128300 | \n", "... | \n", "-0.058233 | \n", "-0.046821 | \n", "0.042406 | \n", "0.178607 | \n", "0.181424 | \n", "-0.120113 | \n", "0.029031 | \n", "0.113648 | \n", "-0.107441 | \n", "-0.005374 | \n", "
as | \n", "0.141091 | \n", "0.073669 | \n", "0.109637 | \n", "-0.112564 | \n", "-0.167600 | \n", "-0.059139 | \n", "-0.122552 | \n", "-0.137383 | \n", "0.093218 | \n", "0.096284 | \n", "... | \n", "-0.052156 | \n", "-0.106116 | \n", "-0.088926 | \n", "-0.079129 | \n", "0.072921 | \n", "-0.009605 | \n", "-0.001447 | \n", "0.068642 | \n", "-0.022845 | \n", "0.197407 | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
hard_boiled_egg | \n", "0.025154 | \n", "0.060949 | \n", "-0.064816 | \n", "0.071975 | \n", "0.087870 | \n", "-0.034552 | \n", "0.046470 | \n", "-0.074013 | \n", "0.048614 | \n", "-0.027098 | \n", "... | \n", "0.161784 | \n", "-0.011800 | \n", "0.036288 | \n", "0.102444 | \n", "-0.036660 | \n", "-0.259496 | \n", "0.093633 | \n", "-0.055128 | \n", "0.099477 | \n", "0.069598 | \n", "
poached_quail_egg | \n", "0.060616 | \n", "0.001519 | \n", "-0.052908 | \n", "0.013590 | \n", "0.031732 | \n", "-0.020164 | \n", "0.067209 | \n", "0.015895 | \n", "0.034758 | \n", "-0.247553 | \n", "... | \n", "0.074396 | \n", "0.051672 | \n", "0.038299 | \n", "-0.001248 | \n", "0.074726 | \n", "-0.081252 | \n", "-0.078778 | \n", "0.008109 | \n", "0.023891 | \n", "0.075863 | \n", "
egg_foo | \n", "0.022816 | \n", "0.077288 | \n", "-0.122898 | \n", "-0.022765 | \n", "-0.137979 | \n", "-0.131516 | \n", "-0.025750 | \n", "0.071284 | \n", "-0.130913 | \n", "-0.041167 | \n", "... | \n", "0.044335 | \n", "0.100112 | \n", "0.124033 | \n", "-0.012426 | \n", "0.037161 | \n", "0.058809 | \n", "-0.033203 | \n", "-0.105034 | \n", "0.207285 | \n", "-0.094321 | \n", "
sherri | \n", "-0.089536 | \n", "-0.030370 | \n", "0.011311 | \n", "-0.094766 | \n", "-0.121970 | \n", "0.143681 | \n", "-0.035314 | \n", "-0.060650 | \n", "0.005531 | \n", "-0.034962 | \n", "... | \n", "-0.010114 | \n", "0.059706 | \n", "0.068094 | \n", "0.158161 | \n", "-0.076122 | \n", "0.115804 | \n", "-0.133826 | \n", "0.022022 | \n", "-0.115043 | \n", "0.043500 | \n", "
hindrance | \n", "-0.033325 | \n", "0.011520 | \n", "0.027370 | \n", "0.240223 | \n", "-0.004098 | \n", "-0.047842 | \n", "-0.008730 | \n", "-0.086408 | \n", "-0.155469 | \n", "0.036887 | \n", "... | \n", "0.028251 | \n", "0.136807 | \n", "0.079745 | \n", "-0.107322 | \n", "-0.092331 | \n", "0.151693 | \n", "0.126414 | \n", "-0.036642 | \n", "0.043212 | \n", "-0.016060 | \n", "
eggdrop_soup | \n", "-0.028369 | \n", "0.040039 | \n", "-0.005378 | \n", "-0.088736 | \n", "0.015203 | \n", "-0.150734 | \n", "-0.008286 | \n", "0.052836 | \n", "0.041498 | \n", "0.061759 | \n", "... | \n", "0.058192 | \n", "0.170658 | \n", "-0.069061 | \n", "-0.043249 | \n", "0.080981 | \n", "-0.074859 | \n", "0.026031 | \n", "-0.048331 | \n", "0.196082 | \n", "0.050968 | \n", "
arbitrarily | \n", "0.203714 | \n", "-0.047405 | \n", "-0.045261 | \n", "0.000302 | \n", "-0.074105 | \n", "-0.011420 | \n", "0.006737 | \n", "0.068212 | \n", "-0.010306 | \n", "0.162682 | \n", "... | \n", "-0.165282 | \n", "-0.078750 | \n", "-0.051527 | \n", "0.129190 | \n", "-0.088332 | \n", "0.000339 | \n", "0.021954 | \n", "0.224878 | \n", "-0.020637 | \n", "0.019025 | \n", "
faisant | \n", "0.046053 | \n", "0.083873 | \n", "0.057943 | \n", "0.174203 | \n", "-0.121259 | \n", "-0.043806 | \n", "-0.069513 | \n", "-0.037047 | \n", "-0.026478 | \n", "-0.119066 | \n", "... | \n", "-0.063223 | \n", "0.018042 | \n", "-0.107165 | \n", "0.028304 | \n", "-0.141706 | \n", "-0.084532 | \n", "0.097593 | \n", "-0.029115 | \n", "-0.016920 | \n", "-0.027668 | \n", "
marian | \n", "-0.035330 | \n", "0.146843 | \n", "-0.173594 | \n", "-0.010971 | \n", "-0.150150 | \n", "0.082224 | \n", "0.036275 | \n", "-0.028033 | \n", "-0.076082 | \n", "0.051976 | \n", "... | \n", "0.042506 | \n", "0.160951 | \n", "0.003764 | \n", "0.168254 | \n", "-0.057206 | \n", "-0.067292 | \n", "-0.254453 | \n", "0.049995 | \n", "-0.097059 | \n", "0.106862 | \n", "
9:30p | \n", "0.119010 | \n", "0.002159 | \n", "-0.000148 | \n", "-0.102635 | \n", "-0.016918 | \n", "-0.075822 | \n", "-0.016008 | \n", "0.012505 | \n", "-0.068003 | \n", "-0.066301 | \n", "... | \n", "0.027208 | \n", "-0.110685 | \n", "0.139527 | \n", "-0.031720 | \n", "-0.101919 | \n", "0.105827 | \n", "0.034134 | \n", "0.083859 | \n", "0.089299 | \n", "0.053762 | \n", "
extra_chasu | \n", "-0.225889 | \n", "-0.131615 | \n", "0.046431 | \n", "0.017999 | \n", "0.119188 | \n", "-0.075226 | \n", "-0.140749 | \n", "-0.054829 | \n", "0.210201 | \n", "-0.098395 | \n", "... | \n", "0.049070 | \n", "-0.028876 | \n", "-0.173717 | \n", "0.074353 | \n", "-0.078363 | \n", "-0.166292 | \n", "-0.007546 | \n", "0.116509 | \n", "0.073052 | \n", "0.090262 | \n", "
lavosh_wrap | \n", "0.134420 | \n", "-0.032055 | \n", "0.012240 | \n", "0.024420 | \n", "0.031334 | \n", "-0.002414 | \n", "0.022550 | \n", "-0.024545 | \n", "0.054083 | \n", "0.101219 | \n", "... | \n", "0.092470 | \n", "-0.109932 | \n", "0.019652 | \n", "-0.090741 | \n", "-0.008825 | \n", "0.052382 | \n", "0.012688 | \n", "-0.035351 | \n", "-0.093695 | \n", "-0.105152 | \n", "
dum_dum | \n", "-0.002741 | \n", "-0.137371 | \n", "0.030704 | \n", "-0.030365 | \n", "-0.134645 | \n", "-0.036521 | \n", "-0.019889 | \n", "-0.191169 | \n", "0.034061 | \n", "0.156001 | \n", "... | \n", "0.072851 | \n", "0.016341 | \n", "-0.009848 | \n", "0.038048 | \n", "-0.026917 | \n", "-0.035949 | \n", "-0.022561 | \n", "-0.000162 | \n", "0.026049 | \n", "0.074344 | \n", "
triplet | \n", "-0.010495 | \n", "0.057432 | \n", "-0.019535 | \n", "-0.044881 | \n", "0.042409 | \n", "-0.094355 | \n", "0.111214 | \n", "-0.141414 | \n", "-0.102281 | \n", "0.013674 | \n", "... | \n", "0.009309 | \n", "-0.071423 | \n", "0.029983 | \n", "-0.079734 | \n", "0.017278 | \n", "0.049596 | \n", "0.000595 | \n", "-0.111090 | \n", "0.125764 | \n", "-0.020409 | \n", "
nantucket | \n", "-0.124356 | \n", "0.141918 | \n", "-0.038579 | \n", "0.035650 | \n", "-0.157662 | \n", "0.048110 | \n", "-0.006915 | \n", "0.049056 | \n", "0.191926 | \n", "0.001897 | \n", "... | \n", "-0.076790 | \n", "-0.067047 | \n", "0.020261 | \n", "0.088759 | \n", "0.029744 | \n", "0.020393 | \n", "-0.033682 | \n", "0.150856 | \n", "0.276557 | \n", "-0.086213 | \n", "
*[DataFrame output truncated: the full word-vector table spans __50,835 rows × 100 columns__ — one row per vocabulary term (e.g. *gurl*, *nordstroms*, *homebrew*, *foccaccia*, *conveyor_belt_oven*) and one column per embedding dimension.]*
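A table like the one above can be assembled by stacking each vocabulary term's embedding into a pandas DataFrame, with the terms as the index and the embedding dimensions as the columns. Here's a minimal sketch of that arrangement — the five words and random stand-in vectors below are placeholders for a trained word2vec model's real vocabulary and vectors:

```python
import numpy as np
import pandas as pd

# Stand-in vocabulary and 100-dimensional vectors; in the real notebook
# these come from the trained word2vec model (50,835 terms).
vocab = ['food', 'good', 'place', 'order', 'great']
vectors = np.random.RandomState(0).randn(len(vocab), 100)

# One row per vocabulary term, one column per embedding dimension
word_vectors = pd.DataFrame(vectors, index=vocab)

print(word_vectors.shape)
```

Keeping the vectors in a DataFrame indexed by term makes later steps convenient: you can look up any word's vector with `word_vectors.loc['food']`, or hand the whole `.values` array to a dimensionality-reduction routine.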
*[DataFrame output: the first five rows of the word-vector table — *food*, *good*, *place*, *order*, *great* — each a 100-dimensional embedding (5 rows × 100 columns).]*
| | x_coord | y_coord |
|---|---|---|
| food | 2.313886 | 6.475995 |
| good | 8.763030 | 4.633407 |
| place | -8.942178 | 2.221976 |
| order | -2.876029 | -2.300830 |
| great | 9.515772 | 5.076319 |
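The two-column table above holds the t-SNE projection of the 100-dimensional word vectors down to a 2-D plane, ready for plotting. A minimal sketch of producing such a table with scikit-learn's `TSNE` — here run on a small random stand-in matrix rather than the full 50,835-term vector table:

```python
import numpy as np
import pandas as pd
from sklearn.manifold import TSNE

# Stand-in for the real word-vector DataFrame (50,835 rows x 100 columns);
# 50 random 100-D vectors keep this sketch fast to run.
rng = np.random.RandomState(0)
word_vectors = pd.DataFrame(rng.randn(50, 100),
                            index=['word_{}'.format(i) for i in range(50)])

# Project the 100-D vectors down to 2-D with t-SNE
# (perplexity must be smaller than the number of samples)
tsne = TSNE(n_components=2, perplexity=10, random_state=0)
tsne_vectors = tsne.fit_transform(word_vectors.values)

# Keep the term index, and name the two output dimensions for plotting
tsne_df = pd.DataFrame(tsne_vectors,
                       index=word_vectors.index,
                       columns=['x_coord', 'y_coord'])
```

On the full vocabulary t-SNE takes considerably longer; the payoff is that words used in similar contexts land near each other in the 2-D plot.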
\\n\"+\n", " \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n", " \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n", " \"
\\n\"+\n", " \"\\n\"+\n",
" \"from bokeh.resources import INLINE\\n\"+\n",
" \"output_notebook(resources=INLINE)\\n\"+\n",
" \"
\\n\"+\n",
" \"