{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Machine Learning to Predict Yelp Ratings from Attributes\n", "\n", "The goal of this project is to explore how well Yelp's business metadata can predict a venue's star rating. Yelp collects a lot of data on each business (see the [Yelp Developer Docs](https://www.yelp.com/developers/documentation/v2/business)). I will focus on:\n", "\n", "* city - the city in which the business resides\n", "* longitude & latitude - the geographic coordinates of the business\n", "* categories - a list of categories the business is associated with\n", "* attributes - a set of features such as Take-out, Waiter Service, and Alcohol\n", "\n", "How predictive are these characteristics of a venue for its star rating?" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/usr/local/lib/python2.7/dist-packages/matplotlib/__init__.py:872: UserWarning: axes.color_cycle is deprecated and replaced with axes.prop_cycle; please use the latter.\n", " warnings.warn(self.msg_depr % (key, alt_key))\n" ] } ], "source": [ "import pandas as pd\n", "import gzip\n", "import simplejson\n", "import re\n", "from sklearn.cross_validation import train_test_split\n", "import matplotlib\n", "import seaborn as sns\n", "from sklearn import metrics\n", "import matplotlib.pylab as plt\n", "import sklearn\n", "import numpy as np\n", "from sklearn import base\n", "from sklearn.externals import joblib\n", "from sklearn import neighbors, cross_validation, grid_search\n", "from sklearn.feature_extraction import DictVectorizer\n", "from sklearn.feature_extraction.text import TfidfTransformer\n", "from sklearn import linear_model\n", "\n", "plt.style.use('ggplot')\n", "%matplotlib inline\n", "pd.options.display.max_columns=25" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], 
"source": [ "# Read the gzipped business file (one JSON record per line) into memory\n", "with gzip.open('yelp_train_academic_dataset_business.json.gz') as data:\n", "    data_content = data.read()\n", "lines = data_content.split('\\n')" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<table>\n", "<thead>\n",
"<tr><th></th><th>attributes</th><th>business_id</th><th>categories</th><th>city</th><th>full_address</th><th>hours</th><th>latitude</th><th>longitude</th><th>name</th><th>neighborhoods</th><th>open</th><th>review_count</th><th>stars</th><th>state</th><th>type</th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>{u'By Appointment Only': True}</td><td>vcNAWiLM4dR7D2nwwJ7nCA</td><td>[Doctors, Health &amp; Medical]</td><td>Phoenix</td><td>4840 E Indian School Rd\\nSte 101\\nPhoenix, AZ ...</td><td>{u'Thursday': {u'close': u'17:00', u'open': u'...</td><td>33.499313</td><td>-111.983758</td><td>Eric Goldberg, MD</td><td>[]</td><td>True</td><td>7</td><td>3.5</td><td>AZ</td><td>business</td></tr>\n",
"<tr><th>1</th><td>{u'Take-out': True, u'Price Range': 1, u'Outdo...</td><td>JwUE5GmEO-sH1FuwJgKBlQ</td><td>[Restaurants]</td><td>De Forest</td><td>6162 US Highway 51\\nDe Forest, WI 53532</td><td>{}</td><td>43.238893</td><td>-89.335844</td><td>Pine Cone Restaurant</td><td>[]</td><td>True</td><td>26</td><td>4.0</td><td>WI</td><td>business</td></tr>\n",
"<tr><th>2</th><td>{u'Take-out': True, u'Outdoor Seating': False,...</td><td>uGykseHzyS5xAMWoN6YUqA</td><td>[American (Traditional), Restaurants]</td><td>De Forest</td><td>505 W North St\\nDe Forest, WI 53532</td><td>{u'Monday': {u'close': u'22:00', u'open': u'06...</td><td>43.252267</td><td>-89.353437</td><td>Deforest Family Restaurant</td><td>[]</td><td>True</td><td>16</td><td>4.0</td><td>WI</td><td>business</td></tr>\n",
"<tr><th>3</th><td>{u'Take-out': True, u'Accepts Credit Cards': T...</td><td>LRKJF43s9-3jG9Lgx4zODg</td><td>[Food, Ice Cream &amp; Frozen Yogurt, Fast Food, R...</td><td>De Forest</td><td>4910 County Rd V\\nDe Forest, WI 53532</td><td>{u'Monday': {u'close': u'22:00', u'open': u'10...</td><td>43.251045</td><td>-89.374983</td><td>Culver's</td><td>[]</td><td>True</td><td>7</td><td>4.5</td><td>WI</td><td>business</td></tr>\n",
"<tr><th>4</th><td>{u'Take-out': True, u'Has TV': False, u'Outdoo...</td><td>RgDg-k9S5YD_BaxMckifkg</td><td>[Chinese, Restaurants]</td><td>De Forest</td><td>631 S Main St\\nDe Forest, WI 53532</td><td>{u'Monday': {u'close': u'22:00', u'open': u'11...</td><td>43.240875</td><td>-89.343722</td><td>Chang Jiang Chinese Kitchen</td><td>[]</td><td>True</td><td>3</td><td>4.0</td><td>WI</td><td>business</td></tr>\n",
"</tbody>\n", "</table>\n", "