{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Hello 👋 Welcome!\n", "\n", "### This notebook serves as a mini version of the four-part [Yelp data analysis](https://vaddina.github.io/2016/12/18/Yelp-Dataset-Analysis-I.html) blog post series and gives you a short presentation of the interesting insights we derived from it. Please head over to the blog for more details. The code is available here. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, we explore and analyse [Yelp data](https://www.yelp.com/dataset_challenge), with the goal of framing a few interesting questions and trying to answer them using the data.\n", "\n", "[Yelp](https://www.yelp.com/) is a website which `publishes crowd-sourced public reviews`. In other words, it offers a service where people can review many types of businesses, such as `restaurants, bars, automotive services etc.`. \n", "\n", "The dataset is a rich source of information on users & businesses and lets us look into many interesting topics, such as evolving food trends, the best food by region, and good, inexpensive beauty parlours nearby. The users' data can also be used as a social graph to find interesting connections amongst users based on their interests, friend circles and so on. Many [research articles](https://scholar.google.com/scholar?q=citation%3A+Yelp+Dataset) have made use of this evolving data to derive interesting insights. Though many have concentrated on the `Reviews, Businesses & Users` data, only a few papers have made use of `Tips`. Here, among other things, we shall explore how the tips data relates to the other datasets and why it matters.\n", "\n", "We ask the following questions:\n", "\n", "1. How did generosity change over time? How does it compare with the growth in reviews?\n", " * How does it vary by region / sex?\n", "2. 
Is there any relationship between the reviews and tips left by any given user?\n", " * Is it different when viewed from a business's perspective?\n", "3. How did gender diversity change over time?\n", " * How is it related to the contribution of reviews & tips?\n", "4. Predict the rating given by a user just from his/her review.\n", " * In other words, perform a fine-grained sentiment classification." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`NOTE: The internals of the code are not explained in much detail here, as this notebook is meant to give a concise view of what we explored and how we did it. You can head over to the blog post for a detailed explanation of why we do what we do. Or you can select the 'open' option under the 'file' menu above and inspect the code yourself.`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before we proceed further, let's load the data analysis and visualization libraries." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from matplotlib import pyplot as plt\n", "import seaborn as sns\n", "\n", "from bokeh.plotting import figure, show, output_notebook #, output_file\n", "from bokeh.charts import Area, defaults\n", "from bokeh.layouts import row, gridplot, column\n", "from bokeh.models import Range1d" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Output the plots to the notebook for `inline` visualization." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "\n", "(function(global) {\n", " function now() {\n", " return new Date();\n", " }\n", "\n", " var force = \"1\";\n", "\n", " if (typeof (window._bokeh_onload_callbacks) === \"undefined\" || force !== \"\") {\n", " window._bokeh_onload_callbacks = [];\n", " window._bokeh_is_loading = undefined;\n", " }\n", "\n", "\n", " \n", " if (typeof (window._bokeh_timeout) === \"undefined\" || force !== \"\") {\n", " window._bokeh_timeout = Date.now() + 5000;\n", " window._bokeh_failed_load = false;\n", " }\n", "\n", " var NB_LOAD_WARNING = {'data': {'text/html':\n", " \"\\n\"+\n", " \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n", " \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n", " \"
\\n\"+\n", " \"\\n\"+\n",
" \"from bokeh.resources import INLINE\\n\"+\n",
" \"output_notebook(resources=INLINE)\\n\"+\n",
" \"
\\n\"+\n",
" \"