{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "I'm currently reading [Dataclysm](http://dataclysm.org/), a book by one of the OkCupid founders, Christian Rudder. He's the one behind the [OkTrends blog](http://blog.okcupid.com/), which gives you a taste of what sort of data analysis the book is about. About halfway through the book, Rudder analyzes essays written by the users about themselves. To find meaning in the data across the different categories (white, black, asian, hispanic), he makes use of [quantile-quantile plots](https://en.wikipedia.org/wiki/Q%E2%80%93Q_plot). This struck me as an excellent application of interactive visualization using Bokeh and the Kaggle What's Cooking challenge data, which [I have previously investigated](http://flothesof.github.io/kaggle-whats-cooking-machine-learning.html). " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Loading the data and counting it " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will start by loading the data, as usual:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df = pd.read_json('train.json')" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
 | cuisine | id | ingredients |
---|---|---|---|
0 | greek | 10259 | [romaine lettuce, black olives, grape tomatoes... |
1 | southern_us | 25693 | [plain flour, ground pepper, salt, tomatoes, g... |
2 | filipino | 20130 | [eggs, pepper, salt, mayonaise, cooking oil, g... |
3 | indian | 22213 | [water, vegetable oil, wheat, salt] |
4 | indian | 13162 | [black pepper, shallots, cornflour, cayenne pe... |