{ "metadata": { "name": "", "signature": "sha256:97f4c336a9201382aaed9d3e8e089873dddb5a566b498a503aced99bf837946a" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Plotly and Socrata" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Awesome datasets and graphs coming together" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Taken from both companies' Wikipedia pages:\n", "\n", "> Plotly is an online analytics and data visualization tool. Plotly provides online graphing, analytics, a Python command line, and stats tools for individuals and collaboration, as well as scientific graphing libraries for Python, R, MATLAB, Perl, Julia, Arduino, and REST.\n", "\n", "> Socrata is a company that provides social data discovery services for opening government data. Socrata targets non-technical Internet users who want to view and share government, healthcare, energy, education, or environment data. Its products are issued under a proprietary, closed, exclusive license.\n", "\n", "Simply put, the two are meant to work together, and this IPython notebook will show you how to turn a dataset like this one into a plot like that one." ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "1. Get a Socrata application token" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You need an application token to communicate with Socrata through the Socrata Open Data API (soda for short).\n", "\n", "Register with Socrata and get your application token here." ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "2. Install the Soda Ruby wrapper" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Unfortunately, there is no Soda Python wrapper available at this time. Fortunately, IPython allows us to use multiple programming languages inside the same environment (called an IPython notebook). 
So, here we will use Ruby and the `soda-ruby` gem to communicate with Socrata.\n", "\n", "With Ruby and gem installed on your machine, run in a terminal/command prompt:\n", "\n", "* `$ gem install soda-ruby`\n", "\n", "Add `sudo` in front of the above for a system-wide install on Unix-like machines. Information about local gem installs can be found here.\n", "\n", "Then, add the line:\n", "\n", "    gem 'soda-ruby', :require => 'soda'\n", "\n", "to a file named `Gemfile` placed either in the current directory or in a folder that is part of the gem path on your machine (more here)." ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "3. Get a dataset from Socrata with Ruby and transfer it to IPython" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Head to opendata.socrata.com, browse or search for a dataset that you like and click on its link. I chose a list of the Guardian's \"Top 1,000 Songs to Hear Before You Die\", which can be viewed here. Here is a screenshot of the web page in question:\n", "\n", "\n", "\n", "Then:\n", "\n", "1. Click on `Export`, a blue button on the upper right side of the page.\n", "\n", "2. Click on `Soda API`, the upper-most tab under `Export`.\n", "\n", "3. Copy the `API Access Endpoint`, under the `Soda API` tab.\n", "\n", "In our case the API Access Endpoint is:\n", "\n", "    http://opendata.socrata.com/resource/ed74-c6ni.json\n", "\n", "The API Access Endpoint represents the link between the dataset hosted on Socrata and the API, in our case soda-ruby. It contains two important pieces of information: the domain name and the dataset identifier. From the official Socrata docs, take note that the API Access Endpoint corresponds to:\n", "\n", "    http://$domain/resource/$dataset_identifier\n", "\n", "So, in our case the domain name is `opendata.socrata.com` and the dataset identifier is `ed74-c6ni`. 
Note that `.json` is just the file extension and is not needed to access the dataset.\n", "\n", "Now, call the `%%ruby` IPython cell magic to turn on Ruby inside the cell below:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%%ruby --out socrata_data\n", "\n", "# With --out, data written to stdout in this Ruby cell\n", "# is assigned to a Python variable (socrata_data) after execution.\n", "\n", "require 'soda/client'\n", "require 'json'\n", "\n", "# Set up client object with domain and application token\n", "client = SODA::Client.new({:domain => \"opendata.socrata.com\",\n", "                          :app_token => \"eqZC5q2iEmFXdIu2qEbtZkWgP\"})\n", "\n", "# Get data with dataset identifier\n", "response = client.get(\"ed74-c6ni\")\n", "\n", "# Print dataset to stdout as JSON\n", "puts response.to_json" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "And there you go, the Socrata dataset is now inside our IPython namespace!\n", "\n", "Next, we will handle the dataset inside IPython using the popular `pandas` module:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pandas as pd\n", "\n", "# Read the retrieved JSON dataset (df stands for dataframe)\n", "df = pd.read_json(socrata_data)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "code", "collapsed": false, "input": [ "df.head() # print the first 5 rows of the dataframe" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n",
"| | artist | spotify_url | theme | title | year |\n",
"|---|---|---|---|---|---|\n",
"| 0 | ABC | {u'url': u'http://open.spotify.com/track/78j3q... | Love | The Look of Love | 1982 |\n",
"| 1 | Badly Drawn Boy | {u'url': u'http://open.spotify.com/track/2PojS... | Love | The Shining | 2000 |\n",
"| 2 | The Beach Boys | {u'url': u'http://open.spotify.com/track/0ObrX... | Love | God Only Knows | 1966 |\n",
"| 3 | The Beach Boys | {u'url': u'http://open.spotify.com/track/2oF7F... | Love | Good Vibrations | 1966 |\n",
"| 4 | The Beach Boys | {u'url': u'http://open.spotify.com/track/0cx32... | Love | Wouldn\u2019t It Be Nice | 1966 |\n",
"\n",
"5 rows \u00d7 5 columns\n", "