{ "metadata": { "name": "", "signature": "sha256:d8f3947fc2f92555c36d456253c5e4ce375a98088897d2d35fbdab03c175c5ae" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "code", "collapsed": false, "input": [ "import pandas as pd\n", "import jinja2\n", "\n", "from collections import OrderedDict\n", "from json import dumps\n", "from IPython.html.widgets import interact\n", "from IPython.html import widgets\n", "from IPython.display import display, display_pretty, Javascript, HTML\n", "from IPython.utils.traitlets import Any, Bool, Dict, List, Unicode\n", "from threading import Lock\n", "from urllib2 import urlopen" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Embedding Interactive Charts on an IPython Notebook" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this three-part post we\u2019ll show you how easy it is to integrate D3.js, Chart.js and HighCharts charts into a notebook and how to make them interactive using HTML widgets." ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Requirements" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The requirements to run the examples are IPython Notebook version 2.0 or greater and Pandas. All the other modules that we reference are either in the standard Python distribution or are dependencies of IPython." ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "About Pandas" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Although Pandas is not strictly necessary to accomplish what we do in the examples, it is such a popular data analysis tool that we wanted to use it anyway. 
We recommend that you read the 10 Minutes to Pandas tutorial to get an idea of what it can do, or buy *Python for Data Analysis* for an in-depth guide to data analysis using Python, Pandas and NumPy." ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "About the Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All the data that we use in the examples are taken from the [United States Census Bureau](http://www.census.gov) site. We're going to use the 2012 population estimates, and we're going to plot the sex and age groups by state, region and division." ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Population by State" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We're going to build a Pandas DataFrame from the dataset of [Incorporated Places and Minor Civil Divisions](http://www.census.gov/popest/data/cities/totals/2012/SUB-EST2012.html). We could have just grabbed the [estimates for the states](http://www.census.gov/popest/data/state/totals/2012/index.html), but we also wanted to show you how easy it is to work with data using Pandas. First, we fetch the data using [urlopen](https://docs.python.org/2/library/urllib2.html#urllib2.urlopen) and we parse the response as CSV using Pandas' [read_csv](http://pandas.pydata.org/pandas-docs/version/0.13.1/generated/pandas.io.parsers.read_csv.html) function:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "sub_est_2012_df = pd.read_csv(\n", " urlopen('http://www.census.gov/popest/data/cities/totals/2012/files/SUB-EST2012.csv'),\n", " encoding='latin-1',\n", " dtype={'STATE': 'str', 'COUNTY': 'str', 'PLACE': 'str'}\n", ")" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The resulting data frame has a lot of information that we don\u2019t need and can be discarded. 
According to the [file layout description](http://www.census.gov/popest/data/cities/totals/2012/files/SUB-EST2012.pdf), the data is summarized at the nation, state, county and place levels, as indicated by the SUMLEV column. Since we\u2019re only interested in the population for each state, we could just filter the rows with SUMLEV \u201840\u2019, but we wanted to show you how to use the aggregation feature of Pandas\u2019 DataFrames. Instead, we\u2019ll take the data summarized at the county level (SUMLEV \u201850\u2019), group by state, and sum the population estimates." ] }, { "cell_type": "code", "collapsed": false, "input": [ "sub_est_2012_df_by_county = sub_est_2012_df[sub_est_2012_df.SUMLEV == 50]\n", "sub_est_2012_df_by_state = sub_est_2012_df_by_county.groupby(['STATE']).sum()\n", "\n", "# Alternatively we could have just taken the summary rows for the states\n", "\n", "# sub_est_2012_df_by_state = sub_est_2012_df[sub_est_2012_df.SUMLEV == 40]" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you look at the table, you'll see that the states are referenced by their ANSI codes. We can augment the table to include the state names and abbreviations by merging with [another resource](http://www.census.gov/geo/reference/ansi_statetables.html) from the Geography section of the US Census Bureau site. We use Pandas' `read_csv` function again, making sure to pass the pipe character (|) as the separator." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "# Taken from http://www.census.gov/geo/reference/ansi_statetables.html\n", "\n", "state = pd.read_csv(urlopen('http://www.census.gov/geo/reference/docs/state.txt'), sep='|', dtype={'STATE': 'str'})" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "state.drop(\n", " ['STATENS'],\n", " inplace=True, axis=1\n", ")" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "sub_est_2012_df_by_state = pd.merge(sub_est_2012_df_by_state, state, left_index=True, right_on='STATE')\n", "sub_est_2012_df_by_state.drop(\n", " ['SUMLEV', 'COUSUB', 'CONCIT', 'ESTIMATESBASE2010', 'POPESTIMATE2010', 'POPESTIMATE2011'],\n", " inplace=True, axis=1\n", ")" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We're also interested in plotting the information about the age and sex of the people, and for that we can use the [Annual Estimates of the Civilian Population by Single Year of Age and Sex](http://www.census.gov/popest/data/state/asrh/2012/SC-EST2012-AGESEX-CIV.html)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Taken from http://www.census.gov/popest/data/state/asrh/2012/SC-EST2012-AGESEX-CIV.html\n", "\n", "sc_est2012_agesex_civ_df = pd.read_csv(\n", " urlopen('http://www.census.gov/popest/data/state/asrh/2012/files/SC-EST2012-AGESEX-CIV.csv'),\n", " encoding='latin-1',\n", " dtype={'SUMLEV': 'str'}\n", ")" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once again, the table is summarized at many levels, but we're only interested in the information at the state level, so we filter out the unnecessary rows. We also do a little bit of processing to the STATE column so it can be used to merge with the state DataFrame." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "sc_est2012_agesex_civ_df_sumlev040 = sc_est2012_agesex_civ_df[\n", " (sc_est2012_agesex_civ_df.SUMLEV == '040') &\n", " (sc_est2012_agesex_civ_df.SEX != 0) &\n", " (sc_est2012_agesex_civ_df.AGE != 999)\n", "].copy()  # copy so the in-place edits below don't act on a slice of the original\n", "sc_est2012_agesex_civ_df_sumlev040.drop(\n", " ['SUMLEV', 'NAME', 'ESTBASE2010_CIV', 'POPEST2010_CIV', 'POPEST2011_CIV'],\n", " inplace=True, axis=1\n", ")\n", "sc_est2012_agesex_civ_df_sumlev040['STATE'] = sc_est2012_agesex_civ_df_sumlev040['STATE'].apply(lambda x: '%02d' % (x,))" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we group the rows by state, region, division and sex, and sum across all ages. Afterwards, we augment the result with the names and abbreviations of the states. " ] }, { "cell_type": "code", "collapsed": false, "input": [ "sc_est2012_sex = sc_est2012_agesex_civ_df_sumlev040.groupby(['STATE', 'REGION', 'DIVISION', 'SEX'], as_index=False)[['POPEST2012_CIV']].sum()\n", "sc_est2012_sex = pd.merge(sc_est2012_sex, state, left_on='STATE', right_on='STATE')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the age information, we group by state, region, division and age, and we sum across both sexes. If you look at the result, you'll notice that there's a row for each single year of age. This is pretty useful for analysis, but it can be problematic to plot, so we're going to group the rows into age buckets of 20 years. Once again, we add the state information at the end." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "sc_est2012_age = sc_est2012_agesex_civ_df_sumlev040.groupby(['STATE', 'REGION', 'DIVISION', 'AGE'], as_index=False)[['POPEST2012_CIV']].sum()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "# right=False gives the buckets [0, 20), [20, 40), ..., [80, 100), so age 0 and ages above 80 are kept\n", "age_buckets = pd.cut(sc_est2012_age.AGE, range(0, 101, 20), right=False)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "sc_est2012_age = sc_est2012_age.groupby(['STATE', 'REGION', 'DIVISION', age_buckets], as_index=False)['POPEST2012_CIV'].sum()\n", "sc_est2012_age = pd.merge(sc_est2012_age, state, left_on='STATE', right_on='STATE')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We also need information about regions and divisions, but since the dataset is small, we'll build the dictionaries by hand." ] }, { "cell_type": "code", "collapsed": false, "input": [ "region_codes = {\n", " 0: 'United States Total',\n", " 1: 'Northeast',\n", " 2: 'Midwest',\n", " 3: 'South',\n", " 4: 'West'\n", "}\n", "division_codes = {\n", " 0: 'United States Total',\n", " 1: 'New England',\n", " 2: 'Middle Atlantic',\n", " 3: 'East North Central',\n", " 4: 'West North Central',\n", " 5: 'South Atlantic',\n", " 6: 'East South Central',\n", " 7: 'West South Central',\n", " 8: 'Mountain',\n", " 9: 'Pacific'\n", "}" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Part 1 - Embedding D3.js" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[D3.js](http://d3js.org/) is an incredibly flexible JavaScript chart library. Although it is primarily used to plot data, it can be used to draw arbitrary graphics and animations.\n", "\n", "Let's build a column chart of the five most populated states in the USA. 
IPython Notebooks are regular web pages, so in order to use a JavaScript library in one, we first need to load it. IPython Notebook uses [RequireJS](http://requirejs.org/) to load its own requirements, so we can make use of it with the `%%javascript` cell magic to load external dependencies.\n", "\n", "In all the examples of this notebook we'll load the libraries from [cdnjs.com](http://cdnjs.com/), so to declare the D3.js requirement we do:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%%javascript\n", "require.config({\n", " paths: {\n", " d3: '//cdnjs.cloudflare.com/ajax/libs/d3/3.4.8/d3.min'\n", " }\n", "});" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we'll make use of the `display` function and the `HTML` class from the IPython Notebook [API](http://ipython.org/ipython-doc/2/api/generated/IPython.core.display.html#module-IPython.core.display) to render HTML content within the notebook itself. We declare styles to change the look and feel of the plots, and we define a new `div` with id `\"chart_d3\"` that the library is going to use as the target of the plot." ] }, { "cell_type": "code", "collapsed": false, "input": [ "display(HTML(\"\"\"\n", "\n", "
\n", "\"\"\"))" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we define a template with the JavaScript code that is going to render the chart. Notice that we iterate over the \"data\" parameter to populate the \"data\" variable in JavaScript. Afterwards, we use the `display` method once again to force the execution of the JavaScript code, which renders the chart on the target div." ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "sub_est_2012_df_by_state_template" ] }, { "cell_type": "code", "collapsed": false, "input": [ "sub_est_2012_df_by_state_template = jinja2.Template(\n", "\"\"\"\n", "// Based on http://bl.ocks.org/mbostock/3885304\n", "\n", "require([\"d3\"], function(d3) {\n", " var data = []\n", "\n", " {% for row in data %}\n", " data.push({ 'state': '{{ row[4] }}', 'population': {{ row[1] }} });\n", " {% endfor %}\n", "\n", " d3.select(\"#chart_d3 svg\").remove()\n", "\n", " var margin = {top: 20, right: 20, bottom: 30, left: 40},\n", " width = 800 - margin.left - margin.right,\n", " height = 400 - margin.top - margin.bottom;\n", "\n", " var x = d3.scale.ordinal()\n", " .rangeRoundBands([0, width], .25);\n", "\n", " var y = d3.scale.linear()\n", " .range([height, 0]);\n", "\n", " var xAxis = d3.svg.axis()\n", " .scale(x)\n", " .orient(\"bottom\");\n", "\n", " var yAxis = d3.svg.axis()\n", " .scale(y)\n", " .orient(\"left\")\n", " .ticks(10)\n", " .tickFormat(d3.format('.1s'));\n", " \n", " var svg = d3.select(\"#chart_d3\").append(\"svg\")\n", " .attr(\"width\", width + margin.left + margin.right)\n", " .attr(\"height\", height + margin.top + margin.bottom)\n", " .append(\"g\")\n", " .attr(\"transform\", \"translate(\" + margin.left + \",\" + margin.top + \")\");\n", "\n", " x.domain(data.map(function(d) { return d.state; }));\n", " y.domain([0, d3.max(data, function(d) { return d.population; })]);\n", "\n", " svg.append(\"g\")\n", " .attr(\"class\", \"x 
axis\")\n", " .attr(\"transform\", \"translate(0,\" + height + \")\")\n", " .call(xAxis);\n", "\n", " svg.append(\"g\")\n", " .attr(\"class\", \"y axis\")\n", " .call(yAxis)\n", " .append(\"text\")\n", " .attr(\"transform\", \"rotate(-90)\")\n", " .attr(\"y\", 6)\n", " .attr(\"dy\", \".71em\")\n", " .style(\"text-anchor\", \"end\")\n", " .text(\"Population\");\n", "\n", " svg.selectAll(\".bar\")\n", " .data(data)\n", " .enter().append(\"rect\")\n", " .attr(\"class\", \"bar\")\n", " .attr(\"x\", function(d) { return x(d.state); })\n", " .attr(\"width\", x.rangeBand())\n", " .attr(\"y\", function(d) { return y(d.population); })\n", " .attr(\"height\", function(d) { return height - y(d.population); });\n", "});\n", "\"\"\"\n", ")\n", "display(Javascript(sub_est_2012_df_by_state_template.render(\n", " data=sub_est_2012_df_by_state.sort(['POPESTIMATE2012'], ascending=False)[:5].itertuples()))\n", ")" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The chart shows that California, Texas, New York, Florida and Illinois are the most populated states. What about the other states? Let's build an interactive chart that allows us to show whichever states we choose. IPython Notebook provides widgets that allow us to get information from the user in an intuitive manner. Sadly, at the time of this writing, there's no widget to select multiple items from a list, so before we move on, let's define such a widget." 
] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "MultipleSelectWidget" ] }, { "cell_type": "code", "collapsed": false, "input": [ "class MultipleSelectWidget(widgets.DOMWidget):\n", " _view_name = Unicode('MultipleSelectView', sync=True)\n", " \n", " value = List(sync=True)\n", " values = Dict(sync=True)\n", " values_order = List(sync=True)\n", " description = Unicode(sync=True)\n", "\n", " def __init__(self, *args, **kwargs):\n", " self.value_lock = Lock()\n", "\n", " # values is a Dict trait, so its default must be a dict rather than a list\n", " self.values = kwargs.get('values', {})\n", " self.value = kwargs.get('value', [])\n", " self.values_order = kwargs.get('values_order', [])\n", " \n", " widgets.DOMWidget.__init__(self, *args, **kwargs)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "%%javascript\n", "require([\"widgets/js/widget\"], function(WidgetManager){\n", " var MultipleSelectView = IPython.DOMWidgetView.extend({\n", " initialize: function(parameters) {\n", " this.model.on('change',this.update,this);\n", " this.options = parameters.options;\n", " this.child_views = [];\n", " // I had to override DOMWidgetView's initialize to set model.views otherwise\n", " // multiple views would get attached to the model\n", " this.model.views = [this];\n", " },\n", " \n", " render : function(){\n", " this.$el\n", " .addClass('widget-hbox');\n", " this.$label = $('
')\n", " .appendTo(this.$el)\n", " .addClass('widget-hlabel')\n", " .hide();\n", " this.$listbox = $('