{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# [matta](https://github.com/carnby/matta) - view and scaffold d3.js visualizations in IPython notebooks\n", "\n", "## basic examples\n", "\n", "By [@carnby](https://twitter.com/carnby).\n", "\n", "This notebook showcases the basic matta visualizations and how to use them.\n", "\n", "Note that the `init_javascript` call is not needed when running on a local server, provided the JavaScript code has been added to your IPython profile." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/home/egraells/Dropbox/phd/apps/matta/matta/visualizations/cartography/template.css\n", "/home/egraells/Dropbox/phd/apps/matta/matta/visualizations/parsets/template.css\n" ] }, { "data": { "text/html": [ "\n", "matta Javascript code added." ], "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "import networkx as nx\n", "import matta\n", "import json\n", "import requests\n", "\n", "from networkx.readwrite import json_graph\n", "\n", "# we do this to load the required libraries when viewing on NBViewer\n", "matta.init_javascript(path='https://rawgit.com/carnby/matta/master/matta/libs')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Wordclouds\n", "\n", "Wordclouds are implemented using the [d3.layout.cloud layout by Jason Davies](http://www.jasondavies.com/wordcloud/). They work with bags of words, for which Python's `Counter` class (from `collections`) is well suited." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "u\"\\ufeff***The Project Gutenberg's Etext of Shakespeare's First Folio***\\r\\n*********************The Tragedie\"" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hamlet = requests.get('http://www.gutenberg.org/cache/epub/2265/pg2265.txt').text\n", "hamlet[0:100]" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import re\n", "from collections import Counter\n", "\n", "words = re.split(r'[\\W]+', hamlet.lower())\n", "counts = Counter(words)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<thead><tr><th></th><th>word</th><th>frequency</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>995</th><td>the</td><td>1108</td></tr>\n",
"<tr><th>1877</th><td>and</td><td>920</td></tr>\n",
"<tr><th>2656</th><td>to</td><td>762</td></tr>\n",
"<tr><th>2437</th><td>of</td><td>698</td></tr>\n",
"<tr><th>4951</th><td>you</td><td>593</td></tr>\n",
"</tbody>\n",
"</table>
" ], "text/plain": [ " word frequency\n", "995 the 1108\n", "1877 and 920\n", "2656 to 762\n", "2437 of 698\n", "4951 you 593" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame.from_records(counts.iteritems(), columns=['word', 'frequency'])\n", "df.sort_values(['frequency'], ascending=False, inplace=True)\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "\n", "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "matta.wordcloud(dataframe=df.head(500), text='word', font_size='frequency', \n", " typeface='Helvetica', font_weight='bold',\n", " font_color={'value': 'frequency', 'palette': 'cubehelix', 'scale': 'threshold'})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Treemaps\n", "\n", "Treemaps use the [Treemap Layout from d3.js](https://github.com/mbostock/d3/wiki/Treemap-Layout). They work with trees, which we construct through [`networkx.DiGraph`](http://networkx.github.io/documentation/networkx-1.9.1/reference/classes.digraph.html)." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/egraells/.virtualenvs/ipython/local/lib/python2.7/site-packages/requests/packages/urllib3/util/ssl_.py:100: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. 
For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.\n", " InsecurePlatformWarning\n" ] } ], "source": [ "flare_data = requests.get('https://gist.githubusercontent.com/mbostock/4063582/raw/a05a94858375bd0ae023f6950a2b13fac5127637/flare.json').json()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "u'flare'" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "flare_data['name']" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [], "source": [ "\n", "\n", "tree = nx.DiGraph()\n", "\n", "def add_node(node):\n", "    # recursively copy a node of the flare hierarchy, and its children, into the tree\n", "    node_id = tree.number_of_nodes() + 1\n", "    tree.add_node(node_id, name=node['name'])\n", "\n", "    if 'size' in node:\n", "        tree.node[node_id]['size'] = node['size']\n", "\n", "    if 'children' in node:\n", "        for child in node['children']:\n", "            child_id = add_node(child)\n", "            tree.add_edge(node_id, child_id)\n", "\n", "    return node_id\n", "\n", "root = add_node(flare_data)\n", "# treemap requires this attribute\n", "tree.graph['root'] = root" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nx.is_arborescence(tree)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import seaborn as sns" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "\n", "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "matta.treemap(tree=tree, node_value='size', node_label='name',\n", " node_color={'value': 'parent.name', 'scale': 'ordinal', 'palette': sns.husl_palette(15, l=.4, s=.9)})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sankey\n", "\n", "Sankey or flow diagrams use the [Sankey plugin by Mike Bostock](http://bost.ocks.org/mike/sankey/). They work with digraphs, just like treemaps. Note that graphs with loops are not supported." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sankey_data = requests.get('http://bost.ocks.org/mike/sankey/energy.json')" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [], "source": [ "\n", "\n", "sankey_graph = json_graph.node_link_graph(json.loads(sankey_data.text), directed=True)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "((0, {u'name': u\"Agricultural 'waste'\"}), (0, 1, {u'value': 124.729}))" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sankey_graph.nodes_iter(data=True).next(), sankey_graph.edges_iter(data=True).next()" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "\n", "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "matta.flow(graph=sankey_graph, node_label='name', link_weight='value', node_color='indigo', \n", " node_width=12, node_padding=13,\n", " link_color={'value': 'value', 'palette': 'Greys', 'scale': 'threshold'}, link_opacity=0.8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Parallel Coordinates\n", "\n", "Parallel Coordinates are based on the [code by Jason Davies](http://bl.ocks.org/jasondavies/1341281). They work with [`pandas.DataFrame`](http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe)." ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<thead><tr><th>name</th><th>economy (mpg)</th><th>cylinders</th><th>displacement (cc)</th><th>power (hp)</th><th>weight (lb)</th><th>0-60 mph (s)</th><th>year</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>AMC Ambassador Brougham</th><td>13.0</td><td>8</td><td>360</td><td>175</td><td>3821</td><td>11.0</td><td>73</td></tr>\n",
"<tr><th>AMC Ambassador DPL</th><td>15.0</td><td>8</td><td>390</td><td>190</td><td>3850</td><td>8.5</td><td>70</td></tr>\n",
"<tr><th>AMC Ambassador SST</th><td>17.0</td><td>8</td><td>304</td><td>150</td><td>3672</td><td>11.5</td><td>72</td></tr>\n",
"<tr><th>AMC Concord DL 6</th><td>20.2</td><td>6</td><td>232</td><td>90</td><td>3265</td><td>18.2</td><td>79</td></tr>\n",
"<tr><th>AMC Concord DL</th><td>18.1</td><td>6</td><td>258</td><td>120</td><td>3410</td><td>15.1</td><td>78</td></tr>\n",
"</tbody>\n",
"</table>
" ], "text/plain": [ " economy (mpg) cylinders displacement (cc) \\\n", "name \n", "AMC Ambassador Brougham 13.0 8 360 \n", "AMC Ambassador DPL 15.0 8 390 \n", "AMC Ambassador SST 17.0 8 304 \n", "AMC Concord DL 6 20.2 6 232 \n", "AMC Concord DL 18.1 6 258 \n", "\n", " power (hp) weight (lb) 0-60 mph (s) year \n", "name \n", "AMC Ambassador Brougham 175 3821 11.0 73 \n", "AMC Ambassador DPL 190 3850 8.5 70 \n", "AMC Ambassador SST 150 3672 11.5 72 \n", "AMC Concord DL 6 90 3265 18.2 79 \n", "AMC Concord DL 120 3410 15.1 78 " ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv('http://bl.ocks.org/jasondavies/raw/1341281/cars.csv', index_col='name')\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "\n", "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "matta.parcoords(dataframe=df)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "### Parallel Sets" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<thead><tr><th></th><th>Class</th><th>Age</th><th>Sex</th><th>Survived</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>Second Class</td><td>Child</td><td>Female</td><td>Survived</td></tr>\n",
"<tr><th>1</th><td>Second Class</td><td>Child</td><td>Female</td><td>Survived</td></tr>\n",
"<tr><th>2</th><td>Second Class</td><td>Child</td><td>Female</td><td>Survived</td></tr>\n",
"<tr><th>3</th><td>Second Class</td><td>Child</td><td>Female</td><td>Survived</td></tr>\n",
"<tr><th>4</th><td>Second Class</td><td>Child</td><td>Female</td><td>Survived</td></tr>\n",
"</tbody>\n",
"</table>
" ], "text/plain": [ " Class Age Sex Survived\n", "0 Second Class Child Female Survived\n", "1 Second Class Child Female Survived\n", "2 Second Class Child Female Survived\n", "3 Second Class Child Female Survived\n", "4 Second Class Child Female Survived" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv('https://www.jasondavies.com/parallel-sets/titanic.csv')\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "
\n", "\n", "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "matta.parsets(dataframe=df, columns=['Survived', 'Sex', 'Age', 'Class'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Graph\n", "\n", "Graphs from [`networkx.DiGraph`](http://networkx.github.io/documentation/networkx-1.9.1/reference/classes.digraph.html) are visualized using the [Force Layout in d3.js](https://github.com/mbostock/d3/wiki/Force-Layout)." ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [], "source": [ "graph = nx.davis_southern_women_graph()" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# color each node by its side of the bipartite graph and size it by its degree\n", "for node_id, attrs in graph.nodes_iter(data=True):\n", "    graph.node[node_id]['color'] = 'purple' if attrs['bipartite'] else 'green'\n", "    graph.node[node_id]['size'] = graph.degree(node_id)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "\n", "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "matta.force(graph=graph, link_distance=100, height=600,\n", " node_ratio='size',\n", " node_color={'value': 'bipartite', 'scale': 'ordinal', 'palette': 'Set2'})" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }