{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# [matta](https://github.com/carnby/matta) - view and scaffold d3.js visualizations in IPython notebooks\n", "\n", "## Let's Make a Map Too (and a Cartogram!)\n", "\n", "Inspired by [Mike Bostock's _Let's Make a Map_](http://bost.ocks.org/mike/map/), we want to make a map too, using matta.\n", "We will display the [communes of Santiago, Chile](https://en.wikipedia.org/wiki/Santiago#Political_divisions). To do that, we will perform the following steps:\n", "\n", "1. Download the administrative borders of Chile as a shapefile from the [Chilean Library of Congress](http://siit2.bcn.cl/mapas_vectoriales/index_html/).\n", "2. Use ogr2ogr to filter and clip the borders of the city of Santiago, and convert the result to GeoJSON.\n", "3. Convert the GeoJSON file to TopoJSON.\n", "4. Display the TopoJSON file using matta in the IPython notebook.\n", "5. Download Human Development Index (HDI) data from Wikipedia and use matta to make a choropleth map and a cartogram." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from __future__ import print_function, unicode_literals" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/home/egraells/Dropbox/phd/apps/matta/matta/visualizations/cartography/template.css\n", "/home/egraells/Dropbox/phd/apps/matta/matta/visualizations/parsets/template.css\n" ] }, { "data": { "text/html": [ "\n", "matta Javascript code added."
], "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import matta\n", "import json\n", "import unicodedata\n", "\n", "# load the required JavaScript libraries from a public URL, so that the maps also render on NBViewer\n", "matta.init_javascript(path='https://rawgit.com/carnby/matta/master/matta/libs')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download Shapefiles\n", "\n", "Note: we delete the `data` directory first, so that every run of the notebook starts from scratch." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--2015-12-20 22:18:18-- http://siit2.bcn.cl/obtienearchivo?id=repositorio/10221/10396/1/division_comunal.zip\n", "Resolving siit2.bcn.cl (siit2.bcn.cl)... 200.0.66.71\n", "Connecting to siit2.bcn.cl (siit2.bcn.cl)[200.0.66.71]:80... connected.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 29000232 (28M) [application/zip]\n", "Saving to: “data/division_comunal.zip”\n", "\n", "100%[======================================>] 29,000,232 1.25MB/s in 22s \n", "\n", "2015-12-20 22:18:47 (1.23 MB/s) - “data/division_comunal.zip” saved [29000232/29000232]\n", "\n" ] } ], "source": [ "!rm -fr data\n", "!mkdir data\n", "!wget 'http://siit2.bcn.cl/obtienearchivo?id=repositorio/10221/10396/1/division_comunal.zip' -O data/division_comunal.zip" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Archive: data/division_comunal.zip\n", " inflating: data/Disclaimer.txt \n", " inflating: data/division_comunal.dbf \n", " inflating: data/division_comunal.prj \n", " inflating: data/division_comunal.sbn \n", " inflating: data/division_comunal.sbx \n", " inflating: data/division_comunal.shp \n", " inflating: data/division_comunal.shp.xml \n", " inflating: data/division_comunal.shx \n" ] } ], "source": [ "!unzip data/division_comunal.zip -d data/"
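] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The two shell cells above rely on `wget` and `unzip` being installed. A minimal, standard-library-only sketch of the same steps follows. It assumes the download URL is still online, and the helper names `extract_archive` and `download_shapefile` are hypothetical (they are not part of matta).

```python
import os
import shutil
import zipfile

try:  # Python 3
    from urllib.request import urlretrieve
except ImportError:  # Python 2, as in this notebook's kernel
    from urllib import urlretrieve

# Same URL as the wget cell above (it may no longer be online).
SHAPEFILE_URL = ('http://siit2.bcn.cl/obtienearchivo?'
                 'id=repositorio/10221/10396/1/division_comunal.zip')


def extract_archive(archive_path, dest_dir):
    '''Extract all members of a zip archive into dest_dir (like unzip -d).

    Returns the sorted list of member names.
    '''
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest_dir)
        return sorted(zf.namelist())


def download_shapefile(url=SHAPEFILE_URL, dest_dir='data'):
    '''Start from an empty dest_dir, download the zip into it and extract it.'''
    shutil.rmtree(dest_dir, ignore_errors=True)  # like rm -fr data
    os.makedirs(dest_dir)                        # like mkdir data
    archive_path = os.path.join(dest_dir, 'division_comunal.zip')
    urlretrieve(url, archive_path)               # like wget ... -O data/division_comunal.zip
    return extract_archive(archive_path, dest_dir)
```

Calling `download_shapefile()` should leave the same files in `data/` as the shell cells; `extract_archive` can also be reused on a zip file downloaded by hand."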
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Convert to GeoJSON\n", "\n", "You can use `ogrinfo` to inspect the structure of the source shapefile: its projection and its attribute fields." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "INFO: Open of `data/division_comunal.shp'\r\n", " using driver `ESRI Shapefile' successful.\r\n", "\r\n", "Layer name: division_comunal\r\n", "Geometry: Polygon\r\n", "Feature Count: 346\r\n", "Extent: (-3701712.293900, 3794823.357600) - (704690.560200, 8065196.816300)\r\n", "Layer SRS WKT:\r\n", "PROJCS[\"WGS_1984_UTM_Zone_19S\",\r\n", " GEOGCS[\"GCS_WGS_1984\",\r\n", " DATUM[\"WGS_1984\",\r\n", " SPHEROID[\"WGS_84\",6378137.0,298.257223563]],\r\n", " PRIMEM[\"Greenwich\",0.0],\r\n", " UNIT[\"Degree\",0.0174532925199433]],\r\n", " PROJECTION[\"Transverse_Mercator\"],\r\n", " PARAMETER[\"False_Easting\",500000.0],\r\n", " PARAMETER[\"False_Northing\",10000000.0],\r\n", " PARAMETER[\"Central_Meridian\",-69.0],\r\n", " PARAMETER[\"Scale_Factor\",0.9996],\r\n", " PARAMETER[\"Latitude_Of_Origin\",0.0],\r\n", " UNIT[\"Meter\",1.0]]\r\n", "NOM_REG: String (50.0)\r\n", "NOM_PROV: String (20.0)\r\n", "NOM_COM: String (30.0)\r\n", "SHAPE_LENG: Real (19.11)\r\n", "DIS_ELEC: Integer (4.0)\r\n", "CIR_SENA: Integer (4.0)\r\n", "COD_COMUNA: Integer (4.0)\r\n", "SHAPE_Le_1: Real (19.11)\r\n", "SHAPE_Area: Real (19.11)\r\n" ] } ], "source": [ "!ogrinfo data/division_comunal.shp 'division_comunal' | head -n 30" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we use `ogr2ogr` to convert the shapefile into GeoJSON. We will only build a file for Santiago, the capital of Chile.\n", "\n", "Notes:\n", "\n", " * After some manual inspection, we know that `NOM_PROV` contains the name of the province, the administrative division above the commune. The city of Santiago spans several provinces, so we use a `-where` clause to keep only the communes that belong to them.\n", " * By default, `ogr2ogr` does not overwrite an existing GeoJSON file, so `santiago-comunas-geo.json` must not exist before running this command. That is why we deleted the whole `data` directory at the beginning of the notebook (which matters when we re-run it :) ).\n", " * We use the `-clipdst` option to clip the result to a bounding box (minimum longitude, minimum latitude, maximum longitude, maximum latitude) obtained from [this site](http://boundingbox.klokantech.com/). We need it because some communes extend far beyond the urban area: their administrative borders are much larger than their populated part.\n", " * We also use the `-t_srs EPSG:4326` option to reproject the data from UTM zone 19S (the projection reported by `ogrinfo` above) to WGS84 (longitude, latitude) pairs." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# keep the communes of the five listed provinces, clip them to the city bounding box and reproject to (longitude, latitude)\n", "!ogr2ogr -where \"NOM_PROV IN ('Santiago', 'Maipo', 'Cordillera', 'Chacabuco', 'Talagante')\" -f GeoJSON \\\n", " -clipdst -70.828155 -33.635036 -70.452573 -33.302953 -t_srs EPSG:4326 \\\n", " data/santiago-comunas-geo.json data/division_comunal.shp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### From GeoJSON to TopoJSON\n", "\n", "We use the `topojson` command-line tool (version 1.x): `-p` keeps all feature properties, `--id-property NOM_COM` uses the commune name as the feature `id`, and `-s 0` sets the simplification threshold to zero. Since TopoJSON stores each shared border only once and quantizes the coordinates, the result is much smaller than the GeoJSON file, as the file listing below shows." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "bounds: -70.828155 -33.635036 -70.452573 -33.302953 (spherical)\r\n", "pre-quantization: 0.0418m (3.76e-7°) 0.0369m (3.32e-7°)\r\n", "topology: 274 arcs, 5195 points\r\n", "post-quantization: 4.18m (0.0000376°) 3.69m (0.0000332°)\r\n", "prune: retained 274 / 274 arcs (100%)\r\n" ] } ], "source": [ "!topojson -p --id-property NOM_COM -s 0 -o data/santiago-comunas-topo.json data/santiago-comunas-geo.json" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-rw-rw-r-- 1 egraells egraells 449K Dec 20 22:19 data/santiago-comunas-geo.json\r\n", "-rw-rw-r-- 1 egraells egraells 55K Dec 20 22:19 data/santiago-comunas-topo.json\r\n" ] } ], "source": [ "!ls -lh data/*.json" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# remove diacritics (e.g. Conchalí -> Conchali) so that commune names match across data sources\n", "def strip_accents(s):\n", " # NFD splits each accented letter into a base letter plus a combining mark (category Mn), which we drop\n", " return
''.join(c for c in unicodedata.normalize('NFD', s) if unicodedata.category(c) != 'Mn')\n", "\n", "with open('data/santiago-comunas-topo.json', 'r') as f:\n", " stgo = json.load(f)\n", " \n", "for g in stgo['objects']['santiago-comunas-geo']['geometries']:\n", " g['id'] = strip_accents(g['id'].title())\n", " g['properties']['id'] = g['id']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is an example feature:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "{u'arcs': [[0, 1, 2, 3, 4, 5, 6, 7]],\n", " u'id': u'Independencia',\n", " u'properties': {u'CIR_SENA': 7,\n", " u'COD_COMUNA': 1310,\n", " u'DIS_ELEC': 19,\n", " u'NOM_COM': u'Independencia',\n", " u'NOM_PROV': u'Santiago',\n", " u'NOM_REG': u'Regi\\xf3n Metropolitana de Santiago',\n", " u'SHAPE_Area': 7514745.51089,\n", " u'SHAPE_LENG': 11488.6957475,\n", " u'SHAPE_Le_1': 11718.6870863,\n", " u'id': u'Independencia'},\n", " u'type': u'Polygon'}" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stgo['objects']['santiago-comunas-geo']['geometries'][0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Display TopoJSON using matta" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "\n", "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# color each commune by its id, using an ordinal scale with 40 colors from the husl palette\n", "matta.cartography(geometry=stgo, leaflet=False, label='NOM_COM',\n", " fill_color={'value': 'id', 'palette': str('husl'), 'scale': 'ordinal', 'n_colors': 40})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Choropleth Map and Cartogram\n", "\n", "Now that we can load a map and color its areas, let's visualize some data. An easy source is the Spanish Wikipedia page that lists the communes of Santiago, which includes a table of indicators for each commune. From it, we will get the *Human Development Index* (HDI; IDH in Spanish) score of each municipality." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n",
"<tr style=\"text-align: right;\"><th></th><th>Comuna</th><th>Ubicación?</th><th>Población (2002)?</th><th>Viviendas (2002)?</th><th>Densidad poblacional?</th><th>Crecimiento demográfico (1992-2002)?</th><th>IDH (2003)?</th><th>Pobreza (2006)?</th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>Cerrillos</td><td>surponiente</td><td>71.906</td><td>19.811</td><td>4.32908</td><td>-10</td><td>0,743 (54)</td><td>83</td></tr>\n",
"<tr><th>1</th><td>Cerro Navia</td><td>norponiente</td><td>148.312</td><td>35.277</td><td>13.48291</td><td>-48</td><td>0,683 (165)</td><td>175</td></tr>\n",
"<tr><th>2</th><td>Conchalí</td><td>norte</td><td>133.256</td><td>32.609</td><td>12.07029</td><td>-129</td><td>0,707 (118)</td><td>80</td></tr>\n",
"<tr><th>3</th><td>El Bosque</td><td>sur</td><td>175.594</td><td>42.808</td><td>12.27072</td><td>16</td><td>0,711 (106)</td><td>158</td></tr>\n",
"<tr><th>4</th><td>Estación Central</td><td>surponiente</td><td>130.394</td><td>32.357</td><td>9.03631</td><td>-75</td><td>0,735 (60)</td><td>73</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " Comuna Ubicación? Población (2002)? Viviendas (2002)? \\\n", "0 Cerrillos surponiente 71.906 19.811 \n", "1 Cerro Navia norponiente 148.312 35.277 \n", "2 Conchalí norte 133.256 32.609 \n", "3 El Bosque sur 175.594 42.808 \n", "4 Estación Central surponiente 130.394 32.357 \n", "\n", " Densidad poblacional? Crecimiento demográfico (1992-2002)? IDH (2003)? \\\n", "0 4.32908 -10 0,743 (54) \n", "1 13.48291 -48 0,683 (165) \n", "2 12.07029 -129 0,707 (118) \n", "3 12.27072 16 0,711 (106) \n", "4 9.03631 -75 0,735 (60) \n", "\n", " Pobreza (2006)? \n", "0 83 \n", "1 175 \n", "2 80 \n", "3 158 \n", "4 73 " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "# parse the tables with class 'sortable' on the Spanish Wikipedia list of communes;\n", "# header=0 takes the column names from the first row, and [0] keeps the first matching table\n", "df = pd.read_html('https://es.wikipedia.org/wiki/Anexo:Comunas_de_Santiago_de_Chile', \n", " attrs={'class': 'sortable'}, header=0)[0]\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data is not clean, because the table uses Spanish number formatting: a dot separates thousands and a comma separates decimals. Thus pandas parsed populations such as `71.906` (71,906 inhabitants) as small floats, and left the HDI column as text such as `0,743 (54)`, with an extra number in parentheses. Fortunately, we only need the HDI and population columns, which are easy to convert to meaningful numbers. Let's fix them, and also remove the accents (and the stray `?` characters) from the commune names so that they match the feature `id`s of our TopoJSON file." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n",
"<tr style=\"text-align: right;\"><th></th><th>Comuna</th><th>Ubicación?</th><th>Población (2002)?</th><th>Viviendas (2002)?</th><th>Densidad poblacional?</th><th>Crecimiento demográfico (1992-2002)?</th><th>Pobreza (2006)?</th><th>HDI</th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>Cerrillos</td><td>surponiente</td><td>71.906</td><td>19.811</td><td>4.32908</td><td>-10</td><td>83</td><td>0.743</td></tr>\n",
"<tr><th>1</th><td>Cerro Navia</td><td>norponiente</td><td>148.312</td><td>35.277</td><td>13.48291</td><td>-48</td><td>175</td><td>0.683</td></tr>\n",
"<tr><th>2</th><td>Conchali</td><td>norte</td><td>133.256</td><td>32.609</td><td>12.07029</td><td>-129</td><td>80</td><td>0.707</td></tr>\n",
"<tr><th>3</th><td>El Bosque</td><td>sur</td><td>175.594</td><td>42.808</td><td>12.27072</td><td>16</td><td>158</td><td>0.711</td></tr>\n",
"<tr><th>4</th><td>Estacion Central</td><td>surponiente</td><td>130.394</td><td>32.357</td><td>9.03631</td><td>-75</td><td>73</td><td>0.735</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " Comuna Ubicación? Población (2002)? Viviendas (2002)? \\\n", "0 Cerrillos surponiente 71.906 19.811 \n", "1 Cerro Navia norponiente 148.312 35.277 \n", "2 Conchali norte 133.256 32.609 \n", "3 El Bosque sur 175.594 42.808 \n", "4 Estacion Central surponiente 130.394 32.357 \n", "\n", " Densidad poblacional? Crecimiento demográfico (1992-2002)? \\\n", "0 4.32908 -10 \n", "1 13.48291 -48 \n", "2 12.07029 -129 \n", "3 12.27072 16 \n", "4 9.03631 -75 \n", "\n", " Pobreza (2006)? HDI \n", "0 83 0.743 \n", "1 175 0.683 \n", "2 80 0.707 \n", "3 158 0.711 \n", "4 73 0.735 " ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# normalize the names as we did with the TopoJSON ids: no accents, no stray '?', title case\n", "df['Comuna'] = [strip_accents(c).replace('?', '').title() for c in df['Comuna']]\n", "# '0,743 (54)' -> 0.743: keep the first token and use a dot as the decimal separator\n", "df['HDI'] = [float(c.split()[0].replace(',', '.')) for c in df['IDH (2003)?']]\n", "del df['IDH (2003)?']\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n",
"<tr style=\"text-align: right;\"><th></th><th>Comuna</th><th>HDI</th><th>Population</th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>Cerrillos</td><td>0.743</td><td>71906</td></tr>\n",
"<tr><th>1</th><td>Cerro Navia</td><td>0.683</td><td>148312</td></tr>\n",
"<tr><th>2</th><td>Conchali</td><td>0.707</td><td>133256</td></tr>\n",
"<tr><th>3</th><td>El Bosque</td><td>0.711</td><td>175594</td></tr>\n",
"<tr><th>4</th><td>Estacion Central</td><td>0.735</td><td>130394</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " Comuna HDI Population\n", "0 Cerrillos 0.743 71906\n", "1 Cerro Navia 0.683 148312\n", "2 Conchali 0.707 133256\n", "3 El Bosque 0.711 175594\n", "4 Estacion Central 0.735 130394" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# e.g. 71.906 -> 71906 inhabitants; round to undo floating-point error before casting to int\n", "df['Population'] = (df['Población (2002)?'] * 1000).round().astype(int)\n", "df = df.loc[:, ['Comuna', 'HDI', 'Population']]\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Much better!" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n",
"<tr style=\"text-align: right;\"><th></th><th>HDI</th><th>Population</th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>count</th><td>37.000000</td><td>37.000000</td></tr>\n",
"<tr><th>mean</th><td>0.762865</td><td>146718.648649</td></tr>\n",
"<tr><th>std</th><td>0.076155</td><td>106077.982722</td></tr>\n",
"<tr><th>min</th><td>0.657000</td><td>2477.000000</td></tr>\n",
"<tr><th>25%</th><td>0.709000</td><td>85118.000000</td></tr>\n",
"<tr><th>50%</th><td>0.737000</td><td>120874.000000</td></tr>\n",
"<tr><th>75%</th><td>0.804000</td><td>175594.000000</td></tr>\n",
"<tr><th>max</th><td>0.949000</td><td>492603.000000</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " HDI Population\n", "count 37.000000 37.000000\n", "mean 0.762865 146718.648649\n", "std 0.076155 106077.982722\n", "min 0.657000 2477.000000\n", "25% 0.709000 85118.000000\n", "50% 0.737000 120874.000000\n", "75% 0.804000 175594.000000\n", "max 0.949000 492603.000000" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's visualize the HDI of each municipality as a choropleth map." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "\n", "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# color each commune by its HDI, using a threshold scale with 7 colors from the coolwarm palette\n", "matta.cartography(geometry=stgo, label='NOM_COM', width=700,\n", " area_dataframe=df, area_feature_name='Comuna', \n", " area_color={'value': 'HDI', 'scale': 'threshold', 'palette': 'coolwarm', 'n_colors': 7, 'legend': True}, \n", " leaflet=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What happened there? We bound a DataFrame (`area_dataframe`) to the geometry through its `area_feature_name` column, `Comuna`: each row was matched to the TopoJSON feature with the same `id` (this is why we normalized the commune names in both sources), and its `HDI` value was used to color that area. Note that some areas are not drawn: they have no matching row in the DataFrame.\n", "\n", "We can also visualize this information using a [cartogram](https://en.wikipedia.org/wiki/Cartogram). According to Wikipedia:\n", "\n", "> A cartogram is a map in which some thematic mapping variable – such as travel time, population, or Gross National Product – is substituted for land area or distance. The geometry or space of the map is distorted in order to convey the information of this alternate variable. \n", "\n", "For instance, the map above shows that the areas with a high HDI are, indeed, big. But how many people live in them?" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "\n", "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# resize each commune according to its population (area_value), keeping the HDI colors;\n", "# communes without data get a placeholder value of 75% of the smallest population (na_value)\n", "matta.cartogram(geometry=stgo, width=700,\n", " area_dataframe=df, area_feature_name='Comuna', area_opacity=1.0,\n", " area_color={'value': 'HDI', 'scale': 'threshold', 'palette': 'coolwarm', 'n_colors': 7}, \n", " area_value='Population', na_value=df['Population'].min() * 0.75)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "As we can see, the shape of each area is different now: each commune is resized according to its population instead of its land area. The differences are not very dramatic, but this is just an example." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }