{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Effective interactive data visualization with pandas and pygal" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "I really like [pandas](https://pandas.pydata.org/) – the powerful data analysis framework for Python. And I really like [pygal](http://www.pygal.org) – an interactive visualization library written in and for Python.\n", "\n", "**Why not put these two libraries together for effective data visualizations?**\n", "\n", "In this blog post, I want to show you some basic use cases and tips for integrating pandas and pygal.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data\n", "We need some kind of data; which dataset we use doesn't really matter. Here I have a dataset that was produced to measure the utilization of source code during program execution. It shows the lines of source code that were executed (covered) or missed during a production coverage measurement.\n", "\n", "As usual, we load this data with pandas first." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
| | PACKAGE | CLASS | LINE_MISSED | LINE_COVERED |
|---|---|---|---|---|
| 0 | org.springframework.samples.petclinic | PetclinicInitializer | 0 | 24 |
| 1 | org.springframework.samples.petclinic.model | NamedEntity | 1 | 4 |
| 2 | org.springframework.samples.petclinic.model | Specialty | 0 | 1 |
| 3 | org.springframework.samples.petclinic.model | PetType | 0 | 1 |
| 4 | org.springframework.samples.petclinic.model | Vets | 4 | 0 |
\n", "
" ], "text/plain": [ " PACKAGE CLASS \\\n", "0 org.springframework.samples.petclinic PetclinicInitializer \n", "1 org.springframework.samples.petclinic.model NamedEntity \n", "2 org.springframework.samples.petclinic.model Specialty \n", "3 org.springframework.samples.petclinic.model PetType \n", "4 org.springframework.samples.petclinic.model Vets \n", "\n", " LINE_MISSED LINE_COVERED \n", "0 0 24 \n", "1 1 4 \n", "2 0 1 \n", "3 0 1 \n", "4 4 0 " ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "raw = pd.read_csv(\"datasets/jacoco_production_coverage_spring_petclinic.csv\")\n", "raw.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create a tidier DataFrame that makes this data easier to work with later." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
| | class | lines | coverage |
|---|---|---|---|
| 0 | org.springframework.samples.petclinic.Petclini... | 24 | 1.0 |
| 1 | org.springframework.samples.petclinic.model.Na... | 5 | 0.8 |
| 2 | org.springframework.samples.petclinic.model.Sp... | 1 | 1.0 |
| 3 | org.springframework.samples.petclinic.model.Pe... | 1 | 1.0 |
| 4 | org.springframework.samples.petclinic.model.Vets | 4 | 0.0 |
\n", "
" ], "text/plain": [ " class lines coverage\n", "0 org.springframework.samples.petclinic.Petclini... 24 1.0\n", "1 org.springframework.samples.petclinic.model.Na... 5 0.8\n", "2 org.springframework.samples.petclinic.model.Sp... 1 1.0\n", "3 org.springframework.samples.petclinic.model.Pe... 1 1.0\n", "4 org.springframework.samples.petclinic.model.Vets 4 0.0" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame(index=raw.index)\n", "df['class'] = raw['PACKAGE'] + \".\" + raw['CLASS']\n", "df['lines'] = raw['LINE_MISSED'] + raw['LINE_COVERED']\n", "df['coverage'] = raw['LINE_COVERED'] / df['lines']\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Visualization\n", "### Setup\n", "The following cell has nothing to do with pandas and pygal per se, but it enables us to embed interactive visualizations directly into this notebook. This is pretty cool, so we use this here!" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from IPython.display import display, HTML\n", "\n", "base_html = \"\"\"\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "
\n", " {rendered_chart}\n", "
\n", " \n", "\n", "\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Basics\n", "\n", "The core idea is to let pandas create the data in a format that pygal's visualizations can consume easily. So let's have a look at what pygal expects as input data.\n", "\n", "Here is a basic example of a bar chart (adapted from [pygal's documentation](http://www.pygal.org/en/stable/documentation/types/line.html)). Take a look at the visualization (hint: it's interactive!)." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "
\n", " \n", "[Interactive pygal bar chart: Browser usage evolution (in %), 2002 to 2012; series: Firefox, Chrome, IE, Others]\n", "
\n", " \n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import pygal\n", "\n", "bar_chart = pygal.Bar(height=200)\n", "bar_chart.title = 'Browser usage evolution (in %)'\n", "bar_chart.x_labels = map(str, range(2002, 2013))\n", "bar_chart.add('Firefox', [None, None, 0, 16.6, 25, 31, 36.4, 45.5, 46.3, 42.8, 37.1])\n", "bar_chart.add('Chrome', [None, None, None, None, None, None, 0, 3.9, 10.8, 23.8, 35.3])\n", "bar_chart.add('IE', [85.8, 84.6, 84.7, 74.5, 66, 58.6, 54.7, 44.8, 36.2, 26.6, 20.1])\n", "bar_chart.add('Others', [14.2, 15.4, 15.3, 8.9, 9, 10.4, 8.9, 5.8, 6.7, 6.8, 7.5])\n", "display(HTML(base_html.format(rendered_chart=bar_chart.render(is_unicode=True))))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One of the important lines is this one:\n", "```python\n", "bar_chart.add('Firefox', [None, None, 0, 16.6, 25, 31, 36.4, 45.5, 46.3, 42.8, 37.1])\n", "\n", "```\n", "\n", "For each bar chart category (like \"Firefox\" or \"Chrome\"), we need to call the `add` method and provide the data.\n", "\n", "Let's go back to our own dataset. First, we create a category that makes some kind of sense for our use case. Let's use the name of a technical aspect of a source code file as our category. We can find this information in a specific part of the `class` column (at least in most cases)." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
| | class | lines | coverage | category |
|---|---|---|---|---|
| 0 | org.springframework.samples.petclinic.Petclini... | 24 | 1.0 | petclinic |
| 1 | org.springframework.samples.petclinic.model.Na... | 5 | 0.8 | model |
| 2 | org.springframework.samples.petclinic.model.Sp... | 1 | 1.0 | model |
| 3 | org.springframework.samples.petclinic.model.Pe... | 1 | 1.0 | model |
| 4 | org.springframework.samples.petclinic.model.Vets | 4 | 0.0 | model |
\n", "
" ], "text/plain": [ " class lines coverage \\\n", "0 org.springframework.samples.petclinic.Petclini... 24 1.0 \n", "1 org.springframework.samples.petclinic.model.Na... 5 0.8 \n", "2 org.springframework.samples.petclinic.model.Sp... 1 1.0 \n", "3 org.springframework.samples.petclinic.model.Pe... 1 1.0 \n", "4 org.springframework.samples.petclinic.model.Vets 4 0.0 \n", "\n", " category \n", "0 petclinic \n", "1 model \n", "2 model \n", "3 model \n", "4 model " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['category'] = df['class'].str.split(\".\").str[-2]\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Bar chart\n", "OK, let's create a bar chart of the mean coverage per category. This is a first step into the basic mechanics of integrating pandas with pygal." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "category\n", "jdbc 0.000000\n", "jpa 0.691558\n", "model 0.739048\n", "petclinic 1.000000\n", "service 0.888889\n", "util 0.135417\n", "web 0.639809\n", "Name: coverage, dtype: float64" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mean_by_category = df.groupby('category')['coverage'].mean()\n", "mean_by_category" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We just iterate over all entries and add them to the bar chart by using a list comprehension." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "
\n", " \n", "[Interactive pygal bar chart: mean coverage per category; series: jdbc, jpa, model, petclinic, service, util, web]\n", "
\n", " \n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "bar_chart = pygal.Bar(height=200)\n", "[bar_chart.add(x[0], x[1]) for x in mean_by_category.items()]\n", "display(HTML(base_html.format(rendered_chart=bar_chart.render(is_unicode=True))))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So this is pretty standard and easy to do.\n", "\n", "Let's look at a slightly more sophisticated use case: showing coverage values for all classes and coloring the classes according to the category they belong to.\n", "\n", "For this, a bar chart doesn't make sense anymore. So let's look at another visualization type." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Treemap\n", "A treemap generates size-based tiles from a dataset and arranges them neatly.\n", "\n", "#### New tricks\n", "The key idea for integrating pandas with pygal is to use pandas' `groupby` function to get the data into a format that pygal can consume. The special trick is to put all the `lines` values into a list for each category." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "category\n", "jdbc [7, 33, 17, 9, 26, 7, 8, 43]\n", "jpa [8, 11, 2, 7]\n", "model [5, 1, 1, 4, 12, 5, 7, 40, 21, 12]\n", "petclinic [24]\n", "service [18]\n", "util [5, 3, 6, 24]\n", "web [36, 10, 30, 11, 16, 10, 2]\n", "Name: lines, dtype: object" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "values_by_category = df.groupby(['category'])['lines'].apply(list)\n", "values_by_category" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This format is exactly what pygal needs. Let's create the treemap out of this data by using a list comprehension again." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "
\n", " \n", "[Interactive pygal treemap: one tile per class, sized by lines and grouped by category; series: jdbc, jpa, model, petclinic, service, util, web]\n", "
\n", " \n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "treemap = pygal.Treemap(height=200)\n", "[treemap.add(x[0], x[1]) for x in values_by_category.items()]\n", "display(HTML(base_html.format(rendered_chart=treemap.render(is_unicode=True))))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Adding labels" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You might have noticed that the labels on mouse-over don't show the actual class name but rather the name of the category. Instead of passing a list of plain values, we need to differentiate between the actual value and the corresponding label for each value. We can do this by passing a list of [appropriate dictionaries](http://www.pygal.org/en/stable/documentation/configuration/label.html).\n", "\n", "\n", "```python\n", "chart.add('category', [{'value' : 1, 'label': 'one'}, {'value': 2, 'label': 'two'}])\n", "```\n", "\n", "Let's fix this with another trick: we can iterate over the necessary data during the grouping of the values. For this, we combine the columns that we need with the `zip` function and build a data dictionary within the `apply` call." 
] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "category\n", "jdbc [{'value': 7, 'label': 'org.springframework.sa...\n", "jpa [{'value': 8, 'label': 'org.springframework.sa...\n", "model [{'value': 5, 'label': 'org.springframework.sa...\n", "petclinic [{'value': 24, 'label': 'org.springframework.s...\n", "service [{'value': 18, 'label': 'org.springframework.s...\n", "util [{'value': 5, 'label': 'org.springframework.sa...\n", "web [{'value': 36, 'label': 'org.springframework.s...\n", "dtype: object" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "class_values_by_category = df.groupby(['category'], axis=0).apply(\n", " lambda x : [{\"value\" : l, \"label\" : c } for l, c in zip(x['lines'], x['class'])])\n", "class_values_by_category" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we generate the treemap once again, you can spot the difference in the visualization by hovering over the tiles with your pointing device." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "
\n", " \n", "[Interactive pygal treemap: one tile per class, sized by lines, with the fully qualified class name as tooltip label; series: jdbc, jpa, model, petclinic, service, util, web]\n", "
\n", " \n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "treemap = pygal.Treemap(height=200)\n", "[treemap.add(x[0], x[1]) for x in class_values_by_category.items()]\n", "display(HTML(base_html.format(rendered_chart=treemap.render(is_unicode=True))))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Adding color" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the final step, I want to show you how you can colorize the tiles as needed. In our case, the column `coverage` is a perfect candidate for this, because it shows the ratio of executed code lines. A value near 1 means that almost all code lines were executed. A value near 0 means that almost none of the code lines ran.\n", "\n", "Let's see if we can visualize this in the treemap, too. For this, we need two things:\n", "- an indicator that shows how much a class is covered (we have this information in the `coverage` column)\n", "- a spectrum of colors that we want to use to show how strong the indicator is per entry (we could use the metaphor of hot and cold for values near 1 and 0 respectively)\n", "\n", "There are many ways to do it, but the most basic way is to assign every indicator value a corresponding color. For this, we'll use a red-to-blue colormap from matplotlib and draw colors from it accordingly." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
| | class | lines | coverage | category | color |
|---|---|---|---|---|---|
| 0 | org.springframework.samples.petclinic.Petclini... | 24 | 1.0 | petclinic | #b40426 |
| 1 | org.springframework.samples.petclinic.model.Na... | 5 | 0.8 | model | #ee8468 |
| 2 | org.springframework.samples.petclinic.model.Sp... | 1 | 1.0 | model | #b40426 |
| 3 | org.springframework.samples.petclinic.model.Pe... | 1 | 1.0 | model | #b40426 |
| 4 | org.springframework.samples.petclinic.model.Vets | 4 | 0.0 | model | #3b4cc0 |
\n", "
" ], "text/plain": [ " class lines coverage \\\n", "0 org.springframework.samples.petclinic.Petclini... 24 1.0 \n", "1 org.springframework.samples.petclinic.model.Na... 5 0.8 \n", "2 org.springframework.samples.petclinic.model.Sp... 1 1.0 \n", "3 org.springframework.samples.petclinic.model.Pe... 1 1.0 \n", "4 org.springframework.samples.petclinic.model.Vets 4 0.0 \n", "\n", " category color \n", "0 petclinic #b40426 \n", "1 model #ee8468 \n", "2 model #b40426 \n", "3 model #b40426 \n", "4 model #3b4cc0 " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from matplotlib.cm import coolwarm\n", "from matplotlib.colors import rgb2hex\n", "\n", "df['color'] = df['coverage'].apply(lambda x : rgb2hex(coolwarm(x)))\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "category\n", "jdbc [{'value': 7, 'label': 'org.springframework.sa...\n", "jpa [{'value': 8, 'label': 'org.springframework.sa...\n", "model [{'value': 5, 'label': 'org.springframework.sa...\n", "petclinic [{'value': 24, 'label': 'org.springframework.s...\n", "service [{'value': 18, 'label': 'org.springframework.s...\n", "util [{'value': 5, 'label': 'org.springframework.sa...\n", "web [{'value': 36, 'label': 'org.springframework.s...\n", "dtype: object" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "class_ratios_by_category = df.groupby(['category'], axis=0).apply(\n", " lambda x : [\n", " {\"value\" : y,\n", " \"label\" : z,\n", " \"color\" : c} for y, z, c in zip(\n", " x['lines'],\n", " x['class'],\n", " x['color'])])\n", "class_ratios_by_category" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's plot this treemap. We disable the legend because it doesn't make sense anymore (the colors of the legend no longer match the colors in the treemap)." 
] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "
\n", " \n", "[Interactive pygal treemap: one tile per class, sized by lines and colored by coverage; legend hidden]\n", "
\n", " \n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "treemap = pygal.Treemap(height=200, show_legend=False)\n", "[treemap.add(x[0], x[1]) for x in class_ratios_by_category.items()]\n", "display(HTML(base_html.format(rendered_chart=treemap.render(is_unicode=True))))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Hacking the system" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One problem remains, though: the value in the lower left corner of the tooltip is the lines value. If we want to display another value there (e.g. the coverage value), we need to hack the system a little bit by introducing a value formatter. This formatter needs a formatting function that we can happily provide (but surely not in the way the library designer originally intended ;-) ). Note that the coverage value `f` is bound as a default argument of the formatter, so each tile keeps its own value instead of the last one of its category." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "category\n", "jdbc [{'value': 7, 'label': 'org.springframework.sa...\n", "jpa [{'value': 8, 'label': 'org.springframework.sa...\n", "model [{'value': 5, 'label': 'org.springframework.sa...\n", "petclinic [{'value': 24, 'label': 'org.springframework.s...\n", "service [{'value': 18, 'label': 'org.springframework.s...\n", "util [{'value': 5, 'label': 'org.springframework.sa...\n", "web [{'value': 36, 'label': 'org.springframework.s...\n", "dtype: object" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "class_ratios_hack_by_category = df.groupby(['category'], axis=0).apply(\n", " lambda x : [\n", " {\"value\" : y,\n", " \"label\" : z,\n", " \"color\" : c,\n", " \"formatter\" : lambda x, f=f : \"{0:.0%}\".format(f)} for y, z, c, f in zip(\n", " x['lines'],\n", " x['class'],\n", " x['color'],\n", " x['coverage'])])\n", "class_ratios_hack_by_category" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ 
"\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "
\n", " \n", "[Interactive pygal treemap: one tile per class, sized by lines and colored by coverage; tooltips show a coverage percentage instead of the lines value; legend hidden]\n", "
\n", " \n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "treemap = pygal.Treemap(height=200, show_legend=False, colors=[\"#ffffff\"])\n", "# Series.iteritems() was removed in pandas 2.0; items() works in old and new versions\n", "for name, value in class_ratios_hack_by_category.items():\n", "    treemap.add(name, value)\n", "display(HTML(base_html.format(rendered_chart=treemap.render(is_unicode=True))))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Gauge" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are many other visualization types that you can use with these tricks. Let's take a look at the dataset from the beginning." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "category\n", "jdbc 0.000000\n", "jpa 0.691558\n", "model 0.739048\n", "petclinic 1.000000\n", "service 0.888889\n", "util 0.135417\n", "web 0.639809\n", "Name: coverage, dtype: float64" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mean_by_category" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can visualize this, e.g., as a gauge chart." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "
\n", " \n", "[Pygal solid gauge chart: mean coverage in percent per category (jdbc, jpa, model, petclinic, service, util, web)]\n", "
\n", " \n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "gauge = pygal.SolidGauge(inner_radius=0.70)\n", "# Series.iteritems() was removed in pandas 2.0; items() works in old and new versions\n", "for name, value in mean_by_category.items():\n", "    gauge.add(name, [{\"value\": value * 100}])\n", "display(HTML(base_html.format(rendered_chart=gauge.render(is_unicode=True))))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or in another variant of it..." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "
\n", " \n", "[Pygal gauge chart with a 0 to 100 scale: mean coverage in percent per category (jdbc, jpa, model, petclinic, service, util, web)]\n", "
\n", " \n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "gauge = pygal.Gauge(human_readable=True)\n", "# Series.iteritems() was removed in pandas 2.0; items() works in old and new versions\n", "for name, value in mean_by_category.items():\n", "    gauge.add(name, [{\"value\": value * 100}])\n", "display(HTML(base_html.format(rendered_chart=gauge.render(is_unicode=True))))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "OK, STOP! Enough for today!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion\n", "\n", "All right, that's it for this blog post! I hope you have seen that, once you know a few tricks, you can easily integrate pandas with pygal!\n", "\n", "I find this combination a nice tradeoff between complexity and interactivity. Let me know if I can simplify something or explain one or two things in more depth.\n", "\n", "Maybe next time, we can take a look at some tricks regarding [D3](https://d3js.org/), can't we?\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.0" } }, "nbformat": 4, "nbformat_minor": 2 }