{ "cells": [
 { "cell_type": "markdown", "metadata": {}, "source": [
  "## Violin Plot with Python Plotly"
 ] },
 { "cell_type": "markdown", "metadata": {}, "source": [
  "A violin plot visually encodes the distribution of a data set together with its summary statistics.\n",
  "It displays the graph of the estimated probability density function (pdf), mirrored about the y-axis, and, inside the violin-shaped region, the elements of a box plot (median, lower and upper quartiles, and whisker positions).\n",
  "\n",
  "In this Jupyter Notebook we define functions that build a violin plot with Plotly. To give more insight into the distributional properties of the data, we add the option to overlay a rug plot of the data set on the same axes."
 ] },
 { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [
  "from IPython.display import HTML\n",
  "HTML('')"
 ] },
 { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [
  "import numpy as np\n",
  "import pandas as pd\n",
  "from scipy import stats"
 ] },
 { "cell_type": "markdown", "metadata": {}, "source": [
  "Compute the summary statistics of the data:"
 ] },
 { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [
  "def calc_stats(data):\n",
  "    x = np.asarray(data, float)\n",
  "    vals_min = np.min(x)\n",
  "    vals_max = np.max(x)\n",
  "    q2 = np.percentile(x, 50, interpolation='linear')\n",
  "    q1 = np.percentile(x, 25, interpolation='lower')\n",
  "    q3 = np.percentile(x, 75, interpolation='higher')\n",
  "    IQR = q3 - q1\n",
  "    whisker_dist = 1.5 * IQR\n",
  "    # To keep the whiskers inside the data interval, place them at the most\n",
  "    # extreme observations that lie within 1.5 * IQR of the quartiles:\n",
  "    d1 = np.min(x[x >= (q1 - whisker_dist)])\n",
  "    d2 = np.max(x[x <= (q3 + whisker_dist)])\n",
  "    return vals_min, vals_max, q1, q2, q3, d1, d2"
 ] },
 { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [
  "import plotly.plotly as py\n",
  "from plotly.graph_objs import *\n",
  "import plotly.tools as tls"
 ] },
 { "cell_type": "markdown", "metadata": {}, "source": [
  "Functions that define the violin components:"
 ] },
 { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [], "source": [
  "def make_half_violin(x, y, fillcolor='#1f77b4', linecolor='rgb(50,50,50)'):\n",
  "    # One half of the violin: (signed) pdf values along x, data values along y\n",
  "    text = ['(pdf(y), y)=({:0.2f}, {:0.2f})'.format(xi, yi) for xi, yi in zip(x, y)]\n",
  "    return Scatter(x=x,\n",
  "                   y=y, mode='lines',\n",
  "                   name='',\n",
  "                   text=text,\n",
  "                   fill='tonextx',\n",
  "                   fillcolor=fillcolor,\n",
  "                   line=Line(width=0.5, color=linecolor, shape='spline'),\n",
  "                   hoverinfo='text',\n",
  "                   opacity=0.5)\n",
  "\n",
  "\n",
  "def make_rugplot(vals, pdf_max, distance, color='#1f77b4'):\n",
  "    # One horizontal tick per observation, drawn to the left of the violin\n",
  "    return Scatter(y=vals,\n",
  "                   x=[-pdf_max - distance] * len(vals),\n",
  "                   marker=Marker(color=color,\n",
  "                                 symbol='line-ew-open'),\n",
  "                   mode='markers',\n",
  "                   name='',\n",
  "                   showlegend=False,\n",
  "                   hoverinfo='y')\n",
  "\n",
  "\n",
  "def make_quartiles(q1, q3):\n",
  "    # Thick vertical segment from the lower to the upper quartile\n",
  "    return Scatter(x=[0, 0],\n",
  "                   y=[q1, q3],\n",
  "                   text=['lower-quartile: {:0.2f}'.format(q1),\n",
  "                         'upper-quartile: {:0.2f}'.format(q3)],\n",
  "                   mode='lines',\n",
  "                   line=Line(width=4, color='rgb(0,0,0)'),\n",
  "                   hoverinfo='text')\n",
  "\n",
  "\n",
  "def make_median(q2):\n",
  "    # White square marker at the median\n",
  "    return Scatter(x=[0],\n",
  "                   y=[q2],\n",
  "                   text=['median: {:0.2f}'.format(q2)],\n",
  "                   mode='markers',\n",
  "                   marker=dict(symbol='square', color='rgb(255,255,255)'),\n",
  "                   hoverinfo='text')\n",
  "\n",
  "\n",
  "def make_non_outlier_interval(d1, d2):\n",
  "    # Thin vertical segment between the whisker positions d1 and d2\n",
  "    return Scatter(x=[0, 0],\n",
  "                   y=[d1, d2],\n",
  "                   name='',\n",
  "                   mode='lines',\n",
  "                   line=Line(width=1.5, color='rgb(0,0,0)'))"
 ] },
 { "cell_type": "markdown", "metadata": {}, "source": [
  "Set the axes:"
 ] },
 { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [
  "def make_XAxis(xaxis_title, xaxis_range):\n",
  "    # The x-axis only carries the mirrored pdf values, so hide its grid, lines, ticks and labels\n",
  "    xaxis = XAxis(title=xaxis_title,\n",
  "                  range=xaxis_range,\n",
  "                  showgrid=False,\n",
  "                  zeroline=False,\n",
  "                  showline=False,\n",
  "                  mirror=False,\n",
  "                  ticks='',\n",
  "                  showticklabels=False)\n",
  "    return xaxis\n",
  "\n",
  "\n",
  "def make_YAxis(yaxis_title):\n",
  "    yaxis = YAxis(title=yaxis_title,\n",
  "                  showticklabels=True,\n",
  "                  autorange=True,\n",
  "                  ticklen=4,\n",
  "                  showline=True,\n",
  "                  zeroline=False,\n",
  "                  showgrid=False,\n",
  "                  mirror=False)\n",
  "    return yaxis"
 ] },
 { "cell_type": "markdown", "metadata": {}, "source": [
  "The data values, vals, can be given as a numeric list, a NumPy array of shape (n,), or a pandas Series."
 ] },
 { "cell_type": "markdown", "metadata": {}, "source": [
  "Because a violin plot is symmetric with respect to a vertical axis, the x-axis range of the plot is\n",
  "of the form [-a, a], or of the form [-b, a] with b > a when a rug plot is overlaid to the left of the violin."
 ] },
 { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [], "source": [
  "def create_violinplot(vals, fillcolor='#1f77b4', rugplot=True):\n",
  "    vals = np.asarray(vals, float)\n",
  "    vals_min, vals_max, q1, q2, q3, d1, d2 = calc_stats(vals)  # summary statistics\n",
  "\n",
  "    pdf = stats.gaussian_kde(vals)  # kernel density estimate of the pdf\n",
  "    xx = np.linspace(vals_min, vals_max, 100)  # grid over the data interval\n",
  "    yy = pdf(xx)  # evaluate the estimated pdf on the grid xx\n",
  "    max_pdf = np.max(yy)\n",
  "    distance = 2.0 * max_pdf / 10 if rugplot else 0  # gap between the violin and the rug plot\n",
  "    plot_xrange = [-max_pdf - distance - 0.1, max_pdf + 0.1]  # x-axis range of the plot\n",
  "\n",
  "    plot_data = [make_half_violin(-yy, xx, fillcolor=fillcolor),\n",
  "                 make_half_violin(yy, xx, fillcolor=fillcolor),\n",
  "                 make_non_outlier_interval(d1, d2),\n",
  "                 make_quartiles(q1, q3),\n",
  "                 make_median(q2)]\n",
  "    if rugplot:\n",
  "        plot_data.append(make_rugplot(vals, max_pdf, distance=distance, color=fillcolor))\n",
  "    return plot_data, plot_xrange"
 ] },
 { "cell_type": "markdown", "metadata": {}, "source": [
  "Let us first define a single violin plot:"
 ] },
 { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [
  "<div>\n",
  "<table border=\"1\" class=\"dataframe\">\n",
  "  <thead>\n",
  "    <tr>\n",
  "      <th></th>\n",
  "      <th>Score</th>\n",
  "    </tr>\n",
  "  </thead>\n",
  "  <tbody>\n",
  "    <tr>\n",
  "      <th>0</th>\n",
  "      <td>6.55</td>\n",
  "    </tr>\n",
  "    <tr>\n",
  "      <th>1</th>\n",
  "      <td>9.13</td>\n",
  "    </tr>\n",
  "    <tr>\n",
  "      <th>2</th>\n",
  "      <td>8.46</td>\n",
  "    </tr>\n",
  "    <tr>\n",
  "      <th>3</th>\n",
  "      <td>9.38</td>\n",
  "    </tr>\n",
  "    <tr>\n",
  "      <th>4</th>\n",
  "      <td>6.35</td>\n",
  "    </tr>\n",
  "  </tbody>\n",
  "</table>\n",
  "</div>