{ "cells": [
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Violin Plot with Python Plotly" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [
"A violin plot is a trace that visually encodes the distribution of a data set, along with its summary statistics.\n",
"It displays the graph of the estimated probability density function (pdf) mirrored about the y-axis and, inside the violin-shaped region, the elements of a box plot (median, lower and upper quartiles, whisker positions).\n",
" \n",
"In this Jupyter Notebook we define functions that build a violin plot from Plotly traces. To give more insight into the distribution of the data, we add the option to overlay a rug plot of the data set on the same axes." ] },
{ "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [
"from IPython.display import HTML\n",
"HTML('')" ] },
{ "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [
"import numpy as np\n",
"import pandas as pd\n",
"from scipy import stats" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Compute the summary statistics of the data:" ] },
{ "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [
"def calc_stats(data):\n",
"    x=np.asarray(data, float)\n",
"    vals_min=np.min(x)\n",
"    vals_max=np.max(x)\n",
"    q2=np.percentile(x, 50, interpolation='linear')\n",
"    q1=np.percentile(x, 25, interpolation='lower')\n",
"    q3=np.percentile(x, 75, interpolation='higher')\n",
"    IQR=q3-q1\n",
"    whisker_dist = 1.5 * IQR\n",
"    # To avoid drawing whiskers beyond the data, place them at the most extreme\n",
"    # data points that lie within 1.5*IQR of the quartiles:\n",
"    d1 = np.min(x[x >= (q1 - whisker_dist)])\n",
"    d2 = np.max(x[x <= (q3 + whisker_dist)])\n",
"    return vals_min, vals_max, q1, q2, q3, d1, d2" ] },
{ "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [
"import plotly.plotly as py\n",
"from plotly.graph_objs import *\n",
"import plotly.tools as tls" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Functions that define violin components:" ] },
{ "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [], "source": [
"def make_half_violin(x, y, fillcolor='#1f77b4', linecolor='rgb(50,50,50)'):\n",
"    # hover text: the estimated density and the data value at each grid point\n",
"    text=['(pdf(y), y)=('+'{:0.2f}'.format(x[i])+', '+'{:0.2f}'.format(y[i])+')'\n",
"          for i in range(len(x))]\n",
"    return Scatter(x=x,\n",
"                   y=y, mode='lines',\n",
"                   name='',\n",
"                   text=text,\n",
"                   fill='tonextx',\n",
"                   fillcolor=fillcolor,\n",
"                   line=Line(width=0.5, color=linecolor, shape='spline'),\n",
"                   hoverinfo='text',\n",
"                   opacity=0.5\n",
"                   )\n",
"\n",
"\n",
"def make_rugplot(vals, pdf_max, distance, color='#1f77b4'):\n",
"    # one tick per data point, drawn to the left of the violin\n",
"    return Scatter(y=vals,\n",
"                   x=[-pdf_max-distance]*len(vals),\n",
"                   marker=Marker(\n",
"                       color=color,\n",
"                       symbol='line-ew-open'\n",
"                   ),\n",
"                   mode='markers',\n",
"                   name='',\n",
"                   showlegend=False,\n",
"                   hoverinfo='y'\n",
"                   )\n",
"\n",
"def make_quartiles(q1, q3):\n",
"    return Scatter(x=[0, 0],\n",
"                   y=[q1, q3],\n",
"                   text=['lower-quartile: '+'{:0.2f}'.format(q1),\n",
"                         'upper-quartile: '+'{:0.2f}'.format(q3)],\n",
"                   mode='lines',\n",
"                   line=Line(width=4, color='rgb(0,0,0)'),\n",
"                   hoverinfo='text'\n",
"                   )\n",
"def make_median(q2):\n",
"    return Scatter(x=[0],\n",
"                   y=[q2],\n",
"                   text=['median: '+'{:0.2f}'.format(q2)],\n",
"                   mode='markers',\n",
"                   marker=dict(symbol='square', color='rgb(255,255,255)'),\n",
"                   hoverinfo='text'\n",
"                   )\n",
"\n",
"def make_non_outlier_interval(d1, d2):\n",
"    return Scatter(x=[0, 0],\n",
"                   y=[d1, d2],\n",
"                   name='',\n",
"                   mode='lines',\n",
"                   line=Line(width=1.5, color='rgb(0,0,0)')\n",
"                   )" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Set axes:" ] },
{ "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [
"\n",
"def make_XAxis(xaxis_title, xaxis_range):\n",
"    xaxis=XAxis(title=xaxis_title,\n",
"                range=xaxis_range,\n",
"                showgrid=False,\n",
"                zeroline=False,\n",
"                showline=False,\n",
"                mirror=False,\n",
"                ticks='',\n",
"                showticklabels=False,\n",
"                )\n",
"    return xaxis\n",
"\n",
"\n",
"def make_YAxis(yaxis_title):\n",
"    yaxis=YAxis(title=yaxis_title,\n",
"                showticklabels=True,\n",
"                autorange=True,\n",
"                ticklen=4,\n",
"                showline=True,\n",
"                zeroline=False,\n",
"                showgrid=False,\n",
"                mirror=False)\n",
"    return yaxis" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The data values, `vals`, can be given as a numeric list, a NumPy array of shape `(n,)`, or a pandas Series." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [
"Because a violin plot is symmetric about a vertical axis, we define the range of x values in the plot\n",
"either as `range=[-a, a]` or, when a rug plot is overlaid, as `range=[-b, a]` with `b > a`, which leaves room for the rug on the left."
] },
{ "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [], "source": [
"def create_violinplot(vals, fillcolor='#1f77b4', rugplot=True):\n",
"    vals=np.asarray(vals, float)\n",
"    vals_min, vals_max, q1, q2, q3, d1, d2=calc_stats(vals)  # summary statistics\n",
"    \n",
"    pdf=stats.gaussian_kde(vals)  # kernel density estimate of the pdf\n",
"    xx=np.linspace(vals_min, vals_max, 100)  # grid over the data interval\n",
"    yy=pdf(xx)  # evaluate the pdf on the grid xx\n",
"    max_pdf=np.max(yy)\n",
"    distance=2.0*max_pdf/10 if rugplot else 0  # distance from the violin to the rug plot\n",
"    plot_xrange=[-max_pdf-distance-0.1, max_pdf+0.1]  # range of x values in the plot\n",
"    \n",
"    plot_data=[make_half_violin(-yy, xx, fillcolor=fillcolor),\n",
"               make_half_violin(yy, xx, fillcolor=fillcolor),\n",
"               make_non_outlier_interval(d1, d2),\n",
"               make_quartiles(q1, q3),\n",
"               make_median(q2)]\n",
"    if rugplot:\n",
"        plot_data.append(make_rugplot(vals, max_pdf, distance=distance, color=fillcolor))\n",
"    return plot_data, plot_xrange" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Let us first define a single violin plot:" ] },
{ "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
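<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>Score</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>6.55</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>1</th>\n",
"      <td>9.13</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>2</th>\n",
"      <td>8.46</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>3</th>\n",
"      <td>9.38</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>4</th>\n",
"      <td>6.35</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>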
" ], "text/plain": [
"   Score\n",
"0   6.55\n",
"1   9.13\n",
"2   8.46\n",
"3   9.38\n",
"4   6.35" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [
"df=pd.read_excel('Violin-plot-data.xlsx')\n",
"df.head()" ] },
{ "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": false }, "outputs": [], "source": [
"x=list(df['Score'])\n",
"plot_data, plot_xrange=create_violinplot(x, fillcolor='rgb(102,194,163)')" ] },
{ "cell_type": "code", "execution_count": 39, "metadata": { "collapsed": true }, "outputs": [], "source": [
"layout=Layout(title='Violin and Rug Plot',\n",
"              autosize=False,\n",
"              font=Font(size=11),\n",
"              height=450,\n",
"              showlegend=False,\n",
"              width=350,\n",
"              xaxis=make_XAxis('', plot_xrange),\n",
"              yaxis=make_YAxis(''),\n",
"              hovermode='closest'\n",
"              )" ] },
{ "cell_type": "code", "execution_count": 40, "metadata": { "collapsed": false }, "outputs": [], "source": [
"layout['yaxis'].update(dict(showline=False, showticklabels=False, ticks=''))" ] },
{ "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": true }, "outputs": [], "source": [
"\n",
"fig=Figure(data=Data(plot_data), layout=layout)" ] },
{ "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [
"py.sign_in('empet', 'my_api_key')\n",
"py.iplot(fig, filename='Violin-Plot-Example')" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [
"The data summary encoded in a violin plot facilitates the comparison of multiple data sets. \n",
"In the following we generate a few data sets and draw their violin plots:" ] },
{ "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
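<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>Group</th>\n",
"      <th>Score</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>B</td>\n",
"      <td>1.656178</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>1</th>\n",
"      <td>C</td>\n",
"      <td>-1.379259</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>2</th>\n",
"      <td>C</td>\n",
"      <td>1.567691</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>3</th>\n",
"      <td>B</td>\n",
"      <td>1.484571</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>4</th>\n",
"      <td>E</td>\n",
"      <td>0.410634</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>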
" ], "text/plain": [
"  Group     Score\n",
"0     B  1.656178\n",
"1     C -1.379259\n",
"2     C  1.567691\n",
"3     B  1.484571\n",
"4     E  0.410634" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [
"np.random.seed(619517)\n",
"Nr=250\n",
"y = np.random.randn(Nr)\n",
"gr = np.random.choice(list(\"ABCDE\"), Nr)\n",
"norm_params=[(0, 1.2), (0.7, 1), (-0.5, 1.4), (0.3, 1), (0.8, 0.9)]  # (mean, standard deviation) of each group\n",
"\n",
"for i, letter in enumerate(\"ABCDE\"):\n",
"    mean, std = norm_params[i]\n",
"    # scale the standard normal sample by std, then shift it by mean\n",
"    y[gr == letter] = y[gr == letter]*std + mean\n",
"df = pd.DataFrame(dict(Score=y, Group=gr))\n",
"df.head()\n" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Group data:" ] },
{ "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": true }, "outputs": [], "source": [
"gb=df.groupby('Group')\n",
"group_name=['A', 'B', 'C', 'D', 'E']\n",
"L=len(group_name)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Each violin plot will be displayed in a subplot:" ] },
{ "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [
"This is the format of your plot grid:\n",
"[ (1,1) x1,y1 ] [ (1,2) x2,y1 ] [ (1,3) x3,y1 ] [ (1,4) x4,y1 ] [ (1,5) x5,y1 ]\n",
"\n" ] } ], "source": [
"fig = tls.make_subplots(rows=1, cols=L, shared_yaxes=True,\n",
"                        horizontal_spacing=0.025,\n",
"                        print_grid=True)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Set colors for violins:" ] },
{ "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [], "source": [
"violet_colors=['#604d9e','#6c4774','#9e70a2','#caaac2','#d6c7dd']" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Get the plot data for each group and assign the traces to the corresponding subplot:" ] },
{ "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [], "source": [
"for k, gr in enumerate(group_name):\n",
"    vals=np.asarray(gb.get_group(gr)['Score'], float)\n",
"    plot_data, plot_xrange=create_violinplot(vals, fillcolor=violet_colors[k])\n",
"    for item in plot_data:\n",
"        fig.append_trace(item, 1, k+1)\n",
"    fig['layout'].update({'xaxis{}'.format(k+1):\n",
"                          make_XAxis('Group '+'{:d}'.format(k+1), plot_xrange)})\n",
"fig['layout'].update({'yaxis{}'.format(1): make_YAxis('')})  # style the shared y-axis" ] },
{ "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": true }, "outputs": [], "source": [
"pl_width=900\n",
"pl_height=500\n",
"title = 'Violin Plots'\n",
"\n",
"fig['layout'].update(title=title,\n",
"                     font=Font(family='Georgia, serif'),\n",
"                     showlegend=False,\n",
"                     hovermode='closest',\n",
"                     autosize=False,\n",
"                     width=pl_width,\n",
"                     height=pl_height,\n",
"                     margin=Margin(\n",
"                         l=65,\n",
"                         r=65,\n",
"                         b=85,\n",
"                         t=150\n",
"                     )\n",
"                     )\n" ] },
{ "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [
"py.sign_in('empet', 'my_api_key')\n",
"py.iplot(fig, filename='Multiple-Violins')" ] },
{ "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [
"from IPython.core.display import HTML\n",
"def css_styling():\n",
"    styles = open(\"./custom.css\", \"r\").read()\n",
"    return HTML(styles)\n",
"css_styling()" ] } ],
"metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.11" } }, "nbformat": 4, "nbformat_minor": 0 }