{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Violin Plot with Python Plotly"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" Violin plot is a trace that visually encodes the distribution of a data set, along with its summary statistics.\n",
"It displays the graph of the estimated probability density function (pdf) mirrored about y-axis, and inside the violin-like shaped region, the elements of a box plot (median, lower and upper quartile, whisker position).\n",
" \n",
"In this Jupyter Notebook we define functions to get the Plotly plot of a violin plot. In order to get more insights into distributional properties we add the option to overlay onto the same axis the rug plot of the data set."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from IPython.display import HTML\n",
"HTML('')"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"from scipy import stats"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Compute the summary statistics of data:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def calc_stats(data) :\n",
" x=np.asarray(data, np. float) \n",
" vals_min=np.min(x)\n",
" vals_max=np.max(x)\n",
" q2=np.percentile(x, 50, interpolation='linear') \n",
" q1=np.percentile(x, 25, interpolation='lower')\n",
" q3=np.percentile(x, 75, interpolation='higher')\n",
" IQR=q3-q1\n",
" whisker_dist = 1.5 * IQR\n",
" #in order to prevent drawing whiskers outside the interval \n",
" #of data one defines the whisker positions as:\n",
" d1 = np.min(x[x >= (q1 - whisker_dist)])\n",
" d2 = np.max(x[x <= (q3 + whisker_dist)])\n",
" return vals_min, vals_max, q1, q2 ,q3, d1,d2"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import plotly.plotly as py\n",
"from plotly.graph_objs import *\n",
"import plotly.tools as tls "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Functions that define violin components:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def make_half_violin( x, y, fillcolor='#1f77b4', linecolor='rgb(50,50,50)'): \n",
" text=['(pdf(y), y)=('+'{:0.2f}'.format(x[i])+', '+'{:0.2f}'.format(y[i])+')'\n",
" for i in range(len(x))] \n",
" return Scatter(x=x, \n",
" y=y, mode='lines',\n",
" name='',\n",
" text=text,\n",
" fill='tonextx', \n",
" fillcolor= fillcolor,\n",
" line=Line(width=0.5, color=linecolor, shape='spline'),\n",
" hoverinfo='text',\n",
" opacity=0.5\n",
" )\n",
" \n",
"\n",
"def make_rugplot(vals, pdf_max, distance, color='#1f77b4'):\n",
" return Scatter(y=vals, \n",
" x=[-pdf_max-distance]*len(vals),\n",
" marker=Marker(\n",
" color=color,\n",
" symbol='line-ew-open'\n",
" ),\n",
" mode='markers',\n",
" name='',\n",
" showlegend=False,\n",
" hoverinfo='y'\n",
" ) \n",
"def make_quartiles(q1, q3):\n",
" return Scatter(x=[0, 0],\n",
" y=[q1, q3],\n",
" text=['lower-quartile: '+'{:0.2f}'.format(q1), \n",
" 'upper-quartile: '+'{:0.2f}'.format(q3)],\n",
" mode='lines',\n",
" line=Line(width=4, color='rgb(0,0,0)'),\n",
" hoverinfo='text'\n",
" )\n",
"def make_median(q2):\n",
" return Scatter(x=[0],\n",
" y=[q2], \n",
" text=['median: '+'{:0.2f}'.format(q2)],\n",
" mode='markers',\n",
" marker=dict(symbol='square', color='rgb(255,255,255)'),\n",
" hoverinfo='text'\n",
" )\n",
"def make_non_outlier_interval(d1,d2):\n",
" return Scatter(x=[0, 0],\n",
" y=[d1, d2],\n",
" name='',\n",
" mode='lines',\n",
" line=Line(width=1.5, color='rgb(0,0,0)')\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Set axes:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
" \n",
"def make_XAxis(xaxis_title, xaxis_range):\n",
" xaxis=XAxis(title=xaxis_title,\n",
" range=xaxis_range,\n",
" showgrid=False,\n",
" zeroline=False,\n",
" showline=False,\n",
" mirror=False,\n",
" ticks='',\n",
" showticklabels=False,\n",
" )\n",
" return xaxis\n",
"\n",
"\n",
"def make_YAxis(yaxis_title):\n",
" yaxis = YAxis(title=yaxis_title,\n",
" showticklabels=True,\n",
" autorange=True,\n",
" ticklen=4,\n",
" showline=True,\n",
" zeroline=False,\n",
" showgrid=False,\n",
" mirror=False) \n",
" return yaxis"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Data values, `vals`, can be given in a numeric list, numpy array of shape (n, ) or a pandas series."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because a violin plot is symmetric with respect to a vertical axis, we define the range of x values\n",
"in the plot either\n",
"of the form `range=[-a,a]` or of the form `[-b,a]`, when a rug plot is overlaid."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def create_violinplot(vals, fillcolor='#1f77b4', rugplot=True):\n",
" vals=np.asarray(vals, np.float)\n",
" vals_min, vals_max, q1, q2, q3, d1, d2=calc_stats(vals)#summary statistics\n",
" \n",
" pdf= stats.gaussian_kde(vals)# kernel density estimation of pdf\n",
" xx=np.linspace(vals_min, vals_max, 100)# grid over the data interval\n",
" yy=pdf(xx)#evaluate the pdf at the grid xx\n",
" max_pdf=np.max(yy)\n",
" distance=2.0*max_pdf/10 if rugplot else 0# distance from the violin plot to rugplot\n",
" plot_xrange=[-max_pdf-distance-0.1, max_pdf+0.1]# range for x values in the plot\n",
" \n",
" plot_data=[make_half_violin(-yy, xx, fillcolor=fillcolor),\n",
" make_half_violin(yy, xx, fillcolor=fillcolor),\n",
" make_non_outlier_interval(d1, d2),\n",
" make_quartiles(q1,q3),\n",
" make_median(q2)]\n",
" if rugplot: \n",
" plot_data.append(make_rugplot(vals, max_pdf, distance=distance, color=fillcolor))\n",
" return plot_data, plot_xrange "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let us define first a single violin plot:"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Score | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 6.55 | \n",
"
\n",
" \n",
" 1 | \n",
" 9.13 | \n",
"
\n",
" \n",
" 2 | \n",
" 8.46 | \n",
"
\n",
" \n",
" 3 | \n",
" 9.38 | \n",
"
\n",
" \n",
" 4 | \n",
" 6.35 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Score\n",
"0 6.55\n",
"1 9.13\n",
"2 8.46\n",
"3 9.38\n",
"4 6.35"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df=pd.read_excel('Violin-plot-data.xlsx')\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"x=list(df['Score'])\n",
"plot_data, plot_xrange=create_violinplot(x, fillcolor='rgb(102,194,163)')"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"layout=Layout(title='Violin and Rug Plot',\n",
" autosize=False,\n",
" font=Font(size=11),\n",
" height=450,\n",
" showlegend=False,\n",
" width=350,\n",
" xaxis=make_XAxis('', plot_xrange), \n",
" yaxis=make_YAxis(''),\n",
" hovermode='closest'\n",
" ) "
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"layout['yaxis'].update(dict(showline=False, showticklabels=False, ticks=''))"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"\n",
"fig=Figure(data=Data(plot_data), layout=layout)"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"py.sign_in('empet', 'my_api_key')\n",
"py.iplot(fig, filename='Violin-Plot-Example')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Data summary encoded in a violin plot facilitate comparison of multiple data sets. \n",
"In the following we generate a few data sets and their violin plots:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Group | \n",
" Score | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" B | \n",
" 1.656178 | \n",
"
\n",
" \n",
" 1 | \n",
" C | \n",
" -1.379259 | \n",
"
\n",
" \n",
" 2 | \n",
" C | \n",
" 1.567691 | \n",
"
\n",
" \n",
" 3 | \n",
" B | \n",
" 1.484571 | \n",
"
\n",
" \n",
" 4 | \n",
" E | \n",
" 0.410634 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Group Score\n",
"0 B 1.656178\n",
"1 C -1.379259\n",
"2 C 1.567691\n",
"3 B 1.484571\n",
"4 E 0.410634"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.random.seed(619517)\n",
"Nr=250\n",
"y = np.random.randn(Nr)\n",
"gr = np.random.choice(list(\"ABCDE\"), Nr)\n",
"norm_params=[(0, 1.2), (0.7, 1), (-0.5, 1.4), (0.3, 1), (0.8, 0.9)]# mean and standard deviations \n",
"\n",
"for i, letter in enumerate(\"ABCDE\"):\n",
" y[gr == letter] *=norm_params[i][1]+ norm_params[i][0]\n",
"df = pd.DataFrame(dict(Score=y, Group=gr))\n",
"df.head()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Group data:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"gb=df.groupby(['Group'])\n",
"group_name=['A', 'B', 'C', 'D', 'E']\n",
"L=len(group_name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each violin plot will be displayed in a subplot:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This is the format of your plot grid:\n",
"[ (1,1) x1,y1 ] [ (1,2) x2,y1 ] [ (1,3) x3,y1 ] [ (1,4) x4,y1 ] [ (1,5) x5,y1 ]\n",
"\n"
]
}
],
"source": [
"fig = tls.make_subplots(rows=1, cols=L, shared_yaxes=True, \n",
" horizontal_spacing=0.025, \n",
" print_grid=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Set colors for violins:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"violet_colors=['#604d9e','#6c4774','#9e70a2','#caaac2','#d6c7dd']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Get plot data for each group, and assign them to the corresponding subplot:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"for k, gr in enumerate(group_name):\n",
" vals= np.asarray( gb.get_group(gr)['Score'], np.float)\n",
" plot_data, plot_xrange=create_violinplot(vals, fillcolor=violet_colors[k])\n",
" for item in plot_data:\n",
" fig.append_trace(item, 1, k+1)\n",
" fig['layout'].update({'xaxis{}'.format(k+1): \n",
" make_XAxis('Group '+'{:d}'.format(k+1), plot_xrange)}) \n",
"fig['layout'].update({'yaxis{}'.format(1): make_YAxis('')})# set the sharey axis style "
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"pl_width=900\n",
"pl_height=500\n",
"title = 'Violin Plots'\n",
"\n",
"fig['layout'].update(title=title, \n",
" font= Font(family='Georgia, serif'),\n",
" showlegend=False, \n",
" hovermode='closest', \n",
" autosize=False, \n",
" width=pl_width, \n",
" height=pl_height,\n",
" margin=Margin(\n",
" l=65,\n",
" r=65,\n",
" b=85,\n",
" t=150\n",
" )\n",
" ) \n"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"py.sign_in('empet', 'my_api_key')\n",
"py.iplot(fig, filename='Multiple-Violins')"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from IPython.core.display import HTML\n",
"def css_styling():\n",
" styles = open(\"./custom.css\", \"r\").read()\n",
" return HTML(styles)\n",
"css_styling()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.11"
}
},
"nbformat": 4,
"nbformat_minor": 0
}