{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "##
Cloudy Mountain Plot
an informative RDI(*) categorical distribution plot
inspired by Violin, Bean and Pirate Plots
\n", "####
云山图 [yún shān tú] / 雲山図 [くもやまず/kumo yama zu]
\n", "### by [Dr Giuseppe Insana](http://insana.net), August 2019 - coded in Julia and Python\n", "#### (*): RDI = Raw data + Descriptive statistics + Inferential statistics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### This notebook shows usage examples of Cloudy Mountain Plot in Julia.\n", "### Please check the [CMPlot.jl github page](https://github.com/g-insana/CMPlot.jl) for installation instruction and options" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hideCode": false, "hidePrompt": false }, "outputs": [], "source": [ "using CMPlot\n", "using PlotlyJS\n", "using Random #for shuffle()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hideCode": false, "hidePrompt": false }, "outputs": [], "source": [ "#To load some public test data:\n", "using RDatasets\n", "iris = dataset(\"datasets\", \"iris\") # Iris dataset; Anderson, Edgar (1935) Fisher, R. A. (1936) ; http://vincentarelbundock.github.io/Rdatasets/doc/datasets/iris.html\n", "train = dataset(\"Ecdat\", \"Wages\") # Individual wages, US, 1976 to 1982; Cornwell, C. and P. Rupert (1988) ; http://vincentarelbundock.github.io/Rdatasets/doc/Ecdat/Wages.html\n", "train[!,:ExpYears]=[if x < 10 \"<10\" elseif x >= 10 && x < 30 \"10-29\" else \">30\" end for x in train.Exp] #binning on years of full-time work experience\n", "train[!,:EduYears]=[if x < 9 \"<9\" elseif x >= 9 && x < 14 \"9-13\" else \">13\" end for x in train.Ed] #binning on years of education\n", "train = train[shuffle(1:size(train, 1)),:] #shuffle if you wish to bias when using pointsmaxdisplayed\n", "train[1:3,:]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Alternatively, load your own data, e.g. via CSV:\n", "#=\n", "using CSV\n", "using Statistics\n", "train = DataFrame(CSV.File(\"train.csv\", delim=',', decimal='.')) #dataset on loans\n", "#Some data cleaning:\n", "train.Married=coalesce.(train.Married,\"No\") #replace missing values in Married with \"No\"\n", "train.Gender=coalesce.(train.Gender,\"Male\") #replace missing values in Gender with \"Male\"\n", "train.LoanAmount=coalesce.(train.LoanAmount,floor(Int,mean(skipmissing(train.LoanAmount)))) #replace missing values in LoanAmount with their mean\n", "train.LoanAmount[1]\n", "=#" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1 min Quickstart: just pass as arguments a dataframe and the column label for the categorical data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot(cmplot(iris,xcol=:Species)...) #using splat operator" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A little evolutionary history, via box plots and violins:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hideCode": false, "hidePrompt": false }, "outputs": [], "source": [ "using PlotlyJS\n", "layout=Layout(\n", " title=\"BoxPlot\",xaxis_title=\"Wage (log)\",yaxis_title=\"Married+Gender\",\n", " xaxis_showgrid=true, yaxis_showgrid=true,\n", " margin_t=20,\n", " legend_y=0.9, legend_x=0.8,\n", " )\n", "\n", "boxbygender=box(y=train.Sex, x=train.LWage, name=\"Gender\",legendgroup=\"gender\",orientation=\"h\",marker_color=\"green\")\n", "boxbymarried=box(y=train.Married, x=train.LWage, name=\"Married\",legendgroup=\"married\",orientation=\"h\",marker_color=\"blue\")\n", "violbygender=violin(y=train.Sex, x=train.LWage, name=\"Gender\",legendgroup=\"gender\",box_visible=true,\n", " orientation=\"h\",marker_color=\"green\")\n", "violbymarried=violin(y=train.Married, x=train.LWage, name=\"Married\",legendgroup=\"married\",box_visible=true,\n", " orientation=\"h\",marker_color=\"blue\")\n", "p1=PlotlyJS.plot([boxbygender,boxbymarried],layout)\n", "#savefig(p1::Union{Plot,PlotlyJS.SyncPlot}, joinpath(homedir(),\"married_plus_gender-loanamount_boxplot.pdf\"))\n", "\n", "layout[\"title\"]=\"ViolinPlot\"\n", "layout[\"yaxis_title\"]=\" \"\n", "p2=PlotlyJS.plot([violbygender,violbymarried],layout)#,style=mystyle)\n", "#savefig(p2::Union{Plot,PlotlyJS.SyncPlot}, joinpath(homedir(),\"married_plus_gender-loanamount_violinplot.pdf\"))\n", "[p1 p2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### And this would be same data shown using bean plots and pirate plots:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(bt1,layout)=cmplot(train; xcol=:Sex, ycol=:LWage, orientation=\"h\",inf=\"none\",ycolorgroups=false,\n", " side=\"both\",colorshift=2,colorrange=4,pointsopacity=1,showpoints=true,showboxplot=false,pointshapes=[\"line-ns\"],\n", " markoutliers=false,pointsmaxdisplayed=200)\n", "(bt2,layout)=cmplot(train; xcol=:Married, ycol=:LWage, orientation=\"h\",inf=\"none\",ycolorgroups=false,\n", " side=\"both\",colorshift=0,colorrange=4,pointsopacity=1,showpoints=true,showboxplot=false,pointshapes=[\"line-ns\"],\n", " markoutliers=false,pointsmaxdisplayed=200)\n", "layout[\"title\"]=\"BeanPlot\"\n", "layout[\"yaxis_title\"]=\"Married + Gender\"\n", "layout[\"xaxis_title\"]=\"Wage (log)\"\n", "layout[\"margin_t\"]=30\n", "layout[\"showlegend\"]=false\n", "\n", "p1=PlotlyJS.plot(union(bt1,bt2),layout)\n", "#savefig(p1::Union{Plot,PlotlyJS.SyncPlot}, joinpath(homedir(),\"married_plus_gender-loanamount_rdiplot.pdf\"))\n", "\n", "#union of two different Xcolumns, joining pirate plot like traces:\n", "(pt1,layout)=cmplot(train; xcol=:Sex, ycol=:LWage, orientation=\"h\",inf=\"hdi\",ycolorgroups=false,\n", " side=\"both\",colorshift=2,colorrange=4,pointsopacity=0.3,showpoints=true,showboxplot=false,pointshapes=[\"circle\"],\n", " markoutliers=false,pointsmaxdisplayed=200)\n", "(pt2,layout)=cmplot(train; xcol=:Married, ycol=:LWage, orientation=\"h\",inf=\"hdi\",ycolorgroups=false,\n", " side=\"both\",colorshift=0,colorrange=4,pointsopacity=0.3,showpoints=true,showboxplot=false,pointshapes=[\"circle\"],\n", " markoutliers=false,pointsmaxdisplayed=200)\n", "layout[\"title\"]=\"PiratePlot\"\n", "layout[\"yaxis_title\"]=\" \"\n", "layout[\"xaxis_title\"]=\"Wage (log)\"\n", "layout[\"margin_t\"]=30\n", "layout[\"legend\"]=attr(orientation=\"h\")\n", "\n", "p2=PlotlyJS.plot(union(pt1,pt2),layout)\n", "#savefig(p2::Union{Plot,PlotlyJS.SyncPlot}, joinpath(homedir(),\"married_plus_gender-loanamount_pirateplot.pdf\"))\n", "[p1 p2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Until we arrive to the cloudy mountain plots:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# union of two separate Xcolumns (Gender + Married)\n", "(traces1,layout)=cmplot(train; xcol=:Sex, ycol=:LWage, xsuperimposed=false,orientation=\"h\",\n", " colorshift=2,colorrange=4,ycolorgroups=false,side=\"pos\",inf=\"hdi\",conf_level=0.95,altsidesflip=false,\n", " pointsoverdens=true,showpoints=true,pointshapes=[\"triangle-down\",\"triangle-up\"],pointsopacity=0.2,\n", " pointsdistance=1,pointsmaxdisplayed=400)\n", "(traces2,layout)=cmplot(train; xcol=:Married, ycol=:LWage, xsuperimposed=false,\n", " orientation=\"h\",colorshift=0,colorrange=4,ycolorgroups=false,side=\"pos\",inf=\"hdi\",conf_level=0.95,\n", " altsidesflip=false,pointsoverdens=true,showpoints=true,pointshapes=[\"triangle-right\",\"triangle-left\"],\n", " pointsopacity=0.3,pointsdistance=1,pointsmaxdisplayed=400,title=\"CloudyMountainPlot\")\n", "layout[\"legend_tracegroupgap\"]=0\n", "layout[\"xaxis_title\"]=\"Wage (log)\"\n", "layout[\"yaxis_title\"]=\"Married and Gender\"\n", "layout[\"margin_l\"]=60\n", "#layout[\"yaxis_range\"]=[-0.1,3.51]\n", "p1=plot(union(traces1,traces2),layout)\n", "#savefig(p1::Union{Plot,PlotlyJS.SyncPlot}, joinpath(homedir(),\"married_plus_gender-loanamount_overimposed_rdiplot.pdf\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### ... which is particularly powerful when overimposed for same X:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Superimposed rdi plots for union of two separate Xcolumns (Gender + Married)\n", "(traces1,layout)=cmplot(train; xcol=:Sex, ycol=:LWage, xlabel=\"M/F\", xsuperimposed=true,orientation=\"h\",\n", " colorshift=2,colorrange=4,ycolorgroups=false,side=\"alt\",inf=\"hdi\",conf_level=0.95,altsidesflip=false,\n", " pointsoverdens=true,showpoints=true,pointshapes=[\"triangle-down\",\"triangle-up\"],pointsdistance=0.6,\n", " pointsmaxdisplayed=400)\n", "(traces2,layout)=cmplot(train; xcol=:Married, ycol=:LWage, xlabel=\"married?\", xsuperimposed=true,\n", " orientation=\"h\",colorshift=0,colorrange=4,ycolorgroups=false,side=\"alt\",inf=\"hdi\",conf_level=0.95,\n", " altsidesflip=false,pointsoverdens=true,showpoints=true,pointshapes=[\"triangle-right\",\"triangle-left\"]\n", " ,pointsdistance=0.6,pointsmaxdisplayed=400)\n", "layout[\"legend_tracegroupgap\"]=0\n", "layout[\"yaxis_title\"]=\"Married and Gender\"\n", "layout[\"xaxis_title\"]=\"Wage (log)\"\n", "layout[\"title\"]=\"Married + Gender ~ Wage\"\n", "layout[\"margin_l\"]=60\n", "#layout[\"yaxis_range\"]=[-0.51,1.51]\n", "p1=plot(union(traces1,traces2),layout)\n", "#savefig(p1::Union{Plot,PlotlyJS.SyncPlot}, joinpath(homedir(),\"married_plus_gender-loanamount_overimposed_rdiplot.pdf\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Now some illustrative usage examples\n", "### First: two Ycol side by side\n", "#### (notice the raw point \"clouds\" on opposite side of the kernel density \"mountains\", for better clarity)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#IRIS dataset, plotting two Ycol side by side\n", "traces,layout=cmplot(iris,xcol=:Species,ycol=[:SepalLength,:PetalLength],\n", " colorrange=3,pointshapes=[\"star-triangle-up\",\"star-diamond\",\"star-square\"])\n", "p1=plot(traces,layout)\n", "#savefig(p1::Union{Plot,PlotlyJS.SyncPlot}, joinpath(homedir(),\"species-sepallength_petallength_rdiplot.pdf\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### It works well also with three Ycol:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#IRIS dataset, plotting three Ycol side by side\n", "traces,layout=cmplot(iris,xcol=:Species,ycol=[:SepalLength,:PetalLength,:SepalWidth],\n", " pointshapes=[\"star-triangle-up\",\"star-diamond\",\"star-square\"])\n", "p1=plot(traces,layout)\n", "\n", "#savefig(p1::Union{Plot,PlotlyJS.SyncPlot}, joinpath(homedir(),\"species-sepallength_petallength_sepalwidth_rdiplot.pdf\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Excursus: not only we can show different Xcol together, but we can intersect them, grouping the data by two or more X\n", "### For example, combining two X a different picture is revealed:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#intersection of two different Xcolumns:\n", "traces,layout=cmplot(train; xcol=[:Married,:Sex], ycol=:LWage,ycolorgroups=false,\n", " side=\"pos\",pointshapes=[\"star-diamond\"],pointsmaxdisplayed=500)\n", "layout[\"legend_tracegroupgap\"]=0\n", "layout[\"title\"]=\"Married & Gender ~ Wage\"\n", "layout[\"yaxis_title\"]=\"Married & Gender\"\n", "layout[\"xaxis_title\"]=\"Wage (log)\"\n", "p1=plot(traces,layout)\n", "#savefig(p1::Union{Plot,PlotlyJS.SyncPlot}, joinpath(homedir(),\"married_gender-loanamount_rdiplot.pdf\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Or with three:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#intersection of three different Xcolumns:\n", "traces,layout=cmplot(train; xcol=[:Sex,:Married,:SMSA], ycol=:LWage, ycolorgroups=false, side=\"both\",\n", " pointsmaxdisplayed=300)\n", "layout[\"legend_tracegroupgap\"]=0\n", "layout[\"margin_l\"]=180\n", "layout[\"margin_r\"]=0\n", "layout[\"title\"]=\"Gender & Married & LivesInCity ~ Wage\"\n", "layout[\"yaxis_title\"]=\"Gender & Married & LivesInCity\"\n", "layout[\"xaxis_title\"]=\"Wage (log)\"\n", "p1=plot(traces,layout)\n", "#savefig(p1::Union{Plot,PlotlyJS.SyncPlot}, joinpath(homedir(),\"married_gender_education-applicantincome_rdiplot.pdf\"))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Super imposition of distributions: one of the best features of cloudy mountain plots" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hideCode": false, "hidePrompt": false }, "outputs": [], "source": [ "#Superimposed plots for a single Xcolumn\n", "(traces1,layout)=cmplot(train; xcol=:South, ycol=:LWage, xsuperimposed=true, pointsoverdens=true,\n", " ycolorgroups=false,altsidesflip=false, colorshift=2, colorrange=4, pointshapes=[\"star\",\"pentagon\"],\n", " pointsmaxdisplayed=400)\n", "layout[\"legend_tracegroupgap\"]=0\n", "layout[\"margin_b\"]=50\n", "layout[\"title\"]=\"South ~ Wage\"\n", "layout[\"xaxis_title\"]=\"Wage (log)\"\n", "layout[\"yaxis_title\"]=\"Resides in the south?\"\n", "layout[\"yaxis_range\"]=[-0.51,0.51]\n", "p1=plot(traces1,layout)\n", "#savefig(p1::Union{Plot,PlotlyJS.SyncPlot}, joinpath(homedir(),\"gender-loanamount_rdiplot.pdf\"))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#superimposed plot for combination of two X variables\n", "traces,layout=cmplot(train,xcol=[:Sex,:BlueCol],xsuperimposed=true,ycol=:LWage,\n", " ycolorgroups=false,pointsoverdens=true,markoutliers=false,pointshapes=[\"hexagon\"],pointsmaxdisplayed=500)\n", "layout[\"legend_tracegroupgap\"]=0\n", "layout[\"title\"]=\"Gender & BlueCollar ~ Wage\"\n", "layout[\"xaxis_title\"]=\"Wage (log)\"\n", "layout[\"yaxis_title\"]=\"Gender & BlueCollar\"\n", "p1=plot(traces,layout)\n", "#savefig(p1::Union{Plot,PlotlyJS.SyncPlot}, joinpath(homedir(),\"propertyarea_gender-loanamount_rdiplot.pdf\"))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidePrompt": true }, "outputs": [], "source": [ "#superimposed plot for combination of two X variables, one of which with 3 bins\n", "traces,layout=cmplot(train,xcol=[:EduYears,:Union],xsuperimposed=true,ycol=:LWage,\n", " ycolorgroups=false, altsidesflip=true, pointsoverdens=true,markoutliers=false,\n", " pointshapes=[\"hexagon\"],pointsmaxdisplayed=100)\n", "layout[\"legend_tracegroupgap\"]=0\n", "layout[\"title\"]=\"EducationYears & UnionContract ~ Wage\"\n", "layout[\"xaxis_title\"]=\"Wage (log)\"\n", "layout[\"yaxis_title\"]=\"EducationYears & UnionContract\"\n", "p1=plot(traces,layout)\n", "#savefig(p1::Union{Plot,PlotlyJS.SyncPlot}, joinpath(homedir(),\"propertyarea_gender-loanamount_rdiplot.pdf\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## To save publication ready vector graphic files (e.g. svg or pdf):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hideCode": false, "hidePrompt": true }, "outputs": [], "source": [ "#To save the plot as high quality file:\n", "\n", "#=\n", "using ORCA\n", "savefig(p1::Union{Plot,PlotlyJS.SyncPlot}, joinpath(homedir(),\"output_filename.svg\"))\n", "savefig(p1::Union{Plot,PlotlyJS.SyncPlot}, joinpath(homedir(),\"output_filename.pdf\"))\n", "=#\n", "\n", "#NOTE: if it does not work, try from a terminal to manually start the conda server, \n", "# e.g. \"conda run orca serve -p 7982\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## To save a plot as standalone html:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "jsonplot1 = json(p1)\n", "template = \"\n", "\n", " \n", "\n", "\n", "
\n", " \n", "\n", "\n", "\"\n", "outputfilename=\"plot_name.html\"\n", "#=\n", "open(joinpath(homedir(),outputfilename), \"w\") do f\n", " write(f, template)\n", "end\n", "=#" ] } ], "metadata": { "@webio": { "lastCommId": "cf990d2cf90f4926a43b3219fa2fab34", "lastKernelId": "8cb8294e-c739-4d44-90fa-12df7428cf55" }, "hide_code_all_hidden": false, "kernelspec": { "display_name": "Julia 1.5.1", "language": "julia", "name": "julia-1.5" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.5.1" } }, "nbformat": 4, "nbformat_minor": 2 }