{ "cells": [ { "cell_type": "markdown", "id": "d233529d-085b-4643-b754-10d7857d2307", "metadata": {}, "source": [ "# Sina Plot\n", "\n", "A sina plot visualizes a single variable across classes, with jitter width reflecting the data's density in each class." ] }, { "cell_type": "code", "execution_count": 1, "id": "99b35bfe-7e72-43df-ab16-b3774fd9f223", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "from lets_plot import *" ] }, { "cell_type": "code", "execution_count": 2, "id": "149a5fba-d25e-4ab1-850b-bc102cace2b2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", " \n", " " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "LetsPlot.setup_html()" ] }, { "cell_type": "code", "execution_count": 3, "id": "72502639-c669-4531-969e-dfafcae36dc2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(234, 12)\n" ] }, { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n",
 " <thead>\n",
 " <tr><th></th><th>Unnamed: 0</th><th>manufacturer</th><th>model</th><th>displ</th><th>year</th><th>cyl</th><th>trans</th><th>drv</th><th>cty</th><th>hwy</th><th>fl</th><th>class</th></tr>\n",
 " </thead>\n",
 " <tbody>\n",
 " <tr><th>0</th><td>1</td><td>audi</td><td>a4</td><td>1.8</td><td>1999</td><td>4</td><td>auto(l5)</td><td>f</td><td>18</td><td>29</td><td>p</td><td>compact</td></tr>\n",
 " <tr><th>1</th><td>2</td><td>audi</td><td>a4</td><td>1.8</td><td>1999</td><td>4</td><td>manual(m5)</td><td>f</td><td>21</td><td>29</td><td>p</td><td>compact</td></tr>\n",
 " <tr><th>2</th><td>3</td><td>audi</td><td>a4</td><td>2.0</td><td>2008</td><td>4</td><td>manual(m6)</td><td>f</td><td>20</td><td>31</td><td>p</td><td>compact</td></tr>\n",
 " <tr><th>3</th><td>4</td><td>audi</td><td>a4</td><td>2.0</td><td>2008</td><td>4</td><td>auto(av)</td><td>f</td><td>21</td><td>30</td><td>p</td><td>compact</td></tr>\n",
 " <tr><th>4</th><td>5</td><td>audi</td><td>a4</td><td>2.8</td><td>1999</td><td>6</td><td>auto(l5)</td><td>f</td><td>16</td><td>26</td><td>p</td><td>compact</td></tr>\n",
 " </tbody>\n",
 "</table>
" ], "text/plain": [ " Unnamed: 0 manufacturer model displ year cyl trans drv cty hwy \\\n", "0 1 audi a4 1.8 1999 4 auto(l5) f 18 29 \n", "1 2 audi a4 1.8 1999 4 manual(m5) f 21 29 \n", "2 3 audi a4 2.0 2008 4 manual(m6) f 20 31 \n", "3 4 audi a4 2.0 2008 4 auto(av) f 21 30 \n", "4 5 audi a4 2.8 1999 6 auto(l5) f 16 26 \n", "\n", " fl class \n", "0 p compact \n", "1 p compact \n", "2 p compact \n", "3 p compact \n", "4 p compact " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv(\"https://raw.githubusercontent.com/JetBrains/lets-plot-docs/refs/heads/master/data/mpg.csv\")\n", "print(df.shape)\n", "df.head()" ] }, { "cell_type": "markdown", "id": "3cc8f35b-77c0-47d3-9ec6-dce32cc96334", "metadata": {}, "source": [ "## Default View" ] }, { "cell_type": "code", "execution_count": 4, "id": "00a83c93-412d-4ea1-b2db-02093cb677f2", "metadata": {}, "outputs": [], "source": [ "g = ggplot(df, aes(\"drv\", \"hwy\"))" ] }, { "cell_type": "code", "execution_count": 5, "id": "3a69838c-e1ec-4a17-bd5b-90937b42829c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "g + geom_sina(seed=42)" ] }, { "cell_type": "markdown", "id": "be4e15e1-be4d-4c37-8714-7d79b418ffb7", "metadata": {}, "source": [ "## When to Use" ] }, { "cell_type": "code", "execution_count": 6, "id": "ee002208-b54b-4019-ad03-a1b19a1d9e33", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gggrid([\n", " g + geom_boxplot() + ggtitle(\"geom_boxplot()\", \"Shows distribution but not sample size\"),\n", " g + geom_violin() + ggtitle(\"geom_violin()\", \"Shows distribution but not sample size\"),\n", " g + geom_jitter(seed=42) + ggtitle(\"geom_jitter()\", \"Shows sample size but not distribution\"),\n", " g + geom_sina(seed=42) + ggtitle(\"geom_sina()\", \"Shows both distribution and sample size\"),\n", "], ncol=2)" ] }, { "cell_type": "markdown", "id": "ca9b7eba-99c4-4908-871c-345ba4afce51", "metadata": {}, "source": [ "## Applying Jitter Position\n", "\n", "Adjusting points vertically can be desirable in two cases:\n", "\n", "- **overlapping values**, where multiple observations share the exact same y-value;\n", "\n", "- **integerish banding**, where values are close to integers and appear artificially grouped into horizontal bands.\n", "\n", "In these cases, consider applying a position adjustment such as `position_jitter()`." ] }, { "cell_type": "code", "execution_count": 7, "id": "9ef60dca-faf1-4f0f-860a-e8463b07eb3c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gggrid([\n", " g + geom_sina(seed=42) + ggtitle(\"Default position\"),\n", " g + geom_sina(seed=42, position=position_jitter(width=0, seed=42)) + ggtitle(\"'jitter' position\"),\n", "])" ] }, { "cell_type": "markdown", "id": "fda60030-bd33-485c-8b5f-c84e0fa58819", "metadata": {}, "source": [ "Use the `'jitterdodge'` position adjustment if additional grouping is required:" ] }, { "cell_type": "code", "execution_count": 8, "id": "ddbcb8e8-8eca-4fb8-a6bc-00523e10d435", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gggrid([\n", " g + geom_sina(aes(color=as_discrete(\"year\")), seed=42) + \\\n", " scale_color_discrete(format=\"d\") + \\\n", " ggtitle(\"Default position\"),\n", " g + geom_sina(aes(color=as_discrete(\"year\")), seed=42,\n", " position=position_jitterdodge(jitter_width=0, seed=42)) + \\\n", " scale_color_discrete(format=\"d\") + \\\n", " ggtitle(\"'jitterdodge' position\"),\n", "])" ] }, { "cell_type": "markdown", "id": "ea63adc3-6524-45c4-8dcc-405778d70ebd", "metadata": {}, "source": [ "## Connection with Violin Plots\n", "\n", "In a sina plot, points are randomly positioned within the boundaries of the corresponding violin, provided both layers use the same parameters." ] }, { "cell_type": "markdown", "id": "c70149ec-784e-4858-8d62-3b3a9b306715", "metadata": {}, "source": [ "### Same Shape" ] }, { "cell_type": "code", "execution_count": 9, "id": "626bc4d2-1955-4ba5-8732-7c45d597577a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "g + \\\n", " geom_violin(bw=1.5) + \\\n", " geom_sina(bw=1.5, seed=42)" ] }, { "cell_type": "markdown", "id": "de77fa60-2601-4f66-a942-2212c9b6dc32", "metadata": {}, "source": [ "### Same Quantiles" ] }, { "cell_type": "code", "execution_count": 10, "id": "52e7eac5-5d8b-4b9f-add9-0bfbd76e1636", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "g + \\\n", " geom_violin(aes(color='..quantile..', fill='..quantile..'), alpha=.5) + \\\n", " geom_sina(aes(color='..quantile..'), size=2, seed=42) + \\\n", " scale_continuous(['color', 'fill'], low=\"#1a9641\", high=\"#d7191c\")" ] }, { "cell_type": "markdown", "id": "c5e937df-9ade-4265-91dc-67469a348621", "metadata": {}, "source": [ "### Same `scale` Values" ] }, { "cell_type": "code", "execution_count": 11, "id": "c1d62bfc-790d-4724-b9eb-fab0042849d3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gggrid([\n", " g + \\\n", " geom_violin(scale='width') + \\\n", " geom_sina(scale='width', size=1.5, seed=42) + \\\n", " ggtitle(\"scale='width'\"),\n", " g + \\\n", " geom_violin(scale='area') + \\\n", " geom_sina(scale='area', size=1.5, seed=42) + \\\n", " ggtitle(\"scale='area'\"),\n", " g + \\\n", " geom_violin(scale='count') + \\\n", " geom_sina(scale='count', size=1.5, seed=42) + \\\n", " ggtitle(\"scale='count'\"),\n", "])" ] }, { "cell_type": "markdown", "id": "a0fc0536-aca6-44db-9301-74c1621281d5", "metadata": {}, "source": [ "### Compatible Stats" ] }, { "cell_type": "code", "execution_count": 12, "id": "56c2764c-96c6-4463-a4a4-141f3a10b7d5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gggrid([\n", " g + geom_violin() + ggtitle(\"Violin\\nstat='ydensity' (default)\"),\n", " g + geom_violin(stat='sina') + ggtitle(\"Violin\\nstat='sina'\"),\n", " g + geom_sina(size=1.5, seed=42, stat='ydensity') + ggtitle(\"Sina\\nstat='ydensity'\"),\n", " g + geom_sina(size=1.5, seed=42) + ggtitle(\"Sina\\nstat='sina' (default)\"),\n", "], ncol=2)" ] }, { "cell_type": "markdown", "id": "ec77545a-2f86-4b20-903a-70f1ff59e7ec", "metadata": {}, "source": [ "### `show_half` Parameter" ] }, { "cell_type": "code", "execution_count": 13, "id": "1bd61d7b-38bf-42ce-9c4d-b539861ed3ae", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "g + \\\n", " geom_violin(show_half=-1, size=0, fill=\"gray85\") + \\\n", " geom_sina(show_half=1, seed=42)" ] }, { "cell_type": "markdown", "id": "b55ac8da-1f6d-45b3-8308-5e65b88273c4", "metadata": {}, "source": [ "### Raincloud Plot" ] }, { "cell_type": "code", "execution_count": 14, "id": "85c2feaf-ab2c-4db0-9700-d48a9a03b867", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "g + \\\n", " geom_violin(aes(fill=\"drv\"), show_half=1, size=0, position=position_nudge(x=.07)) + \\\n", " geom_boxplot(aes(fill=\"drv\"), color=\"white\", width=.1, outlier_alpha=0) + \\\n", " geom_sina(aes(color=\"drv\"), show_half=-1, seed=42,\n", " position=position_nudge(x=-.07)) + \\\n", " scale_color_brewer(palette=\"Set2\") + \\\n", " scale_fill_brewer(palette=\"Pastel2\") + \\\n", " facet_grid(x=\"year\") + \\\n", " coord_flip() + \\\n", " theme_light() + flavor_solarized_dark()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.23" } }, "nbformat": 4, "nbformat_minor": 5 }