{ "cells": [ { "cell_type": "markdown", "id": "f40905f8-002a-4e51-97b2-a527a2a5af05", "metadata": {}, "source": [ "# Overriding Default Grouping with the `group` Aesthetic\n", "\n", "#### How Grouping Works in Lets-Plot\n", "\n", "Default Grouping Behavior:\n", "- Lets-Plot automatically groups data by the discrete variables mapped to aesthetics such as `color`, `shape`, and `linetype`.\n", "- Each unique combination of these variables forms a separate group, which is drawn as its own visual element (line, path, polygon, etc.).\n", "\n", "Explicit Group Control:\n", "- Use `group='var'` to group by that variable only, overriding the default grouping.\n", "- Use `group=['var1', 'var2', ...]` to group by the interaction of several variables.\n", "- Use `group=[]` to disable grouping entirely.\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "5949c620-ad0a-40c6-94dd-fe607f460c2e", "metadata": {}, "outputs": [], "source": [ "from lets_plot import *\n", "import polars as pl" ] }, { "cell_type": "code", "execution_count": 2, "id": "fabe8230-fc90-427c-afd9-6dd9f22c7a96", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", " \n", " " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "LetsPlot.setup_html()" ] }, { "cell_type": "code", "execution_count": 3, "id": "436e50eb-0f1a-46d1-90b5-804d1793cb04", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
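 \n", "shape: (5, 12)\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>manufacturer</th><th>model</th><th>displ</th><th>year</th><th>cyl</th><th>trans</th><th>drv</th><th>cty</th><th>hwy</th><th>fl</th><th>class</th></tr>\n",
"<tr><td>i64</td><td>str</td><td>str</td><td>f64</td><td>i64</td><td>i64</td><td>str</td><td>str</td><td>i64</td><td>i64</td><td>str</td><td>str</td></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><td>1</td><td>&quot;audi&quot;</td><td>&quot;a4&quot;</td><td>1.8</td><td>1999</td><td>4</td><td>&quot;auto(l5)&quot;</td><td>&quot;f&quot;</td><td>18</td><td>29</td><td>&quot;p&quot;</td><td>&quot;compact&quot;</td></tr>\n",
"<tr><td>2</td><td>&quot;audi&quot;</td><td>&quot;a4&quot;</td><td>1.8</td><td>1999</td><td>4</td><td>&quot;manual(m5)&quot;</td><td>&quot;f&quot;</td><td>21</td><td>29</td><td>&quot;p&quot;</td><td>&quot;compact&quot;</td></tr>\n",
"<tr><td>3</td><td>&quot;audi&quot;</td><td>&quot;a4&quot;</td><td>2.0</td><td>2008</td><td>4</td><td>&quot;manual(m6)&quot;</td><td>&quot;f&quot;</td><td>20</td><td>31</td><td>&quot;p&quot;</td><td>&quot;compact&quot;</td></tr>\n",
"<tr><td>4</td><td>&quot;audi&quot;</td><td>&quot;a4&quot;</td><td>2.0</td><td>2008</td><td>4</td><td>&quot;auto(av)&quot;</td><td>&quot;f&quot;</td><td>21</td><td>30</td><td>&quot;p&quot;</td><td>&quot;compact&quot;</td></tr>\n",
"<tr><td>5</td><td>&quot;audi&quot;</td><td>&quot;a4&quot;</td><td>2.8</td><td>1999</td><td>6</td><td>&quot;auto(l5)&quot;</td><td>&quot;f&quot;</td><td>16</td><td>26</td><td>&quot;p&quot;</td><td>&quot;compact&quot;</td></tr>\n",
"</tbody>\n",
"</table>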
" ], "text/plain": [ "shape: (5, 12)\n", "┌─────┬──────────────┬───────┬───────┬───┬─────┬─────┬─────┬─────────┐\n", "│ ┆ manufacturer ┆ model ┆ displ ┆ … ┆ cty ┆ hwy ┆ fl ┆ class │\n", "│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n", "│ i64 ┆ str ┆ str ┆ f64 ┆ ┆ i64 ┆ i64 ┆ str ┆ str │\n", "╞═════╪══════════════╪═══════╪═══════╪═══╪═════╪═════╪═════╪═════════╡\n", "│ 1 ┆ audi ┆ a4 ┆ 1.8 ┆ … ┆ 18 ┆ 29 ┆ p ┆ compact │\n", "│ 2 ┆ audi ┆ a4 ┆ 1.8 ┆ … ┆ 21 ┆ 29 ┆ p ┆ compact │\n", "│ 3 ┆ audi ┆ a4 ┆ 2.0 ┆ … ┆ 20 ┆ 31 ┆ p ┆ compact │\n", "│ 4 ┆ audi ┆ a4 ┆ 2.0 ┆ … ┆ 21 ┆ 30 ┆ p ┆ compact │\n", "│ 5 ┆ audi ┆ a4 ┆ 2.8 ┆ … ┆ 16 ┆ 26 ┆ p ┆ compact │\n", "└─────┴──────────────┴───────┴───────┴───┴─────┴─────┴─────┴─────────┘" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Note: despite the variable name `mtcars`, this is the `mpg` fuel-economy dataset\n", "mtcars = pl.read_csv(\"https://raw.githubusercontent.com/JetBrains/lets-plot-docs/refs/heads/master/data/mpg.csv\")\n", "mtcars.head()" ] }, { "cell_type": "markdown", "id": "04b31eae-df20-407c-a869-84af10998edb", "metadata": {}, "source": [ "#### 1. Highway MPG by Drive Type\n" ] }, { "cell_type": "code", "execution_count": 4, "id": "4ed67db6-5cc7-4a7a-a872-3729919ed82c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
 \n", " " ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "seed = 21  # fixed seed makes the random sina jitter reproducible\n", "( ggplot(mtcars, aes(x='drv', y='hwy'))\n", " + geom_violin(tooltips='none')\n", " + geom_sina(seed=seed)\n", ")" ] }, { "cell_type": "markdown", "id": "470f9100-a241-4657-8c03-3edfa04d6f6f", "metadata": {}, "source": [ "#### 2. Add More Information: `color`\n", "\n", "The `cyl` column is an integer (`i64`), so Lets-Plot maps it to a continuous color gradient. Continuous variables do not create groups, so each drive type still gets a single sina plot." ] }, { "cell_type": "code", "execution_count": 5, "id": "955d6793-c886-4dec-b54b-3abb3f495a06", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
 \n", " " ], "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "( ggplot(mtcars, aes(x='drv', y='hwy'))\n", " + geom_violin(tooltips='none')\n", " + geom_sina(aes(color='cyl'), seed=seed)\n", ")" ] }, { "cell_type": "markdown", "id": "5192b398-40f9-41f6-957d-bdd40a321a9f", "metadata": {}, "source": [ "#### 3. Discrete `color`: Default Grouping Creates Unwanted Separation\n", "\n", "Next, make the colors discrete by wrapping `cyl` in `as_discrete()`.\n", "\n", "When we map `color=as_discrete('cyl')`, Lets-Plot automatically groups the data by the discrete color variable. \\\n", "This means:\n", "\n", "* Automatic grouping: each combination of `drv` (x-axis) and `cyl` (color) becomes a separate group.\n", "* Position adjustment: `geom_sina()` uses `'dodge'` positioning by default, which places groups that share an x position side by side.\n", "* Result: instead of one sina plot per drive type, we get a separate sina plot for each cylinder count present in that drive type (up to four: 4, 5, 6, or 8 cylinders).\n" ] }, { "cell_type": "code", "execution_count": 6, "id": "698448ec-b2cf-49dd-8385-8a204be04d22", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
 \n", " " ], "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "( ggplot(mtcars, aes(x='drv', y='hwy'))\n", " + geom_violin(tooltips='none')\n", " + geom_sina(aes(color=as_discrete('cyl')), seed=seed)\n", ")" ] }, { "cell_type": "markdown", "id": "6ec21599-4f7b-4032-920e-5aaa71ae33c4", "metadata": {}, "source": [ "#### 4. Fix with Explicit Grouping by Drive Type\n", "\n", "Setting `group='drv'` overrides the default grouping: only `drv` defines the groups, so each drive type gets a single sina plot again, while `cyl` still sets the point colors." ] }, { "cell_type": "code", "execution_count": 7, "id": "d7ddc01a-1a4e-4e8a-abd7-db2ea36449fc", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
 \n", " " ], "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "( ggplot(mtcars, aes(x='drv', y='hwy'))\n", " + geom_violin(tooltips='none')\n", " + geom_sina(aes(color=as_discrete('cyl'),\n", " group='drv' # <-- group only by drive type (ignoring the color variable for grouping)\n", " ), seed=seed)\n", ")" ] }, { "cell_type": "markdown", "id": "ae362a61-097b-46f8-b407-b32709c5765d", "metadata": {}, "source": [ "#### 5. Cleaner Fix: Disable All Grouping\n", "\n", "`group=[]` tells Lets-Plot not to group by any variable, so the discrete `color` mapping no longer splits the points, and no grouping variable needs to be named." ] }, { "cell_type": "code", "execution_count": 8, "id": "e3d71379-1732-489d-baf7-08c87f4c6d01", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "( ggplot(mtcars, aes(x='drv', y='hwy')) \n", " + geom_violin(tooltips='none') \n", " + geom_sina(aes(color=as_discrete('cyl'),\n", " group=[] # <-- disable all grouping entirely\n", " ), seed=seed)\n", ")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.23" } }, "nbformat": 4, "nbformat_minor": 5 }