{ "cells": [ { "cell_type": "markdown", "id": "95951f47-5a3a-4dc8-83ed-a5e868e310e8", "metadata": {}, "source": [ "# Preserving Data–Statistic Bijection in Lets-Plot\n", "\n", "Some statistical geometries in Lets-Plot (such as `geom_sina()`) compute their own statistical data while keeping a one-to-one correspondence with the original input data points.\n", "Previously, the aesthetic mapping did not preserve this correspondence: if you mapped an aesthetic (e.g., `color`) to a column from the original dataset, all points could end up with an aggregated value instead of their own values.\n", "\n", "Now, Lets-Plot preserves the **bijection between data and statistics** for such geometries. This means you can safely map aesthetics to variables from the original dataset, and they will be correctly aligned with the statistical output." ] }, { "cell_type": "code", "execution_count": 1, "id": "5d9b4c3f-05b0-49de-92db-bcef2275dbef", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "from lets_plot import *" ] }, { "cell_type": "code", "execution_count": 2, "id": "17d0c10a-a352-4e43-a2d2-7c2dc16bdf87", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", " \n", " " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "LetsPlot.setup_html()" ] }, { "cell_type": "code", "execution_count": 3, "id": "a7ed9215-c4dd-4bd1-a2cf-bdba66ae2291", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(234, 12)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
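<table><thead><tr><th></th><th>Unnamed: 0</th><th>manufacturer</th><th>model</th><th>displ</th><th>year</th><th>cyl</th><th>trans</th><th>drv</th><th>cty</th><th>hwy</th><th>fl</th><th>class</th></tr></thead><tbody>
<tr><th>0</th><td>1</td><td>audi</td><td>a4</td><td>1.8</td><td>1999</td><td>4</td><td>auto(l5)</td><td>f</td><td>18</td><td>29</td><td>p</td><td>compact</td></tr>
<tr><th>1</th><td>2</td><td>audi</td><td>a4</td><td>1.8</td><td>1999</td><td>4</td><td>manual(m5)</td><td>f</td><td>21</td><td>29</td><td>p</td><td>compact</td></tr>
<tr><th>2</th><td>3</td><td>audi</td><td>a4</td><td>2.0</td><td>2008</td><td>4</td><td>manual(m6)</td><td>f</td><td>20</td><td>31</td><td>p</td><td>compact</td></tr>
<tr><th>3</th><td>4</td><td>audi</td><td>a4</td><td>2.0</td><td>2008</td><td>4</td><td>auto(av)</td><td>f</td><td>21</td><td>30</td><td>p</td><td>compact</td></tr>
<tr><th>4</th><td>5</td><td>audi</td><td>a4</td><td>2.8</td><td>1999</td><td>6</td><td>auto(l5)</td><td>f</td><td>16</td><td>26</td><td>p</td><td>compact</td></tr></tbody></table>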
\n", "
" ], "text/plain": [ " Unnamed: 0 manufacturer model displ year cyl trans drv cty hwy \\\n", "0 1 audi a4 1.8 1999 4 auto(l5) f 18 29 \n", "1 2 audi a4 1.8 1999 4 manual(m5) f 21 29 \n", "2 3 audi a4 2.0 2008 4 manual(m6) f 20 31 \n", "3 4 audi a4 2.0 2008 4 auto(av) f 21 30 \n", "4 5 audi a4 2.8 1999 6 auto(l5) f 16 26 \n", "\n", " fl class \n", "0 p compact \n", "1 p compact \n", "2 p compact \n", "3 p compact \n", "4 p compact " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv(\"https://raw.githubusercontent.com/JetBrains/lets-plot-docs/refs/heads/master/data/mpg.csv\")\n", "print(df.shape)\n", "df.head()" ] }, { "cell_type": "markdown", "id": "09ef0544-ff77-4e69-97cf-a9da33213696", "metadata": {}, "source": [ "## Map Columns to the Aesthetics" ] }, { "cell_type": "markdown", "id": "a2cf70f7-78f6-42e2-86c2-4429da556ca6", "metadata": {}, "source": [ "### Sina Stat" ] }, { "cell_type": "code", "execution_count": 4, "id": "71d8709f-fd3e-48ed-9081-557bb4daf91d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ggplot(df, aes(\"drv\", \"hwy\")) + \\\n", " geom_violin() + \\\n", " geom_sina(aes(color=\"displ\", size=\"cyl\"), seed=42) + \\\n", " scale_size(range=[2, 4])" ] }, { "cell_type": "markdown", "id": "c7474925-5f27-401d-a49a-7ad7b1ec80e9", "metadata": {}, "source": [ "### Q-Q Stat" ] }, { "cell_type": "code", "execution_count": 5, "id": "963295bd-53d2-4a9f-80bc-0ac4a717d492", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ggplot(df) + \\\n", " geom_qq(aes(sample=\"hwy\", color=\"displ\", size=\"cyl\")) + \\\n", " scale_size(range=[3, 6])" ] }, { "cell_type": "markdown", "id": "685df0b9-a1d5-48e3-b807-d77f826f4173", "metadata": {}, "source": [ "## Show Column Values in Tooltips\n", "\n", "For the above-mentioned statistics, the tooltips can display not only the mapped values, but also any columns from the original dataframe." ] }, { "cell_type": "code", "execution_count": 7, "id": "69def5fc-27b9-4e88-8d61-49f6b554f003", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ggplot(df, aes(sample=\"hwy\")) + \\\n", " geom_qq_line(color='teal') + \\\n", " geom_qq(size=3, shape=21, color=\"black\", fill=\"gold\", alpha=.5,\n", " tooltips=layer_tooltips().title(\"@manufacturer @model\")\n", " .line(\"theoretical|@..theoretical..\")\n", " .line(\"highway mileage (sample)|@..sample..\")\n", " .line(\"city mileage|@cty\")\n", " .line(\"engine displacement in liters|@displ\")\n", " .line(\"year of manufacturing|@year\")\n", " .line(\"number of cylinders|@cyl\")\n", " .line(\"type of transmission|@trans\")\n", " .line(\"drive type|@drv\")\n", " .line(\"fuel type|@fl\")\n", " .line(\"vehicle class|@class\")\n", " .format(\"year\", \"d\")\n", " .min_width(300)\n", " .anchor(\"bottom_right\")) + \\\n", " ggsize(1000, 600)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.23" } }, "nbformat": 4, "nbformat_minor": 5 }