{ "cells": [ { "cell_type": "markdown", "id": "46012313-2725-4f37-9f2b-e0242fa0a5b5", "metadata": {}, "source": [ "# `geom_pointdensity()`\n", "\n", "`geom_pointdensity()` is like `geom_point()`, but smarter in crowded spots. It plots each data point, and also colors that point based on how many other points are packed around it. Dense clusters get one color, sparse areas get another. So instead of just a scatterplot, you get a built-in heatmap of local point density, without needing a separate 2D density layer." ] }, { "cell_type": "code", "execution_count": 1, "id": "580ed172-f0d9-4ac4-929c-546ef97d968c", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "from lets_plot import *" ] }, { "cell_type": "code", "execution_count": 2, "id": "c4e2b532-8f8a-45cd-ad8d-cd2f4ed65c13", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", " \n", " " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "LetsPlot.setup_html()" ] }, { "cell_type": "markdown", "id": "9d2be025-73a6-41cf-afc6-0e261193e888", "metadata": {}, "source": [ "## Prepare Data" ] }, { "cell_type": "code", "execution_count": 3, "id": "0cb92dcf-ef71-4770-8321-4529d9f7f233", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(53940, 11)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
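<table border=\"1\" class=\"dataframe\">
<thead><tr style=\"text-align: right;\"><th></th><th>carat</th><th>cut</th><th>color</th><th>clarity</th><th>depth</th><th>table</th><th>price</th><th>x</th><th>y</th><th>z</th><th>is_ideal</th></tr></thead>
<tbody>
<tr><th>0</th><td>0.23</td><td>Ideal</td><td>6</td><td>SI2</td><td>61.5</td><td>55.0</td><td>326</td><td>3.95</td><td>3.98</td><td>2.43</td><td>Quality: ideal</td></tr>
<tr><th>1</th><td>0.21</td><td>Premium</td><td>6</td><td>SI1</td><td>59.8</td><td>61.0</td><td>326</td><td>3.89</td><td>3.84</td><td>2.31</td><td>Quality: not ideal</td></tr>
<tr><th>2</th><td>0.23</td><td>Good</td><td>6</td><td>VS1</td><td>56.9</td><td>65.0</td><td>327</td><td>4.05</td><td>4.07</td><td>2.31</td><td>Quality: not ideal</td></tr>
<tr><th>3</th><td>0.29</td><td>Premium</td><td>2</td><td>VS2</td><td>62.4</td><td>58.0</td><td>334</td><td>4.20</td><td>4.23</td><td>2.63</td><td>Quality: not ideal</td></tr>
<tr><th>4</th><td>0.31</td><td>Good</td><td>1</td><td>SI2</td><td>63.3</td><td>58.0</td><td>335</td><td>4.34</td><td>4.35</td><td>2.75</td><td>Quality: not ideal</td></tr>
</tbody>
</table>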
\n", "
" ], "text/plain": [ " carat cut color clarity depth table price x y z \\\n", "0 0.23 Ideal 6 SI2 61.5 55.0 326 3.95 3.98 2.43 \n", "1 0.21 Premium 6 SI1 59.8 61.0 326 3.89 3.84 2.31 \n", "2 0.23 Good 6 VS1 56.9 65.0 327 4.05 4.07 2.31 \n", "3 0.29 Premium 2 VS2 62.4 58.0 334 4.20 4.23 2.63 \n", "4 0.31 Good 1 SI2 63.3 58.0 335 4.34 4.35 2.75 \n", "\n", " is_ideal \n", "0 Quality: ideal \n", "1 Quality: not ideal \n", "2 Quality: not ideal \n", "3 Quality: not ideal \n", "4 Quality: not ideal " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv(\"https://raw.githubusercontent.com/JetBrains/lets-plot-docs/refs/heads/master/data/diamonds.csv\")\n", "df[\"color\"] = df[\"color\"].map({\"D\": 7, \"E\": 6, \"F\": 5, \"G\": 4, \"H\": 3, \"I\": 2, \"J\": 1})\n", "df = df.assign(is_ideal=(df[\"cut\"] == \"Ideal\").map({True: \"Quality: ideal\", False: \"Quality: not ideal\"}))\n", "print(df.shape)\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 4, "id": "b1ef9592-b379-4fe0-9717-71032bf64468", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(1610, 9)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
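<table border=\"1\" class=\"dataframe\">
<thead><tr style=\"text-align: right;\"><th></th><th>carat</th><th>color</th><th>clarity</th><th>depth</th><th>table</th><th>price</th><th>x</th><th>y</th><th>z</th></tr></thead>
<tbody>
<tr><th>0</th><td>0.22</td><td>6</td><td>VS2</td><td>65.1</td><td>61.0</td><td>337</td><td>3.87</td><td>3.78</td><td>2.49</td></tr>
<tr><th>1</th><td>0.86</td><td>6</td><td>SI2</td><td>55.1</td><td>69.0</td><td>2757</td><td>6.45</td><td>6.33</td><td>3.52</td></tr>
<tr><th>2</th><td>0.96</td><td>5</td><td>SI2</td><td>66.3</td><td>62.0</td><td>2759</td><td>6.27</td><td>5.95</td><td>4.07</td></tr>
<tr><th>3</th><td>0.70</td><td>5</td><td>VS2</td><td>64.5</td><td>57.0</td><td>2762</td><td>5.57</td><td>5.53</td><td>3.58</td></tr>
<tr><th>4</th><td>0.70</td><td>5</td><td>VS2</td><td>65.3</td><td>55.0</td><td>2762</td><td>5.63</td><td>5.58</td><td>3.66</td></tr>
</tbody>
</table>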
\n", "
" ], "text/plain": [ " carat color clarity depth table price x y z\n", "0 0.22 6 VS2 65.1 61.0 337 3.87 3.78 2.49\n", "1 0.86 6 SI2 55.1 69.0 2757 6.45 6.33 3.52\n", "2 0.96 5 SI2 66.3 62.0 2759 6.27 5.95 4.07\n", "3 0.70 5 VS2 64.5 57.0 2762 5.57 5.53 3.58\n", "4 0.70 5 VS2 65.3 55.0 2762 5.63 5.58 3.66" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fair_cut_df = df[df[\"cut\"] == \"Fair\"].drop(columns=[\"cut\", \"is_ideal\"]).reset_index(drop=True)\n", "print(fair_cut_df.shape)\n", "fair_cut_df.head()" ] }, { "cell_type": "markdown", "id": "394bc31e-ecfd-49cd-90ef-32f1190dd81f", "metadata": {}, "source": [ "## Default View" ] }, { "cell_type": "code", "execution_count": 5, "id": "48231944-ad70-4f16-b5ed-f0577d2fc89c", "metadata": {}, "outputs": [], "source": [ "p = ggplot(fair_cut_df, aes(\"carat\", \"price\"))" ] }, { "cell_type": "code", "execution_count": 6, "id": "9631514e-9c61-4a08-a580-c98ef8eb476d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p + geom_pointdensity()" ] }, { "cell_type": "markdown", "id": "d5e45ec0-929c-4a36-b709-20f6135bb218", "metadata": {}, "source": [ "## Parameters" ] }, { "cell_type": "markdown", "id": "fea008f9-97bf-4fa6-b1bb-b6bd1a5391b1", "metadata": {}, "source": [ "### `adjust`" ] }, { "cell_type": "code", "execution_count": 7, "id": "4afdf836-225f-493d-80fd-33e0e0a112bc", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gggrid([\n", " p + geom_pointdensity() + ggtitle(\"adjust=1 (default)\"),\n", " p + geom_pointdensity(adjust=.1) + ggtitle(\"adjust=.1\"),\n", " p + geom_pointdensity(adjust=10) + ggtitle(\"adjust=10\"),\n", "])" ] }, { "cell_type": "markdown", "id": "cf170946-e34f-45ed-bd69-4fc346af3837", "metadata": {}, "source": [ "### `method`\n", "\n", "The `method` parameter tells `geom_pointdensity()` how to estimate \"how crowded is it here?\" around each point.\n", "\n", "Here are the options:\n", "\n", "- `'neighbours'` - for every point, it counts how many other points fall within some radius.\n", "\n", "  Use when: you have a few thousand points (or fewer) and you want a very local, discrete crowding measure that treats each point individually.\n", "\n", "- `'kde2d'` - builds a smooth 2D density surface (kernel density estimate) and then looks up that smooth density at each point.\n", "\n", "  Use when: you have a very large number of points (tens of thousands or more), or you want something smoother and less noisy than direct neighbour counts.\n", "\n", "- `'auto'` (default) - it chooses for you. For smaller datasets it behaves like `'neighbours'`; for larger datasets it switches to `'kde2d'`, because that scales better.\n", "\n", "  Use when: you’re not sure about performance trade-offs and just want a sensible default." ] }, { "cell_type": "code", "execution_count": 8, "id": "98882440-6557-4a09-a803-914d5c2da977", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gggrid([\n", " p + geom_pointdensity(aes(color='..count..')) + ggtitle(\"method='auto' (default)\"),\n", " p + geom_pointdensity(aes(color='..count..'), method='neighbours') + ggtitle(\"method='neighbours'\"),\n", " p + geom_pointdensity(aes(color='..count..'), method='kde2d') + ggtitle(\"method='kde2d'\"),\n", "])" ] }, { "cell_type": "markdown", "id": "afea8dc3-0df0-455d-9e66-16b65bd71cf6", "metadata": {}, "source": [ "Sometimes you may have additional reasons to explicitly specify the `method`:" ] }, { "cell_type": "code", "execution_count": 9, "id": "fdbc5e05-125b-4686-8c7c-1c02b87d029f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ggplot(df, aes(\"carat\", \"price\")) + \\\n", " geom_pointdensity() + \\\n", " facet_grid(x=\"is_ideal\")" ] }, { "cell_type": "markdown", "id": "ea301be1-612d-4612-99dc-307b2992e8cd", "metadata": {}, "source": [ "Although both facets show a similar distribution and a similar number of points, the two panels look very different. This is because the choice of method is made independently for each data group, so different methods were applied to different facets.\n", "\n", "This can easily be corrected by specifying the method explicitly:" ] }, { "cell_type": "code", "execution_count": 10, "id": "0c71e2f4-9643-44b2-a929-94c1a7b0d5c1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ggplot(df, aes(\"carat\", \"price\")) + \\\n", " geom_pointdensity(method='kde2d') + \\\n", " facet_grid(x=\"is_ideal\")" ] }, { "cell_type": "markdown", "id": "e077e0a0-b282-450f-85a2-0365c21dcf97", "metadata": {}, "source": [ "## Improved Appearance" ] }, { "cell_type": "code", "execution_count": 11, "id": "f89c2c08-427f-4420-940a-499979376d6e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p + \\\n", " geom_pointdensity(aes(alpha=\"color\", color='..count..'),\n", " tooltips=layer_tooltips().line(\"neighbours count|@..count..\")\n", " .line(\"diamond colour\\nfrom 1 (worst) to 7 (best)|@color\")\n", " .line(\"clarity|@clarity\")) + \\\n", " scale_color_viridis(name=\"neighbours count\") + \\\n", " scale_alpha(range=[.1, .9], guide='none') + \\\n", " ggtb() + \\\n", " ggsize(1000, 600) + \\\n", " theme_classic()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.23" } }, "nbformat": 4, "nbformat_minor": 5 }