{ "cells": [ { "cell_type": "markdown", "id": "eff35ebd-8fa2-4b5d-8338-822e440c4289", "metadata": {}, "source": [ "# Smooth Statistics Labels\n", "\n", "Smooth layers (e.g., `geom_smooth()`) now compute additional statistics that describe the quality of the fitted model and the fitted equation. These values are available as smooth stat variables (such as `..r2..` and `..adjr2..`) and can be displayed in a layer annotation via `smooth_labels`.\n", "\n", "`smooth_labels` is an annotation helper designed specifically for smooth layers. It extends `layer_labels`, so the familiar label-building API (`line()`, `format()`, `size()`, etc.) works the same way, while adding support for smooth stat variables and the equation marker.\n", "\n", "Commonly used variables and markers (the full set of variables is shown in the last section):\n", "\n", "* `..r2..`: **R² (coefficient of determination)**. A goodness-of-fit measure showing what fraction of the variance in the response is explained by the fitted model. Values are typically between 0 and 1 (higher means the model explains more of the observed variation).\n", "* `..adjr2..`: **adjusted R²**. A variant of R² that accounts for model complexity: it penalizes adding extra terms/parameters and is therefore more suitable for comparing models with different numbers of predictors (e.g., different polynomial degrees). Adjusted R² can be lower than R² and may even be negative for a very poor fit.\n", "* `~eq`: **fitted equation**. Inserts the model equation into the annotation (can be configured with `eq()`)."
] }, { "cell_type": "code", "execution_count": 1, "id": "7a5c9a56-3edd-4109-ad8c-ef9ae1d1b64c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import numpy as np\n", "import pandas as pd\n", "from lets_plot import *\n", "LetsPlot.setup_html()" ] }, { "cell_type": "code", "execution_count": 2, "id": "bbb9afae-832d-4bdf-8a12-363d2dd26462", "metadata": {}, "outputs": [], "source": [ "np.random.seed(42)" ] }, { "cell_type": "code", "execution_count": 3, "id": "cc74a604-e64a-4e3c-890e-8e12d244886f", "metadata": {}, "outputs": [], "source": [ "t = np.linspace(0, 1, 100)\n", "mean = 1 + np.zeros(2)\n", "cov = np.eye(2)\n", "x, y = np.random.multivariate_normal(mean, cov, t.size).T\n", "df = pd.DataFrame({'t': t, 'x': x, 'y': y})\n", "df = df.melt(id_vars=['t'], value_vars=['x', 'y'])" ] }, { "cell_type": "code", "execution_count": 4, "id": "a3383095-3f77-4a29-8e0b-cbc0fc6c85ba", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", " \n", "
\n", " \n", "
\n", "
\n", " \n", " \n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ggplot(df, aes(x='t', y='value', group='variable')) + \\\n", " geom_point(aes(color='variable'), size=3, alpha=.5) + \\\n", " geom_smooth(aes(color='variable'), size=1, deg=3, span=.3, level=.7, seed=42, \n", " labels = smooth_labels()\n", " .line('\\(R\\^2=\\)@..r2.., \\(R_{{adj}}\\^2=\\)@..adjr2.., ~eq')\n", " .format('..r2..', '.4f')\n", " .format('..adjr2..', '.4f')\n", " .eq(format='.2f')\n", " .label_y(['top', 'bottom'])\n", " .inherit_color())" ] }, { "cell_type": "code", "execution_count": 5, "id": "4f0171d9-eb6b-4b68-876e-95c7614fcedf", "metadata": {}, "outputs": [], "source": [ "n = 100\n", "x = np.linspace(-2, 2, n)\n", "y = x ** 3 + np.random.normal(size=n)" ] }, { "cell_type": "code", "execution_count": 6, "id": "45287c7e-9db6-4848-ac8e-8a1d0fcafbf7", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", " \n", "
\n", " \n", "
\n", "
\n", " \n", " \n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ggplot({'x': x, 'y': y}, aes(x='x', y='y')) + geom_point() + \\\n", " geom_smooth(color='red', deg=1, labels = smooth_labels().label_y(8).inherit_color()) + \\\n", " geom_smooth(color='green', deg=2, labels = smooth_labels().label_y(6).inherit_color()) + \\\n", " geom_smooth(color='blue', deg=3, labels = smooth_labels().label_y(4).inherit_color())" ] }, { "cell_type": "code", "execution_count": 7, "id": "3f04955d-fe8d-425c-be5e-a4806afd59e7", "metadata": {}, "outputs": [], "source": [ "plot = ggplot({'x': [0, 1.5, 1.7, 2], 'y': [0, 1, 1.8, 4]}, aes('x', 'y')) + geom_point() " ] }, { "cell_type": "markdown", "id": "80a212a4-01fe-4ab5-a853-a62475334491", "metadata": {}, "source": [ "### Default usage\n", "`smooth_labels()` without any parameters or method calls displays the coefficient of determination (R²)." ] }, { "cell_type": "code", "execution_count": 8, "id": "c1f29b93-0498-488d-a5fe-e9c6ddc8336f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", " \n", "
\n", " \n", "
\n", "
\n", " \n", " \n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plot + geom_smooth(deg=2, labels = smooth_labels())" ] }, { "cell_type": "markdown", "id": "9194a677-5460-4e5a-afc2-25479687d7a5", "metadata": {}, "source": [ "A line added with `line()` shows only the value, without the left-hand part (label), so you can write the label text and formatting yourself." ] }, { "cell_type": "code", "execution_count": 9, "id": "8c906c9c-183b-430f-a39a-46d9a57eb167", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", " \n", "
\n", " \n", "
\n", "
\n", " \n", " \n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plot + geom_smooth(deg=2, labels = smooth_labels()\n", " .line('@..r2..')\n", " .line('R\\^2=@..r2..')\n", " .line('\\(R\\^2\\)=@..r2..'))" ] }, { "cell_type": "markdown", "id": "3dbd887b-a728-4301-bb4f-4982dd3a5eed", "metadata": {}, "source": [ "Use `format()` to format the values." ] }, { "cell_type": "code", "execution_count": 10, "id": "10ee4460-e81c-4c76-a705-e807f4f95ce3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", " \n", "
\n", " \n", "
\n", "
\n", " \n", " \n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plot + geom_smooth(deg=2, labels = smooth_labels()\n", " .line('@..adjr2..')\n", " .format('..adjr2..', '.3f'))" ] }, { "cell_type": "markdown", "id": "d4656fcc-46d9-454d-a280-99bf1bfaba0a", "metadata": {}, "source": [ "Use `size()` to set the text size of the label." ] }, { "cell_type": "code", "execution_count": 11, "id": "4942b882-150e-4ee0-87e7-fedcf560e37d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", " \n", "
\n", " \n", "
\n", "
\n", " \n", " \n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plot + geom_smooth(deg=2, labels = smooth_labels().size(20))" ] }, { "cell_type": "markdown", "id": "9f61ed02-6c00-4f09-a2c6-c1cfc11fb070", "metadata": {}, "source": [ "### Formula\n", "Use `~eq` to display the equation.\n", "\n", "`~eq` represents the full equation (including the left-hand part, e.g. `y=`) and can be configured with `eq(lhs, rhs, format, threshold)`." ] }, { "cell_type": "code", "execution_count": 12, "id": "f58bf5dd-b6eb-49e7-a25c-943c33bb6306", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", " \n", "
\n", " \n", "
\n", "
\n", " \n", " \n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plot + geom_smooth(deg=2, labels = smooth_labels()\n", " .line('~eq'))" ] }, { "cell_type": "markdown", "id": "f60574f0-3436-4dd5-94ef-ad12e67d724e", "metadata": {}, "source": [ "#### Equation configuration: `eq()`\n", "\n", "Parameters `lhs` and `rhs`\n" ] }, { "cell_type": "code", "execution_count": 13, "id": "2ef93b90-6283-48dc-beb0-17b89c276f7a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", " \n", "
\n", " \n", "
\n", "
\n", " \n", " \n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gggrid([\n", " # Replaces 'y' with f(x) on the left-hand side of the equation\n", " plot + geom_smooth(deg=2, labels=smooth_labels().line('~eq').eq(lhs='f(x)')),\n", " # Completely removes the left-hand side of the equation\n", " plot + geom_smooth(deg=2, labels=smooth_labels().line('~eq').eq(lhs='')),\n", " # Replaces 'x' with 't' on the right-hand side of the equation\n", " plot + geom_smooth(deg=2, labels=smooth_labels().line('~eq').eq(rhs='t')),\n", "], ncol = 1)\n" ] }, { "cell_type": "markdown", "id": "c3a6b410-02ed-4857-8ab6-4c82a0aa559c", "metadata": {}, "source": [ "Coefficient formatting" ] }, { "cell_type": "code", "execution_count": 14, "id": "83945419-dd63-4819-9413-a58568d1a6e1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", " \n", "
\n", " \n", "
\n", "
\n", " \n", " \n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gggrid([\n", " # The regular format() is applied to all coefficients\n", " plot + geom_smooth(deg=2, labels=smooth_labels().line('~eq').format('~eq', '.1f')),\n", " # One format in eq() is applied to all coefficients\n", " plot + geom_smooth(deg=2, labels=smooth_labels().line('~eq').eq(format='.2f')),\n", " # You can pass a list to format each coefficient separately.\n", " # The list length does not have to match the number of coefficients:\n", " # the last format in the list will be used for any remaining coefficients.\n", " plot + geom_smooth(deg=2, labels=smooth_labels().line('~eq').eq(format=['.1f', '.3f'])),\n", "])" ] }, { "cell_type": "markdown", "id": "7756da6f-e209-42d2-b305-48ffeca09214", "metadata": {}, "source": [ "Parameter `threshold` " ] }, { "cell_type": "code", "execution_count": 15, "id": "3a7ca4e2-c485-4fbf-90ed-41b2d9d2c3c3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", " \n", "
\n", " \n", "
\n", "
\n", " \n", " \n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plot + geom_smooth(deg=2, labels=smooth_labels()\n", " .line('~eq')\n", " .eq(threshold=0.01))" ] }, { "cell_type": "markdown", "id": "816e6ff4-8d8b-486d-805c-a4fa761856fa", "metadata": {}, "source": [ "#### Positioning: `label_x()` and `label_y()`\n", "\n", "By default, the text is shown in the upper-left corner. To change its position, use `label_x()` and `label_y()`. `label_x()` accepts the strings `left`, `center`, or `right`; `label_y()` accepts `top`, `middle`, or `bottom`. Both also accept a number that gives an exact position in plot coordinates." ] }, { "cell_type": "code", "execution_count": 16, "id": "db06defc-f002-4761-ba1f-5b5cc6520585", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", " \n", "
\n", " \n", "
\n", "
\n", " \n", " \n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gggrid([\n", " plot + geom_smooth(deg=2, labels = smooth_labels().label_x('center').label_y('middle')), \n", " plot + geom_smooth(deg=2, labels = smooth_labels().label_x(0.5).label_y(4))\n", "])" ] }, { "cell_type": "markdown", "id": "d5b3a4d4-9df4-4457-8ad5-d9dd041f4025", "metadata": {}, "source": [ "#### Grouping\n", "If the data for `geom_smooth()` is grouped, the statistics are computed separately for each group automatically. If groups are differentiated by color, use `inherit_color()` to make the label text color match the `geom_smooth()` line color.\n", "\n", "By default, group labels are stacked vertically, but you can control the position of each group independently. To do this, pass lists to `label_x()` and `label_y()`.\n" ] }, { "cell_type": "code", "execution_count": 17, "id": "ca657b16-ec18-4100-a1b9-5396315f9c07", "metadata": {}, "outputs": [], "source": [ "x = [0, 1.5, 1.7, 2, 0, 1.5, 1.7, 2]\n", "y = [0, 1, 1.8, 4, 0.5, 1.5, 3, 4.5]\n", "g = ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']\n", "\n", "plot_groups = ggplot({'x': x, 'y': y, 'g': g}, aes('x', 'y')) + geom_point() " ] }, { "cell_type": "code", "execution_count": 18, "id": "1306958c-1865-482c-8a28-4b45080eedb2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", " \n", "
\n", " \n", "
\n", "
\n", " \n", " \n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gggrid([\n", " plot_groups + geom_smooth(aes(x='x', y='y', color='g'), deg=2, labels = smooth_labels().inherit_color()) + ggtitle('Color'),\n", " plot_groups + geom_smooth(aes(x='x', y='y', linetype='g'), deg=2, labels = smooth_labels().label_x('center').label_y([-4, 6])) + ggtitle('Linetype')\n", "])" ] }, { "cell_type": "markdown", "id": "50b69c94-36b4-4a3b-b0b1-9cc3d791b1e9", "metadata": {}, "source": [ "#### All supported variables" ] }, { "cell_type": "code", "execution_count": 19, "id": "954ecfc2-9872-4433-83a7-8034d4f28000", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", " \n", "
\n", " \n", "
\n", "
\n", " \n", " \n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plot + geom_smooth(deg=2, labels = smooth_labels()\n", " .line('n=@..n..')\n", " .line('Method: @..method..') \n", " .line('\\(R\\^2=\\)@..r2..')\n", " .line('\\(R_{{adj}}\\^2=\\)@..adjr2..')\n", " .line('AIC=@..aic..')\n", " .line('BIC=@..bic..')\n", " .line('\\(F_{{@..df1.., @..df2..}}=\\)@..f..')\n", " .line('p=@..p..')\n", " .line('@..cilevel.. CI [@..cilow.., @..cihigh..]')\n", " .format('..cilevel..', '0.0%')\n", " .line('~eq')\n", " )" ] }, { "cell_type": "code", "execution_count": null, "id": "f1b85188-bec8-46b4-81d3-f3a2fec02639", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.20" } }, "nbformat": 4, "nbformat_minor": 5 }