{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true }, "pycharm": { "name": "#%% md\n" } }, "source": [
"# Lets-Plot Usage Guide\n",
"\n",
"- [System requirements](#sys)\n",
"- [Installation](#install)\n",
"- [Understanding architecture](#implementation)\n",
"- [Learning API](#api)\n",
"- [Getting started](#gsg)\n",
"\n",
"\n",
"**Lets-Plot** is an open-source plotting library for statistical data. It is implemented in the \n",
"[Kotlin programming language](https://kotlinlang.org/), which is multiplatform.\n",
"That is why the Lets-Plot plotting functionality \n",
"is packaged as a JavaScript library, a JVM library, and a native Python extension.\n",
"\n",
"The design of the Lets-Plot library is heavily influenced by the \n",
"[ggplot2](https://ggplot2.tidyverse.org) library.\n",
"\n",
"\n",
"## System requirements\n",
"When installing the Lets-Plot library, consider the following requirements.\n",
"\n",
"Supported operating systems:\n",
"- macOS\n",
"- Linux\n",
"- Windows\n",
"\n",
"Supported Python versions:\n",
"- 3.7\n",
"- 3.8\n",
"- 3.9\n",
"- 3.10\n",
"- 3.11\n",
"- 3.12\n",
"\n",
"\n",
"## Installation\n",
"\n",
"The `lets-plot` package is available in the [pypi.org](https://pypi.org/project/lets-plot/) repository.\n",
"Run the following command to install the `lets-plot` package for your Python interpreter:\n",
"\n",
"`pip install lets-plot`\n",
"\n",
"\n",
"## Understanding Lets-Plot architecture\n",
"In `lets-plot`, a **plot** is represented by at least one **layer**.\n",
"It can be built on a default dataset, with aesthetic mappings, a set of scales, or additional \n",
"features applied.\n",
"\n",
"A **layer** is responsible for creating the objects painted on the ‘canvas’ and contains the following elements:\n",
"- **Data** - the set of data specified either once for all layers or on a per-layer basis.\n",
"One plot can combine multiple different datasets (one per layer).\n",
"- **Aesthetic mapping** - describes how variables in the dataset are mapped to the visual properties of the layer, such as color, shape, size, or position.\n",
"- **Geometric object** - a geometric object that represents a particular type of plot.\n",
"- **Statistical transformation** - computes some kind of statistical summary on the raw input data. \n",
"For example, the `bin` stat is used for histograms and `smooth` is used for regression lines. \n",
"Most stats take additional parameters to specify details of the statistical transformation of data.\n",
"- **Position adjustment** - a method used to compute the final coordinates of geometry. 
\n",
"Used to build variants of the same `geom` object or to avoid overplotting.\n",
"\n",
"\n",
"## Learning API\n",
"The typical code fragment that renders a plot looks as follows:\n",
"\n",
"```\n",
"from lets_plot import *\n",
"LetsPlot.setup_html()  # load the Lets-Plot JS library (needed in notebooks)\n",
"df = {'x': [1, 2, 3], 'y': [4, 5, 6], 'c': ['a', 'b', 'a']}  # illustrative data, not from the guide\n",
"p = ggplot(df)\n",
"# General form: geom_xxx(mapping=aes(...), stat=..., position=...)\n",
"p + geom_point(mapping=aes('x', 'y', color='c'), stat='identity', position='identity')\n",
"```\n",
"\n",
"### Geometric objects `geom`\n",
"\n",
"You can add a new geometric object (or plot layer) by creating it using the `geom_xxx()` function and then adding this object to `ggplot`:\n",
"\n",
"```\n",
"p = ggplot(data=df)\n",
"p + geom_point()\n",
"```\n",
"\n",
"The following plots are supported:\n",
"\n",
"- Area plot: [`geom_area()`](https://lets-plot.org/python/pages/api/lets_plot.geom_area.html)\n",
"- Discrete plot: [`geom_bar()`](https://lets-plot.org/python/pages/api/lets_plot.geom_bar.html), [`geom_pie()`](https://lets-plot.org/python/pages/api/lets_plot.geom_pie.html), [`geom_lollipop()`](https://lets-plot.org/python/pages/api/lets_plot.geom_lollipop.html), [`geom_count()`](https://lets-plot.org/python/pages/api/lets_plot.geom_count.html), [`stat_sum()`](https://lets-plot.org/python/pages/api/lets_plot.stat_sum.html)\n",
"- Boxplot: [`geom_boxplot()`](https://lets-plot.org/python/pages/api/lets_plot.geom_boxplot.html)\n",
"- Contours: [`geom_contour()`](https://lets-plot.org/python/pages/api/lets_plot.geom_contour.html), [`geom_contourf()`](https://lets-plot.org/python/pages/api/lets_plot.geom_contourf.html)\n",
"- Connectors: [`geom_path()`](https://lets-plot.org/python/pages/api/lets_plot.geom_path.html), [`geom_line()`](https://lets-plot.org/python/pages/api/lets_plot.geom_line.html), [`geom_segment()`](https://lets-plot.org/python/pages/api/lets_plot.geom_segment.html), [`geom_curve()`](https://lets-plot.org/python/pages/api/lets_plot.geom_curve.html), [`geom_spoke()`](https://lets-plot.org/python/pages/api/lets_plot.geom_spoke.html), [`geom_step()`](https://lets-plot.org/python/pages/api/lets_plot.geom_step.html)\n",
"- 
Density plot: [`geom_density()`](https://lets-plot.org/python/pages/api/lets_plot.geom_density.html), [`geom_area_ridges()`](https://lets-plot.org/python/pages/api/lets_plot.geom_area_ridges.html), [`geom_violin()`](https://lets-plot.org/python/pages/api/lets_plot.geom_violin.html)\n", " and [`geom_density2d()`](https://lets-plot.org/python/pages/api/lets_plot.geom_density2d.html), [`geom_density2df()`](https://lets-plot.org/python/pages/api/lets_plot.geom_density2df.html)\n", "- Error-bar plot: [`geom_errorbar()`](https://lets-plot.org/python/pages/api/lets_plot.geom_errorbar.html), [`geom_crossbar()`](https://lets-plot.org/python/pages/api/lets_plot.geom_crossbar.html), [`geom_linerange()`](https://lets-plot.org/python/pages/api/lets_plot.geom_linerange.html), [`geom_pointrange()`](https://lets-plot.org/python/pages/api/lets_plot.geom_pointrange.html)\n", "- Histogram: [`geom_freqpoly()`](https://lets-plot.org/python/pages/api/lets_plot.geom_freqpoly.html), [`geom_histogram()`](https://lets-plot.org/python/pages/api/lets_plot.geom_histogram.html) and [`geom_bin2d()`](https://lets-plot.org/python/pages/api/lets_plot.geom_bin2d.html)\n", "- Jitter plot: [`geom_jitter()`](https://lets-plot.org/python/pages/api/lets_plot.geom_jitter.html)\n", "- Line plot: [`geom_line()`](https://lets-plot.org/python/pages/api/lets_plot.geom_line.html)\n", "- Reference lines: [`geom_abline()`](https://lets-plot.org/python/pages/api/lets_plot.geom_abline.html), [`geom_hline()`](https://lets-plot.org/python/pages/api/lets_plot.geom_hline.html), [`geom_vline()`](https://lets-plot.org/python/pages/api/lets_plot.geom_vline.html)\n", "- Polygons: [`geom_polygon`](https://lets-plot.org/python/pages/api/lets_plot.geom_polygon.html)\n", "- Rectangles, Tiles, Raster: [`geom_rect()`](https://lets-plot.org/python/pages/api/lets_plot.geom_rect.html), [`geom_tile()`](https://lets-plot.org/python/pages/api/lets_plot.geom_tile.html), 
[`geom_raster()`](https://lets-plot.org/python/pages/api/lets_plot.geom_raster.html)\n", "- Ribbons: [`geom_ribbon()`](https://lets-plot.org/python/pages/api/lets_plot.geom_ribbon.html)\n", "- Scatter plot: [`geom_point()`](https://lets-plot.org/python/pages/api/lets_plot.geom_point.html)\n", "- Dot plot: [`geom_dotplot()`](http://lets-plot.org/python/pages/api/lets_plot.geom_dotplot.html), [`geom_ydotplot()`](http://lets-plot.org/python/pages/api/lets_plot.geom_ydotplot.html)\n", "- Regression lines: [`geom_smooth()`](https://lets-plot.org/python/pages/api/lets_plot.geom_smooth.html)\n", "- Q-Q plot: [`geom_qq()`](https://lets-plot.org/python/pages/api/lets_plot.geom_qq.html), [`geom_qq_line()`](https://lets-plot.org/python/pages/api/lets_plot.geom_qq_line.html), [`geom_qq2()`](https://lets-plot.org/python/pages/api/lets_plot.geom_qq2.html), [`geom_qq2_line()`](https://lets-plot.org/python/pages/api/lets_plot.geom_qq2_line.html)\n", "- ECDF plot: [`stat_ecdf()`](https://lets-plot.org/python/pages/api/lets_plot.stat_ecdf.html)\n", "- Summary: [`stat_summary()`](https://lets-plot.org/python/pages/api/lets_plot.stat_summary.html), [`stat_summary_bin()`](https://lets-plot.org/python/pages/api/lets_plot.stat_summary_bin.html)\n", "- Function plot: [`geom_function()`](https://lets-plot.org/python/pages/api/lets_plot.geom_function.html)\n", "- Text: [`geom_text()`](https://lets-plot.org/python/pages/api/lets_plot.geom_text.html), [`geom_label()`](https://lets-plot.org/python/pages/api/lets_plot.geom_label.html)\n", "- Map: [`geom_map()`](https://lets-plot.org/python/pages/api/lets_plot.geom_map.html)\n", "- Image: [`geom_imshow()`](https://lets-plot.org/python/pages/api/lets_plot.geom_imshow.html)\n", "\n", "See the [geom reference](https://lets-plot.org/python/pages/charts.html) for more information about the supported\n", "geometric methods, their arguments, and default values.\n", "\n", "### Collections of plots\n", "With the 
[`GGBunch()`](https://lets-plot.org/python/pages/api/lets_plot.GGBunch.html) class, you can \n",
"render a collection of plots. \n",
"Use the `add_plot()` method to add a plot to the bunch and set an arbitrary location and size for plots inside the grid:\n",
"\n",
"```\n",
"bunch = GGBunch()\n",
"bunch.add_plot(plot1, 0, 0)\n",
"bunch.add_plot(plot2, 0, 200)\n",
"```\n",
"\n",
"See the [GGBunch](https://nbviewer.jupyter.org/github/JetBrains/lets-plot-docs/blob/master/source/examples/cookbook/ggbunch.ipynb) example for more information.\n",
"\n",
"### Stat `stat`\n",
"\n",
"Add `stat` as an argument to the `geom_xxx()` function to define statistical data transformations:\n",
"\n",
"`geom_point(stat='count')`\n",
"\n",
"Supported transformations:\n",
"\n",
"- `identity`: leave the data unchanged\n",
"- `count`: calculate the number of points with the same x-axis coordinate\n",
"- `bin`: calculate the number of points falling in each of the adjacent equally sized ranges along the x-axis\n",
"- `bin2d`: calculate the number of points falling in each of the adjacent equally sized rectangles on the plot plane\n",
"- `smooth`: perform smoothing\n",
"- `contour`, `contourf`: calculate contours of 3D data\n",
"- `boxplot`: calculate components of a box plot\n",
"- `density`, `density2d`, `density2df`: perform a kernel density estimation for 1D and 2D data\n",
"\n",
"### Aesthetic mappings `mapping`\n",
"With mappings, you can define how variables in the dataset are mapped to the visual elements of the plot.\n",
"Pass the result of the `aes(x, y, other)` function to `geom`, where:\n",
"- `x`: the dataframe column to map to the x axis. 
\n",
"- `y`: the dataframe column to map to the y axis.\n",
"- `other`: other visual properties of the plot, such as color, shape, size, or position.\n",
"\n",
"For example, instead of\n",
"`geom_point(aes(x='cty', y='hwy', color='cyl'))`\n",
"you can use a simplified form:\n",
"`geom_point(aes('cty', 'hwy', color='cyl'))`\n",
"\n",
"### Position adjustment `position`\n",
"\n",
"All layers have a position adjustment that computes the final coordinates of geometry. \n",
"Position adjustment is used to build variants of the same plot and to resolve overlapping. \n",
"Override the default settings by using the `position` argument in the `geom` functions:\n",
"\n",
"`geom_bar(position='dodge')`\n",
"\n",
"Available adjustments:\n",
"- `dodge`\n",
"- `jitter`\n",
"- `jitterdodge`\n",
"- `nudge`\n",
"- `identity`\n",
"- `fill`\n",
"- `stack`\n",
"\n",
"See the [position reference](https://lets-plot.org/python/pages/api.html#positions) for more information about position adjustments.\n",
"\n",
"### Features affecting the entire plot\n",
"\n",
"#### Scales\n",
"\n",
"By default, a reasonable scale is chosen for each mapped variable depending on the variable attributes. Override default scales to tweak \n",
"details like the axis labels or legend keys, or to use a completely different translation from data to aesthetic.\n",
"For example, to override the fill color on the histogram:\n",
"\n",
"`p + geom_histogram() + scale_fill_brewer(name=\"Trend\", palette=\"RdPu\")`\n",
"\n",
"See the list of the available `scale` methods in the [scale reference](https://lets-plot.org/python/pages/api.html#scales).\n",
"\n",
"#### Coordinate system\n",
"\n",
"The coordinate system determines how the x and y aesthetics combine to position elements in the plot. 
\n",
"For example, to override the default X and Y ratio:\n",
"\n",
"`p + coord_fixed(ratio=2)`\n",
"\n",
"See the list of the available methods in the [coordinates reference](https://lets-plot.org/python/pages/api.html#coordinates).\n",
"\n",
"#### Legend\n",
"The axes and legends help users interpret plots.\n",
"Use the `guide` methods or the `guide` argument of the `scale` method to customize the legend.\n",
"For example, to define the number of columns in the legend:\n",
"\n",
"`p + scale_color_discrete(guide=guide_legend(ncol=2))`\n",
"\n",
"See more information in the [guide reference](https://lets-plot.org/python/pages/api.html#scale-guides).\n",
"\n",
"Adjust the legend location on the plot using the `legend_position`, `legend_justification`, and `legend_direction` parameters of `theme()`, see:\n",
"[TBD]\n",
"\n",
"\n",
"#### Sampling\n",
"\n",
"Sampling is a special data transformation technique built into Lets-Plot; it is applied after the stat transformation.\n",
"Sampling helps prevent UI freezes and out-of-memory crashes when attempting to plot an excessively large number of geometries.\n",
"By default, the technique applies automatically when the data volume exceeds a certain threshold.\n",
"Setting `sampling='none'` disables any sampling for the given layer. The sampling methods can be chained together using the `+` operator.\n",
"\n",
"Available methods:\n",
"- `sampling_random_stratified`: randomly selects points from each group proportionally to the group size but also ensures \n",
"that each group is represented by at least a specified minimum number of points.\n",
"- `sampling_random`: selects data points at randomly chosen indices without replacement.\n",
"- `sampling_pick`: analyzes X-values and selects all points whose X-values are among the first `n` X-values found in the population.\n",
"- `sampling_systematic`: selects data points at evenly distributed indices.\n",
"- `sampling_vertex_dp`, `sampling_vertex_vw`: simplifies plotting of polygons. 
\n",
"There is a choice of two implementation algorithms: Douglas-Peucker (`_dp`) and \n",
"Visvalingam-Whyatt (`_vw`).\n",
"\n",
"For more details, see the [sampling reference](https://lets-plot.org/python/pages/sampling.html)." ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [
"\n",
"## Getting started\n",
"\n",
"Let's build a point plot using the mpg dataset.\n",
"\n",
"Create a `DataFrame` object and load the data into it." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2024-04-17T07:32:22.730003Z", "iopub.status.busy": "2024-04-17T07:32:22.729831Z", "iopub.status.idle": "2024-04-17T07:32:23.012868Z", "shell.execute_reply": "2024-04-17T07:32:23.012570Z" }, "pycharm": { "is_executing": false, "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>Unnamed: 0</th><th>manufacturer</th><th>model</th><th>displ</th><th>year</th><th>cyl</th><th>trans</th><th>drv</th><th>cty</th><th>hwy</th><th>fl</th><th>class</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>1</td><td>audi</td><td>a4</td><td>1.8</td><td>1999</td><td>4</td><td>auto(l5)</td><td>f</td><td>18</td><td>29</td><td>p</td><td>compact</td></tr>\n",
"    <tr><th>1</th><td>2</td><td>audi</td><td>a4</td><td>1.8</td><td>1999</td><td>4</td><td>manual(m5)</td><td>f</td><td>21</td><td>29</td><td>p</td><td>compact</td></tr>\n",
"    <tr><th>2</th><td>3</td><td>audi</td><td>a4</td><td>2.0</td><td>2008</td><td>4</td><td>manual(m6)</td><td>f</td><td>20</td><td>31</td><td>p</td><td>compact</td></tr>\n",
"    <tr><th>3</th><td>4</td><td>audi</td><td>a4</td><td>2.0</td><td>2008</td><td>4</td><td>auto(av)</td><td>f</td><td>21</td><td>30</td><td>p</td><td>compact</td></tr>\n",
"    <tr><th>4</th><td>5</td><td>audi</td><td>a4</td><td>2.8</td><td>1999</td><td>6</td><td>auto(l5)</td><td>f</td><td>16</td><td>26</td><td>p</td><td>compact</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Unnamed: 0 manufacturer model displ year cyl trans drv cty hwy \\\n", "0 1 audi a4 1.8 1999 4 auto(l5) f 18 29 \n", "1 2 audi a4 1.8 1999 4 manual(m5) f 21 29 \n", "2 3 audi a4 2.0 2008 4 manual(m6) f 20 31 \n", "3 4 audi a4 2.0 2008 4 auto(av) f 21 30 \n", "4 5 audi a4 2.8 1999 6 auto(l5) f 16 26 \n", "\n", " fl class \n", "0 p compact \n", "1 p compact \n", "2 p compact \n", "3 p compact \n", "4 p compact " ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Data set\n", "\n", "import pandas as pd\n", "mpg = pd.read_csv(\"https://raw.githubusercontent.com/JetBrains/lets-plot-docs/master/data/mpg.csv\")\n", "mpg.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Plot the basic point plot." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2024-04-17T07:32:23.026220Z", "iopub.status.busy": "2024-04-17T07:32:23.026128Z", "iopub.status.idle": "2024-04-17T07:32:23.199735Z", "shell.execute_reply": "2024-04-17T07:32:23.199450Z" }, "pycharm": { "is_executing": false, "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", " \n", " " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Basic plotting\n", "from lets_plot import *\n", "# Load Lets-Plot JS library\n", "LetsPlot.setup_html()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Perform the following aesthetic mappings:\n", " - `x` = displ (the **displ** column of the dataframe)\n", " - `y` = hwy (the **hwy** column of the dataframe)\n", " - `color` = cyl (the **cyl** column of the dataframe)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2024-04-17T07:32:23.200888Z", "iopub.status.busy": "2024-04-17T07:32:23.200750Z", "iopub.status.idle": "2024-04-17T07:32:23.234141Z", "shell.execute_reply": "2024-04-17T07:32:23.233947Z" }, "pycharm": { "is_executing": false, "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p = ggplot(mpg)\n", "\n", "p + geom_point(aes('displ', 'hwy', color='cyl'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Apply statistical data transformation to count the number of cases at each x position." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2024-04-17T07:32:23.235263Z", "iopub.status.busy": "2024-04-17T07:32:23.235158Z", "iopub.status.idle": "2024-04-17T07:32:23.239350Z", "shell.execute_reply": "2024-04-17T07:32:23.239175Z" }, "pycharm": { "is_executing": false, "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ], "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p + geom_point(aes('displ', size='..count..', col='..count..'), stat='count')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Change the palette and the legend, and add a title." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2024-04-17T07:32:23.240298Z", "iopub.status.busy": "2024-04-17T07:32:23.240224Z", "iopub.status.idle": "2024-04-17T07:32:23.244473Z", "shell.execute_reply": "2024-04-17T07:32:23.244301Z" }, "pycharm": { "is_executing": false, "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ], "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p += scale_color_continuous(low=\"blue\", high=\"pink\", guide=guide_legend(ncol=2)) \\\n", "    + ggtitle('Highway MPG by displacement')\n", "p + geom_point(aes('displ', 'hwy', color='cyl'), position='jitter') " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Apply random stratified sampling to select points from each group proportionally \n", "to the group size." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2024-04-17T07:32:23.245418Z", "iopub.status.busy": "2024-04-17T07:32:23.245347Z", "iopub.status.idle": "2024-04-17T07:32:23.249785Z", "shell.execute_reply": "2024-04-17T07:32:23.249523Z" }, "pycharm": { "is_executing": false, "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p + geom_point(\n", " aes('displ', 'hwy', color='cyl'), \n", " position='jitter', \n", " sampling=sampling_random_stratified(40))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.13" } }, "nbformat": 4, "nbformat_minor": 4 }