{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Histogram\n",
"\n",
"The [geom_histogram()](https://lets-plot.org/python/pages/api/lets_plot.geom_histogram.html) geometry visualizes the distribution of a numeric variable using binned counts."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"from lets_plot import *"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"LetsPlot.setup_html()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(234, 12)\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Unnamed: 0 | \n",
" manufacturer | \n",
" model | \n",
" displ | \n",
" year | \n",
" cyl | \n",
" trans | \n",
" drv | \n",
" cty | \n",
" hwy | \n",
" fl | \n",
" class | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 1 | \n",
" audi | \n",
" a4 | \n",
" 1.8 | \n",
" 1999 | \n",
" 4 | \n",
" auto(l5) | \n",
" f | \n",
" 18 | \n",
" 29 | \n",
" p | \n",
" compact | \n",
"
\n",
" \n",
" | 1 | \n",
" 2 | \n",
" audi | \n",
" a4 | \n",
" 1.8 | \n",
" 1999 | \n",
" 4 | \n",
" manual(m5) | \n",
" f | \n",
" 21 | \n",
" 29 | \n",
" p | \n",
" compact | \n",
"
\n",
" \n",
" | 2 | \n",
" 3 | \n",
" audi | \n",
" a4 | \n",
" 2.0 | \n",
" 2008 | \n",
" 4 | \n",
" manual(m6) | \n",
" f | \n",
" 20 | \n",
" 31 | \n",
" p | \n",
" compact | \n",
"
\n",
" \n",
" | 3 | \n",
" 4 | \n",
" audi | \n",
" a4 | \n",
" 2.0 | \n",
" 2008 | \n",
" 4 | \n",
" auto(av) | \n",
" f | \n",
" 21 | \n",
" 30 | \n",
" p | \n",
" compact | \n",
"
\n",
" \n",
" | 4 | \n",
" 5 | \n",
" audi | \n",
" a4 | \n",
" 2.8 | \n",
" 1999 | \n",
" 6 | \n",
" auto(l5) | \n",
" f | \n",
" 16 | \n",
" 26 | \n",
" p | \n",
" compact | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Unnamed: 0 manufacturer model displ year cyl trans drv cty hwy \\\n",
"0 1 audi a4 1.8 1999 4 auto(l5) f 18 29 \n",
"1 2 audi a4 1.8 1999 4 manual(m5) f 21 29 \n",
"2 3 audi a4 2.0 2008 4 manual(m6) f 20 31 \n",
"3 4 audi a4 2.0 2008 4 auto(av) f 21 30 \n",
"4 5 audi a4 2.8 1999 6 auto(l5) f 16 26 \n",
"\n",
" fl class \n",
"0 p compact \n",
"1 p compact \n",
"2 p compact \n",
"3 p compact \n",
"4 p compact "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mpg_df = pd.read_csv(\"https://raw.githubusercontent.com/JetBrains/lets-plot-docs/master/data/mpg.csv\")\n",
"print(mpg_df.shape)\n",
"mpg_df.head()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(150, 5)\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" sepal_length | \n",
" sepal_width | \n",
" petal_length | \n",
" petal_width | \n",
" species | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 5.1 | \n",
" 3.5 | \n",
" 1.4 | \n",
" 0.2 | \n",
" setosa | \n",
"
\n",
" \n",
" | 1 | \n",
" 4.9 | \n",
" 3.0 | \n",
" 1.4 | \n",
" 0.2 | \n",
" setosa | \n",
"
\n",
" \n",
" | 2 | \n",
" 4.7 | \n",
" 3.2 | \n",
" 1.3 | \n",
" 0.2 | \n",
" setosa | \n",
"
\n",
" \n",
" | 3 | \n",
" 4.6 | \n",
" 3.1 | \n",
" 1.5 | \n",
" 0.2 | \n",
" setosa | \n",
"
\n",
" \n",
" | 4 | \n",
" 5.0 | \n",
" 3.6 | \n",
" 1.4 | \n",
" 0.2 | \n",
" setosa | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" sepal_length sepal_width petal_length petal_width species\n",
"0 5.1 3.5 1.4 0.2 setosa\n",
"1 4.9 3.0 1.4 0.2 setosa\n",
"2 4.7 3.2 1.3 0.2 setosa\n",
"3 4.6 3.1 1.5 0.2 setosa\n",
"4 5.0 3.6 1.4 0.2 setosa"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"iris_df = pd.read_csv(\"https://raw.githubusercontent.com/JetBrains/lets-plot-docs/master/data/iris.csv\")\n",
"print(iris_df.shape)\n",
"iris_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Default Histogram"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"mpg_p = ggplot(mpg_df, aes(x=\"hwy\", fill=as_discrete(\"year\"))) + scale_fill_discrete(format=\"d\")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
" \n",
" "
],
"text/plain": [
""
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mpg_p + geom_histogram()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Binning"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Fixed Number of Bins"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
" \n",
" "
],
"text/plain": [
""
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mpg_p + geom_histogram(bins=10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Fixed Bin Width"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
" \n",
" "
],
"text/plain": [
""
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mpg_p + geom_histogram(binwidth=2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Custom Bins Using the `breaks` Parameter"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
" \n",
" "
],
"text/plain": [
""
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mpg_p + geom_histogram(breaks=[10, 16, 19, 24, 26, 30, 45])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Bin Alignment"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
" \n",
" "
],
"text/plain": [
""
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"gggrid([\n",
" mpg_p + geom_histogram(binwidth=5) + ggtitle(\"Default Bin Alignment\"),\n",
" mpg_p + geom_histogram(binwidth=5, center=10) + ggtitle(\"Bins Centered on 10\"),\n",
" mpg_p + geom_histogram(binwidth=5, boundary=10) + ggtitle(\"Bin Boundary at 10\"),\n",
"], ncol=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Position Adjustments"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Interleaved Histogram"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
" \n",
" "
],
"text/plain": [
""
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mpg_p + geom_histogram(position='dodge')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Overlaid Semi-Transparent Histogram"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
" \n",
" "
],
"text/plain": [
""
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mpg_p + geom_histogram(position='identity', alpha=.5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Parameter `threshold`\n",
"\n",
"The `threshold` parameter controls how the `bin` statistic is computed. When `threshold=None` (the default), all bins are retained.\n",
"\n",
"When `threshold` is `0` or another numeric value, bins with counts less than or equal to that value are dropped, but only if they are on the left or right edge of the histogram.\n",
"\n",
"Dropping empty or low-count edge bins is particularly useful for faceted plots with free scales."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"iris_p = ggplot(iris_df, aes(x=\"petal_length\", fill=\"species\")) + facet_wrap(facets=\"species\", scales='free_x')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Default Faceted Histogram with Free X-axis\n",
"\n",
"Despite `scales='free_x'`, the x-axis still appears to be shared across the plot facets."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
" \n",
" "
],
"text/plain": [
""
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"iris_p + geom_histogram(alpha=.5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### With a Threshold"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
" \n",
" "
],
"text/plain": [
""
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"iris_p + geom_histogram(threshold=0, alpha=.5)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
}
},
"nbformat": 4,
"nbformat_minor": 4
}