{ "cells": [ { "cell_type": "markdown", "id": "disciplinary-circuit", "metadata": {}, "source": [ "# _Categorical_ Data Type\n", "\n", "\n", "`Categoricals` can only take on a limited number of possible values (categories) and
\n", "can be sorted according to the custom order of the categories.\n", "\n", "To use the `Categorical` data type in Lets-Plot, you can either add a `pandas.Categorical` variable to
\n", "your `pandas.DataFrame` or annotate any variable in your dataset as `Categorical` using the
\n", "Lets-Plot `as_discrete()` function and the `levels` parameter.\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "entire-rapid", "metadata": { "execution": { "iopub.execute_input": "2024-04-26T11:45:29.796564Z", "iopub.status.busy": "2024-04-26T11:45:29.796564Z", "iopub.status.idle": "2024-04-26T11:45:30.823826Z", "shell.execute_reply": "2024-04-26T11:45:30.823826Z" } }, "outputs": [], "source": [ "import pandas as pd\n", "\n", "from lets_plot import *\n", "from lets_plot.mapping import as_discrete" ] }, { "cell_type": "code", "execution_count": 2, "id": "southwest-newcastle", "metadata": { "execution": { "iopub.execute_input": "2024-04-26T11:45:30.823826Z", "iopub.status.busy": "2024-04-26T11:45:30.823826Z", "iopub.status.idle": "2024-04-26T11:45:30.839582Z", "shell.execute_reply": "2024-04-26T11:45:30.839582Z" } }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", " \n", " " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "LetsPlot.setup_html()" ] }, { "cell_type": "code", "execution_count": 3, "id": "beautiful-closer", "metadata": { "execution": { "iopub.execute_input": "2024-04-26T11:45:30.839582Z", "iopub.status.busy": "2024-04-26T11:45:30.839582Z", "iopub.status.idle": "2024-04-26T11:45:31.138398Z", "shell.execute_reply": "2024-04-26T11:45:31.138398Z" } }, "outputs": [ { "data": { "text/html": [ "
<table>
<thead>
<tr><th></th><th>Unnamed: 0</th><th>manufacturer</th><th>model</th><th>displ</th><th>year</th><th>cyl</th><th>trans</th><th>drv</th><th>cty</th><th>hwy</th><th>fl</th><th>class</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>1</td><td>audi</td><td>a4</td><td>1.8</td><td>1999</td><td>4</td><td>auto(l5)</td><td>f</td><td>18</td><td>29</td><td>p</td><td>compact</td></tr>
<tr><th>1</th><td>2</td><td>audi</td><td>a4</td><td>1.8</td><td>1999</td><td>4</td><td>manual(m5)</td><td>f</td><td>21</td><td>29</td><td>p</td><td>compact</td></tr>
<tr><th>2</th><td>3</td><td>audi</td><td>a4</td><td>2.0</td><td>2008</td><td>4</td><td>manual(m6)</td><td>f</td><td>20</td><td>31</td><td>p</td><td>compact</td></tr>
<tr><th>3</th><td>4</td><td>audi</td><td>a4</td><td>2.0</td><td>2008</td><td>4</td><td>auto(av)</td><td>f</td><td>21</td><td>30</td><td>p</td><td>compact</td></tr>
</tbody>
</table>
" ], "text/plain": [ " Unnamed: 0 manufacturer model displ year cyl trans drv cty hwy \\\n", "0 1 audi a4 1.8 1999 4 auto(l5) f 18 29 \n", "1 2 audi a4 1.8 1999 4 manual(m5) f 21 29 \n", "2 3 audi a4 2.0 2008 4 manual(m6) f 20 31 \n", "3 4 audi a4 2.0 2008 4 auto(av) f 21 30 \n", "\n", " fl class \n", "0 p compact \n", "1 p compact \n", "2 p compact \n", "3 p compact " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mpg_df = pd.read_csv (\"https://raw.githubusercontent.com/JetBrains/lets-plot-docs/master/data/mpg.csv\")\n", "mpg_df.head(4)" ] }, { "cell_type": "markdown", "id": "714414b3", "metadata": {}, "source": [ "#### 1. Data Type of the \"manufacturer\" is `Unordered Discrete` by Default ." ] }, { "cell_type": "code", "execution_count": 4, "id": "drawn-canadian", "metadata": { "execution": { "iopub.execute_input": "2024-04-26T11:45:31.138398Z", "iopub.status.busy": "2024-04-26T11:45:31.138398Z", "iopub.status.idle": "2024-04-26T11:45:31.250120Z", "shell.execute_reply": "2024-04-26T11:45:31.249323Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ggplot(mpg_df) + geom_bar(aes(x='manufacturer')) + coord_flip()" ] }, { "cell_type": "code", "execution_count": 5, "id": "civilian-stuff", "metadata": { "execution": { "iopub.execute_input": "2024-04-26T11:45:31.250120Z", "iopub.status.busy": "2024-04-26T11:45:31.250120Z", "iopub.status.idle": "2024-04-26T11:45:31.266045Z", "shell.execute_reply": "2024-04-26T11:45:31.265291Z" } }, "outputs": [ { "data": { "text/plain": [ "['dodge',\n", " 'toyota',\n", " 'volkswagen',\n", " 'ford',\n", " 'chevrolet',\n", " 'audi',\n", " 'hyundai',\n", " 'subaru',\n", " 'nissan',\n", " 'honda',\n", " 'jeep',\n", " 'pontiac',\n", " 'land rover',\n", " 'mercury',\n", " 'lincoln']" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# \n", "# Create a list of categories sorted by the number of vehicles in the dataset (most frequent first).\n", "# \n", "\n", "brands_by_count = mpg_df['manufacturer'].value_counts().index.tolist()\n", "brands_by_count" ] }, { "cell_type": "markdown", "id": "eleven-plumbing", "metadata": {}, "source": [ "#### 2. 
First Option: Add a `pandas.Categorical` Variable" ] }, { "cell_type": "code", "execution_count": 6, "id": "coordinate-omaha", "metadata": { "execution": { "iopub.execute_input": "2024-04-26T11:45:31.266045Z", "iopub.status.busy": "2024-04-26T11:45:31.266045Z", "iopub.status.idle": "2024-04-26T11:45:31.282021Z", "shell.execute_reply": "2024-04-26T11:45:31.281198Z" } }, "outputs": [], "source": [ "manufacturer_cat = pd.Categorical(mpg_df['manufacturer'], categories=brands_by_count, ordered=True)\n", "mpg_df['manufacturer_cat'] = manufacturer_cat" ] }, { "cell_type": "code", "execution_count": 7, "id": "published-wagner", "metadata": { "execution": { "iopub.execute_input": "2024-04-26T11:45:31.282021Z", "iopub.status.busy": "2024-04-26T11:45:31.282021Z", "iopub.status.idle": "2024-04-26T11:45:31.312833Z", "shell.execute_reply": "2024-04-26T11:45:31.312833Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ggplot(mpg_df) + \\\n", " geom_bar(aes(x='manufacturer_cat'),\n", " labels=layer_labels(['..count..']).format('..count..', 'd'),\n", " tooltips='none') + \\\n", " coord_flip()" ] }, { "cell_type": "markdown", "id": "temporal-punishment", "metadata": {}, "source": [ "#### 3. Second Option: Annotate \"manufacturer\" as a `Categorical` Using `as_discrete(levels=..)` " ] }, { "cell_type": "code", "execution_count": 8, "id": "durable-staff", "metadata": { "execution": { "iopub.execute_input": "2024-04-26T11:45:31.321435Z", "iopub.status.busy": "2024-04-26T11:45:31.321435Z", "iopub.status.idle": "2024-04-26T11:45:31.344957Z", "shell.execute_reply": "2024-04-26T11:45:31.344957Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ggplot(mpg_df) + \\\n", " geom_bar(aes(x=as_discrete('manufacturer', levels=brands_by_count)),\n", " labels=layer_labels(['..count..']).format('..count..', 'd'),\n", " tooltips='none') + \\\n", " coord_flip()" ] }, { "cell_type": "markdown", "id": "undefined-anthropology", "metadata": {}, "source": [ "#### 4. Faceted Plot with a `Categorical` as a Facet Variable\n", "\n", "When the facet variable is of `Categorical` data type, plot facets are ordered according to the order of categories. " ] }, { "cell_type": "code", "execution_count": 9, "id": "familiar-nowhere", "metadata": { "execution": { "iopub.execute_input": "2024-04-26T11:45:31.344957Z", "iopub.status.busy": "2024-04-26T11:45:31.344957Z", "iopub.status.idle": "2024-04-26T11:45:31.376379Z", "shell.execute_reply": "2024-04-26T11:45:31.376379Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ggplot(mpg_df) + \\\n", " geom_pie(aes(fill='drv', size='..sum..')) + \\\n", " facet_wrap(facets='manufacturer_cat', ncol=5, order=0) + \\\n", " scale_size(range=[2, 10]) + \\\n", " guides(size='none') + \\\n", " theme_void()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.13" } }, "nbformat": 4, "nbformat_minor": 5 }