{ "cells": [ { "cell_type": "markdown", "id": "301891f2", "metadata": {}, "source": [ "# _Categorical_ Data Type\n", "\n", "\n", "A `Categorical` variable can take on only a limited number of possible values (categories)\n", "and can be sorted according to a custom order of those categories.\n", "\n", "To use the `Categorical` data type in Lets-Plot, annotate any variable in your dataset as `Categorical`\n", "with the Lets-Plot `asDiscrete()` function and its `levels` parameter." ] }, { "cell_type": "code", "execution_count": 1, "id": "b808bb87", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%useLatestDescriptors\n", "%use lets-plot\n", "%use dataframe" ] }, { "cell_type": "code", "execution_count": 2, "id": "ce1f82a2", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Lets-Plot Kotlin API v.4.6.0. Frontend: Notebook with dynamically loaded JS. Lets-Plot JS v.4.2.0." ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "LetsPlot.getInfo()" ] }, { "cell_type": "code", "execution_count": 3, "id": "7decce2b", "metadata": {}, "outputs": [ { "data": { "application/kotlindataframe+json": "{\"nrow\":3,\"ncol\":12,\"columns\":[\"untitled\",\"manufacturer\",\"model\",\"displ\",\"year\",\"cyl\",\"trans\",\"drv\",\"cty\",\"hwy\",\"fl\",\"class\"],\"kotlin_dataframe\":[{\"untitled\":1,\"manufacturer\":\"audi\",\"model\":\"a4\",\"displ\":1.8,\"year\":1999,\"cyl\":4,\"trans\":\"auto(l5)\",\"drv\":\"f\",\"cty\":18,\"hwy\":29,\"fl\":\"p\",\"class\":\"compact\"},{\"untitled\":2,\"manufacturer\":\"audi\",\"model\":\"a4\",\"displ\":1.8,\"year\":1999,\"cyl\":4,\"trans\":\"manual(m5)\",\"drv\":\"f\",\"cty\":21,\"hwy\":29,\"fl\":\"p\",\"class\":\"compact\"},{\"untitled\":3,\"manufacturer\":\"audi\",\"model\":\"a4\",\"displ\":2.0,\"year\":2008,\"cyl\":4,\"trans\":\"manual(m6)\",\"drv\":\"f\",\"cty\":20,\"hwy\":31,\"fl\":\"p\",\"class\":\"compact\"}]}", "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "\n", "

DataFrame: rowsCount = 3, columnsCount = 12

\n", "
untitled  manufacturer  model  displ     year  cyl  trans       drv  cty  hwy  fl  class
1         audi          a4     1.800000  1999  4    auto(l5)    f    18   29   p   compact
2         audi          a4     1.800000  1999  4    manual(m5)  f    21   29   p   compact
3         audi          a4     2.000000  2008  4    manual(m6)  f    20   31   p   compact
\n", " \n", " \n", " " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "// Load the mpg dataset and preview its first three rows\n", "val mpg = DataFrame.readCSV(\"https://raw.githubusercontent.com/JetBrains/lets-plot-docs/master/data/mpg.csv\")\n", "mpg.head(3)" ] }, { "cell_type": "code", "execution_count": 4, "id": "e462696a", "metadata": {}, "outputs": [], "source": [ "// Convert the DataFrame to a map (column name -> column values) that letsPlot() accepts as data\n", "val mpgData = mpg.toMap()" ] }, { "cell_type": "markdown", "id": "5360fc9a", "metadata": {}, "source": [ "#### 1. By Default, the Data Type of \"manufacturer\" Is `Unordered Discrete`" ] }, { "cell_type": "code", "execution_count": 5, "id": "6d24e699", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "letsPlot(mpgData) + geomBar() { x = \"manufacturer\" } + coordFlip()" ] }, { "cell_type": "code", "execution_count": 6, "id": "0af44c65", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "dodge\n", "toyota\n", "volkswagen\n", "ford\n", "chevrolet\n", "audi\n", "hyundai\n", "subaru\n", "nissan\n", "honda\n", "jeep\n", "pontiac\n", "land rover\n", "mercury\n", "lincoln\n" ] } ], "source": [ "// \n", "// Create a list of manufacturers sorted by the number of vehicles in the dataset (most frequent first).\n", "// \n", "\n", "val brandsByCount = mpg.valueCounts { manufacturer }.manufacturer.toList()\n", "brandsByCount.forEach(::println)" ] }, { "cell_type": "markdown", "id": "298c25fd", "metadata": {}, "source": [ "#### 2. Annotate \"manufacturer\" as a `Categorical` Using `asDiscrete(levels = ...)`" ] }, { "cell_type": "code", "execution_count": 7, "id": "187a9fba", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "letsPlot(mpgData) + \n", " // Label each bar with its count formatted as an integer and disable tooltips\n", " geomBar(labels = layerLabels(\"..count..\").format(\"..count..\", \"d\"), tooltips = tooltipsNone) {\n", " // Order the manufacturer categories as listed in brandsByCount\n", " x = asDiscrete(\"manufacturer\", levels = brandsByCount) \n", " } + \n", " coordFlip()" ] }, { "cell_type": "markdown", "id": "bright-consequence", "metadata": {}, "source": [ "#### 3. Faceted Plots Respect the Order of Categories in the Facet Variable\n", "\n", "However, annotating a facet variable with `asDiscrete()` takes a workaround: facets are not aesthetics, so the variable has to be mapped to an otherwise unused aesthetic (here, `slope`)." ] }, { "cell_type": "code", "execution_count": 8, "id": "patent-edmonton", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "letsPlot(data = mpgData) {\n", " // Map \"manufacturer\" to an otherwise unused aesthetic (slope) solely to annotate it as Categorical.\n", " slope = asDiscrete(\"manufacturer\", levels = brandsByCount) \n", " } +\n", " geomPie() { \n", " // Slices by drive type; pie size scaled by the total vehicle count per manufacturer\n", " fill = \"drv\"\n", " size = \"..sum..\" \n", " } +\n", " // order = 0: no sorting of panels, so they follow the category order of \"manufacturer\"\n", " facetWrap(facets = \"manufacturer\", ncol = 5, order = 0) + \n", " scaleSize(range = 2 to 10) +\n", " guides(size = \"none\") +\n", " themeVoid()\n" ] } ], "metadata": { "kernelspec": { "display_name": "Kotlin", "language": "kotlin", "name": "kotlin" }, "language_info": { "codemirror_mode": "text/x-kotlin", "file_extension": ".kt", "mimetype": "text/x-kotlin", "name": "kotlin", "nbconvert_exporter": "", "pygments_lexer": "kotlin", "version": "1.8.20" } }, "nbformat": 4, "nbformat_minor": 5 }