{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Categorical mapping\n", "A categorical attribute with $n$ distinct values is mapped into $n$ binary attributes, one per value (one-hot encoding). \n", "\n", "It is also possible to map into $n-1$ binary attributes. In that case, the row where all binary attributes are zero represents the one categorical value that has no attribute of its own. " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Loading required package: daltoolbox\n", "\n", "Registered S3 method overwritten by 'quantmod':\n", " method from\n", " as.zoo.data.frame zoo \n", "\n", "\n", "Attaching package: ‘daltoolbox’\n", "\n", "\n", "The following object is masked from ‘package:base’:\n", "\n", " transform\n", "\n", "\n" ] } ], "source": [ "# DAL ToolBox\n", "# version 1.0.777\n", "\n", "source(\"https://raw.githubusercontent.com/cefet-rj-dal/daltoolbox/main/jupyter.R\")\n", "\n", "# loading DAL\n", "load_library(\"daltoolbox\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### example dataset" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
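A data.frame: 6 × 5

| | Sepal.Length <dbl> | Sepal.Width <dbl> | Petal.Length <dbl> | Petal.Width <dbl> | Species <fct> |
|---|---|---|---|---|---|
| 1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |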
\n" ], "text/latex": [ "A data.frame: 6 × 5\n", "\\begin{tabular}{r|lllll}\n", " & Sepal.Length & Sepal.Width & Petal.Length & Petal.Width & Species\\\\\n", " & & & & & \\\\\n", "\\hline\n", "\t1 & 5.1 & 3.5 & 1.4 & 0.2 & setosa\\\\\n", "\t2 & 4.9 & 3.0 & 1.4 & 0.2 & setosa\\\\\n", "\t3 & 4.7 & 3.2 & 1.3 & 0.2 & setosa\\\\\n", "\t4 & 4.6 & 3.1 & 1.5 & 0.2 & setosa\\\\\n", "\t5 & 5.0 & 3.6 & 1.4 & 0.2 & setosa\\\\\n", "\t6 & 5.4 & 3.9 & 1.7 & 0.4 & setosa\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 6 × 5\n", "\n", "| | Sepal.Length <dbl> | Sepal.Width <dbl> | Petal.Length <dbl> | Petal.Width <dbl> | Species <fct> |\n", "|---|---|---|---|---|---|\n", "| 1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |\n", "| 2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |\n", "| 3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |\n", "| 4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |\n", "| 5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |\n", "| 6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |\n", "\n" ], "text/plain": [ " Sepal.Length Sepal.Width Petal.Length Petal.Width Species\n", "1 5.1 3.5 1.4 0.2 setosa \n", "2 4.9 3.0 1.4 0.2 setosa \n", "3 4.7 3.2 1.3 0.2 setosa \n", "4 4.6 3.1 1.5 0.2 setosa \n", "5 5.0 3.6 1.4 0.2 setosa \n", "6 5.4 3.9 1.7 0.4 setosa " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "iris <- datasets::iris\n", "head(iris)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### creating categorical mapping" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Speciessetosa Speciesversicolor Speciesvirginica\n", "1 1 0 0\n", "2 1 0 0\n", "3 1 0 0\n", "4 1 0 0\n", "5 1 0 0\n", "6 1 0 0\n" ] } ], "source": [ "cm <- categ_mapping(\"Species\")\n", "iris_cm <- transform(cm, iris)\n", "print(head(iris_cm))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### creating categorical mapping from a single column\n", "The mapping can also be applied to a single column, as long as that column is kept as a data frame (hence `drop=FALSE`)." ] }, { "cell_type": "code", 
"execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A data.frame: 6 × 1

| | Species <fct> |
|---|---|
| 1 | setosa |
| 2 | setosa |
| 3 | setosa |
| 4 | setosa |
| 5 | setosa |
| 6 | setosa |
\n" ], "text/latex": [ "A data.frame: 6 × 1\n", "\\begin{tabular}{r|l}\n", " & Species\\\\\n", " & \\\\\n", "\\hline\n", "\t1 & setosa\\\\\n", "\t2 & setosa\\\\\n", "\t3 & setosa\\\\\n", "\t4 & setosa\\\\\n", "\t5 & setosa\\\\\n", "\t6 & setosa\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 6 × 1\n", "\n", "| | Species <fct> |\n", "|---|---|\n", "| 1 | setosa |\n", "| 2 | setosa |\n", "| 3 | setosa |\n", "| 4 | setosa |\n", "| 5 | setosa |\n", "| 6 | setosa |\n", "\n" ], "text/plain": [ " Species\n", "1 setosa \n", "2 setosa \n", "3 setosa \n", "4 setosa \n", "5 setosa \n", "6 setosa " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "diris <- iris[,\"Species\", drop=FALSE]\n", "head(diris)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Speciessetosa Speciesversicolor Speciesvirginica\n", "1 1 0 0\n", "2 1 0 0\n", "3 1 0 0\n", "4 1 0 0\n", "5 1 0 0\n", "6 1 0 0\n" ] } ], "source": [ "iris_cm <- transform(cm, diris)\n", "print(head(iris_cm))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "4.3.3" } }, "nbformat": 4, "nbformat_minor": 4 }