{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Categorical mapping\n",
"A categorical attribute with $n$ distinct values is mapped into $n$ binary attributes. \n",
"\n",
"It is also possible to map into $n-1$ binary values, where the scenario where all binary attributes are equal to zero corresponds to the last categorical value not indicated in the attributes. "
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Loading required package: daltoolbox\n",
"\n",
"Registered S3 method overwritten by 'quantmod':\n",
" method from\n",
" as.zoo.data.frame zoo \n",
"\n",
"\n",
"Attaching package: ‘daltoolbox’\n",
"\n",
"\n",
"The following object is masked from ‘package:base’:\n",
"\n",
" transform\n",
"\n",
"\n"
]
}
],
"source": [
"# DAL ToolBox\n",
"# version 1.0.777\n",
"\n",
"source(\"https://raw.githubusercontent.com/cefet-rj-dal/daltoolbox/main/jupyter.R\")\n",
"\n",
"#loading DAL\n",
"load_library(\"daltoolbox\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### dataset for example "
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"A data.frame: 6 × 5\n",
"\n",
"\t | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
\n",
"\t | <dbl> | <dbl> | <dbl> | <dbl> | <fct> |
\n",
"\n",
"\n",
"\t1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
\n",
"\t2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
\n",
"\t3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
\n",
"\t4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
\n",
"\t5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
\n",
"\t6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
\n",
"\n",
"
\n"
],
"text/latex": [
"A data.frame: 6 × 5\n",
"\\begin{tabular}{r|lllll}\n",
" & Sepal.Length & Sepal.Width & Petal.Length & Petal.Width & Species\\\\\n",
" & & & & & \\\\\n",
"\\hline\n",
"\t1 & 5.1 & 3.5 & 1.4 & 0.2 & setosa\\\\\n",
"\t2 & 4.9 & 3.0 & 1.4 & 0.2 & setosa\\\\\n",
"\t3 & 4.7 & 3.2 & 1.3 & 0.2 & setosa\\\\\n",
"\t4 & 4.6 & 3.1 & 1.5 & 0.2 & setosa\\\\\n",
"\t5 & 5.0 & 3.6 & 1.4 & 0.2 & setosa\\\\\n",
"\t6 & 5.4 & 3.9 & 1.7 & 0.4 & setosa\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"A data.frame: 6 × 5\n",
"\n",
"| | Sepal.Length <dbl> | Sepal.Width <dbl> | Petal.Length <dbl> | Petal.Width <dbl> | Species <fct> |\n",
"|---|---|---|---|---|---|\n",
"| 1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |\n",
"| 2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |\n",
"| 3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |\n",
"| 4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |\n",
"| 5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |\n",
"| 6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |\n",
"\n"
],
"text/plain": [
" Sepal.Length Sepal.Width Petal.Length Petal.Width Species\n",
"1 5.1 3.5 1.4 0.2 setosa \n",
"2 4.9 3.0 1.4 0.2 setosa \n",
"3 4.7 3.2 1.3 0.2 setosa \n",
"4 4.6 3.1 1.5 0.2 setosa \n",
"5 5.0 3.6 1.4 0.2 setosa \n",
"6 5.4 3.9 1.7 0.4 setosa "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"iris <- datasets::iris\n",
"head(iris)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### creating categorical mapping"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Speciessetosa Speciesversicolor Speciesvirginica\n",
"1 1 0 0\n",
"2 1 0 0\n",
"3 1 0 0\n",
"4 1 0 0\n",
"5 1 0 0\n",
"6 1 0 0\n"
]
}
],
"source": [
"cm <- categ_mapping(\"Species\")\n",
"iris_cm <- transform(cm, iris)\n",
"print(head(iris_cm))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### creating categorical mapping\n",
"Can be made from a single column, but needs to be a data frame"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"A data.frame: 6 × 1\n",
"\n",
"\t | Species |
\n",
"\t | <fct> |
\n",
"\n",
"\n",
"\t1 | setosa |
\n",
"\t2 | setosa |
\n",
"\t3 | setosa |
\n",
"\t4 | setosa |
\n",
"\t5 | setosa |
\n",
"\t6 | setosa |
\n",
"\n",
"
\n"
],
"text/latex": [
"A data.frame: 6 × 1\n",
"\\begin{tabular}{r|l}\n",
" & Species\\\\\n",
" & \\\\\n",
"\\hline\n",
"\t1 & setosa\\\\\n",
"\t2 & setosa\\\\\n",
"\t3 & setosa\\\\\n",
"\t4 & setosa\\\\\n",
"\t5 & setosa\\\\\n",
"\t6 & setosa\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"A data.frame: 6 × 1\n",
"\n",
"| | Species <fct> |\n",
"|---|---|\n",
"| 1 | setosa |\n",
"| 2 | setosa |\n",
"| 3 | setosa |\n",
"| 4 | setosa |\n",
"| 5 | setosa |\n",
"| 6 | setosa |\n",
"\n"
],
"text/plain": [
" Species\n",
"1 setosa \n",
"2 setosa \n",
"3 setosa \n",
"4 setosa \n",
"5 setosa \n",
"6 setosa "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"diris <- iris[,\"Species\", drop=FALSE]\n",
"head(diris)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Speciessetosa Speciesversicolor Speciesvirginica\n",
"1 1 0 0\n",
"2 1 0 0\n",
"3 1 0 0\n",
"4 1 0 0\n",
"5 1 0 0\n",
"6 1 0 0\n"
]
}
],
"source": [
"iris_cm <- transform(cm, diris)\n",
"print(head(iris_cm))"
]
},
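{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sketch: mapping into $n-1$ binary attributes\n",
"The $n-1$ variant mentioned in the introduction is not demonstrated with `categ_mapping` in this notebook. The next cell is a minimal sketch that uses base R's `model.matrix` (from `stats`, not part of daltoolbox) as a stand-in, assuming treatment contrasts. Under this convention the dropped value is the first level (`setosa`) rather than the last, so `setosa` becomes the all-zero case. The names `mm`, `dummies` and `iris_dummy` are illustrative only."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# base R sketch (assumption: stats::model.matrix stands in for an n-1 mapping)\n",
"iris <- datasets::iris\n",
"\n",
"# treatment contrasts yield an intercept plus n-1 indicator columns;\n",
"# dropping the intercept leaves only the n-1 binary attributes\n",
"mm <- model.matrix(~ Species, data = iris,\n",
"                   contrasts.arg = list(Species = \"contr.treatment\"))\n",
"dummies <- mm[, -1, drop = FALSE]\n",
"iris_dummy <- as.data.frame(dummies)\n",
"\n",
"# one row per species (rows 1, 51 and 101): the setosa row is all zeros\n",
"print(iris_dummy[c(1, 51, 101), ])"
]
},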
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "R",
"language": "R",
"name": "ir"
},
"language_info": {
"codemirror_mode": "r",
"file_extension": ".r",
"mimetype": "text/x-r-source",
"name": "R",
"pygments_lexer": "r",
"version": "4.3.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}