{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Homepage: https://spkit.github.io\n",
"\n",
"Nikesh Bajaj: http://nikeshbajaj.in"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Decision Trees without converting Categorical Features using SpKit"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note**: Most ML libraries force us to convert categorical features into one-hot vectors or some other numerical encoding. This should not be necessary, **at least not with Decision Trees**, for a simple reason rooted in how a decision tree works: each node only needs to split the samples, and a split can test membership in a category just as well as a numerical threshold. In the **spkit library**, the Decision Tree can handle mixed-type input features, both 'Categorical' and 'Numerical'. In this notebook, I use the *hurricNamed* dataset from the *vincentarelbundock* GitHub repository and keep only a few features, a mix of categorical and numerical. The number of deaths is converted to a binary label with a threshold of 10, so we treat this as a classification problem. \n",
"\n",
"It has not been shown that converting features into one-hot vectors or label-encoded values affects the performance of the model. Keeping categorical features as they are is nevertheless useful when you need to visualize the decision process, and it is very important when you need to extract and simplify the decision rules."
]
},
{
"cell_type": "markdown",
"metadata": {
"toc": true
},
"source": [
"