{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# OrdinalEncoder\n",
"The OrdinalEncoder() replaces the categories of a variable with integers, from 0 to k-1, where k is the number of distinct categories.\n",
"\n",
"If we select \"arbitrary\", the encoder assigns the integers in the order in which the categories first appear in the variable (first come, first served).\n",
"\n",
"If we select \"ordered\", the encoder assigns the integers according to the mean of the target per category, in ascending order. The category with the smallest target mean gets 0, and the category with the largest target mean gets k-1."
]
},
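{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the \"arbitrary\" option concrete, the next cell sketches the first-come-first-served numbering in plain pandas. This is a minimal sketch under the assumption that categories are numbered in order of first appearance; it is not feature_engine's own code, and the names `toy` and `arbitrary_map` are invented for the example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Toy column; the values are invented for illustration.\n",
"toy = pd.Series([\"S\", \"C\", \"S\", \"Q\", \"C\", \"S\"], name=\"embarked\")\n",
"\n",
"# Number the categories in order of first appearance: 0, 1, 2, ...\n",
"arbitrary_map = {cat: i for i, cat in enumerate(toy.unique())}\n",
"\n",
"# Replace each category with its integer code.\n",
"encoded = toy.map(arbitrary_map)\n",
"\n",
"print(arbitrary_map)\n",
"print(encoded.tolist())"
]
},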
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"from sklearn.model_selection import train_test_split\n",
"from feature_engine.encoding import OrdinalEncoder"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# Load the Titanic dataset from OpenML\n",
"\n",
"def load_titanic():\n",
"    data = pd.read_csv('https://www.openml.org/data/get_csv/16826755/phpMYEkMl')\n",
"    data = data.replace('?', np.nan)\n",
"    # keep only the first character of cabin; missing values become the label 'n'\n",
"    data['cabin'] = data['cabin'].astype(str).str[0]\n",
"    data['pclass'] = data['pclass'].astype('O')\n",
"    data['age'] = data['age'].astype('float')\n",
"    data['fare'] = data['fare'].astype('float')\n",
"    # assign instead of chained inplace fillna, which recent pandas versions warn about\n",
"    data['embarked'] = data['embarked'].fillna('C')\n",
"    data = data.drop(labels=['boat', 'body', 'home.dest'], axis=1)\n",
"    return data"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" pclass survived name sex \\\n",
"0 1 1 Allen, Miss. Elisabeth Walton female \n",
"1 1 1 Allison, Master. Hudson Trevor male \n",
"2 1 0 Allison, Miss. Helen Loraine female \n",
"3 1 0 Allison, Mr. Hudson Joshua Creighton male \n",
"4 1 0 Allison, Mrs. Hudson J C (Bessie Waldo Daniels) female \n",
"\n",
" age sibsp parch ticket fare cabin embarked \n",
"0 29.0000 0 0 24160 211.3375 B S \n",
"1 0.9167 1 2 113781 151.5500 C S \n",
"2 2.0000 1 2 113781 151.5500 C S \n",
"3 30.0000 1 2 113781 151.5500 C S \n",
"4 25.0000 1 2 113781 151.5500 C S "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data = load_titanic()\n",
"data.head()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"X = data.drop(['survived', 'name', 'ticket'], axis=1)\n",
"y = data.survived"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"cabin 0\n",
"pclass 0\n",
"embarked 0\n",
"dtype: int64"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# we will encode the variables below; first check that they have no missing values\n",
"X[['cabin', 'pclass', 'embarked']].isnull().sum()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"cabin object\n",
"pclass object\n",
"embarked object\n",
"dtype: object"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Make sure that the variables are of type object. If they are not, cast them\n",
"# as object; otherwise the transformer will either raise an error (if we pass\n",
"# them in the variables argument) or not pick them up (if we leave variables=None).\n",
"\n",
"X[['cabin', 'pclass', 'embarked']].dtypes"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"((916, 8), (393, 8))"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# let's separate into training and testing set\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)\n",
"\n",
"X_train.shape, X_test.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The OrdinalEncoder() replaces categories by ordinal numbers \n",
"(0, 1, 2, 3, etc). The numbers can be ordered based on the mean of the target\n",
"per category, or assigned arbitrarily.\n",
"\n",
"Ordered ordinal encoding: for the variable colour, if the mean of the target\n",
"for blue, red and grey is 0.5, 0.8 and 0.1 respectively, blue is replaced by 1,\n",
"red by 2 and grey by 0.\n",
"\n",
"Arbitrary ordinal encoding: the numbers will be assigned arbitrarily to the\n",
"categories, on a first seen first served basis.\n",
"\n",
"The encoder will encode only categorical variables (type 'object'). A list\n",
"of variables can be passed as an argument. If no variables are passed, the\n",
"encoder will find and encode all categorical variables (type 'object').\n"
]
},
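{
"cell_type": "markdown",
"metadata": {},
"source": [
"The colour example above can be sketched in plain pandas. The toy data below is invented so that the target means match the ones in the text (blue 0.5, red 0.8, grey 0.1); `toy` and `ordered_map` are hypothetical names, and the logic is an assumption about what the ordered method does, not the library's implementation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Toy data built so that the target means are blue=0.5, red=0.8, grey=0.1.\n",
"toy = pd.DataFrame({\n",
"    \"colour\": [\"blue\"] * 10 + [\"red\"] * 10 + [\"grey\"] * 10,\n",
"    \"target\": [1] * 5 + [0] * 5 + [1] * 8 + [0] * 2 + [1] * 1 + [0] * 9,\n",
"})\n",
"\n",
"# Mean of the target per category, sorted from smallest to largest.\n",
"means = toy.groupby(\"colour\")[\"target\"].mean().sort_values()\n",
"\n",
"# The position of each category in that ordering becomes its code.\n",
"ordered_map = {cat: i for i, cat in enumerate(means.index)}\n",
"\n",
"print(ordered_map)"
]
},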
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Ordered"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"OrdinalEncoder(variables=['pclass', 'cabin', 'embarked'])"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# we will encode 3 variables:\n",
"'''\n",
"Parameters\n",
"----------\n",
"\n",
"encoding_method : str, default='ordered' \n",
" Desired method of encoding.\n",
"\n",
" 'ordered': the categories are numbered in ascending order according to\n",
" the target mean value per category.\n",
"\n",
" 'arbitrary' : categories are numbered arbitrarily.\n",
" \n",
"variables : list, default=None\n",
" The list of categorical variables that will be encoded. If None, the \n",
" encoder will find and select all object type variables.\n",
"'''\n",
"ordinal_enc = OrdinalEncoder(encoding_method='ordered',\n",
" variables=['pclass', 'cabin', 'embarked'])\n",
"\n",
"# for this encoder, we need to pass the target as argument\n",
"# if encoding_method='ordered'\n",
"ordinal_enc.fit(X_train, y_train)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'pclass': {3: 0, 2: 1, 1: 2},\n",
" 'cabin': {'T': 0,\n",
" 'n': 1,\n",
" 'G': 2,\n",
" 'A': 3,\n",
" 'C': 4,\n",
" 'F': 5,\n",
" 'D': 6,\n",
" 'E': 7,\n",
" 'B': 8},\n",
" 'embarked': {'S': 0, 'Q': 1, 'C': 2}}"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ordinal_enc.encoder_dict_"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" pclass sex age sibsp parch fare cabin embarked\n",
"271 2 male 24.0 1 0 82.2667 8 0\n",
"61 2 female 76.0 1 0 78.8500 4 0\n",
"1280 0 male 22.0 0 0 7.8958 1 0\n",
"247 2 female 54.0 1 0 59.4000 1 2\n",
"361 1 female 22.0 1 1 29.0000 1 0"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# transform and visualise the data\n",
"\n",
"train_t = ordinal_enc.transform(X_train)\n",
"test_t = ordinal_enc.transform(X_test)\n",
"\n",
"test_t.sample(5)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAbkAAAFNCAYAAACdVxEnAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3deZxddX3/8debSQIBspCZkIHsmYQliYAwBJIgiywJaItVq6DFqmhMW1Rq9SfaDbUttvXXn6i0NLWUH7aVWqu/phYzKEqwJGAGCZhJAJOwZEhmspGNrJP5/P44Z4abyyw3Yc4s576fj8c8cs9yz/ncOzfzvufcz/leRQRmZmZ5dFxfF2BmZpYVh5yZmeWWQ87MzHLLIWdmZrnlkDMzs9xyyJmZWW455KxLkh6W9JFjvO8ESXskVfR0XQX7uF3SP3exvEHS5ce47ZA09ZiLy6mB+rxImpTWPqiva7He45ArA5JekLQvDZwmSfdKOjmj/VzVNh0RL0XEyRFxuKf3VaqImBERD/f2fgdqEJSz4tdvue0/rxxy5ePXIuJk4DzgzcDn+rges1zJ8oyFHTuHXJmJiCagjiTsAJB0saRlknZIeqqz03uSaiT9RNI2SVsl/YukkemybwETgP9Kjxj/V/HpIUmnS1osabuktZI+WrDt2yV9R9J9knanpxlrC5Z/VtLL6bJnJV1ZUNqQLu7X/u443cd3Jf1buu4vJJ3bzVN2naT16eP9a0nt/2ckfVjSGkmvSKqTNDGd/0i6ylPpc/FeSUslvStdfkn6vFyXTl8laWV3202XnSXpR+lz+Kyk9xQsu1fSXZL+O318j0uq6eR32fa7WSBpo6RNkv6gYHmFpM9LWpdu6wlJ4zvYztskPSlpl6QNkm4vWHaCpH9OXy87JK2QNCZd9sH0ed0t6XlJ7++kzlmSlqf33yTpG5KGFCwPSQsl/Sp9vu6SpILH8JX0d7ceeFuHv2E6fv2m8/9dydmPnZIekTSj6Pn+O0kPSHoVuELS+enzsTu9779J+rOC+7xd0sr08SyTdE5X+7ceEBH+yfkP8AJwVXp7HPBL4M50eiywDbiO5E3P1en06HT5w8BH0ttT0+XHA6OBR4CvdrSfdHoSEMCgdHop8LfACSQhuwW4Ml12O7A/raMCuAN4LF12JrABOL1guzXd3a+Dx347cAh4NzAY+DTwPDC4k+ctgJ8Co0j+AD1X8Fy8A1gLnA0MAv4IWFZ036kF018Evp7e/jywDvjLgmV3drdd4KT0efhQuux8YCswI11+L7AdmJUu/xfg/k4eW9vv5tvpdt+U/j7anqvPkLxOzgQEnAtUFj824PL0vscB5wDNwDvSZR8D/gs4Mf3dXAAMT/e3CzgzXe+0tsfQQZ0XABenj2cSsAa4teh5/gEwMv0dbQHmp8sWAs8A49Pf4U8peD129f+kYN6HgWEkr/mvAisLlt0L7ATmpo9/OPAi8EmS19c7gYPAn6Xrnw9sBi5Kn4/fTvd5fGf7908P/P3r6wL80wu/5OQ/zx5gd/qf/CFgZLrss8C3itavA347vf0w6R/2Drb7DuDJov10GHLpH5rDwLCC5XcA96a3bwd+XLBsOrAvvT01/eNwFUWB1NX9imtK1y0MwOOATcBbOnl80fYHM53+XeCh9PYPgZuLtrUXmFhw38KQuxJ4Or29BPgIr4X4UuCd3W0XeC/ws6Ia/x740/T2vcA3C5ZdBzzTyWNr+92cVTDvr4B/TG8/C1zfxfMytZNlXwX+T3r7w8Ay4JyidU4CdgDvAoYe5Wv5VuD7RbVcUjD9HeC29PZPgIUFy67hKEOuaPnI9P4jCp7v+wqWXwq8DKhg3v/wWsj9HfClom0+C1xWyv79c2w/Pl1ZPt4REcNI3nmfBVSl8ycCv5mePtkhaQdwCcm76yNIOlXS/UpOG+4C/rlgO905HdgeEbsL5r1IciTZpqng9l7gBE
mDImItyR+324HNaQ2nd3e/TurY0HYjIlqBxrS2zmwouP1iwboTgTsLnrPtJEc8Y+nYcuCM9HTdecB9wHhJVSRHXm2nOLva7kTgoqLf1fuB6oL9FD8X3TUYdfb4xpMcbXZJ0kWSfippi6SdJEdPba+Jb5G8Ybo/PSX6V5IGR8SrJIG9ENiUnl49q5PtnyHpB+kpw13AX/D611xnj/n0Dh5fydLTnV9OT9nuIgkhivZfuP3TgZcjTawOlk8E/qDo9zeerl9/9gY55MpMRCwleQf6lXTWBpIjuZEFPydFxJc7uPsdJO9kz4mI4cBvkfwBbt98F7veCIySNKxg3gSSd76l1P2vEXEJyR+KAP6ylPt1oP1zJSWfr41La+t2fZJ629bdAHys6HkbGhHLOql/L/AEyamsVRFxkOQo51PAuojYWsJ2NwBLi5adHBG/c9TPQmmPr8PP84r8K7AYGB8RI4C7SV8TEXEoIr4QEdOBOcDbgQ+ky+oi4mqSN1PPAP/Qyfb/Ll0+LX3NfZ4jX3Nd2dTB4+tK8ev3fcD1JGcQRpAc/ULnr/lNwNi2zwRThfvfAPx50e/vxIj4dif7tx7gkCtPXwWulnQeydHYr0mal75zPUHS5ZLGdXC/YSSnPXdIGkvyuU2hZmBKRzuMiA0kf9TvSPdxDnAzyedGXZJ0pqS3Sjqe5PO3fSSnPo/FBZLemR7p3QocAB7rYv3PSDolbbr4JPBv6fy7gc+1NSJIGiHpNwvu19FzsRS4Jf0XklPBhdPdbfcHJEeDN0kanP5cKOnskh/96/2xpBPT/X2o4PF9E/iSpGlKnCOpsoP7DyM5Qt8vaRZJMJDWfoWkNynpOtxF8nnoYUljJP26pJNInv89dP77HJbed096tHc0gf4d4BOSxkk6Bbitm/WLf2fD0vq2kXyu+Bfd3H85yeO4RdIgSdeTHKW3+QdgYXr0K0knKWncaXvj1+n/Hzt2DrkyFBFbSE6X/XEaPteTvEPeQvJu8zN0/Nr4AsmH5zuB/wa+V7T8DuCP0lMxn+7g/jeSvBveCHyf5LOkH5VQ8vHAl0maLJqAU9N6j8V/kpwqewW4ieSzsEPdrP8EsJLkMf8jQER8n+Ro8v70VNYq4NqC+90O/N/0uWjrgFxK8ofzkU6mu9xueqr3GuAGkuewKV33+KN9EgosJWl0eQj4SkQ8mM7/G5KQeJAkZP4RGNrB/X8X+KKk3cCfpPdpUw18N73/mnRf/0zy2vqD9DFsBy5Lt9ORT5ME526SkPi3TtbryD+QnC59CvgFr3+9Fit+/d5HcorzZWA1Xb8ZIj06fyfJm7cdJGc6fkASlEREPfBR4Bskr7+1wAe72L/1AB15+tgsv5S0t0+NiN/q61r6mqRJvNZZ2tK31eSXpMeBuyPin/q6lnLlIzkzsx4i6TJJ1enpyt8muaxiSV/XVc4yCzlJ90jaLGlVJ8sl6WtKLgp+WtL5WdViZtZLziQ5PbqT5JTsuyNiU9+WVN4yO10p6VKSD5Tvi4iZHSy/Dvg4ybU8F5FcDHtRJsWYmVlZyuxILiIeIflQuTPXkwRgRMRjwEhJr7s2y8zM7Fj15WdyYznyQslGOr+Q1szM7Kj15fcqdXRBZ4fnTiUtABYAnHTSSRecdVaHgyOYmVmZeuKJJ7ZGxOji+X0Zco0cORpApyNPRMQiYBFAbW1t1NfXZ1+dmZkNGJI6HLatL09XLgY+kHZZXgzsdBeSmZn1pMyO5CR9m2Qw4CpJjcCfknz9BBFxN/AASWflWpJBVT+UVS1mZlaeMgu5iLixm+UB/F5W+zczM/OIJ2ZmllsOOTMzyy2HnJmZ5ZZDzszMcsshZ2ZmueWQMzOz3HLImZlZbjnkzMwstxxyZmaWWw45MzPLLYecmZnllkPOzMxyyyFnZma55ZAzM7PccsiZmVluOeTMzCy3HHJmZpZbDjkzM8utTENO0nxJz0paK+m2DpafIun7kp6W9HNJM7Osx8zMyktmISepAr
gLuBaYDtwoaXrRap8HVkbEOcAHgDuzqsfMzMpPlkdys4C1EbE+Ig4C9wPXF60zHXgIICKeASZJGpNhTWZmVkayDLmxwIaC6cZ0XqGngHcCSJoFTATGZViTmZmVkSxDTh3Mi6LpLwOnSFoJfBx4Emh53YakBZLqJdVv2bKl5ys1M7NcGpThthuB8QXT44CNhStExC7gQwCSBDyf/lC03iJgEUBtbW1xUJqZmXUoyyO5FcA0SZMlDQFuABYXriBpZLoM4CPAI2nwmZmZvWGZHclFRIukW4A6oAK4JyIaJC1Ml98NnA3cJ+kwsBq4Oat6zMys/GR5upKIeAB4oGje3QW3lwPTsqzBzMzKl0c8MTOz3HLImZlZbjnkzMwstxxyZmaWWw45MzPLLYecmZnllkPOzMxyyyFnZma55ZAzM7PccsiZmVluOeTMzCy3HHJmZpZbDjkzM8sth5yZmeWWQ87MzHLLIWdmZrnlkDMzs9xyyJmZWW455MzMLLcyDTlJ8yU9K2mtpNs6WD5C0n9JekpSg6QPZVmPmZmVl8xCTlIFcBdwLTAduFHS9KLVfg9YHRHnApcD/1vSkKxqMjOz8pLlkdwsYG1ErI+Ig8D9wPVF6wQwTJKAk4HtQEuGNZmZWRnJMuTGAhsKphvTeYW+AZwNbAR+CXwyIlqLNyRpgaR6SfVbtmzJql4zM8uZLENOHcyLoul5wErgdOA84BuShr/uThGLIqI2ImpHjx7d85WamVkuZRlyjcD4gulxJEdshT4EfC8Sa4HngbMyrMnMzMpIliG3ApgmaXLaTHIDsLhonZeAKwEkjQHOBNZnWJOZmZWRQVltOCJaJN0C1AEVwD0R0SBpYbr8buBLwL2SfklyevOzEbE1q5rMzKy8ZBZyABHxAPBA0by7C25vBK7JsgYzMytfHvHEzMxyyyFnZma55ZAzM7PccsiZmVluOeTMzCy3HHJmZpZbDjkzM8sth5yZmeWWQ87MzHLLIWdmZrnlkDMzs9xyyJmZWW455MzMLLcccmZmllsOOTMzyy2HnJmZ5ZZDzszMcsshZ2ZmuZVpyEmaL+lZSWsl3dbB8s9IWpn+rJJ0WNKoLGsyM7PykVnISaoA7gKuBaYDN0qaXrhORPx1RJwXEecBnwOWRsT2rGoyM7PykuWR3CxgbUSsj4iDwP3A9V2sfyPw7QzrMTOzMpNlyI0FNhRMN6bzXkfSicB84D8yrMfMzMpMliGnDuZFJ+v+GvBoZ6cqJS2QVC+pfsuWLT1WoJmZ5VuWIdcIjC+YHgds7GTdG+jiVGVELIqI2oioHT16dA+WaGZmeZZlyK0ApkmaLGkISZAtLl5J0gjgMuA/M6zFzMzK0KCsNhwRLZJuAeqACuCeiGiQtDBdfne66m8AD0bEq1nVYmZm5UkRnX1M1j/V1tZGfX19X5dhZmb9iKQnIqK2eL5HPDEzs9xyyJmZWW455MzMLLcccmZmllsOOTMzyy2HnJmZ5ZZDzszMcsshZ2ZmueWQMzOz3HLImZlZbjnkzMwstxxyZmaWWw45MzPLrW5DTtLcUuaZmZn1N6UcyX29xHlmZmYlefVACw/8chPffaIx0/10+qWpkmYDc4DRkj5VsGg4yZegmpmZlWzH3oP8eM1mlqxq4me/2sKBllbOqh7Guy8Yl9k+u/pm8CHAyek6wwrm7wLenVlFZmaWG0079/Pg6ibqGpp4bP12DrcGp404gRtnTWDejGounHRKpvvvNOQiYimwVNK9EfGipJMi4tVMqzEzswHv+a2vsmRVEmwrN+wAYMrok/jYpVOYP7OaN40dgaReqaWrI7k2p0v6IclR3QRJ5wIfi4jf7e6OkuYDd5Kc3vxmRHy5g3UuB74KDAa2RsRlR1G/mZn1sYigYeMuHmxoYklDE8817wHgTWNH8Jl5ZzJvxhimnjqsm61ko5SQ+yowD1gMEBFPSbq0uztJqgDuAq4GGoEVkhZHxOqCdUYCfwvMj4iXJJ16DI/BzM
x62eHW4BcvvULdqiTYGl/Zx3GCCyeN4k/ePp1rZoxh3Ckn9nWZJYUcEbGh6NDycAl3mwWsjYj1AJLuB64HVhes8z7gexHxUrqfzaXUY2Zmve9gSyvL1m2lrqGZH61uYuuegwypOI65Uyv5+FunctXZY6g8+fi+LvMIpYTcBklzgJA0BPgEsKaE+40FNhRMNwIXFa1zBjBY0sMkzS13RsR9JWzbzMx6wd6DLSx9dgtLGpr4yTOb2b2/hROHVHDFWacyb0Y1V5w5mmEnDO7rMjtVSsgtJPlcbSxJUD0I/F4J9+voU8XoYP8XAFcCQ4Hlkh6LiOeO2JC0AFgAMGHChBJ2bWZmx6qt1b+uoYlHnkta/U85cTDzZ1Qzf2Y1c6dWccLggXElWbchFxFbgfcfw7YbgfEF0+OAjR2sszXt2nxV0iPAucARIRcRi4BFALW1tcVBaWZmb1Dzrv3tjSOdtfoPqhh4I0F2G3KSvtbB7J1AfUT8Zxd3XQFMkzQZeBm4geQzuEL/CXxD0iCS6/IuAv5PKYWbmdkb8/zWV6lrSFr9n3zpyFb/eTOqOWdc77X6Z6WU05UnAGcB/55OvwtoAG6WdEVE3NrRnSKiRdItQB3JJQT3RESDpIXp8rsjYo2kJcDTQCvJZQar3thDMjOzjkQEqzftoq6hmbpVTTzbvBtIWv0/fc0ZzJ9Z3Wet/llRRNdn/yT9BLgmIlrS6UEkn8tdDfwyIqZnXmWB2traqK+v781dmpkNWK1pq/+SVU3UrW5iw/ak1b920ijmz6juN63+b5SkJyKitnh+KUdyY4GTSE5Rkt4+PSIOSzrQgzWamVkPONjSyvL126hraOLBhma27jnQ3ur/e5dP5arpY6jqZ63+WSkl5P4KWJm2+Qu4FPgLSScBP86wNjMzK1Fbq39dQxMPFbb6n3kq82b2/1b/rHQZcpKOI7kmbg7Jxd0CPh8RbV2Sn8m2PDMz68yOvQd5aM1mlhS0+o9MW/3nzajmkmkDp9U/K12GXES0SvrfETGbpBPSzMz6UPOu/Ty4OmkcWb5+G4dbg+rhJ3DDheOZN7OaWZNGDchW/6yUcrryQUnvIhl+y9eomZn1shfSVv8lha3+VSexoK3Vf+wIjjtuYLf6Z6WUkPsUSbNJi6T9JKcsIyKGZ1qZmVmZigjWbNrNkoYmHmxo4pmmpNV/5tjh/MHVba3+Jw/4a9h6QykjnuTrogkzs36ordW/7Yhtw/Z9KB3V/4/fPp1rpo9h/KiB3+rf20r6FgJJpwDTSC4MByAiHsmqKDOzcnCwpZXH1m9jSUMTP1rdzJbdBxhcIeZOrSq7Vv+slDKs10eAT5KMPbkSuBhYDrw129LMzPJn78EWHnluC3UNzfx4TfMRrf7XzBjDFWedyvAybPXPSilHcp8ELgQei4grJJ0FfCHbsszM8mPn3kM89EwzS1Y18civtrD/UNLqP29GNfPd6p+pUkJuf0Tsl4Sk4yPiGUlnZl6ZmdkAtnnXfupWN/NgQxPL122jJW31f2/teObNqGbWZLf694ZSQq5R0kjg/wE/kvQKr//KHDOzsvfitrTVf1UTT27YQQRMrjqJj7xlCvNnutW/L5TSXfkb6c3bJf0UGAH8MNOqzMwGgLZW/7avq2lr9Z9x+nA+ddUZzJtZzTS3+vepUhpPvhURNwFExNK2ecBNGddmZtbvtLYGT25IR/VvaOal7XuTVv+Jo/ijt53NvBnVbvXvR0o5XTmjcEJSBXBBNuWYmfU/hw63snxdOqp/Qav/nJoqfufyGq46ewyjh7nVvz/qNOQkfQ74PDBU0q622cBBYFEv1GZm1mf2HTzM0ue28GBDEz9e08yu/S0MHVzBFWeNZt6Marf6DxCdhlxE3AHcIemOiPhcL9ZkZtYn2lr96xqaWPrca63+16Sj+r/Frf4DTimNJw44M8utzW2j+he1+r+ndjzz3eo/4JU0rNexkjQfuBOoAL4ZEV8uWn45yVf4PJ/O+l5EfD
HLmszM2lr96xqa+cVLrxzR6j9vxhjOHTfSrf450dVncpMj4vnOlncnbVC5C7gaaARWSFocEauLVv1ZRLz9WPdjZtadiOCZpt3t17AVtvr//lXJqP5u9c+nro7kvgtcIOmhiLjyGLY9C1gbEesBJN0PXA8Uh5yZWY9LWv13tF/D9uK2pNW/duIpbvUvI12F3HGS/hQ4Q9KnihdGxN90s+2xwIaC6Ubgog7Wmy3pKZJRVD4dEQ3dbNfMrEOHDiej+tc1NPFgQzObC1r9P3ZpDVdPd6t/uekq5G4A3pGucyzfKdfRcX/xN4v/ApgYEXskXUcydNi0121IWgAsAJgwYcIxlGJmebXv4GEe+dUW6lYd2ep/+ZmjmT/Trf7lrqtLCJ4F/lLS0xFxLMN4NQLjC6bHUTTmZUTsKrj9gKS/lVQVEVuL1ltEem1ebW1tcVCaWZnZue8QP3mmmbpVzTz83Gb2H2plxNDBXD29mnkzxnDpGaPd6m9Aad2VyyT9DXBpOr0U+GJE7OzmfiuAaZImAy+THBm+r3AFSdVAc0SEpFnAccC2o3kAZlYeNu/ez4MNR7b6jxl+PO8pGNV/sFv9rUgpIXcPsAp4Tzp9E/BPwDu7ulNEtEi6BagjuYTgnohokLQwXX438G7gdyS1APuAGyLCR2pmBsBL2/YmHZENTe2t/pMqT+Tmt0xm/oxqt/pbt9RdpkhaGRHndTevt9TW1kZ9fX1f7NrMMhYRPNu8u33w4zWbkk80pp82PPmC0ZnVnDHGrf72epKeiIja4vmlHMntk3RJRPxPuqG5JEddZmZvWGtrsLJxB3Wrklb/F9zqbz2olJBbCNwnaUQ6/Qrw29mVZGZ5d+hwK4+v386Shk1HtPrPrqliwaU1XDX9VE4ddkJfl2k5UMrYlU8B50oank7v6uYuZmav097q39DEQ2s2s3PfofZW/7ZR/UcMdau/9aySx650uJnZ0dq57xA/fWYzS1Ylo/rvO3SYEUMHc+XZpzJ/RjVvmTaaoUPc6m/ZyXSAZjMrP5t37+dHq5upa2hm+bqtHDocnDrseN59wTjmzajmoilu9bfe45Azszdsw/a97YMfP1HQ6v/huZOZN7Oa89zqb32kpJCTNAeYVLh+RNyXUU1m1s9FBM8170lb/ZtYnbb6n33acG698gzmzRzDmWOGudXf+ly3ISfpW0ANsBI4nM4OwCFnVkbaW/0bmqhb9Vqr/wUTTuEPr0ta/SdUutXf+pdSjuRqgekeicSs/Bw63MrPn9/OklVNPLi6ieZdBxh0nJgztYqPXjqFq6ePcau/9WulhNwqoBrYlHEtZtYP7D90mEee20JdQzM/XtPc3up/2RmvjervVn8bKEoJuSpgtaSfAwfaZkbEr2dWlZn1qrZW/7qGJh5+Nmn1H37CIK6aPoZ5M6q51K3+NkCVEnK3Z12EmfW+LbsP8KPVzSxpaDqi1f9dF4xl/ozT3OpvuVDKiCdLe6MQM8teW6t/XUMT9S8mrf4T01b/a2ZU8+bxbvW3fCmlu/Ji4OvA2cAQkq/NeTUihmdcm5m9QW2t/m3XsBW2+n/yymnMn1ntVn/LtVJOV36D5AtP/52k0/IDwLQsizKzY9faGjzVuIMlDU082NDM81tfRYLz3epvZaiki8EjYq2kiog4DPyTpGUZ12VmR6Gt1b/tVGRbq//smkpuvmQy10wfw6nD3epv5aeUkNsraQiwUtJfkVxKcFK2ZZlZd/YfOszPfrWVJauaeOiZZnbsPcQJg49rb/V/65ljGHGiW/2tvJUScjcBxwG3AL8PjAfelWVRZvZ6h1uDNZt28ejarSxbt42fP7/9tVb/s8dwzYxqLjvDrf5mhUrprnxR0lDgtIj4wtFsXNJ84E6SZpVvRsSXO1nvQuAx4L0R8d2j2YdZXkUE67bsYdm6bTy6diuPrd/Ozn2HAJh66sm8p3YcV00fw8VTKt3qb9aJUrorfw34Ckln5WRJ5wFf7O5icEkVwF3A1U
AjsELS4ohY3cF6fwnUHdtDMMuPDdv3snzdNpatS47WNu9Oxl8YO3Io82aMYe7UKmZPqfTna2YlKvVi8FnAwwARsVLSpBLuNwtYGxHrASTdD1wPrC5a7+PAfwAXllKwWZ5s2X2AZeu2psG2jZe27wWg6uTjmVNTyZyaSuZOrWL8KHdDmh2LUkKuJSJ2HsN1NGOBDQXTjcBFhStIGgv8BvBWHHJWBnbuO8Tj65NAW7ZuK8817wFg+AmDuHhKJR+eO4k5U6uYdurJvnbNrAeUNECzpPcBFZKmAZ8ASrmEoKP/ocXfZPBV4LMRcbir/9CSFgALACZMmFDCrs36h70HW6h/4ZX2UFv18k5aA04YfBwXThrFO88fx5yaSmacPoIKjzRi1uNKCbmPA39IMjjzt0k+O/tSCfdrJOnEbDMO2Fi0Ti1wfxpwVcB1kloi4v8VrhQRi4BFALW1tf7KH+u3Dra0snLDjvbP1J586RUOHQ4GV4g3jz+Fj791GnNqKjlvwkiOH+QuSLOsldJduZck5P7wKLe9ApgmaTLwMsmoKe8r2vbkttuS7gV+UBxwZv3Z4dZg9cZdPJqG2oq0rV+CmaeP4MOXTGZOTRUXTjqFE4eUNPaCmfWgTv/XSVrc1R27666MiBZJt5Ac+VUA90REg6SF6fK7j6Fesz4VEazdXNjWv41d+1sAmJa29c+ZWsXFkyt9IbZZP9DVW8vZJI0j3wYep+PP2LoUEQ8ADxTN6zDcIuKDR7t9s96wYfve9tOPy9ZtY0va1j/ulKFcO/M05kytZHZNpb8h26wf6irkqkmucbuR5DTjfwPfjoiG3ijMrK9s3r0/aelfu41l67eyYfs+IGnrnzu1Mm3td1u/2UDQacilgzEvAZZIOp4k7B6W9MWI+HpvFWiWtZ17D/HY89tYnp6C/NXmI9v6P3LJFObUVDLVbf1mA06Xn4Sn4fY2koCbBHwN+F72ZZllZ+/BFla88Er7Rdhtbf1DB1dw4eRRvOuCccytqWL66cPd1m82wHXVePJ/gZnAD4EvRMSqXqvKrAe1tfU/ujYJtSc3HNnW/4krpzGnporzxo9kyCCPAWmWJ10dyd0EvAqcAXyi4DSNgPA3g1t/dbg1aNi4s70DshzbOOYAAAz4SURBVP6FV9rb+t80Nmnrn1tTRa3b+s1yr6vP5PyW1gaEiOBXm/ewLP0KmsK2/jPGnMx7LxzP7JpKt/WblSG/jbUBacP2ve3fq7Zs3Ta27kna+sePGsp1bzqN2TVu6zczh5wNEJt37Wf5+m3twdb4StLWP3pY0tY/t6aK2TWVbus3syM45Kxf2rn3EMvXb2N5ehF2YVv/7JpKPvqWKcydWknNaLf1m1nnHHLWL7S39adHaqs27iQK2vrffcE45rit38yOkkPO+sSBlsOsfGlH+1fQrNyw47W2/gmn8MkrpzF3ahXnjnNbv5kdO4ec9YrDrcGql3e2h9qKF7az/1Arx6Vt/Teno4pcOGkUQ4f4K2jMrGc45CwTbW39jxa09e8uaOu/4cIJzKmp5KIplYwY6rZ+M8uGQ856zEvbjhytv62tf8KoE3mb2/rNrA845OyYbd61v/30Y3Fb/yVTk5H63dZvZn3JIWcl27H3II+t39Z+pLY2besfMXQwF08ZxYJLk8/V3NZvZv2FQ8469eqBFla8sL39aK1h4672tv5Zk0fxmxeMY+7UKs4+zW39ZtY/OeSs3YGWwzzZ1ta/Nmnrb2kNhlQcx5snjOTWK89gztRKt/Wb2YCRachJmg/cCVQA34yILxctvx74EtAKtAC3RsT/ZFmTvaatrf/R9HvVitv6P5qefqyd6LZ+MxuYMgs5SRXAXcDVQCOwQtLiiFhdsNpDwOKICEnnAN8BzsqqpnIXETzXvIdl67by6NptPP78a239Z44Zxg0XTmDu1CpmTR7ltn4zy4Usj+RmAWsjYj2ApPuB64H2kIuIPQXrnwREhvWUnYjgpe172xtFlq/bytY9Bw
GYWHkibz/nNGbXVDF7SiWjhx3fx9WamfW8LENuLLChYLoRuKh4JUm/AdwBnAq8LcN6ykLzrv1JS//aJNhe3pG09Z867HgumVrFnKlVzKmpZNwpbus3s/zLMuQ6ard73ZFaRHwf+L6kS0k+n7vqdRuSFgALACZMmNDDZQ5sbW39j65NOiDXbXkVSNr6Z0+p5GOXTWFOTRU1o09yW7+ZlZ0sQ64RGF8wPQ7Y2NnKEfGIpBpJVRGxtWjZImARQG1tbVmf0nz1QAs/f2E7y9cl3622elPS1n/ikKSt/70XjmdOjdv6zcwg25BbAUyTNBl4GbgBeF/hCpKmAuvSxpPzgSHAtgxrGnDa2/rTMSCL2/p//6ozmFNTyTlu6zcze53MQi4iWiTdAtSRXEJwT0Q0SFqYLr8beBfwAUmHgH3AeyOirI/UWg63smrjrvbP1Va8sJ0DLWlb/7iRfPTSKcytqeKCiae4rd/MrBsaaJlSW1sb9fX1fV1Gj4kInm3e3d4o8vj6bew+8Fpb/5x0DEi39ZuZdU7SExFRWzzfI570sra2/rZGkeXrtrHt1YK2/nNPY05NFRe7rd/M7A1zyPWCpp37Wb4+uQB7eVFb/6VnjGZ2TaXb+s3MMuCQy8Arr742Wv+j67ayPm3rH3li0ta/8LIpzHZbv5lZ5hxyPeDVAy38/Pnt7d+rVtzWf+OFE5hdU8n004ZznNv6zcx6jUPuGOw/lLT1L1+3lUfXbeOpgrb+8ycmbf1zpyZt/YMr3NZvZtZXHHIlaDncyi9f3pmO/3hkW/8540amXxZaRe2kUzhhsNv6zcz6C4dcB1pbg+c2704bRbby+Prt7W39Z1UP4/0XTWROTSWzpoxi+Alu6zcz668cciRt/S9u29veKPJYQVv/pMoTefu5pzOnppLZNZVUney2fjOzgaJsQ65p5/72RpFla7eyced+AMYMP57L0rb+2W7rNzMb0Moy5N7798t5/PntwGtt/b9zRfIVNFOq3NZvZpYXZRlyV559KledPYY5Uys5u9pt/WZmeVWWIbfg0pq+LsHMzHqBL+IyM7PccsiZmVluOeTMzCy3HHJmZpZbDjkzM8sth5yZmeWWQ87MzHIr05CTNF/Ss5LWSrqtg+Xvl/R0+rNM0rlZ1mNmZuUls5CTVAHcBVwLTAdulDS9aLXngcsi4hzgS8CirOoxM7Pyk+WR3CxgbUSsj4iDwP3A9YUrRMSyiHglnXwMGJdhPWZmVmayDLmxwIaC6cZ0XmduBn7Y0QJJCyTVS6rfsmVLD5ZoZmZ5lmXIdTTqcXS4onQFSch9tqPlEbEoImojonb06NE9WKKZmeVZlgM0NwLjC6bHARuLV5J0DvBN4NqI2JZhPWZmVmayPJJbAUyTNFnSEOAGYHHhCpImAN8DboqI5zKsxczMylBmR3IR0SLpFqAOqADuiYgGSQvT5XcDfwJUAn+bflFpS0TUZlWTmZmVF0V0+DFZv1VbWxv19fV9XYaZmfUjkp7o6CDJI56YmVluOeTMzCy3HHJmZpZbDjkzM8sth5yZmeWWQ87MzHLLIWdmZrnlkDMzs9xyyJmZWW455MzMLLcccmZmllsOOTMzyy2HnJmZ5ZZDzszMcsshZ2ZmueWQMzOz3HLImZlZbjnkzMwstzINOUnzJT0raa2k2zpYfpak5ZIOSPp0lrWYmVn5GZTVhiVVAHcBVwONwApJiyNidcFq24FPAO/Iqg4zMytfWR7JzQLWRsT6iDgI3A9cX7hCRGyOiBXAoQzrMDOzMpVlyI0FNhRMN6bzzMzMekWWIacO5sUxbUhaIKleUv2WLVveYFlmZlYusgy5RmB8wfQ4YOOxbCgiFkVEbUTUjh49ukeKMzOz/Msy5FYA0yRNljQEuAFYnOH+zMzMjpBZd2VEtEi6BagDKoB7IqJB0sJ0+d2SqoF6YDjQKulWYHpE7MqqLjMzKx+ZhRxARDwAPFA07+6C200kpzHNzMx6nEc8MTOz3HLImZ
lZbjnkzMwstxxyZmaWWw45MzPLLYecmZnllkPOzMxyyyFnZma55ZAzM7PccsiZmVluOeTMzCy3HHJmZpZbDjkzM8sth5yZmeWWQ87MzHLLIWdmZrnlkDMzs9xyyJmZWW5lGnKS5kt6VtJaSbd1sFySvpYuf1rS+VnWY2Zm5SWzkJNUAdwFXAtMB26UNL1otWuBaenPAuDvsqrHzMzKT5ZHcrOAtRGxPiIOAvcD1xetcz1wXyQeA0ZKOi3DmszMrIxkGXJjgQ0F043pvKNdx8zM7JgMynDb6mBeHMM6SFpAcjoTYI+kZ99gbQBVwNYe2I5ZX/Lr2Aa6nnoNT+xoZpYh1wiML5geB2w8hnWIiEXAop4sTlJ9RNT25DbNeptfxzbQZf0azvJ05QpgmqTJkoYANwCLi9ZZDHwg7bK8GNgZEZsyrMnMzMpIZkdyEdEi6RagDqgA7omIBkkL0+V3Aw8A1wFrgb3Ah7Kqx8zMyo8iXvcRWFmQtCA9DWo2YPl1bANd1q/hsg05MzPLPw/rZWZmuVV2IdfdUGNm/Z2keyRtlrSqr2sxO1aSxkv6qaQ1khokfTKT/ZTT6cp0qLHngKtJLl9YAdwYEav7tDCzoyDpUmAPyWhBM/u6HrNjkY5udVpE/ELSMOAJ4B09/fe43I7kShlqzKxfi4hHgO19XYfZGxERmyLiF+nt3cAaMhjxqtxCzsOImZn1M5ImAW8GHu/pbZdbyJU0jJiZmfUOSScD/wHcGhG7enr75RZyJQ0jZmZm2ZM0mCTg/iUivpfFPsot5EoZaszMzDImScA/Amsi4m+y2k9ZhVxEtABtQ42tAb4TEQ19W5XZ0ZH0bWA5cKakRkk393VNZsdgLnAT8FZJK9Of63p6J2V1CYGZmZWXsjqSMzOz8uKQMzOz3HLImZlZbjnkzMwstxxyZmaWWw45sz4m6XDaPr1K0r9LOrGLdW+X9OnerM9sIHPImfW9fRFxXvqNAgeBhX1dkFleOOTM+pefAVMBJH1A0tOSnpL0reIVJX1U0op0+X+0HQFK+s30qPApSY+k82ZI+nl6xPi0pGm9+qjM+ogvBjfrY5L2RMTJkgaRjOO3BHgE+B4wNyK2ShoVEdsl3Q7siYivSKqMiG3pNv4MaI6Ir0v6JTA/Il6WNDIidkj6OvBYRPxLOqRdRUTs65MHbNaLfCRn1veGSloJ1AMvkYzn91bguxGxFSAiOvr+uJmSfpaG2vuBGen8R4F7JX0UqEjnLQc+L+mzwEQHnJWLQX1dgJkln8kVzkgHr+3uNMu9JN+k/JSkDwKXA0TEQkkXAW8DVko6LyL+VdLj6bw6SR+JiJ/08OMw63d8JGfWPz0EvEdSJYCkUR2sMwzYlH5dyfvbZkqqiYjHI+JPgK3AeElTgPUR8TWSb944J/NHYNYP+EjOrB+KiAZJfw4slXQYeBL4YNFqf0zyTcovAr8kCT2Av04bS0QSlk8BtwG/JekQ0AR8MfMHYdYPuPHEzMxyy6crzcwstxxyZmaWWw45MzPLLYecmZnllkPOzMxyyyFnZma55ZAzM7PccsiZmVlu/X9nKYw9mX7mNAAAAABJRU5ErkJggg==\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# With encoding_method='ordered', the OrdinalEncoder returns variables that are\n",
"# monotonic with the target on the data used for fitting: the encoded values\n",
"# increase as the mean of the target increases.\n",
"\n",
"# let's explore the monotonic relationship\n",
"plt.figure(figsize=(7,5))\n",
"pd.concat([test_t,y_test], axis=1).groupby(\"pclass\")[\"survived\"].mean().plot()\n",
"plt.xticks([0,1,2])\n",
"plt.yticks(np.arange(0,1.1,0.1))\n",
"plt.title(\"Relationship between pclass and target\")\n",
"plt.xlabel(\"Pclass\")\n",
"plt.ylabel(\"Mean of target\")\n",
"plt.show()"
]
},
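{
"cell_type": "markdown",
"metadata": {},
"source": [
"A numeric check can complement the plot. The sketch below uses invented toy data rather than the Titanic variables, and `toy` and `mean_per_code` are hypothetical names. It relies on pandas' `is_monotonic_increasing` to test whether the target mean never decreases as the encoded value grows. On a held-out test set the relationship may not be perfectly monotonic, because the ordering is learned from the training set."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Toy encoded variable and binary target (values invented for illustration).\n",
"toy = pd.DataFrame({\n",
"    \"encoded\": [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],\n",
"    \"target\": [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0],\n",
"})\n",
"\n",
"# Target mean per encoded value, in increasing order of the code.\n",
"mean_per_code = toy.groupby(\"encoded\")[\"target\"].mean().sort_index()\n",
"\n",
"# True when the target mean never decreases as the code grows.\n",
"print(mean_per_code.is_monotonic_increasing)"
]
},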
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Arbitrary"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"OrdinalEncoder(encoding_method='arbitrary',\n",
" variables=['pclass', 'cabin', 'embarked'])"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ordinal_enc = OrdinalEncoder(encoding_method='arbitrary',\n",
" variables=['pclass', 'cabin', 'embarked'])\n",
"\n",
"# with encoding_method='arbitrary' the target is not used, so passing y to fit() is optional\n",
"ordinal_enc.fit(X_train)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'pclass': {2: 0, 3: 1, 1: 2},\n",
" 'cabin': {'n': 0,\n",
" 'E': 1,\n",
" 'C': 2,\n",
" 'D': 3,\n",
" 'B': 4,\n",
" 'A': 5,\n",
" 'F': 6,\n",
" 'T': 7,\n",
" 'G': 8},\n",
" 'embarked': {'S': 0, 'C': 1, 'Q': 2}}"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ordinal_enc.encoder_dict_"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that the integers assigned to the categories differ between \"arbitrary\" and \"ordered\"."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" pclass sex age sibsp parch fare cabin embarked\n",
"1122 1 female NaN 1 1 22.3583 6 1\n",
"934 1 female 4.0 0 2 22.0250 0 0\n",
"815 1 male NaN 0 0 14.5000 0 0\n",
"124 2 female 48.0 1 1 79.2000 4 1\n",
"1125 1 male 24.0 0 0 8.0500 0 0"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# transform: see the numerical values in the former categorical variables\n",
"\n",
"train_t = ordinal_enc.transform(X_train)\n",
"test_t = ordinal_enc.transform(X_test)\n",
"\n",
"test_t.sample(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Automatically select categorical variables\n",
"\n",
"If None is passed to the variables argument (the default), the encoder finds and encodes all categorical variables (type 'object')."
]
},
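{
"cell_type": "markdown",
"metadata": {},
"source": [
"One plausible way to find the categorical variables is to look at the column dtypes. The sketch below does this with pandas' `select_dtypes`; it is an assumption about the selection logic rather than a copy of the encoder's code, and the toy frame is invented for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Toy frame: two object columns and one numerical column.\n",
"toy = pd.DataFrame({\n",
"    \"pclass\": pd.Series([1, 2, 3], dtype=\"object\"),\n",
"    \"sex\": pd.Series([\"male\", \"female\", \"male\"], dtype=\"object\"),\n",
"    \"age\": [22.0, 38.0, 26.0],\n",
"})\n",
"\n",
"# Columns stored as object are treated as categorical in this sketch.\n",
"categorical = toy.select_dtypes(include=\"object\").columns.tolist()\n",
"\n",
"print(categorical)"
]
},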
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"OrdinalEncoder(encoding_method='arbitrary',\n",
" variables=['pclass', 'sex', 'cabin', 'embarked'])"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ordinal_enc = OrdinalEncoder(encoding_method = 'arbitrary')\n",
"\n",
"# with encoding_method='arbitrary' the target is not used, so passing y to fit() is optional\n",
"ordinal_enc.fit(X_train)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['pclass', 'sex', 'cabin', 'embarked']"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ordinal_enc.variables"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" pclass sex age sibsp parch fare cabin embarked\n",
"1135 1 1 NaN 0 0 7.8958 0 0\n",
"328 0 1 34.0 1 0 26.0000 0 0\n",
"785 1 0 22.0 1 0 13.9000 0 0\n",
"708 1 1 24.0 0 0 7.8542 0 0\n",
"486 0 1 24.0 0 0 10.5000 0 0"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_t = ordinal_enc.transform(X_train)\n",
"test_t = ordinal_enc.transform(X_test)\n",
"\n",
"test_t.sample(5)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}