{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Missing value imputation: MeanMedianImputer\n", "\n", "The MeanMedianImputer() replaces missing data by the mean or median value of the variable. \n", "\n", "It works only with numerical variables.\n", "\n", "**For this demonstration, we use the Ames House Prices dataset produced by Professor Dean De Cock:**\n", "\n", "[Dean De Cock (2011) Ames, Iowa: Alternative to the Boston Housing\n", "Data as an End of Semester Regression Project, Journal of Statistics Education, Vol.19, No. 3](http://jse.amstat.org/v19n3/decock.pdf)\n", "\n", "The version of the dataset used in this notebook can be obtained from [Kaggle](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Version" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'1.2.0'" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Make sure you are using this \n", "# Feature-engine version.\n", "\n", "import feature_engine\n", "\n", "feature_engine.__version__" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "\n", "from sklearn.model_selection import train_test_split\n", "\n", "from feature_engine.imputation import MeanMedianImputer" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load data" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IdMSSubClassMSZoningLotFrontageLotAreaStreetAlleyLotShapeLandContourUtilities...PoolAreaPoolQCFenceMiscFeatureMiscValMoSoldYrSoldSaleTypeSaleConditionSalePrice
0160RL65.08450PaveNaNRegLvlAllPub...0NaNNaNNaN022008WDNormal208500
1220RL80.09600PaveNaNRegLvlAllPub...0NaNNaNNaN052007WDNormal181500
2360RL68.011250PaveNaNIR1LvlAllPub...0NaNNaNNaN092008WDNormal223500
3470RL60.09550PaveNaNIR1LvlAllPub...0NaNNaNNaN022006WDAbnorml140000
4560RL84.014260PaveNaNIR1LvlAllPub...0NaNNaNNaN0122008WDNormal250000
\n", "

5 rows × 81 columns

\n", "
" ], "text/plain": [ " Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape \\\n", "0 1 60 RL 65.0 8450 Pave NaN Reg \n", "1 2 20 RL 80.0 9600 Pave NaN Reg \n", "2 3 60 RL 68.0 11250 Pave NaN IR1 \n", "3 4 70 RL 60.0 9550 Pave NaN IR1 \n", "4 5 60 RL 84.0 14260 Pave NaN IR1 \n", "\n", " LandContour Utilities ... PoolArea PoolQC Fence MiscFeature MiscVal MoSold \\\n", "0 Lvl AllPub ... 0 NaN NaN NaN 0 2 \n", "1 Lvl AllPub ... 0 NaN NaN NaN 0 5 \n", "2 Lvl AllPub ... 0 NaN NaN NaN 0 9 \n", "3 Lvl AllPub ... 0 NaN NaN NaN 0 2 \n", "4 Lvl AllPub ... 0 NaN NaN NaN 0 12 \n", "\n", " YrSold SaleType SaleCondition SalePrice \n", "0 2008 WD Normal 208500 \n", "1 2007 WD Normal 181500 \n", "2 2008 WD Normal 223500 \n", "3 2006 WD Abnorml 140000 \n", "4 2008 WD Normal 250000 \n", "\n", "[5 rows x 81 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Download the data from Kaggle and store it\n", "# in the same folder as this notebook.\n", "\n", "data = pd.read_csv('houseprice.csv')\n", "\n", "data.head()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "((1022, 79), (438, 79))" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Separate the data into train and test sets.\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(\n", " data.drop(['Id', 'SalePrice'], axis=1),\n", " data['SalePrice'],\n", " test_size=0.3,\n", " random_state=0,\n", ")\n", "\n", "X_train.shape, X_test.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Check missing data" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "LotFrontage 0.184932\n", "MasVnrArea 0.004892\n", "dtype: float64" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Numerical variables with missing data\n", "\n", "X_train[['LotFrontage', 'MasVnrArea']].isnull().mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Imputation with the median\n", "\n", "Let's start by imputing missing data in 2 variables with their median." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Set up the imputer.\n", "\n", "imputer = MeanMedianImputer(\n", " imputation_method='median',\n", " variables=['LotFrontage', 'MasVnrArea'],\n", ")" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "MeanMedianImputer(variables=['LotFrontage', 'MasVnrArea'])" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Find median values\n", "\n", "imputer.fit(X_train)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'LotFrontage': 69.0, 'MasVnrArea': 0.0}" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Dictionary with the imputation values for each variable.\n", "\n", "imputer.imputer_dict_" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "LotFrontage 69.0\n", "MasVnrArea 0.0\n", "dtype: float64" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Let's corroborate that the dictionary \n", "# contains the median values of the variables.\n", "\n", "X_train[['LotFrontage', 'MasVnrArea']].median()" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# impute the data\n", "\n", "train_t = imputer.transform(X_train)\n", "test_t = imputer.transform(X_test)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "LotFrontage 0\n", "MasVnrArea 0\n", "dtype: int64" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Check we no longer have NA\n", "\n", "train_t[['LotFrontage', 'MasVnrArea']].isnull().sum()" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAY4AAAD4CAYAAAD7CAEUAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAAvg0lEQVR4nO3deXxV9Z3/8dcnGwlZCISwSFBAUEBZlLB00WoXoK0VO+qIojIzjstU+7Pj1Fan1mk79TF22kfVtnamWi1qrdhSRWqduuFSWxeiYgEBCXsWspHkZr3ZPr8/zrnxEm+Se5N77kI+z8fjmnu/93vO/Z4Q8873+z3ne0RVMcYYY8KVEu8GGGOMSS4WHMYYYyJiwWGMMSYiFhzGGGMiYsFhjDEmImnxbkAsjB8/XqdNmxbvZhhjTFJ5++23a1W1sG/5iAiOadOmUVJSEu9mGGNMUhGRg6HKbajKGGNMRCw4jDHGRMSCwxhjTERGxByHMSY5dHZ2UlZWRnt7e7ybMqJkZmZSVFREenp6WPU9DQ4RWQncA6QCv1TVO/u8Pwp4GFgE1AGXqOoBEVkC3BeoBnxHVZ8MZ5/GmORVVlZGbm4u06ZNQ0Ti3ZwRQVWpq6ujrKyM6dOnh7WNZ0NVIpIK3At8HpgLXCoic/tUuwqoV9WZwF3AD9zy7UCxqi4EVgK/EJG0MPdpjElS7e3tFBQUWGjEkIhQUFAQUS/PyzmOJUCpqu5T1Q5gPbCqT51VwEPu8w3AZ0REVLVVVbvc8kwgsIRvOPs0xiQxC43Yi/R77mVwTAEOB70uc8tC1nGDohEoABCRpSKyA9gGXOe+H84+cbe/RkRKRKSkpqYmCodjEsJjj8Hzz8e7FcaMaAl7VpWqvqmqpwGLgVtFJDPC7e9T1WJVLS4s/MiFjyYZVVbCZZfB8uXQ3R3v1pjjUE5OTth1161bR0VFRe/rc845h1NPPZWFCxeycOFCNmzYMOz2bNy4kffff3/Y+4k2L4OjHJga9LrILQtZR0TSgDE4k+S9VHUn0AycHuY+zfHqL3/58Pnu3fFrhzF8NDgAHn30UbZu3crWrVu56KKLjnmvewh/7IzE4NgCzBKR6SKSAawGNvWpswlY6z6/CNisqupukwYgIicBs4EDYe7THK927frwuQWHiZGtW7eybNky5s+fz5e//GXq6+vZsGEDJSUlrFmzhoULF9LW1hZy22nTpvHNb36TM888k9/97nc89thjzJs3j9NPP51vfvObvfVycnL41re+xYIFC1i2bBlVVVX89a9/ZdOmTdx8880sXLiQvXv3cv/997N48WIWLFjAhRdeSGtrKwB79+5l2bJlzJs3j9tuu+2YntMPf/hDFi9ezPz58/mP//iPqHxPPDsdV1W7ROQG4FmcU2cfVNUdIvI9oERVNwEPAI+ISClwFCcIAD4J3CIinUAP8BVVrQUItU+vjsEkmH37YPRoaG11npvj2nf/sIP3K3xR3efcE/L4jy+dFtE2V155JT/96U/51Kc+xe233853v/td7r77bn72s5/xox/9iOLi4t66a9asISsrC4AXX3wRgIKCAt555x0qKipYtmwZb7/9NmPHjmX58uVs3LiRCy64gJaWFpYtW8Ydd9zBN77xDe6//35uu+02zj//fM4777ze3kt+fj5XX301ALfddhsPPPAAX/3qV7nxxhu58cYbufTSS/nf//3f3vY899xz7Nmzh7feegtV5fzzz+fVV1/l7LPPHtb30dPrOFT1GeCZPmW3Bz1vBy4Osd0jwCPh7tOMEEeOwOzZsGMHVFXFuzVmBGhsbKShoYFPfepTAKxdu5aLL/7Ir6xejz766DFBAnDJJZcAsGXLFs455xwCc65r1qzh1Vdf5YILLiAjI4PzzjsPgEWLFvF8PyeAbN++ndtuu42Ghgaam5tZsWIFAK+//jobN24E4LLLLuPrX/864ATHc889xxlnnAFAc3Mze/bsSezgMCaqqqpg0iSorbXgGAEi7Rkkquzs7EHrpKen954Sm5qaSldXV8h6//AP/8DGjRtZsGAB69at4+WXXx5wv6rKrbfeyrXXXhtxuweSsGdVGfMRR444wTFhggWHiYkxY8YwduxY/vznPwPwyCOP9PY+cnNzaWpqCntfS5Ys4ZVXXqG2tpbu7m4ee+yx3n31p+9nNDU1MXnyZDo7O3n00Ud7y5ctW8bvf/97ANavX99bvmLFCh588EGam5sBKC8vp7q6Ouw298d6HCY59PRAdbUTHFVV0OdsFmOiobW1laKiot7XN910Ew899BDXXXcdra2tzJgxg1/96leA89f/ddddR1ZWFq+//vqg+548eTJ33nkn5557LqrKF7/4RVatGvj65dWrV3P11Vfzk5/8hA0bNvCf//mfLF26lMLCQpYuXdobKnfffTeXX345d9xxBytXrmTMmDEALF++nJ07d/Kxj30McCbhf/3rXzNhwoQhfX8CRFUHr5XkiouL1W7klORqa6GwEO65B957D/70Jyi3M7GPNzt37mTOnDnxbkbSaW1tJSsrCxFh/fr1PPbYYzz11FMR7SPU915E3lbV4r51rcdhksORI87XiROdR3W10wtJsdFWY95++21uuOEGVJX8/HwefPBBTz/PgsMkh/p65+v48U7Po6sLfD7Iz49rs4xJBGeddRbvvfdezD7P/lwzyaGx0fmal/dhWATKjDExZcFhkoPPvRAsODgaGuLVGmNGNAsOkxyCg8M9Y8SCw5j4sOAwycF6HMYkDAsOkxx8PkhNddaqsuAwHrFl1cNjZ1WZ5ODzOb0NEQsOkxDWrVvH6aefzgknnNBbFmqtqoDu7m5SU1Mj+oyNGzdy3nnnMXduYt0h23ocJjk0NjrBAR9+teAwMWDLqn+U9ThMcgj0OADS0iA314LjePe1r8HWrdHd58KFcPfdEW1iy6p/lPU4THIIDg5whqssOIzHQi2r/uqrr/ZbP/gOgAUFBUDoZdXT0tJ6l1UHPrKs+oEDB0Luf/v27Zx11lnMmzePRx99lB07nNsRvf76673LvV922WW99YOXVT/zzDPZtWsXe/bsGcZ3xGE9DpMcfD7nivGAvLwPz7Qyx6cIewaJypZVNyZefL4Pr98AZ6gqgiWtjRkKW1Y9NOtxmOTQd6jKgsN4wJZVD48tq26Sw+jRcP318MMfOq8vvBA++AC2bYtvu0xU2bLqQ2PLqhvTV2cntLUd2+PIybEehzEuW1bdmL4CAWFDVcaEZMuqG9NX8DpVARYcx62RMHyeaCL9nltwmMQXCI6+Z1V1doLfH582GU9kZmZSV1dn4RFDqkpdXR2ZmZlhb2NDVSbx9dfjAKfXMWpU7NtkPFFUVERZWRk1NTXxbsqIkpmZeczZZIPxNDhEZCVwD5AK/FJV7+zz/ijgYWARUAdcoqoHRORzwJ1ABtAB3Kyqm91tXgYmA4HFYZar6vBPTDaJK/jufwHBwTF+fOzbZDyRnp7O9OnT490MMwjPgkNEUoF7gc8BZcAWEdmkqsFrBF8F1KvqTBFZDfwAuASoBb6kqhUicjrwLDAlaLs1qmrn144Ug/U4jDEx5eUcxxKgVFX3qWoHsB7oe7XLKuAh9/kG4DMiIqr6rqoGFrrfAWS5vRMzEoUKjsDqn+4VscaY2PEyOKYAh4Nel3Fsr+GYOqraBTQCBX3qXAi8o6rBs6C/EpGtIvJtCSzw0oeIXCMiJSJSYuOlSc56HMYklIQ+q0pETsMZvgpeoWuNqs4DznIfV4TaVlXvU9ViVS0uDF4czyQfnw9SUiB4sTgLDmPixsvgKAemBr0ucstC1hGRNGAMziQ5IlIEPAlcqap7Axuoarn7tQn4Dc6QmDmeBd/9L8CCw5i48TI4tgCzRGS6iGQAq4FNfepsAta6zy8CNquqikg+8EfgFlX9S6CyiKSJyHj3eTpwHrDdw2MwiSD47n8BFhzGxI1nweHOWdyAc0bUTuC3qrpDRL4nIue71R4ACkSkFLgJuMUtvwGYCdzuzmVsFZEJwCjgWRH5G7AVp8dyv1fHYBJE35VxwYLDmDjy9DoOVX0GeKZP2e1Bz9uBi0Ns933g+/3sdlE022iSQKjgyMhwHhYcxsRcQk+OGwN89CZOAbm5djquMXFgwWESX6geB9jS6sbEiQWHSXz9BYetkGtMXFhwmMQX6qwqsOAwJk4sOExi6+qC1lYLDmMSiAWHSWyh7v4XYMFhTFxYcJjEFuomTgEWHMbEhQWHSWyhFjgMsNNxjYkLCw6T2AYKjsDpuHabUWNiyoLDJLZQd/8LyM11Js/tvuPGxJQFh0lsgw1Vgc1zGBNjFhwmsQ02OQ4WHMbEmAWHSWzW4zAm4VhwmMTm8zk3cAq++19AIEwsOIyJKQsOk9gCy42EurV8IDgCvRJjTExYcJjE1t8Ch2DBYUycWHCYxDZQcATmOCw4jIkpCw6T2Pq7iRPYHIcxcWLBYRLbQD2O7Gxn7sN6HMbElAWHSWwDBUdKirPsiAWHMTFlwWESW383cQrIy7OhKmNizILDJLaBehzgvGc9DmNiyoLDJK7ubmhpGTg4cnMtOIyJMQsOk7gCQ1D9nVUF1uMwJg48DQ4RWSkiu0WkVERuCfH+KBF53H3/TRGZ5pZ/TkTeFpFt7tdPB22zyC0vFZGfiIS6pNgcFwZapyrA5jiMiTnPgkNEUoF7gc8Dc4FLRWRun2pXAfWqOhO4C/iBW14LfElV5wFrgUeCtvkf4GpglvtY6dUxmDgLNzisx2FMTHnZ41gClKrqPlXtANYDq/rUWQU85D7fAHxGRERV31XVCrd8B5Dl9k4mA3mq+oaqKvAwcIGHx2Diyb2J0y/ereGf1m2hpinEDZtsjsOYmPMyOKYAh4Nel7llIeuoahfQCBT0qXMh8I6q+t36ZYPsEwARuUZESkSkpKamZsgHYeLIDY4/HWpl865qvvLo22jf28QGhqrs9rHGxExCT46LyGk4w1fXRrqtqt6nqsWqWlxYWBj9xhnvucEx6aRJ/PeF89lyoJ4n3ik/tk5ennP2VVtbHBpozMjkZXCUA1ODXhe5ZSHriEgaMAaoc18XAU8CV6rq3qD6RYPs0xwnmqrqAPj4mTO4aFERC6fm81//t4um9s4PK9lCh8bEnJfBsQWYJSLTRSQDWA1s6lNnE87kN8BFwGZVVRHJB/4I3KKqfwlUVtVKwCciy9yzqa4EnvLwGEwcVRw8AsBpc08kJUX47vmnUdfi5/tP7/xwyMqWVjcm5jwLDnfO4gbgWWAn8FtV3SEi3xOR891qDwAFIlIK3AQETtm9AZgJ3C4iW93HBPe9rwC/BEqBvcD/eXUMJr7qKqrpTEnltJMnArBgaj5fOedkHi85zFcfe5faZr+tkGtMHKR5uXNVfQZ4pk/Z7UHP24GLQ2z3feD7/eyzBDg9ui01iai95iitWTmMSf/wx/TfPncqozPSuPuFD/jznlruneLnk2A9DmNiKKEnx83I1lXfQEdO7jFlKSnC9efO5P9uPIsZhdn88HX3rG0LDmNixoLDJKT2zm5Sm3z05IVebmTmhFx+ccUi2jKznQILDmNixoLDJKT9tS3k+ltIye9/naoJuZksmj8dgO76hhi1zBhjwWESUnl9G7n+VtLHjR2w3lnFMwGoOlgZi2YZY7DgMAmqvKGNvPYWRhUMHByLT5lIc0YWtYePxKhlxhgLDpOQyhvayOtoIXP8uAHrFeaOojUrh6ZKW1bGmFix4DAJqaKuheyONiQ/f9C6nXlj6D561PtGGWMACw6ToOqq6khRHfgmTgFjx5LR5MMXvBSJMcYzFhwmIfmOOOtUhRMcGYUFjGlvZlelXT1uTCxYcJiE097Z/eHQUxjBMXrCePLaW9hf2+xxy4wx4PGSI8YMRbXPT66/xXkRxhxH1oQC1N/MgbpWbxtmjAGsx2ESUHVTO7l+NwTC6HGkjBtHTkcbh6saPW6ZMQbCDA4ReUJEvigiFjTGc9VNQT2OMCfHAerKqz1slTEmINwg+DlwGbBHRO4UkVM9bJMZ4ap9kfU4AsHhq6j+6K1ljTFRF1ZwqOoLqroGOBM4ALwgIn8VkX8UkXQvG2hGnuomP2M7Iu9xpPkaqWvp8LBlxhiIYI5DRAqAfwD+GXgXuAcnSJ73pGVmxKpp8jOxuw2ysiAzc/AN3OAY097MQZsgN8ZzYZ1VJSJPAqcCjwBfcm/hCvC4iJR41TgzMlU3+ZnY1QrjBl5upJd75tWY9maONLZ71zBjDBD+6bj3u3fz6yUio1TVr6rFHrTLjGDVTX4KOprDD45Aj8PfQmVjm4ctM8ZA+ENVoW7j+no0G2JMQE1TO/ntkQdHQUcLldbjMMZzA/Y4RGQSMAXIEpEzAHHfygNGe9w2MwJ1dfdQ19JBXmsTjDsxvI0yMyEzkxO0nVesx2GM5wYbqlqBMyFeBPw4qLwJ+HeP2mRGsNrmDlRhdHNj+D0OgLFjmdDdaj0OY2JgwOBQ1YeAh0TkQlX9fYzaZEawmiY/AKOaIgyO8eMpbG+issGCwxivDTZUdbmq/hqYJiI39X1fVX8cYjNjhqy6qZ1RnX5S/e2RBUdhIWOrGqhuaqeru4e0VFvkwBivDPZ/V7b7NQfIDfEwJqqqm/zkt7vLo0fY48htaqBHnX0YY7wz2FDVL9yv3x3KzkVkJc6FgqnAL1X1zj7vjwIeBhYBdcAlqnrAvdhwA7AYWKeqNwRt8zIwGQjMgi5XVVuk6DhR7fMzpt1dHn3swPcbP8b48WQ11gNQ2djGCflZHrTOGAPhL3L43yKSJyLpIvKiiNSIyOWDbJMK3At8HpgLXCoic/tUuwqoV9WZwF3AD9zyduDbwNf72f0aVV3oPiw0jiPVTe0UqTtPEeFQVXpjPak93TZBbozHwh0IXq6qPuA8nLWqZgI3D7LNEqBUVfepagewHljVp84q4CH3+QbgMyIiqtqiqq/hBIgZQWqb/ZwY6ExGOFQFkN9mE+TGeC3c4AgMaX0R+J2qhnPjgynA4aDXZW5ZyDqq2gU0AgVh7PtXIrJVRL4tIhKqgohcIyIlIlJSU1MTxi5NIqhp8jO5ewjBUVgIwAmdzRzxWXAY46Vwg+NpEdmFMxfxoogUEr/ewBpVnQec5T6uCFVJVe9T1WJVLS50f6mYxFfT7Gdit7tQ4RB6HCdLmwWHMR4Ld1n1W4CPA8Wq2gm08NFhp77KgalBr4vcspB1RCQNGIMzST5QW8rdr03Ab3CGxMxxQFWpbepgfEcLpKVBTk74G7vBcaK2UmVzHMZ4KpJ7js/GuZ4jeJuHB6i/BZglItNxAmI1zs2ggm0C1uKse3URsFkHuBOP+9n5qlrr3gfkPOCFCI7BJLCWjm7aOrsZG1inKvQoZGhur7Kos8V6HMZ4LNxl1R8BTga2At1usTJAcKhql4jcADyLczrug6q6Q0S+B5So6ibgAeARESkFjuKES+AzD+CsiZUhIhcAy4GDwLNuaKTihMb94R6sSWy17vUXY9qaIjsVF6DAmRqb2NlMtc9PT4+SkhJB8BhjwhZuj6MYmDtQbyAUdyn2Z/qU3R70vB24uJ9tp/Wz20WRtMEkj5pmJziymxp6h57CNmoU5OVR2Oajo7uHo60djM8ZFf1GGmPCnhzfDkzysiHGBHocWQ11MGFC5DsYP578VueEP7uhkzHeCbfHMR54X0TeAnrXc1DV8z1plRmRAj2OjKNDDI7CQnKanKvHq3ztnD4ljPuVG2MiFm5wfMfLRhgDzjUcadqN1NX2TnZHZPx4sg45lw7ZBLkx3gn3dNxXcK4YT3efbwHe8bBdZgSqbfYzLcWP9PQMrccxcSJpdbWkCHZKrjEeCnetqqtxlgT5hVs0BdjoUZvMCFXT5GeGuleND6XHMWkSUlVF4eh063EY46FwJ8evBz4B+ABUdQ8whD8JjelfTXMHJ3W7K+MOpccxaRJ0d3NKmt8WOjTGQ+EGh99dqBDovRAvolNzjRlMbZOfos5hBgcws7uZKutxGOOZcIPjFRH5dyBLRD4H/A74g3fNMiONqjoLHPp9TsFQhqomTwZgWleTnY5rjIfCDY5bgBpgG3AtzkV9t3nVKDPy+Nq76OjuoaDN5yw1UhDOIsl9uD2OKf5GfO1dtHV0D7KBMWYowjodV1V7RGQjsFFVbY1yE3U17sV/41oanNBIi2QZNZcbHBNbGmC0c0ru9PHZA29jjInYgD0OcXxHRGqB3cBu9+5/tw+0nTGRqnUv/sv1HR3aMBU4q+lmZzOu+ShgV48b45XBhqr+FedsqsWqOk5VxwFLgU+IyL963jozYgR6HKMb64c2MR4waRJ5jc7K/Ed8bYNUNsYMxWDBcQVwqaruDxSo6j7gcuBKLxtmRpZAj2PU0dphB8foOmc09Uijf5DKxpihGCw40lW1tm+hO8+R7k2TzEhU0+QnNUVIqakednCkVleROyrNTsk1xiODBUfHEN8zJiI1TX6mZPQgDQ1wwglD39HkyVBZycQxmTbHYYxHBjt1ZYGI+EKUC5DpQXvMCFXb7GdWj3vx33CCY9IkaGhgapbYsiPGeGTA4FDV1Fg1xIxsNc1+Pt7p3EuDKVOGvqPA1ePawtO+jCi0zBjTV7gXABrjqdqmDqa2u8Ex3B4HcFJnE9VNfrp7bGUcY6LNgsPEXU+PUtvsZ0qLc/1FNIKjqL2Bbne/xpjosuAwcdfQ1klXj1LYVAeZmZCfP/SduaEzscm9lsMmyI2JOgsOE3eBX+4FjbXO/IbI0Hc2cSKkp1NQX+3s2ybIjYk6Cw4Td1VNzi/33KM1wxumAkhJgSlTyKs94uzbgsOYqLPgMHEXuM1rZm3V8IMDYOpURh2pIC1FbKjKGA9YcJi4q/L5QZW0yorhnYobMHUqcvgwE/PsIkBjvOBpcIjIShHZLSKlInJLiPdHicjj7vtvisg0t7xARF4SkWYR+VmfbRaJyDZ3m5+IDGdA3CSCqqZ2TkrrQlpbo9bjoKyMSTl273FjvOBZcIhIKnAv8HlgLnCpiMztU+0qoF5VZwJ3AT9wy9uBbwNfD7Hr/wGuBma5j5XRb72JparGdk7taXJeRCs4OjuZJa0WHMZ4wMsexxKgVFX3ufcrXw+s6lNnFfCQ+3wD8BkREVVtUdXXcAKkl4hMBvJU9Q1VVeBh4AIPj8HEQFVTOzO73JVtojRUBXCyv753/sQYEz1eBscU4HDQ6zK3LGQdVe0CGoGB7hk6xd3PQPs0SeZIo5+T/A3Oi2j0OIqKADip9SgtHd342juHv09jTK/jdnJcRK4RkRIRKampsbvdJqrO7h7qWvxMaY7CVeMBbo9jSpNzR4DyeruhkzHR5GVwlANTg14XuWUh64hIGjAGqBtkn0WD7BMAVb1PVYtVtbhwqLciNZ6rbfajChMbqmH8eBg9evg7HT8eMjMpbHD+YDh8tHX4+zTG9PIyOLYAs0RkuohkAKuBTX3qbALWus8vAja7cxchqWol4BORZe7ZVFcCT0W/6SZWAqfLjq2rhBNPjM5ORWDaNPKrnL8pDluPw5ioGux+HEOmql0icgPwLJAKPKiqO0Tke0CJqm4CHgAeEZFS4ChOuAAgIgeAPCBDRC4Alqvq+8BXgHVAFvB/7sMkqSqfswhhTlUlzDklejueMYP0g/vJnp9KWb31OIyJJs+CA0BVnwGe6VN2e9DzduDifrad1k95CXB69Fpp4qnK1w6qZFQchhWfjd6OZ8xAXnuNqWOzOHzUehzGRNNxOzlukkOVr52xna2kNDVFb6gK4OSTwefj1IxO63EYE2UWHCauKhramNft3sApmsExYwYAp/trOXy0lQGmzowxEbLgMHFV3tDG3C7vgmNmUw0tHd3Ut9q1HMZEiwWHiavy+jZmtrvXcJx0UvR2PH06AEX1lYCdkmtMNFlwmLjp7O7hiK+dE5trISMDJkyI3s6zs2HiRAprnIUGDts8hzFRY8Fh4uZIYzs9ChMbqpyrvVOi/ON48snklB0CsDOrjIkiCw4TN+UNzi/z/Noj0Z3fCJg1i7TSPYzLzuCQDVUZEzUWHCZuAmtIZVdVeBMcs2dDRQWzRysH61qiv39jRigLDhM35Q1tpHV3kVrpUXDMmQPAEn81+2stOIyJFgsOEzdl9a3M0Wakp8e7Hgdwmq+CysZ2Wju6ov8ZxoxAFhwmbsqDL/6bOnXgykMxYwakpTG9xrktzIFam+cwJhosOEzcHDraypx2554ZgQv2oio9HWbNYkLFAQAbrjImSiw4TFz4u7opr29jVlOVcxpuNC/+CzZ7Njn7SwE4YBPkxkSFBYeJi8NHW+lRmHK0whmmysjw5oNmzyZlbylTstPYV2PBYUw0WHCYuNjvzjeMqyp3VrL1ypw50NXFMq1nf22zd59jzAhiwWHiIvBLPOvQAW+Dwz2z6syWSpvjMCZKLDhMXOyvbWVKaicptTXeTIwHnHYaiDCn+gD1rZ3Ut3R491nGjBAWHCYu9tY0sxT3VFwvexyjR8OsWRSVlfZ+rjFmeCw4TMypKruPNLGoy11O3cvgAJg/n7F7dwHwQZUFhzHDZcFhYu6Ir53Gtk5mt1Q7BV4OVQEsWED6/n2Mp4MPqpq8/SxjRgALDhNzu444v7ynVh2CiRMhP9/bD5w/H4BPd1ZZcBgTBRYcJuZ2u8Ex7sAeZ/Laa25wLGkqt6EqY6LAgsPE3PsVPibnjSJt187YBMdJJ0FeHnNqDlDb7Lczq4wZJgsOE3NbDzdwbrYfmpth7lzvP1AE5s+n6NAHADZcZcwwWXCYmKpt9nPoaCuf7KhyCmLR4wCYP5/cPTtBlQ+qbbjKmOHwNDhEZKWI7BaRUhG5JcT7o0Tkcff9N0VkWtB7t7rlu0VkRVD5ARHZJiJbRaTEy/ab6Nt6qAGAebUHnYJYBceCBaQ0NTGnrZY91uMwZlg8Cw4RSQXuBT4PzAUuFZG+4xJXAfWqOhO4C/iBu+1cYDVwGrAS+Lm7v4BzVXWhqhZ71X7jjXcO1ZOaIpywZ7tz/ca4cbH54EWLAPhs62Her/DF5jONOU552eNYApSq6j5V7QDWA6v61FkFPOQ+3wB8RkTELV+vqn5V3Q+UuvszSe4ve+tYODWf1JItsCSG/6Tz5kFGBsvq9rGjwkd3j8bus405zngZHFOAw0Gvy9yykHVUtQtoBAoG2VaB50TkbRG5pr8PF5FrRKREREpqamqGdSAmOhpbO9lW1sDysT1w+HBsgyMjAxYu5JRDu2jr7LaVco0ZhmScHP+kqp6JMwR2vYicHaqSqt6nqsWqWlxYWBjbFpqQXt9XR4/Cpxv2OQWLF8e2AYsXU7B7Oyk93Wwrb4ztZxtzHPEyOMqB4BtJF7llIeuISBowBqgbaFtVDXytBp7EhrCSxgs7q8jNTGPG1tchJycuwZHS0swcXyXbymyew5ih8jI4tgCzRGS6iGTgTHZv6lNnE7DWfX4RsFlV1S1f7Z51NR2YBbwlItkikgsgItnAcmC7h8dgoqSzu4fn36/is3MmkvrC83Duud7d9a8/7tDYitbDbLcehzFD5llwuHMWNwDPAjuB36rqDhH5noic71Z7ACgQkVLgJuAWd9sdwG+B94E/AderajcwEXhNRN4D3gL+qKp/8uoYTPS8sa+OxrZOvpzXDnv3wvLlsW/EqadCbi5L6/ayo6KRHpsgN2ZI0rzcuao+AzzTp+z2oOftwMX9bHsHcEefsn3Agui31HjtmW1HGJ2Rysfe2ewUnHde7BuRkgKLFjHzwE5a5nWzr7aFmRNyYt8OY5JcMk6OmyTT4u/iD+9VsPK0SaQ/vh4+9jGYNi0+jVmyhHGlOxnV6afkwNH4tMGYJGfBYTz3h/cqaPZ3cVVuI2zbBpdeGr/GnH020tHBOUf38pYFhzFDYsFhPKWqPPLGQU6ZmMPcJx6BrCy4/PL4NeiTn4SUFL7UsIctFhzGDIkFh/HUizur2VHh47r545Df/AbWrIGxY+PXoDFj4IwzWHTgbxw+2kZlY1v82mJMkrLgMJ7xd3Xzgz/t4qSC0ax69zloa4Prr493s+Ccc5i4cyujujp4c5/1OoyJlAWH8cyPnt3NnupmvvPF2aT+z8+dYaKFC+PdLDjnHFL8fs46uo9XPrDlaIyJlAWH8cQjrx/g/j/v5/JlJ3Lu1pdg3z7413+Nd7Mc7jzH3zfs4qXd1bbgoTERsuAwUfeH9yq4fdMOPjtnAt/54hy44w44/XS44IJ4N82Rnw+f+ATLdvyFhtZO3j1UH+8WGZNULDhMVG0vb+Sm325l8Unj+NllZ5L2qwfh/ffhttucC/ASxapV5H2wk2m+Kl7YWR3v1hiTVBLo/2ST7Lq6e/j6795jXHYG9125iMxd78PNN8OnPw1///fxbt6xVjm3hvnnhh1s2lpuw1XGRMCCw0TNH/5Wwa5KHz8tqCH/mn+CpUshOxseeABE4t28Y82cCaedxhd2/pmKxnb+Ulob7xYZkzQsOExUqCr3vrSXH7/xMEuuuwyeeca5ZuOtt+K3vMhg1q5l3NYtnNFSyeMlhwevb4wBLDhMlLxzqJ4Zrz3P3736O/jqV+HIEbj/figqinfT+rd2LaSn882y1/jT9iMcqG2Jd4uMSQoWHCYqniw5zE1/+Q09p5wKP/5x7O+1MRQTJsAFF7Dk5U3kd7Xx082l8W6RMUnBgsMMW0dXD3VPPs3s6v2k3PYtSPN0tf7ouvlmUhrq+VHlqzz5bhl/K2uId4uMSXgWHGbYXtpdzYp3n6dzzFi45JJ4NycyixfD+edzzlO/4rSeJv7tt+/R3tkd71YZk9AsOMywPfPXPazY8wapl1ycHENUfd11F9LVxbq/3sfeI438+xPbcO5gbIwJxYLDDEtjayfpT28iq9NPyhVXxLs5QzNjBtx1FwWvvcTv9z7BE++U8d/P7rbwMKYfSTQYbRLR09sq+NK2zXRMPZGMj3883s0ZumuvhZ07OeOee3h8eStretbQ06Pc8vnZSKJdg2JMnFlwmGHZ/NJ73HdgKym33pJYS4oMxV13QVYWS++8k80HP+CSlq9R39rB9y+YR0Zakh+bMVFk/zeYITtQ28L0554iVXuQK6+Md3OGTwT+67/gsceYWr6Xzb/+GrXrn2Dtg2/R2NoZ79YZkzAsOMyQPfzXA1y44yU6ihfDqafGuznRs3o18s47ZM2YxoO//x6fefCHfPnuF3lrv930yRiw4DBD1Njayd4Nf2RO9X4y/vmqeDcn+mbNgjfegH/5F/75zSf41d3X8tNbf85Xf/MOOyt98W6dMXElI+HMkeLiYi0pKYl3M44rt2/cxoqvXcHS1krSDh6ArKx4N8k7zz5LzzXXknLoIDsnzuCp2Wezd/HZTFq6kOmT8inIyWDs6AwKcjIYnzOK8TmjSE2xCXWT/ETkbVUt7lvu6eS4iKwE7gFSgV+q6p193h8FPAwsAuqAS1T1gPvercBVQDfw/1T12XD2abz34s4qmh54iE8cfA/uvvv4Dg2AFStI+WA3/PrXzPrZvdzyyjp4ZR0dqenUjh5DW3om/rR0OlLTOJiazhtjJ9My81T00+dy8spPcca0cYxKS433URgTNZ71OEQkFfgA+BxQBmwBLlXV94PqfAWYr6rXichq4MuqeomIzAUeA5YAJwAvAKe4mw24z1CsxzF8qkqVz89TL++g7me/4OaX15GydCmpL7+UXEuMRMPhw/Dyy+i2bXRU1dDZ6KOztY2u1nZ6WlvJPHSAMXVVANRl5fHG9IUcKf44YxefQfakQlJaWmivqKSrvBKOHCGjuorsozVkt7fgT8+gK2s0mp1Nx7jxdE8+gZSpRWRNO5HcmdMomDGVifmjyUw//oOovbObWl8bR6uO0lrXQGd9Az0+H7ldfjLHjmH05InknTCBvIkFpKWPsJ/BGIlHj2MJUKqq+9wGrAdWAcG/5FcB33GfbwB+Js5J86uA9arqB/aLSKm7P8LYZ9RsPeNsxlYeAgX3PwBIUNhK0DvHlB8TyP1vC6AKEqKOoKH3HdzIfj7z2P0ds8Gg9YOfBtfJ1B6ubW8GoPOznyP18fUjLzQApk6FK65AgFHu4yNqamh9+hn8Tz7N2a+9Qu7Drzp96xCaR+fRPG48baNzSW9vIK2hgoy2FsY01ZPa03NM3c6UVBoyc2lIS6MrLYOu1LTee51I73/o/TeM/M9CHXw7900h8POhiKrzM6Qc+/yY97R3G6ecY7Y95rUqqT3dFHW0Mdj6yl2SwtGsXFozs+kJnBIefO2N+1yDypSRM5Q4ee8ORmWPjuo+vfy/fgoQfJODMmBpf3VUtUtEGoECt/yNPttOcZ8Ptk8AROQa4BqAE088cUgH0H7SdOoy3F8L0ueHLcQPJjg/nBL0vG8dJ2gkaJ+Bp4IKx2w70H4gqHKfz+/9nBCfKaGOo/flwMeXm5lG55yTKTxvBelLlybezZkSSWEho/9xLaP/ca0T7nv20Pz+btqr69CcHPJOnMyooikwaRI5mZnkhNpHdzdUV9Oy7yC+0gO07jtIx8FDdNcdxd/aDh0dpHZ1ovrhr3vnjxDXR/55JHRx4N0w/jmDf/U6P4Pi/JyJOD9XgZ+nEOWB14EP++h2zv8DiCCSQkZmBmljx5CRP4aMsWNIy88nJS+X1vRM2usb6aiupau2Fq2tQ+rrSWtqdL7Xqr3fk94/fPr7I2kEOEGifw7UcfvnoqreB9wHzlDVUPaxbONDUW2TGaFE4JRTyDnllNAB0Z/UVJg8mezJk8n+xDKvWmdMxLw8HbccmBr0usgtC1lHRNKAMTiT5P1tG84+jTHGeMjL4NgCzBKR6SKSAawGNvWpswlY6z6/CNiszmz9JmC1iIwSkenALOCtMPdpjDHGQ54NVblzFjcAz+KcOvugqu4Qke8BJaq6CXgAeMSd/D6KEwS49X6LM+ndBVyvqt0Aofbp1TEYY4z5KLsA0BhjTEj9nY5rS44YY4yJiAWHMcaYiFhwGGOMiYgFhzHGmIiMiMlxEakBDsbwI8cDtTH8PK/Z8SQ2O57El6zHdJKqFvYtHBHBEWsiUhLqTIRkZceT2Ox4Et/xdkw2VGWMMSYiFhzGGGMiYsHhjfvi3YAos+NJbHY8ie+4Oiab4zDGGBMR63EYY4yJiAWHMcaYiFhwDIOIXCwiO0SkR0SK+7x3q4iUishuEVkRVL7SLSsVkVti3+rIJFt7AUTkQRGpFpHtQWXjROR5Ednjfh3rlouI/MQ9vr+JyJnxa3loIjJVRF4Skffdn7cb3fKkPCYRyRSRt0TkPfd4vuuWTxeRN912P+7eOgH39gqPu+Vvisi0uB5AP0QkVUTeFZGn3ddJfTwDseAYnu3A3wGvBheKyFycJeJPA1YCP3d/qFKBe4HPA3OBS926CSnZ2htkHc73PdgtwIuqOgt40X0NzrHNch/XAP8TozZGogv4N1WdCywDrnf/HZL1mPzAp1V1AbAQWCkiy4AfAHep6kygHrjKrX8VUO+W3+XWS0Q3AjuDXif78fTLgmMYVHWnqu4O8dYqYL2q+lV1P1AKLHEfpaq6T1U7gPVu3USVbO0FQFVfxbm/S7BVQOBewA8BFwSVP6yON4B8EZkck4aGSVUrVfUd93kTzi+nKSTpMbntanZfprsPBT4NbHDL+x5P4Dg3AJ8RSawb3otIEfBF4JfuayGJj2cwFhzemAIcDnpd5pb1V56okq29A5moqpXu8yPARPd5Uh2jO6xxBvAmSXxMbg98K1ANPA/sBRpUtcutEtzm3uNx328ECmLa4MHdDXwD6HFfF5DcxzMgC45BiMgLIrI9xCPh//I2obm3J06689BFJAf4PfA1VfUFv5dsx6Sq3aq6ECjC6dnOjm+Lhk5EzgOqVfXteLclVjy7dezxQlU/O4TNyoGpQa+L3DIGKE9EAx1HsqkSkcmqWukO21S75UlxjCKSjhMaj6rqE25xUh8TgKo2iMhLwMdwhtTS3L/Cg9scOJ4yEUkDxgB1cWlwaJ8AzheRLwCZQB5wD8l7PIOyHoc3NgGr3bMnpuNMUr4FbAFmuWdbZOBMoG+KYzsHk2ztHcgmYK37fC3wVFD5le6ZSMuAxqDhn4Tgjn8/AOxU1R8HvZWUxyQihSKS7z7PAj6HM2/zEnCRW63v8QSO8yJgsybQlcuqequqFqnqNJz/Rzar6hqS9HjCoqr2GOID+DLO2KUfqAKeDXrvWzjjtruBzweVfwH4wH3vW/E+hjCOMana67b5MaAS6HT/fa7CGUN+EdgDvACMc+sKzplje4FtQHG82x/ieD6JMwz1N2Cr+/hCsh4TMB941z2e7cDtbvkMnD+wSoHfAaPc8kz3dan7/ox4H8MAx3YO8PTxcjz9PWzJEWOMMRGxoSpjjDERseAwxhgTEQsOY4wxEbHgMMYYExELDmOMMRGx4DDGGBMRCw5jjDER+f9EMae2YU6q1QAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# The variable distribution changed slightly with\n", "# more values accumulating towards the median \n", "# after the imputation.\n", "\n", "fig = plt.figure()\n", "ax = fig.add_subplot(111)\n", "X_train['LotFrontage'].plot(kind='kde', ax=ax)\n", "train_t['LotFrontage'].plot(kind='kde', ax=ax, color='red')\n", "lines, labels = ax.get_legend_handles_labels()\n", "ax.legend(lines, labels, loc='best')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Automatically select all numerical variables\n", "\n", "Let's now impute all numerical variables with the mean.\n", "\n", "If we leave the parameter `variables` to `None`, the transformer identifies and imputes all numerical variables." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# Set up the imputer\n", "\n", "imputer = MeanMedianImputer(\n", " imputation_method='mean',\n", ")" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "MeanMedianImputer(imputation_method='mean')" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Find numerical variables and their mean.\n", "\n", "imputer.fit(X_train)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['MSSubClass',\n", " 'LotFrontage',\n", " 'LotArea',\n", " 'OverallQual',\n", " 'OverallCond',\n", " 'YearBuilt',\n", " 'YearRemodAdd',\n", " 'MasVnrArea',\n", " 'BsmtFinSF1',\n", " 'BsmtFinSF2',\n", " 'BsmtUnfSF',\n", " 'TotalBsmtSF',\n", " '1stFlrSF',\n", " '2ndFlrSF',\n", " 'LowQualFinSF',\n", " 'GrLivArea',\n", " 'BsmtFullBath',\n", " 'BsmtHalfBath',\n", " 'FullBath',\n", " 'HalfBath',\n", " 'BedroomAbvGr',\n", " 'KitchenAbvGr',\n", " 'TotRmsAbvGrd',\n", " 'Fireplaces',\n", " 'GarageYrBlt',\n", " 'GarageCars',\n", " 'GarageArea',\n", " 'WoodDeckSF',\n", " 'OpenPorchSF',\n", " 'EnclosedPorch',\n", " '3SsnPorch',\n", " 'ScreenPorch',\n", " 'PoolArea',\n", " 'MiscVal',\n", " 'MoSold',\n", " 'YrSold']" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Numerical variables identified.\n", "\n", "imputer.variables_" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'MSSubClass': 56.614481409001954,\n", " 'LotFrontage': 69.66866746698679,\n", " 'LotArea': 10567.96673189824,\n", " 'OverallQual': 6.079256360078278,\n", " 'OverallCond': 5.562622309197652,\n", " 'YearBuilt': 1970.940313111546,\n", " 'YearRemodAdd': 1984.6986301369864,\n", " 'MasVnrArea': 103.55358898721731,\n", " 'BsmtFinSF1': 442.2240704500978,\n", " 'BsmtFinSF2': 47.12720156555773,\n", " 'BsmtUnfSF': 565.9921722113503,\n", " 'TotalBsmtSF': 1055.3434442270059,\n", " '1stFlrSF': 1161.7221135029354,\n", " '2ndFlrSF': 354.7250489236791,\n", " 'LowQualFinSF': 5.690802348336595,\n", " 'GrLivArea': 1522.1379647749511,\n", " 'BsmtFullBath': 0.4187866927592955,\n", " 'BsmtHalfBath': 0.05675146771037182,\n", " 'FullBath': 1.576320939334638,\n", " 'HalfBath': 0.38258317025440314,\n", " 'BedroomAbvGr': 2.8943248532289627,\n", " 'KitchenAbvGr': 1.0450097847358122,\n", " 'TotRmsAbvGrd': 6.548923679060666,\n", " 'Fireplaces': 0.6125244618395304,\n", " 'GarageYrBlt': 1978.0123966942149,\n", " 'GarageCars': 1.764187866927593,\n", " 'GarageArea': 469.3982387475538,\n", " 'WoodDeckSF': 94.8522504892368,\n", " 'OpenPorchSF': 47.37866927592955,\n", " 'EnclosedPorch': 23.607632093933464,\n", " '3SsnPorch': 3.3258317025440314,\n", " 'ScreenPorch': 15.646771037181995,\n", " 'PoolArea': 1.786692759295499,\n", " 'MiscVal': 55.86497064579256,\n", " 'MoSold': 6.300391389432486,\n", " 'YrSold': 2007.839530332681}" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The imputation value, the mean, for each variable\n", "\n", "imputer.imputer_dict_" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "MSSubClass 0\n", "LotFrontage 0\n", "LotArea 0\n", "OverallQual 0\n", "OverallCond 0\n", "YearBuilt 0\n", "YearRemodAdd 0\n", "MasVnrArea 0\n", "BsmtFinSF1 0\n", "BsmtFinSF2 0\n", "BsmtUnfSF 0\n", "TotalBsmtSF 0\n", "1stFlrSF 0\n", "2ndFlrSF 0\n", "LowQualFinSF 0\n", "GrLivArea 0\n", "BsmtFullBath 0\n", "BsmtHalfBath 0\n", "FullBath 0\n", "HalfBath 0\n", "BedroomAbvGr 0\n", "KitchenAbvGr 0\n", "TotRmsAbvGrd 0\n", "Fireplaces 0\n", "GarageYrBlt 0\n", "GarageCars 0\n", "GarageArea 0\n", "WoodDeckSF 0\n", "OpenPorchSF 0\n", "EnclosedPorch 0\n", "3SsnPorch 0\n", "ScreenPorch 0\n", "PoolArea 0\n", "MiscVal 0\n", "MoSold 0\n", "YrSold 0\n", "dtype: int64" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# impute the data\n", "\n", "train_t = imputer.transform(X_train)\n", "test_t = imputer.transform(X_test)\n", "\n", "# the numerical variables do not have NA after\n", "# the imputation.\n", "\n", "test_t[imputer.variables_].isnull().sum()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "fenotebook", "language": "python", "name": "fenotebook" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.2" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 4 }