{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Variable transformers: BoxCoxTransformer\n", "\n", "The BoxCoxTransformer() applies the Box-Cox transformation to numerical\n", "variables.\n", "\n", "The Box-Cox transformation is defined as:\n", "\n", "- T(Y) = (Y^λ - 1) / λ, if λ ≠ 0\n", "- T(Y) = ln(Y), if λ = 0\n", "\n", "where Y is the variable to transform, which must be strictly positive, and λ is the\n", "transformation parameter. λ typically varies from -5 to 5. When the transformer is fitted,\n", "the optimal λ for each variable is estimated from the training data by maximum likelihood,\n", "that is, the λ under which the transformed variable best approximates a normal distribution.\n", "\n", "**For this demonstration, we use the Ames House Prices dataset produced by Professor Dean De Cock:**\n", "\n", "Dean De Cock (2011) Ames, Iowa: Alternative to the Boston Housing\n", "Data as an End of Semester Regression Project, Journal of Statistics Education, Vol. 19, No. 3\n", "\n", "http://jse.amstat.org/v19n3/decock.pdf\n", "\n", "https://www.tandfonline.com/doi/abs/10.1080/10691898.2011.11889627\n", "\n", "The version of the dataset used in this notebook can be obtained from [Kaggle](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from sklearn.model_selection import train_test_split\n", "\n", "from feature_engine.imputation import ArbitraryNumberImputer, CategoricalImputer\n", "from feature_engine.transformation import BoxCoxTransformer" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape \\\n", "0 1 60 RL 65.0 8450 Pave NaN Reg \n", "1 2 20 RL 80.0 9600 Pave NaN Reg \n", "2 3 60 RL 68.0 11250 Pave NaN IR1 \n", "3 4 70 RL 60.0 9550 Pave NaN IR1 \n", "4 5 60 RL 84.0 14260 Pave NaN IR1 \n", "\n", " LandContour Utilities ... PoolArea PoolQC Fence MiscFeature MiscVal MoSold \\\n", "0 Lvl AllPub ... 0 NaN NaN NaN 0 2 \n", "1 Lvl AllPub ... 0 NaN NaN NaN 0 5 \n", "2 Lvl AllPub ... 0 NaN NaN NaN 0 9 \n", "3 Lvl AllPub ... 0 NaN NaN NaN 0 2 \n", "4 Lvl AllPub ... 0 NaN NaN NaN 0 12 \n", "\n", " YrSold SaleType SaleCondition SalePrice \n", "0 2008 WD Normal 208500 \n", "1 2007 WD Normal 181500 \n", "2 2008 WD Normal 223500 \n", "3 2006 WD Abnorml 140000 \n", "4 2008 WD Normal 250000 \n", "\n", "[5 rows x 81 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Read data\n", "data = pd.read_csv('houseprice.csv')\n", "data.head()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "((1022, 79), (438, 79))" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# let's separate the data into training and testing sets\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(\n", " data.drop(['Id', 'SalePrice'], axis=1), data['SalePrice'], test_size=0.3, random_state=0)\n", "\n", "X_train.shape, X_test.shape" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BoxCoxTransformer(variables=['LotArea', 'GrLivArea'])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# let's transform 2 variables\n", "\n", "bct = BoxCoxTransformer(variables=['LotArea', 'GrLivArea'])\n", "\n", "# find the optimal lambda for each variable\n", "bct.fit(X_train)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'LotArea':
0.022716974992922984, 'GrLivArea': 0.06854346283829917}" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# these are the exponents for the BoxCox transformation\n", "\n", "bct.lambda_dict_" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# transfor the variables\n", "\n", "train_t = bct.transform(X_train)\n", "test_t = bct.transform(X_test)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Text(0.5, 0, 'GrLivArea')" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXAAAAEWCAYAAAB/tMx4AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAAYfklEQVR4nO3df5RdZX3v8ffX8CNA0vDTWZjkEqwUCo0/YBbipVcngIKgQtfFXryoidKmWkWsWEHt9dfFil2tSG9bLYoVrkJAtELpVeTX6MJKNAEkBKQECIbID1FAgrQY/N4/9jPmcHLOzElyJjNP5v1a66zZ+9n77P3sZ+Z85jnP2XufyEwkSfV5zkRXQJK0eQxwSaqUAS5JlTLAJalSBrgkVcoAl6RKGeBTSESsi4jn97DevIjIiNiuy/KPRMSXNrMOqyPiqM187v4RcUtEPBER79qcbdQiIv4gItaU39lLJro+7SJiZUQMTXQ9pjoDfJKKiG9GxMc6lB8fEQ92C9fRZOaMzLynPzWcEO8Drs/MmZn5t+O5o4hYFBE3jOc+xvDXwDvL7+zmCawHEfHFiDirtSwzD8rM4QmqkgoDfPK6AHhjRERb+ZuAL2fm+l43tDlhP0ntA6zcnCeORxtExLR+b7PFlhzreNZLk0lm+piED2An4HHg5S1luwH/AbwIOBT4HvAY8ADwd8AOLesm8A7gLuDelrIXlOnjgJuBXwBrgI+0PHdeWXcx8JOy/fe2LP8I8KWW+cOAfyt1+SEwNMpxrQbeD9wOPAr8EzC9ZflrgFvKtv4NeGEpvw54phz/OuB3gFnAhcBPgfuAvwCeU9ZfBHwXOAf4GXAWsCNNz/bHwEPAZ4GdOtTxd8t+nin7eqyUfxH4DPD/gCeBo3psx4Vln48AH2xZfiiwrDz3IeBTpY7ryvOeBO5uqdNwaZeVwOtattOpXquBPwduLWXnAwPAN4AngGuA3Vq28RXgQZq/ue8AB5XyxcCvgKdLvf6l5fd4VJneEfg0zd/KT8r0jmXZEHA/cDrwMM3f0lsm+vW1rTwmvAI+RvnlwOeAz7fM/wlwS5k+hCY4tytBcQfw7pZ1E7ga2H0kpHh2gA8B82nehb2wBMgJZdlI8FwM7FLW+2nLC/YjlAAHZtME5LFlW68s83t1OabVwG3A3FK37wJnlWUvKS/ylwLTaIJvdUsYDAN/1LKtC4HLgZmlzv8OnFKWLQLWA6eWNtqJJsyvKPudCfwL8Iku9VwE3NBW9kWagDu8HOv0Htvxc2X/LwL+E/jdsvx7wJvK9AzgsLbf38jvantgFfABYAfgCJoQ3n+Ueq0Gbq
QJ7dmlXW8qbTyd5h/ih1v299bSJiNhfEvbcZ/V4fc48vfwsbKv5wJ70fzj/d8tf2fryzrb0/yd/JKWfx4+tiAjJroCPkb55cDv0/S4ppf57wJ/1mXddwP/3DKfwBFt6/wmFDo8/9PAOWV6JHgOaFn+V8D5ZfojbAjwM4D/27atq4CFXfazGnhby/yxbOhlfmbkhd+y/E7gFWV6mBLgNAH/NHBgy7p/AgyX6UXAj1uWBU1P9Ldbyl5GeXfSoZ6L6BzgF47xO+vUjnNaln8fOKlMfwf4KLBnh+20Bvh/o+kdP6dl+cWU3n6nepV2Prll/qvAZ1rmTwW+3uUYdi37n9Wy/dEC/G7g2JZlRwOry/QQ8BSwXcvyh2n5Z+Vj8x+OgU9imXkDzdvuEyLit2necl8EEBG/ExFXlg80fwH8JbBn2ybWdNt2RLw0Iq6PiJ9GxOPA28Z4/n3A8zpsah/g9RHx2MiD5h/P3qMcWrft7gOc3ratuV32uydNj+6+tm3N7rKfvYCdgeUt2/5mKd8Uz2rTHtvxwZbpX9L0tgFOoRkK+lFE/CAiXtNln88D1mTmr1vKRjvWEQ+1TD/VYX5GOYZpEXF2RNxd/pZWl3Xaj6Ob57Hx76H1d/azfPZnNq1toC1ggE9+FwJvBt4IXJWZIy/CzwA/AvbLzN+ieXvd/oHnaLeavIhmOGFuZs6iGQ9uf/7clun/QjO+2W4NTQ9815bHLpl59ij77rbdNcDH27a1c2Ze3GEbj9CMze7Ttq21LfPZtv5TNGO7I9uelZndgqRb27WX99KOnTeUeVdmvoFm6OGTwGURsUuHVX8CzI2I1tfraMe6qf4ncDzN2PksmncOsOE4xtr2T9j499Dpb0V9ZoBPfhfSvLD+mObMlBEzaT78WhcRBwBv38TtzgR+npn/ERGH0ryI2/2viNg5Ig4C3gJc0mGdLwGvjYijS09uekQMRcScUfb9joiYExG7Ax9s2e7ngLeVXm1ExC4RcVxEzGzfQGY+A1wKfDwiZkbEPsB7Sn02UnqvnwPOiYjnAkTE7Ig4uksdHwLmRMQOoxwH9NaOHUXEGyNir1K3x0rxrzusupSm1/q+iNi+nH/9WmBJr/saw0yasfmf0bxL+cu25Q8Bo10/cDHwFxGxV0TsCXyILr8H9ZcBPsll5mqaD4V2oenpjXgvTVg8QRNMncJ1NH8KfCwinqB5wV3aYZ1v03x4di3w15n5rQ71W0PTe/sAzQeda2jOfhjtb+si4FvAPTTjp2eVbS2j+Uf1dzRnqKyiGYvu5lSace17gBvKdr8wyvpnlG3eWIYKrgH277LudTRnezwYEY+Mss1e2rGbY4CVEbEOOJdmbPyp9pUy82mawH41zTuJfwDenJk/2oR9jeZCmmGPtTRnB93Ytvx84MAy9PT1Ds8/i+ZsmluBFTQflp7VYT31WZQPFSRJlbEHLkmVMsAlqVIGuCRVygCXpEpt1Zsc7bnnnjlv3rytuctJ4cknn2SXXTqd3ju12S6d2S6dTeV2Wb58+SOZudFFZ1s1wOfNm8eyZcu25i4nheHhYYaGhia6GpOO7dKZ7dLZVG6XiLivU7lDKJJUKQNckiplgEtSpQxwSaqUAS5JlTLAJalSBrgkVcoAl6RKGeCSVKmteiWm6jDvzH/tWL767OO2ck0kjcYeuCRVygCXpEoZ4JJUKQNckiplgEtSpQxwSaqUAS5JlTLAJalSBrgkVcoAl6RK9RTgEfFnEbEyIm6LiIsjYnpE7BsRSyNiVURcEhE7jHdlJUkbjBngETEbeBcwmJm/B0wDTgI+CZyTmS8AHgVOGc+KSpKerdchlO2AnSJiO2Bn4AHgCOCysvwC4IS+106S1FVk5tgrRZwGfBx4CvgWcBpwY+l9ExFzgW+UHnr7cxcDiwEGBgYOWbJkSf9qX4l169YxY8aMia5Gz1asfbxj+fzZs/q6n9raZWuxXTqbyu2yYMGC5Zk52F4+5u1kI2I34HhgX+Ax4C
vAMb3uODPPA84DGBwczKGhoV6fus0YHh6mpuNe1O12sicP9XU/tbXL1mK7dGa7bKyXIZSjgHsz86eZ+Svga8DhwK5lSAVgDrB2nOooSeqglwD/MXBYROwcEQEcCdwOXA+cWNZZCFw+PlWUJHUyZoBn5lKaDytvAlaU55wHnAG8JyJWAXsA549jPSVJbXr6SrXM/DDw4bbie4BD+14jSVJPvBJTkiplgEtSpQxwSaqUAS5JlTLAJalSBrgkVaqn0whVt3ndLo0/+7itXBNJ/WQPXJIqZYBLUqUMcEmqlAEuSZUywCWpUga4JFXKAJekShngklQpA1ySKmWAS1KlDHBJqpQBLkmVMsAlqVIGuCRVytvJTmHdbjMrqQ72wCWpUga4JFXKAJekSjkGPgls6li0X4UmCeyBS1K1DHBJqpQBLkmVMsAlqVJ+iLkVeeGMpH6yBy5JlTLAJalSDqFUyKEYSWAPXJKqZQ98C3TrCXulpKStwR64JFXKAJekShngklSpngI8InaNiMsi4kcRcUdEvCwido+IqyPirvJzt/GurCRpg1574OcC38zMA4AXAXcAZwLXZuZ+wLVlXpK0lYwZ4BExC3g5cD5AZj6dmY8BxwMXlNUuAE4YnypKkjqJzBx9hYgXA+cBt9P0vpcDpwFrM3PXsk4Aj47Mtz1/MbAYYGBg4JAlS5b0r/YTbMXaxzuWz58961nz69atY8aMGV3Xr0X7cW2pkXbRs9kunU3ldlmwYMHyzBxsL+8lwAeBG4HDM3NpRJwL/AI4tTWwI+LRzBx1HHxwcDCXLVu2OfWflHo9D3x4eJihoaHqr6Ds9/ntI+2iZ7NdOpvK7RIRHQO8lzHw+4H7M3Npmb8MOBh4KCL2LhvfG3i4X5WVJI1tzADPzAeBNRGxfyk6kmY45QpgYSlbCFw+LjWUJHXU66X0pwJfjogdgHuAt9CE/6URcQpwH/CH41NFSVInPQV4Zt4CbDT+QtMblyRNAK/ElKRKGeCSVCkDXJIqZYBLUqUMcEmqlAEuSZUywCWpUga4JFXKAJekShngklQpA1ySKtXrzaykUe9n3u97hUsamz1wSaqUAS5JlTLAJalSBrgkVcoAl6RKGeCSVClPIxwH7afbnT5/PYtGOQVPkjaHPXBJqpQBLkmVMsAlqVIGuCRVygCXpEoZ4JJUKQNckiplgEtSpQxwSaqUV2JqXHX7Egi/AELacvbAJalSBrgkVcoAl6RKGeCSVCkDXJIqZYBLUqUMcEmqlAEuSZUywCWpUga4JFWq5wCPiGkRcXNEXFnm942IpRGxKiIuiYgdxq+akqR2m9IDPw24o2X+k8A5mfkC4FHglH5WTJI0up4CPCLmAMcBny/zARwBXFZWuQA4YRzqJ0nqIjJz7JUiLgM+AcwE3gssAm4svW8iYi7wjcz8vQ7PXQwsBhgYGDhkyZIlfav8RFux9vGe1hvYCR56apwrM8Hmz57VsbxbG82fPYt169YxY8aM8axWlWyXzqZyuyxYsGB5Zg62l495O9mIeA3wcGYuj4ihTd1xZp4HnAcwODiYQ0ObvIlJa1GXW6W2O33+ev5mxbZ9597VJw91LO/WRqtPHmJ4eJht6e+hX2yXzmyXjfWSKocDr4uIY4HpwG8B5wK7RsR2mbkemAOsHb9qSpLajTkGnpnvz8w5mTkPOAm4LjNPBq4HTiyrLQQuH7daSpI2siXngZ8BvCciVgF7AOf3p0qSpF5s0sBsZg4Dw2X6HuDQ/ldJktQLr8SUpEoZ4JJUqW373DZtNd2+fV7S+LEHLkmVsgc+BnuWkiYre+CSVCkDXJIqZYBLUqUMcEmqlAEuSZUywCWpUga4JFXKAJekShngklQpA1ySKmWAS1KlDHBJqpQBLkmVMsAlqVIGuCRVygCXpEoZ4JJUKQNckirlV6ppQsw78185ff56FrV9Zd3qs4+boBpJ9bEHLkmVMsAlqVIGuCRVygCXpEr5IaYmlXltH2qO8MNNaW
P2wCWpUga4JFXKAJekShngklQpA1ySKmWAS1KlDHBJqpQBLkmVMsAlqVIGuCRVaswAj4i5EXF9RNweESsj4rRSvntEXB0Rd5Wfu41/dSVJI3rpga8HTs/MA4HDgHdExIHAmcC1mbkfcG2ZlyRtJWMGeGY+kJk3lekngDuA2cDxwAVltQuAE8apjpKkDjZpDDwi5gEvAZYCA5n5QFn0IDDQ36pJkkYTmdnbihEzgG8DH8/Mr0XEY5m5a8vyRzNzo3HwiFgMLAYYGBg4ZMmSJX2p+NayYu3jW7yNgZ3goaf6UJltzKa0y/zZs8a3MpPIunXrmDFjxkRXY9KZyu2yYMGC5Zk52F7eU4BHxPbAlcBVmfmpUnYnMJSZD0TE3sBwZu4/2nYGBwdz2bJlm3UAE6Xb/ak3xenz1/M3K7z1ertNaZepdD/w4eFhhoaGJroak85UbpeI6BjgY756IiKA84E7RsK7uAJYCJxdfl7ep7pKPfMLIDSV9dL9ORx4E7AiIm4pZR+gCe5LI+IU4D7gD8elhpKkjsYM8My8AYgui4/sb3UkSb3ySkxJqpQBLkmVMsAlqVIGuCRVygCXpEoZ4JJUKQNckirl9d3aJnmFpqYCe+CSVCl74EU/blql8ePvR9qYPXBJqpQBLkmVMsAlqVIGuCRVygCXpEoZ4JJUKQNckiplgEtSpQxwSaqUV2JqStnUKzq9d4omM3vgklQpe+DSKLyroSYze+CSVCkDXJIqZYBLUqUMcEmqlAEuSZUywCWpUga4JFXKAJekShngklSpKXclpt9urn7wCk1NBvbAJalSU64HLo2nzXmHZ69dm8seuCRVygCXpEo5hCJtI/xgdeqxBy5Jlaq+B26vQ1NNv06F3dTXjq+1ycceuCRVaot64BFxDHAuMA34fGae3ZdadeAFONpWtf9tnz5/PYsm8O99W36tjfe7iK39LmWze+ARMQ34e+DVwIHAGyLiwH5VTJI0ui0ZQjkUWJWZ92Tm08AS4Pj+VEuSNJbIzM17YsSJwDGZ+Udl/k3ASzPznW3rLQYWl9n9gTs3v7rV2hN4ZKIrMQnZLp3ZLp1N5XbZJzP3ai8c97NQMvM84Lzx3s9kFhHLMnNwousx2dgundkundkuG9uSIZS1wNyW+TmlTJK0FWxJgP8A2C8i9o2IHYCTgCv6Uy1J0lg2ewglM9dHxDuBq2hOI/xCZq7sW822LVN6CGkUtktntktntkubzf4QU5I0sbwSU5IqZYBLUqUM8M0QEV+IiIcj4raWst0j4uqIuKv83K2UR0T8bUSsiohbI+LglucsLOvfFRELJ+JY+iki5kbE9RFxe0SsjIjTSvmUbpuImB4R34+IH5Z2+Wgp3zcilpbjv6ScDEBE7FjmV5Xl81q29f5SfmdEHD1Bh9RXETEtIm6OiCvLvO3Sq8z0sYkP4OXAwcBtLWV/BZxZps8EPlmmjwW+AQRwGLC0lO8O3FN+7lamd5voY9vCdtkbOLhMzwT+neY2C1O6bcrxzSjT2wNLy/FeCpxUyj8LvL1M/ynw2TJ9EnBJmT4Q+CGwI7AvcDcwbaKPrw/t8x7gIuDKMm+79PiwB74ZMvM7wM/bio8HLijTFwAntJRfmI0bgV0jYm/gaODqzPx5Zj4KXA0cM+6VH0eZ+UBm3lSmnwDuAGYzxdumHN+6Mrt9eSRwBHBZKW9vl5H2ugw4MiKilC/JzP/MzHuBVTS3tKhWRMwBjgM+X+YD26VnBnj/DGTmA2X6QWCgTM8G1rSsd38p61a+TShvb19C09uc8m1ThgluAR6m+Yd0N/BYZq4vq7Qe42+Ovyx/HNiDbbBdgE8D7wN+Xeb3wHbpmQE+DrJ5Xzdlz8+MiBnAV4F3Z+YvWpdN1bbJzGcy88U0VywfChwwsTWaeBHxGuDhzFw+0XWplQHePw+Vt/+Unw+X8m63HNgmb0UQEdvThPeXM/Nrpdi2KTLzMeB64GU0Q0YjF9O1HuNvjr8snwX8jG2vXQ
4HXhcRq2nuZnoEzfcLTPV26ZkB3j9XACNnSywELm8pf3M54+Iw4PEynHAV8KqI2K2clfGqUlatMh55PnBHZn6qZdGUbpuI2Csidi3TOwGvpPl84HrgxLJae7uMtNeJwHXlncsVwEnlbIx9gf2A72+VgxgHmfn+zJyTmfNoPpS8LjNPZoq3yyaZ6E9Ra3wAFwMPAL+iGW87hWYs7lrgLuAaYPeybtB88cXdwApgsGU7b6X5wGUV8JaJPq4+tMvv0wyP3ArcUh7HTvW2AV4I3Fza5TbgQ6X8+TRBswr4CrBjKZ9e5leV5c9v2dYHS3vdCbx6oo+tj200xIazUGyXHh9eSi9JlXIIRZIqZYBLUqUMcEmqlAEuSZUywCWpUga4qhERAxFxUUTcExHLI+J7EfEHHdabFy13imwp/1hEHNXDfl4cERkR1d5/RVODAa4qlIuEvg58JzOfn5mH0Fz8Madtva5fE5iZH8rMa3rY3RuAG8rPjnWJCF87mnD+EaoWRwBPZ+ZnRwoy877M/D8RsSgiroiI62guGOooIr4YESdGxDER8ZWW8qGWe1EH8HpgEfDKiJheyueVe01fSHMxztyI+POI+EE09zL/aMv2vl7eIayMiMX9bQZpAwNctTgIuGmU5QcDJ2bmK3rY1jXASyNilzL/P2juxQHwX4F7M/NuYJjmVqcj9gP+ITMPAvYv84cCLwYOiYiXl/XeWt4hDALviog9eqiTtMkMcFUpIv4+mm+4+UEpujoz2+/R3lE2tyL9JvDaMuRyHBvut/EGNoT5Ep49jHJfNvcth+b+LK+iuUT+Jpq7C+5Xlr0rIn4I3Ehzk6X9kMZB1/FCaZJZCfz3kZnMfEdE7AksK0VPbuL2lgDvpPlijmWZ+URETCv7OD4iPkhzr5Y9ImJmh30E8InM/MfWjUbEEHAU8LLM/GVEDNPcw0PqO3vgqsV1wPSIeHtL2c5bsL1v0wy7/DEbetxHArdm5tzMnJeZ+9DcGnejM11o7o741nLvcyJidkQ8l+YWp4+W8D6A5qvTpHFhgKsK2dx17QTgFRFxb0R8n+brtc7o8pT9I+L+lsfr27b3DHAl8OryE5rhkn9u285X6XA2SmZ+i+Z7HL8XEStovuJrJs3QzHYRcQdwNs0wijQuvBuhJFXKHrgkVcoAl6RKGeCSVCkDXJIqZYBLUqUMcEmqlAEuSZX6/7bsZE0ceeJSAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# variable before transformation\n", "X_train['GrLivArea'].hist(bins=50)\n", "plt.title('Variable before transformation')\n", "plt.xlabel('GrLivArea')" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Text(0.5, 0, 'GrLivArea')" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXAAAAEWCAYAAAB/tMx4AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAAVqUlEQVR4nO3df7RdZX3n8fdHIvIjEBAwpQEJHRkskpGRu9TqUm+EWgQt2EFHxtGgYNpprUylU7B11drqGuzSUabtrEpLhalCqkiVhaOC6BVthRoQDZA6CoISEbAGNMpUo9/54+zIyeX+OEnuvec8N+/XWmfds5+9zz7f8+TcT577nL33SVUhSWrPY4ZdgCRp5xjgktQoA1ySGmWAS1KjDHBJapQBLkmNMsA1MpIsT3J9ku8neeew6+mXZGWSSrJkAZ7rOUm+MuC2Zyb53AzrJ5KcPXfVaZTM+5tRbUiypW9xH+BfgZ90y79eVe9fgDLWAt8B9q/d+ASFqvoscPSw69DoM8AFQFUt3XY/yV3A2VX1ycnbJVlSVVvnqYwjgNt3Jrznua4Fs1hehxaGUyiaUZLxJPckOS/Jt4H3JjkwydVJHkiyubt/WN9jJpL8SZJ/6KZDrklycLduryTvS/IvSR5M8oVu6uQSYA3we0m2JDkxyeOSvDvJt7rbu5M8boa6/ijJB7v9fz/JhiT/Nskbk9yf5JtJXtBX57IkFye5N8mmJG9Nske3bo8k70jynSR3AqfM0EfnJbliUtuFSf5nd//VSTZ2Nd2Z5Ndn6d/xJPf0bXN+kju6x9+e5CWPLiF/nuShJP+c5IQZan1NV8vmJJ9IcsR022r0GeAaxM8Bj6c3Ql5L733z3m75icDDwJ9Pesx/Al4NPAHYE/jdrn0NsAw4HDgI+A3g4ao6E3g/8KdVtbQb/f8B8EzgOOCpwNOBN81QF8CLgb8FDgS+CHyiq3cF8MfAe/oefwmwFXgS8O+BFwDb5otfC7yoax8DTp+hf9YBJyfZD3rhD7wMuKxbf3+3r/27PnlXkqfN8jr63QE8h16/vQV4X5JD+9Y/o9vmYODNwJVJHj95J0lOBX4f+DXgEOCzwOUzvC6Nuqry5m27G3AXcGJ3fxz4EbDXDNsfB2zuW54A3tS3/JvAx7v7rwH+Efh3U+znEuCtfct3ACf3Lf8KcNd0dQF/BFzbt/xiYAuwR7e8H1DAAcByevP8e/dtfwbw6e7+p4Df6Fv3gu6xS6bpg88Br+ru/zJwxwz99WHgnBlexzhwzwyPvwU4tbt/JvAtIH3r/wl4Zd+/xdnd/Y8BZ/Vt9xjgh8ARw37Pedu5myNwDeKBqvp/2xaS7JPkPUnuTvI94HrggG3TD51v993/IbBtjv1v6Y2K13XTIn+a5LHTPO/PA3f3Ld/dtU1ZV+e+vvsPA9+pqp/0LdPVcgTwWODebirnQXqj8yf0Pfc3Jz33TC6j9x8A9P762Db6JskLk9yQ5Lvd85xMb7Q80+v4mSSvSnJLX53HT
nr8puoSua/W/n7a5gjgwr79fBcIvb9O1CADXIOY/KHiufSOknhGVe0PPLdrz6w7qvpxVb2lqo4BnkVvauFV02z+LXqhs80Tu7bp6toR36Q3Aj+4qg7obvtX1VO69ffSm+bpf+6ZfBAY7z4LeAldgHdz9h8C3gEsr6oDgP/D9n017evo5qj/CngdcFD3+FsnPX5Fkv7lyf20zTfpHVF0QN9t76r6x1lem0aUAa6dsR+90eyD3Vzrmwd9YJLVSVZ1o/XvAT8GfjrN5pcDb0pySPch6B8C79u10nuq6l7gGuCdSfZP8pgk/ybJ87pNPgC8PslhSQ4Ezp9lfw/Qm654L/D1qtrYrdoTeBzwALA1yQvpTccMal96Af8A9D4QpTcC7/eErtbHJnkp8Iv0/pOY7C+BNyZ5SrevZd32apQBrp3xbmBvesds3wB8fAce+3PAFfTCeyPwGXrTKlN5K7Ae+DKwAbi5a5srr6IXsLcDm7u6tn04+Ff0pnq+1D3vlQPs7zLgRPqmT6rq+8Dr6f2HsJne9MpVgxZYVbcD7wQ+T296aBXwD5M2uxE4it6/x9uA06vqX6bY198Db6c3ffU9eiP5Fw5ai0ZPtp86kyS1whG4JDXKAJekRhngktQoA1ySGrWgF7M6+OCDa+XKldu1/eAHP2DfffddyDKaYd9Mz76Zmv0yvZb75qabbvpOVR0yuX1BA3zlypWsX79+u7aJiQnGx8cXsoxm2DfTs2+mZr9Mr+W+STLlmcBOoUhSowxwSWqUAS5JjTLAJalRBrgkNcoAl6RGGeCS1CgDXJIaZYBLUqMW9ExMaXe28vyPTtl+1wWnLHAlWiwcgUtSowxwSWqUAS5JjTLAJalRBrgkNcoAl6RGGeCS1CgDXJIaZYBLUqMMcElqlAEuSY0ywCWpUQMFeJLfSXJbkluTXJ5kryRHJrkxydeS/F2SPee7WEnSI2YN8CQrgNcDY1V1LLAH8HLg7cC7qupJwGbgrPksVJK0vUGnUJYAeydZAuwD3As8H7iiW38pcNqcVydJmlaqavaNknOAtwEPA9cA5wA3dKNvkhwOfKwboU9+7FpgLcDy5cuPX7du3Xbrt2zZwtKlS3fxZSxO9s30WuybDZsemrJ91Yplc/YcLfbLQmm5b1avXn1TVY1Nbp/1Cx2SHAicChwJPAh8EDhp0CeuqouAiwDGxsZqfHx8u/UTExNMblOPfTO9FvvmzOm+0OEV43P2HC32y0JZjH0zyBTKicDXq+qBqvoxcCXwbOCAbkoF4DBg0zzVKEmawiAB/g3gmUn2SRLgBOB24NPA6d02a4CPzE+JkqSpzBrgVXUjvQ8rbwY2dI+5CDgPeEOSrwEHARfPY52SpEkG+lLjqnoz8OZJzXcCT5/ziiRJA/FMTElqlAEuSY0ywCWpUQa4JDXKAJekRhngktQoA1ySGmWAS1KjDHBJapQBLkmNMsAlqVEGuCQ1ygCXpEYZ4JLUKANckhplgEtSowxwSWqUAS5JjTLAJalRBrgkNcoAl6RGDfSt9JK2t/L8j07ZftcFpyxwJdqdOQKXpEYZ4JLUKANckhplgEtSowxwSWqUAS5JjfIwQmkOTXd4oTQfHIFLUqMMcElqlAEuSY0ywCWpUQa4JDXKAJekRhngktQoA1ySGuWJPFqUvF63dgeOwCWpUQMFeJIDklyR5J+TbEzyS0ken+TaJF/tfh4438VKkh4x6Aj8QuDjVfVk4KnARuB84LqqOgq4rluWJC2QWQM8yTLgucDFAFX1o6p6EDgVuLTb7FLgtPkpUZI0lVTVzBskxwEXAbfTG33fBJwDbKqqA7ptAmzetjzp8WuBtQDLly8/ft26ddut37JlC0uXLt3Fl7E42TfTm61vNmx6aMr2VSuWzcnzT7f/nTFXNYHvmZm03DerV6++qarGJrcPEuBjwA3As6vqxiQXAt8Dfrs/sJNsrqoZ58HHxsZq/fr127VNTEwwPj4+6OvYrdg305utb+b7K
JS5vGzsXB4Z43tmei33TZIpA3yQOfB7gHuq6sZu+QrgacB9SQ7tdn4ocP9cFStJmt2sAV5V3wa+meTorukEetMpVwFrurY1wEfmpUJJ0pQGPZHnt4H3J9kTuBN4Nb3w/0CSs4C7gZfNT4mSpKkMFOBVdQvwqPkXeqNxSdIQeCamJDXKAJekRhngktQoA1ySGuXlZCW8/Kza5AhckhrlCFwaMkf/2lmOwCWpUY7ApRnM5UWrpLnmCFySGmWAS1KjDHBJapQBLkmNMsAlqVEGuCQ1ygCXpEYZ4JLUKANckhplgEtSowxwSWqUAS5JjTLAJalRBrgkNcrLyUqN8QsgtI0jcElqlAEuSY0ywCWpUQa4JDXKAJekRhngktQoDyPUbsVvmddi4ghckhrlCFxNmDxyPnfVVs48/6OevKLdmiNwSWqUAS5JjTLAJalRBrgkNcoPMaUR5SGPms3AI/AkeyT5YpKru+Ujk9yY5GtJ/i7JnvNXpiRpsh2ZQjkH2Ni3/HbgXVX1JGAzcNZcFiZJmtlAAZ7kMOAU4K+75QDPB67oNrkUOG0e6pMkTWPQEfi7gd8DftotHwQ8WFVbu+V7gBVzW5okaSapqpk3SF4EnFxVv5lkHPhd4Ezghm76hCSHAx+rqmOnePxaYC3A8uXLj1+3bt1267ds2cLSpUt3+YUsRrtj32zY9NBA2y3fG+57GFatWLZL+1lMVq1Ytlu+ZwbVct+sXr36pqoam9w+yFEozwZ+NcnJwF7A/sCFwAFJlnSj8MOATVM9uKouAi4CGBsbq/Hx8e3WT0xMMLlNPbtj35w54JEX567ayjs3LOGuV4zv0n4Wk7teMb5bvmcGtRj7ZtYplKp6Y1UdVlUrgZcDn6qqVwCfBk7vNlsDfGTeqpQkPcqunMhzHvCGJF+jNyd+8dyUJEkaxA6dyFNVE8BEd/9O4OlzX5I0OE920e7MU+klqVEGuCQ1ygCXpEYZ4JLUKK9GKC0SK8//6M++aq6fXzu3eDkCl6RGGeCS1CgDXJIaZYBLUqP8EFNzYrozIqf7AM0zKKVd5whckhrlCFwDc9QsjRZH4JLUKANckhplgEtSowxwSWqUAS5JjTLAJalRBrgkNcoAl6RGGeCS1CgDXJIaZYBLUqMMcElqlAEuSY0ywCWpUQa4JDXKAJekRhngktQoA1ySGuVXqmle+TVso2tHv4hao8cRuCQ1ygCXpEYZ4JLUKANckhplgEtSowxwSWqUAS5JjTLAJalRswZ4ksOTfDrJ7UluS3JO1/74JNcm+Wr388D5L1eStM0gI/CtwLlVdQzwTOC3khwDnA9cV1VHAdd1y5KkBTJrgFfVvVV1c3f/+8BGYAVwKnBpt9mlwGnzVKMkaQqpqsE3TlYC1wPHAt+oqgO69gCbty1PesxaYC3A8uXLj1+3bt1267ds2cLSpUt3rvpFbtT6ZsOmh4Zdws8s3xvue3jYVYyeueiXVSuWzU0xI2bUfp92xOrVq2+qqrHJ7QMHeJKlwGeAt1XVlUke7A/sJJurasZ58LGxsVq/fv12bRMTE4yPjw9Uw+5m1PpmlC5Mde6qrbxzg9dim2wu+mWxXsxq1H6fdkSSKQN8oKNQkjwW+BDw/qq6smu+L8mh3fpDgfvnqlhJ0uwGOQolwMXAxqr6H32rrgLWdPfXAB+Z+/IkSdMZ5G+tZwOvBDYkuaVr+33gAuADSc4C7gZeNi8VSpKmNGuAV9XngEyz+oS5LUeSNCjPxJSkRvkxvh5llI420cLzq9ba4QhckhplgEtSowxwSWqUAS5JjfJDTEkD8cPN0eMIXJIaZYBLUqMMcElqlAEuSY0ywCWpUQa4JDXKwwgl7RIPLxweR+CS1CgDXJIa5RTKbszLxkptcwQuSY0ywCWpUQa4JDXKAJekRhngktQoA1ySGmWAS1KjDHBJapQBLkmNMsAlqVEGuCQ1ymuhSJoXM11rx0vNzg1H4JLUKEfgkhacXwIxNxyBS1KjHIHvB
rzut1rniH1qjsAlqVEGuCQ1yimURcSpEqlnqt+Fc1dtZXzhS5lXjsAlqVGOwBvkSFuL1Y6+t3f33wVH4JLUqF0agSc5CbgQ2AP466q6YE6qmsJ8H0a0M/uf75o2bHqIM3fzEYY0inZ05D9fhzvu9Ag8yR7AXwAvBI4BzkhyzFwVJkma2a5MoTwd+FpV3VlVPwLWAafOTVmSpNmkqnbugcnpwElVdXa3/ErgGVX1uknbrQXWdotHA1+ZtKuDge/sVBGLn30zPftmavbL9FrumyOq6pDJjfN+FEpVXQRcNN36JOuramy+62iRfTM9+2Zq9sv0FmPf7MoUyibg8L7lw7o2SdIC2JUA/wJwVJIjk+wJvBy4am7KkiTNZqenUKpqa5LXAZ+gdxjh31TVbTuxq2mnV2TfzMC+mZr9Mr1F1zc7/SGmJGm4PBNTkhplgEtSo4YW4EmOTnJL3+17Sf7rsOoZJUl+J8ltSW5NcnmSvYZd06hIck7XL7ft7u+XJH+T5P4kt/a1PT7JtUm+2v08cJg1Dss0ffPS7n3z0ySL4nDCoQV4VX2lqo6rquOA44EfAn8/rHpGRZIVwOuBsao6lt4HxC8fblWjIcmxwGvpnQX8VOBFSZ403KqG6hLgpElt5wPXVdVRwHXd8u7oEh7dN7cCvwZcv+DVzJNRmUI5Abijqu4ediEjYgmwd5IlwD7At4Zcz6j4ReDGqvphVW0FPkPvF3K3VFXXA9+d1HwqcGl3/1LgtIWsaVRM1TdVtbGqJp8J3rRRCfCXA5cPu4hRUFWbgHcA3wDuBR6qqmuGW9XIuBV4TpKDkuwDnMz2J5MJllfVvd39bwPLh1mM5tfQA7w7CehXgQ8Ou5ZR0M1ZngocCfw8sG+S/zzcqkZDVW0E3g5cA3wcuAX4yTBrGmXVO0bY44QXsaEHOL3L0d5cVfcNu5ARcSLw9ap6oKp+DFwJPGvINY2Mqrq4qo6vqucCm4H/O+yaRsx9SQ4F6H7eP+R6NI9GIcDPwOmTft8AnplknySh9/nAxiHXNDKSPKH7+UR689+XDbeikXMVsKa7vwb4yBBr0Twb6pmYSfalF1i/UFUPDa2QEZPkLcB/BLYCXwTOrqp/HW5VoyHJZ4GDgB8Db6iq64Zc0tAkuRwYp3eZ1PuANwMfBj4APBG4G3hZVU3+oHPRm6Zvvgv8GXAI8CBwS1X9ypBKnBOeSi9JjRqFKRRJ0k4wwCWpUQa4JDXKAJekRhngktQoA1zNSLI8yWVJ7kxyU5LPJ3nJFNut7L8KXV/7Hyc5cYDnOS5JJZl8MSRppBjgakJ3UtOHgeur6heq6nh619A5bNJ2035NYFX9YVV9coCnOwP4XPdzylqS+LujofNNqFY8H/hRVf3ltoaquruq/izJmUmuSvIpepdQnVKSS5KcnuSkJB/sax9PcnV3P8BLgTOBX952LfZuVP+VJP+b3kW1Dk/y35J8IcmXu5Ovtu3vw91fCLclWTu33SA9wgBXK54C3DzD+qcBp1fV8wbY1yeBZ3RnAkPvrNd13f1n0bsWzR3ABHBK3+OOAv5XVT0FOLpbfjpwHHB8kud2272m+wthDHh9koMGqEnaYQa4mpTkL5J8KckXuqZrBz1lvLuW+MeBF3dTLqfwyDVDzuCRMF/H9tMod1fVDd39F3S3L9L7j+XJ9AIdeqH9JeAGepe7PQppHkw7XyiNmNuA/7Btoap+K8nBwPqu6Qc7uL91wOvoXR9jfVV9P8ke3XOcmuQPgAAHJdlviucI8N+r6j39O00yTu+Kkr9UVT9MMgH4lXiaF47A1YpPAXsl+S99bfvswv4+Q2/a5bU8MuI+AfhyVR1eVSur6gjgQ8CjjnQBPgG8JslS6H0VXnelxGXA5i68nww8cxdqlGZkgKsJ3ZcTnAY8L8nXk/wTva8MO2+ahxyd5J6+20sn7e8nwNX0rkd/ddd8Bo/+XtYPMcXRKN23JF0GfD7JBuAKYD96UzNLkmwELqA3jSLNC
69GKEmNcgQuSY0ywCWpUQa4JDXKAJekRhngktQoA1ySGmWAS1Kj/j9NRVWgumI25QAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# transformed variable\n", "train_t['GrLivArea'].hist(bins=50)\n", "plt.title('Transformed variable')\n", "plt.xlabel('GrLivArea')" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Text(0.5, 0, 'LotArea')" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEWCAYAAACdaNcBAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAAY3UlEQVR4nO3de5RmVXnn8e8jzb2xG2zsabp7KFDUoCBCD8LSmAK8cDGBrIUKg9BNyHRM0NERRxpxRnRI0mYSMS4zKAQG8EJDNAQCGlSg4sIEsFvut9AwhdByEYSGQoiCz/xxdumheN+6dNd19/ez1rvqnL332Wef/b7vr06d91KRmUiS6vKyqR6AJGn8Ge6SVCHDXZIqZLhLUoUMd0mqkOEuSRUy3EVEDETErqNo1xMRGRGzutSfFhFf3cAx9EfE2zdw29dGxE0R8XRE/NcN6WOmiIjfj4gHyn32pqkez1ARcXtE9E71OGS4zzgR8U8R8ZkO5YdHxMPdgnc4mTk7M+8bnxFOiY8D12Tmdpn5hYncUUQsi4hrJ3IfI/hL4IPlPrtxCsdBRJwXEae3yzLz9ZnZN0VDUovhPvOcD7w/ImJI+bHA1zLz+dF2tCG/CKapnYHbN2TDiZiDiNhsvPts2ZhjnchxabrJTG8z6AZsDawH3tYq2x54DngjsC/wr8CTwEPAF4EtWm0TOBG4B/h/rbJXl+XDgBuBp4AHgNNa2/aUtsuBn5T+P9aqPw34amt9P+BfylhuBnqHOa5+4BTgDuAJ4P8CW7Xq3w3cVPr6F2DPUn418EI5/gHgNcAc4ALgp8D9wCeBl5X2y4AfAGcAjwOnA1vSnBH/GHgE+BKwdYcx/lbZzwtlX0+W8vOAM4FvAc8Abx/lPC4t+3wMOLVVvy+wumz7CPC5MsaBst0zwL2tMfWVebkd+L1WP53G1Q/8d+CWUnYOMB/4NvA08D1g+1Yffwc8TPOY+z7w+lK+HPgl8Isyrn9s3Y9vL8tbAp+neaz8pCxvWep6gQeBk4BHaR5Lx0/186um25QPwNsG3GlwNvC3rfU/Am4qy/vQhOqsEiJ3Ah9ptU3gu8AOgwHGi8O9F9iD5q+6PUu4HFHqBkPpQmDb0u6nrSfzaZRwBxbShOehpa93lPUduxxTP3AbsLiM7QfA6aXuTSUA3gxsRhOK/a2g6AP+sNXXBcClwHZlzP8GnFDqlgHPAx8qc7Q1TdBfVva7HfCPwJ93Gecy4NohZefRhN9byrFuNcp5PLvs/43AvwO/Ver/FTi2LM8G9hty/w3eV5sDa4FPAFsAB9IE9GuHGVc/cB1NoC8s8/qjMsdb0fyy/FRrf39Q5mQwqG8actynd7gfBx8Pnyn7eiWwI80v5f/Vepw9X9psTvM4+TmtXyzeNjInpnoA3jbgToO30pypbVXWfwD8ty5tPwJc0lpP4MAhbX4dGB22/zxwRlkeDKXXter/AjinLJ/Gb8L9ZOArQ/q6EljaZT/9wAda64fym7PTMwdDoVV/N/A7ZbmPEu404f8LYPdW2z8
C+sryMuDHrbqgOYN9Vatsf8pfNR3GuYzO4X7BCPdZp3lc1Kq/ATiqLH8f+DQwr0M/7XD/bZqz6pe16i+k/JXQaVxlno9prX8TOLO1/iHgH7ocw9yy/zmt/ocL93uBQ1t17wL6y3Iv8Cwwq1X/KK1fZN427uY19xkoM6+l+VP+iIh4Fc2f8V8HiIjXRMTl5cXVp4A/A+YN6eKBbn1HxJsj4pqI+GlErAc+MML29wM7dehqZ+A9EfHk4I3ml9KCYQ6tW787AycN6Wtxl/3OozkTvH9IXwu77GdHYBtgTavvfyrlY/GiOR3lPD7cWv45zVk6wAk0l5fuiogfRsS7u+xzJ+CBzPxVq2y4Yx30SGv52Q7rs8sxbBYRKyPi3vJY6i9thh5HNzvx0vuhfZ89ni9+jag9B9pIhvvMdQFwHPB+4MrMHHyCngncBeyWmS+n+ZN96Iuvw30V6NdpLlEszsw5NNefh26/uLX8H2mupw71AM2Z+9zWbdvMXDnMvrv1+wDwp0P62iYzL+zQx2M014J3HtLXutZ6Dmn/LM215MG+52Rmt5DpNndDy0czj507yrwnM4+muZzxWeAbEbFth6Y/ARZHRPt5PNyxjtV/Bg6nuVY/h+YvDvjNcYzU90946f3Q6bGiCWC4z1wX0Dzp/gvNO2gGbUfzQtxARLwO+OMx9rsd8LPMfC4i9qV5gg/1PyJim4h4PXA8cFGHNl8Ffjci3lXOALeKiN6IWDTMvk+MiEURsQNwaqvfs4EPlLPhiIhtI+KwiNhuaAeZ+QJwMfCnEbFdROwMfLSM5yXKWe/ZwBkR8UqAiFgYEe/qMsZHgEURscUwxwGjm8eOIuL9EbFjGduTpfhXHZpeT3O2+/GI2Ly8v/x3gVWj3dcItqN5LeBxmr9u/mxI/SPAcJ+PuBD4ZETsGBHzgP9Jl/tB489wn6Eys5/mBaptac4QB32MJkiepgmtTsE7nD8BPhMRT9M8GS/u0OafaV7Iuwr4y8z8TofxPUBz1vcJmhddH6B5l8Zwj7mvA98B7qO5Xnt66Ws1zS+xL9K8k2YtzbXvbj5Ecx39PuDa0u+5w7Q/ufR5Xbn88D3gtV3aXk3zrpSHI+KxYfoczTx2czBwe0QMAH9Ncy3+2aGNMvMXNGF+CM1fIP8HOC4z7xrDvoZzAc2llHU072K6bkj9OcDu5XLWP3TY/nSad/3cAtxK88Lt6R3aaQJEeSFDklQRz9wlqUKGuyRVyHCXpAoZ7pJUoWnxxVHz5s3Lnp6eMW3zzDPPsO22nd76K3B+huPcdOfcdDcd52bNmjWPZWbHD9xNi3Dv6elh9erVY9qmr6+P3t7eiRlQBZyf7pyb7pyb7qbj3ETE/d3qvCwjSRUy3CWpQoa7JFXIcJekChnuklQhw12SKmS4S1KFDHdJqpDhLkkVmhafUJ0IPSuu6Fjev/KwSR6JJE0+z9wlqUKGuyRVyHCXpAoZ7pJUIcNdkipkuEtShQx3SaqQ4S5JFTLcJalChrskVchwl6QKGe6SVCHDXZIqZLhLUoUMd0mqkOEuSRUy3CWpQoa7JFXIcJekCo063CNis4i4MSIuL+u7RMT1EbE2Ii6KiC1K+ZZlfW2p75mgsUuSuhjLmfuHgTtb658FzsjMVwNPACeU8hOAJ0r5GaWdJGkSjSrcI2IRcBjwt2U9gAOBb5Qm5wNHlOXDyzql/qDSXpI0SUZ75v554OPAr8r6K4AnM/P5sv4gsLAsLwQeACj160t7SdIkmTVSg4h4N/BoZq6JiN7x2nFELAeWA8yfP5++vr4xbT8wMDDsNift8XzH8rHuZ6YaaX42Zc5Nd85NdzNtbkYMd+AtwO9FxKHAVsDLgb8G5kbErHJ2vghYV9qvAxYDD0bELGAO8PjQTjPzLOAsgCVLlmRvb++YBt7X18dw2yxbcUXH8v5jxrafmWqk+dmUOTfdOTfdzbS5GfGyTGaekpmLMrMHOAq4OjOPAa4BjizNlgKXluXLyjql/ur
MzHEdtSRpWBvzPveTgY9GxFqaa+rnlPJzgFeU8o8CKzZuiJKksRrNZZlfy8w+oK8s3wfs26HNc8B7xmFskqQN5CdUJalChrskVchwl6QKGe6SVCHDXZIqZLhLUoUMd0mqkOEuSRUy3CWpQoa7JFXIcJekChnuklQhw12SKmS4S1KFDHdJqpDhLkkVMtwlqUKGuyRVyHCXpAoZ7pJUIcNdkipkuEtShWZN9QAmW8+KK7rW9a88bBJHIkkTxzN3SaqQ4S5JFTLcJalChrskVchwl6QKGe6SVCHDXZIqZLhLUoUMd0mqkOEuSRUy3CWpQoa7JFXIcJekChnuklShEcM9IraKiBsi4uaIuD0iPl3Kd4mI6yNibURcFBFblPIty/raUt8zwccgSRpiNGfu/w4cmJlvBPYCDo6I/YDPAmdk5quBJ4ATSvsTgCdK+RmlnSRpEo0Y7tkYKKubl1sCBwLfKOXnA0eU5cPLOqX+oIiI8RqwJGlkkZkjN4rYDFgDvBr4G+B/A9eVs3MiYjHw7cx8Q0TcBhycmQ+WunuBN2fmY0P6XA4sB5g/f/4+q1atGtPABwYGmD17dtf6W9etH1N/AHssnDPmbaarkeZnU+bcdOfcdDcd5+aAAw5Yk5lLOtWN6t/sZeYLwF4RMRe4BHjdxg4qM88CzgJYsmRJ9vb2jmn7vr4+httm2TD/Tq+b/mPGNobpbKT52ZQ5N905N93NtLkZ07tlMvNJ4Bpgf2BuRAz+clgErCvL64DFAKV+DvD4eAxWkjQ6o3m3zI7ljJ2I2Bp4B3AnTcgfWZotBS4ty5eVdUr91Tmaaz+SpHEzmssyC4Dzy3X3lwEXZ+blEXEHsCoiTgduBM4p7c8BvhIRa4GfAUdNwLglScMYMdwz8xbgTR3K7wP27VD+HPCecRmdJGmD+AlVSaqQ4S5JFTLcJalChrskVchwl6QKGe6SVCHDXZIqZLhLUoUMd0mqkOEuSRUy3CWpQoa7JFXIcJekChnuklQhw12SKmS4S1KFDHdJqpDhLkkVMtwlqUKGuyRVyHCXpAoZ7pJUIcNdkipkuEtShQx3SaqQ4S5JFTLcJalChrskVchwl6QKGe6SVCHDXZIqZLhLUoUMd0mqkOEuSRUy3CWpQoa7JFXIcJekCo0Y7hGxOCKuiYg7IuL2iPhwKd8hIr4bEfeUn9uX8oiIL0TE2oi4JSL2nuiDkCS92GjO3J8HTsrM3YH9gBMjYndgBXBVZu4GXFXWAQ4Bdiu35cCZ4z5qSdKwRgz3zHwoM39Ulp8G7gQWAocD55dm5wNHlOXDgQuycR0wNyIWjPfAJUndRWaOvnFED/B94A3AjzNzbikP4InMnBsRlwMrM/PaUncVcHJmrh7S13KaM3vmz5+/z6pVq8Y08IGBAWbPnt21/tZ168fUH8AeC+eMeZvpaqT52ZQ5N905N91Nx7k54IAD1mTmkk51s0bbSUTMBr4JfCQzn2ryvJGZGRGj/y3RbHMWcBbAkiVLsre3dyyb09fXx3DbLFtxxZj6A+g/ZmxjmM5Gmp9NmXPTnXPT3Uybm1G9WyYiNqcJ9q9l5t+X4kcGL7eUn4+W8nXA4tbmi0qZJGmSjObdMgGcA9yZmZ9rVV0GLC3LS4FLW+XHlXfN7Aesz8yHxnHMkqQRjOayzFuAY4FbI+KmUvYJYCVwcUScANwPvLfUfQs4FFgL/Bw4fjwHLEka2YjhXl4YjS7VB3Von8CJGzkuSdJG8BOqklQhw12SKmS4S1KFDHdJqpDhLkkVMtwlqUKGuyRVyHCXpAoZ7pJUIcNdkipkuEtShQx3SaqQ4S5JFTLcJalChrskVchwl6QKGe6SVCHDXZIqZLhLUoUMd0mqkOEuSRUy3CWpQoa7JFXIcJekChnuklQhw12SKmS4S1KFDHdJqpDhLkkVMtwlqUKGuyRVyHCXpArNmuoBbKyeFVdM9RAkadrxzF2SKmS4S1KFDHdJqpDhLkkVGjHcI+LciHg0Im5
rle0QEd+NiHvKz+1LeUTEFyJibUTcEhF7T+TgJUmdjebM/Tzg4CFlK4CrMnM34KqyDnAIsFu5LQfOHJ9hSpLGYsRwz8zvAz8bUnw4cH5ZPh84olV+QTauA+ZGxIJxGqskaZQiM0duFNEDXJ6ZbyjrT2bm3LIcwBOZOTciLgdWZua1pe4q4OTMXN2hz+U0Z/fMnz9/n1WrVo1p4AMDA8yePZtb160f03bD2WPhnHHra6oNzo9eyrnpzrnpbjrOzQEHHLAmM5d0qtvoDzFlZkbEyL8hXrrdWcBZAEuWLMne3t4xbd/X10dvby/LxvFDTP3HjG0M09ng/OilnJvunJvuZtrcbOi7ZR4ZvNxSfj5aytcBi1vtFpUySdIk2tBwvwxYWpaXApe2yo8r75rZD1ifmQ9t5BglSWM04mWZiLgQ6AXmRcSDwKeAlcDFEXECcD/w3tL8W8ChwFrg58DxEzBmSdIIRgz3zDy6S9VBHdomcOLGDkqStHH8hKokVchwl6QKGe6SVCHDXZIqZLhLUoUMd0mqkOEuSRWa8f8gezx1+2fb/SsPm+SRSNLG8cxdkipkuEtShQx3SaqQ4S5JFTLcJalChrskVchwl6QKGe6SVCHDXZIqZLhLUoUMd0mqkOEuSRUy3CWpQoa7JFXIcJekChnuklQhw12SKmS4S1KFDHdJqpDhLkkVMtwlqUKGuyRVyHCXpAoZ7pJUIcNdkio0a6oHMBP0rLiiY3n/ysMmeSSSNDqeuUtShTxz3wie0Uuarjxzl6QKTUi4R8TBEXF3RKyNiBUTsQ9JUnfjflkmIjYD/gZ4B/Ag8MOIuCwz7xjvfc00XsaRNFkm4pr7vsDazLwPICJWAYcDm0y4dwvx8WrfTbdfEhvS/1h/4WyKv7g2xWMeK+foNyZ7LiIzx7fDiCOBgzPzD8v6scCbM/ODQ9otB5aX1dcCd49xV/OAxzZyuDVzfrpzbrpzbrqbjnOzc2bu2Kliyt4tk5lnAWdt6PYRsTozl4zjkKri/HTn3HTn3HQ30+ZmIl5QXQcsbq0vKmWSpEkyEeH+Q2C3iNglIrYAjgIum4D9SJK6GPfLMpn5fER8ELgS2Aw4NzNvH+/9sBGXdDYRzk93zk13zk13M2puxv0FVUnS1PMTqpJUIcNdkio0I8N9U/p6g4joj4hbI+KmiFhdynaIiO9GxD3l5/alPCLiC2VebomIvVv9LC3t74mIpa3yfUr/a8u2MflHOToRcW5EPBoRt7XKJnwuuu1jOukyN6dFxLry2LkpIg5t1Z1SjvPuiHhXq7zjc6u8QeL6Un5RebMEEbFlWV9b6nsm6ZBHLSIWR8Q1EXFHRNweER8u5XU/djJzRt1oXqS9F9gV2AK4Gdh9qsc1gcfbD8wbUvYXwIqyvAL4bFk+FPg2EMB+wPWlfAfgvvJz+7K8fam7obSNsu0hU33Mw8zF24C9gdsmcy667WM63brMzWnAxzq03b08b7YEdinPp82Ge24BFwNHleUvAX9clv8E+FJZPgq4aKrnosPxLgD2LsvbAf9W5qDqx86UT/wG3FH7A1e21k8BTpnqcU3g8fbz0nC/G1hQlhcAd5flLwNHD20HHA18uVX+5VK2ALirVf6idtPxBvQMCbAJn4tu+5hutw5zcxqdw/1Fzxmad7bt3+25VQLrMWBWKf91u8Fty/Ks0i6mei5GmKdLab77qurHzky8LLMQeKC1/mApq1UC34mINdF8ZQPA/Mx8qCw/DMwvy93mZrjyBzuUzySTMRfd9jETfLBcWji3dUlgrHPzCuDJzHx+SPmL+ir160v7aalcNnoTcD2VP3ZmYrhvat6amXsDhwAnRsTb2pXZnBL4flYmZy5m2HyfCbwK2At4CPirKR3NFIuI2cA3gY9k5lPtuhofOzMx3DeprzfIzHXl56PAJTTfuvlIRCwAKD8fLc27zc1w5Ys6lM8kkzEX3fYxrWXmI5n5Qmb+Cjib5rEDY5+bx4G5ETFrSPmL+ir
1c0r7aSUiNqcJ9q9l5t+X4qofOzMx3DeZrzeIiG0jYrvBZeCdwG00xzv4Sv1SmmuIlPLjyqv9+wHry5+EVwLvjIjty5/m76S5ZvoQ8FRE7Fde3T+u1ddMMRlz0W0f09pgqBS/T/PYgeZ4jirvdNkF2I3mBcGOz61yxnkNcGTZfug8D87NkcDVpf20Ue7Pc4A7M/Nzraq6HztT/eLGBr4gcijNK973AqdO9Xgm8Dh3pXnHws3A7YPHSnNN8yrgHuB7wA6lPGj+Ucq9wK3AklZffwCsLbfjW+VLaJ709wJfZBq/GAZcSHN54Zc01zVPmIy56LaP6XTrMjdfKcd+C03ILGi1P7Uc59203iHV7blVHos3lDn7O2DLUr5VWV9b6ned6rnoMDdvpbkccgtwU7kdWvtjx68fkKQKzcTLMpKkERjuklQhw12SKmS4S1KFDHdJqpDhripFxMAY2i6LiJ2GlM2LiF9GxAfGf3TSxDPcJVgG7DSk7D3AdTRfAtVRRGw2gWOSNorhrk1GROwVEdeVL9K6pHzS8EiaD6B8LZrvPN+6ND8aOAlYGBGLWn0MRMRfRcTNwP4R8f6IuKFs++XBwI+IMyNidfn+8E9P9rFKhrs2JRcAJ2fmnjSfPPxUZn4DWA0ck5l7ZeazEbGY5tOcN9B8j/n7Wn1sS/P93m+k+Q6V9wFvycy9gBeAY0q7UzNzCbAn8DsRseckHJ/0a4a7NgkRMQeYm5n/XIrOp/kHF528jybUAVbx4kszL9B8ARXAQcA+wA8j4qayvmupe29E/Ai4EXg9zT+HkCbNrJGbSJuco4H/EBGDZ+E7RcRumXkP8FxmvlDKAzg/M09pb1y+jOtjwH/KzCci4jya72CRJo1n7tokZOZ64ImI+O1SdCwweBb/NM2/XyMiXgPMzsyFmdmTmT3An9P5hdWrgCMj4pVl2x0iYmfg5cAzwPqImE/zXfzSpPLMXbXaJiLa/x3nczRfufqliNiG5v9fHl/qzivlz9J8Z/4lQ/r6JnAR8Jl2YWbeERGfpPlPWS+j+UbGEzPzuoi4EbiL5j/3/GBcj0waBb8VUpIq5GUZSaqQ4S5JFTLcJalChrskVchwl6QKGe6SVCHDXZIq9P8B4yS+3dnpwE4AAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# tvariable before transformation\n", "X_train['LotArea'].hist(bins=50)\n", "plt.title('Variable before transformation')\n", "plt.xlabel('LotArea')" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Text(0.5, 0, 'LotArea')" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEWCAYAAACdaNcBAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAAeEElEQVR4nO3df5xcdX3v8ddbIhCymIDBFZLIUo0oJGDNFvFytbtiKwI13F5/wAVNFJvSIqLGyw/xFrVQYxWpXm9po6EJFbNS/AEFUTC6crEGTSQSfikRF0KEBAQiCxEMfvrH+W49TGZ2d37vnH0/H4997Jwf8/2+5+zsZ85858w5igjMzKxYntPuAGZm1ngu7mZmBeTibmZWQC7uZmYF5OJuZlZALu5mZgXk4m5IGpb0B+NYr0dSSJpSYflHJH2xxgxDkl5f430PlrRB0uOS3ltLG51C0v+QtDn9zf6w3XlKSbpdUl+7c5iLe8eR9E1JHyszf6GkBysV3tFERFdE3NOYhG1xFvDdiNg7Ij7bzI4kLZZ0UzP7GMOngPekv9ktbcyBpJWSLsjPi4hDI2KwTZEsx8W986wCTpGkkvlvBy6PiJ3jbaiWF4IJ6kDg9lru2IxtIGm3RreZU89jbWYum2giwj8d9ANMBbYDr83N2wf4DXA4cATwA+Ax4AHgc8DuuXUDOB24G/hFbt5L0u3jgFuAXwObgY/k7tuT1l0C/DK1/8Hc8o8AX8xNHwn8R8ryE6BvlMc1BJwL3AE8CvwLsGdu+fHAhtTWfwCHpfnfAZ5Jj38YeCkwHbgMeAi4F/gw8Jy0/mLg+8DFwK+AC4A9yPaI7wO2Av8ETC2T8eWpn2dSX4+l+SuBS4BvAE8Arx/ndlyU+nwYOC+3/AhgXbrvVuDTKeNwut8TwM9zmQbTdrkdeFOunXK5hoD/Ddya5q0AuoHrgMeBbwP75Nr4N+BBsufcjcChaf4S4LfA0ynXv+f+jq9Pt/cA/oHsufLLdHuPtKwPuB9YCmwjey69s93/X0X6aXsA/9TwR4PPA1/ITf8lsCHdXkBWVKekInIn8L7cugHcAOw7UsB4dnHvA+aTvas7LBWXE9KykaK0GpiW1nso98/8EVJxB2aRFc9jU1t/kqb3q/CYhoDbgDkp2/eBC9KyP0wF4FXAbmRFcShXKAaBd+faugy4Ctg7Zf4ZcGpathjYCZyRttFUskJ/dep3b+DfgY9XyLkYuKlk3kqy4ndUeqx7jnM7fj71fzjwFPDytPwHwNvT7S7gyJK/38jf6rnAJuBDwO7A68gK9MGj5BoC1pIV9Flpu/44beM9yV4sz8/19660TUYK9YaSx31Bmb/jyPPhY6mvFwD7kb0o/23uebYzrfNcsufJk+ReWPxTZ51odwD/1PBHg/9Otqe2Z5r+PvD+Cuu+D/habjqA15Ws818Fo8z9/wG4ON0eKUovyy3/e2BFuv0Rfl/czwb+taStbwGLKvQzBJyWmz6W3++dXjJSFHLLfwr8cbo9SCru
ZMX/aeCQ3Lp/CQym24uB+3LLRLYH++LcvFeT3tWUybmY8sX9sjH+ZuW24+zc8h8CJ6bbNwIfBWaWaSdf3F9Dtlf9nNzy1aR3CeVype18cm76K8AluekzgK9XeAwzUv/Tc+2PVtx/DhybW/YGYCjd7gN2AFNyy7eReyHzT30/HnPvQBFxE9lb+RMkvZjsbfyXACS9VNI16cPVXwN/B8wsaWJzpbYlvUrSdyU9JGk7cNoY978XOKBMUwcCb5H02MgP2YvS/qM8tErtHggsLWlrToV+Z5LtCd5b0tasCv3sB+wFrM+1/c00vxrP2qbj3I4P5m4/SbaXDnAq2fDSXZJ+JOn4Cn0eAGyOiN/l5o32WEdszd3eUWa6Kz2G3SQtk/Tz9FwaSuuUPo5KDmDXv0P+b/arePZnRPltYHVyce9clwHvAE4BvhURI/+glwB3AXMj4nlkb9lLP3wd7VSgXyIbopgTEdPJxp9L7z8nd/tFZOOppTaT7bnPyP1Mi4hlo/Rdqd3NwIUlbe0VEavLtPEw2VjwgSVtbclNR8n6O8jGkkfanh4RlYpMpW1XOn8827F8QxF3R8RJZMMZnwCulDStzKq/BOZIyv8fj/ZYq/W/gIVkY/XTyd5xwO8fx1ht/5Jd/w7lnivWBC7unesysn+6vyA7gmbE3mQfxA1LehnwV1W2uzfwSET8RtIRZP/gpf6PpL0kHQq8E/hymXW+CPyZpDekPcA9JfVJmj1K36dLmi1pX+C8XLufB05Le8OSNE3ScZL2Lm0gIp4BrgAulLS3pAOBD6Q8u0h7vZ8HLpb0AgBJsyS9oULGrcBsSbuP8jhgfNuxLEmnSNovZXsszf5dmVVvJtvbPUvSc9Px5X8GDIy3rzHsTfZZwK/I3t38XcnyrcBo349YDXxY0n6SZgJ/Q4W/gzWei3uHioghsg+oppHtIY74IFkheZysaJUrvKP5a+Bjkh4n+2e8osw63yP7IG8N8KmIuL5Mvs1ke30fIvvQdTPZURqjPee+BFwP3EM2XntBamsd2YvY58iOpNlENvZdyRlk4+j3ADeldi8dZf2zU5tr0/DDt4GDK6z7HbKjUh6U9PAobY5nO1ZyDHC7pGHgM2Rj8TtKV4qIp8mK+RvJ3oH8I/COiLirir5GcxnZUMoWsqOY1pYsXwEckoazvl7m/heQHfVzK7CR7IPbC8qsZ02g9EGGmZkViPfczcwKyMXdzKyAXNzNzArIxd3MrIAmxImjZs6cGT09PU3v54knnmDatHKHC098nZwdnL/dOjl/J2eH5uZfv379wxFR9gt3E6K49/T0sG7duqb3Mzg4SF9fX9P7aYZOzg7O326dnL+Ts0Nz80u6t9IyD8uYmRWQi7uZWQGNWdwlXSppm6TbSuafIemudFmtv8/NP1fSJkk/HeUr3GZm1kTjGXNfSfa178tGZkjqJ/tq+eER8VTunByHACcCh5Kd/e3bkl6azvdhZmYtMuaee0TcCDxSMvuvgGUR8VRaZ1uavxAYiIinIuIXZOfrOKKBec3MbBzGdW4ZST3ANRExL01vILvSzTFklx37YET8SNLngLUR8cW03grguoi4skybS8gu1UV3d/eCgYFGnciusuHhYbq6OvN00Z2cHZy/3To5fydnh+bm7+/vXx8RveWW1Xoo5BSyS5IdCfwRcIWk0U79uYuIWA4sB+jt7Y1WHOrUyYdUdXJ2cP526+T8nZwd2pe/1qNl7ge+Gpkfkp1reibZqUHzF1yYzbMvHGBmZi1Qa3H/OtAP2WXdyC7O+zDZecVPlLSHpIOAuWTXhjQzsxYac1hG0mqyi9nOlHQ/cD7ZhQ8uTYdHPk120eMgu8DAFWQn9t8JnO4jZaycnnOuLTt/aNlxLU5iVkxjFvd0LcdyTqmw/oXAhfWEMjOz+vgbqmZmBeTibmZWQC7uZmYF5OJuZlZALu5mZgXk4m5mVkAu7mZmBeTibmZWQC7uZmYF5OJuZlZALu5mZgXk4m5mVkAu
7mZmBeTibmZWQC7uZmYF5OJuZlZALu5mZgU0ZnGXdKmkbemSeqXLlkoKSTPTtCR9VtImSbdKemUzQpuZ2ejGs+e+EjimdKakOcCfAvflZr+R7KLYc4ElwCX1RzQzs2qNWdwj4kbgkTKLLgbOAiI3byFwWWTWAjMk7d+QpGZmNm6KiLFXknqAayJiXppeCLwuIs6UNAT0RsTDkq4BlkXETWm9NcDZEbGuTJtLyPbu6e7uXjAwMNCgh1TZ8PAwXV1dTe+nGTo5O+yaf+OW7WXXmz9reqsiVaVo27+TdHJ2aG7+/v7+9RHRW27ZlGobk7QX8CGyIZmaRcRyYDlAb29v9PX11dPcuAwODtKKfpqhk7PDrvkXn3Nt2fWGTu4rO7/dirb9O0knZ4f25a+6uAMvBg4CfiIJYDbwY0lHAFuAObl1Z6d5ZmbWQlUfChkRGyPiBRHRExE9wP3AKyPiQeBq4B3pqJkjge0R8UBjI5uZ2VjGcyjkauAHwMGS7pd06iirfwO4B9gEfB7464akNDOzqow5LBMRJ42xvCd3O4DT649lk1VPpbH4Zce1OIlZZ/M3VM3MCsjF3cysgFzczcwKyMXdzKyAXNzNzArIxd3MrIBc3M3MCsjF3cysgFzczcwKyMXdzKyAXNzNzArIxd3MrIBc3M3MCsjF3cysgFzczcwKyMXdzKyAXNzNzApoPJfZu1TSNkm35eZ9UtJdkm6V9DVJM3LLzpW0SdJPJb2hSbnNzGwU49lzXwkcUzLvBmBeRBwG/Aw4F0DSIcCJwKHpPv8oabeGpTUzs3EZs7hHxI3AIyXzro+InWlyLTA73V4IDETEUxHxC7ILZR/RwLxmZjYOjRhzfxdwXbo9C9icW3Z/mmdmZi2kiBh7JakHuCYi5pXMPw/oBf48IkLS54C1EfHFtHwFcF1EXFmmzSXAEoDu7u4FAwMD9T6WMQ0PD9PV1dX0fpqhk7PDrvk3btle1f3nz5re6EhVKdr27ySdnB2am7+/v399RPSWWzal1kYlLQaOB46O379CbAHm5FabnebtIiKWA8sBent7o6+vr9Yo4zY4OEgr+mmGTs4Ou+ZffM61Vd1/6OS+MddppqJt/07SydmhfflrGpaRdAxwFvCmiHgyt+hq4ERJe0g6CJgL/LD+mGZmVo0x99wlrQb6gJmS7gfOJzs6Zg/gBkmQDcWcFhG3S7oCuAPYCZweEc80K7yZmZU3ZnGPiJPKzF4xyvoXAhfWE8rMzOrjb6iamRWQi7uZWQHVfLSMWSv1VDi6ZmjZcS1OYtYZvOduZlZALu5mZgXk4m5mVkAu7mZmBeTibmZWQD5axppq5CiXpfN3Vn0+GTOrnffczcwKyMXdzKyAXNzNzArIxd3MrIBc3M3MCsjF3cysgFzczcwKyMXdzKyAXNzNzApozOIu6VJJ2yTdlpu3r6QbJN2dfu+T5kvSZyVtknSrpFc2M7yZmZU3nj33lcAxJfPOAdZExFxgTZoGeCMwN/0sAS5pTEwzM6vGmMU9Im4EHimZvRBYlW6vAk7Izb8sMmuBGZL2b1BWMzMbJ0XE2CtJPcA1ETEvTT8WETPSbQGPRsQMSdcAyyLiprRsDXB2RKwr0+YSsr17uru7FwwMDDTmEY1ieHiYrq6upvfTDJ2afeOW7QB0T4WtOxrf/vxZ0xvfaBmduv1HdHL+Ts4Ozc3f39+/PiJ6yy2r+6yQERGSxn6F2PV+y4HlAL29vdHX11dvlDENDg7Sin6aoVOzL86dFfKijY0/CenQyX0Nb7OcTt3+Izo5fydnh/blr/Voma0jwy3p97Y0fwswJ7fe7DTPzMxaqNbifjWwKN1eBFyVm/+OdNTMkcD2iHigzoxmZlalMd8nS1oN9AEzJd0PnA8sA66QdCpwL/DWtPo3gGOBTcCTwDubkNnMzMYwZnGPiJMqLDq6zLoBnF5vKDMzq4+/oWpmVkAu7mZmBeTibmZWQC7uZmYF5OJuZlZALu5mZgXk4m5m
VkAu7mZmBeTibmZWQC7uZmYF5OJuZlZALu5mZgXk4m5mVkAu7mZmBeTibmZWQC7uZmYF5OJuZlZAdRV3Se+XdLuk2yStlrSnpIMk3Sxpk6QvS9q9UWHNzGx8ai7ukmYB7wV6I2IesBtwIvAJ4OKIeAnwKHBqI4Kamdn41TssMwWYKmkKsBfwAPA64Mq0fBVwQp19mJlZlZRd07rGO0tnAhcCO4DrgTOBtWmvHUlzgOvSnn3pfZcASwC6u7sXDAwM1JxjvIaHh+nq6mp6P80w0bNv3LJ91OXdU2Hrjsb3O3/W9MY3WsZE3/5j6eT8nZwdmpu/v79/fUT0lls2pdZGJe0DLAQOAh4D/g04Zrz3j4jlwHKA3t7e6OvrqzXKuA0ODtKKfpphomdffM61oy5fOn8nF22s+elW0dDJfQ1vs5yJvv3H0sn5Ozk7tC9/PcMyrwd+EREPRcRvga8CRwEz0jANwGxgS50ZzcysSvXsSt0HHClpL7JhmaOBdcB3gTcDA8Ai4Kp6Q5pV0lPhHcPQsuNanMRsYql5zz0ibib74PTHwMbU1nLgbOADkjYBzwdWNCCnmZlVoa5B0Ig4Hzi/ZPY9wBH1tGtmZvXxN1TNzArIxd3MrIBc3M3MCsjF3cysgFzczcwKyMXdzKyAXNzNzArIxd3MrIBc3M3MCsjF3cysgFzczcwKyMXdzKyAXNzNzArIxd3MrIBc3M3MCsjF3cysgFzczcwKqK7iLmmGpCsl3SXpTkmvlrSvpBsk3Z1+79OosGZmNj717rl/BvhmRLwMOBy4EzgHWBMRc4E1adrMzFqo5uIuaTrwWtIFsCPi6Yh4DFgIrEqrrQJOqC+imZlVq54994OAh4B/kXSLpC9ImgZ0R8QDaZ0Hge56Q5qZWXUUEbXdUeoF1gJHRcTNkj4D/Bo4IyJm5NZ7NCJ2GXeXtARYAtDd3b1gYGCgphzVGB4epqurq+n9NMNEz75xy/ZRl3dPha07WhQGmD9rekPbm+jbfyydnL+Ts0Nz8/f396+PiN5yy+op7i8E1kZET5p+Ddn4+kuAvoh4QNL+wGBEHDxaW729vbFu3bqaclRjcHCQvr6+pvfTDBM9e8851466fOn8nVy0cUqL0sDQsuMa2t5E3/5j6eT8nZwdmptfUsXiXvOwTEQ8CGyWNFK4jwbuAK4GFqV5i4Crau3DzMxqU++u1BnA5ZJ2B+4B3kn2gnGFpFOBe4G31tmHmZlVqa7iHhEbgHJvCY6up10zM6uPv6FqZlZALu5mZgXk4m5mVkAu7mZmBeTibmZWQK37VolZC1X6UlWjv9xkNlF5z93MrIBc3M3MCsjF3cysgFzczcwKyMXdzKyAXNzNzArIxd3MrIBc3M3MCshfYrKy/CUgs87mPXczswJycTczKyAXdzOzAqq7uEvaTdItkq5J0wdJulnSJklfTtdXNTOzFmrEnvuZwJ256U8AF0fES4BHgVMb0IeZmVWhruIuaTZwHPCFNC3gdcCVaZVVwAn19GFmZtVTRNR+Z+lK4OPA3sAHgcXA2rTXjqQ5wHURMa/MfZcASwC6u7sXDAwM1JxjvIaHh+nq6mp6P83Q6uwbt2wvO3/+rOlVrT+ieyps3VF3rLpVyj+WTn7uQGfn7+Ts0Nz8/f396yOit9yymo9zl3Q8sC0i1kvqq/b+EbEcWA7Q29sbfX1VN1G1wcFBWtFPM7Q6++JKx7mfXD5DpfVHLJ2/k4s2tv9rFZXyj6WTnzvQ2fk7OTu0L389/21HAW+SdCywJ/A84DPADElTImInMBvYUn9MMzOrRs1j7hFxbkTMjoge4ETgOxFxMvBd4M1ptUXAVXWnNDOzqjTjOPezgQ9I2gQ8H1jRhD7MzGwUDRkEjYhBYDDdvgc4ohHtmplZbfwNVTOzAnJxNzMrIBd3M7MCcnE3MysgF3czswJycTczK6D2fx/cOkqly++Z2cTiPXczswJycTczKyAXdzOzAnJxNzMrIH+gapPKaB8I
Dy07roVJzJrLe+5mZgXk4m5mVkAeljFLKg3ZeLjGOpH33M3MCsjF3cysgGou7pLmSPqupDsk3S7pzDR/X0k3SLo7/d6ncXHNzGw86tlz3wksjYhDgCOB0yUdApwDrImIucCaNG1mZi1U8weqEfEA8EC6/bikO4FZwEKgL622iuzaqmfXldLGxR8ImtkIRUT9jUg9wI3APOC+iJiR5gt4dGS65D5LgCUA3d3dCwYGBurOMZbh4WG6urqa3k8zjCf7xi3by86fP2t61f1VaqtW3VNh646GNtky82dN7+jnDhT/uT+RNTN/f3//+ojoLbes7uIuqQv4HnBhRHxV0mP5Yi7p0YgYddy9t7c31q1bV1eO8RgcHKSvr6/p/TTDeLI3cs+90af2XTp/Jxdt7Mwjb4eWHdfRzx0o/nN/ImtmfkkVi3tdR8tIei7wFeDyiPhqmr1V0v5p+f7Atnr6MDOz6tW8K5WGXFYAd0bEp3OLrgYWAcvS76vqSmhN5YtvmBVTPe+TjwLeDmyUtCHN+xBZUb9C0qnAvcBb60poZmZVq+domZsAVVh8dK3tmplZ/TrzE66C8SGMZtZoLu6TgMfVzSYfn1vGzKyAXNzNzArIxd3MrIBc3M3MCsjF3cysgFzczcwKyIdCNkGzj1v3oY0Tg7+fYBOZ99zNzArIxd3MrIBc3M3MCmhSjrl7rNTMim5SFvdG8QebVg3vVFgrubjnTLRinc+zdP5OFk+wfJNFzznXVrX9J9rzyCYnj7mbmRWQi7uZWQEVelim9O2xhzasCDx2b+PRtOIu6RjgM8BuwBciYlmz+uoUHou1ctr1vKilX7+AdI6mFHdJuwH/D/gT4H7gR5Kujog7Gt2XC6ZZphV79NX20a53Gd4WzRtzPwLYFBH3RMTTwACwsEl9mZlZCUVE4xuV3gwcExHvTtNvB14VEe/JrbMEWJImDwZ+2vAgu5oJPNyCfpqhk7OD87dbJ+fv5OzQ3PwHRsR+5Ra07QPViFgOLG9ln5LWRURvK/tslE7ODs7fbp2cv5OzQ/vyN2tYZgswJzc9O80zM7MWaFZx/xEwV9JBknYHTgSublJfZmZWoinDMhGxU9J7gG+RHQp5aUTc3oy+qtTSYaAG6+Ts4Pzt1sn5Ozk7tCl/Uz5QNTOz9vLpB8zMCsjF3cysgCZFcZf0fkm3S7pN0mpJe7Y7UzUknZmy3y7pfe3OMxZJl0raJum23Lx9Jd0g6e70e592ZhxNhfxvSdv/d5Im7GF5FbJ/UtJdkm6V9DVJM9oYcVQV8v9tyr5B0vWSDmhnxtGUy59btlRSSJrZiiyFL+6SZgHvBXojYh7ZB7wntjfV+EmaB/wF2bd+DweOl/SS9qYa00rgmJJ55wBrImIusCZNT1Qr2TX/bcCfAze2PE11VrJr9huAeRFxGPAz4NxWh6rCSnbN/8mIOCwiXgFcA/xNq0NVYSW75kfSHOBPgftaFaTwxT2ZAkyVNAXYC/hlm/NU4+XAzRHxZETsBL5HVmQmrIi4EXikZPZCYFW6vQo4oZWZqlEuf0TcGRGt+BZ1XSpkvz49dwDWkn3vZEKqkP/XuclpwIQ9CqTCcx/gYuAsWpi98MU9IrYAnyJ7xXwA2B4R17c3VVVuA14j6fmS9gKO5dlfEOsU3RHxQLr9INDdzjCT2LuA69odolqSLpS0GTiZib3nvgtJC4EtEfGTVvZb+OKexnYXAgcBBwDTJJ3S3lTjFxF3Ap8Arge+CWwAnmlnpnpFdvzthN37KipJ5wE7gcvbnaVaEXFeRMwhy/6esdafKNIO2YdowwtS4Ys78HrgFxHxUET8Fvgq8N/anKkqEbEiIhZExGuBR8nGTTvNVkn7A6Tf29qcZ1KRtBg4Hjg5OvvLLZcD/7PdIarwYrIdy59IGiIbEvuxpBc2u+PJUNzvA46UtJckAUcDd7Y5U1UkvSD9fhHZePuX2puoJlcDi9LtRcBVbcwyqaQL55wFvCkinmx3nmpJmpub
XAjc1a4s1YqIjRHxgojoiYgesutbvDIiHmx235PiG6qSPgq8jewt6S3AuyPiqfamGj9J/x94PvBb4AMRsabNkUYlaTXQR3aq063A+cDXgSuAFwH3Am+NiHIfPLVdhfyPAP8X2A94DNgQEW9oU8SKKmQ/F9gD+FVabW1EnNaWgGOokP9YstOC/47suXNa+ixtwimXPyJW5JYPkR251/RTGE+K4m5mNtlMhmEZM7NJx8XdzKyAXNzNzArIxd3MrIBc3M3MCsjF3QpJ0nAV6y4uPdOgpJmSfitpQh4yaDYWF3czWEx2aoq8t5CdZOukSneStFsTM5nVxcXdJg1Jr5C0Nnde830kvRnoBS5P5wufmlY/CVgKzJI0O9fGsKSLJP0EeLWkUyT9MN33n0cKvqRLJK1L54D/aKsfq5mLu00mlwFnp/OabyT79uCVwDqyc668IiJ2pHNv7x8RPyT7Vu3bcm1MIzsF8+Fk3/h8G3BUOtf4M2RnLQQ4LyJ6gcOAP5Z0WAsen9l/cXG3SUHSdGBGRHwvzVoFvLbC6m8jK+oAAzx7aOYZ4Cvp9tHAAuBHkjak6T9Iy94q6cdkp7s4FDikAQ/DbNymtDuA2QR0EvBCSSN74QdImhsRdwO/iYiRUy4LWBURz7qykaSDgA8CfxQRj0paCXTUpR2t83nP3SaFiNgOPCrpNWnW28muagXwOLA3gKSXAl0RMSt3Jr+PU/6D1TXAm3Nn7dxX0oHA84AngO2SuoE3NulhmVXkPXcrqr0k3Z+b/jTZqYb/KV1A4R7gnWnZyjR/B/C19JP3FeDLwMfyMyPiDkkfBq6X9Byys3aeHhFrJd1CdmrazcD3G/rIzMbBZ4U0MysgD8uYmRWQi7uZWQG5uJuZFZCLu5lZAbm4m5kVkIu7mVkBubibmRXQfwIi1kPwLHIT7QAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# transformed variable\n", "train_t['LotArea'].hist(bins=50)\n", "plt.title('Variable after transformation')\n", "plt.xlabel('LotArea')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Automatically select numerical variables\n", "\n", "If no variables are specified, BoxCoxTransformer() selects and transforms all numerical variables in the dataset." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# load numerical variables only\n", "\n", "variables = ['LotFrontage', 'LotArea',\n", " '1stFlrSF', 'GrLivArea',\n", " 'TotRmsAbvGrd', 'SalePrice']\n", "\n", "data = pd.read_csv('houseprice.csv', usecols=variables)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "((1022, 5), (438, 5))" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# let's separate into training and testing sets\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(\n", " data.drop(['SalePrice'], axis=1), data['SalePrice'], test_size=0.3, random_state=0)\n", "\n", "X_train.shape, X_test.shape" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# Impute missing values\n", "\n", "arbitrary_imputer = ArbitraryNumberImputer(arbitrary_number=2)\n", "\n", "arbitrary_imputer.fit(X_train)\n", "\n", "# impute variables\n", "train_t = arbitrary_imputer.transform(X_train)\n", "test_t = arbitrary_imputer.transform(X_test)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BoxCoxTransformer()" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# let's transform all numerical variables\n", "\n", "bct = BoxCoxTransformer()\n", "\n", "bct.fit(train_t)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ 
{ "data": { "text/plain": [ "['LotFrontage', 'LotArea', '1stFlrSF', 'GrLivArea', 'TotRmsAbvGrd']" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# variables that will be transformed\n", "\n", "bct.variables_" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "# transform variables\n", "train_t = bct.transform(train_t)\n", "test_t = bct.transform(test_t)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'LotFrontage': 0.7837538110249009,\n", " 'LotArea': 0.022716974992922984,\n", " '1stFlrSF': 0.024760203538733927,\n", " 'GrLivArea': 0.06854346283829917,\n", " 'TotRmsAbvGrd': 0.26841547941861493}" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# learned parameters\n", "\n", "bct.lambda_dict_" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "fenotebook", "language": "python", "name": "fenotebook" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.2" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 4 }