{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%reload_ext autoreload\n", "%autoreload 2\n", "%matplotlib inline\n", "import os\n", "os.environ[\"CUDA_DEVICE_ORDER\"]=\"PCI_BUS_ID\";\n", "os.environ[\"CUDA_VISIBLE_DEVICES\"]=\"0\"; \n", "\n", "import urllib.request\n", "import pandas as pd\n", "import numpy as np\n", "pd.set_option('display.max_columns', None)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import ktrain\n", "from ktrain import tabular" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Predicting House Prices\n", "\n", "In this notebook, we will predict the prices of houses from various house attributes. The dataset [can be downloaded from Kaggle here](https://www.kaggle.com/c/house-prices-advanced-regression-techniques)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## STEP 1: Load and Preprocess Data" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "train_df = pd.read_csv('data/housing_price/train.csv', index_col=0)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
\n", "
" ], "text/plain": [ " MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape \\\n", "Id \n", "1 60 RL 65.0 8450 Pave NaN Reg \n", "2 20 RL 80.0 9600 Pave NaN Reg \n", "3 60 RL 68.0 11250 Pave NaN IR1 \n", "4 70 RL 60.0 9550 Pave NaN IR1 \n", "5 60 RL 84.0 14260 Pave NaN IR1 \n", "\n", " LandContour Utilities LotConfig LandSlope Neighborhood Condition1 \\\n", "Id \n", "1 Lvl AllPub Inside Gtl CollgCr Norm \n", "2 Lvl AllPub FR2 Gtl Veenker Feedr \n", "3 Lvl AllPub Inside Gtl CollgCr Norm \n", "4 Lvl AllPub Corner Gtl Crawfor Norm \n", "5 Lvl AllPub FR2 Gtl NoRidge Norm \n", "\n", " Condition2 BldgType HouseStyle OverallQual OverallCond YearBuilt \\\n", "Id \n", "1 Norm 1Fam 2Story 7 5 2003 \n", "2 Norm 1Fam 1Story 6 8 1976 \n", "3 Norm 1Fam 2Story 7 5 2001 \n", "4 Norm 1Fam 2Story 7 5 1915 \n", "5 Norm 1Fam 2Story 8 5 2000 \n", "\n", " YearRemodAdd RoofStyle RoofMatl Exterior1st Exterior2nd MasVnrType \\\n", "Id \n", "1 2003 Gable CompShg VinylSd VinylSd BrkFace \n", "2 1976 Gable CompShg MetalSd MetalSd None \n", "3 2002 Gable CompShg VinylSd VinylSd BrkFace \n", "4 1970 Gable CompShg Wd Sdng Wd Shng None \n", "5 2000 Gable CompShg VinylSd VinylSd BrkFace \n", "\n", " MasVnrArea ExterQual ExterCond Foundation BsmtQual BsmtCond BsmtExposure \\\n", "Id \n", "1 196.0 Gd TA PConc Gd TA No \n", "2 0.0 TA TA CBlock Gd TA Gd \n", "3 162.0 Gd TA PConc Gd TA Mn \n", "4 0.0 TA TA BrkTil TA Gd No \n", "5 350.0 Gd TA PConc Gd TA Av \n", "\n", " BsmtFinType1 BsmtFinSF1 BsmtFinType2 BsmtFinSF2 BsmtUnfSF TotalBsmtSF \\\n", "Id \n", "1 GLQ 706 Unf 0 150 856 \n", "2 ALQ 978 Unf 0 284 1262 \n", "3 GLQ 486 Unf 0 434 920 \n", "4 ALQ 216 Unf 0 540 756 \n", "5 GLQ 655 Unf 0 490 1145 \n", "\n", " Heating HeatingQC CentralAir Electrical 1stFlrSF 2ndFlrSF LowQualFinSF \\\n", "Id \n", "1 GasA Ex Y SBrkr 856 854 0 \n", "2 GasA Ex Y SBrkr 1262 0 0 \n", "3 GasA Ex Y SBrkr 920 866 0 \n", "4 GasA Gd Y SBrkr 961 756 0 \n", "5 GasA Ex Y SBrkr 1145 1053 0 \n", "\n", " GrLivArea BsmtFullBath 
BsmtHalfBath FullBath HalfBath BedroomAbvGr \\\n", "Id \n", "1 1710 1 0 2 1 3 \n", "2 1262 0 1 2 0 3 \n", "3 1786 1 0 2 1 3 \n", "4 1717 1 0 1 0 3 \n", "5 2198 1 0 2 1 4 \n", "\n", " KitchenAbvGr KitchenQual TotRmsAbvGrd Functional Fireplaces FireplaceQu \\\n", "Id \n", "1 1 Gd 8 Typ 0 NaN \n", "2 1 TA 6 Typ 1 TA \n", "3 1 Gd 6 Typ 1 TA \n", "4 1 Gd 7 Typ 1 Gd \n", "5 1 Gd 9 Typ 1 TA \n", "\n", " GarageType GarageYrBlt GarageFinish GarageCars GarageArea GarageQual \\\n", "Id \n", "1 Attchd 2003.0 RFn 2 548 TA \n", "2 Attchd 1976.0 RFn 2 460 TA \n", "3 Attchd 2001.0 RFn 2 608 TA \n", "4 Detchd 1998.0 Unf 3 642 TA \n", "5 Attchd 2000.0 RFn 3 836 TA \n", "\n", " GarageCond PavedDrive WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch \\\n", "Id \n", "1 TA Y 0 61 0 0 \n", "2 TA Y 298 0 0 0 \n", "3 TA Y 0 42 0 0 \n", "4 TA Y 0 35 272 0 \n", "5 TA Y 192 84 0 0 \n", "\n", " ScreenPorch PoolArea PoolQC Fence MiscFeature MiscVal MoSold YrSold \\\n", "Id \n", "1 0 0 NaN NaN NaN 0 2 2008 \n", "2 0 0 NaN NaN NaN 0 5 2007 \n", "3 0 0 NaN NaN NaN 0 9 2008 \n", "4 0 0 NaN NaN NaN 0 2 2006 \n", "5 0 0 NaN NaN NaN 0 12 2008 \n", "\n", " SaleType SaleCondition SalePrice \n", "Id \n", "1 WD Normal 208500 \n", "2 WD Normal 181500 \n", "3 WD Normal 223500 \n", "4 WD Abnorml 140000 \n", "5 WD Normal 250000 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_df.head()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "train_df.drop(['Alley','PoolQC','MiscFeature','Fence','FireplaceQu','Utilities'], axis=1, inplace=True)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
\n", "
" ], "text/plain": [ " MSSubClass MSZoning LotFrontage LotArea Street LotShape LandContour \\\n", "Id \n", "1 60 RL 65.0 8450 Pave Reg Lvl \n", "2 20 RL 80.0 9600 Pave Reg Lvl \n", "3 60 RL 68.0 11250 Pave IR1 Lvl \n", "4 70 RL 60.0 9550 Pave IR1 Lvl \n", "5 60 RL 84.0 14260 Pave IR1 Lvl \n", "\n", " LotConfig LandSlope Neighborhood Condition1 Condition2 BldgType HouseStyle \\\n", "Id \n", "1 Inside Gtl CollgCr Norm Norm 1Fam 2Story \n", "2 FR2 Gtl Veenker Feedr Norm 1Fam 1Story \n", "3 Inside Gtl CollgCr Norm Norm 1Fam 2Story \n", "4 Corner Gtl Crawfor Norm Norm 1Fam 2Story \n", "5 FR2 Gtl NoRidge Norm Norm 1Fam 2Story \n", "\n", " OverallQual OverallCond YearBuilt YearRemodAdd RoofStyle RoofMatl \\\n", "Id \n", "1 7 5 2003 2003 Gable CompShg \n", "2 6 8 1976 1976 Gable CompShg \n", "3 7 5 2001 2002 Gable CompShg \n", "4 7 5 1915 1970 Gable CompShg \n", "5 8 5 2000 2000 Gable CompShg \n", "\n", " Exterior1st Exterior2nd MasVnrType MasVnrArea ExterQual ExterCond \\\n", "Id \n", "1 VinylSd VinylSd BrkFace 196.0 Gd TA \n", "2 MetalSd MetalSd None 0.0 TA TA \n", "3 VinylSd VinylSd BrkFace 162.0 Gd TA \n", "4 Wd Sdng Wd Shng None 0.0 TA TA \n", "5 VinylSd VinylSd BrkFace 350.0 Gd TA \n", "\n", " Foundation BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinSF1 \\\n", "Id \n", "1 PConc Gd TA No GLQ 706 \n", "2 CBlock Gd TA Gd ALQ 978 \n", "3 PConc Gd TA Mn GLQ 486 \n", "4 BrkTil TA Gd No ALQ 216 \n", "5 PConc Gd TA Av GLQ 655 \n", "\n", " BsmtFinType2 BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating HeatingQC \\\n", "Id \n", "1 Unf 0 150 856 GasA Ex \n", "2 Unf 0 284 1262 GasA Ex \n", "3 Unf 0 434 920 GasA Ex \n", "4 Unf 0 540 756 GasA Gd \n", "5 Unf 0 490 1145 GasA Ex \n", "\n", " CentralAir Electrical 1stFlrSF 2ndFlrSF LowQualFinSF GrLivArea \\\n", "Id \n", "1 Y SBrkr 856 854 0 1710 \n", "2 Y SBrkr 1262 0 0 1262 \n", "3 Y SBrkr 920 866 0 1786 \n", "4 Y SBrkr 961 756 0 1717 \n", "5 Y SBrkr 1145 1053 0 2198 \n", "\n", " BsmtFullBath BsmtHalfBath FullBath HalfBath BedroomAbvGr 
\\\n", "Id \n", "1 1 0 2 1 3 \n", "2 0 1 2 0 3 \n", "3 1 0 2 1 3 \n", "4 1 0 1 0 3 \n", "5 1 0 2 1 4 \n", "\n", " KitchenAbvGr KitchenQual TotRmsAbvGrd Functional Fireplaces GarageType \\\n", "Id \n", "1 1 Gd 8 Typ 0 Attchd \n", "2 1 TA 6 Typ 1 Attchd \n", "3 1 Gd 6 Typ 1 Attchd \n", "4 1 Gd 7 Typ 1 Detchd \n", "5 1 Gd 9 Typ 1 Attchd \n", "\n", " GarageYrBlt GarageFinish GarageCars GarageArea GarageQual GarageCond \\\n", "Id \n", "1 2003.0 RFn 2 548 TA TA \n", "2 1976.0 RFn 2 460 TA TA \n", "3 2001.0 RFn 2 608 TA TA \n", "4 1998.0 Unf 3 642 TA TA \n", "5 2000.0 RFn 3 836 TA TA \n", "\n", " PavedDrive WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch ScreenPorch \\\n", "Id \n", "1 Y 0 61 0 0 0 \n", "2 Y 298 0 0 0 0 \n", "3 Y 0 42 0 0 0 \n", "4 Y 0 35 272 0 0 \n", "5 Y 192 84 0 0 0 \n", "\n", " PoolArea MiscVal MoSold YrSold SaleType SaleCondition SalePrice \n", "Id \n", "1 0 0 2 2008 WD Normal 208500 \n", "2 0 0 5 2007 WD Normal 181500 \n", "3 0 0 9 2008 WD Normal 223500 \n", "4 0 0 2 2006 WD Abnorml 140000 \n", "5 0 0 12 2008 WD Normal 250000 " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_df.head()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "processing train: 1309 rows x 74 columns\n", "\n", "The following integer column(s) are being treated as categorical variables:\n", "['MSSubClass', 'OverallQual', 'OverallCond', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath', 'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'TotRmsAbvGrd', 'Fireplaces', 'GarageCars', '3SsnPorch', 'PoolArea', 'MiscVal', 'MoSold', 'YrSold']\n", "To treat any of these column(s) as numerical, cast the column to float in DataFrame or CSV\n", " and re-run tabular_from* function.\n", "\n", "processing test: 151 rows x 74 columns\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/home/amaiya/projects/ghub/ktrain/ktrain/utils.py:556: UserWarning: Task is 
being treated as REGRESSION because either class_names argument was not supplied or is_regression=True. If this is incorrect, change accordingly.\n", " 'If this is incorrect, change accordingly.')\n" ] } ], "source": [ "trn, val, preproc = tabular.tabular_from_df(train_df, is_regression=True, \n", " label_columns='SalePrice', random_state=42)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## STEP 2: Create Model and Wrap in `Learner`" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "done.\n" ] } ], "source": [ "model = tabular.tabular_regression_model('mlp', trn)\n", "learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=128)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## STEP 3: Estimate LR\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "simulating training for different learning rates... 
this may take a few moments...\n", "Train for 10 steps\n", "Epoch 1/16\n", "10/10 [==============================] - 5s 526ms/step - loss: 39023478485.7307 - mae: 181231.9375\n", "Epoch 2/16\n", "10/10 [==============================] - 1s 97ms/step - loss: 39033418674.8315 - mae: 181204.8594\n", "Epoch 3/16\n", "10/10 [==============================] - 1s 99ms/step - loss: 38418801680.4742 - mae: 179555.7969\n", "Epoch 4/16\n", "10/10 [==============================] - 1s 100ms/step - loss: 38186333255.9661 - mae: 179885.9062\n", "Epoch 5/16\n", "10/10 [==============================] - 1s 96ms/step - loss: 39033367996.8027 - mae: 181204.7500\n", "Epoch 6/16\n", "10/10 [==============================] - 1s 98ms/step - loss: 39178636118.9229 - mae: 181407.5469\n", "Epoch 7/16\n", "10/10 [==============================] - 1s 97ms/step - loss: 39022837652.4843 - mae: 181230.2969\n", "Epoch 8/16\n", "10/10 [==============================] - 1s 96ms/step - loss: 38555598667.6511 - mae: 180373.7500\n", "Epoch 9/16\n", "10/10 [==============================] - 1s 96ms/step - loss: 38548896255.5665 - mae: 180083.1719\n", "Epoch 10/16\n", "10/10 [==============================] - 1s 97ms/step - loss: 35094031275.8950 - mae: 170316.9062\n", "Epoch 11/16\n", "10/10 [==============================] - 1s 101ms/step - loss: 10132749122.6554 - mae: 77930.0859\n", "Epoch 12/16\n", "10/10 [==============================] - 1s 96ms/step - loss: 7250272029.6969 - mae: 73190.8281\n", "Epoch 13/16\n", "10/10 [==============================] - 1s 98ms/step - loss: 7010621260.5182 - mae: 70700.7656\n", "Epoch 14/16\n", "10/10 [==============================] - 1s 96ms/step - loss: 25998046822.4000 - mae: 135040.7500\n", "Epoch 15/16\n", " 7/10 [====================>.........] - ETA: 0s - loss: 874942046208.0000 - mae: 330034.8438\n", "\n", "done.\n", "Visually inspect loss plot and select learning rate associated with falling loss\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "learner.lr_find(show_plot=True, max_epochs=16)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## STEP 4: Train" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "early_stopping automatically enabled at patience=5\n", "reduce_on_plateau automatically enabled at patience=2\n", "\n", "\n", "begin training using triangular learning rate policy with max lr of 0.1...\n", "Train for 11 steps, validate for 5 steps\n", "Epoch 1/1024\n", "11/11 [==============================] - 6s 541ms/step - loss: 33410952382.0443 - mae: 156290.1250 - val_loss: 15485848576.0000 - val_mae: 114282.9531\n", "Epoch 2/1024\n", "11/11 [==============================] - 1s 132ms/step - loss: 24494850035.0924 - mae: 145351.7188 - val_loss: 25421583155.2000 - val_mae: 153021.7188\n", "Epoch 3/1024\n", "11/11 [==============================] - 1s 136ms/step - loss: 18668215047.6272 - mae: 125661.6016 - val_loss: 12586017996.8000 - val_mae: 91371.8438\n", "Epoch 4/1024\n", "11/11 [==============================] - 1s 133ms/step - loss: 12545563277.9832 - mae: 84823.7500 - val_loss: 8312404070.4000 - val_mae: 86080.5156\n", "Epoch 5/1024\n", "11/11 [==============================] - 2s 140ms/step - loss: 5367907446.2460 - mae: 61690.7188 - val_loss: 3880212326.4000 - val_mae: 54640.9609\n", "Epoch 6/1024\n", "11/11 [==============================] - 2s 137ms/step - loss: 2749079195.2330 - mae: 35548.7422 - val_loss: 994916563.2000 - val_mae: 22115.1758\n", "Epoch 7/1024\n", "11/11 [==============================] - 2s 137ms/step - loss: 1747315035.4652 - mae: 28340.7891 - val_loss: 942788454.4000 - val_mae: 20258.2383\n", "Epoch 8/1024\n", "11/11 [==============================] - 1s 135ms/step - loss: 1317610653.1398 - mae: 24707.8145 - val_loss: 795084512.0000 - val_mae: 18270.5781\n", "Epoch 
9/1024\n", "11/11 [==============================] - 2s 137ms/step - loss: 1345168384.1711 - mae: 23964.7910 - val_loss: 751457507.2000 - val_mae: 17889.5137\n", "Epoch 10/1024\n", "11/11 [==============================] - 2s 138ms/step - loss: 1204894672.8556 - mae: 22083.0527 - val_loss: 729885836.8000 - val_mae: 17330.7344\n", "Epoch 11/1024\n", "11/11 [==============================] - 1s 130ms/step - loss: 1293521195.8075 - mae: 24240.8613 - val_loss: 845886457.6000 - val_mae: 19439.5332\n", "Epoch 12/1024\n", " 8/11 [====================>.........] - ETA: 0s - loss: 1224515284.0000 - mae: 22132.0742\n", "Epoch 00012: Reducing Max LR on Plateau: new max lr will be 0.05 (if not early_stopping).\n", "11/11 [==============================] - 1s 134ms/step - loss: 1196409911.1933 - mae: 22769.3926 - val_loss: 997325734.4000 - val_mae: 21638.4668\n", "Epoch 13/1024\n", "11/11 [==============================] - 1s 136ms/step - loss: 1081636823.2361 - mae: 22792.1152 - val_loss: 709030249.6000 - val_mae: 18628.7402\n", "Epoch 14/1024\n", "11/11 [==============================] - 1s 136ms/step - loss: 984812624.6539 - mae: 20191.5820 - val_loss: 662907520.0000 - val_mae: 16497.9941\n", "Epoch 15/1024\n", "11/11 [==============================] - 1s 135ms/step - loss: 984294369.7418 - mae: 19897.7480 - val_loss: 666114873.6000 - val_mae: 16434.1055\n", "Epoch 16/1024\n", " 9/11 [=======================>......] 
- ETA: 0s - loss: 895732711.1111 - mae: 19749.8887\n", "Epoch 00016: Reducing Max LR on Plateau: new max lr will be 0.025 (if not early_stopping).\n", "11/11 [==============================] - 1s 135ms/step - loss: 957869730.4446 - mae: 19990.8848 - val_loss: 708456806.4000 - val_mae: 18026.2402\n", "Epoch 17/1024\n", "11/11 [==============================] - 1s 133ms/step - loss: 860801337.9251 - mae: 19515.2520 - val_loss: 695209459.2000 - val_mae: 16676.5195\n", "Epoch 18/1024\n", "11/11 [==============================] - 1s 136ms/step - loss: 824453914.3285 - mae: 19024.5078 - val_loss: 661850604.8000 - val_mae: 16811.2227\n", "Epoch 19/1024\n", "11/11 [==============================] - 2s 138ms/step - loss: 801468495.2299 - mae: 18976.1191 - val_loss: 660384323.2000 - val_mae: 16322.7549\n", "Epoch 20/1024\n", "11/11 [==============================] - 1s 131ms/step - loss: 753795500.6753 - mae: 18709.6406 - val_loss: 663768908.8000 - val_mae: 16474.5703\n", "Epoch 21/1024\n", "10/11 [==========================>...] - ETA: 0s - loss: 783667638.4000 - mae: 18711.5293\n", "Epoch 00021: Reducing Max LR on Plateau: new max lr will be 0.0125 (if not early_stopping).\n", "11/11 [==============================] - 1s 133ms/step - loss: 786054059.7830 - mae: 18731.3438 - val_loss: 664781280.0000 - val_mae: 16376.9863\n", "Epoch 22/1024\n", "11/11 [==============================] - 1s 133ms/step - loss: 825439074.7869 - mae: 19253.4219 - val_loss: 668607993.6000 - val_mae: 16350.0859\n", "Epoch 23/1024\n", " 8/11 [====================>.........] - ETA: 0s - loss: 824802156.0000 - mae: 18940.5098\n", "Epoch 00023: Reducing Max LR on Plateau: new max lr will be 0.00625 (if not early_stopping).\n", "11/11 [==============================] - 1s 134ms/step - loss: 790809524.3453 - mae: 18444.7891 - val_loss: 669713436.8000 - val_mae: 16343.8193\n", "Epoch 24/1024\n", " 9/11 [=======================>......] 
- ETA: 0s - loss: 714549009.7778 - mae: 18092.3867Restoring model weights from the end of the best epoch.\n", "11/11 [==============================] - 1s 136ms/step - loss: 736000784.4584 - mae: 18215.8066 - val_loss: 666175027.2000 - val_mae: 16389.6992\n", "Epoch 00024: early stopping\n", "Weights from best epoch have been loaded into model.\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "learner.autofit(1e-1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## STEP 5: Evaluate Model" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('mae', 16322.754966887418)]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "learner.evaluate(test_data=val)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 2 }