{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Using Pandas to Get Familiar With Your Data\n", "\n", "The first thing you'll want to do is familiarize yourself with the data. You'll use the Pandas library for this. Pandas is the primary tool that modern data scientists use for exploring and manipulating data. Most people abbreviate pandas in their code as `pd`. We do this with the following import:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The most important part of the Pandas library is the DataFrame. A DataFrame holds the type of data you might think of as a table. This is similar to a sheet in Excel, or a table in a SQL database. The Pandas DataFrame has powerful methods for most things you'll want to do with this type of data. Let's start by looking at a basic overview of the data you'll be working with from Iowa.\n", "\n", "We load and explore the data with the following:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape \\\n", "0 1 60 RL 65.0 8450 Pave NaN Reg \n", "1 2 20 RL 80.0 9600 Pave NaN Reg \n", "2 3 60 RL 68.0 11250 Pave NaN IR1 \n", "3 4 70 RL 60.0 9550 Pave NaN IR1 \n", "4 5 60 RL 84.0 14260 Pave NaN IR1 \n", "5 6 50 RL 85.0 14115 Pave NaN IR1 \n", "6 7 20 RL 75.0 10084 Pave NaN Reg \n", "7 8 60 RL NaN 10382 Pave NaN IR1 \n", "8 9 50 RM 51.0 6120 Pave NaN Reg \n", "9 10 190 RL 50.0 7420 Pave NaN Reg \n", "10 11 20 RL 70.0 11200 Pave NaN Reg \n", "11 12 60 RL 85.0 11924 Pave NaN IR1 \n", "12 13 20 RL NaN 12968 Pave NaN IR2 \n", "13 14 20 RL 91.0 10652 Pave NaN IR1 \n", "14 15 20 RL NaN 10920 Pave NaN IR1 \n", "15 16 45 RM 51.0 6120 Pave NaN Reg \n", "16 17 20 RL NaN 11241 Pave NaN IR1 \n", "17 18 90 RL 72.0 10791 Pave NaN Reg \n", "18 19 20 RL 66.0 13695 Pave NaN Reg \n", "19 20 20 RL 70.0 7560 Pave NaN Reg \n", "20 21 60 RL 101.0 14215 Pave NaN IR1 \n", "21 22 45 RM 57.0 7449 Pave Grvl Reg \n", "22 23 20 RL 75.0 9742 Pave NaN Reg \n", "23 24 120 RM 44.0 4224 Pave NaN Reg \n", "24 25 20 RL NaN 8246 Pave NaN IR1 \n", "25 26 20 RL 110.0 14230 Pave NaN Reg \n", "26 27 20 RL 60.0 7200 Pave NaN Reg \n", "27 28 20 RL 98.0 11478 Pave NaN Reg \n", "28 29 20 RL 47.0 16321 Pave NaN IR1 \n", "29 30 30 RM 60.0 6324 Pave NaN IR1 \n", "... ... ... ... ... ... ... ... ... 
\n", "1430 1431 60 RL 60.0 21930 Pave NaN IR3 \n", "1431 1432 120 RL NaN 4928 Pave NaN IR1 \n", "1432 1433 30 RL 60.0 10800 Pave Grvl Reg \n", "1433 1434 60 RL 93.0 10261 Pave NaN IR1 \n", "1434 1435 20 RL 80.0 17400 Pave NaN Reg \n", "1435 1436 20 RL 80.0 8400 Pave NaN Reg \n", "1436 1437 20 RL 60.0 9000 Pave NaN Reg \n", "1437 1438 20 RL 96.0 12444 Pave NaN Reg \n", "1438 1439 20 RM 90.0 7407 Pave NaN Reg \n", "1439 1440 60 RL 80.0 11584 Pave NaN Reg \n", "1440 1441 70 RL 79.0 11526 Pave NaN IR1 \n", "1441 1442 120 RM NaN 4426 Pave NaN Reg \n", "1442 1443 60 FV 85.0 11003 Pave NaN Reg \n", "1443 1444 30 RL NaN 8854 Pave NaN Reg \n", "1444 1445 20 RL 63.0 8500 Pave NaN Reg \n", "1445 1446 85 RL 70.0 8400 Pave NaN Reg \n", "1446 1447 20 RL NaN 26142 Pave NaN IR1 \n", "1447 1448 60 RL 80.0 10000 Pave NaN Reg \n", "1448 1449 50 RL 70.0 11767 Pave NaN Reg \n", "1449 1450 180 RM 21.0 1533 Pave NaN Reg \n", "1450 1451 90 RL 60.0 9000 Pave NaN Reg \n", "1451 1452 20 RL 78.0 9262 Pave NaN Reg \n", "1452 1453 180 RM 35.0 3675 Pave NaN Reg \n", "1453 1454 20 RL 90.0 17217 Pave NaN Reg \n", "1454 1455 20 FV 62.0 7500 Pave Pave Reg \n", "1455 1456 60 RL 62.0 7917 Pave NaN Reg \n", "1456 1457 20 RL 85.0 13175 Pave NaN Reg \n", "1457 1458 70 RL 66.0 9042 Pave NaN Reg \n", "1458 1459 20 RL 68.0 9717 Pave NaN Reg \n", "1459 1460 20 RL 75.0 9937 Pave NaN Reg \n", "\n", " LandContour Utilities ... PoolArea PoolQC Fence MiscFeature \\\n", "0 Lvl AllPub ... 0 NaN NaN NaN \n", "1 Lvl AllPub ... 0 NaN NaN NaN \n", "2 Lvl AllPub ... 0 NaN NaN NaN \n", "3 Lvl AllPub ... 0 NaN NaN NaN \n", "4 Lvl AllPub ... 0 NaN NaN NaN \n", "5 Lvl AllPub ... 0 NaN MnPrv Shed \n", "6 Lvl AllPub ... 0 NaN NaN NaN \n", "7 Lvl AllPub ... 0 NaN NaN Shed \n", "8 Lvl AllPub ... 0 NaN NaN NaN \n", "9 Lvl AllPub ... 0 NaN NaN NaN \n", "10 Lvl AllPub ... 0 NaN NaN NaN \n", "11 Lvl AllPub ... 0 NaN NaN NaN \n", "12 Lvl AllPub ... 0 NaN NaN NaN \n", "13 Lvl AllPub ... 0 NaN NaN NaN \n", "14 Lvl AllPub ... 
0 NaN GdWo NaN \n", "15 Lvl AllPub ... 0 NaN GdPrv NaN \n", "16 Lvl AllPub ... 0 NaN NaN Shed \n", "17 Lvl AllPub ... 0 NaN NaN Shed \n", "18 Lvl AllPub ... 0 NaN NaN NaN \n", "19 Lvl AllPub ... 0 NaN MnPrv NaN \n", "20 Lvl AllPub ... 0 NaN NaN NaN \n", "21 Bnk AllPub ... 0 NaN GdPrv NaN \n", "22 Lvl AllPub ... 0 NaN NaN NaN \n", "23 Lvl AllPub ... 0 NaN NaN NaN \n", "24 Lvl AllPub ... 0 NaN MnPrv NaN \n", "25 Lvl AllPub ... 0 NaN NaN NaN \n", "26 Lvl AllPub ... 0 NaN NaN NaN \n", "27 Lvl AllPub ... 0 NaN NaN NaN \n", "28 Lvl AllPub ... 0 NaN NaN NaN \n", "29 Lvl AllPub ... 0 NaN NaN NaN \n", "... ... ... ... ... ... ... ... \n", "1430 Lvl AllPub ... 0 NaN NaN NaN \n", "1431 Lvl AllPub ... 0 NaN NaN NaN \n", "1432 Lvl AllPub ... 0 NaN NaN NaN \n", "1433 Lvl AllPub ... 0 NaN NaN NaN \n", "1434 Low AllPub ... 0 NaN NaN NaN \n", "1435 Lvl AllPub ... 0 NaN GdPrv NaN \n", "1436 Lvl AllPub ... 0 NaN GdWo NaN \n", "1437 Lvl AllPub ... 0 NaN NaN NaN \n", "1438 Lvl AllPub ... 0 NaN MnPrv NaN \n", "1439 Lvl AllPub ... 0 NaN NaN NaN \n", "1440 Bnk AllPub ... 0 NaN NaN NaN \n", "1441 Lvl AllPub ... 0 NaN NaN NaN \n", "1442 Lvl AllPub ... 0 NaN NaN NaN \n", "1443 Lvl AllPub ... 0 NaN NaN NaN \n", "1444 Lvl AllPub ... 0 NaN NaN NaN \n", "1445 Lvl AllPub ... 0 NaN NaN NaN \n", "1446 Lvl AllPub ... 0 NaN NaN NaN \n", "1447 Lvl AllPub ... 0 NaN NaN NaN \n", "1448 Lvl AllPub ... 0 NaN GdWo NaN \n", "1449 Lvl AllPub ... 0 NaN NaN NaN \n", "1450 Lvl AllPub ... 0 NaN NaN NaN \n", "1451 Lvl AllPub ... 0 NaN NaN NaN \n", "1452 Lvl AllPub ... 0 NaN NaN NaN \n", "1453 Lvl AllPub ... 0 NaN NaN NaN \n", "1454 Lvl AllPub ... 0 NaN NaN NaN \n", "1455 Lvl AllPub ... 0 NaN NaN NaN \n", "1456 Lvl AllPub ... 0 NaN MnPrv NaN \n", "1457 Lvl AllPub ... 0 NaN GdPrv Shed \n", "1458 Lvl AllPub ... 0 NaN NaN NaN \n", "1459 Lvl AllPub ... 
0 NaN NaN NaN \n", "\n", " MiscVal MoSold YrSold SaleType SaleCondition SalePrice \n", "0 0 2 2008 WD Normal 208500 \n", "1 0 5 2007 WD Normal 181500 \n", "2 0 9 2008 WD Normal 223500 \n", "3 0 2 2006 WD Abnorml 140000 \n", "4 0 12 2008 WD Normal 250000 \n", "5 700 10 2009 WD Normal 143000 \n", "6 0 8 2007 WD Normal 307000 \n", "7 350 11 2009 WD Normal 200000 \n", "8 0 4 2008 WD Abnorml 129900 \n", "9 0 1 2008 WD Normal 118000 \n", "10 0 2 2008 WD Normal 129500 \n", "11 0 7 2006 New Partial 345000 \n", "12 0 9 2008 WD Normal 144000 \n", "13 0 8 2007 New Partial 279500 \n", "14 0 5 2008 WD Normal 157000 \n", "15 0 7 2007 WD Normal 132000 \n", "16 700 3 2010 WD Normal 149000 \n", "17 500 10 2006 WD Normal 90000 \n", "18 0 6 2008 WD Normal 159000 \n", "19 0 5 2009 COD Abnorml 139000 \n", "20 0 11 2006 New Partial 325300 \n", "21 0 6 2007 WD Normal 139400 \n", "22 0 9 2008 WD Normal 230000 \n", "23 0 6 2007 WD Normal 129900 \n", "24 0 5 2010 WD Normal 154000 \n", "25 0 7 2009 WD Normal 256300 \n", "26 0 5 2010 WD Normal 134800 \n", "27 0 5 2010 WD Normal 306000 \n", "28 0 12 2006 WD Normal 207500 \n", "29 0 5 2008 WD Normal 68500 \n", "... ... ... ... ... ... ... 
\n", "1430 0 7 2006 WD Normal 192140 \n", "1431 0 10 2009 WD Normal 143750 \n", "1432 0 8 2007 WD Normal 64500 \n", "1433 0 5 2008 WD Normal 186500 \n", "1434 0 5 2006 WD Normal 160000 \n", "1435 0 7 2008 COD Abnorml 174000 \n", "1436 0 5 2007 WD Normal 120500 \n", "1437 0 11 2008 New Partial 394617 \n", "1438 0 4 2010 WD Normal 149700 \n", "1439 0 11 2007 WD Normal 197000 \n", "1440 0 9 2008 WD Normal 191000 \n", "1441 0 5 2008 WD Normal 149300 \n", "1442 0 4 2009 WD Normal 310000 \n", "1443 0 5 2009 WD Normal 121000 \n", "1444 0 11 2007 WD Normal 179600 \n", "1445 0 5 2007 WD Normal 129000 \n", "1446 0 4 2010 WD Normal 157900 \n", "1447 0 12 2007 WD Normal 240000 \n", "1448 0 5 2007 WD Normal 112000 \n", "1449 0 8 2006 WD Abnorml 92000 \n", "1450 0 9 2009 WD Normal 136000 \n", "1451 0 5 2009 New Partial 287090 \n", "1452 0 5 2006 WD Normal 145000 \n", "1453 0 7 2006 WD Abnorml 84500 \n", "1454 0 10 2009 WD Normal 185000 \n", "1455 0 8 2007 WD Normal 175000 \n", "1456 0 2 2010 WD Normal 210000 \n", "1457 2500 5 2010 WD Normal 266500 \n", "1458 0 4 2010 WD Normal 142125 \n", "1459 0 6 2008 WD Normal 147500 \n", "\n", "[1460 rows x 81 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# save filepath to variable for easier access\n", "file_path = 'train.csv'\n", "# read the data and store data in DataFrame titled df\n", "df = pd.read_csv(file_path) \n", "# show the data\n", "df" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Id MSSubClass LotFrontage LotArea OverallQual \\\n", "count 1460.000000 1460.000000 1201.000000 1460.000000 1460.000000 \n", "mean 730.500000 56.897260 70.049958 10516.828082 6.099315 \n", "std 421.610009 42.300571 24.284752 9981.264932 1.382997 \n", "min 1.000000 20.000000 21.000000 1300.000000 1.000000 \n", "25% 365.750000 20.000000 59.000000 7553.500000 5.000000 \n", "50% 730.500000 50.000000 69.000000 
9478.500000 6.000000 \n", "75% 1095.250000 70.000000 80.000000 11601.500000 7.000000 \n", "max 1460.000000 190.000000 313.000000 215245.000000 10.000000 \n", "\n", " OverallCond YearBuilt YearRemodAdd MasVnrArea BsmtFinSF1 \\\n", "count 1460.000000 1460.000000 1460.000000 1452.000000 1460.000000 \n", "mean 5.575342 1971.267808 1984.865753 103.685262 443.639726 \n", "std 1.112799 30.202904 20.645407 181.066207 456.098091 \n", "min 1.000000 1872.000000 1950.000000 0.000000 0.000000 \n", "25% 5.000000 1954.000000 1967.000000 0.000000 0.000000 \n", "50% 5.000000 1973.000000 1994.000000 0.000000 383.500000 \n", "75% 6.000000 2000.000000 2004.000000 166.000000 712.250000 \n", "max 9.000000 2010.000000 2010.000000 1600.000000 5644.000000 \n", "\n", " ... WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch \\\n", "count ... 1460.000000 1460.000000 1460.000000 1460.000000 \n", "mean ... 94.244521 46.660274 21.954110 3.409589 \n", "std ... 125.338794 66.256028 61.119149 29.317331 \n", "min ... 0.000000 0.000000 0.000000 0.000000 \n", "25% ... 0.000000 0.000000 0.000000 0.000000 \n", "50% ... 0.000000 25.000000 0.000000 0.000000 \n", "75% ... 168.000000 68.000000 0.000000 0.000000 \n", "max ... 
857.000000 547.000000 552.000000 508.000000 \n", "\n", " ScreenPorch PoolArea MiscVal MoSold YrSold \\\n", "count 1460.000000 1460.000000 1460.000000 1460.000000 1460.000000 \n", "mean 15.060959 2.758904 43.489041 6.321918 2007.815753 \n", "std 55.757415 40.177307 496.123024 2.703626 1.328095 \n", "min 0.000000 0.000000 0.000000 1.000000 2006.000000 \n", "25% 0.000000 0.000000 0.000000 5.000000 2007.000000 \n", "50% 0.000000 0.000000 0.000000 6.000000 2008.000000 \n", "75% 0.000000 0.000000 0.000000 8.000000 2009.000000 \n", "max 480.000000 738.000000 15500.000000 12.000000 2010.000000 \n", "\n", " SalePrice \n", "count 1460.000000 \n", "mean 180921.195890 \n", "std 79442.502883 \n", "min 34900.000000 \n", "25% 129975.000000 \n", "50% 163000.000000 \n", "75% 214000.000000 \n", "max 755000.000000 \n", "\n", "[8 rows x 38 columns]\n" ] } ], "source": [ "# print a summary of the training data\n", "print(df.describe())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Interpreting Data Description\n", "\n", "The results show 8 numbers for each numeric column in your original dataset. The first number, the count, shows how many rows have non-missing values.\n", "\n", "The second value is the mean, which is the average. Under that, std is the standard deviation, which measures how numerically spread out the values are.\n", "\n", "To interpret the min, 25%, 50%, 75% and max values, imagine sorting each column from lowest to highest value. The first (smallest) value is the min. If you go a quarter of the way through the list, you'll find a number that is bigger than 25% of the values and smaller than 75% of the values. That is the 25% value (pronounced \"25th percentile\"). The 50th and 75th percentiles are defined analogously, and the max is the largest number."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Using Spark and Optimus to Get Familiar With Your Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Optimus (by Iron) is the missing framework for cleaning, pre-processing and exploring data in a distributed fashion. It uses all the power of Apache Spark (optimized via Catalyst) and Python to do so. It implements several handy tools for data wrangling and munging that will make your life much easier. Its first obvious advantage over other public data cleaning libraries and frameworks is that it works on your laptop or on your big cluster; its second is that it is easy to install, use and understand." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install Optimus with:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: optimuspyspark in /Users/faviovazquez/anaconda/lib/python3.6/site-packages\n", "Requirement already satisfied: pytest in /Users/faviovazquez/anaconda/lib/python3.6/site-packages (from optimuspyspark)\n", "Requirement already satisfied: pixiedust-optimus in /Users/faviovazquez/anaconda/lib/python3.6/site-packages (from optimuspyspark)\n", "Requirement already satisfied: spark-df-profiling-optimus in /Users/faviovazquez/anaconda/lib/python3.6/site-packages (from optimuspyspark)\n", "Requirement already satisfied: pytest-spark in /Users/faviovazquez/anaconda/lib/python3.6/site-packages (from optimuspyspark)\n", "Requirement already satisfied: seaborn in /Users/faviovazquez/anaconda/lib/python3.6/site-packages (from optimuspyspark)\n", "Requirement already satisfied: findspark in /Users/faviovazquez/anaconda/lib/python3.6/site-packages (from optimuspyspark)\n", "Requirement already satisfied: pyspark in /Users/faviovazquez/anaconda/lib/python3.6/site-packages (from optimuspyspark)\n", "Requirement already satisfied: py>=1.4.29 in 
/Users/faviovazquez/anaconda/lib/python3.6/site-packages (from pytest->optimuspyspark)\n", "Requirement already satisfied: setuptools in /Users/faviovazquez/anaconda/lib/python3.6/site-packages (from pytest->optimuspyspark)\n", "Requirement already satisfied: geojson in /Users/faviovazquez/.local/lib/python3.6/site-packages (from pixiedust-optimus->optimuspyspark)\n", "Requirement already satisfied: lxml in /Users/faviovazquez/.local/lib/python3.6/site-packages (from pixiedust-optimus->optimuspyspark)\n", "Requirement already satisfied: mpld3 in /Users/faviovazquez/anaconda/lib/python3.6/site-packages (from pixiedust-optimus->optimuspyspark)\n", "Requirement already satisfied: jinja2>=2.8 in /Users/faviovazquez/anaconda/lib/python3.6/site-packages (from spark-df-profiling-optimus->optimuspyspark)\n", "Requirement already satisfied: pandas>=0.17.0 in /Users/faviovazquez/anaconda/lib/python3.6/site-packages (from spark-df-profiling-optimus->optimuspyspark)\n", "Requirement already satisfied: matplotlib>=2.0 in /Users/faviovazquez/anaconda/lib/python3.6/site-packages (from spark-df-profiling-optimus->optimuspyspark)\n", "Requirement already satisfied: six>=1.9.0 in /Users/faviovazquez/anaconda/lib/python3.6/site-packages (from spark-df-profiling-optimus->optimuspyspark)\n", "Requirement already satisfied: numpy in /Users/faviovazquez/anaconda/lib/python3.6/site-packages (from seaborn->optimuspyspark)\n", "Requirement already satisfied: scipy in /Users/faviovazquez/anaconda/lib/python3.6/site-packages (from seaborn->optimuspyspark)\n", "Requirement already satisfied: py4j==0.10.4 in /Users/faviovazquez/anaconda/lib/python3.6/site-packages (from pyspark->optimuspyspark)\n", "Requirement already satisfied: MarkupSafe>=0.23 in /Users/faviovazquez/anaconda/lib/python3.6/site-packages (from jinja2>=2.8->spark-df-profiling-optimus->optimuspyspark)\n", "Requirement already satisfied: python-dateutil>=2 in /Users/faviovazquez/anaconda/lib/python3.6/site-packages (from 
pandas>=0.17.0->spark-df-profiling-optimus->optimuspyspark)\n", "Requirement already satisfied: pytz>=2011k in /Users/faviovazquez/anaconda/lib/python3.6/site-packages (from pandas>=0.17.0->spark-df-profiling-optimus->optimuspyspark)\n", "Requirement already satisfied: cycler>=0.10 in /Users/faviovazquez/anaconda/lib/python3.6/site-packages (from matplotlib>=2.0->spark-df-profiling-optimus->optimuspyspark)\n", "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /Users/faviovazquez/anaconda/lib/python3.6/site-packages (from matplotlib>=2.0->spark-df-profiling-optimus->optimuspyspark)\n" ] } ], "source": [ "!pip install optimuspyspark" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using Optimus is really easy, and you have Spark underneath :)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Starting or getting SparkSession and SparkContext.
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Setting checkpoint folder (local). If you are in a cluster change it with set_check_point_folder(path,'hadoop').
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Deleting previous folder if exists...\n", "Creation of checkpoint directory...\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", "
\n", " \n", " \n", " \n", "

Optimus successfully imported. Have fun :).

\n", "
\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import optimus as op" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "tools = op.Utilities()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df = tools.read_csv(\"train.csv\")" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "| Id|MSSubClass|MSZoning|LotFrontage|LotArea|Street|Alley|LotShape|LandContour|Utilities|LotConfig|LandSlope|Neighborhood|Condition1|Condition2|BldgType|HouseStyle|OverallQual|OverallCond|YearBuilt|YearRemodAdd|RoofStyle|RoofMatl|Exterior1st|Exterior2nd|MasVnrType|MasVnrArea|ExterQual|ExterCond|Foundation|BsmtQual|BsmtCond|BsmtExposure|BsmtFinType1|BsmtFinSF1|BsmtFinType2|BsmtFinSF2|BsmtUnfSF|TotalBsmtSF|Heating|HeatingQC|CentralAir|Electrical|1stFlrSF|2ndFlrSF|LowQualFinSF|GrLivArea|BsmtFullBath|BsmtHalfBath|FullBath|HalfBath|BedroomAbvGr|KitchenAbvGr|KitchenQual|TotRmsAbvGrd|Functional|Fireplaces|FireplaceQu|GarageType|GarageYrBlt|GarageFinish|GarageCars|GarageArea|GarageQual|GarageCond|PavedDrive|WoodDeckSF|OpenPorchSF|EnclosedPorch|3SsnPorch|ScreenPorch|PoolArea|PoolQC|Fence|MiscFeature|MiscVal|MoSold|YrSold|SaleType|SaleCondition|SalePrice|\n", "| 1| 60| RL| 65| 8450| Pave| NA| Reg| Lvl| AllPub| Inside| Gtl| CollgCr| Norm| Norm| 1Fam| 2Story| 7| 5| 2003| 2003| Gable| CompShg| VinylSd| VinylSd| BrkFace| 196| Gd| TA| PConc| Gd| TA| No| GLQ| 706| Unf| 0| 150| 856| GasA| Ex| Y| SBrkr| 856| 854| 0| 1710| 1| 0| 2| 1| 3| 1| Gd| 8| Typ| 0| NA| Attchd| 2003| RFn| 2| 548| TA| TA| Y| 0| 61| 0| 0| 0| 0| NA| NA| NA| 0| 2| 2008| WD| Normal| 208500|\n", "| 2| 20| RL| 80| 9600| Pave| NA| Reg| Lvl| AllPub| FR2| Gtl| Veenker| Feedr| Norm| 1Fam| 1Story| 6| 8| 1976| 1976| Gable| CompShg| MetalSd| MetalSd| None| 0| TA| TA| CBlock| Gd| TA| Gd| ALQ| 
978| Unf| 0| 284| 1262| GasA| Ex| Y| SBrkr| 1262| 0| 0| 1262| 0| 1| 2| 0| 3| 1| TA| 6| Typ| 1| TA| Attchd| 1976| RFn| 2| 460| TA| TA| Y| 298| 0| 0| 0| 0| 0| NA| NA| NA| 0| 5| 2007| WD| Normal| 181500|\n", "| 3| 60| RL| 68| 11250| Pave| NA| IR1| Lvl| AllPub| Inside| Gtl| CollgCr| Norm| Norm| 1Fam| 2Story| 7| 5| 2001| 2002| Gable| CompShg| VinylSd| VinylSd| BrkFace| 162| Gd| TA| PConc| Gd| TA| Mn| GLQ| 486| Unf| 0| 434| 920| GasA| Ex| Y| SBrkr| 920| 866| 0| 1786| 1| 0| 2| 1| 3| 1| Gd| 6| Typ| 1| TA| Attchd| 2001| RFn| 2| 608| TA| TA| Y| 0| 42| 0| 0| 0| 0| NA| NA| NA| 0| 9| 2008| WD| Normal| 223500|\n", "| 4| 70| RL| 60| 9550| Pave| NA| IR1| Lvl| AllPub| Corner| Gtl| Crawfor| Norm| Norm| 1Fam| 2Story| 7| 5| 1915| 1970| Gable| CompShg| Wd Sdng| Wd Shng| None| 0| TA| TA| BrkTil| TA| Gd| No| ALQ| 216| Unf| 0| 540| 756| GasA| Gd| Y| SBrkr| 961| 756| 0| 1717| 1| 0| 1| 0| 3| 1| Gd| 7| Typ| 1| Gd| Detchd| 1998| Unf| 3| 642| TA| TA| Y| 0| 35| 272| 0| 0| 0| NA| NA| NA| 0| 2| 2006| WD| Abnorml| 140000|\n", "| 5| 60| RL| 84| 14260| Pave| NA| IR1| Lvl| AllPub| FR2| Gtl| NoRidge| Norm| Norm| 1Fam| 2Story| 8| 5| 2000| 2000| Gable| CompShg| VinylSd| VinylSd| BrkFace| 350| Gd| TA| PConc| Gd| TA| Av| GLQ| 655| Unf| 0| 490| 1145| GasA| Ex| Y| SBrkr| 1145| 1053| 0| 2198| 1| 0| 2| 1| 4| 1| Gd| 9| Typ| 1| TA| Attchd| 2000| RFn| 3| 836| TA| TA| Y| 192| 84| 0| 0| 0| 0| NA| NA| NA| 0| 12| 2008| WD| Normal| 250000|\n", "| 6| 50| RL| 85| 14115| Pave| NA| IR1| Lvl| AllPub| Inside| Gtl| Mitchel| Norm| Norm| 1Fam| 1.5Fin| 5| 5| 1993| 1995| Gable| CompShg| VinylSd| VinylSd| None| 0| TA| TA| Wood| Gd| TA| No| GLQ| 732| Unf| 0| 64| 796| GasA| Ex| Y| SBrkr| 796| 566| 0| 1362| 1| 0| 1| 1| 1| 1| TA| 5| Typ| 0| NA| Attchd| 1993| Unf| 2| 480| TA| TA| Y| 40| 30| 0| 320| 0| 0| NA|MnPrv| Shed| 700| 10| 2009| WD| Normal| 143000|\n", "| 7| 20| RL| 75| 10084| Pave| NA| Reg| Lvl| AllPub| Inside| Gtl| Somerst| Norm| Norm| 1Fam| 1Story| 8| 5| 2004| 2005| Gable| CompShg| VinylSd| VinylSd| Stone| 186| Gd| TA| 
PConc| Ex| TA| Av| GLQ| 1369| Unf| 0| 317| 1686| GasA| Ex| Y| SBrkr| 1694| 0| 0| 1694| 1| 0| 2| 0| 3| 1| Gd| 7| Typ| 1| Gd| Attchd| 2004| RFn| 2| 636| TA| TA| Y| 255| 57| 0| 0| 0| 0| NA| NA| NA| 0| 8| 2007| WD| Normal| 307000|\n", "| 8| 60| RL| NA| 10382| Pave| NA| IR1| Lvl| AllPub| Corner| Gtl| NWAmes| PosN| Norm| 1Fam| 2Story| 7| 6| 1973| 1973| Gable| CompShg| HdBoard| HdBoard| Stone| 240| TA| TA| CBlock| Gd| TA| Mn| ALQ| 859| BLQ| 32| 216| 1107| GasA| Ex| Y| SBrkr| 1107| 983| 0| 2090| 1| 0| 2| 1| 3| 1| TA| 7| Typ| 2| TA| Attchd| 1973| RFn| 2| 484| TA| TA| Y| 235| 204| 228| 0| 0| 0| NA| NA| Shed| 350| 11| 2009| WD| Normal| 200000|\n", "| 9| 50| RM| 51| 6120| Pave| NA| Reg| Lvl| AllPub| Inside| Gtl| OldTown| Artery| Norm| 1Fam| 1.5Fin| 7| 5| 1931| 1950| Gable| CompShg| BrkFace| Wd Shng| None| 0| TA| TA| BrkTil| TA| TA| No| Unf| 0| Unf| 0| 952| 952| GasA| Gd| Y| FuseF| 1022| 752| 0| 1774| 0| 0| 2| 0| 2| 2| TA| 8| Min1| 2| TA| Detchd| 1931| Unf| 2| 468| Fa| TA| Y| 90| 0| 205| 0| 0| 0| NA| NA| NA| 0| 4| 2008| WD| Abnorml| 129900|\n", "| 10| 190| RL| 50| 7420| Pave| NA| Reg| Lvl| AllPub| Corner| Gtl| BrkSide| Artery| Artery| 2fmCon| 1.5Unf| 5| 6| 1939| 1950| Gable| CompShg| MetalSd| MetalSd| None| 0| TA| TA| BrkTil| TA| TA| No| GLQ| 851| Unf| 0| 140| 991| GasA| Ex| Y| SBrkr| 1077| 0| 0| 1077| 1| 0| 1| 0| 2| 2| TA| 5| Typ| 2| TA| Attchd| 1939| RFn| 1| 205| Gd| TA| Y| 0| 4| 0| 0| 0| 0| NA| NA| NA| 0| 1| 2008| WD| Normal| 118000|\n", "| 11| 20| RL| 70| 11200| Pave| NA| Reg| Lvl| AllPub| Inside| Gtl| Sawyer| Norm| Norm| 1Fam| 1Story| 5| 5| 1965| 1965| Hip| CompShg| HdBoard| HdBoard| None| 0| TA| TA| CBlock| TA| TA| No| Rec| 906| Unf| 0| 134| 1040| GasA| Ex| Y| SBrkr| 1040| 0| 0| 1040| 1| 0| 1| 0| 3| 1| TA| 5| Typ| 0| NA| Detchd| 1965| Unf| 1| 384| TA| TA| Y| 0| 0| 0| 0| 0| 0| NA| NA| NA| 0| 2| 2008| WD| Normal| 129500|\n", "| 12| 60| RL| 85| 11924| Pave| NA| IR1| Lvl| AllPub| Inside| Gtl| NridgHt| Norm| Norm| 1Fam| 2Story| 9| 5| 2005| 2006| Hip| CompShg| WdShing| Wd Shng| 
Stone| 286| Ex| TA| PConc| Ex| TA| No| GLQ| 998| Unf| 0| 177| 1175| GasA| Ex| Y| SBrkr| 1182| 1142| 0| 2324| 1| 0| 3| 0| 4| 1| Ex| 11| Typ| 2| Gd| BuiltIn| 2005| Fin| 3| 736| TA| TA| Y| 147| 21| 0| 0| 0| 0| NA| NA| NA| 0| 7| 2006| New| Partial| 345000|\n", "| 13| 20| RL| NA| 12968| Pave| NA| IR2| Lvl| AllPub| Inside| Gtl| Sawyer| Norm| Norm| 1Fam| 1Story| 5| 6| 1962| 1962| Hip| CompShg| HdBoard| Plywood| None| 0| TA| TA| CBlock| TA| TA| No| ALQ| 737| Unf| 0| 175| 912| GasA| TA| Y| SBrkr| 912| 0| 0| 912| 1| 0| 1| 0| 2| 1| TA| 4| Typ| 0| NA| Detchd| 1962| Unf| 1| 352| TA| TA| Y| 140| 0| 0| 0| 176| 0| NA| NA| NA| 0| 9| 2008| WD| Normal| 144000|\n", "| 14| 20| RL| 91| 10652| Pave| NA| IR1| Lvl| AllPub| Inside| Gtl| CollgCr| Norm| Norm| 1Fam| 1Story| 7| 5| 2006| 2007| Gable| CompShg| VinylSd| VinylSd| Stone| 306| Gd| TA| PConc| Gd| TA| Av| Unf| 0| Unf| 0| 1494| 1494| GasA| Ex| Y| SBrkr| 1494| 0| 0| 1494| 0| 0| 2| 0| 3| 1| Gd| 7| Typ| 1| Gd| Attchd| 2006| RFn| 3| 840| TA| TA| Y| 160| 33| 0| 0| 0| 0| NA| NA| NA| 0| 8| 2007| New| Partial| 279500|\n", "| 15| 20| RL| NA| 10920| Pave| NA| IR1| Lvl| AllPub| Corner| Gtl| NAmes| Norm| Norm| 1Fam| 1Story| 6| 5| 1960| 1960| Hip| CompShg| MetalSd| MetalSd| BrkFace| 212| TA| TA| CBlock| TA| TA| No| BLQ| 733| Unf| 0| 520| 1253| GasA| TA| Y| SBrkr| 1253| 0| 0| 1253| 1| 0| 1| 1| 2| 1| TA| 5| Typ| 1| Fa| Attchd| 1960| RFn| 1| 352| TA| TA| Y| 0| 213| 176| 0| 0| 0| NA| GdWo| NA| 0| 5| 2008| WD| Normal| 157000|\n", "| 16| 45| RM| 51| 6120| Pave| NA| Reg| Lvl| AllPub| Corner| Gtl| BrkSide| Norm| Norm| 1Fam| 1.5Unf| 7| 8| 1929| 2001| Gable| CompShg| Wd Sdng| Wd Sdng| None| 0| TA| TA| BrkTil| TA| TA| No| Unf| 0| Unf| 0| 832| 832| GasA| Ex| Y| FuseA| 854| 0| 0| 854| 0| 0| 1| 0| 2| 1| TA| 5| Typ| 0| NA| Detchd| 1991| Unf| 2| 576| TA| TA| Y| 48| 112| 0| 0| 0| 0| NA|GdPrv| NA| 0| 7| 2007| WD| Normal| 132000|\n", "| 17| 20| RL| NA| 11241| Pave| NA| IR1| Lvl| AllPub| CulDSac| Gtl| NAmes| Norm| Norm| 1Fam| 1Story| 6| 7| 1970| 1970| Gable| CompShg| 
Wd Sdng| Wd Sdng| BrkFace| 180| TA| TA| CBlock| TA| TA| No| ALQ| 578| Unf| 0| 426| 1004| GasA| Ex| Y| SBrkr| 1004| 0| 0| 1004| 1| 0| 1| 0| 2| 1| TA| 5| Typ| 1| TA| Attchd| 1970| Fin| 2| 480| TA| TA| Y| 0| 0| 0| 0| 0| 0| NA| NA| Shed| 700| 3| 2010| WD| Normal| 149000|\n", "| 18| 90| RL| 72| 10791| Pave| NA| Reg| Lvl| AllPub| Inside| Gtl| Sawyer| Norm| Norm| Duplex| 1Story| 4| 5| 1967| 1967| Gable| CompShg| MetalSd| MetalSd| None| 0| TA| TA| Slab| NA| NA| NA| NA| 0| NA| 0| 0| 0| GasA| TA| Y| SBrkr| 1296| 0| 0| 1296| 0| 0| 2| 0| 2| 2| TA| 6| Typ| 0| NA| CarPort| 1967| Unf| 2| 516| TA| TA| Y| 0| 0| 0| 0| 0| 0| NA| NA| Shed| 500| 10| 2006| WD| Normal| 90000|\n", "| 19| 20| RL| 66| 13695| Pave| NA| Reg| Lvl| AllPub| Inside| Gtl| SawyerW| RRAe| Norm| 1Fam| 1Story| 5| 5| 2004| 2004| Gable| CompShg| VinylSd| VinylSd| None| 0| TA| TA| PConc| TA| TA| No| GLQ| 646| Unf| 0| 468| 1114| GasA| Ex| Y| SBrkr| 1114| 0| 0| 1114| 1| 0| 1| 1| 3| 1| Gd| 6| Typ| 0| NA| Detchd| 2004| Unf| 2| 576| TA| TA| Y| 0| 102| 0| 0| 0| 0| NA| NA| NA| 0| 6| 2008| WD| Normal| 159000|\n", "| 20| 20| RL| 70| 7560| Pave| NA| Reg| Lvl| AllPub| Inside| Gtl| NAmes| Norm| Norm| 1Fam| 1Story| 5| 6| 1958| 1965| Hip| CompShg| BrkFace| Plywood| None| 0| TA| TA| CBlock| TA| TA| No| LwQ| 504| Unf| 0| 525| 1029| GasA| TA| Y| SBrkr| 1339| 0| 0| 1339| 0| 0| 1| 0| 3| 1| TA| 6| Min1| 0| NA| Attchd| 1958| Unf| 1| 294| TA| TA| Y| 0| 0| 0| 0| 0| 0| NA|MnPrv| NA| 0| 5| 2009| COD| Abnorml| 139000|\n", "only showing top 20 rows\n", "\n" ] } ], "source": [ "df.show()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "profiler = op.DataFrameProfiler(df)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "
\n", "
\n", "

Overview

\n", "
\n", "
\n", "
\n", "

Dataset info

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Number of variables: 81
Number of observations: 1460
Total Missing (%): 0.0%
Total size in memory: 0.0 B
Average record size in memory: 0.0 B
\n", "
\n", "
\n", "

Variables types

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Numeric: 35
Categorical: 46
Date: 0
Text (Unique): 0
Rejected: 0
\n", "
\n", "
\n", "

Warnings

\n", "
  • 2ndFlrSF has 829 / 56.8% zeros
  • 3SsnPorch has 1436 / 98.4% zeros
  • BsmtFinSF1 has 467 / 32.0% zeros
  • BsmtFinSF2 has 1293 / 88.6% zeros
  • BsmtFullBath has 856 / 58.6% zeros
  • BsmtHalfBath has 1378 / 94.4% zeros
  • BsmtUnfSF has 118 / 8.1% zeros
  • EnclosedPorch has 1252 / 85.8% zeros
  • Fireplaces has 690 / 47.3% zeros
  • GarageArea has 81 / 5.5% zeros
  • GarageCars has 81 / 5.5% zeros
  • GarageYrBlt has a high cardinality: 98 distinct values Warning
  • HalfBath has 913 / 62.5% zeros
  • LotFrontage has a high cardinality: 111 distinct values Warning
  • LowQualFinSF has 1434 / 98.2% zeros
  • MasVnrArea has a high cardinality: 328 distinct values Warning
  • MiscVal has 1408 / 96.4% zeros
  • MiscVal is highly skewed (γ1 = 24.452)
  • OpenPorchSF has 656 / 44.9% zeros
  • PoolArea has 1453 / 99.5% zeros
  • ScreenPorch has 1344 / 92.1% zeros
  • TotalBsmtSF has 37 / 2.5% zeros
  • WoodDeckSF has 761 / 52.1% zeros
\n", "
\n", "
\n", "
\n", "

Variables

\n", "
\n", "
\n", "
\n", "

1stFlrSF
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 753
Unique (%): 51.6%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean: 1162.6
Minimum: 334
Maximum: 4692
Zeros (%): 0.0%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum: 334
5-th percentile: 672.95
Q1: 882
Median: 1087
Q3: 1391.2
95-th percentile: 1831.2
Maximum: 4692
Range: 4358
Interquartile range: 509.25
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation: 386.59
Coef of variation: 0.33251
Kurtosis: 5.7221
Mean: 1162.6
MAD: 300.58
Skewness: 1.3753
Sum: 1697400
Variance: 149450
Memory size: 0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

2ndFlrSF
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 417
Unique (%): 28.6%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean: 346.99
Minimum: 0
Maximum: 2065
Zeros (%): 56.8%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 728
95-th percentile: 1141
Maximum: 2065
Range: 2065
Interquartile range: 728
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation: 436.53
Coef of variation: 1.258
Kurtosis: -0.55568
Mean: 346.99
MAD: 396.48
Skewness: 0.81219
Sum: 506610
Variance: 190560
Memory size: 0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

3SsnPorch
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 20
Unique (%): 1.4%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean: 3.4096
Minimum: 0
Maximum: 508
Zeros (%): 98.4%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 508
Range: 508
Interquartile range: 0
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation: 29.317
Coef of variation: 8.5985
Kurtosis: 123.24
Mean: 3.4096
MAD: 6.7071
Skewness: 10.294
Sum: 4978
Variance: 859.51
Memory size: 0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

Alley
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 3
Unique (%): 0.2%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
NA\n", "
\n", " 1369\n", "
\n", " \n", "
Grvl\n", "
\n", "  \n", "
\n", " 50\n", "
Pave\n", "
\n", "  \n", "
\n", " 41\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
NA | 1369 | 93.8%\n", "
 
\n", "
Grvl | 50 | 3.4%\n", "
 
\n", "
Pave | 41 | 2.8%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

BedroomAbvGr
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 8
Unique (%): 0.5%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean: 2.8664
Minimum: 0
Maximum: 8
Zeros (%): 0.4%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum: 0
5-th percentile: 2
Q1: 2
Median: 3
Q3: 3
95-th percentile: 4
Maximum: 8
Range: 8
Interquartile range: 1
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation: 0.81578
Coef of variation: 0.2846
Kurtosis: 2.2191
Mean: 2.8664
MAD: 0.57631
Skewness: 0.21157
Sum: 4185
Variance: 0.66549
Memory size: 0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

BldgType
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 5
Unique (%): 0.3%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
1Fam\n", "
\n", " 1220\n", "
\n", " \n", "
TwnhsE\n", "
\n", "  \n", "
\n", " 114\n", "
Duplex\n", "
\n", "  \n", "
\n", " 52\n", "
Other values (2)\n", "
\n", "  \n", "
\n", " 74\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
1Fam | 1220 | 83.6%\n", "
 
\n", "
TwnhsE | 114 | 7.8%\n", "
 
\n", "
Duplex | 52 | 3.6%\n", "
 
\n", "
Twnhs | 43 | 2.9%\n", "
 
\n", "
2fmCon | 31 | 2.1%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

BsmtCond
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 5
Unique (%): 0.3%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
TA\n", "
\n", " 1311\n", "
\n", " \n", "
Gd\n", "
\n", "  \n", "
\n", " 65\n", "
Fa\n", "
\n", "  \n", "
\n", " 45\n", "
Other values (2)\n", "
\n", "  \n", "
\n", " 39\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
TA | 1311 | 89.8%\n", "
 
\n", "
Gd | 65 | 4.5%\n", "
 
\n", "
Fa | 45 | 3.1%\n", "
 
\n", "
NA | 37 | 2.5%\n", "
 
\n", "
Po | 2 | 0.1%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

BsmtExposure
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 5
Unique (%): 0.3%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
No\n", "
\n", " 953\n", "
\n", " \n", "
Av\n", "
\n", " 221\n", "
\n", " \n", "
Gd\n", "
\n", "  \n", "
\n", " 134\n", "
Other values (2)\n", "
\n", "  \n", "
\n", " 152\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
No | 953 | 65.3%\n", "
 
\n", "
Av | 221 | 15.1%\n", "
 
\n", "
Gd | 134 | 9.2%\n", "
 
\n", "
Mn | 114 | 7.8%\n", "
 
\n", "
NA | 38 | 2.6%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

BsmtFinSF1
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 637
Unique (%): 43.6%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean: 443.64
Minimum: 0
Maximum: 5644
Zeros (%): 32.0%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum: 0
5-th percentile: 0
Q1: 0
Median: 383.5
Q3: 712.25
95-th percentile: 1274
Maximum: 5644
Range: 5644
Interquartile range: 712.25
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation: 456.1
Coef of variation: 1.0281
Kurtosis: 11.076
Mean: 443.64
MAD: 367.37
Skewness: 1.6838
Sum: 647710
Variance: 208030
Memory size: 0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

BsmtFinSF2
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 144
Unique (%): 9.9%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean: 46.549
Minimum: 0
Maximum: 1474
Zeros (%): 88.6%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 396.2
Maximum: 1474
Range: 1474
Interquartile range: 0
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation: 161.32
Coef of variation: 3.4656
Kurtosis: 20.04
Mean: 46.549
MAD: 82.535
Skewness: 4.2509
Sum: 67962
Variance: 26024
Memory size: 0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

BsmtFinType1
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 7
Unique (%): 0.5%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
Unf\n", "
\n", " 430\n", "
\n", " \n", "
GLQ\n", "
\n", " 418\n", "
\n", " \n", "
ALQ\n", "
\n", " 220\n", "
\n", " \n", "
Other values (4)\n", "
\n", " 392\n", "
\n", " \n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
Unf | 430 | 29.5%\n", "
 
\n", "
GLQ | 418 | 28.6%\n", "
 
\n", "
ALQ | 220 | 15.1%\n", "
 
\n", "
BLQ | 148 | 10.1%\n", "
 
\n", "
Rec | 133 | 9.1%\n", "
 
\n", "
LwQ | 74 | 5.1%\n", "
 
\n", "
NA | 37 | 2.5%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

BsmtFinType2
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 7
Unique (%): 0.5%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
Unf\n", "
\n", " 1256\n", "
\n", " \n", "
Rec\n", "
\n", "  \n", "
\n", " 54\n", "
LwQ\n", "
\n", "  \n", "
\n", " 46\n", "
Other values (4)\n", "
\n", "  \n", "
\n", " 104\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
Unf | 1256 | 86.0%\n", "
 
\n", "
Rec | 54 | 3.7%\n", "
 
\n", "
LwQ | 46 | 3.2%\n", "
 
\n", "
NA | 38 | 2.6%\n", "
 
\n", "
BLQ | 33 | 2.3%\n", "
 
\n", "
ALQ | 19 | 1.3%\n", "
 
\n", "
GLQ | 14 | 1.0%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

BsmtFullBath
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 4
Unique (%): 0.3%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean: 0.42534
Minimum: 0
Maximum: 3
Zeros (%): 58.6%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 1
95-th percentile: 1
Maximum: 3
Range: 3
Interquartile range: 1
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation: 0.51891
Coef of variation: 1.22
Kurtosis: -0.84033
Mean: 0.42534
MAD: 0.49876
Skewness: 0.59545
Sum: 621
Variance: 0.26927
Memory size: 0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

BsmtHalfBath
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 3
Unique (%): 0.2%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean: 0.057534
Minimum: 0
Maximum: 2
Zeros (%): 94.4%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 1
Maximum: 2
Range: 2
Interquartile range: 0
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation: 0.23875
Coef of variation: 4.1497
Kurtosis: 16.336
Mean: 0.057534
MAD: 0.10861
Skewness: 4.0992
Sum: 84
Variance: 0.057003
Memory size: 0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

BsmtQual
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 5
Unique (%): 0.3%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
TA\n", "
\n", " 649\n", "
\n", " \n", "
Gd\n", "
\n", " 618\n", "
\n", " \n", "
Ex\n", "
\n", "  \n", "
\n", " 121\n", "
Other values (2)\n", "
\n", "  \n", "
\n", " 72\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
TA | 649 | 44.5%\n", "
 
\n", "
Gd | 618 | 42.3%\n", "
 
\n", "
Ex | 121 | 8.3%\n", "
 
\n", "
NA | 37 | 2.5%\n", "
 
\n", "
Fa | 35 | 2.4%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

BsmtUnfSF
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 780
Unique (%): 53.4%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean: 567.24
Minimum: 0
Maximum: 2336
Zeros (%): 8.1%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum: 0
5-th percentile: 0
Q1: 223
Median: 477.5
Q3: 808
95-th percentile: 1468
Maximum: 2336
Range: 2336
Interquartile range: 585
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation: 441.87
Coef of variation: 0.77898
Kurtosis: 0.46926
Mean: 567.24
MAD: 353.28
Skewness: 0.91932
Sum: 828170
Variance: 195250
Memory size: 0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

CentralAir
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 2
Unique (%): 0.1%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "
Y\n", "
\n", " 1365\n", "
\n", " \n", "
N\n", "
\n", "  \n", "
\n", " 95\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
Y | 1365 | 93.5%\n", "
N | 95 | 6.5%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

Condition1
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 9
Unique (%): 0.6%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
Norm\n", "
\n", " 1260\n", "
\n", " \n", "
Feedr\n", "
\n", "  \n", "
\n", " 81\n", "
Artery\n", "
\n", "  \n", "
\n", " 48\n", "
Other values (6)\n", "
\n", "  \n", "
\n", " 71\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
Norm | 1260 | 86.3%\n", "
Feedr | 81 | 5.5%\n", "
Artery | 48 | 3.3%\n", "
RRAn | 26 | 1.8%\n", "
PosN | 19 | 1.3%\n", "
RRAe | 11 | 0.8%\n", "
PosA | 8 | 0.5%\n", "
RRNn | 5 | 0.3%\n", "
RRNe | 2 | 0.1%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

Condition2
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 8
Unique (%): 0.5%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
Norm\n", "
\n", " 1445\n", "
\n", " \n", "
Feedr\n", "
\n", "  \n", "
\n", " 6\n", "
PosN\n", "
\n", "  \n", "
\n", " 2\n", "
Other values (5)\n", "
\n", "  \n", "
\n", " 7\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
Norm | 1445 | 99.0%\n", "
Feedr | 6 | 0.4%\n", "
PosN | 2 | 0.1%\n", "
Artery | 2 | 0.1%\n", "
RRNn | 2 | 0.1%\n", "
RRAn | 1 | 0.1%\n", "
PosA | 1 | 0.1%\n", "
RRAe | 1 | 0.1%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

Electrical
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 6
Unique (%): 0.4%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
SBrkr\n", "
\n", " 1334\n", "
\n", " \n", "
FuseA\n", "
\n", "  \n", "
\n", " 94\n", "
FuseF\n", "
\n", "  \n", "
\n", " 27\n", "
Other values (3)\n", "
\n", "  \n", "
\n", " 5\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
SBrkr | 1334 | 91.4%\n", "
FuseA | 94 | 6.4%\n", "
FuseF | 27 | 1.8%\n", "
FuseP | 3 | 0.2%\n", "
Mix | 1 | 0.1%\n", "
NA | 1 | 0.1%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

EnclosedPorch
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 120
Unique (%): 8.2%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean: 21.954
Minimum: 0
Maximum: 552
Zeros (%): 85.8%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 180.15
Maximum: 552
Range: 552
Interquartile range: 0
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation: 61.119
Coef of variation: 2.784
Kurtosis: 10.391
Mean: 21.954
MAD: 37.66
Skewness: 3.0867
Sum: 32053
Variance: 3735.6
Memory size: 0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

ExterCond
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 5
Unique (%): 0.3%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
TA\n", "
\n", " 1282\n", "
\n", " \n", "
Gd\n", "
\n", "  \n", "
\n", " 146\n", "
Fa\n", "
\n", "  \n", "
\n", " 28\n", "
Other values (2)\n", "
\n", "  \n", "
\n", " 4\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
TA | 1282 | 87.8%\n", "
Gd | 146 | 10.0%\n", "
Fa | 28 | 1.9%\n", "
Ex | 3 | 0.2%\n", "
Po | 1 | 0.1%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

ExterQual
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 4
Unique (%): 0.3%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
TA\n", "
\n", " 906\n", "
\n", " \n", "
Gd\n", "
\n", " 488\n", "
\n", " \n", "
Ex\n", "
\n", "  \n", "
\n", " 52\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
TA | 906 | 62.1%\n", "
Gd | 488 | 33.4%\n", "
Ex | 52 | 3.6%\n", "
Fa | 14 | 1.0%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

Exterior1st
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 15
Unique (%): 1.0%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
VinylSd\n", "
\n", " 515\n", "
\n", " \n", "
HdBoard\n", "
\n", " 222\n", "
\n", " \n", "
MetalSd\n", "
\n", " 220\n", "
\n", " \n", "
Other values (12)\n", "
\n", " 503\n", "
\n", " \n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
VinylSd | 515 | 35.3%\n", "
HdBoard | 222 | 15.2%\n", "
MetalSd | 220 | 15.1%\n", "
Wd Sdng | 206 | 14.1%\n", "
Plywood | 108 | 7.4%\n", "
CemntBd | 61 | 4.2%\n", "
BrkFace | 50 | 3.4%\n", "
WdShing | 26 | 1.8%\n", "
Stucco | 25 | 1.7%\n", "
AsbShng | 20 | 1.4%\n", "
Stone | 2 | 0.1%\n", "
BrkComm | 2 | 0.1%\n", "
AsphShn | 1 | 0.1%\n", "
ImStucc | 1 | 0.1%\n", "
CBlock | 1 | 0.1%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

Exterior2nd
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 16
Unique (%): 1.1%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
VinylSd\n", "
\n", " 504\n", "
\n", " \n", "
MetalSd\n", "
\n", " 214\n", "
\n", " \n", "
HdBoard\n", "
\n", " 207\n", "
\n", " \n", "
Other values (13)\n", "
\n", " 535\n", "
\n", " \n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
VinylSd | 504 | 34.5%\n", "
MetalSd | 214 | 14.7%\n", "
HdBoard | 207 | 14.2%\n", "
Wd Sdng | 197 | 13.5%\n", "
Plywood | 142 | 9.7%\n", "
CmentBd | 60 | 4.1%\n", "
Wd Shng | 38 | 2.6%\n", "
Stucco | 26 | 1.8%\n", "
BrkFace | 25 | 1.7%\n", "
AsbShng | 20 | 1.4%\n", "
ImStucc | 10 | 0.7%\n", "
Brk Cmn | 7 | 0.5%\n", "
Stone | 5 | 0.3%\n", "
AsphShn | 3 | 0.2%\n", "
Other | 1 | 0.1%\n", "
CBlock | 1 | 0.1%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

Fence
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 5
Unique (%): 0.3%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
NA\n", "
\n", " 1179\n", "
\n", " \n", "
MnPrv\n", "
\n", "  \n", "
\n", " 157\n", "
GdPrv\n", "
\n", "  \n", "
\n", " 59\n", "
Other values (2)\n", "
\n", "  \n", "
\n", " 65\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
NA | 1179 | 80.8%\n", "
MnPrv | 157 | 10.8%\n", "
GdPrv | 59 | 4.0%\n", "
GdWo | 54 | 3.7%\n", "
MnWw | 11 | 0.8%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

FireplaceQu
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 6
Unique (%): 0.4%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
NA\n", "
\n", " 690\n", "
\n", " \n", "
Gd\n", "
\n", " 380\n", "
\n", " \n", "
TA\n", "
\n", " 313\n", "
\n", " \n", "
Other values (3)\n", "
\n", "  \n", "
\n", " 77\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
NA | 690 | 47.3%\n", "
Gd | 380 | 26.0%\n", "
TA | 313 | 21.4%\n", "
Fa | 33 | 2.3%\n", "
Ex | 24 | 1.6%\n", "
Po | 20 | 1.4%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

Fireplaces
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 4
Unique (%): 0.3%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean: 0.61301
Minimum: 0
Maximum: 3
Zeros (%): 47.3%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1
Q3: 1
95-th percentile: 2
Maximum: 3
Range: 3
Interquartile range: 1
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation: 0.64467
Coef of variation: 1.0516
Kurtosis: -0.2206
Mean: 0.61301
MAD: 0.57942
Skewness: 0.6489
Sum: 895
Variance: 0.41559
Memory size: 0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

Foundation
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 6
Unique (%): 0.4%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
PConc\n", "
\n", " 647\n", "
\n", " \n", "
CBlock\n", "
\n", " 634\n", "
\n", " \n", "
BrkTil\n", "
\n", " 146\n", "
\n", " \n", "
Other values (3)\n", "
\n", "  \n", "
\n", " 33\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
PConc | 647 | 44.3%\n", "
CBlock | 634 | 43.4%\n", "
BrkTil | 146 | 10.0%\n", "
Slab | 24 | 1.6%\n", "
Stone | 6 | 0.4%\n", "
Wood | 3 | 0.2%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

FullBath
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 4
Unique (%): 0.3%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean: 1.5651
Minimum: 0
Maximum: 3
Zeros (%): 0.6%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum: 0
5-th percentile: 1
Q1: 1
Median: 2
Q3: 2
95-th percentile: 2
Maximum: 3
Range: 3
Interquartile range: 1
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation: 0.55092
Coef of variation: 0.35201
Kurtosis: -0.85822
Mean: 1.5651
MAD: 0.52244
Skewness: 0.036524
Sum: 2285
Variance: 0.30351
Memory size: 0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

Functional
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 7
Unique (%): 0.5%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
Typ\n", "
\n", " 1360\n", "
\n", " \n", "
Min2\n", "
\n", "  \n", "
\n", " 34\n", "
Min1\n", "
\n", "  \n", "
\n", " 31\n", "
Other values (4)\n", "
\n", "  \n", "
\n", " 35\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
Typ | 1360 | 93.2%\n", "
Min2 | 34 | 2.3%\n", "
Min1 | 31 | 2.1%\n", "
Mod | 15 | 1.0%\n", "
Maj1 | 14 | 1.0%\n", "
Maj2 | 5 | 0.3%\n", "
Sev | 1 | 0.1%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

GarageArea
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 441
Unique (%): 30.2%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean: 472.98
Minimum: 0
Maximum: 1418
Zeros (%): 5.5%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum: 0
5-th percentile: 0
Q1: 334.5
Median: 480
Q3: 576
95-th percentile: 850.1
Maximum: 1418
Range: 1418
Interquartile range: 241.5
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation: 213.8
Coef of variation: 0.45204
Kurtosis: 0.90982
Mean: 472.98
MAD: 160.02
Skewness: 0.1798
Sum: 690550
Variance: 45713
Memory size: 0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

GarageCars
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 5
Unique (%): 0.3%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean: 1.7671
Minimum: 0
Maximum: 4
Zeros (%): 5.5%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum: 0
5-th percentile: 0
Q1: 1
Median: 2
Q3: 2
95-th percentile: 3
Maximum: 4
Range: 4
Interquartile range: 1
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation: 0.74732
Coef of variation: 0.4229
Kurtosis: 0.21613
Mean: 1.7671
MAD: 0.58384
Skewness: -0.3422
Sum: 2580
Variance: 0.55848
Memory size: 0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

GarageCond
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 6
Unique (%): 0.4%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
TA\n", "
\n", " 1326\n", "
\n", " \n", "
NA\n", "
\n", "  \n", "
\n", " 81\n", "
Fa\n", "
\n", "  \n", "
\n", " 35\n", "
Other values (3)\n", "
\n", "  \n", "
\n", " 18\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
TA | 1326 | 90.8%\n", "
NA | 81 | 5.5%\n", "
Fa | 35 | 2.4%\n", "
Gd | 9 | 0.6%\n", "
Po | 7 | 0.5%\n", "
Ex | 2 | 0.1%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

GarageFinish
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 4
Unique (%): 0.3%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
Unf\n", "
\n", " 605\n", "
\n", " \n", "
RFn\n", "
\n", " 422\n", "
\n", " \n", "
Fin\n", "
\n", " 352\n", "
\n", " \n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
Unf | 605 | 41.4%\n", "
RFn | 422 | 28.9%\n", "
Fin | 352 | 24.1%\n", "
NA | 81 | 5.5%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

GarageQual
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 6
Unique (%): 0.4%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
TA\n", "
\n", " 1311\n", "
\n", " \n", "
NA\n", "
\n", "  \n", "
\n", " 81\n", "
Fa\n", "
\n", "  \n", "
\n", " 48\n", "
Other values (3)\n", "
\n", "  \n", "
\n", " 20\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
TA | 1311 | 89.8%\n", "
NA | 81 | 5.5%\n", "
Fa | 48 | 3.3%\n", "
Gd | 14 | 1.0%\n", "
Po | 3 | 0.2%\n", "
Ex | 3 | 0.2%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

GarageType
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 7
Unique (%): 0.5%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
Attchd\n", "
\n", " 870\n", "
\n", " \n", "
Detchd\n", "
\n", " 387\n", "
\n", " \n", "
BuiltIn\n", "
\n", "  \n", "
\n", " 88\n", "
Other values (4)\n", "
\n", "  \n", "
\n", " 115\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
Attchd | 870 | 59.6%\n", "
Detchd | 387 | 26.5%\n", "
BuiltIn | 88 | 6.0%\n", "
NA | 81 | 5.5%\n", "
Basment | 19 | 1.3%\n", "
CarPort | 9 | 0.6%\n", "
2Types | 6 | 0.4%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

GarageYrBlt
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 98
Unique (%): 6.7%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
NA\n", "
\n", "  \n", "
\n", " 81\n", "
2005\n", "
\n", "  \n", "
\n", " 65\n", "
2006\n", "
\n", "  \n", "
\n", " 59\n", "
Other values (95)\n", "
\n", " 1255\n", "
\n", " \n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
NA | 81 | 5.5%\n", "
2005 | 65 | 4.5%\n", "
2006 | 59 | 4.0%\n", "
2004 | 53 | 3.6%\n", "
2003 | 50 | 3.4%\n", "
2007 | 49 | 3.4%\n", "
1977 | 35 | 2.4%\n", "
1998 | 31 | 2.1%\n", "
1999 | 30 | 2.1%\n", "
1976 | 29 | 2.0%\n", "
2008 | 29 | 2.0%\n", "
2000 | 27 | 1.8%\n", "
1968 | 26 | 1.8%\n", "
2002 | 26 | 1.8%\n", "
1950 | 24 | 1.6%\n", "
1993 | 22 | 1.5%\n", "
1958 | 21 | 1.4%\n", "
1965 | 21 | 1.4%\n", "
1962 | 21 | 1.4%\n", "
2009 | 21 | 1.4%\n", "
Other values (78) | 740 | 50.7%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

GrLivArea
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 861
Unique (%): 59.0%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean: 1515.5
Minimum: 334
Maximum: 5642
Zeros (%): 0.0%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum: 334
5-th percentile: 848
Q1: 1129.5
Median: 1464
Q3: 1776.8
95-th percentile: 2466.1
Maximum: 5642
Range: 5308
Interquartile range: 647.25
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation: 525.48
Coef of variation: 0.34675
Kurtosis: 4.8743
Mean: 1515.5
MAD: 397.32
Skewness: 1.3652
Sum: 2212600
Variance: 276130
Memory size: 0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

HalfBath
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 3
Unique (%): 0.2%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean: 0.38288
Minimum: 0
Maximum: 2
Zeros (%): 62.5%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 1
95-th percentile: 1
Maximum: 2
Range: 2
Interquartile range: 1
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation: 0.50289
Coef of variation: 1.3134
Kurtosis: -1.0773
Mean: 0.38288
MAD: 0.47886
Skewness: 0.6752
Sum: 559
Variance: 0.25289
Memory size: 0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

Heating
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 6
Unique (%): 0.4%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
GasA\n", "
\n", " 1428\n", "
\n", " \n", "
GasW\n", "
\n", "  \n", "
\n", " 18\n", "
Grav\n", "
\n", "  \n", "
\n", " 7\n", "
Other values (3)\n", "
\n", "  \n", "
\n", " 7\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
GasA | 1428 | 97.8%\n", "
 
\n", "
GasW | 18 | 1.2%\n", "
 
\n", "
Grav | 7 | 0.5%\n", "
 
\n", "
Wall | 4 | 0.3%\n", "
 
\n", "
OthW | 2 | 0.1%\n", "
 
\n", "
Floor | 1 | 0.1%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

HeatingQC
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 5
Unique (%): 0.3%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
Ex\n", "
\n", " 741\n", "
\n", " \n", "
TA\n", "
\n", " 428\n", "
\n", " \n", "
Gd\n", "
\n", " 241\n", "
\n", " \n", "
Other values (2)\n", "
\n", "  \n", "
\n", " 50\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
Ex | 741 | 50.8%\n", "
 
\n", "
TA | 428 | 29.3%\n", "
 
\n", "
Gd | 241 | 16.5%\n", "
 
\n", "
Fa | 49 | 3.4%\n", "
 
\n", "
Po | 1 | 0.1%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

HouseStyle
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 8
Unique (%): 0.5%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
1Story\n", "
\n", " 726\n", "
\n", " \n", "
2Story\n", "
\n", " 445\n", "
\n", " \n", "
1.5Fin\n", "
\n", " 154\n", "
\n", " \n", "
Other values (5)\n", "
\n", "  \n", "
\n", " 135\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
1Story | 726 | 49.7%\n", "
 
\n", "
2Story | 445 | 30.5%\n", "
 
\n", "
1.5Fin | 154 | 10.5%\n", "
 
\n", "
SLvl | 65 | 4.5%\n", "
 
\n", "
SFoyer | 37 | 2.5%\n", "
 
\n", "
1.5Unf | 14 | 1.0%\n", "
 
\n", "
2.5Unf | 11 | 0.8%\n", "
 
\n", "
2.5Fin | 8 | 0.5%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

Id
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 1460
Unique (%): 100.0%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean: 730.5
Minimum: 1
Maximum: 1460
Zeros (%): 0.0%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum: 1
5-th percentile: 73.95
Q1: 365.75
Median: 730.5
Q3: 1095.2
95-th percentile: 1387
Maximum: 1460
Range: 1459
Interquartile range: 729.5
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation: 421.61
Coef of variation: 0.57715
Kurtosis: -1.2
Mean: 730.5
MAD: 365
Skewness: 0
Sum: 1066500
Variance: 177760
Memory size: 0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

KitchenAbvGr
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 4
Unique (%): 0.3%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean: 1.0466
Minimum: 0
Maximum: 3
Zeros (%): 0.1%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum: 0
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 1
Maximum: 3
Range: 3
Interquartile range: 0
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation: 0.22034
Coef of variation: 0.21053
Kurtosis: 21.455
Mean: 1.0466
MAD: 0.090246
Skewness: 4.4838
Sum: 1528
Variance: 0.048549
Memory size: 0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

KitchenQual
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 4
Unique (%): 0.3%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
TA\n", "
\n", " 735\n", "
\n", " \n", "
Gd\n", "
\n", " 586\n", "
\n", " \n", "
Ex\n", "
\n", "  \n", "
\n", " 100\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
TA | 735 | 50.3%\n", "
 
\n", "
Gd | 586 | 40.1%\n", "
 
\n", "
Ex | 100 | 6.8%\n", "
 
\n", "
Fa | 39 | 2.7%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

LandContour
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 4
Unique (%): 0.3%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
Lvl\n", "
\n", " 1311\n", "
\n", " \n", "
Bnk\n", "
\n", "  \n", "
\n", " 63\n", "
HLS\n", "
\n", "  \n", "
\n", " 50\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
Lvl | 1311 | 89.8%\n", "
 
\n", "
Bnk | 63 | 4.3%\n", "
 
\n", "
HLS | 50 | 3.4%\n", "
 
\n", "
Low | 36 | 2.5%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

LandSlope
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 3
Unique (%): 0.2%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
Gtl\n", "
\n", " 1382\n", "
\n", " \n", "
Mod\n", "
\n", "  \n", "
\n", " 65\n", "
Sev\n", "
\n", "  \n", "
\n", " 13\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
Gtl | 1382 | 94.7%\n", "
 
\n", "
Mod | 65 | 4.5%\n", "
 
\n", "
Sev | 13 | 0.9%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

LotArea
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 1073
Unique (%): 73.5%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean: 10517
Minimum: 1300
Maximum: 215240
Zeros (%): 0.0%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum: 1300
5-th percentile: 3311.7
Q1: 7553.5
Median: 9478.5
Q3: 11602
95-th percentile: 17401
Maximum: 215240
Range: 213940
Interquartile range: 4048
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation: 9981.3
Coef of variation: 0.94908
Kurtosis: 202.54
Mean: 10517
MAD: 3758.8
Skewness: 12.195
Sum: 15355000
Variance: 99626000
Memory size: 0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

LotConfig
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 5
Unique (%): 0.3%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
Inside\n", "
\n", " 1052\n", "
\n", " \n", "
Corner\n", "
\n", " 263\n", "
\n", " \n", "
CulDSac\n", "
\n", "  \n", "
\n", " 94\n", "
Other values (2)\n", "
\n", "  \n", "
\n", " 51\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
Inside | 1052 | 72.1%\n", "
 
\n", "
Corner | 263 | 18.0%\n", "
 
\n", "
CulDSac | 94 | 6.4%\n", "
 
\n", "
FR2 | 47 | 3.2%\n", "
 
\n", "
FR3 | 4 | 0.3%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

LotFrontage
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 111
Unique (%): 7.6%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
NA\n", "
\n", " 259\n", "
\n", " \n", "
60\n", "
\n", "  \n", "
\n", " 143\n", "
70\n", "
\n", "  \n", "
\n", " 70\n", "
Other values (108)\n", "
\n", " 988\n", "
\n", " \n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
NA | 259 | 17.7%\n", "
 
\n", "
60 | 143 | 9.8%\n", "
 
\n", "
70 | 70 | 4.8%\n", "
 
\n", "
80 | 69 | 4.7%\n", "
 
\n", "
50 | 57 | 3.9%\n", "
 
\n", "
75 | 53 | 3.6%\n", "
 
\n", "
65 | 44 | 3.0%\n", "
 
\n", "
85 | 40 | 2.7%\n", "
 
\n", "
78 | 25 | 1.7%\n", "
 
\n", "
90 | 23 | 1.6%\n", "
 
\n", "
21 | 23 | 1.6%\n", "
 
\n", "
64 | 19 | 1.3%\n", "
 
\n", "
68 | 19 | 1.3%\n", "
 
\n", "
24 | 19 | 1.3%\n", "
 
\n", "
73 | 18 | 1.2%\n", "
 
\n", "
55 | 17 | 1.2%\n", "
 
\n", "
79 | 17 | 1.2%\n", "
 
\n", "
63 | 17 | 1.2%\n", "
 
\n", "
72 | 17 | 1.2%\n", "
 
\n", "
100 | 16 | 1.1%\n", "
 
\n", "
Other values (91) | 495 | 33.9%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

LotShape
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 4
Unique (%): 0.3%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
Reg\n", "
\n", " 925\n", "
\n", " \n", "
IR1\n", "
\n", " 484\n", "
\n", " \n", "
IR2\n", "
\n", "  \n", "
\n", " 41\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
Reg | 925 | 63.4%\n", "
 
\n", "
IR1 | 484 | 33.2%\n", "
 
\n", "
IR2 | 41 | 2.8%\n", "
 
\n", "
IR3 | 10 | 0.7%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

LowQualFinSF
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 24
Unique (%): 1.6%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean: 5.8445
Minimum: 0
Maximum: 572
Zeros (%): 98.2%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 572
Range: 572
Interquartile range: 0
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation: 48.623
Coef of variation: 8.3194
Kurtosis: 82.946
Mean: 5.8445
MAD: 11.481
Skewness: 9.0021
Sum: 8533
Variance: 2364.2
Memory size: 0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

MSSubClass
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 15
Unique (%): 1.0%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean: 56.897
Minimum: 20
Maximum: 190
Zeros (%): 0.0%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum: 20
5-th percentile: 20
Q1: 20
Median: 50
Q3: 70
95-th percentile: 160
Maximum: 190
Range: 170
Interquartile range: 50
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation: 42.301
Coef of variation: 0.74346
Kurtosis: 1.5707
Mean: 56.897
MAD: 31.283
Skewness: 1.4062
Sum: 83070
Variance: 1789.3
Memory size: 0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

MSZoning
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 5
Unique (%): 0.3%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
RL\n", "
\n", " 1151\n", "
\n", " \n", "
RM\n", "
\n", "  \n", "
\n", " 218\n", "
FV\n", "
\n", "  \n", "
\n", " 65\n", "
Other values (2)\n", "
\n", "  \n", "
\n", " 26\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
RL | 1151 | 78.8%\n", "
 
\n", "
RM | 218 | 14.9%\n", "
 
\n", "
FV | 65 | 4.5%\n", "
 
\n", "
RH | 16 | 1.1%\n", "
 
\n", "
C (all) | 10 | 0.7%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

MasVnrArea
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 328
Unique (%): 22.5%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
0\n", "
\n", " 861\n", "
\n", " \n", "
108\n", "
\n", "  \n", "
\n", " 8\n", "
180\n", "
\n", "  \n", "
\n", " 8\n", "
Other values (325)\n", "
\n", " 583\n", "
\n", " \n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
0 | 861 | 59.0%\n", "
 
\n", "
108 | 8 | 0.5%\n", "
 
\n", "
180 | 8 | 0.5%\n", "
 
\n", "
72 | 8 | 0.5%\n", "
 
\n", "
NA | 8 | 0.5%\n", "
 
\n", "
16 | 7 | 0.5%\n", "
 
\n", "
120 | 7 | 0.5%\n", "
 
\n", "
200 | 6 | 0.4%\n", "
 
\n", "
106 | 6 | 0.4%\n", "
 
\n", "
340 | 6 | 0.4%\n", "
 
\n", "
80 | 6 | 0.4%\n", "
 
\n", "
170 | 5 | 0.3%\n", "
 
\n", "
84 | 5 | 0.3%\n", "
 
\n", "
320 | 5 | 0.3%\n", "
 
\n", "
360 | 5 | 0.3%\n", "
 
\n", "
132 | 5 | 0.3%\n", "
 
\n", "
216 | 4 | 0.3%\n", "
 
\n", "
196 | 4 | 0.3%\n", "
 
\n", "
76 | 4 | 0.3%\n", "
 
\n", "
210 | 4 | 0.3%\n", "
 
\n", "
Other values (308) | 488 | 33.4%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

MasVnrType
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 5
Unique (%): 0.3%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
None\n", "
\n", " 864\n", "
\n", " \n", "
BrkFace\n", "
\n", " 445\n", "
\n", " \n", "
Stone\n", "
\n", "  \n", "
\n", " 128\n", "
Other values (2)\n", "
\n", "  \n", "
\n", " 23\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
None | 864 | 59.2%\n", "
 
\n", "
BrkFace | 445 | 30.5%\n", "
 
\n", "
Stone | 128 | 8.8%\n", "
 
\n", "
BrkCmn | 15 | 1.0%\n", "
 
\n", "
NA | 8 | 0.5%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

MiscFeature
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 5
Unique (%): 0.3%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
NA\n", "
\n", " 1406\n", "
\n", " \n", "
Shed\n", "
\n", "  \n", "
\n", " 49\n", "
Gar2\n", "
\n", "  \n", "
\n", " 2\n", "
Other values (2)\n", "
\n", "  \n", "
\n", " 3\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
NA | 1406 | 96.3%\n", "
 
\n", "
Shed | 49 | 3.4%\n", "
 
\n", "
Gar2 | 2 | 0.1%\n", "
 
\n", "
Othr | 2 | 0.1%\n", "
 
\n", "
TenC | 1 | 0.1%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

MiscVal
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 21
Unique (%): 1.4%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean: 43.489
Minimum: 0
Maximum: 15500
Zeros (%): 96.4%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 15500
Range: 15500
Interquartile range: 0
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation: 496.12
Coef of variation: 11.408
Kurtosis: 698.6
Mean: 43.489
MAD: 83.88
Skewness: 24.452
Sum: 63494
Variance: 246140
Memory size: 0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

MoSold
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 12
Unique (%): 0.8%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean: 6.3219
Minimum: 1
Maximum: 12
Zeros (%): 0.0%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum: 1
5-th percentile: 2
Q1: 5
Median: 6
Q3: 8
95-th percentile: 11
Maximum: 12
Range: 11
Interquartile range: 3
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation: 2.7036
Coef of variation: 0.42766
Kurtosis: -0.40683
Mean: 6.3219
MAD: 2.1425
Skewness: 0.21184
Sum: 9230
Variance: 7.3096
Memory size: 0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

Neighborhood
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 25
Unique (%): 1.7%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
NAmes\n", "
\n", " 225\n", "
\n", " \n", "
CollgCr\n", "
\n", "  \n", "
\n", " 150\n", "
OldTown\n", "
\n", "  \n", "
\n", " 113\n", "
Other values (22)\n", "
\n", " 972\n", "
\n", " \n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
Value | Count | Frequency (%)
NAmes | 225 | 15.4%\n", "
 
\n", "
CollgCr | 150 | 10.3%\n", "
 
\n", "
OldTown | 113 | 7.7%\n", "
 
\n", "
Edwards | 100 | 6.8%\n", "
 
\n", "
Somerst | 86 | 5.9%\n", "
 
\n", "
Gilbert | 79 | 5.4%\n", "
 
\n", "
NridgHt | 77 | 5.3%\n", "
 
\n", "
Sawyer | 74 | 5.1%\n", "
 
\n", "
NWAmes | 73 | 5.0%\n", "
 
\n", "
SawyerW | 59 | 4.0%\n", "
 
\n", "
BrkSide | 58 | 4.0%\n", "
 
\n", "
Crawfor | 51 | 3.5%\n", "
 
\n", "
Mitchel | 49 | 3.4%\n", "
 
\n", "
NoRidge | 41 | 2.8%\n", "
 
\n", "
Timber | 38 | 2.6%\n", "
 
\n", "
IDOTRR | 37 | 2.5%\n", "
 
\n", "
ClearCr | 28 | 1.9%\n", "
 
\n", "
StoneBr | 25 | 1.7%\n", "
 
\n", "
SWISU | 25 | 1.7%\n", "
 
\n", "
MeadowV | 17 | 1.2%\n", "
 
\n", "
Other values (5) | 55 | 3.8%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

OpenPorchSF
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 202
Unique (%): 13.8%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean: 46.66
Minimum: 0
Maximum: 547
Zeros (%): 44.9%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum: 0
5-th percentile: 0
Q1: 0
Median: 25
Q3: 68
95-th percentile: 175.05
Maximum: 547
Range: 547
Interquartile range: 68
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation: 66.256
Coef of variation: 1.42
Kurtosis: 8.4572
Mean: 46.66
MAD: 47.678
Skewness: 2.3619
Sum: 68124
Variance: 4389.9
Memory size: 0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

OverallCond
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 9
Unique (%): 0.6%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean: 5.5753
Minimum: 1
Maximum: 9
Zeros (%): 0.0%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum: 1
5-th percentile: 4
Q1: 5
Median: 5
Q3: 6
95-th percentile: 8
Maximum: 9
Range: 8
Interquartile range: 1
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation: 1.1128
Coef of variation: 0.19959
Kurtosis: 1.0985
Mean: 5.5753
MAD: 0.88902
Skewness: 0.69236
Sum: 8140
Variance: 1.2383
Memory size: 0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

OverallQual
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count: 10
Unique (%): 0.7%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean: 6.0993
Minimum: 1
Maximum: 10
Zeros (%): 0.0%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum1
5-th percentile4
Q15
Median6
Q37
95-th percentile8
Maximum10
Range9
Interquartile range2
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation1.383
Coef of variation0.22675
Kurtosis0.091857
Mean6.0993
MAD1.098
Skewness0.21672
Sum8905
Variance1.9127
Memory size0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

PavedDrive
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count3
Unique (%)0.2%
Missing (%)0.0%
Missing (n)0
Infinite (%)0.0%
Infinite (n)0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
Y\n", "
\n", " 1340\n", "
\n", " \n", "
N\n", "
\n", "  \n", "
\n", " 90\n", "
P\n", "
\n", "  \n", "
\n", " 30\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
ValueCountFrequency (%) 
Y134091.8%\n", "
 
\n", "
N906.2%\n", "
 
\n", "
P302.1%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

PoolArea
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count8
Unique (%)0.5%
Missing (%)0.0%
Missing (n)0
Infinite (%)0.0%
Infinite (n)0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean2.7589
Minimum0
Maximum738
Zeros (%)99.5%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum0
5-th percentile0
Q10
Median0
Q30
95-th percentile0
Maximum738
Range738
Interquartile range0
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation40.177
Coef of variation14.563
Kurtosis222.5
Mean2.7589
MAD5.4914
Skewness14.813
Sum4028
Variance1614.2
Memory size0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

PoolQC
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count4
Unique (%)0.3%
Missing (%)0.0%
Missing (n)0
Infinite (%)0.0%
Infinite (n)0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
NA\n", "
\n", " 1453\n", "
\n", " \n", "
Gd\n", "
\n", "  \n", "
\n", " 3\n", "
Ex\n", "
\n", "  \n", "
\n", " 2\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
ValueCountFrequency (%) 
NA145399.5%\n", "
 
\n", "
Gd30.2%\n", "
 
\n", "
Ex20.1%\n", "
 
\n", "
Fa20.1%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

RoofMatl
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count8
Unique (%)0.5%
Missing (%)0.0%
Missing (n)0
Infinite (%)0.0%
Infinite (n)0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
CompShg\n", "
\n", " 1434\n", "
\n", " \n", "
Tar&Grv\n", "
\n", "  \n", "
\n", " 11\n", "
WdShngl\n", "
\n", "  \n", "
\n", " 6\n", "
Other values (5)\n", "
\n", "  \n", "
\n", " 9\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
ValueCountFrequency (%) 
CompShg143498.2%\n", "
 
\n", "
Tar&Grv110.8%\n", "
 
\n", "
WdShngl60.4%\n", "
 
\n", "
WdShake50.3%\n", "
 
\n", "
Membran10.1%\n", "
 
\n", "
ClyTile10.1%\n", "
 
\n", "
Metal10.1%\n", "
 
\n", "
Roll10.1%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

RoofStyle
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count6
Unique (%)0.4%
Missing (%)0.0%
Missing (n)0
Infinite (%)0.0%
Infinite (n)0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
Gable\n", "
\n", " 1141\n", "
\n", " \n", "
Hip\n", "
\n", " 286\n", "
\n", " \n", "
Flat\n", "
\n", "  \n", "
\n", " 13\n", "
Other values (3)\n", "
\n", "  \n", "
\n", " 20\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
ValueCountFrequency (%) 
Gable114178.2%\n", "
 
\n", "
Hip28619.6%\n", "
 
\n", "
Flat130.9%\n", "
 
\n", "
Gambrel110.8%\n", "
 
\n", "
Mansard70.5%\n", "
 
\n", "
Shed20.1%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

SaleCondition
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count6
Unique (%)0.4%
Missing (%)0.0%
Missing (n)0
Infinite (%)0.0%
Infinite (n)0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
Normal\n", "
\n", " 1198\n", "
\n", " \n", "
Partial\n", "
\n", "  \n", "
\n", " 125\n", "
Abnorml\n", "
\n", "  \n", "
\n", " 101\n", "
Other values (3)\n", "
\n", "  \n", "
\n", " 36\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
ValueCountFrequency (%) 
Normal119882.1%\n", "
 
\n", "
Partial1258.6%\n", "
 
\n", "
Abnorml1016.9%\n", "
 
\n", "
Family201.4%\n", "
 
\n", "
Alloca120.8%\n", "
 
\n", "
AdjLand40.3%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

SalePrice
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count663
Unique (%)45.4%
Missing (%)0.0%
Missing (n)0
Infinite (%)0.0%
Infinite (n)0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean180920
Minimum34900
Maximum755000
Zeros (%)0.0%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum34900
5-th percentile88000
Q1129980
Median163000
Q3214000
95-th percentile326100
Maximum755000
Range720100
Interquartile range84025
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation79443
Coef of variation0.4391
Kurtosis6.5098
Mean180920
MAD57435
Skewness1.8809
Sum264140000
Variance6311100000
Memory size0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

SaleType
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count9
Unique (%)0.6%
Missing (%)0.0%
Missing (n)0
Infinite (%)0.0%
Infinite (n)0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "
WD\n", "
\n", " 1267\n", "
\n", " \n", "
New\n", "
\n", "  \n", "
\n", " 122\n", "
COD\n", "
\n", "  \n", "
\n", " 43\n", "
Other values (6)\n", "
\n", "  \n", "
\n", " 28\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
ValueCountFrequency (%) 
WD126786.8%\n", "
 
\n", "
New1228.4%\n", "
 
\n", "
COD432.9%\n", "
 
\n", "
ConLD90.6%\n", "
 
\n", "
ConLI50.3%\n", "
 
\n", "
ConLw50.3%\n", "
 
\n", "
CWD40.3%\n", "
 
\n", "
Oth30.2%\n", "
 
\n", "
Con20.1%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

ScreenPorch
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count76
Unique (%)5.2%
Missing (%)0.0%
Missing (n)0
Infinite (%)0.0%
Infinite (n)0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean15.061
Minimum0
Maximum480
Zeros (%)92.1%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum0
5-th percentile0
Q10
Median0
Q30
95-th percentile160
Maximum480
Range480
Interquartile range0
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation55.757
Coef of variation3.7021
Kurtosis18.372
Mean15.061
MAD27.729
Skewness4.118
Sum21989
Variance3108.9
Memory size0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

Street
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count2
Unique (%)0.1%
Missing (%)0.0%
Missing (n)0
Infinite (%)0.0%
Infinite (n)0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "
Pave\n", "
\n", " 1454\n", "
\n", " \n", "
Grvl\n", "
\n", "  \n", "
\n", " 6\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
ValueCountFrequency (%) 
Pave145499.6%\n", "
 
\n", "
Grvl60.4%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

TotRmsAbvGrd
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count12
Unique (%)0.8%
Missing (%)0.0%
Missing (n)0
Infinite (%)0.0%
Infinite (n)0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean6.5178
Minimum2
Maximum14
Zeros (%)0.0%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum2
5-th percentile4
Q15
Median6
Q37
95-th percentile10
Maximum14
Range12
Interquartile range2
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation1.6254
Coef of variation0.24938
Kurtosis0.87364
Mean6.5178
MAD1.2796
Skewness0.67565
Sum9516
Variance2.6419
Memory size0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

TotalBsmtSF
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count721
Unique (%)49.4%
Missing (%)0.0%
Missing (n)0
Infinite (%)0.0%
Infinite (n)0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean1057.4
Minimum0
Maximum6110
Zeros (%)2.5%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum0
5-th percentile519.3
Q1795.75
Median991.5
Q31298.2
95-th percentile1753
Maximum6110
Range6110
Interquartile range502.5
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation438.71
Coef of variation0.41488
Kurtosis13.201
Mean1057.4
MAD321.28
Skewness1.5227
Sum1543800
Variance192460
Memory size0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

Utilities
\n", " Categorical\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count2
Unique (%)0.1%
Missing (%)0.0%
Missing (n)0
Infinite (%)0.0%
Infinite (n)0
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "
AllPub\n", "
\n", " 1459\n", "
\n", " \n", "
NoSeWa\n", "
\n", "  \n", "
\n", " 1\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "
ValueCountFrequency (%) 
AllPub145999.9%\n", "
 
\n", "
NoSeWa10.1%\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "

WoodDeckSF
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count274
Unique (%)18.8%
Missing (%)0.0%
Missing (n)0
Infinite (%)0.0%
Infinite (n)0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean94.245
Minimum0
Maximum857
Zeros (%)52.1%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum0
5-th percentile0
Q10
Median0
Q3168
95-th percentile335
Maximum857
Range857
Interquartile range168
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation125.34
Coef of variation1.3299
Kurtosis2.9786
Mean94.245
MAD102
Skewness1.5398
Sum137600
Variance15710
Memory size0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

YearBuilt
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count112
Unique (%)7.7%
Missing (%)0.0%
Missing (n)0
Infinite (%)0.0%
Infinite (n)0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean1971.3
Minimum1872
Maximum2010
Zeros (%)0.0%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum1872
5-th percentile1916
Q11954
Median1973
Q32000
95-th percentile2007
Maximum2010
Range138
Interquartile range46
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation30.203
Coef of variation0.015322
Kurtosis-0.44215
Mean1971.3
MAD25.067
Skewness-0.61283
Sum2878100
Variance912.22
Memory size0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

YearRemodAdd
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count61
Unique (%)4.2%
Missing (%)0.0%
Missing (n)0
Infinite (%)0.0%
Infinite (n)0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean1984.9
Minimum1950
Maximum2010
Zeros (%)0.0%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum1950
5-th percentile1950
Q11967
Median1994
Q32004
95-th percentile2007
Maximum2010
Range60
Interquartile range37
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation20.645
Coef of variation0.010401
Kurtosis-1.272
Mean1984.9
MAD18.623
Skewness-0.50304
Sum2897900
Variance426.23
Memory size0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

YrSold
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count5
Unique (%)0.3%
Missing (%)0.0%
Missing (n)0
Infinite (%)0.0%
Infinite (n)0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean2007.8
Minimum2006
Maximum2010
Zeros (%)0.0%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum2006
5-th percentile2006
Q12007
Median2008
Q32009
95-th percentile2010
Maximum2010
Range4
Interquartile range2
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation1.3281
Coef of variation0.00066146
Kurtosis-1.1906
Mean2007.8
MAD1.1487
Skewness0.09617
Sum2931400
Variance1.7638
Memory size0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

Sample

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IdMSSubClassMSZoningLotFrontageLotAreaStreetAlleyLotShapeLandContourUtilitiesLotConfigLandSlopeNeighborhoodCondition1Condition2BldgTypeHouseStyleOverallQualOverallCondYearBuiltYearRemodAddRoofStyleRoofMatlExterior1stExterior2ndMasVnrTypeMasVnrAreaExterQualExterCondFoundationBsmtQualBsmtCondBsmtExposureBsmtFinType1BsmtFinSF1BsmtFinType2BsmtFinSF2BsmtUnfSFTotalBsmtSFHeatingHeatingQCCentralAirElectrical1stFlrSF2ndFlrSFLowQualFinSFGrLivAreaBsmtFullBathBsmtHalfBathFullBathHalfBathBedroomAbvGrKitchenAbvGrKitchenQualTotRmsAbvGrdFunctionalFireplacesFireplaceQuGarageTypeGarageYrBltGarageFinishGarageCarsGarageAreaGarageQualGarageCondPavedDriveWoodDeckSFOpenPorchSFEnclosedPorch3SsnPorchScreenPorchPoolAreaPoolQCFenceMiscFeatureMiscValMoSoldYrSoldSaleTypeSaleConditionSalePrice
0160RL658450PaveNARegLvlAllPubInsideGtlCollgCrNormNorm1Fam2Story7520032003GableCompShgVinylSdVinylSdBrkFace196GdTAPConcGdTANoGLQ706Unf0150856GasAExYSBrkr85685401710102131Gd8Typ0NAAttchd2003RFn2548TATAY0610000NANANA022008WDNormal208500
1220RL809600PaveNARegLvlAllPubFR2GtlVeenkerFeedrNorm1Fam1Story6819761976GableCompShgMetalSdMetalSdNone0TATACBlockGdTAGdALQ978Unf02841262GasAExYSBrkr1262001262012031TA6Typ1TAAttchd1976RFn2460TATAY29800000NANANA052007WDNormal181500
2360RL6811250PaveNAIR1LvlAllPubInsideGtlCollgCrNormNorm1Fam2Story7520012002GableCompShgVinylSdVinylSdBrkFace162GdTAPConcGdTAMnGLQ486Unf0434920GasAExYSBrkr92086601786102131Gd6Typ1TAAttchd2001RFn2608TATAY0420000NANANA092008WDNormal223500
3470RL609550PaveNAIR1LvlAllPubCornerGtlCrawforNormNorm1Fam2Story7519151970GableCompShgWd SdngWd ShngNone0TATABrkTilTAGdNoALQ216Unf0540756GasAGdYSBrkr96175601717101031Gd7Typ1GdDetchd1998Unf3642TATAY035272000NANANA022006WDAbnorml140000
4560RL8414260PaveNAIR1LvlAllPubFR2GtlNoRidgeNormNorm1Fam2Story8520002000GableCompShgVinylSdVinylSdBrkFace350GdTAPConcGdTAAvGLQ655Unf04901145GasAExYSBrkr1145105302198102141Gd9Typ1TAAttchd2000RFn3836TATAY192840000NANANA0122008WDNormal250000
\n", "
\n", "
\n", "
" ], "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "profiler.profiler()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This class builds a profile of a given DataFrame and its general features. It is based on spark-df-profiling by Julio Soto.\n", "\n", "This overview presents basic information about the DataFrame: the number of variables it has, how many values are missing and in which columns, and the type of each variable. For each variable it also shows descriptive statistics and a frequency plot. It also includes a table that specifies the data types present in each column of the DataFrame, among other features." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also use Spark's native `describe` function to get something very similar to what you got using pandas." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
summaryIdMSSubClassMSZoningLotFrontageLotAreaStreetAlleyLotShapeLandContour...PoolAreaPoolQCFenceMiscFeatureMiscValMoSoldYrSoldSaleTypeSaleConditionSalePrice
0count146014601460146014601460146014601460...1460146014601460146014601460146014601460
1mean730.556.897260273972606None70.0499583680266510516.828082191782NoneNoneNoneNone...2.758904109589041NoneNoneNone43.4890410958904146.3219178082191782007.8157534246575NoneNone180921.19589041095
2stddev421.610009368847942.30057099381045None24.284751774483219981.26493237915NoneNoneNoneNone...40.17730694453021NoneNoneNone496.12302445794412.70362620835951131.3280951205521145NoneNone79442.50288288663
3min120C (all)1001300GrvlGrvlIR1Bnk...0ExGdPrvGar2012006CODAbnorml34900
4max1460190RMNA215245PavePaveRegLvl...738NANATenC15500122010WDPartial755000
\n", "

5 rows × 82 columns

\n", "
" ], "text/plain": [ " summary Id MSSubClass MSZoning LotFrontage \\\n", "0 count 1460 1460 1460 1460 \n", "1 mean 730.5 56.897260273972606 None 70.04995836802665 \n", "2 stddev 421.6100093688479 42.30057099381045 None 24.28475177448321 \n", "3 min 1 20 C (all) 100 \n", "4 max 1460 190 RM NA \n", "\n", " LotArea Street Alley LotShape LandContour ... \\\n", "0 1460 1460 1460 1460 1460 ... \n", "1 10516.828082191782 None None None None ... \n", "2 9981.26493237915 None None None None ... \n", "3 1300 Grvl Grvl IR1 Bnk ... \n", "4 215245 Pave Pave Reg Lvl ... \n", "\n", " PoolArea PoolQC Fence MiscFeature MiscVal \\\n", "0 1460 1460 1460 1460 1460 \n", "1 2.758904109589041 None None None 43.489041095890414 \n", "2 40.17730694453021 None None None 496.1230244579441 \n", "3 0 Ex GdPrv Gar2 0 \n", "4 738 NA NA TenC 15500 \n", "\n", " MoSold YrSold SaleType SaleCondition \\\n", "0 1460 1460 1460 1460 \n", "1 6.321917808219178 2007.8157534246575 None None \n", "2 2.7036262083595113 1.3280951205521145 None None \n", "3 1 2006 COD Abnorml \n", "4 12 2010 WD Partial \n", "\n", " SalePrice \n", "0 1460 \n", "1 180921.19589041095 \n", "2 79442.50288288663 \n", "3 34900 \n", "4 755000 \n", "\n", "[5 rows x 82 columns]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe().toPandas()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Selecting and Filtering Data using Pandas\n", "\n", "Your dataset had too many variables to wrap your head around, or even to print out nicely. This is just one of many situations where you'll want to access a smaller set of your data.\n", "\n", "For now, we'll rely on our intuition to pick variables to focus on. Later tutorials will show you statistical techniques to automatically prioritize variables.\n", "\n", "Before we can choose columns, it is helpful to see a list of all columns in the dataset. 
That is done with the columns property of the DataFrame (the bottom line of code below)." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Index(['Id', 'MSSubClass', 'MSZoning', 'LotFrontage', 'LotArea', 'Street',\n", " 'Alley', 'LotShape', 'LandContour', 'Utilities', 'LotConfig',\n", " 'LandSlope', 'Neighborhood', 'Condition1', 'Condition2', 'BldgType',\n", " 'HouseStyle', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd',\n", " 'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType',\n", " 'MasVnrArea', 'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual',\n", " 'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinSF1',\n", " 'BsmtFinType2', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', 'Heating',\n", " 'HeatingQC', 'CentralAir', 'Electrical', '1stFlrSF', '2ndFlrSF',\n", " 'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath',\n", " 'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'KitchenQual',\n", " 'TotRmsAbvGrd', 'Functional', 'Fireplaces', 'FireplaceQu', 'GarageType',\n", " 'GarageYrBlt', 'GarageFinish', 'GarageCars', 'GarageArea', 'GarageQual',\n", " 'GarageCond', 'PavedDrive', 'WoodDeckSF', 'OpenPorchSF',\n", " 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'PoolQC',\n", " 'Fence', 'MiscFeature', 'MiscVal', 'MoSold', 'YrSold', 'SaleType',\n", " 'SaleCondition', 'SalePrice'],\n", " dtype='object')\n" ] } ], "source": [ "file_path = 'train.csv'\n", "df = pd.read_csv(file_path) \n", "print(df.columns)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are many ways to select a subset of your data. We'll start with two main approaches:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tasks:\n", "\n", "- Print a list of the columns\n", "- From the list of columns, find a name of the column with the sales prices of the homes. 
Use the dot notation to extract this to a variable (as you saw above in the [original notebook](https://www.kaggle.com/dansbecker/selecting-and-filtering-in-pandas))\n", "- Use the head command to print out the top few lines of the variable you just created.\n", "- Pick any two variables and store them to a new DataFrame (as you saw in the [original notebook](https://www.kaggle.com/dansbecker/selecting-and-filtering-in-pandas) when you created two_columns_of_data.)\n", "- Use the describe command with the DataFrame you just created to see summaries of those variables. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Print a list of the columns" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['Id', 'MSSubClass', 'MSZoning', 'LotFrontage', 'LotArea', 'Street',\n", " 'Alley', 'LotShape', 'LandContour', 'Utilities', 'LotConfig',\n", " 'LandSlope', 'Neighborhood', 'Condition1', 'Condition2', 'BldgType',\n", " 'HouseStyle', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd',\n", " 'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType',\n", " 'MasVnrArea', 'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual',\n", " 'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinSF1',\n", " 'BsmtFinType2', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', 'Heating',\n", " 'HeatingQC', 'CentralAir', 'Electrical', '1stFlrSF', '2ndFlrSF',\n", " 'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath',\n", " 'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'KitchenQual',\n", " 'TotRmsAbvGrd', 'Functional', 'Fireplaces', 'FireplaceQu', 'GarageType',\n", " 'GarageYrBlt', 'GarageFinish', 'GarageCars', 'GarageArea', 'GarageQual',\n", " 'GarageCond', 'PavedDrive', 'WoodDeckSF', 'OpenPorchSF',\n", " 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'PoolQC',\n", " 'Fence', 'MiscFeature', 'MiscVal', 'MoSold', 'YrSold', 'SaleType',\n", " 'SaleCondition', 'SalePrice'],\n", " 
dtype='object')" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### From the list of columns, find a name of the column with the sales prices of the homes. Use the dot notation to extract this to a variable" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": true }, "outputs": [], "source": [ "house_price = df.SalePrice" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Use the head command to print out the top few lines of the variable you just created." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 208500\n", "1 181500\n", "2 223500\n", "3 140000\n", "4 250000\n", "Name: SalePrice, dtype: int64" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "house_price.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Pick any two variables and store them to a new DataFrame" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": true }, "outputs": [], "source": [ "columns_of_interest = ['1stFlrSF', '2ndFlrSF']\n", "two_columns_of_data = df[columns_of_interest]" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 1stFlrSF 2ndFlrSF\n", "0 856 854\n", "1 1262 0\n", "2 920 866\n", "3 961 756\n", "4 1145 1053\n", "5 796 566\n", "6 1694 0\n", "7 1107 983\n", "8 1022 752\n", "9 1077 0\n", "10 1040 0\n", "11 1182 1142\n", "12 912 0\n", "13 1494 0\n", "14 1253 0\n", "15 854 0\n", "16 1004 0\n", "17 1296 0\n", "18 1114 0\n", "19 1339 0\n", "20 1158 1218\n", "21 1108 0\n", "22 1795 0\n", "23 1060 0\n", "24 1060 0\n", "25 1600 0\n", "26 900 0\n", "27 1704 0\n", "28 1600 0\n", "29 520 0\n", "... ... ...\n", "1430 734 1104\n", "1431 958 0\n", "1432 968 0\n", "1433 962 830\n", "1434 1126 0\n", "1435 1537 0\n", "1436 864 0\n", "1437 1932 0\n", "1438 1236 0\n", "1439 1040 685\n", "1440 1423 748\n", "1441 848 0\n", "1442 1026 981\n", "1443 952 0\n", "1444 1422 0\n", "1445 913 0\n", "1446 1188 0\n", "1447 1220 870\n", "1448 796 550\n", "1449 630 0\n", "1450 896 896\n", "1451 1578 0\n", "1452 1072 0\n", "1453 1140 0\n", "1454 1221 0\n", "1455 953 694\n", "1456 2073 0\n", "1457 1188 1152\n", "1458 1078 0\n", "1459 1256 0\n", "\n", "[1460 rows x 2 columns]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "two_columns_of_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Use the describe command with the DataFrame you just created to see summaries of those variables." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 1stFlrSF 2ndFlrSF\n", "count 1460.000000 1460.000000\n", "mean 1162.626712 346.992466\n", "std 386.587738 436.528436\n", "min 334.000000 0.000000\n", "25% 882.000000 0.000000\n", "50% 1087.000000 0.000000\n", "75% 1391.250000 728.000000\n", "max 4692.000000 2065.000000" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "two_columns_of_data.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Selecting and Filtering Data using PySpark and Optimus" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import optimus as op\n", "tools = op.Utilities()\n", "df = tools.read_csv(\"train.csv\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Print a list of the columns" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Id',\n", " 'MSSubClass',\n", " 'MSZoning',\n", " 'LotFrontage',\n", " 'LotArea',\n", " 'Street',\n", " 'Alley',\n", " 'LotShape',\n", " 'LandContour',\n", " 'Utilities',\n", " 'LotConfig',\n", " 'LandSlope',\n", " 'Neighborhood',\n", " 'Condition1',\n", " 'Condition2',\n", " 'BldgType',\n", " 'HouseStyle',\n", " 'OverallQual',\n", " 'OverallCond',\n", " 'YearBuilt',\n", " 'YearRemodAdd',\n", " 'RoofStyle',\n", " 'RoofMatl',\n", " 'Exterior1st',\n", " 'Exterior2nd',\n", " 'MasVnrType',\n", " 'MasVnrArea',\n", " 'ExterQual',\n", " 'ExterCond',\n", " 'Foundation',\n", " 'BsmtQual',\n", " 'BsmtCond',\n", " 'BsmtExposure',\n", " 'BsmtFinType1',\n", " 'BsmtFinSF1',\n", " 'BsmtFinType2',\n", " 'BsmtFinSF2',\n", " 'BsmtUnfSF',\n", " 'TotalBsmtSF',\n", " 'Heating',\n", " 'HeatingQC',\n", " 'CentralAir',\n", " 'Electrical',\n", " '1stFlrSF',\n", " '2ndFlrSF',\n", " 'LowQualFinSF',\n", " 'GrLivArea',\n", " 'BsmtFullBath',\n", " 'BsmtHalfBath',\n", " 'FullBath',\n", " 'HalfBath',\n", " 'BedroomAbvGr',\n", " 'KitchenAbvGr',\n", " 'KitchenQual',\n", 
" 'TotRmsAbvGrd',\n", " 'Functional',\n", " 'Fireplaces',\n", " 'FireplaceQu',\n", " 'GarageType',\n", " 'GarageYrBlt',\n", " 'GarageFinish',\n", " 'GarageCars',\n", " 'GarageArea',\n", " 'GarageQual',\n", " 'GarageCond',\n", " 'PavedDrive',\n", " 'WoodDeckSF',\n", " 'OpenPorchSF',\n", " 'EnclosedPorch',\n", " '3SsnPorch',\n", " 'ScreenPorch',\n", " 'PoolArea',\n", " 'PoolQC',\n", " 'Fence',\n", " 'MiscFeature',\n", " 'MiscVal',\n", " 'MoSold',\n", " 'YrSold',\n", " 'SaleType',\n", " 'SaleCondition',\n", " 'SalePrice']" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### From the list of columns, find a name of the column with the sales prices of the homes. Use the dot notation to extract this to a variable" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This works differently in Spark, which is more SQL-like. Dot notation exists, but it returns a `Column` expression rather than the data itself, which is not that useful for now, so let's use `select()` instead. This creates another Spark DataFrame." ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": true }, "outputs": [], "source": [ "house_price = df.select(\"SalePrice\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Use the head command to print out the top few lines of the variable you just created." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, the behavior is not the same. This is what happens if you use head:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Row(SalePrice=208500)" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "house_price.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What? I know... In Spark, `head()` with no argument returns only the first `Row`. To see the contents as a table, use the `show()` method; with `n=5` it displays the same rows as head in Pandas. 
If you want to know more about `Row` objects in Spark, go here: http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.Row" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+---------+\n", "|SalePrice|\n", "+---------+\n", "| 208500|\n", "| 181500|\n", "| 223500|\n", "| 140000|\n", "| 250000|\n", "+---------+\n", "only showing top 5 rows\n", "\n" ] } ], "source": [ "house_price.show(n=5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Woo! " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Pick any two variables and store them to a new DataFrame" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll use `select()` again. " ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": true }, "outputs": [], "source": [ "two_columns_of_data = df.select(\"1stFlrSF\",\"2ndFlrSF\")" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+--------+--------+\n", "|1stFlrSF|2ndFlrSF|\n", "+--------+--------+\n", "| 856| 854|\n", "| 1262| 0|\n", "+--------+--------+\n", "only showing top 2 rows\n", "\n" ] } ], "source": [ "two_columns_of_data.show(n=2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or, passing a list of column names:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": true }, "outputs": [], "source": [ "columns_of_interest = ['1stFlrSF', '2ndFlrSF']\n", "two_columns_of_data = df.select(columns_of_interest)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+--------+--------+\n", "|1stFlrSF|2ndFlrSF|\n", "+--------+--------+\n", "| 856| 854|\n", "| 1262| 0|\n", "+--------+--------+\n", "only showing top 2 rows\n", "\n" ] } ], "source": [ "two_columns_of_data.show(n=2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 
"### Use the describe command with the DataFrame you just created to see summaries of those variables." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+-------+-----------------+------------------+\n", "|summary| 1stFlrSF| 2ndFlrSF|\n", "+-------+-----------------+------------------+\n", "| count| 1460| 1460|\n", "| mean|1162.626712328767|346.99246575342465|\n", "| stddev|386.5877380410744| 436.528435886257|\n", "| min| 334| 0|\n", "| max| 4692| 2065|\n", "+-------+-----------------+------------------+\n", "\n" ] } ], "source": [ "two_columns_of_data.describe().show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And with the Optimus profiler (remember to click on Toggle Details):" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "
\n", "
\n", "

Overview

\n", "
\n", "
\n", "
\n", "

Dataset info

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Number of variables2
Number of observations1460
Total Missing (%)0.0%
Total size in memory0.0 B
Average record size in memory0.0 B
\n", "
\n", "
\n", "

Variables types

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Numeric2
Categorical0
Date0
Text (Unique)0
Rejected0
\n", "
\n", "
\n", "

Warnings

\n", "
  • 2ndFlrSF has 829 / 56.8% zeros
\n", "
\n", "
\n", "
\n", "

Variables

\n", "
\n", "
\n", "
\n", "

1stFlrSF
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count753
Unique (%)51.6%
Missing (%)0.0%
Missing (n)0
Infinite (%)0.0%
Infinite (n)0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean1162.6
Minimum334
Maximum4692
Zeros (%)0.0%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum334
5-th percentile672.95
Q1882
Median1087
Q31391.2
95-th percentile1831.2
Maximum4692
Range4358
Interquartile range509.25
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation386.59
Coef of variation0.33251
Kurtosis5.7221
Mean1162.6
MAD300.58
Skewness1.3753
Sum1697400
Variance149450
Memory size0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

2ndFlrSF
\n", " Numeric\n", "

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Distinct count417
Unique (%)28.6%
Missing (%)0.0%
Missing (n)0
Infinite (%)0.0%
Infinite (n)0
\n", "\n", "
\n", "
\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean346.99
Minimum0
Maximum2065
Zeros (%)56.8%
\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "

Quantile statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Minimum0
5-th percentile0
Q10
Median0
Q3728
95-th percentile1141
Maximum2065
Range2065
Interquartile range728
\n", "

Descriptive statistics

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Standard deviation436.53
Coef of variation1.258
Kurtosis-0.55568
Mean346.99
MAD396.48
Skewness0.81219
Sum506610
Variance190560
Memory size0.0 B
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "

Sample

\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
1stFlrSF2ndFlrSF
0856854
112620
2920866
3961756
411451053
\n", "
\n", "
\n", "
" ], "text/plain": [ "" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "profiler = op.DataFrameProfiler(two_columns_of_data)\n", "profiler.profiler()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Your First Scikit-Learn Model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You have the code to load your data, and you know how to index it. You are ready to choose which column you want to predict. This column is called the prediction target. There is a convention that the prediction target is referred to as y. Here is an example doing that with the example data.\n", "\n", "Check the code in the original repo before running this! https://www.kaggle.com/dansbecker/your-first-scikit-learn-model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Your Turn\n", "\n", "Now it's time for you to define and fit a model for your data (in your notebook).\n", "\n", "1. Select the target variable you want to predict. You can go back to the list of columns from your earlier commands to recall what it's called (hint: you've already worked with this variable). Save this to a new variable called y.\n", "\n", "2. Create a list of the names of the predictors we will use in the initial model. Use just the following columns in the list (you can copy and paste the whole list to save some typing, though you'll still need to add quotes):\n", " - LotArea\n", " - YearBuilt\n", " - 1stFlrSF\n", " - 2ndFlrSF\n", " - FullBath\n", " - BedroomAbvGr\n", " - TotRmsAbvGrd\n", "\n", "3. Using the list of variable names you just created, select a new DataFrame of the predictors data. Save this with the variable name X.\n", "\n", "4. Create a DecisionTreeRegressorModel and save it to a variable (with a name like my_model or iowa_model). Ensure you've done the relevant import so you can run this command.\n", "\n", "5. 
Fit the model you have created using the data in X and the target data you saved above.\n", "\n", "6. Make a few predictions with the model's predict command and print out the predictions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. Select the target variable you want to predict. You can go back to the list of columns from your earlier commands to recall what it's called (hint: you've already worked with this variable). Save this to a new variable called y." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The variable we want to predict is the SalePrice. So:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd\n", "\n", "file_path = 'train.csv'\n", "df = pd.read_csv(file_path) \n", "\n", "y = df.SalePrice" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. Create a list of the names of the predictors we will use in the initial model. Use just the following columns in the list (you can copy and paste the whole list to save some typing, though you'll still need to add quotes):\n", " - LotArea\n", " - YearBuilt\n", " - 1stFlrSF\n", " - 2ndFlrSF\n", " - FullBath\n", " - BedroomAbvGr\n", " - TotRmsAbvGrd" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": true }, "outputs": [], "source": [ "predictors = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF', \n", " 'FullBath', 'BedroomAbvGr', 'TotRmsAbvGrd']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3. Using the list of variable names you just created, select a new DataFrame of the predictors data. Save this with the variable name X." ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": true }, "outputs": [], "source": [ "X = df[predictors]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4. Create a DecisionTreeRegressorModel and save it to a variable (with a name like my_model or iowa_model). 
Ensure you've done the relevant import so you can run this command." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You will use the scikit-learn library to create your models. When coding, this library is written as sklearn, as you will see in the sample code. Scikit-learn is easily the most popular library for modeling the types of data typically stored in DataFrames. We will use Spark after :).\n", "\n", "The steps to building and using a model are:\n", "\n", "- Define: What type of model will it be? A decision tree? Some other type of model? Some other parameters of the model type are specified too.\n", "- Fit: Capture patterns from provided data. This is the heart of modeling.\n", "- Predict: Just what it sounds like\n", "- Evaluate: Determine how accurate the model's predictions are." ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from sklearn.tree import DecisionTreeRegressor\n", "\n", "# Define model\n", "my_model = DecisionTreeRegressor()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5. Fit the model you have created using the data in X and the target data you saved above." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,\n", " max_leaf_nodes=None, min_impurity_split=1e-07,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, presort=False, random_state=None,\n", " splitter='best')" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Fit model\n", "my_model.fit(X, y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 6. Make a few predictions with the model's predict command and print out the predictions." 
] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Making predictions for the following 5 houses:\n", " LotArea YearBuilt 1stFlrSF 2ndFlrSF FullBath BedroomAbvGr \\\n", "0 8450 2003 856 854 2 3 \n", "1 9600 1976 1262 0 2 3 \n", "2 11250 2001 920 866 2 3 \n", "3 9550 1915 961 756 1 3 \n", "4 14260 2000 1145 1053 2 4 \n", "\n", " TotRmsAbvGrd \n", "0 8 \n", "1 6 \n", "2 6 \n", "3 7 \n", "4 9 \n", "The predictions are\n", "[ 208500. 181500. 223500. 140000. 250000.]\n" ] } ], "source": [ "print(\"Making predictions for the following 5 houses:\")\n", "print(X.head())\n", "print(\"The predictions are\")\n", "print(my_model.predict(X.head()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The real prices are" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[208500, 181500, 223500, 140000, 250000]" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y.head().tolist()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So it's a very good model :). Or is it?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You've built a decision tree model that can predict the prices of houses based on their characteristics. It's natural to ask how accurate the model's predictions will be, and measuring accuracy is necessary for us to see whether or not other approaches improve our model." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Your First Spark ML Model (here only Spark, next with Optimus)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "MLlib is Spark’s machine learning (ML) library. Its goal is to make practical machine learning scalable and easy. 
At a high level, it provides tools such as:\n", "\n", "- ML Algorithms: common learning algorithms such as classification, regression, clustering, and collaborative filtering\n", "- Featurization: feature extraction, transformation, dimensionality reduction, and selection\n", "- Pipelines: tools for constructing, evaluating, and tuning ML Pipelines\n", "- Persistence: saving and loading algorithms, models, and Pipelines\n", "- Utilities: linear algebra, statistics, data handling, etc." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "“Spark ML” is not an official name but is occasionally used to refer to the MLlib DataFrame-based API. This is mainly due to the org.apache.spark.ml Scala package name used by the DataFrame-based API, and the “Spark ML Pipelines” term we used initially to emphasize the pipeline concept." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The variable we want to predict is the SalePrice. So:" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from pyspark.sql import SparkSession\n", "\n", "spark = SparkSession.builder.getOrCreate()\n", "sc = spark.sparkContext\n", "\n", "file_path = 'train.csv'\n", "df = spark.read.csv(file_path, header=\"true\", inferSchema=True) " ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from pyspark.ml.feature import VectorAssembler, VectorIndexer\n", "\n", "# Choosing predictors\n", "features_cols = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF', \n", " 'FullBath', 'BedroomAbvGr', 'TotRmsAbvGrd']\n", "\n", "# This concatenates all feature columns into a single feature vector in a new column \"raw_features\".\n", "vectorAssembler = VectorAssembler(inputCols=features_cols, outputCol=\"raw_features\")\n", "# This identifies categorical features and indexes them.\n", "vectorIndexer = VectorIndexer(inputCol=\"raw_features\", outputCol=\"features\", maxCategories=4)" ] }, 
{ "cell_type": "code", "execution_count": 40, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from pyspark.ml.regression import DecisionTreeRegressor\n", "# Takes the \"features\" column and learns to predict \"SalePrice\"\n", "dt = DecisionTreeRegressor(labelCol=\"SalePrice\")" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from pyspark.ml import Pipeline\n", "\n", "pipeline = Pipeline(stages=[\n", " vectorAssembler, \n", " vectorIndexer, \n", " dt\n", " ])" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": true }, "outputs": [], "source": [ "model = pipeline.fit(df)" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "collapsed": true }, "outputs": [], "source": [ "predictions = model.transform(df)" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+---------+------------------+\n", "|SalePrice| Prediction|\n", "+---------+------------------+\n", "| 208500|219910.90697674418|\n", "| 181500|150758.37549407114|\n", "| 223500|219910.90697674418|\n", "| 140000|149397.38805970148|\n", "| 250000|300287.74358974356|\n", "+---------+------------------+\n", "only showing top 5 rows\n", "\n" ] } ], "source": [ "predictions.select(\"SalePrice\",\"Prediction\").show(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So it is not as easy with Spark, and the predictions are not even as close. Part of the gap is that Spark's `DecisionTreeRegressor` limits the tree depth by default (`maxDepth=5`), while the scikit-learn tree above was grown without a depth limit and checked against its own training data. Remember that this will scale, and as you will see, Spark needs more tuning to perform well. But more on that later." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Model Validation\n", "\n", "\n", "## What is Model Validation\n", "\n", "You've built a model. But how good is it?\n", "\n", "You'll need to answer this question for almost every model you ever build. 
In most (though not necessarily all) applications, the relevant measure of model quality is predictive accuracy. In other words, will the model's predictions be close to what actually happens?\n", "\n", "Some people try answering this question by making predictions with their training data. They compare those predictions to the actual target values in the training data. This approach has a critical shortcoming, which you will see in a moment (and which you'll subsequently see how to solve).\n", "\n", "Even with this simple approach, you'll need to summarize the model quality into a form that someone can understand. If you have predicted and actual home values for 10000 houses, you will inevitably end up with a mix of good and bad predictions. Looking through such a long list would be pointless.\n", "\n", "There are many metrics for summarizing model quality, but we'll start with one called Mean Absolute Error (also called MAE). Let's break down this metric starting with the last word, error.\n", "\n", "The prediction error for each house is: \n", "error = actual − predicted\n", "\n", "So, if a house cost \\$150,000 and you predicted it would cost \\$100,000 the error is \\$50,000.\n", "\n", "With the MAE metric, we take the absolute value of each error. This converts each error to a positive number. We then take the average of those absolute errors. This is our measure of model quality. In plain English, it can be said as:\n", "\n", "On average, our predictions are off by about X\n", "\n", "In the [notebook for this module](https://www.kaggle.com/dansbecker/model-validation) they first load the Melbourne data and create X and y. We'll solve the problem first with sklearn and then with Spark :)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Problem with \"In-Sample\" Scores\n", "\n", "The measure they computed in the original notebook can be called an \"in-sample\" score. 
They used a single set of houses (called a data sample) for both building the model and for calculating its MAE score. This is **bad**.\n", "\n", "Imagine that, in the large real estate market, door color is unrelated to home price. However, in the sample of data you used to build the model, it may be that all homes with green doors were very expensive. The model's job is to find patterns that predict home prices, so it will see this pattern, and it will always predict high prices for homes with green doors.\n", "\n", "Since this pattern was originally derived from the training data, the model will appear accurate in the training data.\n", "\n", "But this pattern likely won't hold when the model sees new data, and the model would be very inaccurate (and cost us lots of money) when we applied it to our real estate business.\n", "\n", "Even a model capturing only happenstance relationships in the data, relationships that will not be repeated in new data, can appear to be very accurate on in-sample accuracy measurements." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A model's practical value comes from making predictions on new data, so we should measure performance on data that wasn't used to build the model. The most straightforward way to do this is to exclude some data from the model-building process, and then use that data to test the model's accuracy on data it hasn't seen before. This data is called **validation data**.\n", "\n", "The scikit-learn library has a function train_test_split to break up the data into two pieces. We'll see afterwards how to do this in Spark." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Your Turn\n", "\n", "1. Use the train_test_split command to split up your data.\n", "2. Fit the model with the training data\n", "3. Make predictions with the validation predictors\n", "4. Calculate the mean absolute error between your predictions and the actual target values for the validation data." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. Use the train_test_split command to split up your data." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd\n", "\n", "file_path = 'train.csv'\n", "df = pd.read_csv(file_path) \n", "\n", "y = df.SalePrice\n", "\n", "predictors = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF', \n", " 'FullBath', 'BedroomAbvGr', 'TotRmsAbvGrd']\n", "\n", "X = df[predictors]" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split\n", "\n", "# split data into training and validation data, for both predictors and target\n", "# The split is based on a random number generator. Supplying a numeric value to\n", "# the random_state argument guarantees we get the same split every time we\n", "# run this script.\n", "train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that, by default, `train_test_split` holds out 25% of the data as the test (validation) set and uses the remaining 75% for training." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. 
Fit the model with the training data" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,\n", " max_leaf_nodes=None, min_impurity_split=1e-07,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, presort=False, random_state=None,\n", " splitter='best')" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.tree import DecisionTreeRegressor\n", "\n", "# Define model\n", "model = DecisionTreeRegressor()\n", "# Fit model\n", "model.fit(train_X, train_y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3. Make predictions with the validation predictors" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# get predicted prices on validation data\n", "val_predictions = model.predict(val_X)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4. Calculate the mean absolute error between your predictions and the actual target values for the validation data." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "33606.2356164\n" ] } ], "source": [ "from sklearn.metrics import mean_absolute_error\n", "print(mean_absolute_error(val_y, val_predictions))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Model Validation in Spark" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The process in Spark is very similar. We first need to build the model as we saw before, but let's start with the data splitting." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In Spark, `randomSplit` randomly splits a DataFrame according to the provided weights. The weights are a list of doubles that set the relative size of each split. 
Weights will be normalized if they don’t sum up to 1.0. It's important to set a seed for reproducibility. " ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from pyspark.sql import SparkSession\n", "\n", "spark = SparkSession.builder.getOrCreate()\n", "sc = spark.sparkContext\n", "\n", "file_path = 'train.csv'\n", "df = spark.read.csv(file_path, header=\"true\", inferSchema=True) " ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": true }, "outputs": [], "source": [ "train, test = df.randomSplit(weights=[0.75,0.25],seed=27)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from pyspark.ml.feature import VectorAssembler, VectorIndexer\n", "\n", "# Choosing predictors\n", "features_cols = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF', \n", " 'FullBath', 'BedroomAbvGr', 'TotRmsAbvGrd']\n", "\n", "# This concatenates all feature columns into a single feature vector in a new column \"raw_features\".\n", "vectorAssembler = VectorAssembler(inputCols=features_cols, outputCol=\"raw_features\")\n", "# This identifies categorical features and indexes them.\n", "vectorIndexer = VectorIndexer(inputCol=\"raw_features\", outputCol=\"features\", maxCategories=4)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from pyspark.ml.regression import DecisionTreeRegressor\n", "# Takes the \"features\" column and learns to predict \"SalePrice\"\n", "dt = DecisionTreeRegressor(labelCol=\"SalePrice\")" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from pyspark.ml import Pipeline\n", "\n", "pipeline = Pipeline(stages=[\n", " vectorAssembler, \n", " vectorIndexer, \n", " dt\n", " ])" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Creating 
a model with the training data\n", "model = pipeline.fit(train)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Making predictions with the test data\n", "predictions = model.transform(test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's look at some of the predictions as a quick sanity check on the steps above:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+---------+----------+\n", "|SalePrice|prediction|\n", "+---------+----------+\n", "| 181500| 167027.0|\n", "| 250000| 290103.0|\n", "+---------+----------+\n", "only showing top 2 rows\n", "\n" ] } ], "source": [ "predictions.select(\"SalePrice\",\"prediction\").show(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now for the evaluation:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from pyspark.ml.evaluation import RegressionEvaluator\n", "\n", "# Select (prediction, true label) and compute test error with MAE\n", "evaluator = RegressionEvaluator(\n", " labelCol=\"SalePrice\", predictionCol=\"prediction\", metricName=\"mae\")" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Mean Absolute Error (MAE) on test data= 29025.42375836274\n" ] } ], "source": [ "mae = evaluator.evaluate(predictions)\n", "print(\"Mean Absolute Error (MAE) on test data= {}\".format(mae))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here Spark gives a lower MAE (about 29,025) than sklearn did (about 33,606). Keep in mind that the two runs use different random splits and different default tree settings (Spark's DecisionTreeRegressor limits maxDepth to 5 by default, while sklearn's tree grows without a depth limit), so the comparison is only indicative. Still, this is a good start for Spark :)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Underfitting, Overfitting and Model Optimization" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Experimenting With Different Models\n", "\n", "Now that you have a trustworthy way to measure model accuracy, you can experiment with alternative models and see which gives the best predictions. But what alternatives do you have for models?\n", "\n", "You can see in scikit-learn's documentation that the decision tree model has many options (more than you'll want or need for a long time). The most important options determine the tree's depth. Recall from page 2 that a tree's depth is a measure of how many splits it makes before coming to a prediction. This is a relatively shallow tree" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In practice, it's not uncommon for a tree to have 10 splits between the top level (all houses) and a leaf. As the tree gets deeper, the dataset gets sliced up into leaves with fewer houses. If a tree only had 1 split, it divides the data into 2 groups. If each group is split again, we would get 4 groups of houses. Splitting each of those again would create 8 groups. If we keep doubling the number of groups by adding more splits at each level, we'll have $2^{10}$ groups of houses by the time we get to the 10th level. That's 1024 leaves.\n", "\n", "When we divide the houses amongst many leaves, we also have fewer houses in each leaf. Leaves with very few houses will make predictions that are quite close to those homes' actual values, but they may make very unreliable predictions for new data (because each prediction is based on only a few houses).\n", "\n", "This is a phenomenon called **overfitting**, where a model matches the training data almost perfectly, but does poorly in validation and other new data. 
On the flip side, if we make our tree very shallow, it doesn't divide up the houses into very distinct groups.\n", "\n", "At an extreme, if a tree divides houses into only 2 or 4 groups, each group still has a wide variety of houses. Resulting predictions may be far off for most houses, even in the training data (and it will be bad in validation too for the same reason). When a model fails to capture important distinctions and patterns in the data, so it performs poorly even in training data, that is called **underfitting**.\n", "\n", "Since we care about accuracy on new data, which we estimate from our validation data, we want to find the sweet spot between underfitting and overfitting." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example\n", "\n", "There are a few alternatives for controlling the tree depth, and many allow for some routes through the tree to have greater depth than other routes. But the max_leaf_nodes argument provides a very sensible way to control overfitting vs underfitting. 
The more leaves we allow the model to make, the more we move from the underfitting area in the above graph to the overfitting area.\n", "\n", "We can use a utility function to help compare MAE scores from different values for max_leaf_nodes:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from sklearn.metrics import mean_absolute_error\n", "from sklearn.tree import DecisionTreeRegressor\n", "\n", "def get_mae(max_leaf_nodes, predictors_train, predictors_val, targ_train, targ_val):\n", " model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)\n", " model.fit(predictors_train, targ_train)\n", " preds_val = model.predict(predictors_val)\n", " mae = mean_absolute_error(targ_val, preds_val)\n", " return mae" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's load the data again and split it:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd\n", "\n", "file_path = 'train.csv'\n", "df = pd.read_csv(file_path) \n", "\n", "y = df.SalePrice\n", "\n", "predictors = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF', \n", " 'FullBath', 'BedroomAbvGr', 'TotRmsAbvGrd']\n", "\n", "X = df[predictors]\n", "\n", "from sklearn.model_selection import train_test_split\n", "\n", "# split data into training and validation data, for both predictors and target\n", "# The split is based on a random number generator. Supplying a numeric value to\n", "# the random_state argument guarantees we get the same split every time we\n", "# run this script.\n", "train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use a for-loop to compare the accuracy of models built with different values for max_leaf_nodes." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Max leaf nodes: 5 \t\t Mean Absolute Error: 35190\n", "Max leaf nodes: 50 \t\t Mean Absolute Error: 27825\n", "Max leaf nodes: 500 \t\t Mean Absolute Error: 32662\n", "Max leaf nodes: 5000 \t\t Mean Absolute Error: 33382\n" ] } ], "source": [ "# compare MAE with differing values of max_leaf_nodes\n", "for max_leaf_nodes in [5, 50, 500, 5000]:\n", " my_mae = get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y)\n", " print(\"Max leaf nodes: %d \\t\\t Mean Absolute Error: %d\" %(max_leaf_nodes, my_mae))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that, of the values tried, 50 leaves gives the lowest MAE, so it is the best choice here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion\n", "\n", "Here's the takeaway: Models can suffer from either:\n", "\n", "- Overfitting: capturing spurious patterns that won't recur in the future, leading to less accurate predictions, or\n", "- Underfitting: failing to capture relevant patterns, again leading to less accurate predictions.\n", "\n", "We use validation data, which isn't used in model training, to measure a candidate model's accuracy. This lets us try many candidate models and keep the best one.\n", "\n", "But we're still using Decision Tree models, which are not very sophisticated by modern machine learning standards." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Underfitting, Overfitting and Model Optimization in Spark" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will tune the `maxDepth` parameter instead, because Spark's DecisionTreeRegressor has no `max_leaf_nodes` parameter. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`maxDepth` is the maximum depth of the tree (>= 0). E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from pyspark.sql import SparkSession\n", "\n", "spark = SparkSession.builder.getOrCreate()\n", "sc = spark.sparkContext\n", "\n", "file_path = 'train.csv'\n", "df = spark.read.csv(file_path, header=\"true\", inferSchema=True) " ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "train, test = df.randomSplit(weights=[0.75,0.25],seed=27)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from pyspark.ml.feature import VectorAssembler, VectorIndexer\n", "from pyspark.ml.regression import DecisionTreeRegressor\n", "from pyspark.ml import Pipeline\n", "from pyspark.ml.evaluation import RegressionEvaluator\n", "\n", "# Same predictors as in the previous Spark examples\n", "features_cols = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF', \n", " 'FullBath', 'BedroomAbvGr', 'TotRmsAbvGrd']\n", "\n", "\n", "def get_mae_spark(max_depth, train_data, test_data):\n", " # This concatenates all feature columns into a single feature vector in a new column \"raw_features\".\n", " vectorAssembler = VectorAssembler(inputCols=features_cols, outputCol=\"raw_features\")\n", " # This identifies categorical features and indexes them.\n", " vectorIndexer = VectorIndexer(inputCol=\"raw_features\", outputCol=\"features\", maxCategories=4)\n", " # Takes the \"features\" column and learns to predict \"SalePrice\"\n", " dt = DecisionTreeRegressor(labelCol=\"SalePrice\", maxDepth=max_depth)\n", " \n", " pipeline = Pipeline(stages=[\n", " vectorAssembler, \n", " vectorIndexer, \n", " dt\n", " ])\n", " \n", " # Creating a model with the training data\n", " model = pipeline.fit(train_data)\n", " # Making predictions with the test data\n", " predictions = model.transform(test_data)\n", " # Select (prediction, true label) and compute test error with MAE\n", " evaluator = RegressionEvaluator(labelCol=\"SalePrice\", predictionCol=\"prediction\", metricName=\"mae\")\n", " mae = evaluator.evaluate(predictions)\n", " \n", " return mae" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { 
"name": "stdout", "output_type": "stream", "text": [ "Max depth: 5 \t\t Mean Absolute Error: 29025.42375836274\n", "Max depth: 10 \t\t Mean Absolute Error: 28375.668652125525\n", "Max depth: 15 \t\t Mean Absolute Error: 30544.50034435261\n", "Max depth: 20 \t\t Mean Absolute Error: 30883.51698806244\n" ] } ], "source": [ "# compare MAE with differing values of max_depth\n", "for max_depth in [5, 10, 15, 20]:\n", " my_mae = get_mae_spark(max_depth, train, test)\n", " print(\"Max depth: {} \\t\\t Mean Absolute Error: {}\".format(max_depth, my_mae))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Of the values tried, the best maxDepth here is 10 :)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" } }, "nbformat": 4, "nbformat_minor": 2 }