{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Step-by-step TDD in a data science task\n", "\n", "If you are interested in a longer introduction, [click here]().\n", "\n", "I took an example dataset from Kaggle, the [House Prices dataset](https://www.kaggle.com/c/house-prices-advanced-regression-techniques). It is a sufficiently easy and fun dataset, just right to pass the imaginary test of 'tutorial on TDD for analysis'.\n", "\n", "Since it is a CSV file, I started by reading the data with Pandas. The first thing I wanted to check was whether there are any NaN/NULL values." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Populating the interactive namespace from numpy and matplotlib\n" ] }, { "data": { "text/html": [ "
| Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | LotConfig | ... | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 60 | RL | 65.0 | 8450 | Pave | NaN | Reg | Lvl | AllPub | Inside | ... | 0 | NaN | NaN | NaN | 0 | 2 | 2008 | WD | Normal | 208500 |
| 2 | 20 | RL | 80.0 | 9600 | Pave | NaN | Reg | Lvl | AllPub | FR2 | ... | 0 | NaN | NaN | NaN | 0 | 5 | 2007 | WD | Normal | 181500 |
| 3 | 60 | RL | 68.0 | 11250 | Pave | NaN | IR1 | Lvl | AllPub | Inside | ... | 0 | NaN | NaN | NaN | 0 | 9 | 2008 | WD | Normal | 223500 |
| 4 | 70 | RL | 60.0 | 9550 | Pave | NaN | IR1 | Lvl | AllPub | Corner | ... | 0 | NaN | NaN | NaN | 0 | 2 | 2006 | WD | Abnorml | 140000 |
| 5 | 60 | RL | 84.0 | 14260 | Pave | NaN | IR1 | Lvl | AllPub | FR2 | ... | 0 | NaN | NaN | NaN | 0 | 12 | 2008 | WD | Normal | 250000 |

5 rows × 80 columns

| | SumOfNulls | DataTypes |
|---|---|---|
| PoolQC | 1453 | object |
| MiscFeature | 1406 | object |
| Alley | 1369 | object |
| Fence | 1179 | object |
| FireplaceQu | 690 | object |
| LotFrontage | 259 | float64 |
| GarageYrBlt | 81 | float64 |
| GarageCond | 81 | object |
| GarageType | 81 | object |
| GarageFinish | 81 | object |
| GarageQual | 81 | object |
| BsmtExposure | 38 | object |
| BsmtFinType2 | 38 | object |
| BsmtCond | 37 | object |
| BsmtQual | 37 | object |
| BsmtFinType1 | 37 | object |
| MasVnrArea | 8 | float64 |
| MasVnrType | 8 | object |
| Electrical | 1 | object |
| MSSubClass | 0 | int64 |
| Fireplaces | 0 | int64 |
| Functional | 0 | object |
| KitchenQual | 0 | object |
| KitchenAbvGr | 0 | int64 |
| BedroomAbvGr | 0 | int64 |
| HalfBath | 0 | int64 |
| FullBath | 0 | int64 |
| BsmtHalfBath | 0 | int64 |
| TotRmsAbvGrd | 0 | int64 |
| GarageCars | 0 | int64 |
| ... | ... | ... |
| HouseStyle | 0 | object |
| BldgType | 0 | object |
| Condition2 | 0 | object |
| Condition1 | 0 | object |
| LandSlope | 0 | object |
| 2ndFlrSF | 0 | int64 |
| LotConfig | 0 | object |
| Utilities | 0 | object |
| LandContour | 0 | object |
| LotShape | 0 | object |
| Street | 0 | object |
| LotArea | 0 | int64 |
| YearBuilt | 0 | int64 |
| YearRemodAdd | 0 | int64 |
| RoofStyle | 0 | object |
| RoofMatl | 0 | object |
| Exterior1st | 0 | object |
| Exterior2nd | 0 | object |
| ExterQual | 0 | object |
| ExterCond | 0 | object |
| Foundation | 0 | object |
| BsmtFinSF1 | 0 | int64 |
| BsmtFinSF2 | 0 | int64 |
| BsmtUnfSF | 0 | int64 |
| TotalBsmtSF | 0 | int64 |
| Heating | 0 | object |
| HeatingQC | 0 | object |
| MSZoning | 0 | object |
| 1stFlrSF | 0 | int64 |
| SalePrice | 0 | int64 |

80 rows × 2 columns
\n", "