{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Correlation Plot\n", "\n", "The `CorrPlot` builder takes a dataframe (Kotlin `Map<*, *>`) as the input and builds a correlation plot.\n", "\n", "If the input has NxN shape and contains only numbers in range [0..1], then it is plotted as is. Otherwise `CorrPlot` will compute correlation coefficients using the Pearson's method. \n", "\n", "`CorrPlot` allows to combine 'tile', 'point' or 'label' layers in a matrix of \"full\", \"lower\" or \"upper\" type.\n", "\n", "A call to the terminal `build()` method will create a resulting 'plot' object. \n", "This 'plot' object can be further refined using regular Lets-Plot (ggplot) API, like `+ ggsize()` and so on.\n", "\n", "\n", "The Ames Housing dataset for this demo was downloaded from [House Prices - Advanced Regression Techniques](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data?select=train.csv) (train.csv), (c) Kaggle." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "Lets-Plot Kotlin API v.4.4.2. Frontend: Notebook with dynamically loaded JS. Lets-Plot JS v.4.0.0." ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%useLatestDescriptors\n", "%use lets-plot\n", "%use dataframe\n", "\n", "LetsPlot.getInfo()" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "application/kotlindataframe+json": "{\"nrow\":3,\"ncol\":12,\"columns\":[\"untitled\",\"manufacturer\",\"model\",\"displ\",\"year\",\"cyl\",\"trans\",\"drv\",\"cty\",\"hwy\",\"fl\",\"class\"],\"kotlin_dataframe\":[{\"untitled\":1,\"manufacturer\":\"audi\",\"model\":\"a4\",\"displ\":1.8,\"year\":1999,\"cyl\":4,\"trans\":\"auto(l5)\",\"drv\":\"f\",\"cty\":18,\"hwy\":29,\"fl\":\"p\",\"class\":\"compact\"},{\"untitled\":2,\"manufacturer\":\"audi\",\"model\":\"a4\",\"displ\":1.8,\"year\":1999,\"cyl\":4,\"trans\":\"manual(m5)\",\"drv\":\"f\",\"cty\":21,\"hwy\":29,\"fl\":\"p\",\"class\":\"compact\"},{\"untitled\":3,\"manufacturer\":\"audi\",\"model\":\"a4\",\"displ\":2.0,\"year\":2008,\"cyl\":4,\"trans\":\"manual(m6)\",\"drv\":\"f\",\"cty\":20,\"hwy\":31,\"fl\":\"p\",\"class\":\"compact\"}]}", "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "\n", "

DataFrame: rowsCount = 3, columnsCount = 12

\n", " \n", " \n", " " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "// Cars MPG dataset\n", "var mpg_df = DataFrame.readCSV(\"https://raw.githubusercontent.com/JetBrains/lets-plot-kotlin/master/docs/examples/data/mpg.csv\")\n", "mpg_df.head(3)\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "application/kotlindataframe+json": "{\"nrow\":3,\"ncol\":12,\"columns\":[\"untitled\",\"manufacturer\",\"model\",\"displ\",\"year\",\"cyl\",\"trans\",\"drv\",\"cty\",\"hwy\",\"fl\",\"class\"],\"kotlin_dataframe\":[{\"untitled\":1,\"manufacturer\":\"audi\",\"model\":\"a4\",\"displ\":1.8,\"year\":1999,\"cyl\":4,\"trans\":\"auto(l5)\",\"drv\":\"f\",\"cty\":18,\"hwy\":29,\"fl\":\"p\",\"class\":\"compact\"},{\"untitled\":2,\"manufacturer\":\"audi\",\"model\":\"a4\",\"displ\":1.8,\"year\":1999,\"cyl\":4,\"trans\":\"manual(m5)\",\"drv\":\"f\",\"cty\":21,\"hwy\":29,\"fl\":\"p\",\"class\":\"compact\"},{\"untitled\":3,\"manufacturer\":\"audi\",\"model\":\"a4\",\"displ\":2.0,\"year\":2008,\"cyl\":4,\"trans\":\"manual(m6)\",\"drv\":\"f\",\"cty\":20,\"hwy\":31,\"fl\":\"p\",\"class\":\"compact\"}]}", "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "\n", "

DataFrame: rowsCount = 3, columnsCount = 12

\n", " \n", " \n", " " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mpg_df = mpg_df.remove(\"\")\n", "mpg_df.head(3)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "val mpg_dat = mpg_df.toMap()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Combining 'tile', 'point' and 'label' layers.\n", "\n", "When combining layers, `CorrPlot` chooses an acceptable plot configuration by default." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gggrid(\n", " listOf(\n", " CorrPlot(mpg_dat, \"Tiles\").tiles().build(),\n", " CorrPlot(mpg_dat, \"Points\").points().build(), \n", " CorrPlot(mpg_dat, \"Tiles and labels\").tiles().labels().build(),\n", " CorrPlot(mpg_dat, \"Tiles, points and labels\").points().labels().tiles().build()\n", " ), 2, 400, 320)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The default plot configuration adapts to the changing options - compare \"Tiles and labels\" plot above and below.\n", "\n", "You can also override the default plot configuration using the parameter `type` - compare \"Tiles, points and labels\" plot above and below." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gggrid(\n", " listOf(\n", " CorrPlot(mpg_dat, \"Tiles and labels\").tiles().labels(color=\"white\").build(),\n", " CorrPlot(mpg_dat, \"Tiles, points and labels\")\n", " .tiles(type=\"upper\")\n", " .points(type=\"lower\")\n", " .labels(type=\"full\").build()\n", " ), 2, 400, 320)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Customizing colors.\n", "\n", "Instead of the default blue-grey-red gradient you can define your own lower-middle-upper colors, or \n", "choose one of the available 'Brewer' diverging palettes.\n", "\n", "Let's create a gradient resembling one of Seaborn gradients." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val corrPlot = CorrPlot(mpg_dat).points().labels().tiles()\n", "\n", "// Configure gradient resembling one of Seaborn gradients.\n", "val withGradientColors = (corrPlot\n", " .paletteGradient(low=\"#417555\", mid=\"#EDEDED\", high=\"#963CA7\")\n", " .build()) + ggtitle(\"Custom gradient\")\n", "\n", "// Configure Brewer 'BrBG' palette.\n", "val withBrewerColors = (corrPlot\n", " .paletteSpectral()\n", " .build()) + ggtitle(\"Brewer 'Spectral'\")\n", "\n", "// Show both plots\n", "gggrid(listOf(withGradientColors, withBrewerColors), 2, 400, 320)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Correlation plot with large number of variables in dataset.\n", "\n", "The [Kaggle House Prices](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data?select=train.csv) dataset contains 81 variables." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "application/kotlindataframe+json": "{\"nrow\":3,\"ncol\":81,\"columns\":[\"Id\",\"MSSubClass\",\"MSZoning\",\"LotFrontage\",\"LotArea\",\"Street\",\"Alley\",\"LotShape\",\"LandContour\",\"Utilities\",\"LotConfig\",\"LandSlope\",\"Neighborhood\",\"Condition1\",\"Condition2\",\"BldgType\",\"HouseStyle\",\"OverallQual\",\"OverallCond\",\"YearBuilt\",\"YearRemodAdd\",\"RoofStyle\",\"RoofMatl\",\"Exterior1st\",\"Exterior2nd\",\"MasVnrType\",\"MasVnrArea\",\"ExterQual\",\"ExterCond\",\"Foundation\",\"BsmtQual\",\"BsmtCond\",\"BsmtExposure\",\"BsmtFinType1\",\"BsmtFinSF1\",\"BsmtFinType2\",\"BsmtFinSF2\",\"BsmtUnfSF\",\"TotalBsmtSF\",\"Heating\",\"HeatingQC\",\"CentralAir\",\"Electrical\",\"1stFlrSF\",\"2ndFlrSF\",\"LowQualFinSF\",\"GrLivArea\",\"BsmtFullBath\",\"BsmtHalfBath\",\"FullBath\",\"HalfBath\",\"BedroomAbvGr\",\"KitchenAbvGr\",\"KitchenQual\",\"TotRmsAbvGrd\",\"Functional\",\"Fireplaces\",\"FireplaceQu\",\"GarageType\",\"GarageYrBlt\",\"GarageFinish\",\"GarageCars\",\"GarageArea\",\"GarageQual\",\"GarageCond\",\"PavedDrive\",\"WoodDeckSF\",\"OpenPorchSF\",\"EnclosedPorch\",\"3SsnPorch\",\"ScreenPorch\",\"PoolArea\",\"PoolQC\",\"Fence\",\"MiscFeature\",\"MiscVal\",\"MoSold\",\"YrSold\",\"SaleType\",\"SaleCondition\",\"SalePrice\"],\"kotlin_dataframe\":[{\"Id\":1,\"MSSubClass\":60,\"MSZoning\":\"RL\",\"LotFrontage\":65,\"LotArea\":8450,\"Street\":\"Pave\",\"Alley\":null,\"LotShape\":\"Reg\",\"LandContour\":\"Lvl\",\"Utilities\":\"AllPub\",\"LotConfig\":\"Inside\",\"LandSlope\":\"Gtl\",\"Neighborhood\":\"CollgCr\",\"Condition1\":\"Norm\",\"Condition2\":\"Norm\",\"BldgType\":\"1Fam\",\"HouseStyle\":\"2Story\",\"OverallQual\":7,\"OverallCond\":5,\"YearBuilt\":2003,\"YearRemodAdd\":2003,\"RoofStyle\":\"Gable\",\"RoofMatl\":\"CompShg\",\"Exterior1st\":\"VinylSd\",\"Exterior2nd\":\"VinylSd\",\"MasVnrType\":\"BrkFace\",\"MasVnrArea\":196,\"ExterQual\":\"Gd\",\"ExterCond\":\"TA\",\"Foundation\":\"PConc\",\"BsmtQual\":\"Gd\",\"BsmtCond\":\"TA\",\"BsmtExposure\":\"No\",\"BsmtFinType1\":\"GLQ\",\"BsmtFinSF1\":706,\"BsmtFinType2\":\"Unf\",\"BsmtFinSF2\":0,\"BsmtUnfSF\":150,\"TotalBsmtSF\":856,\"Heating\":\"GasA\",\"HeatingQC\":\"Ex\",\"CentralAir\":\"Y\",\"Electrical\":\"SBrkr\",\"1stFlrSF\":856,\"2ndFlrSF\":854,\"LowQualFinSF\":0,\"GrLivArea\":1710,\"BsmtFullBath\":1,\"BsmtHalfBath\":0,\"FullBath\":2,\"HalfBath\":1,\"BedroomAbvGr\":3,\"KitchenAbvGr\":1,\"KitchenQual\":\"Gd\",\"TotRmsAbvGrd\":8,\"Functional\":\"Typ\",\"Fireplaces\":0,\"FireplaceQu\":null,\"GarageType\":\"Attchd\",\"GarageYrBlt\":2003,\"GarageFinish\":\"RFn\",\"GarageCars\":2,\"GarageArea\":548,\"GarageQual\":\"TA\",\"GarageCond\":\"TA\",\"PavedDrive\":\"Y\",\"WoodDeckSF\":0,\"OpenPorchSF\":61,\"EnclosedPorch\":0,\"3SsnPorch\":0,\"ScreenPorch\":0,\"PoolArea\":0,\"PoolQC\":null,\"Fence\":null,\"MiscFeature\":null,\"MiscVal\":0,\"MoSold\":2,\"YrSold\":2008,\"SaleType\":\"WD\",\"SaleCondition\":\"Normal\",\"SalePrice\":208500},{\"Id\":2,\"MSSubClass\":20,\"MSZoning\":\"RL\",\"LotFrontage\":80,\"LotArea\":9600,\"Street\":\"Pave\",\"Alley\":null,\"LotShape\":\"Reg\",\"LandContour\":\"Lvl\",\"Utilities\":\"AllPub\",\"LotConfig\":\"FR2\",\"LandSlope\":\"Gtl\",\"Neighborhood\":\"Veenker\",\"Condition1\":\"Feedr\",\"Condition2\":\"Norm\",\"BldgType\":\"1Fam\",\"HouseStyle\":\"1Story\",\"OverallQual\":6,\"OverallCond\":8,\"YearBuilt\":1976,\"YearRemodAdd\":1976,\"RoofStyle\":\"Gable\",\"RoofMatl\":\"CompShg\",\"Exterior1st\":\"MetalSd\",\"Exterior2nd\":\"MetalSd\",\"MasVnrType\":\"None\",\"MasVnrArea\":0,\"ExterQual\":\"TA\",\"ExterCond\":\"TA\",\"Foundation\":\"CBlock\",\"BsmtQual\":\"Gd\",\"BsmtCond\":\"TA\",\"BsmtExposure\":\"Gd\",\"BsmtFinType1\":\"ALQ\",\"BsmtFinSF1\":978,\"BsmtFinType2\":\"Unf\",\"BsmtFinSF2\":0,\"BsmtUnfSF\":284,\"TotalBsmtSF\":1262,\"Heating\":\"GasA\",\"HeatingQC\":\"Ex\",\"CentralAir\":\"Y\",\"Electrical\":\"SBrkr\",\"1stFlrSF\":1262,\"2ndFlrSF\":0,\"LowQualFinSF\":0,\"GrLivArea\":1262,\"BsmtFullBath\":0,\"BsmtHalfBath\":1,\"FullBath\":2,\"HalfBath\":0,\"BedroomAbvGr\":3,\"KitchenAbvGr\":1,\"KitchenQual\":\"TA\",\"TotRmsAbvGrd\":6,\"Functional\":\"Typ\",\"Fireplaces\":1,\"FireplaceQu\":\"TA\",\"GarageType\":\"Attchd\",\"GarageYrBlt\":1976,\"GarageFinish\":\"RFn\",\"GarageCars\":2,\"GarageArea\":460,\"GarageQual\":\"TA\",\"GarageCond\":\"TA\",\"PavedDrive\":\"Y\",\"WoodDeckSF\":298,\"OpenPorchSF\":0,\"EnclosedPorch\":0,\"3SsnPorch\":0,\"ScreenPorch\":0,\"PoolArea\":0,\"PoolQC\":null,\"Fence\":null,\"MiscFeature\":null,\"MiscVal\":0,\"MoSold\":5,\"YrSold\":2007,\"SaleType\":\"WD\",\"SaleCondition\":\"Normal\",\"SalePrice\":181500},{\"Id\":3,\"MSSubClass\":60,\"MSZoning\":\"RL\",\"LotFrontage\":68,\"LotArea\":11250,\"Street\":\"Pave\",\"Alley\":null,\"LotShape\":\"IR1\",\"LandContour\":\"Lvl\",\"Utilities\":\"AllPub\",\"LotConfig\":\"Inside\",\"LandSlope\":\"Gtl\",\"Neighborhood\":\"CollgCr\",\"Condition1\":\"Norm\",\"Condition2\":\"Norm\",\"BldgType\":\"1Fam\",\"HouseStyle\":\"2Story\",\"OverallQual\":7,\"OverallCond\":5,\"YearBuilt\":2001,\"YearRemodAdd\":2002,\"RoofStyle\":\"Gable\",\"RoofMatl\":\"CompShg\",\"Exterior1st\":\"VinylSd\",\"Exterior2nd\":\"VinylSd\",\"MasVnrType\":\"BrkFace\",\"MasVnrArea\":162,\"ExterQual\":\"Gd\",\"ExterCond\":\"TA\",\"Foundation\":\"PConc\",\"BsmtQual\":\"Gd\",\"BsmtCond\":\"TA\",\"BsmtExposure\":\"Mn\",\"BsmtFinType1\":\"GLQ\",\"BsmtFinSF1\":486,\"BsmtFinType2\":\"Unf\",\"BsmtFinSF2\":0,\"BsmtUnfSF\":434,\"TotalBsmtSF\":920,\"Heating\":\"GasA\",\"HeatingQC\":\"Ex\",\"CentralAir\":\"Y\",\"Electrical\":\"SBrkr\",\"1stFlrSF\":920,\"2ndFlrSF\":866,\"LowQualFinSF\":0,\"GrLivArea\":1786,\"BsmtFullBath\":1,\"BsmtHalfBath\":0,\"FullBath\":2,\"HalfBath\":1,\"BedroomAbvGr\":3,\"KitchenAbvGr\":1,\"KitchenQual\":\"Gd\",\"TotRmsAbvGrd\":6,\"Functional\":\"Typ\",\"Fireplaces\":1,\"FireplaceQu\":\"TA\",\"GarageType\":\"Attchd\",\"GarageYrBlt\":2001,\"GarageFinish\":\"RFn\",\"GarageCars\":2,\"GarageArea\":608,\"GarageQual\":\"TA\",\"GarageCond\":\"TA\",\"PavedDrive\":\"Y\",\"WoodDeckSF\":0,\"OpenPorchSF\":42,\"EnclosedPorch\":0,\"3SsnPorch\":0,\"ScreenPorch\":0,\"PoolArea\":0,\"PoolQC\":null,\"Fence\":null,\"MiscFeature\":null,\"MiscVal\":0,\"MoSold\":9,\"YrSold\":2008,\"SaleType\":\"WD\",\"SaleCondition\":\"Normal\",\"SalePrice\":223500}]}", "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "\n", "

DataFrame: rowsCount = 3, columnsCount = 81

\n", " \n", " \n", " " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val housing_df = DataFrame.readCSV(\"../data/Ames_house_prices_train.csv\")\n", "housing_df.head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Correlation plot that shows all the correlations in this dataset is too large and barely useful. " ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "CorrPlot(housing_df.toMap())\n", " .tiles(type=\"lower\")\n", " .paletteBrBG()\n", " .build()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "#### The `threshold` parameter.\n", "\n", "The `threshold` parameter let us specify a level of significance, below which variables are not shown." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "CorrPlot(housing_df.toMap(), \"Threshold: 0.5\", threshold = 0.5, adjustSize = 0.7)\n", " .tiles(type=\"full\", diag=false)\n", " .paletteBrBG()\n", " .build()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Let's further increase our threshold in order to see only highly correlated variables.\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "CorrPlot(housing_df.toMap(), \"Threshold: 0.8\", threshold = 0.8)\n", " .tiles(diag=false)\n", " .labels(color=\"white\", diag=false)\n", " .paletteBrBG()\n", " .build()" ] } ], "metadata": { "kernelspec": { "display_name": "Kotlin", "language": "kotlin", "name": "kotlin" }, "language_info": { "codemirror_mode": "text/x-kotlin", "file_extension": ".kt", "mimetype": "text/x-kotlin", "name": "kotlin", "nbconvert_exporter": "", "pygments_lexer": "kotlin", "version": "1.8.20" } }, "nbformat": 4, "nbformat_minor": 4 }