## Correlation Plot

The `CorrPlot` builder takes a dataframe (Kotlin `Map<*, *>`) as the input and builds a correlation plot.

If the input has an NxN shape and contains only numbers in the range [-1..1], it is plotted as is. Otherwise, `CorrPlot` computes correlation coefficients using Pearson's method.
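
To make the computed quantity concrete, here is a minimal sketch of Pearson's correlation coefficient in plain Kotlin. The `pearson` helper is hypothetical and written only for illustration; it is not part of Lets-Plot and is not necessarily how `CorrPlot` computes its values internally.

```kotlin
import kotlin.math.sqrt

// Hypothetical helper (not a Lets-Plot function): Pearson's r for two samples.
fun pearson(xs: List<Double>, ys: List<Double>): Double {
    require(xs.size == ys.size && xs.size > 1) { "samples must have the same size (> 1)" }
    val meanX = xs.average()
    val meanY = ys.average()
    var cov = 0.0   // sum of products of deviations
    var varX = 0.0  // sum of squared deviations of xs
    var varY = 0.0  // sum of squared deviations of ys
    for (i in xs.indices) {
        val dx = xs[i] - meanX
        val dy = ys[i] - meanY
        cov += dx * dy
        varX += dx * dx
        varY += dy * dy
    }
    // The result is NaN when either sample is constant (zero variance).
    return cov / sqrt(varX * varY)
}
```

For example, `pearson(listOf(1.0, 2.0, 3.0), listOf(2.0, 4.0, 6.0))` evaluates to 1 (up to rounding), and reversing the second list gives -1, which is why a correlation matrix can contain negative values.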

`CorrPlot` lets you combine 'tile', 'point' and 'label' layers in a matrix of the "full", "lower" or "upper" type.

A call to the terminal `build()` method creates the resulting 'plot' object.
This 'plot' object can be further refined using the regular Lets-Plot (ggplot) API, for example `+ ggsize()`.


The Ames Housing dataset for this demo was downloaded from [House Prices - Advanced Regression Techniques](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data?select=train.csv) (train.csv), (c) Kaggle.

In [1]:
%useLatestDescriptors
%use lets-plot
%use dataframe

LetsPlot.getInfo()

Lets-Plot Kotlin API v.4.4.2. Frontend: Notebook with dynamically loaded JS. Lets-Plot JS v.4.0.0.

In [2]:
// Cars MPG dataset
var mpg_df = DataFrame.readCSV("https://raw.githubusercontent.com/JetBrains/lets-plot-kotlin/master/docs/examples/data/mpg.csv")
mpg_df.head(3)


In [3]:
mpg_df = mpg_df.remove("")
mpg_df.head(3)

In [4]:
val mpg_dat = mpg_df.toMap()

### Combining 'tile', 'point' and 'label' layers.

When combining layers, `CorrPlot` chooses an acceptable plot configuration by default.

In [5]:
gggrid(
    listOf(
        CorrPlot(mpg_dat, "Tiles").tiles().build(),
        CorrPlot(mpg_dat, "Points").points().build(), 
        CorrPlot(mpg_dat, "Tiles and labels").tiles().labels().build(),
        CorrPlot(mpg_dat, "Tiles, points and labels").points().labels().tiles().build()
    ), 2, 400, 320)

The default plot configuration adapts to the changing options: compare the "Tiles and labels" plots above and below.

You can also override the default plot configuration using the `type` parameter: compare the "Tiles, points and labels" plots above and below.

In [6]:
gggrid(
    listOf(
        CorrPlot(mpg_dat, "Tiles and labels").tiles().labels(color="white").build(),
        CorrPlot(mpg_dat, "Tiles, points and labels")
         .tiles(type="upper")
         .points(type="lower")
         .labels(type="full").build()
    ), 2, 400, 320)
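
As a rough illustration of what the `type` and `diag` options select, the sketch below marks which cells of an N x N matrix would be drawn for each type. The helpers `isCellShown` and `maskToString` are hypothetical, and the sketch assumes the conventional matrix orientation (row 0 at the top); it does not reproduce how `CorrPlot` lays out its axes or picks its defaults.

```kotlin
// Hypothetical helper (not a Lets-Plot function): is the cell at (row, col) drawn?
// Assumes row 0 is the top row and column 0 is the leftmost column.
fun isCellShown(row: Int, col: Int, type: String, diag: Boolean = true): Boolean {
    require(type in setOf("full", "lower", "upper")) { "Unknown matrix type: $type" }
    return when {
        row == col -> diag            // main diagonal is controlled by `diag`
        type == "lower" -> row > col  // cells below the main diagonal
        type == "upper" -> row < col  // cells above the main diagonal
        else -> true                  // "full": every off-diagonal cell
    }
}

// Renders an n x n mask as text: '#' for a shown cell, '.' for a hidden one.
fun maskToString(n: Int, type: String, diag: Boolean = true): String =
    (0 until n).joinToString("\n") { row ->
        (0 until n).joinToString("") { col ->
            if (isCellShown(row, col, type, diag)) "#" else "."
        }
    }
```

Calling `println(maskToString(4, "lower"))` and `println(maskToString(4, "upper", diag = false))` prints the two triangles side by side for comparison.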

### Customizing colors.

Instead of the default blue-grey-red gradient you can define your own lower-middle-upper colors, or 
choose one of the available 'Brewer' diverging palettes.

Let's create a gradient resembling one of Seaborn gradients.

In [7]:
val corrPlot = CorrPlot(mpg_dat).points().labels().tiles()

// Configure gradient resembling one of Seaborn gradients.
val withGradientColors = (corrPlot
            .paletteGradient(low="#417555", mid="#EDEDED", high="#963CA7")
            .build()) + ggtitle("Custom gradient")

// Configure Brewer 'Spectral' palette.
val withBrewerColors = (corrPlot
            .paletteSpectral()
            .build()) + ggtitle("Brewer 'Spectral'")

// Show both plots
gggrid(listOf(withGradientColors, withBrewerColors), 2, 400, 320)
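
To show the idea behind a low-mid-high gradient, here is a minimal sketch that maps a correlation value in [-1, 1] to a color by linear interpolation in RGB, with `mid` at 0. The helpers `hexToRgb`, `lerp` and `divergingColor` are hypothetical, and Lets-Plot may well interpolate in a different color space, so treat the intermediate colors as approximate.

```kotlin
import kotlin.math.roundToInt

// Hypothetical helper: parses "#RRGGBB" into red, green and blue components.
fun hexToRgb(hex: String): Triple<Int, Int, Int> {
    val h = hex.removePrefix("#")
    require(h.length == 6) { "expected a color like #RRGGBB" }
    return Triple(
        h.substring(0, 2).toInt(16),
        h.substring(2, 4).toInt(16),
        h.substring(4, 6).toInt(16)
    )
}

// Linear interpolation between two channel values, t in [0, 1].
fun lerp(a: Int, b: Int, t: Double): Int = (a + (b - a) * t).roundToInt()

// Hypothetical helper: r = -1 maps to `low`, r = 0 to `mid`, r = 1 to `high`.
fun divergingColor(r: Double, low: String, mid: String, high: String): String {
    require(r in -1.0..1.0) { "correlation must be in [-1, 1]" }
    val from = hexToRgb(mid)
    val to = if (r < 0) hexToRgb(low) else hexToRgb(high)
    val t = if (r < 0) -r else r  // distance from the midpoint
    return "#%02X%02X%02X".format(
        lerp(from.first, to.first, t),
        lerp(from.second, to.second, t),
        lerp(from.third, to.third, t)
    )
}
```

With the colors from the cell above, `divergingColor(0.0, "#417555", "#EDEDED", "#963CA7")` returns the middle color unchanged, and the endpoints -1 and 1 return the low and high colors.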


### Correlation plot with a large number of variables in the dataset.

The [Kaggle House Prices](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data?select=train.csv) dataset contains 81 variables.

In [8]:
val housing_df = DataFrame.readCSV("../data/Ames_house_prices_train.csv")
housing_df.head(3)


A correlation plot that shows all the correlations in this dataset is too large to be useful.

In [9]:
CorrPlot(housing_df.toMap())
    .tiles(type="lower")
    .paletteBrBG()
    .build()


#### The `threshold` parameter.

The `threshold` parameter lets us specify a minimum absolute value of the correlation coefficient, below which variables are not shown.

In [10]:
CorrPlot(housing_df.toMap(), "Threshold: 0.5", threshold = 0.5, adjustSize = 0.7)
    .tiles(type="full", diag=false)
    .paletteBrBG()
    .build()
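
One plausible way to picture the filtering is sketched below in plain Kotlin: a variable is kept if at least one of its correlations with another variable reaches the threshold in absolute value. The helper `variablesAboveThreshold` is hypothetical, and this reading of `threshold` is an assumption for illustration, not a description of Lets-Plot's actual code.

```kotlin
import kotlin.math.abs

// Hypothetical helper (assumed semantics, not Lets-Plot code): keeps variables
// that have at least one off-diagonal correlation with |r| >= threshold.
fun variablesAboveThreshold(
    names: List<String>,
    corr: List<List<Double>>,
    threshold: Double
): List<String> {
    require(corr.size == names.size && corr.all { it.size == names.size }) {
        "corr must be an NxN matrix matching the variable names"
    }
    return names.indices
        .filter { i ->
            // The diagonal (i == j) is skipped: every variable correlates 1.0 with itself.
            names.indices.any { j -> i != j && abs(corr[i][j]) >= threshold }
        }
        .map { names[it] }
}
```

Under this reading, a variable that correlates only weakly with everything else drops out of the plot entirely, which is what shrinks the matrix as the threshold grows.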


Let's further increase our threshold in order to see only highly correlated variables.


In [11]:
CorrPlot(housing_df.toMap(), "Threshold: 0.8", threshold = 0.8)
    .tiles(diag=false)
    .labels(color="white", diag=false)
    .paletteBrBG()
    .build()