{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "from hvplot.plotting import scatter_matrix" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`scatter_matrix` shows all the pairwise relationships between the columns of your data. Each off-diagonal entry plots one column against another, while each diagonal entry shows the distribution of the data within a single column.\n", "\n", "This function is closely modelled on [pandas.plotting.scatter_matrix](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.plotting.scatter_matrix.html)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = pd.DataFrame(np.random.randn(1000, 4), columns=['A','B','C','D'])\n", "\n", "scatter_matrix(df, alpha=0.2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_sub = df[['A', 'B']].copy()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `chart` parameter lets you change the type of the *off-diagonal* plots." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "scatter_matrix(df_sub, chart='bivariate') + scatter_matrix(df_sub, chart='hexbin')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `diagonal` parameter lets you change the type of the *diagonal* plots." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "scatter_matrix(df_sub, diagonal='kde')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Setting `tools` to include a selection tool like `box_select` and an inspection tool like `hover` permits further analysis." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "scatter_matrix(df_sub, tools=['box_select', 'hover'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_sub['CAT'] = np.random.choice(['X', 'Y', 'Z'], len(df_sub))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `c` parameter colors the data by a given column, here `'CAT'`. Note also that the `diagonal_kwds` parameter (equivalent to `hist_kwds` in this case, or `density_kwds` for *kde* plots) lets you customize the diagonal plots." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "scatter_matrix(df_sub, c='CAT', diagonal_kwds=dict(alpha=0.3))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = pd.DataFrame(np.random.randn(100_000, 4), columns=['A','B','C','D'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Scatter matrix plots can require rendering a large number of points, which can slow down the browser or even crash it. In that case, consider setting the `rasterize` (or `datashade`) parameter to `True`. This uses [Datashader](https://datashader.org/) to render the off-diagonal plots on the backend and then send more efficient image-based representations to the browser.\n", "\n", "The following scatter matrix plot has 1,200,000 (12 x 100,000) points that are rendered efficiently by `datashader`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "scatter_matrix(df, rasterize=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When `rasterize` (or `datashade`) is enabled, you can make individual points more visible by setting `dynspread=True` or `spread=True`. 
Head over to the [Working with large data using datashader](https://holoviews.org/user_guide/Large_Data.html) guide of [HoloViews](https://holoviews.org/index.html) to learn more about these operations and what parameters they accept (which can be passed as `kwds` to `scatter_matrix`)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "scatter_matrix(df, rasterize=True, dynspread=True)" ] } ], "metadata": { "language_info": { "name": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 5 }