{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# MLflow + whylogs Integration\n", "\n", "In this notebook, we will explore the [MLflow](https://mlflow.org/) integration in `whylogs`.\n", "\n", "This example uses the data from [MLflow's tutorial](https://mlflow.org/docs/latest/tutorials-and-examples/tutorial.html) for demonstration purposes.\n", "\n", "This tutorial showcases how you can use the whylogs integration to:\n", "* Capture data quality metrics while training a linear regression model in `mlflow`\n", "* Extract whylogs data back into an in-memory format from the MLflow backend\n", "* Visualize this data\n", "\n", "# Getting Started\n", "To run this tutorial:\n", "* Install [conda](https://conda.io/projects/conda/en/latest/user-guide/install/index.html)\n", "* Create a new environment with conda via `conda create --name whylogs-mlflow python=3.8`\n", " * You'll need to activate the environment with `conda activate whylogs-mlflow`\n", " * You'll need to install pip into the Conda environment `conda install pip`\n", " * To make the environment work with Jupyter notebooks, run `pip install ipykernel` to install the kernel module\n", " * Install the environment as a Jupyter notebook kernel via `python -m ipykernel install --user --name=whylogs-mlflow`\n", "* Install MLflow with scikit-learn via `pip install mlflow[extras]`\n", "* Install whylogs with matplotlib via `pip install whylogs[viz]`\n", "* You can also install the necessary libraries separately:\n", " * MLflow: `pip install mlflow`\n", " * whylogs: `pip install whylogs`\n", " * scikit-learn: `pip install scikit-learn`\n", " * matplotlib: `pip install matplotlib`\n", "* In your notebook, ensure you select `whylogs-mlflow` as your kernel\n", "\n", "# Setup\n", "First, we want to filter out noisy warnings" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import warnings\n", "warnings.filterwarnings(\"ignore\")\n", 
"warnings.simplefilter('ignore')\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import random\n", "import time\n", "\n", "import pandas as pd\n", "import mlflow\n", "import whylogs\n", "\n", "from sklearn.metrics import mean_absolute_error\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.linear_model import ElasticNet" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Enable whylogs Integration\n", "\n", "Enable the whylogs integration so that whylogs statistical profiles can be stored with every MLflow run. `whylogs.enable_mlflow()` returns `True` if whylogs is able to patch MLflow." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": "True" }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from packaging.version import Version  # assumed available; typically installed alongside mlflow\n", "\n", "# Parse versions before comparing: raw string comparison is lexicographic, so 0.1.9 would wrongly pass\n", "assert Version(whylogs.__version__) >= Version(\"0.1.13\")  # we need 0.1.13 or later for MLflow integration\n", "whylogs.enable_mlflow()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Dataset Preparation\n", "\n", "Download and prepare the UCI wine quality dataset. We sample the test dataset further to represent batches of data produced every second." ] }, { "cell_type": "code", "execution_count": 4, "outputs": [ { "data": { "text/plain": " fixed acidity volatile acidity citric acid residual sugar chlorides \\\n0 7.4 0.700 0.00 1.9 0.076 \n1 7.8 0.880 0.00 2.6 0.098 \n2 7.8 0.760 0.04 2.3 0.092 \n3 11.2 0.280 0.56 1.9 0.075 \n4 7.4 0.700 0.00 1.9 0.076 \n... ... ... ... ... ... \n1594 6.2 0.600 0.08 2.0 0.090 \n1595 5.9 0.550 0.10 2.2 0.062 \n1596 6.3 0.510 0.13 2.3 0.076 \n1597 5.9 0.645 0.12 2.0 0.075 \n1598 6.0 0.310 0.47 3.6 0.067 \n\n free sulfur dioxide total sulfur dioxide density pH sulphates \\\n0 11.0 34.0 0.99780 3.51 0.56 \n1 25.0 67.0 0.99680 3.20 0.68 \n2 15.0 54.0 0.99700 3.26 0.65 \n3 17.0 60.0 0.99800 3.16 0.58 \n4 11.0 34.0 0.99780 3.51 0.56 \n... ... ... ... ... ... 
\n1594 32.0 44.0 0.99490 3.45 0.58 \n1595 39.0 51.0 0.99512 3.52 0.76 \n1596 29.0 40.0 0.99574 3.42 0.75 \n1597 32.0 44.0 0.99547 3.57 0.71 \n1598 18.0 42.0 0.99549 3.39 0.66 \n\n alcohol quality \n0 9.4 5 \n1 9.8 5 \n2 9.8 5 \n3 9.8 6 \n4 9.4 5 \n... ... ... \n1594 10.5 5 \n1595 11.2 6 \n1596 11.0 6 \n1597 10.2 5 \n1598 11.0 6 \n\n[1599 rows x 12 columns]", "text/html": "
| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.700 | 0.00 | 1.9 | 0.076 | 11.0 | 34.0 | 0.99780 | 3.51 | 0.56 | 9.4 | 5 |
| 1 | 7.8 | 0.880 | 0.00 | 2.6 | 0.098 | 25.0 | 67.0 | 0.99680 | 3.20 | 0.68 | 9.8 | 5 |
| 2 | 7.8 | 0.760 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.99700 | 3.26 | 0.65 | 9.8 | 5 |
| 3 | 11.2 | 0.280 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.99800 | 3.16 | 0.58 | 9.8 | 6 |
| 4 | 7.4 | 0.700 | 0.00 | 1.9 | 0.076 | 11.0 | 34.0 | 0.99780 | 3.51 | 0.56 | 9.4 | 5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1594 | 6.2 | 0.600 | 0.08 | 2.0 | 0.090 | 32.0 | 44.0 | 0.99490 | 3.45 | 0.58 | 10.5 | 5 |
| 1595 | 5.9 | 0.550 | 0.10 | 2.2 | 0.062 | 39.0 | 51.0 | 0.99512 | 3.52 | 0.76 | 11.2 | 6 |
| 1596 | 6.3 | 0.510 | 0.13 | 2.3 | 0.076 | 29.0 | 40.0 | 0.99574 | 3.42 | 0.75 | 11.0 | 6 |
| 1597 | 5.9 | 0.645 | 0.12 | 2.0 | 0.075 | 32.0 | 44.0 | 0.99547 | 3.57 | 0.71 | 10.2 | 5 |
| 1598 | 6.0 | 0.310 | 0.47 | 3.6 | 0.067 | 18.0 | 42.0 | 0.99549 | 3.39 | 0.66 | 11.0 | 6 |

1599 rows × 12 columns
| | column | count | null_count | bool_count | numeric_count | max | mean | min | stddev | nunique_numbers | ... | nunique_str_upper | quantile_0.0000 | quantile_0.0100 | quantile_0.0500 | quantile_0.2500 | quantile_0.5000 | quantile_0.7500 | quantile_0.9500 | quantile_0.9900 | quantile_1.0000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | chlorides | 1199.0 | 0.0 | 0.0 | 1199.0 | 0.61100 | 0.086965 | 0.01200 | 0.044101 | 134.0 | ... | 0.0 | 0.01200 | 0.04100 | 0.05400 | 0.07000 | 0.07900 | 0.0900 | 0.123 | 0.3680 | 0.61100 |
| 1 | quality | 1199.0 | 0.0 | 0.0 | 1199.0 | 8.00000 | 5.626355 | 3.00000 | 0.785196 | 6.0 | ... | 0.0 | 3.00000 | 4.00000 | 5.00000 | 5.00000 | 6.00000 | 6.0000 | 7.000 | 8.0000 | 8.00000 |
| 2 | total sulfur dioxide | 1199.0 | 0.0 | 0.0 | 1199.0 | 289.00000 | 46.488741 | 6.00000 | 33.239143 | 141.0 | ... | 0.0 | 6.00000 | 8.00000 | 11.00000 | 22.00000 | 37.00000 | 63.0000 | 115.000 | 148.0000 | 289.00000 |
| 3 | alcohol | 1199.0 | 0.0 | 0.0 | 1199.0 | 14.90000 | 10.404393 | 8.40000 | 1.060160 | 65.0 | ... | 0.0 | 8.40000 | 9.00000 | 9.20000 | 9.50000 | 10.10000 | 11.0000 | 12.500 | 13.4000 | 14.90000 |
| 4 | density | 1199.0 | 0.0 | 0.0 | 1199.0 | 1.00369 | 0.996796 | 0.99007 | 0.001863 | 390.0 | ... | 0.0 | 0.99007 | 0.99235 | 0.99362 | 0.99566 | 0.99677 | 0.9979 | 1.000 | 1.0022 | 1.00369 |
| 5 | free sulfur dioxide | 1199.0 | 0.0 | 0.0 | 1199.0 | 72.00000 | 15.822769 | 1.00000 | 10.503918 | 57.0 | ... | 0.0 | 1.00000 | 3.00000 | 4.00000 | 7.00000 | 13.00000 | 21.0000 | 35.000 | 48.0000 | 72.00000 |
| 6 | volatile acidity | 1199.0 | 0.0 | 0.0 | 1199.0 | 1.33000 | 0.530367 | 0.12000 | 0.178894 | 136.0 | ... | 0.0 | 0.12000 | 0.18000 | 0.27000 | 0.40000 | 0.53000 | 0.6400 | 0.855 | 1.0350 | 1.33000 |
| 7 | sulphates | 1199.0 | 0.0 | 0.0 | 1199.0 | 1.98000 | 0.657131 | 0.33000 | 0.168270 | 89.0 | ... | 0.0 | 0.33000 | 0.43000 | 0.47000 | 0.55000 | 0.62000 | 0.7300 | 0.950 | 1.3400 | 1.98000 |
| 8 | citric acid | 1199.0 | 0.0 | 0.0 | 1199.0 | 0.78000 | 0.269700 | 0.00000 | 0.193572 | 75.0 | ... | 0.0 | 0.00000 | 0.00000 | 0.00000 | 0.09000 | 0.26000 | 0.4200 | 0.600 | 0.6900 | 0.78000 |
| 9 | pH | 1199.0 | 0.0 | 0.0 | 1199.0 | 4.01000 | 3.310234 | 2.89000 | 0.153440 | 82.0 | ... | 0.0 | 2.89000 | 2.93000 | 3.06000 | 3.21000 | 3.31000 | 3.4000 | 3.560 | 3.6800 | 4.01000 |
| 10 | residual sugar | 1199.0 | 0.0 | 0.0 | 1199.0 | 15.50000 | 2.575438 | 0.90000 | 1.459931 | 85.0 | ... | 0.0 | 0.90000 | 1.30000 | 1.60000 | 1.90000 | 2.20000 | 2.6000 | 5.200 | 8.3000 | 15.50000 |
| 11 | fixed acidity | 1199.0 | 0.0 | 0.0 | 1199.0 | 15.90000 | 8.328774 | 4.60000 | 1.723000 | 90.0 | ... | 0.0 | 4.60000 | 5.20000 | 6.10000 | 7.10000 | 7.90000 | 9.3000 | 11.700 | 13.0000 | 15.90000 |

12 rows × 32 columns