{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
">### 🚩 *Create a free WhyLabs account to get more value out of whylogs!*\n",
">*Did you know you can store, visualize, and monitor whylogs profiles with the [WhyLabs Observability Platform](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=Mlflow_Logging)? Sign up for a [free WhyLabs account](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=Mlflow_Logging) to leverage the power of whylogs and WhyLabs together!*"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"# MLflow Logging\n",
"\n",
"[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/whylabs/whylogs/blob/mainline/python/examples/integrations/Mlflow_Logging.ipynb)\n",
"\n",
"[MLflow](https://www.mlflow.org/) is an open-source platform for tracking, managing, and deploying machine learning models through a consistent API. whylogs users can log profiles directly to their MLflow environment through the whylogs MLflow integration. Let's see how."
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"## Setup\n",
"\n",
"For this tutorial we will keep things simple by using MLflow's local client. One of MLflow's advantages is that it uses the exact same API locally and in the cloud, so with minor setup changes the code shown here can be extended to MLflow running in Kubernetes or Databricks, for example. To get started, make sure you have both `mlflow` and `whylogs` installed in your environment by running the following cell:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"# Note: you may need to restart the kernel to use updated packages.\n",
"%pip install 'whylogs[mlflow]'"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"We will also install `pandas`, `scikit-learn` and `matplotlib` to build a very simple training example and show how you can start profiling your training data with `whylogs`. If you haven't installed them yet, run the following cell:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"%pip install -q scikit-learn matplotlib pandas mlflow-skinny"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"## Get the data\n",
"\n",
"Now let's load an example dataset from the `scikit-learn` library and create a function that returns it as a single dataframe containing both the features and the target. We will use this same function later on!"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"import pandas as pd\n",
"from sklearn.datasets import load_iris\n",
"\n",
"def get_data() -> pd.DataFrame:\n",
"    iris_data = load_iris()\n",
"    dataframe = pd.DataFrame(iris_data.data, columns=iris_data.feature_names)\n",
"    # Add the class labels as a plain column alongside the features\n",
"    dataframe[\"target\"] = iris_data.target\n",
"    return dataframe"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"df = get_data()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>sepal length (cm)</th><th>sepal width (cm)</th><th>petal length (cm)</th><th>petal width (cm)</th><th>target</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>5.1</td><td>3.5</td><td>1.4</td><td>0.2</td><td>0</td></tr>\n",
"    <tr><th>1</th><td>4.9</td><td>3.0</td><td>1.4</td><td>0.2</td><td>0</td></tr>\n",
"    <tr><th>2</th><td>4.7</td><td>3.2</td><td>1.3</td><td>0.2</td><td>0</td></tr>\n",
"    <tr><th>3</th><td>4.6</td><td>3.1</td><td>1.5</td><td>0.2</td><td>0</td></tr>\n",
"    <tr><th>4</th><td>5.0</td><td>3.6</td><td>1.4</td><td>0.2</td><td>0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>counts/n</th><th>counts/null</th><th>types/integral</th><th>types/fractional</th><th>types/boolean</th><th>types/string</th><th>types/object</th><th>distribution/mean</th><th>distribution/stddev</th><th>distribution/n</th><th>...</th><th>distribution/q_90</th><th>distribution/q_95</th><th>distribution/q_99</th><th>ints/max</th><th>ints/min</th><th>cardinality/est</th><th>cardinality/upper_1</th><th>cardinality/lower_1</th><th>frequent_items/frequent_strings</th><th>type</th></tr>\n",
"    <tr><th>column</th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>target</th><td>150</td><td>0</td><td>150</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1.000000</td><td>0.819232</td><td>150</td><td>...</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>0.0</td><td>3.000000</td><td>3.000150</td><td>3.0</td><td>[FrequentItem(value='0.000000', est=50, upper=...</td><td>SummaryType.COLUMN</td></tr>\n",
"    <tr><th>petal width (cm)</th><td>150</td><td>0</td><td>0</td><td>150</td><td>0</td><td>0</td><td>0</td><td>1.199333</td><td>0.762238</td><td>150</td><td>...</td><td>2.2</td><td>2.3</td><td>2.5</td><td>NaN</td><td>NaN</td><td>22.000001</td><td>22.001100</td><td>22.0</td><td>NaN</td><td>SummaryType.COLUMN</td></tr>\n",
"    <tr><th>sepal width (cm)</th><td>150</td><td>0</td><td>0</td><td>150</td><td>0</td><td>0</td><td>0</td><td>3.057333</td><td>0.435866</td><td>150</td><td>...</td><td>3.7</td><td>3.8</td><td>4.2</td><td>NaN</td><td>NaN</td><td>23.000001</td><td>23.001150</td><td>23.0</td><td>NaN</td><td>SummaryType.COLUMN</td></tr>\n",
"    <tr><th>petal length (cm)</th><td>150</td><td>0</td><td>0</td><td>150</td><td>0</td><td>0</td><td>0</td><td>3.758000</td><td>1.765298</td><td>150</td><td>...</td><td>5.8</td><td>6.1</td><td>6.7</td><td>NaN</td><td>NaN</td><td>43.000004</td><td>43.002151</td><td>43.0</td><td>NaN</td><td>SummaryType.COLUMN</td></tr>\n",
"    <tr><th>sepal length (cm)</th><td>150</td><td>0</td><td>0</td><td>150</td><td>0</td><td>0</td><td>0</td><td>5.843333</td><td>0.828066</td><td>150</td><td>...</td><td>6.9</td><td>7.3</td><td>7.7</td><td>NaN</td><td>NaN</td><td>35.000003</td><td>35.001750</td><td>35.0</td><td>NaN</td><td>SummaryType.COLUMN</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>5 rows × 28 columns</p>\n",
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>counts/n</th><th>counts/null</th><th>types/integral</th><th>types/fractional</th><th>types/boolean</th><th>types/string</th><th>types/object</th><th>cardinality/est</th><th>cardinality/upper_1</th><th>cardinality/lower_1</th><th>...</th><th>distribution/q_25</th><th>distribution/median</th><th>distribution/q_75</th><th>distribution/q_90</th><th>distribution/q_95</th><th>distribution/q_99</th><th>type</th><th>ints/max</th><th>ints/min</th><th>frequent_items/frequent_strings</th></tr>\n",
"    <tr><th>column</th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>petal length (cm)</th><td>150</td><td>0</td><td>0</td><td>150</td><td>0</td><td>0</td><td>0</td><td>43.000004</td><td>43.002151</td><td>43.0</td><td>...</td><td>1.6</td><td>4.4</td><td>5.1</td><td>5.8</td><td>6.1</td><td>6.7</td><td>SummaryType.COLUMN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>petal width (cm)</th><td>150</td><td>0</td><td>0</td><td>150</td><td>0</td><td>0</td><td>0</td><td>22.000001</td><td>22.001100</td><td>22.0</td><td>...</td><td>0.3</td><td>1.3</td><td>1.8</td><td>2.2</td><td>2.3</td><td>2.5</td><td>SummaryType.COLUMN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>sepal length (cm)</th><td>150</td><td>0</td><td>0</td><td>150</td><td>0</td><td>0</td><td>0</td><td>35.000003</td><td>35.001750</td><td>35.0</td><td>...</td><td>5.1</td><td>5.8</td><td>6.4</td><td>6.9</td><td>7.3</td><td>7.7</td><td>SummaryType.COLUMN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>sepal width (cm)</th><td>150</td><td>0</td><td>0</td><td>150</td><td>0</td><td>0</td><td>0</td><td>23.000001</td><td>23.001150</td><td>23.0</td><td>...</td><td>2.8</td><td>3.0</td><td>3.3</td><td>3.7</td><td>3.8</td><td>4.2</td><td>SummaryType.COLUMN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>target</th><td>150</td><td>0</td><td>150</td><td>0</td><td>0</td><td>0</td><td>0</td><td>3.000000</td><td>3.000150</td><td>3.0</td><td>...</td><td>0.0</td><td>1.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>SummaryType.COLUMN</td><td>2.0</td><td>0.0</td><td>[FrequentItem(value='0.000000', est=50, upper=...</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>5 rows × 28 columns</p>\n",
"