{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
">### 🚩 *Create a free WhyLabs account to get more value out of whylogs!*\n",
">*Did you know you can store, visualize, and monitor whylogs profiles with the [WhyLabs Observability Platform](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=Writing_Classification_Performance_Metrics_to_WhyLabs)? Sign up for a [free WhyLabs account](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=Writing_Classification_Performance_Metrics_to_WhyLabs) to leverage the power of whylogs and WhyLabs together!*"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Monitoring Classification Model Performance Metrics\n",
"\n",
"[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/whylabs/whylogs/blob/mainline/python/examples/integrations/writers/Writing_Classification_Performance_Metrics_to_WhyLabs.ipynb)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"In this tutorial, we'll show how to log your ML model's performance metrics with whylogs and send them to your dashboard on the WhyLabs Platform. We'll follow a classification use case, predicting whether an incoming product should be offered a discount based on past transaction information.\n",
"\n",
"We will:\n",
"\n",
"- Download 7 days of Ecommerce data\n",
"- Log daily input features with whylogs\n",
"- Log daily classification performance metrics with whylogs\n",
"- Write the logged profiles to your WhyLabs dashboard\n",
"- View the performance summary in WhyLabs\n",
"- __Advanced__: Monitor segmented performance metrics"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Installing whylogs\n",
"\n",
"First, let's install whylogs. Since we want to write to WhyLabs, we'll install the `whylabs` extra. We'll also use the `datasets` module, so let's install that extra as well:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Note: you may need to restart the kernel to use updated packages.\n",
"%pip install 'whylogs[whylabs,datasets]'"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## 🛍️ The Data - Ecommerce Dataset"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"The Ecommerce dataset contains transaction information for several products sold by a popular online grocery supermarket in India. It includes features such as each product's description, category, market price, and user rating.\n",
"\n",
"The original data was sourced from Kaggle's [BigBasket Entire Product List](https://www.kaggle.com/datasets/surajjha101/bigbasket-entire-product-list-28k-datapoints). Additional transformations, such as oversampling and feature engineering, were then applied to the source data.\n",
"\n",
"You can find more information about the resulting dataset and how to use it at https://whylogs.readthedocs.io/en/latest/datasets/ecommerce.html."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Downloading the data into daily batches\n",
"Let's download 7 daily batches covering the last 7 days. We can do this directly with the datasets module."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from whylogs.datasets import Ecommerce\n",
"from datetime import datetime, timezone, timedelta\n",
"dataset = Ecommerce()\n",
"\n",
"start_timestamp = datetime.now(timezone.utc) - timedelta(days=6)\n",
"dataset.set_parameters(inference_start_timestamp=start_timestamp)\n",
"\n",
"daily_batches = dataset.get_inference_data(number_batches=7)\n",
"\n",
"# get_inference_data returns an iterator, so convert it to a list\n",
"daily_batches = list(daily_batches)"
]
},
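{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell above starts 6 days before the current time and requests 7 batches, so the final batch lands on the current day. The sketch below reproduces only that date arithmetic with the standard library. It assumes, as an illustration rather than documented behavior of `Ecommerce`, that each batch covers one day starting at `start_timestamp`; `daily_batch_starts` is a hypothetical helper, not part of whylogs.\n",
"\n",
"```python\n",
"from datetime import datetime, timedelta, timezone\n",
"\n",
"\n",
"def daily_batch_starts(now, number_batches=7):\n",
"    # Same offset as the notebook: start (number_batches - 1) days before now\n",
"    start = now - timedelta(days=number_batches - 1)\n",
"    # Assumption: one batch per day, each one day after the previous\n",
"    return [start + timedelta(days=i) for i in range(number_batches)]\n",
"\n",
"\n",
"# Fixed 'now' so the result is reproducible\n",
"now = datetime(2023, 1, 16, tzinfo=timezone.utc)\n",
"starts = daily_batch_starts(now)\n",
"print([d.date().isoformat() for d in starts])\n",
"# ['2023-01-10', '2023-01-11', '2023-01-12', '2023-01-13', '2023-01-14', '2023-01-15', '2023-01-16']\n",
"```"
]
},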
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Since in this example we're mainly concerned with classification metrics, let's select a subset of the available features, for simplicity.\n",
"\n",
"Input features:\n",
"\n",
"- __product__\n",
"- __sales_last_week__\n",
"- __market_price__\n",
"- __rating__\n",
"- __category__\n",
"\n",
"Target feature:\n",
"\n",
"- __output_discount__\n",
"\n",
"Prediction feature:\n",
"\n",
"- __output_prediction__\n",
"\n",
"Score feature:\n",
"\n",
"- __output_score__, which is the class probability for the predicted class.\n",
"\n",
"The target and prediction features are encoded as 0s and 1s. The example would work just as well with these numeric labels, but for didactic purposes let's encode them as strings: `discount` and `full price`.\n",
"\n",
"Let's take a look at the resulting data for the first day:"
]
},
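{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"The column selection and label encoding described above can be sketched with pandas. This is a minimal illustration, not the notebook's own code: `prepare_batch` is a hypothetical helper, the toy rows are made up (loosely following the first day's output), and mapping `1` to `discount` and `0` to `full price` is an assumption.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Input, target, prediction and score columns listed above\n",
"COLUMNS = ['product', 'sales_last_week', 'market_price', 'rating', 'category',\n",
"           'output_discount', 'output_prediction', 'output_score']\n",
"\n",
"# Assumption: 1 encodes 'discount' and 0 encodes 'full price'\n",
"LABELS = {1: 'discount', 0: 'full price'}\n",
"\n",
"\n",
"def prepare_batch(df):\n",
"    # Keep only the selected columns; copy so the input frame is untouched\n",
"    subset = df[COLUMNS].copy()\n",
"    # Replace the 0/1 target and prediction with readable class names\n",
"    for col in ['output_discount', 'output_prediction']:\n",
"        subset[col] = subset[col].map(LABELS)\n",
"    return subset\n",
"\n",
"\n",
"toy = pd.DataFrame({\n",
"    'product': ['Pudding - Assorted', 'Jaggery Powder - Organic, Sulphur Free'],\n",
"    'sales_last_week': [3, 1],\n",
"    'market_price': [50.0, 280.0],\n",
"    'rating': [4, 3],\n",
"    'category': ['Gourmet and World Food', 'Gourmet and World Food'],\n",
"    'output_discount': [0, 0],\n",
"    'output_prediction': [1, 0],\n",
"    'output_score': [0.6, 0.571833],\n",
"    'extra_feature': [7, 8],  # hypothetical column that gets dropped\n",
"})\n",
"print(prepare_batch(toy))\n",
"```"
]
},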
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>product</th><th>sales_last_week</th><th>market_price</th><th>rating</th><th>category</th><th>output_discount</th><th>output_prediction</th><th>output_score</th></tr>\n",
"    <tr><th>date</th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>2023-01-10 00:00:00+00:00</th><td>1-2-3 Noodles - Veg Masala Flavour</td><td>2</td><td>12.0</td><td>4</td><td>Snacks and Branded Foods</td><td>full price</td><td>full price</td><td>1.000000</td></tr>\n",
"    <tr><th>2023-01-10 00:00:00+00:00</th><td>Jaggery Powder - Organic, Sulphur Free</td><td>1</td><td>280.0</td><td>3</td><td>Gourmet and World Food</td><td>full price</td><td>full price</td><td>0.571833</td></tr>\n",
"    <tr><th>2023-01-10 00:00:00+00:00</th><td>Pudding - Assorted</td><td>3</td><td>50.0</td><td>4</td><td>Gourmet and World Food</td><td>full price</td><td>discount</td><td>0.600000</td></tr>\n",
"    <tr><th>2023-01-10 00:00:00+00:00</th><td>Perfectly Moist Dark Chocolate Fudge Cake Mix ...</td><td>1</td><td>495.0</td><td>4</td><td>Gourmet and World Food</td><td>full price</td><td>discount</td><td>0.517833</td></tr>\n",
"    <tr><th>2023-01-10 00:00:00+00:00</th><td>Pasta/Spaghetti Spoon - Nylon, Silicon Handle,...</td><td>1</td><td>299.0</td><td>3</td><td>Kitchen, Garden and Pets</td><td>discount</td><td>discount</td><td>0.950000</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>