{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ ">### 🚩 *Create a free WhyLabs account to get more value out of whylogs!*
\n", ">*Did you know you can store, visualize, and monitor whylogs profiles with the [WhyLabs Observability Platform](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=Writing_Classification_Performance_Metrics_to_WhyLabs)? Sign up for a [free WhyLabs account](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=Writing_Classification_Performance_Metrics_to_WhyLabs) to leverage the power of whylogs and WhyLabs together!*" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Monitoring Classification Model Performance Metrics\n", "\n", "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/whylabs/whylogs/blob/mainline/python/examples/integrations/writers/Writing_Classification_Performance_Metrics_to_WhyLabs.ipynb)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "In this tutorial, we'll show how you can log the performance metrics of your ML model with whylogs and send them to your dashboard on the WhyLabs platform. We'll follow a classification use case, where we're predicting whether an incoming product should be offered a discount or not based on past transaction information.\n", "\n", "We will:\n", "\n", "- Download Ecommerce data for 7 days\n", "- Log daily input features with whylogs\n", "- Log daily classification performance metrics with whylogs\n", "- Write logged profiles to WhyLabs' dashboard\n", "- Show the performance summary at WhyLabs\n", "- __Advanced__: Monitor segmented performance metrics" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Installing whylogs\n", "\n", "First, let's install whylogs. Since we want to write to WhyLabs, we'll install the whylabs extra. 
Additionally, we'll use the datasets module, so let's install it as well:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Note: you may need to restart the kernel to use updated packages.\n", "%pip install 'whylogs[datasets]'" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## 🛍️ The Data - Ecommerce Dataset" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "The Ecommerce dataset contains transaction information for several products from a popular grocery supermarket in India. It contains features such as the product's description, category, market price and user rating.\n", "\n", "The original data was sourced from Kaggle's [BigBasket Entire Product List](https://www.kaggle.com/datasets/surajjha101/bigbasket-entire-product-list-28k-datapoints). Additional transformations, such as oversampling and feature creation/engineering, were then applied to the source data.\n", "\n", "You can find more information about the resulting dataset and how to use it at https://whylogs.readthedocs.io/en/latest/datasets/ecommerce.html." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Downloading the data into daily batches\n", "Let's download 7 daily batches, one for each of the last 7 days. We can use the datasets module directly for that." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from whylogs.datasets import Ecommerce\n", "from datetime import datetime, timezone, timedelta\n", "dataset = Ecommerce()\n", "\n", "start_timestamp = datetime.now(timezone.utc) - timedelta(days=6)\n", "dataset.set_parameters(inference_start_timestamp=start_timestamp)\n", "\n", "daily_batches = dataset.get_inference_data(number_batches=7)\n", "\n", "# the batches come back as an iterator, so let's convert them to a list\n", "daily_batches = list(daily_batches)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Since in this example we're mainly concerned with classification metrics, let's select a subset of the available features, for simplicity.\n", "\n", "Input features:\n", "\n", "- __product__\n", "- __sales_last_week__\n", "- __market_price__\n", "- __rating__\n", "- __category__\n", "\n", "Target feature:\n", "\n", "- __output_discount__\n", "\n", "Prediction feature:\n", "\n", "- __output_prediction__\n", "\n", "Score feature:\n", "\n", "- __output_score__, which is the class probability for the predicted class.\n", "\n", "The target and prediction features are encoded as 0's and 1's. While this example would work just as well this way, let's encode these categories as strings - `discount` and `full price` - to make the results easier to read.\n", "\n", "Let's take a look at the resulting data for the first day:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"<thead>\n",
"<tr><th>date</th><th>product</th><th>sales_last_week</th><th>market_price</th><th>rating</th><th>category</th><th>output_discount</th><th>output_prediction</th><th>output_score</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>2023-01-10 00:00:00+00:00</th><td>1-2-3 Noodles - Veg Masala Flavour</td><td>2</td><td>12.0</td><td>4</td><td>Snacks and Branded Foods</td><td>full price</td><td>full price</td><td>1.000000</td></tr>\n",
"<tr><th>2023-01-10 00:00:00+00:00</th><td>Jaggery Powder - Organic, Sulphur Free</td><td>1</td><td>280.0</td><td>3</td><td>Gourmet and World Food</td><td>full price</td><td>full price</td><td>0.571833</td></tr>\n",
"<tr><th>2023-01-10 00:00:00+00:00</th><td>Pudding - Assorted</td><td>3</td><td>50.0</td><td>4</td><td>Gourmet and World Food</td><td>full price</td><td>discount</td><td>0.600000</td></tr>\n",
"<tr><th>2023-01-10 00:00:00+00:00</th><td>Perfectly Moist Dark Chocolate Fudge Cake Mix ...</td><td>1</td><td>495.0</td><td>4</td><td>Gourmet and World Food</td><td>full price</td><td>discount</td><td>0.517833</td></tr>\n",
"<tr><th>2023-01-10 00:00:00+00:00</th><td>Pasta/Spaghetti Spoon - Nylon, Silicon Handle,...</td><td>1</td><td>299.0</td><td>3</td><td>Kitchen, Garden and Pets</td><td>discount</td><td>discount</td><td>0.950000</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " product \\\n", "date \n", "2023-01-10 00:00:00+00:00 1-2-3 Noodles - Veg Masala Flavour \n", "2023-01-10 00:00:00+00:00 Jaggery Powder - Organic, Sulphur Free \n", "2023-01-10 00:00:00+00:00 Pudding - Assorted \n", "2023-01-10 00:00:00+00:00 Perfectly Moist Dark Chocolate Fudge Cake Mix ... \n", "2023-01-10 00:00:00+00:00 Pasta/Spaghetti Spoon - Nylon, Silicon Handle,... \n", "\n", " sales_last_week market_price rating \\\n", "date \n", "2023-01-10 00:00:00+00:00 2 12.0 4 \n", "2023-01-10 00:00:00+00:00 1 280.0 3 \n", "2023-01-10 00:00:00+00:00 3 50.0 4 \n", "2023-01-10 00:00:00+00:00 1 495.0 4 \n", "2023-01-10 00:00:00+00:00 1 299.0 3 \n", "\n", " category output_discount \\\n", "date \n", "2023-01-10 00:00:00+00:00 Snacks and Branded Foods full price \n", "2023-01-10 00:00:00+00:00 Gourmet and World Food full price \n", "2023-01-10 00:00:00+00:00 Gourmet and World Food full price \n", "2023-01-10 00:00:00+00:00 Gourmet and World Food full price \n", "2023-01-10 00:00:00+00:00 Kitchen, Garden and Pets discount \n", "\n", " output_prediction output_score \n", "date \n", "2023-01-10 00:00:00+00:00 full price 1.000000 \n", "2023-01-10 00:00:00+00:00 full price 0.571833 \n", "2023-01-10 00:00:00+00:00 discount 0.600000 \n", "2023-01-10 00:00:00+00:00 discount 0.517833 \n", "2023-01-10 00:00:00+00:00 discount 0.950000 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "columns = ['product','sales_last_week','market_price','rating','category','output_discount','output_prediction','output_score']\n", "\n", "df = daily_batches[0].data[columns]\n", "\n", "df['output_discount'] = df['output_discount'].apply(lambda x: \"discount\" if x==1 else \"full price\")\n", "df['output_prediction'] = df['output_prediction'].apply(lambda x: \"discount\" if x==1 else \"full price\")\n", "df.head()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## ✔️ Setting the Environment Variables\n", 
"\n", "In order to send our profile to WhyLabs, let's first set up an account. You can skip this if you already have an account and a model set up.\n", "\n", "We will need three pieces of information:\n", "\n", "- API token\n", "- Organization ID\n", "- Dataset ID (or model-id)\n", "\n", "Go to https://whylabs.ai/free and grab a free account. You can follow along with the examples if you wish, but if you’re interested in only following this demonstration, you can go ahead and skip the quick start instructions.\n", "\n", "After that, you’ll be prompted to create an API token. Once you create it, copy and store it locally. The second important information here is your org ID. Take note of it as well. After you get your API Token and Org ID, you can go to https://hub.whylabsapp.com/models to see your projects dashboard. You can create a new project and take note of it's ID (if it's a model project it will look like `model-xxxx`)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Enter your WhyLabs Org ID\n", "Enter your WhyLabs Dataset ID\n", "Enter your WhyLabs API key\n", "Using API Key ID: \n" ] } ], "source": [ "import getpass\n", "import os\n", "\n", "# set your org-id here - should be something like \"org-xxxx\"\n", "print(\"Enter your WhyLabs Org ID\") \n", "os.environ[\"WHYLABS_DEFAULT_ORG_ID\"] = input()\n", "\n", "# set your datased_id (or model_id) here - should be something like \"model-xxxx\"\n", "print(\"Enter your WhyLabs Dataset ID\")\n", "os.environ[\"WHYLABS_DEFAULT_DATASET_ID\"] = input()\n", "\n", "\n", "# set your API key here\n", "print(\"Enter your WhyLabs API key\")\n", "os.environ[\"WHYLABS_API_KEY\"] = getpass.getpass()\n", "print(\"Using API Key ID: \", os.environ[\"WHYLABS_API_KEY\"][0:10])" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## 📊 Profiling the Data + Sending to WhyLabs\n", "\n", "Traditionally, data is logged 
by calling `why.log()`. In this case, we'll use `why.log_classification_metrics()`. By setting `log_full_data` to True, we will log the complete dataframe as in `why.log()`, and whylogs will additionally compute classification metrics and add them to your results. If you want to log only the `target`, `prediction`, and `score` columns, without the complete data, you can simply call `log_classification_metrics()` without specifying `log_full_data`, which defaults to False.\n", "\n", "`log_classification_metrics` takes the complete dataframe as input (with input/output features, as well as your prediction, target and score columns). We also have to define which column is our target (in this case, `output_discount`) and which is our prediction column (`output_prediction` in this case). Additionally, in order to generate confusion matrices and ROC curves, we will also define a score column (`output_score`).\n", "\n", "Once the profile is logged, we can set its timestamp to the proper day, as given by our batch's timestamp. Now that we have properly timestamped profiles with classification metrics, we can use the `writer` method to send them to WhyLabs:" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "> Note: for whylogs versions 1.1.13 and lower, the default behavior of `log_classification_metrics` was to log the complete data, in addition to the target and prediction columns. If you were using it in the older versions and wish to keep that behavior when updating whylogs, set `log_full_data` to True." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "logging data for date 2022-09-07 00:00:00+00:00\n", "writing profiles to whylabs...\n", "logging data for date 2022-09-08 00:00:00+00:00\n", "writing profiles to whylabs...\n", "logging data for date 2022-09-09 00:00:00+00:00\n", "writing profiles to whylabs...\n", "logging data for date 2022-09-10 00:00:00+00:00\n", "writing profiles to whylabs...\n", "logging data for date 2022-09-11 00:00:00+00:00\n", "writing profiles to whylabs...\n", "logging data for date 2022-09-12 00:00:00+00:00\n", "writing profiles to whylabs...\n", "logging data for date 2022-09-13 00:00:00+00:00\n", "writing profiles to whylabs...\n" ] } ], "source": [ "from whylogs.api.writer.whylabs import WhyLabsWriter\n", "import whylogs as why\n", "\n", "\n", "columns = ['product','sales_last_week','market_price','rating','category','output_discount','output_prediction','output_score']\n", "\n", "for batch in daily_batches:\n", " dataset_timestamp = batch.timestamp\n", "\n", " df = batch.data[columns]\n", " df['output_discount'] = df['output_discount'].apply(lambda x: \"discount\" if x==1 else \"full price\")\n", " df['output_prediction'] = df['output_prediction'].apply(lambda x: \"discount\" if x==1 else \"full price\")\n", "\n", " print(\"logging data for date {}\".format(dataset_timestamp))\n", " results = why.log_classification_metrics(\n", " df,\n", " target_column = \"output_discount\",\n", " prediction_column = \"output_prediction\",\n", " score_column=\"output_score\",\n", " log_full_data=True\n", " )\n", "\n", " profile = results.profile()\n", " profile.set_dataset_timestamp(dataset_timestamp)\n", "\n", " print(\"writing profiles to whylabs...\")\n", " results.writer(\"whylabs\").write()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "And that's it! 
You just sent your profiles to WhyLabs.\n", "\n", "On your model's dashboard, you should see the model metrics for the last seven days. For classification, the displayed metrics are:\n", "\n", "- Total output and input count\n", "- Accuracy\n", "- ROC\n", "- Precision-Recall chart\n", "- Confusion Matrix\n", "- Recall\n", "- FPR (false positive rate)\n", "- Precision\n", "- F1" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "![alt text](images/whylabs_classification.png)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Advanced Usage - Monitoring Segmented Performance Metrics" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "You can also log performance metrics for specific segments of your data.\n", "\n", "For example, let's say we want to monitor the performance of our model according to the product's category. We might be interested in comparing performance metrics of the complete data against a specific segment, such as `Fruits & Vegetables` or `Beverages`, or in comparing one segment against another.\n", "\n", "Doing that is simple: we just need to specify the column we wish to segment on and pass that information to a __Dataset Schema__ object, like this:" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "> If you want to learn more about segments, please refer to [this example](https://nbviewer.org/github/whylabs/whylogs/blob/mainline/python/examples/advanced/Segments.ipynb)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from whylogs.core.segmentation_partition import segment_on_column\n", "from whylogs.core.schema import DatasetSchema\n", "\n", "segment_column = \"category\"\n", "segmented_schema = DatasetSchema(segments=segment_on_column(segment_column))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Now, we can log the profiles as demonstrated previously, with two differences. We'll pass the schema to the `log_classification_metrics` method, and we'll set the timestamp on the complete result set rather than on a single profile:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from whylogs.api.writer.whylabs import WhyLabsWriter\n", "import whylogs as why\n", "\n", "\n", "columns = ['product','sales_last_week','market_price','rating','category','output_discount','output_prediction','output_score']\n", "\n", "for batch in daily_batches:\n", " dataset_timestamp = batch.timestamp\n", "\n", " df = batch.data[columns]\n", " df['output_discount'] = df['output_discount'].apply(lambda x: \"discount\" if x==1 else \"full price\")\n", " df['output_prediction'] = df['output_prediction'].apply(lambda x: \"discount\" if x==1 else \"full price\")\n", " \n", " print(\"logging data for date {}\".format(dataset_timestamp))\n", " results = why.log_classification_metrics(\n", " df,\n", " target_column = \"output_discount\",\n", " prediction_column = \"output_prediction\",\n", " score_column=\"output_score\",\n", " schema=segmented_schema,\n", " log_full_data=True\n", " )\n", "\n", " results.set_dataset_timestamp(dataset_timestamp)\n", "\n", " print(\"writing profiles to whylabs...\")\n", " results.writer(\"whylabs\").write()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Note that we're passing `segmented_schema` as the `schema` parameter. 
This will tell whylogs to log the metrics for the complete data, as well as for each segment.\n", "\n", "You'll get a similar dashboard on the `performance` page at WhyLabs, but now you'll see a dropdown menu with the available segments (or complete data). You'll also be able to choose a second segment to compare against:" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "![alt text](images/segmented_performance_class.png)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "We can see that products in the `Kitchen, Garden and Pets` category represent around 14.5% of the total data (605/4160), and the classification model seems to perform consistently better for this category. You can verify this by exploring the other metrics on this dashboard, or look for other insights by exploring the other segments!\n" ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "orig_nbformat": 4, "vscode": { "interpreter": { "hash": "5dd5901cadfd4b29c2aaf95ecd29c0c3b10829ad94dcfe59437dbee391154aea" } } }, "nbformat": 4, "nbformat_minor": 2 }