{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ ">### 🚩 *Create a free WhyLabs account to get more value out of whylogs!*
\n", ">*Did you know you can store, visualize, and monitor whylogs profiles with the [WhyLabs Observability Platform](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=ecommerce)? Sign up for a [free WhyLabs account](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=ecommerce) to leverage the power of whylogs and WhyLabs together!*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Ecommerce Dataset - Usage Example" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/whylabs/whylogs/blob/mainline/python/examples/datasets/ecommerce.ipynb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is an example demonstrating how to use the Ecommerce Dataset.\n", "\n", "For more information about the dataset itself, check the documentation at:\n", "https://whylogs.readthedocs.io/en/latest/datasets/ecommerce.html" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Installing the datasets module" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Note: you may need to restart the kernel to use updated packages.\n", "%pip install 'whylogs[datasets]'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading the Dataset\n", "\n", "You can load the dataset of your choice by importing it from the `datasets` module:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from whylogs.datasets import Ecommerce\n", "\n", "dataset = Ecommerce(version=\"base\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If no `version` parameter is passed, the default version is `base`."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This creates a folder named `whylogs_data` in the current directory, containing the CSV files for the Ecommerce Dataset. If the files already exist, the module will not download them again." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Discovering Information" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To see the available versions for a given dataset, call:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "('base',)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Ecommerce.describe_versions()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To get an overall description of the dataset:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Ecommerce Dataset\n", "=================\n", "\n", "The Ecommerce dataset contains transaction information of several products for a popular grocery supermarket in India. It contains features such as the product's description, category, market price and user rating.\n", "\n", "The original data was sourced from Kaggle's [BigBasket Entire Product List](https://www.kaggle.com/datasets/surajjha101/bigbasket-entire-product-list-28k-datapoints). From the source data additional transformations were made, such as: oversampling and feature creation/engineering.\n", "\n", "License:\n", "CC BY-NC-SA 4.0\n", "\n", "Usage\n", "-----\n", "\n", "You can follow this guide to see how to use the ecommerce dataset:\n", "\n", ".. toctree::\n", " :maxdepth: 1\n", "\n", " ../examples/datasets/ecommerce\n", "\n", "\n", "Versions and Data Partitions\n", "----------------------------\n", "\n", "Currently the dataset contains one version: **base**.
The task for the base version is to classify wether an incoming product should be provided a discount, given product features such as history of items sold, user rating, catego\n" ] } ], "source": [ "print(Ecommerce.describe()[:1000])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: the output was truncated to the first 1000 characters, since `describe()` returns a rather lengthy description." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting Baseline Data\n", "\n", "You can access data from two different partitions: the baseline dataset and the inference dataset.\n", "\n", "The baseline can be accessed as a whole, whereas the inference dataset can be accessed in periodic batches whose interval is defined by the user.\n", "\n", "To get a `baseline` object, call `dataset.get_baseline()`:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "from whylogs.datasets import Ecommerce\n", "\n", "# With no arguments, the default \"base\" version is loaded\n", "dataset = Ecommerce()\n", "\n", "baseline = dataset.get_baseline()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`baseline` contains six attributes: one timestamp and five dataframes.\n", "\n", "- timestamp: the batch's timestamp (at the start of the batch)\n", "- data: the complete dataframe\n", "- features: input features\n", "- target: output feature(s)\n", "- prediction: output prediction and, possibly, related features such as uncertainty, confidence or probability\n", "- extra: metadata features that do not fit any of the previous categories but still contain relevant information about the data." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "datetime.datetime(2022, 9, 12, 0, 0, tzinfo=datetime.timezone.utc)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "baseline.timestamp" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n", "<tr style=\"text-align: right;\"><th></th><th>product</th><th>sales_last_week</th><th>market_price</th><th>rating</th><th>category</th></tr>\n", "<tr><th>date</th><th></th><th></th><th></th><th></th><th></th></tr>\n", "</thead>\n", "<tbody>\n", "<tr><th>2022-09-12 00:00:00+00:00</th><td>Wood - Centre Filled Bar Infused With Dark Mou...</td><td>1</td><td>350.0</td><td>4.500000</td><td>Snacks and Branded Foods</td></tr>\n", "<tr><th>2022-09-12 00:00:00+00:00</th><td>Toasted Almonds</td><td>1</td><td>399.0</td><td>3.944479</td><td>Gourmet and World Food</td></tr>\n", "<tr><th>2022-09-12 00:00:00+00:00</th><td>Instant Thai Noodles - Hot &amp; Spicy Tomyum</td><td>1</td><td>95.0</td><td>3.300000</td><td>Gourmet and World Food</td></tr>\n", "<tr><th>2022-09-12 00:00:00+00:00</th><td>Thokku - Vathakozhambu</td><td>1</td><td>336.0</td><td>4.300000</td><td>Snacks and Branded Foods</td></tr>\n", "<tr><th>2022-09-12 00:00:00+00:00</th><td>Beetroot Powder</td><td>1</td><td>150.0</td><td>3.944479</td><td>Gourmet and World Food</td></tr>\n", "</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " product \\\n", "date \n", "2022-09-12 00:00:00+00:00 Wood - Centre Filled Bar Infused With Dark Mou... \n", "2022-09-12 00:00:00+00:00 Toasted Almonds \n", "2022-09-12 00:00:00+00:00 Instant Thai Noodles - Hot & Spicy Tomyum \n", "2022-09-12 00:00:00+00:00 Thokku - Vathakozhambu \n", "2022-09-12 00:00:00+00:00 Beetroot Powder \n", "\n", " sales_last_week market_price rating \\\n", "date \n", "2022-09-12 00:00:00+00:00 1 350.0 4.500000 \n", "2022-09-12 00:00:00+00:00 1 399.0 3.944479 \n", "2022-09-12 00:00:00+00:00 1 95.0 3.300000 \n", "2022-09-12 00:00:00+00:00 1 336.0 4.300000 \n", "2022-09-12 00:00:00+00:00 1 150.0 3.944479 \n", "\n", " category \n", "date \n", "2022-09-12 00:00:00+00:00 Snacks and Branded Foods \n", "2022-09-12 00:00:00+00:00 Gourmet and World Food \n", "2022-09-12 00:00:00+00:00 Gourmet and World Food \n", "2022-09-12 00:00:00+00:00 Snacks and Branded Foods \n", "2022-09-12 00:00:00+00:00 Gourmet and World Food " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "baseline.features.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setting Parameters\n", "\n", "With `set_parameters`, you can specify the timestamps for both the baseline and inference datasets, as well as the inference interval.\n", "\n", "By default, the timestamps are set as:\n", "- Current date for the baseline dataset\n", "- Tomorrow's date for the inference dataset\n", "\n", "These timestamps can be set by the user to any given day, including the dataset's original date.\n", "\n", "The `inference_interval` defines the length of each batch: '1d' means daily batches, while '7d' means weekly batches."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To set the timestamps to the original dataset's dates, set `original` to `True`:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# Currently, the inference interval takes a str in the format \"Xd\", where X is an integer between 1 and 30\n", "dataset.set_parameters(inference_interval=\"1d\", original=True)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "datetime.datetime(2022, 8, 9, 0, 0, tzinfo=datetime.timezone.utc)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "baseline = dataset.get_baseline()\n", "baseline.timestamp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also set the timestamps explicitly with the `baseline_timestamp` and `inference_start_timestamp` parameters, along with the inference interval:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "from datetime import datetime, timezone\n", "\n", "# Timezone-aware datetime in UTC, used for both the baseline and the first inference batch\n", "now = datetime.now(timezone.utc)\n", "dataset.set_parameters(baseline_timestamp=now, inference_start_timestamp=now, inference_interval=\"1d\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> Note that we are passing a timezone-aware datetime in UTC. If a naive datetime (one without time zone information) is passed, the local time zone is assumed, and the timestamp is then converted to the corresponding datetime in UTC. Passing a naive datetime also triggers a warning to let you know about this behavior." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that if both `original` and a timestamp (baseline or inference) are passed simultaneously, the explicitly defined timestamp will be overwritten by the original dataset's timestamp."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting Inference Data #1 - By Date" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can get inference data in two different ways. The first is to specify the exact date you want, which returns a single batch:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "batch = dataset.get_inference_data(target_date=now)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can access the batch's attributes just as shown before for the baseline:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "datetime.datetime(2022, 9, 12, 0, 0, tzinfo=datetime.timezone.utc)" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "batch.timestamp" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n", "<tr style=\"text-align: right;\"><th></th><th>product</th><th>sales_last_week</th><th>market_price</th><th>rating</th><th>category</th><th>category.Baby Care</th><th>category.Bakery, Cakes and Dairy</th><th>category.Beauty and Hygiene</th><th>category.Beverages</th><th>category.Cleaning and Household</th><th>category.Eggs, Meat and Fish</th><th>category.Foodgrains, Oil and Masala</th><th>category.Fruits and Vegetables</th><th>category.Gourmet and World Food</th><th>category.Kitchen, Garden and Pets</th><th>category.Snacks and Branded Foods</th><th>output_discount</th><th>output_prediction</th><th>output_score</th></tr>\n", "<tr><th>date</th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th></tr>\n", "</thead>\n", "<tbody>\n", "<tr><th>2022-09-12 00:00:00+00:00</th><td>1-2-3 Noodles - Veg Masala Flavour</td><td>2</td><td>12.0</td><td>4.200000</td><td>Snacks and Branded Foods</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>1.000000</td></tr>\n", "<tr><th>2022-09-12 00:00:00+00:00</th><td>Jaggery Powder - Organic, Sulphur Free</td><td>1</td><td>280.0</td><td>3.996552</td><td>Gourmet and World Food</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0.571833</td></tr>\n", "<tr><th>2022-09-12 00:00:00+00:00</th><td>Pudding - Assorted</td><td>3</td><td>50.0</td><td>4.400000</td><td>Gourmet and World Food</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0.600000</td></tr>\n", "<tr><th>2022-09-12 00:00:00+00:00</th><td>Perfectly Moist Dark Chocolate Fudge Cake Mix ...</td><td>1</td><td>495.0</td><td>4.000000</td><td>Gourmet and World Food</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0.517833</td></tr>\n", "<tr><th>2022-09-12 00:00:00+00:00</th><td>Pasta/Spaghetti Spoon - Nylon, Silicon Handle,...</td><td>1</td><td>299.0</td><td>3.732046</td><td>Kitchen, Garden and Pets</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>1</td><td>1</td><td>0.950000</td></tr>\n", "<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n", "<tr><th>2022-09-12 00:00:00+00:00</th><td>Premium Fish Fillet</td><td>1</td><td>250.0</td><td>3.931378</td><td>Eggs, Meat and Fish</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0.910000</td></tr>\n", "<tr><th>2022-09-12 00:00:00+00:00</th><td>Organic Fennel &amp; Nut Delight Laddoo - Low Carb...</td><td>1</td><td>499.0</td><td>1.700000</td><td>Snacks and Branded Foods</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0.622333</td></tr>\n", "<tr><th>2022-09-12 00:00:00+00:00</th><td>Steel Storage Deep Dabba/ Container Set With P...</td><td>1</td><td>695.0</td><td>3.600000</td><td>Kitchen, Garden and Pets</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>1</td><td>1</td><td>0.990000</td></tr>\n", "<tr><th>2022-09-12 00:00:00+00:00</th><td>Venezia Large Bowl - Tempered Glass</td><td>2</td><td>495.0</td><td>3.813672</td><td>Kitchen, Garden and Pets</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>1</td><td>0</td><td>0.860000</td></tr>\n", "<tr><th>2022-09-12 00:00:00+00:00</th><td>Cologne - Tattoo For Men</td><td>1</td><td>799.0</td><td>4.000000</td><td>Beauty and Hygiene</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>1</td><td>0.585714</td></tr>\n", "</tbody>\n", "</table>\n", "<p>4133 rows × 19 columns</p>\n", "</div>
" ], "text/plain": [ " product \\\n", "date \n", "2022-09-12 00:00:00+00:00 1-2-3 Noodles - Veg Masala Flavour \n", "2022-09-12 00:00:00+00:00 Jaggery Powder - Organic, Sulphur Free \n", "2022-09-12 00:00:00+00:00 Pudding - Assorted \n", "2022-09-12 00:00:00+00:00 Perfectly Moist Dark Chocolate Fudge Cake Mix ... \n", "2022-09-12 00:00:00+00:00 Pasta/Spaghetti Spoon - Nylon, Silicon Handle,... \n", "... ... \n", "2022-09-12 00:00:00+00:00 Premium Fish Fillet \n", "2022-09-12 00:00:00+00:00 Organic Fennel & Nut Delight Laddoo - Low Carb... \n", "2022-09-12 00:00:00+00:00 Steel Storage Deep Dabba/ Container Set With P... \n", "2022-09-12 00:00:00+00:00 Venezia Large Bowl - Tempered Glass \n", "2022-09-12 00:00:00+00:00 Cologne - Tattoo For Men \n", "\n", " sales_last_week market_price rating \\\n", "date \n", "2022-09-12 00:00:00+00:00 2 12.0 4.200000 \n", "2022-09-12 00:00:00+00:00 1 280.0 3.996552 \n", "2022-09-12 00:00:00+00:00 3 50.0 4.400000 \n", "2022-09-12 00:00:00+00:00 1 495.0 4.000000 \n", "2022-09-12 00:00:00+00:00 1 299.0 3.732046 \n", "... ... ... ... \n", "2022-09-12 00:00:00+00:00 1 250.0 3.931378 \n", "2022-09-12 00:00:00+00:00 1 499.0 1.700000 \n", "2022-09-12 00:00:00+00:00 1 695.0 3.600000 \n", "2022-09-12 00:00:00+00:00 2 495.0 3.813672 \n", "2022-09-12 00:00:00+00:00 1 799.0 4.000000 \n", "\n", " category category.Baby Care \\\n", "date \n", "2022-09-12 00:00:00+00:00 Snacks and Branded Foods 0 \n", "2022-09-12 00:00:00+00:00 Gourmet and World Food 0 \n", "2022-09-12 00:00:00+00:00 Gourmet and World Food 0 \n", "2022-09-12 00:00:00+00:00 Gourmet and World Food 0 \n", "2022-09-12 00:00:00+00:00 Kitchen, Garden and Pets 0 \n", "... ... ... 
\n", "2022-09-12 00:00:00+00:00 Eggs, Meat and Fish 0 \n", "2022-09-12 00:00:00+00:00 Snacks and Branded Foods 0 \n", "2022-09-12 00:00:00+00:00 Kitchen, Garden and Pets 0 \n", "2022-09-12 00:00:00+00:00 Kitchen, Garden and Pets 0 \n", "2022-09-12 00:00:00+00:00 Beauty and Hygiene 0 \n", "\n", " category.Bakery, Cakes and Dairy \\\n", "date \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "... ... \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "\n", " category.Beauty and Hygiene category.Beverages \\\n", "date \n", "2022-09-12 00:00:00+00:00 0 0 \n", "2022-09-12 00:00:00+00:00 0 0 \n", "2022-09-12 00:00:00+00:00 0 0 \n", "2022-09-12 00:00:00+00:00 0 0 \n", "2022-09-12 00:00:00+00:00 0 0 \n", "... ... ... \n", "2022-09-12 00:00:00+00:00 0 0 \n", "2022-09-12 00:00:00+00:00 0 0 \n", "2022-09-12 00:00:00+00:00 0 0 \n", "2022-09-12 00:00:00+00:00 0 0 \n", "2022-09-12 00:00:00+00:00 1 0 \n", "\n", " category.Cleaning and Household \\\n", "date \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "... ... \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "\n", " category.Eggs, Meat and Fish \\\n", "date \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "... ... 
\n", "2022-09-12 00:00:00+00:00 1 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "\n", " category.Foodgrains, Oil and Masala \\\n", "date \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "... ... \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "\n", " category.Fruits and Vegetables \\\n", "date \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "... ... \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "\n", " category.Gourmet and World Food \\\n", "date \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 1 \n", "2022-09-12 00:00:00+00:00 1 \n", "2022-09-12 00:00:00+00:00 1 \n", "2022-09-12 00:00:00+00:00 0 \n", "... ... \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "\n", " category.Kitchen, Garden and Pets \\\n", "date \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 1 \n", "... ... 
\n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 0 \n", "2022-09-12 00:00:00+00:00 1 \n", "2022-09-12 00:00:00+00:00 1 \n", "2022-09-12 00:00:00+00:00 0 \n", "\n", " category.Snacks and Branded Foods output_discount \\\n", "date \n", "2022-09-12 00:00:00+00:00 1 0 \n", "2022-09-12 00:00:00+00:00 0 0 \n", "2022-09-12 00:00:00+00:00 0 0 \n", "2022-09-12 00:00:00+00:00 0 0 \n", "2022-09-12 00:00:00+00:00 0 1 \n", "... ... ... \n", "2022-09-12 00:00:00+00:00 0 1 \n", "2022-09-12 00:00:00+00:00 1 0 \n", "2022-09-12 00:00:00+00:00 0 1 \n", "2022-09-12 00:00:00+00:00 0 1 \n", "2022-09-12 00:00:00+00:00 0 1 \n", "\n", " output_prediction output_score \n", "date \n", "2022-09-12 00:00:00+00:00 0 1.000000 \n", "2022-09-12 00:00:00+00:00 0 0.571833 \n", "2022-09-12 00:00:00+00:00 1 0.600000 \n", "2022-09-12 00:00:00+00:00 1 0.517833 \n", "2022-09-12 00:00:00+00:00 1 0.950000 \n", "... ... ... \n", "2022-09-12 00:00:00+00:00 0 0.910000 \n", "2022-09-12 00:00:00+00:00 0 0.622333 \n", "2022-09-12 00:00:00+00:00 1 0.990000 \n", "2022-09-12 00:00:00+00:00 0 0.860000 \n", "2022-09-12 00:00:00+00:00 1 0.585714 \n", "\n", "[4133 rows x 19 columns]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "batch.data" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n", "<tr style=\"text-align: right;\"><th></th><th>output_prediction</th><th>output_score</th></tr>\n", "<tr><th>date</th><th></th><th></th></tr>\n", "</thead>\n", "<tbody>\n", "<tr><th>2022-09-12 00:00:00+00:00</th><td>0</td><td>1.000000</td></tr>\n", "<tr><th>2022-09-12 00:00:00+00:00</th><td>0</td><td>0.571833</td></tr>\n", "<tr><th>2022-09-12 00:00:00+00:00</th><td>1</td><td>0.600000</td></tr>\n", "<tr><th>2022-09-12 00:00:00+00:00</th><td>1</td><td>0.517833</td></tr>\n", "<tr><th>2022-09-12 00:00:00+00:00</th><td>1</td><td>0.950000</td></tr>\n", "</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " output_prediction output_score\n", "date \n", "2022-09-12 00:00:00+00:00 0 1.000000\n", "2022-09-12 00:00:00+00:00 0 0.571833\n", "2022-09-12 00:00:00+00:00 1 0.600000\n", "2022-09-12 00:00:00+00:00 1 0.517833\n", "2022-09-12 00:00:00+00:00 1 0.950000" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "batch.prediction.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting Inference Data #2 - By Number of Batches\n", "\n", "The second way is to specify the number of batches you want. The batches are consecutive, one per inference interval, starting from the inference start timestamp (tomorrow's date by default, or the value set with `set_parameters`).\n", "\n", "You can iterate over the returned object to get the batches and use each one any way you want. Here's an example that retrieves daily batches for a period of 5 days and logs each one with __whylogs__, saving the binary profiles to disk:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "logging batch of size 4133 for 2022-09-12 00:00:00+00:00\n", "logging batch of size 4193 for 2022-09-13 00:00:00+00:00\n", "logging batch of size 4136 for 2022-09-14 00:00:00+00:00\n", "logging batch of size 4130 for 2022-09-15 00:00:00+00:00\n", "logging batch of size 4131 for 2022-09-16 00:00:00+00:00\n" ] } ], "source": [ "import whylogs as why\n", "\n", "# One batch per inference interval (daily here), starting at the inference start timestamp\n", "batches = dataset.get_inference_data(number_batches=5)\n", "\n", "for batch in batches:\n", "    print(\"logging batch of size {} for {}\".format(len(batch.data), batch.timestamp))\n", "    profile = why.log(batch.data).profile()\n", "    # Tag the profile with the batch's timestamp rather than the time of logging\n", "    profile.set_dataset_timestamp(batch.timestamp)\n", "    # Date-only file name: the full timestamp contains spaces and colons\n", "    profile.view().write(\"batch_{}.bin\".format(batch.timestamp.strftime(\"%Y-%m-%d\")))" ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", 
"pygments_lexer": "ipython3", "version": "3.8.10" }, "orig_nbformat": 4, "vscode": { "interpreter": { "hash": "5dd5901cadfd4b29c2aaf95ecd29c0c3b10829ad94dcfe59437dbee391154aea" } } }, "nbformat": 4, "nbformat_minor": 2 }