{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "provenance": [] }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "source": [ "Transactions allow several profiles to be committed to WhyLabs as a group. Let's start with some setup.\n" ], "metadata": { "id": "u5FQGlNpNVUX" } }, { "cell_type": "code", "source": [ "!pip install whylogs" ], "metadata": { "id": "rDZLfAYMi7vi" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "import whylogs as why\n", "from whylabs_client.api.transactions_api import TransactionsApi\n", "from whylogs.core.schema import DatasetSchema\n", "from whylogs.core.segmentation_partition import segment_on_column\n", "from whylogs.api.writer.whylabs import WhyLabsWriter, WhyLabsTransaction\n", "import os\n", "from uuid import uuid4\n", "from whylogs.datasets import Ecommerce\n", "import numpy as np\n", "import pandas as pd\n", "from datetime import datetime, timedelta, timezone" ], "metadata": { "id": "3eyEw1UUi_nl" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "os.environ[\"WHYLABS_DEFAULT_ORG_ID\"] = \"org-XXX\"\n", "os.environ[\"WHYLABS_DEFAULT_DATASET_ID\"] = \"model-XXX\"\n", "os.environ[\"WHYLABS_API_KEY\"] = \"XXXX:org-XXX\"" ], "metadata": { "id": "h3Fq8l14XmpA" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Get example dataset" ], "metadata": { "id": "ul-WBntiyS9x" } }, { "cell_type": "code", "source": [ "dataset = Ecommerce()" ], "metadata": { "id": "kbVch_DaySHW" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "daily_batches = dataset.get_inference_data(number_batches=20)\n", "list_daily_batches = list(daily_batches)" ], "metadata": { "id": "G_wcHlmBypVj" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "columns = 
['product','sales_last_week','market_price','rating','category','output_discount','output_prediction','output_score']\n", "\n", "df = list_daily_batches[0].data[columns]" ], "metadata": { "id": "ltwx9rVJyzDc" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "df.head()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 235 }, "id": "Wd0Ms0UDynEi", "outputId": "505fef55-900a-4f85-d86d-a0f48ccb3a69" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " product \\\n", "date \n", "2024-02-23 00:00:00+00:00 1-2-3 Noodles - Veg Masala Flavour \n", "2024-02-23 00:00:00+00:00 Jaggery Powder - Organic, Sulphur Free \n", "2024-02-23 00:00:00+00:00 Pudding - Assorted \n", "2024-02-23 00:00:00+00:00 Perfectly Moist Dark Chocolate Fudge Cake Mix ... \n", "2024-02-23 00:00:00+00:00 Pasta/Spaghetti Spoon - Nylon, Silicon Handle,... \n", "\n", " sales_last_week market_price rating \\\n", "date \n", "2024-02-23 00:00:00+00:00 2 12.0 4.200000 \n", "2024-02-23 00:00:00+00:00 1 280.0 3.996552 \n", "2024-02-23 00:00:00+00:00 3 50.0 4.400000 \n", "2024-02-23 00:00:00+00:00 1 495.0 4.000000 \n", "2024-02-23 00:00:00+00:00 1 299.0 3.732046 \n", "\n", " category output_discount \\\n", "date \n", "2024-02-23 00:00:00+00:00 Snacks and Branded Foods 0 \n", "2024-02-23 00:00:00+00:00 Gourmet and World Food 0 \n", "2024-02-23 00:00:00+00:00 Gourmet and World Food 0 \n", "2024-02-23 00:00:00+00:00 Gourmet and World Food 0 \n", "2024-02-23 00:00:00+00:00 Kitchen, Garden and Pets 1 \n", "\n", " output_prediction output_score \n", "date \n", "2024-02-23 00:00:00+00:00 0 1.000000 \n", "2024-02-23 00:00:00+00:00 0 0.571833 \n", "2024-02-23 00:00:00+00:00 1 0.600000 \n", "2024-02-23 00:00:00+00:00 1 0.517833 \n", "2024-02-23 00:00:00+00:00 1 0.950000 " ], "text/html": [ "\n", "
\n" ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "dataframe", "variable_name": "df", "summary": "{\n \"name\": \"df\",\n \"rows\": 4133,\n \"fields\": [\n {\n \"column\": \"product\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 3074,\n \"samples\": [\n \"Baby Feeding Bottles For Milk & Water With Handle\",\n \"Cucumber Sheet Mask\",\n \"Gomaya Khanda - Desi Cow Dung Cakes For Agnihotra And Pooja Purposes\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"sales_last_week\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1,\n \"min\": 1,\n \"max\": 26,\n \"num_unique_values\": 15,\n \"samples\": [\n 12,\n 11,\n 2\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"market_price\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 654.8267810878871,\n \"min\": 5.0,\n \"max\": 12500.0,\n \"num_unique_values\": 588,\n \"samples\": [\n 81.25,\n 343.0,\n 890.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"rating\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.6451922536814806,\n \"min\": 1.0,\n \"max\": 5.0,\n \"num_unique_values\": 102,\n \"samples\": [\n 3.3,\n 4.029661016949152,\n 4.168953068592058\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"category\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 11,\n \"samples\": [\n \"Foodgrains, Oil and Masala\",\n \"Snacks and Branded Foods\",\n \"Cleaning and Household\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"output_discount\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 0,\n \"max\": 1,\n \"num_unique_values\": 2,\n \"samples\": [\n 1,\n 0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"output_prediction\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n 
\"min\": 0,\n \"max\": 1,\n \"num_unique_values\": 2,\n \"samples\": [\n 1,\n 0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"output_score\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.15383585519854434,\n \"min\": 0.5,\n \"max\": 1.0,\n \"num_unique_values\": 1151,\n \"samples\": [\n 0.795,\n 0.510123015873016\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" } }, "metadata": {}, "execution_count": 21 } ] }, { "cell_type": "markdown", "source": [ "## Writer setup" ], "metadata": { "id": "waeIbbTdukNr" } }, { "cell_type": "code", "source": [ "why.init(force_local=True)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "fsAyYwaVy6a2", "outputId": "389f8828-fd88-4509-fb5d-e08fa822e5ab" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Initializing session with config /root/.config/whylogs/config.ini\n", "\n", "✅ Using session type: LOCAL. Profiles won't be uploaded or written anywhere automatically.\n" ] }, { "output_type": "execute_result", "data": { "text/plain": [ "" ] }, "metadata": {}, "execution_count": 10 } ] }, { "cell_type": "code", "source": [ "writer = WhyLabsWriter()" ], "metadata": { "id": "unQY11ndew5O" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Uploading multiple profiles with different timestamps\n", "`WhyLabsWriter::start_transaction()` signals the start of a transaction. Profiles sent to WhyLabs with `WhyLabsWriter::write()` during the transaction are uploaded to WhyLabs immediately, but won't be processed until `WhyLabsWriter::commit_transaction()` is called." 
], "metadata": { "id": "NDNcskT58rRl" } }, { "cell_type": "code", "source": [ "transaction_id = writer.start_transaction()\n", "print(f\"Started transaction {transaction_id}\")\n", "for i in range(5):\n", "    batch_df = list_daily_batches[i].data[columns]\n", "    profile = why.log(batch_df)\n", "    # Give each profile its own batch timestamp, one day apart\n", "    timestamp = datetime.now(tz=timezone.utc) - timedelta(days=i+1)\n", "    profile.set_dataset_timestamp(timestamp)\n", "    status, id = writer.write(profile)\n", "    print(status, id)\n", "writer.commit_transaction()\n", "print(\"Committed transaction\")" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "8MsbsK0y6Xkf", "outputId": "1cc72c0b-3295-49c2-8f1a-213c0310578f" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Started transaction df4ff687-f881-4633-8a44-3d1c24f631d3\n", "True log-v12vewf7Cu9j3aVV\n", "True log-PWW3D23edKlU0aFt\n", "True log-hDYs8dGamli2LHdq\n", "True log-4JIe3jWBahpMou07\n", "True log-2bmc0Rl3u4oGBIu8\n", "Committed transaction\n" ] } ] }, { "cell_type": "markdown", "source": [ "## Uploading multiple profiles with the same batch timestamp\n", "The `WhyLabsTransaction` context manager starts the transaction on entry and commits it on exit, which can simplify error handling."
], "metadata": { "id": "K7hEBIyuzXh0" } }, { "cell_type": "code", "source": [ "timestamp = datetime.now(tz=timezone.utc) - timedelta(days=2)\n", "timestamp" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "jDaD9rUeujK6", "outputId": "58a2f305-a317-4793-9c09-1ec323ef02ca" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "datetime.datetime(2024, 2, 20, 0, 14, 58, 753029, tzinfo=datetime.timezone.utc)" ] }, "metadata": {}, "execution_count": 13 } ] }, { "cell_type": "code", "source": [ "try:\n", "    with WhyLabsTransaction(writer):\n", "        print(\"Started transaction\")\n", "        for i in range(5):\n", "            batch_df = list_daily_batches[i].data[columns]\n", "            # Profile this iteration's batch; all profiles share one timestamp\n", "            profile = why.log(batch_df)\n", "            profile.set_dataset_timestamp(timestamp)\n", "            status, id = writer.write(profile)\n", "            print(status, id)\n", "except Exception:\n", "    print(\"Transaction failed\")\n", "else:\n", "    # Only reached when the with block exited without raising\n", "    print(\"Committed transaction\")\n" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "QAjpae7K0CR3", "outputId": "728cd3b2-509d-4077-8eea-f37d91115ab9" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Started transaction\n", "True log-yaHvpXyNRO53ilWo\n", "True log-Zsa0lbCCqjzzjLGJ\n", "True log-pg57yHO6RuvO4Q8J\n", "True log-FSYoOwtmE8x51xSr\n", "True log-v3G6VyLUn1x1crVy\n", "Committed transaction\n" ] } ] }, { "cell_type": "markdown", "source": [ "If a `write()` call returns a `False` status, that profile is not included in the transaction. You may want to retry the write; otherwise the profile is left out, while the profiles that were written successfully are still included." ], "metadata": { "id": "PaAQy-RDftXU" } }, { "cell_type": "markdown", "source": [ "## Segmented profiles\n", "\n", "Each segment in a segmented profile is uploaded to WhyLabs in a separate S3 interaction. 
Segmented profiles can be sent as a transaction so that all the segments are committed to WhyLabs at once. In this case, the status returned from `WhyLabsWriter::write()` is the logical AND of the per-segment statuses, and the returned id lists the ids of all the segments.\n" ], "metadata": { "id": "PaolLMia9Zrs" } }, { "cell_type": "code", "source": [ "schema = DatasetSchema(segments=segment_on_column(\"output_discount\"))\n", "profile = why.log(df, schema=schema)\n", "with WhyLabsTransaction(writer):\n", "    status, id = writer.write(profile)\n", "\n", "print(f\"{status} {id}\")" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "d5kNH7eigJBb", "outputId": "9dea6a91-b29f-4a0e-ee9b-e11e00e50664" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "True log-Rhlr7KzY6pp7vla5; log-8ihsF7KAbhAfNFM6\n" ] } ] } ] }