{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "provenance": [] }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "source": [ "Transactions allow several profiles to be commited to WhyLabs as a group. Let's start with some setup.\n" ], "metadata": { "id": "u5FQGlNpNVUX" } }, { "cell_type": "code", "source": [ "!pip install whylogs" ], "metadata": { "id": "rDZLfAYMi7vi" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "import whylogs as why\n", "from whylabs_client.api.transactions_api import TransactionsApi\n", "from whylogs.core.schema import DatasetSchema\n", "from whylogs.core.segmentation_partition import segment_on_column\n", "from whylogs.api.writer.whylabs import WhyLabsWriter\n", "from whylogs.api.writer.whylabs_transaction_writer import WhyLabsTransactionWrirter\n", "import os\n", "from uuid import uuid4\n", "from whylogs.datasets import Ecommerce\n", "import numpy as np\n", "import pandas as pd\n", "from datetime import datetime, timedelta, timezone" ], "metadata": { "id": "3eyEw1UUi_nl" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "os.environ[\"WHYLABS_DEFAULT_ORG_ID\"] = \"org-XXX\"\n", "os.environ[\"WHYLABS_DEFAULT_DATASET_ID\"] = \"model-XXX\"\n", "os.environ[\"WHYLABS_API_KEY\"] = \"XXXX:org-XXX\"" ], "metadata": { "id": "h3Fq8l14XmpA" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Get example dataset" ], "metadata": { "id": "ul-WBntiyS9x" } }, { "cell_type": "code", "source": [ "dataset = Ecommerce()" ], "metadata": { "id": "kbVch_DaySHW" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "daily_batches = dataset.get_inference_data(number_batches=20)\n", "list_daily_batches = list(daily_batches)" ], "metadata": { "id": "G_wcHlmBypVj" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "columns = ['product','sales_last_week','market_price','rating','category','output_discount','output_prediction','output_score']\n", "\n", "df = list_daily_batches[0].data[columns]" ], "metadata": { "id": "ltwx9rVJyzDc" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "df.head()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 235 }, "id": "Wd0Ms0UDynEi", "outputId": "505fef55-900a-4f85-d86d-a0f48ccb3a69" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " product \\\n", "date \n", "2024-02-23 00:00:00+00:00 1-2-3 Noodles - Veg Masala Flavour \n", "2024-02-23 00:00:00+00:00 Jaggery Powder - Organic, Sulphur Free \n", "2024-02-23 00:00:00+00:00 Pudding - Assorted \n", "2024-02-23 00:00:00+00:00 Perfectly Moist Dark Chocolate Fudge Cake Mix ... \n", "2024-02-23 00:00:00+00:00 Pasta/Spaghetti Spoon - Nylon, Silicon Handle,... \n", "\n", " sales_last_week market_price rating \\\n", "date \n", "2024-02-23 00:00:00+00:00 2 12.0 4.200000 \n", "2024-02-23 00:00:00+00:00 1 280.0 3.996552 \n", "2024-02-23 00:00:00+00:00 3 50.0 4.400000 \n", "2024-02-23 00:00:00+00:00 1 495.0 4.000000 \n", "2024-02-23 00:00:00+00:00 1 299.0 3.732046 \n", "\n", " category output_discount \\\n", "date \n", "2024-02-23 00:00:00+00:00 Snacks and Branded Foods 0 \n", "2024-02-23 00:00:00+00:00 Gourmet and World Food 0 \n", "2024-02-23 00:00:00+00:00 Gourmet and World Food 0 \n", "2024-02-23 00:00:00+00:00 Gourmet and World Food 0 \n", "2024-02-23 00:00:00+00:00 Kitchen, Garden and Pets 1 \n", "\n", " output_prediction output_score \n", "date \n", "2024-02-23 00:00:00+00:00 0 1.000000 \n", "2024-02-23 00:00:00+00:00 0 0.571833 \n", "2024-02-23 00:00:00+00:00 1 0.600000 \n", "2024-02-23 00:00:00+00:00 1 0.517833 \n", "2024-02-23 00:00:00+00:00 1 0.950000 " ], "text/html": [ "\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
productsales_last_weekmarket_priceratingcategoryoutput_discountoutput_predictionoutput_score
date
2024-02-23 00:00:00+00:001-2-3 Noodles - Veg Masala Flavour212.04.200000Snacks and Branded Foods001.000000
2024-02-23 00:00:00+00:00Jaggery Powder - Organic, Sulphur Free1280.03.996552Gourmet and World Food000.571833
2024-02-23 00:00:00+00:00Pudding - Assorted350.04.400000Gourmet and World Food010.600000
2024-02-23 00:00:00+00:00Perfectly Moist Dark Chocolate Fudge Cake Mix ...1495.04.000000Gourmet and World Food010.517833
2024-02-23 00:00:00+00:00Pasta/Spaghetti Spoon - Nylon, Silicon Handle,...1299.03.732046Kitchen, Garden and Pets110.950000
\n", "
\n", "
\n", "\n", "
\n", " \n", "\n", " \n", "\n", " \n", "
\n", "\n", "\n", "
\n", " \n", "\n", "\n", "\n", " \n", "
\n", "
\n", "
\n" ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "dataframe", "variable_name": "df", "summary": "{\n \"name\": \"df\",\n \"rows\": 4133,\n \"fields\": [\n {\n \"column\": \"product\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 3074,\n \"samples\": [\n \"Baby Feeding Bottles For Milk & Water With Handle\",\n \"Cucumber Sheet Mask\",\n \"Gomaya Khanda - Desi Cow Dung Cakes For Agnihotra And Pooja Purposes\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"sales_last_week\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1,\n \"min\": 1,\n \"max\": 26,\n \"num_unique_values\": 15,\n \"samples\": [\n 12,\n 11,\n 2\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"market_price\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 654.8267810878871,\n \"min\": 5.0,\n \"max\": 12500.0,\n \"num_unique_values\": 588,\n \"samples\": [\n 81.25,\n 343.0,\n 890.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"rating\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.6451922536814806,\n \"min\": 1.0,\n \"max\": 5.0,\n \"num_unique_values\": 102,\n \"samples\": [\n 3.3,\n 4.029661016949152,\n 4.168953068592058\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"category\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 11,\n \"samples\": [\n \"Foodgrains, Oil and Masala\",\n \"Snacks and Branded Foods\",\n \"Cleaning and Household\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"output_discount\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 0,\n \"max\": 1,\n \"num_unique_values\": 2,\n \"samples\": [\n 1,\n 0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"output_prediction\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 0,\n \"max\": 1,\n \"num_unique_values\": 2,\n \"samples\": [\n 1,\n 0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"output_score\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.15383585519854434,\n \"min\": 0.5,\n \"max\": 1.0,\n \"num_unique_values\": 1151,\n \"samples\": [\n 0.795,\n 0.510123015873016\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" } }, "metadata": {}, "execution_count": 21 } ] }, { "cell_type": "markdown", "source": [ "## Writer setup" ], "metadata": { "id": "waeIbbTdukNr" } }, { "cell_type": "code", "source": [ "why.init(force_local=True)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "fsAyYwaVy6a2", "outputId": "389f8828-fd88-4509-fb5d-e08fa822e5ab" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Initializing session with config /root/.config/whylogs/config.ini\n", "\n", "✅ Using session type: LOCAL. Profiles won't be uploaded or written anywhere automatically.\n" ] }, { "output_type": "execute_result", "data": { "text/plain": [ "" ] }, "metadata": {}, "execution_count": 10 } ] }, { "cell_type": "code", "source": [ "writer = WhyLabsWriter()" ], "metadata": { "id": "unQY11ndew5O" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Uploading multiple profiles with different timestamps\n", "`WhyLabsWriter::start_transaction()` signals the start of a transaction. Profiles sent to WhyLabs with `WhyLabsWriter::write()` during the transaction are uploaded to WhyLabs immediately, but won't be processed until `WhyLabsWriter::commit_transaction()` is called." ], "metadata": { "id": "NDNcskT58rRl" } }, { "cell_type": "code", "source": [ "transaction_id = writer.start_transaction()\n", "print(f\"Started transaction {transaction_id}\")\n", "for i in range(5):\n", " batch_df = list_daily_batches[i].data[columns]\n", " profile = why.log(batch_df)\n", " timestamp = datetime.now(tz=timezone.utc) - timedelta(days=i+1)\n", " profile.set_dataset_timestamp(timestamp)\n", " status, id = writer.write(profile)\n", " print(status, id)\n", "writer.commit_transaction()\n", "print(\"Commiting transaction\")" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "8MsbsK0y6Xkf", "outputId": "1cc72c0b-3295-49c2-8f1a-213c0310578f" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Started transaction df4ff687-f881-4633-8a44-3d1c24f631d3\n", "True log-v12vewf7Cu9j3aVV\n", "True log-PWW3D23edKlU0aFt\n", "True log-hDYs8dGamli2LHdq\n", "True log-4JIe3jWBahpMou07\n", "True log-2bmc0Rl3u4oGBIu8\n", "Commiting transaction\n" ] } ] }, { "cell_type": "markdown", "source": [ "## Uploading multiple profiles with the same batch timestamp\n", "The `WhyLabsTransactionWriter` can be used as a context manager to simplify transaction error handling and ensure `commit_transaction()` is called." ], "metadata": { "id": "K7hEBIyuzXh0" } }, { "cell_type": "code", "source": [ "timestamp = datetime.now(tz=timezone.utc) - timedelta(days=2)\n", "timestamp" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "jDaD9rUeujK6", "outputId": "58a2f305-a317-4793-9c09-1ec323ef02ca" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "datetime.datetime(2024, 2, 20, 0, 14, 58, 753029, tzinfo=datetime.timezone.utc)" ] }, "metadata": {}, "execution_count": 13 } ] }, { "cell_type": "code", "source": [ "try:\n", " with WhyLabsTransactionWriter() as writer:\n", " print(\"Started transaction\")\n", " for i in range(5):\n", " batch_df = list_daily_batches[i].data[columns]\n", " profile = why.log(df)\n", " profile.set_dataset_timestamp(timestamp)\n", " status, id = writer.write(profile)\n", " print(status, id)\n", "except Exception:\n", " print(\"Transaction failed\")\n", "\n", "print(\"Committed transaction\")\n" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "QAjpae7K0CR3", "outputId": "728cd3b2-509d-4077-8eea-f37d91115ab9" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Started transaction\n", "True log-yaHvpXyNRO53ilWo\n", "True log-Zsa0lbCCqjzzjLGJ\n", "True log-pg57yHO6RuvO4Q8J\n", "True log-FSYoOwtmE8x51xSr\n", "True log-v3G6VyLUn1x1crVy\n", "Committed transaction\n" ] } ] }, { "cell_type": "markdown", "source": [ "If a `write()` call during the transaction fails (returns a `False` status), the transaction's commit will fail raising an exception." ], "metadata": { "id": "PaAQy-RDftXU" } }, { "cell_type": "markdown", "source": [ "## Segmented profiles\n", "\n", "Each segment in a segmneted profile get uploaded to WhyLabs in a separate S3 interaction. Segmented profiles can be sent as a transaction so that all the segments are committed to WhyLabs at once. In this case, the status returned from `WhyLabsWriter::write()` is the logical and of the statuses of each segment, and it returns a list of all the segmented ids.\n" ], "metadata": { "id": "PaolLMia9Zrs" } }, { "cell_type": "code", "source": [ "schema = DatasetSchema(segments=segment_on_column(\"output_discount\"))\n", "profile = why.log(df, schema=schema)\n", "with WhyLabsTransactionWriter() as writer:\n", " status, id = writer.write(profile)\n", "\n", "print(f\"{status} {id}\")" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "d5kNH7eigJBb", "outputId": "9dea6a91-b29f-4a0e-ee9b-e11e00e50664" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "True log-Rhlr7KzY6pp7vla5; log-8ihsF7KAbhAfNFM6\n" ] } ] } ] }