{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ ">### 🚩 *Create a free WhyLabs account to get more value out of whylogs!*
\n", ">*Did you know you can store, visualize, and monitor whylogs profiles with the [WhyLabs Observability Platform](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=Writing_to_WhyLabs)? Sign up for a [free WhyLabs account](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=Writing_to_WhyLabs) to leverage the power of whylogs and WhyLabs together!*" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Writing Profiles to WhyLabs" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/whylabs/whylogs/blob/mainline/python/examples/integrations/writers/Writing_to_WhyLabs.ipynb)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "In this example, we'll show how to send your logged profiles to your monitoring dashboard on the WhyLabs Platform.\n", "We will:\n", "\n", "- Define environment variables with the appropriate credentials and IDs\n", "- Log data into a profile\n", "- Use the WhyLabs Writer to send the profile to your project at WhyLabs" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Installing whylogs\n", "\n", "First, let's install __whylogs__. 
Since we want to write to WhyLabs, we'll also install the __whylabs__ extra.\n", "\n", "If you don't have it installed already, run the cell below:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Note: you may need to restart the kernel to use updated packages.\n", "%pip install whylogs" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## ✔️ Setting the Environment Variables" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "In order to send our profile to WhyLabs, let's first set up an account. You can skip this if you already have an account and a model set up.\n", "\n", "We will need three pieces of information:\n", "\n", "- API token\n", "- Organization ID\n", "- Dataset ID (or model-id)\n", "\n", "Go to https://whylabs.ai/free and grab a free account. You can follow along with the examples if you wish, but if you’re interested in only following this demonstration, you can go ahead and skip the quick start instructions.\n", "\n", "After that, you’ll be prompted to create an API token. Once you create it, copy and store it locally. The second important piece of information here is your org ID. Take note of it as well. After you get your API token and org ID, you can go to https://hub.whylabsapp.com/models to see your projects dashboard. You can create a new project and take note of its ID (if it's a model project, it will look like `model-xxxx`)." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "We'll now set up the credentials. The WhyLabs Writer checks for them, for example as environment variables, when sending profiles to your dashboard; the `why.init()` call below initializes the session that provides them."
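, "\n", "\n",
"As a rough sketch, the three values gathered above could also be exported by hand before initializing whylogs. The variable names `WHYLABS_API_KEY`, `WHYLABS_DEFAULT_ORG_ID` and `WHYLABS_DEFAULT_DATASET_ID` are assumptions to double-check against your whylogs version, and every value is a placeholder to replace with your own:\n",
"\n",
"```python\n",
"import os\n",
"\n",
"# Placeholder values (hypothetical): substitute the ones from your own WhyLabs account.\n",
"os.environ[\"WHYLABS_API_KEY\"] = \"YOUR-API-KEY\"\n",
"os.environ[\"WHYLABS_DEFAULT_ORG_ID\"] = \"org-xxxx\"\n",
"os.environ[\"WHYLABS_DEFAULT_DATASET_ID\"] = \"model-xxxx\"\n",
"\n",
"# Report which variables are set, without printing the secret values themselves.\n",
"for name in (\"WHYLABS_API_KEY\", \"WHYLABS_DEFAULT_ORG_ID\", \"WHYLABS_DEFAULT_DATASET_ID\"):\n",
"    print(name, \"is set\" if os.environ.get(name) else \"is missing\")\n",
"```\n",
"\n",
"Keeping the API token in an environment variable, rather than hard-coding it in shared notebooks, makes it harder to leak by accident."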
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import whylogs as why\n", "import os\n", "\n", "# Optionally you can override the AWS region that profiles get uploaded to when uploading to WhyLabs\n", "# os.environ[\"WHYLABS_UPLOAD_REGION\"] = 'ap-southeast-2'\n", "why.init()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Normally, uploads are done automatically after logging when you use `why.init`, but this notebook demonstrates\n", "manual WhyLabs uploads using the WhyLabs writer API, which you might need if you're doing something custom with\n", "the profiles before they get uploaded. The next cell re-initializes the session with `force_local=True`, which disables automatic uploads." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "why.init(force_local=True, reinit=True)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Fetching the Data" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "For demonstration, let's use data for transactions from a small retail business:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table>\n", "<thead>\n",
"<tr><th></th><th>Transaction ID</th><th>Customer ID</th><th>Quantity</th><th>Item Price</th><th>Total Tax</th><th>Total Amount</th><th>Store Type</th><th>Product Category</th><th>Product Subcategory</th><th>Gender</th><th>Transaction Type</th><th>Age</th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>T14259136777</td><td>C274477</td><td>1</td><td>148.9</td><td>15.6345</td><td>164.5345</td><td>TeleShop</td><td>Electronics</td><td>Audio and video</td><td>F</td><td>Purchase</td><td>37.0</td></tr>\n",
"<tr><th>1</th><td>T7313351894</td><td>C267568</td><td>4</td><td>48.1</td><td>20.2020</td><td>212.6020</td><td>Flagship store</td><td>Home and kitchen</td><td>Furnishing</td><td>M</td><td>Purchase</td><td>25.0</td></tr>\n",
"<tr><th>2</th><td>T37745642681</td><td>C267098</td><td>1</td><td>10.9</td><td>1.1445</td><td>12.0445</td><td>Flagship store</td><td>Footwear</td><td>Mens</td><td>F</td><td>Purchase</td><td>42.0</td></tr>\n",
"<tr><th>3</th><td>T13861409908</td><td>C271608</td><td>2</td><td>135.2</td><td>28.3920</td><td>298.7920</td><td>MBR</td><td>Footwear</td><td>Mens</td><td>F</td><td>Purchase</td><td>43.0</td></tr>\n",
"<tr><th>4</th><td>T58956348529</td><td>C272484</td><td>4</td><td>144.3</td><td>60.6060</td><td>637.8060</td><td>TeleShop</td><td>Clothing</td><td>Mens</td><td>F</td><td>Purchase</td><td>39.0</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " Transaction ID Customer ID Quantity Item Price Total Tax Total Amount \\\n", "0 T14259136777 C274477 1 148.9 15.6345 164.5345 \n", "1 T7313351894 C267568 4 48.1 20.2020 212.6020 \n", "2 T37745642681 C267098 1 10.9 1.1445 12.0445 \n", "3 T13861409908 C271608 2 135.2 28.3920 298.7920 \n", "4 T58956348529 C272484 4 144.3 60.6060 637.8060 \n", "\n", " Store Type Product Category Product Subcategory Gender \\\n", "0 TeleShop Electronics Audio and video F \n", "1 Flagship store Home and kitchen Furnishing M \n", "2 Flagship store Footwear Mens F \n", "3 MBR Footwear Mens F \n", "4 TeleShop Clothing Mens F \n", "\n", " Transaction Type Age \n", "0 Purchase 37.0 \n", "1 Purchase 25.0 \n", "2 Purchase 42.0 \n", "3 Purchase 43.0 \n", "4 Purchase 39.0 " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "csv_url = \"https://whylabs-public.s3.us-west-2.amazonaws.com/datasets/tour/current.csv\"\n", "df = pd.read_csv(csv_url)\n", "\n", "df.head()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## 📊 Profiling the Data" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Let's profile the data with whylogs:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "from datetime import datetime, timezone\n", "\n", "profile = why.log(df, dataset_timestamp=datetime.now(tz=timezone.utc)).profile()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "We're also setting the profile's dataset timestamp as the current datetime. If this is not set, the Writer would simply assign the current datetime automatically to the profile.\n", "\n", "You can set the timestamp to past dates, if you want to backfill data into your dashboard. You should see data for the last 14 days at your dashboard within seconds or minutes once you send it." 
] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## ✍️ The WhyLabs Writer" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Now, you can simply create a WhyLabsWriter object and use it to send your profiles, like this:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(True, 'log-SluRuCMaglUHtWln')" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from whylogs.api.writer.whylabs import WhyLabsWriter\n", "\n", "# The writer uses the same credentials that you set up in why.init\n", "writer = WhyLabsWriter()\n", "writer.write(profile)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "A `200` response means the upload went through. The returned status is a tuple: a boolean indicating whether the write succeeded, followed by the profile's reference on WhyLabs on success or an error string otherwise.\n", "\n", "The writer expects a __Profile View__ as a parameter." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Option #2: Profile Result writer" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "A second way to write to WhyLabs is by directly using the `writer` method of a __Profile Result__ set, like this:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[(True, 'log-iPA3fD7K3IAXqwZk')]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "profile_results = why.log(df)\n", "profile_results.writer(\"whylabs\").write()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## 🔍 A Look on the Other Side" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Now, check your dashboard to verify everything went ok. 
In the __Profile__ tab, you should see something like this:" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "![alt text](images/whylabs.png)\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Going from One to Many Profiles" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "This was a simple example of how you can send a single profile to WhyLabs.\n", "\n", "To better experience WhyLabs' features, try sending multiple profiles for a number of days to visualize how your features' metrics change over time!\n", "To do that, follow the same steps shown here, but set the dataset timestamp to past days for backfill.\n", "\n", "The following code snippet is a very simple way to do that. We're just tweaking the code from the previous sections in order to split the dataframe into seven chunks and log each chunk for a given day:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from datetime import datetime, timedelta, timezone\n", "from whylogs.api.writer.whylabs import WhyLabsWriter\n", "\n", "writer = WhyLabsWriter()\n", "\n", "df_splits = np.array_split(df, 7)\n", "\n", "# This will log one data split per day for the last 7 days, starting with the current date and finishing 6 days in the past\n", "for day, df_split in enumerate(df_splits):\n", "    timestamp = datetime.now(tz=timezone.utc) - timedelta(days=day)\n", "    # Profile only this day's chunk, not the whole dataframe\n", "    profile = why.log(df_split, dataset_timestamp=timestamp).profile()\n", "    writer.write(profile.view())\n", "    print(\"Logged profile for {}\".format(timestamp))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "This should give you a quick way to look at how your features can be monitored over time!\n", "\n", "### Logging in a production service with a rolling logger\n", "In a production service you can profile inline using a rolling logger as your telemetry gathering agent, which will 
aggregate the statistics over many messages or batches of your data, and the agent can be configured to write profiles at a configured interval to one or more destinations such as WhyLabs, local disk, S3, etc.\n", "\n", "Below we will use this same dataframe and split it into 10 pieces to mimic small batches of inputs in a service, and log these with a rolling logger configured to write to WhyLabs every 5 minutes. To make it easier to see what's going on, we also provide a callback on serialization, which gives access to the writer, the profile written, and a status." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from whylogs.api.writer.whylabs import WhyLabsWriter\n", "from whylogs.api.writer.writer import Writer\n", "from whylogs.core.view.dataset_profile_view import DatasetProfileView\n", "\n", "\n", "# step 0 - set up a serialization callback handler to debug/trace,\n", "# replace the print with your own application logging\n", "def upload_callback(writer: Writer, profile: DatasetProfileView, filename: str):\n", "    print(f\"Uploaded with {writer}, profile with timestamp: {profile.dataset_timestamp} and filename {filename}\")\n", "\n", "# step 1 - set up the telemetry agent as a rolling logger and use the WhyLabs writer\n", "telemetry_agent = why.logger(mode='rolling', interval=5, when=\"M\", callback=upload_callback)\n", "whylabs_writer = WhyLabsWriter()\n", "telemetry_agent.append_writer(writer=whylabs_writer)\n", "# you can also append writers by name with default configuration using env variables,\n", "# below we append the default local disk profile writer for debug purposes,\n", "# but it is easier to write only to WhyLabs in a production setup.\n", "telemetry_agent.append_writer(name='local')\n", "\n", "# step 2 - some fake data in batches, typically this would be service traffic or batches of your data in a map/reduce setup.\n", "total_batches = 10\n", "df_splits = np.array_split(df, 
total_batches)\n", "\n", "# step 3 - log the fake data in batches, this loop will log each batch of the split, but won't create more than one profile.\n", "for batch, df_split in enumerate(df_splits):\n", "    print(f\"profiling batch: {batch} out of {total_batches} batches of data\")\n", "    telemetry_agent.log(df_split)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At this point there should be several batches of data profiled, but there is still a single compact dataset profile describing all this data. If we waited for the interval to expire (5 minutes after instantiating the telemetry agent), we should see an upload get triggered and our `upload_callback` called. The rolling logger also has a `close()` method, so you can flush any pending profile data during an application shutdown, ahead of the interval's deadline." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "telemetry_agent.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You now have a local copy of the profile that you can inspect. It is also uploaded to your WhyLabs account, where it will be merged with other profiles in the same monitoring time window (e.g. daily or hourly) based on the profile's timestamp." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!ls *.bin" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.8.10 ('.venv': poetry)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "orig_nbformat": 4, "vscode": { "interpreter": { "hash": "d39f874c9b8a97550ecbd783714b95e79c9b905449b34f44c40e3bf053b54b41" } } }, "nbformat": 4, "nbformat_minor": 2 }