{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "013a0fd4-31f9-4f3f-bf3a-1efc9640422f", "metadata": { "id": "013a0fd4-31f9-4f3f-bf3a-1efc9640422f" }, "source": [ "\n", "![alt text](https://whylabs-public.s3.us-west-2.amazonaws.com/assets/whylabs-logo-night-blue.svg)\n", "\n", "*Run AI with Certainty*\n", "\n", "# **Getting Started with WhyLabs** " ] }, { "attachments": {}, "cell_type": "markdown", "id": "dFKGE4P7N06M", "metadata": { "id": "dFKGE4P7N06M" }, "source": [ "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/whylabs/whylogs/blob/mainline/python/examples/integrations/writers/Getting_Started_with_WhyLabsV1.ipynb)\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "7iwPGGTl1Uuz", "metadata": { "id": "7iwPGGTl1Uuz" }, "source": [ "### **Step 1: Create a WhyLabs account** \n", "To use this example notebook, first head to [WhyLabs](https://www.whylabs.ai/free) and sign up for a free account.\n", "\n", "**You can skip the onboarding code example if you are using this notebook.**\n", "\n", "As part of the onboarding workflow, you will receive an **organization ID** for your account. 
This ID uniquely identifies your account.\n", "\n", "You'll also need to create an access token as part of the onboarding flow.\n", "\n", "#### *If you already have a WhyLabs account* \n", "Please go to *Settings* -> *Access Tokens* to generate tokens.\n", "\n", "\n", "\n", "---\n", "\n", "\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "l8o9dKSU1X6H", "metadata": { "id": "l8o9dKSU1X6H" }, "source": [ "### **Step 2: Install whylogs and import dependencies** \n", "To begin, run the cell below to install the **[whylogs](https://github.com/whylabs/whylogs)** library.\n", "\n", "[![License](http://img.shields.io/:license-Apache%202-blue.svg)](https://github.com/whylabs/whylogs-python/blob/mainline/LICENSE)\n", "[![PyPI version](https://badge.fury.io/py/whylogs.svg)](https://badge.fury.io/py/whylogs)\n", "[![Coverage Status](https://coveralls.io/repos/github/whylabs/whylogs/badge.svg?branch=mainline)](https://coveralls.io/github/whylabs/whylogs?branch=mainline)\n", "[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/python/black)\n", "[![CII Best Practices](https://bestpractices.coreinfrastructure.org/projects/4490/badge)](https://bestpractices.coreinfrastructure.org/projects/4490)\n", "[![PyPi Downloads](https://pepy.tech/badge/whylogs)](https://pepy.tech/project/whylogs)\n", "![CI](https://github.com/whylabs/whylogs-python/workflows/whylogs%20CI/badge.svg)\n", "[![Maintainability](https://api.codeclimate.com/v1/badges/442f6ca3dca1e583a488/maintainability)](https://codeclimate.com/github/whylabs/whylogs-python/maintainability)\n", "\n", "The `whylogs` library profiles data in real time, collecting thousands of metrics from structured data, unstructured data, and ML model predictions with zero configuration.\n", "\n", "\n", "This library runs locally on your machine and collects relevant metrics in dataset profiles that can be both logged to disk and uploaded to the WhyLabs Platform for 
monitoring." ] }, { "cell_type": "code", "execution_count": null, "id": "ad907ce3-0c3b-49e4-86f1-eae9de934f7b", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ad907ce3-0c3b-49e4-86f1-eae9de934f7b", "jupyter": { "outputs_hidden": true }, "outputId": "cbf178c0-9028-4002-ae01-568502d30b17", "tags": [] }, "outputs": [], "source": [ "# Note: you may need to restart the kernel to use updated packages.\n", "# The following WhyLabs Platform integration example requires the latest whylogs version:\n", "%pip install whylogs" ] }, { "attachments": {}, "cell_type": "markdown", "id": "a244145c-ea35-4ab6-b03e-cf5f864ed94c", "metadata": { "id": "a244145c-ea35-4ab6-b03e-cf5f864ed94c" }, "source": [ "### **Step 3: Load example data batches**\n", "\n", "The example data is hosted in our public S3 bucket. We have prepared seven example CSV batches for this notebook." ] }, { "cell_type": "code", "execution_count": 1, "id": "b78028ea-c7cb-494f-a303-071f1c345dfc", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "b78028ea-c7cb-494f-a303-071f1c345dfc", "outputId": "6acdedee-c4fd-4377-b525-beeaec390a2e" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_1.csv\n", "Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_2.csv\n", "Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_3.csv\n", "Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_4.csv\n", "Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_5.csv\n", "Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_6.csv\n", "Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_7.csv\n" ] } ], "source": [ "import pandas 
as pd\n", "\n", "pdfs = []\n", "for i in range(1, 8):\n", "    path = f\"https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_{i}.csv\"\n", "    print(f\"Loading data from {path}\")\n", "    df = pd.read_csv(path)\n", "    pdfs.append(df)" ] }, { "cell_type": "code", "execution_count": 2, "id": "67b81ab4-a456-4d2d-9547-ad0d772e0aaa", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 394 }, "id": "67b81ab4-a456-4d2d-9547-ad0d772e0aaa", "outputId": "6c2268d7-43fe-4d1d-a4dc-738db5c747cd" }, "outputs": [ { "data": { "text/html": [ "
\n", "| | Unnamed: 0 | id | member_id | loan_amnt | funded_amnt | funded_amnt_inv | int_rate | installment | annual_inc | desc | ... | hardship_loan_status | orig_projected_additional_accrued_interest | hardship_payoff_balance_amount | hardship_last_payment_amount | debt_settlement_flag_date | settlement_status | settlement_date | settlement_amount | settlement_percentage | settlement_term |\n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| count | 407.000000 | 4.070000e+02 | 0.0 | 407.000000 | 407.000000 | 407.000000 | 407.000000 | 407.000000 | 407.000000 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |\n",
"| mean | 12548.717445 | 1.158631e+08 | NaN | 14203.746929 | 14203.746929 | 14202.948403 | 13.514054 | 418.020344 | 78818.956069 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n",
"| std | 125.354772 | 1.207642e+06 | NaN | 9351.142374 | 9351.142374 | 9350.997874 | 5.446881 | 271.096531 | 55864.939403 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n",
"| min | 12325.000000 | 1.121538e+08 | NaN | 1000.000000 | 1000.000000 | 1000.000000 | 5.320000 | 34.220000 | 0.000000 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n",
"| 25% | 12442.500000 | 1.150769e+08 | NaN | 7000.000000 | 7000.000000 | 7000.000000 | 9.930000 | 235.580000 | 43325.000000 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n",
"| 50% | 12550.000000 | 1.157004e+08 | NaN | 12000.000000 | 12000.000000 | 12000.000000 | 12.620000 | 357.250000 | 63300.000000 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n",
"| 75% | 12653.500000 | 1.168245e+08 | NaN | 20000.000000 | 20000.000000 | 20000.000000 | 16.020000 | 553.515000 | 95000.000000 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n",
"| max | 12862.000000 | 1.181592e+08 | NaN | 40000.000000 | 40000.000000 | 40000.000000 | 30.990000 | 1417.710000 | 495000.000000 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n",
"\n",
"8 rows × 126 columns\n", "