{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "013a0fd4-31f9-4f3f-bf3a-1efc9640422f", "metadata": { "id": "013a0fd4-31f9-4f3f-bf3a-1efc9640422f" }, "source": [ "\n", "![alt text](https://whylabs-public.s3.us-west-2.amazonaws.com/assets/whylabs-logo-night-blue.svg)\n", "\n", "*Run AI with Certainty*\n", "\n", "# **Getting Started with WhyLabs** " ] }, { "attachments": {}, "cell_type": "markdown", "id": "dFKGE4P7N06M", "metadata": { "id": "dFKGE4P7N06M" }, "source": [ "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/whylabs/whylogs/blob/mainline/python/examples/integrations/writers/Getting_Started_with_WhyLabsV1.ipynb)\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "7iwPGGTl1Uuz", "metadata": { "id": "7iwPGGTl1Uuz" }, "source": [ "### 🚩 **Step 1: Create a WhyLabs account** \n", "To use this example notebook, you'll first need to head to [WhyLabs](https://www.whylabs.ai/free) and sign up for a free account.\n", "\n", "**You can skip the onboarding code example if you are using this notebook.**\n", "\n", "As part of the onboarding workflow, you will receive an **organization ID** for your account. 
This is the identifier for your account.\n", "\n", "You'll also need to create an access token as part of the onboarding flow.\n", "\n", "#### 🔑 *If you already have a WhyLabs account* \n", "Please go to *Settings* -> *Access Tokens* to generate tokens.\n", "\n", "\n", "\n", "---\n", "\n", "\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "l8o9dKSU1X6H", "metadata": { "id": "l8o9dKSU1X6H" }, "source": [ "### 🛠 **Step 2: Install whylogs and import dependencies** \n", "To begin, run the cell below to install the **[whylogs](https://github.com/whylabs/whylogs)** library.\n", "\n", "[![License](http://img.shields.io/:license-Apache%202-blue.svg)](https://github.com/whylabs/whylogs-python/blob/mainline/LICENSE)\n", "[![PyPI version](https://badge.fury.io/py/whylogs.svg)](https://badge.fury.io/py/whylogs)\n", "[![Coverage Status](https://coveralls.io/repos/github/whylabs/whylogs/badge.svg?branch=mainline)](https://coveralls.io/github/whylabs/whylogs?branch=mainline)\n", "[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/python/black)\n", "[![CII Best Practices](https://bestpractices.coreinfrastructure.org/projects/4490/badge)](https://bestpractices.coreinfrastructure.org/projects/4490)\n", "[![PyPi Downloads](https://pepy.tech/badge/whylogs)](https://pepy.tech/project/whylogs)\n", "![CI](https://github.com/whylabs/whylogs-python/workflows/whylogs%20CI/badge.svg)\n", "[![Maintainability](https://api.codeclimate.com/v1/badges/442f6ca3dca1e583a488/maintainability)](https://codeclimate.com/github/whylabs/whylogs-python/maintainability)\n", "\n", "✅ The `whylogs` library profiles data in real time, collecting thousands of metrics from structured data, unstructured data, and ML model predictions with zero configuration.\n", "\n", "\n", "✅ This library runs locally on your machine and collects relevant metrics in dataset profiles that can both be logged to disk and uploaded to the WhyLabs Platform for 
monitoring." ] }, { "cell_type": "code", "execution_count": null, "id": "ad907ce3-0c3b-49e4-86f1-eae9de934f7b", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ad907ce3-0c3b-49e4-86f1-eae9de934f7b", "jupyter": { "outputs_hidden": true }, "outputId": "cbf178c0-9028-4002-ae01-568502d30b17", "tags": [] }, "outputs": [], "source": [ "# Note: you may need to restart the kernel to use updated packages.\n", "### The following WhyLabs Platform integration example requires the latest whylogs version: \n", "%pip install whylogs" ] }, { "attachments": {}, "cell_type": "markdown", "id": "a244145c-ea35-4ab6-b03e-cf5f864ed94c", "metadata": { "id": "a244145c-ea35-4ab6-b03e-cf5f864ed94c" }, "source": [ "### 📝 **Step 3: Load example data batches**\n", "\n", "The example data is hosted in our public S3 bucket. We have prepared a few example CSV batches for this walkthrough." ] }, { "cell_type": "code", "execution_count": 2, "id": "b78028ea-c7cb-494f-a303-071f1c345dfc", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "b78028ea-c7cb-494f-a303-071f1c345dfc", "outputId": "6acdedee-c4fd-4377-b525-beeaec390a2e" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_1.csv\n", "Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_2.csv\n", "Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_3.csv\n", "Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_4.csv\n", "Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_5.csv\n", "Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_6.csv\n", "Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_7.csv\n" ] } ], "source": [ "import pandas 
as pd\n", "\n", "pdfs = []\n", "for i in range(1, 8):\n", "    path = f\"https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_{i}.csv\"\n", "    print(f\"Loading data from {path}\")\n", "    df = pd.read_csv(path)\n", "    pdfs.append(df)" ] }, { "cell_type": "code", "execution_count": 3, "id": "67b81ab4-a456-4d2d-9547-ad0d772e0aaa", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 394 }, "id": "67b81ab4-a456-4d2d-9547-ad0d772e0aaa", "outputId": "6c2268d7-43fe-4d1d-a4dc-738db5c747cd" }, "outputs": [ { "data": { "text/html": [ "\n",
"| | Unnamed: 0 | id | member_id | loan_amnt | funded_amnt | funded_amnt_inv | int_rate | installment | annual_inc | desc | ... | hardship_loan_status | orig_projected_additional_accrued_interest | hardship_payoff_balance_amount | hardship_last_payment_amount | debt_settlement_flag_date | settlement_status | settlement_date | settlement_amount | settlement_percentage | settlement_term |\n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| count | 407.000000 | 4.070000e+02 | 0.0 | 407.000000 | 407.000000 | 407.000000 | 407.000000 | 407.000000 | 407.000000 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |\n",
"| mean | 12548.717445 | 1.158631e+08 | NaN | 14203.746929 | 14203.746929 | 14202.948403 | 13.514054 | 418.020344 | 78818.956069 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n",
"| std | 125.354772 | 1.207642e+06 | NaN | 9351.142374 | 9351.142374 | 9350.997874 | 5.446881 | 271.096531 | 55864.939403 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n",
"| min | 12325.000000 | 1.121538e+08 | NaN | 1000.000000 | 1000.000000 | 1000.000000 | 5.320000 | 34.220000 | 0.000000 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n",
"| 25% | 12442.500000 | 1.150769e+08 | NaN | 7000.000000 | 7000.000000 | 7000.000000 | 9.930000 | 235.580000 | 43325.000000 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n",
"| 50% | 12550.000000 | 1.157004e+08 | NaN | 12000.000000 | 12000.000000 | 12000.000000 | 12.620000 | 357.250000 | 63300.000000 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n",
"| 75% | 12653.500000 | 1.168245e+08 | NaN | 20000.000000 | 20000.000000 | 20000.000000 | 16.020000 | 553.515000 | 95000.000000 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n",
"| max | 12862.000000 | 1.181592e+08 | NaN | 40000.000000 | 40000.000000 | 40000.000000 | 30.990000 | 1417.710000 | 495000.000000 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n",
"\n",
"8 rows × 126 columns\n", "