
![WhyLabs logo](https://whylabs-public.s3.us-west-2.amazonaws.com/assets/whylabs-logo-night-blue.svg)

*Run AI with Certainty*

# **Getting Started with WhyLabs** 

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/whylabs/whylogs/blob/mainline/python/examples/integrations/writers/Getting_Started_with_WhyLabsV1.ipynb)


### 🚩 **Step 1: Create a WhyLabs account**
To use this example notebook, first head to [WhyLabs](https://www.whylabs.ai/free) and sign up for a free account.

**You can skip the onboarding code example if you are using this notebook.**

As part of the onboarding workflow, you will receive an **organization ID**, which identifies your account.

You'll also need to create an access token during the onboarding flow.

#### 🔑 *If you already have a WhyLabs account*
Go to *Settings* -> *Access Tokens* to generate a token.



---




### 🛠 **Step 2: Install whylogs and import dependencies**
To begin, run the cell below to install the **[whylogs](https://github.com/whylabs/whylogs)** library.

[![License](http://img.shields.io/:license-Apache%202-blue.svg)](https://github.com/whylabs/whylogs-python/blob/mainline/LICENSE)
[![PyPI version](https://badge.fury.io/py/whylogs.svg)](https://badge.fury.io/py/whylogs)
[![Coverage Status](https://coveralls.io/repos/github/whylabs/whylogs/badge.svg?branch=mainline)](https://coveralls.io/github/whylabs/whylogs?branch=mainline)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/python/black)
[![CII Best Practices](https://bestpractices.coreinfrastructure.org/projects/4490/badge)](https://bestpractices.coreinfrastructure.org/projects/4490)
[![PyPi Downloads](https://pepy.tech/badge/whylogs)](https://pepy.tech/project/whylogs)
![CI](https://github.com/whylabs/whylogs-python/workflows/whylogs%20CI/badge.svg)
[![Maintainability](https://api.codeclimate.com/v1/badges/442f6ca3dca1e583a488/maintainability)](https://codeclimate.com/github/whylabs/whylogs-python/maintainability)

✅ The `whylogs` library profiles data in real time, collecting thousands of metrics from structured data, unstructured data, and ML model predictions with zero configuration.

✅ This library runs locally on your machine and collects relevant metrics in dataset profiles that can both be logged to disk and uploaded to the WhyLabs Platform for monitoring.

In [None]:
# Note: you may need to restart the kernel to use updated packages.
# The WhyLabs Platform integration example below requires whylogs 1.5.0 or later.
%pip install 'whylogs>=1.5.0'
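After installing (and restarting the kernel if needed), it can help to confirm which version the kernel actually sees. The snippet below is a minimal sketch using only the standard library; `installed_version` and `meets_minimum` are hypothetical helper names introduced here, and the comparison is simplified to the first three numeric components of the version string.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string for `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def meets_minimum(found: str, minimum: str) -> bool:
    """Compare the first three numeric components of two version strings."""

    def parse(text: str):
        numbers = [int(n) for n in re.findall(r"\d+", text)[:3]]
        # Pad short versions such as "1.5" to three components.
        return tuple(numbers + [0] * (3 - len(numbers)))

    return parse(found) >= parse(minimum)


found = installed_version("whylogs")
if found is None:
    print("whylogs is not installed in this kernel")
else:
    print(f"whylogs {found} installed; meets 1.5.0 minimum: {meets_minimum(found, '1.5.0')}")
```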

### 📝 **Step 3: Load example data batches**

The example data lives in our public S3 bucket as a set of CSV files, one per batch. The cell below loads seven of them into pandas DataFrames.

In [1]:
import pandas as pd

pdfs = []
for i in range(1, 8):
    path = f"https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_{i}.csv"
    print(f"Loading data from {path}")
    df = pd.read_csv(path)
    pdfs.append(df)

Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_1.csv
Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_2.csv
Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_3.csv
Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_4.csv
Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_5.csv
Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_6.csv
Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_7.csv


In [2]:
pdfs[0].describe()

Unnamed: 0.1,Unnamed: 0,id,member_id,loan_amnt,funded_amnt,funded_amnt_inv,int_rate,installment,annual_inc,desc,...,hardship_loan_status,orig_projected_additional_accrued_interest,hardship_payoff_balance_amount,hardship_last_payment_amount,debt_settlement_flag_date,settlement_status,settlement_date,settlement_amount,settlement_percentage,settlement_term
count,407.0,407.0,0.0,407.0,407.0,407.0,407.0,407.0,407.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
mean,12548.717445,115863100.0,,14203.746929,14203.746929,14202.948403,13.514054,418.020344,78818.956069,,...,,,,,,,,,,
std,125.354772,1207642.0,,9351.142374,9351.142374,9350.997874,5.446881,271.096531,55864.939403,,...,,,,,,,,,,
min,12325.0,112153800.0,,1000.0,1000.0,1000.0,5.32,34.22,0.0,,...,,,,,,,,,,
25%,12442.5,115076900.0,,7000.0,7000.0,7000.0,9.93,235.58,43325.0,,...,,,,,,,,,,
50%,12550.0,115700400.0,,12000.0,12000.0,12000.0,12.62,357.25,63300.0,,...,,,,,,,,,,
75%,12653.5,116824500.0,,20000.0,20000.0,20000.0,16.02,553.515,95000.0,,...,,,,,,,,,,
max,12862.0,118159200.0,,40000.0,40000.0,40000.0,30.99,1417.71,495000.0,,...,,,,,,,,,,


### ⚙️ **Step 4: Configure whylogs**

By default, `whylogs` does not send statistics to WhyLabs.

A few small setup steps are needed first. If you don't have an access key yet, onboard with WhyLabs and generate an API key at https://hub.whylabsapp.com/settings/access-tokens.

**WhyLabs only requires whylogs profiles - your raw data never leaves your machine.**

In [3]:
import whylogs as why

# Create a model in the dashboard and use that model ID as the default dataset ID in the prompt here. It will be
# saved in your whylogs config for future use. You can optionally supply reinit=True to reset your config.
why.init(upload_on_log=True)

❓ What kind of session do you want to use?
 ⤷ 1. WhyLabs. Use an api key to upload to WhyLabs.
 ⤷ 2. WhyLabs Anonymous. Upload data anonymously to WhyLabs and get a viewing url.

Initializing session with config /home/jamie/.config/whylogs/config.ini

✅ Using session type: WHYLABS_ANONYMOUS
 ⤷ session id: <will be generated before upload>


<whylogs.api.whylabs.session.session.GuestSession at 0x7fbf885ff880>

You can also run this initialization from the command line:

```bash
python -m whylogs.api.whylabs.session.why_init
```

You can use this to reset your config if you want to change your API key or default dataset ID.
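For non-interactive environments such as scheduled jobs, credentials are commonly supplied through environment variables instead of the prompt. The sketch below assumes the variable names `WHYLABS_DEFAULT_ORG_ID`, `WHYLABS_DEFAULT_DATASET_ID`, and `WHYLABS_API_KEY`; check the whylogs documentation for your version before relying on them. The helper `configure_whylabs_env` and all values shown are hypothetical placeholders.

```python
import os


def configure_whylabs_env(org_id: str, dataset_id: str, api_key: str) -> None:
    """Export WhyLabs credentials for the current process only.

    The variable names are assumptions based on common whylogs usage;
    verify them against the documentation for your installed version.
    """
    os.environ["WHYLABS_DEFAULT_ORG_ID"] = org_id
    os.environ["WHYLABS_DEFAULT_DATASET_ID"] = dataset_id
    os.environ["WHYLABS_API_KEY"] = api_key


# Placeholder values: substitute the ones from your own account, and avoid
# committing a real API key to source control.
configure_whylabs_env(
    org_id="org-xxxxxx",
    dataset_id="model-1",
    api_key="replace-with-your-access-token",
)
print(os.environ["WHYLABS_DEFAULT_DATASET_ID"])
```

Setting the variables inside the process keeps the key out of your shell history, but reading it from a secrets manager or `getpass` is safer than hard-coding it in a notebook.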

### 📬 **Step 5: Logging to WhyLabs**

Ensure you have a **model ID** (also called **dataset ID**) before you start!

#### Dataset Timestamp
* To avoid confusion, we recommend using an **[aware datetime](https://docs.python.org/3/library/datetime.html#:~:text=For%20applications%20requiring,is%20in%20effect.)** with `whylogs`
* If you don't set the `dataset_timestamp` parameter, it defaults to the current time in UTC
* WhyLabs supports real-time visualization when the timestamp is **within the last 7 days**. Anything older than that will be picked up when we run our batch processing
* **If you log two profiles for the same day with different timestamps (12:00AM vs 12:01AM), they are merged into the same batch**
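The timestamp rules above can be sketched with the standard library. This is an illustration only, not WhyLabs' implementation: it assumes daily batches are aligned to midnight UTC, and the helper names `to_aware_utc`, `daily_batch_start`, and `within_last_days` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_aware_utc(ts: datetime) -> datetime:
    """Return an aware UTC datetime; naive inputs are assumed to already be in UTC."""
    if ts.tzinfo is None:
        return ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc)


def daily_batch_start(ts: datetime) -> datetime:
    """Floor a timestamp to midnight UTC (assumed start of its daily batch)."""
    return to_aware_utc(ts).replace(hour=0, minute=0, second=0, microsecond=0)


def within_last_days(ts: datetime, now: datetime, days: int = 7) -> bool:
    """True when `ts` is no later than `now` and at most `days` days older."""
    age = to_aware_utc(now) - to_aware_utc(ts)
    return timedelta(0) <= age <= timedelta(days=days)


midnight = datetime(2024, 9, 3, 0, 0, tzinfo=timezone.utc)
one_minute_later = datetime(2024, 9, 3, 0, 1, tzinfo=timezone.utc)

# Both timestamps floor to the same day, so they would share a batch.
print(daily_batch_start(midnight) == daily_batch_start(one_minute_later))  # True

# A timestamp ten days old falls outside the seven-day window.
print(within_last_days(midnight - timedelta(days=10), now=midnight))  # False
```

Using aware datetimes avoids the ambiguity in `to_aware_utc`, where a naive value has to be guessed to be UTC.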

#### Logging Different Batches of Data
* We'll give the profiles different **dates**
* Make one `why.log` call per date. Since the session was initialized with `upload_on_log=True`, each call uploads its profile right away, as the output below shows

In [4]:
import datetime

import whylogs as why

for i, df in enumerate(pdfs):
    # walking backwards. Each dataset has to map to a date to show up as a different batch
    # in WhyLabs
    dt = datetime.datetime.now(tz=datetime.timezone.utc) - datetime.timedelta(days=i)

    # log each day's data and set the date on the profile
    results = why.log(df, dataset_timestamp=dt)


✅ Aggregated 407 rows into profile

Visualize and explore this profile with one-click
🔍 https://hub.whylabsapp.com/resources/model-1/profiles?profile=1725321600000&sessionToken=session-GKTK6PAd

✅ Aggregated 390 rows into profile

Visualize and explore this profile with one-click
🔍 https://hub.whylabsapp.com/resources/model-1/profiles?profile=1725235200000&sessionToken=session-GKTK6PAd

✅ Aggregated 382 rows into profile

Visualize and explore this profile with one-click
🔍 https://hub.whylabsapp.com/resources/model-1/profiles?profile=1725148800000&sessionToken=session-GKTK6PAd

✅ Aggregated 371 rows into profile

Visualize and explore this profile with one-click
🔍 https://hub.whylabsapp.com/resources/model-1/profiles?profile=1725062400000&sessionToken=session-GKTK6PAd

✅ Aggregated 301 rows into profile

Visualize and explore this profile with one-click
🔍 https://hub.whylabsapp.com/resources/model-1/profiles?profile=1724976000000&sessionToken=session-GKTK
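The `profile` value in each URL above looks like a Unix timestamp in milliseconds, stepping back by exactly one day per batch. That reading is an inference from the numbers, not documented behavior. A small sketch, with the hypothetical helper `profile_timestamp`, decodes it:

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse


def profile_timestamp(url: str) -> datetime:
    """Interpret the `profile` query parameter as epoch milliseconds (assumption)."""
    query = parse_qs(urlparse(url).query)
    millis = int(query["profile"][0])
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


url = (
    "https://hub.whylabsapp.com/resources/model-1/profiles"
    "?profile=1725321600000&sessionToken=session-GKTK6PAd"
)
print(profile_timestamp(url).isoformat())  # 2024-09-03T00:00:00+00:00
```

The decoded value lands on midnight UTC, which is consistent with each upload being assigned to a daily batch.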

In [5]:
from IPython.display import HTML

from whylogs.api.whylabs.session.session_manager import get_current_session

session = get_current_session()
model_id = session.config.get_default_dataset_id()

HTML(f'To view your statistics, go to the <a href="https://hub.whylabsapp.com/models/{model_id}/summary" target="_blank">model dashboard</a>')

### 📈 **Step 6: Inspect statistics in WhyLabs**

WhyLabs stores the following statistics, based on what is configured in `whylogs`:

* Simple counters: boolean, null values, data types.
* Summary statistics: sum, min, max, median, variance.
* Unique value counter or cardinality: tracks an approximate count of unique values for each feature using the HyperLogLog algorithm.
* Histograms for numerical features. The `whylogs` binary output can be queried with dynamic binning based on the shape of your data.
* Top frequent items (default is 128). Note that this configuration affects the memory footprint, especially for text features.
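To make the list above concrete, here is a toy sketch that computes exact versions of several of these statistics for a single column in plain Python. It is not how `whylogs` works internally: the library relies on streaming sketches and approximations, and histograms are left out here. The function `summarize` and the sample values are made up for illustration.

```python
import statistics
from collections import Counter


def summarize(values, top_k=3):
    """Compute toy, exact versions of the listed statistics for one column."""
    present = [v for v in values if v is not None]
    numeric = [v for v in present if isinstance(v, (int, float)) and not isinstance(v, bool)]
    summary = {
        "count": len(values),
        "null_count": len(values) - len(present),
        "types": dict(Counter(type(v).__name__ for v in present)),
        "cardinality": len(set(present)),  # exact here; whylogs estimates it
        "top_items": Counter(present).most_common(top_k),
    }
    if len(numeric) >= 2:  # sample variance needs at least two points
        summary.update(
            sum=sum(numeric),
            min=min(numeric),
            max=max(numeric),
            median=statistics.median(numeric),
            variance=statistics.variance(numeric),
        )
    return summary


# Made-up sample column with one missing value.
loan_amnt = [1000.0, 7000.0, 12000.0, 12000.0, 20000.0, 40000.0, None]
for name, value in summarize(loan_amnt).items():
    print(f"{name}: {value}")
```

Exact counting like this keeps every distinct value in memory, which is why a profiling library would favor approximate structures for cardinality and frequent items.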

Notice that these statistics are organized in batches, so if you run the above cells again, you'll see the statistics change.

* Now check the application to see your **statistics**.
* Also, run the above cell again for the same model ID. Do you see the statistics change in WhyLabs, especially the counters?
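Why would the counters change on a second run? Profiles that land in the same batch are merged, and mergeable statistics combine in predictable ways: counts add up, while minimums and maximums keep the extreme value. The toy class below illustrates the idea only. `ToyProfile` is hypothetical and far simpler than a real `whylogs` profile; the numbers are taken from the `loan_amnt` column in the first batch shown earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyProfile:
    """A drastically simplified stand-in for one column of a dataset profile."""

    count: int
    null_count: int
    minimum: float
    maximum: float

    def merge(self, other: "ToyProfile") -> "ToyProfile":
        # Counters are additive; min and max keep the extreme of both sides.
        return ToyProfile(
            count=self.count + other.count,
            null_count=self.null_count + other.null_count,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )


# Logging the same 407-row batch twice for the same day.
first_run = ToyProfile(count=407, null_count=0, minimum=1000.0, maximum=40000.0)
second_run = ToyProfile(count=407, null_count=0, minimum=1000.0, maximum=40000.0)

merged = first_run.merge(second_run)
print(merged.count)  # 814
print(merged.minimum, merged.maximum)  # 1000.0 40000.0
```

In this sketch the row count doubles while the range stays the same, which is the kind of change to look for in the dashboard after re-running the logging cell.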

### 📝 **Step 7: Run WhyLabs with your data**

To go further, visit our [documentation](https://docs.whylabs.ai/) for more details on everything you can do to start monitoring your ML and data pipelines.

You can also join our [Community Slack Channel](http://join.slack.whylabs.ai/) for questions related to `whylogs` or [open a ticket](https://support.whylabs.ai/) if you encounter issues with WhyLabs onboarding.
