>### 🚩 *Create a free WhyLabs account to get more value out of whylogs!*<br> 
>*Did you know you can store, visualize, and monitor whylogs profiles with the [WhyLabs Observability Platform](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=flask_with_whylogs)? Sign up for a [free WhyLabs account](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=flask_with_whylogs) to leverage the power of whylogs and WhyLabs together!*

# Integrating Whylogs into your Flask Flow

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/whylabs/whylogs/blob/mainline/python/examples/integrations/flask_streaming/flask_with_whylogs.ipynb)

Now that you've gone through the basics, let's look at how to integrate whylogs into your current workflow. In this example we use Flask to build a web app that takes measurements from the Iris dataset and uses them to make a prediction. Note that we log both the input received and the prediction!

### What you'll Need
- Docker
- pandas
- scikit-learn
- Flask

## Overview

![Data Flow by Felipe de Pontes Adachi](./assets/flask_whylogs_whylabs_flow.jpeg)

We'll deploy a Flask application locally. It serves the requested predictions to the user through a REST endpoint. The application uses the whylogs library to create statistical profiles of both the input and output features while it runs in production. These profiles are then sent in microbatches, at fixed intervals, either to a local writer or to WhyLabs. If sent to WhyLabs, they are merged automatically into daily statistical profiles.
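To make the idea of merging microbatches concrete, here is a toy sketch of a mergeable summary. It is not the whylogs profile format (whylogs tracks richer statistics); the `ColumnSummary` class and its fields are hypothetical and only illustrate why per-batch summaries can be combined into a daily one without keeping the raw rows.

```python
from dataclasses import dataclass
import math


@dataclass
class ColumnSummary:
    """Toy mergeable summary of one numeric column (not the whylogs format)."""
    count: int = 0
    total: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    def track(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def merge(self, other: "ColumnSummary") -> "ColumnSummary":
        # Merging needs only the two summaries, never the raw rows.
        return ColumnSummary(
            count=self.count + other.count,
            total=self.total + other.total,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else math.nan


def summarize(values):
    summary = ColumnSummary()
    for value in values:
        summary.track(value)
    return summary


# Two microbatches of sepal lengths, summarized separately and then merged.
batch_a = summarize([5.1, 4.9, 4.7])
batch_b = summarize([6.7, 6.3])
daily = batch_a.merge(batch_b)
print(daily.count, daily.minimum, daily.maximum, round(daily.mean, 2))
```

Because the merge only touches counts, sums, and extremes, the order in which batches arrive does not matter.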

#### Let's get our environment ready! 
The cell below installs whylogs along with the other dependencies. Comment out the whylogs line if you already have it installed with the WhyLabs extension.

In [1]:
# Note: you may need to restart the kernel to use updated packages.
%pip install pandas utils joblib scikit-learn Flask
%pip install whylogs

Note: you may need to restart the kernel to use updated packages.


In [2]:
import random
import numpy as np
import time
import requests
import pandas as pd
from joblib import dump
from sklearn.svm import SVC
import sklearn.datasets 
from sklearn.model_selection import train_test_split

### Step 1: Load the Data
We will use the Iris dataset for our classification task. It records sepal and petal lengths and widths, which the model uses to predict the species. The dataset is readily available from many sources; here we load it from sklearn's dataset library.

In [3]:
iris = sklearn.datasets.load_iris(as_frame=True)
data = pd.DataFrame(data=iris.data, columns=iris.feature_names)

data['target'] = [iris.target_names[i] for i in iris.target]
data

Unnamed: 0,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),target
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa
...,...,...,...,...,...
145,6.7,3.0,5.2,2.3,virginica
146,6.3,2.5,5.0,1.9,virginica
147,6.5,3.0,5.2,2.0,virginica
148,6.2,3.4,5.4,2.3,virginica


In [4]:
# Separate the independent variables from the dependent variable
X = data.iloc[:, 0:4].values
y = data.iloc[:, -1].values
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

### Step 2: Train the Model
Next up, it's time to train the model. We will use a simple SVC, fit the model, then dump it to `model.joblib`.

In [5]:
# Train a classifier
print("Train started.")
model = SVC()
model.fit(x_train, y_train)
print("Train finished.")
# Save the model
dump(model, 'model.joblib')
print("Model saved as model.joblib")

Train started.
Train finished.
Model saved as model.joblib
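Before packaging the model, it can be worth checking that it predicts sensibly on the held-out split and that the saved file loads back correctly. The sketch below repeats the split and training so it runs on its own. It uses integer labels from `load_iris(return_X_y=True)` rather than the species names used above, and writes to a temporary file with the hypothetical name `model_check.joblib` so it does not overwrite `model.joblib`.

```python
import os
import tempfile

from joblib import dump, load
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Same split parameters as in Step 1.
X, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

model = SVC()
model.fit(x_train, y_train)

# Accuracy on rows the model has not seen during training.
accuracy = accuracy_score(y_test, model.predict(x_test))
print(f"Held-out accuracy: {accuracy:.3f}")

# Round-trip the model through joblib and compare predictions.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_check.joblib")
    dump(model, path)
    reloaded = load(path)

same = bool((reloaded.predict(x_test) == model.predict(x_test)).all())
print("Reloaded model matches:", same)
```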


### Step 3: Build and Run a Docker Image
The directory that contains this notebook also holds all the code that makes up our Flask app. There are a lot of files, but the main ones of interest are in the `api` folder. In this step we use Docker to build an image from the requirements and settings in that outer directory.

In [6]:
!docker build --build-arg PYTHON_VERSION=3.9 -t whylogs-flask .

[+] Building 0.0s (0/1)
[+] Building 0.1s (2/3)
 => [internal] load build definition from Dockerfile                       0.0s
 => => transferring dockerfile: 37B                                        0.0s
 => [internal] load .dockerignore                                          0.0s
 => => transferring context: 34B                                           0.0s
 => [internal] load metadata for docker.io/library/python:3.9              0.0s
[+] Building 0.3s (2/3)
 => [internal] load build definition from Dockerfile                       0.0s
 => => transferring dockerfile: 37B                                        0.0s
 => [internal] load .dockerignore

Huzzah!! It built! We have a docker image, but nothing is running yet. Open a terminal and execute the following command:

```bash
docker run --rm -p 5000:5000 whylogs-flask
```


### Step 4: Test Endpoint
Let's make sure it's actually up and running. Follow the directions below.

- Go to http://0.0.0.0:5000/apidocs/ (if your browser rejects `0.0.0.0`, try http://localhost:5000/apidocs/).
- Open the green `/predict` endpoint tab.
- Click "Try it out".
- Click the green "Execute" button.
- Check the response and its status code. If the code is 200, the API is working.

If it's not working, run `docker ps` to check whether something else is already running on that port.
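The same check can be scripted. The following is a minimal sketch that polls the health endpoint until it answers with HTTP 200. The `/api/v1/health` path is the one this notebook requests later on; the helper name `wait_for_health`, the retry count, and the delay are assumptions chosen for illustration.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url: str, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or the attempts run out."""
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=2) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # The container may still be starting; retry after a pause.
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

With the container running, `wait_for_health("http://localhost:5000/api/v1/health")` should return `True`; a `False` result after all attempts suggests the container is not running or the port is not published.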

### Step 5: Mess with Data to Showcase a Drift
Note that the logger is configured to roll over every 5 minutes. We recommend running this for at least 15 minutes before really digging into the visualizations.
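To see why about 15 minutes is a sensible minimum, the sketch below groups hypothetical request timestamps into 5-minute windows. How whylogs aligns its rollover windows is not described here, so flooring to multiples of the interval is an assumption made only to illustrate the idea; the timestamps are made up.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

ROLLOVER = timedelta(minutes=5)  # Interval stated in the text above.


def window_start(ts: datetime, interval: timedelta = ROLLOVER) -> datetime:
    """Floor a timestamp to the start of its rollover window."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + ((ts - epoch) // interval) * interval


start = datetime(2023, 1, 1, 12, 0, 30, tzinfo=timezone.utc)
# Ten hypothetical requests, 90 seconds apart.
requests_at = [start + timedelta(seconds=90 * i) for i in range(10)]

per_window = Counter(window_start(ts) for ts in requests_at)
for window, count in sorted(per_window.items()):
    print(window.strftime("%H:%M"), count)
```

Under this assumption, a run of roughly 15 minutes spans three windows, so there are several profiles to compare rather than a single one.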

The following functions modify the distribution of a variable so that there is drift for WhyLabs to detect.

In [7]:
def modify_random_column_values(data: pd.DataFrame, value: float = None) -> pd.DataFrame:
    """Return a copy of `data` with one random feature column set to a constant."""
    if value is None:
        # Draw the value on each call; a default written in the signature
        # would be evaluated only once, when the function is defined.
        value = np.random.uniform(low=0.0, high=10.0)
    random_column = None
    data_mod = data.copy(deep=True)
    try:
        number_of_columns = len(data_mod.columns) - 1  # Exclude the label (last column)
        random_column = data_mod.columns[np.random.randint(number_of_columns)]
        data_mod[random_column] = value
    except Exception as ex:
        # Raise a real exception; raising a plain string is a TypeError.
        raise RuntimeError(f"Error setting a fixed value in random column: {random_column}") from ex
    return data_mod


def add_random_column_outliers(data: pd.DataFrame, number_outliers: int = 10) -> pd.DataFrame:
    """Return a copy of `data` with outliers written into one random feature column."""
    random_column = None
    data_mod = data.copy(deep=True)
    try:
        number_of_columns = len(data_mod.columns) - 1  # Exclude the label (last column)
        number_of_rows = data_mod.shape[0]
        random_column = data_mod.columns[np.random.randint(number_of_columns)]
        for _ in range(number_outliers):
            random_row = np.random.randint(0, number_of_rows)
            data_mod.loc[random_row, random_column] = round(np.random.uniform(low=20.0, high=50.0), 2)
    except Exception as ex:
        raise RuntimeError(f"Error adding outliers in random column: {random_column}") from ex
    return data_mod
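To get a feel for what such outliers do to a column's statistics, here is a small standalone sketch using only the standard library. The `inject_outliers` helper is hypothetical and mirrors the 20 to 50 outlier range used above; the baseline values are randomly generated stand-ins, not the real Iris measurements.

```python
import random
import statistics


def inject_outliers(values, number_outliers=10, low=20.0, high=50.0, seed=0):
    """Return a copy of `values` with some entries replaced by large values."""
    rng = random.Random(seed)
    modified = list(values)
    for _ in range(number_outliers):
        index = rng.randrange(len(modified))
        modified[index] = round(rng.uniform(low, high), 2)
    return modified


# Made-up values in roughly the range of the Iris petal widths.
rng = random.Random(42)
baseline = [round(rng.uniform(0.1, 2.5), 1) for _ in range(150)]
drifted = inject_outliers(baseline, number_outliers=30)

# The mean and the maximum both move once outliers are present.
for name, column in [("baseline", baseline), ("drifted", drifted)]:
    print(name, round(statistics.mean(column), 2), max(column))
```

Shifts like these in the mean and maximum are the kind of change a statistical profile of the column makes visible.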

Once it's working, you can send continuous requests to the endpoint:

In [8]:
labels = ["sepal_length_cm", "sepal_width_cm", "petal_length_cm", "petal_width_cm"]

In [9]:
# modify a variable distribution
data_mod = add_random_column_outliers(data, 30)
print("Dataset distribution modified!")

Dataset distribution modified!


In [10]:
url = "http://0.0.0.0:5000/api/v1"

In [11]:
healthy = requests.get(f"{url}/health")
if healthy.ok:
    for k in range(data_mod.shape[0]):
        # Build the payload from row k of the modified dataset
        payload = dict(zip(labels, data_mod.iloc[:, 0:4].values[k]))
        print(payload)
        response = requests.post(f"{url}/predict", json=payload)
        if response.ok:
            print(response.json())
            time.sleep(random.randrange(2, 10))

{'sepal_length_cm': 5.1, 'sepal_width_cm': 3.5, 'petal_length_cm': 1.4, 'petal_width_cm': 0.2}
{'data': {'class': 'setosa'}, 'message': 'Success'}
{'sepal_length_cm': 4.9, 'sepal_width_cm': 3.0, 'petal_length_cm': 1.4, 'petal_width_cm': 25.89}
{'data': {'class': 'virginica'}, 'message': 'Success'}
{'sepal_length_cm': 4.7, 'sepal_width_cm': 3.2, 'petal_length_cm': 1.3, 'petal_width_cm': 0.2}
{'data': {'class': 'setosa'}, 'message': 'Success'}
{'sepal_length_cm': 4.6, 'sepal_width_cm': 3.1, 'petal_length_cm': 1.5, 'petal_width_cm': 0.2}
{'data': {'class': 'setosa'}, 'message': 'Success'}
{'sepal_length_cm': 5.0, 'sepal_width_cm': 3.6, 'petal_length_cm': 1.4, 'petal_width_cm': 48.97}
{'data': {'class': 'virginica'}, 'message': 'Success'}
{'sepal_length_cm': 5.4, 'sepal_width_cm': 3.9, 'petal_length_cm': 1.7, 'petal_width_cm': 48.1}
{'data': {'class': 'virginica'}, 'message': 'Success'}
{'sepal_length_cm': 4.6, 'sepal_width_cm': 3.4, 'petal_length_cm': 1.4, 'petal_width_cm': 0.3}
{'data': 
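A quick way to see the effect of the outliers on the model's output is to tally the predicted classes. The sketch below assumes response bodies shaped like the ones printed above; the `tally_predictions` helper is hypothetical and not part of the app.

```python
from collections import Counter


def tally_predictions(responses):
    """Count predicted classes across successful response bodies."""
    counts = Counter()
    for body in responses:
        if body.get("message") == "Success":
            counts[body["data"]["class"]] += 1
    return counts


# Response bodies copied from the complete lines of the output above.
responses = [
    {"data": {"class": "setosa"}, "message": "Success"},
    {"data": {"class": "virginica"}, "message": "Success"},
    {"data": {"class": "setosa"}, "message": "Success"},
    {"data": {"class": "setosa"}, "message": "Success"},
    {"data": {"class": "virginica"}, "message": "Success"},
    {"data": {"class": "virginica"}, "message": "Success"},
]
print(tally_predictions(responses))
```

In the output above, the rows with an injected petal width are the ones predicted as `virginica`, even though the first rows of the dataset are all `setosa`.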

# Viewing the Data

This app defaults to a local file writer. To look at the logs inside the Docker container, run `docker exec -it <container-id> bash`. The profiles are all in `/logs` (list them with `ls /logs`).

From there, you can visualize the results by following the instructions in the ["Notebook Profile Visualizer"](https://github.com/whylabs/whylogs/blob/9618e5dd6570bc484579ec1325f2f512ff56977f/python/examples/basic/Notebook_Profile_Visualizer.ipynb). More information is available in the notebooks ["Merging Profiles"](https://github.com/whylabs/whylogs/blob/9618e5dd6570bc484579ec1325f2f512ff56977f/python/examples/basic/Merging_Profiles.ipynb) and ["Streaming_Data_with_Log_Rotation"](https://github.com/whylabs/whylogs/blob/mainline/python/examples/advanced/Log_Rotation_for_Streaming_Data/Streaming_Data_with_Log_Rotation.ipynb).

## Using WhyLabs
As mentioned in the overview, you can also send these profiles to your WhyLabs dashboard, where they are received and merged automatically into daily profiles. Follow the directions in "WhyLabs Writer" to change the `.env` file in this directory, then rerun the steps above. This time the profiles will populate your WhyLabs dataset, and you can inspect the drift in the WhyLabs portal.

Included in this notebook's directory is a `.env` file with the needed variables set to None. 
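Before rerunning, it can help to confirm that none of the settings is still a placeholder. The sketch below is a hypothetical check: the variable names follow the convention used by the whylogs WhyLabs writer and are an assumption here, so compare them against the actual `.env` file in this directory. The example values are made up.

```python
import os

# Assumed variable names; confirm them against the `.env` file in this directory.
REQUIRED_VARS = ("WHYLABS_API_KEY", "WHYLABS_DEFAULT_ORG_ID", "WHYLABS_DEFAULT_DATASET_ID")


def missing_settings(env=None, required=REQUIRED_VARS):
    """Return the names that are unset, empty, or still the placeholder 'None'."""
    env = os.environ if env is None else env
    return [name for name in required if env.get(name, "").strip() in ("", "None")]


# Made-up example: the API key is still a placeholder and the dataset ID is absent.
example_env = {"WHYLABS_API_KEY": "None", "WHYLABS_DEFAULT_ORG_ID": "org-0"}
print(missing_settings(example_env))
```

An empty list means every assumed setting has a value; it does not verify that the values are valid credentials.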

![WhyLabs Profile](./assets/WhyLabs_profile.png)