# PiML Toolbox for Model Development and Validation: Low-code Demo

PiML (Python Interpretable Machine Learning) is a new Python toolbox for IML model development and validation. Through low-code automation and high-code programming, PiML supports various machine learning models in the following two categories:

- **Inherently interpretable models**: 
  1. EBM: Explainable Boosting Machine (Nori, et al. 2019; Lou, et al. 2013)
  2. GAMI-Net: Generalized Additive Model with Structured Interactions (Yang, Zhang and Sudjianto, 2021)
  3. ReLU-DNN: Deep ReLU Networks using the Aletheia Unwrapper (Sudjianto, et al. 2020)

- **Arbitrary black-box models**, e.g.
  1. LightGBM or XGBoost of varying depth
  2. RandomForest of varying depth
  3. Residual Deep Neural Networks
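
To make "of varying depth" in the list above concrete, here is a minimal sketch that fits random forests at a few depths with plain scikit-learn, outside PiML. The depths, the 80/20 split, the number of trees and the `random_state` values are illustrative assumptions, not PiML defaults.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Simulated regression data (seed chosen here only for reproducibility).
X, y = make_friedman1(n_samples=2000, n_features=10, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit one forest per depth and record its test mean squared error.
test_mse = {}
for depth in (1, 4, 8):  # illustrative depths, not PiML settings
    model = RandomForestRegressor(n_estimators=50, max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    test_mse[depth] = mean_squared_error(y_test, model.predict(X_test))

for depth, mse in test_mse.items():
    print(f"max_depth={depth}: test MSE = {mse:.3f}")
```

Deeper trees usually fit this data more closely, at the cost of a model that is harder to inspect directly.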

This example notebook demonstrates how to use PiML in its low-code mode for developing machine learning models, interpreting them and testing them. The toolbox has the following built-in datasets for demo purposes. 

- **CoCircles** classification data: simulated by `sklearn.datasets.make_circles(n_samples=2000, noise=0.1)`; see [details](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_circles.html).   
- **Friedman** regression data: simulated by `sklearn.datasets.make_friedman1(n_samples=2000, n_features=10, noise=0.1)`; see [details](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_friedman1.html).   
- **BikeSharing** regression data from the UCI repository: consisting of 17,389 samples of hourly counts of rental bikes in the Capital Bikeshare system; see [details](https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset).  
- **CaliforniaHousing** regression data: consisting of 20,640 samples and 9 features, fetched by `sklearn.datasets`; see [details](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_california_housing.html). Three versions are available: a raw version, a trim1 version (trimming only AveOccup) and a trim2 version (trimming AveRooms, AveBedrms, Population and AveOccup).   
- **TaiwanCredit** classification data from the UCI repository: consisting of 30,000 credit card clients in Taiwan from 200504 to 200509; see [details](https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients). This data is subject to slight preprocessing.  
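
The two simulated datasets in the list above can be regenerated directly with scikit-learn, which may help when inspecting them outside the toolbox. This is a minimal sketch: the `random_state` value is an assumption added for reproducibility, so the samples will not necessarily match the copies bundled with PiML.

```python
from sklearn.datasets import make_circles, make_friedman1

# CoCircles: two concentric circles, a binary classification problem.
X_circles, y_circles = make_circles(n_samples=2000, noise=0.1, random_state=0)

# Friedman: ten input features and a continuous regression target.
X_friedman, y_friedman = make_friedman1(
    n_samples=2000, n_features=10, noise=0.1, random_state=0
)

print("CoCircles:", X_circles.shape, "labels:", sorted(set(y_circles.tolist())))
print("Friedman:", X_friedman.shape, "target:", y_friedman.shape)
```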

# Stage 0: Install PiML package on Google Colab

1. Run `!pip install piml` to install the latest version of PiML.
2. In Colab, you'll need to restart the runtime in order to use the newly installed PiML version.

In [None]:
!pip install PiML
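
After restarting the runtime, one way to confirm that the package is visible is to query its installed version through the standard library. This is a small sketch; the helper name `installed_version` is hypothetical and not part of PiML.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints None if the install has not completed in the current environment.
print("piml version:", installed_version("piml"))
```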

# Stage 1: Initialize an experiment, Load and Prepare data <a name="expdata"></a>

In [None]:
from piml import Experiment
exp = Experiment(platform="colab")

In [None]:
exp.data_loader()

In [None]:
exp.data_summary()

In [None]:
exp.data_prepare()

In [None]:
exp.eda()

# Stage 2. Train interpretable models <a name="modeltrain"></a>



In [None]:
exp.model_train()

# Stage 3. Explain and Interpret <a name="modelinterpret"></a>

In [None]:
exp.model_explain()

In [None]:
exp.model_interpret()

# Stage 4. Diagnose and compare

In [None]:
exp.model_diagnose()

In [None]:
exp.model_compare()

# Stage 5. Register an arbitrary model ... 

In [None]:
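# Optional: retrieve the processed data if you need it directly.
# train_x, train_y, test_x, test_y, Xnames, yname = exp.get_processed_data()

from lightgbm import LGBMRegressor

# Wrap a LightGBM regressor in a pipeline created by the experiment.
pipeline = exp.make_pipeline(LGBMRegressor(max_depth=7))
# Fit the pipeline; train_x and train_y are not passed explicitly here.
pipeline.fit()  # train_x, train_y
# Register the fitted pipeline under the name 'LGBM'.
exp.register(pipeline=pipeline, name='LGBM')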

In [None]:
exp.model_compare()