{"cells":[{"cell_type":"markdown","metadata":{"id":"xdsf78IH3_c7"},"source":["# PiML Toolbox for Model Development and Validation: Low-code Demo\n","\n","PiML (Python Interpretable Machine Learning) is a new Python toolbox for IML model development and validation. Through low-code automation and high-code programming, PiML supports various machine learning models in the following two categories:\n","\n","- **Inherently interpretable models**: \n"," 1. EBM: Explainable Boosting Machine (Nori, et al. 2019; Lou, et al. 2013)\n"," 2. GAMI-Net: Generalized Additive Model with Structured Interactions (Yang, Zhang and Sudjianto, 2021)\n"," 3. ReLU-DNN: Deep ReLU Networks using the Aletheia Unwrapper (Sudjianto, et al. 2020)\n","\n","- **Arbitrary black-box models**, e.g.\n"," 1. LightGBM or XGBoost of varying depth\n"," 2. RandomForest of varying depth\n"," 3. Residual Deep Neural Networks\n","\n","This example notebook demonstrates how to use PiML in its low-code mode for developing machine learning models, interpreting them and testing them. The toolbox has the following built-in datasets for demo purposes. \n","\n","- **CoCircles** classification data: simulated by `sklearn.datasets.make_circles(n_samples=2000, noise=0.1)`; see [details](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_circles.html). \n","- **Friedman** regression data: simulated by `sklearn.datasets.make_friedman1(n_samples=2000, n_features=10, noise=0.1)`; see [details](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_friedman1.html). \n","- **BikeSharing** regression data from the UCI repository: consisting of 17,389 samples of hourly counts of rental bikes in the Capital Bikeshare system; see [details](https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset). 
\n","- **CaliforniaHousing** regression data: consisting of 20,640 samples and 9 features, fetched by `sklearn.datasets`; see [details](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_california_housing.html). Three versions are available: a raw version, a trim1 version (trimming only AveOccup) and a trim2 version (trimming AveRooms, AveBedrms, Population and AveOccup). \n","- **TaiwanCredit** classification data from the UCI repository: consisting of 30,000 credit card clients in Taiwan from April 2005 to September 2005; see [details](https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients). The data has been lightly preprocessed. "]},{"cell_type":"markdown","source":["# Stage 0: Install PiML package on Google Colab\n","\n","1. Run `!pip install piml` to install the latest version of PiML.\n","2. In Colab, you'll need to restart the runtime in order to use the newly installed PiML version."],"metadata":{"id":"wJ7N7REOtAgN"}},{"cell_type":"code","source":["!pip install PiML"],"metadata":{"id":"0jvtEI-M15Xv"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"4kcbghi53_dB"},"source":["# Stage 1: Initialize an experiment, Load and Prepare data "]},{"cell_type":"code","execution_count":null,"metadata":{"ExecuteTime":{"end_time":"2022-04-18T12:31:46.670393Z","start_time":"2022-04-18T12:31:44.832218Z"},"id":"EN4UWjkT3_dC"},"outputs":[],"source":["from piml import Experiment\n","exp = 
Experiment(platform=\"colab\")"]},{"cell_type":"code","execution_count":null,"metadata":{"ExecuteTime":{"end_time":"2022-04-18T12:31:46.692710Z","start_time":"2022-04-18T12:31:46.674727Z"},"id":"1HXFMmmB3_dF"},"outputs":[],"source":["exp.data_loader()"]},{"cell_type":"code","execution_count":null,"metadata":{"ExecuteTime":{"end_time":"2022-04-18T12:31:53.491182Z","start_time":"2022-04-18T12:31:53.176711Z"},"id":"RTcvcydS3_dH","scrolled":false},"outputs":[],"source":["exp.data_summary()"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"iQBVjZlmDxWk"},"outputs":[],"source":["exp.data_prepare()"]},{"cell_type":"code","execution_count":null,"metadata":{"ExecuteTime":{"end_time":"2022-04-18T12:31:56.068058Z","start_time":"2022-04-18T12:31:55.591583Z"},"id":"GmatYRgR3_dJ","scrolled":false},"outputs":[],"source":["exp.eda()"]},{"cell_type":"markdown","metadata":{"id":"RN8j7aZU3_dJ"},"source":["# Stage 2. Train interpretable models \n","\n"]},{"cell_type":"code","execution_count":null,"metadata":{"ExecuteTime":{"end_time":"2022-04-18T12:35:42.346210Z","start_time":"2022-04-18T12:35:42.096995Z"},"id":"WOe59Icm3_dK","scrolled":false},"outputs":[],"source":["exp.model_train()"]},{"cell_type":"markdown","metadata":{"id":"Lu3mpYH33_dK"},"source":["# Stage 3. Explain and Interpret "]},{"cell_type":"code","source":["exp.model_explain()"],"metadata":{"id":"ID9UHiYe5hot"},"execution_count":null,"outputs":[]},{"cell_type":"code","execution_count":null,"metadata":{"ExecuteTime":{"end_time":"2022-04-18T12:37:54.174073Z","start_time":"2022-04-18T12:37:54.162680Z"},"id":"oMgyUf7a3_dK","scrolled":false},"outputs":[],"source":["exp.model_interpret()"]},{"cell_type":"markdown","source":["# Stage 4. 
Diagnose and compare"],"metadata":{"id":"97AaEXDy5N8L"}},{"cell_type":"code","source":["exp.model_diagnose()"],"metadata":{"id":"dZNCKnDi5Nmo"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["exp.model_compare()"],"metadata":{"id":"FzJICjBfxWQl"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["# Stage 5. Register an arbitrary model ... "],"metadata":{"id":"VTJHgcRwtkpQ"}},{"cell_type":"code","source":["# train_x, train_y, test_x, test_y, Xnames, yname = exp.get_processed_data() \n","\n","from lightgbm import LGBMRegressor\n","pipeline = exp.make_pipeline(LGBMRegressor(max_depth=7))\n","pipeline.fit() #train_x, train_y\n","exp.register(pipeline=pipeline, name='LGBM')"],"metadata":{"id":"7WGJ8PzutkLh"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["exp.model_compare()"],"metadata":{"id":"USH17YFC4uiR"},"execution_count":null,"outputs":[]}],"metadata":{"colab":{"name":"PiML Low-code Example Run.ipynb","provenance":[],"collapsed_sections":[]},"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.8.5"}},"nbformat":4,"nbformat_minor":0}