{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "XV25GIUMruLl" }, "source": [ "# Setup Environment\n", "\n", "We need to install the [interpret](https://github.com/interpretml/interpret/) and [gamchanger](https://github.com/interpretml/gam-changer/) packages." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "jYksosEEsGwU" }, "outputs": [], "source": [ "# Install the `interpret` and `gamchanger` packages.\n", "# !pip install --upgrade interpret gamchanger" ] }, { "cell_type": "markdown", "metadata": { "id": "p9CB8dx2B67Y" }, "source": [ "# Model Training\n", "\n", "We will train a simple [EBM model](https://interpret.ml/docs/ebm.html) to predict whether an individual's income is above 50K using the [census income dataset](https://archive.ics.uci.edu/ml/datasets/census+income)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "Fhad39L2rb4k" }, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import gamchanger as gc\n", "\n", "from json import load\n", "from sklearn.model_selection import train_test_split\n", "from interpret.glassbox import ExplainableBoostingClassifier\n", "\n", "# Load the Adult (census income) data; the file has no header row\n", "df = pd.read_csv(\n", " \"https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data\",\n", " header=None)\n", "\n", "df.columns = [\n", " \"Age\", \"WorkClass\", \"fnlwgt\", \"Education\", \"EducationNum\",\n", " \"MaritalStatus\", \"Occupation\", \"Relationship\", \"Race\", \"Gender\",\n", " \"CapitalGain\", \"CapitalLoss\", \"HoursPerWeek\", \"NativeCountry\", \"Income\"\n", "]\n", "\n", "train_cols = df.columns[0:-1]\n", "label = df.columns[-1]\n", "X = df[train_cols]\n", "# Encode the response as 0 (<=50K) or 1 (>50K); raw values keep a leading space\n", "y = df[label].apply(lambda x: 0 if x == \" <=50K\" else 1)\n", "\n", "seed = 1\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=seed)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": 
"rltttrkyrkpr", "outputId": "3ed091ac-2f14-4246-fa08-488af015ab42" }, "outputs": [ { "data": { "text/plain": [ "ExplainableBoostingClassifier(feature_names=['Age', 'WorkClass', 'fnlwgt',\n", " 'Education', 'EducationNum',\n", " 'MaritalStatus', 'Occupation',\n", " 'Relationship', 'Race', 'Gender',\n", " 'CapitalGain', 'CapitalLoss',\n", " 'HoursPerWeek', 'NativeCountry',\n", " 'Relationship x HoursPerWeek',\n", " 'Age x Relationship',\n", " 'EducationNum x Occupation',\n", " 'MaritalStatus x HoursPerWeek',\n", " 'Occupation x Relationship',\n", " 'Occ...\n", " feature_types=['continuous', 'categorical',\n", " 'continuous', 'categorical',\n", " 'continuous', 'categorical',\n", " 'categorical', 'categorical',\n", " 'categorical', 'categorical',\n", " 'continuous', 'continuous',\n", " 'continuous', 'categorical',\n", " 'interaction', 'interaction',\n", " 'interaction', 'interaction',\n", " 'interaction', 'interaction',\n", " 'interaction', 'interaction',\n", " 'interaction', 'interaction'],\n", " n_jobs=-1, random_state=1)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ebm = ExplainableBoostingClassifier(random_state=seed, n_jobs=-1)\n", "ebm.fit(X_train, y_train)" ] }, { "cell_type": "markdown", "metadata": { "id": "6Q8UL6bwCQBm" }, "source": [ "# GAM Changing\n", "\n", "Now we can **investigate**, **validate**, and **edit** the trained EBM model using GAM Changer.\n", "\n", "GAM Changer expects:\n", "\n", "1. A trained EBM model\n", "2. Sample data (optional)\n", " \n", "GAM Changer uses the sample data to compute model performance, feature correlations, and other statistics. We suggest always providing sample data, unless the data is sensitive and you plan to share GAM Changer visualizations or outputs with external collaborators.\n", "\n", 
"We recommend generating sample data from the **validation set** (if you have one) or from a **large subset of the training set**. GAM Changer can support 10k+ data points (the upper bound is the browser's memory limit), and it provides real-time feedback when the sample size is below about 3k.\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "id": "O5dxMUWTsFOU" }, "outputs": [], "source": [ "# Randomly sample 2000 distinct points (without replacement) from the training set for GAM Changer\n", "rng = np.random.default_rng(seed)\n", "rand_indexes = rng.choice(len(y_train), size=2000, replace=False)\n", "\n", "X_sample = X_train.to_numpy()[rand_indexes]\n", "y_sample = y_train.to_numpy()[rand_indexes]" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 666 }, "id": "LkkFETcRtPwH", "outputId": "da1d4112-f499-4440-a2a9-50d7fd12cce3" }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "gc.visualize(ebm, X_sample, y_sample)" ] }, { "cell_type": "markdown", "metadata": { "id": "0SKX1od9jjTM" }, "source": [ "## Use the changed EBM\n", "\n", "Suppose you have made some edits to your EBM model and saved them as a `.gamchanger` file by clicking the save button on the bottom right. Let's load the new model in Python!\n", "\n", "Unfortunately, we cannot automatically transfer the new model back to Python, because computational notebooks restrict this for security reasons. Instead, we use Python to load the downloaded `.gamchanger` file.\n", "\n", "For cloud-based notebooks (e.g., Colab), you need to upload the `.gamchanger` file to the working directory so that you can load it from your notebook.\n", "\n", "For example, here we uploaded a previous editing history, `edit-6-14-2022.gamchanger`, to the root directory using the file panel on the left, and then load it with the code in the cell below."
] }, { "cell_type": "code", "execution_count": 6, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 232 }, "id": "QB4D4wgAjn6t", "outputId": "5f81139a-a100-4b6a-f55c-80a91c075545" }, "outputs": [ { "data": { "text/plain": [ "ExplainableBoostingClassifier(feature_names=['Age', 'WorkClass', 'fnlwgt',\n", " 'Education', 'EducationNum',\n", " 'MaritalStatus', 'Occupation',\n", " 'Relationship', 'Race', 'Gender',\n", " 'CapitalGain', 'CapitalLoss',\n", " 'HoursPerWeek', 'NativeCountry',\n", " 'Relationship x HoursPerWeek',\n", " 'Age x Relationship',\n", " 'EducationNum x Occupation',\n", " 'MaritalStatus x HoursPerWeek',\n", " 'Occupation x Relationship',\n", " 'Occ...\n", " feature_types=['continuous', 'categorical',\n", " 'continuous', 'categorical',\n", " 'continuous', 'categorical',\n", " 'categorical', 'categorical',\n", " 'categorical', 'categorical',\n", " 'continuous', 'continuous',\n", " 'continuous', 'categorical',\n", " 'interaction', 'interaction',\n", " 'interaction', 'interaction',\n", " 'interaction', 'interaction',\n", " 'interaction', 'interaction',\n", " 'interaction', 'interaction'],\n", " n_jobs=-1, random_state=1)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import os\n", "\n", "# Read the saved `.gamchanger` file, which is stored as JSON\n", "\n", "# Load the file from the current working directory `./`\n", "gc_path = './edit-6-14-2022.gamchanger'\n", "\n", "# Or uncomment the line below to load it from `~/Downloads` instead\n", "# gc_path = os.path.join(os.path.expanduser('~'), 'Downloads', 'edit-6-14-2022.gamchanger')\n", "\n", "with open(gc_path, 'r') as f:\n", "    gc_dict = load(f)\n", "\n", "# gc.get_edited_model returns a copy of your original EBM with the edits applied\n", "new_ebm = gc.get_edited_model(ebm, gc_dict)\n", "\n", "new_ebm" ] } ], "metadata": { "colab": { "collapsed_sections": [], "name": "gam-changer-adult.ipynb", "provenance": [] }, "interpreter": { "hash": "873180efce40cd0f92c6f277e3717bf56751c1b9e1ee0ff98acce134a686d5e5" }, "kernelspec": { "display_name": "Python 3.7.10 ('gam')", 
"language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 0 }