{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## License \n", "\n", "Copyright 2017 - 2020 Patrick Hall and the H2O.ai team\n", "\n", "Licensed under the Apache License, Version 2.0 (the \"License\");\n", "you may not use this file except in compliance with the License.\n", "You may obtain a copy of the License at\n", "\n", " http://www.apache.org/licenses/LICENSE-2.0\n", "\n", "Unless required by applicable law or agreed to in writing, software\n", "distributed under the License is distributed on an \"AS IS\" BASIS,\n", "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "See the License for the specific language governing permissions and\n", "limitations under the License." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**DISCLAIMER:** This notebook is not legal compliance advice." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Explain Your Predictive Models to Business Stakeholders using LIME with Python and H2O\n", "#### Describing complex models and generating reason codes with Local Interpretable Model-agnostic Explanations (LIME) and LIME-variants\n", "\n", "Local Interpretable Model-agnostic Explanations (LIME) shed light on how almost any machine learning model makes decisions for specific rows of data. LIME builds local linear surrogate models around observations of interest and leverages the highly interpretable properties of linear models to increase transparency and accountability for the corresponding model predictions. In this notebook, an h2o GBM is trained on the UCI credit card default data and then predictions for a highly risky customer are explained using linear model coefficients and LIME-derived reason codes. 
The notebook concludes by introducing a variant of LIME that is easier to execute on new data and that can be analyzed alongside observed (i.e., not simulated) data.\n", "\n", "**Note**: As of the h2o 3.24 \"Yates\" release, Shapley values are supported in h2o. Shapley values can be used in place of or along with LIME. To see Shapley values for an h2o GBM in action please see: https://github.com/jphall663/interpretable_machine_learning_with_python/blob/master/dia.ipynb." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Python imports\n", "In general, NumPy and Pandas will be used for data manipulation purposes and h2o will be used for modeling tasks. " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# h2o Python API with specific classes\n", "import h2o \n", "from h2o.estimators.glm import H2OGeneralizedLinearEstimator # for LIME\n", "from h2o.grid.grid_search import H2OGridSearch # for LIME\n", "from h2o.estimators.gbm import H2OGradientBoostingEstimator # for GBM\n", "\n", "\n", "import operator # for sorting dictionaries\n", "\n", "import numpy as np # array, vector, matrix calculations\n", "import pandas as pd # DataFrame handling\n", "\n", "# display plots in notebook\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Start h2o\n", "H2o is both a library and a server. The machine learning algorithms in the library take advantage of the multithreaded and distributed architecture provided by the server to train machine learning algorithms extremely efficiently. The API for the library was imported above in cell 1, but the server still needs to be started." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Checking whether there is an H2O instance running at http://localhost:54321 ..... 
not found.\n", "Attempting to start a local H2O server...\n", " Java Version: java version \"1.8.0_201\"; Java(TM) SE Runtime Environment (build 1.8.0_201-b09); Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)\n", " Starting server from /home/patrickh/anaconda3/lib/python3.6/site-packages/h2o/backend/bin/h2o.jar\n", " Ice root: /tmp/tmpq9cbjivf\n", " JVM stdout: /tmp/tmpq9cbjivf/h2o_patrickh_started_from_python.out\n", " JVM stderr: /tmp/tmpq9cbjivf/h2o_patrickh_started_from_python.err\n", " Server is running at http://127.0.0.1:54321\n", "Connecting to H2O server at http://127.0.0.1:54321 ... successful.\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "-------------------------- ---------------------------------------------------\n", "H2O cluster uptime: 01 secs\n", "H2O cluster timezone: America/New_York\n", "H2O data parsing timezone: UTC\n", "H2O cluster version: 3.26.0.3\n", "H2O cluster version age: 4 days\n", "H2O cluster name: H2O_from_python_patrickh_zub5an\n", "H2O cluster total nodes: 1\n", "H2O cluster free memory: 1.778 Gb\n", "H2O cluster total cores: 8\n", "H2O cluster allowed cores: 8\n", "H2O cluster status: accepting new members, healthy\n", "H2O connection url: http://127.0.0.1:54321\n", "H2O connection proxy:\n", "H2O internal security: False\n", "H2O API Extensions: Amazon S3, XGBoost, Algos, AutoML, Core V3, Core V4\n", "Python version: 3.6.4 final\n", "-------------------------- ---------------------------------------------------" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "h2o.init(max_mem_size='2G') # start h2o\n", "h2o.remove_all() # remove any existing data structures from h2o memory" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Download, explore, and prepare UCI credit card default data\n", "\n", "UCI credit card default data: https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients\n", "\n", "The UCI credit card default data contains demographic and payment information about credit card customers in Taiwan in the year 2005. The data set contains 23 input variables: \n", "\n", "* **`LIMIT_BAL`**: Amount of given credit (NT dollar)\n", "* **`SEX`**: 1 = male; 2 = female\n", "* **`EDUCATION`**: 1 = graduate school; 2 = university; 3 = high school; 4 = others \n", "* **`MARRIAGE`**: 1 = married; 2 = single; 3 = others\n", "* **`AGE`**: Age in years \n", "* **`PAY_0`, `PAY_2` - `PAY_6`**: History of past payment; `PAY_0` = the repayment status in September, 2005; `PAY_2` = the repayment status in August, 2005; ...; `PAY_6` = the repayment status in April, 2005. 
The measurement scale for the repayment status is: -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; ...; 8 = payment delay for eight months; 9 = payment delay for nine months and above. \n", "* **`BILL_AMT1` - `BILL_AMT6`**: Amount of bill statement (NT dollar). `BILL_AMT1` = amount of bill statement in September, 2005; `BILL_AMT2` = amount of bill statement in August, 2005; ...; `BILL_AMT6` = amount of bill statement in April, 2005. \n", "* **`PAY_AMT1` - `PAY_AMT6`**: Amount of previous payment (NT dollar). `PAY_AMT1` = amount paid in September, 2005; `PAY_AMT2` = amount paid in August, 2005; ...; `PAY_AMT6` = amount paid in April, 2005. \n", "\n", "These 23 input variables are used to predict the target variable, whether or not a customer defaulted on their credit card bill in late 2005.\n", "\n", "Because h2o accepts both numeric and character inputs, some variables will be recoded into more transparent character values." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Import data and clean\n", "The credit card default data is available as an `.xls` file. Pandas can read `.xls` files directly, so it is used to load the credit card default data and give the prediction target a shorter name: `DEFAULT_NEXT_MONTH`." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# import XLS file\n", "path = 'default_of_credit_card_clients.xls'\n", "data = pd.read_excel(path,\n", " skiprows=1)\n", "\n", "# remove spaces from target column name \n", "data = data.rename(columns={'default payment next month': 'DEFAULT_NEXT_MONTH'}) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Assign modeling roles\n", "The shorthand name `y` is assigned to the prediction target. `X` is assigned to all other input variables in the credit card default data except the row identifier, `ID`."
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "y = DEFAULT_NEXT_MONTH\n", "X = ['LIMIT_BAL', 'SEX', 'EDUCATION', 'MARRIAGE', 'AGE', 'PAY_0', 'PAY_2', 'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6', 'BILL_AMT1', 'BILL_AMT2', 'BILL_AMT3', 'BILL_AMT4', 'BILL_AMT5', 'BILL_AMT6', 'PAY_AMT1', 'PAY_AMT2', 'PAY_AMT3', 'PAY_AMT4', 'PAY_AMT5', 'PAY_AMT6']\n" ] } ], "source": [ "# assign target and inputs for GBM\n", "y = 'DEFAULT_NEXT_MONTH'\n", "X = [name for name in data.columns if name not in [y, 'ID']]\n", "print('y =', y)\n", "print('X =', X)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Helper function for recoding values in the UCI credit card default data\n", "This simple function maps the original integer values of the categorical input variables to the longer, more understandable character strings given in the UCI credit card default data dictionary. These character values can be used directly in h2o decision tree models, and the function returns the recoded Pandas DataFrame as an h2o object, an H2OFrame. H2o models cannot run on Pandas DataFrames. They require H2OFrames." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Parse progress: |█████████████████████████████████████████████████████████| 100%\n" ] } ], "source": [ "def recode_cc_data(frame):\n", " \n", " \"\"\" Recodes numeric categorical variables into categorical character variables\n", " with more transparent values. 
\n", " \n", " Args:\n", " frame: Pandas DataFrame version of UCI credit card default data.\n", " \n", " Returns: \n", " H2OFrame with recoded values.\n", " \n", " \"\"\"\n", " \n", " # define recoded values\n", " sex_dict = {1:'male', 2:'female'}\n", " education_dict = {0:'other', 1:'graduate school', 2:'university', 3:'high school', \n", " 4:'other', 5:'other', 6:'other'}\n", " marriage_dict = {0:'other', 1:'married', 2:'single', 3:'divorced'}\n", " pay_dict = {-2:'no consumption', -1:'pay duly', 0:'use of revolving credit', 1:'1 month delay', \n", " 2:'2 month delay', 3:'3 month delay', 4:'4 month delay', 5:'5 month delay', 6:'6 month delay', \n", " 7:'7 month delay', 8:'8 month delay', 9:'9+ month delay'}\n", " \n", " # recode values using Pandas apply() and anonymous function\n", " frame['SEX'] = frame['SEX'].apply(lambda i: sex_dict[i])\n", " frame['EDUCATION'] = frame['EDUCATION'].apply(lambda i: education_dict[i]) \n", " frame['MARRIAGE'] = frame['MARRIAGE'].apply(lambda i: marriage_dict[i]) \n", " for name in frame.columns:\n", " if name in ['PAY_0', 'PAY_2', 'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6']:\n", " frame[name] = frame[name].apply(lambda i: pay_dict[i]) \n", " \n", " return h2o.H2OFrame(frame)\n", "\n", "data = recode_cc_data(data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Ensure target is handled as a categorical variable\n", "In h2o, a numeric variable can be treated as numeric or categorical. The target variable `DEFAULT_NEXT_MONTH` takes on values of `0` or `1`. To ensure this numeric variable is treated as a categorical variable, the `asfactor()` function is used to explicitly declare that it is a categorical variable. 
" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "data[y] = data[y].asfactor() " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Display descriptive statistics\n", "The h2o `describe()` function displays a brief description of the credit card default data. For the categorical input variables `SEX`, `EDUCATION`, `MARRIAGE`, and `PAY_0`-`PAY_6`, the new character values created above in cell 5 are visible. Basic descriptive statistics are displayed for numeric inputs such as `LIMIT_BAL` and `AGE`." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Rows:30000\n", "Cols:25\n", "\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ID LIMIT_BAL SEX EDUCATION MARRIAGE AGE PAY_0 PAY_2 PAY_3 PAY_4 PAY_5 PAY_6 BILL_AMT1 BILL_AMT2 BILL_AMT3 BILL_AMT4 BILL_AMT5 BILL_AMT6 PAY_AMT1 PAY_AMT2 PAY_AMT3 PAY_AMT4 PAY_AMT5 PAY_AMT6 DEFAULT_NEXT_MONTH
type int int enum enum enum int enum enum enum enum enum enum int int int int int int int int int int int int enum
mins 1.0 10000.0 21.0 -165580.0 -69777.0 -157264.0 -170000.0 -81334.0 -339603.0 0.0 0.0 0.0 0.0 0.0 0.0
mean 15000.5 167484.32266666688 35.48549999999994 51223.3309000000949179.0751666666847013.1547999997143262.9489666666 40311.4009666665338871.760399999915663.580500000014 5921.16350000001 5225.681500000005 4826.076866666661 4799.387633333302 5215.502566666664
maxs 30000.0 1000000.0 79.0 964511.0 983931.0 1664089.0 891586.0 927171.0 961664.0 873552.0 1684259.0 896040.0 621000.0 426529.0 528666.0
sigma 8660.398374208891129747.66156720225 9.21790406809016 73635.8605755295971173.7687825283669349.3874270368164332.8561339164160797.1557702648 59554.1075367457416563.28035402576323040.87040205722617606.96146980311515666.15974403199315278.30567914479317777.465775435332
zeros 0 0 0 2008 2506 2870 3195 3506 4020 5249 5396 5968 6408 6703 7173
missing0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1.0 20000.0 femaleuniversity married 24.0 2 month delay 2 month delay pay duly pay duly no consumption no consumption 3913.0 3102.0 689.0 0.0 0.0 0.0 0.0 689.0 0.0 0.0 0.0 0.0 1
1 2.0 120000.0 femaleuniversity single 26.0 pay duly 2 month delay use of revolving credituse of revolving credituse of revolving credit2 month delay 2682.0 1725.0 2682.0 3272.0 3455.0 3261.0 0.0 1000.0 1000.0 1000.0 0.0 2000.0 1
2 3.0 90000.0 femaleuniversity single 34.0 use of revolving credituse of revolving credituse of revolving credituse of revolving credituse of revolving credituse of revolving credit29239.0 14027.0 13559.0 14331.0 14948.0 15549.0 1518.0 1500.0 1000.0 1000.0 1000.0 5000.0 0
3 4.0 50000.0 femaleuniversity married 37.0 use of revolving credituse of revolving credituse of revolving credituse of revolving credituse of revolving credituse of revolving credit46990.0 48233.0 49291.0 28314.0 28959.0 29547.0 2000.0 2019.0 1200.0 1100.0 1069.0 1000.0 0
4 5.0 50000.0 male university married 57.0 pay duly use of revolving creditpay duly use of revolving credituse of revolving credituse of revolving credit8617.0 5670.0 35835.0 20940.0 19146.0 19131.0 2000.0 36681.0 10000.0 9000.0 689.0 679.0 0
5 6.0 50000.0 male graduate schoolsingle 37.0 use of revolving credituse of revolving credituse of revolving credituse of revolving credituse of revolving credituse of revolving credit64400.0 57069.0 57608.0 19394.0 19619.0 20024.0 2500.0 1815.0 657.0 1000.0 1000.0 800.0 0
6 7.0 500000.0 male graduate schoolsingle 29.0 use of revolving credituse of revolving credituse of revolving credituse of revolving credituse of revolving credituse of revolving credit367965.0 412023.0 445007.0 542653.0 483003.0 473944.0 55000.0 40000.0 38000.0 20239.0 13750.0 13770.0 0
7 8.0 100000.0 femaleuniversity single 23.0 use of revolving creditpay duly pay duly use of revolving credituse of revolving creditpay duly 11876.0 380.0 601.0 221.0 -159.0 567.0 380.0 601.0 0.0 581.0 1687.0 1542.0 0
8 9.0 140000.0 femalehigh school married 28.0 use of revolving credituse of revolving credit2 month delay use of revolving credituse of revolving credituse of revolving credit11285.0 14096.0 12108.0 12211.0 11793.0 3719.0 3329.0 0.0 432.0 1000.0 1000.0 1000.0 0
9 10.0 20000.0 male high school single 35.0 no consumption no consumption no consumption no consumption pay duly pay duly 0.0 0.0 0.0 0.0 13007.0 13912.0 0.0 0.0 0.0 13007.0 1122.0 0.0 0
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "data.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Train an H2O GBM classifier" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Split data into training and test sets for early stopping\n", "The credit card default data is split into training and test sets to monitor and prevent overtraining. Reproducibility is also an important factor in creating trustworthy models, and randomly splitting datasets can introduce randomness in model predictions and other results. A random seed is used here to ensure the data split is reproducible." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train data rows = 21060, columns = 25\n", "Test data rows = 8940, columns = 25\n" ] } ], "source": [ "# split into training and test (the test set serves as the validation frame)\n", "train, test = data.split_frame([0.7], seed=12345)\n", "\n", "# summarize split\n", "print('Train data rows = %d, columns = %d' % (train.shape[0], train.shape[1]))\n", "print('Test data rows = %d, columns = %d' % (test.shape[0], test.shape[1]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Train h2o GBM classifier\n", "Many tuning parameters must be specified to train a GBM using h2o. Typically a grid search would be performed to identify the best parameters for a given modeling task using the `H2OGridSearch` class. For brevity's sake, a previously discovered set of good tuning parameters is specified here. Because gradient boosting methods typically resample training data, an additional random seed is also specified for the h2o GBM using the `seed` parameter to create reproducible predictions, error rates, and variable importance values. To avoid overfitting, the `stopping_rounds` parameter is used to stop the training process after the test error fails to decrease for 5 iterations."
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "gbm Model Build progress: |███████████████████████████████████████████████| 100%\n", "GBM Test AUC = 0.78\n" ] } ], "source": [ "# initialize GBM model\n", "model = H2OGradientBoostingEstimator(ntrees=150, # maximum 150 trees in GBM\n", " max_depth=4, # trees can have maximum depth of 4\n", " sample_rate=0.9, # use 90% of rows in each iteration (tree)\n", " col_sample_rate=0.9, # consider 90% of variables at each split\n", " stopping_rounds=5, # stop if validation error does not decrease for 5 iterations (trees)\n", " score_tree_interval=1, # for reproducibility, set higher for bigger data\n", " seed=12345) # random seed for reproducibility\n", "\n", "# train a GBM model\n", "model.train(y=y, x=X, training_frame=train, validation_frame=test)\n", "\n", "# print AUC\n", "print('GBM Test AUC = %.2f' % model.auc(valid=True))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Use LIME to generate descriptions for a local region with a perturbed sample\n", "\n", "LIME was originally described in the context of explaining image or text classification decisions here: http://www.kdd.org/kdd2016/papers/files/rfp0573-ribeiroA.pdf. It can certainly also be applied to business or customer data, as will be done in the remaining sections of this notebook. Multiple Python implementations of LIME are available from the original authors of LIME, from the eli5 package, from the skater package, and probably others. However, this notebook uses a simple, step-by-step implementation of LIME for instructional purposes. \n", "\n", "A linear model cannot be built on a single observation, so LIME typically requires that a set of rows similar to the row of interest be simulated. This set of records is scored using the complex model to be explained. 
Then the records are weighted by their closeness to the record of interest, and a regularized linear model is trained on this weighted explanatory set. The parameters of the linear model and LIME-derived reason codes are then used to explain the prediction for the selected record. Simulation of new points can seem abstract to some practitioners, and simulation and distance calculations can be burdensome when explanations must be created quickly in mission-critical applications. For these reasons, this notebook also presents a variation of LIME in which a more practical sample, instead of a perturbed, simulated sample, is used to create a local region in which to fit a linear model." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Display the most risky customer\n", "In the Oriole notebook *Increase Transparency and Accountability in Your Machine Learning Project with Python and H2O*, row index 29116 was found to contain the riskiest customer in the test dataset according to the h2o GBM model. Sections 3-7 focus on deriving reason codes and other explanations for this customer's GBM prediction. The riskiest customer is selected first for analysis as an exercise in boundary testing." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ID LIMIT_BALSEX EDUCATION MARRIAGE AGEPAY_0 PAY_2 PAY_3 PAY_4 PAY_5 PAY_6 BILL_AMT1 BILL_AMT2 BILL_AMT3 BILL_AMT4 BILL_AMT5 BILL_AMT6 PAY_AMT1 PAY_AMT2 PAY_AMT3 PAY_AMT4 PAY_AMT5 PAY_AMT6 DEFAULT_NEXT_MONTH
29116 20000femaleuniversity married 593 month delay2 month delay3 month delay2 month delay2 month delay4 month delay 8803 11137 10672 11201 12721 11946 2800 0 1000 2000 0 0 1
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "row = test[test['ID'] == 29116]\n", "row" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To use LIME, a sample of similar (i.e., near or local) points is simulated around the customer of interest. This simple function draws numeric values from normal distributions centered around the customer of interest and draws categorical values at random from the variable values in the test set." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " LIMIT_BAL SEX EDUCATION MARRIAGE AGE PAY_0 \\\n", "0 9988.454213 female graduate school divorced 58.287510 5 month delay \n", "1 181039.642122 male high school married 70.460689 pay duly \n", "2 20000.000000 male university single 43.284233 7 month delay \n", "\n", " PAY_2 PAY_3 PAY_4 PAY_5 \\\n", "0 5 month delay 5 month delay 5 month delay 6 month delay \n", "1 pay duly pay duly pay duly use of revolving credit \n", "2 7 month delay 7 month delay 7 month delay 8 month delay \n", "\n", " ... BILL_AMT3 BILL_AMT4 BILL_AMT5 BILL_AMT6 \\\n", "0 ... 5433.340804 6276.576876 8055.530587 7347.467911 \n", "1 ... 94937.888614 90412.278099 87766.906051 85915.192926 \n", "2 ... 10672.000000 11201.000000 12721.000000 11946.000000 \n", "\n", " PAY_AMT1 PAY_AMT2 PAY_AMT3 PAY_AMT4 PAY_AMT5 \\\n", "0 1597.834490 0.000000 1000.000000 823.253257 0.000000 \n", "1 22137.303918 25583.930273 21802.010398 20928.433066 19123.775929 \n", "2 2800.000000 0.000000 1000.000000 2000.000000 0.000000 \n", "\n", " PAY_AMT6 \n", "0 0.000000 \n", "1 22563.515833 \n", "2 0.000000 \n", "\n", "[3 rows x 23 columns]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def generate_local_sample(row, frame, X, N=1000):\n", " \n", " \"\"\" Generates a perturbed sample around a row of interest.\n", " \n", " Args:\n", " row: Row of H2OFrame to be explained.\n", " frame: H2OFrame in which row is stored.\n", " X: List of model input variables.\n", " N: Number of samples to generate.\n", " \n", " Returns:\n", " Pandas DataFrame containing perturbed sample.\n", " \n", " \"\"\"\n", " \n", " # initialize Pandas DataFrame\n", " sample_frame = pd.DataFrame(data=np.zeros(shape=(N, len(X))), columns=X)\n", " \n", " # generate column vectors of \n", " # randomly drawn levels for categorical variables\n", " # normally distributed values centered on the row of interest for numeric variables\n", " for key, val in frame[X].types.items():\n", " if val == 'enum': 
# 'enum' means categorical\n", " rs = np.random.RandomState(11111) # random seed for reproducibility\n", " draw = rs.choice(frame[key].levels()[0], size=(1, N))[0]\n", " else:\n", " rs = np.random.RandomState(11111) # random seed for reproducibility\n", " loc = row[key][0, 0]\n", " sd = frame[key].sd()\n", " draw = rs.normal(loc, sd, (N, 1))\n", " draw[draw < 0] = loc # prevents unrealistic values when std. dev. is large\n", " \n", " sample_frame[key] = draw\n", " \n", " return sample_frame\n", "\n", "# run and display results\n", "perturbed_sample = generate_local_sample(row, test, X)\n", "perturbed_sample.head(n=3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Calculate distance between row of interest and perturbed sample\n", "Once the sample is simulated, distances from the point of interest are used to weight each point before fitting a penalized regression model. Since Euclidean distance calculations require numeric quantities, categorical input variables are one-hot encoded. (Pandas has convenient functionality for one-hot encoding, and the H2OFrames are temporarily cast back to Pandas DataFrames to perform the encoding.) To prevent the disparate scales of numeric values, such as `AGE` and `LIMIT_BAL`, from skewing Euclidean distances, numeric input variables are standardized. \n", "\n", "First, the row containing the riskiest customer is encoded and standardized." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Parse progress: |█████████████████████████████████████████████████████████| 100%\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
LIMIT_BAL AGE BILL_AMT1 BILL_AMT2 BILL_AMT3 BILL_AMT4 BILL_AMT5 BILL_AMT6 PAY_AMT1 PAY_AMT2 PAY_AMT3 PAY_AMT4 PAY_AMT5 PAY_AMT6 SEX_female EDUCATION_graduate school MARRIAGE_married PAY_0_3 month delay PAY_2_2 month delay PAY_3_2 month delay PAY_4_3 month delay PAY_5_3 month delay PAY_6_3 month delay
2.246390.481433 -0.66112 -0.657958 -0.651883 -0.637776 -0.622867 -0.609179 -0.360791 -0.282325 -0.315203 -0.319038 -0.319074 -0.270536 1 1 1 1 1 1 1 1 1
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# scaling and one-hot encoding for calculating Euclidean distance\n", "# for the row of interest\n", "\n", "# scale numeric\n", "numeric = list(set(X) - set(['ID', 'SEX', 'EDUCATION', 'MARRIAGE', 'PAY_0', 'PAY_2',\n", " 'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6', 'DEFAULT_NEXT_MONTH']))\n", "\n", "scaled_test = test.as_data_frame()\n", "scaled_test[numeric] = (scaled_test[numeric] - scaled_test[numeric].mean())/scaled_test[numeric].std()\n", " \n", "# encode categorical\n", "row_df = scaled_test[scaled_test['ID'] == 29116] # same customer as selected in cell 10\n", "row_dummies = pd.concat([row_df.drop(['ID', 'SEX', 'EDUCATION', 'MARRIAGE', 'PAY_0', 'PAY_2',\n", " 'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6', 'DEFAULT_NEXT_MONTH'], axis=1),\n", " pd.get_dummies(row_df[['SEX', 'EDUCATION', 'MARRIAGE', 'PAY_0',\n", " 'PAY_2', 'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6']])], \n", " axis=1)\n", "\n", "# convert to H2OFrame\n", "row_dummies = h2o.H2OFrame(row_dummies)\n", "row_dummies" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then the simulated sample is encoded and standardized." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Parse progress: |█████████████████████████████████████████████████████████| 100%\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
LIMIT_BAL AGE BILL_AMT1 BILL_AMT2 BILL_AMT3 BILL_AMT4 BILL_AMT5 BILL_AMT6 PAY_AMT1 PAY_AMT2 PAY_AMT3 PAY_AMT4 PAY_AMT5 PAY_AMT6 SEX_female EDUCATION_graduate school MARRIAGE_married PAY_0_3 month delay PAY_2_2 month delay PAY_3_2 month delay PAY_4_3 month delay PAY_5_3 month delay PAY_6_3 month delay
-0.845634-0.0955699 -0.84979 -0.845174 -0.845174 -0.83886 -0.834367 -0.834967 -0.83834 -0.723503 -0.722209 -0.849042 -0.723503 -0.723503 1 1 0 0 0 0 0 0 0
1.41903 1.2011 1.41949 1.41897 1.41897 1.4182 1.4176 1.41769 1.41814 1.41999 1.41993 1.41941 1.41999 1.41999 0 0 1 0 0 0 0 0 0
-0.713084-1.69369 -0.716971 -0.712654 -0.712654 -0.706755 -0.70256 -0.70312 -0.706269 -0.723503 -0.722209 -0.716271 -0.723503 -0.723503 0 0 0 0 0 0 0 0 0
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# scaling and one-hot encoding for calculating Euclidean distance \n", "# for the simulated sample\n", "\n", "# scale\n", "scaled_perturbed_sample = perturbed_sample[numeric].copy(deep=True)\n", "scaled_perturbed_sample = (scaled_perturbed_sample - scaled_perturbed_sample.mean())/scaled_perturbed_sample.std()\n", "\n", "# encode\n", "perturbed_sample_dummies = pd.concat([scaled_perturbed_sample,\n", " pd.get_dummies(perturbed_sample[['SEX', 'EDUCATION', 'MARRIAGE', 'PAY_0',\n", " 'PAY_2', 'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6']])],\n", " axis=1)\n", "\n", "# convert to H2OFrame\n", "perturbed_sample_dummies = h2o.H2OFrame(perturbed_sample_dummies[row_dummies.columns])\n", "perturbed_sample_dummies.head(rows=3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Distance is calculated using h2o. Each distance is subtracted from the maximum distance, changing the distance values into similarity values. Now the observations with the highest values are those that are closest to the observation of interest, and they will carry the most weight in the local explanatory linear model. A few sample similarity values are displayed directly below." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
distance
13.2143
10.2477
12.6506
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# calculate distance using H2OFrame distance function\n", "distance = row_dummies.distance(perturbed_sample_dummies, measure='l2').transpose()\n", "distance.columns = ['distance'] # rename \n", "distance = distance.max() - distance # lower distances, higher weight in LIME\n", "distance.head(rows=3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Bind distance weights onto perturbed sample\n", "To fit an h2o linear model using the similarities as observation weights, the distance column must reside in the same H2OFrame as the simulated sample data." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Parse progress: |█████████████████████████████████████████████████████████| 100%\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | PAY_0 | PAY_2 | PAY_3 | PAY_4 | PAY_5 | PAY_6 | BILL_AMT1 | BILL_AMT2 | BILL_AMT3 | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | distance
9988.45 | female | graduate school | divorced | 58.2875 | 5 month delay | 5 month delay | 5 month delay | 5 month delay | 6 month delay | 6 month delay | 3152.62 | 5678.04 | 5433.34 | 6276.58 | 8055.53 | 7347.47 | 1597.83 | 0 | 1000 | 823.253 | 0 | 0 | 13.2143
181040 | male | high school | married | 70.4607 | pay duly | pay duly | pay duly | pay duly | use of revolving credit | use of revolving credit | 99691.6 | 98946.5 | 94937.9 | 90412.3 | 87766.9 | 85915.2 | 22137.3 | 25583.9 | 21802 | 20928.4 | 19123.8 | 22563.5 | 10.2477
20000 | male | university | single | 43.2842 | 7 month delay | 7 month delay | 7 month delay | 7 month delay | 8 month delay | 8 month delay | 8803 | 11137 | 10672 | 11201 | 12721 | 11946 | 2800 | 0 | 1000 | 2000 | 0 | 0 | 12.6506
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "perturbed_sample = h2o.H2OFrame(perturbed_sample).cbind(distance)\n", "perturbed_sample.head(rows=3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Bind model predictions onto perturbed sample\n", "For LIME, the target of the explanatory local linear model is the predictions of the GBM model in the local simulated sample. The values are calculated and column-bound to the simulated sample. " ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "gbm prediction progress: |████████████████████████████████████████████████| 100%\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | PAY_0 | PAY_2 | PAY_3 | PAY_4 | PAY_5 | PAY_6 | BILL_AMT1 | BILL_AMT2 | BILL_AMT3 | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | distance | p_DEFAULT_NEXT_MONTH
9988.45 | female | graduate school | divorced | 58.2875 | 5 month delay | 5 month delay | 5 month delay | 5 month delay | 6 month delay | 6 month delay | 3152.62 | 5678.04 | 5433.34 | 6276.58 | 8055.53 | 7347.47 | 1597.83 | 0 | 1000 | 823.253 | 0 | 0 | 13.2143 | 0.515197
181040 | male | high school | married | 70.4607 | pay duly | pay duly | pay duly | pay duly | use of revolving credit | use of revolving credit | 99691.6 | 98946.5 | 94937.9 | 90412.3 | 87766.9 | 85915.2 | 22137.3 | 25583.9 | 21802 | 20928.4 | 19123.8 | 22563.5 | 10.2477 | 0.0663977
20000 | male | university | single | 43.2842 | 7 month delay | 7 month delay | 7 month delay | 7 month delay | 8 month delay | 8 month delay | 8803 | 11137 | 10672 | 11201 | 12721 | 11946 | 2800 | 0 | 1000 | 2000 | 0 | 0 | 12.6506 | 0.692681
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "yhat = 'p_DEFAULT_NEXT_MONTH'\n", "preds1 = model.predict(perturbed_sample).drop(['predict', 'p0'])\n", "preds1.columns = [yhat]\n", "perturbed_sample = perturbed_sample.cbind(preds1)\n", "perturbed_sample.head(rows=3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Train penalized linear model in local region \n", "Once the simulated sample contains both the similarity weights and the GBM model predictions, a linear model is fit with the original inputs as predictors and the GBM model predictions as the target, weighting each simulated row by its similarity to the row of interest. The trained GLM coefficients are helpful for understanding the local region of the response function around the riskiest customer." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "glm Model Build progress: |███████████████████████████████████████████████| 100%\n", "\n", "Local Positive GLM Coefficients:\n", "MARRIAGE.married: 0.005757170307944079\n", "EDUCATION.high school: 0.005757170307945231\n", "PAY_0.8 month delay: 0.0065111741471265\n", "PAY_4.8 month delay: 0.006511174147127027\n", "PAY_3.8 month delay: 0.006511174147127041\n", "PAY_2.8 month delay: 0.006511174147127305\n", "PAY_0.4 month delay: 0.010754792563841487\n", "PAY_2.4 month delay: 0.010754792563842083\n", "PAY_4.4 month delay: 0.010754792563842167\n", "PAY_3.4 month delay: 0.010754792563842194\n", "PAY_5.6 month delay: 0.013679631302936743\n", "PAY_6.6 month delay: 0.01367963130293677\n", "PAY_2.2 month delay: 0.02636889214079144\n", "PAY_0.2 month delay: 0.02636889214079166\n", "PAY_3.2 month delay: 0.02636889214079169\n", "PAY_4.2 month delay: 0.026368892140792216\n", "PAY_0.3 month delay: 0.030938016884379638\n", "PAY_2.3 month delay: 0.03093801688437975\n", "PAY_4.3 month delay: 
0.030938016884379915\n", "PAY_3.3 month delay: 0.030938016884380054\n", "PAY_2.7 month delay: 0.03649976537701373\n", "PAY_3.7 month delay: 0.036499765377013785\n", "PAY_0.7 month delay: 0.03649976537701445\n", "PAY_4.7 month delay: 0.036499765377014534\n", "PAY_6.3 month delay: 0.04135072490974577\n", "PAY_5.3 month delay: 0.0413507249097461\n", "Intercept: 0.5259242681066165\n", "\n", "Local GLM R-square:\n", "0.84\n" ] } ], "source": [ "# initialize\n", "local_glm1 = H2OGeneralizedLinearEstimator(weights_column='distance',\n", "                                           seed=12345)\n", "# train \n", "local_glm1.train(x=X, y=yhat, training_frame=perturbed_sample)\n", "\n", "# coefs\n", "print('\\nLocal Positive GLM Coefficients:')\n", "for c_name, c_val in sorted(local_glm1.coef().items(), key=operator.itemgetter(1)):\n", "    if c_val > 0.0:\n", "        print('%s %s' % (str(c_name + ':').ljust(25), c_val))\n", "        \n", "# r2\n", "print('\\nLocal GLM R-square:\\n%.2f' % local_glm1.r2())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The coefficients of the local linear model describe the average behavior of the GBM response function around the riskiest customer. In this local region, customers who missed payments are treated as the most likely to default." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Generate reason codes with LIME based on a perturbed sample\n", "This basic function uses the coefficients of the local linear explanatory model and the values in the row of interest to plot reason code values in a bar chart. Each local GLM coefficient multiplied by the corresponding value in a specific row is an estimate of how much that variable contributed to the prediction for that row. These values can tell you how a variable and its values were weighted in any given decision by the model. 
These values are crucially important for machine learning interpretability and are often referred to as \"local feature importance\", \"reason codes\", or \"turn-down codes.\" The latter phrases are borrowed from credit scoring. Credit lenders in the U.S. must provide reasons for turning down certain credit applications in an automated fashion. Reason codes can be easily extracted from LIME local feature importance values by simply ranking the variables that played the largest role in any given decision." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "def plot_local_contrib(row, model, X, g_pred=None, scale=False): \n", "\n", "    \"\"\" Plots reason codes in a bar chart. \n", "    \n", "    Args:\n", "    \n", "        row: Row of H2OFrame to be explained.\n", "        model: H2O linear model used for generating reason codes.\n", "        X: List of model input variables.\n", "        g_pred: Prediction of model to be explained, sometimes denoted g, used for scaling.\n", "        scale: Whether to rescale contributions to sum to model predictions.\n", "    \n", "    \"\"\"\n", "    \n", "    # collect non-zero local contributions as a list of rows\n", "    local_contribs = []\n", "    \n", "    # multiply values in row by local glm coefficients \n", "    for key, val in sorted(row[X].types.items()):\n", "        contrib = 0\n", "        name = ''\n", "        if val == 'enum':\n", "            level = row[key][0, 0]\n", "            name = '.'.join([str(key), str(level)])\n", "            if name in model.coef():\n", "                contrib = model.coef()[name]\n", "        else:\n", "            name = key\n", "            if name in model.coef():\n", "                contrib = row[name][0, 0]*model.coef()[name]\n", "        \n", "        # save only non-zero values\n", "        if contrib != 0.0:\n", "            local_contribs.append({'Name': name,\n", "                                   'Local Contribution': contrib,\n", "                                   'Sign': contrib > 0})\n", "\n", "    # store results in a Pandas DataFrame\n", "    local_contrib_frame = pd.DataFrame(local_contribs,\n", "                                       columns=['Name', 'Local Contribution', 'Sign'])\n", "\n", "    if scale:\n", "        scaler = (g_pred - model.coef()['Intercept']) /\\\n",
"            local_contrib_frame['Local Contribution'].sum()\n", "        local_contrib_frame['Local Contribution'] *= scaler\n", "    \n", "    # plot\n", "    _ = local_contrib_frame.plot(x='Name',\n", "                                 y='Local Contribution',\n", "                                 kind='bar', \n", "                                 title='Reason Codes', \n", "                                 color=local_contrib_frame.Sign.map({True:'b', False:'g'}), \n", "                                 legend=False) \n", "    " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Display reason codes\n", "Here it can be seen that the riskiest customer's prediction is driven by her values for payment variables. Specifically, the top five LIME-derived reason codes contributing to her high probability of default are:\n", "\n", "1. Most recent payment is 3 months delayed.\n", "2. 3rd most recent payment is 3 months delayed.\n", "3. 2nd most recent payment is 2 months delayed.\n", "4. 4th most recent payment is 2 months delayed.\n", "5. Customer is married.\n", "\n", "(Of course, variables like `MARRIAGE`, `AGE`, and `SEX` should not be used in credit lending decisions. For a slightly more careful treatment of GBM in the context of fair lending see: https://github.com/jphall663/interpretable_machine_learning_with_python/blob/master/dia.ipynb)\n", "\n", "This result is somewhat aligned with LOCO-derived reason codes found in section 5 of the *Increase Transparency and Accountability in Your Machine Learning Project with Python and H2O* Oriole notebook. Both perspectives weigh the riskiest customer's most recent and 3rd most recent payments very heavily in the model's prediction. A minor discrepancy between LOCO- and LIME-derived reason codes is somewhat expected. LIME explanations are linear, do not consider interactions, and represent offsets from the local linear model intercept. LOCO importance values are nonlinear, do consider interactions, and do not explicitly consider a linear intercept or offset. 
**Because most currently-available explanatory techniques are approximate, it is recommended that users employ several different explanatory techniques and trust only results that are consistent across techniques. Also, as of h2o 3.24, Shapley values are supported for h2o GBM. Use Shapley values along with or instead of LIME for any high-stakes application.**\n", "\n", "It is also imperative to compare these results to domain knowledge and reasonable expectations. In this case, the LIME reason codes and linear model coefficients tell a relatively parsimonious story about the GBM's prediction behavior. If this were not so, steps should be taken to either reconcile or remove inconsistencies and unreasonable predictions." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plot_local_contrib(row, local_glm1, X)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Use LIME to generate descriptions for a local region with a practical sample\n", "Using a previously-existing local sample based on clusters, deciles, or other more natural segments to create LIME explanations is computationally cheaper and perhaps more straightforward than using a simulated perturbed sample, but it does have one major drawback. If the sample is too large, the explanatory linear model may not be accurate enough to explain all predictions in the sample. The remaining sections of this notebook will explore the idea of generating LIME explanations using a practical sample. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Create a local region based on values of SEX and merge with GBM model predictions\n", "Instead of using a perturbed simulated sample, a linear model will be fit on all women in the test set, and the sample is not weighted by distance from any one point. 
A few lines of the all female sample are displayed directly below." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "gbm prediction progress: |████████████████████████████████████████████████| 100%\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ID | LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | PAY_0 | PAY_2 | PAY_3 | PAY_4 | PAY_5 | PAY_6 | BILL_AMT1 | BILL_AMT2 | BILL_AMT3 | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | DEFAULT_NEXT_MONTH | p_DEFAULT_NEXT_MONTH
4 | 50000 | female | university | married | 37 | use of revolving credit | use of revolving credit | use of revolving credit | use of revolving credit | use of revolving credit | use of revolving credit | 46990 | 48233 | 49291 | 28314 | 28959 | 29547 | 2000 | 2019 | 1200 | 1100 | 1069 | 1000 | 0 | 0.144991
8 | 100000 | female | university | single | 23 | use of revolving credit | pay duly | pay duly | use of revolving credit | use of revolving credit | pay duly | 11876 | 380 | 601 | 221 | -159 | 567 | 380 | 601 | 0 | 581 | 1687 | 1542 | 0 | 0.128193
16 | 50000 | female | high school | divorced | 23 | 1 month delay | 2 month delay | use of revolving credit | use of revolving credit | use of revolving credit | use of revolving credit | 50614 | 29173 | 28116 | 28771 | 29531 | 30211 | 0 | 1500 | 1100 | 1200 | 1300 | 1100 | 0 | 0.325205
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "preds2 = model.predict(test).drop(['predict', 'p0'])\n", "preds2.columns = [yhat]\n", "practical_sample = test.cbind(preds2)\n", "practical_sample = practical_sample[practical_sample['SEX'] == 'female']\n", "practical_sample.head(rows=3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Train penalized linear model in local region \n", "A penalized linear model is trained in the local region defined by women in the test set. Because fit is a concern in this much larger explanatory sample, users should always check the R2 or other goodness-of-fit measures to ensure the surrogate model is accurate in the sample. " ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "glm Model Build progress: |███████████████████████████████████████████████| 100%\n", "\n", "Local Positive GLM Coefficients:\n", "BILL_AMT5: 1.0399411952809277e-07\n", "BILL_AMT2: 1.1999977329358182e-07\n", "AGE: 0.0005425600656944679\n", "EDUCATION.high school: 0.0007216688611868718\n", "PAY_0.8 month delay: 0.004556460822481314\n", "PAY_2.7 month delay: 0.004556460822481664\n", "PAY_3.6 month delay: 0.004556460822481671\n", "EDUCATION.university: 0.004929360519089897\n", "PAY_3.3 month delay: 0.009929773165479217\n", "MARRIAGE.married: 0.010735128951809985\n", "PAY_6.2 month delay: 0.018087589078182496\n", "PAY_5.3 month delay: 0.022758693814452228\n", "PAY_2.2 month delay: 0.026627792771502998\n", "PAY_4.2 month delay: 0.028646065263953614\n", "PAY_2.3 month delay: 0.028779499175080204\n", "PAY_5.2 month delay: 0.041320673312054176\n", "PAY_5.7 month delay: 0.04168881231553721\n", "PAY_4.7 month delay: 0.04168881231553731\n", "PAY_6.3 month delay: 0.11704990177038722\n", "PAY_0.4 month delay: 0.14562584980767246\n", "PAY_0.3 month 
delay: 0.21475212459213328\n", "PAY_0.2 month delay: 0.2253878590232612\n", "Intercept: 0.4675657228203593\n", "\n", "Local GLM R-square:\n", "0.93\n" ] } ], "source": [ "# initialize\n", "local_glm2 = H2OGeneralizedLinearEstimator(seed=12345)\n", "\n", "# train \n", "local_glm2.train(x=X, y=yhat, training_frame=practical_sample)\n", "\n", "# coefs\n", "print('\\nLocal Positive GLM Coefficients:')\n", "for c_name, c_val in sorted(local_glm2.coef().items(), key=operator.itemgetter(1)):\n", "    if c_val > 0.0:\n", "        print('%s %s' % (str(c_name + ':').ljust(25), c_val))\n", "        \n", "# r2\n", "print('\\nLocal GLM R-square:\\n%.2f' % local_glm2.r2())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The R2 is quite high for this local sample and linear model. Because the sample is simply the women in the test set and because the model fit is acceptable, the trained linear model and coefficients can be used to understand the average behavior of the GBM for women in the test set. On average, late payments, particularly `PAY_0`, `PAY_2`, and `PAY_6`, are the most likely to push the GBM model towards higher probability of default values for women." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Generate a ranked predictions plot to assess validity of local explanatory model\n", "A *ranked predictions plot* can also be used to ensure the local linear surrogate model is a good fit for the model inputs and predictions. A ranked predictions plot is a way to visually check whether the surrogate model is a good fit for the complex model. The y-axis is the numeric prediction of both models for a given point. The x-axis is the rank of a point when the predictions are sorted by their GBM prediction, from lowest on the left to highest on the right. When both sets of predictions are aligned, as they are below, this is a good indication that the linear model fits the complex, nonlinear GBM well in the practical sample."
] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "glm prediction progress: |████████████████████████████████████████████████| 100%\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX8AAAEICAYAAAC3Y/QeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvpW3flQAAIABJREFUeJzs3Xd4VFX6wPHvO2mEJCSkQIAgCb03KTaagiAo9i6KWFZde+8glnVdVtd19aeuBXdtWFZFRUERBAGR3jsECDWFQEIISWbO748pmUxmkkky6e/neXjI3HvuPWcG8s65p4oxBqWUUo2LpbYLoJRSquZp8FdKqUZIg79SSjVCGvyVUqoR0uCvlFKNkAZ/pZRqhDT4q1ojIlNE5MMA3StVREYG4l5VyVtEHheRdyp5nw0iMjyghaskERkuImm1XQ5VfTT4qzI5AtsJEckVkYMiMl1EImu7XBXhKHOB4z1kichPItK1OvIyxrxgjLnZzzI953FtD2PM/Oool48yTBQRq+NzOSYiq0Xk/Ercp9R7UXWfBn/ljwuMMZFAX6Af8Fgtl6cyXnK8hyTgMDDdWyIRCa7JQtUBSxyfSwzwLvCZiDSv5TKpGqDBX/nNGHMQmI39SwAAERknIqscNce9IjLF7VyyiBgRuUFE9ohIhog84e3eIhIiIp+IyJciEioiFhF5VER2iEimiHwmIrFu6SeIyG7HOa/39PEe8oCPgZ6O+0wRkS9E5EMROQZMrErenk1ZInKWiCwWkWzH5zNRRG4FrgUedtS6v3WkdW8+ChORf4jIfseff4hImOPccBFJE5EHROSwiBwQkRvd8hwrIhtFJEdE9onIg358LjbgPSAc6OB5XkS6ich8x/vYICLjHce9vhdV92nwV34TkSTgPGC72+HjwPXYa47jgNtF5CKPS88CugDnAE+LSDeP+4YDXwMngSuMMQXAXcBFwDCgNXAEeN2Rvjvwf8AEx7k47DV6f95DJPZgtcrt8IXAF4738FGg8haRdsAPwGtAAvYvzdXGmLcd+bxkjIk0xlzg5fIngNMc1/QBBgFPup1PBKKBNsBNwOtuNfZ3gT8ZY6Kwf8n94sfnEgzcDOQC2zzOhQDfAnOAFo7P5yMR6eLne1F1kAZ/5Y+vRSQH2Iu9yWSy84QxZr4xZp0xxmaMWQt8gj1ounvGGHPCGLMGWIM9mDk1A34EdgA3GmOsjuO3AU8YY9KMMSeBKcBljiB1GfCdMWaB49xTgK2c9/CgiGRj/+KKBCa6nVtijPna8R5OBDDva4CfjTGfGGMKjTGZxpjV5ZTT6VpgqjHmsDEmHXgG+xeOU6HjfKExZhb2oN3F7Vx3EWlmjDlijFlZRj6nOT6Xg8DVwMXGmKOeabB/Zi8aYwqMMb8A3znSq3pKg7/yx0WOWuRwoCsQ7zwhIoNFZJ6IpIvIUeyBM97j+oNuP+dhDyROpwG9sQcW91UG2wFfOZoZsoFNgBVoib3GvdeZ0BhzHMgs5z1MM8bEGGMSjTHjjTE73M7t9UgbqLzbYv9Sq4zWwG6317sdx5wyjTFFbq/dP9dLgbHAbhH5VUROLyOf3x2fS7wx5jRjzM8+yrLX0TTkXp42/r4ZVfdo8Fd+M8b8ir2jdJrb4Y+BmUBbY0w08CYgFbjtHOAvwFwRael2fC9wniMwOf80McbsAw5gD6wAiEhT7M0vleW5tG2g8t6Ll/ZzH3l62o/9S8jpFMexchljl
hljLsTeRPM18Jk/15VTlrYi4h4vTgH2ObOs4v1VLdDgryrqH8AoEXE23UQBWcaYfBEZhL2po0KMMS9h/xKZKyLOp4Y3gecd7eaISIKIXOg49wVwvqMzNRSYSmD/Lwcq74+AkSJyhYgEi0iciDg7yw8B7csowyfAk46844GngXLnRDg6y68VkWhjTCFwjPKbxMqzFPuTxcOOjvnhwAXAp47z5b0XVQdp8FcV4mh//g/2YARwBzDV0SfwNJWsZRpjnsVeS/3ZMbLmVexPFHMc9/4dGOxIuwH4M/YvjAPYO2QDOSEpIHkbY/Zgb355AMgCVlPc3/Eu9nb5bBH52svlzwHLgbXAOmCl45g/JgCpjtFLt2HvP6g0Rwf8Bdg7+zOAN4DrjTGbHUnKey+qDhLdzEUppRofrfkrpVQjpMFfKaUaIQ3+SinVCGnwV0qpRqjOLmIVHx9vkpOTa7sYSilVr6xYsSLDGJNQXro6G/yTk5NZvnx5bRdDKaXqFRHZXX4qbfZRSqlGSYO/Uko1Qhr8lVKqEaqzbf7eFBYWkpaWRn5+fm0XRQVYkyZNSEpKIiQkpLaLolSjUK+Cf1paGlFRUSQnJyNSkYUjVV1mjCEzM5O0tDRSUlJquzhKNQr1qtknPz+fuLg4DfwNjIgQFxenT3RK1aB6FfwBDfwNlP67KlWz6l3wV0qphmzx9gx2ZRyv9nw0+FfQ888/T48ePejduzd9+/Zl6dKltV0kl+nTp7N/v1+bPblMmTKFNm3a0LdvX3r27MnMmTOrVIbhw4fr5DylquCad5YyYtr8as+nXnX41rYlS5bw3XffsXLlSsLCwsjIyKCgoMDv64uKiggODvb5uqqmT59Oz549ad26dfmJ3dx33308+OCDbNq0iSFDhnD48GEsluJ6QaDLqZSqfVrzr4ADBw4QHx9PWFgYAPHx8a5Am5ycTEZGBgDLly9n+PDhgL1mPWHCBM4880wmTJjA9OnTGT9+PGeffTbnnHMOxhgeeughevbsSa9evZgxYwYANpuNO+64g65duzJq1CjGjh3LF198AcDUqVMZOHAgPXv25NZbb8UYwxdffMHy5cu59tpr6du3LydOnGDFihUMGzaMU089ldGjR3PgwIEy31+3bt0IDg4mIyODiRMncttttzF48GAefvhhjh8/zqRJkxg0aBD9+vXjm2++AeDEiRNcddVVdOvWjYsvvpgTJ04AYLVamThxout9vfLKK4H9x1BKVUm9rc498+0GNu4/FtB7dm/djMkX9PB5/txzz2Xq1Kl07tyZkSNHcuWVVzJs2LBy77tx40Z+++03wsPDmT59OitXrmTt2rXExsby5Zdfsnr1atasWUNGRgYDBw5k6NChLFq0iNTUVDZu3Mjhw4fp1q0bkyZNAuDOO+/k6aftuyhOmDCB7777jssuu4x//etfTJs2jQEDBlBYWMhdd93FN998Q0JCAjNmzOCJJ57gvffe81nOpUuXYrFYSEiwrwmVlpbG4sWLCQoK4vHHH+fss8/mvffeIzs7m0GDBjFy5EjeeustmjZtyqZNm1i7di39+/cHYPXq1ezbt4/169cDkJ2d7d8/glKqRtTb4F8bIiMjWbFiBQsXLmTevHlceeWVvPjii0ycOLHM68aPH094eLjr9ahRo4iNjQXgt99+4+qrryYoKIiWLVsybNgwli1bxm+//cbll1+OxWIhMTGRESNGuK6fN28eL730Enl5eWRlZdGjRw8uuOCCEnlu2bKF9evXM2rUKMBeE2/VqpXX8r3yyit8+OGHREVFMWPGDNfIm8svv5ygoCAA5syZw8yZM5k2bRpgH3a7Z88eFixYwN133w1A79696d27NwDt27dn586d3HXXXYwbN45zzz3Xr89YKVUz6m3wL6uGXp2CgoIYPnw4w4cPp1evXnzwwQdMnDiR4OBgbDYbQKnx6hEREWW+roj8/HzuuOMOli9fTtu2bZkyZYrX8fHGGHr06MGSJUvKvaezz
d+TezmNMXz55Zd06dLFr3I2b96cNWvWMHv2bN58800+++yzMp86lGrsVu45wiVvLK6x/LTNvwK2bNnCtm3bXK9Xr15Nu3btAHub/4oVKwD48ssv/b7nkCFDmDFjBlarlfT0dBYsWMCgQYM488wz+fLLL7HZbBw6dIj58+cDxV8s8fHx5ObmuvoBAKKiosjJyQGgS5cupKenu4J/YWEhGzZsqPR7Hz16NK+99hrGGABWrVoFwNChQ/n4448BWL9+PWvXrgUgIyMDm83GpZdeynPPPcfKlSsrnbdSjYEz8E8Jns60kDerPb96W/OvDbm5udx1111kZ2cTHBxMx44defvttwGYPHkyN910E0899ZSrs9cfF198MUuWLKFPnz6ICC+99BKJiYlceumlzJ07l+7du9O2bVv69+9PdHQ0MTEx3HLLLfTs2ZPExEQGDhzoupezkzY8PJwlS5bwxRdfcPfdd3P06FGKioq499576dGjck9MTz31FPfeey+9e/fGZrORkpLCd999x+23386NN95It27d6NatG6eeeioA+/bt48Ybb3Q9Df3lL38B4M037f+pb7vttkqVQ6mGrptlD4bqn/QozppcXTNgwADjOV5806ZNdOvWrZZKVPNyc3OJjIwkMzOTQYMGsWjRIhITE2u7WNWmsf37KuUu+dHvAfg+9DH2mzhGPTuvUvcRkRXGmAHlpdOafx12/vnnk52dTUFBAU899VSDDvxKKbtITpBLePkJq0iDfx3mbOdXSjUezSSPXFv1B3/t8FVKqToijAKaSy7pJqba89Lgr5RSdUQz7Au6ZdKs2vPS4K+UUnVEjNiDf7aJrPa8NPgrpVQdEUMuANlUfiKovzT4V5CIcN1117leFxUVkZCQwPnnnw/YV9a88847y7zH8OHD6dKlC3369OHMM89ky5YtVSpTZGT11xKUUoGVnVfgmjS59ZB9cma0o+Z/1Gjwr3MiIiJYv369a/XKn376iTZt2lT4Ph999BFr1qzhhhtu4KGHHip13mq1VrmsSqm6aW9WHn2n/sQVby1h2N/muRapjBFnzV+bfeqksWPH8v339gkZn3zyCVdffXWl7zV06FC2b98O2JeIeOSRR+jfvz+ff/45O3bsYMyYMZx66qkMGTKEzZs3A7Br1y5OP/10evXqxZNPPum614EDBxg6dKhrY5aFCxdW4V0qpaqqyGrjn3O3cfxkEfO3HOazZXv5YHEq2w7ba/rLUo+wOzOPl3/aCkAs9i+Bmmjzr7/j/H94FA6uC+w9E3vBeS+Wm+yqq65i6tSpnH/++axdu5ZJkyZVOtB+++239OrVy/U6Li7OtQ7OOeecw5tvvkmnTp1YunQpd9xxB7/88gv33HMPt99+O9dffz2vv/6669qPP/6Y0aNH88QTT2C1WsnLy6tUmZRSgfH16v28/NNWsvMKeW/RLtfx6PCQEun2ZNl/VzvKftJNNLk0rfay1d/gX4t69+5Namoqn3zyCWPHjq3UPa699lrCw8NJTk7mtddecx2/8sorAfvSDosXL+byyy93nTt58iQAixYtci0eN2HCBB555BEABg4cyKRJkygsLOSiiy6ib9++lSqbUqp8L/+0lTE9Eune2vewzIIi+9pWC7ellzh+9ESh1/QdLPvZbqt4M3Jl1N/g70cNvTqNHz+eBx98kPnz55OZmVnh6z/66CMGDCi9/IZzGWWbzUZMTAyrV6/2er1zzX13Q4cOZcGCBXz//fdMnDiR+++/n+uvv77CZVOqsTpZZOX9RancdFYKIUG+W8ULHc05b/66g63PnVfufbcdzvUr/5ZyhKWmZta3Ckibv4iMEZEtIrJdRB71cv4UEZknIqtEZK2IVK66XIdMmjSJyZMnl2iyCaRmzZqRkpLC559/DtjX01+zZg0AZ555Jp9++ilg/xJx2r17Ny1btuSWW27h5ptv1mWUlaqgdxbu4sUfNvPh77v9Su+s2QdCM47Tmkx221oG7J5lqXLwF
5Eg4HXgPKA7cLWIdPdI9iTwmTGmH3AV8EZV861tSUlJrh2sPE2fPp2kpCTXn7S0tErl8dFHH/Huu+/Sp08fevTo4do399VXX+X111+nV69e7Nu3z5V+/vz59OnTh379+jFjxgzuueceAG6++WY8V0hVSpWWe7IIgLyCyo22+/D33azfd9T12ssDuk9tJAOLGLaapErlXVGBaPYZBGw3xuwEEJFPgQuBjW5pDLjmK0cD+wOQb63IzS39+Obc2Qvsa+qXt62jrwXbUlNTS7xOSUnhxx9/LJUuJSWlxA5dzz33HAA33HADN9xwQ6n077zzTpnlUUoFxpNf2/esTn1xHECFVuVPlCwA0k10oIvlVSCafdoAe91epzmOuZsCXCciacAs4C5vNxKRW0VkuYgsT09P95ZEKaXqjb/N9n8C5xmWDRQZC5tMu2osUbGaGud/NTDdGJMEjAX+KyKl8jbGvG2MGWCMGZCQkFBDRVNKKbtA75+VebzAr3SxHOPaoLnMsQ0gjyYBLoV3gQj++4C2bq+THMfc3QR8BmCMWQI0AeIrk1ld3XlMVY3+u6q6oLb+F04M/pGmcpJ/FF0KwK1D21d7noEI/suATiKSIiKh2Dt0Z3qk2QOcAyAi3bAH/wq36zRp0oTMzEwNFA2MMYbMzEyaNKmZGo9SdclZlnX8Oegb5lhPZatpW/4FAVLlDl9jTJGI3AnMBoKA94wxG0RkKrDcGDMTeAD4t4jch/3LdaKpRAR3jpzR/oCGp0mTJiQl1cwoB6V88bfZxz165eQXEtWk5IzdvVl5WCzl3y2RTN4JmcZe04IHCm93u3/1V3ADMsnLGDMLe0eu+7Gn3X7eCJxZ1XxCQkJISUmp6m2UUipgxv5zIQsfPrvEsSEvlb/5ehBW/hryb0IoYlLhQ+TUwJIO7nRhN6WUqoK9WScqdd3EoNkMC1rLs0UT2GlaB7hU5dPgr5RSHsprdjFV7Bq+2LKQx4I/5ndbNz6wnlule1WWBn+llPLiRIHVNeM3kNrJQV4IeZc1pgN/KrgPU0thWIO/Ukp5Mej5n+k5eXZA7ynY+EfIGxQSzB0F93C0BjZt8aX+ruqplFIB5r4WT04ZtX7PVqEjxwt489cd5d7/0eBP6GfZzmOFN3GI2MoWMyC05q+UUn6Y+P4fDHnpF6/nJs/cwFsLdpZ5fX/Zyp+Cv+dL61l8Yj27zLQ1MZVJa/5KKeWH+Vt8zy+auab8tSqvC/6ZHBPO5MKJBH4hiYrTmr9SSnkIdM07RQ4w3rKYWdbBfm3RWJGloCtLg79SSlWjAbKZH0MfIZ9Q3reO8euammj20eCvlFIVlJ3nfQ9eT9Hk8lrov0gnhnEFL7DZnFLNJfOftvkrpVQFnfaXuX6kMkwNmU48R7moYCq7TWJ1F6tCtOavlFJlOFFgpfOTP7her9id5dd19wZ/yYVBi3mt6GI2mLq3JpkGf6WUchAvo3DeW7SrxEbtl/7fklJpPHWXVO4N/h8/WAfymvWiMtOueVqXd1BKqTqnIlsxOj0UPIMCE8TkwonlLt8Q3TSkzPPVRYO/UkoFjOGOoG8YEbSGvxddwWGa13aBfNLgr5RSHioz0rIJJ3kp+G0eDpnBd9bBTLeO9vva83u3qkSOVaPBXymlquhMyzrmhT3AFcG/8mrRJdxZeDcnCS33utgIe5rHxnYrcbwmNqrVoZ5KKVUFPWQX74VM47CJ4drCx1hk6+X3tT/fP6waS1Y2rfkrpWrcgaMnSH70e+ZsOFjbRSmhossqBGHl2ZD3OUkIVxc+WaHAD8U1/9qgwV8pVePWpR0F4LPlabVckqowTAn+gP6W7TxZeCNpJqG2C1Qh2uyjlFIVlMARXgp5mxFBa/i4aAQzbWfWdpEqTIO/Ukp5KGthtWhymRX2OM3I4y+FV/OOdWwA8quJLt6SNPgrpZSfosjjh7BHSZCj3FjwEPNs/Wq7SJWmbf5KKeWnK4Lm0VqymFTwYLUGfl3SWSmlalBZg
32SJJ27g7/id1s3frH1D2y+NbF7iwcN/kop5Yc/BX1LtOTxROGkgN/bs81fd/JSSqk6IEnSmRD8M99ZT2OHaVPt+dWbZh8RGSMiW0Rku4g86iPNFSKyUUQ2iMjHgchXKaUC6edNhwEwHgssPB38HwpMEK8UXVobxaoWVR7tIyJBwOvAKCANWCYiM40xG93SdAIeA840xhwRkRZVzVcppQJt44FjAPyy+bDrWH/ZyrlBK3i58LJqq/V7tvm3jmlSLfm4C0TNfxCw3Riz0xhTAHwKXOiR5hbgdWPMEQBjzGGUUqqO2p99wvXzuUHLKTBBvOfn5uu+pL44zuc5Z5t/6+gmvHP9ACadWf07fwUi+LcB9rq9TnMcc9cZ6Cwii0TkdxHx+imKyK0islxElqenpwegaEopVXFWW3Gzz2mWjawyncilaYXv8+S4buUnciMijOzeEoul+nt8a2qSVzDQCRgOJAELRKSXMSbbPZEx5m3gbYABAwbU/JQ3pVSDsHH/MbYeyuGifpVrpnEG/zAK6C67ebeSs3hvHtKeLolRJMdF+JW+Jmf6BiL47wPaur1OchxzlwYsNcYUArtEZCv2L4NlAchfKaVKGPvPhQCu4J9XUERokIXgIP8aO5wV/8GWTYSKlRW2zpUuy5BOdXPBt0A0+ywDOolIioiEAlcBMz3SfI291o+IxGNvBtoZgLyVUqpc3Z+ezd2frvI7fZHNvmH7CMtq8k0ICyu4VDPA6e3jKnxNTU72qnLN3xhTJCJ3ArOBIOA9Y8wGEZkKLDfGzHScO1dENgJW4CFjTGZV81ZKKX/NWud774D/Lknlkz+Kuy7zC+3Bf6hlLUts3f3alcuT53DRuiYgbf7GmFnALI9jT7v9bID7HX+UUqpOeeqbDaWOxXGUDpYDfFo4olL37JrYrMLX1GSbv87wVUopL/pZtgOwxtahwtfGRoTy+NiKjfSpaRr8lVK1qO42jZxpWc9JE8wa41/wb59QPKJn5VOjCA2ueHityTZ/Df5KKeVBsHFJ0EJW2jr73d5vs9XdLzJvNPgrpWpRzS9l7I+zLOuJljy+s53m9zWBiP31bZy/UkrVC0fzCgkJFnJPFpFfYKNtbLjXppYLLEs4ZpryhXWo3/e2+hH91005F4BeU+b4X+hqosFfKVXjThRaK3zN3Z+sIq+giHduGFjpfPtMnUNisyYcPJYPQHR4CGsmn1sijWBjsGUTy2xdym3yeWZ8DybPtI8UuuGMdrwwa3OZ6aOahACw8OER5BWU/gy0zV8p1aDd8+nqCl8zc81+15LLZdmVcbzM887AD3D0RCH73BZxA+grO2hnOcws6+AKle/Wof6PCmob25QuiVEVun+gafBXSjUoI6bNr1D6F38oWVsfYlmHzQjzbH0DWKq6R5t9lFIN2uGc/PITuelv2cYW05YsKj5Jq7JaRYczvEsCd47oWGN5as1fKVUvbT2Uw+fL95abbsHWjDLPf7tmv+tnZ3v/BpNcqTL1aF25L4wgizD9xkEMSI6t1PWVoTV/pVS9dO4rCwC4fEBbn2mO5hXy4Odr/L7nxZbfCJcC5lr7lZkuLiKUzOMFpY7PvPMsbDU4XLMqtOavlGqwPliSWqH0Zwet5qBpzg+2QT7TnNY+lnG9W3k9F2QRQvxcNrq21Y9SKqVUNQuhiKGWtSy09sLX5LOzOsbz6a2n12zBqokGf6WUwr5dYzPJY7bN9zyC83ollnhdkzNyA02Dv1KqFtWd4HmeZWmZG7dc0Kc11ww6pYZLVX00+CulGj0LNoYFreU3W0+fs3pDgyw1OgO3umnwV0rVoroRTM+yrKONZPK19Syfadzjft0oddVo8FdKNVj+NslPCvqRo6Ypc2wDqnyv+kKDv1KqUeskaQwPWsN062gKCKnt4tQYDf5KqUbtmqC5FJggpheNLjOde7OPxSKOY/W3AUiDv1KqFtVuW0oUeVwe9CuzbQM5UoG1fO4d2ZkJp7XjyoG+ZxfXdbq8g
1Kq0RoftJhIyeftovMrdF10eAjPXtSzmkpVM7Tmr5SqRbXbbHJl0Dx221qwzrSv1XLUBg3+SqlGqZOk0duyiw+sZbf1N1Qa/JVSjdK4oN+xGeFba9XW6kls1iRAJapZ2uavlGp0wijgiqD5LLV1I50Yv67x1UA15/6h5OYXBa5wNURr/kqpes1qM1htFRs1dFHQIlpLFv+yXljl/Js1CaF1THiV71PTAhL8RWSMiGwRke0i8mgZ6S4VESMivqfRKaUakaoP9RwxbT5dn/rB7/ShFHJ38P9YbWvPIlv9HrFTFVUO/iISBLwOnAd0B64Wke5e0kUB9wBLq5qnUqpx+XjpHp/n9mTlUWj1/0tkqGUtbSSTV4supbZHG9WmQNT8BwHbjTE7jTEFwKeAt2epZ4G/AhXbTVkp1YD5F3wf/2pdpe6+aHvJ/Xst2Lg7+H8cNjGNutYPgQn+bQD3XZTTHMdcRKQ/0NYY831ZNxKRW0VkuYgsT09PD0DRlFKN2R+pWSVej7H8QW/LLp4rvLZRrePjTbV3+IqIBXgZeKC8tMaYt40xA4wxAxISEqq7aEqpRiSKPO4K/or9JpbvbA1jK8aqCETw3we4L3CR5DjmFAX0BOaLSCpwGjBTO32VUjUljAL+G/oCHWQ/TxTehE0HOgbkE1gGdBKRFBEJBa4CZjpPGmOOGmPijTHJxphk4HdgvDFmeQDyVkrVaxUf7ZNfaPV5bl3aUa/HHwn+lN6yi7sK72KerV+F82yIqhz8jTFFwJ3AbGAT8JkxZoOITBWR8VW9v1KqYSu02kh+9Hv+syTVr/SXvLGY/dknSh3fmZ7LFW8tKXW8h6QyIegnPrcOY7ZtUBVL23AEZIavMWYWMMvj2NM+0g4PRJ5KqYZAyDtpr8lPm72F609PLpVin0eg33jgGGe8+EupdGf//ddSxyLJ453QaWTSjOeLrqlaSRvYqFBd3kEpVSf4agBativLx5nyXRS0iFaSxcSChzhGZKXvA7qNo1JKBVY11aiT5QAPB89grS2FX219qieTekyDv1KqwekpO/kqdDJFWLij8F5MAEJdQ2v20eCvlKpFgW1LCaOAq4Pm8mXoMwiGywsmk2Z0zpA32uavlKpz9mTmYTC0i4vwK70FG6dZNvJs8Pt0sBxgvS2Zewr/zA7TpvyLy9A1MYrNB3MASGretEr3qms0+Cul6pyhf5vn+vmx87qWmTaMAv4d8neGBq3jmAnnroI7+cE2iKIqhrerBralRbMmbD6YQ5+kaO4Y3qFK96trNPgrpWqRW0O6jxagv/ywucw7PB/yHmdZ1vNM4QQ+sZ5NPmEBLJ/diK4tCA5qWK3kGvyVUjUmv9BK16d+dL3+edMh18/SJl8EAAAgAElEQVTO2H+yyPcMXk8Tg37ksqAFvFl0Ae9bzwtUMQF7B2+fpGgAejv+bkg0+CulakxG7snSBx1R3znT1x+9ZQfjgxZzc/APzLf24eWiywJYSifhnG4tWfLY2bSKrn87dZVHg79SqlKKrDaO5RcRGxHqV/r0nJOc9dd5pY4bR/Q/WWTz4y6G8ZbF/C3kLcKkiAXWXtxd+OdqXZ65IQZ+0OCvlKqgrYdyOPeVBbSJCWdf9gm2PnceocHlt4enHcnzerwiM2f/HfJ3RgWtZJ0tmRtOPkoWzfy/uIIa2rh+Tw2rB0OpBsgYw2/bMjB1ZH2BpY7lFpxr7hTZ7DX2xdszyM4rqPD9/HlXzcjl5ZA3GBW0knnWPlxWMKVaAz/Axf2qNky0rtPgr1QdN2PZXq57dylfr95XfmI/fLA4lZV7jlT6em8V4vxCK9e8s5Qbpy/zek2R1cb9n63xem7xjgyvx536yHa+DX2S8y1L+G/RSP5ceA8n8a+pqbJSXxzHwOTYas2jtmmzj1K1aMHWdF7+aStf3Ha6z6GEex3NJfuOlF7G2B8nCqzYjCEizP7rPnnmBsAe4AKlyGavv285mMPRE4Wk5+QT0zSU+Ej7sMuOT/zg8
9o7P17l81x/2cqM0GcpJJhrCp5guSl7zL/ynwZ/pWrR/Z+tJiO3gCN5hSREBX58OsCA537ieIE1oMHel7wCK32emeN6vXHqaJqGVi7MdJB9vBf6N/II4+KCqew0rQNSxg4JEXRsEcnsDYfo0zaGNXuzA3Lf+kabfZSqgvxCK0fzCqs1j6o29R8v8H/cvD88O0LLKt/O9OPM2XCwwnmMtizjm9CnsCFcW/B4wAI/wA1nJLt+vm1o+4Ddt77R4K+UF1nHC/zqYL3o9UX0mTqn3HTV6cf1B0h+9Huvo2kyvY2r92H9vqMkP/o9K3ZnYYxh8As/89myveVeZ/C9KvPVb//Orf9d4XcZAM6wrOe1kH+y07Ri7Mm/sN4ENkAbA/eP6kLnlpGc0TG+xLk7hnfgvYmNY3txDf5KediRnkv/Z3/ig8Wp5aZ1LvpVVQbDiQIr577yKyt2V2zzki9W2DuCN+w/Vurcs99t9Hld8qPfM8XR/g+wYFs6AD9tPIzVZjh07CQPf7mWPZklv1TEI9T3nDyb1T6aTnJOFvn3JhxiOcZrIa+xx7TkxoKHOUhcha73V5fEKObcN4zo8BBWPTWKVU+N4r6RnXng3C6c3bVlteRZ12jwV8rD7szjAMzfml4DuRUH0o0HjrH1UC7Pfb+p3Kt+25ZB8qPfs+1QTpnj0QutxU8vBUW2UpufT3f7gnMGdeOadmW3YX/xpujvL9rFAi+fy7XvLC23zOVJlgP8L3QyzcjjnsI7yaRmllRoHhFK84hQ7hnZiSBLAx/c70Y7fFWdk5pxHAOkxPu3nG+guYJgNQ6rv/ad30mIdOvgLSMvZ3D/cuU+zu/dmsvfWkJ6jr05Z/aGg2Qd9z223ub2Jka98iu7M71PtHLPB1PyvTuPb9x/jGe+9f0kUVkWbIy3LGZyyH8wwM2FD7LBJAc8H1WS1vwbCKvNsDfL+y/2f3/fzeLtZY+lrkuGT5vPiGnzA3a/2RsOsu1QBZpnHMEuELG/yGoj10vTx6LtmXy9er/Xa3x96ezKOM7wafNdgR9g2pytrNjte8y++718Bf5Xf94GFD+D/LY9o8SXxss/bQVg7D8X+synsqLI4/PQZ/hH6BvsMS24uGBqtWy52NAnbFWGBv8GYtqcLQx5aZ5r1qW7p75ezzUBeCyvr/703xWMemWB3+ndH/ytNuM1ePvrrk9W0XPybL8y9dZ8Y7MZ3lm4y6+8nM1VJa734/HllZ/twd2Z/4b9x0p8oWw9lFsts4t7yw7mhD1MP9nO04U3cHHBVHabxIDnA/Y2flWSBv8GYpGjZp+R4//ojuqWnVfAyz9txWqr/mUJ8gut1ZLPs99tpOfk2RVaZtjdD+v9G+ZY4GVRsz92ZdH+8Vl+LngGL8wqve69v5+I1WZKPBl4tuEH8qMNJ5/JwR/wZegULNi4ouAp/mMdjU3DUY3ST7seO3wsn9GvLGC/l9p+XTB55gb+OXcb8zYfrtZ83vttF12f+pGHv1gbkPuJONv8DV+uSAMgv9C/AOwur6D4icE5FyA7r4CvVqW5pbJH1UluyyKs3pvNsL/N44q3llQ4Tyfn04q/NfYOj8/io6V7fJ7PPB6YSkU7Oci/Q/7OjcGz+dw6lHNPvlQjs3bryLJIdYp2+NZjny3fy5ZDOXy0dHdtF8WrE47JRUWVrDYWWu0BN6ScHZSmOoYzfrkyjb9fYW8vzs4rICO3gI4tIiucr7fxHlsP5ZRa6+XnjYdKpevzzBxGdW/JtMv7MOC5n4uPT51D6ovjuPvT1SVGy2TkFjjun1viPmV1zJYl92QRqRnHOf+13yp1vS+Dnp9bpeuT5DD3BP2PS4IWUkQQDxXeyufW4YEpXAU19NU6/aXBX9VZA5//mZOFNjY9O8bva1bvzaZ1dBMuen0R+4/mV2lJA/fa4uVvLuGvl/biwr5taBIShM1muPk/y13nJ01fxmntYzl6opAvVqTx5xEdyfOYWXuyyMqho/k+83tj3o5Kl9XJr
/6FGhBOPv0s2xls2cwQy1r6W7Zz0oTwnvU83i4aRzrNa7Q87oNXz+1ePf0K9U1Agr+IjAFeBYKAd4wxL3qcvx+4GSgC0oFJxpi6WV2tRxr6o2x2JZZNuOj1RUQ1CSYnv+xO2umLdjHl241sfnYMTUKCSpzzVTN85Mt1PPLlOgB2vDC2xLlfNh/mF7fmrZEv/1rq+j92ZbGljFFH7lsa1kdNOMkoywpGBK1mlGUFUXICqxHWmxReKbyUGdbhAZ20dVr7WH7f6d+EOOfvym3DOpAY3SRgZajPqhz8RSQIeB0YBaQBy0RkpjHGfUDwKmCAMSZPRG4HXgKurGreys5zxmVl5BdaCQu2uNq7fZn67UbO7BjHOd0qMgsy8N9S7yzcyd/nbGX15FGlzrkH/hM+1rX5l6OWnZ1XyHuLttImJty15ov7ZCdfM1QXbit7Api3zmdvzUT1lyFJ0ukv2+lr2U4/y3a6SyphUkSOCecH6yBm2QazwtaZHJrWWKkePa8rKfERPPj5Gtf/gysGJLnOa5NPsUDU/AcB240xOwFE5FPgQsAV/I0x7nu3/Q5cF4B865T3F+2i/ynN6dM2praLUiF7MvNYuD2dJ75az30jO3PPyE4s3JaOIAxuH1uqvf29Rbt4b9Euv5pTnL9oJwpLB+Cs4wX0f/Ynpl7Yg+tPT65wuZ2zYNPLGd104Kj3znD3vWTfXrATKF7wy1nusp6sJr7vfd36snywpP4+7IZRQG/ZyQDLVvpbttHXso0EsS8nccKEsta0533rGBbaerHE1qNGRu54+/e5bVgHAAanxNJ36k8A/PXS3vzmGA3Xr579flanQAT/NoD76k9pwOAy0t8E+F7cu55yznwM9LK5e7PySGoeXqJGXlBk45vV+wJSnz7/tYUcc9SQvlqVxp1nd2TCu38AMOnMFJ6+oHul7+385bxvxhou7pdU4tzyVPvj+tPfbOCyU5MICbKU27HrjaWKVblnvy85Y9VqM3y31vvkq8YijAJ6SCqdLWn0lF30seygq+wlROxf4jtsrVhg68MqW0dW2TqyxbSlqBa6D8v6/x/TtHizFxFhSKcE/nj8HFo00yYfpxr9FxOR64ABwDAf528FbgU45ZRTarBkddOWgzmM/od9ctKMW09jcHt7e+k/527jX/O207WciSv5hdYSa5V8sSKNS/u34YPFqVxyahJ7s/JcgR8gNTOPnPzidvY1adW3zrn7l1n3p2cTHR7Cmsnn+nWt+/DFy98seziktwCR/Oj3rp+/X3ugxLkPf9/NJ3/Y6zLO0UYNXRxH6WlJpZ9lG6eKvWYfIfYno2MmnLW29rxtG8dqW0eW2zpzpJq3TyxLn6Ro1qQ51hqqYO1HA39JgQj++4C2bq+THMdKEJGRwBPAMGOM12d1Y8zbwNsAAwYMqDPdmXkFRXR/ejavXNmnVA3Wl4+X7mFYlwTaxISXmS4956TPTTzcl2v404crWP20PTg6myyOnSi7Q7TrUz9yarviURUPfr6GjNyTvPjDZqb4WKPl7k9Xu36uzKzOb1bv479LdvPF7WeUOmezGZbvPkK7uNJtwEfLeS9bD+XQqUUkIlLicd/bjGZfjDFl9mncP2M1v+/MdL1ellr5rQ7rqliO0cWyl+6SSn/LNvpYdpIk9iYRmxG2mLZ8YR3KIltPNpp27DPxmDo0Hejf1w9g0Av2Yaf+bBqvfAtE8F8GdBKRFOxB/yrgGvcEItIPeAsYY4yp3hk/1WB/tn143mtzt/sV/LPzCnj8q3W0T4jglweG+0y3aHsG176zlLcmnMroHvbhZyt2Z/Hlyn08f1HPEmnLatz4ZfNhLB6/B6sce7R6rvtSXpDdcbh4vPnKPdm8MGsTj4/tVuY17u5xfHmcKLCW6FzLyD3Jyz9t5WPHRKKnzve/OWnupkPc9MFyXrmyD+d2T/Tah+CL53DNz2873Wfa/60KzB65dUULjtDLspNell2kyEE6y166WYpbaNNMPKtsHZluG
80Gk8w6Wwq5Ndg5W1kLHx6B1WYIDw1isOOLYFjnBH6tkVVYG44qB39jTJGI3AnMxj7U8z1jzAYRmQosN8bMBP4GRAKfO2pee4wx46uat78Wbkvnn3O38emtp1dpyVabMeXWHu3p7H8fKWO1RYC1jsfXlbuPMLpHIjab4dL/szdjTB3fw+dTrTP7/Y4x4xsPlFzHfX/2CS5+Y7HXa8vbB9ZzLZi3F+z0Gvxv/mA5P286xI/3DiE6PARjoLXbU86fPlxBaFDx5+Q+4Qm8rzM/d9MhTu9QeijgjnT7F9J9M9YA3jcB98V9yOXy3UdYvCOzjNT1k2CjjWTQQ3bT3ZJKL9lFT0sqLcTebGczQpqJJ9Uk8m3hGawzKWyytSOjhpZMDiiBtrGlv6DemzjQr3WMVLGAtPkbY2YBszyOPe3288hA5FNZzg7MmWv2ERsRxrTZW/jqjjN8bpjtS2pmHrf8Zzm3DGnPgOTYCn2R5BdamTR9GZMv6EGXxCiMMXz4e8nRH1+VUfN0/8KZs6HsIYNpZQT4mWvK7sz09gv00o+bmbPxEB/fUtyP7xyTPnv9IdfCYO6d3d7WfC/PTR8s56HRXUod97buTWV5rmdfvxjiOUZny166iP1PN8seOsk+mjra6K1G2GaSWGjrxTpbCuttyWw0yeTRMNu728SEsy/7BEEWISgAQ54bk0Y1w/fLFfvYeiiHwzknycgtKDXZY/vhHJqEBJHU3Pej78+bDvPzpsPcfFYKX6xM49eHRtCsSfkf40pHrXPKzA18cutprNqb7WqvdobbI3nFTwqeIdjZ+XiiwEpmGU8UWccL+HxF+Vvv+XLoWOnumDfm28fEe5vi76yVg32N+qrytpn2tDlbq3xfp6qODqopzThOJ0mjiyWNzrKXLpJGZ8te4qR4klimiWKLrS2f2M5mu2nNRls7tpi25FM9G8HXRd/ceSZ7fCxlrsrWqIK/zRhXbd3qpYY78mX7yBpnDdZqszfzeAtI7/xmX2b3yreW8P3dQ7zm556D8ynDGcRz3UbZeOtY3X44lzfmb3e9dk5YKbKVXQt+4PM1tI6puVqe+5PEou1Vb1KZU80ToTyXXKhtkeTRy7KLDrKf9nKADrKfTpY0WkvxzNUcE85Wk8Rs6wC2mrZsMW3ZakuqsZ2u6hSPX5X4yDDiI31/2TVvGlLNBaq/GmTwtzka3S0ezTI2Y1w1P5vNcPuHK0iJj+DhMSVXFVyWmkV8ZBg3fbCMnenHOadrC595HTqWX2o2pzPX7LxCXvpxMw+P6YqzKJnHC/ho6W7auj1dHDiaz3+WpJZo2jnv1dIbZ/yxK6vclR6zjhfQwsfoIQV//nhljeYXSR5tJIM2kkFryaSlHKGVZJEk6Zwih0oE+TwTxk7Tit9t3dlqS2KzactWW1v2E0fZXf7Kmw3PjG5U2zJWVIMM/qf9ZS42A8ufLNnV8PvOLFdNwGaMa631TQeOkRJfvPqj59jxuWUsSXwkr7BEO/mnf+xxjdwBe5NJcnwE7R1bEu7KOM4TX63nabfRLt+tPcB3aw/wp2Hty3xfv/m5G1egNhVX/jC0IJsUOUg7y0G6yl6SJN0V8GOk5AYrViMcojn7TDxLbD3YaWvFepPCVlsSB2lep4ZV1ncRYQ0yvAVMg/x0Djum/L+zcGepzs8jjsXClu4qrnHN25LOvC2VHyb2wGfFI1Ae/d+6EsEf8LrOfIGXCURWa9mjFf45d1slS6gqwzmKpiVHSJCjtJAjtJBsWpBNC8mmpWTRTg67OlsBjpsw9poW7DPxrLB1Zp+JL/Eng2jdtETVCQ0y+Ds513/xJlAbfwB8v67kLNGL31hU7jXeRqXNWFb5jlpVOeHkkyQZtJXDJEk6yXKIDrKfJEknSTIIk5LzIoqMhQyiOWxi2GfiWWzryS6TSKpJJNW0ZL+J1+DupyfGduP5Wd5/R1NfHFdiJraLtuIETIMO/rUl1Y+NOIyXUfy+VpBUlRdOPq0lk0TJoq2k0
04OkSTptJV0kiSdeCk5R+K4o919s2nLz7b+7DKtOGDiOGxiSDcxZBGlwT0A/nl1P8b3ae0K/neO6Mi/5m33mnZwSiw70o/bZ7brUP6A0eBfS8qbbKXKF0YByXKQNpJBK8kijmMkShatHMG+tWQSLSW/iAtMEPtMPHtNC+bYTiXNtGCvSSDNJLDXtCCDZmj1MjB+um8oo15Z4Hp9+/AOdGoRyf2frWGQx65oD47uwoOju7Bx/zEW7yjZt1VPRufWOxr8a0lZ+6UqAEMsObSSTJIknVaS5ai9H6a9HCResokjB4uUrApmmigOmDjSTAv+sHXloIljv4nlgIljn4nnAHH1vuY+oF1zlu8O/LpDF/drw5UD23LV2/b5Ghf2bc03q0tOCuyQEMGO9OJO7ISoMK/Laq9/ZjSRHh2u43q1omebaC7pX3KJlG6tiheK6966Gd1b21/fMiSFfy/c1eA3LaotGvxVrWhKPi3lCC3IpqUcIUGOkCQZrmaZJMko0ZEKcNKEsN/EstO0ZqWtE4dNDDtNK9JMAgdMLJlEU1hD/6VvOiuFNjHh9Gkb7VqSo6qGdIpn4bbyR3RNnzSo3O0a+58Sw8o9FVuVVSgZiHu1iS4R/P91TT9GdGnBOwt3uWZ1//H4OaQ8NsvzVoR6mT3vbQetb/58JslxEV7Lc3bXlvx7oX0+jdb+A0+DvwogQzPySJQsmnGcOMmhpWSRKEdoKVkkSQYJZBMvR2kmpZu9ck0T9piW7DKtWGDrzT4Tz34TT5qJ54CJI4soarpJpmtilNehs86F6cpb9vmvl/Zybf1YnutPT/Yr+PvzCdwypD23f1TxOQ3R4SFsfe48Zizbw3WntaPfKc0ptNp4+Iu1nNUxnoiwYO4Z2YnrTjuF4yetPte5Cgkqfdzb0iFlbX7krV9MBY4Gf1WuJpwkSdJpLZm0crSlJ5JFc8khWo7TjOPESg7NySFUSs+gLTRBHKI5B0wsG0wyGbZoDpnmHDSxHCaGQ6Y56SaGYzSlrrW3/3jvUO+jThzKWy4iEFtslrqnl1uO692qxN4E/oTNiNAgjnuZ8RwabGGCY3c155LgCx4eUSJNXGQYcZElr7tvZGfXE4G3L4XKNt9orb96aPBv5IIpogXZjo7SLFo7JiclSDaJcoREySKRLILc2tZtRkgnmiwTxTEi2GtasNrWkSNEkWmiOGRiOUoEWY6fM4mqF5OXHj2vKy/+sLlC15Q3gdR9hul3d53F+a/95jOt+zIf8x4czohp88vN/4FRnfn7T1tJ9rJHgrvbh3fg/lGd6fRE8SZ6ngG6Q4tIz8sq5Nah7ckvsvKeY+kTT02Cgyp1X2Mq/8WhfNPg34AFU0RLjtBGMlzBvfhvey0+gaOlOk2Pmaakm2gOmliWmB7stSWwyySy38RxwMRxiOa1sm2fL4+d15W/VDBoe1OZCqaI8N+bBhERFswlHstop744jq9Wpble92wTzUc3D2banC2s8tIe7/6vkBLvvR28spoEB5XaJtPz/Tr3v60sEXhkTFce8VguBaBTi0iidZ2dOqXu/Aarcgk2osgjTnKI5RhxcoxoOU4Mua529ZZyhGiO00zySCC7RI0d7NvyHTSxHDSxbLGdwkFiOeB47fz7KFWrAVbG3ed0Iie/kPcXpVb42jE9EwMS/AFaRIW5Zoh7mnhGMtMXp5Y6PqRTAgBz7huKRYQ7PlrhGtHiWWM9s2N8iQX7ynLH8A60ignnqa/Xlzjufk9vTSLegm+wow3+2Yt6uu7XolkYOen2uSXNmgRXeR0cb01gp7ePY8nOTLqUs+VoWUS06ac6aPCvRVf2TeDX1ZuJlRzi5BixHLO3nUsO8Rwj3rGkQAzHiZI8ojlOsHjvYDxuwjhoYjlkmrONNuTamnKIGPY5RsLsM/EcMs2rbaemh8d04aUft5Sb7pNbTqNTy8hSm7sMSo5ld9ZxH1eVLZDt6n8e0ZHJMzeUOOYchTMwOdZr8Hfq3NIe4ObcV7xFt
TNQX9yvjeuYr34C51Hn9oTOBQedwfr1a/qz5VCOa82aUd1blrj+1av60rNNNB0SIpnlmHXeqUUku7PyuKB36xJ5NAmx8NHNp/HJH3t4de62gDSreHtbVw1qy5KdDW8DnYZAg3/AGKI44bb+yxFayhFiJYcYcomR48RIruNn+99NNhfibY8NmxGyiCLD2DtGd5PIUVsEx2jKERNFpmlGFlFkmSiyieKoiSCHcKqzs/SXB4Zx9t9/9Xm+Z2v/lhc+rX2sz/0IqtLU8e4NA7jpg+WkxEewK6NyXyJQciXY8JCSbdQRYZVrs4aS/zLndG3hdVTP2V1bsOGZ0T5rueN6t2IcrQBY8/S5hIcGcehYPm/9upPLTm3r9fPr2CKSn+4fVur4xf2SSIxuws1DUnh17raA7IJVVud3ebvflUXb+6uHBv9yFQd193Hp9oW9jpAg2bTE/tpzXDrYx6YfIZJsE8lRIkg1iWTbIjinfxcWZFiYt7uATNPMEdCbccREcowILJYgimz+/693dvwB/PbICM7667yAfQJgD8z3juzEsM4JXreI9LfJwFcQEIEzOsT7vO7Jcd0Y1jmBhKgw+k79qdS1bZrbt5AMDbJgkeKtNH2Zfe9QRv9jQYljQRbhrI72Mlw1sC1/crSBe5Z5bK+SC/eVxVsxbjgjmfF929D/2ZLvIzjI4vfucs7287axTVn3zGi/y9PvFPvQSucy5c5/t0CsgFmdLTOJzZqQnnNSl2gOoEYf/JuS71oSoK1jJmmKHCBRslyBPlxK11SPmzAOmeYcpjlrTXsO2Zpz2DiGLTqGLx42MeT6qJGnXjKOVT9u5pOdO7yW67rBp/CfJbu9nvPGfXJOkxD/aqi+xrB7IyLcO7Kzz/Onty+9964v3mpy7coZrXJqu+Z0almy3dg5A7RZkxCOFxSvizS8Swt+cSzD/fo1/emdFE3ziFDXxKgf7hlSog36f3ecwQ/rDnDdae1oEhJUYjvKEuUGn+d8cY3gcfsvICLERoT6fY+YpiFk5xWWn9APPVpHs+3581ydv01Dg3n6/O6cXcaeFf7y9r0+tFMCbWLCuWN45TuTReDdiQNYtD2DuDI2blEV04iCv6ENGfSx7KC3ZSd9ZCfdLLtLrbduM8J+4kgzCawxHThsKw7khykO8McJ95GP/+45pxP/N9978J9yQY8yg/8fj5/DoBdKb6sIxcE1LiK0RBOLr3Hd7lo2C/O6lWN5LBZh6ePnMNhRpo1TR3P7hyv51bGX73k9E107ojnbtJ02Pzum3C8sb08Mj57XjduHdyS6aQj7jxZPGnMfMtmxRWSpDb/dvyjB/jk9Ma47viQ2sweciNDK/7pUpV9i7v3DyMj1vXWnN2U1lXiO+pl0VkplilWKt3+j5hGhLHr07Crfu0VUEy7ul1R+QuW3Bh/8m5LPGMsf3BL8Pd0s9iWTT5pgNplTmGUdzB7TgkOmOQeIY68toUaHMTprmd4mEXnuQuapWbjvYXPOmZGev4sfTBrEvxfuJNhi4ZL+bfjb7NIdtHeO6MhT3xR3eH5x2+m08zH93lPLZsUdGE1Dg7l8QJIr+P/fdae6zkWHh/DhTYO57t2lQMknlb9f3ocHPi/eH8HJfZ/k/7u2v2vTbmcN2lXBLuNj69QikutOa1finsfyi8qdqDVlfA8GpcQxMLl5mem8ccbgsrK4dWh7drrthezJPqGqcjVeHSWjfGlwwf9kkb1m25oMJgX/wFVB84iUfLbZ2vBM4QSW27qw2ZxSY2vAVMb1p7crN02ZActV6yuZZkByLAPcVlN0D/7v3ziQG99fxukd4ujVJpp1+47yz6v7lUhfUc6APK53q1LnzurkvX3/0lOTMMCDbl8Ab153Ku0Tioefnter9P1K5Ov2s/t3qGfHZ6eWUazYfaTUk4inpqHBXHZqJWudpVt9Snl8bLfK3duf7LWzVPlQdyNgJRVaDTcFfc/DwZ8RQhFf287k46KzWWE61/gs0yfGdqN3U
jRXOlZJ9GXFkyOxGZjy7Qa+X3vAa8B94eJePP5V8RoxocEWPvvT6V739HX+vrsHPm8bWTsDwzvXD2BElxal2rPLmzXqzce3DCYpxn5dV0e7+shuFWtP9gyUY3qW38HaPiGCU2Kb8uS47rzz207X8Y5lzFp9a8Kp/LYto8QTi1KNRYML/kFZ23kq5CPmW/vwZNEk0kxCrZXllqHtsdkMImXXwDwf6b3VEq8ZfAovzNpErtuGL4NSir8k4iKLOxCbNw0lLNjC446dktJzTtL/FN9NFkmx3vsvKlNrdB+x06llFBunjqapj5wW+8UAAAvdSURBVLby5LimXoOze7aewy19aRIS5Fp/5t8L7cH//RsHljnEMD4yjIvcxt9XB19NcNXttPaxRIUFu0YsKeWpwQV/E9uBawseY70tpdpmqi557Gwmf7OBQSmxZW4VCfa2+11/GcfJIitdnvyxzLSXn5rE92sP0Nex0mHftjGs3uvfsrz93IJ7aLCFLc+dB9g3n/92zX7G923t130CzVfgB5j/0Aif55wqM/7c1c5e4SsDr3iwT82WJi4yrEJDQKuiRVRYuaubqrqn7q+2VUEGYZGtV0ACf4eE4o7Oh0Z3cf3cKjqct68fwM1D2vt9rzA/FrUa7mh6cY5O+eDGQRUorXfOp4NArhXT3zFWvKLDHiujrI5tX4w2dNeoP54Yyaqnz63tYqgKang1/2q6b4/WzcpP5PDwmC5c2LfqzQmetV5nUPv8ttP9vsd1g09hRJcEkpr7335/Sf82rNt3lNYx3puD/nvTYDIrOPSwIqIdAT8iNIgvKvBenYZ1TmDhtgy/RylVJ39G+yhVGwJS8xeRMSKyRUS2i8ijXs6HicgMx/mlIpIciHy9qWit75cHSk999ybY4v9HdcfwjrTxEjjXTalY7cj5TmI8Oms9x6mXRUR8Bv6xjlEz8R59DhPPSGbb8+eREOV9eGFEWDCnODqDh3SK57Wr+/ldHn+M7NaCl6/ow8qnR1UqgN90VgrLnxwZ8JUxK8OfIaiqbPogVz2qHPxFJAh4HTgP6A5cLSKeM2ZuAo4YYzoCrwB/rWq+vlT0/0n7hEif0/WdnYXj+7TmjA5lz2B99sIe5eYV1aRiTRhhjiGIg1NKjv5x/4Jb+PAIfn1oeIXu63TX2R1ZO+XcUsFfREpNBPLlvzcN5oI+ge1PEBEu6Z/kV1OZr+s931Nt0d2oVF0ViJr/IGC7MWanMaYA+BS40CPNhcAHjp+/AM6Rqqz0VAbPWkJ0GW3Go3vYV0V01tJvG9aB7m61ameQvevsjj4nXQU7jjt3PirP/aM68787zvArbURYMLPvHco/rvRds24b27TSzRsWi9Csgl9IqmLG9mxF91bNqrxWfmOmT03VIxDBvw2w1+11muOY1zTGmCLgKFCqKi0it4rIchFZnp6eXrnSeAT/Vl42jfaSL2AfCz/rniEVym7uA8P49/UDAPviYx9MKruT9u5zOpU57NJTl8QowkPtNeCrB50C+O48vqiWRvQo35pHhDLrniF1ov9BKXd1qsPXGPM28DbAgAEDKvW87PmY7fmA0T4hgp3p3pf8rUyG7eIiXL/YFRn9UxmPj+3GQ2O6+JyR+sqVfXnlyr7VWgalakqkY6XR1tFVX0dLlRaImv8+oK3b6yTHMa9pRCQYiAaqZYeH8jqHUuIi+L9r+wPFY699PVV6fnG8dFnvCs9WDSSLRcpsBxeRKq2brlRd0jsphteu7sezF/Ws7aI0SIEI/suATiKSIiKhwFXATI80M4EbHD9fBvxiqmkwtudN7zq7IwBTL+zBsM4J/O3yPr6vLadEVwxoyzs3DKxiCZVS/rqgT+uA7DWgSqvyp2qMKRKRO4HZQBDwnjFmg4hMBZYbY2YC7wL/FZHtQBb2L4hq4Tk2fmyvVq7JSNc7OmU9Y3yUY8VIXzs16XgNpVRDE5CvVGPMLGCWx7Gn3X7OBy4PRF7lsYiUuQm3O2cLyS1D2xMWH
MQ1jg5V1/nqKKBSStUBDW55h9iIUP54YmSZaTybd8KCg7hlaPtSW+hpjV8p1VA12Ma0tyacSqKPpXqbhdvftr9L+eoTgFKqoWmwwX90D99rwJ/VMZ5Xr+rrM82Ll/Ri66FcFmyzzzXQJwClVEPTYIN/WUSkzIXXrnK0/d80fRnbD+fSpJLLDCilVF3VKIO/v165qi+Lt2e6FjFTSqmGosF1+AZSsyYhfm0hqJRS9Y0Gf6WUaoQ0+CulVCOkwV8ppRohDf5KKdUIafBXSqlGSIO/Uko1Qhr8lVKqEdLgr5RSjZAGf6WUaoQ0+CulVCOkwV8ppRohDf5KKdUIafBXSqlGSIO/Uko1Qhr8lVKqEdLgr5RSjZAGf6WUaoQ0+CulVCOkwV8ppRoh3cBdqVoy5YLuDEyJre1iqEZKg79StWTimSm1XQTViFWp2UdEYkXkJxHZ5vi7uZc0fUVkiYhsEJG1InJlVfJUSilVdVVt838UmGuM6QTMdbz2lAdcb4zpAYwB/iEiMVXMVymlVBVUNfhfyP+3d24hVlZRHP/9ES+RVt4yUUkFIXwIk0GMRMS8lEX1ICEETRcQ6qXoIZSBoLfqISoIRCywqLQsaTCixhR6StO8TZnOZEbZ5EyZVi9dVw97HTsM51PrjH3n7LN+sJm1197zfesPe9bss7/v7A0b3d4I3DG4g5kdNbMet78F+oGJdd43CIIgqIN6k/8kM+tz+ztg0rk6S5oHjAC+KGhfLWmPpD0DAwN1hhYEQRAUcd4HvpK2A1fVaOqorpiZSbJzXGcy8DLQbmZ/1epjZuuB9QBtbW2F1wqCIAjq47zJ38yWFLVJOilpspn1eXLvL+h3GfAO0GFmH/3naIMgCIIhod5ln06g3e124O3BHSSNALYCL5nZljrvFwRBEAwB9Sb/J4ClknqAJV5HUpukDd7nTmAhcI+k/V7m1HnfIAiCoA5k1phL65IGgK/quMQE4PshCqfRyFkbhL5mJ2d9zaDtajM77xuVDZv860XSHjNrKzuOi0HO2iD0NTs568tJW2zsFgRB0IJE8g+CIGhBck7+68sO4CKSszYIfc1Ozvqy0Zbtmn8QBEFQTM4z/yAIgqCASP5BEAQtSHbJX9JNko5I6pVUa4vphkTSi5L6JXVX+Wqel6DEc67xoKS5Vb/T7v17JLXXutf/jaRpknZK+szPdXjI/bnoGyVpt6QDru9x98+QtMt1bPZvuyNppNd7vX161bXWuv+IpOXlKKqNpGGS9kna5vVs9Ek6LumQfwl1j/uyGJ+FmFk2BRhG2jF0Jmn30APA7LLjusDYFwJzge4q31PAGrfXAE+6vQJ4FxAwH9jl/nHAMf851u2xDaBtMjDX7THAUWB2RvoEjHZ7OLDL434dWOX+dcADbj8IrHN7FbDZ7dk+ZkcCM3wsDytbX5XOR4BXgW1ez0YfcByYMMiXxfgsKrnN/OcBvWZ2zMx+AzaRzhxoeMzsQ+DUIHfReQm3k/ZKMksb5V3hG+stB7rM7JSZ/Qh0kQ7QKRUz6zOzT9z+GTgMTCEffWZmv3h1uBcDFgOV/awG66vo3gLcKEnu32Rmv5rZl0AvaUyXjqSpwC3ABq+LjPQVkMX4LCK35D8F+Lqq/o37mpWi8xKKdDa8fl8CuI40O85Gny+J7CftbNtFmtWeNrM/vEt1rGd1ePsZYDwNrA94BngUqGzHPp689BnwvqS9kla7L5vxWYs4wL1JMDv3eQnNgKTRwJvAw2b2U5oMJppdn5n9CcxROqJ0K3BNySENGZJuBfrNbK+kRWXHc5FYYGYnJF0JdEn6vLqx2cdnLXKb+Z8AplXVp7qvWTnpHycrh+FUzkso0tmw+iUNJyX+V8zsLXdno6+CmZ0GdgLXk5YDKhOs6ljP6vD2y4EfaFx9NwC3STpOWkpdDDxLPvowsxP+s5/0z3seGY7PanJL/h8Ds/wthBGkh02dJcdUD0XnJXQCd/tbB/OBM/7x9D1gmaSx/mbCM
veViq/3vgAcNrOnq5py0TfRZ/xIugRYSnqusRNY6d0G66voXgnssPTEsBNY5W/LzABmAbv/HxXFmNlaM5tqZtNJf1M7zOwuMtEn6VJJYyo2aVx1k8n4LKTsJ85DXUhP4o+S1lw7yo7nX8T9GtAH/E5aK7yftE76AdADbAfGeV8Bz7vGQ0Bb1XXuIz1I6wXuLVuXx7SAtKZ6ENjvZUVG+q4F9rm+buAx988kJbde4A1gpPtHeb3X22dWXavDdR8Bbi5bWw2ti/jnbZ8s9LmOA14+reSNXMZnUYntHYIgCFqQ3JZ9giAIggsgkn8QBEELEsk/CIKgBYnkHwRB0IJE8g+CIGhBIvkHQRC0IJH8gyAIWpC/AUQjl60KodC9AAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# ranked predictions plot\n", "pred_frame = local_glm2.predict(practical_sample).cbind(practical_sample)\\\n", " .as_data_frame()[['predict', yhat]]\n", "\n", "pred_frame.columns = ['Surrogate Preds.', 'ML Preds.']\n", "pred_frame.sort_values(by='ML Preds.', inplace=True)\n", "pred_frame.reset_index(inplace=True, drop=True)\n", "_ = pred_frame.plot(title='Ranked Predictions Plot')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Both the R2 and ranked predictions plot show the linear model is a good fit in the practical, approximately local sample. This means the regression coefficients are likely a very accurate representation of the behavior of the nonlinear model in this region." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 7. Generate reason codes using a practical sample" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Create explanations (or 'reason codes') for a row in the local set\n", "Reason codes are generated for the model based on the practical sample, just as they were for the model based on the perturbed sample. Again, the woman's value for `PAY_0` is the most important local contributor to her GBM prediction. As seen in other attempts to explain this prediction, `PAY_0` followed by other payment variables, marital status, and her age play a role in the model decision. 
Also, for better local accuracy and explainability, LIME contributions are scaled so that, for each row, the contributions plus the LIME intercept term sum to the GBM's prediction for that row." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "gbm prediction progress: |████████████████████████████████████████████████| 100%\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "g_pred_ = model.predict(row)['p1'][0, 0]\n", "plot_local_contrib(row, local_glm2, X, g_pred=g_pred_, scale=True) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Shutdown H2O\n", "After using h2o, it's typically best to shut it down. However, before doing so, users should ensure that they have saved any h2o data structures, such as models and H2OFrames, or scoring artifacts, such as POJOs and MOJOs." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Are you sure you want to shutdown the H2O instance running at http://127.0.0.1:54321 (Y/N)? y\n", "H2O session _sid_898f closed.\n" ] } ], "source": [ "# be careful, this can erase your work!\n", "h2o.cluster().shutdown(prompt=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Summary\n", "In this notebook, LIME was used to explain and generate reason codes for a complex GBM classifier. To do so, local linear models were fit to appropriate, representative samples, and linear model coefficients were used to explain the average behavior in the samples and to create reason codes. Reason codes were assessed against domain knowledge and reasonable expectations. A ranked predictions plot was also introduced to compare surrogate linear model predictions to GBM model predictions. 
These techniques should generalize well for many types of business and research problems, enabling you to train a complex machine learning model and analyze, validate, and explain it to your colleagues, bosses, and, potentially, external regulators." ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 2 }