{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Credit Approval Tutorial\n", "This tutorial illustrates the use of several methods in the AI Explainability 360 Toolkit to provide different kinds of explanations suited to different users in the context of a credit approval process enabled by machine learning. We use data from the [FICO Explainable Machine Learning Challenge](https://community.fico.com/s/explainable-machine-learning-challenge) as [described below](#intro). The three types of users (a.k.a. consumers) that we consider are a data scientist, who evaluates the machine learning model before deployment, a loan officer, who makes the final decision based on the model's output, and a bank customer, who wants to understand the reasons for their application result. \n", "\n", "For the [data scientist](#rule-based-models), we present two directly interpretable rule-based models that provide global understanding of their behavior. These models are produced by the [Boolean Rule Column Generation](#BRCG) (BRCG, class `BooleanRuleCG`) and [Logistic Rule Regression](#LogRR) (LogRR, class `LogisticRuleRegression`) algorithms in AIX360. The former yields very simple OR-of-ANDs classification rules while the latter gives weighted combinations of rules that are more accurate and still interpretable.\n", "\n", "For the [loan officer](#prototypes), we demonstrate a different way of explaining machine learning predictions by showing examples, specifically _prototypes_ or representatives in the training data that are similar to a given loan applicant and receive the same class label. We use the ProtoDash method (class `ProtodashExplainer`) to find these prototypes.\n", "\n", "For the [bank customer](#contrastive), we consider the Contrastive Explanations Method (CEM, class `CEMExplainer`) for explaining the predictions of black box models to end users. 
CEM builds upon the popular approach of highlighting features present in the input instance that are responsible for the model's classification. In addition to these, CEM also identifies features that are (minimally) absent in the input instance, but whose presence would have altered the classification.\n", "\n", "The tutorial is organized around these three types of consumers, following an introduction to the dataset.\n", "1. [Introduction to FICO HELOC Dataset](#intro)\n", "2. [Data Scientist: Boolean Rules and Logistic Rule Regression models](#rule-based-models)\n", "3. [Loan Officer: Similar samples as explanations for predictions based on HELOC Dataset](#prototypes)\n", "4. [Customer: Contrastive Explanations for predictions based on HELOC Dataset](#contrastive)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 1. Introduction to FICO HELOC Dataset\n", "\n", "The FICO HELOC dataset contains anonymized information about home equity line of credit (HELOC) applications made by real homeowners. A HELOC is a line of credit typically offered by a US bank as a percentage of home equity (the difference between the current market value of a home and the outstanding balance of all liens, e.g. mortgages). The customers in this dataset have requested a credit line in the range of USD 5,000 - 150,000. The machine learning task we are considering is to use the information about the applicant in their credit report to predict whether they will make timely payments over a two year period. The machine learning prediction can then be used to decide whether the homeowner qualifies for a line of credit and, if so, how much credit should be extended. 
\n", "\n", "The HELOC dataset and more information about it, including instructions to download, can be found [here](https://community.fico.com/s/explainable-machine-learning-challenge?tabset-3158a=2).\n", "\n", "The table below reproduces part of the data dictionary that comes with the HELOC dataset, explaining the predictor variables and target variable. For example, NumSatisfactoryTrades is a predictor variable that counts the applicant's past credit agreements that resulted in on-time payments. The target variable to predict is a binary variable called RiskPerformance. The value “Bad” indicates that an applicant was 90 days past due or worse at least once over a period of 24 months from when the credit account was opened. The value “Good” indicates that they have made their payments without ever being more than 90 days overdue. The relationship between a predictor variable and the target is indicated in the last column of the table. If a predictor variable is monotonically decreasing with respect to probability of bad = 1, then as the value of the variable increases, the probability of the loan application being \"Bad\" decreases, i.e. the applicant looks more \"Good\". For example, ExternalRiskEstimate and NumSatisfactoryTrades are shown as monotonically decreasing. 
Monotonically increasing has the opposite meaning.\n", "\n", "\n", "|Field | Meaning |Monotonicity Constraint (with respect to probability of bad = 1)|\n", "|------|---------|----------------------------------------------------------------|\n", "|ExternalRiskEstimate |\tConsolidated version of risk markers |Monotonically Decreasing| \n", "|MSinceOldestTradeOpen\t| Months Since Oldest Trade Open | Monotonically Decreasing|\n", "|MSinceMostRecentTradeOpen | Months Since Most Recent Trade Open |Monotonically Decreasing|\n", "|AverageMInFile\t| Average Months in File |Monotonically Decreasing|\n", "|NumSatisfactoryTrades |\tNumber Satisfactory Trades |Monotonically Decreasing|\n", "|NumTrades60Ever2DerogPubRec |\tNumber Trades 60+ Ever |Monotonically Decreasing|\n", "|NumTrades90Ever2DerogPubRec | Number Trades 90+ Ever |Monotonically Decreasing| \n", "|PercentTradesNeverDelq\t| Percent Trades Never Delinquent|Monotonically Decreasing|\n", "|MSinceMostRecentDelq\t| Months Since Most Recent Delinquency|Monotonically Decreasing|\n", "|MaxDelq2PublicRecLast12M |\tMax Delq/Public Records Last 12 Months. See tab \"MaxDelq\" for each category|Values 0-7 are monotonically decreasing|\n", "|MaxDelqEver |\tMax Delinquency Ever. See tab \"MaxDelq\" for each category|Values 2-8 are monotonically decreasing|\n", "|NumTotalTrades\t| Number of Total Trades (total number of credit accounts)|No constraint|\n", "|NumTradesOpeninLast12M\t| Number of Trades Open in Last 12 Months|Monotonically Increasing| \n", "|PercentInstallTrades\t| Percent Installment Trades|No constraint|\n", "|MSinceMostRecentInqexcl7days |\tMonths Since Most Recent Inq excl 7days|Monotonically Decreasing| \n", "|NumInqLast6M\t| Number of Inq Last 6 Months|Monotonically Increasing|\n", "|NumInqLast6Mexcl7days\t| Number of Inq Last 6 Months excl 7days. Excluding the last 7 days removes inquiries that are likely due to price comparison shopping. 
|Monotonically Increasing|\n", "|NetFractionRevolvingBurden\t| Net Fraction Revolving Burden. This is revolving balance divided by credit limit |Monotonically Increasing|\n", "|NetFractionInstallBurden\t| Net Fraction Installment Burden. This is installment balance divided by original loan amount |Monotonically Increasing| \n", "|NumRevolvingTradesWBalance\t| Number Revolving Trades with Balance |No constraint|\n", "|NumInstallTradesWBalance\t| Number Installment Trades with Balance |No constraint|\n", "|NumBank2NatlTradesWHighUtilization\t| Number Bank/Natl Trades w high utilization ratio |Monotonically Increasing|\n", "|PercentTradesWBalance\t| Percent Trades with Balance |No constraint|\n", "|RiskPerformance\t| Paid as negotiated flag (12-36 Months). String of Good and Bad | Target |\n", "\n", "\n", "#### Storing HELOC dataset to run this notebook\n", "- In this notebook, we assume that the HELOC dataset is saved as `./aix360/data/heloc_data/heloc_dataset.csv` before running a pip install of the aix360 library, where \".\" is the root directory of the Git repository. \n", "- If the data is downloaded after installation, please place the file in the corresponding folder under site-packages of your virtual environment: `path-to-your-virtual-env/lib/python3.6/site-packages/aix360/data/heloc_data/heloc_dataset.csv`\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 2. Data scientist: Boolean Rule and Logistic Rule Regression models\n", "In evaluating a machine learning model for deployment, a data scientist would ideally like to understand the behavior of the model as a whole, not just in specific instances (e.g. specific loan applicants). This is especially true in regulated industries such as banking where higher standards of explainability may be required. 
For example, the data scientist may have to present the model to: 1) technical and business managers for review before deployment, 2) a lending expert to compare the model to the expert's knowledge, or 3) a regulator to check for compliance. Furthermore, it is common for a model to be deployed in a different geography than the one it was trained on. A global view of the model may uncover problems with overfitting and poor generalization to other geographies before deployment.\n", "\n", "Directly interpretable models can provide such global understanding because they have a sufficiently simple form for their workings to be transparent. Below we present two directly interpretable models in the form of a [Boolean rule (BR)](#BRCG) and a [logistic rule regression (LogRR)](#LogRR) model. The former is produced by the Boolean Rule Column Generation (BRCG) algorithm while the latter is a generalized linear rule model (GLRM), both implemented in AIX360. While both models are interpretable, they provide different trade-offs between model simplicity and accuracy in predicting loan repayment. BRCG yields a very simple set of rules that has reasonable accuracy. LogRR achieves higher accuracy, higher even than some uninterpretable models, while retaining the form of a linear model. Its interpretation is enhanced by [plots as demonstrated below](#visualize)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.1. Load and process data for BRCG and LogRR\n", "We use the `HELOCDataset` class in AIX360 to load the FICO HELOC data as a DataFrame. The setting `custom_preprocessing=nan_preprocessing` converts special values in the data (coded as negative integers) to `np.nan`, which can be handled properly by BRCG and LogRR, as opposed to replacing them with zeros or mean values. The data is then split into training and test sets using a fixed random seed." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 8960 8403 1949 4886 4998\n", "ExternalRiskEstimate 64.0 57.0 59.0 65.0 65.0\n", "MSinceOldestTradeOpen 175.0 47.0 168.0 228.0 117.0\n", "MSinceMostRecentTradeOpen 6.0 9.0 3.0 5.0 7.0\n", "AverageMInFile 97.0 35.0 38.0 69.0 48.0\n", "NumSatisfactoryTrades 29.0 5.0 21.0 24.0 7.0\n", "NumTrades60Ever2DerogPubRec 9.0 1.0 0.0 3.0 1.0\n", "NumTrades90Ever2DerogPubRec 9.0 0.0 0.0 2.0 1.0\n", "PercentTradesNeverDelq 63.0 50.0 100.0 85.0 78.0\n", "MSinceMostRecentDelq 2.0 16.0 NaN 3.0 36.0\n", "MaxDelq2PublicRecLast12M 4.0 6.0 7.0 0.0 6.0\n", "MaxDelqEver 4.0 5.0 8.0 2.0 4.0\n", "NumTotalTrades 41.0 10.0 21.0 27.0 9.0\n", "NumTradesOpeninLast12M 1.0 1.0 12.0 1.0 2.0\n", "PercentInstallTrades 63.0 30.0 38.0 31.0 56.0\n", "MSinceMostRecentInqexcl7days 0.0 0.0 0.0 7.0 7.0\n", "NumInqLast6M 1.0 2.0 1.0 0.0 0.0\n", "NumInqLast6Mexcl7days 1.0 2.0 1.0 0.0 0.0\n", "NetFractionRevolvingBurden 16.0 66.0 85.0 13.0 54.0\n", "NetFractionInstallBurden 94.0 70.0 90.0 66.0 69.0\n", "NumRevolvingTradesWBalance 1.0 2.0 10.0 3.0 2.0\n", "NumInstallTradesWBalance 1.0 2.0 5.0 2.0 3.0\n", "NumBank2NatlTradesWHighUtilization NaN 0.0 4.0 0.0 1.0\n", "PercentTradesWBalance 50.0 57.0 94.0 46.0 83.0" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Load FICO HELOC data with special values converted to np.nan\n", "from aix360.datasets.heloc_dataset import HELOCDataset, nan_preprocessing\n", "data = HELOCDataset(custom_preprocessing=nan_preprocessing).data()\n", "# Separate target variable\n", "y = data.pop('RiskPerformance')\n", "\n", "# Split data into training and test sets using fixed random seed\n", "from sklearn.model_selection import train_test_split\n", "dfTrain, dfTest, yTrain, yTest = train_test_split(data, y, random_state=0, stratify=y)\n", "dfTrain.head().transpose()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "BRCG and LogRR require non-binary features to be binarized using the provided 
`FeatureBinarizer` class. We use the default of nine quantile thresholds (i.e. 10 bins) to binarize ordinal (including continuous-valued) features, include all negations (e.g. '>' comparisons as well as '<='), and also return standardized versions of the original unbinarized ordinal features, which are used by LogRR but not BRCG. Below is the result of binarizing the first 'ExternalRiskEstimate' feature. " ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Applications/anaconda3/lib/python3.7/site-packages/aix360/algorithms/rbm/features.py:154: RuntimeWarning: invalid value encountered in less_equal\n", " Anew = (data[c].values[:, np.newaxis] <= thresh[c]).astype(int)\n", "/Applications/anaconda3/lib/python3.7/site-packages/aix360/algorithms/rbm/features.py:154: RuntimeWarning: invalid value encountered in less_equal\n", " Anew = (data[c].values[:, np.newaxis] <= thresh[c]).astype(int)\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "operation <= > \\\n", "value 59.0 63.0 66.0 69.0 72.0 75.0 78.0 82.0 86.0 59.0 63.0 66.0 69.0 \n", "8960 0 0 1 1 1 1 1 1 1 1 1 0 0 \n", "8403 1 1 1 1 1 1 1 1 1 0 0 0 0 \n", "1949 1 1 1 1 1 1 1 1 1 0 0 0 0 \n", "4886 0 0 1 1 1 1 1 1 1 1 1 0 0 \n", "4998 0 0 1 1 1 1 1 1 1 1 1 0 0 \n", "\n", "operation == != \n", "value 72.0 75.0 78.0 82.0 86.0 NaN NaN \n", "8960 0 0 0 0 0 0 1 \n", "8403 0 0 0 0 0 0 1 \n", "1949 0 0 0 0 0 0 1 \n", "4886 0 0 0 0 0 0 1 \n", "4998 0 0 0 0 0 0 1 " ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Binarize data and also return standardized ordinal features\n", "from aix360.algorithms.rbm import FeatureBinarizer\n", "fb = FeatureBinarizer(negations=True, returnOrd=True)\n", "dfTrain, dfTrainStd = fb.fit_transform(dfTrain)\n", "dfTest, dfTestStd = fb.transform(dfTest)\n", "dfTrain['ExternalRiskEstimate'].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### 2.2. Run Boolean Rule Column Generation (BRCG)\n", "First we consider BRCG, which is designed to produce a very simple OR-of-ANDs rule (known more formally as disjunctive normal form, DNF) or alternatively an AND-of-ORs rule (conjunctive normal form, CNF) to predict whether an applicant will repay the loan on time (Y = 1). For a binary classification problem such as we have here, a DNF rule is equivalent to a *rule set*, where AND clauses in the DNF correspond to individual rules in the rule set. Furthermore, it can be shown that a CNF rule for Y = 1 is equivalent to a DNF rule for Y = 0 [[1]](https://ieeexplore.ieee.org/document/7738856). BRCG is distinguished by its use of the optimization technique of column generation to search the space of possible clauses, which is exponential in size. To learn more about column generation, please see our NeurIPS paper [[2]](http://papers.nips.cc/paper/7716-boolean-decision-rules-via-column-generation). 
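
The equivalence between a CNF rule for Y = 1 and a DNF rule for Y = 0 follows from De Morgan's laws. The sketch below checks it numerically on random binary features. The two-clause rule, the array `X`, and the helper names `cnf_y1` and `dnf_y0` are illustrative assumptions, not rules learned by BRCG.

```python
import numpy as np

# Hypothetical binary features: rows are applicants, columns are conditions a, b, c, d
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 4)).astype(bool)
a, b, c, d = X.T

def cnf_y1(a, b, c, d):
    # AND-of-ORs rule for Y = 1: (a OR b) AND (c OR d)
    return (a | b) & (c | d)

def dnf_y0(a, b, c, d):
    # OR-of-ANDs rule for Y = 0 built from the negated conditions:
    # (NOT a AND NOT b) OR (NOT c AND NOT d)
    return (~a & ~b) | (~c & ~d)

# Predicting Y = 1 with the CNF is the same as predicting Y = 0 with the DNF
y1_from_cnf = cnf_y1(a, b, c, d)
y0_from_dnf = dnf_y0(a, b, c, d)
assert np.array_equal(y1_from_cnf, ~y0_from_dnf)
```

Because of this equivalence, a CNF rule learned for Y = 1 can be read as a list of AND clauses, any one of which predicts Y = 0.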
\n", "\n", "For this dataset, we find that a CNF rule for Y = 1 (i.e. a DNF for Y = 0, enabled by setting `CNF=True`) is slightly better than a DNF rule for Y = 1. The model complexity parameters `lambda0` and `lambda1` penalize the number of clauses in the rule and the number of conditions in each clause. We use the default values of 1e-3 for `lambda0` and `lambda1` (decreasing them did not increase accuracy here) and leave other parameters at their defaults as well. The model is then trained, evaluated, and printed." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Learning CNF rule with complexity parameters lambda0=0.001, lambda1=0.001\n", "Initial LP solved\n", "Iteration: 1, Objective: 0.2895\n", "Iteration: 2, Objective: 0.2895\n", "Iteration: 3, Objective: 0.2895\n", "Iteration: 4, Objective: 0.2895\n", "Iteration: 5, Objective: 0.2864\n", "Iteration: 6, Objective: 0.2864\n", "Iteration: 7, Objective: 0.2864\n", "Training accuracy: 0.719573146021883\n", "Test accuracy: 0.696515397082658\n", "Predict Y=0 if ANY of the following rules are satisfied, otherwise Y=1:\n", "['ExternalRiskEstimate <= 75.00 AND NumSatisfactoryTrades <= 17.00', 'ExternalRiskEstimate <= 72.00 AND NumSatisfactoryTrades > 17.00']\n" ] } ], "source": [ "# Instantiate BRCG with the default complexity penalties, learning a CNF rule for Y = 1\n", "from aix360.algorithms.rbm import BooleanRuleCG\n", "br = BooleanRuleCG(lambda0=1e-3, lambda1=1e-3, CNF=True)\n", "\n", "# Train and evaluate the model, then print the learned rule\n", "br.fit(dfTrain, yTrain)\n", "from sklearn.metrics import accuracy_score\n", "print('Training accuracy:', accuracy_score(yTrain, br.predict(dfTrain)))\n", "print('Test accuracy:', accuracy_score(yTest, br.predict(dfTest)))\n", "print('Predict Y=0 if ANY of the following rules are satisfied, otherwise Y=1:')\n", "print(br.explain()['rules'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The returned 
DNF rule for Y = 0 is indeed very simple with only two clauses, each involving the same two features. It is interesting to see that such a rule can already achieve 69.7% accuracy. 'ExternalRiskEstimate' is a consolidated version of some risk markers (higher is better), while 'NumSatisfactoryTrades' is the number of satisfactory credit accounts. It makes sense therefore that for applicants with more than 17 satisfactory accounts, the ExternalRiskEstimate threshold dividing good (Y = 1) and bad (Y = 0) credit risk is slightly lower (more lenient) than for applicants with fewer satisfactory accounts.\n", "\n", "We note that AIX360 includes only a heuristic beam search version of BRCG. The published version of BRCG [[2]](http://papers.nips.cc/paper/7716-boolean-decision-rules-via-column-generation) (not implemented in AIX360) uses integer programming to yield slightly more complex rules that are also more accurate (close to 72% test accuracy)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### 2.3. Run Logistic Rule Regression (LogRR)\n", "Next we consider a LogRR model, which can improve accuracy at the cost of a more complex but still interpretable model. Specifically, LogRR fits a logistic regression model using rule-based features, where column generation is again used to generate promising candidates from the space of all possible rules. Here we are also including unbinarized ordinal features (`useOrd=True`) in addition to rules. Similar to BRCG, the complexity parameters `lambda0` and `lambda1` penalize the number of rules included in the model and the number of conditions in each rule. The values for `lambda0` and `lambda1` below strike a good balance between accuracy and model complexity, based on our published experience with the FICO HELOC dataset [[3]](http://proceedings.mlr.press/v97/wei19a.html)."
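, "

The model form described above can be sketched in a few lines: each rule acts as a binary feature, and the predicted probability of Y = 1 is the logistic function of a weighted sum of the rules that fire. The applicant values, the two rules, and all coefficients below are hypothetical and chosen only for illustration; they are not taken from a fitted `LogisticRuleRegression` model.

```python
import numpy as np

# Hypothetical applicant; the feature values are made up for illustration
applicant = {'ExternalRiskEstimate': 78.0, 'AverageMInFile': 48.0}

# Hypothetical rules (binary features) paired with illustrative coefficients
rules = [
    (lambda x: x['ExternalRiskEstimate'] > 75.0, 0.26),
    (lambda x: x['AverageMInFile'] <= 52.0, -0.43),
]
intercept = -0.07

# z is the intercept plus the coefficients of the rules that are satisfied
z = intercept + sum(coef for rule, coef in rules if rule(applicant))

# The logistic function maps z to a predicted probability of Y = 1
p_good = 1.0 / (1.0 + np.exp(-z))
print(round(z, 2), round(float(p_good), 3))
```

In the actual algorithm the rules are not fixed in advance; column generation proposes them, and the complexity penalties decide how many are kept."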
] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training accuracy: 0.742536809401594\n", "Test accuracy: 0.7260940032414911\n", "Probability of Y=1 is predicted as logistic(z) = 1 / (1 + exp(-z))\n", "where z is a linear combination of the following rules/numerical features:\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " rule/numerical feature coefficient\n", "0 (intercept) -0.0684696\n", "1 MSinceMostRecentInqexcl7days > 0.00 0.680258\n", "2 ExternalRiskEstimate 0.654171\n", "3 NetFractionRevolvingBurden -0.554063\n", "4 NumSatisfactoryTrades 0.551644\n", "5 NumInqLast6M -0.463222\n", "6 NumBank2NatlTradesWHighUtilization -0.448346\n", "7 AverageMInFile <= 52.00 -0.434366\n", "8 NumRevolvingTradesWBalance <= 5.00 0.421533\n", "9 MaxDelq2PublicRecLast12M <= 5.00 -0.418156\n", "10 PercentInstallTrades > 50.00 -0.317581\n", "11 NumSatisfactoryTrades <= 12.00 -0.31248\n", "12 MSinceMostRecentDelq <= 21.00 -0.301572\n", "13 PercentTradesNeverDelq <= 95.00 -0.273936\n", "14 ExternalRiskEstimate > 75.00 0.263452\n", "15 AverageMInFile <= 84.00 -0.182134\n", "16 PercentTradesNeverDelq 0.166524\n", "17 AverageMInFile 0.150683\n", "18 PercentInstallTrades > 42.00 -0.148731\n", "19 NumBank2NatlTradesWHighUtilization <= 0.00 0.135388\n", "20 MSinceOldestTradeOpen <= 122.00 -0.132505\n", "21 PercentTradesNeverDelq <= 91.00 -0.117713\n", "22 NumSatisfactoryTrades <= 17.00 -0.110228\n", "23 ExternalRiskEstimate > 72.00 0.107617\n", "24 NumInqLast6M > 0.00 -0.0993614\n", "25 MSinceOldestTradeOpen <= 146.00 -0.0966503\n", "26 PercentInstallTrades <= 42.00 0.0916733\n", "27 MSinceMostRecentInqexcl7days <= 0.00 -0.0900543\n", "28 AverageMInFile <= 61.00 -0.0794703\n", "29 AverageMInFile <= 76.00 -0.072278\n", "30 NetFractionRevolvingBurden <= 39.00 0.0627657\n", "31 MSinceOldestTradeOpen > 122.00 0.060358\n", "32 NetFractionRevolvingBurden <= 50.00 0.0455664\n", "33 MSinceOldestTradeOpen 0.0421272\n", "34 ExternalRiskEstimate > 69.00 0.0354293\n", "35 PercentTradesWBalance <= 73.00 -0.0345454\n", "36 MSinceOldestTradeOpen > 146.00 0.024503" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Instantiate LRR with good complexity penalties and numerical features\n", "from aix360.algorithms.rbm import LogisticRuleRegression\n", "lrr = 
LogisticRuleRegression(lambda0=0.005, lambda1=0.001, useOrd=True)\n", "\n", "# Train, evaluate, and print model\n", "lrr.fit(dfTrain, yTrain, dfTrainStd)\n", "print('Training accuracy:', accuracy_score(yTrain, lrr.predict(dfTrain, dfTrainStd)))\n", "print('Test accuracy:', accuracy_score(yTest, lrr.predict(dfTest, dfTestStd)))\n", "print('Probability of Y=1 is predicted as logistic(z) = 1 / (1 + exp(-z))')\n", "print('where z is a linear combination of the following rules/numerical features:')\n", "lrr.explain()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The test accuracy of LogRR (72.6%) is clearly better than that of BRCG (69.7%) and even better than the neural network in the [Loan Officer](#c2) and [Customer](#contrastive) sections. The LogRR model remains directly interpretable as it is a logistic regression model that uses the 36 rule-based and ordinal features shown above (in addition to an intercept term). Rules are distinguished by having one or more conditions on feature values (e.g. AverageMInFile <= 52.0) while ordinal features are marked by just the feature name without conditions (e.g. ExternalRiskEstimate). Because the model is linear, feature importance is naturally given by the model coefficients, and the list is therefore sorted in order of decreasing coefficient magnitude. The list can be truncated if the user wishes to display fewer features.\n", "\n", "Since the rules in this LogRR model happen to all be single conditions on individual features, the model contains no interactions between features. It is therefore a kind of [generalized additive model (GAM)](https://en.wikipedia.org/wiki/Generalized_additive_model), i.e. a sum of functions of individual features, where these functions are themselves sums of step function components from rules and linear components from unbinarized ordinal features. Thus a better way to visualize the model is by plotting the univariate functions that make up the GAM, as we do next."
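, "

As a rough sketch of one such univariate function, the code below combines the ExternalRiskEstimate coefficients listed in the table above: one linear term on the standardized feature plus one step for each rule. The standardization constants `mean` and `std` and the helper name `external_risk_contribution` are assumptions made for illustration; the fitted `FeatureBinarizer` holds the actual scaling.

```python
import numpy as np

# Coefficients for ExternalRiskEstimate copied from the table above
linear_coef = 0.654171  # unbinarized (standardized) feature
# Rules of the form 'feature > threshold', stored as (threshold, coefficient)
step_rules = [(75.0, 0.263452), (72.0, 0.107617), (69.0, 0.0354293)]

# Assumed standardization constants, for illustration only
mean, std = 72.0, 10.0

def external_risk_contribution(x):
    # Linear component plus one step for every rule whose condition holds
    x = np.asarray(x, dtype=float)
    linear = linear_coef * (x - mean) / std
    steps = sum(coef * (x > threshold) for threshold, coef in step_rules)
    return linear + steps

print(external_risk_contribution([60.0, 70.0, 73.0, 80.0]))
```

The result is a piecewise-linear curve with upward jumps at 69, 72, and 75, the shape one would expect when the steps from rules are stacked on the linear trend."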
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### 2.4. Visualize LogRR model as a Generalized Additive Model (GAM)\n", "We use the `visualize()` method of `LogisticRuleRegression` to plot the functions in the GAM that corresponds to the LogRR model (more generally, `visualize()` plots the GAM part of a LogRR model, excluding higher-degree rules). The plots show the sizes and shapes of the model's dependences on individual features. These can then be compared to a lending expert's knowledge. In the present case, all plots indicate that the model behaves as we would expect with some interesting nuances. \n", "\n", "The 36 features shown above involve only 14 of the original features in the data (not including the intercept), as verified below. For example, ExternalRiskEstimate appears in its unbinarized form in row 2 above and also in 3 rules (rows 14, 23, 34)." ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "15" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfx = lrr.explain()\n", "# Separate 1st-degree rules into (feature, operation, value) to count unique features\n", "dfx2 = dfx['rule/numerical feature'].str.split(' ', expand=True)\n", "dfx2.columns = ['feature','operation','value']\n", "dfx2['feature'].nunique() # includes intercept" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It follows that there are 14 functions to plot, which we organize into semantic groups below to ease interpretation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### ExternalRiskEstimate\n", "As expected from the BRCG Boolean rule above, 'ExternalRiskEstimate' is an important feature positively correlated with good credit risk. The jumps in the plot indicate that applicants with above average 'ExternalRiskEstimate' (the mean is 72) get an additional boost." 
] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lrr.visualize(data, fb, ['ExternalRiskEstimate']);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Credit inquiries\n", "The next two plots illustrate the dependence on the applicant's credit inquiries. The first plot shows a significant penalty for having less than one month since the most recent inquiry ('MSinceMostRecentInqexcl7days' = 0)." ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "scrolled": false }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAkcAAAGwCAYAAACjPMHLAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAAPYQAAD2EBqD+naQAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAgAElEQVR4nOzde1xUdeL/8feADpfk4oWbiqJoKt4oTJbc1JLCLM1sW9ZqVTRcTdOki1Krpt82rC01y5Uyb221mmZZVpqRWhpJoWQpkdcwFbyDQoHC+f3Rz9mZBZXRYcbR1/PxOI/H8DmXec9xHs27c86cMRmGYQgAAACSJA9XBwAAALicUI4AAACsUI4AAACsUI4AAACsUI4AAACsUI4AAACsUI4AAACs1HF1gMtdZWWlDhw4ID8/P5lMJlfHAQAANWAYhk6ePKnGjRvLw8O+Y0GUows4cOCAwsPDXR0DAABchH379qlp06Z2rUM5ugA/Pz9Jv+9cf39/F6cBAAA1UVxcrPDwcMvnuD0oRxdw9lSav78/5QgAADdzMZfEcEE2AACAFcoRAACAFbcrR7Nnz1ZERIS8vb0VGxurrKyscy7bs2dPmUymKtMdd9zhxMQAAMCduFU5WrJkiVJSUjR58mRt3rxZnTt3VkJCgg4dOlTt8suXL9fBgwct0w8//CBPT0/de++9Tk4OAADchVuVo+nTpys5OVlJSUmKiopSenq6fH19NX/+/GqXb9CggUJDQy3TmjVr5Ovre95yVFZWpuLiYpsJAABcPdymHJWXlys7O1vx8fGWMQ8PD8XHxyszM7NG25g3b57+8pe/6JprrjnnMmlpaQoICLBM3OMIAICri9uUoyNHjqiiokIhISE24yEhISooKLjg+llZWfrhhx/04IMPnne51NRUFRUVWaZ9+/ZdUm4AAOBerpr7HM2bN08dO3ZU165dz7ucl5eXvLy8nJQKAABcbtzmyFGjRo3k6empwsJCm/HCwkKFhoaed92SkhItXrxYw4YNq82IAADgCuA25chsNismJkYZGRmWscrKSmVkZCguLu686y5dulRlZWV64IEHajsmAABwc251Wi0lJUWDBw9Wly5d1LVrV82cOVMlJSVKSkqSJA0aNEhNmjRRWlqazXrz5s1T//791bBhQ1fEBgAAbsStylFiYqIOHz6sSZMmqaCgQNHR0Vq1apXlIu38/Hx5eNgeDMvLy9OGDRv06aefuiIyAABwMybDMAxXh7icFRcXKyAgQEVFRQ794VnDMPTr6QqHbQ8AAHflU9fzon4g9nwu5fPbrY4cXSkMw9Cf0jOV/fNxV0cBAMDltk9NkK/58qkkbnNB9pXk19MVFCMAAC5Tl09Nu0p9+/d4+Zo9XR0DAACX8al7eX0OUo5czNfseVkdSgQA4GrHaTUAAAA
rlCMAAAArlCMAAAArlCMAAAArlCMAAAArlCMAAAArlCMAAAArlCMAAAArlCMAAAArlCMAAAArlCMAAAArlCMAAAArlCMAAAArlCMAAAArlCMAAAArlCMAAAArlCMAAAArlCMAAAArlCMAAAArlCMAAAArlCMAAAArlCMAAAArlCMAAAArlCMAAAArlCMAAAArlCMAAAArlCMAAAArlCMAAAArbleOZs+erYiICHl7eys2NlZZWVnnXf7EiRMaNWqUwsLC5OXlpWuvvVYff/yxk9ICAAB3U8fVAeyxZMkSpaSkKD09XbGxsZo5c6YSEhKUl5en4ODgKsuXl5fr1ltvVXBwsJYtW6YmTZro559/VmBgoAvSAwAAd+BW5Wj69OlKTk5WUlKSJCk9PV0fffSR5s+frwkTJlRZfv78+Tp27Ji++uor1a1bV5IUERHhzMgAAMDNuM1ptfLycmVnZys+Pt4y5uHhofj4eGVmZla7zgcffKC4uDiNGjVKISEh6tChg5599llVVFSc83nKyspUXFxsMwEAgKuH25SjI0eOqKKiQiEhITbjISEhKigoqHad3bt3a9myZaqoqNDHH3+siRMn6sUXX9QzzzxzzudJS0tTQECAZQoPD3fo6wAAAJc3tylHF6OyslLBwcF67bXXFBMTo8TERD311FNKT08/5zqpqakqKiqyTPv27XNiYgAA4Gpuc81Ro0aN5OnpqcLCQpvxwsJChYaGVrtOWFiY6tatK09PT8tYu3btVFBQoPLycpnN5irreHl5ycvLy7HhAQCA23CbI0dms1kxMTHKyMiwjFVWViojI0NxcXHVrtOtWzft3LlTlZWVlrGffvpJYWFh1RYjAAAAtylHkpSSkqK5c+dq0aJFys3N1ciRI1VSUmL59tqgQYOUmppqWX7kyJE6duyYxo4dq59++kkfffSRnn32WY0aNcpVLwEAAFzm3Oa0miQlJibq8OHDmjRpkgoKChQdHa1Vq1ZZLtLOz8+Xh8d/+154eLhWr16tcePGqVOnTmrSpInGjh2r8ePHu+olAACAy5zJMAzD1SEuZ8XFxQoICFBRUZH8/f0dss3S8jOKmrRakrR9aoJ8zW7VUQEAuOxdyue3W51WAwAAqG2UIwAAACuUIwAAACuUIwAAACuUIwAAACuUIwAAACuUIwAAACuUIwAAACuUIwAAACuUIwAAACuUIwAAACuUIwAAACuUIwAAACuUIwAAACuUIwAAACuUIwAAACuUIwAAACuUIwAAACuUIwAAACuUIwAAACsOK0fHjx/XG2+84ajNAQAAuITDylF+fr6SkpIctTkAAACXqFPTBYuLi887/+TJk5ccBgAAwNVqXI4CAwNlMpnOOd8wjPPOBwAAcAc1Lkd+fn566qmnFBsbW+38HTt26G9/+5vDggEAALhCjcvR9ddfL0nq0aNHtfMDAwNlGIZjUgEAALhIjS/Ivu++++Tt7X3O+aGhoZo8ebJDQgEAALhKjY8cJScnn3d+SEgI5QgAALg9bgIJAABg5aLK0caNG1VWVlblMQAAgLu7qHJ0++23a//+/VUeAwAAuLuLKkfW30rjG2oAAOBKwjVHAAAAVtyuHM2ePVsRERHy9vZWbGyssrKyzrnswoULZTKZbKbz3Y4AAADArcrRkiVLlJKSosmTJ2vz5s3q3LmzEhISdOjQoXOu4+/vr4MHD1qmn3/+2YmJAQCAu3GrcjR9+nQlJycrKSlJUVFRSk9Pl6+vr+bPn3/OdUwmk0JDQy1TSEiIExMDAAB34zblqLy8XNnZ2YqPj7eMeXh4KD4+XpmZmedc79SpU2revLnCw8N11113adu2bed9nrKyMhUXF9tMAADg6uE25ejIkSOqqKiocuQnJCREBQUF1a7Tpk0bzZ8/XytWrNCbb76pyspK3Xjjjfrll1/O+TxpaWkKCAiwTOHh4Q59HQAA4PJ2UeXo1VdftZQU68eXm7i4OA0aNEjR0dHq0aOHli9frqC
gIL366qvnXCc1NVVFRUWWad++fU5MDAAAXK3Gv61mGIZMJpOk33+E9izrx7WpUaNG8vT0VGFhoc14YWGhQkNDa7SNunXr6rrrrtPOnTvPuYyXl5e8vLwuKSsAAHBfNT5y1K1bt/OWitpmNpsVExOjjIwMy1hlZaUyMjIUFxdXo21UVFTo+++/V1hYWG3FBAAAbq7G5ahp06aKjo7W7NmzazPPeaWkpGju3LlatGiRcnNzNXLkSJWUlCgpKUmSNGjQIKWmplqWnzp1qj799FPt3r1bmzdv1gMPPKCff/5ZDz74oKteAgAAuMzV+LTaO++8o6VLl2r06NF6//33tWDBAjVt2rQ2s1WRmJiow4cPa9KkSSooKFB0dLRWrVplueYpPz9fHh7/7XvHjx9XcnKyCgoKVL9+fcXExOirr75SVFSUU3MDAAD3YTLs/HG0w4cPa9SoUVqzZo3++te/qk4d2341ffp0hwZ0teLiYgUEBKioqEj+/v4O2WZp+RlFTVotSdo+NUG+5hp3VAAAUAOX8vlt96dygwYN1K5dO7333nvasmWLTTk6e8E2AACAu7KrHG3btk2DBg3SsWPH9Omnn+rmm2+urVwAAAAuUeMLsqdNm6aYmBh17txZW7dupRgBAIArUo2PHL300ktaunSp+vbtW5t5AAAAXKrG5eiHH35Qw4YNazMLAACAy9X4tBrFCAAAXA3c5odnAQAAnIFyBAAAYKVG5SglJUUlJSWSpC+++EJnzpyp1VAAAACuUqNy9PLLL+vUqVOSpJtvvlnHjh2r1VAAAACuUqNvq0VERGjWrFm67bbbZBiGMjMzVb9+/WqX7d69u0MDAgAAOFONytE///lPjRgxQmlpaTKZTLr77rurXc5kMqmiosKhAQEAAJypRuWof//+6t+/v06dOiV/f3/l5eUpODi4trMBAAA4nV2/rVavXj2tXbtWLVq0sPnBWQAAgCuF3Q2nR48eqqio0Lvvvqvc3FxJUlRUlO666y55eno6PCAAAIAz2V2Odu7cqTvuuEO//PKL2rRpI0lKS0tTeHi4PvroI0VGRjo8JAAAgLPYfRPIMWPGqGXLltq3b582b96szZs3Kz8/Xy1atNCYMWNqIyMAAIDT2H3kaP369fr666/VoEEDy1jDhg01bdo0devWzaHhAAAAnM3uI0deXl46efJklfFTp07JbDY7JBQAAICr2F2O7rzzTg0fPlybNm2SYRgyDENff/21RowYoX79+tVGRgAAAKexuxzNmjVLkZGRiouLk7e3t7y9vdWtWze1atVKL730Um1kBAAAcBq7rzkKDAzUihUrtHPnTstX+du1a6dWrVo5PBwAAICzXfSdHFu1akUhAgAAVxy7T6sBAABcyShHAAAAVihHAAAAVihHAAAAVuwuR6tWrdKGDRssf8+ePVvR0dG67777dPz4cYeGAwAAcDa7y9Hjjz+u4uJiSdL333+vRx99VH369NGePXuUkpLi8IAAAADOZPdX+ffs2aOoqChJ0rvvvqs777xTzz77rDZv3qw+ffo4PCAAAIAz2X3kyGw2q7S0VJL02Wef6bbbbpMkNWjQwHJECQAAwF3ZfeToj3/8o1JSUtStWzdlZWVpyZIlkqSffvpJTZs2dXhAAAAAZ7L7yNErr7yiOnXqaNmyZZozZ46aNGkiSfrkk0/Uu3dvhwcEAABwJrvLUbNmzbRy5Up99913GjZsmGV8xowZmjVrlkPDVWf27NmKiIiQt7e3YmNjlZWVVaP1Fi9eLJPJpP79+9dyQgAA4M5qdFrNnmuJ/P39LzrMhSxZskQpKSlKT09XbGysZs6cqYSEBOXl5Sk4OPic6+3du1ePPfaYbrrpplrLBgAArgw1KkeBgYEymUw12mBFRcUlBTqf6dOnKzk5WUlJSZKk9PR0ffTRR5o/f74mTJhwzjz333+/pkyZoi+//FInTpw473OUlZWprKzM8jcXmQMAcHWpUTlau3at5fHevXs1YcI
EDRkyRHFxcZKkzMxMLVq0SGlpabWTUlJ5ebmys7OVmppqGfPw8FB8fLwyMzPPud7UqVMVHBysYcOG6csvv7zg86SlpWnKlCkOyQwAANxPjcpRjx49LI+nTp2q6dOna+DAgZaxfv36qWPHjnrttdc0ePBgx6eUdOTIEVVUVCgkJMRmPCQkRD/++GO162zYsEHz5s1TTk5OjZ8nNTXV5maWxcXFCg8Pv7jQAADA7dh9QXZmZqa6dOlSZbxLly41vjjaGU6ePKm//vWvmjt3rho1alTj9by8vOTv728zAQCAq4fd9zkKDw/X3Llz9fzzz9uMv/7667V6hKVRo0by9PRUYWGhzXhhYaFCQ0OrLL9r1y7t3btXffv2tYxVVlZKkurUqaO8vDxFRkbWWl4AAOCe7C5HM2bM0D333KNPPvlEsbGxkqSsrCzt2LFD7777rsMDnmU2mxUTE6OMjAzL1/ErKyuVkZGh0aNHV1m+bdu2+v77723G/v73v+vkyZN66aWXOFUGAACqZXc56tOnj3766SfNmTPHcq1P3759NWLEiFovHCkpKRo8eLC6dOmirl27aubMmSopKbF8e23QoEFq0qSJ0tLS5O3trQ4dOtisHxgYKElVxgEAAM6yuxxJv59ae/bZZx2d5YISExN1+PBhTZo0SQUFBYqOjtaqVassF2nn5+fLw8Puy6gAAAAsTIZhGBdaaOvWrTXeYKdOnS4p0OWmuLhYAQEBKioqctjF2aXlZxQ1abUkafvUBPmaL6qjAgCAc7iUz+8afSpHR0fLZDLJMAybm0Ge7VXWY7V5E0gAAIDaVqNzUHv27NHu3bu1Z88evfvuu2rRooX+9a9/KScnRzk5OfrXv/6lyMjIWr0gGwAAwBlqdOSoefPmlsf33nuvZs2apT59+ljGOnXqpPDwcE2cOJEfdgUAAG7N7quXv//+e7Vo0aLKeIsWLbR9+3aHhAIAAHAVu8tRu3btlJaWpvLycstYeXm50tLS1K5dO4eGAwAAcDa7vyaVnp6uvn37qmnTppZvpm3dulUmk0kffvihwwMCAAA4k93lqGvXrtq9e7feeusty00gExMTdd999+maa65xeEAAAABnuqgb7FxzzTUaPny4o7MAAAC43EWVo127dmnmzJnKzc2VJLVv315jxozhh1wBAIDbs/uC7NWrVysqKkpZWVnq1KmTOnXqpK+//lrt27fXmjVraiMjAACA09h95GjChAkaN26cpk2bVmV8/PjxuvXWWx0WDgAAwNnsPnKUm5urYcOGVRkfOnQo9zkCAABuz+5yFBQUpJycnCrjOTk5Cg4OdkgoAAAAV7H7tFpycrKGDx+u3bt368Ybb5Qkbdy4Uc8995xSUlIcHhAAAMCZ7C5HEydOlJ+fn1588UWlpqZKkho3bqynn35aY8aMcXhAAAAAZ7K7HJlMJo0bN07jxo3TyZMnJUl+fn4ODwYAAOAKdl9zZG327NmqqKhwVBYAAACXu6Ry9Oyzz+rYsWOOygIAAOByl1SODMNwVA4AAIDLwiWVIwAAgCvNRf222lnbt29X48aNHZUFAADA5S6pHIWHhzsqBwAAwGXB7nJUv359mUymKuMmk0ne3t5q1aqVhgwZoqSkJIcEBAAAcCa7y9GkSZP0j3/8Q7fffru6du0qScrKytKqVas0atQo7dmzRyNHjtSZM2eUnJzs8MAAAAC1ye5ytGHDBj3zzDMaMWKEzfirr76qTz/9VO+++646deqkWbNmUY4AAIDbsfvbaqtXr1Z8fHyV8V69emn16tWSpD59+mj37t2Xng4AAMDJ7C5HDRo00Icfflhl/MMPP1SDBg0kSSUlJfykCAAAcEsX9cOzI0eO1Nq1ay3XHH3zzTf6+OOPlZ6eLklas2aNevTo4dikAAAATmB3OUpOTlZUVJReeeUVLV++XJLUpk0brV+/XjfeeKMk6dFHH3VsSgAAACe5qPscdevWTd26dXN0FgAAAJe7qHJUUVGh999
/X7m5uZKk9u3bq1+/fvL09HRoOAAAAGezuxzt3LlTffr00f79+9WmTRtJUlpamsLDw/XRRx8pMjLS4SEBAACcxe5vq40ZM0aRkZHat2+fNm/erM2bNys/P18tWrTQmDFjaiOjjdmzZysiIkLe3t6KjY1VVlbWOZddvny5unTposDAQF1zzTWKjo7Wv//971rPCAAA3JfdR47Wr1+vr7/+2vK1fUlq2LChpk2bVuvXIS1ZskQpKSlKT09XbGysZs6cqYSEBOXl5Sk4OLjK8g0aNNBTTz2ltm3bymw2a+XKlUpKSlJwcLASEhJqNSsAAHBPdh858vLy0smTJ6uMnzp1Smaz2SGhzmX69OlKTk5WUlKSoqKilJ6eLl9fX82fP7/a5Xv27Km7775b7dq1U2RkpMaOHatOnTppw4YNtZoTAAC4L7vL0Z133qnhw4dr06ZNMgxDhmHo66+/1ogRI9SvX7/ayChJKi8vV3Z2ts3duT08PBQfH6/MzMwLrm8YhjIyMpSXl6fu3bufc7mysjIVFxfbTAAA4OphdzmaNWuWIiMjFRcXJ29vb3l7e6tbt25q1aqVXnrppdrIKEk6cuSIKioqFBISYjMeEhKigoKCc65XVFSkevXqyWw264477tDLL7+sW2+99ZzLp6WlKSAgwDKFh4c77DUAAIDLn93XHAUGBmrFihXasWOHfvzxR0lSu3bt1KpVK4eHcwQ/Pz/l5OTo1KlTysjIUEpKilq2bKmePXtWu3xqaqpSUlIsfxcXF1OQAAC4ilzUfY4kqXXr1mrdurUjs5xXo0aN5OnpqcLCQpvxwsJChYaGnnM9Dw8PS3GLjo5Wbm6u0tLSzlmOvLy85OXl5bDcAADAvdSoHFkfSbmQ6dOnX3SY8zGbzYqJiVFGRob69+8vSaqsrFRGRoZGjx5d4+1UVlaqrKysVjICAAD3V6NytGXLlhptzGQyXVKYC0lJSdHgwYPVpUsXde3aVTNnzlRJSYmSkpIkSYMGDVKTJk2UlpYm6ffrh7p06aLIyEiVlZXp448/1r///W/NmTOnVnMCAAD3VaNytHbt2trOUSOJiYk6fPiwJk2apIKCAkVHR2vVqlWWi7Tz8/Pl4fHfa8xLSkr00EMP6ZdffpGPj4/atm2rN998U4mJia56CQAA4DJnMgzDcHWIy1lxcbECAgJUVFQkf39/h2yztPyMoiatliRtn5ogX/NFX/oFAACqcSmf33Z/lR8AAOBKRjkCAACwQjkCAACwQjkCAACwclFXAu/atUszZ85Ubm6uJCkqKkpjx45VZGSkQ8MBAAA4m91HjlavXq2oqChlZWWpU6dO6tSpkzZt2qT27dtrzZo1tZERAADAaew+cjRhwgSNGzdO06ZNqzI+fvz48/6oKwAAwOXO7iNHubm5GjZsWJXxoUOHavv27Q4JBQAA4Cp2l6OgoCDl5ORUGc/JyVFwcLBDQgEAALiK3afVkpOTNXz4cO3evVs33nijJGnjxo167rnn7PqBWgAAgMuR3eVo4sSJ8vPz04svvqjU1FRJUuPGjfX0009rzJgxDg8IAADgTHaXI5PJpHHjxmncuHE6efKkJMnPz8/hwQAAAFzB7muObrnlFp04cULS76XobDEqLi7WLbfc4th0AAAATmZ3OVq3bp3Ky8urjP/222/68ssvHRIKAADAVWp8Wm3r1q2Wx9u3b1dBQYHl74qKCq1atUpNmjRxbDoAAAAnq3E5io6Olslkkslkqvb0mY+Pj15++WWHhgMAAHC2GpejPXv2yDAMtWzZUllZWQoKCrLMM5vNCg4OlqenZ62EBAAAcJYal6PmzZtLkiorK2stDAAAgKvZfUE2AADAlYxyBAAAYIVyBAAAYIVyBAAAYMXunw85q7y8XIcOHapygXazZs0uORQAAICr2F2OduzYoaFDh+qrr76yGTcMQyaTSRUVFQ4LBwAA4Gx2l6MhQ4aoTp06WrlypcLCwmQymWojFwAAgEvYXY5ycnK
UnZ2ttm3b1kYeAAAAl7L7guyoqCgdOXKkNrIAAAC4nN3l6LnnntMTTzyhdevW6ejRoyouLraZAAAA3Jndp9Xi4+MlSb169bIZ54JsAABwJbC7HK1du7Y2cgAAAFwW7C5HPXr0qI0cAAAAl4WLugnkiRMnNG/ePOXm5kqS2rdvr6FDhyogIMCh4QAAAJzN7guyv/32W0VGRmrGjBk6duyYjh07punTpysyMlKbN2+ujYwAAABOY3c5GjdunPr166e9e/dq+fLlWr58ufbs2aM777xTjzzySG1ktDF79mxFRETI29tbsbGxysrKOueyc+fO1U033aT69eurfv36io+PP+/yAAAAF3XkaPz48apT579n5OrUqaMnnnhC3377rUPD/a8lS5YoJSVFkydP1ubNm9W5c2clJCTo0KFD1S6/bt06DRw4UGvXrlVmZqbCw8N12223af/+/bWaEwAAuC+7y5G/v7/y8/OrjO/bt09+fn4OCXUu06dPV3JyspKSkhQVFaX09HT5+vpq/vz51S7/1ltv6aGHHlJ0dLTatm2r119/XZWVlcrIyDjnc5SVlXHvJgAArmJ2l6PExEQNGzZMS5Ys0b59+7Rv3z4tXrxYDz74oAYOHFgbGSVJ5eXlys7OttxnSZI8PDwUHx+vzMzMGm2jtLRUp0+fVoMGDc65TFpamgICAixTeHj4JWcHAADuw+5vq73wwgsymUwaNGiQzpw5I0mqW7euRo4cqWnTpjk84FlHjhxRRUWFQkJCbMZDQkL0448/1mgb48ePV+PGjW0K1v9KTU1VSkqK5e/i4mIKEgAAVxG7y5HZbNZLL72ktLQ07dq1S5IUGRkpX19fh4dzpGnTpmnx4sVat26dvL29z7mcl5eXvLy8nJgMAABcTi7qPkeS5Ovrq44dOzoyy3k1atRInp6eKiwstBkvLCxUaGjoedd94YUXNG3aNH322Wfq1KlTbcYEAABurkblaMCAAVq4cKH8/f01YMCA8y67fPlyhwT7X2azWTExMcrIyFD//v0lyXJx9ejRo8+53vPPP69//OMfWr16tbp06VIr2QAAwJWjRuUoICBAJpNJ0u/fVjv72NlSUlI0ePBgdenSRV27dtXMmTNVUlKipKQkSdKgQYPUpEkTpaWlSZKee+45TZo0SW+//bYiIiJUUFAgSapXr57q1avnktcAAAAubzUqRwsWLLA8XrhwYW1luaDExEQdPnxYkyZNUkFBgaKjo7Vq1SrLRdr5+fny8PjvF/DmzJmj8vJy/elPf7LZzuTJk/X00087MzoAAHATJsMwDHtWuOWWW7R8+XIFBgbajBcXF6t///76/PPPHRrQ1YqLixUQEKCioiL5+/s7ZJul5WcUNWm1JGn71AT5mi/60i8AAFCNS/n8tvs+R+vWrVN5eXmV8d9++01ffvmlvZsDAAC4rNT4kMXWrVstj7dv3265fkeSKioqtGrVKjVp0sSx6QAAAJysxuUoOjpaJpNJJpNJt9xyS5X5Pj4+evnllx0aDgAAwNlqXI727NkjwzDUsmVLZWVlKSgoyDLPbDYrODhYnp6etRISAADAWWpcjpo3by7p93sLAQAAXKns/prUG2+8cd75gwYNuugwAAAArmZ3ORo7dqzN36dPn1ZpaanMZrN8fX0pRwAAwK3Z/VX+48eP20ynThCI8kkAACAASURBVJ1SXl6e/vjHP+o///lPbWQEAABwGrvLUXVat26tadOmVTmqBAAA4G4cUo4kqU6dOjpw4ICjNgcAAOASdl9z9MEHH9j8bRiGDh48qFdeeUXdunVzWDAAAABXsLsc9e/f3+Zvk8mkoKAg3XLLLXrxxRcdFgwAAMAV7C5H3OcIAABcyS7pmiPDMGQYhqOyAAAAuNxFlaN58+apQ4cO8vb2lre3tzp06KDXX3/d0dkAAACczu7TapMmTdL06dP18MMPKy4uTpKUmZmpcePGKT8/X1OnTnV4SAAAAGexuxzNmTNHc+fO1cC
BAy1j/fr1U6dOnfTwww9TjgAAgFuz+7Ta6dOn1aVLlyrjMTExOnPmjENCAQAAuIrd5eivf/2r5syZU2X8tdde0/333++QUAAAAK5So9NqKSkplscmk0mvv/66Pv30U/3hD3+QJG3atEn5+fn86CwAAHB7NSpHW7Zssfk7JiZGkrRr1y5JUqNGjdSoUSNt27bNwfEAAACcq0blaO3atbWdAwAA4LLgsB+eBQAAuBLU6MjRgAEDtHDhQvn7+2vAgAHnXXb58uUOCQYAAOAKNSpHAQEBMplMlscAAABXqhqVowULFkj6/bfUpkyZoqCgIPn4+NRqMAAAAFew65ojwzDUqlUr/fLLL7WVBwAAwKXsKkceHh5q3bq1jh49Wlt5AAAAXMrub6tNmzZNjz/+uH744YfayAMAAOBSdv/w7KBBg1RaWqrOnTvLbDZXufbo2LFjDgsHAADgbHaXoxkzZli+uQYAAHClsbscDRkypBZiAAAAXB7svubI09NThw4dqjJ+9OhReXp6OiTU+cyePVsRERHy9vZWbGyssrKyzrnstm3bdM899ygiIkImk0kzZ86s9XwAAMC92V2ODMOodrysrExms/mSA53PkiVLlJKSosmTJ2vz5s3q3LmzEhISqi1rklRaWqqWLVtq2rRpCg0NrdVsAADgylDj02qzZs2SJJlMJr3++uuqV6+eZV5FRYW++OILtW3b1vEJrUyfPl3JyclKSkqSJKWnp+ujjz7S/PnzNWHChCrL33DDDbrhhhskqdr5AAAA/6vG5WjGjBmSfj9ylJ6ebnMKzWw2KyIiQunp6Y5P+P+Vl5crOztbqampljEPDw/Fx8crMzPTYc9TVlamsrIyy9/FxcUO2zYAALj81bgc7dmzR5J08803a/ny5apfv36tharOkSNHVFFRoZCQEJvxkJAQ/fjjjw57nrS0NE2ZMsVh2wMAAO7F7muO1q5d6/Ri5EypqakqKiqyTPv27XN1JAAA4ER2f5W/oqJCCxcuVEZGhg4dOqTKykqb+Z9//rnDwllr1KiRPD09VVhYaDNeWFjo0Iutvby85OXl5bDtAQAA92L3kaOxY8dq7NixqqioUIcOHdS5c2ebqbaYzWbFxMQoIyPDMlZZWamMjAzFxcXV2vMCAICri91HjhYvXqx33nlHffr0qY0855WSkqLBgwerS5cu6tq1q2bOnKmSkhLLt9cGDRqkJk2aKC0tTdLvF3Fv377d8nj//v3KyclRvXr11KpVK6fnBwAAlz+7y5HZbHZZsUhMTNThw4c1adIkFRQUKDo6WqtWrbJcpJ2fny8Pj/8eDDtw4ICuu+46y98vvPCCXnjhBfXo0UPr1q1zdnwAAOAGTMa57up4Di+++KJ2796tV1555ar4jbXi4mIFBASoqKhI/v7+DtlmafkZRU1aLUnaPjVBvma7OyoAADiPS/n8tvtTecOGDVq7dq0++eQTtW/fXnXr1rWZv3z5cns3CQAAcNmwuxwFBgbq7rvvro0sAAAALmd3OVqwYEFt5AAAALgsXPTFLocPH1ZeXp4kqU2bNgoKCnJYKAAAAFex+z5HJSUlGjp0qMLCwtS9e3d1795djRs31rBhw1RaWlobGQEAAJzG7nKUkpKi9evX68MPP9SJEyd04sQJrVixQuvXr9ejjz5aGxkBAACcxu7Tau+++66WLVumnj17Wsb69OkjHx8f/fnPf9acOXMcmQ8AAMCp7D5yVFpaarnporXg4GBOqwEAALdndzmKi4vT5MmT9dtvv1nGfv31V02ZMoXfOAMAAG7P7tNqL730khISEtS0aVPLD81+99138vb21urVqx0eEAAAwJnsLkcdOnTQjh079NZbb+nHH3+UJA0cOFD333+/fHx8HB4QAADAmS7qPke+vr5KTk52dBYAAACXs/uao7S0NM2fP7/K+Pz58/Xcc885JBQAAICr2F2OXn31VbVt27bKePv27ZWenu6QUAAAAK5idzkqKChQWFhYlfG
goCAdPHjQIaEAAABcxe5yFB4ero0bN1YZ37hxoxo3buyQUAAAAK5i9wXZycnJeuSRR3T69GndcsstkqSMjAw98cQT/HwIAABwe3aXo8cff1xHjx7VQw89pPLyckmSt7e3xo8fr9TUVIcHBAAAcCa7y5HJZNJzzz2niRMnKjc3Vz4+PmrdurW8vLxqIx8AAIBTXdR9jiSpXr16uuGGGxyZBQAAwOXsviAbAADgSkY5AgAAsEI5AgAAsEI5AgAAsEI5AgAAsEI5AgAAsEI5AgAAsEI5AgAAsEI5AgAAsEI5AgAAsEI5AgAAsEI5AgAAsEI5AgAAsOJ25Wj27NmKiIiQt7e3YmNjlZWVdd7lly5dqrZt28rb21sdO3bUxx9/7KSkAADAHblVOVqyZIlSUlI0efJkbd68WZ07d1ZCQoIOHTpU7fJfffWVBg4cqGHDhmnLli3q37+/+vfvrx9++MHJyQEAgLswGYZhuDpETcXGxuqGG27QK6+8IkmqrKxUeHi4Hn74YU2YMKHK8omJiSopKdHKlSstY3/4wx8UHR2t9PT0Gj1ncXGxAgICVFRUJH9/f4e8jtLyM4qatFqStH1qgnzNdRyyXQAA8LtL+fx2myNH5eXlys7OVnx8vGXMw8ND8fHxyszMrHadzMxMm+UlKSEh4ZzLS1JZWZmKi4ttJgAAcPVwm3J05MgRVVRUKCQkxGY8JCREBQUF1a5TUFBg1/KSlJaWpoCAAMsUHh5+6eEBAIDbcJty5CypqakqKiqyTPv27XN1JAAA4ERuc7FLo0aN5OnpqcLCQpvxwsJChYaGVrtOaGioXctLkpeXl7y8vC49MAAAcEtuc+TIbDYrJiZGGRkZlrHKykplZGQoLi6u2nXi4uJslpekNWvWnHN5AAAAtzlyJEkpKSkaPHiwunTpoq5du2rmzJkqKSlRUlKSJGnQoEFq0qSJ0tLSJEljx45Vjx499OKLL+qOO+7Q4sWL9e233+q1115z5csAAACXMbcqR4mJiTp8+LAmTZqkgoICRUdHa9WqVZaLrvPz8+Xh8d+DYTfeeKPefvtt/f3vf9eTTz6p1q1b6/3331eHDh1c9RIAAMBlzq3uc+QK3OcIAAD3c1Xc5wgAAMAZKEcAAABWKEcAAABWKEcAAABWKEcAAABWKEcAAABWKEcAAABWKEcAAABWKEcAAABWKEcAAABWKEcAAABWKEcAAABWKEcAAABWKEcAAABWKEcAAABWKEcAAABWKEcAAABWKEcAAABWKEcAAABWKEcAAABWKEcAAABWKEcAAABWKEcAAABWKEcAAABWKEcAAABWKEcAAABWKEcAAABWKEcAAABWKEcAAABWKEcAAABWKEcAAABWKEcAAABWKEcAAABW3KYcHTt2TPfff7/8/f0VGBioYcOG6dSpU+dd57XXXlPPnj3l7+8vk8mkEydOOCnt+fnU9dT2qQnaPjVBPnU9XR0HAABYcZtydP/992vbtm1as2aNVq5cqS+++ELDhw8/7zqlpaXq3bu3nnzySSelrBmTySRfcx35muvIZDK5Og4AALBiMgzDcHWIC8nNzVVUVJS++eYbdenSRZK0atUq9enTR7/88osaN2583vXXrVunm2++WcePH1dgYOB5ly0rK1NZWZnl7+LiYoWHh6uoqEj+/v6X/mIAAECtKy4uVkBAwEV9frvFkaPMzEwFBgZaipEkxcfHy8PDQ5s2bXLoc6WlpSkgIMAyhYeHO3T7AADg8uYW5aigoEDBwcE2Y3Xq1FGDBg1UUFDg0OdKTU1VUVGRZdq3b59Dtw8AAC5vLi1HEyZMkMlkOu/0448/OjWTl5eX/P39bSYAAHD1qOPKJ3/00Uc1ZMiQ8y7TsmVLhYaG6tChQzbjZ86c0bFjxxQaGlqLCQEAwNXGpeUoKChIQUFBF1wuLi5OJ06cUHZ2tmJiYiRJn3/+uSorKxUbG1vbMQEAwFXELa45ateunXr37q3k5GRlZWVp48aNGj1
6tP7yl79Yvqm2f/9+tW3bVllZWZb1CgoKlJOTo507d0qSvv/+e+Xk5OjYsWMueR0AAODy5xblSJLeeusttW3bVr169VKfPn30xz/+Ua+99ppl/unTp5WXl6fS0lLLWHp6uq677jolJydLkrp3767rrrtOH3zwgdPzAwAA9+AW9zlypUu5TwIAAHCNK/4+RwAAAM5COQIAALBCOQIAALBCOQIAALDi0vscuYOz16sXFxe7OAkAAKips5/bF/O9M8rRBZw8eVKS+AFaAADc0MmTJxUQEGDXOnyV/wIqKyt14MAB+fn5yWQyOWy7xcXFCg8P1759+7hFgBOx312D/e4a7HfXYL+7xv/ud8MwdPLkSTVu3FgeHvZdRcSRowvw8PBQ06ZNa237/Lita7DfXYP97hrsd9dgv7uG9X6394jRWVyQDQAAYIVyBAAAYMXz6aefftrVIa5Wnp6e6tmzp+rU4eymM7HfXYP97hrsd9dgv7uGo/Y7F2QDAABY4bQaAACAFcoRAACAFcoRAACAFcoRAACAFcqRi8yePVsRERHy9vZWbGyssrKyXB3pivb000/LZDLZTG3btnV1rCvOF198ob59+6px48YymUx6//33beYbhqFJkyYpLCxMPj4+io+P144dO1yU9spxof0+ZMiQKu//3r17uyjtlSEtLU033HCD/Pz8FBwcrP79+ysvL89mmd9++02jRo1Sw4YNVa9ePd1zzz0qLCx0UeIrQ032e8+ePau830eMGGHX81COXGDJkiVKSUnR5MmTtXnzZnXu3FkJCQk6dOiQq6Nd0dq3b6+DBw9apg0bNrg60hWnpKREnTt31uzZs6ud//zzz2vWrFlKT0/Xpk2bdM011yghIUG//fabk5NeWS603yWpd+/eNu////znP05MeOVZv369Ro0apa+//lpr1qzR6dOnddttt6mkpMSyzLhx4/Thhx9q6dKlWr9+vQ4cOKABAwa4MLX7q8l+l6Tk5GSb9/vzzz9v3xMZcLquXbsao0aNsvxdUVFhNG7c2EhLS3Nhqivb5MmTjc6dO7s6xlVFkvHee+9Z/q6srDRCQ0ONf/7zn5axEydOGF5eXsZ//vMfV0S8Iv3vfjcMwxg8eLBx1113uSjR1eHQoUOGJGP9+vWGYfz+3q5bt66xdOlSyzK5ubmGJCMzM9NVMa84/7vfDcMwevToYYwdO/aStsuRIycrLy9Xdna24uPjLWMeHh6Kj49XZmamC5Nd+Xbs2KHGjRurZcuWuv/++5Wfn+/qSFeVPXv2qKCgwOa9HxAQoNjYWN77TrBu3ToFBwerTZs2GjlypI4ePerqSFeUoqIiSVKDBg0kSdnZ2Tp9+rTN+71t27Zq1qwZ73cH+t/9ftZbb72lRo0aqUOHDkpNTVVpaald2+XWnU525MgRVVRUKCQkxGY8JCREP/74o4tSXfliY2O1cOFCtWnTRgcPHtSUKVN000036YcffpCfn5+r410VCgoKJKna9/7ZeagdvXv31oABA9SiRQvt2rVLTz75pG6//XZlZmbK09PT1fHcXmVlpR555BF169ZNHTp0kPT7+91sNiswMNBmWd7vjlPdfpek++67T82bN1fjxo21detWjR8/Xnl5eVq+fHmNt005wlXh9ttvtzzu1KmTYmNj1bx5c73zzjsaNmyYC5MBte8vf/mL5XHHjh3VqVMnRUZGat26derVq5cLk10ZRo0apR9++IHrGJ3sXPt9+PDhlscdO3ZUWFiYevXqpV27dikyMrJG2+a0mpM1atRInp6eVb6xUFhYqNDQUBeluvoEBgbq2muv1c6dO10d5apx9v3Ne9/1WrZsqUaNGvH+d4DRo0dr5cqVWrt2rZo2bWoZDw0NVXl5uU6cOGGzPO93xzjXfq9ObGysJNn1fqccOZnZbFZMTIwyMjIsY5WVlcrIyFBcXJwLk11dTp06pV27diksLMzVUa4aLVq0UGhoqM17v7i4WJs2beK972S//PKLjh4
9yvv/EhiGodGjR+u9997T559/rhYtWtjMj4mJUd26dW3e73l5ecrPz+f9fgkutN+rk5OTI0l2vd85reYCKSkpGjx4sLp06aKuXbtq5syZKikpUVJSkqujXbEee+wx9e3bV82bN9eBAwc0efJkeXp6auDAga6OdkU5deqUzf+d7dmzRzk5OWrQoIGaNWumRx55RM8884xat26tFi1aaOLEiWrcuLH69+/vwtTu73z7vUGDBpoyZYruuecehYaGateuXXriiSfUqlUrJSQkuDC1exs1apTefvttrVixQn5+fpbriAICAuTj46OAgAANGzZMKSkpatCggfz9/fXwww8rLi5Of/jDH1yc3n1daL/v2rVLb7/9tvr06aOGDRtq69atGjdunLp3765OnTrV/Iku7Ut0uFgvv/yy0axZM8NsNhtdu3Y1vv76a1dHuqIlJiYaYWFhhtlsNpo0aWIkJiYaO3fudHWsK87atWsNSVWmwYMHG4bx+9f5J06caISEhBheXl5Gr169jLy8PNeGvgKcb7+XlpYat912mxEUFGTUrVvXaN68uZGcnGwUFBS4OrZbq25/SzIWLFhgWebXX381HnroIaN+/fqGr6+vcffddxsHDx50XegrwIX2e35+vtG9e3ejQYMGhpeXl9GqVSvj8ccfN4qKiux6HtP/fzIAAACIa44AAABsUI4AAACsUI4AAACsUI4AAACsUI4AAACsUI4AAACsUI4AAACsUI4AAACsUI6Ay5TJZNL777/v6hi4SgwZMsQhP+PC+xZXAsoR4CBDhgyRyWTSiBEjqswbNWqUTCaThgwZIkk6fPiwRo4cqWbNmsnLy0uhoaFKSEjQxo0bLescPHhQt99+u7Pia+/evTKZTPL09NT+/ftt5h08eFB16tSRyWTS3r17HfJ85/owNplMlsnf31833HCDVqxY4ZDndKSz++vsj1qe9fTTTys6OtpFqRwnIiLC5t/i7DRq1ChXRwNqHeUIcKDw8HAtXrxYv/76q2Xst99+09tvv61mzZpZxu655x5t2bJFixYt0k8//aQPPvhAPXv21NGjRy3LhIaGysvLy6n5JalJkyZ64403bMYWLVqkJk2aOC3DggULdPDgQX377bfq1q2b/vSnP+n777932vND+uabb3Tw4EHLtGbNGknSvffe6+JkQO2jHAEOdP311ys8PFzLly+3jC1fvlzNmjXTddddJ0k6ceKEvvzySz333HO6+eab1bx5c3Xt2lWpqanq16+fZT3r0xNnj1IsX75cN998s3x9fdW5c2dlZmbaPP/GjRvVs2dP+fr6qn79+kpISNDx48clSZWVlUpLS1OLFi3k4+Ojzp07a9myZVVew+DBg7VgwQKbsQULFmjw4MFVll2/fr26du0qLy8vhYWFacKECTpz5oxl/rJly9SxY0f5+PioYcOGio+PV0lJiZ5++mktWrRIK1assByRWLdunWW9wMBAhYaG6tprr9X//d//6cyZM1q7dq3Nc69YsULXX3+9vL291bJlS02ZMsXmuU+cOKG//e1vCgkJkbe3tzp06KCVK1da5m/YsEE33XSTfHx8FB4erjFjxqikpMQyPyIiQs8++6yGDh0qPz8/NWvWTK+99pplfosWLSRJ1113nUwmk3r27Fll/0j/PUL2wgsvKCwsTA0bNtSoUaN0+vRpyzKHDh1S37595ePjoxYtWuitt95SRESEZs6cafN6HnzwQQUFBcnf31+33HKLvvvuO0mSYRiKj49XQkKCzv5c5rFjx9S0aVNNmjTJso1t27bpzjvvlL+/v/z8/HTTTTdp165d1eYOCgpSaGioZVq5cqUiIyPVo0cPyzI7duxQ9+7d5e3traioKEuBsjZ+/Hhde+218vX1VcuWLTVx4kTLa9+7d688PDz07bff2qwzc+ZMNW/eXJWVlTp+/Ljuv/9+BQUFycfHR61bt67y/gQcjXIEONjQoUNt/uM9f/58JSUlWf6uV6+e6tWrp/f
ff19lZWV2bfupp57SY489ppycHF177bUaOHCgpRDk5OSoV69eioqKUmZmpjZs2KC+ffuqoqJCkpSWlqY33nhD6enp2rZtm8aNG6cHHnhA69evt3mOfv366fjx49qwYYOk30vE8ePH1bdvX5vl9u/frz59+uiGG27Qd999pzlz5mjevHl65plnJP1+Km7gwIEaOnSocnNztW7dOg0YMECGYeixxx7Tn//8Z/Xu3dtyZOLGG2+s8nrPnDmjefPmSZLMZrNl/Msvv9SgQYM0duxYbd++Xa+++qoWLlyof/zjH5J+L4K33367Nm7cqDfffFPbt2/XtGnT5OnpKUnatWuXevfurXvuuUdbt27VkiVLtGHDBo0ePdrm+V988UV16dJFW7Zs0UMPPaSRI0cqLy9PkpSVlSVJ+uyzz3Tw4EGbQvy/1q5dq127dmnt2rVatGiRFi5cqIULF1rmDxkyRPv27dPatWu1bNky/etf/9KhQ4dstnHvvffq0KFD+uSTT5Sdna3rr79evXr10rFjx2QymbRo0SJ98803mjVrliRpxIgRatKkiaUc7d+/X927d5eXl5c+//xzZWdna+jQoTaF8lzKy8v15ptvaujQoTKZTJZ9PGDAAJnNZm3atEnp6ekaP358lXX9/Py0cOFCbd++XS+99JLmzp2rGTNmSPq9gMbHx1dbxocMGSIPDw9NnDhR27dv1yeffKLc3FzNmTNHjRo1umBm4JIYABxi8ODBxl133WUcOnTI8PLyMvbu3Wvs3bvX8Pb2Ng4fPmzcddddxuDBgw3DMIxly5YZ9evXN7y9vY0bb7zRSE1NNb777jub7Uky3nvvPcMwDGPPnj2GJOP111+3zN+2bZshycjNzTUMwzAGDhxodOvWrdpsv/32m+Hr62t89dVXNuPDhg0zBg4caPMcW7ZsMR555BEjKSnJMAzDSEpKMsaNG2ds2bLFkGTs2bPHMAzDePLJJ402bdoYlZWVlu3Nnj3bqFevnlFRUWFkZ2cbkoy9e/eed3/9L0mGt7e3cc011xgeHh6GJCMiIsI4evSoZZlevXoZzz77rM16//73v42wsDDDMAxj9erVhoeHh5GXl1ftcw8bNswYPny4zdiXX35peHh4GL/++qthGIbRvHlz44EHHrDMr6ysNIKDg405c+ZU2V/WJk+ebHTu3NnmdTZv3tw4c+aMZezee+81EhMTDcMwjLy8PEOSkZWVZZmfm5trSDJmzJhhyebv72/89ttvNs8VGRlpvPrqq5a/33nnHcPb29uYMGGCcc011xg//fSTZV5qaqrRokULo7y8vNp9cq5/D8MwjCVLlhienp7G/v37LWOrV6826tSpYzP2ySef2Lxvq/PPf/7TiImJsdl2/fr1La8tOzvbMJlMlvdZ3759Le9FwFk4cgQ4WFBQkO644w4tXLhQCxYs0B133FHl/3TvueceHThwQB988IF69+6tdevW6frrr7c5mlCdTp06WR6HhYVJkuUIw9kjR9XZuXOnSktLdeutt1qOXNWrV09vvPFGtadVhg4dqqVLl6qgoEBLly7V0KFDqyyTm5uruLg4y5EESerWrZtOnTqlX375RZ07d1avXr3UsWNH3XvvvZo7d67lFN+FzJgxQzk5Ofrkk08UFRWl119/XQ0aNLDM/+677zR16lSb15KcnKyDBw+qtLRUOTk5atq0qa699tpqt//dd99p4cKFNusnJCSosrJSe/bssSxnvb9NJpNCQ0OrHNGpifbt21uOWkm//9ud3U5ubq7q1KmjmJgYy/y2bdsqMDDQJu+pU6fUsGFDm8x79uyx+fe79957dffdd2vatGl64YUX1Lp1a8u8nJwc3XTTTapbt67d+efNm6fbb79djRs3tozl5uYqPDzcZiwuLq7KukuWLFG3bt0UGhqqevXq6e9//7vy8/Mt8/v37y9PT0+99957kqSFCxfq5ptvVkREhCRp5MiRWrx4saKjo/XEE0/oq6++sjs/YK8
6rg4AXImGDh1qOUUze/bsapfx9vbWrbfeqltvvVUTJ07Ugw8+qMmTJ1u+0VYd6w8269MbkuTj43PO9U6dOiVJ+uijj6pcWF3dRd8dO3ZU27ZtNXDgQLVr104dOnSo8q2sC/H09NSaNWv01Vdf6dNPP9XLL7+sp556Sps2bbJcr3MuoaGhatWqlVq1aqUFCxaoT58+2r59u4KDgy2vZ8qUKRowYECVdb29vc+7L86u/7e//U1jxoypMs/6wvn/LRImk8myv+1xqds5deqUwsLCbK7LOsu6RJWWlio7O1uenp7asWOHzXIX2ifn8vPPP+uzzz4772nDc/l/7dxtSJNdGAfw/7N0a+ZaJLOIln3IzUUscCVlL0ZpM/xgFPSC0CgWkW20XsRiogypD1FkGkVFiWYxEIKB5cpeDGFmaeXI6bJlzsDyBUsiQsXr+SDebG2KTzzlw8P1Az94zrnvc24ceHHO/159fT2ysrJgs9mg1+shl8tht9tx7tw5YYxYLMaePXtQWlqKbdu24fbt27hw4YLQv2XLFnR2duLevXuoqanBpk2bcOjQIZw9e/aXnoexqeCdI8Z+g/T0dAwNDWF4eBh6vX5K1yxdujQoEPxP6SNWxgAABWVJREFUabVaPHr0aMJ7SyQS+P1+oegY/1EqlWGv2bdvH2pra8PuGgGARqNBfX29EAAGxgLhMpkMCxcuBDBWBKxZswY2mw2vXr2CWCwWdgjEYrGQh5pMUlISdDqdkCcCxoLvXq835FmWLFkCkUgErVaLjx8/4u3bt2HvmZiYCI/HE/b6wGzTZMbHTeUZJpOQkICRkRE0NTUJbV6vF1++fAla76dPnxARERGy3sBdyWPHjkEkEqG6uhrFxcV4/Pix0KfValFXVxcUBJ+K0tJSxMbGIiMjI6hdo9Ggq6sL3d3dQtuzZ8+CxrhcLsTFxcFqtWLFihWIj49HZ2dnyBxGoxEPHz7EpUuXMDIyElL0KhQKGAwGVFRUoKioKCgYz9jvwMURY7/BjBkz0NraCo/HE3ScAgD9/f3YuHEjKioq4Ha70dHRgcrKSpw5cwaZmZm/POfJkyfx4sULZGdnw+12o62tDZcvX0ZfXx9kMhmOHz+OI0eOoKysDD6fDy9fvkRJSQnKysrC3m///v3o7e2F0WgM25+dnY2uri6YzWa0tbXB4XCgoKAAR48ehUgkQkNDA06fPo3Gxkb4/X7cuXMHvb290Gg0AMbCuG63G16vF319fZP+07ZYLLhy5Yrw/Uv5+fkoLy+HzWZDS0sLWltbYbfbkZeXBwBISUnB+vXrsX37dtTU1KCjowPV1dVwOp0Axt6gcrlcMJlMeP36Ndrb2+FwOEIC2ZOJjY2FVCqF0+nE58+f8fXr1ylfG0itViM9PR0HDhxAQ0MDmpqaYDQag3Z6UlNTsXr1amzduhUPHjzAhw8f4HK5YLVahTe97t69ixs3buDWrVtIS0tDTk4ODAaDcJRpMpkwODiIXbt2obGxEe3t7bh586YQMA9ndHRUeFMxIiL4oCE1NRUqlQoGgwHNzc2oq6uD1WoNGhMfHw+/3w+73Q6fz4fi4mKhOA6k0WiwatUq5ObmYvfu3UHPnp+fD4fDgXfv3qGlpQVVVVXCZ4ix32a6Q0+M/V9MFmglIiGQ/ePHDzpx4gQlJiaSXC6nqKgoUqvVlJeXR9+/fxfGI0wgOzD8OzAwQADoyZMnQlttbS0lJyeTRCKhOXPmkF6vp4GBASIaCxQXFRWRWq2myMhIUigUpNfr6enTpxPOEejnQPb4fCtXriSxWEzz58+n3NxcGh4eJiIij8dDer2eFAoFSSQSUqlUVFJSIlzb09NDaWlpFB0dHfQcCBPoHR0dpYSEBDp48KDQ5nQ6KTk5maRSKc2ePZuSkpLo6tWrQn9/fz/t3buXYmJiaObMmbRs2TKqqqoS+p8/fy7MP2vWLNJqtXTq1CmhPy4uTghEj1u+fDkVFBQIv1+7do2USiWJRCJKSUk
hovCB7J8/F4cPHxbGExF1d3dTRkYGSSQSWrRoEZWXl4fMPzg4SGazmRYsWECRkZGkVCopKyuL/H4/9fT00Lx584JC6kNDQ6TT6WjHjh1CW3NzM23evJmioqJIJpPRunXryOfzTbjO+/fvE4AJg+1er5fWrl1LYrGYVCoVOZ3OkL9fTk4OxcTEUHR0NO3cuZPOnz9Pcrk85F7Xr18PCaYTERUWFpJGoyGpVEpz586lzMxMev/+fdj1MPZv+YsoYE+cMcbYf8LixYthsVhgsVimeyl/RGFhISorK+F2u6d7KYzxsRpjjLHp8+3bN7x58wYXL16E2Wye7uUwBoCLI8YYY9PIZDJBp9Nhw4YNE4b/GfvT+FiNMcYYYywA7xwxxhhjjAXg4ogxxhhjLAAXR4wxxhhjAbg4YowxxhgLwMURY4wxxlgALo4YY4wxxgJwccQYY4wxFoCLI8YYY4yxAH8DPKRizpinfWsAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lrr.visualize(data, fb, ['MSinceMostRecentInqexcl7days']);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The second shows that predicted risk increases with the number of inquiries in the last six months ('NumInqLast6M')." ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lrr.visualize(data, fb, ['NumInqLast6M']);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Debt level\n", "The following four plots relate to the applicant's debt level. 'NetFractionRevolvingBurden' is the ratio of revolving debt (e.g. credit card) balance to credit limit, expressed as a percentage, and has a large negative impact on the probability of good credit. A small fraction of applicants (less than 1%) actually have NetFractionRevolvingBurden greater than 100%, i.e. more revolving debt than their credit limit. This might be investigated further by the data scientist." ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lrr.visualize(data, fb, ['NetFractionRevolvingBurden']);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The second 'NumBank2NatlTradesWHighUtilization' plot shows that the number of accounts (\"trades\") with high utilization (high balance relative to credit limit for each account) also has a large impact, with a drop as soon as one account has high utilization." ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lrr.visualize(data, fb, ['NumBank2NatlTradesWHighUtilization']);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " The third plot shows that the model gives a bonus to applicants who carry balances on no more than five revolving debt accounts." ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lrr.visualize(data, fb, ['NumRevolvingTradesWBalance']);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The fourth shows an effect from the percentage of accounts with a balance that is much smaller than those from other features." ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lrr.visualize(data, fb, ['PercentTradesWBalance']);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Number and type of accounts\n", "The number of \"satisfactory\" accounts (\"trades\") has a significant positive effect on the predicted probability of good credit, with jumps at 12 and 17 accounts." ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lrr.visualize(data, fb, ['NumSatisfactoryTrades']);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, having more than 40% as installment debt accounts (e.g. car loans) is seen as a negative." ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lrr.visualize(data, fb, ['PercentInstallTrades']);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Length of credit history\n", "The 'AverageMInFile' plot shows that most of the benefit of having a longer average credit history accrues between average ages of 52 and 84 months (four to seven years). " ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lrr.visualize(data, fb, ['AverageMInFile']);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similar but smaller gains come when the age of the oldest account ('MSinceOldestTradeOpen') exceeds 122 and 146 months (10-12 years)." ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lrr.visualize(data, fb, ['MSinceOldestTradeOpen']);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Delinquencies\n", "The last set of plots looks at the effect of delinquencies. The first plot shows that much of the change due to the percentage of accounts that were never delinquent ('PercentTradesNeverDelq') occurs between 90% and 100%." ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lrr.visualize(data, fb, ['PercentTradesNeverDelq']);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "'MaxDelq2PublicRecLast12M' measures the severity of the applicant's worst delinquency from the last 12 months of the public record. A value of 5 or below indicates that some delinquency has occurred, whether of unknown duration, 30/60/90/120 days delinquent, or a derogatory comment. " ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lrr.visualize(data, fb, ['MaxDelq2PublicRecLast12M']);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "According to the last 'MSinceMostRecentDelq' plot, the effect of the most recent delinquency wears off after 21 months." ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lrr.visualize(data, fb, ['MSinceMostRecentDelq']);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 3. Loan Officer: Prototypical explanations for HELOC use case\n", "\n", "We now show how to generate explanations in the form of selecting prototypical or similar user profiles to an applicant in question that a bank employee such as a loan officer may be interested in. This may help the employee understand the decision of an applicant's HELOC application being accepted or rejected in the context of other similar applications. Note that the selected prototypical applications are profiles that are part of the training set that has been used to train an AI model that predicts good or bad i.e. approved or rejected for these applications. In fact, the method used in this notebook can work even if we are given not just one but a set of user profiles for which we want to find similar profiles from a training dataset. Additionally, the method computes weights for each prototype showcasing its similarity to the user(s) in question.\n", "\n", "The prototypical explanations in AIX360 are obtained using the Protodash algorithm developed in the following work: [ProtoDash: Fast Interpretable Prototype Selection](https://arxiv.org/abs/1707.01212)\n", "\n", "We now provide a brief overview of the method. The method takes as input a datapoint (or group of datapoints) that we want to explain with respect to instances in a training set belonging to the same feature space. The method then tries to minimize the maximum mean discrepancy (MMD metric) between the datapoints we want to explain and a prespecified number of instances from the training set that it will select. In other words, it will try to select training instances that have the same distribution as the datapoints we want to explain. 
The method performs greedy selection with quality guarantees, and it also returns importance weights for the chosen prototypical training instances that indicate how similar/representative they are.\n", "\n", "In this tutorial, we will see two examples of obtaining prototypes, one for a user whose HELOC application was approved and another for a user whose HELOC application was rejected. In each case, we showcase the top five prototypes from the training data along with how similar their feature values are to those of the chosen applicant.\n", "\n", "[Example 1. Obtaining similar samples as explanations for a HELOC applicant predicted as \"Good\"](#good)
\n", "[Example 2. Obtaining similar samples as explanations for a HELOC applicant predicted as \"Bad\"](#bad)
\n", "\n", "\n", "###### Why Protodash?\n", "Before we showcase the two examples, we provide some motivation for using this method. The method selects applications from the training set that are similar in different ways to the user application we want to explain. For example, a user's loan may be justifiably rejected because their number of satisfactory trades was low, similar to one rejected user, or because their debts were too high, similar to a different rejected user. Either of these reasons in isolation may be sufficient for rejection, and the method is able to surface a variety of such reasons through the selected prototypes. This is not the case with standard nearest-neighbor techniques based on metrics such as Euclidean distance or cosine similarity, where one might get the same type of explanation repeatedly (e.g. only applications with a low number of satisfactory trades). Protodash is thus able to provide a more well-rounded and comprehensive view of why the decision for the applicant may be justifiable.\n", "\n", "Another benefit of the method is that, since it performs distribution matching between the user(s) in question and those available in the training set, it could in principle also be applied in non-iid settings such as time series data. Other approaches that find similar profiles using standard distance measures (e.g. Euclidean, cosine) do not have this property. Additionally, we can highlight the features of each prototype that made it similar to the user(s) in question.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import statements\n", "\n", "Import necessary libraries, frameworks and algorithms."
] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import tensorflow as tf\n", "from keras.models import Sequential, Model, load_model, model_from_json\n", "from keras.layers import Dense\n", "import matplotlib.pyplot as plt\n", "from IPython.core.display import display, HTML\n", "\n", "from aix360.algorithms.contrastive import CEMExplainer, KerasClassifier\n", "from aix360.algorithms.protodash import ProtodashExplainer\n", "from aix360.datasets.heloc_dataset import HELOCDataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load HELOC dataset and show sample applicants" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/anaconda3/envs/aix360/lib/python3.6/site-packages/aix360/datasets/heloc_dataset.py:31: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame\n", "\n", "See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " df[col][df[col].isin([-7, -8, -9])] = 0\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Size of HELOC dataset: (10459, 24)\n", "Number of \"Good\" applicants: 5000\n", "Number of \"Bad\" applicants: 5459\n", "Sample Applicants:\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " 0 1 2 3 4 5 6 7 8 9\n", "ExternalRiskEstimate 55 61 67 66 81 59 54 68 59 61\n", "MSinceOldestTradeOpen 144 58 66 169 333 137 88 148 324 79\n", "MSinceMostRecentTradeOpen 4 15 5 1 27 11 7 7 2 4\n", "AverageMInFile 84 41 24 73 132 78 37 65 138 36\n", "NumSatisfactoryTrades 20 2 9 28 12 31 25 17 24 19\n", "NumTrades60Ever2DerogPubRec 3 4 0 1 0 0 0 0 0 0\n", "NumTrades90Ever2DerogPubRec 0 4 0 1 0 0 0 0 0 0\n", "PercentTradesNeverDelq 83 100 100 93 100 91 92 83 85 95\n", "MSinceMostRecentDelq 2 -7 -7 76 -7 1 9 31 5 5\n", "MaxDelq2PublicRecLast12M 3 0 7 6 7 4 4 6 4 4\n", "MaxDelqEver 5 8 8 6 8 6 6 6 6 6\n", "NumTotalTrades 23 7 9 30 12 32 26 18 27 19\n", "NumTradesOpeninLast12M 1 0 4 3 0 1 3 1 1 3\n", "PercentInstallTrades 43 67 44 57 25 47 58 44 26 26\n", "MSinceMostRecentInqexcl7days 0 0 0 0 0 0 0 0 0 0\n", "NumInqLast6M 0 0 4 5 1 0 4 0 1 6\n", "NumInqLast6Mexcl7days 0 0 4 4 1 0 4 0 1 6\n", "NetFractionRevolvingBurden 33 0 53 72 51 62 89 28 68 31\n", "NetFractionInstallBurden -8 -8 66 83 89 93 76 48 -8 86\n", "NumRevolvingTradesWBalance 8 0 4 6 3 12 7 2 7 5\n", "NumInstallTradesWBalance 1 -8 2 4 1 4 7 2 1 3\n", "NumBank2NatlTradesWHighUtilization 1 -8 1 3 0 3 2 2 3 1\n", "PercentTradesWBalance 69 0 86 91 80 94 100 40 90 62\n", "RiskPerformance Bad Bad Bad Bad Bad Bad Good Good Bad Bad" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "heloc = HELOCDataset()\n", "df = heloc.dataframe()\n", "pd.set_option('display.max_rows', 500)\n", "pd.set_option('display.max_columns', 24)\n", "pd.set_option('display.width', 1000)\n", "print(\"Size of HELOC dataset:\", df.shape)\n", "print(\"Number of \\\"Good\\\" applicants:\", np.sum(df['RiskPerformance']=='Good'))\n", "print(\"Number of \\\"Bad\\\" applicants:\", np.sum(df['RiskPerformance']=='Bad'))\n", "print(\"Sample Applicants:\")\n", "df.head(10).transpose()" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "name": "stdout", 
"output_type": "stream", "text": [ "Distribution of ExternalRiskEstimate and NumSatisfactoryTrades columns:\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Plot (example) distributions for two features\n", "print(\"Distribution of ExternalRiskEstimate and NumSatisfactoryTrades columns:\")\n", "hist = df.hist(column=['ExternalRiskEstimate', 'NumSatisfactoryTrades'], bins=10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Step 1: Process and Normalize HELOC dataset for training\n", "\n", "We will first process the HELOC dataset before using it to train an NN model that can predict the\n", "target variable RiskPerformance. The HELOC dataset is a tabular dataset with numerical values. However, some of the values are negative and need to be filtered. The processed data is stored in the file heloc.npz for easy access. The dataset is also normalized for training.\n", "\n", "The data processing and the type of model built in this case is different from the Data Scientist persona described above where rule based methods are showcased. This is the reason for going through these steps again for the Loan Officer persona." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### a. Process the dataset" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [], "source": [ "# Clean data and split dataset into train/test\n", "(Data, x_train, x_test, y_train_b, y_test_b) = heloc.split()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "#### b. Normalize the dataset" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [], "source": [ "Z = np.vstack((x_train, x_test))\n", "Zmax = np.max(Z, axis=0)\n", "Zmin = np.min(Z, axis=0)\n", "\n", "#normalize an array of samples to range [-0.5, 0.5]\n", "def normalize(V):\n", " VN = (V - Zmin)/(Zmax - Zmin)\n", " VN = VN - 0.5\n", " return(VN)\n", " \n", "# rescale a sample to recover original values for normalized values. 
\n", "def rescale(X):\n", " return(np.multiply ( X + 0.5, (Zmax - Zmin) ) + Zmin)\n", "\n", "N = normalize(Z)\n", "xn_train = N[0:x_train.shape[0], :]\n", "xn_test = N[x_train.shape[0]:, :]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Step 2. Define and train a NN classifier\n", "\n", "Let us now build a loan approval model based on the HELOC dataset.\n", "\n", "#### a. Define NN architecture\n", "We now define the architecture of a 2-layer neural network classifier whose predictions we will try to interpret. " ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [], "source": [ "# nn with no softmax\n", "def nn_small():\n", " model = Sequential()\n", " model.add(Dense(10, input_dim=23, kernel_initializer='normal', activation='relu'))\n", " model.add(Dense(2, kernel_initializer='normal')) \n", " return model " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### b. Train the NN" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "_________________________________________________________________\n", "Layer (type) Output Shape Param # \n", "=================================================================\n", "dense_7 (Dense) (None, 10) 240 \n", "_________________________________________________________________\n", "dense_8 (Dense) (None, 2) 22 \n", "=================================================================\n", "Total params: 262\n", "Trainable params: 262\n", "Non-trainable params: 0\n", "_________________________________________________________________\n", "Train accuracy: 0.7387545589625827\n", "Test accuracy: 0.7224473257698542\n" ] } ], "source": [ "# Set random seeds for repeatability\n", "np.random.seed(1) \n", "tf.set_random_seed(2) \n", "\n", "class_names = ['Bad', 'Good']\n", "\n", "# loss function\n", "def fn(correct, predicted):\n", " return tf.nn.softmax_cross_entropy_with_logits(labels=correct, 
logits=predicted)\n", "\n", "# compile and print model summary\n", "nn = nn_small()\n", "nn.compile(loss=fn, optimizer='adam', metrics=['accuracy'])\n", "nn.summary()\n", "\n", "\n", "# train model or load a trained model\n", "TRAIN_MODEL = False\n", "\n", "if (TRAIN_MODEL): \n", " nn.fit(xn_train, y_train_b, batch_size=128, epochs=500, verbose=1, shuffle=False)\n", " nn.save_weights(\"heloc_nnsmall.h5\") \n", "else: \n", " nn.load_weights(\"heloc_nnsmall.h5\")\n", " \n", "\n", "# evaluate model accuracy \n", "score = nn.evaluate(xn_train, y_train_b, verbose=0) #Compute training set accuracy\n", "#print('Train loss:', score[0])\n", "print('Train accuracy:', score[1])\n", "\n", "score = nn.evaluate(xn_test, y_test_b, verbose=0) #Compute test set accuracy\n", "#print('Test loss:', score[0])\n", "print('Test accuracy:', score[1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Step 3: Obtain similar samples as explanations for a HELOC applicant predicted as \"Good\" (Example 1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### a. Normalize the data and choose a particular applicant, whose profile is displayed below." ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [], "source": [ "p_train = nn.predict_classes(xn_train) # Use trained neural network to predict train points\n", "p_train = p_train.reshape((p_train.shape[0],1))\n", "\n", "z_train = np.hstack((xn_train, p_train)) # Store (normalized) instances that were predicted as Good\n", "z_train_good = z_train[z_train[:,-1]==1, :]\n", "\n", "zun_train = np.hstack((x_train, p_train)) # Store (unnormalized) instances that were predicted as Good \n", "zun_train_good = zun_train[zun_train[:,-1]==1, :]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us now consider applicant 8 whose loan was approved. 
Note that this applicant was also considered for the contrastive explainer; however, we now justify the approved status in a different manner, using prototypical examples, which arguably provide a better explanation for a bank employee." ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Chosen Sample: 8\n", "Prediction made by the model: Good\n", "Prediction probabilities: [[-0.1889221 0.29527372]]\n", "\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/anaconda3/envs/aix360/lib/python3.6/site-packages/keras/engine/sequential.py:247: UserWarning: Network returning invalid probability values. The last layer might not normalize predictions into probabilities (like softmax or sigmoid would).\n", " warnings.warn('Network returning invalid probability values. '\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " 0\n", "ExternalRiskEstimate 82\n", "MSinceOldestTradeOpen 280\n", "MSinceMostRecentTradeOpen 13\n", "AverageMInFile 102\n", "NumSatisfactoryTrades 22\n", "NumTrades60Ever2DerogPubRec 0\n", "NumTrades90Ever2DerogPubRec 0\n", "PercentTradesNeverDelq 91\n", "MSinceMostRecentDelq 26\n", "MaxDelq2PublicRecLast12M 6\n", "MaxDelqEver 6\n", "NumTotalTrades 23\n", "NumTradesOpeninLast12M 0\n", "PercentInstallTrades 9\n", "MSinceMostRecentInqexcl7days 0\n", "NumInqLast6M 0\n", "NumInqLast6Mexcl7days 0\n", "NetFractionRevolvingBurden 3\n", "NetFractionInstallBurden 0\n", "NumRevolvingTradesWBalance 4\n", "NumInstallTradesWBalance 1\n", "NumBank2NatlTradesWHighUtilization 1\n", "PercentTradesWBalance 42\n", "RiskPerformance Good" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "idx = 8\n", "\n", "X = xn_test[idx].reshape((1,) + xn_test[idx].shape)\n", "print(\"Chosen Sample:\", idx)\n", "print(\"Prediction made by the model:\", class_names[np.argmax(nn.predict_proba(X))])\n", "print(\"Prediction probabilities:\", nn.predict_proba(X))\n", "print(\"\")\n", "\n", "# attach the prediction made by the model to X\n", "X = np.hstack((X, nn.predict_classes(X).reshape((1,1))))\n", "\n", "Xun = x_test[idx].reshape((1,) + x_test[idx].shape) \n", "dfx = pd.DataFrame.from_records(Xun.astype('double')) # Create dataframe with original feature values\n", "dfx[23] = class_names[X[0, -1]]\n", "dfx.columns = df.columns\n", "dfx.transpose()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### b. Find similar applicants predicted as \"good\" using the protodash explainer. 
" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " pcost dcost gap pres dres\n", " 0: 0.0000e+00 -2.0000e+04 4e+00 1e+00 1e+00\n", " 1: 1.8207e+01 -2.2985e+05 5e+01 1e+00 1e+00\n", " 2: -1.6771e+00 -1.4132e+06 3e+02 1e+00 1e+00\n", " 3: 6.4653e-01 -7.7669e+06 2e+03 1e+00 1e+00\n", " 4: 9.0963e-01 -1.6930e+08 3e+04 1e+00 1e+00\n", " 5: 6.8400e-01 -8.7461e+10 2e+07 1e+00 1e+00\n", " 6: 2.1065e+08 -1.7700e+18 2e+18 6e-13 9e-03\n", " 7: 2.1065e+08 -1.7700e+16 2e+16 6e-15 1e-03\n", " 8: 2.1065e+08 -1.7700e+14 2e+14 4e-16 3e-05\n", " 9: 2.1065e+08 -1.7706e+12 2e+12 2e-16 5e-07\n", "10: 2.1059e+08 -1.8270e+10 2e+10 2e-16 6e-09\n", "11: 2.0548e+08 -7.3263e+08 9e+08 2e-16 6e-10\n", "12: 5.4547e+06 -5.0769e+08 5e+08 2e-16 2e-11\n", "13: 2.4579e+06 -1.0151e+07 1e+07 3e-16 8e-13\n", "14: 3.9731e+05 -4.8259e+05 9e+05 2e-16 2e-13\n", "15: 5.6807e+04 -6.2926e+04 1e+05 2e-16 4e-14\n", "16: 8.0641e+03 -9.1700e+03 2e+04 1e-16 1e-14\n", "17: 1.1237e+03 -1.3430e+03 2e+03 8e-17 3e-14\n", "18: 1.4817e+02 -2.0491e+02 4e+02 9e-17 2e-15\n", "19: 1.5650e+01 -3.4597e+01 5e+01 2e-16 8e-16\n", "20: -6.5180e-01 -7.5158e+00 7e+00 3e-16 7e-16\n", "21: -2.1215e+00 -2.8262e+00 7e-01 1e-16 6e-17\n", "22: -2.2224e+00 -2.3257e+00 1e-01 5e-17 2e-17\n", "23: -2.2551e+00 -2.2713e+00 2e-02 8e-17 8e-17\n", "24: -2.2583e+00 -2.2599e+00 2e-03 3e-16 7e-17\n", "25: -2.2584e+00 -2.2585e+00 5e-05 9e-17 2e-16\n", "26: -2.2584e+00 -2.2584e+00 5e-07 8e-17 7e-17\n", "Optimal solution found.\n", " pcost dcost gap pres dres\n", " 0: 0.0000e+00 -3.0000e+04 6e+00 1e+00 1e+00\n", " 1: 3.0722e+01 -4.4267e+05 9e+01 1e+00 1e+00\n", " 2: -1.6074e+00 -1.6114e+06 3e+02 1e+00 1e+00\n", " 3: 1.4698e+00 -5.8978e+06 1e+03 1e+00 1e+00\n", " 4: 5.1359e+00 -5.6757e+07 1e+04 1e+00 1e+00\n", " 5: 8.8032e+00 -6.9908e+09 1e+06 1e+00 1e+00\n", " 6: 1.8944e+08 -1.6526e+17 2e+17 6e-13 3e-04\n", " 7: 1.8944e+08 -1.6526e+15 2e+15 6e-15 2e-04\n", " 8: 
1.8944e+08 -1.6526e+13 2e+13 2e-16 1e-06\n", " 9: 1.8943e+08 -1.6612e+11 2e+11 2e-16 2e-08\n", "10: 1.8825e+08 -2.5115e+09 3e+09 2e-16 5e-10\n", "11: 1.2280e+08 -5.4548e+08 7e+08 7e-17 1e-07\n", "12: 1.8756e+07 -9.4847e+07 1e+08 4e-16 2e-12\n", "13: 3.6747e+06 -5.2922e+06 9e+06 5e-17 5e-13\n", "14: 5.3007e+05 -5.8496e+05 1e+06 2e-16 2e-13\n", "15: 7.5911e+04 -8.5173e+04 2e+05 2e-16 2e-13\n", "16: 1.0792e+04 -1.2212e+04 2e+04 1e-16 4e-14\n", "17: 1.5104e+03 -1.7838e+03 3e+03 3e-16 7e-15\n", "18: 2.0196e+02 -2.6962e+02 5e+02 2e-16 6e-15\n", "19: 2.2751e+01 -4.4473e+01 7e+01 2e-16 2e-15\n", "20: 1.6485e-01 -9.1332e+00 9e+00 2e-16 4e-16\n", "21: -2.0349e+00 -3.0759e+00 1e+00 3e-16 4e-16\n", "22: -2.1758e+00 -2.4046e+00 2e-01 5e-17 2e-16\n", "23: -2.2521e+00 -2.3210e+00 7e-02 2e-16 7e-17\n", "24: -2.2594e+00 -2.2652e+00 6e-03 2e-16 5e-17\n", "25: -2.2601e+00 -2.2604e+00 3e-04 3e-16 7e-17\n", "26: -2.2601e+00 -2.2601e+00 3e-06 2e-16 9e-17\n", "27: -2.2601e+00 -2.2601e+00 3e-08 1e-16 3e-17\n", "Optimal solution found.\n", " pcost dcost gap pres dres\n", " 0: 0.0000e+00 -4.0000e+04 8e+00 1e+00 1e+00\n", " 1: 4.4367e+01 -7.1824e+05 1e+02 1e+00 1e+00\n", " 2: -2.0468e+00 -3.1903e+06 7e+02 1e+00 1e+00\n", " 3: 1.2538e+01 -1.4991e+07 3e+03 1e+00 1e+00\n", " 4: 1.8503e+01 -3.6431e+08 7e+04 1e+00 1e+00\n", " 5: 1.4872e+01 -4.6590e+11 1e+08 1e+00 1e+00\n", " 6: 1.8484e+08 -7.2574e+18 7e+18 5e-13 9e-03\n", " 7: 1.8484e+08 -7.2574e+16 7e+16 5e-15 5e-03\n", " 8: 1.8484e+08 -7.2574e+14 7e+14 1e-16 8e-05\n", " 9: 1.8484e+08 -7.2586e+12 7e+12 3e-17 7e-07\n", "10: 1.8482e+08 -7.3749e+10 7e+10 2e-16 8e-09\n", "11: 1.8327e+08 -1.8914e+09 2e+09 2e-16 3e-10\n", "12: 6.6101e+07 -3.5884e+08 4e+08 3e-16 3e-08\n", "13: 1.4131e+07 -2.4415e+07 4e+07 2e-16 1e-09\n", "14: 2.0607e+06 -2.3295e+06 4e+06 2e-16 6e-13\n", "15: 2.9628e+05 -3.3354e+05 6e+05 2e-16 3e-13\n", "16: 4.2328e+04 -4.7437e+04 9e+04 2e-16 8e-14\n", "17: 5.9948e+03 -6.8520e+03 1e+04 1e-16 2e-14\n", "18: 8.3044e+02 -1.0092e+03 2e+03 
4e-16 1e-14\n", "19: 1.0739e+02 -1.5582e+02 3e+02 3e-16 3e-15\n", "20: 1.0233e+01 -2.7124e+01 4e+01 2e-16 1e-15\n", "21: -1.3211e+00 -6.3304e+00 5e+00 1e-16 3e-16\n", "22: -2.2395e+00 -2.6971e+00 5e-01 2e-16 1e-16\n", "23: -2.2596e+00 -2.2756e+00 2e-02 3e-16 1e-16\n", "24: -2.2616e+00 -2.2630e+00 1e-03 1e-16 1e-16\n", "25: -2.2617e+00 -2.2617e+00 2e-05 9e-17 1e-16\n", "26: -2.2617e+00 -2.2617e+00 2e-07 2e-16 6e-17\n", "Optimal solution found.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/anaconda3/envs/aix360/lib/python3.6/site-packages/cvxopt/coneprog.py:2111: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison\n", " if 'x' in initvals:\n", "/anaconda3/envs/aix360/lib/python3.6/site-packages/cvxopt/coneprog.py:2116: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison\n", " if 's' in initvals:\n", "/anaconda3/envs/aix360/lib/python3.6/site-packages/cvxopt/coneprog.py:2131: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison\n", " if 'y' in initvals:\n", "/anaconda3/envs/aix360/lib/python3.6/site-packages/cvxopt/coneprog.py:2136: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison\n", " if 'z' in initvals:\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ " pcost dcost gap pres dres\n", " 0: 0.0000e+00 -5.0000e+04 1e+01 1e+00 1e+00\n", " 1: 6.4530e+01 -1.0740e+06 2e+02 1e+00 1e+00\n", " 2: -1.0387e+00 -3.8672e+06 8e+02 1e+00 1e+00\n", " 3: 1.7593e+01 -1.4129e+07 3e+03 1e+00 1e+00\n", " 4: 2.4754e+01 -1.7515e+08 4e+04 1e+00 1e+00\n", " 5: 2.8355e+01 -4.2312e+10 9e+06 1e+00 1e+00\n", " 6: 2.6599e+08 -9.5334e+17 1e+18 4e-13 1e-03\n", " 7: 2.6599e+08 -9.5334e+15 1e+16 4e-15 9e-04\n", " 8: 2.6599e+08 -9.5336e+13 1e+14 1e-16 6e-06\n", " 9: 2.6599e+08 
-9.5545e+11 1e+12 1e-16 9e-08\n", "10: 2.6558e+08 -1.1640e+10 1e+10 1e-16 2e-09\n", "11: 2.4039e+08 -2.0164e+09 2e+09 2e-16 2e-08\n", "12: 2.9390e+07 -1.5952e+09 2e+09 2e-16 7e-09\n", "13: 1.0754e+07 -3.7180e+07 5e+07 2e-16 2e-10\n", "14: 1.6461e+06 -1.9697e+06 4e+06 2e-16 2e-12\n", "15: 2.3560e+05 -2.6053e+05 5e+05 1e-16 2e-13\n", "16: 3.3615e+04 -3.7633e+04 7e+04 2e-16 5e-14\n", "17: 4.7521e+03 -5.4485e+03 1e+04 2e-16 2e-14\n", "18: 6.5532e+02 -8.0583e+02 1e+03 1e-16 9e-15\n", "19: 8.3449e+01 -1.2556e+02 2e+02 9e-17 4e-15\n", "20: 7.2389e+00 -2.2354e+01 3e+01 2e-16 7e-16\n", "21: -1.5947e+00 -5.4973e+00 4e+00 2e-16 6e-16\n", "22: -2.2383e+00 -2.5578e+00 3e-01 2e-16 1e-16\n", "23: -2.2526e+00 -2.2903e+00 4e-02 2e-16 7e-17\n", "24: -2.2616e+00 -2.2685e+00 7e-03 3e-16 8e-17\n", "25: -2.2622e+00 -2.2630e+00 8e-04 2e-16 1e-16\n", "26: -2.2622e+00 -2.2622e+00 2e-05 2e-16 2e-16\n", "27: -2.2622e+00 -2.2622e+00 2e-07 2e-16 2e-16\n", "Optimal solution found.\n" ] } ], "source": [ "explainer = ProtodashExplainer()\n", "(W, S, setValues) = explainer.explain(X, z_train_good, m=5) # Return weights W, Prototypes S and objective function values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### c. Display similar applicant user profiles and the extent to which they are similar to the chosen applicant as indicated by the last row in the table below labelled as \"Weight\"." ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 0 1 2 3 4\n", "ExternalRiskEstimate 85 89 77 83 73\n", "MSinceOldestTradeOpen 223 379 338 789 230\n", "MSinceMostRecentTradeOpen 13 156 2 6 5\n", "AverageMInFile 87 257 109 102 89\n", "NumSatisfactoryTrades 23 3 16 41 61\n", "NumTrades60Ever2DerogPubRec 0 0 2 0 0\n", "NumTrades90Ever2DerogPubRec 0 0 2 0 0\n", "PercentTradesNeverDelq 91 100 90 100 100\n", "MSinceMostRecentDelq 26 0 65 0 0\n", "MaxDelq2PublicRecLast12M 6 7 6 7 6\n", "MaxDelqEver 6 8 2 8 7\n", "NumTotalTrades 26 3 21 41 37\n", "NumTradesOpeninLast12M 0 0 1 1 3\n", "PercentInstallTrades 9 33 14 17 18\n", "MSinceMostRecentInqexcl7days 1 0 0 0 0\n", "NumInqLast6M 1 0 1 1 2\n", "NumInqLast6Mexcl7days 1 0 1 0 2\n", "NetFractionRevolvingBurden 4 0 2 1 59\n", "NetFractionInstallBurden 0 0 0 0 72\n", "NumRevolvingTradesWBalance 4 0 1 3 9\n", "NumInstallTradesWBalance 1 0 1 0 1\n", "NumBank2NatlTradesWHighUtilization 0 0 0 1 7\n", "PercentTradesWBalance 50 0 22 23 53\n", "RiskPerformance Good Good Good Good Good\n", "Weight 0.730222 0.0690562 0.0978593 0.0498047 0.0530578" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfs = pd.DataFrame.from_records(zun_train_good[S, 0:-1].astype('double'))\n", "RP=[]\n", "for i in range(S.shape[0]):\n", " RP.append(class_names[z_train_good[S[i], -1]]) # Append class names\n", "dfs[23] = RP\n", "dfs.columns = df.columns \n", "dfs[\"Weight\"] = np.around(W, 5)/np.sum(np.around(W, 5)) # Calculate normalized importance weights\n", "dfs.transpose()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### d. Compute how similar a feature of a prototypical user is to the chosen applicant.\n", "The more similar the feature of prototypical user is to the applicant, the closer its weight is to 1. We can see below that several features for prototypes are quite similar to the chosen applicant. A human friendly explanation is provided thereafter." 
] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 0 1 2 3 4\n", "ExternalRiskEstimate 0.59 0.29 0.42 0.84 0.21\n", "MSinceOldestTradeOpen 0.76 0.62 0.76 0.09 0.79\n", "MSinceMostRecentTradeOpen 1.00 0.09 0.83 0.89 0.87\n", "AverageMInFile 0.79 0.09 0.90 1.00 0.82\n", "NumSatisfactoryTrades 0.95 0.39 0.74 0.39 0.15\n", "NumTrades60Ever2DerogPubRec 1.00 1.00 0.08 1.00 1.00\n", "NumTrades90Ever2DerogPubRec 1.00 1.00 0.08 1.00 1.00\n", "PercentTradesNeverDelq 1.00 0.15 0.81 0.15 0.15\n", "MSinceMostRecentDelq 1.00 0.36 0.22 0.36 0.36\n", "MaxDelq2PublicRecLast12M 1.00 0.13 1.00 0.13 1.00\n", "MaxDelqEver 1.00 0.41 0.17 0.41 0.64\n", "NumTotalTrades 0.80 0.23 0.86 0.26 0.35\n", "NumTradesOpeninLast12M 1.00 1.00 0.40 0.40 0.06\n", "PercentInstallTrades 1.00 0.05 0.54 0.37 0.33\n", "MSinceMostRecentInqexcl7days 0.08 1.00 1.00 1.00 1.00\n", "NumInqLast6M 0.21 1.00 0.21 0.21 0.04\n", "NumInqLast6Mexcl7days 0.26 1.00 0.26 1.00 0.07\n", "NetFractionRevolvingBurden 0.96 0.88 0.96 0.92 0.09\n", "NetFractionInstallBurden 1.00 1.00 1.00 1.00 0.08\n", "NumRevolvingTradesWBalance 1.00 0.28 0.38 0.73 0.20\n", "NumInstallTradesWBalance 1.00 0.13 1.00 0.13 1.00\n", "NumBank2NatlTradesWHighUtilization 0.69 0.69 0.69 1.00 0.11\n", "PercentTradesWBalance 0.67 0.12 0.36 0.38 0.57" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "z = z_train_good[S, 0:-1] # Store chosen prototypes\n", "eps = 1e-10 # Small constant defined to eliminate divide-by-zero errors\n", "fwt = np.zeros(z.shape)\n", "for i in range (z.shape[0]):\n", " for j in range(z.shape[1]):\n", " fwt[i, j] = np.exp(-1 * abs(X[0, j] - z[i,j])/(np.std(z[:, j])+eps)) # Compute feature similarity in [0,1]\n", " \n", "# move wts to a dataframe to display\n", "dfw = pd.DataFrame.from_records(np.around(fwt.astype('double'), 2))\n", "dfw.columns = df.columns[:-1]\n", "dfw.transpose() " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Explanation:\n", "The above table depicts the five closest user 
profiles to the chosen applicant. Based on the importance weights output by the method, we see that the prototype under column zero is the most representative user profile by far. This is (intuitively) confirmed by the feature similarity table above, where more than 50% of the features (12 out of 23) of this prototype are identical to those of the chosen user whose prediction we want to explain. Also, the bank employee looking at the prototypical users and their features surmises that the approved applicant belongs to a group of approved users that have practically no debt (NetFractionInstallBurden). This justification gives the employee more confidence in approving the user's application.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Example 2. Obtaining similar samples as explanations for a HELOC applicant predicted as \"Bad\". \n", "We now consider user 1272, whose loan was denied. We obtained a contrastive explanation for this user before. Similar to user 8, we now obtain exemplar-based explanations for this user to help the bank employee understand the reasons for the rejection. Steps similar to example 1 are followed in this case too, where we first process the data, obtain prototypes and their importance weights, and finally show how similar the features of these prototypes are to those of the user we want to explain." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### a. Normalize the data and choose a particular applicant, whose profile is displayed below." 
] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [], "source": [ "z_train_bad = z_train[z_train[:,-1]==0, :]\n", "zun_train_bad = zun_train[zun_train[:,-1]==0, :]" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Chosen Sample: 1272\n", "Prediction made by the model: Bad\n", "Prediction probabilities: [[ 0.40682057 -0.391679 ]]\n", "\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " 0\n", "ExternalRiskEstimate 65\n", "MSinceOldestTradeOpen 256\n", "MSinceMostRecentTradeOpen 15\n", "AverageMInFile 52\n", "NumSatisfactoryTrades 17\n", "NumTrades60Ever2DerogPubRec 0\n", "NumTrades90Ever2DerogPubRec 0\n", "PercentTradesNeverDelq 100\n", "MSinceMostRecentDelq 0\n", "MaxDelq2PublicRecLast12M 7\n", "MaxDelqEver 8\n", "NumTotalTrades 19\n", "NumTradesOpeninLast12M 0\n", "PercentInstallTrades 29\n", "MSinceMostRecentInqexcl7days 2\n", "NumInqLast6M 5\n", "NumInqLast6Mexcl7days 5\n", "NetFractionRevolvingBurden 57\n", "NetFractionInstallBurden 79\n", "NumRevolvingTradesWBalance 2\n", "NumInstallTradesWBalance 4\n", "NumBank2NatlTradesWHighUtilization 2\n", "PercentTradesWBalance 60\n", "RiskPerformance Bad" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "idx = 1272 #another user to try 2385\n", "\n", "X = xn_test[idx].reshape((1,) + xn_test[idx].shape)\n", "print(\"Chosen Sample:\", idx)\n", "print(\"Prediction made by the model:\", class_names[np.argmax(nn.predict_proba(X))])\n", "print(\"Prediction probabilities:\", nn.predict_proba(X))\n", "print(\"\")\n", "\n", "X = np.hstack((X, nn.predict_classes(X).reshape((1,1))))\n", "\n", "# move samples to a dataframe to display\n", "Xun = x_test[idx].reshape((1,) + x_test[idx].shape)\n", "dfx = pd.DataFrame.from_records(Xun.astype('double'))\n", "dfx[23] = class_names[X[0, -1]]\n", "dfx.columns = df.columns\n", "dfx.transpose()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### b. Find similar applicants predicted as \"bad\" using the protodash explainer. 
" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " pcost dcost gap pres dres\n", " 0: 0.0000e+00 -2.0000e+04 4e+00 1e+00 1e+00\n", " 1: 1.3951e+01 -1.8757e+05 4e+01 1e+00 1e+00\n", " 2: -1.0452e+00 -2.4808e+06 5e+02 1e+00 1e+00\n", " 3: 9.9094e-01 -3.7820e+07 8e+03 1e+00 1e+00\n", " 4: 1.2044e+00 -5.7710e+09 1e+06 1e+00 1e+00\n", " 5: 1.6105e+08 -1.3427e+17 1e+17 7e-13 7e-04\n", " 6: 1.6105e+08 -1.3427e+15 1e+15 7e-15 2e-04\n", " 7: 1.6105e+08 -1.3427e+13 1e+13 2e-16 4e-06\n", " 8: 1.6104e+08 -1.3473e+11 1e+11 3e-17 2e-08\n", " 9: 1.6063e+08 -1.8053e+09 2e+09 3e-16 6e-10\n", "10: 1.2950e+08 -3.8084e+08 5e+08 2e-16 1e-09\n", "11: 6.9589e+06 -2.2750e+08 2e+08 5e-16 8e-12\n", "12: 2.4924e+06 -4.9489e+06 7e+06 1e-16 5e-13\n", "13: 3.7960e+05 -4.1688e+05 8e+05 3e-17 4e-13\n", "14: 5.4362e+04 -6.0989e+04 1e+05 6e-17 2e-13\n", "15: 7.7281e+03 -8.7442e+03 2e+04 3e-16 3e-14\n", "16: 1.0814e+03 -1.2777e+03 2e+03 3e-17 4e-15\n", "17: 1.4452e+02 -1.9320e+02 3e+02 3e-16 9e-15\n", "18: 1.6236e+01 -3.1901e+01 5e+01 2e-16 2e-15\n", "19: 7.9417e-02 -6.5709e+00 7e+00 4e-16 4e-16\n", "20: -1.5034e+00 -2.2383e+00 7e-01 3e-16 2e-16\n", "21: -1.6411e+00 -1.7669e+00 1e-01 9e-17 7e-17\n", "22: -1.6773e+00 -1.6963e+00 2e-02 1e-16 1e-16\n", "23: -1.6801e+00 -1.6816e+00 1e-03 5e-17 4e-17\n", "24: -1.6802e+00 -1.6802e+00 2e-05 2e-16 6e-17\n", "25: -1.6802e+00 -1.6802e+00 2e-07 2e-16 2e-16\n", "Optimal solution found.\n", " pcost dcost gap pres dres\n", " 0: 0.0000e+00 -3.0000e+04 6e+00 1e+00 1e+00\n", " 1: 2.2652e+01 -3.4494e+05 7e+01 1e+00 1e+00\n", " 2: -9.7166e-01 -1.5165e+06 3e+02 1e+00 1e+00\n", " 3: 8.8732e-01 -6.4545e+06 1e+03 1e+00 1e+00\n", " 4: 4.0619e+00 -9.5492e+07 2e+04 1e+00 1e+00\n", " 5: 6.8342e+00 -3.0797e+10 7e+06 1e+00 1e+00\n", " 6: 1.4054e+08 -6.7268e+17 7e+17 4e-13 2e-03\n", " 7: 1.4054e+08 -6.7268e+15 7e+15 3e-15 7e-04\n", " 8: 1.4054e+08 -6.7269e+13 7e+13 3e-16 2e-05\n", " 9: 
1.4054e+08 -6.7332e+11 7e+11 2e-16 2e-07\n", "10: 1.4040e+08 -7.3721e+09 8e+09 3e-16 3e-09\n", "11: 1.2803e+08 -6.5498e+08 8e+08 1e-16 3e-10\n", "12: 1.0414e+07 -2.6146e+08 3e+08 2e-16 3e-10\n", "13: 3.1711e+06 -6.3976e+06 1e+07 9e-17 7e-12\n", "14: 4.8148e+05 -5.3517e+05 1e+06 2e-16 5e-13\n", "15: 6.8958e+04 -7.7139e+04 1e+05 1e-16 2e-13\n", "16: 9.8128e+03 -1.1070e+04 2e+04 2e-16 4e-14\n", "17: 1.3771e+03 -1.6135e+03 3e+03 3e-16 2e-14\n", "18: 1.8573e+02 -2.4246e+02 4e+02 1e-16 5e-15\n", "19: 2.1713e+01 -3.9393e+01 6e+01 1e-16 4e-15\n", "20: 7.1800e-01 -7.7941e+00 9e+00 2e-16 5e-16\n", "21: -1.4375e+00 -2.4351e+00 1e+00 1e-16 2e-16\n", "22: -1.6114e+00 -1.8294e+00 2e-01 2e-16 1e-16\n", "23: -1.6765e+00 -1.7151e+00 4e-02 1e-16 9e-17\n", "24: -1.6822e+00 -1.6854e+00 3e-03 2e-16 2e-16\n", "25: -1.6824e+00 -1.6824e+00 6e-05 5e-16 1e-16\n", "26: -1.6824e+00 -1.6824e+00 6e-07 7e-17 4e-17\n", "Optimal solution found.\n", " pcost dcost gap pres dres\n", " 0: 0.0000e+00 -4.0000e+04 8e+00 1e+00 1e+00\n", " 1: 3.1518e+01 -5.1960e+05 1e+02 1e+00 1e+00\n", " 2: -5.3941e-01 -2.2190e+06 5e+02 1e+00 1e+00\n", " 3: -5.5212e-01 -8.2651e+06 2e+03 1e+00 1e+00\n", " 4: 8.1060e-01 -7.5932e+07 2e+04 1e+00 1e+00\n", " 5: 2.5556e+00 -6.5290e+09 1e+06 1e+00 1e+00\n", " 6: 3.5093e+08 -1.5662e+17 2e+17 9e-13 3e-04\n", " 7: 3.5093e+08 -1.5662e+15 2e+15 9e-15 1e-04\n", " 8: 3.5093e+08 -1.5663e+13 2e+13 2e-16 2e-06\n", " 9: 3.5088e+08 -1.5782e+11 2e+11 2e-16 3e-08\n", "10: 3.4591e+08 -2.7463e+09 3e+09 2e-16 8e-10\n", "11: 1.5618e+08 -4.5360e+08 6e+08 1e-16 1e-08\n", "12: 2.5039e+07 -5.2066e+07 8e+07 2e-16 3e-12\n", "13: 3.9788e+06 -4.5353e+06 9e+06 2e-16 5e-13\n", "14: 5.7260e+05 -6.4450e+05 1e+06 1e-16 1e-13\n", "15: 8.1978e+04 -9.1397e+04 2e+05 1e-16 2e-13\n", "16: 1.1669e+04 -1.3145e+04 2e+04 9e-17 3e-14\n", "17: 1.6394e+03 -1.9143e+03 4e+03 4e-16 9e-15\n", "18: 2.2187e+02 -2.8701e+02 5e+02 2e-16 4e-15\n", "19: 2.6316e+01 -4.6341e+01 7e+01 1e-16 2e-15\n", "20: 1.1314e+00 -9.0235e+00 1e+01 
2e-16 8e-16\n", "21: -1.4923e+00 -2.7146e+00 1e+00 2e-16 4e-16\n", "22: -1.6504e+00 -1.7666e+00 1e-01 2e-16 1e-16\n", "23: -1.6819e+00 -1.6990e+00 2e-02 1e-16 2e-16\n", "24: -1.6844e+00 -1.6860e+00 2e-03 3e-16 9e-17\n", "25: -1.6845e+00 -1.6845e+00 5e-05 1e-16 5e-17\n", "26: -1.6845e+00 -1.6845e+00 5e-07 6e-17 9e-17\n", "Optimal solution found.\n", " pcost dcost gap pres dres\n", " 0: 0.0000e+00 -5.0000e+04 1e+01 1e+00 1e+00\n", " 1: 4.2777e+01 -7.4034e+05 1e+02 1e+00 1e+00\n", " 2: -4.0907e-01 -3.0255e+06 6e+02 1e+00 1e+00\n", " 3: 3.2489e+00 -1.1962e+07 2e+03 1e+00 1e+00\n", " 4: 1.3141e+01 -1.5965e+08 3e+04 1e+00 1e+00\n", " 5: 2.0375e+01 -5.8520e+10 1e+07 1e+00 1e+00\n", " 6: 1.4807e+08 -1.2576e+18 1e+18 4e-13 3e-03\n", " 7: 1.4807e+08 -1.2576e+16 1e+16 4e-15 1e-03\n", " 8: 1.4807e+08 -1.2576e+14 1e+14 2e-16 2e-05\n", " 9: 1.4807e+08 -1.2588e+12 1e+12 2e-16 2e-07\n", "10: 1.4795e+08 -1.3762e+10 1e+10 2e-16 2e-09\n", "11: 1.3813e+08 -1.2350e+09 1e+09 1e-16 9e-10\n", "12: 1.4061e+07 -5.6498e+08 6e+08 3e-16 8e-11\n", "13: 3.8779e+06 -1.2907e+07 2e+07 2e-16 2e-12\n", "14: 5.8014e+05 -6.7599e+05 1e+06 2e-16 5e-13\n", "15: 8.2943e+04 -9.1922e+04 2e+05 2e-16 2e-13\n", "16: 1.1808e+04 -1.3283e+04 3e+04 2e-16 5e-14\n", "17: 1.6603e+03 -1.9330e+03 4e+03 8e-17 1e-14\n", "18: 2.2530e+02 -2.8932e+02 5e+02 1e-16 6e-15\n", "19: 2.7008e+01 -4.6485e+01 7e+01 2e-16 3e-15\n", "20: 1.3453e+00 -8.9445e+00 1e+01 1e-16 7e-16\n", "21: -1.3724e+00 -2.6264e+00 1e+00 1e-16 3e-16\n", "22: -1.5360e+00 -1.9467e+00 4e-01 5e-17 2e-16\n", "23: -1.6708e+00 -1.8638e+00 2e-01 2e-16 1e-16\n", "24: -1.6830e+00 -1.6970e+00 1e-02 3e-16 1e-16\n", "25: -1.6847e+00 -1.6856e+00 9e-04 2e-16 2e-16\n", "26: -1.6848e+00 -1.6848e+00 4e-05 1e-16 1e-16\n", "27: -1.6848e+00 -1.6848e+00 6e-07 3e-16 1e-16\n", "Optimal solution found.\n" ] } ], "source": [ "(W, S, setValues) = explainer.explain(X, z_train_bad, m=5) # Return weights W, Prototypes S and objective function values" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "#### c. Display similar applicant user profiles and the extent to which they are similar to the chosen applicant as indicated by the last row in the table below labelled as \"Weight\"." ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 0 1 2 3 4\n", "ExternalRiskEstimate 73 61 64 55 0\n", "MSinceOldestTradeOpen 191 125 85 194 383\n", "MSinceMostRecentTradeOpen 17 7 0 26 383\n", "AverageMInFile 53 32 13 100 383\n", "NumSatisfactoryTrades 19 5 2 18 1\n", "NumTrades60Ever2DerogPubRec 0 1 0 0 1\n", "NumTrades90Ever2DerogPubRec 0 1 0 0 1\n", "PercentTradesNeverDelq 100 100 100 84 100\n", "MSinceMostRecentDelq 0 0 0 1 0\n", "MaxDelq2PublicRecLast12M 7 7 7 4 6\n", "MaxDelqEver 8 8 8 6 8\n", "NumTotalTrades 20 6 9 11 1\n", "NumTradesOpeninLast12M 0 3 8 0 0\n", "PercentInstallTrades 25 60 33 42 100\n", "MSinceMostRecentInqexcl7days 0 0 0 23 0\n", "NumInqLast6M 0 1 66 0 1\n", "NumInqLast6Mexcl7days 0 1 66 0 1\n", "NetFractionRevolvingBurden 31 232 65 84 0\n", "NetFractionInstallBurden 78 83 0 48 0\n", "NumRevolvingTradesWBalance 4 1 2 5 0\n", "NumInstallTradesWBalance 3 3 3 3 0\n", "NumBank2NatlTradesWHighUtilization 1 1 1 3 0\n", "PercentTradesWBalance 54 100 71 100 0\n", "RiskPerformance Bad Bad Bad Bad Bad\n", "Weight 0.781763 0.0822525 0.0573946 0.0642844 0.0143057" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# move samples to a dataframe to display\n", "dfs = pd.DataFrame.from_records(zun_train_bad[S, 0:-1].astype('double'))\n", "RP=[]\n", "for i in range(S.shape[0]):\n", " RP.append(class_names[z_train_bad[S[i], -1]]) # Append class names\n", "dfs[23] = RP\n", "dfs.columns = df.columns \n", "dfs[\"Weight\"] = np.around(W, 5)/np.sum(np.around(W, 5)) # Compute normalized importance weights for prototypes\n", "dfs.transpose()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### d. Compute how similar a feature of a prototypical user is to the chosen applicant.\n", "The more similar the feature of prototypical user is to the applicant, the closer its weight is to 1. We can see below that several features for prototypes are quite similar to the chosen applicant. 
A human-friendly explanation based on this table follows." ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 0 1 2 3 4\n", "ExternalRiskEstimate 0.73 0.86 0.96 0.68 0.08\n", "MSinceOldestTradeOpen 0.53 0.28 0.19 0.55 0.29\n", "MSinceMostRecentTradeOpen 0.99 0.95 0.90 0.93 0.08\n", "AverageMInFile 0.99 0.86 0.75 0.70 0.09\n", "NumSatisfactoryTrades 0.78 0.22 0.15 0.88 0.13\n", "NumTrades60Ever2DerogPubRec 1.00 0.13 1.00 1.00 0.13\n", "NumTrades90Ever2DerogPubRec 1.00 0.13 1.00 1.00 0.13\n", "PercentTradesNeverDelq 1.00 1.00 1.00 0.08 1.00\n", "MSinceMostRecentDelq 1.00 1.00 1.00 0.08 1.00\n", "MaxDelq2PublicRecLast12M 1.00 1.00 1.00 0.08 0.42\n", "MaxDelqEver 1.00 1.00 1.00 0.08 1.00\n", "NumTotalTrades 0.85 0.13 0.20 0.28 0.06\n", "NumTradesOpeninLast12M 1.00 0.38 0.08 1.00 1.00\n", "PercentInstallTrades 0.86 0.31 0.86 0.61 0.07\n", "MSinceMostRecentInqexcl7days 0.80 0.80 0.80 0.10 0.80\n", "NumInqLast6M 0.83 0.86 0.10 0.83 0.86\n", "NumInqLast6Mexcl7days 0.83 0.86 0.10 0.83 0.86\n", "NetFractionRevolvingBurden 0.72 0.11 0.91 0.71 0.49\n", "NetFractionInstallBurden 0.97 0.90 0.11 0.42 0.11\n", "NumRevolvingTradesWBalance 0.34 0.58 1.00 0.20 0.34\n", "NumInstallTradesWBalance 0.43 0.43 0.43 0.43 0.04\n", "NumBank2NatlTradesWHighUtilization 0.36 0.36 0.36 0.36 0.13\n", "PercentTradesWBalance 0.85 0.34 0.74 0.34 0.20" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "z = z_train_bad[S, 0:-1] # Store the prototypes\n", "eps = 1e-10 # Small constant to guard against divide by zero errors\n", "fwt = np.zeros(z.shape)\n", "for i in range (z.shape[0]): # Compute feature similarity for each prototype\n", " for j in range(z.shape[1]):\n", " fwt[i, j] = np.exp(-1 * abs(X[0, j] - z[i,j])/(np.std(z[:, j])+eps))\n", " \n", "# move wts to a dataframe to display\n", "dfw = pd.DataFrame.from_records(np.around(fwt.astype('double'), 2))\n", "dfw.columns = df.columns[:-1]\n", "dfw.transpose() " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Explanation:\n", "Here again, the above table depicts the five 
closest user profiles to the chosen applicant. Based on the importance weights output by the method, we see that the prototype under column zero is the most representative user profile by far. This is (intuitively) confirmed by the feature similarity table above, where 10 out of 23 features of this prototype are highly similar (>0.9) to those of the user we want to explain. The bank employee can also see that the applicant belongs to a group of rejected applicants with similar delinquency behavior. Realizing that the user poses a risk similar to that of these other applicants whose loans were rejected, the employee takes the more conservative decision of rejecting the user's application as well." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 4. Customer: Contrastive explanations for HELOC Use Case\n", "\n", "We now demonstrate how to compute contrastive explanations using AIX360 and how such explanations can help homeowners understand the decisions made by AI models that approve or reject their HELOC applications. \n", "\n", "Typically, homeowners who do not qualify for a line of credit would like to understand why, and what changes to their application would qualify them. On the other hand, if they qualified, they might want to know what factors led to the approval of their application. \n", "\n", "In this context, contrastive explanations provide information to applicants about what minimal changes to their profile would have changed the decision of the AI model from reject to accept or vice versa (_pertinent negatives_). For example, increasing the number of satisfactory trades to a certain value may have led to the acceptance of the application, everything else being the same. \n", "\n", "The method presented here also highlights a minimal set of features and their values that would still maintain the original decision (_pertinent positives_). 
For example, for an applicant whose HELOC application was approved, the \n", "explanation may say that even if the number of satisfactory trades were reduced to a lower number, the loan would still have been approved.\n", "\n", "Additionally, organizations (banks, financial institutions, etc.) would like to understand trends in the behavior of their AI models in approving loan applications, which could be done by studying contrastive explanations for individuals whose loans were either accepted or rejected. By looking at the aggregate statistics of pertinent positives for approved applicants, the organization can gain insight into which minimal sets of features and values play an important role in acceptances. By studying the aggregate statistics of pertinent negatives, the organization can gain insight into features that could change the status of rejected applicants, and potentially uncover ways in which an applicant may game the system by changing relatively unimportant features that could alter the model's outcome. \n", "\n", "The contrastive explanations in AIX360 are implemented using the algorithm developed in the following work:\n", "###### [Explanations based on the Missing: Towards Contrastive Explanations with Pertinent Negatives](https://arxiv.org/abs/1802.07623)\n", "\n", "We now provide a brief overview of the method. As mentioned above, the algorithm outputs a contrastive explanation which consists of two parts: a) pertinent negatives (PNs) and b) pertinent positives (PPs). PNs identify a minimal set of features which, if altered, would change the classification of the original input. For example, in the loan case, if a person's credit score is increased, their loan application status may change from reject to accept. The method accomplishes this by optimizing a loss on the change in prediction probability while enforcing an elastic norm constraint that results in minimal change of features and their values. 
Optionally, an auto-encoder may also be used to force these minimal changes to produce realistic PNs. PPs, on the other hand, identify a minimal set of features and their values that are sufficient to yield the original input's classification. For example, an individual's loan may still be accepted if the salary were 50K as opposed to 100K. Here again we have an elastic norm term so that the amount of information needed is minimal; however, the first loss term in this case tries to make the original input's class the winning class. For a more in-depth discussion, please refer to the above work.\n", "\n", "\n", "The three main steps to obtain a contrastive explanation are shown below. The first two steps are more about processing the data and building an AI model, while the third step computes the actual explanation. \n", "\n", " [Step 1. Process and Normalize HELOC dataset for training](#c1)
\n", " [Step 2. Define and train a NN classifier](#c2)
\n", " [Step 3. Compute contrastive explanations for a few applicants](#c3)
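
To make the overview above a little more concrete, the following is a rough sketch of the kind of objective a pertinent-negative search minimizes: a hinge-style term that stays positive while the original class still wins, plus an elastic-net penalty on the perturbation. It is an illustration under stated assumptions, not the AIX360 implementation. The function name `pn_objective` and the parameters `c`, `beta` and `kappa` are hypothetical, and the optional auto-encoder term is omitted.

```python
import numpy as np

def pn_objective(scores, orig_class, delta, c=1.0, beta=0.1, kappa=0.0):
    # Hypothetical helper for illustration only (not part of AIX360).
    # scores: model outputs, one per class, evaluated at the perturbed input x0 + delta
    # orig_class: index of the class predicted for the unperturbed input x0
    # delta: candidate perturbation of the feature vector
    scores = np.asarray(scores, dtype=float)
    delta = np.asarray(delta, dtype=float)
    best_other = np.delete(scores, orig_class).max()
    # Hinge term: positive while the original class still beats every other class,
    # clipped from below at -kappa once another class wins by a margin of kappa.
    class_term = max(scores[orig_class] - best_other, -kappa)
    # Elastic-net penalty: the L1 part favors changing few features,
    # the squared L2 part keeps each change small.
    elastic = beta * np.abs(delta).sum() + np.square(delta).sum()
    return c * class_term + elastic

# The same perturbation scored under two hypothetical model outputs:
delta = np.array([0.5, 0.0, -0.5])
still_original = pn_objective([0.9, 0.1], orig_class=0, delta=delta)
class_flipped = pn_objective([0.2, 0.8], orig_class=0, delta=delta)
print(still_original > class_flipped)
```

Once the class has flipped, only the elastic-net penalty remains, so the search is pushed toward the smallest and sparsest change that alters the decision. For pertinent positives the class term would instead reward keeping the original class while the retained feature values are kept minimal.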
\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load HELOC dataset and show sample applicants" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/anaconda3/envs/aix360/lib/python3.6/site-packages/aix360/datasets/heloc_dataset.py:31: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame\n", "\n", "See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " df[col][df[col].isin([-7, -8, -9])] = 0\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Size of HELOC dataset: (10459, 24)\n", "Number of \"Good\" applicants: 5000\n", "Number of \"Bad\" applicants: 5459\n", "Sample Applicants:\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " 0 1 2 3 4 5 6 7 8 9\n", "ExternalRiskEstimate 55 61 67 66 81 59 54 68 59 61\n", "MSinceOldestTradeOpen 144 58 66 169 333 137 88 148 324 79\n", "MSinceMostRecentTradeOpen 4 15 5 1 27 11 7 7 2 4\n", "AverageMInFile 84 41 24 73 132 78 37 65 138 36\n", "NumSatisfactoryTrades 20 2 9 28 12 31 25 17 24 19\n", "NumTrades60Ever2DerogPubRec 3 4 0 1 0 0 0 0 0 0\n", "NumTrades90Ever2DerogPubRec 0 4 0 1 0 0 0 0 0 0\n", "PercentTradesNeverDelq 83 100 100 93 100 91 92 83 85 95\n", "MSinceMostRecentDelq 2 -7 -7 76 -7 1 9 31 5 5\n", "MaxDelq2PublicRecLast12M 3 0 7 6 7 4 4 6 4 4\n", "MaxDelqEver 5 8 8 6 8 6 6 6 6 6\n", "NumTotalTrades 23 7 9 30 12 32 26 18 27 19\n", "NumTradesOpeninLast12M 1 0 4 3 0 1 3 1 1 3\n", "PercentInstallTrades 43 67 44 57 25 47 58 44 26 26\n", "MSinceMostRecentInqexcl7days 0 0 0 0 0 0 0 0 0 0\n", "NumInqLast6M 0 0 4 5 1 0 4 0 1 6\n", "NumInqLast6Mexcl7days 0 0 4 4 1 0 4 0 1 6\n", "NetFractionRevolvingBurden 33 0 53 72 51 62 89 28 68 31\n", "NetFractionInstallBurden -8 -8 66 83 89 93 76 48 -8 86\n", "NumRevolvingTradesWBalance 8 0 4 6 3 12 7 2 7 5\n", "NumInstallTradesWBalance 1 -8 2 4 1 4 7 2 1 3\n", "NumBank2NatlTradesWHighUtilization 1 -8 1 3 0 3 2 2 3 1\n", "PercentTradesWBalance 69 0 86 91 80 94 100 40 90 62\n", "RiskPerformance Bad Bad Bad Bad Bad Bad Good Good Bad Bad" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "heloc = HELOCDataset()\n", "df = heloc.dataframe()\n", "pd.set_option('display.max_rows', 500)\n", "pd.set_option('display.max_columns', 24)\n", "pd.set_option('display.width', 1000)\n", "print(\"Size of HELOC dataset:\", df.shape)\n", "print(\"Number of \\\"Good\\\" applicants:\", np.sum(df['RiskPerformance']=='Good'))\n", "print(\"Number of \\\"Bad\\\" applicants:\", np.sum(df['RiskPerformance']=='Bad'))\n", "print(\"Sample Applicants:\")\n", "df.head(10).transpose()" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "name": "stdout", 
"output_type": "stream", "text": [ "Distribution of ExternalRiskEstimate and NumSatisfactoryTrades columns:\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Plot (example) distributions for two features\n", "print(\"Distribution of ExternalRiskEstimate and NumSatisfactoryTrades columns:\")\n", "hist = df.hist(column=['ExternalRiskEstimate', 'NumSatisfactoryTrades'], bins=10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Step 1. Process and Normalize HELOC dataset for training\n", "\n", "We will first process the HELOC dataset before using it to train an NN model that can predict the\n", "target variable RiskPerformance. The HELOC dataset is a tabular dataset with numerical values. However, some of the values are negative and need to be filtered. The processed data is stored in the file heloc.npz for easy access. The dataset is also normalized for training.\n", "\n", "The data processing and model building is very similar to the Loan Officer persona above, where ProtoDash was the method of choice. We repeat these steps here so that both the use cases can be run independently." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### a. Process the dataset" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [], "source": [ "# Clean data and split dataset into train/test\n", "PROCESS_DATA = False\n", "\n", "if (PROCESS_DATA): \n", " (Data, x_train, x_test, y_train_b, y_test_b) = heloc.split()\n", " np.savez('heloc.npz', Data=Data, x_train=x_train, x_test=x_test, y_train_b=y_train_b, y_test_b=y_test_b)\n", "else:\n", " heloc = np.load('heloc.npz', allow_pickle = True)\n", " Data = heloc['Data']\n", " x_train = heloc['x_train']\n", " x_test = heloc['x_test']\n", " y_train_b = heloc['y_train_b']\n", " y_test_b = heloc['y_test_b']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "#### b. 
Normalize the dataset" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [], "source": [ "Z = np.vstack((x_train, x_test))\n", "Zmax = np.max(Z, axis=0)\n", "Zmin = np.min(Z, axis=0)\n", "\n", "# Normalize an array of samples to the range [-0.5, 0.5]\n", "def normalize(V):\n", "    VN = (V - Zmin)/(Zmax - Zmin)\n", "    VN = VN - 0.5\n", "    return VN\n", "\n", "# Rescale normalized samples back to their original feature values\n", "def rescale(X):\n", "    return np.multiply(X + 0.5, (Zmax - Zmin)) + Zmin\n", "\n", "N = normalize(Z)\n", "xn_train = N[0:x_train.shape[0], :]\n", "xn_test = N[x_train.shape[0]:, :]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Step 2. Define and train an NN classifier\n", "\n", "Let us now build a loan approval model based on the HELOC dataset.\n", "\n", "#### a. Define NN architecture\n", "We now define the architecture of a 2-layer neural network classifier whose predictions we will try to interpret. " ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [], "source": [ "# Small NN that outputs logits (no softmax layer)\n", "def nn_small():\n", "    model = Sequential()\n", "    model.add(Dense(10, input_dim=23, kernel_initializer='normal', activation='relu'))\n", "    model.add(Dense(2, kernel_initializer='normal'))\n", "    return model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### b. 
Train the NN" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "_________________________________________________________________\n", "Layer (type) Output Shape Param # \n", "=================================================================\n", "dense_9 (Dense) (None, 10) 240 \n", "_________________________________________________________________\n", "dense_10 (Dense) (None, 2) 22 \n", "=================================================================\n", "Total params: 262\n", "Trainable params: 262\n", "Non-trainable params: 0\n", "_________________________________________________________________\n", "Train accuracy: 0.7387545589625827\n", "Test accuracy: 0.7224473257698542\n" ] } ], "source": [ "# Set random seeds for repeatability\n", "np.random.seed(1) \n", "tf.set_random_seed(2) \n", "\n", "class_names = ['Bad', 'Good']\n", "\n", "# loss function\n", "def fn(correct, predicted):\n", " return tf.nn.softmax_cross_entropy_with_logits(labels=correct, logits=predicted)\n", "\n", "# compile and print model summary\n", "nn = nn_small()\n", "nn.compile(loss=fn, optimizer='adam', metrics=['accuracy'])\n", "nn.summary()\n", "\n", "\n", "# train model or load a trained model\n", "TRAIN_MODEL = False\n", "\n", "if (TRAIN_MODEL): \n", " nn.fit(xn_train, y_train_b, batch_size=128, epochs=500, verbose=1, shuffle=False)\n", " nn.save_weights(\"heloc_nnsmall.h5\") \n", "else: \n", " nn.load_weights(\"heloc_nnsmall.h5\")\n", " \n", "\n", "# evaluate model accuracy \n", "score = nn.evaluate(xn_train, y_train_b, verbose=0) #Compute training set accuracy\n", "#print('Train loss:', score[0])\n", "print('Train accuracy:', score[1])\n", "\n", "score = nn.evaluate(xn_test, y_test_b, verbose=0) #Compute test set accuracy\n", "#print('Test loss:', score[0])\n", "print('Test accuracy:', score[1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Step 3. 
Compute contrastive explanations for a few applicants\n", "\n", "Given the trained NN model that decides on loan approvals, let us first examine an applicant whose application was denied and find the (minimal) changes to their application that would lead to approval (i.e. finding pertinent negatives). We will then look at another applicant whose loan was approved and ascertain the features that would minimally suffice for them to still receive a positive outcome (i.e. finding pertinent positives).\n", "\n", "#### a. Compute Pertinent Negatives (PN): \n", "\n", "In order to compute pertinent negatives, the CEM explainer computes a user profile that is close to the original applicant but for which the decision on the HELOC application is different. The explainer alters a minimal set of features by a minimal (positive) amount. This helps a user whose loan application was initially rejected, say, to ascertain how to get it accepted. " ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Computing PN for Sample: 1272\n", "Prediction made by the model: [[ 0.40682057 -0.391679 ]]\n", "Prediction probabilities: Bad\n", "\n", "iter:0 const:[10.]\n", "Loss_Overall:0.2935, Loss_Attack:0.0000\n", "Loss_L2Dist:0.2065, Loss_L1Dist:0.8703, AE_loss:0.0\n", "target_lab_score:-1.1559, max_nontarget_lab_score:1.3184\n", "\n", "iter:500 const:[10.]\n", "Loss_Overall:5.9870, Loss_Attack:5.9782\n", "Loss_L2Dist:0.0032, Loss_L1Dist:0.0563, AE_loss:0.0\n", "target_lab_score:0.2639, max_nontarget_lab_score:-0.2339\n", "\n", "iter:0 const:[5.]\n", "Loss_Overall:0.0668, Loss_Attack:0.0000\n", "Loss_L2Dist:0.0368, Loss_L1Dist:0.3000, AE_loss:0.0\n", "target_lab_score:-0.2295, max_nontarget_lab_score:0.3076\n", "\n", "iter:500 const:[5.]\n", "Loss_Overall:1.5487, Loss_Attack:1.5277\n", "Loss_L2Dist:0.0085, Loss_L1Dist:0.1243, AE_loss:0.0\n", "target_lab_score:0.1245, max_nontarget_lab_score:-0.0810\n", "\n", "iter:0 
const:[2.5]\n", "Loss_Overall:1.8033, Loss_Attack:1.7989\n", "Loss_L2Dist:0.0011, Loss_L1Dist:0.0335, AE_loss:0.0\n", "target_lab_score:0.3218, max_nontarget_lab_score:-0.2978\n", "\n", "iter:500 const:[2.5]\n", "Loss_Overall:2.2462, Loss_Attack:2.2462\n", "Loss_L2Dist:0.0000, Loss_L1Dist:0.0000, AE_loss:0.0\n", "target_lab_score:0.4068, max_nontarget_lab_score:-0.3917\n", "\n", "iter:0 const:[1.25]\n", "Loss_Overall:1.1231, Loss_Attack:1.1231\n", "Loss_L2Dist:0.0000, Loss_L1Dist:0.0000, AE_loss:0.0\n", "target_lab_score:0.4068, max_nontarget_lab_score:-0.3917\n", "\n", "iter:500 const:[1.25]\n", "Loss_Overall:1.1231, Loss_Attack:1.1231\n", "Loss_L2Dist:0.0000, Loss_L1Dist:0.0000, AE_loss:0.0\n", "target_lab_score:0.4068, max_nontarget_lab_score:-0.3917\n", "\n", "iter:0 const:[1.875]\n", "Loss_Overall:1.6834, Loss_Attack:1.6834\n", "Loss_L2Dist:0.0000, Loss_L1Dist:0.0001, AE_loss:0.0\n", "target_lab_score:0.4065, max_nontarget_lab_score:-0.3913\n", "\n", "iter:500 const:[1.875]\n", "Loss_Overall:1.6847, Loss_Attack:1.6847\n", "Loss_L2Dist:0.0000, Loss_L1Dist:0.0000, AE_loss:0.0\n", "target_lab_score:0.4068, max_nontarget_lab_score:-0.3917\n", "\n", "iter:0 const:[2.1875]\n", "Loss_Overall:1.7709, Loss_Attack:1.7690\n", "Loss_L2Dist:0.0003, Loss_L1Dist:0.0168, AE_loss:0.0\n", "target_lab_score:0.3641, max_nontarget_lab_score:-0.3445\n", "\n", "iter:500 const:[2.1875]\n", "Loss_Overall:1.9655, Loss_Attack:1.9655\n", "Loss_L2Dist:0.0000, Loss_L1Dist:0.0000, AE_loss:0.0\n", "target_lab_score:0.4068, max_nontarget_lab_score:-0.3917\n", "\n", "iter:0 const:[2.03125]\n", "Loss_Overall:1.7340, Loss_Attack:1.7331\n", "Loss_L2Dist:0.0001, Loss_L1Dist:0.0085, AE_loss:0.0\n", "target_lab_score:0.3853, max_nontarget_lab_score:-0.3679\n", "\n", "iter:500 const:[2.03125]\n", "Loss_Overall:1.8251, Loss_Attack:1.8251\n", "Loss_L2Dist:0.0000, Loss_L1Dist:0.0000, AE_loss:0.0\n", "target_lab_score:0.4068, max_nontarget_lab_score:-0.3917\n", "\n", "iter:0 const:[1.953125]\n", 
"Loss_Overall:1.7104, Loss_Attack:1.7100\n", "Loss_L2Dist:0.0000, Loss_L1Dist:0.0043, AE_loss:0.0\n", "target_lab_score:0.3959, max_nontarget_lab_score:-0.3796\n", "\n", "iter:500 const:[1.953125]\n", "Loss_Overall:1.7549, Loss_Attack:1.7549\n", "Loss_L2Dist:0.0000, Loss_L1Dist:0.0000, AE_loss:0.0\n", "target_lab_score:0.4068, max_nontarget_lab_score:-0.3917\n", "\n", "iter:0 const:[1.9921875]\n", "Loss_Overall:1.7227, Loss_Attack:1.7220\n", "Loss_L2Dist:0.0000, Loss_L1Dist:0.0064, AE_loss:0.0\n", "target_lab_score:0.3906, max_nontarget_lab_score:-0.3738\n", "\n", "iter:500 const:[1.9921875]\n", "Loss_Overall:1.7900, Loss_Attack:1.7900\n", "Loss_L2Dist:0.0000, Loss_L1Dist:0.0000, AE_loss:0.0\n", "target_lab_score:0.4068, max_nontarget_lab_score:-0.3917\n", "\n" ] } ], "source": [ "# Some interesting user samples to try: 2344 449 1168 1272\n", "idx = 1272\n", "\n", "X = xn_test[idx].reshape((1,) + xn_test[idx].shape)\n", "print(\"Computing PN for Sample:\", idx)\n", "print(\"Prediction made by the model:\", nn.predict_proba(X))\n", "print(\"Prediction probabilities:\", class_names[np.argmax(nn.predict_proba(X))])\n", "print(\"\")\n", "\n", "mymodel = KerasClassifier(nn)\n", "explainer = CEMExplainer(mymodel)\n", "\n", "arg_mode = 'PN' # Find pertinent negatives\n", "arg_max_iter = 1000 # Maximum number of iterations to search for the optimal PN for given parameter settings\n", "arg_init_const = 10.0 # Initial coefficient value for main loss term that encourages class change\n", "arg_b = 9 # No. 
of updates to the coefficient of the main loss term\n", "arg_kappa = 0.1 # Minimum confidence gap between the PN's (changed) class probability and the original class' probability\n", "arg_beta = 1e-1 # Controls sparsity of the solution (L1 loss)\n", "arg_gamma = 100 # Controls how much to adhere to an (optionally trained) auto-encoder\n", "my_AE_model = None # Pointer to an auto-encoder\n", "\n", "# Find PN for applicant 1272\n", "(adv_pn, delta_pn, info_pn) = explainer.explain_instance(X, arg_mode, my_AE_model, arg_kappa, arg_b,\n", "                                                         arg_max_iter, arg_init_const, arg_beta, arg_gamma)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us start by examining one particular loan application, that of applicant 1272, which was denied. We showcase below how the decision could have been different through minimal changes to the profile, as conveyed by the pertinent negative. We also indicate the importance of different features in producing the change in the application status. The column (X_PN - X) in the table below gives the delta, i.e. the deviation needed in each feature to produce this change. A human-friendly explanation based on these deviations is then provided following the feature importance plot." 
] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sample: 1272\n", "prediction(X) [[ 0.40682057 -0.391679 ]] Bad\n", "prediction(Xpn) [[-0.02118406 0.07892033]] Good\n" ] }, { "data": { "text/html": [ "\n", "
                                         X      X_PN    (X_PN - X)
ExternalRiskEstimate                    65     76.37         11.37
MSinceOldestTradeOpen                  256       256             0
MSinceMostRecentTradeOpen               15        15             0
AverageMInFile                          52     64.81         12.81
NumSatisfactoryTrades                   17      20.3           3.3
NumTrades60Ever2DerogPubRec              0         0             0
NumTrades90Ever2DerogPubRec              0         0             0
PercentTradesNeverDelq                 100       100             0
MSinceMostRecentDelq                     0         0             0
MaxDelq2PublicRecLast12M                 7         7             0
MaxDelqEver                              8         8             0
NumTotalTrades                          19        19             0
NumTradesOpeninLast12M                   0         0             0
PercentInstallTrades                    29        29             0
MSinceMostRecentInqexcl7days             2         2             0
NumInqLast6M                             5         5             0
NumInqLast6Mexcl7days                    5         5             0
NetFractionRevolvingBurden              57        57             0
NetFractionInstallBurden                79        79             0
NumRevolvingTradesWBalance               2         2             0
NumInstallTradesWBalance                 4         4             0
NumBank2NatlTradesWHighUtilization       2         2             0
PercentTradesWBalance                   60        60             0
RiskPerformance                        Bad      Good           NIL
" ], "text/plain": [ "" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Xpn = adv_pn\n", "classes = [ class_names[np.argmax(nn.predict_proba(X))], class_names[np.argmax(nn.predict_proba(Xpn))], 'NIL' ]\n", "\n", "print(\"Sample:\", idx)\n", "print(\"prediction(X)\", nn.predict_proba(X), class_names[np.argmax(nn.predict_proba(X))])\n", "print(\"prediction(Xpn)\", nn.predict_proba(Xpn), class_names[np.argmax(nn.predict_proba(Xpn))] )\n", "\n", "\n", "X_re = rescale(X) # Convert values back to original scale from normalized\n", "Xpn_re = rescale(Xpn)\n", "Xpn_re = np.around(Xpn_re.astype(np.double), 2)\n", "\n", "delta_re = Xpn_re - X_re\n", "delta_re = np.around(delta_re.astype(np.double), 2)\n", "delta_re[np.absolute(delta_re) < 1e-4] = 0\n", "\n", "X3 = np.vstack((X_re, Xpn_re, delta_re))\n", "\n", "dfre = pd.DataFrame.from_records(X3) # Create dataframe to display original point, PN and difference (delta)\n", "dfre[23] = classes\n", "\n", "dfre.columns = df.columns\n", "dfre.rename(index={0:'X',1:'X_PN', 2:'(X_PN - X)'}, inplace=True)\n", "dfret = dfre.transpose()\n", "\n", "\n", "# Highlight rows whose value in column `col` is numeric and positive\n", "def highlight_ce(s, col, ncols):\n", "    if (type(s[col]) != str):\n", "        if (s[col] > 0):\n", "            return(['background-color: yellow']*ncols)\n", "    return(['background-color: white']*ncols)\n", "\n", "dfret.style.apply(highlight_ce, col='(X_PN - X)', ncols=3, axis=1) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let us compute the importance of the different PN features that would be instrumental in applicant 1272 receiving a favorable outcome, and display it below." ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.rcdefaults()\n", "fi = abs((X-Xpn).astype('double'))/np.std(xn_train.astype('double'), axis=0) # Compute PN feature importance\n", "objects = df.columns[-2::-1] # Feature names in reverse order, excluding RiskPerformance\n", "y_pos = np.arange(len(objects))\n", "performance = fi[0, -1::-1]\n", "\n", "plt.barh(y_pos, performance, align='center', alpha=0.5) # Bar chart\n", "plt.yticks(y_pos, objects) # Display features on y-axis\n", "plt.xlabel('weight') # x-label\n", "plt.title('PN (feature importance)') # Heading\n", "\n", "plt.show() # Display PN feature importance" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Explanation: \n", "We observe that applicant 1272's loan application would have been accepted if the consolidated risk marker score (i.e. ExternalRiskEstimate) increased from 65 to about 76, the average months in file (i.e. AverageMInFile) increased from 52 to about 65, and the number of satisfactory trades (i.e. NumSatisfactoryTrades) increased from 17 to a little over 20.\n", "\n", "_The above changes to the three suggested factors are also intuitively consistent with improving the chances of acceptance of an application, since all three are monotonic with the probability of acceptance (refer to the HELOC description table). \n", "However, one must realize that the above explanation is for this particular applicant and is based on what the model would do; it does not necessarily have to agree with the features' intuitive meaning. In fact, if the explanation is deemed unacceptable, then it is an indication that perhaps the model should be debugged/updated._" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### b. Compute Pertinent Positives (PP):\n", "In order to compute pertinent positives, the CEM explainer identifies a minimal set of features along with their values (as close to 0 as possible) that would still maintain the predicted loan application status of the applicant." 
] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Computing PP for Sample: 8\n", "Prediction made by the model: Good\n", "Prediction probabilities: [[-0.1889221 0.29527372]]\n", "\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/anaconda3/envs/aix360/lib/python3.6/site-packages/keras/engine/sequential.py:247: UserWarning: Network returning invalid probability values. The last layer might not normalize predictions into probabilities (like softmax or sigmoid would).\n", " warnings.warn('Network returning invalid probability values. '\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "iter:0 const:[10.]\n", "Loss_Overall:8.1419, Loss_Attack:7.9649\n", "Loss_L2Dist:0.1243, Loss_L1Dist:0.5266, AE_loss:0.0\n", "target_lab_score:-0.3578, max_nontarget_lab_score:0.3387\n", "\n", "iter:500 const:[10.]\n", "Loss_Overall:0.3945, Loss_Attack:0.0000\n", "Loss_L2Dist:0.3318, Loss_L1Dist:0.6264, AE_loss:0.0\n", "target_lab_score:0.0991, max_nontarget_lab_score:-0.0737\n", "\n", "iter:0 const:[5.]\n", "Loss_Overall:8.4407, Loss_Attack:8.3992\n", "Loss_L2Dist:0.0216, Loss_L1Dist:0.1987, AE_loss:0.0\n", "target_lab_score:-0.8223, max_nontarget_lab_score:0.7575\n", "\n", "iter:500 const:[5.]\n", "Loss_Overall:4.1897, Loss_Attack:3.9453\n", "Loss_L2Dist:0.1997, Loss_L1Dist:0.4469, AE_loss:0.0\n", "target_lab_score:-0.3484, max_nontarget_lab_score:0.3406\n", "\n", "iter:0 const:[2.5]\n", "Loss_Overall:6.1030, Loss_Attack:6.1013\n", "Loss_L2Dist:0.0002, Loss_L1Dist:0.0149, AE_loss:0.0\n", "target_lab_score:-1.2149, max_nontarget_lab_score:1.1256\n", "\n", "iter:500 const:[2.5]\n", "Loss_Overall:6.2723, Loss_Attack:6.2723\n", "Loss_L2Dist:0.0000, Loss_L1Dist:0.0000, AE_loss:0.0\n", "target_lab_score:-1.2499, max_nontarget_lab_score:1.1590\n", "\n", "iter:0 const:[3.75]\n", "Loss_Overall:7.8400, Loss_Attack:7.8242\n", "Loss_L2Dist:0.0059, Loss_L1Dist:0.0990, AE_loss:0.0\n", 
"target_lab_score:-1.0324, max_nontarget_lab_score:0.9541\n", "\n", "iter:500 const:[3.75]\n", "Loss_Overall:5.8230, Loss_Attack:5.7570\n", "Loss_L2Dist:0.0449, Loss_L1Dist:0.2119, AE_loss:0.0\n", "target_lab_score:-0.7511, max_nontarget_lab_score:0.6841\n", "\n", "iter:0 const:[4.375]\n", "Loss_Overall:8.2661, Loss_Attack:8.2388\n", "Loss_L2Dist:0.0125, Loss_L1Dist:0.1488, AE_loss:0.0\n", "target_lab_score:-0.9274, max_nontarget_lab_score:0.8558\n", "\n", "iter:500 const:[4.375]\n", "Loss_Overall:6.5857, Loss_Attack:6.5105\n", "Loss_L2Dist:0.0523, Loss_L1Dist:0.2288, AE_loss:0.0\n", "target_lab_score:-0.7263, max_nontarget_lab_score:0.6618\n", "\n", "iter:0 const:[4.0625]\n", "Loss_Overall:8.0845, Loss_Attack:8.0632\n", "Loss_L2Dist:0.0089, Loss_L1Dist:0.1239, AE_loss:0.0\n", "target_lab_score:-0.9799, max_nontarget_lab_score:0.9049\n", "\n", "iter:500 const:[4.0625]\n", "Loss_Overall:6.6793, Loss_Attack:6.6236\n", "Loss_L2Dist:0.0365, Loss_L1Dist:0.1912, AE_loss:0.0\n", "target_lab_score:-0.7999, max_nontarget_lab_score:0.7306\n", "\n", "iter:0 const:[3.90625]\n", "Loss_Overall:7.9701, Loss_Attack:7.9516\n", "Loss_L2Dist:0.0073, Loss_L1Dist:0.1115, AE_loss:0.0\n", "target_lab_score:-1.0061, max_nontarget_lab_score:0.9295\n", "\n", "iter:500 const:[3.90625]\n", "Loss_Overall:6.2569, Loss_Attack:6.1965\n", "Loss_L2Dist:0.0403, Loss_L1Dist:0.2008, AE_loss:0.0\n", "target_lab_score:-0.7772, max_nontarget_lab_score:0.7091\n", "\n", "iter:0 const:[3.828125]\n", "Loss_Overall:7.9070, Loss_Attack:7.8899\n", "Loss_L2Dist:0.0066, Loss_L1Dist:0.1052, AE_loss:0.0\n", "target_lab_score:-1.0193, max_nontarget_lab_score:0.9418\n", "\n", "iter:500 const:[3.828125]\n", "Loss_Overall:6.3022, Loss_Attack:6.2467\n", "Loss_L2Dist:0.0364, Loss_L1Dist:0.1909, AE_loss:0.0\n", "target_lab_score:-0.8005, max_nontarget_lab_score:0.7312\n", "\n", "iter:0 const:[3.8671875]\n", "Loss_Overall:7.9391, Loss_Attack:7.9213\n", "Loss_L2Dist:0.0070, Loss_L1Dist:0.1083, AE_loss:0.0\n", 
"target_lab_score:-1.0127, max_nontarget_lab_score:0.9356\n", "\n", "iter:500 const:[3.8671875]\n", "Loss_Overall:6.0266, Loss_Attack:5.9612\n", "Loss_L2Dist:0.0443, Loss_L1Dist:0.2105, AE_loss:0.0\n", "target_lab_score:-0.7543, max_nontarget_lab_score:0.6872\n", "\n" ] } ], "source": [ "# Some interesting user samples to try: 8 9 11\n", "idx = 8\n", "\n", "X = xn_test[idx].reshape((1,) + xn_test[idx].shape)\n", "print(\"Computing PP for Sample:\", idx)\n", "print(\"Prediction made by the model:\", class_names[np.argmax(nn.predict_proba(X))])\n", "print(\"Prediction probabilities:\", nn.predict_proba(X))\n", "print(\"\")\n", "\n", "\n", "mymodel = KerasClassifier(nn)\n", "explainer = CEMExplainer(mymodel)\n", "\n", "arg_mode = 'PP' # Find pertinent positives\n", "arg_max_iter = 1000 # Maximum number of iterations to search for the optimal PP for given parameter settings\n", "arg_init_const = 10.0 # Initial coefficient value for the main loss term (for PP, this term encourages keeping the original class)\n", "arg_b = 9 # No. of updates to the coefficient of the main loss term\n", "arg_kappa = 0.1 # Minimum confidence gap between the original class' probability and the other class' probability for the PP\n", "arg_beta = 1e-1 # Controls sparsity of the solution (L1 loss)\n", "arg_gamma = 100 # Controls how much to adhere to an (optionally trained) auto-encoder\n", "my_AE_model = None # Pointer to an auto-encoder\n", "\n", "# Find PP for applicant 8\n", "(adv_pp, delta_pp, info_pp) = explainer.explain_instance(X, arg_mode, my_AE_model, arg_kappa, arg_b,\n", "                                                         arg_max_iter, arg_init_const, arg_beta, arg_gamma)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the pertinent positives, we look at a different applicant, applicant 8, whose loan application was approved. We want to ascertain here what minimal values for this profile would still have led to acceptance. Below, we showcase the pertinent positive as well as the features that are important in maintaining the approved status. 
The 0s in the X_PP column indicate that those features were not important. Here too, we provide a human-friendly explanation following the feature importance plot." ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "PP for Sample: 8\n", "Prediction(Xpp) : Good\n", "Prediction probabilities for Xpp: [[-0.09004497 0.11049862]]\n", "\n" ] }, { "data": { "text/html": [ "\n", "
                                         X      X_PP
ExternalRiskEstimate                    82     37.65
MSinceOldestTradeOpen                  280         0
MSinceMostRecentTradeOpen               13         0
AverageMInFile                         102     73.67
NumSatisfactoryTrades                   22     11.49
NumTrades60Ever2DerogPubRec              0         0
NumTrades90Ever2DerogPubRec              0         0
PercentTradesNeverDelq                  91         0
MSinceMostRecentDelq                    26         0
MaxDelq2PublicRecLast12M                 6         0
MaxDelqEver                              6         0
NumTotalTrades                          23         0
NumTradesOpeninLast12M                   0         0
PercentInstallTrades                     9         0
MSinceMostRecentInqexcl7days             0         0
NumInqLast6M                             0         0
NumInqLast6Mexcl7days                    0         0
NetFractionRevolvingBurden               3         0
NetFractionInstallBurden                 0         0
NumRevolvingTradesWBalance               4         0
NumInstallTradesWBalance                 1         0
NumBank2NatlTradesWHighUtilization       1         0
PercentTradesWBalance                   42         0
RiskPerformance                       Good      Good
" ], "text/plain": [ "" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Xpp = delta_pp\n", "classes = [ class_names[np.argmax(nn.predict_proba(X))], class_names[np.argmax(nn.predict_proba(Xpp))]]\n", "\n", "print(\"PP for Sample:\", idx)\n", "print(\"Prediction(Xpp) :\", class_names[np.argmax(nn.predict_proba(Xpp))])\n", "print(\"Prediction probabilities for Xpp:\", nn.predict_proba(Xpp))\n", "print(\"\")\n", "\n", "X_re = rescale(X) # Convert values back to original scale from normalized\n", "adv_pp_re = rescale(adv_pp)\n", "Xpp_re = X_re - adv_pp_re\n", "Xpp_re = np.around(Xpp_re.astype(np.double), 2)\n", "Xpp_re[Xpp_re < 1e-4] = 0\n", "\n", "X2 = np.vstack((X_re, Xpp_re))\n", "\n", "dfpp = pd.DataFrame.from_records(X2.astype('double')) # Showcase a dataframe for the original point and PP\n", "dfpp[23] = classes\n", "dfpp.columns = df.columns\n", "dfpp.rename(index={0:'X',1:'X_PP'}, inplace=True)\n", "dfppt = dfpp.transpose()\n", "\n", "dfppt.style.apply(highlight_ce, col='X_PP', ncols=2, axis=1) " ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.rcdefaults()\n", "fi = abs(Xpp_re.astype('double'))/np.std(x_train.astype('double'), axis=0) # Compute PP feature importance\n", "\n", "objects = df.columns[-2::-1] # Feature names in reverse order, excluding RiskPerformance\n", "y_pos = np.arange(len(objects)) # Bar positions\n", "performance = fi[0, -1::-1]\n", "\n", "plt.barh(y_pos, performance, align='center', alpha=0.5) # Bar chart\n", "plt.yticks(y_pos, objects) # Plot feature names on y-axis\n", "plt.xlabel('weight') # x-label\n", "plt.title('PP (feature importance)') # Figure heading\n", "\n", "plt.show() # Display the feature importance" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Explanation: \n", "We observe that applicant 8's loan application would still have been accepted even if the consolidated risk marker score (i.e. ExternalRiskEstimate) reduced from 82 to about 38, the average months in file (i.e. AverageMInFile) reduced from 102 to about 74, and the number of satisfactory trades (i.e. NumSatisfactoryTrades) reduced from 22 to about 11.\n", "\n", "_Note that explanations may change a bit based on equivalent values in a local minimum._" ] } ], "metadata": { "celltoolbar": "Edit Metadata", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.1" } }, "nbformat": 4, "nbformat_minor": 2 }