{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# ASE2021 Hands-on Exercise\n", "\n", "Below are interactive hands-on exercises for model-agnostic techniques for generating local explanations.\n", "First, we need to load necesarry libraries as well as preparing datasets.\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "## Load Data and preparing datasets\n", "\n", "# Import for Load Data\n", "from os import listdir\n", "from os.path import isfile, join\n", "import pandas as pd\n", "\n", "# Import for Split Data into Training and Testing Samples\n", "from sklearn.model_selection import train_test_split\n", "\n", "train_dataset = pd.read_csv((\"../../datasets/lucene-2.9.0.csv\"), index_col = 'File')\n", "test_dataset = pd.read_csv((\"../../datasets/lucene-3.0.0.csv\"), index_col = 'File')\n", "\n", "outcome = 'RealBug'\n", "features = ['OWN_COMMIT', 'Added_lines', 'CountClassCoupled', 'AvgLine', 'RatioCommentToCode']\n", "# commits - # of commits that modify the file of interest\n", "# Added lines - # of added lines of code\n", "# Count class coupled - # of classes that interact or couple with the class of interest\n", "# LOC - # of lines of code\n", "# RatioCommentToCode - The ratio of lines of comments to lines of code\n", "\n", "# process outcome to 0 and 1\n", "train_dataset[outcome] = pd.Categorical(train_dataset[outcome])\n", "train_dataset[outcome] = train_dataset[outcome].cat.codes\n", "\n", "test_dataset[outcome] = pd.Categorical(test_dataset[outcome])\n", "test_dataset[outcome] = test_dataset[outcome].cat.codes\n", "\n", "X_train = train_dataset.loc[:, features]\n", "X_test = test_dataset.loc[:, features]\n", "\n", "y_train = train_dataset.loc[:, outcome]\n", "y_test = test_dataset.loc[:, outcome]\n", "\n", "class_labels = ['Clean', 'Defective']\n", "\n", "X_train.columns = features\n", "X_test.columns = features\n", "training_data = pd.concat([X_train, y_train], axis=1)\n", "testing_data = pd.concat([X_test, y_test], axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, we construct a Random Forests model as a predictive model to be explained.\n", "\n", "**(1) Please construct a Random Forests model using the code cell below.**\n", "\n", "\n", "`````{admonition} Tips\n", ":class: tip\n", "````\n", "\n", "our_rf_model = RandomForestClassifier(random_state=0)\n", "our_rf_model.fit(X_train, y_train) \n", "\n", "````\n", "`````" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from sklearn.ensemble import RandomForestClassifier\n", "\n", "# Please fit your Random Forests model here!\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## LIME\n", "\n", "**LIME** (i.e., Local Interpretable Model-agnostic\n", "Explanations) {cite}`ribeiro2016should` is a model-agnostic technique that\n", "mimics the behaviour of the black-box model to generate the explanations\n", "of the predictions of the black-box model. 
{ "cell_type": "markdown", "metadata": {}, "source": [
"## LIME\n",
"\n",
"**LIME** (i.e., Local Interpretable Model-agnostic Explanations) {cite}`ribeiro2016should` is a model-agnostic technique that mimics the behaviour of the black-box model to generate explanations of the predictions of the black-box model.\n",
"Given a black-box model and an instance to explain, LIME performs four key steps to generate an instance explanation as follows:\n",
"\n",
"- First, LIME randomly generates instances surrounding the instance of interest.\n",
"\n",
"- Second, LIME uses the black-box model to generate predictions of the generated random instances.\n",
"\n",
"- Third, LIME constructs a local regression model using the generated random instances and their generated predictions from the black-box model.\n",
"\n",
"- Finally, the coefficients of the regression model indicate the contribution of each metric to the prediction of the instance of interest according to the black-box model.\n",
"\n",
"**(2) Please use LIME to explain the prediction of *DocumentsWriter.java* that is generated from your Random Forests model.**\n",
"\n",
"`````{admonition} Tips\n",
":class: tip\n",
"````\n",
"\n",
"# LIME Step 1 - Construct an explainer\n",
"our_lime_explainer = lime.lime_tabular.LimeTabularExplainer(\n",
"    training_data = X_train.values,\n",
"    mode = 'classification',\n",
"    training_labels = y_train,\n",
"    feature_names = features,\n",
"    class_names = class_labels,\n",
"    discretize_continuous = True)\n",
"\n",
"# LIME Step 2 - Use the constructed explainer with the predict function\n",
"# of your predictive model to explain any instance\n",
"lime_local_explanation_of_an_instance = our_lime_explainer.explain_instance(\n",
"    data_row = X_test.loc['FileName.java', :],\n",
"    predict_fn = our_rf_model.predict_proba,\n",
"    num_features = 5,\n",
"    top_labels = 1)\n",
"\n",
"# Please use the code below to visualise the generated LIME explanation.\n",
"lime_local_explanation_of_an_instance.show_in_notebook()\n",
"\n",
"````\n",
"`````"
] },
{ "cell_type": "code", "execution_count": 3, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Explaining src/java/org/apache/lucene/index/DocumentsWriter.java with LIME\n" ] } ], "source": [
"# Import for LIME\n",
"import lime\n",
"import lime.lime_tabular\n",
"\n",
"file_to_be_explained = 'src/java/org/apache/lucene/index/DocumentsWriter.java'\n",
"\n",
"print(f'Explaining {file_to_be_explained} with LIME')\n",
"\n",
"# LIME Step 1 - Construct an explainer\n",
"\n",
"\n",
"# LIME Step 2 - Use the constructed explainer with the predict function of your predictive model to explain any instance\n",
"\n",
"\n",
"# visualise the generated LIME explanation\n"
] },
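{ "cell_type": "markdown", "metadata": {}, "source": [
"Below is a sample solution for exercise (2), mirroring the tip above with the placeholder file name replaced by *DocumentsWriter.java*. Note that LIME relies on random sampling, so the exact explanation weights may vary slightly across runs.\n"
] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"# LIME Step 1 - Construct an explainer from the training data\n",
"our_lime_explainer = lime.lime_tabular.LimeTabularExplainer(\n",
"    training_data = X_train.values,\n",
"    mode = 'classification',\n",
"    training_labels = y_train,\n",
"    feature_names = features,\n",
"    class_names = class_labels,\n",
"    discretize_continuous = True)\n",
"\n",
"# LIME Step 2 - Explain the prediction of DocumentsWriter.java\n",
"lime_local_explanation_of_an_instance = our_lime_explainer.explain_instance(\n",
"    data_row = X_test.loc[file_to_be_explained, :],\n",
"    predict_fn = our_rf_model.predict_proba,\n",
"    num_features = 5,\n",
"    top_labels = 1)\n",
"\n",
"# Visualise the generated LIME explanation\n",
"lime_local_explanation_of_an_instance.show_in_notebook()"
] },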
{ "cell_type": "markdown", "metadata": {}, "source": [
"## SHAP\n",
"\n",
"**SHAP** (Shapley values) {cite}`lundberg2018consistentshap` is a model-agnostic technique that generates explanations of the predictions of the black-box model based on game theory.\n",
"\n",
"**(3) Please use SHAP to explain the prediction of *DocumentsWriter.java* that is generated from your Random Forests model.**\n",
"\n",
"`````{admonition} Tips\n",
":class: tip\n",
"````\n",
"\n",
"# SHAP Step 1 - Construct an explainer with the predict function\n",
"# of your predictive model\n",
"our_shap_explainer = shap.KernelExplainer(our_rf_model.predict, X_test)\n",
"\n",
"# SHAP Step 2 - Generate the SHAP explanation of an instance to be explained\n",
"shap_explanations_of_an_instance = our_shap_explainer.shap_values(X_test.iloc[file_to_be_explained_idx, :])\n",
"\n",
"# Please use the code below to visualise the generated SHAP explanation (Force plot).\n",
"shap.initjs()\n",
"shap.force_plot(our_shap_explainer.expected_value,\n",
"    shap_explanations_of_an_instance,\n",
"    X_test.iloc[file_to_be_explained_idx, :])\n",
"\n",
"````\n",
"`````"
] },
{ "cell_type": "code", "execution_count": 4, "metadata": { "tags": [] }, "outputs": [], "source": [
"# Import for SHAP\n",
"import shap\n",
"\n",
"file_to_be_explained = 'src/java/org/apache/lucene/index/DocumentsWriter.java'\n",
"file_to_be_explained_idx = list(X_test.index).index(file_to_be_explained)\n",
"\n",
"# SHAP Step 1 - Construct an explainer with the predict function\n",
"\n",
"\n",
"# SHAP Step 2 - Generate the SHAP explanation of an instance to be explained\n",
"\n",
"\n",
"# visualise the generated SHAP explanation\n"
] },
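{ "cell_type": "markdown", "metadata": {}, "source": [
"Below is a sample solution for exercise (3), following the tip above. Note that `KernelExplainer` repeatedly queries the model on perturbed instances, so using the whole `X_test` as background data can be slow; subsampling the background data (e.g., with `shap.sample`) is a common speed-up, although we keep the tip's setup here.\n"
] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"# SHAP Step 1 - Construct a KernelExplainer with the predict function of the model\n",
"our_shap_explainer = shap.KernelExplainer(our_rf_model.predict, X_test)\n",
"\n",
"# SHAP Step 2 - Generate the SHAP explanation of the instance to be explained\n",
"shap_explanations_of_an_instance = our_shap_explainer.shap_values(X_test.iloc[file_to_be_explained_idx, :])\n",
"\n",
"# Visualise the generated SHAP explanation (force plot)\n",
"shap.initjs()\n",
"shap.force_plot(our_shap_explainer.expected_value,\n",
"    shap_explanations_of_an_instance,\n",
"    X_test.iloc[file_to_be_explained_idx, :])"
] },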
{ "cell_type": "markdown", "metadata": {}, "source": [
"## PyExplainer\n",
"\n",
"**PyExplainer** {cite}`pornprasit2021pyexplainer` is a rule-based model-agnostic technique that utilises a local rule-based regression model to learn the associations between the characteristics of the synthetic instances and the predictions from the black-box model. Given a black-box model and an instance to explain, PyExplainer performs four key steps to generate an instance explanation as follows:\n",
"\n",
"- First, PyExplainer generates synthetic neighbours around the instance to be explained using the crossover and mutation techniques.\n",
"\n",
"- Second, PyExplainer obtains the predictions of the synthetic neighbours from the black-box model.\n",
"\n",
"- Third, PyExplainer builds a local rule-based regression model.\n",
"\n",
"- Finally, PyExplainer generates an explanation from the local model for the instance to be explained.\n",
"\n",
"**(4) Please use PyExplainer to explain the prediction of *DocumentsWriter.java* that is generated from your Random Forests model.**\n",
"\n",
"`````{admonition} Tips\n",
":class: tip\n",
"````\n",
"import numpy as np\n",
"np.random.seed(0)\n",
"\n",
"# PyExplainer Step 1 - Construct a PyExplainer\n",
"our_pyexplainer = PyExplainer(X_train = X_train,\n",
"    y_train = y_train,\n",
"    indep = X_train.columns,\n",
"    dep = outcome,\n",
"    blackbox_model = our_rf_model)\n",
"\n",
"# PyExplainer Step 2 - Generate the rule-based explanation of an instance to be explained\n",
"pyexplainer_explanation_of_an_instance = our_pyexplainer.explain(\n",
"    X_explain = X_test.loc[file_to_be_explained,:].to_frame().transpose(),\n",
"    y_explain = pd.Series(bool(y_test.loc[file_to_be_explained]),\n",
"        index = [file_to_be_explained],\n",
"        name = outcome),\n",
"    search_function = 'crossoverinterpolation',\n",
"    max_iter=1000,\n",
"    max_rules=20,\n",
"    random_state=0,\n",
"    reuse_local_model=True)\n",
"\n",
"# Please use the code below to visualise the generated PyExplainer explanation (What-If interactive visualisation).\n",
"our_pyexplainer.visualise(pyexplainer_explanation_of_an_instance, title=\"Why is this file defect-introducing?\")\n",
"\n",
"````\n",
"`````"
] },
{ "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [
"# Import for PyExplainer\n",
"from pyexplainer.pyexplainer_pyexplainer import PyExplainer\n",
"\n",
"file_to_be_explained = 'src/java/org/apache/lucene/index/DocumentsWriter.java'\n",
"\n",
"# PyExplainer Step 1 - Construct a PyExplainer\n",
"\n",
"\n",
"# PyExplainer Step 2 - Generate the rule-based explanation of an instance to be explained\n",
"\n",
"\n",
"# visualise the generated rule-based PyExplainer explanation\n"
] },
{ "cell_type": "markdown", "metadata": {}, "source": [
"All of the above explanations are property-contrast explanations within a file (see https://xai4se.github.io/xai/theory-of-explanations.html).\n",
"In fact, model-agnostic techniques can also be used to generate other types of explanations, e.g., object-contrast explanations (i.e., the differences between the explanations of two objects).\n",
"\n",
"**(5) Please use LIME to generate the object-contrast explanations between *DocumentsWriter.java* and *TestStringIntern.java*.**\n"
] },
{ "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Generating the object-contrast explanations between src/java/org/apache/lucene/index/DocumentsWriter.java and src/test/org/apache/lucene/util/TestStringIntern.java with LIME\n" ] } ], "source": [
"# Import for LIME\n",
"import lime\n",
"import lime.lime_tabular\n",
"\n",
"file_to_be_explained = 'src/java/org/apache/lucene/index/DocumentsWriter.java'\n",
"another_file_to_be_explained = 'src/test/org/apache/lucene/util/TestStringIntern.java'\n",
"\n",
"print(f'Generating the object-contrast explanations between {file_to_be_explained} and {another_file_to_be_explained} with LIME')\n",
"\n",
"# LIME Step 1 - Construct an explainer\n",
"\n",
"\n",
"# LIME Step 2 - Use the constructed explainer with the predict function of your predictive model to explain the two instances\n",
"\n",
"\n",
"# visualise the generated LIME explanation - (DocumentsWriter.java)\n"
] },
{ "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [
"# visualise the generated LIME explanation - (TestStringIntern.java)\n"
] }
], "metadata": { "kernelspec": { "display_name": "xaitools", "language": "python", "name": "xaitools" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.2" } }, "nbformat": 4, "nbformat_minor": 4 }