{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# ASE2021 Hands-on Exercise\n", "\n", "Below are interactive hands-on exercises for model-agnostic techniques for generating local explanations.\n", "First, we need to load necesarry libraries as well as preparing datasets.\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "## Load Data and preparing datasets\n", "\n", "# Import for Load Data\n", "from os import listdir\n", "from os.path import isfile, join\n", "import pandas as pd\n", "\n", "# Import for Split Data into Training and Testing Samples\n", "from sklearn.model_selection import train_test_split\n", "\n", "train_dataset = pd.read_csv((\"../../datasets/lucene-2.9.0.csv\"), index_col = 'File')\n", "test_dataset = pd.read_csv((\"../../datasets/lucene-3.0.0.csv\"), index_col = 'File')\n", "\n", "outcome = 'RealBug'\n", "features = ['OWN_COMMIT', 'Added_lines', 'CountClassCoupled', 'AvgLine', 'RatioCommentToCode']\n", "# commits - # of commits that modify the file of interest\n", "# Added lines - # of added lines of code\n", "# Count class coupled - # of classes that interact or couple with the class of interest\n", "# LOC - # of lines of code\n", "# RatioCommentToCode - The ratio of lines of comments to lines of code\n", "\n", "# process outcome to 0 and 1\n", "train_dataset[outcome] = pd.Categorical(train_dataset[outcome])\n", "train_dataset[outcome] = train_dataset[outcome].cat.codes\n", "\n", "test_dataset[outcome] = pd.Categorical(test_dataset[outcome])\n", "test_dataset[outcome] = test_dataset[outcome].cat.codes\n", "\n", "X_train = train_dataset.loc[:, features]\n", "X_test = test_dataset.loc[:, features]\n", "\n", "y_train = train_dataset.loc[:, outcome]\n", "y_test = test_dataset.loc[:, outcome]\n", "\n", "class_labels = ['Clean', 'Defective']\n", "\n", "X_train.columns = features\n", "X_test.columns = features\n", "training_data = pd.concat([X_train, y_train], axis=1)\n", "testing_data = pd.concat([X_test, y_test], axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, we construct a Random Forests model as a predictive model to be explained.\n", "\n", "**(1) Please construct a Random Forests model using the code cell below.**\n", "\n", "\n", "`````{admonition} Tips\n", ":class: tip\n", "````\n", "\n", "our_rf_model = RandomForestClassifier(random_state=0)\n", "our_rf_model.fit(X_train, y_train) \n", "\n", "````\n", "`````" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from sklearn.ensemble import RandomForestClassifier\n", "\n", "# Please fit your Random Forests model here!\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## LIME\n", "\n", "**LIME** (i.e., Local Interpretable Model-agnostic\n", "Explanations) {cite}`ribeiro2016should` is a model-agnostic technique that\n", "mimics the behaviour of the black-box model to generate the explanations\n", "of the predictions of the black-box model. 
{ "cell_type": "markdown", "metadata": {}, "source": [
"## LIME\n",
"\n",
"**LIME** (i.e., Local Interpretable Model-agnostic Explanations) {cite}`ribeiro2016should` is a model-agnostic technique that mimics the behaviour of the black-box model to generate explanations of the predictions of the black-box model.\n",
"Given a black-box model and an instance to explain, LIME performs four key steps to generate an instance explanation as follows:\n",
"\n",
"- First, LIME randomly generates instances surrounding the instance of interest.\n",
"\n",
"- Second, LIME uses the black-box model to generate predictions of the generated random instances.\n",
"\n",
"- Third, LIME constructs a local regression model using the generated random instances and their generated predictions from the black-box model.\n",
"\n",
"- Finally, the coefficients of the regression model indicate the contribution of each metric to the prediction of the instance of interest according to the black-box model.\n",
"\n",
"**(2) Please use LIME to explain the prediction of *DocumentsWriter.java* that is generated from your Random Forests model.**\n",
"\n",
"`````{admonition} Tips\n",
":class: tip\n",
"````\n",
"\n",
"# LIME Step 1 - Construct an explainer\n",
"our_lime_explainer = lime.lime_tabular.LimeTabularExplainer(\n",
"    training_data = X_train.values,\n",
"    mode = 'classification',\n",
"    training_labels = y_train,\n",
"    feature_names = features,\n",
"    class_names = class_labels,\n",
"    discretize_continuous = True)\n",
"\n",
"# LIME Step 2 - Use the constructed explainer with the predict function\n",
"# of your predictive model to explain any instance\n",
"lime_local_explanation_of_an_instance = our_lime_explainer.explain_instance(\n",
"    data_row = X_test.loc['FileName.java', :],\n",
"    predict_fn = our_rf_model.predict_proba,\n",
"    num_features = 5,\n",
"    top_labels = 1)\n",
"\n",
"# Please use the code below to visualise the generated LIME explanation.\n",
"lime_local_explanation_of_an_instance.show_in_notebook()\n",
"\n",
"````\n",
"`````"
] },
{ "cell_type": "code", "execution_count": 3, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Explaining src/java/org/apache/lucene/index/DocumentsWriter.java with LIME\n" ] } ], "source": [
"# Import for LIME\n",
"import lime\n",
"import lime.lime_tabular\n",
"\n",
"file_to_be_explained = 'src/java/org/apache/lucene/index/DocumentsWriter.java'\n",
"\n",
"print(f'Explaining {file_to_be_explained} with LIME')\n",
"\n",
"# LIME Step 1 - Construct an explainer\n",
"\n",
"\n",
"# LIME Step 2 - Use the constructed explainer with the predict function of your predictive model to explain any instance\n",
"\n",
"\n",
"# visualise the generated LIME explanation\n"
] },
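{ "cell_type": "markdown", "metadata": {}, "source": [
"Below is a sample solution for exercise (2), mirroring the tip above with the placeholder file name replaced by *DocumentsWriter.java*. Note that LIME relies on random sampling, so the exact explanation weights may vary slightly across runs.\n"
] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"# LIME Step 1 - Construct an explainer from the training data\n",
"our_lime_explainer = lime.lime_tabular.LimeTabularExplainer(\n",
"    training_data = X_train.values,\n",
"    mode = 'classification',\n",
"    training_labels = y_train,\n",
"    feature_names = features,\n",
"    class_names = class_labels,\n",
"    discretize_continuous = True)\n",
"\n",
"# LIME Step 2 - Explain the prediction of DocumentsWriter.java\n",
"lime_local_explanation_of_an_instance = our_lime_explainer.explain_instance(\n",
"    data_row = X_test.loc[file_to_be_explained, :],\n",
"    predict_fn = our_rf_model.predict_proba,\n",
"    num_features = 5,\n",
"    top_labels = 1)\n",
"\n",
"# Visualise the generated LIME explanation\n",
"lime_local_explanation_of_an_instance.show_in_notebook()"
] },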
{ "cell_type": "markdown", "metadata": {}, "source": [
"## SHAP\n",
"\n",
"**SHAP** (Shapley values) {cite}`lundberg2018consistentshap` is a model-agnostic technique that generates explanations of the predictions of the black-box model based on game theory.\n",
"\n",
"**(3) Please use SHAP to explain the prediction of *DocumentsWriter.java* that is generated from your Random Forests model.**\n",
"\n",
"`````{admonition} Tips\n",
":class: tip\n",
"````\n",
"\n",
"# SHAP Step 1 - Construct an explainer with the predict function\n",
"# of your predictive model\n",
"our_shap_explainer = shap.KernelExplainer(our_rf_model.predict, X_test)\n",
"\n",
"# SHAP Step 2 - Generate the SHAP explanation of an instance to be explained\n",
"shap_explanations_of_an_instance = our_shap_explainer.shap_values(X_test.iloc[file_to_be_explained_idx, :])\n",
"\n",
"# Please use the code below to visualise the generated SHAP explanation (Force plot).\n",
"shap.initjs()\n",
"shap.force_plot(our_shap_explainer.expected_value,\n",
"    shap_explanations_of_an_instance,\n",
"    X_test.iloc[file_to_be_explained_idx, :])\n",
"\n",
"````\n",
"`````"
] },
{ "cell_type": "code", "execution_count": 4, "metadata": { "tags": [] }, "outputs": [], "source": [
"# Import for SHAP\n",
"import shap\n",
"\n",
"file_to_be_explained = 'src/java/org/apache/lucene/index/DocumentsWriter.java'\n",
"file_to_be_explained_idx = list(X_test.index).index(file_to_be_explained)\n",
"\n",
"# SHAP Step 1 - Construct an explainer with the predict function\n",
"\n",
"\n",
"# SHAP Step 2 - Generate the SHAP explanation of an instance to be explained\n",
"\n",
"\n",
"# visualise the generated SHAP explanation\n"
] },
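{ "cell_type": "markdown", "metadata": {}, "source": [
"Below is a sample solution for exercise (3), following the tip above. Note that `KernelExplainer` repeatedly queries the model on perturbed instances, so using the whole `X_test` as background data can be slow; subsampling the background data (e.g., with `shap.sample`) is a common speed-up, although we keep the tip's setup here.\n"
] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"# SHAP Step 1 - Construct a KernelExplainer with the predict function of the model\n",
"our_shap_explainer = shap.KernelExplainer(our_rf_model.predict, X_test)\n",
"\n",
"# SHAP Step 2 - Generate the SHAP explanation of the instance to be explained\n",
"shap_explanations_of_an_instance = our_shap_explainer.shap_values(X_test.iloc[file_to_be_explained_idx, :])\n",
"\n",
"# Visualise the generated SHAP explanation (force plot)\n",
"shap.initjs()\n",
"shap.force_plot(our_shap_explainer.expected_value,\n",
"    shap_explanations_of_an_instance,\n",
"    X_test.iloc[file_to_be_explained_idx, :])"
] },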
{ "cell_type": "markdown", "metadata": {}, "source": [
"## PyExplainer\n",
"\n",
"**PyExplainer** {cite}`pornprasit2021pyexplainer` is a rule-based model-agnostic technique that utilises a local rule-based regression model to learn the associations between the characteristics of the synthetic instances and the predictions from the black-box model. Given a black-box model and an instance to explain, PyExplainer performs four key steps to generate an instance explanation as follows:\n",
"\n",
"- First, PyExplainer generates synthetic neighbours around the instance to be explained using the crossover and mutation techniques.\n",
"\n",
"- Second, PyExplainer obtains the predictions of the synthetic neighbours from the black-box model.\n",
"\n",
"- Third, PyExplainer builds a local rule-based regression model.\n",
"\n",
"- Finally, PyExplainer generates an explanation from the local model for the instance to be explained.\n",
"\n",
"**(4) Please use PyExplainer to explain the prediction of *DocumentsWriter.java* that is generated from your Random Forests model.**\n",
"\n",
"`````{admonition} Tips\n",
":class: tip\n",
"````\n",
"import numpy as np\n",
"np.random.seed(0)\n",
"\n",
"# PyExplainer Step 1 - Construct a PyExplainer\n",
"our_pyexplainer = PyExplainer(X_train = X_train,\n",
"    y_train = y_train,\n",
"    indep = X_train.columns,\n",
"    dep = outcome,\n",
"    blackbox_model = our_rf_model)\n",
"\n",
"# PyExplainer Step 2 - Generate the rule-based explanation of an instance to be explained\n",
"pyexplainer_explanation_of_an_instance = our_pyexplainer.explain(\n",
"    X_explain = X_test.loc[file_to_be_explained,:].to_frame().transpose(),\n",
"    y_explain = pd.Series(bool(y_test.loc[file_to_be_explained]),\n",
"        index = [file_to_be_explained],\n",
"        name = outcome),\n",
"    search_function = 'crossoverinterpolation',\n",
"    max_iter=1000,\n",
"    max_rules=20,\n",
"    random_state=0,\n",
"    reuse_local_model=True)\n",
"\n",
"# Please use the code below to visualise the generated PyExplainer explanation (What-If interactive visualisation).\n",
"our_pyexplainer.visualise(pyexplainer_explanation_of_an_instance, title=\"Why is this file defect-introducing?\")\n",
"\n",
"````\n",
"`````"
] },
{ "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [
"# Import for PyExplainer\n",
"from pyexplainer.pyexplainer_pyexplainer import PyExplainer\n",
"\n",
"file_to_be_explained = 'src/java/org/apache/lucene/index/DocumentsWriter.java'\n",
"\n",
"# PyExplainer Step 1 - Construct a PyExplainer\n",
"\n",
"\n",
"# PyExplainer Step 2 - Generate the rule-based explanation of an instance to be explained\n",
"\n",
"\n",
"# visualise the generated rule-based PyExplainer explanation\n"
] },
{ "cell_type": "markdown", "metadata": {}, "source": [
"All of the above explanations are property-contrast explanations within a file (see https://xai4se.github.io/xai/theory-of-explanations.html).\n",
"In fact, model-agnostic techniques can also be used to generate other types of explanations, e.g., object-contrast explanations (i.e., the differences between the explanations of two objects).\n",
"\n",
"**(5) Please use LIME to generate the object-contrast explanations between *DocumentsWriter.java* and *TestStringIntern.java*.**\n"
] },
{ "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Generating the object-contrast explanations between src/java/org/apache/lucene/index/DocumentsWriter.java and src/test/org/apache/lucene/util/TestStringIntern.java with LIME\n" ] } ], "source": [
"# Import for LIME\n",
"import lime\n",
"import lime.lime_tabular\n",
"\n",
"file_to_be_explained = 'src/java/org/apache/lucene/index/DocumentsWriter.java'\n",
"another_file_to_be_explained = 'src/test/org/apache/lucene/util/TestStringIntern.java'\n",
"\n",
"print(f'Generating the object-contrast explanations between {file_to_be_explained} and {another_file_to_be_explained} with LIME')\n",
"\n",
"# LIME Step 1 - Construct an explainer\n",
"\n",
"\n",
"# LIME Step 2 - Use the constructed explainer with the predict function of your predictive model to explain the two instances\n",
"\n",
"\n",
"# visualise the generated LIME explanation - (DocumentsWriter.java)\n"
] },
{ "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [
"# visualise the generated LIME explanation - (TestStringIntern.java)\n"
] }
], "metadata": { "kernelspec": { "display_name": "xaitools", "language": "python", "name": "xaitools" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.2" } }, "nbformat": 4, "nbformat_minor": 4 }