{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "italic-seafood",
   "metadata": {},
   "source": [
    "# <center> Welcome to the PyExplainer Quickstart Guide </center>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "northern-birmingham",
   "metadata": {},
   "source": [
    "# Top Note - MUST READ!!\n",
    "#### 1. When initialising the PyExplainer object, prepare the following 5 required parameters with the data types shown\n",
    "(1) X_train (pd.core.frame.DataFrame) - feature columns from the training data <br><br>\n",
    "(2) y_train (pd.core.series.Series) - label column from the training data <br><br>\n",
    "(3) indep (pd.core.indexes.base.Index) - names of the feature columns; most of the time, you can get them from 'X_explain.columns' <br><br>\n",
    "(4) dep (str) - name of the label column <br><br>\n",
    "(5) blackbox_model (any supervised classification model from the sklearn lib) - the trained model to be explained <br><br>\n",
    "\n",
    "#### 2. When calling the explain() function of the PyExplainer object, prepare the following 2 parameters with the data types shown\n",
    "(1) X_explain (pd.core.frame.DataFrame) - one row of feature data <br><br>\n",
    "(2) y_explain (pd.core.series.Series) - one row of predicted data\n",
    "\n",
    "#### 3. Be careful when using a custom pandas index for Series and DataFrame\n",
    "In our Full Tutorial (PART B) example, the 'File' column is used as the custom index.<br>\n",
    "A custom index is optional: if you don't set one, pandas generates a default row index starting from 0.<br><br>\n",
    "If you do use a custom index, make sure to use it consistently throughout your data processing.<br><br>\n",
    "Otherwise, some of your data may carry the pandas default index while the rest carries your custom index, which will trigger errors whenever you try to combine your DataFrame and Series."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "conscious-roommate",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "employed-intensity",
   "metadata": {},
   "source": [
    "# PART A - Quick Start"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "deadly-immigration",
   "metadata": {},
   "source": [
    "## 1. Prepare data and model\n",
    "\n",
    "Note: we use the default data and model here as an example."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "higher-cooling",
   "metadata": {},
   "source": [
    "### 1.1 Import required library"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "harmful-morocco",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyexplainer import pyexplainer_pyexplainer"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "practical-jenny",
   "metadata": {},
   "source": [
    "### 1.2 Obtain the default dataset and global model (Random Forest), and create the PyExplainer object"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "employed-adobe",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load the default dataset and the global model (Random Forest)\n",
    "default_data_and_model = pyexplainer_pyexplainer.get_dflt()\n",
    "\n",
    "# Create the PyExplainer object from the training data and the black-box model\n",
    "py_explainer = pyexplainer_pyexplainer.PyExplainer(\n",
    "    X_train=default_data_and_model['X_train'],\n",
    "    y_train=default_data_and_model['y_train'],\n",
    "    indep=default_data_and_model['indep'],\n",
    "    dep=default_data_and_model['dep'],\n",
    "    blackbox_model=default_data_and_model['blackbox_model'],\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "informational-disclosure",
   "metadata": {},
   "source": [
    "## 2. Explain an instance with PyExplainer"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "anticipated-tourism",
   "metadata": {},
   "source": [
    "### 2.1 Prepare the data to be explained"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "abstract-disposal",
   "metadata": {},
   "outputs": [],
   "source": [
    "X_explain = default_data_and_model['X_explain']\n",
    "y_explain = default_data_and_model['y_explain']"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "inside-republic",
   "metadata": {},
   "source": [
    "### 2.2 Create rules"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "spatial-wrong",
   "metadata": {},
   "outputs": [],
   "source": [
    "created_rules = py_explainer.explain(X_explain=X_explain,\n",
    "                                     y_explain=y_explain,\n",
    "                                     search_function='crossoverinterpolation')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "engaging-ridge",
   "metadata": {},
   "source": [
    "## 3. Create interactive visualization\n",
    "\n",
    "Move the sliders to change feature values and observe how the risk score changes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "second-trout",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "9feefd892e174452bc28777e0ad4dfba",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(Label(value='Risk Score: '), FloatProgress(value=0.0, bar_style='info', layout=Layout(width='40…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "07d0fafe28994776bc5666a43c492587",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Output(layout=Layout(border='3px solid black'), outputs=({'output_type': 'display_data', 'data': {'text/plain'…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "7f0ae3bba93d4ffbb4a944874bfcbe74",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "FloatSlider(value=0.0, continuous_update=False, description='#1 The value of CountDeclInstanceVariable is more…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "012c0a572d1e440b8bc3d847bed27af6",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "FloatSlider(value=90.0, continuous_update=False, description='#2 The value of PercentLackOfCohesion is more th…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "fc2862721c44469c90a106c385682435",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "FloatSlider(value=7.0, continuous_update=False, description='#3 The value of CountDeclMethodPublic is more tha…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "py_explainer.visualise(created_rules)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "894f3a9a",
   "metadata": {},
   "source": [
    "##### *** If the widget is not displayed, please run the code cell below, restart the notebook, and rerun from the top"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9db21a4d",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "# Enable the ipywidgets notebook extension so the interactive widgets can render\n",
    "os.system(\"jupyter nbextension enable --py widgetsnbextension\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a15de81b",
   "metadata": {},
   "source": [
    "# PART B - Full Tutorial"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "italic-services",
   "metadata": {},
   "source": [
    "## 1. Prepare sample data and model"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "front-programmer",
   "metadata": {},
   "source": [
    "### 1.1 For simplicity, we load the sample DataFrame that is already included in the package"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "removable-problem",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>File</th>\n",
       "      <th>CountDeclMethodPrivate</th>\n",
       "      <th>AvgLineCode</th>\n",
       "      <th>CountLine</th>\n",
       "      <th>MaxCyclomatic</th>\n",
       "      <th>CountDeclMethodDefault</th>\n",
       "      <th>AvgEssential</th>\n",
       "      <th>CountDeclClassVariable</th>\n",
       "      <th>SumCyclomaticStrict</th>\n",
       "      <th>AvgCyclomatic</th>\n",
       "      <th>...</th>\n",
       "      <th>OWN_LINE</th>\n",
       "      <th>OWN_COMMIT</th>\n",
       "      <th>MINOR_COMMIT</th>\n",
       "      <th>MINOR_LINE</th>\n",
       "      <th>MAJOR_COMMIT</th>\n",
       "      <th>MAJOR_LINE</th>\n",
       "      <th>RealBug</th>\n",
       "      <th>HeuBug</th>\n",
       "      <th>HeuBugCount</th>\n",
       "      <th>RealBugCount</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>activemq-console/src/main/java/org/apache/acti...</td>\n",
       "      <td>0</td>\n",
       "      <td>10</td>\n",
       "      <td>171</td>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>18</td>\n",
       "      <td>2</td>\n",
       "      <td>...</td>\n",
       "      <td>1.00000</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>activemq-console/src/main/java/org/apache/acti...</td>\n",
       "      <td>0</td>\n",
       "      <td>8</td>\n",
       "      <td>123</td>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>15</td>\n",
       "      <td>3</td>\n",
       "      <td>...</td>\n",
       "      <td>0.98374</td>\n",
       "      <td>0.5</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>activemq-console/src/main/java/org/apache/acti...</td>\n",
       "      <td>0</td>\n",
       "      <td>7</td>\n",
       "      <td>136</td>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>16</td>\n",
       "      <td>2</td>\n",
       "      <td>...</td>\n",
       "      <td>1.00000</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>3 rows × 70 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                File  CountDeclMethodPrivate  \\\n",
       "0  activemq-console/src/main/java/org/apache/acti...                       0   \n",
       "1  activemq-console/src/main/java/org/apache/acti...                       0   \n",
       "2  activemq-console/src/main/java/org/apache/acti...                       0   \n",
       "\n",
       "   AvgLineCode  CountLine  MaxCyclomatic  CountDeclMethodDefault  \\\n",
       "0           10        171              5                       0   \n",
       "1            8        123              5                       0   \n",
       "2            7        136              5                       0   \n",
       "\n",
       "   AvgEssential  CountDeclClassVariable  SumCyclomaticStrict  AvgCyclomatic  \\\n",
       "0             2                       0                   18              2   \n",
       "1             1                       1                   15              3   \n",
       "2             1                       1                   16              2   \n",
       "\n",
       "   ...  OWN_LINE  OWN_COMMIT  MINOR_COMMIT  MINOR_LINE  MAJOR_COMMIT  \\\n",
       "0  ...   1.00000         1.0             0           1             1   \n",
       "1  ...   0.98374         0.5             0           1             2   \n",
       "2  ...   1.00000         1.0             0           1             1   \n",
       "\n",
       "   MAJOR_LINE  RealBug  HeuBug  HeuBugCount  RealBugCount  \n",
       "0           0    False   False            0             0  \n",
       "1           1    False   False            0             0  \n",
       "2           0    False   False            0             0  \n",
       "\n",
       "[3 rows x 70 columns]"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "from pyexplainer import pyexplainer_pyexplainer\n",
    "\n",
    "df = pyexplainer_pyexplainer.load_sample_data()\n",
    "df.head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "connected-occurrence",
   "metadata": {},
   "source": [
    "### 1.2 Define index column (OPTIONAL) and drop unwanted columns\n",
    "##### First, we set the 'File' column as the index, since it identifies the file we want to inspect and is neither a feature nor the label\n",
    "##### We use 'RealBug' as the label column, and the columns before 'RealBug' as feature columns\n",
    "##### Then we drop the unnecessary columns (e.g. File, HeuBug, HeuBugCount, RealBugCount)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "median-moisture",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>CountDeclMethodPrivate</th>\n",
       "      <th>AvgLineCode</th>\n",
       "      <th>CountLine</th>\n",
       "      <th>MaxCyclomatic</th>\n",
       "      <th>CountDeclMethodDefault</th>\n",
       "      <th>AvgEssential</th>\n",
       "      <th>CountDeclClassVariable</th>\n",
       "      <th>SumCyclomaticStrict</th>\n",
       "      <th>AvgCyclomatic</th>\n",
       "      <th>AvgLine</th>\n",
       "      <th>...</th>\n",
       "      <th>DDEV</th>\n",
       "      <th>Added_lines</th>\n",
       "      <th>Del_lines</th>\n",
       "      <th>OWN_LINE</th>\n",
       "      <th>OWN_COMMIT</th>\n",
       "      <th>MINOR_COMMIT</th>\n",
       "      <th>MINOR_LINE</th>\n",
       "      <th>MAJOR_COMMIT</th>\n",
       "      <th>MAJOR_LINE</th>\n",
       "      <th>RealBug</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>File</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>activemq-console/src/main/java/org/apache/activemq/console/command/AbstractAmqCommand.java</th>\n",
       "      <td>0</td>\n",
       "      <td>10</td>\n",
       "      <td>171</td>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>18</td>\n",
       "      <td>2</td>\n",
       "      <td>18</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>32</td>\n",
       "      <td>18</td>\n",
       "      <td>1.00000</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>activemq-console/src/main/java/org/apache/activemq/console/command/AbstractCommand.java</th>\n",
       "      <td>0</td>\n",
       "      <td>8</td>\n",
       "      <td>123</td>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>15</td>\n",
       "      <td>3</td>\n",
       "      <td>17</td>\n",
       "      <td>...</td>\n",
       "      <td>2</td>\n",
       "      <td>30</td>\n",
       "      <td>28</td>\n",
       "      <td>0.98374</td>\n",
       "      <td>0.5</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>activemq-console/src/main/java/org/apache/activemq/console/command/AbstractJmxCommand.java</th>\n",
       "      <td>0</td>\n",
       "      <td>7</td>\n",
       "      <td>136</td>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>16</td>\n",
       "      <td>2</td>\n",
       "      <td>13</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>8</td>\n",
       "      <td>8</td>\n",
       "      <td>1.00000</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>3 rows × 66 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                    CountDeclMethodPrivate  \\\n",
       "File                                                                         \n",
       "activemq-console/src/main/java/org/apache/activ...                       0   \n",
       "activemq-console/src/main/java/org/apache/activ...                       0   \n",
       "activemq-console/src/main/java/org/apache/activ...                       0   \n",
       "\n",
       "                                                    AvgLineCode  CountLine  \\\n",
       "File                                                                         \n",
       "activemq-console/src/main/java/org/apache/activ...           10        171   \n",
       "activemq-console/src/main/java/org/apache/activ...            8        123   \n",
       "activemq-console/src/main/java/org/apache/activ...            7        136   \n",
       "\n",
       "                                                    MaxCyclomatic  \\\n",
       "File                                                                \n",
       "activemq-console/src/main/java/org/apache/activ...              5   \n",
       "activemq-console/src/main/java/org/apache/activ...              5   \n",
       "activemq-console/src/main/java/org/apache/activ...              5   \n",
       "\n",
       "                                                    CountDeclMethodDefault  \\\n",
       "File                                                                         \n",
       "activemq-console/src/main/java/org/apache/activ...                       0   \n",
       "activemq-console/src/main/java/org/apache/activ...                       0   \n",
       "activemq-console/src/main/java/org/apache/activ...                       0   \n",
       "\n",
       "                                                    AvgEssential  \\\n",
       "File                                                               \n",
       "activemq-console/src/main/java/org/apache/activ...             2   \n",
       "activemq-console/src/main/java/org/apache/activ...             1   \n",
       "activemq-console/src/main/java/org/apache/activ...             1   \n",
       "\n",
       "                                                    CountDeclClassVariable  \\\n",
       "File                                                                         \n",
       "activemq-console/src/main/java/org/apache/activ...                       0   \n",
       "activemq-console/src/main/java/org/apache/activ...                       1   \n",
       "activemq-console/src/main/java/org/apache/activ...                       1   \n",
       "\n",
       "                                                    SumCyclomaticStrict  \\\n",
       "File                                                                      \n",
       "activemq-console/src/main/java/org/apache/activ...                   18   \n",
       "activemq-console/src/main/java/org/apache/activ...                   15   \n",
       "activemq-console/src/main/java/org/apache/activ...                   16   \n",
       "\n",
       "                                                    AvgCyclomatic  AvgLine  \\\n",
       "File                                                                         \n",
       "activemq-console/src/main/java/org/apache/activ...              2       18   \n",
       "activemq-console/src/main/java/org/apache/activ...              3       17   \n",
       "activemq-console/src/main/java/org/apache/activ...              2       13   \n",
       "\n",
       "                                                    ...  DDEV  Added_lines  \\\n",
       "File                                                ...                      \n",
       "activemq-console/src/main/java/org/apache/activ...  ...     1           32   \n",
       "activemq-console/src/main/java/org/apache/activ...  ...     2           30   \n",
       "activemq-console/src/main/java/org/apache/activ...  ...     1            8   \n",
       "\n",
       "                                                    Del_lines  OWN_LINE  \\\n",
       "File                                                                      \n",
       "activemq-console/src/main/java/org/apache/activ...         18   1.00000   \n",
       "activemq-console/src/main/java/org/apache/activ...         28   0.98374   \n",
       "activemq-console/src/main/java/org/apache/activ...          8   1.00000   \n",
       "\n",
       "                                                    OWN_COMMIT  MINOR_COMMIT  \\\n",
       "File                                                                           \n",
       "activemq-console/src/main/java/org/apache/activ...         1.0             0   \n",
       "activemq-console/src/main/java/org/apache/activ...         0.5             0   \n",
       "activemq-console/src/main/java/org/apache/activ...         1.0             0   \n",
       "\n",
       "                                                    MINOR_LINE  MAJOR_COMMIT  \\\n",
       "File                                                                           \n",
       "activemq-console/src/main/java/org/apache/activ...           1             1   \n",
       "activemq-console/src/main/java/org/apache/activ...           1             2   \n",
       "activemq-console/src/main/java/org/apache/activ...           1             1   \n",
       "\n",
       "                                                    MAJOR_LINE  RealBug  \n",
       "File                                                                     \n",
       "activemq-console/src/main/java/org/apache/activ...           0    False  \n",
       "activemq-console/src/main/java/org/apache/activ...           1    False  \n",
       "activemq-console/src/main/java/org/apache/activ...           0    False  \n",
       "\n",
       "[3 rows x 66 columns]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
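   ],
   "source": [
    "# Use 'File' as the index; passing the column name also removes it from the columns\n",
    "df = df.set_index('File')\n",
    "# Drop the remaining columns that are neither features nor the label\n",
    "df = df.drop(['HeuBug', 'HeuBugCount', 'RealBugCount'], axis=1)\n",
    "df.head(3)"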
   ]
  },
  {
   "cell_type": "markdown",
   "id": "advance-wilderness",
   "metadata": {},
   "source": [
    "### 1.3 Define feature columns (X) and label column (y)\n",
    "##### The AutoSpearman function is used as a feature selection method to reduce the number of features\n",
    "##### For more information about the algorithm, please refer to [this paper](https://ieeexplore.ieee.org/document/8530020)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "growing-scale",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(Part 1) Automatically select non-correlated metrics based on a Spearman rank correlation test\n",
      "> Step 1 comparing between CountDeclMethod and CountDeclFunction\n",
      ">> CountDeclMethod has the average correlation of 0.433 with other metrics\n",
      ">> CountDeclFunction has the average correlation of 0.433 with other metrics\n",
      ">> Exclude CountDeclMethod\n",
      "> Step 2 comparing between MAJOR_COMMIT and DDEV\n",
      ">> MAJOR_COMMIT has the average correlation of 0.274 with other metrics\n",
      ">> DDEV has the average correlation of 0.274 with other metrics\n",
      ">> Exclude DDEV\n",
      "> Step 3 comparing between SumCyclomatic and SumCyclomaticModified\n",
      ">> SumCyclomatic has the average correlation of 0.501 with other metrics\n",
      ">> SumCyclomaticModified has the average correlation of 0.501 with other metrics\n",
      ">> Exclude SumCyclomatic\n",
      "> Step 4 comparing between AvgCyclomatic and AvgCyclomaticModified\n",
      ">> AvgCyclomatic has the average correlation of 0.387 with other metrics\n",
      ">> AvgCyclomaticModified has the average correlation of 0.387 with other metrics\n",
      ">> Exclude AvgCyclomatic\n",
      "> Step 5 comparing between MaxCyclomatic and MaxCyclomaticModified\n",
      ">> MaxCyclomatic has the average correlation of 0.476 with other metrics\n",
      ">> MaxCyclomaticModified has the average correlation of 0.476 with other metrics\n",
      ">> Exclude MaxCyclomatic\n",
      "> Step 6 comparing between SumCyclomaticModified and SumCyclomaticStrict\n",
      ">> SumCyclomaticModified has the average correlation of 0.488 with other metrics\n",
      ">> SumCyclomaticStrict has the average correlation of 0.489 with other metrics\n",
      ">> Exclude SumCyclomaticStrict\n",
      "> Step 7 comparing between CountStmtDecl and CountLineCodeDecl\n",
      ">> CountStmtDecl has the average correlation of 0.49 with other metrics\n",
      ">> CountLineCodeDecl has the average correlation of 0.487 with other metrics\n",
      ">> Exclude CountStmtDecl\n",
      "> Step 8 comparing between CountLineCode and CountStmt\n",
      ">> CountLineCode has the average correlation of 0.504 with other metrics\n",
      ">> CountStmt has the average correlation of 0.501 with other metrics\n",
      ">> Exclude CountLineCode\n",
      "> Step 9 comparing between CountSemicolon and CountStmt\n",
      ">> CountSemicolon has the average correlation of 0.484 with other metrics\n",
      ">> CountStmt has the average correlation of 0.492 with other metrics\n",
      ">> Exclude CountStmt\n",
      "> Step 10 comparing between OWN_COMMIT and MAJOR_COMMIT\n",
      ">> OWN_COMMIT has the average correlation of 0.238 with other metrics\n",
      ">> MAJOR_COMMIT has the average correlation of 0.249 with other metrics\n",
      ">> Exclude MAJOR_COMMIT\n",
      "> Step 11 comparing between CountPath_Max and MaxCyclomaticModified\n",
      ">> CountPath_Max has the average correlation of 0.447 with other metrics\n",
      ">> MaxCyclomaticModified has the average correlation of 0.448 with other metrics\n",
      ">> Exclude MaxCyclomaticModified\n",
      "> Step 12 comparing between CountStmtExe and CountLineCodeExe\n",
      ">> CountStmtExe has the average correlation of 0.473 with other metrics\n",
      ">> CountLineCodeExe has the average correlation of 0.475 with other metrics\n",
      ">> Exclude CountLineCodeExe\n",
      "> Step 13 comparing between SumEssential and CountDeclFunction\n",
      ">> SumEssential has the average correlation of 0.397 with other metrics\n",
      ">> CountDeclFunction has the average correlation of 0.379 with other metrics\n",
      ">> Exclude SumEssential\n",
      "> Step 14 comparing between CountPath_Max and MaxCyclomaticStrict\n",
      ">> CountPath_Max has the average correlation of 0.427 with other metrics\n",
      ">> MaxCyclomaticStrict has the average correlation of 0.428 with other metrics\n",
      ">> Exclude MaxCyclomaticStrict\n",
      "> Step 15 comparing between CountPath_Max and CountPath_Mean\n",
      ">> CountPath_Max has the average correlation of 0.416 with other metrics\n",
      ">> CountPath_Mean has the average correlation of 0.399 with other metrics\n",
      ">> Exclude CountPath_Max\n",
      "> Step 16 comparing between AvgCyclomaticStrict and AvgCyclomaticModified\n",
      ">> AvgCyclomaticStrict has the average correlation of 0.337 with other metrics\n",
      ">> AvgCyclomaticModified has the average correlation of 0.33 with other metrics\n",
      ">> Exclude AvgCyclomaticStrict\n",
      "> Step 17 comparing between CountDeclFunction and CountDeclInstanceMethod\n",
      ">> CountDeclFunction has the average correlation of 0.364 with other metrics\n",
      ">> CountDeclInstanceMethod has the average correlation of 0.342 with other metrics\n",
      ">> Exclude CountDeclFunction\n",
      "> Step 18 comparing between CountSemicolon and CountLineCodeDecl\n",
      ">> CountSemicolon has the average correlation of 0.436 with other metrics\n",
      ">> CountLineCodeDecl has the average correlation of 0.421 with other metrics\n",
      ">> Exclude CountSemicolon\n",
      "> Step 19 comparing between CountLine and CountLineBlank\n",
      ">> CountLine has the average correlation of 0.413 with other metrics\n",
      ">> CountLineBlank has the average correlation of 0.372 with other metrics\n",
      ">> Exclude CountLine\n",
      "> Step 20 comparing between MaxNesting_Mean and CountPath_Mean\n",
      ">> MaxNesting_Mean has the average correlation of 0.33 with other metrics\n",
      ">> CountPath_Mean has the average correlation of 0.365 with other metrics\n",
      ">> Exclude CountPath_Mean\n",
      "> Step 21 comparing between MaxNesting_Max and MaxNesting_Mean\n",
      ">> MaxNesting_Max has the average correlation of 0.337 with other metrics\n",
      ">> MaxNesting_Mean has the average correlation of 0.316 with other metrics\n",
      ">> Exclude MaxNesting_Max\n",
      "> Step 22 comparing between CountOutput_Mean and AvgLineCode\n",
      ">> CountOutput_Mean has the average correlation of 0.284 with other metrics\n",
      ">> AvgLineCode has the average correlation of 0.317 with other metrics\n",
      ">> Exclude AvgLineCode\n",
      "> Step 23 comparing between CountLineCodeDecl and SumCyclomaticModified\n",
      ">> CountLineCodeDecl has the average correlation of 0.385 with other metrics\n",
      ">> SumCyclomaticModified has the average correlation of 0.375 with other metrics\n",
      ">> Exclude CountLineCodeDecl\n",
      "> Step 24 comparing between CountPath_Min and MaxNesting_Min\n",
      ">> CountPath_Min has the average correlation of 0.083 with other metrics\n",
      ">> MaxNesting_Min has the average correlation of 0.077 with other metrics\n",
      ">> Exclude CountPath_Min\n",
      "> Step 25 comparing between CountDeclInstanceMethod and SumCyclomaticModified\n",
      ">> CountDeclInstanceMethod has the average correlation of 0.304 with other metrics\n",
      ">> SumCyclomaticModified has the average correlation of 0.371 with other metrics\n",
      ">> Exclude SumCyclomaticModified\n",
      "> Step 26 comparing between RatioCommentToCode and CountStmtExe\n",
      ">> RatioCommentToCode has the average correlation of 0.341 with other metrics\n",
      ">> CountStmtExe has the average correlation of 0.379 with other metrics\n",
      ">> Exclude CountStmtExe\n",
      "> Step 27 comparing between CountInput_Max and CountInput_Mean\n",
      ">> CountInput_Max has the average correlation of 0.293 with other metrics\n",
      ">> CountInput_Mean has the average correlation of 0.232 with other metrics\n",
      ">> Exclude CountInput_Max\n",
      "> Step 28 comparing between CountOutput_Max and CountOutput_Mean\n",
      ">> CountOutput_Max has the average correlation of 0.329 with other metrics\n",
      ">> CountOutput_Mean has the average correlation of 0.259 with other metrics\n",
      ">> Exclude CountOutput_Max\n",
      "> Step 29 comparing between MaxNesting_Mean and AvgCyclomaticModified\n",
      ">> MaxNesting_Mean has the average correlation of 0.257 with other metrics\n",
      ">> AvgCyclomaticModified has the average correlation of 0.247 with other metrics\n",
      ">> Exclude MaxNesting_Mean\n",
      "> Step 30 comparing between Added_lines and Del_lines\n",
      ">> Added_lines has the average correlation of 0.294 with other metrics\n",
      ">> Del_lines has the average correlation of 0.291 with other metrics\n",
      ">> Exclude Added_lines\n",
      "> Step 31 comparing between CountLineBlank and CountDeclInstanceMethod\n",
      ">> CountLineBlank has the average correlation of 0.299 with other metrics\n",
      ">> CountDeclInstanceMethod has the average correlation of 0.258 with other metrics\n",
      ">> Exclude CountLineBlank\n",
      "> Step 32 comparing between MINOR_LINE and OWN_LINE\n",
      ">> MINOR_LINE has the average correlation of 0.08 with other metrics\n",
      ">> OWN_LINE has the average correlation of 0.078 with other metrics\n",
      ">> Exclude MINOR_LINE\n",
      "> Step 33 comparing between CountDeclInstanceMethod and CountDeclMethodPublic\n",
      ">> CountDeclInstanceMethod has the average correlation of 0.246 with other metrics\n",
      ">> CountDeclMethodPublic has the average correlation of 0.232 with other metrics\n",
      ">> Exclude CountDeclInstanceMethod\n",
      "> Step 34 comparing between AvgLine and CountOutput_Mean\n",
      ">> AvgLine has the average correlation of 0.234 with other metrics\n",
      ">> CountOutput_Mean has the average correlation of 0.239 with other metrics\n",
      ">> Exclude CountOutput_Mean\n",
      "> Step 35 comparing between CountLineComment and AvgLineComment\n",
      ">> CountLineComment has the average correlation of 0.149 with other metrics\n",
      ">> AvgLineComment has the average correlation of 0.112 with other metrics\n",
      ">> Exclude CountLineComment\n",
      "> Step 36 comparing between Del_lines and ADEV\n",
      ">> Del_lines has the average correlation of 0.265 with other metrics\n",
      ">> ADEV has the average correlation of 0.233 with other metrics\n",
      ">> Exclude Del_lines\n",
      "According to Part 1 of AutoSpearman, ['ADEV', 'CountClassCoupled', 'AvgLine', 'OWN_LINE', 'CountDeclMethodProtected', 'CountDeclInstanceVariable', 'PercentLackOfCohesion', 'CountDeclClass', 'MAJOR_LINE', 'AvgLineBlank', 'CountDeclMethodPublic', 'CountInput_Mean', 'MaxNesting_Min', 'CountOutput_Min', 'CountDeclMethodDefault', 'AvgCyclomaticModified', 'CountInput_Min', 'CountDeclClassMethod', 'CountClassDerived', 'AvgLineComment', 'CountDeclClassVariable', 'CountClassBase', 'OWN_COMMIT', 'MaxInheritanceTree', 'CountDeclMethodPrivate', 'MINOR_COMMIT', 'AvgEssential', 'COMM', 'RatioCommentToCode'] are selected.\n",
      "(Part 2) Automatically select non-correlated metrics based on a Variance Inflation Factor analysis\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Users\\micha\\miniconda3\\lib\\site-packages\\statsmodels\\tsa\\tsatools.py:142: FutureWarning: In a future version of pandas all arguments of concat except for the argument 'objs' will be keyword-only\n",
      "  x = pd.concat(x[::order], 1)\n",
      "C:\\Users\\micha\\miniconda3\\lib\\site-packages\\statsmodels\\stats\\outliers_influence.py:193: RuntimeWarning: divide by zero encountered in double_scalars\n",
      "  vif = 1. / (1. - r_squared_i)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "> Step 1 - exclude ADEV\n",
      "> Step 2 - exclude AvgLine\n",
      "Finally, according to Part 2 of AutoSpearman, Index(['CountClassCoupled', 'OWN_LINE', 'CountDeclMethodProtected',\n",
      "       'CountDeclInstanceVariable', 'PercentLackOfCohesion', 'CountDeclClass',\n",
      "       'MAJOR_LINE', 'AvgLineBlank', 'CountDeclMethodPublic',\n",
      "       'CountInput_Mean', 'MaxNesting_Min', 'CountOutput_Min',\n",
      "       'CountDeclMethodDefault', 'AvgCyclomaticModified', 'CountInput_Min',\n",
      "       'const', 'CountDeclClassMethod', 'CountClassDerived', 'AvgLineComment',\n",
      "       'CountDeclClassVariable', 'CountClassBase', 'OWN_COMMIT',\n",
      "       'MaxInheritanceTree', 'CountDeclMethodPrivate', 'MINOR_COMMIT',\n",
      "       'AvgEssential', 'COMM', 'RatioCommentToCode'],\n",
      "      dtype='object') are selected.\n",
      "27  out of  65  were selected via AutoSpearman feature selection process\n",
      "feature cols: \n",
      "\n",
      "                                                     CountClassCoupled  \\\n",
      "File                                                                    \n",
      "activemq-console/src/main/java/org/apache/activ...                  2   \n",
      "\n",
      "                                                    OWN_LINE  \\\n",
      "File                                                           \n",
      "activemq-console/src/main/java/org/apache/activ...       1.0   \n",
      "\n",
      "                                                    CountDeclMethodProtected  \\\n",
      "File                                                                           \n",
      "activemq-console/src/main/java/org/apache/activ...                         7   \n",
      "\n",
      "                                                    CountDeclInstanceVariable  \\\n",
      "File                                                                            \n",
      "activemq-console/src/main/java/org/apache/activ...                          3   \n",
      "\n",
      "                                                    PercentLackOfCohesion  \\\n",
      "File                                                                        \n",
      "activemq-console/src/main/java/org/apache/activ...                     61   \n",
      "\n",
      "                                                    CountDeclClass  \\\n",
      "File                                                                 \n",
      "activemq-console/src/main/java/org/apache/activ...               1   \n",
      "\n",
      "                                                    MAJOR_LINE  AvgLineBlank  \\\n",
      "File                                                                           \n",
      "activemq-console/src/main/java/org/apache/activ...           0             1   \n",
      "\n",
      "                                                    CountDeclMethodPublic  \\\n",
      "File                                                                        \n",
      "activemq-console/src/main/java/org/apache/activ...                      0   \n",
      "\n",
      "                                                    CountInput_Mean  ...  \\\n",
      "File                                                                 ...   \n",
      "activemq-console/src/main/java/org/apache/activ...         2.714286  ...   \n",
      "\n",
      "                                                    AvgLineComment  \\\n",
      "File                                                                 \n",
      "activemq-console/src/main/java/org/apache/activ...               6   \n",
      "\n",
      "                                                    CountDeclClassVariable  \\\n",
      "File                                                                         \n",
      "activemq-console/src/main/java/org/apache/activ...                       0   \n",
      "\n",
      "                                                    CountClassBase  \\\n",
      "File                                                                 \n",
      "activemq-console/src/main/java/org/apache/activ...               1   \n",
      "\n",
      "                                                    OWN_COMMIT  \\\n",
      "File                                                             \n",
      "activemq-console/src/main/java/org/apache/activ...         1.0   \n",
      "\n",
      "                                                    MaxInheritanceTree  \\\n",
      "File                                                                     \n",
      "activemq-console/src/main/java/org/apache/activ...                   2   \n",
      "\n",
      "                                                    CountDeclMethodPrivate  \\\n",
      "File                                                                         \n",
      "activemq-console/src/main/java/org/apache/activ...                       0   \n",
      "\n",
      "                                                    MINOR_COMMIT  \\\n",
      "File                                                               \n",
      "activemq-console/src/main/java/org/apache/activ...             0   \n",
      "\n",
      "                                                    AvgEssential  COMM  \\\n",
      "File                                                                     \n",
      "activemq-console/src/main/java/org/apache/activ...             2     1   \n",
      "\n",
      "                                                    RatioCommentToCode  \n",
      "File                                                                    \n",
      "activemq-console/src/main/java/org/apache/activ...                 0.7  \n",
      "\n",
      "[1 rows x 27 columns] \n",
      "\n",
      "\n",
      "label col: \n",
      "\n",
      " File\n",
      "activemq-console/src/main/java/org/apache/activemq/console/command/AbstractAmqCommand.java    False\n",
      "Name: RealBug, dtype: bool\n"
     ]
    }
   ],
   "source": [
    "from pyexplainer.pyexplainer_pyexplainer import AutoSpearman\n",
    "# select all rows, and all feature cols\n",
    "# the last col, which is label col, is not selected\n",
    "X = df.iloc[:, :-1]\n",
    "total_features = len(X.columns)\n",
    "\n",
    "# apply feature selection function to our feature DataFrame\n",
    "X = AutoSpearman(X)\n",
    "selected = len(X.columns)\n",
    "\n",
    "# select all rows, and the last label col\n",
    "y = df.iloc[:, -1]\n",
    "\n",
    "print(selected, \" out of \", total_features, \" were selected via AutoSpearman feature selection process\")\n",
    "print('feature cols:', '\\n\\n', X.head(1), '\\n\\n')\n",
    "print('label col:', '\\n\\n', y.head(1))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "offshore-cattle",
   "metadata": {},
   "source": [
    "### 1.4 Split data into training and testing set"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "absolute-diversity",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "# 70% training and 30% test\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fewer-webmaster",
   "metadata": {},
   "source": [
    "## 2. Training and Predicting"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "increased-queensland",
   "metadata": {},
   "source": [
    "### 2.1 Train a RandomForest model using sklearn"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "lesser-coupon",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "RandomForestClassifier(random_state=0)"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.ensemble import RandomForestClassifier\n",
    "\n",
    "rf_model = RandomForestClassifier(n_estimators=100, max_depth=None, min_samples_split=2, random_state=0)\n",
    "rf_model.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "early-admission",
   "metadata": {},
   "source": [
    "### 2.2 Generate predictions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "detailed-isaac",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PredictedBug</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>File</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>activemq-core/src/main/java/org/apache/activemq/kaha/MapContainer.java</th>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>activemq-core/src/main/java/org/apache/activemq/openwire/v3/MessageAckMarshaller.java</th>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>activemq-core/src/main/java/org/apache/activemq/ConnectionFailedException.java</th>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                    PredictedBug\n",
       "File                                                            \n",
       "activemq-core/src/main/java/org/apache/activemq...         False\n",
       "activemq-core/src/main/java/org/apache/activemq...         False\n",
       "activemq-core/src/main/java/org/apache/activemq...         False"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# generate prediction from the model, which will return a list of predicted labels\n",
    "y_preds = rf_model.predict(X_test) \n",
    "# create a DataFrame which only has predicted label column\n",
    "y_preds = pd.DataFrame(data={'PredictedBug': y_preds}, index=y_test.index) \n",
    "y_preds.head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "inappropriate-decline",
   "metadata": {},
   "source": [
    "## 3. Prediction post processing"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "weekly-homeless",
   "metadata": {},
   "source": [
    "### 3.1 Combine feature cols, label col, and the predicted col in testing set"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "steady-comparison",
   "metadata": {},
   "outputs": [],
   "source": [
    "combined_testing_data = X_test.join(y_test.to_frame())\n",
    "combined_testing_data = combined_testing_data.join(y_preds)\n",
    "combined_testing_data.head(3)\n",
    "# total num of rows\n",
    "total_rows = len(combined_testing_data)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "viral-limit",
   "metadata": {},
   "source": [
    "### 3.2 Filter out wronly predicted rows "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "median-element",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The model correctly predicted  90.60000000000001 % of testing data\n"
     ]
    }
   ],
   "source": [
    "correctly_predicted_data = combined_testing_data[combined_testing_data['RealBug']==combined_testing_data['PredictedBug']]\n",
    "correctly_predicted_rows = len(correctly_predicted_data)\n",
    "print('The model correctly predicted ', round((correctly_predicted_rows / total_rows), 3) * 100, '% of testing data')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "single-compression",
   "metadata": {},
   "source": [
    "### 3.3 We focus on the bug file, therefore, filter out the non-buggy file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "welsh-aspect",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>CountClassCoupled</th>\n",
       "      <th>OWN_LINE</th>\n",
       "      <th>CountDeclMethodProtected</th>\n",
       "      <th>CountDeclInstanceVariable</th>\n",
       "      <th>PercentLackOfCohesion</th>\n",
       "      <th>CountDeclClass</th>\n",
       "      <th>MAJOR_LINE</th>\n",
       "      <th>AvgLineBlank</th>\n",
       "      <th>CountDeclMethodPublic</th>\n",
       "      <th>CountInput_Mean</th>\n",
       "      <th>...</th>\n",
       "      <th>CountClassBase</th>\n",
       "      <th>OWN_COMMIT</th>\n",
       "      <th>MaxInheritanceTree</th>\n",
       "      <th>CountDeclMethodPrivate</th>\n",
       "      <th>MINOR_COMMIT</th>\n",
       "      <th>AvgEssential</th>\n",
       "      <th>COMM</th>\n",
       "      <th>RatioCommentToCode</th>\n",
       "      <th>RealBug</th>\n",
       "      <th>PredictedBug</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>File</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>activemq-core/src/test/java/org/apache/activemq/transport/fanout/FanoutTransportBrokerTest.java</th>\n",
       "      <td>12</td>\n",
       "      <td>0.738916</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>77</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>8</td>\n",
       "      <td>2.181818</td>\n",
       "      <td>...</td>\n",
       "      <td>2</td>\n",
       "      <td>0.800000</td>\n",
       "      <td>6</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>0.27</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>activemq-core/src/main/java/org/apache/activemq/ActiveMQMessageConsumer.java</th>\n",
       "      <td>27</td>\n",
       "      <td>0.569082</td>\n",
       "      <td>10</td>\n",
       "      <td>21</td>\n",
       "      <td>89</td>\n",
       "      <td>5</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>35</td>\n",
       "      <td>4.807692</td>\n",
       "      <td>...</td>\n",
       "      <td>4</td>\n",
       "      <td>0.500000</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>0.42</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>activemq-openwire-generator/src/main/java/org/apache/activemq/openwire/tool/SingleSourceGenerator.java</th>\n",
       "      <td>0</td>\n",
       "      <td>0.995781</td>\n",
       "      <td>8</td>\n",
       "      <td>8</td>\n",
       "      <td>88</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>20</td>\n",
       "      <td>1.428571</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0.666667</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>0.15</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>3 rows × 29 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                    CountClassCoupled  \\\n",
       "File                                                                    \n",
       "activemq-core/src/test/java/org/apache/activemq...                 12   \n",
       "activemq-core/src/main/java/org/apache/activemq...                 27   \n",
       "activemq-openwire-generator/src/main/java/org/a...                  0   \n",
       "\n",
       "                                                    OWN_LINE  \\\n",
       "File                                                           \n",
       "activemq-core/src/test/java/org/apache/activemq...  0.738916   \n",
       "activemq-core/src/main/java/org/apache/activemq...  0.569082   \n",
       "activemq-openwire-generator/src/main/java/org/a...  0.995781   \n",
       "\n",
       "                                                    CountDeclMethodProtected  \\\n",
       "File                                                                           \n",
       "activemq-core/src/test/java/org/apache/activemq...                         3   \n",
       "activemq-core/src/main/java/org/apache/activemq...                        10   \n",
       "activemq-openwire-generator/src/main/java/org/a...                         8   \n",
       "\n",
       "                                                    CountDeclInstanceVariable  \\\n",
       "File                                                                            \n",
       "activemq-core/src/test/java/org/apache/activemq...                          2   \n",
       "activemq-core/src/main/java/org/apache/activemq...                         21   \n",
       "activemq-openwire-generator/src/main/java/org/a...                          8   \n",
       "\n",
       "                                                    PercentLackOfCohesion  \\\n",
       "File                                                                        \n",
       "activemq-core/src/test/java/org/apache/activemq...                     77   \n",
       "activemq-core/src/main/java/org/apache/activemq...                     89   \n",
       "activemq-openwire-generator/src/main/java/org/a...                     88   \n",
       "\n",
       "                                                    CountDeclClass  \\\n",
       "File                                                                 \n",
       "activemq-core/src/test/java/org/apache/activemq...               3   \n",
       "activemq-core/src/main/java/org/apache/activemq...               5   \n",
       "activemq-openwire-generator/src/main/java/org/a...               1   \n",
       "\n",
       "                                                    MAJOR_LINE  AvgLineBlank  \\\n",
       "File                                                                           \n",
       "activemq-core/src/test/java/org/apache/activemq...           1             1   \n",
       "activemq-core/src/main/java/org/apache/activemq...           2             0   \n",
       "activemq-openwire-generator/src/main/java/org/a...           1             0   \n",
       "\n",
       "                                                    CountDeclMethodPublic  \\\n",
       "File                                                                        \n",
       "activemq-core/src/test/java/org/apache/activemq...                      8   \n",
       "activemq-core/src/main/java/org/apache/activemq...                     35   \n",
       "activemq-openwire-generator/src/main/java/org/a...                     20   \n",
       "\n",
       "                                                    CountInput_Mean  ...  \\\n",
       "File                                                                 ...   \n",
       "activemq-core/src/test/java/org/apache/activemq...         2.181818  ...   \n",
       "activemq-core/src/main/java/org/apache/activemq...         4.807692  ...   \n",
       "activemq-openwire-generator/src/main/java/org/a...         1.428571  ...   \n",
       "\n",
       "                                                    CountClassBase  \\\n",
       "File                                                                 \n",
       "activemq-core/src/test/java/org/apache/activemq...               2   \n",
       "activemq-core/src/main/java/org/apache/activemq...               4   \n",
       "activemq-openwire-generator/src/main/java/org/a...               1   \n",
       "\n",
       "                                                    OWN_COMMIT  \\\n",
       "File                                                             \n",
       "activemq-core/src/test/java/org/apache/activemq...    0.800000   \n",
       "activemq-core/src/main/java/org/apache/activemq...    0.500000   \n",
       "activemq-openwire-generator/src/main/java/org/a...    0.666667   \n",
       "\n",
       "                                                    MaxInheritanceTree  \\\n",
       "File                                                                     \n",
       "activemq-core/src/test/java/org/apache/activemq...                   6   \n",
       "activemq-core/src/main/java/org/apache/activemq...                   1   \n",
       "activemq-openwire-generator/src/main/java/org/a...                   2   \n",
       "\n",
       "                                                    CountDeclMethodPrivate  \\\n",
       "File                                                                         \n",
       "activemq-core/src/test/java/org/apache/activemq...                       0   \n",
       "activemq-core/src/main/java/org/apache/activemq...                       5   \n",
       "activemq-openwire-generator/src/main/java/org/a...                       0   \n",
       "\n",
       "                                                    MINOR_COMMIT  \\\n",
       "File                                                               \n",
       "activemq-core/src/test/java/org/apache/activemq...             0   \n",
       "activemq-core/src/main/java/org/apache/activemq...             0   \n",
       "activemq-openwire-generator/src/main/java/org/a...             0   \n",
       "\n",
       "                                                    AvgEssential  COMM  \\\n",
       "File                                                                     \n",
       "activemq-core/src/test/java/org/apache/activemq...             1     5   \n",
       "activemq-core/src/main/java/org/apache/activemq...             1    10   \n",
       "activemq-openwire-generator/src/main/java/org/a...             1     3   \n",
       "\n",
       "                                                    RatioCommentToCode  \\\n",
       "File                                                                     \n",
       "activemq-core/src/test/java/org/apache/activemq...                0.27   \n",
       "activemq-core/src/main/java/org/apache/activemq...                0.42   \n",
       "activemq-openwire-generator/src/main/java/org/a...                0.15   \n",
       "\n",
       "                                                    RealBug  PredictedBug  \n",
       "File                                                                       \n",
       "activemq-core/src/test/java/org/apache/activemq...     True          True  \n",
       "activemq-core/src/main/java/org/apache/activemq...     True          True  \n",
       "activemq-openwire-generator/src/main/java/org/a...     True          True  \n",
       "\n",
       "[3 rows x 29 columns]"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "correctly_predicted_bug = correctly_predicted_data[correctly_predicted_data['RealBug']==True]\n",
    "correctly_predicted_bug.head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "systematic-equipment",
   "metadata": {},
   "source": [
    "### 3.4 Define feature cols and label col using correctly predicted testing data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "shaped-confirmation",
   "metadata": {},
   "outputs": [],
   "source": [
    "# select all rows and feature cols\n",
    "feature_cols = correctly_predicted_bug.iloc[:, :-2]\n",
    "# selected all rows and one label col (either RealBug or PredictedBug is fine since they are the same)\n",
    "label_col = correctly_predicted_bug.iloc[:, -2]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "vocational-spending",
   "metadata": {},
   "source": [
    "### 3.5 Select one row of correctly predicted bug to be explained"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "cross-rhythm",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "one row of feature: \n",
      "\n",
      "                                                     CountClassCoupled  \\\n",
      "File                                                                    \n",
      "activemq-core/src/test/java/org/apache/activemq...                 12   \n",
      "\n",
      "                                                    OWN_LINE  \\\n",
      "File                                                           \n",
      "activemq-core/src/test/java/org/apache/activemq...  0.738916   \n",
      "\n",
      "                                                    CountDeclMethodProtected  \\\n",
      "File                                                                           \n",
      "activemq-core/src/test/java/org/apache/activemq...                         3   \n",
      "\n",
      "                                                    CountDeclInstanceVariable  \\\n",
      "File                                                                            \n",
      "activemq-core/src/test/java/org/apache/activemq...                          2   \n",
      "\n",
      "                                                    PercentLackOfCohesion  \\\n",
      "File                                                                        \n",
      "activemq-core/src/test/java/org/apache/activemq...                     77   \n",
      "\n",
      "                                                    CountDeclClass  \\\n",
      "File                                                                 \n",
      "activemq-core/src/test/java/org/apache/activemq...               3   \n",
      "\n",
      "                                                    MAJOR_LINE  AvgLineBlank  \\\n",
      "File                                                                           \n",
      "activemq-core/src/test/java/org/apache/activemq...           1             1   \n",
      "\n",
      "                                                    CountDeclMethodPublic  \\\n",
      "File                                                                        \n",
      "activemq-core/src/test/java/org/apache/activemq...                      8   \n",
      "\n",
      "                                                    CountInput_Mean  ...  \\\n",
      "File                                                                 ...   \n",
      "activemq-core/src/test/java/org/apache/activemq...         2.181818  ...   \n",
      "\n",
      "                                                    AvgLineComment  \\\n",
      "File                                                                 \n",
      "activemq-core/src/test/java/org/apache/activemq...               2   \n",
      "\n",
      "                                                    CountDeclClassVariable  \\\n",
      "File                                                                         \n",
      "activemq-core/src/test/java/org/apache/activemq...                       1   \n",
      "\n",
      "                                                    CountClassBase  \\\n",
      "File                                                                 \n",
      "activemq-core/src/test/java/org/apache/activemq...               2   \n",
      "\n",
      "                                                    OWN_COMMIT  \\\n",
      "File                                                             \n",
      "activemq-core/src/test/java/org/apache/activemq...         0.8   \n",
      "\n",
      "                                                    MaxInheritanceTree  \\\n",
      "File                                                                     \n",
      "activemq-core/src/test/java/org/apache/activemq...                   6   \n",
      "\n",
      "                                                    CountDeclMethodPrivate  \\\n",
      "File                                                                         \n",
      "activemq-core/src/test/java/org/apache/activemq...                       0   \n",
      "\n",
      "                                                    MINOR_COMMIT  \\\n",
      "File                                                               \n",
      "activemq-core/src/test/java/org/apache/activemq...             0   \n",
      "\n",
      "                                                    AvgEssential  COMM  \\\n",
      "File                                                                     \n",
      "activemq-core/src/test/java/org/apache/activemq...             1     5   \n",
      "\n",
      "                                                    RatioCommentToCode  \n",
      "File                                                                    \n",
      "activemq-core/src/test/java/org/apache/activemq...                0.27  \n",
      "\n",
      "[1 rows x 27 columns] \n",
      "\n",
      "one row of label: \n",
      "\n",
      " File\n",
      "activemq-core/src/test/java/org/apache/activemq/transport/fanout/FanoutTransportBrokerTest.java    True\n",
      "Name: RealBug, dtype: bool\n"
     ]
    }
   ],
   "source": [
    "# decide which row to be selected\n",
    "selected_row = 0\n",
    "# select the row in X_test which contains all of the feature values\n",
    "X_explain = feature_cols.iloc[[selected_row]]\n",
    "# select the corresponding label from the DataFrame that we just created above\n",
    "y_explain = label_col.iloc[[selected_row]]\n",
    "print('one row of feature:', '\\n\\n', X_explain, '\\n')\n",
    "print('one row of label:', '\\n\\n', y_explain)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "second-overhead",
   "metadata": {},
   "source": [
    "## 4. Create rules (explanations) and visualise it !"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "private-symbol",
   "metadata": {},
   "source": [
    "### 4.1 Initialise a PyExplainer object"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "coastal-assist",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyexplainer import pyexplainer_pyexplainer\n",
    "\n",
    "py_explainer = pyexplainer_pyexplainer.PyExplainer(X_train = X_train,\n",
    "                                                   y_train = y_train,\n",
    "                                                   indep = X_train.columns,\n",
    "                                                   dep = 'RealBug',\n",
    "                                                   blackbox_model = rf_model)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fatal-leather",
   "metadata": {},
   "source": [
    "### 4.2 Create rules by triggering explain function under PyExplainer object\n",
    "##### Attention: This step can be time-consuming"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "floating-milwaukee",
   "metadata": {},
   "outputs": [],
   "source": [
    "rules = py_explainer.explain(X_explain=X_explain,\n",
    "                             y_explain=y_explain,\n",
    "                             search_function='crossoverinterpolation')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "interim-banner",
   "metadata": {},
   "source": [
    "##### Those created rules are stored in a dictionary, for more information about what is contained in each key, please refer to 'Appendix' part"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "short-atlanta",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dict_keys(['synthetic_data', 'synthetic_predictions', 'X_explain', 'y_explain', 'indep', 'dep', 'top_k_positive_rules', 'top_k_negative_rules', 'local_rulefit_model'])"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rules.keys()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "desperate-profit",
   "metadata": {},
   "source": [
    "### 4.3 Simply trigger visualise function under PyExplainer object to visualise the created rules "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "fourth-queensland",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "a2baa847c99441bd982b8792e092f003",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(Label(value='Risk Score: '), FloatProgress(value=0.0, bar_style='info', layout=Layout(width='40…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "07d0fafe28994776bc5666a43c492587",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Output(layout=Layout(border='3px solid black'))"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "f6f9d182fd2949dbbcc534fd2bfd17d9",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "FloatSlider(value=1.0, continuous_update=False, description='#1 The value of CountDeclClassVariable is more th…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "25fbbd2765cc4afba015f97b46dfdfc8",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "FloatSlider(value=0.0, continuous_update=False, description='#2 The value of CountDeclMethodPrivate is more th…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "py_explainer.visualise(rules)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "olive-filename",
   "metadata": {},
   "source": [
    "# Appendix"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "closed-draft",
   "metadata": {},
   "source": [
    "## The detail of variables used to to create PyExplainer"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "sufficient-vault",
   "metadata": {},
   "source": [
    "### Synthetic_data\n",
    "\n",
    "Synthetic_data is data that are generated by PyExplainer using one of the following approaches.\n",
    "\n",
    "1. Crossover and Interpolation\n",
    "2. Random Perturbation.\n",
    "\n",
    "After Synthetic_data is generated, it is stored as a pandas DataFrame object. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "vital-dynamics",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Type of pyExp_rule_obj['synthetic_data'] -  <class 'pandas.core.frame.DataFrame'> \n",
      "\n",
      "Example\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>CountClassCoupled</th>\n",
       "      <th>OWN_LINE</th>\n",
       "      <th>CountDeclMethodProtected</th>\n",
       "      <th>CountDeclInstanceVariable</th>\n",
       "      <th>PercentLackOfCohesion</th>\n",
       "      <th>CountDeclClass</th>\n",
       "      <th>MAJOR_LINE</th>\n",
       "      <th>AvgLineBlank</th>\n",
       "      <th>CountDeclMethodPublic</th>\n",
       "      <th>CountInput_Mean</th>\n",
       "      <th>...</th>\n",
       "      <th>AvgLineComment</th>\n",
       "      <th>CountDeclClassVariable</th>\n",
       "      <th>CountClassBase</th>\n",
       "      <th>OWN_COMMIT</th>\n",
       "      <th>MaxInheritanceTree</th>\n",
       "      <th>CountDeclMethodPrivate</th>\n",
       "      <th>MINOR_COMMIT</th>\n",
       "      <th>AvgEssential</th>\n",
       "      <th>COMM</th>\n",
       "      <th>RatioCommentToCode</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.75</td>\n",
       "      <td>1.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>42.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>3.57</td>\n",
       "      <td>...</td>\n",
       "      <td>3.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.8</td>\n",
       "      <td>3.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>0.27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.0</td>\n",
       "      <td>1.00</td>\n",
       "      <td>0.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>68.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>2.60</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0.64</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>2 rows × 27 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   CountClassCoupled  OWN_LINE  CountDeclMethodProtected  \\\n",
       "0                1.0      0.75                       1.0   \n",
       "1                1.0      1.00                       0.0   \n",
       "\n",
       "   CountDeclInstanceVariable  PercentLackOfCohesion  CountDeclClass  \\\n",
       "0                        2.0                   42.0             1.0   \n",
       "1                        2.0                   68.0             1.0   \n",
       "\n",
       "   MAJOR_LINE  AvgLineBlank  CountDeclMethodPublic  CountInput_Mean  ...  \\\n",
       "0         0.0           3.0                    6.0             3.57  ...   \n",
       "1         0.0           0.0                    5.0             2.60  ...   \n",
       "\n",
       "   AvgLineComment  CountDeclClassVariable  CountClassBase  OWN_COMMIT  \\\n",
       "0             3.0                     1.0             1.0         0.8   \n",
       "1             0.0                     3.0             1.0         1.0   \n",
       "\n",
       "   MaxInheritanceTree  CountDeclMethodPrivate  MINOR_COMMIT  AvgEssential  \\\n",
       "0                 3.0                     0.0           0.0           1.0   \n",
       "1                 6.0                     0.0           0.0           1.0   \n",
       "\n",
       "   COMM  RatioCommentToCode  \n",
       "0   5.0                0.27  \n",
       "1   2.0                0.64  \n",
       "\n",
       "[2 rows x 27 columns]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "print(\"Type of pyExp_rule_obj['synthetic_data'] - \", type(rules['synthetic_data']), \"\\n\")\n",
    "\n",
    "print('Example')\n",
    "display(rules['synthetic_data'].head(2))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "quality-examination",
   "metadata": {},
   "source": [
    "### Synthetic_predictions\n",
    "\n",
    "Synthetic_predictions is the prediction of Synthetic_data, which is obtained from the global model inside PyExplainer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "funny-ranch",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Type of pyExp_rule_obj['synthetic_predictions'] -  <class 'numpy.ndarray'> \n",
      "\n",
      "Example \n",
      "\n",
      " [ True False False ... False False False]\n"
     ]
    }
   ],
   "source": [
    "print(\"Type of pyExp_rule_obj['synthetic_predictions'] - \", type(rules['synthetic_predictions']), \"\\n\")\n",
    "print(\"Example\", \"\\n\\n\", rules['synthetic_predictions'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "removable-behalf",
   "metadata": {},
   "source": [
    "### X_explain\n",
    "\n",
    "X_explain is an instance to be explained (which is a defective commit in this context)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "compound-jewel",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Type of pyExp_rule_obj['X_explain'] -  <class 'pandas.core.frame.DataFrame'> \n",
      "\n",
      "Example\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>CountClassCoupled</th>\n",
       "      <th>OWN_LINE</th>\n",
       "      <th>CountDeclMethodProtected</th>\n",
       "      <th>CountDeclInstanceVariable</th>\n",
       "      <th>PercentLackOfCohesion</th>\n",
       "      <th>CountDeclClass</th>\n",
       "      <th>MAJOR_LINE</th>\n",
       "      <th>AvgLineBlank</th>\n",
       "      <th>CountDeclMethodPublic</th>\n",
       "      <th>CountInput_Mean</th>\n",
       "      <th>...</th>\n",
       "      <th>AvgLineComment</th>\n",
       "      <th>CountDeclClassVariable</th>\n",
       "      <th>CountClassBase</th>\n",
       "      <th>OWN_COMMIT</th>\n",
       "      <th>MaxInheritanceTree</th>\n",
       "      <th>CountDeclMethodPrivate</th>\n",
       "      <th>MINOR_COMMIT</th>\n",
       "      <th>AvgEssential</th>\n",
       "      <th>COMM</th>\n",
       "      <th>RatioCommentToCode</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>File</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>activemq-core/src/test/java/org/apache/activemq/transport/fanout/FanoutTransportBrokerTest.java</th>\n",
       "      <td>12</td>\n",
       "      <td>0.738916</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>77</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>8</td>\n",
       "      <td>2.181818</td>\n",
       "      <td>...</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>0.8</td>\n",
       "      <td>6</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>0.27</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1 rows × 27 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                    CountClassCoupled  \\\n",
       "File                                                                    \n",
       "activemq-core/src/test/java/org/apache/activemq...                 12   \n",
       "\n",
       "                                                    OWN_LINE  \\\n",
       "File                                                           \n",
       "activemq-core/src/test/java/org/apache/activemq...  0.738916   \n",
       "\n",
       "                                                    CountDeclMethodProtected  \\\n",
       "File                                                                           \n",
       "activemq-core/src/test/java/org/apache/activemq...                         3   \n",
       "\n",
       "                                                    CountDeclInstanceVariable  \\\n",
       "File                                                                            \n",
       "activemq-core/src/test/java/org/apache/activemq...                          2   \n",
       "\n",
       "                                                    PercentLackOfCohesion  \\\n",
       "File                                                                        \n",
       "activemq-core/src/test/java/org/apache/activemq...                     77   \n",
       "\n",
       "                                                    CountDeclClass  \\\n",
       "File                                                                 \n",
       "activemq-core/src/test/java/org/apache/activemq...               3   \n",
       "\n",
       "                                                    MAJOR_LINE  AvgLineBlank  \\\n",
       "File                                                                           \n",
       "activemq-core/src/test/java/org/apache/activemq...           1             1   \n",
       "\n",
       "                                                    CountDeclMethodPublic  \\\n",
       "File                                                                        \n",
       "activemq-core/src/test/java/org/apache/activemq...                      8   \n",
       "\n",
       "                                                    CountInput_Mean  ...  \\\n",
       "File                                                                 ...   \n",
       "activemq-core/src/test/java/org/apache/activemq...         2.181818  ...   \n",
       "\n",
       "                                                    AvgLineComment  \\\n",
       "File                                                                 \n",
       "activemq-core/src/test/java/org/apache/activemq...               2   \n",
       "\n",
       "                                                    CountDeclClassVariable  \\\n",
       "File                                                                         \n",
       "activemq-core/src/test/java/org/apache/activemq...                       1   \n",
       "\n",
       "                                                    CountClassBase  \\\n",
       "File                                                                 \n",
       "activemq-core/src/test/java/org/apache/activemq...               2   \n",
       "\n",
       "                                                    OWN_COMMIT  \\\n",
       "File                                                             \n",
       "activemq-core/src/test/java/org/apache/activemq...         0.8   \n",
       "\n",
       "                                                    MaxInheritanceTree  \\\n",
       "File                                                                     \n",
       "activemq-core/src/test/java/org/apache/activemq...                   6   \n",
       "\n",
       "                                                    CountDeclMethodPrivate  \\\n",
       "File                                                                         \n",
       "activemq-core/src/test/java/org/apache/activemq...                       0   \n",
       "\n",
       "                                                    MINOR_COMMIT  \\\n",
       "File                                                               \n",
       "activemq-core/src/test/java/org/apache/activemq...             0   \n",
       "\n",
       "                                                    AvgEssential  COMM  \\\n",
       "File                                                                     \n",
       "activemq-core/src/test/java/org/apache/activemq...             1     5   \n",
       "\n",
       "                                                    RatioCommentToCode  \n",
       "File                                                                    \n",
       "activemq-core/src/test/java/org/apache/activemq...                0.27  \n",
       "\n",
       "[1 rows x 27 columns]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "print(\"Type of pyExp_rule_obj['X_explain'] - \", type(rules['X_explain']), \"\\n\")\n",
    "\n",
    "print('Example')\n",
    "display(rules['X_explain'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ongoing-drilling",
   "metadata": {},
   "source": [
    "### y_explain\n",
    "\n",
    "y_explain is a label of X_explain "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "antique-straight",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Type of pyExp_rule_obj['y_explain'] -  <class 'pandas.core.series.Series'> \n",
      "\n",
      "Example \n",
      "\n",
      " File\n",
      "activemq-core/src/test/java/org/apache/activemq/transport/fanout/FanoutTransportBrokerTest.java    True\n",
      "Name: RealBug, dtype: bool\n"
     ]
    }
   ],
   "source": [
    "print(\"Type of pyExp_rule_obj['y_explain'] - \", type(rules['y_explain']), \"\\n\")\n",
    "print(\"Example\", \"\\n\\n\", rules['y_explain'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "narrative-senator",
   "metadata": {},
   "source": [
    "### indep\n",
    "#### indep is feature names of X_explain"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "sized-valuable",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Type of pyExp_rule_obj['indep'] -  <class 'pandas.core.indexes.base.Index'> \n",
      "\n",
      "Example \n",
      "\n",
      " Index(['CountClassCoupled', 'OWN_LINE', 'CountDeclMethodProtected',\n",
      "       'CountDeclInstanceVariable', 'PercentLackOfCohesion', 'CountDeclClass',\n",
      "       'MAJOR_LINE', 'AvgLineBlank', 'CountDeclMethodPublic',\n",
      "       'CountInput_Mean', 'MaxNesting_Min', 'CountOutput_Min',\n",
      "       'CountDeclMethodDefault', 'AvgCyclomaticModified', 'CountInput_Min',\n",
      "       'CountDeclClassMethod', 'CountClassDerived', 'AvgLineComment',\n",
      "       'CountDeclClassVariable', 'CountClassBase', 'OWN_COMMIT',\n",
      "       'MaxInheritanceTree', 'CountDeclMethodPrivate', 'MINOR_COMMIT',\n",
      "       'AvgEssential', 'COMM', 'RatioCommentToCode'],\n",
      "      dtype='object')\n"
     ]
    }
   ],
   "source": [
    "print(\"Type of pyExp_rule_obj['indep'] - \", type(rules['indep']), \"\\n\")\n",
    "print(\"Example\", \"\\n\\n\", rules['indep'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "atlantic-charles",
   "metadata": {},
   "source": [
    "### dep\n",
    "#### dep is a label name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "moral-collectible",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Type of pyExp_rule_obj['dep'] -  <class 'str'> \n",
      "\n",
      "Example \n",
      "\n",
      " RealBug\n"
     ]
    }
   ],
   "source": [
    "print(\"Type of pyExp_rule_obj['dep'] - \", type(rules['dep']), \"\\n\")\n",
    "print(\"Example\", \"\\n\\n\", rules['dep'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "compact-willow",
   "metadata": {},
   "source": [
    "### top_k_positive_rules\n",
    "\n",
    "top_k_positive_rules is top-k rules that are genereated by PyExplainer to explain why a commit is predicted as defective.\n",
    "\n",
    "Here we show top-3 rules that lead to defective commits="
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "measured-miller",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Type of pyExp_rule_obj['top_k_positive_rules'] -  <class 'pandas.core.frame.DataFrame'> \n",
      "\n",
      "Example\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>index</th>\n",
       "      <th>rule</th>\n",
       "      <th>type</th>\n",
       "      <th>coef</th>\n",
       "      <th>support</th>\n",
       "      <th>importance</th>\n",
       "      <th>is_satisfy_instance</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>161</td>\n",
       "      <td>AvgLineComment &gt; -3.6149998903274536 &amp; CountCl...</td>\n",
       "      <td>rule</td>\n",
       "      <td>3.490109e-23</td>\n",
       "      <td>0.271540</td>\n",
       "      <td>1.552240e-23</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>562</td>\n",
       "      <td>OWN_COMMIT &lt;= 0.8349999785423279 &amp; COMM &gt; 2.98...</td>\n",
       "      <td>rule</td>\n",
       "      <td>3.639777e-23</td>\n",
       "      <td>0.208877</td>\n",
       "      <td>1.479593e-23</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>820</td>\n",
       "      <td>CountClassBase &lt;= 2.9850000143051147 &amp; OWN_COM...</td>\n",
       "      <td>rule</td>\n",
       "      <td>3.484802e-23</td>\n",
       "      <td>0.232376</td>\n",
       "      <td>1.471797e-23</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   index                                               rule  type  \\\n",
       "0    161  AvgLineComment > -3.6149998903274536 & CountCl...  rule   \n",
       "1    562  OWN_COMMIT <= 0.8349999785423279 & COMM > 2.98...  rule   \n",
       "2    820  CountClassBase <= 2.9850000143051147 & OWN_COM...  rule   \n",
       "\n",
       "           coef   support    importance is_satisfy_instance  \n",
       "0  3.490109e-23  0.271540  1.552240e-23                True  \n",
       "1  3.639777e-23  0.208877  1.479593e-23                True  \n",
       "2  3.484802e-23  0.232376  1.471797e-23                True  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "print(\"Type of pyExp_rule_obj['top_k_positive_rules'] - \", type(rules['top_k_positive_rules']), \"\\n\")\n",
    "print('Example')\n",
    "display(rules['top_k_positive_rules'].head(3))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "employed-choir",
   "metadata": {},
   "source": [
    "### top_k_negative_rules\n",
    "\n",
    "top_k_negative_rules holds the top-k negative rules that PyExplainer generates to explain why a commit is predicted as clean.\n",
    "\n",
    "The default number of generated rules is 3.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "opposite-ownership",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Type of pyExp_rule_obj['top_k_negative_rules'] -  <class 'pandas.core.frame.DataFrame'> \n",
      "\n",
      "Example\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>rule</th>\n",
       "      <th>type</th>\n",
       "      <th>coef</th>\n",
       "      <th>support</th>\n",
       "      <th>importance</th>\n",
       "      <th>Class</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>918</th>\n",
       "      <td>OWN_COMMIT &gt; 0.8550000190734863 &amp; CountDeclCla...</td>\n",
       "      <td>rule</td>\n",
       "      <td>-4.819474e-23</td>\n",
       "      <td>0.678851</td>\n",
       "      <td>2.250298e-23</td>\n",
       "      <td>Clean</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1652</th>\n",
       "      <td>OWN_COMMIT &gt; 0.8650000095367432</td>\n",
       "      <td>rule</td>\n",
       "      <td>-4.748976e-23</td>\n",
       "      <td>0.689295</td>\n",
       "      <td>2.197742e-23</td>\n",
       "      <td>Clean</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1107</th>\n",
       "      <td>CountDeclMethodPrivate &lt;= 1.8399999737739563 &amp;...</td>\n",
       "      <td>rule</td>\n",
       "      <td>-4.863102e-23</td>\n",
       "      <td>0.725849</td>\n",
       "      <td>2.169360e-23</td>\n",
       "      <td>Clean</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                   rule  type          coef  \\\n",
       "918   OWN_COMMIT > 0.8550000190734863 & CountDeclCla...  rule -4.819474e-23   \n",
       "1652                    OWN_COMMIT > 0.8650000095367432  rule -4.748976e-23   \n",
       "1107  CountDeclMethodPrivate <= 1.8399999737739563 &...  rule -4.863102e-23   \n",
       "\n",
       "       support    importance  Class  \n",
       "918   0.678851  2.250298e-23  Clean  \n",
       "1652  0.689295  2.197742e-23  Clean  \n",
       "1107  0.725849  2.169360e-23  Clean  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "print(\"Type of pyExp_rule_obj['top_k_negative_rules'] - \", type(rules['top_k_negative_rules']), \"\\n\")\n",
    "print('Example')\n",
    "display(rules['top_k_negative_rules'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "written-insertion",
   "metadata": {},
   "source": [
    "# Bug Report Channel\n",
    "#### Please report bugs on the <a href=\"https://github.com/awsm-research/pyExplainer/issues\">GitHub issue tracker</a>\n",
    "#### 📧 Or email your report to michaelfu1998@gmail.com"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.5"
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}