{ "cells": [ { "cell_type": "markdown", "id": "italic-seafood", "metadata": {}, "source": [ "# <center> Welcome to PyExplainer Quickstart Guide </center>" ] }, { "cell_type": "markdown", "id": "northern-birmingham", "metadata": {}, "source": [ "# Top Note - MUST READ !!\n", "#### 1. When initialising the PyExplainer object, prepare the following 5 required parameters with these data types\n", "(1) X_train (pd.core.frame.DataFrame) - feature columns of the training data <br><br> \n", "(2) y_train (pd.core.series.Series) - label column of the training data <br><br>\n", "(3) indep (pd.core.indexes.base.Index) - names of the feature columns; in most cases you can obtain them with 'X_train.columns' <br><br>\n", "(4) dep (str) - name of the label column<br><br>\n", "(5) blackbox_model (a supervised classification model trained with sklearn) - the global model to be explained<br><br>\n", "\n", "#### 2. When calling the explain() method of the PyExplainer object, prepare the following 2 parameters with these data types\n", "(1) X_explain (pd.core.frame.DataFrame) - one row of feature data <br><br> \n", "(2) y_explain (pd.core.series.Series) - the predicted label for that same row \n", "\n", "#### 3. Be careful when using a custom pandas index for Series and DataFrame \n", "In our Full Tutorial (PART B) example, the File column is used as the custom index.<br> \n", "A custom index is optional: if you don't set one, pandas generates a default row index starting from 0.<br><br>\n", "If you do use a custom index, apply it consistently throughout your data processing.<br><br>\n", "Otherwise, some of your data may carry the pandas default index while the rest carries your custom index, which will cause errors or misaligned (NaN) rows whenever you combine your DataFrames and Series. 
" ] }, { "cell_type": "markdown", "id": "conscious-roommate", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "id": "employed-intensity", "metadata": {}, "source": [ "# PART A - Quick Start" ] }, { "cell_type": "markdown", "id": "deadly-immigration", "metadata": {}, "source": [ "## 1. Prepare data and model\n", "\n", "Note: this example uses the default data and model provided by PyExplainer." ] }, { "cell_type": "markdown", "id": "higher-cooling", "metadata": {}, "source": [ "### 1.1 Import required library" ] }, { "cell_type": "code", "execution_count": 1, "id": "harmful-morocco", "metadata": {}, "outputs": [], "source": [ "from pyexplainer import pyexplainer_pyexplainer" ] }, { "cell_type": "markdown", "id": "practical-jenny", "metadata": {}, "source": [ "### 1.2 Obtain default dataset and global model (Random Forest), then initialise PyExplainer" ] }, { "cell_type": "code", "execution_count": 2, "id": "employed-adobe", "metadata": {}, "outputs": [], "source": [ "default_data_and_model = pyexplainer_pyexplainer.get_dflt()\n", "py_explainer = pyexplainer_pyexplainer.PyExplainer(X_train=default_data_and_model['X_train'],\n", " y_train=default_data_and_model['y_train'],\n", " indep=default_data_and_model['indep'],\n", " dep=default_data_and_model['dep'],\n", " blackbox_model=default_data_and_model['blackbox_model'])" ] }, { "cell_type": "markdown", "id": "informational-disclosure", "metadata": {}, "source": [ "## 2. 
Explain an instance with the PyExplainer object " ] }, { "cell_type": "markdown", "id": "anticipated-tourism", "metadata": {}, "source": [ "### 2.1 Prepare the data to explain (X_explain, y_explain)" ] }, { "cell_type": "code", "execution_count": 3, "id": "abstract-disposal", "metadata": {}, "outputs": [], "source": [ "X_explain = default_data_and_model['X_explain']\n", "y_explain = default_data_and_model['y_explain']" ] }, { "cell_type": "markdown", "id": "inside-republic", "metadata": {}, "source": [ "### 2.2 Create rules" ] }, { "cell_type": "code", "execution_count": 4, "id": "spatial-wrong", "metadata": {}, "outputs": [], "source": [ "created_rules = py_explainer.explain(X_explain=X_explain,\n", " y_explain=y_explain,\n", " search_function='crossoverinterpolation')" ] }, { "cell_type": "markdown", "id": "engaging-ridge", "metadata": {}, "source": [ "## 3. Create interactive visualization\n", "\n", "Drag the sliders to change feature values and observe how the risk score changes." ] }, { "cell_type": "code", "execution_count": 33, "id": "second-trout", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "9feefd892e174452bc28777e0ad4dfba", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(Label(value='Risk Score: '), FloatProgress(value=0.0, bar_style='info', layout=Layout(width='40…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "07d0fafe28994776bc5666a43c492587", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Output(layout=Layout(border='3px solid black'), outputs=({'output_type': 'display_data', 'data': {'text/plain'…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "7f0ae3bba93d4ffbb4a944874bfcbe74", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatSlider(value=0.0, continuous_update=False, description='#1 The value of CountDeclInstanceVariable 
is more…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "012c0a572d1e440b8bc3d847bed27af6", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatSlider(value=90.0, continuous_update=False, description='#2 The value of PercentLackOfCohesion is more th…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "fc2862721c44469c90a106c385682435", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatSlider(value=7.0, continuous_update=False, description='#3 The value of CountDeclMethodPublic is more tha…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "py_explainer.visualise(created_rules)" ] }, { "cell_type": "markdown", "id": "894f3a9a", "metadata": {}, "source": [ "##### *** If the widget is not displayed, please run the code cell below, restart the notebook, and rerun from the top" ] }, { "cell_type": "code", "execution_count": null, "id": "9db21a4d", "metadata": {}, "outputs": [], "source": [ "import os \n", "os.system(\"jupyter nbextension enable --py widgetsnbextension\")" ] }, { "cell_type": "markdown", "id": "a15de81b", "metadata": {}, "source": [ "# PART B - Full Tutorial" ] }, { "cell_type": "markdown", "id": "italic-services", "metadata": {}, "source": [ "## 1. 
Prepare sample data and model" ] }, { "cell_type": "markdown", "id": "front-programmer", "metadata": {}, "source": [ "### 1.1 For simplicity, we load the sample DataFrame that ships with the package" ] }, { "cell_type": "code", "execution_count": 6, "id": "removable-problem", "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>File</th>\n", " <th>CountDeclMethodPrivate</th>\n", " <th>AvgLineCode</th>\n", " <th>CountLine</th>\n", " <th>MaxCyclomatic</th>\n", " <th>CountDeclMethodDefault</th>\n", " <th>AvgEssential</th>\n", " <th>CountDeclClassVariable</th>\n", " <th>SumCyclomaticStrict</th>\n", " <th>AvgCyclomatic</th>\n", " <th>...</th>\n", " <th>OWN_LINE</th>\n", " <th>OWN_COMMIT</th>\n", " <th>MINOR_COMMIT</th>\n", " <th>MINOR_LINE</th>\n", " <th>MAJOR_COMMIT</th>\n", " <th>MAJOR_LINE</th>\n", " <th>RealBug</th>\n", " <th>HeuBug</th>\n", " <th>HeuBugCount</th>\n", " <th>RealBugCount</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>activemq-console/src/main/java/org/apache/acti...</td>\n", " <td>0</td>\n", " <td>10</td>\n", " <td>171</td>\n", " <td>5</td>\n", " <td>0</td>\n", " <td>2</td>\n", " <td>0</td>\n", " <td>18</td>\n", " <td>2</td>\n", " <td>...</td>\n", " <td>1.00000</td>\n", " <td>1.0</td>\n", " <td>0</td>\n", " <td>1</td>\n", " <td>1</td>\n", " <td>0</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>0</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>activemq-console/src/main/java/org/apache/acti...</td>\n", " <td>0</td>\n", " <td>8</td>\n", " <td>123</td>\n", " <td>5</td>\n", " 
<td>0</td>\n", " <td>1</td>\n", " <td>1</td>\n", " <td>15</td>\n", " <td>3</td>\n", " <td>...</td>\n", " <td>0.98374</td>\n", " <td>0.5</td>\n", " <td>0</td>\n", " <td>1</td>\n", " <td>2</td>\n", " <td>1</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>0</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>activemq-console/src/main/java/org/apache/acti...</td>\n", " <td>0</td>\n", " <td>7</td>\n", " <td>136</td>\n", " <td>5</td>\n", " <td>0</td>\n", " <td>1</td>\n", " <td>1</td>\n", " <td>16</td>\n", " <td>2</td>\n", " <td>...</td>\n", " <td>1.00000</td>\n", " <td>1.0</td>\n", " <td>0</td>\n", " <td>1</td>\n", " <td>1</td>\n", " <td>0</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>0</td>\n", " <td>0</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "<p>3 rows × 70 columns</p>\n", "</div>" ], "text/plain": [ " File CountDeclMethodPrivate \\\n", "0 activemq-console/src/main/java/org/apache/acti... 0 \n", "1 activemq-console/src/main/java/org/apache/acti... 0 \n", "2 activemq-console/src/main/java/org/apache/acti... 0 \n", "\n", " AvgLineCode CountLine MaxCyclomatic CountDeclMethodDefault \\\n", "0 10 171 5 0 \n", "1 8 123 5 0 \n", "2 7 136 5 0 \n", "\n", " AvgEssential CountDeclClassVariable SumCyclomaticStrict AvgCyclomatic \\\n", "0 2 0 18 2 \n", "1 1 1 15 3 \n", "2 1 1 16 2 \n", "\n", " ... OWN_LINE OWN_COMMIT MINOR_COMMIT MINOR_LINE MAJOR_COMMIT \\\n", "0 ... 1.00000 1.0 0 1 1 \n", "1 ... 0.98374 0.5 0 1 2 \n", "2 ... 
1.00000 1.0 0 1 1 \n", "\n", " MAJOR_LINE RealBug HeuBug HeuBugCount RealBugCount \n", "0 0 False False 0 0 \n", "1 1 False False 0 0 \n", "2 0 False False 0 0 \n", "\n", "[3 rows x 70 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "import numpy as np\n", "from pyexplainer import pyexplainer_pyexplainer\n", "\n", "df = pyexplainer_pyexplainer.load_sample_data()\n", "df.head(3)" ] }, { "cell_type": "markdown", "id": "connected-occurrence", "metadata": {}, "source": [ "### 1.2 Define index column (OPTIONAL) and drop unwanted columns\n", "##### First, we set the 'File' column as the index, since it identifies the file we want to inspect and is neither a feature nor the label\n", "##### We use 'RealBug' as the label column and the columns before 'RealBug' as feature columns\n", "##### Then we drop the unnecessary columns (i.e. File, HeuBug, HeuBugCount, RealBugCount)" ] }, { "cell_type": "code", "execution_count": 7, "id": "median-moisture", "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>CountDeclMethodPrivate</th>\n", " <th>AvgLineCode</th>\n", " <th>CountLine</th>\n", " <th>MaxCyclomatic</th>\n", " <th>CountDeclMethodDefault</th>\n", " <th>AvgEssential</th>\n", " <th>CountDeclClassVariable</th>\n", " <th>SumCyclomaticStrict</th>\n", " <th>AvgCyclomatic</th>\n", " <th>AvgLine</th>\n", " <th>...</th>\n", " <th>DDEV</th>\n", " <th>Added_lines</th>\n", " <th>Del_lines</th>\n", " <th>OWN_LINE</th>\n", " <th>OWN_COMMIT</th>\n", " <th>MINOR_COMMIT</th>\n", " <th>MINOR_LINE</th>\n", " <th>MAJOR_COMMIT</th>\n", " 
<th>MAJOR_LINE</th>\n", " <th>RealBug</th>\n", " </tr>\n", " <tr>\n", " <th>File</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>activemq-console/src/main/java/org/apache/activemq/console/command/AbstractAmqCommand.java</th>\n", " <td>0</td>\n", " <td>10</td>\n", " <td>171</td>\n", " <td>5</td>\n", " <td>0</td>\n", " <td>2</td>\n", " <td>0</td>\n", " <td>18</td>\n", " <td>2</td>\n", " <td>18</td>\n", " <td>...</td>\n", " <td>1</td>\n", " <td>32</td>\n", " <td>18</td>\n", " <td>1.00000</td>\n", " <td>1.0</td>\n", " <td>0</td>\n", " <td>1</td>\n", " <td>1</td>\n", " <td>0</td>\n", " <td>False</td>\n", " </tr>\n", " <tr>\n", " <th>activemq-console/src/main/java/org/apache/activemq/console/command/AbstractCommand.java</th>\n", " <td>0</td>\n", " <td>8</td>\n", " <td>123</td>\n", " <td>5</td>\n", " <td>0</td>\n", " <td>1</td>\n", " <td>1</td>\n", " <td>15</td>\n", " <td>3</td>\n", " <td>17</td>\n", " <td>...</td>\n", " <td>2</td>\n", " <td>30</td>\n", " <td>28</td>\n", " <td>0.98374</td>\n", " <td>0.5</td>\n", " <td>0</td>\n", " <td>1</td>\n", " <td>2</td>\n", " <td>1</td>\n", " <td>False</td>\n", " </tr>\n", " <tr>\n", " <th>activemq-console/src/main/java/org/apache/activemq/console/command/AbstractJmxCommand.java</th>\n", " <td>0</td>\n", " <td>7</td>\n", " <td>136</td>\n", " <td>5</td>\n", " <td>0</td>\n", " <td>1</td>\n", " <td>1</td>\n", " <td>16</td>\n", " <td>2</td>\n", " <td>13</td>\n", " <td>...</td>\n", " <td>1</td>\n", " <td>8</td>\n", " <td>8</td>\n", " <td>1.00000</td>\n", " <td>1.0</td>\n", " <td>0</td>\n", " <td>1</td>\n", " <td>1</td>\n", " <td>0</td>\n", " <td>False</td>\n", " </tr>\n", " </tbody>\n", 
"</table>\n", "<p>3 rows × 66 columns</p>\n", "</div>" ], "text/plain": [ " CountDeclMethodPrivate \\\n", "File \n", "activemq-console/src/main/java/org/apache/activ... 0 \n", "activemq-console/src/main/java/org/apache/activ... 0 \n", "activemq-console/src/main/java/org/apache/activ... 0 \n", "\n", " AvgLineCode CountLine \\\n", "File \n", "activemq-console/src/main/java/org/apache/activ... 10 171 \n", "activemq-console/src/main/java/org/apache/activ... 8 123 \n", "activemq-console/src/main/java/org/apache/activ... 7 136 \n", "\n", " MaxCyclomatic \\\n", "File \n", "activemq-console/src/main/java/org/apache/activ... 5 \n", "activemq-console/src/main/java/org/apache/activ... 5 \n", "activemq-console/src/main/java/org/apache/activ... 5 \n", "\n", " CountDeclMethodDefault \\\n", "File \n", "activemq-console/src/main/java/org/apache/activ... 0 \n", "activemq-console/src/main/java/org/apache/activ... 0 \n", "activemq-console/src/main/java/org/apache/activ... 0 \n", "\n", " AvgEssential \\\n", "File \n", "activemq-console/src/main/java/org/apache/activ... 2 \n", "activemq-console/src/main/java/org/apache/activ... 1 \n", "activemq-console/src/main/java/org/apache/activ... 1 \n", "\n", " CountDeclClassVariable \\\n", "File \n", "activemq-console/src/main/java/org/apache/activ... 0 \n", "activemq-console/src/main/java/org/apache/activ... 1 \n", "activemq-console/src/main/java/org/apache/activ... 1 \n", "\n", " SumCyclomaticStrict \\\n", "File \n", "activemq-console/src/main/java/org/apache/activ... 18 \n", "activemq-console/src/main/java/org/apache/activ... 15 \n", "activemq-console/src/main/java/org/apache/activ... 16 \n", "\n", " AvgCyclomatic AvgLine \\\n", "File \n", "activemq-console/src/main/java/org/apache/activ... 2 18 \n", "activemq-console/src/main/java/org/apache/activ... 3 17 \n", "activemq-console/src/main/java/org/apache/activ... 2 13 \n", "\n", " ... DDEV Added_lines \\\n", "File ... \n", "activemq-console/src/main/java/org/apache/activ... ... 
1 32 \n", "activemq-console/src/main/java/org/apache/activ... ... 2 30 \n", "activemq-console/src/main/java/org/apache/activ... ... 1 8 \n", "\n", " Del_lines OWN_LINE \\\n", "File \n", "activemq-console/src/main/java/org/apache/activ... 18 1.00000 \n", "activemq-console/src/main/java/org/apache/activ... 28 0.98374 \n", "activemq-console/src/main/java/org/apache/activ... 8 1.00000 \n", "\n", " OWN_COMMIT MINOR_COMMIT \\\n", "File \n", "activemq-console/src/main/java/org/apache/activ... 1.0 0 \n", "activemq-console/src/main/java/org/apache/activ... 0.5 0 \n", "activemq-console/src/main/java/org/apache/activ... 1.0 0 \n", "\n", " MINOR_LINE MAJOR_COMMIT \\\n", "File \n", "activemq-console/src/main/java/org/apache/activ... 1 1 \n", "activemq-console/src/main/java/org/apache/activ... 1 2 \n", "activemq-console/src/main/java/org/apache/activ... 1 1 \n", "\n", " MAJOR_LINE RealBug \n", "File \n", "activemq-console/src/main/java/org/apache/activ... 0 False \n", "activemq-console/src/main/java/org/apache/activ... 1 False \n", "activemq-console/src/main/java/org/apache/activ... 
0 False \n", "\n", "[3 rows x 66 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = df.set_index(df['File'])\n", "df = df.drop(['File', 'HeuBug', 'HeuBugCount', 'RealBugCount'], axis=1)\n", "df.head(3)" ] }, { "cell_type": "markdown", "id": "advance-wilderness", "metadata": {}, "source": [ "### 1.3 Define feature columns (X) and label column (y)\n", "##### The AutoSpearman function is used as a feature selection method: it reduces the number of features by removing highly correlated (Spearman rank correlation) and multicollinear (Variance Inflation Factor) metrics\n", "##### For more information about the algorithm, please refer to [this paper](https://ieeexplore.ieee.org/document/8530020)" ] }, { "cell_type": "code", "execution_count": 8, "id": "growing-scale", "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(Part 1) Automatically select non-correlated metrics based on a Spearman rank correlation test\n", "> Step 1 comparing between CountDeclMethod and CountDeclFunction\n", ">> CountDeclMethod has the average correlation of 0.433 with other metrics\n", ">> CountDeclFunction has the average correlation of 0.433 with other metrics\n", ">> Exclude CountDeclMethod\n", "> Step 2 comparing between MAJOR_COMMIT and DDEV\n", ">> MAJOR_COMMIT has the average correlation of 0.274 with other metrics\n", ">> DDEV has the average correlation of 0.274 with other metrics\n", ">> Exclude DDEV\n", "> Step 3 comparing between SumCyclomatic and SumCyclomaticModified\n", ">> SumCyclomatic has the average correlation of 0.501 with other metrics\n", ">> SumCyclomaticModified has the average correlation of 0.501 with other metrics\n", ">> Exclude SumCyclomatic\n", "> Step 4 comparing between AvgCyclomatic and AvgCyclomaticModified\n", ">> AvgCyclomatic has the average correlation of 0.387 with other metrics\n", ">> AvgCyclomaticModified has the average correlation of 0.387 with other metrics\n", ">> Exclude AvgCyclomatic\n", "> Step 5 comparing between MaxCyclomatic and MaxCyclomaticModified\n", ">> 
MaxCyclomatic has the average correlation of 0.476 with other metrics\n", ">> MaxCyclomaticModified has the average correlation of 0.476 with other metrics\n", ">> Exclude MaxCyclomatic\n", "> Step 6 comparing between SumCyclomaticModified and SumCyclomaticStrict\n", ">> SumCyclomaticModified has the average correlation of 0.488 with other metrics\n", ">> SumCyclomaticStrict has the average correlation of 0.489 with other metrics\n", ">> Exclude SumCyclomaticStrict\n", "> Step 7 comparing between CountStmtDecl and CountLineCodeDecl\n", ">> CountStmtDecl has the average correlation of 0.49 with other metrics\n", ">> CountLineCodeDecl has the average correlation of 0.487 with other metrics\n", ">> Exclude CountStmtDecl\n", "> Step 8 comparing between CountLineCode and CountStmt\n", ">> CountLineCode has the average correlation of 0.504 with other metrics\n", ">> CountStmt has the average correlation of 0.501 with other metrics\n", ">> Exclude CountLineCode\n", "> Step 9 comparing between CountSemicolon and CountStmt\n", ">> CountSemicolon has the average correlation of 0.484 with other metrics\n", ">> CountStmt has the average correlation of 0.492 with other metrics\n", ">> Exclude CountStmt\n", "> Step 10 comparing between OWN_COMMIT and MAJOR_COMMIT\n", ">> OWN_COMMIT has the average correlation of 0.238 with other metrics\n", ">> MAJOR_COMMIT has the average correlation of 0.249 with other metrics\n", ">> Exclude MAJOR_COMMIT\n", "> Step 11 comparing between CountPath_Max and MaxCyclomaticModified\n", ">> CountPath_Max has the average correlation of 0.447 with other metrics\n", ">> MaxCyclomaticModified has the average correlation of 0.448 with other metrics\n", ">> Exclude MaxCyclomaticModified\n", "> Step 12 comparing between CountStmtExe and CountLineCodeExe\n", ">> CountStmtExe has the average correlation of 0.473 with other metrics\n", ">> CountLineCodeExe has the average correlation of 0.475 with other metrics\n", ">> Exclude CountLineCodeExe\n", "> Step 13 
comparing between SumEssential and CountDeclFunction\n", ">> SumEssential has the average correlation of 0.397 with other metrics\n", ">> CountDeclFunction has the average correlation of 0.379 with other metrics\n", ">> Exclude SumEssential\n", "> Step 14 comparing between CountPath_Max and MaxCyclomaticStrict\n", ">> CountPath_Max has the average correlation of 0.427 with other metrics\n", ">> MaxCyclomaticStrict has the average correlation of 0.428 with other metrics\n", ">> Exclude MaxCyclomaticStrict\n", "> Step 15 comparing between CountPath_Max and CountPath_Mean\n", ">> CountPath_Max has the average correlation of 0.416 with other metrics\n", ">> CountPath_Mean has the average correlation of 0.399 with other metrics\n", ">> Exclude CountPath_Max\n", "> Step 16 comparing between AvgCyclomaticStrict and AvgCyclomaticModified\n", ">> AvgCyclomaticStrict has the average correlation of 0.337 with other metrics\n", ">> AvgCyclomaticModified has the average correlation of 0.33 with other metrics\n", ">> Exclude AvgCyclomaticStrict\n", "> Step 17 comparing between CountDeclFunction and CountDeclInstanceMethod\n", ">> CountDeclFunction has the average correlation of 0.364 with other metrics\n", ">> CountDeclInstanceMethod has the average correlation of 0.342 with other metrics\n", ">> Exclude CountDeclFunction\n", "> Step 18 comparing between CountSemicolon and CountLineCodeDecl\n", ">> CountSemicolon has the average correlation of 0.436 with other metrics\n", ">> CountLineCodeDecl has the average correlation of 0.421 with other metrics\n", ">> Exclude CountSemicolon\n", "> Step 19 comparing between CountLine and CountLineBlank\n", ">> CountLine has the average correlation of 0.413 with other metrics\n", ">> CountLineBlank has the average correlation of 0.372 with other metrics\n", ">> Exclude CountLine\n", "> Step 20 comparing between MaxNesting_Mean and CountPath_Mean\n", ">> MaxNesting_Mean has the average correlation of 0.33 with other metrics\n", ">> 
CountPath_Mean has the average correlation of 0.365 with other metrics\n", ">> Exclude CountPath_Mean\n", "> Step 21 comparing between MaxNesting_Max and MaxNesting_Mean\n", ">> MaxNesting_Max has the average correlation of 0.337 with other metrics\n", ">> MaxNesting_Mean has the average correlation of 0.316 with other metrics\n", ">> Exclude MaxNesting_Max\n", "> Step 22 comparing between CountOutput_Mean and AvgLineCode\n", ">> CountOutput_Mean has the average correlation of 0.284 with other metrics\n", ">> AvgLineCode has the average correlation of 0.317 with other metrics\n", ">> Exclude AvgLineCode\n", "> Step 23 comparing between CountLineCodeDecl and SumCyclomaticModified\n", ">> CountLineCodeDecl has the average correlation of 0.385 with other metrics\n", ">> SumCyclomaticModified has the average correlation of 0.375 with other metrics\n", ">> Exclude CountLineCodeDecl\n", "> Step 24 comparing between CountPath_Min and MaxNesting_Min\n", ">> CountPath_Min has the average correlation of 0.083 with other metrics\n", ">> MaxNesting_Min has the average correlation of 0.077 with other metrics\n", ">> Exclude CountPath_Min\n", "> Step 25 comparing between CountDeclInstanceMethod and SumCyclomaticModified\n", ">> CountDeclInstanceMethod has the average correlation of 0.304 with other metrics\n", ">> SumCyclomaticModified has the average correlation of 0.371 with other metrics\n", ">> Exclude SumCyclomaticModified\n", "> Step 26 comparing between RatioCommentToCode and CountStmtExe\n", ">> RatioCommentToCode has the average correlation of 0.341 with other metrics\n", ">> CountStmtExe has the average correlation of 0.379 with other metrics\n", ">> Exclude CountStmtExe\n", "> Step 27 comparing between CountInput_Max and CountInput_Mean\n", ">> CountInput_Max has the average correlation of 0.293 with other metrics\n", ">> CountInput_Mean has the average correlation of 0.232 with other metrics\n", ">> Exclude CountInput_Max\n", "> Step 28 comparing between 
CountOutput_Max and CountOutput_Mean\n", ">> CountOutput_Max has the average correlation of 0.329 with other metrics\n", ">> CountOutput_Mean has the average correlation of 0.259 with other metrics\n", ">> Exclude CountOutput_Max\n", "> Step 29 comparing between MaxNesting_Mean and AvgCyclomaticModified\n", ">> MaxNesting_Mean has the average correlation of 0.257 with other metrics\n", ">> AvgCyclomaticModified has the average correlation of 0.247 with other metrics\n", ">> Exclude MaxNesting_Mean\n", "> Step 30 comparing between Added_lines and Del_lines\n", ">> Added_lines has the average correlation of 0.294 with other metrics\n", ">> Del_lines has the average correlation of 0.291 with other metrics\n", ">> Exclude Added_lines\n", "> Step 31 comparing between CountLineBlank and CountDeclInstanceMethod\n", ">> CountLineBlank has the average correlation of 0.299 with other metrics\n", ">> CountDeclInstanceMethod has the average correlation of 0.258 with other metrics\n", ">> Exclude CountLineBlank\n", "> Step 32 comparing between MINOR_LINE and OWN_LINE\n", ">> MINOR_LINE has the average correlation of 0.08 with other metrics\n", ">> OWN_LINE has the average correlation of 0.078 with other metrics\n", ">> Exclude MINOR_LINE\n", "> Step 33 comparing between CountDeclInstanceMethod and CountDeclMethodPublic\n", ">> CountDeclInstanceMethod has the average correlation of 0.246 with other metrics\n", ">> CountDeclMethodPublic has the average correlation of 0.232 with other metrics\n", ">> Exclude CountDeclInstanceMethod\n", "> Step 34 comparing between AvgLine and CountOutput_Mean\n", ">> AvgLine has the average correlation of 0.234 with other metrics\n", ">> CountOutput_Mean has the average correlation of 0.239 with other metrics\n", ">> Exclude CountOutput_Mean\n", "> Step 35 comparing between CountLineComment and AvgLineComment\n", ">> CountLineComment has the average correlation of 0.149 with other metrics\n", ">> AvgLineComment has the average correlation of 0.112 
with other metrics\n", ">> Exclude CountLineComment\n", "> Step 36 comparing between Del_lines and ADEV\n", ">> Del_lines has the average correlation of 0.265 with other metrics\n", ">> ADEV has the average correlation of 0.233 with other metrics\n", ">> Exclude Del_lines\n", "According to Part 1 of AutoSpearman, ['ADEV', 'CountClassCoupled', 'AvgLine', 'OWN_LINE', 'CountDeclMethodProtected', 'CountDeclInstanceVariable', 'PercentLackOfCohesion', 'CountDeclClass', 'MAJOR_LINE', 'AvgLineBlank', 'CountDeclMethodPublic', 'CountInput_Mean', 'MaxNesting_Min', 'CountOutput_Min', 'CountDeclMethodDefault', 'AvgCyclomaticModified', 'CountInput_Min', 'CountDeclClassMethod', 'CountClassDerived', 'AvgLineComment', 'CountDeclClassVariable', 'CountClassBase', 'OWN_COMMIT', 'MaxInheritanceTree', 'CountDeclMethodPrivate', 'MINOR_COMMIT', 'AvgEssential', 'COMM', 'RatioCommentToCode'] are selected.\n", "(Part 2) Automatically select non-correlated metrics based on a Variance Inflation Factor analysis\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\micha\\miniconda3\\lib\\site-packages\\statsmodels\\tsa\\tsatools.py:142: FutureWarning: In a future version of pandas all arguments of concat except for the argument 'objs' will be keyword-only\n", " x = pd.concat(x[::order], 1)\n", "C:\\Users\\micha\\miniconda3\\lib\\site-packages\\statsmodels\\stats\\outliers_influence.py:193: RuntimeWarning: divide by zero encountered in double_scalars\n", " vif = 1. / (1. 
- r_squared_i)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "> Step 1 - exclude ADEV\n", "> Step 2 - exclude AvgLine\n", "Finally, according to Part 2 of AutoSpearman, Index(['CountClassCoupled', 'OWN_LINE', 'CountDeclMethodProtected',\n", " 'CountDeclInstanceVariable', 'PercentLackOfCohesion', 'CountDeclClass',\n", " 'MAJOR_LINE', 'AvgLineBlank', 'CountDeclMethodPublic',\n", " 'CountInput_Mean', 'MaxNesting_Min', 'CountOutput_Min',\n", " 'CountDeclMethodDefault', 'AvgCyclomaticModified', 'CountInput_Min',\n", " 'const', 'CountDeclClassMethod', 'CountClassDerived', 'AvgLineComment',\n", " 'CountDeclClassVariable', 'CountClassBase', 'OWN_COMMIT',\n", " 'MaxInheritanceTree', 'CountDeclMethodPrivate', 'MINOR_COMMIT',\n", " 'AvgEssential', 'COMM', 'RatioCommentToCode'],\n", " dtype='object') are selected.\n", "27 out of 65 were selected via AutoSpearman feature selection process\n", "feature cols: \n", "\n", " CountClassCoupled \\\n", "File \n", "activemq-console/src/main/java/org/apache/activ... 2 \n", "\n", " OWN_LINE \\\n", "File \n", "activemq-console/src/main/java/org/apache/activ... 1.0 \n", "\n", " CountDeclMethodProtected \\\n", "File \n", "activemq-console/src/main/java/org/apache/activ... 7 \n", "\n", " CountDeclInstanceVariable \\\n", "File \n", "activemq-console/src/main/java/org/apache/activ... 3 \n", "\n", " PercentLackOfCohesion \\\n", "File \n", "activemq-console/src/main/java/org/apache/activ... 61 \n", "\n", " CountDeclClass \\\n", "File \n", "activemq-console/src/main/java/org/apache/activ... 1 \n", "\n", " MAJOR_LINE AvgLineBlank \\\n", "File \n", "activemq-console/src/main/java/org/apache/activ... 0 1 \n", "\n", " CountDeclMethodPublic \\\n", "File \n", "activemq-console/src/main/java/org/apache/activ... 0 \n", "\n", " CountInput_Mean ... \\\n", "File ... \n", "activemq-console/src/main/java/org/apache/activ... 2.714286 ... \n", "\n", " AvgLineComment \\\n", "File \n", "activemq-console/src/main/java/org/apache/activ... 
6 \n", "\n", " CountDeclClassVariable \\\n", "File \n", "activemq-console/src/main/java/org/apache/activ... 0 \n", "\n", " CountClassBase \\\n", "File \n", "activemq-console/src/main/java/org/apache/activ... 1 \n", "\n", " OWN_COMMIT \\\n", "File \n", "activemq-console/src/main/java/org/apache/activ... 1.0 \n", "\n", " MaxInheritanceTree \\\n", "File \n", "activemq-console/src/main/java/org/apache/activ... 2 \n", "\n", " CountDeclMethodPrivate \\\n", "File \n", "activemq-console/src/main/java/org/apache/activ... 0 \n", "\n", " MINOR_COMMIT \\\n", "File \n", "activemq-console/src/main/java/org/apache/activ... 0 \n", "\n", " AvgEssential COMM \\\n", "File \n", "activemq-console/src/main/java/org/apache/activ... 2 1 \n", "\n", " RatioCommentToCode \n", "File \n", "activemq-console/src/main/java/org/apache/activ... 0.7 \n", "\n", "[1 rows x 27 columns] \n", "\n", "\n", "label col: \n", "\n", " File\n", "activemq-console/src/main/java/org/apache/activemq/console/command/AbstractAmqCommand.java False\n", "Name: RealBug, dtype: bool\n" ] } ], "source": [ "from pyexplainer.pyexplainer_pyexplainer import AutoSpearman\n", "# select all rows, and all feature cols\n", "# the last col, which is label col, is not selected\n", "X = df.iloc[:, :-1]\n", "total_features = len(X.columns)\n", "\n", "# apply feature selection function to our feature DataFrame\n", "X = AutoSpearman(X)\n", "selected = len(X.columns)\n", "\n", "# select all rows, and the last label col\n", "y = df.iloc[:, -1]\n", "\n", "print(selected, \" out of \", total_features, \" were selected via AutoSpearman feature selection process\")\n", "print('feature cols:', '\\n\\n', X.head(1), '\\n\\n')\n", "print('label col:', '\\n\\n', y.head(1))" ] }, { "cell_type": "markdown", "id": "offshore-cattle", "metadata": {}, "source": [ "### 1.4 Split data into training and testing set" ] }, { "cell_type": "code", "execution_count": 9, "id": "absolute-diversity", "metadata": {}, "outputs": [], "source": [ "from 
sklearn.model_selection import train_test_split\n", "# 70% training and 30% test\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)" ] }, { "cell_type": "markdown", "id": "fewer-webmaster", "metadata": {}, "source": [ "## 2. Training and Predicting" ] }, { "cell_type": "markdown", "id": "increased-queensland", "metadata": {}, "source": [ "### 2.1 Train a RandomForest model using sklearn" ] }, { "cell_type": "code", "execution_count": 10, "id": "lesser-coupon", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RandomForestClassifier(random_state=0)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.ensemble import RandomForestClassifier\n", "\n", "rf_model = RandomForestClassifier(n_estimators=100, max_depth=None, min_samples_split=2, random_state=0)\n", "rf_model.fit(X_train, y_train)" ] }, { "cell_type": "markdown", "id": "early-admission", "metadata": {}, "source": [ "### 2.2 Generate predictions" ] }, { "cell_type": "code", "execution_count": 11, "id": "detailed-isaac", "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>PredictedBug</th>\n", " </tr>\n", " <tr>\n", " <th>File</th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>activemq-core/src/main/java/org/apache/activemq/kaha/MapContainer.java</th>\n", " <td>False</td>\n", " </tr>\n", " <tr>\n", " <th>activemq-core/src/main/java/org/apache/activemq/openwire/v3/MessageAckMarshaller.java</th>\n", " <td>False</td>\n", " </tr>\n", " <tr>\n", " 
<th>activemq-core/src/main/java/org/apache/activemq/ConnectionFailedException.java</th>\n", " <td>False</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " PredictedBug\n", "File \n", "activemq-core/src/main/java/org/apache/activemq... False\n", "activemq-core/src/main/java/org/apache/activemq... False\n", "activemq-core/src/main/java/org/apache/activemq... False" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# generate predictions from the model, which returns an array of predicted labels\n", "y_preds = rf_model.predict(X_test) \n", "# wrap the predictions in a DataFrame with a single PredictedBug column, indexed like y_test\n", "y_preds = pd.DataFrame(data={'PredictedBug': y_preds}, index=y_test.index) \n", "y_preds.head(3)" ] }, { "cell_type": "markdown", "id": "inappropriate-decline", "metadata": {}, "source": [ "## 3. Prediction post-processing" ] }, { "cell_type": "markdown", "id": "weekly-homeless", "metadata": {}, "source": [ "### 3.1 Combine the feature columns, label column, and predicted column of the testing set" ] }, { "cell_type": "code", "execution_count": 12, "id": "steady-comparison", "metadata": {}, "outputs": [], "source": [ "combined_testing_data = X_test.join(y_test.to_frame())\n", "combined_testing_data = combined_testing_data.join(y_preds)\n", "combined_testing_data.head(3)\n", "# total number of rows\n", "total_rows = len(combined_testing_data)" ] }, { "cell_type": "markdown", "id": "viral-limit", "metadata": {}, "source": [ "### 3.2 Filter out wrongly predicted rows" ] }, { "cell_type": "code", "execution_count": 13, "id": "median-element", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The model correctly predicted 90.60000000000001 % of testing data\n" ] } ], "source": [ "correctly_predicted_data = combined_testing_data[combined_testing_data['RealBug']==combined_testing_data['PredictedBug']]\n", "correctly_predicted_rows = len(correctly_predicted_data)\n", "print('The model
correctly predicted ', round((correctly_predicted_rows / total_rows), 3) * 100, '% of testing data')" ] }, { "cell_type": "markdown", "id": "single-compression", "metadata": {}, "source": [ "### 3.3 We focus on the bug file, therefore, filter out the non-buggy file" ] }, { "cell_type": "code", "execution_count": 14, "id": "welsh-aspect", "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>CountClassCoupled</th>\n", " <th>OWN_LINE</th>\n", " <th>CountDeclMethodProtected</th>\n", " <th>CountDeclInstanceVariable</th>\n", " <th>PercentLackOfCohesion</th>\n", " <th>CountDeclClass</th>\n", " <th>MAJOR_LINE</th>\n", " <th>AvgLineBlank</th>\n", " <th>CountDeclMethodPublic</th>\n", " <th>CountInput_Mean</th>\n", " <th>...</th>\n", " <th>CountClassBase</th>\n", " <th>OWN_COMMIT</th>\n", " <th>MaxInheritanceTree</th>\n", " <th>CountDeclMethodPrivate</th>\n", " <th>MINOR_COMMIT</th>\n", " <th>AvgEssential</th>\n", " <th>COMM</th>\n", " <th>RatioCommentToCode</th>\n", " <th>RealBug</th>\n", " <th>PredictedBug</th>\n", " </tr>\n", " <tr>\n", " <th>File</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>activemq-core/src/test/java/org/apache/activemq/transport/fanout/FanoutTransportBrokerTest.java</th>\n", " <td>12</td>\n", " <td>0.738916</td>\n", " 
<td>3</td>\n", " <td>2</td>\n", " <td>77</td>\n", " <td>3</td>\n", " <td>1</td>\n", " <td>1</td>\n", " <td>8</td>\n", " <td>2.181818</td>\n", " <td>...</td>\n", " <td>2</td>\n", " <td>0.800000</td>\n", " <td>6</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>1</td>\n", " <td>5</td>\n", " <td>0.27</td>\n", " <td>True</td>\n", " <td>True</td>\n", " </tr>\n", " <tr>\n", " <th>activemq-core/src/main/java/org/apache/activemq/ActiveMQMessageConsumer.java</th>\n", " <td>27</td>\n", " <td>0.569082</td>\n", " <td>10</td>\n", " <td>21</td>\n", " <td>89</td>\n", " <td>5</td>\n", " <td>2</td>\n", " <td>0</td>\n", " <td>35</td>\n", " <td>4.807692</td>\n", " <td>...</td>\n", " <td>4</td>\n", " <td>0.500000</td>\n", " <td>1</td>\n", " <td>5</td>\n", " <td>0</td>\n", " <td>1</td>\n", " <td>10</td>\n", " <td>0.42</td>\n", " <td>True</td>\n", " <td>True</td>\n", " </tr>\n", " <tr>\n", " <th>activemq-openwire-generator/src/main/java/org/apache/activemq/openwire/tool/SingleSourceGenerator.java</th>\n", " <td>0</td>\n", " <td>0.995781</td>\n", " <td>8</td>\n", " <td>8</td>\n", " <td>88</td>\n", " <td>1</td>\n", " <td>1</td>\n", " <td>0</td>\n", " <td>20</td>\n", " <td>1.428571</td>\n", " <td>...</td>\n", " <td>1</td>\n", " <td>0.666667</td>\n", " <td>2</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>1</td>\n", " <td>3</td>\n", " <td>0.15</td>\n", " <td>True</td>\n", " <td>True</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "<p>3 rows × 29 columns</p>\n", "</div>" ], "text/plain": [ " CountClassCoupled \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 12 \n", "activemq-core/src/main/java/org/apache/activemq... 27 \n", "activemq-openwire-generator/src/main/java/org/a... 0 \n", "\n", " OWN_LINE \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 0.738916 \n", "activemq-core/src/main/java/org/apache/activemq... 0.569082 \n", "activemq-openwire-generator/src/main/java/org/a... 
0.995781 \n", "\n", " CountDeclMethodProtected \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 3 \n", "activemq-core/src/main/java/org/apache/activemq... 10 \n", "activemq-openwire-generator/src/main/java/org/a... 8 \n", "\n", " CountDeclInstanceVariable \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 2 \n", "activemq-core/src/main/java/org/apache/activemq... 21 \n", "activemq-openwire-generator/src/main/java/org/a... 8 \n", "\n", " PercentLackOfCohesion \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 77 \n", "activemq-core/src/main/java/org/apache/activemq... 89 \n", "activemq-openwire-generator/src/main/java/org/a... 88 \n", "\n", " CountDeclClass \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 3 \n", "activemq-core/src/main/java/org/apache/activemq... 5 \n", "activemq-openwire-generator/src/main/java/org/a... 1 \n", "\n", " MAJOR_LINE AvgLineBlank \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 1 1 \n", "activemq-core/src/main/java/org/apache/activemq... 2 0 \n", "activemq-openwire-generator/src/main/java/org/a... 1 0 \n", "\n", " CountDeclMethodPublic \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 8 \n", "activemq-core/src/main/java/org/apache/activemq... 35 \n", "activemq-openwire-generator/src/main/java/org/a... 20 \n", "\n", " CountInput_Mean ... \\\n", "File ... \n", "activemq-core/src/test/java/org/apache/activemq... 2.181818 ... \n", "activemq-core/src/main/java/org/apache/activemq... 4.807692 ... \n", "activemq-openwire-generator/src/main/java/org/a... 1.428571 ... \n", "\n", " CountClassBase \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 2 \n", "activemq-core/src/main/java/org/apache/activemq... 4 \n", "activemq-openwire-generator/src/main/java/org/a... 1 \n", "\n", " OWN_COMMIT \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 
0.800000 \n", "activemq-core/src/main/java/org/apache/activemq... 0.500000 \n", "activemq-openwire-generator/src/main/java/org/a... 0.666667 \n", "\n", " MaxInheritanceTree \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 6 \n", "activemq-core/src/main/java/org/apache/activemq... 1 \n", "activemq-openwire-generator/src/main/java/org/a... 2 \n", "\n", " CountDeclMethodPrivate \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 0 \n", "activemq-core/src/main/java/org/apache/activemq... 5 \n", "activemq-openwire-generator/src/main/java/org/a... 0 \n", "\n", " MINOR_COMMIT \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 0 \n", "activemq-core/src/main/java/org/apache/activemq... 0 \n", "activemq-openwire-generator/src/main/java/org/a... 0 \n", "\n", " AvgEssential COMM \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 1 5 \n", "activemq-core/src/main/java/org/apache/activemq... 1 10 \n", "activemq-openwire-generator/src/main/java/org/a... 1 3 \n", "\n", " RatioCommentToCode \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 0.27 \n", "activemq-core/src/main/java/org/apache/activemq... 0.42 \n", "activemq-openwire-generator/src/main/java/org/a... 0.15 \n", "\n", " RealBug PredictedBug \n", "File \n", "activemq-core/src/test/java/org/apache/activemq... True True \n", "activemq-core/src/main/java/org/apache/activemq... True True \n", "activemq-openwire-generator/src/main/java/org/a... 
True True \n", "\n", "[3 rows x 29 columns]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "correctly_predicted_bug = correctly_predicted_data[correctly_predicted_data['RealBug']==True]\n", "correctly_predicted_bug.head(3)" ] }, { "cell_type": "markdown", "id": "systematic-equipment", "metadata": {}, "source": [ "### 3.4 Define the feature columns and label column using the correctly predicted testing data" ] }, { "cell_type": "code", "execution_count": 15, "id": "shaped-confirmation", "metadata": {}, "outputs": [], "source": [ "# select all rows and the feature cols (every col except the last two, RealBug and PredictedBug)\n", "feature_cols = correctly_predicted_bug.iloc[:, :-2]\n", "# select all rows and one label col, RealBug (either RealBug or PredictedBug is fine since they are identical here)\n", "label_col = correctly_predicted_bug.iloc[:, -2]" ] }, { "cell_type": "markdown", "id": "vocational-spending", "metadata": {}, "source": [ "### 3.5 Select one correctly predicted buggy file to be explained" ] }, { "cell_type": "code", "execution_count": 16, "id": "cross-rhythm", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "one row of feature: \n", "\n", " CountClassCoupled \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 12 \n", "\n", " OWN_LINE \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 0.738916 \n", "\n", " CountDeclMethodProtected \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 3 \n", "\n", " CountDeclInstanceVariable \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 2 \n", "\n", " PercentLackOfCohesion \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 77 \n", "\n", " CountDeclClass \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 3 \n", "\n", " MAJOR_LINE AvgLineBlank \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 
1 1 \n", "\n", " CountDeclMethodPublic \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 8 \n", "\n", " CountInput_Mean ... \\\n", "File ... \n", "activemq-core/src/test/java/org/apache/activemq... 2.181818 ... \n", "\n", " AvgLineComment \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 2 \n", "\n", " CountDeclClassVariable \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 1 \n", "\n", " CountClassBase \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 2 \n", "\n", " OWN_COMMIT \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 0.8 \n", "\n", " MaxInheritanceTree \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 6 \n", "\n", " CountDeclMethodPrivate \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 0 \n", "\n", " MINOR_COMMIT \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 0 \n", "\n", " AvgEssential COMM \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 1 5 \n", "\n", " RatioCommentToCode \n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 0.27 \n", "\n", "[1 rows x 27 columns] \n", "\n", "one row of label: \n", "\n", " File\n", "activemq-core/src/test/java/org/apache/activemq/transport/fanout/FanoutTransportBrokerTest.java True\n", "Name: RealBug, dtype: bool\n" ] } ], "source": [ "# position of the row to be explained\n", "selected_row = 0\n", "# select that row from feature_cols; double brackets keep it a one-row DataFrame\n", "X_explain = feature_cols.iloc[[selected_row]]\n", "# select the corresponding label; double brackets keep it a one-element Series\n", "y_explain = label_col.iloc[[selected_row]]\n", "print('one row of feature:', '\\n\\n', X_explain, '\\n')\n", "print('one row of label:', '\\n\\n', y_explain)" ] }, { "cell_type": "markdown", "id": "second-overhead", "metadata": {}, "source": [ "## 4. Create rules (explanations) and visualise them!"
] }, { "cell_type": "markdown", "id": "private-symbol", "metadata": {}, "source": [ "### 4.1 Initialise a PyExplainer object" ] }, { "cell_type": "code", "execution_count": 17, "id": "coastal-assist", "metadata": {}, "outputs": [], "source": [ "from pyexplainer import pyexplainer_pyexplainer\n", "\n", "py_explainer = pyexplainer_pyexplainer.PyExplainer(X_train = X_train,\n", " y_train = y_train,\n", " indep = X_train.columns,\n", " dep = 'RealBug',\n", " blackbox_model = rf_model)" ] }, { "cell_type": "markdown", "id": "fatal-leather", "metadata": {}, "source": [ "### 4.2 Create rules by calling the explain() method of the PyExplainer object\n", "##### Attention: this step can be time-consuming" ] }, { "cell_type": "code", "execution_count": 18, "id": "floating-milwaukee", "metadata": {}, "outputs": [], "source": [ "rules = py_explainer.explain(X_explain=X_explain,\n", " y_explain=y_explain,\n", " search_function='crossoverinterpolation')" ] }, { "cell_type": "markdown", "id": "interim-banner", "metadata": {}, "source": [ "##### The created rules are stored in a dictionary. For details on what each key contains, please refer to the 'Appendix' section" ] }, { "cell_type": "code", "execution_count": 19, "id": "short-atlanta", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_keys(['synthetic_data', 'synthetic_predictions', 'X_explain', 'y_explain', 'indep', 'dep', 'top_k_positive_rules', 'top_k_negative_rules', 'local_rulefit_model'])" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rules.keys()" ] }, { "cell_type": "markdown", "id": "desperate-profit", "metadata": {}, "source": [ "### 4.3 Call the visualise() method of the PyExplainer object to visualise the created rules" ] }, { "cell_type": "code", "execution_count": 20, "id": "fourth-queensland", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "a2baa847c99441bd982b8792e092f003",
"version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(Label(value='Risk Score: '), FloatProgress(value=0.0, bar_style='info', layout=Layout(width='40…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "07d0fafe28994776bc5666a43c492587", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Output(layout=Layout(border='3px solid black'))" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "f6f9d182fd2949dbbcc534fd2bfd17d9", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatSlider(value=1.0, continuous_update=False, description='#1 The value of CountDeclClassVariable is more th…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "25fbbd2765cc4afba015f97b46dfdfc8", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatSlider(value=0.0, continuous_update=False, description='#2 The value of CountDeclMethodPrivate is more th…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "py_explainer.visualise(rules)" ] }, { "cell_type": "markdown", "id": "olive-filename", "metadata": {}, "source": [ "# Appendix" ] }, { "cell_type": "markdown", "id": "closed-draft", "metadata": {}, "source": [ "## Details of the variables used to create PyExplainer explanations" ] }, { "cell_type": "markdown", "id": "sufficient-vault", "metadata": {}, "source": [ "### Synthetic_data\n", "\n", "Synthetic_data is the data generated by PyExplainer using one of the following approaches:\n", "\n", "1. Crossover and Interpolation\n", "2. Random Perturbation\n", "\n", "The generated Synthetic_data is stored as a pandas DataFrame. 
" ] }, { "cell_type": "code", "execution_count": 21, "id": "vital-dynamics", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Type of pyExp_rule_obj['synthetic_data'] - <class 'pandas.core.frame.DataFrame'> \n", "\n", "Example\n" ] }, { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>CountClassCoupled</th>\n", " <th>OWN_LINE</th>\n", " <th>CountDeclMethodProtected</th>\n", " <th>CountDeclInstanceVariable</th>\n", " <th>PercentLackOfCohesion</th>\n", " <th>CountDeclClass</th>\n", " <th>MAJOR_LINE</th>\n", " <th>AvgLineBlank</th>\n", " <th>CountDeclMethodPublic</th>\n", " <th>CountInput_Mean</th>\n", " <th>...</th>\n", " <th>AvgLineComment</th>\n", " <th>CountDeclClassVariable</th>\n", " <th>CountClassBase</th>\n", " <th>OWN_COMMIT</th>\n", " <th>MaxInheritanceTree</th>\n", " <th>CountDeclMethodPrivate</th>\n", " <th>MINOR_COMMIT</th>\n", " <th>AvgEssential</th>\n", " <th>COMM</th>\n", " <th>RatioCommentToCode</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>1.0</td>\n", " <td>0.75</td>\n", " <td>1.0</td>\n", " <td>2.0</td>\n", " <td>42.0</td>\n", " <td>1.0</td>\n", " <td>0.0</td>\n", " <td>3.0</td>\n", " <td>6.0</td>\n", " <td>3.57</td>\n", " <td>...</td>\n", " <td>3.0</td>\n", " <td>1.0</td>\n", " <td>1.0</td>\n", " <td>0.8</td>\n", " <td>3.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>1.0</td>\n", " <td>5.0</td>\n", " <td>0.27</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>1.0</td>\n", " <td>1.00</td>\n", " <td>0.0</td>\n", " <td>2.0</td>\n", " <td>68.0</td>\n", " <td>1.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " 
<td>5.0</td>\n", " <td>2.60</td>\n", " <td>...</td>\n", " <td>0.0</td>\n", " <td>3.0</td>\n", " <td>1.0</td>\n", " <td>1.0</td>\n", " <td>6.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>1.0</td>\n", " <td>2.0</td>\n", " <td>0.64</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "<p>2 rows × 27 columns</p>\n", "</div>" ], "text/plain": [ " CountClassCoupled OWN_LINE CountDeclMethodProtected \\\n", "0 1.0 0.75 1.0 \n", "1 1.0 1.00 0.0 \n", "\n", " CountDeclInstanceVariable PercentLackOfCohesion CountDeclClass \\\n", "0 2.0 42.0 1.0 \n", "1 2.0 68.0 1.0 \n", "\n", " MAJOR_LINE AvgLineBlank CountDeclMethodPublic CountInput_Mean ... \\\n", "0 0.0 3.0 6.0 3.57 ... \n", "1 0.0 0.0 5.0 2.60 ... \n", "\n", " AvgLineComment CountDeclClassVariable CountClassBase OWN_COMMIT \\\n", "0 3.0 1.0 1.0 0.8 \n", "1 0.0 3.0 1.0 1.0 \n", "\n", " MaxInheritanceTree CountDeclMethodPrivate MINOR_COMMIT AvgEssential \\\n", "0 3.0 0.0 0.0 1.0 \n", "1 6.0 0.0 0.0 1.0 \n", "\n", " COMM RatioCommentToCode \n", "0 5.0 0.27 \n", "1 2.0 0.64 \n", "\n", "[2 rows x 27 columns]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "print(\"Type of pyExp_rule_obj['synthetic_data'] - \", type(rules['synthetic_data']), \"\\n\")\n", "\n", "print('Example')\n", "display(rules['synthetic_data'].head(2))" ] }, { "cell_type": "markdown", "id": "quality-examination", "metadata": {}, "source": [ "### Synthetic_predictions\n", "\n", "Synthetic_predictions contains the global model's predictions for Synthetic_data, generated inside PyExplainer." ] }, { "cell_type": "code", "execution_count": 22, "id": "funny-ranch", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Type of pyExp_rule_obj['synthetic_predictions'] - <class 'numpy.ndarray'> \n", "\n", "Example \n", "\n", " [ True False False ... 
False False False]\n" ] } ], "source": [ "print(\"Type of pyExp_rule_obj['synthetic_predictions'] - \", type(rules['synthetic_predictions']), \"\\n\")\n", "print(\"Example\", \"\\n\\n\", rules['synthetic_predictions'])" ] }, { "cell_type": "markdown", "id": "removable-behalf", "metadata": {}, "source": [ "### X_explain\n", "\n", "X_explain is the instance to be explained (in this example, a file correctly predicted as buggy)" ] }, { "cell_type": "code", "execution_count": 23, "id": "compound-jewel", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Type of pyExp_rule_obj['X_explain'] - <class 'pandas.core.frame.DataFrame'> \n", "\n", "Example\n" ] }, { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>CountClassCoupled</th>\n", " <th>OWN_LINE</th>\n", " <th>CountDeclMethodProtected</th>\n", " <th>CountDeclInstanceVariable</th>\n", " <th>PercentLackOfCohesion</th>\n", " <th>CountDeclClass</th>\n", " <th>MAJOR_LINE</th>\n", " <th>AvgLineBlank</th>\n", " <th>CountDeclMethodPublic</th>\n", " <th>CountInput_Mean</th>\n", " <th>...</th>\n", " <th>AvgLineComment</th>\n", " <th>CountDeclClassVariable</th>\n", " <th>CountClassBase</th>\n", " <th>OWN_COMMIT</th>\n", " <th>MaxInheritanceTree</th>\n", " <th>CountDeclMethodPrivate</th>\n", " <th>MINOR_COMMIT</th>\n", " <th>AvgEssential</th>\n", " <th>COMM</th>\n", " <th>RatioCommentToCode</th>\n", " </tr>\n", " <tr>\n", " <th>File</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", "
<th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>activemq-core/src/test/java/org/apache/activemq/transport/fanout/FanoutTransportBrokerTest.java</th>\n", " <td>12</td>\n", " <td>0.738916</td>\n", " <td>3</td>\n", " <td>2</td>\n", " <td>77</td>\n", " <td>3</td>\n", " <td>1</td>\n", " <td>1</td>\n", " <td>8</td>\n", " <td>2.181818</td>\n", " <td>...</td>\n", " <td>2</td>\n", " <td>1</td>\n", " <td>2</td>\n", " <td>0.8</td>\n", " <td>6</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>1</td>\n", " <td>5</td>\n", " <td>0.27</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "<p>1 rows × 27 columns</p>\n", "</div>" ], "text/plain": [ " CountClassCoupled \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 12 \n", "\n", " OWN_LINE \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 0.738916 \n", "\n", " CountDeclMethodProtected \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 3 \n", "\n", " CountDeclInstanceVariable \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 2 \n", "\n", " PercentLackOfCohesion \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 77 \n", "\n", " CountDeclClass \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 3 \n", "\n", " MAJOR_LINE AvgLineBlank \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 1 1 \n", "\n", " CountDeclMethodPublic \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 8 \n", "\n", " CountInput_Mean ... \\\n", "File ... \n", "activemq-core/src/test/java/org/apache/activemq... 2.181818 ... \n", "\n", " AvgLineComment \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 2 \n", "\n", " CountDeclClassVariable \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 
1 \n", "\n", " CountClassBase \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 2 \n", "\n", " OWN_COMMIT \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 0.8 \n", "\n", " MaxInheritanceTree \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 6 \n", "\n", " CountDeclMethodPrivate \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 0 \n", "\n", " MINOR_COMMIT \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 0 \n", "\n", " AvgEssential COMM \\\n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 1 5 \n", "\n", " RatioCommentToCode \n", "File \n", "activemq-core/src/test/java/org/apache/activemq... 0.27 \n", "\n", "[1 rows x 27 columns]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "print(\"Type of pyExp_rule_obj['X_explain'] - \", type(rules['X_explain']), \"\\n\")\n", "\n", "print('Example')\n", "display(rules['X_explain'])" ] }, { "cell_type": "markdown", "id": "ongoing-drilling", "metadata": {}, "source": [ "### y_explain\n", "\n", "y_explain is the label of X_explain" ] }, { "cell_type": "code", "execution_count": 24, "id": "antique-straight", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Type of pyExp_rule_obj['y_explain'] - <class 'pandas.core.series.Series'> \n", "\n", "Example \n", "\n", " File\n", "activemq-core/src/test/java/org/apache/activemq/transport/fanout/FanoutTransportBrokerTest.java True\n", "Name: RealBug, dtype: bool\n" ] } ], "source": [ "print(\"Type of pyExp_rule_obj['y_explain'] - \", type(rules['y_explain']), \"\\n\")\n", "print(\"Example\", \"\\n\\n\", rules['y_explain'])" ] }, { "cell_type": "markdown", "id": "narrative-senator", "metadata": {}, "source": [ "### indep\n", "#### indep contains the feature names of X_explain" ] }, { "cell_type": "code", "execution_count": 25, "id": "sized-valuable", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ 
"Type of pyExp_rule_obj['indep'] - <class 'pandas.core.indexes.base.Index'> \n", "\n", "Example \n", "\n", " Index(['CountClassCoupled', 'OWN_LINE', 'CountDeclMethodProtected',\n", " 'CountDeclInstanceVariable', 'PercentLackOfCohesion', 'CountDeclClass',\n", " 'MAJOR_LINE', 'AvgLineBlank', 'CountDeclMethodPublic',\n", " 'CountInput_Mean', 'MaxNesting_Min', 'CountOutput_Min',\n", " 'CountDeclMethodDefault', 'AvgCyclomaticModified', 'CountInput_Min',\n", " 'CountDeclClassMethod', 'CountClassDerived', 'AvgLineComment',\n", " 'CountDeclClassVariable', 'CountClassBase', 'OWN_COMMIT',\n", " 'MaxInheritanceTree', 'CountDeclMethodPrivate', 'MINOR_COMMIT',\n", " 'AvgEssential', 'COMM', 'RatioCommentToCode'],\n", " dtype='object')\n" ] } ], "source": [ "print(\"Type of pyExp_rule_obj['indep'] - \", type(rules['indep']), \"\\n\")\n", "print(\"Example\", \"\\n\\n\", rules['indep'])" ] }, { "cell_type": "markdown", "id": "atlantic-charles", "metadata": {}, "source": [ "### dep\n", "#### dep is the name of the label column" ] }, { "cell_type": "code", "execution_count": 26, "id": "moral-collectible", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Type of pyExp_rule_obj['dep'] - <class 'str'> \n", "\n", "Example \n", "\n", " RealBug\n" ] } ], "source": [ "print(\"Type of pyExp_rule_obj['dep'] - \", type(rules['dep']), \"\\n\")\n", "print(\"Example\", \"\\n\\n\", rules['dep'])" ] }, { "cell_type": "markdown", "id": "compact-willow", "metadata": {}, "source": [ "### top_k_positive_rules\n", "\n", "top_k_positive_rules contains the top-k rules generated by PyExplainer to explain why a file is predicted as defective.\n", "\n", "Here we show the top-3 rules that lead to a defective prediction." ] }, { "cell_type": "code", "execution_count": 27, "id": "measured-miller", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Type of pyExp_rule_obj['top_k_positive_rules'] - <class 'pandas.core.frame.DataFrame'> \n", "\n", "Example\n" ] }, { 
"data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>index</th>\n", " <th>rule</th>\n", " <th>type</th>\n", " <th>coef</th>\n", " <th>support</th>\n", " <th>importance</th>\n", " <th>is_satisfy_instance</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>161</td>\n", " <td>AvgLineComment > -3.6149998903274536 & CountCl...</td>\n", " <td>rule</td>\n", " <td>3.490109e-23</td>\n", " <td>0.271540</td>\n", " <td>1.552240e-23</td>\n", " <td>True</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>562</td>\n", " <td>OWN_COMMIT <= 0.8349999785423279 & COMM > 2.98...</td>\n", " <td>rule</td>\n", " <td>3.639777e-23</td>\n", " <td>0.208877</td>\n", " <td>1.479593e-23</td>\n", " <td>True</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>820</td>\n", " <td>CountClassBase <= 2.9850000143051147 & OWN_COM...</td>\n", " <td>rule</td>\n", " <td>3.484802e-23</td>\n", " <td>0.232376</td>\n", " <td>1.471797e-23</td>\n", " <td>True</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " index rule type \\\n", "0 161 AvgLineComment > -3.6149998903274536 & CountCl... rule \n", "1 562 OWN_COMMIT <= 0.8349999785423279 & COMM > 2.98... rule \n", "2 820 CountClassBase <= 2.9850000143051147 & OWN_COM... 
rule \n", "\n", " coef support importance is_satisfy_instance \n", "0 3.490109e-23 0.271540 1.552240e-23 True \n", "1 3.639777e-23 0.208877 1.479593e-23 True \n", "2 3.484802e-23 0.232376 1.471797e-23 True " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "print(\"Type of pyExp_rule_obj['top_k_positive_rules'] - \", type(rules['top_k_positive_rules']), \"\\n\")\n", "print('Example')\n", "display(rules['top_k_positive_rules'].head(3))" ] }, { "cell_type": "markdown", "id": "employed-choir", "metadata": {}, "source": [ "### top_k_negative_rules\n", "\n", "top_k_negative_rules contains the top-k negative rules generated by PyExplainer to explain why a commit is predicted as clean.\n", "\n", "The default number of generated rules is 3.\n" ] }, { "cell_type": "code", "execution_count": 28, "id": "opposite-ownership", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Type of pyExp_rule_obj['top_k_negative_rules'] - <class 'pandas.core.frame.DataFrame'> \n", "\n", "Example\n" ] }, { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>rule</th>\n", " <th>type</th>\n", " <th>coef</th>\n", " <th>support</th>\n", " <th>importance</th>\n", " <th>Class</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>918</th>\n", " <td>OWN_COMMIT > 0.8550000190734863 & CountDeclCla...</td>\n", " <td>rule</td>\n", " <td>-4.819474e-23</td>\n", " <td>0.678851</td>\n", " <td>2.250298e-23</td>\n", " <td>Clean</td>\n", " </tr>\n", " <tr>\n", " <th>1652</th>\n", " <td>OWN_COMMIT > 0.8650000095367432</td>\n", " <td>rule</td>\n", " <td>-4.748976e-23</td>\n", " 
<td>0.689295</td>\n", " <td>2.197742e-23</td>\n", " <td>Clean</td>\n", " </tr>\n", " <tr>\n", " <th>1107</th>\n", " <td>CountDeclMethodPrivate <= 1.8399999737739563 &...</td>\n", " <td>rule</td>\n", " <td>-4.863102e-23</td>\n", " <td>0.725849</td>\n", " <td>2.169360e-23</td>\n", " <td>Clean</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " rule type coef \\\n", "918 OWN_COMMIT > 0.8550000190734863 & CountDeclCla... rule -4.819474e-23 \n", "1652 OWN_COMMIT > 0.8650000095367432 rule -4.748976e-23 \n", "1107 CountDeclMethodPrivate <= 1.8399999737739563 &... rule -4.863102e-23 \n", "\n", " support importance Class \n", "918 0.678851 2.250298e-23 Clean \n", "1652 0.689295 2.197742e-23 Clean \n", "1107 0.725849 2.169360e-23 Clean " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "print(\"Type of pyExp_rule_obj['top_k_negative_rules'] - \", type(rules['top_k_negative_rules']), \"\\n\")\n", "print('Example')\n", "display(rules['top_k_negative_rules'])" ] }, { "cell_type": "markdown", "id": "written-insertion", "metadata": {}, "source": [ "# Bug Report Channel\n", "#### Please report <a href=\"https://github.com/awsm-research/pyExplainer/issues\">here</a>\n", "#### 📧 or email your report to michaelfu1998@gmail.com" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.5" }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": 
[ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 5 }