{ "cells": [ { "cell_type": "markdown", "id": "dated-worst", "metadata": {}, "source": [ "# 👩‍💻 Welcome to the PyExplainer Quickstart Guide 👨‍💻" ] },
{ "cell_type": "markdown", "id": "legendary-military", "metadata": {}, "source": [ "### pyexplainer - GitHub Repository\n", "### pyexplainer - PyPI\n", "### pyexplainer - Official Documentation" ] },
{ "cell_type": "markdown", "id": "pregnant-appraisal", "metadata": {}, "source": [ "# 🛠 Installation\n", "## Please skip this part if you cloned the whole package from GitHub\n", "\n", "### 🤖 Run the cell below to install pyexplainer (this guide was written for version 0.1.5)" ] },
{ "cell_type": "code", "execution_count": null, "id": "aboriginal-subject", "metadata": {}, "outputs": [], "source": [ "!pip install pyexplainer" ] },
{ "cell_type": "markdown", "id": "ideal-bedroom", "metadata": {}, "source": [ "### 🤖 If the cell above did not work, try the cell below; otherwise, you are good to go!" ] },
{ "cell_type": "code", "execution_count": null, "id": "alert-spirituality", "metadata": {}, "outputs": [], "source": [ "!pip3 install pyexplainer" ] },
{ "cell_type": "markdown", "id": "regional-promise", "metadata": {}, "source": [ "# Let's get started!" ] },
{ "cell_type": "markdown", "id": "forced-olympus", "metadata": {}, "source": [ "## 👩🏻‍🔧 1. Prepare the data and model\n", "#### Note: We use the default data and model here as an example" ] },
{ "cell_type": "markdown", "id": "crude-promise", "metadata": {}, "source": [ "### 1.1 Import the required libraries" ] },
{ "cell_type": "code", "execution_count": 2, "id": "stunning-summer", "metadata": {}, "outputs": [], "source": [ "from pyexplainer import pyexplainer_pyexplainer\n", "from pyexplainer.pyexplainer_pyexplainer import PyExplainer" ] },
{ "cell_type": "markdown", "id": "turned-ethics", "metadata": {}, "source": [ "### 1.2 Use the default datasets and model (Random Forest)" ] },
{ "cell_type": "code", "execution_count": 3, "id": "hearing-gambling", "metadata": {}, "outputs": [], "source": [ "default_data_and_model = pyexplainer_pyexplainer.get_default_data_and_model()\n", "py_explainer = PyExplainer(X_train=default_data_and_model['X_train'],\n", "                           y_train=default_data_and_model['y_train'],\n", "                           indep=default_data_and_model['indep'],\n", "                           dep=default_data_and_model['dep'],\n", "                           blackbox_model=default_data_and_model['blackbox_model'])" ] },
{ "cell_type": "markdown", "id": "appropriate-meaning", "metadata": {}, "source": [ "## 🔧 2. Create a Rule Object Manually\n", "#### Note: The Rule Object is the core backend concept of PyExplainer!" ] },
{ "cell_type": "markdown", "id": "relevant-subcommittee", "metadata": {}, "source": [ "### 2.1 Prepare the X_explain and y_explain data\n", "#### Note: We use the default data here as an example" ] },
{ "cell_type": "code", "execution_count": 4, "id": "changing-empty", "metadata": {}, "outputs": [], "source": [ "X_explain = default_data_and_model['X_explain']" ] },
{ "cell_type": "code", "execution_count": 5, "id": "pleasant-avenue", "metadata": {}, "outputs": [], "source": [ "y_explain = default_data_and_model['y_explain']" ] },
{ "cell_type": "markdown", "id": "mechanical-motion", "metadata": {}, "source": [ "### 2.2 Create the rule object" ] },
{ "cell_type": "code", "execution_count": 6, "id": "awful-blank", "metadata": {}, "outputs": [], "source": [ "created_rule_obj = py_explainer.explain(X_explain=X_explain,\n", "                                        y_explain=y_explain,\n", "                                        search_function='crossoverinterpolation')" ] },
{ "cell_type": "markdown", "id": "architectural-broad", "metadata": {}, "source": [ "## 👩🏽‍🎨 3. Pass the Rule Object to .visualise(rule_obj) to Generate the Bullet Chart and Interactive Sliders\n", "#### Note: Move a gray slider to change a feature value and get an updated risk score." ] },
{ "cell_type": "markdown", "id": "returning-vertical", "metadata": {}, "source": [ "#### 🔧 Visualise the Rule Object we created manually with the .explain(...) method" ] },
{ "cell_type": "code", "execution_count": 7, "id": "cheap-minister", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "1c988c320be64b47898cc057a226721b", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(Label(value='Risk Score: '), FloatProgress(value=0.0, bar_style='info', layout=Layout(width='40…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "52e3c160dfca46dc89fe2a9850403a0d", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatSlider(value=0.0, continuous_update=False, description='#1 Decrease the values of CountDeclMethodDefault …" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "65659f31f84d4b82ba22601163b93661", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatSlider(value=1.54, continuous_update=False, description='#2 Increase the values of RatioCommentToCode to …" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "2babbcca8dd54c6995af457978ddfd2d", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatSlider(value=1.0, continuous_update=False, description='#3 Decrease the values of AvgCyclomaticModified t…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "9bd17a5b379e42f48e715dfff9f7e878", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Output(layout=Layout(border='3px solid black'))" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "py_explainer.visualise(created_rule_obj)" ] },
{ "cell_type": "markdown", "id": "infectious-calgary", "metadata": {}, "source": [ "# 🀑 Important - Bug Report Channel 🀑\n", "#### Please report bugs here\n", "#### 📧 or email your report to michaelfu1998@gmail.com" ] }, {
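"cell_type": "markdown", "id": "risk-score-sketch-intro", "metadata": {}, "source": [ "## Sketch: Recomputing a Risk Score After Moving a Slider\n", "#### Note: This is a minimal, hypothetical sketch. It is not PyExplainer's internal code\n", "#### It assumes the bullet-chart risk score is the black-box model's predicted probability of the positive class, scaled to 0-100\n", "#### It also assumes X_explain is a one-row pandas DataFrame, so run it after sections 1 and 2\n", "#### The cell below doubles RatioCommentToCode (the feature on slider #2 above) in a copy of the instance and re-scores both versions with predict_proba" ] },
{ "cell_type": "code", "execution_count": null, "id": "risk-score-sketch-code", "metadata": {}, "outputs": [], "source": [ "# Hypothetical sketch: assumes the risk score is the black-box model's\n", "# probability of the positive class (column 1 of predict_proba), scaled to 0-100\n", "blackbox_model = default_data_and_model['blackbox_model']\n", "\n", "# Work on a copy so the original X_explain stays unchanged\n", "X_modified = X_explain.copy()\n", "# Mimic moving slider #2: increase RatioCommentToCode (doubling is an arbitrary choice)\n", "X_modified['RatioCommentToCode'] = X_modified['RatioCommentToCode'] * 2\n", "\n", "risk_before = blackbox_model.predict_proba(X_explain)[0, 1] * 100\n", "risk_after = blackbox_model.predict_proba(X_modified)[0, 1] * 100\n", "print(f'Risk score before: {risk_before:.2f}')\n", "print(f'Risk score after:  {risk_after:.2f}')" ] },
{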
"cell_type": "markdown", "id": "married-malpractice", "metadata": {}, "source": [ "#
πŸ™Thanks for playing around with PyExplainer, I really appreciate your time! πŸ™
\n", "####
πŸ”₯ More Features will be Released Soon πŸ”₯
" ] }, { "cell_type": "markdown", "id": "difficult-algeria", "metadata": {}, "source": [ "# πŸ“œ Appendex πŸ“œ" ] }, { "cell_type": "markdown", "id": "numeric-efficiency", "metadata": {}, "source": [ "## A. πŸ•΅πŸ» What's in the Rule Object (rule_obj) ? Let's unbox it ! πŸ“¦" ] }, { "cell_type": "markdown", "id": "increased-indicator", "metadata": {}, "source": [ "### Basic Data Check" ] }, { "cell_type": "code", "execution_count": null, "id": "respective-program", "metadata": {}, "outputs": [], "source": [ "print(\"Type of Rule Object: \", type(load_pyExp_rule_obj))\n", "print()\n", "print(\"All of the keys in Rule Object\")\n", "i = 1\n", "for k in load_pyExp_rule_obj.keys():\n", " print(\"Key \", i, \" - \",k)\n", " i += 1" ] }, { "cell_type": "markdown", "id": "forward-freeze", "metadata": {}, "source": [ "### πŸ”‘ Key 1 - synthetic_data\n", "#### As can be seen below, the synthetic data are data coming from feature columns\n", "#### This synthetic data was generated internally by the PyExplainer when the .explain(...) method is triggered\n", "#### Currently we have 2 approaches to generate synthetic_data\n", "#### Approach (1) Crossover and Interpolation\n", "#### Approach (2) Random Pertubation\n", "#### After the process of C&I. or RP., synthetic_data will be generated as a DataFrame below" ] }, { "cell_type": "code", "execution_count": null, "id": "laden-doubt", "metadata": {}, "outputs": [], "source": [ "print(\"Type of pyExp_rule_obj['synthetic_data'] - \", type(load_pyExp_rule_obj['synthetic_data']), \"\\n\")\n", "print(\"Example\", \"\\n\\n\", load_pyExp_rule_obj['synthetic_data'].head(2))" ] }, { "cell_type": "markdown", "id": "blind-mustang", "metadata": {}, "source": [ "### πŸ”‘ Key 2 - synthetic_predictions\n", "#### As can be seen below, the synthetic prediction are data coming from the prediction column\n", "#### This synthetic prediction was generated internally by the PyExplainer when the .explain(...) 
method is triggered\n", "#### This synthetic prediction is created based on the black box model we passed to the PyExplainer when initialising (section 1.5 & 2.3)\n", "#### This synthetic prediction is generated based on the synthetic data above therefore it's called synthetic_predictions\n", "#### >>> e.g. synthetic_predictions = blackbox_model.predict(synthetic_data) Note. we only need feature cols in synthetic_data" ] }, { "cell_type": "code", "execution_count": null, "id": "municipal-operation", "metadata": {}, "outputs": [], "source": [ "print(\"Type of pyExp_rule_obj['synthetic_predictions'] - \", type(load_pyExp_rule_obj['synthetic_predictions']), \"\\n\")\n", "print(\"Example\", \"\\n\\n\", load_pyExp_rule_obj['synthetic_predictions'])" ] }, { "cell_type": "markdown", "id": "adjacent-begin", "metadata": {}, "source": [ "### πŸ”‘ Key 3 - X_explain\n", "#### This X_explain is exactly the same as the one we passed to .explain(...) method (section 3.3 & section 3.4)" ] }, { "cell_type": "code", "execution_count": null, "id": "according-profession", "metadata": {}, "outputs": [], "source": [ "print(\"Type of pyExp_rule_obj['X_explain'] - \", type(load_pyExp_rule_obj['X_explain']), \"\\n\")\n", "print(\"Example\", \"\\n\\n\", load_pyExp_rule_obj['X_explain'])" ] }, { "cell_type": "markdown", "id": "spectacular-feeling", "metadata": {}, "source": [ "### πŸ”‘ Key 4 - y_explain\n", "#### This y_explain is exactly the same as the one we passed to .explain(...) 
method (section 3.3 & section 3.4)" ] }, { "cell_type": "code", "execution_count": null, "id": "present-breakdown", "metadata": {}, "outputs": [], "source": [ "print(\"Type of pyExp_rule_obj['y_explain'] - \", type(load_pyExp_rule_obj['y_explain']), \"\\n\")\n", "print(\"Example\", \"\\n\\n\", load_pyExp_rule_obj['y_explain'])" ] }, { "cell_type": "markdown", "id": "serious-territory", "metadata": {}, "source": [ "### πŸ”‘ Key 5 - indep\n", "#### Names of the Selected Feature Cols" ] }, { "cell_type": "code", "execution_count": null, "id": "sublime-ethernet", "metadata": {}, "outputs": [], "source": [ "print(\"Type of pyExp_rule_obj['indep'] - \", type(load_pyExp_rule_obj['indep']), \"\\n\")\n", "print(\"Example\", \"\\n\\n\", load_pyExp_rule_obj['indep'])" ] }, { "cell_type": "markdown", "id": "designed-barbados", "metadata": {}, "source": [ "### πŸ”‘ Key 6 - dep\n", "#### Names of the Label Col (Prediction Col)" ] }, { "cell_type": "code", "execution_count": null, "id": "agreed-health", "metadata": {}, "outputs": [], "source": [ "print(\"Type of pyExp_rule_obj['dep'] - \", type(load_pyExp_rule_obj['dep']), \"\\n\")\n", "print(\"Example\", \"\\n\\n\", load_pyExp_rule_obj['dep'])" ] }, { "cell_type": "markdown", "id": "noted-wiring", "metadata": {}, "source": [ "### πŸ”‘ Key 7 - top_k_positive_rules\n", "#### This shows the top k positive rules generated by the RuleFit model inside the .explain(...) 
function\n", "#### The value of 'top_k' can be tuned in when we create a Rule Object manually (section 3.4), the default value is 3" ] }, { "cell_type": "code", "execution_count": null, "id": "northern-registration", "metadata": {}, "outputs": [], "source": [ "print(\"Type of pyExp_rule_obj['top_k_positive_rules'] - \", type(load_pyExp_rule_obj['top_k_positive_rules']), \"\\n\")\n", "print(\"Example\", \"\\n\\n\", load_pyExp_rule_obj['top_k_positive_rules'])" ] }, { "cell_type": "markdown", "id": "contemporary-honolulu", "metadata": {}, "source": [ "### πŸ”‘ Key 8 - top_k_negative_rules\n", "#### This shows the top k negative rules generated by the RuleFit model inside the .explain(...) function\n", "#### The value of 'top_k' can be tuned in when we create a Rule Object manually (section 3.4), the default value is 3\n", "#### However, in the current version, the top_k value is always the same for both negative and positive rules which can be improved in the future version" ] }, { "cell_type": "code", "execution_count": null, "id": "swiss-costa", "metadata": {}, "outputs": [], "source": [ "print(\"Type of pyExp_rule_obj['top_k_negative_rules'] - \", type(load_pyExp_rule_obj['top_k_negative_rules']), \"\\n\")\n", "print(\"Example\", \"\\n\\n\", load_pyExp_rule_obj['top_k_negative_rules'])" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" } }, "nbformat": 4, "nbformat_minor": 5 }