{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Interpretable or Accurate? Why not both?\n", "\n", "## Case Study: Predicting Employee Attrition Using Machine Learning\n", "\n", "This notebook contains the code for the accompanying blog post titled [Interpretable or Accurate? Why not both?](https://towardsdatascience.com/interpretable-or-accurate-why-not-both-4d9c73512192?sk=2f44377541a2f49939c921e54eb3cde7)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Installation\n", "\n", "Interpret is supported across Windows, Mac and Linux on Python 3.5+. Please refer to the [documentation](https://interpret.ml/docs/getting-started.html) for more details.\n", "\n", "### pip\n", "pip install interpret\n", "\n", "### conda\n", "conda install -c interpretml interpret\n", "\n", "### source\n", "git clone https://github.com/interpretml/interpret.git && cd interpret/scripts && make install\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Importing necessary libraries\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.metrics import f1_score, accuracy_score\n", "\n", "from interpret import show\n", "from interpret import set_visualize_provider\n", "from interpret.provider import InlineProvider\n", "from interpret.data import ClassHistogram\n", "from interpret.glassbox import (\n", "    LogisticRegression,\n", "    ClassificationTree,\n", "    ExplainableBoostingClassifier,\n", ")\n", "\n", "# Render interpret visualizations inline in the notebook\n", "set_visualize_provider(InlineProvider())\n", "\n", "# Fixed random seed for reproducibility\n", "seed = 42" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Importing the Dataset" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | Age | Attrition | BusinessTravel | DailyRate | Department | DistanceFromHome | Education | EducationField | EmployeeCount | EmployeeNumber | ... | RelationshipSatisfaction | StandardHours | StockOptionLevel | TotalWorkingYears | TrainingTimesLastYear | WorkLifeBalance | YearsAtCompany | YearsInCurrentRole | YearsSinceLastPromotion | YearsWithCurrManager |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 41 | Yes | Travel_Rarely | 1102 | Sales | 1 | 2 | Life Sciences | 1 | 1 | ... | 1 | 80 | 0 | 8 | 0 | 1 | 6 | 4 | 0 | 5 |
| 1 | 49 | No | Travel_Frequently | 279 | Research & Development | 8 | 1 | Life Sciences | 1 | 2 | ... | 4 | 80 | 1 | 10 | 3 | 3 | 10 | 7 | 1 | 7 |
| 2 | 37 | Yes | Travel_Rarely | 1373 | Research & Development | 2 | 2 | Other | 1 | 4 | ... | 2 | 80 | 0 | 7 | 3 | 3 | 0 | 0 | 0 | 0 |
| 3 | 33 | No | Travel_Frequently | 1392 | Research & Development | 3 | 4 | Life Sciences | 1 | 5 | ... | 3 | 80 | 0 | 8 | 3 | 3 | 8 | 7 | 3 | 0 |
| 4 | 27 | No | Travel_Rarely | 591 | Research & Development | 2 | 1 | Medical | 1 | 7 | ... | 4 | 80 | 1 | 6 | 3 | 3 | 2 | 2 | 2 | 2 |
5 rows × 35 columns
\n", "