{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "81e0620e", "metadata": {}, "source": [ "Last updated: 16 Feb 2023\n", "\n", "# 👋 PyCaret Multiclass Classification Tutorial\n", "\n", "PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that considerably speeds up the experiment cycle and makes you more productive.\n", "\n", "Compared with other open-source machine learning libraries, PyCaret is an alternative low-code library that can replace hundreds of lines of code with only a few. This makes experiments much faster and more efficient. PyCaret is essentially a Python wrapper around several machine learning libraries and frameworks, such as scikit-learn, XGBoost, LightGBM, CatBoost, spaCy, Optuna, Hyperopt, Ray, and a few more.\n", "\n", "The design and simplicity of PyCaret are inspired by the emerging role of citizen data scientists, a term first used by Gartner. Citizen data scientists are power users who can perform both simple and moderately sophisticated analytical tasks that would previously have required more technical expertise.\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "8116e19d", "metadata": {}, "source": [ "# 💻 Installation\n", "\n", "PyCaret is tested and supported on the following 64-bit systems:\n", "- Python 3.7 – 3.10\n", "- Python 3.9 for Ubuntu only\n", "- Ubuntu 16.04 or later\n", "- Windows 7 or later\n", "\n", "You can install PyCaret with Python's pip package manager:\n", "\n", "`pip install pycaret`\n", "\n", "PyCaret's default installation will not install all the extra dependencies automatically. 
For that you will have to install the full version:\n", "\n", "`pip install pycaret[full]`\n", "\n", "or, depending on your use case, you may install one of the following variants:\n", "\n", "- `pip install pycaret[analysis]`\n", "- `pip install pycaret[models]`\n", "- `pip install pycaret[tuner]`\n", "- `pip install pycaret[mlops]`\n", "- `pip install pycaret[parallel]`\n", "- `pip install pycaret[test]`" ] }, { "cell_type": "code", "execution_count": 1, "id": "d7142a33", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'3.0.0'" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# check installed version\n", "import pycaret\n", "pycaret.__version__" ] }, { "attachments": {}, "cell_type": "markdown", "id": "fb66e98d", "metadata": {}, "source": [ "# 🚀 Quick start" ] }, { "attachments": {}, "cell_type": "markdown", "id": "00347d44", "metadata": {}, "source": [ "PyCaret’s Classification Module is a supervised machine learning module that is used for classifying elements into groups. The goal is to predict categorical class labels, which are discrete and unordered. \n", "\n", "Some common use cases include predicting customer default (Yes or No), predicting customer churn (customer will leave or stay), and diagnosing a disease (positive or negative). \n", "\n", "This module can be used for binary or multiclass problems. It provides several pre-processing features that prepare the data for modeling through the setup function. It has over 18 ready-to-use algorithms and several plots to analyze the performance of trained models.\n", "\n", "A typical workflow in PyCaret consists of the following 5 steps in this order:\n", "\n", "## **Setup** ➡️ **Compare Models** ➡️ **Analyze Model** ➡️ **Prediction** ➡️ **Save Model**" ] }, { "cell_type": "code", "execution_count": 2, "id": "956dfdab", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   sepal_length  sepal_width  petal_length  petal_width  species
0  5.1           3.5          1.4           0.2          Iris-setosa
1  4.9           3.0          1.4           0.2          Iris-setosa
2  4.7           3.2          1.3           0.2          Iris-setosa
3  4.6           3.1          1.5           0.2          Iris-setosa
4  5.0           3.6          1.4           0.2          Iris-setosa
\n", "
" ], "text/plain": [ " sepal_length sepal_width petal_length petal_width species\n", "0 5.1 3.5 1.4 0.2 Iris-setosa\n", "1 4.9 3.0 1.4 0.2 Iris-setosa\n", "2 4.7 3.2 1.3 0.2 Iris-setosa\n", "3 4.6 3.1 1.5 0.2 Iris-setosa\n", "4 5.0 3.6 1.4 0.2 Iris-setosa" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# loading sample dataset from pycaret dataset module\n", "from pycaret.datasets import get_data\n", "data = get_data('iris')" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c00f6a4a", "metadata": {}, "source": [ "## Setup\n", "This function initializes the training environment and creates the transformation pipeline. Setup function must be called before executing any other function in PyCaret. It only has two required parameters i.e. `data` and `target`. All the other parameters are optional." ] }, { "cell_type": "code", "execution_count": 3, "id": "97f2c6c6", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
    Description                  Value
0   Session id                   123
1   Target                       species
2   Target type                  Multiclass
3   Target mapping               Iris-setosa: 0, Iris-versicolor: 1, Iris-virginica: 2
4   Original data shape          (150, 5)
5   Transformed data shape       (150, 5)
6   Transformed train set shape  (105, 5)
7   Transformed test set shape   (45, 5)
8   Numeric features             4
9   Preprocess                   True
10  Imputation type              simple
11  Numeric imputation           mean
12  Categorical imputation       mode
13  Fold Generator               StratifiedKFold
14  Fold Number                  10
15  CPU Jobs                     -1
16  Use GPU                      False
17  Log Experiment               False
18  Experiment Name              clf-default-name
19  USI                          8d38
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# import pycaret classification and init setup\n", "from pycaret.classification import *\n", "s = setup(data, target = 'species', session_id = 123)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "3c583864", "metadata": {}, "source": [ "Once the setup has been successfully executed it shows the information grid containing experiment level information. \n", "\n", "- **Session id:** A pseudo-random number distributed as a seed in all functions for later reproducibility. If no `session_id` is passed, a random number is automatically generated that is distributed to all functions.
\n", "
\n", "- **Target type:** Binary or Multiclass. The target type is detected automatically; for this dataset it is Multiclass.
\n", "
\n", "- **Label Encoding:** When the target variable is of type string (e.g. 'Yes' or 'No') instead of 1 or 0, the labels are automatically encoded as integers and the mapping is displayed for reference. In this tutorial the `species` labels are strings, so they are encoded and the mapping appears in the grid under Target mapping (Iris-setosa: 0, Iris-versicolor: 1, Iris-virginica: 2).
\n", "
\n", "- **Original data shape:** Shape of the original data prior to any transformations.
\n", "
\n", "- **Transformed train set shape:** Shape of the transformed train set.
\n", "
\n", "- **Transformed test set shape:** Shape of the transformed test set.
\n", "
\n", "- **Numeric features:** The number of features considered as numerical.
\n", "
\n", "- **Categorical features:** The number of features considered as categorical.
" ] }, { "attachments": {}, "cell_type": "markdown", "id": "ada19398", "metadata": {}, "source": [ "PyCaret has two set of API's that you can work with. (1) Functional (as seen above) and (2) Object Oriented API.\n", "\n", "With Object Oriented API instead of executing functions directly you will import a class and execute methods of class." ] }, { "cell_type": "code", "execution_count": 4, "id": "32ee91c9", "metadata": {}, "outputs": [], "source": [ "# import ClassificationExperiment and init the class\n", "from pycaret.classification import ClassificationExperiment\n", "exp = ClassificationExperiment()" ] }, { "cell_type": "code", "execution_count": 5, "id": "3ead9fb5", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pycaret.classification.oop.ClassificationExperiment" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# check the type of exp\n", "type(exp)" ] }, { "cell_type": "code", "execution_count": 6, "id": "f05b8590", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
    Description                  Value
0   Session id                   123
1   Target                       species
2   Target type                  Multiclass
3   Target mapping               Iris-setosa: 0, Iris-versicolor: 1, Iris-virginica: 2
4   Original data shape          (150, 5)
5   Transformed data shape       (150, 5)
6   Transformed train set shape  (105, 5)
7   Transformed test set shape   (45, 5)
8   Numeric features             4
9   Preprocess                   True
10  Imputation type              simple
11  Numeric imputation           mean
12  Categorical imputation       mode
13  Fold Generator               StratifiedKFold
14  Fold Number                  10
15  CPU Jobs                     -1
16  Use GPU                      False
17  Log Experiment               False
18  Experiment Name              clf-default-name
19  USI                          42d4
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# init setup on exp\n", "exp.setup(data, target = 'species', session_id = 123)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "77213120", "metadata": {}, "source": [ "You can use either of the two methods, i.e. Functional or OOP, and even switch back and forth between the two sets of APIs. The choice of method will not impact the results and has been tested for consistency." ] }, { "attachments": {}, "cell_type": "markdown", "id": "f98dd435", "metadata": {}, "source": [ "## Compare Models\n", "\n", "This function trains and evaluates the performance of all the estimators available in the model library using cross-validation. The output of this function is a scoring grid with average cross-validated scores. Metrics evaluated during CV can be accessed using the `get_metrics` function. Custom metrics can be added or removed using the `add_metric` and `remove_metric` functions." 
] }, { "cell_type": "code", "execution_count": 7, "id": "65a19df4", "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
          Model                            Accuracy  AUC     Recall  Prec.   F1      Kappa   MCC     TT (Sec)
lr        Logistic Regression              0.9718    0.9971  0.9718  0.9780  0.9712  0.9573  0.9609  0.9190
knn       K Neighbors Classifier           0.9718    0.9830  0.9718  0.9780  0.9712  0.9573  0.9609  0.0370
qda       Quadratic Discriminant Analysis  0.9718    0.9974  0.9718  0.9780  0.9712  0.9573  0.9609  0.0300
lda       Linear Discriminant Analysis     0.9718    1.0000  0.9718  0.9780  0.9712  0.9573  0.9609  0.0330
lightgbm  Light Gradient Boosting Machine  0.9536    0.9935  0.9536  0.9634  0.9528  0.9298  0.9356  0.3150
nb        Naive Bayes                      0.9445    0.9868  0.9445  0.9525  0.9438  0.9161  0.9207  0.0300
et        Extra Trees Classifier           0.9445    0.9935  0.9445  0.9586  0.9426  0.9161  0.9246  0.0880
catboost  CatBoost Classifier              0.9445    0.9922  0.9445  0.9586  0.9426  0.9161  0.9246  0.1220
gbc       Gradient Boosting Classifier     0.9355    0.9792  0.9355  0.9416  0.9325  0.9023  0.9083  0.1360
xgboost   Extreme Gradient Boosting        0.9355    0.9868  0.9355  0.9440  0.9343  0.9023  0.9077  0.0710
dt        Decision Tree Classifier         0.9264    0.9429  0.9264  0.9502  0.9201  0.8886  0.9040  0.0270
rf        Random Forest Classifier         0.9264    0.9909  0.9264  0.9343  0.9232  0.8886  0.8956  0.0900
ada       Ada Boost Classifier             0.9155    0.9947  0.9155  0.9401  0.9097  0.8720  0.8873  0.0580
ridge     Ridge Classifier                 0.8227    0.0000  0.8227  0.8437  0.8186  0.7320  0.7454  0.0220
svm       SVM - Linear Kernel              0.7618    0.0000  0.7618  0.6655  0.6888  0.6333  0.7048  0.0300
dummy     Dummy Classifier                 0.2864    0.5000  0.2864  0.0822  0.1277  0.0000  0.0000  0.0490
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Processing: 0%| | 0/69 [00:00" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
          Model                            Accuracy  AUC     Recall  Prec.   F1      Kappa   MCC     TT (Sec)
lr        Logistic Regression              0.9718    0.9971  0.9718  0.9780  0.9712  0.9573  0.9609  0.0430
knn       K Neighbors Classifier           0.9718    0.9830  0.9718  0.9780  0.9712  0.9573  0.9609  0.0520
qda       Quadratic Discriminant Analysis  0.9718    0.9974  0.9718  0.9780  0.9712  0.9573  0.9609  0.0420
lda       Linear Discriminant Analysis     0.9718    1.0000  0.9718  0.9780  0.9712  0.9573  0.9609  0.0550
lightgbm  Light Gradient Boosting Machine  0.9536    0.9935  0.9536  0.9634  0.9528  0.9298  0.9356  0.0550
nb        Naive Bayes                      0.9445    0.9868  0.9445  0.9525  0.9438  0.9161  0.9207  0.0380
et        Extra Trees Classifier           0.9445    0.9935  0.9445  0.9586  0.9426  0.9161  0.9246  0.1430
catboost  CatBoost Classifier              0.9445    0.9922  0.9445  0.9586  0.9426  0.9161  0.9246  0.0480
gbc       Gradient Boosting Classifier     0.9355    0.9792  0.9355  0.9416  0.9325  0.9023  0.9083  0.1850
xgboost   Extreme Gradient Boosting        0.9355    0.9868  0.9355  0.9440  0.9343  0.9023  0.9077  0.0600
dt        Decision Tree Classifier         0.9264    0.9429  0.9264  0.9502  0.9201  0.8886  0.9040  0.0370
rf        Random Forest Classifier         0.9264    0.9909  0.9264  0.9343  0.9232  0.8886  0.8956  0.1440
ada       Ada Boost Classifier             0.9155    0.9947  0.9155  0.9401  0.9097  0.8720  0.8873  0.0850
ridge     Ridge Classifier                 0.8227    0.0000  0.8227  0.8437  0.8186  0.7320  0.7454  0.0330
svm       SVM - Linear Kernel              0.7618    0.0000  0.7618  0.6655  0.6888  0.6333  0.7048  0.0320
dummy     Dummy Classifier                 0.2864    0.5000  0.2864  0.0822  0.1277  0.0000  0.0000  0.0430
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Processing: 0%| | 0/69 [00:00#sk-container-id-1 {color: black;background-color: white;}#sk-container-id-1 pre{padding: 0;}#sk-container-id-1 div.sk-toggleable {background-color: white;}#sk-container-id-1 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-container-id-1 label.sk-toggleable__label-arrow:before {content: \"▸\";float: left;margin-right: 0.25em;color: #696969;}#sk-container-id-1 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-container-id-1 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-container-id-1 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-container-id-1 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-container-id-1 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-container-id-1 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: \"▾\";}#sk-container-id-1 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-1 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-1 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-container-id-1 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 
0.5em;}#sk-container-id-1 div.sk-estimator:hover {background-color: #d4ebff;}#sk-container-id-1 div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-container-id-1 div.sk-label:hover label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-1 div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: 0;}#sk-container-id-1 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;position: relative;}#sk-container-id-1 div.sk-item {position: relative;z-index: 1;}#sk-container-id-1 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;position: relative;}#sk-container-id-1 div.sk-item::before, #sk-container-id-1 div.sk-parallel-item::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: -1;}#sk-container-id-1 div.sk-parallel-item {display: flex;flex-direction: column;z-index: 1;position: relative;background-color: white;}#sk-container-id-1 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-container-id-1 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-container-id-1 div.sk-parallel-item:only-child::after {width: 0;}#sk-container-id-1 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;}#sk-container-id-1 div.sk-label label {font-family: monospace;font-weight: bold;display: inline-block;line-height: 1.2em;}#sk-container-id-1 div.sk-label-container {text-align: center;}#sk-container-id-1 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we also need the `!important` here to be able 
to override the default hidden behavior on the sphinx rendered scikit-learn.org. See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-container-id-1 div.sk-text-repr-fallback {display: none;}
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n",
       "                   intercept_scaling=1, l1_ratio=None, max_iter=1000,\n",
       "                   multi_class='auto', n_jobs=None, penalty='l2',\n",
       "                   random_state=123, solver='lbfgs', tol=0.0001, verbose=0,\n",
       "                   warm_start=False)
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
" ], "text/plain": [ "LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n", " intercept_scaling=1, l1_ratio=None, max_iter=1000,\n", " multi_class='auto', n_jobs=None, penalty='l2',\n", " random_state=123, solver='lbfgs', tol=0.0001, verbose=0,\n", " warm_start=False)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# compare models using OOP\n", "exp.compare_models()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "340de1e2", "metadata": {}, "source": [ "Notice that the output between functional and OOP API is consistent. Rest of the functions in this notebook will only be shown using functional API only. " ] }, { "attachments": {}, "cell_type": "markdown", "id": "6a77ec0c", "metadata": {}, "source": [ "## Analyze Model" ] }, { "attachments": {}, "cell_type": "markdown", "id": "595ea108", "metadata": {}, "source": [ "You can use the `plot_model` function to analyzes the performance of a trained model on the test set. It may require re-training the model in certain cases." ] }, { "cell_type": "code", "execution_count": 9, "id": "0ec7fad6", "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# plot confusion matrix\n", "plot_model(best, plot = 'confusion_matrix')" ] }, { "cell_type": "code", "execution_count": 10, "id": "9fc4b9b1", "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# plot AUC\n", "plot_model(best, plot = 'auc')" ] }, { "cell_type": "code", "execution_count": 11, "id": "bbc790e4", "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# plot feature importance\n", "plot_model(best, plot = 'feature')" ] }, { "cell_type": "code", "execution_count": 12, "id": "da718984", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on function plot_model in module pycaret.classification.functional:\n", "\n", "plot_model(estimator, plot: str = 'auc', scale: float = 1, save: bool = False, fold: Union[int, Any, NoneType] = None, fit_kwargs: Union[dict, NoneType] = None, plot_kwargs: Union[dict, NoneType] = None, groups: Union[str, Any, NoneType] = None, verbose: bool = True, display_format: Union[str, NoneType] = None) -> Union[str, NoneType]\n", " This function analyzes the performance of a trained model on holdout set.\n", " It may require re-training the model in certain cases.\n", " \n", " Example\n", " -------\n", " >>> from pycaret.datasets import get_data\n", " >>> juice = get_data('juice')\n", " >>> from pycaret.classification import *\n", " >>> exp_name = setup(data = juice, target = 'Purchase')\n", " >>> lr = create_model('lr')\n", " >>> plot_model(lr, plot = 'auc')\n", " \n", " \n", " estimator: scikit-learn compatible object\n", " Trained model object\n", " \n", " \n", " plot: str, default = 'auc'\n", " List of available plots (ID - Name):\n", " \n", " * 'pipeline' - Schematic drawing of the preprocessing pipeline\n", " * 'auc' - Area Under the Curve\n", " * 'threshold' - Discrimination Threshold\n", " * 'pr' - Precision Recall Curve\n", " * 'confusion_matrix' - Confusion Matrix\n", " * 'error' - Class Prediction Error\n", " * 'class_report' - Classification Report\n", " * 'boundary' - Decision Boundary\n", " * 'rfe' - Recursive Feature Selection\n", " * 'learning' - Learning Curve\n", " * 'manifold' - Manifold Learning\n", " * 'calibration' - Calibration Curve\n", " * 'vc' - Validation Curve\n", " * 'dimension' - Dimension Learning\n", " * 'feature' - Feature Importance\n", " * 'feature_all' 
- Feature Importance (All)\n", " * 'parameter' - Model Hyperparameter\n", " * 'lift' - Lift Curve\n", " * 'gain' - Gain Chart\n", " * 'tree' - Decision Tree\n", " * 'ks' - KS Statistic Plot\n", " \n", " \n", " scale: float, default = 1\n", " The resolution scale of the figure.\n", " \n", " \n", " save: bool, default = False\n", " When set to True, plot is saved in the current working directory.\n", " \n", " \n", " fold: int or scikit-learn compatible CV generator, default = None\n", " Controls cross-validation. If None, the CV generator in the ``fold_strategy``\n", " parameter of the ``setup`` function is used. When an integer is passed,\n", " it is interpreted as the 'n_splits' parameter of the CV generator in the\n", " ``setup`` function.\n", " \n", " \n", " fit_kwargs: dict, default = {} (empty dict)\n", " Dictionary of arguments passed to the fit method of the model.\n", " \n", " \n", " plot_kwargs: dict, default = {} (empty dict)\n", " Dictionary of arguments passed to the visualizer class.\n", " - pipeline: fontsize -> int\n", " \n", " \n", " groups: str or array-like, with shape (n_samples,), default = None\n", " Optional group labels when GroupKFold is used for the cross validation.\n", " It takes an array with shape (n_samples, ) where n_samples is the number\n", " of rows in training dataset. 
When string is passed, it is interpreted as\n", " the column name in the dataset containing group labels.\n", " \n", " \n", " verbose: bool, default = True\n", " When set to False, progress bar is not displayed.\n", " \n", " \n", " display_format: str, default = None\n", " To display plots in Streamlit (https://www.streamlit.io/), set this to 'streamlit'.\n", " Currently, not all plots are supported.\n", " \n", " \n", " Returns:\n", " Path to saved file, if any.\n", " \n", " \n", " Warnings\n", " --------\n", " - Estimators that does not support 'predict_proba' attribute cannot be used for\n", " 'AUC' and 'calibration' plots.\n", " \n", " - When the target is multiclass, 'calibration', 'threshold', 'manifold' and 'rfe'\n", " plots are not available.\n", " \n", " - When the 'max_features' parameter of a trained model object is not equal to\n", " the number of samples in training set, the 'rfe' plot is not available.\n", "\n" ] } ], "source": [ "# check docstring to see available plots \n", "help(plot_model)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "6bd66179", "metadata": {}, "source": [ "An alternate to `plot_model` function is `evaluate_model`. It can only be used in Notebook since it uses ipywidget." ] }, { "cell_type": "code", "execution_count": 13, "id": "c75f07a8", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "012aa1cc546c4300b75a6d5dda58eb14", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(ToggleButtons(description='Plot Type:', icons=('',), options=(('Pipeline Plot', 'pipelin…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "evaluate_model(best)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "954cbeff", "metadata": {}, "source": [ "## Prediction\n", "The `predict_model` function returns `prediction_label` and `prediction_score` (probability of the predicted class) as new columns in dataframe. 
When data is `None` (default), it uses the test set (created during the setup function) for scoring." ] }, { "cell_type": "code", "execution_count": 14, "id": "87c1a007", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   Model                Accuracy  AUC     Recall  Prec.  F1  Kappa   MCC
0  Logistic Regression  0.9778    0.9985  0       0      0   0.9667  0.9674
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# predict on test set\n", "holdout_pred = predict_model(best)" ] }, { "cell_type": "code", "execution_count": 15, "id": "5c01ac77", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
     sepal_length  sepal_width  petal_length  petal_width  species          prediction_label  prediction_score
105  6.3           2.5          4.9           1.5          Iris-versicolor  Iris-versicolor   0.5204
106  7.2           3.2          6.0           1.8          Iris-virginica   Iris-virginica    0.9503
107  5.5           2.4          3.8           1.1          Iris-versicolor  Iris-versicolor   0.9334
108  6.7           3.1          4.7           1.5          Iris-versicolor  Iris-versicolor   0.7321
109  7.7           3.8          6.7           2.2          Iris-virginica   Iris-virginica    0.9952
\n", "
" ], "text/plain": [ " sepal_length sepal_width petal_length petal_width species \\\n", "105 6.3 2.5 4.9 1.5 Iris-versicolor \n", "106 7.2 3.2 6.0 1.8 Iris-virginica \n", "107 5.5 2.4 3.8 1.1 Iris-versicolor \n", "108 6.7 3.1 4.7 1.5 Iris-versicolor \n", "109 7.7 3.8 6.7 2.2 Iris-virginica \n", "\n", " prediction_label prediction_score \n", "105 Iris-versicolor 0.5204 \n", "106 Iris-virginica 0.9503 \n", "107 Iris-versicolor 0.9334 \n", "108 Iris-versicolor 0.7321 \n", "109 Iris-virginica 0.9952 " ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# show predictions df\n", "holdout_pred.head()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "d4baf825", "metadata": {}, "source": [ "The same function works for predicting the labels on unseen dataset. Let's create a copy of original data and drop the `Class variable`. We can then use the new data frame without labels for scoring." ] }, { "cell_type": "code", "execution_count": 16, "id": "fb1cb86d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sepal_lengthsepal_widthpetal_lengthpetal_width
05.13.51.40.2
14.93.01.40.2
24.73.21.30.2
34.63.11.50.2
45.03.61.40.2
\n", "
" ], "text/plain": [ " sepal_length sepal_width petal_length petal_width\n", "0 5.1 3.5 1.4 0.2\n", "1 4.9 3.0 1.4 0.2\n", "2 4.7 3.2 1.3 0.2\n", "3 4.6 3.1 1.5 0.2\n", "4 5.0 3.6 1.4 0.2" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# copy the data and drop the target column ('species')\n", "\n", "new_data = data.copy()\n", "new_data.drop('species', axis=1, inplace=True)\n", "new_data.head()" ] }, { "cell_type": "code", "execution_count": 17, "id": "c5803df9", "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sepal_lengthsepal_widthpetal_lengthpetal_widthprediction_labelprediction_score
05.13.51.40.2Iris-setosa0.9775
14.93.01.40.2Iris-setosa0.9678
24.73.21.30.2Iris-setosa0.9820
34.63.11.50.2Iris-setosa0.9719
45.03.61.40.2Iris-setosa0.9813
\n", "
" ], "text/plain": [ " sepal_length sepal_width petal_length petal_width prediction_label \\\n", "0 5.1 3.5 1.4 0.2 Iris-setosa \n", "1 4.9 3.0 1.4 0.2 Iris-setosa \n", "2 4.7 3.2 1.3 0.2 Iris-setosa \n", "3 4.6 3.1 1.5 0.2 Iris-setosa \n", "4 5.0 3.6 1.4 0.2 Iris-setosa \n", "\n", " prediction_score \n", "0 0.9775 \n", "1 0.9678 \n", "2 0.9820 \n", "3 0.9719 \n", "4 0.9813 " ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# predict model on new_data\n", "predictions = predict_model(best, data = new_data)\n", "predictions.head()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "e4384735", "metadata": {}, "source": [ "## Save Model" ] }, { "attachments": {}, "cell_type": "markdown", "id": "cd63f053", "metadata": {}, "source": [ "Finally, you can save the entire pipeline on disk for later use, using pycaret's `save_model` function." ] }, { "cell_type": "code", "execution_count": 18, "id": "4181de41", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Transformation Pipeline and Model Successfully Saved\n" ] }, { "data": { "text/plain": [ "(Pipeline(memory=FastMemory(location=C:\\Users\\owner\\AppData\\Local\\Temp\\joblib),\n", " steps=[('label_encoding',\n", " TransformerWrapperWithInverse(exclude=None, include=None,\n", " transformer=LabelEncoder())),\n", " ('numerical_imputer',\n", " TransformerWrapper(exclude=None,\n", " include=['sepal_length', 'sepal_width',\n", " 'petal_length', 'petal_width'],\n", " transformer=SimpleImputer(add_indicator=F...\n", " fill_value=None,\n", " missing_values=nan,\n", " strategy='most_frequent',\n", " verbose='deprecated'))),\n", " ('trained_model',\n", " LogisticRegression(C=1.0, class_weight=None, dual=False,\n", " fit_intercept=True, intercept_scaling=1,\n", " l1_ratio=None, max_iter=1000,\n", " multi_class='auto', n_jobs=None,\n", " penalty='l2', random_state=123,\n", " solver='lbfgs', tol=0.0001, verbose=0,\n", " warm_start=False))],\n", " 
verbose=False),\n", " 'my_first_pipeline.pkl')" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# save pipeline\n", "save_model(best, 'my_first_pipeline')" ] }, { "cell_type": "code", "execution_count": 19, "id": "40ed5152", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Transformation Pipeline and Model Successfully Loaded\n" ] }, { "data": { "text/html": [ "
Pipeline(memory=FastMemory(location=C:\\Users\\owner\\AppData\\Local\\Temp\\joblib),\n",
       "         steps=[('label_encoding',\n",
       "                 TransformerWrapperWithInverse(exclude=None, include=None,\n",
       "                                               transformer=LabelEncoder())),\n",
       "                ('numerical_imputer',\n",
       "                 TransformerWrapper(exclude=None,\n",
       "                                    include=['sepal_length', 'sepal_width',\n",
       "                                             'petal_length', 'petal_width'],\n",
       "                                    transformer=SimpleImputer(add_indicator=F...\n",
       "                                                              fill_value=None,\n",
       "                                                              missing_values=nan,\n",
       "                                                              strategy='most_frequent',\n",
       "                                                              verbose='deprecated'))),\n",
       "                ('trained_model',\n",
       "                 LogisticRegression(C=1.0, class_weight=None, dual=False,\n",
       "                                    fit_intercept=True, intercept_scaling=1,\n",
       "                                    l1_ratio=None, max_iter=1000,\n",
       "                                    multi_class='auto', n_jobs=None,\n",
       "                                    penalty='l2', random_state=123,\n",
       "                                    solver='lbfgs', tol=0.0001, verbose=0,\n",
       "                                    warm_start=False))],\n",
       "         verbose=False)
" ], "text/plain": [ "Pipeline(memory=FastMemory(location=C:\\Users\\owner\\AppData\\Local\\Temp\\joblib),\n", " steps=[('label_encoding',\n", " TransformerWrapperWithInverse(exclude=None, include=None,\n", " transformer=LabelEncoder())),\n", " ('numerical_imputer',\n", " TransformerWrapper(exclude=None,\n", " include=['sepal_length', 'sepal_width',\n", " 'petal_length', 'petal_width'],\n", " transformer=SimpleImputer(add_indicator=F...\n", " fill_value=None,\n", " missing_values=nan,\n", " strategy='most_frequent',\n", " verbose='deprecated'))),\n", " ('trained_model',\n", " LogisticRegression(C=1.0, class_weight=None, dual=False,\n", " fit_intercept=True, intercept_scaling=1,\n", " l1_ratio=None, max_iter=1000,\n", " multi_class='auto', n_jobs=None,\n", " penalty='l2', random_state=123,\n", " solver='lbfgs', tol=0.0001, verbose=0,\n", " warm_start=False))],\n", " verbose=False)" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# load pipeline\n", "loaded_best_pipeline = load_model('my_first_pipeline')\n", "loaded_best_pipeline" ] }, { "attachments": {}, "cell_type": "markdown", "id": "b2c7d62e", "metadata": {}, "source": [ "# 👇 Detailed function-by-function overview" ] }, { "attachments": {}, "cell_type": "markdown", "id": "e05937f5", "metadata": {}, "source": [ "## ✅ Setup\n", "This function initializes the experiment in PyCaret and creates the transformation pipeline based on the parameters passed to it. The `setup` function must be called before any other function. It takes two required parameters: `data` and `target`. All other parameters are optional and configure the data preprocessing pipeline."
] }, { "cell_type": "code", "execution_count": 20, "id": "24e503be", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 DescriptionValue
0Session id123
1Targetspecies
2Target typeMulticlass
3Target mappingIris-setosa: 0, Iris-versicolor: 1, Iris-virginica: 2
4Original data shape(150, 5)
5Transformed data shape(150, 5)
6Transformed train set shape(105, 5)
7Transformed test set shape(45, 5)
8Numeric features4
9PreprocessTrue
10Imputation typesimple
11Numeric imputationmean
12Categorical imputationmode
13Fold GeneratorStratifiedKFold
14Fold Number10
15CPU Jobs-1
16Use GPUFalse
17Log ExperimentFalse
18Experiment Nameclf-default-name
19USI35bc
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "s = setup(data, target = 'species', session_id = 123)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "924d198b", "metadata": {}, "source": [ "To access the variables created by the `setup` function, such as the transformed dataset or the random seed, use the `get_config` function." ] }, { "cell_type": "code", "execution_count": 21, "id": "76128b08", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'USI',\n", " 'X',\n", " 'X_test',\n", " 'X_test_transformed',\n", " 'X_train',\n", " 'X_train_transformed',\n", " 'X_transformed',\n", " '_available_plots',\n", " '_ml_usecase',\n", " 'data',\n", " 'dataset',\n", " 'dataset_transformed',\n", " 'exp_id',\n", " 'exp_name_log',\n", " 'fix_imbalance',\n", " 'fold_generator',\n", " 'fold_groups_param',\n", " 'fold_shuffle_param',\n", " 'gpu_n_jobs_param',\n", " 'gpu_param',\n", " 'html_param',\n", " 'idx',\n", " 'is_multiclass',\n", " 'log_plots_param',\n", " 'logging_param',\n", " 'memory',\n", " 'n_jobs_param',\n", " 'pipeline',\n", " 'seed',\n", " 'target_param',\n", " 'test',\n", " 'test_transformed',\n", " 'train',\n", " 'train_transformed',\n", " 'variable_and_property_keys',\n", " 'variables',\n", " 'y',\n", " 'y_test',\n", " 'y_test_transformed',\n", " 'y_train',\n", " 'y_train_transformed',\n", " 'y_transformed'}" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# list all available config variables\n", "get_config()" ] }, { "cell_type": "code", "execution_count": 22, "id": "dbc43292", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sepal_lengthsepal_widthpetal_lengthpetal_width
05.02.03.51.0
15.43.91.30.4
25.63.04.11.3
37.42.86.11.9
44.63.41.40.3
...............
1006.62.94.61.3
1014.52.31.30.3
1024.83.01.40.1
1035.43.41.70.2
1046.23.45.42.3
\n", "

105 rows × 4 columns

\n", "
" ], "text/plain": [ " sepal_length sepal_width petal_length petal_width\n", "0 5.0 2.0 3.5 1.0\n", "1 5.4 3.9 1.3 0.4\n", "2 5.6 3.0 4.1 1.3\n", "3 7.4 2.8 6.1 1.9\n", "4 4.6 3.4 1.4 0.3\n", ".. ... ... ... ...\n", "100 6.6 2.9 4.6 1.3\n", "101 4.5 2.3 1.3 0.3\n", "102 4.8 3.0 1.4 0.1\n", "103 5.4 3.4 1.7 0.2\n", "104 6.2 3.4 5.4 2.3\n", "\n", "[105 rows x 4 columns]" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# let's access X_train_transformed\n", "get_config('X_train_transformed')" ] }, { "cell_type": "code", "execution_count": 23, "id": "ef9cd061", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The current seed is: 123\n", "The new seed is: 786\n" ] } ], "source": [ "# another example: let's access seed\n", "print(\"The current seed is: {}\".format(get_config('seed')))\n", "\n", "# now let's change it using set_config\n", "set_config('seed', 786)\n", "print(\"The new seed is: {}\".format(get_config('seed')))" ] }, { "attachments": {}, "cell_type": "markdown", "id": "7afbe41d", "metadata": {}, "source": [ "All the preprocessing configurations and experiment settings/parameters are passed into the `setup` function. 
To see all available parameters, check the docstring:" ] }, { "cell_type": "code", "execution_count": 24, "id": "2885a14f", "metadata": {}, "outputs": [], "source": [ "# help(setup)" ] }, { "cell_type": "code", "execution_count": 25, "id": "34ae0fce", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 DescriptionValue
0Session id123
1Targetspecies
2Target typeMulticlass
3Target mappingIris-setosa: 0, Iris-versicolor: 1, Iris-virginica: 2
4Original data shape(150, 5)
5Transformed data shape(150, 5)
6Transformed train set shape(105, 5)
7Transformed test set shape(45, 5)
8Numeric features4
9PreprocessTrue
10Imputation typesimple
11Numeric imputationmean
12Categorical imputationmode
13NormalizeTrue
14Normalize methodminmax
15Fold GeneratorStratifiedKFold
16Fold Number10
17CPU Jobs-1
18Use GPUFalse
19Log ExperimentFalse
20Experiment Nameclf-default-name
21USI3b39
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# init setup with normalize = True\n", "\n", "s = setup(data, target = 'species', session_id = 123,\n", " normalize = True, normalize_method = 'minmax')" ] }, { "cell_type": "code", "execution_count": 26, "id": "04204ae7", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# let's check X_train_transformed to see the effect of the params passed\n", "get_config('X_train_transformed')['sepal_length'].hist()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "d28a3e4e", "metadata": {}, "source": [ "Notice that all the values are between 0 and 1. That is because we passed `normalize=True` with `normalize_method='minmax'` to the `setup` function. To see how this compares with the original data, we can access the non-transformed values with `get_config`. Note the range of values on the x-axis below and compare it with the histogram above." ] }, { "cell_type": "code", "execution_count": 27, "id": "68cc1c63", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "get_config('X_train')['sepal_length'].hist()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "36b8b803", "metadata": {}, "source": [ "## ✅ Compare Models\n", "This function trains and evaluates the performance of all estimators available in the model library using cross-validation. The output of this function is a scoring grid with average cross-validated scores. Metrics evaluated during CV can be accessed using the `get_metrics` function. Custom metrics can be added or removed using `add_metric` and `remove_metric` function." ] }, { "cell_type": "code", "execution_count": 28, "id": "a3350418", "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 ModelAccuracyAUCRecallPrec.F1KappaMCCTT (Sec)
qdaQuadratic Discriminant Analysis0.97180.99740.97180.97800.97120.95730.96090.0380
ldaLinear Discriminant Analysis0.97181.00000.97180.97800.97120.95730.96090.0440
knnK Neighbors Classifier0.96360.98440.96360.97090.96310.94500.94940.0510
lightgbmLight Gradient Boosting Machine0.95360.98570.95360.96340.95280.92980.93560.0500
nbNaive Bayes0.94450.98680.94450.95250.94380.91610.92070.0380
etExtra Trees Classifier0.94450.99350.94450.95860.94260.91610.92460.1240
catboostCatBoost Classifier0.94450.99220.94450.95860.94260.91610.92460.0390
xgboostExtreme Gradient Boosting0.93550.98680.93550.94400.93430.90230.90770.0480
dtDecision Tree Classifier0.92640.94290.92640.95020.92010.88860.90400.0340
rfRandom Forest Classifier0.92640.99030.92640.93430.92320.88860.89560.1210
gbcGradient Boosting Classifier0.92640.96880.92640.93430.92320.88860.89560.1500
adaAda Boost Classifier0.91550.98430.91550.94010.90970.87200.88730.0690
lrLogistic Regression0.90730.97510.90730.91590.90640.85970.86450.0400
ridgeRidge Classifier0.83180.00000.83180.85450.82810.74590.75950.0370
svmSVM - Linear Kernel0.81000.00000.81000.78310.77020.71250.75270.0350
dummyDummy Classifier0.28640.50000.28640.08220.12770.00000.00000.0380
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Processing: 0%| | 0/69 [00:00\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameReferenceTurbo
ID
lrLogistic Regressionsklearn.linear_model._logistic.LogisticRegressionTrue
knnK Neighbors Classifiersklearn.neighbors._classification.KNeighborsCl...True
nbNaive Bayessklearn.naive_bayes.GaussianNBTrue
dtDecision Tree Classifiersklearn.tree._classes.DecisionTreeClassifierTrue
svmSVM - Linear Kernelsklearn.linear_model._stochastic_gradient.SGDC...True
rbfsvmSVM - Radial Kernelsklearn.svm._classes.SVCFalse
gpcGaussian Process Classifiersklearn.gaussian_process._gpc.GaussianProcessC...False
mlpMLP Classifiersklearn.neural_network._multilayer_perceptron....False
ridgeRidge Classifiersklearn.linear_model._ridge.RidgeClassifierTrue
rfRandom Forest Classifiersklearn.ensemble._forest.RandomForestClassifierTrue
qdaQuadratic Discriminant Analysissklearn.discriminant_analysis.QuadraticDiscrim...True
adaAda Boost Classifiersklearn.ensemble._weight_boosting.AdaBoostClas...True
gbcGradient Boosting Classifiersklearn.ensemble._gb.GradientBoostingClassifierTrue
ldaLinear Discriminant Analysissklearn.discriminant_analysis.LinearDiscrimina...True
etExtra Trees Classifiersklearn.ensemble._forest.ExtraTreesClassifierTrue
xgboostExtreme Gradient Boostingxgboost.sklearn.XGBClassifierTrue
lightgbmLight Gradient Boosting Machinelightgbm.sklearn.LGBMClassifierTrue
catboostCatBoost Classifiercatboost.core.CatBoostClassifierTrue
dummyDummy Classifiersklearn.dummy.DummyClassifierTrue
\n", "" ], "text/plain": [ " Name \\\n", "ID \n", "lr Logistic Regression \n", "knn K Neighbors Classifier \n", "nb Naive Bayes \n", "dt Decision Tree Classifier \n", "svm SVM - Linear Kernel \n", "rbfsvm SVM - Radial Kernel \n", "gpc Gaussian Process Classifier \n", "mlp MLP Classifier \n", "ridge Ridge Classifier \n", "rf Random Forest Classifier \n", "qda Quadratic Discriminant Analysis \n", "ada Ada Boost Classifier \n", "gbc Gradient Boosting Classifier \n", "lda Linear Discriminant Analysis \n", "et Extra Trees Classifier \n", "xgboost Extreme Gradient Boosting \n", "lightgbm Light Gradient Boosting Machine \n", "catboost CatBoost Classifier \n", "dummy Dummy Classifier \n", "\n", " Reference Turbo \n", "ID \n", "lr sklearn.linear_model._logistic.LogisticRegression True \n", "knn sklearn.neighbors._classification.KNeighborsCl... True \n", "nb sklearn.naive_bayes.GaussianNB True \n", "dt sklearn.tree._classes.DecisionTreeClassifier True \n", "svm sklearn.linear_model._stochastic_gradient.SGDC... True \n", "rbfsvm sklearn.svm._classes.SVC False \n", "gpc sklearn.gaussian_process._gpc.GaussianProcessC... False \n", "mlp sklearn.neural_network._multilayer_perceptron.... False \n", "ridge sklearn.linear_model._ridge.RidgeClassifier True \n", "rf sklearn.ensemble._forest.RandomForestClassifier True \n", "qda sklearn.discriminant_analysis.QuadraticDiscrim... True \n", "ada sklearn.ensemble._weight_boosting.AdaBoostClas... True \n", "gbc sklearn.ensemble._gb.GradientBoostingClassifier True \n", "lda sklearn.discriminant_analysis.LinearDiscrimina... 
True \n", "et sklearn.ensemble._forest.ExtraTreesClassifier True \n", "xgboost xgboost.sklearn.XGBClassifier True \n", "lightgbm lightgbm.sklearn.LGBMClassifier True \n", "catboost catboost.core.CatBoostClassifier True \n", "dummy sklearn.dummy.DummyClassifier True " ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# check available models\n", "models()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "f588f54b", "metadata": {}, "source": [ "You can use the `include` and `exclude` parameters of `compare_models` to train only selected models or to leave specific models out of training. Pass the model IDs as a list to either parameter." ] }, { "cell_type": "code", "execution_count": 30, "id": "f2a7e578", "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
 ModelAccuracyAUCRecallPrec.F1KappaMCCTT (Sec)
lightgbmLight Gradient Boosting Machine0.95360.98570.95360.96340.95280.92980.93560.0520
etExtra Trees Classifier0.94450.99350.94450.95860.94260.91610.92460.1190
catboostCatBoost Classifier0.94450.99220.94450.95860.94260.91610.92460.0390
xgboostExtreme Gradient Boosting0.93550.98680.93550.94400.93430.90230.90770.0580
dtDecision Tree Classifier0.92640.94290.92640.95020.92010.88860.90400.0370
rfRandom Forest Classifier0.92640.99030.92640.93430.92320.88860.89560.1200
gbcGradient Boosting Classifier0.92640.96880.92640.93430.92320.88860.89560.1510
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Processing: 0%| | 0/33 [00:00
LGBMClassifier(boosting_type='gbdt', class_weight=None, colsample_bytree=1.0,\n",
       "               importance_type='split', learning_rate=0.1, max_depth=-1,\n",
       "               min_child_samples=20, min_child_weight=0.001, min_split_gain=0.0,\n",
       "               n_estimators=100, n_jobs=-1, num_leaves=31, objective=None,\n",
       "               random_state=123, reg_alpha=0.0, reg_lambda=0.0, silent='warn',\n",
       "               subsample=1.0, subsample_for_bin=200000, subsample_freq=0)
" ], "text/plain": [ "LGBMClassifier(boosting_type='gbdt', class_weight=None, colsample_bytree=1.0,\n", " importance_type='split', learning_rate=0.1, max_depth=-1,\n", " min_child_samples=20, min_child_weight=0.001, min_split_gain=0.0,\n", " n_estimators=100, n_jobs=-1, num_leaves=31, objective=None,\n", " random_state=123, reg_alpha=0.0, reg_lambda=0.0, silent='warn',\n", " subsample=1.0, subsample_for_bin=200000, subsample_freq=0)" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "compare_tree_models" ] }, { "attachments": {}, "cell_type": "markdown", "id": "af9ae6cd", "metadata": {}, "source": [ "The function above returns the trained model object. The scoring grid is only displayed, not returned. If you need the scoring grid itself, use the `pull` function to get it as a dataframe." ] }, { "cell_type": "code", "execution_count": 32, "id": "fc529e25", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
| | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (Sec) |
|---|---|---|---|---|---|---|---|---|---|
| lightgbm | Light Gradient Boosting Machine | 0.9536 | 0.9857 | 0.9536 | 0.9634 | 0.9528 | 0.9298 | 0.9356 | 0.052 |
| et | Extra Trees Classifier | 0.9445 | 0.9935 | 0.9445 | 0.9586 | 0.9426 | 0.9161 | 0.9246 | 0.119 |
| catboost | CatBoost Classifier | 0.9445 | 0.9922 | 0.9445 | 0.9586 | 0.9426 | 0.9161 | 0.9246 | 0.039 |
| xgboost | Extreme Gradient Boosting | 0.9355 | 0.9868 | 0.9355 | 0.9440 | 0.9343 | 0.9023 | 0.9077 | 0.058 |
| dt | Decision Tree Classifier | 0.9264 | 0.9429 | 0.9264 | 0.9502 | 0.9201 | 0.8886 | 0.9040 | 0.037 |
| rf | Random Forest Classifier | 0.9264 | 0.9903 | 0.9264 | 0.9343 | 0.9232 | 0.8886 | 0.8956 | 0.120 |
| gbc | Gradient Boosting Classifier | 0.9264 | 0.9688 | 0.9264 | 0.9343 | 0.9232 | 0.8886 | 0.8956 | 0.151 |
\n", "
" ], "text/plain": [ " Model Accuracy AUC Recall Prec. \\\n", "lightgbm Light Gradient Boosting Machine 0.9536 0.9857 0.9536 0.9634 \n", "et Extra Trees Classifier 0.9445 0.9935 0.9445 0.9586 \n", "catboost CatBoost Classifier 0.9445 0.9922 0.9445 0.9586 \n", "xgboost Extreme Gradient Boosting 0.9355 0.9868 0.9355 0.9440 \n", "dt Decision Tree Classifier 0.9264 0.9429 0.9264 0.9502 \n", "rf Random Forest Classifier 0.9264 0.9903 0.9264 0.9343 \n", "gbc Gradient Boosting Classifier 0.9264 0.9688 0.9264 0.9343 \n", "\n", " F1 Kappa MCC TT (Sec) \n", "lightgbm 0.9528 0.9298 0.9356 0.052 \n", "et 0.9426 0.9161 0.9246 0.119 \n", "catboost 0.9426 0.9161 0.9246 0.039 \n", "xgboost 0.9343 0.9023 0.9077 0.058 \n", "dt 0.9201 0.8886 0.9040 0.037 \n", "rf 0.9232 0.8886 0.8956 0.120 \n", "gbc 0.9232 0.8886 0.8956 0.151 " ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "compare_tree_models_results = pull()\n", "compare_tree_models_results" ] }, { "attachments": {}, "cell_type": "markdown", "id": "05a72fc2", "metadata": {}, "source": [ "By default `compare_models` return the single best performing model based on the metric defined in the `sort` parameter. Let's change our code to return 3 top models based on `Recall`." 
] }, { "cell_type": "code", "execution_count": 33, "id": "1066dd07", "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
| | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (Sec) |
|---|---|---|---|---|---|---|---|---|---|
| qda | Quadratic Discriminant Analysis | 0.9718 | 0.9974 | 0.9718 | 0.9780 | 0.9712 | 0.9573 | 0.9609 | 0.0400 |
| lda | Linear Discriminant Analysis | 0.9718 | 1.0000 | 0.9718 | 0.9780 | 0.9712 | 0.9573 | 0.9609 | 0.0380 |
| knn | K Neighbors Classifier | 0.9636 | 0.9844 | 0.9636 | 0.9709 | 0.9631 | 0.9450 | 0.9494 | 0.0490 |
| lightgbm | Light Gradient Boosting Machine | 0.9536 | 0.9857 | 0.9536 | 0.9634 | 0.9528 | 0.9298 | 0.9356 | 0.0490 |
| nb | Naive Bayes | 0.9445 | 0.9868 | 0.9445 | 0.9525 | 0.9438 | 0.9161 | 0.9207 | 0.0380 |
| et | Extra Trees Classifier | 0.9445 | 0.9935 | 0.9445 | 0.9586 | 0.9426 | 0.9161 | 0.9246 | 0.1180 |
| catboost | CatBoost Classifier | 0.9445 | 0.9922 | 0.9445 | 0.9586 | 0.9426 | 0.9161 | 0.9246 | 0.0390 |
| xgboost | Extreme Gradient Boosting | 0.9355 | 0.9868 | 0.9355 | 0.9440 | 0.9343 | 0.9023 | 0.9077 | 0.0460 |
| dt | Decision Tree Classifier | 0.9264 | 0.9429 | 0.9264 | 0.9502 | 0.9201 | 0.8886 | 0.9040 | 0.0330 |
| rf | Random Forest Classifier | 0.9264 | 0.9903 | 0.9264 | 0.9343 | 0.9232 | 0.8886 | 0.8956 | 0.1170 |
| gbc | Gradient Boosting Classifier | 0.9264 | 0.9688 | 0.9264 | 0.9343 | 0.9232 | 0.8886 | 0.8956 | 0.1450 |
| ada | Ada Boost Classifier | 0.9155 | 0.9843 | 0.9155 | 0.9401 | 0.9097 | 0.8720 | 0.8873 | 0.0680 |
| lr | Logistic Regression | 0.9073 | 0.9751 | 0.9073 | 0.9159 | 0.9064 | 0.8597 | 0.8645 | 0.0420 |
| ridge | Ridge Classifier | 0.8318 | 0.0000 | 0.8318 | 0.8545 | 0.8281 | 0.7459 | 0.7595 | 0.0310 |
| svm | SVM - Linear Kernel | 0.8100 | 0.0000 | 0.8100 | 0.7831 | 0.7702 | 0.7125 | 0.7527 | 0.0350 |
| dummy | Dummy Classifier | 0.2864 | 0.5000 | 0.2864 | 0.0822 | 0.1277 | 0.0000 | 0.0000 | 0.0360 |
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Processing: 0%| | 0/71 [00:00\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
| ID | Name | Reference | Turbo |
|---|---|---|---|
| lr | Logistic Regression | sklearn.linear_model._logistic.LogisticRegression | True |
| knn | K Neighbors Classifier | sklearn.neighbors._classification.KNeighborsCl... | True |
| nb | Naive Bayes | sklearn.naive_bayes.GaussianNB | True |
| dt | Decision Tree Classifier | sklearn.tree._classes.DecisionTreeClassifier | True |
| svm | SVM - Linear Kernel | sklearn.linear_model._stochastic_gradient.SGDC... | True |
| rbfsvm | SVM - Radial Kernel | sklearn.svm._classes.SVC | False |
| gpc | Gaussian Process Classifier | sklearn.gaussian_process._gpc.GaussianProcessC... | False |
| mlp | MLP Classifier | sklearn.neural_network._multilayer_perceptron.... | False |
| ridge | Ridge Classifier | sklearn.linear_model._ridge.RidgeClassifier | True |
| rf | Random Forest Classifier | sklearn.ensemble._forest.RandomForestClassifier | True |
| qda | Quadratic Discriminant Analysis | sklearn.discriminant_analysis.QuadraticDiscrim... | True |
| ada | Ada Boost Classifier | sklearn.ensemble._weight_boosting.AdaBoostClas... | True |
| gbc | Gradient Boosting Classifier | sklearn.ensemble._gb.GradientBoostingClassifier | True |
| lda | Linear Discriminant Analysis | sklearn.discriminant_analysis.LinearDiscrimina... | True |
| et | Extra Trees Classifier | sklearn.ensemble._forest.ExtraTreesClassifier | True |
| xgboost | Extreme Gradient Boosting | xgboost.sklearn.XGBClassifier | True |
| lightgbm | Light Gradient Boosting Machine | lightgbm.sklearn.LGBMClassifier | True |
| catboost | CatBoost Classifier | catboost.core.CatBoostClassifier | True |
| dummy | Dummy Classifier | sklearn.dummy.DummyClassifier | True |
\n", "" ], "text/plain": [ " Name \\\n", "ID \n", "lr Logistic Regression \n", "knn K Neighbors Classifier \n", "nb Naive Bayes \n", "dt Decision Tree Classifier \n", "svm SVM - Linear Kernel \n", "rbfsvm SVM - Radial Kernel \n", "gpc Gaussian Process Classifier \n", "mlp MLP Classifier \n", "ridge Ridge Classifier \n", "rf Random Forest Classifier \n", "qda Quadratic Discriminant Analysis \n", "ada Ada Boost Classifier \n", "gbc Gradient Boosting Classifier \n", "lda Linear Discriminant Analysis \n", "et Extra Trees Classifier \n", "xgboost Extreme Gradient Boosting \n", "lightgbm Light Gradient Boosting Machine \n", "catboost CatBoost Classifier \n", "dummy Dummy Classifier \n", "\n", " Reference Turbo \n", "ID \n", "lr sklearn.linear_model._logistic.LogisticRegression True \n", "knn sklearn.neighbors._classification.KNeighborsCl... True \n", "nb sklearn.naive_bayes.GaussianNB True \n", "dt sklearn.tree._classes.DecisionTreeClassifier True \n", "svm sklearn.linear_model._stochastic_gradient.SGDC... True \n", "rbfsvm sklearn.svm._classes.SVC False \n", "gpc sklearn.gaussian_process._gpc.GaussianProcessC... False \n", "mlp sklearn.neural_network._multilayer_perceptron.... False \n", "ridge sklearn.linear_model._ridge.RidgeClassifier True \n", "rf sklearn.ensemble._forest.RandomForestClassifier True \n", "qda sklearn.discriminant_analysis.QuadraticDiscrim... True \n", "ada sklearn.ensemble._weight_boosting.AdaBoostClas... True \n", "gbc sklearn.ensemble._gb.GradientBoostingClassifier True \n", "lda sklearn.discriminant_analysis.LinearDiscrimina... 
True \n", "et sklearn.ensemble._forest.ExtraTreesClassifier True \n", "xgboost xgboost.sklearn.XGBClassifier True \n", "lightgbm lightgbm.sklearn.LGBMClassifier True \n", "catboost catboost.core.CatBoostClassifier True \n", "dummy sklearn.dummy.DummyClassifier True " ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# check all the available models\n", "models()" ] }, { "cell_type": "code", "execution_count": 41, "id": "16641cab", "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
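| Fold | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|
| 0 | 0.9091 | 1.0000 | 0.9091 | 0.9273 | 0.9076 | 0.8625 | 0.8735 |
| 1 | 0.8182 | 0.9221 | 0.8182 | 0.8182 | 0.8182 | 0.7250 | 0.7250 |
| 2 | 0.9091 | 0.9610 | 0.9091 | 0.9273 | 0.9076 | 0.8625 | 0.8735 |
| 3 | 0.6364 | 0.8961 | 0.6364 | 0.6364 | 0.6364 | 0.4500 | 0.4500 |
| 4 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 5 | 0.9000 | 0.9714 | 0.9000 | 0.9250 | 0.8971 | 0.8485 | 0.8616 |
| 6 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 7 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 8 | 0.9000 | 1.0000 | 0.9000 | 0.9250 | 0.8971 | 0.8485 | 0.8616 |
| 9 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| Mean | 0.9073 | 0.9751 | 0.9073 | 0.9159 | 0.9064 | 0.8597 | 0.8645 |
| Std | 0.1076 | 0.0360 | 0.1076 | 0.1079 | 0.1077 | 0.1628 | 0.1628 |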
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Processing: 0%| | 0/4 [00:00\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
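| Fold | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|
| 0 | 0.9091 | 1.0000 | 0.9091 | 0.9273 | 0.9076 | 0.8625 | 0.8735 |
| 1 | 0.8182 | 0.9221 | 0.8182 | 0.8182 | 0.8182 | 0.7250 | 0.7250 |
| 2 | 0.9091 | 0.9610 | 0.9091 | 0.9273 | 0.9076 | 0.8625 | 0.8735 |
| 3 | 0.6364 | 0.8961 | 0.6364 | 0.6364 | 0.6364 | 0.4500 | 0.4500 |
| 4 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 5 | 0.9000 | 0.9714 | 0.9000 | 0.9250 | 0.8971 | 0.8485 | 0.8616 |
| 6 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 7 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 8 | 0.9000 | 1.0000 | 0.9000 | 0.9250 | 0.8971 | 0.8485 | 0.8616 |
| 9 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| Mean | 0.9073 | 0.9751 | 0.9073 | 0.9159 | 0.9064 | 0.8597 | 0.8645 |
| Std | 0.1076 | 0.0360 | 0.1076 | 0.1079 | 0.1077 | 0.1628 | 0.1628 |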
\n", "
" ], "text/plain": [ " Accuracy AUC Recall Prec. F1 Kappa MCC\n", "Fold \n", "0 0.9091 1.0000 0.9091 0.9273 0.9076 0.8625 0.8735\n", "1 0.8182 0.9221 0.8182 0.8182 0.8182 0.7250 0.7250\n", "2 0.9091 0.9610 0.9091 0.9273 0.9076 0.8625 0.8735\n", "3 0.6364 0.8961 0.6364 0.6364 0.6364 0.4500 0.4500\n", "4 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000\n", "5 0.9000 0.9714 0.9000 0.9250 0.8971 0.8485 0.8616\n", "6 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000\n", "7 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000\n", "8 0.9000 1.0000 0.9000 0.9250 0.8971 0.8485 0.8616\n", "9 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000\n", "Mean 0.9073 0.9751 0.9073 0.9159 0.9064 0.8597 0.8645\n", "Std 0.1076 0.0360 0.1076 0.1079 0.1077 0.1628 0.1628" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lr_results = pull()\n", "print(type(lr_results))\n", "lr_results" ] }, { "cell_type": "code", "execution_count": 43, "id": "148a74c4", "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
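| Fold | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|
| 0 | 0.9143 | 0.9730 | 0.9143 | 0.9158 | 0.9140 | 0.8712 | 0.8722 |
| 1 | 0.8857 | 0.9764 | 0.8857 | 0.8922 | 0.8849 | 0.8284 | 0.8325 |
| 2 | 0.9714 | 0.9988 | 0.9714 | 0.9736 | 0.9713 | 0.9571 | 0.9582 |
| Mean | 0.9238 | 0.9827 | 0.9238 | 0.9272 | 0.9234 | 0.8856 | 0.8877 |
| Std | 0.0356 | 0.0115 | 0.0356 | 0.0342 | 0.0359 | 0.0535 | 0.0525 |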
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Processing: 0%| | 0/4 [00:00" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
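| Fold | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|
| 0 | 0.9091 | 1.0000 | 0.9091 | 0.9273 | 0.9076 | 0.8625 | 0.8735 |
| 1 | 0.8182 | 0.9091 | 0.8182 | 0.8182 | 0.8182 | 0.7250 | 0.7250 |
| 2 | 0.9091 | 0.9351 | 0.9091 | 0.9273 | 0.9076 | 0.8625 | 0.8735 |
| 3 | 0.7273 | 0.8831 | 0.7273 | 0.7333 | 0.7229 | 0.5875 | 0.5950 |
| 4 | 0.9091 | 1.0000 | 0.9091 | 0.9273 | 0.9076 | 0.8625 | 0.8735 |
| 5 | 0.9000 | 0.9714 | 0.9000 | 0.9250 | 0.8971 | 0.8485 | 0.8616 |
| 6 | 0.9000 | 1.0000 | 0.9000 | 0.9250 | 0.8971 | 0.8485 | 0.8616 |
| 7 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 8 | 0.9000 | 0.9857 | 0.9000 | 0.9250 | 0.8971 | 0.8485 | 0.8616 |
| 9 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| Mean | 0.8973 | 0.9684 | 0.8973 | 0.9108 | 0.8955 | 0.8445 | 0.8525 |
| Std | 0.0753 | 0.0415 | 0.0753 | 0.0758 | 0.0762 | 0.1139 | 0.1130 |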
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Processing: 0%| | 0/4 [00:00
LogisticRegression(C=0.5, class_weight=None, dual=False, fit_intercept=True,\n",
       "                   intercept_scaling=1, l1_ratio=0.15, max_iter=1000,\n",
       "                   multi_class='auto', n_jobs=None, penalty='l2',\n",
       "                   random_state=123, solver='lbfgs', tol=0.0001, verbose=0,\n",
       "                   warm_start=False)
" ], "text/plain": [ "LogisticRegression(C=0.5, class_weight=None, dual=False, fit_intercept=True,\n", " intercept_scaling=1, l1_ratio=0.15, max_iter=1000,\n", " multi_class='auto', n_jobs=None, penalty='l2',\n", " random_state=123, solver='lbfgs', tol=0.0001, verbose=0,\n", " warm_start=False)" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# train logistic regression with specific model parameters\n", "create_model('lr', C = 0.5, l1_ratio = 0.15)" ] }, { "cell_type": "code", "execution_count": 45, "id": "b85af29b", "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
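| Split | Fold | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|---|
| CV-Train | 0 | 0.9149 | 0.9882 | 0.9149 | 0.9159 | 0.9148 | 0.8723 | 0.8729 |
| | 1 | 0.9255 | 0.9892 | 0.9255 | 0.9258 | 0.9255 | 0.8883 | 0.8884 |
| | 2 | 0.9255 | 0.9887 | 0.9255 | 0.9279 | 0.9254 | 0.8883 | 0.8896 |
| | 3 | 0.9468 | 0.9883 | 0.9468 | 0.9471 | 0.9468 | 0.9202 | 0.9204 |
| | 4 | 0.9043 | 0.9855 | 0.9043 | 0.9065 | 0.9040 | 0.8564 | 0.8577 |
| | 5 | 0.9158 | 0.9878 | 0.9158 | 0.9168 | 0.9157 | 0.8737 | 0.8743 |
| | 6 | 0.9053 | 0.9843 | 0.9053 | 0.9074 | 0.9051 | 0.8579 | 0.8592 |
| | 7 | 0.9158 | 0.9863 | 0.9158 | 0.9158 | 0.9158 | 0.8737 | 0.8737 |
| | 8 | 0.9158 | 0.9848 | 0.9158 | 0.9168 | 0.9157 | 0.8737 | 0.8743 |
| | 9 | 0.9053 | 0.9858 | 0.9053 | 0.9074 | 0.9051 | 0.8579 | 0.8592 |
| CV-Val | 0 | 0.9091 | 1.0000 | 0.9091 | 0.9273 | 0.9076 | 0.8625 | 0.8735 |
| | 1 | 0.8182 | 0.9221 | 0.8182 | 0.8182 | 0.8182 | 0.7250 | 0.7250 |
| | 2 | 0.9091 | 0.9610 | 0.9091 | 0.9273 | 0.9076 | 0.8625 | 0.8735 |
| | 3 | 0.6364 | 0.8961 | 0.6364 | 0.6364 | 0.6364 | 0.4500 | 0.4500 |
| | 4 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| | 5 | 0.9000 | 0.9714 | 0.9000 | 0.9250 | 0.8971 | 0.8485 | 0.8616 |
| | 6 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| | 7 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| | 8 | 0.9000 | 1.0000 | 0.9000 | 0.9250 | 0.8971 | 0.8485 | 0.8616 |
| | 9 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| CV-Train | Mean | 0.9175 | 0.9869 | 0.9175 | 0.9187 | 0.9174 | 0.8762 | 0.8770 |
| | Std | 0.0122 | 0.0017 | 0.0122 | 0.0117 | 0.0122 | 0.0182 | 0.0180 |
| CV-Val | Mean | 0.9073 | 0.9751 | 0.9073 | 0.9159 | 0.9064 | 0.8597 | 0.8645 |
| | Std | 0.1076 | 0.0360 | 0.1076 | 0.1079 | 0.1077 | 0.1628 | 0.1628 |
| Train | nan | 0.9143 | 0.9873 | 0.0000 | 0.0000 | 0.0000 | 0.8714 | 0.8715 |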
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Processing: 0%| | 0/4 [00:00
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n",
       "                   intercept_scaling=1, l1_ratio=None, max_iter=1000,\n",
       "                   multi_class='auto', n_jobs=None, penalty='l2',\n",
       "                   random_state=123, solver='lbfgs', tol=0.0001, verbose=0,\n",
       "                   warm_start=False)
" ], "text/plain": [ "LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n", " intercept_scaling=1, l1_ratio=None, max_iter=1000,\n", " multi_class='auto', n_jobs=None, penalty='l2',\n", " random_state=123, solver='lbfgs', tol=0.0001, verbose=0,\n", " warm_start=False)" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# train lr and return train score as well alongwith CV\n", "create_model('lr', return_train_score=True)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "08634e9e", "metadata": {}, "source": [ "Some other parameters that you might find very useful in `create_model` are:\n", "\n", "- cross_validation\n", "- engine\n", "- fit_kwargs\n", "- groups\n", "\n", "You can check the docstring of the function for more info." ] }, { "cell_type": "code", "execution_count": 46, "id": "3fb32c74", "metadata": {}, "outputs": [], "source": [ "# help(create_model)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "d5378836", "metadata": {}, "source": [ "## ✅ Tune Model\n", "\n", "This function tunes the hyperparameters of the model. The output of this function is a scoring grid with cross-validated scores by fold. The best model is selected based on the metric defined in optimize parameter. Metrics evaluated during cross-validation can be accessed using the `get_metrics` function. Custom metrics can be added or removed using `add_metric` and `remove_metric` function." 
] }, { "cell_type": "code", "execution_count": 47, "id": "402597f2", "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Fold  Accuracy     AUC  Recall   Prec.      F1   Kappa     MCC
0       0.8182  0.8571  0.8182  0.8788  0.8061  0.7250  0.7642
1       0.9091  0.9286  0.9091  0.9273  0.9076  0.8625  0.8735
2       0.9091  0.9286  0.9091  0.9273  0.9076  0.8625  0.8735
3       0.7273  0.7857  0.7273  0.8442  0.6826  0.5875  0.6674
4       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
5       0.9000  0.9286  0.9000  0.9250  0.8971  0.8485  0.8616
6       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
7       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
8       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
9       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
Mean    0.9264  0.9429  0.9264  0.9502  0.9201  0.8886  0.9040
Std     0.0893  0.0700  0.0893  0.0552  0.1011  0.1351  0.1119
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Processing: 0%| | 0/4 [00:00" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Fold  Accuracy     AUC  Recall   Prec.      F1   Kappa     MCC
0       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
1       0.9091  0.9481  0.9091  0.9273  0.9076  0.8625  0.8735
2       0.9091  0.9481  0.9091  0.9273  0.9076  0.8625  0.8735
3       0.7273  0.8442  0.7273  0.8442  0.6826  0.5875  0.6674
4       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
5       0.9000  0.9286  0.9000  0.9250  0.8971  0.8485  0.8616
6       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
7       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
8       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
9       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
Mean    0.9445  0.9669  0.9445  0.9624  0.9395  0.9161  0.9276
Std     0.0838  0.0488  0.0838  0.0513  0.0958  0.1267  0.1046
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Processing: 0%| | 0/7 [00:00
DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',\n",
       "                       max_depth=None, max_features=None, max_leaf_nodes=None,\n",
       "                       min_impurity_decrease=0.0, min_samples_leaf=1,\n",
       "                       min_samples_split=2, min_weight_fraction_leaf=0.0,\n",
       "                       random_state=123, splitter='best')
" ], "text/plain": [ "DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',\n", " max_depth=None, max_features=None, max_leaf_nodes=None,\n", " min_impurity_decrease=0.0, min_samples_leaf=1,\n", " min_samples_split=2, min_weight_fraction_leaf=0.0,\n", " random_state=123, splitter='best')" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dt" ] }, { "cell_type": "code", "execution_count": 50, "id": "31e050ff", "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Fold  Accuracy     AUC  Recall   Prec.      F1   Kappa     MCC
0       0.9091  0.9481  0.9091  0.9273  0.9076  0.8625  0.8735
1       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
2       0.9091  0.9286  0.9091  0.9273  0.9076  0.8625  0.8735
3       0.7273  0.8442  0.7273  0.8442  0.6826  0.5875  0.6674
4       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
5       0.9000  0.9571  0.9000  0.9250  0.8971  0.8485  0.8616
6       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
7       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
8       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
9       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
Mean    0.9445  0.9678  0.9445  0.9624  0.9395  0.9161  0.9276
Std     0.0838  0.0485  0.0838  0.0513  0.0958  0.1267  0.1046
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Processing: 0%| | 0/7 [00:00" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Fold  Accuracy     AUC  Recall   Prec.      F1   Kappa     MCC
0       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
1       0.9091  0.9481  0.9091  0.9273  0.9076  0.8625  0.8735
2       0.9091  0.9481  0.9091  0.9273  0.9076  0.8625  0.8735
3       0.7273  0.8442  0.7273  0.8442  0.6826  0.5875  0.6674
4       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
5       0.9000  0.9286  0.9000  0.9250  0.8971  0.8485  0.8616
6       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
7       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
8       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
9       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
Mean    0.9445  0.9669  0.9445  0.9624  0.9395  0.9161  0.9276
Std     0.0838  0.0488  0.0838  0.0513  0.0958  0.1267  0.1046
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Processing: 0%| | 0/7 [00:00
DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='entropy',\n",
       "                       max_depth=5, max_features='sqrt', max_leaf_nodes=None,\n",
       "                       min_impurity_decrease=0.2, min_samples_leaf=5,\n",
       "                       min_samples_split=5, min_weight_fraction_leaf=0.0,\n",
       "                       random_state=123, splitter='best')
" ], "text/plain": [ "DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='entropy',\n", " max_depth=5, max_features='sqrt', max_leaf_nodes=None,\n", " min_impurity_decrease=0.2, min_samples_leaf=5,\n", " min_samples_split=5, min_weight_fraction_leaf=0.0,\n", " random_state=123, splitter='best')" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# model object\n", "tuned_dt" ] }, { "cell_type": "code", "execution_count": 53, "id": "7d5e49ca", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
RandomizedSearchCV(cv=StratifiedKFold(n_splits=10, random_state=None, shuffle=False),\n",
       "                   error_score=nan,\n",
       "                   estimator=Pipeline(memory=FastMemory(location=C:\\Users\\owner\\AppData\\Local\\Temp\\joblib),\n",
       "                                      steps=[('label_encoding',\n",
       "                                              TransformerWrapperWithInverse(exclude=None,\n",
       "                                                                            include=None,\n",
       "                                                                            transformer=LabelEncoder())),\n",
       "                                             ('numerical_imputer',\n",
       "                                              TransformerWrapper(exclude=None,...\n",
       "                                        'actual_estimator__max_features': [1.0,\n",
       "                                                                           'sqrt',\n",
       "                                                                           'log2'],\n",
       "                                        'actual_estimator__min_impurity_decrease': [0,\n",
       "                                                                                    0.0001,\n",
       "                                                                                    0.001,\n",
       "                                                                                    0.01,\n",
       "                                                                                    0.0002,\n",
       "                                                                                    0.002,\n",
       "                                                                                    0.02,\n",
       "                                                                                    0.0005,\n",
       "                                                                                    0.005,\n",
       "                                                                                    0.05,\n",
       "                                                                                    0.1,\n",
       "                                                                                    0.2,\n",
       "                                                                                    0.3,\n",
       "                                                                                    0.4,\n",
       "                                                                                    0.5],\n",
       "                                        'actual_estimator__min_samples_leaf': [2,\n",
       "                                                                               3,\n",
       "                                                                               4,\n",
       "                                                                               5,\n",
       "                                                                               6],\n",
       "                                        'actual_estimator__min_samples_split': [2,\n",
       "                                                                                5,\n",
       "                                                                                7,\n",
       "                                                                                9,\n",
       "                                                                                10]},\n",
       "                   pre_dispatch='2*n_jobs', random_state=123, refit=False,\n",
       "                   return_train_score=False, scoring='accuracy', verbose=1)
" ], "text/plain": [ "RandomizedSearchCV(cv=StratifiedKFold(n_splits=10, random_state=None, shuffle=False),\n", " error_score=nan,\n", " estimator=Pipeline(memory=FastMemory(location=C:\\Users\\owner\\AppData\\Local\\Temp\\joblib),\n", " steps=[('label_encoding',\n", " TransformerWrapperWithInverse(exclude=None,\n", " include=None,\n", " transformer=LabelEncoder())),\n", " ('numerical_imputer',\n", " TransformerWrapper(exclude=None,...\n", " 'actual_estimator__max_features': [1.0,\n", " 'sqrt',\n", " 'log2'],\n", " 'actual_estimator__min_impurity_decrease': [0,\n", " 0.0001,\n", " 0.001,\n", " 0.01,\n", " 0.0002,\n", " 0.002,\n", " 0.02,\n", " 0.0005,\n", " 0.005,\n", " 0.05,\n", " 0.1,\n", " 0.2,\n", " 0.3,\n", " 0.4,\n", " 0.5],\n", " 'actual_estimator__min_samples_leaf': [2,\n", " 3,\n", " 4,\n", " 5,\n", " 6],\n", " 'actual_estimator__min_samples_split': [2,\n", " 5,\n", " 7,\n", " 9,\n", " 10]},\n", " pre_dispatch='2*n_jobs', random_state=123, refit=False,\n", " return_train_score=False, scoring='accuracy', verbose=1)" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# tuner object\n", "tuner" ] }, { "attachments": {}, "cell_type": "markdown", "id": "0a33c70b", "metadata": {}, "source": [ "The default search algorithm is `RandomizedSearchCV` from `sklearn`. This can be changed using the `search_library` and `search_algorithm` parameters."
] }, { "cell_type": "code", "execution_count": 54, "id": "31e33547", "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Fold  Accuracy     AUC  Recall   Prec.      F1   Kappa     MCC
0       0.9091  0.9286  0.9091  0.9273  0.9076  0.8625  0.8735
1       0.9091  0.9286  0.9091  0.9273  0.9076  0.8625  0.8735
2       0.9091  0.9286  0.9091  0.9273  0.9076  0.8625  0.8735
3       0.7273  0.7857  0.7273  0.8442  0.6826  0.5875  0.6674
4       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
5       0.9000  0.9286  0.9000  0.9250  0.8971  0.8485  0.8616
6       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
7       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
8       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
9       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
Mean    0.9355  0.9500  0.9355  0.9551  0.9303  0.9023  0.9149
Std     0.0822  0.0643  0.0822  0.0506  0.0939  0.1243  0.1027
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Processing: 0%| | 0/7 [00:00" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Fold  Accuracy     AUC  Recall   Prec.      F1   Kappa     MCC
0       0.9091  0.9286  0.9091  0.9273  0.9076  0.8625  0.8735
1       0.9091  1.0000  0.9091  0.9273  0.9076  0.8625  0.8735
2       0.9091  0.9870  0.9091  0.9273  0.9076  0.8625  0.8735
3       0.7273  0.9481  0.7273  0.8442  0.6826  0.5875  0.6674
4       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
5       0.9000  0.9857  0.9000  0.9250  0.8971  0.8485  0.8616
6       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
7       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
8       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
9       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
Mean    0.9355  0.9849  0.9355  0.9551  0.9303  0.9023  0.9149
Std     0.0822  0.0243  0.0822  0.0506  0.0939  0.1243  0.1027
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Processing: 0%| | 0/6 [00:00
BaggingClassifier(base_estimator=DecisionTreeClassifier(ccp_alpha=0.0,\n",
       "                                                        class_weight=None,\n",
       "                                                        criterion='gini',\n",
       "                                                        max_depth=None,\n",
       "                                                        max_features=None,\n",
       "                                                        max_leaf_nodes=None,\n",
       "                                                        min_impurity_decrease=0.0,\n",
       "                                                        min_samples_leaf=1,\n",
       "                                                        min_samples_split=2,\n",
       "                                                        min_weight_fraction_leaf=0.0,\n",
       "                                                        random_state=123,\n",
       "                                                        splitter='best'),\n",
       "                  bootstrap=True, bootstrap_features=False, max_features=1.0,\n",
       "                  max_samples=1.0, n_estimators=10, n_jobs=None,\n",
       "                  oob_score=False, random_state=123, verbose=0,\n",
       "                  warm_start=False)
" ], "text/plain": [ "BaggingClassifier(base_estimator=DecisionTreeClassifier(ccp_alpha=0.0,\n", " class_weight=None,\n", " criterion='gini',\n", " max_depth=None,\n", " max_features=None,\n", " max_leaf_nodes=None,\n", " min_impurity_decrease=0.0,\n", " min_samples_leaf=1,\n", " min_samples_split=2,\n", " min_weight_fraction_leaf=0.0,\n", " random_state=123,\n", " splitter='best'),\n", " bootstrap=True, bootstrap_features=False, max_features=1.0,\n", " max_samples=1.0, n_estimators=10, n_jobs=None,\n", " oob_score=False, random_state=123, verbose=0,\n", " warm_start=False)" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# ensemble with bagging\n", "ensemble_model(dt, method = 'Bagging')" ] }, { "cell_type": "code", "execution_count": 57, "id": "79279394", "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
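| Fold | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|
| 0 | 0.9091 | 0.9286 | 0.9091 | 0.9273 | 0.9076 | 0.8625 | 0.8735 |
| 1 | 0.9091 | 0.9286 | 0.9091 | 0.9273 | 0.9076 | 0.8625 | 0.8735 |
| 2 | 0.9091 | 0.9286 | 0.9091 | 0.9273 | 0.9076 | 0.8625 | 0.8735 |
| 3 | 0.6364 | 0.7143 | 0.6364 | 0.6364 | 0.6121 | 0.4500 | 0.4743 |
| 4 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 5 | 0.9000 | 0.9286 | 0.9000 | 0.9250 | 0.8971 | 0.8485 | 0.8616 |
| 6 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 7 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 8 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 9 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| Mean | 0.9264 | 0.9429 | 0.9264 | 0.9343 | 0.9232 | 0.8886 | 0.8956 |
| Std | 0.1062 | 0.0833 | 0.1062 | 0.1052 | 0.1130 | 0.1606 | 0.1532 |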
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Processing: 0%| | 0/6 [00:00#sk-container-id-10 {color: black;background-color: white;}#sk-container-id-10 pre{padding: 0;}#sk-container-id-10 div.sk-toggleable {background-color: white;}#sk-container-id-10 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-container-id-10 label.sk-toggleable__label-arrow:before {content: \"▸\";float: left;margin-right: 0.25em;color: #696969;}#sk-container-id-10 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-container-id-10 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-container-id-10 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-container-id-10 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-container-id-10 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-container-id-10 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: \"▾\";}#sk-container-id-10 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-10 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-10 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-container-id-10 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: 
border-box;margin-bottom: 0.5em;}#sk-container-id-10 div.sk-estimator:hover {background-color: #d4ebff;}#sk-container-id-10 div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-container-id-10 div.sk-label:hover label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-10 div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: 0;}#sk-container-id-10 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;position: relative;}#sk-container-id-10 div.sk-item {position: relative;z-index: 1;}#sk-container-id-10 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;position: relative;}#sk-container-id-10 div.sk-item::before, #sk-container-id-10 div.sk-parallel-item::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: -1;}#sk-container-id-10 div.sk-parallel-item {display: flex;flex-direction: column;z-index: 1;position: relative;background-color: white;}#sk-container-id-10 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-container-id-10 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-container-id-10 div.sk-parallel-item:only-child::after {width: 0;}#sk-container-id-10 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;}#sk-container-id-10 div.sk-label label {font-family: monospace;font-weight: bold;display: inline-block;line-height: 1.2em;}#sk-container-id-10 div.sk-label-container {text-align: center;}#sk-container-id-10 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we 
also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-container-id-10 div.sk-text-repr-fallback {display: none;}
AdaBoostClassifier(algorithm='SAMME.R',\n",
       "                   base_estimator=DecisionTreeClassifier(ccp_alpha=0.0,\n",
       "                                                         class_weight=None,\n",
       "                                                         criterion='gini',\n",
       "                                                         max_depth=None,\n",
       "                                                         max_features=None,\n",
       "                                                         max_leaf_nodes=None,\n",
       "                                                         min_impurity_decrease=0.0,\n",
       "                                                         min_samples_leaf=1,\n",
       "                                                         min_samples_split=2,\n",
       "                                                         min_weight_fraction_leaf=0.0,\n",
       "                                                         random_state=123,\n",
       "                                                         splitter='best'),\n",
       "                   learning_rate=1.0, n_estimators=10, random_state=123)
" ], "text/plain": [ "AdaBoostClassifier(algorithm='SAMME.R',\n", " base_estimator=DecisionTreeClassifier(ccp_alpha=0.0,\n", " class_weight=None,\n", " criterion='gini',\n", " max_depth=None,\n", " max_features=None,\n", " max_leaf_nodes=None,\n", " min_impurity_decrease=0.0,\n", " min_samples_leaf=1,\n", " min_samples_split=2,\n", " min_weight_fraction_leaf=0.0,\n", " random_state=123,\n", " splitter='best'),\n", " learning_rate=1.0, n_estimators=10, random_state=123)" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# ensemble with boosting\n", "ensemble_model(dt, method = 'Boosting')" ] }, { "attachments": {}, "cell_type": "markdown", "id": "d0fa1ce2", "metadata": {}, "source": [ "Some other parameters that you might find very useful in `ensemble_model` are:\n", "\n", "- choose_better\n", "- n_estimators\n", "- groups\n", "- fit_kwargs\n", "- probability_threshold\n", "- return_train_score\n", "\n", "You can check the docstring of the function for more info." ] }, { "cell_type": "code", "execution_count": 58, "id": "78130ed1", "metadata": {}, "outputs": [], "source": [ "# help(ensemble_model)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "ea8a9a4e", "metadata": {}, "source": [ "## ✅ Blend Models" ] }, { "attachments": {}, "cell_type": "markdown", "id": "2ede29c4", "metadata": {}, "source": [ "This function trains a Soft Voting / Majority Rule classifier for the models passed in the `estimator_list` parameter. The output of this function is a scoring grid with CV scores by fold. Metrics evaluated during CV can be accessed using the `get_metrics` function. Custom metrics can be added or removed using the `add_metric` and `remove_metric` functions." 
] }, { "cell_type": "code", "execution_count": 59, "id": "61a7a1c5", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[QuadraticDiscriminantAnalysis(priors=None, reg_param=0.0,\n", " store_covariance=False, tol=0.0001),\n", " LinearDiscriminantAnalysis(covariance_estimator=None, n_components=None,\n", " priors=None, shrinkage=None, solver='svd',\n", " store_covariance=False, tol=0.0001),\n", " KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',\n", " metric_params=None, n_jobs=-1, n_neighbors=5, p=2,\n", " weights='uniform')]" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# top 3 models based on recall\n", "best_recall_models_top3" ] }, { "cell_type": "code", "execution_count": 60, "id": "04f65f2f", "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
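| Fold | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|
| 0 | 0.9091 | 0.9740 | 0.9091 | 0.9273 | 0.9076 | 0.8625 | 0.8735 |
| 1 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 2 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 3 | 0.9091 | 1.0000 | 0.9091 | 0.9273 | 0.9076 | 0.8625 | 0.8735 |
| 4 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 5 | 0.9000 | 1.0000 | 0.9000 | 0.9250 | 0.8971 | 0.8485 | 0.8616 |
| 6 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 7 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 8 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 9 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| Mean | 0.9718 | 0.9974 | 0.9718 | 0.9780 | 0.9712 | 0.9573 | 0.9609 |
| Std | 0.0431 | 0.0078 | 0.0431 | 0.0337 | 0.0440 | 0.0653 | 0.0599 |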
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Processing: 0%| | 0/6 [00:00#sk-container-id-11 {color: black;background-color: white;}#sk-container-id-11 pre{padding: 0;}#sk-container-id-11 div.sk-toggleable {background-color: white;}#sk-container-id-11 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-container-id-11 label.sk-toggleable__label-arrow:before {content: \"▸\";float: left;margin-right: 0.25em;color: #696969;}#sk-container-id-11 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-container-id-11 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-container-id-11 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-container-id-11 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-container-id-11 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-container-id-11 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: \"▾\";}#sk-container-id-11 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-11 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-11 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-container-id-11 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: 
border-box;margin-bottom: 0.5em;}#sk-container-id-11 div.sk-estimator:hover {background-color: #d4ebff;}#sk-container-id-11 div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-container-id-11 div.sk-label:hover label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-11 div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: 0;}#sk-container-id-11 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;position: relative;}#sk-container-id-11 div.sk-item {position: relative;z-index: 1;}#sk-container-id-11 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;position: relative;}#sk-container-id-11 div.sk-item::before, #sk-container-id-11 div.sk-parallel-item::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: -1;}#sk-container-id-11 div.sk-parallel-item {display: flex;flex-direction: column;z-index: 1;position: relative;background-color: white;}#sk-container-id-11 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-container-id-11 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-container-id-11 div.sk-parallel-item:only-child::after {width: 0;}#sk-container-id-11 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;}#sk-container-id-11 div.sk-label label {font-family: monospace;font-weight: bold;display: inline-block;line-height: 1.2em;}#sk-container-id-11 div.sk-label-container {text-align: center;}#sk-container-id-11 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we 
also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-container-id-11 div.sk-text-repr-fallback {display: none;}
VotingClassifier(estimators=[('Quadratic Discriminant Analysis',\n",
       "                              QuadraticDiscriminantAnalysis(priors=None,\n",
       "                                                            reg_param=0.0,\n",
       "                                                            store_covariance=False,\n",
       "                                                            tol=0.0001)),\n",
       "                             ('Linear Discriminant Analysis',\n",
       "                              LinearDiscriminantAnalysis(covariance_estimator=None,\n",
       "                                                         n_components=None,\n",
       "                                                         priors=None,\n",
       "                                                         shrinkage=None,\n",
       "                                                         solver='svd',\n",
       "                                                         store_covariance=False,\n",
       "                                                         tol=0.0001)),\n",
       "                             ('K Neighbors Classifier',\n",
       "                              KNeighborsClassifier(algorithm='auto',\n",
       "                                                   leaf_size=30,\n",
       "                                                   metric='minkowski',\n",
       "                                                   metric_params=None,\n",
       "                                                   n_jobs=-1, n_neighbors=5,\n",
       "                                                   p=2, weights='uniform'))],\n",
       "                 flatten_transform=True, n_jobs=-1, verbose=False,\n",
       "                 voting='soft', weights=None)
" ], "text/plain": [ "VotingClassifier(estimators=[('Quadratic Discriminant Analysis',\n", " QuadraticDiscriminantAnalysis(priors=None,\n", " reg_param=0.0,\n", " store_covariance=False,\n", " tol=0.0001)),\n", " ('Linear Discriminant Analysis',\n", " LinearDiscriminantAnalysis(covariance_estimator=None,\n", " n_components=None,\n", " priors=None,\n", " shrinkage=None,\n", " solver='svd',\n", " store_covariance=False,\n", " tol=0.0001)),\n", " ('K Neighbors Classifier',\n", " KNeighborsClassifier(algorithm='auto',\n", " leaf_size=30,\n", " metric='minkowski',\n", " metric_params=None,\n", " n_jobs=-1, n_neighbors=5,\n", " p=2, weights='uniform'))],\n", " flatten_transform=True, n_jobs=-1, verbose=False,\n", " voting='soft', weights=None)" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# blend top 3 models\n", "blend_models(best_recall_models_top3)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "9e788c9c", "metadata": {}, "source": [ "Some other parameters that you might find very useful in `blend_models` are:\n", "\n", "- choose_better\n", "- method\n", "- weights\n", "- fit_kwargs\n", "- probability_threshold\n", "- return_train_score\n", "\n", "You can check the docstring of the function for more info." ] }, { "cell_type": "code", "execution_count": 61, "id": "99b549a6", "metadata": {}, "outputs": [], "source": [ "# help(blend_models)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "e76969b0", "metadata": {}, "source": [ "## ✅ Stack Models" ] }, { "attachments": {}, "cell_type": "markdown", "id": "55909804", "metadata": {}, "source": [ "This function trains a meta-model over the estimators passed in the `estimator_list` parameter. The output of this function is a scoring grid with CV scores by fold. Metrics evaluated during CV can be accessed using the `get_metrics` function. Custom metrics can be added or removed using the `add_metric` and `remove_metric` functions." 
] }, { "cell_type": "code", "execution_count": 62, "id": "201c681e", "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
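| Fold | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|
| 0 | 0.9091 | 0.9740 | 0.9091 | 0.9273 | 0.9076 | 0.8625 | 0.8735 |
| 1 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 2 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 3 | 0.9091 | 1.0000 | 0.9091 | 0.9273 | 0.9076 | 0.8625 | 0.8735 |
| 4 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 5 | 0.9000 | 1.0000 | 0.9000 | 0.9250 | 0.8971 | 0.8485 | 0.8616 |
| 6 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 7 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 8 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 9 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| Mean | 0.9718 | 0.9974 | 0.9718 | 0.9780 | 0.9712 | 0.9573 | 0.9609 |
| Std | 0.0431 | 0.0078 | 0.0431 | 0.0337 | 0.0440 | 0.0653 | 0.0599 |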
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Processing: 0%| | 0/6 [00:00#sk-container-id-12 {color: black;background-color: white;}#sk-container-id-12 pre{padding: 0;}#sk-container-id-12 div.sk-toggleable {background-color: white;}#sk-container-id-12 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-container-id-12 label.sk-toggleable__label-arrow:before {content: \"▸\";float: left;margin-right: 0.25em;color: #696969;}#sk-container-id-12 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-container-id-12 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-container-id-12 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-container-id-12 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-container-id-12 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-container-id-12 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: \"▾\";}#sk-container-id-12 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-12 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-12 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-container-id-12 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: 
border-box;margin-bottom: 0.5em;}#sk-container-id-12 div.sk-estimator:hover {background-color: #d4ebff;}#sk-container-id-12 div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-container-id-12 div.sk-label:hover label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-12 div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: 0;}#sk-container-id-12 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;position: relative;}#sk-container-id-12 div.sk-item {position: relative;z-index: 1;}#sk-container-id-12 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;position: relative;}#sk-container-id-12 div.sk-item::before, #sk-container-id-12 div.sk-parallel-item::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: -1;}#sk-container-id-12 div.sk-parallel-item {display: flex;flex-direction: column;z-index: 1;position: relative;background-color: white;}#sk-container-id-12 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-container-id-12 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-container-id-12 div.sk-parallel-item:only-child::after {width: 0;}#sk-container-id-12 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;}#sk-container-id-12 div.sk-label label {font-family: monospace;font-weight: bold;display: inline-block;line-height: 1.2em;}#sk-container-id-12 div.sk-label-container {text-align: center;}#sk-container-id-12 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we 
also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-container-id-12 div.sk-text-repr-fallback {display: none;}
StackingClassifier(cv=5,\n",
       "                   estimators=[('Quadratic Discriminant Analysis',\n",
       "                                QuadraticDiscriminantAnalysis(priors=None,\n",
       "                                                              reg_param=0.0,\n",
       "                                                              store_covariance=False,\n",
       "                                                              tol=0.0001)),\n",
       "                               ('Linear Discriminant Analysis',\n",
       "                                LinearDiscriminantAnalysis(covariance_estimator=None,\n",
       "                                                           n_components=None,\n",
       "                                                           priors=None,\n",
       "                                                           shrinkage=None,\n",
       "                                                           solver='svd',\n",
       "                                                           store_covariance=False,\n",
       "                                                           tol=0.0001)),\n",
       "                               ('K Neighbors...\n",
       "                                                     n_jobs=-1, n_neighbors=5,\n",
       "                                                     p=2, weights='uniform'))],\n",
       "                   final_estimator=LogisticRegression(C=1.0, class_weight=None,\n",
       "                                                      dual=False,\n",
       "                                                      fit_intercept=True,\n",
       "                                                      intercept_scaling=1,\n",
       "                                                      l1_ratio=None,\n",
       "                                                      max_iter=1000,\n",
       "                                                      multi_class='auto',\n",
       "                                                      n_jobs=None, penalty='l2',\n",
       "                                                      random_state=123,\n",
       "                                                      solver='lbfgs',\n",
       "                                                      tol=0.0001, verbose=0,\n",
       "                                                      warm_start=False),\n",
       "                   n_jobs=-1, passthrough=True, stack_method='auto', verbose=0)
" ], "text/plain": [ "StackingClassifier(cv=5,\n", " estimators=[('Quadratic Discriminant Analysis',\n", " QuadraticDiscriminantAnalysis(priors=None,\n", " reg_param=0.0,\n", " store_covariance=False,\n", " tol=0.0001)),\n", " ('Linear Discriminant Analysis',\n", " LinearDiscriminantAnalysis(covariance_estimator=None,\n", " n_components=None,\n", " priors=None,\n", " shrinkage=None,\n", " solver='svd',\n", " store_covariance=False,\n", " tol=0.0001)),\n", " ('K Neighbors...\n", " n_jobs=-1, n_neighbors=5,\n", " p=2, weights='uniform'))],\n", " final_estimator=LogisticRegression(C=1.0, class_weight=None,\n", " dual=False,\n", " fit_intercept=True,\n", " intercept_scaling=1,\n", " l1_ratio=None,\n", " max_iter=1000,\n", " multi_class='auto',\n", " n_jobs=None, penalty='l2',\n", " random_state=123,\n", " solver='lbfgs',\n", " tol=0.0001, verbose=0,\n", " warm_start=False),\n", " n_jobs=-1, passthrough=True, stack_method='auto', verbose=0)" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# stack models\n", "stack_models(best_recall_models_top3)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "af78cda8", "metadata": {}, "source": [ "Some other parameters that you might find very useful in `stack_models` are:\n", "\n", "- choose_better\n", "- meta_model\n", "- method\n", "- restack\n", "- probability_threshold\n", "- return_train_score\n", "\n", "You can check the docstring of the function for more info." ] }, { "cell_type": "code", "execution_count": 63, "id": "3305e597", "metadata": {}, "outputs": [], "source": [ "# help(stack_models)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "279a3127", "metadata": {}, "source": [ "## ✅ Plot Model" ] }, { "attachments": {}, "cell_type": "markdown", "id": "862bd3e9", "metadata": {}, "source": [ "This function analyzes the performance of a trained model on the hold-out set. It may require re-training the model in certain cases." 
] }, { "cell_type": "code", "execution_count": 64, "id": "9c8da9b4", "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# plot class report\n", "plot_model(best, plot = 'class_report')" ] }, { "cell_type": "code", "execution_count": 65, "id": "952b6f24", "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# to control the scale of plot\n", "plot_model(best, plot = 'class_report', scale = 2)" ] }, { "cell_type": "code", "execution_count": 66, "id": "54389270", "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "'Class Report.png'" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# to save the plot\n", "plot_model(best, plot = 'class_report', save=True)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "2fef279d", "metadata": {}, "source": [ "Some other parameters that you might find very useful in `plot_model` are:\n", "\n", "- fit_kwargs\n", "- plot_kwargs\n", "- groups\n", "- display_format\n", "\n", "You can check the docstring of the function for more info." ] }, { "cell_type": "code", "execution_count": 67, "id": "54b09b8e", "metadata": {}, "outputs": [], "source": [ "# help(plot_model)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "b724ca46", "metadata": {}, "source": [ "## ✅ Interpret Model" ] }, { "attachments": {}, "cell_type": "markdown", "id": "52f8fb33", "metadata": {}, "source": [ "This function analyzes the predictions generated from a trained model. Most plots in this function are based on SHAP (SHapley Additive exPlanations). 
For more info on this, please see https://shap.readthedocs.io/en/latest/" ] }, { "cell_type": "code", "execution_count": 68, "id": "6b6891b7", "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
      Accuracy     AUC  Recall   Prec.      F1   Kappa     MCC
Fold
0       0.9091  0.9351  0.9091  0.9273  0.9076  0.8625  0.8735
1       0.9091  1.0000  0.9091  0.9273  0.9076  0.8625  0.8735
2       0.9091  0.9221  0.9091  0.9273  0.9076  0.8625  0.8735
3       0.9091  1.0000  0.9091  0.9273  0.9076  0.8625  0.8735
4       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
5       0.9000  1.0000  0.9000  0.9250  0.8971  0.8485  0.8616
6       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
7       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
8       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
9       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
Mean    0.9536  0.9857  0.9536  0.9634  0.9528  0.9298  0.9356
Std     0.0464  0.0287  0.0464  0.0366  0.0473  0.0703  0.0645
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Processing: 0%| | 0/4 [00:00" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# interpret summary model\n", "interpret_model(lightgbm, plot = 'summary')" ] }, { "cell_type": "code", "execution_count": 70, "id": "824bafdc", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# reason plot for test set observation 1\n", "interpret_model(lightgbm, plot = 'reason', observation = 1)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "ca7ce2b4", "metadata": {}, "source": [ "Some other parameters that you might find very useful in `interpret_model` are:\n", "\n", "- plot\n", "- feature\n", "- use_train_data\n", "- X_new_sample\n", "- y_new_sample\n", "- save\n", "\n", "You can check the docstring of the function for more info." ] }, { "cell_type": "code", "execution_count": 71, "id": "42595030", "metadata": {}, "outputs": [], "source": [ "# help(interpret_model)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "9f57d0c8", "metadata": {}, "source": [ "## ✅ Get Leaderboard" ] }, { "attachments": {}, "cell_type": "markdown", "id": "ec63b67a", "metadata": {}, "source": [ "This function returns the leaderboard of all models trained in the current setup." 
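The leaderboard is essentially a table with one row per trained model, so choosing a winner comes down to ranking rows by a metric. Below is a minimal pure-Python sketch of that idea, using three rows of metric values taken from this experiment's results. `pick_best` is a hypothetical helper, not a PyCaret function, and ties go to the first row encountered.

```python
# three rows of metric values from this experiment (other columns omitted)
leaderboard = [
    {"Model Name": "K Neighbors Classifier", "Accuracy": 0.9636, "AUC": 0.9844, "F1": 0.9631},
    {"Model Name": "Quadratic Discriminant Analysis", "Accuracy": 0.9718, "AUC": 0.9974, "F1": 0.9712},
    {"Model Name": "Linear Discriminant Analysis", "Accuracy": 0.9718, "AUC": 1.0000, "F1": 0.9712},
]


def pick_best(rows, metric="Accuracy"):
    """Return the row with the highest value of `metric`; the first row wins ties."""
    return max(rows, key=lambda row: row[metric])


best_by_f1 = pick_best(leaderboard, metric="F1")
best_by_auc = pick_best(leaderboard, metric="AUC")
print(best_by_f1["Model Name"])
print(best_by_auc["Model Name"])
```

Changing the metric can change the winner, which is why it helps to decide up front which metric matters for the problem at hand.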
] }, { "cell_type": "code", "execution_count": 72, "id": "307a6e3c", "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Processing: 0%| | 0/58 [00:00\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
       Model Name                       Model                                              Accuracy     AUC  Recall   Prec.      F1   Kappa     MCC
Index
0      Logistic Regression              (TransformerWrapperWithInverse(exclude=None, i...    0.9073  0.9751  0.9073  0.9159  0.9064  0.8597  0.8645
1      K Neighbors Classifier           (TransformerWrapperWithInverse(exclude=None, i...    0.9636  0.9844  0.9636  0.9709  0.9631  0.9450  0.9494
2      Naive Bayes                      (TransformerWrapperWithInverse(exclude=None, i...    0.9445  0.9868  0.9445  0.9525  0.9438  0.9161  0.9207
3      Decision Tree Classifier         (TransformerWrapperWithInverse(exclude=None, i...    0.9264  0.9429  0.9264  0.9502  0.9201  0.8886  0.9040
4      SVM - Linear Kernel              (TransformerWrapperWithInverse(exclude=None, i...    0.8100  0.0000  0.8100  0.7831  0.7702  0.7125  0.7527
5      Ridge Classifier                 (TransformerWrapperWithInverse(exclude=None, i...    0.8318  0.0000  0.8318  0.8545  0.8281  0.7459  0.7595
6      Random Forest Classifier         (TransformerWrapperWithInverse(exclude=None, i...    0.9264  0.9903  0.9264  0.9343  0.9232  0.8886  0.8956
7      Quadratic Discriminant Analysis  (TransformerWrapperWithInverse(exclude=None, i...    0.9718  0.9974  0.9718  0.9780  0.9712  0.9573  0.9609
8      Ada Boost Classifier             (TransformerWrapperWithInverse(exclude=None, i...    0.9155  0.9843  0.9155  0.9401  0.9097  0.8720  0.8873
9      Gradient Boosting Classifier     (TransformerWrapperWithInverse(exclude=None, i...    0.9264  0.9688  0.9264  0.9343  0.9232  0.8886  0.8956
10     Linear Discriminant Analysis     (TransformerWrapperWithInverse(exclude=None, i...    0.9718  1.0000  0.9718  0.9780  0.9712  0.9573  0.9609
11     Extra Trees Classifier           (TransformerWrapperWithInverse(exclude=None, i...    0.9445  0.9935  0.9445  0.9586  0.9426  0.9161  0.9246
12     Extreme Gradient Boosting        (TransformerWrapperWithInverse(exclude=None, i...    0.9355  0.9868  0.9355  0.9440  0.9343  0.9023  0.9077
13     Light Gradient Boosting Machine  (TransformerWrapperWithInverse(exclude=None, i...    0.9536  0.9857  0.9536  0.9634  0.9528  0.9298  0.9356
14     CatBoost Classifier              (TransformerWrapperWithInverse(exclude=None, i...    0.9445  0.9922  0.9445  0.9586  0.9426  0.9161  0.9246
15     Dummy Classifier                 (TransformerWrapperWithInverse(exclude=None, i...    0.2864  0.5000  0.2864  0.0822  0.1277  0.0000  0.0000
16     Decision Tree Classifier         (TransformerWrapperWithInverse(exclude=None, i...    0.9264  0.9429  0.9264  0.9502  0.9201  0.8886  0.9040
17     Random Forest Classifier         (TransformerWrapperWithInverse(exclude=None, i...    0.9264  0.9903  0.9264  0.9343  0.9232  0.8886  0.8956
18     Extra Trees Classifier           (TransformerWrapperWithInverse(exclude=None, i...    0.9445  0.9935  0.9445  0.9586  0.9426  0.9161  0.9246
19     Gradient Boosting Classifier     (TransformerWrapperWithInverse(exclude=None, i...    0.9264  0.9688  0.9264  0.9343  0.9232  0.8886  0.8956
20     Extreme Gradient Boosting        (TransformerWrapperWithInverse(exclude=None, i...    0.9355  0.9868  0.9355  0.9440  0.9343  0.9023  0.9077
21     Light Gradient Boosting Machine  (TransformerWrapperWithInverse(exclude=None, i...    0.9536  0.9857  0.9536  0.9634  0.9528  0.9298  0.9356
22     CatBoost Classifier              (TransformerWrapperWithInverse(exclude=None, i...    0.9445  0.9922  0.9445  0.9586  0.9426  0.9161  0.9246
23     Logistic Regression              (TransformerWrapperWithInverse(exclude=None, i...    0.9073  0.9751  0.9073  0.9159  0.9064  0.8597  0.8645
24     K Neighbors Classifier           (TransformerWrapperWithInverse(exclude=None, i...    0.9636  0.9844  0.9636  0.9709  0.9631  0.9450  0.9494
25     Naive Bayes                      (TransformerWrapperWithInverse(exclude=None, i...    0.9445  0.9868  0.9445  0.9525  0.9438  0.9161  0.9207
26     Decision Tree Classifier         (TransformerWrapperWithInverse(exclude=None, i...    0.9264  0.9429  0.9264  0.9502  0.9201  0.8886  0.9040
27     SVM - Linear Kernel              (TransformerWrapperWithInverse(exclude=None, i...    0.8100  0.0000  0.8100  0.7831  0.7702  0.7125  0.7527
28     Ridge Classifier                 (TransformerWrapperWithInverse(exclude=None, i...    0.8318  0.0000  0.8318  0.8545  0.8281  0.7459  0.7595
29     Random Forest Classifier         (TransformerWrapperWithInverse(exclude=None, i...    0.9264  0.9903  0.9264  0.9343  0.9232  0.8886  0.8956
30     Quadratic Discriminant Analysis  (TransformerWrapperWithInverse(exclude=None, i...    0.9718  0.9974  0.9718  0.9780  0.9712  0.9573  0.9609
31     Ada Boost Classifier             (TransformerWrapperWithInverse(exclude=None, i...    0.9155  0.9843  0.9155  0.9401  0.9097  0.8720  0.8873
32     Gradient Boosting Classifier     (TransformerWrapperWithInverse(exclude=None, i...    0.9264  0.9688  0.9264  0.9343  0.9232  0.8886  0.8956
33     Linear Discriminant Analysis     (TransformerWrapperWithInverse(exclude=None, i...    0.9718  1.0000  0.9718  0.9780  0.9712  0.9573  0.9609
34     Extra Trees Classifier           (TransformerWrapperWithInverse(exclude=None, i...    0.9445  0.9935  0.9445  0.9586  0.9426  0.9161  0.9246
35     Extreme Gradient Boosting        (TransformerWrapperWithInverse(exclude=None, i...    0.9355  0.9868  0.9355  0.9440  0.9343  0.9023  0.9077
36     Light Gradient Boosting Machine  (TransformerWrapperWithInverse(exclude=None, i...    0.9536  0.9857  0.9536  0.9634  0.9528  0.9298  0.9356
37     CatBoost Classifier              (TransformerWrapperWithInverse(exclude=None, i...    0.9445  0.9922  0.9445  0.9586  0.9426  0.9161  0.9246
38     Dummy Classifier                 (TransformerWrapperWithInverse(exclude=None, i...    0.2864  0.5000  0.2864  0.0822  0.1277  0.0000  0.0000
39     Logistic Regression              (TransformerWrapperWithInverse(exclude=None, i...    0.9073  0.9751  0.9073  0.9159  0.9064  0.8597  0.8645
40     Logistic Regression              (TransformerWrapperWithInverse(exclude=None, i...    0.9238  0.9827  0.9238  0.9272  0.9234  0.8856  0.8877
41     Logistic Regression              (TransformerWrapperWithInverse(exclude=None, i...    0.8973  0.9684  0.8973  0.9108  0.8955  0.8445  0.8525
42     Logistic Regression              (TransformerWrapperWithInverse(exclude=None, i...    0.1076  0.0360  0.1076  0.1079  0.1077  0.1628  0.1628
43     Decision Tree Classifier         (TransformerWrapperWithInverse(exclude=None, i...    0.9264  0.9429  0.9264  0.9502  0.9201  0.8886  0.9040
44     Decision Tree Classifier         (TransformerWrapperWithInverse(exclude=None, i...    0.9445  0.9669  0.9445  0.9624  0.9395  0.9161  0.9276
45     Decision Tree Classifier         (TransformerWrapperWithInverse(exclude=None, i...    0.9264  0.9429  0.9264  0.9502  0.9201  0.8886  0.9040
46     Decision Tree Classifier         (TransformerWrapperWithInverse(exclude=None, i...    0.9445  0.9678  0.9445  0.9624  0.9395  0.9161  0.9276
47     Decision Tree Classifier         (TransformerWrapperWithInverse(exclude=None, i...    0.9264  0.9429  0.9264  0.9502  0.9201  0.8886  0.9040
48     Decision Tree Classifier         (TransformerWrapperWithInverse(exclude=None, i...    0.9445  0.9669  0.9445  0.9624  0.9395  0.9161  0.9276
49     Decision Tree Classifier         (TransformerWrapperWithInverse(exclude=None, i...    0.9264  0.9429  0.9264  0.9502  0.9201  0.8886  0.9040
50     Decision Tree Classifier         (TransformerWrapperWithInverse(exclude=None, i...    0.9355  0.9500  0.9355  0.9551  0.9303  0.9023  0.9149
51     Decision Tree Classifier         (TransformerWrapperWithInverse(exclude=None, i...    0.9264  0.9429  0.9264  0.9502  0.9201  0.8886  0.9040
52     Decision Tree Classifier         (TransformerWrapperWithInverse(exclude=None, i...    0.9355  0.9849  0.9355  0.9551  0.9303  0.9023  0.9149
53     Decision Tree Classifier         (TransformerWrapperWithInverse(exclude=None, i...    0.9264  0.9429  0.9264  0.9343  0.9232  0.8886  0.8956
54     Voting Classifier                (TransformerWrapperWithInverse(exclude=None, i...    0.9718  0.9974  0.9718  0.9780  0.9712  0.9573  0.9609
55     Stacking Classifier              (TransformerWrapperWithInverse(exclude=None, i...    0.9718  0.9974  0.9718  0.9780  0.9712  0.9573  0.9609
56     Light Gradient Boosting Machine  (TransformerWrapperWithInverse(exclude=None, i...    0.9536  0.9857  0.9536  0.9634  0.9528  0.9298  0.9356
\n", "" ], "text/plain": [ " Model Name \\\n", "Index \n", "0 Logistic Regression \n", "1 K Neighbors Classifier \n", "2 Naive Bayes \n", "3 Decision Tree Classifier \n", "4 SVM - Linear Kernel \n", "5 Ridge Classifier \n", "6 Random Forest Classifier \n", "7 Quadratic Discriminant Analysis \n", "8 Ada Boost Classifier \n", "9 Gradient Boosting Classifier \n", "10 Linear Discriminant Analysis \n", "11 Extra Trees Classifier \n", "12 Extreme Gradient Boosting \n", "13 Light Gradient Boosting Machine \n", "14 CatBoost Classifier \n", "15 Dummy Classifier \n", "16 Decision Tree Classifier \n", "17 Random Forest Classifier \n", "18 Extra Trees Classifier \n", "19 Gradient Boosting Classifier \n", "20 Extreme Gradient Boosting \n", "21 Light Gradient Boosting Machine \n", "22 CatBoost Classifier \n", "23 Logistic Regression \n", "24 K Neighbors Classifier \n", "25 Naive Bayes \n", "26 Decision Tree Classifier \n", "27 SVM - Linear Kernel \n", "28 Ridge Classifier \n", "29 Random Forest Classifier \n", "30 Quadratic Discriminant Analysis \n", "31 Ada Boost Classifier \n", "32 Gradient Boosting Classifier \n", "33 Linear Discriminant Analysis \n", "34 Extra Trees Classifier \n", "35 Extreme Gradient Boosting \n", "36 Light Gradient Boosting Machine \n", "37 CatBoost Classifier \n", "38 Dummy Classifier \n", "39 Logistic Regression \n", "40 Logistic Regression \n", "41 Logistic Regression \n", "42 Logistic Regression \n", "43 Decision Tree Classifier \n", "44 Decision Tree Classifier \n", "45 Decision Tree Classifier \n", "46 Decision Tree Classifier \n", "47 Decision Tree Classifier \n", "48 Decision Tree Classifier \n", "49 Decision Tree Classifier \n", "50 Decision Tree Classifier \n", "51 Decision Tree Classifier \n", "52 Decision Tree Classifier \n", "53 Decision Tree Classifier \n", "54 Voting Classifier \n", "55 Stacking Classifier \n", "56 Light Gradient Boosting Machine \n", "\n", " Model Accuracy AUC \\\n", "Index \n", "0 
(TransformerWrapperWithInverse(exclude=None, i... 0.9073 0.9751 \n", "1 (TransformerWrapperWithInverse(exclude=None, i... 0.9636 0.9844 \n", "2 (TransformerWrapperWithInverse(exclude=None, i... 0.9445 0.9868 \n", "3 (TransformerWrapperWithInverse(exclude=None, i... 0.9264 0.9429 \n", "4 (TransformerWrapperWithInverse(exclude=None, i... 0.8100 0.0000 \n", "5 (TransformerWrapperWithInverse(exclude=None, i... 0.8318 0.0000 \n", "6 (TransformerWrapperWithInverse(exclude=None, i... 0.9264 0.9903 \n", "7 (TransformerWrapperWithInverse(exclude=None, i... 0.9718 0.9974 \n", "8 (TransformerWrapperWithInverse(exclude=None, i... 0.9155 0.9843 \n", "9 (TransformerWrapperWithInverse(exclude=None, i... 0.9264 0.9688 \n", "10 (TransformerWrapperWithInverse(exclude=None, i... 0.9718 1.0000 \n", "11 (TransformerWrapperWithInverse(exclude=None, i... 0.9445 0.9935 \n", "12 (TransformerWrapperWithInverse(exclude=None, i... 0.9355 0.9868 \n", "13 (TransformerWrapperWithInverse(exclude=None, i... 0.9536 0.9857 \n", "14 (TransformerWrapperWithInverse(exclude=None, i... 0.9445 0.9922 \n", "15 (TransformerWrapperWithInverse(exclude=None, i... 0.2864 0.5000 \n", "16 (TransformerWrapperWithInverse(exclude=None, i... 0.9264 0.9429 \n", "17 (TransformerWrapperWithInverse(exclude=None, i... 0.9264 0.9903 \n", "18 (TransformerWrapperWithInverse(exclude=None, i... 0.9445 0.9935 \n", "19 (TransformerWrapperWithInverse(exclude=None, i... 0.9264 0.9688 \n", "20 (TransformerWrapperWithInverse(exclude=None, i... 0.9355 0.9868 \n", "21 (TransformerWrapperWithInverse(exclude=None, i... 0.9536 0.9857 \n", "22 (TransformerWrapperWithInverse(exclude=None, i... 0.9445 0.9922 \n", "23 (TransformerWrapperWithInverse(exclude=None, i... 0.9073 0.9751 \n", "24 (TransformerWrapperWithInverse(exclude=None, i... 0.9636 0.9844 \n", "25 (TransformerWrapperWithInverse(exclude=None, i... 0.9445 0.9868 \n", "26 (TransformerWrapperWithInverse(exclude=None, i... 
0.9264 0.9429 \n", "27 (TransformerWrapperWithInverse(exclude=None, i... 0.8100 0.0000 \n", "28 (TransformerWrapperWithInverse(exclude=None, i... 0.8318 0.0000 \n", "29 (TransformerWrapperWithInverse(exclude=None, i... 0.9264 0.9903 \n", "30 (TransformerWrapperWithInverse(exclude=None, i... 0.9718 0.9974 \n", "31 (TransformerWrapperWithInverse(exclude=None, i... 0.9155 0.9843 \n", "32 (TransformerWrapperWithInverse(exclude=None, i... 0.9264 0.9688 \n", "33 (TransformerWrapperWithInverse(exclude=None, i... 0.9718 1.0000 \n", "34 (TransformerWrapperWithInverse(exclude=None, i... 0.9445 0.9935 \n", "35 (TransformerWrapperWithInverse(exclude=None, i... 0.9355 0.9868 \n", "36 (TransformerWrapperWithInverse(exclude=None, i... 0.9536 0.9857 \n", "37 (TransformerWrapperWithInverse(exclude=None, i... 0.9445 0.9922 \n", "38 (TransformerWrapperWithInverse(exclude=None, i... 0.2864 0.5000 \n", "39 (TransformerWrapperWithInverse(exclude=None, i... 0.9073 0.9751 \n", "40 (TransformerWrapperWithInverse(exclude=None, i... 0.9238 0.9827 \n", "41 (TransformerWrapperWithInverse(exclude=None, i... 0.8973 0.9684 \n", "42 (TransformerWrapperWithInverse(exclude=None, i... 0.1076 0.0360 \n", "43 (TransformerWrapperWithInverse(exclude=None, i... 0.9264 0.9429 \n", "44 (TransformerWrapperWithInverse(exclude=None, i... 0.9445 0.9669 \n", "45 (TransformerWrapperWithInverse(exclude=None, i... 0.9264 0.9429 \n", "46 (TransformerWrapperWithInverse(exclude=None, i... 0.9445 0.9678 \n", "47 (TransformerWrapperWithInverse(exclude=None, i... 0.9264 0.9429 \n", "48 (TransformerWrapperWithInverse(exclude=None, i... 0.9445 0.9669 \n", "49 (TransformerWrapperWithInverse(exclude=None, i... 0.9264 0.9429 \n", "50 (TransformerWrapperWithInverse(exclude=None, i... 0.9355 0.9500 \n", "51 (TransformerWrapperWithInverse(exclude=None, i... 0.9264 0.9429 \n", "52 (TransformerWrapperWithInverse(exclude=None, i... 0.9355 0.9849 \n", "53 (TransformerWrapperWithInverse(exclude=None, i... 
0.9264 0.9429 \n", "54 (TransformerWrapperWithInverse(exclude=None, i... 0.9718 0.9974 \n", "55 (TransformerWrapperWithInverse(exclude=None, i... 0.9718 0.9974 \n", "56 (TransformerWrapperWithInverse(exclude=None, i... 0.9536 0.9857 \n", "\n", " Recall Prec. F1 Kappa MCC \n", "Index \n", "0 0.9073 0.9159 0.9064 0.8597 0.8645 \n", "1 0.9636 0.9709 0.9631 0.9450 0.9494 \n", "2 0.9445 0.9525 0.9438 0.9161 0.9207 \n", "3 0.9264 0.9502 0.9201 0.8886 0.9040 \n", "4 0.8100 0.7831 0.7702 0.7125 0.7527 \n", "5 0.8318 0.8545 0.8281 0.7459 0.7595 \n", "6 0.9264 0.9343 0.9232 0.8886 0.8956 \n", "7 0.9718 0.9780 0.9712 0.9573 0.9609 \n", "8 0.9155 0.9401 0.9097 0.8720 0.8873 \n", "9 0.9264 0.9343 0.9232 0.8886 0.8956 \n", "10 0.9718 0.9780 0.9712 0.9573 0.9609 \n", "11 0.9445 0.9586 0.9426 0.9161 0.9246 \n", "12 0.9355 0.9440 0.9343 0.9023 0.9077 \n", "13 0.9536 0.9634 0.9528 0.9298 0.9356 \n", "14 0.9445 0.9586 0.9426 0.9161 0.9246 \n", "15 0.2864 0.0822 0.1277 0.0000 0.0000 \n", "16 0.9264 0.9502 0.9201 0.8886 0.9040 \n", "17 0.9264 0.9343 0.9232 0.8886 0.8956 \n", "18 0.9445 0.9586 0.9426 0.9161 0.9246 \n", "19 0.9264 0.9343 0.9232 0.8886 0.8956 \n", "20 0.9355 0.9440 0.9343 0.9023 0.9077 \n", "21 0.9536 0.9634 0.9528 0.9298 0.9356 \n", "22 0.9445 0.9586 0.9426 0.9161 0.9246 \n", "23 0.9073 0.9159 0.9064 0.8597 0.8645 \n", "24 0.9636 0.9709 0.9631 0.9450 0.9494 \n", "25 0.9445 0.9525 0.9438 0.9161 0.9207 \n", "26 0.9264 0.9502 0.9201 0.8886 0.9040 \n", "27 0.8100 0.7831 0.7702 0.7125 0.7527 \n", "28 0.8318 0.8545 0.8281 0.7459 0.7595 \n", "29 0.9264 0.9343 0.9232 0.8886 0.8956 \n", "30 0.9718 0.9780 0.9712 0.9573 0.9609 \n", "31 0.9155 0.9401 0.9097 0.8720 0.8873 \n", "32 0.9264 0.9343 0.9232 0.8886 0.8956 \n", "33 0.9718 0.9780 0.9712 0.9573 0.9609 \n", "34 0.9445 0.9586 0.9426 0.9161 0.9246 \n", "35 0.9355 0.9440 0.9343 0.9023 0.9077 \n", "36 0.9536 0.9634 0.9528 0.9298 0.9356 \n", "37 0.9445 0.9586 0.9426 0.9161 0.9246 \n", "38 0.2864 0.0822 0.1277 0.0000 0.0000 \n", "39 
0.9073 0.9159 0.9064 0.8597 0.8645 \n", "40 0.9238 0.9272 0.9234 0.8856 0.8877 \n", "41 0.8973 0.9108 0.8955 0.8445 0.8525 \n", "42 0.1076 0.1079 0.1077 0.1628 0.1628 \n", "43 0.9264 0.9502 0.9201 0.8886 0.9040 \n", "44 0.9445 0.9624 0.9395 0.9161 0.9276 \n", "45 0.9264 0.9502 0.9201 0.8886 0.9040 \n", "46 0.9445 0.9624 0.9395 0.9161 0.9276 \n", "47 0.9264 0.9502 0.9201 0.8886 0.9040 \n", "48 0.9445 0.9624 0.9395 0.9161 0.9276 \n", "49 0.9264 0.9502 0.9201 0.8886 0.9040 \n", "50 0.9355 0.9551 0.9303 0.9023 0.9149 \n", "51 0.9264 0.9502 0.9201 0.8886 0.9040 \n", "52 0.9355 0.9551 0.9303 0.9023 0.9149 \n", "53 0.9264 0.9343 0.9232 0.8886 0.8956 \n", "54 0.9718 0.9780 0.9712 0.9573 0.9609 \n", "55 0.9718 0.9780 0.9712 0.9573 0.9609 \n", "56 0.9536 0.9634 0.9528 0.9298 0.9356 " ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# get leaderboard\n", "lb = get_leaderboard()\n", "lb" ] }, { "cell_type": "code", "execution_count": 73, "id": "f8a8b060", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Pipeline(memory=FastMemory(location=C:\\Users\\owner\\AppData\\Local\\Temp\\joblib),\n",
       "         steps=[('label_encoding',\n",
       "                 TransformerWrapperWithInverse(exclude=None, include=None,\n",
       "                                               transformer=LabelEncoder())),\n",
       "                ('numerical_imputer',\n",
       "                 TransformerWrapper(exclude=None,\n",
       "                                    include=['sepal_length', 'sepal_width',\n",
       "                                             'petal_length', 'petal_width'],\n",
       "                                    transformer=SimpleImputer(add_indicator=F...\n",
       "                                    transformer=SimpleImputer(add_indicator=False,\n",
       "                                                              copy=True,\n",
       "                                                              fill_value=None,\n",
       "                                                              missing_values=nan,\n",
       "                                                              strategy='most_frequent',\n",
       "                                                              verbose='deprecated'))),\n",
       "                ('normalize',\n",
       "                 TransformerWrapper(exclude=None, include=None,\n",
       "                                    transformer=MinMaxScaler(clip=False,\n",
       "                                                             copy=True,\n",
       "                                                             feature_range=(0,\n",
       "                                                                            1)))),\n",
       "                ['trained_model',\n",
       "                 QuadraticDiscriminantAnalysis(priors=None, reg_param=0.0,\n",
       "                                               store_covariance=False,\n",
       "                                               tol=0.0001)]],\n",
       "         verbose=False)
" ], "text/plain": [ "Pipeline(memory=FastMemory(location=C:\\Users\\owner\\AppData\\Local\\Temp\\joblib),\n", " steps=[('label_encoding',\n", " TransformerWrapperWithInverse(exclude=None, include=None,\n", " transformer=LabelEncoder())),\n", " ('numerical_imputer',\n", " TransformerWrapper(exclude=None,\n", " include=['sepal_length', 'sepal_width',\n", " 'petal_length', 'petal_width'],\n", " transformer=SimpleImputer(add_indicator=F...\n", " transformer=SimpleImputer(add_indicator=False,\n", " copy=True,\n", " fill_value=None,\n", " missing_values=nan,\n", " strategy='most_frequent',\n", " verbose='deprecated'))),\n", " ('normalize',\n", " TransformerWrapper(exclude=None, include=None,\n", " transformer=MinMaxScaler(clip=False,\n", " copy=True,\n", " feature_range=(0,\n", " 1)))),\n", " ['trained_model',\n", " QuadraticDiscriminantAnalysis(priors=None, reg_param=0.0,\n", " store_covariance=False,\n", " tol=0.0001)]],\n", " verbose=False)" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# select the best model based on F1\n", "lb.sort_values(by='F1', ascending=False)['Model'].iloc[0]" ] }, { "attachments": {}, "cell_type": "markdown", "id": "9ecf0bfa", "metadata": {}, "source": [ "Some other parameters that you might find very useful in `get_leaderboard` are:\n", "\n", "- finalize_models\n", "- fit_kwargs\n", "- model_only\n", "- groups\n", "\n", "You can check the docstring of the function for more info." ] }, { "cell_type": "code", "execution_count": 74, "id": "dc76f0a5", "metadata": {}, "outputs": [], "source": [ "# help(get_leaderboard)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "94669c72", "metadata": {}, "source": [ "## ✅ AutoML\n", "This function returns the best model out of all trained models in the current setup based on the optimize parameter. Metrics evaluated can be accessed using the `get_metrics` function." 
] }, { "cell_type": "code", "execution_count": 75, "id": "01532054", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
QuadraticDiscriminantAnalysis(priors=None, reg_param=0.0,\n",
       "                              store_covariance=False, tol=0.0001)
" ], "text/plain": [ "QuadraticDiscriminantAnalysis(priors=None, reg_param=0.0,\n", " store_covariance=False, tol=0.0001)" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "automl()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "726b2986", "metadata": {}, "source": [ "## ✅ Dashboard\n", "The dashboard function generates an interactive dashboard for a trained model. The dashboard is implemented using `ExplainerDashboard`. For more information, check out [Explainer Dashboard](https://explainerdashboard.readthedocs.io)." ] }, { "cell_type": "code", "execution_count": 76, "id": "ca75507d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Note: model_output=='probability', so assuming that raw shap output of DecisionTreeClassifier is in probability space...\n", "Generating self.shap_explainer = shap.TreeExplainer(model)\n", "Building ExplainerDashboard..\n", "The explainer object has no decision_trees property. so setting decision_trees=False...\n", "Warning: calculating shap interaction values can be slow! Pass shap_interaction=False to remove interactions tab.\n", "Generating layout...\n", "Calculating shap values...\n", "Calculating prediction probabilities...\n", "Calculating metrics...\n", "Calculating confusion matrices...\n", "Calculating classification_dfs...\n", "Calculating roc auc curves...\n", "Calculating pr auc curves...\n", "Calculating liftcurve_dfs...\n", "Calculating shap interaction values... (this may take a while)\n", "Reminder: TreeShap computational complexity is O(TLD^2), where T is the number of trees, L is the maximum number of leaves in any tree and D the maximal depth of any tree. 
So reducing these will speed up the calculation.\n", "Calculating dependencies...\n", "Calculating permutation importances (if slow, try setting n_jobs parameter)...\n", "Calculating pred_percentiles...\n", "Calculating predictions...\n", "Reminder: you can store the explainer (including calculated dependencies) with explainer.dump('explainer.joblib') and reload with e.g. ClassifierExplainer.from_file('explainer.joblib')\n", "Registering callbacks...\n", "Starting ExplainerDashboard inline (terminate it with ExplainerDashboard.terminate(8050))\n" ] }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# dashboard function\n", "dashboard(dt, display_format='inline')" ] }, { "attachments": {}, "cell_type": "markdown", "id": "58fd3e5a", "metadata": {}, "source": [ "## ✅ Create App\n", "This function creates a basic Gradio app for inference." ] }, { "cell_type": "code", "execution_count": 79, "id": "5cf989d3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Running on local URL: http://127.0.0.1:7860\n", "\n", "To create a public link, set `share=True` in `launch()`.\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create gradio app\n", "create_app(best)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "a2d8e21d", "metadata": {}, "source": [ "## ✅ Create API\n", "This function takes an input model and creates a POST API for inference." ] }, { "cell_type": "code", "execution_count": 80, "id": "978413c9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "API successfully created. This function only creates a POST API, it doesn't run it automatically. To run your API, please run this command --> !python my_first_api.py\n" ] } ], "source": [ "# create api\n", "create_api(best, api_name = 'my_first_api')" ] }, { "cell_type": "code", "execution_count": 81, "id": "68e539aa", "metadata": {}, "outputs": [], "source": [ "# !python my_first_api.py" ] }, { "cell_type": "code", "execution_count": 82, "id": "a3de3327", "metadata": {}, "outputs": [], "source": [ "# check out the .py file created with this magic command\n", "# %load my_first_api.py" ] }, { "attachments": {}, "cell_type": "markdown", "id": "1023f7df", "metadata": {}, "source": [ "## ✅ Create Docker\n", "This function creates a `Dockerfile` and `requirements.txt` for productionalizing API end-point." 
] }, { "cell_type": "code", "execution_count": 83, "id": "452ced14", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Writing requirements.txt\n", "Writing Dockerfile\n", "Dockerfile and requirements.txt successfully created.\n", " To build image you have to run --> !docker image build -f \"Dockerfile\" -t IMAGE_NAME:IMAGE_TAG .\n", " \n" ] } ], "source": [ "create_docker('my_first_api')" ] }, { "cell_type": "code", "execution_count": 84, "id": "301e1fa5", "metadata": {}, "outputs": [], "source": [ "# check out the Dockerfile created with this magic command\n", "# %load Dockerfile" ] }, { "cell_type": "code", "execution_count": 85, "id": "ca1e9ef7", "metadata": {}, "outputs": [], "source": [ "# check out the requirements file created with this magic command\n", "# %load requirements.txt" ] }, { "attachments": {}, "cell_type": "markdown", "id": "e27c212b", "metadata": {}, "source": [ "## ✅ Finalize Model\n", "This function trains a given model on the entire dataset, including the hold-out set." ] }, { "cell_type": "code", "execution_count": 86, "id": "65225684", "metadata": {}, "outputs": [], "source": [ "final_best = finalize_model(best)" ] }, { "cell_type": "code", "execution_count": 87, "id": "80d17fec", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Pipeline(memory=FastMemory(location=C:\\Users\\owner\\AppData\\Local\\Temp\\joblib),\n",
       "         steps=[('label_encoding',\n",
       "                 TransformerWrapperWithInverse(exclude=None, include=None,\n",
       "                                               transformer=LabelEncoder())),\n",
       "                ('numerical_imputer',\n",
       "                 TransformerWrapper(exclude=None,\n",
       "                                    include=['sepal_length', 'sepal_width',\n",
       "                                             'petal_length', 'petal_width'],\n",
       "                                    transformer=SimpleImputer(add_indicator=F...\n",
       "                                    transformer=SimpleImputer(add_indicator=False,\n",
       "                                                              copy=True,\n",
       "                                                              fill_value=None,\n",
       "                                                              missing_values=nan,\n",
       "                                                              strategy='most_frequent',\n",
       "                                                              verbose='deprecated'))),\n",
       "                ('normalize',\n",
       "                 TransformerWrapper(exclude=None, include=None,\n",
       "                                    transformer=MinMaxScaler(clip=False,\n",
       "                                                             copy=True,\n",
       "                                                             feature_range=(0,\n",
       "                                                                            1)))),\n",
       "                ('actual_estimator',\n",
       "                 QuadraticDiscriminantAnalysis(priors=None, reg_param=0.0,\n",
       "                                               store_covariance=False,\n",
       "                                               tol=0.0001))],\n",
       "         verbose=False)
" ], "text/plain": [ "Pipeline(memory=FastMemory(location=C:\\Users\\owner\\AppData\\Local\\Temp\\joblib),\n", " steps=[('label_encoding',\n", " TransformerWrapperWithInverse(exclude=None, include=None,\n", " transformer=LabelEncoder())),\n", " ('numerical_imputer',\n", " TransformerWrapper(exclude=None,\n", " include=['sepal_length', 'sepal_width',\n", " 'petal_length', 'petal_width'],\n", " transformer=SimpleImputer(add_indicator=F...\n", " transformer=SimpleImputer(add_indicator=False,\n", " copy=True,\n", " fill_value=None,\n", " missing_values=nan,\n", " strategy='most_frequent',\n", " verbose='deprecated'))),\n", " ('normalize',\n", " TransformerWrapper(exclude=None, include=None,\n", " transformer=MinMaxScaler(clip=False,\n", " copy=True,\n", " feature_range=(0,\n", " 1)))),\n", " ('actual_estimator',\n", " QuadraticDiscriminantAnalysis(priors=None, reg_param=0.0,\n", " store_covariance=False,\n", " tol=0.0001))],\n", " verbose=False)" ] }, "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [ "final_best" ] }, { "attachments": {}, "cell_type": "markdown", "id": "b4693f88", "metadata": {}, "source": [ "## ✅ Convert Model\n", "This function transpiles the trained machine learning model's decision function in different programming languages such as Python, C, Java, Go, C#, etc. It is very useful if you want to deploy models into environments where you can't install your normal Python stack to support model inference." 
] }, { "cell_type": "code", "execution_count": 88, "id": "dbe0e9fe", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "public class Model {\n", " public static double[] score(double[] input) {\n", " double[] var0;\n", " if (input[2] <= 0.23275858908891678) {\n", " var0 = new double[] {1.0, 0.0, 0.0};\n", " } else {\n", " if (input[2] <= 0.62931028008461) {\n", " if (input[0] <= 0.180555522441864) {\n", " var0 = new double[] {0.0, 0.0, 1.0};\n", " } else {\n", " var0 = new double[] {0.0, 1.0, 0.0};\n", " }\n", " } else {\n", " if (input[2] <= 0.6637930274009705) {\n", " if (input[1] <= 0.3958333134651184) {\n", " var0 = new double[] {0.0, 0.0, 1.0};\n", " } else {\n", " var0 = new double[] {0.0, 1.0, 0.0};\n", " }\n", " } else {\n", " if (input[3] <= 0.6666666269302368) {\n", " if (input[3] <= 0.6041666567325592) {\n", " var0 = new double[] {0.0, 0.0, 1.0};\n", " } else {\n", " if (input[0] <= 0.6388888359069824) {\n", " var0 = new double[] {0.0, 1.0, 0.0};\n", " } else {\n", " var0 = new double[] {0.0, 0.0, 1.0};\n", " }\n", " }\n", " } else {\n", " var0 = new double[] {0.0, 0.0, 1.0};\n", " }\n", " }\n", " }\n", " }\n", " return var0;\n", " }\n", "}\n", "\n" ] } ], "source": [ "# transpiles learned function to Java\n", "print(convert_model(dt, language = 'java'))" ] }, { "attachments": {}, "cell_type": "markdown", "id": "ed00202c", "metadata": {}, "source": [ "## ✅ Deploy Model\n", "This function deploys the entire ML pipeline on the cloud.\n", "\n", "**AWS:** When deploying a model on AWS S3, environment variables must be configured using the command-line interface. To configure AWS environment variables, type `aws configure` in the terminal. 
The following information is required and can be generated using the Identity and Access Management (IAM) portal of your Amazon console account:\n", "\n", "- AWS Access Key ID\n", "- AWS Secret Access Key\n", "- Default Region Name (can be seen under Global settings on your AWS console)\n", "- Default output format (must be left blank)\n", "\n", "**GCP:** To deploy a model on Google Cloud Platform ('gcp'), the project must be created using the command line or the GCP console. Once the project is created, you must create a service account and download the service account key as a JSON file to set environment variables in your local environment. Learn more about it: https://cloud.google.com/docs/authentication/production\n", "\n", "**Azure:** To deploy a model on Microsoft Azure ('azure'), environment variables for the connection string must be set in your local environment. Go to the settings of your storage account on the Azure portal to access the required connection string.\n", "AZURE_STORAGE_CONNECTION_STRING (required as environment variable)\n", "Learn more about it: https://docs.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python?toc=%2Fpython%2Fazure%2FTOC.json" ] }, { "cell_type": "code", "execution_count": 89, "id": "40b20a18", "metadata": {}, "outputs": [], "source": [ "# deploy model on AWS S3\n", "# deploy_model(best, model_name = 'my_first_platform_on_aws',\n", "# platform = 'aws', authentication = {'bucket' : 'pycaret-test'})" ] }, { "cell_type": "code", "execution_count": 90, "id": "9e236516", "metadata": {}, "outputs": [], "source": [ "# load model from AWS S3\n", "# loaded_from_aws = load_model(model_name = 'my_first_platform_on_aws', platform = 'aws',\n", "# authentication = {'bucket' : 'pycaret-test'})\n", "\n", "# loaded_from_aws" ] }, { "attachments": {}, "cell_type": "markdown", "id": "e169ae86", "metadata": {}, "source": [ "## ✅ Save / Load Model\n", "This function saves the transformation pipeline and a trained model object into the 
current working directory as a pickle file for later use." ] }, { "cell_type": "code", "execution_count": 91, "id": "bc5cf24a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Transformation Pipeline and Model Successfully Saved\n" ] }, { "data": { "text/plain": [ "(Pipeline(memory=FastMemory(location=C:\\Users\\owner\\AppData\\Local\\Temp\\joblib),\n", " steps=[('label_encoding',\n", " TransformerWrapperWithInverse(exclude=None, include=None,\n", " transformer=LabelEncoder())),\n", " ('numerical_imputer',\n", " TransformerWrapper(exclude=None,\n", " include=['sepal_length', 'sepal_width',\n", " 'petal_length', 'petal_width'],\n", " transformer=SimpleImputer(add_indicator=F...\n", " transformer=SimpleImputer(add_indicator=False,\n", " copy=True,\n", " fill_value=None,\n", " missing_values=nan,\n", " strategy='most_frequent',\n", " verbose='deprecated'))),\n", " ('normalize',\n", " TransformerWrapper(exclude=None, include=None,\n", " transformer=MinMaxScaler(clip=False,\n", " copy=True,\n", " feature_range=(0,\n", " 1)))),\n", " ('trained_model',\n", " QuadraticDiscriminantAnalysis(priors=None, reg_param=0.0,\n", " store_covariance=False,\n", " tol=0.0001))],\n", " verbose=False),\n", " 'my_first_model.pkl')" ] }, "execution_count": 91, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# save model\n", "save_model(best, 'my_first_model')" ] }, { "cell_type": "code", "execution_count": 92, "id": "e8478d34", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Transformation Pipeline and Model Successfully Loaded\n" ] }, { "data": { "text/html": [ "
Pipeline(memory=FastMemory(location=C:\\Users\\owner\\AppData\\Local\\Temp\\joblib),\n",
       "         steps=[('label_encoding',\n",
       "                 TransformerWrapperWithInverse(exclude=None, include=None,\n",
       "                                               transformer=LabelEncoder())),\n",
       "                ('numerical_imputer',\n",
       "                 TransformerWrapper(exclude=None,\n",
       "                                    include=['sepal_length', 'sepal_width',\n",
       "                                             'petal_length', 'petal_width'],\n",
       "                                    transformer=SimpleImputer(add_indicator=F...\n",
       "                                    transformer=SimpleImputer(add_indicator=False,\n",
       "                                                              copy=True,\n",
       "                                                              fill_value=None,\n",
       "                                                              missing_values=nan,\n",
       "                                                              strategy='most_frequent',\n",
       "                                                              verbose='deprecated'))),\n",
       "                ('normalize',\n",
       "                 TransformerWrapper(exclude=None, include=None,\n",
       "                                    transformer=MinMaxScaler(clip=False,\n",
       "                                                             copy=True,\n",
       "                                                             feature_range=(0,\n",
       "                                                                            1)))),\n",
       "                ('trained_model',\n",
       "                 QuadraticDiscriminantAnalysis(priors=None, reg_param=0.0,\n",
       "                                               store_covariance=False,\n",
       "                                               tol=0.0001))],\n",
       "         verbose=False)
" ], "text/plain": [ "Pipeline(memory=FastMemory(location=C:\\Users\\owner\\AppData\\Local\\Temp\\joblib),\n", " steps=[('label_encoding',\n", " TransformerWrapperWithInverse(exclude=None, include=None,\n", " transformer=LabelEncoder())),\n", " ('numerical_imputer',\n", " TransformerWrapper(exclude=None,\n", " include=['sepal_length', 'sepal_width',\n", " 'petal_length', 'petal_width'],\n", " transformer=SimpleImputer(add_indicator=F...\n", " transformer=SimpleImputer(add_indicator=False,\n", " copy=True,\n", " fill_value=None,\n", " missing_values=nan,\n", " strategy='most_frequent',\n", " verbose='deprecated'))),\n", " ('normalize',\n", " TransformerWrapper(exclude=None, include=None,\n", " transformer=MinMaxScaler(clip=False,\n", " copy=True,\n", " feature_range=(0,\n", " 1)))),\n", " ('trained_model',\n", " QuadraticDiscriminantAnalysis(priors=None, reg_param=0.0,\n", " store_covariance=False,\n", " tol=0.0001))],\n", " verbose=False)" ] }, "execution_count": 92, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# load model\n", "loaded_from_disk = load_model('my_first_model')\n", "loaded_from_disk" ] }, { "attachments": {}, "cell_type": "markdown", "id": "de5eee8c", "metadata": {}, "source": [ "## ✅ Save / Load Experiment\n", "This function saves all the experiment variables on disk, allowing to later resume without rerunning the setup function." 
] }, { "cell_type": "code", "execution_count": 93, "id": "6a3c61b6", "metadata": {}, "outputs": [], "source": [ "# save experiment\n", "save_experiment('my_experiment')" ] }, { "cell_type": "code", "execution_count": 94, "id": "83252c09", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
|  | Description | Value |
|---|---|---|
| 0 | Session id | 123 |
| 1 | Target | species |
| 2 | Target type | Multiclass |
| 3 | Target mapping | Iris-setosa: 0, Iris-versicolor: 1, Iris-virginica: 2 |
| 4 | Original data shape | (150, 5) |
| 5 | Transformed data shape | (150, 5) |
| 6 | Transformed train set shape | (105, 5) |
| 7 | Transformed test set shape | (45, 5) |
| 8 | Numeric features | 4 |
| 9 | Preprocess | True |
| 10 | Imputation type | simple |
| 11 | Numeric imputation | mean |
| 12 | Categorical imputation | mode |
| 13 | Normalize | True |
| 14 | Normalize method | minmax |
| 15 | Fold Generator | StratifiedKFold |
| 16 | Fold Number | 10 |
| 17 | CPU Jobs | -1 |
| 18 | Use GPU | False |
| 19 | Log Experiment | False |
| 20 | Experiment Name | clf-default-name |
| 21 | USI | 9a69 |
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# load experiment from disk\n", "exp_from_disk = load_experiment('my_experiment', data=data)" ] } ], "metadata": { "kernelspec": { "display_name": "clean", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 5 }