{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# 👉 What is PyCaret?\n", "\n", "PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that shortens the experiment cycle considerably and makes you more productive.\n", "\n", "Compared with other open-source machine learning libraries, PyCaret is a low-code alternative that can replace hundreds of lines of code with just a few. This makes experiments much faster and more efficient. PyCaret is essentially a Python wrapper around several machine learning libraries and frameworks such as scikit-learn, XGBoost, LightGBM, CatBoost, spaCy, Optuna, Hyperopt, Ray, and many more.\n", "\n", "The design and simplicity of PyCaret are inspired by the emerging role of citizen data scientists, a term first used by Gartner. Citizen data scientists are power users who can perform both simple and moderately sophisticated analytical tasks that would previously have required more expertise. Seasoned data scientists are often difficult to find and expensive to hire, but citizen data scientists can be an effective way to close this gap and address data-related challenges in a business setting.\n", "\n", "Official Website: https://www.pycaret.org\n", "Documentation: https://pycaret.readthedocs.io/en/latest/" ] }, { "attachments": { "image.png": { "image/png": "" } }, "cell_type": "markdown", "metadata": {}, "source": [ "![image.png](attachment:image.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 👉 Install PyCaret\n", "Installing PyCaret is very easy and takes only a few minutes. We strongly recommend using a virtual environment to avoid potential conflicts with other libraries.
PyCaret's default installation is a slim version of pycaret that installs only the hard dependencies listed in [requirements.txt](https://github.com/pycaret/pycaret/blob/master/requirements.txt). To install the default version:\n", "\n", "- `pip install pycaret`\n", "\n", "When you install the full version of pycaret, all the optional dependencies listed [here](https://github.com/pycaret/pycaret/blob/master/requirements-optional.txt) are also installed. To install the full version:\n", "\n", "- `pip install pycaret[full]`" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'2.3.4'" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# check version\n", "from pycaret.utils import version\n", "version()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 👉 Dataset" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   age     sex     bmi  children  smoker     region      charges
0   19  female  27.900         0     yes  southwest  16884.92400
1   18    male  33.770         1      no  southeast   1725.55230
2   28    male  33.000         3      no  southeast   4449.46200
3   33    male  22.705         0      no  northwest  21984.47061
4   32    male  28.880         0      no  northwest   3866.85520
\n", "
" ], "text/plain": [ " age sex bmi children smoker region charges\n", "0 19 female 27.900 0 yes southwest 16884.92400\n", "1 18 male 33.770 1 no southeast 1725.55230\n", "2 28 male 33.000 3 no southeast 4449.46200\n", "3 33 male 22.705 0 no northwest 21984.47061\n", "4 32 male 28.880 0 no northwest 3866.85520" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from pycaret.datasets import get_data\n", "data = get_data('insurance')" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1338, 7)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 👉 Data Preparation" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
    Description                             Value
0   session_id                              123
1   Target                                  charges
2   Original Data                           (1338, 7)
3   Missing Values                          False
4   Numeric Features                        2
5   Categorical Features                    4
6   Ordinal Features                        False
7   High Cardinality Features               False
8   High Cardinality Method                 None
9   Transformed Train Set                   (936, 14)
10  Transformed Test Set                    (402, 14)
11  Shuffle Train-Test                      True
12  Stratify Train-Test                     False
13  Fold Generator                          KFold
14  Fold Number                             10
15  CPU Jobs                                -1
16  Use GPU                                 False
17  Log Experiment                          False
18  Experiment Name                         reg-default-name
19  USI                                     3ff5
20  Imputation Type                         simple
21  Iterative Imputation Iteration          None
22  Numeric Imputer                         mean
23  Iterative Imputation Numeric Model      None
24  Categorical Imputer                     constant
25  Iterative Imputation Categorical Model  None
26  Unknown Categoricals Handling           least_frequent
27  Normalize                               False
28  Normalize Method                        None
29  Transformation                          False
30  Transformation Method                   None
31  PCA                                     False
32  PCA Method                              None
33  PCA Components                          None
34  Ignore Low Variance                     False
35  Combine Rare Levels                     False
36  Rare Level Threshold                    None
37  Numeric Binning                         False
38  Remove Outliers                         False
39  Outliers Threshold                      None
40  Remove Multicollinearity                False
41  Multicollinearity Threshold             None
42  Remove Perfect Collinearity             True
43  Clustering                              False
44  Clustering Iteration                    None
45  Polynomial Features                     False
46  Polynomial Degree                       None
47  Trignometry Features                    False
48  Polynomial Threshold                    None
49  Group Features                          False
50  Feature Selection                       False
51  Feature Selection Method                classic
52  Features Selection Threshold            None
53  Feature Interaction                     False
54  Feature Ratio                           False
55  Interaction Threshold                   None
56  Transform Target                        False
57  Transform Target Method                 box-cox
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from pycaret.regression import *\n", "s = setup(data, target = 'charges', session_id = 123)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
       age        bmi  sex_female  children_0  children_1  children_2  children_3  children_4  children_5  smoker_yes  region_northeast  region_northwest  region_southeast  region_southwest
300   36.0  27.549999         0.0         0.0         0.0         0.0         1.0         0.0         0.0         0.0               1.0               0.0               0.0               0.0
904   60.0  35.099998         1.0         1.0         0.0         0.0         0.0         0.0         0.0         0.0               0.0               0.0               0.0               1.0
670   30.0  31.570000         0.0         0.0         0.0         0.0         1.0         0.0         0.0         0.0               0.0               0.0               1.0               0.0
617   49.0  25.600000         0.0         0.0         0.0         1.0         0.0         0.0         0.0         1.0               0.0               0.0               0.0               1.0
373   26.0  32.900002         0.0         0.0         0.0         1.0         0.0         0.0         0.0         1.0               0.0               0.0               0.0               1.0
...    ...        ...         ...         ...         ...         ...         ...         ...         ...         ...               ...               ...               ...               ...
1238  37.0  22.705000         0.0         0.0         0.0         0.0         1.0         0.0         0.0         0.0               1.0               0.0               0.0               0.0
1147  20.0  31.920000         1.0         1.0         0.0         0.0         0.0         0.0         0.0         0.0               0.0               1.0               0.0               0.0
106   19.0  28.400000         1.0         0.0         1.0         0.0         0.0         0.0         0.0         0.0               0.0               0.0               0.0               1.0
1041  18.0  23.084999         0.0         1.0         0.0         0.0         0.0         0.0         0.0         0.0               1.0               0.0               0.0               0.0
1122  53.0  36.860001         1.0         0.0         0.0         0.0         1.0         0.0         0.0         1.0               0.0               1.0               0.0               0.0
\n", "

936 rows × 14 columns

\n", "
" ], "text/plain": [ " age bmi sex_female children_0 children_1 children_2 \\\n", "300 36.0 27.549999 0.0 0.0 0.0 0.0 \n", "904 60.0 35.099998 1.0 1.0 0.0 0.0 \n", "670 30.0 31.570000 0.0 0.0 0.0 0.0 \n", "617 49.0 25.600000 0.0 0.0 0.0 1.0 \n", "373 26.0 32.900002 0.0 0.0 0.0 1.0 \n", "... ... ... ... ... ... ... \n", "1238 37.0 22.705000 0.0 0.0 0.0 0.0 \n", "1147 20.0 31.920000 1.0 1.0 0.0 0.0 \n", "106 19.0 28.400000 1.0 0.0 1.0 0.0 \n", "1041 18.0 23.084999 0.0 1.0 0.0 0.0 \n", "1122 53.0 36.860001 1.0 0.0 0.0 0.0 \n", "\n", " children_3 children_4 children_5 smoker_yes region_northeast \\\n", "300 1.0 0.0 0.0 0.0 1.0 \n", "904 0.0 0.0 0.0 0.0 0.0 \n", "670 1.0 0.0 0.0 0.0 0.0 \n", "617 0.0 0.0 0.0 1.0 0.0 \n", "373 0.0 0.0 0.0 1.0 0.0 \n", "... ... ... ... ... ... \n", "1238 1.0 0.0 0.0 0.0 1.0 \n", "1147 0.0 0.0 0.0 0.0 0.0 \n", "106 0.0 0.0 0.0 0.0 0.0 \n", "1041 0.0 0.0 0.0 0.0 1.0 \n", "1122 1.0 0.0 0.0 1.0 0.0 \n", "\n", " region_northwest region_southeast region_southwest \n", "300 0.0 0.0 0.0 \n", "904 0.0 0.0 1.0 \n", "670 0.0 1.0 0.0 \n", "617 0.0 0.0 1.0 \n", "373 0.0 0.0 1.0 \n", "... ... ... ... 
\n", "1238 0.0 0.0 0.0 \n", "1147 1.0 0.0 0.0 \n", "106 0.0 0.0 1.0 \n", "1041 0.0 0.0 0.0 \n", "1122 1.0 0.0 0.0 \n", "\n", "[936 rows x 14 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# check transformed X_train\n", "get_config('X_train')" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['age', 'bmi', 'sex_female', 'children_0', 'children_1', 'children_2',\n", " 'children_3', 'children_4', 'children_5', 'smoker_yes',\n", " 'region_northeast', 'region_northwest', 'region_southeast',\n", " 'region_southwest'],\n", " dtype='object')" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# list columns of transformed X_train \n", "get_config('X_train').columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 👉Model Training & Selection" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compare Models" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
          Model                                  MAE             MSE        RMSE      R2   RMSLE    MAPE  TT (Sec)
gbr       Gradient Boosting Regressor      2702.7680   23242056.4409   4801.5704  0.8348  0.4397  0.3113    0.0270
rf        Random Forest Regressor          2736.7455   24862762.2305   4970.6959  0.8213  0.4674  0.3294    0.1110
catboost  CatBoost Regressor               2865.4462   25334554.7726   5017.8574  0.8193  0.4774  0.3419    0.4420
lightgbm  Light Gradient Boosting Machine  2959.5584   25236477.0456   5013.0892  0.8171  0.5427  0.3685    0.0930
ada       AdaBoost Regressor               4162.2323   28328260.0955   5316.6146  0.7985  0.6349  0.7263    0.0090
et        Extra Trees Regressor            2814.2964   28815493.0260   5339.0879  0.7964  0.4889  0.3350    0.0990
xgboost   Extreme Gradient Boosting        3302.3215   31739266.6000   5615.5941  0.7701  0.5661  0.4218    0.1400
llar      Lasso Least Angle Regression     4315.7895   38355976.5080   6173.8740  0.7311  0.6105  0.4415    0.0050
ridge     Ridge Regression                 4336.2309   38381496.8000   6175.9541  0.7309  0.6193  0.4454    0.0050
lr        Linear Regression                4323.6136   38380061.2000   6175.7164  0.7308  0.6175  0.4432    0.5930
br        Bayesian Ridge                   4333.6881   38381669.3629   6175.9476  0.7308  0.6151  0.4450    0.0060
lasso     Lasso Regression                 4323.0688   38375137.8000   6175.3801  0.7308  0.6140  0.4431    0.0060
lar       Least Angle Regression           4450.2675   39682987.5896   6267.8924  0.7232  0.6467  0.4693    0.0060
dt        Decision Tree Regressor          3148.3402   43766011.6491   6584.7198  0.6855  0.5331  0.3455    0.0070
huber     Huber Regressor                  3455.2997   48908984.4059   6971.2642  0.6545  0.4790  0.2174    0.0150
omp       Orthogonal Matching Pursuit      5754.7768   57503216.4290   7566.7093  0.5997  0.7418  0.8990    0.0060
par       Passive Aggressive Regressor     4164.7843   61324373.4835   7747.8332  0.5840  0.4724  0.2586    0.0070
en        Elastic Net                      7369.0573   90443346.8000   9468.6782  0.3791  0.7377  0.9256    0.0050
knn       K Neighbors Regressor            7805.8425  126951808.0000  11221.6535  0.1218  0.8398  0.9147    0.0070
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# train all models using default hyperparameters\n", "best = compare_models()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "GradientBoostingRegressor(alpha=0.9, ccp_alpha=0.0, criterion='friedman_mse',\n", " init=None, learning_rate=0.1, loss='ls', max_depth=3,\n", " max_features=None, max_leaf_nodes=None,\n", " min_impurity_decrease=0.0, min_impurity_split=None,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, n_estimators=100,\n", " n_iter_no_change=None, presort='deprecated',\n", " random_state=123, subsample=1.0, tol=0.0001,\n", " validation_fraction=0.1, verbose=0, warm_start=False)\n" ] } ], "source": [ "print(best)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "sklearn.ensemble._gb.GradientBoostingRegressor" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(best)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create Model" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
            MAE            MSE       RMSE      R2   RMSLE    MAPE
0     3001.2294  37001480.2590  6082.8842  0.7790  0.4984  0.3140
1     3389.8885  49305179.5732  7021.7647  0.7133  0.5574  0.3361
2     2926.0191  42025684.6666  6482.7220  0.4679  0.6215  0.4025
3     2744.7144  34078761.4507  5837.7017  0.7154  0.5412  0.3740
4     3924.4816  59489464.3207  7712.9414  0.5575  0.6455  0.4796
5     3322.5435  42747575.4453  6538.1630  0.7250  0.4869  0.2928
6     3158.7047  49369669.1652  7026.3553  0.6641  0.4511  0.3089
7     2405.2970  31318616.6440  5596.3038  0.8278  0.4497  0.1434
8     3021.5461  39091793.3775  6252.3430  0.7475  0.5117  0.4381
9     3588.9772  53231891.5889  7296.0189  0.6571  0.5679  0.3653
Mean  3148.3402  43766011.6491  6584.7198  0.6855  0.5331  0.3455
SD     410.7953   8481549.4829   638.3390  0.1005  0.0631  0.0878
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# train individual model\n", "dt = create_model('dt')" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DecisionTreeRegressor(ccp_alpha=0.0, criterion='mse', max_depth=None,\n", " max_features=None, max_leaf_nodes=None,\n", " min_impurity_decrease=0.0, min_impurity_split=None,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, presort='deprecated',\n", " random_state=123, splitter='best')\n" ] } ], "source": [ "print(dt)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tune Hyperparameters" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
            MAE            MSE       RMSE      R2   RMSLE    MAPE
0     1710.0867  18253568.8962  4272.4196  0.8910  0.3435  0.1349
1     2342.9618  33002910.7856  5744.8160  0.8081  0.4462  0.1421
2     1992.6884  23279759.5944  4824.9103  0.7053  0.4672  0.1580
3     2250.2711  25594847.8750  5059.1351  0.7863  0.4246  0.2126
4     2157.4516  24978154.4390  4997.8150  0.8142  0.4363  0.1531
5     1991.3288  18794342.2788  4335.2442  0.8791  0.3399  0.1565
6     1688.3935  20093049.8225  4482.5272  0.8633  0.3137  0.1210
7     2060.8145  26178263.6299  5116.4698  0.8561  0.4613  0.1332
8     2088.2260  23545921.7229  4852.4140  0.8479  0.3741  0.1592
9     2233.1985  27217915.9631  5217.0793  0.8247  0.4302  0.1662
Mean  2051.5421  24093873.5007  4890.2830  0.8276  0.4037  0.1537
SD     206.3066   4191347.7243   423.0902  0.0514  0.0529  0.0238
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Wall time: 975 ms\n" ] } ], "source": [ "%%time\n", "# tune hyperparameters of model\n", "tuned_dt = tune_model(dt)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Ensemble Model" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
            MAE            MSE       RMSE      R2   RMSLE    MAPE
0     1776.4799  18836339.8037  4340.0852  0.8875  0.3481  0.1376
1     2300.3532  30644279.5547  5535.7276  0.8218  0.4243  0.1539
2     1924.2258  20323874.4793  4508.2008  0.7427  0.4367  0.1694
3     2163.4615  22106257.5369  4701.7292  0.8154  0.3846  0.1967
4     2173.8514  25725889.5417  5072.0696  0.8087  0.4448  0.1605
5     2161.0406  18397565.9304  4289.2384  0.8817  0.3326  0.1567
6     1726.2661  19731367.8977  4442.0004  0.8657  0.3310  0.1356
7     2079.3486  24270253.8962  4926.4849  0.8665  0.4205  0.1367
8     1986.6675  20824366.8000  4563.3723  0.8655  0.3657  0.1718
9     2103.8869  25488147.2315  5048.5787  0.8358  0.4128  0.1513
Mean  2039.5582  22634834.2672  4742.7487  0.8391  0.3901  0.1570
SD     174.7402   3663777.6927   375.7245  0.0417  0.0412  0.0180
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "bagged_tunned_dt = ensemble_model(tuned_dt)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "BaggingRegressor(base_estimator=DecisionTreeRegressor(ccp_alpha=0.0,\n", " criterion='mae',\n", " max_depth=6,\n", " max_features=1.0,\n", " max_leaf_nodes=None,\n", " min_impurity_decrease=0.002,\n", " min_impurity_split=None,\n", " min_samples_leaf=5,\n", " min_samples_split=5,\n", " min_weight_fraction_leaf=0.0,\n", " presort='deprecated',\n", " random_state=123,\n", " splitter='best'),\n", " bootstrap=True, bootstrap_features=False, max_features=1.0,\n", " max_samples=1.0, n_estimators=10, n_jobs=None, oob_score=False,\n", " random_state=123, verbose=0, warm_start=False)\n" ] } ], "source": [ "print(bagged_tunned_dt)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "sklearn.ensemble._bagging.BaggingRegressor" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(bagged_tunned_dt)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Voting Ensemble" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
            MAE            MSE       RMSE      R2   RMSLE    MAPE
0     1684.9289  18347418.5622  4283.3887  0.8904  0.3422  0.1300
1     2268.6353  31524193.3944  5614.6410  0.8167  0.4271  0.1416
2     1926.8691  21534874.6378  4640.5684  0.7273  0.4481  0.1605
3     2159.7119  23453407.4775  4842.8718  0.8042  0.3988  0.1923
4     2139.0034  25136877.9484  5013.6691  0.8130  0.4384  0.1554
5     2052.6057  17787615.2115  4217.5366  0.8856  0.3296  0.1540
6     1677.8415  19736734.1414  4442.6044  0.8657  0.3187  0.1243
7     2027.8924  25005479.8275  5000.5480  0.8625  0.4340  0.1293
8     2014.0100  21739953.6415  4662.6123  0.8596  0.3648  0.1631
9     2135.5030  26042026.2510  5103.1389  0.8323  0.4166  0.1543
Mean  2008.7001  23030858.1093  4782.1579  0.8357  0.3918  0.1505
SD     186.2108   3922992.8879   402.2733  0.0461  0.0463  0.0192
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "blender = blend_models([tuned_dt, bagged_tunned_dt])" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "VotingRegressor(estimators=[('dt',\n", " DecisionTreeRegressor(ccp_alpha=0.0,\n", " criterion='mae', max_depth=6,\n", " max_features=1.0,\n", " max_leaf_nodes=None,\n", " min_impurity_decrease=0.002,\n", " min_impurity_split=None,\n", " min_samples_leaf=5,\n", " min_samples_split=5,\n", " min_weight_fraction_leaf=0.0,\n", " presort='deprecated',\n", " random_state=123,\n", " splitter='best')),\n", " ('Bagging',\n", " BaggingRegressor(base_estima...\n", " min_impurity_decrease=0.002,\n", " min_impurity_split=None,\n", " min_samples_leaf=5,\n", " min_samples_split=5,\n", " min_weight_fraction_leaf=0.0,\n", " presort='deprecated',\n", " random_state=123,\n", " splitter='best'),\n", " bootstrap=True,\n", " bootstrap_features=False,\n", " max_features=1.0, max_samples=1.0,\n", " n_estimators=10, n_jobs=None,\n", " oob_score=False, random_state=123,\n", " verbose=0, warm_start=False))],\n", " n_jobs=-1, verbose=False, weights=None)\n" ] } ], "source": [ "print(blender)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "sklearn.ensemble._voting.VotingRegressor" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(blender)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Stacking Ensemble" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
            MAE            MSE       RMSE      R2   RMSLE    MAPE
0     2137.4349  16935113.3177  4115.2294  0.8988  0.3473  0.2396
1     2723.0075  28256992.1076  5315.7306  0.8357  0.4216  0.2516
2     2569.8877  20111251.0308  4484.5569  0.7454  0.4324  0.2948
3     2610.2103  21629398.5772  4650.7417  0.8194  0.4061  0.3187
4     2488.7336  23836624.9526  4882.2766  0.8227  0.4503  0.2510
5     2618.4610  17348168.3992  4165.1133  0.8884  0.3356  0.2596
6     2284.5336  18712209.2319  4325.7611  0.8727  0.3519  0.2613
7     2554.4604  23822535.5081  4880.8335  0.8690  0.4246  0.2414
8     2447.7611  18035323.7785  4246.8016  0.8835  0.3817  0.3028
9     2596.6187  24474708.0900  4947.1919  0.8424  0.4345  0.2704
Mean  2503.1109  21316232.4994  4601.4237  0.8478  0.3986  0.2691
SD     165.4495   3532975.1686   378.3288  0.0432  0.0393  0.0258
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "stacker = stack_models([tuned_dt, bagged_tunned_dt])" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "StackingRegressor(cv=KFold(n_splits=10, random_state=RandomState(MT19937) at 0x1D89F110A40,\n", " shuffle=False),\n", " estimators=[('dt',\n", " DecisionTreeRegressor(ccp_alpha=0.0,\n", " criterion='mae',\n", " max_depth=6,\n", " max_features=1.0,\n", " max_leaf_nodes=None,\n", " min_impurity_decrease=0.002,\n", " min_impurity_split=None,\n", " min_samples_leaf=5,\n", " min_samples_split=5,\n", " min_weight_fraction_leaf=0.0,\n", " presort=...\n", " min_weight_fraction_leaf=0.0,\n", " presort='deprecated',\n", " random_state=123,\n", " splitter='best'),\n", " bootstrap=True,\n", " bootstrap_features=False,\n", " max_features=1.0,\n", " max_samples=1.0,\n", " n_estimators=10, n_jobs=None,\n", " oob_score=False,\n", " random_state=123, verbose=0,\n", " warm_start=False))],\n", " final_estimator=LinearRegression(copy_X=True,\n", " fit_intercept=True,\n", " n_jobs=-1, normalize=False),\n", " n_jobs=-1, passthrough=True, verbose=0)\n" ] } ], "source": [ "print(stacker)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Analyze Model" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "6d3160b6c19441cf8b489ec2a9391d60", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(ToggleButtons(description='Plot Type:', icons=('',), options=(('Hyperparameters', 'param…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "evaluate_model(best)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "interpret_model(tuned_dt)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "interpret_model(tuned_dt, plot = 'reason', observation=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Model Predictions" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 ModelMAEMSERMSER2RMSLEMAPE
0Gradient Boosting Regressor2386.201817296249.13794158.87590.87890.39850.2922
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# predict on holdout / test set\n", "pred_holdout = predict_model(best);" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
agebmisex_femalechildren_0children_1children_2children_3children_4children_5smoker_yesregion_northeastregion_northwestregion_southeastregion_southwestchargesLabel
049.042.6800001.00.00.01.00.00.00.00.00.00.01.00.09800.88867210621.483595
132.037.3349990.00.01.00.00.00.00.00.01.00.00.00.04667.6074227290.151941
227.031.4000001.01.00.00.00.00.00.01.00.00.00.01.034838.87109436012.959871
335.024.1299990.00.01.00.00.00.00.00.00.01.00.00.05125.2158207553.788882
460.025.7400000.01.00.00.00.00.00.00.00.00.01.00.012142.57812514904.032497
\n", "
" ], "text/plain": [ " age bmi sex_female children_0 children_1 children_2 \\\n", "0 49.0 42.680000 1.0 0.0 0.0 1.0 \n", "1 32.0 37.334999 0.0 0.0 1.0 0.0 \n", "2 27.0 31.400000 1.0 1.0 0.0 0.0 \n", "3 35.0 24.129999 0.0 0.0 1.0 0.0 \n", "4 60.0 25.740000 0.0 1.0 0.0 0.0 \n", "\n", " children_3 children_4 children_5 smoker_yes region_northeast \\\n", "0 0.0 0.0 0.0 0.0 0.0 \n", "1 0.0 0.0 0.0 0.0 1.0 \n", "2 0.0 0.0 0.0 1.0 0.0 \n", "3 0.0 0.0 0.0 0.0 0.0 \n", "4 0.0 0.0 0.0 0.0 0.0 \n", "\n", " region_northwest region_southeast region_southwest charges \\\n", "0 0.0 1.0 0.0 9800.888672 \n", "1 0.0 0.0 0.0 4667.607422 \n", "2 0.0 0.0 1.0 34838.871094 \n", "3 1.0 0.0 0.0 5125.215820 \n", "4 0.0 1.0 0.0 12142.578125 \n", "\n", " Label \n", "0 10621.483595 \n", "1 7290.151941 \n", "2 36012.959871 \n", "3 7553.788882 \n", "4 14904.032497 " ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pred_holdout.head()" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
agesexbmichildrensmokerregion
019female27.9000yessouthwest
118male33.7701nosoutheast
228male33.0003nosoutheast
333male22.7050nonorthwest
432male28.8800nonorthwest
\n", "
" ], "text/plain": [ " age sex bmi children smoker region\n", "0 19 female 27.900 0 yes southwest\n", "1 18 male 33.770 1 no southeast\n", "2 28 male 33.000 3 no southeast\n", "3 33 male 22.705 0 no northwest\n", "4 32 male 28.880 0 no northwest" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# predict on new data\n", "data2 = data.copy()\n", "data2.drop('charges', axis=1, inplace=True)\n", "data2.head()" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "# finalize model\n", "best_final = finalize_model(best)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
agesexbmichildrensmokerregionLabel
019female27.9000yessouthwest18894.260073
118male33.7701nosoutheast3698.287534
228male33.0003nosoutheast6029.271578
333male22.7050nonorthwest8958.189116
432male28.8800nonorthwest3900.039002
\n", "
" ], "text/plain": [ " age sex bmi children smoker region Label\n", "0 19 female 27.900 0 yes southwest 18894.260073\n", "1 18 male 33.770 1 no southeast 3698.287534\n", "2 28 male 33.000 3 no southeast 6029.271578\n", "3 33 male 22.705 0 no northwest 8958.189116\n", "4 32 male 28.880 0 no northwest 3900.039002" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# predict on data2\n", "predictions = predict_model(best_final, data=data2)\n", "predictions.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 👉 Save / Load / Deploy Model" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Transformation Pipeline and Model Successfully Saved\n" ] }, { "data": { "text/plain": [ "(Pipeline(memory=None,\n", " steps=[('dtypes',\n", " DataTypes_Auto_infer(categorical_features=[],\n", " display_types=True, features_todrop=[],\n", " id_columns=[], ml_usecase='regression',\n", " numerical_features=[], target='charges',\n", " time_features=[])),\n", " ('imputer',\n", " Simple_Imputer(categorical_strategy='not_available',\n", " fill_value_categorical=None,\n", " fill_value_numerical=None,\n", " numeric_strategy...\n", " learning_rate=0.1, loss='ls',\n", " max_depth=3, max_features=None,\n", " max_leaf_nodes=None,\n", " min_impurity_decrease=0.0,\n", " min_impurity_split=None,\n", " min_samples_leaf=1,\n", " min_samples_split=2,\n", " min_weight_fraction_leaf=0.0,\n", " n_estimators=100,\n", " n_iter_no_change=None,\n", " presort='deprecated',\n", " random_state=123, subsample=1.0,\n", " tol=0.0001, validation_fraction=0.1,\n", " verbose=0, warm_start=False)]],\n", " verbose=False),\n", " 'insurance-pipeline.pkl')" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "save_model(best_final, 'insurance-pipeline')" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { 
"name": "stdout", "output_type": "stream", "text": [ "Transformation Pipeline and Model Successfully Loaded\n" ] } ], "source": [ "loaded_pipeline = load_model('insurance-pipeline')" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Pipeline(memory=None,\n", " steps=[('dtypes',\n", " DataTypes_Auto_infer(categorical_features=[],\n", " display_types=True, features_todrop=[],\n", " id_columns=[], ml_usecase='regression',\n", " numerical_features=[], target='charges',\n", " time_features=[])),\n", " ('imputer',\n", " Simple_Imputer(categorical_strategy='not_available',\n", " fill_value_categorical=None,\n", " fill_value_numerical=None,\n", " numeric_strategy...\n", " learning_rate=0.1, loss='ls',\n", " max_depth=3, max_features=None,\n", " max_leaf_nodes=None,\n", " min_impurity_decrease=0.0,\n", " min_impurity_split=None,\n", " min_samples_leaf=1,\n", " min_samples_split=2,\n", " min_weight_fraction_leaf=0.0,\n", " n_estimators=100,\n", " n_iter_no_change=None,\n", " presort='deprecated',\n", " random_state=123, subsample=1.0,\n", " tol=0.0001, validation_fraction=0.1,\n", " verbose=0, warm_start=False)]],\n", " verbose=False)\n" ] } ], "source": [ "print(loaded_pipeline)" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "# visualize pipeline\n", "from sklearn import set_config\n", "set_config(display = 'diagram')" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Pipeline(memory=None,\n",
       "         steps=[('dtypes',\n",
       "                 DataTypes_Auto_infer(categorical_features=[],\n",
       "                                      display_types=True, features_todrop=[],\n",
       "                                      id_columns=[], ml_usecase='regression',\n",
       "                                      numerical_features=[], target='charges',\n",
       "                                      time_features=[])),\n",
       "                ('imputer',\n",
       "                 Simple_Imputer(categorical_strategy='not_available',\n",
       "                                fill_value_categorical=None,\n",
       "                                fill_value_numerical=None,\n",
       "                                numeric_strategy...\n",
       "                                           learning_rate=0.1, loss='ls',\n",
       "                                           max_depth=3, max_features=None,\n",
       "                                           max_leaf_nodes=None,\n",
       "                                           min_impurity_decrease=0.0,\n",
       "                                           min_impurity_split=None,\n",
       "                                           min_samples_leaf=1,\n",
       "                                           min_samples_split=2,\n",
       "                                           min_weight_fraction_leaf=0.0,\n",
       "                                           n_estimators=100,\n",
       "                                           n_iter_no_change=None,\n",
       "                                           presort='deprecated',\n",
       "                                           random_state=123, subsample=1.0,\n",
       "                                           tol=0.0001, validation_fraction=0.1,\n",
       "                                           verbose=0, warm_start=False)]],\n",
       "         verbose=False)
DataTypes_Auto_infer(ml_usecase='regression', target='charges')
Simple_Imputer(categorical_strategy='not_available',\n",
       "               fill_value_categorical=None, fill_value_numerical=None,\n",
       "               numeric_strategy='mean', target_variable=None)
New_Catagorical_Levels_in_TestData(replacement_strategy='least frequent',\n",
       "                                   target='charges')
passthrough
passthrough
passthrough
passthrough
New_Catagorical_Levels_in_TestData(replacement_strategy='least frequent',\n",
       "                                   target='charges')
Make_Time_Features(list_of_features=None,\n",
       "                   time_feature=Index([], dtype='object'))
passthrough
passthrough
passthrough
passthrough
passthrough
passthrough
passthrough
Dummify(target='charges')
Remove_100(target='charges')
Clean_Colum_Names()
passthrough
passthrough
passthrough
passthrough
GradientBoostingRegressor(random_state=123)
" ], "text/plain": [ "Pipeline(memory=None,\n", " steps=[('dtypes',\n", " DataTypes_Auto_infer(categorical_features=[],\n", " display_types=True, features_todrop=[],\n", " id_columns=[], ml_usecase='regression',\n", " numerical_features=[], target='charges',\n", " time_features=[])),\n", " ('imputer',\n", " Simple_Imputer(categorical_strategy='not_available',\n", " fill_value_categorical=None,\n", " fill_value_numerical=None,\n", " numeric_strategy...\n", " learning_rate=0.1, loss='ls',\n", " max_depth=3, max_features=None,\n", " max_leaf_nodes=None,\n", " min_impurity_decrease=0.0,\n", " min_impurity_split=None,\n", " min_samples_leaf=1,\n", " min_samples_split=2,\n", " min_weight_fraction_leaf=0.0,\n", " n_estimators=100,\n", " n_iter_no_change=None,\n", " presort='deprecated',\n", " random_state=123, subsample=1.0,\n", " tol=0.0001, validation_fraction=0.1,\n", " verbose=0, warm_start=False)]],\n", " verbose=False)" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "loaded_pipeline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## THE END" ] } ], "metadata": { "kernelspec": { "display_name": "pycaret-new", "language": "python", "name": "pycaret-new" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.12" } }, "nbformat": 4, "nbformat_minor": 2 }