{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# 👉 What is PyCaret?\n", "\n", "PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that substantially shortens the experiment cycle and makes you more productive.\n", "\n", "Compared with other open-source machine learning libraries, PyCaret is a low-code alternative that can replace hundreds of lines of code with only a few. This makes experiments much faster and more efficient. PyCaret is essentially a Python wrapper around several machine learning libraries and frameworks such as scikit-learn, XGBoost, LightGBM, CatBoost, spaCy, Optuna, Hyperopt, Ray, and many more.\n", "\n", "The design and simplicity of PyCaret are inspired by the emerging role of citizen data scientists, a term first used by Gartner. Citizen data scientists are power users who can perform both simple and moderately sophisticated analytical tasks that would previously have required more expertise. Seasoned data scientists are often difficult to find and expensive to hire, but citizen data scientists can be an effective way to close this gap and address data-related challenges in a business setting.\n", "\n", "Official Website: https://www.pycaret.org\n", "Documentation: https://pycaret.readthedocs.io/en/latest/" ] }, { "attachments": { "image.png": { "image/png": "" } }, "cell_type": "markdown", "metadata": {}, "source": [ "![image.png](attachment:image.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 👉 Install PyCaret\n", "Installing PyCaret is very easy and takes only a few minutes. We strongly recommend using a virtual environment to avoid potential conflicts with other libraries. 
PyCaret's default installation is a slim version that installs only the hard dependencies listed in [requirements.txt](https://github.com/pycaret/pycaret/blob/master/requirements.txt). To install the default version:\n", "\n", "- `pip install pycaret`\n", "\n", "When you install the full version of PyCaret, all the optional dependencies listed [here](https://github.com/pycaret/pycaret/blob/master/requirements-optional.txt) are also installed. To install the full version:\n", "\n", "- `pip install pycaret[full]`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 👉 Dataset" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
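<table>\n",
"  <thead>\n",
"    <tr><th></th><th>age</th><th>sex</th><th>bmi</th><th>children</th><th>smoker</th><th>region</th><th>charges</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>19</td><td>female</td><td>27.900</td><td>0</td><td>yes</td><td>southwest</td><td>16884.92400</td></tr>\n",
"    <tr><th>1</th><td>18</td><td>male</td><td>33.770</td><td>1</td><td>no</td><td>southeast</td><td>1725.55230</td></tr>\n",
"    <tr><th>2</th><td>28</td><td>male</td><td>33.000</td><td>3</td><td>no</td><td>southeast</td><td>4449.46200</td></tr>\n",
"    <tr><th>3</th><td>33</td><td>male</td><td>22.705</td><td>0</td><td>no</td><td>northwest</td><td>21984.47061</td></tr>\n",
"    <tr><th>4</th><td>32</td><td>male</td><td>28.880</td><td>0</td><td>no</td><td>northwest</td><td>3866.85520</td></tr>\n",
"  </tbody>\n",
"</table>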
" ], "text/plain": [ " age sex bmi children smoker region charges\n", "0 19 female 27.900 0 yes southwest 16884.92400\n", "1 18 male 33.770 1 no southeast 1725.55230\n", "2 28 male 33.000 3 no southeast 4449.46200\n", "3 33 male 22.705 0 no northwest 21984.47061\n", "4 32 male 28.880 0 no northwest 3866.85520" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from pycaret.datasets import get_data\n", "data = get_data('insurance')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 👉 Data Preparation" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
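<table>\n",
"  <thead>\n",
"    <tr><th></th><th>Description</th><th>Value</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>session_id</td><td>123</td></tr>\n",
"    <tr><th>1</th><td>Target</td><td>charges</td></tr>\n",
"    <tr><th>2</th><td>Original Data</td><td>(1338, 7)</td></tr>\n",
"    <tr><th>3</th><td>Missing Values</td><td>False</td></tr>\n",
"    <tr><th>4</th><td>Numeric Features</td><td>2</td></tr>\n",
"    <tr><th>5</th><td>Categorical Features</td><td>4</td></tr>\n",
"    <tr><th>6</th><td>Ordinal Features</td><td>False</td></tr>\n",
"    <tr><th>7</th><td>High Cardinality Features</td><td>False</td></tr>\n",
"    <tr><th>8</th><td>High Cardinality Method</td><td>None</td></tr>\n",
"    <tr><th>9</th><td>Transformed Train Set</td><td>(936, 14)</td></tr>\n",
"    <tr><th>10</th><td>Transformed Test Set</td><td>(402, 14)</td></tr>\n",
"    <tr><th>11</th><td>Shuffle Train-Test</td><td>True</td></tr>\n",
"    <tr><th>12</th><td>Stratify Train-Test</td><td>False</td></tr>\n",
"    <tr><th>13</th><td>Fold Generator</td><td>KFold</td></tr>\n",
"    <tr><th>14</th><td>Fold Number</td><td>10</td></tr>\n",
"    <tr><th>15</th><td>CPU Jobs</td><td>-1</td></tr>\n",
"    <tr><th>16</th><td>Use GPU</td><td>False</td></tr>\n",
"    <tr><th>17</th><td>Log Experiment</td><td>False</td></tr>\n",
"    <tr><th>18</th><td>Experiment Name</td><td>reg-default-name</td></tr>\n",
"    <tr><th>19</th><td>USI</td><td>ecac</td></tr>\n",
"    <tr><th>20</th><td>Imputation Type</td><td>simple</td></tr>\n",
"    <tr><th>21</th><td>Iterative Imputation Iteration</td><td>None</td></tr>\n",
"    <tr><th>22</th><td>Numeric Imputer</td><td>mean</td></tr>\n",
"    <tr><th>23</th><td>Iterative Imputation Numeric Model</td><td>None</td></tr>\n",
"    <tr><th>24</th><td>Categorical Imputer</td><td>constant</td></tr>\n",
"    <tr><th>25</th><td>Iterative Imputation Categorical Model</td><td>None</td></tr>\n",
"    <tr><th>26</th><td>Unknown Categoricals Handling</td><td>least_frequent</td></tr>\n",
"    <tr><th>27</th><td>Normalize</td><td>False</td></tr>\n",
"    <tr><th>28</th><td>Normalize Method</td><td>None</td></tr>\n",
"    <tr><th>29</th><td>Transformation</td><td>False</td></tr>\n",
"    <tr><th>30</th><td>Transformation Method</td><td>None</td></tr>\n",
"    <tr><th>31</th><td>PCA</td><td>False</td></tr>\n",
"    <tr><th>32</th><td>PCA Method</td><td>None</td></tr>\n",
"    <tr><th>33</th><td>PCA Components</td><td>None</td></tr>\n",
"    <tr><th>34</th><td>Ignore Low Variance</td><td>False</td></tr>\n",
"    <tr><th>35</th><td>Combine Rare Levels</td><td>False</td></tr>\n",
"    <tr><th>36</th><td>Rare Level Threshold</td><td>None</td></tr>\n",
"    <tr><th>37</th><td>Numeric Binning</td><td>False</td></tr>\n",
"    <tr><th>38</th><td>Remove Outliers</td><td>False</td></tr>\n",
"    <tr><th>39</th><td>Outliers Threshold</td><td>None</td></tr>\n",
"    <tr><th>40</th><td>Remove Multicollinearity</td><td>False</td></tr>\n",
"    <tr><th>41</th><td>Multicollinearity Threshold</td><td>None</td></tr>\n",
"    <tr><th>42</th><td>Clustering</td><td>False</td></tr>\n",
"    <tr><th>43</th><td>Clustering Iteration</td><td>None</td></tr>\n",
"    <tr><th>44</th><td>Polynomial Features</td><td>False</td></tr>\n",
"    <tr><th>45</th><td>Polynomial Degree</td><td>None</td></tr>\n",
"    <tr><th>46</th><td>Trignometry Features</td><td>False</td></tr>\n",
"    <tr><th>47</th><td>Polynomial Threshold</td><td>None</td></tr>\n",
"    <tr><th>48</th><td>Group Features</td><td>False</td></tr>\n",
"    <tr><th>49</th><td>Feature Selection</td><td>False</td></tr>\n",
"    <tr><th>50</th><td>Feature Selection Method</td><td>classic</td></tr>\n",
"    <tr><th>51</th><td>Features Selection Threshold</td><td>None</td></tr>\n",
"    <tr><th>52</th><td>Feature Interaction</td><td>False</td></tr>\n",
"    <tr><th>53</th><td>Feature Ratio</td><td>False</td></tr>\n",
"    <tr><th>54</th><td>Interaction Threshold</td><td>None</td></tr>\n",
"    <tr><th>55</th><td>Transform Target</td><td>False</td></tr>\n",
"    <tr><th>56</th><td>Transform Target Method</td><td>box-cox</td></tr>\n",
"  </tbody>\n",
"</table>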
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from pycaret.regression import *\n", "s = setup(data, target = 'charges', session_id = 123)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
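<table>\n",
"  <thead>\n",
"    <tr><th></th><th>age</th><th>bmi</th><th>sex_female</th><th>children_0</th><th>children_1</th><th>children_2</th><th>children_3</th><th>children_4</th><th>children_5</th><th>smoker_yes</th><th>region_northeast</th><th>region_northwest</th><th>region_southeast</th><th>region_southwest</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>300</th><td>36.0</td><td>27.549999</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>\n",
"    <tr><th>904</th><td>60.0</td><td>35.099998</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td></tr>\n",
"    <tr><th>670</th><td>30.0</td><td>31.570000</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td></tr>\n",
"    <tr><th>617</th><td>49.0</td><td>25.600000</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td></tr>\n",
"    <tr><th>373</th><td>26.0</td><td>32.900002</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>1238</th><td>37.0</td><td>22.705000</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>\n",
"    <tr><th>1147</th><td>20.0</td><td>31.920000</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td></tr>\n",
"    <tr><th>106</th><td>19.0</td><td>28.400000</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td></tr>\n",
"    <tr><th>1041</th><td>18.0</td><td>23.084999</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>\n",
"    <tr><th>1122</th><td>53.0</td><td>36.860001</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>936 rows × 14 columns</p>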
" ], "text/plain": [ " age bmi sex_female children_0 children_1 children_2 \\\n", "300 36.0 27.549999 0.0 0.0 0.0 0.0 \n", "904 60.0 35.099998 1.0 1.0 0.0 0.0 \n", "670 30.0 31.570000 0.0 0.0 0.0 0.0 \n", "617 49.0 25.600000 0.0 0.0 0.0 1.0 \n", "373 26.0 32.900002 0.0 0.0 0.0 1.0 \n", "... ... ... ... ... ... ... \n", "1238 37.0 22.705000 0.0 0.0 0.0 0.0 \n", "1147 20.0 31.920000 1.0 1.0 0.0 0.0 \n", "106 19.0 28.400000 1.0 0.0 1.0 0.0 \n", "1041 18.0 23.084999 0.0 1.0 0.0 0.0 \n", "1122 53.0 36.860001 1.0 0.0 0.0 0.0 \n", "\n", " children_3 children_4 children_5 smoker_yes region_northeast \\\n", "300 1.0 0.0 0.0 0.0 1.0 \n", "904 0.0 0.0 0.0 0.0 0.0 \n", "670 1.0 0.0 0.0 0.0 0.0 \n", "617 0.0 0.0 0.0 1.0 0.0 \n", "373 0.0 0.0 0.0 1.0 0.0 \n", "... ... ... ... ... ... \n", "1238 1.0 0.0 0.0 0.0 1.0 \n", "1147 0.0 0.0 0.0 0.0 0.0 \n", "106 0.0 0.0 0.0 0.0 0.0 \n", "1041 0.0 0.0 0.0 0.0 1.0 \n", "1122 1.0 0.0 0.0 1.0 0.0 \n", "\n", " region_northwest region_southeast region_southwest \n", "300 0.0 0.0 0.0 \n", "904 0.0 0.0 1.0 \n", "670 0.0 1.0 0.0 \n", "617 0.0 0.0 1.0 \n", "373 0.0 0.0 1.0 \n", "... ... ... ... 
\n", "1238 0.0 0.0 0.0 \n", "1147 1.0 0.0 0.0 \n", "106 0.0 0.0 1.0 \n", "1041 0.0 0.0 0.0 \n", "1122 1.0 0.0 0.0 \n", "\n", "[936 rows x 14 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# check transformed X_train\n", "get_config('X_train')" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['age', 'bmi', 'sex_female', 'children_0', 'children_1', 'children_2',\n", " 'children_3', 'children_4', 'children_5', 'smoker_yes',\n", " 'region_northeast', 'region_northwest', 'region_southeast',\n", " 'region_southwest'],\n", " dtype='object')" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# list columns of transformed X_train \n", "get_config('X_train').columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 👉Model Training & Selection" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compare Models" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
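<table>\n",
"  <thead>\n",
"    <tr><th></th><th>Model</th><th>MAE</th><th>MSE</th><th>RMSE</th><th>R2</th><th>RMSLE</th><th>MAPE</th><th>TT (Sec)</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>gbr</th><td>Gradient Boosting Regressor</td><td>2702.7680</td><td>23242056.4409</td><td>4801.5704</td><td>0.8348</td><td>0.4397</td><td>0.3113</td><td>0.0280</td></tr>\n",
"    <tr><th>catboost</th><td>CatBoost Regressor</td><td>2844.4446</td><td>24943135.5224</td><td>4977.1926</td><td>0.8228</td><td>0.4707</td><td>0.3364</td><td>1.0920</td></tr>\n",
"    <tr><th>rf</th><td>Random Forest Regressor</td><td>2736.7455</td><td>24862762.2305</td><td>4970.6959</td><td>0.8213</td><td>0.4674</td><td>0.3294</td><td>0.1830</td></tr>\n",
"    <tr><th>lightgbm</th><td>Light Gradient Boosting Machine</td><td>2959.5584</td><td>25236477.0456</td><td>5013.0892</td><td>0.8171</td><td>0.5427</td><td>0.3685</td><td>0.0240</td></tr>\n",
"    <tr><th>ada</th><td>AdaBoost Regressor</td><td>4162.2323</td><td>28328260.0955</td><td>5316.6146</td><td>0.7985</td><td>0.6349</td><td>0.7263</td><td>0.0110</td></tr>\n",
"    <tr><th>et</th><td>Extra Trees Regressor</td><td>2814.2964</td><td>28815493.0260</td><td>5339.0879</td><td>0.7964</td><td>0.4889</td><td>0.3350</td><td>0.1930</td></tr>\n",
"    <tr><th>xgboost</th><td>Extreme Gradient Boosting</td><td>3302.3215</td><td>31739266.6000</td><td>5615.5941</td><td>0.7701</td><td>0.5661</td><td>0.4218</td><td>0.2820</td></tr>\n",
"    <tr><th>llar</th><td>Lasso Least Angle Regression</td><td>4315.7895</td><td>38355976.5182</td><td>6173.8740</td><td>0.7311</td><td>0.6105</td><td>0.4415</td><td>0.0060</td></tr>\n",
"    <tr><th>ridge</th><td>Ridge Regression</td><td>4336.2309</td><td>38381496.8000</td><td>6175.9541</td><td>0.7309</td><td>0.6193</td><td>0.4454</td><td>0.0090</td></tr>\n",
"    <tr><th>br</th><td>Bayesian Ridge</td><td>4333.6881</td><td>38381669.3629</td><td>6175.9476</td><td>0.7308</td><td>0.6151</td><td>0.4450</td><td>0.0100</td></tr>\n",
"    <tr><th>lr</th><td>Linear Regression</td><td>4323.6136</td><td>38380061.2000</td><td>6175.7164</td><td>0.7308</td><td>0.6175</td><td>0.4432</td><td>0.6420</td></tr>\n",
"    <tr><th>lasso</th><td>Lasso Regression</td><td>4323.0688</td><td>38375137.8000</td><td>6175.3801</td><td>0.7308</td><td>0.6140</td><td>0.4431</td><td>0.0070</td></tr>\n",
"    <tr><th>lar</th><td>Least Angle Regression</td><td>4454.3826</td><td>39745068.4096</td><td>6271.9943</td><td>0.7228</td><td>0.6486</td><td>0.4703</td><td>0.0100</td></tr>\n",
"    <tr><th>dt</th><td>Decision Tree Regressor</td><td>3148.3402</td><td>43766011.6491</td><td>6584.7198</td><td>0.6855</td><td>0.5331</td><td>0.3455</td><td>0.0070</td></tr>\n",
"    <tr><th>huber</th><td>Huber Regressor</td><td>3455.2997</td><td>48908984.4059</td><td>6971.2642</td><td>0.6545</td><td>0.4790</td><td>0.2174</td><td>0.0140</td></tr>\n",
"    <tr><th>omp</th><td>Orthogonal Matching Pursuit</td><td>5754.7768</td><td>57503216.4290</td><td>7566.7093</td><td>0.5997</td><td>0.7418</td><td>0.8990</td><td>0.0060</td></tr>\n",
"    <tr><th>par</th><td>Passive Aggressive Regressor</td><td>4164.7843</td><td>61324373.4835</td><td>7747.8332</td><td>0.5840</td><td>0.4724</td><td>0.2586</td><td>0.0100</td></tr>\n",
"    <tr><th>en</th><td>Elastic Net</td><td>7369.0573</td><td>90443346.8000</td><td>9468.6782</td><td>0.3791</td><td>0.7377</td><td>0.9256</td><td>0.0060</td></tr>\n",
"    <tr><th>knn</th><td>K Neighbors Regressor</td><td>7805.8425</td><td>126951808.0000</td><td>11221.6535</td><td>0.1218</td><td>0.8398</td><td>0.9147</td><td>0.0370</td></tr>\n",
"  </tbody>\n",
"</table>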
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# train all models using default hyperparameters\n", "best = compare_models()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "GradientBoostingRegressor(alpha=0.9, ccp_alpha=0.0, criterion='friedman_mse',\n", " init=None, learning_rate=0.1, loss='ls', max_depth=3,\n", " max_features=None, max_leaf_nodes=None,\n", " min_impurity_decrease=0.0, min_impurity_split=None,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, n_estimators=100,\n", " n_iter_no_change=None, presort='deprecated',\n", " random_state=123, subsample=1.0, tol=0.0001,\n", " validation_fraction=0.1, verbose=0, warm_start=False)\n" ] } ], "source": [ "print(best)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "sklearn.ensemble._gb.GradientBoostingRegressor" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(best)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create Model" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
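<table>\n",
"  <thead>\n",
"    <tr><th></th><th>MAE</th><th>MSE</th><th>RMSE</th><th>R2</th><th>RMSLE</th><th>MAPE</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>3001.2294</td><td>37001480.2590</td><td>6082.8842</td><td>0.7790</td><td>0.4984</td><td>0.3140</td></tr>\n",
"    <tr><th>1</th><td>3389.8885</td><td>49305179.5732</td><td>7021.7647</td><td>0.7133</td><td>0.5574</td><td>0.3361</td></tr>\n",
"    <tr><th>2</th><td>2926.0191</td><td>42025684.6666</td><td>6482.7220</td><td>0.4679</td><td>0.6215</td><td>0.4025</td></tr>\n",
"    <tr><th>3</th><td>2744.7144</td><td>34078761.4507</td><td>5837.7017</td><td>0.7154</td><td>0.5412</td><td>0.3740</td></tr>\n",
"    <tr><th>4</th><td>3924.4816</td><td>59489464.3207</td><td>7712.9414</td><td>0.5575</td><td>0.6455</td><td>0.4796</td></tr>\n",
"    <tr><th>5</th><td>3322.5435</td><td>42747575.4453</td><td>6538.1630</td><td>0.7250</td><td>0.4869</td><td>0.2928</td></tr>\n",
"    <tr><th>6</th><td>3158.7047</td><td>49369669.1652</td><td>7026.3553</td><td>0.6641</td><td>0.4511</td><td>0.3089</td></tr>\n",
"    <tr><th>7</th><td>2405.2970</td><td>31318616.6440</td><td>5596.3038</td><td>0.8278</td><td>0.4497</td><td>0.1434</td></tr>\n",
"    <tr><th>8</th><td>3021.5461</td><td>39091793.3775</td><td>6252.3430</td><td>0.7475</td><td>0.5117</td><td>0.4381</td></tr>\n",
"    <tr><th>9</th><td>3588.9772</td><td>53231891.5889</td><td>7296.0189</td><td>0.6571</td><td>0.5679</td><td>0.3653</td></tr>\n",
"    <tr><th>Mean</th><td>3148.3402</td><td>43766011.6491</td><td>6584.7198</td><td>0.6855</td><td>0.5331</td><td>0.3455</td></tr>\n",
"    <tr><th>SD</th><td>410.7953</td><td>8481549.4829</td><td>638.3390</td><td>0.1005</td><td>0.0631</td><td>0.0878</td></tr>\n",
"  </tbody>\n",
"</table>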
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# train individual model\n", "dt = create_model('dt')" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DecisionTreeRegressor(ccp_alpha=0.0, criterion='mse', max_depth=None,\n", " max_features=None, max_leaf_nodes=None,\n", " min_impurity_decrease=0.0, min_impurity_split=None,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, presort='deprecated',\n", " random_state=123, splitter='best')\n" ] } ], "source": [ "print(dt)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tune Hyperparameters" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
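<table>\n",
"  <thead>\n",
"    <tr><th></th><th>MAE</th><th>MSE</th><th>RMSE</th><th>R2</th><th>RMSLE</th><th>MAPE</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>1710.0867</td><td>18253568.8962</td><td>4272.4196</td><td>0.8910</td><td>0.3435</td><td>0.1349</td></tr>\n",
"    <tr><th>1</th><td>2342.9618</td><td>33002910.7856</td><td>5744.8160</td><td>0.8081</td><td>0.4462</td><td>0.1421</td></tr>\n",
"    <tr><th>2</th><td>1992.6884</td><td>23279759.5944</td><td>4824.9103</td><td>0.7053</td><td>0.4672</td><td>0.1580</td></tr>\n",
"    <tr><th>3</th><td>2250.2711</td><td>25594847.8750</td><td>5059.1351</td><td>0.7863</td><td>0.4246</td><td>0.2126</td></tr>\n",
"    <tr><th>4</th><td>2157.4516</td><td>24978154.4390</td><td>4997.8150</td><td>0.8142</td><td>0.4363</td><td>0.1531</td></tr>\n",
"    <tr><th>5</th><td>1991.3288</td><td>18794342.2788</td><td>4335.2442</td><td>0.8791</td><td>0.3399</td><td>0.1565</td></tr>\n",
"    <tr><th>6</th><td>1688.3935</td><td>20093049.8225</td><td>4482.5272</td><td>0.8633</td><td>0.3137</td><td>0.1210</td></tr>\n",
"    <tr><th>7</th><td>2060.8145</td><td>26178263.6299</td><td>5116.4698</td><td>0.8561</td><td>0.4613</td><td>0.1332</td></tr>\n",
"    <tr><th>8</th><td>2088.2260</td><td>23545921.7229</td><td>4852.4140</td><td>0.8479</td><td>0.3741</td><td>0.1592</td></tr>\n",
"    <tr><th>9</th><td>2233.1985</td><td>27217915.9631</td><td>5217.0793</td><td>0.8247</td><td>0.4302</td><td>0.1662</td></tr>\n",
"    <tr><th>Mean</th><td>2051.5421</td><td>24093873.5007</td><td>4890.2830</td><td>0.8276</td><td>0.4037</td><td>0.1537</td></tr>\n",
"    <tr><th>SD</th><td>206.3066</td><td>4191347.7243</td><td>423.0902</td><td>0.0514</td><td>0.0529</td><td>0.0238</td></tr>\n",
"  </tbody>\n",
"</table>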
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Wall time: 889 ms\n" ] } ], "source": [ "%%time\n", "# tune hyperparameters of model\n", "tuned_dt = tune_model(dt)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
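<table>\n",
"  <thead>\n",
"    <tr><th></th><th>MAE</th><th>MSE</th><th>RMSE</th><th>R2</th><th>RMSLE</th><th>MAPE</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>2579.8299</td><td>20553234.9577</td><td>4533.5676</td><td>0.8772</td><td>0.4036</td><td>0.2978</td></tr>\n",
"    <tr><th>1</th><td>3185.6429</td><td>31391105.8784</td><td>5602.7766</td><td>0.8175</td><td>0.4625</td><td>0.3315</td></tr>\n",
"    <tr><th>2</th><td>2685.6595</td><td>23097034.7695</td><td>4805.9374</td><td>0.7076</td><td>0.4799</td><td>0.3230</td></tr>\n",
"    <tr><th>3</th><td>2953.7995</td><td>25833993.3635</td><td>5082.7152</td><td>0.7843</td><td>0.4615</td><td>0.3829</td></tr>\n",
"    <tr><th>4</th><td>2837.2551</td><td>24784197.5159</td><td>4978.3730</td><td>0.8157</td><td>0.4479</td><td>0.3076</td></tr>\n",
"    <tr><th>5</th><td>2886.9283</td><td>21983688.3215</td><td>4688.6766</td><td>0.8586</td><td>0.3628</td><td>0.2822</td></tr>\n",
"    <tr><th>6</th><td>2857.3329</td><td>21418065.8828</td><td>4627.9656</td><td>0.8543</td><td>0.3993</td><td>0.3484</td></tr>\n",
"    <tr><th>7</th><td>2606.6486</td><td>24784213.2073</td><td>4978.3746</td><td>0.8637</td><td>0.4704</td><td>0.2649</td></tr>\n",
"    <tr><th>8</th><td>2888.3808</td><td>23394063.2467</td><td>4836.7410</td><td>0.8489</td><td>0.4585</td><td>0.3843</td></tr>\n",
"    <tr><th>9</th><td>2708.3053</td><td>24775400.8000</td><td>4977.4894</td><td>0.8404</td><td>0.4566</td><td>0.3203</td></tr>\n",
"    <tr><th>Mean</th><td>2818.9783</td><td>24201499.7943</td><td>4911.2617</td><td>0.8268</td><td>0.4403</td><td>0.3243</td></tr>\n",
"    <tr><th>SD</th><td>172.0921</td><td>2884644.6917</td><td>284.6197</td><td>0.0475</td><td>0.0362</td><td>0.0374</td></tr>\n",
"  </tbody>\n",
"</table>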
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Wall time: 1.05 s\n" ] } ], "source": [ "%%time\n", "# tune hyperparameters of model\n", "tuned_dt = tune_model(dt, search_library = 'optuna')" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
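<table>\n",
"  <thead>\n",
"    <tr><th></th><th>MAE</th><th>MSE</th><th>RMSE</th><th>R2</th><th>RMSLE</th><th>MAPE</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>1737.9461</td><td>19084226.9033</td><td>4368.5497</td><td>0.8860</td><td>0.3535</td><td>0.1298</td></tr>\n",
"    <tr><th>1</th><td>2301.3648</td><td>32666312.6667</td><td>5715.4451</td><td>0.8101</td><td>0.4302</td><td>0.1221</td></tr>\n",
"    <tr><th>2</th><td>1926.0696</td><td>22308633.4826</td><td>4723.2016</td><td>0.7175</td><td>0.4736</td><td>0.1489</td></tr>\n",
"    <tr><th>3</th><td>2026.0820</td><td>22473178.7795</td><td>4740.5884</td><td>0.8124</td><td>0.4318</td><td>0.2012</td></tr>\n",
"    <tr><th>4</th><td>2038.5257</td><td>25854437.0548</td><td>5084.7259</td><td>0.8077</td><td>0.4618</td><td>0.1242</td></tr>\n",
"    <tr><th>5</th><td>2034.3579</td><td>20359122.7557</td><td>4512.1085</td><td>0.8690</td><td>0.3448</td><td>0.1493</td></tr>\n",
"    <tr><th>6</th><td>1638.3514</td><td>17652018.4487</td><td>4201.4305</td><td>0.8799</td><td>0.2901</td><td>0.1070</td></tr>\n",
"    <tr><th>7</th><td>2370.2470</td><td>28957688.5789</td><td>5381.2349</td><td>0.8408</td><td>0.4534</td><td>0.1456</td></tr>\n",
"    <tr><th>8</th><td>2061.5307</td><td>23682198.8277</td><td>4866.4359</td><td>0.8470</td><td>0.3571</td><td>0.1312</td></tr>\n",
"    <tr><th>9</th><td>2422.1612</td><td>30646993.2491</td><td>5535.9727</td><td>0.8026</td><td>0.4793</td><td>0.2119</td></tr>\n",
"    <tr><th>Mean</th><td>2055.6636</td><td>24368481.0747</td><td>4912.9693</td><td>0.8273</td><td>0.4076</td><td>0.1471</td></tr>\n",
"    <tr><th>SD</th><td>242.3217</td><td>4784954.6251</td><td>480.8467</td><td>0.0470</td><td>0.0623</td><td>0.0323</td></tr>\n",
"  </tbody>\n",
"</table>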
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Wall time: 2.06 s\n" ] } ], "source": [ "%%time\n", "# tune hyperparameters of model\n", "tuned_dt = tune_model(dt, search_library = 'scikit-optimize')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Ensemble Model" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
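<table>\n",
"  <thead>\n",
"    <tr><th></th><th>MAE</th><th>MSE</th><th>RMSE</th><th>R2</th><th>RMSLE</th><th>MAPE</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>1714.3042</td><td>18502255.0953</td><td>4301.4248</td><td>0.8895</td><td>0.3512</td><td>0.1272</td></tr>\n",
"    <tr><th>1</th><td>2422.6145</td><td>30056553.2436</td><td>5482.3857</td><td>0.8252</td><td>0.3943</td><td>0.1723</td></tr>\n",
"    <tr><th>2</th><td>1791.3911</td><td>18943828.4002</td><td>4352.4508</td><td>0.7601</td><td>0.4316</td><td>0.1539</td></tr>\n",
"    <tr><th>3</th><td>1969.4230</td><td>19619134.5779</td><td>4429.3492</td><td>0.8362</td><td>0.3761</td><td>0.1680</td></tr>\n",
"    <tr><th>4</th><td>2349.1782</td><td>27075816.8911</td><td>5203.4428</td><td>0.7986</td><td>0.4591</td><td>0.1597</td></tr>\n",
"    <tr><th>5</th><td>2182.2224</td><td>19450947.0951</td><td>4410.3228</td><td>0.8749</td><td>0.3434</td><td>0.1559</td></tr>\n",
"    <tr><th>6</th><td>1664.9644</td><td>18708131.5532</td><td>4325.2898</td><td>0.8727</td><td>0.2950</td><td>0.1120</td></tr>\n",
"    <tr><th>7</th><td>2222.6272</td><td>25396879.4405</td><td>5039.5317</td><td>0.8604</td><td>0.4220</td><td>0.1422</td></tr>\n",
"    <tr><th>8</th><td>1948.6908</td><td>20971053.5198</td><td>4579.4163</td><td>0.8645</td><td>0.3513</td><td>0.1752</td></tr>\n",
"    <tr><th>9</th><td>2219.5333</td><td>25927191.7862</td><td>5091.8751</td><td>0.8330</td><td>0.4135</td><td>0.1654</td></tr>\n",
"    <tr><th>Mean</th><td>2048.4949</td><td>22465179.1603</td><td>4721.5489</td><td>0.8415</td><td>0.3838</td><td>0.1532</td></tr>\n",
"    <tr><th>SD</th><td>254.7410</td><td>4013793.7153</td><td>414.9159</td><td>0.0375</td><td>0.0470</td><td>0.0194</td></tr>\n",
"  </tbody>\n",
"</table>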
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "bagged_tunned_dt = ensemble_model(tuned_dt)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "BaggingRegressor(base_estimator=DecisionTreeRegressor(ccp_alpha=0.0,\n", " criterion='mae',\n", " max_depth=10,\n", " max_features=0.7926939286102865,\n", " max_leaf_nodes=None,\n", " min_impurity_decrease=3.626887508557946e-07,\n", " min_impurity_split=None,\n", " min_samples_leaf=5,\n", " min_samples_split=3,\n", " min_weight_fraction_leaf=0.0,\n", " presort='deprecated',\n", " random_state=123,\n", " splitter='best'),\n", " bootstrap=True, bootstrap_features=False, max_features=1.0,\n", " max_samples=1.0, n_estimators=10, n_jobs=None, oob_score=False,\n", " random_state=123, verbose=0, warm_start=False)\n" ] } ], "source": [ "print(bagged_tunned_dt)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Voting Ensemble" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
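<table>\n",
"  <thead>\n",
"    <tr><th></th><th>MAE</th><th>MSE</th><th>RMSE</th><th>R2</th><th>RMSLE</th><th>MAPE</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>4317.6330</td><td>40225874.7120</td><td>6342.3871</td><td>0.7597</td><td>0.5377</td><td>0.5095</td></tr>\n",
"    <tr><th>1</th><td>4603.8770</td><td>47671358.8401</td><td>6904.4449</td><td>0.7228</td><td>0.5528</td><td>0.4544</td></tr>\n",
"    <tr><th>2</th><td>3526.5657</td><td>30025112.7222</td><td>5479.5176</td><td>0.6198</td><td>0.6098</td><td>0.5355</td></tr>\n",
"    <tr><th>3</th><td>4059.9577</td><td>33028436.9777</td><td>5747.0372</td><td>0.7242</td><td>0.6159</td><td>0.6499</td></tr>\n",
"    <tr><th>4</th><td>4719.1236</td><td>47134827.6069</td><td>6865.4809</td><td>0.6494</td><td>0.5783</td><td>0.5456</td></tr>\n",
"    <tr><th>5</th><td>4036.9877</td><td>39416780.6753</td><td>6278.2785</td><td>0.7464</td><td>0.4745</td><td>0.3684</td></tr>\n",
"    <tr><th>6</th><td>4098.2194</td><td>43644758.2312</td><td>6606.4180</td><td>0.7030</td><td>0.5116</td><td>0.4656</td></tr>\n",
"    <tr><th>7</th><td>3938.6892</td><td>37270337.0487</td><td>6104.9437</td><td>0.7951</td><td>0.4289</td><td>0.3185</td></tr>\n",
"    <tr><th>8</th><td>4493.1798</td><td>43777521.8925</td><td>6616.4584</td><td>0.7172</td><td>0.5800</td><td>0.6167</td></tr>\n",
"    <tr><th>9</th><td>4452.8107</td><td>44338906.9474</td><td>6658.7466</td><td>0.7144</td><td>0.5613</td><td>0.4606</td></tr>\n",
"    <tr><th>Mean</th><td>4224.7044</td><td>40653391.5654</td><td>6360.3713</td><td>0.7152</td><td>0.5451</td><td>0.4925</td></tr>\n",
"    <tr><th>SD</th><td>341.8405</td><td>5548059.5251</td><td>446.1712</td><td>0.0480</td><td>0.0561</td><td>0.0970</td></tr>\n",
"  </tbody>\n",
"</table>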
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "dt = create_model('dt', verbose=False)\n", "lasso = create_model('lasso', verbose=False)\n", "knn = create_model('knn', verbose=False)\n", "blender = blend_models([dt,lasso,knn])" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "VotingRegressor(estimators=[('dt',\n", " DecisionTreeRegressor(ccp_alpha=0.0,\n", " criterion='mse',\n", " max_depth=None,\n", " max_features=None,\n", " max_leaf_nodes=None,\n", " min_impurity_decrease=0.0,\n", " min_impurity_split=None,\n", " min_samples_leaf=1,\n", " min_samples_split=2,\n", " min_weight_fraction_leaf=0.0,\n", " presort='deprecated',\n", " random_state=123,\n", " splitter='best')),\n", " ('lasso',\n", " Lasso(alpha=1.0, copy_X=True, fit_intercept=True,\n", " max_iter=1000, normalize=False,\n", " positive=False, precompute=False,\n", " random_state=123, selection='cyclic',\n", " tol=0.0001, warm_start=False)),\n", " ('knn',\n", " KNeighborsRegressor(algorithm='auto', leaf_size=30,\n", " metric='minkowski',\n", " metric_params=None, n_jobs=-1,\n", " n_neighbors=5, p=2,\n", " weights='uniform'))],\n", " n_jobs=-1, verbose=False, weights=None)\n" ] } ], "source": [ "print(blender)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "sklearn.ensemble._voting.VotingRegressor" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(blender)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Stacking Ensemble" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
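<table>\n",
"  <thead>\n",
"    <tr><th></th><th>MAE</th><th>MSE</th><th>RMSE</th><th>R2</th><th>RMSLE</th><th>MAPE</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>3330.1372</td><td>26118092.6166</td><td>5110.5863</td><td>0.8440</td><td>0.4528</td><td>0.3698</td></tr>\n",
"    <tr><th>1</th><td>3823.1055</td><td>36671240.2394</td><td>6055.6783</td><td>0.7868</td><td>0.5307</td><td>0.3557</td></tr>\n",
"    <tr><th>2</th><td>3330.5109</td><td>29504475.5264</td><td>5431.8022</td><td>0.6264</td><td>0.5701</td><td>0.3860</td></tr>\n",
"    <tr><th>3</th><td>3029.4249</td><td>21804106.7675</td><td>4669.4868</td><td>0.8179</td><td>0.4804</td><td>0.4295</td></tr>\n",
"    <tr><th>4</th><td>3931.7436</td><td>36775712.8160</td><td>6064.2982</td><td>0.7265</td><td>0.5132</td><td>0.3751</td></tr>\n",
"    <tr><th>5</th><td>3461.5432</td><td>29482846.8211</td><td>5429.8109</td><td>0.8103</td><td>0.4937</td><td>0.3541</td></tr>\n",
"    <tr><th>6</th><td>3479.7849</td><td>31955532.4915</td><td>5652.9225</td><td>0.7826</td><td>0.4684</td><td>0.3600</td></tr>\n",
"    <tr><th>7</th><td>3610.5786</td><td>30708816.3945</td><td>5541.5536</td><td>0.8311</td><td>0.5030</td><td>0.2904</td></tr>\n",
"    <tr><th>8</th><td>3761.7961</td><td>27523649.5059</td><td>5246.2986</td><td>0.8222</td><td>0.5596</td><td>0.4627</td></tr>\n",
"    <tr><th>9</th><td>3913.4878</td><td>35744866.0161</td><td>5978.7010</td><td>0.7698</td><td>0.6368</td><td>0.4228</td></tr>\n",
"    <tr><th>Mean</th><td>3567.2113</td><td>30628933.9195</td><td>5518.1139</td><td>0.7818</td><td>0.5209</td><td>0.3806</td></tr>\n",
"    <tr><th>SD</th><td>278.9752</td><td>4611695.3476</td><td>423.5013</td><td>0.0612</td><td>0.0525</td><td>0.0458</td></tr>\n",
"  </tbody>\n",
"</table>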
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "stacker = stack_models([dt,lasso,knn])" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "StackingRegressor(cv=KFold(n_splits=10, random_state=RandomState(MT19937) at 0x1BC27FFD268,\n", " shuffle=False),\n", " estimators=[('dt',\n", " DecisionTreeRegressor(ccp_alpha=0.0,\n", " criterion='mse',\n", " max_depth=None,\n", " max_features=None,\n", " max_leaf_nodes=None,\n", " min_impurity_decrease=0.0,\n", " min_impurity_split=None,\n", " min_samples_leaf=1,\n", " min_samples_split=2,\n", " min_weight_fraction_leaf=0.0,\n", " presor...\n", " positive=False, precompute=False,\n", " random_state=123, selection='cyclic',\n", " tol=0.0001, warm_start=False)),\n", " ('knn',\n", " KNeighborsRegressor(algorithm='auto',\n", " leaf_size=30,\n", " metric='minkowski',\n", " metric_params=None,\n", " n_jobs=-1, n_neighbors=5,\n", " p=2, weights='uniform'))],\n", " final_estimator=LinearRegression(copy_X=True,\n", " fit_intercept=True,\n", " n_jobs=-1, normalize=False),\n", " n_jobs=-1, passthrough=True, verbose=0)\n" ] } ], "source": [ "print(stacker)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Analyze Model" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "b3e79dd44bd34d7c95e1522e60644df7", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(ToggleButtons(description='Plot Type:', icons=('',), options=(('Hyperparameters', 'param…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "evaluate_model(best)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "interpret_model(dt)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", "
\n", " Visualization omitted, Javascript library not loaded!
\n", " Have you run `initjs()` in this notebook? If this notebook was from another\n", " user you must also trust this notebook (File -> Trust notebook). If you are viewing\n", " this notebook on github the Javascript has been stripped for security. If you are using\n", " JupyterLab this error is because a JupyterLab extension has not yet been written.\n", "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "interpret_model(dt, plot = 'reason', observation=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Model Predictions" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Model MAE MSE RMSE R2 RMSLE MAPE
0 Gradient Boosting Regressor 2386.2018 17296249.1379 4158.8759 0.8789 0.3985 0.2922
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# predict on holdout / test set\n", "pred_holdout = predict_model(best);" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
age bmi sex_female children_0 children_1 children_2 children_3 children_4 children_5 smoker_yes region_northeast region_northwest region_southeast region_southwest charges Label
0 49.0 42.680000 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 9800.888672 10621.483595
1 32.0 37.334999 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 4667.607422 7290.151941
2 27.0 31.400000 1.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 34838.871094 36012.959871
3 35.0 24.129999 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 5125.215820 7553.788882
4 60.0 25.740000 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 12142.578125 14904.032497
\n", "
" ], "text/plain": [ " age bmi sex_female children_0 children_1 children_2 \\\n", "0 49.0 42.680000 1.0 0.0 0.0 1.0 \n", "1 32.0 37.334999 0.0 0.0 1.0 0.0 \n", "2 27.0 31.400000 1.0 1.0 0.0 0.0 \n", "3 35.0 24.129999 0.0 0.0 1.0 0.0 \n", "4 60.0 25.740000 0.0 1.0 0.0 0.0 \n", "\n", " children_3 children_4 children_5 smoker_yes region_northeast \\\n", "0 0.0 0.0 0.0 0.0 0.0 \n", "1 0.0 0.0 0.0 0.0 1.0 \n", "2 0.0 0.0 0.0 1.0 0.0 \n", "3 0.0 0.0 0.0 0.0 0.0 \n", "4 0.0 0.0 0.0 0.0 0.0 \n", "\n", " region_northwest region_southeast region_southwest charges \\\n", "0 0.0 1.0 0.0 9800.888672 \n", "1 0.0 0.0 0.0 4667.607422 \n", "2 0.0 0.0 1.0 34838.871094 \n", "3 1.0 0.0 0.0 5125.215820 \n", "4 0.0 1.0 0.0 12142.578125 \n", "\n", " Label \n", "0 10621.483595 \n", "1 7290.151941 \n", "2 36012.959871 \n", "3 7553.788882 \n", "4 14904.032497 " ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pred_holdout.head()" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
age sex bmi children smoker region
0 19 female 27.900 0 yes southwest
1 18 male 33.770 1 no southeast
2 28 male 33.000 3 no southeast
3 33 male 22.705 0 no northwest
4 32 male 28.880 0 no northwest
\n", "
" ], "text/plain": [ " age sex bmi children smoker region\n", "0 19 female 27.900 0 yes southwest\n", "1 18 male 33.770 1 no southeast\n", "2 28 male 33.000 3 no southeast\n", "3 33 male 22.705 0 no northwest\n", "4 32 male 28.880 0 no northwest" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# predict on new data\n", "data2 = data.copy()\n", "data2.drop('charges', axis=1, inplace=True)\n", "data2.head()" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "# finalize model\n", "best_final = finalize_model(best)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
age sex bmi children smoker region Label
0 19 female 27.900 0 yes southwest 18894.260073
1 18 male 33.770 1 no southeast 3698.287534
2 28 male 33.000 3 no southeast 6029.271578
3 33 male 22.705 0 no northwest 8958.189116
4 32 male 28.880 0 no northwest 3900.039002
\n", "
" ], "text/plain": [ " age sex bmi children smoker region Label\n", "0 19 female 27.900 0 yes southwest 18894.260073\n", "1 18 male 33.770 1 no southeast 3698.287534\n", "2 28 male 33.000 3 no southeast 6029.271578\n", "3 33 male 22.705 0 no northwest 8958.189116\n", "4 32 male 28.880 0 no northwest 3900.039002" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# predict on data2\n", "predictions = predict_model(best_final, data=data2)\n", "predictions.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 👉 Save / Load / Deploy Model" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Transformation Pipeline and Model Succesfully Saved\n" ] }, { "data": { "text/plain": [ "(Pipeline(memory=None,\n", " steps=[('dtypes',\n", " DataTypes_Auto_infer(categorical_features=[],\n", " display_types=False, features_todrop=[],\n", " id_columns=[], ml_usecase='regression',\n", " numerical_features=[], target='charges',\n", " time_features=[])),\n", " ('imputer',\n", " Simple_Imputer(categorical_strategy='not_available',\n", " fill_value_categorical=None,\n", " fill_value_numerical=None,\n", " numeric_strateg...\n", " learning_rate=0.1, loss='ls',\n", " max_depth=3, max_features=None,\n", " max_leaf_nodes=None,\n", " min_impurity_decrease=0.0,\n", " min_impurity_split=None,\n", " min_samples_leaf=1,\n", " min_samples_split=2,\n", " min_weight_fraction_leaf=0.0,\n", " n_estimators=100,\n", " n_iter_no_change=None,\n", " presort='deprecated',\n", " random_state=123, subsample=1.0,\n", " tol=0.0001, validation_fraction=0.1,\n", " verbose=0, warm_start=False)]],\n", " verbose=False), 'insurance-pipeline.pkl')" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "save_model(best_final, 'insurance-pipeline')" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": 
"stdout", "output_type": "stream", "text": [ "Transformation Pipeline and Model Successfully Loaded\n" ] } ], "source": [ "loaded_pipeline = load_model('insurance-pipeline')" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Pipeline(memory=None,\n", " steps=[('dtypes',\n", " DataTypes_Auto_infer(categorical_features=[],\n", " display_types=False, features_todrop=[],\n", " id_columns=[], ml_usecase='regression',\n", " numerical_features=[], target='charges',\n", " time_features=[])),\n", " ('imputer',\n", " Simple_Imputer(categorical_strategy='not_available',\n", " fill_value_categorical=None,\n", " fill_value_numerical=None,\n", " numeric_strateg...\n", " learning_rate=0.1, loss='ls',\n", " max_depth=3, max_features=None,\n", " max_leaf_nodes=None,\n", " min_impurity_decrease=0.0,\n", " min_impurity_split=None,\n", " min_samples_leaf=1,\n", " min_samples_split=2,\n", " min_weight_fraction_leaf=0.0,\n", " n_estimators=100,\n", " n_iter_no_change=None,\n", " presort='deprecated',\n", " random_state=123, subsample=1.0,\n", " tol=0.0001, validation_fraction=0.1,\n", " verbose=0, warm_start=False)]],\n", " verbose=False)\n" ] } ], "source": [ "print(loaded_pipeline)" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model Succesfully Deployed on AWS S3\n" ] } ], "source": [ "# deploy model on AWS S3\n", "deploy_model(best_final, 'insurance-pipeline-aws', platform = 'aws',\n", " authentication = {'bucket' : 'pycaret-test'})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## THE END" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", 
"mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 2 }