# 👉 What is PyCaret?

PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that shortens the experiment cycle and makes you more productive.

Compared with other open-source machine learning libraries, PyCaret is a low-code alternative that can replace hundreds of lines of code with only a few. This makes experiments considerably faster and more efficient. PyCaret is essentially a Python wrapper around several machine learning libraries and frameworks such as scikit-learn, XGBoost, LightGBM, CatBoost, spaCy, Optuna, Hyperopt, Ray, and many more.

The design and simplicity of PyCaret are inspired by the emerging role of citizen data scientists, a term first used by Gartner. Citizen data scientists are power users who can perform both simple and moderately sophisticated analytical tasks that would previously have required more expertise. Seasoned data scientists are often difficult to find and expensive to hire, but citizen data scientists can be an effective way to close this gap and address data-related challenges in a business setting.

- Official website: https://www.pycaret.org
- Documentation: https://pycaret.readthedocs.io/en/latest/


# 👉 Install PyCaret
Installing PyCaret is straightforward and takes only a few minutes. We strongly recommend using a virtual environment to avoid potential conflicts with other libraries. The default installation is a slim version that installs only the hard dependencies listed in [requirements.txt](https://github.com/pycaret/pycaret/blob/master/requirements.txt). To install the default version:

- `pip install pycaret`

When you install the full version of pycaret, all the optional dependencies listed [here](https://github.com/pycaret/pycaret/blob/master/requirements-optional.txt) are also installed. To install the full version:

- `pip install pycaret[full]`
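
After installing, it can be handy to confirm which version ended up in the active environment. The following is a minimal sketch that uses only the standard library; `installed_version` is a hypothetical helper name and is not part of PyCaret.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report whether PyCaret is available in the current environment.
found = installed_version("pycaret")
print(f"pycaret version: {found}" if found else "pycaret is not installed")
```

Running this inside the virtual environment you created is a quick way to check that `pip` installed into the environment you expect.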

# 👉 Dataset

In [1]:
import pandas as pd
import numpy as np

data = pd.read_csv('c:/users/moezs/pycaret-demo-mlflow/AirPassengers.csv')
data['Date'] = pd.to_datetime(data['Date'])
data.head()

|  | Date | Passengers |
|---|---|---|
| 0 | 1949-01-01 | 112 |
| 1 | 1949-02-01 | 118 |
| 2 | 1949-03-01 | 132 |
| 3 | 1949-04-01 | 129 |
| 4 | 1949-05-01 | 121 |


In [2]:
import plotly.express as px
data['MA12'] = data['Passengers'].rolling(12).mean()
fig = px.line(data, x="Date", y=["Passengers", "MA12"], template = 'plotly_dark')
fig.show()
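
The `MA12` column above is a trailing 12-month moving average: each value is the mean of the current month and the 11 months before it, so the first 11 rows have no value (pandas reports them as `NaN`). As a rough illustration of that computation, here is a plain-Python sketch. `rolling_mean` is a hypothetical helper, and a window of 3 is used only to keep the example short.

```python
def rolling_mean(values, window):
    """Trailing moving average; positions without a full window yield None."""
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)  # not enough history yet (NaN in pandas)
        else:
            chunk = values[i + 1 - window : i + 1]
            result.append(sum(chunk) / window)
    return result


# First five passenger counts from the table above.
passengers = [112, 118, 132, 129, 121]
print(rolling_mean(passengers, window=3))
```

The smoothed series removes the within-year seasonality, which is why it is plotted next to the raw counts to make the upward trend easier to see.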

# 👉 Data Preparation

In [3]:
# extract features from date
data['Month'] = [i.month for i in data['Date']]
data['Year'] = [i.year for i in data['Date']]
data['Series'] = np.arange(1,len(data)+1).astype('int64')

# drop date and MA12
data.drop(['Date', 'MA12'], axis=1, inplace=True)

# rearrange columns
data = data[['Series', 'Year', 'Month', 'Passengers']] #re-arrange columns

# check head
data.head()

|  | Series | Year | Month | Passengers |
|---|---|---|---|---|
| 0 | 1 | 1949 | 1 | 112 |
| 1 | 2 | 1949 | 2 | 118 |
| 2 | 3 | 1949 | 3 | 132 |
| 3 | 4 | 1949 | 4 | 129 |
| 4 | 5 | 1949 | 5 | 121 |


In [4]:
train = data[data['Year'] < 1960]
test = data[data['Year'] >= 1960]
train.shape, test.shape

((132, 4), (12, 4))

In [7]:
from pycaret.regression import *

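s = setup(data = train, test_data = test,          # explicit chronological train/test split
          target = 'Passengers',
          fold_strategy = 'timeseries',             # cross-validation folds respect time order
          numeric_features = ['Year', 'Series'],    # treat these as numeric, not categorical
          fold = 3,
          transform_target = True,                  # fit on a power-transformed target
          session_id = 123, silent = True,          # fixed seed; skip the data-type confirmation prompt
          log_experiment = True, experiment_name = 'airpassengers')  # log runs to MLflow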

|  | Description | Value |
|---|---|---|
| 0 | session_id | 123 |
| 1 | Target | Passengers |
| 2 | Original Data | (132, 4) |
| 3 | Missing Values | False |
| 4 | Numeric Features | 2 |
| 5 | Categorical Features | 1 |
| 6 | Ordinal Features | False |
| 7 | High Cardinality Features | False |
| 8 | High Cardinality Method |  |
| 9 | Transformed Train Set | (132, 13) |


In [8]:
# check X_train index
get_config('X_train').index

Int64Index([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,
            ...
            122, 123, 124, 125, 126, 127, 128, 129, 130, 131],
           dtype='int64', length=132)

In [9]:
# check X_test index
get_config('X_test').index

Int64Index([132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143], dtype='int64')
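
With `fold_strategy = 'timeseries'` and `fold = 3`, cross-validation keeps the rows in chronological order, so each fold trains on an initial stretch of the series and validates on the block that follows it. The sketch below reproduces the expanding-window indexing of scikit-learn's `TimeSeriesSplit` with default settings, which is what this strategy is understood to map to. `expanding_window_splits` is a hypothetical helper written for illustration, not a PyCaret function.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train_indices, validation_indices) in chronological order.

    Mirrors the default behaviour of scikit-learn's TimeSeriesSplit:
    equal-sized validation blocks, each preceded by all earlier rows.
    """
    if n_splits < 1 or n_samples <= n_splits:
        raise ValueError("need more samples than splits")
    block = n_samples // (n_splits + 1)
    first_start = n_samples - n_splits * block
    for start in range(first_start, n_samples, block):
        yield list(range(start)), list(range(start, start + block))


# 132 training rows and 3 folds, as in the setup call above.
for train_idx, valid_idx in expanding_window_splits(132, 3):
    print(f"train rows 0-{train_idx[-1]}, validate rows {valid_idx[0]}-{valid_idx[-1]}")
```

No validation row ever precedes a training row, which avoids leaking future information into the model during cross-validation.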

# 👉 Model Training & Selection

## Compare Models

In [10]:
# train all models using default hyperparameters
best = compare_models(sort = 'MAE')

|  | Model | MAE | MSE | RMSE | R2 | RMSLE | MAPE | TT (Sec) |
|---|---|---|---|---|---|---|---|---|
| lar | Least Angle Regression | 22.398 | 923.8654 | 28.2855 | 0.5621 | 0.0878 | 0.0746 | 0.0333 |
| lr | Linear Regression | 22.3981 | 923.8749 | 28.2856 | 0.5621 | 0.0878 | 0.0746 | 2.9367 |
| huber | Huber Regressor | 22.4192 | 891.5686 | 27.935 | 0.5988 | 0.0879 | 0.0749 | 0.0433 |
| br | Bayesian Ridge | 22.4783 | 932.2165 | 28.5483 | 0.5611 | 0.0884 | 0.0746 | 0.0267 |
| ridge | Ridge Regression | 23.1976 | 1003.9426 | 30.041 | 0.5258 | 0.0933 | 0.0764 | 1.99 |
| lasso | Lasso Regression | 38.4188 | 2413.5109 | 46.8468 | 0.0882 | 0.1473 | 0.1241 | 2.5667 |
| en | Elastic Net | 40.6486 | 2618.8759 | 49.4048 | -0.0824 | 0.1563 | 0.1349 | 0.03 |
| omp | Orthogonal Matching Pursuit | 44.3054 | 3048.2658 | 53.8613 | -0.4499 | 0.1713 | 0.152 | 0.0267 |
| xgboost | Extreme Gradient Boosting | 46.7192 | 3791.0476 | 59.9683 | -0.5515 | 0.1962 | 0.1432 | 0.26 |
| gbr | Gradient Boosting Regressor | 50.1217 | 4032.0567 | 61.2306 | -0.6189 | 0.2034 | 0.1538 | 0.05 |


In [11]:
print(best)

PowerTransformedTargetRegressor(copy_X=True, eps=2.220446049250313e-16,
                                fit_intercept=True, fit_path=True, jitter=None,
                                n_nonzero_coefs=500, normalize=True,
                                power_transformer_method='box-cox',
                                power_transformer_standardize=True,
                                precompute='auto', random_state=123,
                                regressor=Lars(copy_X=True,
                                               eps=2.220446049250313e-16,
                                               fit_intercept=True,
                                               fit_path=True, jitter=None,
                                               n_nonzero_coefs=500,
                                               normalize=True,
                                               precompute='auto',
                                               random_state=123,
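
The printed estimator shows that `transform_target = True` wrapped the regressor in a Box-Cox power transform of the target: the model is fitted on transformed passenger counts and predictions are mapped back to the original scale. A minimal sketch of the transform and its inverse follows. The `lam` value is an arbitrary illustration (in practice it is estimated from the data), and the additional standardization step indicated by `power_transformer_standardize=True` is omitted.

```python
import math


def boxcox(y, lam):
    """Box-Cox transform of a positive value y for a given lambda."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    return math.log(y) if lam == 0 else (y ** lam - 1) / lam


def inverse_boxcox(z, lam):
    """Map a transformed value back to the original scale."""
    return math.exp(z) if lam == 0 else (lam * z + 1) ** (1 / lam)


lam = 0.5  # illustrative only, not the fitted value
for y in [112, 118, 132]:
    z = boxcox(y, lam)
    print(y, round(z, 4), round(inverse_boxcox(z, lam), 4))
```

Because passenger counts are strictly positive and their seasonal swings grow with the level of the series, a transform of this kind can make the relationship closer to linear before a linear model is fitted.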
                                    

In [12]:
# check on hold-out
pred_holdout = predict_model(best);

|  | Model | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
|---|---|---|---|---|---|---|---|
| 0 | Least Angle Regression | 25.0714 | 972.2733 | 31.1813 | 0.8245 | 0.0692 | 0.0571 |


In [13]:
evaluate_model(best)
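
The MAE, RMSE and MAPE columns reported by `compare_models` and `predict_model` are standard regression error metrics. As a reminder of what they measure, the sketch below computes them by hand. The helper name and the actual/predicted values are made up for illustration and are not taken from the model above.

```python
import math


def regression_errors(actual, predicted):
    """Return MAE, RMSE and MAPE for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE is a fraction here (0.05 means 5%), matching the tables above.
    mape = sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape}


# Hypothetical values, chosen only to make the arithmetic easy to follow.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 380.0]
print(regression_errors(actual, predicted))
```

MAE weights every error equally, while RMSE penalizes large misses more heavily, so sorting by one or the other can change which model ranks first.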


# Plot Actual and Predicted values

In [14]:
predictions = predict_model(best, data=data)
predictions['Date'] = pd.date_range(start='1949-01-01', end = '1960-12-01', freq = 'MS')

import plotly.express as px
fig = px.line(predictions, x='Date', y=["Passengers", "Label"], template = 'plotly_dark')
fig.add_vrect(x0="1960-01-01", x1="1960-12-01", fillcolor="grey", opacity=0.25, line_width=0)

fig.show()

# Predict Future Values

In [15]:
# create future dataset to score
future_dates = pd.date_range(start = '1961-01-01', end = '1965-01-01', freq = 'MS')

future_df = pd.DataFrame()

future_df['Month'] = [i.month for i in future_dates]
future_df['Year'] = [i.year for i in future_dates]    
future_df['Series'] = np.arange(145,(145+len(future_dates)))

future_df.head()

|  | Month | Year | Series |
|---|---|---|---|
| 0 | 1 | 1961 | 145 |
| 1 | 2 | 1961 | 146 |
| 2 | 3 | 1961 | 147 |
| 3 | 4 | 1961 | 148 |
| 4 | 5 | 1961 | 149 |


In [16]:
# finalize model
final_best = finalize_model(best)

In [17]:
# generate predictions on future dataset
predictions_future = predict_model(final_best, data=future_df)
predictions_future.head()

|  | Month | Year | Series | Label |
|---|---|---|---|---|
| 0 | 1 | 1961 | 145 | 486.278268 |
| 1 | 2 | 1961 | 146 | 482.208187 |
| 2 | 3 | 1961 | 147 | 550.485967 |
| 3 | 4 | 1961 | 148 | 535.187177 |
| 4 | 5 | 1961 | 149 | 538.923789 |


In [18]:
# plot historic and predicted values
concat_df = pd.concat([data,predictions_future], axis=0)
concat_df_i = pd.date_range(start='1949-01-01', end = '1965-01-01', freq = 'MS')
concat_df.set_index(concat_df_i, inplace=True)

import plotly.express as px
fig = px.line(concat_df, x=concat_df.index, y=["Passengers", "Label"], template = 'plotly_dark')
fig.show()

## THE END