{ "cells": [ { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "# Overview \n", "\n", "In the 10x series of notebooks, we will look at Time Series modeling in pycaret using univariate data and no exogenous variables. We will use the famous airline dataset for illustration. Our plan of action is as follows:\n", "\n", "1. Perform EDA on the dataset to extract valuable insight about the process generating the time series. **(COMPLETED)**\n", "2. Model the dataset based on exploratory analysis (univariable model without exogenous variables). **(Covered in this notebook)**\n", "3. Use an automated approach (AutoML) to improve the performance.\n", "4. User customizations, potential pitfalls and how to overcome them. " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Only enable critical logging (Optional)\n", "import os\n", "os.environ[\"PYCARET_CUSTOM_LOGGING_LEVEL\"] = \"CRITICAL\"" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "System:\n", " python: 3.8.13 (default, Mar 28 2022, 06:59:08) [MSC v.1916 64 bit (AMD64)]\n", "executable: C:\\Users\\Nikhil\\.conda\\envs\\pycaret_dev_sktime_0p11_2\\python.exe\n", " machine: Windows-10-10.0.19044-SP0\n", "\n", "PyCaret required dependencies:\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\Nikhil\\.conda\\envs\\pycaret_dev_sktime_0p11_2\\lib\\site-packages\\_distutils_hack\\__init__.py:30: UserWarning: Setuptools is replacing distutils.\n", " warnings.warn(\"Setuptools is replacing distutils.\")\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ " pip: 21.2.2\n", " setuptools: 61.2.0\n", " pycaret: 3.0.0\n", " ipython: Not installed\n", " ipywidgets: 7.7.0\n", " numpy: 1.21.6\n", " pandas: 1.4.2\n", " jinja2: 3.1.2\n", " scipy: 1.8.0\n", " joblib: 1.1.0\n", " sklearn: 1.0.2\n", " 
pyod: Installed but version unavailable\n", " imblearn: 0.9.0\n", " category_encoders: 2.4.1\n", " lightgbm: 3.3.2\n", " numba: 0.55.1\n", " requests: 2.27.1\n", " matplotlib: 3.5.2\n", " scikitplot: 0.3.7\n", " yellowbrick: 1.4\n", " plotly: 5.8.0\n", " kaleido: 0.2.1\n", " statsmodels: 0.13.2\n", " sktime: 0.11.4\n", " tbats: Installed but version unavailable\n", " pmdarima: 1.8.5\n", "\n", "PyCaret optional dependencies:\n", " shap: Not installed\n", " interpret: Not installed\n", " umap: Not installed\n", " pandas_profiling: Not installed\n", " explainerdashboard: Not installed\n", " autoviz: Not installed\n", " fairlearn: Not installed\n", " xgboost: Not installed\n", " catboost: Not installed\n", " kmodes: Not installed\n", " mlxtend: Not installed\n", " statsforecast: 0.5.5\n", " tune_sklearn: Not installed\n", " ray: Not installed\n", " hyperopt: Not installed\n", " optuna: Not installed\n", " skopt: Not installed\n", " mlflow: 1.25.1\n", " gradio: Not installed\n", " fastapi: Not installed\n", " uvicorn: Not installed\n", " m2cgen: Not installed\n", " evidently: Not installed\n", " nltk: Not installed\n", " pyLDAvis: Not installed\n", " gensim: Not installed\n", " spacy: Not installed\n", " wordcloud: Not installed\n", " textblob: Not installed\n", " psutil: 5.9.0\n", " fugue: Not installed\n", " streamlit: Not installed\n", " prophet: Not installed\n" ] } ], "source": [ "def what_is_installed():\n", " from pycaret import show_versions\n", " show_versions()\n", "\n", "try:\n", " what_is_installed()\n", "except ModuleNotFoundError:\n", " !pip install pycaret\n", " what_is_installed()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "import time\n", "import numpy as np\n", "import pandas as pd\n", "\n", "from pycaret.datasets import get_data\n", "from pycaret.time_series import TSForecastingExperiment" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "pycharm": { 
"name": "#%%\n" } }, "outputs": [], "source": [ "y = get_data('airline', verbose=False)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "# We want to forecast the next 12 months of data and we will use 3 fold cross-validation to test the models.\n", "fh = 12 # or alternately fh = np.arange(1,13)\n", "fold = 3" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "# Global Plot Settings\n", "fig_kwargs={'renderer': 'notebook'}" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 DescriptionValue
0session_id8316
1TargetNumber of airline passengers
2ApproachUnivariate
3Exogenous VariablesNot Present
4Original data shape(144, 1)
5Transformed data shape(144, 1)
6Transformed train set shape(132, 1)
7Transformed test set shape(12, 1)
8Rows with missing values0.0%
9Fold GeneratorExpandingWindowSplitter
10Fold Number3
11Enforce Prediction IntervalFalse
12Seasonal Period(s) Tested12
13Seasonality PresentTrue
14Seasonalities Detected[12]
15Primary Seasonality12
16Target Strictly PositiveTrue
17Target White NoiseNo
18Recommended d1
19Recommended Seasonal D1
20PreprocessFalse
21CPU Jobs-1
22Use GPUFalse
23Log ExperimentFalse
24Experiment Namets-default-name
25USI8630
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "exp = TSForecastingExperiment()\n", "exp.setup(data=y, fh=fh, fig_kwargs=fig_kwargs)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "# Available Models\n", "\n", "The `pycaret` Time Series Forecasting module has a rich set of models, ranging from traditional statistical models such as ARIMA, Exponential Smoothing, and ETS to [Reduced Regression Models](https://github.com/pycaret/pycaret/discussions/1760)." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameReferenceTurbo
ID
naiveNaive Forecastersktime.forecasting.naive.NaiveForecasterTrue
grand_meansGrand Means Forecastersktime.forecasting.naive.NaiveForecasterTrue
snaiveSeasonal Naive Forecastersktime.forecasting.naive.NaiveForecasterTrue
polytrendPolynomial Trend Forecastersktime.forecasting.trend.PolynomialTrendForeca...True
arimaARIMAsktime.forecasting.arima.ARIMATrue
auto_arimaAuto ARIMAsktime.forecasting.arima.AutoARIMATrue
exp_smoothExponential Smoothingsktime.forecasting.exp_smoothing.ExponentialSm...True
crostonCrostonsktime.forecasting.croston.CrostonTrue
etsETSsktime.forecasting.ets.AutoETSTrue
thetaTheta Forecastersktime.forecasting.theta.ThetaForecasterTrue
tbatsTBATSsktime.forecasting.tbats.TBATSFalse
batsBATSsktime.forecasting.bats.BATSFalse
lr_cds_dtLinear w/ Cond. Deseasonalize & Detrendingpycaret.containers.models.time_series.BaseCdsD...True
en_cds_dtElastic Net w/ Cond. Deseasonalize & Detrendingpycaret.containers.models.time_series.BaseCdsD...True
ridge_cds_dtRidge w/ Cond. Deseasonalize & Detrendingpycaret.containers.models.time_series.BaseCdsD...True
lasso_cds_dtLasso w/ Cond. Deseasonalize & Detrendingpycaret.containers.models.time_series.BaseCdsD...True
lar_cds_dtLeast Angular Regressor w/ Cond. Deseasonalize...pycaret.containers.models.time_series.BaseCdsD...True
llar_cds_dtLasso Least Angular Regressor w/ Cond. Deseaso...pycaret.containers.models.time_series.BaseCdsD...True
br_cds_dtBayesian Ridge w/ Cond. Deseasonalize & Detren...pycaret.containers.models.time_series.BaseCdsD...True
huber_cds_dtHuber w/ Cond. Deseasonalize & Detrendingpycaret.containers.models.time_series.BaseCdsD...True
par_cds_dtPassive Aggressive w/ Cond. Deseasonalize & De...pycaret.containers.models.time_series.BaseCdsD...True
omp_cds_dtOrthogonal Matching Pursuit w/ Cond. Deseasona...pycaret.containers.models.time_series.BaseCdsD...True
knn_cds_dtK Neighbors w/ Cond. Deseasonalize & Detrendingpycaret.containers.models.time_series.BaseCdsD...True
dt_cds_dtDecision Tree w/ Cond. Deseasonalize & Detrendingpycaret.containers.models.time_series.BaseCdsD...True
rf_cds_dtRandom Forest w/ Cond. Deseasonalize & Detrendingpycaret.containers.models.time_series.BaseCdsD...True
et_cds_dtExtra Trees w/ Cond. Deseasonalize & Detrendingpycaret.containers.models.time_series.BaseCdsD...True
gbr_cds_dtGradient Boosting w/ Cond. Deseasonalize & Det...pycaret.containers.models.time_series.BaseCdsD...True
ada_cds_dtAdaBoost w/ Cond. Deseasonalize & Detrendingpycaret.containers.models.time_series.BaseCdsD...True
lightgbm_cds_dtLight Gradient Boosting w/ Cond. Deseasonalize...pycaret.containers.models.time_series.BaseCdsD...True
\n", "
" ], "text/plain": [ " Name \\\n", "ID \n", "naive Naive Forecaster \n", "grand_means Grand Means Forecaster \n", "snaive Seasonal Naive Forecaster \n", "polytrend Polynomial Trend Forecaster \n", "arima ARIMA \n", "auto_arima Auto ARIMA \n", "exp_smooth Exponential Smoothing \n", "croston Croston \n", "ets ETS \n", "theta Theta Forecaster \n", "tbats TBATS \n", "bats BATS \n", "lr_cds_dt Linear w/ Cond. Deseasonalize & Detrending \n", "en_cds_dt Elastic Net w/ Cond. Deseasonalize & Detrending \n", "ridge_cds_dt Ridge w/ Cond. Deseasonalize & Detrending \n", "lasso_cds_dt Lasso w/ Cond. Deseasonalize & Detrending \n", "lar_cds_dt Least Angular Regressor w/ Cond. Deseasonalize... \n", "llar_cds_dt Lasso Least Angular Regressor w/ Cond. Deseaso... \n", "br_cds_dt Bayesian Ridge w/ Cond. Deseasonalize & Detren... \n", "huber_cds_dt Huber w/ Cond. Deseasonalize & Detrending \n", "par_cds_dt Passive Aggressive w/ Cond. Deseasonalize & De... \n", "omp_cds_dt Orthogonal Matching Pursuit w/ Cond. Deseasona... \n", "knn_cds_dt K Neighbors w/ Cond. Deseasonalize & Detrending \n", "dt_cds_dt Decision Tree w/ Cond. Deseasonalize & Detrending \n", "rf_cds_dt Random Forest w/ Cond. Deseasonalize & Detrending \n", "et_cds_dt Extra Trees w/ Cond. Deseasonalize & Detrending \n", "gbr_cds_dt Gradient Boosting w/ Cond. Deseasonalize & Det... \n", "ada_cds_dt AdaBoost w/ Cond. Deseasonalize & Detrending \n", "lightgbm_cds_dt Light Gradient Boosting w/ Cond. Deseasonalize... \n", "\n", " Reference Turbo \n", "ID \n", "naive sktime.forecasting.naive.NaiveForecaster True \n", "grand_means sktime.forecasting.naive.NaiveForecaster True \n", "snaive sktime.forecasting.naive.NaiveForecaster True \n", "polytrend sktime.forecasting.trend.PolynomialTrendForeca... True \n", "arima sktime.forecasting.arima.ARIMA True \n", "auto_arima sktime.forecasting.arima.AutoARIMA True \n", "exp_smooth sktime.forecasting.exp_smoothing.ExponentialSm... 
True \n", "croston sktime.forecasting.croston.Croston True \n", "ets sktime.forecasting.ets.AutoETS True \n", "theta sktime.forecasting.theta.ThetaForecaster True \n", "tbats sktime.forecasting.tbats.TBATS False \n", "bats sktime.forecasting.bats.BATS False \n", "lr_cds_dt pycaret.containers.models.time_series.BaseCdsD... True \n", "en_cds_dt pycaret.containers.models.time_series.BaseCdsD... True \n", "ridge_cds_dt pycaret.containers.models.time_series.BaseCdsD... True \n", "lasso_cds_dt pycaret.containers.models.time_series.BaseCdsD... True \n", "lar_cds_dt pycaret.containers.models.time_series.BaseCdsD... True \n", "llar_cds_dt pycaret.containers.models.time_series.BaseCdsD... True \n", "br_cds_dt pycaret.containers.models.time_series.BaseCdsD... True \n", "huber_cds_dt pycaret.containers.models.time_series.BaseCdsD... True \n", "par_cds_dt pycaret.containers.models.time_series.BaseCdsD... True \n", "omp_cds_dt pycaret.containers.models.time_series.BaseCdsD... True \n", "knn_cds_dt pycaret.containers.models.time_series.BaseCdsD... True \n", "dt_cds_dt pycaret.containers.models.time_series.BaseCdsD... True \n", "rf_cds_dt pycaret.containers.models.time_series.BaseCdsD... True \n", "et_cds_dt pycaret.containers.models.time_series.BaseCdsD... True \n", "gbr_cds_dt pycaret.containers.models.time_series.BaseCdsD... True \n", "ada_cds_dt pycaret.containers.models.time_series.BaseCdsD... True \n", "lightgbm_cds_dt pycaret.containers.models.time_series.BaseCdsD... True " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "exp.models()" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "# Modeling (Manual)\n", "\n", "In our exploratory analysis, we found that the characteristics of the data meant that we need to difference the data once and take a seasonal difference with period = 12. We also concluded that some autoregressive properties still need to be taken care of after doing this. 
\n", "\n", "More specifically, if we were building an ARIMA model, we would start with **ARIMA(1,1,0)x(0,1,0,12)**. Let's build this next." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 DescriptionValue
0session_id42
1TargetNumber of airline passengers
2ApproachUnivariate
3Exogenous VariablesNot Present
4Original data shape(144, 1)
5Transformed data shape(144, 1)
6Transformed train set shape(132, 1)
7Transformed test set shape(12, 1)
8Rows with missing values0.0%
9Fold GeneratorExpandingWindowSplitter
10Fold Number3
11Enforce Prediction IntervalFalse
12Seasonal Period(s) Tested12
13Seasonality PresentTrue
14Seasonalities Detected[12]
15Primary Seasonality12
16Target Strictly PositiveTrue
17Target White NoiseNo
18Recommended d1
19Recommended Seasonal D1
20PreprocessFalse
21CPU Jobs-1
22Use GPUFalse
23Log ExperimentFalse
24Experiment Namets-default-name
25USI56a3
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "exp = TSForecastingExperiment()\n", "exp.setup(data=y, fh=fh, fold=fold, fig_kwargs=fig_kwargs, session_id=42)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "**Note:**\n", "\n", "The `setup` provides some useful information out of the box.\n", "\n", "1. Seasonal period of 12 was tested and seasonality was detected at this period. This is what will be used in subsequent modeling automatically.\n", "2. The data splits are also shown - 132 data points for the training dataset and 12 data points for the test dataset.\n", "3. The training dataset will be cross validated using 3 folds. \n" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "## ARIMA Model\n", "\n", "**NOTE**:\n", "\n", "1. Specific model hyperparameters can be passed as kwargs to the model. \n", "2. All models in the time series module are are based on the `sktime` package.\n", "2. More details about creating and customizing time series models in pycaret can be found here: https://github.com/pycaret/pycaret/discussions/1757\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 cutoffMASERMSSEMAERMSEMAPESMAPER2
01956-120.35350.410310.321613.43150.02550.02600.9413
11957-120.68440.685320.923523.26530.05810.05600.8582
21958-121.59881.467345.685047.69550.10660.11320.4911
Meannan0.87890.854325.643428.13080.06340.06510.7635
SDnan0.52670.447714.817814.40510.03330.03620.1956
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "model = exp.create_model(\"arima\", order=(1,1,0), seasonal_order=(0,1,0,12))" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "**NOTE:**\n", "* `create_model` displays the cross-validation scores for each fold. The time cutoff for each fold is also displayed for convenience. Users may wish to correlate this cutoff with what they get from `plot_model(plot=\"cv\")`.\n", "\n", "* `create_model` retrains the model on the entire training dataset after performing cross-validation. This allows us to check the performance of the model against the test set simply by using `predict_model`." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 ModelMASERMSSEMAERMSEMAPESMAPER2
0ARIMA0.69990.775721.312126.79980.04800.04620.8703
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
y_pred
1960-01424.7154
1960-02408.1599
1960-03472.4447
1960-04463.0139
1960-05487.5134
1960-06540.0299
1960-07616.5423
1960-08628.0557
1960-09532.5688
1960-10477.0820
1960-11432.5952
1960-12476.1084
\n", "
" ], "text/plain": [ " y_pred\n", "1960-01 424.7154\n", "1960-02 408.1599\n", "1960-03 472.4447\n", "1960-04 463.0139\n", "1960-05 487.5134\n", "1960-06 540.0299\n", "1960-07 616.5423\n", "1960-08 628.0557\n", "1960-09 532.5688\n", "1960-10 477.0820\n", "1960-11 432.5952\n", "1960-12 476.1084" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Out-of-sample Forecasts\n", "y_predict = exp.predict_model(model)\n", "y_predict" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "The scores listed above are for the test set. We can see that the metrics are actually slightly better than the mean Cross validation score which implies that we have not overfit the model. More details about this can be found in **[this article](https://towardsdatascience.com/bias-variance-tradeoff-in-time-series-8434f536387a)**." ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "In a previous notebook, we saw that `plot_model` without an estimator argument works on the original dataset. In addition, by passing the model (`estimator`) to the `plot_model` call, we can plot model diagnostics as well." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ " \n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Plot the out-of-sample forecasts\n", "exp.plot_model(estimator=model)\n", "\n", "# # Alternately the following will plot the same thing.\n", "# exp.plot_model(estimator=model, plot=\"forecast\")" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "**NOTE:** \n", "* `predict_model` is intelligent enough to understand the current state of the model (i.e. it is only trained using the _train_ dataset).\n", "* Since the model has only been trained on the _train_ set so far, the predictons are made for the _test_ set.\n", "* Later, we will see that once the model is finalized (trained on the complete _train + test_ set), `predict_model` automatically makes the true future predictons automatically.\n", "* Also note that if the model supports prediction intervals, they are plotted by default for convenience." ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "Next, let's check the goodness of fit using both diagnostic plots as well as statistical tests. Similar to plot_model, passing an estimator to the `check_stats` call will perform the tests on the model residuals." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TestTest NameDataPropertySettingValue
0SummaryStatisticsResidualLength131.0
1SummaryStatisticsResidual# Missing Values0.0
2SummaryStatisticsResidualMean-0.445207
3SummaryStatisticsResidualMedian-0.9606
4SummaryStatisticsResidualStandard Deviation11.759243
5SummaryStatisticsResidualVariance138.27979
6SummaryStatisticsResidualKurtosis4.244741
7SummaryStatisticsResidualSkewness-0.938657
8SummaryStatisticsResidual# Distinct Values127.0
9White NoiseLjung-BoxResidualTest Statictic{'alpha': 0.05, 'K': 24}21.29991
10White NoiseLjung-BoxResidualTest Statictic{'alpha': 0.05, 'K': 48}43.239443
11White NoiseLjung-BoxResidualp-value{'alpha': 0.05, 'K': 24}0.620976
12White NoiseLjung-BoxResidualp-value{'alpha': 0.05, 'K': 48}0.667946
13White NoiseLjung-BoxResidualWhite Noise{'alpha': 0.05, 'K': 24}True
14White NoiseLjung-BoxResidualWhite Noise{'alpha': 0.05, 'K': 48}True
15StationarityADFResidualStationarity{'alpha': 0.05}True
16StationarityADFResidualp-value{'alpha': 0.05}0.0
17StationarityADFResidualTest Statistic{'alpha': 0.05}-11.577344
18StationarityADFResidualCritical Value 1%{'alpha': 0.05}-3.481682
19StationarityADFResidualCritical Value 5%{'alpha': 0.05}-2.884042
20StationarityADFResidualCritical Value 10%{'alpha': 0.05}-2.57877
21StationarityKPSSResidualTrend Stationarity{'alpha': 0.05}True
22StationarityKPSSResidualp-value{'alpha': 0.05}0.1
23StationarityKPSSResidualTest Statistic{'alpha': 0.05}0.031457
24StationarityKPSSResidualCritical Value 10%{'alpha': 0.05}0.119
25StationarityKPSSResidualCritical Value 5%{'alpha': 0.05}0.146
26StationarityKPSSResidualCritical Value 2.5%{'alpha': 0.05}0.176
27StationarityKPSSResidualCritical Value 1%{'alpha': 0.05}0.216
28NormalityShapiroResidualNormality{'alpha': 0.05}False
29NormalityShapiroResidualp-value{'alpha': 0.05}0.00003
\n", "
" ], "text/plain": [ " Test Test Name Data Property \\\n", "0 Summary Statistics Residual Length \n", "1 Summary Statistics Residual # Missing Values \n", "2 Summary Statistics Residual Mean \n", "3 Summary Statistics Residual Median \n", "4 Summary Statistics Residual Standard Deviation \n", "5 Summary Statistics Residual Variance \n", "6 Summary Statistics Residual Kurtosis \n", "7 Summary Statistics Residual Skewness \n", "8 Summary Statistics Residual # Distinct Values \n", "9 White Noise Ljung-Box Residual Test Statictic \n", "10 White Noise Ljung-Box Residual Test Statictic \n", "11 White Noise Ljung-Box Residual p-value \n", "12 White Noise Ljung-Box Residual p-value \n", "13 White Noise Ljung-Box Residual White Noise \n", "14 White Noise Ljung-Box Residual White Noise \n", "15 Stationarity ADF Residual Stationarity \n", "16 Stationarity ADF Residual p-value \n", "17 Stationarity ADF Residual Test Statistic \n", "18 Stationarity ADF Residual Critical Value 1% \n", "19 Stationarity ADF Residual Critical Value 5% \n", "20 Stationarity ADF Residual Critical Value 10% \n", "21 Stationarity KPSS Residual Trend Stationarity \n", "22 Stationarity KPSS Residual p-value \n", "23 Stationarity KPSS Residual Test Statistic \n", "24 Stationarity KPSS Residual Critical Value 10% \n", "25 Stationarity KPSS Residual Critical Value 5% \n", "26 Stationarity KPSS Residual Critical Value 2.5% \n", "27 Stationarity KPSS Residual Critical Value 1% \n", "28 Normality Shapiro Residual Normality \n", "29 Normality Shapiro Residual p-value \n", "\n", " Setting Value \n", "0 131.0 \n", "1 0.0 \n", "2 -0.445207 \n", "3 -0.9606 \n", "4 11.759243 \n", "5 138.27979 \n", "6 4.244741 \n", "7 -0.938657 \n", "8 127.0 \n", "9 {'alpha': 0.05, 'K': 24} 21.29991 \n", "10 {'alpha': 0.05, 'K': 48} 43.239443 \n", "11 {'alpha': 0.05, 'K': 24} 0.620976 \n", "12 {'alpha': 0.05, 'K': 48} 0.667946 \n", "13 {'alpha': 0.05, 'K': 24} True \n", "14 {'alpha': 0.05, 'K': 48} True \n", "15 {'alpha': 0.05} True 
\n", "16 {'alpha': 0.05} 0.0 \n", "17 {'alpha': 0.05} -11.577344 \n", "18 {'alpha': 0.05} -3.481682 \n", "19 {'alpha': 0.05} -2.884042 \n", "20 {'alpha': 0.05} -2.57877 \n", "21 {'alpha': 0.05} True \n", "22 {'alpha': 0.05} 0.1 \n", "23 {'alpha': 0.05} 0.031457 \n", "24 {'alpha': 0.05} 0.119 \n", "25 {'alpha': 0.05} 0.146 \n", "26 {'alpha': 0.05} 0.176 \n", "27 {'alpha': 0.05} 0.216 \n", "28 {'alpha': 0.05} False \n", "29 {'alpha': 0.05} 0.00003 " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Check Goodness of Fit\n", "exp.check_stats(model)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "**Observations**\n", "\n", "1. Stationarity tests indicate that the residuals are stationary. \n", "2. The white noise test indicates that the residuals are consistent with white noise. \n", "\n", "This indicates that we have done a good job of extracting most of the signal from the time series data.\n", "\n", "Next, we can plot the diagnostics on the residuals just like we did it on the original dataset." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ " \n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "exp.plot_model(model, plot='diagnostics', fig_kwargs={\"height\": 800, \"width\": 1000})" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "**Observations**\n", "\n", "1. The ACF and PACF indicate that we have captured most of the autocorelation in the data. There is no serial autocorrelation left in the data to capture.\n", "2. The histogram and QQ plot do indicate some left skewness, but overall the results are satisfactory.\n", "\n", "**NOTE:**\n", "\n", "These plots can be obtained individually as well if needed using the following calls\n", "\n", "* `exp.plot_model(model, plot='residuals')`\n", "* `exp.plot_model(model, plot='acf')`\n", "* `exp.plot_model(model, plot='pacf')`\n", "* `exp.plot_model(model, plot='periodogram')`\n", "\n", "**Another useful plot is the `insample` plot** which shows the model fit to the actual data. " ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ " \n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "exp.plot_model(model, plot='insample')" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "We could also check the decomposition of the residuals to see if \n", "\n", "1. The residual in the decomposition the largest component?\n", "2. there is any any visible trend or seasonality component that has not been captured in the model?" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ " \n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " \n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "exp.plot_model(model, plot=\"decomp\")\n", "exp.plot_model(model, plot=\"decomp_stl\")" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### Reduced Regressors: LightGBM (with internal conditional deseasonalize and detrending)\n", "\n", "We noted above that we could use regression models for time series data as well by converting them into an appropriate format (reduced regression models). Let's see one of these in action. We will use the LightGBM regressor for this. Reduced regression models in pycaret will also detrend and conditionally deseasonalize the data to make it easier for the regression model to capture the autoregressive properties of the data." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 cutoffMASERMSSEMAERMSEMAPESMAPER2
01956-120.75450.917122.033930.01920.05380.05620.7067
11957-120.80440.810524.593827.51550.06460.06340.8017
21958-120.88801.007625.373132.75210.05430.05650.7600
Meannan0.81560.911724.000230.09560.05750.05870.7561
SDnan0.05510.08051.42642.13850.00500.00340.0389
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 ModelMASERMSSEMAERMSEMAPESMAPER2
0LGBMRegressor0.88370.974926.909033.68270.05310.05490.7952
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " \n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "model = exp.create_model(\"lightgbm_cds_dt\")\n", "y_predict = exp.predict_model(model)\n", "exp.plot_model(estimator=model)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "**Observations:**\n", " \n", "1. The overall cross validation metrics are comparable to the ARIMA model, but the forecasts on test data are not good. \n", "\n", "We may wish to tune the hyperparameters of the model to see if we can improve the performance." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 cutoffMASERMSSEMAERMSEMAPESMAPER2
01956-120.35770.426910.445013.97480.02630.02650.9364
11957-121.05181.051332.155535.69150.09030.08540.6663
21958-120.48080.540913.738617.58190.03220.03190.9308
Meannan0.63010.673018.779722.41610.04960.04790.8445
SDnan0.30240.27159.55329.50200.02890.02660.1261
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " \n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Random Grid Search\n", "tuned_model = exp.tune_model(model)\n", "exp.plot_model(estimator=tuned_model)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "BaseCdsDtForecaster(regressor=LGBMRegressor(random_state=42), sp=12,\n", " window_length=12)\n", "BaseCdsDtForecaster(degree=2, deseasonal_model='multiplicative',\n", " regressor=LGBMRegressor(bagging_freq=5, colsample_bytree=1,\n", " learning_rate=0.0025551361408324464,\n", " max_depth=8, min_child_samples=78,\n", " n_estimators=224, num_leaves=253,\n", " random_state=42,\n", " reg_alpha=3.144127709026421e-10,\n", " reg_lambda=3.7899513783298346e-07,\n", " subsample=1),\n", " sp=12, window_length=13)\n" ] } ], "source": [ "print(model)\n", "print(tuned_model)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "This is much better than before in terms of metrics as well as the comparison to the test data. We can even compare the model performance visually." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ " \n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "exp.plot_model([model, tuned_model], data_kwargs={\"labels\": [\"Baseline\", \"Tuned\"]})" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "## Getting Ready for Production\n", "\n", "So now we have built 2 models manually. We can not put one of them into production. Let's pick the Reduced regression model for this. \n", "\n", "Before, we can use this model for making future predictions, we need to finalize this. This step will take the model from the previous stage and without changing the model hyperparameters, train the model on the entire _train + test_ dataset so that we can make true future forecasts.\n", "\n", "### Finalizing Models" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ " \n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
y_pred
1961-01452.2415
1961-02442.2790
1961-03507.9412
1961-04495.7020
1961-05502.1397
1961-06573.5355
1961-07636.7857
1961-08637.9355
1961-09558.5830
1961-10489.0101
1961-11428.0954
1961-12483.7110
\n", "
" ], "text/plain": [ " y_pred\n", "1961-01 452.2415\n", "1961-02 442.2790\n", "1961-03 507.9412\n", "1961-04 495.7020\n", "1961-05 502.1397\n", "1961-06 573.5355\n", "1961-07 636.7857\n", "1961-08 637.9355\n", "1961-09 558.5830\n", "1961-10 489.0101\n", "1961-11 428.0954\n", "1961-12 483.7110" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Trains the model with the best hyperparameters on the entire dataset now\n", "final_model = exp.finalize_model(tuned_model)\n", "exp.plot_model(final_model)\n", "exp.predict_model(final_model)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "BaseCdsDtForecaster(degree=2, deseasonal_model='multiplicative',\n", " regressor=LGBMRegressor(bagging_freq=5, colsample_bytree=1,\n", " learning_rate=0.0025551361408324464,\n", " max_depth=8, min_child_samples=78,\n", " n_estimators=224, num_leaves=253,\n", " random_state=42,\n", " reg_alpha=3.144127709026421e-10,\n", " reg_lambda=3.7899513783298346e-07,\n", " subsample=1),\n", " sp=12, window_length=13)\n", "BaseCdsDtForecaster(degree=2, deseasonal_model='multiplicative',\n", " regressor=LGBMRegressor(bagging_freq=5, colsample_bytree=1,\n", " learning_rate=0.0025551361408324464,\n", " max_depth=8, min_child_samples=78,\n", " n_estimators=224, num_leaves=253,\n", " random_state=42,\n", " reg_alpha=3.144127709026421e-10,\n", " reg_lambda=3.7899513783298346e-07,\n", " subsample=1),\n", " sp=12, window_length=13)\n" ] } ], "source": [ "print(tuned_model)\n", "print(final_model)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "**Observations:**\n", "As we can see, the model hyperparameters are exactly the same. 
The only difference is that the `tuned_model` has been trained using only the training dataset, while the `final_model` has been trained using the full dataset.\n", "\n", "We can also plot the two models simultaneously to check the forecasts. Since the `tuned_model` has been trained on the train dataset only, it makes forecasts for the test dataset. Since the `final_model` has been trained on the entire dataset, it makes true future predictions. " ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### Save model pickle file\n", "\n", "Now, we can save this model as a pickle file. This model can then be loaded later for making predictions again." ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Transformation Pipeline and Model Successfully Saved\n" ] } ], "source": [ "_ = exp.save_model(final_model, \"my_final_model\")" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "## Load Model \n", "\n", "Now, let's say you closed your training session but want to make predictions with the saved model. This can be done easily by loading the model again (usually in another session)." ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Transformation Pipeline and Model Successfully Loaded\n" ] } ], "source": [ "exp_load = TSForecastingExperiment()\n", "loaded_model = exp_load.load_model(\"my_final_model\")" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
y_pred
1961-01452.2415
1961-02442.2790
1961-03507.9412
1961-04495.7020
1961-05502.1397
1961-06573.5355
1961-07636.7857
1961-08637.9355
1961-09558.5830
1961-10489.0101
1961-11428.0954
1961-12483.7110
\n", "
" ], "text/plain": [ " y_pred\n", "1961-01 452.2415\n", "1961-02 442.2790\n", "1961-03 507.9412\n", "1961-04 495.7020\n", "1961-05 502.1397\n", "1961-06 573.5355\n", "1961-07 636.7857\n", "1961-08 637.9355\n", "1961-09 558.5830\n", "1961-10 489.0101\n", "1961-11 428.0954\n", "1961-12 483.7110" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Should match predictions from before the save and load\n", "exp_load.predict_model(loaded_model)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "These predictions match with the ones we got before we saved the model.\n", "\n", "Another use case is that the user may want to forecast for a longer horizon than the one used for the original training. This can be achieved as follows" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ " \n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Example here shows forecasting out 36 months instead of the default of 12\n", "exp.plot_model(estimator=final_model, data_kwargs={'fh': 36}) " ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "Users may also be interested in learning about Multi-step forecasts. More details about this can be **[found here](https://github.com/pycaret/pycaret/discussions/1942)**." ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "**That's it for this notebook. In the next notebook, we will see a more automate way to model this same data.**" ] } ], "metadata": { "interpreter": { "hash": "c161a91f6f4623a54f30c5492a42e7cf0592610fb90c8abd312086f09f8fbe0f" }, "kernelspec": { "display_name": "pycaret_sktime_0p11_2", "language": "python", "name": "pycaret_sktime_0p11_2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" } }, "nbformat": 4, "nbformat_minor": 2 }