{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Overview\n", "\n", "In the 10x series of notebooks, we look at time series modeling in PyCaret using univariate data and no exogenous variables. We use the well-known airline dataset for illustration. Our plan of action is as follows:\n", "\n", "1. Perform EDA on the dataset to extract valuable insight about the process generating the time series. **(COMPLETED)**\n", "2. Model the dataset based on exploratory analysis (univariate model without exogenous variables). **(COMPLETED)**\n", "3. Use an automated approach (AutoML) to improve the performance. **(COMPLETED)**\n", "4. Explore user customizations, potential pitfalls, and how to overcome them. **(Covered in this notebook)**" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Only enable critical logging (Optional)\n", "import os\n", "os.environ[\"PYCARET_CUSTOM_LOGGING_LEVEL\"] = \"CRITICAL\"" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "System:\n", " python: 3.8.13 (default, Mar 28 2022, 06:59:08) [MSC v.1916 64 bit (AMD64)]\n", "executable: C:\\Users\\Nikhil\\.conda\\envs\\pycaret_dev_sktime_0p11_2\\python.exe\n", " machine: Windows-10-10.0.19044-SP0\n", "\n", "PyCaret required dependencies:\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\Nikhil\\.conda\\envs\\pycaret_dev_sktime_0p11_2\\lib\\site-packages\\_distutils_hack\\__init__.py:30: UserWarning: Setuptools is replacing distutils.\n", " warnings.warn(\"Setuptools is replacing distutils.\")\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ " pip: 21.2.2\n", " setuptools: 61.2.0\n", " pycaret: 3.0.0\n", " ipython: Not installed\n", " ipywidgets: 7.7.0\n", " numpy: 1.21.6\n", " pandas: 1.4.2\n", " jinja2: 3.1.2\n", " scipy: 1.8.0\n", " joblib: 1.1.0\n", " sklearn: 1.0.2\n", " pyod: Installed but version 
unavailable\n", " imblearn: 0.9.0\n", " category_encoders: 2.4.1\n", " lightgbm: 3.3.2\n", " numba: 0.55.1\n", " requests: 2.27.1\n", " matplotlib: 3.5.2\n", " scikitplot: 0.3.7\n", " yellowbrick: 1.4\n", " plotly: 5.8.0\n", " kaleido: 0.2.1\n", " statsmodels: 0.13.2\n", " sktime: 0.11.4\n", " tbats: Installed but version unavailable\n", " pmdarima: 1.8.5\n", "\n", "PyCaret optional dependencies:\n", " shap: Not installed\n", " interpret: Not installed\n", " umap: Not installed\n", " pandas_profiling: Not installed\n", " explainerdashboard: Not installed\n", " autoviz: Not installed\n", " fairlearn: Not installed\n", " xgboost: Not installed\n", " catboost: Not installed\n", " kmodes: Not installed\n", " mlxtend: Not installed\n", " statsforecast: 0.5.5\n", " tune_sklearn: Not installed\n", " ray: Not installed\n", " hyperopt: Not installed\n", " optuna: Not installed\n", " skopt: Not installed\n", " mlflow: 1.25.1\n", " gradio: Not installed\n", " fastapi: Not installed\n", " uvicorn: Not installed\n", " m2cgen: Not installed\n", " evidently: Not installed\n", " nltk: Not installed\n", " pyLDAvis: Not installed\n", " gensim: Not installed\n", " spacy: Not installed\n", " wordcloud: Not installed\n", " textblob: Not installed\n", " psutil: 5.9.0\n", " fugue: Not installed\n", " streamlit: Not installed\n", " prophet: Not installed\n" ] } ], "source": [ "def what_is_installed():\n", " from pycaret import show_versions\n", " show_versions()\n", "\n", "try:\n", " what_is_installed()\n", "except ModuleNotFoundError:\n", " !pip install pycaret\n", " what_is_installed()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import time\n", "import numpy as np\n", "import pandas as pd\n", "\n", "from pycaret.datasets import get_data\n", "from pycaret.time_series import TSForecastingExperiment" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "y = get_data('airline', verbose=False)" ] }, { 
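"cell_type": "markdown", "metadata": {}, "source": [ "The cells that follow set `fh = 12` and `fold = 3`, and the cross-validation tables later in this notebook report cutoffs of 1956-12, 1957-12 and 1958-12. As a rough illustration of where those cutoffs come from, here is a minimal pure-Python sketch of an expanding-window split. It is not PyCaret's implementation: the helper names `expanding_window_cutoffs` and `month_label`, the rule that the last `fh` points are held out as the test set, and the minimum-length check are all assumptions made for this example.

```python
def expanding_window_cutoffs(n_obs, fh, fold):
    '''Return the 0-based position of the last training point of each fold.

    Hypothetical helper: assumes the final fh observations are held out as the
    test set and that consecutive folds are spaced fh observations apart.
    '''
    n_train = n_obs - fh
    # Assumed requirement: every fold needs fh validation points plus at
    # least one training point before its cutoff.
    if n_train < fh * fold + 1:
        raise ValueError('Not enough data points: lower fold or fh')
    return [n_train - fh * (fold - k) - 1 for k in range(fold)]


def month_label(start_year, start_month, offset):
    '''Format the month that lies offset months after the start month.'''
    total = (start_month - 1) + offset
    return f'{start_year + total // 12}-{total % 12 + 1:02d}'


# Airline data: 144 monthly observations starting in January 1949.
cutoffs = [month_label(1949, 1, p) for p in expanding_window_cutoffs(144, fh=12, fold=3)]
print(cutoffs)  # ['1956-12', '1957-12', '1958-12']
```

Under this assumed check, a series of only 30 observations is rejected for `fh=12, fold=3` but accepted for `fh=6, fold=3`, which is in line with the behaviour shown in the error-handling section of this notebook." ] }, {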
"cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# We want to forecast the next 12 months of data and we will use 3 fold cross-validation to test the models.\n", "fh = 12 # or alternately fh = np.arange(1,13)\n", "fold = 3" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Global Plot Settings\n", "fig_kwargs={'renderer': 'notebook'}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# User Customizations\n", "\n", "Let's look at how users can customize various steps in the modeling process" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prediction Customization\n", "\n", "### Forecast Horizon\n", "Sometimes users may wish to customize the forecast horizon after the model has been created. This can be done as follows." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "exp = TSForecastingExperiment()\n", "exp.setup(data=y, fh=fh, fold=fold, fig_kwargs=fig_kwargs, session_id=42, verbose=False)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
       cutoff    MASE   RMSSE      MAE     RMSE    MAPE   SMAPE      R2
0     1956-12  0.4462  0.4933  13.0286  16.1485  0.0327  0.0334  0.9151
1     1957-12  0.5983  0.5993  18.2920  20.3442  0.0506  0.0491  0.8916
2     1958-12  1.0044  0.9280  28.6999  30.1669  0.0671  0.0697  0.7964
Mean      nan  0.6830  0.6735  20.0069  22.2199  0.0501  0.0507  0.8677
SD        nan  0.2356  0.1851   6.5117   5.8746  0.0141  0.0148  0.0513
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "model = exp.create_model(\"arima\")" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   Model    MASE   RMSSE      MAE     RMSE    MAPE   SMAPE      R2
0  ARIMA  0.4955  0.5395  15.0867  18.6380  0.0312  0.0312  0.9373
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
           y_pred
1960-01  420.8767
1960-02  397.1069
1960-03  456.4335
1960-04  442.6482
1960-05  463.5822
1960-06  513.0988
1960-07  587.0872
1960-08  596.4580
1960-09  499.1383
1960-10  442.0694
1960-11  396.2036
1960-12  438.5023
\n", "
" ], "text/plain": [ " y_pred\n", "1960-01 420.8767\n", "1960-02 397.1069\n", "1960-03 456.4335\n", "1960-04 442.6482\n", "1960-05 463.5822\n", "1960-06 513.0988\n", "1960-07 587.0872\n", "1960-08 596.4580\n", "1960-09 499.1383\n", "1960-10 442.0694\n", "1960-11 396.2036\n", "1960-12 438.5023" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Default prediction\n", "exp.predict_model(model)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
           y_pred
1960-01  420.8767
1960-02  397.1069
1960-03  456.4335
1960-04  442.6482
1960-05  463.5822
1960-06  513.0988
1960-07  587.0872
1960-08  596.4580
1960-09  499.1383
1960-10  442.0694
1960-11  396.2036
1960-12  438.5023
1961-01  453.8109
1961-02  429.5811
1961-03  488.5351
1961-04  474.4479
1961-05  495.1374
1961-06  544.4560
1961-07  618.2840
1961-08  627.5248
1961-09  530.0999
1961-10  472.9458
1961-11  427.0110
1961-12  469.2538
\n", "
" ], "text/plain": [ " y_pred\n", "1960-01 420.8767\n", "1960-02 397.1069\n", "1960-03 456.4335\n", "1960-04 442.6482\n", "1960-05 463.5822\n", "1960-06 513.0988\n", "1960-07 587.0872\n", "1960-08 596.4580\n", "1960-09 499.1383\n", "1960-10 442.0694\n", "1960-11 396.2036\n", "1960-12 438.5023\n", "1961-01 453.8109\n", "1961-02 429.5811\n", "1961-03 488.5351\n", "1961-04 474.4479\n", "1961-05 495.1374\n", "1961-06 544.4560\n", "1961-07 618.2840\n", "1961-08 627.5248\n", "1961-09 530.0999\n", "1961-10 472.9458\n", "1961-11 427.0110\n", "1961-12 469.2538" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Increased forecast horizon to 2 years instead of the original 1 year\n", "exp.predict_model(model, fh=24)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Prediction Interval\n", "\n", "#### NOTES: \n", "1. **When prediction intervals are requested, the default coverage = 0.9 corresponding to 90% coverage.** \n", "2. **Coverage is symmetrical around the median (alpha = 0.5). Hence a coverage of 0.9 corresponds to lower interval = 0.05 and an upper interval of 0.95 to give a total coverage between lower and upper interval = 0.9.**" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   Model    MASE   RMSSE      MAE     RMSE    MAPE   SMAPE      R2
0  ARIMA  0.4955  0.5395  15.0867  18.6380  0.0312  0.0312  0.9373
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
           y_pred     lower     upper
1960-01  420.8767  403.9466  437.8067
1960-02  397.1069  375.3199  418.8939
1960-03  456.4335  431.9786  480.8884
1960-04  442.6482  416.5909  468.7055
1960-05  463.5822  436.5252  490.6392
1960-06  513.0988  485.4054  540.7921
1960-07  587.0872  558.9843  615.1902
1960-08  596.4580  568.0895  624.8264
1960-09  499.1383  470.5969  527.6796
1960-10  442.0694  413.4152  470.7236
1960-11  396.2036  367.4756  424.9315
1960-12  438.5023  409.7260  467.2786
\n", "
" ], "text/plain": [ " y_pred lower upper\n", "1960-01 420.8767 403.9466 437.8067\n", "1960-02 397.1069 375.3199 418.8939\n", "1960-03 456.4335 431.9786 480.8884\n", "1960-04 442.6482 416.5909 468.7055\n", "1960-05 463.5822 436.5252 490.6392\n", "1960-06 513.0988 485.4054 540.7921\n", "1960-07 587.0872 558.9843 615.1902\n", "1960-08 596.4580 568.0895 624.8264\n", "1960-09 499.1383 470.5969 527.6796\n", "1960-10 442.0694 413.4152 470.7236\n", "1960-11 396.2036 367.4756 424.9315\n", "1960-12 438.5023 409.7260 467.2786" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# With Prediction Interval (default coverage = 0.9)\n", "exp.predict_model(model, return_pred_int=True)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   Model    MASE   RMSSE      MAE     RMSE    MAPE   SMAPE      R2
0  ARIMA  0.4955  0.5395  15.0867  18.6380  0.0312  0.0312  0.9373
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
           y_pred     lower     upper
1960-01  420.8767  407.6860  434.0673
1960-02  397.1069  380.1320  414.0818
1960-03  456.4335  437.3800  475.4870
1960-04  442.6482  422.3463  462.9502
1960-05  463.5822  442.5013  484.6631
1960-06  513.0988  491.5221  534.6754
1960-07  587.0872  565.1915  608.9830
1960-08  596.4580  574.3553  618.5606
1960-09  499.1383  476.9009  521.3756
1960-10  442.0694  419.7441  464.3946
1960-11  396.2036  373.8208  418.5863
1960-12  438.5023  416.0819  460.9227
\n", "
" ], "text/plain": [ " y_pred lower upper\n", "1960-01 420.8767 407.6860 434.0673\n", "1960-02 397.1069 380.1320 414.0818\n", "1960-03 456.4335 437.3800 475.4870\n", "1960-04 442.6482 422.3463 462.9502\n", "1960-05 463.5822 442.5013 484.6631\n", "1960-06 513.0988 491.5221 534.6754\n", "1960-07 587.0872 565.1915 608.9830\n", "1960-08 596.4580 574.3553 618.5606\n", "1960-09 499.1383 476.9009 521.3756\n", "1960-10 442.0694 419.7441 464.3946\n", "1960-11 396.2036 373.8208 418.5863\n", "1960-12 438.5023 416.0819 460.9227" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# With Prediction Interval (custom coverage = 0.8, corresponding to lower and upper quantiles of 0.1 and 0.9 respectively)\n", "# The point estimate remains the same as before.\n", "# But the lower and upper intervals are now narrower since we are OK with a lower coverage.\n", "exp.predict_model(model, return_pred_int=True, coverage=0.8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Sometimes, users may wish to get the point estimates at values other than the mean/median. In such cases, they can specify the alpha (quantile) value for the point estimate directly.**\n", "\n", "**NOTE: Not all models support this feature. If this is used with models that do not support it, an error is raised. If you want to only use models that support this feature, you must set `point_alpha` to a floating point value in the `setup` stage (see below).**" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   Model    MASE   RMSSE      MAE     RMSE    MAPE   SMAPE      R2
0  ARIMA  0.4335  0.5168  13.2004  17.8549  0.0292  0.0287  0.9425
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
           y_pred
1960-01  426.2742
1960-02  404.0529
1960-03  464.2301
1960-04  450.9556
1960-05  472.2083
1960-06  521.9277
1960-07  596.0468
1960-08  605.5022
1960-09  508.2376
1960-10  451.2047
1960-11  405.3624
1960-12  447.6766
\n", "
" ], "text/plain": [ " y_pred\n", "1960-01 426.2742\n", "1960-02 404.0529\n", "1960-03 464.2301\n", "1960-04 450.9556\n", "1960-05 472.2083\n", "1960-06 521.9277\n", "1960-07 596.0468\n", "1960-08 605.5022\n", "1960-09 508.2376\n", "1960-10 451.2047\n", "1960-11 405.3624\n", "1960-12 447.6766" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# With Custom Point Estimate (alpha = 0.7)\n", "# The point estimate is now higher than before since we are asking for the\n", "# 70% percentile as the point estimate), vs. mean/median before.\n", "exp.predict_model(model, alpha=0.7)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
       cutoff    MASE   RMSSE      MAE     RMSE    MAPE   SMAPE      R2
0     1956-12  0.7137  0.8440  20.8412  27.6262  0.0513  0.0533  0.7516
1     1957-12  0.6678  0.7038  20.4172  23.8918  0.0557  0.0539  0.8505
2     1958-12  0.7198  0.7630  20.5669  24.8024  0.0457  0.0471  0.8624
Mean      nan  0.7004  0.7702  20.6084  25.4401  0.0509  0.0514  0.8215
SD        nan  0.0232  0.0575   0.1756   1.5898  0.0041  0.0031  0.0497
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   Model               MASE   RMSSE      MAE     RMSE    MAPE   SMAPE      R2
0  LinearRegression  0.7993  0.9275  24.3376  32.0418  0.0475  0.0493  0.8147
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
           y_pred  lower  upper
1960-01  399.5740    NaN    NaN
1960-02  384.6911    NaN    NaN
1960-03  420.8922    NaN    NaN
1960-04  412.8696    NaN    NaN
1960-05  438.3520    NaN    NaN
1960-06  494.9357    NaN    NaN
1960-07  556.8907    NaN    NaN
1960-08  558.1492    NaN    NaN
1960-09  503.6881    NaN    NaN
1960-10  449.0433    NaN    NaN
1960-11  405.1229    NaN    NaN
1960-12  431.7701    NaN    NaN
\n", "
" ], "text/plain": [ " y_pred lower upper\n", "1960-01 399.5740 NaN NaN\n", "1960-02 384.6911 NaN NaN\n", "1960-03 420.8922 NaN NaN\n", "1960-04 412.8696 NaN NaN\n", "1960-05 438.3520 NaN NaN\n", "1960-06 494.9357 NaN NaN\n", "1960-07 556.8907 NaN NaN\n", "1960-08 558.1492 NaN NaN\n", "1960-09 503.6881 NaN NaN\n", "1960-10 449.0433 NaN NaN\n", "1960-11 405.1229 NaN NaN\n", "1960-12 431.7701 NaN NaN" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# For models that do not produce a prediction interval --> returns NA values\n", "model_no_pred_int = exp.create_model(\"lr_cds_dt\")\n", "exp.predict_model(model_no_pred_int, return_pred_int=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Forecast Plotting Customization\n", "\n", "Similar to the prediction customization, we can customize the forecast plots as well." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ " \n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Regular Plot\n", "exp.plot_model(model)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ " \n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Modified Plot (zoom into the plot to see differences between the 2 plots)\n", "exp.plot_model(model, data_kwargs={\"alpha\": 0.7, \"coverage\": 0.8})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Enforce Prediction Intervals\n", "\n", "In some use cases, it is important to have prediction intervals. Users may wish to restrict the modeling to only those models that support prediction intervals.\n", "\n", "* Specifying `point_alpha` to any floating point value restricts the models to only those that provide a prediction interval. The value that is specified corresponds to the quantile of the point prediction that is returned.\n", "* This also adds an extra metric called `COVERAGE`.\n", "* `COVERAGE` gives the percentage of actuals that are within the prediction interval." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
    Description                   Value
0   session_id                    3833
1   Target                        Number of airline passengers
2   Approach                      Univariate
3   Exogenous Variables           Not Present
4   Original data shape           (144, 1)
5   Transformed data shape        (144, 1)
6   Transformed train set shape   (132, 1)
7   Transformed test set shape    (12, 1)
8   Rows with missing values      0.0%
9   Fold Generator                ExpandingWindowSplitter
10  Fold Number                   3
11  Enforce Prediction Interval   True
12  Seasonal Period(s) Tested     12
13  Seasonality Present           True
14  Seasonalities Detected        [12]
15  Primary Seasonality           12
16  Target Strictly Positive      True
17  Target White Noise            No
18  Recommended d                 1
19  Recommended Seasonal D        1
20  Preprocess                    False
21  CPU Jobs                      -1
22  Use GPU                       False
23  Log Experiment                False
24  Experiment Name               ts-default-name
25  USI                           17db
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
            Name              Reference                                 Turbo
ID
arima       ARIMA             sktime.forecasting.arima.ARIMA            True
auto_arima  Auto ARIMA        sktime.forecasting.arima.AutoARIMA        True
ets         ETS               sktime.forecasting.ets.AutoETS            True
theta       Theta Forecaster  sktime.forecasting.theta.ThetaForecaster  True
tbats       TBATS             sktime.forecasting.tbats.TBATS            False
bats        BATS              sktime.forecasting.bats.BATS              False
\n", "
" ], "text/plain": [ " Name Reference Turbo\n", "ID \n", "arima ARIMA sktime.forecasting.arima.ARIMA True\n", "auto_arima Auto ARIMA sktime.forecasting.arima.AutoARIMA True\n", "ets ETS sktime.forecasting.ets.AutoETS True\n", "theta Theta Forecaster sktime.forecasting.theta.ThetaForecaster True\n", "tbats TBATS sktime.forecasting.tbats.TBATS False\n", "bats BATS sktime.forecasting.bats.BATS False" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "exp = TSForecastingExperiment()\n", "\n", "# We can see that specifying a value for point_alpha enables `Enforce Prediction Interval` in the grid (and limits the models).\n", "exp.setup(data=y, fh=fh, fold=fold, fig_kwargs=fig_kwargs, point_alpha=0.5)\n", "exp.models()" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
            Model               MASE   RMSSE      MAE     RMSE    MAPE   SMAPE      R2  COVERAGE  TT (Sec)
ets         ETS               0.5932  0.6201  17.4214  20.4742  0.0440  0.0445  0.8884    0.6667    0.1933
arima       ARIMA             0.6830  0.6735  20.0069  22.2199  0.0501  0.0507  0.8677    0.6389    1.5467
auto_arima  Auto ARIMA        0.7181  0.7114  21.0297  23.4661  0.0525  0.0531  0.8509    0.6944    2.7633
theta       Theta Forecaster  0.9729  1.0306  28.3192  33.8639  0.0670  0.0700  0.6710    0.6389    0.0500
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "best_model = exp.compare_models()\n", "\n", "# # To enable slower models such as prophet, BATS and TBATS, add turbo=False\n", "# best_model = exp.compare_models(turbo=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Types of Window Splitters\n", "\n", "Various window splitters are available for performing the cross validation.\n", "\n", "### Sliding Window Splitter" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ " \n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "exp = TSForecastingExperiment()\n", "exp.setup(data=y, fh=fh, fold=fold, fig_kwargs=fig_kwargs, fold_strategy='sliding', verbose=False)\n", "exp.plot_model(plot=\"cv\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Expanding/Rolling Window\n", "\n", "* They are identical" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ " \n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "exp = TSForecastingExperiment()\n", "exp.setup(data=y, fh=fh, fold=fold, fig_kwargs=fig_kwargs, fold_strategy='expanding', verbose=False)\n", "exp.plot_model(plot=\"cv\")" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ " \n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "exp = TSForecastingExperiment()\n", "exp.setup(data=y, fh=fh, fold=fold, fig_kwargs=fig_kwargs, fold_strategy='rolling', verbose=False)\n", "exp.plot_model(plot=\"cv\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Error Handling due to lack of data\n", "\n", "Sometimes, there are not enough data points available to perform the experiment. In such cases, pycaret will warn you accordingly." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Not Enough Data Points, set a lower number of folds or fh\n" ] } ], "source": [ "try:\n", " exp = TSForecastingExperiment()\n", " exp.setup(data=y[:30], fh=12, fold=3, fig_kwargs=fig_kwargs)\n", "except ValueError as error:\n", " print(error)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
    Description                   Value
0   session_id                    5965
1   Target                        Number of airline passengers
2   Approach                      Univariate
3   Exogenous Variables           Not Present
4   Original data shape           (30, 1)
5   Transformed data shape        (30, 1)
6   Transformed train set shape   (24, 1)
7   Transformed test set shape    (6, 1)
8   Rows with missing values      0.0%
9   Fold Generator                ExpandingWindowSplitter
10  Fold Number                   3
11  Enforce Prediction Interval   False
12  Seasonal Period(s) Tested     12
13  Seasonality Present           False
14  Seasonalities Detected        [1]
15  Primary Seasonality           1
16  Target Strictly Positive      True
17  Target White Noise            No
18  Recommended d                 1
19  Recommended Seasonal D        0
20  Preprocess                    False
21  CPU Jobs                      -1
22  Use GPU                       False
23  Log Experiment                False
24  Experiment Name               ts-default-name
25  USI                           2a8f
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "try:\n", " exp = TSForecastingExperiment()\n", " exp.setup(data=y[:30], fh=6, fold=3, fig_kwargs=fig_kwargs)\n", "except ValueError as error:\n", " print(error)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tuning Customization" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
    Description                   Value
0   session_id                    42
1   Target                        Number of airline passengers
2   Approach                      Univariate
3   Exogenous Variables           Not Present
4   Original data shape           (144, 1)
5   Transformed data shape        (144, 1)
6   Transformed train set shape   (132, 1)
7   Transformed test set shape    (12, 1)
8   Rows with missing values      0.0%
9   Fold Generator                ExpandingWindowSplitter
10  Fold Number                   3
11  Enforce Prediction Interval   False
12  Seasonal Period(s) Tested     12
13  Seasonality Present           True
14  Seasonalities Detected        [12]
15  Primary Seasonality           12
16  Target Strictly Positive      True
17  Target White Noise            No
18  Recommended d                 1
19  Recommended Seasonal D        1
20  Preprocess                    False
21  CPU Jobs                      -1
22  Use GPU                       False
23  Log Experiment                False
24  Experiment Name               ts-default-name
25  USI                           f401
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "exp = TSForecastingExperiment()\n", "exp.setup(data=y, fh=fh, fold=fold, fig_kwargs=fig_kwargs, session_id=42)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 cutoffMASERMSSEMAERMSEMAPESMAPER2
01956-120.71370.844020.841227.62620.05130.05330.7516
11957-120.66780.703820.417223.89180.05570.05390.8505
21958-120.71980.763020.566924.80240.04570.04710.8624
Meannan0.70040.770220.608425.44010.05090.05140.8215
SDnan0.02320.05750.17561.58980.00410.00310.0497
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "model = exp.create_model(\"lr_cds_dt\")" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 cutoffMASERMSSEMAERMSEMAPESMAPER2
01956-120.31570.36999.218412.10770.02330.02350.9523
11957-121.00090.983530.601133.38980.08340.07940.7079
21958-120.47870.488213.678615.86820.03200.03250.9437
Meannan0.59840.613917.832720.45520.04620.04520.8680
SDnan0.29230.26589.21049.27410.02650.02450.1132
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "BaseCdsDtForecaster(regressor=LinearRegression(n_jobs=-1), sp=12,\n", " window_length=12)\n", "BaseCdsDtForecaster(degree=2, deseasonal_model='multiplicative',\n", " regressor=LinearRegression(fit_intercept=False, n_jobs=-1,\n", " normalize=True),\n", " sp=12, window_length=23)\n" ] } ], "source": [ "# Random Grid Search (default)\n", "tuned_model = exp.tune_model(model)\n", "print(model)\n", "print(tuned_model)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ " \n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "exp.plot_model([model, tuned_model], data_kwargs={\"labels\": [\"Original\", \"Tuned\"]})" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 cutoffMASERMSSEMAERMSEMAPESMAPER2
01956-120.82521.002024.097532.79980.05870.06160.6498
11957-120.71150.748721.753025.41770.05830.05710.8307
21958-120.72030.829620.581826.96840.04450.04590.8373
Meannan0.75230.860122.144128.39530.05390.05490.7726
SDnan0.05160.10561.46173.17810.00660.00660.0869
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "BaseCdsDtForecaster(regressor=LinearRegression(n_jobs=-1), sp=12,\n", " window_length=12)\n", "BaseCdsDtForecaster(regressor=LinearRegression(n_jobs=-1), sp=12,\n", " window_length=12)\n" ] } ], "source": [ "# Fixed Grid Search\n", "tuned_model = exp.tune_model(model, search_algorithm=\"grid\")\n", "print(model)\n", "print(tuned_model)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Observations:**\n", "* In this case, the tuning resulted in worse metrics than the original model (this is possible).\n", "* Hence, pycaret returned the original model as the best one since `choose_better=True` by default.\n", "* If the user does not want this behavior, they can set `choose_better=False`" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 cutoffMASERMSSEMAERMSEMAPESMAPER2
01956-120.82521.002024.097532.79980.05870.06160.6498
11957-120.71150.748721.753025.41770.05830.05710.8307
21958-120.72030.829620.581826.96840.04450.04590.8373
Meannan0.75230.860122.144128.39530.05390.05490.7726
SDnan0.05160.10561.46173.17810.00660.00660.0869
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "BaseCdsDtForecaster(regressor=LinearRegression(n_jobs=-1), sp=12,\n", " window_length=12)\n", "BaseCdsDtForecaster(regressor=LinearRegression(fit_intercept=False, n_jobs=-1,\n", " normalize=True),\n", " sp=12)\n" ] } ], "source": [ "tuned_model = exp.tune_model(model, search_algorithm=\"grid\", choose_better=False)\n", "print(model)\n", "print(tuned_model)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sometimes, there are time constraints on the tuning so users may wish to adjust the number of hyperparameters that are tried using the `n_iter` argument." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 cutoffMASERMSSEMAERMSEMAPESMAPER2
01956-120.31570.36999.218412.10770.02330.02350.9523
11957-121.00090.983530.601133.38980.08340.07940.7079
21958-120.47870.488213.678615.86820.03200.03250.9437
Meannan0.59840.613917.832720.45520.04620.04520.8680
SDnan0.29230.26589.21049.27410.02650.02450.1132
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "BaseCdsDtForecaster(regressor=LinearRegression(n_jobs=-1), sp=12,\n", " window_length=12)\n", "BaseCdsDtForecaster(degree=2, deseasonal_model='multiplicative',\n", " regressor=LinearRegression(fit_intercept=False, n_jobs=-1,\n", " normalize=True),\n", " sp=12, window_length=23)\n" ] } ], "source": [ "tuned_model = exp.tune_model(model, n_iter=5)\n", "print(model)\n", "print(tuned_model)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "More information about tuning in pycaret time series can be found here:\n", " \n", "1. **[Basic Tuning](https://github.com/pycaret/pycaret/discussions/1791)**\n", "2. **[Advanced Tuning](https://github.com/pycaret/pycaret/discussions/1795)**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setting Renderer\n", "\n", "Sometimes the plotly renderer is not detected correctly for the environment. In such cases, users can manually specify the renderer in pycaret." ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Renderer 'plotly_mimetype+notebook' is not a valid Plotly renderer. 
Valid renderers are:\n", " Renderers configuration\n", "-----------------------\n", " Default renderer: 'plotly_mimetype+notebook'\n", " Available renderers:\n", " ['plotly_mimetype', 'jupyterlab', 'nteract', 'vscode',\n", " 'notebook', 'notebook_connected', 'kaggle', 'azure', 'colab',\n", " 'cocalc', 'databricks', 'json', 'png', 'jpeg', 'jpg', 'svg',\n", " 'pdf', 'browser', 'firefox', 'chrome', 'chromium', 'iframe',\n", " 'iframe_connected', 'sphinx_gallery', 'sphinx_gallery_png']\n", "\n", "When data exceeds a certain threshold (determined by `big_data_threshold`), the renderer is switched to a static one to prevent notebooks from being slowed down.\n", "This renderer may need to be installed manually by users.\n", "Alternately:\n", "Option 1: Users can increase `big_data_threshold` in either `setup` (globally) or `plot_model` (plot specific). Examples.\n", "\t>>> setup(..., fig_kwargs={'big_data_threshold': 1000})\n", "\t>>> plot_model(..., fig_kwargs={'big_data_threshold': 1000})\n", "Option 2: Users can specify any plotly renderer directly in either `setup` (globally) or `plot_model` (plot specific). Examples.\n", "\t>>> setup(..., fig_kwargs={'renderer': 'notebook'})\n", "\t>>> plot_model(..., fig_kwargs={'renderer': 'colab'})\n", "Refer to the docstring in `setup` for more details.\n" ] } ], "source": [ "exp = TSForecastingExperiment()\n", "exp.setup(data=y, fh=fh, fold=fold, verbose=False)\n", "exp.plot_model(plot=\"cv\")" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ " \n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "exp = TSForecastingExperiment()\n", "exp.setup(data=y, fh=fh, fold=fold, fig_kwargs={'renderer': 'notebook'}, verbose=False)\n", "exp.plot_model(plot=\"cv\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Users can also specify the renderer for specific plot types" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "image/png": "" }, "metadata": {}, "output_type": "display_data" } ], "source": [ "exp.plot_model(fig_kwargs={'renderer': 'png'})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Seasonal Period\n", "\n", "* Setting the seasonal period for time series models is one of the most important aspects that can dictate how accurate the model are.\n", "* By default, pycaret will try to derive the seasonal period from the index. \n", "* When this can not be done, seasonal period needs to be provided manually by the user.\n", "* Even when the seasonal period can be derived from the index, users can always override this manually by specyig the seasonal period." 
] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 DescriptionValue
0session_id641
1TargetNumber of airline passengers
2ApproachUnivariate
3Exogenous VariablesNot Present
4Original data shape(144, 1)
5Transformed data shape(144, 1)
6Transformed train set shape(132, 1)
7Transformed test set shape(12, 1)
8Rows with missing values0.0%
9Fold GeneratorExpandingWindowSplitter
10Fold Number3
11Enforce Prediction IntervalFalse
12Seasonal Period(s) Tested12
13Seasonality PresentTrue
14Seasonalities Detected[12]
15Primary Seasonality12
16Target Strictly PositiveTrue
17Target White NoiseNo
18Recommended d1
19Recommended Seasonal D1
20PreprocessFalse
21CPU Jobs-1
22Use GPUFalse
23Log ExperimentFalse
24Experiment Namets-default-name
25USI325d
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "exp = TSForecastingExperiment()\n", "exp.setup(data=y, fh=fh, fold=fold, fig_kwargs=fig_kwargs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Observations:**\n", "\n", "* The default Seasinal Period derived from index = 12\n", "\n", "Users can change this based on EDA. e.g. lets change it to 36" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 DescriptionValue
0session_id955
1TargetNumber of airline passengers
2ApproachUnivariate
3Exogenous VariablesNot Present
4Original data shape(144, 1)
5Transformed data shape(144, 1)
6Transformed train set shape(132, 1)
7Transformed test set shape(12, 1)
8Rows with missing values0.0%
9Fold GeneratorExpandingWindowSplitter
10Fold Number3
11Enforce Prediction IntervalFalse
12Seasonal Period(s) Tested36
13Seasonality PresentFalse
14Seasonalities Detected[1]
15Primary Seasonality1
16Target Strictly PositiveTrue
17Target White NoiseNo
18Recommended d1
19Recommended Seasonal D0
20PreprocessFalse
21CPU Jobs-1
22Use GPUFalse
23Log ExperimentFalse
24Experiment Namets-default-name
25USI546a
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "exp = TSForecastingExperiment()\n", "exp.setup(data=y, fh=fh, fold=fold, seasonal_period=36, fig_kwargs=fig_kwargs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Observations:**\n", " \n", "* In this case, the user specified a seasonal period of 36, but a seasonality test at this period did not detect seasonality.\n", "* Hence a seasonality of 1 will be used for modeling." ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
x
0173.786244
1174.850941
2175.435101
3174.807199
4174.872474
\n", "
" ], "text/plain": [ " x\n", "0 173.786244\n", "1 174.850941\n", "2 175.435101\n", "3 174.807199\n", "4 174.872474" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "y = get_data(\"1\", folder=\"time_series/ar1\")" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The index of your 'data' is of type ''. If the 'data' index is not of one of the following types: , , then 'seasonal_period' must be provided. Refer to docstring for options.\n" ] } ], "source": [ "try:\n", " exp = TSForecastingExperiment()\n", " exp.setup(data=y, fh=fh, fold=fold, fig_kwargs=fig_kwargs)\n", "except ValueError as error:\n", " print(error)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Observations:**\n", "* The frequency/seasonal period could not be derived from the index. Hence the user needs to specify this manually.\n", "* The user can specify an arbitrary seasonal period at first (as below), perform EDA to deterine the appropriate seasonal period for modeling.\n" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 DescriptionValue
0session_id5965
1Targetx
2ApproachUnivariate
3Exogenous VariablesNot Present
4Original data shape(340, 1)
5Transformed data shape(340, 1)
6Transformed train set shape(328, 1)
7Transformed test set shape(12, 1)
8Rows with missing values0.0%
9Fold GeneratorExpandingWindowSplitter
10Fold Number3
11Enforce Prediction IntervalFalse
12Seasonal Period(s) Tested3
13Seasonality PresentTrue
14Seasonalities Detected[3]
15Primary Seasonality3
16Target Strictly PositiveTrue
17Target White NoiseNo
18Recommended d1
19Recommended Seasonal D0
20PreprocessFalse
21CPU Jobs-1
22Use GPUFalse
23Log ExperimentFalse
24Experiment Namets-default-name
25USI41bc
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "eda = TSForecastingExperiment()\n", "eda.setup(data=y, fh=fh, fold=fold, seasonal_period=3, fig_kwargs=fig_kwargs)" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/html": [ " \n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "eda.plot_model(plot=\"diagnostics\", fig_kwargs={\"height\": 600, \"width\": 1000})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Observations:**\n", "* We see wandering behavior in the dataset but no real seasonal pattern.\n", "* We should reser the seasonal period to 1 for the modeling." ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 DescriptionValue
0session_id5108
1Targetx
2ApproachUnivariate
3Exogenous VariablesNot Present
4Original data shape(340, 1)
5Transformed data shape(340, 1)
6Transformed train set shape(328, 1)
7Transformed test set shape(12, 1)
8Rows with missing values0.0%
9Fold GeneratorExpandingWindowSplitter
10Fold Number3
11Enforce Prediction IntervalFalse
12Seasonal Period(s) Tested1
13Seasonality PresentFalse
14Seasonalities Detected[1]
15Primary Seasonality1
16Target Strictly PositiveTrue
17Target White NoiseNo
18Recommended d1
19Recommended Seasonal D0
20PreprocessFalse
21CPU Jobs-1
22Use GPUFalse
23Log ExperimentFalse
24Experiment Namets-default-name
25USIcf16
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "eda = TSForecastingExperiment()\n", "eda.setup(data=y, fh=fh, fold=fold, seasonal_period=1, fig_kwargs=fig_kwargs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**That's it for this notebook. If you would like to see other demonstrations, feel free to open an issue on [GitHub](https://github.com/pycaret/pycaret/issues).** " ] } ], "metadata": { "interpreter": { "hash": "83be8a105015beb0be3130957f981d91e0431cfb610106a7fbaabcd7fd8062ab" }, "kernelspec": { "display_name": "pycaret_sktime_0p11_2", "language": "python", "name": "pycaret_sktime_0p11_2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" } }, "nbformat": 4, "nbformat_minor": 2 }