{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Cross Validation\n", "As time-series have the inherent structure we could run into problems with traditional shuffled Kfolds cross-validation. hcrystalball implements forward rolling cross-validation making training set consist only of observations that occurred prior to the observations that form the test set.\n", "\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from hcrystalball.model_selection import FinerTimeSplit\n", "from sklearn.model_selection import cross_validate\n", "from hcrystalball.wrappers import ExponentialSmoothingWrapper" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "# plt.style.use('fivethirtyeight')\n", "plt.rcParams['figure.figsize'] = [12, 6]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from hcrystalball.utils import get_sales_data\n", "\n", "df = get_sales_data(n_dates=100, \n", " n_assortments=1,\n", " n_states=1, \n", " n_stores=1)\n", "X, y = pd.DataFrame(index=df.index), df['Sales']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Native Cross Validation" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cross_validate(ExponentialSmoothingWrapper(), \n", " X, \n", " y, \n", " cv=FinerTimeSplit(horizon=5, n_splits=2), \n", " scoring='neg_mean_absolute_error')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Grid search and model selection" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Model selection and parameter tuning is the area where hcrystalball really shines. There is ongoing and probably a never-ending discussion about superiority or inferiority of ML techniques over common statistical/econometrical ones. Why not try both? The problem of a simple comparison between the performance of different kind of algorithms such as SARIMAX, Prophet, regularized linear models, and XGBoost lead to hcrystalball. Let's see how to do it!" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from hcrystalball.compose import TSColumnTransformer\n", "from hcrystalball.feature_extraction import SeasonalityTransformer\n", "from hcrystalball.wrappers import ProphetWrapper\n", "from hcrystalball.wrappers import get_sklearn_wrapper\n", "from sklearn.ensemble import RandomForestRegressor\n", "from sklearn.linear_model import LinearRegression\n", "from sklearn.pipeline import Pipeline\n", "\n", "from hcrystalball.wrappers import SarimaxWrapper\n", "from sklearn.model_selection import GridSearchCV\n", "\n", "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Define our pipeline" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sklearn_model_pipeline = Pipeline([\n", " ('seasonality', SeasonalityTransformer(freq='D')),\n", " ('model', 'passthrough') # this will be overwritten by param grid\n", "]) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Define pipeline parameters including different models" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "param_grid = [{'model': [sklearn_model_pipeline],\n", " 'model__model':[get_sklearn_wrapper(RandomForestRegressor), \n", " get_sklearn_wrapper(LinearRegression)]},\n", " {'model': [ProphetWrapper()],\n", " 'model__seasonality_mode':['multiplicative', 'additive']},\n", " {'model': [SarimaxWrapper(order=(2,1,1), suppress_warnings=True)]}\n", " ]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Run native grid search" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "grid_search = GridSearchCV(estimator=sklearn_model_pipeline, \n", " param_grid=param_grid, \n", " scoring='neg_mean_absolute_error', \n", " cv=FinerTimeSplit(horizon=5, n_splits=2),\n", " refit=False, \n", " error_score=np.nan) " ] 
}, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "results = grid_search.fit(X, y)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pd.DataFrame(results.cv_results_).sort_values('rank_test_score').head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It seems that the best model is sklearn with RandomForestRegressor, but in time-series, it is often also a good idea to check how the forecasts look like. Unfortunately, this is not possible with sklearn. grid_search is returning just the results, not the predictions of individual models for different splits. hcrystalball thus implements special scoring functions that track all data from the grid_search." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Custom scorer" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from hcrystalball.metrics import make_ts_scorer\n", "from sklearn.metrics import mean_absolute_error" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "scoring = make_ts_scorer(mean_absolute_error, \n", " greater_is_better=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "grid_search = GridSearchCV(estimator=sklearn_model_pipeline, \n", " param_grid=param_grid, \n", " scoring=scoring, \n", " cv=FinerTimeSplit(horizon=5, n_splits=2),\n", " refit=False, \n", " error_score=np.nan) \n", "results = grid_search.fit(X, y)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "results.scorer_.cv_data.loc[:,lambda x: x.columns != 'split'].plot();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "hcrystalball internally tracks data based on unique model hashes since model string represantations (reprs) are very long for usable columns names in dataframe, but if you are curious i.e. 
what the worst model was (so as not to use it in further experiments), you can find it via the scorer's `estimator_ids` attribute." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "results.scorer_.cv_data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can get to the model definitions using the hash in the `results.scorer_.estimator_ids` dict." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }