{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "hcrystalball is a time series forecasting library which tries to connect two traditionally disconnected worlds of forecasting - traditional econometric approaches and more recent machine learning approaches. It builds on a wonderfully simple Sklearn API, which is familiar to every practitioner in data science. It makes it very easy to try different algorithms such as Exponential Smoothing models and Gradient Boosting trees, robustly cross-validate them, combine them with rich sklearn functionality such as Transformers and Pipelines or put together very different models and construct powerful Ensembles. \n", "\n", "Currently, supported estimators are:\n", "\n", "- All Regressors following SKlearn API\n", "- [Prophet](https://facebook.github.io/prophet/docs/quick_start.html#python-api)\n", "- [AutoARIMA/ARIMA](http://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.AutoARIMA.html#pmdarima.arima.AutoARIMA)\n", "- [Smoothing models](https://www.statsmodels.org/dev/generated/statsmodels.tsa.holtwinters.ExponentialSmoothing.html#statsmodels.tsa.holtwinters.ExponentialSmoothing) from statsmodels.tsa.holtwinters \n", "- [BATS/TBATS](https://github.com/intive-DataScience/tbats)\n", "- hcrystalball native models - Simple Ensembles, Stacking Ensembles" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Quick Start" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Following sklearn API all hcrystalball wrappers expect training data X and y for fit method and features in X for predict method. Despite hcrystalball following closely sklearn API, there are 2 biggest differences between sklearn API and hcrystalball API :\n", "\n", "1) hcrystalball uses pandas as primary data interface - X should be pandas dataframe and y can be any 1-dimensional array (i.e. pandas Series)\n", " \n", "2) Time-series predictions are principally different from traditional supervised machine learning because it's very scarce on features, it's common practice to use just the time series itself (in econometrics referred to as univariate predictions). Why is that? To leverage additional features we need to know its values also in the future. This fundamental difference between traditional supervised ML and time-series predictions is reflected in different requirements for input data. The minimal requirement for input data is pandas dataframe X with column 'date' in string data type and any 1D array-like values of y (the target/response/dependent variable) - this reflects the univariate prediction case mentioned above. Additional features (in econometrics referred to as exogenous variables) can be added to the X dataframe." 
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Read Data" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "plt.style.use('seaborn')\n", "plt.rcParams['figure.figsize'] = [12, 6]" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from hcrystalball.utils import get_sales_data\n", "\n", "df = get_sales_data(n_dates=100,\n", "                    n_assortments=1,\n", "                    n_states=1,\n", "                    n_stores=1)\n", "X, y = pd.DataFrame(index=df.index), df['Sales']" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X.head(10)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y.plot();" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### First Forecast" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Let's import the popular Prophet model and make some predictions." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from hcrystalball.wrappers import ProphetWrapper" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model = ProphetWrapper()\n", "model.fit(X[:-10], y[:-10])\n", "model.predict(X[-10:])" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The prediction horizon (the number of steps ahead we want to forecast) is given by the length of the X passed to predict.\n", "\n", "The name of the returned prediction column is derived from the wrapper's name." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Model Name" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model = ProphetWrapper(name=\"my_prophet_model\")\n", "\n", "preds = (model.fit(X[:-10], y[:-10])\n", "         .predict(X[-10:])\n", "         .merge(y, left_index=True, right_index=True, how='outer')\n", "         .tail(50)\n", ")\n", "preds.plot(title=f\"MAE:{(preds['Sales']-preds['my_prophet_model']).abs().mean().round(3)}\");" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Confidence Intervals" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "All models that support confidence intervals can also return them in the predictions - the columns are named {wrapper name}_lower / {wrapper name}_upper." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model = ProphetWrapper(conf_int=True)\n", "\n", "preds = (model.fit(X[:-10], y[:-10])\n", "         .predict(X[-10:])\n", "         .merge(y, left_index=True, right_index=True, how='outer')\n", "         .tail(50)\n", ")\n", "preds.plot(title=f\"MAE:{(preds['Sales']-preds['prophet']).abs().mean().round(3)}\");" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Model Parameters" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "hcrystalball also tries to keep all the functionality of the original model, so any argument that can be passed to the wrapped model (e.g. Prophet) can also be passed to the wrapper. All parameters are listed in the wrapper's signature, but hcrystalball documents only the hcrystalball-specific ones, so for model-specific parameters please refer to the original documentation of the wrapped model." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ProphetWrapper?" ] },
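{ "cell_type": "markdown", "metadata": {}, "source": [ "As a quick sketch of this pass-through (assuming Prophet's usual seasonality_mode and changepoint_prior_scale arguments), model-specific and hcrystalball-specific parameters can be mixed freely at initialization:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Prophet's own arguments are forwarded to the underlying model,\n", "# while hcrystalball-specific arguments (name, conf_int, ...) stay on the wrapper.\n", "model = ProphetWrapper(\n", "    seasonality_mode=\"multiplicative\",  # Prophet argument\n", "    changepoint_prior_scale=0.5,        # Prophet argument\n", "    name=\"tuned_prophet\",               # hcrystalball argument\n", ")\n", "model.fit(X[:-10], y[:-10]).predict(X[-10:])" ] },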
{ "cell_type": "markdown", "metadata": {}, "source": [ "To keep compatibility with sklearn as high as possible, all arguments should be passed to the wrapper during initialization. If the original model provides other ways to supply specific parameters (e.g. via the fit method or Prophet's add_regressor method), the wrapper implements a dedicated parameter (e.g. fit_params, extra_regressors) so that these arguments can also be passed during initialization." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Accessing Wrapped Model" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The inner model of the wrapper can be accessed via wrapper.model." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model.model" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "You can use all the other wrappers for traditional time-series models following the same API:\n", "\n", "- SarimaxWrapper with AutoARIMA support\n", "- BATSWrapper\n", "- TBATSWrapper\n", "- ExponentialSmoothingWrapper\n", "- SimpleSmoothingWrapper\n", "- HoltSmoothingWrapper" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 4 }