{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Autoregressive Modelling in Sklearn" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In many cases, traditional time-series models work well. However, many traditional machine learning algorithms also perform very well on tabular datasets, and it would be a waste not to use them for time-series forecasting. SklearnWrapper was developed to make this possible." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## SklearnWrapper\n", "\n", "SklearnWrapper lets you use regressors that follow the Sklearn API as autoregressive models for time-series prediction. It is used differently from the other wrappers in one respect. The package user supplies the model, so its parameters are not known ahead of time. For that reason you need to use the factory function `get_sklearn_wrapper`. You can pass any sklearn-compatible regressor class to this function, and it returns a SklearnWrapper instance that wraps it." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "\n", "# The 'seaborn' style was renamed to 'seaborn-v0_8' in matplotlib 3.6\n", "plt.style.use('seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn')\n", "plt.rcParams['figure.figsize'] = [12, 6]" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from hcrystalball.utils import get_sales_data\n", "\n", "# Sales for a single store over 365*2 dates; X carries only the date index (no exogenous features)\n", "df = get_sales_data(n_dates=365*2,\n", "                    n_assortments=1,\n", "                    n_states=1,\n", "                    n_stores=1)\n", "X, y = pd.DataFrame(index=df.index), df['Sales']" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from hcrystalball.wrappers import get_sklearn_wrapper" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Sklearn or Sklearn-compatible regressor\n", "You can pass any parameter of the original regressor. The model does not have to come from Sklearn itself. It only needs to implement the Sklearn API.\n", "Pass the class to be wrapped as the first positional argument. After that, you can mix wrapper parameters (`clip_predictions_lower`) with parameters of the wrapped class (`n_estimators`)." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.ensemble import RandomForestRegressor" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model = get_sklearn_wrapper(RandomForestRegressor, n_estimators=100, clip_predictions_lower=0., random_state=42)\n", "# Fit on all but the last 10 observations, predict those 10, and join the predictions with the actuals\n", "preds = (model.fit(X[:-10], y[:-10])\n", "         .predict(X[-10:])\n", "         .merge(y, left_index=True, right_index=True, how='outer')\n", "         .tail(50)\n", ")\n", "# MAE is computed only over the 10 predicted rows (rows without a prediction are NaN and skipped)\n", "preds.plot(title=f\"MAE:{(preds['Sales']-preds['sklearn']).abs().mean().round(3)}\");" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": 
"python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 4 }