{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# The Magic of SHD\n", "> A simple yet fast and powerful forecasting algorithm\n", "\n", "- toc: true \n", "- badges: true\n", "- comments: true\n", "- categories: [pandas, numpy, data-cleaning]\n", "- hide: false" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### What's SHD?\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "SHD stands for (S)ingle Exponential Smoothing, (H)olt's, (D)amped forecasting algorithm. It's not often that you can describe an entire algorithm in a single sentence, but I just did. This simple algorithm often outperforms some of the most complex forecasting algorithms, including DNNs and FB Prophet, on univariate low-frequency time series. I have used it successfully on many projects. I am sharing it because the great [Spyros Makridakis](https://www.insead.edu/faculty-research/faculty/spyros-makridakis) reminded us on Twitter that SHD was found superior in all the M competitions (M5 would be an exception). " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">twitter: https://twitter.com/spyrosmakrid/status/1368972398498824193?s=20" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Not many know about this gem, so I thought I would share my code. It's a reminder that you don't always need complex algorithms to produce forecasts. Use what's simple and parsimonious. \n", "\n", "**How does it work?**\n", "\n", "Just take the arithmetic mean of the forecasts from SES, Holt's and Damped." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![shd](https://raw.githubusercontent.com/pawarbi/blog/master/images/shd.JPG)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**How does it stack up against other algorithms?**\n", "\n", "[Read it yourself](https://flora.insead.edu/fichiersti_wp/inseadwp1999/99-70.pdf). It performed as well as, or even better than, most other algorithms in the M3 competition. 
It works particularly well with low-frequency time series (yearly, monthly). It works well because we are ensembling three different algorithms. It's been shown that forecast combinations often outperform single best models. \n", "\n", "I will demonstrate it using an example below. This is the same dataset I used in my two previous [blog posts](https://pawarbi.github.io/blog/forecasting/r/python/rpy2/altair/fbprophet/ensemble_forecast/uncertainty/simulation/2020/04/21/timeseries-part2.html).\n", "\n" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import statsmodels.api as sm\n", "from statsmodels.tsa.api import ExponentialSmoothing, SimpleExpSmoothing, Holt\n", "import statsmodels\n", "from statsmodels.tsa.statespace.exponential_smoothing import ExponentialSmoothing\n", "\n", "import scipy\n", "from scipy.stats import boxcox\n", "from scipy.special import inv_boxcox\n", "\n", "from statsmodels.tools.eval_measures import rmse" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Pandas: 1.1.5\n", "Statsmodels: 0.12.2\n", "Scipy: 1.5.2\n", "Numpy: 1.19.1\n" ] } ], "source": [ "print('Pandas:', pd.__version__)\n", "print('Statsmodels:', sm.__version__)\n", "print('Scipy:', scipy.__version__)\n", "print('Numpy:', np.__version__)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### SHD" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "def combshd(train,horizon,seasonality, init):\n", "\n", "# Author: Sandeep Pawar\n", "# Date: 8/30/2020\n", "# version: 1.1\n", "\n", " '''\n", " params\n", " ----------\n", " :train numpy array or Pandas series with univariate, strictly positive data (required by Box-Cox)\n", " \n", " :horizon forecast horizon (int)\n", " \n", " :seasonality For monthly 12, yearly 1, quarterly 4 (int)\n", " \n", " :init initialization 
('heuristic','concentrated')\n", " \n", " output\n", " ------------\n", " numpy array of length equal to the specified horizon\n", " \n", " '''\n", " \n", " # Box-Cox transform the data; lam is the fitted lambda used to invert the forecasts\n", " train_x,lam = boxcox(train)\n", " # SES: level only, no trend and no seasonality\n", " ses=(sm.tsa.statespace.ExponentialSmoothing(train_x,\n", " trend=False, \n", " seasonal=None,\n", " initialization_method= init, \n", " damped_trend=False).fit())\n", " \n", " fc1 = inv_boxcox(ses.forecast(horizon),lam)\n", " \n", " # Holt's: additive trend, with the given seasonal period\n", " holt=(sm.tsa.statespace.ExponentialSmoothing(train_x,\n", " trend=True, \n", " seasonal=seasonality,\n", " initialization_method= init, \n", " damped_trend=False).fit())\n", " \n", " fc2 = inv_boxcox(holt.forecast(horizon),lam)\n", " \n", " # Damped: damped additive trend, with the given seasonal period\n", " damp=(sm.tsa.statespace.ExponentialSmoothing(train_x,\n", " trend=True, \n", " seasonal=seasonality,\n", " initialization_method= init, \n", " damped_trend=True).fit())\n", " \n", " fc3 = inv_boxcox(damp.forecast(horizon),lam)\n", " \n", " # Equal-weight average of the three back-transformed forecasts\n", " fc = (fc1+fc2+fc3)/3\n", " \n", " return fc\n" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | Sales | \n", "
---|---|
Date | \n", "\n", " |
2012-03-31 | \n", "362000 | \n", "
2012-06-30 | \n", "385000 | \n", "
2012-09-30 | \n", "432000 | \n", "
2012-12-31 | \n", "341000 | \n", "
2013-03-31 | \n", "382000 | \n", "