{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# ES-RNN model \n", "\n", "In this notebook, we demonstrate how to:\n", "- prepare time series data for training an RNN forecasting model\n", "- get data in the required shape for the Keras API\n", "- implement an ES-RNN model in Keras to predict the next 3 steps ahead (time *t+1* to *t+3*) in the time series. This model uses recent values of load as the model input. The model will be trained to output a vector, the elements of which are ordered predictions for future time steps.\n", "- enable early stopping to reduce the likelihood of model overfitting\n", "- evaluate the model on a test dataset\n", "\n", "The data in this example is taken from the GEFCom2014 forecasting competition<sup>1</sup>. It consists of 3 years of hourly electricity load and temperature values between 2012 and 2014. The task is to forecast future values of electricity load.\n", "\n", "<sup>1</sup>Tao Hong, Pierre Pinson, Shu Fan, Hamidreza Zareipour, Alberto Troccoli and Rob J. Hyndman, \"Probabilistic energy forecasting: Global Energy Forecasting Competition 2014 and beyond\", International Journal of Forecasting, vol.32, no.3, pp 896-913, July-September, 2016." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import os\n", "import warnings\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "import datetime as dt\n", "from collections import UserDict, deque\n", "from IPython.display import Image\n", "%matplotlib inline\n", "\n", "from common.utils import load_data, mape, TimeSeriesTensor, create_evaluation_df\n", "\n", "pd.options.display.float_format = '{:,.2f}'.format\n", "np.set_printoptions(precision=2)\n", "warnings.filterwarnings(\"ignore\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load data into Pandas dataframe" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>load</th>\n", " <th>temp</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>2012-01-01 00:00:00</th>\n", " <td>2,698.00</td>\n", " <td>32.00</td>\n", " </tr>\n", " <tr>\n", " <th>2012-01-01 01:00:00</th>\n", " <td>2,558.00</td>\n", " <td>32.67</td>\n", " </tr>\n", " <tr>\n", " <th>2012-01-01 02:00:00</th>\n", " <td>2,444.00</td>\n", " <td>30.00</td>\n", " </tr>\n", " <tr>\n", " <th>2012-01-01 03:00:00</th>\n", " <td>2,402.00</td>\n", " <td>31.00</td>\n", " </tr>\n", " <tr>\n", " <th>2012-01-01 04:00:00</th>\n", " <td>2,403.00</td>\n", " <td>32.00</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " load temp\n", "2012-01-01 00:00:00 2,698.00 32.00\n", "2012-01-01 01:00:00 2,558.00 32.67\n", "2012-01-01 02:00:00 2,444.00 30.00\n", "2012-01-01 03:00:00 2,402.00 31.00\n", 
"2012-01-01 04:00:00 2,403.00 32.00" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "energy = load_data('data/')\n", "energy.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data preparation\n", "\n", "For this example, we will set *T=6*. This means that the input for each sample is a vector of the prevous 6 hours of the energy load. The choice of *T=6* was arbitrary but should be selected through experimentation.\n", "\n", "*HORIZON=3* specifies that we have a forecasting horizon of 3 (*t+3*)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "valid_start_dt = '2014-08-30 08:00:00'\n", "test_start_dt = '2014-10-31 11:00:00'\n", "test_end_dt = '2014-12-30 18:00:00'\n", "\n", "T = 6\n", "HORIZON = 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create training set." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "train = energy.copy()[energy.index < valid_start_dt][['load']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Scale data to be in range (0, 1). This transformation should be calibrated on the training set only. This is to prevent information from the validation or test sets leaking into the training data." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "from sklearn.preprocessing import MinMaxScaler\n", "\n", "scaler = MinMaxScaler()\n", "scaler.fit(train[['load']])\n", "train[['load']] = scaler.transform(train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use the TimeSeriesTensor convenience class to:\n", "1. Shift the values of the time series to create a Pandas dataframe containing all the data for a single training example\n", "2. Discard any samples with missing values\n", "3. 
Transform this Pandas dataframe into a numpy array of shape (samples, time steps, features) for input into Keras\n", "\n", "The class takes the following parameters:\n", "\n", "- **dataset**: original time series\n", "- **target**: name of the column to forecast (`'load'` in the call below)\n", "- **H**: the forecast horizon\n", "- **tensor_structure**: a dictionary describing the tensor structure in the form { 'tensor_name' : (range(max_backward_shift, max_forward_shift), [feature, feature, ...] ) }\n", "- **freq**: time series frequency\n", "- **drop_incomplete**: (Boolean) whether to drop incomplete samples" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "tensor_structure = {'X':(range(-T+1, 1), ['load'])}\n", "train_inputs = TimeSeriesTensor(train, 'load', HORIZON, tensor_structure)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead tr th {\n", " text-align: left;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr>\n", " <th>tensor</th>\n", " <th colspan=\"3\" halign=\"left\">target</th>\n", " <th colspan=\"6\" halign=\"left\">X</th>\n", " </tr>\n", " <tr>\n", " <th>feature</th>\n", " <th colspan=\"3\" halign=\"left\">y</th>\n", " <th colspan=\"6\" halign=\"left\">load</th>\n", " </tr>\n", " <tr>\n", " <th>time step</th>\n", " <th>t+1</th>\n", " <th>t+2</th>\n", " <th>t+3</th>\n", " <th>t-5</th>\n", " <th>t-4</th>\n", " <th>t-3</th>\n", " <th>t-2</th>\n", " <th>t-1</th>\n", " <th>t</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>2012-01-01 05:00:00</th>\n", " <td>0.18</td>\n", " <td>0.23</td>\n", " <td>0.29</td>\n", " <td>0.22</td>\n", " <td>0.18</td>\n", " <td>0.14</td>\n", " <td>0.13</td>\n", " <td>0.13</td>\n", " <td>0.15</td>\n", " </tr>\n", " <tr>\n", " <th>2012-01-01 
06:00:00</th>\n", " <td>0.23</td>\n", " <td>0.29</td>\n", " <td>0.35</td>\n", " <td>0.18</td>\n", " <td>0.14</td>\n", " <td>0.13</td>\n", " <td>0.13</td>\n", " <td>0.15</td>\n", " <td>0.18</td>\n", " </tr>\n", " <tr>\n", " <th>2012-01-01 07:00:00</th>\n", " <td>0.29</td>\n", " <td>0.35</td>\n", " <td>0.37</td>\n", " <td>0.14</td>\n", " <td>0.13</td>\n", " <td>0.13</td>\n", " <td>0.15</td>\n", " <td>0.18</td>\n", " <td>0.23</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ "tensor target X \n", "feature y load \n", "time step t+1 t+2 t+3 t-5 t-4 t-3 t-2 t-1 t\n", "2012-01-01 05:00:00 0.18 0.23 0.29 0.22 0.18 0.14 0.13 0.13 0.15\n", "2012-01-01 06:00:00 0.23 0.29 0.35 0.18 0.14 0.13 0.13 0.15 0.18\n", "2012-01-01 07:00:00 0.29 0.35 0.37 0.14 0.13 0.13 0.15 0.18 0.23" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_inputs.dataframe.head(3)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "y_train shape: (23328, 3)\n", "x_train shape: (23328, 6, 1)\n" ] } ], "source": [ "print(\"y_train shape: \", train_inputs['target'].shape)\n", "print(\"x_train shape: \", train_inputs['X'].shape)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "array([[[0.22],\n", " [0.18],\n", " [0.14],\n", " [0.13],\n", " [0.13],\n", " [0.15]],\n", "\n", " [[0.18],\n", " [0.14],\n", " [0.13],\n", " [0.13],\n", " [0.15],\n", " [0.18]],\n", "\n", " [[0.14],\n", " [0.13],\n", " [0.13],\n", " [0.15],\n", " [0.18],\n", " [0.23]]])" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_inputs['X'][:3]" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "array([[0.18, 0.23, 0.29],\n", " [0.23, 0.29, 0.35],\n", " [0.29, 0.35, 0.37]])" ] }, 
"execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_inputs['target'][:3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Construct validation set (keeping T hours from the training set in order to construct initial features)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "look_back_dt = dt.datetime.strptime(valid_start_dt, '%Y-%m-%d %H:%M:%S') - dt.timedelta(hours=T-1)\n", "valid = energy.copy()[(energy.index >=look_back_dt) & (energy.index < test_start_dt)][['load']]\n", "valid[['load']] = scaler.transform(valid)\n", "valid_inputs = TimeSeriesTensor(valid, 'load', HORIZON, tensor_structure)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Implement ES-RNN" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will implement ES-RNN forecasting model with the following structure:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "<IPython.core.display.Image object>" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Image('./images/es_rnn.png')" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Using TensorFlow backend.\n" ] } ], "source": [ "from keras.models import Model\n", "from keras.layers import Input, GRU, Dense, Lambda\n", "from keras.callbacks import EarlyStopping" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "LATENT_DIM = 5 # number of units in the RNN layer\n", "BATCH_SIZE = 48 # number of samples per mini-batch\n", "EPOCHS = 10 # maximum number of times the training algorithm will cycle through all samples\n", "m = 24 # seasonality length" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create custom layers in Keras\n", "In this section we define two custom 
layers:\n", "- ***ES*** layer: This layer implements exponential smoothing, normalization and de-seasonalization of the input data.\n", "- ***Denormalization*** layer: This layer takes the normalization and de-seasonalization coefficients from the ES layer and multiplies the output of the RNN by them, which de-normalizes and re-seasonalizes the forecasts.\n", "\n", "\n", "There are 3 methods you need to implement in a custom layer:\n", "- build(input_shape): this is where you will define your weights.\n", "- call(x): this is where the layer's logic lives.\n", "- compute_output_shape(input_shape): in case your layer modifies the shape of its input, you should specify here the shape transformation logic. \n", "\n", "You can check the [Keras documentation](https://keras.io/layers/writing-your-own-keras-layers/) for more details about creating custom layers." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "from keras import backend as K\n", "from keras.layers import Layer\n", "from keras import initializers\n", "\n", "# Exponential Smoothing + Normalization\n", "class ES(Layer):\n", "\n", "    def __init__(self, horizon, m, batch_size, time_steps, **kwargs):\n", "        self.horizon = horizon\n", "        self.m = m\n", "        self.batch_size = batch_size\n", "        self.time_steps = time_steps\n", "\n", "        super(ES, self).__init__(**kwargs)\n", "\n", "    # initialization of the learned parameters of exponential smoothing\n", "    def build(self, input_shape):\n", "        self.alpha = self.add_weight(name='alpha', shape=(1,),\n", "                                     initializer='uniform', trainable=True)\n", "        self.gamma = self.add_weight(name='gamma', shape=(1,),\n", "                                     initializer='uniform', trainable=True)\n", "        self.init_seasonality = self.add_weight(name='init_seasonality', shape=(self.m,),\n", "                                                initializer=initializers.Constant(value=0.8), trainable=True)\n", "        self.init_seasonality_list = [K.slice(self.init_seasonality,(i,),(1,)) for i in range(self.m)]\n", "        self.seasonality_queue = 
deque(self.init_seasonality_list, self.m)\n", "        self.level = self.add_weight(name='init_level', shape=(1,),\n", "                                     initializer=initializers.Constant(value=0.8), \n", "                                     trainable=True)\n", "        super(ES, self).build(input_shape) \n", "\n", "    def call(self, x):\n", "\n", "        # extract time-series from feature vector\n", "        n_examples = K.int_shape(x)[0]\n", "        if n_examples is None:\n", "            n_examples = self.batch_size\n", "        x1 = K.slice(x,(0,0,0),(1,self.time_steps,1))\n", "        x1 = K.reshape(x1,(self.time_steps,))\n", "        x2 = K.slice(x,(1,self.time_steps-1,0),(n_examples-1,1,1))\n", "        x2 = K.reshape(x2,(n_examples-1,))\n", "        ts = K.concatenate([x1,x2])\n", "\n", "        x_norm = [] # normalized values of time-series\n", "        ls = [] # coefficients for denormalization of forecasts\n", "\n", "        l_t_minus_1 = self.level\n", "\n", "        for i in range(n_examples+self.time_steps-1):\n", "\n", "            # compute l_t\n", "            y_t = ts[i]\n", "            s_t = self.seasonality_queue.popleft()\n", "            l_t = self.alpha * y_t / s_t + (1 - self.alpha) * l_t_minus_1\n", "\n", "            # compute s_{t+m}\n", "            s_t_plus_m = self.gamma * y_t / l_t + (1 - self.gamma) * s_t\n", "\n", "            self.seasonality_queue.append(s_t_plus_m)\n", "\n", "            # normalize y_t\n", "            x_norm.append(y_t / (s_t * l_t))\n", "\n", "            l_t_minus_1 = l_t\n", "\n", "            if i >= self.time_steps-1:\n", "                l = [l_t]*self.horizon\n", "                l = K.concatenate(l)\n", "                s = [self.seasonality_queue[j] for j in range(self.horizon)] # we assume here that horizon < m\n", "                s = K.concatenate(s)\n", "                ls_t = K.concatenate([K.expand_dims(l), K.expand_dims(s)])\n", "                ls.append(K.expand_dims(ls_t,axis=0)) \n", "\n", "        self.level = l_t\n", "        x_norm = K.concatenate(x_norm)\n", "\n", "        # create x_out\n", "        x_out = []\n", "        for i in range(n_examples):\n", "            norm_features = K.slice(x_norm,(i,),(self.time_steps,))\n", "            norm_features = K.expand_dims(norm_features,axis=0)\n", "            x_out.append(norm_features)\n", "\n", "        x_out = K.concatenate(x_out, axis=0)\n", "        x_out = K.expand_dims(x_out)\n", "\n", 
"        # create tensor of denormalization coefficients \n", "        denorm_coeff = K.concatenate(ls, axis=0)\n", "        return [x_out, denorm_coeff]\n", "\n", "    def compute_output_shape(self, input_shape):\n", "        return [(input_shape[0], input_shape[1], input_shape[2]), (input_shape[0], self.horizon, 2)]\n", "\n", "class Denormalization(Layer):\n", "\n", "    def __init__(self, **kwargs):\n", "        super(Denormalization, self).__init__(**kwargs)\n", "\n", "    def build(self, input_shape):\n", "        super(Denormalization, self).build(input_shape) \n", "\n", "    def call(self, x):\n", "        return x[0] * x[1][:,:,0] * x[1][:,:,1]\n", "\n", "    def compute_output_shape(self, input_shape):\n", "        return input_shape[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create ES-RNN model\n", "Since the Denormalization layer takes inputs from two preceding layers, we need to use the Keras functional API to create the model." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "scrolled": false }, "outputs": [], "source": [ "model_input = Input(shape=(None, 1))\n", "[normalized_input, denormalization_coeff] = ES(HORIZON, m, BATCH_SIZE, T)(model_input)\n", "gru_out = GRU(LATENT_DIM)(normalized_input)\n", "model_output_normalized = Dense(HORIZON)(gru_out)\n", "model_output = Denormalization()([model_output_normalized, denormalization_coeff])\n", "model = Model(inputs=model_input, outputs=model_output)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "model.compile(optimizer='RMSprop', loss='mse')" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "__________________________________________________________________________________________________\n", "Layer (type) Output Shape Param # Connected to \n", "==================================================================================================\n", "input_1 (InputLayer) (None, None, 1) 0 \n", 
"__________________________________________________________________________________________________\n", "es_1 (ES) [(None, None, 1), (N 27 input_1[0][0] \n", "__________________________________________________________________________________________________\n", "gru_1 (GRU) (None, 5) 105 es_1[0][0] \n", "__________________________________________________________________________________________________\n", "dense_1 (Dense) (None, 3) 18 gru_1[0][0] \n", "__________________________________________________________________________________________________\n", "denormalization_1 (Denormalizat (None, 3) 0 dense_1[0][0] \n", " es_1[0][1] \n", "==================================================================================================\n", "Total params: 150\n", "Trainable params: 150\n", "Non-trainable params: 0\n", "__________________________________________________________________________________________________\n" ] } ], "source": [ "model.summary()" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "earlystop = EarlyStopping(monitor='val_loss', min_delta=0, patience=20)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train on 23328 samples, validate on 1488 samples\n", "Epoch 1/10\n", "23328/23328 [==============================] - 8s 341us/step - loss: 0.0324 - val_loss: 0.0194\n", "Epoch 2/10\n", "23328/23328 [==============================] - 4s 153us/step - loss: 0.0029 - val_loss: 0.0184\n", "Epoch 3/10\n", "23328/23328 [==============================] - 4s 156us/step - loss: 0.0023 - val_loss: 0.0163\n", "Epoch 4/10\n", "23328/23328 [==============================] - 4s 155us/step - loss: 0.0019 - val_loss: 0.0140\n", "Epoch 5/10\n", "23328/23328 [==============================] - 4s 157us/step - loss: 0.0016 - val_loss: 0.0120\n", "Epoch 6/10\n", "23328/23328 [==============================] - 4s 154us/step 
- loss: 0.0015 - val_loss: 0.0103\n", "Epoch 7/10\n", "23328/23328 [==============================] - 4s 154us/step - loss: 0.0014 - val_loss: 0.0091\n", "Epoch 8/10\n", "23328/23328 [==============================] - 4s 156us/step - loss: 0.0013 - val_loss: 0.0081\n", "Epoch 9/10\n", "23328/23328 [==============================] - 4s 154us/step - loss: 0.0012 - val_loss: 0.0074\n", "Epoch 10/10\n", "23328/23328 [==============================] - 4s 151us/step - loss: 0.0012 - val_loss: 0.0068\n" ] }, { "data": { "text/plain": [ "<keras.callbacks.History at 0x17909381828>" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.fit(train_inputs['X'],\n", " train_inputs['target'],\n", " batch_size=BATCH_SIZE,\n", " shuffle=False,\n", " epochs=EPOCHS,\n", " validation_data=(valid_inputs['X'], valid_inputs['target']),\n", " callbacks=[earlystop],\n", " verbose=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluate the model" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "look_back_dt = dt.datetime.strptime(test_start_dt, '%Y-%m-%d %H:%M:%S') - dt.timedelta(hours=T-1)\n", "test = energy.copy()[test_start_dt:test_end_dt][['load']]\n", "test[['load']] = scaler.transform(test)\n", "test_inputs = TimeSeriesTensor(test, 'load', HORIZON, tensor_structure)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "predictions = model.predict(test_inputs['X'], batch_size=BATCH_SIZE)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0.54, 0.69, 0.72],\n", " [0.48, 0.61, 0.63],\n", " [0.46, 0.55, 0.57],\n", " ...,\n", " [0.58, 0.59, 0.68],\n", " [0.6 , 0.69, 0.77],\n", " [0.69, 0.78, 0.76]], dtype=float32)" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "predictions" ] }, { "cell_type": "code", "execution_count": 24, 
"metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>timestamp</th>\n", " <th>h</th>\n", " <th>prediction</th>\n", " <th>actual</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>2014-10-31 16:00:00</td>\n", " <td>t+1</td>\n", " <td>3,717.48</td>\n", " <td>3,437.00</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>2014-10-31 17:00:00</td>\n", " <td>t+1</td>\n", " <td>3,543.91</td>\n", " <td>3,466.00</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>2014-10-31 18:00:00</td>\n", " <td>t+1</td>\n", " <td>3,483.05</td>\n", " <td>3,374.00</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>2014-10-31 19:00:00</td>\n", " <td>t+1</td>\n", " <td>3,409.18</td>\n", " <td>3,315.00</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>2014-10-31 20:00:00</td>\n", " <td>t+1</td>\n", " <td>3,340.66</td>\n", " <td>3,142.00</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " timestamp h prediction actual\n", "0 2014-10-31 16:00:00 t+1 3,717.48 3,437.00\n", "1 2014-10-31 17:00:00 t+1 3,543.91 3,466.00\n", "2 2014-10-31 18:00:00 t+1 3,483.05 3,374.00\n", "3 2014-10-31 19:00:00 t+1 3,409.18 3,315.00\n", "4 2014-10-31 20:00:00 t+1 3,340.66 3,142.00" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "eval_df = create_evaluation_df(predictions, test_inputs, HORIZON, scaler)\n", "eval_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compute MAPE for each forecast horizon" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { 
"text/plain": [ "h\n", "t+1 0.03\n", "t+2 0.08\n", "t+3 0.11\n", "Name: APE, dtype: float64" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "eval_df['APE'] = (eval_df['prediction'] - eval_df['actual']).abs() / eval_df['actual']\n", "eval_df.groupby('h')['APE'].mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compute MAPE across all predictions" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.07506279816956267" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mape(eval_df['prediction'], eval_df['actual'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Plot actuals vs predictions at each horizon for the first week of the test period. As is to be expected, predictions for one step ahead (*t+1*) are more accurate than those for 2 or 3 steps ahead." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "<Figure size 1080x576 with 1 Axes>" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plot_df = eval_df[(eval_df.timestamp<'2014-11-08') & (eval_df.h=='t+1')][['timestamp', 'actual']]\n", "for t in range(1, HORIZON+1):\n", "    plot_df['t+'+str(t)] = eval_df[(eval_df.timestamp<'2014-11-08') & (eval_df.h=='t+'+str(t))]['prediction'].values\n", "\n", "fig = plt.figure(figsize=(15, 8))\n", "ax = fig.add_subplot(111)\n", "ax.plot(plot_df['timestamp'], plot_df['actual'], color='red', linewidth=4.0, label='actual')\n", "ax.plot(plot_df['timestamp'], plot_df['t+1'], color='blue', linewidth=4.0, alpha=0.75, label='t+1')\n", "ax.plot(plot_df['timestamp'], plot_df['t+2'], color='blue', linewidth=3.0, alpha=0.5, label='t+2')\n", "ax.plot(plot_df['timestamp'], plot_df['t+3'], color='blue', linewidth=2.0, alpha=0.25, label='t+3')\n", "plt.xlabel('timestamp', fontsize=12)\n", "plt.ylabel('load', fontsize=12)\n", 
"ax.legend(loc='best')\n", "plt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python (dlts)", "language": "python", "name": "dlts" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" } }, "nbformat": 4, "nbformat_minor": 2 }