14.6 Lag Length Selection using Information Criteria

The selection of lag lengths in AR and ADL models can sometimes be guided by economic theory. However, there are also statistical methods that help determine how many lags should be included as regressors. In general, including too many lags inflates the standard errors of the coefficient estimates and thus increases the forecast error, while omitting lags that belong in the model may result in estimation bias.

The order of an AR model can be determined using two approaches:

  1. The F-test approach

    Estimate an AR(\(p\)) model and test the significance of the largest lag(s). If the test indicates that the lag(s) in question are not significant, we can consider removing them from the model. This approach tends to produce models whose order is too large: in a significance test we always face the risk of rejecting a true null hypothesis!

  2. Relying on an information criterion

    To circumvent the issue of producing too large models, one may choose the lag order that minimizes one of the following two information criteria:

    • The Bayes information criterion (BIC):

      \[BIC(p) = \log\left(\frac{SSR(p)}{T}\right) + (p + 1) \frac{\log(T)}{T}.\]

    • The Akaike information criterion (AIC):

      \[AIC(p) = \log\left(\frac{SSR(p)}{T}\right) + (p + 1) \frac{2}{T}.\]

    Both criteria are estimators of the optimal lag length \(p\). The lag order \(\widehat{p}\) that minimizes the respective criterion is called the BIC estimate or the AIC estimate of the optimal model order. The basic idea of both criteria is a trade-off: the \(SSR\) decreases as additional lags are added to the model, so the first term decreases, whereas the second term, a penalty for the number of estimated coefficients, increases as the lag order grows. One can show that the \(BIC\) is a consistent estimator of the true lag order while the \(AIC\) is not, which is due to the different penalty factors in the second term (\(\log(T)\) versus \(2\)). Nevertheless, both estimators are used in practice, and the \(AIC\) is sometimes used as an alternative when the \(BIC\) yields a model with “too few” lags.
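
To see why the two criteria can disagree, a short comparison of the penalty terms may help. This is a side calculation based only on the two formulas above, assuming both criteria are computed on the same sample of size \(T\):

\[BIC(p) - AIC(p) = (p + 1) \frac{\log(T) - 2}{T}.\]

This difference is positive whenever \(\log(T) > 2\), that is, for \(T \geq 8\), and it grows with \(p\). Each additional lag is therefore penalized more heavily by the \(BIC\), so the \(BIC\) estimate of the lag order is never larger than the \(AIC\) estimate as long as both minimizers are unique.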

The function dynlm() does not compute information criteria by default. We will therefore write a short function that reports the \(BIC\) (along with the chosen lag order \(p\) and \(\bar{R}^2\)) for objects of class dynlm.

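# compute BIC for AR model objects of class 'dynlm'
# (note: defined in the global environment, this masks stats::BIC())
BIC <- function(model) {
  
  ssr <- sum(residuals(model)^2)      # sum of squared residuals
  n_obs <- length(residuals(model))   # number of observations used, T
  npar <- length(coef(model))         # number of coefficients incl. intercept
  
  return(
    round(c("p" = npar - 1,
            "BIC" = log(ssr / n_obs) + npar * log(n_obs) / n_obs,
            "Adj.R2" = summary(model)$adj.r.squared), 4)
  )
}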
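
The \(AIC\) from the formula above can be computed in the same way. The following is a minimal sketch, not part of the original text: the helper name AIC_ar() is hypothetical, and the illustration uses simulated AR(2) data fitted with lm() rather than the GDP growth series, so that it runs without any additional packages. Since it only relies on residuals() and coef(), it should also be applicable to models estimated with dynlm().

```r
# hypothetical helper: AIC as defined above for a fitted 'lm'-type model
AIC_ar <- function(model) {
  ssr <- sum(residuals(model)^2)      # sum of squared residuals
  n_obs <- length(residuals(model))   # number of observations used, T
  npar <- length(coef(model))         # number of coefficients incl. intercept
  round(c("p" = npar - 1,
          "AIC" = log(ssr / n_obs) + npar * 2 / n_obs), 4)
}

# illustration on simulated AR(2) data (assumption: not the GDP series)
set.seed(1)
y <- arima.sim(model = list(ar = c(0.5, 0.2)), n = 200)
lags <- embed(as.numeric(y), 3)       # columns: y_t, y_{t-1}, y_{t-2}
fit <- lm(lags[, 1] ~ lags[, 2:3])    # AR(2) regression with intercept
AIC_ar(fit)
```

The only difference to the BIC() function is the penalty factor, which is \(2\) instead of \(\log(T)\).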

Table 14.3 of the book presents a breakdown of how the \(BIC\) is computed for AR(\(p\)) models of GDP growth with order \(p=1,\dots,6\). The final result can easily be reproduced using sapply() and the function BIC() defined above.

# apply the BIC() to an intercept-only model of GDP growth
BIC(dynlm(ts(GDPGR_level) ~ 1))
#>      p    BIC Adj.R2 
#> 0.0000 2.4394 0.0000

# loop BIC over models of different orders
order <- 1:6

BICs <- sapply(order, function(x) 
        BIC(dynlm(ts(GDPGR_level) ~ L(ts(GDPGR_level), 1:x))))

BICs
#>          [,1]   [,2]   [,3]   [,4]   [,5]   [,6]
#> p      1.0000 2.0000 3.0000 4.0000 5.0000 6.0000
#> BIC    2.3486 2.3475 2.3774 2.4034 2.4188 2.4429
#> Adj.R2 0.1099 0.1339 0.1303 0.1303 0.1385 0.1325

Note that increasing the lag order increases \(R^2\) because the \(SSR\) decreases as additional lags are added to the model. \(\bar{R}^2\) adjusts for this by taking the number of parameters into account, and it is largest for the AR(\(5\)) model. However, according to the \(BIC\), we should settle for the AR(\(2\)) model instead of the AR(\(5\)) model. The \(BIC\) thus helps us decide whether the decrease in \(SSR\) is large enough to justify adding another regressor.

If we had to compare a bigger set of models, a convenient way to select the model with the lowest \(BIC\) is using the function which.min().

# select the AR model with the smallest BIC
BICs[, which.min(BICs[2, ])]
#>      p    BIC Adj.R2 
#> 2.0000 2.3475 0.1339

The \(BIC\) may also be used to select lag lengths in time series regression models with multiple predictors. In a model with \(K\) coefficients, including the intercept, we have \[\begin{align*} BIC(K) = \log\left(\frac{SSR(K)}{T}\right) + K \frac{\log(T)}{T}. \end{align*}\] Notice that choosing the optimal model according to the \(BIC\) can be computationally demanding because there may be many different combinations of lag lengths when there are multiple predictors.
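
To illustrate how quickly the number of candidate models grows, the following sketch searches over all combinations of two lag lengths. It is an illustration under stated assumptions and not part of the original analysis: the helper bic_k() is a hypothetical name, the two series are simulated stand-ins rather than the GDP growth and term spread data, and the models are fitted with lm() on lag matrices built by embed() so that every model uses the same observations. With a maximum lag of 4 for each variable there are already \(4^2 = 16\) models to estimate; with a maximum lag of 12 there would be 144.

```r
# hypothetical helper: BIC(K) as defined above for a fitted 'lm'-type model
bic_k <- function(model) {
  ssr <- sum(residuals(model)^2)
  n_obs <- length(residuals(model))
  k <- length(coef(model))            # coefficients including the intercept
  log(ssr / n_obs) + k * log(n_obs) / n_obs
}

# simulated stand-ins for the two series (assumption: not the GDP data)
set.seed(123)
n <- 200
x <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))
y <- numeric(n)
for (i in 2:n) y[i] <- 0.3 * y[i - 1] + 0.5 * x[i - 1] + rnorm(1)

# lag matrices: column j + 1 holds the series lagged by j periods
max_lag <- 4
Y <- embed(y, max_lag + 1)
X <- embed(x, max_lag + 1)

# evaluate every (p, q) combination on the same sample
grid <- expand.grid(p = 1:max_lag, q = 1:max_lag)
grid$BIC <- mapply(function(p, q) {
  fit <- lm(Y[, 1] ~ cbind(Y[, 2:(p + 1)], X[, 2:(q + 1)]))
  bic_k(fit)
}, grid$p, grid$q)

# combination with the smallest BIC
grid[which.min(grid$BIC), ]
```

Building all lags up front and dropping the first max_lag observations keeps \(T\) identical across models, so the criterion values are directly comparable.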

To give an example, we estimate ADL(\(p\),\(q\)) models of GDP growth where, as above, the additional variable is the term spread between short-term and long-term bonds. We impose the restriction that \(p=q_1=\dots=q_k\) so that only \(p_{max}\) models (\(p=1,\dots,p_{max}\)) need to be estimated. In the example below we choose \(p_{max} = 12\).

# loop 'BIC()' over multiple ADL models 
order <- 1:12

BICs <- sapply(order, function(x) 
         BIC(dynlm(GDPGrowth_ts ~ L(GDPGrowth_ts, 1:x) + L(TSpread_ts, 1:x), 
                   start = c(1962, 1), end = c(2012, 4))))

BICs
#>          [,1]   [,2]   [,3]   [,4]    [,5]    [,6]    [,7]    [,8]    [,9]
#> p      2.0000 4.0000 6.0000 8.0000 10.0000 12.0000 14.0000 16.0000 18.0000
#> BIC    2.3411 2.3408 2.3813 2.4181  2.4568  2.5048  2.5539  2.6029  2.6182
#> Adj.R2 0.1332 0.1692 0.1704 0.1747  0.1773  0.1721  0.1659  0.1586  0.1852
#>          [,10]   [,11]   [,12]
#> p      20.0000 22.0000 24.0000
#> BIC     2.6646  2.7205  2.7664
#> Adj.R2  0.1864  0.1795  0.1810

By the definition of BIC(), the row p reports the number of estimated coefficients excluding the intercept. For ADL models with \(p=q\), the lag order is thus obtained by dividing this number by 2.

# select the ADL model with the smallest BIC
BICs[, which.min(BICs[2, ])]
#>      p    BIC Adj.R2 
#> 4.0000 2.3408 0.1692

The \(BIC\) favors the ADL(\(2\),\(2\)) model (14.5) we have estimated before.