7.7 Exercises
1. Hypothesis Testing in a Multiple Regression Model — \(t\)-statistics and \(p\)-values
Reconsider the Boston data set and the following estimated model (homoscedasticity-only standard errors in parentheses) from the previous chapter:
\[\widehat{medv}_i = \underset{(0.75)}{32.828} -\underset{(0.05)}{0.994} \times lstat_i -\underset{(0.04)}{0.083} \times crim_i + \underset{(0.01)}{0.038} \times age_i.\]
Just as in the simple linear regression framework we can conduct hypothesis tests about the coefficients in multiple regression models. The most common hypothesis is \(H_0:\beta_j=0\) against the alternative \(H_1:\beta_j\ne 0\) for some \(j\) in \(0,1,\dots,k\).
The packages AER and MASS have been loaded. The coefficient estimates as well as the corresponding standard errors are available in coefs and SEs, respectively.
Instructions:
Use vector arithmetic to solve the following tasks:
- Compute \(t\)-statistics for each coefficient by using the predefined objects coefs and SEs. Assign them to tstats.
- Compute \(p\)-values for each coefficient and assign them to pval.
- Check with the help of logical operators whether the hypotheses are rejected at the \(1\%\) significance level.
Hints:
- The \(t\)-statistic for each coefficient is defined as \(t=\frac{\widehat{\beta}_j-\beta_{j,0}}{SE(\widehat{\beta}_j)}\).
- The \(p\)-value for a two-sided test is computed as \(2\cdot\Phi(-|t^{act}|)\), where \(t^{act}\) denotes the computed \(t\)-statistic and \(\Phi\) is the CDF of the standard normal distribution.
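A minimal sketch of a possible solution, assuming coefs and SEs are numeric vectors ordered as in the model above:

```r
# t-statistics: estimates divided by their standard errors
# (the hypothesized values under the null are zero)
tstats <- coefs / SEs

# two-sided p-values based on the standard normal approximation
pval <- 2 * pnorm(-abs(tstats))

# TRUE entries indicate rejection at the 1% significance level
pval < 0.01
```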
2. Hypothesis Testing in a Multiple Regression Model - Confidence Intervals
Consider again the estimated model
\[\widehat{medv}_i = \underset{(0.75)}{32.828} -\underset{(0.05)}{0.994} \times lstat_i -\underset{(0.04)}{0.083} \times crim_i + \underset{(0.01)}{0.038} \times age_i,\]
which is available as the object mod in your working environment. The packages AER and MASS have been loaded.
Instructions:
- Construct \(99\%\) confidence intervals for all model coefficients. Use the intervals to decide whether the individual null hypotheses \(H_0:\beta_j=0\), \(j=0,1,2,3\), are rejected at the \(1\%\) level.
Hint:
- You may use confint() to construct confidence intervals. The confidence level can be set via the argument level.
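A sketch of a possible solution: a null \(H_0:\beta_j=0\) is rejected at the \(1\%\) level whenever the corresponding \(99\%\) interval does not cover zero.

```r
# 99% confidence intervals for all model coefficients
confint(mod, level = 0.99)
```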
3. Robust Hypothesis Testing in Multiple Regression Models
The lm object mod from the previous exercises is available in your working environment. The packages AER and MASS have been loaded.
Instructions:
- Print a coefficient summary that reports heteroscedasticity-robust standard errors.
- Access entries of the matrix generated by coeftest() to check whether the hypotheses are rejected at a \(1\%\) significance level. Use the logical operators < and >.
Hints:
- Supplying a robust covariance matrix estimate via the argument vcov. makes coeftest() report robust standard errors.
- The \(p\)-values are contained in the fourth column of the output generated by coeftest(). Use square brackets to subset the matrix accordingly.
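A possible solution sketch; the choice of HC1 as the robust covariance estimator is an assumption here (vcovHC() is provided by the sandwich package, which is loaded together with AER):

```r
# coefficient summary with heteroscedasticity-robust standard errors
robust_summary <- coeftest(mod, vcov. = vcovHC(mod, type = "HC1"))
robust_summary

# p-values sit in the fourth column; TRUE indicates rejection at the 1% level
robust_summary[, 4] < 0.01
```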
4. Joint Hypothesis Testing — \(F\)-Test I
Sometimes we are interested in testing joint hypotheses which impose restrictions on multiple regression coefficients. For example, in the model
\[medv_i = \beta_0 + \beta_1\times lstat_i + \beta_2\times crim_i + \beta_3\times age_i + u_i\]
we may test the null \(H_0: \beta_2=\beta_3\) vs. the alternative \(H_1: \beta_2\ne\beta_3\) (which is a joint hypothesis as we impose a restriction on two regression coefficients).
The basic idea behind testing such a hypothesis is to run two regressions and compare the outcomes: in one regression we impose the restriction formalized by the null (the restricted model), whereas in the other the restriction is left out (the unrestricted model). From this starting point we construct a test statistic which, under the null, follows a well-known distribution, an \(F\) distribution (see the next exercise).
However, in this exercise we start with the initial computations necessary to construct the test statistic.
The packages AER and MASS have been loaded.
Instructions:
- Estimate the restricted model, that is, the model where the restriction formalized by \(H_0: \beta_2=\beta_3\) is imposed. Save the model in model_res.
- Compute the \(SSR\) of the restricted model and assign it to RSSR.
- Estimate the unrestricted model, that is, the model where the restriction is not imposed. Save it in model_unres.
- Compute the \(SSR\) of the unrestricted model and assign it to USSR.
Hints:
- The restricted model can be written as \[medv_i = \beta_0 + \beta_1\times lstat_i + \beta_2\times crim_i + \beta_2\times age_i + u_i,\] which, after rearranging, can be expressed as \[medv_i = \beta_0 + \beta_1\times lstat_i + \beta_2\times(crim_i+age_i) + u_i.\]
- The \(SSR\) is defined as the sum of the squared residuals.
- The residuals of a regression model are available as residuals in the corresponding lm object, so you can access them as usual via the $-operator.
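A sketch of a possible solution, assuming the Boston data set from MASS is available in the working environment:

```r
# restricted model: the restriction beta_2 = beta_3 is imposed by
# using the combined regressor (crim + age)
model_res <- lm(medv ~ lstat + I(crim + age), data = Boston)

# SSR of the restricted model
RSSR <- sum(model_res$residuals^2)

# unrestricted model: no restriction imposed
model_unres <- lm(medv ~ lstat + crim + age, data = Boston)

# SSR of the unrestricted model
USSR <- sum(model_unres$residuals^2)
```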
5. Joint Hypothesis Testing — \(F\)-Test II
After estimating the models and computing the \(SSR\)s you now have to compute the test statistic and conduct the \(F\)-test. As mentioned in the last exercise, the test statistic follows an \(F\) distribution. More precisely, we deal with the \(F_{q,n-k-1}\) distribution where \(q\) denotes the number of restrictions under the null and \(k\) is the number of regressors in the unrestricted model, excluding the intercept.
The packages AER and MASS have been loaded. Both models (model_res and model_unres) as well as their \(SSR\)s (RSSR and USSR) are available in your working environment.
Instructions:
- Compute the \(F\)-statistic and assign it to Fstat.
- Compute the \(p\)-value and assign it to pval.
- Check whether the null is rejected at the \(1\%\) level using logical operators.
- Verify your result by using linearHypothesis() and printing the results.
Hints:
- The \(F\)-statistic is defined as \(\frac{(RSSR-USSR)/q}{USSR/(n-k-1)}\).
- The \(p\)-value can be computed as \(1-F_{q,n-k-1}(F^{act})\), where \(F_{q,n-k-1}\) denotes the CDF of the \(F\) distribution with \(q\) and \(n-k-1\) degrees of freedom (pf()) and \(F^{act}\) is the computed \(F\)-statistic.
- linearHypothesis() expects the unrestricted model as well as the null hypothesis as arguments.
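A sketch of a possible solution; here \(q=1\) (one restriction) and \(k=3\) (regressors in the unrestricted model), and the sample size is taken from the Boston data set:

```r
q <- 1               # number of restrictions under the null
n <- nrow(Boston)    # number of observations
k <- 3               # regressors in the unrestricted model (excl. intercept)

# F-statistic
Fstat <- ((RSSR - USSR) / q) / (USSR / (n - k - 1))

# p-value from the F distribution
pval <- 1 - pf(Fstat, df1 = q, df2 = n - k - 1)

# TRUE indicates rejection at the 1% level
pval < 0.01

# verification: linearHypothesis() conducts the same F-test
linearHypothesis(model_unres, "crim = age")
```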
6. Joint Hypothesis Testing - Confidence Set
As you know from previous chapters, constructing a confidence set for a single regression coefficient results in a simple confidence interval on the real line. However, if we consider \(n\) regression coefficients jointly (as we do in a joint hypothesis testing setting) we move from \(\mathbb{R}\) to \(\mathbb{R}^n\), resulting in an \(n\)-dimensional confidence set. For the sake of illustration we then often choose \(n=2\), so that the confidence set can be represented in a two-dimensional plane.
Recall the estimated model
\[\widehat{medv}_i = \underset{(0.75)}{32.828} -\underset{(0.05)}{0.994} \times lstat_i -\underset{(0.04)}{0.083} \times crim_i + \underset{(0.01)}{0.038} \times age_i,\]
which is available as mod in your working environment. Assume you want to test the null \(H_0: \beta_2=\beta_3=0\) vs. \(H_1: \beta_2\ne 0\) or \(\beta_3\ne 0\).
The packages AER and MASS have been loaded.
Instructions:
- Construct a \(99\%\) confidence set for the coefficients of crim and age (the coefficients restricted under the null), that is, a two-dimensional confidence set. Can you reject the null stated above?
- Verify your visual inspection by conducting the corresponding \(F\)-test.
Hints:
- Use confidenceEllipse() to construct a two-dimensional confidence set. Besides the coefficients for which the confidence set shall be constructed (which.coef), you have to specify the confidence level (levels).
- As usual, you can use linearHypothesis() to conduct the \(F\)-test. Note that there are two restrictions now, so you have to pass a vector containing both restrictions.
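A sketch of a possible solution (confidenceEllipse() comes from the car package, which is loaded together with AER):

```r
# two-dimensional 99% confidence set for the coefficients of crim and age
confidenceEllipse(mod, which.coef = c("crim", "age"), levels = 0.99)

# the null is rejected at the 1% level if the origin (0, 0)
# lies outside the ellipse

# corresponding F-test with both restrictions
linearHypothesis(mod, c("crim = 0", "age = 0"))
```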