10 Regression with Panel Data
Regression using panel data can mitigate omitted variable bias when the omitted variables, which correlate with both the regressors of interest and the dependent variable, are unobserved but constant either over time or across entities. When panel data are available, panel regression methods can therefore improve upon multiple regression models, which may produce results that lack internal validity in such a setting, as discussed in Chapter 9.
This chapter covers the following topics:
- notation for panel data,
- fixed effects regression using time and/or entity fixed effects,
- computation of standard errors in fixed effects regression models.
Following the book, the applications use the dataset Fatalities from the AER package (Kleiber and Zeileis 2008), a panel dataset of annual state-level observations on U.S. traffic fatalities for the period 1982 through 1988. The applications analyze whether alcohol taxes and drunk driving laws affect road fatalities and, if so, how strong these effects are.
We introduce plm(), a convenient R function from the package plm (Croissant, Millo, and Tappe 2023) that enables us to estimate linear panel regression models. Its usage is very similar to that of lm(), which we have used throughout the previous chapters to estimate simple and multiple regression models.
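To illustrate how closely the two interfaces resemble each other, the following minimal sketch fits the same formula once with lm() and once with plm(). It assumes that AER and plm are installed. The variable fatal_rate (traffic deaths per 10,000 inhabitants) and the object names fit_lm and fit_plm are constructed here purely for illustration, and the choice of model = "within" with index = c("state", "year") is one possible specification rather than the only one.

```r
library(AER)  # provides the Fatalities dataset
library(plm)  # provides plm()

data("Fatalities")

# Illustrative outcome: traffic deaths per 10,000 inhabitants
# (fatal_rate is a name chosen for this sketch).
Fatalities$fatal_rate <- Fatalities$fatal / Fatalities$pop * 10000

# Pooled regression with lm(), ignoring the panel structure.
fit_lm <- lm(fatal_rate ~ beertax, data = Fatalities)

# Same formula with plm(): index names the entity and time
# identifiers, model = "within" requests entity fixed effects.
fit_plm <- plm(fatal_rate ~ beertax,
               data  = Fatalities,
               index = c("state", "year"),
               model = "within")

coef(fit_lm)   # intercept and slope
coef(fit_plm)  # slope only; the within transformation removes the intercept
```

The only additions relative to the lm() call are the index and model arguments, which tell plm() how the panel is organized and which estimator to use.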
The following packages and their dependencies are needed for reproduction of the code chunks presented throughout this chapter on your computer:
- AER
- plm
- stargazer
Check whether the following code chunk runs without any errors.
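A minimal chunk for this check, assuming the three packages listed above have already been installed (for example with install.packages()), simply attaches each of them:

```r
# Attach the packages used in this chapter; an error here
# usually means the package has not been installed yet.
library(AER)
library(plm)
library(stargazer)
```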