13.5 Exercises
The following exercises guide you through reproducing some of the results of one of the most famous DID studies, Card and Krueger (1994). The authors use geography as the “as if” random treatment assignment to study the effect of a 1992 increase in New Jersey’s state minimum wage on employment in fast food restaurants; see Chapter 13.4.
The study is based on survey data collected in February 1992 and in November 1992, after New Jersey’s minimum wage rose by \(\$0.80\) from \(\$4.25\) to \(\$5.05\) in April 1992.
Estimating the effect of the wage increase simply by computing the change in employment in New Jersey (as you are asked to do in Exercise 3) would fail to control for omitted variables. By using Pennsylvania as a control in a difference-in-differences (DID) model one can control for variables with a common influence on New Jersey (treatment group) and Pennsylvania (control group). This reduces the risk of omitted variable bias enormously and even works when these variables are unobserved.
For the DID approach to work we must assume that New Jersey and Pennsylvania have parallel trends over time, i.e., we assume that the (unobserved) factors influence employment in Pennsylvania and New Jersey in the same manner. This allows us to interpret an observed change in employment in Pennsylvania as the change New Jersey would have experienced if there had been no increase in the minimum wage (and vice versa).
Contrary to what standard economic theory would suggest, the authors did not find evidence that the increased minimum wage raised unemployment in New Jersey using the DID approach. Quite the opposite: their results suggest that the \(\$0.80\) minimum wage increase in New Jersey led to an increase in employment of 2.75 full-time equivalents (FTE).
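In terms of sample means, the DID idea described above can be sketched as follows. The notation \(\overline{Y}\) for mean FTE employment in the indicated state and survey wave is introduced here for illustration only:

\[\hat\beta^{DID} = \left(\overline{Y}^{NJ}_{after} - \overline{Y}^{NJ}_{before}\right) - \left(\overline{Y}^{PA}_{after} - \overline{Y}^{PA}_{before}\right).\]

Under the parallel trends assumption, the second bracket estimates the change New Jersey would have experienced without the wage increase, so subtracting it removes the time effect common to both states.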
1. The Data from Card & Krueger (1994)
fastfood.dta, the dataset used by Card & Krueger (1994), can be downloaded here. See this link for a detailed explanation of the variables.
This exercise asks you to import the dataset into R and to perform some formatting necessary for the subsequent analysis. This can be tedious using base R functions but is easily done using the dplyr package introduced in Chapter 3.6.
The URL to the dataset is saved in data_URL.
Instructions:
Attach the packages dplyr and foreign.
Read in the dataset fastfood.dta using data_URL and assign it to a data.frame named dat.
In their study, Card & Krueger (1994) measure employment in full-time equivalents, which they define as the number of full-time employees (empft and empft2) plus the number of managers (nmgrs and nmgrs2) plus 0.5 times the number of part-time employees (emppt and emppt2).
Define full-time equivalent employment before (FTE) and after the wage increase (FTE2) and add both variables to dat.
Hints:
read.dta() from the foreign package reads .dta files, a format used by the statistical software package Stata.
mutate() generates new columns using existing ones.
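The FTE definition above can be sketched on a small made-up data frame. The object toy and all of its values are hypothetical stand-ins for dat (they are not taken from fastfood.dta), and base R's transform() is used so that the sketch runs without extra packages; with dplyr attached, mutate() accepts the same column definitions.

```r
# Toy stand-in for the survey data: two restaurants, made-up head counts
toy <- data.frame(
  empft  = c(10, 4), nmgrs  = c(2, 3), emppt  = c(8, 20),  # first survey wave
  empft2 = c(12, 5), nmgrs2 = c(2, 3), emppt2 = c(6, 18)   # second survey wave
)

# FTE = full-time employees + managers + 0.5 * part-time employees
toy <- transform(
  toy,
  FTE  = empft  + nmgrs  + 0.5 * emppt,
  FTE2 = empft2 + nmgrs2 + 0.5 * emppt2
)

toy[, c("FTE", "FTE2")]
```

For the first toy restaurant this gives 10 + 2 + 0.5 · 8 = 16 full-time equivalents in the first wave.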
2. State Specific Estimates of Full-Time Employment — I
This exercise asks you to perform a quick calculation of state specific sample means in order to check whether our data on full-time employment is in alignment with the data used by Card & Krueger (1994).
Instructions:
Generate subsets of dat to separate observations for New Jersey and Pennsylvania. Save them as dat_NJ and dat_PA.
Compute sample means of full-time employment equivalents for New Jersey and Pennsylvania both before and after the minimum wage increase in New Jersey. It suffices if your code prints the correct values to the console.
Hints:
- You may use group_by() in conjunction with summarise() to compute groupwise means. Both functions come with the dplyr package.
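As a rough illustration of groupwise means, the following base R sketch uses a hypothetical data frame toy with made-up values and a state column coded as in Exercise 4 (1 for New Jersey, 0 for Pennsylvania). It assumes that some entries may be missing, which is why na.rm = TRUE is passed; the dplyr route via group_by() and summarise() would follow the same logic.

```r
# Made-up FTE values for three "New Jersey" and two "Pennsylvania" restaurants
toy <- data.frame(
  state = c(1, 1, 1, 0, 0),
  FTE   = c(20, 18, NA, 24, 22),
  FTE2  = c(21, 19, 20, 21, NA)
)

# Split the two FTE columns by state and average each column,
# ignoring missing values
means <- sapply(split(toy[c("FTE", "FTE2")], toy$state), colMeans, na.rm = TRUE)

means  # rows: FTE, FTE2; columns: state "0" and state "1"
```

Without na.rm = TRUE, any group containing a missing value would return NA instead of a mean.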
3. State Specific Estimates of Full-Time Employment — II
A naive approach to investigate the impact of the minimum wage increase on employment is to use the estimated difference in mean employment before and after the wage increase for New Jersey fast food restaurants.
This exercise asks you to compute this difference and to test whether it is significantly different from zero using a robust \(t\)-test.
The subsets dat_NJ and dat_PA from the previous exercise are available in your working environment.
Instructions:
- Use dat_NJ for a robust test of the hypothesis that there is no difference in full-time employment before and after the wage hike in New Jersey at the level of \(5\%\).
Hints:
- The testing problem amounts to a two-sample \(t\)-test which is conveniently done using t.test().
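A minimal sketch of such a test on simulated numbers is given below. The vectors before and after are hypothetical stand-ins for the FTE and FTE2 columns of dat_NJ. By default t.test() uses var.equal = FALSE, i.e., it does not assume equal variances in the two samples.

```r
set.seed(1)

# Simulated stand-ins for FTE employment before and after the wage increase
before <- rnorm(50, mean = 20, sd = 8)
after  <- rnorm(50, mean = 21, sd = 9)

# Two-sample t-test of H0: no difference in means (unequal variances allowed)
tt <- t.test(after, before)

# Estimated difference in means (after minus before) and test decision at 5%
diff_means <- unname(tt$estimate[1] - tt$estimate[2])
reject     <- tt$p.value < 0.05

tt
```

The null hypothesis is rejected at the \(5\%\) level if the reported \(p\)-value is below 0.05.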
4. Preparing the Data for Regression
The estimations done in Exercise 3 and the difference-in-differences approach we are working towards can be shown to produce the same results as OLS applied to specific regression models; see Chapters 13.1 and 3.6.
This exercise asks you to construct a dataset which is more convenient for this purpose than the dataset dat.
Instructions:
Generate the dataset reg_dat from dat in long format, i.e., make sure that for each restaurant (identified by sheet) one observation before and one after the minimum wage increase (identified by D) are included.
Only consider the following variables:
id: sheet number (unique store id)
chain: chain 1=Burger King; 2=KFC; 3=Roy Rogers; 4=Wendy’s
state: 1 if New Jersey; 0 if Pennsylvania
empl: measure of full-time employment (FTE / FTE2)
D: dummy indicating if the observation was made before or after the minimum wage increase in New Jersey.
Hints:
The original dataset dat has 410 observations of 48 variables (check this using dim(dat)). The dataset reg_dat you are asked to generate must consist of 820 observations of the variables listed above.
It is straightforward to generate a data.frame from the columns of another data.frame using data.frame(…).
Use rbind() to combine two objects of type data.frame by row.
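The reshaping step can be sketched on a hypothetical three-restaurant data frame toy (made-up values; only the column names follow the exercise). One data.frame is built per survey wave and the two are stacked with rbind(), so that every restaurant appears twice.

```r
# Wide toy data: one row per restaurant
toy <- data.frame(
  sheet = c(1, 2, 3),
  chain = c(1, 2, 4),
  state = c(1, 1, 0),
  FTE   = c(20, 18, 24),
  FTE2  = c(21, 19, 22)
)

# One block per wave; D = 0 marks "before", D = 1 marks "after"
before <- data.frame(id = toy$sheet, chain = toy$chain, state = toy$state,
                     empl = toy$FTE,  D = 0)
after  <- data.frame(id = toy$sheet, chain = toy$chain, state = toy$state,
                     empl = toy$FTE2, D = 1)

# Stack the waves: long format with twice as many rows as toy
reg_toy <- rbind(before, after)
dim(reg_toy)
```

Both blocks must have identical column names, otherwise rbind() fails for data frames.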
5. A Difference Estimate using Data from Card & Krueger (1994) — I
reg_dat from Exercise 4 is a panel dataset as it has two observations for each fast food restaurant \(i=1,\dots,410\), at time periods \(t=0,1\).
Thus we may write down the simple regression model
\[employment_{i,t} = \beta_0 + \beta_1 D_t + \varepsilon_{i,t},\]
where \(D_t\) is a dummy variable which equals \(0\) if the observation was made before the minimum wage change (\(t=0\)) and \(1\) after the minimum wage change (\(t=1\)), i.e.,
\[\begin{align*} D_t = \begin{cases} 0, & \, \text{if $t=0$ (before wage change),} \\ 1, & \, \text{if $t=1$ (after wage change)} \end{cases} \end{align*}\]
and assume that observations for New Jersey restaurants only are used in computing \(\hat\beta_1\), the OLS estimator of \(\beta_1\), which is also called the differences estimator.
The dataset reg_dat from Exercise 4 and the New Jersey subset dat_NJ are available in your working environment.
Instructions:
Estimate \(\beta_1\) in the model above using OLS. Save the estimated model to emp_mod.
Obtain a robust summary of the results and interpret your findings.
Hints:
Remember that dependencies of the AER package include functions for robust inference on regression models.
The argument subset in lm() takes a logical vector which identifies observations used for estimation.
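The use of subset can be sketched on simulated long-format data. The data frame toy, its sample sizes and the coefficients used to generate empl are all assumptions made for illustration. The robust summary via coeftest() is left out here to keep the sketch in base R.

```r
set.seed(42)
n_nj <- 40   # hypothetical number of New Jersey restaurants
n_pa <- 20   # hypothetical number of Pennsylvania restaurants

# Long format: every restaurant appears once with D = 0 and once with D = 1
state <- c(rep(1, n_nj), rep(0, n_pa))
toy <- data.frame(
  id    = rep(seq_len(n_nj + n_pa), times = 2),
  state = rep(state, times = 2),
  D     = rep(c(0, 1), each = n_nj + n_pa)
)
toy$empl <- 20 + 0.5 * toy$D + rnorm(nrow(toy), sd = 5)

# Regress employment on the time dummy using New Jersey rows only
toy_mod <- lm(empl ~ D, data = toy, subset = state == 1)

# The slope equals the difference in New Jersey means, after minus before
diff_nj <- with(toy, mean(empl[state == 1 & D == 1]) -
                     mean(empl[state == 1 & D == 0]))
c(slope = unname(coef(toy_mod)["D"]), diff_in_means = diff_nj)
```

The expression passed to subset is evaluated inside data, so state refers to the column toy$state.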
6. A Difference Estimate using Data from Card & Krueger (1994) — II
The estimate obtained using t.test() on the New Jersey subset in Exercise 3 and the OLS estimate \(\hat\beta_1\) in Exercise 5 are numerically the same. This also holds for the reported \(t\)-statistics if the same standard error formulas are used (t.test(…, var.equal = T) and coeftest(…, vcov. = vcovHC, type = "HC1")).
This exercise asks you to check that the above statement is true.
The data from the previous exercises, the result of t.test(…) from Exercise 3 as well as the regression model object emp_mod from Exercise 5 are available in your working environment. The AER package has been attached.
Instructions:
Check that the estimate of \(\beta_1\) in Exercise 5 is equal to the estimated difference in mean employment of New Jersey fastfood restaurants before and after the minimum wage increase from Exercise 3.
Convince yourself that the \(t\)-statistics reported by coeftest(…) in Exercise 5 and t.test(…) in Exercise 3 match.
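The equivalence can be checked on simulated data as sketched below. The data are made up and balanced (the same number of observations before and after). Instead of calling coeftest(), the HC1 covariance matrix is computed by hand from its usual sandwich formula with the degrees-of-freedom factor \(N/(N-k)\), so that the sketch needs base R only.

```r
set.seed(7)
n    <- 50                                   # observations per period
empl <- c(rnorm(n, 20, 8), rnorm(n, 21, 9))  # simulated employment
D    <- rep(c(0, 1), each = n)               # 0 = before, 1 = after

# Two-sample t-test with pooled variance and the corresponding regression
tt  <- t.test(empl[D == 1], empl[D == 0], var.equal = TRUE)
mod <- lm(empl ~ D)

# HC1 covariance matrix: N / (N - k) * (X'X)^-1 X' diag(e^2) X (X'X)^-1
X <- model.matrix(mod)
e <- resid(mod)
N <- nrow(X)
k <- ncol(X)
bread <- solve(crossprod(X))
meat  <- crossprod(X * e)   # row i of X is scaled by residual e_i
V_hc1 <- N / (N - k) * bread %*% meat %*% bread

# Compare point estimates and t-statistics
beta1   <- unname(coef(mod)["D"])
diff_tt <- unname(tt$estimate[1] - tt$estimate[2])
t_hc1   <- beta1 / sqrt(V_hc1["D", "D"])
t_tt    <- unname(tt$statistic)

all.equal(beta1, diff_tt)
all.equal(t_hc1, t_tt)
```

The point estimates always coincide. The two \(t\)-statistics agree here because both periods contain the same number of observations; if the group sizes differ, the pooled and the HC1 standard errors need not be identical.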
7. A Difference-in-Differences Estimate — II
As mentioned in Chapter 3.6, the approach discussed in Exercises 5 and 6 is naive: \(\hat\beta_1\) is a biased estimate of the average effect of the minimum wage increase on employment because we cannot control for other determinants of employment that correlate with \(D_t\). As an example, think about macro-economic developments which have a positive impact on the labor market such that employment is higher in the period after the minimum wage increase. It is likely that \(D_t\) is positively correlated with the error term such that \(\hat\beta_1\) overestimates the effect of the wage hike on employment.
This motivates usage of the difference-in-differences (DID) estimator outlined in Chapter 3.6.
Consider the linear regression model
\[employment_{i,t} = \beta_0 + \beta_1 D_t + \beta_2 state_i + \beta_3 (D_t \times state_i) + \varepsilon_{i,t},\]
where we use indices \(i\) and \(t\) just as in the simple regression model in Exercise 5. In this model, \(\beta_3\) is the coefficient of interest: it measures the average change in employment of New Jersey fast food restaurants from before to after the wage increase, net of the change caused by unobservables that are common to New Jersey and Pennsylvania, the control group. The OLS estimator of \(\beta_3\) is called a DID estimator.
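To see why \(\beta_3\) captures the difference-in-differences, it helps to write out the four conditional means implied by the model. The sketch below assumes \(E(\varepsilon_{i,t} \vert D_t, state_i) = 0\), an assumption added here for the derivation:

\[\begin{align*} E(employment_{i,t} \vert state_i = 0, D_t = 0) &= \beta_0, \\ E(employment_{i,t} \vert state_i = 0, D_t = 1) &= \beta_0 + \beta_1, \\ E(employment_{i,t} \vert state_i = 1, D_t = 0) &= \beta_0 + \beta_2, \\ E(employment_{i,t} \vert state_i = 1, D_t = 1) &= \beta_0 + \beta_1 + \beta_2 + \beta_3. \end{align*}\]

The change over time is \(\beta_1 + \beta_3\) in New Jersey and \(\beta_1\) in Pennsylvania, so the difference between the two changes is \((\beta_1 + \beta_3) - \beta_1 = \beta_3\).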
Instructions:
Estimate the above model using OLS and obtain a robust summary.
Interpret your findings.
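A sketch of the DID regression on simulated data follows. The data frame toy, the sample sizes and the coefficients used to generate empl (including the interaction effect of 2.5) are assumptions for illustration and are unrelated to the estimates of Card & Krueger (1994). The robust summary is again omitted so that the code runs in base R.

```r
set.seed(123)
n_nj <- 40   # hypothetical number of New Jersey restaurants
n_pa <- 20   # hypothetical number of Pennsylvania restaurants

state <- c(rep(1, n_nj), rep(0, n_pa))
toy <- data.frame(
  state = rep(state, times = 2),
  D     = rep(c(0, 1), each = n_nj + n_pa)
)

# Simulated employment with a common time effect and a treatment effect
toy$empl <- 20 - 1.5 * toy$D + 2 * toy$state +
  2.5 * toy$D * toy$state + rnorm(nrow(toy), sd = 4)

# D * state expands to D + state + D:state
did_mod <- lm(empl ~ D * state, data = toy)

# The interaction coefficient equals the difference-in-differences of means
cell <- with(toy, tapply(empl, list(state = state, D = D), mean))
did_means <- (cell["1", "1"] - cell["1", "0"]) -
             (cell["0", "1"] - cell["0", "0"])

c(ols = unname(coef(did_mod)["D:state"]), means = did_means)
```

Because the model contains one parameter for each of the four state-period cells, the OLS coefficient on the interaction term reproduces the difference-in-differences of the cell means exactly.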