4.6 Pushing Two-Way ANOVA to the limit: Un-replicated designs and Estimability

In some situations, it is too expensive or impossible to replicate combinations of treatments and only one observation at each combination of the two explanatory variables, A and B, is possible. In these situations, even though we have information about all combinations of A and B, it is no longer possible to test for an interaction. Our regular rules for degrees of freedom show that we have nothing left for the error degrees of freedom and so we have to drop the interaction and call that potential interaction variability “error”.

Without replication we can still perform an analysis of the responses and estimate all the coefficients in the interaction model but an issue occurs with trying to calculate the interaction \(F\)-test statistic – we run out of degrees of freedom for the error. To illustrate these methods, the paper towel example is revisited except that only one response for each combination is used. Now the entire data set can be easily printed out:

## # A tibble: 6 x 4
##   brand drops responses dropsf
##   <fct> <dbl>     <dbl> <fct> 
## 1 B1       10     1.91  10    
## 2 B2       10     3.05  10    
## 3 B1       20     0.774 20    
## 4 B2       20     2.84  20    
## 5 B1       30     1.56  30    
## 6 B2       30     0.547 30

Upon first inspection the interaction plot in Figure 2.69 looks like there might be some interesting interactions present with lines that look to be non-parallel. But remember now that there is only a single observation at each combination of the brands and water levels so there is not much power to detect differences in this sort of situation and no replicates at any combinations of levels that allow estimation of SEs so no bands are produced in the plot.

Interaction plot in paper towel data set with no replication.

Figure 2.69: Interaction plot in paper towel data set with no replication.

The next step would be to assess evidence related to the null hypothesis of no interaction between Brand and Drops. A problem will arise in trying to form the ANOVA table as you would see this when you run the anova79 function on the interaction model:

> anova(lm(responses~dropsf*brand,data=ptR))
Analysis of Variance Table
Response: responses
             Df  Sum Sq Mean Sq F value Pr(>F)
dropsf        2 2.03872 1.01936               
brand         1 0.80663 0.80663               
dropsf:brand  2 2.48773 1.24386               
Residuals     0 0.00000                       
Warning message:
In anova.lm(lm(responses ~ dropsf * brand, data = ptR)) :
  ANOVA F-tests on an essentially perfect fit are unreliable

Warning messages in R output show up after you run functions that contain problems and are generally not a good thing, but can sometimes be ignored. In this case, the warning message is not needed – there are no \(F\)-statistics or p-values in the results so we know there are some issues with the results. The Residuals line is key here – Residuals with 0 df and sums of squares of 0. Without replication, there are no degrees of freedom left to estimate the residual error. My first statistics professor, Dr. Gordon Bril at Luther College, used to refer to this as “shooting your load” by fitting too many terms in the model given the number of observations available. Maybe this is a bit graphic but hopefully will help you remember the need for replication if you want to test for interactions – it did for me. Without replication of observations, we run out of information to test all the desired model components.

So what can we do if we can’t afford replication but want to study two variables in the same study? We can assume that the interaction does not exist and use those degrees of freedom and variability as the error variability. When we drop the interaction from Two-Way models, the interaction variability is added into the \(\text{SS}_E\) so we assume that the interaction variability is really just “noise”, which may not actually be true. We are not able to test for an interaction so must rely on the interaction plot to assess whether an interaction might be present. Figure 2.68 suggests there might be an interaction in these data (the two brands’ lines suggesting non-parallel lines). So in this case, assuming no interaction is present is hard to justify. But if we proceed under this dangerous and untestable assumption, tests for the main effects can be developed.

## Anova Table (Type II tests)
## 
## Response: responses
##            Sum Sq Df F value Pr(>F)
## dropsf    2.03872  2  0.8195 0.5496
## brand     0.80663  1  0.6485 0.5052
## Residuals 2.48773  2

In the additive model, the last row of the ANOVA table that is called the Residuals row is really the interaction row from the interaction model ANOVA table. Neither main effect had a small p-value (Drops: \(F(2,2)=0.82, \text{ p-value}=0.55\) and Brand: \(F(1,2)=0.65, \text{ p-value}=0.51\)) in the additive model. To get small p-values with the small sample sizes that unreplicated designs would generate, the differences would need to be very large because the residual degrees of freedom have become very small. The term-plots in Figure 2.70 show that the differences among the levels are small relative to the residual variability as seen in the error bars around each point estimate.

(ref:fig4-22) Term-plots for the additive model in paper towel data set with no replication.

(ref:fig4-22)

Figure 2.70: (ref:fig4-22)

In the extreme unreplicated situation it is possible to estimate all model coefficients in the interaction model but we can’t do inferences for those estimates since there is no residual variability. Another issue in really any model with categorical predictors but especially noticeable in the Two-Way ANOVA situation is estimability issues. Instead of having issues with running out of degrees of freedom for tests we can run into situations where we do not have information to estimate some of the model coefficients. This happens any time you fail to have observations at either a level of a main effect or at a combination of levels in an interaction model.

To illustrate estimability issues, we will revisit the overtake data. Each of the seven levels of outfits was made up of a combination of different characteristics of the outfits, such as which helmet and pants were chosen, whether reflective leg clips were worn or not, etc. To see all these additional variables, we will introduce a new plot that will feature more prominently in Chapter ?? that allows us to explore relationships among a suite of categorical variables – the tableplot from the tabplot package (Tennekes and de Jonge 2019). It allows us to sort the variables based on a single variable (think about how you might sort a spreadsheet based on one column and look at the results in other columns). The tableplot function displays bars for each response in a row80 based on the category of responses or as a bar with the height corresponding the value of quantitative variables81. It also plots a red cell if the observations were missing for a categorical variable and in grey for missing values on quantitative variables. The plot can be obtained simply as tableplot(DATASETNAME) which will sort the data set based on the first variable. To use our previous work with the sorted levels of Condition2, the code dd[,-1] is used to specify the data set without Condition and then sort=Condition2 is used to sort based on the Condition2 variable. The pals=list("BrBG") option specifies a color palette for the plot that is color-blind friendly.

Tableplot of the full overtake data set sorted by outfit worn (Condition2).

Figure 2.71: Tableplot of the full overtake data set sorted by outfit worn (Condition2).

In the tableplot in Figure 2.71, we can now the six variables created related to aspects of each outfit. For example, the commuter helmet (darkest shade in Helmet column) was worn with all outfits except for the racer and casual. So maybe we would like to explore differences in overtake distances based on the type of helmet worn. Similarly, it might be nice to explore whether wearing reflective pant clips is useful and maybe there is an interaction between helmet type and leg clips on impacts on overtake distance (should we wear both or just one, for example). So instead of using the seven level Condition2 in the model to assess differences based on all combinations of these outfits delineated in the other variables, we can try to fit a model with Helmet and ReflectClips and their interaction for overtake distances:

## 
## Call:
## lm(formula = Distance ~ Helmet * ReflectClips, data = dd)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -115.111  -17.756   -0.611   16.889  156.889 
## 
## Coefficients: (3 not defined because of singularities)
##                            Estimate Std. Error t value Pr(>|t|)
## (Intercept)                117.1106     0.4710 248.641   <2e-16
## Helmethat                    0.5004     1.1738   0.426    0.670
## Helmetrace                  -0.3547     1.1308  -0.314    0.754
## ReflectClipsyes                  NA         NA      NA       NA
## Helmethat:ReflectClipsyes        NA         NA      NA       NA
## Helmetrace:ReflectClipsyes       NA         NA      NA       NA
## 
## Residual standard error: 30.01 on 5687 degrees of freedom
## Multiple R-squared:  5.877e-05,  Adjusted R-squared:  -0.0002929 
## F-statistic: 0.1671 on 2 and 5687 DF,  p-value: 0.8461

The full model summary shows some odd things. First there is a warning after Coefficients of (3 not defined because of singularities). And then in the coefficient table, there are NAs for everything in the rows for ReflectClipsyes and the two interaction components. When lm encounters models where the data measured are not sufficient to estimate the model, it essentially drops parts of the model that you were hoping to estimate and only estimates what it can. In this case, it just estimates coefficients for the intercept and two deviation coefficients for Helmet types; the other three coefficients (\(\gamma_2\) and the two \(\omega\)s) are not estimable. This reinforces the need to check coefficients in any model you are fitting. A tally of the counts of observations across the two explanatory variables helps to understand the situation and problem:

##           ReflectClips
## Helmet       no  yes
##   commuter    0 4059
##   hat       779    0
##   race        0  852

There are three combinations that have \(n_{jk}=0\) observations (for example for the commuter helmet, clips were always worn so no observations were made with this helmet without clips). So we have no hope of estimating a mean for the combinations with 0 observations and these are needed to consider interactions. If we revisit the tableplot, we can see how some of these needed combinations do not occur together. So this is an unbalanced design but also lack necessary information to explore the potential research question of interest. In order to study these two variables and their interaction, the researchers would have had to do rides with all six combinations of these variables. This could be quite informative because it could help someone tailor their outfit choice for optimal safety but also would have created many more than seven different outfit combinations to wear.

Hopefully by pushing the limits there are three conclusions available from this section. First, replication is important, both in being able to perform tests for interactions and for having enough power to detect differences for the main effects. Second, when dropping from the interaction model to additive model, the variability explained by the interaction term is pushed into the error term, whether replication is available or not. Third, we need to make sure we have observations at all combinations of variables if we want to be able to estimate models using them and their interaction.

References

Tennekes, Martijn, and Edwin de Jonge. 2019. Tabplot: Tableplot, a Visualization of Large Datasets. https://CRAN.R-project.org/package=tabplot.


  1. We switched back to the anova function here as the Anova function only reports Error in Anova.lm(lm(responses ~ dropsf * brand, data = ptR)) : residual df = 0, which is fine but not as useful for understanding this issue as what anova provides.

  2. In larger data sets, multiple subjects are displayed in each row as proportions of the rows in each category.

  3. Quantitative variables are displayed with boxplot-like bounds to describe the variability in the variable for that row of responses for larger data sets.