YaRrr! The Pirate’s Guide to R

14.4 Ex: Two-way ANOVA

To conduct a two-way ANOVA or a Menage a trois NOVA, just include additional independent variables in the regression model formula with the + sign. That’s it. All the steps are the same. Let’s conduct a two-way ANOVA with both cleaner and type as independent variables. To do this, we’ll set formula = time ~ cleaner + type.

# Step 1: Create ANOVA object with aov()
cleaner.type.aov <- aov(formula = time ~ cleaner + type,
                        data = poopdeck)

# Step 2: Get ANOVA table with summary()
summary(cleaner.type.aov)
##              Df Sum Sq Mean Sq F value Pr(>F)    
## cleaner       2   6057    3028    6.94  0.001 ** 
## type          1  81620   81620  187.18 <2e-16 ***
## Residuals   596 259891     436                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

It looks like we found significant effects of both independent variables.

# Step 3: Conduct post-hoc tests
TukeyHSD(cleaner.type.aov)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = time ~ cleaner + type, data = poopdeck)
## 
## $cleaner
##      diff   lwr  upr p adj
## b-a -0.42  -5.3  4.5  0.98
## c-a -6.94 -11.8 -2.0  0.00
## c-b -6.52 -11.4 -1.6  0.01
## 
## $type
##              diff lwr upr p adj
## shark-parrot   23  20  27     0

The only non-significant group difference we found is between cleaner b and cleaner a. All other comparisons were significant.

# Step 4: Look at regression coefficients
cleaner.type.lm <- lm(formula = time ~ cleaner + type,
                      data = poopdeck)

summary(cleaner.type.lm)
## 
## Call:
## lm(formula = time ~ cleaner + type, data = poopdeck)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -59.74 -13.79  -0.68  13.58  83.58 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    54.36       1.71   31.88  < 2e-16 ***
## cleanerb       -0.42       2.09   -0.20  0.84067    
## cleanerc       -6.94       2.09   -3.32  0.00094 ***
## typeshark      23.33       1.71   13.68  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 21 on 596 degrees of freedom
## Multiple R-squared:  0.252,  Adjusted R-squared:  0.248 
## F-statistic:   67 on 3 and 596 DF,  p-value: <2e-16

Now we need to interpret the results in respect to two default values (here, cleaner = a and type = parrot). The intercept means that the average time for cleaner a on parrot poop was 54.36 minutes. Additionally, the average time to clean shark poop was 23.33 minutes slower than when cleaning parrot poop.

14.4.1 ANOVA with interactions

Interactions between variables test whether or not the effect of one variable depends on another variable. For example, we could use an interaction to answer the question: Does the effect of cleaners depend on the type of poop they are used to clean? To include interaction terms in an ANOVA, just use an asterix (*) instead of the plus (+) between the terms in your formula. Note that when you include an interaction term in a regression object, R will automatically include the main effects as well/

Let’s repeat our previous ANOVA with two independent variables, but now we’ll include the interaction between cleaner and type. To do this, we’ll set the formula to time ~ cleaner * type.

# Step 1: Create ANOVA object with interactions
cleaner.type.int.aov <- aov(formula = time ~ cleaner * type,
                          data = poopdeck)

# Step 2: Look at summary table
summary(cleaner.type.int.aov)
##               Df Sum Sq Mean Sq F value  Pr(>F)    
## cleaner        2   6057    3028    7.82 0.00044 ***
## type           1  81620   81620  210.86 < 2e-16 ***
## cleaner:type   2  29968   14984   38.71 < 2e-16 ***
## Residuals    594 229923     387                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Looks like we did indeed find a significant interaction between cleaner and type. In other words, the effectiveness of a cleaner depends on the type of poop it’s being applied to. This makes sense given our plot of the data at the beginning of the chapter.

To understand the nature of the difference, we’ll look at the regression coefficients from a regression object:

# Step 4: Calculate regression coefficients
cleaner.type.int.lm <- lm(formula = time ~ cleaner * type,
                          data = poopdeck)

summary(cleaner.type.int.lm)
## 
## Call:
## lm(formula = time ~ cleaner * type, data = poopdeck)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -54.28 -12.83  -0.08  12.29  74.87 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           45.76       1.97   23.26  < 2e-16 ***
## cleanerb               8.06       2.78    2.90  0.00391 ** 
## cleanerc              10.37       2.78    3.73  0.00021 ***
## typeshark             40.52       2.78   14.56  < 2e-16 ***
## cleanerb:typeshark   -16.96       3.93   -4.31  1.9e-05 ***
## cleanerc:typeshark   -34.62       3.93   -8.80  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20 on 594 degrees of freedom
## Multiple R-squared:  0.338,  Adjusted R-squared:  0.333 
## F-statistic: 60.8 on 5 and 594 DF,  p-value: <2e-16

Again, to interpret this table, we first need to know what the default values are. We can tell this from the coefficients that are ‘missing’ from the table. Because I don’t see terms for cleanera or typeparrot, this means that cleaner = "a" and type = "parrot" are the defaults. Again, we can interpret the coefficients as differences between a level and the default. It looks like for parrot poop, cleaners b and c both take more time than cleaner a (the default). Additionally, shark poop tends to take much longer than parrot poop to clean (the estimate for typeshark is positive).

The interaction terms tell us how the effect of cleaners changes when one is cleaning shark poop. The negative estimate (-16.96) for cleanerb:typeshark means that cleaner b is, on average 16.96 minutes faster when cleaning shark poop compared to parrot poop. Because the previous estimate for cleaner b (for parrot poop) was just 8.06, this suggests that cleaner b is slower than cleaner a for parrot poop, but faster than cleaner a for shark poop. Same thing for cleaner c which simply has stronger effects in both directions.