This page was last updated on October 28, 2018.
In this tutorial you will learn how to analyze a numeric response variable in relation to a categorical explanatory variable that has more than two groups. In other words, we will learn how to compare the means of more than 2 groups.
When comparing the means of more than two groups, the method that should first be considered is called, somewhat confusingly, the Analysis of Variance (ANOVA).
This tutorial focuses on the fixed-effects ANOVA (also called Model-1 ANOVA), in which the different categories of the explanatory variable are predetermined, of direct interest, and repeatable. These methods therefore typically apply to experimental studies.
In contrast, when the groups are sampled at random from a larger population of groups, as in most observational studies, one should typically use a random-effects ANOVA (also called Model-2 ANOVA). Consult the following webpage for tutorials on how to conduct various types of ANOVA.
tigerstats
knitr
kableExtra
car
ggpubr
We’ve used all but the last package previously.
install.packages("ggpubr")
Load the packages:
library(knitr)
library(tigerstats)
library(car)
library(ggpubr)
library(kableExtra)
circadian <- read.csv(url("https://people.ok.ubc.ca/jpither/datasets/circadian.csv"), header = TRUE)
The circadian
data describe melatonin production in 22 people randomly assigned to one of three light treatments.
head(circadian)
## treatment shift
## 1 control 0.53
## 2 control 0.36
## 3 control 0.20
## 4 control -0.37
## 5 control -0.60
## 6 control -0.64
inspect(circadian)
##
## categorical variables:
## name class levels n missing
## 1 treatment factor 3 22 0
## distribution
## 1 control (36.4%), eyes (31.8%) ...
##
## quantitative variables:
## name class min Q1 median Q3 max mean sd n
## 1 shift numeric -2.83 -1.33 -0.66 -0.05 0.73 -0.7127273 0.8901534 22
## missing
## 1 0
We see that the data are stored in long format, as described in the comparing two means tutorial.
The treatment
variable is a factor
variable with 3 levels.
The shift
variable is our numeric response variable.
There are no missing values in the dataset.
Let’s determine the sample sizes for each treatment group:
sampsizes <- xtabs(~ treatment, data = circadian);
sampsizes
## treatment
## control eyes knee
## 8 7 7
So, pretty small sample sizes per group, but remember that this is an experiment, and large sample sizes can be expensive.
TIP: Consult Chapter 14 regarding **Experimental Design*, including advice on determining appropriate sample sizes.
One of the things you’ll learn about experimental design is that you can reduce sampling error by having balanced (equal) sample sizes among groups. In the present case, we have only minor imbalance: the control group has one more subject than the other two treatment groups.
The analysis of variance (ANOVA) is used to compare means among more than two groups.
Follow these steps when conducting a hypothesis test:
Additional steps are required for ANOVA tests.
In addition to the usual hypothesis test results, you should always report:
The null hypothesis of ANOVA is that the population means \(\mu\)i are the same for all treatments. Thus:
H0: Mean melatonin production is equal among all treatment groups (\(\mu\)1 = \(\mu\)2 = \(\mu\)3).
HA: At least one treatment group’s mean is different from the others
OR
HA: Mean melatonin production is not equal among all treatment groups.
We’ll set \(\alpha\) = 0.05.
We’ll use a fixed-effects ANOVA, which uses the F test statistic.
Because we have a numeric response variable and a categorical explanatory variable, we must choose between a strip chart or boxplot, depending on sample sizes. We discovered above that we have small sample sizes in each group (< 10), so a strip chart is the best way to visualize these data.
We learned in an earlier (tutorial)[https://people.ok.ubc.ca/jpither/modules/Visualizing_association_two_variables.html#create_a_strip_chart] how to use the stripchart
function:
stripchart(shift ~ treatment, data = circadian,
ylab = "Shift in circadian rhythm (h)",
xlab = "Light treatment",
method = "jitter", # jitters the symbols
pch = 1, # pch changes the symbol type
col = "firebrick",
vertical = TRUE,
las = 1) # orients y-axis tick labels properly
Figure 1: Stripchart of phase shifts in the circadian rhythm of melatonin production in 22 participants of an experiment.
We notice that this figure does not look like the one in the text (Fig. 15.1-1): the treatments are ordered differently, and do not have upper-case first letters. Also, we’re missing error bars.
We’ll address these issues now.
First, we need to re-order the levels of the treatment
factor variable.
We need to use the ordered
function:
?ordered
Use the following code, where first we re-order the factors, then we change their names, capitalizing the first letters:
circadian$treatment <- ordered(circadian$treatment, levels = c("control", "knee", "eyes"))
levels(circadian$treatment)
## [1] "control" "knee" "eyes"
Now let’s capitalize the first letters of the levels:
levels(circadian$treatment) <- c("Control", "Knee", "Eyes")
levels(circadian$treatment)
## [1] "Control" "Knee" "Eyes"
Now we can produce a better strip chart.
NOTE: You can produce an even better stripchart than those produced below using the ggplot2
package, as shown here.
We’re going to use the ggpubr
package for this, and its ggstripchart
function, which enables adding error bars. This is not easily done with the regular stripchart
function.
?ggstripchart
NOTE: There is a variety of options for the add
argument, including adding +/- one standard error, and confidence intervals for example.
Here, we’ll use the add = "mean_se"
option to show +/- one standard error, which mirrors what is shown in Figure 15.1-1 in the text.
ggstripchart(circadian,
x = "treatment",
y = "shift",
xlab = "Treatment",
ylab = "Shift in circadian rhythm (h)",
shape = 1, # hollow circles for points
color = "firebrick", # color for points
add = "mean_se", # adds +/- one standard error
add.params = list(color = "black")) # colour for error bars
Figure 2: Stripchart of phase shifts in the circadian rhythm of melatonin production in 22 participants of an experiment. Solid circles denote group means, and bars +/- one SE
Clearly there is no difference between the control and knee treatment groups, but we’ll have to await the results of the ANOVA to see whether the mean of the “eyes” group is different.
add
argument to the ggstripchart
function)The assumptions of ANOVA are:
Normality assumption
The sample sizes per group are rather small, so graphical aids such as histograms or normal quantile plots will not be particularly helpful.
Instead, let’s rely on the fact that (i) the strip chart in Figure 2 does not show any obvious outliers in any of the groups, and (ii) the central limit theory makes ANOVAs quite robust to deviations in normality, especially with larger sample sizes, but even with relatively small sample sizes such as these. So, we’ll proceed under the assumption that the measurements are normally distributed within the three populations.
Equal variance assumption
We’ll use the leveneTest
function from the car
package, as we did in the tutorial for the 2-sample t-test.
variance.check <- leveneTest(shift ~ treatment, data = circadian)
variance.check
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 0.1586 0.8545
## 19
We see that the null hypothesis of equal variance is not rejected, so the assumption is met.
When you’re conducting a Levene’s Test as a test of an assumption (as we are here), you don’t need to formalize it with hypothesis statement etc…
We state: “We found no evidence against the assumption of equal variance (Levene’s test; F = 0.16; P-value = 0.854).” We therefore proceed with the ANOVA.
shift
), and standard error of the mean for each of the three treatment groupsThere are two steps to conducting an ANOVA in R. The first is to use the lm
function (from the base package), which stands for “linear model” (of which ANOVA is one type), to create an object that holds the key output from the test.
We then use the anova
function (from the base package) on the preceding lm
output object to generate the appropriate output for interpreting the results.
The lm
function uses the same syntax we’ve been using for most of our tests, i.e. Y ~ X.
circadian.lm <- lm(shift ~ treatment, data = circadian)
circadian.anova <- anova(circadian.lm)
circadian.anova
## Analysis of Variance Table
##
## Response: shift
## Df Sum Sq Mean Sq F value Pr(>F)
## treatment 2 7.2245 3.6122 7.2894 0.004472 **
## Residuals 19 9.4153 0.4955
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Before we proceed to interpreting the results, let’s make an appropriately formatted ANOVA table.
NOTE: These instructions are for a so-called one-way ANOVA, in which there is only one categorical variable against which a numerical variable is being analyzed. In the present case, we are analyzing a response variable in relation to the “treatment” categorical variable, which has 3 categories or “levels”.
We’re going to use the skills we learned in a previous tutorial to generate a nicely formatted ANOVA table.
We’re also going to write a function to do all the steps in one go. But first, we’ll learn each of the steps.
The ANOVA table provided as the default is not quite what we want:
circadian.anova
## Analysis of Variance Table
##
## Response: shift
## Df Sum Sq Mean Sq F value Pr(>F)
## treatment 2 7.2245 3.6122 7.2894 0.004472 **
## Residuals 19 9.4153 0.4955
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
It is missing the “Total” row, and it puts the degrees of freedom (df) column before the Sums of Squares. Also, the column headers could be more informative.
Let’s reformat the table accordingly.
First, reorder the columns appropriately:
circadian.anova.new <- circadian.anova[,c(2,1,3:5)]
circadian.anova.new
## Sum Sq Df Mean Sq F value Pr(>F)
## treatment 7.2245 2 3.6122 7.2894 0.004472 **
## Residuals 9.4153 19 0.4955
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Second, rename the columns, and we can use short form (“SS” denotes sum of squares, “df”" is degrees of freedom, “MS” is mean squares):
names(circadian.anova.new) <- c("SS", "df", "MS", "F", "P_val")
circadian.anova.new
## SS df MS F P_val
## treatment 7.2245 2 3.6122 7.2894 0.0044723
## Residuals 9.4153 19 0.4955
Now we need to add a third row holding the “Total” values for the first two columns (Sums of squares and df).
We can do this by first using the colSums
function to sum values in each column:
totals <- colSums(circadian.anova.new)
totals
## SS df MS F P_val
## 16.63984 21.00000 4.10779 NA NA
The result includes more sums than we need: we don’t need the sum of the mean squares values. So, we’ll put an “NA” value in that spot:
totals["MS"] <- NA
totals
## SS df MS F P_val
## 16.63984 21.00000 NA NA NA
Now we can append this information to the ANOVA table, using the rbind
function, nested in the as.data.frame
function:
nice.anova.table <- as.data.frame(rbind(circadian.anova.new, totals),
row.names = c("Treatment", "Error", "Total"))
nice.anova.table
## SS df MS F P_val
## Treatment 7.224492 2 3.6122459 7.289449 0.004472271
## Error 9.415345 19 0.4955445 NA NA
## Total 16.639836 21 NA NA NA
We now have what we need to produce table that is close to what is expected in the Biology guidelines for data presentation. We’ll do this below.
Here’s a function that will do all the above steps in one go, i.e. it will re-format a standard ANOVA output table (from the anova
command) into a format that can be passed to kable
:
create.anova.table <- function(intable){
# "intable" is a standard output table from the `anova` function
temp.anova <- intable[,c(2,1,3:5)] # reorder columns
names(temp.anova) <- c("SS", "df", "MS", "F", "P_val") # rename columns
totals <- colSums(temp.anova) # calculate totals
totals["MS"] <- NA # replace mean squares sum with NA
nice.anova.table <- as.data.frame(rbind(temp.anova, totals),
row.names = c("Treatment", "Error", "Total")) # generate table
return(nice.anova.table)
}
NOTE: this is a quick-and-dirty function that does NOT follow all the best practices in coding!
Let’s try out the function, using our original (unaltered) anova table:
nice.anova.table <- create.anova.table(circadian.anova)
nice.anova.table
## SS df MS F P_val
## Treatment 7.224492 2 3.6122459 7.289449 0.004472271
## Error 9.415345 19 0.4955445 NA NA
## Total 16.639836 21 NA NA NA
It worked!
We’ll use the kable
function we learned about in an earlier tutorial to knit a nice table.
Word format
To generate a nice table that will export to Word properly, paste this code into an R chunk:
options(knitr.kable.NA = '') # suppress showing NA values in table
kable(nice.anova.table, format = "pandoc",
caption = "Table 1: ANOVA table for the circadian rhythm experiment.",
digits = c(3, 0, 4, 2, 3), align = "rrrrr") %>%
kable_styling(full_width = TRUE)
HTML format
To generate a nice table that will export to HTML properly, paste this code into an R chunk:
options(knitr.kable.NA = '') # suppress showing NA values in table
kable(nice.anova.table, format = "html",
caption = "Table 1: ANOVA table for the circadian rhythm experiment.",
digits = c(3, 0, 4, 2, 3), align = "rrrrr") %>%
kable_styling(full_width = TRUE)
SS | df | MS | F | P_val | |
---|---|---|---|---|---|
Treatment | 7.224 | 2 | 3.6122 | 7.29 | 0.004 |
Error | 9.415 | 19 | 0.4955 | ||
Total | 16.640 | 21 |
We can now refer to this nice Table 1 in our concluding statement, and if the Table is appropriately formatted like this one, you can simply refer to the Table rather than including all the additional details in your concluding statement (see below). Be sure to have your table placed near your concluding statement (and any required figure too).
Provided with an ANOVA table, and a good figure (Figure 2), we’re ready to draw a conclusion.
We see that P-value from our nicely formatted ANOVA table (0.004) is less than \(\alpha\), so we reject the null hypothesis, and conclude:
Shifts in circadian rhythm differ significantly among treatment groups (Table 1; ANOVA; F = 7.29; df = 2, 19; P = 0.004).
Note the two different values for degrees of freedom, always showing the numerator (MS_treatment) df first.
Alternative statement:
Shifts in circadian rhythm differ significantly among treatment groups (ANOVA; Table 1).
We presently don’t have a measure of “effect size” to report with our conclusion (akin to a confidence interval). We’ll fix this next.
One measure that is typically reported with any “linear model” like ANOVA is the “variance explained” or coefficient of determination, denoted R2.
This isn’t really a measure of “effect size”, but it is informative, so report it!
To calculate this we use two steps:
circadian.lm.summary <- summary(circadian.lm)
circadian.lm.summary$r.squared
## [1] 0.4341684
As shown on page 469 of the text, this value indicates the “fraction of the variation in Y that is explained by groups”.
The remainder of the variation (1 - R2) is “error”, or variation that is left unexplained by the groups.
So, we can modify our concluding statement accordingly:
Shifts in circadian rhythm differ significantly among treatment groups (ANOVA; Table 1; R2 = 0.43).
But which group(s) differ?? We’ll figure this out in the next section.
As described in Chapter 15 in the text, planned comparisons (aka planned contrasts) are ideal and most powerful, but unfortunately we often need to conduct unplanned comparisons to assess which groups differ in our ANOVA test. This is what we’ll learn here.
We can guess from Figure 2 that it’s the “Eyes” treatment group that differs from the others, but we need a formal test.
We could simply conduct three 2-sample t-tests on each of the three pair-wise comparisons, but then we would inflate our Type-I error rate, due to multiple-testing.
The Tukey-Kramer “post-hoc” (unplanned) test adjusts our P-values correctly to account for multiple tests.
For this test we use the TukeyHSD
function in the base stats package (HSD stands for “Honestly Significant Difference”).
?TukeyHSD
Here’s the code, and note we use the lm
object we created above:
circadianTukey <- TukeyHSD(circadian.lm, conf.level = 0.95)
circadianTukey
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = x)
##
## $treatment
## diff lwr upr p adj
## Knee-Control -0.02696429 -0.9525222 0.8985936 0.9969851
## Eyes-Control -1.24267857 -2.1682364 -0.3171207 0.0078656
## Eyes-Knee -1.21571429 -2.1716263 -0.2598022 0.0116776
The output clearly shows the pairwise comparisons and associated P-values, adjusted for multiple comparisons. It also shows the difference in means, and the lower and upper 95% confidence interval for the differences.
We can see that the mean for the “Eyes” treatment group differs significantly (at \(\alpha\) = 0.05) from each of the other group means.
One typically shows these results visually on the same figure used to display the data (here, Figure 2).
To do this, we superimpose a text letter alongside (or above) each group in the figure, such that groups sharing the same letter are not significantly different according to the Tukey-Kramer post-hoc test.
First, let’s re-plot Figure 2, so that we can see what the upper limits of the y-axis are, for future reference:
ggstripchart(circadian,
x = "treatment",
y = "shift",
xlab = "Treatment",
ylab = "Shift in circadian rhythm (h)",
shape = 1, # hollow circles for points
color = "firebrick", # color for points
add = "mean_se", # adds +/- one standard error
add.params = list(color = "black")) # colour for error bars
Figure 2: Stripchart of phase shifts in the circadian rhythm of melatonin production in 22 participants of an experiment. Solid circles denote group means, and bars +/- one SE
OK, so we see that the upper y-axis limit is around positive 1.
We will add our letters above each group, at around the y = 1.1 line.
To do this, we first add the ylim
argument to the ggstripchart
function, and we also assign the output of the graphing function to an object called stripchart.fig
.
This allows us to add notation to the figure afterwards, using the annotate
function, which is loaded as part of the ggpubr
package, but resides in the ggplot2
package:
annotate
Below, you’ll see that the annotate
function places the “labels” (“a”, “a”, “b”) on the graph according to the x and y coordinates that are provided. Here, I provide x values of 1 through 3 (corresponding to the locations of each of the groups along the x-axis), and a y-value of 1.1, which will be recycled for each value of x:
stripchart.fig <- ggstripchart(circadian,
ylim = c(-3.1, 1.2),
x = "treatment",
y = "shift",
xlab = "Treatment",
ylab = "Shift in circadian rhythm (h)",
shape = 1, # hollow circles for points
color = "firebrick", # color for points
add = "mean_se", # adds +/- one standard error
add.params = list(color = "black"))
# Now annotate:
stripchart.fig + annotate("text", x = 1:3, y = 1.1, label = c("a", "a", "b"))
Figure 3: Stripchart showing the phase shift in the circadian rhythm of melatonin production in 22 experimental participants given alternative light treatments. Solid circles represent group means, and bars represent +/- one SE. Group means sharing the same letter are not significantly different according to the Tukey-Kramer post-hoc test (family-wise \(\alpha\) = 0.05).
NOTE: The chunk option I included to get the figure heading shown above is as follows:
{r, fig.cap = "Figure 3: Stripchart showing the phase shift in the circadian rhythm of melatonin production in 22 experimental participants given alternative light treatments. Solid circles represent group means, and bars represent +/- one SE. Group means sharing the same letter are not significantly different according to the Tukey-Kramer post-hoc test (family-wise $\\alpha$ = 0.05).", fig.width = 4, fig.height = 4.5}
The “family-wise” \(\alpha\) statement means that the Tukey-Kramer test uses our initial \(\alpha\) level, but ensures that the probability of making at least one Type-1 error throughout the course of testing all pairs of means is no greater than the originally stated \(\alpha\) level.
HINT: Figure 3 is the kind of figure that should be referenced, along with ANOVA table, when communicating your results of an ANOVA. For instance, with the resuls of the Tukey-Kramer post-hoc tests superimposed on the figure, you can not only state that the null hypothesis is rejected, but you can also state which group(s) differ from which others.
Pretend that our post-hoc test showed that the “Knee” group was not different from either the control or the Eyes group. Re-do the figure above, but this time place an “ab” above the Knee group on the figure. This is how you would indicate that the control and eyes grouped differed significantly from one-another, but neither differed significantly from the Knee group.
We are now able to write our truly final concluding statement, below.
Shifts in circadian rhythm differ significantly among treatment groups (ANOVA; Table 1; R2 = 0.43). As shown in Figure 3, the mean shift among the “Eyes” subjects was significantly lower than both of the other treatment groups.
Getting started:
read.csv
url
library
Data management / manipulation:
inspect
(tigerstats
/ mosaic
packages)levels
head
xtabs
(tigerstats
/ mosaic
packages)levels
ordered
Graphs:
stripchart
ggstripchart
(from the ggpubr
package)qqnorm
qqline
par
Assumptions:
leveneTest
(car
package)ANOVA:
lm
anova
ANOVA Table:
names
colSums
as.data.frame
rbind
kable
(knitr
and kableExtra
packages)Post-hoc tests:
TukeyHSD
annotate
(ggplot2
package, loaded with ggpubr
package)