A second example of the One-way ANOVA methods involves a study of length of
odontoblasts (cells that are responsible for tooth growth) in 60 Guinea Pigs
(measured in microns) from Crampton (1947) and is available in base R using data(ToothGrowth)
. \(N=60\) Guinea Pigs were obtained
from a local breeder and each received one of three dosages (0.5, 1, or
2 mg/day) of Vitamin C via one of two delivery methods, Orange Juice (OJ) or
ascorbic acid (the stuff in vitamin C capsules, called \(\text{VC}\) below) as the source
of Vitamin C in their diets. Each guinea pig was randomly assigned to receive
one of the six different treatment combinations possible
(OJ at 0.5 mg, OJ at 1 mg, OJ at 2 mg, VC at 0.5 mg, VC at 1 mg, and VC at 2 mg).
The animals were treated similarly otherwise and we can assume lived in
separate cages and only one observation was taken for each guinea pig, so we
can assume the observations are independent61. We need to create a variable that
combines the levels of delivery type (OJ, VC) and the dosages (0.5, 1, and 2)
to use our One-Way ANOVA on the six levels. The interaction
function can be
used create a new variable that is based on combinations of the levels of other
variables. Here a new variable is created in the ToothGrowth
tibble that
we called Treat
using the interaction
function that provides a six-level grouping variable for our One-Way
ANOVA to compare the combinations of treatments. To get a sense of the pattern of
observations in the data set, the counts in supp
(supplement type) and
dose
are provided and then the counts in the new categorical explanatory variable, Treat
.
data(ToothGrowth) #Available in Base R
library(tibble)
ToothGrowth <- as_tibble(ToothGrowth) #Convert data.frame to tibble
library(mosaic)
## supp
## OJ VC
## 30 30
## dose
## 0.5 1 2
## 20 20 20
#Creates a new variable Treat with 6 levels
ToothGrowth$Treat <- with(ToothGrowth, interaction(supp, dose))
#New variable that combines supplement type and dosage
tally(~Treat, data=ToothGrowth)
## Treat
## OJ.0.5 VC.0.5 OJ.1 VC.1 OJ.2 VC.2
## 10 10 10 10 10 10
The tally
function helps us to check for balance;
this is a balanced design
because the same number of guinea pigs (\(n_j=10 \text{ for } j=1, 2,\ldots, 6\))
were measured in each treatment combination.
With the variable Treat
prepared, the first task is to visualize the results
using pirate-plots62 (Figure 2.41) and generate some summary statistics for
each group using favstats
.
## Treat min Q1 median Q3 max mean sd n missing
## 1 OJ.0.5 8.2 9.700 12.25 16.175 21.5 13.23 4.459709 10 0
## 2 VC.0.5 4.2 5.950 7.15 10.900 11.5 7.98 2.746634 10 0
## 3 OJ.1 14.5 20.300 23.45 25.650 27.3 22.70 3.910953 10 0
## 4 VC.1 13.6 15.275 16.50 17.300 22.5 16.77 2.515309 10 0
## 5 OJ.2 22.4 24.575 25.95 27.075 30.9 26.06 2.655058 10 0
## 6 VC.2 18.5 23.375 25.95 28.800 33.9 26.14 4.797731 10 0
(ref:fig3-14) Pirate-plot of odontoblast growth responses for the six treatment level combinations.
Figure 2.41 suggests that the mean tooth growth increases with the dosage level and that OJ might lead to higher growth rates than VC except at a dosage of 2 mg/day. The variability around the means looks to be small relative to the differences among the means, so we should expect a small p-value from our \(F\)-test. The design is balanced as noted above (\(n_j=10\) for all six groups) so the methods are some what resistant to impacts from potential non-normality and non-constant variance but we should still assess the patterns in the plots, especially with smaller sample sizes in each group. There is some suggestion of non-constant variance in the plots but this will be explored further below when we can remove the difference in the means and combine all the residuals together. There might be some skew in the responses in some of the groups (for example in OJ.0.5 a right skew may be present and in OJ.1 a left skew) but there are only 10 observations per group so visual evidence of skew in the pirate-plots could be generated by impacts of very few of the observations. This actually highlights an issue with residual explorations: when the sample sizes are small, our assumptions matter more than when the sample sizes are large, but when the sample sizes are small, we don’t have much information to assess the assumptions and come to a clear conclusion.
Now we can apply our 6+ steps for performing a hypothesis test with these observations.
The research question is about differences in odontoblast growth across these combinations of treatments and they seem to have collected data that allow this to explored. A pirate-plot would be a good start to displaying the results and understanding all the combinations of the predictor variable.
Hypotheses:
\(\boldsymbol{H_0: \mu_{\text{OJ}0.5} = \mu_{\text{VC}0.5} = \mu_{\text{OJ}1} = \mu_{\text{VC}1} = \mu_{\text{OJ}2} = \mu_{\text{VC}2}}\)
vs
\(\boldsymbol{H_A:}\textbf{ Not all } \boldsymbol{\mu_j} \textbf{ equal}\)
The null hypothesis could also be written in reference-coding as below since OJ.0.5 is chosen as the baseline group (discussed below).
The alternative hypothesis can be left a bit less specific:
Plot the data and assess validity conditions:
Independence:
This is where the separate cages note above is important. Suppose that there were cages that contained multiple animals and they competed for food or could share illness or levels of activity. The animals in one cage might be systematically different from the others and this “clustering” of observations would present a potential violation of the independence assumption.
If the experiment had the animals in separate cages, there is no clear dependency in the design of the study and we can assume63 that there is no problem with this assumption.
Constant variance:
The Residuals vs Fitted panel in Figure 2.42 shows some difference in the spreads but the spread is not that different among the groups.
The Scale-Location plot also shows just a little less variability in the group with the smallest fitted value but the spread of the groups looks fairly similar in this alternative presentation related to assessing equal variance.
Put together, the evidence for non-constant variance is not that strong and we can proceed comfortably that there is at least not a clear issue with this assumption. Because of the balanced design, we also get a little more resistance to violation of the equal variance assumption.
Normality of residuals:
Calculate the test statistic and find the p-value:
## Analysis of Variance Table
##
## Response: len
## Df Sum Sq Mean Sq F value Pr(>F)
## Treat 5 2740.10 548.02 41.557 < 2.2e-16
## Residuals 54 712.11 13.19
There are two options here, especially since it seems that our assumptions about variance and normality are not violated (note that we do not say “met” – we just have no clear evidence against them). The parametric and nonparametric approaches should provide similar results here.
The parametric approach is easiest – the p-value comes from the previous
ANOVA table as < 2e-16
. First, note that this is in scientific notation
that is a compact way of saying that the p-value here is \(2.2*10^{-16}\) or
0.00000000000000022.
When you see 2.2e-16
in R output, it also means
that the calculation is at the numerical precision limits of the computer.
What R is really trying to report is that this is a very small number.
When you encounter p-values that are smaller than 0.0001, you should just report that the p-value < 0.0001. Do not report that it is 0 as this gives
the false impression that there is no chance of the result occurring when
it is just a really small probability. This p-value came from an \(F(5,54)\)
distribution (this is the distribution of the test statistic if the null hypothesis
is true) with an $F-statistic of 41.56.
The nonparametric approach is not too hard so we can compare the two approaches here as well:
## [1] 41.55718
par(mfrow=c(1,2))
B <- 1000
Tstar <- matrix(NA, nrow=B)
for (b in (1:B)){
Tstar[b] <- anova(lm(len~shuffle(Treat), data=ToothGrowth))[1,4]
}
pdata(Tstar, Tobs, lower.tail=F)[[1]]
## [1] 0
hist(Tstar, xlim=c(0,Tobs+3))
abline(v=Tobs, col="red", lwd=3)
plot(density(Tstar), xlim=c(0,Tobs+3), main="Density curve of Tstar")
abline(v=Tobs, col="red", lwd=3)
Write a conclusion:
There is strong evidence against the null hypothesis that the different treatments (combinations of OJ/VC and dosage levels) have the same true mean odontoblast growth for these guinea pigs, so we would conclude that the treatments cause at least one of the combinations to have a different true mean.
We can make the causal statement of the treatment causing differences because the treatments were randomly assigned but these inferences only apply to these guinea pigs since they were not randomly selected from a larger population.
Remember that we are making inferences to the population or true means and not the sample means and want to make that clear in any conclusion. When there is not a random sample from a population it is more natural to discuss the true means since we can’t extend to the population values.
The alternative is that there is some difference in the true means – be sure to make the wording clear that you aren’t saying that all the means differ. In fact, if you look back at Figure 2.41, the means for the 2 mg dosages look almost the same so we will have a tough time arguing that all groups differ. The \(F\)-test is about finding evidence of some difference somewhere among the true means. The next section will provide some additional tools to get more specific about the source of those detected differences and allow us to get at estimates of the differences we observed to complete our interpretation.
Discuss size of differences:
It appears that increasing dose levels are related to increased odontoblast growth and that the differences in dose effects change based on the type of delivery method. The difference between 7 and 26 microns for the average length of the cells could be quite interesting to the researchers. This result is harder for me to judge and likely for you than the average distances of cars to bikes but the differences could be very interesting to these researchers.
The “size” discussion can be further augmented by estimated pair-wise differences using methods discussed below.
Scope of inference:
We can make a causal statement of the treatment causing differences in the responses because the treatments were randomly assigned but these inferences only apply to these guinea pigs since they were not randomly selected from a larger population.
Before we leave this example, we should revisit our model estimates and
interpretations. The default model parameterization uses reference-coding.
Running the model summary
function on m2
provides the estimated
coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.23 1.148353 11.520847 3.602548e-16
## TreatVC.0.5 -5.25 1.624017 -3.232726 2.092470e-03
## TreatOJ.1 9.47 1.624017 5.831222 3.175641e-07
## TreatVC.1 3.54 1.624017 2.179781 3.365317e-02
## TreatOJ.2 12.83 1.624017 7.900166 1.429712e-10
## TreatVC.2 12.91 1.624017 7.949427 1.190410e-10
For some practice with the reference-coding used in these models, let’s find
the estimates (fitted values) for observations for a couple of the groups. To work with the
parameters, you need to start with determining the baseline category that was
used by considering which level is not displayed in the output. The
levels
function can list the groups in a categorical variable and their
coding in the data set. The first level is usually the baseline category but
you should check this in the model summary as well.
## [1] "OJ.0.5" "VC.0.5" "OJ.1" "VC.1" "OJ.2" "VC.2"
There is a VC.0.5
in the second row of the model summary, but there is no row for
0J.0.5
and so this must be the baseline category. That means that the fitted value
or model estimate for the OJ at 0.5 mg/day group is the same as the (Intercept)
row
or \(\hat{\alpha}\), estimating a mean tooth growth of 13.23 microns when the pigs get OJ
at a 0.5 mg/day dosage level. You should always start with working on the baseline level
in a reference-coded model. To get estimates for any other group, then you can use the
(Intercept)
estimate and add the deviation (which could be negative) for the group of interest. For
VC.0.5
, the estimated mean tooth growth is
\(\hat{\alpha} + \hat{\tau}_2 = \hat{\alpha} + \hat{\tau}_{\text{VC}0.5}=13.23 + (-5.25)=7.98\)
microns. It is also potentially interesting to directly interpret the estimated difference
(or deviation) between OJ.0.5
(the baseline) and VC.0.5
(group 2) that
is \(\hat{\tau}_{\text{VC}0.5}= -5.25\): we estimate that the mean tooth growth in
VC.0.5
is 5.25 microns shorter than it is in OJ.0.5
. This and many other
direct comparisons of groups are likely of interest to researchers involved in
studying the impacts of these supplements on tooth growth and the next section
will show us how to do that (correctly!).
The reference-coding is still going to feel a little uncomfortable so the comparison
to the cell-means model and exploring the effect plot can help to reinforce
that both models patch together the same estimated means for each group. For
example, we can find our estimate of 7.98 microns for the VC0.5 group in the
output and Figure 2.44. Also note that Figure
2.44 is the same whether you plot the results from
m2
or m3
.
##
## Call:
## lm(formula = len ~ Treat - 1, data = ToothGrowth)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.20 -2.72 -0.27 2.65 8.27
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## TreatOJ.0.5 13.230 1.148 11.521 3.60e-16
## TreatVC.0.5 7.980 1.148 6.949 4.98e-09
## TreatOJ.1 22.700 1.148 19.767 < 2e-16
## TreatVC.1 16.770 1.148 14.604 < 2e-16
## TreatOJ.2 26.060 1.148 22.693 < 2e-16
## TreatVC.2 26.140 1.148 22.763 < 2e-16
##
## Residual standard error: 3.631 on 54 degrees of freedom
## Multiple R-squared: 0.9712, Adjusted R-squared: 0.968
## F-statistic: 303 on 6 and 54 DF, p-value: < 2.2e-16
Crampton, E. 1947. “The Growth of the Odontoblast of the Incisor Teeth as a Criterion of Vitamin c Intake of the Guinea Pig.” The Journal of Nutrition 33 (5): 491–504. http://jn.nutrition.org/content/33/5/491.full.pdf.
A violation of the independence assumption could have easily been created if they measured cells in two locations on each guinea pig or took measurements over time on each subject.↩
Note that to see all the group labels in the plot when making the figure, you have to widen the plot window before copying the figure out of R. You can resize the plot window using the small vertical and horizontal “=” signs in the grey bars that separate the different panels in RStudio.↩
In working with researchers on hundreds of projects, my experience has been that many conversations are often required to discover all the potential sources of issues in data sets, especially related to assessing independence of the observations. Discussing how the data were collected is sometimes the only way to understand whether violations of independence are present or not.↩