2.4 One factor ANOVA comparing means across several groups
As an example, suppose we want to compare the mean days to healing for 5 different treatments for fever blisters.
The data set includes the following variables:
- 'DaysHeal' is the number of days to healing (fewer days indicate more effective medication) and our outcome variable;
- 'Treatment' is a group variable coded 1 through 5 for the 5 treatments;
- 'TreatName' is a character variable, with character values (TreatA, TreatB, etc.) rather than numeric values for treatment group.
There are 6 subjects given each of the 5 treatments, for a sample of 30 subjects overall. For most analyses, R prefers numeric variables, but for analysis of variance the grouping variable should be categorical, that is, a character variable or a factor. If the grouping variable is left numeric, aov( ) treats it as a continuous predictor and fits a single slope (1 degree of freedom) instead of comparing the group means.
An ANOVA in R can produce a lot of output, so I generally save the results of the ANOVA as an object and then request different parts of the output with separate commands. To perform the ANOVA:
> fever_anova <- aov(DaysHeal ~ TreatName)
Here I've saved the results of the ANOVA as an object named 'fever_anova'.
(If the grouping variable is a numeric variable, you can declare it to be categorical using the factor( ) function. For example, for the numeric 'Treatment' variable, the above ANOVA command becomes
> fever_anova <- aov(DaysHeal ~ factor(Treatment) )
This gives the same results as the above analysis.)
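To see why the grouping variable needs to be categorical, the short sketch below fits the model both ways. The data are simulated with rnorm( ) purely for illustration and are not the fever blister measurements; the data frame name 'sim' and the object names are also assumptions. With a numeric grouping variable, aov( ) reports 1 degree of freedom for the treatment term; with factor( ) it reports 4, one fewer than the number of groups.

```r
# Simulated stand-in data: 5 treatments x 6 subjects (NOT the real study data)
set.seed(1)
sim <- data.frame(
  Treatment = rep(1:5, each = 6),
  DaysHeal  = round(rnorm(30, mean = 6, sd = 1.5))
)

# Numeric grouping variable: treated as a continuous predictor (1 df)
fit_numeric <- aov(DaysHeal ~ Treatment, data = sim)

# Factor grouping variable: treated as 5 categories (4 df)
fit_factor <- aov(DaysHeal ~ factor(Treatment), data = sim)

# Pull the degrees of freedom for the treatment term from each ANOVA table
df_numeric <- summary(fit_numeric)[[1]]$Df[1]
df_factor  <- summary(fit_factor)[[1]]$Df[1]
print(c(numeric = df_numeric, factor = df_factor))
```

Checking the Df column in this way is a quick test of whether R treated the grouping variable as intended.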
We can now request different summaries from the saved object. (The output shown below labels the grouping term 'TreatmentF', the name of the factor variable used when the output was generated.) To see the means for the study groups:
> model.tables(fever_anova,"means",digits=3)
Tables of means
Grand mean
5.633333
TreatmentF
TreatmentF
1 2 3 4 5
7.500 5.000 4.333 5.167 6.167
The select if command or the tapply( ) function can be used to get standard deviations and sample sizes for each group, as described in Section 5b: Finding means and standard deviations for subgroups.
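As a rough illustration of the tapply( ) approach, the sketch below computes the mean, standard deviation, and sample size for each group. The data are simulated for illustration only (they are not the study values), and the data frame name 'sim' is an assumption.

```r
# Simulated stand-in data: 5 treatments x 6 subjects (NOT the real study data)
set.seed(1)
sim <- data.frame(
  TreatName = rep(paste0("Treat", LETTERS[1:5]), each = 6),
  DaysHeal  = round(rnorm(30, mean = 6, sd = 1.5))
)

# tapply(values, groups, function) applies the function within each group
group_mean <- tapply(sim$DaysHeal, sim$TreatName, mean)
group_sd   <- tapply(sim$DaysHeal, sim$TreatName, sd)
group_n    <- tapply(sim$DaysHeal, sim$TreatName, length)

# Combine into one table, one row per treatment group
print(round(cbind(mean = group_mean, sd = group_sd, n = group_n), 3))
```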
To request the ANOVA table and p-value for the overall ANOVA comparing means across the 5 groups:
> summary(fever_anova)
            Df Sum Sq Mean Sq F value  Pr(>F)
TreatmentF   4 36.467   9.117   3.896 0.01359 *
Residuals   25 58.500   2.340
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
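The entries in the ANOVA table are linked by simple arithmetic: each mean square is a sum of squares divided by its degrees of freedom, the F value is the ratio of the two mean squares, and the p-value is the upper tail of the F distribution. The sketch below recomputes these from the sums of squares and degrees of freedom printed in the table; the object names are illustrative, and the results should match the table up to rounding.

```r
# Values copied from the ANOVA table above
ss_treat <- 36.467; df_treat <- 4
ss_resid <- 58.500; df_resid <- 25

# Mean square = sum of squares / degrees of freedom
ms_treat <- ss_treat / df_treat
ms_resid <- ss_resid / df_resid

# F statistic = between-group mean square / residual mean square
f_value <- ms_treat / ms_resid

# p-value = upper-tail probability of the F(4, 25) distribution
p_value <- pf(f_value, df_treat, df_resid, lower.tail = FALSE)

print(round(c(MS_treat = ms_treat, MS_resid = ms_resid,
              F = f_value, p = p_value), 4))
```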
Because the overall ANOVA is significant (p = 0.014), we can request pairwise comparisons using Tukey's multiple comparison procedure:
> TukeyHSD(fever_anova)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = DaysHeal ~ TreatmentF)
$TreatmentF
          diff        lwr         upr     p adj
2-1 -2.5000000 -5.0937744  0.09377442 0.0627671
3-1 -3.1666667 -5.7604411 -0.57289224 0.0113209
4-1 -2.3333333 -4.9271078  0.26044109 0.0927171
5-1 -1.3333333 -3.9271078  1.26044109 0.5660002
3-2 -0.6666667 -3.2604411  1.92710776 0.9410027
4-2  0.1666667 -2.4271078  2.76044109 0.9996956
5-2  1.1666667 -1.4271078  3.76044109 0.6811222
4-3  0.8333333 -1.7604411  3.42710776 0.8770466
5-3  1.8333333 -0.7604411  4.42710776 0.2614661
5-4  1.0000000 -1.5937744  3.59377442 0.7881333
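In this output, only the comparison of treatment 3 with treatment 1 has an interval that excludes zero (adjusted p = 0.011). For a balanced design like this one, the interval limits and adjusted p-values can be recomputed from the residual mean square using the studentized range distribution. The sketch below does this for the 3-1 comparison, using values copied from the output above; the object names are illustrative, and the results should match the table up to rounding.

```r
# Values copied from the ANOVA table and Tukey output above
ms_resid  <- 2.340      # residual mean square
df_resid  <- 25         # residual degrees of freedom
n_groups  <- 5          # number of treatment groups
n_per_grp <- 6          # subjects per group (balanced design)
diff_3_1  <- -3.1666667 # mean of group 3 minus mean of group 1

# Scale used by Tukey's procedure when group sizes are equal
se <- sqrt(ms_resid / n_per_grp)

# Critical value of the studentized range for 95% family-wise confidence
q_crit <- qtukey(0.95, nmeans = n_groups, df = df_resid)

# Every pairwise interval has the same half-width in a balanced design
half_width <- q_crit * se
lwr <- diff_3_1 - half_width
upr <- diff_3_1 + half_width

# Adjusted p-value: upper tail of the studentized range distribution
p_adj <- ptukey(abs(diff_3_1) / se, nmeans = n_groups, df = df_resid,
                lower.tail = FALSE)

print(round(c(lwr = lwr, upr = upr, p_adj = p_adj), 4))
```

Because every group has 6 subjects, the same half-width applies to all ten comparisons, which is why each interval in the table has the same length.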