Tests for More Than Two Samples


In this section, we consider comparisons among more than two groups parametrically, using analysis of variance (ANOVA), as well as non-parametrically, using the Kruskal-Wallis test.  

Parametric Analysis of Variance (ANOVA)

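To test whether the means of more than two groups are equal, we perform an analysis of variance (ANOVA). An ANOVA tests whether the grouping variable explains a significant portion of the variability in the dependent variable. If it does, we would expect the mean of the dependent variable to differ across groups. The assumptions of a one-way ANOVA are as follows:

  1. The observations are independent, both within and between groups.
  2. The dependent variable is approximately normally distributed within each group.
  3. The variance of the dependent variable is the same in every group (homogeneity of variance).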

Here, we will use the Pima.tr dataset from the MASS package. According to the National Heart, Lung, and Blood Institute (NHLBI) website (http://www.nhlbisupport.com/bmi/), BMI can be classified into four categories:

  1. Underweight: BMI below 18.5
  2. Normal weight: BMI 18.5 to 24.9
  3. Overweight: BMI 25 to 29.9
  4. Obesity: BMI 30 or greater

  1. Create a categorical variable bmi.cat that divides the continuous bmi variable into classes based on the definitions shown above. Because there are very few underweight individuals, collapse underweight and normal weight into a single "Underweight/Normal weight" category, which leaves three classes.
  2. Report the number of individuals in each category.
  3. Calculate the average plasma glucose concentration in each category.
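One possible solution to the exercise is sketched below using nested ifelse() calls, which is the if-else route mentioned in the aside. It assumes the MASS package is installed, because Pima.tr ships with it. The variable names are illustrative. The cut points are written as `<= 24.9` and `<= 29.9` so that the classes match the right-closed intervals that cut() produces with the breaks used in this section.

```r
# Sketch of one solution to the exercise (assumes the MASS package is installed)
library(MASS)

bmi <- Pima.tr$bmi
glu <- Pima.tr$glu

# 1. Categorize BMI with nested ifelse() calls:
#    BMI <= 24.9 -> underweight/normal, 24.9 < BMI <= 29.9 -> overweight,
#    BMI > 29.9 -> obese (underweight and normal are collapsed)
bmi.cat <- ifelse(bmi <= 24.9, "Underweight/Normal weight",
           ifelse(bmi <= 29.9, "Overweight", "Obesity"))

# Fix the level order so output runs from lightest to heaviest class
bmi.cat <- factor(bmi.cat,
                  levels = c("Underweight/Normal weight", "Overweight", "Obesity"))

# 2. Number of individuals in each category
table(bmi.cat)

# 3. Mean plasma glucose concentration in each category
tapply(glu, bmi.cat, mean)
```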

 

An Aside

In the Pima.tr dataset, BMI is stored as a numeric variable. Because we are interested in whether BMI category is associated with plasma glucose concentration, we first need to categorize BMI. In the exercise, you can use an if-else statement (for example, nested ifelse() calls) to create the bmi.cat variable. Alternatively, we can use the cut() function. Since very few individuals have BMI < 18.5, we collapse the "Underweight" and "Normal weight" categories together.

 

> library(MASS)      # the Pima.tr data frame is in the MASS package
> attach(Pima.tr)    # lets us refer to bmi, glu, etc. directly
> bmi.label <- c("Underweight/Normal weight", "Overweight", "Obesity")
> summary(bmi)
> bmi.break <- c(18, 24.9, 29.9, 50)
> bmi.cat <- cut(bmi, breaks = bmi.break, labels = bmi.label)
> table(bmi.cat)
bmi.cat
Underweight/Normal weight                Overweight                   Obesity
                       25                        43                       132

> tapply(glu, bmi.cat, mean)
Underweight/Normal weight                Overweight                   Obesity
                 108.4800                  116.6977                  129.2727
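By default, cut() builds right-closed intervals (right = TRUE). With bmi.break the classes are therefore (18, 24.9], (24.9, 29.9] and (29.9, 50]. The quick check below shows where values at and just past the cut points land. It also shows that a value at or below the lowest break, or above the highest, becomes NA.

```r
bmi.break <- c(18, 24.9, 29.9, 50)
bmi.label <- c("Underweight/Normal weight", "Overweight", "Obesity")

# Values on and just past each cut point
test.bmi <- c(24.9, 25.0, 29.9, 30.0)
cls <- cut(test.bmi, breaks = bmi.break, labels = bmi.label)
data.frame(bmi = test.bmi, class = cls)

# 18 is excluded from (18, 24.9] and 55 exceeds the last break: both become NA
cut(c(18, 55), breaks = bmi.break, labels = bmi.label)
```

The three counts in the table above sum to all 200 rows of Pima.tr, so no observation fell outside the breaks.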

 

Suppose we want to compare the mean plasma glucose concentration across our three BMI categories. We will conduct an analysis of variance using the bmi.cat variable as a factor.

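> bmi.cat <- factor(bmi.cat)         # cut() already returns a factor; this makes it explicit
> bmi.anova <- aov(glu ~ bmi.cat)    # one-way ANOVA of glucose on BMI category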

 

Before looking at the test result, you may want to check the average glucose concentration in each category. We did this above with tapply(). Alternatively, model.tables() extracts the group means directly from the fitted aov object:

> print(model.tables(bmi.anova, "means"))
Tables of means
Grand mean

123.97

 bmi.cat
    Underweight/Normal weight Overweight Obesity
                        108.5      116.7   129.3
rep                      25.0       43.0   132.0

 

The mean glucose level appears to increase with BMI category. We can now request the ANOVA table to check whether the hypothesis test agrees with what the summary statistics suggest.

> summary(bmi.anova)
             Df Sum Sq Mean Sq F value   Pr(>F)
bmi.cat       2  11984    5992  6.2932 0.002242 **
Residuals   197 187575     952

We reject the null hypothesis that mean glucose is equal across all BMI categories (F(2, 197) = 6.29, p-value = 0.002242). That is, the mean plasma glucose concentration differs significantly between at least two of the categories.
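The F statistic in the table is the between-group mean square divided by the within-group (residual) mean square: F = [SSB / (k - 1)] / [SSW / (N - k)], with k groups and N observations. The sketch below recomputes it from the raw data in base R, assuming MASS is available for Pima.tr. Names such as ss.between are illustrative.

```r
library(MASS)

glu <- Pima.tr$glu
bmi.cat <- cut(Pima.tr$bmi, breaks = c(18, 24.9, 29.9, 50),
               labels = c("Underweight/Normal weight", "Overweight", "Obesity"))

k <- nlevels(bmi.cat)                 # number of groups
N <- length(glu)                      # total sample size
grand.mean <- mean(glu)
group.mean <- tapply(glu, bmi.cat, mean)
group.n    <- tapply(glu, bmi.cat, length)

# Between-group SS: size-weighted squared deviations of group means from the grand mean
ss.between <- sum(group.n * (group.mean - grand.mean)^2)
# Within-group SS: deviations of each observation from its own group mean
ss.within <- sum((glu - ave(glu, bmi.cat))^2)

ms.between <- ss.between / (k - 1)
ms.within  <- ss.within / (N - k)
F.stat  <- ms.between / ms.within
p.value <- pf(F.stat, df1 = k - 1, df2 = N - k, lower.tail = FALSE)

c(F = F.stat, df1 = k - 1, df2 = N - k, p = p.value)
```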

Naturally, we will want to know which pairs of categories differ in mean glucose concentration. One way to answer this question is to conduct all pairwise two-sample tests and then adjust for multiple testing using the Bonferroni correction.

Performing many tests increases the probability that at least one of them is significant purely by chance. In other words, unadjusted p-values overstate the evidence, and the overall (family-wise) type I error rate rises above α. A common adjustment is the Bonferroni correction, which changes the significance level for each individual test from α to α / (number of tests). For example, if we perform 10 tests and want to maintain an overall significance level of α = 0.05, we use 0.05/10 = 0.005 as the significance level for each test.
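The arithmetic behind the 10-test example can be checked directly. The formula 1 - (1 - α)^m for the chance of at least one false rejection assumes the m tests are independent and every null hypothesis is true. The Bonferroni guarantee itself does not require independence.

```r
alpha <- 0.05
m <- 10

# Chance of at least one false rejection with no adjustment (independent tests)
fwer.unadjusted <- 1 - (1 - alpha)^m     # about 0.40

# Bonferroni per-test significance level
alpha.bonf <- alpha / m                  # 0.005

# Family-wise error rate when each test uses the Bonferroni level
fwer.bonf <- 1 - (1 - alpha.bonf)^m      # about 0.049, below alpha

round(c(unadjusted = fwer.unadjusted, per.test = alpha.bonf,
        bonferroni = fwer.bonf), 4)
```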

 

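The pairwise.t.test() function computes t tests for all possible pairs of groups. By default, it uses a pooled standard deviation estimated from all groups, not just the two being compared.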

> pairwise.t.test(glu, bmi.cat, p.adj = "none")

        Pairwise comparisons using t tests with pooled SD

data:  glu and bmi.cat

           Underweight/Normal weight Overweight
Overweight 0.2910                    -
Obesity    0.0023                    0.0213

P value adjustment method: none

From this result we reject the null hypothesis that the mean glucose for those who are obese is equal to the mean glucose for those who are underweight/normal weight (p-value = 0.0023). We also reject the null hypothesis that the mean glucose for those who are obese is equal to the mean glucose for those who are overweight (p-value = 0.0213). We fail to reject the null hypothesis that the mean glucose for those who are overweight is equal to the mean glucose for those who are underweight/normal weight (p-value = 0.2910).
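With a pooled SD, pairwise.t.test() does not run an ordinary two-sample t test. It uses the residual standard deviation from all groups and N - k degrees of freedom. The sketch below reproduces the Obesity vs. Underweight/Normal weight p-value by hand, assuming MASS for Pima.tr. Names such as s.pooled are illustrative.

```r
library(MASS)

glu <- Pima.tr$glu
bmi.cat <- cut(Pima.tr$bmi, breaks = c(18, 24.9, 29.9, 50),
               labels = c("Underweight/Normal weight", "Overweight", "Obesity"))

N <- length(glu)
k <- nlevels(bmi.cat)

# Pooled SD: square root of the ANOVA residual mean square (uses all groups)
s.pooled <- sqrt(sum((glu - ave(glu, bmi.cat))^2) / (N - k))

g1 <- glu[bmi.cat == "Obesity"]
g2 <- glu[bmi.cat == "Underweight/Normal weight"]

t.stat <- (mean(g1) - mean(g2)) /
  (s.pooled * sqrt(1 / length(g1) + 1 / length(g2)))

# Two-sided p-value on N - k degrees of freedom (not n1 + n2 - 2)
p.value <- 2 * pt(abs(t.stat), df = N - k, lower.tail = FALSE)

c(t = t.stat, df = N - k, p = p.value)
```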

To adjust for multiple comparisons, we set p.adj (which partially matches the p.adjust.method argument) to "bonferroni":

> pairwise.t.test(glu, bmi.cat, p.adj = "bonferroni")

        Pairwise comparisons using t tests with pooled SD

data:  glu and bmi.cat

           Underweight/Normal weight Overweight
Overweight 0.8729                    -
Obesity    0.0069                    0.0639

P value adjustment method: bonferroni
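The Bonferroni adjustment multiplies each unadjusted p-value by the number of comparisons (3 here) and caps the result at 1. The sketch below checks this by hand and against p.adjust(), assuming MASS for Pima.tr. Names such as bonf.manual are illustrative.

```r
library(MASS)

glu <- Pima.tr$glu
bmi.cat <- cut(Pima.tr$bmi, breaks = c(18, 24.9, 29.9, 50),
               labels = c("Underweight/Normal weight", "Overweight", "Obesity"))

# Unadjusted pairwise p-values (lower triangle filled, NA elsewhere)
raw <- pairwise.t.test(glu, bmi.cat, p.adjust.method = "none")$p.value
m <- sum(!is.na(raw))    # number of comparisons: 3 for three groups

# Bonferroni by hand: multiply by m and cap at 1 (keeps the matrix layout)
bonf.manual <- pmin(raw * m, 1)

# The same adjustment via p.adjust() and via pairwise.t.test()
bonf.p.adjust <- p.adjust(raw[!is.na(raw)], method = "bonferroni")
bonf.builtin  <- pairwise.t.test(glu, bmi.cat, p.adjust.method = "bonferroni")$p.value

round(bonf.manual, 4)
```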

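After the Bonferroni adjustment, only the Obesity versus Underweight/Normal weight comparison remains significant at the 0.05 level (p-value = 0.0069). The Obesity versus Overweight comparison (p-value = 0.0639) no longer is. However, the Bonferroni correction is very conservative: it protects against false positives at the cost of power, especially when there are many comparisons. Here, we introduce an alternative multiple comparison approach, Tukey's honestly significant difference (HSD) procedure: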

> TukeyHSD(bmi.anova)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = glu ~ bmi.cat)

$bmi.cat
                                          diff         lwr      upr     p adj
Overweight-Underweight/Normal weight  8.217674 -10.1099039 26.54525 0.5407576
Obesity-Underweight/Normal weight    20.792727   4.8981963 36.68726 0.0064679
Obesity-Overweight                   12.575053  -0.2203125 25.37042 0.0552495
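To pull out the pairs that differ at the family-wise 0.05 level, we can filter the matrix that TukeyHSD() returns. For Tukey's procedure, a 95% interval that excludes zero corresponds to an adjusted p-value below 0.05. The sketch below checks both criteria, assuming MASS for Pima.tr. The names tk and excludes.zero are illustrative.

```r
library(MASS)

glu <- Pima.tr$glu
bmi.cat <- cut(Pima.tr$bmi, breaks = c(18, 24.9, 29.9, 50),
               labels = c("Underweight/Normal weight", "Overweight", "Obesity"))
bmi.anova <- aov(glu ~ bmi.cat)

# Matrix with columns diff, lwr, upr, p adj (one row per pair)
tk <- TukeyHSD(bmi.anova)$bmi.cat

# TRUE when the 95% family-wise interval does not contain zero
excludes.zero <- tk[, "lwr"] > 0 | tk[, "upr"] < 0
excludes.zero

# Rows with adjusted p-value below 0.05
tk[tk[, "p adj"] < 0.05, , drop = FALSE]
```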

From the pairwise comparison, what do we find regarding the plasma glucose in the different weight categories?

Note that when checking the assumptions of an ANOVA, the var.test() function compares the variances of only two groups at a time. To check the equal variance assumption across more than two groups, we can use side-by-side boxplots:

> boxplot(glu~bmi.cat)

To judge whether the equal variance assumption is met, we check whether the spread (the height of the boxes and the length of the whiskers) is roughly similar across the groups.
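Boxplots give a visual impression. A numeric complement is to compare the group standard deviations directly. The sketch below assumes MASS for Pima.tr and uses illustrative names. It prints the SDs, the ratio of the largest to the smallest group variance, and the pooled variance. The pooled variance is the same quantity as the residual Mean Sq (about 952) in the ANOVA table.

```r
library(MASS)

glu <- Pima.tr$glu
bmi.cat <- cut(Pima.tr$bmi, breaks = c(18, 24.9, 29.9, 50),
               labels = c("Underweight/Normal weight", "Overweight", "Obesity"))

group.sd <- tapply(glu, bmi.cat, sd)
group.n  <- tapply(glu, bmi.cat, length)

# Sample SD of glucose in each BMI class
round(group.sd, 2)

# Ratio of the largest to the smallest group variance (1 means identical spread)
var.ratio <- max(group.sd^2) / min(group.sd^2)
var.ratio

# Pooled variance: sum of (n_i - 1) * s_i^2 divided by N - k
pooled.var <- sum((group.n - 1) * group.sd^2) / (sum(group.n) - nlevels(bmi.cat))
pooled.var
```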

We can also conduct a formal test for homogeneity of variances when we have more than two groups. One such test is Bartlett's test, which assumes that the data in each group are normally distributed and is sensitive to departures from normality. The procedure is performed as follows:

> bartlett.test(glu ~ bmi.cat)

        Bartlett test of homogeneity of variances

data:  glu by bmi.cat
Bartlett's K-squared = 3.6105, df = 2, p-value = 0.1644

H0: The variability in glucose is equal for all bmi categories.

Ha: The variability in glucose is not equal for all bmi categories.

We fail to reject the null hypothesis that the variability in glucose is equal for all bmi categories (Bartlett's K-squared = 3.6105, df = 2, p-value = 0.1644).
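Bartlett's statistic compares the log of the pooled variance with the logs of the individual group variances. With k groups of sizes n_i, sample variances s_i^2, and pooled variance s_p^2, the statistic is K^2 = [(N - k) ln s_p^2 - sum (n_i - 1) ln s_i^2] / [1 + (sum 1/(n_i - 1) - 1/(N - k)) / (3(k - 1))]. It is compared with a chi-squared distribution on k - 1 degrees of freedom. The sketch below reproduces bartlett.test(), assuming MASS for Pima.tr. Names such as s2.pooled are illustrative.

```r
library(MASS)

glu <- Pima.tr$glu
bmi.cat <- cut(Pima.tr$bmi, breaks = c(18, 24.9, 29.9, 50),
               labels = c("Underweight/Normal weight", "Overweight", "Obesity"))

n <- tapply(glu, bmi.cat, length)   # group sizes n_i
v <- tapply(glu, bmi.cat, var)      # group variances s_i^2
k <- length(n)

df.within <- sum(n - 1)                         # N - k
s2.pooled <- sum((n - 1) * v) / df.within       # pooled variance s_p^2

numerator  <- df.within * log(s2.pooled) - sum((n - 1) * log(v))
correction <- 1 + (sum(1 / (n - 1)) - 1 / df.within) / (3 * (k - 1))

K2 <- numerator / correction
p.value <- pchisq(K2, df = k - 1, lower.tail = FALSE)

c(K2 = K2, df = k - 1, p = p.value)
```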