Two-sample Tests


Given the variation within each sample, how likely is it that our two sample means were drawn from populations with the same average? Rather than simply comparing the two means by eye, a better approach is to work out the probability that the two samples were indeed drawn from populations with the same mean. If this probability is very low, we can be reasonably confident that the means really are different from one another.

Two-sample Paired Test

Paired tests are used when there are two measurements on the same experimental unit. The paired t-test has the same assumptions as a one-sample t-test: independence, and normality of the differences. Let us look at a data set on weight change (anorexia), also from the MASS library. The data are from 72 young female anorexia patients. The three variables are treatment (Treat), weight before the study (Prewt), and weight after the study (Postwt). Here we are interested in finding out whether there is a placebo effect (i.e. whether patients who receive no treatment still gain some weight during the study).

> detach(Boston) ### important

> attach(anorexia)

> dif <- Postwt - Prewt

> dif.Cont <- dif[which(Treat=="Cont")]

 

Apply the summary() function to the variable dif.Cont and comment on the summary statistics.

  1. Create plots to check the normality assumption.
  2. Conduct a formal test for normality.
  3. Does the normality assumption hold for the variable dif.Cont?
  4. Conduct a Wilcoxon signed-rank test to determine whether the median weight change in the control group is significantly different from 0.
  5. Compare the result with the one-sample test of the mean in the earlier section.
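A minimal sketch of the commands these checks involve, run on simulated data as a stand-in for dif.Cont (the values below are made up for illustration, not the anorexia data):

```r
## Simulated stand-in for dif.Cont (an assumption, not the real data)
set.seed(1)
x <- rnorm(26, mean = -0.45, sd = 8)

summary(x)            # five-number summary plus the mean
qqnorm(x); qqline(x)  # points close to the line suggest normality
hist(x)               # is the histogram roughly bell-shaped?
shapiro.test(x)       # formal test; a small p-value is evidence
                      # against normality
```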

Conducting a "paired" t-test is virtually identical to running a one-sample test on the element-wise differences. Both the parametric paired t-test and the non-parametric Wilcoxon signed-rank test are shown below.

> t.test(dif.Cont)

        One Sample t-test

data:  dif.Cont
t = -0.2872, df = 25, p-value = 0.7763
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -3.676708  2.776708
sample estimates:
mean of x
    -0.45

We see that we fail to reject the null hypothesis (t = -0.29, df = 25, p-value = 0.7763) that there is no difference in mean weight before and after the study in the control group. The sample mean difference is -0.45, with a 95% confidence interval of (-3.68, 2.78).

> wilcox.test(dif.Cont)

        Wilcoxon signed rank test with continuity correction

data:  dif.Cont

V = 150, p-value = 0.7468

alternative hypothesis: true location is not equal to 0

We thus fail to reject the null hypothesis (V = 150, p-value = 0.7468) that there is no difference in median weight before and after the study in the control group.

It is not necessary to create the derived difference variable first. Instead, you may set the paired argument to TRUE in the R command, as follows:

> t.test(Postwt[which(Treat=="Cont")], Prewt[which(Treat=="Cont")], paired=TRUE)

> wilcox.test(Postwt[which(Treat=="Cont")], Prewt[which(Treat=="Cont")], paired=TRUE)
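To see the equivalence on a small scale, here is a sketch with made-up before/after measurements (not the anorexia values): the paired t-test and the one-sample t-test on the differences agree exactly.

```r
## Toy before/after data (hypothetical values, for illustration only)
before <- c(80.5, 84.9, 81.5, 82.6, 79.9, 88.7)
after  <- c(81.7, 85.2, 81.6, 81.4, 80.2, 88.0)

paired  <- t.test(after, before, paired = TRUE)  # paired t-test
onesamp <- t.test(after - before)                # one-sample test on differences

## The two calls give identical t statistics, df, and p-values
c(paired$statistic, onesamp$statistic)
c(paired$p.value, onesamp$p.value)
```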

 

Conduct an appropriate test to determine whether the treatment is effective in the anorexia dataset. (Hint: Create a new variable called trt that is named "Control" if the patient was not given treatment and "Treatment" otherwise).

Paired t Test in R (R Tutorial 4.4), MarinStatsLectures

Parametric Two-sample T-test

Now, we will analyze the Pima.tr dataset. The US National Institute of Diabetes and Digestive and Kidney Diseases collected data on 532 women who were at least 21 years old, of Pima Indian heritage and living near Phoenix, Arizona, who were tested for diabetes according to World Health Organization criteria. One simple question is whether the plasma glucose concentration is higher in diabetic individuals than it is in non-diabetic individuals.

To do this, we will perform a two-sample t-test which makes the following assumptions:

  1. Independent observations.
  2. Normal distribution for each of the two groups.
  3. Equal variance for each of the two groups.

The statistic is

ttwo-sample = [ (Ȳ1 − Ȳ2) − D0 ] / √[ Sp2 (1/n1 + 1/n2) ]   ~ Tn1 + n2 − 2

 

(usually D0 is just 0)

Sp2 (pooled variance) = [(n1 − 1)S12 + (n2 − 1)S22] / (n1 + n2 − 2)
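These formulas are easy to verify in R. The sketch below uses made-up data (an assumption, not Pima.tr) to compute the pooled statistic by hand and check it against t.test() with var.equal=TRUE:

```r
## Toy data for two groups (hypothetical values)
y1 <- c(12.1, 10.8, 13.5, 11.2, 12.9, 11.8)
y2 <- c( 9.7, 10.2,  8.9, 11.1,  9.5)
n1 <- length(y1); n2 <- length(y2)

## Pooled variance and the two-sample t statistic (D0 = 0)
sp2    <- ((n1 - 1) * var(y1) + (n2 - 1) * var(y2)) / (n1 + n2 - 2)
t_hand <- (mean(y1) - mean(y2)) / sqrt(sp2 * (1 / n1 + 1 / n2))

fit <- t.test(y1, y2, var.equal = TRUE)
all.equal(unname(fit$statistic), t_hand)  # TRUE: same statistic
unname(fit$parameter)                     # df = n1 + n2 - 2 = 9
```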

 

> detach(anorexia)

> attach(Pima.tr)

> ?Pima.tr

 

> t.test(glu ~ type)

        Welch Two Sample t-test

data:  glu by type
t = -7.3856, df = 121.756, p-value = 2.081e-11
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -40.51739 -23.38813
sample estimates:
 mean in group No mean in group Yes
         113.1061          145.0588

Here we see that we reject the null hypothesis that the mean glucose for those who are diabetic is the same as for those who are not diabetic (t = -7.39, df = 121.76, p-value = 2.081e-11). The average glucose is 145.06 for diabetics and 113.11 for non-diabetics. The 95% confidence interval for the difference in mean glucose (non-diabetic minus diabetic) is (-40.52, -23.39).

One thing to remember about the t.test() function is that by default it does not assume equal variances (it performs the Welch test). The argument var.equal=TRUE can be used when the variances are homogeneous.

 

(The unequal-variances version is known as Welch's test; its degrees of freedom are approximated using Satterthwaite's formula.)

cf.    http://apcentral.collegeboard.com/apc/public/repository/ap05_stats_allwood_fin4prod.pdf
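As a sketch of Satterthwaite's approximation, the degrees of freedom that Welch's test uses can be computed by hand and compared with what t.test() reports by default (again on made-up data, not Pima.tr):

```r
## Toy data for two groups (hypothetical values)
y1 <- c(12.1, 10.8, 13.5, 11.2, 12.9, 11.8)
y2 <- c( 9.7, 10.2,  8.9, 11.1,  9.5)
n1 <- length(y1); n2 <- length(y2)

## Welch-Satterthwaite degrees of freedom
v1 <- var(y1) / n1
v2 <- var(y2) / n2
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

fit <- t.test(y1, y2)                       # Welch is the default
all.equal(unname(fit$parameter), df_welch)  # TRUE: same df
```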

 

> t.test(glu ~ type, var.equal=T)

In other words, we need to determine whether the two groups share the same variance. As with checking normality, we can gather evidence from summary statistics, plots, and a formal test, and then make our final judgment call.

 

Are the variances of the plasma glucose concentration the same between diabetic individuals and non-diabetic individuals? Use the summary statistics and plots to support your argument.

Comparison of Variance

R provides the var.test() function for testing the assumption that the variances are the same; it does this by testing whether the ratio of the variances is equal to 1. The test is called the same way as t.test():

> var.test(glu ~ type)

        F test to compare two variances

data:  glu by type
F = 0.7821, num df = 131, denom df = 67, p-value = 0.2336
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.5069535 1.1724351
sample estimates:
ratio of variances
         0.7821009

We fail to reject the null hypothesis that the variance in glucose for diabetics is equal to the variance for non-diabetics (F131,67 = 0.7821, p-value = 0.2336). The ratio of the variances is estimated to be 0.78, with a 95% confidence interval of (0.51, 1.17).

So here, for our t-test, we would use the var.equal=TRUE option.
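As a quick sketch of what var.test() computes, its F statistic is just the ratio of the two sample variances (made-up data again, not Pima.tr):

```r
## Toy data for two groups (hypothetical values)
y1 <- c(12.1, 10.8, 13.5, 11.2, 12.9, 11.8)
y2 <- c( 9.7, 10.2,  8.9, 11.1,  9.5)

vt <- var.test(y1, y2)
all.equal(unname(vt$estimate), var(y1) / var(y2))  # TRUE: F = s1^2 / s2^2
unname(vt$parameter)  # degrees of freedom: (n1 - 1, n2 - 1)
```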

Two-Sample t Test in R: Independent Groups (R Tutorial 4.2), MarinStatsLectures

Non-parametric Wilcoxon Test

To perform a nonparametric equivalent of the two independent-sample t-test, we use the Wilcoxon rank-sum test. In R, either supply a formula argument to the wilcox.test function or provide the two vectors directly. The script below shows one example:

> wilcox.test(glu ~ type)

 

        Wilcoxon rank sum test with continuity correction

data:  glu by type
W = 1894, p-value = 2.240e-11
alternative hypothesis: true location shift is not equal to 0

 

> wilcox.test(glu[type=="Yes"],glu[type=="No"])   # alternative way to call the test

We reject the null hypothesis that the median glucose for those who are diabetic is equal to the median glucose for those who are not diabetic (W = 1894, p-value = 2.24e-11).
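For intuition about the W statistic, here is a sketch on made-up data: W is the sum of the ranks of the first group within the combined sample, minus n1(n1 + 1)/2.

```r
## Toy data for two groups (hypothetical values, no ties)
g1 <- c(1.8, 3.4, 2.6, 4.1)
g2 <- c(2.2, 5.0, 4.7, 3.9, 4.4)
n1 <- length(g1)

r <- rank(c(g1, g2))  # ranks in the combined sample
W_hand <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2

wt <- wilcox.test(g1, g2)
unname(wt$statistic) == W_hand  # TRUE
```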

Wilcoxon Signed Rank Test in R (R Tutorial 4.5), MarinStatsLectures