2.2 t-tests for means of measurement outcomes


2.2.1 The one-sample t-test for a mean

The one-sample t-test compares the mean from one sample to a hypothesized value. The t.test( ) function performs a one-sample t-test when we specify the variable (vector) that we want to test and the hypothesized mean value (the mu argument). To test whether the mean age at walking is equal to 12 months for the infants in our age of first walking example:

> t.test(agewalk,mu=12)

        One Sample t-test

data:  agewalk
t = -4.529, df = 49, p-value = 3.806e-05
alternative hypothesis: true mean is not equal to 12
95 percent confidence interval:
 10.74397 11.51603
sample estimates:
mean of x
    11.13

The t.test( ) function can be used to conduct several types of t-tests, so it's a good idea to check the title in the output ('One Sample t-test') and the degrees of freedom (n-1 for a one-sample t-test) to be sure R is performing a one-sample t-test.
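
As a cross-check on the title and degrees of freedom, the t statistic, degrees of freedom, and two-tailed p-value can be reproduced by hand from the mean, standard deviation, and sample size. The sketch below uses simulated data (the vector x, the seed, and the names t_by_hand, df_by_hand, and p_by_hand are made up for illustration, since the agewalk values themselves are not listed here), so its numbers will not match the agewalk output.

```r
# Simulated stand-in for agewalk; these values are hypothetical.
set.seed(1)
x   <- rnorm(50, mean = 11, sd = 1.4)
mu0 <- 12                                    # hypothesized mean

n          <- length(x)
t_by_hand  <- (mean(x) - mu0) / (sd(x) / sqrt(n))   # (mean - mu0) / standard error
df_by_hand <- n - 1                                 # one-sample degrees of freedom
p_by_hand  <- 2 * pt(-abs(t_by_hand), df = df_by_hand)  # two-tailed p-value

res <- t.test(x, mu = mu0)

# Side-by-side comparison of the hand calculation and t.test( )
c(by_hand = t_by_hand,  t.test = unname(res$statistic))
c(by_hand = df_by_hand, t.test = unname(res$parameter))
c(by_hand = p_by_hand,  t.test = res$p.value)
```

If the two columns agree, R has run the one-sample test that was intended.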

R performs a two-tailed test by default, as indicated by the two-tailed wording of the alternative hypothesis. The p-value here is given in scientific notation: the 'e-05' indicates that the decimal point should be moved 5 places to the left, so 3.806e-05 is 0.00003806, which would generally be reported as 'p<.001'. R also gives the 95% confidence interval for the mean, which agrees with the two-tailed test at the .05 level. If there is no significant difference between the sample mean and the hypothesized value (i.e., if the p-value is greater than .05), the confidence interval for the mean will contain the hypothesized value. If there is a significant difference between the sample mean and the hypothesized mean, the confidence interval will not contain the hypothesized value.
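
The two points in this paragraph can be checked directly in R. The sketch below first prints the p-value without scientific notation, and then confirms on simulated data that the 95% confidence interval contains the hypothesized mean exactly when the two-tailed p-value is greater than .05. The vector x, the seed, and the hypothesized means 11 and 12 are assumptions chosen for illustration, not values from the age of first walking example.

```r
# Scientific notation: e-05 shifts the decimal point 5 places to the left.
p <- 3.806e-05
format(p, scientific = FALSE)   # print without scientific notation
p < 0.001                       # TRUE, so the result is reported as p<.001

# Agreement between the 95% CI and the two-tailed test at the .05 level,
# checked on simulated (hypothetical) data for two hypothesized means.
set.seed(1)
x <- rnorm(50, mean = 11, sd = 1.4)
for (mu0 in c(11, 12)) {
  res       <- t.test(x, mu = mu0)
  ci        <- res$conf.int
  contained <- mu0 >= ci[1] && mu0 <= ci[2]
  cat("mu0 =", mu0,
      " p > .05:", res$p.value > 0.05,
      " CI contains mu0:", contained, "\n")
}
```

For each hypothesized mean, the two logical values printed on a line should be the same.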

Note that the t.test( ) function gives the mean, but does not give the standard deviation or sample size, which are usually reported along with a mean (although, for a one-sample test, the sample size can be determined from the degrees of freedom, which are given). This information can be obtained using the sd( ) function and the length( ) function (sd(agewalk) and length(agewalk) for this example), although care is needed with the length( ) command when there are missing values, because length( ) counts missing values as well.
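
The caution about missing values can be illustrated with a small made-up vector (the values below are hypothetical and are not the agewalk data). This is one possible way to get the standard deviation and the number of non-missing observations; n_used is a name introduced only for this sketch.

```r
# Hypothetical vector with two missing values.
x <- c(9.5, 10, NA, 11.25, 12, NA, 13)

length(x)             # counts every element, including the NAs
sum(!is.na(x))        # number of non-missing observations
sd(x)                 # NA, because missing values are present
sd(x, na.rm = TRUE)   # standard deviation of the non-missing values

# t.test( ) drops missing values itself, so its degrees of freedom reflect
# the non-missing count rather than length(x).
n_used <- unname(t.test(x, mu = 12)$parameter) + 1
n_used
```

When there are missing values, the sample size to report is the non-missing count, which is also the n implied by the degrees of freedom.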

2.2.2 The independent samples t-test to compare two means

The t.test( ) function can also be used to perform an independent samples t-test comparing means from two independent samples. For the following syntax, the underlying data set includes the subjects from both samples, with one variable holding the dependent variable (the outcome variable) and another variable indicating which group each subject is in. To perform the independent samples t-test, we specify the object representing the dependent variable and the object representing the group information in a formula of the form outcome ~ group. For the usual pooled-variance version of the t-test:

> t.test(agewalk ~ group,var.equal=TRUE)

        Two Sample t-test

data:  agewalk by group
t = -3.1812, df = 48, p-value = 0.002571
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.9331253 -0.4358587
sample estimates:
mean in group 1 mean in group 2
       10.72727        11.91176

The t.test( ) function can be used to conduct several types of t-tests, with several different data setups, so it's a good idea to check the title in the output ('Two Sample t-test') and the degrees of freedom (n1 + n2 - 2) to be sure R is performing the pooled-variance version of the two-sample t-test.
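
As an illustration of two such data setups, the sketch below runs the pooled-variance test once with an outcome variable plus a grouping variable (the formula form used above) and once with the two groups stored as separate vectors. All data and names here (g1, g2, long, outcome, grp) are hypothetical and simulated for illustration.

```r
# Simulated (hypothetical) data for two independent groups.
set.seed(3)
g1 <- rnorm(12, mean = 10.5, sd = 1.2)
g2 <- rnorm(15, mean = 11.8, sd = 1.2)

# Setup 1: one outcome column plus one grouping column.
long <- data.frame(
  outcome = c(g1, g2),
  grp     = rep(c(1, 2), times = c(length(g1), length(g2)))
)
res_formula <- t.test(outcome ~ grp, data = long, var.equal = TRUE)

# Setup 2: the two groups as separate vectors.
res_vectors <- t.test(g1, g2, var.equal = TRUE)

# Both setups give the same t statistic and the same df (n1 + n2 - 2).
c(formula = unname(res_formula$statistic),
  vectors = unname(res_vectors$statistic))
c(formula = unname(res_formula$parameter),
  vectors = unname(res_vectors$parameter))
```

Whichever setup is used, the title and the degrees of freedom in the printed output remain the quickest check on which test was run.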

R reports a two-tailed p-value, as indicated by the two-tailed phrasing of the alternative hypothesis. The 95% confidence interval that is given is for the difference in the means for the two groups (10.73 - 11.91 gives a difference in means of -1.18, and the CI that R gives is a CI for this difference in means). Note that the output gives the means for each of the two groups being compared, but not the standard deviations or sample sizes. This additional information can be obtained using the tapply( ) function as described in Section 7 (in this example, tapply(agewalk,group,sd) will give the standard deviations, and table(group) will give the group sample sizes).
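
The group summaries and the interpretation of the confidence interval can be sketched as follows. The data are simulated stand-ins (the vector y, the group sizes, and the seed are assumptions made for illustration), so the numbers differ from the agewalk output.

```r
# Simulated (hypothetical) outcome and grouping variable.
set.seed(2)
group <- rep(c(1, 2), times = c(33, 17))
y     <- rnorm(50, mean = ifelse(group == 1, 10.7, 11.9), sd = 1.3)

means <- tapply(y, group, mean)   # mean for each group
sds   <- tapply(y, group, sd)     # standard deviation for each group
ns    <- table(group)             # sample size for each group

# Collect the descriptive statistics in one small table.
data.frame(mean = as.numeric(means),
           sd   = as.numeric(sds),
           n    = as.numeric(ns),
           row.names = names(means))

res <- t.test(y ~ group, var.equal = TRUE)

# The CI is for (mean of group 1) - (mean of group 2) and is centered on it.
diff_means <- as.numeric(means[1]) - as.numeric(means[2])
diff_means
res$conf.int
```

The difference in means falls at the midpoint of the reported interval, which confirms that the interval describes the difference rather than either group mean.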

To perform an independent samples t-test using the unequal-variance (Welch) version of the t-test:

> t.test(agewalk ~ group,var.equal=FALSE)

        Welch Two Sample t-test

data:  agewalk by group
t = -3.1434, df = 31.39, p-value = 0.003635
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.9526304 -0.4163536
sample estimates:
mean in group 1 mean in group 2
       10.72727        11.91176

Again, it's good to check the title (Welch Two Sample t-test) and degrees of freedom (which often take on decimal values for the unequal variance version of the t-test) to be sure R is performing the unequal variance version of the two sample t-test. As discussed above, standard deviations and sample sizes are also usually given as part of the summary for a two-sample t-test.
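
The decimal degrees of freedom come from the Welch-Satterthwaite approximation, which weights each group's variance by its sample size. The sketch below computes the statistic and degrees of freedom by hand on simulated data and compares them with t.test( ). The vectors g1 and g2, their sizes, and the names v1, v2, t_welch, and df_welch are hypothetical and introduced only for illustration.

```r
# Simulated (hypothetical) groups with unequal spread.
set.seed(4)
g1 <- rnorm(33, mean = 10.7, sd = 1.0)
g2 <- rnorm(17, mean = 11.9, sd = 1.6)

v1 <- var(g1) / length(g1)   # squared standard error of the mean, group 1
v2 <- var(g2) / length(g2)   # squared standard error of the mean, group 2

# Unequal-variance t statistic and Welch-Satterthwaite degrees of freedom.
t_welch  <- (mean(g1) - mean(g2)) / sqrt(v1 + v2)
df_welch <- (v1 + v2)^2 /
  (v1^2 / (length(g1) - 1) + v2^2 / (length(g2) - 1))

res <- t.test(g1, g2, var.equal = FALSE)

c(by_hand = t_welch,  t.test = unname(res$statistic))
c(by_hand = df_welch, t.test = unname(res$parameter))
```

The approximate degrees of freedom lie between the smaller of n1 - 1 and n2 - 1 and the pooled value n1 + n2 - 2, which is why they are usually not whole numbers.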

2.2.3 The paired samples t-test

The t.test( ) function can also be used to perform a paired samples t-test. In this situation, we need to specify the two data vectors representing the two variables to be compared, along with paired=TRUE. The following example compares the means of a pre-test score (variable score1) and a post-test score (variable score2) from a sample of 5 subjects. The t.test( ) function does not give the means of the two underlying variables (it does give the mean difference), so I used the mean( ) function to get this descriptive information. Generally, standard deviations and sample size would also be reported; these can be obtained from the sd( ) and length( ) functions.

> mean(score1)
[1] 20.2
> mean(score2)
[1] 21
> t.test(score1,score2,paired=TRUE)

        Paired t-test

data:  score1 and score2
t = -0.4377, df = 4, p-value = 0.6842
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -5.874139  4.274139
sample estimates:
mean of the differences
                   -0.8

The t.test( ) function can be used for several different types of t-tests, so it's a good idea to check the title ('Paired t-test') and degrees of freedom (n-1, where n is the number of pairs in the study) to be sure R is performing a paired samples t-test.

The confidence interval here is the confidence interval for the mean difference; the confidence interval should agree with the p-value in that the CI should not contain 0 when p<0.05, and the CI should contain 0 when p>0.05.

Note that the t.test( ) procedure gives the mean difference, but does not give the standard deviation of the differences or the standard deviations of the two variables. Generally, standard deviations are reported as part of the data summary for a comparison of means, and these standard deviations can be found using the sd( ) function.
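
A minimal sketch of these descriptive statistics follows. It also shows that the paired test is the same as a one-sample t-test on the within-subject differences. The vectors pre and post are made-up scores for 5 subjects (they are not the score1 and score2 data), and d, paired, and one_smp are names introduced only for this illustration.

```r
# Hypothetical pre/post scores for 5 subjects (not the score1/score2 data).
pre  <- c(18, 21, 16, 22, 19)
post <- c(22, 25, 17, 24, 16)
d    <- pre - post               # within-subject differences

# Descriptive statistics usually reported alongside the test
c(mean_pre = mean(pre), mean_post = mean(post), mean_diff = mean(d))
c(sd_pre = sd(pre), sd_post = sd(post), sd_diff = sd(d))
length(d)                        # number of pairs

paired  <- t.test(pre, post, paired = TRUE)
one_smp <- t.test(d, mu = 0)     # same test, expressed on the differences

c(paired = unname(paired$statistic), one_sample = unname(one_smp$statistic))
c(paired = paired$p.value,           one_sample = one_smp$p.value)
```

Because the two calls agree, sd(d) is the standard deviation that drives the paired test, and it is worth reporting alongside the mean difference.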