Z-tests and T-tests


First we will discuss two-sample z-tests and t-tests. These tests are used when the outcome is continuous and the exposure, or predictor, is binary. Z-tests are used when both groups you are comparing have a sample size of at least 30, while t-tests are used when one or both of the groups have fewer than 30 members. For example, we may use a two-sample z-test to determine if systolic blood pressure differs between men and women. Or, in a clinical trial setting, we may use a two-sample t-test to determine if viral load differs between people on the active treatment and people on the placebo or control treatment.

Here we discuss the comparison of means when the two comparison groups are independent or physically separate. The two groups might be determined by a particular attribute (e.g., sex, diagnosis of cardiovascular disease) or might be set up by the investigator (e.g., participants assigned to receive an experimental treatment or placebo). The first step in the analysis involves computing descriptive statistics on each of the two samples. Specifically, we compute the sample size, mean and standard deviation in each sample and we denote these summary statistics as follows:

n1, x̄1 and s1 for sample 1 and n2, x̄2 and s2 for sample 2.

The designation of sample 1 and sample 2 is arbitrary. In a clinical trial setting the convention is to call the treatment group 1 and the control group 2. However, when comparing men and women, for example, either group can be 1 or 2.  

In the two independent samples application with a continuous outcome, the parameter of interest in the test of hypothesis is the difference in population means, μ1 - μ2. The null hypothesis is always that there is no difference between groups with respect to means, i.e.,

H0: μ1 - μ2 = 0.  

The null hypothesis can also be written as follows: H0: μ1 = μ2. In the research hypothesis, an investigator can hypothesize that the first mean is larger than the second (H1: μ1 > μ2 ), that the first mean is smaller than the second (H1: μ1 < μ2 ), or that the means are different (H1: μ1 ≠ μ2 ). The three different alternatives represent upper-, lower-, and two-tailed tests, respectively. The following test statistics are used to test these hypotheses.
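The three alternatives differ only in which tail(s) of the reference distribution count as evidence against H0. As a minimal sketch, assuming a Z statistic with a standard normal reference distribution, the p-value under each alternative can be computed with Python's `statistics.NormalDist`; the function name `z_p_value` and the labels "greater", "less", and "two-sided" are illustrative choices, not part of the original text.

```python
from statistics import NormalDist

# Standard normal reference distribution for a Z statistic
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_p_value(z: float, alternative: str = "two-sided") -> float:
    """p-value of a Z statistic under H0: mu1 = mu2.

    alternative: "greater"   (H1: mu1 > mu2, upper-tailed test),
                 "less"      (H1: mu1 < mu2, lower-tailed test), or
                 "two-sided" (H1: mu1 != mu2, two-tailed test).
    """
    if alternative == "greater":
        return 1.0 - STANDARD_NORMAL.cdf(z)
    if alternative == "less":
        return STANDARD_NORMAL.cdf(z)
    if alternative == "two-sided":
        return 2.0 * (1.0 - STANDARD_NORMAL.cdf(abs(z)))
    raise ValueError(f"unknown alternative: {alternative!r}")


# The same statistic yields different p-values under different alternatives
for alt in ("greater", "less", "two-sided"):
    print(alt, round(z_p_value(2.0, alt), 4))
```

Note that a two-tailed p-value is twice the one-tailed p-value in the direction of the observed difference.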

 

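Test Statistics for Testing H0: μ1 = μ2

If n1 ≥ 30 and n2 ≥ 30:

$$Z = \frac{\bar{x}_1 - \bar{x}_2}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

If n1 < 30 or n2 < 30:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

where df = n1 + n2 - 2.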

NOTE: The formulas above assume equal variability in the two populations (i.e., the population variances are equal, σ1² = σ2²). This means that the outcome is equally variable in each of the comparison populations. For analysis, we have samples from each of the comparison populations. If the sample variances are similar, then the assumption about variability in the populations is probably reasonable. As a guideline, if the ratio of the sample variances, s1²/s2², is between 0.5 and 2 (i.e., if one variance is no more than double the other), then the formulas above are appropriate. If the ratio of the sample variances is greater than 2 or less than 0.5, then alternative formulas must be used to account for the heterogeneity in variances.
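The variance-ratio guideline is easy to automate. A minimal Python sketch; the function names are hypothetical, and the 0.5 to 2 bounds are the rule of thumb described above, not a formal test of equal variances:

```python
def variance_ratio(s1: float, s2: float) -> float:
    """Ratio of the sample variances, s1^2 / s2^2."""
    return s1 ** 2 / s2 ** 2


def equal_variances_reasonable(s1: float, s2: float) -> bool:
    """Rule of thumb: the pooled (equal-variance) formulas are appropriate
    when the variance ratio lies between 0.5 and 2."""
    return 0.5 <= variance_ratio(s1, s2) <= 2.0


# Systolic blood pressure: men (s1 = 17.5) vs. women (s2 = 20.1)
print(round(variance_ratio(17.5, 20.1), 2), equal_variances_reasonable(17.5, 20.1))
# prints: 0.76 True
```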

The test statistics include Sp, the pooled estimate of the common standard deviation (again assuming that the variances in the populations are similar). Sp is the square root of a weighted average of the two sample variances, where each variance is weighted by its degrees of freedom (n - 1):

$$S_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

Because we are assuming equal variances between groups, we pool the information on variability (the sample variances) to generate an estimate of the variability in the population. (Note: Because Sp² is a weighted average of the sample variances, Sp will always lie between s1 and s2.)
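Putting the pooled standard deviation and the test statistics together, here is a minimal Python sketch working from summary statistics. The function names `pooled_sd` and `two_sample_statistic` are hypothetical, and the sketch assumes equal population variances, exactly as the formulas above do:

```python
import math


def pooled_sd(n1: int, s1: float, n2: int, s2: float) -> float:
    """Sp: square root of the df-weighted average of the sample variances."""
    return math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))


def two_sample_statistic(n1, mean1, s1, n2, mean2, s2):
    """Test statistic for H0: mu1 = mu2, assuming equal population variances.

    Returns ("Z", value, None) when both samples have at least 30 members,
    otherwise ("t", value, df) with df = n1 + n2 - 2.
    """
    # Standard error of the difference in sample means
    se = pooled_sd(n1, s1, n2, s2) * math.sqrt(1 / n1 + 1 / n2)
    value = (mean1 - mean2) / se
    if n1 >= 30 and n2 >= 30:
        return "Z", value, None
    return "t", value, n1 + n2 - 2


# Framingham systolic blood pressure: men = group 1, women = group 2
name, value, df = two_sample_statistic(1623, 128.2, 17.5, 1911, 126.5, 20.1)
print(name, round(value, 2), df)  # prints: Z 2.66 None
```

The Z and t statistics share the same formula; only the reference distribution (and hence the critical value) changes with sample size.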

Example: 

Data measured on n=3,539 participants who attended the seventh examination of the Framingham Heart Study Offspring Cohort are shown below.

 

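| Characteristic | Men: n | Men: x̄ | Men: s | Women: n | Women: x̄ | Women: s |
|---|---|---|---|---|---|---|
| Systolic Blood Pressure (mm Hg) | 1,623 | 128.2 | 17.5 | 1,911 | 126.5 | 20.1 |
| Diastolic Blood Pressure (mm Hg) | 1,622 | 75.6 | 9.8 | 1,910 | 72.6 | 9.7 |
| Total Serum Cholesterol (mg/dL) | 1,544 | 192.4 | 35.2 | 1,766 | 207.1 | 36.7 |
| Weight (lbs) | 1,612 | 194.0 | 33.8 | 1,894 | 157.7 | 34.6 |
| Height (in) | 1,545 | 68.9 | 2.7 | 1,781 | 63.4 | 2.5 |
| Body Mass Index (kg/m²) | 1,545 | 28.8 | 4.6 | 1,781 | 27.6 | 5.9 |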

Suppose we now wish to assess whether there is a statistically significant difference in mean systolic blood pressures between men and women using a 5% level of significance.  

H0: μ1 = μ2
H1: μ1 ≠ μ2
α = 0.05

Because both samples are large (at least 30 in each group), we can use the Z test statistic rather than t. Note that statistical computing packages use t throughout. Before implementing the formula, we first check whether the assumption of equality of population variances is reasonable. The guideline suggests investigating the ratio of the sample variances, s1²/s2². Suppose we call the men group 1 and the women group 2. Again, this is arbitrary; it only needs to be noted when interpreting the results. The ratio of the sample variances is 17.5²/20.1² = 0.76, which falls between 0.5 and 2, suggesting that the assumption of equality of population variances is reasonable. The appropriate test statistic is

$$Z = \frac{\bar{x}_1 - \bar{x}_2}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

This is a two-tailed test, using a Z statistic and a 5% level of significance. Reject H0 if Z < -1.960 or if Z > 1.960.
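The critical value 1.960 is the 97.5th percentile of the standard normal distribution, since a two-tailed test at α = 0.05 places 0.025 in each tail. A minimal sketch of the decision rule using Python's `statistics.NormalDist.inv_cdf`; the function names and alternative labels are hypothetical:

```python
from statistics import NormalDist


def z_critical_value(alpha: float = 0.05, alternative: str = "two-sided") -> float:
    """Critical value of Z: positive for two-sided and upper-tailed tests,
    negative for lower-tailed tests."""
    z = NormalDist()  # standard normal
    if alternative == "two-sided":
        return z.inv_cdf(1 - alpha / 2)
    if alternative == "greater":
        return z.inv_cdf(1 - alpha)
    if alternative == "less":
        return z.inv_cdf(alpha)
    raise ValueError(f"unknown alternative: {alternative!r}")


def reject_h0(z_stat: float, alpha: float = 0.05, alternative: str = "two-sided") -> bool:
    """Apply the decision rule for the chosen alternative hypothesis."""
    crit = z_critical_value(alpha, alternative)
    if alternative == "two-sided":
        return abs(z_stat) > crit  # Z < -crit or Z > crit
    if alternative == "greater":
        return z_stat > crit
    return z_stat < crit  # lower-tailed: crit is negative


print(round(z_critical_value(0.05), 3))  # prints: 1.96
```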

We now substitute the sample data into the formula for the test statistic identified above. Before substituting, we first compute Sp, the pooled estimate of the common standard deviation.

$$S_p = \sqrt{\frac{(1623 - 1)(17.5)^2 + (1911 - 1)(20.1)^2}{1623 + 1911 - 2}} = \sqrt{359.1} = 19.0$$

Notice that the pooled estimate of the common standard deviation, Sp, falls between the standard deviations in the comparison groups (17.5 and 20.1). Sp is slightly closer in value to the standard deviation in the women (20.1) because there were slightly more women in the sample. Recall that Sp² is a weighted average of the sample variances in the comparison groups, weighted by their degrees of freedom (n - 1).

Now the test statistic:

$$Z = \frac{128.2 - 126.5}{19.0\sqrt{\frac{1}{1623} + \frac{1}{1911}}} = \frac{1.7}{0.64} = 2.66$$

We reject H0 because 2.66 > 1.960. We have statistically significant evidence at α=0.05 to show that there is a difference in mean systolic blood pressures between men and women. The p-value is p < 0.010.
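The full hand calculation can be checked in a few lines. This minimal Python sketch uses the summary statistics from the table and the standard library's `statistics.NormalDist` for the two-tailed p-value; because it does not round Sp to 19.0 before dividing, the last digits may differ slightly from the hand calculation:

```python
import math
from statistics import NormalDist

# Framingham summary statistics: men = group 1, women = group 2
n1, mean1, s1 = 1623, 128.2, 17.5
n2, mean2, s2 = 1911, 126.5, 20.1

sp = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
se = sp * math.sqrt(1 / n1 + 1 / n2)          # standard error of mean1 - mean2
z = (mean1 - mean2) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed p-value

print(f"Sp = {sp:.1f}, SE = {se:.2f}, Z = {z:.2f}, p = {p_value:.4f}")
```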

Notice that there is a very small difference in the sample means (128.2-126.5 = 1.7 units), but this difference is beyond what would be expected by chance. Is this a clinically meaningful difference? The large sample size in this example is driving the statistical significance. A 95% confidence interval for the difference in mean systolic blood pressures is: 1.7 ± 1.25 or (0.45, 2.95).

You may be wondering how we calculated the above confidence interval. Recall that the standard form of a 95% confidence interval is:

$$\text{point estimate} \pm 1.96 \times SE(\text{point estimate})$$

Here, the point estimate is the difference in sample means, x̄1 - x̄2 = 1.7, and its standard error is the denominator of the Z statistic we calculated above (0.64). Since 1.96 × 0.64 = 1.25, the 95% confidence interval for the difference in means in this case is 1.7 ± 1.25.
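The same interval can be sketched in Python. The function name `mean_difference_ci` is hypothetical; it assumes equal population variances (pooled Sp) and a large-sample normal multiplier, matching the hand calculation:

```python
import math
from statistics import NormalDist


def mean_difference_ci(n1, mean1, s1, n2, mean2, s2, level=0.95):
    """CI for mu1 - mu2: point estimate +/- z * SE, with SE = Sp * sqrt(1/n1 + 1/n2)."""
    sp = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
    diff = mean1 - mean2
    return diff - z * se, diff + z * se


low, high = mean_difference_ci(1623, 128.2, 17.5, 1911, 126.5, 20.1)
print(f"({low:.2f}, {high:.2f})")  # prints: (0.45, 2.95)
```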

Keep in mind that the confidence interval provides an assessment of the magnitude of the difference between means whereas the test of hypothesis and p-value provide an assessment of the statistical significance of the difference.

Example:

A new drug is proposed to lower total cholesterol. A randomized controlled trial is designed to evaluate the efficacy of the medication in lowering cholesterol. Thirty participants are enrolled in the trial and are randomly assigned to receive either the new drug or a placebo. The participants do not know which treatment they are assigned. Each participant is asked to take the assigned treatment for 6 weeks. At the end of 6 weeks, each patient's total cholesterol level is measured and the sample statistics are as follows.

Sample Statistics from a Hypothetical Drug Trial

| Treatment | Sample Size | Mean | Standard Deviation |
|---|---|---|---|
| New Drug | 15 | 195.9 | 28.7 |
| Placebo | 15 | 227.4 | 30.3 |

Is there statistical evidence of a reduction in mean total cholesterol in patients taking the new drug for 6 weeks as compared to participants taking placebo? We will run the test using the five-step approach.

H0: μ1 = μ2
H1: μ1 < μ2
α = 0.05

Because both samples are small (< 30), we use the t test statistic. Before implementing the formula, we first check whether the assumption of equality of population variances is reasonable. The ratio of the sample variances is s1²/s2² = 28.7²/30.3² = 0.90, which falls between 0.5 and 2, suggesting that the assumption of equality of population variances is reasonable. The appropriate test statistic is:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

This is a lower-tailed test, using a t statistic and a 5% level of significance. The appropriate critical value can be found in a t table. To determine the critical value of t we need the degrees of freedom, df = n1 + n2 - 2 = 15 + 15 - 2 = 28. The critical value for a lower-tailed test with df = 28 and α = 0.05 is -1.701, and the decision rule is: Reject H0 if t < -1.701.

We now substitute the sample data into the formula for the test statistic identified above. Before substituting, we first compute Sp, the pooled estimate of the common standard deviation.

$$S_p = \sqrt{\frac{(15 - 1)(28.7)^2 + (15 - 1)(30.3)^2}{15 + 15 - 2}} = \sqrt{870.9} = 29.5$$

Now the test statistic:

$$t = \frac{195.9 - 227.4}{29.5\sqrt{\frac{1}{15} + \frac{1}{15}}} = \frac{-31.5}{10.77} = -2.92$$

We reject H0 because -2.92 < -1.701. We have statistically significant evidence at α = 0.05 to show that the mean total cholesterol level is lower in patients taking the new drug for 6 weeks as compared to patients taking placebo, p < 0.005.
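The small-sample calculation can be scripted the same way, using the trial's summary statistics (new drug: 15, 195.9, 28.7; placebo: 15, 227.4, 30.3). Python's standard library has no t distribution, so this minimal sketch hard-codes the lower-tailed critical value -1.701 for df = 28 and α = 0.05, read from a t table; that constant is an assumption of the sketch. With SciPy installed, `scipy.stats.t.ppf(0.05, 28)` returns the same value.

```python
import math

# Hypothetical drug trial: group 1 = new drug, group 2 = placebo
n1, mean1, s1 = 15, 195.9, 28.7
n2, mean2, s2 = 15, 227.4, 30.3

df = n1 + n2 - 2
sp = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df)
t = (mean1 - mean2) / (sp * math.sqrt(1 / n1 + 1 / n2))

# Lower-tailed critical value for df = 28, alpha = 0.05 (taken from a t table)
T_CRIT = -1.701
print(f"df = {df}, Sp = {sp:.1f}, t = {t:.2f}, reject H0: {t < T_CRIT}")
# prints: df = 28, Sp = 29.5, t = -2.92, reject H0: True
```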

Performing a Two-sample T-test in R

Using the Framingham Heart Study example above, we will demonstrate how to perform a two-sample t-test in R. Recall that, regardless of sample size, statistical computing packages like R always perform two-sample t-tests, not two-sample z-tests. Assume that we have read in the data and that the systolic blood pressure data are stored in a variable named "SBP7", while the sex of each individual is stored in a variable named "SEX". Then, the R code to perform a two-sample t-test and its resulting output are:

 

```r
> t.test(SBP7~SEX, alternative="two.sided", var.equal=T)

Two Sample t-test

data: SBP7 by SEX
t = 2.7551, df = 3532, p-value = 0.005898
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.507767 3.013994
sample estimates:
mean in group 1 mean in group 2
128.2120 126.4511
```

Note that the t statistic is slightly different from the statistic we calculated by hand; this difference is due only to round-off error, since the hand calculation used summary statistics rounded to one decimal place. Because we were performing a two-tailed test here, we used the option alternative="two.sided". If we were instead performing a test with an upper or lower alternative hypothesis, we would specify alternative="greater" or alternative="less", respectively. If the ratio of the sample variances in the two groups falls outside the range [0.5, 2.0], you can change the var.equal=T option to var.equal=F to accommodate the unequal variances between the two groups; R then performs Welch's t-test. From the output, we see that the 95% confidence interval excludes the null value of 0 (indicating no difference in mean SBP between men and women) and that the p-value is less than 0.05, so we reject the null hypothesis and conclude that there is a difference in mean SBP between men and women.
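Setting var.equal=F replaces the pooled Sp with separate variance estimates for each group and uses the Welch-Satterthwaite approximation for the degrees of freedom. The Python sketch below computes that statistic from summary statistics only; the function name `welch_t` is hypothetical, it is an illustration rather than R's implementation, and because it starts from rounded means and standard deviations it will not reproduce R's raw-data output exactly.

```python
import math


def welch_t(n1, mean1, s1, n2, mean2, s2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    (no equal-variance assumption, so no pooled Sp)."""
    v1 = s1 ** 2 / n1  # estimated variance of mean1
    v2 = s2 ** 2 / n2  # estimated variance of mean2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Systolic blood pressure summary statistics (men = group 1, women = group 2)
t_w, df_w = welch_t(1623, 128.2, 17.5, 1911, 126.5, 20.1)
print(f"t = {t_w:.2f}, df = {df_w:.0f}")
```

When the two sample sizes are equal, Welch's t statistic equals the pooled t statistic; only the degrees of freedom differ.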

On average, men's SBP is higher than women's by 1.7 mm Hg.

The video below illustrates how to perform a two-sample t-test using R. The data set for the illustrated exercise is LungCapData.txt.
