Z-tests and T-tests
First we will discuss two-sample z-tests and t-tests. These tests are used when the outcome is continuous and the exposure, or predictor, is binary. Z-tests are used when both groups being compared have a sample size of at least 30, while t-tests are used when one or both of the groups have fewer than 30 members. For example, we may use a two-sample z-test to determine whether systolic blood pressure differs between men and women. Or, in a clinical trial setting, we may use a two-sample t-test to determine whether viral load differs between people on the active treatment and those on the placebo or control treatment.
Here we discuss the comparison of means when the two comparison groups are independent or physically separate. The two groups might be determined by a particular attribute (e.g., sex, diagnosis of cardiovascular disease) or might be set up by the investigator (e.g., participants assigned to receive an experimental treatment or placebo). The first step in the analysis involves computing descriptive statistics on each of the two samples. Specifically, we compute the sample size, mean and standard deviation in each sample and we denote these summary statistics as follows:
n_{1}, x̄_{1} and s_{1} for sample 1 and n_{2}, x̄_{2} and s_{2} for sample 2.
The designation of sample 1 and sample 2 is arbitrary. In a clinical trial setting the convention is to call the treatment group 1 and the control group 2. However, when comparing men and women, for example, either group can be 1 or 2.
In the two independent samples application with a continuous outcome, the parameter of interest in the test of hypothesis is the difference in population means, μ_{1} - μ_{2}. The null hypothesis is always that there is no difference between groups with respect to means, i.e.,
H_{0}: μ_{1} - μ_{2} = 0.
The null hypothesis can also be written as follows: H_{0}: μ_{1} = μ_{2}. In the research hypothesis, an investigator can hypothesize that the first mean is larger than the second (H_{1}: μ_{1} > μ_{2}), that the first mean is smaller than the second (H_{1}: μ_{1} < μ_{2}), or that the means are different (H_{1}: μ_{1} ≠ μ_{2}). The three different alternatives represent upper-, lower-, and two-tailed tests, respectively. The following test statistics are used to test these hypotheses.
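Z = (x̄_{1} - x̄_{2}) / (Sp √(1/n_{1} + 1/n_{2})) if n_{1} ≥ 30 and n_{2} ≥ 30
t = (x̄_{1} - x̄_{2}) / (Sp √(1/n_{1} + 1/n_{2})) if n_{1} < 30 or n_{2} < 30
where df = n_{1} + n_{2} - 2.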
NOTE: The formulas above assume equal variability in the two populations (i.e., the population variances are equal, or σ_{1}^{2} = σ_{2}^{2}). This means that the outcome is equally variable in each of the comparison populations. For analysis, we have samples from each of the comparison populations. If the sample variances are similar, then the assumption about variability in the populations is probably reasonable. As a guideline, if the ratio of the sample variances, s_{1}^{2}/s_{2}^{2}, is between 0.5 and 2 (i.e., if one variance is no more than double the other), then the formulas above are appropriate. If the ratio of the sample variances is greater than 2 or less than 0.5, then alternative formulas must be used to account for the heterogeneity in variances.
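The variance-ratio guideline can be sketched in R. The helper name `variance_ratio_check` and its arguments are illustrative assumptions, not part of any package; the standard deviations passed to it are the systolic blood pressure values for men and women from the Framingham table in this section.

```r
# Hypothetical helper (not from any package): applies the rule of thumb that
# the pooled formulas are reasonable when s1^2 / s2^2 lies between 0.5 and 2.
variance_ratio_check <- function(s1, s2, lower = 0.5, upper = 2) {
  ratio <- s1^2 / s2^2
  list(ratio = ratio, pooled_ok = (ratio >= lower) && (ratio <= upper))
}

# Systolic blood pressure standard deviations: men 17.5, women 20.1
sbp_check <- variance_ratio_check(17.5, 20.1)
print(round(sbp_check$ratio, 2))  # 0.76
print(sbp_check$pooled_ok)        # TRUE
```

The cutoffs are exposed as arguments only so that a stricter or looser guideline could be tried; the defaults are the 0.5 and 2 stated in the text.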
The test statistics include Sp, the pooled estimate of the common standard deviation (again assuming that the variances in the populations are similar). It is computed as the square root of the weighted average of the sample variances, with each sample weighted by its degrees of freedom:
Sp = √[ ((n_{1} - 1) s_{1}^{2} + (n_{2} - 1) s_{2}^{2}) / (n_{1} + n_{2} - 2) ]
Because we are assuming equal variances between groups, we pool the information on variability (sample variances) to generate an estimate of the variability in the population. (Note: Because Sp^{2} is a weighted average of the sample variances, Sp will always fall between s_{1} and s_{2}.)
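As a minimal sketch, the pooled standard deviation and the resulting test statistic can be computed from summary statistics alone. The function names `pooled_sd` and `two_sample_statistic` are hypothetical, and the numbers in the usage lines are made up purely for illustration.

```r
# Hypothetical helpers (names are illustrative, not from a package).

# Pooled standard deviation: square root of the df-weighted average of variances.
pooled_sd <- function(n1, s1, n2, s2) {
  sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
}

# Pooled two-sample test statistic (Z or t) from summary statistics.
two_sample_statistic <- function(n1, mean1, s1, n2, mean2, s2) {
  sp <- pooled_sd(n1, s1, n2, s2)
  se <- sp * sqrt(1 / n1 + 1 / n2)  # standard error of the difference in means
  list(sp = sp, se = se, statistic = (mean1 - mean2) / se, df = n1 + n2 - 2)
}

# Made-up summary statistics, for illustration only
res <- two_sample_statistic(n1 = 10, mean1 = 12, s1 = 3,
                            n2 = 10, mean2 = 10, s2 = 4)
print(res$sp)         # sqrt(12.5), which lies between 3 and 4
print(res$statistic)
print(res$df)         # 18
```

The same value is read as Z or as t depending on the sample sizes; only the reference distribution used for the critical value changes.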
Example:
Data measured on n=3,539 participants who attended the seventh examination of the Offspring in the Framingham Heart Study are shown below.

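| Characteristic | Men: n | Men: x̄ | Men: s | Women: n | Women: x̄ | Women: s |
|---|---|---|---|---|---|---|
| Systolic Blood Pressure | 1,623 | 128.2 | 17.5 | 1,911 | 126.5 | 20.1 |
| Diastolic Blood Pressure | 1,622 | 75.6 | 9.8 | 1,910 | 72.6 | 9.7 |
| Total Serum Cholesterol | 1,544 | 192.4 | 35.2 | 1,766 | 207.1 | 36.7 |
| Weight | 1,612 | 194.0 | 33.8 | 1,894 | 157.7 | 34.6 |
| Height | 1,545 | 68.9 | 2.7 | 1,781 | 63.4 | 2.5 |
| Body Mass Index | 1,545 | 28.8 | 4.6 | 1,781 | 27.6 | 5.9 |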
Suppose we now wish to assess whether there is a statistically significant difference in mean systolic blood pressures between men and women using a 5% level of significance.
 Step 1: Set up hypotheses and determine level of significance
H_{0}: μ_{1} = μ_{2} H_{1}: μ_{1} ≠ μ_{2} α=0.05
 Step 2: Select the appropriate test statistic.
Because both samples are large (n ≥ 30 in each group), we can use the Z test statistic as opposed to t. Note that statistical computing packages use t throughout. Before implementing the formula, we first check whether the assumption of equality of population variances is reasonable. The guideline suggests investigating the ratio of the sample variances, s_{1}^{2}/s_{2}^{2}. Suppose we call the men group 1 and the women group 2. Again, this is arbitrary; it only needs to be noted when interpreting the results. The ratio of the sample variances is 17.5^{2}/20.1^{2} = 0.76, which falls between 0.5 and 2, suggesting that the assumption of equality of population variances is reasonable. The appropriate test statistic is
Z = (x̄_{1} - x̄_{2}) / (Sp √(1/n_{1} + 1/n_{2})).
 Step 3: Set up decision rule.
This is a two-tailed test, using a Z statistic and a 5% level of significance. Reject H_{0} if Z < -1.960 or if Z > 1.960.
 Step 4: Compute the test statistic.
We now substitute the sample data into the formula for the test statistic identified in Step 2. Before substituting, we will first compute Sp, the pooled estimate of the common standard deviation.
Sp = √[ ((1,623 - 1)(17.5)^{2} + (1,911 - 1)(20.1)^{2}) / (1,623 + 1,911 - 2) ] = √359.12 ≈ 19.0
Notice that the pooled estimate of the common standard deviation, Sp, falls in between the standard deviations in the comparison groups (i.e., 17.5 and 20.1). Sp is slightly closer in value to the standard deviation in the women (20.1) as there were slightly more women in the sample. Recall that Sp^{2} is a weighted average of the variances in the comparison groups, weighted by the respective degrees of freedom (n - 1).
Now the test statistic:
Z = (128.2 - 126.5) / (19.0 √(1/1,623 + 1/1,911)) = 1.7 / 0.64 = 2.66
 Step 5: Conclusion.
We reject H_{0} because 2.66 > 1.960. We have statistically significant evidence at α=0.05 to show that there is a difference in mean systolic blood pressures between men and women. The p-value is p < 0.010.
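The five steps of this example can be reproduced from the summary statistics in base R. This is a sketch rather than the course's own code: the variable names are arbitrary, and `qnorm()` and `pnorm()` stand in for the printed Z table.

```r
# Summary statistics from the Framingham table (group 1 = men, group 2 = women)
n1 <- 1623; mean1 <- 128.2; s1 <- 17.5
n2 <- 1911; mean2 <- 126.5; s2 <- 20.1
alpha <- 0.05

# Pooled standard deviation and standard error of the difference in means
sp <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
se <- sp * sqrt(1 / n1 + 1 / n2)
z  <- (mean1 - mean2) / se

z_crit  <- qnorm(1 - alpha / 2)  # two-tailed critical value, about 1.96
p_value <- 2 * pnorm(-abs(z))    # two-tailed p-value
reject  <- abs(z) > z_crit

print(round(c(sp = sp, se = se, z = z), 2))  # sp 18.95, se 0.64, z 2.66
print(p_value < 0.01)                        # TRUE
print(reject)                                # TRUE
```

The unrounded pooled standard deviation is 18.95, which rounds to the 19.0 used in the hand calculation.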
Notice that there is a very small difference in the sample means (128.2 - 126.5 = 1.7 units), but this difference is beyond what would be expected by chance. Is this a clinically meaningful difference? The large sample size in this example is driving the statistical significance. A 95% confidence interval for the difference in mean systolic blood pressures is: 1.7 ± 1.25 or (0.45, 2.95).
You may be wondering how we calculated the above confidence interval. Recall that the standard form of a 95% confidence interval is:
point estimate ± 1.96 × SE(point estimate)
Here, the point estimate is the difference in means, x̄_{1} - x̄_{2} = 1.7, and its standard error is the denominator of the z-statistic we calculated above (0.64). Since 1.96 × 0.64 = 1.25, the 95% confidence interval for the difference of means in this case is 1.7 ± 1.25.
Keep in mind that the confidence interval provides an assessment of the magnitude of the difference between means whereas the test of hypothesis and p-value provide an assessment of the statistical significance of the difference.
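The confidence interval calculation can be wrapped in a small function. The name `diff_means_ci` and its `level` argument are hypothetical; the sketch assumes the large-sample case, so it uses a normal critical value from `qnorm()` together with the pooled standard error.

```r
# Hypothetical helper: confidence interval for mu1 - mu2 using the pooled
# standard error and a normal critical value (large-sample case).
diff_means_ci <- function(n1, mean1, s1, n2, mean2, s2, level = 0.95) {
  sp <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
  se <- sp * sqrt(1 / n1 + 1 / n2)
  margin <- qnorm(1 - (1 - level) / 2) * se
  diff <- mean1 - mean2
  c(lower = diff - margin, upper = diff + margin)
}

# Systolic blood pressure, men (group 1) versus women (group 2)
ci <- diff_means_ci(1623, 128.2, 17.5, 1911, 126.5, 20.1)
print(round(ci, 2))  # lower 0.45, upper 2.95
```

Because the interval excludes 0, it agrees with the two-tailed test at the 5% level, while also showing how small the estimated difference is.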
Example:
A new drug is proposed to lower total cholesterol. A randomized controlled trial is designed to evaluate the efficacy of the medication in lowering cholesterol. Thirty participants are enrolled in the trial and are randomly assigned to receive either the new drug or a placebo. The participants do not know which treatment they are assigned. Each participant is asked to take the assigned treatment for 6 weeks. At the end of 6 weeks, each patient's total cholesterol level is measured and the sample statistics are as follows.
| Treatment | Sample Size | Mean | Standard Deviation |
|---|---|---|---|
| New Drug | 15 | 195.9 | 28.7 |
| Placebo | 15 | 217.4 | 30.3 |
Is there statistical evidence of a reduction in mean total cholesterol in patients taking the new drug for 6 weeks as compared to participants taking placebo? We will run the test using the five-step approach.
 Step 1: Set up hypotheses and determine level of significance
H_{0}: μ_{1} = μ_{2} H_{1}: μ_{1} < μ_{2} α=0.05
 Step 2: Select the appropriate test statistic.
Because both samples are small (< 30), we use the t test statistic. Before implementing the formula, we first check whether the assumption of equality of population variances is reasonable. The ratio of the sample variances, s_{1}^{2}/s_{2}^{2} = 28.7^{2}/30.3^{2} = 0.90, which falls between 0.5 and 2, suggesting that the assumption of equality of population variances is reasonable. The appropriate test statistic is:
t = (x̄_{1} - x̄_{2}) / (Sp √(1/n_{1} + 1/n_{2})), with df = n_{1} + n_{2} - 2.
 Step 3: Set up decision rule.
This is a lower-tailed test, using a t statistic and a 5% level of significance. The appropriate critical value can be found in a t table. In order to determine the critical value of t we need degrees of freedom, df, defined as df = n_{1} + n_{2} - 2 = 15 + 15 - 2 = 28. The critical value for a lower-tailed test with df = 28 and α = 0.05 is -1.701 and the decision rule is: Reject H_{0} if t < -1.701.
 Step 4: Compute the test statistic.
We now substitute the sample data into the formula for the test statistic identified in Step 2. Before substituting, we will first compute Sp, the pooled estimate of the common standard deviation.
Sp = √[ ((15 - 1)(28.7)^{2} + (15 - 1)(30.3)^{2}) / (15 + 15 - 2) ] = √870.89 ≈ 29.5
Now the test statistic,
t = (195.9 - 217.4) / (29.5 √(1/15 + 1/15)) = -21.5 / 10.77 = -2.00
 Step 5: Conclusion.
We reject H_{0} because -2.00 < -1.701. We have statistically significant evidence at α=0.05 to show that the mean total cholesterol level is lower in patients taking the new drug for 6 weeks as compared to patients taking placebo, p < 0.05.
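For the small-sample case, the lower-tailed critical value and p-value can be obtained from `qt()` and `pt()` instead of a printed t table. The sketch below works directly from the summary statistics in the treatment table above; the variable names are arbitrary.

```r
# Summary statistics from the cholesterol trial table
n1 <- 15; mean1 <- 195.9; s1 <- 28.7  # new drug (group 1)
n2 <- 15; mean2 <- 217.4; s2 <- 30.3  # placebo (group 2)
alpha <- 0.05

df <- n1 + n2 - 2
sp <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df)  # pooled standard deviation
se <- sp * sqrt(1 / n1 + 1 / n2)
t_stat <- (mean1 - mean2) / se

t_crit  <- qt(alpha, df)   # lower-tail critical value (a negative number)
p_value <- pt(t_stat, df)  # lower-tail p-value, P(T <= t_stat)
reject  <- t_stat < t_crit

print(c(df = df, sp = round(sp, 1), t = round(t_stat, 2), crit = round(t_crit, 3)))
print(p_value)
print(reject)
```

For an upper-tailed test the corresponding calls would be `qt(1 - alpha, df)` and `pt(t_stat, df, lower.tail = FALSE)`.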
Performing a Two-sample T-test in R
Using the Framingham Heart Study example above, we will demonstrate how to perform a two-sample t-test in R. Recall that, regardless of sample size, statistical computing packages, like R, always perform two-sample t-tests and not two-sample z-tests. Assume that we have read in the data and that the systolic blood pressure data is stored in a variable named "SBP7", while the sex of each individual is stored in a variable named "SEX". Then, the R code to perform a two-sample t-test and its resulting output is:
> t.test(SBP7~SEX, alternative="two.sided", var.equal=T)
        Two Sample t-test
data:  SBP7 by SEX
t = 2.7551, df = 3532, p-value = 0.005898
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.507767 3.013994
sample estimates:
mean in group 1 mean in group 2
       128.2120        126.4511
Note that the t-statistic is a bit different from what we calculated by hand, but this difference is only due to roundoff error. Because we were performing a two-tailed test here, we used the option alternative="two.sided". If we were instead performing a test with an upper or lower alternative hypothesis, we would specify "greater" or "less", respectively, following the equals sign after alternative. If you find that the ratio of the sample variances in the two groups is outside the range of [0.5, 2.0], you can change the var.equal=T option to var.equal=F to accommodate the unequal variances between the two groups. From the output, we see that the 95% confidence interval excludes the null value of 0 (indicating no difference in mean SBP between men and women) and we also notice that the p-value is less than 0.05, so we reject the null hypothesis and conclude that there is a difference in mean SBP between men and women.
On average, men's SBP is higher than women's by 1.7 mm Hg.
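The options described above can be tried on simulated data. The data frame below is generated with `rnorm()` purely for illustration and is not the Framingham data; only the column names SBP7 and SEX are borrowed from the text. With `var.equal = FALSE`, `t.test()` performs Welch's t-test, which does not assume equal variances.

```r
set.seed(1)  # simulated data only; not the Framingham measurements
dat <- data.frame(
  SEX  = rep(c(1, 2), each = 30),
  SBP7 = c(rnorm(30, mean = 128, sd = 10), rnorm(30, mean = 126, sd = 25))
)

# Sample variance ratio; a value outside [0.5, 2] argues against pooling
vars  <- tapply(dat$SBP7, dat$SEX, var)
ratio <- as.numeric(vars[1] / vars[2])

# Pooled test (equal variances assumed) and Welch test (no such assumption)
pooled <- t.test(SBP7 ~ SEX, data = dat, alternative = "two.sided", var.equal = TRUE)
welch  <- t.test(SBP7 ~ SEX, data = dat, alternative = "two.sided", var.equal = FALSE)

# Upper-tailed alternative: mean of group 1 greater than mean of group 2
upper <- t.test(SBP7 ~ SEX, data = dat, alternative = "greater", var.equal = FALSE)

print(ratio)
print(pooled$parameter)  # df = 30 + 30 - 2 = 58
print(welch$parameter)   # Welch df, never larger than the pooled df
print(upper$p.value)
```

Spelling out `TRUE` and `FALSE` rather than `T` and `F` is a small defensive choice, since `T` and `F` are ordinary variables in R and can be reassigned.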
The video below illustrates how to perform a two-sample t-test using R. The data set for the illustrated exercise is LungCapData.txt.