# Confidence Intervals for Matched Samples, Continuous Outcome

The previous section dealt with confidence intervals for the difference in means between two independent groups. There is an alternative study design in which two comparison groups are dependent, matched or paired. Consider the following scenarios:

• A single sample of participants in which each participant is measured twice, once before and once after an intervention.
• A single sample of participants in which each participant is measured twice under two different experimental conditions (e.g., in a crossover trial).

A goal of these studies might be to compare the mean scores measured before and after the intervention, or to compare the mean scores obtained with the two conditions in a crossover study.

Yet another scenario is one in which matched samples are used. For example, we might be interested in the difference in an outcome between twins or between siblings.

Once again we have two samples, and the goal is to compare the two means. However, the samples are related, or dependent. In the first scenario, before and after measurements are taken on the same individual. In the last scenario, measures are taken on pairs of individuals from the same family. Because the samples are dependent, we cannot use the techniques from the previous section to compare means; we must use statistical techniques that account for the dependency. These techniques focus on difference scores (i.e., each individual's difference in measures before and after the intervention, or the difference in measures between twins or sibling pairs).

## The Unit of Analysis

This distinction between independent and dependent samples emphasizes the importance of appropriately identifying the unit of analysis, i.e., the independent entities in a study.

• In the one-sample and two-independent-samples applications, participants are the units of analysis.
• With two dependent samples, however, the pair is the unit of analysis, so n counts pairs and not measurements (the number of measurements is twice the number of units).

The parameter of interest is the mean difference, $\mu_d$. Again, the first step is to compute descriptive statistics. We compute the sample size (which in this case is the number of distinct participants or distinct pairs) and the mean and standard deviation of the difference scores, and we denote these summary statistics as $n$, $\bar{X}_d$ and $s_d$, respectively. The appropriate formula for the confidence interval for the mean difference depends on the sample size. The formulas are shown in Table 6.5 and are identical to those we presented for estimating the mean of a single sample, except that here we focus on difference scores.
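Here is a minimal sketch of this first step in Python. The function name `difference_summary` is hypothetical. It pairs the two measurement lists position by position, forms one difference per pair, and returns $n$, $\bar{X}_d$ and $s_d$. The sketch assumes both lists are in the same participant order. The sample data are the Framingham blood pressures used in the example later in this section.

```python
from statistics import mean, stdev


def difference_summary(first, second):
    """Summarise paired data through its difference scores.

    Position i of `first` and `second` must belong to the same participant
    (or matched pair). Differences are computed as second[i] - first[i].
    Returns (n, mean difference, standard deviation of the differences).
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for a, b in zip(first, second)]
    # n counts pairs (the units of analysis), not individual measurements.
    n = len(diffs)
    # statistics.stdev uses the n - 1 denominator (sample standard deviation).
    return n, mean(diffs), stdev(diffs)


# Systolic blood pressures at examinations 6 and 7 (Framingham example).
exam6 = [168, 111, 139, 127, 155, 115, 125, 123, 130, 137, 130, 129, 112, 141, 122]
exam7 = [141, 119, 122, 127, 125, 123, 113, 106, 131, 142, 131, 135, 119, 130, 121]
n, d_bar, s_d = difference_summary(exam6, exam7)
print(n, round(d_bar, 1), round(s_d, 1))  # 15 -5.3 12.8
```

The 30 measurements yield n = 15, because each participant contributes one difference score.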

### Computing the Confidence Intervals for $\mu_d$

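• If $n \ge 30$, use the Z table for the standard normal distribution:

$$\bar{X}_d \pm z \frac{s_d}{\sqrt{n}}$$

• If $n < 30$, use the t table with $df = n - 1$:

$$\bar{X}_d \pm t \frac{s_d}{\sqrt{n}}$$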

When samples are matched or paired, difference scores are computed for each participant or between the members of each matched pair. Here $n$ is the number of participants or pairs, $\bar{X}_d$ is the mean of the difference scores, and $s_d$ is the standard deviation of the difference scores.
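The two rules can be sketched in Python using only the standard library. Because the standard library has no t distribution, the hypothetical helpers `t_pdf`, `t_cdf` and `t_critical` compute the t critical value numerically. They integrate the t density with Simpson's rule and then search with bisection. In practice a t table or a statistics package would normally supply this value. `paired_ci` is also a hypothetical name, and it treats the boundary case as $n \ge 30$ → Z (an assumption).

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(confidence, df):
    """Two-sided critical value t with P(-t < T < t) = confidence, by bisection.

    Assumes the answer lies in [0, 100], which holds for the usual confidence
    levels unless df is tiny and the confidence level is extreme.
    """
    target = 0.5 + confidence / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def paired_ci(n, d_bar, s_d, confidence=0.95):
    """CI for the mean difference mu_d: Z if n >= 30, else t with df = n - 1."""
    if n >= 30:
        crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    else:
        crit = t_critical(confidence, n - 1)
    margin = crit * s_d / math.sqrt(n)
    return d_bar - margin, d_bar + margin


print(round(t_critical(0.95, 14), 3))         # 2.145
print(round(NormalDist().inv_cdf(0.975), 2))  # 1.96
```

For example, `paired_ci(15, -5.3, 12.8)` gives roughly (-12.39, 1.79), and `paired_ci(100, -12.7, 8.9)` gives roughly (-14.44, -10.96).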

Example:

In the Framingham Offspring Study, participants attend clinical examinations approximately every four years. Suppose we want to compare systolic blood pressures between examinations (i.e., changes over 4 years). The data below are systolic blood pressures measured at the sixth and seventh examinations in a subsample of n = 15 randomly selected participants. Since the data in the two samples (examinations 6 and 7) are matched, we compute a difference score for each participant, either by subtracting the examination 7 value from the examination 6 value or vice versa. In the table below, the blood pressure measured at examination 6 is subtracted from that measured at examination 7. Therefore, positive differences represent increases over time and negative differences represent decreases over time.

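| Subject # | Examination 6 | Examination 7 | Difference |
|---|---|---|---|
| 1 | 168 | 141 | -27 |
| 2 | 111 | 119 | 8 |
| 3 | 139 | 122 | -17 |
| 4 | 127 | 127 | 0 |
| 5 | 155 | 125 | -30 |
| 6 | 115 | 123 | 8 |
| 7 | 125 | 113 | -12 |
| 8 | 123 | 106 | -17 |
| 9 | 130 | 131 | 1 |
| 10 | 137 | 142 | 5 |
| 11 | 130 | 131 | 1 |
| 12 | 129 | 135 | 6 |
| 13 | 112 | 119 | 7 |
| 14 | 141 | 130 | -11 |
| 15 | 122 | 121 | -1 |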
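The direction of subtraction is a convention. The short Python check below recomputes the Difference column from the two examination columns. It also shows that reversing the direction only flips the sign of the mean difference, while the standard deviation of the differences stays the same. The variable names are illustrative.

```python
import math
from statistics import mean, stdev

# Systolic blood pressures from the table (same participant order).
exam6 = [168, 111, 139, 127, 155, 115, 125, 123, 130, 137, 130, 129, 112, 141, 122]
exam7 = [141, 119, 122, 127, 125, 123, 113, 106, 131, 142, 131, 135, 119, 130, 121]

# Examination 7 minus examination 6: the Difference column (positive = increase).
later_minus_earlier = [b - a for a, b in zip(exam6, exam7)]
# The opposite convention: examination 6 minus examination 7.
earlier_minus_later = [a - b for a, b in zip(exam6, exam7)]

print(later_minus_earlier)
print(round(mean(later_minus_earlier), 1), round(mean(earlier_minus_later), 1))  # -5.3 5.3
print(math.isclose(stdev(later_minus_earlier), stdev(earlier_minus_later)))     # True
```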

Notice that several participants' systolic blood pressures decreased over 4 years (e.g., participant #1's blood pressure decreased by 27 units, from 168 to 141), while others increased (e.g., participant #2's blood pressure increased by 8 units, from 111 to 119). We now estimate the mean difference in blood pressures over 4 years. This is similar to a one-sample problem with a continuous outcome, except that we are now using the difference scores. In this sample, n = 15, the mean difference score is -5.3, and the standard deviation of the difference scores is 12.8. The calculations are shown below.

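| Subject # | Difference | Difference - Mean Difference | (Difference - Mean Difference)² |
|---|---|---|---|
| 1 | -27 | -21.7 | 470.89 |
| 2 | 8 | 13.3 | 176.89 |
| 3 | -17 | -11.7 | 136.89 |
| 4 | 0 | 5.3 | 28.09 |
| 5 | -30 | -24.7 | 610.09 |
| 6 | 8 | 13.3 | 176.89 |
| 7 | -12 | -6.7 | 44.89 |
| 8 | -17 | -11.7 | 136.89 |
| 9 | 1 | 6.3 | 39.69 |
| 10 | 5 | 10.3 | 106.09 |
| 11 | 1 | 6.3 | 39.69 |
| 12 | 6 | 11.3 | 127.69 |
| 13 | 7 | 12.3 | 151.29 |
| 14 | -11 | -5.7 | 32.49 |
| 15 | -1 | 4.3 | 18.49 |
| ∑ | -79.0 | ≈ 0 | 2296.95 |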
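The table uses the mean rounded to -5.3, so its columns differ slightly from exact values. The Python sketch below reproduces the table's sums and compares them with the unrounded mean (-79/15). Deviations from the exact mean sum to exactly 0, and the resulting standard deviation still rounds to 12.8. The helper `sum_sq_dev` is a hypothetical name.

```python
import math

# Difference scores (examination 7 minus examination 6) from the table.
diffs = [-27, 8, -17, 0, -30, 8, -12, -17, 1, 5, 1, 6, 7, -11, -1]
n = len(diffs)


def sum_sq_dev(values, center):
    """Sum of squared deviations of the values from a chosen center."""
    return sum((v - center) ** 2 for v in values)


exact_mean = sum(diffs) / n          # -79 / 15 = -5.2666...
rounded_mean = round(exact_mean, 1)  # -5.3, as used in the table

print(round(sum(v - rounded_mean for v in diffs), 2))  # 0.5 (exactly 0 with the unrounded mean)
print(round(sum_sq_dev(diffs, rounded_mean), 2))       # 2296.95
print(round(sum_sq_dev(diffs, exact_mean), 2))         # 2296.93
s_d = math.sqrt(sum_sq_dev(diffs, exact_mean) / (n - 1))
print(round(s_d, 1))                                   # 12.8
```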

Therefore,

$$\bar{X}_d = \frac{\sum \text{Difference}}{n} = \frac{-79}{15} = -5.3$$

and

$$s_d = \sqrt{\frac{\sum (\text{Difference} - \bar{X}_d)^2}{n-1}} = \sqrt{\frac{2296.95}{15-1}} = \sqrt{164.07} = 12.8$$

We can now use these descriptive statistics to compute a 95% confidence interval for the mean difference in systolic blood pressures in the population. Because the sample size is small (n = 15), we use the formula that employs the t-statistic. The degrees of freedom are df = n - 1 = 14. From a table of t-scores, t = 2.145. We can now substitute the descriptive statistics on the difference scores and the t value for 95% confidence as follows:

$$\bar{X}_d \pm t \frac{s_d}{\sqrt{n}} = -5.3 \pm 2.145 \frac{12.8}{\sqrt{15}} = -5.3 \pm 7.1$$

So, the 95% confidence interval for the mean difference is (-12.4, 1.8).
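The arithmetic of this confidence interval can be checked step by step in Python. The sketch uses the rounded summary statistics and the table value t = 2.145 given in the text. It also checks whether the interval contains the null value 0.

```python
import math

n, d_bar, s_d = 15, -5.3, 12.8
t_crit = 2.145  # t for 95% confidence with df = n - 1 = 14, from a t table

std_error = s_d / math.sqrt(n)   # standard error of the mean difference
margin = t_crit * std_error      # margin of error
lower, upper = d_bar - margin, d_bar + margin

print(f"SE = {std_error:.2f}, margin = {margin:.2f}")  # SE = 3.31, margin = 7.09
print(f"95% CI: ({lower:.1f}, {upper:.1f})")          # 95% CI: (-12.4, 1.8)
print("includes 0:", lower <= 0 <= upper)              # includes 0: True
```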

Interpretation:

We are 95% confident that the mean difference in systolic blood pressures between examinations 6 and 7 (approximately 4 years apart) is between -12.4 and 1.8. The null (or no effect) value for the mean difference is zero. Because the 95% confidence interval for the mean difference includes zero, we conclude that there is no statistically significant difference in blood pressures over time.

## Crossover Trials

Crossover trials are a special type of randomized trial in which each subject receives both of the two treatments (e.g., an experimental treatment and a control treatment). Participants are usually randomly assigned to an order of treatment, receiving one treatment first and then the other. In many cases there is a "wash-out period" between the two treatments. Outcomes are measured after each treatment in each participant. [An example of a crossover trial with a wash-out period can be seen in a study by Pincus et al. in which the investigators compared responses to analgesics in patients with osteoarthritis of the knee or hip.] A major advantage of the crossover trial is that each participant acts as his or her own control; therefore, fewer participants are generally required to demonstrate an effect. When the outcome is continuous, the treatment effect in a crossover trial is assessed by applying the techniques described here to each participant's difference score.
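Here is a minimal sketch of the random assignment of treatment order, assuming a simple balanced allocation. It is not a description of any particular trial's procedure, and real trials often use blocked or stratified schemes. The participants are shuffled, half receive the experimental treatment first, and the rest receive the control first. The function name `assign_crossover_orders` and the treatment labels are hypothetical.

```python
import random


def assign_crossover_orders(participant_ids, seed=None):
    """Randomly allocate participants to one of two treatment orders.

    Hypothetical helper: after shuffling, the first half get
    experimental-then-control and the rest control-then-experimental.
    With an odd count, the extra participant gets control first.
    """
    rng = random.Random(seed)  # seeded generator for a reproducible allocation
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    orders = {}
    for position, pid in enumerate(ids):
        if position < half:
            orders[pid] = ("experimental", "control")
        else:
            orders[pid] = ("control", "experimental")
    return orders


orders = assign_crossover_orders(range(1, 11), seed=1)
for pid in sorted(orders):
    print(pid, " then ".join(orders[pid]))
```

Whatever the order, each participant still contributes a single difference score (for example, experimental minus control) to the analysis.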

Example:

A crossover trial is conducted to evaluate the effectiveness of a new drug designed to reduce symptoms of depression in adults over 65 years of age following a stroke. Symptoms of depression are measured on a scale of 0-100, with higher scores indicating more frequent and severe symptoms of depression. Patients who had suffered a stroke were eligible for the trial. Each patient received both the new drug and a placebo. Patients were blind to the treatment assignment, and the order of treatments (i.e., placebo and then new drug, or new drug and then placebo) was randomly assigned. After each treatment, depressive symptoms were measured in each patient. The difference in depressive symptoms was computed for each patient by subtracting the depressive symptom score after taking the placebo from the depressive symptom score after taking the new drug. A total of 100 participants completed the trial, and the data are summarized below.

| | n | Mean Difference | Std. Dev. Difference |
|---|---|---|---|
| Depressive Symptoms After New Drug - Symptoms After Placebo | 100 | -12.7 | 8.9 |

The mean difference in the sample is -12.7, meaning that on average patients scored 12.7 points lower on the depressive symptoms scale after taking the new drug than after taking the placebo (i.e., they improved by 12.7 points on average). What would be the 95% confidence interval for the mean difference in the population? Since the sample size is large, we can use the formula that employs the Z-score. Substituting the current values, we get

$$\bar{X}_d \pm z \frac{s_d}{\sqrt{n}} = -12.7 \pm 1.96 \frac{8.9}{\sqrt{100}} = -12.7 \pm 1.74$$

So, the 95% confidence interval is (-14.4, -11.0).

Interpretation: We are 95% confident that the mean improvement in depressive symptoms after taking the new drug as compared to placebo is between 11.0 and 14.4 units (or, alternatively, that depressive symptom scores are 11.0 to 14.4 units lower after taking the new drug than after taking the placebo). We computed the differences by subtracting the scores after taking the placebo from the scores after taking the new drug, and higher scores indicate more severe depressive symptoms. Therefore, negative differences reflect improvement (i.e., lower depressive symptom scores after taking the new drug as compared to placebo). Because the 95% confidence interval for the mean difference does not include zero, we can conclude that there is a statistically significant difference (in this case, a significant improvement) in depressive symptom scores after taking the new drug as compared to placebo.
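The Z-based calculation for the crossover example can be reproduced from its summary statistics (n = 100, mean difference -12.7, standard deviation 8.9). The Python sketch below uses the standard library's `statistics.NormalDist` for the 97.5th percentile of the standard normal distribution, approximately 1.96. It computes -12.7 ± 1.96 × 8.9/√100 = -12.7 ± 1.74 and then checks whether zero lies outside the interval.

```python
import math
from statistics import NormalDist

n, d_bar, s_d = 100, -12.7, 8.9   # new drug minus placebo
z = NormalDist().inv_cdf(0.975)   # two-sided 95% critical value, about 1.96

margin = z * s_d / math.sqrt(n)   # margin of error
lower, upper = d_bar - margin, d_bar + margin

print(f"z = {z:.2f}, margin = {margin:.2f}")     # z = 1.96, margin = 1.74
print(f"95% CI: ({lower:.1f}, {upper:.1f})")     # 95% CI: (-14.4, -11.0)
print("excludes 0:", not (lower <= 0 <= upper))  # excludes 0: True
```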