1
00:00:00,000 --> 00:00:00,166
2
00:00:00,166 --> 00:00:11,066
Let's look at a test comparing two independent
samples and a continuous outcome.
3
00:00:11,066 --> 00:00:21,966
For example, this could apply to a randomized
clinical trial in which subjects are randomly
assigned to receive a new drug or a placebo, or a new
4
00:00:21,966 --> 00:00:28,499
drug or the currently used drug, and then you
record the sample size, sample mean, and
sample standard deviation in each group.
5
00:00:28,500 --> 00:00:35,733
You might also see this situation in a cohort study
comparing two groups, e.g., men versus women,
6
00:00:35,733 --> 00:00:42,966
or whatever two exposure groups are being
compared within the cohort.
7
00:00:42,966 --> 00:00:53,466
We have a continuous outcome, and the parameter
of interest is the difference in population means, mu1
minus mu2.
8
00:00:53,466 --> 00:01:06,032
The null hypothesis is that there is no difference,
and this could be stated as mu1-mu2=0 or as
mu1=mu2.
9
00:01:06,033 --> 00:01:14,433
10
00:01:14,433 --> 00:01:22,833
The alternative hypothesis can be a) that the mean
in group 1 is smaller, b) that it is larger, or c) that the
means are different without specifying a direction.
11
00:01:22,833 --> 00:01:35,799
Most of these tests are done as two-sided tests,
where the alternative is simply that the two means differ.
12
00:01:35,800 --> 00:01:38,966
There are two test statistics,
13
00:01:38,966 --> 00:01:51,199
a Z statistic for large samples and a t-statistic for
small samples. For the large-sample test, both
sample sizes have to be 30 or greater.
14
00:01:51,200 --> 00:01:59,400
The t-statistic is used if either sample size is less than
30.
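The sample-size rule just described can be written as a tiny helper. This is only a sketch of the lecture's rule of thumb. The function name `choose_test_statistic` is hypothetical and does not come from any library.

```python
def choose_test_statistic(n1: int, n2: int) -> str:
    """Return "z" if both samples have at least 30 subjects, otherwise "t".

    This mirrors the lecture's rule of thumb. It is a convention, not a
    hard statistical law.
    """
    if n1 >= 30 and n2 >= 30:
        return "z"
    return "t"


print(choose_test_statistic(15, 15))  # t  (the cholesterol trial below)
print(choose_test_statistic(40, 35))  # z
```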
15
00:01:59,400 --> 00:02:10,433
Both test statistics use the pooled estimate of the
common standard deviation, Sp.
16
00:02:10,433 --> 00:02:17,833
It is the square root of a weighted average of the two sample
variances, each weighted by its degrees of freedom, n minus 1.
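As a rough sketch, here is the pooled estimate Sp together with the variance-ratio check used later in the cholesterol example. The inputs are the sample sizes (15 per group) and standard deviations (28.7 and 30.3) quoted in this lecture. The function names are illustrative only.

```python
import math


def pooled_sd(n1: int, s1: float, n2: int, s2: float) -> float:
    """Pooled estimate of the common standard deviation, Sp.

    Each sample variance is weighted by its degrees of freedom (n - 1),
    the sum is divided by n1 + n2 - 2, and the square root is taken.
    """
    pooled_variance = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(pooled_variance)


def variance_ratio(s1: float, s2: float) -> float:
    """Ratio of the sample variances, used to check the equal-variance assumption."""
    return s1**2 / s2**2


# Sample sizes and standard deviations from the cholesterol example
n1, s1 = 15, 28.7  # new drug
n2, s2 = 15, 30.3  # placebo

ratio = variance_ratio(s1, s2)
print(round(ratio, 2))                      # 0.9
print(0.5 <= ratio <= 2)                    # True: assumption looks reasonable
print(round(pooled_sd(n1, s1, n2, s2), 1))  # 29.5
```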
17
00:02:17,833 --> 00:02:27,566
Here is an example of a clinical trial where we want
to assess the effectiveness of a new drug for
lowering cholesterol. Patients are randomly
18
00:02:27,566 --> 00:02:34,066
assigned to receive either the new drug or a
placebo, and we measure cholesterol after 6
weeks of treatment.
19
00:02:34,066 --> 00:02:41,332
We are specifically asking if there is evidence of a
reduction in cholesterol.
20
00:02:41,333 --> 00:02:48,599
It is important to set up the hypotheses before
looking at the data, based on the question being
asked. Here we ask if the new drug reduces
21
00:02:48,600 --> 00:03:00,666
cholesterol.
22
00:03:00,666 --> 00:03:09,699
Here is the sample data with 15 subjects in each
group.
23
00:03:09,700 --> 00:03:13,566
24
00:03:13,566 --> 00:03:21,332
We will first set up the hypotheses, and the null
hypothesis is that there is no difference in means.
The alternative is that the mean cholesterol in the
25
00:03:21,333 --> 00:03:29,433
treated group is less than that of the placebo
group. The new drug is group 1, and the placebo
group is group 2.
26
00:03:29,433 --> 00:03:37,566
Because both samples are smaller than 30, we will use the
t-statistic.
27
00:03:37,566 --> 00:03:45,332
We need a critical value from the table of
t-statistics, with n1 + n2 - 2 = 28 degrees of freedom. So, we find our alpha level, 0.05, in the
one-sided alpha row at the top of the table and
28
00:03:45,333 --> 00:03:53,099
then read down to 28 degrees of freedom; the
critical value is 1.701.
29
00:03:53,100 --> 00:04:00,200
Since this is a lower-tailed test, we reject H0 if the
t-statistic is less than minus 1.701.
30
00:04:00,200 --> 00:04:00,300
31
00:04:00,300 --> 00:04:09,866
The test statistics that we are using for this
procedure assume that the population variances
are equal. To check that assumption, we look at the
32
00:04:09,866 --> 00:04:19,432
ratio of the sample variances, in this case 28.7
squared over 30.3 squared, which is 0.90.
33
00:04:19,433 --> 00:04:31,133
If the ratio of the sample variances is between 0.5
and 2, the assumption is reasonable, so we
are fine in this case.
34
00:04:31,133 --> 00:04:38,166
We compute Sp as the square root of the following: (first
sample size minus 1) times its variance plus (second sample
size minus 1) times its variance, all divided
35
00:04:38,166 --> 00:04:45,199
by the sum of the sample sizes minus 2.
Remember we were given standard deviations, so
we need to square them to get the variances.
36
00:04:45,200 --> 00:04:59,000
Here we get a pooled estimate of the standard
deviation of 29.5, which lies between 28.7
and 30.3; with equal sample sizes, Sp squared is simply the average of the two variances.
37
00:04:59,000 --> 00:05:04,700
Next we calculate the t-statistic, which is the
difference in sample means divided by Sp times the
square root of the sum of the reciprocals of the
38
00:05:04,700 --> 00:05:10,433
sample sizes.
39
00:05:10,433 --> 00:05:21,666
Here, t equals minus 2.92, so we reject H0,
because that falls below our critical value of
negative 1.701. We have statistically significant
40
00:05:21,666 --> 00:05:29,166
evidence at alpha equal to 0.05 that the mean
cholesterol is lower in patients treated with the new
drug compared to placebo.
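The t-statistic and the lower-tailed decision rule can be sketched like this. The critical value 1.701 is the standard t-table entry for 28 degrees of freedom at one-sided alpha = 0.05. The slide's group means are not in this transcript, so the example call uses hypothetical means (200 and 220) purely for illustration. The check with t = -2.92 uses the value reported above.

```python
import math


def pooled_t_statistic(mean1: float, mean2: float, sp: float, n1: int, n2: int) -> float:
    """Two-sample t statistic assuming equal population variances."""
    standard_error = sp * math.sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / standard_error


def reject_lower_tailed(t_stat: float, critical_value: float) -> bool:
    """Lower-tailed decision rule: reject H0 if t < -critical_value."""
    return t_stat < -critical_value


CRITICAL_T_DF28_ONE_SIDED_05 = 1.701  # from a t-table: df = 28, one-sided alpha = 0.05

# Hypothetical means (NOT the slide's values), with Sp = 29.5 and 15 per group
t_demo = pooled_t_statistic(200.0, 220.0, 29.5, 15, 15)
print(round(t_demo, 2))  # -1.86

# The value reported in the lecture
print(reject_lower_tailed(-2.92, CRITICAL_T_DF28_ONE_SIDED_05))  # True
```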
41
00:05:29,166 --> 00:05:42,066
We can use the table of t-statistics to approximate
the p-value, which is the smallest level of
significance that would still allow us to reject H0.
42
00:05:42,066 --> 00:05:50,666
So, from the table, I could set alpha as low as
0.005 for a one-sided test and still reject H0, so p is less than 0.005.
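The table only brackets the p-value. To get a closer figure, you can integrate the t density numerically. The sketch below uses only the standard library: it applies Simpson's rule to the Student t density and uses the symmetry of the distribution about 0. This is a swapped-in numerical method, not what the lecture does. In practice, a statistics library's t distribution function would serve the same purpose.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0.

    `steps` must be even for Simpson's rule.
    """
    b = abs(x)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


# Lower-tailed test from the cholesterol example: t = -2.92 with 28 df
p_one_sided = t_cdf(-2.92, 28)
print(f"one-sided p = {p_one_sided:.4f}")
print(0.0025 < p_one_sided < 0.005)  # True: matches the table bracket
```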
43
00:05:50,666 --> 00:06:02,866
Here's a different example of how to do this in
Excel. The data for each group are entered in
columns.
44
00:06:02,866 --> 00:06:10,999
45
00:06:11,000 --> 00:06:13,666
46
00:06:13,666 --> 00:06:19,366
Then, if you go to the Tools menu and choose Data
Analysis (from the Analysis ToolPak add-in),
47
00:06:19,366 --> 00:06:27,866
there is an analysis tool called "t-Test: Two-Sample
Assuming Equal Variances." I select that,
48
00:06:27,866 --> 00:06:39,066
and a dialog box opens asking where the
data for group 1 are, and I specify A1 through A10,
and where the data for group 2 are, and I specify
49
00:06:39,066 --> 00:06:47,499
B1 through B10, and because I included the labels in row 1, I
check the "Labels" box.
50
00:06:47,500 --> 00:06:55,933
Excel defaults to an alpha level of 0.05, so I will
leave that,
51
00:06:55,933 --> 00:07:09,466
and then it asks where I want to put the output. I will
place it on this same sheet with the top left corner
of the output at cell D1.
52
00:07:09,466 --> 00:07:13,066
When I click "OK", this is the result that I get.
53
00:07:13,066 --> 00:07:20,032
Excel gives me the means, variances, and
sample sizes for both groups.
54
00:07:20,033 --> 00:07:26,999
It also calculates the pooled variance, Sp squared, labeled "Pooled
Variance".
55
00:07:27,000 --> 00:07:30,166
It gives me the t-statistic
56
00:07:30,166 --> 00:07:39,566
which is negative 1.84 for this comparison. And
then it gives me a p-value for a one-tailed test and
a p-value for a two-tailed test.
57
00:07:39,566 --> 00:07:46,699
So if my H1 were mu1 not equal to mu2, a
two-tailed alternative,
58
00:07:46,700 --> 00:07:53,866
I would use the two-tailed p-value of 0.08, and here I would not reject H0, because I don't have sufficiently strong evidence that the groups are different
59
00:07:53,866 --> 00:08:01,932
at an alpha level of 0.05, since 0.08 is greater than 0.05.
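For readers working outside Excel, this sketch computes the same summary quantities the equal-variance t-test tool reports: means, sample variances, sizes, pooled variance, degrees of freedom, and the t statistic. The Excel worksheet's data are not given in this transcript, so the two lists below are hypothetical values chosen to be easy to check by hand. The p-values would additionally require a t distribution function. For example, SciPy's `scipy.stats.ttest_ind(..., equal_var=True)` returns both the statistic and a p-value.

```python
import math
from statistics import mean, variance


def equal_variance_t_test(group1, group2):
    """Summary quantities comparable to Excel's equal-variance t-test output."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = variance(group1), variance(group2)  # sample variances (divide by n - 1)
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * v1 + (n2 - 1) * v2) / df  # Sp squared
    se = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    t_stat = (mean(group1) - mean(group2)) / se
    return {
        "mean": (mean(group1), mean(group2)),
        "variance": (v1, v2),
        "n": (n1, n2),
        "pooled_variance": pooled_variance,
        "df": df,
        "t_stat": t_stat,
    }


# Hypothetical data, not the worksheet shown in the video
result = equal_variance_t_test([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(result["pooled_variance"], result["df"], result["t_stat"])
```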