Let's look at a test comparing two independent samples with a continuous outcome.

For example, this could apply to a randomized clinical trial in which subjects are randomly assigned to get a new drug or a placebo, or a new drug or the currently used drug, and then you measure the sample size, sample mean, and sample standard deviation in each group. You might also see this situation in a cohort study comparing two groups, e.g., men versus women, or whatever two exposure groups are being compared within the cohort.

We have a continuous outcome, and the parameter of interest is the difference in population means, mu1 - mu2. The null hypothesis is that there is no difference, and this could be stated as mu1 - mu2 = 0 or as mu1 = mu2. The alternative hypothesis can be a) that the mean in group 1 is smaller (mu1 < mu2), b) that it is larger (mu1 > mu2), or c) that the means are different without specifying a direction (mu1 not equal to mu2). Most of these tests are done as two-sided tests, where the alternative is that the two means are different.

There are two test statistics: a Z statistic for large samples and a t statistic for small samples. For the large-sample test, both samples have to be 30 or greater. The t statistic is used if either sample is less than 30.

Both test statistics use the pooled estimate of the common standard deviation, Sp. Its square, the pooled variance, is a weighted average of the two sample variances, with each group weighted by its degrees of freedom (its sample size minus 1).
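The pooled standard deviation and the test statistic described above can be written as a short Python sketch. The function names `pooled_sd` and `two_sample_statistic` are hypothetical, chosen for illustration. The statistic is the difference in sample means divided by Sp times the square root of 1/n1 + 1/n2, as in the worked example that follows; the same ratio is compared with the normal distribution (Z) when both samples have at least 30 observations and with the t distribution otherwise.

```python
import math


def pooled_sd(n1: int, s1: float, n2: int, s2: float) -> float:
    """Pooled estimate Sp of the common standard deviation.

    The pooled variance Sp**2 is a weighted average of the two sample
    variances, each weighted by its degrees of freedom (n - 1).
    """
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var)


def two_sample_statistic(mean1: float, mean2: float,
                         n1: int, n2: int, sp: float) -> tuple[str, float]:
    """Return ("Z", value) or ("t", value) for H0: mu1 = mu2.

    The ratio is the same in both cases; it is labeled Z (compare with the
    normal distribution) only when both samples have 30 or more subjects,
    and t (compare with t on n1 + n2 - 2 degrees of freedom) otherwise.
    """
    value = (mean1 - mean2) / (sp * math.sqrt(1 / n1 + 1 / n2))
    kind = "Z" if min(n1, n2) >= 30 else "t"
    return kind, value


# Sample sizes and standard deviations from the cholesterol example.
sp = pooled_sd(15, 28.7, 15, 30.3)
print(f"Sp = {sp:.2f}")  # Sp = 29.51
```

With the standard deviations from the cholesterol example (28.7 and 30.3, with 15 subjects per group), `pooled_sd` returns about 29.51, which the lecture rounds to 29.5.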
Here is an example of a clinical trial where we want to assess the effectiveness of a new drug for lowering cholesterol. Patients are randomly assigned to receive either the new drug or a placebo, and we measure cholesterol after 6 weeks of treatment. We are specifically asking whether there is evidence of a reduction in cholesterol. It is important to set up the hypotheses before looking at the data, based on the question being asked; here, the question is whether the new drug reduces cholesterol.

Here are the sample data, with 15 subjects in each group.

We first set up the hypotheses. The null hypothesis is that there is no difference in means. The alternative is that the mean cholesterol in the treated group is less than that in the placebo group. The new drug is group 1 and the placebo group is group 2, so H0: mu1 = mu2 and H1: mu1 < mu2.

Because the samples are small, we will use the t statistic.

We need a critical value from the table of t statistics. We find our alpha level, 0.05, in the one-sided alpha row at the top of the table and then read down to 28 degrees of freedom (n1 + n2 - 2 = 15 + 15 - 2). The critical value is 1.701. (The value 2.048, also in the 28-degrees-of-freedom row, is the critical value for a two-sided alpha of 0.05, that is, a one-sided alpha of 0.025.) Since this is a lower-tailed test, we reject H0 if the t statistic is less than -1.701.

The test statistics that we are using for this procedure assume that the population variances are equal. To check that assumption, we look at the ratio of the sample variances, in this case 28.7 squared over 30.3 squared, which is 0.90.
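The table lookup and the variance-ratio check can be reproduced in a few lines of standard-library Python. This is a minimal sketch: because the Python standard library has no t distribution, the hypothetical helpers `t_pdf`, `t_cdf`, and `t_ppf` approximate it by integrating the t density numerically with Simpson's rule. That is accurate to several decimal places for values like these but is no substitute for a statistics package. It shows that with 28 degrees of freedom a one-sided alpha of 0.05 gives a lower-tail critical value of about -1.701, while 2.048 is the cutoff for a two-sided alpha of 0.05.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x) by symmetry plus Simpson's rule on [0, |x|].

    Numerical approximation intended for moderate |x| (say, below 50).
    """
    if x < 0:
        return 1.0 - t_cdf(-x, df, steps)
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_ppf(p: float, df: int) -> float:
    """Quantile (inverse CDF) of the t distribution, found by bisection."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


df = 15 + 15 - 2                   # 28 degrees of freedom
lower_crit = t_ppf(0.05, df)       # one-sided alpha = 0.05, lower tail
two_sided_crit = t_ppf(0.975, df)  # two-sided alpha = 0.05
print(f"one-sided 0.05 critical value: {lower_crit:.3f}")       # -1.701
print(f"two-sided 0.05 critical value: +/-{two_sided_crit:.3f}")  # +/-2.048

# Equal-variance check: ratio of the two sample variances.
ratio = 28.7**2 / 30.3**2
print(f"variance ratio = {ratio:.2f}, within 0.5 to 2: {0.5 <= ratio <= 2}")
```

The 0.5 to 2 range in the last line is the rule of thumb for the variance ratio used in this lecture.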
If the ratio of the sample variances is between 0.5 and 2, the assumption is reasonable, so we are fine in this case.

Next we compute Sp: take (the first sample size minus 1) times its variance, plus (the second sample size minus 1) times its variance, divide that sum by the total of the two sample sizes minus 2, and take the square root. Remember that we were given standard deviations, so we need to square them to get the variances. Here we get a pooled estimate of the standard deviation of 29.5. Because the sample sizes are equal, the pooled variance is simply the average of the two sample variances, so Sp falls between 28.7 and 30.3, very close to their midpoint.

Next we calculate the t statistic, which is the difference in sample means divided by Sp times the square root of the sum of the reciprocals of the sample sizes.

Here, t equals -2.92. That falls below our critical value, so we reject H0. We have statistically significant evidence at alpha = 0.05 that the mean cholesterol is lower in patients treated with the new drug than in patients given the placebo.

We can also use the table of t statistics to approximate the p-value, which is the smallest level of significance at which we would still reject H0. From the table, I could choose an alpha as small as 0.005 for a one-sided test and still reject H0, so p < 0.005.

Here's a different example, this time done in Excel. The data for each group are entered in columns. Then, if you go to the Tools menu and choose Data Analysis (provided by the Analysis ToolPak add-in), there is an analysis tool called "t-Test: Two-Sample Assuming Equal Variances."
I select that tool, and a dialog box opens asking where the data for group 1 are, and I specify A1 through A10, and where the data for group 2 are, and I specify B1 through B10. Because I included the column labels in row 1, I check the "Labels" box. Excel defaults to an alpha level of 0.05, so I will leave that, and then it asks where I want to put the output. I will place it on this same sheet, with the top-left corner of the output at cell D1.

When I click "OK", this is the result that I get. Excel gives me the means, variances, and sample sizes (labeled "Observations") for both groups. It also reports the pooled variance, labeled "Pooled Variance"; this is Sp squared, so its square root is Sp. It gives me the t statistic, which is -1.84 for this comparison, and then it gives me a p-value for a one-tailed test and a p-value for a two-tailed test.

So if my H1 was mu1 not equal to mu2, a two-tailed alternative, I would use the two-tailed p-value of 0.08. Here I would not reject H0, because I don't have sufficiently strong evidence at alpha = 0.05 that the groups are different.
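Excel's summary can be mirrored with a short standard-library Python sketch, which also makes explicit that the "Pooled Variance" line is Sp squared. The function `equal_variance_t_test` and the data are hypothetical (the lecture's spreadsheet values are not reproduced here), and the t distribution is approximated numerically because the Python standard library does not provide one; in practice a statistics package such as SciPy (`scipy.stats.ttest_ind` with `equal_var=True`) would normally be used instead. The sketch reports the one-tailed p-value as the tail area beyond |t| and doubles it for the two-tailed p-value, matching the relationship between the two p-values in Excel's output.

```python
import math
import statistics


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x) by symmetry plus Simpson's rule; meant for moderate |x|."""
    if x < 0:
        return 1.0 - t_cdf(-x, df, steps)
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def equal_variance_t_test(group1: list[float], group2: list[float]) -> dict:
    """Pooled-variance two-sample t test, laid out like Excel's summary."""
    n1, n2 = len(group1), len(group2)
    # statistics.variance uses the n - 1 denominator (sample variance).
    var1, var2 = statistics.variance(group1), statistics.variance(group2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df  # this is Sp**2
    t_stat = (statistics.mean(group1) - statistics.mean(group2)) / math.sqrt(
        pooled_var * (1 / n1 + 1 / n2))
    # Tail area beyond |t|. For a one-sided H1 this is the p-value only when
    # the observed difference points in the hypothesized direction.
    p_one = t_cdf(-abs(t_stat), df)
    return {"means": (statistics.mean(group1), statistics.mean(group2)),
            "variances": (var1, var2), "observations": (n1, n2),
            "pooled_variance": pooled_var, "df": df, "t_stat": t_stat,
            "p_one_tail": p_one, "p_two_tail": 2 * p_one}


# Hypothetical data for illustration; not the values from the lecture's sheet.
group_a = [12.1, 9.8, 11.4, 10.2, 12.7]
group_b = [13.0, 12.2, 14.1, 11.9, 13.5]
for key, value in equal_variance_t_test(group_a, group_b).items():
    print(key, value)

# One-sided p-value for the cholesterol example (t = -2.92, df = 28).
print(f"cholesterol example p = {t_cdf(-2.92, 28):.4f}")
```

The last line gives a more exact one-sided p-value for the cholesterol example's t of -2.92 with 28 degrees of freedom; it falls between 0.001 and 0.005, consistent with the table-based conclusion that p < 0.005.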