2.6 Nonparametric statistics for comparing medians of non-normal outcomes


2.6.1 Wilcoxon rank sum test for independent samples

The wilcox.test( ) function performs the Wilcoxon rank sum test (for two independent samples, with the 'paired=FALSE' option) and the Wilcoxon signed rank test (for paired samples, with the 'paired=TRUE' option). When the samples contain fewer than 50 values and there are no ties, R calculates an exact p-value; otherwise R uses a normal approximation with a continuity correction to calculate the p-value.
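The choice between the exact and the approximate p-value can also be made explicitly through the 'exact' and 'correct' arguments of wilcox.test( ). The following is a minimal sketch; the vectors x and y are hypothetical values made up for illustration, not data from this handout.

```r
# Hypothetical data: two small independent samples with no tied values
x <- c(1.8, 2.4, 3.1, 4.7, 5.2, 6.9)
y <- c(0.9, 1.5, 2.0, 2.7, 3.5)

# Default behaviour: small samples and no ties, so the p-value is exact
res.default <- wilcox.test(x, y, paired = FALSE)

# Explicitly request the normal approximation with continuity correction
res.approx <- wilcox.test(x, y, paired = FALSE, exact = FALSE, correct = TRUE)

# The test statistic W is the same; only the p-value calculation differs
res.default$statistic
res.default$p.value
res.approx$p.value
res.approx$method
```

With small samples the two p-values can differ noticeably, so it is worth reporting which method was used.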

To perform a Wilcoxon rank sum test, the data from the two independent groups must be held in two separate data vectors. In this example, we want to compare lactate levels for subjects in Group=1 (controls) vs. Group=2 (sga infants); the original data frame contains data on subjects from both study groups, with the Group variable indicating group membership. The following commands create separate lactate data vectors for the two study groups (see Section 7 for the subset command; I printed the two data vectors as a check):

> lactate.sga <- subset(Lactate,Group==2)

> lactate.controls <- subset(Lactate,Group==1)

> lactate.sga

[1] 5.79 4.60 4.20 1.65 2.38 5.67 12.60 3.40 7.57 2.48 4.36

> lactate.controls

[1] 3.18 2.52 1.40 2.26 1.61
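For readers who want to reproduce this step without the original data set, here is a self-contained sketch. The data frame name lactate.df is an assumption made for illustration (the handout does not name its data frame); the values are the ones printed above.

```r
# Hypothetical data frame rebuilt from the values printed above;
# the name 'lactate.df' is made up for this sketch
lactate.df <- data.frame(
  Lactate = c(5.79, 4.60, 4.20, 1.65, 2.38, 5.67, 12.60, 3.40, 7.57, 2.48, 4.36,
              3.18, 2.52, 1.40, 2.26, 1.61),
  Group   = c(rep(2, 11), rep(1, 5))
)

# with() makes the columns visible by name, so the subset() calls
# look the same as the commands shown above
lactate.sga      <- with(lactate.df, subset(Lactate, Group == 2))
lactate.controls <- with(lactate.df, subset(Lactate, Group == 1))

lactate.sga
lactate.controls
```

Printing the two vectors, as above, is a quick check that each group has the expected number of subjects (11 and 5 here).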

The following performs the Wilcoxon rank sum test. Note that the wilcox.test( ) function does not provide any descriptive statistics, and so the summary( ) function was used to find the median and quartiles (and hence the interquartile range) for the two groups.

> wilcox.test(lactate.sga,lactate.controls,paired=FALSE)

Wilcoxon rank sum test

data: lactate.sga and lactate.controls

W = 48, p-value = 0.01923

alternative hypothesis: true location shift is not equal to 0

> summary(lactate.sga)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.

  1.650   2.940   4.360   4.973   5.730  12.600

> summary(lactate.controls)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.

  1.400   1.610   2.260   2.194   2.520   3.180
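To see where the W statistic comes from, it can be recomputed by hand: with no ties, W is the number of (sga, control) pairs in which the sga value is the larger one. The sketch below re-enters the two vectors printed above so that it runs on its own; the names W.by.hand and W.from.test are introduced only for this illustration.

```r
# Values copied from the printed vectors above
lactate.sga      <- c(5.79, 4.60, 4.20, 1.65, 2.38, 5.67, 12.60, 3.40, 7.57, 2.48, 4.36)
lactate.controls <- c(3.18, 2.52, 1.40, 2.26, 1.61)

# Compare every sga value with every control value (11 x 5 = 55 pairs)
# and count the pairs where the sga value is larger; there are no ties here
W.by.hand <- sum(outer(lactate.sga, lactate.controls, ">"))

# The statistic reported by wilcox.test( )
W.from.test <- unname(wilcox.test(lactate.sga, lactate.controls, paired = FALSE)$statistic)

W.by.hand     # 48
W.from.test   # 48

# Quartiles for reporting the median and interquartile range
quantile(lactate.sga, probs = c(0.25, 0.5, 0.75))
quantile(lactate.controls, probs = c(0.25, 0.5, 0.75))
```

A value of W = 48 out of 55 possible pairs means that in most pairs the sga infant had the higher lactate level, which agrees with the higher median in that group.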

Another way to compare the sga and control infants is to select each group's values inside the call itself, using logical indexing with square brackets (the R counterpart of a 'select if') rather than the subset command. This avoids creating separate copies of the data:

> wilcox.test(Lactate[Group==2],Lactate[Group==1],paired=FALSE)
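A further option, sketched here as an alternative and not as the approach used in this handout, is the formula interface of wilcox.test( ), which takes the outcome and the grouping variable directly from a data frame. The data frame name lactate.df is hypothetical; the values are those printed earlier in this section.

```r
# Hypothetical data frame rebuilt from the values printed earlier;
# the name 'lactate.df' is made up for this sketch
lactate.df <- data.frame(
  Lactate = c(5.79, 4.60, 4.20, 1.65, 2.38, 5.67, 12.60, 3.40, 7.57, 2.48, 4.36,
              3.18, 2.52, 1.40, 2.26, 1.61),
  Group   = c(rep(2, 11), rep(1, 5))
)

# Formula interface: outcome ~ grouping variable (must have exactly two groups)
res.formula <- wilcox.test(Lactate ~ Group, data = lactate.df)

# Logical indexing, as in the command above
res.index <- with(lactate.df,
                  wilcox.test(Lactate[Group == 2], Lactate[Group == 1], paired = FALSE))

res.formula$statistic   # W = 7: Group 1 is taken as the first sample
res.index$statistic     # W = 48: Group 2 was given first
res.formula$p.value
res.index$p.value
```

The formula version puts Group 1 first, so it reports W = 7 (that is, 55 - 48) where the command above reports W = 48. The two-sided p-value is the same either way.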

2.6.2 Wilcoxon signed rank test for paired samples

The wilcox.test( ) function will also perform the Wilcoxon signed rank test for paired samples; the test is based on the within-pair differences. The paired data must be represented by two data vectors of the same length, with the subjects in the same order in both. In this example, the prescores and postscores variables represent paired test results before and after an intervention. Because these data contain ties, R cannot compute an exact p-value and falls back on the normal approximation with continuity correction, which is what the warning message in the output indicates. Note that the wilcox.test( ) function does not provide descriptive statistics, and so the median( ) function was used to calculate the median test scores pre and post intervention. The summary( ) function would give the range and quartiles in addition to the median.

> wilcox.test(prescores,postscores,paired=TRUE)

Wilcoxon signed rank test with continuity correction

data: prescores and postscores

V = 8, p-value = 0.3508

alternative hypothesis: true location shift is not equal to 0

Warning message:

In wilcox.test.default(prescores, postscores, paired = TRUE) :

cannot compute exact p-value with ties

> median(prescores)

[1] 61

> median(postscores)

[1] 59
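The prescores and postscores data are not listed in this section, so the following sketch uses hypothetical scores (the vectors pre and post are made up for illustration). It shows that the paired test is the same as a one-sample signed rank test on the differences, and how the V statistic is built from the ranks.

```r
# Hypothetical paired scores (NOT the prescores/postscores data used above)
pre  <- c(61, 58, 70, 65, 55, 63, 72, 60)
post <- c(59, 60, 66, 59, 50, 64, 69, 57)

# exact = FALSE asks for the normal approximation explicitly,
# so no warning about ties is raised
res.paired <- wilcox.test(pre, post, paired = TRUE, exact = FALSE)

# The same test written as a one-sample test on the differences
d <- pre - post
res.diff <- wilcox.test(d, exact = FALSE)

# V by hand: rank the absolute differences (tied values share the
# average rank), then add up the ranks of the positive differences.
# This shortcut assumes no difference is exactly zero, as is the case here.
V.by.hand <- sum(rank(abs(d))[d > 0])

res.paired$statistic
V.by.hand
res.paired$p.value
res.diff$p.value
```

A V close to its maximum of n(n+1)/2 (36 for these 8 pairs) indicates that most of the larger differences are positive, that is, the pre scores tend to be higher than the post scores in this made-up example.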