2.3 z-tests for proportions, categorical outcomes

 


2.3.1 One-sample z-test for a proportion

The prop.test( ) command performs one- and two-sample tests for proportions, and gives a confidence interval for a proportion as part of the output. For example, in the Age at Walking example, let's test the null hypothesis that 50% of infants start walking by 12 months of age. By default, R will perform a two-tailed test. The variable 'walkby12' takes on the value of 1 for infants who walked by 1 year of age, and 0 for infants who did not start walking until after they were a year old. The table( ) command shows that, in this sample, 36/50=0.72 of the infants walked by 1 year. The prop.test( ) procedure performs the z-test comparing this proportion to the hypothesized value; its inputs are the number of events (36), the total sample size (50), and the hypothesized value of the proportion under the null (p=0.50 for a null value of 50%). Specifying 'correct=TRUE' (the default) tells R to apply a continuity correction, which adjusts the test statistic and the confidence interval (a slightly different formula) and is often preferred for small samples; specifying 'correct=FALSE' tells R to use the usual large-sample formula for the z-test for a proportion. Because categorical data are not normally distributed, the usual z-statistic formula is only reliable with large samples: as a rule of thumb, at least 5 events and 5 non-events in the sample.

> table(walkby12)
walkby12
 0  1
14 36

> prop.test(36,50,p=0.5,correct=FALSE)

        1-sample proportions test without continuity correction

data:  36 out of 50, null probability 0.5
X-squared = 9.68, df = 1, p-value = 0.001863
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.5833488 0.8252583
sample estimates:
   p
0.72

The prop.test( ) procedure can be used for several scenarios, so it's a good idea to check the labeling (1-sample proportions) to make sure we set things up correctly. In this sample, 72% of infants began walking before age 12 months. The two-tailed p-value here is p=0.0019, which is less than the conventional cut-off of 0.05, and so we can conclude that the percent of infants walking before age 12 months is significantly greater than 50%. The prop.test( ) procedure also gives a confidence interval for this proportion (see Section 2.1.2). Note that the CI here (0.58 to 0.83) does not contain the null value of 0.50, agreeing with the p-value that the percent walking by age 12 months is greater than 50%.
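To see where the X-squared value comes from, the z-statistic can be computed by hand from the counts above. This is only a sketch of the usual large-sample formula; the variable names (x, n, p0, p_hat) are chosen here for illustration and do not come from the data set.

```r
# Counts from the Age at Walking example in the text
x  <- 36    # infants walking by 12 months
n  <- 50    # total sample size
p0 <- 0.5   # hypothesized proportion under the null

p_hat <- x / n
z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # standard error uses the null value
p_value <- 2 * pnorm(-abs(z))                 # two-tailed p-value

z          # about 3.11
z^2        # 9.68, the X-squared value reported by prop.test( )
p_value    # about 0.00186

# Compare with the statistic from prop.test( ) without the continuity correction
res <- prop.test(x, n, p = p0, correct = FALSE)
all.equal(unname(res$statistic), z^2)
```

The squared z-statistic reproduces the chi-square statistic in the output, which is why the two tests give the same p-value.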

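For small samples, the binom.test( ) function in base R performs an exact binomial test and calculates a confidence interval for a proportion using an exact formula; a 'binom.exact( )' function with a similar purpose is available in add-on packages such as epitools.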
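As a minimal sketch, an exact interval for the same counts can be obtained with binom.test( ), which ships with base R. The object name 'exact' is an arbitrary choice made here for illustration.

```r
# Exact binomial test and confidence interval for 36 events out of 50
exact <- binom.test(36, 50, p = 0.5)

exact$estimate    # sample proportion, 0.72
exact$conf.int    # exact 95% confidence interval for the proportion
exact$p.value     # exact two-tailed p-value
```

The exact interval does not rely on a normal approximation, so it will generally differ somewhat from the interval reported by prop.test( ).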

2.3.2 Two-sample z-test comparing two proportions

The prop.test( ) command performs a two-sample test for proportions, and gives a confidence interval for the difference in proportions as part of the output. The z-test comparing two proportions is equivalent to the chi-square test of independence, and the prop.test( ) procedure formally calculates the chi-square test. The p-value from the z-test for two proportions is equal to the p-value from the chi-square test, and the z-statistic is equal to the square root of the chi-square statistic in this situation.
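The equivalence between the two tests can be checked numerically. The sketch below uses the counts from the Age at Walking example described next (28 of 33 and 8 of 17 infants walking by 1 year) and assumes the usual pooled standard error for the z-statistic; the variable names are illustrative only.

```r
# Counts from the Age at Walking example: walkers and group sizes
x <- c(28, 8)     # walking by 1 year: exercise, control
n <- c(33, 17)    # infants per group

p_hat  <- x / n
p_pool <- sum(x) / sum(n)    # pooled proportion under the null of no difference

se_pool <- sqrt(p_pool * (1 - p_pool) * sum(1 / n))
z <- (p_hat[1] - p_hat[2]) / se_pool

# 2 x 2 table of walkers and non-walkers for the chi-square test
tab <- rbind(walking = x, not_walking = n - x)
chi <- suppressWarnings(chisq.test(tab, correct = FALSE))

z^2                      # squared z-statistic
unname(chi$statistic)    # chi-square statistic, same value
2 * pnorm(-abs(z))       # two-tailed p-value from the z-test
chi$p.value              # p-value from the chi-square test, same value
```

suppressWarnings( ) is used only to keep the output short; R would otherwise warn that the chi-squared approximation may be incorrect for these counts.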

The example below uses data from the Age at Walking example, comparing the proportion of infants walking by 1 year in the exercise group (group=1) and control group (group=2). The table( ) command is used to find the number of infants walking by 1 year in each study group, and the proportion walking can be calculated from these frequencies. The prop.test( ) command performs the chi-square test comparing the two proportions; for the two-sample situation, first enter a vector giving the number of successes in each of the two groups (using the c( ) command to create the vector), and then a vector giving the number of subjects in each of the two groups. To use the usual large-sample formula in calculating the confidence interval, include the 'correct=FALSE' option to turn off the continuity correction (although in this example, with only 17 subjects in the control group, the corrected version of the confidence interval might be more appropriate).
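Rather than typing the counts by hand, the two vectors can be pulled out of the table itself. The sketch below first rebuilds stand-in data from the cell counts reported in this example; that reconstruction, and the names 'successes' and 'totals', are assumptions made here for illustration, not the original data set.

```r
# Hypothetical reconstruction of the raw data from the cell counts in the text
group   <- rep(c(1, 2), times = c(33, 17))      # 1 = exercise, 2 = control
by1year <- c(rep(c(1, 0), times = c(28, 5)),    # exercise group: 28 yes, 5 no
             rep(c(1, 0), times = c(8, 9)))     # control group: 8 yes, 9 no

tab <- table(by1year, group)

successes <- as.vector(tab["1", ])     # infants walking by 1 year in each group
totals    <- as.vector(colSums(tab))   # number of infants in each group

successes / totals                     # proportion walking in each group
prop.test(successes, totals, correct = FALSE)
```

Indexing the table by the row label "1" avoids relying on the position of the 'walked' row.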

> table(by1year,group)
       group
by1year  1  2
      0  5  9
      1 28  8

> 28/33
0.848

> 8/17
0.471

> prop.test(c(28,8),c(33,17),correct=FALSE)

        2-sample test for equality of proportions without continuity correction

data:  c(28, 8) out of c(33, 17)
X-squared = 7.9478, df = 1, p-value = 0.004815
alternative hypothesis: two.sided
95 percent confidence interval:
 0.1109476 0.6448456
sample estimates:
   prop 1    prop 2
0.8484848 0.4705882

Warning message:
In prop.test(c(28, 8), c(33, 17), correct = FALSE) :
  Chi-squared approximation may be incorrect

The prop.test( ) command does several different analyses, and it's a good idea to check the title to make sure R is comparing two groups ('2-sample test for equality…'). The p-value (p=0.0048) is a two-tailed p-value testing the null hypothesis of no difference between the two proportions. Since the p-value is less than the conventional 0.05, this example shows a significant difference in the percent of infants walking by 1 year; a higher proportion of infants in the exercise group (85%) than in the control group (47%) are walking by 1 year. The procedure gives a chi-square statistic which is equal to the square of the z-statistic. Here the z-statistic would be the square root of 7.9478, or z=2.819. The procedure also gives a confidence interval for the difference between the two proportions (see Section 2.1.5). The warning message appears because one of the expected cell counts under the null hypothesis is below 5 (17 × 14/50 = 4.76 non-walkers expected in the control group), so the large-sample chi-square approximation may be unreliable here.
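The confidence interval for the difference can also be reproduced by hand. The sketch below assumes the usual large-sample formula with an unpooled standard error and no continuity correction; the variable names are illustrative only.

```r
# Counts from the Age at Walking example
x <- c(28, 8)     # walking by 1 year: exercise, control
n <- c(33, 17)    # infants per group

p_hat <- x / n
d     <- p_hat[1] - p_hat[2]                  # difference in proportions

se_unpooled <- sqrt(sum(p_hat * (1 - p_hat) / n))
ci <- d + c(-1, 1) * qnorm(0.975) * se_unpooled

d     # about 0.378
ci    # about 0.111 to 0.645, matching the prop.test( ) output
```

Note that the interval uses each group's own proportion in the standard error, whereas the test statistic uses the pooled proportion under the null hypothesis.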