2.5 Chi-square tests for categorical outcomes
2.5.1 The chi-square goodness-of-fit test for one sample
The following gives the syntax needed to calculate a chi-square goodness-of-fit test from a set of tabled frequencies. As an example, 45 subjects are asked which of 3 screening tests they prefer; 10 subjects prefer Test A, 15 prefer Test B, and 20 prefer Test C. We wish to test the null hypothesis that the three screening tests are equally preferred, or equivalently, that 1/3 of subjects prefer each test. The data:
Preference   Observed Frequency   Expected Proportion Under the Null
Test A       10                   0.333
Test B       15                   0.333
Test C       20                   0.333
To analyze these data in R, first create an object (arbitrarily named 'obsfreq' in the example) that contains the observed frequencies. Second, create an object that contains the expected probabilities under the null (arbitrarily named 'nullprobs'). The third probability is entered as .334 rather than .333 because the probabilities must sum to 1.00; giving the probabilities as 1/3,1/3,1/3 also works and avoids the rounding. Third, compare the observed frequencies to the expected probabilities through the chisq.test( ) function:
> obsfreq <- c(10,15,20)
> nullprobs <- c(.333,.333,.334)
> chisq.test(obsfreq,p=nullprobs)
Chi-squared test for given probabilities
data: x
X-squared = 3.3018, df = 2, p-value = 0.1919
R gives a two-tailed p-value.
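As a rough check on what chisq.test( ) is doing, the statistic can also be computed by hand from the observed and expected counts. The sketch below assumes the exact null probabilities of 1/3 rather than the rounded values above, so its statistic differs slightly from the 3.3018 shown; the object names 'expected', 'x2', 'pval', and 'result' are illustrative only.

```r
# Observed counts for Test A, Test B, Test C
obsfreq <- c(10, 15, 20)

# Exact null probabilities (assumption: 1/3 each instead of the rounded .333/.334)
nullprobs <- rep(1/3, 3)

# Expected counts under the null: total sample size times each probability
expected <- sum(obsfreq) * nullprobs

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected
x2 <- sum((obsfreq - expected)^2 / expected)

# Upper-tail p-value with (number of categories - 1) degrees of freedom
pval <- pchisq(x2, df = length(obsfreq) - 1, lower.tail = FALSE)

# The same test through chisq.test( ), for comparison
result <- chisq.test(obsfreq, p = nullprobs)

print(c(by_hand = x2, chisq_test = unname(result$statistic)))
print(c(by_hand = pval, chisq_test = result$p.value))
```

With exact thirds, each expected count is 15 and the statistic is 50/15, or about 3.33.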
2.5.2 Contingency table analysis and the chi-square test of independence
2.5.2.1 The chi-square test of independence from per-subject data
Suppose we want to compare the percent of males (coded sexmale=1) between the two groups in the Age at Walking example. We can first use the 'table( )' function to get the observed counts for the underlying frequency table:
> table(group,sexmale)
     sexmale
group  0  1
    1 17 16
    2  9  8
In group 1, there are 16 males and 17 females, so 48.5% (16/33) of group 1 is male.
In group 2, 47.1% (8/17) are male. The 'prop.table( )' function will calculate these proportions in R:
> prop.table(table(group,sexmale),1)
     sexmale
group         0         1
    1 0.5151515 0.4848485
    2 0.5294118 0.4705882
The 'prop.table( )' function calculates proportions from the indicated table. Its second argument says which variable to calculate proportions within: the '1' above requests proportions within levels of the first variable in the table (group), so R gives the proportions of males and females within group 1, and the proportions of males and females within group 2. Had we indicated '2', R would have calculated proportions within sex, giving the proportions in groups 1 and 2 for females, and the proportions in groups 1 and 2 for males.
Specifying the orientation for the prop.table( ) function can be confusing, and it may be easier (or safer) to calculate the proportions directly from the table of counts. R can be used as a calculator to find these proportions:
> 16/(16+17)
[1] 0.4848485
> 8/(8+9)
[1] 0.4705882
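To see the effect of the orientation argument without the per-subject data, the counts from the table can be entered by hand. This is only a sketch: the object names 'counts', 'within_group', and 'within_sex' are made up for illustration, and the row and column labels are typed in to match the table( ) output above.

```r
# Counts from the group-by-sexmale table, entered by hand.
# matrix() fills column by column: the sexmale=0 column first, then sexmale=1.
counts <- matrix(c(17, 9, 16, 8), nrow = 2,
                 dimnames = list(group = c("1", "2"), sexmale = c("0", "1")))

# Second argument 1: proportions within each group (each row sums to 1)
within_group <- prop.table(counts, 1)

# Second argument 2: proportions within each sex (each column sums to 1)
within_sex <- prop.table(counts, 2)

print(round(within_group, 3))
print(round(within_sex, 3))

# Row and column sums confirm which direction was used
print(rowSums(within_group))
print(colSums(within_sex))
```

Checking which set of sums equals 1 is a quick way to confirm that the proportions were calculated in the intended direction.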
The chisq.test( ) function applied to a table object compares these two percentages through the chi-square test of independence:
> chisq.test(table(group,sexmale),correct=FALSE)
Pearson's Chi-squared test
data: table(group, sexmale)
X-squared = 0.0091, df = 1, p-value = 0.9238
The 'correct=FALSE' option in the chisq.test( ) function turns off Yates' correction for the chi-square test (which is used with small sample sizes), and gives the standard chi-square test statistic. R gives a two-tailed p-value. Note that the title for the output, 'Pearson's Chi-squared test', indicates that these results are for the uncorrected (not Yates' adjusted) chi-square test.
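The effect of the 'correct' option can be seen by running the test both ways on the same 2x2 table. The sketch below enters the counts by hand (the per-subject variables are not needed), and the object names 'counts', 'uncorrected', and 'corrected' are illustrative. It relies on chisq.test( ) applying Yates' correction to 2x2 tables by default when 'correct' is not specified.

```r
# 2x2 table of counts: rows are groups 1 and 2, columns are sexmale 0 and 1
counts <- matrix(c(17, 9, 16, 8), nrow = 2,
                 dimnames = list(group = c("1", "2"), sexmale = c("0", "1")))

# Standard (uncorrected) Pearson chi-square test
uncorrected <- chisq.test(counts, correct = FALSE)

# Default call: Yates' continuity correction is applied for a 2x2 table
corrected <- chisq.test(counts)

# The output title (the 'method' element) says which version was run
print(uncorrected$method)
print(corrected$method)

# Compare the two statistics and p-values side by side
print(c(uncorrected = unname(uncorrected$statistic),
        corrected = unname(corrected$statistic)))
print(c(uncorrected = uncorrected$p.value, corrected = corrected$p.value))
```

The corrected statistic is never larger than the uncorrected one, so the corrected p-value is the more conservative of the two.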
2.5.2.2 The chi-square test of independence from tabled data
R can also perform a chi-square test on frequencies from a contingency table. For example, suppose we want to compare the percent of subjects testing positive on a marker for an exposure across three groups:
                Group 1     Group 2     Group 3
Test Positive   20 (40%)    5 (33.3%)   40 (50%)
Test Negative   30          10          40
First, we create an object ('obsfreq' in the example) containing the observed frequencies from the table. The matrix( ) function fills the table column by column, so the counts are entered one group at a time (positive, then negative). I printed the object as a check that it was created correctly:
> obsfreq <- matrix(c(20,30, 5,10, 40,40),nrow=2,ncol=3)
> obsfreq
     [,1] [,2] [,3]
[1,]   20    5   40
[2,]   30   10   40
The 'chisq.test( )' function will then calculate the chi-square statistic for the test of independence for this table:
> chisq.test(obsfreq)
Pearson's Chi-squared test
data: obsfreq
X-squared = 2.1378, df = 2, p-value = 0.3434
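Labeling the rows and columns of the matrix makes it easier to check that the counts went into the right cells. The sketch below is one way to do this; the labels and the object names 'pct_positive' and 'result' are illustrative and not part of the original example. It also prints the expected counts that the test is based on.

```r
# Same counts as above, with row and column labels added for readability
obsfreq <- matrix(c(20, 30, 5, 10, 40, 40), nrow = 2, ncol = 3,
                  dimnames = list(Result = c("Positive", "Negative"),
                                  Group = c("Group 1", "Group 2", "Group 3")))

# Percent testing positive within each group (proportions within columns)
pct_positive <- 100 * prop.table(obsfreq, 2)["Positive", ]
print(round(pct_positive, 1))

# Chi-square test of independence on the labeled table
result <- chisq.test(obsfreq)

# Expected counts under independence: row total * column total / grand total
print(round(result$expected, 2))
print(result)
```

The percentages should reproduce the 40%, 33.3%, and 50% shown in the table, which confirms that the matrix was filled in the intended order.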
2.5.2.3 Fisher's exact test for small cell sizes
The usual chi-square test is appropriate for large sample sizes. For 2x2 tables with small samples (an expected frequency less than 5), the usual chi-square test exaggerates significance, and Fisher's exact test is generally considered to be a more appropriate procedure. The fisher.test( ) function performs Fisher's exact test in R:
> fisher.test(group,sexmale)
Fisher's Exact Test for Count Data
data: group and sexmale
p-value = 1
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.2480199 3.5592990
sample estimates:
odds ratio
 0.9455544
R gives the two-tailed p-value, as indicated by the wording of the alternative hypothesis. The odds ratio and a 95% confidence interval for the odds ratio are also given. Since Fisher's test is usually used for small-sample situations, the odds ratio estimate and its CI are based on exact (conditional) methods rather than the large-sample formulas, so they can differ slightly from the usual cross-product odds ratio.
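The expected frequencies can be inspected before choosing between the two tests, and fisher.test( ) also accepts a table of counts directly. The sketch below enters the counts by hand; the object names 'counts', 'expected', 'exact', and 'cross_product' are illustrative only.

```r
# 2x2 table of counts: rows are groups 1 and 2, columns are sexmale 0 and 1
counts <- matrix(c(17, 9, 16, 8), nrow = 2,
                 dimnames = list(group = c("1", "2"), sexmale = c("0", "1")))

# Expected counts under independence, taken from the chi-square test object
expected <- suppressWarnings(chisq.test(counts, correct = FALSE))$expected
print(round(expected, 2))

# TRUE here would point toward Fisher's exact test
print(any(expected < 5))

# Fisher's exact test on the table of counts
exact <- fisher.test(counts)
print(exact$p.value)
print(exact$conf.int)

# Cross-product odds ratio from the table, next to the estimate from fisher.test( )
cross_product <- (17 * 8) / (16 * 9)
print(c(cross_product = cross_product,
        fisher_estimate = unname(exact$estimate)))
```

For this table none of the expected counts is below 5, so the example mainly illustrates the syntax rather than a case where the exact test is needed.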
2.5.2.4 Relative risk and confidence interval for the RR
Epidemiologic analyses are available through 'epitools', an add-on package to R. To use the epitools functions, you must first do a one-time installation. In R, click on the 'Packages' menu, then 'Install Package(s)', then select a download site (from the US), then select the epitools package. This will install the add-on package onto your computer. To use the package, you must also load it into R: click on the 'Packages' menu, then 'Load Package', then select epitools. While you only need to install the package once onto your computer, you will need to load the package into R each time you want to use it.
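The same installation and loading steps can also be typed at the R prompt instead of using the menus. The following is a minimal sketch; the object name 'have_epitools' is made up for illustration, and the installation itself is left as a message because it needs an internet connection.

```r
# Check whether epitools is already installed on this computer
have_epitools <- requireNamespace("epitools", quietly = TRUE)

if (!have_epitools) {
  # One-time installation; not run automatically in this sketch
  message('Run install.packages("epitools") once to install the package.')
} else {
  # Loading is needed in every new R session
  library(epitools)
}
```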
The data layout matters for calculating RRs. For the riskratio( ) function from epitools, data should be set up in the following format:
           No Disease   Disease
Control
Exposed
riskratio( ) calculates the RR of disease for those in the exposed group relative to the control group.
For the Age at Walking example, I categorized age at walking as early walking (under 12 months, coded 0) and late walking (12 months or older, coded 1). To find the relative risk for late walking, for kids in Group 2 vs. Group 1, I first printed the 2x2 table as a check, then used the riskratio.wald( ) function to calculate the relative risk and large-sample 95% confidence interval.
> table(group,LateWalker)
     LateWalker
group FALSE TRUE
    1    28    5
    2     8    9
> riskratio.wald(group,LateWalker)
$data
         Outcome
Predictor FALSE TRUE Total
    1        28    5    33
    2         8    9    17
    Total    36   14    50
$measure
         risk ratio with 95% C.I.
Predictor estimate    lower    upper
        1 1.000000       NA       NA
        2 3.494118 1.387688 8.797984
$p.value
         two-sided
Predictor  midp.exact fisher.exact  chi.square
        1          NA           NA          NA
        2 0.008000253  0.007949207 0.004814519
$correction
[1] FALSE
attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"
Warning message:
In chisq.test(xx, correct = correction) :
  Chi-squared approximation may be incorrect
The RR here is 3.49 ( (9/17) / (5/33) ), with a 95% CI of (1.39, 8.80). There are several versions of a CI for a relative risk: 'riskratio.wald( )' requests the standard normal approximation formula, and 'riskratio.small( )' uses a correction to the CI for small samples. The general 'riskratio( )' function selects among these versions through its 'method' argument.
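The Wald interval can be reproduced in base R from the four counts, which may help in seeing where the numbers come from. This is a sketch of the standard large-sample formula on the log scale, not the epitools source code, and all object names are illustrative.

```r
# Counts from the table(group, LateWalker) output
late_g1 <- 5; n_g1 <- 33   # Group 1 (reference): late walkers, group size
late_g2 <- 9; n_g2 <- 17   # Group 2: late walkers, group size

# Risk of late walking in each group, and their ratio
risk_g1 <- late_g1 / n_g1
risk_g2 <- late_g2 / n_g2
rr <- risk_g2 / risk_g1

# Large-sample standard error of log(RR)
se_log_rr <- sqrt(1 / late_g2 - 1 / n_g2 + 1 / late_g1 - 1 / n_g1)

# 95% CI: build the interval on the log scale, then exponentiate
z <- qnorm(0.975)
ci <- exp(log(rr) + c(-1, 1) * z * se_log_rr)

print(round(c(RR = rr, lower = ci[1], upper = ci[2]), 3))
```

The result should agree with the estimate and limits in the '$measure' part of the riskratio.wald( ) output.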
2.5.2.5 Odds ratios and 95% CI for the OR
The epitools add-on package also has a function to calculate odds ratios and confidence intervals for odds ratios. You must first load the epitools package into R (see Section 16d). Orientation of the table matters when calculating the OR, and the orientation described above for the relative risk also applies for the odds ratio. Calculating the odds ratio ( (9/8) / (5/28) = 6.3 ) and 95% CI for late walkers, for Group 2 vs. Group 1 in the Age at Walking example:
> oddsratio.wald(group,LateWalker)
$data
         Outcome
Predictor FALSE TRUE Total
    1        28    5    33
    2         8    9    17
    Total    36   14    50
$measure
         odds ratio with 95% C.I.
Predictor estimate    lower   upper
        1      1.0       NA      NA
        2      6.3 1.639283 24.2118
$p.value
         two-sided
Predictor  midp.exact fisher.exact  chi.square
        1          NA           NA          NA
        2 0.008000253  0.007949207 0.004814519
$correction
[1] FALSE
attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"
Warning message:
In chisq.test(xx, correct = correction) :
  Chi-squared approximation may be incorrect
The 'oddsratio.wald( )' function gives the usual estimate for the odds ratio, with OR=6.3 and a 95% CI of (1.64, 24.21). 'oddsratio.small( )' uses a correction for small sample size in calculating the CI.
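As with the relative risk, the Wald interval for the odds ratio can be checked in base R from the four cell counts. This sketch uses the standard large-sample formula on the log scale rather than the epitools code itself, and the object names are illustrative.

```r
# Cell counts from the table(group, LateWalker) output
g1_early <- 28; g1_late <- 5   # Group 1 (reference)
g2_early <- 8;  g2_late <- 9   # Group 2

# Odds of late walking in each group, and their ratio
odds_g1 <- g1_late / g1_early
odds_g2 <- g2_late / g2_early
odds_ratio <- odds_g2 / odds_g1

# Large-sample standard error of log(OR): square root of the summed reciprocals
se_log_or <- sqrt(1 / g1_early + 1 / g1_late + 1 / g2_early + 1 / g2_late)

# 95% CI: build the interval on the log scale, then exponentiate
z <- qnorm(0.975)
ci <- exp(log(odds_ratio) + c(-1, 1) * z * se_log_or)

print(round(c(OR = odds_ratio, lower = ci[1], upper = ci[2]), 3))
```

The result should agree with the estimate and limits in the '$measure' part of the oddsratio.wald( ) output.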