Analysis of Multiway (r × c) Tables
Our previous analyses only allow us to compare two or more proportions with each other. However, we may be interested in seeing whether two factors are independent of one another, in which case we need to consider all levels of each factor. This leads to a table with r rows and c columns (where both r and c can be bigger than 2, depending on the number of levels). For example, in the esophageal cancer data, we may want to determine whether the effects of tobacco and alcohol intake are independent with respect to cancer outcome.
For the analysis of tables with more than two classes on both sides, you can use chisq.test or fisher.test, although the latter can be very computationally demanding if the cell counts are large and there are more than two rows or columns.
Design
An r × c table can arise from several different sampling plans, and the notion of "no relation between rows and columns" is correspondingly different. The total in each row might be fixed in advance, and you would be interested in testing whether the distribution over columns is the same for each row, or vice versa if the column totals were fixed. It might also be the case that only the total number is chosen and the individuals are grouped randomly according to the row and column criteria. In the latter case, you would be interested in testing the hypothesis of statistical independence, that the probability of an individual falling into the (i, j)th cell is the product p_{i·} × p_{·j} of the row and column marginal probabilities. However, mathematically the analysis of the table turns out to be the same in all cases!
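To make the independence hypothesis concrete, the expected count in each cell under independence is the grand total times the product of the estimated marginal probabilities, which works out to (row total × column total) / n. The following is a minimal sketch of that calculation, using the tobacco-by-alcohol case counts from the esoph example in these notes, typed in by hand so the block is self-contained. The object names (obs, expected, X2, and so on) are illustrative only.

```r
# Case counts by tobacco group (rows) and alcohol group (columns),
# entered by hand from the esoph example in these notes
obs <- matrix(c( 9, 34, 19, 16,
                10, 17, 19, 12,
                 5, 15,  6,  7,
                 5,  9,  7, 10),
              nrow = 4, byrow = TRUE,
              dimnames = list(tobgp = c("0-9g/day", "10-19", "20-29", "30+"),
                              alcgp = c("0-39g/day", "40-79", "80-119", "120+")))

n     <- sum(obs)            # grand total
p.row <- rowSums(obs) / n    # estimated row marginal probabilities
p.col <- colSums(obs) / n    # estimated column marginal probabilities

# Expected counts under independence: n * (row probability) * (column probability)
expected <- n * outer(p.row, p.col)

# Pearson chi-squared statistic, with (r - 1)(c - 1) degrees of freedom
X2      <- sum((obs - expected)^2 / expected)
df      <- (nrow(obs) - 1) * (ncol(obs) - 1)
p.value <- pchisq(X2, df, lower.tail = FALSE)
```

The same statistic, degrees of freedom, and expected counts are what chisq.test computes for an r × c table, whichever of the sampling plans above produced the data.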
Example
For the esoph data, test whether the effects of tobacco and alcohol intake are independent in terms of cancer case status.
First, construct a two-way contingency table for the data using the tapply command (the bare variable names below assume that the esoph data frame has been attached):
> tob.alc.table <- tapply(ncases, list(tobgp, alcgp), sum)
## notice the grouping using "list"
> tob.alc.table
         0-39g/day 40-79 80-119 120+
0-9g/day         9    34     19   16
10-19           10    17     19   12
20-29            5    15      6    7
30+              5     9      7   10
> chisq.test(tob.alc.table) ## what can you conclude about independence?
In some cases, you may get a warning that the χ² approximation may be incorrect, which is prompted by some cells having an expected count less than 5.
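One way to see which cells prompt this warning is to inspect the expected counts stored in the test result. The sketch below re-enters the table from the example by hand (so it does not depend on esoph being attached) and then shows two alternatives that are often used when some expected counts are small: a Monte Carlo p-value from chisq.test, and fisher.test with a simulated p-value. The object names, the seed, and the choice of B = 10000 replicates are illustrative assumptions, not part of the original example.

```r
# Tobacco (rows) by alcohol (columns) case counts, entered by hand
tab <- matrix(c( 9, 34, 19, 16,
                10, 17, 19, 12,
                 5, 15,  6,  7,
                 5,  9,  7, 10),
              nrow = 4, byrow = TRUE,
              dimnames = list(tobgp = c("0-9g/day", "10-19", "20-29", "30+"),
                              alcgp = c("0-39g/day", "40-79", "80-119", "120+")))

# Run the test quietly so the fitted values can be inspected
res <- suppressWarnings(chisq.test(tab))
res$expected                  # expected counts under independence
small <- res$expected < 5     # cells below the usual rule of thumb
sum(small)                    # how many cells are affected

set.seed(1)                   # arbitrary seed, only for reproducibility

# Alternative 1: chi-squared statistic with a Monte Carlo p-value
sim <- chisq.test(tab, simulate.p.value = TRUE, B = 10000)
sim$p.value

# Alternative 2: Fisher's test with a simulated p-value, which avoids the
# heavy computation of the exact test on a larger table
fis <- fisher.test(tab, simulate.p.value = TRUE, B = 10000)
fis$p.value
```

Because both alternatives rely on simulation, their p-values vary slightly from run to run unless a seed is set.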
Exercise: Perform an appropriate test to determine whether the effects of age and alcohol intake are independent in terms of cancer case status.
To summarize, let's review the tests for categorical data that we have looked at so far, where they are used, and what form the input data should be in.
Tests for categorical data

| Test | Single Proportion | Two Proportions | > 2 Proportions | Two-way tables | Input | Comments |
|---|---|---|---|---|---|---|
| prop.test | yes | yes | yes | no | vectors of successes and trials | accurate for large samples only |
| fisher.test | no | yes | no | yes | matrix or contingency table | exact test, but time-consuming for large tables |
| chisq.test | no | yes | no | yes | matrix or contingency table | expected cell frequencies should be > 5 for accuracy |
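The different input forms listed in the table can be illustrated with a short sketch comparing two proportions. The counts below are made up purely for illustration (they are not taken from esoph), and the object names are hypothetical.

```r
# Hypothetical data: number of successes out of 50 trials in each of two groups
successes <- c(15, 25)
trials    <- c(50, 50)

# prop.test takes vectors of successes and trials
pt <- prop.test(successes, trials)

# fisher.test and chisq.test take a matrix (or table) of counts:
# one row per group, columns for successes and failures
tab2 <- cbind(success = successes, failure = trials - successes)
ft <- fisher.test(tab2)
ct <- chisq.test(tab2)

# Collect the three p-values side by side
c(prop.test = pt$p.value, fisher.test = ft$p.value, chisq.test = ct$p.value)
```

For two groups, prop.test and chisq.test both apply a continuity correction by default and give the same p-value here, while fisher.test gives an exact p-value that is usually close but not identical.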
For more on this topic, SPH offers BS 821: Categorical Data Analysis, taught by Prof. David Gagnon, or BS 852: Statistical Methods for Epidemiology, taught by Profs. Paola Sebastiani or Tim Heeren.
Chi-Square Test, Fisher's Exact Test, and Cross Tabulations in R (R Tutorial 4.7), MarinStatsLectures
Relative Risk, Odds Ratio and Risk Difference (aka Attributable Risk) in R (R Tutorial 4.8), MarinStatsLectures
Reading:
 BS 704 R Notes 2.1, 2.3 and 2.5
Assignment:
 Homework 6 assigned
 Final Project due in 2 weeks