# Summarizing Data in a Cohort Study

Investigators often use contingency tables to summarize data. In essence, the table is a matrix that displays the combinations of exposure and outcome status. If one were summarizing the results of a study with two possible exposure categories and two possible outcomes, one would use a "two by two" table in which the four cells give the number of subjects in each of the 4 possible combinations of exposure and disease status.

The table below summarizes the findings. A total of 479 subjects completed the questionnaire, and 124 of them indicated that they had been exposed to the kiddy pool. Of these, 16 subsequently developed Giardia infection, but 108 did not. Among the 355 subjects who denied kiddy pool exposure, 14 developed Giardia infection, and the other 341 did not.

| Swam in Kiddy Pool? | Giardia | No Giardia | Total | Cumulative Incidence |
|---|---|---|---|---|
| Yes | 16 | 108 | 124 | 16/124 = 12.9% |
| No | 14 | 341 | 355 | 14/355 = 3.9% |

Organizing the data this way makes it easier to compute the cumulative incidence in each group (12.9% and 3.9%, respectively). The incidence in each group provides an estimate of risk, and the groups can be compared in order to estimate the magnitude of association. (This will be addressed in much greater detail in the module on Measures of Association.) One way of quantifying the association is to calculate the relative risk (also called the risk ratio), i.e., the incidence in the exposed group divided by the incidence in the unexposed group. In this case, the risk ratio is (12.9% / 3.9%) = 3.3. This means that subjects who swam in the kiddy pool had 3.3 times the risk of getting Giardia infection compared to those who did not, which suggests that the kiddy pool was the source.
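The arithmetic above can be checked with a short script. This is only a minimal sketch: the function names `cumulative_incidence` and `risk_ratio` are illustrative choices, not part of the module, and the counts are the ones reported in the text (16 and 108 among the 124 exposed, 14 and 341 among the 355 unexposed).

```python
def cumulative_incidence(cases, non_cases):
    """Proportion of a group that developed the outcome."""
    return cases / (cases + non_cases)


def risk_ratio(risk_exposed, risk_unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return risk_exposed / risk_unexposed


# Counts taken from the text: kiddy pool exposure vs. Giardia infection.
ci_exposed = cumulative_incidence(cases=16, non_cases=108)    # 16/124
ci_unexposed = cumulative_incidence(cases=14, non_cases=341)  # 14/355
rr = risk_ratio(ci_exposed, ci_unexposed)

print(f"Cumulative incidence, exposed:   {ci_exposed:.1%}")
print(f"Cumulative incidence, unexposed: {ci_unexposed:.1%}")
print(f"Risk ratio: {rr:.1f}")
```

Working from the raw counts rather than the rounded percentages gives a slightly different unrounded value, but it still rounds to the 3.3 reported above.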

If the kiddy pool was the source of contamination responsible for this outbreak, why was it that:

1. Only 16 people exposed to the kiddy pool developed the infection?
2. There were 14 Giardia cases among people who denied exposure to the kiddy pool?

Before you look at the answer, think about it and try to come up with a possible explanation.

Link to the 2003 Giardia outbreak

Link to CDC page on Organizing Data

Possible Pitfall: Contingency tables can be oriented in several ways, and this can cause confusion when calculating measures of association.

There is no standard rule about how to set up contingency tables, and you will see them set up in different ways.

• With exposure status in rows and outcome status in columns
• With exposure status in columns and outcome status in rows
• With exposed group first followed by non-exposed group
• With non-exposed group first followed by exposed group

If you aren't careful, these different orientations can lead to errors when calculating measures of association. One way to avoid confusion is to always set up your contingency tables in the same way. For example, in these learning modules the contingency tables almost always show outcome status in columns, with subjects who have the outcome of interest to the left of subjects who do not, and exposure status in rows, with the exposed (or most exposed) group above the unexposed (or least exposed) group.

The table below illustrates this arrangement.

| | Those With the Outcome | Those Without the Outcome | Total |
|---|---|---|---|
| Exposed (or most exposed) | | | |
| Non-exposed (or least exposed) | | | |
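To see how orientation can go wrong in practice, the sketch below applies one fixed formula to the Giardia counts from earlier in this section, laid out in three different ways. The function `risk_ratio_positional` is a hypothetical helper that assumes the arrangement used in these modules (exposed group in the first row, subjects with the outcome in the first column); it is meant only as an illustration of the pitfall.

```python
def risk_ratio_positional(grid):
    """Risk ratio from a 2x2 grid, ASSUMING rows are exposure
    (exposed first) and columns are outcome (with outcome first)."""
    (a, b), (c, d) = grid
    return (a / (a + b)) / (c / (c + d))


# Arrangement used in these modules: exposure in rows, outcome in columns.
standard = [[16, 108],
            [14, 341]]

# Same data, but exposure in columns and outcome in rows.
transposed = [[16, 14],
              [108, 341]]

# Same data, but the non-exposed group listed first.
rows_swapped = [[14, 341],
                [16, 108]]

rr_standard = risk_ratio_positional(standard)
rr_transposed = risk_ratio_positional(transposed)
rr_swapped = risk_ratio_positional(rows_swapped)

print(f"Standard layout:    {rr_standard:.2f}")
print(f"Transposed layout:  {rr_transposed:.2f}  (not a risk ratio)")
print(f"Rows swapped:       {rr_swapped:.2f}  (inverse of the risk ratio)")
```

Only the first call matches the layout the formula expects. The other two run without any error but return a different number, which is why it helps to settle on one arrangement and use it consistently.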