Example of a Case-Control Study

The Salmonella outbreak above occurred in a small, well-defined cohort, and the overall attack rate was 58%. A cohort study design works well in these circumstances. However, in most outbreaks the population is not well defined, and cohort studies are not feasible. A good example of this is an actual outbreak of hepatitis A that occurred in Marshfield, MA in 2004.


Excerpts from the introduction of the report by the Massachusetts Department of Public Health (MDPH)

  • "Between February 25 and 27, 2004 six cases of HAV infection in Marshfield residents were reported to …MDPH. In addition, a case of hepatitis A in a Plymouth resident, employed in Marshfield, was reported."
  • "Marshfield had 1 case in 2002 and 0 cases in 2003."
  • "The increase in the number of reported cases … during February in a confined geographic area was an indication of a possible outbreak of hepatitis A infection."


Within a short period of time, 20 cases of hepatitis A were identified in the Marshfield area. The epidemic curve suggested a point source epidemic, and the spot map showed cases spread across the entire South Shore of Massachusetts, although the pattern suggested a focus near Marshfield. Hypothesis-generating interviews identified five food establishments as candidate sources. A cohort study was not feasible here, because the patrons of these establishments did not form a well-defined population that could be enumerated. Moreover, hepatitis A was rare, so even if the investigators had interviewed a sample of patrons from each restaurant, most likely few, if any, would have had recent hepatitis, even among patrons of the responsible restaurant.

In a situation like this a case-control design is a much more efficient option. The investigators identified as many cases as possible (19 agreed to answer the questionnaire), and they selected a sample of 38 non-diseased people as a comparison group (the controls). In this case, the "controls" were non-diseased people who were matched to the cases with respect to age, gender, and neighborhood of residence. Investigators then ascertained the prior exposures of subjects in each group, focusing on food establishments and other possibly relevant exposures they had had during the past two months.

When using a case-control strategy for sampling, it is not possible to calculate the incidence (attack rate) in exposed and non-exposed subjects, because the denominators of the exposure groups are unknown. However, one can calculate the odds of disease in exposed and non-exposed subjects, and these can be expressed as an odds ratio, which is a good approximation of a risk ratio in a situation like this, i.e., when the outcome is rare. An odds ratio can be computed for each of the possible sources. Consider the following example:

                              Cases   Controls
Ate at Papa Gino's              10       19
Did not eat at Papa Gino's       9       19
Total                           19       38

Given these hypothetical results, the odds that someone who ate at Papa Gino's was a case were 10/19, while the odds that someone who did not eat at Papa Gino's was a case were 9/19. These odds are quite similar, so the odds ratio is close to 1.0. The odds ratio can be interpreted the same way as a risk ratio.

Odds Ratio = (10/19) / (9/19) = 1.1

This provides no compelling evidence of an association with Papa Gino's. As we did with the risk ratio, we can also compute a 95% confidence interval and a p value for the odds ratio. In this case the 95% confidence interval is 0.37 to 3.35, and p = 0.85.
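The figures quoted above can be reproduced with a short calculation. The following is a minimal sketch in Python, assuming a log-based (Woolf) confidence interval and a two-sided Wald z test; the report does not say which methods were used, although these assumptions give the same rounded values. The function name `odds_ratio_summary` and its parameter names are illustrative only, and the calculation treats the table as unmatched data, as the hand calculation above does.

```python
import math


def odds_ratio_summary(case_exp, ctrl_exp, case_unexp, ctrl_unexp, z_crit=1.96):
    """Return (odds ratio, CI lower, CI upper, p value) for a 2x2 table.

    Assumptions (not stated in the source): Woolf log-based 95% interval
    and a two-sided Wald z test on the log odds ratio.
    """
    # Odds of being a case among the exposed divided by the odds among the unexposed.
    odds_ratio = (case_exp / ctrl_exp) / (case_unexp / ctrl_unexp)

    # Standard error of ln(OR): square root of the summed reciprocal cell counts.
    log_or = math.log(odds_ratio)
    se = math.sqrt(1 / case_exp + 1 / ctrl_exp + 1 / case_unexp + 1 / ctrl_unexp)

    # The interval is built on the log scale, then transformed back.
    lower = math.exp(log_or - z_crit * se)
    upper = math.exp(log_or + z_crit * se)

    # Two-sided p value from the standard normal distribution.
    z = log_or / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return odds_ratio, lower, upper, p_value


# Hypothetical Papa Gino's table from the text.
or_, lo, hi, p = odds_ratio_summary(10, 19, 9, 19)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}, p = {p:.2f}")
```

Because the interval comfortably includes 1.0, the result is consistent with no association. With very small cell counts this approximation becomes unreliable, and an exact method would be a safer choice.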

In contrast, consider the findings for Ron's Grill:

                              Cases   Controls
Ate at Ron's Grill              18        7
Did not eat at Ron's             1       29
Total                           19       36

For Ron's Grill the odds ratio would be computed as follows:

Odds Ratio = (18/7) / (1/29) = 75

This suggests that patrons of Ron's Grill had about 75 times the odds of being a case compared to those who did not eat at Ron's; because hepatitis A was rare, this can be read as roughly 75 times the risk. The other three suspected restaurants had odds ratios close to 1.0. This provides strong evidence that Ron's Grill was the source of the outbreak, and further investigation confirmed that one of the food handlers at Ron's had recently had a subclinical case of hepatitis A.
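Reading an odds ratio as if it were a risk ratio relies on the rare-outcome approximation mentioned earlier. The sketch below illustrates that point in Python, using hypothetical risks that are not taken from the outbreak data; the function names and the two scenarios are assumptions chosen only for illustration.

```python
def odds(risk):
    """Convert a risk (probability of disease) into odds."""
    return risk / (1.0 - risk)


def risk_ratio(risk_exposed, risk_unexposed):
    """Ratio of the risk in the exposed to the risk in the unexposed."""
    return risk_exposed / risk_unexposed


def odds_ratio(risk_exposed, risk_unexposed):
    """Ratio of the odds in the exposed to the odds in the unexposed."""
    return odds(risk_exposed) / odds(risk_unexposed)


# Hypothetical risks (exposed, unexposed); both scenarios have a risk ratio of 2.
scenarios = {
    "rare outcome": (0.02, 0.01),
    "common outcome": (0.60, 0.30),
}

for label, (p_exposed, p_unexposed) in scenarios.items():
    rr = risk_ratio(p_exposed, p_unexposed)
    or_ = odds_ratio(p_exposed, p_unexposed)
    print(f"{label}: risk ratio = {rr:.2f}, odds ratio = {or_:.2f}")
```

In the rare scenario the odds ratio (about 2.02) is nearly identical to the risk ratio of 2, while in the common scenario the odds ratio (3.5) overstates it considerably. This is why the approximation is reasonable for a rare disease like hepatitis A but would not be for the Salmonella outbreak, where the attack rate was 58%.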

In case-control studies, one of the most difficult decisions is how to select the controls. Ideally they should be non-diseased people who come from the same source population as the cases, and, aside from their outcome status, they should be comparable to the cases in order to avoid selection bias. Note that in the Marshfield case-control study the controls were selected to ensure that they were comparable to the cases with respect to age and gender and lived in similar neighborhoods.
