Analysis of Case-Control Studies

As with cohort studies and clinical trials, one of the first steps in the analysis of a case-control study is to generate simple descriptive statistics on each of the groups being compared, i.e., the cases and the controls. This helps characterize the study population, and it also alerts you and your readers to any differences between the groups with respect to other exposures that might cause confounding.
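
As a rough illustration of this first step, the sketch below summarizes two characteristics separately for cases and controls. It is written in Python using only the standard library. The records, the variable names, and the describe function are hypothetical and invented for this example; they are not data from any study.

```python
from statistics import mean, stdev

# Hypothetical records: (group, age in years, current smoker).
# These values are made up purely to show the mechanics.
records = [
    ("case", 61, True), ("case", 58, True),
    ("case", 66, False), ("case", 70, True),
    ("control", 59, False), ("control", 63, True),
    ("control", 55, False), ("control", 68, False),
]


def describe(records, group):
    """Summarize age and smoking status for one group (cases or controls)."""
    rows = [r for r in records if r[0] == group]
    ages = [r[1] for r in rows]
    smokers = sum(1 for r in rows if r[2])
    return {
        "n": len(rows),
        "mean_age": mean(ages),
        "sd_age": stdev(ages),  # sample standard deviation
        "pct_smokers": 100 * smokers / len(rows),
    }


for group in ("case", "control"):
    print(group, describe(records, group))
```

A noticeable difference between the two groups in a characteristic such as smoking would flag that characteristic as a possible confounder to examine later.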

After generating the descriptive statistics for a case-control study, the next step is to organize the data using contingency tables and to calculate estimates for the odds ratio. There may be confounding factors that distort the odds ratio, but one still begins by generating crude measures of association, i.e., estimates that have not yet been adjusted for confounding factors. In a later module you will learn how to use R to adjust for confounding in a case-control study.
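
The crude odds ratio can be sketched from a single 2x2 contingency table as follows. This is a minimal Python example, not the R workflow referred to above. The function name crude_odds_ratio and the cell counts are hypothetical, and the confidence interval uses the common log-based (Woolf) approximation, which is an assumption of this sketch rather than a method prescribed by this module.

```python
import math


def crude_odds_ratio(exposed_cases, unexposed_cases,
                     exposed_controls, unexposed_controls, z=1.96):
    """Return (odds ratio, lower limit, upper limit) for a 2x2 table.

    The odds of exposure among cases are divided by the odds of exposure
    among controls. The interval is the log-based (Woolf) approximation;
    z = 1.96 corresponds to roughly 95% confidence.
    """
    a, b = exposed_cases, unexposed_cases
    c, d = exposed_controls, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")

    odds_ratio = (a / b) / (c / d)  # equivalent to (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 30 of 100 cases exposed, 10 of 100 controls exposed.
or_hat, lo, hi = crude_odds_ratio(30, 70, 10, 90)
print(f"crude OR = {or_hat:.2f}, approx. 95% CI = ({lo:.2f}, {hi:.2f})")
```

This estimate is crude in the sense used above: it ignores every other variable, so any confounding present in the data is still reflected in it.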

Which Study Design is Best?

Selection of a study design depends on the scientific questions being addressed and should take into account ethics and feasibility. For example, randomized clinical trials provide the best opportunity to identify small but potentially important clinical associations, but it would not be ethical to address all questions with a randomized clinical trial (e.g., whether maternal smoking during pregnancy is associated with a greater risk of having a premature or low birth weight infant). Observational studies (cohort studies and case-control studies) avoid many ethical problems because the investigators do not allocate potentially harmful exposures, but they are frequently subject to confounding and bias. Clinical trials and prospective cohort studies often require large numbers of subjects and long periods of follow-up, which can make them too costly to perform. Retrospective cohort studies and case-control studies are best suited to studying outcomes with long latency periods, but obtaining accurate exposure data may be difficult. Case-control studies are particularly useful for studying rare outcomes, for studying dynamic populations, and in situations in which exposure information is costly or difficult to obtain.