Information Bias (Observation Bias)


From the previous section it should be clear that, even if the categorization of subjects regarding exposure and outcome is perfectly accurate, bias can be introduced by differential selection or retention in a study. The converse is also true: even if selection into and retention in the study fairly represent the population from which the samples were drawn, the estimate of association can be biased if subjects are incorrectly categorized with respect to their exposure status or outcome. These errors are often referred to as misclassification, and the mechanism that produces them can result in either nondifferential or differential misclassification. Ken Rothman distinguishes these as follows:

"For exposure misclassification, the misclassification is nondifferential if it is unrelated to the occurrence or presence of disease; if the misclassification of exposure is different for those with and without disease, it is differential. Similarly, misclassification of disease [outcome] is nondifferential if it is unrelated to the exposure; otherwise, it is differential."

Nondifferential Misclassification of Exposure

Nondifferential misclassification means that the frequency of errors is approximately the same in the groups being compared. Misclassification of exposure status is more of a problem than misclassification of outcome (as explained on page 6), but a study may be biased by misclassification of either exposure status, or outcome status, or both.
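
One common way to state this condition formally for exposure misclassification is sketched below. The notation is an assumption introduced here, not taken from the text: Se and Sp denote the sensitivity and specificity of exposure classification, D denotes true disease status, E true exposure, and E* the recorded exposure.

```latex
\[
\mathrm{Se}_{d} = P(E^{*}=1 \mid E=1,\, D=d), \qquad
\mathrm{Sp}_{d} = P(E^{*}=0 \mid E=0,\, D=d)
\]
\[
\text{Nondifferential:}\quad
\mathrm{Se}_{1} = \mathrm{Se}_{0}
\quad\text{and}\quad
\mathrm{Sp}_{1} = \mathrm{Sp}_{0}
\]
```

If either equality fails, the exposure misclassification is differential. In practice the equalities only need to hold approximately, in keeping with the "approximately the same" wording above.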

Nondifferential misclassification of a dichotomous exposure occurs when errors in classification occur to the same degree regardless of outcome. Nondifferential misclassification of exposure is a much more pervasive problem than differential misclassification (in which errors occur with greater frequency in one of the study groups). The figure below illustrates a hypothetical study in which all subjects are correctly classified with respect to outcome, but some of the exposed subjects in each outcome group were incorrectly classified as 'non-exposed.'

Disease status is correct, but some exposed subjects have been incorrectly labeled as unexposed.

Suppose a case-control study was conducted to examine the association between a high-fat diet and coronary artery disease. Subjects with heart disease and controls without heart disease might be recruited and asked to complete questionnaires about their dietary habits in order to categorize them as having diets with high fat content or not. It is difficult to assess dietary fat content accurately from questionnaires, so it would not be surprising if there were errors in classification of exposure. However, it is likely that in this scenario the misclassification would occur with more or less equal frequency in cases and controls. Nondifferential misclassification of a dichotomous exposure always biases toward the null. In other words, if there is an association, the misclassification tends to make it appear weaker than it really is, regardless of whether it is a positive or a negative association.

The figure above depicts a scenario in which disease status is correctly classified, but some of the exposed subjects are incorrectly classified as non-exposed. This would result in bias toward the null. Rothman gives a hypothetical example in which the true odds ratio for the association between a high-fat diet and coronary heart disease is 5.0, but if about 20% of the exposed subjects were misclassified as 'not exposed' in both disease groups, the biased estimate might give an odds ratio of, say, 2.4.

However, now consider what would happen in the same example if 20% of the exposed subjects were misclassified as 'not exposed' in both outcome groups, AND 20% of the non-exposed subjects were misclassified as 'exposed' in both groups - in other words a scenario that looked something like this:

Outcome status is correct, but some exposed subjects are in the unexposed category, and some unexposed subjects are in the exposed category.

This additional nondifferential misclassification would result in even more severe bias toward the null, giving an odds ratio of perhaps 2.0.
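
The arithmetic behind this kind of attenuation can be sketched in a few lines of Python. The counts below are hypothetical, chosen only so that the true odds ratio is 5.0. They are not given in the text, and the function names are illustrative. The sketch applies the same error rates to cases and controls, which is what makes the misclassification nondifferential, and works with expected counts rather than random draws.

```python
def odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp):
    """Odds ratio from a 2x2 case-control table."""
    return (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)


def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected recorded counts within one outcome group.

    sensitivity: probability a truly exposed subject is recorded as exposed
    specificity: probability a truly unexposed subject is recorded as unexposed
    """
    seen_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    seen_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return seen_exposed, seen_unexposed


def observed_odds_ratio(cases, controls, sensitivity, specificity):
    """Apply the SAME error rates to cases and controls (nondifferential)."""
    case_exp, case_unexp = misclassify(*cases, sensitivity, specificity)
    ctrl_exp, ctrl_unexp = misclassify(*controls, sensitivity, specificity)
    return odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp)


# Hypothetical true counts as (exposed, unexposed); assumed, not from the text.
CASES = (250, 450)
CONTROLS = (100, 900)

# No misclassification.
true_or = observed_odds_ratio(CASES, CONTROLS, 1.0, 1.0)
# 20% of exposed subjects recorded as unexposed, in both groups.
one_way_or = observed_odds_ratio(CASES, CONTROLS, 0.8, 1.0)
# Additionally, 20% of unexposed subjects recorded as exposed, in both groups.
two_way_or = observed_odds_ratio(CASES, CONTROLS, 0.8, 0.8)

print(f"true: {true_or:.2f}, one-way: {one_way_or:.2f}, two-way: {two_way_or:.2f}")
# prints "true: 5.00, one-way: 4.60, two-way: 2.01"
```

With these assumed counts, each added layer of nondifferential error pulls the estimate closer to 1.0. How large the attenuation is depends on the underlying counts and on which direction the errors run, so the intermediate values need not match the illustrative figures quoted above.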

Note that if there are multiple exposure categories, i.e., if the exposure is not dichotomous, then nondifferential misclassification may bias the estimate either toward the null or away from it, depending on the categories into which subjects are misclassified.
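
A small numerical sketch shows how bias away from the null can arise with three exposure levels. All counts, level names, and the 40% error rate are hypothetical assumptions made for illustration. Here the 'low' level truly has no effect (odds ratio 1.0 against 'none'), and the same fraction of 'high' subjects is recorded as 'low' in both cases and controls.

```python
def odds_ratios_vs_reference(cases, controls, reference="none"):
    """Odds ratio of each exposure level against the reference level."""
    return {
        level: (cases[level] * controls[reference])
        / (cases[reference] * controls[level])
        for level in cases
        if level != reference
    }


def shift_high_to_low(counts, fraction):
    """Record the given fraction of 'high' subjects as 'low' (expected counts)."""
    moved = counts["high"] * fraction
    return {
        "none": counts["none"],
        "low": counts["low"] + moved,
        "high": counts["high"] - moved,
    }


# Hypothetical true counts: 'low' has OR 1.0 and 'high' has OR 4.0 vs 'none'.
TRUE_CASES = {"none": 100, "low": 100, "high": 400}
TRUE_CONTROLS = {"none": 100, "low": 100, "high": 100}

true_ors = odds_ratios_vs_reference(TRUE_CASES, TRUE_CONTROLS)

# Same 40% error rate in cases and controls, so the error is nondifferential.
observed_ors = odds_ratios_vs_reference(
    shift_high_to_low(TRUE_CASES, 0.4),
    shift_high_to_low(TRUE_CONTROLS, 0.4),
)

print({k: round(v, 2) for k, v in true_ors.items()})
print({k: round(v, 2) for k, v in observed_ors.items()})
```

Under these assumptions the recorded 'low' category absorbs truly high-exposure subjects, who are more common among cases, so its odds ratio rises from 1.0 to about 1.86 even though the error rate is identical in both outcome groups.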

Mechanisms for Nondifferential Misclassification

Nondifferential misclassification can occur in a number of ways. Records may be incomplete, e.g., a medical record in which none of the healthcare workers remembered to ask about tobacco use. There may be errors in recording or interpreting information in records, or there may be errors in assigning codes to disease diagnoses by clerical workers who are unfamiliar with a patient's hospital course, diagnosis, and treatment. Subjects completing questionnaires or being interviewed may have difficulty in remembering past exposures. Note that if difficulty in remembering past exposures occurs to the same extent in both groups being compared, then there is nondifferential misclassification, which will bias toward the null. However, if one outcome group in a case-control study remembers better than the other, then there is differential misclassification, which is called "recall bias." Recall bias is described below under differential misclassification of exposure.