Causal Inference


Epidemiology is primarily focused on establishing valid associations between 'exposures' and health outcomes. However, establishing an association does not necessarily mean that the exposure is a cause of the outcome. Most definitions of "cause" include the notion that it is something that has an effect or a consequence. Establishing a valid association between exposure and outcome is a necessary first step that must be accomplished before wrestling with the more complicated, and frequently controversial, question of whether the relationship is causal. For most epidemiologists, the second step is to review the entire body of available evidence, from epidemiology and from other sources (e.g., in vitro, animal, and other types of human studies), and try to arrive at a reasonable conclusion about whether the relationship is causal.

Since a determination that a relationship is causal is a judgment, there is often disagreement, particularly since causality often implies some degree of responsibility for the outcome or may create a demand for public health action, and this often has legal and financial consequences. Many would agree that incomplete evidence or a lack of agreement about causality should not always prevent appropriate actions to protect the public's health. Nevertheless, the question of whether a relationship is causal sometimes has important consequences for a vast number of people. 

 "The world is richer in associations than meanings, and it is the part of wisdom to differentiate the two."

— John Barth, novelist.

Learning Objectives

After completing this module, the student will be able to:



Historical Views of Causation

Four Vital Humors

Historically, there have been many efforts to account for the occurrence of disease outcomes. Religions often attributed disease outbreaks or other misfortunes to divine retribution - punishment for mankind's sins. Hippocrates promoted the concept that disease was the result of an imbalance among four vital "humors" within us: blood, phlegm, yellow bile, and black bile.

Hippocrates believed that if one of the humors became excessive or deficient, health would deteriorate and symptoms would develop. Hippocrates was a keen observer and tried to relate an individual's exposures (e.g., diet, exercise, occupation, and other behaviors) to subsequent health outcomes.


Consequently, his recommendations and "prescriptions" were often based on his observations and his perception of cause and effect. His disease model, however crude, also suggested seemingly logical interventions. For example, if he surmised that an individual suffered from too much of the humor "blood", he prescribed blood letting to alleviate the problem. The scene depicted to the right shows a female physician in the process of letting blood from one of her patients.



Another popular theory that persisted until the end of the 19th century was that miasmas were responsible for disease. Bad odors were equated with disease. Miasmas were toxic vapors or gases that emanated from cesspools or swamps or filth, and it was believed that if one inhaled the vapors, disease would result. This theory provided an explanation for outbreaks of infectious disease, including cholera and plague. As a result, many ineffective interventions were pursued. Bonfires and smoking urns were used to prevent both plague and cholera. In the 14th century "plague doctors" wore masks with beak-like projections filled with aromatic herbs in order to counteract the effect of miasmas. Echoes of the miasmatic theory can be found in the name "malaria", derived from the Italian for "bad air" (mala, aria). It reflects the correct observation that the disease was more common in swampy areas, but it misidentified the cause as the foul odors rather than the parasite transmitted by the mosquitoes that bred there.

Germ Theory - Koch's Postulates

Even though there was a "germ" of truth in miasmatic theory, in that it focused attention on environmental causes of disease and partly explained social disparities in health (poor people being more likely to live near foul odors), the theory began to fall into disfavor as the germ theory gained acceptance. Louis Pasteur and others introduced the germ theory in 1878.

Louis Pasteur working in his lab

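In 1890 Robert Koch proposed specific criteria that should be met before concluding that a disease was caused by a particular bacterium. These became known as Koch's Postulates, which are as follows:

  1. The bacterium must be present in every case of the disease.
  2. The bacterium must be isolated from a host with the disease and grown in pure culture.
  3. The disease must be reproduced when a pure culture of the bacterium is inoculated into a healthy, susceptible host.
  4. The bacterium must be recoverable from the experimentally infected host.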

Koch's postulates established standard criteria for drawing conclusions about the cause of infectious disease, but the criteria obviously don't apply to non-infectious diseases. In addition, the criteria also had some limitations even with respect to infectious disease. For example, not all infectious diseases have good animal models. Another problem was that bacteria that we regard as "normal flora", such as the Staphylococcus aureus on our skin, are generally harmless but can cause disease under certain conditions. Moreover, when people are exposed to a bacterium, such as the TB bacillus, they don't necessarily become infected. There appear to be many other factors that play a role in determining whether a given individual becomes infected after they are exposed. Factors such as nutritional status or immune status clearly have an impact in "causing" TB, but all these other determinants aren't accounted for by Koch's postulates.

Webs of Causation

The germ theory obviously didn't provide insights regarding the causes of chronic diseases, and over time it became increasingly apparent that for most diseases there were many contributory factors. Researchers began thinking about complex "webs" of causation. The image below summarizes a web of causation for obesity in the context of a socio-ecologic perspective. Note that some factors are more "proximate" or immediate, such as decreased energy expenditure and increased food intake, while other factors or perhaps root causes are more distal, such as globalization of markets, development, and advertising.

A complicated web of interconnected factors causing weight disorders



What is a Cause?

It is natural for us to ponder the relationship between cause and effect. If one can identify causes, then it is possible to predict future events to some extent. The notion of causation also provides a basis for praise and credit if the effect was desirable or blame if it was not.

We all have an intuitive idea of what we think of as a "cause," but how does one define what a cause is? What are the criteria that need to be met in order to say that a factor (or exposure, determinant) is a cause of a particular outcome? Here are several definitions:


Characteristics of a Cause

To be a cause, the factor:

Risk Factors versus Causes

Epidemiologists often use the term "risk factor" to indicate a factor that is associated with a given outcome. However, a risk factor is not necessarily a cause. The term risk factor includes surrogates for underlying causes. For example, consider the following table which summarizes characteristics associated with a high risk of breast cancer and characteristics associated with a low risk.


Characteristic          High Risk                         Low Risk
Country of birth        North America, Northern Europe    Asia, Africa
Socioeconomic status
Marital status          Never married                     Ever married

Each of these factors (place of birth, socioeconomic status, and marital status) is associated with the risk of breast cancer, but none of them is a cause. [Recall Susser's definition that a cause is something that makes a difference; or recall Rothman's definition, i.e., that a cause is an event, condition, or characteristic without which the disease would not have occurred.] These risk factors are surrogates or markers for underlying causes (e.g., populations with a higher prevalence of genetic risk from BRCA1 and BRCA2 alleles, or lower parity, which in turn is a marker for unopposed estrogen stimulation of breast tissue). Being born in northern Europe per se is not a cause; it is a marker for populations that may have a greater genetic predisposition to breast cancer.

It is therefore important to distinguish between risk factors and causes. Nevertheless, before one can wrestle with the difficult question of causation, it is first necessary to establish that a valid association exists. Consequently, if we accept Susser's assertion that a cause is something that makes a difference, one might then ask how to tell if a factor makes a difference. Most epidemiologists would agree that, in a broad sense, this is a two-step process.

  1. The evidence must be examined to determine that there is a valid association between an exposure and an outcome. This is achieved by conducting epidemiologic studies and critically reviewing the available studies to determine whether random error or bias or confounding might explain the apparent association.
  2. If it is determined that there is a valid association, then one must wrestle with the question of whether the association is causal. Not all associations are causal. There are no standardized rules for determining whether a relationship is causal.

Answering the question of whether a given factor is a cause or not requires making a judgment. There are no rigid criteria for determining whether a causal relationship exists, although there are guidelines that should be considered. The process of determining whether a causal relationship does in fact exist is called "causal inference".

Given the lack of rigid criteria, debate and disagreement over the evidence are inevitable and, in themselves, healthy. However, it also means that the debate can be prolonged for reasons other than scientific dispute, for example, if powerful institutions perceive that substantial financial implications would follow from concluding that there is a causal connection. Global climate change is perhaps the most prominent current example: concerns have been raised that commercial interests have affected the scientific process surrounding the causal association between human activity and atmospheric change. The best-known and best-documented example in recent history was the tobacco industry's effort to deny that the association between cigarette smoking and lung cancer was causal.

Hill's Criteria for Causality

Smoking had long been a contentious issue. Despite its increasing popularity, many had opposed smoking on moral grounds; others claimed that smoking had adverse health effects, but the evidence to support these claims was thin. There had been a remarkable increase in lung cancer in both the US and Britain during the first half of the 20th century, but the cause had not been established. Many attributed the increase to the steady increase in the use of motor vehicles, or the paving of roads, or the steady rise in industry.



In 1939, a German study reported an association between smoking and lung cancer. Then, in the 1940s and 1950s, there was a succession of studies that sought to examine the cause of the epidemic of lung cancer that was claiming more and more lives. Richard Doll and Austin Bradford Hill conducted landmark epidemiologic studies that were important in establishing the strong association between smoking and lung cancer. The first was a case-control study conducted in London-area hospitals. The cases were patients with lung cancer, and the controls were age- and gender-matched patients at the same hospital who had diseases other than cancer.

After their case-control study, Doll and Hill launched a prospective cohort study among male physicians in the UK, looking at cause of death as the primary endpoint. The initial findings were published in 1954, with a follow-up report in 1958. These studies demonstrated an even stronger association between smoking and lung cancer mortality and showed that smoking was also significantly associated with other cancers and with a variety of non-cancerous causes of death, including emphysema, chronic bronchitis, TB, atherosclerotic heart disease, stroke, hypertension, and aneurysms.

Despite the strong associations that they found, there was controversy about whether the association was causal. Out of this debate came the notion that causality could not be proven by formulaic consideration of observations; instead, a conclusion of causality was a judgment based on a body of evidence. In 1965 Hill and others proposed certain aspects of evidence that should be considered when trying to draw conclusions about causality. These were not intended to be rigid criteria.

Hill's Criteria

These reports provided the basis for the 1964 US Surgeon General's Report that concluded, "Cigarette smoking is causally related to lung cancer in men; the magnitude of the effect of cigarette smoking far outweighs all other factors." It estimated that moderate smokers had a 10-fold increase in risk of lung cancer, while heavy smokers had a 20-fold increased risk. The report backpedaled regarding the addictive properties of nicotine, concluding that the "tobacco habit should be characterized as an habituation rather than an addiction,...."

The US Surgeon General's report stirred controversy, and even physicians (many of whom were smokers) refused to accept the conclusion that smoking was a cause of lung cancer. In the video below, medical historian Allan Brandt summarizes the reaction to the report and the scientific studies upon which it was based.

Eventually, the epidemiologic studies and the Surgeon General's report did begin to have an impact on public opinion. A 1958 survey found that only 44 percent of Americans believed smoking caused cancer, but by 1968, 78 percent believed that smoking caused lung cancer.

In addition to the tactics described by Allan Brandt in the video, the tobacco industry also steadfastly claimed that there was no proof that cigarettes caused lung cancer. In 1994 the Occupational Safety and Health Administration (OSHA) began a series of hearings regarding a proposed rule on indoor air and the potential harm of environmental tobacco smoke. Part of the hearing was aimed at the more direct question of whether active smoking causes lung cancer. The link below will bring you to a series of responses supplied by tobacco industry scientists when they were asked whether they believe that active smoking causes lung cancer. As you read their responses, consider the following questions:

  1. What distinction do they make between "risk factors" and "cause"?
  2. One of the tobacco industry witnesses suggests that lung cancer is multifactorial. Is this a reasonable possibility?
  3. And, if lung cancer is multifactorial in etiology, does this mean that tobacco is not a cause?
  4. Can anything be proven to have caused a given case of lung cancer?

Link to the OSHA testimony

The Sufficient-Component Cause Model

In 1976 Ken Rothman, who is a member of the epidemiology faculty at BUSPH, proposed a conceptual model of causation known as the "sufficient-component cause model" in an attempt to provide a practical view of causation which also had a sound theoretical basis. The model has similarities to the "web of causation" theory described above, but is more developed in the sense that it simultaneously provides a general model for the conditions necessary to cause (and prevent) disease in a single individual and for the epidemiological study of the causes of disease among groups of individuals.

A Sufficient Cause

Rothman recognized that disease outcomes have multiple contributing determinants that may act together to produce a given instance of disease. For example, exposure to someone who has TB does not necessarily result in the occurrence of TB. Moreover, the set of determinants that produce TB in one individual may not be the same set of conditions that were responsible for the occurrence of TB in others.

Rothman defined a sufficient cause as "...a complete causal mechanism" that "inevitably produces disease." Consequently, a "sufficient cause" is not a single factor, but a minimum set of factors and circumstances that, if present in a given individual, will produce the disease. Aschengrau and Seage use the example of causation of AIDS. A sufficient cause for AIDS might consist of the following components:

The pie chart below might be used to represent the sufficient cause model for this scenario.

A hypothetical cause of AIDS with 3 components: sexual contact with someone with HIV, engaging in unprotected sex, absence of antiretroviral drugs

The model suggests that the presence of these three component causes is sufficient to produce AIDS in this individual. Note further that if any one of these components were absent, AIDS would not occur. Hence Rothman's assertion that a cause is an event, condition, or characteristic without which the disease would not have occurred. Note that the sufficient cause illustrated here is only one manner in which AIDS could occur. Different individuals will have different sets of individual components that combine to produce a sufficient cause (i.e., a case of AIDS). If one were to apply the sufficient-component cause model to tuberculosis (TB), one possible cause might be represented by the pie chart below.

Possible component causes of a case of TB consisting of exposure to TB, absence of BCG vaccine, poor ventilation, poor nutrition, and crowding

This sufficient cause may have applied to many of the people who developed TB in the United Kingdom during the 19th and 20th centuries. The line graph below shows the annual mortality from TB per 100,000 population from 1850 to 1960.

Graph of tuberculosis mortality in the United Kingdom from 1850 to 1950. Mortality declines in an almost linear fashion from 300 per 100,000 population to less than 10 per 100,000. There are transient increases in mortality during the first and second world wars.

During this time span the introduction of "the hygienic idea" and the subsequent development of public health initiatives led to gradual improvements in living conditions, including less crowding, better ventilation, and better nutrition. The decreased prevalence of these components is likely to have been responsible for the steady decline in TB mortality seen during this period. Note, however, the two points on the line graph that correspond to World War I and World War II, when there are temporary increases in TB mortality. It is well known that the wars had a widespread impact on the population: nutrition suffered, and people sometimes sought refuge in bomb shelters that were poorly ventilated and crowded. The sufficient-component model shown above offers a coherent explanation for the cause of TB mortality in a large proportion of the population during this period, and it also explains the steady decline punctuated by the temporary increases seen during wartime.

There may, however, be many sufficient causes of TB which may differ in their components, although some components might be shared among different sufficient causes. Consider, for example, the two sufficient causes below.

A cause of TB consisting of exposure to TB, crowding, poor ventilation, and absence of BCG vaccine. A component cause model consisting of exposure to TB, presence of AIDS, and poor nutrition.

Among the three sufficient causes of TB illustrated above, there are both similarities and differences in the composition of the components. They also differ in the number of components. For example, an individual with AIDS and poor nutrition would be severely immunocompromised, so the only component needed to complete the causation of TB would be exposure to the TB bacillus.
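The comparison above can be sketched in code by treating each sufficient cause as a set of components. This is only an illustrative sketch: the names `SUFFICIENT_CAUSES`, `cause_1` through `cause_3`, and `completed_causes` are hypothetical labels introduced here, and the component strings are simply taken from the three pie charts described above.

```python
# Hypothetical encoding of the three sufficient causes of TB illustrated above.
# Each sufficient cause is a set of component causes.
SUFFICIENT_CAUSES = {
    "cause_1": frozenset({"exposure to TB", "absence of BCG vaccine",
                          "poor ventilation", "poor nutrition", "crowding"}),
    "cause_2": frozenset({"exposure to TB", "absence of BCG vaccine",
                          "poor ventilation", "crowding"}),
    "cause_3": frozenset({"exposure to TB", "AIDS", "poor nutrition"}),
}


def completed_causes(components_present):
    """Return the names of the sufficient causes whose components are all present."""
    present = set(components_present)
    return [name for name, parts in SUFFICIENT_CAUSES.items() if parts <= present]


# An individual with AIDS and poor nutrition, before and after exposure to TB.
person = {"AIDS", "poor nutrition"}
print(completed_causes(person))                       # no pathway is complete
print(completed_causes(person | {"exposure to TB"}))  # exposure completes cause_3
```

In this sketch, adding the single missing component (exposure to TB) completes the third sufficient cause, which mirrors the example of the immunocompromised individual in the text.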

The figure below outlines many of the key factors and events in the transition of a normal cell to a cancerous cell. A variety of environmental factors (chemical carcinogens, radiation, viruses, etc.) might cause damage to the cell's DNA. There are DNA repair mechanisms, but these are not always successful in repairing damage; in addition, some people have inherited defects in DNA repair mechanisms. If repair is unsuccessful, a mutation may result, and if the mutation occurs in a proto-oncogene or an anti-oncogene, regulation of cell replication may be lost. If a third mutation were to occur, damaging apoptosis, then the final component cause is in place, and the cell will be cancerous.

Overview of carcinogenesis as described in the text.

The image below illustrates a sufficient cause that reflects these events. Note also that there may be a period of time before the cancer is detectable or produces symptoms.

A component cause model for cancer: high body mass index, mutated anti-oncogene from radon, mutated proto-oncogene from tobacco carcinogen, inherited defect in DNA repair, defective apoptosis from tobacco carcinogen

Necessary Components

Note that in the three sufficient causes of TB above, exposure to TB is present in all three because it must be present for a TB infection to occur. On the other hand, TB exposure by itself will not result in infection unless other components are also present. In other words, exposure to the TB bacillus is a necessary component, but it is not sufficient by itself. However, many, if not most, diseases have no necessary component, i.e., no single component that appears in every one of their sufficient causes.
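One way to make the idea of a necessary component concrete is to compute the intersection of all the sufficient causes. The sketch below assumes the three TB sufficient causes described earlier, encoded as plain Python sets; the function names `necessary_components` and `is_sufficient` are hypothetical and introduced only for illustration.

```python
# Hypothetical encoding of the three sufficient causes of TB discussed above.
sufficient_causes = [
    {"exposure to TB", "absence of BCG vaccine", "poor ventilation",
     "poor nutrition", "crowding"},
    {"exposure to TB", "absence of BCG vaccine", "poor ventilation", "crowding"},
    {"exposure to TB", "AIDS", "poor nutrition"},
]


def necessary_components(causes):
    """Return the components that appear in every sufficient cause."""
    if not causes:
        return set()
    return set.intersection(*(set(c) for c in causes))


def is_sufficient(components_present, causes):
    """True if the components present complete at least one sufficient cause."""
    present = set(components_present)
    return any(set(c) <= present for c in causes)


necessary = necessary_components(sufficient_causes)
print(necessary)                                    # {'exposure to TB'}
print(is_sufficient(necessary, sufficient_causes))  # False
```

Under these assumptions, exposure to TB is the only component shared by all three sufficient causes, yet it does not complete any of them on its own. If the set of sufficient causes for some other disease had an empty intersection, that disease would have no necessary component.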

Features of the Sufficient-Component Cause Model

Aschengrau and Seage point out some of the key features of the sufficient-component cause model:

  1. A cause is not a single component, but a minimal set of conditions or events that inevitably produces the outcome.
  2. Each component in a sufficient cause is called a component cause, and epidemiologists tend to refer to the components as "causes" because the outcome will not occur by that pathway if any one of the components is missing (or prevented) within a given sufficient cause model. Consequently, it is not necessary to identify all of the component causes in order to prevent the disease outcome.
  3. There may be a number of sufficient causes for a given disease or outcome.
  4. A component cause that must be present in every sufficient cause of a given outcome is referred to as a necessary cause. For example, HIV exposure is necessary for AIDS to occur, and TB exposure is necessary for TB infection to occur.
  5. The completion of a sufficient cause is synonymous with the biologic occurrence of the outcome, e.g., the transition to a malignant cancer within a single cell marks the biologic onset of the cancer.
  6. The components of a sufficient cause do not need to act simultaneously; they can act at different times. For example, a mutation in a proto-oncogene in a prostate cell may promote cell replication at one point in time, and it may be some time later when another mutation diminishes the function of an anti-oncogene in the same cell. Thus, each component cause may have a different induction period (the interval between the exposure's presence and disease onset). In contrast, the latent period is the interval between disease onset and the clinical detection of disease, either by screening or as a result of symptoms and diagnostic work up. In the context of screening tests the latent period is referred to as the "detectable pre-clinical phase." In the context of infectious disease, it is the time between initial infection and the first appearance of symptoms.
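The distinction between induction periods and the latent period in the last point can be illustrated with a small calculation. The sketch below is hypothetical throughout: the class `CaseTimeline`, its field names, and all of the ages are invented for illustration and do not describe any real case. It assumes that disease onset coincides with the moment the last component cause falls into place, as in point 5.

```python
from dataclasses import dataclass


@dataclass
class CaseTimeline:
    """Hypothetical timeline for a single case; all ages are in years."""
    component_ages: dict   # component cause -> age at which it began to act
    detection_age: float   # age at clinical detection (screening or symptoms)

    @property
    def onset_age(self):
        # Disease onset is taken as the moment the last component is in place.
        return max(self.component_ages.values())

    def induction_periods(self):
        # Interval between each component's action and disease onset.
        return {name: self.onset_age - age
                for name, age in self.component_ages.items()}

    def latent_period(self):
        # Interval between disease onset and clinical detection.
        return self.detection_age - self.onset_age


# Invented ages, for illustration only.
case = CaseTimeline(
    component_ages={
        "proto-oncogene mutation": 40.0,
        "anti-oncogene mutation": 52.0,
        "defective apoptosis": 55.0,
    },
    detection_age=61.0,
)
print(case.induction_periods())
print(case.latent_period())
```

With these invented ages, the first mutation has an induction period of 15 years, the second 3 years, and the final component 0 years, while the latent period is 6 years. Each component has its own induction period, but there is a single latent period for the case.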