Defining Populations

We can begin by defining a population as a collection of individuals who share at least one common or organizing characteristic. While this definition is broad, it retains the flexibility to define populations in several ways depending upon the public health question of interest. How we define populations for study affects the analysis, interpretation, and generalizability of results.

When studying population health, it is useful to define study populations based on eligibility criteria, i.e., the characteristics of individuals that make them appropriate for an epidemiologic study. 

Categories of Eligibility Criteria for Study Populations

There are three main categories that are useful for defining eligibility for a study population:

Geographic space and time, e.g., Weymouth, MA in 2002

A defining characteristic, event, or exposure, e.g., responders to the 9/11 attack on the World Trade Center or workers in a shipyard from 1940 to 1945

Criteria that promote the likelihood of a successful study, e.g., the Physicians' Health Study, which enrolled over 22,000 male physicians in the US to study the efficacy of low-dose aspirin in preventing heart attacks

Public health questions often focus on specific geographic areas of varying size (village, city, county, state, country) over a specific period of time. People living in a specific location may have many common characteristics that might influence health, including climate, environmental exposures, culture, socioeconomic factors, and nutrition. Individuals born during the same period of time (birth cohorts) are often found to have a similar course with respect to health outcomes, and different birth cohorts may have dissimilar health outcomes.

Living within a given geographic area is the primary criterion for membership in a geographically defined population. Since people frequently move from one place to another, such populations can be dynamic, with people moving in or moving out. Given this dynamic nature, it is sometimes useful to think of the population as being composed not of people, but of individual lengths of "person-time" during which each individual met the eligibility criteria. For example, consider a study population focusing on health issues in Woburn, MA from 1970 to 1980. An individual who moved from Los Angeles to Woburn in 1975 and then moved back to LA two years later would have contributed only 2 person-years of information to the overall study.
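The person-time idea can be made concrete with a short calculation. The following Python sketch is only an illustration: the function name person_years, the exact move-in and move-out dates, the two additional residents, and the choice to treat the study window as running from 1 January 1970 up to 1 January 1980 are all assumptions made for the example, not details taken from the Woburn study.

```python
from datetime import date

def person_years(residence_start, residence_end, window_start, window_end):
    """Return the person-years during which residence overlaps the study window."""
    start = max(residence_start, window_start)
    end = min(residence_end, window_end)
    if end <= start:
        return 0.0  # no overlap: never eligible during the study
    return (end - start).days / 365.25  # average year length, allowing for leap years

# Assumed study window: 1 Jan 1970 up to (not including) 1 Jan 1980.
WINDOW_START = date(1970, 1, 1)
WINDOW_END = date(1980, 1, 1)

# Hypothetical residents: (moved in, moved out). The dates are illustrative only.
residents = {
    "mover": (date(1975, 7, 1), date(1977, 7, 1)),        # the LA example in the text
    "long_term": (date(1950, 1, 1), date(2000, 1, 1)),    # present for the whole window
    "late_arrival": (date(1979, 1, 1), date(1990, 1, 1)), # arrives in the final year
}

contributions = {
    name: person_years(moved_in, moved_out, WINDOW_START, WINDOW_END)
    for name, (moved_in, moved_out) in residents.items()
}
total_person_years = sum(contributions.values())

for name, years in contributions.items():
    print(f"{name}: {years:.1f} person-years")
print(f"total: {total_person_years:.1f} person-years")
```

Under these assumptions the mover contributes about 2 person-years, while a resident present for the whole window contributes about 10. Clipping each residence interval to the study window is what lets people who arrive late or leave early be counted only for the time they were actually eligible.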

If one were interested in studying the health outcomes of newborn infants based on their birth weight, the study population would logically consist of neonates and would not necessarily focus narrowly on geography or year of birth. Similarly, the study population might be defined by an event, such as the attacks on the World Trade Center, in order to study the health consequences among responders to that event. These two examples illustrate relatively stationary populations, but populations defined by a characteristic can also be dynamic, as in a study of 70- to 80-year-olds. During a longitudinal study of that age group, new subjects would continually become eligible, while others would become ineligible by exceeding the age limit or by dying.

The study population might also be defined based on the likelihood of achieving a successful study. For example, in 1981 the Physicians' Health Study invited all 261,248 male physicians between 40 and 84 years of age who lived in the United States and who were registered with the American Medical Association to participate in a randomized clinical trial to test the efficacy of low-dose aspirin and beta-carotene in the primary prevention of cardiovascular disease and cancer. Almost half responded to the invitation, but additional eligibility criteria applied, and 26,062 were told they could not participate because of a prior history of myocardial infarction, stroke, cancer, or other exclusion criteria.

The 33,223 who were eligible and willing were enrolled in a "run-in" phase during which all received active aspirin and placebo beta-carotene. After 18 weeks, participants were asked about their health status, side effects, compliance, and willingness to continue in the trial, and over 11,000 decided not to participate.

The remaining 22,071 physicians were then randomized to one of the four treatment arms of the study. Physicians were chosen because they could provide reliable information on questionnaires and would be easier to follow, particularly since they were all registered physicians. Restriction to those between 40 and 84 years old ensured a population at higher risk of having one of the outcomes of interest, and women were excluded because there were so few female physicians in that age group in 1981.

Finally, the run-in phase narrowed the population even further to the subset of physicians who were most likely to be able and willing to comply with the regimen over time. So, there were multiple eligibility criteria that enhanced the likelihood of a study that would successfully answer the questions being addressed.
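The successive narrowing of the Physicians' Health Study population can be checked with simple arithmetic. The sketch below uses only the counts quoted above; the variable names are made up for illustration, and the percentages are derived here rather than taken from the study's own reports.

```python
# Counts quoted in the text above.
invited = 261_248                    # male physicians invited in 1981
excluded_after_responding = 26_062   # told they could not participate
entered_run_in = 33_223              # eligible and willing, enrolled in the run-in
randomized = 22_071                  # randomized to one of the four arms

# Derived quantities (computed here, not quoted from the study).
left_during_run_in = entered_run_in - randomized
share_of_invited_randomized = randomized / invited
share_of_run_in_randomized = randomized / entered_run_in

print(f"Left during run-in: {left_during_run_in:,}")
print(f"Randomized as a share of those invited: {share_of_invited_randomized:.1%}")
print(f"Randomized as a share of run-in entrants: {share_of_run_in_randomized:.1%}")
```

The difference between run-in entrants and those randomized is 11,152, consistent with the "over 11,000" mentioned above, and fewer than one in ten of the physicians originally invited ended up in the randomized population.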

Dynamic and Stationary (Fixed) Populations

An individual may meet the eligibility criteria to be included in a population at one point in time, but not at another. Populations with individuals moving in and out of eligibility are termed dynamic, in contrast to stationary or fixed populations.

A population of homeless people would be considered very dynamic, and it would be difficult to conduct a longitudinal follow-up study of its members. In contrast, workers who dealt with the aftermath of the attacks on the World Trade Center (a population defined by the event) would be considered a stationary or fixed population, because they had experienced the defining event and would be considered members of that cohort until they died, even if they moved elsewhere. The distinction between dynamic and stationary populations is not strict, but it should be considered when designing a study. When studying relatively dynamic populations, one should consider collecting data based on the "person-time" contributed by individuals while they were eligible. This will be discussed in greater detail in the module on measuring the frequency of health events.

Sampling from a Population

When studying a population, it would be ideal to have all of the information we wanted from all members of the population. However, this is rarely possible because of the time and resources that would be required to collect the information needed. Because of this, we commonly take samples that are representative of the population of interest and study them in a way that enables us to make valid inferences about the population from which they were drawn. In order to obtain accurate answers to the questions being addressed and achieve the research goals, it is essential to define the research question clearly and to select an appropriate study population.

These requirements go hand-in-hand, because selection of an appropriate study population is dependent upon the question being addressed. Sometimes the study population seems obvious given the research question, but the study population may be broader than that which at first seems obvious. For example, we saw previously that a study of the causes of hypertension could be conducted among male civil servants in London by comparing the characteristics of people with hypertension to those without it. However, a more complete understanding might be achieved by broadening the study population to include additional populations. When residents of Woburn, MA became alarmed by an unusually high frequency of leukemia and other diseases in the late 1970s, one avenue of study would have been to designate Woburn as the population of interest and to compare the characteristics of diseased residents to those of non-diseased residents. However, this by itself would omit other important comparisons. For example, how did the frequency of leukemia and other diseases in Woburn compare to that observed in Massachusetts in general? Or to the frequency observed across the United States? And how did environmental conditions in Woburn differ from those in other locations?
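To illustrate the idea of studying a sample in order to make inferences about the whole population, here is a minimal Python sketch of a simple random sample. Everything in it is hypothetical: the population of 10,000 people, the 20% prevalence of the condition, the sample size of 500, and the random seed are invented for illustration and do not describe Woburn or any real population.

```python
import random

rng = random.Random(2024)  # fixed seed so the illustration is reproducible

# Hypothetical population: 1 = has the condition, 0 = does not.
POPULATION_SIZE = 10_000
TRUE_CASES = 2_000
population = [1] * TRUE_CASES + [0] * (POPULATION_SIZE - TRUE_CASES)

# Simple random sample without replacement: every member is equally likely
# to be chosen, which is what makes the sample representative on average.
SAMPLE_SIZE = 500
sample = rng.sample(population, SAMPLE_SIZE)

true_prevalence = TRUE_CASES / POPULATION_SIZE
estimated_prevalence = sum(sample) / SAMPLE_SIZE

print(f"True prevalence in the population: {true_prevalence:.1%}")
print(f"Estimate from a sample of {SAMPLE_SIZE}: {estimated_prevalence:.1%}")
```

The estimate will usually be close to, but not exactly equal to, the true prevalence. The same calculation applied to samples from different populations (a town, a state, a country) is what would make comparisons like those raised above possible.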