Two Fundamental Types of Study Questions

Specifying the research questions is essential to selection of an appropriate study population. There are two fundamental types of research questions that have important implications for selecting an appropriate study design.  

Descriptive Research

Descriptive research aims to accurately estimate and describe the frequency of health outcomes and health-related exposures in the population; this requires a representative sample.

Descriptive questions require samples that are representative of the population being studied, that is, comparable to the population in their characteristics. As with all studies, they also require an adequate sample size in order to minimize sampling error and obtain precise estimates of population parameters.

Analytic (Causal) Research

This second fundamental type of research, analytic research, aims to identify determinants of disease by comparing groups of people to identify valid associations between exposures and health outcomes. This often calls for more restricted samples. For example, the Physicians' Health Study recruited over 22,000 male physicians in the United States in 1981 to test the efficacy of low-dose aspirin (versus placebo) in preventing myocardial infarctions (heart attacks). Instead of enrolling subjects representative of the general population, the investigators wanted to enroll a large sample of subjects who would be easy to follow for a long period of time. Physicians in the United States are registered and easy to track down, even if they move. The investigators also wanted to enroll subjects whose age put them at risk of a heart attack, so that there would be a sufficient number of "events" for an adequate analysis. Therefore, they enrolled subjects who were 40 to 84 years old. They also restricted the study to males, because in 1981 there were relatively few female physicians in this age range. While these restrictions increased the likelihood of a successful study with a valid conclusion, they limited the ability to generalize the findings to the general population, since the sample was not representative.

Questions like these also require an adequate sample size to precisely assess the strength of an association, but they differ from questions aimed at estimating frequencies in the overall population in that they require making comparisons, e.g., comparing risk between exposed and non-exposed persons. When trying to answer questions like these regarding etiology, it is not so important that the samples be representative of the overall population. Instead, the key is to compare groups that are comparable to each other with respect to other factors that affect the outcome (so-called "confounding factors").
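The comparison of risk between exposed and non-exposed persons can be illustrated with a short calculation. The following is only a minimal sketch: the function names and all of the counts are hypothetical, chosen for illustration, and are not taken from any study mentioned here.

```python
def risk(events: int, total: int) -> float:
    """Proportion of a group that develops the outcome (cumulative incidence)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return events / total


def risk_ratio(exposed_events: int, exposed_total: int,
               unexposed_events: int, unexposed_total: int) -> float:
    """Risk in the exposed group divided by risk in the non-exposed group."""
    return risk(exposed_events, exposed_total) / risk(unexposed_events, unexposed_total)


# Hypothetical counts, invented purely for illustration.
exposed_events, exposed_total = 30, 1000
unexposed_events, unexposed_total = 60, 1000

rr = risk_ratio(exposed_events, exposed_total, unexposed_events, unexposed_total)
rd = risk(exposed_events, exposed_total) - risk(unexposed_events, unexposed_total)

print(f"risk ratio = {rr:.2f}, risk difference = {rd:.3f}")
```

A risk ratio below 1 would suggest lower risk among the exposed, but such a comparison is only meaningful if the two groups are comparable with respect to confounding factors.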

In the aspirin study the investigators also allocated subjects to the treatment groups randomly in order to ensure their comparability. Questions arose later regarding the applicability (generalizability) of the results to women and even to males who were not physicians, but at least the investigators could confidently conclude that low-dose aspirin had significantly reduced the incidence of myocardial infarction in the subset of the population they had studied. In fact, the random assignment of over 22,000 subjects achieved remarkable comparability among the comparison groups with respect to many known risk factors for heart disease.
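Why random allocation tends to produce comparable groups can be seen in a small simulation. This is a sketch under stated assumptions, not the trial's actual randomization procedure: the cohort, the ages, and the function names are all made up for illustration.

```python
import random


def randomize_two_arms(subject_ids, seed=None):
    """Shuffle the subjects and split them into two equally sized arms."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def mean(values):
    return sum(values) / len(values)


# Hypothetical cohort: 22,000 subject IDs with made-up ages between 40 and 84.
age_rng = random.Random(0)
ages = {subject_id: age_rng.randint(40, 84) for subject_id in range(22000)}

aspirin, placebo = randomize_two_arms(ages.keys(), seed=1)

# With groups this large, chance alone makes the arms similar on average,
# including on characteristics that were never measured.
mean_age_aspirin = mean([ages[s] for s in aspirin])
mean_age_placebo = mean([ages[s] for s in placebo])
print(round(mean_age_aspirin, 2), round(mean_age_placebo, 2))
```

The same reasoning applies to any other baseline characteristic: the larger the groups, the smaller the chance imbalance between them is expected to be.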

Drawing Samples from a Population

Drawing Representative Samples for Estimating Population Parameters

When the goal is to draw a sample that is representative of the population in order to estimate population parameters, one can simply draw a simple random sample, meaning that selection is done by any method such that each individual in the study population has an equal chance of being selected, and the selection of any member does not influence the chances of any other member being selected.

Ideally, one would identify a sampling frame, i.e., a complete list or enumeration of all of the population elements (e.g., people, houses, phone numbers, etc.). Each of these is assigned a unique identification number, and elements are selected at random to determine the individuals to be included in the sample. As a result, each element has an equal chance of being selected, and the probability of being selected can be easily computed. This sampling strategy is most useful for small populations, because it requires a complete enumeration of the population as a first step.
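The procedure just described can be sketched in a few lines of Python using the standard library. The sampling frame of 500 numbered elements and the sample size of 50 are assumptions made for illustration only.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements from the frame, each with equal probability."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(frame, n)


# Hypothetical sampling frame: unique identification numbers 1 through 500.
frame = list(range(1, 501))
n = 50

sample = simple_random_sample(frame, n, seed=42)

# Each element has the same probability of ending up in the sample.
selection_probability = n / len(frame)
print(sorted(sample)[:5], selection_probability)
```

Sampling is done without replacement, so no element can be selected twice.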

[Figure: map of Weymouth, MA]


Weymouth, MA conducted a town-wide survey in order to assess the health status of the town. The survey was mailed to a random sample of 5,054 households in Weymouth, stratified by zip code to ensure a representative sample from the entire town. Of these, 3,201 surveys were completed and returned, giving a response rate of 63.3%.
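A stratified draw of this kind, and the response-rate arithmetic, might look like the sketch below. Only the counts 5,054 and 3,201 come from the survey described above; the strata labels, household lists, and sampling fraction are hypothetical and do not reflect how the town actually drew its sample.

```python
import random


def stratified_sample(strata, fraction, seed=None):
    """Draw the same fraction of units, at random, from every stratum."""
    rng = random.Random(seed)
    drawn = {}
    for name, units in strata.items():
        k = round(len(units) * fraction)
        drawn[name] = rng.sample(units, k)
    return drawn


# Hypothetical strata standing in for zip codes, with made-up household lists.
strata = {
    "zip_A": [f"A{i}" for i in range(400)],
    "zip_B": [f"B{i}" for i in range(250)],
    "zip_C": [f"C{i}" for i in range(350)],
}
drawn = stratified_sample(strata, fraction=0.2, seed=1)

# Response rate from the figures reported for the survey.
mailed, returned = 5054, 3201
response_rate = returned / mailed
print(f"response rate = {response_rate:.1%}")
```

Sampling the same fraction within each stratum guarantees that every zip code is represented in proportion to its size, which a single simple random sample achieves only on average.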



Random Selection

Many introductory statistics textbooks contain tables of random numbers that can be used to ensure random selection. Random numbers can also be generated by computer: Excel, for example, has a built-in function for this, as do statistical packages such as R.

Drawing Samples to Identify the Determinants of Health and Disease

Ultimately, we would like to identify the causes of health and disease, but establishing causal relationships requires that a number of conditions be met, and we will explore these in more detail in the next section of this week's materials. For now we can simply ask "Does a certain exposure (E) cause a particular health outcome (O)?"

Exposure (E) → Outcome (O)

The primary goal of analytic research is to identify determinants of health and disease. The putative causes are generally referred to as exposures, and the potential results are referred to as health outcomes.

An exposure is any measurable characteristic that differs across individuals and might affect or be associated with health or disease. Potentially relevant exposures may include any of the following:

A health outcome is any measurable disease, disability, injury, infection, syndrome, symptom, biological or subclinical marker, or health state (positive or negative). Examples might include: