Two Fundamental Types of Study Questions

Specifying the research question is essential to selecting an appropriate study population, and the number of possible questions is unlimited. Nevertheless, Keyes and Galea stress two fundamental types of research questions that have important implications for selecting an appropriate study design.

These are:

1. Questions whose goal is accurate estimation of population parameters


Questions like these require samples that are representative of the population being studied, that is, comparable to the population in their characteristics. They also require an adequate sample size in order to minimize sampling error.

2. Questions whose goal is to identify and quantify exposures that have causal effects on health outcomes.


Questions like these also require an adequate sample size to precisely assess the magnitude of an effect, but they differ from questions aimed at parameter estimation in that they require making comparisons, e.g., comparing risk between exposed and non-exposed persons. When trying to answer questions like these regarding etiology, it is not so important that the samples be representative of the overall population. However, for accurate assessment of the effect, the groups being compared must be comparable to each other with respect to other factors that affect the outcome.
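The two types of questions can be illustrated with a minimal sketch. It assumes simple random sampling and uses hypothetical numbers (a true prevalence of 10%, and made-up case counts for the exposed and non-exposed groups); the function names are illustrative only. The first function shows how sampling error in an estimated proportion shrinks as sample size grows, and the second shows the kind of comparison that an etiologic question requires.

```python
import math

def proportion_standard_error(p, n):
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)

def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Risk in the exposed group divided by risk in the non-exposed group."""
    risk_exposed = cases_exposed / n_exposed
    risk_unexposed = cases_unexposed / n_unexposed
    return risk_exposed / risk_unexposed

# Type 1 (parameter estimation): a larger sample gives a smaller sampling error.
# The 10% prevalence is a hypothetical value chosen for illustration.
for n in (100, 1_000, 10_000):
    se = proportion_standard_error(0.10, n)
    print(f"n = {n:>6}: standard error of the estimated proportion = {se:.4f}")

# Type 2 (causal effect): compare risk between exposed and non-exposed persons.
# The counts below are hypothetical.
rr = risk_ratio(cases_exposed=30, n_exposed=1_000,
                cases_unexposed=10, n_unexposed=1_000)
print(f"risk ratio (exposed vs. non-exposed) = {rr:.1f}")
```

Note that the risk ratio is only a fair estimate of the effect if the two groups are comparable with respect to other factors that affect the outcome, which the arithmetic itself cannot guarantee.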

Fundamental Study Designs for Both Representative and Purposive Studies

Keyes and Galea identify three fundamental approaches to study design that can be applied regardless of whether one's goal is to take representative samples to estimate population parameters or to take purposive samples in order to determine whether a given exposure or factor causes one or more health outcomes.

  1. One can study the sample at a particular point in time.
  2. One can follow the sample forward in time to compare the frequency of health indicators among two or more exposure groups.
  3. One can examine the retrospective exposure history of a sample.

The second option will only be utilized in analytical studies, which will be covered in a separate module, but the other two options will be seen in the next section describing several types of descriptive studies.

Categories of Descriptive Epidemiology

Case Reports

A case report is a detailed description of disease occurrence in a single person. Unusual features of the case may suggest a new hypothesis about the causes or mechanisms of disease.

Example: Acquired Immunodeficiency in an Infant; Possible Transmission by Means of Blood Products

Link to article by Ammann AJ et al: Acquired immunodeficiency in an infant: possible transmission by means of blood products. The Lancet 1:956-958, 1983.

In April 1983 it had not yet been shown that AIDS could be transmitted by blood or blood products. An infant born with Rh incompatibility required blood products from 18 donors over 8 weeks and subsequently developed unusual recurrent infections with opportunistic agents such as Candida. The infant's T cell count was low, suggesting AIDS. There was no family history of immunodeficiency, but one of the blood donors was found to have died of AIDS. This led the investigators to hypothesize that AIDS could be transmitted by blood transfusion.

Example: Survival after Treatment of Rabies with Induction of Coma.

Link to article by Willoughby R, Jr., et al: N Engl J Med 2005;352:2508-14.

Rabies is almost uniformly fatal once it develops. As of 2005 there had been only four survivors, each of whom received rabies prophylaxis after the bite, but before symptoms developed. Willoughby et al. reported on a 15-year-old girl who rescued and released a bat that had struck an interior window. The bat bit her left index finger. The wound was washed with peroxide, but medical attention was not sought, and no rabies prophylaxis was administered. One month later she began to experience progressive neurological symptoms that were eventually diagnosed as rabies. The mainstay of her treatment was medically induced coma. Eight days later blood tests demonstrated that she had begun to develop an immune response to the rabies virus. Eventually the coma was reversed, and the patient gradually regained consciousness. She had severe neurological deficits, but gradually improved. She was discharged to her home after 76 days. Five months after her initial hospitalization, she was alert and communicative, but had persistent slurred speech and an unsteady gait.

The report by Willoughby et al. is an example of a case report – a detailed description of a single subject. The report is important because it demonstrates that it is possible for victims of rabies to survive, even without post-exposure prophylaxis. However, we have no idea how effective this treatment might be.

Case Series

A case series is a report on the characteristics of a group of subjects who all have a particular disease or condition. Common features among the group may suggest hypotheses about disease causation. Note that the "series" may be small (as in the example below) or it may be large (hundreds or thousands of "cases"). However, the chief limitation is that there is no comparison group. Consequently, common features may suggest hypotheses, but these need to be tested with some sort of analytical study before an association can be accepted as valid.

Example: Pneumocystis carinii pneumonia and mucosal candidiasis in previously healthy homosexual men: evidence of a new acquired cellular immunodeficiency.

Link to article by Gottlieb MS, et al: N Engl J Med 1981;305:1425-1431.

In 1980–1981, four previously healthy young men were diagnosed with Pneumocystis carinii pneumonia, an unusual "opportunistic" infection that had only been seen in immunocompromised people with hereditary disorders or in people with immune compromise due to chemotherapy. The medical histories did not suggest any preexisting immunodeficiency, but all four had decreased immune responses and low T cell counts. These unusual infections suggested the possibility of a previously unknown disease. It was noted that all four men were sexually active homosexuals, and in the case series, which was published in the New England Journal of Medicine, the authors speculated that the immune dysfunction was due to a sexually transmitted infectious agent.


This was an extraordinarily important case series (a detailed description of characteristics of a series of people who all have the same disease) that suggested that this new syndrome was associated with sexual activity in male homosexuals. Alerting the medical establishment and proposing a hypothesis was an important milestone in the AIDS epidemic. However, the association could not be securely established based on this small case series. It was not known how many other individuals might be suffering from this new syndrome. It was also not known what the prevalence of homosexuality might be in others with this syndrome or how this might compare to the overall prevalence of homosexuality in the population that gave rise to the cases. Nevertheless, the case series laid the groundwork for subsequent case-control studies and cohort studies (analytic studies) that did establish the risk factors for this disease.

Example: Oral Contraceptives and Hepatocellular Carcinoma?

There had been a number of case reports of liver cancers in young women taking oral contraceptives. A study was undertaken by contacting all of the cancer registries collaborating with the American College of Surgeons. The investigators wanted to collect information on as many of these rare liver tumors as possible across the US.  

Table - Oral Contraceptive Use Among Women Who Developed Liver Cancer

Oral Contraceptive Use | Age 16-25 yrs. | Age 26-35 yrs. | Age 36-45 yrs.

What conclusions can you draw from these data regarding a possible increased risk of liver cancer in women taking oral contraceptives? Think about it before you look at the answer.




  Key Concept: The key to identifying a case series is that all of the subjects included in the study have the primary disease or outcome of interest. For example, suppose an article reported on 239 people who got bird flu. The article might present tables and graphs that give information about their age, occupation, where they lived, whether they lived or died, etc., but basically it is a detailed description of the characteristics and outcomes in a group of people who all had the same disease.


Video Summary: Case Reports and Case Series (6:59)


Cross-Sectional Surveys

Cross-sectional surveys assess the prevalence of disease and the prevalence of risk factors at the same point in time and provide a "snapshot" of diseases and risk factors simultaneously in a defined population. For example, US government agencies periodically send out large surveys to random samples of the US population, asking about health status, risk factors, and behaviors at that point in time. The National Health Interview Survey (NHIS) and the National Health and Nutrition Examination Survey (NHANES) are good examples.

Time line with an arrow focusing on a specific point in time when a survey is sent out asking about current health behaviors and current health status.

The health questionnaires you are asked to fill out when you go to a new physician, when you are being processed for a new job, or prior to entry into military service are similar to cross-sectional surveys in that they ask about the health problems that you have (heart disease? diabetes? asthma?) and your current behaviors and risk factors (e.g., How old are you? Do you smoke? What is your occupation?).

Cross-sectional surveys ask people their current status with respect to both exposures and diseases. This results in two main disadvantages.

  1. The temporal relationship between exposure and disease outcomes can be unclear, i.e., which came first.
  2. Cross-sectional studies tend to identify prevalent cases of long duration, since people who die quickly or recover quickly or who are no longer employed in a particular occupation are less likely to be identified.

Consider the following example in which a survey was conducted among white male farm workers. The survey asked many questions, but among them were the questions: "Have you been told you have coronary heart disease (CHD)?" and "How would you classify your level of physical activity?" The table below summarizes the findings.

Table - Current Coronary Heart Disease Among Male Farm Workers


Physical Activity | # of Respondents | # of Respondents With CHD | Prevalence of CHD per 1,000
Currently Not Active | | |
Currently Physically Active | | |

Note that the investigators did not follow these subjects over a period of time, so they did not assess the "incidence" of heart disease. Instead, they asked the subjects questions designed to determine the prevalence of heart disease, i.e., the proportion of the study population that had heart disease at this particular point in time. When they divided the sample into physically active and inactive farmers and computed the prevalence of heart disease in each group, they found that CHD was much more prevalent among the inactive farmers.

However, because this cross-sectional study related the prevalence of disease to the prevalence of activity at a single point in time, the temporal relationship between the risk factor of interest (physical inactivity) and the outcome (CHD) is unclear. Had the farmers been physically active prior to developing CHD? Or did they begin to limit their physical activity after they developed CHD? Physical inactivity could have been either a cause of heart disease or a consequence of it.
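The prevalence calculation described above can be sketched as follows. The counts are hypothetical and are not the survey's actual results; they are chosen only to show the arithmetic. The function and variable names are also illustrative.

```python
def prevalence_per_1000(cases, respondents):
    """Existing cases at the time of the survey per 1,000 respondents."""
    return 1000 * cases / respondents

# Hypothetical counts, NOT the values from the farm worker survey.
survey = {
    "Currently Not Active": {"respondents": 200, "with_chd": 30},
    "Currently Physically Active": {"respondents": 300, "with_chd": 12},
}

prevalence = {
    group: prevalence_per_1000(counts["with_chd"], counts["respondents"])
    for group, counts in survey.items()
}

for group, value in prevalence.items():
    print(f"{group}: {value:.1f} cases of CHD per 1,000 respondents")

# A prevalence ratio compares the two groups at this single point in time.
# It says nothing about whether inactivity preceded CHD or followed it.
prevalence_ratio = (prevalence["Currently Not Active"]
                    / prevalence["Currently Physically Active"])
print(f"Prevalence ratio (inactive vs. active): {prevalence_ratio:.2f}")
```

With these made-up counts, CHD is several times more prevalent among the inactive group, but the calculation uses only current status, so it cannot distinguish cause from consequence.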

Large cross-sectional surveys are important for monitoring health status and health care needs of the population over time, and they are sometimes useful for suggesting possible associations between risk factors and diseases. However, the temporal relationship between the risk factor and disease is frequently unclear. Under these circumstances, they can generate hypotheses, but these associations need to be tested by appropriate analytical studies.

However, note that under some circumstances the temporal relationship is clear in a cross-sectional survey. For example, if one conducted a survey of salaries of male and female professors to see if gender was associated with salary inequities, one could regard this as an analytical study, because the temporal relationship between the "exposure" of interest (gender) and the outcome (salary paid) is clear: gender was established long before the salary was negotiated. So, in a sense, cross-sectional studies (and ecological studies) can be thought of as an intermediate category between descriptive and analytic studies.

Video Summary on Cross-Sectional Surveys (8:25)



A cross-sectional study was conducted to assess the impact of job strain on blood pressure. This study involved measuring the resting blood pressure at a single time point in men ages 18-30 and administering a validated questionnaire concerning factors related to job stress. The study found a statistically significant association between job strain and an increase in blood pressure.

Which of the following are limitations of this study that may have affected the results? (Select all that apply.)



Ecological Studies (Correlational Studies)

These studies are distinguished by the fact that the unit of observation is not a person; rather, it is an entire population or group. In essence, these studies examine the correlation between the average exposure in various populations and the overall frequency of disease within those populations.

In the study below, investigators used commerce data to compute the overall consumption of meat by various nations. They then calculated the average (per capita) meat consumption by dividing total national meat consumption by the number of people in a given country. There is a clear linear trend; countries with the lowest meat consumption have the lowest rates of colon cancer, and the colon cancer rate among these countries progressively increases as meat consumption increases.

Graph of colon cancer incidence in 25 countries as a function of per capita meat consumption. Countries that eat more meat have greater colon cancer incidence.

Note that in reality, people's meat consumption probably varied widely within nations, and the exposure that was calculated was an average that assumes that everyone ate the average amount of meat. This average exposure was then correlated with the overall disease frequency in each country. The example here suggests that the frequency of colon cancer increases as meat consumption increases. The most striking characteristic of ecological studies is that there is no information about individual people. If the data were summarized in a spreadsheet, you would not see individual-level data; you would see records with data on average exposure in multiple groups.
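As a minimal sketch of what such a data set looks like, the example below builds group-level records from hypothetical national totals. The country names, quantities, and rates are invented for illustration and do not come from the study shown in the graph.

```python
# Hypothetical national records. Every value describes a whole country;
# there is no row for any individual person.
NATIONS = [
    {"country": "Country A", "total_meat_kg": 500_000_000,
     "population": 50_000_000, "colon_cancer_rate": 5.0},
    {"country": "Country B", "total_meat_kg": 1_200_000_000,
     "population": 30_000_000, "colon_cancer_rate": 15.0},
    {"country": "Country C", "total_meat_kg": 800_000_000,
     "population": 10_000_000, "colon_cancer_rate": 30.0},
]

def per_capita(total, population):
    """Average exposure: national total divided by the number of people."""
    return total / population

def ecological_records(nations):
    """One record per country: (name, average exposure, disease rate)."""
    return [
        (n["country"],
         per_capita(n["total_meat_kg"], n["population"]),
         n["colon_cancer_rate"])
        for n in nations
    ]

for country, meat_per_person, rate in ecological_records(NATIONS):
    print(f"{country}: {meat_per_person:.0f} kg of meat per person per year, "
          f"{rate:.1f} colon cancer cases per 100,000")
```

Each record pairs an average exposure with an overall disease rate, so nothing in the data shows how much meat was eaten by the people who actually developed colon cancer.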

Morgenstern notes that, "Individual-level variables are properties of individuals, and ecologic variables are properties of groups. To be more specific, ecologic measures may be classified into three types:

  1. Aggregate measures are summaries (e.g. means or proportions) of observations derived from individuals in each group (e.g. the proportion of smokers or median family income).
  2. Environmental measures are physical characteristics of the place in which members of each group live or work (e.g. air-pollution level or hours of sunlight). Note that each environmental measure has an analogue at the individual level, and these individual exposures, or doses, usually vary among members of each group, though they may remain unmeasured.
  3. Global measures are attributes of groups or places for which there is no distinct analogue at the individual level, unlike aggregate and environmental measures (e.g. population density, level of social disorganization, or the existence of a specific law)."

Morgenstern goes on to note: "Ecologic study designs may be classified on two dimensions: (a) whether the primary group is measured (exploratory vs analytic study); and (b) whether subjects are grouped by place (multiple-group study), by time (time-trend study), or by place and time (mixed study). Despite several practical advantages of ecologic studies, there are many methodologic problems that severely limit causal inference, including ecologic and cross-level bias, problems of confounder control, within-group misclassification, lack of adequate data, temporal ambiguity, collinearity, and migration across groups."

For a detailed review of ecologic studies, see the article by Morgenstern H: Ecologic Studies in Epidemiology: Concepts, Principles, and Methods. Annual Review of Public Health 1995;16:61-81.

True or false: For an exposure to cause an outcome, both the exposure and the outcome should occur in the same person. This is what is observed in ecologic studies.


To see an extraordinary example of an ecologic study, play the video below created by Hans Rosling. This is a magnificent example that examines the correlation between income and life expectancy in the countries of the world over time. It is also a terrific example of a creative, engaging, and powerful way to display a vast quantity of data.


Advantages of Ecological Studies:

  1. The data required are frequently readily available. Commerce data can be used to estimate a population's total consumption of products (possible risk factors) such as meat, tobacco, fish, etc. So, these studies are quick and inexpensive.
  2. The "correlation coefficient" or "r" value provides a measure of how closely the observed data points conform to a straight line. Some authors say that the "r" value is a measure of the strength of association between the risk factor and the disease, but this is incorrect; the slope of the line is the measure of the strength of association. (See the course spreadsheet "Epi_Tools.XLSX" for a worksheet that calculates correlation coefficients.) The value of a correlation coefficient ranges from –1 (a perfect negative correlation) to +1 (a perfect positive correlation). See the tabbed activity below for examples.
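The distinction between the "r" value and the slope can be checked with a small sketch. It computes the Pearson correlation coefficient and the least-squares slope by hand for two hypothetical data sets in which the disease rates differ only by a factor of ten. The data and function names are illustrative assumptions, not values from the course spreadsheet.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient: closeness of the points to a line."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def least_squares_slope(xs, ys):
    """Slope of the fitted line: change in y per unit change in x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical average exposures and disease rates for five populations.
exposure = [1, 2, 3, 4, 5]
rate_weak = [2, 1, 4, 3, 5]                  # small change in rate per unit exposure
rate_strong = [10 * y for y in rate_weak]    # same scatter, ten times steeper

for label, rates in (("weak", rate_weak), ("strong", rate_strong)):
    r = pearson_r(exposure, rates)
    slope = least_squares_slope(exposure, rates)
    print(f"{label}: r = {r:.2f}, slope = {slope:.2f}")
```

Both data sets give the same "r" because the points are scattered around their lines in the same way, yet the slopes differ tenfold, which is why the slope rather than "r" reflects how much the disease rate changes with exposure.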



Limitations of Ecological Studies: It is important to bear in mind that the exposure in correlational studies is the average exposure for an entire population or group. This results in major limitations:

  1. Since you don't have any information about the risk factor status or the outcome status of individual people, you can't directly link the risk factor to the disease, i.e., it is not clear that the people who ate the most meat were the ones who got colon cancer. This is sometimes referred to as "ecological bias" or the "ecological fallacy."
  2. Another limitation is that there is no effective way of taking into account, or adjusting for, other factors that influence the outcome (confounding factors). As a result, an apparent correlation, or the lack of a correlation, could be misleading. For example, one might find a strong correlation between the average number of hours of TV viewing and the rate of coronary artery disease (CAD) among different countries. However, this doesn't necessarily mean that TV per se is a risk factor for CAD. There may be a number of other differences between the populations that are associated with higher rates of TV viewing: e.g., greater industrialization, less exercise, greater availability of processed foods and saturated fat, and so forth. And conversely, the lack of a correlation doesn't necessarily imply that there is no association.
  3. Since the exposure levels represent average exposure in a large number of people, correlational studies can mask more complicated relationships, as illustrated below.

When a correlational study compared per capita alcohol consumption to death rates from coronary heart disease in different countries, it appeared that there was a fairly striking negative correlation.

 Graph of per capita alcohol consumption and death rates from coronary heart disease. There appears to be a modest negative correlation.

However, a meta-analysis of prospective cohort studies, which determined mortality rates in subjects for whom estimates of individual alcohol consumption were available, showed that there was actually a "J"-shaped relationship. The people who drank the most actually had the highest mortality rates; moderate drinkers had the lowest mortality. This relationship was masked in the correlational study because of the small percentage of people who have more than three drinks per day.
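A toy calculation can show how averaging masks this kind of relationship. The sketch below assumes three drinking categories with hypothetical relative mortality values that follow a "J" shape, and three hypothetical countries that differ in the share of people in each category. None of these numbers come from the meta-analysis or the correlational study.

```python
# Hypothetical individual-level values: drinks per day and relative mortality.
# Moderate drinkers have the lowest mortality, heavy drinkers the highest.
DRINKS_PER_DAY = {"none": 0.0, "moderate": 1.0, "heavy": 5.0}
RELATIVE_MORTALITY = {"none": 1.0, "moderate": 0.8, "heavy": 1.5}

# Hypothetical countries: share of the population in each drinking category.
# Heavy drinkers are a small percentage everywhere.
COUNTRIES = {
    "Country A": {"none": 0.80, "moderate": 0.19, "heavy": 0.01},
    "Country B": {"none": 0.50, "moderate": 0.47, "heavy": 0.03},
    "Country C": {"none": 0.20, "moderate": 0.75, "heavy": 0.05},
}

def population_average(shares, values):
    """Average of an individual-level quantity, weighted by category share."""
    return sum(share * values[category] for category, share in shares.items())

def group_level_records(countries):
    """What an ecological study sees: one pair of averages per country."""
    return [
        (name,
         population_average(shares, DRINKS_PER_DAY),
         population_average(shares, RELATIVE_MORTALITY))
        for name, shares in countries.items()
    ]

for name, mean_drinks, mean_mortality in group_level_records(COUNTRIES):
    print(f"{name}: {mean_drinks:.2f} drinks/day per capita, "
          f"average relative mortality {mean_mortality:.3f}")
```

In this sketch, average mortality falls steadily as per capita consumption rises, because the country averages are dominated by the large number of moderate drinkers; the elevated mortality among the few heavy drinkers is invisible at the group level.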

  Results of a cohort study suggesting that risk of death decreases somewhat in subjects with modest alcohol consumption but then rises at higher levels of consumption

Adapted from: Di Castelnuovo A, Costanzo S, et al.: Alcohol Dosing and Total Mortality in Men and Women: An Updated Meta-analysis of 34 Prospective Studies. Arch Intern Med. 2006;166(22):2437-2445.

 Video Summary for Ecological Studies (7:48)
