Descriptive Epidemiology

Introduction

EssentialFunctions.jpg

The image to the right illustrates the ten essential functions of public health. Epidemiology plays a particularly important role for three of the functions: monitoring, investigating, and evaluating. Disease surveillance systems and health data sources provide the raw information necessary to monitor trends in health and disease. Descriptive epidemiology provides a way of organizing and analyzing these data in order to understand variations in disease frequency geographically and over time, and how disease (or health) varies among people based on a host of personal characteristics (person, place, and time). This makes it possible to identify trends in health and disease and also provides a means of planning resources for populations. In addition, descriptive epidemiology is important for generating hypotheses (possible explanations) about the determinants of health and disease. By generating hypotheses, descriptive epidemiology also provides the starting point for analytic epidemiology, which formally tests associations between potential determinants and health or disease outcomes. Specific tasks of descriptive epidemiology are the following:

Learning Objectives

After successfully completing this unit, the student will be able to:

 


Hypothesis Formulation – Characteristics of Person, Place, and Time

Descriptive epidemiology searches for patterns by examining characteristics of person, place, & time. These characteristics are carefully considered when a disease outbreak occurs, because they provide important clues regarding the source of the outbreak.

Hypotheses about the determinants of disease arise from considering the characteristics of person, place, and time and looking for differences, similarities, and correlations. Consider the following examples:

Descriptive epidemiology provides a way of organizing and analyzing data on health and disease in order to understand variations in disease frequency geographically and over time and how disease varies among people based on a host of personal characteristics (person, place, and time). Epidemiology had its origins in the desire to understand the determinants of acute infectious diseases, but its methods and applicability have expanded to include chronic diseases as well.

Descriptive Epidemiology for Infectious Disease Outbreaks

Outbreaks generally come to the attention of state or local health departments in one of two ways:

  1. Astute individuals (citizens, physicians, nurses, laboratory workers) will sometimes notice cases of disease occurring close together with respect to time and/or location or they will notice several individuals with unusual features of disease and report them to health authorities.
  2. Public health surveillance systems collect data on 'reportable diseases'. Requirements for reporting infectious diseases in Massachusetts are described in 105 CMR 300.000 (Reportable Diseases, Surveillance, and Isolation and Quarantine Requirements).

Clues About the Source of an Outbreak of Infectious Disease

When an outbreak occurs, one of the first things to consider is what is known about that particular disease. How can the disease be transmitted? In what settings is it commonly found? What is the incubation period? There are many good summaries available online. For example, the Massachusetts DPH provides a fact sheet for hepatitis A, which gives a very succinct summary. With this background information in mind, the initial task is to begin to characterize the cases in terms of personal characteristics, location, and time (when did they become ill, and where might they have been exposed given the incubation period for that disease?). In a sense, we are looking for the common element that explains why all of these people became ill. What do they have in common?

"Person"

Information about the cases is typically recorded in a "line listing," a grid on which information for each case is summarized with a separate column for each variable. Demographic information, e.g., age, sex, and address, is always relevant, because these are often the characteristics most strongly related to exposure and to the risk of disease. In the beginning of an investigation a small number of cases will be interviewed to look for some common link. These are referred to as "hypothesis-generating interviews." Depending on the means by which the disease is generally transmitted, the investigator might also want to know about other personal characteristics, such as travel, occupation, leisure activities, and use of medications, tobacco, or drugs. What did these victims have in common? Where did they do their grocery shopping? What restaurants had they gone to in the past month or so? Had they traveled? Had they been exposed to other people who had been ill? Other characteristics will be more specific to the disease under investigation and the setting of the outbreak. For example, if you were investigating an outbreak of hepatitis B, you should consider the usual high-risk exposures for that infection, such as intravenous drug use, sexual contacts, and health care employment. Of course, with an outbreak of foodborne illness (such as hepatitis A), it would be important to ask many questions about possible food exposures. Where do you generally eat your meals? Do you ever eat at restaurants or obtain foods from sources outside the home? Hypothesis-generating interviews may quickly reveal some commonalities that provide clues about the possible sources.
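As a rough illustration of how a line listing can be searched for a common link, the sketch below stores each case as one record and tallies how many cases report each exposure. The cases, field names, and exposure labels are all hypothetical and are not taken from any real investigation.

```python
from collections import Counter

# Hypothetical line listing: one record per case, one field per variable.
line_listing = [
    {"case_id": 1, "age": 34, "sex": "F", "onset": "2024-03-02",
     "exposures": {"restaurant_a", "grocery_x"}},
    {"case_id": 2, "age": 51, "sex": "M", "onset": "2024-03-03",
     "exposures": {"restaurant_a", "travel"}},
    {"case_id": 3, "age": 27, "sex": "F", "onset": "2024-03-05",
     "exposures": {"restaurant_a", "grocery_y"}},
    {"case_id": 4, "age": 45, "sex": "M", "onset": "2024-03-05",
     "exposures": {"grocery_x"}},
]


def exposure_frequencies(cases):
    """Return (exposure, share of cases reporting it), most common first."""
    counts = Counter(e for case in cases for e in case["exposures"])
    n = len(cases)
    return [(name, k / n) for name, k in counts.most_common()]


frequencies = exposure_frequencies(line_listing)
for name, share in frequencies:
    print(f"{name:14s} {share:.0%}")
```

An exposure shared by most cases is only a clue. Without a comparison group there is no way to know how common that exposure is among people who did not become ill.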

 

"Place"

Assessment of an outbreak by place provides information on the geographic extent of a problem and may also show clusters or patterns that provide clues to the identity and origins of the problem. A simple and useful technique for looking at geographic patterns is to plot, on a "spot map" of the area, where the affected people live, work, or may have been exposed. A spot map of cases may show clusters or patterns that reflect water supplies, wind currents, or proximity to a restaurant or grocery store.

In 1854 there was an epidemic of cholera in the Broad Street area of London. John Snow determined the residence or place of business of the victims and plotted them on a street map (the stacked black disks on the map). He noted that the cases were clustered around the Broad Street community pump. It was also noteworthy that there were large numbers of workers in a local workhouse and a brewery, but none of these workers were affected; the workhouse and brewery each had their own well.

spotmap.jpg

 

On a spot map within a hospital, nursing home, or other such facility, clustering usually indicates either a focal source or person-to-person spread, while the scattering of cases throughout a facility is more consistent with a common source such as a dining hall. In studying an outbreak of surgical wound infections in a hospital, we might plot cases by operating room, recovery room, and ward room to look for clustering.

 

"Time"

In the "Hepatitis in Sparta" outbreak, the investigators recorded the date of onset of disease for each of the victims. This enabled them to create an "epidemic curve" which showed how the occurrence of disease varied over time. The epidemic curve for the Sparta outbreak is shown below. Knowing that the incubation period for hepatitis A is around 28-30 days, they were able to ascertain from the shape and width of the curve that this was a point source epidemic (see explanation below). This, in conjunction with other information, provided important clues that helped shape their hypotheses about the source of the outbreak.

 epidemic curve.jpg

Epidemic Curves

An "epidemic curve" shows the frequency of new cases over time based on the date of onset of disease. The shape of the curve in relation to the incubation period for a particular disease can give clues about the source. There are three basic types of epidemic curve.
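A minimal sketch of how an epidemic curve can be tabulated from onset dates follows. The onset dates are hypothetical, and the function name `epidemic_curve` is introduced here only for illustration. Days with no new cases are kept, because gaps are part of the curve's shape.

```python
from collections import Counter
from datetime import date, timedelta


def epidemic_curve(onset_dates):
    """Count new cases per day from first to last onset, keeping zero-case days."""
    counts = Counter(onset_dates)
    day, last = min(onset_dates), max(onset_dates)
    curve = []
    while day <= last:
        curve.append((day, counts.get(day, 0)))
        day += timedelta(days=1)
    return curve


# Hypothetical dates of onset taken from a line listing.
onsets = [
    date(2024, 3, 1),
    date(2024, 3, 2), date(2024, 3, 2),
    date(2024, 3, 3), date(2024, 3, 3), date(2024, 3, 3),
    date(2024, 3, 5),
]

curve = epidemic_curve(onsets)
for day, n in curve:
    print(day.isoformat(), "#" * n)  # crude text histogram
```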

Point source outbreaks (epidemics) involve a common source, such as contaminated food or an infected food handler, and all the exposures tend to occur in a relatively brief period. Consequently, point source outbreaks tend to have epidemic curves with a rapid increase in cases followed by a somewhat slower decline, and all of the cases tend to fall within one incubation period. The graph above from a hepatitis A outbreak is an example of a point source epidemic. The incubation period for hepatitis A ranges from 15-50 days, with an average of about 28-30 days. In a point source epidemic you would expect the rise and fall of new cases to occur within about a 30 day span of time, which is what is seen above.

EpidemicCurve_Cholera.gif

Continuous common source epidemics may also rise to a peak and then fall, but the cases do not all occur within the span of a single incubation period. This implies that there is an ongoing source of contamination. The down slope of the curve may be very sharp if the common source is removed or gradual if the outbreak is allowed to exhaust itself. The epidemic curve below is from a cholera outbreak in London in 1854 that was investigated by Dr. John Snow. Cholera has an incubation period of 1-3 days, and even though residents began to flee when the outbreak erupted, you can see that this outbreak lasted for more than a single incubation period. This suggests an ongoing source of infection, in this case the Broad Street pump.

 

EpidemicCurve_Measles.gif

 

Propagated (or progressive source) epidemics involve person-to-person transmission rather than a common source. The epidemic curve shown below is from an outbreak of measles that began with a single index case who infected a number of other individuals. (The incubation period for measles averages 10 days with a range of 7-18 days.) One or more of the people infected in the initial wave infected a group of people who became the second wave of infection. Propagated epidemic curves usually have a series of successively larger peaks, which are one incubation period apart. The successive waves tend to involve more and more people, until the pool of susceptible people is exhausted or control measures are implemented. This is an ideal example, however; in reality, most of these epidemics do not produce the classic pattern.
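One part of this reasoning can be sketched in code: comparing the width of the epidemic curve with the incubation period. The function below is only a crude check, not a classifier. Using the upper end of the incubation range as the cutoff is an assumption of this sketch, and the onset dates are hypothetical.

```python
from datetime import date


def fits_within_one_incubation_period(onset_dates, max_incubation_days):
    """Return True when first and last onsets are at most one incubation period apart.

    The cutoff (upper end of the incubation range) is an assumption of
    this sketch, not a rule stated in the text.
    """
    span_days = (max(onset_dates) - min(onset_dates)).days
    return span_days <= max_incubation_days


# Hypothetical hepatitis A onsets (incubation range 15-50 days); span is 25 days.
hepatitis_a_onsets = [date(2024, 5, 1), date(2024, 5, 9),
                      date(2024, 5, 14), date(2024, 5, 26)]

# Hypothetical cholera onsets (incubation period 1-3 days); span is 14 days.
cholera_onsets = [date(2024, 8, 19), date(2024, 8, 22),
                  date(2024, 8, 27), date(2024, 9, 2)]

point_source_like = fits_within_one_incubation_period(hepatitis_a_onsets, 50)
ongoing_source_like = not fits_within_one_incubation_period(cholera_onsets, 3)
print(point_source_like, ongoing_source_like)
```

A curve wider than one incubation period is consistent with either a continuous common source or person-to-person spread, so the shape of the curve (a single plateau versus successive peaks) still has to be examined.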

For some outbreaks the descriptive information is all that is needed to figure out the source, and control measures can be undertaken rapidly. In other cases, this descriptive information (person, place, and time) helps generate hypotheses about the source, but it isn't obvious what the source is. When this occurs, it is necessary to test the hypotheses by conducting an analytical study, i.e. either a case-control study or a cohort study. This means collecting data and analyzing it in order to identify the source. After the hepatitis outbreak in Marshfield, DPH conducted a case-control study. After an outbreak of Giardia in Milton, MA, a retrospective cohort study was conducted. However, it is important to recognize that you can't test a hypothesis unless you have one to test. So, the descriptive studies that generate hypotheses are essential.


 

Salmonella_epidemic_curve.jpg

 

Steps in the Investigation of a Disease Outbreak

Most outbreak investigations involve the following steps:

  1. Preparation for the investigation
  2. Verifying the diagnosis and establishing the existence of an outbreak
  3. Establishing a case definition and finding cases
  4. Conducting descriptive epidemiology to determine the personal characteristics of the cases, changes in disease frequency over time, and differences in disease frequency based on location
  5. Developing hypotheses about the cause or source
  6. Evaluating and refining the hypotheses, and conducting additional studies if necessary
  7. Implementing control and prevention measures
  8. Communicating the findings

Some of these steps may be conducted simultaneously, and the order may vary depending on the circumstances. For example, if new cases are continuing to occur and there are steps that can be taken to control the outbreak and prevent more cases, then certainly control and prevention measures would take top priority.

General Information on Outbreak Investigations

For an overview of outbreak investigations for foodborne illness see the CDC web page linked here. Other good general sources of information on how to conduct outbreak investigations can be found in the University of North Carolina (UNC) online Focus on Field Epidemiology series. The following online articles may be of interest:

Volume 1

  • Issue #1: Overview of Outbreak Investigations
  • Issue #2: Anatomy and Physiology of an Outbreak Team
  • Issue #3: Embarking on an Outbreak Investigation
  • Issue #4: Case Finding and Line Listing: A Guide for Investigators
  • Issue #5: Epidemic Curves Ahead with a Focus Flash on Creating an Epidemic Curve in Excel
  • Issue #6: Hypothesis Generation During Outbreaks

Volume 2

  • Issue #1: Hypothesis-Generating Interviews
  • Issue #2: Developing a Questionnaire
  • Issue #3: Interviewing Techniques

Another good general resource is "Hepatitis in Sparta." This is an online interactive teaching case that thrusts the student into the role of investigator trying to determine the source for an outbreak of hepatitis cases in the town of Sparta.

 

Descriptive Epidemiology for Chronic Diseases

The same questions about person, time, and place can be applied to chronic diseases.  Who are the people who have the disease? What are their characteristics? What is their occupation? Where do they live and work? How did disease occurrence vary over time?

Personal Characteristics

Personal characteristics also provide clues about the causes of chronic diseases. Many diseases vary in relation to age and gender, but many other characteristics are also important, such as occupation, diet, sexual activity, travel history, and personal behaviors (exercise, smoking, etc.).

Age-specific Rates of Disease

CHD_freq_Age.jpg

Because so many diseases vary in relation to age, one frequently sees disease rates categorized this way, as so-called "age-specific rates of disease." Mortality rates are very low in the youngest age groups and similar in males and females. In adulthood the mortality rates rise sharply and become higher in males. Although the mortality rate continues to rise into old age, the gender difference begins to narrow. One might describe this as a chronic, progressive disease in which the gender differences raise the question of whether sex hormones play a role, particularly since females begin to catch up after menopause occurs.
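The arithmetic behind an age-specific rate is simply the number of events in a stratum divided by the population of that stratum, scaled to a convenient base. The sketch below uses hypothetical deaths and populations; it does not reproduce the figure's data.

```python
def rate_per_100k(events, population):
    """Events per 100,000 population in one stratum."""
    return events * 100_000 / population


# Hypothetical (deaths, population) keyed by (age group, sex).
strata = {
    ("35-44", "M"): (120, 400_000),
    ("35-44", "F"): (30, 400_000),
    ("65-74", "M"): (2_400, 200_000),
    ("65-74", "F"): (1_500, 250_000),
}

age_specific = {key: rate_per_100k(d, p) for key, (d, p) in strata.items()}

for (age, sex), rate in age_specific.items():
    print(f"{age} {sex}: {rate:.1f} per 100,000")
```

Stratifying this way keeps the comparison between males and females within the same age group, so a difference in age structure between the two populations cannot explain the gap.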

 

 

 

 

Differences by Race and Ethnicity

Age and gender are not the only categories we might want to stratify by. We might be interested in stratifying by any factor that might influence disease occurrence. For example, if disease frequency might differ across racial groups, we might want to look at race-specific disease rates, as shown in the table below. Ethnic and racial differences in disease rates sometimes have a genetic basis, e.g., sickle cell anemia in people of African descent or beta thalassemia in people of Mediterranean descent, but in other cases racial differences are due to environmental or socioeconomic factors.

Annual Mortality Rates per 100,000 population in the US, 1967

 

Cause | White | Non-White
Homicide | 3.5 | 32.3
Tuberculosis | 2.5 | 9.6
Hypertensive heart disease | 21.1 | 68.6
Diabetes mellitus | 16.6 | 28.9
Pneumonia | 26.0 | 42.4
Non-MVA accidents | 28.6 | 43.9
MVA (motor vehicle) | 26.5 | 29.8
Cirrhosis of liver | 13.2 | 19.9
Respiratory cancer | 28.9 | 29.8
Leukemia | 7.4 | 5.5
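One way to read the table above is to divide each non-white rate by the corresponding white rate. The sketch below computes these rate ratios from the tabulated 1967 figures; the ratio is a descriptive summary added here for illustration and says nothing about why the rates differ.

```python
# Annual mortality per 100,000 (US, 1967), copied from the table above:
# cause -> (White, Non-White)
rates = {
    "Homicide": (3.5, 32.3),
    "Tuberculosis": (2.5, 9.6),
    "Hypertensive heart disease": (21.1, 68.6),
    "Diabetes mellitus": (16.6, 28.9),
    "Pneumonia": (26.0, 42.4),
    "Non-MVA accidents": (28.6, 43.9),
    "MVA (motor vehicle)": (26.5, 29.8),
    "Cirrhosis of liver": (13.2, 19.9),
    "Respiratory cancer": (28.9, 29.8),
    "Leukemia": (7.4, 5.5),
}

# Rate ratio: non-white rate divided by white rate for each cause.
ratios = {cause: nonwhite / white for cause, (white, nonwhite) in rates.items()}

# Print causes from the largest to the smallest ratio.
for cause, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{cause:28s} {ratio:4.1f}")
```

A ratio far from 1 (homicide, tuberculosis) flags a large disparity worth explaining, while a ratio below 1 (leukemia) shows that the difference does not always run in the same direction.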

 

Other Personal Characteristics

Besides age, gender and race/ethnicity, other personal characteristics that might be important to consider are:

Place: Variation by Location

Differences in disease frequency by location provide important clues about the determinants of chronic diseases. Where does the disease tend to occur?

Map_Gastric_Cancer.jpg

Example 1: Stomach Cancer by Location in the US

These maps show death rates from stomach cancer in females (top) and males (below) in different US counties, with the darkest areas indicating the highest rates, and white showing rates that are below the national average. It is perhaps noteworthy that the "hot spots" in the north–central part of the country coincide with areas having people of Scandinavian descent who have a tradition of eating smoked fish. Could the high rates of stomach cancer be the result of their consumption of smoked fish or other traditional methods of food preservation?

       

Example 2: Differences in Rates of Stomach Cancer in Japan and US

Rates of stomach cancer also vary among countries. Japanese people have a higher rate of stomach cancer than Caucasians in California. Is this due to a genetic difference? A dietary difference? The rate among Japanese people diminishes after they move to the US, and diminishes even more in their offspring. One possibility is that once Japanese people move to the US, they begin to shift to an American diet, and this trend is even stronger in their children. Are there important dietary differences? Could consumption of large amounts of smoked fish be a cause of stomach cancer?

 

 

Population | Mortality Rate (per 100,000 population)
Japanese in Japan | 58.4
Japanese immigrants to California | 29.9
Sons of Japanese immigrants | 11.7
Native Californians (Caucasians) | 8.0

 

Variation in Disease Over Time

TB_mortality_over_time_UK.jpg

Changes in disease rate over time can also provide clues for chronic diseases.

 

Example 1: Annual Mortality from Pulmonary Tuberculosis in England and Wales

TB is one of the great killers of all time. The graph on the right shows the mortality rate from TB from 1855-1955 in England and Wales. The remarkable downward trend began well before the development of antibiotics. The steady improvement was probably a direct result of "the sanitary idea," which resulted in concerted efforts to improve working and living conditions, nutrition, ventilation, and waste management. Also, note the increases in TB mortality that occurred during World War I and World War II. This suggests that nutritional deficiencies, translocation, crowding, and other adverse circumstances associated with war are contributing factors to the causation of TB.

  

Example 2: Toxic Shock and Rely Tampons

RelyTampons.jpg

In January 1980 there were several reports of toxic shock syndrome due to infection with Staphylococcus aureus bacteria, and the descriptive epidemiology indicated that the problem was occurring primarily in menstruating women. A CDC task force investigated and eventually traced the outbreak to the introduction of Rely tampons, a superabsorbent product marketed by Procter & Gamble. The monthly cases of toxic shock syndrome in 1980-1981 are shown in the graph on the left [from A. Reingold et al., Toxic shock syndrome surveillance in the United States, 1980-1981. Ann. Intern. Med 96:875, 1982].

There were actually two pieces of evidence related to time variations that supported Rely tampons as the cause. First, descriptive epidemiology suggested a link to menstruation, leading doctors to take bacterial cultures from the vagina. This provided a key clue suggesting a link to certain brands of tampons. In addition, the frequency of toxic shock syndrome clearly correlated with the introduction and subsequent removal of Rely tampons from the market.

 

Other Factors That Can Produce Changes in Disease Frequency Over Years or Decades

If the frequency of a disease or mortality from a disease changes over time, there are several factors which could be responsible:

Categories of Descriptive Epidemiology

Case Reports

A case report is a detailed description of disease occurrence in a single person. Unusual features of the case may suggest a new hypothesis about the causes or mechanisms of disease.

Example 1: Acquired Immunodeficiency in an Infant; Possible Transmission by Means of Blood Products

Ammann AJ et al: Acquired immunodeficiency in an infant: possible transmission by means of blood products. The Lancet 1:956-958, 1983.

In April 1983 it had not yet been shown that AIDS could be transmitted by blood or blood products. An infant born with Rh incompatibility required blood products from 18 donors over 8 weeks and subsequently developed unusual recurrent infections with opportunistic agents such as Candida. The infant's T cell count was low, suggesting AIDS. There was no family history of immunodeficiency, but one of the blood donors was found to have died of AIDS. This led the investigators to hypothesize that AIDS could be transmitted by blood transfusion.

Example 2: Survival after Treatment of Rabies with Induction of Coma. Willoughby R, Jr., et al: N Engl J Med 2005;352:2508-14.

Rabies is almost uniformly fatal once it develops. As of 2005 there had been only four survivors, each of whom received rabies prophylaxis after the bite, but before symptoms developed. Willoughby et al. reported on a 15-year-old girl who rescued and released a bat that had struck an interior window. The bat bit her left index finger. The wound was washed with peroxide, but medical attention was not sought, and no rabies prophylaxis was administered. One month later she began to experience progressive neurological symptoms that were eventually diagnosed as rabies. The mainstay of her treatment was medically induced coma. Eight days later blood tests demonstrated that she had begun to develop an immune response to the rabies virus. Eventually the coma was reversed, and the patient gradually regained consciousness. She had severe neurological deficits, but gradually improved. She was discharged to her home after 76 days. Five months after her initial hospitalization, she was alert and communicative, but had persistent slurred speech and an unsteady gait.

The report by Willoughby et al. is an example of a case report – a detailed description of a single subject. The report is important because it demonstrates that it is possible for victims of rabies to survive, even without post-exposure prophylaxis. However, we have no idea how effective this treatment might be.

Case Series

A case series is a report on the characteristics of a group of subjects who all have a particular disease or condition. Common features among the group may suggest hypotheses about disease causation. Note that the "series" may be small (as in the example below) or it may be large (hundreds or thousands of "cases"). However, the chief limitation is that there is no comparison group. Consequently, common features may suggest hypotheses, but these need to be tested with some sort of analytical study before an association can be accepted as valid.

Example 3: Pneumocystis carinii pneumonia and mucosal candidiasis in previously healthy homosexual men: evidence of a new acquired cellular immunodeficiency. Gottlieb MS, et al: N Engl J Med 1981;305:1425-1431.

Case_Series_5-men.png

In 1980-1981 four previously healthy young men were diagnosed with Pneumocystis carinii pneumonia, an unusual "opportunistic" infection that had only been seen in immunocompromised people with hereditary disorders or in people with immune compromise due to chemotherapy. The medical histories didn't suggest any preexisting immunodeficiency, but all had decreased immune responses and low T cell counts. These unusual infections suggested the possibility of a previously unknown disease. It was noted that all four men were sexually active homosexuals, and in the case series, which was published in the New England Journal of Medicine, the authors speculated that the immune dysfunction was due to a sexually transmitted infectious agent.

 

This was an extraordinarily important case series (a detailed description of characteristics of a series of people who all have the same disease) that suggested that this new syndrome was associated with sexual activity in male homosexuals. Alerting the medical establishment and proposing a hypothesis was an important milestone in the AIDS epidemic; however, the association could not be securely established based on this small case series. It was not known how many other individuals might be suffering from this new syndrome. It was also not known what the prevalence of homosexuality might be in others with this syndrome or how this might compare to the overall prevalence of homosexuality in the population that gave rise to the cases. Nevertheless, it laid the groundwork for subsequent case-control studies and cohort studies (analytic studies) that did establish the risk factors for this disease.

Example 4: Oral Contraceptives and Hepatocellular Carcinoma? There had been a number of case reports of liver cancers in young women taking oral contraceptives. A study was undertaken by contacting all of the cancer registries collaborating with the American College of Surgeons. The investigators wanted to collect information on as many of these rare liver tumors as possible across the US.  

OC_Liver_Cancer.jpg  

What conclusions can you draw from these data regarding a possible increased risk of liver cancer in women taking oral contraceptives? Think about it before you look at the answer.

 Answer

 

Key Concept

The key to identifying a case series is that all of the subjects included in the study have the primary disease or outcome of interest. For example, suppose an article reported on 239 people who got bird flu. The article might present tables and graphs that give information about their age, occupation, where they lived, whether they lived or died, etc., but basically it is a detailed description of the characteristics and outcomes in a group of people who all had the same disease.

Cross-Sectional Surveys

Cross-Sectional_Surveys.png

Cross-sectional surveys assess the prevalence of disease and the prevalence of risk factors at the same point in time and provide a "snapshot" of diseases and risk factors simultaneously in a defined population. For example, US government agencies periodically send out large surveys to random samples of the US population, asking about health status and risk factors and behaviors at that point in time. The Health Interview Survey (HIS) and the National Health and Nutrition Examination Survey (NHANES) are good examples.

The health questionnaires you are asked to fill out when you go to a new physician, are processed for a new job, or enter military service are similar to cross-sectional surveys in that they ask about the health problems that you have (heart disease? diabetes? asthma?) and your current behaviors and risk factors (e.g., How old are you? Do you smoke? What is your occupation?).

Cross-sectional surveys ask people their current status with respect to both exposures and diseases. This results in two main disadvantages.

  1. The temporal relationship between exposure and disease outcomes can be unclear, i.e., which came first.
  2. Cross-sectional studies tend to identify prevalent cases of long duration, since people who die quickly or recover quickly or who are no longer employed in a particular occupation are less likely to be identified.

Consider the following example in which a survey was conducted among white male farm workers. The survey asked many questions, but among them were the questions "Have you been told you have coronary heart disease (CHD)?" and "How would you classify your level of physical activity?" The table below summarizes the findings.

CHD_FarmWorkers.png  

Note that the investigators did not follow these subjects over a period of time, so they did not assess the "incidence" of heart disease. Instead, they asked the subjects questions designed to determine the prevalence of heart disease, i.e., the proportion of the study population that had heart disease at this particular point in time. When they divided the sample into physically active and inactive farmers and computed the prevalence of heart disease in each group, they found that CHD was much more prevalent among the inactive farmers. However, this was a cross-sectional study that related the prevalence of disease to the prevalence of activity at a point in time. Consequently, the temporal relationship between the risk factor of interest (physical inactivity) and the outcome (CHD) is unclear. Had the farmers been physically active prior to developing CHD? Or did they begin to limit their physical activity after they developed CHD? Physical inactivity could have been either a cause or a consequence of CHD.
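The prevalence comparison can be sketched as follows. The counts are hypothetical, since the survey's table is not reproduced in the text, and the prevalence ratio is shown only as one common way to summarize such a comparison.

```python
def prevalence(existing_cases, surveyed):
    """Proportion of surveyed people who have the condition at the time of the survey."""
    return existing_cases / surveyed


# Hypothetical survey counts (not the farm worker study's data).
active_prev = prevalence(existing_cases=20, surveyed=1_000)
inactive_prev = prevalence(existing_cases=60, surveyed=1_000)

# Ratio of prevalence in the inactive group to prevalence in the active group.
prevalence_ratio = inactive_prev / active_prev

print(f"Active:   {active_prev:.1%}")
print(f"Inactive: {inactive_prev:.1%}")
print(f"Prevalence ratio: {prevalence_ratio:.1f}")
```

Nothing in this calculation records when each farmer became inactive or developed CHD, which is why the ratio alone cannot say which came first.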

Large cross-sectional surveys are important for monitoring health status and health care needs of the population over time, and they are sometimes useful for suggesting possible associations between risk factors and diseases. However, the temporal relationship between the risk factor and disease is frequently unclear. Under these circumstances, they can generate hypotheses, but these associations need to be tested by appropriate analytical studies.

However, note that under some circumstances the temporal relationship is clear in a cross-sectional survey. For example, if one conducted a survey of salaries of male and female professors to see if gender was associated with salary inequities, one could regard this as an analytical study, because it is clear that gender was established long before salary level. In this situation the temporal relationship between the "exposure" of interest (gender) and outcome (salary paid) is clear. So, in a sense cross-sectional studies (and ecological studies) can be thought of as an intermediate category between descriptive and analytic studies.

 Summary Video on Cross-sectional Surveys


Ecological Studies (Correlational Studies)

MeatConsumption_ColonCa.jpg

These studies are distinguished by the fact that the unit of observation is not a person; rather it is an entire population or group. In essence, these studies examine the correlation between the average exposure in various populations and the overall frequency of disease within those populations.

In the study to the right investigators used commerce data to compute the overall consumption of meat by various nations. They then calculated the average (per capita) meat consumption by dividing total national meat consumption by the number of people in a given country. Note that in reality, people's meat consumption probably varied widely within nations, and the exposure that was calculated was an average that assumes that everyone ate the average amount of meat. This average exposure was then correlated with the overall disease frequency in each country. The example here suggests that the frequency of colon cancer increases as meat consumption increases. The characteristic of correlational studies that is most striking is that there is no information about individual people. If the data were summarized in a spreadsheet, you would not see individual level data; you would see records with data on average exposure in multiple groups.
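The calculation described above can be sketched with group-level records only. The countries and all figures below are hypothetical. The helper `pearson_r_and_slope` is written out by hand for this illustration and returns both the correlation coefficient and the least-squares slope, since the two answer different questions.

```python
from math import sqrt


def pearson_r_and_slope(x, y):
    """Return (Pearson r, least-squares slope of y on x) for paired values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy), sxy / sxx


# Hypothetical national totals: one record per country, no individual-level data.
countries = {
    "Country A": {"meat_kg": 2_000_000_000, "population": 100_000_000, "cancer_rate": 8.0},
    "Country B": {"meat_kg": 4_000_000_000, "population": 80_000_000, "cancer_rate": 15.0},
    "Country C": {"meat_kg": 9_000_000_000, "population": 100_000_000, "cancer_rate": 31.0},
    "Country D": {"meat_kg": 1_200_000_000, "population": 120_000_000, "cancer_rate": 5.0},
}

# Average exposure per person: total national consumption / number of people.
per_capita = [c["meat_kg"] / c["population"] for c in countries.values()]
cancer_rates = [c["cancer_rate"] for c in countries.values()]

r, slope = pearson_r_and_slope(per_capita, cancer_rates)
print(f"r = {r:.2f}, slope = {slope:.2f} cases per 100,000 per kg")
```

Two data sets can share the same r and still have very different slopes. For instance, y = 2x and y = 0.5x both give r = 1, so r describes how closely the points follow a line, not how steeply disease frequency changes with exposure.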

Morgenstern notes that, "Individual­ level variables are properties of individuals, and ecologic variables are prop­erties of groups. To be more specific, ecologic measures may be classified into three types:

  1. Aggregate measures are summaries (e.g. means or proportions) of observations derived from individuals in each group (e.g. the proportion of smokers or median family income).
  2. Environmental measures are physical characteristics of the place in which members of each group live or work (e.g. air-pollution level or hours of sunlight). Note that each environmental measure has an analogue at the individual level, and these individual exposures, or doses, usually vary among members of each group, though they may remain unmeasured.
  3. Global measures are attributes of groups or places for which there is no distinct analogue at the individual level, unlike aggregate and environmental measures (e.g. population density, level of social disorganization, or the existence of a specific law)."

Morgenstern goes on to note: "Ecologic study designs may be classified on two dimensions: (a) whether the primary group is measured (exploratory vs analytic study); and (b) whether subjects are grouped by place (multiple-group study), by time (time-trend study), or by place and time (mixed study). Despite several practical advantages of ecologic studies, there are many methodologic problems that severely limit causal inference, including ecologic and cross-level bias, problems of confounder control, within-group misclassification, lack of adequate data, temporal ambiguity, collinearity, and migration across groups."

For a detailed review of ecologic studies see Morgenstern H: Ecologic Studies in Epidemiology: Concepts, Principles, and Methods. Annual Review of Public Health 1995;16:61-81.

 

 


To see an extraordinary example of an ecologic study, play the video below created by Hans Rosling. This is a magnificent example that examines the correlation between income and life expectancy in the countries of the world over time. It is also a terrific example of a creative, engaging, and powerful way to display a vast quantity of data.

Advantages of Ecological Studies:

  1. The data required is frequently readily available. Commerce data can be used to estimate a population's total consumption of products (possible risk factors) such as meat, tobacco, fish, etc. So, these studies are quick & inexpensive.
  2. The "correlation coefficient," or "r" value, provides a measure of how closely the observed data points conform to a straight line. Some authors say that the "r" value is a measure of the association between the risk factor and the disease, but this is incorrect. The slope of the line would be a measure of the strength of association. (See the course spreadsheet "EpiTools.XLS" for a worksheet that calculates correlation coefficients.) The value of a correlation coefficient ranges from –1 (a perfect negative correlation) to +1 (a perfect positive correlation). See the tabbed activity below for examples.
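The distinction between "r" and the slope can be checked with a small calculation. The sketch below is a plain-Python illustration, not the EpiTools worksheet; the function names and the data are made up. It computes the Pearson correlation coefficient and the least-squares slope for two invented data sets that both fall exactly on a straight line.

```python
from math import sqrt


def _sums_of_squares(x, y):
    """Return the sums of cross-products and squares about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy, sxx, syy


def pearson_r(x, y):
    """How closely the points conform to a straight line (-1 to +1)."""
    sxy, sxx, syy = _sums_of_squares(x, y)
    return sxy / sqrt(sxx * syy)


def slope(x, y):
    """Least-squares slope: change in y per unit change in x."""
    sxy, sxx, _ = _sums_of_squares(x, y)
    return sxy / sxx


# Invented average exposures and two invented sets of disease rates.
exposure = [1, 2, 3, 4, 5]
steep = [10 * v for v in exposure]      # rate rises 10 units per unit of exposure
shallow = [0.1 * v for v in exposure]   # rate rises 0.1 units per unit of exposure

for label, rates in (("steep", steep), ("shallow", shallow)):
    print(label, "r =", round(pearson_r(exposure, rates), 3),
          "slope =", round(slope(exposure, rates), 3))
```

Both data sets give an "r" of 1 because the points lie exactly on a line, yet the slopes differ a hundredfold. This is why the slope, not "r", indicates how strongly disease frequency changes with exposure.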

  

Limitations of Ecological Studies: It is important to bear in mind that the exposure in correlational studies is the average exposure for an entire population or group. This results in major limitations:

  1. Since you don't have any information about the risk factor status or the outcome status of individual people, you can't directly link the risk factor to the disease, i.e., it is not clear that the people who ate the most meat were the ones who got colon cancer. This is sometimes referred to as "ecological bias" or the "ecological fallacy."
  2. Another limitation is that there is no effective way of taking into account, or adjusting for, other factors that influence the outcome (confounding factors). As a result, an apparent correlation, or the lack of a correlation could be misleading. For example, one might find a strong correlation between the average number of hours of TV viewing & the rate of coronary artery disease among different countries. However, this doesn't necessarily mean that TV per se is a risk factor for CAD. There may be a number of other differences between the populations that are associated with higher rates of TV viewing: e.g., greater industrialization, less exercise, greater availability of processed foods and saturated fat, and so forth. And conversely, the lack of a correlation doesn't necessarily imply that there is no association.
  3. Since the exposure levels represent average exposure in a large number of people, correlational studies can mask more complicated relationships, as illustrated below.

When a correlational study compared per capita alcohol consumption to death rates from coronary heart disease in different countries, it appeared that there was a fairly striking negative correlation.

 Alcohol_CHD_ecologic.jpg

However, a meta-analysis of prospective cohort studies, which determined mortality rates in subjects for whom there were estimates of individual alcohol consumption, showed that there was actually a "J"-shaped relationship. The people who drank the most had the highest mortality rates, and moderate drinkers had the lowest. This relationship was masked in the correlational study because only a small percentage of people have more than three drinks per day.

  

Adapted from: Di Castelnuovo A, Costanzo S, et al.: Alcohol Dosing and Total Mortality in Men and Women:  An Updated Meta-analysis of 34 Prospective Studies. Arch Intern Med. 2006;166(22):2437-2445.
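A small calculation shows how averaging can hide a "J"-shaped relationship. The sketch below is purely hypothetical: the three drinking categories, their mortality rates, and the population mixes are invented for illustration and are not taken from the meta-analysis. Individual risk is lowest in moderate drinkers and highest in heavy drinkers, but heavy drinkers make up only a few percent of each population.

```python
# Hypothetical individual-level picture: (drinks per day, deaths per 1,000 per year).
# Moderate drinkers have the lowest risk and heavy drinkers the highest.
RISK_BY_GROUP = {
    "abstainer": (0, 10.0),
    "moderate": (1, 8.0),
    "heavy": (5, 15.0),
}

# Hypothetical populations: share of people in each drinking category.
POPULATIONS = {
    "P1": {"abstainer": 0.80, "moderate": 0.19, "heavy": 0.01},
    "P2": {"abstainer": 0.60, "moderate": 0.38, "heavy": 0.02},
    "P3": {"abstainer": 0.40, "moderate": 0.57, "heavy": 0.03},
    "P4": {"abstainer": 0.20, "moderate": 0.76, "heavy": 0.04},
}


def population_averages(mix):
    """Return (average drinks per day, overall deaths per 1,000) for one population."""
    avg_drinks = sum(share * RISK_BY_GROUP[group][0] for group, share in mix.items())
    avg_mortality = sum(share * RISK_BY_GROUP[group][1] for group, share in mix.items())
    return avg_drinks, avg_mortality


# This is all an ecologic study would see: one pair of averages per population.
summary = {name: population_averages(mix) for name, mix in POPULATIONS.items()}

for name, (drinks, deaths) in summary.items():
    print(f"{name}: {drinks:.2f} drinks/day on average, {deaths:.2f} deaths per 1,000")
```

Across these invented populations, overall mortality falls steadily as average consumption rises, which is the negative correlation an ecologic analysis would report. The excess risk among heavy drinkers is invisible in the averages because so few people fall into that category.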

 

Summary & Self-Check

Descriptive studies are useful for:

Thinking.gif


Data Presentation

In order to be useful, the data must be organized and analyzed in a thoughtful, structured way, and the results must be communicated clearly and effectively to both the public health workforce and the community at large. Some simple standards are useful to promote clear presentation. Compiled data are commonly summarized in tables, graphs, or some combination.

Simple guidelines for tables:

  1. Provide a concise descriptive title.
  2. Label the rows and columns.
  3. Provide the units in the column headers.
  4. Provide the column total, if appropriate.
  5. If necessary, additional explanatory information may be provided in a footnoted legend immediately beneath the title.

Table 3 - Treatment with Antihypertensive Medication in Men and Women

Sex       Number on Treatment / n    Relative Frequency, %
Male      611/1,622                  37.7
Female    608/1,910                  31.8
Total     1,219/3,532                34.5
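The relative frequencies in Table 3 are simply the number on treatment divided by the group size, expressed as a percentage. The short Python sketch below reproduces them from the counts in the table; the variable and function names are illustrative only.

```python
# Counts taken from Table 3: (number on treatment, group size n).
COUNTS = {
    "Male": (611, 1622),
    "Female": (608, 1910),
}


def relative_frequency(on_treatment, n):
    """Percentage of the group on treatment, rounded to one decimal place."""
    return round(100 * on_treatment / n, 1)


percentages = {sex: relative_frequency(k, n) for sex, (k, n) in COUNTS.items()}

# The "Total" row pools the counts; it is not the average of the two percentages.
total_on_treatment = sum(k for k, _ in COUNTS.values())
total_n = sum(n for _, n in COUNTS.values())
percentages["Total"] = relative_frequency(total_on_treatment, total_n)

for row, pct in percentages.items():
    print(f"{row}: {pct}%")
```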

Simple guidelines for figures:

  1. Include a concise descriptive title.
  2. Label the axes clearly, showing units where appropriate.
  3. Use appropriate scales for the vertical and horizontal axes, so that the results are displayed without being exaggerated by ranges that are either too expansive or too restrictive.
  4. For line graphs with multiple groups include a simple legend if necessary.

Figure 5 - Relative Frequency of Antihypertensive Medication Use in Men and Women

RelativeFrequencyBarChart-AntiHTN_by_sex.png

Additional resources for summarizing and presenting data:

  1. See the online module for the SPH Biostatistics core course. http://sph.bu.edu/otlt/Sullivan//Web_Pages/BS701_SummarizingData/
  2. Torok M and Anderson M: "Focus on Field Epidemiology: Volume 5; Issue 5: Introduction to Public Health Surveillance."
  3. The CDC also provides another good resource for advice about organizing epidemiologic data.

Other Resources

  1. University of North Carolina (UNC) - Torok M and Anderson M: "Focus on Field Epidemiology: Volume 5; Issue 5: Introduction to Public Health Surveillance."
  2. University of North Carolina (UNC) - Anderson M: "Focus on Field Epidemiology: Volume 5; Issue 6: Public Health Surveillance Systems."
  3. Trifonov V, Khiabanian H, Rabadan R: Geographic Dependence, Surveillance, and Origins of the 2009 Influenza A (H1N1) Virus. Perspective article in: N. Engl. J. Med. 2009;361(2):115-119.
  4. Scallan E, Hoekstra RM, Angulo FJ, et al. Foodborne Illness Acquired in the United States - Major Pathogens. Emerging Infectious Diseases 2011;17(1):7-15.
  5. Marsden-Haug N, Foster VB, Gould PL, Elbert E, Wang H, Pavlin JA. Code-based syndromic surveillance for influenzalike illness by International Classification of Diseases, ninth revision. Emerg Infect Dis, Feb. 2007;13(2):207-216.