
# Measurement Error

All exposure assessments require some way of measuring exposure to a given agent, and any measurement can introduce measurement error (i.e. exposure measurements determined from samples are not representative of actual exposure). Measurement error is the difference between the true exposure and the measured or observed exposure. While large inter-individual or inter-group differences in exposure help investigators detect statistically significant differences in outcomes, investigators do not want large differences among individuals with similar true exposures (i.e. between-subject variability).

Common sources of measurement error include the following:

• Faulty equipment or instruments used to estimate exposure
• Deviations from data collection protocols
• Limitations due to study participant characteristics
• Data entry and/or analysis errors

The regression equations below demonstrate the impact measurement error can have on risk estimates in a given study. The first equation shows what investigators want: to determine the health outcome ("y") based on true exposure ("x"). In practice, however, investigators determine the health outcome as a function of a measured/observed exposure ("z"). The key concern is the difference between β_x and β_z.

Impact on Risk Estimates

• What investigators want:
• y = α + β_x·x + ε
• What investigators have:
• y_ij = α + β_z·z_ij + ε_ij
• α and β are coefficients to be estimated and ε is residual error.
• x = true exposure
• y = health outcome
• z = measured/observed exposure
• i = individual 'i'
• j = day 'j'
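The gap between β_x and β_z can be sketched with a short simulation. This is a minimal illustration with hypothetical numbers (true slope 0.5, exposure SD 2, and measurement-error SD 2 are all assumptions), not any particular study's model:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for a reproducible illustration

n = 10_000
alpha, beta_x = 2.0, 0.5                          # hypothetical true intercept and slope
x = rng.normal(10.0, 2.0, n)                      # true exposure
y = alpha + beta_x * x + rng.normal(0.0, 1.0, n)  # health outcome with residual error

# Observed exposure: true exposure plus random measurement error
z = x + rng.normal(0.0, 2.0, n)

# Least-squares slopes from regressing y on x (what investigators want)
# and y on z (what investigators have)
beta_hat_x = np.polyfit(x, y, 1)[0]
beta_hat_z = np.polyfit(z, y, 1)[0]

print(f"slope using true exposure x:     {beta_hat_x:.3f}")
print(f"slope using measured exposure z: {beta_hat_z:.3f}")
```

With the error variance chosen equal to the exposure variance, the expected attenuation factor is var(x) / (var(x) + var(error)) = 4 / (4 + 4) = 0.5, so the slope estimated from z comes out near 0.25 rather than the true 0.5.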

The two types of error that can result from measurements are systematic and random error, which can affect the accuracy and precision of exposure measurements. Suppose investigators wanted to monitor changes in body weight over time among the students at Boston University. In theory, they could weigh all of the students with an accurate scale, and would likely find that body weight measurements were more or less symmetrically distributed along a bell-shaped curve. Knowing all of their weights, investigators could also compute the true mean of this student population. However, it really isn't feasible to collect weight measurements for every student at BU. If investigators just want to follow trends, an alternative is to estimate the mean weight of the population by taking a sample of students each year. In order to collect these measurements, investigators use two bathroom scales. It turns out that one of them has been calibrated and is very accurate, but the other has not been calibrated, and it consistently overestimates body weight by about ten pounds.

Now consider four possible scenarios as investigators try to estimate the mean body weight by taking samples. In each of the four panels (shown below), the distribution of the total population (if investigators measured it) is shown by the black bell-shaped curve, and the vertical red line indicates the true mean of the population. Review the illustration below for more information.
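The four scenarios can also be sketched numerically. This is a minimal simulation under assumed numbers (population mean 150 lb, SD 20 lb; the uncalibrated scale adds 10 lb), with repeated samples standing in for the four panels:

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed for a reproducible illustration

# Hypothetical student population: mean 150 lb, SD 20 lb (assumed numbers)
population = rng.normal(150.0, 20.0, 30_000)
true_mean = population.mean()

def sample_means(n, bias=0.0, repeats=1000):
    """Mean-weight estimates from `repeats` samples of size n,
    with `bias` pounds added by an uncalibrated scale."""
    idx = rng.integers(0, population.size, (repeats, n))
    return population[idx].mean(axis=1) + bias

scenarios = {
    "n=5, accurate scale": sample_means(5),
    "n=5, scale +10 lb": sample_means(5, bias=10.0),
    "n=50, accurate scale": sample_means(50),
    "n=50, scale +10 lb": sample_means(50, bias=10.0),
}

for name, means in scenarios.items():
    spread = means.std()              # random error across repeated samples
    shift = means.mean() - true_mean  # systematic error (bias)
    print(f"{name}: spread = {spread:.2f} lb, shift = {shift:+.2f} lb")
```

The spread of the estimates shrinks as the sample grows (less random error), while the biased scale shifts every estimate by about ten pounds regardless of sample size (systematic error).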

[Illustration: four panels comparing Small Sample Size (n = 5 students) and Larger Sample Size (n = 50 students), with and without the biased scale. The black curve shows the population distribution and the vertical red line the true mean.]

## Misclassification of Exposure

For environmental exposures, data are often collected on an aggregate scale, since individual exposure measurements can be difficult or impossible to collect; this can lead to random misclassification of exposure. Another source of random misclassification is intra-individual (within-subject) variability, particularly when investigating chronic effects. For example, a study assessing dietary pesticide exposure may need to address how an individual's dietary intake changes seasonally, which would result in different exposure levels for the same individual.

Non-random, or differential, misclassification of exposure takes place when errors in exposure are more likely to occur in one of the groups being compared. Random misclassification of exposure occurs when errors in exposure are equally likely in all exposure groups. Random misclassification can be broken down into two categories: the Berkson error model and the classical error model. Berkson error refers to random misclassification that results in little to no bias in the risk estimate, whereas classical error refers to random misclassification that tends to attenuate the risk estimate (i.e. bias towards the null).

An epidemiological study examining the association between environmental tobacco smoke (ETS) and asthma exacerbation in children may define its exposed population as children living with one or more parents who smoke. What are some of the limitations of this classification of the exposed population?
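The contrast between the two error models can be sketched in a short simulation. All numbers here are assumptions (true slope 1.0, unit-variance exposures and errors); the point is only the direction of the bias:

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed for a reproducible illustration
n, beta = 50_000, 1.0           # hypothetical sample size and true slope

# Classical error: the measurement scatters around the true exposure.
x = rng.normal(0.0, 1.0, n)                # true exposure
z_classical = x + rng.normal(0.0, 1.0, n)  # measured exposure
y = beta * x + rng.normal(0.0, 0.5, n)
slope_classical = np.polyfit(z_classical, y, 1)[0]

# Berkson error: the true exposure scatters around the assigned value
# (e.g. everyone in an area is assigned the same ambient-monitor reading).
z_berkson = rng.normal(0.0, 1.0, n)            # assigned/measured exposure
x_true = z_berkson + rng.normal(0.0, 1.0, n)   # true individual exposure
y_b = beta * x_true + rng.normal(0.0, 0.5, n)
slope_berkson = np.polyfit(z_berkson, y_b, 1)[0]

print(f"classical-error slope: {slope_classical:.2f} (attenuated toward 0)")
print(f"Berkson-error slope:   {slope_berkson:.2f} (approximately unbiased)")
```

With equal exposure and error variances, the classical-error slope is attenuated to about half the true value, while the Berkson-error slope stays near the true slope (at the cost of extra residual variance).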

## Minimizing Exposure Error

There are measures that can be taken throughout a study (design, data collection, and analysis phases) to minimize or adjust for measurement error. For example, validation studies attempt to assess a given method of collecting exposure data against a "gold standard." A study might collect a single urine sample from each participant and analyze it for organophosphate metabolites, while the "gold standard" is to collect 24-hour urine samples. The validation study determines the level of agreement between the single urine sample and the 24-hour urine samples.
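One simple way to quantify that level of agreement is a correlation between the two methods. The sketch below uses entirely simulated, hypothetical metabolite levels (log scale, assumed variances), not data from any real validation study:

```python
import numpy as np

rng = np.random.default_rng(3)  # fixed seed for a reproducible illustration
n = 200                         # hypothetical number of validation-study participants

# Hypothetical "gold standard": 24-hour urinary metabolite level (log scale)
gold = rng.normal(2.0, 0.5, n)
# Single spot sample: tracks the 24-hour value plus within-day noise
spot = gold + rng.normal(0.0, 0.5, n)

# Correlation between the two methods as a simple agreement measure
r = np.corrcoef(gold, spot)[0, 1]
print(f"correlation between spot sample and 24-hour sample: r = {r:.2f}")
```

A correlation well below 1 would tell investigators that a single spot sample is a noisy proxy for the 24-hour measurement, and by how much; validation studies in practice may also use regression calibration or Bland-Altman-style comparisons.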

#### Case-Control

A study in which exposure of diseased subjects (cases) is compared to the exposure of (randomly selected) controls from the underlying sampling population. A risk estimate (e.g. odds ratio) is obtained by dividing the odds of exposure (in the past) for the cases by the odds of exposure (in the past) in the controls.

#### Cohort

A group (i.e. cohort) of subjects are followed up over time to assess whether they develop the disease of interest or not. The subjects are classified by level of exposure (e.g. yes/no, low, medium, and high) at entry to the study, but may be re-classified at a later stage. A risk estimate (e.g. relative risk or incidence rate ratio) is obtained by comparing the disease rate in subpopulations with different levels of exposure.

#### Cross-sectional

A type of study where the exposure and outcome of interest in a given population are assessed at one point in time (i.e. a snapshot). Subjects are classified by different levels of exposure (e.g. yes/no, none, low, medium, and high) and the frequency of disease is assessed for each level of exposure. A risk estimate (e.g. prevalence ratio) is obtained by comparing the disease frequencies in subpopulations with different levels of exposure or with external controls.

#### Time-series

A study in which the day-to-day variability in exposure levels is correlated to the day-to-day variability in disease rate. Recently this approach has frequently been used in air pollution research, where measurements from ambient air pollution monitoring stations have been linked to daily morbidity and mortality data.

#### Random Error

An error that can result when the characteristics of those selected as samples (i.e. the study population) do not correctly reflect the true population. It can often be overcome or minimized by taking larger samples.

#### Statistical Power

Statistical power is the probability of correctly rejecting the null hypothesis (i.e. the ability of a test to detect an effect, if the effect actually exists).
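Power can be estimated by simulation: generate many studies under an assumed effect and count how often the test detects it. This is a minimal sketch using a normal approximation and hypothetical numbers (effect size 0.5 SD, two-sided α = 0.05); `power_two_sample` is an illustrative name, not a library function:

```python
import numpy as np

rng = np.random.default_rng(4)  # fixed seed for a reproducible illustration

def power_two_sample(n, delta, sd=1.0, sims=2000):
    """Simulated power of a two-sample comparison of means:
    the fraction of simulated studies whose |z| exceeds 1.96
    (normal approximation, two-sided alpha = 0.05)."""
    a = rng.normal(0.0, sd, (sims, n))          # unexposed group
    b = rng.normal(delta, sd, (sims, n))        # exposed group, true effect = delta
    se = np.sqrt(a.var(axis=1, ddof=1) / n + b.var(axis=1, ddof=1) / n)
    zstat = (b.mean(axis=1) - a.mean(axis=1)) / se
    return (np.abs(zstat) > 1.96).mean()

for n in (10, 50, 200):
    print(f"n per group = {n:3d}: power ~ {power_two_sample(n, delta=0.5):.2f}")
```

Power rises with sample size for a fixed true effect, which is why larger samples help a study detect an effect that actually exists.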

#### Duration

How long does exposure take place?

#### Concentration

How much of the agent was one exposed to?

#### Frequency

How many times is one exposed to the agent?

#### Random Error

This scenario highlights the results investigators may have gotten using the accurate scale, but only taking samples consisting of five students. With only five, investigators may get lucky and get an estimate close to the true mean, but they may not. If investigators were to take multiple samples of five, they would find that their estimates varied greatly, and some of the estimates would be far from the true mean. This method has a lot of random error.

#### Random Error and Biased

In this scenario, investigators take small samples and also use the inaccurate scale, so everyone's weight is recorded as about ten pounds more than the actual weight. This is a systematic error that shifts all of the estimates to the right. So, there are now two sources of error: the small samples give estimates that vary widely (random error), and the inaccurate scale shifts all of the estimates to the right (systematic error, or bias).

#### Random Error, No Bias

This scenario shows the results investigators may get using the accurate scale with multiple samples of 50. Note that there is much less variability; the estimates are more consistent and less likely to give highly inaccurate estimates. Note also that the estimates are symmetrically clustered around the true mean, because there isn't any bias (systematic error). Investigators could have reduced random error even further by collecting measurement data on an even larger sample size.

#### Less Random Error, But Biased

In this scenario, investigators take larger samples, but use the inaccurate scale, so everyone's weight is recorded as being about ten pounds more than their actual weight. Even though there is less random error due to the sample size, the inaccurate scale introduces a systematic error that shifts all of the weight estimates to the right.