Evaluating Screening Tests


Screening tests are often used in clinical practice to assess the likelihood that a person has a particular medical condition. The rationale is that if disease is identified early, before symptoms appear, then earlier treatment may lead to cure or to improved survival or quality of life. This topic is also addressed in the core course in epidemiology, in the learning module on Screening for Disease, which stresses that screening tests do not necessarily extend life or improve outcomes. In fact, many screening tests have potential adverse effects that must be weighed against the potential benefits. Other factors, such as cost, availability, and discomfort, also need to be considered when evaluating screening tests.

Screening tests are often laboratory tests that detect particular markers of a specific disease. One example is the prostate-specific antigen (PSA) test for prostate cancer, which measures blood concentrations of PSA, a protein produced by the prostate gland. Many medical evaluations and tests may be thought of as screening procedures as well. For example, blood pressure tests, routine EKGs, breast exams, digital rectal exams, mammograms, routine blood and urine tests, or even questionnaires about behaviors and risk factors might all be considered screening tests. However, none of these are definitive; they can raise the suspicion of disease, but they are not diagnostic. A definitive diagnosis generally requires more extensive, sometimes invasive, and more reliable evaluations.

Let's return to the PSA test as an example of a screening test. In the absence of disease, levels of PSA are low, but elevated PSA levels can occur in the presence of prostate cancer, benign prostatic enlargement (a common condition in older men), or infection or inflammation of the prostate gland. Thus, elevated levels of PSA may help identify men with prostate cancer, but they do not provide a definitive diagnosis. That requires a biopsy of the prostate gland, in which tissue is sampled by a surgical procedure or by inserting a needle into the gland. A pathologist examines the biopsy under a microscope and, based on the appearance of the cells, judges whether the patient has prostate cancer. For a screening test to be clinically useful, two conditions must be met. First, the test has to provide an advantage in distinguishing between, for example, men with and without prostate cancer. Second, one needs to demonstrate that early identification and treatment of the disease results in some improvement: a decreased probability of dying of the disease, increased survival, or some other measurable improvement in outcome.

One can collect data to examine the ability of a screening procedure to identify individuals with a disease. Suppose that a population of N=120 men over 50 years of age who are considered at high risk for prostate cancer have both the PSA screening test and a biopsy. The PSA results are reported as low, slightly to moderately elevated, or highly elevated based on the following levels of measured protein, respectively: 0-2.5, 2.6-19.9, and 20 or more nanograms per milliliter [9]. The biopsy results of the study are shown below.

| PSA Level (Screening Test) | Prostate Cancer | No Prostate Cancer | Totals |
| --- | --- | --- | --- |
| Low (0-2.5 ng/ml) | 3 | 61 | 64 |
| Slight/Moderate Elevation (2.6-19.9 ng/ml) | 13 | 28 | 41 |
| Highly Elevated (20 ng/ml or more) | 12 | 3 | 15 |
| Totals | 28 | 92 | 120 |

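From the table, the probability of prostate cancer at each PSA level is:

P(Prostate Cancer | Low PSA) = 3/64 = 0.047

P(Prostate Cancer | Slight/Moderate Elevation) = 13/41 = 0.317

P(Prostate Cancer | Highly Elevated PSA) = 12/15 = 0.80

Thus, the probability or likelihood that a man has prostate cancer is related to his PSA level. Based on these data, is the PSA test a clinically important screening test?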
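The link between PSA level and the probability of prostate cancer can be read off the table one row at a time. The following is a minimal Python sketch of that arithmetic; the names `psa_counts` and `cancer_probability_by_level` are illustrative and not part of the original module.

```python
# Counts from the PSA table: (prostate cancer, no prostate cancer) per PSA level.
psa_counts = {
    "low": (3, 61),
    "slight/moderate": (13, 28),
    "highly elevated": (12, 3),
}


def cancer_probability_by_level(counts):
    """Return P(prostate cancer | PSA level) for each row of the table."""
    return {
        level: cancer / (cancer + no_cancer)
        for level, (cancer, no_cancer) in counts.items()
    }


probabilities = cancer_probability_by_level(psa_counts)
for level, p in probabilities.items():
    print(f"P(prostate cancer | {level} PSA) = {p:.3f}")
```

Each probability is conditional on the row: the denominator is the row total, not the 120 men in the study.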

Screening for Down Syndrome

To address this question, let's first consider a screening test for Down Syndrome. In pregnancy, women often undergo screening to assess whether their fetus is likely to have Down Syndrome. The screening test evaluates levels of specific hormones in the blood. Screening test results are reported as positive or negative, indicating that a woman is more or less likely to be carrying an affected fetus. Suppose that a population of N=4,810 pregnant women undergo the screening test and are scored as either positive or negative depending on the levels of hormones in the blood. In addition, suppose that each woman is followed to birth to determine whether the fetus was, in fact, affected with Down Syndrome. The results of the screening tests are summarized below.

| Screening Test | Down Syndrome | No Down Syndrome | Total |
| --- | --- | --- | --- |
| Positive | 9 | 351 | 360 |
| Negative | 1 | 4,449 | 4,450 |
| Total | 10 | 4,800 | 4,810 |

In order to evaluate the screening test, each participant undergoes the screening test and is classified as positive or negative based on criteria that are specific to the test (e.g., high levels of a marker in a serum test or presence of a mass on a mammogram). A definitive diagnosis is also made for each participant based on definitive diagnostic tests or on an actual determination of outcome.

Using the data above, the probability that a woman with a positive screening test has an affected fetus is:

P(Affected Fetus | Screen Positive) = 9/360 = 0.025,

and the probability that a woman with a negative test has an affected fetus is

P(Affected Fetus | Screen Negative) = 1/4,450 = 0.0002.

Is the serum screen a useful test?

Sensitivity and Specificity

As noted above, screening tests are not diagnostic, but instead may identify individuals more likely to have a certain condition. There are two measures that are commonly used to evaluate the performance of screening tests: the sensitivity and specificity of the test. The sensitivity of the test reflects the probability that the screening test will be positive among those who are diseased. In contrast, the specificity of the test reflects the probability that the screening test will be negative among those who, in fact, do not have the disease.

A total of N patients complete both the screening test and the diagnostic test. The data are often organized as follows, with the results of the screening test shown in the rows and the results of the diagnostic test shown in the columns.

 

| | Diseased | Disease Free | Total |
| --- | --- | --- | --- |
| Screen Positive | a | b | a+b |
| Screen Negative | c | d | c+d |
| Total | a+c | b+d | N |

 

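In terms of the cell counts a, b, c, and d in the table above:

Sensitivity = True Positive Fraction = P(Screen Positive | Diseased) = a/(a+c)

Specificity = True Negative Fraction = P(Screen Negative | Disease Free) = d/(b+d)

One might also consider the:

False Positive Fraction = P(Screen Positive | Disease Free) = b/(b+d)

False Negative Fraction = P(Screen Negative | Diseased) = c/(a+c)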

The false positive fraction is 1-specificity and the false negative fraction is 1-sensitivity. Therefore, knowing sensitivity and specificity captures the information in the false positive and false negative fractions; these are simply alternate ways of expressing the same information. Often, sensitivity and the false positive fraction are reported for a test.

For the screening test for Down Syndrome the following results were obtained:

| Screening Test Result | Affected Fetus | Unaffected Fetus | Total |
| --- | --- | --- | --- |
| Positive | 9 | 351 | 360 |
| Negative | 1 | 4,449 | 4,450 |
| Totals | 10 | 4,800 | 4,810 |

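Thus, the performance characteristics of the test are:

Sensitivity = P(Screen Positive | Affected Fetus) = 9/10 = 0.900

Specificity = P(Screen Negative | Unaffected Fetus) = 4,449/4,800 = 0.927

False Positive Fraction = P(Screen Positive | Unaffected Fetus) = 351/4,800 = 0.073

False Negative Fraction = P(Screen Negative | Affected Fetus) = 1/10 = 0.100

Interpretation: If a woman is carrying an affected fetus, there is a 90.0% probability that the screening test will be positive. If she is carrying an unaffected fetus, there is a 92.7% probability that the screening test will be negative.

However, the false positive and false negative fractions quantify errors in the test, and the errors are often of greatest concern. If a woman is carrying an unaffected fetus, there is a 7.3% probability that the screening test will be positive; if she is carrying an affected fetus, there is a 10.0% probability that it will be negative.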

The sensitivity and false positive fractions are often reported for screening tests. However, for some tests, the specificity and false negative fractions might be the most important. The most important characteristics of any screening test depend on the implications of an error. In all cases, it is important to understand the performance characteristics of any screening test to appropriately interpret results and their implications.
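The four performance measures discussed in this section can all be computed from the cell counts a, b, c, and d of the 2x2 table. The sketch below is one way to do this in Python, applied to the Down Syndrome screening counts; the function name `screening_measures` and the dictionary keys are illustrative choices, not part of the original module.

```python
def screening_measures(a, b, c, d):
    """Performance measures from a 2x2 screening table.

    a: screen positive, diseased      b: screen positive, disease free
    c: screen negative, diseased      d: screen negative, disease free
    """
    return {
        "sensitivity": a / (a + c),              # P(screen + | diseased)
        "specificity": d / (b + d),              # P(screen - | disease free)
        "false_positive_fraction": b / (b + d),  # P(screen + | disease free)
        "false_negative_fraction": c / (a + c),  # P(screen - | diseased)
    }


# Down Syndrome screening counts from the table in this section.
measures = screening_measures(a=9, b=351, c=1, d=4449)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Because the false positive fraction shares its denominator with specificity, and the false negative fraction shares its denominator with sensitivity, each pair sums to 1.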

Positive and Negative Predictive Value

Consider the results of a screening test from the patient's perspective. If the screening test is positive, the patient wants to know, "What is the probability that I actually have the disease?" And if the test is negative, astute patients may ask, "What is the probability that I do not actually have the disease if my test comes back negative?"

These questions refer to the positive and negative predictive values of the screening test, and they can be answered with conditional probabilities.

 

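| | Diseased | Non-Diseased | Total |
| --- | --- | --- | --- |
| Screen Positive | a | b | a+b |
| Screen Negative | c | d | c+d |
| Totals | a+c | b+d | N |

Positive Predictive Value = P(Diseased | Screen Positive) = a/(a+b)

Negative Predictive Value = P(Non-Diseased | Screen Negative) = d/(c+d)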

Consider again the study evaluating pregnant women for carrying a fetus with Down Syndrome:

| Screening Test | Affected Fetus | Unaffected Fetus | Total |
| --- | --- | --- | --- |
| Positive | 9 | 351 | 360 |
| Negative | 1 | 4,449 | 4,450 |
| Total | 10 | 4,800 | 4,810 |

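Positive Predictive Value = P(Affected Fetus | Screen Positive) = 9/360 = 0.025

Negative Predictive Value = P(Unaffected Fetus | Screen Negative) = 4,449/4,450 = 0.9998

Interpretation: If a woman screens positive, there is a 2.5% probability that she is carrying an affected fetus. If a woman screens negative, there is a 99.98% probability that she is carrying an unaffected fetus.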
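The predictive values condition on the screening result, so they are computed across the rows of the 2x2 table rather than down its columns. A minimal Python sketch using the Down Syndrome counts follows; the function name `predictive_values` is an illustrative choice, not part of the original module.

```python
def predictive_values(a, b, c, d):
    """Return (PPV, NPV) from a 2x2 screening table.

    a: screen positive, diseased      b: screen positive, non-diseased
    c: screen negative, diseased      d: screen negative, non-diseased
    """
    ppv = a / (a + b)  # P(diseased | screen positive)
    npv = d / (c + d)  # P(non-diseased | screen negative)
    return ppv, npv


# Down Syndrome screening counts from the table above.
ppv, npv = predictive_values(a=9, b=351, c=1, d=4449)
print(f"Positive predictive value = {ppv:.3f}")
print(f"Negative predictive value = {npv:.4f}")
```

The same four counts give a sensitivity of 9/10; the only difference is whether the denominator is a column total or a row total.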

Positive Predictive Value (Yield) Depends on the Prevalence of Disease

The sensitivity and specificity of a screening test are characteristics of the test's performance at a given cut-off point (criterion of positivity). However, the positive predictive value of a screening test will be influenced not only by the sensitivity and specificity of the test, but also by the prevalence of the disease in the population that is being screened. In this example, the positive predictive value is very low (2.5%) because the prevalence of the disease in the screened population is low. As a disease becomes more prevalent, a larger share of subjects fall in the "affected" or "diseased" column, so the probability of disease among subjects with positive tests is higher.

In this example, the prevalence of Down Syndrome in the population of N=4,810 women is 10/4,810 = 0.002 (i.e., in this population Down Syndrome affects 2 per 1,000 fetuses). While this screening test has good performance characteristics (sensitivity of 90.0% and specificity of 92.7%), the prevalence of the condition is low, so even a test with high sensitivity and specificity has a low positive predictive value. Because positive and negative predictive values depend on the prevalence of the disease, they cannot be estimated in case-control designs, in which the investigator fixes the numbers of diseased and non-diseased subjects and the sample therefore does not reflect the population prevalence.
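The dependence of the positive predictive value on prevalence can be made explicit with Bayes' rule: PPV = sensitivity x prevalence / [sensitivity x prevalence + (1 - specificity) x (1 - prevalence)]. The Python sketch below holds the sensitivity and specificity of the Down Syndrome screen fixed and varies the prevalence. The function name `ppv_from_prevalence` is illustrative, and every prevalence other than the study's 10/4,810 is a hypothetical value chosen only to show the trend.

```python
def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' rule for a given prevalence."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Sensitivity and specificity of the Down Syndrome screen, from the study table.
SENSITIVITY = 9 / 10
SPECIFICITY = 4449 / 4800

# The first prevalence is the study's (10/4,810); the others are hypothetical.
prevalences = [10 / 4810, 0.01, 0.05, 0.20, 0.50]
ppvs = [ppv_from_prevalence(SENSITIVITY, SPECIFICITY, p) for p in prevalences]

for p, value in zip(prevalences, ppvs):
    print(f"prevalence = {p:.4f}  ->  PPV = {value:.3f}")
```

At the study prevalence the formula reproduces the 9/360 = 0.025 obtained directly from the table. The last value, a prevalence of 0.50, mimics a case-control sample with equal numbers of cases and controls: the apparent PPV exceeds 0.9 even though the test itself is unchanged, which illustrates why predictive values from such designs do not apply to the screened population.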
