Test Validity


Test validity is the ability of a screening test to accurately identify diseased and non-diseased individuals. An ideal screening test is exquisitely sensitive (high probability of detecting disease) and extremely specific (high probability that those without the disease will screen negative). However, there is rarely a clean distinction between "normal" and "abnormal."

The validity of a screening test can only be determined by comparing its results to some "gold standard" that establishes the true disease status. The gold standard might be a very accurate but more expensive diagnostic test. Alternatively, it might be the final diagnosis based on a series of diagnostic tests. If no definitive test is feasible, or if the gold standard diagnosis is invasive, such as a surgical excision, the true disease status might only be determined by following the subjects for a period of time to see which of them ultimately develop the disease. For example, the accuracy of mammography for breast cancer would have to be determined by following the subjects for several years to see whether a cancer was actually present.

A 2 x 2 table, or contingency table, is also used when testing the validity of a screening test, but note that this is a different contingency table than the ones used for summarizing cohort studies, randomized clinical trials, and case-control studies. The 2 x 2 table below shows the results of the evaluation of a screening test for breast cancer among 64,810 subjects.

 

|               | Diseased | Not Diseased | Total  |
|---------------|----------|--------------|--------|
| Test Positive | 132      | 983          | 1,115  |
| Test Negative | 45       | 63,650       | 63,695 |
| Column Totals | 177      | 64,633       | 64,810 |

The contingency table for evaluating a screening test lists the true disease status in the columns and the observed screening test results in the rows. In the table above, 177 women were ultimately found to have had breast cancer, and 64,633 women remained free of breast cancer during the study. Among the 177 women with breast cancer, 132 had a positive screening test (true positives), but 45 had negative tests (false negatives). Among the 64,633 women without breast cancer, 63,650 appropriately had negative screening tests (true negatives), but 983 incorrectly had positive screening tests (false positives).
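The four cells and the margins described above can be checked with a few lines of Python. This is only a minimal sketch: the variable names (`tp`, `fp`, `fn`, `tn`) are illustrative assumptions and do not come from the module itself.

```python
# Cell counts taken from the 2 x 2 table for the breast cancer screening test.
# The variable names are illustrative assumptions, not part of the module.
tp = 132      # diseased, test positive (true positives)
fp = 983      # not diseased, test positive (false positives)
fn = 45       # diseased, test negative (false negatives)
tn = 63_650   # not diseased, test negative (true negatives)

# Row totals: everyone with a given screening result.
test_positive_total = tp + fp
test_negative_total = fn + tn

# Column totals: everyone with a given true disease status.
diseased_total = tp + fn
not_diseased_total = fp + tn

# Grand total: all subjects in the study.
grand_total = tp + fp + fn + tn

print(test_positive_total, test_negative_total)  # 1115 63695
print(diseased_total, not_diseased_total)        # 177 64633
print(grand_total)                               # 64810
```

The printed margins match the "Total" column and the "Column Totals" row of the table.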

If we focus on the rows, we find that 1,115 subjects had a positive screening test, i.e., the test results were abnormal and suggested disease. However, only 132 of these were found to actually have disease, based on the gold standard test. Also note that 63,695 people had a negative screening test, suggesting that they did not have the disease, but in fact 45 of these people were actually diseased.
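To get a feel for the row-wise view, the counts quoted above can be turned into proportions. The sketch below is an illustration only; the variable names are assumptions, and the module itself does not report these two percentages at this point.

```python
# Row-wise view of the same table: counts are taken from the text above.
# Variable names are illustrative assumptions.
test_positive_total = 1_115   # all subjects with a positive screening test
true_positives = 132          # of those, the ones who truly had disease

test_negative_total = 63_695  # all subjects with a negative screening test
false_negatives = 45          # of those, the ones who truly had disease

# Share of positive tests that belonged to diseased subjects.
share_of_positives_diseased = true_positives / test_positive_total

# Share of negative tests that belonged to diseased subjects.
share_of_negatives_diseased = false_negatives / test_negative_total

print(f"{share_of_positives_diseased:.1%}")  # 11.8%
print(f"{share_of_negatives_diseased:.2%}")  # 0.07%
```

Under these counts, most positive screening results came from women without breast cancer, while very few negative results came from women with it.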

Sensitivity 

One measure of test validity is sensitivity, i.e., how accurate the screening test is in identifying disease in people who truly have the disease. When thinking about sensitivity, focus on the individuals who really were diseased: in this case, the left-hand column.

Table - Illustration of the Sensitivity of a Screening Test

The column with diseased subjects is emphasized, since sensitivity focuses on the probability that the test will correctly identify diseased subjects.

 

|               | Diseased | Not Diseased | Total  |
|---------------|----------|--------------|--------|
| Test Positive | 132      | 983          | 1,115  |
| Test Negative | 45       | 63,650       | 63,695 |
| Column Totals | 177      | 64,633       | 64,810 |

What was the probability that the screening test would correctly indicate disease in this subset? The probability is simply the percentage of diseased people who had a positive screening test, i.e., 132/177 = 74.6%. I could interpret this by saying, "The probability of the screening test correctly identifying diseased subjects was 74.6%."
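The sensitivity calculation above can be written as a small function. This is a minimal sketch; the function name `sensitivity` and its parameter names are assumptions introduced for illustration.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Proportion of truly diseased subjects who screened positive."""
    diseased = true_positives + false_negatives
    if diseased == 0:
        # Sensitivity is undefined when nobody in the sample has the disease.
        raise ValueError("no diseased subjects in the sample")
    return true_positives / diseased


# Counts from the 'Diseased' column of the table: 132 positive, 45 negative.
print(f"{sensitivity(132, 45):.1%}")  # 74.6%
```

Only the left-hand column of the table enters the calculation; the 64,633 non-diseased subjects play no role in sensitivity.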

Specificity

Specificity focuses on the accuracy of the screening test in correctly classifying truly non-diseased people. It is the probability that non-diseased subjects will be classified as normal by the screening test.

Table - Illustration of the Specificity of a Screening Test

The column with non-diseased subjects is emphasized. Specificity focuses on the probability that the screening test will correctly identify non-diseased subjects.

 

|               | Diseased | Not Diseased | Total  |
|---------------|----------|--------------|--------|
| Test Positive | 132      | 983          | 1,115  |
| Test Negative | 45       | 63,650       | 63,695 |
| Column Totals | 177      | 64,633       | 64,810 |

 

Sensitivity and specificity are conditional probabilities, as discussed in the biostatistics module on Probability.

In this example, the specificity is 63,650/64,633 = 98.5%. I could interpret this by saying, "The probability of the screening test correctly identifying non-diseased subjects was 98.5%."
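The specificity calculation can be sketched the same way. The function name `specificity` and its parameter names are assumptions introduced for illustration.

```python
def specificity(true_negatives: int, false_positives: int) -> float:
    """Proportion of truly non-diseased subjects who screened negative."""
    not_diseased = true_negatives + false_positives
    if not_diseased == 0:
        # Specificity is undefined when everybody in the sample has the disease.
        raise ValueError("no non-diseased subjects in the sample")
    return true_negatives / not_diseased


# Counts from the 'Not Diseased' column: 63,650 negative, 983 positive.
print(f"{specificity(63_650, 983):.1%}")  # 98.5%
```

Here only the "Not Diseased" column is used, mirroring the way sensitivity uses only the "Diseased" column.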

 


Question: In the above example, what was the prevalence of disease among the 64,810 people in the study population? Compute the answer on your own before looking at the answer.

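Answer: The prevalence is the number of diseased subjects divided by the total study population, i.e., 177/64,810 = 0.0027, or about 0.27% (roughly 2.7 cases per 1,000 people).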
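The prevalence asked about in the question can be checked with a short calculation. This is a minimal sketch that uses the column total for diseased subjects and the grand total from the table; the variable names are illustrative assumptions.

```python
# Totals taken from the 2 x 2 table; variable names are illustrative.
diseased_total = 177     # column total for diseased subjects
grand_total = 64_810     # everyone in the study population

# Prevalence: proportion of the study population that truly had the disease.
prevalence = diseased_total / grand_total

print(f"{prevalence:.2%}")                     # 0.27%
print(f"{prevalence * 1000:.1f} per 1,000")    # 2.7 per 1,000
```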