Independence
In probability, two events are said to be independent if the probability of one is not affected by the occurrence or non-occurrence of the other. The idea is easiest to see through an example.
Earlier in this module we considered data from a population of N=100 men who had both a PSA test and a biopsy for prostate cancer. Suppose we have a different test for prostate cancer. This prostate test produces a numerical risk score that classifies a man as being at low, moderate, or high risk for prostate cancer. A sample of 120 men underwent the new test and also had a biopsy. The test and biopsy results are summarized below.
| Prostate Test Risk | Prostate Cancer | No Prostate Cancer | Total |
|---|---|---|---|
| Low | 10 | 50 | 60 |
| Moderate | 6 | 30 | 36 |
| High | 4 | 20 | 24 |
| Total | 20 | 100 | 120 |
- The probability that a man has prostate cancer given he has a low risk is: P(Prostate Cancer | Low Risk) = 10/60 = 0.167.
- The probability that a man has prostate cancer given he has a moderate risk is: P(Prostate Cancer | Moderate Risk) = 6/36 = 0.167.
- The probability that a man has prostate cancer given he has a high risk is: P(Prostate Cancer | High Risk) = 4/24 = 0.167.
Note that regardless of whether the hypothetical prostate test result was low, moderate, or high risk, the probability that a man had prostate cancer was 0.167. In other words, knowing a man's prostate test result does not change the likelihood that he has prostate cancer in this example: prostate cancer status is independent of the prostate test result.
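The three conditional probabilities above can be reproduced with a short Python sketch. It assumes the counts are stored in a plain dictionary keyed by risk level (an illustrative layout, not part of the original analysis) and uses `fractions.Fraction` so that the equal values compare exactly rather than as rounded decimals.

```python
from fractions import Fraction

# Counts from the hypothetical prostate-test table: (cancer, no cancer) per risk level
counts = {
    "Low": (10, 50),
    "Moderate": (6, 30),
    "High": (4, 20),
}

def p_cancer_given_risk(cancer: int, no_cancer: int) -> Fraction:
    """P(Prostate Cancer | risk level) = cancer count / row total."""
    return Fraction(cancer, cancer + no_cancer)

conditional = {risk: p_cancer_given_risk(*row) for risk, row in counts.items()}
for risk, p in conditional.items():
    print(f"P(Prostate Cancer | {risk} Risk) = {p} = {float(p):.3f}")
```

Every row reduces to the same fraction, 1/6, which is the 0.167 quoted in the text.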
Demonstrating Independence
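Consider two events, call them A and B (e.g., A might be a low-risk result on the "prostate test" and B a diagnosis of prostate cancer). These two events are independent if P(A | B) = P(A) or if P(B | A) = P(B). When P(A) and P(B) are both nonzero, these two conditions are equivalent, so checking either one is sufficient.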
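To apply this definition to a table of counts, each probability is written as a ratio of counts. In the sketch below, n(·) denotes the number of subjects in a cell or margin of the table and N the grand total (notation introduced here for illustration); the conditional forms assume n(A) > 0 and n(B) > 0.

```latex
\begin{aligned}
P(A \mid B) &= \frac{n(A \cap B)}{n(B)}, \qquad P(A) = \frac{n(A)}{N},\\
P(B \mid A) &= \frac{n(A \cap B)}{n(A)}, \qquad P(B) = \frac{n(B)}{N}.
\end{aligned}
```

For the prostate table with A = Low Risk and B = Prostate Cancer, n(A ∩ B) = 10, n(A) = 60, n(B) = 20, and N = 120.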
To check independence, we compare a conditional and an unconditional probability: P(A | B) = P(Low Risk | Prostate Cancer) = 10/20 = 0.50 and P(A) = P(Low Risk) = 60/120 = 0.50. The equality of the conditional and unconditional probabilities indicates independence.
Independence can also be tested by checking whether P(B | A) = P(B). Here P(B | A) = P(Prostate Cancer | Low Risk) = 10/60 = 0.167 and P(B) = P(Prostate Cancer) = 20/120 = 0.167. In other words, the probability of a prostate cancer diagnosis given a low-risk "prostate test" result (the conditional probability) is the same as the overall probability of a prostate cancer diagnosis (the unconditional probability).
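Both checks can be written out explicitly. This minimal Python sketch takes A = Low Risk and B = Prostate Cancer, as in the text; the variable names are illustrative, and exact fractions are used so that the equality tests are not affected by rounding.

```python
from fractions import Fraction

# Cell and margin counts from the prostate-test table (N = 120)
n_low_and_cancer = 10   # Low Risk AND Prostate Cancer
n_low = 60              # row total for Low Risk
n_cancer = 20           # column total for Prostate Cancer
N = 120                 # grand total

# Check 1: P(A | B) versus P(A), with A = Low Risk, B = Prostate Cancer
p_a_given_b = Fraction(n_low_and_cancer, n_cancer)  # 10/20
p_a = Fraction(n_low, N)                            # 60/120

# Check 2: P(B | A) versus P(B)
p_b_given_a = Fraction(n_low_and_cancer, n_low)     # 10/60
p_b = Fraction(n_cancer, N)                         # 20/120

print(p_a_given_b, p_a, p_a_given_b == p_a)  # 1/2 1/2 True
print(p_b_given_a, p_b, p_b_given_a == p_b)  # 1/6 1/6 True
```

Both comparisons agree, as expected when the two conditions are equivalent.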
Example:
The following table contains information on a population of N=6,732 individuals who are classified as having or not having prevalent cardiovascular disease (CVD). Each individual is also classified in terms of having a family history of cardiovascular disease. In this analysis, family history is defined as having a first-degree relative (parent or sibling) diagnosed with cardiovascular disease before age 60.
| | Prevalent CVD | Free of CVD | Total |
|---|---|---|---|
| Family History of CVD | 491 | 368 | 859 |
| No Family History of CVD | 152 | 5,721 | 5,873 |
| Total | 643 | 6,089 | 6,732 |
Are family history and prevalent CVD independent? Is there a relationship between family history and prevalent CVD? This is a question of independence of events.
Let A = Prevalent CVD and B = Family History of CVD. (It does not matter how we define A and B; for example, we could have defined A = No Family History and B = Free of CVD, and the conclusion would be identical.) We now check whether P(A | B) = P(A) or whether P(B | A) = P(B). Again, it makes no difference which condition is used; the conclusion will be identical. We compare the conditional probability to the unconditional probability as follows:
| Conditional Probability | Unconditional Probability |
|---|---|
| P(A \| B) = P(Prevalent CVD \| Family History of CVD) = 491/859 = 0.572. The probability of prevalent CVD given a family history is 57.2% (compared with 152/5,873 = 2.6% among individuals with no family history). | P(A) = P(Prevalent CVD) = 643/6,732 = 0.096. In the overall population, the probability of prevalent CVD is 9.6% (i.e., 9.6% of the population has prevalent CVD). |
Since these probabilities are not equal, family history and prevalent CVD are not independent. Individuals with a family history of CVD are much more likely to have prevalent CVD.
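The arithmetic in this comparison, and the claim that the choice of A and B does not affect the conclusion, can be checked with a short Python sketch. The dictionary layout and the helper names `row_total` and `col_total` are illustrative choices, not part of the original analysis; exact fractions are used so that the equality tests are not distorted by floating-point rounding.

```python
from fractions import Fraction
from itertools import product

# Counts from the CVD table; keys are (family-history status, CVD status)
rows = ("Family History", "No Family History")
cols = ("Prevalent CVD", "Free of CVD")
table = {
    ("Family History", "Prevalent CVD"): 491,
    ("Family History", "Free of CVD"): 368,
    ("No Family History", "Prevalent CVD"): 152,
    ("No Family History", "Free of CVD"): 5721,
}
N = sum(table.values())  # grand total, 6,732

def row_total(row):
    """Margin for a family-history status."""
    return sum(table[(row, c)] for c in cols)

def col_total(col):
    """Margin for a CVD status."""
    return sum(table[(r, col)] for r in rows)

# Conditional and unconditional probabilities used in the text
p_cvd_given_fh = Fraction(table[("Family History", "Prevalent CVD")], row_total("Family History"))
p_cvd_given_no_fh = Fraction(table[("No Family History", "Prevalent CVD")], row_total("No Family History"))
p_cvd = Fraction(col_total("Prevalent CVD"), N)
print(f"{float(p_cvd_given_fh):.3f} {float(p_cvd_given_no_fh):.3f} {float(p_cvd):.3f}")  # 0.572 0.026 0.096

# For every (row, col) choice of events, compare P(row | col) with P(row)
# and P(col | row) with P(col); True would indicate independence.
equal = {}
for row, col in product(rows, cols):
    cell = table[(row, col)]
    equal[(row, col)] = (
        Fraction(cell, col_total(col)) == Fraction(row_total(row), N),
        Fraction(cell, row_total(row)) == Fraction(col_total(col), N),
    )
print(equal)  # every entry is (False, False)
```

Every pairing, including A = No Family History and B = Free of CVD, gives unequal conditional and unconditional probabilities, so the events are not independent however A and B are defined.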