The Binomial Distribution: A Probability Model for a Discrete Outcome
The binomial distribution model is an important probability model that is used when there are two possible outcomes (hence "binomial"). When there are more than two distinct outcomes, a multinomial probability model might be appropriate, but here we focus on the situation in which the outcome is dichotomous.
For example, adults with allergies might report relief with medication or not, children with a bacterial infection might respond to antibiotic therapy or not, adults who suffer a myocardial infarction might survive the heart attack or not, a medical device such as a coronary stent might be successfully implanted or not. These are just a few examples of applications or processes in which the outcome of interest has two possible values (i.e., it is dichotomous). The two outcomes are often labeled "success" and "failure" with success indicating the presence of the outcome of interest. Note, however, that for many medical and public health questions the outcome or event of interest is the occurrence of disease, which is obviously not really a success. Nevertheless, this terminology is typically used when discussing the binomial distribution model. As a result, whenever using the binomial distribution, we must clearly specify which outcome is the "success" and which is the "failure".
The binomial distribution model allows us to compute the probability of observing a specified number of "successes" when the process is repeated a specific number of times (e.g., in a set of patients) and the outcome for a given patient is either a success or a failure. We must first introduce some notation which is necessary for the binomial distribution model.
First, we let "n" denote the number of observations or the number of times the process is repeated, and "x" denotes the number of "successes" or events of interest occurring during "n" observations. The probability of "success" or occurrence of the outcome of interest is indicated by "p".
The binomial equation also uses factorials. In mathematics, the factorial of a non-negative integer k is denoted by k!, which is the product of all positive integers less than or equal to k. For example,
- 4! = 4 x 3 x 2 x 1 = 24,
- 2! = 2 x 1 = 2,
- There is one special case, 0! = 1.
With this notation in mind, the binomial distribution model is defined as:
The Binomial Distribution Model
$$P(x \text{ successes}) = \frac{n!}{x!\,(n-x)!}\; p^{x}\,(1-p)^{n-x}$$
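To make the notation concrete, the following is a minimal sketch in R of the standard binomial formula, n!/(x!(n-x)!) multiplied by p^x (1-p)^(n-x). The helper name `binom_prob` and the values x = 2, n = 5, p = 0.3 are assumptions chosen only for illustration; `factorial()`, `dbinom()` and `all.equal()` are base R functions.

```r
# Hypothetical helper (not part of base R): probability of exactly x successes
# in n independent observations, each with success probability p.
binom_prob <- function(x, n, p) {
  # number of ways to choose which x of the n observations are successes
  n_ways <- factorial(n) / (factorial(x) * factorial(n - x))
  n_ways * p^x * (1 - p)^(n - x)
}

# Illustrative values only (not taken from the text)
by_formula <- binom_prob(2, 5, 0.3)
built_in   <- dbinom(2, size = 5, prob = 0.3)

# The hand-written formula and R's built-in density should agree
agree <- isTRUE(all.equal(by_formula, built_in))
agree
```

In practice `dbinom()` is the more convenient choice; writing the formula out by hand is useful mainly for seeing how the factorial term and the two powers fit together.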
Use of the binomial distribution requires three assumptions:
- Each replication of the process results in one of two possible outcomes (success or failure),
- The probability of success is the same for each replication, and
- The replications are independent, meaning here that a success in one patient does not influence the probability of success in another.
Examples of Use of the Binomial Model
1. Relief of Allergies
Suppose that 80% of adults with allergies report symptomatic relief with a specific medication. If the medication is given to 10 new patients with allergies, what is the probability that it is effective in exactly seven?
First, do we satisfy the three assumptions of the binomial distribution model?
- The outcome is relief from symptoms (yes or no), and here we will call a reported relief from symptoms a 'success.'
- The probability of success for each person is 0.8.
- The final assumption is that the replications are independent, and it is reasonable to assume that this is true.
We know that:
- The number of observations is n = 10
- The number of successes or events of interest is x = 7
- The probability of success for any individual is p = 0.80
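The probability of 7 successes is:
$$P(7 \text{ successes}) = \frac{10!}{7!\,(10-7)!}\,(0.80)^{7}(1-0.80)^{10-7}$$
This is equivalent to:
$$P(7 \text{ successes}) = \frac{10 \cdot 9 \cdot 8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1)(3 \cdot 2 \cdot 1)}\,(0.2097)(0.008)$$
But many of the terms in the numerator and denominator cancel each other out, so this can be simplified to:
$$P(7 \text{ successes}) = \frac{10 \cdot 9 \cdot 8}{3 \cdot 2 \cdot 1}\,(0.2097)(0.008) = 120 \times 0.2097 \times 0.008 = 0.2013$$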
Interpretation: There is a 20.13% probability that exactly 7 of 10 patients will report relief from symptoms when the probability that any one reports relief is 80%.
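The allergy calculation can be checked in R. This is a sketch using the values stated in the example (n = 10, x = 7, p = 0.80); the variable names are illustrative, and `choose()` and `dbinom()` are base R functions.

```r
n <- 10    # number of patients
x <- 7     # number reporting relief ("successes")
p <- 0.80  # probability that any one patient reports relief

# Step by step: the counting term times the two probability terms
n_ways <- choose(n, x)                 # 10! / (7! 3!)
by_hand <- n_ways * p^x * (1 - p)^(n - x)

# The same probability from R's binomial density function
p_exactly_7 <- dbinom(x, size = n, prob = p)

round(c(by_hand = by_hand, dbinom = p_exactly_7), 4)
```

Both routes give approximately 0.2013, matching the 20.13% in the interpretation.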
Note: Binomial probabilities like this can also be computed in an Excel spreadsheet using the =BINOMDIST function. Place the cursor into an empty cell and enter the following formula:
=BINOMDIST(x, n, p, FALSE)
where x = # of 'successes', n = # of replications or observations, and p = probability of success on a single observation. The final argument, FALSE, requests the probability of exactly x successes rather than a cumulative probability.
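What is the probability that none report relief? We can again use the binomial distribution model with n=10, x=0 and p=0.80.
$$P(0 \text{ successes}) = \frac{10!}{0!\,(10-0)!}\,(0.80)^{0}(1-0.80)^{10-0}$$
This is equivalent to
$$P(0 \text{ successes}) = \frac{10!}{(1)(10!)}\,(1)(0.0000001024)$$
which simplifies to
$$P(0 \text{ successes}) = 1 \times 1 \times 0.0000001024 = 0.0000001024$$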
Interpretation: There is practically no chance that none of the 10 will report relief from symptoms when the probability of reporting relief for any individual patient is 80%.
What is the most likely number of patients who will report relief out of 10? If 80% report relief and we consider 10 patients, we would expect that 8 report relief. What is the probability that exactly 8 of 10 report relief? We can use the same method that was used above to demonstrate that there is a 30.20% probability that exactly 8 of 10 patients will report relief from symptoms when the probability that any one reports relief is 80%. The probability that exactly 8 report relief is the highest of all possible outcomes (0 through 10).
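One way to confirm which outcome is most likely is to tabulate the whole distribution. The sketch below assumes the same n = 10 and p = 0.80 as the allergy example; the variable names are illustrative.

```r
x_values <- 0:10                                   # every possible number of successes
probs <- dbinom(x_values, size = 10, prob = 0.80)  # P(exactly x) for each x
names(probs) <- x_values

# Full distribution, rounded for display
round(probs, 4)

# The outcome with the highest probability
most_likely <- x_values[which.max(probs)]
most_likely
```

Here `most_likely` is 8, and `dbinom(8, 10, 0.80)` evaluates to about 0.3020. The eleven probabilities sum to 1, as they must for a complete set of outcomes.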
2. The Probability of Dying after a Heart Attack
The likelihood that a patient with a heart attack dies of the attack is 0.04 (i.e., 4 of 100 die of the attack). Suppose we have 5 patients who suffer a heart attack. What is the probability that all will survive? For this example, we will call a success a fatal attack (p = 0.04). We have n=5 patients and want to know the probability that all survive or, in other words, that none are fatal (0 successes).
We again need to assess the assumptions. Each attack is fatal or non-fatal, the probability of a fatal attack is 4% for all patients, and the outcomes of individual patients are independent. It should be noted that the assumption that the same probability of success applies to all patients must be evaluated carefully. The probability that a patient dies from a heart attack depends on many factors including age, the severity of the attack, and other comorbid conditions. To apply the 4% probability we must be convinced that all patients are at the same risk of a fatal attack. The assumption of independence of events must also be evaluated carefully. As long as the patients are unrelated, the assumption is usually appropriate. Prognosis of disease could be related or correlated in members of the same family or in individuals who are cohabiting. In this example, suppose that the 5 patients being analyzed are unrelated, of similar age and free of comorbid conditions.
$$P(0 \text{ successes}) = \frac{5!}{0!\,(5-0)!}\,(0.04)^{0}(1-0.04)^{5-0} = (0.96)^{5} = 0.8154$$
There is an 81.54% probability that all patients will survive the attack when the probability that any one dies is 4%. In this example, the possible outcomes are 0, 1, 2, 3, 4 or 5 successes (fatalities). Because the probability of fatality is so low, the most likely outcome is 0 (all patients survive). The binomial formula generates the probability of observing exactly x successes out of n.
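The heart attack example can be reproduced in R with the stated values (n = 5, p = 0.04, where a "success" is a fatal attack). This is an illustrative sketch; the variable names are assumptions.

```r
n_patients <- 5
p_fatal <- 0.04   # probability that any one attack is fatal ("success")

# Probability of 0 fatalities, i.e. all five patients survive
p_all_survive <- dbinom(0, size = n_patients, prob = p_fatal)

# With x = 0 the formula reduces to (1 - p)^n
shortcut <- (1 - p_fatal)^n_patients

round(c(dbinom = p_all_survive, shortcut = shortcut), 4)
```

Both values round to 0.8154, the 81.54% quoted above.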
Computing the Probability of a Range of Outcomes
If we want to compute the probability of a range of outcomes, we need to apply the formula more than once. Suppose in the heart attack example we wanted to compute the probability that no more than 1 person dies of the heart attack. In other words, 0 or 1, but not more than 1. Specifically, we want P(no more than 1 success) = P(0 or 1 successes) = P(0 successes) + P(1 success). To compute this probability we apply the binomial formula twice.
We already computed P(0 successes) = 0.8154; we now compute P(1 success):
$$P(1 \text{ success}) = \frac{5!}{1!\,(5-1)!}\,(0.04)^{1}(1-0.04)^{5-1} = 5 \times 0.04 \times 0.8493 = 0.1699$$
P(no more than 1 'success') = P(0 or 1 successes) = P(0 successes) + P(1 success)
= 0.81537 + 0.16987 = 0.98524, or 0.9852 to four decimal places.
The probability that no more than 1 of 5 (or equivalently that at most 1 of 5) die from the attack is 98.52%.
What is the probability that 2 or more of 5 die from the attack? Here we want to compute P(2 or more successes). The possible outcomes are 0, 1, 2, 3, 4 or 5, and the sum of the probabilities of each of these outcomes is 1 (i.e., we are certain to observe either 0, 1, 2, 3, 4 or 5 successes). We just computed P(0 or 1 successes) = 0.9852, so P(2, 3, 4 or 5 successes) = 1 - P(0 or 1 successes) = 0.0148. There is a 1.48% probability that 2 or more of 5 will die from the attack.
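Probabilities for a range of outcomes can be obtained in R either by summing individual terms or with the cumulative function `pbinom()`. The sketch below assumes the heart attack values (n = 5, p = 0.04); the variable names are illustrative.

```r
n_patients <- 5
p_fatal <- 0.04

# P(0 or 1 fatalities): add the two individual probabilities ...
p_at_most_1_sum <- sum(dbinom(0:1, size = n_patients, prob = p_fatal))

# ... or use the cumulative distribution function, P(X <= 1)
p_at_most_1 <- pbinom(1, size = n_patients, prob = p_fatal)

# P(2 or more fatalities) is the complement, P(X > 1)
p_2_or_more <- pbinom(1, size = n_patients, prob = p_fatal, lower.tail = FALSE)

round(c(at_most_1 = p_at_most_1, two_or_more = p_2_or_more), 4)
```

Working from unrounded values, P(at most 1) is about 0.9852 and P(2 or more) is about 0.0148; the two always sum to 1.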
Mean and Standard Deviation of a Binomial Population
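Mean number of successes:
$$\mu = np$$
Standard deviation:
$$\sigma = \sqrt{np(1-p)}$$
For the previous example on the probability of relief from allergies, with n = 10 trials and p = 0.80 probability of success on each trial:
$$\mu = np = (10)(0.80) = 8$$
$$\sigma = \sqrt{np(1-p)} = \sqrt{(10)(0.80)(0.20)} = \sqrt{1.6} = 1.265$$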
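The mean and standard deviation of a binomial variable are np and the square root of np(1-p). As a hedged check, the sketch below computes them for the allergy example (n = 10, p = 0.80) and compares them with values obtained directly from the definition of a mean and variance over the outcomes 0 through 10. The variable names are illustrative.

```r
n <- 10
p <- 0.80

# Closed-form mean and standard deviation of a binomial variable
mu <- n * p
sigma <- sqrt(n * p * (1 - p))

# Check from first principles: weight each outcome by its probability
x <- 0:n
probs <- dbinom(x, size = n, prob = p)
mu_check <- sum(x * probs)
sigma_check <- sqrt(sum((x - mu_check)^2 * probs))

round(c(mu = mu, sigma = sigma, mu_check = mu_check, sigma_check = sigma_check), 3)
```

The closed-form and first-principles values agree: a mean of 8 and a standard deviation of about 1.265.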
Binomial Probability Calculator
Suppose you flipped a coin 10 times (i.e., 10 trials), and the probability of getting "heads" was 0.5 (50%). What would be the probability of getting exactly 4 heads?
Calculating Binomial Probabilities with R
For 4 successes in 10 trials with probability = 0.5 on each trial, each probability below is followed by the R code that computes it:
a) Probability of exactly 4 events = 0.205078
> dbinom(4, 10, 0.5)
b) Cumulative probability of < 4 events = 0.171875
> pbinom(3, 10, 0.5, lower.tail=TRUE)
c) Cumulative probability of ≤ 4 events = 0.376953
> pbinom(4, 10, 0.5, lower.tail=TRUE)
d) Cumulative probability of > 4 events = 0.623047
> pbinom(4, 10, 0.5, lower.tail=FALSE)
e) Cumulative probability of ≥ 4 events = 0.828125
> pbinom(3, 10, 0.5, lower.tail=FALSE)