Numerical Summaries for Discrete Variables
Frequency distribution tables are a common and useful way of summarizing discrete variables. Representative examples are shown below.
Frequency Distribution Tables for Dichotomous Variables
In the offspring cohort of the Framingham Heart Study, 3,539 subjects completed the 7th examination between 1998 and 2001, which included an extensive physical examination. One of the variables recorded was sex, as summarized below in a frequency distribution table.
Table 1 - Frequency Distribution Table for Sex

| Sex | Frequency | Relative Frequency, % |
|---|---|---|
| Male | 1,625 | 45.9 |
| Female | 1,914 | 54.1 |
| Total | 3,539 | 100.0 |
Note that the third column contains the relative frequencies, which are computed by dividing the frequency in each response category by the sample size (e.g., 1,625/3,539 = 0.459). With dichotomous variables the relative frequencies are often expressed as percentages (by multiplying by 100).
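The calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name `relative_frequencies` is an assumption introduced here, and the counts are the ones shown in Table 1.

```python
def relative_frequencies(counts):
    """Return {category: percentage of the total}, rounded to one decimal place."""
    total = sum(counts.values())
    return {category: round(100 * n / total, 1) for category, n in counts.items()}


# Frequencies taken from Table 1
sex_counts = {"Male": 1625, "Female": 1914}
sex_percent = relative_frequencies(sex_counts)

for category, pct in sex_percent.items():
    print(f"{category}: {sex_counts[category]} ({pct}%)")
```

Dividing by the total of the valid responses, rather than a hard-coded sample size, keeps the percentages consistent with the frequencies actually tabulated.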
The investigators also recorded whether or not the subjects were being treated with antihypertensive medication, as shown below.
Table 2 - Frequency Distribution Table for Treatment with Antihypertensive Medication

| Treatment | Frequency | Relative Frequency, % |
|---|---|---|
| No | 2,313 | 65.5 |
| Yes | 1,219 | 34.5 |
| Total | 3,532 | 100.0 |
Missing Data
Note in the table above that there are only n=3,532 valid responses, although the sample size was n=3,539. This indicates that seven individuals had missing data on this particular question. Missing data occur in studies for a variety of reasons. If there is extensive missing data or if there is a systematic pattern of missing responses, the results of the analysis may be biased (see the module on Bias for EP713 for more detail). There are techniques for handling missing data, but these are beyond the scope of this course.
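As a rough sketch, valid and missing responses might be separated as follows before relative frequencies are computed. The raw response list is hypothetical: it is rebuilt here from the counts in Table 2, with `None` standing in for a missing answer, and it is not the actual study data.

```python
from collections import Counter

# Hypothetical raw responses rebuilt from Table 2; None marks a missing answer
responses = ["No"] * 2313 + ["Yes"] * 1219 + [None] * 7

valid = [r for r in responses if r is not None]
n_total = len(responses)        # everyone examined
n_valid = len(valid)            # those who answered the question
n_missing = n_total - n_valid   # those with no answer recorded

# Relative frequencies use the number of valid responses as the denominator
counts = Counter(valid)
percent_of_valid = {k: round(100 * v / n_valid, 1) for k, v in counts.items()}

print(f"valid = {n_valid}, missing = {n_missing}")
print(percent_of_valid)
```

Simply dropping the missing responses, as done here, matches the table but does nothing to address the possible bias described above.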
Sometimes it is of interest to compare two or more groups on the basis of a dichotomous outcome variable. For example, suppose we wish to compare the extent of treatment with antihypertensive medication in men and women, as summarized in the table below.
Table 3 - Treatment with Antihypertensive Medication in Men and Women

| Sex | Number on Treatment / n | Relative Frequency, % |
|---|---|---|
| Male | 611/1,622 | 37.7 |
| Female | 608/1,910 | 31.8 |
| Total | 1,219/3,532 | 34.5 |
Here, both sex and treatment status are dichotomous variables. Because the numbers of men and women are unequal, the relative frequency of treatment for each sex must be calculated by dividing the number on treatment by the number of respondents of that sex. The numbers of men and women being treated (frequencies) are almost identical, but the relative frequencies indicate that a higher percentage of men are being treated than women. Note also that the rightmost column does not sum to 100.0% as it did in the previous examples: each entry is the percentage treated within its own row, and the Total row gives the relative frequency of treatment among all participants (men and women) combined.
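The within-group calculation can be illustrated with a short Python sketch using the figures from Table 3. The names `treated` and `percent_treated` are assumptions introduced for this example.

```python
# (number on treatment, group size) taken from Table 3
treated = {"Male": (611, 1622), "Female": (608, 1910)}


def percent_treated(on_treatment, n):
    """Percentage of a group on treatment, rounded to one decimal place."""
    return round(100 * on_treatment / n, 1)


# Each group is divided by its own size, not by the overall sample size
by_sex = {sex: percent_treated(k, n) for sex, (k, n) in treated.items()}

# The Total row pools the numerators and the denominators
total_k = sum(k for k, _ in treated.values())
total_n = sum(n for _, n in treated.values())
overall = percent_treated(total_k, total_n)

print(by_sex, overall)
```

The overall percentage is a weighted combination of the two group percentages, which is why it lies between them rather than being their sum.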
Frequency Distribution Tables for Categorical Variables
Recall that categorical variables are those with two or more distinct responses that are unordered. Some examples of categorical variables measured in the Framingham Heart Study include marital status, handedness (right or left) and smoking status. Because the responses are unordered, the order of the responses or categories in the summary table can be changed, for example, presenting the categories alphabetically or perhaps from the most frequent to the least frequent.
Table 4 below summarizes data on marital status from the Framingham Heart Study. The mutually exclusive and exhaustive categories are shown in the first column of the table. The frequencies, or numbers of participants in each response category, are shown in the middle column and the relative frequencies, as percentages, are shown in the rightmost column.
Table 4 - Frequency Distribution Table for Marital Status

| Marital Status | Frequency | Relative Frequency, % |
|---|---|---|
| Single | 203 | 5.8 |
| Married | 2,580 | 73.1 |
| Widowed | 334 | 9.5 |
| Divorced | 367 | 10.4 |
| Separated | 46 | 1.3 |
| Total | 3,530 | 100.0 |
There are n=3,530 valid responses to the marital status question (9 participants did not provide marital status data). The majority of the sample is married (73.1%), and approximately 10% of the sample is divorced. Another 10% are widowed, 6% are single, and 1% are separated.
Frequency Distribution Tables for Ordinal Variables
Some discrete variables are inherently ordinal. In addition to inherently ordered categories (e.g., excellent, very good, good, fair, poor), investigators will sometimes collect information on continuously distributed measures and then categorize these measurements because doing so makes clinical decision making easier. For example, the NHLBI (National Heart, Lung, and Blood Institute) and the American Heart Association use the following classification of blood pressure:
- Normal: systolic blood pressure <120 and diastolic blood pressure <80
- Prehypertension: systolic blood pressure between 120–139 or diastolic blood pressure between 80–89
- Stage I hypertension: systolic blood pressure between 140–159 or diastolic blood pressure between 90–99
- Stage II hypertension: systolic blood pressure of 160 or more or diastolic blood pressure of 100 or more
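The classification above can be sketched as a small Python function. The function name `bp_category` is hypothetical, and the sketch assumes that a reading meeting the conditions of more than one category (for example, a systolic value in one range and a diastolic value in a higher one) is assigned to the higher category.

```python
def bp_category(systolic, diastolic):
    """Classify one blood pressure reading (mmHg) using the cut points listed above."""
    # Assumption: test the most severe category first, so that an "or"
    # condition on either measurement places the reading in the higher category.
    if systolic >= 160 or diastolic >= 100:
        return "Stage II hypertension"
    if systolic >= 140 or diastolic >= 90:
        return "Stage I hypertension"
    if systolic >= 120 or diastolic >= 80:
        return "Prehypertension"
    # Only readings with systolic < 120 AND diastolic < 80 reach this point
    return "Normal"


print(bp_category(118, 76))   # both values below the lowest cut points
print(bp_category(118, 85))   # diastolic alone moves the reading up a category
```

Checking the categories from highest to lowest means the "Normal" case needs no explicit test: it is whatever remains when neither measurement reaches a cut point.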
The American Heart Association uses the following classification for total cholesterol levels:
- Desirable: total cholesterol <200 mg/dL
- Borderline high risk: total cholesterol between 200–239 mg/dL
- High risk: total cholesterol of 240 mg/dL or greater
Body mass index (BMI) is computed as the ratio of weight in kilograms to height in meters squared, and the following categories are often used:
- Underweight: BMI <18.5
- Normal weight: BMI between 18.5–24.9
- Overweight: BMI between 25–29.9
- Obese: BMI of 30 or greater
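As an illustration, the BMI formula and categories above might be coded as follows. The function names are assumptions introduced here. The sketch also assumes half-open intervals (for example, normal weight is 18.5 up to but not including 25) so that unrounded values such as 24.95 still fall into a category.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Map a BMI value to the categories listed above (half-open intervals assumed)."""
    if value < 18.5:
        return "Underweight"
    if value < 25:
        return "Normal weight"
    if value < 30:
        return "Overweight"
    return "Obese"


# Hypothetical participant: 70 kg and 1.75 m tall
value = bmi(70, 1.75)
print(round(value, 1), bmi_category(value))
```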
These are all examples of common continuous measures that have been categorized to create ordinal variables. The table below is a frequency distribution table for the ordinal blood pressure variable. The mutually exclusive and exhaustive categories are shown in the first column of the table. The frequencies, or numbers of participants in each response category, are shown in the second column, followed by the relative frequencies, the cumulative frequencies, and the cumulative relative frequencies in the columns to the right. The key summary statistics for ordinal variables are relative frequencies and cumulative relative frequencies.
Table 5 - Frequency Distribution for Blood Pressure Category

| Blood Pressure | Frequency | Relative Frequency, % | Cumulative Frequency | Cumulative Relative Frequency, % |
|---|---|---|---|---|
| Normal | 1,206 | 34.1 | 1,206 | 34.1 |
| Prehypertension | 1,452 | 41.1 | 2,658 | 75.2 |
| Stage I Hypertension | 653 | 18.5 | 3,311 | 93.7 |
| Stage II Hypertension | 222 | 6.3 | 3,533 | 100.0 |
| Total | 3,533 | 100.0 | | |
Note that the cumulative frequencies reflect the number of patients at the particular blood pressure level or below. For example, 2,658 patients have normal blood pressure or prehypertension. There are 3,311 patients with normal blood pressure, prehypertension, or Stage I hypertension. The cumulative relative frequencies are very useful for summarizing ordinal variables and indicate the proportion (between 0 and 1) or percentage (between 0% and 100%) of patients at a particular level or below. In this example, 75.2% of the patients are NOT classified as hypertensive (i.e., they have normal blood pressure or prehypertension). Notice that for the last (highest) blood pressure category, the cumulative frequency is equal to the sample size (n=3,533) and the cumulative relative frequency is 100%, indicating that all of the patients are at the highest level or below.
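The cumulative columns can be reproduced with a running sum, as in the sketch below. It uses the frequencies from Table 5, listed from lowest to highest category. The variable names are assumptions introduced for illustration.

```python
from itertools import accumulate

# Categories must be listed in their natural order for cumulation to make sense
categories = ["Normal", "Prehypertension", "Stage I hypertension", "Stage II hypertension"]
frequencies = [1206, 1452, 653, 222]

n = sum(frequencies)
cumulative = list(accumulate(frequencies))   # running total of the frequencies
cumulative_percent = [round(100 * c / n, 1) for c in cumulative]

for name, freq, cum, cum_pct in zip(categories, frequencies, cumulative, cumulative_percent):
    print(f"{name:<22} {freq:>6} {cum:>6} {cum_pct:>6}")
```

Computing each cumulative percentage from the cumulative count, rather than adding up already rounded percentages, avoids accumulating rounding error.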
Table 6 - Frequency Distribution Table for Smoking Status

| Smoking Status | Frequency | Relative Frequency, % |
|---|---|---|
| Nonsmoker | 1,330 | 37.6 |
| Former | 1,724 | 48.8 |
| Current | 482 | 13.6 |
| Total | 3,536 | 100.0 |