Numerical Summaries for Discrete Variables

Frequency distribution tables are a common and useful way of summarizing discrete variables. Representative examples are shown below.

Frequency Distribution Tables for Dichotomous Variables

In the offspring cohort of the Framingham Heart Study, 3,539 subjects completed the 7th examination between 1998 and 2001, which included an extensive physical examination. One of the variables recorded was sex, as summarized below in a frequency distribution table.

Table 1 - Frequency Distribution Table for Sex

Sex	Frequency	Relative Frequency, %
Male	1,625	45.9
Female	1,914	54.1
Total	3,539	100.0

Note that the third column contains the relative frequencies, which are computed by dividing the frequency in each response category by the sample size (e.g., 1,625/3,539 = 0.459). With dichotomous variables the relative frequencies are often expressed as percentages (by multiplying by 100).
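The calculation described above can be sketched in a few lines of Python. This is a minimal illustration using the counts from Table 1; the function name `relative_frequencies` and the dictionary layout are assumptions made for this example, not part of the Framingham data files.

```python
def relative_frequencies(counts):
    """Return {category: percentage}, using the sum of the counts as the sample size."""
    n = sum(counts.values())
    return {category: 100 * freq / n for category, freq in counts.items()}


# Frequencies taken from Table 1
sex_counts = {"Male": 1625, "Female": 1914}
sex_pct = relative_frequencies(sex_counts)

# Print one row per category: label, frequency, relative frequency (%)
for category, pct in sex_pct.items():
    print(f"{category}\t{sex_counts[category]:,}\t{pct:.1f}")
```

Rounding is applied only when the values are displayed, so the unrounded percentages still sum to 100.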

The investigators also recorded whether or not the subjects were being treated with antihypertensive medication, as shown below.

Table 2 - Frequency Distribution Table for Treatment with Antihypertensive Medication

Treatment	Frequency	Relative Frequency, %
No	2,313	65.5
Yes	1,219	34.5
Total	3,532	100.0

Missing Data

Note in the table above that there are only n=3,532 valid responses, although the sample size was n=3,539. This indicates that seven individuals had missing data on this particular question. Missing data occurs in studies for a variety of reasons. If there is extensive missing data or if there is a systematic pattern of missing responses, the results of the analysis may be biased (see the module on Bias for EP713 for more detail). There are techniques for handling missing data, but these are beyond the scope of this course.

Sometimes it is of interest to compare two or more groups on the basis of a dichotomous outcome variable. For example, suppose we wish to compare the extent of treatment with antihypertensive medication in men and women, as summarized in the table below.

Table 3 - Treatment with Antihypertensive Medication in Men and Women

Sex	Number on Treatment / n	Relative Frequency, %
Male	611/1,622	37.7
Female	608/1,910	31.8
Total	1,219/3,532	34.5

Here, both sex and treatment status are dichotomous variables. Because the numbers of men and women are unequal, the relative frequency of treatment for each sex must be calculated by dividing the number on treatment by the sample size for that sex. The numbers of men and women being treated (frequencies) are almost identical, but the relative frequencies indicate that a higher percentage of men are being treated than women. Note also that the Total row of the rightmost column is not 100.0% as it was in previous examples, because it indicates the relative frequency of treatment among all participants (men and women) combined.
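As a rough sketch of the group comparison, the Python below recomputes the percentages in Table 3. The counts come from the table; the variable and function names (`treated`, `pct_treated`) are hypothetical choices for this example.

```python
# (number on treatment, number with valid responses) for each sex, from Table 3
treated = {"Male": (611, 1622), "Female": (608, 1910)}


def pct_treated(on_treatment, n):
    """Percentage of a group that is on treatment."""
    return 100 * on_treatment / n


# Each sex is divided by its own sample size, not by the overall total
by_sex = {sex: pct_treated(k, n) for sex, (k, n) in treated.items()}

# The Total row pools both groups before dividing
total_treated = sum(k for k, _ in treated.values())
total_n = sum(n for _, n in treated.values())
overall = pct_treated(total_treated, total_n)

for sex, pct in by_sex.items():
    print(f"{sex}\t{pct:.1f}")
print(f"Total\t{overall:.1f}")
```

The overall percentage is a weighted combination of the two group percentages, which is why it falls between them rather than summing to 100.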

Frequency Distribution Tables for Categorical Variables

Recall that categorical variables are those with two or more distinct responses that are unordered. Some examples of categorical variables measured in the Framingham Heart Study include marital status, handedness (right or left) and smoking status. Because the responses are unordered, the order of the responses or categories in the summary table can be changed, for example, presenting the categories alphabetically or perhaps from the most frequent to the least frequent.

Table 4 below summarizes data on marital status from the Framingham Heart Study. The mutually exclusive and exhaustive categories are shown in the first column of the table. The frequencies, or numbers of participants in each response category, are shown in the middle column and the relative frequencies, as percentages, are shown in the rightmost column.

Table 4 - Frequency Distribution Table for Marital Status

Marital Status	Frequency	Relative Frequency, %
Single	203	5.8
Married	2,580	73.1
Widowed	334	9.5
Divorced	367	10.4
Separated	46	1.3
Total	3,530	100.0

There are n=3,530 valid responses to the marital status question (9 participants did not provide marital status data). The majority of the sample is married (73.1%), and approximately 10% of the sample is divorced. Another 10% are widowed, 6% are single, and 1% are separated.
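Because the categories are unordered, the rows can be rearranged freely. The sketch below, which assumes the counts from Table 4 and the overall sample of 3,539 participants, reorders the categories from most to least frequent and recovers the number of missing responses. The variable names are illustrative only.

```python
# Frequencies from Table 4
marital_counts = {
    "Single": 203,
    "Married": 2580,
    "Widowed": 334,
    "Divorced": 367,
    "Separated": 46,
}

enrolled = 3539                      # participants who completed the examination
n_valid = sum(marital_counts.values())
n_missing = enrolled - n_valid       # participants without marital status data

# Sort categories from most to least frequent
ordered = sorted(marital_counts.items(), key=lambda item: item[1], reverse=True)

for status, freq in ordered:
    print(f"{status}\t{freq:,}\t{100 * freq / n_valid:.1f}")
print(f"Valid responses: {n_valid:,}; missing: {n_missing}")
```

Note that the relative frequencies are computed from the valid responses, not from the full sample.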

Frequency Distribution Tables for Ordinal Variables

Some discrete variables are inherently ordinal. In addition to inherently ordered categories (e.g., excellent, very good, good, fair, poor), investigators will sometimes collect information on continuously distributed measures, but then categorize these measurements because doing so makes clinical decision making easier. For example, the NHLBI (National Heart, Lung, and Blood Institute) and the American Heart Association use the following classification of blood pressure:

Normal: systolic blood pressure <120 and diastolic blood pressure <80
Pre-hypertension: systolic blood pressure between 120-139 or diastolic blood pressure between 80-89
Stage I hypertension: systolic blood pressure between 140-159 or diastolic blood pressure between 90-99
Stage II hypertension: systolic blood pressure of 160 or more or diastolic blood pressure of 100 or more
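The classification above can be sketched as a small Python function. One assumption is made explicit here: when the systolic and diastolic readings fall in different categories, the participant is assigned to the higher (more severe) one, which is how the "or" in the definitions is read in this example. The function name `classify_blood_pressure` is hypothetical.

```python
def classify_blood_pressure(systolic, diastolic):
    """Assign a blood pressure category from systolic and diastolic readings.

    Assumption: categories are checked from most to least severe, so a
    participant whose two readings disagree lands in the higher category.
    """
    if systolic >= 160 or diastolic >= 100:
        return "Stage II hypertension"
    if systolic >= 140 or diastolic >= 90:
        return "Stage I hypertension"
    if systolic >= 120 or diastolic >= 80:
        return "Pre-hypertension"
    return "Normal"  # systolic < 120 and diastolic < 80


print(classify_blood_pressure(118, 76))
print(classify_blood_pressure(118, 92))
```

Checking the most severe category first keeps the categories mutually exclusive without having to write out every upper bound.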

The American Heart Association uses the following classification for total cholesterol levels:

Desirable: total cholesterol <200 mg/dL
Borderline high risk: total cholesterol between 200–239 mg/dL
High risk: total cholesterol of 240 mg/dL or greater

Body mass index (BMI) is computed as the ratio of weight in kilograms to height in meters squared and the following categories are often used:

Underweight: BMI <18.5
Normal weight: BMI between 18.5-24.9
Overweight: BMI between 25-29.9
Obese: BMI of 30 or greater
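Turning a continuous measure into ordinal categories amounts to looking up which interval a value falls in. The following is a minimal sketch using Python's standard `bisect` module, applied to the cholesterol and BMI categories listed above. The helper names (`categorize`, `body_mass_index`) are assumptions for this example, and the cut points 25 and 30 are used as category boundaries so that values such as 24.95 are not left unclassified.

```python
from bisect import bisect_right


def categorize(value, cut_points, labels):
    """Return the label of the interval containing value.

    cut_points must be sorted in increasing order. A value equal to a cut
    point is placed in the higher category.
    """
    return labels[bisect_right(cut_points, value)]


def body_mass_index(weight_kg, height_m):
    """BMI = weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


BMI_CUTS = [18.5, 25, 30]
BMI_LABELS = ["Underweight", "Normal weight", "Overweight", "Obese"]

CHOL_CUTS = [200, 240]  # mg/dL
CHOL_LABELS = ["Desirable", "Borderline high risk", "High risk"]

bmi = body_mass_index(weight_kg=80, height_m=1.75)
print(f"BMI {bmi:.1f}: {categorize(bmi, BMI_CUTS, BMI_LABELS)}")
print(f"Cholesterol 215 mg/dL: {categorize(215, CHOL_CUTS, CHOL_LABELS)}")
```

Keeping the cut points and labels as data makes it straightforward to reuse the same function for any other measure that is categorized this way.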

These are all examples of common continuous measures that have been categorized to create ordinal variables. The table below is a frequency distribution table for the ordinal blood pressure variable. The mutually exclusive and exhaustive categories are shown in the first column of the table. The frequencies, or numbers of participants in each response category, are shown in the second column, followed by the relative frequencies (as percentages), the cumulative frequencies, and the cumulative relative frequencies. The key summary statistics for ordinal variables are relative frequencies and cumulative relative frequencies.

Table 5 - Frequency Distribution for Blood Pressure Category

Blood Pressure	Frequency	Relative Frequency, %	Cumulative Frequency	Cumulative Relative Frequency, %
Normal	1,206	34.1	1,206	34.1
Pre-Hypertension	1,452	41.1	2,658	75.2
Stage I Hypertension	653	18.5	3,311	93.7
Stage II Hypertension	222	6.3	3,533	100.0
Total	3,533	100.0

Note that the cumulative frequencies reflect the number of patients at the particular blood pressure level or below. For example, 2,658 patients have normal blood pressure or pre-hypertension. There are 3,311 patients with normal blood pressure, pre-hypertension or Stage I hypertension. The cumulative relative frequencies are very useful for summarizing ordinal variables and indicate the proportion (between 0 and 1) or percentage (between 0% and 100%) of patients at a particular level or below. In this example, 75.2% of the patients are NOT classified as hypertensive (i.e., they have normal blood pressure or pre-hypertension). Notice that for the last (highest) blood pressure category, the cumulative frequency is equal to the sample size (n=3,533) and the cumulative relative frequency is 100%, indicating that all of the patients are at the highest level or below.
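The cumulative columns of Table 5 are running totals of the frequency column, taken in category order. A minimal Python sketch, assuming the frequencies from Table 5 and using `itertools.accumulate` from the standard library (the variable names are illustrative):

```python
from itertools import accumulate

# Categories must be listed in their natural order for cumulative sums to make sense
categories = ["Normal", "Pre-Hypertension", "Stage I Hypertension", "Stage II Hypertension"]
frequencies = [1206, 1452, 653, 222]

n = sum(frequencies)
cumulative = list(accumulate(frequencies))           # running totals
cumulative_pct = [100 * c / n for c in cumulative]   # as a percentage of n

for name, freq, cum, cum_pct in zip(categories, frequencies, cumulative, cumulative_pct):
    print(f"{name}\t{freq:,}\t{100 * freq / n:.1f}\t{cum:,}\t{cum_pct:.1f}")
```

Cumulative frequencies are only meaningful for ordinal variables; for unordered categories such as marital status, the running total would depend on an arbitrary row order.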

Table 6 - Frequency Distribution Table for Smoking Status

Smoking Status	Frequency	Relative Frequency, %
Non-Smoker	1,330	37.6
Former	1,724	48.8
Current	482	13.6
Total	3,536	100.0
