# Numerical Summaries for Discrete Variables

Frequency distribution tables are a common and useful way of summarizing discrete variables. Representative examples are shown below.

## Frequency Distribution Tables for Dichotomous Variables

In the offspring cohort of the Framingham Heart Study, 3,539 subjects completed the 7th examination between 1998 and 2001, which included an extensive physical examination. One of the variables recorded was sex, summarized below in a frequency distribution table.

Table 1 - Frequency Distribution Table for Sex

| Sex | Frequency | Relative Frequency, % |
|---|---|---|
| Male | 1,625 | 45.9 |
| Female | 1,914 | 54.1 |
| Total | 3,539 | 100.0 |

Note that the third column contains the relative frequencies, which are computed by dividing the frequency in each response category by the sample size (e.g., 1,625/3,539 = 0.459). With dichotomous variables, the relative frequencies are often expressed as percentages (by multiplying by 100, e.g., 0.459 × 100 = 45.9%).
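The calculation described above can be sketched in a few lines of Python. This is a minimal illustration using the counts from Table 1; the helper name `relative_frequencies` is hypothetical and chosen only for this example.

```python
# Counts taken from Table 1 (Framingham Offspring cohort, 7th examination)
sex_counts = {"Male": 1625, "Female": 1914}

def relative_frequencies(counts):
    """Return each category's share of the total, as a percentage."""
    n = sum(counts.values())  # sample size = sum of the category frequencies
    return {category: 100 * count / n for category, count in counts.items()}

for category, pct in relative_frequencies(sex_counts).items():
    print(f"{category}: {pct:.1f}%")  # Male: 45.9%, Female: 54.1%
```

Because every participant falls into exactly one category, the relative frequencies of a dichotomous variable always add to 100% before rounding.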

The investigators also recorded whether or not the subjects were being treated with antihypertensive medication, as shown below.

Table 2 - Frequency Distribution Table for Treatment with Antihypertensive Medication

| Treatment | Frequency | Relative Frequency, % |
|---|---|---|
| No | 2,313 | 65.5 |
| Yes | 1,219 | 34.5 |
| Total | 3,532 | 100.0 |

### Missing Data

Note in the table above that there are only n=3,532 valid responses, although the sample size was n=3,539. This indicates that seven individuals had missing data on this particular question, and the relative frequencies in Table 2 are computed using the 3,532 valid responses as the denominator. Missing data occur in studies for a variety of reasons. If there are extensive missing data or a systematic pattern of missing responses, the results of the analysis may be biased (see the module on Bias for EP713 for more detail). There are techniques for handling missing data, but these are beyond the scope of this course.
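As a rough sketch of the bookkeeping involved, the snippet below derives the number of valid and missing responses from the figures in Table 2 and computes each relative frequency over the valid responses only, which is the convention Table 2 follows. The variable names are illustrative.

```python
# Figures taken from Table 2 and the text above
sample_size = 3539
treatment_counts = {"No": 2313, "Yes": 1219}

valid_n = sum(treatment_counts.values())  # participants who answered the question
missing_n = sample_size - valid_n         # participants with no recorded answer

print(f"valid responses: {valid_n}, missing: {missing_n}")  # 3532 and 7
for answer, count in treatment_counts.items():
    # Denominator is the number of valid responses, not the full sample size
    print(f"{answer}: {100 * count / valid_n:.1f}%")
```

Dividing by the full sample size of 3,539 instead would give 65.4% and 34.4%, percentages that no longer add to 100% because the seven missing responses are silently left out.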

Sometimes it is of interest to compare two or more groups on the basis of a dichotomous outcome variable.  For example, suppose we wish to compare the extent of treatment with antihypertensive medication in men and women, as summarized in the table below.

Table 3 - Treatment with Antihypertensive Medication in Men and Women

| Sex | Number on Treatment / n | Relative Frequency, % |
|---|---|---|
| Male | 611/1,622 | 37.7 |
| Female | 608/1,910 | 31.8 |
| Total | 1,219/3,532 | 34.5 |

Here, both sex and treatment status are dichotomous variables. Because the numbers of men and women are unequal, the relative frequency of treatment for each sex must be calculated by dividing the number on treatment by the number of valid responses for that sex. The numbers of men and women being treated (the frequencies) are almost identical, but the relative frequencies show that a higher percentage of men than women are being treated (37.7% vs. 31.8%). Note also that the Total row of the rightmost column is not 100.0%, as it was in the previous examples, because it gives the relative frequency of treatment among all participants (men and women) combined.
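The group comparison in Table 3 can be reproduced with a short sketch like the one below. Each group's percentage uses that group's own denominator, and the total row pools the numerators and denominators. The helper `percent` is a hypothetical name used only here.

```python
# (number on treatment, number of valid responses) for each sex, from Table 3
treated_by_sex = {"Male": (611, 1622), "Female": (608, 1910)}

def percent(numerator, denominator):
    return 100 * numerator / denominator

for sex, (treated, n) in treated_by_sex.items():
    # Each group's percentage uses that group's own sample size as the denominator
    print(f"{sex}: {treated}/{n} = {percent(treated, n):.1f}%")

# Pooling both groups gives the overall rate for all participants combined
total_treated = sum(t for t, _ in treated_by_sex.values())
total_n = sum(n for _, n in treated_by_sex.values())
print(f"Total: {total_treated}/{total_n} = {percent(total_treated, total_n):.1f}%")
```

The overall 34.5% lies between the two group rates because it is a weighted average of them, with weights proportional to the number of men and women.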

## Frequency Distribution Tables for Categorical Variables

Recall that categorical variables are those with two or more distinct responses that are unordered. Some examples of categorical variables measured in the Framingham Heart Study include marital status, handedness (right or left) and smoking status. Because the responses are unordered, the categories in the summary table can be listed in any convenient order, for example alphabetically or from the most frequent to the least frequent.
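A minimal sketch of building a frequency distribution from raw responses, and then presenting the same categories in two different orders, might look like this. The responses are made up for illustration and are not Framingham data.

```python
from collections import Counter

# Hypothetical raw responses, for illustration only (not Framingham data)
responses = ["Married", "Single", "Married", "Divorced", "Married", "Widowed", "Single"]

counts = Counter(responses)  # frequency of each distinct response
n = len(responses)

print("Alphabetical order:")
for category in sorted(counts):
    print(f"  {category}: {counts[category]}")

print("Most to least frequent:")
for category, count in counts.most_common():
    print(f"  {category}: {count} ({100 * count / n:.1f}%)")
```

Either ordering is a valid summary for an unordered variable, since reordering the rows does not change any frequency or relative frequency.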

Table 4 below summarizes data on marital status from the Framingham Heart Study. The mutually exclusive and exhaustive categories are shown in the first column of the table. The frequencies, or numbers of participants in each response category, are shown in the middle column and the relative frequencies, as percentages, are shown in the rightmost column.

Table 4 - Frequency Distribution Table for Marital Status

| Marital Status | Frequency | Relative Frequency, % |
|---|---|---|
| Single | 203 | 5.8 |
| Married | 2,580 | 73.1 |
| Widowed | 334 | 9.5 |
| Divorced | 367 | 10.4 |
| Separated | 46 | 1.3 |
| Total | 3,530 | 100.0 |

There are n=3,530 valid responses to the marital status question (9 participants did not provide marital status data). The majority of the sample is married (73.1%), and approximately 10% of the sample is divorced. Another 10% are widowed, 6% are single, and 1% are separated.
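The figures quoted above can be checked with a short sketch based on the counts in Table 4. It also lists the categories from most to least frequent and shows a side effect of rounding: the rounded percentages in Table 4 add to 100.1 rather than 100.0, even though the unrounded relative frequencies add to exactly 100%.

```python
# Frequencies from Table 4
marital_counts = {
    "Single": 203,
    "Married": 2580,
    "Widowed": 334,
    "Divorced": 367,
    "Separated": 46,
}
sample_size = 3539  # participants who completed the 7th examination

valid_n = sum(marital_counts.values())
print(f"valid n = {valid_n}, missing = {sample_size - valid_n}")

# Percentages rounded to one decimal place, as in Table 4
percentages = {k: round(100 * v / valid_n, 1) for k, v in marital_counts.items()}
for status, pct in sorted(percentages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{status}: {pct}%")

# Rounded percentages need not add to exactly 100.0
print(f"sum of rounded percentages: {sum(percentages.values()):.1f}")
```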

## Frequency Distribution Tables for Ordinal Variables

Some discrete variables are inherently ordinal. In addition to inherently ordered categories (e.g., excellent, very good, good, fair, poor), investigators sometimes collect continuously distributed measurements and then categorize them, because categories make clinical decision making easier. For example, the NHLBI (National Heart, Lung, and Blood Institute) and the American Heart Association use the following classification of blood pressure (in mm Hg):

• Normal: systolic blood pressure <120 and diastolic blood pressure <80
• Pre-hypertension: systolic blood pressure between 120-139 or diastolic blood pressure between 80-89
• Stage I hypertension: systolic blood pressure between 140-159 or diastolic blood pressure between 90-99
• Stage II hypertension: systolic blood pressure of 160 or more or diastolic blood pressure of 100 or more
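The classification above can be sketched as a small function. One assumption is made explicit here because the list does not spell it out: when systolic and diastolic pressure fall into different categories (e.g., 135/92), the reading is assigned to the more severe of the two. Readings are assumed to be in mm Hg, and the function name `classify_bp` is hypothetical.

```python
from bisect import bisect_right

# Categories in increasing order of severity
CATEGORIES = ["Normal", "Pre-hypertension", "Stage I hypertension", "Stage II hypertension"]

def classify_bp(systolic, diastolic):
    """Classify a reading in mm Hg.

    Assumption: if the two components fall in different categories,
    the more severe category is used.
    """
    # bisect_right returns how many cut points are <= the value,
    # e.g. 119 -> 0, 120 -> 1, 140 -> 2, 160 -> 3 for systolic pressure
    sbp_level = bisect_right([120, 140, 160], systolic)
    dbp_level = bisect_right([80, 90, 100], diastolic)
    return CATEGORIES[max(sbp_level, dbp_level)]

print(classify_bp(118, 76))  # Normal
print(classify_bp(135, 92))  # Stage I hypertension (driven by the diastolic value)
```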

The American Heart Association uses the following classification for total cholesterol levels:

• Desirable: total cholesterol <200 mg/dL
• Borderline high risk: total cholesterol between 200–239 mg/dL
• High risk: total cholesterol of 240 mg/dL or greater
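Turning a continuous measurement into an ordinal variable amounts to comparing each value against the cut points. Below is a minimal sketch for total cholesterol using the thresholds listed above; the function name and the sample values are hypothetical.

```python
from collections import Counter

def classify_cholesterol(total_mg_dl):
    """Classify total cholesterol (mg/dL) using the cut points listed above."""
    if total_mg_dl < 200:
        return "Desirable"
    if total_mg_dl < 240:
        return "Borderline high risk"
    return "High risk"

# Hypothetical total cholesterol values (mg/dL), for illustration only
values = [182, 199, 200, 215, 239, 240, 263]
tally = Counter(classify_cholesterol(v) for v in values)
print(tally)
```

Once each value has been categorized, the resulting labels can be summarized in a frequency distribution table exactly like any other ordinal variable.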

Body mass index (BMI) is computed as the ratio of weight in kilograms to height in meters squared, and the following categories are often used:

• Underweight: BMI <18.5
• Normal weight: BMI between 18.5-24.9
• Overweight: BMI between 25-29.9
• Obese: BMI of 30 or greater
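The BMI formula and categories can be sketched as follows. As an assumption, the categories are implemented as half-open intervals (e.g., 25 up to but not including 30), so that a value such as 24.95, which falls between the published 24.9 and 25 boundaries, still receives a category. The function names are hypothetical.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2

def classify_bmi(value):
    # Half-open intervals (assumption) so every value maps to exactly one category
    if value < 18.5:
        return "Underweight"
    if value < 25:
        return "Normal weight"
    if value < 30:
        return "Overweight"
    return "Obese"

value = bmi(70, 1.75)                     # 70 / 1.75**2, about 22.9
print(f"{value:.1f}", classify_bmi(value))  # 22.9 Normal weight
```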

These are all examples of common continuous measures that have been categorized to create ordinal variables. The table below is a frequency distribution table for the ordinal blood pressure variable. The mutually exclusive and exhaustive categories are shown in the first column of the table, the frequencies (numbers of participants in each response category) in the second column, and the relative frequencies, as percentages, in the third column. The two rightmost columns contain the cumulative frequencies and the cumulative relative frequencies. The key summary statistics for ordinal variables are relative frequencies and cumulative relative frequencies.

Table 5 - Frequency Distribution for Blood Pressure Category

| Blood Pressure | Frequency | Relative Frequency, % | Cumulative Frequency | Cumulative Relative Frequency, % |
|---|---|---|---|---|
| Normal | 1,206 | 34.1 | 1,206 | 34.1 |
| Pre-Hypertension | 1,452 | 41.1 | 2,658 | 75.2 |
| Stage I Hypertension | 653 | 18.5 | 3,311 | 93.7 |
| Stage II Hypertension | 222 | 6.3 | 3,533 | 100.0 |
| Total | 3,533 | 100.0 | | |

Note that the cumulative frequencies reflect the number of patients at a particular blood pressure level or below. For example, 2,658 patients (1,206 + 1,452) have normal blood pressure or pre-hypertension, and 3,311 patients have normal blood pressure, pre-hypertension or Stage I hypertension. The cumulative relative frequencies are very useful for summarizing ordinal variables and indicate the proportion (between 0 and 1) or percentage (between 0% and 100%) of patients at a particular level or below. In this example, 75.2% of the patients are NOT classified as hypertensive (i.e., they have normal blood pressure or pre-hypertension). Notice that for the last (highest) blood pressure category, the cumulative frequency is equal to the sample size (n=3,533) and the cumulative relative frequency is 100%, indicating that all of the patients are at the highest level or below.
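The cumulative columns of Table 5 are running totals over the categories taken in order from lowest to highest, which is why they only make sense for ordinal variables. A minimal sketch using the frequencies from Table 5:

```python
from itertools import accumulate

# Frequencies from Table 5, listed from the lowest to the highest category
bp_categories = ["Normal", "Pre-Hypertension", "Stage I Hypertension", "Stage II Hypertension"]
bp_counts = [1206, 1452, 653, 222]

n = sum(bp_counts)
cumulative = list(accumulate(bp_counts))  # running totals: this level or below

for category, count, cum in zip(bp_categories, bp_counts, cumulative):
    # columns: frequency, relative %, cumulative frequency, cumulative relative %
    print(f"{category:<22} {count:>5} {100 * count / n:5.1f} {cum:>5} {100 * cum / n:5.1f}")
```

The last running total always equals the sample size, so the last cumulative relative frequency is always 100%.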

Levels of circulating thyroid hormones provide the basis for classifying patients for diagnosis and therapy.  Based on circulating levels, patients are classified as euthyroid (normal levels), hypothyroid (low levels) or hyperthyroid (high levels).  What type of variable is this?

Table 6 - Frequency Distribution Table for Smoking Status

| Smoking Status | Frequency | Relative Frequency, % |
|---|---|---|
| Non-Smoker | 1,330 | 37.6 |
| Former | 1,724 | 48.8 |
| Current | 482 | 13.6 |
| Total | 3,536 | 100.0 |