Enhancing SAS Output With 'Proc Format'


The output generated by a SAS program is often the final product of a lot of hard work, so it is worth formatting the output so that it can be read and understood without further documentation. To improve readability, we can assign descriptions, called formats, to the values of variables in a data step. For example, the variable sex can be formatted so that a value of 1 appears as 'male' in the output, and a value of 2 appears as 'female'.

To format a variable:

  1. Use proc format prior to the data step to define the formats.
  2. In the data step, assign the format to the specified variable(s) using a format statement. The format name must be followed by a period (for example, fname.); without it, SAS reads the name as a variable rather than a format.

 

 

Note: both steps 1 and 2 are needed to format variables.

Note also that you assign a name to the format (fname).

(Step 1)

proc format;

value fname <existing value1>='new value1'

<existing value2>='new value2'

…;

run;

 

data name1;

input var1 var2 $ var3;

(Step 2)

format var1 fname. var3 fname.;

<Programming Statements>;

datalines;

<Data Matrix>

;

run;
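
As a minimal sketch of the two steps, the template above might be filled in as follows. The format name yesnof, the data set name demo, the variable names, and the three data lines are all made up for illustration.

```sas
/* Step 1: define a format (the name yesnof is hypothetical) */
proc format;
  value yesnof 0='No' 1='Yes';
run;

/* Step 2: assign the format in the data step; note the trailing period */
data demo;
  input id smoker exercise;
  format smoker exercise yesnof.;
  datalines;
1 0 1
2 1 1
3 1 0
;
run;

/* The values of smoker and exercise are displayed as No or Yes */
proc print data=demo;
run;
```

The same format is assigned to two variables here, which works because both are coded 0/1.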

 

Example:

Formatting Numeric Variables

This example is modified from Cody and Smith, Chapter 3, Section C. Note that the format statement is placed before the cards (or datalines) statement.

 

(Step 1)

proc format;

/* Creates a format called sexfmt with values 'Male' and 'Female' for the numeric values 1 and 2 */

value sexfmt 1='Male' 2='Female';

value racef 1='White' 2='African-Amer' 3='Hispanic' 4='Other';

value maritalf 1='Single' 2='Married' 3='Widowed' 4='Divorced';

value educf 1='High Sch or Less' 2='2 Yr College' 3='4 Yr College' 4='Graduate Degree';

value likertf 1='Strgly Disagree' 2='Disagree'

3='Neutral' 4='Agree' 5='Strgly Agree';

run;

 

data questionnaire;

input id age gender race marital educ pres arms cities;

 

/* assign formats to variables */

(Step 2)

format gender sexfmt. race racef. marital maritalf.

educ educf. pres arms cities likertf.;

 

/* note that the format "likertf" is being applied to three variables */

cards;

;

run;

  

/* Tables of formatted variables */

proc freq data=questionnaire;

tables gender race marital educ pres arms cities;

run;

 

 

 

Example:

Formatting Character Variables

A format name for a character variable must begin with a dollar sign ($), as in $sexf below. Like any format name, it also cannot end in a number.

/*Remember that proc format just creates the formats, no variable gets formatted until the format is assigned to a variable in a data step*/

(Step 1)

proc format;

value $sexf 'm'='male' 'f'='female';

value agegrpf 1='1= 45 and under' 2='2= older than 45';

run;

 

 

data one;

input id name $ sex $ age weight height x1 x2 x3 x4;

if (0 le age le 45) then agegroup=1;

else if age gt 45 then agegroup=2;

(Step 2)

format sex $sexf. agegroup agegrpf.; /* assign formats*/

cards;

;

run;

 

proc freq data=one;

tables sex agegroup;

run;
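
The if-then statements above create agegroup only so that it can be labeled. As an alternative sketch (not the approach used in the example above), a value statement also accepts ranges, so a format can be applied to age directly. The format name agef, the data set name ages, and the data lines are hypothetical.

```sas
proc format;
  /* low-45 covers nonmissing ages up to and including 45;
     45<-high covers ages strictly greater than 45 */
  value agef low-45='45 and under'
             45<-high='older than 45';
run;

data ages;
  input id age;
  datalines;
1 30
2 45
3 62
;
run;

/* proc freq groups observations by the formatted value of age */
proc freq data=ages;
  tables age;
  format age agef.;
run;
```

Unlike the if-then version, which requires age to be at least 0, the low-45 range would also place negative ages in the first group.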

 

 

Note: Formats do not replace the actual values of the variable. When using statements that use the values of the variable, such as if-then statements, the actual values of the variable must be used.

For example, to select a subsample of male subjects from the above data step, write:

 

(CORRECT)

data males;

set one;

if sex='m';

run;

 

(INCORRECT)

data males;

set one;

if sex='male';

run;
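
If it is more convenient to select on the label than on the stored value, one option is the put function, which returns the formatted value as a character string. The sketch below redefines the $sexf format from the example above so that it runs on its own; the data set name one_demo and its data lines are hypothetical.

```sas
proc format;
  value $sexf 'm'='male' 'f'='female';
run;

data one_demo;
  input id sex $;
  format sex $sexf.;
  datalines;
1 m
2 f
3 m
;
run;

data males;
  set one_demo;
  /* put() converts the stored value 'm' to its label 'male' */
  if put(sex, $sexf.) = 'male';
run;

/* A format statement with no format name removes the format,
   so this listing shows the stored values (m), not the labels */
proc print data=males;
  format sex;
run;
```

The stored value of sex is still 'm' in the new data set; only the comparison uses the label.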

 

Example: Final Summary

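In this example we will:

  1. Define formats for smoking status, race, and risk category.
  2. Create a new permanent data set with derived variables for total cholesterol, race, and risk.
  3. Assign formats and labels to the variables.
  4. Print the ten youngest and ten oldest subjects, and tabulate risk by race.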

options pageno=1 nodate ps=55 ls=87;

libname perm 'c:\temp';

 

proc format;

value smkf 0='nonsmoker' 1='smoker';

value racef 1='African American' 2='White' 3='Other';

value $riskf 'l'='Low risk' 'med'='Moderate risk' 'hi'='High risk';

run;

 

data perm.newhers; /* create a new permanent data set in c:\temp */

set perm.hers;

 

* create a variable for total cholesterol;

chol=ldl+hdl;

 

* Create variable for race category;

if nonwhite = . or afr_amer = . then race=.;

else if afr_amer=1 then race=1;

else if nonwhite=0 then race=2;

else race=3;

 

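* Create a risk variable;

/* set the length first: otherwise risk=' ' would fix the length at 1 and truncate 'med' and 'hi' */

length risk $ 3;

if chol =. or smoking=. then risk=' ';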

else if chol < 240 and smoking = 0 then risk ='l';

else if (chol < 240 and smoking = 1) or (chol ge 240 and smoking = 0) then risk='med';

else risk='hi';

 

format smoking smkf. race racef. risk $riskf.;

 

label risk="Risk category based on cholesterol and smoking status"

chol="Total cholesterol"

race="Racial group";

 

* only keep the variables that we will be using here;

keep chol age race risk smoking;

run;

 

* look at the ten youngest subjects in the data set;

proc sort data=perm.newhers;

by age;

run;

 

proc print data=perm.newhers (obs=10) noobs;

var age risk race chol smoking;

run;

 

* look at the ten oldest subjects in the data set;

proc sort data=perm.newhers;

by descending age;

run;

 

proc print data=perm.newhers (obs=10) noobs;

var age risk race chol smoking;

run;

* look at the risk by race tabulation;

proc freq data=perm.newhers;

tables risk*race;

run;