SAS Basics - Part 2

Module 3: SAS Programming Basics - Part 2

 

Introduction


This module continues the introduction to basic but important and frequently used commands and operations in SAS.

Learning Objectives


After completing this module, the student will be able to:

 

 

 

'Title' & 'Footnote' Statements


To make your output easier to read, you can use the title statement to create output page headers and the footnote statement to create output page footers.

 

Title and footnote statements must come BEFORE or INSIDE the procedure for which they are to appear.

When a title or footnote statement reuses a number, it replaces the earlier title or footnote with that number.

When different title or footnote numbers are used, as in the examples below, the titles will appear one after another in the output. There can be as many as 10 titles or footnotes.

 

title1 'you supply1';

title2 'you supply2';

 

footnote1 'you supply1';

footnote2 'you supply2';

footnote3 'you supply3';

The titles and footnotes will continue to appear for all subsequent procedure output until replaced. Follow the example code below to erase all titles and footnotes.

 

title ;

footnote ;
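
The replace-and-clear behavior described above can be sketched as follows. This is only an illustration: it assumes the sample data set sashelp.class that ships with SAS (it is not one of this module's data sets), and the title text is made up.

```sas
/* Sketch: sashelp.class is a sample data set shipped with SAS */
title1 'Class data';
title2 'First listing';
footnote 'Illustration only';

proc print data=sashelp.class (obs=3);
run;

/* Reusing title2 replaces the earlier title2;
   title1 and the footnote carry over unchanged */
title2 'Second listing';

proc print data=sashelp.class (obs=3);
run;

/* Clear all titles and footnotes */
title;
footnote;
```

The second listing should show 'Class data' and 'Second listing' as its two title lines.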

Example:

data sgakids;

input id grp age glucose lactate alanine;

cards;

;

run;

 

title1 ' Basic summary statistics ';

title2 ' PROC MEANS - glucose, lactate and alanine';

footnote ' SGA Kids ';

proc means data=sgakids;

var glucose lactate alanine;

run;

 

Note that the footnote actually appears at the bottom of the page; here we moved it up so you can see it…

 

'proc univariate'


proc univariate produces descriptive statistics on continuous variables just like proc means, but many more of them, and also can produce some univariate plots.

 

proc univariate data=name;

var var1 var2;

run;

 

Example:

 

title2 ' PROC UNIVARIATE - glucose ';

proc univariate data=sgakids;

var glucose;

run;

Notice that the title1 we used above will continue to be the first title, but we have reset the second title. The footnote will also continue to be displayed in your output (although we will not copy it here).
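
As a minimal sketch of the plotting side of proc univariate, the plots option on the proc statement requests the basic univariate plots alongside the descriptive statistics. The example assumes the sample data set sashelp.class that ships with SAS, and the exact appearance of the plots depends on your SAS version and output settings.

```sas
/* Sketch: sashelp.class is a sample data set shipped with SAS */
title2 'PROC UNIVARIATE - height';

proc univariate data=sashelp.class plots;
  var height;
run;
```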

 

'proc chart' and 'proc freq'


proc chart is used to construct histograms for continuous variables or bar charts for categorical (or discrete) variables.

proc freq is used to produce frequency tables (categorical data only).

Histograms

 

 

proc chart data=name;

vbar var1 var2;

run;

vbar tells SAS to produce a vertical bar chart/histogram. To produce a horizontal bar chart/histogram replace vbar with hbar.

Recall the Dixon and Massey example data set from the first module. [Note: The 'dixonmassey' data set is from Dixon WJ and Massey FJ Jr.: Introduction to Statistical Analysis, Fourth Edition, McGraw Hill Book Company, 1983.]

 

data dixonmassey;

input Obs chol52 chol62 age cor dchol agelt50;

datalines;

;

Suppose we would like histograms of cholesterol in 1952, and a bar chart of coronary events.

proc chart;

vbar chol52;

run;

To plot this horizontally you would use the following: 

proc chart;

hbar chol52;

run;

 

You might notice these plots are not terribly attractive. You can instead use proc gchart, which operates in the same way but produces nicer looking figures.

proc gchart;

vbar chol52;

run;

 

 

 


 

Frequency Tables

To generate frequency tables you can use the generic commands:

 

proc freq data=name;

tables var;

run;

 

Example:

 

proc freq;

tables cor;

run;

 

 

Bar Charts

The discrete option tells SAS that the variable is discrete, so that each distinct value gets its own bar. Note that either vbar or hbar can be used.

 

The generic form is:

proc chart data=name;

hbar var/discrete;

run;

 

Example:

proc chart;

hbar cor/discrete;

run;

 

 

Here is what it would look like with gchart.

proc gchart;

vbar cor/discrete;

run;
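
To see what the discrete option changes, the following sketch charts the same variable with and without it. It assumes the sample data set sashelp.class that ships with SAS, in which age takes a small number of whole-number values.

```sas
/* Sketch: sashelp.class is a sample data set shipped with SAS */

/* Without discrete, age is treated as continuous and grouped around midpoints */
proc chart data=sashelp.class;
  vbar age;
run;

/* With discrete, each distinct value of age gets its own bar */
proc chart data=sashelp.class;
  vbar age / discrete;
run;
```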

  

'label' Statement


You can add variable labels using the label statement as shown here.

label var1='you supply label' var2='you supply label';

 

Variable labels are used to provide a fuller description of a variable than the sometimes cryptic variable names.

 

label sbp1999='Systolic Blood Pressure in 1999';

When a label statement is placed in a data step, the label stays with the variable for all subsequent procedures, unless relabeled. When placed in a procedure the label only stays attached to the variable for that procedure.
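
The difference between the two placements can be sketched as follows. The example assumes the sample data set sashelp.class that ships with SAS; the data set name class2 and the label text are made up for illustration.

```sas
/* Sketch: sashelp.class is a sample data set shipped with SAS */
data class2;
  set sashelp.class;
  label height='Height in inches';   /* stored with the data set */
run;

proc print data=class2 label;
  label weight='Weight in pounds';   /* applies to this procedure only */
  var name height weight;
run;

/* Here height still shows its label, but weight does not */
proc print data=class2 label;
  var name height weight;
run;
```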

 

Use double quotes if there is to be a single quote in the label. For example,

 

label mombp="mother's systolic bld pressure";

 

You can label multiple variables in one label statement.

label var1='you supply label' var2='you supply label';

Comments


You can make your program easier to follow by including comments. They will not appear in your output.

Comments either start with * and end with ; or are placed inside /* */.

They appear in green in your program.

 

Example:

 

title 'STUDY OF GENETIC COMPONENT OF HYPERTENSION';

footnote 'results of example for class 3';

 

data family;

input id mombp dadbp kidbp;

parentsbp=(mombp + dadbp) / 2; * CREATING A NEW VARIABLE;

 

/*Since a SAS statement ends with a semicolon (;), a statement may last over several lines like the label statement below. Or several SAS statements can be on the same line, like in the proc print step below.*/

 

label mombp="mother's systolic bld pressure"

parentsbp ="average of parents' bld pressure"

id='identification number for family';

datalines;

;

run;

 

**use the "label" option in proc print to print out the variable labels**;

proc print data=family label; var mombp dadbp parentsbp; run;

 

 

proc chart data=family;

vbar kidbp;

title2; ** erases title2;

run;

 

proc gchart data=family;

vbar kidbp;run;

 

STUDY OF GENETIC COMPONENT OF HYPERTENSION

 

 


 

 

Selecting Subsets of Variables Using 'keep' and 'drop' Statements


You can create a new data set with only a subset of the variables in the original data set using a keep or drop statement.

Suppose you want to print just three of the variables in this data set: studyid, age, and height.

 

data one;

input studyid name $ sex $ age weight height;

cards;

run;

Using 'var'

You can do this by specifying the variables in the var statement in proc print.

 

proc print data=one;

var studyid age height;

run;

Using 'keep'

However, you might want to do a lot of analyses on just those variables, or may want to have a data set with no identifying information such as subject name. If so, another way to do this is to use a keep statement to create a new data set only with the selected variables.

 

data two;set one;

keep studyid age height;

proc print;

run;

 

Using 'drop' 

Yet another way to do this is to use a drop statement to drop the other variables from your new data set.

 

data three;set one;

drop name sex weight;

proc print;

run;

 

These will all produce the same output.

Let's look at the log produced by running these three methods.

 

407 data one;

408 input studyid name $ sex $ age weight height;

409 cards;

 

NOTE: The data set WORK.ONE has 11 observations and 6 variables.

 

425 data two;set one;

426 keep studyid age height;

427 run;

 

NOTE: There were 11 observations read from the data set WORK.ONE.

NOTE: The data set WORK.TWO has 11 observations and 3 variables.

 

430

431 data three;set one;

432 drop name sex weight;

433 run;

 

NOTE: There were 11 observations read from the data set WORK.ONE.

NOTE: The data set WORK.THREE has 11 observations and 3 variables.
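
One way to confirm that keep and drop leave the same variables is to list the contents of each result. This sketch assumes the sample data set sashelp.class that ships with SAS (variables Name, Sex, Age, Height, Weight); the data set names keepers and droppers are made up.

```sas
/* Sketch: sashelp.class is a sample data set shipped with SAS */
data keepers;
  set sashelp.class;
  keep name age height;
run;

data droppers;
  set sashelp.class;
  drop sex weight;
run;

/* Both listings should show the same three variables */
proc contents data=keepers;
run;

proc contents data=droppers;
run;
```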

 

Selecting Subsets of Observations Using 'if' and 'where' Statements


You can create a new data set with only a subset of the observations in the original data set using an if or where statement.

To create a new data set that only includes a subset of observations from an existing data set, use a set statement in conjunction with a subsetting if statement; this is often called a "select if" statement. The set statement reads in a copy of the original data set, and the if statement keeps only the observations that meet its condition.

The where statement can be used equivalently in a data step (we will see that it can also be used in procs, while the if statement is specific to data steps).

Example:

There is a data set called pbkid which we will describe in detail in the next module. For now, assume it is a SAS data set with 76 boys and 48 girls.

First, we select only girls, i.e., those with sex=2, using an if statement (or a "select if"), and produce their mean IQ score.

 

data pbf;

set pbkid;

if sex=2;

run;

 

proc means data=pbf;

var iq;

run;

 

Next, we select only boys, those with sex=1, using a where statement, and produce their mean IQ score:

 

data pbm;

set pbkid;

where sex=1;

run;

 

proc means data=pbm;

var iq;

run;
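
The two statements give the same subset here, but they are not interchangeable in every situation: a where condition is evaluated against the variables of the input data set, while a subsetting if can also test a variable created earlier in the same data step. A minimal sketch, assuming the sample data set sashelp.class that ships with SAS (the cut-off of 160 and the names tall_if, tall_where, and height_cm are made up):

```sas
/* Sketch: sashelp.class is a sample data set shipped with SAS */

/* Subsetting if: can test a variable created in this data step */
data tall_if;
  set sashelp.class;
  height_cm = height * 2.54;
  if height_cm > 160;
run;

/* where: refers to variables already in the input data set, so the
   condition is written in terms of height rather than height_cm */
data tall_where;
  set sashelp.class;
  where height * 2.54 > 160;
  height_cm = height * 2.54;
run;
```

Both data sets should contain the same observations; checking the log notes for each is a quick way to compare them.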

 

The 'where' Statement in Procs

Above, to produce statistics on a subset of our observations only, we created a subset data set using an if (or where) statement in the data step, and then applied the proc.

Alternatively, we can use a where statement directly in the proc as shown below.

 

proc means data=pbkid;

var iq;

where sex = 1;

title1 'MALE (1) IQ SCORES';

run;

proc means data=pbkid;

var iq;

where sex=2;

title1 'FEMALE (2) IQ SCORES';

run;

'proc sort', and the 'by' Statement


proc sort is the main tool for sorting a data set in SAS. The general format is as follows:

proc sort data=<name of data>;

by <name of variable>;

run;

Sorting by a Single Variable (default: ascending order)

data one;

input studyid name $ sex $ age weight height;

cards;

run;

 

proc sort data=one;

by weight;

run;

/*will sort data one by the variable weight in ascending order */

proc print data=one;

run;

 

• When sorted in ascending order (default), missing values are listed first because SAS treats a numeric missing value as smaller than any number.

• Sorting a data set is required when using a BY statement in a procedure as shown below.

The 'BY' Statement

The 'BY' statement instructs SAS to apply the SAS procedure for each subset of data as defined by the different values of the variable specified in the BY statement, and this works in the majority of SAS procedures. The general format is as follows:

 

proc <name of SAS Procedure> data=<name of data>;

<SAS Statements>

by <variable name>;

run;

IMPORTANT: Sorting is necessary when using a BY statement in a procedure. If the data set is not sorted an error message will appear in the Log File. Remember to always examine the Log File after running SAS data steps and procedures.

Example:

 

/* First sort the data */

proc sort data=one;

by sex;

run;

 

proc print data=one;

by sex;

run;

 

proc means data=one;

by sex;

var age weight height;

run;

 

 

Sorting in Descending Order by a Single Variable

Example:

 

proc sort data=one;

by descending height;

run;

 

proc print data=one;

id studyid;

var name age height;

run;
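
The by statement in proc sort can also list more than one variable, with descending applying only to the variable that follows it. The sketch below assumes the sample data set sashelp.class that ships with SAS; the out= option writes the sorted copy to a new data set (here the made-up name class_sorted) instead of overwriting the input.

```sas
/* Sketch: sashelp.class is a sample data set shipped with SAS */
proc sort data=sashelp.class out=class_sorted;
  by sex descending age;   /* sex ascending, then age from oldest to youngest */
run;

proc print data=class_sorted;
  var sex age name;
run;
```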

 

Enhancing SAS Output With 'Proc Format'


The output generated by a SAS program is often the final product of lots of hard work. Therefore, it is often useful to format the output so that it can be read and understood without further documentation. To improve the readability of output, we can assign descriptions called formats to the values of variables included in a data step. For example, the variable sex can be formatted so that a value of 1 appears as 'male' in the output, and a value of 2 appears as 'female'.

To format a variable:

  1. Use proc format prior to the data step to define the formats.
  2. In the data step, assign the format to the specified variable(s) using a format statement. Here, the format name must be followed by a '.' in order to run.

 

 

Note: both steps 1 and 2 are needed to format variables.

Note also that you assign a name to the format (fname).

(Step 1)

proc format;

value fname <existing value1>='new value1'

<existing value2>='new value2'

…;

run;

 

data name1;

input var1 var2 $ var3;

(Step 2)

format var1 fname. var3 fname.;

<Programming Statements>;

datalines;

<Data Matrix>

run;

 

Example:

Formatting Numeric Variables

This example is modified from Cody and Smith, Chapter 3, Section C. Note that the format statement is placed before the cards (or datalines) statement.

 

(Step 1)

proc format;

/*Creates a format called sexfmt with values 'Male' and 'Female' for the numeric values 1 and 2*/

value sexfmt 1='Male' 2='Female';

value racef 1='White' 2='African-Amer' 3='Hispanic' 4='Other';

value maritalf 1='Single' 2='Married' 3='Widowed' 4='Divorced';

value educf 1='High Sch or Less' 2='2 Yr College' 3='4 Yr College' 4='Graduate Degree';

value likertf 1='Strgly Disagree' 2='Disagree'

3='Neutral' 4='Agree' 5='Strgly Agree';

run;

 

data questionnaire;

input id age gender race marital educ pres arms cities;

 

/* assign formats to variables */

(Step 2)

format gender sexfmt. race racef. marital maritalf.

educ educf. pres arms cities likertf.;

 

/* note that the format "likertf" is being applied to three variables */

cards;

;

run;

  

/* Tables of formatted variables */

proc freq data=questionnaire;

tables gender race marital educ pres arms cities;

run;

 

 

 

Example:

Formatting Character Variables

A format name for a character variable must begin with a dollar sign ($), as in $sexf below.

/*Remember that proc format just creates the formats, no variable gets formatted until the format is assigned to a variable in a data step*/

(Step 1)

proc format;

value $sexf 'm'='male' 'f'='female';

value agegrpf 1='1= 45 and under' 2='2= older than 45';

run;

 

 

data one;

input id name $ sex $ age weight height x1 x2 x3 x4;

if (0 le age le 45) then agegroup=1;

else if age gt 45 then agegroup=2;

(Step 2)

format sex $sexf. agegroup agegrpf.; /* assign formats*/

cards;

;

run;

 

proc freq data=one;

tables sex agegroup;

run;
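
As an alternative sketch to creating a grouping variable with if-then statements, a value statement can map ranges of a numeric variable directly to descriptions. The example assumes the sample data set sashelp.class that ships with SAS; the format name agecutf and the cut-off of 13 are made up for illustration. Here the format statement is placed in the procedure, where it applies for that procedure only.

```sas
/* Sketch: sashelp.class is a sample data set shipped with SAS */
proc format;
  value agecutf low-13   = '13 and under'
                13<-high = 'older than 13';
run;

proc freq data=sashelp.class;
  tables age;
  format age agecutf.;   /* the table is grouped by the formatted values */
run;
```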

 

 

Note: Formats do not replace the actual values of the variable. When using statements that use the values of the variable, such as if-then statements, the actual values of the variable must be used.

For example, to select a subsample of male subjects from the above data step, write:

 

(CORRECT)

data males;

set one;

if sex='m';

run;

 

(INCORRECT)

data males;

set one;

if sex='male';

run;
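
The distinction between stored and displayed values can be sketched on a runnable example. It assumes the sample data set sashelp.class that ships with SAS, where sex is stored as 'M' or 'F'; the format name $clsexf and the variable sex_text are made up. The put function is used here only to copy the formatted text into a new variable so that both can be printed side by side.

```sas
/* Sketch: sashelp.class is a sample data set shipped with SAS */
proc format;
  value $clsexf 'M'='male' 'F'='female';
run;

data males;
  set sashelp.class;
  format sex $clsexf.;
  if sex = 'M';                     /* compare against the stored value */
  sex_text = put(sex, $clsexf.);    /* formatted text as a new variable */
run;

proc print data=males;
  var name sex sex_text;
run;
```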

 

Example: Final Summary

In this example we will:

options pageno=1 nodate ps=55 ls=87;

libname perm 'c:\temp';

 

proc format;

value smkf 0='nonsmoker' 1='smoker';

value racef 1='African American' 2='White' 3='Other';

value $riskf 'l'='Low risk' 'med'='Moderate risk' 'hi'='High risk';

run;

 

data perm.newhers; /* create a new permanent data set in c:\temp */

set perm.hers;

 

* create a variable for total cholesterol;

chol=ldl+hdl;

 

* Create variable for race category;

if nonwhite = . or afr_amer = . then race=.;

else if afr_amer=1 then race=1;

else if nonwhite=0 then race=2;

else race=3;

 

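* Create a risk variable;

/* set the length before the first assignment so that 'med' and 'hi' are not truncated to one character */

length risk $ 3;

if chol =. or smoking=. then risk=' ';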

else if chol < 240 and smoking = 0 then risk ='l';

else if (chol < 240 and smoking = 1) or (chol ge 240 and smoking = 0) then risk='med';

else risk='hi';

 

format smoking smkf. race racef. risk $riskf.;

 

label risk="Risk category based on cholesterol and smoking status"

chol="Total cholesterol"

race="Racial group";

 

* only keep the variables that we will be using here;

keep chol age race risk smoking;

run;

 

* look at the ten youngest subjects in the data set;

proc sort data=perm.newhers;

by age;

run;

 

proc print data=perm.newhers (obs=10) noobs;

var age risk race chol smoking;

run;

 

* look at the ten oldest subjects in the data set;

proc sort data=perm.newhers;

by descending age;

run;

 

proc print data=perm.newhers (obs=10) noobs;

var age risk race chol smoking;

run;

* look at the risk by race tabulation;

proc freq data=perm.newhers;

tables risk*race;

run;

 

 

 

 

Using the Output Delivery System (ODS)


SAS features a complicated but powerful mechanism for formatting the somewhat voluminous reports that it produces. The Output Delivery System (ODS) can write the results of analyses as HTML (for the web), PostScript, or rich text format (RTF), which Microsoft Word can open.

For now, we will use ODS only to produce results that can be copied easily into Word documents. To do this, add the following statements around the procedures you want ODS to output to a file:

 

ODS RTF FILE='c:\temp\class2.RTF';

 

proc print data=one;

var name sex age;

id studyid;

run;

ODS RTF CLOSE;

The output will be saved in the temp folder in an RTF file called 'class2.rtf'.

You can enclose your entire program with the two statements if you want all your output in one file, or you can save separate files for separate parts of your results.
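
Saving separate files for separate parts of the results might look like the sketch below. It assumes the sample data set sashelp.class that ships with SAS; the file names listing.rtf and summary.rtf are made up, and the folder should be changed to one that exists on your machine.

```sas
/* Sketch: hypothetical file names; sashelp.class ships with SAS */
ods rtf file='c:\temp\listing.rtf';

proc print data=sashelp.class (obs=5);
run;

ods rtf close;

ods rtf file='c:\temp\summary.rtf';

proc means data=sashelp.class;
  var age height weight;
run;

ods rtf close;
```

Each ods rtf file= statement opens a new file, and the matching ods rtf close statement finishes it, so every procedure between the pair goes to that file.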

 

Using SAS Help


SAS offers extensive online help, which can be accessed from the Help menu.

 

 

SAS Products and Learning to Use SAS are helpful menus. The Learning to Use SAS menu provides links to learning resources such as sample programs, web resources, and tutorials. The SAS Products menu provides links for different SAS products. In this class, only elements from Base SAS and SAS/STAT will be covered.

 Base SAS has help for commands, statements, and procedures dealing with data manipulation and basic summary statistics; SAS/STAT has help for procedures carrying out advanced statistical methods.

 

Example:

To find help for PROC MEANS, click SAS Products>Base SAS>SAS Procedures>Procedures>The MEANS Procedure

 

 

Note that the help for proc means is under Base SAS because the means procedure provides basic summary statistics.

The Index tab provides a keyword searchable index of the SAS commands and procedures.