Selecting Subsets of Observations Using 'if' and 'where' Statements


You can create a new data set with only a subset of the observations in the original data set using an if or where statement.

To create a new data set that includes only a subset of the observations from an existing data set, use a set statement together with a subsetting if statement; this is often called a "select if" statement. The set statement reads the observations from the original data set, and the if statement keeps only those that meet its condition.

For a condition on variables that already exist in the input data set, the where statement gives the same result in a data step. We will see that where can also be used in procs, while the subsetting if statement is specific to data steps.

Example:

There is a data set called pbkid which we will describe in detail in the next module. For now, assume it is a SAS data set with 76 boys and 48 girls.

First, we select only girls, i.e., those with sex=2, using an if statement (or a "select if"), and produce their mean IQ score.

 

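data pbf;
  set pbkid;    /* read the observations from pbkid */
  if sex=2;     /* subsetting if: keep only girls */
run;

proc means data=pbf;
  var iq;       /* summary statistics for iq, including the mean */
run;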
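
One way to confirm that the subset worked is to tabulate sex in the new data set. The sketch below uses proc freq, which is not otherwise covered in this section. Given the counts stated above, the table should show a single row, sex=2, with 48 observations.

proc freq data=pbf;
  tables sex;   /* only the value 2 should appear */
run;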

 

Next, we select only boys, those with sex=1, using a where statement, and produce their mean IQ score:

 

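data pbm;
  set pbkid;    /* read the observations from pbkid */
  where sex=1;  /* keep only boys */
run;

proc means data=pbm;
  var iq;       /* summary statistics for iq, including the mean */
run;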
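
The if and where statements stop being interchangeable once the condition involves a variable created inside the data step. The following is a minimal sketch of that case; the data set name pbhigh, the variable highiq, and the cutoff of 100 are hypothetical and chosen only for illustration.

data pbhigh;
  set pbkid;
  highiq = (iq > 100);  /* 1 if iq is above the hypothetical cutoff, otherwise 0 */
  if highiq = 1;        /* works: the subsetting if runs after highiq is computed */
run;

Replacing the if statement with where highiq = 1; would produce an error, because where is applied as observations are read from pbkid, and highiq does not exist in that data set.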

 

The 'where' Statement in Procs

Above, to produce statistics on a subset of our observations only, we created a subset data set using an if (or where) statement in the data step, and then applied the proc.

Alternatively, we can use a where statement directly in the proc as shown below.

 

proc means data=pbkid;
  var iq;
  where sex=1;  /* boys only */
  title1 'MALE (1) IQ SCORES';
run;

proc means data=pbkid;
  var iq;
  where sex=2;  /* girls only */
  title1 'FEMALE (2) IQ SCORES';
run;
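
The where statement is not limited to proc means. As a small sketch of the same idea in another proc, the step below uses proc print to list only the girls' records, assuming the same coding of sex as above. No subset data set is created.

proc print data=pbkid;
  where sex=2;  /* list girls only */
  var sex iq;   /* show just these two variables */
  title1 'FEMALE (2) RECORDS';
run;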