Selecting Subsets of Observations Using 'if' and 'where' Statements
You can create a new data set with only a subset of the observations in the original data set using an if or where statement.
To create such a subset, use a set statement together with a subsetting if statement; this is sometimes called a "select if" statement. The set statement reads the observations from the original data set, and the if statement keeps only the observations that satisfy its condition.
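The where statement can be used in the same way in a data step, provided its condition refers only to variables that already exist in the input data set. Unlike the subsetting if statement, which is specific to data steps, the where statement can also be used in procs, as we will see.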
Example:
There is a data set called pbkid, which we will describe in detail in the next module. For now, assume it is a SAS data set with 76 boys and 48 girls.
First, we select only girls, i.e., those with sex=2, using an if statement (or a "select if"), and produce their mean IQ score.
data pbf;
set pbkid;
if sex=2;
run;
proc means data=pbf;
var iq;
run;
Next, we select only boys, those with sex=1, using a where statement, and produce their mean IQ score:
data pbm;
set pbkid;
where sex=1;
run;
proc means data=pbm;
var iq;
run;
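The two statements are not interchangeable in every situation. A where statement filters observations as they are read from the input data set, so it can refer only to variables that already exist there; a subsetting if is evaluated inside the data step and can also use variables created in that step. The sketch below illustrates this under stated assumptions: the cutoff of 100 and the names pbhigh and highiq are hypothetical and do not come from the pbkid data.

```sas
data pbhigh;            /* hypothetical output data set name */
set pbkid;
highiq = (iq >= 100);   /* hypothetical indicator: 1 if iq is at least 100, otherwise 0 */
if highiq = 1;          /* a subsetting if can use the newly created variable */
/* where highiq = 1;       would be an error here: highiq is not a variable in pbkid */
run;
```

When the condition involves only existing variables, such as sex, either statement selects the same observations.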
The 'where' Statement in Procs
Above, to produce statistics on a subset of our observations only, we created a subset data set using an if (or where) statement in the data step, and then applied the proc.
Alternatively, we can use a where statement directly in the proc as shown below.
proc means data=pbkid;
var iq;
where sex = 1;
title1 'MALE (1) IQ SCORES';
run;
proc means data=pbkid;
var iq;
where sex=2;
title1 'FEMALE (2) IQ SCORES';
run;
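The same selection can also be written as a where= data set option attached to the data set name, which keeps the condition next to the data it applies to. This is offered only as an alternative sketch of the girls' summary above, not as the approach these notes rely on.

```sas
proc means data=pbkid(where=(sex=2));   /* where= data set option in place of a where statement */
var iq;
title1 'FEMALE (2) IQ SCORES';
run;
```

Note that a title1 statement stays in effect for later output until it is replaced or cleared with an empty title1; statement.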