Selecting Subsets of Variables Using 'keep' and 'drop' Statements


You can create a new data set with only a subset of the variables in the original data set using a keep or drop statement.

Suppose you want to print just three of the variables in this data set: studyid, age, and height.

 

data one;

input studyid name $ sex $ age weight height;

cards;

run;

Using 'var'

You can do this by specifying the variables in the var statement in proc print.

 

proc print data=one;

var studyid age height;

run;

Using 'keep'

However, you might want to run many analyses on just those variables, or you might want a data set with no identifying information such as subject name. In that case, you can use a keep statement to create a new data set that contains only the selected variables.

 

data two;

set one;

keep studyid age height;

run;

proc print data=two;

run;

 

Using 'drop' 

Yet another way to do this is to use a drop statement to drop the other variables from your new data set.

 

data three;

set one;

drop name sex weight;

run;

proc print data=three;

run;

 

All three methods print the same output: studyid, age, and height for every observation.
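One way to check that the keep and drop versions really contain the same data is proc compare. The following is a minimal sketch, not part of the original example: the two rows of data are hypothetical values made up for illustration, and only the variable names and the keep and drop statements come from the text above.

```sas
/* Hypothetical rows for illustration only; not the original data. */
data one;
  input studyid name $ sex $ age weight height;
  datalines;
1 Ann F 34 130 64
2 Bob M 41 180 70
;
run;

/* Subset by listing the variables to keep. */
data two;
  set one;
  keep studyid age height;
run;

/* Subset by listing the variables to drop. */
data three;
  set one;
  drop name sex weight;
run;

/* Compare the two subsets variable by variable and value by value. */
proc compare base=two compare=three;
run;
```

If the two data sets match, the proc compare report should indicate that no unequal values were found.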

Let's look at the log produced by creating the three data sets. It shows that WORK.ONE has 6 variables, while WORK.TWO and WORK.THREE each have 3.

 

407 data one;

408 input studyid name $ sex $ age weight height;

409 cards;

 

NOTE: The data set WORK.ONE has 11 observations and 6 variables.

 

425 data two;set one;

426 keep studyid age height;

427 run;

 

NOTE: There were 11 observations read from the data set WORK.ONE.

NOTE: The data set WORK.TWO has 11 observations and 3 variables.

 

430

431 data three;set one;

432 drop name sex weight;

433 run;

 

NOTE: There were 11 observations read from the data set WORK.ONE.

NOTE: The data set WORK.THREE has 11 observations and 3 variables.