The Data Step


The general format of a data step is as follows:

data name1;

input var1 var2 $ var3;

<Programming Statements>;

datalines;

<Data Matrix>

;

run;

 

Line 1: The first line names the new data set. Here the data set is called name1. A data set name may be up to 32 characters long, must begin with a letter or an underscore, and may contain only letters, digits, and underscores.

Line 2: The input statement lists the variables in the data set. Here there are three variables, named var1, var2, and var3. SAS distinguishes between numeric variables and character variables. For a character variable, a dollar sign '$' must follow the variable name (as for var2 above). Variable names follow the same rules as data set names: up to 32 characters, beginning with a letter or an underscore, and containing only letters, digits, and underscores.

Line 3: There may be many lines of programming statements between the input statement and the datalines statement. Programming statements are used to manipulate the variables in the data set, create new variables, label and format variables, and exclude observations from the data set.

Line 4: The datalines statement tells SAS that the data lines follow immediately. The older keyword cards may be used instead of datalines.

Line 5: The data matrix contains rows of observations and columns of variables.

Line 6: The final semicolon indicates that there is no more data to be read.

Line 7: The run; statement must be on the last line of the data step and indicates that the data step is finished.
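
The parts described above can be put together in a short sketch. The data set name patients, the variable names, and the data values below are all made up for illustration; only the overall layout follows the general format.

```sas
/* Hypothetical example: the data set name, variable names, and values are made up. */
data patients;
    input id name $ age;                  /* name is a character variable because of the $ */
    age_months = age * 12;                /* programming statement: creates a new variable */
    if age < 18 then delete;              /* programming statement: excludes observations */
    label age_months = 'Age in months';   /* programming statement: labels a variable */
    datalines;
1 Ann 34
2 Bob 15
3 Cara 52
;
run;
```

Under these assumptions, the observation with age 15 is dropped by the delete statement, so the resulting data set should hold the two remaining observations.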

 

Example:

In module 1 we created a very small data set in SAS as follows:

 

data weight;

input height weight;

cards;

65 130

70 150

67 145

72 180

62 110

;

run;

Procedure Steps


A procedure step begins with a "proc" statement, which names the procedure to be performed on the data set.

General Format:

proc <procedure name> data=<data set name> <options>;

<SAS statements>;

run;
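
As a concrete instance of this general format, the sketch below summarizes the weight data set created in the data step example above. The choice of proc means and of the statistic keywords n and mean is only an illustrative assumption; any procedure follows the same pattern of a proc statement, optional SAS statements, and run.

```sas
/* Illustrative sketch: proc means is chosen only as an example procedure. */
proc means data=weight n mean;   /* procedure name, data set name, options */
    var height weight;           /* SAS statement: variables to summarize */
run;
```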

Proc Print 

"proc print" is the procedure that lists data:

proc print data=name <options>;

var var1 var2;

run;

Example:

First the data set is defined and the data ("cards") are input.

data one;

input studyid name $ sex $ age weight height;

cards;

<Data Matrix>

;

run;

 

The next step in the program prints the specified variables from the data set:

proc print data=one;

var name sex age;

run;

 

Here is the resulting output:

Notice that, by default, proc print adds a column labeled Obs to the output that numbers the rows of the data set. The noobs option suppresses this column:

 

proc print data=one noobs;

var name sex age;

run;