Creating New Variables


A New Variable

Suppose we have a data set called weight that contains height (in inches) and weight (in pounds) data.

 

data weight;
  input height weight;
  cards;
65 130
70 150
67 145
72 180
62 110
;
run;

We would like to create a new data set with a new variable, BMI, or body mass index, based on height and weight.

To create a new variable, choose a name for it and, inside a data step, define it in terms of already existing variables with an assignment statement that uses the equals sign (=).

Examples 

Body mass index (BMI) is equal to (weight in pounds x 703) / (height in inches)^2, that is, weight times 703 divided by the square of height.

So in this case, if we had a data set that contained weight in pounds and height in inches, we could use SAS to compute a derived variable called "bmi" based on these two other variables. Here's how we can do this in SAS:

 

data w;
  input height weight;
  /* derived variable: weight in pounds, height in inches */
  bmi = (weight*703)/(height**2);
  cards;
65 130
70 150
67 145
72 180
62 110
;
run;

The data set "w" has three variables: height, weight, and bmi. Note that the statement creating the new variable, bmi, sits between the input statement and the cards statement. It must follow input, which reads height and weight, and precede cards, which has to be the last statement in the data step before the data lines.

Note: The creation of a new variable always occurs within a data step.
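
To check the result, one option is to print the data set. The following is a minimal sketch, assuming the data step above has already run and created w in the WORK library. The format statement is an optional addition (not part of the original example) that only limits the number of decimals displayed.

```sas
/* Print the derived data set to inspect the new variable */
proc print data=w;
  var height weight bmi;
  format bmi 5.1;   /* display only; the stored values are unchanged */
run;
```

For the first observation (height 65, weight 130), bmi should be 130*703/65**2 = 91390/4225, or about 21.6.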

Creating New Data Sets


The Set Statement

It is also possible to take an existing data set and create a new data set with additional variables, instead of inputting the data anew. We first create a copy of the data set using the set statement, and then make changes in the data step. The following data step creates a SAS data set called weight_new, which is identical to the SAS data set weight. 

data weight_new;
  set weight;
run;

The set statement puts the data from the data set weight (created above) into a new data set called weight_new. Because the data set weight already exists within SAS, no input statement is necessary. Note that the structure and contents of the new data set weight_new are identical to those of the SAS data set weight.

You can look at your log file to confirm what your code is doing:

 

92 data weight_new;
93 set weight;
94 run;

NOTE: There were 5 observations read from the data set WORK.WEIGHT.
NOTE: The data set WORK.WEIGHT_NEW has 5 observations and 2 variables.

The log always echoes your code, followed by notes (and any warnings or errors). From now on, we will show only the notes from the log, not the echoed code.

To add a new variable to an existing SAS data set, read the data set with a set statement inside a data step. The data step then creates a new SAS data set that contains the existing variables plus the new one.

The following data step creates a new (temporary) SAS data set called bmidata1, which is identical to the SAS data set weight but with the addition of a new variable, bmi.

 

data bmidata1;
  set weight;
  bmi = (weight*703)/(height**2);
run;

NOTE: There were 5 observations read from the data set WORK.WEIGHT.

NOTE: The data set WORK.BMIDATA1 has 5 observations and 3 variables.
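
A new variable can also be built from another variable created earlier in the same data step. The following is a minimal sketch under stated assumptions: the data set name bmidata_demo and the variable name bmi_rounded are hypothetical names chosen for illustration, and rounding to one decimal place is an arbitrary choice.

```sas
/* Hypothetical example: two new variables in one data step */
data bmidata_demo;
  set weight;                       /* read the existing data set */
  bmi = (weight*703)/(height**2);   /* first new variable */
  bmi_rounded = round(bmi, 0.1);    /* second new variable, uses bmi from the line above */
run;
```

Statement order matters here: bmi_rounded has to be assigned after bmi, because within each observation the statements execute from top to bottom.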

 

New Data Set from a Permanent SAS Data Set

We can also create a new data set from an existing permanent SAS data set. The libname statement assigns the libref indata to the folder C:\Users, which holds the permanent SAS data set weight.sas7bdat. We then read that data set and create a new (temporary) SAS data set called bmidata2, with the addition of the variable bmi.

libname indata 'C:\Users';

data bmidata2;
  set indata.weight;
  bmi = (weight*703)/(height**2);
run;

NOTE: There were 5 observations read from the data set INDATA.WEIGHT.

NOTE: The data set WORK.BMIDATA2 has 5 observations and 3 variables.

Next, we read the same permanent SAS data set, weight.sas7bdat, and create a new permanent SAS data set called weight1.sas7bdat in the same folder (C:\Users), again adding the variable bmi. We do not need to repeat the libname statement, because the libref assigned above stays in effect for the rest of the SAS session.

data indata.weight1;
  set indata.weight;
  bmi = (weight*703)/(height**2);
run;

  

NOTE: There were 5 observations read from the data set INDATA.WEIGHT.

NOTE: The data set INDATA.WEIGHT1 has 5 observations and 3 variables.
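
To confirm that the permanent data set was written with the new variable, one option is to list its contents. This is a minimal sketch, assuming the libref indata is still assigned in the current SAS session and that the data step above ran without errors.

```sas
/* List the variables and attributes of the new permanent data set */
proc contents data=indata.weight1;
run;
```

The output should list the three variables height, weight, and bmi, matching the log note above.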