Creating New Variables Using if-then, if-then-else, and if-then-else-if Statements
An if-then statement can be used to create a new variable for a selected subset of the observations.
For each observation in the data set, SAS evaluates the expression following the if. When the expression is true, the statement following then is executed.
Example:
if age ge 65 then older=1;
When the expression is false, SAS ignores the statement following then. For a person whose age is less than 65, the variable older will be missing.
Note that the above statement could equivalently be written
if age >= 65 then older=1;
An optional else statement can be included (if-then-else) to provide an alternative action when the if expression is false.
if age ge 65 then older=1;
else older=0;
For a person whose age is less than 65, the variable older will equal 0.
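As a minimal sketch, the two statements can be placed in a complete DATA step. The data set name people, the variable id, and the data values are made up for illustration. The third record has a missing age (entered as a period). SAS treats a missing numeric value as smaller than any number, so the expression age ge 65 is false for that record and older is set to 0, which may not be what is intended.

```sas
data people;
  input id age;
  /* older = 1 for ages 65 and above, 0 for everyone else */
  if age ge 65 then older = 1;
  else older = 0;
  /* id 3 has a missing age: the expression is false, so older = 0 */
  cards;
1 70
2 42
3 .
;
run;

proc print data=people;
run;
```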
An optional else-if statement can follow the if-then statement. SAS evaluates the expression in the else-if statement only when the previous expression is false. else-if statements are useful when forming mutually exclusive groups.
if 40 < age <= 50 then agegroup=1;
else if 50 < age <= 60 then agegroup=2;
else if age > 60 then agegroup=3;
- A person whose age is between 40 and 50 (notice the strict inequality: those aged exactly 40 will not be included) will be in agegroup 1.
- A person aged between 50 and 60 will be in agegroup 2 (again, notice the strict inequality: those aged exactly 50 will not be included in this agegroup, but will be in agegroup 1).
- A person whose age is greater than 60 will be in agegroup 3.
- A person whose age is 40 or younger will not be assigned to an agegroup, and their agegroup variable will be missing.
Note that this if-then-else-if statement could equivalently be written
if 40 lt age le 50 then agegroup=1;
else if 50 lt age le 60 then agegroup=2;
else if age gt 60 then agegroup=3;
An if-then statement can be followed by any number of else-if statements and, at the end of the chain, by at most one plain else statement. SAS evaluates the expressions in order and stops at the first one that is true; the remaining else-if and else statements are skipped.
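The sketch below shows one way a chain might end with a plain else, so that every observation is assigned a value. The data set name agegroups, the variable id, the extra group 0 for ages 40 and under, and the data values are assumptions added for illustration; they are not part of the example above. Because each else-if is reached only when all earlier expressions were false, the lower bounds do not need to be repeated. The missing() check comes first so that a missing age is not swept into group 0.

```sas
data agegroups;
  input id age;
  if missing(age) then agegroup = .;     /* age unknown: leave agegroup missing */
  else if age <= 40 then agegroup = 0;   /* hypothetical extra group: 40 and under */
  else if age <= 50 then agegroup = 1;   /* reached only when age > 40 */
  else if age <= 60 then agegroup = 2;   /* reached only when age > 50 */
  else agegroup = 3;                     /* everything left over: age > 60 */
  cards;
1 35
2 40
3 50
4 51
5 72
6 .
;
run;

proc print data=agegroups;
run;
```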
Character values must always be enclosed in quotes when they appear in SAS statements.
The following code creates a new variable called group from an existing variable called gpa. The variable group takes one of two values: "good standing" if a person's gpa is greater than or equal to 3.0, and "not good standing" if it is less than 3.0.
data grades;
input name $ gpa;
if gpa<3.0 then group = "not good standing";
if gpa>=3.0 then group = "good standing";
cards;
Ann 3.7
Bart 2.9
Cecil 3.5
Denise 4.0
Emily 2.5
Frank 3.6
;
run;
proc print;
run;
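This results in a listing in which Ann, Cecil, Denise, and Frank have group equal to "good standing", while Bart and Emily have group equal to "not good standing".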
Note that SAS does not generally distinguish between upper and lower case (you can use either). The exception is in the value of character variables. The value "Good standing" is not the same as the value "good standing".
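A small sketch of what this means in practice, using made-up data: the variables status, exact, and anycase and the data set name check are hypothetical. A comparison in SAS returns 1 when true and 0 when false, so the two new variables record whether each comparison matched. Converting the value with the lowcase function before comparing is one way to ignore case.

```sas
data check;
  input status $ 1-20;   /* read columns 1-20 so the embedded blank is kept */
  /* exact comparison: case must match */
  exact = (status = "good standing");
  /* compare after converting to lower case: case is ignored */
  anycase = (lowcase(status) = "good standing");
  cards;
good standing
Good standing
;
run;

proc print data=check;
run;
```

For the first record both exact and anycase equal 1; for the second record exact is 0 and anycase is 1.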
SAS code follows the rules of logic: SAS evaluates if-then statements in the order in which they appear in the DATA step.
Suppose we want to create a variable called gpagroup which takes on one of 3 values:
- "Excellent Grades" for those with a gpa greater than or equal to 3.5,
- "Good" for those with a gpa greater than or equal to 3.0 and
- "Satisfactory" for those with a gpa greater than or equal to 2.5.
We run the following code:
data grades;
input name $ gpa;
if gpa>=3.5 then gpagroup = "Excellent Grades";
if gpa>=3.0 then gpagroup = "Good";
if gpa >= 2.5 then gpagroup = "Satisfactory";
cards;
Ann 3.7
Bart 2.9
Cecil 3.5
Denise 4.0
Emily 2.5
Frank 3.6
;
run;
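What went wrong? Each if statement is evaluated on its own, so a later true expression overwrites the value assigned by an earlier one. Every student in this data set has a gpa of at least 2.5, so the last expression is true for all of them and everyone ends up with gpagroup equal to "Satisfactory".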
We should instead use if-then-else statements as follows:
data grades;
input name $ gpa;
if gpa>=3.5 then gpagroup = "Excellent Grades";
else if gpa>=3.0 then gpagroup = "Good";
else if gpa >= 2.5 then gpagroup = "Satisfactory";
cards;
Ann 3.7
Bart 2.9
Cecil 3.5
Denise 4.0
Emily 2.5
Frank 3.6
;
run;
proc print;
run;
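One detail this program depends on: SAS fixes the length of a character variable the first time the variable appears in the DATA step. Here the first assignment uses "Excellent Grades" (16 characters), so the shorter values fit. If the statements were reordered so that a shorter value came first, the longer values would be truncated. A defensive sketch, assuming a length of 16 is enough and using the hypothetical data set name grades_len, declares the length explicitly with a length statement:

```sas
data grades_len;
  length gpagroup $ 16;   /* long enough for "Excellent Grades" */
  input name $ gpa;
  if gpa >= 3.5 then gpagroup = "Excellent Grades";
  else if gpa >= 3.0 then gpagroup = "Good";
  else if gpa >= 2.5 then gpagroup = "Satisfactory";
  cards;
Ann 3.7
Bart 2.9
Cecil 3.5
Denise 4.0
Emily 2.5
Frank 3.6
;
run;

proc print data=grades_len;
run;
```

Because the length statement comes before the input statement, gpagroup becomes the first variable in the data set, so it is printed before name and gpa.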