Mathematical Expressions and SAS Functions


To create new variables from mathematical expressions in SAS, you first need to know the rules that determine the order in which operations are evaluated:

Rule 1: Expressions within parentheses are evaluated first.

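Rule 2: Operations are performed in order of priority, starting with priority 1.

Priority 1: ** (exponentiation), - (prefix minus, as in -X), ^ or not (logical negation)

Priority 2: * (multiplication), / (division)

Priority 3: + (addition), - (subtraction)

Comparison operators, which are evaluated after all of the arithmetic operators:

= or eq (equal)

^= or ne (not equal)

> or gt (greater than)

< or lt (less than)

>= or ge (greater than or equal to)

<= or le (less than or equal to)

Rule 3: For operators with the same priority, operations are performed left to right, except for priority 1 operations, which are performed right to left.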
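These rules can be checked directly in a DATA step. The following is a minimal sketch: the data set name rules and the variable names are illustrative, not part of the original material. The PUT statement writes each result to the log, and the comments give the values the rules predict.

```sas
data rules;
  paren   = (2 + 4) * 3;  /* Rule 1: parentheses first, 6 * 3 = 18 */
  noparen = 2 + 4 * 3;    /* Rule 2: * before +, 2 + 12 = 14 */
  div_lr  = 12 / 2 / 3;   /* Rule 3: left to right, (12 / 2) / 3 = 2 */
  pow_rl  = 2 ** 3 ** 2;  /* Rule 3 exception: right to left, 2 ** 9 = 512 */
  put paren= noparen= div_lr= pow_rl=;
run;
```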

 

Example 1:

B = 3, C = 6, D = 9

X = B * C / D

= 3 * 6 / 9

= 18 / 9

= 2

Example 2: 

G = 2, H = 4, I = 1, J = 3

X = G / I + H * J

= 2 / 1 + 4 * 3

= 2 + 4 * 3

= 2 + 12

= 14

Example 3: 

Y = 2, Z = 3, A = 2

X = Y * Z**A

= 2 * 3**2

= 2 * 9

= 18
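The three worked examples can be reproduced the same way. This sketch uses the variable values from Examples 1 to 3; the data set name examples and the result names X1, X2, and X3 are illustrative.

```sas
data examples;
  /* Example 1: * and / have the same priority, so left to right */
  B = 3; C = 6; D = 9;
  X1 = B * C / D;     /* 18 / 9 = 2 */
  /* Example 2: / and * are evaluated before + */
  G = 2; H = 4; I = 1; J = 3;
  X2 = G / I + H * J; /* 2 + 12 = 14 */
  /* Example 3: ** (priority 1) is evaluated before * */
  Y = 2; Z = 3; A = 2;
  X3 = Y * Z**A;      /* 2 * 9 = 18 */
  put X1= X2= X3=;
run;
```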

Functions


Sum Function

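Calculates the sum of the variables in parentheses. Missing values are ignored, so the result is the sum of the non-missing values; if every argument is missing, the result is missing rather than 0.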

sum(var1, var2, var3)
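The difference between the SUM function and the + operator matters whenever data are missing. In this sketch, b is missing; the data set and variable names are illustrative.

```sas
data sums;
  a = 1; b = .; c = 3;
  plus_op  = a + b + c;    /* missing: + propagates missing values */
  sum_fn   = sum(a, b, c); /* 4: SUM ignores the missing b */
  all_miss = sum(b, .);    /* missing: all arguments are missing */
  zero_def = sum(0, b);    /* 0: a constant 0 argument guarantees a non-missing result */
  put plus_op= sum_fn= all_miss= zero_def=;
run;
```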

Mean Function

Finds the average of the variables in parentheses. If a value is missing for a given observation, the average of the non-missing values is calculated.

mean(var1, var2, var3, var4)
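A similar sketch for MEAN, again with illustrative names. It also shows the OF keyword, which lets MEAN (and SUM) take a variable list such as x1-x4 instead of naming each variable.

```sas
data means;
  x1 = 2; x2 = .; x3 = 4; x4 = 6;
  m_list = mean(x1, x2, x3, x4);    /* (2 + 4 + 6) / 3 = 4; x2 is ignored */
  m_of   = mean(of x1-x4);          /* same result, using a variable list */
  m_all  = (x1 + x2 + x3 + x4) / 4; /* missing, because x2 is missing */
  put m_list= m_of= m_all=;
run;
```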

Square Root Function

Calculates the square root of the variable in parentheses. The argument must not be negative.

sqrt(var)

Natural Logarithm Function

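Calculates the natural (base e) logarithm of the variable in parentheses. The argument must be positive; otherwise SAS writes a note to the log and returns a missing value.

log(var)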
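A quick check of both functions, with illustrative names. LOG(0) is included on purpose: it is the same situation that occurs for ID 6 in the example that follows, so SAS writes an invalid-argument note and returns a missing value.

```sas
data roots_logs;
  r  = sqrt(16);    /* 4 */
  l1 = log(1);      /* 0 */
  l2 = log(exp(2)); /* LOG is the natural log, so this is 2 (up to rounding) */
  l0 = log(0);      /* invalid argument: missing, with a note in the log */
  put r= l1= l2= l0=;
run;
```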

Example:

We have brain MRI measures for six people: one measure of total cranial volume (TCV), which should not change much over time, and three repeated measures of total brain volume (TCB).

We want to look at the change in TCBV, the ratio of TCB to TCV, from time 1 to time 2. We do this in two ways. First, we calculate TCBV at each time and subtract the two variables (tcbv_d_B). Second, we do the same calculation in a single statement, without first creating the two TCBV variables (tcbv_d_A).

Next, we create a new variable, log_d, that is the natural log of this difference.

Finally, we calculate the average TCB over the three measures using three different methods (mean_tcb, ave_tcb_A, and ave_tcb_B).

data one;
input id tcb1 tcb2 tcb3 tcv;
tcbv1=tcb1/tcv;
tcbv2=tcb2/tcv;
tcbv_d_A=(tcb1/tcv)-(tcb2/tcv);
tcbv_d_B=tcbv1-tcbv2;
log_d=log(tcbv_d_A);
mean_tcb=mean(of tcb1,tcb2,tcb3);
ave_tcb_A=(tcb1+tcb2+tcb3)/3;
ave_tcb_B=tcb1+tcb2+tcb3/3;
cards;
1 980 975 975 1255
2 994 980 970 1262
3 1015 1002 1000 1280
4 940 . 900 1240
5 1020 1010 . 1259
6 998 998 990 1245
;
run;

Let's look at the log.

689 data one;
690 input id tcb1 tcb2 tcb3 tcv;
691 tcbv1=tcb1/tcv;
692 tcbv2=tcb2/tcv;
693 tcbv_d_A=(tcb1/tcv)-(tcb2/tcv);
694 tcbv_d_B=tcbv1-tcbv2;
695 log_d=log(tcbv_d_A);
696 mean_tcb=mean(of tcb1,tcb2,tcb3);
697 ave_tcb_A=(tcb1+tcb2+tcb3)/3;
698 ave_tcb_B=tcb1+tcb2+tcb3/3;
699 cards;

NOTE: Invalid argument to function LOG(0) at line 695 column 9.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
705 6 998 998 990 1245
id=6 tcb1=998 tcb2=998 tcb3=990 tcv=1245 tcbv1=0.8016064257 tcbv2=0.8016064257 tcbv_d_A=0
tcbv_d_B=0 log_d=. mean_tcb=995.33333333 ave_tcb_A=995.33333333 ave_tcb_B=2326 _ERROR_=1 _N_=6
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
1 at 692:13 1 at 693:28 1 at 694:17 1 at 695:9 1 at 697:18 1 at 697:23
1 at 698:17 1 at 698:27
NOTE: Mathematical operations could not be performed at the following places. The results of the
operations have been set to missing values.
Each place is given by: (Number of times) at (Line):(Column).
1 at 695:9
NOTE: The data set WORK.ONE has 6 observations and 13 variables.

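For ID 6, TCBV is the same at times 1 and 2, so the difference is zero. The log of zero is undefined, so SAS writes the "Invalid argument to function LOG" note, lists the values for that observation, and sets log_d to missing. The "Missing values were generated" note comes from IDs 4 and 5, each of which has one missing TCB measure; expressions that use those missing values are also set to missing.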
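If the invalid-argument note is unwanted, one option is to call LOG only when its argument is positive. This is a sketch of that approach, not part of the original example: it reuses IDs 1, 4, and 6 from the data above, and the data set name guarded is illustrative. In SAS a missing value is smaller than any number, so the condition tcbv_d_A > 0 is false for both a zero and a missing difference.

```sas
data guarded;
  input id tcb1 tcb2 tcv;
  tcbv_d_A = (tcb1/tcv) - (tcb2/tcv);
  /* Call LOG only for a positive difference. ID 4 still gets a
     missing-values note from the subtraction, but LOG is never
     called with a zero or missing argument. */
  if tcbv_d_A > 0 then log_d = log(tcbv_d_A);
  else log_d = .;
  cards;
1 980 975 1255
4 940 . 1240
6 998 998 1245
;
run;
```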

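Notice that tcbv_d_A and tcbv_d_B are exactly the same: computing the difference in one statement gives the same result as first creating tcbv1 and tcbv2 and then subtracting them.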

Which of the three average calculations is correct?
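One way to work through the question is to compute the three averages for two subjects from the data: ID 4, who has a missing value, and ID 6, who has none. This sketch uses the illustrative data set name avg_check and writes MEAN with the variable list tcb1-tcb3, which is equivalent to listing the three variables.

```sas
data avg_check;
  input id tcb1 tcb2 tcb3;
  mean_tcb  = mean(of tcb1-tcb3);       /* average of the non-missing values */
  ave_tcb_A = (tcb1 + tcb2 + tcb3) / 3; /* missing if any value is missing */
  ave_tcb_B = tcb1 + tcb2 + tcb3 / 3;   /* only tcb3 is divided by 3 (Rule 2) */
  cards;
4 940 . 900
6 998 998 990
;
run;

proc print data=avg_check;
  var id mean_tcb ave_tcb_A ave_tcb_B;
run;
```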