Using proc freq to Perform Chi-Square Tests


Example:

The Kawasaki study data are in a SAS data set with 167 observations (one for each child) and three variables: an ID number, the treatment arm (GG or ASA), and an indicator variable for any CA abnormality at visit 3 or visit 4.

 

The ORDER= Option

The PROC FREQ statement has an option that defines the order in which values appear in frequencies and crosstabs generated by PROC FREQ.

The default is ORDER=INTERNAL, which orders values (alphabetically or numerically) by the unformatted values of the data. For example, the ARM variable in this example takes the value 'ASA' or 'GG', so by default the ASA values appear before the GG values in the PROC FREQ output.
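As a quick illustration of the default, the sketch below runs PROC FREQ with no ORDER= option and no formats. It assumes the libref d and the variable names arm and anyv34 used later in this module, and that anyv34 is coded 0/1.

```sas
/* Default ORDER=INTERNAL: rows and columns follow the unformatted values */
proc freq data=d.kawa;
tables arm*anyv34;
run;
```

Under those assumptions, ASA is in row 1 and the column for anyv34=0 (no event) comes before the column for anyv34=1 (event).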

The option ORDER=FORMATTED will order the data by (ascending) formatted values of variables. The impacts of other ORDER= options are given at the end of this module.

Formatting the outcome so that the event is in the first column

With the formats below and ORDER=FORMATTED, "Event" will be in column 1 and "No event" in column 2, since "E" comes before "N" alphabetically. ASA will still be in row 1, since ASA is formatted as "0-Aspirin" and GG is formatted as "1-Gamma Globulin".

 

proc format;

value $armf "ASA"="0-Aspirin" "GG"="1-Gamma Globulin";

value eventf 0='No event' 1='Event';

run;

 

proc freq data=d.kawa order=formatted;

format arm $armf. anyv34 eventf.;

tables arm*anyv34;

run;

 

Other Options

We could keep including a FORMAT statement in each procedure, but let's instead attach the formats to the variables once in a DATA step.

 

data one; set d.kawa;

format arm $armf. anyv34 eventf.;

run;

Several options can be included after a slash (/) in the TABLES statement.

Suppressing the Column and Overall Percentage

proc freq data=one order=formatted;

tables arm*anyv34 / nocol nopercent;

run;

  

 

Including Expected Frequencies

proc freq data=one order=formatted;

tables arm*anyv34 / expected ;

run;
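The expected frequency for a cell is (row total x column total) / overall total, that is, the count expected if treatment arm and outcome were unrelated. As a sketch of how to see how far each cell departs from that, the DEVIATION and CELLCHI2 options can be added alongside EXPECTED. This assumes the data set one created above.

```sas
/* EXPECTED:  expected count for each cell                        */
/* DEVIATION: observed count minus expected count                 */
/* CELLCHI2:  each cell's contribution to the Pearson chi-square, */
/*            (observed - expected)**2 / expected                 */
proc freq data=one order=formatted;
tables arm*anyv34 / expected deviation cellchi2 nocol nopercent;
run;
```

NOCOL and NOPERCENT are included here only to keep each cell readable; they can be dropped.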

 

Requesting the Chi Square Test

proc freq data=one order=formatted;

tables arm*anyv34 / chisq ;

run;

The 2 x 2 table is produced as above, plus the following output.

 

The highlighted row (labeled "Chi-Square") contains the Pearson chi-square statistic and its associated p-value.

Note: If more than 20% of the cells have expected frequencies less than 5, SAS will print a warning, and you should not use the chi-square test. Instead, use the two-sided Fisher's Exact Test (printed with the CHISQ option when the table is 2 x 2).
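One way to check this condition yourself is to print the expected counts together with the test. A minimal sketch, assuming the data set one and the formats defined earlier in this module:

```sas
/* Print expected counts next to the chi-square output so that */
/* cells with expected counts below 5 are easy to spot.        */
proc freq data=one order=formatted;
tables arm*anyv34 / chisq expected nocol nopercent;
run;
```

For tables larger than 2 x 2, Fisher's Exact Test is not printed automatically. It can be requested by adding the FISHER option after the slash, although it may take a long time to compute for large tables.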