Using proc freq to Perform Chi-Square Tests
Example:
The Kawasaki study data are in a SAS data set with 167 observations (one for each child) and three variables: an ID number, the treatment arm (GG or ASA), and an indicator variable for any CA abnormality at visit 3 or visit 4.
The ORDER= Option
The PROC FREQ statement has an option that defines the order in which values appear in frequencies and crosstabs generated by PROC FREQ.
The default is ORDER=INTERNAL, which means that data is ordered (alphabetically or numerically) by the unformatted values of the data. For example, the ARM variable in the above example takes on a value of 'ASA' or 'GG,' and thus, by default, the ASA values will appear before the GG values in the PROC FREQ output.
The option ORDER=FORMATTED will order the data by (ascending) formatted values of variables. The impacts of other ORDER= options are given at the end of this module.
Formatting the outcome so that the event is in the first column
With the formats below, "Event" sorts before "No event" alphabetically, so "Event" will be in column 1 and "No event" in column 2. The numeric prefixes in the arm format keep ASA in row 1: ASA is formatted as "0-Aspirin" and GG as "1-Gamma Globulin".
proc format;
value $armf "ASA"="0-Aspirin" "GG"="1-Gamma Globulin";
value eventf 0='No event' 1='Event';
run;
proc freq data=d.kawa order=formatted;
format arm $armf. anyv34 eventf.;
tables arm*anyv34;
run;
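For contrast, here is a sketch of the same table without ORDER=FORMATTED. It assumes the formats defined above and that anyv34 is stored as numeric 0/1 (which the eventf format implies). Under the default ORDER=INTERNAL, the columns should follow the stored values, so "No event" (0) would come before "Event" (1).

```sas
/* Same formats, but default ordering (ORDER=INTERNAL):        */
/* rows and columns are sorted by the stored, unformatted      */
/* values, so anyv34=0 ("No event") is expected in column 1.   */
proc freq data=d.kawa;
format arm $armf. anyv34 eventf.;
tables arm*anyv34;
run;
```

Comparing this output with the ORDER=FORMATTED output is a quick way to confirm which column SAS is treating as the event.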
Other Options
We could keep including a FORMAT statement in each proc, but let's instead attach the formats to the variables once, in a data step.
data one;
set d.kawa;
format arm $armf. anyv34 eventf.;
run;
There are several options that can be included after a / in the TABLES statement.
- The norow, nocol, and nopercent options suppress the row percentages, column percentages, and overall percentages, respectively, which reduces the number of entries in each cell of the table.
- The measures option requests measures of association; for a 2 x 2 table these include the odds ratio and the relative risk with their accompanying confidence intervals.
- The chisq option requests the chi-square test. The expected option requests that the expected cell frequencies be included in the cells. A warning is displayed in the output if more than 20% of the cells have expected counts of less than 5.
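These options can also be combined in a single TABLES statement. The following is a minimal sketch, assuming the data set one created above; the particular combination of options is only an illustration, not a recommended default.

```sas
/* Several TABLES options at once:                              */
/*   nocol nopercent - keep only counts and row percentages     */
/*   expected        - add expected counts to each cell         */
/*   chisq           - chi-square test                          */
/*   measures        - measures of association (for a 2 x 2     */
/*                     table: odds ratio and relative risks)    */
proc freq data=one order=formatted;
tables arm*anyv34 / nocol nopercent expected chisq measures;
run;
```

Because ORDER=FORMATTED puts "Event" in column 1, the column 1 relative risk in the output should refer to the risk of an event.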
Suppressing the Column and Overall Percentages
proc freq data=one order=formatted;
tables arm*anyv34 / nocol nopercent;
run;
Including Expected Frequencies
proc freq data=one order=formatted;
tables arm*anyv34 / expected ;
run;
Requesting the Chi-Square Test
proc freq data=one order=formatted;
tables arm*anyv34 / chisq ;
run;
The 2 x 2 table is produced as above, plus the following output.
The highlighted row contains the chi-square statistic and its associated p-value.
Note: If more than 20% of the cells have expected frequencies of less than 5, SAS will print a warning, and you should not use the chi-square test. Instead, use the two-sided Fisher's Exact Test (printed along with the chi-square output when the table is 2 x 2).
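To check the expected counts and see Fisher's exact test in the same run, the options can be requested together. This is a sketch assuming the data set one from above; the fisher option is redundant for a 2 x 2 table when chisq is specified, but it is what would request the exact test for a larger table.

```sas
/* Expected counts, chi-square test, and Fisher's exact test.   */
/* For a 2 x 2 table, chisq alone already prints Fisher's       */
/* exact test; fisher is listed explicitly for clarity.         */
proc freq data=one order=formatted;
tables arm*anyv34 / expected chisq fisher;
run;
```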