The Chi-Squared Test of Independence


The chi-squared test of independence also uses the chi-squared statistic and the chi-squared distribution, but it is used to test whether there is a difference in frequency among two or more groups. The outcome is categorical (2 or more levels) or ordinal. Therefore, there can be multiple rows or columns in our contingency table, and the degrees of freedom are

$$df = (r-1)(c-1)$$

where r = the number of rows in the contingency table, and c = the number of columns.

For example, in the following contingency table, df = (r-1)*(c-1) = (3-1)*(3-1) = 4:

|                 | Good | Fair | Poor |
|-----------------|------|------|------|
| High Exposure   |      |      |      |
| Medium Exposure |      |      |      |
| Low Exposure    |      |      |      |

There are 3 exposure categories (rows) and 3 outcome categories (columns), so df = (3-1) * (3-1) = 2 * 2 = 4.
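As a minimal sketch, the degrees of freedom can be read off the dimensions of a table in R. The matrix below is a hypothetical placeholder (the source table has no counts), so only its shape matters; the name `tab` is an assumption made for illustration.

```r
# Hypothetical 3 x 3 table of exposure (rows) by outcome (columns).
# The zero counts are placeholders; only the dimensions are used.
tab <- matrix(0, nrow = 3, ncol = 3,
              dimnames = list(
                Exposure = c("High", "Medium", "Low"),
                Outcome  = c("Good", "Fair", "Poor")))

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(tab) - 1) * (ncol(tab) - 1)
df
```

Because the calculation depends only on `nrow()` and `ncol()`, the same two lines should work for a table of any size.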

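The research question can be phrased in either of two equivalent ways: do the groups differ in the frequency of the outcome, or is the outcome associated with (that is, not independent of) group membership?

Therefore, the null hypothesis is that the outcome is independent of the grouping variable, so the frequency of the outcome is the same in every group. The alternative hypothesis is that the outcome is not independent of the grouping variable, so the frequency differs in at least one group.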

Example 1:

Investigators wanted to study factors related to whether an HIV-positive individual would disclose their HIV status to their sexual partners.

[Stein MD, Freedberg KA, Sullivan LM, Savetsky J, Levenson SM, Hingson R, Samet JH. Sexual ethics. Disclosure of HIV-positive status to partners. 1998 Feb 9;158(3):253-7.]

The abstract stated:

"We interviewed 203 consecutive patients presenting for primary care for HIV at 2 urban hospitals. One hundred twenty-seven reported having sexual partners during the previous 6 months. The primary outcome of interest was whether patients had told all the sexual partners they had been with over the past 6 months that they were HIV positive."

One question was whether the frequency of disclosure varied with the potential mode of transmission. The findings are shown in Table 1 below.

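Table 1: Observed Data

| HIV Transmission Risk | Disclosed  | Not Disclosed | Total |
|-----------------------|------------|---------------|-------|
| Injection Drug Use    | 35 (67%)   | 17            | 52    |
| Homosexual contact    | 13 (52%)   | 12            | 25    |
| Heterosexual contact  | 29 (58%)   | 21            | 50    |
| Total                 | 77 (60.6%) | 50            | 127   |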

Note that a total of 77 individuals out of 127 reported disclosure, and the other 50 did not. Therefore, the overall frequency of disclosure was 77/127 = 60.6%. If there were no differences among the three groups, one would expect the frequency of disclosure to be 60.6% in each of them. We can then calculate the expected number of disclosures in each risk category by multiplying the number of subjects in that category by 0.606. For example, there were 52 injection drug users, so the expected number of disclosures is 52 x 0.606 = 31.5. The expected number of non-disclosures in this category is obtained by subtracting 31.5 from 52, which gives 52 - 31.5 = 20.5. If we repeat this procedure for the other two risk categories, we can create the table of frequencies that would be expected if the null hypothesis were true, as shown in Table 2 below.
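The expected counts described above can be sketched in R. Multiplying each row total by the overall disclosure proportion is the same as computing row total x column total / grand total for every cell, which is what this sketch does. The object names `observed` and `expected` are assumptions made for illustration; the counts come from Table 1.

```r
# Observed counts from Table 1 (rows: risk category; columns: disclosure)
observed <- matrix(c(35, 17,
                     13, 12,
                     29, 21),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(
                     Risk = c("Injection Drug Use",
                              "Homosexual contact",
                              "Heterosexual contact"),
                     Disclosure = c("Disclosed", "Not Disclosed")))

# Expected count per cell = row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Rounded to one decimal place for comparison with the hand calculation
round(expected, 1)
```

The sketch uses the unrounded proportion 77/127 rather than 0.606, so the expected counts agree with the hand calculation only after rounding.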

Table 2: Expected Under the Null Hypothesis

| HIV Transmission Risk | Disclosed    | Not Disclosed    | Total |
|-----------------------|--------------|------------------|-------|
| Injection Drug Use    | 31.5 (60.6%) | 52 - 31.5 = 20.5 | 52    |
| Homosexual contact    | 15.2 (60.6%) | 25 - 15.2 = 9.8  | 25    |
| Heterosexual contact  | 30.3 (60.6%) | 50 - 30.3 = 19.7 | 50    |
| Total                 | 77 (60.6%)   | 50               | 127   |

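Now we can compute the chi-squared statistic using the formula

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O is the observed count and E is the expected count in each cell, and the sum is taken over all cells of the table.

Next, we need to compute the degrees of freedom, which is

$$df = (r-1)(c-1)$$

where r = the number of category rows and c = the number of category columns. In this case:

$$df = (3-1)(2-1) = 2$$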
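The chi-squared statistic and degrees of freedom for the disclosure data can be sketched in R as follows. The variable names are assumptions made for illustration. Because this sketch keeps the expected counts unrounded, its result may differ slightly from a hand calculation that uses rounded intermediate values.

```r
# Observed counts (rows: IDU, homosexual contact, heterosexual contact;
# columns: disclosed, not disclosed)
observed <- matrix(c(35, 17,
                     13, 12,
                     29, 21),
                   nrow = 3, byrow = TRUE)

# Expected counts under the null hypothesis of independence
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-squared statistic: sum over all cells of (O - E)^2 / E
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

chi_sq
df
```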

We can see from the chi-squared table that the critical value of χ² with 2 degrees of freedom and α = 0.05 is 5.99, but our computed χ² is only 1.95. We therefore fail to reject the null hypothesis: there is insufficient evidence to conclude that the frequency of disclosure varies among these three risk categories.
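Instead of a printed table, the critical value can also be looked up with R's `qchisq()` quantile function. This is a minimal sketch; the variable names are assumptions, and the statistic 1.95 is the hand-calculated value from the text.

```r
alpha  <- 0.05   # significance level
df     <- 2      # degrees of freedom for the 3 x 2 table
chi_sq <- 1.95   # hand-calculated statistic from the text

# Critical value: the point with probability alpha in the upper tail
critical_value <- qchisq(1 - alpha, df)
round(critical_value, 2)

# Reject the null hypothesis only if the statistic exceeds the critical value
reject_null <- chi_sq > critical_value
reject_null
```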

However, we can get a better idea of the actual p-value by using 1-pchisq() in R, providing the chi-squared statistic and the degrees of freedom in parentheses.

> 1-pchisq(1.95,2)

[1] 0.3771924

Therefore, the p-value is 0.38.

Note that we use 1-pchisq() because we want the probability in the upper tail of the chi-squared distribution; pchisq() by itself returns the lower-tail (cumulative) probability.
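As a hedged sketch of two alternatives: `pchisq()` accepts a `lower.tail = FALSE` argument that returns the upper-tail probability directly, and base R's `chisq.test()` carries out the whole test from the observed counts. The variable names are assumptions made for illustration. Since `chisq.test()` works from unrounded expected counts, its statistic and p-value may differ slightly from the hand-calculated 1.95 and 0.38.

```r
# Upper-tail probability requested directly; equivalent to 1 - pchisq(1.95, 2)
p_upper <- pchisq(1.95, df = 2, lower.tail = FALSE)
p_upper

# The full test from the observed counts in Table 1
observed <- matrix(c(35, 17,
                     13, 12,
                     29, 21),
                   nrow = 3, byrow = TRUE)
result <- chisq.test(observed)

result$statistic   # chi-squared statistic
result$parameter   # degrees of freedom
result$p.value     # p-value
result$expected    # expected counts under the null hypothesis
```

Either way the conclusion is unchanged: the p-value is well above 0.05, so the null hypothesis is not rejected.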