# Analyzing Categorical Data

## Risk Ratios and Odds Ratios

In analyzing epidemiological data one is often interested in calculating the risk ratio (RR, sometimes referred to as relative risk), which is the ratio of the risk (probability) of disease among the exposed compared to the risk (probability) of disease among the non-exposed. It indicates how many times the risk is increased in "exposed" subjects compared to unexposed subjects. In this situation the "exposures" are the various foods that may have been consumed, so we would want to compute the RRs comparing people who ate a particular food item to those who did not eat that item.

One can also calculate the odds of disease among those who ate a given food and the odds of disease among those who didn't eat it, and from these compute the odds ratio. For a more complete explanation, see the epidemiology module Measures of Association.
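For reference, both measures can be written in terms of a 2×2 table. The cell labels here are chosen for illustration: a and b are the numbers of exposed people who did and did not become ill, and c and d are the corresponding counts among the unexposed. The odds ratio shown is the simple cross-product form; software packages may report other estimators of the same quantity.

$$
\mathrm{RR} = \frac{a/(a+b)}{c/(c+d)}, \qquad
\mathrm{OR} = \frac{a/b}{c/d} = \frac{ad}{bc}
$$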

## What Is the Difference Between "Probability" and "Odds"?

• The probability that an event will occur is the fraction of times you expect to see that event in many trials. Probabilities always range between 0 and 1.
• The odds are defined as the probability that the event will occur divided by the probability that the event will not occur.

If the probability of an event occurring is Y, then the probability of the event not occurring is 1-Y. (Example: if the probability of an event is 0.80 (80%), then the probability that the event will not occur is 1-0.80 = 0.20, or 20%.)

The odds of an event represent the ratio of the (probability that the event will occur) / (probability that the event will not occur). This could be expressed as follows:

Odds of event = Y / (1-Y)

So, in this example, if the probability of the event occurring = 0.80, then the odds are 0.80 / (1-0.80) = 0.80/0.20 = 4 (i.e., 4 to 1).
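The conversion between probability and odds can be written as a pair of small R functions. The names `prob_to_odds()` and `odds_to_prob()` are our own, chosen for illustration; they are not part of base R or any package.

```r
# Convert a probability p (0 <= p < 1) to odds: p / (1 - p)
prob_to_odds <- function(p) p / (1 - p)

# Convert odds back to a probability: odds / (1 + odds)
odds_to_prob <- function(odds) odds / (1 + odds)

prob_to_odds(0.80)   # 4, i.e. 4 to 1
odds_to_prob(4)      # 0.8
```

Because the two functions are inverses, converting a probability to odds and back returns the original probability.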

• If a race horse runs 100 races and wins 25 times and loses the other 75 times, the probability of winning is 25/100 = 0.25 or 25%, but the odds of the horse winning are 25/75 = 0.333, or 1 win to 3 losses.
• If the horse runs 100 races and wins 5 and loses the other 95 times, the probability of winning is 0.05 or 5%, and the odds of the horse winning are 5/95 = 0.0526.
• If the horse runs 100 races and wins 50 (losing the other 50), the probability of winning is 50/100 = 0.50 or 50%, and the odds of winning are 50/50 = 1 (even odds).
• If the horse runs 100 races and wins 80 (losing the other 20), the probability of winning is 80/100 = 0.80 or 80%, and the odds of winning are 80/20 = 4, or 4 to 1.

Note that when the probability is low, the odds and the probability are very similar (for example, a probability of 0.05 corresponds to odds of 0.0526).
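The horse-race examples can be tabulated in R to see the note above in action. This is a minimal sketch that simply re-enters the win counts from the bullets.

```r
# Horse-race examples from above: 100 races each, different numbers of wins
wins   <- c(25, 5, 50, 80)
losses <- 100 - wins

races <- data.frame(
  wins        = wins,
  probability = wins / 100,      # wins / total races
  odds        = wins / losses    # wins / losses
)
races
```

For the 5-win horse the probability and the odds differ only in the third decimal place, while for the 80-win horse they are very far apart (0.8 versus 4).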

Note that in case-control studies one cannot compute a risk ratio, but one can compute an odds ratio. For a more detailed explanation, refer to the epidemiology module Measures of Association.
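A small numerical illustration of why this is so, using made-up counts (they are hypothetical and do not come from the outbreak data): start from a cohort, then mimic a case-control design by keeping every case but only 10% of the non-cases. The odds ratio is unchanged by this sampling, while a "risk ratio" computed from the case-control counts no longer matches the cohort's risk ratio. The helper names `risk_ratio_2x2()` and `odds_ratio_2x2()` are hypothetical, defined here only for this example.

```r
# Hypothetical cohort (made-up counts): rows = exposed / unexposed
cohort <- matrix(c(30, 970,    # exposed:   cases, non-cases
                   10, 990),   # unexposed: cases, non-cases
                 nrow = 2, byrow = TRUE,
                 dimnames = list(exposure = c("yes", "no"),
                                 outcome  = c("case", "noncase")))

# Case-control sample: all cases, but only 10% of the non-cases
cc <- cohort
cc[, "noncase"] <- cohort[, "noncase"] / 10

# Risk ratio: risk in exposed / risk in unexposed
risk_ratio_2x2 <- function(t) (t[1, 1] / sum(t[1, ])) / (t[2, 1] / sum(t[2, ]))
# Odds ratio: cross-product ad / bc
odds_ratio_2x2 <- function(t) (t[1, 1] * t[2, 2]) / (t[1, 2] * t[2, 1])

c(cohort = risk_ratio_2x2(cohort), case_control = risk_ratio_2x2(cc))  # 3 vs about 2.57
c(cohort = odds_ratio_2x2(cohort), case_control = odds_ratio_2x2(cc))  # both about 3.06
```

Because the sampling fraction for non-cases cancels out of the cross-product, the odds ratio survives case-control sampling; the risks themselves do not.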

```
> table(case)
case
FALSE  TRUE 
  625   469 
```

The probability of being a case is 469/length(case) = 469/1094, or 42.9%.

On the other hand, the odds of being a case are 469/625 = 0.7504.
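Both quantities can be computed directly from a logical case indicator. So that the sketch runs on its own, it builds a hypothetical stand-in vector, `case_demo`, with the same counts as `table(case)` above (625 FALSE, 469 TRUE); with the real data you would use the existing `case` variable instead.

```r
# Stand-in vector with the same counts as table(case) above
case_demo <- c(rep(FALSE, 625), rep(TRUE, 469))

n_cases    <- sum(case_demo)    # 469
n_noncases <- sum(!case_demo)   # 625

p_case    <- n_cases / length(case_demo)   # probability of being a case, about 0.429
odds_case <- n_cases / n_noncases          # odds of being a case, 0.7504

c(probability = p_case, odds = odds_case)
```

As a check, `p_case / (1 - p_case)` gives the same value as `odds_case`.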

Several R packages, which must be installed before use, calculate odds ratios and relative risks along with hypothesis tests and confidence intervals for these quantities (although we can also compute them by writing our own code). Examples include epitools, epiR, and epibasix, all of which can be installed from CRAN. Here we'll use epitools.

```
> # install.packages("epitools")   # run once if epitools is not yet installed
> library(epitools)
> # risk ratio for cases among those who ate beef curry, excluding missing values (coded 9)
> riskratio(beefcurry[which(beefcurry != 9)], case[which(beefcurry != 9)])
$data
         Outcome
Predictor   0   1 Total
    0      69  22    91
    1     551 447   998
    Total 620 469  1089

$measure
         risk ratio with 95% C.I.
Predictor estimate    lower    upper
        0  1.00000       NA       NA
        1  1.85266 1.279276 2.683039

$p.value
         two-sided
Predictor   midp.exact fisher.exact   chi.square
        0           NA           NA           NA
        1 0.0001033504  0.000144711 0.0001437224
```

• H0: There is no association between gastrointestinal illness and eating beef curry: RR = 1
• Ha: There is an association between gastrointestinal illness and eating beef curry: RR ≠ 1

When testing the null hypothesis that there is no association between gastrointestinal illness and eating beef curry we reject the null hypothesis (p = 0.000143). Those who ate beef curry have 1.85 times the risk (95% CI 1.28, 2.68) of having gastrointestinal illness in comparison to those who did not eat beef curry.
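As mentioned above, these results can also be reproduced without a package. The sketch below re-enters the 2×2 counts from the `$data` table and computes the risk ratio with a confidence interval built on the log scale (a Wald-type interval), plus a Pearson chi-square test without continuity correction. We believe this corresponds to the defaults of `riskratio()` (Wald interval, no correction), and the numbers agree with the output above, but treat it as an illustration rather than a description of epitools internals.

```r
# 2x2 counts from the $data table above (rows: did not eat / ate beef curry)
tab <- matrix(c( 69,  22,
                551, 447),
              nrow = 2, byrow = TRUE,
              dimnames = list(beefcurry = c("0", "1"), case = c("0", "1")))

risk_unexposed <- tab["0", "1"] / sum(tab["0", ])   # 22 / 91
risk_exposed   <- tab["1", "1"] / sum(tab["1", ])   # 447 / 998
rr_hat <- risk_exposed / risk_unexposed

# Standard error of log(RR): sqrt(1/a - 1/(a+b) + 1/c - 1/(c+d))
se_log_rr <- sqrt(1 / tab["1", "1"] - 1 / sum(tab["1", ]) +
                  1 / tab["0", "1"] - 1 / sum(tab["0", ]))
ci <- exp(log(rr_hat) + c(-1, 1) * qnorm(0.975) * se_log_rr)

# Pearson chi-square test without continuity correction
p_chisq <- chisq.test(tab, correct = FALSE)$p.value

round(c(rr = rr_hat, lower = ci[1], upper = ci[2], p = p_chisq), 6)
```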

```
> oddsratio(beefcurry[which(beefcurry != 9)], case[which(beefcurry != 9)])
$data
         Outcome
Predictor   0   1 Total
    0      69  22    91
    1     551 447   998
    Total 620 469  1089

$measure
         odds ratio with 95% C.I.
Predictor estimate    lower    upper
        0 1.000000       NA       NA
        1 2.530309 1.564366 4.251073

$p.value
         two-sided
Predictor   midp.exact fisher.exact   chi.square
        0           NA           NA           NA
        1 0.0001033504  0.000144711 0.0001437224
```

When testing the null hypothesis that there is no association between gastrointestinal illness and eating beef curry we reject the null hypothesis (p = 0.000143). Those who ate beef curry have 2.53 times the odds (95% CI 1.56, 4.25) of having gastrointestinal illness in comparison to those who did not eat beef curry.
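The odds ratio reported by `oddsratio()` (2.53) is not exactly the simple cross-product ad/bc. To our understanding, epitools' `oddsratio()` defaults to `method = "midp"`, which reports a median-unbiased estimate with a mid-p exact confidence interval. The sketch below instead computes the cross-product odds ratio with a log-scale (Wald) interval from the same counts; it gives a slightly larger estimate, about 2.54, and a somewhat different interval, which is expected when a different estimator is used. Passing `method = "wald"` to `oddsratio()` should give this version.

```r
# 2x2 counts from the $data table above (rows: did not eat / ate beef curry)
tab <- matrix(c( 69,  22,
                551, 447),
              nrow = 2, byrow = TRUE,
              dimnames = list(beefcurry = c("0", "1"), case = c("0", "1")))

# Odds of illness in each group: cases / non-cases
odds_unexposed <- tab["0", "1"] / tab["0", "0"]   # 22 / 69
odds_exposed   <- tab["1", "1"] / tab["1", "0"]   # 447 / 551
or_hat <- odds_exposed / odds_unexposed           # same as (69 * 447) / (22 * 551)

# Wald CI on the log scale: SE of log(OR) = sqrt(1/a + 1/b + 1/c + 1/d)
se_log_or <- sqrt(sum(1 / tab))
ci_or <- exp(log(or_hat) + c(-1, 1) * qnorm(0.975) * se_log_or)

round(c(or = or_hat, lower = ci_or[1], upper = ci_or[2]), 3)
```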

Calculate the odds ratio and relative risk of developing food poisoning for those who had eaten éclairs. [Hint: first create a variable "eclair.eat" that identifies the people who had eaten éclairs.]