Risk Ratios and Odds Ratios with 95% Confidence Intervals

riskratio.wald(exposure_var, outcome_var)
When dealing with a cohort study or a clinical trial, this command calculates a risk ratio and 95% confidence interval for the risk ratio and also performs a chi-squared test.

oddsratio.wald(exposure_var, outcome_var)
When dealing with a case-control study, this command calculates an odds ratio and 95% confidence interval for the odds ratio and also performs a chi-squared test.

Note: Base R does not include these functions, but supplemental packages can be loaded to add analytical tools, including confidence intervals for RR and OR. These functions are in the "epitools" package. You need to install the epitools package once from the Console in RStudio. Then, whenever you want to use the "wald" functions, include a line in your script that loads the package.

Installing the Epitools Package

Go to the Console window (lower left) in R and type:

>install.packages("epitools")

Be sure to include the quotation marks around "epitools". R will install the package and display output similar to the following:

Installing package into 'C:/Users/wlamorte/Documents/R/win-library/3.5'
(as 'lib' is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.5/epitools_0.5-10.1.zip'
Content type 'application/zip' length 317397 bytes (309 KB)

Loading the Epitools Package When You Want to Use It

You only have to install the epitools package once, but you have to load it up each time you use it by including the following line of code in your script before the riskratio.wald command.

library(epitools)

R responds with:

Warning message:
package 'epitools' was built under R version 3.4.2
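
The install-once, load-every-time pattern can also be written defensively. The sketch below is one common base-R idiom, not part of the original instructions: it only checks whether epitools is available and reports what to do, so it runs even on a machine where the package has not been installed.

```r
# Check whether epitools is installed without attaching it.
has_epitools <- requireNamespace("epitools", quietly = TRUE)

if (has_epitools) {
  library(epitools)   # attach the package for this session
  message("epitools is loaded")
} else {
  message('epitools is not installed; run install.packages("epitools") once in the Console')
}
```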

Computing a Risk Ratio and 95% Confidence Limits from a Data Set

I have a data set from the Framingham Heart Study, and I want to compute the risk ratio for the association between type 2 diabetes ("diabetes") and risk of being hospitalized with a myocardial infarction ("hospmi"). I begin by creating a contingency table with the table() command.

table(diabetes,hospmi)
        hospmi
diabetes    0    1
       0 2557  210
       1  183   48

Then, to compute the risk ratio and confidence limits, I insert the table parameters into the riskratio.wald() function:

riskratio.wald(table(diabetes,hospmi))

$data
        hospmi
diabetes    0   1 Total
   0     2557 210  2767
   1      183  48   231
   Total 2740 258  2998

$measure
        risk ratio with 95% C.I.
diabetes estimate    lower   upper
       0  1.00000       NA      NA
       1  2.73791 2.062282 3.63488
NOTE: The last line of the $measure output (the row for diabetes = 1) gives the risk ratio and the lower and upper limits of its 95% confidence interval.
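
To see where these numbers come from, the risk ratio and its limits can be recomputed by hand in base R. This is a minimal sketch that assumes the usual Wald formula on the log scale; the variable names are illustrative and are not part of epitools. Under that assumption the result agrees with the $measure output above to rounding.

```r
# Counts from table(diabetes, hospmi) above
mi_diab    <- 48;  n_diab    <- 231    # hospitalized for MI / total, with diabetes
mi_nondiab <- 210; n_nondiab <- 2767   # hospitalized for MI / total, without diabetes

# Risk in each group and their ratio
risk_diab    <- mi_diab / n_diab
risk_nondiab <- mi_nondiab / n_nondiab
rr <- risk_diab / risk_nondiab

# Standard error of log(RR), then 95% limits back-transformed to the RR scale
se_log_rr <- sqrt(1 / mi_diab - 1 / n_diab + 1 / mi_nondiab - 1 / n_nondiab)
ci <- exp(log(rr) + c(-1, 1) * qnorm(0.975) * se_log_rr)

round(c(estimate = rr, lower = ci[1], upper = ci[2]), 4)
```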

Using the same data for illustration, I can similarly compute an odds ratio and its confidence interval using the oddsratio.wald() function:

oddsratio.wald(table(diabetes,hospmi))
$data
        hospmi
diabetes    0   1 Total
   0     2557 210  2767
   1      183  48   231
   Total 2740 258  2998

$measure
        odds ratio with 95% C.I.
diabetes estimate    lower    upper
       0 1.000000       NA       NA
       1 3.193755 2.256038 4.521233
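
The odds ratio can be checked the same way. The sketch below assumes the standard Wald (Woolf) interval, in which the standard error of log(OR) is the square root of the sum of the reciprocals of the four cell counts; the object names are illustrative.

```r
# Counts laid out as in table(diabetes, hospmi): rows = diabetes 0/1, columns = hospmi 0/1
tab <- matrix(c(2557, 183, 210, 48), nrow = 2,
              dimnames = list(diabetes = c("0", "1"), hospmi = c("0", "1")))

# Cross-product ratio: (exposed cases * unexposed non-cases) / (exposed non-cases * unexposed cases)
or <- (tab["1", "1"] * tab["0", "0"]) / (tab["1", "0"] * tab["0", "1"])

# Standard error of log(OR), then 95% limits back-transformed to the OR scale
se_log_or <- sqrt(sum(1 / tab))
ci <- exp(log(or) + c(-1, 1) * qnorm(0.975) * se_log_or)

round(c(estimate = or, lower = ci[1], upper = ci[2]), 4)
```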

Computing Risk Ratios and Odds Ratios using the epiR package

There are many extra packages for R and many alternate ways to compute things. Another package that is useful for risk ratios and odds ratios is the epiR package. Like the epitools package, it must be installed once, and then it must be loaded into each script in which it is used.

To do the one-time installation, go to the lower right window in RStudio, click on the Packages tab, and then click Install. In the window that opens, enter epiR.

Then click on the Install button, and wait a few seconds while the package is installed.

Then, to use the program, you must load it into your script. Here is an example of its use in calculating a risk ratio, the 95% confidence interval for the risk ratio, the risk difference, and the attributable fraction.

In the example below, I use a data set from the Framingham Heart Study to create a table called "TAB" that summarizes the occurrence of being hospitalized for a myocardial infarction (hospmi) for diabetics and non-diabetics. I then print TAB to verify the counts, then call up the epiR package, and then give the command

> epi.2by2(TAB,method="cohort.count", conf.level = 0.95)

This asks R to use the data object called "TAB" and to analyze it as counts in a cohort study and compute the 95% confidence interval for the risk ratio.

> TAB<-table(diabetes, hospmi)
> TAB

        hospmi
diabetes    0    1
       0 2557  210
       1  183   48

> library(epiR)

Package epiR 1.0-15 is loaded
Type help(epi.about) for summary information
Type browseVignettes(package = 'epiR') to learn how to use epiR for applied epidemiological analyses

> epi.2by2(TAB,method="cohort.count", conf.level = 0.95)
             Outcome +    Outcome -      Total        Inc risk *        Odds
Exposed +         2557          210       2767              92.4       12.18
Exposed -          183           48        231              79.2        3.81
Total             2740          258       2998              91.4       10.62

Point estimates and 95% CIs:
-------------------------------------------------------------------
Inc risk ratio                           1.17 (1.09, 1.25)
Odds ratio                               3.19 (2.26, 4.52)
Attrib risk *                          13.19 (7.87, 18.51)
Attrib risk in population *             12.17 (6.85, 17.50)
Attrib fraction in exposed (%)          14.27 (8.34, 19.82)
Attrib fraction in population (%)       13.32 (7.73, 18.57)
-------------------------------------------------------------------
  Test that OR = 1: chi2(1) = 47.158 Pr>chi2 = <0.001
  Wald confidence limits
  CI: confidence interval
  * Outcomes per 100 population units

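NOTE: The "Attributable Risk" (Attrib risk) is the risk difference. The last two measures can be ignored for PH717. Also, since the Framingham Heart Study was a cohort study, we can ignore the odds ratio. Be aware that epi.2by2() treats the first row as the exposed group and the first column as those with the outcome. Because table() lists the 0 categories first, the "Exposed +" row above is the non-diabetics and "Outcome +" is those who were not hospitalized, so the incidence risk ratio of 1.17 compares the risk of not being hospitalized and does not match the risk ratio of 2.74 from riskratio.wald(). The odds ratio is unaffected by this reversal.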
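
The epiR documentation describes epi.2by2() as expecting exposed subjects in the first row and outcome-positive subjects in the first column, which is the reverse of what table() produces for 0/1 variables. One way to put a table into that layout is to reverse both the row order and the column order. The sketch below uses base R only and re-enters the counts with matrix() so that it runs without the data set; TAB_flipped and rr_flipped are illustrative names.

```r
# Same counts as TAB above, entered by column
TAB <- matrix(c(2557, 183, 210, 48), nrow = 2,
              dimnames = list(diabetes = c("0", "1"), hospmi = c("0", "1")))

# Reverse rows and columns so diabetes = 1 and hospmi = 1 come first
TAB_flipped <- TAB[2:1, 2:1]
TAB_flipped

# Risk ratio implied by the flipped layout (row 1 = exposed, column 1 = outcome)
risk_exposed   <- TAB_flipped[1, 1] / sum(TAB_flipped[1, ])
risk_unexposed <- TAB_flipped[2, 1] / sum(TAB_flipped[2, ])
rr_flipped <- risk_exposed / risk_unexposed
rr_flipped
```

Passing TAB_flipped to epi.2by2() with method = "cohort.count" would then be expected to report an incidence risk ratio close to the 2.74 obtained from riskratio.wald(); that call is not run here.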

epiR for Case-Control Studies

If we were analyzing a table from a case-control study, we would use the following command:

> epi.2by2(TAB, method="case.control", conf.level = 0.95)

             Outcome +    Outcome -      Total      Prevalence *        Odds
Exposed +         2557          210       2767              92.4       12.18
Exposed -          183           48        231              79.2        3.81
Total             2740          258       2998              91.4       10.62
 

Point estimates and 95% CIs:
-------------------------------------------------------------------
Odds ratio (W)                                   3.19 (2.26, 4.52)
Attrib prevalence *                             13.19 (7.87, 18.51)
Attrib prevalence in population *               12.17 (6.85, 17.50)
Attrib fraction (est) in exposed   (%)          68.67 (54.64, 78.07)
Attrib fraction (est) in population (%)         64.10 (51.97, 73.17)
-------------------------------------------------------------------
  Test that OR = 1: chi2(1) = 47.158 Pr>chi2 = <0.001
  Wald confidence limits
  CI: confidence interval
  * Outcomes per 100 population units

Here we are only interested in the odds ratio and its 95% confidence interval. The other output can be ignored.

Computing a Risk Ratio and 95% Confidence Limits When you DON'T Have a Data Set

The preceding illustrations showed how to use the table() command to create a contingency table from a data set that riskratio.wald() or oddsratio.wald() can interpret. However, suppose you don't have the raw data set, and you just have the counts in a contingency table. In this situation, the first task is to create a contingency table that R can interpret correctly in order to compute RR or OR and the corresponding 95% confidence interval.

Creating a contingency table that R can understand

When R executes the table() command, it orders the categories of each variable from lowest to highest in both the rows and the columns. Here is the table showing the distribution of being hospitalized for a myocardial infarction (hospmi) among those with and without type 2 diabetes.

        hospmi
diabetes    0   1 Total
   0     2557 210  2767
   1      183  48   231
   Total 2740 258  2998

Note that it lists those without diabetes in the first row, and it lists those who were not hospitalized for MI in the first column, since 0 comes before 1.
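
The "lowest first" ordering is easy to confirm on a small example. The sketch below uses hypothetical toy data for six subjects, not the Framingham data, and shows that table() puts the 0 category in the first row and first column regardless of the order in which the values appear.

```r
# Hypothetical toy data: six subjects coded 0/1 for exposure and outcome
exposure <- c(1, 0, 1, 0, 0, 1)
outcome  <- c(1, 0, 0, 0, 1, 1)

tab <- table(exposure, outcome)
tab

rownames(tab)   # "0" "1": unexposed row comes first
colnames(tab)   # "0" "1": no-outcome column comes first
tab["0", "0"]   # top-left cell: unexposed subjects without the outcome
```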

Important: Adhering to this "lowest first" format is essential if you want to run riskratio.wald() without a raw data set. If you are given only the counts in a contingency table, you will need to create a contingency table in R that adheres to this structure using the matrix() function, as explained below.

Note also that riskratio.wald() can be used to analyze prevalence data. In this case, the procedure calculates a prevalence ratio and its 95% confidence limits.

If you are given the counts in a contingency table, i.e., you do not have the raw data set, you can re-create the table in R and then compute the risk ratio and its 95% confidence limits using the riskratio.wald() function in Epitools.

             No CVD (0)   CVD (1)   Total
No HTN (0)         1017       165    1182
HTN (1)            2260       992    3252

This is where the orientation of the contingency table is critical, i.e., with the unexposed (reference) group in the first row and the subjects without the outcome in the first column.

The contingency table for R is created using the matrix function, entering the data for the first column, then second column as follows:

R Code:

# the 1st line below creates the contingency table; the 2nd line prints the table so you can check the orientation of the numbers
RRtable<-matrix(c(1017,2260,165,992),nrow = 2, ncol = 2)
RRtable

     [,1] [,2]
[1,] 1017  165
[2,] 2260  992
# The next line asks R to compute the RR and 95% confidence interval
riskratio.wald(RRtable)
$data
          Outcome
Predictor  Disease1 Disease2 Total
  Exposed1     1017      165  1182
  Exposed2     2260      992  3252
  Total        3277     1157  4434

$measure
          risk ratio with 95% C.I.
Predictor  estimate    lower    upper
  Exposed1 1.000000       NA       NA
  Exposed2 2.185217 1.879441 2.540742

NOTE: The "Exposed2" line shows the risk ratio and the lower and upper limits of its 95% confidence interval.

$p.value
          two-sided
Predictor  midp.exact fisher.exact  chi.square
  Exposed1         NA           NA          NA
  Exposed2          0 7.357611e-31 1.35953e-28
NOTE: The last entry in the Exposed2 line above (the chi.square column) is the p-value from the chi-squared test.

$correction
[1] FALSE

attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"

The risk ratio and 95% confidence interval are listed in the output under $measure.
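
Because the orientation matters so much, it can help to label the matrix before analyzing it. The sketch below adds row and column names with the dimnames argument of matrix() and recomputes the risk in each row as a check on the cell placement. The labels and the names risk and rr_check are illustrative, and whether riskratio.wald() carries such labels into its output is not shown here.

```r
# Counts entered by column, as above, with labels to make the orientation visible
RRtable <- matrix(c(1017, 2260, 165, 992), nrow = 2, ncol = 2,
                  dimnames = list(HTN = c("No (0)", "Yes (1)"),
                                  CVD = c("No (0)", "Yes (1)")))
RRtable

# Risk of CVD in each row = CVD count / row total
risk <- RRtable[, "Yes (1)"] / rowSums(RRtable)
risk

# Risk ratio: hypertensive versus non-hypertensive
rr_check <- unname(risk["Yes (1)"] / risk["No (0)"])
rr_check
```

If the counts were entered in the wrong order, the labeled printout or the row risks would make the mistake visible before any ratio is reported.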

An Alternative Method for Reading Table Data into riskratio.wald()

The table below summarizes the prevalence of migraine headaches in people exposed to low or high concentrations of flame retardants. The table is already oriented with the least exposed in the first row and the non-diseased subjects in the first column, i.e., the format required by riskratio.wald().

 

          Disease -   Disease +   Total
Exp. -          380          20     400
Exp. +          540          60     600

After loading epitools, I can employ riskratio.wald to compute the prevalence ratio using the previously described method to create a table object called "mytab" and using the matrix command to read the data by COLUMNS and specifying the numbers of rows and columns.

mytab<-matrix(c(380,540,20,60),nrow=2,ncol=2)
riskratio.wald(mytab)

An alternative method is to have R read the counts by ROWS using the following command:

riskratio.wald(c(380,20,540,60))

Note that this method does not use the matrix function, and it does not require one to specify the number of rows and columns. Nevertheless, both methods give identical output.

$data
          Outcome
Predictor  Disease1 Disease2 Total
  Exposed1      380       20   400
  Exposed2      540       60   600
  Total         920       80  1000

$measure
          risk ratio with 95% C.I.
Predictor  estimate    lower    upper
  Exposed1        1       NA       NA
  Exposed2        2 1.225264 3.264603

$p.value
          two-sided
Predictor   midp.exact fisher.exact  chi.square
  Exposed1          NA           NA          NA
  Exposed2 0.003676476  0.004173438 0.004300957
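
The two entry orders describe the same table, which can be confirmed in base R without epitools. In this sketch, by_col and by_row are illustrative names; byrow = TRUE is the standard matrix() argument for filling a matrix row by row.

```r
# Column-wise entry: first column (380, 540), then second column (20, 60)
by_col <- matrix(c(380, 540, 20, 60), nrow = 2, ncol = 2)

# Row-wise entry: first row (380, 20), then second row (540, 60)
by_row <- matrix(c(380, 20, 540, 60), nrow = 2, ncol = 2, byrow = TRUE)

identical(by_col, by_row)   # TRUE: both give the same 2 x 2 table
by_row
```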

Computing an Odds Ratio and 95% Confidence Limits When you DON'T Have a Data Set

This procedure is similar to the preceding section, except that you will use the oddsratio.wald() function. Once again, it is critical that you use the matrix command correctly in order to create a contingency table that will give the correct results. We will illustrate using the same counts as in the example above.

             No CVD (0)   CVD (1)   Total
No HTN (0)         1017       165    1182
HTN (1)            2260       992    3252

ORtable<-matrix(c(1017,2260,165,992),nrow = 2, ncol = 2)
ORtable
     [,1] [,2]
[1,] 1017  165
[2,] 2260  992
oddsratio.wald(ORtable)
$data
          Outcome
Predictor  Disease1 Disease2 Total
 Exposed1     1017      165  1182
 Exposed2     2260      992  3252
 Total        3277     1157  4434

$measure
          odds ratio with 95% C.I.
Predictor  estimate    lower    upper
  Exposed1 1.000000       NA       NA
  Exposed2 2.705455 2.258339 3.241093

$p.value
          two-sided
Predictor  midp.exact fisher.exact  chi.square
  Exposed1         NA           NA          NA
  Exposed2          0 7.357611e-31 1.35953e-28

$correction
[1] FALSE

attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"
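
The chi.square column in $p.value can be cross-checked with base R's chisq.test(). The sketch below assumes that epitools reports an uncorrected Pearson chi-squared test, which is consistent with $correction being FALSE but is not stated explicitly above; the object name chi is illustrative.

```r
# Same counts as ORtable above
ORtable <- matrix(c(1017, 2260, 165, 992), nrow = 2, ncol = 2)

# correct = FALSE turns off the Yates continuity correction
chi <- chisq.test(ORtable, correct = FALSE)

chi$statistic   # Pearson chi-squared statistic on 1 degree of freedom
chi$p.value     # compare with the chi.square entry for Exposed2
```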