Performing Logistic Regression Analysis Using R

A survey was conducted among 600 teenagers to determine factors related to the likelihood of wearing a seatbelt when in a motor vehicle. The primary outcome was whether a seatbelt was always worn (beltalways, coded 0 for no and 1 for yes). The independent variables were grade in school (grade), male sex (sexm, coded 0 or 1), Hispanic ethnicity (hispanic, coded 0 or 1), Asian race (asian, coded 0 or 1), other race (raceother, coded 0 or 1), riding with a drinking driver (ridedd, coded 0 or 1), and having smoked tobacco in the past 30 days (smoke30, coded 0 or 1).

A multiple logistic regression analysis can be performed using the "glm" function in R (generalized linear models). "glm" fits several kinds of models, so we need to add "family=binomial(link=logit)" at the end of the call to indicate logistic regression. We can conduct the logistic analysis using the code below:

> log.out <- glm(beltalways ~ sexm + grade + hispanic + asian + raceother + ridedd + smoke30, family=binomial(link=logit))
> summary(log.out)

The default output gives the regression coefficients on the log-odds scale, which can be used to judge the direction of each association and its statistical significance.

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.31109    0.84639  -1.549 0.121376
sexm        -0.15505    0.17060  -0.909 0.363414
grade        0.17192    0.07599   2.262 0.023672 *
hispanic    -0.10128    0.20118  -0.503 0.614646
asian       -0.32015    0.30163  -1.061 0.288514
raceother   -0.01991    0.42393  -0.047 0.962535
ridedd      -0.65090    0.18932  -3.438 0.000586 ***
smoke30     -0.60969    0.24168  -2.523 0.011646 *
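
The call above refers to the variables by bare name, which works only if they are already visible to R (for example, after attaching the data). As a minimal sketch of an alternative, the variables can be kept in a data frame and passed through the "data" argument. The data frame name belt and every value in it are hypothetical and simulated here, so the estimates will not match the output above; the race indicators are left out only to keep the sketch short.

```r
# Simulated stand-in for the survey data; the data frame name `belt` and all
# values are hypothetical and will not reproduce the estimates shown above.
set.seed(1)
n <- 600
belt <- data.frame(
  sexm    = rbinom(n, 1, 0.5),
  grade   = sample(9:12, n, replace = TRUE),
  ridedd  = rbinom(n, 1, 0.3),
  smoke30 = rbinom(n, 1, 0.15)
)

# Generate the 0/1 outcome from a logistic model with made-up coefficients
lp <- -1.3 + 0.17 * belt$grade - 0.65 * belt$ridedd - 0.6 * belt$smoke30
belt$beltalways <- rbinom(n, 1, plogis(lp))

# Passing data = belt lets glm() find the variables inside the data frame
log.sim <- glm(beltalways ~ sexm + grade + ridedd + smoke30,
               data = belt, family = binomial(link = "logit"))
summary(log.sim)$coefficients
```

Keeping the variables in a data frame avoids relying on attached objects and makes it explicit which data set the model was fit to.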

However, because the model is fit on the log(odds of seatbelt use) scale, each coefficient is a log odds ratio, and we need to exponentiate the coefficients in order to get the odds ratios. R can do this for us with the following command:

> exp(coef(log.out))

(Intercept)      sexm      grade   hispanic      asian   raceother  
0.2695271   0.8563720  1.1875858  0.9036756  0.7260428   0.9802837

ridedd    smoke30
0.5215746  0.5435210
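
To make the arithmetic concrete, the short sketch below exponentiates three of the coefficients by hand, using values copied from the summary output above. The last line is an illustration added here, not part of the original analysis: because grade is a numeric variable, the odds ratio for a difference of k grades is the one-grade odds ratio raised to the power k.

```r
# Coefficients (log odds ratios) copied from the summary output above
b <- c(grade = 0.17192, ridedd = -0.65090, smoke30 = -0.60969)

# Exponentiating a log odds ratio gives the odds ratio
or <- exp(b)
round(or, 4)

# Odds ratio comparing students three grades apart (e.g., grade 12 vs. grade 9):
# exp(3 * b) is the same as the one-grade odds ratio cubed
exp(3 * b["grade"])
```

An odds ratio below 1 (as for ridedd and smoke30) means lower odds of always wearing a seatbelt; an odds ratio above 1 (as for grade) means higher odds.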

R will also generate the 95% confidence limits for each of these.

> exp(confint(log.out))
Waiting for profiling to be done...
                2.5 %    97.5 %
(Intercept) 0.0510250 1.4138545
sexm        0.6126924 1.1963753
grade       1.0236885 1.3793179
hispanic    0.6086518 1.3404946
asian       0.4010068 1.3119174
raceother   0.4248549 2.2668680
ridedd      0.3590405 0.7547487
smoke30     0.3359862 0.8687327
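
The "Waiting for profiling to be done..." message appears because confint() computes profile-likelihood limits for a glm fit. A simpler approximation is the Wald interval, the estimate plus or minus 1.96 standard errors on the log-odds scale, which is then exponentiated. The sketch below does this by hand for ridedd using the estimate and standard error from the summary output; it is shown only for comparison and gives limits close to, but not identical to, the profile-likelihood limits above.

```r
# Estimate and standard error for ridedd, copied from the summary output above
est <- -0.65090
se  <-  0.18932

# Wald 95% limits on the log-odds scale, then exponentiated to the OR scale
z <- qnorm(0.975)                      # approximately 1.96
wald <- exp(est + c(-1, 1) * z * se)
names(wald) <- c("2.5 %", "97.5 %")
round(wald, 3)
```

For a fitted model, exp(confint.default(log.out)) returns Wald limits for all coefficients at once.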

We might summarize these findings as shown in this table.

Variable                      Adjusted OR   95% Confidence Int.   p-value
Sex (M=1; F=0)                0.86          0.61, 1.20            0.36
Grade in School               1.19          1.02, 1.38            0.024
Race/Ethnicity
  Hispanic vs. white          0.90          0.61, 1.34            0.615
  Asian vs. white             0.73          0.40, 1.31            0.29
  Other vs. white             0.98          0.42, 2.27            0.963
Ride with a drinking driver   0.52          0.36, 0.75            0.001
Smoke past 30 days            0.54          0.34, 0.87            0.012
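
A summary table like the one above can also be assembled in R rather than by hand. The following is a minimal sketch: or_table is a hypothetical helper name introduced here, and the demonstration uses simulated data with made-up variables (x1, x2, y), not the seatbelt survey.

```r
# Hypothetical helper: collect odds ratios, 95% limits, and p-values in one table
or_table <- function(fit) {
  ci <- suppressMessages(confint(fit))   # profile-likelihood limits, log-odds scale
  out <- cbind(OR      = exp(coef(fit)),
               lower   = exp(ci[, 1]),
               upper   = exp(ci[, 2]),
               p.value = summary(fit)$coefficients[, "Pr(>|z|)"])
  round(out[-1, , drop = FALSE], 3)      # drop the intercept row
}

# Demonstration on simulated data (values are made up, not the survey data)
set.seed(2)
d <- data.frame(x1 = rbinom(300, 1, 0.5), x2 = rnorm(300))
d$y <- rbinom(300, 1, plogis(-0.5 + 0.8 * d$x1 - 0.4 * d$x2))
fit <- glm(y ~ x1 + x2, data = d, family = binomial)
or_table(fit)
```

Applied to the model above, the call would be or_table(log.out). Building the table from the fitted object avoids transcription and rounding slips when copying numbers from the console.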

 

Test Yourself

Interpret the results of the analysis above in a few sentences. Write down your interpretation before looking at the answer.

Answer

 

Practice Your R Skills

You previously conducted an analysis of the data set called "Steroids_rct.csv" to determine whether birth weight differed in neonates delivered to mothers who had been treated with steroids. Since birth weight is a continuous outcome, you used multiple linear regression to see whether there was an association after controlling for confounding.

Recall that the primary outcome was a composite outcome. The outcome variable "outcome" was coded 1 if any one of the designated complications occurred, i.e., respiratory distress syndrome, bronchopulmonary dysplasia, severe intraventricular hemorrhage, sepsis or perinatal death, and "outcome" was coded 0 if none of these occurred. You will now use multiple logistic regression to look at determinants of "outcome."

The data are posted in a file called "Steroids_rct.csv". The variables are named and coded as follows:

Variable      Coded as:
treatment     1=steroids; 0=placebo
malesex       1=male; 0=female
gestage       gestational age in weeks
birthweight   birthweight of the infant in grams
mat_age       maternal age in years

Questions:

  1. Is there an effect of treatment on "outcome"? (Run the crude, or unadjusted, analysis.)
    1. Run a chi-square test of independence.
    2. Run a simple logistic regression analysis and compute the odds ratio for treatment and its 95% CI.
  2. Is there an effect of treatment on "outcome" after adjusting for gestational age, maternal age, and infant sex?
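
The general shape of the commands for these questions can be sketched as follows. This is only an outline under stated assumptions: the data frame name steroids is hypothetical, and the data are simulated with made-up values that reuse the column names listed above, so the results say nothing about the real trial. With the actual file you would start from read.csv("Steroids_rct.csv") instead of the simulation.

```r
# Simulated stand-in for Steroids_rct.csv (made-up values, same column names);
# with the real file you would use: steroids <- read.csv("Steroids_rct.csv")
set.seed(3)
n <- 400
steroids <- data.frame(
  treatment = rbinom(n, 1, 0.5),
  malesex   = rbinom(n, 1, 0.5),
  gestage   = round(rnorm(n, 31, 2)),
  mat_age   = round(rnorm(n, 28, 5))
)
steroids$outcome <- rbinom(
  n, 1, plogis(8 - 0.3 * steroids$gestage - 0.5 * steroids$treatment)
)

# Crude analysis: chi-square test of independence on the 2x2 table
chi <- chisq.test(table(steroids$treatment, steroids$outcome))
chi

# Crude odds ratio and 95% limits from a simple logistic regression
crude <- glm(outcome ~ treatment, data = steroids, family = binomial)
exp(cbind(OR = coef(crude), suppressMessages(confint(crude))))

# Adjusted analysis: add the covariates named in question 2
adj <- glm(outcome ~ treatment + gestage + mat_age + malesex,
           data = steroids, family = binomial)
exp(cbind(OR = coef(adj), suppressMessages(confint(adj))))
```

Comparing the crude and adjusted odds ratios for treatment indicates how much the covariates change the estimated treatment effect.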

 
