Logistic Regression in R


To perform logistic regression in R, use the glm() function. Here, glm stands for "generalized linear model." To fit the logistic regression model above in R, we use the following command:

> summary( glm( vomiting ~ age, family = binomial(link = logit) ) )

Call:
glm(formula = vomiting ~ age, family = binomial(link = logit))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.0671  -1.0174  -0.9365   1.3395   1.9196

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.141729   0.106206  -1.334    0.182
age         -0.015437   0.003965  -3.893 9.89e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1452.3  on 1093  degrees of freedom
Residual deviance: 1433.9  on 1092  degrees of freedom
AIC: 1437.9

Number of Fisher Scoring iterations: 4
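The quantities in this printout can also be pulled out of the fitted model object instead of being read off the screen. The sketch below is a minimal illustration under a stated assumption: the vomiting data are not included here, so it simulates a hypothetical data frame `sim` (made-up `age` and `vomiting` columns), and its estimates will not match the output above. It also checks that, for a 0/1 response, the AIC equals the residual deviance plus twice the number of coefficients, which is how 1437.9 = 1433.9 + 2 * 2 arises above.

```r
# Hypothetical simulated data standing in for the real vomiting data set.
set.seed(1)
n <- 1094                                   # matches 1093 null degrees of freedom + 1
age <- runif(n, min = 0, max = 80)          # made-up age values
vomiting <- rbinom(n, size = 1, prob = plogis(-0.14 - 0.015 * age))
sim <- data.frame(vomiting, age)

# Fit the logistic regression; data = avoids relying on attached variables.
fit <- glm(vomiting ~ age, family = binomial(link = "logit"), data = sim)

# Coefficient table: Estimate, Std. Error, z value, Pr(>|z|)
coef_table <- coef(summary(fit))
print(coef_table)

# Deviances and degrees of freedom printed at the bottom of summary(fit)
print(c(null = fit$null.deviance, residual = deviance(fit),
        df_null = fit$df.null, df_residual = fit$df.residual))

# For a 0/1 response: AIC = residual deviance + 2 * (number of coefficients)
aic_check <- deviance(fit) + 2 * length(coef(fit))
print(all.equal(AIC(fit), aic_check))
```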

To test the significance of the overall model, we use the following command:

> 1 - pchisq(1452.3 - 1433.9, 1093 - 1092)
[1] 1.79058e-05
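The same calculation can be wrapped in a small helper so it is reusable for any pair of deviances. This is a minimal sketch; the function name `lr_test` is hypothetical, and the inputs are the rounded values from the summary() output above. Using `pchisq(..., lower.tail = FALSE)` instead of `1 - pchisq(...)` avoids losing precision when the p-value is extremely small.

```r
# Likelihood ratio (deviance) test of a fitted model against the
# intercept-only model. lr_test is a hypothetical helper, not a base R function.
lr_test <- function(null_dev, resid_dev, df_null, df_resid) {
  stat <- null_dev - resid_dev                     # drop in deviance = LR chi-square
  df <- df_null - df_resid                         # number of slope terms tested
  p <- pchisq(stat, df = df, lower.tail = FALSE)   # upper-tail probability
  c(statistic = stat, df = df, p.value = p)
}

# Values read off the summary() printout above
res <- lr_test(1452.3, 1433.9, 1093, 1092)
print(res)
```

With a fitted glm object `fit`, the same inputs are available as `fit$null.deviance`, `fit$deviance`, `fit$df.null`, and `fit$df.residual`.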

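The inputs to this test are the drop in deviance from the intercept-only (null) model to the fitted model, 1452.3 - 1433.9 = 18.4, which is the likelihood ratio chi-square statistic, and the corresponding drop in degrees of freedom, 1093 - 1092 = 1.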

This is analogous to the global F test for overall model significance that summary() reports automatically for a model fit with lm(). It tests the null hypothesis that the model is no better (in terms of likelihood) than a model with only the intercept term, i.e., that all slope coefficients (every beta except the intercept) are 0.
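The comparison against the intercept-only model can also be made explicit by fitting both models and passing them to anova(). The sketch below assumes hypothetical simulated data (`age` and a 0/1 outcome `y`), since the original data are not shown; it illustrates the mechanics, not the vomiting results.

```r
# Hypothetical simulated data (not the vomiting data set)
set.seed(2)
n <- 500
age <- runif(n, 0, 80)                            # made-up ages
y <- rbinom(n, 1, plogis(-0.14 - 0.015 * age))    # made-up 0/1 outcome

fit0 <- glm(y ~ 1, family = binomial)     # intercept-only (null) model
fit1 <- glm(y ~ age, family = binomial)   # model including age

# Likelihood ratio test of the nested models: same test as the manual pchisq() call
tab <- anova(fit0, fit1, test = "Chisq")
print(tab)
```

The `Deviance` column of the second row is the drop in deviance, and `Pr(>Chi)` is its chi-square p-value.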

Thus the fitted logistic model for these data (coefficients rounded to two decimal places) is:

log[ odds(vomiting) ] = -0.14 - 0.02*age
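Because the model is linear on the log-odds scale, predictions are obtained by computing the linear predictor and then transforming it to odds or probabilities. This is a minimal sketch using the unrounded coefficients from the output above; the ages 20, 40, and 60 are arbitrary illustrative values, since the units of age are not stated here.

```r
# Coefficients copied from the summary() output above
b0 <- -0.141729   # intercept
b1 <- -0.015437   # change in log odds per one-unit increase in age

ages <- c(20, 40, 60)          # illustrative ages, chosen arbitrarily
log_odds <- b0 + b1 * ages     # linear predictor on the log-odds scale
odds <- exp(log_odds)          # odds of vomiting
prob <- plogis(log_odds)       # probability = odds / (1 + odds)

print(data.frame(age = ages, log_odds, odds, prob))
```

With a fitted model `fit`, `predict(fit, newdata = data.frame(age = ages), type = "response")` returns the same probabilities.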

This means that for a one-unit increase in age there is an estimated 0.02 decrease in the log odds of vomiting. Exponentiating gives the odds ratio: e^(-0.02) ≈ 0.98 (using the unrounded estimate, e^(-0.015437) ≈ 0.985). People in an age group one unit higher than a reference group have, on average, 0.98 times the odds of vomiting.
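The odds ratio and a confidence interval for it can be computed directly from the estimate and standard error in the output. This is a minimal sketch; the 10-unit odds ratio is an illustrative extra, and the interval is a Wald interval (estimate plus or minus 1.96 standard errors, then exponentiated).

```r
b1 <- -0.015437   # age coefficient from the output above
se1 <- 0.003965   # its standard error

or_1 <- exp(b1)         # odds ratio for a 1-unit increase in age
or_10 <- exp(10 * b1)   # odds ratio for a 10-unit increase (illustrative)

# 95% Wald confidence interval: build it for beta, then exponentiate
z <- qnorm(0.975)
ci_or_1 <- exp(b1 + c(-1, 1) * z * se1)

print(round(c(or_1 = or_1, or_10 = or_10,
              lower = ci_or_1[1], upper = ci_or_1[2]), 3))
```

With a fitted model `fit`, `exp(coef(fit))` gives the odds ratios and `exp(confint.default(fit))` gives the same Wald intervals; `confint(fit)` instead returns profile-likelihood intervals.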

How do we test the association between vomiting and age?

When testing the null hypothesis that there is no association between vomiting and age, we reject the null hypothesis at the 0.05 alpha level, based on the Wald z test in the coefficient table (z = -3.89, p-value = 9.89e-05).
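The z value and p-value in the coefficient table come from the Wald test: the estimate divided by its standard error, compared with a standard normal distribution. A minimal sketch reproducing the age row from the printed numbers:

```r
est <- -0.015437   # age estimate from the output above
se <- 0.003965     # its standard error

z_value <- est / se                    # Wald z statistic
p_value <- 2 * pnorm(-abs(z_value))    # two-sided p-value from the normal distribution
print(c(z = z_value, p = p_value))
```

With a fitted model `fit`, the same numbers are in `coef(summary(fit))["age", c("z value", "Pr(>|z|)")]`.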

Equivalently, the estimated odds of vomiting for a subject are 0.98 times those of an otherwise identical subject in the age group one unit younger.

Finally, when deciding whether to include a particular variable in our model (for example, a potential confounder), we can use the "10% rule": if our estimate of interest changes by more than 10% when the new covariate is added to the model, we keep that covariate in the model. In logistic regression, we compare the exponentiated betas (the odds ratios), not the untransformed betas themselves!
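A minimal sketch of the 10% rule applied to odds ratios follows. Several details are assumptions, not from the text above: the helper name `pct_change_or` is hypothetical, the change is measured relative to the crude (unadjusted) odds ratio, and the data are simulated so that sex is related to both age and the outcome.

```r
# Percent change in the exposure odds ratio when a covariate is added.
# Assumption: change is measured relative to the crude odds ratio.
pct_change_or <- function(beta_crude, beta_adjusted) {
  or_crude <- exp(beta_crude)
  or_adj <- exp(beta_adjusted)
  100 * abs(or_adj - or_crude) / or_crude
}

# Hypothetical simulated data: sex is associated with both age and the outcome
set.seed(3)
n <- 2000
sex <- rbinom(n, 1, 0.5)
age <- rnorm(n, mean = 40 + 10 * sex, sd = 10)
y <- rbinom(n, 1, plogis(-2 + 0.02 * age + 1 * sex))

crude <- glm(y ~ age, family = binomial)           # age only
adjusted <- glm(y ~ age + sex, family = binomial)  # age adjusted for sex

change <- pct_change_or(coef(crude)[["age"]], coef(adjusted)[["age"]])
keep_sex <- change > 10   # apply the 10% rule
cat(sprintf("Change in OR for age: %.1f%%; keep sex: %s\n", change, keep_sex))
```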

Test the hypothesis that being nauseated was not associated with sex and age (hint: use a multiple logistic regression model).  Test the overall hypothesis that there is no association between nausea and sex and age.  Then test the individual main effects hypothesis (i.e. no association between sex and nausea after adjusting for age, and vice versa).
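One possible template for this exercise is sketched below. It runs on hypothetical simulated data: the data frame `sim`, the factor levels "female"/"male", and the effect sizes are all made up, so substitute the real data and column names. It shows the overall likelihood ratio test, the Wald tests for each adjusted main effect, and drop1() as a likelihood ratio alternative for each term.

```r
# Hypothetical simulated data; replace with the real data frame and columns.
set.seed(4)
n <- 1000
sim <- data.frame(
  sex = factor(sample(c("female", "male"), n, replace = TRUE)),
  age = runif(n, 0, 80)
)
sim$nausea <- rbinom(n, 1, plogis(-0.5 + 0.4 * (sim$sex == "male") - 0.01 * sim$age))

# Multiple logistic regression with both main effects
fit <- glm(nausea ~ sex + age, family = binomial, data = sim)

# 1. Overall test: H0 is that the coefficients for sex and age are both 0
lr_stat <- fit$null.deviance - fit$deviance
lr_df <- fit$df.null - fit$df.residual
p_overall <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# 2. Each main effect adjusted for the other: Wald z tests
wald <- coef(summary(fit))[c("sexmale", "age"), c("z value", "Pr(>|z|)")]

# Alternative: likelihood ratio test for dropping each term in turn
lrt_terms <- drop1(fit, test = "Chisq")

print(p_overall)
print(wald)
print(lrt_terms)
```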