Logistic Regression in R
To perform logistic regression in R, use the glm() function. Here, glm stands for "generalized linear model." To fit the logistic regression model above in R, we use the following command:
> summary( glm( vomiting ~ age, family = binomial(link = logit) ) )
Call:
glm(formula = vomiting ~ age, family = binomial(link = logit))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.0671  -1.0174  -0.9365   1.3395   1.9196

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.141729   0.106206  -1.334    0.182
age         -0.015437   0.003965  -3.893 9.89e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1452.3  on 1093  degrees of freedom
Residual deviance: 1433.9  on 1092  degrees of freedom
AIC: 1437.9

Number of Fisher Scoring iterations: 4
To get the significance for the overall model we use the following command:
> 1-pchisq(1452.3-1433.9, 1093-1092)
[1] 1.79058e-05
The input to this test is:
- deviance of the "null" model minus deviance of the current model (this difference is the likelihood ratio test statistic)
- degrees of freedom of the null model minus df of current model
This is analogous to the global F test for the overall significance of the model that comes automatically when we run the lm() command. It tests the null hypothesis that the model is no better (in terms of likelihood) than a model fit with only the intercept term, i.e. that all beta terms other than the intercept are 0.
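The same test can be computed from the fitted model object instead of typing the deviances by hand. The following is a minimal sketch, not the course's own code: the data are simulated stand-ins (an assumption, since the data set is not shown here), and the names fit, fit0, lr_stat, lr_df and p_overall are illustrative.

```r
# Simulated stand-in data (an assumption; the course data set is not shown here)
set.seed(1)
n <- 1094
age <- round(runif(n, 1, 80))
vomiting <- rbinom(n, 1, plogis(-0.14 - 0.015 * age))

fit <- glm(vomiting ~ age, family = binomial(link = "logit"))

# Likelihood ratio statistic: null deviance minus residual deviance
lr_stat <- fit$null.deviance - fit$deviance
# Degrees of freedom: df of the null model minus df of the current model
lr_df <- fit$df.null - fit$df.residual

# Upper-tail chi-square probability, equivalent to 1 - pchisq(lr_stat, lr_df)
p_overall <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(p_overall)

# The same comparison, made explicitly against the intercept-only model
fit0 <- glm(vomiting ~ 1, family = binomial(link = "logit"))
lrt_table <- anova(fit0, fit, test = "Chisq")
print(lrt_table)
```

Using lower.tail = FALSE rather than 1 - pchisq(...) avoids rounding to zero when the p-value is very small.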
Thus the logistic model for these data is:
log[ odds(vomiting) ] = -0.14 - 0.02*age
This means that for a one-unit increase in age there is a 0.02 decrease in the log odds of vomiting. Exponentiating the coefficient gives the odds ratio: exp(-0.02) = 0.98. People in an age group one unit higher than a reference group have, on average, 0.98 times the odds of vomiting.
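As a small worked example of this translation, the sketch below exponentiates the coefficients reported in the summary output. The 10-unit comparison and the age of 30 are illustrative choices, not part of the original analysis, and the variable names are hypothetical.

```r
# Coefficients copied from the summary() output above
b0    <- -0.141729   # intercept
b_age <- -0.015437   # slope for age (log odds per one-unit increase)

# Odds ratio for a one-unit increase in age
or_1 <- exp(b_age)
print(round(or_1, 2))

# Odds ratio for a 10-unit increase (illustrative): multiply the slope first
or_10 <- exp(10 * b_age)
print(or_10)

# Predicted probability of vomiting at age 30 (illustrative age);
# plogis() is the inverse logit, exp(x) / (1 + exp(x))
p_30 <- plogis(b0 + b_age * 30)
print(p_30)
```

With a fitted model object, exp(coef(fit)) returns the same odds ratios without copying numbers by hand.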
How do we test the association between vomiting and age?
- H0: There is no association between vomiting and age (the odds ratio is equal to 1).
- Ha: There is an association between vomiting and age (the odds ratio is not equal to 1).
When testing the null hypothesis that there is no association between vomiting and age we reject the null hypothesis at the 0.05 alpha level (z = -3.89, p-value = 9.89e-05).
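The z value and p-value in the coefficient table are a Wald test, which can be reproduced from the estimate and its standard error. This is a sketch using the rounded numbers printed in the summary output, so the results may differ slightly from R's own in the last digits. The 95% confidence interval for the odds ratio is an added illustration, not something reported above.

```r
# Estimate and standard error for age, copied from the summary() output
est <- -0.015437
se  <- 0.003965

# Wald z statistic and two-sided p-value
z <- est / se
p <- 2 * pnorm(-abs(z))
print(c(z = z, p = p))

# Wald 95% confidence interval for the odds ratio (illustrative addition)
ci_or <- exp(est + c(-1, 1) * qnorm(0.975) * se)
print(ci_or)
```

If the interval for the odds ratio excludes 1, that agrees with rejecting H0 at the 0.05 level.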
On average, the odds of vomiting are 0.98 times those of otherwise identical subjects in an age group one unit lower.
Finally, when we are deciding whether to include a particular variable in our model (maybe it's a confounder), we can use the "10% rule": if our estimate of interest changes by more than 10% when we add the new covariate to the model, then we keep that new covariate in our model. When we do this in logistic regression, we compare the exponentials of the betas, not the untransformed betas themselves!
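The 10% rule can be sketched as follows. Everything here is an assumption made for illustration: the data are simulated, sex is a hypothetical 0/1 covariate, and the percent change is taken relative to the crude (unadjusted) odds ratio, which is one common convention.

```r
# Simulated stand-in data (an assumption; not the course data set)
set.seed(2)
n <- 1000
sex <- rbinom(n, 1, 0.5)             # hypothetical 0/1 covariate
age <- round(runif(n, 1, 80))
vomiting <- rbinom(n, 1, plogis(-0.14 - 0.015 * age + 0.3 * sex))

# Crude model (age only) and adjusted model (age plus the candidate covariate)
crude    <- glm(vomiting ~ age, family = binomial)
adjusted <- glm(vomiting ~ age + sex, family = binomial)

# Compare the exponentiated betas (odds ratios), not the raw betas
or_crude    <- unname(exp(coef(crude)["age"]))
or_adjusted <- unname(exp(coef(adjusted)["age"]))

# Percent change in the odds ratio for age, relative to the crude estimate
pct_change <- 100 * abs(or_adjusted - or_crude) / or_crude
keep_sex <- pct_change > 10
print(c(or_crude = or_crude, or_adjusted = or_adjusted, pct_change = pct_change))
print(keep_sex)
```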
Test the hypothesis that being nauseated was not associated with sex and age (hint: use a multiple logistic regression model). Test the overall hypothesis that there is no association between nausea and sex and age. Then test the individual main effects hypothesis (i.e. no association between sex and nausea after adjusting for age, and vice versa).