Multiple Variable Regression

Multiple Linear Regression

In the example below I am using part of the data from the Framingham Heart Study to determine whether body mass index (bmi) is associated with systolic blood pressure (sysbp).

First, I run a simple linear regression to assess the crude (unadjusted) association between bmi and sysbp.

# Simple Linear Regression
> summary(lm(sysbp~bmi))

Call:
lm(formula = sysbp ~ bmi)

Residuals:
Min 1Q Median 3Q Max
-49.960 -15.973 -2.991 13.389 116.933

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 108.9627 2.6391 41.29 <2e-16 ***
bmi 1.1959 0.1008 11.87 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 22.12 on 2996 degrees of freedom
Multiple R-squared: 0.04489, Adjusted R-squared: 0.04457
F-statistic: 140.8 on 1 and 2996 DF, p-value: < 2.2e-16

Interpretation: BMI is significant related to systolic blood pressure (p<0.0001). Each increment in BMI is associated with an increase in sysbp of 1.2 mm Hg.

> #Multiple linear Regression
> summary(lm(sysbp~bmi + age + male + ldl + hdl +cursmoke))

Call:
lm(formula = sysbp ~ bmi + age + male + ldl + hdl + cursmoke)

Residuals:
Min 1Q Median 3Q Max
-54.981 -14.886 -2.319 11.703 105.840

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 45.885480 4.723993 9.713 <2e-16 ***
bmi *1.263627 0.097482 *12.963 <2e-16 ***
age 0.925972   0.047140 *19.643 <2e-16 ***
male -0.896683   0.818306 *-1.096    0.273
ldl 0.021030 0.008265 2.545    *0.011 *
hdl 0.041834 0.026281 1.592 *0.112
cursmoke -0.499081 0.836412 **-0.597 *0.551
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 20.68 on 2991 degrees of freedom
Multiple R-squared: 0.166, Adjusted R-squared: 0.1643
F-statistic: 99.22 on 6 and 2991 DF, p-value: < 2.2e-16

Interpetation: After adjusting for age, sex, LDL, HDL, and current smoking, BMI was still significantly associated with systolic blood pressure. Each unit increase in BMI was associated with a modest increase in systolic blood pressure of about 1.3 mm Hg on average (p<0.0001). The adjusted estimate for BMI was similar to the crude estimate.

Multiple Logistic Regression

I am again using the Framinghams data set, but now my goal is to test whether there is an association between diabetes and odds of being hospitalized for a myocardial infarction (hospmi). I begin by looking at the crude association. So, now the outcome of interest (hospmi) is a dichotomous variable, and I have to use logistic regression instead of linear regression.

# First I will examine the crude association with a simple chi-squared test
> table(diabetes,hospmi)
hospmi
diabetes 0 1
0 **2557 210
1 *183 48
> library(epitools)
> oddsratio.wald(table(diabetes,hospmi))

$data

hospmi
diabetes 0 1 Total
***0 2557 210 2767
***1 183 48 231
*Total **2740 *258 2998

$measure
odds ratio with 95% C.I.
diabetes estimate lower upper
0 1.000000 NA **NA
1 3.193755 2.256038 4.521233

$p.value
two-sided
diabetes midp.exact fisher.exact chi.square
0 NA NA NA
1 1.945462e-09 1.88176e-09 6.548224e-12

$correction
[1] FALSE

attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"

Interpretation: Subjects with diabetes had 1.95 times the odds of being hospitalized for a myocardial infarction compared to subjects without diabetes (p<0.0001)

# Multiple Logistic Regression

> log.out <-glm(hospmi~diabetes + age + male + bmi + sysbp + diabp + hdl + ldl,
+ family=binomial (link=logit))
> summary(log.out)

Call:
glm(formula = hospmi ~ diabetes + age + male + bmi + sysbp +
diabp + hdl + ldl, family = binomial(link = logit))

Deviance Residuals:
Min 1Q Median *3Q *Max
-1.3615 -0.4634 -0.3282 -0.2277 2.7531

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.946249 0.961247 -6.186 6.17e-10 ***
diabetes 0.946976 0.192002    4.932 8.13e-07 ***
age 0.016407 0.009186    1.786 0.074088 .
male 1.197212 0.153976    7.775 7.52e-15 ***
bmi -0.003890 0.018470 -0.211 0.833172
sysbp 0.012896 0.004114    3.135 0.001721 **
diabp -0.005405 0.007965 -0.679 0.497354
hdl -0.017798 0.005306 -3.354 0.000795 ***
ldl 0.007328 0.001374 *5.334 9.62e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 1758.7 on 2997 degrees of freedom
Residual deviance: 1579.5 on 2989 degrees of freedom
AIC: 1597.5

Number of Fisher Scoring iterations: 6

# The next command asks R to provide the adjusted odds ratios for each variable in the model

> exp(log.out$coeff)
(Intercept) diabetes age male **bmi sysbp diabp
0.002615633 2.577901246 1.016542819 3.310874851 0.996117094 1.012979017 0.994609173

**hdl ldl
0.982359551 1.007354740

# The next command asks for the 95% confidence intervals for the adjusted odds ratios.

> exp(confint(log.out))

Waiting for profiling to be done...

2.5 % 97.5 %
(Intercept) 0.000392781 0.01703735
diabetes 1.755767291 3.73167495
age 0.998370449 1.03500180
male 2.457227924 4.49622942
bmi 0.960202678 1.03233392
sysbp 1.004774973 1.02112476
diabp 0.979236759 1.01032182
hdl 0.972038277 0.99246563
ldl 1.004631473 1.01006555

Interpretation: After adjusting for age, sex, BMI systolic and diastolic blood pressure, HDL cholesterol, and LDL cholesterol, diabetics had 0.95 times the odds of being hospitalized for a myocardial infarction compared to non-diabetics. Since the crude odds ratio was 1.95, this indicates that the association between diabetes and hospitalizaiton for MI was confounded by one or more of these other risk factors.

return to top | previous page | next page