Multiple Variable Regression

Multiple Linear Regression

In the example below I am using part of the data from the Framingham Heart Study to determine whether body mass index (bmi) is associated with systolic blood pressure (sysbp).

First, I run a simple linear regression to assess the crude (unadjusted) association between bmi and sysbp.

# Simple Linear Regression
> summary(lm(sysbp~bmi))

Call:
lm(formula = sysbp ~ bmi)

Residuals:
    Min     1Q   Median       3Q     Max
-49.960 -15.973   -2.991   13.389 116.933

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 108.9627     2.6391   41.29   <2e-16 ***
bmi           1.1959     0.1008   11.87   <2e-16 ***
---
Signif. codes:   0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 22.12 on 2996 degrees of freedom
Multiple R-squared:   0.04489, Adjusted R-squared:   0.04457
F-statistic: 140.8 on 1 and 2996 DF,   p-value: < 2.2e-16

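Interpretation: BMI is significantly related to systolic blood pressure (p<0.0001). Each one-unit increase in BMI is associated with an average increase in sysbp of about 1.2 mm Hg. BMI alone explains only about 4.5% of the variability in sysbp (R-squared = 0.045).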
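The t statistic and a 95% confidence interval for the slope can be recomputed from the printed estimate and standard error. The sketch below types in the rounded values from the output above instead of refitting the model, so it agrees with R's own output only up to rounding; the variable names are my own.

```r
# Values copied (rounded) from the summary(lm(sysbp ~ bmi)) output above
est <- 1.1959   # estimated slope for bmi (mm Hg per unit of BMI)
se  <- 0.1008   # standard error of the slope
df  <- 2996     # residual degrees of freedom

# t statistic: the estimate divided by its standard error
t_stat <- est / se

# Two-sided p-value from the t distribution
p_val <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

# 95% confidence interval: estimate +/- critical t value * standard error
t_crit <- qt(0.975, df = df)
ci <- est + c(-1, 1) * t_crit * se

print(round(t_stat, 2))
print(round(ci, 2))
```

With the fitted model object available, confint(lm(sysbp ~ bmi)) should return the same interval without the rounding.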

> # Multiple Linear Regression
> summary(lm(sysbp ~ bmi + age + male + ldl + hdl + cursmoke))

Call:
lm(formula = sysbp ~ bmi + age + male + ldl + hdl + cursmoke)

Residuals:
    Min      1Q   Median       3Q     Max
-54.981 -14.886   -2.319   11.703 105.840

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 45.885480   4.723993   9.713   <2e-16 ***
bmi          1.263627   0.097482  12.963   <2e-16 ***
age          0.925972   0.047140  19.643   <2e-16 ***
male        -0.896683   0.818306  -1.096    0.273
ldl          0.021030   0.008265   2.545    0.011 *
hdl          0.041834   0.026281   1.592    0.112
cursmoke    -0.499081   0.836412  -0.597    0.551
---
Signif. codes:   0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 20.68 on 2991 degrees of freedom
Multiple R-squared:   0.166,     Adjusted R-squared:   0.1643
F-statistic: 99.22 on 6 and 2991 DF,   p-value: < 2.2e-16

Interpretation: After adjusting for age, sex, LDL, HDL, and current smoking, BMI was still significantly associated with systolic blood pressure. Each unit increase in BMI was associated with a modest increase in systolic blood pressure of about 1.3 mm Hg on average (p<0.0001). The adjusted estimate for BMI was similar to the crude estimate (1.26 vs. 1.20 mm Hg per unit of BMI), which suggests little confounding by these covariates.
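One way to put a number on "similar" is the percent change between the crude and adjusted slopes. The sketch below uses the two bmi estimates printed above. The 10% cutoff is a common rule of thumb for flagging confounding and is my assumption, not something stated in these notes.

```r
# bmi slopes copied from the two model summaries above
crude    <- 1.1959     # simple linear regression
adjusted <- 1.263627   # multiple linear regression

# Percent change in the estimate after adjustment
pct_change <- 100 * (adjusted - crude) / crude
print(round(pct_change, 1))

# Rule-of-thumb threshold (assumption): a change larger than 10%
# is often taken as a sign of meaningful confounding
threshold <- 10
meaningful_confounding <- abs(pct_change) > threshold
print(meaningful_confounding)
```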

Multiple Logistic Regression

I am again using the Framingham data set, but now my goal is to test whether there is an association between diabetes and the odds of being hospitalized for a myocardial infarction (hospmi). The outcome of interest (hospmi) is now a dichotomous variable, so I have to use logistic regression instead of linear regression. I begin by looking at the crude association.

# First I will examine the crude association with a 2x2 table, the crude odds ratio, and a chi-squared test
> table(diabetes,hospmi)
        hospmi
diabetes    0    1
       0 2557  210
       1  183   48
> library(epitools)
> oddsratio.wald(table(diabetes,hospmi))

$data

        hospmi
diabetes    0   1 Total
   0     2557 210  2767
   1      183  48   231
   Total 2740 258  2998

$measure
          odds ratio with 95% C.I.
diabetes estimate    lower    upper
       0 1.000000       NA       NA
       1 3.193755 2.256038 4.521233

$p.value
              two-sided
diabetes     midp.exact fisher.exact     chi.square
        0           NA            NA           NA
        1 1.945462e-09   1.88176e-09 6.548224e-12

$correction
[1] FALSE

attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"

Interpretation: Subjects with diabetes had 3.19 times the odds of being hospitalized for a myocardial infarction compared to subjects without diabetes (95% CI 2.26 to 4.52, p<0.0001).
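The crude odds ratio and its Wald confidence interval can be checked by hand from the four cell counts in the table. This is a minimal sketch of the standard cross-product calculation; the variable names are my own, and it is meant to mirror the estimate and interval printed by oddsratio.wald rather than replace that function.

```r
# Cell counts copied from table(diabetes, hospmi) above
n00 <- 2557  # no diabetes, not hospitalized for MI
n01 <- 210   # no diabetes, hospitalized for MI
n10 <- 183   # diabetes, not hospitalized for MI
n11 <- 48    # diabetes, hospitalized for MI

# Cross-product odds ratio: odds of MI with diabetes / odds without
or_hat <- (n11 * n00) / (n10 * n01)

# Standard error of the log odds ratio
se_log_or <- sqrt(1 / n00 + 1 / n01 + 1 / n10 + 1 / n11)

# Wald 95% confidence interval, built on the log scale and exponentiated
ci <- exp(log(or_hat) + c(-1, 1) * qnorm(0.975) * se_log_or)

print(round(or_hat, 2))
print(round(ci, 2))
```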

# Multiple Logistic Regression

> log.out <- glm(hospmi ~ diabetes + age + male + bmi + sysbp + diabp + hdl + ldl,
+   family = binomial(link = logit))
> summary(log.out)

Call:
glm(formula = hospmi ~ diabetes + age + male + bmi + sysbp +
      diabp + hdl + ldl, family = binomial(link = logit))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.3615  -0.4634  -0.3282  -0.2277   2.7531

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.946249   0.961247  -6.186 6.17e-10 ***
diabetes     0.946976   0.192002   4.932 8.13e-07 ***
age          0.016407   0.009186   1.786 0.074088 .
male         1.197212   0.153976   7.775 7.52e-15 ***
bmi         -0.003890   0.018470  -0.211 0.833172
sysbp        0.012896   0.004114   3.135 0.001721 **
diabp       -0.005405   0.007965  -0.679 0.497354
hdl         -0.017798   0.005306  -3.354 0.000795 ***
ldl          0.007328   0.001374   5.334 9.62e-08 ***
---
Signif. codes:   0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

      Null deviance: 1758.7   on 2997   degrees of freedom
Residual deviance: 1579.5   on 2989   degrees of freedom
AIC: 1597.5

Number of Fisher Scoring iterations: 6
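The null and residual deviances reported above also allow an overall likelihood ratio test of whether the eight predictors together improve on an intercept-only model. This test is not part of the original analysis; the sketch below is an illustration that types in the printed deviances, and the variable names are my own.

```r
# Deviances and degrees of freedom copied from the summary output above
null_dev  <- 1758.7; null_df  <- 2997
resid_dev <- 1579.5; resid_df <- 2989

# Likelihood ratio statistic: the drop in deviance from adding the predictors
lr_stat <- null_dev - resid_dev
lr_df   <- null_df - resid_df

# p-value from the chi-squared distribution
p_val <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# For a 0/1 outcome the AIC equals the residual deviance plus
# 2 * (number of estimated coefficients, here 8 predictors + intercept)
aic <- resid_dev + 2 * 9

print(lr_stat)
print(lr_df)
print(aic)
```

With the fitted object, anova(log.out, test = "Chisq") gives a related sequential breakdown of the same deviance drop.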

# The next command asks R to provide the adjusted odds ratios for each variable in the model

> exp(log.out$coeff)
(Intercept)    diabetes         age        male         bmi       sysbp       diabp
0.002615633 2.577901246 1.016542819 3.310874851 0.996117094 1.012979017 0.994609173
        hdl         ldl
0.982359551 1.007354740

# The next command asks for the 95% confidence intervals for the adjusted odds ratios.

> exp(confint(log.out))

Waiting for profiling to be done...

                  2.5 %       97.5 %
(Intercept) 0.000392781 0.01703735
diabetes    1.755767291 3.73167495
age         0.998370449 1.03500180
male        2.457227924 4.49622942
bmi         0.960202678 1.03233392
sysbp       1.004774973 1.02112476
diabp       0.979236759 1.01032182
hdl         0.972038277 0.99246563
ldl         1.004631473 1.01006555
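The intervals above come from profile likelihood, which is what confint uses for glm objects. A Wald interval built from the estimate and standard error is a common alternative and usually gives a close answer in a sample this large. The sketch below computes it for diabetes from the printed coefficient table; the variable names are my own.

```r
# diabetes row copied from the coefficient table above
est <- 0.946976   # log odds ratio
se  <- 0.192002   # standard error of the log odds ratio

# Adjusted odds ratio: exponentiate the coefficient
or_adj <- exp(est)

# Wald 95% interval: build it on the log-odds scale, then exponentiate
wald_ci <- exp(est + c(-1, 1) * qnorm(0.975) * se)

print(round(or_adj, 2))
print(round(wald_ci, 2))
```

With the fitted object, exp(confint.default(log.out)) returns Wald intervals for every coefficient at once.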

Interpretation: After adjusting for age, sex, BMI, systolic and diastolic blood pressure, HDL cholesterol, and LDL cholesterol, subjects with diabetes had 2.58 times the odds of being hospitalized for a myocardial infarction compared to subjects without diabetes (95% CI 1.76 to 3.73, p<0.0001). Since the crude odds ratio was 3.19, this suggests that the association between diabetes and hospitalization for MI was partly confounded by one or more of these other risk factors, although a strong association remained after adjustment.
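To see what an adjusted odds ratio means on the probability scale, the fitted coefficients can be applied to a single covariate profile with and without diabetes. In the sketch below the coefficients are copied from the summary output, but the covariate values are hypothetical and chosen only for illustration. They are not taken from the Framingham data, and they assume age in years and male coded 0/1.

```r
# Coefficients copied from summary(log.out) above
b <- c(intercept = -5.946249, diabetes = 0.946976, age = 0.016407,
       male = 1.197212, bmi = -0.003890, sysbp = 0.012896,
       diabp = -0.005405, hdl = -0.017798, ldl = 0.007328)

# Hypothetical subject without diabetes (values made up for illustration)
x <- c(intercept = 1, diabetes = 0, age = 55, male = 1, bmi = 26,
       sysbp = 130, diabp = 80, hdl = 50, ldl = 180)

# Linear predictor (log odds) without and with diabetes
lp0 <- sum(b * x)
lp1 <- lp0 + b[["diabetes"]]

# Convert log odds to predicted probabilities with the inverse logit
p0 <- plogis(lp0)
p1 <- plogis(lp1)

# The ratio of the two odds equals exp(diabetes coefficient),
# whatever values the other covariates take
odds_ratio <- (p1 / (1 - p1)) / (p0 / (1 - p0))

print(c(p0 = p0, p1 = p1))
print(odds_ratio)
```

The two probabilities change if the profile changes, but the odds ratio does not, which is why the odds ratio is reported as the single summary of the diabetes effect.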