Multiple Linear Regression


Model Specification and Output

In practice, most regression analyses use more than one predictor. In R, a multiple regression model is specified by a model formula that joins the predictors with plus signs (+):

> lm2 <- lm(pctfat.brozek ~ age + fatfreeweight + neck, data = fatdata)

which corresponds to the following multiple linear regression model:

pctfat.brozek = β0 + β1*age + β2*fatfreeweight + β3*neck + ε
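To see how the formula maps onto this model, here is a minimal sketch on simulated data. The data frame `sim` and its "true" coefficients are invented for illustration; they are not the body-fat data. `model.matrix()` builds the design matrix implied by the formula (a column of 1s for β0 plus one column per predictor), and solving the least-squares normal equations on that matrix reproduces the coefficients reported by `lm()`.

```r
# Hypothetical simulated data: NOT fatdata, just three numeric predictors
set.seed(1)
n <- 100
sim <- data.frame(
  age           = runif(n, 20, 80),
  fatfreeweight = runif(n, 100, 180),
  neck          = runif(n, 30, 45)
)
# Assumed "true" coefficients, chosen only for illustration
sim$y <- -50 + 0.04 * sim$age - 0.2 * sim$fatfreeweight + 2.7 * sim$neck +
  rnorm(n, sd = 5)

fit <- lm(y ~ age + fatfreeweight + neck, data = sim)

# Design matrix implied by the formula: intercept column plus one column per predictor
X <- model.matrix(y ~ age + fatfreeweight + neck, data = sim)

# Least-squares solution of the normal equations (X'X) b = X'y
beta_hat <- solve(crossprod(X), crossprod(X, sim$y))

print(cbind(lm = coef(fit), normal_equations = drop(beta_hat)))
```

`lm()` itself uses a QR decomposition rather than the normal equations, so the two columns agree up to floating-point rounding.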


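Calling summary() on the fitted model reports the coefficient estimates together with tests of the global null hypothesis and the main-effects hypotheses described after the output: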

> summary(lm2)

Call:
lm(formula = pctfat.brozek ~ age + fatfreeweight + neck, data = fatdata)

Residuals:
      Min         1Q     Median         3Q        Max
-16.67871   -3.62536    0.07768    3.65100   16.99197

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)   -53.01330    5.99614  -8.841  < 2e-16 ***
age             0.03832    0.03298   1.162    0.246
fatfreeweight  -0.23200    0.03086  -7.518 1.02e-12 ***
neck            2.72617    0.22627  12.049  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.901 on 248 degrees of freedom
Multiple R-squared: 0.4273,    Adjusted R-squared: 0.4203
F-statistic: 61.67 on 3 and 248 DF,  p-value: < 2.2e-16
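The last lines of the summary are linked by simple formulas. With n observations and p predictors (not counting the intercept), adjusted R-squared = 1 - (1 - R^2)(n - 1)/(n - p - 1) and F = (R^2/p) / ((1 - R^2)/(n - p - 1)). The sketch below plugs in the printed values for lm2: p = 3 and n - p - 1 = 248 residual degrees of freedom, so n = 252. Because the input is the rounded R-squared shown in the output, the results match the printed values only up to rounding.

```r
# Values copied from the lm2 summary above (R-squared is rounded to 4 digits)
r2     <- 0.4273
p      <- 3                 # number of predictors, excluding the intercept
df_res <- 248               # residual degrees of freedom, n - p - 1
n      <- df_res + p + 1    # 252 observations

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_res

# Overall F statistic for H0: all slope coefficients are zero
f_stat <- (r2 / p) / ((1 - r2) / df_res)

# Upper-tail p-value from the F distribution with (p, n - p - 1) degrees of freedom
p_value <- pf(f_stat, p, df_res, lower.tail = FALSE)

round(c(adj_r2 = adj_r2, F = f_stat), 4)
p_value
```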

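Global Null Hypothesis

H0: β1 = β2 = β3 = 0 versus Ha: at least one βj ≠ 0. This is tested by the F-statistic on the last line of the output (F = 61.67 on 3 and 248 DF, p-value < 2.2e-16), so the global null hypothesis is rejected: at least one of the predictors is associated with percent body fat.

Main Effects Hypothesis

For each predictor j = 1, 2, 3: H0: βj = 0 versus Ha: βj ≠ 0, with the other predictors kept in the model. These are the t tests in the Coefficients table: fatfreeweight and neck are significant (p < 0.001), while age is not (p = 0.246) once fatfreeweight and neck are included.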
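The global F test and the main-effect t tests reported by summary() can be reproduced by hand, which makes their meaning concrete. The sketch below uses simulated data (hypothetical; the variable names mirror lm2 but the values are invented). The global F test is the same as comparing the full model with an intercept-only model via anova(), and each main-effect p-value is the two-sided tail probability of its t value on the residual degrees of freedom.

```r
# Hypothetical simulated data; age has no real effect by construction
set.seed(2)
n <- 150
sim <- data.frame(
  age           = runif(n, 20, 80),
  fatfreeweight = runif(n, 100, 180),
  neck          = runif(n, 30, 45)
)
sim$y <- -40 - 0.2 * sim$fatfreeweight + 2.5 * sim$neck + rnorm(n, sd = 5)

full <- lm(y ~ age + fatfreeweight + neck, data = sim)
fit0 <- lm(y ~ 1, data = sim)   # model under the global null: intercept only
s    <- summary(full)

# Global null hypothesis: F test comparing the intercept-only and full models
global    <- anova(fit0, full)
f_anova   <- global$F[2]
f_summary <- unname(s$fstatistic["value"])
c(anova = f_anova, summary = f_summary)

# Main effects hypotheses: two-sided t tests on the residual degrees of freedom
tab      <- coef(s)
p_manual <- 2 * pt(abs(tab[, "t value"]), df = full$df.residual, lower.tail = FALSE)
cbind(reported = tab[, "Pr(>|t|)"], manual = p_manual)
```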

Model with Categorical Variables or Factors


Sometimes we may also be interested in using categorical variables as predictors. According to the National Heart, Lung, and Blood Institute (http://www.nhlbi.nih.gov/health/public/heart/obesity/lose_wt/risk.htm), individuals with a body mass index (BMI) greater than or equal to 25 are classified as overweight or obese. In our dataset, the variable adiposity is equivalent to BMI.

Create a categorical variable bmi that takes the value "overweight or obesity" if adiposity >= 25 and "normal or underweight" otherwise.
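One way to do this exercise is with ifelse(). The sketch below runs on a small hypothetical stand-in for fatdata that contains only an adiposity column with invented values; with the real data, the same assignment would be applied to fatdata itself.

```r
# Hypothetical stand-in for fatdata: only the adiposity (BMI) column is needed here
fatdata <- data.frame(adiposity = c(21.3, 25.0, 28.7, 24.9, 32.1))

# BMI >= 25 is labeled "overweight or obesity"; everything else "normal or underweight"
fatdata$bmi <- ifelse(fatdata$adiposity >= 25,
                      "overweight or obesity",
                      "normal or underweight")

table(fatdata$bmi)
```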


With the variable bmi created in the previous exercise, we add it to the model as a factor:

> lm3 <- lm(pctfat.brozek ~ age + fatfreeweight + neck + factor(bmi), data = fatdata)

> summary(lm3)

Call:
lm(formula = pctfat.brozek ~ age + fatfreeweight + neck + factor(bmi),
    data = fatdata)

Residuals:
     Min        1Q    Median        3Q       Max
-13.4222   -3.0969   -0.2637    2.7280   13.3875

Coefficients:
                                  Estimate Std. Error t value Pr(>|t|)
(Intercept)                      -21.31224    6.32852  -3.368 0.000879 ***
age                                0.01698    0.02887   0.588 0.556890
fatfreeweight                     -0.23488    0.02691  -8.727 3.97e-16 ***
neck                               1.83080    0.22152   8.265 8.63e-15 ***
factor(bmi)overweight or obesity   7.31761    0.82282   8.893  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.146 on 247 degrees of freedom
Multiple R-squared: 0.5662,    Adjusted R-squared: 0.5591
F-statistic: 80.59 on 4 and 247 DF,  p-value: < 2.2e-16


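Note that although the factor bmi has two levels, the output shows a coefficient for only one of them, "overweight or obesity". By default R uses treatment (dummy) coding: the first level, "normal or underweight" (factor levels are sorted alphabetically), serves as the baseline or reference group, and the coefficient of the other level is its "treatment effect" relative to that baseline. Here the estimate 7.3176 is the estimated difference in mean percent body fat between the overweight-or-obese group and the normal-or-underweight group, holding age, fatfreeweight, and neck constant.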
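To see the treatment coding directly, the sketch below uses simulated data (the group labels match bmi, but the values and the group effect of 7 are invented). It prints the 0/1 dummy column that factor(bmi) contributes to the design matrix, then uses relevel() to make "overweight or obesity" the reference group instead.

```r
# Hypothetical simulated data with the same two bmi labels as the tutorial
set.seed(3)
n <- 120
sim <- data.frame(
  neck = runif(n, 30, 45),
  bmi  = sample(c("normal or underweight", "overweight or obesity"), n, replace = TRUE)
)
sim$y <- -20 + 1.8 * sim$neck +
  7 * (sim$bmi == "overweight or obesity") + rnorm(n, sd = 5)

# Default treatment coding: the first level alphabetically is the baseline
fit_a <- lm(y ~ neck + factor(bmi), data = sim)
head(model.matrix(fit_a))   # 0/1 dummy column for "overweight or obesity"

# Make "overweight or obesity" the reference level instead
sim$bmi2 <- relevel(factor(sim$bmi), ref = "overweight or obesity")
fit_b <- lm(y ~ neck + bmi2, data = sim)

coef(fit_a)
coef(fit_b)
```

Changing the baseline only reparameterizes the model: the factor's coefficient flips sign, the intercept shifts, and the fitted values are unchanged.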


Video: Multiple Linear Regression in R (R Tutorial 5.3), MarinStatsLectures
