# Multiple Linear Regression

## Model Specification and Output

In practice, most regression analyses use more than a single predictor. A multiple regression model is specified with a model formula that places a plus sign (+) between the predictors:

> lm2 <- lm(pctfat.brozek ~ age + fatfreeweight + neck, data = fatdata)

which corresponds to the following multiple linear regression model:

pctfat.brozek = β0 + β1*age + β2*fatfreeweight + β3*neck + ε

The overall F-test for this model tests the following hypotheses:

• H0: There is no linear association between pctfat.brozek and any of age, fatfreeweight and neck.
• Ha: There is a linear association between pctfat.brozek and at least one of age, fatfreeweight and neck.

> summary(lm2)

Call:

lm(formula = pctfat.brozek ~ age + fatfreeweight + neck, data = fatdata)

Residuals:

Min        1Q       Median        3Q        Max

-16.67871  -3.62536   0.07768   3.65100  16.99197

Coefficients:

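| Term | Estimate | Std. Error | t value | Pr(>\|t\|) | Signif. |
|---|---|---|---|---|---|
| (Intercept) | -53.01330 | 5.99614 | -8.841 | < 2e-16 | *** |
| age | 0.03832 | 0.03298 | 1.162 | 0.246 | |
| fatfreeweight | -0.23200 | 0.03086 | -7.518 | 1.02e-12 | *** |
| neck | 2.72617 | 0.22627 | 12.049 | < 2e-16 | *** |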

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.901 on 248 degrees of freedom

Multiple R-squared: 0.4273,     Adjusted R-squared: 0.4203

F-statistic: 61.67 on 3 and 248 DF,  p-value: < 2.2e-16

### Global Null Hypothesis

• When testing the null hypothesis that there is no linear association between Brozek percent fat and any of age, fatfreeweight, and neck, we reject the null hypothesis (F = 61.67 on 3 and 248 degrees of freedom, p-value < 2.2e-16). Together, age, fatfreeweight and neck explain 42.73% of the variability in Brozek percent fat.
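To see how the numbers in the last lines of the output relate to each other, the overall F statistic and the adjusted R-squared can be recomputed from the reported R-squared and degrees of freedom. This is only a sketch using the rounded values printed above, so the results should land close to, but not exactly on, the reported figures.

```r
# Values copied from the summary(lm2) output above (already rounded by R).
r_squared <- 0.4273   # Multiple R-squared
k         <- 3        # number of predictors
df_resid  <- 248      # residual degrees of freedom
n         <- df_resid + k + 1   # number of observations implied by the df

# Overall F statistic: explained variability per predictor divided by
# unexplained variability per residual degree of freedom.
f_stat <- (r_squared / k) / ((1 - r_squared) / df_resid)

# Upper-tail p-value from the F distribution with (k, df_resid) df.
p_value <- pf(f_stat, df1 = k, df2 = df_resid, lower.tail = FALSE)

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / df_resid

round(f_stat, 2)
p_value
round(adj_r_squared, 4)
```

Small differences from the printed 61.67 and 0.4203 are expected because the input R-squared has been rounded to four decimals.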

### Main Effects Hypothesis

• When testing the null hypothesis that there is no linear association between Brozek percent fat and age after adjusting for fatfreeweight and neck, we fail to reject the null hypothesis (t = 1.162, df = 248, p-value = 0.246). The estimated coefficient corresponds to an average increase of 0.04 in Brozek percent fat for a one-unit increase in age, after adjusting for fatfreeweight and neck, but this estimate is not statistically distinguishable from zero.
• When testing the null hypothesis that there is no linear association between Brozek percent fat and fatfreeweight after adjusting for age and neck, we reject the null hypothesis (t = -7.518, df = 248, p-value = 1.02e-12). For a one-unit increase in fatfreeweight, Brozek percent fat decreases by 0.23 units on average, after adjusting for age and neck.
• When testing the null hypothesis that there is no linear association between Brozek percent fat and neck after adjusting for fatfreeweight and age, we reject the null hypothesis (t = 12.049, df = 248, p-value < 2e-16). For a one-unit increase in neck, Brozek percent fat increases by 2.73 units on average, after adjusting for age and fatfreeweight.
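Each t value in the coefficient table is the estimate divided by its standard error, and the p-value comes from a t distribution with the residual degrees of freedom. The sketch below recomputes these from the printed estimates and standard errors, and also forms 95% confidence intervals as an illustration; the intervals are not part of the output shown above.

```r
# Estimates and standard errors copied from the summary(lm2) output above.
estimate  <- c(age = 0.03832, fatfreeweight = -0.23200, neck = 2.72617)
std_error <- c(age = 0.03298, fatfreeweight =  0.03086, neck = 0.22627)
df_resid  <- 248

# t statistic for H0: coefficient = 0, adjusting for the other predictors.
t_value <- estimate / std_error

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)

# 95% confidence interval: estimate +/- critical t value * standard error.
t_crit <- qt(0.975, df = df_resid)
ci <- cbind(lower = estimate - t_crit * std_error,
            upper = estimate + t_crit * std_error)

round(cbind(t_value, p_value), 3)
round(ci, 3)
```

The interval for age includes zero, which matches the decision not to reject its null hypothesis. On the fitted object itself, `confint(lm2)` should report the same kind of intervals without the rounding in the copied values.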

## Model with Categorical Variables or Factors

Sometimes we may also be interested in using categorical variables as predictors. According to information posted on the website of the National Heart, Lung, and Blood Institute (http://www.nhlbi.nih.gov/health/public/heart/obesity/lose_wt/risk.htm), individuals with a body mass index (BMI) greater than or equal to 25 are classified as overweight or obese. In our dataset, the variable adiposity is equivalent to BMI.

Exercise: Create a categorical variable bmi that takes the value "overweight or obesity" if adiposity >= 25 and "normal or underweight" otherwise.
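One possible way to approach this exercise is with `ifelse()`. The sketch below uses a small toy data frame with made-up adiposity values rather than fatdata itself, so the name `toy` and its numbers are purely illustrative.

```r
# Toy data frame standing in for fatdata; the adiposity values are made up.
toy <- data.frame(adiposity = c(19.8, 24.9, 25.0, 31.2))

# Label each row using the BMI >= 25 cutoff described above.
toy$bmi <- ifelse(toy$adiposity >= 25,
                  "overweight or obesity",
                  "normal or underweight")

# Count how many observations fall into each category.
table(toy$bmi)
```

Assuming fatdata has the adiposity column described above, the same assignment would read `fatdata$bmi <- ifelse(fatdata$adiposity >= 25, "overweight or obesity", "normal or underweight")`.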

With the variable bmi created in the previous exercise, we can now fit a model that includes it:

> lm3 <- lm(pctfat.brozek ~ age + fatfreeweight + neck + factor(bmi), data = fatdata)

> summary(lm3)

Call:

lm(formula = pctfat.brozek ~ age + fatfreeweight + neck + factor(bmi),

data = fatdata)

Residuals:

Min       1Q   Median       3Q      Max

-13.4222  -3.0969  -0.2637   2.7280  13.3875

Coefficients:

| Term | Estimate | Std. Error | t value | Pr(>\|t\|) | Signif. |
|---|---|---|---|---|---|
| (Intercept) | -21.31224 | 6.32852 | -3.368 | 0.000879 | *** |
| age | 0.01698 | 0.02887 | 0.588 | 0.556890 | |
| fatfreeweight | -0.23488 | 0.02691 | -8.727 | 3.97e-16 | *** |
| neck | 1.83080 | 0.22152 | 8.265 | 8.63e-15 | *** |
| factor(bmi)overweight or obesity | 7.31761 | 0.82282 | 8.893 | < 2e-16 | *** |

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.146 on 247 degrees of freedom

Multiple R-squared:  0.5662,     Adjusted R-squared:  0.5591

F-statistic: 80.59 on 4 and 247 DF,  p-value: < 2.2e-16

Note that although the factor bmi has two levels, the output shows a coefficient for only one of them, "overweight or obesity". This is because R uses treatment contrasts by default: the level "normal or underweight" serves as the baseline or reference group, and the estimate for factor(bmi)overweight or obesity, 7.3176, is the estimated difference in mean percent body fat between the "overweight or obesity" group and the reference group, after adjusting for age, fatfreeweight and neck.
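To see why only one level appears, it can help to look at how R codes a two-level factor. The sketch below uses four hypothetical labels (not rows from fatdata) to show the default level ordering, the 0/1 indicator column that R builds, and how `relevel()` could select a different reference group.

```r
# Hypothetical labels for illustration; these are not rows from fatdata.
bmi <- c("overweight or obesity", "normal or underweight",
         "overweight or obesity", "normal or underweight")

# factor() orders levels alphabetically; the first level is the reference group.
bmi_factor <- factor(bmi)
levels(bmi_factor)

# Default treatment contrasts: an intercept column plus one 0/1 indicator
# that equals 1 for the non-reference level.
design <- model.matrix(~ bmi_factor)
design

# relevel() makes a different level the reference group.
bmi_releveled <- relevel(bmi_factor, ref = "overweight or obesity")
levels(bmi_releveled)
```

With "overweight or obesity" as the reference, a model would instead report a coefficient for "normal or underweight", with the same magnitude and the opposite sign.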

Multiple Linear Regression in R (R Tutorial 5.3), MarinStatsLectures