Multiple Linear Regression
Model Specification and Output
In practice, most regression analyses use more than one predictor. A multiple regression model is specified in R by writing a model formula with a plus sign (+) between the predictors:
> lm2<-lm(pctfat.brozek~age+fatfreeweight+neck,data=fatdata)
which corresponds to the following multiple linear regression model:
pctfat.brozek = β0 + β1*age + β2*fatfreeweight + β3*neck + ε
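The fatdata data frame is not reproduced in this section, so the following is a minimal sketch on simulated data. The data frame sim, its value ranges and the "true" coefficients used to generate the response are all invented for illustration, and the fitted numbers are not comparable to lm2. The sketch shows how each predictor joined by + becomes its own column of the design matrix and gets its own coefficient. The sample size of 252 is chosen only because 248 residual degrees of freedom plus 4 estimated coefficients implies 252 observations.

```r
# Sketch on SIMULATED data: sim and all its values are invented, not fatdata.
set.seed(1)
n <- 252
sim <- data.frame(
  age           = runif(n, 20, 80),
  fatfreeweight = rnorm(n, 140, 15),
  neck          = rnorm(n, 38, 2.5)
)
# Response generated from assumed coefficients plus Gaussian noise
sim$pctfat.brozek <- -50 + 0.05 * sim$age - 0.2 * sim$fatfreeweight +
  2.5 * sim$neck + rnorm(n, sd = 6)

# Each term joined by + enters the model as a separate predictor
fit <- lm(pctfat.brozek ~ age + fatfreeweight + neck, data = sim)
coef(fit)          # intercept (beta0) plus one slope per predictor

# Design matrix: a column of 1s for the intercept plus one column per predictor
head(model.matrix(fit))
df.residual(fit)   # n minus the 4 estimated coefficients
```

Replacing sim with fatdata in the lm() call gives exactly the call used for lm2.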
The overall F-test reported by summary() tests the following hypotheses:
- H0: There is no linear association between pctfat.brozek and age, fatfreeweight and neck (β1 = β2 = β3 = 0).
- Ha: There is a linear association between pctfat.brozek and at least one of age, fatfreeweight and neck (at least one βj ≠ 0).
> summary(lm2)
Call:
lm(formula = pctfat.brozek ~ age + fatfreeweight + neck, data = fatdata)
Residuals:
      Min        1Q    Median        3Q       Max
-16.67871  -3.62536   0.07768   3.65100  16.99197
Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)   -53.01330    5.99614  -8.841  < 2e-16 ***
age             0.03832    0.03298   1.162    0.246
fatfreeweight  -0.23200    0.03086  -7.518 1.02e-12 ***
neck            2.72617    0.22627  12.049  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5.901 on 248 degrees of freedom
Multiple R-squared: 0.4273, Adjusted R-squared: 0.4203
F-statistic: 61.67 on 3 and 248 DF, p-value: < 2.2e-16
Global Null Hypothesis
- When testing the null hypothesis that there is no linear association between Brozek percent fat and age, fatfreeweight and neck, we reject the null hypothesis (F(3, 248) = 61.67, p-value < 2.2e-16). Together, age, fatfreeweight and neck explain 42.73% of the variability in Brozek percent fat (multiple R-squared = 0.4273).
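The overall F statistic can be recovered from the multiple R-squared, since F = (R^2/p) / ((1 - R^2)/(n - p - 1)), where p is the number of predictors and n - p - 1 is the residual degrees of freedom. The sketch below plugs in the rounded values printed for lm2, so it matches 61.67 only up to rounding.

```r
# Values copied from summary(lm2) above
r2       <- 0.4273  # Multiple R-squared
p        <- 3       # number of predictors (numerator df)
df_resid <- 248     # residual degrees of freedom (denominator df)

# Explained variance per predictor divided by unexplained variance per residual df
F_stat <- (r2 / p) / ((1 - r2) / df_resid)
F_stat   # about 61.68; the small gap from 61.67 comes from rounding r2

# Upper-tail probability of the F(3, 248) distribution
p_value <- pf(F_stat, df1 = p, df2 = df_resid, lower.tail = FALSE)
p_value < 2.2e-16   # R prints such p-values as "< 2.2e-16"
```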
Main Effects Hypothesis
- When testing the null hypothesis that there is no linear association between Brozek percent fat and age after adjusting for fatfreeweight and neck, we fail to reject the null hypothesis (t = 1.162, df = 248, p-value = 0.246). The point estimate suggests that for a one-year increase in age, Brozek percent fat increases on average by 0.04 percentage points after adjusting for fatfreeweight and neck, but this estimate is not statistically distinguishable from zero.
- When testing the null hypothesis that there is no linear association between Brozek percent fat and fatfreeweight after adjusting for age and neck, we reject the null hypothesis (t = -7.518, df = 248, p-value = 1.02e-12). For a one-unit increase in fatfreeweight, Brozek percent fat decreases on average by 0.23 percentage points after adjusting for age and neck.
- When testing the null hypothesis that there is no linear association between Brozek percent fat and neck after adjusting for age and fatfreeweight, we reject the null hypothesis (t = 12.049, df = 248, p-value < 2e-16). For a one-unit increase in neck, Brozek percent fat increases on average by 2.73 percentage points after adjusting for age and fatfreeweight.
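Each t value in the coefficient table is the estimate divided by its standard error, and each p-value is the two-sided tail probability of a t distribution with the residual degrees of freedom (248). The sketch below recomputes these from the rounded numbers printed for lm2 and adds 95% confidence intervals; the vectors est and se are simply copies of that table, not a new analysis.

```r
# Estimates and standard errors copied from the coefficient table of summary(lm2)
est <- c(age = 0.03832, fatfreeweight = -0.23200, neck = 2.72617)
se  <- c(age = 0.03298, fatfreeweight = 0.03086, neck = 0.22627)
df_resid <- 248

# t statistic for H0: beta_j = 0, adjusted for the other predictors
t_val <- est / se
# Two-sided p-value from the t distribution with 248 df
p_val <- 2 * pt(-abs(t_val), df = df_resid)

# 95% confidence interval: estimate +/- t quantile * standard error
t_crit <- qt(0.975, df = df_resid)
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)

round(cbind(t = t_val, p = p_val), 4)
round(ci, 4)   # the age interval contains 0, consistent with failing to reject
```

With the fitted object available, confint(lm2) returns the same kind of intervals directly; small differences would come only from the rounded inputs used here.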
Model with Categorical Variables or Factors
We may also want to use categorical variables as predictors. According to the National Heart, Lung, and Blood Institute (http://www.nhlbi.nih.gov/health/public/heart/obesity/lose_wt/risk.htm), individuals with a body mass index (BMI) greater than or equal to 25 are classified as overweight or obese. In our dataset, the variable adiposity is equivalent to BMI.
Exercise: Create a categorical variable bmi that takes the value "overweight or obesity" if adiposity >= 25 and "normal or underweight" otherwise.
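One possible solution to the exercise, sketched on a small made-up data frame (toy) because fatdata is not reproduced here; the adiposity values are invented. It uses base R's vectorised ifelse(); with the real data the same call would be applied to fatdata$adiposity and stored in fatdata$bmi. Note the boundary case: a value of exactly 25 is labeled "overweight or obesity" because the condition uses >=.

```r
# Hypothetical stand-in for fatdata: only the adiposity (BMI) column is needed
toy <- data.frame(adiposity = c(21.3, 24.9, 25.0, 29.8, 33.1))

# ifelse() checks the condition element by element and picks the matching label
toy$bmi <- ifelse(toy$adiposity >= 25,
                  "overweight or obesity",
                  "normal or underweight")

toy
table(toy$bmi)   # counts per category
```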
With the variable bmi created in the previous exercise, we can now fit the model:
> lm3 <- lm(pctfat.brozek ~ age + fatfreeweight + neck + factor(bmi), data = fatdata)
> summary(lm3)
Call:
lm(formula = pctfat.brozek ~ age + fatfreeweight + neck + factor(bmi),
data = fatdata)
Residuals:
     Min       1Q   Median       3Q      Max
-13.4222  -3.0969  -0.2637   2.7280  13.3875
Coefficients:
                                  Estimate Std. Error t value Pr(>|t|)
(Intercept)                      -21.31224    6.32852  -3.368 0.000879 ***
age                                0.01698    0.02887   0.588 0.556890
fatfreeweight                     -0.23488    0.02691  -8.727 3.97e-16 ***
neck                               1.83080    0.22152   8.265 8.63e-15 ***
factor(bmi)overweight or obesity   7.31761    0.82282   8.893  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5.146 on 247 degrees of freedom
Multiple R-squared: 0.5662, Adjusted R-squared: 0.5591
F-statistic: 80.59 on 4 and 247 DF, p-value: < 2.2e-16
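Note that although the factor bmi has two levels, the output shows a coefficient for only one of them, "overweight or obesity"; this coefficient is called the "treatment effect". By default, factor() orders the levels alphabetically and R treats the first level, "normal or underweight", as the baseline or reference group. The estimate for factor(bmi)overweight or obesity, 7.31761, is therefore the estimated difference in mean Brozek percent fat between the two groups after adjusting for age, fatfreeweight and neck: individuals classified as overweight or obese have on average about 7.32 percentage points more body fat than those classified as normal or underweight.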
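To see how R builds the treatment coding behind factor(bmi), the following sketch uses a hypothetical two-group data set with an invented response y (it is not fatdata, so the numbers will not match lm3). It shows that factor() orders the levels alphabetically, that the design matrix gets a single 0/1 indicator column named like the coefficient in the output above, and that with no other predictors the indicator's coefficient equals the raw difference in group means. In lm3 the same coefficient is an adjusted difference instead, because age, fatfreeweight and neck are also in the model. relevel() is the base R function for choosing a different reference group.

```r
# Hypothetical example with invented data (not fatdata)
set.seed(2)
bmi <- rep(c("normal or underweight", "overweight or obesity"), each = 20)
y   <- ifelse(bmi == "overweight or obesity", 25, 18) + rnorm(40, sd = 4)

f <- factor(bmi)
levels(f)          # alphabetical order; the first level is the reference

# Treatment (dummy) coding: intercept plus a 0/1 indicator for the other level
mm <- model.matrix(~ factor(bmi))
colnames(mm)

# With no other predictors, the indicator's coefficient is the difference
# in group means (non-reference minus reference)
fit <- lm(y ~ factor(bmi))
group_means <- tapply(y, f, mean)
c(coefficient     = unname(coef(fit)[2]),
  mean_difference = unname(group_means[2] - group_means[1]))

# Switching the reference level flips the sign of the same difference
fit2 <- lm(y ~ relevel(f, ref = "overweight or obesity"))
coef(fit2)[2]
```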
Multiple Linear Regression in R (R Tutorial 5.3), MarinStatsLectures