Simple Linear Regression Model Fitting
The fat data frame contains 252 observations (individuals) on 19 variables. Here we don't need all the variables, so let's create a smaller dataset to use.
> fatdata<fat[,c(1,2,5:11)]
> summary(fatdata[,1]) # do you remember what the negative index (1) here means?
Suppose we are interested in the relationship between body percent fat and neck circumference.
> lm1 < lm(pctfat.brozek ~ neck, data = fatdata)
> plot(pctfat.brozek ~ neck, data = fatdata)
> abline(lm1)
> names(lm1)
[1] "coefficients" "residuals" "effects" "rank"
[5] "fitted.values" "assign" "qr" "df.residual"
[9] "xlevels" "call" "terms" "model"
> summary(lm1)
> lm1
Call:
lm(formula = pctfat.brozek ~ neck, data = fatdata)
Coefficients:
(Intercept) neck
40.598 1.567
The argument pctfat.brozek ~ neck to lm function is a model formula. The resulting plot is shown in th figure on the right, and the abline() function extracts the coefficients of the fitted model and adds the corresponding regression line to the plot. The fittedmodel object is stored as lm1, which is essentially a list.
The fitted model is pctfat.brozek = 40.598 + 1.567* neck. An lm object in fact contains more information than you just saw. For example, the basic extractor function is summary. The output from summary() is selfexplanatory. For our model, we obtain
> summary(lm1)
Call:
lm(formula = pctfat.brozek ~ neck, data = fatdata) #This is the model formula
Residuals:
Min 1Q Median 3Q Max
14.0076 4.9450 0.2405 5.0321 21.1344
Coefficients:
Estimate Std. Error t value Pr(>t) #These are the comprehensive results
(Intercept) 40.5985 6.6857 6.072 4.66e09 ***
neck 1.5671 0.1756 8.923 < 2e16 ***

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.764 on 250 degrees of freedom
Multiple Rsquared: 0.2416, Adjusted Rsquared: 0.2385
Fstatistic: 79.62 on 1 and 250 DF, pvalue: < 2.2e16
The output provides a brief numerical summary of the residuals as well as a table of the estimated regression results.
Here the tvalue of 8.923 and pvalue of less than 2e16 corresponds to the individual test of the hypothesis that "the true coefficient for variable neck equals 0". Also, two versions of rsquared tell us how much of the variation of the response variable is explained by our predictors, and not by error. In our case, the model explains around 24% of the variation of percent of body fat. The last row of results is the test for the hypothesis that all regression coefficients are zero.
When testing the null hypothesis that there is no linear association between neck size and Brozek percent fat we reject the null hypothesis (F_{1,250} = 79.62, pvalue < 2.2e16, or t = 8.923, df = 250, pvalue < 2.2e16). For a one unit increase in neck there is a 1.57 increase in Brozek percent fat. Neck explains 24.16% of the variability in Brozek percent fat.

Other Functions for Fitted Linear Model Objects
We have seen how summary can be used to extract information about the results of a regression analysis. In this session, we will introduce some more extraction functions. Table 4.2 lists generic function for fitted linear model objects. For example, we may obtain a plot of residuals versus fitted values via
> plot(fitted(lm1), resid(lm1))
> qqnorm(resid(lm1))
and check whether residuals might have come from a normal distribution by checking for a straight line on a QQ plot via qqnorm() function. The plot()function for class lm() provides six types of diagnostic plots, four of which are shown by default. Their discussion will be postponed until later. All plots may be accessed individually using the which argument, for example, plot(lm1, which=2), if only the QQplot is desired.
Generic functions for fitted (linear) model objects
Function 
Description 

print() 
simple printed display 
summary() 
standard regression output 
coef() 
(or coefficients()) extracting the regression coefficients 
residuals() 
(or resid()) extracting residuals 
fitted() 
(or fitted.values()) extracting fitted values 
anova() 
comparison of nested models 
predict() 
predictions for new data 
plot() 
diagnostic plots 
confint() 
confidence intervals for the regression coefficients 
deviance() 
residual sum of squares 
vcov() 
(estimated) variancecovariance matrix 
logLik() 
loglikelihood (assuming normally distribted errors) 
AIC 
information criteria including AIC, BIC/SBC (assuming normally distributed errors) 
Linear Regression in R (R Tutorial 5.1) MarinStatsLectures [Contents]