Introduction
In this module we continue our study of measurement variables. Instead of comparing means between groups, we will introduce tools for examining the association between two continuously distributed variables, i.e., correlation. When the association is reasonably linear, it can be described mathematically with simple linear regression. Simple linear regression is a useful tool in its own right, and it provides the foundation for the multiple regression methods that, in a later module, will enable you to evaluate and adjust for confounding variables.
In this module we will consider when correlation is appropriate and how to interpret correlation coefficients. We will also discuss the assumptions of the linear regression model and how to interpret the slope, the confidence interval and p-value for the slope, and R². To apply what we learn, we will perform correlation and regression analyses and create scatter plots using R.
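As a preview of the kind of analysis this involves, here is a minimal sketch in R. It assumes the `cars` data set that ships with base R (car speed and stopping distance), chosen only for illustration; it is not the module's data, and the variable names `r_test` and `fit` are arbitrary.

```r
# Illustrative sketch only: `cars` is a built-in R data set, not the module's data.
data(cars)

# Scatter plot of the two continuous variables
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")

# Pearson correlation coefficient with its p-value and confidence interval
r_test <- cor.test(cars$speed, cars$dist)
print(r_test)

# Simple linear regression of stopping distance on speed
fit <- lm(dist ~ speed, data = cars)
print(summary(fit))   # slope, p-value for the slope, and R-squared
print(confint(fit))   # 95% confidence intervals for intercept and slope

# Add the fitted regression line to the scatter plot
abline(fit)
```

The `summary(fit)` output reports the estimated slope, its p-value, and R² in one place, while `confint(fit)` supplies the confidence interval for the slope.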
Essential Questions
- How do we assess the relationship between an exposure and a health outcome when both are continuously distributed measurement variables?
- How can we tell if two continuous variables are related?
Learning Objectives
After completing this section, you will be able to:
- Give examples of research questions that would be appropriately answered through correlation or regression analysis
- Interpret results of correlation and regression analyses presented in tables and figures from the public health literature
- Construct and interpret scatter plots describing the association between variables, using the R statistical package
- Compute and interpret a correlation coefficient and p-value for a correlation coefficient using the R statistical package
- Describe the linear regression model and interpret the slope and R² from a linear regression
- Conduct a linear regression analysis using the R statistical package
- Explain the results of a linear regression to a lay person in an understandable way