Lisa Sullivan, PhD

Professor of Biostatistics

Boston University School of Public Health



In this section we discuss correlation analysis, a technique used to quantify the association between two continuous variables. For example, we might want to quantify the association between body mass index and systolic blood pressure, or between hours of exercise per week and percent body fat. Regression analysis is a related technique used to assess the relationship between an outcome variable and one or more risk factors or confounding variables (confounding is discussed later). The outcome variable is also called the response or dependent variable, and the risk factors and confounders are called the predictors, or explanatory or independent variables. In regression analysis, the dependent variable is denoted "Y" and the independent variables are denoted "X".

[NOTE: The term "predictor" can be misleading if it is interpreted as implying the ability to predict even beyond the limits of the data. Also, the term "explanatory variable" might give the impression of a causal effect in a situation in which inferences should be limited to identifying associations. The terms "independent" and "dependent" variable are less subject to these interpretations because they do not strongly imply cause and effect.]
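To make "quantifying the association" concrete, here is a minimal sketch of the sample Pearson correlation coefficient, r, which is one common way to measure linear association between two continuous variables. It divides the sum of cross-products of deviations from the means by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the BMI and blood pressure values are hypothetical, invented purely for illustration; they are not study data.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient between two equal-length sequences.

    Hypothetical helper for illustration; assumes neither variable is constant
    (otherwise the denominator is zero).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up values for illustration only (not real measurements)
bmi = [22.1, 24.5, 27.3, 30.2, 33.8]
sbp = [112, 118, 125, 131, 140]
r = pearson_r(bmi, sbp)
print(round(r, 3))
```

The coefficient ranges from -1 to +1: values near +1 indicate a strong positive linear association, values near -1 a strong negative one, and values near 0 little or no linear association. Note that r is symmetric in X and Y, so it does not distinguish dependent from independent variables.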

Learning Objectives

After completing this module, the student will be able to:

  1. Define and provide examples of dependent and independent variables in a study of a public health problem
  2. Compute and interpret a correlation coefficient
  3. Compute and interpret coefficients in a linear regression analysis
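For the third objective, the sketch below shows one standard way to compute the coefficients of a simple linear regression with a single independent variable, using ordinary least squares: the slope is b1 = Sxy / Sxx and the intercept is b0 = mean(Y) - b1 * mean(X). The function name `least_squares_fit` and the exercise and body fat values are hypothetical, made up for illustration only.

```python
def least_squares_fit(x, y):
    """Ordinary least-squares estimates (b0, b1) for Y-hat = b0 + b1 * X.

    Hypothetical helper for illustration; assumes X is not constant.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Cross-products of deviations and squared deviations of X
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx              # slope: change in Y-hat per one-unit increase in X
    b0 = mean_y - b1 * mean_x   # intercept: Y-hat when X = 0
    return b0, b1


# Made-up values: hours of exercise per week (X) and percent body fat (Y)
hours = [0, 2, 4, 6, 8]
body_fat = [32.0, 29.5, 27.0, 24.0, 22.5]
b0, b1 = least_squares_fit(hours, body_fat)
print(f"Y-hat = {b0:.2f} + ({b1:.3f})X")
```

The slope is interpreted as the expected change in the dependent variable (Y) associated with a one-unit increase in the independent variable (X); here a negative slope would indicate lower percent body fat with more hours of exercise, an association rather than proof of a causal effect.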