What Are the Risk Factors for Cardiovascular Disease?
The longest-running and most influential prospective cohort study on heart disease is the Framingham Heart Study, which began in 1948. You can see a brief history of the study at the following link: Link to Framingham Heart Study History. Note also that on the left side of this page there is a link to "Research Milestones" of the Framingham Heart Study; be sure to take a look at the list.
We can use some of the data from the original Framingham cohort to begin to explore the relationship between obesity/overweight and coronary heart disease. One problem with analyzing these data is the potential for confounding among the risk factors. For example, smokers may weigh less than non-smokers because nicotine curbs appetite. In addition, obesity has been associated with increases in blood pressure, which is an established risk factor for heart disease. Consequently, even if obesity is associated with an increased risk of heart disease or death, one might ask whether this association is independent of the association between obesity and hypertension.
Link to the full period 3 Framingham data set
Link to the partial data set called fram-nosmoke-nolow.CSV
Explanation of variables in the Framingham data sets.
Exercise for Analysis of Discrete Variables "bmicat" (BMI category) and MI_FCHD
We will now use data collected at period 3 of the Framingham Heart Study to begin to explore the association between obesity and MI_FCHD (hospitalization for Myocardial Infarction or Fatal Coronary Heart Disease). In order to simplify the analysis we have created a file called fram-nosmoke-nolow.CSV, from which we have removed the following subjects:
- Those with BMI<20
- Current smokers
- Those without data on BMI
In this exercise you will use what you learned in the class that covered analysis of discrete data (chi-squared tests and computation of risk ratios and 95% confidence intervals for the risk ratio using R). Using fram-nosmoke-nolow.CSV, your task is to examine two associations:
- The association between overweight and MI_FCHD, comparing overweight (BMI=25.0-29.99) vs. normal (BMI=20-24.99)
- The association between obesity and MI_FCHD, comparing obese (BMI=30+) vs. normal (BMI=20-24.99)
In a later exercise we will re-examine these relationships using multiple logistic regression analysis to adjust for confounding by other risk factors such as hypertension, diabetes, and serum cholesterol levels (LDLC and HDLC). For now, your task is to:
- Read in the fram-nosmoke-nolow.CSV data set
- Create a variable called "bmicat", which defines three categories of BMI - "normal", "over" (for overweight), and "obese" as defined above
- Compute the risk ratio and 95% confidence limits for the risk ratio and the p-value for each of the two associations listed above
- Interpret your findings in 2-3 sentences
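As a sketch of the second step, one way to build the three BMI categories in R is cut() with left-closed intervals, so that a BMI of exactly 25 or 30 falls in the higher category. The bmi vector below holds made-up values chosen for illustration only; it is not Framingham data.

```r
# Hypothetical BMI values sitting on and around the category boundaries
bmi <- c(20.0, 24.99, 25.0, 29.9, 30.0, 41.2)

# right = FALSE makes the intervals [20,25), [25,30), [30,Inf)
bmicat <- cut(bmi,
              breaks = c(20, 25, 30, Inf),
              right  = FALSE,
              labels = c("normal", "over", "obese"))

# Show each value next to its assigned category, then the category counts
print(data.frame(bmi, bmicat))
print(table(bmicat))
```

Nested ifelse() calls with cutoffs of 24.99 and 29.99 give the same categories except for values that fall between 24.99 and 25 or between 29.99 and 30.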
NOTE: This is a draft of an entire answer key just to illustrate what we might assign and what the results might be using this particular data set. I'm not proposing to post all of this in the case study. /WL
> library(readr)     # provides read_csv()
> library(epitools)  # provides riskratio.wald()
> fram_nosmoke_nolow <- read_csv("C:/Users/wlamorte/Desktop/Weymouth/fram-nosmoke-nolow.csv")
> View(fram_nosmoke_nolow)
> fr<-na.omit(fram_nosmoke_nolow)
> attach(fr)
> bmicat<-ifelse(BMI>29.99, "obese", ifelse(BMI>24.99, "over", "normal"))
> table(bmicat,MI_FCHD)
        MI_FCHD
bmicat     0   1
  normal 694  79
  obese  296  50
  over   800 123
> RRtableobese<-matrix(c(694,296,79,50),nrow=2,ncol=2)
> RRtableobese
     [,1] [,2]
[1,]  694   79
[2,]  296   50
> riskratio.wald(RRtableobese)
$data
          Outcome
Predictor  Disease1 Disease2 Total
  Exposed1      694       79   773
  Exposed2      296       50   346
  Total         990      129  1119
$measure
          risk ratio with 95% C.I.
Predictor  estimate    lower    upper
  Exposed1  1.00000       NA       NA
  Exposed2  1.41399 1.015808 1.968254
$p.value
          two-sided
Predictor  midp.exact fisher.exact chi.square
  Exposed1         NA           NA         NA
  Exposed2  0.0442418   0.04319723 0.04054254
$correction
[1] FALSE
attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"
#Association with Overweight
> RRtableover<-matrix(c(694,800,79,123),nrow=2,ncol=2)
> riskratio.wald(RRtableover)
$data
          Outcome
Predictor  Disease1 Disease2 Total
  Exposed1      694       79   773
  Exposed2      800      123   923
  Total        1494      202  1696
$measure
          risk ratio with 95% C.I.
Predictor  estimate     lower    upper
  Exposed1 1.000000        NA       NA
  Exposed2 1.303935 0.9994429 1.701193
$p.value
          two-sided
Predictor  midp.exact fisher.exact chi.square
  Exposed1         NA           NA         NA
  Exposed2 0.04906706   0.05062938 0.04919578
$correction
[1] FALSE
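The estimates printed by riskratio.wald() can be checked by hand from the cell counts in table(bmicat, MI_FCHD). The sketch below uses only base R. The helper name rr_wald is introduced here for illustration, and it assumes the usual Wald interval computed on the log risk ratio.

```r
# Hypothetical helper: risk ratio with a Wald CI computed on the log scale.
# events_exp / n_exp = cases and total in the exposed group
# events_ref / n_ref = cases and total in the reference group
rr_wald <- function(events_exp, n_exp, events_ref, n_ref, level = 0.95) {
  rr <- (events_exp / n_exp) / (events_ref / n_ref)
  se <- sqrt(1 / events_exp - 1 / n_exp + 1 / events_ref - 1 / n_ref)
  z  <- qnorm(1 - (1 - level) / 2)
  c(estimate = rr,
    lower    = exp(log(rr) - z * se),
    upper    = exp(log(rr) + z * se))
}

# Counts from the table above: normal 79/773, obese 50/346, overweight 123/923
rr_obese <- rr_wald(50, 346, 79, 773)
rr_over  <- rr_wald(123, 923, 79, 773)
print(round(rr_obese, 4))
print(round(rr_over, 4))

# Uncorrected chi-squared test for obese vs. normal
p_obese <- chisq.test(matrix(c(694, 296, 79, 50), nrow = 2),
                      correct = FALSE)$p.value
print(p_obese)
```

For the overweight comparison the lower limit falls just below 1, which is why that association is described as borderline.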
Exercise for Multiple Logistic Regression Analysis of "bmicat" (BMI category) and MI_FCHD
We previously examined the associations between overweight or obesity and risk of being hospitalized or dying of coronary heart disease. In this analysis we will re-examine these two associations using multiple logistic regression to adjust for confounding by other risk factors for CHD.
> fram_nosmoke_nolow <- read_csv("C:/Users/wlamorte/Desktop/Weymouth/fram-nosmoke-nolow.csv")
> View(fram_nosmoke_nolow)
> fr<-na.omit(fram_nosmoke_nolow)
> attach(fr)
> bmicat<-ifelse(BMI>29.99, "obese", ifelse(BMI>24.99, "over", "normal"))
# Create a subset that excludes overweight
> noover<-subset(fr, bmicat != "over")
> detach(fr)
> attach(noover)
# Create a new variable called "obese"
> obese<-ifelse(BMI>29.99,1,0)
#Logistic Regression for Exposure "obese" versus "normal" after adjusting for AGE
> log1<-glm(MI_FCHD ~ obese + AGE , family=binomial(link=logit))
> summary(log1)
Call:
glm(formula = MI_FCHD ~ obese + AGE, family = binomial(link = logit))
Deviance Residuals:
Min 1Q Median 3Q Max
-0.8421 -0.5311 -0.4431 -0.3544 2.5069
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.47423    0.78801  -6.947 3.73e-12 ***
obese        0.48740    0.20780   2.346    0.019 *
AGE          0.05166    0.01200   4.304 1.68e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 723.60 on 1023 degrees of freedom
Residual deviance: 700.12 on 1021 degrees of freedom
AIC: 706.12
Number of Fisher Scoring iterations: 5
> exp(log1$coefficients)
(Intercept) obese AGE
0.004193445 1.628083938 1.053013525
> exp(confint(log1))
Waiting for profiling to be done...
2.5 % 97.5 %
(Intercept) 0.0008647461 0.01907765
obese 1.0776944454 2.43824787
AGE 1.0287671000 1.07841237
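The odds ratio for obese follows directly from the coefficient table above: exponentiate the estimate, and exponentiate the estimate ± 1.96 standard errors for a Wald interval. The sketch below simply retypes the two numbers from the summary output. Because confint() on a glm reports profile-likelihood limits, the Wald limits computed here should be close to, but not identical to, the interval printed above.

```r
# Estimate and standard error for "obese", copied from summary(log1) above
beta <- 0.48740
se   <- 0.20780

z        <- qnorm(0.975)                    # about 1.96
or_obese <- exp(beta)                       # age-adjusted odds ratio
wald_ci  <- exp(beta + c(-1, 1) * z * se)   # Wald 95% limits on the OR scale

print(round(or_obese, 3))
print(round(wald_ci, 3))
```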
# Repeat logistic regression for obese vs. normal after adjusting for some other risk factors
> log1<-glm(MI_FCHD ~ obese + AGE + LDLC + HDLC + SYSBP + DIABETES , family=binomial(link=logit))
> summary(log1)
Call:
glm(formula = MI_FCHD ~ obese + AGE + LDLC + HDLC + SYSBP + DIABETES, family = binomial(link = logit))
Deviance Residuals:
Min 1Q Median 3Q Max
-1.6418 -0.5109 -0.3771 -0.2730 2.6883
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.507675   1.071545  -6.073 1.25e-09 ***
obese       -0.030331   0.229960  -0.132 0.895067
AGE          0.035424   0.013193   2.685 0.007253 **
LDLC         0.005227   0.002044   2.557 0.010558 *
HDLC        -0.029157   0.007926  -3.679 0.000235 ***
SYSBP        0.017527   0.004650   3.769 0.000164 ***
DIABETES     0.983129   0.288066   3.413 0.000643 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 723.60 on 1023 degrees of freedom
Residual deviance: 646.96 on 1017 degrees of freedom
AIC: 660.96
Number of Fisher Scoring iterations: 5
> exp(log1$coefficients)
(Intercept) obese AGE LDLC HDLC SYSBP DIABETES
0.001491945 0.970124848 1.036058404 1.005240782 0.971264064 1.017681865 2.672805712
> exp(confint(log1))
Waiting for profiling to be done...
2.5 % 97.5 %
(Intercept) 0.0001753714 0.01177539
obese 0.6134638132 1.51388154
AGE 1.0097635508 1.06345263
LDLC 1.0012045369 1.00927869
HDLC 0.9559094337 0.98609662
SYSBP 1.0084451438 1.02703963
DIABETES 1.4965798888 4.64849227
###############################################################################
# Logistic regression comparing overweight to normal after adjusting for AGE
# Create a subset consisting of just overweight and normal subjects
> noobese<-subset(fr, bmicat != "obese")
# noover (not fr) is the data frame currently attached, so detach it before attaching noobese
> detach(noover)
> attach(noobese)
# Create a new variable called "over"
> over<-ifelse(BMI>24.99, 1,0)
> log1<-glm(MI_FCHD ~ over + AGE , family=binomial(link=logit))
> summary(log1)
Call:
glm(formula = MI_FCHD ~ over + AGE, family = binomial(link = logit))
Deviance Residuals:
Min 1Q Median 3Q Max
-0.8341 -0.5388 -0.4437 -0.3560 2.5030
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.00036    0.78106  -6.402 1.53e-10 ***
over        -0.45846    0.20758  -2.209   0.0272 *
AGE          0.05154    0.01200   4.294 1.76e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 723.60 on 1023 degrees of freedom
Residual deviance: 700.71 on 1021 degrees of freedom
AIC: 706.71
Number of Fisher Scoring iterations: 5
> exp(log1$coefficients)
(Intercept) over AGE
0.006735502 0.632258308 1.052892683
#Repeat logistic regression for overweight versus normal adjusting for other risk factors
> log1<-glm(MI_FCHD ~ over + AGE + LDLC + HDLC + SYSBP + DIABETES , family=binomial(link=logit))
> summary(log1)
Call:
glm(formula = MI_FCHD ~ over + AGE + LDLC + HDLC + SYSBP + DIABETES, family = binomial(link = logit))
Deviance Residuals:
Min 1Q Median 3Q Max
-1.5164 -0.5258 -0.4037 -0.2906 2.8485
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.290747   0.852881  -7.376 1.63e-13 ***
over         0.027337   0.169264   0.162  0.87170
AGE          0.042661   0.010639   4.010 6.08e-05 ***
LDLC         0.003594   0.001630   2.205  0.02748 *
HDLC        -0.026993   0.006032  -4.475 7.65e-06 ***
SYSBP        0.014310   0.003635   3.937 8.25e-05 ***
DIABETES     0.800967   0.241694   3.314  0.00092 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1150.8 on 1602 degrees of freedom
Residual deviance: 1049.0 on 1596 degrees of freedom
AIC: 1063
Number of Fisher Scoring iterations: 5
> exp(log1$coefficients)
(Intercept) over AGE LDLC HDLC SYSBP DIABETES
0.001853375 1.027713943 1.043583813 1.003600021 0.973367583 1.014413359 2.227694210
> exp(confint(log1))
2.5 % 97.5 %
(Intercept) 0.000339305 0.009639505
over 0.738581214 1.435390814
AGE 1.022205549 1.065783923
LDLC 1.000360834 1.006785335
HDLC 0.961715848 0.984733408
SYSBP 1.007205634 1.021679382
DIABETES 1.370262731 3.542918370
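An alternative to building separate subsets and attaching them is to keep BMI category as a single factor and pass the data frame to glm() directly, which yields both comparisons against "normal" from one model. The sketch below is an illustration under stated assumptions, not the analysis run above: the data frame sim is simulated, its coefficients carry no meaning, and a single three-level model will generally give slightly different estimates than two separate two-group models.

```r
# Simulated stand-in for the Framingham file (hypothetical values only)
set.seed(1)
n   <- 300
sim <- data.frame(BMI = runif(n, 20, 40),
                  AGE = round(runif(n, 45, 80)))
sim$MI_FCHD <- rbinom(n, 1, plogis(-5 + 0.05 * sim$AGE))

# "normal" is the first level, so it becomes the reference category
sim$bmicat <- cut(sim$BMI, breaks = c(20, 25, 30, Inf), right = FALSE,
                  labels = c("normal", "over", "obese"))

# data = sim avoids attach()/detach() and any masking between data frames
fit <- glm(MI_FCHD ~ bmicat + AGE, data = sim,
           family = binomial(link = "logit"))

or      <- exp(coef(fit))              # odds ratios vs. "normal"
or_wald <- exp(confint.default(fit))   # Wald 95% limits (no profiling)
print(round(cbind(OR = or, or_wald), 3))
```

Passing data = to glm() also removes the risk of a variable from a previously attached data frame being picked up by mistake.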
- What conclusions would you draw from your analysis?
- What do these results suggest about the relative importance of obesity versus overweight?
- If we focus on obese versus normal categories of BMI, the results are different when we adjust for just AGE and when we adjust for AGE plus other risk factors such as LDLC, HDLC, SYSBP, and DIABETES.
- Is it possible that these are biological intermediates, i.e., that the effects of overweight and obesity are mediated via obesity's effects on LDLC, HDLC, blood pressure, and diabetes?
ANSWER:
Overweight as a risk factor:
In the earlier crude (unadjusted) analysis we found that those who were overweight had 1.30 times the risk of being hospitalized for an MI or dying of CHD compared to those with normal BMI (95% confidence interval: 0.999 to 1.70). When logistic regression was used to adjust for confounding by age, subjects who were overweight had 1.32 times the odds of being hospitalized for an MI or dying from CHD compared to those with normal BMI (95% confidence interval: 0.97 to 1.83). Thus, in both cases the association was of borderline significance. However, when we adjusted for additional risk factors (blood pressure, LDLC, HDLC, and diabetes), this apparent association disappeared (OR=1.03, 95% confidence interval: 0.74 to 1.44).
Obesity as a risk factor:
The earlier crude analysis of obesity's association with hospitalization for MI or death from CHD suggested that those who were obese had 1.41 times the risk compared to those with normal BMI (95% confidence interval: 1.02 to 1.97, p=0.044). When logistic regression was used to adjust for confounding by age, obese subjects had 1.63 times the odds of being hospitalized for an MI or dying from CHD (95% confidence interval: 1.08 to 2.44). However, once again, when logistic regression was used to adjust for age, blood pressure, LDLC, HDLC, and diabetes, the association was no longer significant (OR=0.97, 95% confidence interval: 0.61 to 1.51).
It has been well-established that overweight and obesity are associated with increased blood pressure, abnormalities in LDLC and HDLC, and an increased risk of diabetes. Therefore, in this case it is likely that these other risk factors are actually biological intermediates, i.e., that obesity causes elevations in blood pressure, lipid abnormalities, and an increased risk of type 2 diabetes. These, in turn, cause an increased risk of myocardial infarction or death from coronary heart disease.
EDITORIAL NOTE: Other data sets are available at NHLBI BioLINCC at https://biolincc.nhlbi.nih.gov/home/