What Are the Risk Factors for Cardiovascular Disease?

The longest-running and most influential prospective cohort study of heart disease is the Framingham Heart Study, which began in 1948. You can see a brief history of the study at the following link: Link to Framingham Heart Study History. Note also that on the left side of this page there is a link to "Research Milestones" of the Framingham Heart Study; be sure to take a look at the list.

We can use some of the data from the original Framingham cohort to begin to explore the relationship between obesity/overweight and coronary heart disease. One problem with analyzing these data is the potential for confounding among the risk factors. For example, smokers may weigh less than non-smokers because nicotine curbs appetite. In addition, obesity has been associated with increases in blood pressure, which is an established risk factor for heart disease. Consequently, even if obesity is associated with an increased risk of heart disease or death, one might ask whether this association is independent of the association between obesity and hypertension.

Link to the full period 3 Framingham data set

Link to the partial data set called fram-nosmoke-nolow.CSV

Explanation of variables in the Framingham data sets.

Exercise for Analysis of Discrete Variables "bmicat" (BMI category) and MI_FCHD

We will now use data collected at period 3 of the Framingham Heart Study to begin to explore the association between obesity and MI_FCHD (hospitalization for myocardial infarction or fatal coronary heart disease). To simplify the analysis, we have created a file called fram-nosmoke-nolow.CSV, from which we have removed the following subjects:

Those with BMI<20
Current smokers
Those without data on BMI

In this exercise you will use what you learned in the class that covered analysis of discrete data (chi-squared tests and computation of risk ratios and 95% confidence intervals for the risk ratio using R). Using fram-nosmoke-nolow.CSV, your task is to examine two associations:

The association between overweight and MI_FCHD, comparing overweight (BMI=25.0-29.99) vs. normal (BMI=20-24.99)
The association between obesity and MI_FCHD, comparing obese (BMI=30+) vs. normal (BMI=20-24.99)

In a later exercise we will re-examine these relationships using multiple logistic regression analysis to adjust for confounding by other risk factors such as hypertension, diabetes, and serum cholesterol levels (LDLC and HDLC). For now your task is to:

Read in the fram-nosmoke-nolow.CSV data set
Create a variable called "bmicat", which defines three categories of BMI - "normal", "over" (for overweight), and "obese" as defined above
Compute the risk ratio and 95% confidence limits for the risk ratio and the p-value for each of the two associations listed above
Interpret your findings in 2-3 sentences
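As a hedged illustration of the second task, the sketch below shows one possible way to build the three BMI categories with cut(). The BMI values are made up for illustration and are not Framingham data; with the real file, BMI would come from the data frame that was read in. Because the exercise file already excludes BMI<20, the lowest break is set at 20 (anything below it would become NA).

```r
# Hypothetical BMI values, for illustration only (not Framingham data).
BMI <- c(22.1, 24.99, 25.0, 29.9, 30.0, 41.3)

# right = FALSE closes each interval on the left:
# [20, 25) = normal, [25, 30) = over, [30, Inf) = obese.
bmicat <- cut(BMI,
              breaks = c(20, 25, 30, Inf),
              labels = c("normal", "over", "obese"),
              right  = FALSE)

# Because bmicat is a factor, table() lists the categories in the order
# given in 'labels' rather than alphabetically.
table(bmicat)
```

A nested ifelse() call is an equally reasonable choice; cut() simply keeps the categories in their natural order when they are tabulated.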

NOTE: This is a draft of an entire answer key, just to illustrate what we might assign and what the results might be using this particular data set. I'm not proposing to post all of this in the case study. /WL

> library(readr)     # provides read_csv()
> library(epitools)  # provides riskratio.wald()
> fram_nosmoke_nolow <- read_csv("C:/Users/wlamorte/Desktop/Weymouth/fram-nosmoke-nolow.csv")
> View(fram_nosmoke_nolow)
> fr<-na.omit(fram_nosmoke_nolow)
> attach(fr)
> bmicat<-ifelse(BMI>29.99, "obese", ifelse(BMI>24.99, "over", "normal"))
> table(bmicat,MI_FCHD)
        MI_FCHD
bmicat     0   1
  normal 694  79
  obese  296  50
  over   800 123

#Association with Obesity
# Rows are normal, obese; columns are MI_FCHD = 0, 1
> RRtableobese<-matrix(c(694,296,79,50),nrow=2,ncol=2)
> RRtableobese
     [,1] [,2]
[1,]  694   79
[2,]  296   50
> riskratio.wald(RRtableobese)
$data
          Outcome
Predictor  Disease1 Disease2 Total
  Exposed1      694       79   773
  Exposed2      296       50   346
  Total         990      129  1119

$measure
          risk ratio with 95% C.I.
Predictor  estimate    lower    upper
  Exposed1  1.00000       NA       NA
  Exposed2  1.41399 1.015808 1.968254

$p.value
          two-sided
Predictor  midp.exact fisher.exact chi.square
  Exposed1         NA           NA         NA
  Exposed2  0.0442418   0.04319723 0.04054254

$correction
[1] FALSE

attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"
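To make the riskratio.wald() output less of a black box, here is a minimal sketch of the same calculation done by hand in base R, using the obese and normal counts from the table above. The variable names are ours, and the formula is the standard Wald (normal approximation) interval on the log risk ratio, which is what the method label in the output indicates.

```r
# Counts copied from the table(bmicat, MI_FCHD) output above.
events_obese  <- 50;  n_obese  <- 346   # obese:  MI_FCHD = 1, row total
events_normal <- 79;  n_normal <- 773   # normal: MI_FCHD = 1, row total

# Risk ratio: cumulative incidence in the obese divided by that in the normal group.
rr <- (events_obese / n_obese) / (events_normal / n_normal)

# Standard error of log(RR) and the 95% Wald confidence limits.
se_log_rr <- sqrt(1 / events_obese - 1 / n_obese +
                  1 / events_normal - 1 / n_normal)
z  <- qnorm(0.975)
ci <- exp(log(rr) + c(-1, 1) * z * se_log_rr)

round(c(RR = rr, lower = ci[1], upper = ci[2]), 3)
```

The rounded values should agree with the Exposed2 row of $measure above (about 1.41, with limits of about 1.02 and 1.97).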

#Association with Overweight
> RRtableover<-matrix(c(694,800,79,123),nrow=2,ncol=2)
> riskratio.wald(RRtableover)
$data
          Outcome
Predictor  Disease1 Disease2 Total
  Exposed1      694       79   773
  Exposed2      800      123   923
  Total        1494      202  1696

$measure
          risk ratio with 95% C.I.
Predictor  estimate     lower    upper
  Exposed1 1.000000        NA       NA
  Exposed2 1.303935 0.9994429 1.701193

$p.value
          two-sided
Predictor  midp.exact fisher.exact chi.square
  Exposed1         NA           NA         NA
  Exposed2 0.04906706   0.05062938 0.04919578

$correction
[1] FALSE
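The chi.square column in the output above can be cross-checked with base R. The sketch below is one way to do that for the overweight versus normal comparison; the object names and dimnames are ours, and it assumes the reported p-value is the uncorrected Pearson chi-squared test, which is consistent with $correction being FALSE.

```r
# Rows: normal, overweight; columns: MI_FCHD = 0, MI_FCHD = 1
# (same counts as RRtableover above).
tab_over <- matrix(c(694, 800, 79, 123), nrow = 2,
                   dimnames = list(bmicat  = c("normal", "over"),
                                   MI_FCHD = c("0", "1")))

# correct = FALSE turns off the Yates continuity correction.
test_over <- chisq.test(tab_over, correct = FALSE)

test_over$statistic  # Pearson chi-squared statistic on 1 degree of freedom
test_over$p.value    # two-sided p-value
```

The p-value lands just under 0.05 while the lower confidence limit for the risk ratio sits just under 1.0, which is why this association is best described as borderline.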

Exercise for Multiple Logistic Regression Analysis of "bmicat" (BMI category) and MI_FCHD

We previously examined the associations between overweight or obesity and risk of being hospitalized or dying of coronary heart disease. In this analysis we will re-examine these two associations using multiple logistic regression to adjust for confounding by other risk factors for CHD.

> library(readr)     # provides read_csv()
> fram_nosmoke_nolow <- read_csv("C:/Users/wlamorte/Desktop/Weymouth/fram-nosmoke-nolow.csv")
> View(fram_nosmoke_nolow)
> fr<-na.omit(fram_nosmoke_nolow)
> attach(fr)
> bmicat<-ifelse(BMI>29.99, "obese", ifelse(BMI>24.99, "over", "normal"))

# Create a subset that excludes overweight
> noover<-subset(fr, bmicat != "over")
> detach(fr)
> attach(noover)

# Create a new variable called "obese" (1 = obese, 0 = normal)
> obese<-ifelse(BMI>29.99,1,0)

# Logistic regression for exposure "obese" versus "normal" after adjusting for AGE
> log1<-glm(MI_FCHD ~ obese + AGE , family=binomial(link=logit))
> summary(log1)

Call:
glm(formula = MI_FCHD ~ obese + AGE, family = binomial(link = logit))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.8421  -0.5311  -0.4431  -0.3544   2.5069

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.47423    0.78801  -6.947 3.73e-12 ***
obese        0.48740    0.20780   2.346    0.019 *
AGE          0.05166    0.01200   4.304 1.68e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 723.60  on 1023  degrees of freedom
Residual deviance: 700.12  on 1021  degrees of freedom
AIC: 706.12

Number of Fisher Scoring iterations: 5

> exp(log1$coefficients)
(Intercept)       obese         AGE
0.004193445 1.628083938 1.053013525
> exp(confint(log1))
Waiting for profiling to be done...
                   2.5 %     97.5 %
(Intercept) 0.0008647461 0.01907765
obese       1.0776944454 2.43824787
AGE         1.0287671000 1.07841237
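As a rough check on how the exponentiated values relate to the coefficient table, the sketch below converts the "obese" estimate and standard error from the summary above into an odds ratio with a Wald 95% confidence interval. The variable names are ours. Note that confint() on a glm object uses profile likelihood (hence the "Waiting for profiling to be done..." message), so its limits differ slightly from the Wald limits computed here.

```r
# Estimate and standard error for "obese", copied from summary(log1) above.
beta_obese <- 0.48740
se_obese   <- 0.20780

# Exponentiating a logistic regression coefficient gives an odds ratio.
or_obese <- exp(beta_obese)

# Wald 95% confidence interval: exp(estimate +/- z * SE).
z       <- qnorm(0.975)
wald_ci <- exp(beta_obese + c(-1, 1) * z * se_obese)

round(c(OR = or_obese, lower = wald_ci[1], upper = wald_ci[2]), 2)
```

The Wald limits come out close to, but not identical with, the profile-likelihood limits of 1.08 and 2.44 printed by exp(confint(log1)).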

# Repeat logistic regression for obese vs. normal after adjusting for some other risk factors
> log1<-glm(MI_FCHD ~ obese + AGE + LDLC + HDLC + SYSBP + DIABETES , family=binomial(link=logit))
> summary(log1)

Call:
glm(formula = MI_FCHD ~ obese + AGE + LDLC + HDLC + SYSBP + DIABETES, family = binomial(link = logit))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.6418  -0.5109  -0.3771  -0.2730   2.6883

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.507675   1.071545  -6.073 1.25e-09 ***
obese       -0.030331   0.229960  -0.132 0.895067
AGE          0.035424   0.013193   2.685 0.007253 **
LDLC         0.005227   0.002044   2.557 0.010558 *
HDLC        -0.029157   0.007926  -3.679 0.000235 ***
SYSBP        0.017527   0.004650   3.769 0.000164 ***
DIABETES     0.983129   0.288066   3.413 0.000643 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 723.60  on 1023  degrees of freedom
Residual deviance: 646.96  on 1017  degrees of freedom
AIC: 660.96

Number of Fisher Scoring iterations: 5

> exp(log1$coefficients)
(Intercept)       obese         AGE        LDLC        HDLC       SYSBP    DIABETES
0.001491945 0.970124848 1.036058404 1.005240782 0.971264064 1.017681865 2.672805712
> exp(confint(log1))
Waiting for profiling to be done...
                   2.5 %     97.5 %
(Intercept) 0.0001753714 0.01177539
obese       0.6134638132 1.51388154
AGE         1.0097635508 1.06345263
LDLC        1.0012045369 1.00927869
HDLC        0.9559094337 0.98609662
SYSBP       1.0084451438 1.02703963
DIABETES    1.4965798888 4.64849227

###############################################################################

# Logistic regression comparing overweight to normal after adjusting for AGE

# Create a subset consisting of just overweight and normal subjects
> noobese<-subset(fr, bmicat != "obese")
> detach(noover)   # noover, not fr, is the data frame attached at this point
> attach(noobese)

# Create a new variable called "over" (1 = overweight, 0 = normal)
> over<-ifelse(BMI>24.99, 1,0)
> log1<-glm(MI_FCHD ~ over + AGE , family=binomial(link=logit))
> summary(log1)

Call:
glm(formula = MI_FCHD ~ over + AGE, family = binomial(link = logit))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.8341  -0.5388  -0.4437  -0.3560   2.5030

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.00036    0.78106  -6.402 1.53e-10 ***
over        -0.45846    0.20758  -2.209   0.0272 *
AGE          0.05154    0.01200   4.294 1.76e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 723.60  on 1023  degrees of freedom
Residual deviance: 700.71  on 1021  degrees of freedom
AIC: 706.71

Number of Fisher Scoring iterations: 5

> exp(log1$coefficients)
(Intercept)        over         AGE
0.006735502 0.632258308 1.052892683

# Repeat logistic regression for overweight versus normal adjusting for other risk factors
> log1<-glm(MI_FCHD ~ over + AGE + LDLC + HDLC + SYSBP + DIABETES , family=binomial(link=logit))
> summary(log1)

Call:
glm(formula = MI_FCHD ~ over + AGE + LDLC + HDLC + SYSBP + DIABETES, family = binomial(link = logit))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.5164  -0.5258  -0.4037  -0.2906   2.8485

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.290747   0.852881  -7.376 1.63e-13 ***
over         0.027337   0.169264   0.162  0.87170
AGE          0.042661   0.010639   4.010 6.08e-05 ***
LDLC         0.003594   0.001630   2.205  0.02748 *
HDLC        -0.026993   0.006032  -4.475 7.65e-06 ***
SYSBP        0.014310   0.003635   3.937 8.25e-05 ***
DIABETES     0.800967   0.241694   3.314  0.00092 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1150.8  on 1602  degrees of freedom
Residual deviance: 1049.0  on 1596  degrees of freedom
AIC: 1063

Number of Fisher Scoring iterations: 5

> exp(log1$coefficients)
(Intercept)        over         AGE        LDLC        HDLC       SYSBP    DIABETES
0.001853375 1.027713943 1.043583813 1.003600021 0.973367583 1.014413359 2.227694210
> exp(confint(log1))
                  2.5 %      97.5 %
(Intercept) 0.000339305 0.009639505
over        0.738581214 1.435390814
AGE         1.022205549 1.065783923
LDLC        1.000360834 1.006785335
HDLC        0.961715848 0.984733408
SYSBP       1.007205634 1.021679382
DIABETES    1.370262731 3.542918370

What conclusions would you draw from your analysis?
What do these results suggest about the relative importance of obesity versus overweight?
If we focus on obese versus normal categories of BMI, the results differ depending on whether we adjust for AGE alone or for AGE plus other risk factors such as LDLC, HDLC, and SYSBP.
Is it possible that these are biological intermediates, i.e., that the effects of overweight and obesity are mediated via their effects on LDLC, HDLC, blood pressure, and diabetes?

ANSWER:
Overweight as a risk factor:

In the earlier crude (unadjusted) analysis we found that those who were overweight had 1.3 times the risk of being hospitalized for an MI or dying of CHD compared to those with normal BMI (95% confidence interval: 0.999 to 1.70). When logistic regression was used to adjust for confounding by age, subjects who were overweight had 1.32 times the odds of being hospitalized for an MI or dying from CHD compared to those with normal BMI (95% confidence interval: 0.97 to 1.83). Thus, in both cases the association was of borderline significance. However, when we adjusted for additional risk factors (blood pressure, LDLC, HDLC, and diabetes), this apparent association disappeared (OR=1.03, 95% confidence interval: 0.74 to 1.44).

Obesity as a risk factor:

The earlier crude analysis of obesity's association with hospitalization for MI or death from CHD suggested that those who were obese had 1.41 times the risk compared to those with normal BMI (95% confidence interval: 1.02 to 1.97, p=0.044). When logistic regression was used to adjust for confounding by age, obese subjects had 1.63 times the odds of being hospitalized for an MI or dying from CHD (95% confidence interval: 1.08 to 2.44). However, once again, when logistic regression was used to adjust for age, blood pressure, LDLC, HDLC, and diabetes, the association was no longer significant (OR=0.97, 95% confidence interval: 0.61 to 1.51).

It has been well-established that overweight and obesity are associated with increased blood pressure, abnormalities in LDLC and HDLC, and an increased risk of diabetes. Therefore, in this case it is likely that these other risk factors are actually biological intermediates, i.e., that obesity causes elevations in blood pressure, lipid abnormalities, and an increased risk of type 2 diabetes. These, in turn, cause an increased risk of myocardial infarction or death from coronary heart disease.
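The idea of a biological intermediate can be illustrated with simulated data. The sketch below is purely hypothetical: the variable names, effect sizes, and sample size are invented for illustration and are not estimates from the Framingham data. It builds a world in which obesity raises systolic blood pressure and blood pressure alone drives the outcome, then compares the obesity coefficient with and without adjustment for blood pressure.

```r
# Simulated data only; all numbers below are invented assumptions.
set.seed(1)
n     <- 50000
obese <- rbinom(n, 1, 0.3)                      # 30% of subjects obese
sbp   <- 130 + 15 * obese + rnorm(n, sd = 15)   # obesity raises blood pressure
chd   <- rbinom(n, 1, plogis(-6 + 0.03 * sbp))  # outcome depends on sbp only

# Crude model: captures the total effect of obesity on the outcome.
crude    <- glm(chd ~ obese, family = binomial)
# Adjusted model: conditioning on the mediator removes the mediated effect.
adjusted <- glm(chd ~ obese + sbp, family = binomial)

b_crude    <- unname(coef(crude)["obese"])
b_adjusted <- unname(coef(adjusted)["obese"])

# Odds ratios for obesity before and after adjusting for the intermediate.
round(exp(c(crude = b_crude, adjusted = b_adjusted)), 2)
```

In this simulated setting the crude odds ratio is clearly above 1 while the adjusted odds ratio is close to 1, even though obesity is, by construction, a genuine cause of the outcome. An adjusted estimate near the null therefore does not by itself show that the exposure is unimportant.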

EDITORIAL NOTE: Other data sets are available from NHLBI BioLINCC at https://biolincc.nhlbi.nih.gov/home/
