An Exercise Using R to Compute Chi-Squared from a Data Set

In this exercise you will examine associations between baseline health characteristics measured at the first visit of the Framingham Heart Study and the development of heart disease and death over 20 years of follow-up. You will use a subset (n = 500) of the data, saved in the file FramHSn500.csv.

The data is coded as shown in the table below:

Variable    Description
RANDID      subject ID number
SEX         1 = male, 2 = female
AGE         age in years
SYSBP       systolic blood pressure (mmHg)
DIABP       diastolic blood pressure (mmHg)
TOTCHOL     total cholesterol (mg/dL)
CURSMOKE    current smoking status: 1 = smoker, 0 = nonsmoker
FRAM_BMI    body mass index (kg/m²)
COFFEE      cups of coffee per day (6 = 6 or more)
DIABETES    developed diabetes over 20 years of follow-up: 1 = yes, 0 = no
HEARTDIS    developed heart disease over 20 years of follow-up: 1 = yes, 0 = no
ANYDEATH    death from any cause over 20 years of follow-up: 1 = died, 0 = living
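Every question below starts from a cross-tabulation of two coded variables. The exercise itself uses R; as a language-neutral illustration, here is a minimal Python sketch using only the standard library. The helper name `crosstab` is hypothetical, and the sketch assumes the CSV has a header row with the variable names listed above and that missing values are stored as empty cells or `NA`. The in-memory rows are made up purely to show the call and are not Framingham data.

```python
import csv
import io
from collections import Counter

MISSING = {"", "NA"}  # assumed spellings of missing values in the CSV

def crosstab(rows, row_var, col_var):
    """Count (row value, column value) pairs, skipping rows with a missing value."""
    counts = Counter()
    for row in rows:
        r, c = row[row_var].strip(), row[col_var].strip()
        if r in MISSING or c in MISSING:
            continue  # drop incomplete records, as a complete-case table would
        counts[(int(float(r)), int(float(c)))] += 1
    return counts

# With the real file, something like:
#   with open("FramHSn500.csv", newline="") as f:
#       table = crosstab(csv.DictReader(f), "CURSMOKE", "ANYDEATH")
# Hypothetical rows for illustration only:
sample = io.StringIO(
    "RANDID,CURSMOKE,ANYDEATH\n"
    "1,1,1\n"
    "2,1,0\n"
    "3,0,0\n"
    "4,0,0\n"
    "5,,1\n"
)
table = crosstab(csv.DictReader(sample), "CURSMOKE", "ANYDEATH")
for (smoke, died), n in sorted(table.items()):
    print(f"CURSMOKE={smoke} ANYDEATH={died}: {n}")
```

The counts come back keyed by the numeric codes in the table above, so the cell for smokers who died is `table[(1, 1)]`.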

 Save your R code and output as you answer the following questions.

  1. Are smokers at higher risk of death over follow-up than non-smokers?  Test this with a chi-square test, reporting the proportion of smokers and non-smokers who have died over 20 years of follow-up, the risk ratio, the value of the chi-square statistic, degrees of freedom, and p-value.  Summarize your conclusions.
  2. Are smokers at higher risk of death over follow-up than non-smokers?  Find the risk ratio of death for smokers vs. non-smokers, and the 95% confidence interval for this risk ratio (remember that the orientation of the table matters when finding an RR).  Given this confidence interval, do smokers have significantly higher risk of death?  Explain.
  3. Is there an association between coffee consumption and death over the 20-year follow-up?  Test this with a chi-square test, reporting the proportion who died in each category of coffee consumption, the value of the test statistic, degrees of freedom, and p-value.  Summarize your conclusions.
  4. Do those who develop heart disease have a higher risk of death over follow-up?  What percent of those with and without heart disease die over follow-up?  Test this with a chi-square test, reporting the value of the test statistic, degrees of freedom, and p-value.  Summarize your conclusions.
  5. Do those who develop heart disease have a higher risk of death over follow-up?  Find the risk ratio of death for those with vs. without heart disease, and the 95% confidence interval for this risk ratio (remember that the orientation of the table matters when finding an RR).  Given this confidence interval, do those who develop heart disease have significantly higher risk of death?  Explain.
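Questions 1, 2, 4, and 5 each reduce to a 2×2 table. The answers are meant to be produced in R; the minimal Python sketch below (standard library only, hypothetical function name `two_by_two`) only shows the arithmetic behind them: the risk in each group, the risk ratio with a 95% confidence interval computed on the log scale (a Wald interval with SE of log RR = sqrt(b/(a(a+b)) + d/(c(c+d)))), and the Pearson chi-square with 1 degree of freedom. The first row must be the exposed group (e.g., smokers) and the first column the outcome (died); swapping the rows gives the reciprocal RR, which is why orientation matters. R's `chisq.test` applies Yates' continuity correction to 2×2 tables by default, so its statistic matches this uncorrected one only with `correct = FALSE`.

```python
import math

def two_by_two(a, b, c, d):
    """
    Analyze a 2x2 table laid out as
                   died   alive
      exposed        a      b
      unexposed      c      d
    Returns both risks, the risk ratio with a log-scale (Wald) 95% CI,
    and an uncorrected Pearson chi-square test with 1 degree of freedom.
    """
    n1, n0 = a + b, c + d
    risk1, risk0 = a / n1, c / n0
    rr = risk1 / risk0
    # Standard error of log(RR); equals sqrt(b/(a*n1) + d/(c*n0))
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    lo = math.exp(math.log(rr) - 1.96 * se)
    hi = math.exp(math.log(rr) + 1.96 * se)
    # Pearson chi-square: sum of (observed - expected)^2 / expected
    n = n1 + n0
    died, alive = a + c, b + d
    observed = [a, b, c, d]
    expected = [n1 * died / n, n1 * alive / n, n0 * died / n, n0 * alive / n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 df, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return {"risk_exposed": risk1, "risk_unexposed": risk0, "rr": rr,
            "ci95": (lo, hi), "chi2": chi2, "df": 1, "p": p}

# Hypothetical counts, only to show the call; substitute the counts from
# your own cross-tabulation (exposed group first, deaths first).
result = two_by_two(20, 80, 10, 90)
for key, value in result.items():
    print(key, value)
```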

Link to All Answers in a Word file

 

Test Yourself

#1 - Starship Enterprise

Now let's return to the question of mortality rates on the Enterprise. Here are the data. Is the mortality rate significantly greater among the Red crew members? Use R to analyze these data.

Color         Areas                                   Crew  Fatalities
Blue          Science and Medical                      136           7
Gold          Command and Helm                          55           9
Red           Operations, Engineering, and Security    239          24
Ship's total  All                                      430          40
Link to Answer in a Word file
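The question asks for R; as a cross-check, here is a minimal Python sketch (standard library only) of the Pearson chi-square test of independence for a table with more than two rows. The helper names `chi2_sf` and `chi2_rxc` are hypothetical, chosen for this illustration. It takes the Red fatality count as 24, the value implied by the ship's total of 40 (7 + 9 + 24 = 40). The p-value uses the closed-form chi-square upper tail for integer degrees of freedom, so no statistics library is needed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{k=0}^{df/2 - 1} y^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    # Odd df: erfc(sqrt(y)) plus a finite series in half-integer powers of y
    total = math.erfc(math.sqrt(y))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (k - 0.5) / math.gamma(k + 0.5)
    return total

def chi2_rxc(table):
    """Pearson chi-square test of independence for an r x c table of counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, chi2_sf(chi2, df)

# Enterprise data as color -> (crew, fatalities); Red fatalities taken as 24.
crew = {"Blue": (136, 7), "Gold": (55, 9), "Red": (239, 24)}
table = [[dead, total - dead] for total, dead in crew.values()]  # [died, survived]
for color, (total, dead) in crew.items():
    print(f"{color}: {dead}/{total} died ({dead / total:.1%})")
chi2, df, p = chi2_rxc(table)
print(f"chi-square = {chi2:.2f}, df = {df}, p = {p:.3f}")
```

A significant overall test says only that the death rate differs somewhere among the three colors; compare the printed proportions to see whether Red is actually the group with the highest rate. The same `chi2_rxc` call also fits question 3 above: a 7 × 2 table of coffee category (0 to 6 cups) by death has (7 − 1)(2 − 1) = 6 degrees of freedom, assuming all seven categories occur in the subset.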

 

#2 - The Titanic

Now let's reconsider whether 1st, 2nd, and 3rd class passengers differed significantly in their risk of death after the Titanic struck an iceberg. Here are the data for women passengers.

Women         Alive   Dead  Total
1st Class       137      4    141
2nd Class        79     13     92
3rd Class        88     91    179

Use R to compute the risk ratio of death for 2nd class compared to 1st class (reference group) and for 3rd class compared to 1st class. Also compute the 95% confidence intervals for these risk ratio estimates and the p-values.

Was the risk of death significantly higher in 2nd class and 3rd class passengers?
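A minimal Python sketch (standard library only) of the same calculation, with 1st class as the reference group, may help check the R output. The function name `rr_vs_reference` is hypothetical. The 95% CI is the log-scale Wald interval, and the p-value comes from an uncorrected Pearson chi-square on each 2×2 subtable (that class vs. 1st class). R functions that use small-sample methods such as Fisher's exact or mid-p tests may report somewhat different p-values, especially with only 4 deaths in 1st class.

```python
import math

# Women on the Titanic, from the table above: class -> (dead, total)
women = {"1st": (4, 141), "2nd": (13, 92), "3rd": (91, 179)}

def rr_vs_reference(exposed, reference, z=1.96):
    """Risk ratio of death for `exposed` vs `reference`, with a log-scale
    Wald 95% CI and an uncorrected Pearson chi-square p-value (1 df)."""
    a, n1 = exposed      # deaths and total in the comparison class
    c, n0 = reference    # deaths and total in the reference class
    b, d = n1 - a, n0 - c
    rr = (a / n1) / (c / n0)
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    ci = (math.exp(math.log(rr) - z * se), math.exp(math.log(rr) + z * se))
    n = n1 + n0
    # Shortcut formula for the Pearson chi-square of a 2x2 table
    chi2 = n * (a * d - b * c) ** 2 / (n1 * n0 * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))
    return rr, ci, p

for cls in ("2nd", "3rd"):
    rr, (lo, hi), p = rr_vs_reference(women[cls], women["1st"])
    print(f"{cls} vs 1st: RR = {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}, p = {p:.2g}")
```

A confidence interval whose lower limit lies above 1 indicates a significantly higher risk than in the reference group at the 5% level.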

Link to the Answer in a Word file