An Exercise Using R to Compute Chi-Squared from a Data Set
In this exercise you will examine associations between baseline health characteristics measured during the first visit in the Framingham Heart Study and the development of heart disease and death over 20 years of follow-up. You will use a subset (n = 500) of the data, saved in the file FramHSn500.csv.
The variables are coded as shown in the table below:
| Variable name | Description |
|---|---|
| RANDID | subject ID number |
| SEX | 1 = male, 2 = female |
| AGE | age in years |
| SYSBP | systolic blood pressure in mmHg |
| DIABP | diastolic blood pressure in mmHg |
| TOTCHOL | total cholesterol in mg/dL |
| CURSMOKE | current smoking status: 1 = smoker, 0 = nonsmoker |
| FRAM_BMI | body mass index in kg/m² |
| COFFEE | cups of coffee per day (6 = 6 or more) |
| DIABETES | developed diabetes over 20 years of follow-up: 1 = yes, 0 = no |
| HEARTDIS | developed heart disease over 20 years of follow-up: 1 = yes, 0 = no |
| ANYDEATH | death (any cause) over 20 years of follow-up: 1 = died, 0 = living |
Save your R code and output as you answer the following questions.
- Are smokers at higher risk of death over follow-up than non-smokers? Test this with a chi-square test, reporting the proportion of smokers and non-smokers who have died over 20 years of follow-up, the risk ratio, the value of the chi-square statistic, degrees of freedom, and p-value. Summarize your conclusions.
- Are smokers at higher risk of death over follow-up than non-smokers? Find the risk ratio of death for smokers vs. non-smokers, and the 95% confidence interval for this risk ratio (remember that the orientation of the table matters when finding an RR). Given this confidence interval, do smokers have a significantly higher risk of death? Explain.
- Is there an association between coffee consumption and death over the 20-year follow-up? Test this with a chi-square test, reporting the proportion who died in each category of coffee consumption, the value of the test statistic, degrees of freedom, and p-value. Summarize your conclusions.
- Do those who develop heart disease have a higher risk of death over follow-up? What percentage of those with and without heart disease die over follow-up? Test this with a chi-square test, reporting the value of the test statistic, degrees of freedom, and p-value. Summarize your conclusions.
- Do those who develop heart disease have a higher risk of death over follow-up? Find the risk ratio of death for those with vs. without heart disease, and the 95% confidence interval for this risk ratio (remember that the orientation of the table matters when finding an RR). Given this confidence interval, do those who develop heart disease have a significantly higher risk of death? Explain.
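The questions above all follow the same pattern: cross-tabulate an exposure against a 0/1 outcome, report the risk in each exposure group, run a chi-square test, and (for two-level exposures) compute a risk ratio with a 95% confidence interval. The following is a minimal base-R sketch of those steps, not the official answer. The helper names `risk_summary()` and `rr_ci()` are hypothetical; the chi-square uses `correct = FALSE` (no Yates continuity correction), which is a choice you may want to change, and the RR interval uses the common log-scale (Wald) formula. The usage lines are commented out because they assume FramHSn500.csv is in your working directory and that the outcome is coded 0/1 as in the table. Add-on packages such as epitools also offer risk-ratio functions; check their documentation for the table orientation they expect.

```r
# Hypothetical helper: risk of a 0/1 outcome in each exposure level, plus a
# Pearson chi-square test of association (works for 2 or more exposure levels).
risk_summary <- function(exposure, outcome) {
  tab <- table(exposure = exposure, outcome = outcome)  # rows = exposure levels
  risk <- prop.table(tab, margin = 1)[, "1"]            # proportion with outcome == 1
  test <- chisq.test(tab, correct = FALSE)              # no Yates continuity correction
  list(table = tab, risk = risk,
       statistic = unname(test$statistic),
       df = unname(test$parameter),
       p.value = test$p.value)
}

# Hypothetical helper: risk ratio (exposed vs. unexposed) with a log-scale
# (Wald) confidence interval. Pass the exposed group's counts first.
rr_ci <- function(events1, n1, events0, n0, level = 0.95) {
  rr <- (events1 / n1) / (events0 / n0)
  se <- sqrt(1 / events1 - 1 / n1 + 1 / events0 - 1 / n0)  # SE of log(RR)
  z <- qnorm(1 - (1 - level) / 2)
  c(rr = rr, lower = exp(log(rr) - z * se), upper = exp(log(rr) + z * se))
}

# Usage with the exercise data (assumes FramHSn500.csv is in the working directory):
# fram <- read.csv("FramHSn500.csv")
# smk <- risk_summary(fram$CURSMOKE, fram$ANYDEATH)
# smk$risk; smk$statistic; smk$df; smk$p.value
# tab <- smk$table
# rr_ci(tab["1", "1"], sum(tab["1", ]), tab["0", "1"], sum(tab["0", ]))  # smokers vs. nonsmokers
# risk_summary(fram$COFFEE, fram$ANYDEATH)    # one row per coffee category; df = categories - 1
# risk_summary(fram$HEARTDIS, fram$ANYDEATH)
```

Indexing the table by the labels "1" and "0" (rather than by position) is what keeps the RR oriented as exposed over unexposed, regardless of the order in which `table()` lists the rows.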
Test Yourself
#1 - Starship Enterprise
Now let's return to the question of mortality rates on the Enterprise. Here is the data. Is the mortality rate significantly greater among the Red crew members? Use R to analyze these data.
| Color | Areas | Crew | Fatalities |
|---|---|---|---|
| Blue | Science and Medical | 136 | 7 |
| Gold | Command and Helm | 55 | 9 |
| Red | Operations, Engineering, and Security | 239 | 24 |
| Ship's total | All | 430 | 40 |
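One way to approach this in R is sketched below, under stated assumptions rather than as the definitive analysis. It takes Red fatalities as 24, the value implied by the ship's total of 40 (40 − 7 − 9). It shows two readings of the question: an overall Pearson chi-square test across the three colors, and a comparison of Red against all other crew combined with a risk ratio, a log-scale (Wald) 95% confidence interval, and an uncorrected chi-square p-value. All object names are illustrative.

```r
# Counts from the Enterprise table (Red fatalities = 40 - 7 - 9 = 24)
crew       <- c(Blue = 136, Gold = 55, Red = 239)
fatalities <- c(Blue = 7,   Gold = 9,  Red = 24)

# 3 x 2 table: one row per color, columns = died / survived
enterprise <- cbind(died = fatalities, survived = crew - fatalities)
print(round(prop.table(enterprise, margin = 1)[, "died"], 3))  # mortality by color

# (1) Does mortality differ across the three colors? (Pearson chi-square, df = 2)
overall <- chisq.test(enterprise)
print(overall)

# (2) Red vs. all other crew combined: risk ratio with a log-scale 95% CI
died_red <- fatalities[["Red"]];          n_red <- crew[["Red"]]
died_oth <- sum(fatalities) - died_red;   n_oth <- sum(crew) - n_red
rr <- (died_red / n_red) / (died_oth / n_oth)
se <- sqrt(1 / died_red - 1 / n_red + 1 / died_oth - 1 / n_oth)  # SE of log(RR)
ci <- exp(log(rr) + c(-1, 1) * qnorm(0.975) * se)
red_vs_other <- chisq.test(rbind(red   = c(died_red, n_red - died_red),
                                 other = c(died_oth, n_oth - died_oth)),
                           correct = FALSE)
print(c(rr = rr, lower = ci[1], upper = ci[2], p.value = red_vs_other$p.value))
```

The 3 × 2 test asks whether mortality differs among the colors at all, while the Red-vs-others risk ratio addresses the question as worded; which comparison to report is a judgment call worth stating explicitly.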
#2 - The Titanic
Now let's reconsider whether 1st, 2nd, and 3rd class passengers differed significantly in their risk of death after the Titanic struck an iceberg. Here are the data for female passengers.
| Women | Alive | Dead | Total |
|---|---|---|---|
| 1st Class | 137 | 4 | 141 |
| 2nd Class | 79 | 13 | 92 |
| 3rd Class | 88 | 91 | 179 |
Use R to compute the risk ratio of death for 2nd class compared to 1st class (reference group) and for 3rd class compared to 1st class. Also compute the 95% confidence intervals for these risk ratio estimates and the p-values.
Was the risk of death significantly higher in 2nd class and 3rd class passengers?
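A minimal base-R sketch of these comparisons, using the female-passenger counts from the table with 1st class as the reference group. The helper `compare_to_first()` is hypothetical. It uses the log-scale (Wald) 95% confidence interval for the risk ratio, and its p-value comes from an uncorrected Pearson chi-square on each class-vs-1st-class 2 × 2 table; other reasonable choices (Yates correction, Fisher's exact test via `fisher.test()`, or a p-value from the Wald z statistic) will give somewhat different p-values.

```r
# Female passengers by class (from the table above)
alive <- c(First = 137, Second = 79, Third = 88)
dead  <- c(First = 4,   Second = 13, Third = 91)
n     <- alive + dead

# Hypothetical helper: RR of death for class k vs. 1st class, log-scale 95% CI,
# and an uncorrected chi-square p-value from the 2 x 2 table (k vs. 1st class)
compare_to_first <- function(k) {
  rr <- (dead[[k]] / n[[k]]) / (dead[["First"]] / n[["First"]])
  se <- sqrt(1 / dead[[k]] - 1 / n[[k]] + 1 / dead[["First"]] - 1 / n[["First"]])
  ci <- exp(log(rr) + c(-1, 1) * qnorm(0.975) * se)
  tab <- rbind(c(dead[[k]], alive[[k]]),                 # exposed class: dead, alive
               c(dead[["First"]], alive[["First"]]))     # reference class: dead, alive
  p <- chisq.test(tab, correct = FALSE)$p.value
  c(rr = rr, lower = ci[1], upper = ci[2], p.value = p)
}

# One row per comparison: Second vs. First, Third vs. First
results <- t(sapply(c("Second", "Third"), compare_to_first))
print(signif(results, 3))
```

Because the risk in the reference group appears in the denominator, swapping the order of the two rows in `tab` would not change the chi-square p-value, but computing the RR the other way round would give its reciprocal.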