Authors:

Timothy C. Heeren, PhD, Professor of Biostatistics

Jacqueline N. Milton, PhD, Clinical Assistant Professor, Biostatistics

Boston University School of Public Health

Basic Statistical Analysis Using the R Statistical Package

Introduction


R is a freely distributed software package for statistical analysis and graphics, developed and managed by the R Development Core Team. R can be downloaded from the website of the Comprehensive R Archive Network (CRAN) (http://cran.r-project.org). Be sure to download the version of R that matches your operating system (for example, XP for the PC, Tiger or earlier versions of OS X for Macs). R is related to the S statistical language, which is commercially available as S-PLUS.

R is an object-oriented language. For our basic applications, matrices representing data sets (where columns represent different variables and rows represent different subjects) and column vectors representing variables (one value for each subject in a sample) are objects in R. Functions in R perform calculations on objects. For example, if 'cholesterol' were an object representing cholesterol levels from a sample, the function call 'mean(cholesterol)' would calculate the mean cholesterol for the sample. For our basic applications, results of an analysis are displayed on the screen. Results can also be saved as objects in R, allowing the user to manipulate them or use them in further analyses.
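
A minimal sketch of these ideas follows. The cholesterol values and the object name 'chol_mean' are made up for illustration; they are not data from this module.

```r
# Hypothetical cholesterol levels (mg/dL) for five subjects; illustrative values only
cholesterol <- c(180, 200, 220, 190, 210)

# Apply a function to the object; the result is displayed on the screen
mean(cholesterol)

# Save the result as a new object so it can be used in further calculations
chol_mean <- mean(cholesterol)

# Use the saved result: deviation of each subject's value from the sample mean
cholesterol - chol_mean
```

Here the first call to mean() only prints its result, while the assignment with '<-' stores the result under a name so that later commands can refer to it.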

Data can be entered directly into R, but we will usually use MS Excel to create a data set. Data sets are arranged with each column representing a variable and each row representing a subject; a data set with 5 variables recorded on 50 subjects would be represented in an Excel file with 5 columns and 50 rows. Data can be entered and edited using Excel. Excel can save files in 'comma-delimited format', or .csv files; these .csv files can then be read into R for analysis.
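
As a rough sketch of the last step, the example below reads a small .csv file with R's read.csv() function. The file contents, the variable names, and the object name 'study' are assumptions made for illustration, and the file is written to a temporary location only so that the example runs on its own; in practice the file would be one saved from Excel. The sketch also assumes that the first row of the file holds the variable names.

```r
# Create a small comma-delimited file in a temporary location so the example is
# self-contained; in practice this file would have been saved from Excel.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,age,cholesterol",
             "1,45,180",
             "2,52,200",
             "3,38,220"),
           csv_path)

# Read the .csv file into R; header = TRUE treats the first row as variable names
study <- read.csv(csv_path, header = TRUE)

dim(study)               # number of rows (subjects) and columns (variables)
mean(study$cholesterol)  # mean of one variable (column) in the data set
```

Note that read.csv() returns a data frame, which is R's standard object for a data set with one column per variable and one row per subject.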

R is an interactive language. When you start R, a blank window appears with a '>', which is the ready prompt, on the first line of the window. Analyses are performed through a series of commands: the user enters a command and R responds; the user then enters the next command, and R responds again. In this document, commands typed in by the user are given in red and responses from R are given in blue; R uses this same color scheme.
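
A short illustration of this back-and-forth is sketched below. Each line is a command that would be typed at the '>' prompt, and the comment beside it describes R's response; the object name 'x' and the values are arbitrary choices for illustration.

```r
2 + 3                # R responds: [1] 5
x <- c(1, 2, 3, 4)   # an assignment: R stores the object and prints nothing
sum(x)               # R responds: [1] 10
```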

Some helpful odds and ends when using R: