Authors:

Timothy C. Heeren, PhD, Professor of Biostatistics

Jacqueline N. Milton, PhD, Clinical Assistant Professor, Biostatistics

Boston University School of Public Health

# Basic Statistical Analysis Using the R Statistical Package

# Introduction

R is a freely distributed software package for statistical analysis and graphics, developed and managed by the R Development Core Team. R can be downloaded from the website of the Comprehensive R Archive Network (CRAN) (http://cran.r-project.org). Make sure you download the version of R for your operating system; CRAN provides separate builds for Windows, macOS, and Linux. R is closely related to the S statistical language, which is commercially available as S-PLUS.

R is an object-oriented language. For our basic applications, the objects in R are data sets, stored as matrix-like tables called data frames (where columns represent different variables and rows represent different subjects), and vectors representing single variables (one value for each subject in a sample). Functions in R perform calculations on objects. For example, if 'cholesterol' were an object holding cholesterol levels from a sample, the function call 'mean(cholesterol)' would calculate the mean cholesterol for the sample. In our basic applications, results of an analysis are displayed on the screen, but results can also be saved as objects in R, allowing the user to manipulate them or use them in further analyses.
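A minimal sketch of these ideas typed at the R prompt, assuming a small made-up vector of cholesterol values (the numbers and the object names 'chol_mean', 'chol_sd' and 'chol_data' are hypothetical, for illustration only):

```r
# Hypothetical cholesterol levels (mg/dL) for five subjects; c() combines values into a vector
cholesterol <- c(180, 210, 195, 240, 175)

# A function acting on an object: mean() computes the sample mean
mean(cholesterol)
# [1] 200

# Results can be saved as new objects with <- and reused in further calculations
chol_mean <- mean(cholesterol)
chol_sd <- sd(cholesterol)
chol_sd / chol_mean   # coefficient of variation, built from the saved results

# A small data set as a data frame: one column per variable, one row per subject
chol_data <- data.frame(id = 1:5, cholesterol = cholesterol)
chol_data
```

Typing 'chol_mean' at the prompt afterward would print the saved value, 200.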

Data can be entered directly into R, but we will usually use MS Excel to create a data set. Data sets are arranged with each column representing a variable and each row representing a subject; a data set with 5 variables recorded on 50 subjects would be represented in an Excel file with 5 columns and 50 rows of data (plus an optional first row holding the variable names). Data can be entered and edited in Excel, and Excel can save a worksheet in 'comma delimited format' as a .csv file; these .csv files can then be read into R for analysis.
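A minimal sketch of reading a .csv file into R with the base function read.csv(). To keep the example self-contained, it first writes a tiny hypothetical file (three subjects; variables 'id', 'age' and 'chol') to a temporary folder. With a real Excel export you would instead pass the path to your own file, or use read.csv(file.choose()) to pick the file interactively.

```r
# Stand-in for an Excel export: write a small hypothetical .csv file to a temp folder.
# The first line holds the variable names; each later line is one subject.
csv_path <- file.path(tempdir(), "study.csv")
writeLines(c("id,age,chol",
             "1,45,210",
             "2,52,185",
             "3,38,240"), csv_path)

# read.csv() assumes a header row and returns a data frame:
# one column per variable, one row per subject
study <- read.csv(csv_path)
study

dim(study)        # number of rows (subjects) and columns (variables)
names(study)      # the variable names taken from the header row
mean(study$chol)  # the $ operator extracts a single variable as a vector
```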

R is an interactive language. When you start R, a console window opens and, after some startup messages, displays a '>', which is the ready prompt. Analyses are performed through a series of commands: the user enters a command and R responds, then the user enters the next command and R responds again. In this document, commands typed in by the user are given in red and responses from R are given in blue; R uses this same color scheme.

Some helpful odds and ends when using R:

- Entering an object name will generally print that object.
- R is case sensitive, so an object named Group must be referred to as Group, not group.
- The up and down arrow keys can be used to recall and scroll through past commands, which can save typing when fixing typos or modifying a command.
- Entering a letter and then hitting the Tab key twice will list the commands and objects starting with that letter.
- Material can be cut and pasted into or from the R window. This allows you to save and print R results as part of MS Word documents, or to save the text of your R session as a record of your work. R output is laid out for a fixed-width font such as Courier, and 9-point Courier works well when pasting R output into a document.
- There is a lot of R help out on the internet. For example, I was stuck trying to decipher the R help page for analysis of variance and so I googled 'Analysis of Variance R'. I found several sites offering examples.
- As with any software program, there is usually more than one way to do things in R. The methods in this handout are not the only way to perform these analyses, and you should feel free to experiment and explore.
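A short sketch of a few of these points at the prompt; the object 'Group' and its values are hypothetical. The help commands are left as comments because they open help pages rather than print results.

```r
# Typing an object's name at the prompt prints it
Group <- c("control", "treated", "treated")
Group
# [1] "control" "treated" "treated"

# R is case sensitive: 'group' is a different name from 'Group'
exists("Group")   # TRUE
exists("group")   # FALSE, unless something named 'group' has been defined

# A built-in alternative to Tab completion: list visible names matching a pattern
apropos("^mea")   # names starting with "mea", such as "mean"

# Built-in help, to try before searching the web:
# ?aov             # help page for aov(), R's analysis of variance function
# example(mean)    # runs the examples listed at the end of a help page
```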