Reading and Writing Data to and from R
Reading files into R
Usually the data we want to work on are already stored in a file that we need to read into R. R can read data from a variety of file formats, for example plain text files or files created in Excel, SPSS or Stata. We will mainly read text files in .txt or .csv format (comma-separated values, often exported from Excel).
To read an entire data frame directly, the external file will normally have a special form:
- The first line of the file should have a name for each variable in the data frame.
- Each additional line of the file has as its first item a row label and the values for each variable.
Here we use the example dataset airquality, stored in the files airquality.csv and airquality.txt.
Input file form with names and row labels:
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
...
By default numeric items (except row labels) are read as numeric variables. This can be changed if necessary.
The function read.table() can then be used to read the data frame directly:
> airqual <- read.table("C:/Desktop/airquality.txt")
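As a minimal, self-contained sketch of the file form described above, the following writes the first five rows of R's built-in airquality data frame to a temporary file and reads them back. The temporary file and the names tmp_txt, airqual_small and airqual_chr are illustrative assumptions, not part of the course files. Because the header line has one fewer field than the data lines, read.table() treats the first line as column names and the first column as row labels. The colClasses argument at the end is one way to change how a column is read.

```r
# Hypothetical stand-in for airquality.txt: the first five rows of the
# built-in airquality data, written to a temporary file.
tmp_txt <- tempfile(fileext = ".txt")
write.table(head(airquality, 5), file = tmp_txt, quote = FALSE)

# The header line has one fewer field than the data lines, so read.table()
# uses it for column names and takes the first column as row labels.
airqual_small <- read.table(tmp_txt)
print(airqual_small)

# Numeric items are read as numeric (integer or double) by default.
print(sapply(airqual_small, class))

# One way to change the default: read Month as character instead.
airqual_chr <- read.table(tmp_txt, colClasses = c(Month = "character"))
print(class(airqual_chr$Month))
```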
Similarly, to read .csv files, the read.csv() function can be used to read in the data frame directly:
> airqual <- read.csv("C:/Desktop/airquality.csv")
[Note: Occasionally you may need to use a double slash (//) in your path. This seems to depend on the machine.]
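A comparable sketch for the .csv case, again using a temporary file built from the built-in airquality data (the names tmp_csv, airqual_x and airqual_csv are illustrative assumptions). read.csv() assumes a header line by default, and a .csv file that stores row labels keeps them in an ordinary first column. Passing row.names = 1 is one way to turn that column back into row labels.

```r
# Hypothetical stand-in for airquality.csv, written to a temporary file.
tmp_csv <- tempfile(fileext = ".csv")
write.csv(head(airquality, 5), file = tmp_csv)

# Without further arguments the row labels arrive as a regular column,
# which R names "X" because its header field is empty.
airqual_x <- read.csv(tmp_csv)
print(names(airqual_x))

# row.names = 1 uses the first column as row labels instead.
airqual_csv <- read.csv(tmp_csv, row.names = 1)
print(airqual_csv)
```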
In addition, you can read in files using the file.choose() function in R. After typing in this command in R, you can manually select the directory and file where your dataset is located.
Occasionally, you will need to read in data that does not already have column name information. For example, the dataset BOD.txt looks like this:
1 8.3
2 10.3
3 19.0
4 16.0
5 15.6
7 19.8
Initially, there are no column names associated with the dataset. We can use the colnames() function to assign them. Suppose that we want to assign the column names "Time" and "demand" to the BOD.txt dataset. To do so, we type the following:
> bod <- read.table("BOD.txt", header=F)
> colnames(bod) <- c("Time","demand")
> colnames(bod)
[1] "Time" "demand"
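The first command reads in the dataset; the argument header=F (short for header=FALSE) specifies that the first line of the file does not contain column names. The second command assigns the column names, and the third displays them.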
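The steps above can be tried without the course file. This sketch recreates the six lines of BOD.txt in a temporary file (tmp_bod and bod2 are illustrative names) and also shows an alternative that is not used in the text: supplying the names while reading, through the col.names argument of read.table().

```r
# Recreate the six lines of BOD.txt shown above in a temporary file.
tmp_bod <- tempfile(fileext = ".txt")
writeLines(c("1 8.3", "2 10.3", "3 19.0", "4 16.0", "5 15.6", "7 19.8"), tmp_bod)

# Without a header line, R supplies the default names V1 and V2.
bod <- read.table(tmp_bod, header = FALSE)
print(colnames(bod))

# Assign the names afterwards, as in the text ...
colnames(bod) <- c("Time", "demand")

# ... or supply them while reading, via the col.names argument.
bod2 <- read.table(tmp_bod, header = FALSE, col.names = c("Time", "demand"))
print(all.equal(bod, bod2))
```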
Read in the cars.txt dataset and call it cars1. Make sure you use the "header=F" option to specify that there are no column names associated with the dataset. Next, assign "speed" and "dist" as the names of the first and second columns of the cars1 dataset.
The two videos below provide nice explanations of different methods to read data from a spreadsheet into an R dataset.
Import Data, Copy Data from Excel to R, Both .csv and .txt Formats (R Tutorial 1.3) MarinStatsLectures
Importing Data and Working With Data in R (R Tutorial 1.4) MarinStatsLectures
Writing Data to a File
After working with a dataset, we might like to save it for future use. Before we do this, let's first set up a working directory so we know where we can find all our data sets and files later.
Setting up a Directory
In the R window, click on "File" and then on "Change dir". You should then see a box pop up titled "Choose directory". For this class, choose the directory "Desktop" by clicking on "Browse", then select "Desktop" and click "OK". In the future, you may want to create a directory on your computer where you keep your datasets and code for this class.
Alternatively, you can use the setwd() function to set the working directory:
> setwd("C:/Desktop")
To find out what your current working directory is, type
> getwd()
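A small sketch of how setwd() and getwd() work together. The folder name r_course_data is a hypothetical example, created under tempdir() so that the sketch does not touch the Desktop. setwd() returns the previous working directory invisibly, which makes it easy to switch back afterwards.

```r
# Hypothetical folder for course files; tempdir() keeps the sketch harmless.
course_dir <- file.path(tempdir(), "r_course_data")
dir.create(course_dir, showWarnings = FALSE)

# setwd() returns the previous working directory invisibly, so it can be saved.
old_dir <- setwd(course_dir)
print(getwd())

# Files written with a relative name now land in course_dir.
writeLines("example", "note.txt")
print(file.exists(file.path(course_dir, "note.txt")))

# Restore the original working directory.
setwd(old_dir)
```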
Setting Up Working Directories in R (R Tutorial 1.8) MarinStatsLectures
In R, we can write data frames easily to a file, using the write.table() command.
> write.table(cars1, file="cars1.txt", quote=F)
The first argument is the data frame to be written to the output file; the second is the name of the output file. By default R surrounds character and factor entries, as well as the row and column names, with double quotes, so we use quote=F to suppress them.
Now, let's check whether R created the file on the Desktop, by going to the Desktop and clicking to open the file. You should see a file with three columns, the first giving the index (or row number) and the other two the speed and distance. R by default creates a column of row indices. If we wanted to create a file without the row indices, we would use the command:
> write.table(cars1, file="cars1.txt", quote=F, row.names=F)
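To see the effect of these arguments without opening the file by hand, here is a small sketch that writes a toy data frame to temporary files and prints the raw lines. The data frame toy and the file names are illustrative assumptions, not the course's cars1 data.

```r
# Toy data frame with the same column names as the cars exercise.
toy <- data.frame(speed = c(4, 7, 8), dist = c(2, 4, 16))

with_rows <- tempfile(fileext = ".txt")
no_rows <- tempfile(fileext = ".txt")

write.table(toy, file = with_rows, quote = FALSE)
write.table(toy, file = no_rows, quote = FALSE, row.names = FALSE)

# With row names: the header has two fields, each data line has three.
print(readLines(with_rows))

# Without row names: every line has two fields.
print(readLines(no_rows))

# Reading the second file back needs header = TRUE, because the header line
# no longer has one fewer field than the data lines.
toy_back <- read.table(no_rows, header = TRUE)
print(toy_back)
```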
Datasets in R
Watch the video below for a concise introduction to working with the variables in an R dataset.
Working with Variables and Data in R (R Tutorial 1.5) MarinStatsLectures
Around 100 datasets are supplied with R (in the package datasets), and others are available.
To see the list of datasets currently available, use the command:
> data()
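The list can also be inspected from code rather than read in a viewer. A minimal sketch, assuming only the datasets package that ships with R (the name available is illustrative):

```r
# data() returns an object whose 'results' component is a character matrix
# with one row per dataset.
available <- data(package = "datasets")$results

# Show the name and title of the first few entries.
print(head(available[, c("Item", "Title")]))

# Check whether a particular dataset is in the list.
print("CO2" %in% available[, "Item"])
```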
We will first look at a data set on CO2 (carbon dioxide) uptake in grass plants available in R.
> CO2
[Note: capitalization matters here; also: it's the letter O, not zero. Typing this command should display the entire dataset called CO2, which has 84 observations (in rows) and 5 variables (columns).]
To get more information on the variables in the dataset, type in
> help(CO2)
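Besides the help page, a few standard functions give a quick overview of a dataset without printing all of its rows. The sketch below applies them to CO2.

```r
# Dimensions: number of rows (observations) and columns (variables).
print(dim(CO2))

# Variable names and types at a glance.
print(names(CO2))
str(CO2)

# First few rows instead of all 84.
print(head(CO2))

# A single variable is extracted with $; here, counts per level of Type.
print(table(CO2$Type))
```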
Evaluate and report the mean and standard deviation of the variables "Concentration" and "Uptake" (named conc and uptake in the dataset).
Subsetting Data in R With Square Brackets and Logic Statements (R Tutorial 1.6) MarinStatsLectures