Review of Matrices, Arrays, and Data Frames


We discussed vectors, matrices and arrays in the first module in this series. Let's recall how to create a matrix of data from some given measurements, say heights and weights of 15 students. Suppose we have the data below:

 

height   58   59   60   61   62   63   64   65   66    67    68    69    70    71    72

weight  115  117  120  123  126  129  132  135  139   142   146   150   154   159   164

 

We can read this into R using the following commands:

>  height = c(58,59,60,61,62,63,64,65,66,67,68,69,70,71,72)

>  weight = c(115,117,120,123,126,129,132,135,139,142,146,150,154,159,164)

 

This gives us two vectors, but it may be useful to combine them into a matrix, so that each person's height and weight appear together in one row. To do this, we can use the command:

> htwtmatrix = matrix(c(height,weight),15,2) # what do 15 and 2 refer to?

> htwtmatrix

      [,1] [,2]

 [1,]   58  115

 [2,]   59  117

 [3,]   60  120

 [4,]   61  123

 [5,]   62  126

 [6,]   63  129

 [7,]   64  132

 [8,]   65  135

 [9,]   66  139

[10,]   67  142

[11,]   68  146

[12,]   69  150

[13,]   70  154

[14,]   71  159

[15,]   72  164

 

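What do you notice about how R creates a matrix from a vector? It fills matrices column by column by default, so if you want to fill a matrix row by row, you need to give it the additional argument "byrow=TRUE". (The shorthand "byrow=T" also works, but T is an ordinary variable that can be reassigned, so TRUE is safer.)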
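A small sketch of the difference, using the integers 1 to 6 rather than the height and weight data. The names vals, bycol and byrow_m are introduced here only for illustration:

```r
# Fill a 2 x 3 matrix from the same vector in two ways.
vals = 1:6

# Default: values fill down each column first.
bycol = matrix(vals, nrow = 2, ncol = 3)
print(bycol)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# byrow = TRUE: values fill across each row first.
byrow_m = matrix(vals, nrow = 2, ncol = 3, byrow = TRUE)
print(byrow_m)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
```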

 

How would you create a matrix that has height and weight as the two rows instead of columns?

Look for help on this under help(matrix) if necessary.

Now we have each person's height and weight together. However, for future reference, instead of storing the data as a matrix, it might be helpful to have the column names together with the data.

 

Recall from module 1 that to assign column names with names(), we first convert htwtmatrix to a data frame. A data frame is a list of vectors and/or factors of the same length that are related "across" rows, so that data in the same row position come from the same experimental unit (subject, animal, etc.). In addition, it has a unique set of row names. To convert htwtmatrix to a data frame, we use the command:

> htwtdata = data.frame(htwtmatrix)

> htwtdata

   X1  X2

1  58 115

2  59 117

3  60 120

4  61 123

5  62 126

6  63 129

7  64 132

8  65 135

9  66 139

10 67 142

11 68 146

12 69 150

13 70 154

14 71 159

15 72 164

 

The command as.data.frame() works as well.

Notice that now the columns are named "X1" and "X2". We can now assign names to the columns by means of the "names()" command:

> names(htwtdata) = c("height","weight")

We can always find the column names of a data frame, without opening up the whole data set, by typing in

> names(htwtdata)

[1] "height" "weight"
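As an aside, the matrix step can be skipped: data.frame() accepts named vectors directly, so the columns are labeled when the data frame is created. A minimal sketch, in which htwtdata2 is a name chosen here only to avoid overwriting htwtdata:

```r
height = c(58,59,60,61,62,63,64,65,66,67,68,69,70,71,72)
weight = c(115,117,120,123,126,129,132,135,139,142,146,150,154,159,164)

# Build the data frame straight from the two vectors;
# the argument names become the column names.
htwtdata2 = data.frame(height = height, weight = weight)

names(htwtdata2)     # "height" "weight"
head(htwtdata2, 3)   # first three rows
#   height weight
# 1     58    115
# 2     59    117
# 3     60    120
```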

 

Let us recall how R operates on matrices, and how that compares to data frames. Recall that R evaluates functions over entire vectors (and matrices), avoiding the need for loops. For example, what do the following commands do?

> htwtmatrix*2

> htwtmatrix[,1]/12    # convert height in inches to feet

> mean(htwtmatrix[,2])
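To see what vectorization saves, the sketch below computes the heights in feet twice: once with an explicit loop and once with a single vectorized expression. The names feet_loop and feet_vec are illustrative only.

```r
height = c(58,59,60,61,62,63,64,65,66,67,68,69,70,71,72)

# Explicit loop: handle one element at a time.
feet_loop = numeric(length(height))
for (i in seq_along(height)) {
  feet_loop[i] = height[i] / 12
}

# Vectorized: the division is applied to every element at once.
feet_vec = height / 12

all.equal(feet_loop, feet_vec)   # TRUE: both approaches agree
```

Both versions give the same result; the vectorized form is shorter and is the usual style in R.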

 

To get the dimensions or number of rows or columns of a data frame, it is often useful to use one of the following commands:

> dim(htwtdata)

> nrow(htwtdata)

> ncol(htwtdata)
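For the height and weight data, these calls should return the values shown in the comments below. The data frame is rebuilt here from the two vectors so that the snippet runs on its own:

```r
height = c(58,59,60,61,62,63,64,65,66,67,68,69,70,71,72)
weight = c(115,117,120,123,126,129,132,135,139,142,146,150,154,159,164)
htwtdata = data.frame(height, weight)

dim(htwtdata)    # 15  2  (rows first, then columns)
nrow(htwtdata)   # 15
ncol(htwtdata)   # 2
```

The same three functions also work on a matrix such as htwtmatrix.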

What does the following R command do?

 

> htwtdata[,2]*703/htwtdata[,1]^2  

 

How would you get R to give you the height and weight of the 8th student in the data set? The 8th and 10th student?
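As a hint, rows and columns of a data frame can be selected with the same [row, column] notation used for matrices. A sketch on a small made-up data frame (toy and its values are hypothetical and not part of the student data):

```r
# A hypothetical 4-row data frame, used only to illustrate indexing.
toy = data.frame(x = c(10, 20, 30, 40), y = c(1, 2, 3, 4))

toy[2, ]         # all columns for row 2
toy[c(1, 3), ]   # all columns for rows 1 and 3
toy[3, "y"]      # a single value: 3
```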