Review of Matrices, Arrays, and Data Frames
We discussed vectors, matrices and arrays in the first module in this series. Let's recall how to create a matrix of data from some given measurements, say heights and weights of 15 students. Suppose we have the data below:
height 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
weight 115 117 120 123 126 129 132 135 139 142 146 150 154 159 164
We can read this into R using the following commands:
> height = c(58,59,60,61,62,63,64,65,66,67,68,69,70,71,72)
> weight = c(115,117,120,123,126,129,132,135,139,142,146,150,154,159,164)
This gives us two vectors, but it may be useful to convert this into a matrix, so that each person's height and weight appear together. To do this, we can use the command:
> htwtmatrix = matrix(c(height,weight),15,2) # what do 15 and 2 refer to?
> htwtmatrix
      [,1] [,2]
 [1,]   58  115
 [2,]   59  117
 [3,]   60  120
 [4,]   61  123
 [5,]   62  126
 [6,]   63  129
 [7,]   64  132
 [8,]   65  135
 [9,]   66  139
[10,]   67  142
[11,]   68  146
[12,]   69  150
[13,]   70  154
[14,]   71  159
[15,]   72  164
What do you notice about how R creates a matrix from a vector? By default it fills the matrix column by column, so if you want to fill it row by row, you need to supply the additional argument "byrow=TRUE".
How would you create a matrix that has height and weight as the two rows instead of columns? Look for help on this under help(matrix) if necessary.
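To see the two fill orders side by side, here is a minimal sketch on a small made-up vector. The names v, bycol and byrowm are chosen only for this illustration and are not part of the height and weight data.

```r
# Hypothetical six-element vector, used only to illustrate fill order
v = 1:6

# Default behaviour: values are placed down column 1, then down column 2
bycol = matrix(v, 3, 2)

# With byrow=TRUE: values are placed across row 1, then row 2, and so on
byrowm = matrix(v, 3, 2, byrow=TRUE)

print(bycol)
print(byrowm)
```

Comparing the two printed matrices shows that the same six values end up in different positions depending on the fill order.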
Now we have each person's height and weight together. However, for future reference, instead of storing the data as a matrix, it might be helpful to have the column names together with the data.
Recall from module 1 that a convenient way to keep column names together with the data is to convert htwtmatrix to a data frame. (A matrix can also carry column names, set with colnames(), but a data frame additionally allows columns of different types.) A data frame is a list of vectors and/or factors of the same length that are related "across", such that data in the same row position come from the same experimental unit (subject, animal, etc.). In addition, it has a unique set of row names. To convert htwtmatrix to a data frame, we use the command:
> htwtdata = data.frame(htwtmatrix)
> htwtdata
   X1  X2
1  58 115
2  59 117
3  60 120
4  61 123
5  62 126
6  63 129
7  64 132
8  65 135
9  66 139
10 67 142
11 68 146
12 69 150
13 70 154
14 71 159
15 72 164
The command as.data.frame() works as well.
Notice that the columns are now named "X1" and "X2". We can assign more meaningful names to the columns by means of the "names()" command:
> names(htwtdata) = c("height","weight")
We can always find the column names of a data frame, without printing the whole data set, by typing:
> names(htwtdata)
[1] "height" "weight"
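As an aside, the matrix step is not strictly required to reach a named data frame. The following is a minimal sketch, assuming the same height and weight vectors as above (repeated here so the block runs on its own). The name htwtdirect is made up for this illustration.

```r
# Same measurements as above, repeated so this block is self-contained
height = c(58,59,60,61,62,63,64,65,66,67,68,69,70,71,72)
weight = c(115,117,120,123,126,129,132,135,139,142,146,150,154,159,164)

# data.frame() takes the column names from the vector names it is given
htwtdirect = data.frame(height, weight)

names(htwtdirect)
```

Building the data frame this way skips the intermediate "X1" and "X2" names, since each column inherits the name of the vector it came from.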
Let us recall how R operates on matrices, and how that compares to data frames. Recall that R evaluates functions over entire vectors (and matrices), avoiding the need for loops. For example, what do the following commands do?
> htwtmatrix*2
> htwtmatrix[,1]/12 # convert height in inches to feet
> mean(htwtmatrix[,2])
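To make the contrast with loops concrete, here is a small sketch on a made-up vector. The names x, squared_loop and squared_vec are illustrative only and do not refer to the height and weight data.

```r
# Hypothetical three-element vector, used only for illustration
x = c(10, 20, 30)

# Loop version: square each element one at a time
squared_loop = numeric(length(x))
for (i in seq_along(x)) {
  squared_loop[i] = x[i]^2
}

# Vectorized version: a single expression applied to every element
squared_vec = x^2

# Check that both approaches give the same result
identical(squared_loop, squared_vec)
```

The vectorized form is shorter and is the usual style in R. The loop is shown only for comparison.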
To get the dimensions or number of rows or columns of a data frame, it is often useful to use one of the following commands:
> dim(htwtdata)
> nrow(htwtdata)
> ncol(htwtdata)
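As a quick illustration of what these three commands return, here is a sketch on a small made-up data frame. The name small and its columns a and b are hypothetical.

```r
# Hypothetical data frame with 3 rows and 2 columns
small = data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

dim(small)    # number of rows first, then number of columns
nrow(small)   # number of rows only
ncol(small)   # number of columns only
```

The same commands also work on matrices, so they can be applied to htwtmatrix as well as to htwtdata.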
What does the following R command do?
> htwtdata[,2]*703/htwtdata[,1]^2
How would you get R to give you the height and weight of the 8th student in the data set? The 8th and 10th students?