1.7 Finding means, medians and standard deviations

The 'mean( )' function calculates the mean of a single variable (a data vector) or of every variable in a data set. For example, for the 'kidswalk' data set described above, we can calculate the means of all the variables in the data set (a dataframe object):

> mean(kidswalk)

 subjno   group     sex agewalk
  25.50    1.34    0.48   11.13
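
Note that in current versions of R, calling 'mean( )' on a whole data frame returns NA with a warning rather than the column means shown above. A minimal sketch of two alternatives, 'colMeans( )' and 'sapply( )', follows. The data frame 'toy' and its values are made up for illustration and are not the kidswalk data.

```r
# Hypothetical stand-in for a data frame such as kidswalk (made-up values)
toy <- data.frame(group   = c(1, 1, 2, 2),
                  sex     = c(0, 1, 0, 1),
                  agewalk = c(10, 11, 12, 13))

# colMeans() returns the mean of every column (all columns must be numeric)
col_means <- colMeans(toy)
print(col_means)

# sapply() applies mean() to each column in turn and gives the same values
col_means2 <- sapply(toy, mean)
print(col_means2)
```

Both calls return a named vector with one mean per variable, which is the same layout as the output above.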

The mean( ) function can also be used to calculate the mean of a single variable (a data vector object):

> mean(agewalk)

[1] 11.13

The 'sd( )' function calculates standard deviations, either for all variables in a data set or for specific variables.

> sd(kidswalk)

    subjno      group        sex    agewalk
14.5773797  0.4785181  0.5046720  1.3583078

> sd(agewalk)

[1] 1.358308
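
In current versions of R, 'sd( )' works on a single data vector but, as far as I know, stops with an error when given a whole data frame. A minimal sketch of a column-by-column alternative using 'sapply( )' follows, together with a check of the formula that 'sd( )' uses (the sum of squared deviations is divided by n - 1). The data frame 'toy' and its values are made up for illustration.

```r
# Hypothetical stand-in for a data frame such as kidswalk (made-up values)
toy <- data.frame(group   = c(1, 1, 2, 2),
                  sex     = c(0, 1, 0, 1),
                  agewalk = c(10, 11, 12, 13))

# Standard deviation of every column: apply sd() to each column in turn
col_sds <- sapply(toy, sd)
print(col_sds)

# Standard deviation of one variable, computed two ways
x <- toy$agewalk
sd_builtin <- sd(x)
sd_by_hand <- sqrt(sum((x - mean(x))^2) / (length(x) - 1))
print(c(sd_builtin, sd_by_hand))
```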

The length() function returns the number of values (n, the sample size) in a data vector:

> length(agewalk)

[1] 50

The median of a variable, along with its minimum, maximum, mean, 25th percentile (1st Qu.) and 75th percentile (3rd Qu.), is given by the 'summary( )' function:

> summary(agewalk)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   9.00   10.00   11.25   11.13   12.00   13.50
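
If only one of these values is needed, each can also be requested on its own. The sketch below uses 'median( )', 'range( )' and 'quantile( )' on a made-up vector 'x' (not the agewalk data). 'quantile( )' is called with its default method here.

```r
# Made-up data vector for illustration
x <- c(9, 10, 10, 11, 12, 13)

# Median on its own
med <- median(x)
print(med)

# Minimum and maximum together
rng <- range(x)
print(rng)

# 25th and 75th percentiles
q <- quantile(x, probs = c(0.25, 0.75))
print(q)
```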

1.8 Finding frequencies and proportions for categorical variables

For categorical variables, the 'table( )' function gives the number of subjects in each category, and nesting it inside 'prop.table( )', as in 'prop.table(table( ))', gives the proportion of subjects in each category (although I find it easier to just calculate the proportions from the frequencies). For example, in the age at walking data set, the variable 'sexmale' is coded 0 for females and 1 for males. The numbers of females and males in the data set are:

> table(sexmale)


 0  1
26 24

The proportions of females and males can be calculated from the frequencies, using R as a calculator:

> 26/(26+24)

[1] 0.52

> 24/(26+24)

[1] 0.48
Alternatively, proportions can be calculated using the prop.table( ) command (although this gets a bit complicated in more involved applications):

> prop.table(table(sexmale))


   0    1
0.52 0.48
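
The three routes to a proportion can be compared side by side. The sketch below uses a made-up 0/1 vector (not the real 'sexmale' data) and also shows that, for a variable coded 0 and 1, the mean is the proportion of 1s.

```r
# Made-up 0/1 variable for illustration (0 = female, 1 = male)
sexmale <- c(0, 1, 0, 0, 1)

# Frequencies in each category
counts <- table(sexmale)
print(counts)

# Proportions from prop.table()
props <- prop.table(counts)
print(props)

# The same proportions from the frequencies directly
props_by_hand <- counts / sum(counts)
print(props_by_hand)

# For a 0/1 variable, the mean equals the proportion coded 1
prop_male <- mean(sexmale == 1)
print(prop_male)
```

This is why the mean of 'sex' reported in section 1.7 (0.48) matches the proportion of males found here.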