10.5 Additional aggregation functions
There are many, many other aggregation functions that I haven’t covered in this chapter, mainly because I rarely use them. In fact, that’s a good reminder of a peculiarity of R: there are many methods to achieve the same result, and your choice of method will often come down to which one you like the most.
10.5.1 rowMeans(), colMeans()
To easily calculate means (or sums) across all rows or columns in a matrix or data frame, use rowMeans(), colMeans(), rowSums() or colSums().
For example, imagine we have the following data frame representing scores from a quiz with 5 questions, where each row represents a student and each column represents a question. Each value can be either 1 (correct) or 0 (incorrect):
q1 | q2 | q3 | q4 | q5 |
---|---|---|---|---|
1 | 1 | 1 | 1 | 1 |
0 | 0 | 0 | 1 | 0 |
0 | 1 | 1 | 1 | 0 |
0 | 1 | 0 | 1 | 1 |
0 | 0 | 0 | 1 | 1 |
# Some exam scores
exam <- data.frame("q1" = c(1, 0, 0, 0, 0),
                   "q2" = c(1, 0, 1, 1, 0),
                   "q3" = c(1, 0, 1, 0, 0),
                   "q4" = c(1, 1, 1, 1, 1),
                   "q5" = c(1, 0, 0, 1, 1))
Let’s use rowMeans() to get the average score for each student:
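A minimal sketch of that call, assuming the quiz data described above. The data frame is recreated here so the block runs on its own, and the name student_means is only illustrative:

```r
# Recreate the quiz data so this block is self-contained
exam <- data.frame(q1 = c(1, 0, 0, 0, 0),
                   q2 = c(1, 0, 1, 1, 0),
                   q3 = c(1, 0, 1, 0, 0),
                   q4 = c(1, 1, 1, 1, 1),
                   q5 = c(1, 0, 0, 1, 1))

# What proportion of questions did each student get correct?
student_means <- rowMeans(exam)
student_means
# One value per student (row): 1.0, 0.2, 0.6, 0.6, 0.4
```

Because the answers are coded as 0 and 1, each row mean is the proportion of questions that student answered correctly.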
Now let’s use colMeans() to get the average score for each question:
# What percent of students got each question correct?
colMeans(exam)
## q1 q2 q3 q4 q5
## 0.2 0.6 0.4 1.0 0.6
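The sum versions work the same way. As a small sketch, again assuming the same quiz data (recreated so the block runs on its own, with illustrative variable names), rowSums() counts how many questions each student got right and colSums() counts how many students got each question right:

```r
# Recreate the quiz data so this block is self-contained
exam <- data.frame(q1 = c(1, 0, 0, 0, 0),
                   q2 = c(1, 0, 1, 1, 0),
                   q3 = c(1, 0, 1, 0, 0),
                   q4 = c(1, 1, 1, 1, 1),
                   q5 = c(1, 0, 0, 1, 1))

# How many questions did each student answer correctly?
student_totals <- rowSums(exam)    # 5, 1, 3, 3, 2

# How many students answered each question correctly?
question_totals <- colSums(exam)   # q1 = 1, q2 = 3, q3 = 2, q4 = 5, q5 = 3
```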
Warning: rowMeans() and colMeans() only work on numeric columns. If you try to apply them to non-numeric data, you’ll receive an error.
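One way around this, sketched below, is to keep only the numeric columns before averaging. The data frame exam_named and its name column are hypothetical, made up for this illustration:

```r
# Hypothetical quiz data with a non-numeric column (student names)
exam_named <- data.frame(name = c("Ann", "Bob", "Cat"),
                         q1 = c(1, 0, 1),
                         q2 = c(1, 1, 0))

# Calling rowMeans() on the whole data frame fails because of the name column;
# tryCatch() captures the error message instead of stopping the script
err <- tryCatch(rowMeans(exam_named),
                error = function(e) conditionMessage(e))

# Find the numeric columns, then average across only those
numeric_cols <- sapply(exam_named, is.numeric)
student_means <- rowMeans(exam_named[, numeric_cols, drop = FALSE])
student_means   # 1.0, 0.5, 0.5
```

The drop = FALSE argument keeps the result a data frame even if only one numeric column remains, which rowMeans() needs.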
10.5.2 apply family
There is an entire family of apply functions in R, including tapply(), sapply() and lapply(), that apply a function across parts of your data. Of these, tapply() works most like aggregate(): it applies a function to groups of values. For example, you can calculate the average length of movies by genre with tapply() as follows.
with(movies, tapply(X = time,        # DV is time
                    INDEX = genre,   # IV is genre
                    FUN = mean,      # function is mean
                    na.rm = TRUE))   # Ignore missing
##              Action           Adventure        Black Comedy              Comedy Concert/Performance         Documentary
##                 113                 106                 113                  99                  78                  69
##               Drama              Horror     Multiple Genres             Musical             Reality     Romantic Comedy
##                 116                  99                 114                 113                  44                 107
##   Thriller/Suspense             Western
##                 112                 121
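To see how this lines up with aggregate(), here is a small sketch on a made-up data frame. The name films and its values are hypothetical and stand in for the movies data, which is not recreated here:

```r
# Hypothetical stand-in for the movies data: genre and running time in minutes
films <- data.frame(genre = c("Action", "Action", "Comedy", "Comedy", "Drama"),
                    time = c(120, 110, 95, NA, 130))

# Mean time by genre with tapply(): returns a named array, one element per genre
by_tapply <- with(films, tapply(X = time,
                                INDEX = genre,
                                FUN = mean,
                                na.rm = TRUE))

# The same calculation with aggregate(): returns a data frame
# (the formula interface drops rows with missing values by default)
by_aggregate <- aggregate(time ~ genre, data = films, FUN = mean)

by_tapply      # Action 115, Comedy 95, Drama 130
by_aggregate   # same means, in columns named genre and time
```

Both give the same group means; the practical difference is that tapply() hands back a named array while aggregate() hands back a data frame.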
tapply(), sapply(), and lapply() all work very similarly; their main difference is in the structure of their output. For example, lapply() always returns a list (we’ll cover lists in a future chapter), while sapply() tries to simplify its result to a vector or matrix.
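As a brief sketch of that difference, the block below applies mean() to each column of a small quiz data frame with both functions. The data and variable names are illustrative only:

```r
# A small quiz data frame: one column per question
exam <- data.frame(q1 = c(1, 0, 0, 0, 0),
                   q2 = c(1, 0, 1, 1, 0),
                   q3 = c(1, 0, 1, 0, 0))

# lapply() applies mean() to each column and returns a list
means_list <- lapply(exam, mean)

# sapply() does the same, but simplifies the result to a named numeric vector
means_vec <- sapply(exam, mean)

class(means_list)   # "list"
class(means_vec)    # "numeric"
means_vec           # q1 = 0.2, q2 = 0.6, q3 = 0.4
```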