7.7 Ordering Data Frames
You can reorder rows as well as select them. For example, the following code keeps two columns of m111survey, then selects the first five rows in reverse order:
df <- m111survey[, c("height", "ideal_ht")]
dfRev <- df[5:1, ]
head(dfRev)
## height ideal_ht
## 5 72 72
## 4 62 65
## 3 64 NA
## 2 74 76
## 1 76 78
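The same idea can be written without hard-coding the number of rows. The following is a minimal sketch on a made-up data frame (toy and its columns are hypothetical, not part of m111survey). It also shows why the row labels in the output above run from 5 down to 1: the row names travel with the rows.

```r
# Hypothetical toy data frame, used only for illustration
toy <- data.frame(id = 1:4, score = c(10, 20, 30, 40))

# Reverse all of the rows, however many there are
toy_rev <- toy[rev(seq_len(nrow(toy))), ]

toy_rev$id         # 4 3 2 1
rownames(toy_rev)  # "4" "3" "2" "1": the original row names are kept
```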
If you want, you can even scramble the rows of the data frame in a random order:
n <- nrow(m111survey)
shuffle <- sample(1:n, size = n, replace = FALSE)
df <- m111survey[shuffle, ]
head(df[c("sex", "seat")]) # show just two columns
## sex seat
## 25 female 2_middle
## 51 female 2_middle
## 69 female 1_front
## 52 female 2_middle
## 64 male 3_back
## 13 female 1_front
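Because sample() is random, your rows will come out in a different order each time. If you want a shuffle you can reproduce, one option is to set a seed first. Here is a small sketch on a made-up data frame (toy is hypothetical, and the seed value is arbitrary):

```r
# Hypothetical toy data frame, used only for illustration
toy <- data.frame(id = 1:6, group = rep(c("a", "b"), times = 3))

set.seed(2024)                 # arbitrary seed, makes the shuffle repeatable
shuffle <- sample(nrow(toy))   # a random permutation of 1, 2, ..., 6
toy_shuffled <- toy[shuffle, ]

# Every original row appears exactly once in the shuffled frame
sort(shuffle)
```

Calling sample() with a single positive whole number n draws a permutation of 1:n, so it does the same job here as sample(1:n, size = n, replace = FALSE).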
It is quite common to order the rows of a frame according to the values of a particular variable. For example, you might want to arrange the rows by height, so that the frame begins with the shortest subject and ends with the tallest.
Accomplishing this task requires a study of R’s order() function. Consider the following vector:
vec <- c(15, 12, 23, 7)
Call order() with this vector as an argument:
order(vec)
## [1] 4 2 1 3
order() returns the indices of the elements of vec, in the following order:
- the index of the smallest element (7, at index 4 of vec);
- the index of the second-smallest element (12, at index 2 of vec);
- the index of the third-smallest element (15, at index 1 of vec);
- the index of the largest element (23, at index 3 of vec).
Can you guess the value of the following expression without looking at the answer underneath?
vec[order(vec)]
## [1] 7 12 15 23
Sure enough, the result is vec sorted: from smallest to largest element.
Now the sorting of vec could have been accomplished with R’s sort() function:
sort(vec)
## [1] 7 12 15 23
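What sort() cannot do is rearrange some other object to match. The following sketch illustrates the difference with a second, made-up vector (labels is hypothetical) whose elements correspond to those of vec:

```r
vec <- c(15, 12, 23, 7)
labels <- c("a", "b", "c", "d")  # hypothetical companion vector

# Sorting vec directly and subsetting by order(vec) agree
identical(vec[order(vec)], sort(vec))  # TRUE

# The indices from order() can also rearrange the companion vector
labels[order(vec)]                     # "d" "b" "a" "c"
```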
The power of order() comes with the rearrangement of rows of a data frame. In order to “sort” the frame from shortest to tallest subject, call:
df <- m111survey[order(m111survey$height), ]
head(df[, c("sex", "height")]) # to show that it worked
## sex height
## 45 female 51
## 26 female 54
## 9 female 59
## 13 female 59
## 40 female 60
## 69 female 61
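If the variable you order by has missing values, it helps to know where they end up. By default order() places NA values last, and its na.last argument changes that. A brief sketch with a made-up vector (h is hypothetical):

```r
h <- c(70, NA, 62, 75)  # hypothetical heights, one of them missing

order(h)                   # 3 1 4 2: the NA (index 2) comes last
order(h, na.last = FALSE)  # 2 3 1 4: the NA comes first
order(h, na.last = NA)     # 3 1 4:   the NA is dropped
```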
If you want to order the rows from tallest to shortest instead, then use the decreasing parameter, which by default is FALSE:
df <- m111survey[order(m111survey$height, decreasing = TRUE), ]
head(df[, c("sex", "height")]) # to show that it worked
## sex height
## 8 male 79
## 14 female 78
## 1 male 76
## 58 male 76
## 34 male 75
## 54 male 75
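For a numeric variable, another common idiom is to order by the negated values. The sketch below compares the two approaches on a made-up data frame (toy and its columns are hypothetical):

```r
# Hypothetical toy data frame, used only for illustration
toy <- data.frame(
  name = c("Ann", "Bo", "Cy", "Di"),
  height = c(64, 71, 59, 68),
  stringsAsFactors = FALSE
)

by_flag <- toy[order(toy$height, decreasing = TRUE), ]
by_neg  <- toy[order(-toy$height), ]  # negation works for numeric variables only

by_flag$name  # "Bo" "Di" "Ann" "Cy"
by_neg$name   # same arrangement
```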
Sometimes you want to order by two or more variables. For example, suppose you want to arrange the frame so that the folks who prefer to sit in front come first, followed by the people who prefer the middle, and ending with the people who prefer the back. Within each of these groups, you would like people to be arranged from shortest to tallest. Then call:
ordering <- with(m111survey, order(seat, height))
df <- m111survey[ordering, ]
head(df[, c("seat", "height")], n = 10) # see if it worked
## seat height
## 45 1_front 51
## 26 1_front 54
## 13 1_front 59
## 69 1_front 61
## 4 1_front 62
## 12 1_front 62
## 23 1_front 63
## 38 1_front 63
## 61 1_front 63
## 57 1_front 64
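To see the tie-breaking at work on something small enough to check by hand, here is a sketch with a made-up five-row data frame (toy is hypothetical; only its column names mimic m111survey):

```r
# Hypothetical toy data frame, used only for illustration
toy <- data.frame(
  seat = c("2_middle", "1_front", "3_back", "1_front", "2_middle"),
  height = c(70, 66, 72, 61, 65),
  stringsAsFactors = FALSE
)

# First key: seat. Ties in seat are broken by height.
ordering <- with(toy, order(seat, height))
ordering         # 4 2 5 1 3
toy[ordering, ]  # front rows (61, 66), then middle (65, 70), then back (72)
```

The first argument to order() is the primary key. Each later argument is consulted only to break ties left by the ones before it.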
7.7.1 Practice Exercises
Consider the following vector:
creatures <- c("Mole", "Frog", "Rat", "Badger")
Write down what you think will be the result of the call:
order(creatures)
Then check your answer by actually running:
creatures <- c("Mole", "Frog", "Rat", "Badger")
order(creatures)
What will be the result of the following?
order(creatures, decreasing = TRUE)
Arrange the rows of the data frame mosaicData::CPS85 in order, from the lowest to the highest wage. Break ties by experience (less experience coming before more experience).
Arrange the rows of the data frame mosaicData::CPS85 in order, from the lowest to the highest wage. Break ties by experience (more experience coming before less experience).
7.7.2 Solutions to Practice Exercises
Here’s what you get:
order(creatures)
## [1] 4 2 1 3
Here’s what you get:
order(creatures, decreasing = TRUE)
## [1] 3 1 2 4
Here is one way:
CPS85[order(CPS85$wage, CPS85$exper), ]
Here is one way:
CPS85[order(CPS85$wage, CPS85$exper, decreasing = c(FALSE, TRUE)), ]
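A vector-valued decreasing argument is supported only by the "radix" method of order(), so you may prefer to request that method explicitly. For numeric tie-breakers, negating the second key is another option. The following sketch checks both on a made-up data frame (toy is hypothetical; its columns only mimic those of CPS85):

```r
# Hypothetical toy data frame, used only for illustration
toy <- data.frame(wage = c(5, 3, 5, 3), exper = c(2, 10, 8, 4))

# Wage ascending, experience descending, with the method made explicit
idx_radix <- order(toy$wage, toy$exper,
                   decreasing = c(FALSE, TRUE), method = "radix")

# Alternative: negate the numeric tie-breaker
idx_neg <- order(toy$wage, -toy$exper)

idx_radix  # 2 4 3 1
idx_neg    # 2 4 3 1
```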