7.7 Ordering Data Frames

You can reorder rows as well as select them. For example, the following code keeps two columns of m111survey, then selects the first five rows in reverse order:

df <- m111survey[, c("height", "ideal_ht")]
dfRev <- df[5:1, ]
head(dfRev)
##   height ideal_ht
## 5     72       72
## 4     62       65
## 3     64       NA
## 2     74       76
## 1     76       78

If you want, you can even scramble the rows of the data frame in a random order:

n <- nrow(m111survey)
shuffle <- sample(1:n, size = n, replace = FALSE)
df <- m111survey[shuffle, ]
head(df[c("sex", "seat")])  #show just two columns
##       sex     seat
## 25 female 2_middle
## 51 female 2_middle
## 69 female  1_front
## 52 female 2_middle
## 64   male   3_back
## 13 female  1_front
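
Because sample() draws at random, the shuffled order changes each time the code runs. If a repeatable shuffle is wanted, one option is to call set.seed() first. The following is a minimal sketch on a made-up data frame (toy and its columns are illustrative, not part of m111survey), with an arbitrary seed:

```r
# Hypothetical toy data frame standing in for m111survey
toy <- data.frame(id = 1:6, height = c(76, 74, 64, 62, 72, 67))

set.seed(2024)                  # arbitrary seed; makes the shuffle repeatable
shuffle <- sample(nrow(toy))    # a random permutation of 1, 2, ..., 6
shuffled <- toy[shuffle, ]

# Every row is still present exactly once; only the order has changed
sort(shuffled$id)
```

Here sample(nrow(toy)) is shorthand for sample(1:n, size = n, replace = FALSE) with n equal to the number of rows.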

It is quite common to order the rows of a frame according to the values of a particular variable. For example, you might want to arrange the rows by height, so that the frame begins with the shortest subject and ends with the tallest.

Accomplishing this task requires a study of R’s order() function. Consider the following vector:

vec <- c(15, 12, 23, 7)

Call order() with this vector as an argument:

order(vec)
## [1] 4 2 1 3

order() returns the indices of the elements of vec, in the following order:

  • the index of the smallest element (7, at index 4 of vec);
  • the index of the second-smallest element (12, at index 2 of vec);
  • the index of the third-smallest element (15, at index 1 of vec);
  • the index of the largest element (23, at index 3 of vec).
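
One way to check this reading of the output, sketched on a second made-up vector (the name temps is only for illustration): the first index returned by order() should agree with which.min(), and the last with which.max().

```r
temps <- c(68, 54, 91, 60, 75)   # hypothetical values, all distinct
idx <- order(temps)
idx                              # 2 4 1 5 3

idx[1] == which.min(temps)             # position of the smallest value
idx[length(idx)] == which.max(temps)   # position of the largest value
```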

Can you guess the output of the following expression without looking at the answer underneath?

vec[order(vec)]
## [1]  7 12 15 23

Sure enough, the result is vec sorted from smallest to largest element.

Of course, the sorting of vec could also have been accomplished with R’s sort() function:

sort(vec)
## [1]  7 12 15 23
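
The two approaches can differ when missing values are present. By default, sort() drops NA values, whereas order() places the indices of NA values last, so indexing with order() keeps every element. A small sketch with a made-up vector:

```r
v <- c(15, NA, 23, 7)   # hypothetical vector with one missing value

sort(v)                 # the NA is dropped: 7 15 23
order(v)                # the index of the NA comes last: 4 1 3 2
v[order(v)]             # 7 15 23 NA
```

This matters for data frames: ordering rows with order() keeps the rows in which the ordering variable is missing, and puts them at the end.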

The real power of order() shows when you rearrange the rows of a data frame. To “sort” the frame from shortest to tallest subject, call:

df <- m111survey[order(m111survey$height), ]
head(df[, c("sex", "height")])  # to show that it worked
##       sex height
## 45 female     51
## 26 female     54
## 9  female     59
## 13 female     59
## 40 female     60
## 69 female     61

If you want to order the rows from tallest to shortest instead, then set the decreasing parameter, which is FALSE by default, to TRUE:

df <- m111survey[order(m111survey$height, decreasing = TRUE), ]
head(df[, c("sex", "height")])  # to show that it worked
##       sex height
## 8    male     79
## 14 female     78
## 1    male     76
## 58   male     76
## 34   male     75
## 54   male     75
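
For a numeric variable, ordering by the negated values is a common alternative to decreasing = TRUE. The following sketch uses a made-up data frame (toy and its columns are illustrative only) with distinct heights, so that ties play no role:

```r
toy <- data.frame(name = c("A", "B", "C", "D"),
                  height = c(66, 72, 61, 70))

byDecreasing <- toy[order(toy$height, decreasing = TRUE), ]
byNegation   <- toy[order(-toy$height), ]

byDecreasing                          # rows in the order B, D, A, C
identical(byDecreasing, byNegation)   # the two approaches agree here
```

Negation only makes sense for numbers, so for character data the decreasing parameter is the more natural choice.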

Sometimes you want to order by two or more variables. For example, suppose you want to arrange the frame so that the people who prefer to sit in front come first, followed by those who prefer the middle, and ending with those who prefer the back. Within each of these groups you would like people to be arranged from shortest to tallest. Then call:

ordering <- with(m111survey, order(seat, height))
df <- m111survey[ordering, ]
head(df[, c("seat", "height")], n = 10)  # see if it worked
##       seat height
## 45 1_front     51
## 26 1_front     54
## 13 1_front     59
## 69 1_front     61
## 4  1_front     62
## 12 1_front     62
## 23 1_front     63
## 38 1_front     63
## 61 1_front     63
## 57 1_front     64
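
The same pattern on a small made-up frame makes the tie-breaking easier to see. This sketch assumes that seat is stored as a factor whose levels run front, middle, back, since order() sorts a factor by the order of its levels; the data themselves are hypothetical:

```r
toy <- data.frame(
  seat = factor(c("3_back", "1_front", "2_middle", "1_front", "3_back"),
                levels = c("1_front", "2_middle", "3_back")),
  height = c(70, 68, 65, 62, 66)
)

ordering <- with(toy, order(seat, height))
ordering          # 4 2 (front), then 3 (middle), then 5 1 (back)
toy[ordering, ]
```

The first argument to order() decides the main arrangement; each later argument is consulted only to break ties left by the arguments before it.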

7.7.1 Practice Exercises

  1. Consider the following vector:

    creatures <- c("Mole", "Frog", "Rat", "Badger")

    Write down what you think will be the result of the call:

    order(creatures)

    Then check your answer by actually running:

    creatures <- c("Mole", "Frog", "Rat", "Badger")
    order(creatures)
  2. What will be the result of the following?

    order(creatures, decreasing = TRUE)
  3. Arrange the rows of the data frame mosaicData::CPS85 in order, from the lowest to the highest wage. Break ties by experience (less experience coming before more experience).

  4. Arrange the rows of the data frame mosaicData::CPS85 in order, from the lowest to the highest wage. Break ties by experience (more experience coming before less experience).

7.7.2 Solutions to Practice Exercises

  1. Here’s what you get:

    order(creatures)
    ## [1] 4 2 1 3
  2. Here’s what you get:

    order(creatures, decreasing = TRUE)
    ## [1] 3 1 2 4
  3. Here is one way (after library(mosaicData), so that CPS85 is available):

    CPS85[order(CPS85$wage, CPS85$exper), ]
  4. Here is one way:

    CPS85[order(CPS85$wage, CPS85$exper, 
                decreasing = c(FALSE, TRUE)), ]
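
A note on the last solution: according to R's documentation for order(), a vector-valued decreasing is supported only by the "radix" sorting method, which the default method = "auto" selects for short numeric vectors and factors. The sketch below, on made-up data (toy, wage and exper here are illustrative, not CPS85), names the method explicitly and compares the result with the alternative of negating the tie-breaking variable:

```r
toy <- data.frame(wage  = c(10, 8, 10, 8, 12),
                  exper = c(5, 20, 15, 3, 7))

# Increasing wage, ties broken by decreasing experience
viaRadix <- order(toy$wage, toy$exper,
                  decreasing = c(FALSE, TRUE), method = "radix")

# Same idea, using negation on the numeric tie-breaker
viaNegation <- order(toy$wage, -toy$exper)

viaRadix                            # 2 4 3 1 5
identical(viaRadix, viaNegation)    # the two approaches agree here
```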