14.3 purrr Higher-Order Functions for Iteration

In the remainder of the chapter we will study important higher-order functions: functions that take a function as an argument and apply that function to each element of another data structure. As we have said previously, such functions often serve as alternatives to loops.

The higher-order functions we study come from the package purrr, which is attached whenever we load the tidyverse.

14.3.1 map() and Variations

Suppose that we want to generate five vectors, each of which consists of ten numbers randomly chosen between 0 and 1. We accomplish the task with a loop, as follows:

# set up a list of length 5:
lst <- vector(mode = "list", length = 5)
for ( i in 1:5 ) {
  lst[[i]] <- runif(10)
}
str(lst)
## List of 5
##  $ : num [1:10] 0.546 0.271 0.189 0.267 0.956 ...
##  $ : num [1:10] 0.0384 0.5677 0.9629 0.5131 0.0181 ...
##  $ : num [1:10] 0.184 0.268 0.477 0.263 0.107 ...
##  $ : num [1:10] 0.9318 0.2411 0.3267 0.0647 0.1426 ...
##  $ : num [1:10] 0.573 0.364 0.524 0.604 0.119 ...

If we wanted the vectors to have length \(1, 4, 9, 16,\) and 25, then we could write:

lst <- vector(mode = "list", length = 5)
for ( i in 1:5 ) {
  lst[[i]] <- runif(i^2)
}
str(lst)
## List of 5
##  $ : num 0.647
##  $ : num [1:4] 0.394 0.619 0.477 0.136
##  $ : num [1:9] 0.06738 0.12915 0.39312 0.00258 0.62021 ...
##  $ : num [1:16] 0.409 0.54 0.961 0.654 0.547 ...
##  $ : num [1:25] 0.96407 0.07147 0.95581 0.94798 0.00119 ...

In the first example, the elements of the vector 1:5 didn’t matter (we wanted a vector of length ten each time), whereas in the second example the elements of 1:5 did matter, in that they determined the lengths of the five vectors produced. Of course, in general we could apply runif() to each element of any vector at all, like this:

vec <- c(5, 7, 8, 2, 9)
lst <- vector(mode = "list", length = length(vec))
for ( i in seq_along(vec) ) {
  lst[[i]] <- runif(vec[i])
}
str(lst)
## List of 5
##  $ : num [1:5] 0.647 0.394 0.619 0.477 0.136
##  $ : num [1:7] 0.06738 0.12915 0.39312 0.00258 0.62021 ...
##  $ : num [1:8] 0.826 0.423 0.409 0.54 0.961 ...
##  $ : num [1:2] 0.1968 0.0779
##  $ : num [1:9] 0.818 0.942 0.884 0.166 0.355 ...

If we can apply runif() to each element of a vector, why not apply an arbitrary function to each element? That’s what the function map() will do for us. The general form of map() is:

map(.x, .f, ...)

In the template above:

  • .x can be a list or any atomic vector;
  • .f is a function that is to be applied to each element of .x. In the default operation of map(), each element of .x becomes in turn the first argument of .f.
  • ... consists of other arguments that are supplied as arguments for the .f function, in case you have to set other parameters of the function in order to get it to perform in the way you would like.

The result is always a list.
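As a minimal sketch of the template (the input vector and the choice of sqrt() are ours, picked only so that the results are easy to check), the following call shows that map() returns a list even when each individual result is a single number. Base R's lapply() follows the same pattern.

```r
library(purrr)

# Apply sqrt() to each element of a numeric vector.
roots <- map(c(1, 4, 9), sqrt)

# Each result is a single number, but map() still returns a list.
is.list(roots)
length(roots)

# Base R's lapply() applies a function elementwise in the same way.
rootsBase <- lapply(c(1, 4, 9), sqrt)
str(rootsBase)
```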

With map() we can get the list in our third example as follows:

howMany <- c(5, 7, 8, 2, 9)
lst <- 
  howMany %>%
  map(runif)
str(lst)
## List of 5
##  $ : num [1:5] 0.647 0.394 0.619 0.477 0.136
##  $ : num [1:7] 0.06738 0.12915 0.39312 0.00258 0.62021 ...
##  $ : num [1:8] 0.826 0.423 0.409 0.54 0.961 ...
##  $ : num [1:2] 0.1968 0.0779
##  $ : num [1:9] 0.818 0.942 0.884 0.166 0.355 ...

If we had wanted the random numbers to be between—say—4 and 8, then we would supply extra arguments to runif() as follows:

lst <- 
  howMany %>%
  map(runif, min = 4, max = 8)
str(lst)
## List of 5
##  $ : num [1:5] 6.59 5.58 6.47 5.91 4.54
##  $ : num [1:7] 4.27 4.52 5.57 4.01 6.48 ...
##  $ : num [1:8] 7.3 5.69 5.64 6.16 7.84 ...
##  $ : num [1:2] 4.79 4.31
##  $ : num [1:9] 7.27 7.77 7.54 4.66 5.42 ...

The default behavior of map() is that the .x vector supplies the first argument of .f. However, if some ... parameters are supplied, then .x substitutes for the first parameter that is not mentioned in .... In the above example, the min and max parameters are the second and third parameters for runif(), so .x substitutes for the first parameter, the one that determines how many random numbers will be generated. In the example below, the vector lowerBounds substitutes for min, the second parameter of runif():

lowerBounds <- 1:3
lowerBounds %>%
  map(runif, n = 2, max = 8)
## [[1]]
## [1] 3.008575 5.964877
## 
## [[2]]
## [1] 6.514459 6.282105
## 
## [[3]]
## [1] 6.688118 7.431739
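Because runif() is random, the substitution rule is easier to verify with a deterministic function. The sketch below uses a hypothetical helper, power(), that we define only for illustration; since base is named in ..., each element of .x fills the remaining parameter, exponent.

```r
library(purrr)

# A hypothetical two-parameter function, used only for illustration.
power <- function(base, exponent) {
  base^exponent
}

# base is supplied by name, so the elements 1, 2, 3 are used as exponent.
powersOfTwo <- map(1:3, power, base = 2)
str(powersOfTwo)
## List of 3
##  $ : num 2
##  $ : num 4
##  $ : num 8
```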

Sometimes we wish to vary two or more of the parameters of a function. In that case we use pmap(). The first parameter of pmap() is named .l and takes a list of vectors (or lists). For example:

howMany <- c(3,1,4)
upperBounds <- c(1, 5, 10)
list(howMany, upperBounds) %>%
  pmap(runif, min = 0)
## [[1]]
## [1] 0.4142409 0.5140328 0.6190231
## 
## [[2]]
## [1] 0.193564
## 
## [[3]]
## [1] 3.0889976 7.4509632 3.6170952 0.2348365

Observe that pmap() interprets the first element of the input list, the vector howMany, as giving the values of the first argument of runif(). The second parameter of runif() (min) is set at 0, so upperBounds, the second element of the input list, gives the values for the next parameter in line, the parameter max.

One might just as well use pmap() to vary all three parameters:

howMany <- c(3,1,4)
lowerBounds <- c(-5, 0, 5)
upperBounds <- c(0, 5, 10)
args <- list(howMany, lowerBounds, upperBounds)
args %>%
  pmap(runif) %>% 
  str()
## List of 3
##  $ : num [1:3] -0.00743 -2.35882 -3.90104
##  $ : num 1.01
##  $ : num [1:4] 5.38 9.16 8.45 9.67
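When the input list is unnamed, its elements are matched to the parameters of the function by position, as in the examples above. If the elements of the list are named, pmap() matches them to parameters by name instead, so their order no longer matters. Here is a small sketch of that behavior; the function power() and the list powerArgs are hypothetical names chosen for the illustration, and pmap_dbl() is the variant of pmap() that returns a numeric vector rather than a list.

```r
library(purrr)

# A hypothetical two-parameter function, used only for illustration.
power <- function(base, exponent) {
  base^exponent
}

# The list elements are named, so they are matched to parameters by name,
# even though exponent comes first in the list.
powerArgs <- list(exponent = c(1, 2, 3), base = c(2, 3, 4))
results <- pmap_dbl(powerArgs, power)   # computes 2^1, 3^2, 4^3
results
## [1]  2  9 64
```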

The .f parameter can be any function, including one that you define yourself. Here’s an example:

rLetters <- function(n, upper) {
  if ( upper ) {
    sample(LETTERS, size = n, replace = TRUE)
  } else {
    sample(letters, size = n, replace = TRUE)
  }
}
george <- c(3, 6, 9)                # vary number of letters to pick
bettina <- c(TRUE, FALSE, TRUE)  # vary the case (upper, lower)
list(george, bettina) %>% 
  pmap(rLetters)
## [[1]]
## [1] "U" "D" "C"
## 
## [[2]]
## [1] "j" "a" "s" "d" "m" "k"
## 
## [[3]]
## [1] "S" "S" "R" "A" "V" "P" "A" "N" "B"

You could also set .f to be a function that you write on the spot, without even bothering to give it a name:

c(1, 3, 5) %>% 
  map(function(x) runif(3, min = 0, max = x))
## [[1]]
## [1] 0.4170589 0.5614115 0.6146708
## 
## [[2]]
## [1] 1.0992127 0.4576997 0.6006109
## 
## [[3]]
## [1] 1.134999 2.752574 2.083488

In computer programming a function is called anonymous when it is not the value bound to some name.

map() allows a shortcut for defining anonymous functions. The above call could have been written as:

c(1, 3, 5) %>% 
  map(~runif(3, min = 0, max = .))
## [[1]]
## [1] 0.1435887 0.8404723 0.9722919
## 
## [[2]]
## [1] 1.217995 1.631378 1.481980
## 
## [[3]]
## [1] 4.475972 3.900816 4.272925

The ~ indicates that the body of the function is about to begin. The . stands for the parameter of the function.
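The sketch below compares several ways of writing the same anonymous function, using a deterministic body so that the results can be checked. Within a purrr formula, .x may be used in place of the bare dot. The backslash form in the last call is base R's own shorthand for function(x), and we assume R version 4.1 or later for it.

```r
library(purrr)

v <- c(1, 3, 5)

# Four spellings of "square each element".
viaFunction <- map(v, function(x) x^2)
viaDot      <- map(v, ~ .^2)
viaDotX     <- map(v, ~ .x^2)
viaLambda   <- map(v, \(x) x^2)   # assumes R >= 4.1

# All four calls produce the same list.
identical(viaFunction, viaDot)
identical(viaDot, viaDotX)
identical(viaDotX, viaLambda)
```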

When we introduced map() we said that .x was a vector or a list. In fact .x could be any object that can be coerced into a list. Hence it is quite common to use map() with data frames: the frame is turned into a list, each element of which is a column of the frame. Here is an example:

data("m111survey", package = "bcscr")
numberNA <-
  m111survey %>% 
  map(~sum(is.na(.)))
str(numberNA)
## List of 12
##  $ height         : int 0
##  $ ideal_ht       : int 2
##  $ sleep          : int 0
##  $ fastest        : int 0
##  $ weight_feel    : int 0
##  $ love_first     : int 0
##  $ extra_life     : int 0
##  $ seat           : int 0
##  $ GPA            : int 1
##  $ enough_Sleep   : int 0
##  $ sex            : int 0
##  $ diff.ideal.act.: int 2

Note that the elements of the returned list inherit the names of the input data frame. This holds for any named input:

numbers <- c(1, 3, 5)
names(numbers) <- c("one", "three", "five")
numbers %>% 
  map(~runif(3, min = 0, max = .))
## $one
## [1] 0.2331179 0.9318823 0.9580263
## 
## $three
## [1] 2.9757873 1.5267223 0.6133808
## 
## $five
## [1] 0.07197437 4.77782735 3.19760884

When the result can take on a form simpler than a list, it is possible to use variants of map() such as:

  • map_int()
  • map_dbl()
  • map_lgl()
  • map_chr()

Thus we could obtain a named integer vector of the number of NA-values for each variable in m111survey as follows:

numberNA <-
  m111survey %>% 
  map_int(~sum(is.na(.)))
numberNA
##          height        ideal_ht           sleep         fastest     weight_feel 
##               0               2               0               0               0 
##      love_first      extra_life            seat             GPA    enough_Sleep 
##               0               0               0               1               0 
##             sex diff.ideal.act. 
##               0               2

Here are the types of each variable:

m111survey %>% 
  map_chr(typeof)
##          height        ideal_ht           sleep         fastest     weight_feel 
##        "double"        "double"        "double"       "integer"       "integer" 
##      love_first      extra_life            seat             GPA    enough_Sleep 
##       "integer"       "integer"       "integer"        "double"       "integer" 
##             sex diff.ideal.act. 
##       "integer"        "double"

Here is a statement of whether or not each variable is a factor:

m111survey %>% 
  map_lgl(is.factor)
##          height        ideal_ht           sleep         fastest     weight_feel 
##           FALSE           FALSE           FALSE           FALSE            TRUE 
##      love_first      extra_life            seat             GPA    enough_Sleep 
##            TRUE            TRUE            TRUE           FALSE            TRUE 
##             sex diff.ideal.act. 
##            TRUE           FALSE
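The typed variants can be tried without the bcscr package. The sketch below uses the iris data frame that ships with R (our choice, not part of the survey example). It also illustrates a restriction worth keeping in mind: each call of .f must return a single value, so a function that returns two values causes an error, which we catch here with tryCatch().

```r
library(purrr)

# iris has four numeric columns and one factor column.
means <- map_dbl(iris[1:4], mean)    # named double vector
isFac <- map_lgl(iris, is.factor)    # named logical vector

# Each result below has length two, so map_dbl() signals an error;
# tryCatch() replaces the error with a short message of our own.
outcome <- tryCatch(
  map_dbl(1:3, ~ c(., .)),
  error = function(e) "error: each result must have length one"
)
outcome
```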

14.3.2 walk() and Variations

walk() is similar to map(), but is used when we are interested in producing side-effects. It applies its .f argument to each element of the .x it was given, and it also returns that .x (invisibly) in case we want to pipe it into some other function.

Here we use walk() only for its side-effect: we re-write a familiar function to print a pattern to the Console without using a loop.

pattern <- function(char = "*", n = 5) {
  lineLength <- c(1:n, (n-1):1)
  theLine <- function(char, n) {
    cat(rep(char, times = n), "\n", sep = "")
  }
  lineLength %>% walk(theLine, char = char)
}

pattern(char = "a", n = 7)
## a
## aa
## aaa
## aaaa
## aaaaa
## aaaaaa
## aaaaaaa
## aaaaaa
## aaaaa
## aaaa
## aaa
## aa
## a
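As a brief check of what walk() hands back, the following sketch (with a toy character vector of our own choosing) stores the return value and compares it with the input.

```r
library(purrr)

# walk() prints each element as a side effect ...
returned <- walk(c("a", "b", "c"), ~ cat(., "\n"))

# ... and returns its input unchanged. The value is returned invisibly,
# so nothing extra is printed unless we ask for it.
identical(returned, c("a", "b", "c"))
## [1] TRUE
```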

The next example illustrates the use of the return-value of walk(). We would like to save plots of all numerical variables from the data frame m111survey, and also print summaries of them to the Console.

First we create a directory to hold the plots:

if ( !dir.exists("plots") ) dir.create("plots")

Next, we get the numerical variables in m111survey:

numericals <-
  m111survey %>% 
  keep(is.numeric)   # purrr::keep()

We used purrr::keep(), which retains only the elements of its input .x such that its second argument .p (a function that returns a single TRUE or FALSE) returns TRUE.
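Here is a small sketch of keep() on a toy list (the list mixed is hypothetical, made up for the illustration). It also shows purrr's discard(), which does the opposite: it drops the elements for which .p returns TRUE.

```r
library(purrr)

# A toy list with two numeric elements and one character element.
mixed <- list(a = 1:3, b = letters[1:2], c = c(2.5, 3.5))

kept    <- keep(mixed, is.numeric)      # retains a and c
dropped <- discard(mixed, is.numeric)   # retains only b

names(kept)
## [1] "a" "c"
names(dropped)
## [1] "b"
```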

We will also need the names of the numerical variables:

numNames <-
  numericals %>% 
  names()

We need a function to save the density plot of a single numerical variable:

saveGraph <- function(var, varname) {
  p <-
    ggplot(data = NULL, aes(x = var)) +
    geom_density(fill = "burlywood") +
    labs(title = paste0("Density plot for ",
                        varname, "."))
  ggsave(filename = paste0("plots/density_", varname, ".png"),
         plot = p, device = "png")
}

We also need a function to produce a summary of a single numerical variable:

makeSummary <- function(x, varname) {
  five <- fivenum(x, na.rm = TRUE)
  list(variable = varname,
       min = five[1],
       Q1 = five[2],
       median = five[3],
       Q3 = five[4],
       max = five[5])
}

Now we walk through the process. We will actually use the function pwalk(), which will take the following inputs:

  • .l (a list with two elements: the data frame of numerical variables and the vector of the names of these variables), and
  • .f (the function saveGraph, to make and save a density plot)

We also use pmap_dfr(), which takes a list consisting of the data frame and variable-names and constructs a data frame row-by-row, with each row summarizing one of the variables.

list(numericals, numNames) %>% 
  pwalk(saveGraph) %>%  # returns the list
  pmap_dfr(makeSummary)
## # A tibble: 6 x 6
##          variable   min    Q1  median     Q3   max
##             <chr> <dbl> <dbl>   <dbl>  <dbl> <dbl>
## 1          height  51.0  65.0  68.000  71.75    79
## 2        ideal_ht  54.0  67.0  68.000  75.00    90
## 3           sleep   2.0   5.0   7.000   7.00    10
## 4         fastest  60.0  90.5 102.000 119.50   190
## 5             GPA   1.9   2.9   3.225   3.56     4
## 6 diff.ideal.act.  -4.0   0.0   2.000   3.00    18
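The same pipe-through pattern can be tried on a smaller scale without any plotting. In the sketch below, the list vals and the two anonymous functions are hypothetical stand-ins for the data frame, saveGraph() and makeSummary(); pmap_chr() is the variant of pmap() that returns a character vector.

```r
library(purrr)

vals <- list(a = c(1, 2, 3), b = c(10, 20))   # hypothetical toy data
nms  <- names(vals)

report <-
  list(vals, nms) %>%
  # side effect only: print a line for each element
  pwalk(function(v, nm) cat(nm, "has", length(v), "values\n")) %>%
  # pwalk() returned the list, so we can keep working with it
  pmap_chr(function(v, nm) paste0(nm, ": mean ", mean(v)))

unname(report)
## [1] "a: mean 2"  "b: mean 15"
```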

Check the plots directory; it should contain these files:

  • density_diff.ideal.act..png
  • density_fastest.png
  • density_GPA.png
  • density_height.png
  • density_ideal_ht.png
  • density_sleep.png

14.3.3 Practice Exercises

  1. Use map() to produce a list of the squares of the whole numbers from 1 to 10.

  2. Use map_dbl() to produce a numerical vector of the squares of the whole numbers from 1 to 10.

  3. Use map_chr() to state the type of each element of the following list:

    lst <- list(
      letters,
      seq(2, 20, by = 2),
      c(1L, 5L, 7L),
      1:10 > 5.5
    )

  4. Here are some people:

    people <- c("Bettina", "Raj", "Isabella", "Khalil")

    The following vector tells whether or not each person is a Grand Poo-Bah:

    status <- c("humble", "poobah", "poobah", "humble")

    Use pwalk() to properly greet each person. The result in the console should be as follows:

    ## Yo, dawg.
    ## Hail, O Grand Poo-Bah Raj!
    ## Hail, O Grand Poo-Bah Isabella!
    ## Yo, dawg.

14.3.4 Solutions to the Practice Exercises

  1. Try this:

    map(1:10, ~.^2)
    ## [[1]]
    ## [1] 1
    ## 
    ## [[2]]
    ## [1] 4
    ## 
    ## [[3]]
    ## [1] 9
    ## 
    ## [[4]]
    ## [1] 16
    ## 
    ## [[5]]
    ## [1] 25
    ## 
    ## [[6]]
    ## [1] 36
    ## 
    ## [[7]]
    ## [1] 49
    ## 
    ## [[8]]
    ## [1] 64
    ## 
    ## [[9]]
    ## [1] 81
    ## 
    ## [[10]]
    ## [1] 100

    This is more verbose, but works just as well:

    map(1:10, function(x) x^2)
    ## [[1]]
    ## [1] 1
    ## 
    ## [[2]]
    ## [1] 4
    ## 
    ## [[3]]
    ## [1] 9
    ## 
    ## [[4]]
    ## [1] 16
    ## 
    ## [[5]]
    ## [1] 25
    ## 
    ## [[6]]
    ## [1] 36
    ## 
    ## [[7]]
    ## [1] 49
    ## 
    ## [[8]]
    ## [1] 64
    ## 
    ## [[9]]
    ## [1] 81
    ## 
    ## [[10]]
    ## [1] 100

  2. Try this:

    map_dbl(1:10, ~.^2)
    ##  [1]   1   4   9  16  25  36  49  64  81 100

    Again the more verbose approach works just as well:

    map_dbl(1:10, function(x) x^2)
    ##  [1]   1   4   9  16  25  36  49  64  81 100

  3. Try this:

    map_chr(lst, typeof)
    ## [1] "character" "double"    "integer"   "logical"

  4. Try this:

    list(people, status) %>% 
      pwalk(function(person, type) {
        if ( type == "poobah" ) {
          cat("Hail, O Grand Poo-Bah ",
              person, "!\n", sep = "")
        } else {
          cat("Yo, dawg.\n")
        }
      })