14.4 Other purrr Higher-Order Functions

14.4.1 keep() and discard()

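keep() is similar to dplyr’s filter(), but whereas filter() chooses the rows of a data frame that satisfy a given condition, keep() chooses the elements of the input list or vector .x for which the condition .p (a function, or a formula such as ~ . %% 3 == 1) returns TRUE. A data frame counts as a list of its columns, so when applied to a data frame keep() selects columns.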

Examples:

# keep the numbers that are 1 more than a multiple of 3
1:20 %>% 
  keep(.p = ~ . %% 3 == 1)
## [1]  1  4  7 10 13 16 19
# keep the factors in m111survey
m111survey %>% 
  keep(is.factor) %>% 
  str()
## 'data.frame':    71 obs. of  6 variables:
##  $ weight_feel : Factor w/ 3 levels "1_underweight",..: 1 2 2 1 1 3 2 2 2 3 ...
##  $ love_first  : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ extra_life  : Factor w/ 2 levels "no","yes": 2 2 1 1 2 1 2 2 2 1 ...
##  $ seat        : Factor w/ 3 levels "1_front","2_middle",..: 1 2 2 1 3 1 1 3 3 2 ...
##  $ enough_Sleep: Factor w/ 2 levels "no","yes": 1 1 1 1 1 2 1 2 1 2 ...
##  $ sex         : Factor w/ 2 levels "female","male": 2 2 1 1 2 2 2 2 1 1 ...
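keep() also works on an ordinary list. Here is a minimal sketch, assuming only that purrr is installed. The list mixed is made-up data for illustration:

```r
library(purrr)

# a small named list mixing numbers and strings (made-up data)
mixed <- list(a = 3, b = "x", c = 10, d = "yz")

# keep only the numeric elements; their names come along
nums <- keep(mixed, is.numeric)
names(nums)    # "a" "c"

# a formula predicate also works: keep numeric elements greater than 5
big <- keep(nums, ~ .x > 5)
str(big)       # a list holding only c = 10
```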

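discard(.x, .p = condition) does the opposite of keep(): it removes the elements of .x for which the condition is TRUE, so it behaves like keep() with the condition negated. Thus: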

# discard numbers that are 1 more than a multiple of 3
1:20 %>% 
  discard(.p = ~ . %% 3 == 1)
##  [1]  2  3  5  6  8  9 11 12 14 15 17 18 20
# discard the factors in m111survey
m111survey %>% 
  discard(is.factor) %>% 
  str()
## 'data.frame':    71 obs. of  6 variables:
##  $ height         : num  76 74 64 62 72 70.8 70 79 59 67 ...
##  $ ideal_ht       : num  78 76 NA 65 72 NA 72 76 61 67 ...
##  $ sleep          : num  9.5 7 9 7 8 10 4 6 7 7 ...
##  $ fastest        : int  119 110 85 100 95 100 85 160 90 90 ...
##  $ GPA            : num  3.56 2.5 3.8 3.5 3.2 3.1 3.68 2.7 2.8 NA ...
##  $ diff.ideal.act.: num  2 2 NA 3 0 NA 2 -3 2 0 ...
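To check the relationship between discard() and keep() concretely, the sketch below uses purrr's negate(), which turns a predicate function into its logical opposite. The helper isOneMore() is hypothetical, written only for this illustration:

```r
library(purrr)

# hypothetical helper: TRUE for numbers that are 1 more than a multiple of 3
isOneMore <- function(n) n %% 3 == 1

dropped <- discard(1:20, isOneMore)
keptNot <- keep(1:20, negate(isOneMore))   # keep() with the condition negated

identical(dropped, keptNot)   # TRUE
```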

14.4.2 reduce()

Another important member of the purrr family is reduce(). Given a vector or list .x and a function .f that takes two inputs, reduce() does the following:

  • applies .f to elements 1 and 2 of .x, getting a result;
  • applies .f to the result and to element 3 of .x, getting another result;
  • applies .f to this new result and to element 4 of .x, getting yet another result …
  • … and so on until all of the elements of .x have been exhausted.
  • then reduce() returns the final result in the above series of operations.
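The steps above amount to a simple loop. The following is a sketch of the idea only: the name myReduce() is hypothetical, and unlike purrr's reduce() it has no .init argument and assumes .x has at least one element.

```r
library(purrr)

# hypothetical loop-based version of the steps listed above (no .init support)
myReduce <- function(.x, .f) {
  result <- .x[[1]]                  # start with element 1
  for (i in seq_along(.x)[-1]) {     # then fold in elements 2, 3, ...
    result <- .f(result, .x[[i]])
  }
  result                             # the final result in the series
}

myReduce(c(3, 1, 4, 6), `+`)   # [1] 14
reduce(c(3, 1, 4, 6), `+`)     # purrr's version gives the same answer
```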

For example, suppose that you want to add up the elements of the vector:

vec <- c(3, 1, 4, 6)

Of course you could just use:

sum(vec)
## [1] 14

After all, sum() has been written to apply to many elements at once. But what if addition could only be done two numbers at a time? How might you proceed? You could:

  • add 3 and 1 (the first two elements of vec), getting 4;
  • then add 4 to 4, the third element of vec, getting 8;
  • then add 8 to 6, the final element of vec, getting 14;
  • then return 14.

reduce() operates in this way.

vec %>%
  reduce(.f = sum)
## [1] 14
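Because sum() can already take many numbers at once, supplying it as .f hides the two-at-a-time idea somewhat. The binary operator `+`, which adds exactly two numbers, makes the pairing explicit:

```r
library(purrr)

vec <- c(3, 1, 4, 6)

# `+` is a function of exactly two arguments, which is all reduce() needs
total <- reduce(vec, `+`)
total                 # [1] 14

# the same computation written out by hand, one pair at a time
((3 + 1) + 4) + 6     # [1] 14
```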

Can you see how reduce() gets its name? Step by step, it “reduces” its .x argument, which may consist of many elements, to a single value.

A common application of reduce() is to take an operation that is defined on only two items and extend it to operate on any number of items. Consider, for example, the function intersect(), which finds the intersection of any two vectors of the same type:

vec1 <- c(3, 4, 5, 6)
vec2 <- c(4, 6, 8, -4)
intersect(vec1, vec2)
## [1] 4 6

You cannot intersect three or more vectors at once:

intersect(vec1, vec2, c(4, 7, 9))
## Error in base::intersect(x, y, ...) : unused argument (c(4, 7, 9))

With reduce() you can intersect as many vectors as you like, provided that they are first stored in a list.

lst <- list(c("Akash", "Bipan", "Chandra", "Devadatta", "Raj"),
            c("Raj", "Vikram", "Sita", "Akash", "Chandra"),
            c("Akash", "Raj", "Chandra", "Bipan", "Lila"),
            c("Akash", "Vikram", "Devadatta", "Raj", "Lila"))
lst %>% 
  reduce(intersect)
## [1] "Akash" "Raj"
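The same approach rescues the three-vector call that failed earlier: put the numeric vectors in a list and let reduce() intersect them two at a time. A minimal sketch reusing vec1 and vec2 from above:

```r
library(purrr)

vec1 <- c(3, 4, 5, 6)
vec2 <- c(4, 6, 8, -4)

# intersect(vec1, vec2) gives 4 6; intersecting that with c(4, 7, 9) gives 4
common <- reduce(list(vec1, vec2, c(4, 7, 9)), intersect)
common    # [1] 4
```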

You can write your own function to supply as the argument for .f, but it has to be able to operate on two arguments. reduce() will take the first argument of the .f function to be what has been “accumulated” so far, and the second argument of the .f function—the value to be combined with what has been accumulated—will be provided by the current element of .x.

As a simple example, let’s write our own reduce-summer in a way that shows the user the reduction process at work:

## the .f function:
mySummer <- function(acc, curr) {
  cat("So far I have ", acc, ",\n")
  cat("But just now I was given " , curr, " to add in.\n\n", sep = "")
  sum(acc, curr)
}

## .x will be the whole numbers from 1 to 4:
1:4 %>% 
  reduce(.f = mySummer)
## So far I have  1 ,
## But just now I was given 2 to add in.
## 
## So far I have  3 ,
## But just now I was given 3 to add in.
## 
## So far I have  6 ,
## But just now I was given 4 to add in.
## [1] 10

When you write your own .f function, it’s a good idea to use names for the parameters that remind you of their role in the reduction process. acc (for “accumulated”) and curr (for “current”) are used above.
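The order of the two parameters matters whenever the operation is not symmetric. In this sketch (joinWords() is a hypothetical helper), acc always holds the result so far and curr the next element of .x:

```r
library(purrr)

# hypothetical .f: glue the next word onto the string built so far
joinWords <- function(acc, curr) {
  paste(acc, curr, sep = "-")
}
reduce(c("keep", "discard", "reduce"), .f = joinWords)
# [1] "keep-discard-reduce"

# subtraction is not symmetric: acc - curr gives (10 - 3) - 2
reduce(c(10, 3, 2), .f = function(acc, curr) acc - curr)
# [1] 5
```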

reduce() can also take an argument called .init. When .init is given a value, the reduction begins by applying .f to .init and the first element of .x. For example:

1:4 %>% 
  reduce(.f = mySummer, .init = 100)
## So far I have  100 ,
## But just now I was given 1 to add in.
## 
## So far I have  101 ,
## But just now I was given 2 to add in.
## 
## So far I have  103 ,
## But just now I was given 3 to add in.
## 
## So far I have  106 ,
## But just now I was given 4 to add in.
## [1] 110
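Supplying .init also gives reduce() a sensible answer when .x is empty. In the minimal sketch below, the starting value is simply returned when .init is given, while without it purrr signals an error. The exact error message depends on the purrr version, so the sketch only catches the error rather than quoting it:

```r
library(purrr)

# with .init, an empty .x just returns the starting value
reduce(numeric(0), `+`, .init = 0)    # [1] 0

# without .init there is nothing to start from, so reduce() signals an error
tryCatch(
  reduce(numeric(0), `+`),
  error = function(e) "reduce() needs .init when .x is empty"
)
```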

14.4.2.1 An Extended Example of Reduction

Let’s apply reduce() with .init to the task of making a truth table: the set of all \(2^n\) logical vectors of a given length \(n\).

The set \(S_1\) of vectors of length \(n = 1\) consists of only two vectors:

##           
## vec1  TRUE
## vec2 FALSE

Now consider a systematic way to construct the set \(S_2\) of all the vectors of length two. We know that there are four such vectors:

##                 
## vec1  TRUE  TRUE
## vec2  TRUE FALSE
## vec3 FALSE  TRUE
## vec4 FALSE FALSE

Observe that the first two of them begin with TRUE and end with the set \(S_1\) of vectors of length one:

##                
## vec1 TRUE  TRUE
## vec2 TRUE FALSE

The last two of them begin with FALSE and also end with \(S_1\):

##                 
## vec3 FALSE  TRUE
## vec4 FALSE FALSE

Now consider \(S_3\), the set of all eight vectors of length three:

##                       
## vec1  TRUE  TRUE  TRUE
## vec2  TRUE  TRUE FALSE
## vec3  TRUE FALSE  TRUE
## vec4  TRUE FALSE FALSE
## vec5 FALSE  TRUE  TRUE
## vec6 FALSE  TRUE FALSE
## vec7 FALSE FALSE  TRUE
## vec8 FALSE FALSE FALSE

Observe that the first four of them begin with TRUE and end with the vectors of \(S_2\):

##                      
## vec1 TRUE  TRUE  TRUE
## vec2 TRUE  TRUE FALSE
## vec3 TRUE FALSE  TRUE
## vec4 TRUE FALSE FALSE

The last four of them begin with FALSE and also end with the vectors of \(S_2\):

##                       
## vec5 FALSE  TRUE  TRUE
## vec6 FALSE  TRUE FALSE
## vec7 FALSE FALSE  TRUE
## vec8 FALSE FALSE FALSE

The pattern is now clear. If for any \(m \ge 1\) you are in possession of the \(2^m \times m\) matrix \(S_m\) of all possible vectors of length \(m\), then to obtain the \(2^{m+1} \times (m+1)\) matrix \(S_{m+1}\) of all possible vectors of length \(m+1\) you should:

  • stack \(2^m\) TRUEs on top of \(2^m\) FALSEs, creating a \(2^{m+1} \times 1\) matrix \(U\);
  • stack \(S_m\) on top of a copy of itself, creating a \(2^{m+1} \times m\) matrix \(V\);
  • place \(U\) next to \(V\).
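Here is one pass of that recipe carried out by hand, taking \(S_1\) to \(S_2\) with base R's rbind() and cbind(). It is a sketch of the single step only. The names s1, u, v and s2 are chosen to match the notation above:

```r
# S_1 as a 2 x 1 logical matrix
s1 <- matrix(c(TRUE, FALSE), nrow = 2)

r  <- nrow(s1)                                        # 2^m rows (here m = 1)
u  <- c(rep(TRUE, times = r), rep(FALSE, times = r))  # U: 2^m TRUEs over 2^m FALSEs
v  <- rbind(s1, s1)                                   # V: S_m stacked on itself
s2 <- unname(cbind(u, v))                             # place U next to V
s2
```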

reduce() with .init set to \(S_1\) is appropriate for this iterative building process. Here is an implementation:

makeTable <- function(n, verbose = FALSE) {
  # make .init (S_1)
  s1 <- matrix(c(TRUE, FALSE), nrow = 2)
  rownames(s1) <- c("vec1", "vec2")
  colnames(s1) <- c("")
  
  # make .f
  buildNext <- function(accum, value) {
    if ( verbose ) {
      cat("On value ", value, 
          " with accumulated material:",
          sep = "")
      print(accum)
    }
    if ( value == 1 ) return(accum)
    r <- nrow(accum)
    u <- c(rep(TRUE, times = r),
           rep(FALSE, times = r))
    v <- rbind(accum, accum)
    nextMatrix <- cbind(u, v)
    colnames(nextMatrix) <- rep("", times = value)
    rownames(nextMatrix) <- paste0("vec", 1:(2^value))
    if ( verbose ) {
      cat("Finishing value ", value, 
          ", and I've built:",
          sep = "")
      print(nextMatrix)
      cat("\n\n")
    }
    nextMatrix
  }
  
  # build from .init to the final product S_n
  reduce(.x = 1:n, .f = buildNext, .init = s1)
}

We have included a verbose option so we can watch the process as it unfolds.

Note also that the parameters for the .f function are named:

  • accum (what has been “accumulated” up to the current step), and
  • value (the value of .x at the current step).

It’s conventional to give these or similar names to the parameters of the building-function.

Let’s try it out:

makeTable(3, verbose = TRUE)
## On value 1 with accumulated material:          
## vec1  TRUE
## vec2 FALSE
## On value 2 with accumulated material:          
## vec1  TRUE
## vec2 FALSE
## Finishing value 2, and I've built:                
## vec1  TRUE  TRUE
## vec2  TRUE FALSE
## vec3 FALSE  TRUE
## vec4 FALSE FALSE
## 
## 
## On value 3 with accumulated material:                
## vec1  TRUE  TRUE
## vec2  TRUE FALSE
## vec3 FALSE  TRUE
## vec4 FALSE FALSE
## Finishing value 3, and I've built:                      
## vec1  TRUE  TRUE  TRUE
## vec2  TRUE  TRUE FALSE
## vec3  TRUE FALSE  TRUE
## vec4  TRUE FALSE FALSE
## vec5 FALSE  TRUE  TRUE
## vec6 FALSE  TRUE FALSE
## vec7 FALSE FALSE  TRUE
## vec8 FALSE FALSE FALSE
##                       
## vec1  TRUE  TRUE  TRUE
## vec2  TRUE  TRUE FALSE
## vec3  TRUE FALSE  TRUE
## vec4  TRUE FALSE FALSE
## vec5 FALSE  TRUE  TRUE
## vec6 FALSE  TRUE FALSE
## vec7 FALSE FALSE  TRUE
## vec8 FALSE FALSE FALSE

Of course in practice we would not turn on the verbose option:

makeTable(4)
##                              
## vec1   TRUE  TRUE  TRUE  TRUE
## vec2   TRUE  TRUE  TRUE FALSE
## vec3   TRUE  TRUE FALSE  TRUE
## vec4   TRUE  TRUE FALSE FALSE
## vec5   TRUE FALSE  TRUE  TRUE
## vec6   TRUE FALSE  TRUE FALSE
## vec7   TRUE FALSE FALSE  TRUE
## vec8   TRUE FALSE FALSE FALSE
## vec9  FALSE  TRUE  TRUE  TRUE
## vec10 FALSE  TRUE  TRUE FALSE
## vec11 FALSE  TRUE FALSE  TRUE
## vec12 FALSE  TRUE FALSE FALSE
## vec13 FALSE FALSE  TRUE  TRUE
## vec14 FALSE FALSE  TRUE FALSE
## vec15 FALSE FALSE FALSE  TRUE
## vec16 FALSE FALSE FALSE FALSE
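As a quick sanity check on the construction, the following condensed sketch repeats the same reduce() call without the verbose option or row names. The name makeTableCompact() is hypothetical and not part of the code above. The check confirms that the result has \(2^n\) distinct rows:

```r
library(purrr)

# hypothetical condensed version of makeTable(): same .init and same step
makeTableCompact <- function(n) {
  s1 <- matrix(c(TRUE, FALSE), nrow = 2)     # .init: S_1
  buildNext <- function(accum, value) {
    if (value == 1) return(accum)            # S_1 is already built
    r <- nrow(accum)
    u <- c(rep(TRUE, times = r), rep(FALSE, times = r))
    cbind(u, rbind(accum, accum), deparse.level = 0)  # U next to V
  }
  reduce(1:n, buildNext, .init = s1)
}

tbl <- makeTableCompact(4)
dim(tbl)                  # [1] 16  4
anyDuplicated(tbl) == 0   # TRUE: no row appears twice
```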

14.4.3 Practice Exercises

  1. The operator * (multiplication) is really a function:

    `*`(3,5)
    ## [1] 15

    But it can only multiply two numbers at once. The R-function prod() can handle as many numbers as you like:

    prod(3,5,2,7)
    ## [1] 210

    Use reduce() and * to write your own function product() that takes a numerical vector vec and returns the product of the elements of the vector. It should work like this:

    product(vec = c(3,4,5))
    ## [1] 60

    (Hint: in the call to reduce() you will have to refer to the *-function as `*`.)

  2. Modify the function product() so that, in a single call to reduce(), it multiplies the number 2 by the product of the elements of vec. (Hint: set .init to an appropriate value.)

  3. The data frame iris gives information on 150 irises. Use keep() to create a new data frame that includes only the numerical variables having a mean greater than 3.5.

14.4.4 Solutions to the Practice Exercises

  1. Try this:

    product <- function(vec) {
      reduce(vec, .f = `*`)
    }
  2. Try this:

    product <- function(vec) {
      reduce(vec, .f = `*`, .init = 2)
    }
  3. Try this:

    bigIris <-
      iris %>%
      keep(is.numeric) %>% 
      keep(~mean(.) > 3.5)
    str(bigIris)
    ## 'data.frame':    150 obs. of  2 variables:
    ##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
    ##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...

    The following does not work. Why?

    bigIris <-
      iris %>%
      keep(function(x) {
        is.numeric(x) & mean(x) > 3.5
      })