14.4 Other purrr Higher-Order Functions
14.4.1 keep() and discard()
keep() is similar to dplyr’s filter(), but whereas filter() chooses rows of a data frame based on a given condition, keep() chooses the elements of the input list or vector .x based on a condition named .p.
Examples:
# keep the numbers that are 1 more than a multiple of 3
1:20 %>%
keep(.p = ~ . %% 3 == 1)
## [1] 1 4 7 10 13 16 19
# keep the factors in m111survey
m111survey %>%
  keep(is.factor) %>%
  str()
## 'data.frame': 71 obs. of 6 variables:
## $ weight_feel : Factor w/ 3 levels "1_underweight",..: 1 2 2 1 1 3 2 2 2 3 ...
## $ love_first : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ extra_life : Factor w/ 2 levels "no","yes": 2 2 1 1 2 1 2 2 2 1 ...
## $ seat : Factor w/ 3 levels "1_front","2_middle",..: 1 2 2 1 3 1 1 3 3 2 ...
## $ enough_Sleep: Factor w/ 2 levels "no","yes": 1 1 1 1 1 2 1 2 1 2 ...
## $ sex : Factor w/ 2 levels "female","male": 2 2 1 1 2 2 2 2 1 1 ...
discard(.x, .p = condition) is equivalent to keep(.x, .p = !condition). Thus:
# discard numbers that are 1 more than a multiple of 3
1:20 %>%
discard(.p = ~ . %% 3 == 1)
## [1] 2 3 5 6 8 9 11 12 14 15 17 18 20
# discard the factors in m111survey
m111survey %>%
  discard(is.factor) %>%
  str()
## 'data.frame': 71 obs. of 6 variables:
## $ height : num 76 74 64 62 72 70.8 70 79 59 67 ...
## $ ideal_ht : num 78 76 NA 65 72 NA 72 76 61 67 ...
## $ sleep : num 9.5 7 9 7 8 10 4 6 7 7 ...
## $ fastest : int 119 110 85 100 95 100 85 160 90 90 ...
## $ GPA : num 3.56 2.5 3.8 3.5 3.2 3.1 3.68 2.7 2.8 NA ...
## $ diff.ideal.act.: num 2 2 NA 3 0 NA 2 -3 2 0 ...
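As a quick check on the relationship between the two functions, the sketch below compares discard() with keep() applied to the negated condition. It assumes only that the purrr package is installed; the names nums, viaDiscard and viaKeep are made up for this illustration.

```r
library(purrr)

nums <- 1:20

# drop the numbers that are 1 more than a multiple of 3 ...
viaDiscard <- discard(nums, ~ . %% 3 == 1)

# ... or, equivalently, keep the numbers that fail that condition
viaKeep <- keep(nums, ~ !(. %% 3 == 1))

# the two approaches select exactly the same elements
identical(viaDiscard, viaKeep)
## [1] TRUE
```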
14.4.2 reduce()
Another important member of the purrr family is reduce(). Given a vector .x and a function .f that takes two inputs, reduce() does the following:
- applies .f to elements 1 and 2 of .x, getting a result;
- applies .f to the result and to element 3 of .x, getting another result;
- applies .f to this new result and to element 4 of .x, getting yet another result …
- … and so on until all of the elements of .x have been exhausted;
- then reduce() returns the final result in the above series of operations.
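To make these steps concrete, here is a minimal sketch of the same process written as an ordinary loop. The helper myReduce() is hypothetical (it is not part of purrr), and it is deliberately simplified: it assumes that x has at least one element and it does not deal with any of the extra arguments of reduce().

```r
# myReduce() is a made-up helper for illustration; it is not a purrr function
myReduce <- function(x, f) {
  # the first element is the result "so far"
  result <- x[[1]]
  # combine the result so far with each remaining element, in order
  for (elem in x[-1]) {
    result <- f(result, elem)
  }
  # after the last element has been used, return the final result
  result
}

# 2 + 5 is 7, and then 7 + 3 is 10
myReduce(c(2, 5, 3), `+`)
## [1] 10
```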
For example, suppose that you want to add up the elements of the vector:
vec <- c(3, 1, 4, 6)
Of course you could just use:
sum(vec)
## [1] 14
After all, sum() has been written to apply to many elements at once. But what if addition could only be done two numbers at a time? How might you proceed? You could:
- add 3 and 1 (the first two elements of vec), getting 4;
- then add 4 to 4, the third element of vec, getting 8;
- then add 8 to 6, the final element of vec, getting 14;
- then return 14.
reduce() operates in this way.
vec %>%
  reduce(.f = sum)
## [1] 14
Can you see how reduce() gets its name? Step by step, it “reduces” its .x argument, which may consist of many elements, to a single value.
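If you would like to see the intermediate results of a reduction, one option is purrr's accumulate(), which performs the same pairwise steps but keeps every running result instead of only the last one. The sketch below is only an aside: it assumes purrr is installed, and the name steps is made up for the illustration.

```r
library(purrr)

# running results of adding up 3, 1, 4 and 6 two numbers at a time
steps <- accumulate(c(3, 1, 4, 6), `+`)
steps
## [1]  3  4  8 14

# the last running result is what reduce() returns
reduce(c(3, 1, 4, 6), `+`)
## [1] 14
```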
A common application of reduce() is to take an operation that is defined on only two items and extend it to operate on any number of items. Consider, for example, the function intersect(), which will find the intersection of any two vectors of the same type:
vec1 <- c(3, 4, 5, 6)
vec2 <- c(4, 6, 8, -4)
intersect(vec1, vec2)
## [1] 4 6
You cannot intersect three or more vectors at once:
intersect(vec1, vec2, c(4, 7, 9))
## Error in base::intersect(x, y, ...) : unused argument (c(4, 7, 9))
With reduce() you can intersect as many vectors as you like, provided that they are first stored in a list.
lst <- list(c("Akash", "Bipan", "Chandra", "Devadatta", "Raj"),
            c("Raj", "Vikram", "Sita", "Akash", "Chandra"),
            c("Akash", "Raj", "Chandra", "Bipan", "Lila"),
            c("Akash", "Vikram", "Devadatta", "Raj", "Lila"))
lst %>%
  reduce(intersect)
## [1] "Akash" "Raj"
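In effect, reduce() nests the two-argument calls for you. The following sketch, which assumes purrr is installed and uses three made-up vectors a, b and d, compares the reduction with the nested calls written out by hand.

```r
library(purrr)

# three small vectors invented for this illustration
a <- c(1, 2, 3, 4, 5)
b <- c(2, 3, 4, 9)
d <- c(4, 3, 7)

# let reduce() chain the two-argument intersections ...
viaReduce <- reduce(list(a, b, d), intersect)

# ... or write the nested calls out by hand
viaNesting <- intersect(intersect(a, b), d)

identical(viaReduce, viaNesting)
## [1] TRUE
```

Writing out the nesting by hand quickly becomes awkward as the number of vectors grows, which is where reduce() earns its keep.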
You can write your own function to supply as the argument for .f, but it has to be able to operate on two arguments. reduce() will take the first argument of the .f function to be what has been “accumulated” so far. The second argument of the .f function (the value to be combined with what has been accumulated) will be provided by the current element of .x.
As a simple example, let’s write our own reduce-summer in a way that shows the user the reduction process at work:
## the .f function:
mySummer <- function(acc, curr) {
  cat("So far I have ", acc, ",\n")
  cat("But just now I was given " , curr, " to add in.\n\n", sep = "")
  sum(acc, curr)
}
## .x will be the whole numbers from 1 to 4:
1:4 %>%
reduce(.f = mySummer)
## So far I have 1 ,
## But just now I was given 2 to add in.
##
## So far I have 3 ,
## But just now I was given 3 to add in.
##
## So far I have 6 ,
## But just now I was given 4 to add in.
## [1] 10
When you write your own .f function, it’s a good idea to use names for the parameters that remind you of their role in the reduction process. acc (for “accumulated”) and curr (for “current”) are used above.
reduce() can take an argument called .init. When this argument is given a value, the operation begins by applying .f to .init and the first element of .x. For example:
1:4 %>%
reduce(.f = mySummer, .init = 100)
## So far I have 100 ,
## But just now I was given 1 to add in.
##
## So far I have 101 ,
## But just now I was given 2 to add in.
##
## So far I have 103 ,
## But just now I was given 3 to add in.
##
## So far I have 106 ,
## But just now I was given 4 to add in.
## [1] 110
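One practical use of .init is to give reduce() a sensible answer when .x has no elements at all. The sketch below assumes purrr is installed; as far as we know, recent versions of purrr signal an error when .x is empty and no .init is supplied, so only the calls with .init are shown.

```r
library(purrr)

# with .init supplied, an empty .x simply gives back .init
reduce(numeric(0), `+`, .init = 0)
## [1] 0

# with a non-empty .x, .init is combined with the elements as usual
reduce(c(3, 1, 4, 6), `+`, .init = 0)
## [1] 14
```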
14.4.2.1 An Extended Example of Reduction
Let’s apply reduce() with .init to the task of making a truth table: the set of all \(2^n\) logical vectors of a given length \(n\).
The set \(S_1\) of vectors of length \(n = 1\) consists of only two vectors:
##
## vec1 TRUE
## vec2 FALSE
Now consider a systematic way to construct the set \(S_2\) of all the vectors of length two. We know that there are four such vectors:
##
## vec1 TRUE TRUE
## vec2 TRUE FALSE
## vec3 FALSE TRUE
## vec4 FALSE FALSE
Observe that the first two of them begin with TRUE and end with the set \(S_1\) of vectors of length one:
##
## vec1 TRUE TRUE
## vec2 TRUE FALSE
The last two of them begin with FALSE and also end with \(S_1\):
##
## vec3 FALSE TRUE
## vec4 FALSE FALSE
Now consider \(S_3\), the set of all eight vectors of length three:
##
## vec1 TRUE TRUE TRUE
## vec2 TRUE TRUE FALSE
## vec3 TRUE FALSE TRUE
## vec4 TRUE FALSE FALSE
## vec5 FALSE TRUE TRUE
## vec6 FALSE TRUE FALSE
## vec7 FALSE FALSE TRUE
## vec8 FALSE FALSE FALSE
Observe that the first four of them begin with TRUE and end with the vectors of \(S_2\):
##
## vec1 TRUE TRUE TRUE
## vec2 TRUE TRUE FALSE
## vec3 TRUE FALSE TRUE
## vec4 TRUE FALSE FALSE
The last four of them begin with FALSE and also end with the vectors of \(S_2\):
##
## vec5 FALSE TRUE TRUE
## vec6 FALSE TRUE FALSE
## vec7 FALSE FALSE TRUE
## vec8 FALSE FALSE FALSE
The pattern is now clear. If for any \(m \ge 1\) you are in possession of the \(2^m \times m\) matrix \(S_m\) of all possible vectors of length \(m\), then to obtain the \(2^{m+1} \times (m+1)\) matrix \(S_{m+1}\) of all possible vectors of length \(m+1\) you should:
- stack \(2^m\) TRUEs on top of \(2^m\) FALSEs, creating a \(2^{m+1} \times 1\) matrix \(U\);
- stack \(S_m\) underneath itself, creating a \(2^{m+1} \times m\) matrix \(V\);
- place \(U\) next to \(V\).
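Before automating the process, it may help to carry out a single step by hand. The sketch below builds \(S_2\) from \(S_1\) using only base R; the variable names s1, m, u, v and s2 are chosen for this illustration and are not required by anything that follows.

```r
# S_1 as a 2 x 1 logical matrix
s1 <- matrix(c(TRUE, FALSE), nrow = 2)
m <- ncol(s1)  # here m is 1

# U: 2^m TRUEs stacked on top of 2^m FALSEs
u <- c(rep(TRUE, times = 2^m), rep(FALSE, times = 2^m))

# V: S_m stacked underneath itself
v <- rbind(s1, s1)

# place U next to V to get S_(m+1)
s2 <- cbind(u, v)
dim(s2)
## [1] 4 2
```

The rows of s2 are, in order, TRUE TRUE, TRUE FALSE, FALSE TRUE and FALSE FALSE, matching the four vectors of \(S_2\) listed earlier.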
reduce() with .init set to \(S_1\) is appropriate for this iterative building process. Here is an implementation:
makeTable <- function(n, verbose = FALSE) {
  # make .init (S_1)
  s1 <- matrix(c(TRUE, FALSE), nrow = 2)
  rownames(s1) <- c("vec1", "vec2")
  colnames(s1) <- c("")
  # make .f
  buildNext <- function(accum, value) {
    if ( verbose ) {
      cat("On value ", value,
          " with accumalated material:",
          sep = "")
      print(accum)
    }
    # S_1 is already in hand, so there is nothing to build at value 1
    if ( value == 1 ) return(accum)
    r <- nrow(accum)
    # U: r TRUEs stacked on top of r FALSEs
    u <- c(rep(TRUE, times = r),
           rep(FALSE, times = r))
    # V: the accumulated matrix stacked underneath itself
    v <- rbind(accum, accum)
    # place U next to V
    nextMatrix <- cbind(u, v)
    colnames(nextMatrix) <- rep("", times = value)
    rownames(nextMatrix) <- paste0("vec", 1:(2^value))
    if ( verbose ) {
      cat("Finishing value", value,
          ", and I've built:",
          sep = "")
      print(nextMatrix)
      cat("\n\n")
    }
    nextMatrix
  }
  # build from .init to the final product S_n
  reduce(.x = 1:n, .f = buildNext, .init = s1)
}
We have included a verbose option so we can watch the process as it unfolds.
Note also that the parameters for the .f function are named:
- accum (what has been “accumulated” up to the current step), and
- value (the value of .x at the current step).
It’s conventional to give these or similar names to the parameters of the building-function.
Let’s try it out:
makeTable(3, verbose = TRUE)
## On value 1 with accumalated material:
## vec1 TRUE
## vec2 FALSE
## On value 2 with accumalated material:
## vec1 TRUE
## vec2 FALSE
## Finishing value2, and I've built:
## vec1 TRUE TRUE
## vec2 TRUE FALSE
## vec3 FALSE TRUE
## vec4 FALSE FALSE
##
##
## On value 3 with accumalated material:
## vec1 TRUE TRUE
## vec2 TRUE FALSE
## vec3 FALSE TRUE
## vec4 FALSE FALSE
## Finishing value3, and I've built:
## vec1 TRUE TRUE TRUE
## vec2 TRUE TRUE FALSE
## vec3 TRUE FALSE TRUE
## vec4 TRUE FALSE FALSE
## vec5 FALSE TRUE TRUE
## vec6 FALSE TRUE FALSE
## vec7 FALSE FALSE TRUE
## vec8 FALSE FALSE FALSE
##
## vec1 TRUE TRUE TRUE
## vec2 TRUE TRUE FALSE
## vec3 TRUE FALSE TRUE
## vec4 TRUE FALSE FALSE
## vec5 FALSE TRUE TRUE
## vec6 FALSE TRUE FALSE
## vec7 FALSE FALSE TRUE
## vec8 FALSE FALSE FALSE
Of course in practice we would not turn on the verbose option:
makeTable(4)
##
## vec1 TRUE TRUE TRUE TRUE
## vec2 TRUE TRUE TRUE FALSE
## vec3 TRUE TRUE FALSE TRUE
## vec4 TRUE TRUE FALSE FALSE
## vec5 TRUE FALSE TRUE TRUE
## vec6 TRUE FALSE TRUE FALSE
## vec7 TRUE FALSE FALSE TRUE
## vec8 TRUE FALSE FALSE FALSE
## vec9 FALSE TRUE TRUE TRUE
## vec10 FALSE TRUE TRUE FALSE
## vec11 FALSE TRUE FALSE TRUE
## vec12 FALSE TRUE FALSE FALSE
## vec13 FALSE FALSE TRUE TRUE
## vec14 FALSE FALSE TRUE FALSE
## vec15 FALSE FALSE FALSE TRUE
## vec16 FALSE FALSE FALSE FALSE
14.4.3 Practice Exercises
1. The operator * (multiplication) is really a function:
`*`(3,5)
## [1] 15
But it can only multiply two numbers at once. The R-function prod() can handle as many numbers as you like:
prod(3,5,2,7)
## [1] 210
Use reduce() and * to write your own function product() that takes a numerical vector vec and returns the product of the elements of the vector. It should work like this:
product(vec = c(3,4,5))
## [1] 60
(Hint: in the call to reduce() you will have to refer to the *-function as `*`.)
2. Modify the function product() so that in a single call to reduce() it multiplies the number 2 by the product of the elements of vec. (Hint: set .init to an appropriate value.)
3. The data frame iris gives information on 150 irises. Use keep() to create a new data frame that includes only the numerical variables having a mean greater than 3.5.
14.4.4 Solutions to the Practice Exercises
1. Try this:
product <- function(vec) {
  reduce(vec, .f = `*`)
}
2. Try this:
product <- function(vec) {
  reduce(vec, .f = `*`, .init = 2)
}
3. Try this:
bigIris <- iris %>%
  keep(is.numeric) %>%
  keep(~ mean(.) > 3.5)
str(bigIris)
## 'data.frame': 150 obs. of 2 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
The following does not work. Why?
bigIris <- iris %>%
  keep(function(x) {
    is.numeric(x) & mean(x) > 3.5
  })