14.3 purrr Higher-Order Functions for Iteration
In the remainder of the chapter we will study important higher-order functions: functions that take a function as an argument and apply that function to each element of another data structure. As we have said previously, such functions often serve as alternatives to loops.
The higher-order functions we study come from the package purrr, which is attached whenever we load the tidyverse.
14.3.1 map() and Variations
Suppose that we want to generate five vectors, each of which consists of ten numbers randomly chosen between 0 and 1. We accomplish the task with a loop, as follows:
# set up a list of length 5:
lst <- vector(mode = "list", length = 5)
for ( i in 1:5 ) {
  lst[[i]] <- runif(10)
}
str(lst)
## List of 5
## $ : num [1:10] 0.546 0.271 0.189 0.267 0.956 ...
## $ : num [1:10] 0.0384 0.5677 0.9629 0.5131 0.0181 ...
## $ : num [1:10] 0.184 0.268 0.477 0.263 0.107 ...
## $ : num [1:10] 0.9318 0.2411 0.3267 0.0647 0.1426 ...
## $ : num [1:10] 0.573 0.364 0.524 0.604 0.119 ...
If we wanted the vectors to have lengths 1, 4, 9, 16, and 25, then we could write:
lst <- vector(mode = "list", length = 5)
for ( i in 1:5 ) {
  lst[[i]] <- runif(i^2)
}
str(lst)
## List of 5
## $ : num 0.647
## $ : num [1:4] 0.394 0.619 0.477 0.136
## $ : num [1:9] 0.06738 0.12915 0.39312 0.00258 0.62021 ...
## $ : num [1:16] 0.409 0.54 0.961 0.654 0.547 ...
## $ : num [1:25] 0.96407 0.07147 0.95581 0.94798 0.00119 ...
In the first example, the elements of the vector 1:5 didn't matter (we wanted a vector of length ten each time), whereas in the second example the elements of 1:5 did matter, in that they determined the lengths of the five vectors produced. Of course, in general we could apply runif() to each element of any vector at all, like this:
vec <- c(5, 7, 8, 2, 9)
lst <- vector(mode = "list", length = length(vec))
for ( i in seq_along(vec) ) {
  lst[[i]] <- runif(vec[i])
}
str(lst)
## List of 5
## $ : num [1:5] 0.647 0.394 0.619 0.477 0.136
## $ : num [1:7] 0.06738 0.12915 0.39312 0.00258 0.62021 ...
## $ : num [1:8] 0.826 0.423 0.409 0.54 0.961 ...
## $ : num [1:2] 0.1968 0.0779
## $ : num [1:9] 0.818 0.942 0.884 0.166 0.355 ...
If we can apply runif() to each element of a vector, why not apply an arbitrary function to each element? That's what the function map() will do for us. The general form of map() is:
map(.x, .f, ...)
In the template above:
.x can be a list or any atomic vector;
.f is a function that is to be applied to each element of .x. In the default operation of map(), each element of .x becomes in turn the first argument of .f.
... consists of other arguments that are supplied as arguments for the .f function, in case you have to set other parameters of the function in order to get it to perform in the way you would like.
The result is always a list.
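To connect map() back to the loops above, here is a minimal sketch of a hand-written equivalent. The name myMap() is hypothetical, and the sketch only imitates the basic behavior described in the template; it is not how purrr actually implements map().

```r
library(purrr)

# myMap() is a hypothetical teaching helper, not purrr's implementation.
myMap <- function(x, f, ...) {
  out <- vector(mode = "list", length = length(x))
  for ( i in seq_along(x) ) {
    # each element of x becomes the first unnamed argument of f
    out[[i]] <- f(x[[i]], ...)
  }
  names(out) <- names(x)  # carry over any names from the input
  out
}

byHand <- myMap(c(1, 4, 9), sqrt)   # loop-based version
byPurrr <- map(c(1, 4, 9), sqrt)    # the purrr version
str(byHand)
str(byPurrr)
```

Both calls should produce a list of three numbers: the pre-allocated list and the for-loop are simply hidden inside the higher-order function.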
With map() we can get the list in our most recent example as follows:
howMany <- c(5, 7, 8, 2, 9)
lst <-
  howMany %>%
  map(runif)
str(lst)
## List of 5
## $ : num [1:5] 0.647 0.394 0.619 0.477 0.136
## $ : num [1:7] 0.06738 0.12915 0.39312 0.00258 0.62021 ...
## $ : num [1:8] 0.826 0.423 0.409 0.54 0.961 ...
## $ : num [1:2] 0.1968 0.0779
## $ : num [1:9] 0.818 0.942 0.884 0.166 0.355 ...
If we had wanted the random numbers to be between, say, 4 and 8, then we would supply extra arguments to runif() as follows:
lst <-
  howMany %>%
  map(runif, min = 4, max = 8)
str(lst)
## List of 5
## $ : num [1:5] 6.59 5.58 6.47 5.91 4.54
## $ : num [1:7] 4.27 4.52 5.57 4.01 6.48 ...
## $ : num [1:8] 7.3 5.69 5.64 6.16 7.84 ...
## $ : num [1:2] 4.79 4.31
## $ : num [1:9] 7.27 7.77 7.54 4.66 5.42 ...
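Because runif() is random, your own numbers will differ from the ones printed here. The sketch below shows one way to check such a call without relying on particular values: fix the seed with set.seed() (the seed 2024 is an arbitrary choice of ours), then verify the lengths and the range of the results.

```r
library(purrr)

howMany <- c(2, 3)

set.seed(2024)   # arbitrary seed, chosen only for this illustration
first <- map(howMany, runif, min = 4, max = 8)

set.seed(2024)   # same seed again, so the same random numbers
second <- map(howMany, runif, min = 4, max = 8)

identical(first, second)   # the two runs agree
lengths(first)             # one vector per element of howMany
inRange <- all(unlist(first) >= 4 & unlist(first) <= 8)
inRange                    # every value lies between min and max
```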
The default behavior of map() is that the .x vector supplies the first argument of .f. However, if some ... parameters are supplied by name, then .x substitutes for the first parameter that is not mentioned in .... (Each element of .x is passed as an unnamed argument, so R matches it to the first parameter that has not already been matched by name.) In the above example, min and max are the second and third parameters of runif(), so .x substitutes for the first parameter, the one that determines how many random numbers will be generated. In the example below, the vector lowerBounds substitutes for min, the second parameter of runif():
lowerBounds <- 1:3
lowerBounds %>%
  map(runif, n = 2, max = 8)
## [[1]]
## [1] 3.008575 5.964877
##
## [[2]]
## [1] 6.514459 6.282105
##
## [[3]]
## [1] 6.688118 7.431739
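The matching rule is easier to see with a function whose output is not random. In the sketch below, glue3() is a hypothetical function of ours with three parameters; which one receives the elements of .x depends only on which parameters are named in the extra arguments.

```r
library(purrr)

# glue3() is a hypothetical function, used only to show argument matching
glue3 <- function(a, b, c) {
  paste(a, b, c, sep = "-")
}

# b and c are named, so each element of .x fills a
first <- map(1:3, glue3, b = "B", c = "C")
# a and c are named, so each element of .x fills b
middle <- map(1:3, glue3, a = "A", c = "C")
# a and b are named, so each element of .x fills c
last <- map(1:3, glue3, a = "A", b = "B")

unlist(first)
unlist(middle)
unlist(last)
```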
Sometimes we wish to vary two or more of the parameters of a function. In that case we use pmap(). The first parameter of pmap() is named .l and takes a list of vectors (or lists). For example:
howMany <- c(3, 1, 4)
upperBounds <- c(1, 5, 10)
list(howMany, upperBounds) %>%
  pmap(runif, min = 0)
## [[1]]
## [1] 0.4142409 0.5140328 0.6190231
##
## [[2]]
## [1] 0.193564
##
## [[3]]
## [1] 3.0889976 7.4509632 3.6170952 0.2348365
Observe that pmap() interprets the first element of the input list, the vector howMany, as giving the values of the first argument of runif(). The second parameter of runif() (min) is set at 0, so upperBounds, the second element of the input list, gives the values for the next parameter in line, the parameter max.
One might just as well use pmap() to vary all three parameters:
howMany <- c(3, 1, 4)
lowerBounds <- c(-5, 0, 5)
upperBounds <- c(0, 5, 10)
args <- list(howMany, lowerBounds, upperBounds)
args %>%
  pmap(runif) %>%
  str()
## List of 3
## $ : num [1:3] -0.00743 -2.35882 -3.90104
## $ : num 1.01
## $ : num [1:4] 5.38 9.16 8.45 9.67
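In the examples above the elements of the input list are unnamed, so they are matched to parameters by position. If the elements of the list are named, pmap() matches them to parameters by name instead. The sketch below illustrates both cases with a deterministic function; powerOf() is a hypothetical helper of ours.

```r
library(purrr)

# powerOf() is a hypothetical function used only for illustration
powerOf <- function(base, exponent) {
  base^exponent
}

# unnamed list: matched by position (base first, then exponent)
byPosition <- pmap(list(c(10, 2), c(2, 3)), powerOf)

# named list: matched by name, so the order of the elements is irrelevant
byName <- pmap(list(exponent = c(2, 3), base = c(10, 2)), powerOf)

unlist(byPosition)
unlist(byName)
```

Naming the elements is usually the safer habit when a function has several parameters, since the call no longer depends on remembering their order.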
The .f parameter can be any function, including one that you define yourself. Here's an example:
rLetters <- function(n, upper) {
  if ( upper ) {
    sample(LETTERS, size = n, replace = TRUE)
  } else {
    sample(letters, size = n, replace = TRUE)
  }
}
george <- c(3, 6, 9) # vary number of letters to pick
bettina <- c(TRUE, FALSE, TRUE) # vary the case (upper, lower)
list(george, bettina) %>%
  pmap(rLetters)
## [[1]]
## [1] "U" "D" "C"
##
## [[2]]
## [1] "j" "a" "s" "d" "m" "k"
##
## [[3]]
## [1] "S" "S" "R" "A" "V" "P" "A" "N" "B"
You could also set .f to be a function that you write on the spot, without even bothering to give it a name:
c(1, 3, 5) %>%
map(function(x) runif(3, min = 0, max = x))
## [[1]]
## [1] 0.4170589 0.5614115 0.6146708
##
## [[2]]
## [1] 1.0992127 0.4576997 0.6006109
##
## [[3]]
## [1] 1.134999 2.752574 2.083488
In computer programming a function is called anonymous when it is not the value bound to some name.
map() allows a shortcut for defining anonymous functions. The above call could have been written as:
c(1, 3, 5) %>%
map(~runif(3, min = 0, max = .))
## [[1]]
## [1] 0.1435887 0.8404723 0.9722919
##
## [[2]]
## [1] 1.217995 1.631378 1.481980
##
## [[3]]
## [1] 4.475972 3.900816 4.272925
The ~ indicates that the body of the function is about to begin. The . stands for the parameter of the function.
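As a quick check that the shortcut is only a different spelling of the same function, the sketch below applies a deterministic computation in three ways. The pronoun .x is an alternative to . that purrr also accepts inside a formula; the multiplication by 10 is just an arbitrary example of ours.

```r
library(purrr)

# an ordinary anonymous function
withFunction <- map(c(1, 3, 5), function(x) x * 10)
# the formula shortcut, with . as the parameter
withFormula <- map(c(1, 3, 5), ~ . * 10)
# the same shortcut, with .x as the parameter
withDotX <- map(c(1, 3, 5), ~ .x * 10)

identical(withFunction, withFormula)
identical(withFormula, withDotX)
```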
When we introduced map() we said that .x was a vector or a list. In fact, .x can be any object that can be coerced into a list. Hence it is quite common to use map() with data frames: the frame is turned into a list, each element of which is a column of the frame. Here is an example:
data("m111survey", package = "bcscr")
numberNA <-
  m111survey %>%
  map(~sum(is.na(.)))
str(numberNA)
## List of 12
## $ height : int 0
## $ ideal_ht : int 2
## $ sleep : int 0
## $ fastest : int 0
## $ weight_feel : int 0
## $ love_first : int 0
## $ extra_life : int 0
## $ seat : int 0
## $ GPA : int 1
## $ enough_Sleep : int 0
## $ sex : int 0
## $ diff.ideal.act.: int 2
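If the bcscr package is not at hand, the same idea can be tried on any data frame. The sketch below uses a small made-up frame (toy is a hypothetical stand-in for m111survey) to show that map() visits the columns one at a time.

```r
library(purrr)

# toy is a hypothetical data frame, standing in for m111survey
toy <- data.frame(
  height = c(70, NA, 65),
  sex = c("f", "m", NA),
  sleep = c(7, 8, 6)
)

# each column is passed to the function in turn
naCounts <- map(toy, ~ sum(is.na(.)))
str(naCounts)
```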
Note that the elements of the returned list inherit the names of the input data frame. This holds for any named input:
numbers <- c(1, 3, 5)
names(numbers) <- c("one", "three", "five")
numbers %>%
  map(~runif(3, min = 0, max = .))
## $one
## [1] 0.2331179 0.9318823 0.9580263
##
## $three
## [1] 2.9757873 1.5267223 0.6133808
##
## $five
## [1] 0.07197437 4.77782735 3.19760884
When every result is a single value of the same type, so that the output can take a form simpler than a list, it is possible to use variants of map() such as:
map_int()
map_dbl()
map_lgl()
map_chr()
Thus we could obtain a named integer vector of the number of NA-values for each variable in m111survey as follows:
numberNA <-
  m111survey %>%
  map_int(~sum(is.na(.)))
numberNA
## height ideal_ht sleep fastest weight_feel
## 0 2 0 0 0
## love_first extra_life seat GPA enough_Sleep
## 0 0 0 1 0
## sex diff.ideal.act.
## 0 2
Here are the types of each variable:
m111survey %>%
  map_chr(typeof)
## height ideal_ht sleep fastest weight_feel
## "double" "double" "double" "integer" "integer"
## love_first extra_life seat GPA enough_Sleep
## "integer" "integer" "integer" "double" "integer"
## sex diff.ideal.act.
## "integer" "double"
Here is a statement of whether or not each variable is a factor:
m111survey %>%
  map_lgl(is.factor)
## height ideal_ht sleep fastest weight_feel
## FALSE FALSE FALSE FALSE TRUE
## love_first extra_life seat GPA enough_Sleep
## TRUE TRUE TRUE FALSE TRUE
## sex diff.ideal.act.
## TRUE FALSE
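The typed variants can be tried on small inputs that need no data set. The sketch below is ours rather than part of the survey example; it also shows what happens when .f returns more than one value per element, in which case no atomic vector fits and the variant signals an error (captured here with tryCatch()).

```r
library(purrr)

roots <- map_dbl(c(1, 4, 9), sqrt)            # double vector
sizes <- map_int(list(1:3, letters), length)  # integer vector
kinds <- map_chr(list(1L, "a", TRUE), class)  # character vector
bigs  <- map_lgl(c(2, 8, 5), ~ . > 4)         # logical vector

# .f returns two values per element, so map_dbl() cannot build a vector
attempt <- tryCatch(
  map_dbl(c(1, 4), ~ c(., .)),
  error = function(e) "error"
)
attempt
```

When the results do not all have length one, plain map() remains the right tool, since a list can hold elements of any length.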
14.3.2 walk() and Variations
walk() is similar to map(), but is used when we are interested in producing side-effects. It applies its .f argument to each element of the .x it was given, but it also returns .x (invisibly), in case we want to pipe it into some other function.
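Both halves of that description can be seen in a very small case. In the sketch below (the vector nums and the message text are our own choices), cat() is called once per element for its side-effect, and the value returned by walk() is the original input.

```r
library(purrr)

nums <- c(2, 4, 6)

# cat() is called for its side-effect: printing to the Console
returned <- walk(nums, ~ cat("value:", ., "\n"))

# walk() hands back its input, not the results of cat()
identical(returned, nums)
```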
Here we use walk() only for its side-effect: we re-write a familiar function to print a pattern to the Console without using a loop.
pattern <- function(char = "*", n = 5) {
  lineLength <- c(1:n, (n-1):1)
  theLine <- function(char, n) {
    cat(rep(char, times = n), "\n", sep = "")
  }
  lineLength %>% walk(theLine, char = char)
}
pattern(char = "a", n = 7)
## a
## aa
## aaa
## aaaa
## aaaaa
## aaaaaa
## aaaaaaa
## aaaaaa
## aaaaa
## aaaa
## aaa
## aa
## a
The next example illustrates the use of the return-value of walk(). We would like to save plots of all numerical variables from the data frame m111survey, and also print summaries of them to the Console.
First we create a directory to hold the plots:
if ( !dir.exists("plots") ) dir.create("plots")
Next, we get the numerical variables in m111survey:
numericals <-
  m111survey %>%
  keep(is.numeric) # purrr::keep()
We used purrr::keep(), which retains only the elements of its input .x for which its second argument .p (a function that returns a single TRUE or FALSE) returns TRUE.
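Since keep() works on any list, not only on data frames, it can be explored with a small list. In the sketch below, mixed is a made-up list of ours; we also show purrr's discard(), which is the complement of keep().

```r
library(purrr)

# mixed is a hypothetical list with elements of several types
mixed <- list(a = 1.5, b = "two", c = 3L, d = TRUE)

kept <- keep(mixed, is.numeric)        # elements for which .p is TRUE
dropped <- discard(mixed, is.numeric)  # elements for which .p is FALSE

names(kept)
names(dropped)
```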
We will also need the names of the numerical variables:
numNames <-
  numericals %>%
  names()
We need a function to save the density plot of a single numerical variable:
saveGraph <- function(var, varname) {
  p <-
    ggplot(data = NULL, aes(x = var)) +
    geom_density(fill = "burlywood") +
    labs(title = paste0("Density plot for ",
                        varname, "."))
  ggsave(filename = paste0("plots/density_", varname, ".png"),
         plot = p, device = "png")
}
We also need a function to produce a summary of a single numerical variable:
makeSummary <- function(x, varname) {
  five <- fivenum(x, na.rm = TRUE)
  list(variable = varname,
       min = five[1],
       Q1 = five[2],
       median = five[3],
       Q3 = five[4],
       max = five[5])
}
Now we walk through the process. We will actually use the function pwalk(), which will take the following inputs:
.l (a list with two elements: the data frame of numerical variables and the vector of the names of these variables), and
.f (the function saveGraph, to make and save a density plot).
We also use pmap_dfr(), which takes a list consisting of the data frame and variable-names and constructs a data frame row-by-row, with each row summarizing one of the variables.
list(numericals, numNames) %>%
pwalk(saveGraph) %>% # returns the list
pmap_dfr(makeSummary)
## # A tibble: 6 x 6
## variable min Q1 median Q3 max
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 height 51.0 65.0 68.000 71.75 79
## 2 ideal_ht 54.0 67.0 68.000 75.00 90
## 3 sleep 2.0 5.0 7.000 7.00 10
## 4 fastest 60.0 90.5 102.000 119.50 190
## 5 GPA 1.9 2.9 3.225 3.56 4
## 6 diff.ideal.act. -4.0 0.0 2.000 3.00 18
Check the plots directory; it should contain these files:
density_diff.ideal.act.png
density_fastest.png
density_GPA.png
density_height.png
density_ideal_ht.png
density_sleep.png
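The same walk-then-map pattern can be tried without ggplot2 or the survey data. The sketch below is a simplified stand-in under our own assumptions: the list samples, the helpers saveValues() and describe(), and the temporary directory are all hypothetical, and plain text files take the place of the plots.

```r
library(purrr)
library(magrittr)  # for %>%

# samples is a hypothetical named list standing in for the numerical columns
samples <- list(a = c(1, 2, 3), b = c(10, 20))

# write into a fresh temporary directory rather than the working directory
outDir <- tempfile("walk-demo-")
dir.create(outDir)

# side-effect function: save one element to a text file
saveValues <- function(x, name) {
  writeLines(as.character(x), file.path(outDir, paste0(name, ".txt")))
}

# summary function: describe one element in a single string
describe <- function(x, name) {
  paste0(name, ": mean = ", mean(x))
}

summaries <-
  list(samples, names(samples)) %>%
  pwalk(saveValues) %>%   # side-effect: one file per element
  pmap_chr(describe)      # pwalk() passed the list along unchanged

summaries
sort(list.files(outDir))
```

Because pwalk() returns its input, the pipeline needs to build the two-element list only once.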
14.3.3 Practice Exercises
1. Use map() to produce a list of the squares of the whole numbers from 1 to 10.
2. Use map_dbl() to produce a numerical vector of the squares of the whole numbers from 1 to 10.
3. Use map_chr() to state the type of each element of the following list:
lst <- list(letters, seq(2, 20, by = 2), c(1L, 5L, 7L), 1:10 > 5.5)
4. Here are some people:
people <- c("Bettina", "Raj", "Isabella", "Khalil")
The following vector tells whether or not each person is a Grand Poo-Bah:
status <- c("humble", "poobah", "poobah", "humble")
Use pwalk() to properly greet each person. The result in the console should be as follows:
## Yo, dawg.
## Hail, O Grand Poo-Bah Raj!
## Hail, O Grand Poo-Bah Isabella!
## Yo, dawg.
14.3.4 Solutions to the Practice Exercises
1. Try this:
map(1:10, ~.^2)
## [[1]]
## [1] 1
##
## [[2]]
## [1] 4
##
## [[3]]
## [1] 9
##
## [[4]]
## [1] 16
##
## [[5]]
## [1] 25
##
## [[6]]
## [1] 36
##
## [[7]]
## [1] 49
##
## [[8]]
## [1] 64
##
## [[9]]
## [1] 81
##
## [[10]]
## [1] 100
This is more verbose, but works just as well:
map(1:10, function(x) x^2)
## [[1]]
## [1] 1
##
## [[2]]
## [1] 4
##
## [[3]]
## [1] 9
##
## [[4]]
## [1] 16
##
## [[5]]
## [1] 25
##
## [[6]]
## [1] 36
##
## [[7]]
## [1] 49
##
## [[8]]
## [1] 64
##
## [[9]]
## [1] 81
##
## [[10]]
## [1] 100
2. Try this:
map_dbl(1:10, ~.^2)
## [1] 1 4 9 16 25 36 49 64 81 100
Again the more verbose approach works just as well:
map_dbl(1:10, function(x) x^2)
## [1] 1 4 9 16 25 36 49 64 81 100
3. Try this:
map_chr(lst, typeof)
## [1] "character" "double" "integer" "logical"
4. Try this:
list(people, status) %>%
  pwalk(function(person, type) {
    if ( type == "poobah" ) {
      cat("Hail, O Grand Poo-Bah ", person, "!\n", sep = "")
    } else {
      cat("Yo, dawg.\n")
    }
  })