14 Functional Programming in R
It was simple, but you know, it’s always simple when you’ve done it.
—Simone Gabbriellini

In this Chapter we aren’t going to cover any fundamentally new R powers. Instead we’ll get acquainted with just one aspect of a computer programming paradigm known as functional programming. We will examine a set of R functions for which functions themselves are supplied as arguments. These functions allow us to accomplish a great deal of computation in rather concise and expressive code. Not only are they useful in R itself, but they help you to reason abstractly about computation and prepare you for functional-programming aspects of other programming languages.
14.1 Programming Paradigms
Let us begin by exploring the notion of a programming paradigm in general. We will go on in this Chapter to consider two programming paradigms for which R provides considerable support. In the next Chapter we will consider a third programming paradigm that exists in R.
A programming paradigm is a way to describe some of the features of programming languages. Often a paradigm includes principles concerning the use of these features, or embodies a view that these features have special importance and utility in good programming practice.
14.1.1 Procedural Programming
One of the older programming paradigms in existence is procedural programming. It is supported in many popular languages and is often the first paradigm within which beginners learn to program. In fact, if one’s programming does not progress beyond a rudimentary level, one may never become aware that one is working within the procedural paradigm—or any paradigm at all, for that matter.
Before we define procedural programming, let’s illustrate it with an example. Almost any of the programs we have written so far would do as examples; for specificity, let’s consider the following snippet of code that produces from the data frame m111survey
a new, smaller frame consisting of just the numerical variables:
# find the number of columns in the data frame:
cols <- length(names(m111survey))
# set up a logical vector of length equal to the number of columns:
is_numerical <- logical(cols)
# loop through. For each variable, say if it is numerical:
for (i in seq_along(is_numerical)) {
  is_numerical[i] <- is.numeric(m111survey[, i])
}
# pick the numerical variables from the data frame:
num_summ_111 <- m111survey[, is_numerical]
# have a look at the result:
str(num_summ_111)
## 'data.frame': 71 obs. of 6 variables:
## $ height : num 76 74 64 62 72 70.8 70 79 59 67 ...
## $ ideal_ht : num 78 76 NA 65 72 NA 72 76 61 67 ...
## $ sleep : num 9.5 7 9 7 8 10 4 6 7 7 ...
## $ fastest : int 119 110 85 100 95 100 85 160 90 90 ...
## $ GPA : num 3.56 2.5 3.8 3.5 3.2 3.1 3.68 2.7 2.8 NA ...
## $ diff.ideal.act.: num 2 2 NA 3 0 NA 2 3 2 0 ...
By now there is nothing mysterious about the above code snippet. What we want to become conscious of is the approach we have taken to the problem of selecting the numerical variables. In particular, observe that:
- We worked throughout with data, some of which, like m111survey, was given to us and some of which we created on our own to help solve the problem. For example, we created the variable cols. Note also the very helpful index variable i in the for loop. We set up the data structure is_numerical in order to hold a set of data (TRUEs and FALSEs).
- We relied on various procedures to create data and to manipulate that data in order to produce the desired result. Some of the procedures appeared as special blocks of code—most notably the for loop. Other procedures took the form of functions. As we know, a function encapsulates a useful procedure so that it can be easily reused in a wide variety of circumstances, without the user having to know the details of how it works. We know that names() will give us the vector of names of the columns of m111survey, that length() will tell us how many names there are, that is.numeric() will tell us whether or not a given variable in m111survey is a numerical variable, and so on. The procedures embodied in these functions were written by other folks and we could examine them if we had the time and interest, but for the most part we are content simply to know how to access them.
Procedural programming is a paradigm that solves problems with programs that can be broken up into collections of variables, data structures and procedures. In this paradigm, there is a sharp distinction between variables and data structures on the one hand and procedures on the other. Variables and data structures are data—they are the “stuff” that a program manipulates to produce other data, other “stuff.” Procedures do the manipulating, turning stuff into other stuff.
14.2 The Functional Programming Paradigm
Let us now turn to the second of the two major programming paradigms that we study in this Chapter: Functional Programming.
14.2.1 The Ubiquity of Functions in R
Let’s look a bit more closely at our code snippet. Notice how prominently functions figure in it, on nearly every line. In fact, every line calls at least one function! This might seem unbelievable: after all, consider the line below:

num_summ_111 <- m111survey[, is_numerical]
There don’t appear to be any functions being called here! But in fact two functions get called:

- The so-called assignment operator <- is actually a function in disguise: the more official—albeit less readable—form of variable <- value is `<-`(variable, value). Thus, to assign the value 3 to the variable a one could write:

`<-`(a, 3)
a # check that a is really 3

## [1] 3

- The subsetting operator for vectors [, more formally known as extraction (see help(Extract)), is also a function. The expression m111survey[, is_numerical] is actually the following function call in disguise: `[`(m111survey, , is_numerical)
Indeed, functions are ubiquitous in R. This is part of the significance of the following well-known remark by a developer of S, the precursor language of R:
“To understand computations in R, two slogans are helpful:

- Everything that exists is an object.
- Everything that happens is a function call.”

—John Chambers
The second slogan indicates that functions are everywhere in R. It also corresponds to the first principle of the functional programming paradigm, namely:
Computation is regarded as the evaluation of functions.
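As a small added illustration of this slogan, even ordinary arithmetic in R is a function call under the hood:

```r
# Infix operators are functions: these two expressions are equivalent.
2 + 3        # the familiar infix form
`+`(2, 3)    # the same computation written as an explicit function call; both give 5
```

When R evaluates `2 + 3`, it looks up the function bound to the name `+` and calls it, exactly as it would for any other function.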
14.2.2 Functions as First-Class Citizens
So functions are ubiquitous in R. Another interesting thing about them is that even though they seem to be associated with procedures—after all, they make things happen—they are, nevertheless, also objects. They are data, or “stuff” if you like.
This may not seem obvious at first. But look at the following code, where you can ask what type of thing a function is:
typeof(is.numeric)
## [1] "builtin"
The so-called “primitive” functions of R—the functions written not in R but in C code—are “builtin” objects. On the other hand, consider this user-defined function:
f <- function(x) x + 3
typeof(f)
## [1] "closure"
Functions other than primitive functions are objects of type “closure.”^{34}
If a function can be a certain type of thing, then it must be a “thing”—an object, something you can manipulate. For example, you can put functions in a list:
lst <- list(is.numeric, f)
lst
## [[1]]
## function (x) .Primitive("is.numeric")
##
## [[2]]
## function(x) x+3
Very importantly, you can make functions serve as arguments for other functions, and functions can return other functions as their results. The following example demonstrates both of these possibilities.
cuber <- function(f) {
  g <- function(x) f(x)^3
  g
}
h <- cuber(abs)
h(-2) # returns abs(-2)^3 = 2^3 = 8
## [1] 8
In fact, in R functions can be treated just like any variable. In computer programming, we say that such functions are first-class citizens.
Although it is not often stated as a separate principle of the functional programming paradigm, in languages that provide support for functional programming the following principle holds true:
Functions are firstclass citizens.
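As a small sketch (not from the text) of what first-class status buys us, we can store functions in a named list as data and select one by name at run time; the names `ops`, `square`, and `double` here are illustrative:

```r
# Functions stored as data: a named list of operations.
ops <- list(
  square = function(x) x^2,
  double = function(x) 2 * x
)

# Select a function by name at run time, then call it:
chosen <- "square"
ops[[chosen]](7)  # 49
```

This kind of "dispatch table" is a common functional-programming idiom: the choice of procedure is itself just a piece of data.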
14.2.3 Minimize Side Effects
In the code snippet under consideration, we note that there are two types of functions:

- functions that return a value;
- functions that produce output to the console or make a change in the Global Environment.

Examples of the first type of function include:

- length()
- names()
- seq_along()
- is.numeric()
- the extraction function `[`()

A function that produced output to the console was str().

The assignment function `<-`() added cols, is_numerical and num_summ_111 to the Global Environment, and also made changes to is_numerical in the course of the for loop.
Of course we have seen examples of functions that do both of these things at once, for example:

my_fun <- function(x) {
  cat("my_fun is running!\n") # output to console
  x + 3 # return a value
}
my_fun(6)
## my_fun is running!
## [1] 9
In computer programming, output to the console, along with changes of state—changes to the Global Environment or to the file structure of your computer—are called side effects. Functions that only return values and do not produce side effects are called pure functions.
A third principle of the functional programming paradigm is:
Functions should be pure.
Now this principle is difficult to adhere to, and in fact if you were to adhere strictly to it in R then your programs would never “do” anything. There do exist quite practical programming languages in which all of the functions are pure—and this leads to some very interesting features, such as that the order in which operations are evaluated doesn’t affect what a function returns—but these “purely functional” languages manage purity by having other objects besides functions produce the necessary side effects. In R we happily let our functions have side effects: we certainly want to do some assignment, and print things out to the console from time to time.
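To make the contrast concrete, here is a small sketch (not from the text) of a pure function alongside an impure one; the names are illustrative:

```r
# Pure: the return value depends only on the arguments, and nothing else happens.
pure_add <- function(x, y) x + y

# Impure: same return value, but it also produces a side effect (console output).
noisy_add <- function(x, y) {
  cat("adding", x, "and", y, "\n")
  x + y
}

pure_add(2, 3)   # 5, silently
noisy_add(2, 3)  # 5, but a message is printed along the way
```

A caller who only looks at return values cannot tell these apart, which is exactly why hidden side effects can surprise us.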
One way that R does support the third principle of functional programming is that it makes it easy to avoid having your functions modify the Global Environment. To see this consider the following example:
add_three <- function(x) {
  heavenly_hash <- 5
  x + 3 # returns this value
}
result <- add_three(10)
result
heavenly_hash
## [1] 13
## Error: object 'heavenly_hash' not found
This is as we expect: the variable heavenly_hash exists only in the runtime environment that is created in the call to add_three(). As soon as the function finishes execution that environment dies, and heavenly_hash dies along with it. In particular, it never becomes part of the Global Environment.
If you really want your functions to modify the Global Environment—or any environment other than their runtime environment, for that matter—then you have to take special measures. You could, for example, use the superassignment operator <<- :

add_three_side_effect <- function(x) {
  heavenly_hash <<- 5
  x + 3 # returns this value
}
result <- add_three_side_effect(10)
result
result < add_three_side_effect(10)
result
## [1] 13
heavenly_hash
## [1] 5
The superassignment operator looks for the name heavenly_hash in the parent environment of the runtime environment. If it finds heavenly_hash there then it changes its value to 5 and stops. Otherwise it looks in the next parent up, and so on, until it reaches the Global Environment, at which point, if it doesn’t find a heavenly_hash, it creates one and gives it the value 5. In the example above, assuming you ran the function from the console, the parent environment is the Global Environment and the function has made a change to it: a side effect.
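In fact the environment that <<- modifies need not be the Global Environment at all. A classic illustration (a sketch, not from the text) is a counter whose <<- updates a variable in the enclosing function’s environment:

```r
make_counter <- function() {
  count <- 0
  function() {
    # `<<-` finds `count` in make_counter's runtime environment
    # and updates it there; the Global Environment is never touched.
    count <<- count + 1
    count
  }
}

counter <- make_counter()
counter()  # 1
counter()  # 2
```

Each call to make_counter() produces an independent counter, because each call creates a fresh enclosing environment for the returned function.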
Except in the case of explicit assignment functions like `<-`(), changes made by functions to the Global Environment can be quite problematic. After all, we are used to using functions without having to look inside them to see how they do their work. Even if we once wrote the function ourselves, we may not remember how it works, so if it creates side effects we may not remember that it does, and calling it could interfere with other important work that the program is doing. (If the program already has heavenly_hash in the Global Environment and then we call a function that changes its value, we could be in for big trouble.) Accordingly, R supports the third principle of functional programming to the extent of making it easy for you to avoid function calls that change your Global Environment.
14.2.4 Procedures as Higher-Order Function Calls
The last principle of the functional programming paradigm that we will state here isn’t really a formal principle: it is more an indication of the programming style that prevails in languages where functions are first-class objects and that provide other support for functional programming. The final principle is:

As much as possible, procedures should be accomplished by function calls. In particular, loops should be replaced by calls to higher-order functions.
A higher-order function is simply a function that takes other functions as arguments. R provides a nice set of higher-order functions, many of which substitute for iterative procedures such as loops. In subsequent sections we will study some of the most important higher-order functions, and see how they allow us to express some fairly complex procedures in a concise and readable way. You will also see how this style really blurs the distinction—so fundamental to procedural programming—between data and procedures. In functional programming, functions ARE data, and procedures are just function calls.
14.2.5 Functional Programming: A Summary
For our purposes, the principles of the functional programming paradigm are as follows:

- Computation consists in the evaluation of functions.
- Functions are first-class citizens in the language.
- Functions should only return values; they should not produce side effects. (At the very least they should not modify the Global Environment unless they are dedicated to assignment in the first place.)
- As much as possible, procedures should be written in terms of function calls. In particular, loops should be replaced by calls to higher-order functions.
14.3 purrr Higher-Order Functions for Iteration
In the remainder of the Chapter we will study important higher-order functions: functions that take a function as an argument and apply that function to each element of another data structure. As we have said previously, such functions often serve as alternatives to loops.
The higherorder functions we study come from the package purrr, which is attached whenever we load the tidyverse.
14.3.1 map() and Variations
Suppose that we want to generate five vectors, each of which consists of ten numbers randomly chosen between 0 and 1. We accomplish the task with a loop, as follows:
# set up a list of length 5:
lst <- vector(mode = "list", length = 5)
for (i in 1:5) {
  lst[[i]] <- runif(10)
}
str(lst)
## List of 5
## $ : num [1:10] 0.271 0.189 0.267 0.956 0.473 ...
## $ : num [1:10] 0.5677 0.9629 0.5131 0.0181 0.7333 ...
## $ : num [1:10] 0.268 0.477 0.263 0.107 0.608 ...
## $ : num [1:10] 0.2411 0.3267 0.0647 0.1426 0.5102 ...
## $ : num [1:10] 0.364 0.524 0.604 0.119 0.835 ...
If we wanted the vectors to have length \(1, 4, 9, 16,\) and 25, then we could write:

lst <- vector(mode = "list", length = 5)
for (i in 1:5) {
  lst[[i]] <- runif(i^2)
}
str(lst)
## List of 5
## $ : num 0.647
## $ : num [1:4] 0.394 0.619 0.477 0.136
## $ : num [1:9] 0.06738 0.12915 0.39312 0.00258 0.62021 ...
## $ : num [1:16] 0.409 0.54 0.961 0.654 0.547 ...
## $ : num [1:25] 0.96407 0.07147 0.95581 0.94798 0.00119 ...
In the first example, the elements in the vector 1:5 didn’t matter—we wanted a vector of length ten each time—whereas in the second example the elements of 1:5 did matter, in that they determined the lengths of the five vectors produced. Of course, in general we could apply runif() to each element of any vector at all, like this:
vec <- c(5, 7, 8, 2, 9)
lst <- vector(mode = "list", length = length(vec))
for (i in seq_along(vec)) {
  lst[[i]] <- runif(vec[i])
}
str(lst)
## List of 5
## $ : num [1:5] 0.647 0.394 0.619 0.477 0.136
## $ : num [1:7] 0.06738 0.12915 0.39312 0.00258 0.62021 ...
## $ : num [1:8] 0.826 0.423 0.409 0.54 0.961 ...
## $ : num [1:2] 0.1968 0.0779
## $ : num [1:9] 0.818 0.942 0.884 0.166 0.355 ...
If we can apply runif() to each element of a vector, why not apply an arbitrary function to each element? That’s what the function map() will do for us. The general form of map() is:

map(.x, .f, ...)
In the template above:

- .x can be a list or any atomic vector;
- .f is a function that is to be applied to each element of .x. In the default operation of map(), each element of .x becomes in turn the first argument of .f;
- ... consists of other arguments that are supplied as arguments to the .f function, in case you have to set other parameters of the function in order to get it to perform in the way you would like.

The result is always a list.
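Before returning to the runif() examples, here is a minimal added illustration of the template: applying sqrt() to each element of a short numeric vector.

```r
library(purrr)

# map() applies sqrt() to each element of the vector;
# the result is a list with one element per input element.
map(c(1, 4, 9), sqrt)
# a list containing 1, 2, and 3
```

Nothing beyond .x and .f is needed here, since sqrt() takes a single argument.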
With map() we can get the list in our last example as follows:

vec %>%
  map(runif) %>%
  str()
## List of 5
## $ : num [1:5] 0.647 0.394 0.619 0.477 0.136
## $ : num [1:7] 0.06738 0.12915 0.39312 0.00258 0.62021 ...
## $ : num [1:8] 0.826 0.423 0.409 0.54 0.961 ...
## $ : num [1:2] 0.1968 0.0779
## $ : num [1:9] 0.818 0.942 0.884 0.166 0.355 ...
If we had wanted the random numbers to be between—say—4 and 8, then we would supply extra arguments to runif() as follows:

vec %>%
  map(runif, min = 4, max = 8) %>%
  str()
## List of 5
## $ : num [1:5] 6.59 5.58 6.47 5.91 4.54
## $ : num [1:7] 4.27 4.52 5.57 4.01 6.48 ...
## $ : num [1:8] 7.3 5.69 5.64 6.16 7.84 ...
## $ : num [1:2] 4.79 4.31
## $ : num [1:9] 7.27 7.77 7.54 4.66 5.42 ...
The default behavior of map() is that the .x vector supplies the first argument of .f. However, if some ... parameters are supplied, then .x substitutes for the first parameter that is not mentioned in ... . In the above example, the min and max parameters are the second and third parameters of runif(), so .x substitutes for the first parameter—the one that determines how many random numbers will be generated. In the example below, the vector lower_bounds
substitutes for min, the second parameter of runif():

lower_bounds <- c(3, 6, 6)
lower_bounds %>%
  map(runif, n = 2, max = 8)
## [[1]]
## [1] 3.008575 5.964877
##
## [[2]]
## [1] 6.514459 6.282105
##
## [[3]]
## [1] 6.688118 7.431739
Sometimes we wish to vary two or more of the parameters of a function. In that case we use pmap(). The first parameter of pmap() is named .l and takes a list of vectors (or lists). For example:
how_many <- c(3, 1, 4)
upper_bounds <- c(1, 5, 10)
list(how_many, upper_bounds) %>%
  pmap(runif, min = 0)
## [[1]]
## [1] 0.4142409 0.5140328 0.6190231
##
## [[2]]
## [1] 0.193564
##
## [[3]]
## [1] 3.0889976 7.4509632 3.6170952 0.2348365
Observe that pmap() knows to interpret the first element of the input list—the vector how_many—as giving the values of the first argument of runif(). The second parameter of runif() (min) is set at 0, so pmap() deduces that upper_bounds, the second element of the input list, gives the values for the next parameter in line, the parameter max.
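If you prefer not to rely on positional matching, you can also name the elements of the input list after the parameters of .f, and pmap() will match them by name. A small added sketch:

```r
library(purrr)

# Named list elements are matched to runif()'s parameters by name,
# so the role of each vector is explicit:
list(n = c(2, 3), max = c(1, 10)) %>%
  pmap(runif, min = 0)
# a list of two vectors: 2 numbers in [0, 1], then 3 numbers in [0, 10]
```

Naming the elements makes the code robust against any confusion about parameter order.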
One might just as well use pmap() to vary all three parameters:
how_many <- c(3, 1, 4)
lower_bounds <- c(-5, 0, 5)
upper_bounds <- c(0, 5, 10)
args <- list(how_many, lower_bounds, upper_bounds)
args %>%
  pmap(runif) %>%
  str()
## List of 3
## $ : num [1:3] -0.00743 -2.35882 -3.90104
## $ : num 1.01
## $ : num [1:4] 5.38 9.16 8.45 9.67
The .f parameter can be any function, including one that you define yourself. Here’s an example:
r_letters <- function(n, upper) {
  if (upper) {
    sample(LETTERS, size = n, replace = TRUE)
  } else {
    sample(letters, size = n, replace = TRUE)
  }
}
# vary number of letters to pick
sample_sizes <- c(3, 6, 9)
# vary the case (upper, lower)
uppercase <- c(TRUE, FALSE, TRUE)
list(sample_sizes, uppercase) %>%
  pmap(r_letters)
## [[1]]
## [1] "O" "G" "A"
##
## [[2]]
## [1] "x" "s" "p" "m" "n" "u"
##
## [[3]]
## [1] "R" "I" "Q" "M" "D" "E" "Y" "M" "O"
You could also set .f to be a function that you write on the spot, without even bothering to give it a name:

c(1, 3, 5) %>%
  map(function(x) runif(3, min = 0, max = x))
## [[1]]
## [1] 0.2002036 0.2269999 0.5505148
##
## [[2]]
## [1] 1.250093 0.430766 2.521417
##
## [[3]]
## [1] 4.861459 2.029991 2.718963
In computer programming a function is called anonymous when it is not the value bound to some name. map() allows a shortcut for defining anonymous functions. The above call could have been written as:

c(1, 3, 5) %>%
  map(~ runif(3, min = 0, max = .))
## [[1]]
## [1] 0.4939934 0.8951945 0.7801631
##
## [[2]]
## [1] 2.5637547 0.6993537 2.7956469
##
## [[3]]
## [1] 4.790131 4.959646 2.544537
The ~ indicates that the body of the function is about to begin. The . stands for the parameter of the function.
When we introduced map() we said that .x was a vector or a list. In fact .x can be any object that can be coerced into a list. Hence it is quite common to use map() with data frames: the frame is turned into a list, each element of which is a column of the frame. Here is an example:
data("m111survey", package = "bcscr")
number_na <-
  m111survey %>%
  map(~ sum(is.na(.)))
str(number_na)
## List of 12
## $ height : int 0
## $ ideal_ht : int 2
## $ sleep : int 0
## $ fastest : int 0
## $ weight_feel : int 0
## $ love_first : int 0
## $ extra_life : int 0
## $ seat : int 0
## $ GPA : int 1
## $ enough_Sleep : int 0
## $ sex : int 0
## $ diff.ideal.act.: int 2
Note that the elements of the returned list inherit the names of the input data frame. This holds for any named input:
numbers <- c(1, 3, 5)
names(numbers) <- c("one", "three", "five")
numbers %>%
  map(~ runif(3, min = 0, max = .))
## $one
## [1] 0.20446027 0.01439487 0.95556547
##
## $three
## [1] 1.918565 1.987142 2.199431
##
## $five
## [1] 4.372549 4.764371 2.637053
When the result can take on a form more simple than a list, it is possible to use variants of map() such as:

- map_int() (simplifies the result to an integer vector);
- map_dbl() (a vector of doubles);
- map_chr() (a character vector);
- map_lgl() (a logical vector).

Thus we could obtain a named integer vector of the number of NA values for each variable in m111survey as follows:

m111survey %>%
  map_int(~ sum(is.na(.)))
## height ideal_ht sleep fastest
## 0 2 0 0
## weight_feel love_first extra_life seat
## 0 0 0 0
## GPA enough_Sleep sex diff.ideal.act.
## 1 0 0 2
Here are the types of each variable:

m111survey %>%
  map_chr(typeof)
## height ideal_ht sleep fastest
## "double" "double" "double" "integer"
## weight_feel love_first extra_life seat
## "integer" "integer" "integer" "integer"
## GPA enough_Sleep sex diff.ideal.act.
## "double" "integer" "integer" "double"
Here is a statement of whether or not each variable is a factor:

m111survey %>%
  map_lgl(is.factor)
## height ideal_ht sleep fastest
## FALSE FALSE FALSE FALSE
## weight_feel love_first extra_life seat
## TRUE TRUE TRUE TRUE
## GPA enough_Sleep sex diff.ideal.act.
## FALSE TRUE TRUE FALSE
14.3.2 walk() and Variations
walk() is similar to map(), but is used when we are interested in producing side effects. It applies its .f argument to each element of the .x it was given, but it also returns .x, in case we want to pipe it into some other function.
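A small added sketch of this behavior: the cat() calls happen as side effects, while the original vector passes through the pipe unchanged.

```r
library(purrr)

result <-
  c(1, 2, 3) %>%
  walk(~ cat("saw", ., "\n")) %>%  # side effect: prints each element
  sum()                            # walk() returned c(1, 2, 3), so the sum is 6
result
```

This "pass-through" behavior is what lets us bolt side effects onto the middle of a pipeline without disturbing the data flowing through it.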
Here we use walk()
only for its sideeffect: we rewrite a familiar function to print a pattern to the Console without using a loop.
pattern <- function(char = "*", n = 5) {
  line_length <- c(1:n, (n - 1):1)
  the_line <- function(char, n) {
    cat(rep(char, times = n), "\n", sep = "")
  }
  line_length %>% walk(the_line, char = char)
}
pattern(char = "a", n = 7)
## a
## aa
## aaa
## aaaa
## aaaaa
## aaaaaa
## aaaaaaa
## aaaaaa
## aaaaa
## aaaa
## aaa
## aa
## a
The next example illustrates the use of the return value of walk(). We would like to save plots of all numerical variables from the data frame m111survey, and also print summaries of them to the Console.
First we create a directory to hold the plots:
if (!dir.exists("plots")) dir.create("plots")
Next, we get the numerical variables in m111survey:

num_vars <-
  m111survey %>%
  keep(is.numeric)
We used purrr::keep(), which retains only the elements of its input .x for which its second argument .p (a function that returns a single TRUE or FALSE) returns TRUE.
We will also need the names of the numerical variables:

num_names <- names(num_vars)
We need a function to save the density plot of a single numerical variable:
save_graph <- function(var, varname) {
  p <-
    ggplot(data = NULL, aes(x = var)) +
    geom_density(fill = "burlywood") +
    labs(title = paste0(
      "Density plot for ",
      varname, "."
    ))
  ggsave(
    filename = paste0("plots/density_", varname, ".png"),
    plot = p, device = "png"
  )
}
We also need a function to produce a summary of a single numerical variable:
make_summary <- function(x, varname) {
  five <- fivenum(x, na.rm = TRUE)
  list(
    variable = varname,
    min = five[1],
    Q1 = five[2],
    median = five[3],
    Q3 = five[4],
    max = five[5]
  )
}
Now we walk through the process. We will actually use the function pwalk(), which will take the following inputs:

- .l (a list with two elements: the data frame of numerical variables and the vector of the names of these variables), and
- .f (the function save_graph, to make and save a density plot).
We also use pmap_dfr(), which takes a list consisting of the data frame and the variable names and constructs a data frame row by row, with each row summarizing one of the variables.

list(num_vars, num_names) %>%
  pwalk(save_graph) %>%
  pmap_dfr(make_summary)
## # A tibble: 6 x 6
## variable min Q1 median Q3 max
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 height 51.0 65.0 68.000 71.75 79
## 2 ideal_ht 54.0 67.0 68.000 75.00 90
## 3 sleep 2.0 5.0 7.000 7.00 10
## 4 fastest 60.0 90.5 102.000 119.50 190
## 5 GPA 1.9 2.9 3.225 3.56 4
## 6 diff.ideal.act. -4.0 0.0 2.000 3.00 18
Check the plots
directory; it should contain these files:
density_diff.ideal.act.png
density_fastest.png
density_GPA.png
density_height.png
density_ideal_ht.png
density_sleep.png
14.3.3 Example: Flowery Meadow Redux
In Section 4.3.3.1 we simulated people walking through a meadow, picking flowers until they had picked a desired number of flowers of a desired color. In Section 9.3.3 we used lists to store the results of such a simulation. Now we’ll see how to store the results as a data frame.
First, we modify the helper function that simulates one person picking flowers so that, instead of returning a vector of colors, it returns a data frame:

## colors in the field:
flower_colors <- c("blue", "red", "pink", "crimson", "orange")
## new helper function:
walk_meadow_df <- function(person, color, wanted) {
  picking <- TRUE
  ## the following will be extended to hold the flowers picked:
  flowers_picked <- character()
  desired_count <- 0
  while (picking) {
    picked <- sample(flower_colors, size = 1)
    flowers_picked <- c(flowers_picked, picked)
    if (picked == color) desired_count <- desired_count + 1
    if (desired_count == wanted) picking <- FALSE
  }
  ## return a data frame:
  data.frame(
    person = rep(person, times = length(flowers_picked)),
    color = flowers_picked
  )
}
Note that the new function takes an extra parameter person, the name of the person picking the flowers.
Let’s try it out:
walk_meadow_df("Scarecrow", "red", 1)
## person color
## 1 Scarecrow blue
## 2 Scarecrow blue
## 3 Scarecrow crimson
## 4 Scarecrow crimson
## 5 Scarecrow crimson
## 6 Scarecrow crimson
## 7 Scarecrow red
Now we write the function to make the data frame of results for a group of people. pmap() will come in handy.
all_walk_df <- function(people, favs, numbers) {
  ## put the arguments into a list:
  list(people, favs, numbers) %>%
    ## run it through pmap() to get a list of data frames:
    pmap(walk_meadow_df) %>%
    ## the following purrr function converts the list of df's
    ## into one data frame, binding their rows together:
    list_rbind()
}
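The behavior of list_rbind() itself can be seen in isolation. Here is a tiny added sketch, with invented data frames, of how it stacks a list of data frames into one:

```r
library(purrr)

dfs <- list(
  data.frame(person = "A", color = c("red", "blue")),
  data.frame(person = "B", color = "pink")
)

# list_rbind() binds the rows of the data frames into one data frame:
combined <- list_rbind(dfs)
combined  # three rows: A/red, A/blue, B/pink
```

Since pmap(walk_meadow_df) produces exactly such a list of per-person data frames, list_rbind() is the natural final step.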
Let’s try it out:
results <-
  all_walk_df(
    people = c("Dorothy", "Toto"),
    favs = c("blue", "orange"),
    numbers = c(4, 2)
  )
Here are the results:

results
14.3.4 Practice Exercises

1. Use map() to produce a list of the squares of the whole numbers from 1 to 10.

2. Use map_dbl() to produce a numerical vector of the squares of the whole numbers from 1 to 10.

3. Use map_chr() to state the type of each element of the following list:

lst <- list("hello", 3.14, 7L, TRUE)

4. Here are some people:

people <- c("Bettina", "Raj", "Isabella", "Khalil")

The following vector tells whether or not each person is a Grand Poo-Bah:

status <- c("humble", "poobah", "poobah", "humble")

Use pwalk() to properly greet each person. The result in the console should be as follows:

## Yo, dawg.
## Hail, O Grand Poo-Bah Raj!
## Hail, O Grand Poo-Bah Isabella!
## Yo, dawg.
14.3.5 Solutions to the Practice Exercises

1. Try this:

map(1:10, ~ .^2)

This is more verbose, but works just as well:

map(1:10, function(x) x^2)

2. Try this:

map_dbl(1:10, ~ .^2)

## [1] 1 4 9 16 25 36 49 64 81 100

Again the more verbose approach works just as well:

map_dbl(1:10, function(x) x^2)

3. Try this:

map_chr(lst, typeof)

## [1] "character" "double" "integer" "logical"

4. Try this:

greet <- function(person, status) {
  if (status == "poobah") {
    cat("Hail, O Grand Poo-Bah ", person, "!\n", sep = "")
  } else {
    cat("Yo, dawg.\n")
  }
}
list(people, status) %>%
  pwalk(greet)
14.4 Other purrr Higher-Order Functions
14.4.1 keep() and discard()
keep() is similar to dplyr’s filter(), but whereas filter() chooses rows of a data frame based on a given condition, keep() chooses the elements of the input list or vector .x based on a condition named .p.
Examples:

1:20 %>%
  keep(~ . %% 3 == 1)
## [1] 1 4 7 10 13 16 19
m111survey %>%
  keep(is.factor) %>%
  str()
## 'data.frame': 71 obs. of 6 variables:
## $ weight_feel : Factor w/ 3 levels "1_underweight",..: 1 2 2 1 1 3 2 2 2 3 ...
## $ love_first : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ extra_life : Factor w/ 2 levels "no","yes": 2 2 1 1 2 1 2 2 2 1 ...
## $ seat : Factor w/ 3 levels "1_front","2_middle",..: 1 2 2 1 3 1 1 3 3 2 ...
## $ enough_Sleep: Factor w/ 2 levels "no","yes": 1 1 1 1 1 2 1 2 1 2 ...
## $ sex : Factor w/ 2 levels "female","male": 2 2 1 1 2 2 2 2 1 1 ...
discard(.x, .p = condition) is equivalent to keep(.x, .p = !condition). Thus:

1:20 %>%
  discard(~ . %% 3 == 1)
## [1] 2 3 5 6 8 9 11 12 14 15 17 18 20
m111survey %>%
  discard(is.factor) %>%
  str()
## 'data.frame': 71 obs. of 6 variables:
## $ height : num 76 74 64 62 72 70.8 70 79 59 67 ...
## $ ideal_ht : num 78 76 NA 65 72 NA 72 76 61 67 ...
## $ sleep : num 9.5 7 9 7 8 10 4 6 7 7 ...
## $ fastest : int 119 110 85 100 95 100 85 160 90 90 ...
## $ GPA : num 3.56 2.5 3.8 3.5 3.2 3.1 3.68 2.7 2.8 NA ...
## $ diff.ideal.act.: num 2 2 NA 3 0 NA 2 3 2 0 ...
14.4.2 reduce()
Another important member of the purrr family is reduce(). Given a vector .x and a function .f that takes two inputs, reduce() does the following:
- applies .f to elements 1 and 2 of .x, getting a result;
- applies .f to the result and to element 3 of .x, getting another result;
- applies .f to this new result and to element 4 of .x, getting yet another result …
- … and so on until all of the elements of .x have been exhausted;
- then reduce() returns the final result in the above series of operations.
For example, suppose that you want to add up the elements of the vector:
vec <- c(3, 1, 4, 6)
Of course you could just use:
sum(vec)
## [1] 14
After all, sum() has been written to apply to many elements at once. But what if addition could only be done two numbers at a time? How might you proceed? You could:
- add the 3 and the 1 (the first two elements of vec), getting 4;
- then add 4 to 4, the third element of vec, getting 8;
- then add 8 to 6, the final element of vec, getting 14;
- then return 14.
reduce() operates in this way:

vec %>%
  reduce(`+`)
## [1] 14
Can you see how reduce() gets its name? Step by step, it “reduces” its .x argument, which may consist of many elements, to a single value.
A common application of reduce() is to take an operation that is defined on only two items and extend it to operate on any number of items. Consider, for example, the function intersect(), which will find the intersection of any two vectors of the same type:

intersect(c(3, 4, 5, 6), c(4, 6, 8))
## [1] 4 6
You cannot intersect three or more vectors at once:

intersect(c(3, 4, 5, 6), c(4, 6, 8), c(4, 7, 9))

## Error in base::intersect(x, y, ...) : unused argument (c(4, 7, 9))
With reduce() you can intersect as many vectors as you like, provided that they are first stored in a list.
lst <- list(
c("Akash", "Bipan", "Chandra", "Devadatta", "Raj"),
c("Raj", "Vikram", "Sita", "Akash", "Chandra"),
c("Akash", "Raj", "Chandra", "Bipan", "Lila"),
c("Akash", "Vikram", "Devadatta", "Raj", "Lila")
)
lst %>%
reduce(intersect)
## [1] "Akash" "Raj"
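The same pattern extends any binary set operation. For instance (an added illustration, not from the text), reducing the same list with union() collects every distinct name appearing in any of the vectors:

```r
library(purrr)

lst <- list(
  c("Akash", "Bipan", "Chandra", "Devadatta", "Raj"),
  c("Raj", "Vikram", "Sita", "Akash", "Chandra"),
  c("Akash", "Raj", "Chandra", "Bipan", "Lila"),
  c("Akash", "Vikram", "Devadatta", "Raj", "Lila")
)

# union() is also a two-vector operation; reduce() chains it across the list:
all_names <- reduce(lst, union)
all_names  # the eight distinct names, in order of first appearance
```

Whether the two-argument operation shrinks the result (intersect) or grows it (union), reduce() handles the chaining in the same way.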
You can write your own function to supply as the argument for .f, but it has to be able to operate on two arguments. reduce() will take the first argument of the .f function to be what has been “accumulated” so far, and the second argument of the .f function—the value to be combined with what has been accumulated—will be provided by the current element of .x.
As a simple example, let’s write our own reduce-summer in a way that shows the user the reduction process at work:

## the .f function:
my_summer <- function(acc, curr) {
  cat("So far I have ", acc, ",\n")
  cat(
    "But just now I was given ", curr,
    " to add in.\n\n", sep = ""
  )
  sum(acc, curr)
}
## .x will be the whole numbers from 1 to 4:
1:4 %>%
reduce(.f = my_summer)
## So far I have 1 ,
## But just now I was given 2 to add in.
##
## So far I have 3 ,
## But just now I was given 3 to add in.
##
## So far I have 6 ,
## But just now I was given 4 to add in.
## [1] 10
When you write your own .f
function, it’s a good idea to use names for the parameters that remind you of their role in the reduction process. acc
(for “accumulated”) and curr
(for “current”) are used above.
reduce() can take an argument called .init. When this argument is given a value, the operation begins by applying .f to .init and the first element of .x. For example:

1:4 %>% 
  reduce(.f = my_summer, .init = 100)
## So far I have 100 ,
## But just now I was given 1 to add in.
##
## So far I have 101 ,
## But just now I was given 2 to add in.
##
## So far I have 103 ,
## But just now I was given 3 to add in.
##
## So far I have 106 ,
## But just now I was given 4 to add in.
## [1] 110
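The .init value need not be a number. As a small example of ours (assuming purrr is loaded), you can seed a string-building reduction with it:

```r
library(purrr)

# paste() combines two strings at a time;
# .init supplies the starting string:
c("a", "fine", "day") %>%
  reduce(paste, .init = "What")  # "What a fine day"
```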
14.4.2.1 An Extended Example of Reduction
Let’s apply reduce()
with .init
to the task of making a truth table: the set of all \(2^n\) logical vectors of a given length \(n\).
The set \(S_1\) of vectors of length \(n = 1\) consists of only two vectors:
##
## vec1 TRUE
## vec2 FALSE
Now consider a systematic way to construct the set \(S_2\) of all the vectors of length two. We know that there are four such vectors:
##
## vec1 TRUE TRUE
## vec2 TRUE FALSE
## vec3 FALSE TRUE
## vec4 FALSE FALSE
Observe that the first two of them begin with TRUE
and end with the set \(S_1\) of vectors of length one:
##
## vec1 TRUE TRUE
## vec2 TRUE FALSE
The last two of them begin with FALSE
and also end with \(S_1\):
##
## vec3 FALSE TRUE
## vec4 FALSE FALSE
Now consider \(S_3\), the set of all eight vectors of length three:
##
## vec1 TRUE TRUE TRUE
## vec2 TRUE TRUE FALSE
## vec3 TRUE FALSE TRUE
## vec4 TRUE FALSE FALSE
## vec5 FALSE TRUE TRUE
## vec6 FALSE TRUE FALSE
## vec7 FALSE FALSE TRUE
## vec8 FALSE FALSE FALSE
Observe that the first four of them begin with TRUE and end with the vectors of \(S_2\):
##
## vec1 TRUE TRUE TRUE
## vec2 TRUE TRUE FALSE
## vec3 TRUE FALSE TRUE
## vec4 TRUE FALSE FALSE
The last four of them begin with FALSE
and also end with the vectors of \(S_2\):
##
## vec5 FALSE TRUE TRUE
## vec6 FALSE TRUE FALSE
## vec7 FALSE FALSE TRUE
## vec8 FALSE FALSE FALSE
The pattern is now clear. If for any \(m \ge 1\) you are in possession of the \(2^m \times m\) matrix \(S_m\) of all possible vectors of length \(m\), then to obtain the \(2^{m+1} \times (m+1)\) matrix \(S_{m+1}\) of all possible vectors of length \(m+1\) you should:
1. stack \(2^m\) TRUEs on top of \(2^m\) FALSEs, creating a \(2^{m+1} \times 1\) matrix \(U\);
2. stack \(S_m\) underneath itself, creating a \(2^{m+1} \times m\) matrix \(V\);
3. place \(U\) next to \(V\).
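To make the recipe concrete, here is a quick sketch of ours carrying out the three steps by hand for \(m = 1\):

```r
s1 <- matrix(c(TRUE, FALSE), nrow = 2)  # S_1: the 2 x 1 starting matrix

u <- c(rep(TRUE, 2), rep(FALSE, 2))     # step 1: 2^1 TRUEs atop 2^1 FALSEs
v <- rbind(s1, s1)                      # step 2: S_1 stacked under itself
s2 <- cbind(u, v)                       # step 3: U placed next to V

dim(s2)  # 4 rows and 2 columns: the matrix S_2
```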
reduce()
with .init
set to \(S_1\) is appropriate for this iterative building process. Here is an implementation:
make_table <- function(n, verbose = FALSE) {
  # make .init (S_1)
  s1 <- matrix(c(TRUE, FALSE), nrow = 2)
  rownames(s1) <- c("vec1", "vec2")
  colnames(s1) <- c("")
  # make .f
  build_next <- function(accum, value) {
    if (verbose) {
      cat(
        "On value ", value,
        " with accumulated material:",
        sep = ""
      )
      print(accum)
    }
    if (value == 1) return(accum)
    r <- nrow(accum)
    u <- c(
      rep(TRUE, times = r),
      rep(FALSE, times = r)
    )
    v <- rbind(accum, accum)
    next_matrix <- cbind(u, v)
    colnames(next_matrix) <- rep("", times = value)
    rownames(next_matrix) <- paste(
      "vec", 1:(2^value), sep = ""
    )
    if (verbose) {
      cat(
        "Finishing value", value,
        ", and I've built:",
        sep = ""
      )
      print(next_matrix)
      cat("\n\n")
    }
    next_matrix
  }
  # build from .init to the final product S_n
  reduce(.x = 1:n, .f = build_next, .init = s1)
}
We have included a verbose
option so we can watch the process as it unfolds.
Note also that the parameters for the .f function are given names that describe their roles:

- accum (what has been "accumulated" up to the current step), and
- value (the value of .x at the current step).

It's conventional to give these or similar names to the parameters of the building function.
Let’s try it out:
make_table(3, verbose = TRUE)
## On value 1 with accumulated material:
## vec1 TRUE
## vec2 FALSE
## On value 2 with accumulated material:
## vec1 TRUE
## vec2 FALSE
## Finishing value2, and I've built:
## vec1 TRUE TRUE
## vec2 TRUE FALSE
## vec3 FALSE TRUE
## vec4 FALSE FALSE
##
##
## On value 3 with accumulated material:
## vec1 TRUE TRUE
## vec2 TRUE FALSE
## vec3 FALSE TRUE
## vec4 FALSE FALSE
## Finishing value3, and I've built:
## vec1 TRUE TRUE TRUE
## vec2 TRUE TRUE FALSE
## vec3 TRUE FALSE TRUE
## vec4 TRUE FALSE FALSE
## vec5 FALSE TRUE TRUE
## vec6 FALSE TRUE FALSE
## vec7 FALSE FALSE TRUE
## vec8 FALSE FALSE FALSE
##
## vec1 TRUE TRUE TRUE
## vec2 TRUE TRUE FALSE
## vec3 TRUE FALSE TRUE
## vec4 TRUE FALSE FALSE
## vec5 FALSE TRUE TRUE
## vec6 FALSE TRUE FALSE
## vec7 FALSE FALSE TRUE
## vec8 FALSE FALSE FALSE
Of course in practice we would not turn on the verbose
option:
make_table(4)
##
## vec1 TRUE TRUE TRUE TRUE
## vec2 TRUE TRUE TRUE FALSE
## vec3 TRUE TRUE FALSE TRUE
## vec4 TRUE TRUE FALSE FALSE
## vec5 TRUE FALSE TRUE TRUE
## vec6 TRUE FALSE TRUE FALSE
## vec7 TRUE FALSE FALSE TRUE
## vec8 TRUE FALSE FALSE FALSE
## vec9 FALSE TRUE TRUE TRUE
## vec10 FALSE TRUE TRUE FALSE
## vec11 FALSE TRUE FALSE TRUE
## vec12 FALSE TRUE FALSE FALSE
## vec13 FALSE FALSE TRUE TRUE
## vec14 FALSE FALSE TRUE FALSE
## vec15 FALSE FALSE FALSE TRUE
## vec16 FALSE FALSE FALSE FALSE
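As an aside of ours: base R's expand.grid() can generate the same \(2^n\) combinations in a single call, though as a data frame rather than a matrix, and with the rows in a different order:

```r
# Three positions, each either TRUE or FALSE: 2^3 = 8 rows.
truth_df <- expand.grid(rep(list(c(TRUE, FALSE)), times = 3))
nrow(truth_df)  # 8
```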
14.4.3 Practice Exercises

1. The operator * (multiplication) is really a function:

`*`(3, 5)

## [1] 15

But it can only multiply two numbers at once. The R function prod() can handle as many numbers as you like:

prod(3, 5, 2, 7)

## [1] 210

Use reduce() and * to write your own function product() that takes a numerical vector vec and returns the product of the elements of the vector. It should work like this:

product(vec = c(3, 4, 5))

## [1] 60

(Hint: in the call to reduce() you will have to refer to the * function as `*`.)

2. Modify the function product() so that, in a single call to reduce(), it multiplies the number 2 by the product of the elements of vec. (Hint: set .init to an appropriate value.)

3. The data frame iris gives information on 150 irises. Use keep() to create a new data frame that includes only the numerical variables having a mean greater than 3.5.
14.4.4 Solutions to the Practice Exercises

1. Try this:

product <- function(vec) {
  reduce(vec, .f = `*`)
}

2. Try this:

product <- function(vec) {
  reduce(vec, .f = `*`, .init = 2)
}

3. Try this:

iris %>% 
  keep(~ is.numeric(.x) && mean(.x) > 3.5) %>% 
  str()

## 'data.frame': 150 obs. of  2 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
The following does not work. Why?
14.5 Functionals vs. Loops
The higher-order functions we have studied in this chapter are often called functionals. As we pointed out earlier, they deliver results that could have been produced by writing a loop of some sort.
Once you get used to functionals, you will find that they are often more “expressive” than loops—easier for others to read and to understand, and less prone to bugs. Also, many of them are optimized by the developers of R to run a bit faster than an ordinary loop written in R.
For example, consider the following list. It consists of ten thousand vectors, each of which contains 100 randomly generated numbers.
If we want the mean of each vector, we could write a loop:
Or we could use map_dbl()
:
means <- map_dbl(lst, mean)
Comparing the two using system.time()
, on my machine I got:
system.time(means <- map_dbl(lst, mean))
## user system elapsed
## 1.557 0.073 1.630
For the loop, I get:
system.time({
  means <- numeric(10000)
  for (i in 1:10000) {
    means[i] <- mean(lst[[i]])
  }
})
## user system elapsed
## 1.653 0.075 1.730
The map function is a bit faster, but the difference is small.
Remember also that vectorization is much faster than looping, and is also usually quite expressive, so don't struggle to take a functional approach when vectorization is possible. (This advice applies to several examples from this Chapter, in which the desired computations had already been accomplished in earlier chapters by some form of vectorization.)
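For a concrete sense of the contrast (a small example of ours, assuming purrr is loaded), the two calls below compute the same thing, but the vectorized call hands the whole vector to sqrt() at once:

```r
library(purrr)

y <- c(1, 4, 9, 16)

map_dbl(y, sqrt)  # functional: sqrt() is called once per element
sqrt(y)           # vectorized: one call on the whole vector
```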
14.6 Conclusion
In this Chapter we have concentrated on only a single aspect of the Functional Programming paradigm: exploiting the fact that functions are first-class citizens in R, we studied a number of higher-order functions that can substitute for loops. There is certainly a great deal more to Functional Programming than the mere avoidance of loops, but we'll end our study at this point. Familiarity with higher-order functions will stand you in good stead when you begin, in subsequent courses on web programming, to learn the JavaScript language. JavaScript makes constant use of higher-order functions!
Glossary
 Programming Paradigm

A programming paradigm is a way to describe some of the features of programming languages. Often a paradigm includes principles concerning the use of these features, or embodies a view that these features have special importance and utility in good programming practice.
 Procedural Programming

A programming paradigm that solves problems with programs that can be broken up into collections of variables, data structures and procedures. This paradigm tends to draw a sharp distinction between variables and data structures on the one hand and procedures on the other.
 Functional Programming

A programming paradigm that stresses the central role of functions. Some of its basic principles are:
 Computation consists in the evaluation of functions.
 Functions are first-class citizens in the language.
 Functions should only return values; they should not produce side effects.
 As much as possible, procedures should be written in terms of function calls.
 Pure Function

A function that does not produce side effects.
 Side Effect

A change in the state of the program (e.g., a change in the Global Environment) or any interaction external to the program (e.g., printing to the console).
 Higher-Order Function

A function that takes another function as an argument.
 Anonymous Function

A function that does not have a name.
 Refactoring

The act of rewriting computer code so that it performs the same task as before, but in a different way. (This is usually done to make the code more human-readable or to make it perform the task more quickly.)
Exercises

Explain in words what the following line of code produces when given a numerical vector y:

map(y, function(x) x^3 + 1)

In the course of your explanation, say whether the result is a vector or a list.

Which do you think works faster for a given numerical vector y? This code:

Or this code?

sqrt(y)

Justify your answer with a convincing example, using system.time(). What moral do you draw from this?
To refactor computer code is to rewrite the code so that it does the same thing, but in a different way. We might refactor code in order to make it more readable by humans, or to make it perform its task more quickly.

Refactor the following code so that it uses keep() instead of a loop:

df <- bcscr::m111survey
keep_variable <- logical(length(names(df)))
for (col in seq_along(keep_variable)) {
  var <- df[, col]
  is_numeric <- is.numeric(var)
  all_there <- !any(is.na(var))
  keep_variable[col] <- is_numeric && all_there
}
new_frame <- df[, keep_variable]
head(new_frame)

The following function produces a list of vectors of uniform random numbers, where the lower and upper bounds of the numbers are given by the arguments to the parameters lower and upper respectively, and the number of vectors in the list and the number of random numbers in each vector are given by a vector supplied to the parameter vecs.

random_sims <- function(vecs, lower = 0, upper = 1, seed = NULL) {
  # set seed if one is provided by the user
  if (!is.null(seed)) {
    set.seed(seed)
  }
  lst <- vector(mode = "list", length = length(vecs))
  for (i in seq_along(vecs)) {
    lst[[i]] <- runif(vecs[i], min = lower, max = upper)
  }
  lst
}

Refactor the code for random_sims() so that it uses map() instead of a loop.
The following enhanced version of random_sims() is even more flexible, as it allows both the upper and lower limits for the randomly generated numbers to vary with each vector of numbers that is produced.

random_sims2 <- function(vecs, lower, upper, seed = NULL) {
  # validate input
  if (!(length(vecs) == length(upper) &&
        length(upper) == length(lower))) {
    return(
      cat("All vectors entered must have the same length.")
    )
  }
  if (any(upper < lower)) {
    return(
      cat(paste0(
        "Every upper bound must be at least as ",
        "big as the corresponding lower bound."
      ))
    )
  }
  # set seed if one is provided by the user
  if (!is.null(seed)) {
    set.seed(seed)
  }
  lst <- vector(mode = "list", length = length(vecs))
  for (i in seq_along(vecs)) {
    lst[[i]] <- runif(
      vecs[i],
      min = lower[i], max = upper[i]
    )
  }
  lst
}

Use pmap() to refactor the code for random_sims2() so as to avoid using the loop.
Supposing that y is a numerical vector, explain in words what the following code produces:

Write a line of code using the subsetting operator [ that produces the same result as the code in the previous problem.
Use keep() to write a function called odd_members() that, given any numerical vector, returns a vector containing the odd numbers of the given vector. Your function should take a single argument called vec, the given vector. A typical example of use would be as follows:

odd_members(vec = 1:10)

## [1] 1 3 5 7 9

You are given the following list of character vectors:

lst <- list(
  c("Akash", "Bipan", "Chandra", "Devadatta", "Raj"),
  c("Raj", "Vikram", "Sita", "Akash", "Chandra"),
  c("Akash", "Raj", "Chandra", "Bipan", "Lila"),
  c("Akash", "Vikram", "Devadatta", "Raj", "Lila")
)

Use reduce() and the union() function to obtain a character vector that is the union of all the vectors in lst.

Remember the function subStrings() from the exercises of the Chapter on Strings? Refactor it so that it does EXACTLY the same thing but makes no use of loops.
Solve Part One of Advent of Code 2022, Day 3. Save your input file in your submit folder with the filename input_aoc_202203.txt, and read in the input data, naming it input, using the following code:

input <- readLines("input_aoc_202203.txt")

Solve Part One of Advent of Code 2022, Day 25. Save your input file in your submit folder with the filename input_aoc_202225.txt, and read in the input data, naming it input, using the following code:

input <- readLines("input_aoc_202225.txt")

(Hint: The snafu numbering system bears some relationship to base-5 numbering. After reviewing Section 11.4, write two helper functions: one to convert numbers to "base snafu" and another to convert base-snafu representations to numbers.)